Created: 6 years, 9 months ago by jvoung (off chromium)
Modified: 6 years, 9 months ago
CC: native-client-reviews_googlegroups.com
Base URL: http://git.chromium.org/native_client/pnacl-llvm.git@master
Visibility: Public.
Description
Add self-scheduling to threaded translation (vs static)
For the x86-64 sandboxed translator it makes about a 5%
difference over a set of 13 pexes from the web / webstore.
For python.pexe this reduces the translation time w/ 4
threads from 43 seconds to 37 seconds. Most of the other
web benchmarks only get slight improvements (if any).
For the ARM sandboxed translator, it's about a 4.5%
aggregate difference on the A15. Again, most of the
apps are a wash except for python, gamecake and the
gameboy emulator.
For the unsandboxed translator, this is often a wash,
except on a few benchmarks.
Testing on spec2k: Same or better for sandboxed translator.
Notably, crafty translation time drops from 1.59 secs to
1.19 secs w/ CPU utilization going from 1.8 to 2.7
w/ 4 threads. With two threads it's 2.1 secs to 1.6 secs
and CPU utilization from 1.2 to 1.7. Crafty was one of
the benchmarks that regressed after switching from LLVM 3.3
to 3.4, and perhaps part of that was bad scheduling.
The gcc and eon benchmarks also get a nice boost.
Similar on ARM: crafty drops from 5.7 secs to 4.8 secs w/ 2 threads.
Use a ChunkSize of 8 to reduce the synchronization
overhead somewhat, hopefully without hurting streaming
translation. The chunk size tapers off to 1 when there
are few functions remaining, to attempt finer-grained
load balancing near the end.
For some idea of how this affects streaming translation:
the average function size for the benchmarks ranges
from 160 bytes to 1700 bytes, so 8 times that is
about 1.3KB to 14KB. The geomean of the average function
sizes is 570 bytes; at that geomean, 4 threads times 8
functions would grab ~18KB at a time.
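
For illustration, here is a minimal sketch of a chunk-size heuristic along these lines. The class and method names mirror the ThreadedFunctionQueue discussed in the review below, but the exact tapering formula is an assumption, not the committed code:

  #include <algorithm>

  // Sketch: hand out chunks of up to 8 functions to amortize the
  // synchronization cost, and taper toward 1 as the remaining work
  // shrinks so the tail can be load-balanced more finely.
  class ThreadedFunctionQueueSketch {
  public:
    ThreadedFunctionQueueSketch(unsigned NumThreads, unsigned NumFunctions)
        : NumThreads(NumThreads), NumFunctions(NumFunctions),
          CurrentFunction(0) {}

    unsigned RecommendedChunkSize() const {
      unsigned Remaining =
          NumFunctions > CurrentFunction ? NumFunctions - CurrentFunction : 0;
      unsigned PerThread = Remaining / std::max(1u, NumThreads);
      return std::max(1u, std::min(8u, PerThread));
    }

  private:
    const unsigned NumThreads;
    const unsigned NumFunctions;
    unsigned CurrentFunction; // index of the next unclaimed function
  };

With 4 threads, a queue like this would return 8 while at least 32 functions remain and taper down to 1 as the queue drains, which matches the behavior described above.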
BUG= http://code.google.com/p/nativeclient/issues/detail?id=3777
R=jfb@chromium.org, stichnot@chromium.org
Committed: https://gerrit.chromium.org/gerrit/gitweb?p=native_client/pnacl-llvm.git;a=commit;h=91ec3c7
Patch Set 1
Patch Set 2 : cleanup
Patch Set 3 : lower chunk size just in case
Total comments: 15
Patch Set 4 : jim review
Patch Set 5 : two dots
Patch Set 6 : disallow copy and assign
Total comments: 21
Patch Set 7 : JFs review
Patch Set 8 : add assert and comment, etc.
Patch Set 9 : reinstate check
Messages
Total messages: 12 (0 generated)
https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/Threaded...
File tools/pnacl-llc/ThreadedFunctionQueue.h (right):

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:68: unsigned RecommendedChunkSize() {
declare as const

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:81: unsigned SplitModuleCount;
Make SplitModuleCount and NumFunctions const?

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/pnacl-ll...
File tools/pnacl-llc/pnacl-llc.cpp (right):

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:135: NoSplitModuleDynamic("no-split-module-dynamic",
Instead of naming a variable with a negative meaning, could we use something like SplitModuleDynamic or SplitModuleStatic?

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:458: while(FuncIndex < NextIndex) {
while( ==> while (
http://llvm.org/docs/CodingStandards.html#spaces-before-parentheses

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:464: if (I == E) {
Maybe add "&& I != E" to the "while" condition instead of "if(...)break;"?

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:469: while(FuncIndex < NextIndex) {
while( ==> while (
http://llvm.org/docs/CodingStandards.html#spaces-before-parentheses

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:472: if (I == E) {
Maybe add "&& I != E" to the "while" condition instead of "if(...)break;"?
Thanks!

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/Threaded...
File tools/pnacl-llc/ThreadedFunctionQueue.h (right):

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:68: unsigned RecommendedChunkSize() {
On 2014/03/18 20:49:49, stichnot wrote:
> declare as const
Done.

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:81: unsigned SplitModuleCount;
On 2014/03/18 20:49:49, stichnot wrote:
> Make SplitModuleCount and NumFunctions const?
Done.

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/pnacl-ll...
File tools/pnacl-llc/pnacl-llc.cpp (right):

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:135: NoSplitModuleDynamic("no-split-module-dynamic",
On 2014/03/18 20:49:49, stichnot wrote:
> Instead of naming a variable with a negative meaning, could we use something
> like SplitModuleDynamic or SplitModuleStatic?
Changed to small enum -- is that what you had in mind?

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:458: while(FuncIndex < NextIndex) {
On 2014/03/18 20:49:49, stichnot wrote:
> while( ==> while (
> http://llvm.org/docs/CodingStandards.html#spaces-before-parentheses
Done.

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:464: if (I == E) {
On 2014/03/18 20:49:49, stichnot wrote:
> Maybe add "&& I != E" to the "while" condition instead of "if(...)break;"?
Done. Well at least I didn't use a goto =)

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:469: while(FuncIndex < NextIndex) {
On 2014/03/18 20:49:49, stichnot wrote:
> while( ==> while (
> http://llvm.org/docs/CodingStandards.html#spaces-before-parentheses
Done.

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:472: if (I == E) {
On 2014/03/18 20:49:49, stichnot wrote:
> Maybe add "&& I != E" to the "while" condition instead of "if(...)break;"?
Done.
lgtm

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/pnacl-ll...
File tools/pnacl-llc/pnacl-llc.cpp (right):

https://codereview.chromium.org/196793026/diff/30001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:135: NoSplitModuleDynamic("no-split-module-dynamic",
On 2014/03/18 22:01:17, jvoung (cr) wrote:
> On 2014/03/18 20:49:49, stichnot wrote:
> > Instead of naming a variable with a negative meaning, could we use something
> > like SplitModuleDynamic or SplitModuleStatic?
>
> Changed to small enum -- is that what you had in mind?
I was just thinking of naming it without using the word "No", but this is cool. :)
https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded...
File tools/pnacl-llc/ThreadedFunctionQueue.h (right):

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:23: : SplitModuleCount(SplitModuleCount),
Silly corner case, but checking that SplitModuleCount isn't 0 seems like a good idea.

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:35: return FuncIndex % SplitModuleCount == ThreadIndex;
IIUC this won't work if SplitModuleCount < ThreadIndex? Say you have 16 threads and you statically split modules 8 ways, then each function will get translated twice?

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:57: return __sync_bool_compare_and_swap(&CurrentFunction, Cur, NextIndex);
The failure case here doesn't actually set NextIndex at the right location: say ChunkSize was 8 for this thread and 4 for another, the other thread consumes first, then this thread will fail consuming, but its NextIndex will be advanced by 8 instead of 4.

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:71: std::max((int)(NumFunctions - CurrentFunction), 1);
Converting an unsigned that's not representable as signed is undefined behavior. I'd just have everything be signed from the start :-)

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:76: return std::max(1U, DynamicChunkSize);
return max(1, min(8, DynamicChunkSize));

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:86: void operator=(const ThreadedFunctionQueue&) LLVM_DELETED_FUNCTION;
Not that it matters since it's deleted, but operator= usually returns T&.

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/pnacl-ll...
File tools/pnacl-llc/pnacl-llc.cpp (right):

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:453: if (SplitModuleSched == SplitModuleStatic) {
It would be nicer if this were a switch on SplitModuleSched, since adding a new scheduling type then wouldn't require changing if/else/else if, and may give a compiler warning on unhandled enum value.

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:470: while (FuncIndex < NextIndex && I != E) {
Can it happen that I == E? Shouldn't we have failed to get that many functions?
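
As an aside, a minimal sketch of the switch-on-enum dispatch suggested in the last comment, assuming a two-value scheduling enum (names are illustrative, not the patch); omitting the default case is what lets the compiler warn about unhandled enum values:

  enum SplitModuleSchedKind { SplitModuleStatic, SplitModuleDynamic };

  static void translateWithSched(SplitModuleSchedKind Sched) {
    switch (Sched) { // no default: -Wswitch flags newly added enum values
    case SplitModuleStatic:
      // each thread statically keeps functions where FuncIndex % NumThreads == Tid
      break;
    case SplitModuleDynamic:
      // threads repeatedly grab chunks of functions from the shared queue
      break;
    }
  }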
https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded...
File tools/pnacl-llc/ThreadedFunctionQueue.h (right):

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:23: : SplitModuleCount(SplitModuleCount),
On 2014/03/19 04:47:05, JF wrote:
> Silly corner case, but checking that SplitModuleCount isn't 0 seems like a good
> idea.
Done.

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:35: return FuncIndex % SplitModuleCount == ThreadIndex;
On 2014/03/19 04:47:05, JF wrote:
> IIUC this won't work if SplitModuleCount < ThreadIndex? Say you have 16 threads
> and you statically split modules 8 ways, then each function will get translated
> twice?
It's pretty unlikely SplitModuleCount < ThreadIndex, or if it is we would have a different parallelization strategy.

That would mean it's safe to share the same module between threads. If that was true, then we would just share the single global module between every thread instead of having the overhead of cloning global variables and memcpy'ing bitcode stream bytes for the different modules.

However, this probably won't work with SplitModuleCount > # Threads either. It would leave some functions out because there wouldn't be any threads to work on some of the modules. In order to support that, the module sharding would need to be organized a bit differently. E.g., some modules would take functions [0, NumFuncs/N), some would take [NumFuncs/N, 2 * NumFuncs/N), ... Then those module sets would be sharded further between the threads, e.g., with this modulo style of sharding.

Overall, I think this class is only concerned about sharding things between # of threads, and the range of functions. For now, the range of functions is [0, NumFuncs). Perhaps SplitModuleCount should be renamed to NumThreads to be more accurate then.

If we ever change the sharding so that SplitModuleCount != NumThreads, then we only need to change the function range assumption away from [0, NumFuncs). The linker will also complain loudly when we make such changes to the assumptions without having fixed up each part.

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:57: return __sync_bool_compare_and_swap(&CurrentFunction, Cur, NextIndex);
On 2014/03/19 04:47:05, JF wrote:
> The failure case here doesn't actually set NextIndex at the right location: say
> ChunkSize was 8 for this thread and 4 for another, the other thread consumes
> first, then this thread will fail consuming, but its NextIndex will be advanced
> by 8 instead of 4.
Good catch -- I'll see if the cmpxchange succeeded or not. If it failed then re-read CurrentFunction into NextIndex.

Otherwise, perhaps just using a mutex would have been more clean. The perf is actually about the same w/ a mutex vs cmpxchange (at least on my linux x86 workstation)...

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:71: std::max((int)(NumFunctions - CurrentFunction), 1);
On 2014/03/19 04:47:05, JF wrote:
> Converting an unsigned that's not representable as signed is undefined behavior.
> I'd just have everything be signed from the start :-)
Note that NumFunctions was unsigned to begin with because .size() returns a size_t (unsigned). So if I make it signed, then I'd have to add a check that size() returns something within range. Once I do that check in the ctor, I guess I wouldn't need to do any checks after though.

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:76: return std::max(1U, DynamicChunkSize);
On 2014/03/19 04:47:05, JF wrote:
> return max(1, min(8, DynamicChunkSize));
Done.

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded...
tools/pnacl-llc/ThreadedFunctionQueue.h:86: void operator=(const ThreadedFunctionQueue&) LLVM_DELETED_FUNCTION;
On 2014/03/19 04:47:05, JF wrote:
> Not that it matters since it's deleted, but operator= usually returns T&.
LLVM's style is to have void, e.g., include/llvm/IR/Instruction.h:
void operator=(const Instruction &) LLVM_DELETED_FUNCTION;
So... I'm just following that style.

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/pnacl-ll...
File tools/pnacl-llc/pnacl-llc.cpp (right):

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:453: if (SplitModuleSched == SplitModuleStatic) {
On 2014/03/19 04:47:05, JF wrote:
> It would be nicer if this were a switch on SplitModuleSched, since adding a new
> scheduling type then wouldn't require changing if/else/else if, and may give a
> compiler warning on unhandled enum value.
Done.

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:470: while (FuncIndex < NextIndex && I != E) {
On 2014/03/19 04:47:05, JF wrote:
> Can it happen that I == E? Shouldn't we have failed to get that many functions?
It happened back when I was experimenting with ChunkSize tapering and did not necessarily have ChunkSize taper down to 1 near the end. E.g., taper down to 2 instead.

Now that it tapers down to 1, we can assume "I" will hit "E" exactly when FuncIndex hits NextIndex. If we ever change the tapering, then we'd have to readjust.
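
To make the two sharding schemes in this exchange concrete, here is a small sketch: the modulo check is the one quoted from the patch, while the range-based variant is only the hypothetical reorganization described above, not anything implemented:

  // Static modulo sharding, as in the quoted check: thread T owns every
  // function whose index is congruent to T modulo SplitModuleCount.
  static bool ownsFunctionStatic(unsigned FuncIndex, unsigned ThreadIndex,
                                 unsigned SplitModuleCount) {
    return FuncIndex % SplitModuleCount == ThreadIndex;
  }

  // Hypothetical range sharding for SplitModuleCount != NumThreads: module M
  // would take the contiguous range [M * NumFuncs / N, (M + 1) * NumFuncs / N),
  // which could then be sharded further among the threads.
  static bool inModuleRange(unsigned FuncIndex, unsigned ModuleIndex,
                            unsigned SplitModuleCount, unsigned NumFuncs) {
    unsigned Begin = ModuleIndex * NumFuncs / SplitModuleCount;
    unsigned End = (ModuleIndex + 1) * NumFuncs / SplitModuleCount;
    return FuncIndex >= Begin && FuncIndex < End;
  }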
https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded... File tools/pnacl-llc/ThreadedFunctionQueue.h (right): https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded... tools/pnacl-llc/ThreadedFunctionQueue.h:35: return FuncIndex % SplitModuleCount == ThreadIndex; On 2014/03/19 18:44:37, jvoung (cr) wrote: > On 2014/03/19 04:47:05, JF wrote: > > IIUC this won't work if SplitModuleCount < ThreadIndex? Say you have 16 > threads > > and you statically split modules 8 ways, then each function will get > translated > > twice? > > It's pretty unlikely SplitModuleCount < ThreadIndex, or if it is we would have a > different parallelization strategy. > > That would mean it's safe to share the same module between threads. If that was > true, then we would just share the single global module between every thread > instead of having the overhead of cloning global variables and memcpy'ing > bitcode stream bytes for the different modules. > > > However, this probably won't work with SplitModuleCount > # Threads either. It > would leave some functions out because there wouldn't be any threads to work on > some of the modules. In order to support that, the module sharding would need to > be organized a bit differently. E.g., some modules would take functions [0, > NumFuncs/N), some would take [NumFuncs/N, 2 * NumFuncs/N), ... Then those module > sets would be sharded further between the threads, e.g., with this modulo style > of sharding. > > Overall, I think this class is only concerned about sharding things between # of > threads, and the range of functions. For now, the range of functions is [0, > NumFuncs). Perhaps SplitModuleCount should be renamed to NumThreads to be more > accurate then. > > If we ever change the sharding so that SplitModuleCount != NumThreads, then we > only need to change the function range assumption away from [0, NumFuncs). The > linker will also complain loudly when we make such changes to the assumptions > without having fixed up each part. > > It would be useful to assert on this value then, with a comment explaining this tradeoff. https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded... tools/pnacl-llc/ThreadedFunctionQueue.h:57: return __sync_bool_compare_and_swap(&CurrentFunction, Cur, NextIndex); On 2014/03/19 18:44:37, jvoung (cr) wrote: > On 2014/03/19 04:47:05, JF wrote: > > The failure case here doesn't actually set NextIndex at the right location: > say > > ChunkSize was 8 for this thread and 4 for another, the other thread consumes > > first, then this thread will fail consuming, but its NextIndex will be > advanced > > by 8 instead of 4. > > Good catch -- I'll see if the cmpxchange succeeded or not. If it failed then > re-read CurrentFunction into NextIndex. > > Otherwise, perhaps just using a mutex would have been more clean. The perf is > actually about the same w/ a mutex vs cmpxchange (at least on my linux x86 > workstation)... That works, but it's cleaner to just use: if (Cur == (Index = __sync_compare_and_swap)) { return true; } NextIndex = Index; return false; You remove one contended read in that case.
https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded... File tools/pnacl-llc/ThreadedFunctionQueue.h (right): https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded... tools/pnacl-llc/ThreadedFunctionQueue.h:35: return FuncIndex % SplitModuleCount == ThreadIndex; On 2014/03/19 20:26:08, JF wrote: > On 2014/03/19 18:44:37, jvoung (cr) wrote: > > On 2014/03/19 04:47:05, JF wrote: > > > IIUC this won't work if SplitModuleCount < ThreadIndex? Say you have 16 > > threads > > > and you statically split modules 8 ways, then each function will get > > translated > > > twice? > > > > It's pretty unlikely SplitModuleCount < ThreadIndex, or if it is we would have > a > > different parallelization strategy. > > > > That would mean it's safe to share the same module between threads. If that > was > > true, then we would just share the single global module between every thread > > instead of having the overhead of cloning global variables and memcpy'ing > > bitcode stream bytes for the different modules. > > > > > > However, this probably won't work with SplitModuleCount > # Threads either. It > > would leave some functions out because there wouldn't be any threads to work > on > > some of the modules. In order to support that, the module sharding would need > to > > be organized a bit differently. E.g., some modules would take functions [0, > > NumFuncs/N), some would take [NumFuncs/N, 2 * NumFuncs/N), ... Then those > module > > sets would be sharded further between the threads, e.g., with this modulo > style > > of sharding. > > > > Overall, I think this class is only concerned about sharding things between # > of > > threads, and the range of functions. For now, the range of functions is [0, > > NumFuncs). Perhaps SplitModuleCount should be renamed to NumThreads to be more > > accurate then. > > > > If we ever change the sharding so that SplitModuleCount != NumThreads, then we > > only need to change the function range assumption away from [0, NumFuncs). The > > linker will also complain loudly when we make such changes to the assumptions > > without having fixed up each part. > > > > > > It would be useful to assert on this value then, with a comment explaining this > tradeoff. Done. https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/Threaded... tools/pnacl-llc/ThreadedFunctionQueue.h:57: return __sync_bool_compare_and_swap(&CurrentFunction, Cur, NextIndex); On 2014/03/19 20:26:08, JF wrote: > On 2014/03/19 18:44:37, jvoung (cr) wrote: > > On 2014/03/19 04:47:05, JF wrote: > > > The failure case here doesn't actually set NextIndex at the right location: > > say > > > ChunkSize was 8 for this thread and 4 for another, the other thread consumes > > > first, then this thread will fail consuming, but its NextIndex will be > > advanced > > > by 8 instead of 4. > > > > Good catch -- I'll see if the cmpxchange succeeded or not. If it failed then > > re-read CurrentFunction into NextIndex. > > > > Otherwise, perhaps just using a mutex would have been more clean. The perf is > > actually about the same w/ a mutex vs cmpxchange (at least on my linux x86 > > workstation)... > > That works, but it's cleaner to just use: > if (Cur == (Index = __sync_compare_and_swap)) { > return true; > } > NextIndex = Index; > return false; > > You remove one contended read in that case. Done.
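
For reference, a minimal sketch of the corrected grab using the value-returning builtin, so a failed exchange resynchronizes NextIndex with whatever another thread installed. Function and parameter names are illustrative, not the exact patch:

  // CurrentFunction is the shared index of the next unclaimed function.
  // Returns true if this thread now owns [Cur, NextIndex); on failure,
  // NextIndex is set to the winner's value so the caller retries from the
  // true current position instead of over-advancing by its own ChunkSize.
  static bool grabChunk(unsigned *CurrentFunction, unsigned Cur,
                        unsigned ChunkSize, unsigned &NextIndex) {
    NextIndex = Cur + ChunkSize;
    unsigned Observed =
        __sync_val_compare_and_swap(CurrentFunction, Cur, NextIndex);
    if (Observed == Cur)
      return true;
    NextIndex = Observed; // lost the race; resync rather than guess
    return false;
  }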
lgtm
https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/pnacl-ll...
File tools/pnacl-llc/pnacl-llc.cpp (right):

https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/pnacl-ll...
tools/pnacl-llc/pnacl-llc.cpp:470: while (FuncIndex < NextIndex && I != E) {
On 2014/03/19 18:44:37, jvoung (cr) wrote:
> On 2014/03/19 04:47:05, JF wrote:
> > Can it happen that I == E? Shouldn't we have failed to get that many
> > functions?
>
> It happened back when I was experimenting with ChunkSize tapering and did not
> necessarily have ChunkSize taper down to 1 near the end. E.g., taper down to 2
> instead.
>
> Now that it tapers down to 1, we can assume "I" will hit "E" exactly when
> FuncIndex hits NextIndex. If we ever change the tapering, then we'd have to
> readjust.
Looks like not all threads agree on how many functions and declarations there should be:

Thread 1 finished at 115
Thread 3 finished at 116
Thread 0 finished at 116
Thread 2 finished at 116

This can happen in small programs if some thread is "lucky" and never sees a function w/ a function call... when they run the ResolvePNaClIntrinsics.cpp function pass, they won't end up adding intrinsic declarations to the module because that only happens when visitCall() is used. Meanwhile other threads will probably see a function call and add an intrinsic declaration. Adding an intrinsic declaration adds to the set of things that can be iterated over...

If the threads don't agree on where the end should be, then one of the threads may be in a position where they call GrabFunctionDynamic() w/ say FuncIndex = TrueEndForThread - 50, lose the race and are told that NextIndex shall be TrueEndForThread + NumExtraDecls (because TrueEndForThread + NumExtraDecls == TrueEndForSomeOtherThread).

Going to add back the check for now.

I think ultimately, our intrinsic resolving passes should add needed declarations up front early and consistently, so that there isn't this type of difference. Then we can remove the check for I != E. Also, by adding the decls early, the NumFunctions used to initialize FuncQueue will be accurate and include the added decls, instead of being an underapproximation.
On 2014/03/20 15:50:30, jvoung (cr) wrote: > https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/pnacl-ll... > File tools/pnacl-llc/pnacl-llc.cpp (right): > > https://codereview.chromium.org/196793026/diff/80001/tools/pnacl-llc/pnacl-ll... > tools/pnacl-llc/pnacl-llc.cpp:470: while (FuncIndex < NextIndex && I != E) { > On 2014/03/19 18:44:37, jvoung (cr) wrote: > > On 2014/03/19 04:47:05, JF wrote: > > > Can it happen that I == E? Shouldn't we have failed to get that many > > functions? > > > > It happened back when I was experimenting with ChunkSize tapering and did not > > necessarily have ChunkSize taper down to 1 near the end. E.g., taper down to 2 > > instead. > > > > Now that it tapers down to 1, we can assume "I" will hit "E" exactly when > > FuncIndex hits NextIndex. If we ever change the tapering, then we'd have to > > readjust. > > Looks like not all threads agree on how many functions and declarations there > should be: > > Thread 1 finished at 115 > Thread 3 finished at 116 > Thread 0 finished at 116 > Thread 2 finished at 116 > > This can happen in small programs if some thread is "lucky" and never sees a > function w/ a function call... when they run the ResolvePNaClIntrinsics.cpp > function pass, they won't end up adding intrinsic declarations to the module > because that only happens when visitCall() is used. Meanwhile other threads will > probably see a function call and add an intrinsic declaration. Adding an > intrinsic declaration adds to the set of things that can be iterated over... > > If the threads don't agree on where the end should be, then one of the threads > may be in a position where they call GrabFunctionDynamic() w/ say FuncIndex = > TrueEndForThread - 50, lose the race and are told that NextIndex shall be > TrueEndEndForThread + NumExtraDecls (because TrueEndForThread + NumExtraDecls == > TrueEndForSomeOtherThread). > > Going to add back the check for now. > > I think ultimately, our intrinsic resolving passes should add needed > declarations up front early and consistently, so that there isn't this type of > difference. Then we can remove the check for I != E. Also, by adding the decls > early, the NumFunctions used to initialized FuncQueue will be accurate and > include the added decls, instead of being an underapproximation. Sounds good. Could you leave a comment that summarizes this?
Message was sent while issue was closed.
Committed patchset #9 manually as r91ec3c7 (presubmit successful).