|
|
Created:
8 years, 3 months ago by dmazzoni Modified:
8 years, 2 months ago CC:
gyp-developer_googlegroups.com Base URL:
http://git.chromium.org/external/gyp.git@master Visibility:
Public. |
DescriptionLoad target build files in parallel using Python multiprocessing.
This parallelizes the portion of the processing that took the largest fraction of runtime previously. There's more opportunity for parallelization elsewhere, but this seems to have the biggest impact.
I did some testing to verify the output is identical, but mostly with ninja. Someone familiar with all of the generators should make sure I'm not making assumptions I shouldn't be. Also, it might make sense to keep this behind an optional flag or env var for now.
Raw numbers:
GYP_GENERATORS=ninja
time build/gyp_chromium
1. Linux, z600, chromium src + src-internal + full WebKit: 23.14 s ->
16.04 s (30% speedup)
2. MacBook Pro, chromium src only: 36.83 s -> 27.87 (25% speedup)
Patch Set 1 #
Total comments: 12
Patch Set 2 : Avoid double copy #Patch Set 3 : Put behind flag with env var #
Total comments: 9
Patch Set 4 : Add lock and explicit class for global state #
Total comments: 8
Patch Set 5 : Switch to docstrings, move LoadTargetBuildFileCallback inside ParallelState. #Messages
Total messages: 13 (0 generated)
This is great! Could you add some timing info to the description? I don't feel qualified to review the change, the data setup is very subtle. Maybe Mark or Nico?
https://chromiumcodereview.appspot.com/10911082/diff/1/pylib/gyp/input.py File pylib/gyp/input.py (right): https://chromiumcodereview.appspot.com/10911082/diff/1/pylib/gyp/input.py#new... pylib/gyp/input.py:356: data_keys = set(data.keys()) Why double copy? data_keys = set(data) https://chromiumcodereview.appspot.com/10911082/diff/1/pylib/gyp/input.py#new... pylib/gyp/input.py:357: aux_data_keys = set(aux_data.keys()) same https://chromiumcodereview.appspot.com/10911082/diff/1/pylib/gyp/input.py#new... pylib/gyp/input.py:484: time.sleep(0.003) it'll likely slow down the equivalent of time.sleep(0.01) on Windows, you would have to confirm manually.
http://codereview.chromium.org/10911082/diff/1/pylib/gyp/input.py File pylib/gyp/input.py (right): http://codereview.chromium.org/10911082/diff/1/pylib/gyp/input.py#newcode356 pylib/gyp/input.py:356: data_keys = set(data.keys()) On 2012/09/05 19:45:39, Marc-Antoine Ruel wrote: > Why double copy? > data_keys = set(data) Done. http://codereview.chromium.org/10911082/diff/1/pylib/gyp/input.py#newcode357 pylib/gyp/input.py:357: aux_data_keys = set(aux_data.keys()) On 2012/09/05 19:45:39, Marc-Antoine Ruel wrote: > same Done. http://codereview.chromium.org/10911082/diff/1/pylib/gyp/input.py#newcode484 pylib/gyp/input.py:484: time.sleep(0.003) On 2012/09/05 19:45:39, Marc-Antoine Ruel wrote: > it'll likely slow down the equivalent of time.sleep(0.01) on Windows, you would > have to confirm manually. On Linux and Mac, any delay is better than none. I'm having trouble getting it to work on Windows at all - multiprocessing complains that it can't find gyp_chromium after forking. Any ideas?
http://codereview.chromium.org/10911082/diff/1/pylib/gyp/input.py File pylib/gyp/input.py (right): http://codereview.chromium.org/10911082/diff/1/pylib/gyp/input.py#newcode461 pylib/gyp/input.py:461: def LoadTargetBuildFileCallback(result): A bit of docstrings explaining the dataflows would help. http://codereview.chromium.org/10911082/diff/1/pylib/gyp/input.py#newcode464 pylib/gyp/input.py:464: for key in data0: gyp.data.update(data0) ? http://codereview.chromium.org/10911082/diff/1/pylib/gyp/input.py#newcode484 pylib/gyp/input.py:484: time.sleep(0.003) On 2012/09/06 16:40:59, Dominic Mazzoni wrote: > On 2012/09/05 19:45:39, Marc-Antoine Ruel wrote: > > it'll likely slow down the equivalent of time.sleep(0.01) on Windows, you > would > > have to confirm manually. > > On Linux and Mac, any delay is better than none. > > I'm having trouble getting it to work on Windows at all - multiprocessing > complains that it can't find gyp_chromium after forking. Any ideas? It's because fork doesn't exist on Windows. So it is 100% emulated by python. As such, it is much better if the inner code to be processed can be separated into a smaller process with a smaller script first. This is cumbersome, so you may just want to not enable it on Windows. http://codereview.chromium.org/10911082/diff/1/pylib/gyp/input.py#newcode507 pylib/gyp/input.py:507: time.sleep(0.003) You could use a Queue.Queue instead, so you wouldn't have to poll&sleep at all
I cleaned it up a bit and put the parallel code behind a --parallel flag (off by default) with an env var GYP_PARALLEL. If it looks reasonable, maybe I can check this in and encourage people to give it a try and see if there are incompatibilities.
Any other comments from anyone? Now that I've made it optional/opt-in, perhaps someone could just review the patch carefully enough to make sure that if it's not enabled, the behavior will not change. Then we can check this in and encourage people to experiment with it at their own risk and report bugs.
http://codereview.chromium.org/10911082/diff/1/pylib/gyp/input.py File pylib/gyp/input.py (right): http://codereview.chromium.org/10911082/diff/1/pylib/gyp/input.py#newcode507 pylib/gyp/input.py:507: time.sleep(0.003) On 2012/09/06 18:02:36, Marc-Antoine Ruel wrote: > You could use a Queue.Queue instead, so you wouldn't have to poll&sleep at all You didn't tell why you didn't want to use a single Queue for tasks completed in gyp.dependencies. In particular, the LoadTargetBuildFileCallback is called in a separate thread and you are not using locking at all. http://codereview.chromium.org/10911082/diff/3002/pylib/gyp/__init__.py File pylib/gyp/__init__.py (right): http://codereview.chromium.org/10911082/diff/3002/pylib/gyp/__init__.py#newco... pylib/gyp/__init__.py:322: parser.add_option('--parallel', dest='parallel', action='store_true', Remove parameter dest='parallel' it's not needed. http://codereview.chromium.org/10911082/diff/3002/pylib/gyp/__init__.py#newco... pylib/gyp/__init__.py:324: regenerate=False, What's regenerate? http://codereview.chromium.org/10911082/diff/3002/pylib/gyp/__init__.py#newco... pylib/gyp/__init__.py:384: parallel = os.environ.get('GYP_PARALLEL') options.parallel = bool(os.environ.get('GYP_PARALLEL')) http://codereview.chromium.org/10911082/diff/3002/pylib/gyp/input.py File pylib/gyp/input.py (right): http://codereview.chromium.org/10911082/diff/3002/pylib/gyp/input.py#newcode490 pylib/gyp/input.py:490: gyp.data['target_build_files'].add(build_file_path0) Can you tell me where the "gyp" variable is defined? http://codereview.chromium.org/10911082/diff/3002/pylib/gyp/input.py#newcode504 pylib/gyp/input.py:504: gyp.dependencies = [build_file_path] I find the code a bit hard to read, the "gyp" variable looks magical to me, but I'm not super familiar with the gyp code.
Thanks for the great feedback. I made the global state explicit and documented, and added a condition variable to avoid a race condition and eliminate polling. One more data point: on my new Z620 with an SSD, build/gyp_chromium time goes from 21.41 s -> 14.14 s (34% speedup), the best number yet. http://codereview.chromium.org/10911082/diff/1/pylib/gyp/input.py File pylib/gyp/input.py (right): http://codereview.chromium.org/10911082/diff/1/pylib/gyp/input.py#newcode507 pylib/gyp/input.py:507: time.sleep(0.003) On 2012/09/18 19:42:18, Marc-Antoine Ruel wrote: > On 2012/09/06 18:02:36, Marc-Antoine Ruel wrote: > > You could use a Queue.Queue instead, so you wouldn't have to poll&sleep at all > > You didn't tell why you didn't want to use a single Queue for tasks completed in > gyp.dependencies. > > In particular, the LoadTargetBuildFileCallback is called in a separate thread > and you are not using locking at all. You're right, I think there was a possible race condition. I fixed it with a condition variable that gets notified each time a callback completes; this also eliminates the need for polling or sleeping. As for a Queue, I think it would be more code to manage subprocesses and results explicitly and I'm not sure I see the benefit. multiprocessing.Pool already uses a Queue under the hood, so I think this is giving us the same benefit with the advantage of using a simpler high-level API. http://codereview.chromium.org/10911082/diff/3002/pylib/gyp/__init__.py File pylib/gyp/__init__.py (right): http://codereview.chromium.org/10911082/diff/3002/pylib/gyp/__init__.py#newco... pylib/gyp/__init__.py:322: parser.add_option('--parallel', dest='parallel', action='store_true', On 2012/09/18 19:42:18, Marc-Antoine Ruel wrote: > Remove parameter dest='parallel' > it's not needed. Done. http://codereview.chromium.org/10911082/diff/3002/pylib/gyp/__init__.py#newco... pylib/gyp/__init__.py:324: regenerate=False, On 2012/09/18 19:42:18, Marc-Antoine Ruel wrote: > What's regenerate? Looks like this isn't needed if 'dest' isn't used. http://codereview.chromium.org/10911082/diff/3002/pylib/gyp/__init__.py#newco... pylib/gyp/__init__.py:384: parallel = os.environ.get('GYP_PARALLEL') On 2012/09/18 19:42:18, Marc-Antoine Ruel wrote: > options.parallel = bool(os.environ.get('GYP_PARALLEL')) Done. http://codereview.chromium.org/10911082/diff/3002/pylib/gyp/input.py File pylib/gyp/input.py (right): http://codereview.chromium.org/10911082/diff/3002/pylib/gyp/input.py#newcode490 pylib/gyp/input.py:490: gyp.data['target_build_files'].add(build_file_path0) On 2012/09/18 19:42:18, Marc-Antoine Ruel wrote: > Can you tell me where the "gyp" variable is defined? gyp is the module object. I cleaned this up by creating a single global called parallel_state and documenting all of its members.
Ping?
On 2012/09/26 16:20:10, Dominic Mazzoni wrote: > Ping? I just tried the patch on Windows. With GYP_PARALLEL=1 I get: d:\src\cr3\src>python build\gyp_chromium Enabled Psyco JIT. Updating projects from gyp files... Using parallel processing (experimental). Traceback (most recent call last): File "<string>", line 1, in <module> File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 341, in main prepare(preparation_data) File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 450, in prepare file, path_name, etc = imp.find_module(main_name, dirs) ImportError: No module named gyp_chromium Traceback (most recent call last): File "<string>", line 1, in <module> File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 341, in main prepare(preparation_data) File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 450, in prepare file, path_name, etc = imp.find_module(main_name, dirs) ImportError: No module named gyp_chromium Traceback (most recent call last): File "<string>", line 1, in <module> File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 341, in main prepare(preparation_data) File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 450, in prepare file, path_name, etc = imp.find_module(main_name, dirs) ImportError: No module named gyp_chromium Traceback (most recent call last): File "<string>", line 1, in <module> File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 341, in main prepare(preparation_data) File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 450, in prepare file, path_name, etc = imp.find_module(main_name, dirs) ImportError: No module named gyp_chromium Traceback (most recent call last): File "<string>", line 1, in <module> File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 341, in main prepare(preparation_data) File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 450, in prepare file, path_name, etc = imp.find_module(main_name, dirs) ImportError: No module named gyp_chromium Traceback (most recent call last): File "<string>", line 1, in <module> File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 341, in main prepare(preparation_data) File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 450, in prepare file, path_name, etc = imp.find_module(main_name, dirs) ImportError: No module named gyp_chromium Traceback (most recent call last): File "<string>", line 1, in <module> File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 341, in main prepare(preparation_data) File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 450, in prepare file, path_name, etc = imp.find_module(main_name, dirs) ImportError: No module named gyp_chromium Traceback (most recent call last): File "<string>", line 1, in <module> File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 341, in main prepare(preparation_data) File "d:\src\depot_tools\python_bin\lib\multiprocessing\forking.py", line 450, in prepare file, path_name, etc = imp.find_module(main_name, dirs) ImportError: No module named gyp_chromium I guess there's some PYTHONPATH (or something) futzing required?
lgtm with some fixes. http://codereview.chromium.org/10911082/diff/12001/pylib/gyp/input.py File pylib/gyp/input.py (right): http://codereview.chromium.org/10911082/diff/12001/pylib/gyp/input.py#newcode447 pylib/gyp/input.py:447: # Wrapper around LoadTargetBuildFile used when LoadTargetBuildFile Why not a docstring? http://codereview.chromium.org/10911082/diff/12001/pylib/gyp/input.py#newcode488 pylib/gyp/input.py:488: # If build files are loaded in parallel, use this to keep track of Same for all file level symbols with comments. http://codereview.chromium.org/10911082/diff/12001/pylib/gyp/input.py#newcode491 pylib/gyp/input.py:491: class ParallelState: class ParallelState(object): http://codereview.chromium.org/10911082/diff/12001/pylib/gyp/input.py#newcode514 pylib/gyp/input.py:514: def LoadTargetBuildFileCallback(result): If you had put it a member function of ParallelState, so a member of the parallel_state instance, you'd be able to use at line 568, e.g.; callback=parallel_state.LoadTargetBuildFileCallback
http://codereview.chromium.org/10911082/diff/12001/pylib/gyp/input.py File pylib/gyp/input.py (right): http://codereview.chromium.org/10911082/diff/12001/pylib/gyp/input.py#newcode447 pylib/gyp/input.py:447: # Wrapper around LoadTargetBuildFile used when LoadTargetBuildFile On 2012/09/27 00:41:31, Marc-Antoine Ruel wrote: > Why not a docstring? Done. http://codereview.chromium.org/10911082/diff/12001/pylib/gyp/input.py#newcode488 pylib/gyp/input.py:488: # If build files are loaded in parallel, use this to keep track of On 2012/09/27 00:41:31, Marc-Antoine Ruel wrote: > Same for all file level symbols with comments. Done. http://codereview.chromium.org/10911082/diff/12001/pylib/gyp/input.py#newcode491 pylib/gyp/input.py:491: class ParallelState: On 2012/09/27 00:41:31, Marc-Antoine Ruel wrote: > class ParallelState(object): Done. http://codereview.chromium.org/10911082/diff/12001/pylib/gyp/input.py#newcode514 pylib/gyp/input.py:514: def LoadTargetBuildFileCallback(result): On 2012/09/27 00:41:31, Marc-Antoine Ruel wrote: > If you had put it a member function of ParallelState, so a member of the > parallel_state instance, you'd be able to use at line 568, e.g.; > callback=parallel_state.LoadTargetBuildFileCallback Good idea, done.
Committed r1508 |