|
|
DescriptionFix a GN race condition.
The loader maintains a reference count on the "pending work". But the
Setup object was starting the load before incrementing this reference count.
It the loader happened to load anything on the background thread before the
main thread could increment the reference count, GN would think there is no
work left and exit early.
There is another possible race condition between defining items and doing
work on the thread pool can cause the build to be marked complete before
everything is actually done. The race condition is described in more detail
in the added comment in setup.cc.
BUG=674213
Review-Url: https://codereview.chromium.org/2617253002
Cr-Commit-Position: refs/heads/master@{#442416}
Committed: https://chromium.googlesource.com/chromium/src/+/717182cd794aae4577fbae651a88e3ba6d973dc8
Patch Set 1 #Patch Set 2 : Fix #Patch Set 3 : Fix other race condition. #Messages
Total messages: 28 (16 generated)
Fix
Description was changed from ========== Fix a GN race condition. THis race condition between defining items and doing work on the thread pool can cause the build to be marked complete before everything is actually done. The race condition is described in more detail in the added comment in setup.cc. BUG=674213 ========== to ========== Fix a GN race condition. THis race condition between defining items and doing work on the thread pool can cause the build to be marked complete before everything is actually done. The race condition is described in more detail in the added comment in setup.cc. BUG=674213 ==========
brettw@chromium.org changed reviewers: + dpranke@chromium.org, tsniatowski@opera.com
Description was changed from ========== Fix a GN race condition. THis race condition between defining items and doing work on the thread pool can cause the build to be marked complete before everything is actually done. The race condition is described in more detail in the added comment in setup.cc. BUG=674213 ========== to ========== Fix a GN race condition. This race condition between defining items and doing work on the thread pool can cause the build to be marked complete before everything is actually done. The race condition is described in more detail in the added comment in setup.cc. BUG=674213 ==========
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2017/01/06 23:58:37, Dirk Pranke wrote: > lgtm Unfortunately I can still observe the issue with this patch applied on top of master (@676555942ebaa74b60776e67874efd0b1f9a5779). It still goes away after reverting https://codereview.chromium.org/2515383005 like https://codereview.chromium.org/2612083004/ does.
Hmm, I';m pretty sure the issue I described is real, but there may be a similar issue elsewhere. Is your test setup easy for me to duplicate?
On 2017/01/07 18:57:29, brettw (ping after 24h) wrote: > Hmm, I';m pretty sure the issue I described is real, but there may be a similar > issue elsewhere. > > Is your test setup easy for me to duplicate? The setup is a non-overpowered build machine (some openstack vm; 8 virtual cores) running `gn gen` in one shell while on a separate shell I have root running `stress --cpu 8`. The problem reproduces 50-90% for me on a plain chromium checkout too, and continues to reproduce even with this patch applied. Applying the revert makes the issue go away, Note: Only release gn seems to be useful for this (on latest master debug gn trips a: substitution_writer.cc(402)] Check failed: false. Unsupported substitution for this function: {{output}} Want a bug for that?
Thanks, on my beefy dev box using "stress --cpu 48" gives an occasional "no targets" which I think is the same bug.
Description was changed from ========== Fix a GN race condition. This race condition between defining items and doing work on the thread pool can cause the build to be marked complete before everything is actually done. The race condition is described in more detail in the added comment in setup.cc. BUG=674213 ========== to ========== Fix a GN race condition. The loader maintains a reference count on the "pending work". But the Setup object was starting the load before incrementing this reference count. It the loader happened to load anything on the background thread before the main thread could increment the reference count, GN would think there is no work left and exit early. There is another possible race condition between defining items and doing work on the thread pool can cause the build to be marked complete before everything is actually done. The race condition is described in more detail in the added comment in setup.cc. BUG=674213 ==========
Okay, I found an even more obvious race which is probably what we're normally encountering. It seems to fix the problem for me locally (but sometimes it takes a while to repro so it's hard to be sure).
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by brettw@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from dpranke@chromium.org Link to the patchset: https://codereview.chromium.org/2617253002/#ps40001 (title: "Fix other race condition.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 40001, "attempt_start_ts": 1484005390433770, "parent_rev": "60ddb7a0abc0ab2a2783e1b9fbe98f5468c8d711", "commit_rev": "717182cd794aae4577fbae651a88e3ba6d973dc8"}
Message was sent while issue was closed.
Description was changed from ========== Fix a GN race condition. The loader maintains a reference count on the "pending work". But the Setup object was starting the load before incrementing this reference count. It the loader happened to load anything on the background thread before the main thread could increment the reference count, GN would think there is no work left and exit early. There is another possible race condition between defining items and doing work on the thread pool can cause the build to be marked complete before everything is actually done. The race condition is described in more detail in the added comment in setup.cc. BUG=674213 ========== to ========== Fix a GN race condition. The loader maintains a reference count on the "pending work". But the Setup object was starting the load before incrementing this reference count. It the loader happened to load anything on the background thread before the main thread could increment the reference count, GN would think there is no work left and exit early. There is another possible race condition between defining items and doing work on the thread pool can cause the build to be marked complete before everything is actually done. The race condition is described in more detail in the added comment in setup.cc. BUG=674213 Review-Url: https://codereview.chromium.org/2617253002 Cr-Commit-Position: refs/heads/master@{#442416} Committed: https://chromium.googlesource.com/chromium/src/+/717182cd794aae4577fbae651a88... ==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://chromium.googlesource.com/chromium/src/+/717182cd794aae4577fbae651a88...
Message was sent while issue was closed.
On 2017/01/09 22:29:55, brettw (ping after 24h) wrote: > Okay, I found an even more obvious race which is probably what we're normally > encountering. It seems to fix the problem for me locally (but sometimes it takes > a while to repro so it's hard to be sure). Seems to have done the trick! Thanks!
Message was sent while issue was closed.
> Seems to have done the trick! Thanks! Great, thanks for the help tracking this down! |