|
|
Created:
4 years, 2 months ago by katthomas Modified:
4 years, 1 month ago CC:
chromium-reviews, dpranke+depot_tools_chromium.org, iannucci+depot_tools_chromium.org, phajdan Target Ref:
refs/heads/master Project:
depot_tools Visibility:
Public. |
DescriptionRemove git lockfile flakiness on win (bot_update)
Hypothesis: Sometimes bot update fails because windows fails to delete
a lockfile associated with a git process.
Test: If this happens, let's delete that lockfile and try again.
BUG=651602
Committed: https://chromium.googlesource.com/chromium/tools/depot_tools/+/df66a34f68f4b2f2c08039f178d267c7dafd9d59
Patch Set 1 #Patch Set 2 : Remove git lockfile flakiness on win (bot_update) #
Total comments: 7
Patch Set 3 : Remove git lockfile flakiness on win (bot_update) #
Total comments: 11
Patch Set 4 : Remove git lockfile flakiness on win (bot_update) #
Total comments: 1
Messages
Total messages: 19 (6 generated)
Description was changed from ========== Remove git lockfile flakiness on win (bot_update) Hypothesis: Sometimes bot update fails because windows fails to delete a lockfile associated with a git process. Test: If this happens, let's delete that lockfile and try again. BUG=651602 ========== to ========== Remove git lockfile flakiness on win (bot_update) Hypothesis: Sometimes bot update fails because windows fails to delete a lockfile associated with a git process. Test: If this happens, let's delete that lockfile and try again. BUG=651602 ==========
katthomas@google.com changed reviewers: + agable@chromium.org
phajdan.jr@chromium.org changed reviewers: + phajdan.jr@chromium.org
Drive-by. Hope you'd find it useful. https://codereview.chromium.org/2382653005/diff/20001/recipe_modules/bot_upda... File recipe_modules/bot_update/resources/bot_update.py (right): https://codereview.chromium.org/2382653005/diff/20001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:224: exitcode = subprocess.call(['cmd.exe', '/c', 'del', Please consider breaking the locks only once, at the beginning of bot_update/gclient invocation. gclient could recursively do so for all repos it manages. We already do that for the git cache. Also consider killing possibly stuck git processes in build\scripts\slave\kill_processes.py .
https://codereview.chromium.org/2382653005/diff/20001/recipe_modules/bot_upda... File recipe_modules/bot_update/resources/bot_update.py (right): https://codereview.chromium.org/2382653005/diff/20001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:209: maybe_remove_win_lockfile(sys.platform, outval) If possible, this shouldn't be at the end of call(). The call function is potentially used for many things, not just syncing git checkouts, and removing the lockfile only makes sense in a subset of those cases. https://codereview.chromium.org/2382653005/diff/20001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:215: # Windows sometimes has trouble deleting files. This can make git commands Use a docstring on line 218 instead of a comment above the definition. https://codereview.chromium.org/2382653005/diff/20001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:224: exitcode = subprocess.call(['cmd.exe', '/c', 'del', On 2016/10/03 at 11:53:22, Paweł Hajdan Jr. wrote: > Please consider breaking the locks only once, at the beginning of bot_update/gclient invocation. > > gclient could recursively do so for all repos it manages. We already do that for the git cache. > > Also consider killing possibly stuck git processes in build\scripts\slave\kill_processes.py . Agreed with this comment -- it makes more sense to me to aggressively break locks at the beginning, rather than reactively doing it after things fail. And also yes it's a good tip to look into kill_processes.py as well, but I'd only do that if this is unsuccessful. Finally, if you do move the call to this out from call(), then you should use call() instead of subprocess.call().
PTAL https://codereview.chromium.org/2382653005/diff/20001/recipe_modules/bot_upda... File recipe_modules/bot_update/resources/bot_update.py (right): https://codereview.chromium.org/2382653005/diff/20001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:209: maybe_remove_win_lockfile(sys.platform, outval) On 2016/10/05 16:28:21, agable wrote: > If possible, this shouldn't be at the end of call(). The call function is > potentially used for many things, not just syncing git checkouts, and removing > the lockfile only makes sense in a subset of those cases. Done. https://codereview.chromium.org/2382653005/diff/20001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:215: # Windows sometimes has trouble deleting files. This can make git commands On 2016/10/05 16:28:20, agable wrote: > Use a docstring on line 218 instead of a comment above the definition. Done. https://codereview.chromium.org/2382653005/diff/20001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:224: exitcode = subprocess.call(['cmd.exe', '/c', 'del', On 2016/10/05 16:28:21, agable wrote: > On 2016/10/03 at 11:53:22, Paweł Hajdan Jr. wrote: > > Please consider breaking the locks only once, at the beginning of > bot_update/gclient invocation. > > > > gclient could recursively do so for all repos it manages. We already do that > for the git cache. > > > > Also consider killing possibly stuck git processes in > build\scripts\slave\kill_processes.py . > > Agreed with this comment -- it makes more sense to me to aggressively break > locks at the beginning, rather than reactively doing it after things fail. > > And also yes it's a good tip to look into kill_processes.py as well, but I'd > only do that if this is unsuccessful. > > Finally, if you do move the call to this out from call(), then you should use > call() instead of subprocess.call(). Done. I put the lockbreaking before running any git command, with the logic that if we do it once at the beginning, we could end up with the same problem if subsequent git calls leave behind a lockfile. Thoughts?
https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... File recipe_modules/bot_update/resources/bot_update.py (right): https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:226: remove_git_lockfiles(sys.platform, os.getcwd()) Well this still runs before _each_ git command rather than exactly _once_ at the beginning. One idea might be to add support for breaking locks to gclient, which knows where all of the sub-repos should be. I'd suggest something similar to git_cache.py's CMDunlock. This probably doesn't have to be Windows-specific, right?
https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... File recipe_modules/bot_update/resources/bot_update.py (right): https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:226: remove_git_lockfiles(sys.platform, os.getcwd()) Right, I was thinking that any git command could cause this problem (windows doesn't successfully delete lock file, causing subsequent git command to fail). We could try doing it just once at the beginning, and then get more aggressive if that doesn't work? The issue was only happening on Windows. We discussed that it's not great practice to manually delete lockfiles, so I think it makes sense to be more specific here. On 2016/10/07 13:47:43, Paweł Hajdan Jr. wrote: > Well this still runs before _each_ git command rather than exactly _once_ at the > beginning. One idea might be to add support for breaking locks to gclient, which > knows where all of the sub-repos should be. > > I'd suggest something similar to git_cache.py's CMDunlock. > > This probably doesn't have to be Windows-specific, right?
https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... File recipe_modules/bot_update/resources/bot_update.py (right): https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:226: remove_git_lockfiles(sys.platform, os.getcwd()) On 2016/10/07 at 18:49:16, katthomas wrote: > Right, I was thinking that any git command could cause this problem (windows doesn't successfully delete lock file, causing subsequent git command to fail). We could try doing it just once at the beginning, and then get more aggressive if that doesn't work? That'd sound good to me. > The issue was only happening on Windows. We discussed that it's not great practice to manually delete lockfiles, so I think it makes sense to be more specific here. IMO there's nothing wrong with it as long as we do that _once_ and at the beginning. Locks could have been left in an inconsistent state, and it's reasonable to bring the system to a consistent state at the beginning of bot_update step. All following operations should keep the system in a sane state, unless abruptly interrupted. I'm fine with starting Windows-only.
https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... File recipe_modules/bot_update/resources/bot_update.py (right): https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:226: remove_git_lockfiles(sys.platform, os.getcwd()) On 2016/10/10 at 14:58:04, Paweł Hajdan Jr. wrote: > On 2016/10/07 at 18:49:16, katthomas wrote: > > Right, I was thinking that any git command could cause this problem (windows doesn't successfully delete lock file, causing subsequent git command to fail). We could try doing it just once at the beginning, and then get more aggressive if that doesn't work? > > That'd sound good to me. > > > The issue was only happening on Windows. We discussed that it's not great practice to manually delete lockfiles, so I think it makes sense to be more specific here. > > IMO there's nothing wrong with it as long as we do that _once_ and at the beginning. Locks could have been left in an inconsistent state, and it's reasonable to bring the system to a consistent state at the beginning of bot_update step. All following operations should keep the system in a sane state, unless abruptly interrupted. > > I'm fine with starting Windows-only. Yep, I'm in agreement with Pawel here: breaking locks for every Git invocation is both a) slow, and b) not maintaining the invariant of "bot_update should know what it is doing". Breaking the locks is the solution for "something went wrong outside of what we control". If bot_update is actually tripping over itself in the middle of a single invocation, then we definitely need to solve the problem, but breaking locks won't be the right solution for that. https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:231: if not platform.startswith('win'): nit: It's a matter of opinion whether it is better to do this check at the top of this function or in the caller. The former keeps us safe if someone else decides to reuse this function; the latter reduces the complexity of the function parameters and lets someone doing a straight-line readthrough of the calling function keep going without having to read this function too. Not expressing a strong opinion either way, just making sure both are being considered. https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:233: git_dir = cwd + '/.git' Can you guarantee that all callers of this function will call it from the root of a repository? I'm pretty sure that in other places we're more careful about calculating the path to the .git dir. That, or rename 'cwd' to 'git_root_path' or something like that, so it is clear to callers that the path must be a repo root. https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:242: print '===Attempting to remove lockfile %s===' % filename super duper tiny nit: I'd print these big header/footer pairs only once for the entire lock-breaking operation. A single line per file per attempt seems like plenty of output to me.
PTAL I found that gclient sync has a flag that does what we wanted, so that's great. I wasn't entirely sure on the best way to test, but took a stab at it anyway. https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... File recipe_modules/bot_update/resources/bot_update.py (right): https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:226: remove_git_lockfiles(sys.platform, os.getcwd()) On 2016/10/10 23:01:34, agable wrote: > On 2016/10/10 at 14:58:04, Paweł Hajdan Jr. wrote: > > On 2016/10/07 at 18:49:16, katthomas wrote: > > > Right, I was thinking that any git command could cause this problem (windows > doesn't successfully delete lock file, causing subsequent git command to fail). > We could try doing it just once at the beginning, and then get more aggressive > if that doesn't work? > > > > That'd sound good to me. > > > > > The issue was only happening on Windows. We discussed that it's not great > practice to manually delete lockfiles, so I think it makes sense to be more > specific here. > > > > IMO there's nothing wrong with it as long as we do that _once_ and at the > beginning. Locks could have been left in an inconsistent state, and it's > reasonable to bring the system to a consistent state at the beginning of > bot_update step. All following operations should keep the system in a sane > state, unless abruptly interrupted. > > > > I'm fine with starting Windows-only. > > Yep, I'm in agreement with Pawel here: breaking locks for every Git invocation > is both a) slow, and b) not maintaining the invariant of "bot_update should know > what it is doing". Breaking the locks is the solution for "something went wrong > outside of what we control". If bot_update is actually tripping over itself in > the middle of a single invocation, then we definitely need to solve the problem, > but breaking locks won't be the right solution for that. Makes sense, done. https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:231: if not platform.startswith('win'): On 2016/10/10 23:01:34, agable wrote: > nit: It's a matter of opinion whether it is better to do this check at the top > of this function or in the caller. The former keeps us safe if someone else > decides to reuse this function; the latter reduces the complexity of the > function parameters and lets someone doing a straight-line readthrough of the > calling function keep going without having to read this function too. > > Not expressing a strong opinion either way, just making sure both are being > considered. Thanks! https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:233: git_dir = cwd + '/.git' On 2016/10/10 23:01:34, agable wrote: > Can you guarantee that all callers of this function will call it from the root > of a repository? I'm pretty sure that in other places we're more careful about > calculating the path to the .git dir. That, or rename 'cwd' to 'git_root_path' > or something like that, so it is clear to callers that the path must be a repo > root. Done. https://codereview.chromium.org/2382653005/diff/40001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:242: print '===Attempting to remove lockfile %s===' % filename On 2016/10/10 23:01:34, agable wrote: > super duper tiny nit: I'd print these big header/footer pairs only once for the > entire lock-breaking operation. A single line per file per attempt seems like > plenty of output to me. Done.
LGTM
lgtm
The CQ bit was checked by katthomas@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Remove git lockfile flakiness on win (bot_update) Hypothesis: Sometimes bot update fails because windows fails to delete a lockfile associated with a git process. Test: If this happens, let's delete that lockfile and try again. BUG=651602 ========== to ========== Remove git lockfile flakiness on win (bot_update) Hypothesis: Sometimes bot update fails because windows fails to delete a lockfile associated with a git process. Test: If this happens, let's delete that lockfile and try again. BUG=651602 Committed: https://chromium.googlesource.com/chromium/tools/depot_tools/+/df66a34f68f4b2... ==========
Message was sent while issue was closed.
Committed patchset #4 (id:60001) as https://chromium.googlesource.com/chromium/tools/depot_tools/+/df66a34f68f4b2...
Message was sent while issue was closed.
ddoman@google.com changed reviewers: + ddoman@google.com
Message was sent while issue was closed.
https://codereview.chromium.org/2382653005/diff/60001/recipe_modules/bot_upda... File recipe_modules/bot_update/resources/bot_update.py (right): https://codereview.chromium.org/2382653005/diff/60001/recipe_modules/bot_upda... recipe_modules/bot_update/resources/bot_update.py:730: git_ref = git_checkout(solutions, revisions, shallow, refs, git_cache_dir) @katthomas, The following log shows that git checkout failed at this due to a broken lock: https://build.chromium.org/p/chromium.win/builders/Win%20x64%20Builder/builds... Therefore, those locks need to be checked and removed before git_checkout(). |