|
|
Created:
6 years, 6 months ago by hinoka Modified:
6 years, 6 months ago CC:
chromium-reviews, Dirk Pranke, cmp-cc_chromium.org, iannucci+depot_tools_chromium.org, ilevy-cc_chromium.org, agable Visibility:
Public. |
DescriptionHave git cache bootstrap repo if repo is corrupt
We're seeing fetches fail in interesting ways:
running "git fetch -v --progress origin +refs/heads/*:refs/heads/*" in "/mnt/scratch0/b_used/build/slave/cache_dir/chrome--internal.googlesource.com-chrome-src--internal"
error: object file ./objects/a1/4bd89aa4cc7d7bbad7594cba0ae73e99dbb54c is empty
error: object file ./objects/a1/4bd89aa4cc7d7bbad7594cba0ae73e99dbb54c is empty
fatal: loose object a14bd89aa4cc7d7bbad7594cba0ae73e99dbb54c (stored in ./objects/a1/4bd89aa4cc7d7bbad7594cba0ae73e99dbb54c) is corrupt
fatal: The remote end hung up unexpectedly
And then the cache becomes corrupted. This blows the cache away if this happens.
BUG=261741
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=280123
Patch Set 1 #
Total comments: 2
Patch Set 2 : Recurse=True #
Total comments: 1
Patch Set 3 : Refactor #
Total comments: 6
Patch Set 4 : Try to move defuct folder #Patch Set 5 : Review fix #Patch Set 6 : remptdir #Patch Set 7 : import shutil #Messages
Total messages: 21 (0 generated)
https://codereview.chromium.org/352543003/diff/1/git_cache.py File git_cache.py (right): https://codereview.chromium.org/352543003/diff/1/git_cache.py#newcode381 git_cache.py:381: # The cache is corrupt, bootstrap it. What does recursion have to do with it? And nothing sets recurse=True, so we'll always hit the codepath above and never re-bootstrap.
https://codereview.chromium.org/352543003/diff/1/git_cache.py File git_cache.py (right): https://codereview.chromium.org/352543003/diff/1/git_cache.py#newcode381 git_cache.py:381: # The cache is corrupt, bootstrap it. On 2014/06/24 00:49:37, agable wrote: > What does recursion have to do with it? And nothing sets recurse=True, so we'll > always hit the codepath above and never re-bootstrap. Oops, that was supposed to default to recurse=True, fixed.
pingaling
https://codereview.chromium.org/352543003/diff/20001/git_cache.py File git_cache.py (right): https://codereview.chromium.org/352543003/diff/20001/git_cache.py#newcode383 git_cache.py:383: self.populate(depth, shallow, True, verbose, ignore_lock, False) I would like this better if you split out the logic into sub-methods along these lines: if needs_bootstrap(): bootstrap() else: try: fetch() except: bootstrap() fetch() As it's currently written, this is getting pretty spaghetti-like, and I don't fully understand the ramifications of, e.g., calling lockfile.lock() twice, or potentially creating multiple temp dirs. Also, minor point: os.remove(config_file) can fail on Windows, you need to catch that (and maybe retry).
Refactored, also the forcing mechanism is now explicit (force=True) rather than the implicit delete config file.
I'd like us to track the occurrence of this, and also archive corrupt repos for later analysis. Can you add code for communicating information about corrupt repos to the caller (e.g. bot_update.py), and also save the corrupted repo to /tmp (or %TEMPDIR%)?
On 2014/06/26 05:52:47, szager1 wrote: > I'd like us to track the occurrence of this, and also archive corrupt repos for > later analysis. > > Can you add code for communicating information about corrupt repos to the caller > (e.g. bot_update.py), and also save the corrupted repo to /tmp (or %TEMPDIR%)? Er.... I'd actually put a caveat there, which is we should only archive to tmp on whitelisted host names, and then whitelist the hosts on fyi or something. Maybe? Now that I write this, I'm not actually sure what would be better...
New PS attempts to moves to /tmp, and just rmtrees if that fails. https://codereview.chromium.org/352543003/diff/40001/git_cache.py File git_cache.py (right): https://codereview.chromium.org/352543003/diff/40001/git_cache.py#newcode28 git_cache.py:28: GIT_CACHE_CORRUPT_MESSAGE = 'The Git cache is corrupt!! Re-bootstrapping now.' I was planning on tracking the occurrence of this by having bot_update grep for this string in the gclient sync output.
https://codereview.chromium.org/352543003/diff/40001/git_cache.py File git_cache.py (right): https://codereview.chromium.org/352543003/diff/40001/git_cache.py#newcode28 git_cache.py:28: GIT_CACHE_CORRUPT_MESSAGE = 'The Git cache is corrupt!! Re-bootstrapping now.' On 2014/06/26 17:35:03, Ryan T. wrote: > I was planning on tracking the occurrence of this by having bot_update grep for > this string in the gclient sync output. That sounds good, but how about getting rid of the '!' characters, and put all the info on one line for easy grep-ing: GIT_CACHE_CORRUPT_TEMPLATE = 'WARNING: Corrupt git cache for %s; saved for debugging to %s' ... print GIT_CACHE_CORRUPT_TEMPLATE % (remote_url, tmp_dir) https://codereview.chromium.org/352543003/diff/40001/git_cache.py#newcode389 git_cache.py:389: try: I don't think you really need nested 'try' here: tempdir = None try: tempdir = self._ensure_bootstrap(depth, bootstrap) rundir = tempdir or self.mirror_path self._fetch(rundir, verbose, depth) except RefsHeadsFailedToFetch: gclient_utils.rmtree(rundir) tempdir = self._ensure_bootstrap(depth, bootstrap, force=True) assert tempdir self._fetch(tempdir, verbose, depth) finally: if tempdir: os.rename(tempdir, self.mirror_path) if not ignore_lock: lockfile.unlock()
https://codereview.chromium.org/352543003/diff/40001/git_cache.py File git_cache.py (right): https://codereview.chromium.org/352543003/diff/40001/git_cache.py#newcode28 git_cache.py:28: GIT_CACHE_CORRUPT_MESSAGE = 'The Git cache is corrupt!! Re-bootstrapping now.' On 2014/06/26 20:25:36, szager1 wrote: > On 2014/06/26 17:35:03, Ryan T. wrote: > > I was planning on tracking the occurrence of this by having bot_update grep > for > > this string in the gclient sync output. > > That sounds good, but how about getting rid of the '!' characters, and put all > the info on one line for easy grep-ing: > > GIT_CACHE_CORRUPT_TEMPLATE = 'WARNING: Corrupt git cache for %s; saved for > debugging to %s' > > ... > print GIT_CACHE_CORRUPT_TEMPLATE % (remote_url, tmp_dir) So the point of having this as a macro up to is so I can do something like this in bot_update: output = call(...) idx = output.find(git_cache.GIT_CACHE_CORRUPT_MESSAGE) so making it a template defeats that purpose. Maybe GIT_CACHE_CORRUPT_MESSAGE = 'WARNING: Corrupt git cache found.' ... print '%s cache for %s moved to %s' % (GIT_CACHE_CORRUPT_MESSAGE, remote_url, tmp_dir) Also the move() is in a try/except, so it might not always run. https://codereview.chromium.org/352543003/diff/40001/git_cache.py#newcode389 git_cache.py:389: try: On 2014/06/26 20:25:36, szager1 wrote: > I don't think you really need nested 'try' here: > tempdir = None > try: > tempdir = self._ensure_bootstrap(depth, bootstrap) > rundir = tempdir or self.mirror_path > self._fetch(rundir, verbose, depth) > except RefsHeadsFailedToFetch: > gclient_utils.rmtree(rundir) > tempdir = self._ensure_bootstrap(depth, bootstrap, force=True) > assert tempdir > self._fetch(tempdir, verbose, depth) > finally: > if tempdir: > os.rename(tempdir, self.mirror_path) > if not ignore_lock: > lockfile.unlock() Done.
lgtm https://codereview.chromium.org/352543003/diff/40001/git_cache.py File git_cache.py (right): https://codereview.chromium.org/352543003/diff/40001/git_cache.py#newcode28 git_cache.py:28: GIT_CACHE_CORRUPT_MESSAGE = 'The Git cache is corrupt!! Re-bootstrapping now.' On 2014/06/26 21:37:58, Ryan T. wrote: > On 2014/06/26 20:25:36, szager1 wrote: > > On 2014/06/26 17:35:03, Ryan T. wrote: > > > I was planning on tracking the occurrence of this by having bot_update grep > > for > > > this string in the gclient sync output. > > > > That sounds good, but how about getting rid of the '!' characters, and put all > > the info on one line for easy grep-ing: > > > > GIT_CACHE_CORRUPT_TEMPLATE = 'WARNING: Corrupt git cache for %s; saved for > > debugging to %s' > > > > ... > > print GIT_CACHE_CORRUPT_TEMPLATE % (remote_url, tmp_dir) > > So the point of having this as a macro up to is so I can do something like this > in bot_update: > output = call(...) > idx = output.find(git_cache.GIT_CACHE_CORRUPT_MESSAGE) > so making it a template defeats that purpose. Maybe > > GIT_CACHE_CORRUPT_MESSAGE = 'WARNING: Corrupt git cache found.' > > ... > print '%s cache for %s moved to %s' % (GIT_CACHE_CORRUPT_MESSAGE, remote_url, > tmp_dir) > > Also the move() is in a try/except, so it might not always run. That's fine, but it would be nice to be able to: output = call(...) for line in output: if line.startswith(git_cache.GIT_CACHE_CORRUPT_MESSAGE): # parse url and temp dir from line Even though the move might fail, we still want to print the path so we can at least try to debug it.
Yeah it does get printed, during time of move.
The CQ bit was checked by hinoka@google.com
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/hinoka@chromium.org/352543003/100001
The CQ bit was unchecked by commit-bot@chromium.org
Presubmit check for 352543003-100001 failed and returned exit status 1. Running presubmit commit checks ... Checking out rietveld... Running save-description-on-failure.sh Running push-basic.sh Running upstream.sh Running submit-from-new-dir.sh Running abandon.sh Running submodule-merge-test.sh Running upload-local-tracking-branch.sh Running hooks.sh Running post-dcommit-hook-test.sh Running upload-stale.sh Running patch.sh Running basic.sh ** Presubmit ERRORS ** Pylint (97 files) (33.86s) failed ************* Module /b/infra_internal/commit_queue/workdir/tools/depot_tools/git_cache.py E0602:340,10:Mirror._ensure_bootstrapped: Undefined variable 'shutil' Presubmit checks took 101.9s to calculate.
Well this is embarrassing, pylint did catch this on my local machine, but it was buried under a ton of fake failures. fixed.
The CQ bit was checked by hinoka@google.com
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/hinoka@chromium.org/352543003/120001
Message was sent while issue was closed.
Change committed as 280123 |