|
|
DescriptionMake get_syzygy_binaries.py resistant to cloud failures.
This now uses gsutil to perform downloads which ensures that the downloaded data is not corrupt and matches its stored checksum. It also tries repeatedly with an exponential backoff.
BUG=
Committed: https://crrev.com/3641d5ffaee287ab74c7a2c0270e28d5e9864a7d
Cr-Commit-Position: refs/heads/master@{#338718}
Patch Set 1 #
Total comments: 2
Patch Set 2 : Add exponential backoff, use gsutil. #Patch Set 3 : Cleanup. #
Total comments: 4
Patch Set 4 : Fix nits. #Messages
Total messages: 16 (3 generated)
chrisha@chromium.org changed reviewers: + thakis@chromium.org
This script failed on multiple bots over the last 24 hours due to flaky cloud storage connections. At the sheriff's request, adding some fault tolerance. PTAL?
Do any of the other downloading scripts do anything like this? Is this more likely to happen for syzygy for some reason? Do more bots use syzygy than the other downloaded thingies? (I'm fine with this CL, just wondering why syzygy needs this but nothing else does) https://codereview.chromium.org/1236623002/diff/1/build/get_syzygy_binaries.py File build/get_syzygy_binaries.py (right): https://codereview.chromium.org/1236623002/diff/1/build/get_syzygy_binaries.p... build/get_syzygy_binaries.py:267: retries -= 1 Trying again immediately likely won't work. Try sleeping 4s after the first try, 8s after the second or something like that ( https://en.wikipedia.org/wiki/Exponential_backoff etc)
On 2015/07/10 15:04:39, Nico wrote: > Do any of the other downloading scripts do anything like this? Is this more > likely to happen for syzygy for some reason? Do more bots use syzygy than the > other downloaded thingies? > > (I'm fine with this CL, just wondering why syzygy needs this but nothing else > does) That's a good question. Everybody else uses "download_from_google_storage" (doesn't quite work for us, because we need to unpack archives), which in turn uses "gsutil". Let me look into using that instead, as it may have it's own internal fault tolerance mechanism. > > https://codereview.chromium.org/1236623002/diff/1/build/get_syzygy_binaries.py > File build/get_syzygy_binaries.py (right): > > https://codereview.chromium.org/1236623002/diff/1/build/get_syzygy_binaries.p... > build/get_syzygy_binaries.py:267: retries -= 1 > Trying again immediately likely won't work. Try sleeping 4s after the first try, > 8s after the second or something like that ( > https://en.wikipedia.org/wiki/Exponential_backoff etc) Will do if "gsutil" doesn't turn out to be a better choice.
On 2015/07/10 15:20:02, chrisha wrote: > On 2015/07/10 15:04:39, Nico wrote: > > Do any of the other downloading scripts do anything like this? Is this more > > likely to happen for syzygy for some reason? Do more bots use syzygy than the > > other downloaded thingies? > > > > (I'm fine with this CL, just wondering why syzygy needs this but nothing else > > does) > > That's a good question. Everybody else uses "download_from_google_storage" > (doesn't quite work for us, because we need to unpack archives), which in turn > uses "gsutil". Let me look into using that instead, as it may have it's own > internal fault tolerance mechanism. (the clang downloader script also rolls its own thing since it predates download_from_google_storage and I'm not aware of problems caused by that. It currently shells out to curl which might have retry logic too. But there's no curl on Windows, so we want to move everything to manually downloading from python – in fact on Windows we already do, but there aren't many Windows bots yet – and I wasn't yet planning on adding explicit retry logic. So this is pretty relevant to my interests :-) )
> (the clang downloader script also rolls its own thing since it predates > download_from_google_storage and I'm not aware of problems caused by that. It > currently shells out to curl which might have retry logic too. But there's no > curl on Windows, so we want to move everything to manually downloading from > python – in fact on Windows we already do, but there aren't many Windows bots > yet – and I wasn't yet planning on adding explicit retry logic. So this is > pretty relevant to my interests :-) ) So, turns out that "gsutil" does retries, exponential backoff, hash validation, resumable transfers, etc, making it a much better choice for this (not sure why it wasn't used initially). AFAIK this isn't something that happens often, as we've been rolling Syzygy DEPS this way for 2 years and this is the *first* time it's flaked. However, it flaked twice in about 6 hours, so clearly it can happen again. If you guys are storing things on cloud storage maybe that's the way to go? All of the bots have depot_tools, so it's always available.
On 2015/07/10 15:47:34, chrisha wrote: > > (the clang downloader script also rolls its own thing since it predates > > download_from_google_storage and I'm not aware of problems caused by that. It > > currently shells out to curl which might have retry logic too. But there's no > > curl on Windows, so we want to move everything to manually downloading from > > python – in fact on Windows we already do, but there aren't many Windows > bots > > yet – and I wasn't yet planning on adding explicit retry logic. So this is > > pretty relevant to my interests :-) ) > > So, turns out that "gsutil" does retries, exponential backoff, hash validation, > resumable transfers, etc, making it a much better choice for this (not sure why > it wasn't used initially). AFAIK this isn't something that happens often, as > we've been rolling Syzygy DEPS this way for 2 years and this is the *first* time > it's flaked. However, it flaked twice in about 6 hours, so clearly it can happen > again. If you guys are storing things on cloud storage maybe that's the way to > go? All of the bots have depot_tools, so it's always available. Yeah, maybe. The clang stuff is used pretty widely, some places I don't know much about (the v8 waterfall, the nacl waterfall, I think the webrtc waterfall, maybe somewhere in angle land) – src/tools/clang gets mirrored into its own git repo that folks pull, so I think it's good to not depend on many things in there. gsutil should be available in most places though.
PTAL? https://codereview.chromium.org/1236623002/diff/1/build/get_syzygy_binaries.py File build/get_syzygy_binaries.py (right): https://codereview.chromium.org/1236623002/diff/1/build/get_syzygy_binaries.p... build/get_syzygy_binaries.py:267: retries -= 1 On 2015/07/10 15:04:39, Nico wrote: > Trying again immediately likely won't work. Try sleeping 4s after the first try, > 8s after the second or something like that ( > https://en.wikipedia.org/wiki/Exponential_backoff etc) Err, yup. Meant to do that.
lgtm (didn't you say that gsutil already does retries? if so, you probably don't need to do this yourself too?) https://codereview.chromium.org/1236623002/diff/40001/build/get_syzygy_binari... File build/get_syzygy_binaries.py (right): https://codereview.chromium.org/1236623002/diff/40001/build/get_syzygy_binari... build/get_syzygy_binaries.py:241: for path in os.environ['PATH'].split(';'): os.pathsep instead of ; https://codereview.chromium.org/1236623002/diff/40001/build/get_syzygy_binari... build/get_syzygy_binaries.py:313: archive = zipfile.ZipFile(open(path, 'rb')) nit: `with open(path, 'rb') as path:`, then you don't need the explicit close call
On 2015/07/14 17:58:41, Nico wrote: > (didn't you say that gsutil already does retries? if so, you probably don't need > to do this yourself too?) gsutil does retries for certain things (dropped connections, corruption, etc), but not for unreachable network. Figured a small retry loop for that wouldn't hurt. (But yeah, it may be a little overkill.)
https://codereview.chromium.org/1236623002/diff/40001/build/get_syzygy_binari... File build/get_syzygy_binaries.py (right): https://codereview.chromium.org/1236623002/diff/40001/build/get_syzygy_binari... build/get_syzygy_binaries.py:241: for path in os.environ['PATH'].split(';'): On 2015/07/14 17:58:41, Nico wrote: > os.pathsep instead of ; Done. https://codereview.chromium.org/1236623002/diff/40001/build/get_syzygy_binari... build/get_syzygy_binaries.py:313: archive = zipfile.ZipFile(open(path, 'rb')) On 2015/07/14 17:58:41, Nico wrote: > nit: `with open(path, 'rb') as path:`, then you don't need the explicit close > call Done.
The CQ bit was checked by chrisha@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from thakis@chromium.org Link to the patchset: https://codereview.chromium.org/1236623002/#ps60001 (title: "Fix nits.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1236623002/60001
Message was sent while issue was closed.
Committed patchset #4 (id:60001)
Message was sent while issue was closed.
Patchset 4 (id:??) landed as https://crrev.com/3641d5ffaee287ab74c7a2c0270e28d5e9864a7d Cr-Commit-Position: refs/heads/master@{#338718} |