|
|
Chromium Code Reviews|
Created:
4 years, 8 months ago by nednguyen Modified:
4 years, 8 months ago Reviewers:
aiolos (Not reviewing) CC:
catapult-reviews_chromium.org Base URL:
https://github.com/catapult-project/catapult@master Target Ref:
refs/heads/master Project:
catapult Visibility:
Public. |
DescriptionChange telemetry run_tests.py to prefetch all binaries before tests in parallel
This also set DISABLE_CLOUD_STORAGE_IO to '1' in each
subprocess of telemetry's run_tests.py to make sure that
telemetry don't try to download files in parallel which has
created deadlock/Os permission errors.
BUG=chromium:580919
Committed: https://chromium.googlesource.com/external/github.com/catapult-project/catapult/+/f0d8b9cd7219ed054e1229a8724b7d96a813b557
Patch Set 1 #
Total comments: 9
Patch Set 2 : Support 1 & '1' for DISABLE_CLOUD_STORAGE_IO #Patch Set 3 : Make sure to also fetch client configs #
Total comments: 2
Patch Set 4 : Switch Os & Arch #
Total comments: 1
Patch Set 5 : Fix some of the tests #Patch Set 6 : Make fetch_reference_chrome_binary an argument of FetchBinaryDependencies #Patch Set 7 : Remove unnecessary change #Patch Set 8 : Fix bug in DependencyManager.PrefetchPaths #
Total comments: 1
Messages
Total messages: 27 (9 generated)
Description was changed from ========== Change telemetry run_tests.py to prefetch all binaries before running tests in parallel. This is to make sure that telemetry don't try to download files in parallel which has created deadlock. BUG=chromium:580919 ========== to ========== Change telemetry run_tests.py to prefetch all binaries before running tests in parallel. This also set DISABLE_CLOUD_STORAGE_IO to '1' in each subprocess of telemetry's run_tests.py to make sure that telemetry don't try to download files in parallel which has created deadlock. BUG=chromium:580919 ==========
nednguyen@google.com changed reviewers: + aiolos@chromium.org
https://codereview.chromium.org/1838863002/diff/1/catapult_base/catapult_base... File catapult_base/catapult_base/cloud_storage.py (right): https://codereview.chromium.org/1838863002/diff/1/catapult_base/catapult_base... catapult_base/catapult_base/cloud_storage.py:142: disable_cloud_storage_env_val) Reviewers: I initially set os.environ[DISABLE_CLOUD_STORAGE_IO] to 1 and realized that it should be '1' instead. This is to improve debuggability when things go wrong.
https://codereview.chromium.org/1838863002/diff/1/catapult_base/catapult_base... File catapult_base/catapult_base/cloud_storage.py (right): https://codereview.chromium.org/1838863002/diff/1/catapult_base/catapult_base... catapult_base/catapult_base/cloud_storage.py:142: disable_cloud_storage_env_val) On 2016/03/28 17:41:06, nednguyen wrote: > Reviewers: I initially set os.environ[DISABLE_CLOUD_STORAGE_IO] to 1 and > realized that it should be '1' instead. This is to improve debuggability when > things go wrong. Is there a reason why we shouldn't just make both of them work instead? https://codereview.chromium.org/1838863002/diff/1/catapult_base/catapult_base... catapult_base/catapult_base/cloud_storage.py:143: if (disable_cloud_storage_env_val == '1' and ie, why isn't this just if (disable_cloud_storage_env_val and ....): https://codereview.chromium.org/1838863002/diff/1/telemetry/telemetry/interna... File telemetry/telemetry/internal/util/binary_manager.py (right): https://codereview.chromium.org/1838863002/diff/1/telemetry/telemetry/interna... telemetry/telemetry/internal/util/binary_manager.py:71: def FetchBinaryDeps(platform): This is the hard way of doing things, and doesn't download any client-specific dependencies. Is there a reason you aren't just calling PrefetchPaths on _binary_manager during/right after InitDependencyManager? https://codereview.chromium.org/1838863002/diff/1/telemetry/telemetry/testing... File telemetry/telemetry/testing/run_tests.py (right): https://codereview.chromium.org/1838863002/diff/1/telemetry/telemetry/testing... telemetry/telemetry/testing/run_tests.py:131: binary_manager.FetchBinaryDeps(platform) I'm pretty sure this needs to be done in main instead of here. I highly suggest doing this right after the InitDependencyManager call. Or better yet, change the Init to optionally prefetch the dependencies at initialization time. That should help avoid any multi-process race conditions (which I'm pretty sure is causing the redness you're seeing in the try bots.)
https://codereview.chromium.org/1838863002/diff/1/catapult_base/catapult_base... File catapult_base/catapult_base/cloud_storage.py (right): https://codereview.chromium.org/1838863002/diff/1/catapult_base/catapult_base... catapult_base/catapult_base/cloud_storage.py:142: disable_cloud_storage_env_val) On 2016/03/28 19:08:32, aiolos wrote: > On 2016/03/28 17:41:06, nednguyen wrote: > > Reviewers: I initially set os.environ[DISABLE_CLOUD_STORAGE_IO] to 1 and > > realized that it should be '1' instead. This is to improve debuggability when > > things go wrong. > > Is there a reason why we shouldn't just make both of them work instead? Done. https://codereview.chromium.org/1838863002/diff/1/catapult_base/catapult_base... catapult_base/catapult_base/cloud_storage.py:143: if (disable_cloud_storage_env_val == '1' and On 2016/03/28 19:08:32, aiolos wrote: > ie, why isn't this just > > if (disable_cloud_storage_env_val and > ....): People can do: 'DISABLE_CLOUD_STORAGE_IO' = '0' which will make this evaluate to True, and it is probably not what they meant. https://codereview.chromium.org/1838863002/diff/1/telemetry/telemetry/interna... File telemetry/telemetry/internal/util/binary_manager.py (right): https://codereview.chromium.org/1838863002/diff/1/telemetry/telemetry/interna... telemetry/telemetry/internal/util/binary_manager.py:71: def FetchBinaryDeps(platform): On 2016/03/28 19:08:32, aiolos wrote: > This is the hard way of doing things, and doesn't download any client-specific > dependencies. > > Is there a reason you aren't just calling PrefetchPaths on _binary_manager > during/right after InitDependencyManager? It will always download chrome reference binary whether or not people want. The size of chrome reference binary is ~ 2Gb, whereas the rest of binaries is around 1 - 2 hundred mbs, which make this an non negligible optimization. https://codereview.chromium.org/1838863002/diff/1/telemetry/telemetry/testing... File telemetry/telemetry/testing/run_tests.py (right): https://codereview.chromium.org/1838863002/diff/1/telemetry/telemetry/testing... telemetry/telemetry/testing/run_tests.py:131: binary_manager.FetchBinaryDeps(platform) On 2016/03/28 19:08:32, aiolos wrote: > I'm pretty sure this needs to be done in main instead of here. I highly suggest > doing this right after the InitDependencyManager call. Or better yet, change the > Init to optionally prefetch the dependencies at initialization time. That should > help avoid any multi-process race conditions (which I'm pretty sure is causing > the redness you're seeing in the try bots.) This code is run in single process (main() directly call Run()), and typ only kick in at line 180. The reason this is done here is deciding whether to download the reference browser binary can only be done here, after we already parse the args.
https://codereview.chromium.org/1838863002/diff/40001/telemetry/telemetry/int... File telemetry/telemetry/internal/util/binary_manager.py (right): https://codereview.chromium.org/1838863002/diff/40001/telemetry/telemetry/int... telemetry/telemetry/internal/util/binary_manager.py:83: target_platform = '%s_%s' % (platform.GetArchName(), platform.GetOSName()) should be <os_name>_<arch_name> Please switch the order. You can see warnings for this in the logs like so: WARNING:root:Dependency crash_service not configured for platform AMD64_win. Skipping prefetch. And check against the config. Ex: https://code.google.com/p/chromium/codesearch#chromium/src/third_party/catapu...
https://codereview.chromium.org/1838863002/diff/40001/telemetry/telemetry/int... File telemetry/telemetry/internal/util/binary_manager.py (right): https://codereview.chromium.org/1838863002/diff/40001/telemetry/telemetry/int... telemetry/telemetry/internal/util/binary_manager.py:83: target_platform = '%s_%s' % (platform.GetArchName(), platform.GetOSName()) On 2016/03/28 20:25:09, aiolos wrote: > should be <os_name>_<arch_name> > > Please switch the order. > > You can see warnings for this in the logs like so: > WARNING:root:Dependency crash_service not configured for platform AMD64_win. > Skipping prefetch. > > And check against the config. Ex: > https://code.google.com/p/chromium/codesearch#chromium/src/third_party/catapu... Done. Thanks for catching this!
The CQ bit was checked by aiolos@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1838863002/50001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1838863002/50001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Catapult Windows Tryserver on master.tryserver.client.catapult (JOB_FAILED, https://build.chromium.org/p/tryserver.client.catapult/builders/Catapult%20Wi...)
On 2016/03/28 22:23:36, commit-bot: I haz the power wrote: > Dry run: CQ is trying da patch. Follow status at > https://chromium-cq-status.appspot.com/patch-status/1838863002/50001 > View timeline at > https://chromium-cq-status.appspot.com/patch-timeline/1838863002/50001 Besides the fact that os.environ can only accept strings as value, the two issues left that may make the tests failing are: 1) Some of the android test are run on all platforms using mock, and those tests also require fetching android deps. I will probably just prefetch these binary manually in run_tests.py 2) binary_manager_unittest should be updated since we change the import a bit.
> Besides the fact that os.environ can only accept strings as value, the two > issues left that may make the tests failing are: > 1) Some of the android test are run on all platforms using mock, and those tests > also require fetching android deps. I will probably just prefetch these binary > manually in run_tests.py That seems prone to being hacky, but maybe I'm just not seeing a good way to do it atm. Why are we running Android tests on all platforms when we don't support Android on any host but Linux? Can't we just disabled those tests on the platforms that aren't needed? Or if we're using Mock, do we actually need the dependencies that are being pulled in? > 2) binary_manager_unittest should be updated since we change the import a bit. https://codereview.chromium.org/1838863002/diff/50001/telemetry/telemetry/int... File telemetry/telemetry/internal/util/binary_manager.py (right): https://codereview.chromium.org/1838863002/diff/50001/telemetry/telemetry/int... telemetry/telemetry/internal/util/binary_manager.py:71: def FetchBinaryDeps(platform, client_config): Naming nit: FetchDependenciesIfNeeded (or remove "IfNeeded" from FetchReferenceBrowserBinaryIfNeeded for naming consistency.) It would be nice to have the name reflect that not all of the dependencies are fetched (since we don't download the reference browser) but I agree with your choice for brevity in this case.
On 2016/03/28 22:39:45, aiolos wrote: > > Besides the fact that os.environ can only accept strings as value, the two > > issues left that may make the tests failing are: > > 1) Some of the android test are run on all platforms using mock, and those > tests > > also require fetching android deps. I will probably just prefetch these binary > > manually in run_tests.py > > That seems prone to being hacky, but maybe I'm just not seeing a good way to do > it atm. > > Why are we running Android tests on all platforms when we don't support Android > on any host but Linux? Can't we just disabled those tests on the platforms that > aren't needed? Or if we're using Mock, do we actually need the dependencies that > are being pulled in? Yeah, it's funky. Once I see the test run again, I can see which one we can disable. > > > 2) binary_manager_unittest should be updated since we change the import a bit. > > https://codereview.chromium.org/1838863002/diff/50001/telemetry/telemetry/int... > File telemetry/telemetry/internal/util/binary_manager.py (right): > > https://codereview.chromium.org/1838863002/diff/50001/telemetry/telemetry/int... > telemetry/telemetry/internal/util/binary_manager.py:71: def > FetchBinaryDeps(platform, client_config): > Naming nit: FetchDependenciesIfNeeded (or remove "IfNeeded" from > FetchReferenceBrowserBinaryIfNeeded for naming consistency.) Done. > > It would be nice to have the name reflect that not all of the dependencies are > fetched (since we don't download the reference browser) but I agree with your > choice for brevity in this case.
On 2016/03/28 22:45:01, nednguyen wrote: > On 2016/03/28 22:39:45, aiolos wrote: > > > Besides the fact that os.environ can only accept strings as value, the two > > > issues left that may make the tests failing are: > > > 1) Some of the android test are run on all platforms using mock, and those > > tests > > > also require fetching android deps. I will probably just prefetch these > binary > > > manually in run_tests.py > > > > That seems prone to being hacky, but maybe I'm just not seeing a good way to > do > > it atm. > > > > Why are we running Android tests on all platforms when we don't support > Android > > on any host but Linux? Can't we just disabled those tests on the platforms > that > > aren't needed? Or if we're using Mock, do we actually need the dependencies > that > > are being pulled in? > > Yeah, it's funky. Once I see the test run again, I can see which one we can > disable. > > > > > > 2) binary_manager_unittest should be updated since we change the import a > bit. > > > > > https://codereview.chromium.org/1838863002/diff/50001/telemetry/telemetry/int... > > File telemetry/telemetry/internal/util/binary_manager.py (right): > > > > > https://codereview.chromium.org/1838863002/diff/50001/telemetry/telemetry/int... > > telemetry/telemetry/internal/util/binary_manager.py:71: def > > FetchBinaryDeps(platform, client_config): > > Naming nit: FetchDependenciesIfNeeded (or remove "IfNeeded" from > > FetchReferenceBrowserBinaryIfNeeded for naming consistency.) > > Done. > > > > It would be nice to have the name reflect that not all of the dependencies are > > fetched (since we don't download the reference browser) but I agree with your > > choice for brevity in this case. I make "fetch_reference_chrome_binary" an argument to FetchBinaryDependencies which probably address this issue.
On 2016/03/28 23:08:12, nednguyen wrote: > On 2016/03/28 22:45:01, nednguyen wrote: > > On 2016/03/28 22:39:45, aiolos wrote: > > > > Besides the fact that os.environ can only accept strings as value, the two > > > > issues left that may make the tests failing are: > > > > 1) Some of the android test are run on all platforms using mock, and those > > > tests > > > > also require fetching android deps. I will probably just prefetch these > > binary > > > > manually in run_tests.py > > > > > > That seems prone to being hacky, but maybe I'm just not seeing a good way to > > do > > > it atm. > > > > > > Why are we running Android tests on all platforms when we don't support > > Android > > > on any host but Linux? Can't we just disabled those tests on the platforms > > that > > > aren't needed? Or if we're using Mock, do we actually need the dependencies > > that > > > are being pulled in? > > > > Yeah, it's funky. Once I see the test run again, I can see which one we can > > disable. > > > > > > > > > 2) binary_manager_unittest should be updated since we change the import a > > bit. > > > > > > > > > https://codereview.chromium.org/1838863002/diff/50001/telemetry/telemetry/int... > > > File telemetry/telemetry/internal/util/binary_manager.py (right): > > > > > > > > > https://codereview.chromium.org/1838863002/diff/50001/telemetry/telemetry/int... > > > telemetry/telemetry/internal/util/binary_manager.py:71: def > > > FetchBinaryDeps(platform, client_config): > > > Naming nit: FetchDependenciesIfNeeded (or remove "IfNeeded" from > > > FetchReferenceBrowserBinaryIfNeeded for naming consistency.) > > > > Done. > > > > > > It would be nice to have the name reflect that not all of the dependencies > are > > > fetched (since we don't download the reference browser) but I agree with > your > > > choice for brevity in this case. > > I make "fetch_reference_chrome_binary" an argument to FetchBinaryDependencies > which probably address this issue. Looks like there is something funky going on with the win platform. i.e: it stills try to fetch the binaries even the prefetch should be done already.
> I make "fetch_reference_chrome_binary" an argument to FetchBinaryDependencies > which probably address this issue. Nice. Should that have a default value? > Looks like there is something funky going on with the win platform. i.e: it > stills try to fetch the binaries even the prefetch should be done already. The error is unclear on whether the file failed to download and doesn't exist locally, or if it thinks that the local file has a different hash than expected. All of the errors I've seen are on crash_service, which is a win-specific dependency. So the issue may be because of windows, or that specific dependency. I didn't see anything in the log to suggest that the download failed, but you may want to add some debug logging to figure out for sure.
https://codereview.chromium.org/1838863002/diff/130001/dependency_manager/dep... File dependency_manager/dependency_manager/manager.py (right): https://codereview.chromium.org/1838863002/diff/130001/dependency_manager/dep... dependency_manager/dependency_manager/manager.py:155: if missing_deps: Here is the bug
On 2016/03/29 00:13:10, aiolos wrote: > > I make "fetch_reference_chrome_binary" an argument to FetchBinaryDependencies > > which probably address this issue. > > Nice. Should that have a default value? Not for this case. Removing the default_value force all call sites to explicitly specify the value for fetch_reference_chrome_binary param, making it clear for code reader that this method handles fetching chrome binary specially. > > > Looks like there is something funky going on with the win platform. i.e: it > > stills try to fetch the binaries even the prefetch should be done already. > > The error is unclear on whether the file failed to download and doesn't exist > locally, or if it thinks that the local file has a different hash than expected. > All of the errors I've seen are on crash_service, which is a win-specific > dependency. So the issue may be because of windows, or that specific dependency. > I didn't see anything in the log to suggest that the download failed, but you > may want to add some debug logging to figure out for sure.
Description was changed from ========== Change telemetry run_tests.py to prefetch all binaries before running tests in parallel. This also set DISABLE_CLOUD_STORAGE_IO to '1' in each subprocess of telemetry's run_tests.py to make sure that telemetry don't try to download files in parallel which has created deadlock. BUG=chromium:580919 ========== to ========== Change telemetry run_tests.py to prefetch all binaries before tests in parallel. This also set DISABLE_CLOUD_STORAGE_IO to '1' in each subprocess of telemetry's run_tests.py to make sure that telemetry don't try to download files in parallel which has created deadlock. BUG=chromium:580919 ==========
Description was changed from ========== Change telemetry run_tests.py to prefetch all binaries before tests in parallel. This also set DISABLE_CLOUD_STORAGE_IO to '1' in each subprocess of telemetry's run_tests.py to make sure that telemetry don't try to download files in parallel which has created deadlock. BUG=chromium:580919 ========== to ========== Change telemetry run_tests.py to prefetch all binaries before tests in parallel This also set DISABLE_CLOUD_STORAGE_IO to '1' in each subprocess of telemetry's run_tests.py to make sure that telemetry don't try to download files in parallel which has created deadlock. BUG=chromium:580919 ==========
Description was changed from ========== Change telemetry run_tests.py to prefetch all binaries before tests in parallel This also set DISABLE_CLOUD_STORAGE_IO to '1' in each subprocess of telemetry's run_tests.py to make sure that telemetry don't try to download files in parallel which has created deadlock. BUG=chromium:580919 ========== to ========== Change telemetry run_tests.py to prefetch all binaries before tests in parallel This also set DISABLE_CLOUD_STORAGE_IO to '1' in each subprocess of telemetry's run_tests.py to make sure that telemetry don't try to download files in parallel which has created deadlock/Os permission errors. BUG=chromium:580919 ==========
lgtm
The CQ bit was checked by nednguyen@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1838863002/130001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1838863002/130001
Message was sent while issue was closed.
Description was changed from ========== Change telemetry run_tests.py to prefetch all binaries before tests in parallel This also set DISABLE_CLOUD_STORAGE_IO to '1' in each subprocess of telemetry's run_tests.py to make sure that telemetry don't try to download files in parallel which has created deadlock/Os permission errors. BUG=chromium:580919 ========== to ========== Change telemetry run_tests.py to prefetch all binaries before tests in parallel This also set DISABLE_CLOUD_STORAGE_IO to '1' in each subprocess of telemetry's run_tests.py to make sure that telemetry don't try to download files in parallel which has created deadlock/Os permission errors. BUG=chromium:580919 Committed: https://chromium.googlesource.com/external/github.com/catapult-project/catapu... ==========
Message was sent while issue was closed.
Committed patchset #8 (id:130001) as https://chromium.googlesource.com/external/github.com/catapult-project/catapu... |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
