|
|
Created:
5 years, 11 months ago by aiolos (Not reviewing) Modified:
5 years, 9 months ago CC:
tonyg, chromium-reviews, telemetry+watch_chromium.org, Ken Russell (switch to Gerrit) Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionRefactor serving_dirs with two goals in mind:
1) Move the serving_dir related logic to one place and add some comments to improve hackability and stability for cloud_storage in telemetry.
2) Make serving_dirs available for user stories as part of the refactor to remove page_sets.
This is the second part of https://codereview.chromium.org/794493004/, broken out for reviewer ease.
BUG=454531
BUG=435063
Committed: https://crrev.com/eed596dced7b9e3f7d21c84f695a39b946c62bf2
Cr-Commit-Position: refs/heads/master@{#320523}
Committed: https://crrev.com/cd55fdb2cfcd494abab3e4ccab7934b14f94581b
Cr-Commit-Position: refs/heads/master@{#320990}
Patch Set 1 #
Total comments: 16
Patch Set 2 : Add unittests. #
Total comments: 4
Patch Set 3 : Refactor unittests. #
Total comments: 1
Patch Set 4 : Better refactor. #
Total comments: 2
Patch Set 5 : Make paths win friendly. #
Total comments: 2
Patch Set 6 : Properly check if in the root dir, upadate tests to catch invalid directory. #
Total comments: 3
Patch Set 7 : Rebase #Patch Set 8 : Fixed import. #Messages
Total messages: 57 (12 generated)
aiolos@chromium.org changed reviewers: + dtu@chromium.org, sullivan@chromium.org, tonyg@chromium.org
aiolos@chromium.org changed reviewers: + nednguyen@google.com
tonyg@chromium.org changed reviewers: - tonyg@chromium.org
aiolos@chromium.org changed reviewers: + chrishenry@google.com
Ping. Adding Chris as a reviewer as well.
https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/pa... File tools/telemetry/telemetry/page/__init__.py (right): https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/pa... tools/telemetry/telemetry/page/__init__.py:199: return None Can the logic below be moved to user_story? https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/pa... File tools/telemetry/telemetry/page/page_set_unittest.py (right): https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/pa... tools/telemetry/telemetry/page/page_set_unittest.py:29: os.path.join(real_directory_path, 'd')]) How does this change happen and is it correct? https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... File tools/telemetry/telemetry/user_story/user_story_set.py (right): https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... tools/telemetry/telemetry/user_story/user_story_set.py:121: def UpdateServingDirDataIfNeeded(self): Why do you think it's better to move this functionality to here? https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/ut... File tools/telemetry/telemetry/util/find_dependencies.py (left): https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/ut... tools/telemetry/telemetry/util/find_dependencies.py:96: yield page.serving_dir What is this change for?
https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/pa... File tools/telemetry/telemetry/page/__init__.py (right): https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/pa... tools/telemetry/telemetry/page/__init__.py:199: return None On 2015/02/03 01:37:37, nednguyen wrote: > Can the logic below be moved to user_story? user_story doesn't currently have a file_path parameter. If you want me to add one to user_story, I can move the logic below. https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/pa... File tools/telemetry/telemetry/page/page_set_unittest.py (right): https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/pa... tools/telemetry/telemetry/page/page_set_unittest.py:29: os.path.join(real_directory_path, 'd')]) On 2015/02/03 01:37:37, nednguyen wrote: > How does this change happen and is it correct? The change happens because I changed user_story_set to return a list of it's serving_dirs and it's page's serving dirs. The only places we need the serving dir, we have also needed to iterate over the pages in the pageset to add their serving dirs. I changed the logic to include the serving_dirs of it's user_stories as well, and thus needed to change the test. https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... File tools/telemetry/telemetry/user_story/user_story_set.py (right): https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... tools/telemetry/telemetry/user_story/user_story_set.py:121: def UpdateServingDirDataIfNeeded(self): On 2015/02/03 01:37:37, nednguyen wrote: > Why do you think it's better to move this functionality to here? I definitely think we should move the function from it's current locations to minimize maintenance and the possibility of code rot, but have no attachment to it being here specifically. It might be better to put this in a util file, or possibly add it to cloud_storage.py. Thoughts? https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/ut... File tools/telemetry/telemetry/util/find_dependencies.py (left): https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/ut... tools/telemetry/telemetry/util/find_dependencies.py:96: yield page.serving_dir On 2015/02/03 01:37:37, nednguyen wrote: > What is this change for? The only time we ever use the serving_dir of a page_set, we also need to add the serving_dir for each of it's pages. So I moved the code for that into the getter for the page_set serving_dirs. This is just removing the now repetitive addition of the page's serving_dirs.
https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/pa... File tools/telemetry/telemetry/page/__init__.py (right): https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/pa... tools/telemetry/telemetry/page/__init__.py:199: return None On 2015/02/03 02:35:36, aiolos wrote: > On 2015/02/03 01:37:37, nednguyen wrote: > > Can the logic below be moved to user_story? > > user_story doesn't currently have a file_path parameter. If you want me to add > one to user_story, I can move the logic below. file_path is a field that only works in page's case, since there is no notion of 'url' for a generic user story. I think it's fine to keep this here. https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... File tools/telemetry/telemetry/user_story/__init__.py (right): https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... tools/telemetry/telemetry/user_story/__init__.py:105: updated.""" Nits: the close """ should be on the next line. https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... File tools/telemetry/telemetry/user_story/user_story_set.py (right): https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... tools/telemetry/telemetry/user_story/user_story_set.py:55: def serving_dirs(self): Can you add unittest coverage for user_story_set.serving_dirs? Maybe use the unittest in page_set for this one. https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... tools/telemetry/telemetry/user_story/user_story_set.py:121: def UpdateServingDirDataIfNeeded(self): On 2015/02/03 02:35:36, aiolos wrote: > On 2015/02/03 01:37:37, nednguyen wrote: > > Why do you think it's better to move this functionality to here? > > I definitely think we should move the function from it's current locations to > minimize maintenance and the possibility of code rot, I am not sure I understand this argument, but it sounds ok to reduce the code complexity in user_story_runner. > but have no attachment to > it being here specifically. It might be better to put this in a util file, or > possibly add it to cloud_storage.py. Thoughts? cloud_storage seems to be a better location for this. We can make a method called UpdateServingDirData(serving_dir, bucket). Please add unittest of it too. cloud_storage_unittest should already provide some convenient ways for testing this without invoking real network.
https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... File tools/telemetry/telemetry/user_story/user_story_set.py (right): https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... tools/telemetry/telemetry/user_story/user_story_set.py:121: def UpdateServingDirDataIfNeeded(self): On 2015/02/03 18:03:55, nednguyen wrote: > On 2015/02/03 02:35:36, aiolos wrote: > > On 2015/02/03 01:37:37, nednguyen wrote: > > > Why do you think it's better to move this functionality to here? > > > > I definitely think we should move the function from it's current locations to > > minimize maintenance and the possibility of code rot, > > I am not sure I understand this argument, but it sounds ok to reduce the code > complexity in user_story_runner. > > > but have no attachment to > > it being here specifically. It might be better to put this in a util file, or > > possibly add it to cloud_storage.py. Thoughts? > > cloud_storage seems to be a better location for this. We can make a method > called UpdateServingDirData(serving_dir, bucket). > > Please add unittest of it too. cloud_storage_unittest should already provide > some convenient ways for testing this without invoking real network. cloud_storage actually doesn't seem to have unittests for any of it's important functionality, and thus doesn't have anything setup to test it without touching the network yet. There is cloud_storage_stub, which is a non-network touching stub for other unittests to use. I was planning on adding a system stub version of the function. Should I look into a way to test the cloud_storage functions without touching network?
On 2015/02/04 18:11:59, aiolos wrote: > https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... > File tools/telemetry/telemetry/user_story/user_story_set.py (right): > > https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... > tools/telemetry/telemetry/user_story/user_story_set.py:121: def > UpdateServingDirDataIfNeeded(self): > On 2015/02/03 18:03:55, nednguyen wrote: > > On 2015/02/03 02:35:36, aiolos wrote: > > > On 2015/02/03 01:37:37, nednguyen wrote: > > > > Why do you think it's better to move this functionality to here? > > > > > > I definitely think we should move the function from it's current locations > to > > > minimize maintenance and the possibility of code rot, > > > > I am not sure I understand this argument, but it sounds ok to reduce the code > > complexity in user_story_runner. > > > > > but have no attachment to > > > it being here specifically. It might be better to put this in a util file, > or > > > possibly add it to cloud_storage.py. Thoughts? > > > > cloud_storage seems to be a better location for this. We can make a method > > called UpdateServingDirData(serving_dir, bucket). > > > > Please add unittest of it too. cloud_storage_unittest should already provide > > some convenient ways for testing this without invoking real network. > > cloud_storage actually doesn't seem to have unittests for any of it's important > functionality, and thus doesn't have anything setup to test it without touching > the network yet. There is cloud_storage_stub, which is a non-network touching > stub for other unittests to use. I was planning on adding a system stub version > of the function. > > Should I look into a way to test the cloud_storage functions without touching > network? Yes please. The general view of telemetry unittests is more of them should work without requiring network connections.
ptal https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... File tools/telemetry/telemetry/user_story/__init__.py (right): https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... tools/telemetry/telemetry/user_story/__init__.py:105: updated.""" On 2015/02/03 18:03:55, nednguyen wrote: > Nits: the close """ should be on the next line. Done. https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... File tools/telemetry/telemetry/user_story/user_story_set.py (right): https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... tools/telemetry/telemetry/user_story/user_story_set.py:55: def serving_dirs(self): On 2015/02/03 18:03:55, nednguyen wrote: > Can you add unittest coverage for user_story_set.serving_dirs? Maybe use the > unittest in page_set for this one. There already is in page_set_unittest. (I updated it in this CL already.) Is there something in addition to the current test that you specifically want? https://codereview.chromium.org/838253005/diff/1/tools/telemetry/telemetry/us... tools/telemetry/telemetry/user_story/user_story_set.py:121: def UpdateServingDirDataIfNeeded(self): On 2015/02/04 18:11:59, aiolos wrote: > On 2015/02/03 18:03:55, nednguyen wrote: > > On 2015/02/03 02:35:36, aiolos wrote: > > > On 2015/02/03 01:37:37, nednguyen wrote: > > > > Why do you think it's better to move this functionality to here? > > > > > > I definitely think we should move the function from it's current locations > to > > > minimize maintenance and the possibility of code rot, > > > > I am not sure I understand this argument, but it sounds ok to reduce the code > > complexity in user_story_runner. > > > > > but have no attachment to > > > it being here specifically. It might be better to put this in a util file, > or > > > possibly add it to cloud_storage.py. Thoughts? > > > > cloud_storage seems to be a better location for this. We can make a method > > called UpdateServingDirData(serving_dir, bucket). > > > > Please add unittest of it too. cloud_storage_unittest should already provide > > some convenient ways for testing this without invoking real network. > > cloud_storage actually doesn't seem to have unittests for any of it's important > functionality, and thus doesn't have anything setup to test it without touching > the network yet. There is cloud_storage_stub, which is a non-network touching > stub for other unittests to use. I was planning on adding a system stub version > of the function. > > Should I look into a way to test the cloud_storage functions without touching > network? Done.
https://codereview.chromium.org/838253005/diff/20001/tools/telemetry/telemetr... File tools/telemetry/telemetry/util/cloud_storage.py (right): https://codereview.chromium.org/838253005/diff/20001/tools/telemetry/telemetr... tools/telemetry/telemetry/util/cloud_storage.py:243: @decorators.Cache Since this function is moved, it's good to keep the functionality the same as before. But we should do a follow-up to remove usage of @decorators.Cache (see http://crbug.com/459787) https://codereview.chromium.org/838253005/diff/20001/tools/telemetry/telemetr... File tools/telemetry/telemetry/util/cloud_storage_unittest.py (right): https://codereview.chromium.org/838253005/diff/20001/tools/telemetry/telemetr... tools/telemetry/telemetry/util/cloud_storage_unittest.py:45: cloud_storage._RunCommand, []) These lines are repeated many times throughout these tests with just small changes to communicate_result and assertRaises. Can you factor this code into a common helper? I think if you made the naming of the helper clear, it would make it a lot easier to see what the test is doing: assertErrorRaisedWithFailure(cloud_storage.CredentialsError, 'You are attempting to access protected data with no configured') assertErrorRaisedWithFailure(cloud_storage.CredentialsError, 'Failure: No handler was ready to authenticate.') assertErrorRaisedWithFailure(cloud_storage.PermissionError, 'status=403') My naming's not quite right, but hopefully that will make the test clearer and shorter.
ptal. https://codereview.chromium.org/838253005/diff/20001/tools/telemetry/telemetr... File tools/telemetry/telemetry/util/cloud_storage.py (right): https://codereview.chromium.org/838253005/diff/20001/tools/telemetry/telemetr... tools/telemetry/telemetry/util/cloud_storage.py:243: @decorators.Cache On 2015/02/20 17:47:37, sullivan wrote: > Since this function is moved, it's good to keep the functionality the same as > before. > > But we should do a follow-up to remove usage of @decorators.Cache (see > http://crbug.com/459787) Acknowledged. I added a TODO for it. https://codereview.chromium.org/838253005/diff/20001/tools/telemetry/telemetr... File tools/telemetry/telemetry/util/cloud_storage_unittest.py (right): https://codereview.chromium.org/838253005/diff/20001/tools/telemetry/telemetr... tools/telemetry/telemetry/util/cloud_storage_unittest.py:45: cloud_storage._RunCommand, []) On 2015/02/20 17:47:37, sullivan wrote: > These lines are repeated many times throughout these tests with just small > changes to communicate_result and assertRaises. Can you factor this code into a > common helper? I think if you made the naming of the helper clear, it would make > it a lot easier to see what the test is doing: > > assertErrorRaisedWithFailure(cloud_storage.CredentialsError, 'You are attempting > to access protected data with no configured') > assertErrorRaisedWithFailure(cloud_storage.CredentialsError, 'Failure: No > handler was ready to authenticate.') > assertErrorRaisedWithFailure(cloud_storage.PermissionError, 'status=403') > > My naming's not quite right, but hopefully that will make the test clearer and > shorter. Done.
https://codereview.chromium.org/838253005/diff/40001/tools/telemetry/telemetr... File tools/telemetry/telemetry/util/cloud_storage_unittest.py (right): https://codereview.chromium.org/838253005/diff/40001/tools/telemetry/telemetr... tools/telemetry/telemetry/util/cloud_storage_unittest.py:43: stubs.open.files = {'fake gsutil path':''} So lines 40-44 and 51-53 are still copy/pasted a whole bunch. Is there any reason the statements in the try blocks need to be successive? If not, I'd split them up into separate tests and put all the repeated code into the helper. If so, I'd still put all the repeated code in the helper and send in a list of errors/failures.
On 2015/02/20 19:13:38, sullivan wrote: > https://codereview.chromium.org/838253005/diff/40001/tools/telemetry/telemetr... > File tools/telemetry/telemetry/util/cloud_storage_unittest.py (right): > > https://codereview.chromium.org/838253005/diff/40001/tools/telemetry/telemetr... > tools/telemetry/telemetry/util/cloud_storage_unittest.py:43: stubs.open.files = > {'fake gsutil path':''} > So lines 40-44 and 51-53 are still copy/pasted a whole bunch. Is there any > reason the statements in the try blocks need to be successive? If not, I'd split > them up into separate tests and put all the repeated code into the helper. If > so, I'd still put all the repeated code in the helper and send in a list of > errors/failures. ptal
lgtm
lgtm
https://codereview.chromium.org/838253005/diff/60001/tools/telemetry/telemetr... File tools/telemetry/telemetry/util/cloud_storage.py (right): https://codereview.chromium.org/838253005/diff/60001/tools/telemetry/telemetr... tools/telemetry/telemetry/util/cloud_storage.py:245: def GetFilesInDirectoryIfChanged(directory, bucket): nit: Instead of "Get", probably it should be "Download" to make it clearer what this does? I think the renaming should apply places too. Feel free to address this in other patch.
https://codereview.chromium.org/838253005/diff/60001/tools/telemetry/telemetr... File tools/telemetry/telemetry/util/cloud_storage.py (right): https://codereview.chromium.org/838253005/diff/60001/tools/telemetry/telemetr... tools/telemetry/telemetry/util/cloud_storage.py:245: def GetFilesInDirectoryIfChanged(directory, bucket): On 2015/02/20 20:12:21, nednguyen wrote: > nit: Instead of "Get", probably it should be "Download" to make it clearer what > this does? I think the renaming should apply places too. Feel free to address > this in other patch. I used Get because that's consistent with the other cloud_storage functions (Get and GetIfChanged.) It would only make sense to me to change this if we wanted to change those function names as well, since it will be more confusing to have Get and Download used in different places to mean the same thing. I'm not sure it's worth the refactor.
On 2015/02/20 20:17:18, aiolos wrote: > https://codereview.chromium.org/838253005/diff/60001/tools/telemetry/telemetr... > File tools/telemetry/telemetry/util/cloud_storage.py (right): > > https://codereview.chromium.org/838253005/diff/60001/tools/telemetry/telemetr... > tools/telemetry/telemetry/util/cloud_storage.py:245: def > GetFilesInDirectoryIfChanged(directory, bucket): > On 2015/02/20 20:12:21, nednguyen wrote: > > nit: Instead of "Get", probably it should be "Download" to make it clearer > what > > this does? I think the renaming should apply places too. Feel free to address > > this in other patch. > > I used Get because that's consistent with the other cloud_storage functions (Get > and GetIfChanged.) It would only make sense to me to change this if we wanted to > change those function names as well, since it will be more confusing to have Get > and Download used in different places to mean the same thing. I'm not sure it's > worth the refactor. Totally. If we do it, it should be Download- everywhere.
The CQ bit was checked by aiolos@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/838253005/60001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: win_chromium_rel_ng on tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
I changed the file paths to work across all os. Anyone want to take another look before I submit?
https://codereview.chromium.org/838253005/diff/80001/tools/telemetry/telemetr... File tools/telemetry/telemetry/util/cloud_storage.py (right): https://codereview.chromium.org/838253005/diff/80001/tools/telemetry/telemetr... tools/telemetry/telemetry/util/cloud_storage.py:253: if os.path.dirname(directory) == directory: # If in the root directory. This will always return True if directory is a folder directory. e.g: os.path.dirname('/tmp/foo/') == '/tmp/foo'
https://codereview.chromium.org/838253005/diff/80001/tools/telemetry/telemetr... File tools/telemetry/telemetry/util/cloud_storage.py (right): https://codereview.chromium.org/838253005/diff/80001/tools/telemetry/telemetr... tools/telemetry/telemetry/util/cloud_storage.py:253: if os.path.dirname(directory) == directory: # If in the root directory. On 2015/03/03 22:20:13, nednguyen wrote: > This will always return True if directory is a folder directory. e.g: > os.path.dirname('/tmp/foo/') == '/tmp/foo' Good catch. Done.
lgtm again
dpranke@chromium.org changed reviewers: + dpranke@chromium.org
lgtm https://codereview.chromium.org/838253005/diff/90001/tools/telemetry/telemetr... File tools/telemetry/telemetry/util/cloud_storage.py (right): https://codereview.chromium.org/838253005/diff/90001/tools/telemetry/telemetr... tools/telemetry/telemetry/util/cloud_storage.py:264: GetIfChanged(path_name, bucket) Is it possible that you might call into this from multiple unit tests running at the same time? If that happened, windows might have multiple processes try to write to the same file, and bad things would happen.
On 2015/03/06 04:03:58, Dirk Pranke wrote: > lgtm er, I didn't mean to actually lgtm this :).
https://codereview.chromium.org/838253005/diff/90001/tools/telemetry/telemetr... File tools/telemetry/telemetry/util/cloud_storage.py (right): https://codereview.chromium.org/838253005/diff/90001/tools/telemetry/telemetr... tools/telemetry/telemetry/util/cloud_storage.py:264: GetIfChanged(path_name, bucket) On 2015/03/06 04:03:58, Dirk Pranke wrote: > Is it possible that you might call into this from multiple unit tests running at > the same time? If that happened, windows might have multiple processes try to > write to the same file, and bad things would happen. Does typ provide a way to synchronize the function calls?
On 2015/03/06 04:54:36, nednguyen wrote: > Does typ provide a way to synchronize the function calls? Not generally, no. Tests are run in multiple independent processes, so this may not be straightforward to do. In addition, we should be writing tests so that they can be run concurrently, so I wouldn't want to encourage people to synchronize things. That said, you can decorate the tests themselves to indicate that they need to be run in isolation, in which case they will be run at the end of the test run, after the parallel phase is complete. But, annotating tests isn't the same as annotating functions the tests call, so I'm not sure if any of those things really helps here. If the problem is a concurrency one, we'll need to look more closely at what this patch is trying to do and figure out an alternative.
On 2015/03/06 18:51:34, Dirk Pranke wrote: > On 2015/03/06 04:54:36, nednguyen wrote: > > Does typ provide a way to synchronize the function calls? > > Not generally, no. Tests are run in multiple independent processes, > so this may not be straightforward to do. In addition, we should > be writing tests so that they can be run concurrently, so I wouldn't > want to encourage people to synchronize things. > > That said, you can decorate the tests themselves to indicate that > they need to be run in isolation, in which case they will be run at > the end of the test run, after the parallel phase is complete. > > But, annotating tests isn't the same as annotating functions the > tests call, so I'm not sure if any of those things really helps here. > > If the problem is a concurrency one, we'll need to look more closely > at what this patch is trying to do and figure out an alternative. I agree that this is the best approach. However, to alleviate the pain of debugging concurrency issue, is there a way for us to check which methods got called in parallel? I think the reality is there is a significant portion of telemetry are global objects/methods. Every once in a while, someone will add a test on those global entities without knowing the context of all other tests that also use them. The solution maybe mock out the global object, maybe specifying the test to be in isolation, etc. but figuring out the culprit is the key in fixing this. Also looking at http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_..., it's not very clear to me which tests are failling here.
On 2015/03/06 18:51:34, Dirk Pranke wrote: > On 2015/03/06 04:54:36, nednguyen wrote: > > Does typ provide a way to synchronize the function calls? > > Not generally, no. Tests are run in multiple independent processes, > so this may not be straightforward to do. In addition, we should > be writing tests so that they can be run concurrently, so I wouldn't > want to encourage people to synchronize things. > > That said, you can decorate the tests themselves to indicate that > they need to be run in isolation, in which case they will be run at > the end of the test run, after the parallel phase is complete. > > But, annotating tests isn't the same as annotating functions the > tests call, so I'm not sure if any of those things really helps here. > > If the problem is a concurrency one, we'll need to look more closely > at what this patch is trying to do and figure out an alternative. This patch is consolidating logic that is already happening. If we're doing it right (which isn't quite the case) we should never hit this code (or GetIfChanged or any other real cloud_storage functions) in a unittest. That should only happen when actually running a benchmark. I recently replaced the cloud_storage usage in archive_info_unittest since I was able to find it on my Mac due to a failing test. But I con't actually tell which test is failing on Win. It may be worthwhile to audit all of the unittests to make sure they are using the stub instead of the real cloud_storage. That being said, having an error that can't be tied to a specific test is still an issue. The part that I don't understand is why would we not hit this on any non-Windows platforms, but consistently on Win? Are the tests run concurrently on Windows and not the other platforms? Or are they scheduled in different orders? I don't know enough about Windows to know other causes.
On 2015/03/06 19:02:41, nednguyen wrote: > On 2015/03/06 18:51:34, Dirk Pranke wrote: > > On 2015/03/06 04:54:36, nednguyen wrote: > > > Does typ provide a way to synchronize the function calls? > > > > Not generally, no. Tests are run in multiple independent processes, > > so this may not be straightforward to do. In addition, we should > > be writing tests so that they can be run concurrently, so I wouldn't > > want to encourage people to synchronize things. > > > > That said, you can decorate the tests themselves to indicate that > > they need to be run in isolation, in which case they will be run at > > the end of the test run, after the parallel phase is complete. > > > > But, annotating tests isn't the same as annotating functions the > > tests call, so I'm not sure if any of those things really helps here. > > > > If the problem is a concurrency one, we'll need to look more closely > > at what this patch is trying to do and figure out an alternative. > > I agree that this is the best approach. However, to alleviate the pain of > debugging concurrency issue, is there a way for us to check which methods got > called in parallel? I think every test in telemetry is run in parallel today; I don't see any users of the @decorators.Isolated call. I know when we were first switching telemetry to use typ and run tests in parallel I made several fixes to make sure things work in parallel, and tonyg was of the opinion that any test that couldn't be run in parallel with other tests should just be disabled, but I don't think we needed it in practice. > I think the reality is there is a significant portion of > telemetry are global objects/methods. Every once in a while, someone will add a > test on those global entities without knowing the context of all other tests > that also use them. The solution maybe mock out the global object, maybe > specifying the test to be in isolation, etc. but figuring out the culprit is the > key in fixing this. Yes of course. However, we avoid many of the potential issues by running tests in different processes, so in a given python process only one test is running at a time. So, the only time you tend to hit things in practice is when concurrently modifying the file system (or things like that). > > Also looking at > http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_..., > it's not very clear to me which tests are failling here. Yup, that's the bug I need to keep hunting down. typ should never fail in this way.
On 2015/03/06 19:10:52, aiolos wrote: > On 2015/03/06 18:51:34, Dirk Pranke wrote: > > On 2015/03/06 04:54:36, nednguyen wrote: > > > Does typ provide a way to synchronize the function calls? > > > > Not generally, no. Tests are run in multiple independent processes, > > so this may not be straightforward to do. In addition, we should > > be writing tests so that they can be run concurrently, so I wouldn't > > want to encourage people to synchronize things. > > > > That said, you can decorate the tests themselves to indicate that > > they need to be run in isolation, in which case they will be run at > > the end of the test run, after the parallel phase is complete. > > > > But, annotating tests isn't the same as annotating functions the > > tests call, so I'm not sure if any of those things really helps here. > > > > If the problem is a concurrency one, we'll need to look more closely > > at what this patch is trying to do and figure out an alternative. > > This patch is consolidating logic that is already happening. If we're doing it > right (which isn't quite the case) we should never hit this code (or > GetIfChanged or any other real cloud_storage functions) in a unittest. That > should only happen when actually running a benchmark. Okay. Like I said in my original note, I wasn't sure that there was something wrong with what you were doing, it just looked suspicious at a glance. > I recently replaced the > cloud_storage usage in archive_info_unittest since I was able to find it on my > Mac due to a failing test. But I con't actually tell which test is failing on > Win. It may be worthwhile to audit all of the unittests to make sure they are > using the stub instead of the real cloud_storage. That being said, having an > error that can't be tied to a specific test is still an issue. Yes, definitely! > The part that I don't understand is why would we not hit this on any non-Windows > platforms, but consistently on Win? Are the tests run concurrently on Windows > and not the other platforms? Or are they scheduled in different orders? I don't > know enough about Windows to know other causes. The tests are run concurrently on every platform, but the timing and scheduling can certainly be different. Windows is much slower to launch subprocesses, the filesystem is much slower, and it is much more sensitive to conflicts over open files.
In case it wasn't obvious before, perhaps it bears repeating: 1) I'm not sure there's something wrong w/ this patch, it was a guess 2) There's definitely a problem in typ that I need to fix; you should never see the error you're seeing. I still don't know the root cause and am investigating ...
On 2015/03/06 20:17:17, Dirk Pranke wrote: > On 2015/03/06 19:02:41, nednguyen wrote: > > On 2015/03/06 18:51:34, Dirk Pranke wrote: > > > On 2015/03/06 04:54:36, nednguyen wrote: > > > > Does typ provide a way to synchronize the function calls? > > > > > > Not generally, no. Tests are run in multiple independent processes, > > > so this may not be straightforward to do. In addition, we should > > > be writing tests so that they can be run concurrently, so I wouldn't > > > want to encourage people to synchronize things. > > > > > > That said, you can decorate the tests themselves to indicate that > > > they need to be run in isolation, in which case they will be run at > > > the end of the test run, after the parallel phase is complete. > > > > > > But, annotating tests isn't the same as annotating functions the > > > tests call, so I'm not sure if any of those things really helps here. > > > > > > If the problem is a concurrency one, we'll need to look more closely > > > at what this patch is trying to do and figure out an alternative. > > > > I agree that this is the best approach. However, to alleviate the pain of > > debugging concurrency issue, is there a way for us to check which methods got > > called in parallel? > > I think every test in telemetry is run in parallel today; I don't see any users > of > the @decorators.Isolated call. Hi, by checking which methods got called in parallel, I meant s.t like: when text X fails, then typ tells us that it X fails when test A, B are running. This will help us nail down the cause of bug much easier. > I know when we were first switching telemetry > to use typ and run tests in parallel I made several fixes to make sure things > work > in parallel, and tonyg was of the opinion that any test that couldn't be run > in parallel with other tests should just be disabled, but I don't think we > needed > it in practice. > > > I think the reality is there is a significant portion of > > telemetry are global objects/methods. Every once in a while, someone will add > a > > test on those global entities without knowing the context of all other tests > > that also use them. The solution maybe mock out the global object, maybe > > specifying the test to be in isolation, etc. but figuring out the culprit is > the > > key in fixing this. > > Yes of course. However, we avoid many of the potential issues by running tests > in different processes, so in a given python process only one test is running at > a > time. So, the only time you tend to hit things in practice is when concurrently > modifying the file system (or things like that). > > > > > Also looking at > > > http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_..., > > it's not very clear to me which tests are failling here. > > Yup, that's the bug I need to keep hunting down. typ should never fail in this > way.
On 2015/03/06 20:57:20, nednguyen wrote: > Hi, by checking which methods got called in parallel, I meant s.t like: > when text X fails, then typ tells us that it X fails when test A, B are running. > This will help us nail down the cause of bug much easier. Ah. There isn't a way of getting that directly, but the logs should tell you roughly which tests were running.
https://codereview.chromium.org/838253005/diff/90001/tools/telemetry/telemetr... File tools/telemetry/telemetry/util/cloud_storage_unittest.py (right): https://codereview.chromium.org/838253005/diff/90001/tools/telemetry/telemetr... tools/telemetry/telemetry/util/cloud_storage_unittest.py:36: stubs = system_stub.Override(cloud_storage, ['open', 'subprocess']) Okay, it turns out that the underlying problem is that you can't override 'open' this way. 'open' is a built-in function, nor an imported module. So, when you attempt to override it in system_stub, the first time you try to save the old value, you call getattr(cloud_storage, 'open', None), and it returns None (because open is not an attribute on cloud_storage). system_stub.Override() saves the None away as the "original value", and when you call stubs.Restore(), it *creates* an attribute in open that shadows out the built-in name. Then, down the road, when you end up calling open from inside cloud_storage -- which happens when you call @unittest.skipUnless in find_dependencies_unittest -- you end up trying to dereference None. This is the bug in your CL (or, at least, the bug that your CL is exposing). Because the @unittest.skipUnless is evaluated when we construct the test suite for the test, it isn't caught by the normal try/catch block we wrap each unit test in; this is bug #1 in typ. We're probably tripping over this now just because the ordering of tests between processes has shifted and we happen to run the find_dependencies test after one of these tests. bug #2 in typ is that we're not propagating the stack trace back, making it hard to see what the heck happened (though it would've been nigh-impossible to figure out what happened from a log regardless). I will fix the bugs in typ, but you will want to restructure your tests and the code in cloud_storage as well.
The CQ bit was checked by aiolos@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from sullivan@chromium.org, nednguyen@google.com, dpranke@chromium.org Link to the patchset: https://codereview.chromium.org/838253005/#ps110001 (title: "Rebase")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/838253005/110001
Message was sent while issue was closed.
Committed patchset #7 (id:110001)
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/eed596dced7b9e3f7d21c84f695a39b946c62bf2 Cr-Commit-Position: refs/heads/master@{#320523}
Message was sent while issue was closed.
A revert of this CL (patchset #7 id:110001) has been created in https://codereview.chromium.org/1005903004/ by vangelis@google.com. The reason for reverting is: Causing failures on GPU bots http://build.chromium.org/p/chromium.gpu/builders/Win7%20Release%20%28NVIDIA%... .
Message was sent while issue was closed.
kbr@chromium.org changed reviewers: + kbr@chromium.org
Message was sent while issue was closed.
crbug.com/467149 filed about why this CL passed the commit queue but broke the waterfall.
On 2015/03/13 20:15:12, Ken Russell wrote: > crbug.com/467149 filed about why this CL passed the commit queue but broke the > waterfall. I fixed the import. Still not sure how that got deleted without causing tests/lint to fail, but the bug on it has been closed. Trying again, hoping the issue is fixed. Could someone else PTAL to see if there is anything I missed?
On 2015/03/14 00:35:03, aiolos wrote: > On 2015/03/13 20:15:12, Ken Russell wrote: > > crbug.com/467149 filed about why this CL passed the commit queue but broke the > > waterfall. > > I fixed the import. Still not sure how that got deleted without causing > tests/lint to fail, but the bug on it has been closed. Trying again, hoping the > issue is fixed. > > Could someone else PTAL to see if there is anything I missed? lgtm My guess is that it's merge problem. I would just check the cq again. :-/
The CQ bit was checked by aiolos@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from dpranke@chromium.org, sullivan@chromium.org Link to the patchset: https://codereview.chromium.org/838253005/#ps130001 (title: "Fixed import.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/838253005/130001
Message was sent while issue was closed.
Committed patchset #8 (id:130001)
Message was sent while issue was closed.
Patchset 8 (id:??) landed as https://crrev.com/cd55fdb2cfcd494abab3e4ccab7934b14f94581b Cr-Commit-Position: refs/heads/master@{#320990} |