|
|
Chromium Code Reviews|
Created:
4 years, 7 months ago by Ken Russell (switch to Gerrit) Modified:
4 years, 7 months ago CC:
chromium-reviews, darin-cc_chromium.org, jam, piman+watch_chromium.org, bajones Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionHandle Telemetry's TimeoutException in retries of flaky tests.
Exceptions raised in DidRunStory weren't being caught.
BUG=603329
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel
Patch Set 1 #Patch Set 2 : Added unit test. #Patch Set 3 : Redo patch to handle exceptions in DidRunStory. #
Total comments: 1
Messages
Total messages: 14 (5 generated)
Description was changed from ========== Handle Telemetry's TimeoutException in retries of flaky tests. This exception class derives from Error so they weren't being caught. BUG=603329 ========== to ========== Handle Telemetry's TimeoutException in retries of flaky tests. This exception class derives from Error so they weren't being caught. BUG=603329 CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel ==========
kbr@chromium.org changed reviewers: + zmo@chromium.org
Mo: please review. Thanks.
On 2016/04/28 16:45:37, Ken Russell OOO till 5-2-2016 wrote: > Mo: please review. Thanks. Hmh, telemetry.core.exceptions.Error is also a subclass of Exception. So I am not sure why the try cach fails to capture it.
On 2016/04/28 16:49:36, nednguyen (ooo til 5-4) wrote: > On 2016/04/28 16:45:37, Ken Russell OOO till 5-2-2016 wrote: > > Mo: please review. Thanks. > > Hmh, telemetry.core.exceptions.Error is also a subclass of Exception. So I am > not sure why the try cach fails to capture it. Reference: https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/...
On 2016/04/28 16:50:37, nednguyen (ooo til 5-4) wrote: > On 2016/04/28 16:49:36, nednguyen (ooo til 5-4) wrote: > > On 2016/04/28 16:45:37, Ken Russell OOO till 5-2-2016 wrote: > > > Mo: please review. Thanks. > > > > Hmh, telemetry.core.exceptions.Error is also a subclass of Exception. So I am > > not sure why the try cach fails to capture it. > > Reference: > https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/... Hmmmm. You're right. The new unittest passes if the code change to gpu_test_base.py is undone. Looking more closely, the exception that is breaking things is: Exception raised when cleaning story run: Traceback (most recent call last): _RunStoryAndProcessErrorIfNeeded at c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\internal\story_runner.py:104 state.DidRunStory(results) DidRunStory at c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\page\shared_page_state.py:163 self._current_tab.Close() Close at c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\internal\browser\tab.py:100 self._tab_list_backend.CloseTab(self.id) CloseTab at c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\internal\backends\chrome\tab_list_backend.py:66 util.WaitFor(lambda: tab_id not in self.IterContextIds(), timeout=5) WaitFor at c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\core\util.py:94 (timeout, GetConditionString())) TimeoutException: Timed out while waiting 5s for util.WaitFor(lambda: tab_id not in self.IterContextIds(), timeout=5). If I override DidRunStory in the various SharedPageState subclasses in src/content/test/gpu/gpu_tests/gpu_test_base.py to squelch certain exceptions, Ned, do you think that'd be safe?
On 2016/04/28 18:08:00, Ken Russell OOO till 5-2-2016 wrote: > On 2016/04/28 16:50:37, nednguyen (ooo til 5-4) wrote: > > On 2016/04/28 16:49:36, nednguyen (ooo til 5-4) wrote: > > > On 2016/04/28 16:45:37, Ken Russell OOO till 5-2-2016 wrote: > > > > Mo: please review. Thanks. > > > > > > Hmh, telemetry.core.exceptions.Error is also a subclass of Exception. So I > am > > > not sure why the try cach fails to capture it. > > > > Reference: > > > https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/... > > Hmmmm. You're right. The new unittest passes if the code change to > gpu_test_base.py is undone. > > Looking more closely, the exception that is breaking things is: > > Exception raised when cleaning story run: > > Traceback (most recent call last): > _RunStoryAndProcessErrorIfNeeded at > c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\internal\story_runner.py:104 > state.DidRunStory(results) > DidRunStory at > c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\page\shared_page_state.py:163 > self._current_tab.Close() > Close at > c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\internal\browser\tab.py:100 > self._tab_list_backend.CloseTab(self.id) > CloseTab at > c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\internal\backends\chrome\tab_list_backend.py:66 > util.WaitFor(lambda: tab_id not in self.IterContextIds(), timeout=5) > WaitFor at > c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\core\util.py:94 > (timeout, GetConditionString())) > TimeoutException: Timed out while waiting 5s for util.WaitFor(lambda: tab_id not > in self.IterContextIds(), timeout=5). > > If I override DidRunStory in the various SharedPageState subclasses in > src/content/test/gpu/gpu_tests/gpu_test_base.py to squelch certain exceptions, > Ned, do you think that'd be safe? It seems messy but ok as we plan to migrate things to unittest framework.
On 2016/04/28 18:21:02, nednguyen(REVIEW IN OTHER ACC) wrote: > On 2016/04/28 18:08:00, Ken Russell OOO till 5-2-2016 wrote: > > On 2016/04/28 16:50:37, nednguyen (ooo til 5-4) wrote: > > > On 2016/04/28 16:49:36, nednguyen (ooo til 5-4) wrote: > > > > On 2016/04/28 16:45:37, Ken Russell OOO till 5-2-2016 wrote: > > > > > Mo: please review. Thanks. > > > > > > > > Hmh, telemetry.core.exceptions.Error is also a subclass of Exception. So I > > am > > > > not sure why the try cach fails to capture it. > > > > > > Reference: > > > > > > https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/... > > > > Hmmmm. You're right. The new unittest passes if the code change to > > gpu_test_base.py is undone. > > > > Looking more closely, the exception that is breaking things is: > > > > Exception raised when cleaning story run: > > > > Traceback (most recent call last): > > _RunStoryAndProcessErrorIfNeeded at > > > c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\internal\story_runner.py:104 > > state.DidRunStory(results) > > DidRunStory at > > > c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\page\shared_page_state.py:163 > > self._current_tab.Close() > > Close at > > > c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\internal\browser\tab.py:100 > > self._tab_list_backend.CloseTab(self.id) > > CloseTab at > > > c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\internal\backends\chrome\tab_list_backend.py:66 > > util.WaitFor(lambda: tab_id not in self.IterContextIds(), timeout=5) > > WaitFor at > > > c:\users\chrome~1\appdata\local\temp\runlhwond\third_party\catapult\telemetry\telemetry\core\util.py:94 > > (timeout, GetConditionString())) > > TimeoutException: Timed out while waiting 5s for util.WaitFor(lambda: tab_id > not > > in self.IterContextIds(), timeout=5). > > > > If I override DidRunStory in the various SharedPageState subclasses in > > src/content/test/gpu/gpu_tests/gpu_test_base.py to squelch certain exceptions, > > Ned, do you think that'd be safe? > > It seems messy but ok as we plan to migrate things to unittest framework. Here's a new patch set implementing that approach. It seems to work though I have a feeling it may break when running (or retrying) flaky content. It throws an unexpected exception: [4/41] gpu_tests.gpu_test_base_unittest.PageExecutionTest.testFlakyFailureInDidRunStory passed 0.0110s: FLAKY TEST FAILURE, retrying: page1 Expected exception in DidRunStory for page1 Traceback (most recent call last): _DidRunStory at content/test/gpu/gpu_tests/gpu_test_base.py:116 raise exceptions.TimeoutException('Deliberate timeout') TimeoutException: Deliberate timeout Locals: cls : <class 'gpu_tests.gpu_test_base.FakeGpuSharedPageState'> expectation : 'flaky' expectations : <gpu_tests.gpu_test_expectations.GpuTestExpectations object at 0x10c180e50> force_timeout_exception : True msg : 'Expected exception in DidRunStory for page1' page : <gpu_tests.gpu_test_base_unittest.PageWhichFailsNTimes object at 0x10c180e10> results : <telemetry.internal.results.page_test_results.PageTestResults object at 0x10c196190> shared_page_state : <gpu_tests.gpu_test_base_unittest.SharedPageStateWhichTimesOutNTimes object at 0x10c196790> Expected exception in DidRunStory for page1 Traceback (most recent call last): _DidRunStory at content/test/gpu/gpu_tests/gpu_test_base.py:114 super(cls, shared_page_state).DidRunStory(results) DidRunStory at third_party/catapult/telemetry/telemetry/testing/fakes/__init__.py:167 super(FakeSharedPageState, self).DidRunStory(results) DidRunStory at third_party/catapult/telemetry/telemetry/page/shared_page_state.py:165 if self._current_page.credentials and self._did_login_for_current_page: AttributeError: 'NoneType' object has no attribute 'credentials' Locals: results : <gpu_tests.gpu_test_base_unittest.SharedPageStateWhichTimesOutNTimes object at 0x10c196790> I'm not sure Telemetry will recover gracefully from the TimeoutError that happened earlier. Maybe the core harness needs to be made more robust to exceptions. Let's put this in and see. Please re-review.
Description was changed from ========== Handle Telemetry's TimeoutException in retries of flaky tests. This exception class derives from Error so they weren't being caught. BUG=603329 CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel ========== to ========== Handle Telemetry's TimeoutException in retries of flaky tests. Exceptions raised in DidRunStory weren't being caught. BUG=603329 CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel ==========
kbr@chromium.org changed reviewers: + nednguyen@google.com
kbr@chromium.org changed reviewers: + aiolos@chromium.org, eakuefner@chromium.org
Adding other Telemetry owners for any feedback on the approach.
I think your unittest doesn't include the behavior of the next page in the run yet https://codereview.chromium.org/1915033009/diff/40001/content/test/gpu/gpu_te... File content/test/gpu/gpu_tests/gpu_test_base.py (right): https://codereview.chromium.org/1915033009/diff/40001/content/test/gpu/gpu_te... content/test/gpu/gpu_tests/gpu_test_base.py:129: if expectation != 'pass': I think you may want exercise the code path of tearing down the browser & restarting it here since the story_runner will no longer do that for you https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/... |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
