Issue 1591003002: [Findit] Modify tryjob pipelines to trigger try jobs for test failure. (Closed)

Created:
4 years, 11 months ago by chanli

Modified:
3 years, 8 months ago

Reviewers:
stgao, lijeffrey, qyearsley

CC:
chromium-reviews, infra-reviews+infra_chromium.org

Base URL:
https://chromium.googlesource.com/infra/infra.git@master

Target Ref:
refs/heads/master

Project:
infra

Visibility:
Public.

Description

[Findit] Modify tryjob pipelines to trigger try jobs for test failure.

Test failures are different from compile failures because there may be multiple steps, and in each step there may be multiple tests, so the final result may contain multiple culprits for different tests.

Main task:
1. In each sub-pipeline, add logic to handle test failures.
2. Add a table to result page to display culprit for each step/test.

BUG=583806

Committed: https://chromium.googlesource.com/infra/infra/+/9ec3c4aed10f55037eb6f8a3dc623d537f6b33e9

Patch Set 1 #

Total comments: 12

Patch Set 2 : . #

Total comments: 12

Patch Set 3 : . #

Patch Set 4 : Address comments. #

Patch Set 5 : Address comments. #

Patch Set 6 : Address comments. #

Patch Set 7 : If different tests within the same step fail in different revisions, all revisions should be culpri… #

Total comments: 35

Patch Set 8 : Address comments. #

Patch Set 9 : rebase #

Total comments: 14

Patch Set 10 : Address comments. #

Patch Set 11 : Simplify the format for final try-job result for test failures. #

Total comments: 17

Patch Set 12 : Rebase and address conflicts. #

Total comments: 6

Patch Set 13 : Fix nits. #

Total comments: 8

Patch Set 14 : . #

Total comments: 10

Patch Set 15 : rebase #

Patch Set 16 : . #

Total comments: 2

Patch Set 17 : . #

Created: 4 years, 10 months ago

Stats (+1019 lines, -220 lines)

Status	File	Chunks	Added	Removed	Comments
M	appengine/findit/model/wf_try_job.py	1	+3	-1	0
M	appengine/findit/waterfall/identify_try_job_culprit_pipeline.py	2	+114	-37	0
M	appengine/findit/waterfall/monitor_try_job_pipeline.py	3	+43	-38	0
M	appengine/findit/waterfall/schedule_try_job_pipeline.py	2	+23	-15	0
M	appengine/findit/waterfall/test/identify_try_job_culprit_pipeline_test.py	8	+293	-19	0
M	appengine/findit/waterfall/test/monitor_try_job_pipeline_test.py	4	+67	-5	0
M	appengine/findit/waterfall/test/schedule_try_job_pipeline_test.py	5	+61	-9	0
M	appengine/findit/waterfall/test/try_job_pipeline_test.py	7	+35	-16	0
M	appengine/findit/waterfall/test/try_job_util_test.py	8	+129	-26	0
M	appengine/findit/waterfall/try_job_pipeline.py	2	+12	-11	0
A	appengine/findit/waterfall/try_job_result_format.md	1	+147	-0	0
A	appengine/findit/waterfall/try_job_type.py	1	+8	-0	0
M	appengine/findit/waterfall/try_job_util.py	3	+84	-43	0

Dependent Patchsets:

Issue 1652603003 Patch 130001

Messages

Total messages: 36 (6 generated)

chanli

Hi, This cl is about triggering try jobs for test failures. Please review it when ...

4 years, 11 months ago (2016-01-15 18:59:14 UTC) #2

lijeffrey

https://codereview.chromium.org/1591003002/diff/1/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): https://codereview.chromium.org/1591003002/diff/1/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py#newcode118 appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:118: if result and len(result.get('result', [])) > 0: I think ...

4 years, 11 months ago (2016-01-16 00:27:12 UTC) #3

qyearsley

Since this is a relatively large change, is there more context or a bug link ...

4 years, 11 months ago (2016-01-17 03:43:12 UTC) #4

chanli

https://codereview.chromium.org/1591003002/diff/20001/appengine/findit/model/wf_try_job.py File appengine/findit/model/wf_try_job.py (right): https://codereview.chromium.org/1591003002/diff/20001/appengine/findit/model/wf_try_job.py#newcode19 appengine/findit/model/wf_try_job.py:19: # A list of dict containing results and urls ...

4 years, 11 months ago (2016-01-20 18:07:09 UTC) #5

chanli

4 years, 11 months ago (2016-01-20 18:28:04 UTC) #6

chanli

Hi, I just made another change to make sure if different tests within the same ...

4 years, 11 months ago (2016-01-22 19:41:29 UTC) #8

stgao

Besides the comments inline with code, I feel that this CL is a little big ...

4 years, 11 months ago (2016-01-26 00:51:41 UTC) #9

chanli

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/.gitignore File appengine/findit/.gitignore (right): https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/.gitignore#newcode7 appengine/findit/.gitignore:7: # but do track run.sh On 2016/01/26 00:51:40, stgao ...

4 years, 10 months ago (2016-01-27 18:49:55 UTC) #10

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/.giti...
File appengine/findit/.gitignore (right):

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/.giti...
appengine/findit/.gitignore:7: # but do track run.sh
On 2016/01/26 00:51:40, stgao wrote:
> As this is not related to the main purpose of this CL, maybe create a separate
> CL for it?

Done.

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/.giti...
appengine/findit/.gitignore:7: # but do track run.sh
On 2016/01/26 00:51:40, stgao wrote:
> Does this work? See the conditions 3 and 4 below.
> 
> (https://git-scm.com/docs/gitignore)
> 
> To re-include files or directories when their parent directory is excluded, the
> following conditions must be met:
> 
> 1. The rules to exclude a directory and re-include a subset back must be in the
> same .gitignore file.
> 
> 2. The directory part in the re-include rules must be literal (i.e. no
> wildcards)
> 
> 3. The rules to exclude the parent directory must not end with a trailing slash.
> 
> 4. The rules to exclude the parent directory must have at least one slash.

It works just fine for me.

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/.giti...
appengine/findit/.gitignore:7: # but do track run.sh
On 2016/01/26 00:51:40, stgao wrote:
> This comment might be misleading, as we need to track more than run.sh

Because I have some local scripts in util_scripts/, I need these changes so
that git won't complain about those scripts.

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/model...
File appengine/findit/model/test/wf_try_job_test.py (right):

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/model...
appengine/findit/model/test/wf_try_job_test.py:14: try_job = WfTryJob.Create('m', 'b', 123)
On 2016/01/26 00:51:40, stgao wrote:
> If you like, the rename here and in other files could go into a separate CL for
> such refactoring purpose.

Got it. Done.

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right):

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:13: 'https://chromium.googlesource.com/chromium/src.git', HttpClient())
On 2016/01/26 00:51:40, stgao wrote:
> It seems that we use this URL in quite a few places in the code base.
> 
> We'd better reduce the occurrence of the same strings.
> How about filing a bug to track this and fixing it in a separate CL?

Acknowledged.

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:33: # For test failures, the try job will run against every revision,
On 2016/01/26 00:51:40, stgao wrote:
> Just as a side note: This is the case for now. But we'd like to improve the
> recipe findit/chromium/test and make it do the same as the
> findit/chromium/compile -- stop testing a test if the first failure is found. In
> this way, we could simplify the logic on Findit app side.

Acknowledged. I'll keep it in mind and make the change when the recipe is ready.

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:39: for step, step_result in result['result'][revision].iteritems():
On 2016/01/26 00:51:40, stgao wrote:
> What if the revision is not tested yet due to infra failure, like exception
> during the recipe run?

If there were any infra failure or something else, data['result'] should be
'FAILURE' and there would be data['failure_reason'] as well, right?

I'll modify buildbucket_client a little to include that info. Then I'll add a
check on data['result'] == 'SUCCESS' and I'll handle other cases in a separate
CL.
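
The check described in this reply could look roughly like the sketch below. It is only an illustration: the function name and the 'report' key are hypothetical, and only data['result'] and data['failure_reason'] come from the discussion above.

```python
def get_revision_results(data):
    """Return (per_revision_results, failure_reason) for a finished try job.

    `data` is assumed to be the dict returned for a build; the 'report' key
    used for the per-revision results is a hypothetical name.
    """
    if data.get('result') != 'SUCCESS':
        # Infra failure or another abnormal end: do not try to read
        # per-revision results, just surface the reason if there is one.
        return None, data.get('failure_reason')
    return data.get('report', {}), None
```

A caller would skip culprit identification whenever the first element is None and leave the handling of the failure reason to the follow-up CL.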

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:55: culprit_map[step]['tests'][failed_test] = {}
On 2016/01/26 00:51:40, stgao wrote:
> Under the failed_test, is there a reason to have another level {'revision':
> revision}? Can it be just the string ``revision``?

Just like the example below, we will record the culprit info for each test as
well:
                'tests': {
                    'a_test1': {
                      'revision': 'rev1',
                      'commit_position': '1',
                      'review_url': 'url_1'
                    },
                    'a_test2': {
                      'revision': 'rev2',
                      'commit_position': '2',
                      'review_url': 'url_2'
                    }
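
As a rough sketch of how such a per-test map could be built: the function below walks the revisions in commit order and records, for each failed test, the first revision where it fails. The function name, the 'failures' key, and the `revisions` argument are assumptions for illustration; 'valid', 'status', and the culprit_map[step]['tests'] layout come from this review. The code is written to run on Python 3, so it uses .items() rather than .iteritems().

```python
def find_test_culprits(per_revision_results, revisions):
    """Map each failed test to the first revision where it fails.

    per_revision_results: {revision: {step: {'status': str, 'valid': bool,
                                             'failures': [test names]}}}
    revisions: revisions in commit order, oldest first.
    """
    culprit_map = {}
    for revision in revisions:
        step_results = per_revision_results.get(revision, {})
        for step, step_result in step_results.items():
            if not (step_result.get('valid') and
                    step_result.get('status') == 'failed'):
                continue
            tests = culprit_map.setdefault(step, {'tests': {}})['tests']
            for failed_test in step_result.get('failures', []):
                # setdefault keeps the earliest failing revision per test.
                tests.setdefault(failed_test, {'revision': revision})
    return culprit_map
```

The 'commit_position' and 'review_url' fields shown in the example would be filled in afterwards from the change log of each culprit revision; that lookup is left out here.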

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:96: The format for final try-job result for test failures is:
On 2016/01/26 00:51:40, stgao wrote:
> What do you think of adding a file recipe_result_format.md to explain the format
> of the recipe results and referring to the file here instead?

Done.

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:203: result['culprit'] = culprits[failed_revisions[0]]
On 2016/01/26 00:51:40, stgao wrote:
> Unrelated to this CL: In the else case, it means the try job can't identify the
> culprit for some reason, i.e. infra exception.
> Should we show a warning on the UI too?

Acknowledged. I'll file a bug for this as well.

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:215: result_needs_update = (
On 2016/01/26 00:51:40, stgao wrote:
> nit: naming.

Done.

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
File appengine/findit/waterfall/monitor_try_job_pipeline.py (right):

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
appengine/findit/waterfall/monitor_try_job_pipeline.py:44: if try_job_type == 'compile':
On 2016/01/26 00:51:40, stgao wrote:
> use a CONSTANT instead? An enum might be better too.

Done.
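
One way to replace the hardcoded string is a small constants class, sketched below. The file list shows that this CL adds try_job_type.py, but its contents are not quoted in this thread, so the class body, the helper name, and the `test_results` attribute are assumptions; `compile_results` is the attribute referenced in the review.

```python
class TryJobType(object):
    """Named constants for the kinds of try job (assumed values)."""
    COMPILE = 'compile'
    TEST = 'test'


def get_results_to_update(try_job, try_job_type):
    """Pick the result list on the try job entity that matches the job type."""
    if try_job_type == TryJobType.COMPILE:
        return try_job.compile_results
    # `test_results` is a hypothetical attribute name for test try jobs.
    return try_job.test_results
```

Comparing against TryJobType.COMPILE instead of 'compile' means a typo becomes an AttributeError rather than a silently false comparison.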

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
appengine/findit/waterfall/monitor_try_job_pipeline.py:45: result_needs_update = try_job_result.compile_results
On 2016/01/26 00:51:40, stgao wrote:
> the var name seems like a True/False flag, but it's actually not.
> How about naming it "result_to_update"?

Done.

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
appengine/findit/waterfall/monitor_try_job_pipeline.py:57: if build.status == 'STARTED' and not already_set_started:
On 2016/01/26 00:51:40, stgao wrote:
> unrelated to this CL: use enum or constant instead of hardcoded string.

Done.

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
appengine/findit/waterfall/monitor_try_job_pipeline.py:75: result_needs_update.append(result)
On 2016/01/26 00:51:40, stgao wrote:
> Line #58-75 and #37-53 are very similar. How about extracting them into a
> separate function ``UpdateTryJobResult`` or the like?
> 
> This could simplify unittest too, as we test the new function directly.
> To simplify unittest (especially setting up dummy data), we'd better go in this
> way later.

Done.
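
A minimal sketch of what such an extracted helper might look like, assuming each entry in the result list is a dict with a 'result' field and a URL, as the model comment in this review suggests. The function name, the 'try_job_id' and 'url' keys, and the signature are hypothetical and not taken from the committed code.

```python
def update_try_job_result(results, try_job_id, url=None, report=None):
    """Update the entry for `try_job_id` in place, appending one if missing.

    results: list of dicts, one per try job.
    """
    for entry in results:
        if entry.get('try_job_id') == try_job_id:
            break
    else:
        # No entry for this try job yet: create one.
        entry = {'try_job_id': try_job_id}
        results.append(entry)

    if url is not None:
        entry['url'] = url
    if report is not None:
        entry['result'] = report
    return results
```

Because the helper only touches a plain list of dicts, a unit test can call it directly without setting up a pipeline or datastore entities.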

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
File appengine/findit/waterfall/schedule_try_job_pipeline.py (right):

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
appengine/findit/waterfall/schedule_try_job_pipeline.py:31: if targeted_tests:
On 2016/01/26 00:51:41, stgao wrote:
> For test failures, we will always require the ``targeted_tests``, as we need to
> know which tests to run.
> 
> Thus this ``if`` should be removed, and an assert should be added instead.

Done.
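
The suggested change, replacing the conditional with an assertion, can be sketched as follows. The function name, the `base_properties` argument, and the 'tests' property key are made up for illustration; only `targeted_tests` comes from the code under review.

```python
def get_test_try_job_properties(base_properties, targeted_tests):
    """Build the properties for a test try job.

    A test try job cannot run without knowing which tests to rerun, so an
    empty or missing `targeted_tests` is treated as a programming error.
    """
    assert targeted_tests, 'targeted_tests is required for test try jobs'
    properties = dict(base_properties)  # Copy so the caller's dict is untouched.
    properties['tests'] = targeted_tests
    return properties
```

With the assert in place, a caller that forgets to pass the failed tests fails immediately instead of scheduling a try job that has nothing to run.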

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
File appengine/findit/waterfall/try_job_util.py (right):

https://codereview.chromium.org/1591003002/diff/120001/appengine/findit/water...
appengine/findit/waterfall/try_job_util.py:61: targeted_tests = {}
On 2016/01/26 00:51:41, stgao wrote:
> Alternative: extract this else branch into a helper function -- retrieve failed
> steps and tests.

Done.

chanli

4 years, 10 months ago (2016-01-28 17:55:59 UTC) #11

stgao

Some more comments. https://codereview.chromium.org/1591003002/diff/160001/appengine/findit/waterfall/monitor_try_job_pipeline.py File appengine/findit/waterfall/monitor_try_job_pipeline.py (right): https://codereview.chromium.org/1591003002/diff/160001/appengine/findit/waterfall/monitor_try_job_pipeline.py#newcode47 appengine/findit/waterfall/monitor_try_job_pipeline.py:47: try_job_result.status = wf_analysis_status.ANALYZING What if status ...

4 years, 10 months ago (2016-01-28 19:33:52 UTC) #12

chanli

https://codereview.chromium.org/1591003002/diff/160001/appengine/findit/waterfall/monitor_try_job_pipeline.py File appengine/findit/waterfall/monitor_try_job_pipeline.py (right): https://codereview.chromium.org/1591003002/diff/160001/appengine/findit/waterfall/monitor_try_job_pipeline.py#newcode47 appengine/findit/waterfall/monitor_try_job_pipeline.py:47: try_job_result.status = wf_analysis_status.ANALYZING On 2016/01/28 19:33:51, stgao wrote: > ...

4 years, 10 months ago (2016-01-28 22:40:12 UTC) #13

chanli

Hi, I made some change to simplify the format for final try-job result for test ...

4 years, 10 months ago (2016-01-29 00:37:39 UTC) #14

stgao

https://codereview.chromium.org/1591003002/diff/200001/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): https://codereview.chromium.org/1591003002/diff/200001/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py#newcode25 appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:25: if change_log: If the request to Gitiles fails for ...

4 years, 10 months ago (2016-01-29 18:55:02 UTC) #15

lijeffrey

https://codereview.chromium.org/1591003002/diff/200001/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): https://codereview.chromium.org/1591003002/diff/200001/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py#newcode75 appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:75: def run( you may want to do a rebase, ...

4 years, 10 months ago (2016-01-29 19:49:46 UTC) #17

chanli

Hi, I finally finished the merge... Happy Friday : ) https://codereview.chromium.org/1591003002/diff/200001/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): https://codereview.chromium.org/1591003002/diff/200001/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py#newcode25 ...

4 years, 10 months ago (2016-01-30 02:22:37 UTC) #18

stgao

lgtm with nits. https://codereview.chromium.org/1591003002/diff/220001/appengine/findit/waterfall/test/try_job_pipeline_test.py File appengine/findit/waterfall/test/try_job_pipeline_test.py (right): https://codereview.chromium.org/1591003002/diff/220001/appengine/findit/waterfall/test/try_job_pipeline_test.py#newcode125 appengine/findit/waterfall/test/try_job_pipeline_test.py:125: print json.dumps(try_job.compile_results, indent=4) This should be ...

4 years, 10 months ago (2016-02-01 19:09:01 UTC) #19

stgao

https://codereview.chromium.org/1591003002/diff/220001/appengine/findit/waterfall/try_job_result_format.md File appengine/findit/waterfall/try_job_result_format.md (right): https://codereview.chromium.org/1591003002/diff/220001/appengine/findit/waterfall/try_job_result_format.md#newcode3 appengine/findit/waterfall/try_job_result_format.md:3: ['rev1', 'passed'], This now has to be updated after ...

4 years, 10 months ago (2016-02-01 19:22:02 UTC) #20

chanli

https://codereview.chromium.org/1591003002/diff/220001/appengine/findit/waterfall/test/try_job_pipeline_test.py File appengine/findit/waterfall/test/try_job_pipeline_test.py (right): https://codereview.chromium.org/1591003002/diff/220001/appengine/findit/waterfall/test/try_job_pipeline_test.py#newcode125 appengine/findit/waterfall/test/try_job_pipeline_test.py:125: print json.dumps(try_job.compile_results, indent=4) On 2016/02/01 19:09:01, stgao wrote: > ...

4 years, 10 months ago (2016-02-01 22:01:49 UTC) #21

lijeffrey

lgtm with nits https://codereview.chromium.org/1591003002/diff/200001/appengine/findit/waterfall/monitor_try_job_pipeline.py File appengine/findit/waterfall/monitor_try_job_pipeline.py (right): https://codereview.chromium.org/1591003002/diff/200001/appengine/findit/waterfall/monitor_try_job_pipeline.py#newcode28 appengine/findit/waterfall/monitor_try_job_pipeline.py:28: 'result': result_content, nit: for compile, this ...

4 years, 10 months ago (2016-02-02 00:24:31 UTC) #22

chanli

https://codereview.chromium.org/1591003002/diff/240001/appengine/findit/waterfall/test/identify_try_job_culprit_pipeline_test.py File appengine/findit/waterfall/test/identify_try_job_culprit_pipeline_test.py (right): https://codereview.chromium.org/1591003002/diff/240001/appengine/findit/waterfall/test/identify_try_job_culprit_pipeline_test.py#newcode158 appengine/findit/waterfall/test/identify_try_job_culprit_pipeline_test.py:158: compile_result = { On 2016/02/02 00:24:31, lijeffrey wrote: > ...

4 years, 10 months ago (2016-02-02 01:53:02 UTC) #23

qyearsley

Is there a BUG= for this CL? https://codereview.chromium.org/1591003002/diff/260001/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): https://codereview.chromium.org/1591003002/diff/260001/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py#newcode132 appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:132: if step_result['valid'] ...

4 years, 10 months ago (2016-02-03 18:55:14 UTC) #24

chanli

On 2016/02/03 18:55:14, qyearsley wrote: > Is there a BUG= for this CL? > > ...

4 years, 10 months ago (2016-02-03 21:10:29 UTC) #26

chanli

https://codereview.chromium.org/1591003002/diff/260001/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): https://codereview.chromium.org/1591003002/diff/260001/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py#newcode132 appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:132: if step_result['valid'] and step_result['status'] == 'failed': On 2016/02/03 18:55:14, ...

4 years, 10 months ago (2016-02-03 23:44:13 UTC) #27

stgao

lgtm with a nit. https://codereview.chromium.org/1591003002/diff/300001/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): https://codereview.chromium.org/1591003002/diff/300001/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py#newcode135 appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:135: continue nit: an empty line ...

4 years, 10 months ago (2016-02-04 00:19:47 UTC) #29

chanli

https://codereview.chromium.org/1591003002/diff/300001/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): https://codereview.chromium.org/1591003002/diff/300001/appengine/findit/waterfall/identify_try_job_culprit_pipeline.py#newcode135 appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:135: continue On 2016/02/04 00:19:47, stgao wrote: > nit: an ...

4 years, 10 months ago (2016-02-04 00:22:46 UTC) #30

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1591003002/320001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1591003002/320001

4 years, 10 months ago (2016-02-04 00:23:03 UTC) #33

commit-bot: I haz the power

4 years, 10 months ago (2016-02-04 00:25:55 UTC) #35

Message was sent while issue was closed.

Committed patchset #17 (id:320001) as
https://chromium.googlesource.com/infra/infra/+/9ec3c4aed10f55037eb6f8a3dc623...
