|
|
Chromium Code Reviews
Description[Findit] Group failures by culprit and send notification to codereview.
BUG=621140
Committed: https://chromium.googlesource.com/infra/infra/+/6f5db5c7c252a2f2ae9557d48788b27ac58c00b6
Patch Set 1 #
Total comments: 18
Patch Set 2 : Address Jeff's comments. Tests will be added in next patch. #
Total comments: 2
Patch Set 3 : Add unittests and address comments. #Patch Set 4 : Clean up. #
Dependent Patchsets: Messages
Total messages: 70 (41 generated)
Patchset #1 (id:1) has been deleted
Description was changed from ========== [Findit] Group failures by culprit to send notifications. BUG= ========== to ========== [Findit] Group failures by culprit to send notifications. BUG=621140 ==========
stgao@chromium.org changed reviewers: + chanli@chromium.org, lijeffrey@chromium.org
ptal This is to use the Rieveld client to send the notifications to culprits.
https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/model/... File appengine/findit/model/wf_culprit.py (right): https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/model/... appengine/findit/model/wf_culprit.py:21: found_time = ndb.DateTimeProperty() do we want to index found_time should we ever want to query for historical WfCulprit entities? I imagine this could eventually be used in conjunction with auto revert and we may want metrics for that It also doesn't look like found_time is being set anywhere https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:19: from waterfall.send_notification_for_culprit_pipeline import \ nit: I think gpylint doesn't like multi-line lines using \ is it possible to do something like: from waterfall.send_notification_for_culprit_pipeline import ( SendNotificationForCulpritPipeline)? https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... File appengine/findit/waterfall/send_notification_for_culprit_pipeline.py (right): https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... appengine/findit/waterfall/send_notification_for_culprit_pipeline.py:22: culprit = WfCulprit.Get(repo_name, revision) nit: how about culprit = WfCulprit.Get(repo_name, revision) or WfCulprit.Create(repo_name, revision)? https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... appengine/findit/waterfall/send_notification_for_culprit_pipeline.py:31: should_send = len(culprit.failed_builds) >= 2 # TODO(stgao): move to config. is it possible to merge these 2? i.e. should_send = (len(culprit.failed_builds >= 2 and culprit.notification_status not in (status.COMPLETED, status.RUNNING))
Patchset #2 (id:40001) has been deleted
https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/model/... File appengine/findit/model/wf_culprit.py (right): https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/model/... appengine/findit/model/wf_culprit.py:21: found_time = ndb.DateTimeProperty() On 2016/06/21 00:07:34, lijeffrey wrote: > do we want to index found_time should we ever want to query for historical > WfCulprit entities? I imagine this could eventually be used in conjunction with > auto revert and we may want metrics for that We do want to query for historical data to know which and how many culprits we have sent notification to code-review, and later trigger auto-revert when supported. > > It also doesn't look like found_time is being set anywhere Ooops, good catch. Changed to notification_time instead. https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:19: from waterfall.send_notification_for_culprit_pipeline import \ On 2016/06/21 00:07:34, lijeffrey wrote: > nit: I think gpylint doesn't like multi-line lines using \ > > is it possible to do something like: > > from waterfall.send_notification_for_culprit_pipeline import ( > SendNotificationForCulpritPipeline)? Done. https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... File appengine/findit/waterfall/send_notification_for_culprit_pipeline.py (right): https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... appengine/findit/waterfall/send_notification_for_culprit_pipeline.py:22: culprit = WfCulprit.Get(repo_name, revision) On 2016/06/21 00:07:35, lijeffrey wrote: > nit: how about > > culprit = WfCulprit.Get(repo_name, revision) or WfCulprit.Create(repo_name, > revision)? Good idea! Done. https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... appengine/findit/waterfall/send_notification_for_culprit_pipeline.py:31: should_send = len(culprit.failed_builds) >= 2 # TODO(stgao): move to config. On 2016/06/21 00:07:34, lijeffrey wrote: > is it possible to merge these 2? > > i.e. > > should_send = (len(culprit.failed_builds >= 2 and > culprit.notification_status not in (status.COMPLETED, status.RUNNING)) Done.
https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/model/... File appengine/findit/model/wf_culprit.py (right): https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/model/... appengine/findit/model/wf_culprit.py:23: # The status of notification delivery. Do you want to list the statuses here? For reference? https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/model/... appengine/findit/model/wf_culprit.py:30: def project_name(self): # pragma: no cover Why don't just use repo_name? https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:344: _NotifyCulprits(master_name, builder_name, build_number, culprits) So only culprits found by try jobs will be included, right? And that's expected? If so, when do you want to enable this change in prod? I mean, should we wait for the accuracy rate for try jobs to get higher? https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... File appengine/findit/waterfall/send_notification_for_culprit_pipeline.py (right): https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... appengine/findit/waterfall/send_notification_for_culprit_pipeline.py:30: # builds to avoid false positive due to flakiness. Do you have data to back up this idea? What percentage of false positives could be filtered out by this?
lgtm https://codereview.chromium.org/2075423002/diff/60001/appengine/findit/model/... File appengine/findit/model/wf_culprit.py (right): https://codereview.chromium.org/2075423002/diff/60001/appengine/findit/model/... appengine/findit/model/wf_culprit.py:23: cr_notification_time = ndb.DateTimeProperty() do we still want to do ndb.DateTimeProperty(indexed=True) to be able to query for this later?
Patchset #3 (id:80001) has been deleted
https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/model/... File appengine/findit/model/wf_culprit.py (right): https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/model/... appengine/findit/model/wf_culprit.py:23: # The status of notification delivery. On 2016/06/21 17:50:31, chanli wrote: > Do you want to list the statuses here? For reference? Done. https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/model/... appengine/findit/model/wf_culprit.py:30: def project_name(self): # pragma: no cover On 2016/06/21 17:50:31, chanli wrote: > Why don't just use repo_name? This is for UI. I'm more leaning to make repo_name the full path of repo checkout dir (i.e. third_party/pdfium), while project name as the short name (i.e. pdfium). https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:344: _NotifyCulprits(master_name, builder_name, build_number, culprits) On 2016/06/21 17:50:31, chanli wrote: > So only culprits found by try jobs will be included, right? And that's expected? That's what I thought. Are you proposing heuristic results should be notified too? > > If so, when do you want to enable this change in prod? I mean, should we wait > for the accuracy rate for try jobs to get higher? As in the other pipeline, we only notify if the same culprit is identified for 2+ different builders. I think that's good enough to avoid the false positive caused by flaky compile. Did you find some cases that it won't work? I was supposed to add a flag to disable the notification. But it will be in a separate CL. https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... File appengine/findit/waterfall/send_notification_for_culprit_pipeline.py (right): https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... appengine/findit/waterfall/send_notification_for_culprit_pipeline.py:30: # builds to avoid false positive due to flakiness. On 2016/06/21 17:50:31, chanli wrote: > Do you have data to back up this idea? What percentage of false positives could > be filtered out by this? This is from data in the InCorrect-Found. Only one case can't be caught by this filter so far. https://codereview.chromium.org/2075423002/diff/60001/appengine/findit/model/... File appengine/findit/model/wf_culprit.py (right): https://codereview.chromium.org/2075423002/diff/60001/appengine/findit/model/... appengine/findit/model/wf_culprit.py:23: cr_notification_time = ndb.DateTimeProperty() On 2016/06/21 17:59:54, lijeffrey wrote: > do we still want to do ndb.DateTimeProperty(indexed=True) to be able to query > for this later? By default, it is indexed. But I made it explicit.
Description was changed from ========== [Findit] Group failures by culprit to send notifications. BUG=621140 ========== to ========== [Findit] Group failures by culprit and send notification to codereview. BUG=621140 ==========
https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:344: _NotifyCulprits(master_name, builder_name, build_number, culprits) > As in the other pipeline, we only notify if the same culprit is identified for > 2+ different builders. I think that's good enough to avoid the false positive > caused by flaky compile. > Did you find some cases that it won't work? > > I was supposed to add a flag to disable the notification. > But it will be in a separate CL. Such as the case where the culprit revision is skipped by analyze? https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... File appengine/findit/waterfall/send_notification_for_culprit_pipeline.py (right): https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... appengine/findit/waterfall/send_notification_for_culprit_pipeline.py:30: # builds to avoid false positive due to flakiness. On 2016/06/24 16:10:28, stgao wrote: > On 2016/06/21 17:50:31, chanli wrote: > > Do you have data to back up this idea? What percentage of false positives > could > > be filtered out by this? > > This is from data in the InCorrect-Found. Only one case can't be caught by this > filter so far. I'm not sure about it: I can see quite several cases where multiple builds shared the same culprit but that culprit is incorrect. For example: 401764: [(chromium.memory, Linux Chromium OS ASan LSan Tests (1), 13919), (chromium.win, Win7 Tests (1), 54147), (chromium.win, Win7 (32) Tests, 6659), ...] 398149: [(chromium.chromiumos, ChromiumOS amd64-generic Compile, 19370), (chromium.chromiumos, ChromiumOS daisy Compile, 21018), (chromium.chromiumos, ChromiumOS x86-generic Compile, 20468)] 393771: [(chromium.chrome, Google Chrome Win, 7100), (chromium, Win, 43403)] I would say we can go with this approach but we'd better wait for try jobs to be more reliable
On 2016/06/24 23:42:38, chanli wrote: > https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... > File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): > > https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... > appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:344: > _NotifyCulprits(master_name, builder_name, build_number, culprits) > > As in the other pipeline, we only notify if the same culprit is identified for > > 2+ different builders. I think that's good enough to avoid the false positive > > caused by flaky compile. > > Did you find some cases that it won't work? > > > > I was supposed to add a flag to disable the notification. > > But it will be in a separate CL. > > Such as the case where the culprit revision is skipped by analyze? > > https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... > File appengine/findit/waterfall/send_notification_for_culprit_pipeline.py > (right): > > https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... > appengine/findit/waterfall/send_notification_for_culprit_pipeline.py:30: # > builds to avoid false positive due to flakiness. > On 2016/06/24 16:10:28, stgao wrote: > > On 2016/06/21 17:50:31, chanli wrote: > > > Do you have data to back up this idea? What percentage of false positives > > could > > > be filtered out by this? > > > > This is from data in the InCorrect-Found. Only one case can't be caught by > this > > filter so far. > > I'm not sure about it: I can see quite several cases where multiple builds > shared the same culprit but that culprit is incorrect. > > For example: > 401764: [(chromium.memory, Linux Chromium OS ASan LSan Tests (1), 13919), > (chromium.win, Win7 Tests (1), 54147), (chromium.win, Win7 (32) Tests, 6659), > ...] > > 398149: [(chromium.chromiumos, ChromiumOS amd64-generic Compile, 19370), > (chromium.chromiumos, ChromiumOS daisy Compile, 21018), (chromium.chromiumos, > ChromiumOS x86-generic Compile, 20468)] > > 393771: [(chromium.chrome, Google Chrome Win, 7100), (chromium, Win, 43403)] > > I would say we can go with this approach but we'd better wait for try jobs to be > more reliable lgtm on functionality, though I'm a little worried that we can not avoid all the false positives.
On 2016/06/24 23:44:55, chanli wrote: > On 2016/06/24 23:42:38, chanli wrote: > > > https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... > > File appengine/findit/waterfall/identify_try_job_culprit_pipeline.py (right): > > > > > https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... > > appengine/findit/waterfall/identify_try_job_culprit_pipeline.py:344: > > _NotifyCulprits(master_name, builder_name, build_number, culprits) > > > As in the other pipeline, we only notify if the same culprit is identified > for > > > 2+ different builders. I think that's good enough to avoid the false > positive > > > caused by flaky compile. > > > Did you find some cases that it won't work? > > > > > > I was supposed to add a flag to disable the notification. > > > But it will be in a separate CL. > > > > Such as the case where the culprit revision is skipped by analyze? > > > > > https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... > > File appengine/findit/waterfall/send_notification_for_culprit_pipeline.py > > (right): > > > > > https://codereview.chromium.org/2075423002/diff/20001/appengine/findit/waterf... > > appengine/findit/waterfall/send_notification_for_culprit_pipeline.py:30: # > > builds to avoid false positive due to flakiness. > > On 2016/06/24 16:10:28, stgao wrote: > > > On 2016/06/21 17:50:31, chanli wrote: > > > > Do you have data to back up this idea? What percentage of false positives > > > could > > > > be filtered out by this? > > > > > > This is from data in the InCorrect-Found. Only one case can't be caught by > > this > > > filter so far. > > > > I'm not sure about it: I can see quite several cases where multiple builds > > shared the same culprit but that culprit is incorrect. > > > > For example: > > 401764: [(chromium.memory, Linux Chromium OS ASan LSan Tests (1), 13919), > > (chromium.win, Win7 Tests (1), 54147), (chromium.win, Win7 (32) Tests, 6659), > > ...] This is a bug in the dependency "Analyze", not an issue within Findit. chanli@, was the bug filed to the two guys working on "Analyze"? > > > > 398149: [(chromium.chromiumos, ChromiumOS amd64-generic Compile, 19370), > > (chromium.chromiumos, ChromiumOS daisy Compile, 21018), (chromium.chromiumos, > > ChromiumOS x86-generic Compile, 20468)] This is not clear to me yet, but seems more like an infra issue. > > > > 393771: [(chromium.chrome, Google Chrome Win, 7100), (chromium, Win, 43403)] This is flaky compile failure. Findit can't do anything for it. > > > > I would say we can go with this approach but we'd better wait for try jobs to > be > > more reliable > > lgtm on functionality, though I'm a little worried that we can not avoid all the > false positives. I will land as-is. Later we could tune the criteria of when the notification should be sent.
The CQ bit was checked by stgao@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from lijeffrey@chromium.org Link to the patchset: https://codereview.chromium.org/2075423002/#ps100001 (title: "Add unittests and address comments.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: Infra Mac Tester on master.tryserver.infra (JOB_FAILED, https://build.chromium.org/p/tryserver.infra/builders/Infra%20Mac%20Tester/bu...)
The CQ bit was checked by stgao@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from chanli@chromium.org, lijeffrey@chromium.org Link to the patchset: https://codereview.chromium.org/2075423002/#ps140001 (title: "Fix testname.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: Infra Linux Trusty 64 Tester on master.tryserver.infra (JOB_FAILED, https://build.chromium.org/p/tryserver.infra/builders/Infra%20Linux%20Trusty%...)
The CQ bit was checked by stgao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Infra Mac Tester on master.tryserver.infra (JOB_FAILED, https://build.chromium.org/p/tryserver.infra/builders/Infra%20Mac%20Tester/bu...)
The CQ bit was checked by stgao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Infra Presubmit on master.tryserver.infra (JOB_FAILED, https://build.chromium.org/p/tryserver.infra/builders/Infra%20Presubmit/build...)
The CQ bit was checked by stgao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Infra Presubmit on master.tryserver.infra (JOB_FAILED, https://build.chromium.org/p/tryserver.infra/builders/Infra%20Presubmit/build...)
The CQ bit was checked by stgao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Infra Presubmit on master.tryserver.infra (JOB_FAILED, https://build.chromium.org/p/tryserver.infra/builders/Infra%20Presubmit/build...)
Patchset #10 (id:240001) has been deleted
The CQ bit was checked by stgao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Infra Win Tester on master.tryserver.infra (JOB_FAILED, https://build.chromium.org/p/tryserver.infra/builders/Infra%20Win%20Tester/bu...)
The CQ bit was checked by stgao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Infra Linux Trusty 64 Tester on master.tryserver.infra (JOB_FAILED, https://build.chromium.org/p/tryserver.infra/builders/Infra%20Linux%20Trusty%...)
The CQ bit was checked by stgao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Patchset #6 (id:160001) has been deleted
Patchset #6 (id:180001) has been deleted
Patchset #6 (id:200001) has been deleted
Patchset #6 (id:220001) has been deleted
Patchset #6 (id:260001) has been deleted
Patchset #6 (id:280001) has been deleted
Patchset #6 (id:300001) has been deleted
Patchset #6 (id:320001) has been deleted
Patchset #6 (id:340001) has been deleted
Patchset #5 (id:140001) has been deleted
Patchset #4 (id:120001) has been deleted
The CQ bit was checked by stgao@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from chanli@chromium.org, lijeffrey@chromium.org Link to the patchset: https://codereview.chromium.org/2075423002/#ps360001 (title: "Clean up.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== [Findit] Group failures by culprit and send notification to codereview. BUG=621140 ========== to ========== [Findit] Group failures by culprit and send notification to codereview. BUG=621140 Committed: https://chromium.googlesource.com/infra/infra/+/6f5db5c7c252a2f2ae9557d48788b... ==========
Message was sent while issue was closed.
Committed patchset #4 (id:360001) as https://chromium.googlesource.com/infra/infra/+/6f5db5c7c252a2f2ae9557d48788b... |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
