|
|
Description[scheduler] Discard anomalously long tasks and bump metrics version.
We assume that very long tasks (> 30 seconds) are a result of
measurement errors such as system falling asleep in the middle of a
task.
R=skyostil@chromium.org,isherman@chromium.org
Review-Url: https://codereview.chromium.org/2891393002
Cr-Commit-Position: refs/heads/master@{#473890}
Committed: https://chromium.googlesource.com/chromium/src/+/1dc490d564adebf5ed17d275ac5654eb9d63606e
Patch Set 1 #
Total comments: 6
Patch Set 2 : addressed comments from skyostil@ #
Total comments: 8
Patch Set 3 : Addressed comments from isherman@ #
Messages
Total messages: 27 (14 generated)
Description was changed from ========== [scheduler] Discard anomalously long tasks and bump metrics version. We assume that very long tasks (> 30 seconds) are a result of measurement errors such as system falling asleep in the middle of a task. R=skyostil@chromium.org ========== to ========== [scheduler] Discard anomalously long tasks and bump metrics version. We assume that very long tasks (> 30 seconds) are a result of measurement errors such as system falling asleep in the middle of a task. R=skyostil@chromium.org,isherman@chromium.org ==========
altimin@chromium.org changed reviewers: + isherman@chromium.org
PTAL
lgtm with two changes. https://codereview.chromium.org/2891393002/diff/1/third_party/WebKit/Source/p... File third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc (right): https://codereview.chromium.org/2891393002/diff/1/third_party/WebKit/Source/p... third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc:72: "RendererScheduler.ForegroundRendererLoad2", load_percentage); I don't think we need to rename the counters since they're not stored anywhere. https://codereview.chromium.org/2891393002/diff/1/third_party/WebKit/Source/p... third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc:84: TRACE_COUNTER1(TRACE_DISABLED_BY_DEFAULT("renderer.scheduler"), Ditto. https://codereview.chromium.org/2891393002/diff/1/third_party/WebKit/Source/p... third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc:1858: UMA_HISTOGRAM_CUSTOM_COUNTS("RendererScheduler.TaskTime", I guess we're not renaming this since we only added it recently?
https://codereview.chromium.org/2891393002/diff/1/third_party/WebKit/Source/p... File third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc (right): https://codereview.chromium.org/2891393002/diff/1/third_party/WebKit/Source/p... third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc:72: "RendererScheduler.ForegroundRendererLoad2", load_percentage); On 2017/05/19 16:11:18, Sami wrote: > I don't think we need to rename the counters since they're not stored anywhere. Done. https://codereview.chromium.org/2891393002/diff/1/third_party/WebKit/Source/p... third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc:84: TRACE_COUNTER1(TRACE_DISABLED_BY_DEFAULT("renderer.scheduler"), On 2017/05/19 16:11:18, Sami wrote: > Ditto. Done. https://codereview.chromium.org/2891393002/diff/1/third_party/WebKit/Source/p... third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc:1858: UMA_HISTOGRAM_CUSTOM_COUNTS("RendererScheduler.TaskTime", On 2017/05/19 16:11:18, Sami wrote: > I guess we're not renaming this since we only added it recently? Actually, that's the oldest of 'em all. Thanks for spotting.
The CQ bit was checked by altimin@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2891393002/diff/20001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc (right): https://codereview.chromium.org/2891393002/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc:58: // of the task). Hmm, why are you making this assumption? Could you specifically subtract out sleep time, rather than just guessing? I'm concerned that we might be blinding ourselves to real egregious tasks by simply ignoring them.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2891393002/diff/20001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc (right): https://codereview.chromium.org/2891393002/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc:58: // of the task). On 2017/05/19 21:15:11, Ilya Sherman wrote: > Hmm, why are you making this assumption? Could you specifically subtract out > sleep time, rather than just guessing? I'm concerned that we might be blinding > ourselves to real egregious tasks by simply ignoring them. The problem is that we can't reliably detect when system is sleeping (plus it's very platform-specific). We are already doing something like this for other metrics. It is a good idea to add one more histogram to track the number (and duration) of discarded tasks? This way we can monitor and act if something goes wrong. (The assumption is here that we have a very small number of bad tasks).
https://codereview.chromium.org/2891393002/diff/20001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc (right): https://codereview.chromium.org/2891393002/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc:58: // of the task). FWIW the inactive renderer dialog should already be triggering for very long tasks like these and we get crash reports when that happens (if the user agrees to send one), so I think we should be already covered here.
https://codereview.chromium.org/2891393002/diff/20001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc (right): https://codereview.chromium.org/2891393002/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc:58: // of the task). On 2017/05/19 22:34:20, altimin wrote: > On 2017/05/19 21:15:11, Ilya Sherman wrote: > > Hmm, why are you making this assumption? Could you specifically subtract out > > sleep time, rather than just guessing? I'm concerned that we might be > blinding > > ourselves to real egregious tasks by simply ignoring them. > > The problem is that we can't reliably detect when system is sleeping (plus it's > very platform-specific). > We are already doing something like this for other metrics. > > It is a good idea to add one more histogram to track the number (and duration) > of discarded tasks? This way we can monitor and act if something goes wrong. > (The assumption is here that we have a very small number of bad tasks). Having some visibility into what data is being missed by this metric does seem appropriate. If there's already coverage for this, then LGTM; otherwise, yes, please do add an additional histogram. https://codereview.chromium.org/2891393002/diff/20001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (left): https://codereview.chromium.org/2891393002/diff/20001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:59038: -<histogram name="RendererScheduler.BackgroundRendererMainThreadLoad" units="%"> Please mark the old histogram names as <obsolete> https://codereview.chromium.org/2891393002/diff/20001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:59041: Renderer main thread load when renderer is backgrounded, i.e. percentage of Please update each histogram's summary to document that some events are being discarded.
The CQ bit was checked by altimin@chromium.org to run a CQ dry run
https://codereview.chromium.org/2891393002/diff/20001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (left): https://codereview.chromium.org/2891393002/diff/20001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:59038: -<histogram name="RendererScheduler.BackgroundRendererMainThreadLoad" units="%"> On 2017/05/22 20:43:43, Ilya Sherman wrote: > Please mark the old histogram names as <obsolete> Done. https://codereview.chromium.org/2891393002/diff/20001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:59041: Renderer main thread load when renderer is backgrounded, i.e. percentage of On 2017/05/22 20:43:43, Ilya Sherman wrote: > Please update each histogram's summary to document that some events are being > discarded. Done.
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm
The CQ bit was checked by altimin@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from isherman@chromium.org Link to the patchset: https://codereview.chromium.org/2891393002/#ps40001 (title: "Addressed comments from isherman@")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 40001, "attempt_start_ts": 1495550431353880, "parent_rev": "006f9034020134314605f8a64b0b3f925d1b6012", "commit_rev": "1dc490d564adebf5ed17d275ac5654eb9d63606e"}
Message was sent while issue was closed.
Description was changed from ========== [scheduler] Discard anomalously long tasks and bump metrics version. We assume that very long tasks (> 30 seconds) are a result of measurement errors such as system falling asleep in the middle of a task. R=skyostil@chromium.org,isherman@chromium.org ========== to ========== [scheduler] Discard anomalously long tasks and bump metrics version. We assume that very long tasks (> 30 seconds) are a result of measurement errors such as system falling asleep in the middle of a task. R=skyostil@chromium.org,isherman@chromium.org Review-Url: https://codereview.chromium.org/2891393002 Cr-Commit-Position: refs/heads/master@{#473890} Committed: https://chromium.googlesource.com/chromium/src/+/1dc490d564adebf5ed17d275ac56... ==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://chromium.googlesource.com/chromium/src/+/1dc490d564adebf5ed17d275ac56...
Message was sent while issue was closed.
On 2017/05/22 at 20:43:43, isherman wrote: > https://codereview.chromium.org/2891393002/diff/20001/third_party/WebKit/Sour... > File third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc (right): > > https://codereview.chromium.org/2891393002/diff/20001/third_party/WebKit/Sour... > third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc:58: // of the task). > On 2017/05/19 22:34:20, altimin wrote: > > On 2017/05/19 21:15:11, Ilya Sherman wrote: > > > Hmm, why are you making this assumption? Could you specifically subtract out > > > sleep time, rather than just guessing? I'm concerned that we might be > > blinding > > > ourselves to real egregious tasks by simply ignoring them. > > > > The problem is that we can't reliably detect when system is sleeping (plus it's > > very platform-specific). > > We are already doing something like this for other metrics. > > > > It is a good idea to add one more histogram to track the number (and duration) > > of discarded tasks? This way we can monitor and act if something goes wrong. > > (The assumption is here that we have a very small number of bad tasks). > > Having some visibility into what data is being missed by this metric does seem appropriate. If there's already coverage for this, then LGTM; otherwise, yes, please do add an additional histogram. Looks like this feedback didn't get a response. Is there already coverage for this? I don't see it in the patch.
Message was sent while issue was closed.
On 2017/05/23 19:40:10, Z_DONOTUSE wrote: > On 2017/05/22 at 20:43:43, isherman wrote: > > > https://codereview.chromium.org/2891393002/diff/20001/third_party/WebKit/Sour... > > File > third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc > (right): > > > > > https://codereview.chromium.org/2891393002/diff/20001/third_party/WebKit/Sour... > > > third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc:58: > // of the task). > > On 2017/05/19 22:34:20, altimin wrote: > > > On 2017/05/19 21:15:11, Ilya Sherman wrote: > > > > Hmm, why are you making this assumption? Could you specifically subtract > out > > > > sleep time, rather than just guessing? I'm concerned that we might be > > > blinding > > > > ourselves to real egregious tasks by simply ignoring them. > > > > > > The problem is that we can't reliably detect when system is sleeping (plus > it's > > > very platform-specific). > > > We are already doing something like this for other metrics. > > > > > > It is a good idea to add one more histogram to track the number (and > duration) > > > of discarded tasks? This way we can monitor and act if something goes wrong. > > > (The assumption is here that we have a very small number of bad tasks). > > > > Having some visibility into what data is being missed by this metric does seem > appropriate. If there's already coverage for this, then LGTM; otherwise, yes, > please do add an additional histogram. > > Looks like this feedback didn't get a response. Is there already coverage for > this? I don't see it in the patch. I haven't replied to the comment, sorry. In this case I assume that per Sami's comment we will get a crash dump for a hanged renderer, so we have coverage for this. |