Created: 6 years, 5 months ago by picksi1
Modified: 6 years, 3 months ago
Reviewers: jdduke (slow), nduca, Sami, Rick Byers, eseidel, picksi, ariblue, Yufeng Shen (Slow to review), tonyg
CC: chromium-reviews, telemetry+watch_chromium.org
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Project: chromium
Visibility: Public.
Description: Addition of first_gesture_scroll_latency metric.
BUG=303215
Patch Set 1
Total comments: 16
Patch Set 2: Updates from code reviews
Patch Set 3: Addition of unit tests
Total comments: 11
Patch Set 4: Addition of GSU latency metric
Total comments: 2
Patch Set 5: Metric measures only first GSU event
Patch Set 6: Removing time to stabilise metric
Patch Set 7: Tidy-up of unneeded imports & whitespace
Total comments: 1
Patch Set 8: Renaming of new metric
Total comments: 2
Messages
Total messages: 38 (0 generated)
I've added two new metrics to help measure blink scheduling improvements:

1. time_for_input_latency_to_stabilize. This measures, in ms, how quickly the input latency falls to (below) the mean input latency - i.e. when a page is loading it is slow and user input events get badly delayed. This is a measure of how long the user needs to wait before they get a decent response from their input (e.g. scrolling). As and when we break up early monolithic tasks (e.g. v8 parsing) this metric will get lower.

2. input_event_latency_discrepancy_when_stable. Once (1) above knows when the input latency has fallen to a reasonable level we can then calculate the latency discrepancy, ignoring the early page-loading noise. This 'noise' currently overwhelms the business-as-usual latency variations. As we prioritize user input events (and other scheduling improvements) this metric will reduce.
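For readers following the thread, here is a minimal sketch of the stabilization idea being discussed (not the actual patch; the (start_time, latency) pair representation, the run length of 3, and the use of the mean are illustrative, and later patch sets switch to the 90th percentile before the metric is dropped entirely):

  import numpy

  def TimeForInputLatencyToStabilize(events, run_length=3):
    """Ms from the first event until `run_length` consecutive events fall
    below the mean latency; None if the latency never stabilizes."""
    events = sorted(events, key=lambda e: e[0])  # sort by start_time
    mean_latency = numpy.mean([latency for _, latency in events])
    first_time = events[0][0]
    run_start_times = []
    for start_time, latency in events:
      if latency >= mean_latency:
        run_start_times = []  # run broken; start counting again
        continue
      run_start_times.append(start_time)
      if len(run_start_times) >= run_length:
        return run_start_times[0] - first_time
    return None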
https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
File tools/telemetry/telemetry/util/statistics.py (right):

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:206: the mean stabilization time. This gives us a metric to measure how
Do you want the mean? I guess I'd figured you'd just use some fixed constant for expected latency time and then count until more than N ticks in a row are below the expected latency. Mean might work great. I just don't quite understand the reasoning behind mean, vs. median vs. constant vs. Nth percentile, etc. here.

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:217: for start_latency in start_latency_list:
Pythonistas might have you write: total = sum(zip(*start_latency_list)[1])

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:220: mean = total / len(start_latency_list)
Or just mean = numpy.mean(start_latency_list)

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:232: sorted_data = sorted(start_latency_list, compare_start_time)
sorted(start_latency_list, key=lambda a: a[0]) or key=operator.itemgetter(0) if you want to get more fancy. :) Then you don't even need your own compare_start_time.

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:240: Below_Mean_Run_Length = 3
Did you mean to start with a capital?

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:242: for event in sorted_data:
This loop feels like it could be much shorter. You're trying to find the first pair of events which are below the mean, correct?

  run_start_times = []
  for event in sorted_data:
    latency = event[1]
    # Oops, this event is above our threshold, start again.
    if latency >= mean:
      run_start_times = []
      continue
    if len(run_start_times) > search_number:
      return (run_start_times[0] - first_event_time)
    run_start_times.append(event[0])

is a re-write of your code, slightly shorter (your comments are probably helpful however).

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:250:
continue here, instead of the else, is probably clearer.

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:292: event_time = event[0]
It's lame that these events are tuples. Not your fault of course. But makes for very unreadable code.
https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
File tools/telemetry/telemetry/util/statistics.py (right):

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:206: the mean stabilization time. This gives us a metric to measure how
I selected mean as an easy to calculate value! I'll change this to use the 90th percentile as that better reflects the initial peak that we are trying to time.

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:217: for start_latency in start_latency_list:
This code has gone now that I'm using the 90th percentile.

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:220: mean = total / len(start_latency_list)
This code has gone now that I'm using the 90th percentile.

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:232: sorted_data = sorted(start_latency_list, compare_start_time)
On 2014/07/15 21:35:15, eseidel wrote:
> sorted(start_latency_list, key=lambda a: a[0]) or key=operator.itemgetter(0) if
> you want to get more fancy. :) Then you don't even need your own
> compare_start_time.
Done.

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:240: Below_Mean_Run_Length = 3
On 2014/07/15 21:35:15, eseidel wrote:
> Did you mean to start with a capital?
Done.

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:242: for event in sorted_data:
I like your suggestion, it's elegant! Done.

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:250:
On 2014/07/15 21:35:15, eseidel wrote:
> continue here, instead of the else is probably clearer.
Done.

https://codereview.chromium.org/397663002/diff/1/tools/telemetry/telemetry/ut...
tools/telemetry/telemetry/util/statistics.py:292: event_time = event[0]
Tuples *were* my fault, I was unhappy with this code for the same reason as you. I've not done much Python so didn't know better! I've subsequently learnt about namedtuples and I've changed this throughout.
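A minimal sketch of the kind of change being described here (the type and field names are illustrative, not necessarily those used in the actual patch):

  import collections
  import operator

  # Replacing bare (start_time, latency) tuples with a named type makes the
  # downstream code self-describing.
  InputEventDetails = collections.namedtuple('InputEventDetails',
                                             ['start_time', 'latency'])

  events = [InputEventDetails(start_time=12.0, latency=48.0),
            InputEventDetails(start_time=4.0, latency=180.0)]

  # eseidel's suggestion: sort with a key function instead of a custom
  # comparator.
  sorted_events = sorted(events, key=operator.attrgetter('start_time'))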
Addition of unit tests and a small fix revealed by unit-tests.

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
File tools/telemetry/telemetry/util/statistics.py (right):

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
tools/telemetry/telemetry/util/statistics.py:271:
The unit-tests revealed an assumption that the event list was already sorted by event start_time. The 3 lines updated below get the minimum start_time, rather than the [0]'th start time.
Seems OK to me. I don't have power in this directory. We'd have to check the OWNERS file to see who does.

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
File tools/telemetry/telemetry/util/statistics.py (right):

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
tools/telemetry/telemetry/util/statistics.py:227: key = lambda event:event.start_time)
I don't think there are normally = around argument setters, but then again I don't feel like I know chromium python style very well.
Rick, can you have a peek since you had some Strong Opinions about this on the design doc? :)
miletus@ should really weigh in on this since it changes and extends some of his input latency stuff.

I've been trying to figure out how this is different/better than simply reporting the latency of the first GSU event. Conceptually it's the latency of the first GSU that matters to the user - that's how long from the time they started dragging their finger to the time scrolling was free to run smoothly. Note that the trace viewer UI is deceiving here - it doesn't visualize the ORIGINAL_COMPONENT entries (which is what's used in telemetry to mark the start of the GSU) - you can quickly approximate this by looking at the total time of a matching pair of InputLatency::TouchMove and InputLatency::GestureScrollUpdate events. So in practice I don't think there's really a bunch of different input events whose latency we're waiting to stabilize, but exactly ONE important GSU event and potentially a whole bunch of related supporting events (TouchMoves, etc.) whose latency is all tied to the latency of this special GSU.

One important way your approach is better is that it can be used to capture page load jank for scenarios other than just GestureScroll. In particular, for pages that implement their own scrolling/dragging in javascript based on touchmove events. In that case we really do need some measure of all the different events. Is the 90th percentile measure really what we want though? What if the page load jank exceeds 10% of the total gesture time? Then we'd choose some relatively arbitrary point within the jank as the cutoff. Another option would be to report just the max input event latency (the number of high-latency events is already accounted for to some extent in mean_input_event_latency).

Are you planning on adding a new pageset specifically for measuring the blink scheduler improvements? Measurements of page load jank will be highly sensitive to the amount of delay we insert between page load and scroll start, and this isn't terribly predictable today (many page sets poll with a decaying delay until some condition is true before starting the gesture). I'm a little worried we'll get lots of noise in smoothness from this metric. We might need to reload a page ~10 times and take the average to get something remotely stable. Perhaps this needs to be a new measurement instead of part of smoothness?

Can you share some results from running our existing smoothness tests with this metric (maybe with a page_repeat of >= 10 so we can see how stable it is compared to our other metrics)?

Also your BUG= in the CL description appears to have the wrong bug number.

Anyway, if you've got data that shows this metric is useful (relatively stable between runs, anecdotally discriminates well sites we know to be problematic on load, etc.) then I'm OK with this CL. This is a tough problem and we shouldn't block progress on coming up with a perfect solution. But I think our choice of metric is very important to this work, and historically it's hard to come up with metrics that are this complex and still useful in practice.

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
File tools/telemetry/telemetry/util/statistics.py (right):

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
tools/telemetry/telemetry/util/statistics.py:218: latency_list = GetLatenciesAfterTime(event_details_list, 0)
nit: this would be shorter and clearer to me (and remove the need for a comment) if you just did the map inline rather than (ab)use your helper. Eg.
"latency_list = [e.latency for e in event_details_list]"

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
tools/telemetry/telemetry/util/statistics.py:260: """" Given a list of InputEventDetails tuples and an time return a
nit: s/an/a/

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
tools/telemetry/telemetry/util/statistics.py:278: for event_details in event_details_list:
I believe you can write these 4 lines more concisely in python with a list comprehension, eg:
return [e.latency for e in event_details if e.start_time >= absolute_stabilization_time]

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
File tools/telemetry/telemetry/web_perf/metrics/rendering_stats.py (right):

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
tools/telemetry/telemetry/web_perf/metrics/rendering_stats.py:57: def FindEventStartAndEnd(event):
nit: This function is very specific to input events / LatencyInfo. Maybe call it 'FindInputEventStartAndEnd' or even just 'FindInputEventBounds'.
My concern is that time_for_input_latency_to_stabilize is not a very straightforward metric and one has to think about what it tries to capture and why we choose this metric. If I were to do it, I would probably just go with something like the average of first 3 input events' latency. But I don't mean to block this one. I would love to see how this metric value changes/improves when we have a real CL that improves the main thread input responsiveness. On 2014/07/22 21:04:29, Rick Byers wrote: > miletus@ should really weight in on this since it changes and extends some of > his input latency stuff. > > I've been trying to figure out how this is different/better than simply > reporting the latency of the first GSU event. Conceptually it's the latency of > the first GSU that matters to the user - that's how long from the time they > started dragging their finger to the time scrolling was free to run smoothly. > Note that the trace viewer UI is deceiving here - it doesn't visualize the > ORIGINAL_COMPONENT entries (which is what's used in telemetry to mark the start > of the GSU) - you can quickly approximate this by looking at the total time of a > matching pair of InputLatency::TouchMove and InputLatency::GestureScrollUpdate > events. So in practice I don't think there's really a bunch of different input > events whose latency we're waiting to stabilize, but exactly ONE important GSU > event and potentially a whole bunch of related supporting events (TouchMoves, > etc.) whose latency are all related to the latency of this special GSU. > > One important way your approach is better is that it can be used to capture page > load jank for scenarios other than just GestureScroll. In particular, for pages > that implement their own scrolling/dragging in javascript based on touchmove > events. In that case we really do need some measure of the all the different > events. Is the 90th percentile measure really what we want though? What if the > page load jank exceeds 10% of the total gesture time, then we'd choose some > relatively arbitrary point within the jank as the cutoff. Another option would > be to report just the max input event latency (the number of high-latency events > is already accounted for to some extent in mean_input_event_latency). > > Are you planning on adding a new pageset specifically for measuring the blink > scheduler improvements? Measurements of page load jank will be highly sensitive > to the amount of delay we insert between page load and scroll start, and this > isn't terribly predictable today (many page sets poll with a decaying delay > until some condition is true before starting the gesture). I'm a little worried > we'll get lots of noise in smoothness from this metric. We might need to reload > a page ~10 times and take the average to get something remotely stable. Perhaps > this needs to be a new measurement instead of part of smoothness? > > Can you share some results from running our existing smoothness tests with this > metric (maybe with a page_repeat of >= 10 so we can see how stable it is > compared to our other metrics)? > > Also your BUG= in the CL description appears to have the wrong bug number. > > Anyway if you've got data that shows this metric is useful (relatively stable > between runs, anecdotally discriminates well sites we know to be problematic on > load, etc.) then I'm OK with this CL. This is a tough problem and we shouldn't > block progress on coming up with a perfect solution. 
But I think our choice of > metric is very important to this work, and historically it's hard to come up > with metrics that are this complex and still useful in practice. > > https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr... > File tools/telemetry/telemetry/util/statistics.py (right): > > https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr... > tools/telemetry/telemetry/util/statistics.py:218: latency_list = > GetLatenciesAfterTime(event_details_list, 0) > nit: this would be shorter and clearer to me (and remove the need for a comment) > if you just did the map inline rather than (ab)use your helper. Eg. > "latency_list = [e.latency for e in event_details_list]" > > https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr... > tools/telemetry/telemetry/util/statistics.py:260: """" Given a list of > InputEventDetails tuples and an time return a > nit: s/an/a/ > > https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr... > tools/telemetry/telemetry/util/statistics.py:278: for event_details in > event_details_list: > I believe you can write these 4 lines more concisely in python with list > comprehension, eg: > > return [e.latency for e in event_details if e.start_time >= > absolute_stabilization_time] > > https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr... > File tools/telemetry/telemetry/web_perf/metrics/rendering_stats.py (right): > > https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr... > tools/telemetry/telemetry/web_perf/metrics/rendering_stats.py:57: def > FindEventStartAndEnd(event): > nit: This function is very specific to input events / LatencyInfo. Maybe call > it 'FindInputEventStartAndEnd' or even just 'FindInputEventBounds'.
On 2014/07/22 21:33:40, Yufeng Shen wrote: > My concern is that time_for_input_latency_to_stabilize is not a very > straightforward > metric and one has to think about what it tries to capture and why we choose > this > metric. If I were to do it, I would probably just go with something like the > average > of first 3 input events' latency. > > But I don't mean to block this one. I would love to see how this metric value > changes/improves > when we have a real CL that improves the main thread input responsiveness. Would it hurt to include both the stabilize time as well as the GSU response time (or even the average of the first 3 events as you said?)
Thanks for all the feedback!

I'll make the code changes suggested by Rick.
I'll add another metric to capture the average of the first 3 input events' latency.
I'll gather some metrics on good vs bad sites to show consistency and that the metric reflects the differing UX of these sites.

This metric is different from reporting just the first latency as there is a continuous period during which the input events all get delayed; this extended period of slow responsiveness lasts longer than a single event's latency as some input events "sneak through the cracks" and will get processed during the slow period. When I've added the new first_event_latency we'll be able to see if the two metrics are correlated enough to make one of them redundant.
On 2014/07/23 10:31:04, picksi1 wrote:
> Thanks for all the feedback!
> I'll make the code changes suggested by Rick
> I'll add another metric to capture the average of the first 3 input event's latency
> I'll gather some metrics on good Vs bad sites to show consistency and that the metric reflects the differing UX of these sites

Perfect, it's this latter step I think is most valuable - we can debate forever about what should be the right approach, but if you can show data for any metric that's stable and does a good job discriminating the cases we care about then I'm happy :-)

> This metric is different from reporting just the first latency as there is a continuous period during which the input events all get delayed, this extended period of slow responsiveness lasts longer than a single event's latency as some input events "sneak through the cracks" and will get processed during the slow-period.

Are we talking just about a single scroll gesture (as typically done by our smoothness measurement) here? If so then what you describe should be impossible. Once we get an ACK back for the touchstart and first touchmove (which are both prerequisites for sending the first GestureScrollUpdate), then the scroll should be running freely on impl without blocking on blink at all. Can you show me a smoothness trace with a "delayed" input event that happens AFTER the first GestureScrollUpdate? If we can find ANY examples of that, then we've either got a nasty bug we need to fix, or we're just suffering from core contention (but if that exists, it's probably just a bug somewhere). My "slow_handler" test in tough_scheduling_cases.py is designed to test exactly that and so should be an idealized test for your metric (there's a large synthetic delay enabled for blink event handling, but the scroll runs very quickly after starting). You can see a synthetic demo of this at http://www.rbyers.net/janky-touch-scroll.html with "do lots of work on the main thread" and "enable empty touchmove handler" enabled - once scrolling starts it's very smooth even though the blink thread is badly blocked.

If you're looking at manual traces (as opposed to ones generated by telemetry) then there are some ways you can see something like what you describe:
- Attempting multiple scrolls (a valid use case, but not something we're currently testing in telemetry AFAIK)
- Hitting the limits during a scroll (eg. scrolling a little in the opposite direction at the start)
If these are the scenarios you're concerned with then we'll need to create a new PageAction and new/updated page sets to cover these scenarios.

> When I've added the new first_event_latency we'll be able to see if the two metrics are correlated enough to make one of them redundant.

Note that first_event_latency will need to be careful to look at the right event. The first couple touchmove events in a sequence may be uninteresting (if they didn't exceed the slop distance then they won't be forwarded to the renderer and so have a very low latency). There's also some other events (eg. GestureFlingCancel, GestureShowPress, GestureTapCancel) that may not always be forwarded to blink and so could confuse a naive 'average of first 3 input event' measurement. I'd suggest looking specifically at only the first GestureScrollUpdate event - that's the one that determines how long it takes for scrolling to start.
Thanks for this, there's lots of useful information here! I hadn't appreciated the way that GestureScrollUpdate worked, so I might have been talking nonsense. I have been doing a lot of manual traces with rapid changes in direction of scroll, which have no doubt colored my vision as to what is going on under-the-hood. Creating a new PageAction that has some abrupt scroll changes sounds like a good next step, as big latencies on these are very obvious and annoying as a user. I'll do as you suggest and produce a metric that measures the average of the first 3 GestureScrollUpdate events.
On 2014/07/23 14:22:21, Rick Byers wrote: > On 2014/07/23 10:31:04, picksi1 wrote: > > Thanks for all the feedback! > > > > I'll make the code changes suggested by Rick > > I'll add another metric to capture the average of the first 3 input event's > > latency > > I'll gather some metrics on good Vs bad sites to show consistency and that the > > metric reflects the differing UX of these sites > > Perfect, it's this latter step I think is most valuable - we can debate forever > about what should be the right approach, but if you can show data for any metric > that's stable and does a good job discriminating the cases we care about then > I'm happy :-) > > > This metric is different from reporting just the first latency as there is a > > continuous period during which the input events all get delayed, this extended > > period of slow responsiveness lasts longer than a single event's latency as > some > > input events "sneak through the cracks" and will get processed during the > > slow-period. > > Are we talking just about a single scroll gesture (as typically done by our > smoothness measurement) here? If so then what you describe should be > impossible. Once we get an ACK back for the touchstart and first touchmove > (which are both pre-requisities for sending the first GestureScrollUpdate), then > the scroll should be running freely on impl without blocking on blink at all. > Can you show me a smoothness trace with a "delayed" input event that happens > AFTER the first GestureScrollUpdate? If we can find ANY examples of that, then > we've either got a nasty bug we need to fix, or we're just suffering from core > contention (which the bothat exists,it's probably just a bug somewhere. My > "slow_handler" test in tough_scheduling_cases.py is designed to test exactly > that and so should be an idealized test for your metric (there's a large > synthetic delay enabled for blink event handling, but the scroll runs very > quickly after starting). You can see a synthetic demo of this at > http://www.rbyers.net/janky-touch-scroll.html with "do lots of work on the main > thread" and "enable empty touchmove handler" enabled - once scrolling starts > it's very smooth even though the blink thread is badly blocked. > > If you're looking at manual traces (as opposed to ones generated by telemetry) > then there are some ways you can see something like what you describe: > - Attempting multiple scrolls (a valid use case, but not something we're > currently testing in telemetry AFAIK) > - Hitting the limits during a scroll (eg. scrolling a little in the opposite > direction at the start) > If these are the scenarios you're concerned with then we'll need to create a new > PageAction and new/updated page sets to cover these scenarios. > > > When I've added the new first_event_latency we'll be able to see if > > the two metrics are correlated enough to make one of them redundant. > > Note that first_event_latency will need to be careful to look at the right > event. The first couple touchmove events in a sequence may be uninteresting (if > they didn't exceed the slop distance then they won't be forwarded to the > renderer and so have a very low latency). There's also some other events (eg. > GestureFlingCancel, GestureShowPress, GestureTapCancel) that may not always be > forwarded to blink and so could confuse a naive 'average of first 3 input event' > measurement. 
> I'd suggest looking specifically at only the first GestureScrollUpdate event - that's the one that determines how long it takes for scrolling to start.

Sorry, I wasn't making it clear that when I say "first 3 input events' latency" I meant the latency for the first 3 input events that have caused a swap buffer. If it is an impl scroll page, then it would be the first 3 GestureScrollUpdates. If the page has a touch handler which does real work, then it is the first 3 touch move latencies (or touch begin latency + 2 touch move latencies).
On 2014/07/23 15:29:28, picksi1 wrote: > Thanks for this, there's lots of useful information here! I hadn't appreciated > the way that GestureScrollUpdate worked, so I might have been talking nonsense. > I have been doing a lot of manual traces with rapid changes in direction of > scroll, which have no doubt colored my vision as to what is going on > under-the-hood. Creating a new PageAction that has some abrupt scroll changes > sounds like a good next step, as big latencies on these are very obvious and > annoying as a user. When you say "rapid changing of direction" you mean within one scroll gesture (i.e. without lifting your finger) without hitting the top/bottom of the page? An automated test for that also wouldn't cause any blocking on blink. We block on blink only at the start of the scroll and when a scroll update cannot be performed because we've reached the end (in any axis) of the document. If we want to focus on a scenario beyond just "how long did a single scroll take to start" I'd suggest maybe a series of multiple short/fast scroll gestures (scroll down a bit, lift, scroll down a bit more, lift, etc.). That's a fairly reasonable user scenario, gives us multiple data points within one page load, etc... > I'll do as you suggest and produce a metric that measure the average of the > first 3 GestureScrollUpdate events.
I've added a metric to report the average of the first 3 GSU events. I'll gather and post some metrics to show it in action.

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
File tools/telemetry/telemetry/util/statistics.py (right):

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
tools/telemetry/telemetry/util/statistics.py:218: latency_list = GetLatenciesAfterTime(event_details_list, 0)
On 2014/07/22 21:04:29, Rick Byers wrote:
> nit: this would be shorter and clearer to me (and remove the need for a comment)
> if you just did the map inline rather than (ab)use your helper. Eg.
> "latency_list = [e.latency for e in event_details_list]"
Done.

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
tools/telemetry/telemetry/util/statistics.py:227: key = lambda event:event.start_time)
I'm not sure what you mean here? Please clarify.

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
tools/telemetry/telemetry/util/statistics.py:260: """" Given a list of InputEventDetails tuples and an time return a
On 2014/07/22 21:04:29, Rick Byers wrote:
> nit: s/an/a/
Done.

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
tools/telemetry/telemetry/util/statistics.py:271:
On 2014/07/21 14:06:31, picksi wrote:
> The unit-tests revealed an assumption that the event list was already sorted by
> event start_time. The 3 lines updated below get the minimum start_time, rather
> than the [0]'th start time.
Done.

https://codereview.chromium.org/397663002/diff/40001/tools/telemetry/telemetr...
tools/telemetry/telemetry/util/statistics.py:278: for event_details in event_details_list:
Done. This is at the top end of complexity allowed by the style guide but it does make the code clearer.

https://codereview.chromium.org/397663002/diff/60001/tools/telemetry/telemetr...
File tools/telemetry/telemetry/web_perf/metrics/rendering_stats_unittest.py (right):

https://codereview.chromium.org/397663002/diff/60001/tools/telemetry/telemetr...
tools/telemetry/telemetry/web_perf/metrics/rendering_stats_unittest.py:15:
I added GetInputEventStartAndLatency (below) which ended up being too long a line, so decided to refactor the imports into a list. There seems to be no guidance on this in the Style Guide...

https://codereview.chromium.org/397663002/diff/60001/tools/telemetry/telemetr...
tools/telemetry/telemetry/web_perf/metrics/rendering_stats_unittest.py:209:
Due to a change in rendering_stats.py the code now does (in effect) (x/1000) + (y/1000) instead of (x+y)/1000. This resulted in differences around the 12th decimal place for lists containing latencies (due to rounding errors), which resulted in asserts firing. I added the following helper to fix this - I'm not sure that it's safe directly comparing floats anyway.
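A rough sketch of the kind of tolerance-based helper being described (the method name, tolerance, and list-of-floats shape are illustrative; the actual helper in the patch may differ):

  import unittest

  class LatencyListAssertions(unittest.TestCase):
    def assertLatencyListsAlmostEqual(self, expected, actual, places=7):
      # Computing (x/1000) + (y/1000) rather than (x+y)/1000 introduces tiny
      # floating-point differences, so compare with a tolerance instead of
      # exact equality.
      self.assertEqual(len(expected), len(actual))
      for e, a in zip(expected, actual):
        self.assertAlmostEqual(e, a, places=places)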
I've run smoothness tests with a repeat of 10, the results are here: https://x20web.corp.google.com/~picksi/results.html I don't know which pages are known to be bad and which good, so I can't judge the effectiveness of the metric. I do note, however, that sports.yahoo.com has the highest input_event_latency_discrepancy and also has the highest time_for_input_latency_to_stabilize and highest first_3_GSU_events_latency_mean, so there does appear to be some reasonable correlation.
how does this relate to frame queueing durations metric?
+ariblue --- ari, you and @picksi should probably become good friends ^_^
On 2014/07/28 14:18:28, picksi1 wrote: > I've run smoothness tests with a repeat of 10, the results are here: > > https://x20web.corp.google.com/~picksi/results.html > > I don't know which pages are known to be bad and which good, so I can't judge > the effectiveness of the metric. I do note, however, that http://sports.yahoo.com has > the highest input_event_latency_discrepancy and also has the highest > time_for_input_latency_to_stabilize and highest first_3_GSU_events_latency_mean, > so there does appear to be some reasonable correlation. Thanks Simon. They both seem rather noisy - notice that many are +- >100%, compared to mean_input_event_latency where the variation across runs was <30%. first_3_GSU_events_latency does seem substantially less noisy than time_for_input_latency_to_stabilize. I agree there's some decent correlation between the two though. What if you look only at the first GSU event latency (instead of averaging the first 3) - that should be a stronger discriminator. Yufeng, were you suggesting the first 3 input events that caused a swap so that we can handle both scroll and touchmove cases? If he's looking just at GSU events, then looking only at the first one should be best, right? Alternately if we want this to be agnostic (work for touch-scroll, mouse, and touch-drag) then I agree something like the average of the first 3 swapping events could work. There's some bugs ('JankyDueToBlinkScheduler' or whatever) with specific problematic sites that are motivating the scheduling work. Those were the "known bad" sites I was talking about. Also the synthetic test cases in the touch_scheduler_cases pageset - the "slow touch handler" cases there are synthetic worst-case for scroll starting. I suspect both of your heuristics should do a good job discriminating against those. Can you post the results against touch_scheduler_cases?
On 2014/07/29 19:56:51, Rick Byers wrote: > On 2014/07/28 14:18:28, picksi1 wrote: > > I've run smoothness tests with a repeat of 10, the results are here: > > > > https://x20web.corp.google.com/~picksi/results.html > > > > I don't know which pages are known to be bad and which good, so I can't judge > > the effectiveness of the metric. I do note, however, that > http://sports.yahoo.com has > > the highest input_event_latency_discrepancy and also has the highest > > time_for_input_latency_to_stabilize and highest > first_3_GSU_events_latency_mean, > > so there does appear to be some reasonable correlation. > > Thanks Simon. They both seem rather noisy - notice that many are +- >100%, > compared to mean_input_event_latency where the variation across runs was <30%. > first_3_GSU_events_latency does seem substantially less noisy than > time_for_input_latency_to_stabilize. I agree there's some decent correlation > between the two though. What if you look only at the first GSU event latency > (instead of averaging the first 3) - that should be a stronger discriminator. > > Yufeng, were you suggesting the first 3 input events that caused a swap so that > we can handle both scroll and touchmove cases? If he's looking just at GSU > events, then looking only at the first one should be best, right? Alternately > if we want this to be agnostic (work for touch-scroll, mouse, and touch-drag) > then I agree something like the average of the first 3 swapping events could > work. > My reasoning behind "1/3 input/GSU events that cause swap": 1) 1 vs 3: this is intended to accommodate the unpredictability of main thread's work. If we can repro consistently the timing in our tests that the first event is blocked by predictable main thread work, then 1 event is ok. But, if we can't consistently repro that, e.g. main thread happens to be idle, first event comes and gets handled, then main thread becomes busy again and blocks the next event, then in this case I think choosing 3 events is likely more robust in capturing the case of main thread blockage. 2) input vs GSU : if we are only focusing on scrolling, then looking only at GSU is fine. If we also care about say touch handling in main thread, then we should be more generic in choosing any input event that cause swap. > There's some bugs ('JankyDueToBlinkScheduler' or whatever) with specific > problematic sites that are motivating the scheduling work. Those were the > "known bad" sites I was talking about. Also the synthetic test cases in the > touch_scheduler_cases pageset - the "slow touch handler" cases there are > synthetic worst-case for scroll starting. I suspect both of your heuristics > should do a good job discriminating against those. Can you post the results > against touch_scheduler_cases?
On 2014/07/29 20:15:30, Yufeng Shen wrote: > On 2014/07/29 19:56:51, Rick Byers wrote: > > On 2014/07/28 14:18:28, picksi1 wrote: > > > I've run smoothness tests with a repeat of 10, the results are here: > > > > > > https://x20web.corp.google.com/~picksi/results.html > > > > > > I don't know which pages are known to be bad and which good, so I can't > judge > > > the effectiveness of the metric. I do note, however, that > > http://sports.yahoo.com has > > > the highest input_event_latency_discrepancy and also has the highest > > > time_for_input_latency_to_stabilize and highest > > first_3_GSU_events_latency_mean, > > > so there does appear to be some reasonable correlation. > > > > Thanks Simon. They both seem rather noisy - notice that many are +- >100%, > > compared to mean_input_event_latency where the variation across runs was <30%. > > > first_3_GSU_events_latency does seem substantially less noisy than > > time_for_input_latency_to_stabilize. I agree there's some decent correlation > > between the two though. What if you look only at the first GSU event latency > > (instead of averaging the first 3) - that should be a stronger discriminator. > > > > Yufeng, were you suggesting the first 3 input events that caused a swap so > that > > we can handle both scroll and touchmove cases? If he's looking just at GSU > > events, then looking only at the first one should be best, right? Alternately > > if we want this to be agnostic (work for touch-scroll, mouse, and touch-drag) > > then I agree something like the average of the first 3 swapping events could > > work. > > > > My reasoning behind "1/3 input/GSU events that cause swap": > > 1) 1 vs 3: this is intended to accommodate the unpredictability of main thread's > work. If we can repro consistently the timing in our tests that the first event > is blocked by predictable main thread work, then 1 event is ok. But, if we can't > consistently repro that, e.g. main thread happens to be idle, first event comes > and gets handled, then main thread becomes busy again and blocks the next event, > then in this case I think choosing 3 events is likely more robust in capturing > the case of main thread blockage. > OK, Rick corrected me on this offline. For impl scrolling, once the first GSU starts, then main thread will no longer block the scrolling since we handle the following GSUs in the impl thread. The reason we see not only just the first GSU, but the first few GSUs all have large input latency is because we coalesce multiple touch moves in the browser and only send one coalesced touch move to the renderer. If the main thread is busy, the ack for the coalesced touch move gets delayed and so all the individual touch move gets delayed ack. And when those touch moves turn into GSUs, those GSUs will have large input latency. > 2) input vs GSU : if we are only focusing on scrolling, then looking only at GSU > is fine. If we also care about say touch handling in main thread, then we should > be more generic in choosing any input event that cause swap. > > > > There's some bugs ('JankyDueToBlinkScheduler' or whatever) with specific > > problematic sites that are motivating the scheduling work. Those were the > > "known bad" sites I was talking about. Also the synthetic test cases in the > > touch_scheduler_cases pageset - the "slow touch handler" cases there are > > synthetic worst-case for scroll starting. I suspect both of your heuristics > > should do a good job discriminating against those. 
Can you post the results > > against touch_scheduler_cases?
I've run tough_scheduling_cases with a repeat of 10 and have put the results here:
https://x20web.corp.google.com/~picksi/GSU1and3ToughSchedulingCases.html

I measured both the 1st GSU latency (74.34ms) and the average of the first 3 GSUs' latency (56.25ms) to see if there was a significant difference (it looks like the 1st GSU has a longer latency, which is unsurprising). I've updated the code to report just the 1st GSU for now, but have left the code that calculates the average in there (it calculates an average of one event at the moment!) in case we need to return to an average at some point... I can remove this if you think it's bloating the code with "YAGNI" work.
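For readers skimming the thread, a minimal sketch of what "report just the 1st GSU" amounts to (assuming a list of per-event records with start_time and latency fields, as in the earlier namedtuple sketch; names are illustrative):

  def FirstGestureScrollUpdateLatency(gsu_events):
    """Latency of the earliest GestureScrollUpdate, or None if there were none.

    gsu_events is assumed to hold the InputLatency::GestureScrollUpdate
    records for a single interaction.
    """
    if not gsu_events:
      return None
    first = min(gsu_events, key=lambda e: e.start_time)
    return first.latency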
any response to #17?
Queuing_duration is an average of how long input events spend waiting in a task queue before they get processed (something we want the blink scheduler to drive down). Latency is the time taken from when an input event is received to when its effect is visible to the user. Both these metrics are averages across the frame's lifetime (there are others on this thread more knowledgeable than me about this, please correct me if I'm wrong!)

The new metrics I'm adding are based on the 'early' latency data. Eric was keen to unblock scrolling as soon as possible when a new page is loaded. These metrics are aiming to quantify that delay so we have some way of assessing if our scheduling work is improving the situation, i.e. giving control to the user sooner.

As part of another related CL, Dominik has moved the start of data collection earlier (https://codereview.chromium.org/392613002/) - without this patch the first second or so of a page's life goes untracked, which hides some of the early unresponsiveness.
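As a rough illustration of the distinction between the two measures (purely hypothetical timestamps and field names, not telemetry's actual data model):

  # Each record: when the event arrived, when its handler started running,
  # and when its effect reached the screen (all in ms).
  events = [
      {'arrived': 0.0,  'handler_start': 90.0, 'visible': 120.0},
      {'arrived': 16.0, 'handler_start': 95.0, 'visible': 125.0},
  ]

  # Queueing duration: time spent waiting in the task queue before processing.
  mean_queueing = sum(e['handler_start'] - e['arrived'] for e in events) / len(events)

  # Input event latency: from arrival until the effect is visible to the user.
  mean_latency = sum(e['visible'] - e['arrived'] for e in events) / len(events)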
On 2014/07/30 14:46:57, picksi1 wrote: > I've run tough_scheduling_cases with a repeat of 10 and have put the results > here: > https://x20web.corp.google.com/~picksi/GSU1and3ToughSchedulingCases.html > > I measured both the 1st GSU latency and (74.34ms) the average of the first 3 > GSUs latency (56.25ms) to see if there was a significant difference (it looks > like the 1st GSU has a longer latency, which is unsurprising). I've updated the > code to report just the 1st GSU for now, but have left the code that calculates > the average in there (it calculates an average of one event at the moment!) in > case we need to return to an average at some point... I can remove this if you > think it's bloating the code with "YAGNI" work. The 1st GSU latency sounds fantastic. Don't have strong opinions about the averaging code but if it's unused probably best to remove it. lgtm as a non-owner who is not super familiar with this code and doesn't have authority here but nonetheless wants to see these metrics go live.
Looking at the tough_scheduling_cases results, there are some interesting points:

1) Look at first_1_GSU_events_latency_mean.empty_touch_handler.html?slow_handler. That reports almost exactly 600ms, which is the latency we've added in this test (each handleInputEvent is delayed by 200ms and there's touchstart, gesturetapdown, and touchmove before the scroll can start). There's also essentially no variation between runs (~1%). So this is doing a very good job of capturing the user's perception of how long the scroll takes to start in this simplest possible case.

2) time_for_input_latency_to_stabilize.empty_touch_handler.html?slow_handler also seems to be doing a good job (630ms +-4%).

3) As expected, first_n_GSU_events_latency doesn't capture touch event handling cases like touch_handler_scrolling.html - it just reports 0. This is fine - we reliably get no such signals from such pages, but that shouldn't mess anything up. And mean_input_event_latency is the right metric for such a page when the jank is steady.

4) time_for_input_latency_to_stabilize also doesn't seem to reliably capture these cases (since the latency is consistent throughout the sequence, there is no 'stabilize'), so the results are meaningless and very noisy. I'm worried that if we add this metric we won't be able to alert on it reliably due to cases like this. Perhaps it should report 0 if there's not enough variation in the latency? Check out touch_handler_scrolling.html?slow_handler and super_slow_handler - as far as I can tell the result is meaningless here with tons of variability.

5) queueing_durations.empty_touch_handler.html?slow_handler also appears to capture the jank on the thread on average, but this isn't directly correlated to the user experience of scrolling because most of the jank does not block the scroll. It's also extremely noisy for some reason - it seems randomly to be either 0, 200, 400 or 600ms.

6) first_3_GSU_latency shows similar patterns but the effect is muted (as a result of being averaged with 2 low-latency GSUs). I don't see any benefit of this over first_1_GSU_latency.

Taken together this suggests to me that we should be focusing on the first GSU latency (although I'd suggest simplifying the name to just gesture_scroll_start_latency). I'd suggest landing that first and watching its performance on the bots. If we can figure out how to eliminate noise from time_for_input_latency_to_stabilize when it's not meaningful then adding that would help give coverage for touch event handling cases during page load. I'm not sure this adds enough value on top of the other metrics to justify the complexity though.

Rick
I have removed time to stabilize and renamed the GSU metric as you suggested.
Thanks Simon, this LGTM (with nit). Also please update the CL description.

https://codereview.chromium.org/397663002/diff/120001/tools/telemetry/telemet...
File tools/telemetry/telemetry/web_perf/metrics/smoothness.py (right):

https://codereview.chromium.org/397663002/diff/120001/tools/telemetry/telemet...
tools/telemetry/telemetry/web_perf/metrics/smoothness.py:47: 'gesture_scroll_latency',
nit: you want something like gesture_scroll_start_latency. It's the "start" that's really unique about this metric.
On 2014/08/01 12:42:02, Rick Byers wrote: > Thanks Simon, this LGTM (with nit). > > Also please update the CL description. > > https://codereview.chromium.org/397663002/diff/120001/tools/telemetry/telemet... > File tools/telemetry/telemetry/web_perf/metrics/smoothness.py (right): > > https://codereview.chromium.org/397663002/diff/120001/tools/telemetry/telemet... > tools/telemetry/telemetry/web_perf/metrics/smoothness.py:47: > 'gesture_scroll_latency', > nit: you want something like gesture_scroll_start_latency. It's the "start" > that's really unique about this metric. I've called it 'first_gesture_scroll_latency', as that's what it is being reported!
... and I've changed the Issue name & CL description
On 2014/08/01 16:44:10, picksi1 wrote: > ... and I've changed the Issue name & CL description Thanks, still LGTM
On 2014/08/01 at 17:48:49, rbyers wrote: > On 2014/08/01 16:44:10, picksi1 wrote: > > ... and I've changed the Issue name & CL description > > Thanks, still LGTM
Er, i'd like to clarify something here: i'm not a drive-by reviewer on this. I'd like to lg this before it lands.

As it stands, I'm not happy with the lack of cohesion between this and the queueing metrics introduced by ari. I'm also not wild about the ad-hoc solution to quiescence. As a benchmark in isolation of all the rest of our benchmark and metric code, this seems good. But when you view it in the total-set-of-metrics-and-benchmarks-we-collect around smoothness, this brings up some interesting challenges that we should pro-actively discuss instead of gloss over.

As such, I'd encourage us to split this patch into two parts:

1. Figuring out how to land a metric for input event queueing delay. That part of the patch would discuss how input queueing delays relate to begin frame delays, and whether we should have some more generic notion of main-thread-responsiveness-to-rendering-intent. That part of the patch can probably proceed rather quickly, as I suspect proper unification of input and frame production requires https://docs.google.com/a/chromium.org/document/d/13xtO-_NSSnNZRRS1Xq3xGNKZaw...

2. Figuring out how to report some types of metrics for before-and-after quiescence, and having a very pro-active discussion about what this is relative to the load and domcontentready and network idle states, which are the typical indicators of such a state. fps-before-quiescence, thread time before quiescence, etc. are all very interesting. I don't think every single one of those metrics should have to do one-off code in order to support such a distinction.
ah, @$@)(#*, i didn't see you removed the quiescence stuff. I really like that btw, let's talk about how to add that as a core concept in telemetry throughout, it's important stuff. apologies for replying before reading the current patch. you need unit tests for this. then i can lg.
https://codereview.chromium.org/397663002/diff/140001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/metrics/smoothness.py (right): https://codereview.chromium.org/397663002/diff/140001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/metrics/smoothness.py:38: input_start_and_latency_GSU = FlattenList(stats.input_start_and_latency_GSU) specifically should we have tests for the smoothness metric part? or do we not currently unit test the integration point?
https://codereview.chromium.org/397663002/diff/140001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/metrics/smoothness.py (right): https://codereview.chromium.org/397663002/diff/140001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/metrics/smoothness.py:38: input_start_and_latency_GSU = FlattenList(stats.input_start_and_latency_GSU) On 2014/08/01 19:21:26, nduca wrote: > specifically should we have tests for the smoothness metric part? or do we not > currently unit test the integration point? Hmm, looks like the smoothness metric lacks unit tests. Coverage would be nice, maybe in a follow-up patch?
On 2014/08/09 01:58:50, jdduke wrote: > https://codereview.chromium.org/397663002/diff/140001/tools/telemetry/telemet... > File tools/telemetry/telemetry/web_perf/metrics/smoothness.py (right): > > https://codereview.chromium.org/397663002/diff/140001/tools/telemetry/telemet... > tools/telemetry/telemetry/web_perf/metrics/smoothness.py:38: > input_start_and_latency_GSU = FlattenList(stats.input_start_and_latency_GSU) > On 2014/08/01 19:21:26, nduca wrote: > > specifically should we have tests for the smoothness metric part? or do we not > > currently unit test the integration point? > > Hmm, looks like the smoothness metric lacks unit tests. Coverage would be nice, > maybe in a follow-up patch? Looks like the bug id is incorrect.
On 2014/08/12 18:55:14, tdresser wrote: > On 2014/08/09 01:58:50, jdduke wrote: > > > https://codereview.chromium.org/397663002/diff/140001/tools/telemetry/telemet... > > File tools/telemetry/telemetry/web_perf/metrics/smoothness.py (right): > > > > > https://codereview.chromium.org/397663002/diff/140001/tools/telemetry/telemet... > > tools/telemetry/telemetry/web_perf/metrics/smoothness.py:38: > > input_start_and_latency_GSU = FlattenList(stats.input_start_and_latency_GSU) > > On 2014/08/01 19:21:26, nduca wrote: > > > specifically should we have tests for the smoothness metric part? or do we > not > > > currently unit test the integration point? > > > > Hmm, looks like the smoothness metric lacks unit tests. Coverage would be > nice, > > maybe in a follow-up patch? > > Looks like the bug id is incorrect. FYI, I've moved this patch over to https://codereview.chromium.org/467343002/.