Issue 2873223002: Record resource scheduler UMA

tbansal1

The CQ bit was checked by tbansal@chromium.org to run a CQ dry run

3 years, 7 months ago (2017-05-10 02:09:59 UTC) #1

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2873223002/1

3 years, 7 months ago (2017-05-10 02:11:11 UTC) #2

tbansal1

Description was changed from ========== Record UMA BUG= ========== to ========== Record resource scheduler UMA ...

3 years, 7 months ago (2017-05-10 02:11:13 UTC) #3

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 7 months ago (2017-05-10 03:59:15 UTC) #4

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_rel_ng/builds/422550)

3 years, 7 months ago (2017-05-10 03:59:16 UTC) #5

tbansal1

The CQ bit was checked by tbansal@chromium.org to run a CQ dry run

3 years, 7 months ago (2017-05-10 17:03:02 UTC) #6

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2873223002/1

3 years, 7 months ago (2017-05-10 17:05:01 UTC) #7

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 7 months ago (2017-05-10 18:42:41 UTC) #8

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

3 years, 7 months ago (2017-05-10 18:42:42 UTC) #9

tbansal1

The CQ bit was checked by tbansal@chromium.org to run a CQ dry run

3 years, 7 months ago (2017-05-10 19:55:01 UTC) #11

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2873223002/20001

3 years, 7 months ago (2017-05-10 19:55:59 UTC) #12

tbansal1

Description was changed from ========== Record resource scheduler UMA BUG= ========== to ========== Record resource ...

3 years, 7 months ago (2017-05-10 21:09:07 UTC) #13

tbansal1

tbansal@chromium.org changed reviewers: + rdsmith@chromium.org

3 years, 7 months ago (2017-05-10 21:09:40 UTC) #14

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 7 months ago (2017-05-10 21:58:04 UTC) #16

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

3 years, 7 months ago (2017-05-10 21:58:05 UTC) #17

tbansal1

The CQ bit was checked by tbansal@chromium.org to run a CQ dry run

3 years, 7 months ago (2017-05-18 20:18:56 UTC) #20

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2873223002/80001

3 years, 7 months ago (2017-05-18 20:20:33 UTC) #21

tbansal1

Description was changed from ========== Record resource scheduler UMA Record UMA on how many requests ...

3 years, 7 months ago (2017-05-18 21:17:58 UTC) #23

Randy Smith (Not in Mondays)

https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader/resource_scheduler.cc File content/browser/loader/resource_scheduler.cc (right): https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader/resource_scheduler.cc#newcode349 content/browser/loader/resource_scheduler.cc:349: // in-flight. I think the comment's out of date? ...

3 years, 7 months ago (2017-05-21 21:43:56 UTC) #25

tbansal1

rdsmith, thanks for the comments. ptal. https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader/resource_scheduler.cc File content/browser/loader/resource_scheduler.cc (right): https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader/resource_scheduler.cc#newcode349 content/browser/loader/resource_scheduler.cc:349: // in-flight. On ...

3 years, 6 months ago (2017-05-25 01:07:47 UTC) #26

rdsmith, thanks for the comments. ptal.

https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader...
File content/browser/loader/resource_scheduler.cc (right):

https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader...
content/browser/loader/resource_scheduler.cc:349: // in-flight.
On 2017/05/21 21:43:56, Randy Smith (Not in Mondays) wrote:
> I think the comment's out of date?  Sounds like it's talking about a boolean.

Thanks. Fixed.

https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader...
content/browser/loader/resource_scheduler.cc:528:
non_delayable_in_flight_count);
On 2017/05/21 21:43:56, Randy Smith (Not in Mondays) wrote:
> I don't have strong objections, but histograms do cost (memory, bandwidth). Do
> you need a histogram that duplicates the information available in other
> histograms?  Why?

I am not sure that this duplicates some other histogram. It records the
difference between two values, and that difference cannot be computed from the
data in the other two histograms.

https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader...
content/browser/loader/resource_scheduler.cc:549: RecordRequestCountMetrics();
On 2017/05/21 21:43:56, Randy Smith (Not in Mondays) wrote:
> I remain uncertain that you're going to get good statistics from this.  I'm
> willing to defer to the histograms.xml owner if you'd like (if you specifically
> raise the issue with them) but to me, this means that you're going to way
> overcount situations in the above histograms when you've got a lot of requests,
> and undercount when you have fewer.  I'd instead suggest that you have
> RecordRequestCountMetrics on a timer that goes off every n ms while requests are
> in-flight and records these metrics at those points.  Alternatively, you could
> do something that integrates the values of the various above metrics over times
> in between insertions of in flight requests (though you'll need something that
> captures when the last in flight request completes).  But recording histograms
> on what is, as best I can tell, the statistically arbitrary point of when a new
> request gets inserted still strikes me as wrong.

I agree with the spirit of this concern. The goal of these metrics is to compare
different resource scheduling algorithms, and see how frequently different
algorithms allow a certain number of requests to go in flight. 

One problem with time-based sampling is that different algorithms may record
different numbers of samples (if we restrict sampling to times when at least
one request is in flight). This would make it difficult to compare different
algorithms. Recording at request insertion/deletion has the nice property that
changing the scheduling algorithm does not change the number of samples.

I would agree that this metric has problems (e.g., it is oblivious of the
duration of different states). A more comprehensive way (but with more histogram
bloat) of recording this might be to use histograms of the format:
ResourceScheduler.Count.{All|Delayable|NonDelayble|TotalLayoutBlocking|DelayableWhenLayoutBlocking|DelayableWhenNonDelayable}.{1..N}.
This would require ~6*N histograms.

Then every time a request is inserted or removed, we record for how long (=time
duration) there were X requests in flight. e.g., if there were 3 requests in
flight for 30 seconds, we record a 30 second sample in the histogram:
ResourceScheduler.Count.All.3.
Similarly, we record 5 more samples.
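
A minimal sketch of the duration-based recording described above, for the
"All" category only. The class name, the method names, and the FakeHistograms
map are hypothetical stand-ins and are not taken from the patch; a real
implementation would presumably go through Chromium's histogram macros and
base::TimeTicks rather than plain doubles.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for UMA: histogram name -> recorded samples (seconds).
using FakeHistograms = std::map<std::string, std::vector<double>>;

// Records, for each in-flight count X, how long exactly X requests were in
// flight before the count changed.
class InFlightDurationTracker {
 public:
  explicit InFlightDurationTracker(FakeHistograms* histograms)
      : histograms_(histograms) {}

  // A request goes in flight at |now_seconds|.
  void OnRequestInserted(double now_seconds) {
    RecordTimeAtCurrentCount(now_seconds);
    ++in_flight_count_;
  }

  // An in-flight request completes at |now_seconds|.
  void OnRequestRemoved(double now_seconds) {
    assert(in_flight_count_ > 0);
    RecordTimeAtCurrentCount(now_seconds);
    --in_flight_count_;
  }

 private:
  // Emits one sample for the count that is about to change. Nothing is
  // recorded while zero requests are in flight.
  void RecordTimeAtCurrentCount(double now_seconds) {
    if (in_flight_count_ > 0) {
      const std::string name = "ResourceScheduler.Count.All." +
                               std::to_string(in_flight_count_);
      (*histograms_)[name].push_back(now_seconds - last_change_seconds_);
    }
    last_change_seconds_ = now_seconds;
  }

  FakeHistograms* histograms_;
  int in_flight_count_ = 0;
  double last_change_seconds_ = 0.0;
};
```

With three requests inserted at t=0, 1, and 2 seconds and one removed at t=32,
this sketch records a 30 second sample under ResourceScheduler.Count.All.3,
matching the example above. Covering the other categories would need one such
counter per category, which is where the ~6*N histogram cost comes from.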

I am hesitant to go with that approach since it is not clear if we really need
to record the time dimension.

Thoughts?

Randy Smith (Not in Mondays)

https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader/resource_scheduler.cc File content/browser/loader/resource_scheduler.cc (right): https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader/resource_scheduler.cc#newcode528 content/browser/loader/resource_scheduler.cc:528: non_delayable_in_flight_count); On 2017/05/25 01:07:46, tbansal1 wrote: > On 2017/05/21 ...

3 years, 6 months ago (2017-05-25 17:40:12 UTC) #27

https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader...
File content/browser/loader/resource_scheduler.cc (right):

https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader...
content/browser/loader/resource_scheduler.cc:528:
non_delayable_in_flight_count);
On 2017/05/25 01:07:46, tbansal1 wrote:
> On 2017/05/21 21:43:56, Randy Smith (Not in Mondays) wrote:
> > I don't have strong objections, but histograms do cost (memory, bandwidth). Do
> > you need a histogram that duplicates the information available in other
> > histograms?  Why?
> 
> I am not sure that this duplicates some other histogram. It records the
> difference between two values, and that difference cannot be computed from the
> data in the other two histograms.

Good point, though it's possible with some serious dremeling.  

Ok, I'll just push back a little bit more, then yield: Could you confirm for
yourself that you'll actually use all three of these histograms, rather than
just needing the data you can get from two of them in the third context (e.g.
average for the distribution)?  If so, I'm good with them.

https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader...
content/browser/loader/resource_scheduler.cc:549: RecordRequestCountMetrics();
On 2017/05/25 01:07:46, tbansal1 wrote:
> On 2017/05/21 21:43:56, Randy Smith (Not in Mondays) wrote:
> > I remain uncertain that you're going to get good statistics from this.  I'm
> > willing to defer to the histograms.xml owner if you'd like (if you
> > specifically raise the issue with them) but to me, this means that you're
> > going to way overcount situations in the above histograms when you've got a
> > lot of requests, and undercount when you have fewer.  I'd instead suggest
> > that you have RecordRequestCountMetrics on a timer that goes off every n ms
> > while requests are in-flight and records these metrics at those points.
> > Alternatively, you could do something that integrates the values of the
> > various above metrics over times in between insertions of in flight requests
> > (though you'll need something that captures when the last in flight request
> > completes).  But recording histograms on what is, as best I can tell, the
> > statistically arbitrary point of when a new request gets inserted still
> > strikes me as wrong.
> 
> I agree with the spirit of this concern. The goal of these metrics is to
> compare different resource scheduling algorithms, and see how frequently
> different algorithms allow a certain number of requests to go in flight.
> 
> One problem with time-based sampling is that different algorithms may record
> different numbers of samples (if we restrict sampling to times when at least
> one request is in flight). This would make it difficult to compare different
> algorithms. Recording at request insertion/deletion has the nice property that
> changing the scheduling algorithm does not change the number of samples.
> 
> I would agree that this metric has problems (e.g., it is oblivious of the
> duration of different states). A more comprehensive way (but with more
> histogram bloat) of recording this might be to use histograms of the format:
> ResourceScheduler.Count.{All|Delayable|NonDelayble|TotalLayoutBlocking|DelayableWhenLayoutBlocking|DelayableWhenNonDelayable}.{1..N}.
> This would require ~6*N histograms.
> 
> Then every time a request is inserted or removed, we record for how long
> (=time duration) there were X requests in flight. e.g., if there were 3
> requests in flight for 30 seconds, we record a 30 second sample in the
> histogram: ResourceScheduler.Count.All.3.
> Similarly, we record 5 more samples.
> 
> I am hesitant to go with that approach since it is not clear if we really need
> to record the time dimension.
> 
> Thoughts?

My problem is that I'm not a statistician, though I'm clearly trying to play one
on TV :-}.  If we take the goal as being only comparing scheduling algorithms, I
think a metric of "fewer at insertion of in flight request is better" is ok, and
this is a fine way to get that metric.  I'd recommend skipping the second
call on erasing in-flight requests, though--I don't think we care much about
that value.

An alternative metric to try and get would be "average number of in-flight
requests for the duration of any given request".  I'm inclined to think that's a
better metric than the value at insertion, as requests will be interfering with
each other for the full overlap of their lifetimes.  I'm happy to yield to your
preference.

If you'd prefer that metric, I don't think it would be hard to compute, and it
would only take one (per type being counted) histogram.  What you'd do is, for
each request, keep three variables: The amount of time the request has been
outstanding for, the average number of other requests active during that time,
and the last time the average was computed.  Whenever there was a change
(erasure or addition) of a request that wasn't the request in question, you
would recompute the average ((average * time outstanding + time since last
computed * other requests active since that time)/(time outstanding + time since
last computed)) and the time outstanding.  When the request completes, the
average gets entered into the histogram(s).

But I'm willing to accept what you currently have if you prefer it to that
(modulo removing the recording from the EraseInFlightRequest() method--if you
think that's a good idea you'll need to pitch it to me).
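
A minimal sketch of the per-request running average described above, keeping
the three variables named there. The class and method names are hypothetical,
times are plain doubles in seconds rather than base::TimeTicks, and the caller
is assumed to pass in how many other requests were active since the previous
update.

```cpp
#include <cassert>

// Time-weighted average of the number of *other* requests in flight while one
// particular request is outstanding.
class OverlapAverage {
 public:
  // |start_seconds| is when the tracked request went in flight.
  explicit OverlapAverage(double start_seconds)
      : last_computed_seconds_(start_seconds) {}

  // Call whenever another request is added or erased, and once more when the
  // tracked request completes. |others_active_since_last_update| is the
  // number of other requests that were active since the previous update.
  void Update(double now_seconds, int others_active_since_last_update) {
    const double elapsed = now_seconds - last_computed_seconds_;
    assert(elapsed >= 0.0);
    const double new_time_outstanding = time_outstanding_seconds_ + elapsed;
    if (new_time_outstanding > 0.0) {
      // (average * time outstanding + time since last computed * others)
      //     / (time outstanding + time since last computed)
      average_others_ = (average_others_ * time_outstanding_seconds_ +
                         elapsed * others_active_since_last_update) /
                        new_time_outstanding;
    }
    time_outstanding_seconds_ = new_time_outstanding;
    last_computed_seconds_ = now_seconds;
  }

  // Value to enter into the histogram when the tracked request completes.
  double average_others() const { return average_others_; }
  double time_outstanding_seconds() const { return time_outstanding_seconds_; }

 private:
  double time_outstanding_seconds_ = 0.0;
  double average_others_ = 0.0;
  double last_computed_seconds_;
};
```

For example, a request that overlaps 2 other requests for its first 10 seconds
and 5 other requests for the next 20 seconds ends with an average of
(2 * 10 + 5 * 20) / 30 = 4.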

tbansal1

The CQ bit was checked by tbansal@chromium.org to run a CQ dry run

3 years, 6 months ago (2017-05-31 06:24:38 UTC) #32

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2873223002/200001

3 years, 6 months ago (2017-05-31 06:24:55 UTC) #33

tbansal1

rdsmith: ptal. Thanks for the comments. I have rm'ed some of the histograms. Hopefully, this ...

3 years, 6 months ago (2017-05-31 06:31:21 UTC) #34

rdsmith: ptal. Thanks for the comments. I have rm'ed some of the histograms.
Hopefully, this is somewhat better.

https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader...
File content/browser/loader/resource_scheduler.cc (right):

https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader...
content/browser/loader/resource_scheduler.cc:528:
non_delayable_in_flight_count);
On 2017/05/25 17:40:12, Randy Smith (Not in Mondays) wrote:
> On 2017/05/25 01:07:46, tbansal1 wrote:
> > On 2017/05/21 21:43:56, Randy Smith (Not in Mondays) wrote:
> > > I don't have strong objections, but histograms do cost (memory,
> > > bandwidth). Do you need a histogram that duplicates the information
> > > available in other histograms?  Why?
> > 
> > I am not sure that this duplicates some other histogram. It records the
> > difference between two values, and that difference cannot be computed from
> > the data in the other two histograms.
> 
> Good point, though it's possible with some serious dremeling.
> 
> Ok, I'll just push back a little bit more, then yield: Could you confirm for
> yourself that you'll actually use all three of these histograms, rather than
> just needing the data you can get from two of them in the third context (e.g.
> average for the distribution)?  If so, I'm good with them.

Yes, for now I think it is useful to record the difference separately so that it
can be analyzed as a function of different network qualities.

https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader...
content/browser/loader/resource_scheduler.cc:549: RecordRequestCountMetrics();
On 2017/05/25 17:40:12, Randy Smith (Not in Mondays) wrote:
> On 2017/05/25 01:07:46, tbansal1 wrote:
> > On 2017/05/21 21:43:56, Randy Smith (Not in Mondays) wrote:
> > > I remain uncertain that you're going to get good statistics from this.  I'm
> > > willing to defer to the histograms.xml owner if you'd like (if you
> > > specifically raise the issue with them) but to me, this means that you're
> > > going to way overcount situations in the above histograms when you've got a
> > > lot of requests, and undercount when you have fewer.  I'd instead suggest
> > > that you have RecordRequestCountMetrics on a timer that goes off every n ms
> > > while requests are in-flight and records these metrics at those points.
> > > Alternatively, you could do something that integrates the values of the
> > > various above metrics over times in between insertions of in flight
> > > requests (though you'll need something that captures when the last in
> > > flight request completes).  But recording histograms on what is, as best I
> > > can tell, the statistically arbitrary point of when a new request gets
> > > inserted still strikes me as wrong.
> > 
> > I agree with the spirit of this concern. The goal of these metrics is to
> > compare different resource scheduling algorithms, and see how frequently
> > different algorithms allow a certain number of requests to go in flight.
> > 
> > One problem with time-based sampling is that different algorithms may record
> > different numbers of samples (if we restrict sampling to times when at least
> > one request is in flight). This would make it difficult to compare different
> > algorithms. Recording at request insertion/deletion has the nice property
> > that changing the scheduling algorithm does not change the number of samples.
> > 
> > I would agree that this metric has problems (e.g., it is oblivious of the
> > duration of different states). A more comprehensive way (but with more
> > histogram bloat) of recording this might be to use histograms of the format:
> > ResourceScheduler.Count.{All|Delayable|NonDelayble|TotalLayoutBlocking|DelayableWhenLayoutBlocking|DelayableWhenNonDelayable}.{1..N}.
> > This would require ~6*N histograms.
> > 
> > Then every time a request is inserted or removed, we record for how long
> > (=time duration) there were X requests in flight. e.g., if there were 3
> > requests in flight for 30 seconds, we record a 30 second sample in the
> > histogram: ResourceScheduler.Count.All.3.
> > Similarly, we record 5 more samples.
> > 
> > I am hesitant to go with that approach since it is not clear if we really
> > need to record the time dimension.
> > 
> > Thoughts?
> 
> My problem is that I'm not a statistician, though I'm clearly trying to play
> one on TV :-}.  If we take the goal as being only comparing scheduling
> algorithms, I think a metric of "fewer at insertion of in flight request is
> better" is ok, and this is a fine way to get that metric.  I'd recommend
> skipping the second call on erasing in-flight requests, though--I don't think
> we care much about that value.
Done.  
> 
> An alternative metric to try and get would be "average number of in-flight
> requests for the duration of any given request".  I'm inclined to think that's
> a better metric than the value at insertion, as requests will be interfering
> with each other for the full overlap of their lifetimes.  I'm happy to yield to
> your preference.
> 
> If you'd prefer that metric, I don't think it would be hard to compute, and it
> would only take one (per type being counted) histogram.  What you'd do is, for
> each request, keep three variables: The amount of time the request has been
> outstanding for, the average number of other requests active during that time,
> and the last time the average was computed.  Whenever there was a change
> (erasure or addition) of a request that wasn't the request in question, you
> would recompute the average ((average * time outstanding + time since last
> computed * other requests active since that time)/(time outstanding + time
> since last computed)) and the time outstanding.  When the request completes,
> the average gets entered into the histogram(s).
Average is a fine metric, but then I would argue for recording the average
separately for different types of requests: for example, the average count of
requests of type X while a request of type Y is in flight, where X and Y belong
to {delayable, non-delayable, layout blocking}.
This is because we do not want to lump metrics from different types of requests
into the same histogram.
> 
> But I'm willing to accept what you currently have if you prefer it to that
> (modulo removing the recording from the EraseInFlightRequest() method--if you
> think that's a good idea you'll need to pitch it to me).
Done.

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 6 months ago (2017-05-31 07:55:54 UTC) #35

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

3 years, 6 months ago (2017-05-31 07:55:55 UTC) #36

Randy Smith (Not in Mondays)

On 2017/06/06 17:27:56, tbansal1 wrote: > rdsmith: gentle ping. Thanks. Sorry, CAM innovation week + ...

3 years, 6 months ago (2017-06-06 17:30:21 UTC) #38

Randy Smith (Not in Mondays)

LGTM modulo comment change in histograms.xml. https://codereview.chromium.org/2873223002/diff/200001/tools/metrics/histograms/histograms.xml File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2873223002/diff/200001/tools/metrics/histograms/histograms.xml#newcode61123 tools/metrics/histograms/histograms.xml:61123: + in-flight requests ...

3 years, 6 months ago (2017-06-06 17:41:44 UTC) #39

tbansal1

tbansal@chromium.org changed reviewers: + rkaplow@chromium.org

3 years, 6 months ago (2017-06-06 20:40:05 UTC) #40

tbansal1

rkaplow: ptal at histograms.xml. Thanks. https://codereview.chromium.org/2873223002/diff/200001/tools/metrics/histograms/histograms.xml File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2873223002/diff/200001/tools/metrics/histograms/histograms.xml#newcode61123 tools/metrics/histograms/histograms.xml:61123: + in-flight requests is ...

3 years, 6 months ago (2017-06-06 20:40:06 UTC) #41

tbansal1

The CQ bit was checked by tbansal@chromium.org to run a CQ dry run

3 years, 6 months ago (2017-06-06 20:40:11 UTC) #42

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2873223002/220001

3 years, 6 months ago (2017-06-06 20:40:59 UTC) #43

tbansal1

The patchset sent to the CQ was uploaded after l-g-t-m from rdsmith@chromium.org Link to the ...

3 years, 6 months ago (2017-06-06 22:06:08 UTC) #47

commit-bot: I haz the power

CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2873223002/220001

3 years, 6 months ago (2017-06-06 22:06:26 UTC) #48

commit-bot: I haz the power

CQ is committing da patch. Bot data: {"patchset_id": 220001, "attempt_start_ts": 1496786768278110, "parent_rev": "a2af3b52cd02bf15d82e879ec24afac043110df8", "commit_rev": "29fc63386af102f88a5373b8b83ee94ac6b4ec45"}

3 years, 6 months ago (2017-06-06 22:50:26 UTC) #49

commit-bot: I haz the power

Description was changed from ========== Record resource scheduler UMA Record UMA on how many requests ...

3 years, 6 months ago (2017-06-06 22:50:39 UTC) #50

commit-bot: I haz the power

Committed patchset #4 (id:220001) as https://chromium.googlesource.com/chromium/src/+/29fc63386af102f88a5373b8b83ee94ac6b4ec45

3 years, 6 months ago (2017-06-06 22:50:41 UTC) #51

falken

On 2017/06/06 22:50:41, commit-bot: I haz the power wrote: > Committed patchset #4 (id:220001) as ...

3 years, 6 months ago (2017-06-07 02:19:39 UTC) #52

falken

Ah, pretty_print.py isn't a presubmit. But pretty_print.py fails now.

3 years, 6 months ago (2017-06-07 02:29:53 UTC) #53

tbansal1

On 2017/06/07 02:29:53, falken wrote: > Ah, pretty_print.py isn't a presubmit. But pretty_print.py fails now. ...

3 years, 6 months ago (2017-06-07 16:17:10 UTC) #54

Timothy Loh

3 years, 6 months ago (2017-06-09 05:59:15 UTC) #55

Message was sent while issue was closed.

On 2017/06/07 16:17:10, tbansal1 wrote:
> On 2017/06/07 02:29:53, falken wrote:
> > Ah, pretty_print.py isn't a presubmit. But pretty_print.py fails now.
> 
> https://codereview.chromium.org/2927773002/ fixed the pretty_print.py.
> Please let me know if it is still broken. falken, thanks for reporting
> this.

Looks fixed now.

Issue 2873223002: Record resource scheduler UMA (Closed)

Patch Set 1 : ps #

Patch Set 2 : ps #

Patch Set 3 : rm some histograms #

Patch Set 4 : rdsmith comment #
