|
|
Chromium Code Reviews
DescriptionRecord resource scheduler UMA
Record UMA on how many requests are in-flight. UMA is recorded only when
the set of in-flight requests changes.
Also, record UMA on highest number of delayable requests in-flight when
a delayable or layout blocking request is in flight. This UMA is
recorded per-request (on select requests).
BUG=720783
Review-Url: https://codereview.chromium.org/2873223002
Cr-Commit-Position: refs/heads/master@{#477447}
Committed: https://chromium.googlesource.com/chromium/src/+/29fc63386af102f88a5373b8b83ee94ac6b4ec45
Patch Set 1 : ps #Patch Set 2 : ps #
Total comments: 10
Patch Set 3 : rm some histograms #
Total comments: 2
Patch Set 4 : rdsmith comment #
Messages
Total messages: 55 (38 generated)
The CQ bit was checked by tbansal@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== Record UMA BUG= ========== to ========== Record resource scheduler UMA BUG= ==========
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by tbansal@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Patchset #1 (id:1) has been deleted
The CQ bit was checked by tbansal@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== Record resource scheduler UMA BUG= ========== to ========== Record resource scheduler UMA Record UMA on how many requests are in-flight. UMA is recorded only when the set of in-flight requests changes. BUG=720783 ==========
tbansal@chromium.org changed reviewers: + rdsmith@chromium.org
rdsmith: ptal. Thanks.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Patchset #2 (id:40001) has been deleted
Patchset #2 (id:60001) has been deleted
The CQ bit was checked by tbansal@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Patchset #2 (id:80001) has been deleted
Description was changed from ========== Record resource scheduler UMA Record UMA on how many requests are in-flight. UMA is recorded only when the set of in-flight requests changes. BUG=720783 ========== to ========== Record resource scheduler UMA Record UMA on how many requests are in-flight. UMA is recorded only when the set of in-flight requests changes. Also, record UMA on highest number of delayable requests in-flight when a delayable or layout blocking request is in flight. This UMA is recorded per-request (on select requests). BUG=720783 ==========
rdsmith: PTAnL. Thanks.
https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader... File content/browser/loader/resource_scheduler.cc (right): https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader... content/browser/loader/resource_scheduler.cc:349: // in-flight. I think the comment's out of date? Sounds like it's talking about a boolean. https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader... content/browser/loader/resource_scheduler.cc:528: non_delayable_in_flight_count); I don't have strong objections, but histograms do cost (memory, bandwidth). Do you need a histogram that duplicates the information available in other histograms? Why? https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader... content/browser/loader/resource_scheduler.cc:549: RecordRequestCountMetrics(); I remain uncertain that you're going to get good statistics from this. I'm willing to defer to the histograms.xml owner if you'd like (if you specifically raise the issue with them) but to me, this means that you're going to way overcount situations in the above histograms when you've got a lot of requests, and undercount when you have fewer. I'd instead suggest that you have RecordRequestCountMetrics on a timer that goes off ever n ms while requests are in-flight and records these metrics at those points. Alternatively, you could do something that integrates the values of the various above metrics over times in between insertions of in flight requests (though you'll need something that captures when the last in flight request completes). But recording histograms on what is, as best I can tell, the statistically arbitrary point of when a new request gets inserted still strikes me as wrong.
rdsmith, thanks for the comments. ptal. https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader... File content/browser/loader/resource_scheduler.cc (right): https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader... content/browser/loader/resource_scheduler.cc:349: // in-flight. On 2017/05/21 21:43:56, Randy Smith (Not in Mondays) wrote: > I think the comment's out of date? Sounds like it's talking about a boolean. Thanks. Fixed. https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader... content/browser/loader/resource_scheduler.cc:528: non_delayable_in_flight_count); On 2017/05/21 21:43:56, Randy Smith (Not in Mondays) wrote: > I don't have strong objections, but histograms do cost (memory, bandwidth). Do > you need a histogram that duplicates the information available in other > histograms? Why? I am not sure if this is duplicate of some other histogram. This is recording the difference between two values, however it is not possible to compute this from the data from the other 2 histograms. https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader... content/browser/loader/resource_scheduler.cc:549: RecordRequestCountMetrics(); On 2017/05/21 21:43:56, Randy Smith (Not in Mondays) wrote: > I remain uncertain that you're going to get good statistics from this. I'm > willing to defer to the histograms.xml owner if you'd like (if you specifically > raise the issue with them) but to me, this means that you're going to way > overcount situations in the above histograms when you've got a lot of requests, > and undercount when you have fewer. I'd instead suggest that you have > RecordRequestCountMetrics on a timer that goes off ever n ms while requests are > in-flight and records these metrics at those points. Alternatively, you could > do something that integrates the values of the various above metrics over times > in between insertions of in flight requests (though you'll need something that > captures when the last in flight request completes). But recording histograms > on what is, as best I can tell, the statistically arbitrary point of when a new > request gets inserted still strikes me as wrong. I agree with the spirit of this concern. The goal of these metrics is to compare different resource scheduling algorithms, and see how frequently different algorithms allow a certain number of requests to go in flight. One problem with using time-based sampling is that different algorithms may record different number of samples (if we restrict taking the sample to only when there are at least 1 request in-flight). This would make it difficult to compare different algorithms. Recording at request insertion/deletion has the nice property that changing the scheduling algorithm does not change the number of samples. I would agree that this metric has problems (e.g., it is oblivious of the duration of different states). A more comprehensive way (but with more histogram bloat) of recording this might be to use histograms of the format: ResourceScheduler.Count.{All|Delayable|NonDelayble|TotalLayoutBlocking|DelayableWhenLayoutBlocking|DelayableWhenNonDelayable}.{1..N}. This would require ~6*N histograms. Then every time a request is inserted or removed, we record for how long (=time duration) there were X requests in flight. e.g., if there were 3 requests in flight for 30 seconds, we record a 30 second sample in the histogram: ResourceScheduler.Count.All.3. Similarly, we record 5 more samples. I am hesitant to go with that approach since it is not clear if we really need to record the time dimension. Thoughts?
https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader... File content/browser/loader/resource_scheduler.cc (right): https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader... content/browser/loader/resource_scheduler.cc:528: non_delayable_in_flight_count); On 2017/05/25 01:07:46, tbansal1 wrote: > On 2017/05/21 21:43:56, Randy Smith (Not in Mondays) wrote: > > I don't have strong objections, but histograms do cost (memory, bandwidth). > Do > > you need a histogram that duplicates the information available in other > > histograms? Why? > > I am not sure if this is duplicate of some other histogram. This is recording > the difference between two values, however it is not possible to compute this > from the data from the other 2 histograms. Good point, though it's possible with some serious dremeling. Ok, I'll just push back a little bit more than yield: Could you confirm for yourself that you'll actually use all three of these histograms, rather just needing the data you can get from two of them in the third context (e.g. average for the distribution)? If so, I'm good with them. https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader... content/browser/loader/resource_scheduler.cc:549: RecordRequestCountMetrics(); On 2017/05/25 01:07:46, tbansal1 wrote: > On 2017/05/21 21:43:56, Randy Smith (Not in Mondays) wrote: > > I remain uncertain that you're going to get good statistics from this. I'm > > willing to defer to the histograms.xml owner if you'd like (if you > specifically > > raise the issue with them) but to me, this means that you're going to way > > overcount situations in the above histograms when you've got a lot of > requests, > > and undercount when you have fewer. I'd instead suggest that you have > > RecordRequestCountMetrics on a timer that goes off ever n ms while requests > are > > in-flight and records these metrics at those points. Alternatively, you could > > do something that integrates the values of the various above metrics over > times > > in between insertions of in flight requests (though you'll need something that > > captures when the last in flight request completes). But recording histograms > > on what is, as best I can tell, the statistically arbitrary point of when a > new > > request gets inserted still strikes me as wrong. > > I agree with the spirit of this concern. The goal of these metrics is to compare > different resource scheduling algorithms, and see how frequently different > algorithms allow a certain number of requests to go in flight. > > One problem with using time-based sampling is that different algorithms may > record different number of samples (if we restrict taking the sample to only > when there are at least 1 request in-flight). This would make it difficult to > compare different algorithms. Recording at request insertion/deletion has the > nice property that changing the scheduling algorithm does not change the number > of samples. > > I would agree that this metric has problems (e.g., it is oblivious of the > duration of different states). A more comprehensive way (but with more histogram > bloat) of recording this might be to use histograms of the format: > ResourceScheduler.Count.{All|Delayable|NonDelayble|TotalLayoutBlocking|DelayableWhenLayoutBlocking|DelayableWhenNonDelayable}.{1..N}. > This would require ~6*N histograms. > > Then every time a request is inserted or removed, we record for how long (=time > duration) there were X requests in flight. e.g., if there were 3 requests in > flight for 30 seconds, we record a 30 second sample in the histogram: > ResourceScheduler.Count.All.3. > Similarly, we record 5 more samples. > > I am hesitant to go with that approach since it is not clear if we really need > to record the time dimension. > > Thoughts? My problem is that I'm not a statistician, though I'm clearly trying to play one on TV :-}. If we take the goal as being only comparing scheduling algorithms, I think a metric of "fewer at insertion of in flight request is better" is ok, and this is a fine way to get that metric. I'd recommending skipping the second call on erasing in flights requests, though--I don't think we care much about that value. An alternative metric to try and get would be "average number of in-flight requests for the duration of any given request". I'm inclined to think that's a better metric than the value at insertion, as requests will be interfering with each other for the full overlap of their lifetimes. I'm happy to yield to your preference. If you'd prefer that metric, I don't think it would be hard to compute, and it would only take one (per type being counted) histogram. What you'd do is, for each request, keep three variables: The amount of time the request has been oustanding for, the average number of other requests active during that time, and the last time the average was computed. Whenever there was a change (erasure or addition) of a request that wasn't the request in question, you would recompute the average ((average * time outstanding + time since last computed * other requests active since that time)/(time outstanding + time since last compued)) and the time outstanding. When the request completes, the average gets entered into the histogram(s). But I'm willing to accept what you currently have if you prefer it to that (modulo removing the recording from the EraseInFlightRequest() method--if you think that's a good idea you'll need to pitch it to me).
Patchset #5 (id:160001) has been deleted
Patchset #4 (id:140001) has been deleted
Patchset #3 (id:120001) has been deleted
Patchset #3 (id:180001) has been deleted
The CQ bit was checked by tbansal@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
rdsmith: ptal. Thanks for the comments. I have rm'ed some of the histograms. Hopefully, this is somewhat better. https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader... File content/browser/loader/resource_scheduler.cc (right): https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader... content/browser/loader/resource_scheduler.cc:528: non_delayable_in_flight_count); On 2017/05/25 17:40:12, Randy Smith (Not in Mondays) wrote: > On 2017/05/25 01:07:46, tbansal1 wrote: > > On 2017/05/21 21:43:56, Randy Smith (Not in Mondays) wrote: > > > I don't have strong objections, but histograms do cost (memory, bandwidth). > > Do > > > you need a histogram that duplicates the information available in other > > > histograms? Why? > > > > I am not sure if this is duplicate of some other histogram. This is recording > > the difference between two values, however it is not possible to compute this > > from the data from the other 2 histograms. > > Good point, though it's possible with some serious dremeling. > > Ok, I'll just push back a little bit more than yield: Could you confirm for > yourself that you'll actually use all three of these histograms, rather just > needing the data you can get from two of them in the third context (e.g. average > for the distribution)? If so, I'm good with them. Yes, for now I think it is useful to record the difference separately so that it can be analyzed as a function different network qualities. https://codereview.chromium.org/2873223002/diff/100001/content/browser/loader... content/browser/loader/resource_scheduler.cc:549: RecordRequestCountMetrics(); On 2017/05/25 17:40:12, Randy Smith (Not in Mondays) wrote: > On 2017/05/25 01:07:46, tbansal1 wrote: > > On 2017/05/21 21:43:56, Randy Smith (Not in Mondays) wrote: > > > I remain uncertain that you're going to get good statistics from this. I'm > > > willing to defer to the histograms.xml owner if you'd like (if you > > specifically > > > raise the issue with them) but to me, this means that you're going to way > > > overcount situations in the above histograms when you've got a lot of > > requests, > > > and undercount when you have fewer. I'd instead suggest that you have > > > RecordRequestCountMetrics on a timer that goes off ever n ms while requests > > are > > > in-flight and records these metrics at those points. Alternatively, you > could > > > do something that integrates the values of the various above metrics over > > times > > > in between insertions of in flight requests (though you'll need something > that > > > captures when the last in flight request completes). But recording > histograms > > > on what is, as best I can tell, the statistically arbitrary point of when a > > new > > > request gets inserted still strikes me as wrong. > > > > I agree with the spirit of this concern. The goal of these metrics is to > compare > > different resource scheduling algorithms, and see how frequently different > > algorithms allow a certain number of requests to go in flight. > > > > One problem with using time-based sampling is that different algorithms may > > record different number of samples (if we restrict taking the sample to only > > when there are at least 1 request in-flight). This would make it difficult to > > compare different algorithms. Recording at request insertion/deletion has the > > nice property that changing the scheduling algorithm does not change the > number > > of samples. > > > > I would agree that this metric has problems (e.g., it is oblivious of the > > duration of different states). A more comprehensive way (but with more > histogram > > bloat) of recording this might be to use histograms of the format: > > > ResourceScheduler.Count.{All|Delayable|NonDelayble|TotalLayoutBlocking|DelayableWhenLayoutBlocking|DelayableWhenNonDelayable}.{1..N}. > > This would require ~6*N histograms. > > > > Then every time a request is inserted or removed, we record for how long > (=time > > duration) there were X requests in flight. e.g., if there were 3 requests in > > flight for 30 seconds, we record a 30 second sample in the histogram: > > ResourceScheduler.Count.All.3. > > Similarly, we record 5 more samples. > > > > I am hesitant to go with that approach since it is not clear if we really need > > to record the time dimension. > > > > Thoughts? > > My problem is that I'm not a statistician, though I'm clearly trying to play one > on TV :-}. If we take the goal as being only comparing scheduling algorithms, I > think a metric of "fewer at insertion of in flight request is better" is ok, and > this is a fine way to get that metric. I'd recommending skipping the second > call on erasing in flights requests, though--I don't think we care much about > that value. Done. > > An alternative metric to try and get would be "average number of in-flight > requests for the duration of any given request". I'm inclined to think that's a > better metric than the value at insertion, as requests will be interfering with > each other for the full overlap of their lifetimes. I'm happy to yield to your > preference. > > If you'd prefer that metric, I don't think it would be hard to compute, and it > would only take one (per type being counted) histogram. What you'd do is, for > each request, keep three variables: The amount of time the request has been > oustanding for, the average number of other requests active during that time, > and the last time the average was computed. Whenever there was a change > (erasure or addition) of a request that wasn't the request in question, you > would recompute the average ((average * time outstanding + time since last > computed * other requests active since that time)/(time outstanding + time since > last compued)) and the time outstanding. When the request completes, the > average gets entered into the histogram(s). Average is a fine metric but then I would argue for recording average of different types of requests. For example, average of count of requests of type X when a request of type Y is in flight. where X,Y belong to {delayable, non-delayble, layout blocking}; This is because we do not want to lump metrics from different types of requests in the same histogram. > > But I'm willing to accept what you currently have if you prefer it to that > (modulo removing the recording from the EraseInFlightRequest() method--if you > think that's a good idea you'll need to pitch it to me). Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
rdsmith: gentle ping. Thanks.
On 2017/06/06 17:27:56, tbansal1 wrote: > rdsmith: gentle ping. Thanks. Sorry, CAM innovation week + not working Monday. I'll try to get to it by EOD today; feel free to ping me again tomorrow morning if I don't.
LGTM modulo comment change in histograms.xml. https://codereview.chromium.org/2873223002/diff/200001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2873223002/diff/200001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:61123: + in-flight requests is modified by the resource scheduler. I think this comment is out of date? It's not just when a request is inserted?
tbansal@chromium.org changed reviewers: + rkaplow@chromium.org
rkaplow: ptal at histograms.xml. Thanks. https://codereview.chromium.org/2873223002/diff/200001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2873223002/diff/200001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:61123: + in-flight requests is modified by the resource scheduler. On 2017/06/06 17:41:44, Randy Smith (Not in Mondays) wrote: > I think this comment is out of date? It's not just when a request is inserted? Done.
The CQ bit was checked by tbansal@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm
The CQ bit was unchecked by tbansal@chromium.org
The CQ bit was checked by tbansal@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from rdsmith@chromium.org Link to the patchset: https://codereview.chromium.org/2873223002/#ps220001 (title: "rdsmith comment")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 220001, "attempt_start_ts": 1496786768278110,
"parent_rev": "a2af3b52cd02bf15d82e879ec24afac043110df8", "commit_rev":
"29fc63386af102f88a5373b8b83ee94ac6b4ec45"}
Message was sent while issue was closed.
Description was changed from ========== Record resource scheduler UMA Record UMA on how many requests are in-flight. UMA is recorded only when the set of in-flight requests changes. Also, record UMA on highest number of delayable requests in-flight when a delayable or layout blocking request is in flight. This UMA is recorded per-request (on select requests). BUG=720783 ========== to ========== Record resource scheduler UMA Record UMA on how many requests are in-flight. UMA is recorded only when the set of in-flight requests changes. Also, record UMA on highest number of delayable requests in-flight when a delayable or layout blocking request is in flight. This UMA is recorded per-request (on select requests). BUG=720783 Review-Url: https://codereview.chromium.org/2873223002 Cr-Commit-Position: refs/heads/master@{#477447} Committed: https://chromium.googlesource.com/chromium/src/+/29fc63386af102f88a5373b8b83e... ==========
Message was sent while issue was closed.
Committed patchset #4 (id:220001) as https://chromium.googlesource.com/chromium/src/+/29fc63386af102f88a5373b8b83e...
Message was sent while issue was closed.
On 2017/06/06 22:50:41, commit-bot: I haz the power wrote: > Committed patchset #4 (id:220001) as > https://chromium.googlesource.com/chromium/src/+/29fc63386af102f88a5373b8b83e... Not sure why presubmit didn't work here, but this change seems to have broken pretty_print.py and presubmit for subsequent patches: ./tools/metrics/histograms/pretty_print.py INFO:root:Loading tools/metrics/histograms/enums.xml... INFO:root:enums.xml is correctly pretty-printed. INFO:root:Loading tools/metrics/histograms/histograms.xml... ERROR:root:Missing attribute "separator" in tag "histogram_suffixes" ERROR:root:Aborting parsing due to fatal errors: Traceback (most recent call last): File "./tools/metrics/histograms/../common/presubmit_util.py", line 65, in DoPresubmit pretty = prettyFn(original_xml) File "./tools/metrics/histograms/pretty_print.py", line 92, in PrettyPrintHistograms return print_style.GetPrintStyle().PrettyPrintXml(tree) File "./tools/metrics/histograms/../common/pretty_print_xml.py", line 81, in PrettyPrintXml tree = self.PrettyPrintNode(tree) File "./tools/metrics/histograms/../common/pretty_print_xml.py", line 169, in PrettyPrintNode return '\n'.join([self.PrettyPrintNode(n) for n in node.childNodes]) File "./tools/metrics/histograms/../common/pretty_print_xml.py", line 279, in PrettyPrintNode for n in child_nodes] File "./tools/metrics/histograms/../common/pretty_print_xml.py", line 279, in PrettyPrintNode for n in child_nodes] File "./tools/metrics/histograms/../common/pretty_print_xml.py", line 221, in PrettyPrintNode missing_attributes_str, node_str)) Error: Missing attributes "separator" in tag "<histogram_suffixes name="ResourceSchedulerPeakDelayableRequestsInFlight">"
Message was sent while issue was closed.
Ah, pretty_print.py isn't a presubmit. But pretty_print.py fails now.
Message was sent while issue was closed.
On 2017/06/07 02:29:53, falken wrote: > Ah, pretty_print.py isn't a presubmit. But pretty_print.py fails now. https://codereview.chromium.org/2927773002/ fixed the pretty_print.py. Please let me know if it is still broken. falken, thanks for reporting this.
Message was sent while issue was closed.
On 2017/06/07 16:17:10, tbansal1 wrote: > On 2017/06/07 02:29:53, falken wrote: > > Ah, pretty_print.py isn't a presubmit. But pretty_print.py fails now. > > https://codereview.chromium.org/2927773002/ fixed the pretty_print.py. > Please let me know if it is still broken. falken, thanks for reporting > this. Looks fixed now. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
