|
|
Chromium Code Reviews|
Created:
3 years, 8 months ago by Zhongyi Shi Modified:
3 years, 8 months ago CC:
chromium-reviews, cbentzel+watch_chromium.org, net-reviews_chromium.org Target Ref:
refs/heads/master Project:
chromium Visibility:
Public. |
DescriptionAdd JobController/Job count to UMA histograms when the count is a multiple of 100.
BUG=704988
Review-Url: https://codereview.chromium.org/2809453002
Cr-Commit-Position: refs/heads/master@{#463682}
Committed: https://chromium.googlesource.com/chromium/src/+/620855b6c99bb476fc87b67f9329b8d3724ea031
Patch Set 1 #
Total comments: 3
Patch Set 2 : add to a new histogram #Patch Set 3 : Change to only log the histograms when the count hits 200, 400, etc #Patch Set 4 : change a comment #
Total comments: 6
Patch Set 5 : fix comments #Patch Set 6 : change the boundary to a multiple of 100 #
Total comments: 4
Patch Set 7 : address #21 #Patch Set 8 : address #23 #
Messages
Total messages: 34 (10 generated)
zhongyi@chromium.org changed reviewers: + rch@chromium.org
lgtm One other metric we could consider logging would be the age of JobControllers. Imagine if each controller knew when it was created. Then we could log in UMA the age of the oldest controller in the set. That'd probably go a long way to show leaks. (Or perhaps the number of jobs more than 1 minute old, and or 5 minutes old, etc) https://codereview.chromium.org/2809453002/diff/1/net/http/http_stream_factor... File net/http/http_stream_factory_impl.cc (right): https://codereview.chromium.org/2809453002/diff/1/net/http/http_stream_factor... net/http/http_stream_factory_impl.cc:362: for (const auto& it : job_controller_set_) { nit: I think "it" is not an iterator, but rather a JobController, right? perhaps controller would be a better name? https://codereview.chromium.org/2809453002/diff/1/net/http/http_stream_factor... net/http/http_stream_factory_impl.cc:380: main_job_count); it's a bit of a bummer that we have to compute these numbers each time. I don't suppose there's any obvious way to cache/precomute this? We could easily simply log the size of job_controller_set_. I guess the controller could also call some method on the factory when it creates/deletes a job, but that's probably not terribly fool proof. Well, this definitely seems like a step forward, but its worth thinking about this.
https://codereview.chromium.org/2809453002/diff/1/net/http/http_stream_factor... File net/http/http_stream_factory_impl.cc (right): https://codereview.chromium.org/2809453002/diff/1/net/http/http_stream_factor... net/http/http_stream_factory_impl.cc:380: main_job_count); On 2017/04/07 22:51:05, Ryan Hamilton wrote: > it's a bit of a bummer that we have to compute these numbers each time. I don't > suppose there's any obvious way to cache/precomute this? > > We could easily simply log the size of job_controller_set_. I guess the > controller could also call some method on the factory when it creates/deletes a > job, but that's probably not terribly fool proof. > > Well, this definitely seems like a step forward, but its worth thinking about > this. Well, if we want to improve: I guess one way we could do is to use maps to hold job controllers and differentiate job controllers by its status. We could separate the HttpStreamFactoryImpl::job_controller_set_: 1. normal_job_controllers: JobController has main and alt job, and JobController that only no alt job from the beginning. 2. preconnect_job_controllers: JobController that is used for preconnect 3. job_controllers_with_pending_main_job: JobController that used to have 2 jobs, but now only main job 4. job_controllers_with_pending_alt_job: JobController that used to have 2 jobs, but now only alt job When alt job or main job finishes, we need to move the JobController around to the correct map. 3 and 4 are what we care more I think. WDYT?
zhongyi@chromium.org changed reviewers: + asvitkine@chromium.org
Ryan and Alexei: ptal!
lgtm
sorry, but I am going to push back on this implementation. This adds unnecessary code that needs to be run for *every* connection establishment in the network stack. I have been trying to remove dead/unnecessary code in these files for the past week, and I really don't want to see things added that obviously could be improved.
Description was changed from ========== Add JobController/Job count to UMA histograms everytime a new JobController is created BUG=704988 ========== to ========== Add JobController/Job count to UMA histograms everytime a new JobController is created BUG=704988 ==========
xunjieli@chromium.org changed reviewers: + xunjieli@chromium.org
On 2017/04/10 13:22:54, xunjieli wrote: > sorry, but I am going to push back on this implementation. This adds unnecessary > code that needs to be run for *every* connection establishment in the network > stack. I have been trying to remove dead/unnecessary code in these files for the > past week, and I really don't want to see things added that obviously could be > improved. We have had two major postmortems which were the result of the JobControllerSet growing without bound. An action item from them was to collect metrics on the size of the set so that we can understand if we're still growing without bound. We need to do this. This CL adds a single histogram once per request. This does not seem like a ton of overhead/useless code given the benefits. If you have a suggestion for an alternative implementation, I'm happy to pursue that, but we need to not delay collecting these metrics unnecessarily.
On 2017/04/10 13:37:22, Ryan Hamilton wrote: > On 2017/04/10 13:22:54, xunjieli wrote: > > sorry, but I am going to push back on this implementation. This adds > unnecessary > > code that needs to be run for *every* connection establishment in the network > > stack. I have been trying to remove dead/unnecessary code in these files for > the > > past week, and I really don't want to see things added that obviously could be > > improved. > > We have had two major postmortems which were the result of the JobControllerSet > growing without bound. An action item from them was to collect metrics on the > size of the set so that we can understand if we're still growing without bound. > We need to do this. > > This CL adds a single histogram once per request. This does not seem like a ton > of overhead/useless code given the benefits. > > If you have a suggestion for an alternative implementation, I'm happy to pursue > that, but we need to not delay collecting these metrics unnecessarily. I am fully aware that there are two postmortems here. There're probably more that we do not know (e.g crbug.com/704956). The obvious alternative is to upload the counts using MetricsService's schedule upload. We already have our own uploader, NetworkMetricsProvider. Plumbing the counts is not trivial is not a good justification why we should not do it.
rch@chromium.org changed reviewers: + cbentzel@chromium.org
+cbentzel On 2017/04/10 13:46:08, xunjieli wrote: > On 2017/04/10 13:37:22, Ryan Hamilton wrote: > > On 2017/04/10 13:22:54, xunjieli wrote: > > > sorry, but I am going to push back on this implementation. This adds > > unnecessary > > > code that needs to be run for *every* connection establishment in the > network > > > stack. I have been trying to remove dead/unnecessary code in these files for > > the > > > past week, and I really don't want to see things added that obviously could > be > > > improved. > > > > We have had two major postmortems which were the result of the > JobControllerSet > > growing without bound. An action item from them was to collect metrics on the > > size of the set so that we can understand if we're still growing without > bound. > > We need to do this. > > > > This CL adds a single histogram once per request. This does not seem like a > ton > > of overhead/useless code given the benefits. > > > > If you have a suggestion for an alternative implementation, I'm happy to > pursue > > that, but we need to not delay collecting these metrics unnecessarily. > > I am fully aware that there are two postmortems here. There're probably more > that we do not know (e.g crbug.com/704956). Exactly. > The obvious alternative is to upload > the counts using MetricsService's schedule upload. We already have our own > uploader, NetworkMetricsProvider. Plumbing the counts is not trivial is not a > good justification why we should not do it. I strenuously disagree. Protecting our users is our most important responsibility. Adding a single histogram's worth of technical debt is a small price to pay for getting visibility here. I have been asking for this metric for more than a month and we still do not have it. We should not block this CL landing while we spend even more time trying to figure out a better way to collect this data. Instead, we should start collecting this data while we try to figure out a better way to do that.
On 2017/04/10 14:23:40, Ryan Hamilton wrote: > +cbentzel > > On 2017/04/10 13:46:08, xunjieli wrote: > > On 2017/04/10 13:37:22, Ryan Hamilton wrote: > > > On 2017/04/10 13:22:54, xunjieli wrote: > > > > sorry, but I am going to push back on this implementation. This adds > > > unnecessary > > > > code that needs to be run for *every* connection establishment in the > > network > > > > stack. I have been trying to remove dead/unnecessary code in these files > for > > > the > > > > past week, and I really don't want to see things added that obviously > could > > be > > > > improved. > > > > > > We have had two major postmortems which were the result of the > > JobControllerSet > > > growing without bound. An action item from them was to collect metrics on > the > > > size of the set so that we can understand if we're still growing without > > bound. > > > We need to do this. > > > > > > This CL adds a single histogram once per request. This does not seem like a > > ton > > > of overhead/useless code given the benefits. > > > > > > If you have a suggestion for an alternative implementation, I'm happy to > > pursue > > > that, but we need to not delay collecting these metrics unnecessarily. > > > > I am fully aware that there are two postmortems here. There're probably more > > that we do not know (e.g crbug.com/704956). > > Exactly. > > > The obvious alternative is to upload > > the counts using MetricsService's schedule upload. We already have our own > > uploader, NetworkMetricsProvider. Plumbing the counts is not trivial is not a > > good justification why we should not do it. > > I strenuously disagree. Protecting our users is our most important > responsibility. Adding a single histogram's worth of technical debt is a small > price to pay for getting visibility here. I have been asking for this metric for > more than a month and we still do not have it. We should not block this CL > landing while we spend even more time trying to figure out a better way to > collect this data. Instead, we should start collecting this data while we try to > figure out a better way to do that. I suggested an alternative approach on the linked bug. To illustrate how easy it is, I wrote a CL in 5min https://codereview.chromium.org/2814473002/ I am not trying to delay things unnecessarily. I am just trying to figure out a better approach. Things left in TODOs are often not addressed.
On 2017/04/10 14:28:28, xunjieli wrote: > On 2017/04/10 14:23:40, Ryan Hamilton wrote: > > +cbentzel > > > > On 2017/04/10 13:46:08, xunjieli wrote: > > > On 2017/04/10 13:37:22, Ryan Hamilton wrote: > > > > On 2017/04/10 13:22:54, xunjieli wrote: > > > > > sorry, but I am going to push back on this implementation. This adds > > > > unnecessary > > > > > code that needs to be run for *every* connection establishment in the > > > network > > > > > stack. I have been trying to remove dead/unnecessary code in these files > > for > > > > the > > > > > past week, and I really don't want to see things added that obviously > > could > > > be > > > > > improved. > > > > > > > > We have had two major postmortems which were the result of the > > > JobControllerSet > > > > growing without bound. An action item from them was to collect metrics on > > the > > > > size of the set so that we can understand if we're still growing without > > > bound. > > > > We need to do this. > > > > > > > > This CL adds a single histogram once per request. This does not seem like > a > > > ton > > > > of overhead/useless code given the benefits. > > > > > > > > If you have a suggestion for an alternative implementation, I'm happy to > > > pursue > > > > that, but we need to not delay collecting these metrics unnecessarily. > > > > > > I am fully aware that there are two postmortems here. There're probably more > > > that we do not know (e.g crbug.com/704956). > > > > Exactly. > > > > > The obvious alternative is to upload > > > the counts using MetricsService's schedule upload. We already have our own > > > uploader, NetworkMetricsProvider. Plumbing the counts is not trivial is not > a > > > good justification why we should not do it. > > > > I strenuously disagree. Protecting our users is our most important > > responsibility. Adding a single histogram's worth of technical debt is a small > > price to pay for getting visibility here. I have been asking for this metric > for > > more than a month and we still do not have it. We should not block this CL > > landing while we spend even more time trying to figure out a better way to > > collect this data. Instead, we should start collecting this data while we try > to > > figure out a better way to do that. > > I suggested an alternative approach on the linked bug. To illustrate how easy it > is, I wrote a CL in 5min https://codereview.chromium.org/2814473002/ > I am not trying to delay things unnecessarily. I am just trying to figure out a > better approach. Things left in TODOs are often not addressed. I'm on vacation today. I'l have to leave it to you all to sort out. Please make sure something lands soon.
On 2017/04/10 14:28:28, xunjieli wrote: > On 2017/04/10 14:23:40, Ryan Hamilton wrote: > > +cbentzel > > > > On 2017/04/10 13:46:08, xunjieli wrote: > > > On 2017/04/10 13:37:22, Ryan Hamilton wrote: > > > > On 2017/04/10 13:22:54, xunjieli wrote: > > > > > sorry, but I am going to push back on this implementation. This adds > > > > unnecessary > > > > > code that needs to be run for *every* connection establishment in the > > > network > > > > > stack. I have been trying to remove dead/unnecessary code in these files > > for > > > > the > > > > > past week, and I really don't want to see things added that obviously > > could > > > be > > > > > improved. > > > > > > > > We have had two major postmortems which were the result of the > > > JobControllerSet > > > > growing without bound. An action item from them was to collect metrics on > > the > > > > size of the set so that we can understand if we're still growing without > > > bound. > > > > We need to do this. > > > > > > > > This CL adds a single histogram once per request. This does not seem like > a > > > ton > > > > of overhead/useless code given the benefits. > > > > > > > > If you have a suggestion for an alternative implementation, I'm happy to > > > pursue > > > > that, but we need to not delay collecting these metrics unnecessarily. > > > > > > I am fully aware that there are two postmortems here. There're probably more > > > that we do not know (e.g crbug.com/704956). > > > > Exactly. > > > > > The obvious alternative is to upload > > > the counts using MetricsService's schedule upload. We already have our own > > > uploader, NetworkMetricsProvider. Plumbing the counts is not trivial is not > a > > > good justification why we should not do it. > > > > I strenuously disagree. Protecting our users is our most important > > responsibility. Adding a single histogram's worth of technical debt is a small > > price to pay for getting visibility here. I have been asking for this metric > for > > more than a month and we still do not have it. We should not block this CL > > landing while we spend even more time trying to figure out a better way to > > collect this data. Instead, we should start collecting this data while we try > to > > figure out a better way to do that. > > I suggested an alternative approach on the linked bug. To illustrate how easy it > is, I wrote a CL in 5min https://codereview.chromium.org/2814473002/ > I am not trying to delay things unnecessarily. I am just trying to figure out a > better approach. Things left in TODOs are often not addressed. Helen, I was totally aware and did considered that we could use the NetworkMetricsProvider. However, currently NetworkMetricsProvider schedules histogram upload for data from NetworkChangeNotifier or existing histograms. MetricsObserver you added in https://codereview.chromium.org/2814473002/ doesn't seem to be appropriate to go with NetworkChangeNotifier, as "NetworkChangeNotifier monitors the system for network changes, and notifies registered observers of those events". rch@ and I talked about this earlier, and were thinking landing the current approach in M59 while sorting out the upload in a latter CL was the plan. WDYT?
Helen, change to the alternative approach discussed, ptal.
Looks great. Thanks for doing this. A few comments below. https://codereview.chromium.org/2809453002/diff/60001/net/http/http_stream_fa... File net/http/http_stream_factory_impl.cc (right): https://codereview.chromium.org/2809453002/diff/60001/net/http/http_stream_fa... net/http/http_stream_factory_impl.cc:367: return; Line 362 to 367 can be written as: if (job_controller_set.size_() / 100 == 0 || job_controller_set.size() % 100 != 0) return; so we have one early return. https://codereview.chromium.org/2809453002/diff/60001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2809453002/diff/60001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:34694: + This counts number of all the uncleaned job controllers when it hits limit nit: consider replace "uncleaned." I haven't seen it used as a verb. One suggestion: This reports the number of live job controllers when that number is a multiple of 100. https://codereview.chromium.org/2809453002/diff/60001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:34703: + This counts number of all the uncleaned job controllers that are sill alive. nit: there is a typo in this sentence. Could get rid of the word "uncleaned"
Thanks Helen, ptal https://codereview.chromium.org/2809453002/diff/60001/net/http/http_stream_fa... File net/http/http_stream_factory_impl.cc (right): https://codereview.chromium.org/2809453002/diff/60001/net/http/http_stream_fa... net/http/http_stream_factory_impl.cc:367: return; On 2017/04/10 23:13:21, xunjieli wrote: > Line 362 to 367 can be written as: > > > if (job_controller_set.size_() / 100 == 0 || job_controller_set.size() % 100 != > 0) > return; > > > so we have one early return. Changed the limit to a multiple of 100. https://codereview.chromium.org/2809453002/diff/60001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2809453002/diff/60001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:34694: + This counts number of all the uncleaned job controllers when it hits limit On 2017/04/10 23:13:21, xunjieli wrote: > nit: consider replace "uncleaned." I haven't seen it used as a verb. > > > One suggestion: > This reports the number of live job controllers when that number is a multiple > of 100. Done. https://codereview.chromium.org/2809453002/diff/60001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:34703: + This counts number of all the uncleaned job controllers that are sill alive. On 2017/04/10 23:13:21, xunjieli wrote: > nit: there is a typo in this sentence. > Could get rid of the word "uncleaned" Done.
lgtm https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_f... File net/http/http_stream_factory_impl.h (right): https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl.h:175: // the count hits the limit: 100, 200, 400, etc. Break down JobControllers nit: if the count are multiples of 100. https://codereview.chromium.org/2809453002/diff/100001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2809453002/diff/100001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:34703: + This counts number of all the job controllers that are sill alive. nit: s/sill/still
asvitkine@: ptal! xunijeli@: thanks for review.
https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_f... File net/http/http_stream_factory_impl.h (right): https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl.h:175: // the count hits the limit: 100, 200, 400, etc. Break down JobControllers On 2017/04/10 23:49:14, xunjieli wrote: > nit: if the count are multiples of 100. hmm.. I don't see this one in the new patch. Could you also s/Add/Adds?
https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_f... File net/http/http_stream_factory_impl.h (right): https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_f... net/http/http_stream_factory_impl.h:175: // the count hits the limit: 100, 200, 400, etc. Break down JobControllers On 2017/04/11 00:14:22, xunjieli wrote: > On 2017/04/10 23:49:14, xunjieli wrote: > > nit: if the count are multiples of 100. > > hmm.. I don't see this one in the new patch. > > Could you also s/Add/Adds? > > > Done. Whoops, I updated in histograms.xml and forgot here. Sorry about that.
On 2017/04/11 00:21:22, Zhongyi Shi wrote: > https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_f... > File net/http/http_stream_factory_impl.h (right): > > https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_f... > net/http/http_stream_factory_impl.h:175: // the count hits the limit: 100, 200, > 400, etc. Break down JobControllers > On 2017/04/11 00:14:22, xunjieli wrote: > > On 2017/04/10 23:49:14, xunjieli wrote: > > > nit: if the count are multiples of 100. > > > > hmm.. I don't see this one in the new patch. > > > > Could you also s/Add/Adds? > > > > > > > > Done. Whoops, I updated in histograms.xml and forgot here. Sorry about that. lgtm. Thanks!
Description was changed from ========== Add JobController/Job count to UMA histograms everytime a new JobController is created BUG=704988 ========== to ========== Add JobController/Job count to UMA histograms when the count is a multiple of 100. BUG=704988 ==========
lgtm
The CQ bit was checked by zhongyi@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from rch@chromium.org Link to the patchset: https://codereview.chromium.org/2809453002/#ps140001 (title: "address #23")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 140001, "attempt_start_ts": 1491929321320020,
"parent_rev": "2fb66ab65463f429142b4d259f64728bbdae4bc3", "commit_rev":
"620855b6c99bb476fc87b67f9329b8d3724ea031"}
Message was sent while issue was closed.
Description was changed from ========== Add JobController/Job count to UMA histograms when the count is a multiple of 100. BUG=704988 ========== to ========== Add JobController/Job count to UMA histograms when the count is a multiple of 100. BUG=704988 Review-Url: https://codereview.chromium.org/2809453002 Cr-Commit-Position: refs/heads/master@{#463682} Committed: https://chromium.googlesource.com/chromium/src/+/620855b6c99bb476fc87b67f9329... ==========
Message was sent while issue was closed.
Committed patchset #8 (id:140001) as https://chromium.googlesource.com/chromium/src/+/620855b6c99bb476fc87b67f9329...
Message was sent while issue was closed.
On 2017/04/11 17:58:38, commit-bot: I haz the power wrote: > Committed patchset #8 (id:140001) as > https://chromium.googlesource.com/chromium/src/+/620855b6c99bb476fc87b67f9329... Thanks Helen and Cherie for getting this landed. I can't wait to see what the metrics show! |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
