Issue 2809453002: Add JobController/Job count to UMA histograms when the count hits limit

Zhongyi Shi

zhongyi@chromium.org changed reviewers: + rch@chromium.org

3 years, 8 months ago (2017-04-07 22:39:41 UTC) #1

Ryan Hamilton

lgtm One other metric we could consider logging would be the age of JobControllers. Imagine ...

3 years, 8 months ago (2017-04-07 22:51:05 UTC) #3

Zhongyi Shi

https://codereview.chromium.org/2809453002/diff/1/net/http/http_stream_factory_impl.cc File net/http/http_stream_factory_impl.cc (right): https://codereview.chromium.org/2809453002/diff/1/net/http/http_stream_factory_impl.cc#newcode380 net/http/http_stream_factory_impl.cc:380: main_job_count); On 2017/04/07 22:51:05, Ryan Hamilton wrote: > it's ...

3 years, 8 months ago (2017-04-07 23:43:03 UTC) #4

Zhongyi Shi

zhongyi@chromium.org changed reviewers: + asvitkine@chromium.org

3 years, 8 months ago (2017-04-08 00:42:02 UTC) #5

xunjieli

sorry, but I am going to push back on this implementation. This adds unnecessary code ...

3 years, 8 months ago (2017-04-10 13:22:54 UTC) #8

xunjieli

Description was changed from ========== Add JobController/Job count to UMA histograms everytime a new JobController ...

3 years, 8 months ago (2017-04-10 13:23:08 UTC) #9

xunjieli

xunjieli@chromium.org changed reviewers: + xunjieli@chromium.org

3 years, 8 months ago (2017-04-10 13:23:08 UTC) #10

Ryan Hamilton

On 2017/04/10 13:22:54, xunjieli wrote: > sorry, but I am going to push back on ...

3 years, 8 months ago (2017-04-10 13:37:22 UTC) #11

xunjieli

On 2017/04/10 13:37:22, Ryan Hamilton wrote: > On 2017/04/10 13:22:54, xunjieli wrote: > > sorry, ...

3 years, 8 months ago (2017-04-10 13:46:08 UTC) #12

Ryan Hamilton

rch@chromium.org changed reviewers: + cbentzel@chromium.org

3 years, 8 months ago (2017-04-10 14:23:00 UTC) #13

Ryan Hamilton

3 years, 8 months ago (2017-04-10 14:23:40 UTC) #14

+cbentzel

On 2017/04/10 13:46:08, xunjieli wrote:
> On 2017/04/10 13:37:22, Ryan Hamilton wrote:
> > On 2017/04/10 13:22:54, xunjieli wrote:
> > > sorry, but I am going to push back on this implementation. This adds unnecessary
> > > code that needs to be run for *every* connection establishment in the network
> > > stack. I have been trying to remove dead/unnecessary code in these files for the
> > > past week, and I really don't want to see things added that obviously could be
> > > improved.
> >
> > We have had two major postmortems which were the result of the JobControllerSet
> > growing without bound. An action item from them was to collect metrics on the
> > size of the set so that we can understand if we're still growing without bound.
> > We need to do this.
> >
> > This CL adds a single histogram once per request. This does not seem like a ton
> > of overhead/useless code given the benefits.
> >
> > If you have a suggestion for an alternative implementation, I'm happy to pursue
> > that, but we need to not delay collecting these metrics unnecessarily.
>
> I am fully aware that there are two postmortems here. There're probably more
> that we do not know (e.g crbug.com/704956).

Exactly.

> The obvious alternative is to upload
> the counts using MetricsService's schedule upload. We already have our own
> uploader, NetworkMetricsProvider. Plumbing the counts is not trivial is not a
> good justification why we should not do it.

I strenuously disagree. Protecting our users is our most important
responsibility. Adding a single histogram's worth of technical debt is a small
price to pay for getting visibility here. I have been asking for this metric for
more than a month and we still do not have it. We should not block this CL
landing while we spend even more time trying to figure out a better way to
collect this data. Instead, we should start collecting this data while we try to
figure out a better way to do that.

xunjieli

3 years, 8 months ago (2017-04-10 14:28:28 UTC) #15

On 2017/04/10 14:23:40, Ryan Hamilton wrote:
> +cbentzel
>
> On 2017/04/10 13:46:08, xunjieli wrote:
> > On 2017/04/10 13:37:22, Ryan Hamilton wrote:
> > > On 2017/04/10 13:22:54, xunjieli wrote:
> > > > sorry, but I am going to push back on this implementation. This adds unnecessary
> > > > code that needs to be run for *every* connection establishment in the network
> > > > stack. I have been trying to remove dead/unnecessary code in these files for the
> > > > past week, and I really don't want to see things added that obviously could be
> > > > improved.
> > >
> > > We have had two major postmortems which were the result of the JobControllerSet
> > > growing without bound. An action item from them was to collect metrics on the
> > > size of the set so that we can understand if we're still growing without bound.
> > > We need to do this.
> > >
> > > This CL adds a single histogram once per request. This does not seem like a ton
> > > of overhead/useless code given the benefits.
> > >
> > > If you have a suggestion for an alternative implementation, I'm happy to pursue
> > > that, but we need to not delay collecting these metrics unnecessarily.
> >
> > I am fully aware that there are two postmortems here. There're probably more
> > that we do not know (e.g crbug.com/704956).
>
> Exactly.
>
> > The obvious alternative is to upload
> > the counts using MetricsService's schedule upload. We already have our own
> > uploader, NetworkMetricsProvider. Plumbing the counts is not trivial is not a
> > good justification why we should not do it.
>
> I strenuously disagree. Protecting our users is our most important
> responsibility. Adding a single histogram's worth of technical debt is a small
> price to pay for getting visibility here. I have been asking for this metric for
> more than a month and we still do not have it. We should not block this CL
> landing while we spend even more time trying to figure out a better way to
> collect this data. Instead, we should start collecting this data while we try to
> figure out a better way to do that.

I suggested an alternative approach on the linked bug. To illustrate how easy it
is, I wrote a CL in 5min https://codereview.chromium.org/2814473002/
I am not trying to delay things unnecessarily. I am just trying to figure out a
better approach. Things left in TODOs are often not addressed.
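
The trade-off debated above, recording a sample on every request versus reading the counts only when a metrics upload is scheduled, can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from either CL: `CountSnapshotProvider`, `Register`, `ProvideSnapshot`, `Sample` and any sample names passed in are hypothetical, and none of them correspond to the real MetricsService or NetworkMetricsProvider interfaces.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sample record; stands in for whatever the metrics system accepts.
struct Sample {
  std::string name;
  std::size_t value;
};

// Hypothetical pull-style provider: it holds callbacks that read live counts
// and invokes them only when a snapshot is requested (e.g. at upload time).
class CountSnapshotProvider {
 public:
  using CountGetter = std::function<std::size_t()>;

  // Registers a named callback that returns the current count.
  void Register(std::string name, CountGetter getter) {
    getters_.emplace_back(std::move(name), std::move(getter));
  }

  // Intended to run once per scheduled upload rather than once per request.
  std::vector<Sample> ProvideSnapshot() const {
    std::vector<Sample> out;
    for (const auto& entry : getters_)
      out.push_back({entry.first, entry.second()});
    return out;
  }

 private:
  std::vector<std::pair<std::string, CountGetter>> getters_;
};
```

In this sketch the request path does no metrics work at all; the cost is the plumbing needed to hand the provider a way to read the counts, which is the part the thread disagrees about.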

Ryan Hamilton

3 years, 8 months ago (2017-04-10 15:17:27 UTC) #16

On 2017/04/10 14:28:28, xunjieli wrote:
> On 2017/04/10 14:23:40, Ryan Hamilton wrote:
> > +cbentzel
> >
> > On 2017/04/10 13:46:08, xunjieli wrote:
> > > On 2017/04/10 13:37:22, Ryan Hamilton wrote:
> > > > On 2017/04/10 13:22:54, xunjieli wrote:
> > > > > sorry, but I am going to push back on this implementation. This adds unnecessary
> > > > > code that needs to be run for *every* connection establishment in the network
> > > > > stack. I have been trying to remove dead/unnecessary code in these files for the
> > > > > past week, and I really don't want to see things added that obviously could be
> > > > > improved.
> > > >
> > > > We have had two major postmortems which were the result of the JobControllerSet
> > > > growing without bound. An action item from them was to collect metrics on the
> > > > size of the set so that we can understand if we're still growing without bound.
> > > > We need to do this.
> > > >
> > > > This CL adds a single histogram once per request. This does not seem like a ton
> > > > of overhead/useless code given the benefits.
> > > >
> > > > If you have a suggestion for an alternative implementation, I'm happy to pursue
> > > > that, but we need to not delay collecting these metrics unnecessarily.
> > >
> > > I am fully aware that there are two postmortems here. There're probably more
> > > that we do not know (e.g crbug.com/704956).
> >
> > Exactly.
> >
> > > The obvious alternative is to upload
> > > the counts using MetricsService's schedule upload. We already have our own
> > > uploader, NetworkMetricsProvider. Plumbing the counts is not trivial is not a
> > > good justification why we should not do it.
> >
> > I strenuously disagree. Protecting our users is our most important
> > responsibility. Adding a single histogram's worth of technical debt is a small
> > price to pay for getting visibility here. I have been asking for this metric for
> > more than a month and we still do not have it. We should not block this CL
> > landing while we spend even more time trying to figure out a better way to
> > collect this data. Instead, we should start collecting this data while we try to
> > figure out a better way to do that.
>
> I suggested an alternative approach on the linked bug. To illustrate how easy it
> is, I wrote a CL in 5min https://codereview.chromium.org/2814473002/
> I am not trying to delay things unnecessarily. I am just trying to figure out a
> better approach. Things left in TODOs are often not addressed.

I'm on vacation today. I'll have to leave it to you all to sort out. Please make
sure something lands soon.

Zhongyi Shi

3 years, 8 months ago (2017-04-10 16:33:34 UTC) #17

On 2017/04/10 14:28:28, xunjieli wrote:
> On 2017/04/10 14:23:40, Ryan Hamilton wrote:
> > +cbentzel
> >
> > On 2017/04/10 13:46:08, xunjieli wrote:
> > > On 2017/04/10 13:37:22, Ryan Hamilton wrote:
> > > > On 2017/04/10 13:22:54, xunjieli wrote:
> > > > > sorry, but I am going to push back on this implementation. This adds unnecessary
> > > > > code that needs to be run for *every* connection establishment in the network
> > > > > stack. I have been trying to remove dead/unnecessary code in these files for the
> > > > > past week, and I really don't want to see things added that obviously could be
> > > > > improved.
> > > >
> > > > We have had two major postmortems which were the result of the JobControllerSet
> > > > growing without bound. An action item from them was to collect metrics on the
> > > > size of the set so that we can understand if we're still growing without bound.
> > > > We need to do this.
> > > >
> > > > This CL adds a single histogram once per request. This does not seem like a ton
> > > > of overhead/useless code given the benefits.
> > > >
> > > > If you have a suggestion for an alternative implementation, I'm happy to pursue
> > > > that, but we need to not delay collecting these metrics unnecessarily.
> > >
> > > I am fully aware that there are two postmortems here. There're probably more
> > > that we do not know (e.g crbug.com/704956).
> >
> > Exactly.
> >
> > > The obvious alternative is to upload
> > > the counts using MetricsService's schedule upload. We already have our own
> > > uploader, NetworkMetricsProvider. Plumbing the counts is not trivial is not a
> > > good justification why we should not do it.
> >
> > I strenuously disagree. Protecting our users is our most important
> > responsibility. Adding a single histogram's worth of technical debt is a small
> > price to pay for getting visibility here. I have been asking for this metric for
> > more than a month and we still do not have it. We should not block this CL
> > landing while we spend even more time trying to figure out a better way to
> > collect this data. Instead, we should start collecting this data while we try to
> > figure out a better way to do that.
>
> I suggested an alternative approach on the linked bug. To illustrate how easy it
> is, I wrote a CL in 5min https://codereview.chromium.org/2814473002/
> I am not trying to delay things unnecessarily. I am just trying to figure out a
> better approach. Things left in TODOs are often not addressed.

Helen, I was fully aware of, and did consider, using the
NetworkMetricsProvider. However, currently NetworkMetricsProvider schedules
histogram upload for data from NetworkChangeNotifier or existing histograms.
The MetricsObserver you added in https://codereview.chromium.org/2814473002/
doesn't seem appropriate to attach to NetworkChangeNotifier, as
"NetworkChangeNotifier monitors the system for network changes, and notifies
registered observers of those events". rch@ and I talked about this earlier, and
the plan was to land the current approach in M59 and sort out the upload
in a later CL. WDYT?

Zhongyi Shi

Helen, changed to the alternative approach discussed, ptal.

3 years, 8 months ago (2017-04-10 22:38:10 UTC) #18

xunjieli

Looks great. Thanks for doing this. A few comments below. https://codereview.chromium.org/2809453002/diff/60001/net/http/http_stream_factory_impl.cc File net/http/http_stream_factory_impl.cc (right): https://codereview.chromium.org/2809453002/diff/60001/net/http/http_stream_factory_impl.cc#newcode367 ...

3 years, 8 months ago (2017-04-10 23:13:21 UTC) #19

Zhongyi Shi

Thanks Helen, ptal https://codereview.chromium.org/2809453002/diff/60001/net/http/http_stream_factory_impl.cc File net/http/http_stream_factory_impl.cc (right): https://codereview.chromium.org/2809453002/diff/60001/net/http/http_stream_factory_impl.cc#newcode367 net/http/http_stream_factory_impl.cc:367: return; On 2017/04/10 23:13:21, xunjieli wrote: ...

3 years, 8 months ago (2017-04-10 23:30:13 UTC) #20

xunjieli

lgtm https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_factory_impl.h File net/http/http_stream_factory_impl.h (right): https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_factory_impl.h#newcode175 net/http/http_stream_factory_impl.h:175: // the count hits the limit: 100, 200, ...

3 years, 8 months ago (2017-04-10 23:49:14 UTC) #21

xunjieli

https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_factory_impl.h File net/http/http_stream_factory_impl.h (right): https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_factory_impl.h#newcode175 net/http/http_stream_factory_impl.h:175: // the count hits the limit: 100, 200, 400, ...

3 years, 8 months ago (2017-04-11 00:14:22 UTC) #23

Zhongyi Shi

https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_factory_impl.h File net/http/http_stream_factory_impl.h (right): https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_factory_impl.h#newcode175 net/http/http_stream_factory_impl.h:175: // the count hits the limit: 100, 200, 400, ...

3 years, 8 months ago (2017-04-11 00:21:22 UTC) #24

xunjieli

On 2017/04/11 00:21:22, Zhongyi Shi wrote: > https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_factory_impl.h > File net/http/http_stream_factory_impl.h (right): > > https://codereview.chromium.org/2809453002/diff/100001/net/http/http_stream_factory_impl.h#newcode175 ...

3 years, 8 months ago (2017-04-11 00:23:10 UTC) #25

Zhongyi Shi

Description was changed from ========== Add JobController/Job count to UMA histograms everytime a new JobController ...

3 years, 8 months ago (2017-04-11 00:27:53 UTC) #26

Zhongyi Shi

The patchset sent to the CQ was uploaded after l-g-t-m from rch@chromium.org Link to the ...

3 years, 8 months ago (2017-04-11 16:48:41 UTC) #29

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2809453002/140001

3 years, 8 months ago (2017-04-11 16:49:09 UTC) #30

commit-bot: I haz the power

CQ is committing da patch. Bot data: {"patchset_id": 140001, "attempt_start_ts": 1491929321320020, "parent_rev": "2fb66ab65463f429142b4d259f64728bbdae4bc3", "commit_rev": "620855b6c99bb476fc87b67f9329b8d3724ea031"}

3 years, 8 months ago (2017-04-11 17:57:35 UTC) #31

commit-bot: I haz the power

Description was changed from ========== Add JobController/Job count to UMA histograms when the count is ...

3 years, 8 months ago (2017-04-11 17:58:37 UTC) #32

commit-bot: I haz the power

Committed patchset #8 (id:140001) as https://chromium.googlesource.com/chromium/src/+/620855b6c99bb476fc87b67f9329b8d3724ea031

3 years, 8 months ago (2017-04-11 17:58:38 UTC) #33

Ryan Hamilton

3 years, 8 months ago (2017-04-12 04:04:47 UTC) #34

Message was sent while issue was closed.

On 2017/04/11 17:58:38, commit-bot: I haz the power wrote:
> Committed patchset #8 (id:140001) as
> https://chromium.googlesource.com/chromium/src/+/620855b6c99bb476fc87b67f9329...

Thanks Helen and Cherie for getting this landed. I can't wait to see what the
metrics show!

Issue 2809453002: Add JobController/Job count to UMA histograms when the count hits limit (Closed)

Patch Set 1

Patch Set 2: add to a new histogram

Patch Set 3: Change to only log the histograms when the count hits 200, 400, etc

Patch Set 4: change a comment

Patch Set 5: fix comments

Patch Set 6: change the boundary to a multiple of 100

Patch Set 7: address #21

Patch Set 8: address #23
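
The approach the later patch sets describe, logging the count only when it reaches a boundary instead of on every request, might look roughly like the sketch below. It is an illustration under stated assumptions rather than the landed code: `FakeHistogram`, `HitsBoundary`, `MaybeLogControllerCount` and `kBoundary` are hypothetical names, the in-memory recorder stands in for a real histogram macro, and the boundary of 100 is taken from the Patch Set 6 summary ("a multiple of 100").

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a histogram; it just keeps samples in memory.
struct FakeHistogram {
  std::vector<std::size_t> samples;
  void Record(std::size_t sample) { samples.push_back(sample); }
};

// Returns true when `count` is a nonzero multiple of `boundary`.
bool HitsBoundary(std::size_t count, std::size_t boundary) {
  assert(boundary > 0);
  return count != 0 && count % boundary == 0;
}

// Intended to be called after a new controller is added. Records the count
// only at boundaries, so the common path is a single modulo check.
void MaybeLogControllerCount(std::size_t count, FakeHistogram& histogram) {
  constexpr std::size_t kBoundary = 100;  // assumed from the patch set summary
  if (HitsBoundary(count, kBoundary))
    histogram.Record(count);
}
```

Feeding the counts 1 through 350 into `MaybeLogControllerCount` would record three samples: 100, 200 and 300. A set that stays small never records anything, which keeps per-request overhead low while still surfacing unbounded growth.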
