|
|
Created:
5 years, 7 months ago by Anand Mistry (off Chromium) Modified:
5 years, 7 months ago CC:
chromium-reviews, cbentzel+watch_chromium.org, asvitkine+watch_chromium.org, chrome-apps-syd-reviews_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionAdd some basic UMA for proxy service.
There are four metrics for proxy resolution:
1. ResolvedUsingScript is whether a script was executed to resolve the URL.
2. ResolveTime is the total time taken to resolve a URL, including retries
due to resolver crashes. This is regardless of whether or not the script
was executed.
3. GetProxyUsingScriptTime is the time taken to resolve a URL when executing
the PAC script.
4. ScriptTerminated is whether or not a resolution caused the resolver to crash.
Having this as a boolean on total resolutions indicates how crashy the
out-of-process resolver is.
BUG=467832
Committed: https://crrev.com/ae468b799357272fce56c9bc0bb0d7a2444549b4
Cr-Commit-Position: refs/heads/master@{#329539}
Patch Set 1 #
Total comments: 4
Patch Set 2 : Change metrics. #Patch Set 3 : Formatting. #
Total comments: 4
Patch Set 4 : Move metrics to DidFinishResolvingProxy #Patch Set 5 : Formatting. #
Total comments: 6
Patch Set 6 : Fix histogram descriptions. #
Total comments: 4
Patch Set 7 : Tweak wording. #Patch Set 8 : More tweaking. #
Total comments: 10
Patch Set 9 : Change wording. #
Total comments: 2
Patch Set 10 : Tweak summary and rebase. #
Messages
Total messages: 21 (4 generated)
amistry@chromium.org changed reviewers: + eroman@chromium.org
I'll add histograms owners if you're ok with these metrics.
There are several different concepts of resolve time, and I don't think this CL is capturing the right things. At a high level I think we want: (a) The total time spent in ProxyService::ResolveProxy(). This would include any time spent waiting for the proxy settings, downloading the PAC script, initializing the proxy resolver with the PAC script. * This will be a good indication of the overall impact, as it will include any startup time for the process, as well as the averages weighted by how often it is used. (b) The time spent solely in the ProxyResolver::GetProxyForURL() * This is a good microbenchmark style measurement, which will show in the stead state how quickly requests are serviced inprocess vs out-of-process (i.e. once the process has been initialized with pac script). The only problem is command line flags like winhttp-proxy-resolver will skew the meaning of these results, but that is probably used infrequently. (c) The fraction of requests that went through PAC * This helps to interpret differences between (a) and (b). Please also be aware that PacRequest may be restarted. In cases where the proxy configuration changes while requests are outstanding on the ProxyResolver, CancelResolveJob() will be called, and then on resumption PacRequest::StartAndCompleteCheckingForSynchronous() is called. https://codereview.chromium.org/1121823004/diff/1/net/proxy/proxy_service.cc File net/proxy/proxy_service.cc (right): https://codereview.chromium.org/1121823004/diff/1/net/proxy/proxy_service.cc#... net/proxy/proxy_service.cc:894: base::TimeDelta::FromSeconds(10), 50); I don't think the upper range is high enough. I suggest 20 seconds (not much harm in increasing, since the buckets are spaced logarithmically). Also I don't know that we can rely on the tick granularity being better than 1ms https://codereview.chromium.org/1121823004/diff/1/net/proxy/proxy_service.cc#... net/proxy/proxy_service.cc:1080: UMA_HISTOGRAM_BOOLEAN("Net.ProxyService.ResolvedSynchronously", This is incomplete. Requests can also be completed synchronously via: TryToCompleteSynchronously
I've changed the metrics, so hopefully this is a better reflection of the time being spent. For reference, I loaded up a complex website using a complex PAC script and here are the metrics recorded (from about:histograms): Histogram: Net.ProxyService.GetProxyUsingScriptTime recorded 348 samples, average = 161.6 (flags = 0x1) 0 ... 3 ---------------------------------O (10 = 2.9%) {0.0%} 4 ------------------------------------------------------------------------O (22 = 6.3%) {2.9%} 5 ------------------------------------------------------------------------O (22 = 6.3%) {9.2%} 6 -------------------------------------------O (13 = 3.7%) {15.5%} 7 ---------------------------------O (10 = 2.9%) {19.3%} 8 -----------------------------------------O (25 = 7.2%) {22.1%} 10 ----------------------------O (17 = 4.9%) {29.3%} 12 ---------------O (14 = 4.0%) {34.2%} 15 ------------O (11 = 3.2%) {38.2%} 18 --------O (10 = 2.9%) {41.4%} 22 ------O (9 = 2.6%) {44.3%} 27 ---------O (14 = 4.0%) {46.8%} 33 -----O (8 = 2.3%) {50.9%} 40 ---O (4 = 1.1%) {53.2%} 48 ---O (4 = 1.1%) {54.3%} 58 --O (3 = 0.9%) {55.5%} 70 -O (1 = 0.3%) {56.3%} 85 -O (2 = 0.6%) {56.6%} 103 O (0 = 0.0%) {57.2%} 124 ---O (5 = 1.4%) {57.2%} 150 ----------O (16 = 4.6%) {58.6%} 181 ---------------------O (32 = 9.2%) {63.2%} 218 -------------------O (29 = 8.3%) {72.4%} 263 -----------O (17 = 4.9%) {80.7%} 317 ---------O (13 = 3.7%) {85.6%} 383 ------O (9 = 2.6%) {89.4%} 462 --O (3 = 0.9%) {92.0%} 558 -------O (11 = 3.2%) {92.8%} 674 -----O (7 = 2.0%) {96.0%} 814 -O (2 = 0.6%) {98.0%} 983 O (0 = 0.0%) {98.6%} 1187 -O (1 = 0.3%) {98.6%} 1433 -O (1 = 0.3%) {98.9%} 1730 --O (3 = 0.9%) {99.1%} 2088 ... Histogram: Net.ProxyService.ResolveProxyTime recorded 397 samples, average = 165.4 (flags = 0x1) 0 ------------------------------------------------------------------------O (48 = 12.1%) 1 --O (1 = 0.3%) {12.1%} 2 O (0 = 0.0%) {12.3%} 3 ------------O (8 = 2.0%) {12.3%} 4 ---------------------------------O (22 = 5.5%) {14.4%} 5 ---------------------------------O (22 = 5.5%) {19.9%} 6 --------------------O (13 = 3.3%) {25.4%} 7 ------------------O (12 = 3.0%) {28.7%} 8 -------------------O (25 = 6.3%) {31.7%} 10 -------------O (17 = 4.3%) {38.0%} 12 -------O (14 = 3.5%) {42.3%} 15 -----O (11 = 2.8%) {45.8%} 18 ----O (10 = 2.5%) {48.6%} 22 ---O (9 = 2.3%) {51.1%} 27 ----O (14 = 3.5%) {53.4%} 33 --O (8 = 2.0%) {56.9%} 40 -O (4 = 1.0%) {58.9%} 48 -O (4 = 1.0%) {59.9%} 58 -O (3 = 0.8%) {61.0%} 70 O (0 = 0.0%) {61.7%} 85 -O (2 = 0.5%) {61.7%} 103 O (0 = 0.0%) {62.2%} 124 -O (3 = 0.8%) {62.2%} 150 -----O (16 = 4.0%) {63.0%} 181 --------O (26 = 6.5%) {67.0%} 218 -------O (22 = 5.5%) {73.6%} 263 ----O (12 = 3.0%) {79.1%} 317 ---O (10 = 2.5%) {82.1%} 383 ----O (13 = 3.3%) {84.6%} 462 ---O (9 = 2.3%) {87.9%} 558 -----O (16 = 4.0%) {90.2%} 674 ----O (12 = 3.0%) {94.2%} 814 --O (6 = 1.5%) {97.2%} 983 O (0 = 0.0%) {98.7%} 1187 O (1 = 0.3%) {98.7%} 1433 O (1 = 0.3%) {99.0%} 1730 -O (3 = 0.8%) {99.2%} 2088 ... Histogram: Net.ProxyService.ResolvedUsingScript recorded 394 samples, average = 0.9 (flags = 0x1) 0 ----------O (46 = 11.7%) 1 ------------------------------------------------------------------------O (348 = 88.3%) {11.7%} 2 O (0 = 0.0%) {100.0%} Histogram: Net.ProxyService.ScriptTerminated recorded 351 samples, average = 0.0 (flags = 0x1) 0 ------------------------------------------------------------------------O (348 = 99.1%) 1 -O (3 = 0.9%) {99.1%} 2 O (0 = 0.0%) {100.0%} I killed the PAC process 3 times using the task manager. 46 requests did not go through the PAC script. The spike centered at 181 (and the long tail) is presumably DNS queries. https://codereview.chromium.org/1121823004/diff/1/net/proxy/proxy_service.cc File net/proxy/proxy_service.cc (right): https://codereview.chromium.org/1121823004/diff/1/net/proxy/proxy_service.cc#... net/proxy/proxy_service.cc:894: base::TimeDelta::FromSeconds(10), 50); On 2015/05/05 23:51:28, eroman wrote: > I don't think the upper range is high enough. > I suggest 20 seconds (not much harm in increasing, since the buckets are spaced > logarithmically). Done. > > Also I don't know that we can rely on the tick granularity being better than 1ms According to time.h, the granularity should be no worse than 1us if there's a high-resolution timer. Mac and POSIX (Linux/ChromeOS) have this. On windows, ~68% of users have it according to a comment in time_win.cc. https://codereview.chromium.org/1121823004/diff/1/net/proxy/proxy_service.cc#... net/proxy/proxy_service.cc:1080: UMA_HISTOGRAM_BOOLEAN("Net.ProxyService.ResolvedSynchronously", On 2015/05/05 23:51:29, eroman wrote: > This is incomplete. Requests can also be completed synchronously via: > TryToCompleteSynchronously I've changed these around. I hope they're better.
https://codereview.chromium.org/1121823004/diff/40001/net/proxy/proxy_service.cc File net/proxy/proxy_service.cc (right): https://codereview.chromium.org/1121823004/diff/40001/net/proxy/proxy_service... net/proxy/proxy_service.cc:1100: UMA_HISTOGRAM_BOOLEAN("Net.ProxyService.ResolvedUsingScript", false); This is split between multiple locations. Please put the logging at a single choke point, namely: DidFinishResolvingProxy() This does mean that DidFinishResolvingProxy() will need to be augmented with a parameter for the start time, and whether the completion was synchronous. However it is better to just have one place where the completion logic is applied. https://codereview.chromium.org/1121823004/diff/40001/net/proxy/proxy_service... net/proxy/proxy_service.cc:1108: UMA_HISTOGRAM_BOOLEAN("Net.ProxyService.ResolvedUsingScript", false); Remove the logging here. If there was no callback it means resolving will not be attempted. I believe this is being used to implement TryResolveProxySynchronously()
https://codereview.chromium.org/1121823004/diff/40001/net/proxy/proxy_service.cc File net/proxy/proxy_service.cc (right): https://codereview.chromium.org/1121823004/diff/40001/net/proxy/proxy_service... net/proxy/proxy_service.cc:1100: UMA_HISTOGRAM_BOOLEAN("Net.ProxyService.ResolvedUsingScript", false); On 2015/05/08 21:12:39, eroman wrote: > This is split between multiple locations. > Please put the logging at a single choke point, namely: > DidFinishResolvingProxy() > > This does mean that DidFinishResolvingProxy() will need to be augmented with a > parameter for the start time, and whether the completion was synchronous. > However it is better to just have one place where the completion logic is > applied. Done. https://codereview.chromium.org/1121823004/diff/40001/net/proxy/proxy_service... net/proxy/proxy_service.cc:1108: UMA_HISTOGRAM_BOOLEAN("Net.ProxyService.ResolvedUsingScript", false); On 2015/05/08 21:12:39, eroman wrote: > Remove the logging here. > If there was no callback it means resolving will not be attempted. > I believe this is being used to implement TryResolveProxySynchronously() Done.
lgtm thanks for making those modifications, I find this last version more understandable! https://codereview.chromium.org/1121823004/diff/80001/net/proxy/proxy_service.cc File net/proxy/proxy_service.cc (right): https://codereview.chromium.org/1121823004/diff/80001/net/proxy/proxy_service... net/proxy/proxy_service.cc:871: creation_time_, script_executed); Creation time of the Request may not be the same as when the ProxyService::Resolve started. Pretty similar to the issue in bug 484902, so maybe fine to leave it to that to solve (by passing in the start time as a dependency to Request rather than using Now() in its constructor) https://codereview.chromium.org/1121823004/diff/80001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/1121823004/diff/80001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:20352: + and DNS queries, but does not include retries due to errors and changing it "may" include time due to changing configuration. For instance: (1) start 4 PAC requests (2) ProxyConfigService notifies ProxyService of a configuration change (3) The PAC requests are cancelled (4) The new configuration is applied (maybe involves downloading a PAC script and re-creating a proxy resolver) (5) The Request is re-submitted to the new ProxyResolver (6) It completes, and logs its time relative to Request::creation_time_ I think the metric you are logging is correct and useful -- that it includes the extra time for re-fetching is good so it is directly comparable to ResolveProxyTime. My only concern is with the phrasing of this description. I suggest making it more like that of ResoleProxyTime https://codereview.chromium.org/1121823004/diff/80001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:20368: + The total time taken to resolve a URL. THis includes PAC script execution, typo: THis --> This
amistry@chromium.org changed reviewers: + mpearson@chromium.org
mpearson@chromium.org: For histograms.xml https://codereview.chromium.org/1121823004/diff/80001/net/proxy/proxy_service.cc File net/proxy/proxy_service.cc (right): https://codereview.chromium.org/1121823004/diff/80001/net/proxy/proxy_service... net/proxy/proxy_service.cc:871: creation_time_, script_executed); On 2015/05/11 23:42:40, eroman wrote: > Creation time of the Request may not be the same as when the > ProxyService::Resolve started. Pretty similar to the issue in bug 484902, so > maybe fine to leave it to that to solve (by passing in the start time as a > dependency to Request rather than using Now() in its constructor) Since ResolveProxyHelper creates this object, which sets the creation time in the constructor, I think the time difference between the two is going to be very small. On the order of microseconds, or less. https://codereview.chromium.org/1121823004/diff/80001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/1121823004/diff/80001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:20352: + and DNS queries, but does not include retries due to errors and changing On 2015/05/11 23:42:41, eroman wrote: > it "may" include time due to changing configuration. For instance: > > (1) start 4 PAC requests > (2) ProxyConfigService notifies ProxyService of a configuration change > (3) The PAC requests are cancelled > (4) The new configuration is applied (maybe involves downloading a PAC script > and re-creating a proxy resolver) > (5) The Request is re-submitted to the new ProxyResolver > (6) It completes, and logs its time relative to Request::creation_time_ > > I think the metric you are logging is correct and useful -- that it includes the > extra time for re-fetching is good so it is directly comparable to > ResolveProxyTime. > > My only concern is with the phrasing of this description. I suggest making it > more like that of ResoleProxyTime Done. https://codereview.chromium.org/1121823004/diff/80001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:20368: + The total time taken to resolve a URL. THis includes PAC script execution, On 2015/05/11 23:42:41, eroman wrote: > typo: THis --> This Done.
still lgtm https://codereview.chromium.org/1121823004/diff/100001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/1121823004/diff/100001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20352: + DNS queries, and retries. nit: retries is somewhat ambiguous. There is another mechanism for retries (ReconsiderProxyAfterError) which is not counted by this statistic (each attempt will be logged separately). Again I think what is being measured is fine, but the wording here might benefit from further tweaks. https://codereview.chromium.org/1121823004/diff/100001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20360: + resolver and executing the PAC script. nit: there are exceptions: if the --winhttp-proxy-resolver is used, then this will actually be measuring the time to evaluate proxy using WinHTTP when on windows, and surprisingly also changes the meaning on Mac (even though it doesn't use WinHTTP). Moreover when running in --single-process mode the meaning is equivalent. Perhaps not necessary to get into the full details, but I felt the need to point this out.
https://codereview.chromium.org/1121823004/diff/100001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/1121823004/diff/100001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20352: + DNS queries, and retries. On 2015/05/12 00:48:51, eroman wrote: > nit: retries is somewhat ambiguous. There is another mechanism for retries > (ReconsiderProxyAfterError) which is not counted by this statistic (each attempt > will be logged separately). Again I think what is being measured is fine, but > the wording here might benefit from further tweaks. I've changed the wording a bit to clarify that these are retries that happen automatically. Calling ReconsiderProxyAfterError is a user-initiated retry. From what I can tell, this metric captures all retries that happen without any user intervention. https://codereview.chromium.org/1121823004/diff/100001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20360: + resolver and executing the PAC script. On 2015/05/12 00:48:50, eroman wrote: > nit: there are exceptions: if the --winhttp-proxy-resolver is used, then this > will actually be measuring the time to evaluate proxy using WinHTTP when on > windows, and surprisingly also changes the meaning on Mac (even though it > doesn't use WinHTTP). Moreover when running in --single-process mode the meaning > is equivalent. > > Perhaps not necessary to get into the full details, but I felt the need to point > this out. I've added some wording to point this out, without getting into too many details.
lgtm
https://codereview.chromium.org/1121823004/diff/140001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/1121823004/diff/140001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20347: +<histogram name="Net.ProxyService.GetProxyUsingScriptTime"> units=? (here and elsewhere, though elsewhere some may be enum=) Perhaps milliseconds for this histogram. https://codereview.chromium.org/1121823004/diff/140001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20361: + resolver and executing the PAC script. Note that in some cases, the system Please say explicitly when this is recorded. (Does every URL request get passed through proxy resolution?) https://codereview.chromium.org/1121823004/diff/140001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20362: + proxy resolver is used instead and this counts the requests forwarded to the this? If so, this counts -> this histogram includes and perhaps is used -> is used internally https://codereview.chromium.org/1121823004/diff/140001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20363: + system proxy resolver. However, on desktop platforms, this should be a small this = ? the count of all emissions to the histogram? the count of all times the system proxy resolver is used? the count of all emissions to the histogram for the bucket ____? https://codereview.chromium.org/1121823004/diff/140001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20382: + out-of-process PAC. is this the only time it can occur?
https://codereview.chromium.org/1121823004/diff/140001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/1121823004/diff/140001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20347: +<histogram name="Net.ProxyService.GetProxyUsingScriptTime"> On 2015/05/12 04:21:42, Mark P wrote: > units=? > (here and elsewhere, though elsewhere some may be enum=) > > Perhaps milliseconds for this histogram. I think it's 100s of microseconds since it uses a custom time scale (us is too fine, ms is too coarse). Not exactly sure how to express this. I should mention that the timing histograms exist because we want to compare before/after, and not necessarily look at absolute values. https://codereview.chromium.org/1121823004/diff/140001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20361: + resolver and executing the PAC script. Note that in some cases, the system On 2015/05/12 04:21:42, Mark P wrote: > Please say explicitly when this is recorded. (Does every URL request get passed > through proxy resolution?) Done. https://codereview.chromium.org/1121823004/diff/140001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20362: + proxy resolver is used instead and this counts the requests forwarded to the On 2015/05/12 04:21:42, Mark P wrote: > this? > If so, > this counts > -> > this histogram includes > and perhaps > is used > -> > is used internally Done. https://codereview.chromium.org/1121823004/diff/140001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20363: + system proxy resolver. However, on desktop platforms, this should be a small On 2015/05/12 04:21:42, Mark P wrote: > this = ? the count of all emissions to the histogram? the count of all times > the system proxy resolver is used? the count of all emissions to the histogram > for the bucket ____? Done. https://codereview.chromium.org/1121823004/diff/140001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20382: + out-of-process PAC. On 2015/05/12 04:21:42, Mark P wrote: > is this the only time it can occur? Yes.
histograms.xml lgtm with one minor comment https://codereview.chromium.org/1121823004/diff/160001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/1121823004/diff/160001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20368: + bucket caused by users of the system resolver should be small. nit: after "bucket", add "on desktop" for emphasis.
https://codereview.chromium.org/1121823004/diff/160001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/1121823004/diff/160001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:20368: + bucket caused by users of the system resolver should be small. On 2015/05/12 19:44:50, Mark P wrote: > nit: after "bucket", add "on desktop" for emphasis. Done.
The CQ bit was checked by amistry@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from eroman@chromium.org, mpearson@chromium.org Link to the patchset: https://codereview.chromium.org/1121823004/#ps180001 (title: "Tweak summary and rebase.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1121823004/180001
Message was sent while issue was closed.
Committed patchset #10 (id:180001)
Message was sent while issue was closed.
Patchset 10 (id:??) landed as https://crrev.com/ae468b799357272fce56c9bc0bb0d7a2444549b4 Cr-Commit-Position: refs/heads/master@{#329539} |