Chromium Code Reviews

Side by Side Diff: docs/speed/diagnostic_metrics.md

Issue 2973213002: Add documentation on authoring metrics. (Closed)
Patch Set: Created 3 years, 5 months ago
# Making Metrics Actionable with Diagnostic Metrics

[TOC]

We want our metrics to be reflective of user experience, so we know we’re optimizing for the right thing. However, metrics which accurately reflect user experience are often so high level that they aren’t very actionable. Diagnostic metrics are submetrics which enable us to act on our high level user experience metrics. See also the document on constructing a [good toplevel metric](good_toplevel_metrics.md) for guidance on creating high quality user experience metrics.
benjhayden 2017/07/13 17:35:38 Do you want to point readers to RelatedHistogramBr
tdresser 2017/07/20 16:16:38 Done.

There are three types of diagnostic metrics:

* Summations
* Slices
* Proxies

## Summation Diagnostics

We often notice that a number is Too Big. Whether it’s the time it took to generate a frame, or the time until a page was visible, the first thing we want to know is what’s contributing to the number.

Summations enable us to answer these questions. In a Summation diagnostic, the diagnostic metrics sum up to the higher level metric. For example, a Summation diagnostic for First Meaningful Paint (FMP) might be the durations the main thread spent doing various tasks, such as Style, Layout, V8, and Idle, before FMP fired. These diagnostics often lead to hierarchies, where the top level metric, such as FMP, has a diagnostic metric, such as time spent in V8 before FMP, which has further diagnostic metrics, such as the time spent parsing, compiling, or executing JS.
benjhayden 2017/07/13 17:35:38 "often to lead to"
tdresser 2017/07/20 16:16:38 Done.

With Summation diagnostics, the top level metric equals the sum of all diagnostic metrics. It’s **extremely important** that you don’t leave things out of a Summation diagnostic. This can seem a little daunting: how are you going to account for everything that contributes to the top level metric?

The best way to do this is to start with something you can easily measure, and also report the "unexplained time".

Suppose we're creating a Summation diagnostic for TimeFromNavStartToInteractive, and suppose we can easily time Idle and Script. So we report only those two (don’t do this!):

* TimeInScript: 800ms
* TimeInIdle: 300ms

You'd incorrectly conclude from this data that script is the problem, and focus on optimizing script. This would be a shame, because if you had reported unexplained time, the reality would become clearer:

* TimeInScript: 800ms
* TimeInIdle: 300ms
* Unexplained: 800ms

Here, it jumps out that there's time you haven't explained, and that you should explain it before you leap to conclusions.

So, start with a single pair of data:

1. a specific submetric that you're sure you can measure, and
2. a way to measure "the rest."

It might be that you start off just with:

1. TimeInScript
2. Unexplained == TimeToInteractive - TimeInScript

But at least when you do this, your "unexplained time" is jumping out at you. From there, your goal is to drive that number down to the 5%-ish range. Maybe on most pages, script is so huge that you get to 80% coverage. Great! Then you study a few pages with high "unexplained" time and figure out, "aha, this has a lot of idle time." So you add idle to your diagnostics, and maybe that gets you to 90% coverage. Repeat until you're happy enough.

Diagnostics are imperfect. You'll always have some unexplained time. Tracking your unexplained time will keep you honest and pointed in the right direction.
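
As a concrete illustration, here is a minimal sketch of how a Summation diagnostic with an explicit unexplained bucket might be reported from C++. The histogram names and the `ReportTimeToInteractiveBreakdown` helper are hypothetical, and the sub-durations are assumed to be measured elsewhere; the point is that unexplained time is reported alongside the known submetrics, so it can’t silently disappear.

```c++
#include "base/metrics/histogram_macros.h"
#include "base/time/time.h"

// Hypothetical helper: reports a Summation diagnostic for
// TimeFromNavStartToInteractive. |total|, |script| and |idle| are assumed
// to be measured elsewhere.
void ReportTimeToInteractiveBreakdown(base::TimeDelta total,
                                      base::TimeDelta script,
                                      base::TimeDelta idle) {
  // Everything we didn't attribute to a known submetric.
  base::TimeDelta unexplained = total - script - idle;

  UMA_HISTOGRAM_MEDIUM_TIMES("PageLoad.Diagnostic.TimeInScript", script);
  UMA_HISTOGRAM_MEDIUM_TIMES("PageLoad.Diagnostic.TimeInIdle", idle);
  UMA_HISTOGRAM_MEDIUM_TIMES("PageLoad.Diagnostic.Unexplained", unexplained);
}
```

With the example numbers above, this would report 800ms of unexplained time, which is exactly the signal that the breakdown isn’t complete yet.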

## Slicing Diagnostics

Slicing Diagnostics split up a metric based on its context. For example, we could split up Memory Use by whether a process has foreground tabs, by the number of tabs a user has open, or by whether there’s a video playing. For each way we slice the metric, the higher level metric is a weighted average of the diagnostic metrics.

With Slicing diagnostics, the top level metric equals the weighted sum of all diagnostic metrics. In the examples above, the weight of each diagnostic is the fraction of the time spent in the given context.

Just as a Summation diagnostic must account for everything which contributes to the high level metric, a Slicing diagnostic must not leave out any contexts. If you want to slice a metric by the number of tabs a user has open, you shouldn’t just use a set of reasonable tab numbers, 1-8 for example. You should make sure to also have an overflow context (9+), so we get the full picture.
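
A minimal sketch of how such a slice might be recorded, assuming a hypothetical helper that maps the open tab count onto a histogram suffix. The names are illustrative; the important part is the explicit overflow context.

```c++
#include <string>

#include "base/metrics/histogram_functions.h"

// Hypothetical: map the open-tab count onto a small, fixed set of contexts,
// including an explicit overflow context so no tab count is silently
// dropped from the slice.
std::string TabCountSuffix(int tab_count) {
  if (tab_count <= 8)
    return std::to_string(tab_count) + "Tabs";
  return "9OrMoreTabs";  // Overflow context.
}

void ReportMemoryUseSlicedByTabCount(int memory_mb, int tab_count) {
  base::UmaHistogramMemoryMB(
      "Memory.Diagnostic.ByTabCount." + TabCountSuffix(tab_count), memory_mb);
}
```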

## Proxy Diagnostics

Some diagnostic metrics correlate with the higher level metric, but aren’t related in any precise way. For example, the top level FMP metric measures wall clock time. We could add a CPU time equivalent as a diagnostic metric, which is likely to have lower noise. In cases like this, we expect there to exist some monotonic function which approximately maps from the top level metric to the diagnostic metric, but this relationship could be quite rough.
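
A sketch of what recording such a proxy might look like, assuming the navigation-start timestamps are captured elsewhere and using hypothetical histogram names. base::ThreadTicks provides per-thread CPU time on the platforms that support it.

```c++
#include "base/metrics/histogram_macros.h"
#include "base/time/time.h"

// Hypothetical: alongside the wall-clock FMP metric, record the CPU time
// the main thread consumed over the same interval as a lower-noise proxy
// diagnostic. Must be called on the thread where |nav_start_cpu| was taken.
void ReportFmpWithCpuProxy(base::TimeTicks nav_start_wall,
                           base::ThreadTicks nav_start_cpu) {
  UMA_HISTOGRAM_MEDIUM_TIMES("PageLoad.FirstMeaningfulPaint",
                             base::TimeTicks::Now() - nav_start_wall);
  // Per-thread CPU time isn't available everywhere, so guard the proxy.
  if (base::ThreadTicks::IsSupported()) {
    UMA_HISTOGRAM_MEDIUM_TIMES(
        "PageLoad.Diagnostic.FirstMeaningfulPaint.CpuTime",
        base::ThreadTicks::Now() - nav_start_cpu);
  }
}
```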

## Composing Diagnostics

Many metrics will have multiple sets of diagnostics. For example, the set of FMP diagnostics shown below involves Slicing, Summation, and Proxy Diagnostics.

With these diagnostics we can tell if there’s a regression in JS times that’s specific to when there’s a ServiceWorker on the page, or if there’s a reduction in idle time spent on pages when we’ve got metered connections.

![example of diagnostic metrics](images/diagnostic-metrics-example.png)