# Making Metrics Actionable with Diagnostic Metrics

[TOC]

We want our metrics to reflect user experience, so we know we’re optimizing for the right thing. However, metrics which accurately reflect user experience are often so high level that they aren’t very actionable. Diagnostic metrics are submetrics which enable us to act on our high level user experience metrics. Also see the document on constructing a [good toplevel metric](good_toplevel_metrics.md) for guidance on constructing high quality user experience metrics.

There are three types of diagnostic metrics:

* Summations
* Slices
* Proxies

## Summation Diagnostics

We often notice that a number is Too Big. Whether it’s the time it took to generate a frame, or the time until a page was visible, the first thing we want to know is what’s contributing to the number.

Summations enable us to answer these questions. In a Summation diagnostic, the diagnostic metrics sum up to the higher level metric. For example, a Summation diagnostic for First Meaningful Paint (FMP) might be the durations the main thread spent doing various tasks, such as Style, Layout, V8, Idle, etc. before FMP fired. These diagnostics often lead to hierarchies, where the top level metric, such as FMP, has a diagnostic metric, such as time spent in V8 before FMP, which has further diagnostic metrics, such as the time spent parsing, compiling, or executing JS.
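
To make the hierarchy concrete, here is a minimal sketch of how such a breakdown might be represented. The metric names and millisecond values are illustrative, not real Chrome data:

```python
# Illustrative only: the metric names and millisecond values are made up.
# A Summation hierarchy is a tree in which each node's children sum to the
# node's value; here the leaves sum to FMP itself.
fmp_breakdown_ms = {
    "Style": 120,
    "Layout": 180,
    "V8": {              # V8 time is itself broken down further.
        "Parse": 60,
        "Compile": 40,
        "Execute": 300,
    },
    "Idle": 200,
    "Unexplained": 100,  # Track whatever you haven't accounted for.
}

def total_ms(node):
    """Sums the leaves of a (possibly nested) breakdown."""
    if isinstance(node, dict):
        return sum(total_ms(child) for child in node.values())
    return node

print(total_ms(fmp_breakdown_ms))  # 1000, which should equal FMP itself.
```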

With Summation diagnostics, the top level metric equals the sum of all diagnostic metrics. It’s **extremely important** that you don’t leave things out of a Summation diagnostic. This can seem a little daunting - how are you going to account for everything that contributes to the top level metric?

The best way to do this is to start with something you can easily measure, and also report the "unexplained time".

Suppose we're creating a Summation diagnostic for TimeFromNavStartToInteractive, and suppose we can easily time Idle and Script. So, we report only those two (don’t do this!):

* TimeInScript: 800ms
* TimeInIdle: 300ms

You'd incorrectly conclude from this data that script is the problem, and focus on optimizing script. This would be a shame, because if you had reported unexplained time, the reality would become clearer:

* TimeInScript: 800ms
* TimeInIdle: 300ms
* Unexplained: 800ms

Here, it jumps out that there's a large chunk of time you haven't explained, and you should explain it before you leap to conclusions.

So, start with a single pair of data:

1. a specific submetric that you're sure you can measure, and
2. a way to measure "the rest."

It might be that you start off just with:

1. Time in Script
2. Unexplained == TimeToInteractive - TimeInScript

But at least when you do this, your "unexplained time" is jumping out at you. From there, your goal is to drive that number downward to the 5%-ish range. Maybe on most pages, script is so huge that you get to 80% coverage. Great! Then, you study a few pages with high "unexplained" time and figure out, "aha, this has a lot of idle time." So you add idle to your diagnostics, and maybe that gets you to 90% coverage. Repeat until you're happy enough.
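
As a rough sketch of this bookkeeping, using the numbers from the example above (which imply a total TimeToInteractive of 1900ms); the helper function and metric names are hypothetical:

```python
# Hypothetical sketch of reporting unexplained time alongside the submetrics
# you can already measure. Only the "sum up what you measured and report the
# remainder" pattern matters; the names and numbers are illustrative.
def with_unexplained(top_level_ms, submetrics_ms):
    """Returns the submetrics plus an explicit 'Unexplained' entry."""
    diagnostics = dict(submetrics_ms)
    diagnostics["Unexplained"] = top_level_ms - sum(submetrics_ms.values())
    return diagnostics

time_to_interactive_ms = 1900
diagnostics = with_unexplained(
    time_to_interactive_ms, {"TimeInScript": 800, "TimeInIdle": 300})
# {'TimeInScript': 800, 'TimeInIdle': 300, 'Unexplained': 800}

coverage = 1 - diagnostics["Unexplained"] / time_to_interactive_ms
print("%.0f%% explained" % (coverage * 100))  # ~58%; keep adding submetrics
                                              # until this is ~95%.
```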

Diagnostics are imperfect. You'll always have some unexplained time. And tracking your unexplained time will keep you honest and pointed in the right direction.

## Slicing Diagnostics

Slicing Diagnostics split up a metric based on its context. For example, we could split up Memory Use by whether a process has foreground tabs, or the number of tabs a user has open, or whether there’s a video playing. For each way we slice the metric, the higher level metric is a weighted average of the diagnostic metrics.

With Slicing diagnostics, the top level metric equals the weighted sum of all diagnostic metrics. In the examples above, the weight of each diagnostic is the fraction of the time spent in the given context.

In the same way that when constructing a Summation Diagnostic we account for everything which contributes to the high level metric, when producing a Slicing Diagnostic we ensure that we don’t leave out any contexts. If you want to Slice a metric by the number of tabs a user has open, you shouldn’t just use a set of reasonable tab numbers, from 1-8 for example. You should make sure to also have an overflow context (9+), so we get the full picture.
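
Here is a minimal sketch of a Slicing diagnostic's bookkeeping, assuming hypothetical tab-count buckets, weights, and memory numbers (none of these values are real):

```python
# Hypothetical Slicing diagnostic: memory use sliced by open tab count, with
# an explicit 9+ overflow bucket so no context is left out. Each weight is
# the fraction of samples observed in that context; weights must sum to 1.
slices = {
    # context: (weight, mean_memory_mb)
    "1-2 tabs": (0.50, 300),
    "3-8 tabs": (0.40, 550),
    "9+ tabs":  (0.10, 1200),
}

assert abs(sum(weight for weight, _ in slices.values()) - 1.0) < 1e-9

# The top level metric is the weighted average of the per-context metrics.
top_level_memory_mb = sum(weight * value for weight, value in slices.values())
print(top_level_memory_mb)  # 490.0
```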

## Proxy Diagnostics

Some diagnostic metrics correlate with the higher level metric, but aren’t related in any precise way. For example, the top level FMP metric measures wall clock time. We could add a CPU time equivalent as a diagnostic metric, which is likely to have lower noise. In cases like this, we expect there to exist some monotonic function which approximately maps from the top level metric to the diagnostic metric, but this relationship could be quite rough.
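
The text above doesn't prescribe how to validate a proxy, but one simple sanity check is rank correlation, which only assumes a monotonic relationship. A sketch with made-up sample values:

```python
# Hypothetical sanity check for a Proxy diagnostic. Spearman rank correlation
# only assumes a monotonic relationship between the two metrics, matching the
# loose guarantee described above. Sample values are illustrative.
from scipy.stats import spearmanr

fmp_wall_clock_ms = [900, 1200, 1500, 2100, 3000]  # top level metric
fmp_cpu_time_ms = [400, 500, 700, 1100, 1600]      # proxy diagnostic

rho, _ = spearmanr(fmp_wall_clock_ms, fmp_cpu_time_ms)
print(rho)  # Close to 1.0 when the proxy tracks the top level metric.
```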

## Composing Diagnostics

Many metrics will have multiple sets of diagnostics. For example, the set of FMP diagnostics shown below involves Slicing, Summation, and Proxy Diagnostics.

With this diagnostic we can tell if there’s a regression in JS times that’s specific to when there’s a ServiceWorker on the page, or if there’s a reduction in idle time spent on pages when we’ve got metered connections.

![Composing Diagnostic Metrics](images/fmp_diagnostics.png)