# Making Metrics Actionable with Diagnostic Metrics

[TOC]

We want our metrics to reflect user experience, so that we know we’re optimizing for the right thing. However, metrics which accurately reflect user experience are often so high level that they aren’t very actionable. Diagnostic metrics are submetrics which enable us to act on our high level user experience metrics. See also the document on constructing a [good toplevel metric](good_toplevel_metrics.md) for guidance on constructing high quality user experience metrics.

There are three types of diagnostic metrics:

* Summations
* Slices
* Proxies

## Summation Diagnostics

We often notice that a number is Too Big. Whether it’s the time it took to generate a frame, or the time until a page was visible, the first thing we want to know is what’s contributing to the number.

Summations enable us to answer these questions. In a Summation diagnostic, the diagnostic metrics sum to the higher level metric. For example, a Summation diagnostic for First Meaningful Paint (FMP) might be the durations the main thread spent doing various tasks, such as Style, Layout, V8, Idle, etc., before FMP fired. These diagnostics often lead to hierarchies, where the top level metric, such as FMP, has a diagnostic metric, such as time spent in V8 before FMP, which has further diagnostic metrics, such as the time spent parsing, compiling, or executing JS. Summation breakdowns are implemented in telemetry as [Related Histogram Breakdowns](https://cs.chromium.org/chromium/src/third_party/catapult/tracing/tracing/value/diagnostics/related_histogram_breakdown.html?q=RelatedHistogramBreakdown&sq=package:chromium&l=18).

With Summation diagnostics, the top level metric equals the sum of all diagnostic metrics. It’s **extremely important** that you don’t leave things out of a Summation diagnostic. This can seem a little daunting - how are you going to account for everything that contributes to the top level metric?

The best way to do this is to start with something you can easily measure, and also report the "unexplained time".

Suppose we're creating a Summation diagnostic for TimeFromNavStartToInteractive, and suppose we can easily time Idle and Script. So we report those two only (don’t do this!):

* TimeInScript: 800ms
* TimeInIdle: 300ms

You'd incorrectly conclude from this data that script is the problem, and focus on optimizing script. This would be a shame, because if you had reported unexplained time, the reality would become clearer:

* TimeInScript: 800ms
* TimeInIdle: 300ms
* Unexplained: 800ms

Here, it jumps out that you've got data you haven't explained yet, and you should explain it before you leap to conclusions.

So, start with a single pair of data:

1. a specific submetric that you're sure you can measure, and
2. a way to measure "the rest."

It might be that you start off just with:

1. TimeInScript
2. Unexplained == TimeToInteractive - TimeInScript

But at least when you do this, your "unexplained time" is jumping out at you. From there, your goal is to drive that number downward to the 5%-ish range. Maybe on most pages, script is so huge that you get to 80% coverage. Great! Then, you study a few pages with high "unexplained" time and figure out, "aha, this has a lot of idle time." So you add idle to your diagnostics, and maybe that gets you to 90% coverage. Repeat until you're happy enough.
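
As a sketch of this bookkeeping, using the numbers from the example above (the metric names and the helper are hypothetical, not real telemetry code):

```python
# Sketch of tracking "unexplained time" while iteratively building up a
# Summation diagnostic. Metric names and timings are made up.

def summation_with_unexplained(total_ms, explained_ms):
    """Returns the breakdown plus an explicit Unexplained entry."""
    unexplained = total_ms - sum(explained_ms.values())
    assert unexplained >= 0, "explained parts must not exceed the total"
    return dict(explained_ms, Unexplained=unexplained)

time_to_interactive_ms = 1900

# Iteration 1: we can only measure script so far.
v1 = summation_with_unexplained(time_to_interactive_ms,
                                {"TimeInScript": 800})
# Iteration 2: studying pages with high unexplained time revealed idle.
v2 = summation_with_unexplained(time_to_interactive_ms,
                                {"TimeInScript": 800, "TimeInIdle": 300})

for version in (v1, v2):
    coverage = 1 - version["Unexplained"] / time_to_interactive_ms
    print(version, f"coverage: {coverage:.0%}")
```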

Diagnostics are imperfect. You'll always have some unexplained time, and tracking it will keep you honest and pointed in the right direction.

## Slicing Diagnostics

Slicing Diagnostics split up a metric based on its context. For example, we could split up Memory Use by whether a process has foreground tabs, or the number of tabs a user has open, or whether there’s a video playing. For each way we slice the metric, the higher level metric is a weighted average of the diagnostic metrics.

With Slicing diagnostics, the top level metric equals the weighted average of all diagnostic metrics. In the examples above, the weight of each diagnostic is the fraction of the time spent in the given context. Slicing diagnostics are implemented in telemetry via [Related Histogram Maps](https://cs.chromium.org/chromium/src/third_party/catapult/tracing/tracing/value/diagnostics/related_histogram_map.html?q=RelatedHistogramMap&sq=package:chromium&l=16).
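
A small sketch of the weighted-average property, with made-up memory figures, context names, and weights:

```python
# Sketch of a Slicing diagnostic: the top level metric is the weighted
# average of the per-context values, weighted by the fraction of time
# spent in each context. All figures are hypothetical.

# context -> (memory use in MB, fraction of time in this context)
memory_by_context = {
    "foreground_tab": (480, 0.25),
    "background_only": (210, 0.70),
    "video_playing": (650, 0.05),
}

# The slices must cover all contexts, so the weights sum to 1.
weights = [w for _, w in memory_by_context.values()]
assert abs(sum(weights) - 1.0) < 1e-9, "slices must cover all contexts"

top_level_mb = sum(value * weight
                   for value, weight in memory_by_context.values())
print(f"Overall memory use: {top_level_mb:.0f} MB")
```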

Just as we account for everything which contributes to the high level metric when constructing a Summation Diagnostic, when producing a Slicing Diagnostic we ensure that we don’t leave out any contexts. If you want to Slice a metric by the number of tabs a user has open, you shouldn’t just use a set of reasonable tab numbers, 1-8 for example. You should make sure to also have an overflow context (9+), so we get the full picture.
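
For illustration, a tiny sketch of such a bucketing function (the function and labels are hypothetical, not how telemetry actually buckets):

```python
# Sketch of slicing by tab count with an explicit overflow context, so
# that every session falls into some slice and nothing is left out.

def tab_count_slice(open_tabs):
    """Maps a tab count (>= 1) to a slice label, with a 9+ overflow bucket."""
    return str(open_tabs) if open_tabs <= 8 else "9+"

assert tab_count_slice(1) == "1"
assert tab_count_slice(8) == "8"
assert tab_count_slice(40) == "9+"
```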

## Proxy Diagnostics

Some diagnostic metrics correlate with the higher level metric, but aren’t related in any precise way. For example, the top level FMP metric measures wall clock time. We could add a CPU time equivalent as a diagnostic metric, which is likely to have lower noise. In cases like this, we expect there to exist some monotonic function which approximately maps from the top level metric to the diagnostic metric, but this relationship could be quite rough.
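
One way to sanity-check a proxy like this (not something the document prescribes) is rank correlation, which is high exactly when the relationship is roughly monotonic. This sketch uses invented timings and a simple Spearman implementation that assumes no tied values:

```python
# Sketch: if CPU time is a useful proxy for wall-clock FMP, the two
# should be roughly monotonically related, which Spearman rank
# correlation measures. Data is made up.

def ranks(xs):
    """Ranks 0..n-1 of each element; assumes no tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Pearson correlation of the ranks (valid when there are no ties)."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean = (n - 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var

wall_fmp_ms = [900, 1400, 2100, 1100, 3000]  # noisy top level metric
cpu_fmp_ms = [500, 800, 1300, 700, 1600]     # lower-noise proxy

print(f"Spearman rho: {spearman(wall_fmp_ms, cpu_fmp_ms):.2f}")  # 1.00 here
```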

Proxy diagnostics are likewise implemented in telemetry via [Related Histogram Maps](https://cs.chromium.org/chromium/src/third_party/catapult/tracing/tracing/value/diagnostics/related_histogram_map.html?q=RelatedHistogramMap&sq=package:chromium&l=16).

## Composing Diagnostics

Many metrics will have multiple sets of diagnostics. For example, the set of FMP diagnostics pictured below involves Slicing, Summation, and Proxy Diagnostics.

With this diagnostic we can tell if there’s a regression in JS times that’s specific to when there’s a ServiceWorker on the page, or if there’s a reduction in idle time spent on pages when we’ve got metered connections.

![FMP diagnostics](images/diagnostic_metrics.png)