# Making Metrics Actionable with Diagnostic Metrics

[TOC]

We want our metrics to reflect user experience, so we know we’re optimizing for the right thing. However, metrics which accurately reflect user experience are often so high level that they aren’t very actionable. Diagnostic metrics are submetrics which enable us to act on our high level user experience metrics. Also see the document on constructing a [good toplevel metric](good_toplevel_metrics.md) for guidance on constructing high quality user experience metrics.

There are three types of diagnostic metrics:

* Summations
* Slices
* Proxies

## Summation Diagnostics

We often notice that a number is Too Big. Whether it’s the time it took to generate a frame, or the time until a page was visible, the first thing we want to know is what’s contributing to the number.

Summations enable us to answer these questions. In a Summation diagnostic, the diagnostic metrics sum up to the higher level metric. For example, a Summation diagnostic for First Meaningful Paint (FMP) might be the durations the main thread spent doing various tasks, such as Style, Layout, V8, Idle, etc. before FMP fired. These diagnostics often lead to hierarchies, where the top level metric, such as FMP, has a diagnostic metric, such as time spent in V8 before FMP, which has further diagnostic metrics, such as the time spent parsing, compiling, or executing JS.
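
To make the hierarchy concrete, here is a minimal sketch of how such a breakdown might be represented. The metric names and millisecond values are illustrative, not real Chrome data:

```python
# Illustrative only: the metric names and millisecond values are made up.
# A Summation hierarchy is a tree in which each node's children sum to the
# node's value; here the leaves sum to FMP itself.
fmp_breakdown_ms = {
    "Style": 120,
    "Layout": 180,
    "V8": {              # V8 time is itself broken down further.
        "Parse": 60,
        "Compile": 40,
        "Execute": 300,
    },
    "Idle": 200,
    "Unexplained": 100,  # Track whatever you haven't accounted for.
}

def total_ms(node):
    """Sums the leaves of a (possibly nested) breakdown."""
    if isinstance(node, dict):
        return sum(total_ms(child) for child in node.values())
    return node

print(total_ms(fmp_breakdown_ms))  # 1000, which should equal FMP itself.
```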

With Summation diagnostics, the top level metric equals the sum of all diagnostic metrics. It’s **extremely important** that you don’t leave things out of a Summation diagnostic. This can seem a little daunting - how are you going to account for everything that contributes to the top level metric?

The best way to do this is to start with something you can easily measure, and also report the "unexplained time".

Suppose we're creating a Summation diagnostic for TimeFromNavStartToInteractive, and suppose we can easily time Idle and Script. So, we report only those two (don’t do this!):

* TimeInScript: 800ms
* TimeInIdle: 300ms

You'd incorrectly conclude from this data that script is the problem, and focus on optimizing script. This would be a shame, because if you had reported unexplained time, the reality would become clearer:

* TimeInScript: 800ms
* TimeInIdle: 300ms
* Unexplained: 800ms

Here, it jumps out that there's a large chunk of time you haven't explained, and you should explain it before you leap to conclusions.

So, start with a single pair of data:

1. a specific submetric that you're sure you can measure, and
2. a way to measure "the rest."

It might be that you start off just with:

1. Time in Script
2. Unexplained == TimeToInteractive - TimeInScript

But at least when you do this, your "unexplained time" is jumping out at you. From there, your goal is to drive that number downward to the 5%-ish range. Maybe on most pages, script is so huge that you get to 80% coverage. Great! Then, you study a few pages with high "unexplained" time and figure out, "aha, this has a lot of idle time." So you add idle to your diagnostics, and maybe that gets you to 90% coverage. Repeat until you're happy enough.
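
As a rough sketch of this bookkeeping, using the numbers from the example above (which imply a total TimeToInteractive of 1900ms); the helper function and metric names are hypothetical:

```python
# Hypothetical sketch of reporting unexplained time alongside the submetrics
# you can already measure. Only the "sum up what you measured and report the
# remainder" pattern matters; the names and numbers are illustrative.
def with_unexplained(top_level_ms, submetrics_ms):
    """Returns the submetrics plus an explicit 'Unexplained' entry."""
    diagnostics = dict(submetrics_ms)
    diagnostics["Unexplained"] = top_level_ms - sum(submetrics_ms.values())
    return diagnostics

time_to_interactive_ms = 1900
diagnostics = with_unexplained(
    time_to_interactive_ms, {"TimeInScript": 800, "TimeInIdle": 300})
# {'TimeInScript': 800, 'TimeInIdle': 300, 'Unexplained': 800}

coverage = 1 - diagnostics["Unexplained"] / time_to_interactive_ms
print("%.0f%% explained" % (coverage * 100))  # ~58%; keep adding submetrics
                                              # until this is ~95%.
```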

Diagnostics are imperfect. You'll always have some unexplained time. And tracking your unexplained time will keep you honest and pointed in the right direction.

## Slicing Diagnostics

Slicing Diagnostics split up a metric based on its context. For example, we could split up Memory Use by whether a process has foreground tabs, or the number of tabs a user has open, or whether there’s a video playing. For each way we slice the metric, the higher level metric is a weighted average of the diagnostic metrics.

With Slicing diagnostics, the top level metric equals the weighted sum of all diagnostic metrics. In the examples above, the weight of each diagnostic is the fraction of the time spent in the given context.

In the same way that when constructing a Summation Diagnostic we account for everything which contributes to the high level metric, when producing a Slicing Diagnostic we ensure that we don’t leave out any contexts. If you want to Slice a metric by the number of tabs a user has open, you shouldn’t just use a set of reasonable tab numbers, from 1-8 for example. You should make sure to also have an overflow context (9+), so we get the full picture.
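
Here is a minimal sketch of a Slicing diagnostic's bookkeeping, assuming hypothetical tab-count buckets, weights, and memory numbers (none of these values are real):

```python
# Hypothetical Slicing diagnostic: memory use sliced by open tab count, with
# an explicit 9+ overflow bucket so no context is left out. Each weight is
# the fraction of samples observed in that context; weights must sum to 1.
slices = {
    # context: (weight, mean_memory_mb)
    "1-2 tabs": (0.50, 300),
    "3-8 tabs": (0.40, 550),
    "9+ tabs":  (0.10, 1200),
}

assert abs(sum(weight for weight, _ in slices.values()) - 1.0) < 1e-9

# The top level metric is the weighted average of the per-context metrics.
top_level_memory_mb = sum(weight * value for weight, value in slices.values())
print(top_level_memory_mb)  # 490.0
```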

## Proxy Diagnostics

Some diagnostic metrics correlate with the higher level metric, but aren’t related in any precise way. For example, the top level FMP metric measures wall clock time. We could add a CPU time equivalent as a diagnostic metric, which is likely to have lower noise. In cases like this, we expect there to exist some monotonic function which approximately maps from the top level metric to the diagnostic metric, but this relationship could be quite rough.
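
The text above doesn't prescribe how to validate a proxy, but one simple sanity check is rank correlation, which only assumes a monotonic relationship. A sketch with made-up sample values:

```python
# Hypothetical sanity check for a Proxy diagnostic. Spearman rank correlation
# only assumes a monotonic relationship between the two metrics, matching the
# loose guarantee described above. Sample values are illustrative.
from scipy.stats import spearmanr

fmp_wall_clock_ms = [900, 1200, 1500, 2100, 3000]  # top level metric
fmp_cpu_time_ms = [400, 500, 700, 1100, 1600]      # proxy diagnostic

rho, _ = spearmanr(fmp_wall_clock_ms, fmp_cpu_time_ms)
print(rho)  # Close to 1.0 when the proxy tracks the top level metric.
```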

## Composing Diagnostics

Many metrics will have multiple sets of diagnostics. For example, the set of FMP diagnostics shown below involves Slicing, Summation, and Proxy Diagnostics.

With this diagnostic we can tell if there’s a regression in JS times that’s specific to when there’s a ServiceWorker on the page, or if there’s a reduction in idle time spent on pages when we’ve got metered connections.

![Composing Diagnostic Metrics](images/fmp_diagnostics.png)