OLD | NEW |
(Empty) | |
| 1 # Histogram Guidelines |
| 2 |
| 3 This document gives the best practices on how to use histograms in code and how |
| 4 to document the histograms for the dashboard. There are three general types of |
| 5 histograms: enumerated histograms (appropriate for enums), count histograms |
| 6 (appropriate for arbitrary numbers), and sparse histograms (appropriate for |
| 7 anything when precision is important over a wide range and/or the range |
| 8 cannot be specified a priori). |
| 9 |
| 10 [TOC] |
| 11 |
| 12 ## Emitting to Histograms |
| 13 |
| 14 ### Directly Measure What You Want |
| 15 |
| 16 Measure exactly what you want, whether that's time used for a function call, |
| 17 number of bytes transmitted to fetch a page, number of items in a list, etc. |
| 18 Do not assume you can calculate what you want from other histograms. Most of |
| 19 the ways to do this are incorrect. For example, if you want to know the time |
| 20 taken by a function that does nothing but call two other functions, both of which |
| 21 have histogram logging, you might think you can simply add up |
| 22 the histograms for those functions to get the total time. This is wrong. |
| 23 If we knew which emissions came from which calls, we could pair them up and |
| 24 derive the total time for the function. However, histogram entries do not |
| 25 come with timestamps--we cannot pair them up appropriately. If you simply add up the |
| 26 two histograms to get the total histogram, you're implicitly assuming those |
| 27 values are independent, which may not be the case. Directly measure what you |
| 28 care about; don't try to derive it from other data. |
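
The pairing problem can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `ToyHistogram`, the function names, and the timing values are made up for illustration and are not part of the real histogram code. In this toy data the two callee times always sum to 10 ms, so they are strongly dependent.

```cpp
#include <map>
#include <utility>
#include <vector>

// A toy histogram: sample value -> number of emissions. Hypothetical type,
// unrelated to the real histogram classes.
using ToyHistogram = std::map<int, int>;

// Hypothetical paired timings in ms: for each call of the outer function,
// (time in the first callee, time in the second callee).
std::vector<std::pair<int, int>> ExampleCalls() {
  return {{1, 9}, {9, 1}, {5, 5}};
}

// What a direct measurement of the outer function would record.
ToyHistogram DirectTotals(const std::vector<std::pair<int, int>>& calls) {
  ToyHistogram totals;
  for (const auto& call : calls) ++totals[call.first + call.second];
  return totals;
}

// What the first callee's own histogram would contain.
ToyHistogram FirstCalleeTimes(const std::vector<std::pair<int, int>>& calls) {
  ToyHistogram times;
  for (const auto& call : calls) ++times[call.first];
  return times;
}

// What the second callee's own histogram would contain.
ToyHistogram SecondCalleeTimes(const std::vector<std::pair<int, int>>& calls) {
  ToyHistogram times;
  for (const auto& call : calls) ++times[call.second];
  return times;
}

// Combines two histograms as if their samples were independent: every value
// in `a` is paired with every value in `b`, because the real pairing is lost.
ToyHistogram CombineAssumingIndependence(const ToyHistogram& a,
                                         const ToyHistogram& b) {
  ToyHistogram combined;
  for (const auto& ea : a)
    for (const auto& eb : b)
      combined[ea.first + eb.first] += ea.second * eb.second;
  return combined;
}
```

With this data, `DirectTotals` contains a single value (10 ms, three times), while `CombineAssumingIndependence` spreads its counts over several totals that never actually occurred.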
| 29 |
| 30 ### Efficiency |
| 31 |
| 32 In general, the histogram code is highly optimized. Do not be concerned about |
| 33 the processing cost of emitting to a histogram (unless you're using [sparse |
| 34 histograms](#when-to-use-sparse-histograms)). |
| 35 |
| 36 ### Enum Histograms |
| 37 |
| 38 Enumerated histograms are most appropriate when you have a list of connected / |
| 39 related states that should be analyzed jointly. For example, the set of |
| 40 actions that can be done on the New Tab Page (use the omnibox, click a most |
| 41 visited tile, click a bookmark, etc.) would make a good enumerated histogram. |
| 42 If the total count of your histogram (i.e. the sum across all buckets) is |
| 43 something meaningful--as it is in this example--that is generally a good sign. |
| 44 However, the total count does not have to be meaningful for an enum histogram |
| 45 to still be the right choice. |
| 46 |
| 47 You may append to your enum if the set of possible states/actions grows. However, you |
| 48 should not reorder, renumber, or otherwise reuse existing values. As such, |
| 49 please put this warning by the enum definition: |
| 50 ``` |
| 51 // These values are written to logs. New enum values can be added, but existing |
| 52 // enums must never be renumbered or deleted and reused. |
| 53 ``` |
| 54 |
| 55 Also, please explicitly set enum values `= 0`, `= 1`, `= 2`, etc. This makes it |
| 56 clearer that the actual values are important. In addition, it helps confirm |
| 57 the values align between the enum definition and histograms.xml. |
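
As a minimal sketch of the two recommendations above, the New Tab Page example might be written as follows. The enum name, its enumerators, `kMaxValue`, and `ToSample` are hypothetical names chosen for illustration; only the warning comment is taken from this document.

```cpp
// These values are written to logs. New enum values can be added, but existing
// enums must never be renumbered or deleted and reused.
enum class NewTabPageAction {
  kUseOmnibox = 0,
  kClickMostVisitedTile = 1,
  kClickBookmark = 2,
  // Append new actions here with the next unused number, then update
  // kMaxValue (a hypothetical helper enumerator, not required by the text).
  kMaxValue = kClickBookmark,
};

// Converts an action to the integer sample that would be logged.
constexpr int ToSample(NewTabPageAction action) {
  return static_cast<int>(action);
}

// Compile-time check that the explicit values stay what the logs expect.
static_assert(ToSample(NewTabPageAction::kUseOmnibox) == 0,
              "logged values must not be renumbered");
```

Because every value is written out, a reviewer can compare the definition line by line against the matching entry in histograms.xml.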
| 58 |
| 59 ### Count Histograms |
| 60 |
| 61 [histogram_macros.h](https://cs.chromium.org/chromium/src/base/metrics/histogram_macros.h) |
| 62 provides macros for some common count types such as memory or elapsed time, in |
| 63 addition to general count macros. These have reasonable default values; you |
| 64 will rarely need to choose the number of buckets or the histogram min. You will |
| 65 still need to choose the histogram max (use the advice below). |
| 66 |
| 67 If none of the default macros work well for you, please thoughtfully choose |
| 68 a min, max, and bucket count for your histogram using the advice below. |
| 69 |
| 70 ### Count Histograms: Choosing Min and Max |
| 71 |
| 72 For histogram max, choose a value so that very few emissions to the histogram |
| 73 will exceed the max. If many emissions hit the max, it can be difficult to |
| 74 compute statistics such as the average. One rule of thumb is that at most 1% of samples |
| 75 should be in the overflow bucket. This allows analysis of the 99th percentile. |
| 76 Err on the side of too large a range versus too short a range. (Remember that if you choose poorly, you'll have to wait for another release cycle to fix it.) |
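
The 1% rule of thumb can be checked against locally collected samples before settling on a max. The sketch below uses hypothetical helper names (`CountOverflow`, `MaxIsLargeEnough`) and assumes that any sample at or above the max counts as overflow.

```cpp
#include <cstddef>
#include <vector>

// Counts samples that would land in the overflow bucket, assuming (for this
// sketch) that every sample >= max overflows.
std::size_t CountOverflow(const std::vector<int>& samples, int max) {
  std::size_t overflow = 0;
  for (int sample : samples) {
    if (sample >= max) ++overflow;
  }
  return overflow;
}

// Returns true if at most 1% of the samples overflow. Integer arithmetic
// avoids floating-point rounding at the 1% boundary.
bool MaxIsLargeEnough(const std::vector<int>& samples, int max) {
  return CountOverflow(samples, max) * 100 <= samples.size();
}
```

A check like this only reflects the samples you feed it; data from real users may have a longer tail, which is one more reason to err on the side of a larger max.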
| 77 |
| 78 For histogram min, if you care about all possible values (zero and above), |
| 79 choose a min of 1. (All histograms have an underflow bucket; emitted zeros |
| 80 will go there. That's why a min of 1 is appropriate.) Otherwise, choose the |
| 81 min appropriate for your particular situation. |
| 82 |
| 83 ### Count Histograms: Choosing Number of Buckets |
| 84 |
| 85 Choose the smallest number of buckets that will get you the granularity you |
| 86 need. By default, count histogram bucket sizes scale exponentially so you can |
| 87 get fine granularity when the numbers are small yet still reasonable |
| 88 resolution for larger numbers. The macros default to around 50 buckets, |
| 89 which is appropriate for most purposes. Because histograms pre-allocate all |
| 90 the buckets, the number of buckets selected directly dictates how much memory |
| 91 is used. Do not exceed 100 buckets without good reason (and consider whether |
| 92 [sparse histograms](#when-to-use-sparse-histograms) might work better for you |
| 93 in that case--they do not pre-allocate their buckets). |
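
To give a feel for exponentially scaled buckets, here is a simplified sketch that spaces bucket boundaries geometrically between a min and a max. It is an illustration only, not the actual histogram implementation; the function name and the exact rounding are assumptions. It expects `min >= 1`, `max > min`, and `bucket_count >= 3`.

```cpp
#include <cmath>
#include <vector>

// Returns the lower bound of each bucket. Index 0 is the underflow bucket
// (samples below min) and the last entry is the overflow bucket (samples at
// or above max), so the result has exactly bucket_count entries.
std::vector<int> ExponentialBucketStarts(int min, int max, int bucket_count) {
  std::vector<int> starts;
  starts.push_back(0);    // Underflow bucket.
  starts.push_back(min);  // First regular bucket.
  int current = min;
  for (int index = 2; index < bucket_count; ++index) {
    // Spread the remaining range evenly on a logarithmic scale.
    int remaining = bucket_count - index;
    double ratio =
        std::pow(static_cast<double>(max) / current, 1.0 / remaining);
    int next = static_cast<int>(std::lround(current * ratio));
    // Always advance by at least 1 so that no bucket is empty.
    current = next > current ? next : current + 1;
    starts.push_back(current);
  }
  return starts;
}
```

For a range of 1 to 1000 with 10 buckets, the first buckets are only a few units wide while the last regular bucket spans several hundred. Since one counter is allocated per bucket whether or not it is ever hit, memory grows with the bucket count.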
| 94 |
| 95 ### Count Histograms with Linear Ranges |
| 96 |
| 97 If you want equally spaced buckets of size 1, use an enumerated histogram. |
| 98 While it's possible to do this with a count histogram, it's easy to make a |
| 99 mistake when setting the min, max, and number of buckets (because you have |
| 100 to remember how underflow and overflow buckets are handled) and end up with |
| 101 a histogram in which most, but not all, buckets have size 1. |
| 102 Using an enumerated histogram with a max value of your own choice is less |
| 103 error-prone. |
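
The off-by-one risk can be made concrete with a small sketch. It assumes a layout with one underflow bucket for samples below the min and one overflow bucket for samples at or above the max; both function names are hypothetical, and the layout is an assumption of this example rather than a statement about the real implementation.

```cpp
// Number of buckets needed so that every integer in [min, max) gets its own
// bucket, plus one underflow and one overflow bucket (assumed layout).
constexpr int BucketCountForUnitBuckets(int min, int max) {
  return (max - min) + 2;
}

// Index of the bucket a sample falls into under the same assumed layout.
constexpr int UnitBucketIndex(int sample, int min, int max) {
  return sample < min ? 0
                      : (sample >= max ? (max - min) + 1 : (sample - min) + 1);
}
```

Under this layout, a min of 1 and a max of 10 need 11 buckets, not 9 or 10. Choosing a different count would silently merge some neighboring values into one bucket.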
| 104 |
| 105 ### Testing |
| 106 |
| 107 Test your histograms using [chrome://histograms](chrome://histograms). Make |
| 108 sure they're being emitted to when you expect and not emitted to at other times. |
| 109 Also check that the emitted values are correct. Finally, for count |
| 110 histograms, make sure that buckets capture enough precision for your needs over |
| 111 the range. |
| 112 |
| 113 ### Revising Histograms |
| 114 |
| 115 If you're changing the semantics of a histogram (when it's emitted, what buckets |
| 116 mean, etc.), make it into a new histogram with a new name. Otherwise, the |
| 117 "Everything" view on the dashboard will mix two different interpretations |
| 118 of the data and make no sense. |
| 119 |
| 120 ### Deleting Histograms |
| 121 |
| 122 Please delete the code that emits to histograms that are no longer needed. |
| 123 Histograms take up memory. Cleaning up histograms that you no longer care about |
| 124 is good! But see the note below on |
| 125 [Deleting Histogram Entries](#deleting-histogram-entries). |
| 126 |
| 127 ## Documenting Histograms |
| 128 |
| 129 ### Add Histogram and Documentation in the Same Changelist |
| 130 |
| 131 If possible, please add the histograms.xml description in the same changelist |
| 132 in which you add the histogram-emitting code. This has several benefits. One, |
| 133 it sometimes happens that the histograms.xml reviewer has questions or concerns |
| 134 about the histogram description that reveal problems with interpretation of the |
| 135 data and call for a different recording strategy. Two, it allows the histogram |
| 136 reviewer to easily review the emission code to see if it comports with these |
| 137 best practices, and to look for other errors. |
| 138 |
| 139 ### Understandable to Everyone |
| 140 |
| 141 Histogram descriptions should be roughly understandable to someone not familiar |
| 142 with your feature. Please add a sentence or two of background if necessary. |
| 143 |
| 144 It is good practice to note caveats associated with your histogram in this |
| 145 section, such as which platforms are supported (if the set of supported |
| 146 platforms is surprising). E.g., a desktop feature that happens not to be logged |
| 147 on Mac. |
| 148 |
| 149 ### State When It Is Recorded |
| 150 |
| 151 Histogram descriptions should clearly state when the histogram is emitted |
| 152 (profile open? network request received? etc.). |
| 153 |
| 154 ### Deleting Histogram Entries |
| 155 |
| 156 Do not delete histograms from histograms.xml. Instead, mark unused histograms |
| 157 as obsolete, annotating them with the associated date or milestone in the |
| 158 obsolete tag entry. If your histogram is being replaced by a new version, we |
| 159 suggest noting that in the previous histogram's description. |
| 160 |
| 161 Deleting histogram entries would be bad if someone were to accidentally reuse your |
| 162 old histogram name and thereby corrupt new data with whatever old data is still |
| 163 coming in. It's also useful to keep obsolete histogram descriptions in |
| 164 histograms.xml--that way, if someone is searching for a histogram to answer |
| 165 a particular question, they can learn if there was a histogram at some point |
| 166 that did so even if it isn't active now. |
| 167 |
| 168 ## When To Use Sparse Histograms |
| 169 |
| 170 Sparse histograms are well suited for recording counts of exact sample values |
| 171 that are sparsely distributed over a large range. |
| 172 |
| 173 The implementation uses a lock and a map, whereas other histogram types use a |
| 174 vector and no lock. It is thus more costly to add values to, and each value |
| 175 stored has more overhead, compared to the other histogram types. However, it |
| 176 may be more efficient in memory if the total number of sample values is small |
| 177 compared to the range of their values. |
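
The trade-off described above can be sketched with two toy containers: one backed by a lock and a map, the other by a pre-allocated vector. Both classes are hypothetical illustrations written for this document, not the real histogram classes.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <vector>

// Toy sparse storage: a map guarded by a lock. Memory is used only for values
// that were actually recorded, but every Add takes the lock and may allocate.
class ToySparseCounts {
 public:
  void Add(int value) {
    std::lock_guard<std::mutex> guard(lock_);
    ++counts_[value];
  }
  int CountOf(int value) const {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = counts_.find(value);
    return it == counts_.end() ? 0 : it->second;
  }
  std::size_t StoredEntries() const {
    std::lock_guard<std::mutex> guard(lock_);
    return counts_.size();
  }

 private:
  mutable std::mutex lock_;
  std::map<int, int> counts_;
};

// Toy dense storage: one pre-allocated counter per bucket and no lock.
// Adding is a cheap increment, but all buckets take memory up front.
class ToyDenseCounts {
 public:
  explicit ToyDenseCounts(std::size_t bucket_count)
      : counts_(bucket_count, 0) {}
  void Add(std::size_t bucket) { ++counts_.at(bucket); }
  std::size_t StoredEntries() const { return counts_.size(); }

 private:
  std::vector<int> counts_;
};
```

Recording the values 5, 1000000, and 5 in the sparse container stores two entries, whereas the dense container holds as many counters as it has buckets regardless of which ones are hit.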
| 178 |
| 179 For more information, see |
| 180 [sparse_histogram.h](https://cs.chromium.org/chromium/src/base/metrics/sparse_histogram.h). |