Chromium Code Reviews

Side by Side Diff: appengine/findit/crash/detect_regression_range.py

Issue 2378133004: [Findit] Rerun if the regression range is different. (Closed)
Patch Set: Fix nits. Created 4 years, 2 months ago
# Copyright 2016 The Chromium Authors. All rights reserved.
# Use of this source code is governed by a BSD-style license that can be
# found in the LICENSE file.

import logging

# Default value for the maximum number of versions to look back.
_MAXIMUM_WINDOW_SIZE = 30

# Add epsilon to avoid dividing by zero when computing spikiness.
_EPSILON = 0.00000001

# Default value to control weight of current data when computing spikiness.
_DEFAULT_ALPHA = 0.9

# Threshold for calling something a spike.
_SPIKINESS_THRESHOLD = 20


# TODO(wrengr): make this streaming, rather than holding onto the whole list.
def GetSpikes(events, get_value, alpha=_DEFAULT_ALPHA,
              threshold=_SPIKINESS_THRESHOLD):
  """Given a time series, detect regression ranges for anomalous spikes.

  The time series is represented by a list of "events" together with
  a function for computing the "value" of each event. We assume the
  events are given in order, and the only thing we care about them is
  their value. As we scan through the list, if we notice any "spikes" in
  the course of values (i.e., the current value seems anomalous compared
  to the values we've seen previously), then we produce a tuple of the
  events bracketing the spike. Since there can be many spikes, we return
  a list of these tuples.

  The model we use for detecting spikes is exponential smoothing. This
  model is based on the running average of the events' values, and it has
  two parameters: the alpha parameter determines how readily we update
  the running average each time we see a new event, and the threshold
  parameter determines when an event's value deviates far enough from the
  running average that we call it a spike. N.B., in time series analysis
  more generally, exponential smoothing is considered a naive model since
  it cannot account for things like scaling (i.e., if we multiply all
  the values by some constant, then we'll need to adjust the threshold
  by that constant too) and noise (i.e., if we add noise to the values,
  then we'll need to adjust the threshold to try and filter that noise
  out). However, based on some preliminary tests, this naive model seems
  to be good enough for our particular task.

  Args:
    events (list): A list of objects representing "events" in a time
      series. The events themselves can be any sort of object (including
      None), all we care about is their value according to get_value. We
      assume the events are already in order, but we do not care about
      their exact position within the list.
    get_value (callable): a valuation function mapping events to numbers.
    alpha (float): In (0, 1], controls how we balance between evidence
      from new events vs the running average. When alpha=0 we completely
      ignore the new event; when alpha=1 we completely ignore the running
(...skipping 66 matching lines...)
  spikes = GetSpikes(historic_metadata[-max_win_size:], lambda x: x['cpm'])

  if not spikes:
    logging.warning('Failed to find spikes in history data %s' % repr(
        historic_metadata))
    return None

  # Only return the last/most-recent regression range.
  last_good, first_bad = spikes[-1]
  return last_good['chrome_version'], first_bad['chrome_version']
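
The body of GetSpikes falls inside the skipped lines, so the following is only a minimal sketch of the exponential-smoothing idea that the docstring describes, not the code under review. The names get_spikes_sketch and detect_regression_range_sketch are hypothetical, the spikiness formula (relative jump over the running average) is an assumption, and the sample history uses made-up version strings and cpm values.

```python
def get_spikes_sketch(events, get_value, alpha=0.9, threshold=20,
                      epsilon=0.00000001):
    """Hypothetical stand-in for GetSpikes; not the reviewed implementation."""
    if not events:
        return []

    spikes = []
    previous_event = events[0]
    running_mean = get_value(previous_event)
    for event in events[1:]:
        value = get_value(event)
        # ASSUMPTION: spikiness is the jump relative to the running average;
        # epsilon guards against dividing by a zero average.
        spikiness = (value - running_mean) / (running_mean + epsilon)
        if spikiness > threshold:
            # Record the events bracketing the spike: (last good, first bad).
            spikes.append((previous_event, event))
        # Exponential smoothing: alpha=1 ignores the running average,
        # alpha=0 ignores the new event.
        running_mean = alpha * value + (1 - alpha) * running_mean
        previous_event = event
    return spikes


def detect_regression_range_sketch(historic_metadata, max_win_size=30):
    """Mirrors the visible tail of the diff: keep only the latest spike."""
    spikes = get_spikes_sketch(historic_metadata[-max_win_size:],
                               lambda x: x['cpm'])
    if not spikes:
        return None
    last_good, first_bad = spikes[-1]
    return last_good['chrome_version'], first_bad['chrome_version']


# Made-up history: crash rate (cpm) is flat, then jumps at the fourth version.
example_history = [
    {'chrome_version': '54.0.2800.0', 'cpm': 0.1},
    {'chrome_version': '54.0.2800.1', 'cpm': 0.1},
    {'chrome_version': '54.0.2800.2', 'cpm': 0.1},
    {'chrome_version': '54.0.2800.3', 'cpm': 5.0},
    {'chrome_version': '54.0.2800.4', 'cpm': 5.2},
]
example_range = detect_regression_range_sketch(example_history)
```

Under these assumptions, example_range brackets the jump from 0.1 to 5.0 cpm. The later value of 5.2 is not flagged, because with alpha=0.9 the running average has already moved most of the way to the new level by then.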