appengine/findit/crash/detect_regression_range.py - Issue 2378133004: [Findit] Rerun if the regression range is different.

Side by Side Diff: appengine/findit/crash/detect_regression_range.py

Issue 2378133004: [Findit] Rerun if the regression range is different. (Closed)

Patch Set: Fix nits. Created 4 years, 2 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

« no previous file with comments | « appengine/findit/crash/crash_pipeline.py ('k') | appengine/findit/crash/findit_for_chromecrash.py » ('j') | no next file with comments »

OLD	NEW
1 # Copyright 2016 The Chromium Authors. All rights reserved.	1 # Copyright 2016 The Chromium Authors. All rights reserved.

2 # Use of this source code is governed by a BSD-style license that can be	2 # Use of this source code is governed by a BSD-style license that can be

3 # found in the LICENSE file.	3 # found in the LICENSE file.

4	4

5 import logging	5 import logging

6	6

7 # Default value for the maximum number of versions to look back.	7 # Default value for the maximum number of versions to look back.

8 _MAXIMUM_WINDOW_SIZE = 30	8 _MAXIMUM_WINDOW_SIZE = 30

9	9

10 # Add epsilon to avoid dividing by zero when computing spikiness.	10 # Add epsilon to avoid dividing by zero when computing spikiness.

11 _EPSILON = 0.00000001	11 _EPSILON = 0.00000001

12	12

13 # Default value to control weight of current data when computing spikiness.	13 # Default value to control weight of current data when computing spikiness.

14 _DEFAULT_ALPHA = 0.9	14 _DEFAULT_ALPHA = 0.9

15	15

16 # Threshold for calling something a spike.	16 # Threshold for calling something a spike.

17 _SPIKINESS_THRESHOLD = 20	17 _SPIKINESS_THRESHOLD = 20

18	18

19	19

20 # TODO(wrengr): make this streaming, rather than holding onto the whole list.	20 # TODO(wrengr): make this streaming, rather than holding onto the whole list.

21 def GetSpikes(events, get_value, alpha=_DEFAULT_ALPHA,	21 def GetSpikes(events, get_value, alpha=_DEFAULT_ALPHA,

22 threshold=_SPIKINESS_THRESHOLD):	22 threshold=_SPIKINESS_THRESHOLD):

23 """Given a time series, detect regression ranges for anomalous spikes.	23 """Given a time series, detect regression ranges for anomalous spikes.

24	24

25 The time series is represented by a list of "events" together with	25 The time series is represented by a list of "events" together with

26 a function for computing the "value" of each event. We assume the	26 a function for computing the "value" of each event. We assume the

27 events are given in order, and the only thing we care about them is	27 events are given in order, and the only thing we care about them is

28 their value. As we scan through the list, if we notice any "spikes" in	28 their value. As we scan through the list, if we notice any "spikes" in

29 the course of values (i.e., the current value seems anomalous compared	29 the course of values (i.e., the current value seems anomalous compared

30 to the values we've seen previously), then we produce a tuple of the	30 to the values we've seen previously), then we produce a tuple of the

31 events bracketing the spike. Since there can be many spikes, we return	31 events bracketing the spike. Since there can be many spikes, we return

32 a list of these tuples.	32 a list of these tuples.

33	33

34 The model we use for detecting spikes is exponential smoothing. This	34 The model we use for detecting spikes is exponential smoothing. This

35 model is based on the running average of the events' values, and it has	35 model is based on the running average of the events' values, and it has

36 two parameters: the alpha parameter determines how readily we update	36 two parameters: the alpha parameter determines how readily we update

37 the running average each time we see a new event, and the threshold	37 the running average each time we see a new event, and the threshold

38 parameter determines when an event's value deviates far enough from the	38 parameter determines when an event's value deviates far enough from the

39 running average that we call it a spike. N.B., in time series analysis	39 running average that we call it a spike. N.B., in time series analysis

40 more generally, exponential smoothing is considered a naive model since	40 more generally, exponential smoothing is considered a naive model since

41 it cannot account for things like scaling (i.e., if we multiply all	41 it cannot account for things like scaling (i.e., if we multiply all

42 the values by some constant, then we'll need to adjust the threshold	42 the values by some constant, then we'll need to adjust the threshold

43 by that constant too) and noise (i.e., if we add noise to the values,	43 by that constant too) and noise (i.e., if we add noise to the values,

44 then we'll need to adjust the threshold to try and filter that noise	44 then we'll need to adjust the threshold to try and filter that noise

45 out). However, based on some preliminary tests, this naive model seems	45 out). However, based on some preliminary tests, this naive model seems

46 to be good enough for our particular task.	46 to be good enough for our particular task.

47	47

48 Args:	48 Args:

49 events (list): A list of objects representing "events" in a time	49 events (list): A list of objects representing "events" in a time

50 series. The events themselves can be any sort of object (including	50 series. The events themselves can be any sort of object (including

51 None), all we care about is their value according to get_value. We	51 None), all we care about is their value according to get_value. We

52 assume the events are already in order, but we do not care about	52 assume the events are already in order, but we do not care about

53 their exact position within the list.	53 their exact position within the list.

54 get_value (callable): a valuation function mapping events to numbers.	54 get_value (callable): a valuation function mapping events to numbers.

55 alpha (float): In (0, 1], controls how we balance between evidence	55 alpha (float): In (0, 1], controls how we balance between evidence

56 from new events vs the running average. When alpha=0 we completely	56 from new events vs the running average. When alpha=0 we completely

57 ignore the new event; when alpha=1 we completely ignore the running	57 ignore the new event; when alpha=1 we completely ignore the running

(...skipping 66 matching lines...) Expand 10 before \| Expand all \| Expand 10 after Loading...
124 spikes = GetSpikes(historic_metadata[-max_win_size:], lambda x: x['cpm'])	124 spikes = GetSpikes(historic_metadata[-max_win_size:], lambda x: x['cpm'])

125	125

126 if not spikes:	126 if not spikes:

127 logging.warning('Failed to find spikes in history data %s' % repr(	127 logging.warning('Failed to find spikes in history data %s' % repr(

128 historic_metadata))	128 historic_metadata))

129 return None	129 return None

130	130

131 # Only return the last/most-recent regression range.	131 # Only return the last/most-recent regression range.

132 last_good, first_bad = spikes[-1]	132 last_good, first_bad = spikes[-1]

133 return last_good['chrome_version'], first_bad['chrome_version']	133 return last_good['chrome_version'], first_bad['chrome_version']

134

OLD	NEW