Chromium Code Reviews

Unified Diff: tools/bisect-perf-regression.py

Issue 241273002: Change "percentage change" function and add comments/test for it. (Closed)
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Patch Set: "Merge latest changes if any" (created 6 years, 8 months ago)
 #!/usr/bin/env python
 # Copyright (c) 2013 The Chromium Authors. All rights reserved.
 # Use of this source code is governed by a BSD-style license that can be
 # found in the LICENSE file.
 
 """Performance Test Bisect Tool
 
 This script bisects a series of changelists using binary search. It starts at
 a bad revision where a performance metric has regressed, and asks for a last
 known-good revision. It will then binary search across this revision range by
(...skipping 273 matching lines...)
 
   mean = CalculateMean(values)
   differences_from_mean = [float(x) - mean for x in values]
   squared_differences = [float(x * x) for x in differences_from_mean]
   variance = sum(squared_differences) / (len(values) - 1)
   std_dev = math.sqrt(variance)
 
   return std_dev
 
 
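Note: the chunk above is the tail of CalculateStandardDeviation, which uses the sample (n - 1, Bessel-corrected) variance. A quick illustrative check, not part of the patch, assuming CalculateMean is the plain arithmetic mean:

  import math

  # For values = [2, 4, 4, 4, 5, 5, 7, 9]: mean = 5.0, the squared
  # differences sum to 32, so the sample variance is 32 / 7 and the
  # standard deviation is sqrt(32 / 7) ~= 2.138.
  values = [2, 4, 4, 4, 5, 5, 7, 9]
  assert abs(CalculateStandardDeviation(values) - math.sqrt(32.0 / 7)) < 1e-9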
+def CalculateRelativeChange(before, after):
+  """Returns the relative change of before and after, relative to before.
+
+  There are several different ways to define relative difference between
+  two numbers; sometimes it is defined as relative to the smaller number,
+  or to the mean of the two numbers. This version returns the difference
+  relative to the first of the two numbers.
+
+  Args:
+    before: A number representing an earlier value.
+    after: Another number, representing a later value.
+
+  Returns:
+    A non-negative floating point number; 0.1 represents a 10% change.
+  """
+  if before == 0:
shatch 2014/04/17 19:46:17: If the "good" values were legitimately 0 (ie. the …
qyearsley 2014/04/17 20:54:05: Yeah -- In this version before my latest patch, th…
shatch 2014/04/17 22:58:28: Hmm, yeah I'm not sure what the best output here w…
qyearsley 2014/04/18 19:38:03: Anyway, if we're calculating relative change relat…
+    return float('nan')
+  difference = math.fabs(after - before)
shatch 2014/04/17 19:46:17: This fabs call seems unnecessary, considering you …
qyearsley 2014/04/17 20:54:05: Good point, done. (My original thinking was: first…
+  return math.fabs(difference / before)
+
+
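Note: to make the new function's semantics concrete, a few illustrative cases (not part of the patch). As the thread above points out, the inner fabs on the difference is redundant given the fabs at the return; the outer one still matters when the baseline is negative.

  import math

  # 50 -> 60 is a 20% change relative to the baseline of 50.
  assert CalculateRelativeChange(50, 60) == 0.2

  # 60 -> 50 is a ~16.7% change: the result is always taken relative
  # to the first argument, so the function is not symmetric.
  assert abs(CalculateRelativeChange(60, 50) - 1.0 / 6) < 1e-9

  # With a negative baseline, the fabs at the return keeps the result
  # non-negative: fabs(10.0 / -50) == 0.2.
  assert CalculateRelativeChange(-50, -60) == 0.2

  # A zero baseline is undefined under this definition, so NaN is returned.
  assert math.isnan(CalculateRelativeChange(0, 10))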
 def CalculatePooledStandardError(work_sets):
   numerator = 0.0
   denominator1 = 0.0
   denominator2 = 0.0
 
   for current_set in work_sets:
     std_dev = CalculateStandardDeviation(current_set)
     numerator += (len(current_set) - 1) * std_dev ** 2
     denominator1 += len(current_set) - 1
     denominator2 += 1.0 / len(current_set)
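Note: the chunk is cut off before the function's return. Under the textbook pooled-variance definition, the accumulators would be combined along these lines (a sketch, not the verbatim elided code):

  # Pooled standard deviation: each set's variance weighted by its
  # degrees of freedom, s_p = sqrt(sum((n_i - 1) * s_i**2) / sum(n_i - 1)).
  # Standard error of the difference of means: s_p * sqrt(sum(1 / n_i)).
  if denominator1 > 0:
    return math.sqrt(numerator / denominator1) * math.sqrt(denominator2)
  return 0.0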
(...skipping 2826 matching lines...)
         working_means.append(revision_data_sorted[i][1]['value']['values'])
 
     # Flatten the lists to calculate mean of all values.
     working_mean = sum(working_means, [])
     broken_mean = sum(broken_means, [])
 
     # Calculate the approximate size of the regression
     mean_of_bad_runs = CalculateMean(broken_mean)
     mean_of_good_runs = CalculateMean(working_mean)
 
-    regression_size = math.fabs(max(mean_of_good_runs, mean_of_bad_runs) /
-        max(0.0001, min(mean_of_good_runs, mean_of_bad_runs))) * 100.0 - 100.0
+    regression_size = 100 * CalculateRelativeChange(mean_of_good_runs,
+                                                    mean_of_bad_runs)
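Note: the rewrite changes behavior as well as readability. The old expression measured the gap relative to the smaller of the two means (clamped at 0.0001), while the new one measures it relative to the "good" baseline, and a zero good mean now yields NaN instead of an enormous clamped ratio. A worked comparison with illustrative values:

  # With mean_of_good_runs = 100 and mean_of_bad_runs = 80:
  #   old: fabs(max(100, 80) / max(0.0001, min(100, 80))) * 100.0 - 100.0
  #        = (100 / 80) * 100 - 100 = 25.0   (relative to the smaller mean)
  #   new: 100 * CalculateRelativeChange(100, 80)
  #        = 100 * (20.0 / 100) = 20.0       (relative to the good baseline)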
 
     regression_std_err = math.fabs(CalculatePooledStandardError(
         [working_mean, broken_mean]) /
         max(0.0001, min(mean_of_good_runs, mean_of_bad_runs))) * 100.0
 
     # Give a "confidence" in the bisect. At the moment we use how distinct the
     # values are before and after the last broken revision, and how noisy the
     # overall graph is.
     confidence = CalculateConfidence(working_means, broken_means)
 
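Note: CalculateConfidence itself is outside this chunk. One plausible shape for such a score, purely as a hypothetical sketch (the real implementation may differ), is the gap between the group means measured against their pooled noise:

  def HypotheticalConfidence(working_means, broken_means):
    # Hypothetical helper, not the patch's CalculateConfidence: flatten
    # each side, then compare the distance between the two means against
    # the pooled standard error. A large, clean separation approaches
    # 100; heavily overlapping, noisy groups approach 0.
    good = sum(working_means, [])
    bad = sum(broken_means, [])
    distance = math.fabs(CalculateMean(bad) - CalculateMean(good))
    noise = CalculatePooledStandardError([good, bad])
    if noise == 0:
      return 100.0
    return 100.0 * min(1.0, distance / (noise * 10.0))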
(...skipping 514 matching lines...)
       # The perf dashboard scrapes the "results" step in order to comment on
       # bugs. If you change this, please update the perf dashboard as well.
       bisect_utils.OutputAnnotationStepStart('Results')
     print 'Error: %s' % e.message
     if opts.output_buildbot_annotations:
       bisect_utils.OutputAnnotationStepClosed()
     return 1
 
 if __name__ == '__main__':
   sys.exit(main())
