Chromium Code Reviews

Unified Diff: tools/bisect-perf-regression.py

Issue 241273002: Change "percentage change" function and add comments/test for it. (Closed)
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Patch Set: "Merge latest changes if any" (created 6 years, 8 months ago)
 #!/usr/bin/env python
 # Copyright (c) 2013 The Chromium Authors. All rights reserved.
 # Use of this source code is governed by a BSD-style license that can be
 # found in the LICENSE file.
 
 """Performance Test Bisect Tool
 
 This script bisects a series of changelists using binary search. It starts at
 a bad revision where a performance metric has regressed, and asks for a last
 known-good revision. It will then binary search across this revision range by
(...skipping 273 matching lines...)
 
   mean = CalculateMean(values)
   differences_from_mean = [float(x) - mean for x in values]
   squared_differences = [float(x * x) for x in differences_from_mean]
   variance = sum(squared_differences) / (len(values) - 1)
   std_dev = math.sqrt(variance)
 
   return std_dev
 
 
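Note: the chunk above is the tail of CalculateStandardDeviation, which uses the sample (n - 1, Bessel-corrected) variance. A quick illustrative check, not part of the patch, assuming CalculateMean is the plain arithmetic mean:

  import math

  # For values = [2, 4, 4, 4, 5, 5, 7, 9]: mean = 5.0, the squared
  # differences sum to 32, so the sample variance is 32 / 7 and the
  # standard deviation is sqrt(32 / 7) ~= 2.138.
  values = [2, 4, 4, 4, 5, 5, 7, 9]
  assert abs(CalculateStandardDeviation(values) - math.sqrt(32.0 / 7)) < 1e-9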
+def CalculateRelativeChange(before, after):
+  """Returns the relative change of before and after, relative to before.
+
+  There are several different ways to define relative difference between
+  two numbers; sometimes it is defined as relative to the smaller number,
+  or to the mean of the two numbers. This version returns the difference
+  relative to the first of the two numbers.
+
+  Args:
+    before: A number representing an earlier value.
+    after: Another number, representing a later value.
+
+  Returns:
+    A non-negative floating point number; 0.1 represents a 10% change.
+  """
+  if before == 0:
shatch 2014/04/17 19:46:17: If the "good" values were legitimately 0 (ie. the …
qyearsley 2014/04/17 20:54:05: Yeah -- In this version before my latest patch, th…
shatch 2014/04/17 22:58:28: Hmm, yeah I'm not sure what the best output here w…
qyearsley 2014/04/18 19:38:03: Anyway, if we're calculating relative change relat…
+    return float('nan')
+  difference = math.fabs(after - before)
shatch 2014/04/17 19:46:17: This fabs call seems unnecessary, considering you …
qyearsley 2014/04/17 20:54:05: Good point, done. (My original thinking was: first…
+  return math.fabs(difference / before)
+
+
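Note: to make the new function's semantics concrete, a few illustrative cases (not part of the patch). As the thread above points out, the inner fabs on the difference is redundant given the fabs at the return; the outer one still matters when the baseline is negative.

  import math

  # 50 -> 60 is a 20% change relative to the baseline of 50.
  assert CalculateRelativeChange(50, 60) == 0.2

  # 60 -> 50 is a ~16.7% change: the result is always taken relative
  # to the first argument, so the function is not symmetric.
  assert abs(CalculateRelativeChange(60, 50) - 1.0 / 6) < 1e-9

  # With a negative baseline, the fabs at the return keeps the result
  # non-negative: fabs(10.0 / -50) == 0.2.
  assert CalculateRelativeChange(-50, -60) == 0.2

  # A zero baseline is undefined under this definition, so NaN is returned.
  assert math.isnan(CalculateRelativeChange(0, 10))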
 def CalculatePooledStandardError(work_sets):
   numerator = 0.0
   denominator1 = 0.0
   denominator2 = 0.0
 
   for current_set in work_sets:
     std_dev = CalculateStandardDeviation(current_set)
     numerator += (len(current_set) - 1) * std_dev ** 2
     denominator1 += len(current_set) - 1
     denominator2 += 1.0 / len(current_set)
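Note: the chunk is cut off before the function's return. Under the textbook pooled-variance definition, the accumulators would be combined along these lines (a sketch, not the verbatim elided code):

  # Pooled standard deviation: each set's variance weighted by its
  # degrees of freedom, s_p = sqrt(sum((n_i - 1) * s_i**2) / sum(n_i - 1)).
  # Standard error of the difference of means: s_p * sqrt(sum(1 / n_i)).
  if denominator1 > 0:
    return math.sqrt(numerator / denominator1) * math.sqrt(denominator2)
  return 0.0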
(...skipping 2826 matching lines...)
         working_means.append(revision_data_sorted[i][1]['value']['values'])
 
     # Flatten the lists to calculate mean of all values.
     working_mean = sum(working_means, [])
     broken_mean = sum(broken_means, [])
 
     # Calculate the approximate size of the regression
     mean_of_bad_runs = CalculateMean(broken_mean)
     mean_of_good_runs = CalculateMean(working_mean)
 
-    regression_size = math.fabs(max(mean_of_good_runs, mean_of_bad_runs) /
-        max(0.0001, min(mean_of_good_runs, mean_of_bad_runs))) * 100.0 - 100.0
+    regression_size = 100 * CalculateRelativeChange(mean_of_good_runs,
+                                                    mean_of_bad_runs)
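Note: the rewrite changes behavior as well as readability. The old expression measured the gap relative to the smaller of the two means (clamped at 0.0001), while the new one measures it relative to the "good" baseline, and a zero good mean now yields NaN instead of an enormous clamped ratio. A worked comparison with illustrative values:

  # With mean_of_good_runs = 100 and mean_of_bad_runs = 80:
  #   old: fabs(max(100, 80) / max(0.0001, min(100, 80))) * 100.0 - 100.0
  #        = (100 / 80) * 100 - 100 = 25.0   (relative to the smaller mean)
  #   new: 100 * CalculateRelativeChange(100, 80)
  #        = 100 * (20.0 / 100) = 20.0       (relative to the good baseline)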
 
     regression_std_err = math.fabs(CalculatePooledStandardError(
         [working_mean, broken_mean]) /
         max(0.0001, min(mean_of_good_runs, mean_of_bad_runs))) * 100.0
 
     # Give a "confidence" in the bisect. At the moment we use how distinct the
     # values are before and after the last broken revision, and how noisy the
     # overall graph is.
     confidence = CalculateConfidence(working_means, broken_means)
 
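Note: CalculateConfidence itself is outside this chunk. One plausible shape for such a score, purely as a hypothetical sketch (the real implementation may differ), is the gap between the group means measured against their pooled noise:

  def HypotheticalConfidence(working_means, broken_means):
    # Hypothetical helper, not the patch's CalculateConfidence: flatten
    # each side, then compare the distance between the two means against
    # the pooled standard error. A large, clean separation approaches
    # 100; heavily overlapping, noisy groups approach 0.
    good = sum(working_means, [])
    bad = sum(broken_means, [])
    distance = math.fabs(CalculateMean(bad) - CalculateMean(good))
    noise = CalculatePooledStandardError([good, bad])
    if noise == 0:
      return 100.0
    return 100.0 * min(1.0, distance / (noise * 10.0))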
(...skipping 514 matching lines...)
       # The perf dashboard scrapes the "results" step in order to comment on
       # bugs. If you change this, please update the perf dashboard as well.
       bisect_utils.OutputAnnotationStepStart('Results')
     print 'Error: %s' % e.message
     if opts.output_buildbot_annotations:
       bisect_utils.OutputAnnotationStepClosed()
     return 1
 
 if __name__ == '__main__':
   sys.exit(main())
