tools/bisect-perf-regression.py - Issue 241273002: Change "percentage change" function and add comments/test for it.

Side by Side Diff: tools/bisect-perf-regression.py

Issue 241273002: Change "percentage change" function and add comments/test for it. (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master

Patch Set: Created 6 years, 8 months ago

OLD	NEW
1 #!/usr/bin/env python	1 #!/usr/bin/env python

2 # Copyright (c) 2013 The Chromium Authors. All rights reserved.	2 # Copyright (c) 2013 The Chromium Authors. All rights reserved.

3 # Use of this source code is governed by a BSD-style license that can be	3 # Use of this source code is governed by a BSD-style license that can be

4 # found in the LICENSE file.	4 # found in the LICENSE file.

5	5

6 """Performance Test Bisect Tool	6 """Performance Test Bisect Tool

7	7

8 This script bisects a series of changelists using binary search. It starts at	8 This script bisects a series of changelists using binary search. It starts at

9 a bad revision where a performance metric has regressed, and asks for a last	9 a bad revision where a performance metric has regressed, and asks for a last

10 known-good revision. It will then binary search across this revision range by	10 known-good revision. It will then binary search across this revision range by

(...skipping 169 matching lines...)
180 def _AddAdditionalDepotInfo(depot_info):	180 def _AddAdditionalDepotInfo(depot_info):

181 """Adds additional depot info to the global depot variables."""	181 """Adds additional depot info to the global depot variables."""

182 global DEPOT_DEPS_NAME	182 global DEPOT_DEPS_NAME

183 global DEPOT_NAMES	183 global DEPOT_NAMES

184 DEPOT_DEPS_NAME = dict(DEPOT_DEPS_NAME.items() +	184 DEPOT_DEPS_NAME = dict(DEPOT_DEPS_NAME.items() +

185 depot_info.items())	185 depot_info.items())

186 DEPOT_NAMES = DEPOT_DEPS_NAME.keys()	186 DEPOT_NAMES = DEPOT_DEPS_NAME.keys()

187	187

188	188

189 def CalculateTruncatedMean(data_set, truncate_percent):	189 def CalculateTruncatedMean(data_set, truncate_percent):

190 """Calculates the truncated mean of a set of values.	190 """Calculates the truncated mean of a set of values.
qyearsley 2014/04/17 18:47:59
I just did a quick search, and it appears that calculation of the "truncated mean" (or "trimmed mean") usually doesn't involve weighting; usually the X% trimmed mean is calculated by just excluding the top X% and bottom X% and taking the mean of the rest of the values.
References:
http://www.d.umn.edu/~yqi/stat3611/trimmed.pdf
http://voices.yahoo.com/statistics-101-trimmed-mean-median-4152059.html
http://www.statistics.com/glossary&term_id=866
However, calculation of the "interquartile mean" may involve weighting two of the values.
http://en.wikipedia.org/wiki/Interquartile_mean
Any thoughts? I'm thinking of replacing this with the simplified truncated mean function, which doesn't weight anything.

shatch 2014/04/17 19:46:17
I don't have a strong opinion on it, so go ahead with whatever you feel is best.
http://en.wikipedia.org/wiki/Truncated_mean
This does define the truncated/trimmed mean as using weighted values. Additionally, according to the Wikipedia entry you linked, the interquartile mean is just the truncated mean with specific values (in this case, 25%).

qyearsley 2014/04/17 20:54:05
Ah, right! I overlooked that. I think that the current truncated mean function is fine; I just had a bit of trouble understanding it.
191	191

192 Note that this isn't just the mean of the set of values with the highest	192 Note that this isn't just the mean of the set of values with the highest

193 and lowest values discarded; the non-discarded values are also weighted	193 and lowest values discarded; the non-discarded values are also weighted

194 differently depending how many values are discarded.	194 differently depending how many values are discarded.

195	195

196 Args:	196 Args:

197 data_set: Non-empty list of values.	197 data_set: Non-empty list of values.

198 truncate_percent: The % from the upper and lower portions of the data set	198 truncate_percent: The % from the upper and lower portions of the data set

199 to discard, expressed as a value in [0, 1].	199 to discard, expressed as a value in [0, 1].

200	200
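To make the distinction raised in the comments on line 190 concrete, here is a minimal sketch contrasting a simple trimmed mean with a weighted one. It is not the body of CalculateTruncatedMean, which is elided in this diff. The names `simple_trimmed_mean` and `weighted_trimmed_mean` are hypothetical, and the weighting scheme (each sorted value counts in proportion to how much of its unit-width slot survives the trim) is an assumption chosen for illustration. It expects a non-empty list and a trim fraction in [0, 0.5).

```python
import math


def simple_trimmed_mean(values, trim_fraction):
    """Drops floor(n * trim_fraction) values from each end, then averages."""
    ordered = sorted(values)
    drop = int(math.floor(len(ordered) * trim_fraction))
    kept = ordered[drop:len(ordered) - drop]
    return sum(kept) / float(len(kept))


def weighted_trimmed_mean(values, trim_fraction):
    """Trims a possibly fractional number of values from each end.

    Hypothetical scheme: the sorted value at index i occupies the slot
    [i, i + 1], and the kept window is [low, high]. Each value is weighted
    by the length of the overlap between its slot and the kept window.
    """
    ordered = sorted(values)
    n = len(ordered)
    low = n * trim_fraction
    high = n - low
    weighted_sum = 0.0
    for i, value in enumerate(ordered):
        overlap = min(i + 1, high) - max(i, low)
        if overlap > 0:
            weighted_sum += overlap * value
    # The total weight equals the width of the kept window.
    return weighted_sum / (high - low)
```

With the five values 1, 2, 4, 8, 100 and a 25% trim, 1.25 values are cut from each end. The simple version drops one value per side and averages 2, 4, 8 (about 4.67); the weighted version also scales 2 and 8 by 0.75 and returns 4.6.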

(...skipping 83 matching lines...)
284	284

285 mean = CalculateMean(values)	285 mean = CalculateMean(values)

286 differences_from_mean = [float(x) - mean for x in values]	286 differences_from_mean = [float(x) - mean for x in values]

287 squared_differences = [float(x * x) for x in differences_from_mean]	287 squared_differences = [float(x * x) for x in differences_from_mean]

288 variance = sum(squared_differences) / (len(values) - 1)	288 variance = sum(squared_differences) / (len(values) - 1)

289 std_dev = math.sqrt(variance)	289 std_dev = math.sqrt(variance)

290	290

291 return std_dev	291 return std_dev

292	292

293	293

	294 def CalculateRelativeChange(before, after):

	295 """Returns the relative change of before and after, relative to before.

	296

	297 There are several different ways to define relative difference between

	298 two numbers; sometimes it is defined as relative to the smaller number,
qyearsley 2014/04/17 18:47:59
The previous version was calculating the difference relative to the smaller number. It was calculating:
(big/small) - 1.0
= (big/small) - (small/small)
= (big - small) / small

shatch 2014/04/17 19:46:17
Does this change make it match the perf dashboard?

shatch 2014/04/17 19:47:19
Doh! This is answered in your CL description :)
	299 or to the mean of the two numbers. This version returns the difference

	300 relative to the first of the two numbers.

	301

	302 Args:

	303 before: A number representing an earlier value.

	304 after: Another number, representing a later value.

	305

	306 Returns:

	307 A non-negative floating point number; 0.1 represents a 10% change.

	308 """

	309 if before == 0:

	310 return float('nan')

	311 difference = math.fabs(after - before)

	312 return math.fabs(difference / before)
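The behaviour described in the docstring can be exercised with a few quick checks. The sketch below is a standalone copy of the function as it appears in the NEW column, followed by a hypothetical helper, `check_relative_change`. The checks are illustrative and are not the test added by this change.

```python
import math


def CalculateRelativeChange(before, after):
    """Standalone copy of the function in the NEW column of this diff."""
    if before == 0:
        return float('nan')
    difference = math.fabs(after - before)
    return math.fabs(difference / before)


def check_relative_change():
    """Hypothetical checks; not the test added by this change."""
    assert CalculateRelativeChange(100, 150) == 0.5
    # The result is relative to the first argument, so swapping the
    # arguments changes it.
    assert abs(CalculateRelativeChange(150, 100) - 1.0 / 3.0) < 1e-12
    # A negative 'before' still gives a non-negative result.
    assert CalculateRelativeChange(-10, -5) == 0.5
    # A zero 'before' yields NaN instead of raising ZeroDivisionError.
    assert math.isnan(CalculateRelativeChange(0, 5))


check_relative_change()
```

Note that the zero case returns NaN, which is neither negative nor non-negative, so callers that format or compare the result may want to handle it explicitly.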

	313

	314

294 def CalculatePooledStandardError(work_sets):	315 def CalculatePooledStandardError(work_sets):

295 numerator = 0.0	316 numerator = 0.0

296 denominator1 = 0.0	317 denominator1 = 0.0

297 denominator2 = 0.0	318 denominator2 = 0.0

298	319

299 for current_set in work_sets:	320 for current_set in work_sets:

300 std_dev = CalculateStandardDeviation(current_set)	321 std_dev = CalculateStandardDeviation(current_set)

301 numerator += (len(current_set) - 1) * std_dev ** 2	322 numerator += (len(current_set) - 1) * std_dev ** 2

302 denominator1 += len(current_set) - 1	323 denominator1 += len(current_set) - 1

303 denominator2 += 1.0 / len(current_set)	324 denominator2 += 1.0 / len(current_set)

(...skipping 2826 matching lines...)
3130 working_means.append(revision_data_sorted[i][1]['value']['values'])	3151 working_means.append(revision_data_sorted[i][1]['value']['values'])

3131	3152

3132 # Flatten the lists to calculate mean of all values.	3153 # Flatten the lists to calculate mean of all values.

3133 working_mean = sum(working_means, [])	3154 working_mean = sum(working_means, [])

3134 broken_mean = sum(broken_means, [])	3155 broken_mean = sum(broken_means, [])

3135	3156

3136 # Calculate the approximate size of the regression	3157 # Calculate the approximate size of the regression

3137 mean_of_bad_runs = CalculateMean(broken_mean)	3158 mean_of_bad_runs = CalculateMean(broken_mean)

3138 mean_of_good_runs = CalculateMean(working_mean)	3159 mean_of_good_runs = CalculateMean(working_mean)

3139	3160

3140 regression_size = math.fabs(max(mean_of_good_runs, mean_of_bad_runs) /	3161 regression_size = 100 * CalculateRelativeChange(mean_of_good_runs,

3141 max(0.0001, min(mean_of_good_runs, mean_of_bad_runs))) * 100.0 - 100.0	3162 mean_of_bad_runs)

3142	3163

3143 regression_std_err = math.fabs(CalculatePooledStandardError(	3164 regression_std_err = math.fabs(CalculatePooledStandardError(

3144 [working_mean, broken_mean]) /	3165 [working_mean, broken_mean]) /

3145 max(0.0001, min(mean_of_good_runs, mean_of_bad_runs))) * 100.0	3166 max(0.0001, min(mean_of_good_runs, mean_of_bad_runs))) * 100.0

3146	3167

3147 # Give a "confidence" in the bisect. At the moment we use how distinct the	3168 # Give a "confidence" in the bisect. At the moment we use how distinct the

3148 # values are before and after the last broken revision, and how noisy the	3169 # values are before and after the last broken revision, and how noisy the

3149 # overall graph is.	3170 # overall graph is.

3150 confidence = CalculateConfidence(working_means, broken_means)	3171 confidence = CalculateConfidence(working_means, broken_means)

3151	3172
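The replaced regression_size expression (old lines 3140-3141) and its replacement (new lines 3161-3162) can report different sizes for the same pair of means. The sketch below wraps both in hypothetical helpers, `old_regression_size` and `new_regression_size`, so they can be compared directly. The new helper inlines the logic of CalculateRelativeChange as shown in this diff, and the sample means are made up for illustration.

```python
import math


def old_regression_size(mean_good, mean_bad):
    """OLD column: percent difference relative to the smaller mean."""
    return math.fabs(max(mean_good, mean_bad) /
                     max(0.0001, min(mean_good, mean_bad))) * 100.0 - 100.0


def new_regression_size(mean_good, mean_bad):
    """NEW column: percent difference relative to the good (before) mean."""
    if mean_good == 0:
        return float('nan')
    difference = math.fabs(mean_bad - mean_good)
    return 100 * math.fabs(difference / mean_good)


# Made-up means: the metric halves between the good and bad revisions.
halved_old = old_regression_size(100.0, 50.0)  # relative to 50.0
halved_new = new_regression_size(100.0, 50.0)  # relative to 100.0

# Made-up means: the metric doubles between the good and bad revisions.
doubled_old = old_regression_size(50.0, 100.0)
doubled_new = new_regression_size(50.0, 100.0)
```

When the bad mean is larger than the good mean, the smaller number is also the "before" value and the two formulas agree (100% in the doubling case). When the bad mean is smaller, the old formula reports 100% while the new one reports 50%.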

(...skipping 514 matching lines...)
3666 # The perf dashboard scrapes the "results" step in order to comment on	3687 # The perf dashboard scrapes the "results" step in order to comment on

3667 # bugs. If you change this, please update the perf dashboard as well.	3688 # bugs. If you change this, please update the perf dashboard as well.

3668 bisect_utils.OutputAnnotationStepStart('Results')	3689 bisect_utils.OutputAnnotationStepStart('Results')

3669 print 'Error: %s' % e.message	3690 print 'Error: %s' % e.message

3670 if opts.output_buildbot_annotations:	3691 if opts.output_buildbot_annotations:

3671 bisect_utils.OutputAnnotationStepClosed()	3692 bisect_utils.OutputAnnotationStepClosed()

3672 return 1	3693 return 1

3673	3694

3674 if __name__ == '__main__':	3695 if __name__ == '__main__':

3675 sys.exit(main())	3696 sys.exit(main())
