|
|
DescriptionMake discarding the first result possible through Benchmark.ValueCanBeAddedPredicate (add a "is_first_result" arg to it).
The main motivation here is to eliminate special cases for PageTest methods/attributes in user_story_runner.Run.
BUG=440101
Committed: https://crrev.com/2cd4e23b41a986a172ed9a15e165d3a416d9056b
Cr-Commit-Position: refs/heads/master@{#319140}
Patch Set 1 #
Total comments: 3
Patch Set 2 : Address comments and make command-line override-behavior work. #Patch Set 3 : Make the discard rule part of the results filter. (Not a cmd-line option.) #Patch Set 4 : Add unit test. #Patch Set 5 : Remove user_story_runner test for discarding the first result. #
Total comments: 6
Patch Set 6 : Address review comments. #Messages
Total messages: 37 (8 generated)
slamm@chromium.org changed reviewers: + nednguyen@google.com, sullivan@chromium.org
nednguyen@google.com changed reviewers: + aiolos@chromium.org
I updated the description: The main motivation here is to eliminate special cases for PageTest methods/attributes in user_story_runner.Run. This CL makes "discard_first_result" a command-line flag (under the results group). Typically its value is set by the benchmark. However, it can be overridden from the command-line (handy). This also has the nice side-effect of making "discard_first_result" available for user stories (it previously only applied to PageTest objects).
sullivan@chromium.org changed reviewers: + dtu@chromium.org
+dtu because command line flags
On 2015/02/27 03:35:32, sullivan wrote: > +dtu because command line flags I find --discard-first-result is a useful commandline flag that's used quite often in the past (we removed it at some point, which I didn't remember why). Most of the time, the metrics of the first run results is different from the rest because of some caching effect.
https://codereview.chromium.org/962793004/diff/1/tools/perf/benchmarks/page_c... File tools/perf/benchmarks/page_cycler.py (right): https://codereview.chromium.org/962793004/diff/1/tools/perf/benchmarks/page_c... tools/perf/benchmarks/page_cycler.py:27: def ProcessCommandLineArgs(cls, parser, args): Don't you need to call super class' ProcessCommandLineArgs? https://codereview.chromium.org/962793004/diff/1/tools/perf/benchmarks/startu... File tools/perf/benchmarks/startup.py (right): https://codereview.chromium.org/962793004/diff/1/tools/perf/benchmarks/startu... tools/perf/benchmarks/startup.py:32: args.discard_first_result = True ditto, otherwise the options setting at line 24 is ineffective. https://codereview.chromium.org/962793004/diff/1/tools/telemetry/telemetry/re... File tools/telemetry/telemetry/results/results_options.py (right): https://codereview.chromium.org/962793004/diff/1/tools/telemetry/telemetry/re... tools/telemetry/telemetry/results/results_options.py:54: help='Delete the first result for each user story. ' s/Delete/Discard
On 2015/02/27 13:22:25, nednguyen wrote: > On 2015/02/27 03:35:32, sullivan wrote: > > +dtu because command line flags > > I find --discard-first-result is a useful commandline flag that's used quite > often in the past (we removed it at some point, which I didn't remember why). > Most of the time, the metrics of the first run results is different from the > rest because of some caching effect. It's plausible to me that this would be true, but I want some data about it. Regardless, we don't want the user to set it; if it's affecting the benchmark results it should always be set for that benchmark (or all benchmarks). After all, we are looking to reduce noise.
On 2015/02/27 13:22:25, nednguyen wrote: > On 2015/02/27 03:35:32, sullivan wrote: > > +dtu because command line flags > > I find --discard-first-result is a useful commandline flag that's used quite > often in the past (we removed it at some point, which I didn't remember why). > Most of the time, the metrics of the first run results is different from the > rest because of some caching effect. I think with our discussion of wanting to reduce the number of command line flags, we need more justification than "being able to discard the first result is useful." If it's super useful for reducing noise, why don't we just do it by default and let the tests that don't want to use that behavior (anything that want cold times for example) overwrite it. Or if it's only useful in certain cases, why don't we just have people write a warm version of their tests for those cases?
On 2015/02/27 22:11:55, aiolos wrote: > On 2015/02/27 13:22:25, nednguyen wrote: > > On 2015/02/27 03:35:32, sullivan wrote: > > > +dtu because command line flags > > > > I find --discard-first-result is a useful commandline flag that's used quite > > often in the past (we removed it at some point, which I didn't remember why). > > Most of the time, the metrics of the first run results is different from the > > rest because of some caching effect. > > I think with our discussion of wanting to reduce the number of command line > flags, we need more justification than "being able to discard the first result > is useful." If it's super useful for reducing noise, why don't we just do it by > default and let the tests that don't want to use that behavior (anything that > want cold times for example) overwrite it. Or if it's only useful in certain > cases, why don't we just have people write a warm version of their tests for > those cases? Ned convinced me by saying that Matt Lee had asked for it. I was also thinking it would be handy for letting users see just how much slower the first run is for the "warm" case. (Although, now that I think about it, they could simply run the "cold" case to know that.) Ned do you have more justification for the command-line flag. If the command-line flag is shot down, could the benchmark somehow tell the results instance to discard the first result?
On 2015/02/27 22:34:15, slamm wrote: > On 2015/02/27 22:11:55, aiolos wrote: > > On 2015/02/27 13:22:25, nednguyen wrote: > > > On 2015/02/27 03:35:32, sullivan wrote: > > > > +dtu because command line flags > > > > > > I find --discard-first-result is a useful commandline flag that's used quite > > > often in the past (we removed it at some point, which I didn't remember > why). > > > Most of the time, the metrics of the first run results is different from the > > > rest because of some caching effect. > > > > I think with our discussion of wanting to reduce the number of command line > > flags, we need more justification than "being able to discard the first result > > is useful." If it's super useful for reducing noise, why don't we just do it > by > > default and let the tests that don't want to use that behavior (anything that > > want cold times for example) overwrite it. Or if it's only useful in certain > > cases, why don't we just have people write a warm version of their tests for > > those cases? > > Ned convinced me by saying that Matt Lee had asked for it. I was also thinking > it would > be handy for letting users see just how much slower the first run is for the > "warm" case. > (Although, now that I think about it, they could simply run the "cold" case to > know that.) > > Ned do you have more justification for the command-line flag. > If the command-line flag is shot down, could the benchmark somehow tell the > results > instance to discard the first result? Another example that people may want to use --discard-first-result: https://codereview.chromium.org/959063002/
On 2015/02/28 01:05:21, nednguyen wrote: > On 2015/02/27 22:34:15, slamm wrote: > > On 2015/02/27 22:11:55, aiolos wrote: > > > On 2015/02/27 13:22:25, nednguyen wrote: > > > > On 2015/02/27 03:35:32, sullivan wrote: > > > > > +dtu because command line flags > > > > > > > > I find --discard-first-result is a useful commandline flag that's used > quite > > > > often in the past (we removed it at some point, which I didn't remember > > why). > > > > Most of the time, the metrics of the first run results is different from > the > > > > rest because of some caching effect. > > > > > > I think with our discussion of wanting to reduce the number of command line > > > flags, we need more justification than "being able to discard the first > result > > > is useful." If it's super useful for reducing noise, why don't we just do it > > by > > > default and let the tests that don't want to use that behavior (anything > that > > > want cold times for example) overwrite it. Or if it's only useful in certain > > > cases, why don't we just have people write a warm version of their tests for > > > those cases? > > > > Ned convinced me by saying that Matt Lee had asked for it. I was also thinking > > it would > > be handy for letting users see just how much slower the first run is for the > > "warm" case. > > (Although, now that I think about it, they could simply run the "cold" case to > > know that.) > > > > Ned do you have more justification for the command-line flag. > > If the command-line flag is shot down, could the benchmark somehow tell the > > results > > instance to discard the first result? > > Another example that people may want to use --discard-first-result: > https://codereview.chromium.org/959063002/ I don't have strong opinion about whether we make --discard-first-result a commandline flag, or a benchmark option thing that users can specify. I do think that this is a useful option to many cases, but not all case by default (because this certainly increase the cycle time).
On 2015/02/28 01:08:21, nednguyen wrote: > On 2015/02/28 01:05:21, nednguyen wrote: > > On 2015/02/27 22:34:15, slamm wrote: > > > On 2015/02/27 22:11:55, aiolos wrote: > > > > On 2015/02/27 13:22:25, nednguyen wrote: > > > > > On 2015/02/27 03:35:32, sullivan wrote: > > > > > > +dtu because command line flags > > > > > > > > > > I find --discard-first-result is a useful commandline flag that's used > > quite > > > > > often in the past (we removed it at some point, which I didn't remember > > > why). > > > > > Most of the time, the metrics of the first run results is different from > > the > > > > > rest because of some caching effect. > > > > > > > > I think with our discussion of wanting to reduce the number of command > line > > > > flags, we need more justification than "being able to discard the first > > result > > > > is useful." If it's super useful for reducing noise, why don't we just do > it > > > by > > > > default and let the tests that don't want to use that behavior (anything > > that > > > > want cold times for example) overwrite it. Or if it's only useful in > certain > > > > cases, why don't we just have people write a warm version of their tests > for > > > > those cases? > > > > > > Ned convinced me by saying that Matt Lee had asked for it. I was also > thinking > > > it would > > > be handy for letting users see just how much slower the first run is for the > > > "warm" case. > > > (Although, now that I think about it, they could simply run the "cold" case > to > > > know that.) > > > > > > Ned do you have more justification for the command-line flag. > > > If the command-line flag is shot down, could the benchmark somehow tell the > > > results > > > instance to discard the first result? > > > > Another example that people may want to use --discard-first-result: > > https://codereview.chromium.org/959063002/ > > I don't have strong opinion about whether we make --discard-first-result a > commandline flag, or a benchmark option thing that users can specify. I do think > that this is a useful option to many cases, but not all case by default (because > this certainly increase the cycle time). It should definitely be an option baked into the benchmark and we shouldn't allow users to specify it at the command-line. We disallow having command-line configurations that can affect the results of the benchmark.
On 2015/03/02 18:24:22, dtu wrote: > On 2015/02/28 01:08:21, nednguyen wrote: > > On 2015/02/28 01:05:21, nednguyen wrote: > > > On 2015/02/27 22:34:15, slamm wrote: > > > > On 2015/02/27 22:11:55, aiolos wrote: > > > > > On 2015/02/27 13:22:25, nednguyen wrote: > > > > > > On 2015/02/27 03:35:32, sullivan wrote: > > > > > > > +dtu because command line flags > > > > > > > > > > > > I find --discard-first-result is a useful commandline flag that's used > > > quite > > > > > > often in the past (we removed it at some point, which I didn't > remember > > > > why). > > > > > > Most of the time, the metrics of the first run results is different > from > > > the > > > > > > rest because of some caching effect. > > > > > > > > > > I think with our discussion of wanting to reduce the number of command > > line > > > > > flags, we need more justification than "being able to discard the first > > > result > > > > > is useful." If it's super useful for reducing noise, why don't we just > do > > it > > > > by > > > > > default and let the tests that don't want to use that behavior (anything > > > that > > > > > want cold times for example) overwrite it. Or if it's only useful in > > certain > > > > > cases, why don't we just have people write a warm version of their tests > > for > > > > > those cases? > > > > > > > > Ned convinced me by saying that Matt Lee had asked for it. I was also > > thinking > > > > it would > > > > be handy for letting users see just how much slower the first run is for > the > > > > "warm" case. > > > > (Although, now that I think about it, they could simply run the "cold" > case > > to > > > > know that.) > > > > > > > > Ned do you have more justification for the command-line flag. > > > > If the command-line flag is shot down, could the benchmark somehow tell > the > > > > results > > > > instance to discard the first result? > > > > > > Another example that people may want to use --discard-first-result: > > > https://codereview.chromium.org/959063002/ > > > > I don't have strong opinion about whether we make --discard-first-result a > > commandline flag, or a benchmark option thing that users can specify. I do > think > > that this is a useful option to many cases, but not all case by default > (because > > this certainly increase the cycle time). > > It should definitely be an option baked into the benchmark and we shouldn't > allow users to specify it at the command-line. We disallow having command-line > configurations that can affect the results of the benchmark. Cool, I like this philosophy. @slamm, can you make this a benchmark or result option?
On 2015/03/02 22:31:41, nednguyen(REVIEW IN OTHER ACC) wrote: > On 2015/03/02 18:24:22, dtu wrote: > > On 2015/02/28 01:08:21, nednguyen wrote: > > > On 2015/02/28 01:05:21, nednguyen wrote: > > > > On 2015/02/27 22:34:15, slamm wrote: > > > > > On 2015/02/27 22:11:55, aiolos wrote: > > > > > > On 2015/02/27 13:22:25, nednguyen wrote: > > > > > > > On 2015/02/27 03:35:32, sullivan wrote: > > > > > > > > +dtu because command line flags > > > > > > > > > > > > > > I find --discard-first-result is a useful commandline flag that's > used > > > > quite > > > > > > > often in the past (we removed it at some point, which I didn't > > remember > > > > > why). > > > > > > > Most of the time, the metrics of the first run results is different > > from > > > > the > > > > > > > rest because of some caching effect. > > > > > > > > > > > > I think with our discussion of wanting to reduce the number of command > > > line > > > > > > flags, we need more justification than "being able to discard the > first > > > > result > > > > > > is useful." If it's super useful for reducing noise, why don't we just > > do > > > it > > > > > by > > > > > > default and let the tests that don't want to use that behavior > (anything > > > > that > > > > > > want cold times for example) overwrite it. Or if it's only useful in > > > certain > > > > > > cases, why don't we just have people write a warm version of their > tests > > > for > > > > > > those cases? > > > > > > > > > > Ned convinced me by saying that Matt Lee had asked for it. I was also > > > thinking > > > > > it would > > > > > be handy for letting users see just how much slower the first run is for > > the > > > > > "warm" case. > > > > > (Although, now that I think about it, they could simply run the "cold" > > case > > > to > > > > > know that.) > > > > > > > > > > Ned do you have more justification for the command-line flag. > > > > > If the command-line flag is shot down, could the benchmark somehow tell > > the > > > > > results > > > > > instance to discard the first result? > > > > > > > > Another example that people may want to use --discard-first-result: > > > > https://codereview.chromium.org/959063002/ > > > > > > I don't have strong opinion about whether we make --discard-first-result a > > > commandline flag, or a benchmark option thing that users can specify. I do > > think > > > that this is a useful option to many cases, but not all case by default > > (because > > > this certainly increase the cycle time). > > > > It should definitely be an option baked into the benchmark and we shouldn't > > allow users to specify it at the command-line. We disallow having command-line > > configurations that can affect the results of the benchmark. > Cool, I like this philosophy. @slamm, can you make this a benchmark or result > option? Yes, I would like to make it work as a benchmark or result option. My current idea is to make it a benchmark option and then have that set a results option. The former being what a benchmark writer changes and the latter being what gets passed to user_story_runner.
On 2015/03/02 22:59:14, slamm wrote: > On 2015/03/02 22:31:41, nednguyen(REVIEW IN OTHER ACC) wrote: > > On 2015/03/02 18:24:22, dtu wrote: > > > On 2015/02/28 01:08:21, nednguyen wrote: > > > > On 2015/02/28 01:05:21, nednguyen wrote: > > > > > On 2015/02/27 22:34:15, slamm wrote: > > > > > > On 2015/02/27 22:11:55, aiolos wrote: > > > > > > > On 2015/02/27 13:22:25, nednguyen wrote: > > > > > > > > On 2015/02/27 03:35:32, sullivan wrote: > > > > > > > > > +dtu because command line flags > > > > > > > > > > > > > > > > I find --discard-first-result is a useful commandline flag that's > > used > > > > > quite > > > > > > > > often in the past (we removed it at some point, which I didn't > > > remember > > > > > > why). > > > > > > > > Most of the time, the metrics of the first run results is > different > > > from > > > > > the > > > > > > > > rest because of some caching effect. > > > > > > > > > > > > > > I think with our discussion of wanting to reduce the number of > command > > > > line > > > > > > > flags, we need more justification than "being able to discard the > > first > > > > > result > > > > > > > is useful." If it's super useful for reducing noise, why don't we > just > > > do > > > > it > > > > > > by > > > > > > > default and let the tests that don't want to use that behavior > > (anything > > > > > that > > > > > > > want cold times for example) overwrite it. Or if it's only useful in > > > > certain > > > > > > > cases, why don't we just have people write a warm version of their > > tests > > > > for > > > > > > > those cases? > > > > > > > > > > > > Ned convinced me by saying that Matt Lee had asked for it. I was also > > > > thinking > > > > > > it would > > > > > > be handy for letting users see just how much slower the first run is > for > > > the > > > > > > "warm" case. > > > > > > (Although, now that I think about it, they could simply run the "cold" > > > case > > > > to > > > > > > know that.) > > > > > > > > > > > > Ned do you have more justification for the command-line flag. > > > > > > If the command-line flag is shot down, could the benchmark somehow > tell > > > the > > > > > > results > > > > > > instance to discard the first result? > > > > > > > > > > Another example that people may want to use --discard-first-result: > > > > > https://codereview.chromium.org/959063002/ > > > > > > > > I don't have strong opinion about whether we make --discard-first-result a > > > > commandline flag, or a benchmark option thing that users can specify. I do > > > think > > > > that this is a useful option to many cases, but not all case by default > > > (because > > > > this certainly increase the cycle time). > > > > > > It should definitely be an option baked into the benchmark and we shouldn't > > > allow users to specify it at the command-line. We disallow having > command-line > > > configurations that can affect the results of the benchmark. > > Cool, I like this philosophy. @slamm, can you make this a benchmark or result > > option? > > Yes, I would like to make it work as a benchmark or result option. > My current idea is to make it a benchmark option and then have that set a > results option. The former being what a benchmark writer changes and the latter > being what gets passed to user_story_runner. Related to this, the page_cycler benchmark has a command-line arg, --cold-load-percent. Its value determines whether or not the first result is discarded. As part of my change, should cold-load-percent become a class attribute?
nednguyen@google.com changed reviewers: - nednguyen@chromium.org
On 2015/03/03 18:53:06, slamm wrote: > On 2015/03/02 22:59:14, slamm wrote: > > On 2015/03/02 22:31:41, nednguyen(REVIEW IN OTHER ACC) wrote: > > > On 2015/03/02 18:24:22, dtu wrote: > > > > On 2015/02/28 01:08:21, nednguyen wrote: > > > > > On 2015/02/28 01:05:21, nednguyen wrote: > > > > > > On 2015/02/27 22:34:15, slamm wrote: > > > > > > > On 2015/02/27 22:11:55, aiolos wrote: > > > > > > > > On 2015/02/27 13:22:25, nednguyen wrote: > > > > > > > > > On 2015/02/27 03:35:32, sullivan wrote: > > > > > > > > > > +dtu because command line flags > > > > > > > > > > > > > > > > > > I find --discard-first-result is a useful commandline flag > that's > > > used > > > > > > quite > > > > > > > > > often in the past (we removed it at some point, which I didn't > > > > remember > > > > > > > why). > > > > > > > > > Most of the time, the metrics of the first run results is > > different > > > > from > > > > > > the > > > > > > > > > rest because of some caching effect. > > > > > > > > > > > > > > > > I think with our discussion of wanting to reduce the number of > > command > > > > > line > > > > > > > > flags, we need more justification than "being able to discard the > > > first > > > > > > result > > > > > > > > is useful." If it's super useful for reducing noise, why don't we > > just > > > > do > > > > > it > > > > > > > by > > > > > > > > default and let the tests that don't want to use that behavior > > > (anything > > > > > > that > > > > > > > > want cold times for example) overwrite it. Or if it's only useful > in > > > > > certain > > > > > > > > cases, why don't we just have people write a warm version of their > > > tests > > > > > for > > > > > > > > those cases? > > > > > > > > > > > > > > Ned convinced me by saying that Matt Lee had asked for it. I was > also > > > > > thinking > > > > > > > it would > > > > > > > be handy for letting users see just how much slower the first run is > > for > > > > the > > > > > > > "warm" case. > > > > > > > (Although, now that I think about it, they could simply run the > "cold" > > > > case > > > > > to > > > > > > > know that.) > > > > > > > > > > > > > > Ned do you have more justification for the command-line flag. > > > > > > > If the command-line flag is shot down, could the benchmark somehow > > tell > > > > the > > > > > > > results > > > > > > > instance to discard the first result? > > > > > > > > > > > > Another example that people may want to use --discard-first-result: > > > > > > https://codereview.chromium.org/959063002/ > > > > > > > > > > I don't have strong opinion about whether we make --discard-first-result > a > > > > > commandline flag, or a benchmark option thing that users can specify. I > do > > > > think > > > > > that this is a useful option to many cases, but not all case by default > > > > (because > > > > > this certainly increase the cycle time). > > > > > > > > It should definitely be an option baked into the benchmark and we > shouldn't > > > > allow users to specify it at the command-line. We disallow having > > command-line > > > > configurations that can affect the results of the benchmark. > > > Cool, I like this philosophy. @slamm, can you make this a benchmark or > result > > > option? > > > > Yes, I would like to make it work as a benchmark or result option. > > My current idea is to make it a benchmark option and then have that set a > > results option. The former being what a benchmark writer changes and the > latter > > being what gets passed to user_story_runner. > > Related to this, the page_cycler benchmark has a command-line arg, > --cold-load-percent. Its value determines whether or not the first result is > discarded. As part of my change, should cold-load-percent become a class > attribute? I think we should make cold-load-percent a class attribute since this is a command line flag that can affect the results.
On 2015/03/03 19:18:33, nednguyen wrote: > On 2015/03/03 18:53:06, slamm wrote: > > On 2015/03/02 22:59:14, slamm wrote: > > > On 2015/03/02 22:31:41, nednguyen(REVIEW IN OTHER ACC) wrote: > > > > On 2015/03/02 18:24:22, dtu wrote: > > > > > On 2015/02/28 01:08:21, nednguyen wrote: > > > > > > On 2015/02/28 01:05:21, nednguyen wrote: > > > > > > > On 2015/02/27 22:34:15, slamm wrote: > > > > > > > > On 2015/02/27 22:11:55, aiolos wrote: > > > > > > > > > On 2015/02/27 13:22:25, nednguyen wrote: > > > > > > > > > > On 2015/02/27 03:35:32, sullivan wrote: > > > > > > > > > > > +dtu because command line flags > > > > > > > > > > > > > > > > > > > > I find --discard-first-result is a useful commandline flag > > that's > > > > used > > > > > > > quite > > > > > > > > > > often in the past (we removed it at some point, which I didn't > > > > > remember > > > > > > > > why). > > > > > > > > > > Most of the time, the metrics of the first run results is > > > different > > > > > from > > > > > > > the > > > > > > > > > > rest because of some caching effect. > > > > > > > > > > > > > > > > > > I think with our discussion of wanting to reduce the number of > > > command > > > > > > line > > > > > > > > > flags, we need more justification than "being able to discard > the > > > > first > > > > > > > result > > > > > > > > > is useful." If it's super useful for reducing noise, why don't > we > > > just > > > > > do > > > > > > it > > > > > > > > by > > > > > > > > > default and let the tests that don't want to use that behavior > > > > (anything > > > > > > > that > > > > > > > > > want cold times for example) overwrite it. Or if it's only > useful > > in > > > > > > certain > > > > > > > > > cases, why don't we just have people write a warm version of > their > > > > tests > > > > > > for > > > > > > > > > those cases? > > > > > > > > > > > > > > > > Ned convinced me by saying that Matt Lee had asked for it. I was > > also > > > > > > thinking > > > > > > > > it would > > > > > > > > be handy for letting users see just how much slower the first run > is > > > for > > > > > the > > > > > > > > "warm" case. > > > > > > > > (Although, now that I think about it, they could simply run the > > "cold" > > > > > case > > > > > > to > > > > > > > > know that.) > > > > > > > > > > > > > > > > Ned do you have more justification for the command-line flag. > > > > > > > > If the command-line flag is shot down, could the benchmark somehow > > > tell > > > > > the > > > > > > > > results > > > > > > > > instance to discard the first result? > > > > > > > > > > > > > > Another example that people may want to use --discard-first-result: > > > > > > > https://codereview.chromium.org/959063002/ > > > > > > > > > > > > I don't have strong opinion about whether we make > --discard-first-result > > a > > > > > > commandline flag, or a benchmark option thing that users can specify. > I > > do > > > > > think > > > > > > that this is a useful option to many cases, but not all case by > default > > > > > (because > > > > > > this certainly increase the cycle time). > > > > > > > > > > It should definitely be an option baked into the benchmark and we > > shouldn't > > > > > allow users to specify it at the command-line. We disallow having > > > command-line > > > > > configurations that can affect the results of the benchmark. > > > > Cool, I like this philosophy. @slamm, can you make this a benchmark or > > result > > > > option? > > > > > > Yes, I would like to make it work as a benchmark or result option. > > > My current idea is to make it a benchmark option and then have that set a > > > results option. The former being what a benchmark writer changes and the > > latter > > > being what gets passed to user_story_runner. > > > > Related to this, the page_cycler benchmark has a command-line arg, > > --cold-load-percent. Its value determines whether or not the first result is > > discarded. As part of my change, should cold-load-percent become a class > > attribute? > > I think we should make cold-load-percent a class attribute since this is a > command line flag that can affect the results. Agreed.
Patchset #3 (id:40001) has been deleted
On 2015/03/03 19:24:45, aiolos wrote: > On 2015/03/03 19:18:33, nednguyen wrote: > > On 2015/03/03 18:53:06, slamm wrote: > > > On 2015/03/02 22:59:14, slamm wrote: > > > > On 2015/03/02 22:31:41, nednguyen(REVIEW IN OTHER ACC) wrote: > > > > > On 2015/03/02 18:24:22, dtu wrote: > > > > > > On 2015/02/28 01:08:21, nednguyen wrote: > > > > > > > On 2015/02/28 01:05:21, nednguyen wrote: > > > > > > > > On 2015/02/27 22:34:15, slamm wrote: > > > > > > > > > On 2015/02/27 22:11:55, aiolos wrote: > > > > > > > > > > On 2015/02/27 13:22:25, nednguyen wrote: > > > > > > > > > > > On 2015/02/27 03:35:32, sullivan wrote: > > > > > > > > > > > > +dtu because command line flags > > > > > > > > > > > > > > > > > > > > > > I find --discard-first-result is a useful commandline flag > > > that's > > > > > used > > > > > > > > quite > > > > > > > > > > > often in the past (we removed it at some point, which I > didn't > > > > > > remember > > > > > > > > > why). > > > > > > > > > > > Most of the time, the metrics of the first run results is > > > > different > > > > > > from > > > > > > > > the > > > > > > > > > > > rest because of some caching effect. > > > > > > > > > > > > > > > > > > > > I think with our discussion of wanting to reduce the number of > > > > command > > > > > > > line > > > > > > > > > > flags, we need more justification than "being able to discard > > the > > > > > first > > > > > > > > result > > > > > > > > > > is useful." If it's super useful for reducing noise, why don't > > we > > > > just > > > > > > do > > > > > > > it > > > > > > > > > by > > > > > > > > > > default and let the tests that don't want to use that behavior > > > > > (anything > > > > > > > > that > > > > > > > > > > want cold times for example) overwrite it. Or if it's only > > useful > > > in > > > > > > > certain > > > > > > > > > > cases, why don't we just have people write a warm version of > > their > > > > > tests > > > > > > > for > > > > > > > > > > those cases? > > > > > > > > > > > > > > > > > > Ned convinced me by saying that Matt Lee had asked for it. I was > > > also > > > > > > > thinking > > > > > > > > > it would > > > > > > > > > be handy for letting users see just how much slower the first > run > > is > > > > for > > > > > > the > > > > > > > > > "warm" case. > > > > > > > > > (Although, now that I think about it, they could simply run the > > > "cold" > > > > > > case > > > > > > > to > > > > > > > > > know that.) > > > > > > > > > > > > > > > > > > Ned do you have more justification for the command-line flag. > > > > > > > > > If the command-line flag is shot down, could the benchmark > somehow > > > > tell > > > > > > the > > > > > > > > > results > > > > > > > > > instance to discard the first result? > > > > > > > > > > > > > > > > Another example that people may want to use > --discard-first-result: > > > > > > > > https://codereview.chromium.org/959063002/ > > > > > > > > > > > > > > I don't have strong opinion about whether we make > > --discard-first-result > > > a > > > > > > > commandline flag, or a benchmark option thing that users can > specify. > > I > > > do > > > > > > think > > > > > > > that this is a useful option to many cases, but not all case by > > default > > > > > > (because > > > > > > > this certainly increase the cycle time). > > > > > > > > > > > > It should definitely be an option baked into the benchmark and we > > > shouldn't > > > > > > allow users to specify it at the command-line. We disallow having > > > > command-line > > > > > > configurations that can affect the results of the benchmark. > > > > > Cool, I like this philosophy. @slamm, can you make this a benchmark or > > > result > > > > > option? > > > > > > > > Yes, I would like to make it work as a benchmark or result option. > > > > My current idea is to make it a benchmark option and then have that set a > > > > results option. The former being what a benchmark writer changes and the > > > latter > > > > being what gets passed to user_story_runner. > > > > > > Related to this, the page_cycler benchmark has a command-line arg, > > > --cold-load-percent. Its value determines whether or not the first result is > > > discarded. As part of my change, should cold-load-percent become a class > > > attribute? > > > > I think we should make cold-load-percent a class attribute since this is a > > command line flag that can affect the results. > > Agreed. Yus
Add unit test.
Remove user_story_runner test for discarding the first result.
On 2015/03/03 22:59:34, slamm wrote: > Remove user_story_runner test for discarding the first result. PTAL.
nednguyen@google.com changed reviewers: + eakuefner@chromium.org
+Ethan for results/ code review. This approach is lg2me
results changes lgtm https://codereview.chromium.org/962793004/diff/100001/tools/telemetry/telemet... File tools/telemetry/telemetry/results/page_test_results.py (right): https://codereview.chromium.org/962793004/diff/100001/tools/telemetry/telemet... tools/telemetry/telemetry/results/page_test_results.py:26: value_can_be_added_predicate=lambda v, f: True): f -> is_first maybe? It describes it in the text, but when I read "f" before I read the docstring I assumed it was going to be a function. I think v is okay because it's an obvious metavariable for a value. https://codereview.chromium.org/962793004/diff/100001/tools/telemetry/telemet... tools/telemetry/telemetry/results/page_test_results.py:59: self._all_user_stories = set() YES! https://codereview.chromium.org/962793004/diff/100001/tools/telemetry/telemet... File tools/telemetry/telemetry/results/results_options.py (right): https://codereview.chromium.org/962793004/diff/100001/tools/telemetry/telemet... tools/telemetry/telemetry/results/results_options.py:111: value_can_be_added_predicate=lambda v, f: True): and here, if you decide it's a good idea.
Address review comments.
Thank you for the comments. I am quite pleased with how this turned out. It is nice to clean-up some cruft. https://codereview.chromium.org/962793004/diff/100001/tools/telemetry/telemet... File tools/telemetry/telemetry/results/page_test_results.py (right): https://codereview.chromium.org/962793004/diff/100001/tools/telemetry/telemet... tools/telemetry/telemetry/results/page_test_results.py:26: value_can_be_added_predicate=lambda v, f: True): On 2015/03/04 19:19:05, eakuefner wrote: > f -> is_first maybe? It describes it in the text, but when I read "f" before I > read the docstring I assumed it was going to be a function. I think v is okay > because it's an obvious metavariable for a value. Done. https://codereview.chromium.org/962793004/diff/100001/tools/telemetry/telemet... tools/telemetry/telemetry/results/page_test_results.py:59: self._all_user_stories = set() On 2015/03/04 19:19:05, eakuefner wrote: > YES! Love the enthusiasm! =) https://codereview.chromium.org/962793004/diff/100001/tools/telemetry/telemet... File tools/telemetry/telemetry/results/results_options.py (right): https://codereview.chromium.org/962793004/diff/100001/tools/telemetry/telemet... tools/telemetry/telemetry/results/results_options.py:111: value_can_be_added_predicate=lambda v, f: True): On 2015/03/04 19:19:05, eakuefner wrote: > and here, if you decide it's a good idea. I do think it is a good idea. Thank you for the suggestion.
lgtm There is some risk that this may cause regression on those benchmarks. I can't think of anyway to test that, so probably need watch out for this patch if regression happens to them.
The CQ bit was checked by slamm@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from eakuefner@chromium.org Link to the patchset: https://codereview.chromium.org/962793004/#ps120001 (title: "Address review comments.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/962793004/120001
Message was sent while issue was closed.
Committed patchset #6 (id:120001)
Message was sent while issue was closed.
Patchset 6 (id:??) landed as https://crrev.com/2cd4e23b41a986a172ed9a15e165d3a416d9056b Cr-Commit-Position: refs/heads/master@{#319140} |