Issue 2561353002: tools: WPR support for testing speculative_prefetch_predictor.

Benoit L

lizeb@chromium.org changed reviewers: + pasko@chromium.org

4 years ago (2016-12-09 15:17:28 UTC) #1

pasko

https://codereview.chromium.org/2561353002/diff/1/tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py File tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py (right): https://codereview.chromium.org/2561353002/diff/1/tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py#newcode1 tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py:1: #!/usr/bin/python toplevel note: please start including alexilin@ as a ...

4 years ago (2016-12-12 18:07:10 UTC) #3

https://codereview.chromium.org/2561353002/diff/1/tools/android/customtabs_be...
File tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py (right):

https://codereview.chromium.org/2561353002/diff/1/tools/android/customtabs_be...
tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py:1:
#!/usr/bin/python
toplevel note: please start including alexilin@ as a reviewer for the benchmark
code.

This is partly for education of how we are benchmarking, partly for bus factor,
partly to reduce my codereview load.

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
File tools/resource_prefetch_predictor/generate_database.py (right):

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/generate_database.py:9: Also generates a WPR
archive for another page.
Perhaps then makes sense to rename to generate_test_data.py

I (maybe naively) assume that after this script finishes, the benchmark can be
repeated as many times as desirable without running this script. Please let me
know if it is a wrong assumption.

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
File tools/resource_prefetch_predictor/prefetch_benchmark.py (right):

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/prefetch_benchmark.py:53: 'prefetch. Runs will
randomly select one delay in the '
Are you looking for the delay that produces best results? A statistician hat
does not suit me well, but if I disregard this fact and try to fit the hat on my
head, I would seriously question the approach as leading to type 2 errors.

Can we agree upfront what 1-3 delays are considered 'typical' according to some
data from the field and then focus on benchmarking only those?

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/prefetch_benchmark.py:128: result =
_Go(device, url, prefetch_delay_ms, wpr_archive, network_condition)
naming: When given a "Go" and "RunOnce" and knowing that one is doing part of
the work of another, and asked which one is which, I would hesitate, and then
would say that "Go" probably does "RunOnce" several times, which is contrary to
what is really happening.

s/_Go/_RunOnceImpl/ or just inline?

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/prefetch_benchmark.py:130:
f.write(','.join(str(x) for x in result) + '\n')
please print CSV columns as the first line of the file. Otherwise extending the
format and interpreting results from previous runs seem too error-prone.

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/prefetch_benchmark.py:149: _RunOnce(device,
args.database, args.url, delay, args.output_filename,
for extra robustness against data mis-interpretation this should fail if the
file exists on start.

Benoit L

Thanks! https://codereview.chromium.org/2561353002/diff/1/tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py File tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py (right): https://codereview.chromium.org/2561353002/diff/1/tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py#newcode1 tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py:1: #!/usr/bin/python On 2016/12/12 18:07:09, pasko wrote: > toplevel ...

4 years ago (2016-12-14 17:39:40 UTC) #4

Thanks!

https://codereview.chromium.org/2561353002/diff/1/tools/android/customtabs_be...
File tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py (right):

https://codereview.chromium.org/2561353002/diff/1/tools/android/customtabs_be...
tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py:1:
#!/usr/bin/python
On 2016/12/12 18:07:09, pasko wrote:
> toplevel note: please start including alexilin@ as a reviewer for the
benchmark
> code.
> 
> This is partly for education of how we are benchmarking, partly for bus
factor,
> partly to reduce my codereview load.

Acknowledged.

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
File tools/resource_prefetch_predictor/generate_database.py (right):

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/generate_database.py:9: Also generates a WPR
archive for another page.
On 2016/12/12 18:07:09, pasko wrote:
> Perhaps then makes sense to rename to generate_test_data.py
> 
> I (maybe naively) assume that after this script finishes, the benchmark can be
> repeated as many times as desirable without running this script. Please let me
> know if it is a wrong assumption.

Yes, this is the case.

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
File tools/resource_prefetch_predictor/prefetch_benchmark.py (right):

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/prefetch_benchmark.py:53: 'prefetch. Runs will
randomly select one delay in the '
On 2016/12/12 18:07:10, pasko wrote:
> Are you looking for the delay that produces best results? A statistician hat
> does not suit me well, but if I disregard this fact and try to fit the hat on
my
> head, I would seriously question the approach as leading to type 2 errors.
> 
> Can we agree upfront what 1-3 delays are considered 'typical' according to
some
> data from the field and then focus on benchmarking only those?

No, it's to disable the prefetcher, with something like 3000,-1, since -1
disables the prefetcher.

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/prefetch_benchmark.py:128: result =
_Go(device, url, prefetch_delay_ms, wpr_archive, network_condition)
On 2016/12/12 18:07:09, pasko wrote:
> naming: When given a "Go" and "RunOnce" and knowing that one is doing part of
> the work of another, and asked which one is which, I would hesitate, and then
> would say that "Go" probably does "RunOnce" several times, which is contrary
to
> what is really happening.
> 
> s/_Go/_RunOnceImpl/ or just inline?

Done.

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/prefetch_benchmark.py:130:
f.write(','.join(str(x) for x in result) + '\n')
On 2016/12/12 18:07:09, pasko wrote:
> please print CSV columns as the first line of the file. Otherwise extending
the
> format and interpreting results from previous runs seem too error-prone.

Done.

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/prefetch_benchmark.py:149: _RunOnce(device,
args.database, args.url, delay, args.output_filename,
On 2016/12/12 18:07:09, pasko wrote:
> for extra robustness against data mis-interpretation this should fail if the
> file exists on start.

Done.

pasko

More suggestions on file naming and flag usability. I think those are the last suggestions ...

4 years ago (2016-12-14 19:38:34 UTC) #5

More suggestions on file naming and flag usability. I think those are the last
suggestions for this review.

Remainder is nits/+TODO.

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
File tools/resource_prefetch_predictor/generate_database.py (right):

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/generate_database.py:9: Also generates a WPR
archive for another page.
On 2016/12/14 17:39:39, Benoit L wrote:
> On 2016/12/12 18:07:09, pasko wrote:
> > Perhaps then makes sense to rename to generate_test_data.py
> > 
> > I (maybe naively) assume that after this script finishes, the benchmark can
be
> > repeated as many times as desirable without running this script. Please let
me
> > know if it is a wrong assumption.
> 
> Yes, this is the case.

Then git mv generate_{database,test_data}.py sounds worth it.

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
File tools/resource_prefetch_predictor/prefetch_benchmark.py (right):

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/prefetch_benchmark.py:53: 'prefetch. Runs will
randomly select one delay in the '
On 2016/12/14 17:39:39, Benoit L wrote:
> On 2016/12/12 18:07:10, pasko wrote:
> > Are you looking for the delay that produces best results? A statistician hat
> > does not suit me well, but if I disregard this fact and try to fit the hat
on
> my
> > head, I would seriously question the approach as leading to type 2 errors.
> > 
> > Can we agree upfront what 1-3 delays are considered 'typical' according to
> some
> > data from the field and then focus on benchmarking only those?
> 
> No, it's to disable the prefetcher, with something like 3000,-1, since -1
> disables the prefetcher.

if that's the only usecase, let's have a hardcoded delay and
--disable-prefetcher. It would be easier to use and less to cover when
presenting results (like: create this file with two magic numbers, etc etc).

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/prefetch_benchmark.py:114: print
wpr.chrome_args
nit: looks like debug code left here, but if you want to keep it, no objections,
logging.info may be better to avoid comments like this.

https://codereview.chromium.org/2561353002/diff/20001/tools/android/customtab...
File tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py (right):

https://codereview.chromium.org/2561353002/diff/20001/tools/android/customtab...
tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py:220: data =
np.genfromtxt(filename, delimiter=',', skip_header=1)
is this function going to be used somewhere?

The csv.DictReader would be more robust if it asserted for expected fields, like
here:
https://codesearch.chromium.org/chromium/src/tools/android/loading/csv_util.p...

Probably not for this change, a TODO would suffice.

Benoit L

Thanks! https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_predictor/generate_database.py File tools/resource_prefetch_predictor/generate_database.py (right): https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_predictor/generate_database.py#newcode9 tools/resource_prefetch_predictor/generate_database.py:9: Also generates a WPR archive for another page. ...

4 years ago (2016-12-16 14:27:00 UTC) #6

Thanks!

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
File tools/resource_prefetch_predictor/generate_database.py (right):

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/generate_database.py:9: Also generates a WPR
archive for another page.
On 2016/12/14 19:38:33, pasko wrote:
> On 2016/12/14 17:39:39, Benoit L wrote:
> > On 2016/12/12 18:07:09, pasko wrote:
> > > Perhaps then makes sense to rename to generate_test_data.py
> > > 
> > > I (maybe naively) assume that after this script finishes, the benchmark
can
> be
> > > repeated as many times as desirable without running this script. Please
let
> me
> > > know if it is a wrong assumption.
> > 
> > Yes, this is the case.
> 
> Then git mv generate_{database,test_data}.py sounds worth it.

Done.

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
File tools/resource_prefetch_predictor/prefetch_benchmark.py (right):

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/prefetch_benchmark.py:53: 'prefetch. Runs will
randomly select one delay in the '
On 2016/12/14 19:38:33, pasko wrote:
> On 2016/12/14 17:39:39, Benoit L wrote:
> > On 2016/12/12 18:07:10, pasko wrote:
> > > Are you looking for the delay that produces best results? A statistician
hat
> > > does not suit me well, but if I disregard this fact and try to fit the hat
> on
> > my
> > > head, I would seriously question the approach as leading to type 2 errors.
> > > 
> > > Can we agree upfront what 1-3 delays are considered 'typical' according to
> > some
> > > data from the field and then focus on benchmarking only those?
> > 
> > No, it's to disable the prefetcher, with something like 3000,-1, since -1
> > disables the prefetcher.
> 
> if that's the only usecase, let's have a hardcoded delay and
> --disable-prefetcher. It would be easier to use and less to cover when
> presenting results (like: create this file with two magic numbers, etc etc).

Well, I would also like to see the impact of several delays, so I would prefer
to leave that there.

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/prefetch_benchmark.py:114: print
wpr.chrome_args
On 2016/12/14 19:38:33, pasko wrote:
> nit: looks like debug code left here, but if you want to keep it, no
objections,
> logging.info may be better to avoid comments like this.

Done.

https://codereview.chromium.org/2561353002/diff/20001/tools/android/customtab...
File tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py (right):

https://codereview.chromium.org/2561353002/diff/20001/tools/android/customtab...
tools/android/customtabs_benchmark/scripts/customtabs_benchmark.py:220: data =
np.genfromtxt(filename, delimiter=',', skip_header=1)
On 2016/12/14 19:38:33, pasko wrote:
> is this function going to be used somewhere?
> 
> The csv.DictReader would be more robust if it asserted for expected fields,
like
> here:
>
https://codesearch.chromium.org/chromium/src/tools/android/loading/csv_util.p...
> 
> Probably not for this change, a TODO would suffice.

It is used for the analysis.
Acknowledged.

pasko

lgtm https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_predictor/prefetch_benchmark.py File tools/resource_prefetch_predictor/prefetch_benchmark.py (right): https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_predictor/prefetch_benchmark.py#newcode53 tools/resource_prefetch_predictor/prefetch_benchmark.py:53: 'prefetch. Runs will randomly select one delay in ...

4 years ago (2016-12-16 16:49:55 UTC) #7

lgtm

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
File tools/resource_prefetch_predictor/prefetch_benchmark.py (right):

https://codereview.chromium.org/2561353002/diff/1/tools/resource_prefetch_pre...
tools/resource_prefetch_predictor/prefetch_benchmark.py:53: 'prefetch. Runs will
randomly select one delay in the '
On 2016/12/16 14:27:00, Benoit L wrote:
> On 2016/12/14 19:38:33, pasko wrote:
> > On 2016/12/14 17:39:39, Benoit L wrote:
> > > On 2016/12/12 18:07:10, pasko wrote:
> > > > Are you looking for the delay that produces best results? A statistician
> hat
> > > > does not suit me well, but if I disregard this fact and try to fit the
hat
> > on
> > > my
> > > > head, I would seriously question the approach as leading to type 2
errors.
> > > > 
> > > > Can we agree upfront what 1-3 delays are considered 'typical' according
to
> > > some
> > > > data from the field and then focus on benchmarking only those?
> > > 
> > > No, it's to disable the prefetcher, with something like 3000,-1, since -1
> > > disables the prefetcher.
> > 
> > if that's the only usecase, let's have a hardcoded delay and
> > --disable-prefetcher. It would be easier to use and less to cover when
> > presenting results (like: create this file with two magic numbers, etc etc).
> 
> Well, I would also like to see the impact of several delays, so I would prefer
> to leave that there.

This contradicts to the "No, it's to disable the prefetcher" you wrote
previously. Sad. I cannot stop you from experimenting, so this is moot.

https://codereview.chromium.org/2561353002/diff/60001/tools/resource_prefetch...
File tools/resource_prefetch_predictor/prefetch_benchmark.py (right):

https://codereview.chromium.org/2561353002/diff/60001/tools/resource_prefetch...
tools/resource_prefetch_predictor/prefetch_benchmark.py:117: logging.info('WPR
arguments: ' +  ','.join(wpr.chrome_args))
nitty nit:

are WPR arguments separated with commas whenever they need to be passed to WPR?
I thought they would be space-separated. If the latter is true, and also if we
wanted to optimize copy-pasting those, then we should prefer to join with spaces
here

Benoit L

https://codereview.chromium.org/2561353002/diff/60001/tools/resource_prefetch_predictor/prefetch_benchmark.py File tools/resource_prefetch_predictor/prefetch_benchmark.py (right): https://codereview.chromium.org/2561353002/diff/60001/tools/resource_prefetch_predictor/prefetch_benchmark.py#newcode117 tools/resource_prefetch_predictor/prefetch_benchmark.py:117: logging.info('WPR arguments: ' + ','.join(wpr.chrome_args)) On 2016/12/16 16:49:55, pasko ...

4 years ago (2016-12-16 17:08:24 UTC) #8

Benoit L

The patchset sent to the CQ was uploaded after l-g-t-m from pasko@chromium.org Link to the ...

4 years ago (2016-12-16 17:08:30 UTC) #10