Chromium Code Reviews
Created: 5 years, 1 month ago by rmistry
Modified: 5 years ago
CC: chromium-reviews, infra-reviews+build_chromium.org, kjellander-cc_chromium.org, stip+watch_chromium.org, nednguyen
Base URL: https://chromium.googlesource.com/chromium/tools/build@master
Target Ref: refs/heads/master
Project: build
Visibility: Public.
Description
CT Perf recipe to run benchmarks on the top 1k sites using swarming.
An explanation of what the recipe does:
* Downloads the chrome build from the parent build.
* For each shard we want to run:
* Downloads all required page sets and archives from Google Storage.
* Batch archives everything on the isolate server using the isolate file added in https://codereview.chromium.org/1410353007/
* Triggers a swarming task.
* The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com
* Waits for swarming task to complete.
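To make the flow above concrete, here is a rough, hedged sketch of what it could look like as a recipe. Several names are placeholders rather than the exact code in this CL: download_chrome_build, create_isolated_gen_json, batcharchive() and the value of CT_PAGE_TYPE are illustrative, while download_page_artifacts, trigger_swarming_tasks and collect_swarming_tasks come from the review discussion below.

    DEPS = [
      'ct_swarming',
      'isolate',
      'properties',
      'swarming',
    ]

    CT_NUM_SLAVES = 100    # number of CT shards ("slaves")
    CT_PAGE_TYPE = '10k'   # placeholder: the page-set type used by CT

    def RunSteps(api):
      # Download the chrome build produced by the parent builder.
      api.ct_swarming.download_chrome_build(
          api.properties['parent_build_archive_url'])
      # For each shard, fetch its page sets/archives from Google Storage and
      # write one *.isolated.gen.json describing what to isolate for it.
      for slave_num in range(1, CT_NUM_SLAVES + 1):
        api.ct_swarming.download_page_artifacts(CT_PAGE_TYPE, slave_num)
        api.ct_swarming.create_isolated_gen_json(slave_num)
      # Batch-archive every shard on the isolate server in a single call.
      swarm_hashes = api.ct_swarming.batcharchive()
      # Trigger one swarming task per shard; each task runs the benchmark and
      # uploads its results to chromeperf.appspot.com. Then wait for them all.
      tasks = api.ct_swarming.trigger_swarming_tasks(swarm_hashes)
      api.ct_swarming.collect_swarming_tasks(tasks)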
I have triggered tasks on swarming and things appear to be working. Here are some timings with 1 page repeat (this might be upped to 3 page repeats for more reliable results)-
1 slave: 7 mins
2 slaves: 8 mins.
3 slaves: 9 mins
50 slaves: 38 mins (during MTV peak time)
100 slaves: 68 mins (early EST so not during MTV peak time)
Each swarming task appears to take around 6-7 mins.
BUG=skia:4503
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=297686
Patch Set 1 : Initial upload
Patch Set 2 : Working e2e but perf upload remaining
Patch Set 3 : Ready for review after cleanup
Total comments: 24
Patch Set 4 : Address feedback
Total comments: 2
Patch Set 5 : Use batcharchive
Total comments: 12
Patch Set 6 : Address comments
Patch Set 7 : Extract reusable functionality into recipe modules
Total comments: 6
Patch Set 8 : Use isolate and swarming recipes
Total comments: 10
Patch Set 9 : Address comments
Patch Set 10 : Create placeholder for tests
Patch Set 11 : Add task_output_dir
Patch Set 12 : Use extra vars instead of isolate templates
Patch Set 13 : Set executable bit on binary
Patch Set 14 : Use file api for writing and inline python for exec bit
Patch Set 15 : Add tests and cleanup
Patch Set 16 : Undo changes to master.cfg and slaves.cfg
Patch Set 17 : Rename ct_top1k_rr_perf to ct_top1k_perf
Total comments: 10
Patch Set 18 : Address comments
Total comments: 2
Patch Set 19 : Use enumerate
Patch Set 20 : Fix indentation in slaves.cfg

Dependent Patchsets:

Messages
Total messages: 76 (46 generated)
Description was changed from ========== Checkpoint BUG= ========== to ========== CT + Swarming + Perf waterfall Prototype BUG= ==========
Description was changed from ========== CT + Swarming + Perf waterfall Prototype BUG= ========== to ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. A high level explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Uses the isolate file template in https://codereview.chromium.org/1410353007/ to dynamically create isolate files from the downloads. * Archives everything on the isolate server. * Triggers a swarming task. * The swarming task will run the archives in the isolate file and will upload results to chromeperf.appspot.com * Waits for swarming task to complete. * Cleans up. BUG=skia:4503 ==========
Description was changed from ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. A high level explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Uses the isolate file template in https://codereview.chromium.org/1410353007/ to dynamically create isolate files from the downloads. * Archives everything on the isolate server. * Triggers a swarming task. * The swarming task will run the archives in the isolate file and will upload results to chromeperf.appspot.com * Waits for swarming task to complete. * Cleans up. BUG=skia:4503 ========== to ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. A high level explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Uses the isolate file template in https://codereview.chromium.org/1410353007/ to dynamically create isolate files from the downloads. * Archives everything on the isolate server. * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. * Cleans up. BUG=skia:4503 ==========
Description was changed from ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. A high level explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Uses the isolate file template in https://codereview.chromium.org/1410353007/ to dynamically create isolate files from the downloads. * Archives everything on the isolate server. * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. * Cleans up. BUG=skia:4503 ========== to ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. A high level explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Uses the isolate file template in https://codereview.chromium.org/1410353007/ to dynamically create isolate files from the downloads. * Archives everything on the isolate server. * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. * Cleans up. I have triggered tasks on swarming and things appear to be working. Here are some timings with 1 page repeat (this might be upped to 3 page repeats for more reliable results)- 1 slave: 7 mins 2 slaves: 8 mins. 3 slaves: 9 mins 50 slaves: 38 mins (during MTV peak time) 100 slaves: 68 mins (early EST so not during MTV peak time) BUG=skia:4503 ==========
Description was changed from ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. A high level explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Uses the isolate file template in https://codereview.chromium.org/1410353007/ to dynamically create isolate files from the downloads. * Archives everything on the isolate server. * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. * Cleans up. I have triggered tasks on swarming and things appear to be working. Here are some timings with 1 page repeat (this might be upped to 3 page repeats for more reliable results)- 1 slave: 7 mins 2 slaves: 8 mins. 3 slaves: 9 mins 50 slaves: 38 mins (during MTV peak time) 100 slaves: 68 mins (early EST so not during MTV peak time) BUG=skia:4503 ========== to ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. An explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Uses the isolate file template in https://codereview.chromium.org/1410353007/ to dynamically create isolate files from the downloads. * Archives everything on the isolate server. * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. * Cleans up. I have triggered tasks on swarming and things appear to be working. Here are some timings with 1 page repeat (this might be upped to 3 page repeats for more reliable results)- 1 slave: 7 mins 2 slaves: 8 mins. 3 slaves: 9 mins 50 slaves: 38 mins (during MTV peak time) 100 slaves: 68 mins (early EST so not during MTV peak time) BUG=skia:4503 ==========
Description was changed from ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. An explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Uses the isolate file template in https://codereview.chromium.org/1410353007/ to dynamically create isolate files from the downloads. * Archives everything on the isolate server. * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. * Cleans up. I have triggered tasks on swarming and things appear to be working. Here are some timings with 1 page repeat (this might be upped to 3 page repeats for more reliable results)- 1 slave: 7 mins 2 slaves: 8 mins. 3 slaves: 9 mins 50 slaves: 38 mins (during MTV peak time) 100 slaves: 68 mins (early EST so not during MTV peak time) BUG=skia:4503 ========== to ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. An explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Uses the isolate file template in https://codereview.chromium.org/1410353007/ to dynamically create isolate files from the downloads. * Archives everything on the isolate server. * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. * Cleans up. I have triggered tasks on swarming and things appear to be working. Here are some timings with 1 page repeat (this might be upped to 3 page repeats for more reliable results)- 1 slave: 7 mins 2 slaves: 8 mins. 3 slaves: 9 mins 50 slaves: 38 mins (during MTV peak time) 100 slaves: 68 mins (early EST so not during MTV peak time) Each swarming task appears to take around 6-7 mins. BUG=skia:4503 ==========
rmistry@google.com changed reviewers: + maruel@chromium.org
Hi Marc-Antoine,

This is still unpolished. I would like to extract some things into recipe_modules and add unit tests. I wanted to show you what I have so far to get your input on the design and on whether I am way off the mark. Disclaimer: I am very new to both swarming and writing recipes outside Skia.
Description was changed from ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. An explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Uses the isolate file template in https://codereview.chromium.org/1410353007/ to dynamically create isolate files from the downloads. * Archives everything on the isolate server. * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. * Cleans up. I have triggered tasks on swarming and things appear to be working. Here are some timings with 1 page repeat (this might be upped to 3 page repeats for more reliable results)- 1 slave: 7 mins 2 slaves: 8 mins. 3 slaves: 9 mins 50 slaves: 38 mins (during MTV peak time) 100 slaves: 68 mins (early EST so not during MTV peak time) Each swarming task appears to take around 6-7 mins. BUG=skia:4503 ========== to ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. An explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Uses the isolate file template in https://codereview.chromium.org/1410353007/ to dynamically create isolate files from the downloads. * Archives everything on the isolate server. * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. * Cleans up. I have triggered tasks on swarming and things appear to be working. Here are some timings with 1 page repeat (this might be upped to 3 page repeats for more reliable results)- 1 slave: 7 mins 2 slaves: 8 mins. 3 slaves: 9 mins 50 slaves: 38 mins (during MTV peak time) 100 slaves: 68 mins (early EST so not during MTV peak time) Each swarming task appears to take around 6-7 mins. BUG=skia:4503 COMMIT=false This is still unpolished. I would like to extract some things into recipe_modules and add unit tests. ==========
Can you delete patchsets #1 to #19? Thanks.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right):

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:5: import os
You are not supposed to import anything in recipes.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:35: # Number of slaves to shard CT runs to.
s/slaves/bots/

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:73: downloads_dir = os.path.join(chromium_src_dir, 'content', 'test', 'ct')
Why not put this in tmp too?

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:75: swarming_temp_dir = str(api.path.mkdtemp('swarming-temp-dir'))
Use self.m.path['tmp_base'] instead. That's what _gtest_collect_step() does.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:92: page_sets_dir = os.path.join(slave_dir, 'page_sets')
I don't understand why you need to download files CT_NUM_SLAVES times.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:127: with open(isolate_template_path, 'rt') as fin:
use b instead of t, we don't want any CR character.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:135: fout.close()
When using with open() ..., you shouldn't close the handle.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:148: '--path-variable', 'PRODUCT_DIR', tempfile.gettempdir(),
Can you add more details about what fails?

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:191: # TODO(rmistry): Delete me!
:)

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:199: shutil.rmtree(swarming_temp_dir)
git gs "def rmtree"
gave me
recipe_modules/file/api.py:140: def rmtree(self, name, path, **kwargs):
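To make the substance of these comments concrete, here is a hedged sketch of the recipe-api equivalents being asked for in place of direct os/shutil/tempfile use; the path key names come from the comments above, but the exact step-name string passed to rmtree is illustrative:

    # Instead of os.path.join / tempfile / shutil.rmtree inside the recipe:
    downloads_dir = api.path['checkout'].join('content', 'test', 'ct')
    swarming_temp_dir = api.path['tmp_base'].join('swarming_temp_dir')
    # file api signature per recipe_modules/file/api.py: rmtree(name, path)
    api.file.rmtree('swarming temp dir', swarming_temp_dir)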
Patchset #21 (id:400001) has been deleted
Patchset #1 (id:1) has been deleted
Patchset #1 (id:20001) has been deleted
Patchset #1 (id:40001) has been deleted
Patchset #1 (id:60001) has been deleted
Patchset #1 (id:80001) has been deleted
Patchset #1 (id:100001) has been deleted
Patchset #1 (id:120001) has been deleted
Patchset #1 (id:140001) has been deleted
Patchset #1 (id:160001) has been deleted
Patchset #2 (id:200001) has been deleted
Patchset #2 (id:220001) has been deleted
Patchset #2 (id:240001) has been deleted
Patchset #2 (id:260001) has been deleted
Patchset #2 (id:280001) has been deleted
Patchset #3 (id:320001) has been deleted
Patchset #3 (id:340001) has been deleted
Patchset #3 (id:360001) has been deleted
On 2015/11/06 18:56:06, M-A Ruel wrote: > Can you delete patchsets #1 to #19? Thanks. Deleted majority of the patchsets. > > https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... > File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right): > > https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:5: import os > You are not supposed to import anything in recipes. > > https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:35: # Number of slaves to shard > CT runs to. > s/slaves/bots/ > > https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:73: downloads_dir = > os.path.join(chromium_src_dir, 'content', 'test', 'ct') > Why not put this in tmp too? > > https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:75: swarming_temp_dir = > str(api.path.mkdtemp('swarming-temp-dir')) > Use self.m.path['tmp_base'] instead. That's what _gtest_collect_step() does. > > https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:92: page_sets_dir = > os.path.join(slave_dir, 'page_sets') > I don't understand why you need to download files CT_NUM_SLAVES times. > > https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:127: with > open(isolate_template_path, 'rt') as fin: > use b instead of t, we don't want any CR character. > > https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:135: fout.close() > When using with open() ..., you shouldn't close the handle. > > https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:148: '--path-variable', > 'PRODUCT_DIR', tempfile.gettempdir(), > Can you add more details about what fails? > > https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:191: # TODO(rmistry): Delete me! > :) > > https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:199: > shutil.rmtree(swarming_temp_dir) > git gs "def rmtree" > gave me > recipe_modules/file/api.py:140: def rmtree(self, name, path, **kwargs):
https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right):

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:5: import os
On 2015/11/06 18:56:06, M-A Ruel wrote:
> You are not supposed to import anything in recipes.
Done.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:35: # Number of slaves to shard CT runs to.
On 2015/11/06 18:56:06, M-A Ruel wrote:
> s/slaves/bots/
I used "slaves" here to be consistent with CT's nomenclature where it calls its workers/bots "slaves". The GS directory it downloads artifacts from is also called "slave{1..100}". I can use slaves->bots everywhere here if you prefer, but the GS directory name will still be "slave{1..100}".

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:73: downloads_dir = os.path.join(chromium_src_dir, 'content', 'test', 'ct')
On 2015/11/06 18:56:06, M-A Ruel wrote:
> Why not put this in tmp too?
I initially went that route but it made the isolate template file messy. I then opted into storing things in the chromium checkout because it seemed much easier for the isolate file to find artifacts when they were relative to the chromium checkout.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:75: swarming_temp_dir = str(api.path.mkdtemp('swarming-temp-dir'))
On 2015/11/06 18:56:06, M-A Ruel wrote:
> Use self.m.path['tmp_base'] instead. That's what _gtest_collect_step() does.
Done.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:92: page_sets_dir = os.path.join(slave_dir, 'page_sets')
On 2015/11/06 18:56:06, M-A Ruel wrote:
> I don't understand why you need to download files CT_NUM_SLAVES times.
The purpose of this recipe is to run benchmarks on the top 1000 web pages. This is done by sharding 10 web pages to each swarming bot: slave1 will process the top 1-10 web pages, slave2 will process web pages 11-20, and so on (see the sketch after this message). Sharding these web pages involves sending each bot the page set file of the web pages and the archive of the web pages. Below you will see that these artifacts are downloaded from a GS directory that is unique for each slave (contains slave1, slave2, ...). Note: this follows the current architecture that CT uses (https://skia.org/dev/testing/ct#system_diagram) but is attempting to use swarming bots instead of the machines from CT's pool.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:127: with open(isolate_template_path, 'rt') as fin:
On 2015/11/06 18:56:06, M-A Ruel wrote:
> use b instead of t, we don't want any CR character.
Agreed. Done.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:135: fout.close()
On 2015/11/06 18:56:06, M-A Ruel wrote:
> When using with open() ..., you shouldn't close the handle.
Done.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:148: '--path-variable', 'PRODUCT_DIR', tempfile.gettempdir(),
On 2015/11/06 18:56:06, M-A Ruel wrote:
> Can you add more details about what fails?
If I do not specify --path-variable then it fails with:
IsolateError: Variable "PRODUCT_DIR" was not found in {'EXECUTABLE_SUFFIX': ''}. Did you forget to specify --path-variable?
Also if I do not create a dummy /tmp/bitmaptools then it fails with:
MappingError: Input file /tmp/bitmaptools doesn't exist
I believe this is due to https://code.google.com/p/chromium/codesearch#chromium/src/tools/telemetry/te... and I do not know how to make it pass without my above hack.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:191: # TODO(rmistry): Delete me!
On 2015/11/06 18:56:06, M-A Ruel wrote:
> :)
Deleted.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:199: shutil.rmtree(swarming_temp_dir)
On 2015/11/06 18:56:06, M-A Ruel wrote:
> git gs "def rmtree"
> gave me
> recipe_modules/file/api.py:140: def rmtree(self, name, path, **kwargs):
Done.
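As a concrete illustration of the sharding described above (purely arithmetic, not code from this CL):

    # 1000 pages spread over 100 CT slaves => 10 pages per slave.
    TOTAL_PAGES = 1000
    CT_NUM_SLAVES = 100
    PAGES_PER_SLAVE = TOTAL_PAGES // CT_NUM_SLAVES  # 10

    def page_range(slave_num):
      # slave 1 -> pages 1-10, slave 2 -> pages 11-20, ...
      start = (slave_num - 1) * PAGES_PER_SLAVE + 1
      return start, start + PAGES_PER_SLAVE - 1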
https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right):

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:35: # Number of slaves to shard CT runs to.
On 2015/11/09 16:23:49, rmistry wrote:
> On 2015/11/06 18:56:06, M-A Ruel wrote:
> > s/slaves/bots/
>
> I used "slaves" here to be consistent with CT's nomenclature where it calls its
> workers/bots "slaves". The GS directory it downloads artifacts from is also
> called "slave{1..100}". I can use slaves->bots everywhere here if you prefer,
> but the GS directory name will still be "slave{1..100}".

There's two points of view:
- the word slave was maybe used because it implied buildbot slave?
- if it's moving to swarming bots and it effectively represent swarming bots, long term switching to bots is a better choice.
I don't mind too much.

https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:148: '--path-variable', 'PRODUCT_DIR', tempfile.gettempdir(),
On 2015/11/09 16:23:49, rmistry wrote:
> On 2015/11/06 18:56:06, M-A Ruel wrote:
> > Can you add more details about what fails?
>
> If I do not specify --path-variable then it fails with:
> IsolateError: Variable "PRODUCT_DIR" was not found in {'EXECUTABLE_SUFFIX': ''}.
> Did you forget to specify --path-variable?
>
> Also if I do not create a dummy /tmp/bitmaptools then it fails with:
> MappingError: Input file /tmp/bitmaptools doesn't exist
>
> I believe this is due to
> https://code.google.com/p/chromium/codesearch#chromium/src/tools/telemetry/te...
> and I do not know how to make it pass without my above hack.

Do you need this file at all? If not, we should fix that.

https://codereview.chromium.org/1423993007/diff/420001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right):

https://codereview.chromium.org/1423993007/diff/420001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:131: # Archive everything on the isolate server.
Actually, you *really* want to use batcharchive and archive everything at once. That's really important performance wise. This means:
- moving this call after the loop.
- creating a second loop to trigger tasks after.

This may complicate things a bit but this will greatly improve performance.

batcharchive basically takes a bunch of .gen.json files, which specify the command line arguments of each isolate call. Then it does magics to make this fast.
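For reference, a hedged sketch of what one of those .gen.json files could contain; the field names ("version", "dir", "args") follow the usual isolated.gen.json shape that batcharchive consumes, while the isolate path, output path and variable values below are placeholders, not the actual values in this CL:

    # Hypothetical writer for one shard's *.isolated.gen.json.
    import json

    gen_json = {
        'version': 1,
        'dir': '/path/to/chromium/src',  # placeholder: cwd for the isolate call
        'args': [
            '--isolate', 'content/test/ct/ct_top1k.isolate',  # placeholder
            '--isolated', '/tmp/swarming/slave1.isolated',     # placeholder
            '--path-variable', 'PRODUCT_DIR', 'out/Release',   # placeholder
        ],
    }
    # Binary mode to avoid CR characters on Windows (Python 2 era code).
    with open('slave1.isolated.gen.json', 'wb') as f:
      json.dump(gen_json, f)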
https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right): https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:35: # Number of slaves to shard CT runs to. On 2015/11/09 16:51:21, M-A Ruel wrote: > On 2015/11/09 16:23:49, rmistry wrote: > > On 2015/11/06 18:56:06, M-A Ruel wrote: > > > s/slaves/bots/ > > > > I used "slaves" here to be consistent with CT's nomenclature where it calls > its > > workers/bots "slaves". The GS directory it downloads artifacts from is also > > called "slave{1..100}". I can use slaves->bots everywhere here if you prefer, > > but the GS directory name will still be "slave{1..100}". > > There's two points of view: > - the word slave was maybe used because it implied buildbot slave? > - if it's moving to swarming bots and it effectively represent swarming bots, > long term switching to bots is a better choice. > > I don't mind too much. The word slave was used in CT independently of buildbot. It was used because in the original architecture there was 1 master and 100 slaves. I will leave "slave" here for now because changing it will require changing lots of documentation. https://codereview.chromium.org/1423993007/diff/380001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:148: '--path-variable', 'PRODUCT_DIR', tempfile.gettempdir(), On 2015/11/09 16:51:21, M-A Ruel wrote: > On 2015/11/09 16:23:49, rmistry wrote: > > On 2015/11/06 18:56:06, M-A Ruel wrote: > > > Can you add more details about what fails? > > > > If I do not specify --path-variable then it fails with: > > IsolateError: Variable "PRODUCT_DIR" was not found in {'EXECUTABLE_SUFFIX': > ''}. > > Did you forget to specify --path-variable? > > > > Also if I do not create a dummy /tmp/bitmaptools then it fails with: > > MappingError: Input file /tmp/bitmaptools doesn't exist > > > > I believe this is due to > > > https://code.google.com/p/chromium/codesearch#chromium/src/tools/telemetry/te... > > and I do not know how to make it pass without my above hack. > > Do you need this file at all? If not, we should fix that. +nednyugen Ned, is this file required? if yes then how do I specify a path to it? https://codereview.chromium.org/1423993007/diff/420001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right): https://codereview.chromium.org/1423993007/diff/420001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:131: # Archive everything on the isolate server. On 2015/11/09 16:51:21, M-A Ruel wrote: > Actually, you *really* want to use batcharchive and archive everything at once. > That's really important performance wise. This means: > - moving this call after the loop. > - creating a second loop to trigger tasks after. > > This may complicate things a bit but this will greatly improve performance. > > batcharchive basically takes a bunch of .gen.json files, which specify the > command line arguments of each isolate call. Then it does magics to make this > fast. Done. PTAL.
https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right):

https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:39: build_archive_url = api.properties['parent_build_archive_url']
I think you could inline this and it'd fit 80 cols ?

https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:70: downloads_dir = chromium_src_dir.join('content', 'test', 'ct')
Is it in the .gitignore? Otherwise it'll get continuously deleted. Is is possible to keep a local cache to speed things up or it's not an issue in practice?

https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:164: for isolated_gen_json in isolated_gen_json_files:
batcharchive_args.extend(str(i) for i in isolated_gen_json_files)

https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:168: api.swarming_client.path.join('isolate.py'),
Use luci-go implementation which is an order of magnitude faster. You should be able to use it via the isolate recipe.
nednguyen@google.com changed reviewers: + nednguyen@google.com
https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right): https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:143: # requires bitmaptools in PRODUCT_DIR. Bitmaptools build is used by telemetry's image processing code. It's required, unless you can prebuilt the binary on multiple platforms & upload them to cloud storage so telemetry can fetch the file. I honestly don't know how to reference the path to bitmap tool in this context. Can you just reuse the isolate defined for telemetry+chrome in https://code.google.com/p/chromium/codesearch#chromium/src/tools/perf/chrome_...
https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right): https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:39: build_archive_url = api.properties['parent_build_archive_url'] On 2015/11/09 19:46:45, M-A Ruel wrote: > I think you could inline this and it'd fit 80 cols ? Done. https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:70: downloads_dir = chromium_src_dir.join('content', 'test', 'ct') On 2015/11/09 19:46:45, M-A Ruel wrote: > Is it in the .gitignore? Otherwise it'll get continuously deleted. Is is > possible to keep a local cache to speed things up or it's not an issue in > practice? I think its ok to get continuously deleted for now. I can look into optimizing this if needed once things are up and running. https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:143: # requires bitmaptools in PRODUCT_DIR. On 2015/11/09 23:04:32, nednguyen wrote: > Bitmaptools build is used by telemetry's image processing code. It's required, > unless you can prebuilt the binary on multiple platforms & upload them to cloud > storage so telemetry can fetch the file. > > I honestly don't know how to reference the path to bitmap tool in this context. > Can you just reuse the isolate defined for telemetry+chrome in > https://code.google.com/p/chromium/codesearch#chromium/src/tools/perf/chrome_... I tried using it and it failed with: IsolateError: These configuration variables were missing from the command line: CONFIGURATION_NAME, asan, component, disable_nacl, fastbuild, icu_use_data_file_flag, kasko, lsan, msan, msvs_version, target_arch, tsan, use_custom_libcxx, use_instrumented_libraries, use_prebuilt_instrumented_libraries, v8_use_external_startup_data thats a lot of missing variables, I do not know how to specify them here. As a workaround I removed the reference to telemetry.isolate in https://codereview.chromium.org/1410353007/diff2/220001:240001/chrome/ct_top1... for now. https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:164: for isolated_gen_json in isolated_gen_json_files: On 2015/11/09 19:46:45, M-A Ruel wrote: > batcharchive_args.extend(str(i) for i in isolated_gen_json_files) Done. https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:168: api.swarming_client.path.join('isolate.py'), On 2015/11/09 19:46:45, M-A Ruel wrote: > Use luci-go implementation which is an order of magnitude faster. You should be > able to use it via the isolate recipe. How do I use it using https://code.google.com/p/chromium/codesearch#chromium/build/scripts/slave/re... ? (sorry, I must have missed something).
nednguyen@google.com changed reviewers: + kbr@chromium.org
My build knowledge is very questionable. +Kbr: how did your gpu test use telemetry.isolate without encountering problem with bitmaptools build?
Patchset #7 (id:480001) has been deleted
Patchset #7 (id:500001) has been deleted
In @Patchset7 I extracted reusable functionality into a ct_swarming recipe module. On 2015/11/10 12:58:06, rmistry wrote: > https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... > File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right): > > https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:39: build_archive_url = > api.properties['parent_build_archive_url'] > On 2015/11/09 19:46:45, M-A Ruel wrote: > > I think you could inline this and it'd fit 80 cols ? > > Done. > > https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:70: downloads_dir = > chromium_src_dir.join('content', 'test', 'ct') > On 2015/11/09 19:46:45, M-A Ruel wrote: > > Is it in the .gitignore? Otherwise it'll get continuously deleted. Is is > > possible to keep a local cache to speed things up or it's not an issue in > > practice? > > I think its ok to get continuously deleted for now. I can look into optimizing > this if needed once things are up and running. > > https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:143: # requires > bitmaptools in PRODUCT_DIR. > On 2015/11/09 23:04:32, nednguyen wrote: > > Bitmaptools build is used by telemetry's image processing code. It's required, > > unless you can prebuilt the binary on multiple platforms & upload them to > cloud > > storage so telemetry can fetch the file. > > > > I honestly don't know how to reference the path to bitmap tool in this > context. > > Can you just reuse the isolate defined for telemetry+chrome in > > > https://code.google.com/p/chromium/codesearch#chromium/src/tools/perf/chrome_... > > I tried using it and it failed with: > IsolateError: These configuration variables were missing from the command line: > CONFIGURATION_NAME, asan, component, disable_nacl, fastbuild, > icu_use_data_file_flag, kasko, lsan, msan, msvs_version, target_arch, tsan, > use_custom_libcxx, use_instrumented_libraries, > use_prebuilt_instrumented_libraries, v8_use_external_startup_data > > thats a lot of missing variables, I do not know how to specify them here. > > As a workaround I removed the reference to telemetry.isolate in > https://codereview.chromium.org/1410353007/diff2/220001:240001/chrome/ct_top1... > for now. > > https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:164: for isolated_gen_json in > isolated_gen_json_files: > On 2015/11/09 19:46:45, M-A Ruel wrote: > > batcharchive_args.extend(str(i) for i in isolated_gen_json_files) > > Done. > > https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... > scripts/slave/recipes/perf/ct_top1k_rr_perf.py:168: > api.swarming_client.path.join('isolate.py'), > On 2015/11/09 19:46:45, M-A Ruel wrote: > > Use luci-go implementation which is an order of magnitude faster. You should > be > > able to use it via the isolate recipe. > > How do I use it using > https://code.google.com/p/chromium/codesearch#chromium/build/scripts/slave/re... > ? > (sorry, I must have missed something).
https://codereview.chromium.org/1423993007/diff/520001/scripts/slave/recipe_m... File scripts/slave/recipe_modules/ct_swarming/api.py (right):

https://codereview.chromium.org/1423993007/diff/520001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:5: import urllib
Remove

https://codereview.chromium.org/1423993007/diff/520001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:144: json_output = self.swarming_temp_dir.join('ct-task-%s.json' % slave_num)
I don't think duplicating swarming.py is a good idea. On the other hand I don't want to block you.

So rubberstamp lgtm but if I break the python script, which I know I'll do in the future, you'll keep the pieces.

Also you'll have to switch to the Go implementation of the batcharchive. But to get something up and running to verify that things works it's fine.

https://codereview.chromium.org/1423993007/diff/520001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:151: '--dump-json', json_output
Specify: '--priority', '90'

This will run a quite low priority using whatever left over capacity there is.
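In recipe terms the suggested change is just an extra pair of arguments on the swarming trigger invocation. A minimal, hedged illustration; trigger_args is a placeholder name for the argument list being built, and only --priority and --dump-json come from the comments above:

    # Placeholder: the rest of the swarming.py trigger arguments are elided.
    trigger_args.extend(['--priority', '90'])           # low priority: uses leftover capacity
    trigger_args.extend(['--dump-json', json_output])   # where trigger writes the task metadata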
https://codereview.chromium.org/1423993007/diff/520001/scripts/slave/recipe_m... File scripts/slave/recipe_modules/ct_swarming/api.py (right): https://codereview.chromium.org/1423993007/diff/520001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:144: json_output = self.swarming_temp_dir.join('ct-task-%s.json' % slave_num) On 2015/11/11 23:53:36, M-A Ruel wrote: > I don't think duplicating swarming.py is a good idea. On the other hand I don't > want to block you. > > So rubberstamp lgtm but if I break the python script, which I know I'll do in > the future, you'll keep the pieces. > > Also you'll have to switch to the Go implementation of the batcharchive. But to > get something up and running to verify that things works it's fine. I would like to switch to the Go implementation so that things do not break in the future. But I have not been able to figure out how to do that, see my comment here: https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/...
https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right):

https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:168: api.swarming_client.path.join('isolate.py'),
On 2015/11/10 12:58:06, rmistry wrote:
> On 2015/11/09 19:46:45, M-A Ruel wrote:
> > Use luci-go implementation which is an order of magnitude faster. You should
> > be able to use it via the isolate recipe.
>
> How do I use it using
> https://code.google.com/p/chromium/codesearch#chromium/build/scripts/slave/re...
> ?
> (sorry, I must have missed something).

isolate_tests() is the function you need. It is hardcoded to glob all the *.isolated.gen.json in the output directory but this could be changed if necessary. Otherwise you can write the temporary gen files in the output directory. Overall it's kind of hardcoded for our use case so it's fairly possible that it'll need refactoring.
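A hedged sketch of what that could look like from the CT recipe; the keyword arguments and the isolated_tests property are recalled from the chromium isolate recipe module of this era and may not match its exact signature:

    # Write the per-shard *.isolated.gen.json files into swarming_temp_dir
    # first, then let the isolate recipe module batch-archive them with the
    # luci-go client.
    api.isolate.isolate_tests(
        swarming_temp_dir,   # directory it globs for *.isolated.gen.json
        targets=['ct-%d' % num for num in range(1, CT_NUM_SLAVES + 1)],
    )
    swarm_hashes = api.isolate.isolated_tests  # mapping: target -> isolated hash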
Patchset #8 (id:540001) has been deleted
https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right): https://codereview.chromium.org/1423993007/diff/440001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:168: api.swarming_client.path.join('isolate.py'), On 2015/11/12 00:36:01, M-A Ruel wrote: > On 2015/11/10 12:58:06, rmistry wrote: > > On 2015/11/09 19:46:45, M-A Ruel wrote: > > > Use luci-go implementation which is an order of magnitude faster. You should > > be > > > able to use it via the isolate recipe. > > > > How do I use it using > > > https://code.google.com/p/chromium/codesearch#chromium/build/scripts/slave/re... > > ? > > (sorry, I must have missed something). > > isolate_tests() is the function you need. It is hardcoded to glob all the > *.isolated.gen.json in the output directory but this could be changed if > necessary. Otherwise you can write the temporary gen files in the output > directory. Overall it's kind of hardcoded for our use case so it's fairly > possible that it'll need refactoring. > I could use it without refactoring. Also used trigger and collect from the swarming recipe. Done. PTAL. https://codereview.chromium.org/1423993007/diff/520001/scripts/slave/recipe_m... File scripts/slave/recipe_modules/ct_swarming/api.py (right): https://codereview.chromium.org/1423993007/diff/520001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:5: import urllib On 2015/11/11 23:53:36, M-A Ruel wrote: > Remove Done. https://codereview.chromium.org/1423993007/diff/520001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:151: '--dump-json', json_output On 2015/11/11 23:53:36, M-A Ruel wrote: > Specify: '--priority', '90' > > This will run a quite low priority using whatever left over capacity there is. Done. I brought up a job to see if this worked but it still showed up as priority 100: https://chromium-swarm.appspot.com/user/task/2b14d294d7c6ef10 I saw the following message in my logs: "Priority was reset to 100". Any suggestions?
Sorry for the delays as I'm travelling. My main goal here is to reduce the size of the recipe as much as we can to simplify maintenance.

https://codereview.chromium.org/1423993007/diff/560001/scripts/slave/recipe_m... File scripts/slave/recipe_modules/ct_swarming/api.py (right):

https://codereview.chromium.org/1423993007/diff/560001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:19: self.chromium_src_dir = None
If you want to keep this, I'd prefer

@property
def chromium_dir(self):
  return self.m.path['checkout']

but I don't see this much useful.

https://codereview.chromium.org/1423993007/diff/560001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:22: self.downloads_dir = None
Make this a property, this reduces aliasing of the root variables.

https://codereview.chromium.org/1423993007/diff/560001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:54: self._created_dirs.append(d)
IIUC, 'tmp_base' is deleted automatically, so no need to keep track of these.
In particular what you want is file.mkdtemp():
https://github.com/luci/recipes-py/blob/master/recipe_modules/path/api.py#L189

https://codereview.chromium.org/1423993007/diff/560001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:160: for d in self._created_dirs:
This shouldn't be needed in practice, if it is, we should refactor until it's not needed.

https://codereview.chromium.org/1423993007/diff/560001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right):

https://codereview.chromium.org/1423993007/diff/560001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:60: api.ct_swarming.download_page_artifacts(CT_PAGE_TYPE, slave_num, slave_dir)
IMHO You shouldn't pass an arbitrary path, the path should be returned instead.
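Pulling those suggestions together, the property-based shape being asked for looks roughly like this. This is a sketch: the base-class import path and the docstrings are assumptions, and only the path expressions themselves come from the review comments:

    from recipe_engine import recipe_api  # assumption: import path may differ in the build repo

    class CTSwarmingApi(recipe_api.RecipeApi):

      @property
      def downloads_dir(self):
        """Directory the CT page sets and archives are downloaded into."""
        return self.m.path['checkout'].join('content', 'test', 'ct')

      @property
      def swarming_temp_dir(self):
        """Scratch directory under tmp_base; cleaned up automatically."""
        return self.m.path['tmp_base'].join('swarming_temp_dir')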
On 2015/11/10 14:05:39, nednguyen wrote: > My build knowledge is very questionable. > +Kbr: how did your gpu test use telemetry.isolate without encountering problem > with bitmaptools build? Not sure whether you've already figured this out, but telemetry_gpu_test_run (used by the gyp build) depends indirectly on telemetry_base, which build bitmaptools. https://code.google.com/p/chromium/codesearch#chromium/src/content/content_te...
On 2015/11/12 23:41:49, Ken Russell wrote: > On 2015/11/10 14:05:39, nednguyen wrote: > > My build knowledge is very questionable. > > +Kbr: how did your gpu test use telemetry.isolate without encountering problem > > with bitmaptools build? > > Not sure whether you've already figured this out, but telemetry_gpu_test_run > (used by the gyp build) depends indirectly on telemetry_base, which build > bitmaptools. > https://code.google.com/p/chromium/codesearch#chromium/src/content/content_te... ah k, thanks for the information!
Patchset #9 (id:580001) has been deleted
Patchset #9 (id:600001) has been deleted
https://codereview.chromium.org/1423993007/diff/560001/scripts/slave/recipe_m... File scripts/slave/recipe_modules/ct_swarming/api.py (right): https://codereview.chromium.org/1423993007/diff/560001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:19: self.chromium_src_dir = None On 2015/11/12 17:33:41, M-A Ruel wrote: > If you want to keep this, I'd prefer > > @property > def chromium_dir(self): > return self.m.path['checkout'] > > but I don't see this much useful. Agreed. Done. https://codereview.chromium.org/1423993007/diff/560001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:22: self.downloads_dir = None On 2015/11/12 17:33:41, M-A Ruel wrote: > Make this a property, this reduces aliasing of the root variables. Done. https://codereview.chromium.org/1423993007/diff/560001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:54: self._created_dirs.append(d) On 2015/11/12 17:33:41, M-A Ruel wrote: > IIUC, 'tmp_base' is deleted automatically, so no need to keep track of these. > > In particular what you want is file.mkdtemp(): > https://github.com/luci/recipes-py/blob/master/recipe_modules/path/api.py#L189 Yes looks like tmp_base is deleted automatically. https://codereview.chromium.org/1423993007/diff/560001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:160: for d in self._created_dirs: On 2015/11/12 17:33:41, M-A Ruel wrote: > This shouldn't be needed in practice, if it is, we should refactor until it's > not needed. Yes I do not think this is needed, I thought it was because I saw directories pile up in my local testing when I had commented out the part that cleaned the local checkout. Removed. https://codereview.chromium.org/1423993007/diff/560001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_rr_perf.py (right): https://codereview.chromium.org/1423993007/diff/560001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_rr_perf.py:60: api.ct_swarming.download_page_artifacts(CT_PAGE_TYPE, slave_num, slave_dir) On 2015/11/12 17:33:41, M-A Ruel wrote: > IMHO You shouldn't pass an arbitrary path, the path should be returned instead. Done. Not returning the path because the downloads_dir is already available.
Patchset #12 (id:680001) has been deleted
Description was changed from ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. An explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Uses the isolate file template in https://codereview.chromium.org/1410353007/ to dynamically create isolate files from the downloads. * Archives everything on the isolate server. * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. * Cleans up. I have triggered tasks on swarming and things appear to be working. Here are some timings with 1 page repeat (this might be upped to 3 page repeats for more reliable results)- 1 slave: 7 mins 2 slaves: 8 mins. 3 slaves: 9 mins 50 slaves: 38 mins (during MTV peak time) 100 slaves: 68 mins (early EST so not during MTV peak time) Each swarming task appears to take around 6-7 mins. BUG=skia:4503 COMMIT=false This is still unpolished. I would like to extract some things into recipe_modules and add unit tests. ========== to ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. An explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Uses the isolate file template in https://codereview.chromium.org/1410353007/ to dynamically create isolate files from the downloads. * Archives everything on the isolate server. * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. * Cleans up. I have triggered tasks on swarming and things appear to be working. Here are some timings with 1 page repeat (this might be upped to 3 page repeats for more reliable results)- 1 slave: 7 mins 2 slaves: 8 mins. 3 slaves: 9 mins 50 slaves: 38 mins (during MTV peak time) 100 slaves: 68 mins (early EST so not during MTV peak time) Each swarming task appears to take around 6-7 mins. BUG=skia:4503 ==========
Patchset #15 (id:760001) has been deleted
Patchset #15 (id:770001) has been deleted
Description was changed from ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. An explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Uses the isolate file template in https://codereview.chromium.org/1410353007/ to dynamically create isolate files from the downloads. * Archives everything on the isolate server. * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. * Cleans up. I have triggered tasks on swarming and things appear to be working. Here are some timings with 1 page repeat (this might be upped to 3 page repeats for more reliable results)- 1 slave: 7 mins 2 slaves: 8 mins. 3 slaves: 9 mins 50 slaves: 38 mins (during MTV peak time) 100 slaves: 68 mins (early EST so not during MTV peak time) Each swarming task appears to take around 6-7 mins. BUG=skia:4503 ========== to ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. An explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Batch archives everything on the isolate server using the isolate file added in https://codereview.chromium.org/1410353007/ * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. I have triggered tasks on swarming and things appear to be working. Here are some timings with 1 page repeat (this might be upped to 3 page repeats for more reliable results)- 1 slave: 7 mins 2 slaves: 8 mins. 3 slaves: 9 mins 50 slaves: 38 mins (during MTV peak time) 100 slaves: 68 mins (early EST so not during MTV peak time) Each swarming task appears to take around 6-7 mins. BUG=skia:4503 ==========
Description was changed from ========== CT Perf recipe to run rasterize_and_record_micro on the top 1k sites using swarming. An explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Batch archives everything on the isolate server using the isolate file added in https://codereview.chromium.org/1410353007/ * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. I have triggered tasks on swarming and things appear to be working. Here are some timings with 1 page repeat (this might be upped to 3 page repeats for more reliable results)- 1 slave: 7 mins 2 slaves: 8 mins. 3 slaves: 9 mins 50 slaves: 38 mins (during MTV peak time) 100 slaves: 68 mins (early EST so not during MTV peak time) Each swarming task appears to take around 6-7 mins. BUG=skia:4503 ========== to ========== CT Perf recipe to run benchmarks on the top 1k sites using swarming. An explanation of what the recipe does: * Downloads the chrome build from the parent build. * For each shard we want to run: * Downloads all required page sets and archives from Google Storage. * Batch archives everything on the isolate server using the isolate file added in https://codereview.chromium.org/1410353007/ * Triggers a swarming task. * The swarming task runs benchmark on the archives and uploads results to chromeperf.appspot.com * Waits for swarming task to complete. I have triggered tasks on swarming and things appear to be working. Here are some timings with 1 page repeat (this might be upped to 3 page repeats for more reliable results)- 1 slave: 7 mins 2 slaves: 8 mins. 3 slaves: 9 mins 50 slaves: 38 mins (during MTV peak time) 100 slaves: 68 mins (early EST so not during MTV peak time) Each swarming task appears to take around 6-7 mins. BUG=skia:4503 ==========
* Cleaned up things after https://codereview.chromium.org/1410353007/ landed in chromium/src.
* Added tests.
* Removed changes to master.cfg and slaves.cfg. They will be made after we get a new perf bot, tracked in https://code.google.com/p/chromium/issues/detail?id=557269

PTAL.
On 2015/11/19 16:17:17, rmistry wrote:
> * Cleaned up things after https://codereview.chromium.org/1410353007/ landed in chromium/src.
> * Added tests.
> * Removed changes to master.cfg and slaves.cfg. They will be made after we get a new perf bot, tracked in https://code.google.com/p/chromium/issues/detail?id=557269
>
> PTAL.

Friendly ping in case anybody has comments. I would like to land this early next week if possible.
https://codereview.chromium.org/1423993007/diff/820001/scripts/slave/recipe_m... File scripts/slave/recipe_modules/ct_swarming/api.py (right):

https://codereview.chromium.org/1423993007/diff/820001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:30: return self._downloads_dir
I'd prefer
return self.m.path['checkout'].join('content', 'test', 'ct')
so you can delete lines 19 and 44 and migrate comment at line 17-18 here as a docstring.
Same thing for _swarming_temp_dir and tasks_output_dir.

https://codereview.chromium.org/1423993007/diff/820001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:167: self.m.swarming.trigger(self._swarming_tasks)
You should return the tasks and remove self._swarming_tasks. It is closer to what recipes_modules/swarming/api.py does (yet still different, but closer).

https://codereview.chromium.org/1423993007/diff/820001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:169: def collect_swarming_tasks(self):
Accept the tasks as a parameter.

https://codereview.chromium.org/1423993007/diff/820001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_perf.py (right):

https://codereview.chromium.org/1423993007/diff/820001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_perf.py:80: api.ct_swarming.trigger_swarming_tasks(
tasks = api.ct_swarming.trigger_swarming_tasks(

https://codereview.chromium.org/1423993007/diff/820001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_perf.py:85: api.ct_swarming.collect_swarming_tasks()
api.ct_swarming.collect_swarming_tasks(tasks)
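Taken together, the stateless trigger/collect shape being suggested is roughly the following. This is a hedged sketch: trigger(tasks) and the two recipe-side lines come from the comments above, while the task-construction call and collect() on the swarming module are assumptions about its API:

    # Module side: return the triggered tasks instead of stashing them on self.
    def trigger_swarming_tasks(self, swarm_hashes):
      tasks = []
      for task_num, swarm_hash in enumerate(swarm_hashes):
        # Assumption: self.m.swarming.task(title, isolated_hash) builds a task.
        tasks.append(self.m.swarming.task('ct-%d' % task_num, swarm_hash))
      self.m.swarming.trigger(tasks)
      return tasks

    def collect_swarming_tasks(self, tasks):
      # Assumption: collect() is the counterpart to trigger() in the swarming module.
      self.m.swarming.collect(tasks)

    # Recipe side:
    tasks = api.ct_swarming.trigger_swarming_tasks(swarm_hashes)
    api.ct_swarming.collect_swarming_tasks(tasks)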
Patchset #18 (id:840001) has been deleted
Also brought back the changes to master.cfg and slaves.cfg because https://code.google.com/p/chromium/issues/detail?id=557269 was resolved.

https://codereview.chromium.org/1423993007/diff/820001/scripts/slave/recipe_m... File scripts/slave/recipe_modules/ct_swarming/api.py (right):

https://codereview.chromium.org/1423993007/diff/820001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:30: return self._downloads_dir
On 2015/11/20 18:14:01, M-A Ruel wrote:
> I'd prefer
> return self.m.path['checkout'].join('content', 'test', 'ct')
>
> so you can delete lines 19 and 44 and migrate comment at line 17-18 here as a
> docstring.
>
> Same thing for _swarming_temp_dir and tasks_output_dir.
Done.

https://codereview.chromium.org/1423993007/diff/820001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:167: self.m.swarming.trigger(self._swarming_tasks)
On 2015/11/20 18:14:01, M-A Ruel wrote:
> You should return the tasks and remove self._swarming_tasks. It is closer to
> what recipes_modules/swarming/api.py does (yet still different, but closer).
Done.

https://codereview.chromium.org/1423993007/diff/820001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:169: def collect_swarming_tasks(self):
On 2015/11/20 18:14:01, M-A Ruel wrote:
> Accept the tasks as a parameter.
Done.

https://codereview.chromium.org/1423993007/diff/820001/scripts/slave/recipes/... File scripts/slave/recipes/perf/ct_top1k_perf.py (right):

https://codereview.chromium.org/1423993007/diff/820001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_perf.py:80: api.ct_swarming.trigger_swarming_tasks(
On 2015/11/20 18:14:01, M-A Ruel wrote:
> tasks = api.ct_swarming.trigger_swarming_tasks(
Done.

https://codereview.chromium.org/1423993007/diff/820001/scripts/slave/recipes/... scripts/slave/recipes/perf/ct_top1k_perf.py:85: api.ct_swarming.collect_swarming_tasks()
On 2015/11/20 18:14:01, M-A Ruel wrote:
> api.ct_swarming.collect_swarming_tasks(tasks)
Done.
lgtm

Thanks for bearing with me. I think the resulting CTSwarmingApi is easier to reason about as there's no internal state anymore.

https://codereview.chromium.org/1423993007/diff/860001/scripts/slave/recipe_m... File scripts/slave/recipe_modules/ct_swarming/api.py (right):

https://codereview.chromium.org/1423993007/diff/860001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:150: for swarm_hash in swarm_hashes:
for task_num, swarm_hash in enumerate(swarm_hashes):
then remove lines 148 and 151
Patchset #19 (id:880001) has been deleted
Thanks for the thorough review!

https://codereview.chromium.org/1423993007/diff/860001/scripts/slave/recipe_m... File scripts/slave/recipe_modules/ct_swarming/api.py (right):

https://codereview.chromium.org/1423993007/diff/860001/scripts/slave/recipe_m... scripts/slave/recipe_modules/ct_swarming/api.py:150: for swarm_hash in swarm_hashes:
On 2015/11/23 15:19:57, M-A Ruel wrote:
> for task_num, swarm_hash in enumerate(swarm_hashes):
>
> then remove lines 148 and 151
Done.
nednguyen@google.com changed reviewers: + dtu@chromium.org
+Dave
The CQ bit was checked by rmistry@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1423993007/920001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1423993007/920001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by rmistry@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from maruel@chromium.org Link to the patchset: https://codereview.chromium.org/1423993007/#ps920001 (title: "Fix indentation in slaves.cfg")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1423993007/920001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1423993007/920001
Message was sent while issue was closed.
Committed patchset #20 (id:920001) as http://src.chromium.org/viewvc/chrome?view=rev&revision=297686
