tools/android/loading/sandwich_tasks.py - Issue 1872313002: sandwich: Implement SandwichTaskBuilder

Side by Side Diff: tools/android/loading/sandwich_tasks.py

Issue 1872313002: sandwich: Implement SandwichTaskBuilder (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master

Patch Set: Created 4 years, 8 months ago


OLD	NEW
(Empty)
	1 # Copyright 2016 The Chromium Authors. All rights reserved.

	2 # Use of this source code is governed by a BSD-style license that can be

	3 # found in the LICENSE file.

	4

	5 import csv

	6 import json

	7 import logging

	8 import os

	9 import re

	10 import shutil

	11

	12 import chrome_cache

	13 import emulation

	14 import loading_trace

	15 import loading_trace_analyzer

	16 import sandwich_metrics

	17 import sandwich_misc

	18 from sandwich_runner import SandwichRunner

	19 import task_manager

	20

	21

	22 def NoRunnerModification(runner):
	pasko 2016/04/11 14:54:08: This would be a NoOpTransformer, but actually I would prefer to avoid having a small function like this. We can just use: builder.PopulateNoStatePrefetchLoadBenchmark(sandwich_runner_transformer=lambda arg: None)
	gabadie 2016/04/13 09:53:44: Done.
	23 """Callback that don't modify a sandwich runner.

	24

	25 Args:

	26 runner: A SandwichRunner to modify.

	27 """

	28 assert isinstance(runner, SandwichRunner)
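
A minimal sketch of the suggestion in the review comment above: default the transformer argument to an inline no-op lambda instead of defining a named function. The `SandwichRunner` stand-in, the `RunBenchmark` helper, and the `'Regular3G'` value are hypothetical and only illustrate the pattern; they are not the real Chromium code.

```python
class SandwichRunner(object):
    """Hypothetical stand-in for sandwich_runner.SandwichRunner."""

    def __init__(self):
        self.network_condition = None


def RunBenchmark(sandwich_runner_transformer=lambda runner: None):
    """Hypothetical helper: creates a runner and lets the caller tweak it."""
    runner = SandwichRunner()
    # The default lambda accepts the runner and leaves it untouched.
    sandwich_runner_transformer(runner)
    return runner


def Use3G(runner):
    # Illustrative value only; real condition names live in the emulation module.
    runner.network_condition = 'Regular3G'


default_runner = RunBenchmark()
tweaked_runner = RunBenchmark(sandwich_runner_transformer=Use3G)
```

With the lambda as the default, callers that need no customization pass nothing, and the module no longer needs a dedicated no-op function.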

	29

	30

	31 def EmulateNetworkModifier(network_condition):
	pasko 2016/04/11 14:54:08: As mentioned before, NetworkSimulationTransformer would be more appropriate as a name.
	gabadie 2016/04/13 09:53:44: Done.
	32 """Factory that create callbacks to modify a sandwich runner for a specific
	pasko 2016/04/11 14:54:08: Creates a function that accepts a SandwichRunner as a parameter and sets network emulation options on it.
	gabadie 2016/04/13 09:53:44: Done.
	33 browser sided network emulation.

	34

	35 Args:

	36 network_condition: The network condition to apply to the sandwich runner.

	37

	38 Returns:

	39 A callback modifying the runner given in argument accordingly
	pasko 2016/04/11 14:54:08: Needs to mention |SandwichRunner|.
	gabadie 2016/04/13 09:53:44: Done.
	40 """

	41 assert network_condition in emulation.NETWORK_CONDITIONS

	42 def RunnerModifier(runner):
	pasko 2016/04/11 14:54:09: s/runner/sandwich_runner/
	gabadie 2016/04/13 09:53:44: Done.
	43 runner.network_condition = network_condition
	pasko 2016/04/11 14:54:08: assert isinstance(sandwich_runner, SandwichRunner)
	gabadie 2016/04/13 09:53:45: Done.
	44 return RunnerModifier
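
Taken together, the review comments on this function ask for a rename to `NetworkSimulationTransformer`, a `sandwich_runner` parameter name, and a type assertion inside the returned closure. A minimal sketch of that shape follows, assuming hypothetical stand-ins for `emulation.NETWORK_CONDITIONS` and `SandwichRunner`; the code that actually landed may differ.

```python
# Hypothetical stand-in for emulation.NETWORK_CONDITIONS (keys are illustrative).
NETWORK_CONDITIONS = {'Regular3G': {}, 'WiFi': {}}


class SandwichRunner(object):
    """Hypothetical stand-in for sandwich_runner.SandwichRunner."""

    def __init__(self):
        self.network_condition = None


def NetworkSimulationTransformer(network_condition):
    """Creates a function that accepts a SandwichRunner as a parameter and
    sets network emulation options on it.
    """
    # Validate once, when the transformer is created.
    assert network_condition in NETWORK_CONDITIONS

    def Transformer(sandwich_runner):
        # Validate on every call that the argument really is a runner.
        assert isinstance(sandwich_runner, SandwichRunner)
        sandwich_runner.network_condition = network_condition

    return Transformer


runner = SandwichRunner()
NetworkSimulationTransformer('Regular3G')(runner)
```

The closure captures `network_condition`, so one factory call can produce a transformer that is reused across several benchmark pipelines.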

	45

	46

	47 class SandwichTaskBuilder(task_manager.Builder):

	48 """Sandwich's tasks builder."""
	pasko 2016/04/11 14:54:09: Needs more documentation. Something like: """A builder for a graph of tasks, each of which prepares or invokes the SandwichRunner.""" Also, we need a list of files/directories the task builder manages (limited to final targets). It can be at the top of the file, or here.
	gabadie 2016/04/13 09:53:44: Documentation: Done. List of managed files/directories: I don't think so, because it is pretty straightforward with the dependency graph visualization, whereas the comment may become outdated.
	49

	50 def __init__(self, output_directory, job_path, url_repeat):

	51 """Constructor.

	52

	53 Args:

	54 output_directory: Output directory where the dynamic tasks will be
	pasko 2016/04/11 14:54:09: output_directory: As in task_manager.Builder.__init__ job_path: ... url_repeat: ...
	gabadie 2016/04/13 09:53:44: Oops... Done!
	55 generated to.

	56 """

	57 task_manager.Builder.__init__(self, output_directory)

	58 self._job_path = job_path

	59 self._url_repeat = url_repeat

	60 self.default_final_tasks = []
	pasko 2016/04/11 14:54:08: We want to prevent callers of this class from modifying default_final_tasks, so it would be better to have: def GetFinalTasks(self): return self._default_final_tasks
	gabadie 2016/04/13 09:53:44: Used a @property. Done.
	pasko 2016/04/14 12:34:42: Did you want to allow users of the object to overwrite the list? I think that is not desired, in which case it would be preferable to disallow modifications at the class API level. @property would be too verbose for it, which is why I suggested a simple getter.
	gabadie 2016/04/14 15:43:32: Sorry, I still don't see the issue with a simple getter with a @property.
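
The two options discussed in this thread can be compared with a small sketch. The `TaskBuilder` class and its `_AddFinalTask` helper are hypothetical and trimmed down for illustration; they are not the class under review.

```python
class TaskBuilder(object):
    """Hypothetical, trimmed-down builder used only to compare the accessors."""

    def __init__(self):
        self._default_final_tasks = []

    def _AddFinalTask(self, task):
        self._default_final_tasks.append(task)

    def GetFinalTasks(self):
        # Option 1: a plain getter method.
        return self._default_final_tasks

    @property
    def default_final_tasks(self):
        # Option 2: a read-only property (no setter is defined).
        return self._default_final_tasks


builder = TaskBuilder()
builder._AddFinalTask('fullcache-metrics.csv')

try:
    builder.default_final_tasks = []  # Rebinding a setter-less property fails.
    rebinding_allowed = True
except AttributeError:
    rebinding_allowed = False
```

Both accessors stop callers from rebinding the attribute, but both still hand out the same mutable list, so a caller could append to it. Returning a copy or a tuple would be one way to close that gap, if that is the goal.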
	61

	62 def __enter__(self):
	pasko 2016/04/11 14:54:08: Not needed?
	gabadie 2016/04/13 09:53:44: My bad. Done.
	63 return self

	64

	65 def __exit__(self, exc_type, exc_val, exc_tb):
	pasko 2016/04/11 14:54:08: Not needed any more?
	gabadie 2016/04/13 09:53:44: Done.
	66 pass

	67

	68 def _CreateRunner(self):
	pasko 2016/04/11 14:54:08: This method is called only once. It is preferable to inline it because the helpful work it does is not obvious from the name.
	gabadie 2016/04/13 09:53:44: Done.
	69 """Create a runner that may be used for benchmark purposes."""

	70 runner = self._CreateNonBenchmarkRunner()

	71 runner.record_video = True

	72 runner.job_repeat = self._url_repeat

	73 return runner

	74

	75 def _CreateNonBenchmarkRunner(self):
	pasko 2016/04/11 14:54:09: _CreateSandwichRunner to be clearer.
	gabadie 2016/04/13 09:53:45: Done.
	76 """Create a runner for non benchmark purposes."""

	77 runner = SandwichRunner()

	78 runner.LoadJob(self._job_path)

	79 return runner

	80

	81 def SetOriginalWprPath(self, original_wpr_path):
	pasko 2016/04/11 14:54:08: It is not clear what 'original' refers to. It would be clearer to use: OverridePathToWprArchive()
	gabadie 2016/04/13 09:53:45: Done.
	82 """Sets the path of the original WPR archive to be used.

	83

	84 Args:

	85 original_wpr_path: Path of the original WPR archive to be used.

	86 """

	87 return self.CreateStaticTask('webpages.wpr', original_wpr_path)
	pasko 2016/04/11 14:54:09: 'webpages.wpr' -> _WPR_ARCHIVE_NAME as a constant to avoid typos.
	gabadie 2016/04/13 09:53:44: Done.
	88

	89 def PopulateWPRRecordingTask(self):
	pasko 2016/04/11 14:54:09: Naming: s/WPR/Wpr/
	gabadie 2016/04/13 09:53:45: Done.
	90 """Records the original WPR archive."""

	91 @self.RegisterTask('webpages.wpr')

	92 def BuildOriginalWpr():

	93 runner = self._CreateNonBenchmarkRunner()

	94 runner.wpr_archive_path = BuildOriginalWpr.path

	95 runner.wpr_record = True

	96 runner.Run()

	97

	98 return BuildOriginalWpr

	99

	100 def PopulateCommonPipelines(self):

	101 """Populates the pipeline that creates the reference cache archive and list
	pasko 2016/04/11 14:54:08: It makes sense to explain in each Populate* what subgraph it creates. It would help readability because this code is 'optimized' for understanding what individual tasks do, but to make sense of all of them together, one needs to look at the .dot, and there you cannot see what each individual Populate* function did. So I'd suggest providing some help for the reader here, like:
	cache-ref-validation.log
	  depends on: cache-ref.zip
	    depends on: webpages-patched.wpr
	      depends on: webpages.wpr
	  depends on: urls-resources.json
	    depends on: urls-resources-run/
	      depends on: webpages.wpr
	gabadie 2016/04/13 09:53:45: Done.
	102 of sub-resources per URL.

	103

	104 Returns:

	105 The last task of the pipeline.

	106 """

	107 original_wpr_task = self.tasks['webpages.wpr']

	108

	109 @self.RegisterTask('webpages-patched.wpr', [original_wpr_task])

	110 def BuildPatchedWpr():

	111 shutil.copyfile(original_wpr_task.path, BuildPatchedWpr.path)

	112 sandwich_misc.PatchWpr(BuildPatchedWpr.path)

	113

	114 @self.RegisterTask('cache-ref.zip', [BuildPatchedWpr])

	115 def BuildReferenceCache():

	116 runner = self._CreateNonBenchmarkRunner()

	117 runner.wpr_archive_path = BuildPatchedWpr.path

	118 runner.cache_archive_path = BuildReferenceCache.path

	119 runner.cache_operation = 'save'

	120 runner.Run()

	121

	122 # TODO(gabadie): It could be possible to generate the traces in the

	123 # cache-ref.zip run, but we would need to implement an octopus dynamic

	124 # task that can generate several files at a time.
	pasko 2016/04/11 14:54:09: It is not clear what this TODO proposes, maybe remove it? As far as I can tell it does not need an 'octopus dynamic task'. It would be enough to allow a dict of assets that a task produces, and then two tasks can grab two outputs by name.
	gabadie 2016/04/13 09:53:44: Done.
	125 @self.RegisterTask('urls-resources-run/', [original_wpr_task])

	126 def UrlsResourcesRun():

	127 runner = self._CreateNonBenchmarkRunner()

	128 runner.wpr_archive_path = original_wpr_task.path

	129 runner.cache_operation = 'clear'

	130 runner.trace_output_directory = UrlsResourcesRun.path

	131 runner.Run()

	132

	133 @self.RegisterTask('urls-resources.json', [UrlsResourcesRun])
	pasko 2016/04/11 14:54:08: subresources-for-urls.json would make it clear where resources come from. Right now it sounds like some urls and some resources with an unknown relation to each other.
	gabadie 2016/04/13 09:53:44: Done.
	134 def ListUrlsResources():

	135 json_content = sandwich_misc.ListResourcesUrls(UrlsResourcesRun.path)

	136 with open(ListUrlsResources.path, 'w') as output:

	137 json.dump(json_content, output)

	138

	139 @self.RegisterTask('cache-ref-validation.log',

	140 [BuildReferenceCache, ListUrlsResources])

	141 def ValidateReferenceCache():

	142 json_content = json.load(open(ListUrlsResources.path))

	143 ref_urls = set()

	144 for urls in json_content.values():

	145 ref_urls.update(set(urls))

	146 sandwich_misc.ValidateCacheArchiveContent(

	147 ref_urls, BuildReferenceCache.path)

	148

	149 self.default_final_tasks.append(ValidateReferenceCache)

	150 return ValidateReferenceCache
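
To illustrate how decorator-based registration produces the subgraph listed in the review comment on this method, here is a minimal sketch. The `Task` and `Builder` classes and the `DescribeDependencies` helper are hypothetical stand-ins written for this example; they are not the real `task_manager` API, and the recipes are empty placeholders.

```python
class Task(object):
    """Hypothetical task node: a name, a recipe and its dependencies."""

    def __init__(self, name, recipe, dependencies):
        self.name = name
        self.recipe = recipe
        self.dependencies = dependencies


class Builder(object):
    """Hypothetical stand-in, not the real task_manager.Builder."""

    def __init__(self):
        self.tasks = {}

    def RegisterTask(self, name, dependencies=None):
        def Decorator(recipe):
            # The decorated name is rebound to the Task, so later tasks can
            # list it directly as a dependency.
            task = Task(name, recipe, dependencies or [])
            self.tasks[name] = task
            return task
        return Decorator


def DescribeDependencies(task, depth=0):
    """Returns the dependency tree of |task| as a list of indented lines."""
    prefix = 'depends on: ' if depth else ''
    lines = ['  ' * depth + prefix + task.name]
    for dependency in task.dependencies:
        lines.extend(DescribeDependencies(dependency, depth + 1))
    return lines


builder = Builder()

@builder.RegisterTask('webpages.wpr')
def BuildOriginalWpr():
    pass  # Placeholder recipe.

@builder.RegisterTask('webpages-patched.wpr', [BuildOriginalWpr])
def BuildPatchedWpr():
    pass

@builder.RegisterTask('cache-ref.zip', [BuildPatchedWpr])
def BuildReferenceCache():
    pass

@builder.RegisterTask('urls-resources-run/', [BuildOriginalWpr])
def UrlsResourcesRun():
    pass

@builder.RegisterTask('urls-resources.json', [UrlsResourcesRun])
def ListUrlsResources():
    pass

@builder.RegisterTask('cache-ref-validation.log',
                      [BuildReferenceCache, ListUrlsResources])
def ValidateReferenceCache():
    pass

tree = DescribeDependencies(ValidateReferenceCache)
print('\n'.join(tree))
```

Printing the tree from the final task gives the same kind of listing the reviewer asks to have in the docstring, which is one way to keep such documentation in sync with the code.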

	151

	152 def PopulateBenchmarkPipeline(self, setup_task, runner_modifier,

	153 benchmark_name):

	154 """Populates a benchmark's pipeline from its setup task.

	155

	156 Args:

	157 setup_task: The benchmark's setup task.

	158 runner_modifier: A callback to modify the sandwich runner.

	159 benchmark_name: The benchmark's name for that runner modifier.

	160

	161 Returns:

	162 The last task of the pipeline.

	163 """

	164 assert setup_task.name.endswith('-setup.json'), \

	165 'Task \'{}\' is not a benchmark setup.'.format(setup_task.name)

	166 benchmark_familly_name = setup_task.name[:-len('-setup.json')]

	167 patched_wpr_task = self.tasks['webpages-patched.wpr']

	168 reference_cache_task = self.tasks['cache-ref.zip']

	169

	170 @self.RegisterTask(benchmark_familly_name + '-cache.zip',

	171 dependencies=[setup_task, reference_cache_task],

	172 merge=True)

	173 def BuildBenchmarkCacheArchive():

	174 setup = json.load(open(setup_task.path))

	175 chrome_cache.ApplyUrlWhitelistToCacheArchive(

	176 cache_archive_path=reference_cache_task.path,

	177 whitelisted_urls=setup['cache_whitelist'],

	178 output_cache_archive_path=BuildBenchmarkCacheArchive.path)

	179

	180 @self.RegisterTask(benchmark_name + '-run/',

	181 dependencies=[BuildBenchmarkCacheArchive])

	182 def RunBenchmark():

	183 runner = self._CreateRunner()

	184 runner_modifier(runner)

	185 runner.wpr_archive_path = patched_wpr_task.path

	186 runner.wpr_out_log_path = os.path.join(RunBenchmark.path, 'wpr.log')

	187 runner.cache_archive_path = BuildBenchmarkCacheArchive.path

	188 runner.cache_operation = 'push'

	189 runner.trace_output_directory = RunBenchmark.path

	190 runner.Run()

	191

	192 @self.RegisterTask(benchmark_name + '-metrics.csv',

	193 dependencies=[RunBenchmark])

	194 def ExtractMetrics():

	195 sandwich_misc.VerifyBenchmarkOutputDirectory(

	196 setup_task.path, RunBenchmark.path)

	197 trace_metrics_list = sandwich_metrics.PullMetricsFromOutputDirectory(

	198 RunBenchmark.path)

	199 trace_metrics_list.sort(key=lambda e: e['id'])

	200 with open(ExtractMetrics.path, 'w') as csv_file:

	201 writer = csv.DictWriter(csv_file,

	202 fieldnames=sandwich_metrics.CSV_FIELD_NAMES)

	203 writer.writeheader()

	204 for trace_metrics in trace_metrics_list:

	205 writer.writerow(trace_metrics)

	206

	207 self.default_final_tasks.append(ExtractMetrics)

	208 return ExtractMetrics
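
The ExtractMetrics step above sorts the per-trace metrics by `id` and writes them with `csv.DictWriter`. A self-contained sketch of that step follows, using an in-memory buffer instead of a file. The field names and metric values are made up for illustration; the real field names live in `sandwich_metrics.CSV_FIELD_NAMES`.

```python
import csv
import io

# Hypothetical field names, standing in for sandwich_metrics.CSV_FIELD_NAMES.
CSV_FIELD_NAMES = ['id', 'url', 'total_load']

# Made-up metrics, deliberately out of order.
trace_metrics_list = [
    {'id': 1, 'url': 'http://example.com/', 'total_load': 820.0},
    {'id': 0, 'url': 'http://example.com/', 'total_load': 905.5},
]
# Sort by run id so rows appear in a stable order.
trace_metrics_list.sort(key=lambda e: e['id'])

csv_file = io.StringIO()
writer = csv.DictWriter(csv_file, fieldnames=CSV_FIELD_NAMES,
                        lineterminator='\n')
writer.writeheader()
for trace_metrics in trace_metrics_list:
    writer.writerow(trace_metrics)

csv_text = csv_file.getvalue()
```

Note that `DictWriter` raises `ValueError` by default if a row contains a key that is not in `fieldnames`, so the metric dictionaries and the field list have to stay in sync.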

	209

	210 def PopulateFullCacheLoadBenchmark(self, benchmark_name='fullcache',

	211 runner_modifier=NoRunnerModification):

	212 """Populates the full cache load benchmark's pipeline.

	213

	214 Args:
	pasko 2016/04/11 14:54:08: Need args here.
	gabadie 2016/04/13 09:53:44: My bad. Done.
	215

	216 Returns:

	217 The last task of the pipeline.

	218 """

	219 urls_resources_task = self.tasks['urls-resources.json']

	220

	221 @self.RegisterTask('fullcache-setup.json',

	222 dependencies=[urls_resources_task],

	223 merge=True)

	224 def SetupBenchmark():

	225 urls_resources = json.load(open(urls_resources_task.path))

	226 assert len(urls_resources) == 1, \

	227 "This recipe is not ready for multiple urls."

	228 url = urls_resources.keys()[0]

	229 url_resources = urls_resources[url]

	230 with open(SetupBenchmark.path, 'w') as output:

	231 json.dump({

	232 'cache_whitelist': url_resources,

	233 'url_resources': url_resources,

	234 }, output)

	235

	236

	237 return self.PopulateBenchmarkPipeline(

	238 SetupBenchmark, runner_modifier, benchmark_name)

	239

	240 def PopulateClearCacheLoadBenchmark(self, benchmark_name='clearcache',

	241 runner_modifier=NoRunnerModification):

	242 """Populates the clear cache load benchmark's pipeline.

	243

	244 Returns:

	245 The last task of the pipeline.

	246 """

	247 urls_resources_task = self.tasks['urls-resources.json']

	248

	249 @self.RegisterTask('clearcache-setup.json',

	250 dependencies=[urls_resources_task],

	251 merge=True)

	252 def SetupBenchmark():

	253 urls_resources = json.load(open(urls_resources_task.path))

	254 assert len(urls_resources) == 1, \

	255 "This recipe is not ready for multiple urls."

	256 url = urls_resources.keys()[0]

	257 url_resources = urls_resources[url]

	258 with open(SetupBenchmark.path, 'w') as output:

	259 json.dump({

	260 'cache_whitelist': [],

	261 'url_resources': url_resources,

	262 }, output)

	263

	264 return self.PopulateBenchmarkPipeline(

	265 SetupBenchmark, runner_modifier, benchmark_name)

	266

	267 def PopulateNoStatePrefetchLoadBenchmark(self, benchmark_name='prefetch',

	268 runner_modifier=NoRunnerModification):

	269 """Populates the NoState-Prefetch load benchmark's pipeline.

	270

	271 Returns:

	272 The last task of the pipeline.

	273 """

	274 # TODO(gabadie): make it generic for the different sub-resource discoverer.

	275 urls_resources_run_task = self.tasks['urls-resources-run/']
	pasko 2016/04/11 14:54:08: urls-resources-run/ and urls-resources.json are used several times in this file, consider using constants.
	gabadie 2016/04/13 09:53:44: Done.
	276 urls_resources_task = self.tasks['urls-resources.json']

	277

	278 @self.RegisterTask('prefetch-setup.json',

	279 dependencies=[urls_resources_task],

	280 merge=True)

	281 def SetupBenchmark():

	282 trace_path = os.path.join(urls_resources_run_task.path, '0/trace.json')

	283 whitelisted_urls = sandwich_misc.ExtractDiscoverableUrls(

	284 trace_path, sandwich_misc.HTML_PRELOAD_SCANNER_DISCOVERER)

	285

	286 urls_resources = json.load(open(urls_resources_task.path))

	287 assert len(urls_resources) == 1, \

	288 "This recipe is not ready for multiple urls."

	289 url = urls_resources.keys()[0]

	290 url_resources = urls_resources[url]

	291 with open(SetupBenchmark.path, 'w') as output:

	292 json.dump({

	293 'cache_whitelist': [url for url in whitelisted_urls],

	294 'url_resources': url_resources,

	295 }, output)

	296

	297 return self.PopulateBenchmarkPipeline(

	298 SetupBenchmark, runner_modifier, benchmark_name)
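
The three SetupBenchmark recipes above write the same two keys and differ only in `cache_whitelist`: every sub-resource for the full cache case, an empty list for the clear cache case, and the discoverable URLs for the prefetch case. A sketch of that shared shape, using a hypothetical `BuildSetup` helper and made-up URLs, is shown below. It is not part of the patch. It uses `list(...)[0]` because `dict.keys()[0]`, as written in the patch, only works in Python 2.

```python
import json


def BuildSetup(urls_resources, cache_whitelist=None):
    """Hypothetical helper: builds the dict a *-setup.json task would dump.

    Args:
      urls_resources: Dict mapping the single benchmarked URL to the list of
          its sub-resource URLs.
      cache_whitelist: URLs to keep in the cache, or None to keep them all.
    """
    assert len(urls_resources) == 1, \
        'This recipe is not ready for multiple urls.'
    url_resources = list(urls_resources.values())[0]
    if cache_whitelist is None:
        cache_whitelist = url_resources
    return {
        'cache_whitelist': list(cache_whitelist),
        'url_resources': url_resources,
    }


# Made-up input, standing in for the content of urls-resources.json.
urls_resources = {
    'http://example.com/': [
        'http://example.com/',
        'http://example.com/a.js',
        'http://example.com/b.css',
    ],
}

fullcache_setup = BuildSetup(urls_resources)
clearcache_setup = BuildSetup(urls_resources, cache_whitelist=[])
prefetch_setup = BuildSetup(
    urls_resources, cache_whitelist=['http://example.com/a.js'])
setup_json = json.dumps(prefetch_setup, sort_keys=True)
```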
