build/android/bb_run_sharded_steps.py - Issue 12321138: Android: allows ignoring results of flaky sharded perf tests.

Side by Side Diff: build/android/bb_run_sharded_steps.py

Issue 12321138: Android: allows ignoring results of flaky sharded perf tests. (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/src

Patch Set: Patch Created 7 years, 10 months ago

OLD	NEW
1 #!/usr/bin/env python	1 #!/usr/bin/env python

2 #	2 #

3 # Copyright (c) 2012 The Chromium Authors. All rights reserved.	3 # Copyright (c) 2012 The Chromium Authors. All rights reserved.

4 # Use of this source code is governed by a BSD-style license that can be	4 # Use of this source code is governed by a BSD-style license that can be

5 # found in the LICENSE file.	5 # found in the LICENSE file.

6	6

7 """Helper script to shard build bot steps and save results to disk.	7 """Helper script to shard build bot steps and save results to disk.

8	8

9 Our buildbot infrastructure requires each slave to run steps serially.	9 Our buildbot infrastructure requires each slave to run steps serially.

10 This is sub-optimal for android, where these steps can run independently on	10 This is sub-optimal for android, where these steps can run independently on

11 multiple connected devices.	11 multiple connected devices.

12	12

13 The buildbots will run this script multiple times per cycle:	13 The buildbots will run this script multiple times per cycle:

14 - First, without params: all steps will be executed in parallel using all	14 - First: all steps listed in -s in will be executed in parallel using all

15 connected devices. Step results will be pickled to disk (each step has a unique	15 connected devices. Step results will be pickled to disk. Each step has a unique

16 name).	16 name. The result code will be ignored if the step name is listed in --flaky.

17 The buildbot will treat this step as a regular step, and will not process any	17 The buildbot will treat this step as a regular step, and will not process any

18 graph data.	18 graph data.

19	19

20 - Then, with -p STEP_NAME: at this stage, we'll simply print the file with the	20 - Then, with -p STEP_NAME: at this stage, we'll simply print the file with the

21 step results previously saved. The buildbot will then process the graph data	21 step results previously saved. The buildbot will then process the graph data

22 accordingly.	22 accordingly.

23	23

24 The JSON config contains is a file containing a dictionary in the format:	24 The JSON config contains is a file containing a dictionary in the format:
	Tom Hudson 2013/02/26 16:42:36 While you're in here, fix the grammar of the comment?
	bulach 2013/02/26 17:14:16 ouch! fixed..
25 {	25 {

26 'step_name_foo': 'script_to_execute foo',	26 "step_name_foo": "script_to_execute foo",

27 'step_name_bar': 'script_to_execute bar'	27 "step_name_bar": "script_to_execute bar"

28 }	28 }

29	29

	30 The JSON flaky file contains a list with step names which results should be

	31 ignored:

	32 [

	33 "step_name_foo",

	34 "step_name_bar"

	35 ]

	36

30 Note that script_to_execute necessarily have to take at least the following	37 Note that script_to_execute necessarily have to take at least the following

31 options:	38 options:

32 --device: the serial number to be passed to all adb commands.	39 --device: the serial number to be passed to all adb commands.

33 --keep_test_server_ports: indicates it's being run as a shard, and shouldn't	40 --keep_test_server_ports: indicates it's being run as a shard, and shouldn't

34 reset test server port allocation.	41 reset test server port allocation.

35 """	42 """

36	43

37	44

38 import datetime	45 import datetime

39 import json	46 import json

(...skipping 25 matching lines...)
65 results = []	72 results = []

66 for step in steps:	73 for step in steps:

67 start_time = datetime.datetime.now()	74 start_time = datetime.datetime.now()

68 print 'Starting %s: %s %s at %s' % (step['name'], step['cmd'],	75 print 'Starting %s: %s %s at %s' % (step['name'], step['cmd'],

69 start_time, step['device'])	76 start_time, step['device'])

70 output, exit_code = pexpect.run(	77 output, exit_code = pexpect.run(

71 step['cmd'], cwd=os.path.abspath(constants.CHROME_DIR),	78 step['cmd'], cwd=os.path.abspath(constants.CHROME_DIR),

72 withexitstatus=True, logfile=sys.stdout, timeout=1800,	79 withexitstatus=True, logfile=sys.stdout, timeout=1800,

73 env=os.environ)	80 env=os.environ)

74 end_time = datetime.datetime.now()	81 end_time = datetime.datetime.now()

75 print 'Finished %s: %s %s at %s' % (step['name'], step['cmd'],	82 exit_msg = '%s %s' % (exit_code or 0,
	Sami 2013/02/26 16:48:00 Maybe we should have a line above this one: exit_code = exit_code or 0 so we don't need to repeat the condition.
	bulach 2013/02/26 17:14:16 Done.
76 end_time, step['device'])	83 '(ignored, flaky)' if step['flaky'] else '')

	84 print 'Finished %s: %s %s %s at %s' % (step['name'], exit_msg, step['cmd'],

	85 end_time, step['device'])
	Tom Hudson 2013/02/26 16:42:36 You're changing the format of the output string here; I assume we don't have any scripts scraping it.
	bulach 2013/02/26 17:14:16 that's right, this part is purely informational.
	86 if step['flaky']:

	87 exit_code = 0

77 result = {'name': step['name'],	88 result = {'name': step['name'],

78 'output': output,	89 'output': output,

79 'exit_code': exit_code or 0,	90 'exit_code': exit_code or 0,

80 'total_time': (end_time - start_time).seconds,	91 'total_time': (end_time - start_time).seconds,

81 'device': step['device']}	92 'device': step['device']}

82 _SaveResult(result)	93 _SaveResult(result)

83 results += [result]	94 results += [result]

84 return results	95 return results
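The flaky handling in the NEW column, combined with Sami's suggestion to normalize the exit code once, could be sketched roughly as below. This is an illustration in Python 3, not the code that landed: `summarize_step` is a hypothetical helper name, and the message layout is only loosely modelled on the patch's "Finished ..." line. The `exit_code or 0` idiom is taken from the patch, which suggests the reported status may be `None`.

```python
def summarize_step(name, cmd, device, exit_code, flaky):
    """Return (message, recorded_exit_code) for a finished step.

    Hypothetical helper: it mirrors the patch's logic but is not part of it.
    """
    # Normalize once so the condition is not repeated (reviewer suggestion).
    exit_code = exit_code or 0
    note = ' (ignored, flaky)' if flaky else ''
    message = 'Finished %s: exit code %d%s, cmd %r on %s' % (
        name, exit_code, note, cmd, device)
    # A flaky step is always recorded as successful, as in the patch.
    recorded = 0 if flaky else exit_code
    return message, recorded
```

The real exit code stays visible in the log message even when the recorded value is forced to 0, so a flaky failure can still be spotted by a human reading the output.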

85	96

86	97

87 def _RunShardedSteps(steps, devices):	98 def _RunShardedSteps(steps, devices, flaky):
	Sami 2013/02/26 16:48:00 Call this something like flaky_steps to make it clearer?
	bulach 2013/02/26 17:14:16 that looks better, thanks! fixed, and also changed the order so it's "steps, flaky_steps, devices"... renamed everywhere.
88 assert steps	99 assert steps

89 assert devices, 'No devices connected?'	100 assert devices, 'No devices connected?'

90 if os.path.exists(_OUTPUT_DIR):	101 if os.path.exists(_OUTPUT_DIR):

91 assert '/step_results' in _OUTPUT_DIR	102 assert '/step_results' in _OUTPUT_DIR

92 shutil.rmtree(_OUTPUT_DIR)	103 shutil.rmtree(_OUTPUT_DIR)

93 if not os.path.exists(_OUTPUT_DIR):	104 if not os.path.exists(_OUTPUT_DIR):

94 os.makedirs(_OUTPUT_DIR)	105 os.makedirs(_OUTPUT_DIR)

95 step_names = sorted(steps.keys())	106 step_names = sorted(steps.keys())

96 all_params = []	107 all_params = []

97 num_devices = len(devices)	108 num_devices = len(devices)

98 shard_size = (len(steps) + num_devices - 1) / num_devices	109 shard_size = (len(steps) + num_devices - 1) / num_devices

99 for i, device in enumerate(devices):	110 for i, device in enumerate(devices):

100 steps_per_device = []	111 steps_per_device = []

101 for s in steps.keys()[i * shard_size:(i + 1) * shard_size]:	112 for s in steps.keys()[i * shard_size:(i + 1) * shard_size]:

102 steps_per_device += [{'name': s,	113 steps_per_device += [{'name': s,

103 'device': device,	114 'device': device,

	115 'flaky': s in flaky,

104 'cmd': steps[s] + ' --device ' + device +	116 'cmd': steps[s] + ' --device ' + device +

105 ' --keep_test_server_ports'}]	117 ' --keep_test_server_ports'}]

106 all_params += [steps_per_device]	118 all_params += [steps_per_device]

107 print 'Start sharding (note: output is not synchronized...)'	119 print 'Start sharding (note: output is not synchronized...)'

108 print '*' * 80	120 print '*' * 80

109 start_time = datetime.datetime.now()	121 start_time = datetime.datetime.now()

110 pool = multiprocessing.Pool(processes=num_devices)	122 pool = multiprocessing.Pool(processes=num_devices)

111 async_results = pool.map_async(_RunStepsPerDevice, all_params)	123 async_results = pool.map_async(_RunStepsPerDevice, all_params)

112 results_per_device = async_results.get(999999)	124 results_per_device = async_results.get(999999)

113 end_time = datetime.datetime.now()	125 end_time = datetime.datetime.now()

(...skipping 36 matching lines...)
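The sharding arithmetic in `_RunShardedSteps` (ceiling division, then one slice of steps per device) may be easier to follow in isolation. The sketch below is a Python 3 approximation under stated assumptions: `shard_steps` is a hypothetical name, it uses the `steps, flaky_steps, devices` argument order agreed in the review, and it slices the sorted step names, whereas the patch computes `step_names` but slices `steps.keys()`. It only builds the per-device parameter lists and does not launch anything.

```python
def shard_steps(steps, flaky_steps, devices):
    """Split steps across devices; returns one list of step dicts per device.

    Hypothetical helper illustrating the patch's sharding, not the patch itself.
    """
    assert steps
    assert devices, 'No devices connected?'
    # Assumption: iterate in sorted order so a step tends to land on the
    # same device from run to run.
    step_names = sorted(steps)
    num_devices = len(devices)
    # Ceiling division; '//' is needed in Python 3 to keep an integer.
    shard_size = (len(step_names) + num_devices - 1) // num_devices
    all_params = []
    for i, device in enumerate(devices):
        shard = step_names[i * shard_size:(i + 1) * shard_size]
        all_params.append([
            {'name': name,
             'device': device,
             'flaky': name in flaky_steps,
             'cmd': '%s --device %s --keep_test_server_ports' % (
                 steps[name], device)}
            for name in shard])
    return all_params
```

With three steps and two devices, `shard_size` is 2, so the first device receives two steps and the second receives one. When there are more devices than steps, the trailing devices simply get empty lists.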
150 os.kill(int(pid), signal.SIGQUIT)	162 os.kill(int(pid), signal.SIGQUIT)

151 except Exception as e:	163 except Exception as e:

152 logging.warning('Failed killing %s %s %s', server, pid, e)	164 logging.warning('Failed killing %s %s %s', server, pid, e)

153	165

154	166

155 def main(argv):	167 def main(argv):

156 parser = optparse.OptionParser()	168 parser = optparse.OptionParser()

157 parser.add_option('-s', '--steps',	169 parser.add_option('-s', '--steps',

158 help='A JSON file containing all the steps to be '	170 help='A JSON file containing all the steps to be '

159 'sharded.')	171 'sharded.')

	172 parser.add_option('--flaky',

	173 help='A JSON file containing steps that are flaky and '

	174 'will have its exit code ignored.')

160 parser.add_option('-p', '--print_results',	175 parser.add_option('-p', '--print_results',

161 help='Only prints the results for the previously '	176 help='Only prints the results for the previously '

162 'executed step, do not run it again.')	177 'executed step, do not run it again.')

163 options, urls = parser.parse_args(argv)	178 options, urls = parser.parse_args(argv)

164 if options.print_results:	179 if options.print_results:

165 return _PrintStepOutput(options.print_results)	180 return _PrintStepOutput(options.print_results)

166	181

167 # At this point, we should kill everything that may have been left over from	182 # At this point, we should kill everything that may have been left over from

168 # previous runs.	183 # previous runs.

169 _KillPendingServers()	184 _KillPendingServers()

170	185

171 # Reset the test port allocation. It's important to do it before starting	186 # Reset the test port allocation. It's important to do it before starting

172 # to dispatch any step.	187 # to dispatch any step.

173 if not ports.ResetTestServerPortAllocation():	188 if not ports.ResetTestServerPortAllocation():

174 raise Exception('Failed to reset test server port.')	189 raise Exception('Failed to reset test server port.')

175	190

176 # Sort the devices so that we'll try to always run a step in the same device.	191 # Sort the devices so that we'll try to always run a step in the same device.

177 devices = sorted(android_commands.GetAttachedDevices())	192 devices = sorted(android_commands.GetAttachedDevices())

178 if not devices:	193 if not devices:

179 print 'You must attach a device'	194 print 'You must attach a device'

180 return 1	195 return 1

181	196

182 with file(options.steps, 'r') as f:	197 with file(options.steps, 'r') as f:

183 steps = json.load(f)	198 steps = json.load(f)

184 return _RunShardedSteps(steps, devices)	199 flaky = []

	200 if options.flaky:

	201 with file(options.flaky, 'r') as f:

	202 flaky = json.load(f)
	Sami 2013/02/26 16:48:00 Sort super duper bonus points should we sort the steps so that flaky tests always run last? That way the bot would show feedback sooner. Or will changing the order break something?
	bulach 2013/02/26 17:14:16 hmmm... that wouldn't make much difference in terms of getting feedback, unfortunately... :-/ the way it works, all of the tests run as a single step first, persist the result to disk, then once they are all done, the buildbot will print each one of these results... sure, the "single" step would have the stdout sooner, but that is fairly unreadable as it's not synchronized across the shards... (the _second_ step guarantees that each step will be printed sequentially).... leave as is?
	203 return _RunShardedSteps(steps, devices, flaky)

185	204

186	205

187 if __name__ == '__main__':	206 if __name__ == '__main__':

188 sys.exit(main(sys.argv))	207 sys.exit(main(sys.argv))
