gm/rebaseline_server/download_actuals.py - Issue 310093003: rebaseline_server: download actual-results.json files from GCS instead of SVN

Side by Side Diff: gm/rebaseline_server/download_actuals.py

Issue 310093003: rebaseline_server: download actual-results.json files from GCS instead of SVN (Closed) Base URL: https://skia.googlesource.com/skia.git@master

Patch Set: download actual-results.json files from GCS instead of SVN Created 6 years, 6 months ago

OLD	NEW
1 #!/usr/bin/python	1 #!/usr/bin/python

2	2

3 """	3 """

4 Copyright 2014 Google Inc.	4 Copyright 2014 Google Inc.

5	5

6 Use of this source code is governed by a BSD-style license that can be	6 Use of this source code is governed by a BSD-style license that can be

7 found in the LICENSE file.	7 found in the LICENSE file.

8	8

9 Download actual GM results for a particular builder.	9 Download actual GM results for a particular builder.

10 """	10 """

(...skipping 87 matching lines...)
98 results_of_this_type = actual_results_dict[result_type]	98 results_of_this_type = actual_results_dict[result_type]

99 if not results_of_this_type:	99 if not results_of_this_type:

100 continue	100 continue

101 for image_name in sorted(results_of_this_type.keys()):	101 for image_name in sorted(results_of_this_type.keys()):

102 (test, config) = self._image_filename_re.match(image_name).groups()	102 (test, config) = self._image_filename_re.match(image_name).groups()

103 (hash_type, hash_digest) = results_of_this_type[image_name]	103 (hash_type, hash_digest) = results_of_this_type[image_name]

104 source_url = gm_json.CreateGmActualUrl(	104 source_url = gm_json.CreateGmActualUrl(

105 test_name=test, hash_type=hash_type, hash_digest=hash_digest,	105 test_name=test, hash_type=hash_type, hash_digest=hash_digest,

106 gm_actuals_root_url=self._gm_actuals_root_url)	106 gm_actuals_root_url=self._gm_actuals_root_url)

107 dest_path = os.path.join(dest_dir, config, test + '.png')	107 dest_path = os.path.join(dest_dir, config, test + '.png')

108 # TODO(epoger): To speed this up, we should only download files that

109 # we don't already have on local disk.
epoger 2014/06/03 20:11:03: This comment turns out to be misleading... download_actuals.py writes out files with NON-checksum-based filenames, so we can't easily tell whether the images we have already downloaded need to be overwritten.
110 copy_contents(source_url=source_url, dest_path=dest_path,	108 copy_contents(source_url=source_url, dest_path=dest_path,

111 create_subdirs_if_needed=True)	109 create_subdirs_if_needed=True)
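epoger's comment above explains why the TODO was dropped: local filenames are not derived from checksums, so an existing file says nothing about whether its contents are current. A minimal sketch of the alternative, assuming a hypothetical layout where each image is stored as <dest_dir>/<hash_type>/<hash_digest>.png; with content-addressed names, an existing path implies identical bytes, so the download could be skipped. The helper names are illustrative only.

```python
import os


def digest_based_path(dest_dir, hash_type, hash_digest):
    """Hypothetical layout: <dest_dir>/<hash_type>/<hash_digest>.png.

    Because the filename is derived from the image checksum, two files
    with the same path always hold the same bytes.
    """
    return os.path.join(dest_dir, hash_type, '%s.png' % hash_digest)


def needs_download(dest_dir, hash_type, hash_digest):
    """Returns True only if no file with this checksum is on local disk yet."""
    return not os.path.exists(
        digest_based_path(dest_dir, hash_type, hash_digest))
```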

112	110

113	111
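The fetch() loop unpacks `self._image_filename_re.match(image_name).groups()` directly, so an image name that does not match the pattern fails with an AttributeError on None. A sketch that reports a clearer error, assuming a hypothetical `<test>_<config>.png` naming pattern (the real regex is defined in lines elided from this diff):

```python
import re

# Hypothetical pattern; the actual _image_filename_re is not shown here.
IMAGE_FILENAME_RE = re.compile(r'(.+)_(.+)\.png$')


def parse_image_name(image_name):
    """Splits '<test>_<config>.png' into (test, config), or raises ValueError."""
    match = IMAGE_FILENAME_RE.match(image_name)
    if match is None:
        raise ValueError('unexpected image name: %r' % image_name)
    return match.groups()
```

Because the first group is greedy, the config is whatever follows the last underscore: test names may contain underscores, but config names may not.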

114 def create_filepath_url(filepath):	112 def create_filepath_url(filepath):

115 """ Returns a file:/// URL pointing at the given filepath on local disk.	113 """ Returns a file:/// URL pointing at the given filepath on local disk.

116	114

117 For now, this is only used by unittests, but I anticipate it being useful	115 For now, this is only used by unittests, but I anticipate it being useful

118 in production, as a way for developers to run rebaseline_server over locally	116 in production, as a way for developers to run rebaseline_server over locally

119 generated images.	117 generated images.

(...skipping 35 matching lines...)
155 """	153 """

156 if create_subdirs_if_needed:	154 if create_subdirs_if_needed:

157 dest_dir = os.path.dirname(dest_path)	155 dest_dir = os.path.dirname(dest_path)

158 if not os.path.exists(dest_dir):	156 if not os.path.exists(dest_dir):

159 os.makedirs(dest_dir)	157 os.makedirs(dest_dir)

160 with contextlib.closing(urllib.urlopen(source_url)) as source_handle:	158 with contextlib.closing(urllib.urlopen(source_url)) as source_handle:

161 with open(dest_path, 'wb') as dest_handle:	159 with open(dest_path, 'wb') as dest_handle:

162 shutil.copyfileobj(fsrc=source_handle, fdst=dest_handle)	160 shutil.copyfileobj(fsrc=source_handle, fdst=dest_handle)

163	161

164	162
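copy_contents() uses Python 2's urllib.urlopen, and the body of create_filepath_url() is elided from this diff. A rough Python 3 port, assuming urllib.request.urlopen and pathlib as replacements (the pathlib-based URL construction is an assumption, not the author's implementation), shows the unittest use case from the docstring: reading a local file through a file:/// URL.

```python
import contextlib
import os
import pathlib
import shutil
import urllib.request


def create_filepath_url(filepath):
    """Returns a file:/// URL for a local path (Python 3 sketch)."""
    return pathlib.Path(os.path.abspath(filepath)).as_uri()


def copy_contents(source_url, dest_path, create_subdirs_if_needed=False):
    """Copies the contents of source_url into dest_path (Python 3 sketch)."""
    if create_subdirs_if_needed:
        dest_dir = os.path.dirname(dest_path)
        if dest_dir:
            # exist_ok=True avoids a race between an exists() check and mkdir.
            os.makedirs(dest_dir, exist_ok=True)
    with contextlib.closing(urllib.request.urlopen(source_url)) as source_handle:
        with open(dest_path, 'wb') as dest_handle:
            shutil.copyfileobj(source_handle, dest_handle)
```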

	163 def gcs_download_file(source_bucket, source_path, dest_path,

	164 create_subdirs_if_needed=False):

	165 """ Downloads a single file from Google Cloud Storage to local disk.

	166

	167 EPOGER: move into a gcs utility module?
	epoger 2014/06/03 20:11:03: I think I should move gcs_download_file() and gcs_list_bucket_contents() into a new gcs_utils.py module before committing this. Does that sound good to you?
	borenet 2014/06/03 20:27:47: Yes, that SGTM, but what's the difference then between "gcs_utils" and "gs_utils"? Should one supersede the other? This is another module which would be nice to share between buildbot and skia...
	epoger 2014/06/03 20:39:04: In this case I went with "gcs_utils.py" because (a) the public API documentation refers to Google Cloud Storage, and (b) it seemed helpful to make clear that this module is distinct from the gs_utils.py file in buildbot and does not use the gsutil binary. But I'm happy calling it just about anything. In buildbot-land, I think we should try to update gs_utils.py to use these API calls instead of the gsutil binary. (Ideally, its callers won't have to change at all, unless they want to.) I agree that it would be good for the two modules to be shared somehow. I'll add a TODO when I create the new module.
	168

	169 Args:

	170 source_bucket: GCS bucket to download the file from

	171 source_path: full path (Posix-style) within that bucket

	172 dest_path: full path (local-OS-style) on local disk to copy the file to

	173 create_subdirs_if_needed: boolean; whether to create subdirectories as

	174 needed to create dest_path

	175 """

	176 source_http_url = posixpath.join(

	177 'http://storage.googleapis.com', source_bucket, source_path)

	178 copy_contents(source_url=source_http_url, dest_path=dest_path,

	179 create_subdirs_if_needed=create_subdirs_if_needed)

	180

	181
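gcs_download_file() builds its HTTP URL with posixpath.join(). One caveat: posixpath.join() discards every earlier component when a later one is absolute, so a source_path beginning with '/' would produce a bare path rather than a URL. The helper below (hypothetical name gcs_http_url, Python 3) strips a leading slash, assuming object names are never meant to start with one, and percent-encodes the object path while keeping '/' separators.

```python
import posixpath
from urllib.parse import quote

GCS_HTTP_ROOT = 'http://storage.googleapis.com'


def gcs_http_url(bucket, path):
    """Builds a public HTTP URL for a GCS object (sketch).

    A leading '/' is stripped so posixpath.join() keeps the root and bucket;
    quote() encodes characters such as spaces but leaves '/' intact.
    """
    return posixpath.join(GCS_HTTP_ROOT, bucket, quote(path.lstrip('/')))
```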

165 def gcs_list_bucket_contents(bucket, subdir=None):	182 def gcs_list_bucket_contents(bucket, subdir=None):

166 """ Returns files in the Google Cloud Storage bucket as a (dirs, files) tuple.	183 """ Returns files in the Google Cloud Storage bucket as a (dirs, files) tuple.

167	184

168 Uses the API documented at	185 Uses the API documented at

169 https://developers.google.com/storage/docs/json_api/v1/objects/list	186 https://developers.google.com/storage/docs/json_api/v1/objects/list

170	187

	188 EPOGER: move into a gcs utility module?

	189

171 Args:	190 Args:

172 bucket: name of the Google Storage bucket	191 bucket: name of the Google Storage bucket

173 subdir: directory within the bucket to list, or None for root directory	192 subdir: directory within the bucket to list, or None for root directory

174 """	193 """

175 # The GCS command relies on the subdir name (if any) ending with a slash.	194 # The GCS command relies on the subdir name (if any) ending with a slash.

176 if subdir and not subdir.endswith('/'):	195 if subdir and not subdir.endswith('/'):

177 subdir += '/'	196 subdir += '/'

178 subdir_length = len(subdir) if subdir else 0	197 subdir_length = len(subdir) if subdir else 0

179	198

180 storage = build_service('storage', 'v1')	199 storage = build_service('storage', 'v1')

(...skipping 11 matching lines...)
192 dir_basename = dir_fullpath[subdir_length:]	211 dir_basename = dir_fullpath[subdir_length:]

193 dirs.append(dir_basename[:-1]) # strip trailing slash	212 dirs.append(dir_basename[:-1]) # strip trailing slash

194 files = []	213 files = []

195 for file_properties in results.get('items', []):	214 for file_properties in results.get('items', []):

196 file_fullpath = file_properties['name']	215 file_fullpath = file_properties['name']

197 file_basename = file_fullpath[subdir_length:]	216 file_basename = file_fullpath[subdir_length:]

198 files.append(file_basename)	217 files.append(file_basename)

199 return (dirs, files)	218 return (dirs, files)

200	219

201	220
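gcs_list_bucket_contents() appears to rely on objects.list, called with a prefix and a '/' delimiter, returning 'prefixes' (subdirectories) and 'items' (files). The offline sketch below imitates that split over a plain list of object names, which can help when reasoning about callers such as get_builders_list() without network access. It is an approximation (split_listing is a hypothetical name), not the GCS implementation.

```python
def split_listing(object_names, subdir=None):
    """Approximates a prefix/delimiter listing as a (dirs, files) tuple.

    Names under `subdir` that contain another '/' contribute their first
    path component to dirs; the rest are returned as files.
    """
    if subdir and not subdir.endswith('/'):
        subdir += '/'
    prefix = subdir or ''
    dirs = set()
    files = []
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if '/' in rest:
            dirs.add(rest.split('/', 1)[0])
        else:
            files.append(rest)
    return (sorted(dirs), files)
```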

	221 def get_builders_list(summaries_bucket=GM_SUMMARIES_BUCKET):

	222 """ Returns the list of builders we have actual results for.

	223

	224 Args:

	225 summaries_bucket: Google Cloud Storage bucket containing the summary

	226 JSON files

	227 """

	228 dirs, _ = gcs_list_bucket_contents(bucket=summaries_bucket)

	229 return dirs

	230

	231
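Since get_builders_list() takes the bucket as a parameter, a variant that also accepts the listing function lets a test check, without network access, that a non-default bucket is the one actually listed. Both list_builders() and fake_list_bucket_contents() below are hypothetical names used only for illustration.

```python
def list_builders(summaries_bucket, list_fn):
    """Hypothetical testable variant: list_fn(bucket=...) -> (dirs, files)."""
    dirs, _ = list_fn(bucket=summaries_bucket)
    return dirs


def fake_list_bucket_contents(bucket, subdir=None):
    """Stub standing in for gcs_list_bucket_contents(); no network access."""
    fake_buckets = {
        'summaries-a': (['Builder-A', 'Builder-B'], []),
        'summaries-b': (['Builder-C'], []),
    }
    return fake_buckets[bucket]
```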

202 def main():	232 def main():

203 parser = optparse.OptionParser()	233 parser = optparse.OptionParser()

204 required_params = []	234 required_params = []

205 parser.add_option('--actuals-base-url',	235 parser.add_option('--actuals-base-url',

206 action='store', type='string',	236 action='store', type='string',

207 default=DEFAULT_ACTUALS_BASE_URL,	237 default=DEFAULT_ACTUALS_BASE_URL,

208 help=('Base URL from which to read files containing JSON '	238 help=('Base URL from which to read files containing JSON '

209 'summaries of actual GM results; defaults to '	239 'summaries of actual GM results; defaults to '

210 '"%default".'))	240 '"%default".'))

211 required_params.append('builder')	241 required_params.append('builder')

(...skipping 15 matching lines...)
227 parser.add_option('--json-filename',	257 parser.add_option('--json-filename',

228 action='store', type='string',	258 action='store', type='string',

229 default=DEFAULT_JSON_FILENAME,	259 default=DEFAULT_JSON_FILENAME,

230 help=('JSON summary filename to read for each builder; '	260 help=('JSON summary filename to read for each builder; '

231 'defaults to "%default".'))	261 'defaults to "%default".'))

232 parser.add_option('--list-builders', action='store_true',	262 parser.add_option('--list-builders', action='store_true',

233 help=('List all available builders.'))	263 help=('List all available builders.'))

234 (params, remaining_args) = parser.parse_args()	264 (params, remaining_args) = parser.parse_args()

235	265

236 if params.list_builders:	266 if params.list_builders:

237 dirs, _ = gcs_list_bucket_contents(bucket=GM_SUMMARIES_BUCKET)	267 print '\n'.join(get_builders_list())

238 print '\n'.join(dirs)

239 return	268 return

240	269

241 # Make sure all required options were set,	270 # Make sure all required options were set,

242 # and that there were no items left over in the command line.	271 # and that there were no items left over in the command line.

243 for required_param in required_params:	272 for required_param in required_params:

244 if not getattr(params, required_param):	273 if not getattr(params, required_param):

245 raise Exception('required option \'%s\' was not set' % required_param)	274 raise Exception('required option \'%s\' was not set' % required_param)

246 if len(remaining_args) != 0:	275 if len(remaining_args) != 0:

247 raise Exception('extra items specified in the command line: %s' %	276 raise Exception('extra items specified in the command line: %s' %

248 remaining_args)	277 remaining_args)

249	278

250 downloader = Download(actuals_base_url=params.actuals_base_url)	279 downloader = Download(actuals_base_url=params.actuals_base_url)

251 downloader.fetch(builder_name=params.builder,	280 downloader.fetch(builder_name=params.builder,

252 dest_dir=params.dest_dir)	281 dest_dir=params.dest_dir)

253	282

254	283

255	284

256 if __name__ == '__main__':	285 if __name__ == '__main__':

257 main()	286 main()
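main() validates required options by hand, after handling --list-builders. A sketch of the same flow with the standard-library argparse module (swapped in for illustration; this patch uses optparse) shows why a manual check is still needed: marking --builder as required=True would make a plain --list-builders invocation fail. Only flags visible in this diff are included, and parse_args is a hypothetical helper name.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('--builder')
    parser.add_argument('--list-builders', action='store_true')
    # argparse itself rejects leftover positional arguments.
    params = parser.parse_args(argv)
    # --builder cannot be required=True, because --list-builders must work
    # without it; check it only when we are not just listing builders.
    if not params.list_builders and not params.builder:
        parser.error("required option '--builder' was not set")
    return params
```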
