Chromium Code Reviews

Unified diff: third_party/gsutil/gslib/commands/rsync.py

Issue 1380943003: Roll version of gsutil to 4.15. (Closed) Base URL: https://github.com/catapult-project/catapult.git@master
Patch Set: rebase Created 5 years ago
 # -*- coding: utf-8 -*-
 # Copyright 2014 Google Inc. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
(...skipping 13 matching lines...)
 import re
 import tempfile
 import textwrap
 import traceback
 import urllib

 from boto import config
 import crcmod

 from gslib import copy_helper
+from gslib.bucket_listing_ref import BucketListingObject
 from gslib.cloud_api import NotFoundException
 from gslib.command import Command
 from gslib.command import DummyArgChecker
 from gslib.command_argument import CommandArgument
 from gslib.copy_helper import CreateCopyHelperOpts
 from gslib.copy_helper import SkipUnsupportedObjectError
 from gslib.cs_api_map import ApiSelector
 from gslib.exception import CommandException
 from gslib.hashing_helper import CalculateB64EncodedCrc32cFromContents
 from gslib.hashing_helper import CalculateB64EncodedMd5FromContents
(...skipping 202 matching lines...)
 you are running a Python library for computing CRC32C, which is much slower
 than using the compiled code. For information on getting a compiled CRC32C
 implementation, see 'gsutil help crc32c'.


 <B>LIMITATIONS</B>
 1. The gsutil rsync command doesn't make the destination object's timestamps
    match those of the source object (it can't; timestamp setting is not
    allowed by the GCS API).

-2. The gsutil rsync command ignores versioning, synchronizing only the live
-   object versions in versioned buckets.
+2. The gsutil rsync command considers only the current object generations in
+   the source and destination buckets when deciding what to copy / delete. If
+   versioning is enabled in the destination bucket then gsutil rsync's
+   overwriting or deleting objects will end up creating versions, but the
+   command doesn't try to make the archived generations match in the source
+   and destination buckets.
+


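The new LIMITATIONS text above turns on the difference between an object's live (current) generation and the noncurrent generations a versioning-enabled bucket keeps after an overwrite or delete. As a rough illustration of that difference, here is a minimal sketch using the separate google-cloud-storage client library rather than gsutil itself; the bucket name is made up:

    from google.cloud import storage  # separate client library, used only for illustration

    client = storage.Client()
    bucket_name = 'my-dest-bucket'  # hypothetical versioning-enabled destination bucket

    # Live (current) generations: the only view gsutil rsync compares.
    live = [(b.name, b.generation) for b in client.list_blobs(bucket_name)]

    # All generations, including the noncurrent ones left behind when rsync
    # overwrites or deletes objects in a versioning-enabled bucket; rsync makes
    # no attempt to reconcile these with the source.
    every = [(b.name, b.generation)
             for b in client.list_blobs(bucket_name, versions=True)]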
 <B>OPTIONS</B>
 -c            Causes the rsync command to compute checksums for files if the
               size of source and destination match, and then compare
               checksums. This option increases local disk I/O and run time
               if either src_url or dst_url are on the local file system.

 -C            If an error occurs, continue to attempt to copy the remaining
               files. If errors occurred, gsutil's exit status will be non-zero
(...skipping 28 matching lines...)
               rsync -p if you want all objects in the destination bucket to
               end up with the same ACL by setting a default object ACL on that
               bucket instead of using rsync -p. See 'help gsutil defacl'.

 -R, -r        Causes directories, buckets, and bucket subdirectories to be
               synchronized recursively. If you neglect to use this option
               gsutil will make only the top-level directory in the source
               and destination URLs match, skipping any sub-directories.

 -U            Skip objects with unsupported object types instead of failing.
-              Unsupported object types are s3 glacier objects.
+              Unsupported object types are Amazon S3 Objects in the GLACIER
+              storage class.

 -x pattern    Causes files/objects matching pattern to be excluded, i.e., any
               matching files/objects will not be copied or deleted. Note that
               the pattern is a Python regular expression, not a wildcard (so,
               matching any string ending in 'abc' would be specified using
               '.*abc' rather than '*abc'). Note also that the exclude path is
               always relative (similar to Unix rsync or tar exclude options).
               For example, if you run the command:

                 gsutil rsync -x 'data./.*\\.txt' dir gs://my-bucket
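Because the exclude pattern is a regular expression applied with match() to the path relative to the source URL (see the exclude_pattern handling further down in this diff), the example command above excludes exactly the relative paths the pattern matches from their start. A small illustrative check, with made-up file names:

    import re

    # The pattern from the example above, as rsync compiles it via re.compile().
    exclude = re.compile(r'data./.*\.txt')

    # Paths are checked relative to the source directory ('dir' in the example).
    for rel_path in ('data1/a.txt', 'data2/notes.txt', 'other/a.txt', 'a/data1/b.txt'):
      print('%s %s' % (rel_path, bool(exclude.match(rel_path))))
    # data1/a.txt True
    # data2/notes.txt True
    # other/a.txt False
    # a/data1/b.txt False  (match() is anchored at the start of the relative path)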
(...skipping 148 matching lines...)
     # futile or could result in data loss - for example:
     #   gsutil rsync -d gs://non-existent-bucket ./localdir
     # would delete files from localdir.
     cls.logger.error(
         'Caught non-retryable exception while listing %s: %s' %
         (base_url_str, e))
     cls.non_retryable_listing_failures = 1
   out_file.close()


+def _LocalDirIterator(base_url):
+  """A generator that yields a BLR for each file in a local directory.
+
+  We use this function instead of WildcardIterator for listing a local
+  directory without recursion, because the glob.iglob implementation called
+  by WildcardIterator skips "dot" files (which we don't want to do when
+  synchronizing to or from a local directory).
+
+  Args:
+    base_url: URL for the directory over which to iterate.
+
+  Yields:
+    BucketListingObject for each file in the directory.
+  """
+  for filename in os.listdir(base_url.object_name):
+    filename = os.path.join(base_url.object_name, filename)
+    if os.path.isfile(filename):
+      yield BucketListingObject(StorageUrlFromString(filename), None)
+
+
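The docstring above rests on a concrete behavioral difference between the glob module and os.listdir: glob-style listing silently skips "dot" files, while os.listdir returns them. A minimal standalone sketch of that difference, with made-up file names:

    import glob
    import os
    import tempfile

    # Throwaway directory containing one regular file and one "dot" file.
    tmp_dir = tempfile.mkdtemp()
    for name in ('visible.txt', '.hidden.txt'):
      open(os.path.join(tmp_dir, name), 'w').close()

    # Wildcard-style listing via the glob module skips dot files...
    print(sorted(os.path.basename(p) for p in glob.glob(os.path.join(tmp_dir, '*'))))
    # -> ['visible.txt']

    # ...while os.listdir, which _LocalDirIterator uses, returns them.
    print(sorted(os.listdir(tmp_dir)))
    # -> ['.hidden.txt', 'visible.txt']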
 def _FieldedListingIterator(cls, gsutil_api, base_url_str, desc):
   """Iterator over base_url_str formatting output per _BuildTmpOutputLine.

   Args:
     cls: Command instance.
     gsutil_api: gsutil Cloud API instance to use for bucket listing.
     base_url_str: The top-level URL string over which to iterate.
     desc: 'source' or 'destination'.

   Yields:
     Output line formatted per _BuildTmpOutputLine.
   """
-  if cls.recursion_requested:
-    wildcard = '%s/**' % base_url_str.rstrip('/\\')
+  base_url = StorageUrlFromString(base_url_str)
+  if base_url.scheme == 'file' and not cls.recursion_requested:
+    iterator = _LocalDirIterator(base_url)
   else:
-    wildcard = '%s/*' % base_url_str.rstrip('/\\')
+    if cls.recursion_requested:
+      wildcard = '%s/**' % base_url_str.rstrip('/\\')
+    else:
+      wildcard = '%s/*' % base_url_str.rstrip('/\\')
+    iterator = CreateWildcardIterator(
+        wildcard, gsutil_api, debug=cls.debug,
+        project_id=cls.project_id).IterObjects(
+            # Request just the needed fields, to reduce bandwidth usage.
+            bucket_listing_fields=['crc32c', 'md5Hash', 'name', 'size'])
+
   i = 0
-  for blr in CreateWildcardIterator(
-      wildcard, gsutil_api, debug=cls.debug,
-      project_id=cls.project_id).IterObjects(
-          # Request just the needed fields, to reduce bandwidth usage.
-          bucket_listing_fields=['crc32c', 'md5Hash', 'name', 'size']):
+  for blr in iterator:
     # Various GUI tools (like the GCS web console) create placeholder objects
     # ending with '/' when the user creates an empty directory. Normally these
     # tools should delete those placeholders once objects have been written
     # "under" the directory, but sometimes the placeholders are left around.
     # We need to filter them out here, otherwise if the user tries to rsync
     # from GCS to a local directory it will result in a directory/file
     # conflict (e.g., trying to download an object called "mydata/" where the
     # local directory "mydata" exists).
     url = blr.storage_url
     if IsCloudSubdirPlaceholder(url, blr=blr):
-      cls.logger.info('Skipping cloud sub-directory placeholder object (%s) '
-                      'because such objects aren\'t needed in (and would '
-                      'interfere with) directories in the local file system',
-                      url)
+      # We used to output the message 'Skipping cloud sub-directory placeholder
+      # object...' but we no longer do so because it caused customer confusion.
       continue
     if (cls.exclude_symlinks and url.IsFileUrl()
         and os.path.islink(url.object_name)):
       continue
     if cls.exclude_pattern:
       str_to_check = url.url_string[len(base_url_str):]
       if str_to_check.startswith(url.delim):
         str_to_check = str_to_check[1:]
       if cls.exclude_pattern.match(str_to_check):
         continue
(...skipping 502 matching lines...)
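The listing change above boils down to a three-way choice: walk a local directory with _LocalDirIterator when the source is a file URL and recursion is off, otherwise build a '**' (recursive) or '*' (non-recursive) wildcard and hand it to CreateWildcardIterator. A minimal sketch of just that dispatch, with the gsutil API objects stripped out; _choose_listing_strategy and the sample URLs are made up for illustration:

    def _choose_listing_strategy(base_url_str, is_file_url, recursion_requested):
      """Illustrative only: mirrors the branch added to _FieldedListingIterator."""
      if is_file_url and not recursion_requested:
        # Local, non-recursive: walk the directory with os.listdir so that
        # "dot" files are included (see _LocalDirIterator above).
        return ('local-dir', base_url_str)
      # Otherwise fall back to wildcard listing: '**' recurses, '*' does not.
      suffix = '**' if recursion_requested else '*'
      return ('wildcard', '%s/%s' % (base_url_str.rstrip('/\\'), suffix))

    print(_choose_listing_strategy('gs://my-bucket/dir', False, True))
    # -> ('wildcard', 'gs://my-bucket/dir/**')
    print(_choose_listing_strategy('./localdir', True, False))
    # -> ('local-dir', './localdir')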
       elif o == '-x':
         if not a:
           raise CommandException('Invalid blank exclude filter')
         try:
           self.exclude_pattern = re.compile(a)
         except re.error:
           raise CommandException('Invalid exclude filter (%s)' % a)
     return CreateCopyHelperOpts(
         preserve_acl=preserve_acl,
         skip_unsupported_objects=self.skip_unsupported_objects)
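The -x handling above compiles the user-supplied pattern and converts any regex syntax error into a CommandException. A small illustration of that failure path, with a made-up invalid pattern:

    import re

    bad_pattern = 'data['  # unterminated character class: not a valid regex
    try:
      re.compile(bad_pattern)
    except re.error:
      # This is the condition the code above reports as
      # CommandException('Invalid exclude filter (%s)' % a).
      print('Invalid exclude filter (%s)' % bad_pattern)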