| OLD | NEW |
| (Empty) |
| 1 # Copyright 2012 Google Inc. All Rights Reserved. | |
| 2 # | |
| 3 # Licensed under the Apache License, Version 2.0 (the "License"); | |
| 4 # you may not use this file except in compliance with the License. | |
| 5 # You may obtain a copy of the License at | |
| 6 # | |
| 7 # http://www.apache.org/licenses/LICENSE-2.0 | |
| 8 # | |
| 9 # Unless required by applicable law or agreed to in writing, software | |
| 10 # distributed under the License is distributed on an "AS IS" BASIS, | |
| 11 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
| 12 # See the License for the specific language governing permissions and | |
| 13 # limitations under the License. | |
| 14 | |
| 15 from gslib.help_provider import HELP_NAME | |
| 16 from gslib.help_provider import HELP_NAME_ALIASES | |
| 17 from gslib.help_provider import HELP_ONE_LINE_SUMMARY | |
| 18 from gslib.help_provider import HelpProvider | |
| 19 from gslib.help_provider import HELP_TEXT | |
| 20 from gslib.help_provider import HelpType | |
| 21 from gslib.help_provider import HELP_TYPE | |
| 22 | |
| 23 _detailed_help_text = (""" | |
| 24 <B>DESCRIPTION</B> | |
| 25 gsutil supports URI wildcards. For example, the command: | |
| 26 | |
| 27 gsutil cp gs://bucket/data/abc* . | |
| 28 | |
| 29 will copy all objects that start with gs://bucket/data/abc followed by any | |
| 30 number of characters within that subdirectory. | |
| 31 | |
| 32 | |
| 33 <B>DIRECTORY BY DIRECTORY VS RECURSIVE WILDCARDS</B> | |
| 34 The "*" wildcard only matches up to the end of a path within | |
| 35 a subdirectory. For example, if the bucket contains objects | |
| 36 named gs://bucket/data/abcd, gs://bucket/data/abcdef, | |
| 37 and gs://bucket/data/abcxyx, as well as an object in a subdirectory | |
| 38 (gs://bucket/data/abc/def), the above gsutil cp command would match the | |
| 39 first 3 object names but not the last one. | |
| 40 | |
| 41 If you want matches to span directory boundaries, use a '**' wildcard: | |
| 42 | |
| 43 gsutil cp gs://bucket/data/abc** . | |
| 44 | |
| 45 This command matches all four objects above. | |
| 46 | |
| 47 Note that gsutil supports the same wildcards for both objects and file names. | |
| 48 Thus, for example: | |
| 49 | |
| 50 gsutil cp data/abc* gs://bucket | |
| 51 | |
| 52 will match all local files in the data directory whose names start with | |
| 53 abc. Most command shells also support wildcards, so if you run the above | |
| 54 command your shell probably expands the matches before running gsutil. | |
| 55 However, most shells do not support recursive wildcards ('**'). To have | |
| 56 gsutil perform the wildcard expansion in such shells, single-quote the | |
| 57 arguments so that the shell does not interpret them before passing them to gsutil: | |
| 58 | |
| 59 gsutil cp 'data/abc**' gs://bucket | |
| 60 | |
| 61 | |
| 62 <B>BUCKET WILDCARDS</B> | |
| 63 You can specify wildcards for bucket names. For example: | |
| 64 | |
| 65 gsutil ls gs://data*.example.com | |
| 66 | |
| 67 will list the contents of all buckets whose name starts with "data" and | |
| 68 ends with ".example.com". | |
| 69 | |
| 70 You can also combine bucket and object name wildcards. For example, this | |
| 71 command will remove all ".txt" files in any of your Google Cloud Storage | |
| 72 buckets: | |
| 73 | |
| 74 gsutil rm gs://*/**.txt | |
| 75 | |
| 76 | |
| 77 <B>OTHER WILDCARD CHARACTERS</B> | |
| 78 In addition to '*', you can use these wildcards: | |
| 79 | |
| 80 ? Matches a single character. For example "gs://bucket/??.txt" | |
| 81 only matches objects whose name is two characters followed by .txt. | |
| 82 | |
| 83 [chars] Matches any one of the specified characters. For example | |
| 84 "gs://bucket/[aeiou].txt" matches objects whose name is a single vowel | |
| 85 character followed by .txt. | |
| 86 | |
| 87 [char range] Matches any one character in the given range. For example | |
| 88 "gs://bucket/[a-m].txt" matches objects whose name is a single letter | |
| 89 a, b, c, ... or m, followed by .txt. | |
| 90 | |
| 91 You can combine wildcards to provide more powerful matches, for example: | |
| 92 gs://bucket/[a-m]??.j*g | |
| 93 | |
| 94 | |
| 95 <B>EFFICIENCY CONSIDERATION: USING WILDCARDS OVER MANY OBJECTS</B> | |
| 96 It is more efficient, faster, and less network traffic-intensive | |
| 97 to use wildcards that have a non-wildcard object-name prefix, like: | |
| 98 | |
| 99 gs://bucket/abc*.txt | |
| 100 | |
| 101 than it is to use wildcards as the first part of the object name, like: | |
| 102 | |
| 103 gs://bucket/*abc.txt | |
| 104 | |
| 105 This is because the request for "gs://bucket/abc*.txt" asks the server | |
| 106 to send back the subset of results whose object names start with "abc", | |
| 107 and then gsutil filters the result list for objects whose name ends with | |
| 108 ".txt". In contrast, "gs://bucket/*abc.txt" asks the server for the complete | |
| 109 list of objects in the bucket and then filters for those objects whose name | |
| 110 ends with "abc.txt". This efficiency consideration becomes increasingly | |
| 111 noticeable when you use buckets containing thousands or more objects. It is | |
| 112 sometimes possible to set up the names of your objects to fit with expected | |
| 113 wildcard matching patterns, to take advantage of the efficiency of doing | |
| 114 server-side prefix requests. See "gsutil help prod" for a | |
| 115 concrete use case. | |
| 116 | |
| 117 | |
| 118 <B>EFFICIENCY CONSIDERATION: USING MID-PATH WILDCARDS</B> | |
| 119 Suppose you have a bucket with these objects: | |
| 120 gs://bucket/obj1 | |
| 121 gs://bucket/obj2 | |
| 122 gs://bucket/obj3 | |
| 123 gs://bucket/obj4 | |
| 124 gs://bucket/dir1/obj5 | |
| 125 gs://bucket/dir2/obj6 | |
| 126 | |
| 127 If you run the command: | |
| 128 gsutil ls gs://bucket/*/obj5 | |
| 129 gsutil will perform a /-delimited top-level bucket listing and then one bucket | |
| 130 listing for each subdirectory, for a total of 3 bucket listings: | |
| 131 GET /bucket/?delimiter=/ | |
| 132 GET /bucket/?prefix=dir1/obj5&delimiter=/ | |
| 133 GET /bucket/?prefix=dir2/obj5&delimiter=/ | |
| 134 | |
| 135 The more bucket listings your wildcard requires, the slower and more expensive | |
| 136 it will be. The number of bucket listings required grows as: | |
| 137 - the number of wildcard components (e.g., "gs://bucket/a??b/c*/*/d" | |
| 138 has 3 wildcard components); | |
| 139 - the number of subdirectories that match each component; and | |
| 140 - the number of results (pagination is implemented using one GET | |
| 141 request per 1000 results, specifying markers for each). | |
| 142 | |
| 143 If you want to use a mid-path wildcard, you might try instead using a | |
| 144 recursive wildcard, for example: | |
| 145 | |
| 146 gsutil ls gs://bucket/**/obj5 | |
| 147 | |
| 148 This may match more objects than gs://bucket/*/obj5 (since it spans | |
| 149 directories), but it is implemented using a delimiter-less bucket listing | |
| 150 request. That means fewer bucket requests, though gsutil will list the | |
| 151 entire bucket and filter locally, which could require a non-trivial amount | |
| 152 of network traffic. | |
| 153 """) | |
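The difference between '*' and '**' described in the help text above can be illustrated with a small standalone sketch. It is not gsutil's implementation: the function name `wildcard_to_regex` is hypothetical, and the choice to keep '?' from matching '/' is an assumption made only for this illustration.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a gsutil-style wildcard into a compiled regex (illustrative only)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith('**', i):
            parts.append('.*')       # '**' spans directory boundaries
            i += 2
        elif ch == '*':
            parts.append('[^/]*')    # '*' stops at the next '/'
            i += 1
        elif ch == '?':
            parts.append('[^/]')     # assumption: one character, never '/'
            i += 1
        elif ch == '[':
            end = pattern.find(']', i + 1)
            if end == -1:
                parts.append(re.escape(ch))       # unmatched '[' is treated as literal
                i += 1
            else:
                parts.append(pattern[i:end + 1])  # [chars] or [char range], kept verbatim
                i = end + 1
        else:
            parts.append(re.escape(ch))           # ordinary character
            i += 1
    return re.compile('^' + ''.join(parts) + '$')


def matching_names(pattern, names):
    """Return the names that the whole pattern matches."""
    regex = wildcard_to_regex(pattern)
    return [name for name in names if regex.match(name)]
```

With the four object names from the help text (written here without the gs://bucket/ part), `matching_names('data/abc*', names)` leaves out `data/abc/def`, while `matching_names('data/abc**', names)` keeps all four.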
| 154 | |
| 155 | |
| 156 class CommandOptions(HelpProvider): | |
| 157 """Additional help about wildcards.""" | |
| 158 | |
| 159 help_spec = { | |
| 160 # Name of command or auxiliary help info for which this help applies. | |
| 161 HELP_NAME : 'wildcards', | |
| 162 # List of help name aliases. | |
| 163 HELP_NAME_ALIASES : ['wildcard', '*', '**'], | |
| 164 # Type of help: | |
| 165 HELP_TYPE : HelpType.ADDITIONAL_HELP, | |
| 166 # One line summary of this help. | |
| 167 HELP_ONE_LINE_SUMMARY : 'Wildcard support', | |
| 168 # The full help text. | |
| 169 HELP_TEXT : _detailed_help_text, | |
| 170 } | |
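The two efficiency sections of the help text can be modeled roughly as follows. This is a sketch under stated assumptions, not gsutil code: all three function names are hypothetical, the "server" is just a Python list, and `fnmatch.fnmatchcase` stands in for local filtering (which is only faithful for names that contain no '/').

```python
import fnmatch
import re


def literal_prefix(pattern):
    """Return the object-name text that precedes the first wildcard character."""
    match = re.search(r'[*?\[]', pattern)
    return pattern if match is None else pattern[:match.start()]


def simulate_prefix_listing(object_names, pattern):
    """Model one prefix request followed by local filtering.

    Returns (names the server would send back, names left after filtering).
    """
    prefix = literal_prefix(pattern)
    returned = [name for name in object_names if name.startswith(prefix)]
    matched = [name for name in returned if fnmatch.fnmatchcase(name, pattern)]
    return returned, matched


def listings_for_one_mid_path_wildcard(object_names, delimiter='/'):
    """Count listings for a pattern like '*/obj5'.

    One delimited top-level listing, plus one listing per top-level subdirectory.
    """
    subdirs = {name.split(delimiter, 1)[0]
               for name in object_names if delimiter in name}
    return 1 + len(subdirs)
```

For a bucket holding `abc1.txt`, `abc2.log`, `xabc.txt`, and `zzz.txt`, the pattern `abc*.txt` makes the modeled server return two names, whereas `*abc.txt` has an empty literal prefix and returns all four before filtering. For the six-object bucket in the mid-path example, `listings_for_one_mid_path_wildcard` gives 3, in line with the three GET requests shown in the help text.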