OLD | NEW |
| (Empty) |
1 # Copyright 2012 Google Inc. All Rights Reserved. | |
2 # | |
3 # Licensed under the Apache License, Version 2.0 (the "License"); | |
4 # you may not use this file except in compliance with the License. | |
5 # You may obtain a copy of the License at | |
6 # | |
7 # http://www.apache.org/licenses/LICENSE-2.0 | |
8 # | |
9 # Unless required by applicable law or agreed to in writing, software | |
10 # distributed under the License is distributed on an "AS IS" BASIS, | |
11 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
12 # See the License for the specific language governing permissions and | |
13 # limitations under the License. | |
14 | |
15 from gslib.help_provider import HELP_NAME | |
16 from gslib.help_provider import HELP_NAME_ALIASES | |
17 from gslib.help_provider import HELP_ONE_LINE_SUMMARY | |
18 from gslib.help_provider import HelpProvider | |
19 from gslib.help_provider import HELP_TEXT | |
20 from gslib.help_provider import HelpType | |
21 from gslib.help_provider import HELP_TYPE | |
22 | |
23 _detailed_help_text = (""" | |
24 <B>DESCRIPTION</B> | |
25 gsutil supports URI wildcards. For example, the command: | |
26 | |
27 gsutil cp gs://bucket/data/abc* . | |
28 | |
29 will copy all objects that start with gs://bucket/data/abc followed by any | |
30 number of characters within that subdirectory. | |
31 | |
32 | |
33 <B>DIRECTORY BY DIRECTORY VS RECURSIVE WILDCARDS</B> | |
34 The "*" wildcard only matches up to the end of a path within | |
35 a subdirectory. For example, if the bucket contains objects | |
36 named gs://bucket/data/abcd, gs://bucket/data/abcdef, | |
37 and gs://bucket/data/abcxyx, as well as an object in a sub-directory | |
38 (gs://bucket/data/abc/def), the above gsutil cp command would match the | |
39 first 3 object names but not the last one. | |
40 | |
41 If you want matches to span directory boundaries, use a '**' wildcard: | |
42 | |
43 gsutil cp gs://bucket/data/abc** . | |
44 | |
45 will match all four objects above. | |
46 | |
47 Note that gsutil supports the same wildcards for both object and file names. | |
48 Thus, for example: | |
49 | |
50 gsutil cp data/abc* gs://bucket | |
51 | |
52 will match all local files in the data directory whose names start with abc. | |
53 Most command shells also support wildcards, so when you run the above command | |
54 your shell probably expands the matches before running gsutil. However, most | |
55 shells do not support recursive wildcards ('**'). To have gsutil expand such | |
56 wildcards itself, single-quote the arguments so the shell passes them to | |
57 gsutil without interpreting them: | |
58 | |
59 gsutil cp 'data/abc**' gs://bucket | |
60 | |
61 | |
62 <B>BUCKET WILDCARDS</B> | |
63 You can specify wildcards for bucket names. For example: | |
64 | |
65 gsutil ls gs://data*.example.com | |
66 | |
67 will list the contents of all buckets whose name starts with "data" and | |
68 ends with ".example.com". | |
69 | |
70 You can also combine bucket and object name wildcards. For example, this | |
71 command will remove all ".txt" files in any of your Google Cloud Storage | |
72 buckets: | |
73 | |
74 gsutil rm gs://*/**.txt | |
75 | |
76 | |
77 <B>OTHER WILDCARD CHARACTERS</B> | |
78 In addition to '*', you can use these wildcards: | |
79 | |
80 ? Matches a single character. For example, "gs://bucket/??.txt" | |
81 matches only objects whose names are two characters followed by .txt. | |
82 | |
83 [chars] Matches any one of the specified characters. For example, | |
84 "gs://bucket/[aeiou].txt" matches objects whose names are a single vowel | |
85 followed by .txt. | |
86 | |
87 [char range] Matches any one character in the specified range. For example, | |
88 "gs://bucket/[a-m].txt" matches objects whose names are a single letter | |
89 from a through m followed by .txt. | |
90 | |
91 You can combine wildcards to provide more powerful matches, for example: | |
92 gs://bucket/[a-m]??.j*g | |
93 | |
94 | |
95 <B>EFFICIENCY CONSIDERATION: USING WILDCARDS OVER MANY OBJECTS</B> | |
96 It is more efficient, faster, and less network traffic-intensive | |
97 to use wildcards that have a non-wildcard object-name prefix, like: | |
98 | |
99 gs://bucket/abc*.txt | |
100 | |
101 than it is to use wildcards as the first part of the object name, like: | |
102 | |
103 gs://bucket/*abc.txt | |
104 | |
105 This is because the request for "gs://bucket/abc*.txt" asks the server | |
106 to send back the subset of results whose object names start with "abc", | |
107 and then gsutil filters the result list for objects whose name ends with | |
108 ".txt". In contrast, "gs://bucket/*abc.txt" asks the server for the complete | |
109 list of objects in the bucket and then filters for those objects whose name | |
110 ends with "abc.txt". This efficiency consideration becomes increasingly | |
111 noticeable when you use buckets containing thousands or more objects. It is | |
112 sometimes possible to set up the names of your objects to fit with expected | |
113 wildcard matching patterns, to take advantage of the efficiency of doing | |
114 server-side prefix requests. See "gsutil help prod" for a concrete use | |
115 case. | |
116 | |
117 | |
118 <B>EFFICIENCY CONSIDERATION: USING MID-PATH WILDCARDS</B> | |
119 Suppose you have a bucket with these objects: | |
120 gs://bucket/obj1 | |
121 gs://bucket/obj2 | |
122 gs://bucket/obj3 | |
123 gs://bucket/obj4 | |
124 gs://bucket/dir1/obj5 | |
125 gs://bucket/dir2/obj6 | |
126 | |
127 If you run the command: | |
128 gsutil ls gs://bucket/*/obj5 | |
129 gsutil will perform a /-delimited top-level bucket listing and then one bucket | |
130 listing for each subdirectory, for a total of 3 bucket listings: | |
131 GET /bucket/?delimiter=/ | |
132 GET /bucket/?prefix=dir1/obj5&delimiter=/ | |
133 GET /bucket/?prefix=dir2/obj5&delimiter=/ | |
134 | |
135 The more bucket listings your wildcard requires, the slower and more expensive | |
136 it will be. The number of bucket listings required grows as: | |
137 - the number of wildcard components (e.g., "gs://bucket/a??b/c*/*/d" | |
138 has 3 wildcard components); | |
139 - the number of subdirectories that match each component; and | |
140 - the number of results (pagination is implemented using one GET | |
141 request per 1000 results, specifying markers for each). | |
142 | |
143 If you want to use a mid-path wildcard, you might try instead using a | |
144 recursive wildcard, for example: | |
145 | |
146 gsutil ls gs://bucket/**/obj5 | |
147 | |
148 This will match more objects than gs://bucket/*/obj5 (since it spans | |
149 directories), but it is implemented using a delimiter-less bucket listing | |
150 request. That means fewer bucket requests, although gsutil will list the | |
151 entire bucket and filter locally, which could require a non-trivial amount | |
152 of network traffic. | |
153 """) | |
154 | |
155 | |
156 class CommandOptions(HelpProvider): | |
157 """Additional help about wildcards.""" | |
158 | |
159 help_spec = { | |
160 # Name of command or auxiliary help info for which this help applies. | |
161 HELP_NAME : 'wildcards', | |
162 # List of help name aliases. | |
163 HELP_NAME_ALIASES : ['wildcard', '*', '**'], | |
164 # Type of help: | |
165 HELP_TYPE : HelpType.ADDITIONAL_HELP, | |
166 # One line summary of this help. | |
167 HELP_ONE_LINE_SUMMARY : 'Wildcard support', | |
168 # The full help text. | |
169 HELP_TEXT : _detailed_help_text, | |
170 } | |
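
The matching rules described in the help text above can be illustrated with a small standalone sketch. This is not gsutil's implementation: the helper names `wildcard_to_regex` and `server_side_prefix` are hypothetical, the patterns are applied to object names without the gs://bucket/ part, and letting "?" match any character (including "/") is an assumption, since the help text does not say either way. It shows why "*" stops at a "/" while "**" does not, and which literal prefix could be sent to the server before local filtering.

```python
import re

# Characters that start a wildcard, per the help text above.
WILDCARD_CHARS = '*?['


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regex (illustrative only)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith('**', i):
            out.append('.*')       # '**' spans directory boundaries
            i += 2
        elif ch == '*':
            out.append('[^/]*')    # '*' stops at the next '/'
            i += 1
        elif ch == '?':
            out.append('.')        # assumption: any single character
            i += 1
        elif ch == '[':
            end = pattern.find(']', i + 1)
            if end == -1:
                out.append(re.escape(ch))   # unmatched '[' is a literal
                i += 1
            else:
                out.append(pattern[i:end + 1])  # pass [chars] / [a-m] through
                i = end + 1
        else:
            out.append(re.escape(ch))
            i += 1
    return re.compile(''.join(out) + r'\Z')


def server_side_prefix(pattern):
    """Return the literal text that precedes the first wildcard character."""
    for i, ch in enumerate(pattern):
        if ch in WILDCARD_CHARS:
            return pattern[:i]
    return pattern


if __name__ == '__main__':
    names = ['data/abcd', 'data/abcdef', 'data/abcxyx', 'data/abc/def']
    for pat in ('data/abc*', 'data/abc**'):
        rx = wildcard_to_regex(pat)
        print(pat, [n for n in names if rx.match(n)])
    for pat in ('abc*.txt', '*abc.txt'):
        print(pat, repr(server_side_prefix(pat)))
```

In this sketch, "abc*.txt" yields the prefix "abc", so a listing request could be narrowed on the server, whereas "*abc.txt" yields an empty prefix, which corresponds to listing the whole bucket and filtering locally.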