OLD | NEW |
| (Empty) |
1 # -*- coding: utf-8 -*- | |
2 # Copyright 2012 Google Inc. All Rights Reserved. | |
3 # | |
4 # Licensed under the Apache License, Version 2.0 (the "License"); | |
5 # you may not use this file except in compliance with the License. | |
6 # You may obtain a copy of the License at | |
7 # | |
8 # http://www.apache.org/licenses/LICENSE-2.0 | |
9 # | |
10 # Unless required by applicable law or agreed to in writing, software | |
11 # distributed under the License is distributed on an "AS IS" BASIS, | |
12 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
13 # See the License for the specific language governing permissions and | |
14 # limitations under the License. | |
15 """Additional help about wildcards.""" | |
16 | |
17 from __future__ import absolute_import | |
18 | |
19 from gslib.help_provider import HelpProvider | |
20 | |
21 _DETAILED_HELP_TEXT = (""" | |
22 <B>DESCRIPTION</B> | |
23 gsutil supports URI wildcards. For example, the command: | |
24 | |
25 gsutil cp gs://bucket/data/abc* . | |
26 | |
27 will copy all objects that start with gs://bucket/data/abc followed by any | |
28 number of characters within that subdirectory. | |
29 | |
30 | |
31 <B>DIRECTORY BY DIRECTORY VS RECURSIVE WILDCARDS</B> | |
32 The "*" wildcard only matches up to the end of a path within | |
33 a subdirectory. For example, if bucket contains objects | |
34 named gs://bucket/data/abcd, gs://bucket/data/abcdef, | |
35 and gs://bucket/data/abcxyx, as well as an object in a sub-directory | |
36 (gs://bucket/data/abc/def) the above gsutil cp command would match the | |
37 first 3 object names but not the last one. | |
38 | |
39 If you want matches to span directory boundaries, use a '**' wildcard: | |
40 | |
41 gsutil cp gs://bucket/data/abc** . | |
42 | |
43 will match all four objects above. | |
44 | |
45 Note that gsutil supports the same wildcards for both objects and file names. | |
46 Thus, for example: | |
47 | |
48 gsutil cp data/abc* gs://bucket | |
49 | |
50 will match all names in the local file system. Most command shells also | |
51 support wildcarding, so if you run the above command probably your shell | |
52 is expanding the matches before running gsutil. However, most shells do not | |
53 support recursive wildcards ('**'), and you can cause gsutil's wildcarding | |
54 support to work for such shells by single-quoting the arguments so they | |
55 don't get interpreted by the shell before being passed to gsutil: | |
56 | |
57 gsutil cp 'data/abc**' gs://bucket | |
58 | |
59 | |
60 <B>BUCKET WILDCARDS</B> | |
61 You can specify wildcards for bucket names within a single project. For | |
62 example: | |
63 | |
64 gsutil ls gs://data*.example.com | |
65 | |
66 will list the contents of all buckets whose name starts with "data" and | |
67 ends with ".example.com" in the default project. The -p option can be used | |
68 to specify a project other than the default. For example: | |
69 | |
70 gsutil ls -p other-project gs://data*.example.com | |
71 | |
72 You can also combine bucket and object name wildcards. For example this | |
73 command will remove all ".txt" files in any of your Google Cloud Storage | |
74 buckets in the default project: | |
75 | |
76 gsutil rm gs://*/**.txt | |
77 | |
78 | |
79 <B>OTHER WILDCARD CHARACTERS</B> | |
80 In addition to '*', you can use these wildcards: | |
81 | |
82 ? | |
83 Matches a single character. For example "gs://bucket/??.txt" | |
84 only matches objects with two characters followed by .txt. | |
85 | |
86 [chars] | |
87 Match any of the specified characters. For example | |
88 "gs://bucket/[aeiou].txt" matches objects that contain a single vowel | |
89 character followed by .txt | |
90 | |
91 [char range] | |
92 Match any of the range of characters. For example | |
93 "gs://bucket/[a-m].txt" matches objects that contain letters | |
94 a, b, c, ... or m, and end with .txt. | |
95 | |
96 You can combine wildcards to provide more powerful matches, for example: | |
97 | |
98 gs://bucket/[a-m]??.j*g | |
99 | |
100 | |
101 <B>EFFICIENCY CONSIDERATION: USING WILDCARDS OVER MANY OBJECTS</B> | |
102 It is more efficient, faster, and less network traffic-intensive | |
103 to use wildcards that have a non-wildcard object-name prefix, like: | |
104 | |
105 gs://bucket/abc*.txt | |
106 | |
107 than it is to use wildcards as the first part of the object name, like: | |
108 | |
109 gs://bucket/*abc.txt | |
110 | |
111 This is because the request for "gs://bucket/abc*.txt" asks the server to send | |
112 back the subset of results whose object name start with "abc" at the bucket | |
113 root, and then gsutil filters the result list for objects whose name ends with | |
114 ".txt". In contrast, "gs://bucket/*abc.txt" asks the server for the complete | |
115 list of objects in the bucket root, and then filters for those objects whose | |
116 name ends with "abc.txt". This efficiency consideration becomes increasingly | |
117 noticeable when you use buckets containing thousands or more objects. It is | |
118 sometimes possible to set up the names of your objects to fit with expected | |
119 wildcard matching patterns, to take advantage of the efficiency of doing | |
120 server-side prefix requests. See, for example "gsutil help prod" for a | |
121 concrete use case example. | |
122 | |
123 | |
124 <B>EFFICIENCY CONSIDERATION: USING MID-PATH WILDCARDS</B> | |
125 Suppose you have a bucket with these objects: | |
126 | |
127 gs://bucket/obj1 | |
128 gs://bucket/obj2 | |
129 gs://bucket/obj3 | |
130 gs://bucket/obj4 | |
131 gs://bucket/dir1/obj5 | |
132 gs://bucket/dir2/obj6 | |
133 | |
134 If you run the command: | |
135 | |
136 gsutil ls gs://bucket/*/obj5 | |
137 | |
138 gsutil will perform a /-delimited top-level bucket listing and then one bucket | |
139 listing for each subdirectory, for a total of 3 bucket listings: | |
140 | |
141 GET /bucket/?delimiter=/ | |
142 GET /bucket/?prefix=dir1/obj5&delimiter=/ | |
143 GET /bucket/?prefix=dir2/obj5&delimiter=/ | |
144 | |
145 The more bucket listings your wildcard requires, the slower and more expensive | |
146 it will be. The number of bucket listings required grows as: | |
147 | |
148 - the number of wildcard components (e.g., "gs://bucket/a??b/c*/*/d" | |
149 has 3 wildcard components); | |
150 - the number of subdirectories that match each component; and | |
151 - the number of results (pagination is implemented using one GET | |
152 request per 1000 results, specifying markers for each). | |
153 | |
154 If you want to use a mid-path wildcard, you might try instead using a | |
155 recursive wildcard, for example: | |
156 | |
157 gsutil ls gs://bucket/**/obj5 | |
158 | |
159 This will match more objects than "gs://bucket/*/obj5" (since it spans | |
160 directories), but is implemented using a delimiter-less bucket listing | |
161 request (which means fewer bucket requests, though it will list the entire | |
162 bucket and filter locally, so that could require a non-trivial amount of | |
163 network traffic). | |
164 """) | |
165 | |
166 | |
167 class CommandOptions(HelpProvider): | |
168 """Additional help about wildcards.""" | |
169 | |
170 # Help specification. See help_provider.py for documentation. | |
171 help_spec = HelpProvider.HelpSpec( | |
172 help_name='wildcards', | |
173 help_name_aliases=['wildcard', '*', '**'], | |
174 help_type='additional_help', | |
175 help_one_line_summary='Wildcard Names', | |
176 help_text=_DETAILED_HELP_TEXT, | |
177 subcommand_help_text={}, | |
178 ) | |
OLD | NEW |