Chromium Code Reviews

Side by Side Diff: third_party/gsutil/gslib/addlhelp/wildcards.py

Issue 12042069: Scripts to download files from google storage based on sha1 sums (Closed) Base URL: https://chromium.googlesource.com/chromium/tools/depot_tools.git@master
Patch Set: Removed gsutil/tests and gsutil/docs Created 7 years, 10 months ago
OLD: (Empty)
NEW:
1 # Copyright 2012 Google Inc.
2 #
3 # Licensed under the Apache License, Version 2.0 (the "License");
4 # you may not use this file except in compliance with the License.
5 # You may obtain a copy of the License at
6 #
7 # http://www.apache.org/licenses/LICENSE-2.0
8 #
9 # Unless required by applicable law or agreed to in writing, software
10 # distributed under the License is distributed on an "AS IS" BASIS,
11 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 # See the License for the specific language governing permissions and
13 # limitations under the License.
14
15 from gslib.help_provider import HELP_NAME
16 from gslib.help_provider import HELP_NAME_ALIASES
17 from gslib.help_provider import HELP_ONE_LINE_SUMMARY
18 from gslib.help_provider import HelpProvider
19 from gslib.help_provider import HELP_TEXT
20 from gslib.help_provider import HelpType
21 from gslib.help_provider import HELP_TYPE
22
23 _detailed_help_text = ("""
24 <B>DESCRIPTION</B>
25 gsutil supports URI wildcards. For example, the command:
26
27 gsutil cp gs://bucket/data/abc* .
28
29 will copy all objects that start with gs://bucket/data/abc followed by any
30 number of characters within that subdirectory.
31
32 Note: Wildcards are disallowed when using the "-v" flag to specify an object
33 version.
34
35
36 <B>DIRECTORY BY DIRECTORY VS RECURSIVE WILDCARDS</B>
37 The "*" wildcard only matches up to the end of a path within
38 a subdirectory. For example, if bucket contains objects
39 named gs://bucket/data/abcd, gs://bucket/data/abcdef,
40 and gs://bucket/data/abcxyx, as well as an object in a subdirectory
41 (gs://bucket/data/abc/def), the above gsutil cp command would match the
42 first three object names but not the last one.
43
44 If you want matches to span directory boundaries, use a '**' wildcard:
45
46 gsutil cp gs://bucket/data/abc** .
47
48 will match all four objects above.
49
50 Note that gsutil supports the same wildcards for both objects and file names.
51 Thus, for example:
52
53 gsutil cp data/abc* gs://bucket
54
55 will match all local file names that start with data/abc. Most command shells
56 also support wildcarding, so if you run the above command your shell is
57 probably expanding the matches before running gsutil. However, most shells do not
58 support recursive wildcards ('**'), and you can cause gsutil's wildcarding
59 support to work for such shells by single-quoting the arguments so they
60 don't get interpreted by the shell before being passed to gsutil:
61
62 gsutil cp 'data/abc**' gs://bucket
63
64
65 <B>BUCKET WILDCARDS</B>
66 You can specify wildcards for bucket names. For example:
67
68 gsutil ls gs://data*.example.com
69
70 will list the contents of all buckets whose name starts with "data" and
71 ends with ".example.com".
72
73 You can also combine bucket and object name wildcards. For example, this
74 command will remove all ".txt" files in any of your Google Cloud Storage
75 buckets:
76
77 gsutil rm gs://*/**.txt
78
79
80 <B>OTHER WILDCARD CHARACTERS</B>
81 In addition to '*', you can use these wildcards:
82
83 ? Matches a single character. For example "gs://bucket/??.txt"
84 only matches objects with two characters followed by .txt.
85
86 [chars] Matches any one of the specified characters. For example
87 "gs://bucket/[aeiou].txt" matches objects whose names consist of a single
88 vowel followed by .txt.
89
90 [char range] Matches any one character in the given range. For example
91 "gs://bucket/[a-m].txt" matches objects whose names consist of a single
92 letter from a through m followed by .txt.
93
94 You can combine wildcards to provide more powerful matches, for example:
95 gs://bucket/[a-m]??.j*g
96
97
98 <B>EFFICIENCY CONSIDERATION: USING WILDCARDS OVER MANY OBJECTS</B>
99 It is more efficient, faster, and less network traffic-intensive
100 to use wildcards that have a non-wildcard object-name prefix, like:
101
102 gs://bucket/abc*.txt
103
104 than it is to use wildcards as the first part of the object name, like:
105
106 gs://bucket/*abc.txt
107
108 This is because the request for "gs://bucket/abc*.txt" asks the server
109 to send back the subset of results whose object names start with "abc",
110 and then gsutil filters the result list for objects whose name ends with
111 ".txt". In contrast, "gs://bucket/*abc.txt" asks the server for the complete
112 list of objects in the bucket and then filters for those objects whose name
113 ends with "abc.txt". This efficiency consideration becomes increasingly
114 noticeable when you use buckets containing thousands or more objects. It is
115 sometimes possible to set up the names of your objects to fit with expected
116 wildcard matching patterns, to take advantage of the efficiency of doing
117 server-side prefix requests. See "gsutil help prod" for a concrete
118 use case.
119
120
121 <B>EFFICIENCY CONSIDERATION: USING MID-PATH WILDCARDS</B>
122 Suppose you have a bucket with these objects:
123 gs://bucket/obj1
124 gs://bucket/obj2
125 gs://bucket/obj3
126 gs://bucket/obj4
127 gs://bucket/dir1/obj5
128 gs://bucket/dir2/obj6
129
130 If you run the command:
131 gsutil ls gs://bucket/*/obj5
132 gsutil will perform a /-delimited top-level bucket listing and then one bucket
133 listing for each subdirectory, for a total of 3 bucket listings:
134 GET /bucket/?delimiter=/
135 GET /bucket/?prefix=dir1/obj5&delimiter=/
136 GET /bucket/?prefix=dir2/obj5&delimiter=/
137
138 The more bucket listings your wildcard requires, the slower and more expensive
139 it will be. The number of bucket listings required grows as:
140 - the number of wildcard components (e.g., "gs://bucket/a??b/c*/*/d"
141 has 3 wildcard components);
142 - the number of subdirectories that match each component; and
143 - the number of results (pagination is implemented using one GET
144 request per 1000 results, specifying markers for each).
145
146 If you want to use a mid-path wildcard, you might try instead using a
147 recursive wildcard, for example:
148
149 gsutil ls gs://bucket/**/obj5
150
151 This will match more objects than gs://bucket/*/obj5 (since it spans
152 directories), but is implemented using a delimiter-less bucket listing
153 request (which means fewer bucket requests, though it will list the entire
154 bucket and filter locally, so that could require a non-trivial amount of
155 network traffic).
156 """)
157
158
159 class CommandOptions(HelpProvider):
160 """Additional help about wildcards."""
161
162 help_spec = {
163 # Name of command or auxiliary help info for which this help applies.
164 HELP_NAME : 'wildcards',
165 # List of help name aliases.
166 HELP_NAME_ALIASES : ['wildcard', '*', '**'],
167 # Type of help:
168 HELP_TYPE : HelpType.ADDITIONAL_HELP,
169 # One line summary of this help.
170 HELP_ONE_LINE_SUMMARY : 'Wildcard support',
171 # The full help text.
172 HELP_TEXT : _detailed_help_text,
173 }
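
The matching rules described in the help text above can be sketched as a small translator from wildcard patterns to regular expressions. This is an illustration only, not gsutil's actual implementation: the function names `wildcard_to_regex` and `server_side_prefix` are hypothetical, and the sketch assumes that `?` (like `*`) does not match a `/`, and that bracket expressions can be passed to the regex engine unchanged.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a gsutil-style wildcard into a compiled regex (sketch only)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith('**', i):
            # Recursive wildcard: may span directory boundaries.
            parts.append('.*')
            i += 2
        elif ch == '*':
            # Single-level wildcard: stops at the next '/'.
            parts.append('[^/]*')
            i += 1
        elif ch == '?':
            # Assumption: '?' matches one character other than '/'.
            parts.append('[^/]')
            i += 1
        elif ch == '[':
            end = pattern.find(']', i + 1)
            if end == -1:
                # Unterminated bracket: treat '[' as a literal character.
                parts.append(re.escape(ch))
                i += 1
            else:
                # Assumption: simple classes such as [aeiou] or [a-m] only.
                parts.append(pattern[i:end + 1])
                i = end + 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile(''.join(parts) + r'\Z')


def server_side_prefix(object_pattern):
    """Return the literal text that precedes the first wildcard character."""
    for i, ch in enumerate(object_pattern):
        if ch in '*?[':
            return object_pattern[:i]
    return object_pattern


names = ['data/abcd', 'data/abcdef', 'data/abcxyx', 'data/abc/def']
single = wildcard_to_regex('data/abc*')
recursive = wildcard_to_regex('data/abc**')
print([n for n in names if single.match(n)])
print([n for n in names if recursive.match(n)])
print(repr(server_side_prefix('abc*.txt')), repr(server_side_prefix('*abc.txt')))
```

Under these assumptions, `data/abc*` selects the first three names and `data/abc**` selects all four, mirroring the example in the help text. The empty prefix returned for `*abc.txt` corresponds to the case where the whole bucket must be listed and filtered locally.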