Side by Side Diff: third_party/gsutil/gslib/addlhelp/wildcards.py

Issue 1377933002: [catapult] - Copy Telemetry's gsutilz over to third_party. (Closed) Base URL: https://github.com/catapult-project/catapult.git@master
Patch Set: Rename to gsutil. Created 5 years, 2 months ago
# -*- coding: utf-8 -*-
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Additional help about wildcards."""

from __future__ import absolute_import

from gslib.help_provider import HelpProvider

_DETAILED_HELP_TEXT = ("""
<B>DESCRIPTION</B>
  gsutil supports URI wildcards. For example, the command:

    gsutil cp gs://bucket/data/abc* .

  will copy all objects that start with gs://bucket/data/abc followed by any
  number of characters within that subdirectory.


<B>DIRECTORY BY DIRECTORY VS RECURSIVE WILDCARDS</B>
  The "*" wildcard only matches up to the end of a path within
  a subdirectory. For example, if bucket contains objects
  named gs://bucket/data/abcd, gs://bucket/data/abcdef,
  and gs://bucket/data/abcxyx, as well as an object in a subdirectory
  (gs://bucket/data/abc/def), the above gsutil cp command would match the
  first 3 object names but not the last one.

  If you want matches to span directory boundaries, use a '**' wildcard:

    gsutil cp gs://bucket/data/abc** .

  will match all four objects above.

  Note that gsutil supports the same wildcards for both object and file names.
  Thus, for example:

    gsutil cp data/abc* gs://bucket

  will match all local files in the data directory whose names start with
  abc. Most command shells also support wildcarding, so if you run the above
  command, your shell is probably expanding the matches before running gsutil.
  However, most shells do not support recursive wildcards ('**'). To have
  gsutil perform the wildcard expansion in such shells, single-quote the
  arguments so the shell doesn't interpret them before passing them to gsutil:

    gsutil cp 'data/abc**' gs://bucket


<B>BUCKET WILDCARDS</B>
  You can specify wildcards for bucket names within a single project. For
  example:

    gsutil ls gs://data*.example.com

  will list the contents of all buckets whose name starts with "data" and
  ends with ".example.com" in the default project. The -p option can be used
  to specify a project other than the default. For example:

    gsutil ls -p other-project gs://data*.example.com

  You can also combine bucket and object name wildcards. For example, this
  command will remove all ".txt" files in any of your Google Cloud Storage
  buckets in the default project:

    gsutil rm gs://*/**.txt


<B>OTHER WILDCARD CHARACTERS</B>
  In addition to '*', you can use these wildcards:

  ?
    Matches a single character. For example, "gs://bucket/??.txt"
    only matches objects whose names consist of two characters followed
    by .txt.

  [chars]
    Matches any of the specified characters. For example,
    "gs://bucket/[aeiou].txt" matches objects whose names consist of a
    single vowel followed by .txt.

  [char range]
    Matches any of the range of characters. For example,
    "gs://bucket/[a-m].txt" matches objects whose names consist of a single
    letter a, b, c, ... or m, followed by .txt.

  You can combine wildcards to provide more powerful matches, for example:

    gs://bucket/[a-m]??.j*g


<B>EFFICIENCY CONSIDERATION: USING WILDCARDS OVER MANY OBJECTS</B>
  It is more efficient, faster, and less network traffic-intensive
  to use wildcards that have a non-wildcard object-name prefix, like:

    gs://bucket/abc*.txt

  than it is to use wildcards as the first part of the object name, like:

    gs://bucket/*abc.txt

  This is because the request for "gs://bucket/abc*.txt" asks the server to send
  back the subset of results whose object names start with "abc" at the bucket
  root, and then gsutil filters the result list for objects whose names end with
  ".txt". In contrast, "gs://bucket/*abc.txt" asks the server for the complete
  list of objects in the bucket root, and then filters for those objects whose
  names end with "abc.txt". This efficiency consideration becomes increasingly
  noticeable when you use buckets containing thousands or more objects. It is
  sometimes possible to set up the names of your objects to fit with expected
  wildcard matching patterns, to take advantage of the efficiency of doing
  server-side prefix requests. See "gsutil help prod" for a concrete use case.


<B>EFFICIENCY CONSIDERATION: USING MID-PATH WILDCARDS</B>
  Suppose you have a bucket with these objects:

    gs://bucket/obj1
    gs://bucket/obj2
    gs://bucket/obj3
    gs://bucket/obj4
    gs://bucket/dir1/obj5
    gs://bucket/dir2/obj6

  If you run the command:

    gsutil ls gs://bucket/*/obj5

  gsutil will perform a /-delimited top-level bucket listing and then one bucket
  listing for each subdirectory, for a total of 3 bucket listings:

    GET /bucket/?delimiter=/
    GET /bucket/?prefix=dir1/obj5&delimiter=/
    GET /bucket/?prefix=dir2/obj5&delimiter=/

  The more bucket listings your wildcard requires, the slower and more expensive
  it will be. The number of bucket listings required grows as:

  - the number of wildcard components (e.g., "gs://bucket/a??b/c*/*/d"
    has 3 wildcard components);
  - the number of subdirectories that match each component; and
  - the number of results (pagination is implemented using one GET
    request per 1000 results, specifying markers for each).

  If you want to use a mid-path wildcard, you might try instead using a
  recursive wildcard, for example:

    gsutil ls gs://bucket/**/obj5

  This will match more objects than "gs://bucket/*/obj5" (since it spans
  directories), but is implemented using a delimiter-less bucket listing
  request (which means fewer bucket requests, though it will list the entire
  bucket and filter locally, so that could require a non-trivial amount of
  network traffic).
""")


class CommandOptions(HelpProvider):
  """Additional help about wildcards."""

  # Help specification. See help_provider.py for documentation.
  help_spec = HelpProvider.HelpSpec(
      help_name='wildcards',
      help_name_aliases=['wildcard', '*', '**'],
      help_type='additional_help',
      help_one_line_summary='Wildcard Names',
      help_text=_DETAILED_HELP_TEXT,
      subcommand_help_text={},
  )
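
The matching rules described in the help text above can be illustrated with a small standalone sketch. It is not gsutil's implementation: the helper names (wildcard_to_regex, literal_prefix, match_names) are hypothetical, the object names are written without the gs://bucket/ part, and the choice that '?' does not match '/' is an assumption made for this sketch only. It shows how '*' stops at a '/' while '**' spans directories, and how the non-wildcard prefix that a server-side prefix request could use might be extracted from a pattern.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regex (illustrative only)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith('**', i):
            parts.append('.*')        # '**' spans directory boundaries
            i += 2
        elif ch == '*':
            parts.append('[^/]*')     # '*' stops at the next '/'
            i += 1
        elif ch == '?':
            parts.append('[^/]')      # assumption: '?' does not match '/'
            i += 1
        elif ch == '[':
            # Require at least one character between the brackets.
            end = pattern.find(']', i + 2)
            if end == -1:
                parts.append(re.escape(ch))       # unmatched '[' is literal
                i += 1
            else:
                parts.append(pattern[i:end + 1])  # pass [chars] / [a-m] through
                i = end + 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile(''.join(parts) + r'\Z')


def literal_prefix(pattern):
    """Return the part of the pattern before the first wildcard character."""
    match = re.search(r'[*?\[]', pattern)
    return pattern if match is None else pattern[:match.start()]


def match_names(pattern, names):
    """Return the names that the wildcard pattern matches in full."""
    regex = wildcard_to_regex(pattern)
    return [name for name in names if regex.match(name)]


# Object names from the help text, relative to the bucket root.
OBJECTS = ['data/abcd', 'data/abcdef', 'data/abcxyx', 'data/abc/def']

print(match_names('data/abc*', OBJECTS))    # first three names only
print(match_names('data/abc**', OBJECTS))   # all four names
print(repr(literal_prefix('abc*.txt')))     # 'abc'
print(repr(literal_prefix('*abc.txt')))     # ''
```

An empty literal prefix, as for "*abc.txt", corresponds to the case where the complete listing must be fetched and filtered locally, while a non-empty prefix such as "abc" is what allows the listing to be narrowed on the server side.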