OLD | NEW |
| (Empty) |
1 # -*- coding: utf-8 -*- | |
2 # Copyright 2012 Google Inc. All Rights Reserved. | |
3 # | |
4 # Licensed under the Apache License, Version 2.0 (the "License"); | |
5 # you may not use this file except in compliance with the License. | |
6 # You may obtain a copy of the License at | |
7 # | |
8 # http://www.apache.org/licenses/LICENSE-2.0 | |
9 # | |
10 # Unless required by applicable law or agreed to in writing, software | |
11 # distributed under the License is distributed on an "AS IS" BASIS, | |
12 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
13 # See the License for the specific language governing permissions and | |
14 # limitations under the License. | |
15 """Additional help about object versioning.""" | |
16 | |
17 from __future__ import absolute_import | |
18 | |
19 from gslib.help_provider import HelpProvider | |
20 | |
21 _DETAILED_HELP_TEXT = (""" | |
22 <B>OVERVIEW</B> | |
23 Versioning-enabled buckets maintain an archive of objects, providing a way to | |
24 un-delete data that you accidentally deleted, or to retrieve older versions of | |
25 your data. You can turn versioning on or off for a bucket at any time. Turning | |
26 versioning off leaves existing object versions in place, and simply causes the | |
27 bucket to stop accumulating new object versions. In this case, if you upload | |
28 to an existing object the current version is overwritten instead of creating | |
29 a new version. | |
30 | |
31 Regardless of whether you have enabled versioning on a bucket, every object | |
32 has two associated positive integer fields: | |
33 | |
34 - the generation, which is updated when the content of an object is | |
35 overwritten. | |
36 - the metageneration, which identifies the metadata generation. It starts | |
37 at 1; is updated every time the metadata (e.g., ACL or Content-Type) for a | |
38 given content generation is updated; and gets reset when the generation | |
39 number changes. | |
40 | |
41 Of these two integers, only the generation is used when working with versioned | |
42 data. Both generation and metageneration can be used with concurrency control | |
43 (discussed in a later section). | |
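As an illustration of the two counters described above, here is a toy in-memory model. It is only a sketch: the class `ToyObject` and its method names are hypothetical, and the caller supplies the generation values rather than the service assigning them.

```python
class ToyObject:
    """Toy model of one object's generation and metageneration (illustration only)."""

    def __init__(self, generation):
        # A newly written object starts at metageneration 1.
        self.generation = generation
        self.metageneration = 1

    def overwrite_content(self, new_generation):
        # Overwriting the content assigns a new, larger generation
        # and resets the metageneration.
        if new_generation <= self.generation:
            raise ValueError("generation must increase")
        self.generation = new_generation
        self.metageneration = 1

    def update_metadata(self):
        # A metadata change (e.g., ACL or Content-Type) bumps only the
        # metageneration; the content generation is unchanged.
        self.metageneration += 1
```

Calling `update_metadata()` twice on a fresh object leaves it at metageneration 3; a later `overwrite_content()` puts it back at 1 under the new generation.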
44 | |
45 To work with object versioning in gsutil, you can use a flavor of storage URIs | |
46 that embed the object generation, which we refer to as version-specific | |
47 URIs. For example, the version-less object URI: | |
48 | |
49 gs://bucket/object | |
50 | |
51 might have two versions, with these version-specific URIs: | |
52 | |
53 gs://bucket/object#1360383693690000 | |
54 gs://bucket/object#1360383802725000 | |
55 | |
56 The following sections discuss how to work with versioning and concurrency | |
57 control. | |
58 | |
59 | |
60 <B>OBJECT VERSIONING</B> | |
61 You can view, enable, and disable object versioning on a bucket using | |
62 the 'versioning get' and 'versioning set' commands. For example: | |
63 | |
64 gsutil versioning set on gs://bucket | |
65 | |
66 will enable versioning for the named bucket. See 'gsutil help versioning' | |
67 for additional details. | |
68 | |
69 To see all object versions in a versioning-enabled bucket along with | |
70 their generation.metageneration information, use gsutil ls -a: | |
71 | |
72 gsutil ls -a gs://bucket | |
73 | |
74 You can also specify particular objects for which you want to find the | |
75 version-specific URI(s), or you can use wildcards: | |
76 | |
77 gsutil ls -a gs://bucket/object1 gs://bucket/images/*.jpg | |
78 | |
79 The generation values form a monotonically increasing sequence as you create | |
80 additional object versions. Because of this, the latest object version is | |
81 always the last one listed in the gsutil ls output for a particular object. | |
82 For example, if a bucket contains these three versions of gs://bucket/object: | |
83 | |
84 gs://bucket/object#1360035307075000 | |
85 gs://bucket/object#1360101007329000 | |
86 gs://bucket/object#1360102216114000 | |
87 | |
88 then gs://bucket/object#1360102216114000 is the latest version and | |
89 gs://bucket/object#1360035307075000 is the oldest available version. | |
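A script that post-processes gsutil ls -a output could pick the latest and oldest versions by comparing generations directly. The sketch below assumes the URI format shown above (a trailing `#` followed by a decimal generation). The helper names `split_version_uri`, `latest_version`, and `oldest_version` are hypothetical and not part of gsutil.

```python
def split_version_uri(uri):
    """Split 'gs://bucket/object#123' into ('gs://bucket/object', 123)."""
    base, sep, generation = uri.rpartition("#")
    if not sep or not generation.isdigit():
        raise ValueError("not a version-specific URI: %r" % uri)
    return base, int(generation)


def latest_version(uris):
    """Return the version-specific URI with the largest generation."""
    return max(uris, key=lambda u: split_version_uri(u)[1])


def oldest_version(uris):
    """Return the version-specific URI with the smallest generation."""
    return min(uris, key=lambda u: split_version_uri(u)[1])
```

Comparing the generations as integers avoids relying on the order in which the listing lines happen to be read.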
90 | |
91 If you specify version-less URIs with gsutil, you will operate on the | |
92 latest not-deleted version of an object, for example: | |
93 | |
94 gsutil cp gs://bucket/object ./dir | |
95 | |
96 or: | |
97 | |
98 gsutil rm gs://bucket/object | |
99 | |
100 To operate on a specific object version, use a version-specific URI. | |
101 For example, suppose the output of the above gsutil ls -a command is: | |
102 | |
103 gs://bucket/object#1360035307075000 | |
104 gs://bucket/object#1360101007329000 | |
105 | |
106 In this case, the command: | |
107 | |
108 gsutil cp gs://bucket/object#1360035307075000 ./dir | |
109 | |
110 will retrieve the second most recent version of the object. | |
111 | |
112 Note that version-specific URIs cannot be the target of the gsutil cp | |
113 command (trying to do so will result in an error), because writing to a | |
114 versioned object always creates a new version. | |
115 | |
116 If an object has been deleted, it will not show up in a normal gsutil ls | |
117 listing (i.e., ls without the -a option). You can restore a deleted object by | |
118 running gsutil ls -a to find the available versions, and then copying one of | |
119 the version-specific URIs to the version-less URI, for example: | |
120 | |
121 gsutil cp gs://bucket/object#1360101007329000 gs://bucket/object | |
122 | |
123 Note that when you do this it creates a new object version, which will incur | |
124 additional charges. You can get rid of the extra copy by deleting the older | |
125 version-specific object: | |
126 | |
127 gsutil rm gs://bucket/object#1360101007329000 | |
128 | |
129 Or you can combine the two steps by using the gsutil mv command: | |
130 | |
131 gsutil mv gs://bucket/object#1360101007329000 gs://bucket/object | |
132 | |
133 If you want to remove all versions of an object, use the gsutil rm -a option: | |
134 | |
135 gsutil rm -a gs://bucket/object | |
136 | |
137 Note that there is no limit to the number of older versions of an object you | |
138 will create if you continue to upload to the same object in a versioning- | |
139 enabled bucket. It is your responsibility to delete versions beyond the ones | |
140 you want to retain. | |
141 | |
142 | |
143 <B>COPYING VERSIONED BUCKETS</B> | |
144 You can copy data between two versioned buckets, using a command like: | |
145 | |
146 gsutil cp -r gs://bucket1/* gs://bucket2 | |
147 | |
148 When run using versioned buckets, this command will cause every object version | |
149 to be copied. The copies made in gs://bucket2 will have different generation | |
150 numbers (since a new generation is assigned when the object copy is made), | |
151 but the object sort order will remain consistent. For example, gs://bucket1 | |
152 might contain: | |
153 | |
154 % gsutil ls -la gs://bucket1 | |
155 53 2013-02-02T22:30:57Z gs://bucket1/file#1359844257574000 metageneration=1 | |
156 12 2013-02-02T22:30:57Z gs://bucket1/file#1359844257615000 metageneration=1 | |
157 97 2013-02-02T22:30:57Z gs://bucket1/file#1359844257665000 metageneration=1 | |
158 | |
159 and after the copy, gs://bucket2 might contain: | |
160 | |
161 % gsutil ls -la gs://bucket2 | |
162 53 2013-06-06T02:33:11Z gs://bucket2/file#1370485991580000 metageneration=1 | |
163 12 2013-06-06T02:33:14Z gs://bucket2/file#1370485994328000 metageneration=1 | |
164 97 2013-06-06T02:33:17Z gs://bucket2/file#1370485997376000 metageneration=1 | |
165 | |
166 Note that the object versions are in the same order (as can be seen by the | |
167 same sequence of sizes in both listings), but the generation numbers (and | |
168 timestamps) are newer in gs://bucket2. | |
169 | |
170 WARNING: If you use the gsutil -m option when copying the objects (to copy the | |
171 data in parallel), object version ordering will NOT be preserved. All object | |
172 versions will be copied, but (for example) the latest/live version in the | |
173 destination bucket might be from one of the earlier versions in the source | |
174 bucket (and similarly, other versions may be out of order). When copying | |
175 versioned data it is advisable not to use the gsutil -m option. | |
176 | |
177 | |
178 <B>CONCURRENCY CONTROL</B> | |
179 If you are building an application using Google Cloud Storage, you may need to | |
180 be careful about concurrency control. Normally gsutil itself isn't used for | |
181 this purpose, but it's possible to write scripts around gsutil that perform | |
182 concurrency control. | |
183 | |
184 For example, suppose you want to implement a "rolling update" system using | |
185 gsutil, where a periodic job computes some data and uploads it to the cloud. | |
186 On each run, the job starts with the data that it computed from the last run, and | |
187 computes a new value. To make this system robust, you need to have multiple | |
188 machines on which the job can run, which raises the possibility that two | |
189 simultaneous runs could attempt to update an object at the same time. This | |
190 leads to the following potential race condition: | |
191 | |
192 - job 1 computes the new value to be written | |
193 - job 2 computes the new value to be written | |
194 - job 2 writes the new value | |
195 - job 1 writes the new value | |
196 | |
197 In this case, the value that job 1 read is no longer current by the time | |
198 it goes to write the updated object, and writing at this point would result | |
199 in stale (or, depending on the application, corrupt) data. | |
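The interleaving above can be reproduced in a small in-memory model. The model also shows why a write conditioned on the generation that was read rejects the stale update. This is only a sketch: `ToyStore`, `PreconditionFailed`, and `simulate_race` are hypothetical names, and the store bumps a simple counter instead of assigning real generation values.

```python
class PreconditionFailed(Exception):
    """Raised when the live generation differs from the expected one."""


class ToyStore:
    """Toy single-object store with an optional generation precondition."""

    def __init__(self, value):
        self.value = value
        self.generation = 1

    def read(self):
        return self.value, self.generation

    def write(self, value, if_generation_match=None):
        # Reject the write if the caller's view of the object is stale.
        if (if_generation_match is not None
                and if_generation_match != self.generation):
            raise PreconditionFailed("live generation is %d" % self.generation)
        self.value = value
        self.generation += 1


def simulate_race():
    store = ToyStore(10)
    value1, gen1 = store.read()  # job 1 reads and computes
    value2, gen2 = store.read()  # job 2 reads and computes
    store.write(value2 + 1, if_generation_match=gen2)  # job 2 writes first
    try:
        store.write(value1 + 1, if_generation_match=gen1)  # job 1 is stale
        job1_rejected = False
    except PreconditionFailed:
        job1_rejected = True
    return store.value, store.generation, job1_rejected
```

In this model job 1's write is rejected, so job 1 would have to re-read the object and recompute before trying again.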
200 | |
201 To prevent this, you can find the version-specific name of the object that was | |
202 created, and then use the information contained in that URI to specify an | |
203 x-goog-if-generation-match header on a subsequent gsutil cp command. You can | |
204 do this in two steps. First, use the gsutil cp -v option at upload time to get | |
205 the version-specific name of the object that was created, for example: | |
206 | |
207 gsutil cp -v file gs://bucket/object | |
208 | |
209 might output: | |
210 | |
211 Created: gs://bucket/object#1360432179236000 | |
212 | |
213 You can extract the generation value from this URI and then construct a | |
214 subsequent gsutil command like this: | |
215 | |
216 gsutil -h x-goog-if-generation-match:1360432179236000 cp newfile \\ | |
217 gs://bucket/object | |
218 | |
219 This command asks Google Cloud Storage to attempt to upload newfile | |
220 but to fail the request if the generation of gs://bucket/object that is live | |
221 at the time of the upload does not match that specified. | |
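A wrapper script could automate the two steps roughly as follows. This is a sketch under stated assumptions: it expects a line in the `Created: <version-specific URI>` form shown above, and it only builds the argument list without running gsutil. The function names `generation_from_created_line` and `conditional_cp_command` are hypothetical.

```python
def generation_from_created_line(line):
    """Extract the generation from a 'Created: gs://...#<generation>' line."""
    prefix = "Created: "
    if not line.startswith(prefix):
        raise ValueError("unexpected line: %r" % line)
    uri = line[len(prefix):].strip()
    _, sep, generation = uri.rpartition("#")
    if not sep or not generation.isdigit():
        raise ValueError("no generation in: %r" % uri)
    return int(generation)


def conditional_cp_command(generation, src, dst):
    """Build a gsutil cp command that only succeeds if dst is at `generation`."""
    header = "x-goog-if-generation-match:%d" % generation
    return ["gsutil", "-h", header, "cp", src, dst]
```

The returned list can be handed to a process launcher such as `subprocess.run`. Deciding what to do when the precondition fails (for example, re-reading the object and recomputing) is left to the surrounding script.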
222 | |
223 If the command you use updates object metadata, you will need to find the | |
224 current metageneration for an object. To do this, use the gsutil ls -a and | |
225 -l options. For example, the command: | |
226 | |
227 gsutil ls -l -a gs://bucket/object | |
228 | |
229 will output something like: | |
230 | |
231 64 2013-02-12T19:59:13Z gs://bucket/object#1360699153986000 metageneration=3 | |
232 1521 2013-02-13T02:04:08Z gs://bucket/object#1360721048778000 metageneration=2 | |
233 | |
234 Given this information, you could use the following command to request setting | |
235 the ACL on the older version of the object, such that the command will fail | |
236 unless that is the current version of the data+metadata: | |
237 | |
238 gsutil -h x-goog-if-generation-match:1360699153986000 -h \\ | |
239 x-goog-if-metageneration-match:3 acl set public-read \\ | |
240 gs://bucket/object#1360699153986000 | |
241 | |
242 Without adding these headers, the update would simply overwrite the existing | |
243 ACL. Note that in contrast, the "gsutil acl ch" command uses these headers | |
244 automatically, because it performs a read-modify-write cycle in order to edit | |
245 ACLs. | |
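The guarded acl set command above can be assembled from a listing line with a little parsing. The sketch below assumes each gsutil ls -l -a line has exactly the four whitespace-separated fields shown earlier (size, timestamp, version-specific URI, metageneration=N), so it would not handle object names containing spaces. The helper names `parse_long_listing_line` and `guarded_acl_set_command` are hypothetical.

```python
def parse_long_listing_line(line):
    """Parse one 'gsutil ls -l -a' line in the four-field format shown above."""
    size, timestamp, uri, meta = line.split()
    _, sep, generation = uri.rpartition("#")
    key, _, metageneration = meta.partition("=")
    if (not sep or not generation.isdigit()
            or key != "metageneration" or not metageneration.isdigit()):
        raise ValueError("unexpected listing line: %r" % line)
    return {
        "uri": uri,
        "generation": int(generation),
        "metageneration": int(metageneration),
    }


def guarded_acl_set_command(entry, canned_acl):
    """Build an 'acl set' command guarded by both preconditions."""
    return [
        "gsutil",
        "-h", "x-goog-if-generation-match:%d" % entry["generation"],
        "-h", "x-goog-if-metageneration-match:%d" % entry["metageneration"],
        "acl", "set", canned_acl, entry["uri"],
    ]
```

As with the upload case, this only builds the argument list; running it and handling a failed precondition is up to the calling script.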
246 | |
247 If you want to experiment with how generations and metagenerations work, try | |
248 the following. First, upload an object; then use gsutil ls -l -a to list all | |
249 versions of the object, along with each version's metageneration; then re- | |
250 upload the object and repeat the gsutil ls -l -a. You should see two object | |
251 versions, each with metageneration=1. Now try setting the ACL, and rerun the | |
252 gsutil ls -l -a. You should see the most recent object generation now has | |
253 metageneration=2. | |
254 | |
255 | |
256 <B>FOR MORE INFORMATION</B> | |
257 For more details on how to use versioning and preconditions, see | |
258 https://developers.google.com/storage/docs/object-versioning | |
259 """) | |
260 | |
261 | |
262 class CommandOptions(HelpProvider): | |
263 """Additional help about object versioning.""" | |
264 | |
265 # Help specification. See help_provider.py for documentation. | |
266 help_spec = HelpProvider.HelpSpec( | |
267 help_name='versions', | |
268 help_name_aliases=['concurrency', 'concurrency control'], | |
269 help_type='additional_help', | |
270 help_one_line_summary='Object Versioning and Concurrency Control', | |
271 help_text=_DETAILED_HELP_TEXT, | |
272 subcommand_help_text={}, | |
273 ) | |