Chromium Code Reviews

Side by Side Diff: third_party/gsutil/gslib/addlhelp/prod.py

Issue 1380943003: Roll version of gsutil to 4.15. (Closed) Base URL: https://github.com/catapult-project/catapult.git@master
Patch Set: rebase Created 5 years ago
# -*- coding: utf-8 -*-
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
(...skipping 17 matching lines...)


<B>BACKGROUND ON RESUMABLE TRANSFERS</B>
First, it's helpful to understand gsutil's resumable transfer mechanism,
and how your script needs to be implemented around this mechanism to work
reliably. gsutil uses resumable transfer support when you attempt to upload
or download a file larger than a configurable threshold (by default, this
threshold is 2 MiB). When a transfer fails partway through (e.g., because of
an intermittent network problem), gsutil uses a truncated randomized binary
exponential backoff-and-retry strategy that by default will retry transfers up
-to 6 times over a 63 second period of time (see "gsutil help retries" for
+to 23 times over a 10 minute period of time (see "gsutil help retries" for
details). If the transfer fails each of these attempts with no intervening
progress, gsutil gives up on the transfer, but keeps a "tracker" file for
it in a configurable location (the default location is ~/.gsutil/, in a file
named by a combination of the SHA1 hash of the name of the bucket and object
being transferred and the last 16 characters of the file name). When transfers
fail in this fashion, you can rerun gsutil at some later time (e.g., after
the networking problem has been resolved), and the resumable transfer picks
up where it left off.

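As a rough illustration of the retry schedule described above (this sketch is not part of prod.py and is not gsutil's actual implementation), a truncated randomized binary exponential backoff whose per-attempt wait is capped at 32 seconds reproduces both figures quoted in this help text: 6 retries fit within about 63 seconds, and 23 retries within about 10 minutes. The 32-second cap and the uniform-random draw are assumptions made for the example.

import random

# Illustrative only -- gsutil's real retry code may differ. Assumes each wait
# is drawn uniformly from [0, min(2**attempt, cap)] seconds.
def backoff_waits(num_retries=23, cap=32.0):
  """Return randomized wait times (in seconds) for each successive retry."""
  return [random.uniform(0, min(2 ** attempt, cap))
          for attempt in range(num_retries)]

# Worst case: 1 + 2 + 4 + 8 + 16 + 18 * 32 = 607 seconds (~10 minutes) for 23
# retries; with the old default of 6 retries it is 1 + 2 + 4 + 8 + 16 + 32 = 63s.
print(int(sum(min(2 ** a, 32.0) for a in range(23))))  # 607
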
<B>SCRIPTING DATA TRANSFER TASKS</B>
To script large production data transfer tasks around this mechanism,
you can implement a script that runs periodically, determines which file
transfers have not yet succeeded, and runs gsutil to copy them. Below,
we offer a number of suggestions about how this type of scripting should
be implemented:

-1. When resumable transfers fail without any progress 6 times in a row
-over the course of up to 63 seconds, it probably won't work to simply
+1. When resumable transfers fail without any progress 23 times in a row
+over the course of up to 10 minutes, it probably won't work to simply
retry the transfer immediately. A more successful strategy would be to
have a cron job that runs every 30 minutes, determines which transfers
need to be run, and runs them. If the network experiences intermittent
problems, the script picks up where it left off and will eventually
succeed (once the network problem has been resolved).

2. If your business depends on timely data transfer, you should consider
implementing some network monitoring. For example, you can implement
a task that attempts a small download every few minutes and raises an
alert if the attempt fails for several attempts in a row (or more or less
(...skipping 32 matching lines...)
be done.

4. If you have really large numbers of objects in a single bucket
(say hundreds of thousands or more), you should consider tracking your
objects in a database instead of using bucket listings to enumerate
the objects. For example this database could track the state of your
downloads, so you can determine what objects need to be downloaded by
your periodic download script by querying the database locally instead
of performing a bucket listing.

-5. Make sure you don't delete partially downloaded files after a transfer
-fails: gsutil picks up where it left off (and performs an MD5 check of
-the final downloaded content to ensure data integrity), so deleting
+5. Make sure you don't delete partially downloaded temporary files after a
+transfer fails: gsutil picks up where it left off (and performs a hash
+of the final downloaded content to ensure data integrity), so deleting
partially transferred files will cause you to lose progress and make
-more wasteful use of your network. You should also make sure whatever
-process is waiting to consume the downloaded data doesn't get pointed
-at the partially downloaded files. One way to do this is to download
-into a staging directory and then move successfully downloaded files to
-a directory where consumer processes will read them.
+more wasteful use of your network.

6. If you have a fast network connection, you can speed up the transfer of
large numbers of files by using the gsutil -m (multi-threading /
multi-processing) option. Be aware, however, that gsutil doesn't attempt to
keep track of which files were downloaded successfully in cases where some
files failed to download. For example, if you use multi-threaded transfers
to download 100 files and 3 failed to download, it is up to your scripting
process to determine which transfers didn't succeed, and retry them. A
periodic check-and-run approach like outlined earlier would handle this
case.

If you use parallel transfers (gsutil -m) you might want to experiment with
the number of threads being used (via the parallel_thread_count setting
in the .boto config file). By default, gsutil uses 10 threads for Linux
and 24 threads for other operating systems. Depending on your network
speed, available memory, CPU load, and other conditions, this may or may
not be optimal. Try experimenting with higher or lower numbers of threads
to find the best number of threads for your environment.
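The following sketch makes the periodic check-and-run approach from suggestions 1 and 6 above concrete. It is illustrative only and not part of the file under review: the bucket, manifest, and destination paths are placeholders, and a real script should confirm completeness (for example by comparing sizes or hashes) rather than trusting that a file merely exists.

#!/usr/bin/env python
# Hypothetical cron-driven check-and-run downloader (illustrative sketch).
# BUCKET, MANIFEST, and DEST_DIR are placeholder values, not real paths.
import os
import subprocess

BUCKET = 'gs://example-bucket'
MANIFEST = '/path/to/manifest.txt'   # one object name per line
DEST_DIR = '/path/to/downloads'


def pending_objects():
  """Return manifest entries with no corresponding local file yet.

  A production script should also verify completeness (e.g., compare sizes
  or hashes) rather than trusting bare existence.
  """
  with open(MANIFEST) as f:
    wanted = [line.strip() for line in f if line.strip()]
  return [name for name in wanted
          if not os.path.exists(os.path.join(DEST_DIR, name))]


def main():
  pending = pending_objects()
  if not pending:
    return
  # -m parallelizes the copies; gsutil's resumable-transfer support continues
  # (rather than restarts) any partially downloaded files on the next run.
  cmd = (['gsutil', '-m', 'cp'] +
         ['%s/%s' % (BUCKET, name) for name in pending] +
         [DEST_DIR])
  subprocess.call(cmd)


if __name__ == '__main__':
  main()
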
-
-<B>RUNNING GSUTIL ON MULTIPLE MACHINES</B>
-When running gsutil on multiple machines that are all attempting to use the
-same OAuth2 refresh token, it is possible to encounter rate limiting errors
-for the refresh requests (especially if all of these machines are likely to
-start running gsutil at the same time). To account for this, gsutil will
-automatically retry OAuth2 refresh requests with a truncated randomized
-exponential backoff strategy like that which is described in the
-"BACKGROUND ON RESUMABLE TRANSFERS" section above. The number of retries
-attempted for OAuth2 refresh requests can be controlled via the
-"oauth2_refresh_retries" variable in the .boto config file.
148 """) 133 """)
149 134
150 135
151 class CommandOptions(HelpProvider): 136 class CommandOptions(HelpProvider):
152 """Additional help about using gsutil for production tasks.""" 137 """Additional help about using gsutil for production tasks."""
153 138
154 # Help specification. See help_provider.py for documentation. 139 # Help specification. See help_provider.py for documentation.
155 help_spec = HelpProvider.HelpSpec( 140 help_spec = HelpProvider.HelpSpec(
156 help_name='prod', 141 help_name='prod',
157 help_name_aliases=[ 142 help_name_aliases=[
158 'production', 'resumable', 'resumable upload', 'resumable transfer', 143 'production', 'resumable', 'resumable upload', 'resumable transfer',
159 'resumable download', 'scripts', 'scripting'], 144 'resumable download', 'scripts', 'scripting'],
160 help_type='additional_help', 145 help_type='additional_help',
161 help_one_line_summary='Scripting Production Transfers', 146 help_one_line_summary='Scripting Production Transfers',
162 help_text=_DETAILED_HELP_TEXT, 147 help_text=_DETAILED_HELP_TEXT,
163 subcommand_help_text={}, 148 subcommand_help_text={},
164 ) 149 )
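The HelpSpec above registers this help text under the name 'prod' along with several aliases. Purely to illustrate that lookup pattern (SimpleSpec and the specs list below are invented for the example; gsutil's real machinery lives in help_provider.py), a name-or-alias match might look like this:

from collections import namedtuple

# Toy stand-in for HelpProvider.HelpSpec; only the fields used here.
SimpleSpec = namedtuple(
    'SimpleSpec', ['help_name', 'help_name_aliases', 'help_one_line_summary'])


def find_help(specs, topic):
  """Return the first spec whose name or aliases match the requested topic."""
  topic = topic.lower()
  for spec in specs:
    if topic == spec.help_name or topic in spec.help_name_aliases:
      return spec
  return None


specs = [SimpleSpec('prod',
                    ['production', 'resumable', 'scripts', 'scripting'],
                    'Scripting Production Transfers')]
match = find_help(specs, 'scripting')
print(match.help_one_line_summary if match else 'no help found')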