OLD | NEW |
1 # -*- coding: utf-8 -*- | 1 # -*- coding: utf-8 -*- |
2 # Copyright 2012 Google Inc. All Rights Reserved. | 2 # Copyright 2012 Google Inc. All Rights Reserved. |
3 # | 3 # |
4 # Licensed under the Apache License, Version 2.0 (the "License"); | 4 # Licensed under the Apache License, Version 2.0 (the "License"); |
5 # you may not use this file except in compliance with the License. | 5 # you may not use this file except in compliance with the License. |
6 # You may obtain a copy of the License at | 6 # You may obtain a copy of the License at |
7 # | 7 # |
8 # http://www.apache.org/licenses/LICENSE-2.0 | 8 # http://www.apache.org/licenses/LICENSE-2.0 |
9 # | 9 # |
10 # Unless required by applicable law or agreed to in writing, software | 10 # Unless required by applicable law or agreed to in writing, software |
(...skipping 17 matching lines...) |
28 | 28 |
29 | 29 |
30 <B>BACKGROUND ON RESUMABLE TRANSFERS</B> | 30 <B>BACKGROUND ON RESUMABLE TRANSFERS</B> |
31 First, it's helpful to understand gsutil's resumable transfer mechanism, | 31 First, it's helpful to understand gsutil's resumable transfer mechanism, |
32 and how your script needs to be implemented around this mechanism to work | 32 and how your script needs to be implemented around this mechanism to work |
33 reliably. gsutil uses resumable transfer support when you attempt to upload | 33 reliably. gsutil uses resumable transfer support when you attempt to upload |
34 or download a file larger than a configurable threshold (by default, this | 34 or download a file larger than a configurable threshold (by default, this |
35 threshold is 2 MiB). When a transfer fails partway through (e.g., because of | 35 threshold is 2 MiB). When a transfer fails partway through (e.g., because of |
36 an intermittent network problem), gsutil uses a truncated randomized binary | 36 an intermittent network problem), gsutil uses a truncated randomized binary |
37 exponential backoff-and-retry strategy that by default will retry transfers up | 37 exponential backoff-and-retry strategy that by default will retry transfers up |
38 to 6 times over a 63 second period of time (see "gsutil help retries" for | 38 to 23 times over a 10 minute period of time (see "gsutil help retries" for |
39 details). If the transfer fails each of these attempts with no intervening | 39 details). If the transfer fails each of these attempts with no intervening |
40 progress, gsutil gives up on the transfer, but keeps a "tracker" file for | 40 progress, gsutil gives up on the transfer, but keeps a "tracker" file for |
41 it in a configurable location (the default location is ~/.gsutil/, in a file | 41 it in a configurable location (the default location is ~/.gsutil/, in a file |
42 named by a combination of the SHA1 hash of the name of the bucket and object | 42 named by a combination of the SHA1 hash of the name of the bucket and object |
43 being transferred and the last 16 characters of the file name). When transfers | 43 being transferred and the last 16 characters of the file name). When transfers |
44 fail in this fashion, you can rerun gsutil at some later time (e.g., after | 44 fail in this fashion, you can rerun gsutil at some later time (e.g., after |
45 the networking problem has been resolved), and the resumable transfer picks | 45 the networking problem has been resolved), and the resumable transfer picks |
46 up where it left off. | 46 up where it left off. |
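As an illustration of the tracker-file naming described above, the following Python sketch constructs a plausible tracker path. The exact filename format is an internal detail of gsutil and may change between versions, so treat this as an approximation rather than the real implementation:

```python
import hashlib
import os

def tracker_file_path(bucket, obj, local_file,
                      tracker_dir=os.path.expanduser('~/.gsutil')):
    """Illustrative sketch only: combine a SHA1 hash of the bucket and
    object names with the last 16 characters of the local file name.
    gsutil's actual tracker filename format is an internal detail."""
    digest = hashlib.sha1(
        ('%s/%s' % (bucket, obj)).encode('utf-8')).hexdigest()
    suffix = local_file[-16:]  # last 16 characters of the file name
    return os.path.join(tracker_dir, '%s_%s' % (digest, suffix))
```

Knowing roughly where tracker files live is mainly useful for debugging (e.g., confirming that a stalled transfer left a tracker behind); scripts normally never need to touch them.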
47 | 47 |
48 | 48 |
49 <B>SCRIPTING DATA TRANSFER TASKS</B> | 49 <B>SCRIPTING DATA TRANSFER TASKS</B> |
50 To script large production data transfer tasks around this mechanism, | 50 To script large production data transfer tasks around this mechanism, |
51 you can implement a script that runs periodically, determines which file | 51 you can implement a script that runs periodically, determines which file |
52 transfers have not yet succeeded, and runs gsutil to copy them. Below, | 52 transfers have not yet succeeded, and runs gsutil to copy them. Below, |
53 we offer a number of suggestions about how this type of scripting should | 53 we offer a number of suggestions about how this type of scripting should |
54 be implemented: | 54 be implemented: |
55 | 55 |
56 1. When resumable transfers fail without any progress 6 times in a row | 56 1. When resumable transfers fail without any progress 23 times in a row |
57 over the course of up to 63 seconds, it probably won't work to simply | 57 over the course of up to 10 minutes, it probably won't work to simply |
58 retry the transfer immediately. A more successful strategy would be to | 58 retry the transfer immediately. A more successful strategy would be to |
59 have a cron job that runs every 30 minutes, determines which transfers | 59 have a cron job that runs every 30 minutes, determines which transfers |
60 need to be run, and runs them. If the network experiences intermittent | 60 need to be run, and runs them. If the network experiences intermittent |
61 problems, the script picks up where it left off and will eventually | 61 problems, the script picks up where it left off and will eventually |
62 succeed (once the network problem has been resolved). | 62 succeed (once the network problem has been resolved). |
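The periodic "determine what remains and rerun gsutil" approach could be sketched like this in Python (invoked from cron). The marker-file scheme and paths here are hypothetical; any reliable record of completed transfers would serve the same purpose:

```python
import os
import subprocess

def sync_pending(local_dir, bucket_url, done_marker_dir):
    """Sketch of a periodic check-and-run job: for each file without a
    'done' marker, run gsutil cp; on success, record a marker so later
    runs skip it. On failure, the marker is left absent and the next
    cron run simply retries, resuming any partial transfer."""
    for name in os.listdir(local_dir):
        marker = os.path.join(done_marker_dir, name + '.done')
        if os.path.exists(marker):
            continue  # already transferred successfully
        result = subprocess.run(
            ['gsutil', 'cp', os.path.join(local_dir, name), bucket_url])
        if result.returncode == 0:
            open(marker, 'w').close()  # mark success for future runs
```

Because gsutil's resumable mechanism picks up where it left off, rerunning the same cp command is cheap: only the unfinished portion is retransferred.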
63 | 63 |
64 2. If your business depends on timely data transfer, you should consider | 64 2. If your business depends on timely data transfer, you should consider |
65 implementing some network monitoring. For example, you can implement | 65 implementing some network monitoring. For example, you can implement |
66 a task that attempts a small download every few minutes and raises an | 66 a task that attempts a small download every few minutes and raises an |
67 alert if the attempt fails several times in a row (or more or less | 67 alert if the attempt fails several times in a row (or more or less |
(...skipping 32 matching lines...) |
100 be done. | 100 be done. |
101 | 101 |
102 4. If you have really large numbers of objects in a single bucket | 102 4. If you have really large numbers of objects in a single bucket |
103 (say hundreds of thousands or more), you should consider tracking your | 103 (say hundreds of thousands or more), you should consider tracking your |
104 objects in a database instead of using bucket listings to enumerate | 104 objects in a database instead of using bucket listings to enumerate |
105 the objects. For example, this database could track the state of your | 105 the objects. For example, this database could track the state of your |
106 downloads, so you can determine what objects need to be downloaded by | 106 downloads, so you can determine what objects need to be downloaded by |
107 your periodic download script by querying the database locally instead | 107 your periodic download script by querying the database locally instead |
108 of performing a bucket listing. | 108 of performing a bucket listing. |
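A minimal sketch of such local state tracking, using Python's built-in sqlite3 module (the schema shown here is hypothetical):

```python
import sqlite3

def pending_downloads(db_path):
    """Return object names not yet downloaded, by querying a local
    database instead of listing a very large bucket."""
    conn = sqlite3.connect(db_path)
    conn.execute('CREATE TABLE IF NOT EXISTS objects '
                 '(name TEXT PRIMARY KEY, downloaded INTEGER DEFAULT 0)')
    rows = conn.execute(
        'SELECT name FROM objects WHERE downloaded = 0').fetchall()
    conn.close()
    return [r[0] for r in rows]

def mark_downloaded(db_path, name):
    """Record that an object finished downloading successfully."""
    conn = sqlite3.connect(db_path)
    conn.execute('UPDATE objects SET downloaded = 1 WHERE name = ?',
                 (name,))
    conn.commit()
    conn.close()
```

Your periodic download script would call pending_downloads() to build its work list and mark_downloaded() after each successful gsutil run, avoiding a full bucket listing on every pass.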
109 | 109 |
110 5. Make sure you don't delete partially downloaded files after a transfer | 110 5. Make sure you don't delete partially downloaded temporary files after a |
111 fails: gsutil picks up where it left off (and performs an MD5 check of | 111 transfer fails: gsutil picks up where it left off (and performs a hash |
112 the final downloaded content to ensure data integrity), so deleting | 112 of the final downloaded content to ensure data integrity), so deleting |
113 partially transferred files will cause you to lose progress and make | 113 partially transferred files will cause you to lose progress and make |
114 more wasteful use of your network. You should also make sure whatever | 114 more wasteful use of your network. |
115 process is waiting to consume the downloaded data doesn't get pointed | |
116 at the partially downloaded files. One way to do this is to download | |
117 into a staging directory and then move successfully downloaded files to | |
118 a directory where consumer processes will read them. | |
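The staging-directory approach can be sketched as below. Note that os.rename is atomic only when both directories sit on the same filesystem, so keep the staging directory on the same volume as the final directory:

```python
import os

def publish(staging_dir, final_dir, filename):
    """Sketch of the staging-directory pattern: gsutil downloads into a
    staging area, and only fully downloaded files are moved to where
    consumer processes read them. Because the move is a rename on the
    same filesystem, consumers never see a partially written file."""
    src = os.path.join(staging_dir, filename)
    dst = os.path.join(final_dir, filename)
    os.rename(src, dst)
```

Call publish() only after gsutil exits with status 0 for that file; files still mid-transfer remain in the staging directory where the next retry can resume them.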
119 | 115 |
120 6. If you have a fast network connection, you can speed up the transfer of | 116 6. If you have a fast network connection, you can speed up the transfer of |
121 large numbers of files by using the gsutil -m (multi-threading / | 117 large numbers of files by using the gsutil -m (multi-threading / |
122 multi-processing) option. Be aware, however, that gsutil doesn't attempt to | 118 multi-processing) option. Be aware, however, that gsutil doesn't attempt to |
123 keep track of which files were downloaded successfully in cases where some | 119 keep track of which files were downloaded successfully in cases where some |
124 files failed to download. For example, if you use multi-threaded transfers | 120 files failed to download. For example, if you use multi-threaded transfers |
125 to download 100 files and 3 failed to download, it is up to your scripting | 121 to download 100 files and 3 failed to download, it is up to your scripting |
126 process to determine which transfers didn't succeed, and retry them. A | 122 process to determine which transfers didn't succeed, and retry them. A |
127 periodic check-and-run approach as outlined earlier would handle this | 123 periodic check-and-run approach as outlined earlier would handle this |
128 case. | 124 case. |
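One way for a script to detect which of a batch of -m downloads failed is to check afterwards which expected local files are missing, and retry those on the next pass. The helper below is a hypothetical sketch, not part of gsutil; note that a partially downloaded file may still exist locally, so a production script should also compare sizes or checksums rather than relying on existence alone:

```python
import os
import subprocess

def download_and_find_failures(object_urls, dest_dir):
    """Sketch: after a bulk 'gsutil -m cp', return the object URLs whose
    local files are missing so the caller can retry them. Existence is a
    weak check; comparing sizes or checksums would be more robust."""
    subprocess.run(['gsutil', '-m', 'cp'] + object_urls + [dest_dir])
    failed = []
    for url in object_urls:
        local_path = os.path.join(dest_dir, url.rstrip('/').split('/')[-1])
        if not os.path.exists(local_path):
            failed.append(url)  # retry these on the next pass
    return failed
```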
129 | 125 |
130 If you use parallel transfers (gsutil -m) you might want to experiment with | 126 If you use parallel transfers (gsutil -m) you might want to experiment with |
131 the number of threads being used (via the parallel_thread_count setting | 127 the number of threads being used (via the parallel_thread_count setting |
132 in the .boto config file). By default, gsutil uses 10 threads for Linux | 128 in the .boto config file). By default, gsutil uses 10 threads for Linux |
133 and 24 threads for other operating systems. Depending on your network | 129 and 24 threads for other operating systems. Depending on your network |
134 speed, available memory, CPU load, and other conditions, this may or may | 130 speed, available memory, CPU load, and other conditions, this may or may |
135 not be optimal. Try experimenting with higher or lower numbers of threads | 131 not be optimal. Try experimenting with higher or lower numbers of threads |
136 to find the best number of threads for your environment. | 132 to find the best number of threads for your environment. |
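For reference, the setting lives in the [GSUtil] section of the .boto config file; the value below is purely illustrative, not a recommendation:

```
[GSUtil]
parallel_thread_count = 16
```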
137 | |
138 <B>RUNNING GSUTIL ON MULTIPLE MACHINES</B> | |
139 When running gsutil on multiple machines that are all attempting to use the | |
140 same OAuth2 refresh token, it is possible to encounter rate limiting errors | |
141 for the refresh requests (especially if all of these machines are likely to | |
142 start running gsutil at the same time). To account for this, gsutil will | |
143 automatically retry OAuth2 refresh requests with a truncated randomized | |
144 exponential backoff strategy like that which is described in the | |
145 "BACKGROUND ON RESUMABLE TRANSFERS" section above. The number of retries | |
146 attempted for OAuth2 refresh requests can be controlled via the | |
147 "oauth2_refresh_retries" variable in the .boto config file. | |
148 """) | 133 """) |
149 | 134 |
150 | 135 |
151 class CommandOptions(HelpProvider): | 136 class CommandOptions(HelpProvider): |
152 """Additional help about using gsutil for production tasks.""" | 137 """Additional help about using gsutil for production tasks.""" |
153 | 138 |
154 # Help specification. See help_provider.py for documentation. | 139 # Help specification. See help_provider.py for documentation. |
155 help_spec = HelpProvider.HelpSpec( | 140 help_spec = HelpProvider.HelpSpec( |
156 help_name='prod', | 141 help_name='prod', |
157 help_name_aliases=[ | 142 help_name_aliases=[ |
158 'production', 'resumable', 'resumable upload', 'resumable transfer', | 143 'production', 'resumable', 'resumable upload', 'resumable transfer', |
159 'resumable download', 'scripts', 'scripting'], | 144 'resumable download', 'scripts', 'scripting'], |
160 help_type='additional_help', | 145 help_type='additional_help', |
161 help_one_line_summary='Scripting Production Transfers', | 146 help_one_line_summary='Scripting Production Transfers', |
162 help_text=_DETAILED_HELP_TEXT, | 147 help_text=_DETAILED_HELP_TEXT, |
163 subcommand_help_text={}, | 148 subcommand_help_text={}, |
164 ) | 149 ) |