Chromium Code Reviews
Description
Change FindNextSliceToDownload() into FindSlicesToDownload()
For parallel downloads, we need a helper method to return a list of slices to download.
This CL changes FindNextSliceToDownload() into FindSlicesToDownload().
And it will return an array of slices to be downloaded.
This allows the ParallelDownloadJob to fork a new request for each slice.
BUG=644352
Review-Url: https://codereview.chromium.org/2727143005
Cr-Commit-Position: refs/heads/master@{#455221}
Committed: https://chromium.googlesource.com/chromium/src/+/2bb2241a9aa5907ede2c661f1d4d1a8d9da017a1
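From the description above, FindSlicesToDownload() walks the array of received slices and returns the gaps still to be fetched. The following is a minimal sketch of that idea, not the actual Chromium implementation; the ReceivedSlice struct here merely mirrors DownloadItem::ReceivedSlice, and the convention that a trailing slice with length 0 means "from this offset to the end of the file" is taken from the review discussion below.

```cpp
#include <cstdint>
#include <vector>

// Illustrative stand-in for DownloadItem::ReceivedSlice.
struct ReceivedSlice {
  int64_t offset;
  int64_t received_bytes;
};

// Given slices sorted by offset, return every gap between them as a slice
// still to be downloaded. The last returned slice has received_bytes == 0,
// meaning "download from this offset to the end of the file".
std::vector<ReceivedSlice> FindSlicesToDownload(
    const std::vector<ReceivedSlice>& received_slices) {
  std::vector<ReceivedSlice> slices_to_download;
  int64_t next_offset = 0;
  for (const auto& slice : received_slices) {
    // A gap before this slice: everything in [next_offset, slice.offset).
    if (slice.offset > next_offset)
      slices_to_download.push_back({next_offset, slice.offset - next_offset});
    next_offset = slice.offset + slice.received_bytes;
  }
  // Open-ended slice covering the rest of the file.
  slices_to_download.push_back({next_offset, 0});
  return slices_to_download;
}
```

For a new download (no received slices), this returns the single slice [0, 0], i.e. the whole file, which the job can then split among parallel workers.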
Patch Set 1 #
Total comments: 12
Patch Set 2 : adding TODO and comments #
Total comments: 4
Messages
Total messages: 19 (4 generated)
qinmin@chromium.org changed reviewers: + dtrainor@chromium.org, xingliu@chromium.org
PTAL
Some discussion on how we use slice info.

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
File content/browser/download/parallel_download_job.cc (right):

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
content/browser/download/parallel_download_job.cc:38: std::vector<DownloadItem::ReceivedSlice> slices_to_download =
There is already a function to build requests for new parallel download. But unified logic for both new download and resumption like this one is probably better.

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
content/browser/download/parallel_download_job.cc:46: if (slices_to_download.size() >= kParallelRequestCount) {
Maybe we can keep the mechanism simple for the first iteration:

1. For new download, fork N requests, and save N slice info in the database.

2. Download resumption, check the N slice, see if any of them need to be patched up. Each slice at most uses one request.

Continue build sub requests or build them too dynamically is hard to manage and may bring more and more db IO and network overhead, especially for unstable network.

Also since we don't put logic here, maybe just put a TODO.

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
File content/browser/download/parallel_download_utils.h (right):

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
content/browser/download/parallel_download_utils.h:15: // Find the first gap in an array of received slices and return it as the next
Comment for the function is out-dated.

This function looks for new slices to download. Do we directly write these to db?

If not, then we still need to maintain some kind of mapping from the slices for the database and slices that are downloading, or have a function to transform the downloading slices into slices for db.
This is kind of weird though since we can just use the slice info for the database, update them in memory and at some point write them back to db.
https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
File content/browser/download/parallel_download_job.cc (right):

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
content/browser/download/parallel_download_job.cc:38: std::vector<DownloadItem::ReceivedSlice> slices_to_download =
On 2017/03/03 18:44:15, xingliu wrote:
> There is already a function to build requests for new parallel download. But
> unified logic for both new download and resumption like this one is probably
> better.

Start() is called on both resumption and new download, so this is the single entry point. If this is a new download, |slice_to_download| will be the whole file, and we need to split it before we can create additional workers. If this is a resumption, then |slice_to_download| will contain all the slices to fill the gaps.

We should remove ForkRequestsForNewDownload() method and implement the if/else statement below. But i will leave that to a separate CL.

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
content/browser/download/parallel_download_job.cc:46: if (slices_to_download.size() >= kParallelRequestCount) {
On 2017/03/03 18:44:15, xingliu wrote:
> Maybe we can keep the mechanism simple for the first iteration:
>
> 1. For new download, fork N requests, and save N slice info in the database.
>
> 2. Download resumption, check the N slice, see if any of them need to be
> patched up. Each slice at most uses one request.
>
> Continue build sub requests or build them too dynamically is hard to manage
> and may bring more and more db IO and network overhead, especially for
> unstable network.
>
> Also since we don't put logic here, maybe just put a TODO.

For resumption, there could be less than N slices. This is because we forked the parallel request, but they never yield any data before browser gets killed. So we should also fork new request if the last slice is really large, and there are less than N slices available.

added a TODO here.

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
File content/browser/download/parallel_download_utils.h (right):

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
content/browser/download/parallel_download_utils.h:15: // Find the first gap in an array of received slices and return it as the next
Fixed the comment.
The return value is only used for the caller to make url request, and should not be written to db. History db will finally get the new slice information when DownloadFile starts receiving data from those url requests, this happens when DownloadFileImpl::SendUpdate() is called.

The return type of ReceivedSlice is bad. We should use a type FileSlice or something. The ReceivedSlice is only for slices that are written to the disk. But here the return type doesn't mean that those slices are already received. However, introducing a almost identical class is very strange, so I just use ReceivedSlice instead.

DownloadSliceInfo is history/, cannot be used in content/.

On 2017/03/03 18:44:15, xingliu wrote:
> Comment for the function is out-dated.
>
> This function looks for new slices to download. Do we directly write these to
> db?
>
> If not, then we still need to maintain some kind of mapping from the slices
> for the database and slices that are downloading, or have a function to
> transform the downloading slices into slices for db.
>
> This is kind of weird though since we can just use the slice info for the
> database, update them in memory and at some point write them back to db.
https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
File content/browser/download/parallel_download_job.cc (right):

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
content/browser/download/parallel_download_job.cc:38: std::vector<DownloadItem::ReceivedSlice> slices_to_download =
On 2017/03/03 19:32:35, qinmin wrote:
> On 2017/03/03 18:44:15, xingliu wrote:
> > There is already a function to build requests for new parallel download. But
> > unified logic for both new download and resumption like this one is probably
> > better.
>
> Start() is called on both resumption and new download, so this is the single
> entry point. If this is a new download, |slice_to_download| will be the whole
> file, and we need to split it before we can create additional workers. If this
> is a resumption, then |slice_to_download| will contain all the slices to fill
> the gaps.
>
> We should remove ForkRequestsForNewDownload() method and implement the if/else
> statement below. But i will leave that to a separate CL.

To clarify, Start() is called when Resuming an interrupted download, while Resume(true) is called when resuming a paused download. Either one or the other will get called, but not both.
In general sgtm, with some more discussion.

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
File content/browser/download/parallel_download_job.cc (right):

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
content/browser/download/parallel_download_job.cc:38: std::vector<DownloadItem::ReceivedSlice> slices_to_download =
On 2017/03/03 19:44:33, qinmin wrote:
> On 2017/03/03 19:32:35, qinmin wrote:
> > On 2017/03/03 18:44:15, xingliu wrote:
> > > There is already a function to build requests for new parallel download.
> > > But unified logic for both new download and resumption like this one is
> > > probably better.
> >
> > Start() is called on both resumption and new download, so this is the single
> > entry point. If this is a new download, |slice_to_download| will be the whole
> > file, and we need to split it before we can create additional workers. If
> > this is a resumption, then |slice_to_download| will contain all the slices
> > to fill the gaps.
> >
> > We should remove ForkRequestsForNewDownload() method and implement the
> > if/else statement below. But i will leave that to a separate CL.
>
> To clarify, Start() is called when Resuming an interrupted download, while
> Resume(true) is called when resuming a paused download. Either one or the
> other will get called, but not both.

sgtm.

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
content/browser/download/parallel_download_job.cc:46: if (slices_to_download.size() >= kParallelRequestCount) {
On 2017/03/03 19:32:35, qinmin wrote:
> On 2017/03/03 18:44:15, xingliu wrote:
> > Maybe we can keep the mechanism simple for the first iteration:
> >
> > 1. For new download, fork N requests, and save N slice info in the database.
> >
> > 2. Download resumption, check the N slice, see if any of them need to be
> > patched up. Each slice at most uses one request.
> >
> > Continue build sub requests or build them too dynamically is hard to manage
> > and may bring more and more db IO and network overhead, especially for
> > unstable network.
> >
> > Also since we don't put logic here, maybe just put a TODO.
>
> For resumption, there could be less than N slices. This is because we forked
> the parallel request, but they never yield any data before browser gets
> killed. So we should also fork new request if the last slice is really large,
> and there are less than N slices available.
>
> added a TODO here.

In general sgtm if the slices are written to db after the file received data.

Just for discussion here, does it make sense to write slice info even before that, like when we first figure out the draft to chunk the file?

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
File content/browser/download/parallel_download_utils.h (right):

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
content/browser/download/parallel_download_utils.h:15: // Find the first gap in an array of received slices and return it as the next
On 2017/03/03 19:32:35, qinmin wrote:
> Fixed the comment.
> The return value is only used for the caller to make url request, and should
> not be written to db. History db will finally get the new slice information
> when DownloadFile starts receiving data from those url requests, this happens
> when DownloadFileImpl::SendUpdate() is called.
>
> The return type of ReceivedSlice is bad. We should use a type FileSlice or
> something. The ReceivedSlice is only for slices that are written to the disk.
> But here the return type doesn't mean that those slices are already received.
> However, introducing a almost identical class is very strange, so I just use
> ReceivedSlice instead.
>
> DownloadSliceInfo is history/, cannot be used in content/.
>
> On 2017/03/03 18:44:15, xingliu wrote:
> > Comment for the function is out-dated.
> >
> > This function looks for new slices to download. Do we directly write these
> > to db?
> >
> > If not, then we still need to maintain some kind of mapping from the slices
> > for the database and slices that are downloading, or have a function to
> > transform the downloading slices into slices for db.
> >
> > This is kind of weird though since we can just use the slice info for the
> > database, update them in memory and at some point write them back to db.

sgtm on new download use scenario.

Question for resumption from browser restart:

If this function is also called on resumption(in BuildParallelRequests), we build some requests that offset may be different from the slice info read from the db. Then we download some bytes, so do we still need to keep a map of offset of new request and the offset of the slice info from db here?

https://codereview.chromium.org/2727143005/diff/20001/content/browser/downloa...
File content/browser/download/parallel_download_job.cc (right):

https://codereview.chromium.org/2727143005/diff/20001/content/browser/downloa...
content/browser/download/parallel_download_job.cc:105: // split it into N pieces and pass the last N-1 pirces to different workers.
nit, N-1 pieces.

https://codereview.chromium.org/2727143005/diff/20001/content/browser/downloa...
File content/browser/download/parallel_download_utils_unittest.cc (right):

https://codereview.chromium.org/2727143005/diff/20001/content/browser/downloa...
content/browser/download/parallel_download_utils_unittest.cc:53: EXPECT_EQ(1500, slices_to_download[1].offset);
The last slice returned, e.g. [1500, 0]. How does the job know if it should create a request or not? Does it mean the job must compare the content-length to figure out if the last slice is done?
https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
File content/browser/download/parallel_download_job.cc (right):

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
content/browser/download/parallel_download_job.cc:46: if (slices_to_download.size() >= kParallelRequestCount) {
On 2017/03/03 21:37:44, xingliu wrote:
> On 2017/03/03 19:32:35, qinmin wrote:
> > On 2017/03/03 18:44:15, xingliu wrote:
> > > Maybe we can keep the mechanism simple for the first iteration:
> > >
> > > 1. For new download, fork N requests, and save N slice info in the
> > > database.
> > >
> > > 2. Download resumption, check the N slice, see if any of them need to be
> > > patched up. Each slice at most uses one request.
> > >
> > > Continue build sub requests or build them too dynamically is hard to
> > > manage and may bring more and more db IO and network overhead, especially
> > > for unstable network.
> > >
> > > Also since we don't put logic here, maybe just put a TODO.
> >
> > For resumption, there could be less than N slices. This is because we forked
> > the parallel request, but they never yield any data before browser gets
> > killed. So we should also fork new request if the last slice is really
> > large, and there are less than N slices available.
> >
> > added a TODO here.
>
> In general sgtm if the slices are written to db after the file received data.
>
> Just for discussion here, does it make sense to write slice info even before
> that, like when we first figure out the draft to chunk the file?

What do you mean by "write slice info"? Do you mean writing the info into the history db? Then the answer is no. The slice info should never reach the history db unless actual data is written. The history db will also ignore slices that have 0 bytes of data. History db commits lag behind the actual file thread, so it is perfectly fine that 1000 bytes are written to disk while the history db only stores 500.

https://codereview.chromium.org/2727143005/diff/20001/content/browser/downloa...
File content/browser/download/parallel_download_job.cc (right):

https://codereview.chromium.org/2727143005/diff/20001/content/browser/downloa...
content/browser/download/parallel_download_job.cc:105: // split it into N pieces and pass the last N-1 pirces to different workers.
On 2017/03/03 21:37:44, xingliu wrote:
> nit, N-1 pieces.

Done.

https://codereview.chromium.org/2727143005/diff/20001/content/browser/downloa...
File content/browser/download/parallel_download_utils_unittest.cc (right):

https://codereview.chromium.org/2727143005/diff/20001/content/browser/downloa...
content/browser/download/parallel_download_utils_unittest.cc:53: EXPECT_EQ(1500, slices_to_download[1].offset);
On 2017/03/03 21:37:44, xingliu wrote:
> The last slice returned, e.g. [1500, 0]. How does the job know if it should
> create a request or not? Does it mean the job must compare the content-length
> to figure out if the last slice is done?

Yes, it has to figure it out by itself. This issue is not related to this change, but it is related to parallel downloading itself. For example, if I created 2 parallel requests, and the 2nd request finishes early. Then if the browser crashes before the 1st request finishes, we won't know if we should start the 2nd request on resumption. Unless we store another boolean variable in the history db to indicate that the slice is finished, we won't have an answer.

We could:
1. Check the content-length if it is there. Though this can still introduce some issues if the content-length is wrong and the 2nd request happened to download content-length bytes last time.
2. Or, always create a request at the end; I think this is a safer solution.
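Option 1 from the message above (comparing against the content length) could be sketched as follows. This is illustrative only, not Chromium code; the slice type merely mirrors DownloadItem::ReceivedSlice, the function name is hypothetical, and as noted in the discussion this heuristic breaks down when the server-reported content-length is wrong.

```cpp
#include <cstdint>
#include <vector>

// Illustrative stand-in for DownloadItem::ReceivedSlice.
struct ReceivedSlice {
  int64_t offset;
  int64_t received_bytes;
};

// Decide whether the trailing open-ended slice still needs a request by
// summing the bytes received so far and comparing against the expected
// content length. If every byte is accounted for, the last slice is done.
bool LastSliceNeedsRequest(const std::vector<ReceivedSlice>& received_slices,
                           int64_t content_length) {
  int64_t total_received = 0;
  for (const auto& slice : received_slices)
    total_received += slice.received_bytes;
  return total_received < content_length;
}
```

Option 2 (always issuing a trailing request and letting the server reject an out-of-range byte range) avoids this bookkeeping entirely, which is why it is called the safer solution above.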
lgtm
https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
File content/browser/download/parallel_download_utils.h (right):

https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
content/browser/download/parallel_download_utils.h:15: // Find the first gap in an array of received slices and return it as the next
On 2017/03/03 21:37:44, xingliu wrote:
> On 2017/03/03 19:32:35, qinmin wrote:
> > Fixed the comment.
> > The return value is only used for the caller to make url request, and should
> > not be written to db. History db will finally get the new slice information
> > when DownloadFile starts receiving data from those url requests, this
> > happens when DownloadFileImpl::SendUpdate() is called.
> >
> > The return type of ReceivedSlice is bad. We should use a type FileSlice or
> > something. The ReceivedSlice is only for slices that are written to the
> > disk. But here the return type doesn't mean that those slices are already
> > received. However, introducing a almost identical class is very strange, so
> > I just use ReceivedSlice instead.
> >
> > DownloadSliceInfo is history/, cannot be used in content/.
> >
> > On 2017/03/03 18:44:15, xingliu wrote:
> > > Comment for the function is out-dated.
> > >
> > > This function looks for new slices to download. Do we directly write these
> > > to db?
> > >
> > > If not, then we still need to maintain some kind of mapping from the
> > > slices for the database and slices that are downloading, or have a
> > > function to transform the downloading slices into slices for db.
> > >
> > > This is kind of weird though since we can just use the slice info for the
> > > database, update them in memory and at some point write them back to db.
>
> sgtm on new download use scenario.
>
> Question for resumption from browser restart:
>
> If this function is also called on resumption(in BuildParallelRequests), we
> build some requests that offset may be different from the slice info read
> from the db. Then we download some bytes, so do we still need to keep a map
> of offset of new request and the offset of the slice info from db here?

Once all the parallel requests are kicked off, this class shouldn't really care. This is because the new request can fail, so those new offsets are not final. The DownloadFileImpl, on the other hand, will be responsible for keeping track of the new slices, and also the old ones.

DownloadFileImpl can call this class once all the streams finish, and this class can then decide whether it wants to issue additional requests by checking the received_slices_ from DownloadItem (assuming DownloadFileImpl already reported all the slices to DownloadItem).
On 2017/03/03 22:20:10, qinmin wrote:
> https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
> File content/browser/download/parallel_download_utils.h (right):
>
> https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
> content/browser/download/parallel_download_utils.h:15: // Find the first gap
> in an array of received slices and return it as the next
> On 2017/03/03 21:37:44, xingliu wrote:
> > On 2017/03/03 19:32:35, qinmin wrote:
> > > Fixed the comment.
> > > The return value is only used for the caller to make url request, and
> > > should not be written to db. History db will finally get the new slice
> > > information when DownloadFile starts receiving data from those url
> > > requests, this happens when DownloadFileImpl::SendUpdate() is called.
> > >
> > > The return type of ReceivedSlice is bad. We should use a type FileSlice or
> > > something. The ReceivedSlice is only for slices that are written to the
> > > disk. But here the return type doesn't mean that those slices are already
> > > received. However, introducing a almost identical class is very strange,
> > > so I just use ReceivedSlice instead.
> > >
> > > DownloadSliceInfo is history/, cannot be used in content/.
> > >
> > > On 2017/03/03 18:44:15, xingliu wrote:
> > > > Comment for the function is out-dated.
> > > >
> > > > This function looks for new slices to download. Do we directly write
> > > > these to db?
> > > >
> > > > If not, then we still need to maintain some kind of mapping from the
> > > > slices for the database and slices that are downloading, or have a
> > > > function to transform the downloading slices into slices for db.
> > > >
> > > > This is kind of weird though since we can just use the slice info for
> > > > the database, update them in memory and at some point write them back
> > > > to db.
> >
> > sgtm on new download use scenario.
> >
> > Question for resumption from browser restart:
> >
> > If this function is also called on resumption(in BuildParallelRequests), we
> > build some requests that offset may be different from the slice info read
> > from the db. Then we download some bytes, so do we still need to keep a map
> > of offset of new request and the offset of the slice info from db here?
>
> Once all the parallel requests are kicked off, this class shouldn't really
> care. This is because the new request can fail, so those new offsets are not
> final. The DownloadFileImpl, on the other hand, will be responsible for
> keeping track of the new slices, and also the old ones.
>
> DownloadFileImpl can call this class once all the streams finish, and this
> class can then decide whether it wants to issue additional requests by
> checking the received_slices_ from DownloadItem (assuming DownloadFileImpl
> already reported all the slices to DownloadItem).

This generally sgtm.

Does DownloadFileImpl need to know the old slices? It feels like DownloadFileImpl will take on too much responsibility.

Ideally DownloadFileImpl would be agnostic to previous data and just aware of new incoming streams.
On 2017/03/03 23:31:50, xingliu wrote:
> On 2017/03/03 22:20:10, qinmin wrote:
> > https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
> > File content/browser/download/parallel_download_utils.h (right):
> >
> > https://codereview.chromium.org/2727143005/diff/1/content/browser/download/pa...
> > content/browser/download/parallel_download_utils.h:15: // Find the first gap
> > in an array of received slices and return it as the next
> > On 2017/03/03 21:37:44, xingliu wrote:
> > > On 2017/03/03 19:32:35, qinmin wrote:
> > > > Fixed the comment.
> > > > The return value is only used for the caller to make url request, and
> > > > should not be written to db. History db will finally get the new slice
> > > > information when DownloadFile starts receiving data from those url
> > > > requests, this happens when DownloadFileImpl::SendUpdate() is called.
> > > >
> > > > The return type of ReceivedSlice is bad. We should use a type FileSlice
> > > > or something. The ReceivedSlice is only for slices that are written to
> > > > the disk. But here the return type doesn't mean that those slices are
> > > > already received. However, introducing a almost identical class is very
> > > > strange, so I just use ReceivedSlice instead.
> > > >
> > > > DownloadSliceInfo is history/, cannot be used in content/.
> > > >
> > > > On 2017/03/03 18:44:15, xingliu wrote:
> > > > > Comment for the function is out-dated.
> > > > >
> > > > > This function looks for new slices to download. Do we directly write
> > > > > these to db?
> > > > >
> > > > > If not, then we still need to maintain some kind of mapping from the
> > > > > slices for the database and slices that are downloading, or have a
> > > > > function to transform the downloading slices into slices for db.
> > > > >
> > > > > This is kind of weird though since we can just use the slice info for
> > > > > the database, update them in memory and at some point write them back
> > > > > to db.
> > >
> > > sgtm on new download use scenario.
> > >
> > > Question for resumption from browser restart:
> > >
> > > If this function is also called on resumption(in BuildParallelRequests),
> > > we build some requests that offset may be different from the slice info
> > > read from the db. Then we download some bytes, so do we still need to keep
> > > a map of offset of new request and the offset of the slice info from db
> > > here?
> >
> > Once all the parallel requests are kicked off, this class shouldn't really
> > care. This is because the new request can fail, so those new offsets are not
> > final. The DownloadFileImpl, on the other hand, will be responsible for
> > keeping track of the new slices, and also the old ones.
> >
> > DownloadFileImpl can call this class once all the streams finish, and this
> > class can then decide whether it wants to issue additional requests by
> > checking the received_slices_ from DownloadItem (assuming DownloadFileImpl
> > already reported all the slices to DownloadItem).
>
> This generally sgtm.
>
> Does DownloadFileImpl need to know the old slices? It feels like
> DownloadFileImpl will take on too much responsibility.
>
> Ideally DownloadFileImpl would be agnostic to previous data and just aware of
> new incoming streams.

It should, otherwise it won't be able to check the integrity of the file.
Ping, dtrainor@, please take a look.
lgtm!
The CQ bit was checked by qinmin@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 20001, "attempt_start_ts": 1488916803513660,
"parent_rev": "b5f30def1f14944ea1d342b99c2afcd693a31278", "commit_rev":
"2bb2241a9aa5907ede2c661f1d4d1a8d9da017a1"}
Message was sent while issue was closed.
Description was changed from

==========
Change FindNextSliceToDownload() into FindSlicesToDownload()

For parallel downloads, we need a helper method to return a list of slices to download. This CL changes FindNextSliceToDownload() into FindSlicesToDownload(). And it will return an array of slices to be downloaded. This allows the ParallelDownloadJob to fork new request for each slice.

BUG=644352
==========

to

==========
Change FindNextSliceToDownload() into FindSlicesToDownload()

For parallel downloads, we need a helper method to return a list of slices to download. This CL changes FindNextSliceToDownload() into FindSlicesToDownload(). And it will return an array of slices to be downloaded. This allows the ParallelDownloadJob to fork new request for each slice.

BUG=644352

Review-Url: https://codereview.chromium.org/2727143005
Cr-Commit-Position: refs/heads/master@{#455221}
Committed: https://chromium.googlesource.com/chromium/src/+/2bb2241a9aa5907ede2c661f1d4d...
==========
Message was sent while issue was closed.
Committed patchset #2 (id:20001) as https://chromium.googlesource.com/chromium/src/+/2bb2241a9aa5907ede2c661f1d4d... |
