|
|
DescriptionRefactor BaseFile class to support sparse files
To support parallel downloading, a file might have multiple writers at the same time.
This CL refactors the BaseFile class to support sparse files.
It adds a method to allow callers to write to arbitrary offsets.
And this CL skips all the hash validation logic if the |is_sparse_file_| is true.
BUG=644352
Review-Url: https://codereview.chromium.org/2695153002
Cr-Commit-Position: refs/heads/master@{#451276}
Committed: https://chromium.googlesource.com/chromium/src/+/f456c408998b5178f9f72810f417d57ffacb6da2
Patch Set 1 #
Total comments: 6
Patch Set 2 : using a single fd #
Total comments: 12
Patch Set 3 : introduce is_sparse_file_ #Patch Set 4 : remove the while loop around write #
Messages
Total messages: 38 (24 generated)
qinmin@chromium.org changed reviewers: + asanka@chromium.org, dtrainor@chromium.org, xingliu@chromium.org
PTAL
Patchset #1 (id:1) has been deleted
The CQ bit was checked by xingliu@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_compile_dbg_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_comp...) win_clang on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_clang/builds/...)
https://codereview.chromium.org/2695153002/diff/20001/content/browser/downloa... File content/browser/download/base_file.cc (right): https://codereview.chromium.org/2695153002/diff/20001/content/browser/downloa... content/browser/download/base_file.cc:125: if (secure_hash_) Maybe add DCHECK(access_mode_ == AccessMode::EXCLUSIVE); https://codereview.chromium.org/2695153002/diff/20001/content/browser/downloa... File content/browser/download/base_file_unittest.cc (right): https://codereview.chromium.org/2695153002/diff/20001/content/browser/downloa... content/browser/download/base_file_unittest.cc:782: EXPECT_EQ(DOWNLOAD_INTERRUPT_REASON_NONE, In this way, each parallel download worker class will hold a DownloadFile objects, each DownloadFile holds a content::BaseFile. I'll create a branch try to glue network IO with disk IO in this way. It will create multiple file descriptors on low level disk IO api. I think currently base::File supports that in linux and windows.
https://codereview.chromium.org/2695153002/diff/20001/content/browser/downloa... File content/browser/download/base_file.cc (right): https://codereview.chromium.org/2695153002/diff/20001/content/browser/downloa... content/browser/download/base_file.cc:125: if (secure_hash_) On 2017/02/15 19:13:20, xingliu wrote: > Maybe add DCHECK(access_mode_ == AccessMode::EXCLUSIVE); secure_hash_ is non null, so this should be in the if condition, rather than a DCHECK. https://codereview.chromium.org/2695153002/diff/20001/content/browser/downloa... File content/browser/download/base_file_unittest.cc (right): https://codereview.chromium.org/2695153002/diff/20001/content/browser/downloa... content/browser/download/base_file_unittest.cc:782: EXPECT_EQ(DOWNLOAD_INTERRUPT_REASON_NONE, On 2017/02/15 19:13:20, xingliu wrote: > In this way, each parallel download worker class will hold a DownloadFile > objects, each DownloadFile holds a content::BaseFile. I'll create a branch try > to glue network IO with disk IO in this way. > > It will create multiple file descriptors on low level disk IO api. I think > currently base::File supports that in linux and windows. An alternative is to provide a write(int64_t offset, const char* data, size_t data_len) function so that we can write to arbitrary location. Probably that is better, but then each job needs to figure out how much data they have written by themselves, rather than using the bytes_so_far() function.
https://codereview.chromium.org/2695153002/diff/20001/content/browser/downloa... File content/browser/download/base_file.cc (right): https://codereview.chromium.org/2695153002/diff/20001/content/browser/downloa... content/browser/download/base_file.cc:125: if (secure_hash_) On 2017/02/15 20:00:43, qinmin wrote: > On 2017/02/15 19:13:20, xingliu wrote: > > Maybe add DCHECK(access_mode_ == AccessMode::EXCLUSIVE); > > secure_hash_ is non null, so this should be in the if condition, rather than a > DCHECK. Got it, sorry I thought in EXCLUSIVE mode there will be no secure_hash_ computation. https://codereview.chromium.org/2695153002/diff/20001/content/browser/downloa... File content/browser/download/base_file_unittest.cc (right): https://codereview.chromium.org/2695153002/diff/20001/content/browser/downloa... content/browser/download/base_file_unittest.cc:782: EXPECT_EQ(DOWNLOAD_INTERRUPT_REASON_NONE, On 2017/02/15 20:00:43, qinmin wrote: > On 2017/02/15 19:13:20, xingliu wrote: > > In this way, each parallel download worker class will hold a DownloadFile > > objects, each DownloadFile holds a content::BaseFile. I'll create a branch try > > to glue network IO with disk IO in this way. > > > > It will create multiple file descriptors on low level disk IO api. I think > > currently base::File supports that in linux and windows. > > An alternative is to provide a write(int64_t offset, const char* data, size_t > data_len) function so that we can write to arbitrary location. Probably that is > better, but then each job needs to figure out how much data they have written by > themselves, rather than using the bytes_so_far() function. Yeah, make sense, we can reuse the meta data with the first approach. It might be easier to implement compared to the alternative approach.
Patchset #2 (id:40001) has been deleted
Description was changed from ========== Refactor BaseFile class to support parallel writers To support parallel downloading, a file might have multiple writers at the same time. This CL refactors the BaseFile class to support multiple writers. It allows the initial write offset to be larger or smaller than the actual file size. And this CL skips all the hash validation logic if the |access_mode_| is SHARED. BUG=644352 ========== to ========== Refactor BaseFile class to support parallel writers To support parallel downloading, a file might have multiple writers at the same time. This CL refactors the BaseFile class to support multiple writers. It adds a method to allow callers to write to arbitrary offsets. And this CL skips all the hash validation logic if the |access_mode_| is SHARED. BUG=644352 ==========
Patchset #2 (id:60001) has been deleted
The CQ bit was checked by qinmin@chromium.org
The CQ bit was unchecked by qinmin@chromium.org
The CQ bit was checked by qinmin@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Patchset #2 (id:80001) has been deleted
The CQ bit was checked by qinmin@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromeos_daisy_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) linux_chromium_asan_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) linux_chromium_compile_dbg_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) linux_chromium_tsan_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
ping, asanka@, please take a look, thanks
I suspect that handling things like renames are going to be quite fragile if there are multiple BaseFiles that are involved. On Windows, the file cannot be renamed if there are any open file handles. So all BaseFiles would need to close their respective handles before we can rename an intermediate file. It might be worth considering having at most one BaseFile per download. It can be shared via the DownloadJob hierarchy. This would ensure that Close()/Rename()/Open() sequences can be performed atomically without stepping on the feet of other BaseFiles. Of course, instead each Job will now need to keep track of where its writing. That should end up being a much leaner overhead compared to the synchronization dance you'll need to do with having multiple BaseFiles. https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... File content/browser/download/base_file.cc (right): https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... content/browser/download/base_file.cc:92: if (file_.Seek(base::File::FROM_BEGIN, offset) < 0) { base::File::Write() can be used to write to a specific offset without an additional Seek() https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... content/browser/download/base_file.cc:144: DownloadInterruptReason BaseFile::Rename(const base::FilePath& new_path) { Methods like Rename() and Annotate...() depend on this BaseFile being the sole owner. Overall, BaseFile still assumes sole ownership since it will delete the file if BaseFile is destroyed without calling Detach().
On 2017/02/16 18:11:24, asanka wrote: > I suspect that handling things like renames are going to be quite fragile if > there are multiple BaseFiles that are involved. On Windows, the file cannot be > renamed if there are any open file handles. So all BaseFiles would need to close > their respective handles before we can rename an intermediate file. > > It might be worth considering having at most one BaseFile per download. It can > be shared via the DownloadJob hierarchy. This would ensure that > Close()/Rename()/Open() sequences can be performed atomically without stepping > on the feet of other BaseFiles. Of course, instead each Job will now need to > keep track of where its writing. That should end up being a much leaner overhead > compared to the synchronization dance you'll need to do with having multiple > BaseFiles. IIUC I think this patch still assumes there will be one BaseFile. It's just adding support for writing to an offset and bypassing validation checks for the scenario where we might have multiple writers poking at the same file. > > https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... > File content/browser/download/base_file.cc (right): > > https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... > content/browser/download/base_file.cc:92: if (file_.Seek(base::File::FROM_BEGIN, > offset) < 0) { > base::File::Write() can be used to write to a specific offset without an > additional Seek() > > https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... > content/browser/download/base_file.cc:144: DownloadInterruptReason > BaseFile::Rename(const base::FilePath& new_path) { > Methods like Rename() and Annotate...() depend on this BaseFile being the sole > owner. > > Overall, BaseFile still assumes sole ownership since it will delete the file if > BaseFile is destroyed without calling Detach().
I stand corrected. LGTM, but I still feel that the distinction is whether the file is sparse or not, and not whether there are multiple writers. The latter is not a distinction made at the BaseFile layer. Do you think slice updating is going to move here? Otherwise BaseFile won't be able to verify that the Write()s are non-overlapping. https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... File content/browser/download/base_file.cc (right): https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... content/browser/download/base_file.cc:92: if (file_.Seek(base::File::FROM_BEGIN, offset) < 0) { On 2017/02/16 at 18:11:24, asanka wrote: > base::File::Write() can be used to write to a specific offset without an additional Seek() Disregard. https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... content/browser/download/base_file.cc:144: DownloadInterruptReason BaseFile::Rename(const base::FilePath& new_path) { On 2017/02/16 at 18:11:24, asanka wrote: > Methods like Rename() and Annotate...() depend on this BaseFile being the sole owner. > > Overall, BaseFile still assumes sole ownership since it will delete the file if BaseFile is destroyed without calling Detach(). Disregard. I misunderstood how BaseFile is intended to be used with parallel downloads. https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... File content/browser/download/base_file.h (right): https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... content/browser/download/base_file.h:40: SHARED, I'd argue that the distinction is that the file is sparse or not. https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... content/browser/download/base_file.h:211: // This function is only useful when |access_mode_| is EXCLUSIVE. It verifies Once the download is complete, we'd still need use this to calculate the hash of the entire file. Otherwise we'll break SafeBrowsing.
On 2017/02/16 18:46:26, asanka wrote: > I stand corrected. > > LGTM, but I still feel that the distinction is whether the file is sparse or > not, and not whether there are multiple writers. The latter is not a distinction > made at the BaseFile layer. > > Do you think slice updating is going to move here? Otherwise BaseFile won't be > able to verify that the Write()s are non-overlapping. > Yes, I have added a TODO in the CL > https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... > File content/browser/download/base_file.cc (right): > > https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... > content/browser/download/base_file.cc:92: if (file_.Seek(base::File::FROM_BEGIN, > offset) < 0) { > On 2017/02/16 at 18:11:24, asanka wrote: > > base::File::Write() can be used to write to a specific offset without an > additional Seek() > > Disregard. > > https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... > content/browser/download/base_file.cc:144: DownloadInterruptReason > BaseFile::Rename(const base::FilePath& new_path) { > On 2017/02/16 at 18:11:24, asanka wrote: > > Methods like Rename() and Annotate...() depend on this BaseFile being the sole > owner. > > > > Overall, BaseFile still assumes sole ownership since it will delete the file > if BaseFile is destroyed without calling Detach(). > > Disregard. I misunderstood how BaseFile is intended to be used with parallel > downloads. > > https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... > File content/browser/download/base_file.h (right): > > https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... > content/browser/download/base_file.h:40: SHARED, > I'd argue that the distinction is that the file is sparse or not. > > https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... > content/browser/download/base_file.h:211: // This function is only useful when > |access_mode_| is EXCLUSIVE. It verifies > Once the download is complete, we'd still need use this to calculate the hash of > the entire file. Otherwise we'll break SafeBrowsing.
Description was changed from ========== Refactor BaseFile class to support parallel writers To support parallel downloading, a file might have multiple writers at the same time. This CL refactors the BaseFile class to support multiple writers. It adds a method to allow callers to write to arbitrary offsets. And this CL skips all the hash validation logic if the |access_mode_| is SHARED. BUG=644352 ========== to ========== Refactor BaseFile class to support sparse files To support parallel downloading, a file might have multiple writers at the same time. This CL refactors the BaseFile class to support sparse files. It adds a method to allow callers to write to arbitrary offsets. And this CL skips all the hash validation logic if the |is_sparse_file_| is true. BUG=644352 ==========
https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... File content/browser/download/base_file.cc (right): https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... content/browser/download/base_file.cc:92: if (file_.Seek(base::File::FROM_BEGIN, offset) < 0) { On 2017/02/16 18:11:24, asanka wrote: > base::File::Write() can be used to write to a specific offset without an > additional Seek() I used this seek() to reuse some of code from AppendDataToFile(). For example, if I use Write(offset, data, len), when will have to check the return value, and do a for loop to write the remainder data, and update the bytes_so_far_. By doing a seek, i can reuse the while loop in AppendDataToFile(). It is also very strange that we actually have this while loop in our code. base::File::Write() already does a while loop for us. If it writes fewer bytes than asked, this means an error. So I don't understand why this comment is here "// The Write call below is not guaranteed to write all the data.". Maybe this is for some systems that have a flawed base::File::Write() implementation? I can fix this in a separate CL if this comment is incorrect. https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... content/browser/download/base_file.cc:144: DownloadInterruptReason BaseFile::Rename(const base::FilePath& new_path) { On 2017/02/16 18:11:24, asanka wrote: > Methods like Rename() and Annotate...() depend on this BaseFile being the sole > owner. > > Overall, BaseFile still assumes sole ownership since it will delete the file if > BaseFile is destroyed without calling Detach(). Yes, the BaseFile will still be owned by a DownloadFileImpl object and will be shared among the workers. So this CL is not going to create multiple BaseFiles for one physical file https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... File content/browser/download/base_file.h (right): https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... content/browser/download/base_file.h:40: SHARED, On 2017/02/16 18:46:26, asanka wrote: > I'd argue that the distinction is that the file is sparse or not. Done. Changed this to a boolean variable is_sparse_file. https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... content/browser/download/base_file.h:211: // This function is only useful when |access_mode_| is EXCLUSIVE. It verifies On 2017/02/16 18:46:26, asanka wrote: > Once the download is complete, we'd still need use this to calculate the hash of > the entire file. Otherwise we'll break SafeBrowsing. Added the code to calculate the hash when finish() is called.
https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... File content/browser/download/base_file.cc (right): https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... content/browser/download/base_file.cc:92: if (file_.Seek(base::File::FROM_BEGIN, offset) < 0) { On 2017/02/16 at 21:35:50, qinmin wrote: > On 2017/02/16 18:11:24, asanka wrote: > > base::File::Write() can be used to write to a specific offset without an > > additional Seek() > > I used this seek() to reuse some of code from AppendDataToFile(). For example, if I use Write(offset, data, len), when will have to check the return value, and do a for loop to write the remainder data, and update the bytes_so_far_. By doing a seek, i can reuse the while loop in AppendDataToFile(). > > It is also very strange that we actually have this while loop in our code. base::File::Write() already does a while loop for us. If it writes fewer bytes than asked, this means an error. So I don't understand why this comment is here "// The Write call below is not guaranteed to write all the data.". Maybe this is for some systems that have a flawed base::File::Write() implementation? I can fix this in a separate CL if this comment is incorrect. Probably because this code predates the logic in base::File ? We should remove the loop around the File::Write() calls. Then afterwards we can get rid of the additional Seek(). I'm okay with doing so in a separate CL.
https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... File content/browser/download/base_file.cc (right): https://codereview.chromium.org/2695153002/diff/100001/content/browser/downlo... content/browser/download/base_file.cc:92: if (file_.Seek(base::File::FROM_BEGIN, offset) < 0) { On 2017/02/16 21:58:02, asanka wrote: > On 2017/02/16 at 21:35:50, qinmin wrote: > > On 2017/02/16 18:11:24, asanka wrote: > > > base::File::Write() can be used to write to a specific offset without an > > > additional Seek() > > > > I used this seek() to reuse some of code from AppendDataToFile(). For example, > if I use Write(offset, data, len), when will have to check the return value, and > do a for loop to write the remainder data, and update the bytes_so_far_. By > doing a seek, i can reuse the while loop in AppendDataToFile(). > > > > It is also very strange that we actually have this while loop in our code. > base::File::Write() already does a while loop for us. If it writes fewer bytes > than asked, this means an error. So I don't understand why this comment is here > "// The Write call below is not guaranteed to write all the data.". Maybe this > is for some systems that have a flawed base::File::Write() implementation? I > can fix this in a separate CL if this comment is incorrect. > > Probably because this code predates the logic in base::File ? We should remove > the loop around the File::Write() calls. Then afterwards we can get rid of the > additional Seek(). I'm okay with doing so in a separate CL. Done. Removed the while loop around the Write() call and removed the seek() call.
Patchset #4 (id:140001) has been deleted
The CQ bit was checked by qinmin@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from asanka@chromium.org Link to the patchset: https://codereview.chromium.org/2695153002/#ps160001 (title: "remove the while loop around write")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 160001, "attempt_start_ts": 1487317842322850, "parent_rev": "29db19d0c973a8990283ec4b81111db5bdb8d2de", "commit_rev": "f456c408998b5178f9f72810f417d57ffacb6da2"}
Message was sent while issue was closed.
Description was changed from ========== Refactor BaseFile class to support sparse files To support parallel downloading, a file might have multiple writers at the same time. This CL refactors the BaseFile class to support sparse files. It adds a method to allow callers to write to arbitrary offsets. And this CL skips all the hash validation logic if the |is_sparse_file_| is true. BUG=644352 ========== to ========== Refactor BaseFile class to support sparse files To support parallel downloading, a file might have multiple writers at the same time. This CL refactors the BaseFile class to support sparse files. It adds a method to allow callers to write to arbitrary offsets. And this CL skips all the hash validation logic if the |is_sparse_file_| is true. BUG=644352 Review-Url: https://codereview.chromium.org/2695153002 Cr-Commit-Position: refs/heads/master@{#451276} Committed: https://chromium.googlesource.com/chromium/src/+/f456c408998b5178f9f72810f417... ==========
Message was sent while issue was closed.
Committed patchset #4 (id:160001) as https://chromium.googlesource.com/chromium/src/+/f456c408998b5178f9f72810f417... |