|
|
Created:
5 years, 10 months ago by Ben Kwa Modified:
5 years, 10 months ago CC:
chromium-reviews, extensions-reviews_chromium.org, nkostylev+watch_chromium.org, rginda+watch_chromium.org, oshima+watch_chromium.org, chromium-apps-reviews_chromium.org, stevenjb+watch_chromium.org, davemoore+watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionFiles.app: Implement md5 hashing over file streams.
Add a class to deal with the streaming/digesting, and hook it up. File streams are usable in a wider range of scenarios (like over MTP) so this enables hashing in those situations.
BUG=453880
Committed: https://crrev.com/eda76cb6f08db5d1d8dbb7a91959b080c56ccac9
Cr-Commit-Position: refs/heads/master@{#314576}
Patch Set 1 #
Total comments: 22
Patch Set 2 : Address feedback. Move I/O onto the IO thread. #
Total comments: 12
Patch Set 3 : Address feedback. #
Total comments: 2
Patch Set 4 : Add a DCHECK to verify that hash computation takes place on the IO thread. #
Total comments: 2
Patch Set 5 : Address feedback. #
Messages
Total messages: 23 (5 generated)
kenobi@chromium.org changed reviewers: + mtomasz@chromium.org, yoshiki@chromium.org
https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... File chrome/browser/drive/drive_api_util.cc (right): https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.cc:188: int result = reader_->Read(buffer_.get(), kBufferSize_, nit: const int https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.cc:188: int result = reader_->Read(buffer_.get(), kBufferSize_, IIRC reader may have to be called on IO thread only. But it's not documented whether GetMd5Digest() should be called on IO thread or UI. Could you double check? https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.cc:198: // Error - just return empty string nit: period at the end of comments. https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... File chrome/browser/drive/drive_api_util.h (right): https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:82: // Computes an MD5 digest of data read from the given |streamReader|. The nit: \n\n after destructor? https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:87: void GetMd5Digest(scoped_ptr<storage::FileStreamReader> streamReader, nit: stream_reader https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:88: const base::Callback<void(const std::string&)>& callback); nit: How about typedefing the callback within FileStreamMd5Digester? https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:92: void readNextChunk(); nit: read -> Read https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:98: static const int kBufferSize_ = 512 * 1024; // 512 kB. nit: This may be new if initializing static members in a header is allowed. Please double check. https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:98: static const int kBufferSize_ = 512 * 1024; // 512 kB. nit: Also, do we need it to be in the header file? How about moving to foobar.cc anonymous namespace? https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:98: static const int kBufferSize_ = 512 * 1024; // 512 kB. nit: Isn't the chunk too large? We're doing it on UI thread, which we don't want to block. https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:102: base::MD5Context md5Context_; nit: camelCase -> is_not_allowed
Patchset #2 (id:20001) has been deleted
Done. Reworking the I/O to occur on the IO thread was a fair bit of code, but things look a lot cleaner now. PTAL. Thanks! https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... File chrome/browser/drive/drive_api_util.cc (right): https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.cc:188: int result = reader_->Read(buffer_.get(), kBufferSize_, On 2015/01/31 04:53:12, mtomasz wrote: > nit: const int Done. https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.cc:188: int result = reader_->Read(buffer_.get(), kBufferSize_, On 2015/01/31 04:53:12, mtomasz wrote: > IIRC reader may have to be called on IO thread only. But it's not documented > whether GetMd5Digest() should be called on IO thread or UI. Could you double > check? Hm, you're right; I misread the FileStreamReader::Read doc. I reworked this to be on the IO thread. Salient points: - FileStreamMd5Digester is now a ref-counted object. This enables it to be used properly with refptr and bind. - The extension function no longer has to hold on to a FileStreamMd5Digester as a class member; it just instantiates one in RunAsync, and relies on refcounting to manage the lifetime of that object. https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.cc:198: // Error - just return empty string On 2015/01/31 04:53:12, mtomasz wrote: > nit: period at the end of comments. Done. https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... File chrome/browser/drive/drive_api_util.h (right): https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:82: // Computes an MD5 digest of data read from the given |streamReader|. The On 2015/01/31 04:53:12, mtomasz wrote: > nit: \n\n after destructor? Done. https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:87: void GetMd5Digest(scoped_ptr<storage::FileStreamReader> streamReader, On 2015/01/31 04:53:12, mtomasz wrote: > nit: stream_reader Done. https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:88: const base::Callback<void(const std::string&)>& callback); On 2015/01/31 04:53:12, mtomasz wrote: > nit: How about typedefing the callback within FileStreamMd5Digester? Done. https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:92: void readNextChunk(); On 2015/01/31 04:53:12, mtomasz wrote: > nit: read -> Read Done. https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:98: static const int kBufferSize_ = 512 * 1024; // 512 kB. On 2015/01/31 04:53:12, mtomasz wrote: > nit: This may be new if initializing static members in a header is allowed. > Please double check. The style guides don't explicitly say anything about this, but based on code search, it's pretty common. https://code.google.com/p/chromium/codesearch#search/&q=file:.*%5C.h%20%22sta... https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:98: static const int kBufferSize_ = 512 * 1024; // 512 kB. On 2015/01/31 04:53:12, mtomasz wrote: > nit: Also, do we need it to be in the header file? How about moving to foobar.cc > anonymous namespace? I figured that keeping this constant scoped within the class would be a good thing. I could move it to the .cc file, but then it'd just be a random constant in drive_api_util.cc, which I thought was more untidy. https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:98: static const int kBufferSize_ = 512 * 1024; // 512 kB. On 2015/01/31 04:53:12, mtomasz wrote: > nit: Isn't the chunk too large? We're doing it on UI thread, which we don't want > to block. So... this happens on the IO thread now. I asked around about it, the (incremental) cost of the hashing is small compared to the cost of the file read and so it doesn't really matter which thread the hashing occurs on. Better to keep it on the IO thread to reduce the amount of bouncing back and forth. https://codereview.chromium.org/875513006/diff/1/chrome/browser/drive/drive_a... chrome/browser/drive/drive_api_util.h:102: base::MD5Context md5Context_; On 2015/01/31 04:53:12, mtomasz wrote: > nit: camelCase -> is_not_allowed Done.
https://codereview.chromium.org/875513006/diff/40001/chrome/browser/chromeos/... File chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc (right): https://codereview.chromium.org/875513006/diff/40001/chrome/browser/chromeos/... chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc:742: scoped_refptr<drive::util::FileStreamMd5Digester> digester( You don't need to make it a ref counter class. We can keep it normal, and do base::Bind(&A::b, base::Unretained(b), ..., callback) if the "callback" outlives the "b" object. It is this case, as FileManagerPrivateComputeChecksumFunction is already ref counted. In general we try to avoid ref counted classes if not necessary in Chromium. http://www.chromium.org/developers/smart-pointer-guidelines https://codereview.chromium.org/875513006/diff/40001/chrome/browser/chromeos/... chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc:756: if (!BrowserThread::CurrentlyOn(BrowserThread::UI)) { This is a little bit hacky. Each method should be called on one thread, or any thread. I'd suggest to use PostTaskAndReply* which would call the callback on the same thread as caller. For reference see https://groups.google.com/a/chromium.org/d/topic/chromium-dev/uWGNk-hVlGc/dis... https://codereview.chromium.org/875513006/diff/40001/chrome/browser/drive/dri... File chrome/browser/drive/drive_api_util.cc (right): https://codereview.chromium.org/875513006/diff/40001/chrome/browser/drive/dri... chrome/browser/drive/drive_api_util.cc:214: ReadNextChunk(); For large files we will keep the IO thread busy for number of seconds. IIRC this thread is shared with IPC. While occupying IO thread is not as bad as UI, it may cause unresponsiveness of the browser. Please check with huge files if browser is still responsive. IIf it's fine, then let's go with the current solution. If it's not, then we may want to use a blocking thread pool. https://codereview.chromium.org/875513006/diff/40001/chrome/browser/drive/dri... File chrome/browser/drive/drive_api_util.h (right): https://codereview.chromium.org/875513006/diff/40001/chrome/browser/drive/dri... chrome/browser/drive/drive_api_util.h:103: static const int kBufferSize_ = 512 * 1024; // 512 kB. nit: Yeah, I'd suggest to move it to an anonymous namespace in .cc file. We always do that and try to make headers as small as possible (because we already have tons of them). We usually move everything what's possible from private section to anonymous namespace.
https://codereview.chromium.org/875513006/diff/40001/chrome/browser/chromeos/... File chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc (right): https://codereview.chromium.org/875513006/diff/40001/chrome/browser/chromeos/... chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc:742: scoped_refptr<drive::util::FileStreamMd5Digester> digester( On 2015/02/02 23:47:07, mtomasz wrote: > You don't need to make it a ref counter class. We can keep it normal, and do > base::Bind(&A::b, base::Unretained(b), ..., callback) if the "callback" outlives > the "b" object. It is this case, as FileManagerPrivateComputeChecksumFunction is > already ref counted. > > In general we try to avoid ref counted classes if not necessary in Chromium. > > http://www.chromium.org/developers/smart-pointer-guidelines So, you're suggesting making the FileStreamMd5Digester a member of the FileManagerPrivateComputeChecksumFunction? Otherwise it goes out of scope before the bound closure gets executed. I figured it was cleaner to just construct it here and have it live only as long as necessary, but I can do the member thing if you prefer that. https://codereview.chromium.org/875513006/diff/40001/chrome/browser/chromeos/... chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc:756: if (!BrowserThread::CurrentlyOn(BrowserThread::UI)) { On 2015/02/02 23:47:07, mtomasz wrote: > This is a little bit hacky. Each method should be called on one thread, or any > thread. I'd suggest to use PostTaskAndReply* which would call the callback on > the same thread as caller. > > For reference see > https://groups.google.com/a/chromium.org/d/topic/chromium-dev/uWGNk-hVlGc/dis... I don't think I can use PostTaskAndReply, because the MD5 hashing occurs asynchronously. Is there some utility for waiting on the async reads and returning synchronously? I don't know that that's really the best solution anyway - it seems cleaner to me to write code that accounts for the asynchronous nature of what's happening. Would it satisfy you to just take out this if statement and just always explicitly specify which function is occurring on which thread?
https://codereview.chromium.org/875513006/diff/40001/chrome/browser/chromeos/... File chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc (right): https://codereview.chromium.org/875513006/diff/40001/chrome/browser/chromeos/... chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc:742: scoped_refptr<drive::util::FileStreamMd5Digester> digester( On 2015/02/03 06:28:54, Ben Kwa wrote: > On 2015/02/02 23:47:07, mtomasz wrote: > > You don't need to make it a ref counter class. We can keep it normal, and do > > base::Bind(&A::b, base::Unretained(b), ..., callback) if the "callback" > outlives > > the "b" object. It is this case, as FileManagerPrivateComputeChecksumFunction > is > > already ref counted. > > > > In general we try to avoid ref counted classes if not necessary in Chromium. > > > > http://www.chromium.org/developers/smart-pointer-guidelines > > So, you're suggesting making the FileStreamMd5Digester a member of the > FileManagerPrivateComputeChecksumFunction? Otherwise it goes out of scope > before the bound closure gets executed. I figured it was cleaner to just > construct it here and have it live only as long as necessary, but I can do the > member thing if you prefer that. We use members for that, as the FileManagerPrivateComputeChecksumFunction instance is created per call. https://codereview.chromium.org/875513006/diff/40001/chrome/browser/chromeos/... chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc:756: if (!BrowserThread::CurrentlyOn(BrowserThread::UI)) { On 2015/02/03 06:28:54, Ben Kwa wrote: > On 2015/02/02 23:47:07, mtomasz wrote: > > This is a little bit hacky. Each method should be called on one thread, or any > > thread. I'd suggest to use PostTaskAndReply* which would call the callback on > > the same thread as caller. > > > > For reference see > > > https://groups.google.com/a/chromium.org/d/topic/chromium-dev/uWGNk-hVlGc/dis... > > I don't think I can use PostTaskAndReply, because the MD5 hashing occurs > asynchronously. Is there some utility for waiting on the async reads and > returning synchronously? I don't know that that's really the best solution > anyway - it seems cleaner to me to write code that accounts for the asynchronous > nature of what's happening. Would it satisfy you to just take out this if > statement and just always explicitly specify which function is occurring on > which thread? You're right. I'd suggest to repost then in an anonymous namespace. Eg. void OnMd5ChecksumOnIOThread(const std::string& checksum, Callback callback) { PostTask(UI, FROM_HERE, checksum, callback); } or sth like that, then: PostTask(IO, FROM_HERE, base::Bind(&Digester::GetMd5Digest, base::Unretained(digester_), base::Bind(&OnMd5ChecksumOnIOThread, base::Bind(&FileManagerPrivateChecksumFunction::Respond, this))); It's a pretty common pattern in Chromium.
Done - PTAL. Thanks! https://codereview.chromium.org/875513006/diff/40001/chrome/browser/chromeos/... File chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc (right): https://codereview.chromium.org/875513006/diff/40001/chrome/browser/chromeos/... chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc:742: scoped_refptr<drive::util::FileStreamMd5Digester> digester( On 2015/02/03 10:36:53, mtomasz wrote: > On 2015/02/03 06:28:54, Ben Kwa wrote: > > On 2015/02/02 23:47:07, mtomasz wrote: > > > You don't need to make it a ref counter class. We can keep it normal, and do > > > base::Bind(&A::b, base::Unretained(b), ..., callback) if the "callback" > > outlives > > > the "b" object. It is this case, as > FileManagerPrivateComputeChecksumFunction > > is > > > already ref counted. > > > > > > In general we try to avoid ref counted classes if not necessary in Chromium. > > > > > > http://www.chromium.org/developers/smart-pointer-guidelines > > > > So, you're suggesting making the FileStreamMd5Digester a member of the > > FileManagerPrivateComputeChecksumFunction? Otherwise it goes out of scope > > before the bound closure gets executed. I figured it was cleaner to just > > construct it here and have it live only as long as necessary, but I can do the > > member thing if you prefer that. > > We use members for that, as the FileManagerPrivateComputeChecksumFunction > instance is created per call. OK, done. https://codereview.chromium.org/875513006/diff/40001/chrome/browser/chromeos/... chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc:756: if (!BrowserThread::CurrentlyOn(BrowserThread::UI)) { On 2015/02/03 10:36:53, mtomasz wrote: > On 2015/02/03 06:28:54, Ben Kwa wrote: > > On 2015/02/02 23:47:07, mtomasz wrote: > > > This is a little bit hacky. Each method should be called on one thread, or > any > > > thread. I'd suggest to use PostTaskAndReply* which would call the callback > on > > > the same thread as caller. > > > > > > For reference see > > > > > > https://groups.google.com/a/chromium.org/d/topic/chromium-dev/uWGNk-hVlGc/dis... > > > > I don't think I can use PostTaskAndReply, because the MD5 hashing occurs > > asynchronously. Is there some utility for waiting on the async reads and > > returning synchronously? I don't know that that's really the best solution > > anyway - it seems cleaner to me to write code that accounts for the > asynchronous > > nature of what's happening. Would it satisfy you to just take out this if > > statement and just always explicitly specify which function is occurring on > > which thread? > > You're right. I'd suggest to repost then in an anonymous namespace. > > Eg. > > void OnMd5ChecksumOnIOThread(const std::string& checksum, Callback callback) { > PostTask(UI, FROM_HERE, checksum, callback); > } > > or sth like that, then: > > PostTask(IO, FROM_HERE, base::Bind(&Digester::GetMd5Digest, > base::Unretained(digester_), base::Bind(&OnMd5ChecksumOnIOThread, > base::Bind(&FileManagerPrivateChecksumFunction::Respond, this))); > > It's a pretty common pattern in Chromium. Done. https://codereview.chromium.org/875513006/diff/40001/chrome/browser/drive/dri... File chrome/browser/drive/drive_api_util.cc (right): https://codereview.chromium.org/875513006/diff/40001/chrome/browser/drive/dri... chrome/browser/drive/drive_api_util.cc:214: ReadNextChunk(); On 2015/02/02 23:47:07, mtomasz wrote: > For large files we will keep the IO thread busy for number of seconds. IIRC this > thread is shared with IPC. While occupying IO thread is not as bad as UI, it may > cause unresponsiveness of the browser. Please check with huge files if browser > is still responsive. IIf it's fine, then let's go with the current solution. If > it's not, then we may want to use a blocking thread pool. Will investigate this today and update with what I find. https://codereview.chromium.org/875513006/diff/40001/chrome/browser/drive/dri... File chrome/browser/drive/drive_api_util.h (right): https://codereview.chromium.org/875513006/diff/40001/chrome/browser/drive/dri... chrome/browser/drive/drive_api_util.h:103: static const int kBufferSize_ = 512 * 1024; // 512 kB. On 2015/02/02 23:47:08, mtomasz wrote: > nit: Yeah, I'd suggest to move it to an anonymous namespace in .cc file. We > always do that and try to make headers as small as possible (because we already > have tons of them). We usually move everything what's possible from private > section to anonymous namespace. Done.
lgtm with nit if no perf problems for large files on a slow chromebook. https://codereview.chromium.org/875513006/diff/60001/chrome/browser/chromeos/... File chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc (right): https://codereview.chromium.org/875513006/diff/60001/chrome/browser/chromeos/... chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc:219: BrowserThread::PostTask(BrowserThread::UI, FROM_HERE, nit: Please add a dcheck that we're on IO.
New patchsets have been uploaded after l-g-t-m from mtomasz@chromium.org
On 2015/02/04 01:32:30, mtomasz wrote: > lgtm with nit if no perf problems for large files on a slow chromebook. > > https://codereview.chromium.org/875513006/diff/60001/chrome/browser/chromeos/... > File chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc > (right): > > https://codereview.chromium.org/875513006/diff/60001/chrome/browser/chromeos/... > chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc:219: > BrowserThread::PostTask(BrowserThread::UI, FROM_HERE, > nit: Please add a dcheck that we're on IO. On a Spring device, hashing a ~300MB video file in-place on a mounted SD card takes approximately 17-19 seconds. During that time, I was able to play videos on youtube.com with no hiccups in playback.
https://codereview.chromium.org/875513006/diff/60001/chrome/browser/chromeos/... File chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc (right): https://codereview.chromium.org/875513006/diff/60001/chrome/browser/chromeos/... chrome/browser/chromeos/extensions/file_manager/private_api_file_system.cc:219: BrowserThread::PostTask(BrowserThread::UI, FROM_HERE, On 2015/02/04 01:32:30, mtomasz wrote: > nit: Please add a dcheck that we're on IO. Done.
Another performance note: Hashing the same file on the same device takes approximately 60s over MTP. Still no visible performance problems while that is in progress.
lgtm with tiny nit. Thanks. https://codereview.chromium.org/875513006/diff/70001/chrome/browser/drive/dri... File chrome/browser/drive/drive_api_util.cc (right): https://codereview.chromium.org/875513006/diff/70001/chrome/browser/drive/dri... chrome/browser/drive/drive_api_util.cc:191: if (result != net::ERR_IO_PENDING) { nit: we don't need braces.
New patchsets have been uploaded after l-g-t-m from yoshiki@chromium.org
Done. Thanks! https://codereview.chromium.org/875513006/diff/70001/chrome/browser/drive/dri... File chrome/browser/drive/drive_api_util.cc (right): https://codereview.chromium.org/875513006/diff/70001/chrome/browser/drive/dri... chrome/browser/drive/drive_api_util.cc:191: if (result != net::ERR_IO_PENDING) { On 2015/02/04 16:33:57, yoshiki wrote: > nit: we don't need braces. Done.
The CQ bit was checked by kenobi@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/875513006/90001
Message was sent while issue was closed.
Committed patchset #5 (id:90001)
Message was sent while issue was closed.
Patchset 5 (id:??) landed as https://crrev.com/eda76cb6f08db5d1d8dbb7a91959b080c56ccac9 Cr-Commit-Position: refs/heads/master@{#314576}
Message was sent while issue was closed.
The /analyze builder pointed out that FileStreamMd5Digester::OnChunkRead contains two variables called 'result'. One is an 'int' and the other is a string. There's no bug, but it is a bit... weird. Consider renaming one of them in a future change?
Message was sent while issue was closed.
On 2015/02/06 02:03:36, brucedawson wrote: > The /analyze builder pointed out that FileStreamMd5Digester::OnChunkRead > contains two variables called 'result'. One is an 'int' and the other is a > string. > > There's no bug, but it is a bit... weird. Consider renaming one of them in a > future change? eew, thanks for pointing that out. Will fix. |