|
|
Created:
5 years, 9 months ago by xc Modified:
5 years, 9 months ago CC:
chromium-reviews, chromium-apps-reviews_chromium.org, extensions-reviews_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionFlush the extension files to disk before finalize installation.
At least on Chrome OS devices, we have experienced extension
installation with corrupted files. Flushing the files to disk before
renaming the directory should minimize (potentially eliminate) the
possiblity of corrupted installs.
BUG=463987
TEST=unit tests
Committed: https://crrev.com/0b1f9920ac6dd13d4987e41b745bae8601324498
Cr-Commit-Position: refs/heads/master@{#321006}
Patch Set 1 #
Total comments: 4
Patch Set 2 : update comments #
Total comments: 4
Patch Set 3 : flush again after move #
Total comments: 3
Patch Set 4 : refector out a local helper method for dir flush #
Total comments: 4
Patch Set 5 : respond to comments and add an experiment guard. #
Total comments: 6
Patch Set 6 : add space #Patch Set 7 : add histogram #
Total comments: 4
Patch Set 8 : add webstore download time metric #
Total comments: 4
Patch Set 9 : handle clock going backward #
Total comments: 2
Patch Set 10 : #
Messages
Total messages: 39 (5 generated)
xiaohuic@chromium.org changed reviewers: + tnagel@chromium.org, xiyuan@chromium.org
I will add the owner once you guys give lgtm.
The code changes lgtm but I've some suggestions to improve the comments. And in the commit log, I'd suggest to change "minimize (potentially eliminate) the possiblity" to "significantly reduce the probability" because even when we protect against *file* corruption, there's still the possibility of *filesystem* corruption that we don't protect against. (We might consider mounting the fs with data=journal on kiosk devices, but that is out of the scope of this CL.) https://codereview.chromium.org/998663002/diff/1/extensions/common/file_util.cc File extensions/common/file_util.cc (right): https://codereview.chromium.org/998663002/diff/1/extensions/common/file_util.... extensions/common/file_util.cc:107: // extension to be in a corrupted state. Could you please mention that empty subdirectories are not flushed to disk? After discussion with bartfab@, I don't think that this poses a problem in practice (it's hard to find a reason why an extension would rely on an empty subdirectory) but nevertheless that fact should be mentioned for completeness. https://codereview.chromium.org/998663002/diff/1/extensions/common/file_util.... extensions/common/file_util.cc:120: // rename() which is atomic and safe. Could you please update this comment: 1. base::Move() is only guaranteed to be atomic on POSIX. 2. Please be more specific about "safe". base::Move() is not guaranteed to be persistent, i.e. after power loss, the move may be reverted. Thus I think it would be useful to explain why this cannot pose a problem.
xiyuan@chromium.org changed reviewers: + asargent@chromium.org
+asargent LGTM but please also wait for Antony.
https://codereview.chromium.org/998663002/diff/1/extensions/common/file_util.cc File extensions/common/file_util.cc (right): https://codereview.chromium.org/998663002/diff/1/extensions/common/file_util.... extensions/common/file_util.cc:107: // extension to be in a corrupted state. On 2015/03/11 13:32:50, Thiemo Nagel wrote: > Could you please mention that empty subdirectories are not flushed to disk? > After discussion with bartfab@, I don't think that this poses a problem in > practice (it's hard to find a reason why an extension would rely on an empty > subdirectory) but nevertheless that fact should be mentioned for completeness. Done. https://codereview.chromium.org/998663002/diff/1/extensions/common/file_util.... extensions/common/file_util.cc:120: // rename() which is atomic and safe. On 2015/03/11 13:32:50, Thiemo Nagel wrote: > Could you please update this comment: > 1. base::Move() is only guaranteed to be atomic on POSIX. > 2. Please be more specific about "safe". base::Move() is not guaranteed to be > persistent, i.e. after power loss, the move may be reverted. Thus I think it > would be useful to explain why this cannot pose a problem. Updated the comment. Reading the code, the extension installation includes this file directory and at least a prefs store entry. Since prefs store maybe flushed at different times, I am not sure we will ever be completely safe here. I am not an expert on extension system. It would be nice our code can detect extension installation fail during load time and revert back to the old version. But overall this cl should make it safer than before.
https://codereview.chromium.org/998663002/diff/20001/extensions/common/file_u... File extensions/common/file_util.cc (right): https://codereview.chromium.org/998663002/diff/20001/extensions/common/file_u... extensions/common/file_util.cc:108: // may still be loss. s/loss/lost/ https://codereview.chromium.org/998663002/diff/20001/extensions/common/file_u... extensions/common/file_util.cc:123: // But there might still be a problem when the ExtensionPrefs is out of sync I've run some tests on this: From what I've seen, prefs are updated after InstallExtension() has completed. This means that we're safe in case system/browser crash happens between Move() and pref update. However, we're not safe if the crash happens after pref update and the kernel re-orders the writes so that the updated pref touches the disk first and the move metadata update later. But there seems to be a possibility for mitigation: After the call to base::Move(), flush *any* of the files contained therein (or in subfolders). In my tests on Chrome OS, this causes the Move() to be persisted to disk. I have no idea whether that would work on Windows, but at least we would be covered on Chrome OS (and hopefully all of POSIX). One thing remains: We must make sure that the extension garbage collector only is run once the pref update has been flushed to disk.
https://codereview.chromium.org/998663002/diff/20001/extensions/common/file_u... File extensions/common/file_util.cc (right): https://codereview.chromium.org/998663002/diff/20001/extensions/common/file_u... extensions/common/file_util.cc:108: // may still be loss. On 2015/03/12 13:22:35, Thiemo Nagel wrote: > s/loss/lost/ Done. https://codereview.chromium.org/998663002/diff/20001/extensions/common/file_u... extensions/common/file_util.cc:123: // But there might still be a problem when the ExtensionPrefs is out of sync On 2015/03/12 13:22:35, Thiemo Nagel wrote: > I've run some tests on this: From what I've seen, prefs are updated after > InstallExtension() has completed. This means that we're safe in case > system/browser crash happens between Move() and pref update. However, we're not > safe if the crash happens after pref update and the kernel re-orders the writes > so that the updated pref touches the disk first and the move metadata update > later. > > But there seems to be a possibility for mitigation: After the call to > base::Move(), flush *any* of the files contained therein (or in subfolders). In > my tests on Chrome OS, this causes the Move() to be persisted to disk. I have > no idea whether that would work on Windows, but at least we would be covered on > Chrome OS (and hopefully all of POSIX). > > One thing remains: We must make sure that the extension garbage collector only > is run once the pref update has been flushed to disk. Done. I thought of the same thing last night :)
> Done. I thought of the same thing last night :) Great! LGTM!
Ping Antony, please take a look.
Sorry for latency. A couple of high level questions: 1) In the bug entry, xiyuan mentioned we probably wanted to use the "write-fdatasync-close-rename-dirsync" strategy, but it looks like this CL doesn't do the dirsync part, unless running the fsync on the files in the new location accomplishes the same thing? 2) It seems like a shame to have to run the fsync's in serial. Is there any way to benefit from parallelism? For instance, are there non-posix linux (or newer posix) APIs that let you pass a number of file descriptors to be synced at once? 3) It would be interesting to capture some performance data, perhaps with a histogram or something. I guess if we believe this is necessary to avoid corruption, it's not like we want to run an A/B test for very long, but it might be interesting to run one for a short period of time to help understand the difference in user experience. For instance, perhaps if doing this slows install down significantly, we might consider modifying the UI in some way (I'm not sure off the top of my head if the spinner keeps going once we've completed the network download, for instance). https://codereview.chromium.org/998663002/diff/40001/extensions/common/file_u... File extensions/common/file_util.cc (right): https://codereview.chromium.org/998663002/diff/40001/extensions/common/file_u... extensions/common/file_util.cc:143: break; Why break here? Did you intend to only flush the first file in the traversal? Or did you accidentally leave this in? Also, it seems like this block is really similar to, or possibly identical to, the above one (lines 109-118), modulo the answer about the break statement. It probably makes sense to have a helper method to do the common stuff. And if this pattern proves useful elsewhere, it might make sense to propose moving that helper into somewhere in base/files/ - maybe named "DurableDirectoryCopy" or something like that. https://codereview.chromium.org/998663002/diff/40001/extensions/common/file_u... extensions/common/file_util.cc:169: nit: spurious inserted line?
> [...] unless running the fsync on the files in the new > location accomplishes the same thing? That's exactly what it does (at least according to my tests on Linux/ext4). > 2) It seems like a shame to have to run the fsync's in serial. Is there any way > to benefit from parallelism? For instance, are there non-posix linux (or newer > posix) APIs that let you pass a number of file descriptors to be synced at once? From reading the man pages, it should be possible to use aio_fsync() to leverage Linux/ext4 capability of batching syncs, cf. max_batch_time in https://www.kernel.org/doc/Documentation/filesystems/ext4.txt . > 3) It would be interesting to capture some performance data, perhaps with a > histogram or something. +1 https://codereview.chromium.org/998663002/diff/40001/extensions/common/file_u... File extensions/common/file_util.cc (right): https://codereview.chromium.org/998663002/diff/40001/extensions/common/file_u... extensions/common/file_util.cc:143: break; On 2015/03/12 23:47:26, Antony Sargent wrote: > Why break here? Did you intend to only flush the first file in the traversal? This is intended. On Linux/ext4 (and hopefully all of POSIX), flushing a file effectively does a dirsync to all parent directories. > And if this > pattern proves useful elsewhere, it might make sense to propose moving that > helper into somewhere in base/files/ - maybe named "DurableDirectoryCopy" or > something like that. +1 to that.
Thanks Antony for the review. Do you think the cl is commitable after adding the experiment? Or do we need to wait for getting everything into base/files/? On 2015/03/12 23:47:26, Antony Sargent wrote: > Sorry for latency. > > A couple of high level questions: > > 1) In the bug entry, xiyuan mentioned we probably wanted to use the > "write-fdatasync-close-rename-dirsync" strategy, but it looks like this CL > doesn't do the dirsync part, unless running the fsync on the files in the new > location accomplishes the same thing? Yes, as Thiemo said. > > 2) It seems like a shame to have to run the fsync's in serial. Is there any way > to benefit from parallelism? For instance, are there non-posix linux (or newer > posix) APIs that let you pass a number of file descriptors to be synced at once? > I will look into that. But if it exists, it might not be cross platform and should probably live in base/files/. I will consolidate this topic when talking to base/files/ owner. > 3) It would be interesting to capture some performance data, perhaps with a > histogram or something. I guess if we believe this is necessary to avoid > corruption, it's not like we want to run an A/B test for very long, but it might > be interesting to run one for a short period of time to help understand the > difference in user experience. For instance, perhaps if doing this slows install > down significantly, we might consider modifying the UI in some way (I'm not > sure off the top of my head if the spinner keeps going once we've completed the > network download, for instance). Sounds good. I will add an experiment to control this. > > https://codereview.chromium.org/998663002/diff/40001/extensions/common/file_u... > File extensions/common/file_util.cc (right): > > https://codereview.chromium.org/998663002/diff/40001/extensions/common/file_u... > extensions/common/file_util.cc:143: break; > Why break here? Did you intend to only flush the first file in the traversal? Or > did you accidentally leave this in? As Thiemo answered. > > Also, it seems like this block is really similar to, or possibly identical to, > the above one (lines 109-118), modulo the answer about the break statement. It > probably makes sense to have a helper method to do the common stuff. And if this > pattern proves useful elsewhere, it might make sense to propose moving that > helper into somewhere in base/files/ - maybe named "DurableDirectoryCopy" or > something like that. > Adding a helper method. > https://codereview.chromium.org/998663002/diff/40001/extensions/common/file_u... > extensions/common/file_util.cc:169: > nit: spurious inserted line? Done
I don't think you need to wait to get the helper code into base/ - that would be fine to do as follow-up work (and they may advise you to wait putting it into base/ until there are more places that would consume it) https://codereview.chromium.org/998663002/diff/60001/extensions/common/file_u... File extensions/common/file_util.cc (right): https://codereview.chromium.org/998663002/diff/60001/extensions/common/file_u... extensions/common/file_util.cc:57: void FlushFilesInDir(const base::FilePath& path, bool one_file_only) { optional suggestion: you might consider using an enum instead of a bool for the one_file_only parameter, e.g. enum FlushOneOrAllFiles { ONE_FILE_ONLY, ALL_FILES }; then calls would look like FlushFilesInDir(crx_temp_source, ALL_FILES) and FlushFilesInDir(version_dir, ONE_FILE_ONLY)
https://codereview.chromium.org/998663002/diff/60001/extensions/common/file_u... File extensions/common/file_util.cc (right): https://codereview.chromium.org/998663002/diff/60001/extensions/common/file_u... extensions/common/file_util.cc:61: for(base::FilePath current = temp_traversal.Next(); !current.empty(); Nit: please insert blank after "for"
https://codereview.chromium.org/998663002/diff/60001/extensions/common/file_u... File extensions/common/file_util.cc (right): https://codereview.chromium.org/998663002/diff/60001/extensions/common/file_u... extensions/common/file_util.cc:57: void FlushFilesInDir(const base::FilePath& path, bool one_file_only) { On 2015/03/13 18:40:22, Antony Sargent wrote: > optional suggestion: you might consider using an enum instead of a bool for the > one_file_only parameter, e.g. > > enum FlushOneOrAllFiles { > ONE_FILE_ONLY, > ALL_FILES > }; > > then calls would look like > > FlushFilesInDir(crx_temp_source, ALL_FILES) > > and > > FlushFilesInDir(version_dir, ONE_FILE_ONLY) > > > Done. https://codereview.chromium.org/998663002/diff/60001/extensions/common/file_u... extensions/common/file_util.cc:61: for(base::FilePath current = temp_traversal.Next(); !current.empty(); On 2015/03/16 16:07:02, Thiemo Nagel wrote: > Nit: please insert blank after "for" Done.
https://codereview.chromium.org/998663002/diff/80001/extensions/common/file_u... File extensions/common/file_util.cc (right): https://codereview.chromium.org/998663002/diff/80001/extensions/common/file_u... extensions/common/file_util.cc:79: for(base::FilePath current = temp_traversal.Next(); !current.empty(); Nit: could you please insert a blank after the "for"? ;-)
https://codereview.chromium.org/998663002/diff/80001/extensions/common/file_u... File extensions/common/file_util.cc (right): https://codereview.chromium.org/998663002/diff/80001/extensions/common/file_u... extensions/common/file_util.cc:79: for(base::FilePath current = temp_traversal.Next(); !current.empty(); On 2015/03/16 17:57:31, Thiemo Nagel wrote: > Nit: could you please insert a blank after the "for"? ;-) Done.
Having the experiment is great - I think it would also be good to have a histogram to measure the performance (if there isn't one already measuring the time spent through this codepath). https://codereview.chromium.org/998663002/diff/80001/extensions/common/file_u... File extensions/common/file_util.cc (right): https://codereview.chromium.org/998663002/diff/80001/extensions/common/file_u... extensions/common/file_util.cc:151: // either complete successfully or in the event of data loss being reverted. I have a few grammar nits - the comment should read something like this: "The target version_dir does not exist yet, so base::Move uses rename() on POSIX systems. It is atomic in the sense that it will either complete successfully, or in the event of data loss, be reverted." https://codereview.chromium.org/998663002/diff/80001/extensions/common/file_u... extensions/common/file_util.cc:158: // Flush any file in the new version_dir to make sure the dir move above is suggestion: change "Flush any file" to "Flush one file"
Added a histogram to measure exactly the file installation time here. https://codereview.chromium.org/998663002/diff/80001/extensions/common/file_u... File extensions/common/file_util.cc (right): https://codereview.chromium.org/998663002/diff/80001/extensions/common/file_u... extensions/common/file_util.cc:151: // either complete successfully or in the event of data loss being reverted. On 2015/03/16 19:31:02, Antony Sargent wrote: > I have a few grammar nits - the comment should read something like this: > > "The target version_dir does not exist yet, so base::Move uses rename() on POSIX > systems. It is atomic in the sense that it will either complete successfully, or > in the event of data loss, be reverted." > > Done. https://codereview.chromium.org/998663002/diff/80001/extensions/common/file_u... extensions/common/file_util.cc:158: // Flush any file in the new version_dir to make sure the dir move above is On 2015/03/16 19:31:02, Antony Sargent wrote: > suggestion: change "Flush any file" to "Flush one file" Done.
xiaohuic@chromium.org changed reviewers: + mpearson@chromium.org
Add histograms owner for owner's review.
https://codereview.chromium.org/998663002/diff/120001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/998663002/diff/120001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:8879: + location to the final installation directory. please explicitly say when this is recorded. Once per extension install? https://codereview.chromium.org/998663002/diff/120001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:8879: + location to the final installation directory. This comment makes it seem like it counts only the move time, not the flush time. Please revise either the description or the code.
It might make sense to measure the download and unpack times, too, so that the flush time can be put into perspective.
On 2015/03/17 17:12:26, Thiemo Nagel wrote: > It might make sense to measure the download and unpack times, too, so that the > flush time can be put into perspective. Added webstore success download time metric, unpack time already exists.
https://codereview.chromium.org/998663002/diff/120001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/998663002/diff/120001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:8879: + location to the final installation directory. On 2015/03/16 21:59:42, Mark P wrote: > please explicitly say when this is recorded. Once per extension install? Done. https://codereview.chromium.org/998663002/diff/120001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:8879: + location to the final installation directory. On 2015/03/16 21:59:42, Mark P wrote: > This comment makes it seem like it counts only the move time, not the flush > time. Please revise either the description or the code. Done.
lgtm
https://codereview.chromium.org/998663002/diff/140001/chrome/browser/extensio... File chrome/browser/extensions/webstore_installer.cc (right): https://codereview.chromium.org/998663002/diff/140001/chrome/browser/extensio... chrome/browser/extensions/webstore_installer.cc:703: I'd suggest to add "if (download.GetEndTime() >= download.GetStartTime())" since these are base::Time and thus may run backwards.
https://codereview.chromium.org/998663002/diff/140001/chrome/browser/extensio... File chrome/browser/extensions/webstore_installer.cc (right): https://codereview.chromium.org/998663002/diff/140001/chrome/browser/extensio... chrome/browser/extensions/webstore_installer.cc:703: On 2015/03/17 20:41:55, Thiemo Nagel wrote: > I'd suggest to add "if (download.GetEndTime() >= download.GetStartTime())" since > these are base::Time and thus may run backwards. Ah, that's a good point. You might also consider using base::TimeTicks which is much less likely to be able to run backwards.
https://codereview.chromium.org/998663002/diff/140001/chrome/browser/extensio... File chrome/browser/extensions/webstore_installer.cc (right): https://codereview.chromium.org/998663002/diff/140001/chrome/browser/extensio... chrome/browser/extensions/webstore_installer.cc:703: On 2015/03/17 20:41:55, Thiemo Nagel wrote: > I'd suggest to add "if (download.GetEndTime() >= download.GetStartTime())" since > these are base::Time and thus may run backwards. Done. https://codereview.chromium.org/998663002/diff/140001/chrome/browser/extensio... chrome/browser/extensions/webstore_installer.cc:703: On 2015/03/17 20:46:49, Antony Sargent wrote: > On 2015/03/17 20:41:55, Thiemo Nagel wrote: > > I'd suggest to add "if (download.GetEndTime() >= download.GetStartTime())" > since > > these are base::Time and thus may run backwards. > > Ah, that's a good point. You might also consider using base::TimeTicks which is > much less likely to be able to run backwards. Didn't find TimeTicks on the download object or conversion from base:Time. Added the if check.
histograms.xml lgtm
Thank you for your patience, Xiaohui! https://codereview.chromium.org/998663002/diff/160001/chrome/browser/extensio... File chrome/browser/extensions/webstore_installer.cc (right): https://codereview.chromium.org/998663002/diff/160001/chrome/browser/extensio... chrome/browser/extensions/webstore_installer.cc:705: if (download.GetEndTime() > download.GetStartTime()) { nit: suggest ">=" instead of ">" since the download might be very quick
https://codereview.chromium.org/998663002/diff/160001/chrome/browser/extensio... File chrome/browser/extensions/webstore_installer.cc (right): https://codereview.chromium.org/998663002/diff/160001/chrome/browser/extensio... chrome/browser/extensions/webstore_installer.cc:705: if (download.GetEndTime() > download.GetStartTime()) { On 2015/03/17 21:09:37, Thiemo Nagel wrote: > nit: suggest ">=" instead of ">" since the download might be very quick Done.
The CQ bit was checked by xiaohuic@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from xiyuan@chromium.org, tnagel@chromium.org, asargent@chromium.org, mpearson@chromium.org Link to the patchset: https://codereview.chromium.org/998663002/#ps180001 (title: " ")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/998663002/180001
Message was sent while issue was closed.
Committed patchset #10 (id:180001)
Message was sent while issue was closed.
Patchset 10 (id:??) landed as https://crrev.com/0b1f9920ac6dd13d4987e41b745bae8601324498 Cr-Commit-Position: refs/heads/master@{#321006} |