|
|
Created:
3 years, 7 months ago by Maks Orlovich Modified:
3 years, 7 months ago CC:
chromium-reviews, cbentzel+watch_chromium.org, gavinp+disk_chromium.org, net-reviews_chromium.org Target Ref:
refs/heads/master Project:
chromium Visibility:
Public. |
DescriptionSimpleCache: Tolerate weird file sizes when doing index recovery.
(Rather than __builtin_trap-crashing)
BUG=712326
Review-Url: https://codereview.chromium.org/2878533002
Cr-Commit-Position: refs/heads/master@{#474055}
Committed: https://chromium.googlesource.com/chromium/src/+/2ea7908fea095d9644122e8391ed2fb87fd2e621
Patch Set 1 #
Total comments: 7
Patch Set 2 : Rework comment, add check on size in open + UMA enum for that failure #Patch Set 3 : Woops, the test was meant to be a different CL. #
Messages
Total messages: 43 (22 generated)
The CQ bit was checked by morlovich@chromium.org to run a CQ dry run
Description was changed from ========== SimpleCache: Tolerate weird file sizes when doing index recovery. (Rather than SIGILL-crashing) BUG=712326 ========== to ========== SimpleCache: Tolerate weird file sizes when doing index recovery. (Rather than __builtin_trap-crashing) BUG=712326 ==========
morlovich@chromium.org changed reviewers: + pasko@chromium.org
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
ChromeOS build failure is: ../../ui/views/view.h:22:10: fatal error: 'ui/accessibility/ax_enums.h' file not found #include "ui/accessibility/ax_enums.h" ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... don't think I need to worry about that
this only happens when stat(2) returned garbage in st_size during index scan, right? I guess it is likely that it will return the same garbage when the entry is open, and we would produce a cache read error in this case, which has a change to pop into something user-facing. I think to avoid this unfortunate (even though rare) course of events, we should rather attempt to delete the entry file(s) and avoid putting the entry to the index. This way we would avoid showing corruption errors to the user. WDYT? https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... File net/disk_cache/simple/simple_index_file.cc (right): https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... net/disk_cache/simple/simple_index_file.cc:166: // get removed. So we just make something up; if it's unused it will get not really "never get removed", it will be removed after those unfortunate cache dir rescans that we do when we loose the up to date index. https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... net/disk_cache/simple/simple_index_file.cc:169: const int kPlaceHolderSizeWhenInvalid = 32768; I think it's worth mentioning that the entry size in the index is updated on every successful entry open operation. I did not remember that. This is important as it guarantees that we would not produce cache read errors from perfectly valid entries. I guess we could choose the placeholder to be the average size of a cache entry, so that it minimizes the effect on the time when eviction kicks in. But it's all rare and hence moot. Up to you.
On 2017/05/11 12:15:07, pasko wrote: > this only happens when stat(2) returned garbage in st_size during index scan, > right? Yeah (or FindFirstFile[Ex]/FindNextFile on Windows)... > > I guess it is likely that it will return the same garbage when the entry is > open, and we would produce a cache read error in this case, which has a change > to pop into something user-facing. > > I think to avoid this unfortunate (even though rare) course of events, we should > rather attempt to delete the entry file(s) and avoid putting the entry to the > index. This way we would avoid showing corruption errors to the user. > > WDYT? Hmm, maybe, if we delete all the _0/_1/_s files. I am just worried since we are racing entry creation when we do that, so we may delete a _0 file then something creates a _1 file... then we merge them into index from the initial in-memory set, etc. > https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... > File net/disk_cache/simple/simple_index_file.cc (right): > > https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... > net/disk_cache/simple/simple_index_file.cc:166: // get removed. So we just make > something up; if it's unused it will get > not really "never get removed", it will be removed after those unfortunate cache > dir rescans that we do when we loose the up to date index. Not if we use the wrong size as a reason to not include it in the recovery right at this spot. > https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... > net/disk_cache/simple/simple_index_file.cc:169: const int > kPlaceHolderSizeWhenInvalid = 32768; > I think it's worth mentioning that the entry size in the index is updated on > every successful entry open operation. I did not remember that. This is > important as it guarantees that we would not produce cache read errors from > perfectly valid entries. > > I guess we could choose the placeholder to be the average size of a cache entry, > so that it minimizes the effect on the time when eviction kicks in. But it's all > rare and hence moot. Up to you. Yeah, 32KiB is more like 75th percentile. On my home machine. Making it too small is problematic too, though, since we may be using more disk than desired.
The CQ bit was checked by morlovich@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2017/05/11 13:35:13, Maks Orlovich wrote: > On 2017/05/11 12:15:07, pasko wrote: > > I guess it is likely that it will return the same garbage when the entry is > > open, and we would produce a cache read error in this case, which has a > > change to pop into something user-facing. > > > > I think to avoid this unfortunate (even though rare) course of events, we > > should rather attempt to delete the entry file(s) and avoid putting the > > entry to the index. This way we would avoid showing corruption errors to the > > user. > > > > WDYT? > > Hmm, maybe, if we delete all the _0/_1/_s files. I am just worried since we > are racing entry creation when we do that, so we may delete a _0 file then > something creates a _1 file... then we merge them into index from the initial > in-memory set, etc. hm, I did not think about this race. I suggested eliminating the pieces of obviously corrupt data early because it feels less comfortable to keep corrupt data in the index and in the entry for extended periods of time. We may start relying on this data in new ways like, for example, keep small entries in a separate datastructure or other hypothetical things. I guess the race is not super likely, since to create a _1 we first need to create a _0 (with a couple of thread hops) - based on memory, did not verify from code. Also the worst case would be reading a stale _1. Anything else? We can unlink _1 first and _0 later .. hacky-and-works? One way to avoid the race is to go through entries_pending_doom_ after the index scan. But that's more complicated. Do you still prefer the current late-and-placeholder approach? I'd probably l-g-t-m it, since it dodges a crash, but a solution with more safeguards would be more welcome.
> Do you still prefer the current late-and-placeholder approach? I'd probably > l-g-t-m it, since it dodges a crash, but a solution with more safeguards would > be more welcome. Thought some more about it, and I feel like I prefer it because I find it easier to reason about.
On 2017/05/18 19:52:26, Maks Orlovich wrote: > > Do you still prefer the current late-and-placeholder approach? I'd probably > > l-g-t-m it, since it dodges a crash, but a solution with more safeguards would > > be more welcome. > > Thought some more about it, and I feel like I prefer it because I find it easier > to > reason about. OK. Please respond to the in-code comments and let's roll this in.
https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... File net/disk_cache/simple/simple_index_file.cc (right): https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... net/disk_cache/simple/simple_index_file.cc:166: // get removed. So we just make something up; if it's unused it will get On 2017/05/11 12:15:07, pasko wrote: > not really "never get removed", it will be removed after those unfortunate cache > dir rescans that we do when we loose the up to date index. It would be never removed if we decided to skip over it rather than include it here, since this *is* the dir rescan. https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... net/disk_cache/simple/simple_index_file.cc:169: const int kPlaceHolderSizeWhenInvalid = 32768; On 2017/05/11 12:15:07, pasko wrote: > I think it's worth mentioning that the entry size in the index is updated on > every successful entry open operation. I did not remember that. This is > important as it guarantees that we would not produce cache read errors from > perfectly valid entries. OK, will tweak the comment to make it more explicit once I get in the office.
https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... File net/disk_cache/simple/simple_index_file.cc (right): https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... net/disk_cache/simple/simple_index_file.cc:169: const int kPlaceHolderSizeWhenInvalid = 32768; On 2017/05/19 11:35:05, morlovich wrote: > On 2017/05/11 12:15:07, pasko wrote: > > I think it's worth mentioning that the entry size in the index is updated on > > every successful entry open operation. I did not remember that. This is > > important as it guarantees that we would not produce cache read errors from > > perfectly valid entries. > > OK, will tweak the comment to make it more explicit once I get in the office. Hmm, thinking some more, maybe we should range-check the size in SimpleSynchronousEntry::OpenFiles as well, since if it's also invalid there we will do a lot of silly stuff.
https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... File net/disk_cache/simple/simple_index_file.cc (right): https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... net/disk_cache/simple/simple_index_file.cc:169: const int kPlaceHolderSizeWhenInvalid = 32768; On 2017/05/19 13:18:06, Maks Orlovich wrote: > On 2017/05/19 11:35:05, morlovich wrote: > > On 2017/05/11 12:15:07, pasko wrote: > > > I think it's worth mentioning that the entry size in the index is updated on > > > every successful entry open operation. I did not remember that. This is > > > important as it guarantees that we would not produce cache read errors from > > > perfectly valid entries. > > > > OK, will tweak the comment to make it more explicit once I get in the office. > > Hmm, thinking some more, maybe we should range-check the size in > SimpleSynchronousEntry::OpenFiles as well, since if it's also invalid there we > will do a lot of silly stuff. ah, you are right. That basically brings us back to my comment of avoiding effects of silly reads early. Feel free to land this crash mitigation separately before the proper fix in SSE::OpenFiles materializes.
https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... File net/disk_cache/simple/simple_index_file.cc (right): https://codereview.chromium.org/2878533002/diff/1/net/disk_cache/simple/simpl... net/disk_cache/simple/simple_index_file.cc:169: const int kPlaceHolderSizeWhenInvalid = 32768; > ah, you are right. That basically brings us back to my comment of avoiding > effects of silly reads early. True, that would save work, but the check is probably a good idea anyway. > Feel free to land this crash mitigation separately > before the proper fix in SSE::OpenFiles materializes. Hmm, the actual fix there is probably just: if (!base::IsValueInRangeForNumericType<int>(file_info.size)) return false; Though would probably want a new UMA enum for OpenEntryResult. I think I'll just do that, this isn't exactly a CL with things depending on it.
The CQ bit was checked by morlovich@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
PTAL
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by morlovich@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm, thank you for extensive investigation and a helpful comment sorry for review latency, I was sheriffing ..
morlovich@chromium.org changed reviewers: + mpearson@chromium.org
+mpearson for enums.xml addition
enums.xml lgtm, assuming that this isn't coming from a different bucket (i.e., it's something new being recorded that wasn't recorded before) --mark
On 2017/05/23 19:19:50, Mark P wrote: > enums.xml lgtm, assuming that this isn't coming from a different bucket (i.e., > it's something new being recorded that wasn't recorded before) > > --mark I feel like I have to elaborate then, since it potentially does: Previously, if it didn't crash way before that, the same op that reports the new enum value would have reported some other enum that does not accurately describe the error condition. Those are all error conditions, though, which occur at pretty low rates. (Hmm, not quite low enough, though..) Thanks, Maks
On Tue, May 23, 2017 at 12:37 PM, <morlovich@chromium.org> wrote: > On 2017/05/23 19:19:50, Mark P wrote: > > enums.xml lgtm, assuming that this isn't coming from a different bucket > (i.e., > > it's something new being recorded that wasn't recorded before) > > > > --mark > > I feel like I have to elaborate then, since it potentially does: > Previously, if it didn't crash way before that, the same op that reports > the new > enum value would have > reported some other enum that does not accurately describe the error > condition. > > Those are all error conditions, though, which occur at pretty low rates. > (Hmm, not quite low enough, though..) > In that case, you may want to consider adding something to the histograms.xml description for this histogram, mentioning that it changed somewhat in May 2017 so that invalid file lengths are recorded accurately (and that they were previously recorded inappropriately in other buckets). --mark > Thanks, > Maks > > > > https://codereview.chromium.org/2878533002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2017/05/23 19:40:37, Mark P wrote: > On Tue, May 23, 2017 at 12:37 PM, <mailto:morlovich@chromium.org> wrote: > > In that case, you may want to consider adding something to the > histograms.xml description for this histogram, mentioning that it changed > somewhat in May 2017 so that invalid file lengths are recorded accurately > (and that they were previously recorded inappropriately in other buckets). > > --mark I don't believe that to be important, since this is basically bitflip/FS corruption territory here. Important enough not to crash the browser process, not really something I can see use of long-term trends for. WDYT? (Of course, that may be shown to be false post-facto if the metric shows it to be happening non-trivial amount of time.)
On Tue, May 23, 2017 at 12:44 PM, <morlovich@chromium.org> wrote: > On 2017/05/23 19:40:37, Mark P wrote: > > On Tue, May 23, 2017 at 12:37 PM, <mailto:morlovich@chromium.org> wrote: > > > > In that case, you may want to consider adding something to the > > histograms.xml description for this histogram, mentioning that it changed > > somewhat in May 2017 so that invalid file lengths are recorded accurately > > (and that they were previously recorded inappropriately in other > buckets). > > > > --mark > > I don't believe that to be important, since this is basically bitflip/FS > corruption > territory here. Important enough not to crash the browser process, not > really > something I can see use of long-term trends for. > Okay, if you think the counts here are going to be *de minimis*, then I'm okay with not saying anything. If it turns out you're wrong, please revise the description. --mark > > WDYT? > > (Of course, that may be shown to be false post-facto if the metric shows > it to > be > happening non-trivial amount of time.) > > > https://codereview.chromium.org/2878533002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
The CQ bit was checked by morlovich@chromium.org
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 40001, "attempt_start_ts": 1495568860678300, "parent_rev": "778f69d41ab8539ec7d45d7e24ed58582133d00e", "commit_rev": "2ea7908fea095d9644122e8391ed2fb87fd2e621"}
Message was sent while issue was closed.
Description was changed from ========== SimpleCache: Tolerate weird file sizes when doing index recovery. (Rather than __builtin_trap-crashing) BUG=712326 ========== to ========== SimpleCache: Tolerate weird file sizes when doing index recovery. (Rather than __builtin_trap-crashing) BUG=712326 Review-Url: https://codereview.chromium.org/2878533002 Cr-Commit-Position: refs/heads/master@{#474055} Committed: https://chromium.googlesource.com/chromium/src/+/2ea7908fea095d9644122e8391ed... ==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://chromium.googlesource.com/chromium/src/+/2ea7908fea095d9644122e8391ed... |