|
|
Created:
3 years, 8 months ago by bcwhite Modified:
3 years, 8 months ago Reviewers:
Alexei Svitkine (slow) CC:
chromium-reviews, danakj+watch_chromium.org, asvitkine+watch_chromium.org, vmpstr+watch_chromium.org Target Ref:
refs/heads/master Project:
chromium Visibility:
Public. |
DescriptionBe tolerant of bad histogram construction arguments.
Even when the definitions for histograms are correct, sometimes the
code goes wrong and bad construction arguments appear. This will
avoid crashes by adjusting the parameters even on non-DCHECK builds.
That doesn't mean that whatever caused the bad parameters to appear
hasn't caused other issues as well.
BUG=586622
Review-Url: https://codereview.chromium.org/2828533002
Cr-Commit-Position: refs/heads/master@{#467406}
Committed: https://chromium.googlesource.com/chromium/src/+/aaf2d3451c0640d07f02978fc35d5119fafc35eb
Patch Set 1 #Patch Set 2 : fixed signed/unsigned mismatch #
Total comments: 4
Patch Set 3 : Be tolerant of bad histogram construction arguments. #
Total comments: 2
Patch Set 4 : add dummy enum to histogram definition #Patch Set 5 : revert to older DCHECK locations for compatibility with other modules #
Total comments: 3
Messages
Total messages: 48 (31 generated)
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
bcwhite@chromium.org changed reviewers: + asvitkine@chromium.org
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-simulator-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator-xco...) mac_chromium_compile_dbg_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...)
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2828533002/diff/20001/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2828533002/diff/20001/base/metrics/histogram.... base/metrics/histogram.cc:392: if (*bucket_count < 3) This should already be catching the issue. I guess this is not done in production builds - because its results is in a DCHECK().
https://codereview.chromium.org/2828533002/diff/20001/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2828533002/diff/20001/base/metrics/histogram.... base/metrics/histogram.cc:392: if (*bucket_count < 3) On 2017/04/18 19:19:26, Alexei S (OOO until 19th) wrote: > This should already be catching the issue. > > I guess this is not done in production builds - because its results is in a > DCHECK(). Yes. I'm assuming the intent was to just carry on with bad results rather than crash, but there are obviously some assumptions made elsewhere on the validity of the parameters. I could CHECK() these or do more extensive modifications to the parameters: swap min/max, increase max to at least min + 2, etc.
https://codereview.chromium.org/2828533002/diff/20001/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2828533002/diff/20001/base/metrics/histogram.... base/metrics/histogram.cc:392: if (*bucket_count < 3) On 2017/04/18 19:43:26, bcwhite wrote: > On 2017/04/18 19:19:26, Alexei S (OOO until 19th) wrote: > > This should already be catching the issue. > > > > I guess this is not done in production builds - because its results is in a > > DCHECK(). > > Yes. I'm assuming the intent was to just carry on with bad results rather than > crash, but there are obviously some assumptions made elsewhere on the validity > of the parameters. > > I could CHECK() these or do more extensive modifications to the parameters: swap > min/max, increase max to at least min + 2, etc. How about I move the DCHECKs into this method but have follow-on code (also in this method) that makes the parameters sane/crash-proof so that it won't cause failures in the field? I can add a sparse histogram that stores the histogram hash in the case of a failure; if there appear to be some common ones then it won't be too difficult to match those numbers to names and investigate further.
https://codereview.chromium.org/2828533002/diff/20001/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2828533002/diff/20001/base/metrics/histogram.... base/metrics/histogram.cc:392: if (*bucket_count < 3) On 2017/04/24 12:37:59, bcwhite wrote: > On 2017/04/18 19:43:26, bcwhite wrote: > > On 2017/04/18 19:19:26, Alexei S (OOO until 19th) wrote: > > > This should already be catching the issue. > > > > > > I guess this is not done in production builds - because its results is in a > > > DCHECK(). > > > > Yes. I'm assuming the intent was to just carry on with bad results rather > than > > crash, but there are obviously some assumptions made elsewhere on the validity > > of the parameters. > > > > I could CHECK() these or do more extensive modifications to the parameters: > swap > > min/max, increase max to at least min + 2, etc. > > How about I move the DCHECKs into this method but have follow-on code (also in > this method) that makes the parameters sane/crash-proof so that it won't cause > failures in the field? > > I can add a sparse histogram that stores the histogram hash in the case of a > failure; if there appear to be some common ones then it won't be too difficult > to match those numbers to names and investigate further. That sounds good to me - thanks!
Description was changed from ========== Add CHECK for invalid number of LinearHistogram ranges. BUG=586622 ========== to ========== Be tolerant of bad histogram construction arguments. Even when the definitions for histograms are correct, sometimes the code goes wrong and bad construction arguments appear. This will avoid crashes by adjusting the parameters even on non-DCHECK builds. That doesn't mean that whatever caused the bad parameters to appear hasn't caused other issues as well. BUG=586622 ==========
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Patchset #3 (id:40001) has been deleted
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2017/04/24 15:45:44, Alexei Svitkine (slow) wrote: > https://codereview.chromium.org/2828533002/diff/20001/base/metrics/histogram.cc > File base/metrics/histogram.cc (right): > > https://codereview.chromium.org/2828533002/diff/20001/base/metrics/histogram.... > base/metrics/histogram.cc:392: if (*bucket_count < 3) > On 2017/04/24 12:37:59, bcwhite wrote: > > On 2017/04/18 19:43:26, bcwhite wrote: > > > On 2017/04/18 19:19:26, Alexei S (OOO until 19th) wrote: > > > > This should already be catching the issue. > > > > > > > > I guess this is not done in production builds - because its results is in > a > > > > DCHECK(). > > > > > > Yes. I'm assuming the intent was to just carry on with bad results rather > > than > > > crash, but there are obviously some assumptions made elsewhere on the > validity > > > of the parameters. > > > > > > I could CHECK() these or do more extensive modifications to the parameters: > > swap > > > min/max, increase max to at least min + 2, etc. > > > > How about I move the DCHECKs into this method but have follow-on code (also in > > this method) that makes the parameters sane/crash-proof so that it won't cause > > failures in the field? > > > > I can add a sparse histogram that stores the histogram hash in the case of a > > failure; if there appear to be some common ones then it won't be too difficult > > to match those numbers to names and investigate further. > > That sounds good to me - thanks! How's this?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
lg, one comment Thanks! https://codereview.chromium.org/2828533002/diff/60001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2828533002/diff/60001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:22307: +<histogram name="Histogram.BadConstructionArguments"> Nit: Add an enum="" definition - that you can just put a dummy value in. This is needed so this histogram is treated as an enum by our tools, and not as numeric (since percentiles of histogram hashes don't make sense).
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2828533002/diff/60001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2828533002/diff/60001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:22307: +<histogram name="Histogram.BadConstructionArguments"> On 2017/04/24 20:41:20, Alexei Svitkine (slow) wrote: > Nit: Add an enum="" definition - that you can just put a dummy value in. > > This is needed so this histogram is treated as an enum by our tools, and not as > numeric (since percentiles of histogram hashes don't make sense). Comme ca?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
lgtm Not sure if you need to add a dummy entry or if having it empty is fine. It's possible something server-side might break if there's no enums listed - but also it might just work. Up to you if you want to try it like this - but then potentially deal with any issues that come up - or whether to just add a dummy entry.
On 2017/04/25 16:30:55, Alexei Svitkine (slow) wrote: > lgtm > > Not sure if you need to add a dummy entry or if having it empty is fine. It's > possible something server-side might break if there's no enums listed - but also > it might just work. Empty quotes (enum="") failed. Name without a definition failed. This is what I came up with. Plus, we can always datafill it with known entities if such becomes useful.
On 2017/04/26 14:57:09, bcwhite wrote: > On 2017/04/25 16:30:55, Alexei Svitkine (slow) wrote: > > lgtm > > > > Not sure if you need to add a dummy entry or if having it empty is fine. It's > > possible something server-side might break if there's no enums listed - but > also > > it might just work. > > Empty quotes (enum="") failed. Name without a definition failed. This is what > I came up with. Plus, we can always datafill it with known entities if such > becomes useful. To clarify, I meant to add a dummy <int> entry within the <enum> because we may assume there's always at least one in our server side code.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_android on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/cast_shell_a...)
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Patchset #5 (id:100001) has been deleted
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2017/04/26 15:01:50, Alexei Svitkine (slow) wrote: > On 2017/04/26 14:57:09, bcwhite wrote: > > On 2017/04/25 16:30:55, Alexei Svitkine (slow) wrote: > > > lgtm > > > > > > Not sure if you need to add a dummy entry or if having it empty is fine. > It's > > > possible something server-side might break if there's no enums listed - but > > also > > > it might just work. > > > > Empty quotes (enum="") failed. Name without a definition failed. This is > what > > I came up with. Plus, we can always datafill it with known entities if such > > becomes useful. > > To clarify, I meant to add a dummy <int> entry within the <enum> because we may > assume there's always at least one in our server side code. Ah, okay. Done. Also, I had to revert things to the older of way of doing the DCHECK at the call-site because Inspect() is called from outside code as well, sometimes using the false return value to do more intelligent management.
lgtm https://codereview.chromium.org/2828533002/diff/120001/base/metrics/histogram.h File base/metrics/histogram.h (right): https://codereview.chromium.org/2828533002/diff/120001/base/metrics/histogram... base/metrics/histogram.h:187: // data doesn't create confusion on the servers. Nit: Isn't this addressed by this CL? So no need for the TODO?
https://codereview.chromium.org/2828533002/diff/120001/base/metrics/histogram.h File base/metrics/histogram.h (right): https://codereview.chromium.org/2828533002/diff/120001/base/metrics/histogram... base/metrics/histogram.h:187: // data doesn't create confusion on the servers. On 2017/04/26 17:33:16, Alexei Svitkine (slow) wrote: > Nit: Isn't this addressed by this CL? So no need for the TODO? No, right now it just creates the histogram with the fixed parameters. It won't crash but could yield different parameters than what is being generated on properly running processes. When these upload, UMA could see a different set of buckets from the errant process than all others are reporting. I think it would be better to just "sink" the damaged histogram samples and not report them at all rather than cause confusion upstream. Sinking is part of the plan for reducing histogram memory, creating dummy objects that just discard samples for those histograms that the server deems unused/obsolete without having to modify the code.
https://codereview.chromium.org/2828533002/diff/120001/base/metrics/histogram.h File base/metrics/histogram.h (right): https://codereview.chromium.org/2828533002/diff/120001/base/metrics/histogram... base/metrics/histogram.h:187: // data doesn't create confusion on the servers. On 2017/04/26 17:47:30, bcwhite wrote: > On 2017/04/26 17:33:16, Alexei Svitkine (slow) wrote: > > Nit: Isn't this addressed by this CL? So no need for the TODO? > > No, right now it just creates the histogram with the fixed parameters. It won't > crash but could yield different parameters than what is being generated on > properly running processes. When these upload, UMA could see a different set of > buckets from the errant process than all others are reporting. > > I think it would be better to just "sink" the damaged histogram samples and not > report them at all rather than cause confusion upstream. > > Sinking is part of the plan for reducing histogram memory, creating dummy > objects that just discard samples for those histograms that the server deems > unused/obsolete without having to modify the code. Ah yes, sorry didn't realise that's what you where referring to as "sink". Makes sense.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by bcwhite@chromium.org
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 120001, "attempt_start_ts": 1493233630594180, "parent_rev": "40b32c7bef5d14f90e02b8c20630f98884316a67", "commit_rev": "aaf2d3451c0640d07f02978fc35d5119fafc35eb"}
Message was sent while issue was closed.
Description was changed from ========== Be tolerant of bad histogram construction arguments. Even when the definitions for histograms are correct, sometimes the code goes wrong and bad construction arguments appear. This will avoid crashes by adjusting the parameters even on non-DCHECK builds. That doesn't mean that whatever caused the bad parameters to appear hasn't caused other issues as well. BUG=586622 ========== to ========== Be tolerant of bad histogram construction arguments. Even when the definitions for histograms are correct, sometimes the code goes wrong and bad construction arguments appear. This will avoid crashes by adjusting the parameters even on non-DCHECK builds. That doesn't mean that whatever caused the bad parameters to appear hasn't caused other issues as well. BUG=586622 Review-Url: https://codereview.chromium.org/2828533002 Cr-Commit-Position: refs/heads/master@{#467406} Committed: https://chromium.googlesource.com/chromium/src/+/aaf2d3451c0640d07f02978fc35d... ==========
Message was sent while issue was closed.
Committed patchset #5 (id:120001) as https://chromium.googlesource.com/chromium/src/+/aaf2d3451c0640d07f02978fc35d... |