|
|
DescriptionAdd UMA stats for hard-fault counts for Windows 7 and greater.
If this stat is strongly bimodal it will be used to distinguish warm from cold for various startup metrics.
BUG=476923
TBR=jeremy@chromium.org
Committed: https://crrev.com/e2f7ec4b272226a67a3c5e66ed3d955e2a33227e
Cr-Commit-Position: refs/heads/master@{#325337}
Patch Set 1 #
Total comments: 3
Patch Set 2 : Fix for non Win32 platforms. #
Total comments: 7
Patch Set 3 : Fix warning on Win64. #Patch Set 4 : Use static_cast instead of reinterpret_cast. #Patch Set 5 : Addressed comments from Erik's review. #
Total comments: 9
Patch Set 6 : Addressed Alexei's review comments. #Patch Set 7 : Correct use of NT_SUCCESS. #
Total comments: 7
Patch Set 8 : Rebased and addressed nits. #
Messages
Total messages: 22 (4 generated)
chrisha@chromium.org changed reviewers: + asvitkine@chromium.org, jeremy@chromium.org
Not sure if startup_metric_utils is the right place or not. PTAL?
chrisha@chromium.org changed reviewers: + erikchen@chromium.org
+Erik Chen, as the likely next OWNER of the startup metrics component.
lgtm https://codereview.chromium.org/1084943002/diff/1/components/startup_metric_u... File components/startup_metric_utils/startup_metric_utils.cc (right): https://codereview.chromium.org/1084943002/diff/1/components/startup_metric_u... components/startup_metric_utils/startup_metric_utils.cc:68: sizeof(SYSTEM_PROCESS_INFORMATION_EX) == 184, Is the size the same between 32 and 64 bit? https://codereview.chromium.org/1084943002/diff/1/components/startup_metric_u... components/startup_metric_utils/startup_metric_utils.cc:88: if (!::GetVersionEx(reinterpret_cast<LPOSVERSIONINFOW>(&version_info))) Is this inside a file that's Windows-specific? If not, you probably need more ifdefs. (or move to a separate file)
Oops, not lgtm - clicked wrong button :\.
On 2015/04/14 16:53:51, Alexei Svitkine wrote: > lgtm > > https://codereview.chromium.org/1084943002/diff/1/components/startup_metric_u... > File components/startup_metric_utils/startup_metric_utils.cc (right): > > https://codereview.chromium.org/1084943002/diff/1/components/startup_metric_u... > components/startup_metric_utils/startup_metric_utils.cc:68: > sizeof(SYSTEM_PROCESS_INFORMATION_EX) == 184, > Is the size the same between 32 and 64 bit? > > https://codereview.chromium.org/1084943002/diff/1/components/startup_metric_u... > components/startup_metric_utils/startup_metric_utils.cc:88: if > (!::GetVersionEx(reinterpret_cast<LPOSVERSIONINFOW>(&version_info))) > Is this inside a file that's Windows-specific? If not, you probably need more > ifdefs. (or move to a separate file) Doh, silly me. Yes #ifdefs are required and intended. Mostly wanted a comment about whether this code is good living in this specific file.
https://codereview.chromium.org/1084943002/diff/1/components/startup_metric_u... File components/startup_metric_utils/startup_metric_utils.cc (right): https://codereview.chromium.org/1084943002/diff/1/components/startup_metric_u... components/startup_metric_utils/startup_metric_utils.cc:68: sizeof(SYSTEM_PROCESS_INFORMATION_EX) == 184, On 2015/04/14 16:53:51, Alexei Svitkine wrote: > Is the size the same between 32 and 64 bit? Good catch. Not the same size, but has the same layout on 64-bit. Removed the static_assert, as it was mostly just for me when I was writing this to ensure that I had the right definition.
I didn't review the windows-specific code. https://codereview.chromium.org/1084943002/diff/20001/components/startup_metr... File components/startup_metric_utils/startup_metric_utils.cc (right): https://codereview.chromium.org/1084943002/diff/20001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:181: // (Observed to vary from 1000 to 10000 on various test machines and If you're observing 10000 on a test machine, it's going to go way higher in the mild. Perhaps make the upper limit 200000, and the lower limit 100? https://codereview.chromium.org/1084943002/diff/20001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:190: "Startup.BrowserMessageLoopStartHardFaultCount", The description you give for this metric implies it's always recorded, but here you only record it on non-first run. I think it makes more sense to always record it, but either way the behavior and description should be consistent. https://codereview.chromium.org/1084943002/diff/20001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:272: RecordHardFaultHistogram(is_first_run); As a quick sanity check, can you confirm that this method is fast via local instrumentation?
Another look? https://codereview.chromium.org/1084943002/diff/20001/components/startup_metr... File components/startup_metric_utils/startup_metric_utils.cc (right): https://codereview.chromium.org/1084943002/diff/20001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:181: // (Observed to vary from 1000 to 10000 on various test machines and On 2015/04/14 23:01:52, erikchen wrote: > If you're observing 10000 on a test machine, it's going to go way higher in the > mild. Perhaps make the upper limit 200000, and the lower limit 100? I suppose it could go drastically higher, but it would be non-sensical. In an absolute worst case scenario this value is realistically bounded above by about 17000. This is assuming that Chrome and every single one of its dependencies (totaling about 70MB) is entirely faulted in using 4KB hard page faults. This is quite unlikely. I could see maybe doubling this to ensure that we catch some of the long tail... thoughts? As to the lower bound, we can and will see actual values of 0 for a completely warm start. I'd like to catch those explicitly. https://codereview.chromium.org/1084943002/diff/20001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:190: "Startup.BrowserMessageLoopStartHardFaultCount", On 2015/04/14 23:01:52, erikchen wrote: > The description you give for this metric implies it's always recorded, but here > you only record it on non-first run. I think it makes more sense to always > record it, but either way the behavior and description should be consistent. I was just being consistent with how Startup.BrowserMessageLoopStartTimeFromMainEntry and Startup.BrowserMessageLoopStartTimeFromMainEntry.FirstRun work, elsewhere in this file. I've modified the descriptions to be consistent. https://codereview.chromium.org/1084943002/diff/20001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:272: RecordHardFaultHistogram(is_first_run); On 2015/04/14 23:01:52, erikchen wrote: > As a quick sanity check, can you confirm that this method is fast via local > instrumentation? This was benchmarked across various versions of the OS and across both high-powered and low-powered machines. The results of the benchmarking are linked to in the attached bug. This was seen to vary from tens of microseconds to about 2 ms in the worst case.
histograms lgtm. https://codereview.chromium.org/1084943002/diff/20001/components/startup_metr... File components/startup_metric_utils/startup_metric_utils.cc (right): https://codereview.chromium.org/1084943002/diff/20001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:181: // (Observed to vary from 1000 to 10000 on various test machines and On 2015/04/15 13:04:30, chrisha wrote: > On 2015/04/14 23:01:52, erikchen wrote: > > If you're observing 10000 on a test machine, it's going to go way higher in > the > > mild. Perhaps make the upper limit 200000, and the lower limit 100? > > I suppose it could go drastically higher, but it would be non-sensical. In an > absolute worst case scenario this value is realistically bounded above by about > 17000. This is assuming that Chrome and every single one of its dependencies > (totaling about 70MB) is entirely faulted in using 4KB hard page faults. This is > quite unlikely. I could see maybe doubling this to ensure that we catch some of > the long tail... thoughts? > > As to the lower bound, we can and will see actual values of 0 for a completely > warm start. I'd like to catch those explicitly. Sounds reasonable to me. We can always change the bounds afterwards.
https://codereview.chromium.org/1084943002/diff/80001/components/startup_metr... File components/startup_metric_utils/startup_metric_utils.cc (right): https://codereview.chromium.org/1084943002/diff/80001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:83: bool GetHardFaultCountForCurrentProcess(uint32* hard_fault_count, Nit: uint32_t is best practice for new code. https://codereview.chromium.org/1084943002/diff/80001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:91: if (!::GetVersionEx(reinterpret_cast<LPOSVERSIONINFOW>(&version_info))) Can you use base/win/windows_version.h instead of doing it manually? https://codereview.chromium.org/1084943002/diff/80001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:130: if (status == 0 && return_length <= buffer.size()) Should use NT_SUCCESS() instead of status == 0 I think. https://codereview.chromium.org/1084943002/diff/80001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:174: success); Since there are different cases available, I suggest making a custom enum histogram outlining the possible ones, which would provide all the different things we might be interested in. i.e. NO_OS_SUPPORT, SUCCEEDED, FAILED,
Thanks. Another look? https://codereview.chromium.org/1084943002/diff/80001/components/startup_metr... File components/startup_metric_utils/startup_metric_utils.cc (right): https://codereview.chromium.org/1084943002/diff/80001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:83: bool GetHardFaultCountForCurrentProcess(uint32* hard_fault_count, On 2015/04/15 17:36:19, Alexei Svitkine wrote: > Nit: uint32_t is best practice for new code. Done. https://codereview.chromium.org/1084943002/diff/80001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:91: if (!::GetVersionEx(reinterpret_cast<LPOSVERSIONINFOW>(&version_info))) On 2015/04/15 17:36:19, Alexei Svitkine wrote: > Can you use base/win/windows_version.h instead of doing it manually? I certainly can, and it's far simpler that way. Thanks! Done. https://codereview.chromium.org/1084943002/diff/80001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:130: if (status == 0 && return_length <= buffer.size()) On 2015/04/15 17:36:19, Alexei Svitkine wrote: > Should use NT_SUCCESS() instead of status == 0 I think. Done. https://codereview.chromium.org/1084943002/diff/80001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:174: success); On 2015/04/15 17:36:19, Alexei Svitkine wrote: > Since there are different cases available, I suggest making a custom enum > histogram outlining the possible ones, which would provide all the different > things we might be interested in. > > i.e. > > NO_OS_SUPPORT, > SUCCEEDED, > FAILED, The "no OS support" case is completely deterministic (depends only on the OS version), and doesn't really provide any useful information. I'm not sure what use it would have?
lgtm % comments below https://codereview.chromium.org/1084943002/diff/80001/components/startup_metr... File components/startup_metric_utils/startup_metric_utils.cc (right): https://codereview.chromium.org/1084943002/diff/80001/components/startup_metr... components/startup_metric_utils/startup_metric_utils.cc:174: success); On 2015/04/15 17:56:01, chrisha wrote: > On 2015/04/15 17:36:19, Alexei Svitkine wrote: > > Since there are different cases available, I suggest making a custom enum > > histogram outlining the possible ones, which would provide all the different > > things we might be interested in. > > > > i.e. > > > > NO_OS_SUPPORT, > > SUCCEEDED, > > FAILED, > > The "no OS support" case is completely deterministic (depends only on the OS > version), and doesn't really provide any useful information. I'm not sure what > use it would have? I was thinking it would be simpler to reason about what percentage of users fall into each case - but that has more value when this actually affects behavior - since we're only logging the value, current set up seems fine to me now that I think about it again. Thanks. https://codereview.chromium.org/1084943002/diff/120001/components/startup_met... File components/startup_metric_utils/startup_metric_utils.cc (right): https://codereview.chromium.org/1084943002/diff/120001/components/startup_met... components/startup_metric_utils/startup_metric_utils.cc:131: while (index < buffer.size()) { Nit: Maybe add: DCHECK_LE(index + sizeof(SYSTEM_PROCESS_INFORMATION_EX), buffer.size()); https://codereview.chromium.org/1084943002/diff/120001/components/startup_met... components/startup_metric_utils/startup_metric_utils.cc:138: // The list ends with NextEntryOffset is zero. This also prevents busy Nit: "ends with" -> "ends when" https://codereview.chromium.org/1084943002/diff/120001/components/startup_met... components/startup_metric_utils/startup_metric_utils.cc:140: if (proc_info->NextEntryOffset == 0) Nit: Maybe as a safety check change to <= 0, which would safeguard against cycles in case that's in the returned data. https://codereview.chromium.org/1084943002/diff/120001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/1084943002/diff/120001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:38057: + start of the main thread's message loop, not including first runs of the Nit: "first runs of the browser" sounds a little confusing. I would change wording to something like "not including the first run after install, which is logged separately via <name of other metric>."
Thanks Alexei. jeremy: A final look as OWNER? https://codereview.chromium.org/1084943002/diff/120001/components/startup_met... File components/startup_metric_utils/startup_metric_utils.cc (right): https://codereview.chromium.org/1084943002/diff/120001/components/startup_met... components/startup_metric_utils/startup_metric_utils.cc:131: while (index < buffer.size()) { On 2015/04/15 18:08:10, Alexei Svitkine wrote: > Nit: Maybe add: > > DCHECK_LE(index + sizeof(SYSTEM_PROCESS_INFORMATION_EX), buffer.size()); Done. https://codereview.chromium.org/1084943002/diff/120001/components/startup_met... components/startup_metric_utils/startup_metric_utils.cc:138: // The list ends with NextEntryOffset is zero. This also prevents busy On 2015/04/15 18:08:10, Alexei Svitkine wrote: > Nit: "ends with" -> "ends when" Done. https://codereview.chromium.org/1084943002/diff/120001/components/startup_met... components/startup_metric_utils/startup_metric_utils.cc:140: if (proc_info->NextEntryOffset == 0) On 2015/04/15 18:08:10, Alexei Svitkine wrote: > Nit: Maybe as a safety check change to <= 0, which would safeguard against > cycles in case that's in the returned data. It's an unsigned value, but this is a harmless change. Done.
erik is to be the new owner, and he's lgtm'd. jeremy is the current owner and only needed for rubber stamp (offline discussion). Landing this TBR for now.
The CQ bit was checked by chrisha@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from erikchen@chromium.org, asvitkine@chromium.org Link to the patchset: https://codereview.chromium.org/1084943002/#ps140001 (title: "Rebased and addressed nits.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1084943002/140001
Message was sent while issue was closed.
Committed patchset #8 (id:140001)
Message was sent while issue was closed.
Patchset 8 (id:??) landed as https://crrev.com/e2f7ec4b272226a67a3c5e66ed3d955e2a33227e Cr-Commit-Position: refs/heads/master@{#325337}
Message was sent while issue was closed.
LGTM |