|
|
Created:
3 years, 7 months ago by James Cook Modified:
3 years, 7 months ago Reviewers:
bcwhite CC:
chromium-reviews, kalyank, sadrul Target Ref:
refs/heads/master Project:
chromium Visibility:
Public. |
Descriptionchromeos: Initialize StatisticsRecorder in AshTestSuite
This fixes flake in ash_unittests in tests that record and read back
histograms, in particular PointerMetricsRecorderTest.
This is similar to the initialization in ComponentsTestSuite,
ContentTestSuiteBase, etc.
BUG=725287
TEST=run ash_unittests locally, PointerMetricsRecorderTest doesn't retry
Review-Url: https://codereview.chromium.org/2901703002
Cr-Commit-Position: refs/heads/master@{#473952}
Committed: https://chromium.googlesource.com/chromium/src/+/8cd44d248f4d7439aeb914c5e7ca4b74df94563f
Patch Set 1 #
Messages
Total messages: 17 (9 generated)
The CQ bit was checked by jamescook@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== chromeos: Initialize StatisticsRecorder in AshTestSuite This fixes flake in ash_unittests in tests that record and read back histograms, in particular PointerMetricsRecorderTest. BUG=725287 TEST=run ash_unittests locally, PointerMetricsRecorderTest doesn't retry ========== to ========== chromeos: Initialize StatisticsRecorder in AshTestSuite This fixes flake in ash_unittests in tests that record and read back histograms, in particular PointerMetricsRecorderTest. This is similar to the initialization in ComponentsTestSuite, ContentTestSuiteBase, etc. BUG=725287 TEST=run ash_unittests locally, PointerMetricsRecorderTest doesn't retry ==========
jamescook@chromium.org changed reviewers: + bcwhite@chromium.org
bcwhite, please take a look. I'm sending this to you because you're active in //base/metrics and I'm wondering if other unit test suites might have this problem. (And frankly, I'm not 100% sure why this causes the test flake I'm seeing in the bug.) What I think is happening is: 1. Test A runs and happens to call function DoFoo() that records UMA histogram "foo". 2. In the same block of tests, test B runs, registers a HistogramTester (which initializes StatisticsRecorder), and also records UMA histogram "foo". 3. Test B then queries "foo" and gets a bad result. This doesn't happen often, because it relies on two tests exercising the same function with a histogram, but only one of them using a HistogramTester. It also doesn't happen if you run test B first, then test A. (This is what I see in practice in ash_unittests for the tests I care about.) I think in (1) the local static in DoFoo() created by the histogram macro is somehow in a bad state due to StatisticsRecorder not being initialized when it is first called. But I don't understand our histogram system very well. :-) Anyhow, this CL fixes my problem, and I doubt it hurts things. Does my reasoning make sense to you?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2017/05/23 00:40:58, James Cook (sick) wrote: > bcwhite, please take a look. I'm sending this to you because you're active in > //base/metrics and I'm wondering if other unit test suites might have this > problem. (And frankly, I'm not 100% sure why this causes the test flake I'm > seeing in the bug.) > > What I think is happening is: > 1. Test A runs and happens to call function DoFoo() that records UMA histogram > "foo". > 2. In the same block of tests, test B runs, registers a HistogramTester (which > initializes StatisticsRecorder), and also records UMA histogram "foo". > 3. Test B then queries "foo" and gets a bad result. > > This doesn't happen often, because it relies on two tests exercising the same > function with a histogram, but only one of them using a HistogramTester. It also > doesn't happen if you run test B first, then test A. (This is what I see in > practice in ash_unittests for the tests I care about.) > > I think in (1) the local static in DoFoo() created by the histogram macro is > somehow in a bad state due to StatisticsRecorder not being initialized when it > is first called. But I don't understand our histogram system very well. :-) > > Anyhow, this CL fixes my problem, and I doubt it hurts things. Does my reasoning > make sense to you? A histogram created before SR initialization will just be a local histogram, detached from the rest of the system. Another method creating the same histogram will create two distinct objects that do not overlap. What exactly is the "bad result"? No, your change wouldn't be harmful but I'd like to understand the issue here in case it could affect other tests now or in the future.
On 2017/05/23 12:48:54, bcwhite wrote: > On 2017/05/23 00:40:58, James Cook (sick) wrote: > > bcwhite, please take a look. I'm sending this to you because you're active in > > //base/metrics and I'm wondering if other unit test suites might have this > > problem. (And frankly, I'm not 100% sure why this causes the test flake I'm > > seeing in the bug.) > > > > What I think is happening is: > > 1. Test A runs and happens to call function DoFoo() that records UMA histogram > > "foo". > > 2. In the same block of tests, test B runs, registers a HistogramTester (which > > initializes StatisticsRecorder), and also records UMA histogram "foo". > > 3. Test B then queries "foo" and gets a bad result. > > > > This doesn't happen often, because it relies on two tests exercising the same > > function with a histogram, but only one of them using a HistogramTester. It > also > > doesn't happen if you run test B first, then test A. (This is what I see in > > practice in ash_unittests for the tests I care about.) > > > > I think in (1) the local static in DoFoo() created by the histogram macro is > > somehow in a bad state due to StatisticsRecorder not being initialized when it > > is first called. But I don't understand our histogram system very well. :-) > > > > Anyhow, this CL fixes my problem, and I doubt it hurts things. Does my > reasoning > > make sense to you? > > A histogram created before SR initialization will just be a local histogram, > detached from the rest of the system. Another method creating the same > histogram will create two distinct objects that do not overlap. > > What exactly is the "bad result"? > > No, your change wouldn't be harmful but I'd like to understand the issue here in > case it could affect other tests now or in the future. Yeah, I was worried about the same thing. That's why I sent the review to you. :-) See the linked bug for the "bad result". TL;DR: ../../base/test/histogram_tester.cc:55: Failure Expected: (nullptr) != (histogram), actual: 8-byte object <00-00 00-00 00-00 00-00> vs NULL Histogram "Event.DownEventCount.PerFormFactor" does not exist. The test code is pretty simple - it just constructs an object and uses a HistogramTester to verify the histogram was recorded: https://cs.chromium.org/chromium/src/ash/metrics/pointer_metrics_recorder_uni... The code under test is simple - it just records some histograms: https://cs.chromium.org/chromium/src/ash/metrics/pointer_metrics_recorder.cc?... To repro, on Linux add target_os="chromeos" to your args.gn. Build ash_unittests. Run "testing/xvfb.py out/Default/ash_unittests". Here's my args.gn: dcheck_always_on = true is_debug = false is_component_build = true target_os = "chromeos" use_goma = true enable_nacl = false A more minimal repro is this: testing/xvfb.py out/Default/ash_unittests --gtest_filter="PointerMetricsRecorderTest.DownEventPerInput:PointerWatcherAdapterTest.TouchEvents" DownEventPerInput:PointerWatcherAdapterTest will run first, corrupt some static/global state, and PointerMetricsRecorderTest will fail to find the Event.DownEventCount.PerFormFactor histogram. If you add logging to the RecordUMA() function in pointer_metrics_recorder.cc you'll see that both of those tests call RecordUMA(). I think the first call is messing up the static state that the histogram macro creates. But maybe it's some odd about local vs. global histograms (I don't really understand the distinction) such that if you create a local histogram, then try to make a global one, the global one will fail?
On 2017/05/23 15:16:28, James Cook (sick) wrote: > On 2017/05/23 12:48:54, bcwhite wrote: > > On 2017/05/23 00:40:58, James Cook (sick) wrote: > > > bcwhite, please take a look. I'm sending this to you because you're active > in > > > //base/metrics and I'm wondering if other unit test suites might have this > > > problem. (And frankly, I'm not 100% sure why this causes the test flake I'm > > > seeing in the bug.) > > > > > > What I think is happening is: > > > 1. Test A runs and happens to call function DoFoo() that records UMA > histogram > > > "foo". > > > 2. In the same block of tests, test B runs, registers a HistogramTester > (which > > > initializes StatisticsRecorder), and also records UMA histogram "foo". > > > 3. Test B then queries "foo" and gets a bad result. > > > > > > This doesn't happen often, because it relies on two tests exercising the > same > > > function with a histogram, but only one of them using a HistogramTester. It > > also > > > doesn't happen if you run test B first, then test A. (This is what I see in > > > practice in ash_unittests for the tests I care about.) > > > > > > I think in (1) the local static in DoFoo() created by the histogram macro is > > > somehow in a bad state due to StatisticsRecorder not being initialized when > it > > > is first called. But I don't understand our histogram system very well. :-) > > > > > > Anyhow, this CL fixes my problem, and I doubt it hurts things. Does my > > reasoning > > > make sense to you? > > > > A histogram created before SR initialization will just be a local histogram, > > detached from the rest of the system. Another method creating the same > > histogram will create two distinct objects that do not overlap. > > > > What exactly is the "bad result"? > > > > No, your change wouldn't be harmful but I'd like to understand the issue here > in > > case it could affect other tests now or in the future. > > Yeah, I was worried about the same thing. That's why I sent the review to you. > :-) > > See the linked bug for the "bad result". TL;DR: > > ../../base/test/histogram_tester.cc:55: Failure > Expected: (nullptr) != (histogram), actual: 8-byte object <00-00 00-00 00-00 > 00-00> vs NULL > Histogram "Event.DownEventCount.PerFormFactor" does not exist. > > The test code is pretty simple - it just constructs an object and uses a > HistogramTester to verify the histogram was recorded: > https://cs.chromium.org/chromium/src/ash/metrics/pointer_metrics_recorder_uni... > > The code under test is simple - it just records some histograms: > https://cs.chromium.org/chromium/src/ash/metrics/pointer_metrics_recorder.cc?... > > To repro, on Linux add target_os="chromeos" to your args.gn. Build > ash_unittests. Run "testing/xvfb.py out/Default/ash_unittests". Here's my > args.gn: > > dcheck_always_on = true > is_debug = false > is_component_build = true > target_os = "chromeos" > use_goma = true > enable_nacl = false > > A more minimal repro is this: > > testing/xvfb.py out/Default/ash_unittests > --gtest_filter="PointerMetricsRecorderTest.DownEventPerInput:PointerWatcherAdapterTest.TouchEvents" > > DownEventPerInput:PointerWatcherAdapterTest will run first, corrupt some > static/global state, and PointerMetricsRecorderTest will fail to find the > Event.DownEventCount.PerFormFactor histogram. > > If you add logging to the RecordUMA() function in pointer_metrics_recorder.cc > you'll see that both of those tests call RecordUMA(). I think the first call is > messing up the static state that the histogram macro creates. But maybe it's > some odd about local vs. global histograms (I don't really understand the > distinction) such that if you create a local histogram, then try to make a > global one, the global one will fail? Whoops, typo in second-to-last paragraph. Strike the "DownEventPerInput:" prefix. I meant: "PointerWatcherAdapterTest will run first, corrupt some static/global state, and PointerMetricsRecorderTest will fail to find the Event.DownEventCount.PerFormFactor histogram."
Here's what I believe is happening. Somewhere (okay, PointerWatcherAdapterTest.TouchEvents) the RecordUMA method is being called for "PointerMetricsRecorderTest.DownEventPerInput" before the SR is initialized. It creates its histogram and caches the pointer to it but its unknown globally. DownEventPerInput test then runs which initializes the SR as part of creating the HistogramTester but the histogram its looking is unknown and never gets created (because its already cached). Thus, it can't look up the name and you get your error. This is why adding SR::Initialize fixes the problem. With that, the SR is initialized during the creation of said histogram and it can later be found by HistogramTester. The best fix would be to call SR::Initialize in the test startup to catch all these cases but I'm not sure how practical that is. So, LGTM.
On 2017/05/23 16:43:34, bcwhite wrote: > Here's what I believe is happening. Somewhere (okay, > PointerWatcherAdapterTest.TouchEvents) the RecordUMA method is being called for > "PointerMetricsRecorderTest.DownEventPerInput" before the SR is initialized. It > creates its histogram and caches the pointer to it but its unknown globally. > DownEventPerInput test then runs which initializes the SR as part of creating > the HistogramTester but the histogram its looking is unknown and never gets > created (because its already cached). Thus, it can't look up the name and you > get your error. > > This is why adding SR::Initialize fixes the problem. With that, the SR is > initialized during the creation of said histogram and it can later be found by > HistogramTester. > > The best fix would be to call SR::Initialize in the test startup to catch all > these cases but I'm not sure how practical that is. > > So, LGTM. The place where I added SR::Initialize() will run it in test startup for all ash_unittests, so it should fix that suite at least. Thanks for the explanation.
The CQ bit was checked by jamescook@chromium.org
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 1, "attempt_start_ts": 1495559155363180, "parent_rev": "33a338b49114af3a63152b690cbc9d4f22161650", "commit_rev": "8cd44d248f4d7439aeb914c5e7ca4b74df94563f"}
Message was sent while issue was closed.
Description was changed from ========== chromeos: Initialize StatisticsRecorder in AshTestSuite This fixes flake in ash_unittests in tests that record and read back histograms, in particular PointerMetricsRecorderTest. This is similar to the initialization in ComponentsTestSuite, ContentTestSuiteBase, etc. BUG=725287 TEST=run ash_unittests locally, PointerMetricsRecorderTest doesn't retry ========== to ========== chromeos: Initialize StatisticsRecorder in AshTestSuite This fixes flake in ash_unittests in tests that record and read back histograms, in particular PointerMetricsRecorderTest. This is similar to the initialization in ComponentsTestSuite, ContentTestSuiteBase, etc. BUG=725287 TEST=run ash_unittests locally, PointerMetricsRecorderTest doesn't retry Review-Url: https://codereview.chromium.org/2901703002 Cr-Commit-Position: refs/heads/master@{#473952} Committed: https://chromium.googlesource.com/chromium/src/+/8cd44d248f4d7439aeb914c5e7ca... ==========
Message was sent while issue was closed.
Committed patchset #1 (id:1) as https://chromium.googlesource.com/chromium/src/+/8cd44d248f4d7439aeb914c5e7ca... |