|
|
Created:
5 years, 4 months ago by dtapuska Modified:
5 years, 4 months ago Reviewers:
esprehn CC:
blink-reviews, blink-reviews-rendering, eae+blinkwatch, jchaffraix+rendering, leviw+renderwatch, pdr+renderingwatchlist_chromium.org, szager+layoutwatch_chromium.org, zoltan1 Base URL:
https://chromium.googlesource.com/chromium/blink.git@master Target Ref:
refs/heads/master Project:
blink Visibility:
Public. |
DescriptionEnable HitTestCache by default.
UMA metrics indicate we have great validity. Enable the hit test cache
in release by default. Keep the validity checks around in ASSERTs
are enabled.
BUG=398920
Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=200852
Patch Set 1 #
Total comments: 3
Patch Set 2 : Remove UMA metrics for validity #Patch Set 3 : Fix layout test #
Messages
Total messages: 22 (7 generated)
dtapuska@chromium.org changed reviewers: + esprehn@chromium.org
https://codereview.chromium.org/1294913004/diff/1/Source/core/layout/LayoutVi... File Source/core/layout/LayoutView.cpp (left): https://codereview.chromium.org/1294913004/diff/1/Source/core/layout/LayoutVi... Source/core/layout/LayoutView.cpp:117: m_hitTestCache->verifyCachedResult(result, cacheResult); verifyCacheResult logs some UMA metrics. rbyers@ indicated that this would lead them to be orphaned. I do recognize this; although I want to keep them around as I would like to try the rect based hit test cache next and I would use them in that code path. (Although this code flow would need to be modified again; but I'll cross that road later).
What does "we have great validity" mean? Is it 100% correct? https://codereview.chromium.org/1294913004/diff/1/Source/core/layout/LayoutVi... File Source/core/layout/LayoutView.cpp (left): https://codereview.chromium.org/1294913004/diff/1/Source/core/layout/LayoutVi... Source/core/layout/LayoutView.cpp:117: m_hitTestCache->verifyCachedResult(result, cacheResult); On 2015/08/17 at 21:23:14, dtapuska wrote: > verifyCacheResult logs some UMA metrics. rbyers@ indicated that this would lead them to be orphaned. I do recognize this; although I want to keep them around as I would like to try the rect based hit test cache next and I would use them in that code path. (Although this code flow would need to be modified again; but I'll cross that road later). Please delete the code for now and add it back later. Having UMA behind an ENABLE(ASSERT) doesn't make sense.
On 2015/08/18 20:27:41, esprehn wrote: > What does "we have great validity" mean? Is it 100% correct? > > https://codereview.chromium.org/1294913004/diff/1/Source/core/layout/LayoutVi... > File Source/core/layout/LayoutView.cpp (left): > > https://codereview.chromium.org/1294913004/diff/1/Source/core/layout/LayoutVi... > Source/core/layout/LayoutView.cpp:117: > m_hitTestCache->verifyCachedResult(result, cacheResult); > On 2015/08/17 at 21:23:14, dtapuska wrote: > > verifyCacheResult logs some UMA metrics. rbyers@ indicated that this would > lead them to be orphaned. I do recognize this; although I want to keep them > around as I would like to try the rect based hit test cache next and I would use > them in that code path. (Although this code flow would need to be modified > again; but I'll cross that road later). > > Please delete the code for now and add it back later. Having UMA behind an > ENABLE(ASSERT) doesn't make sense. UMA reports there were approximately 11 failures in ~500 million cache hits. The rate has substantially dropped from the last fix where it was 1 in about 20 million.
https://codereview.chromium.org/1294913004/diff/1/Source/core/layout/LayoutVi... File Source/core/layout/LayoutView.cpp (left): https://codereview.chromium.org/1294913004/diff/1/Source/core/layout/LayoutVi... Source/core/layout/LayoutView.cpp:117: m_hitTestCache->verifyCachedResult(result, cacheResult); On 2015/08/18 20:27:41, esprehn wrote: > On 2015/08/17 at 21:23:14, dtapuska wrote: > > verifyCacheResult logs some UMA metrics. rbyers@ indicated that this would > lead them to be orphaned. I do recognize this; although I want to keep them > around as I would like to try the rect based hit test cache next and I would use > them in that code path. (Although this code flow would need to be modified > again; but I'll cross that road later). > > Please delete the code for now and add it back later. Having UMA behind an > ENABLE(ASSERT) doesn't make sense. Done.
11 in 500m means we're still failing sometimes though? How do we go about fixing that correctness?
On 2015/08/18 21:04:32, esprehn wrote: > 11 in 500m means we're still failing sometimes though? How do we go about fixing > that correctness? Yes it still means that there are some failures. What isn't clear to me is that if all the open hit test bugs can cause it not to be deterministic. I can continue thinking about the problem but I'm completely out of ideas as it stands. What is consistent with the current 11 failures is that they are all of code 151. Which implies the inner node, the pseudo node and the local point don't match. So that implies it pretty much found a different DOM node than what was in the cache at a different location. I'm not certain what I can do to try to grab more information from these failures. I have not received any reports of any failures in debug mode; so the non-release assert isn't grabbing any useful information. Turning it into a release assert is one possibility but that would add potential crashes at a rate of 1 in 50 million. Moving the mouse a single pixel causes it not to use the cache; so my biggest bet is that the hit test is non-deterministic sometimes.
On 2015/08/18 21:14:52, dtapuska wrote: > On 2015/08/18 21:04:32, esprehn wrote: > > 11 in 500m means we're still failing sometimes though? How do we go about > fixing > > that correctness? > > Yes it still means that there are some failures. What isn't clear to me is that > if all the open hit test bugs can cause it not to be deterministic. > > I can continue thinking about the problem but I'm completely out of ideas as it > stands. > > What is consistent with the current 11 failures is that they are all of code > 151. Which implies the inner node, the pseudo node and the local point don't > match. So that implies it pretty much found a different DOM node than what was > in the cache at a different location. > > I'm not certain what I can do to try to grab more information from these > failures. I have not received any reports of any failures in debug mode; so the > non-release assert isn't grabbing any useful information. Turning it into a > release assert is one possibility but that would add potential crashes at a rate > of 1 in 50 million. > > Moving the mouse a single pixel causes it not to use the cache; so my biggest > bet is that the hit test is non-deterministic sometimes. Right. My argument to Dave was that this is more rigor then we've probably ever subjected the hit-testing code to. If we could easily identify these 11 cases, I would not be surprised to find that most (or all) of them were bugs in hit testing itself. We're getting dramatically decreasing returns on our investment here - Dave's time could be way better spent on the bugs we KNOW we have (which surely occur more often than 0.0000022% of the time because we have reports from users). Of course we could pretty easily test his hypothesis if we want to. Maybe (independent from all hit test cache work) we should add code that for 1 in 10,000 hit tests does the hit test again a 2nd time and compares the result. Perhaps more likely is that we're doing something somewhere that changes the layout tree without dirtying layout correctly. Again we could probably test this by occasionally triggering a supposedly unnecessary layout and verifying nothing changed. If we find this happens extremely rarely, would that really be reason not to ship this cache?
On 2015/08/18 at 21:32:42, rbyers wrote: > On 2015/08/18 21:14:52, dtapuska wrote: > > On 2015/08/18 21:04:32, esprehn wrote: > > > 11 in 500m means we're still failing sometimes though? How do we go about > > fixing > > > that correctness? > > > > Yes it still means that there are some failures. What isn't clear to me is that > > if all the open hit test bugs can cause it not to be deterministic. > > > > I can continue thinking about the problem but I'm completely out of ideas as it > > stands. > > > > What is consistent with the current 11 failures is that they are all of code > > 151. Which implies the inner node, the pseudo node and the local point don't > > match. So that implies it pretty much found a different DOM node than what was > > in the cache at a different location. > > > > I'm not certain what I can do to try to grab more information from these > > failures. I have not received any reports of any failures in debug mode; so the > > non-release assert isn't grabbing any useful information. Turning it into a > > release assert is one possibility but that would add potential crashes at a rate > > of 1 in 50 million. > > > > Moving the mouse a single pixel causes it not to use the cache; so my biggest > > bet is that the hit test is non-deterministic sometimes. > > Right. My argument to Dave was that this is more rigor then we've probably ever subjected the hit-testing code to. If we could easily identify these 11 cases, I would not be surprised to find that most (or all) of them were bugs in hit testing itself. We're getting dramatically decreasing returns on our investment here - Dave's time could be way better spent on the bugs we KNOW we have (which surely occur more often than 0.0000022% of the time because we have reports from users). > > Of course we could pretty easily test his hypothesis if we want to. Maybe (independent from all hit test cache work) we should add code that for 1 in 10,000 hit tests does the hit test again a 2nd time and compares the result. > > Perhaps more likely is that we're doing something somewhere that changes the layout tree without dirtying layout correctly. Again we could probably test this by occasionally triggering a supposedly unnecessary layout and verifying nothing changed. If we find this happens extremely rarely, would that really be reason not to ship this cache? Could we add a RELEASE_ASSERT to make this crash and collect those urls? I think this is probably okay to ship, but it makes me sad to ship some known bugs like this. lgtm
On 2015/08/18 21:50:08, esprehn wrote: > On 2015/08/18 at 21:32:42, rbyers wrote: > > On 2015/08/18 21:14:52, dtapuska wrote: > > > On 2015/08/18 21:04:32, esprehn wrote: > > > > 11 in 500m means we're still failing sometimes though? How do we go about > > > fixing > > > > that correctness? > > > > > > Yes it still means that there are some failures. What isn't clear to me is > that > > > if all the open hit test bugs can cause it not to be deterministic. > > > > > > I can continue thinking about the problem but I'm completely out of ideas as > it > > > stands. > > > > > > What is consistent with the current 11 failures is that they are all of code > > > 151. Which implies the inner node, the pseudo node and the local point don't > > > match. So that implies it pretty much found a different DOM node than what > was > > > in the cache at a different location. > > > > > > I'm not certain what I can do to try to grab more information from these > > > failures. I have not received any reports of any failures in debug mode; so > the > > > non-release assert isn't grabbing any useful information. Turning it into a > > > release assert is one possibility but that would add potential crashes at a > rate > > > of 1 in 50 million. > > > > > > Moving the mouse a single pixel causes it not to use the cache; so my > biggest > > > bet is that the hit test is non-deterministic sometimes. > > > > Right. My argument to Dave was that this is more rigor then we've probably > ever subjected the hit-testing code to. If we could easily identify these 11 > cases, I would not be surprised to find that most (or all) of them were bugs in > hit testing itself. We're getting dramatically decreasing returns on our > investment here - Dave's time could be way better spent on the bugs we KNOW we > have (which surely occur more often than 0.0000022% of the time because we have > reports from users). > > > > Of course we could pretty easily test his hypothesis if we want to. Maybe > (independent from all hit test cache work) we should add code that for 1 in > 10,000 hit tests does the hit test again a 2nd time and compares the result. > > > > Perhaps more likely is that we're doing something somewhere that changes the > layout tree without dirtying layout correctly. Again we could probably test > this by occasionally triggering a supposedly unnecessary layout and verifying > nothing changed. If we find this happens extremely rarely, would that really be > reason not to ship this cache? > > Could we add a RELEASE_ASSERT to make this crash and collect those urls? I think > this is probably okay to ship, but it makes me sad to ship some known bugs like > this. > > lgtm I hear you Elliott I don't like to admit defeat either. I spent a few moons trying different things to get the failures out.. What I suggest is we land this. And can add a temporary release assert after m46 branches. That way we can capture some data in canary builds... Have it for a few days and see what we get with it.
The CQ bit was checked by dtapuska@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1294913004/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1294913004/20001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by dtapuska@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from esprehn@chromium.org Link to the patchset: https://codereview.chromium.org/1294913004/#ps40001 (title: "Fix layout test")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1294913004/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1294913004/40001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: win_chromium_x64_rel_ng on tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by dtapuska@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1294913004/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1294913004/40001
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://src.chromium.org/viewvc/blink?view=rev&revision=200852 |