|
|
Created:
5 years, 3 months ago by haraken Modified:
5 years, 3 months ago CC:
Mads Ager (chromium), blink-reviews, kouhei+heap_chromium.org, oilpan-reviews Base URL:
svn://svn.chromium.org/blink/trunk Target Ref:
refs/heads/master Project:
blink Visibility:
Public. |
DescriptionOilpan: Fix a regression in CSS.ClassInvalidation
This CL fixes performance regression in CSS.ClassInvalidation (although I don't understand why this CL improves performance at all).
ToT without oilpan: 5200 runs/sec
ToT with oilpan: 4690 runs/sec
ToT + this CL with oilpan: 5220 runs/sec
BUG=
Patch Set 1 #Patch Set 2 : #Patch Set 3 : #Patch Set 4 : #
Total comments: 2
Messages
Total messages: 16 (2 generated)
I confirmed that this CL fixes performance regressions observed in CSS.FocusUpdate and CSS.ClassInvalidation. I don't understand why, and this CL is not yet in a landable state. I'll investigate more.
haraken@chromium.org changed reviewers: + oilpan-reviews@chromium.org, sigbjornf@opera.com
PTAL PS1 fixes regressions in CSS.FocusUpdate and CSS.ClassInvalidation. However, PS1 is not a landable CL. PS4 fixes a regression in CSS.ClassInvalidation but doesn't fix a regression in CSS.FocusUpdate. PS4 is a landable CL. So I'd like to land PS4 at the moment. https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... File Source/platform/heap/ThreadState.cpp (right): https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... Source/platform/heap/ThreadState.cpp:634: return judgeGCThreshold(1024 * 1024, 1.5); Note: This change will be harmless, so I'd like to just land this. Note: 0 causes the regression. Any value larger than 0 fixes the regression. It seems that we're hitting some compiler bug?
esprehn@chromium.org changed reviewers: + esprehn@chromium.org
What do these magic numbers do? I'm concerned were playing wackamole to ship oilpan...
https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... File Source/platform/heap/ThreadState.cpp (right): https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... Source/platform/heap/ThreadState.cpp:634: return judgeGCThreshold(1024 * 1024, 1.5); On 2015/09/20 13:36:41, haraken wrote: > > Note: This change will be harmless, so I'd like to just land this. > > Note: 0 causes the regression. Any value larger than 0 fixes the regression. It > seems that we're hitting some compiler bug? This is a good find; how about changing judgeGCThreshold()'s first argument (the minimum allocation it requires) to non-zero values.
Do you have logs for when GCs hit (and for what cause) with & without this change (ps4)? The sequene and ordering of those might help shed light on this one, possibly.
On 2015/09/21 05:59:28, sof wrote: > https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... > File Source/platform/heap/ThreadState.cpp (right): > > https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... > Source/platform/heap/ThreadState.cpp:634: return judgeGCThreshold(1024 * 1024, > 1.5); > On 2015/09/20 13:36:41, haraken wrote: > > > > Note: This change will be harmless, so I'd like to just land this. > > > > Note: 0 causes the regression. Any value larger than 0 fixes the regression. > It > > seems that we're hitting some compiler bug? > > This is a good find; how about changing judgeGCThreshold()'s first argument (the > minimum allocation it requires) to non-zero values. ^^^ Is this not what PS4 is doing? > Do you have logs for when GCs hit (and for what cause) with & without this > change (ps4)? The sequene and ordering of those might help shed light on this > one, possibly. Nothing changes with PS4. With or without PS4, no memory pressure GCs are triggered during the benchmark. This implies that we're just hitting a subtlety of compiler generated code, which seems not worth spending a lot of time on this IMHO (i.e., we're just gaming micro-benchmarks in this particular case).
On 2015/09/21 09:12:54, haraken wrote: > On 2015/09/21 05:59:28, sof wrote: > > > https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... > > File Source/platform/heap/ThreadState.cpp (right): > > > > > https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... > > Source/platform/heap/ThreadState.cpp:634: return judgeGCThreshold(1024 * 1024, > > 1.5); > > On 2015/09/20 13:36:41, haraken wrote: > > > > > > Note: This change will be harmless, so I'd like to just land this. > > > > > > Note: 0 causes the regression. Any value larger than 0 fixes the regression. > > It > > > seems that we're hitting some compiler bug? > > > > This is a good find; how about changing judgeGCThreshold()'s first argument > (the > > minimum allocation it requires) to non-zero values. > > ^^^ Is this not what PS4 is doing? > > > Do you have logs for when GCs hit (and for what cause) with & without this > > change (ps4)? The sequene and ordering of those might help shed light on this > > one, possibly. > > Nothing changes with PS4. With or without PS4, no memory pressure GCs are > triggered during the benchmark. This implies that we're just hitting a subtlety > of compiler generated code, which seems not worth spending a lot of time on this > IMHO (i.e., we're just gaming micro-benchmarks in this particular case). Thanks, it wasn't clear if GCs were triggered via this path or not. Yes, this whole activity is pointless overall. So you're arguing that the difference is due to code layout, which impacts that test. If it reproduces with another toolchain/target (Windows, in particular), I would doubt that.
On 2015/09/21 09:25:51, sof wrote: > On 2015/09/21 09:12:54, haraken wrote: > > On 2015/09/21 05:59:28, sof wrote: > > > > > > https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... > > > File Source/platform/heap/ThreadState.cpp (right): > > > > > > > > > https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... > > > Source/platform/heap/ThreadState.cpp:634: return judgeGCThreshold(1024 * > 1024, > > > 1.5); > > > On 2015/09/20 13:36:41, haraken wrote: > > > > > > > > Note: This change will be harmless, so I'd like to just land this. > > > > > > > > Note: 0 causes the regression. Any value larger than 0 fixes the > regression. > > > It > > > > seems that we're hitting some compiler bug? > > > > > > This is a good find; how about changing judgeGCThreshold()'s first argument > > (the > > > minimum allocation it requires) to non-zero values. > > > > ^^^ Is this not what PS4 is doing? > > > > > Do you have logs for when GCs hit (and for what cause) with & without this > > > change (ps4)? The sequene and ordering of those might help shed light on > this > > > one, possibly. > > > > Nothing changes with PS4. With or without PS4, no memory pressure GCs are > > triggered during the benchmark. This implies that we're just hitting a > subtlety > > of compiler generated code, which seems not worth spending a lot of time on > this > > IMHO (i.e., we're just gaming micro-benchmarks in this particular case). > > Thanks, it wasn't clear if GCs were triggered via this path or not. Yes, this > whole activity is pointless overall. > > So you're arguing that the difference is due to code layout, which impacts that > test. If it reproduces with another toolchain/target (Windows, in particular), I > would doubt that. FWIW, mac doesn't have the regression in CSS.ClassInvalidation. https://257e68df4dae930120a4c67b6f7a507d5042fbc3-www.googledrive.com/host/0B4...
On 2015/09/21 09:28:49, haraken wrote: > On 2015/09/21 09:25:51, sof wrote: > > On 2015/09/21 09:12:54, haraken wrote: > > > On 2015/09/21 05:59:28, sof wrote: > > > > > > > > > > https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... > > > > File Source/platform/heap/ThreadState.cpp (right): > > > > > > > > > > > > > > https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... > > > > Source/platform/heap/ThreadState.cpp:634: return judgeGCThreshold(1024 * > > 1024, > > > > 1.5); > > > > On 2015/09/20 13:36:41, haraken wrote: > > > > > > > > > > Note: This change will be harmless, so I'd like to just land this. > > > > > > > > > > Note: 0 causes the regression. Any value larger than 0 fixes the > > regression. > > > > It > > > > > seems that we're hitting some compiler bug? > > > > > > > > This is a good find; how about changing judgeGCThreshold()'s first > argument > > > (the > > > > minimum allocation it requires) to non-zero values. > > > > > > ^^^ Is this not what PS4 is doing? > > > > > > > Do you have logs for when GCs hit (and for what cause) with & without this > > > > change (ps4)? The sequene and ordering of those might help shed light on > > this > > > > one, possibly. > > > > > > Nothing changes with PS4. With or without PS4, no memory pressure GCs are > > > triggered during the benchmark. This implies that we're just hitting a > > subtlety > > > of compiler generated code, which seems not worth spending a lot of time on > > this > > > IMHO (i.e., we're just gaming micro-benchmarks in this particular case). > > > > Thanks, it wasn't clear if GCs were triggered via this path or not. Yes, this > > whole activity is pointless overall. > > > > So you're arguing that the difference is due to code layout, which impacts > that > > test. If it reproduces with another toolchain/target (Windows, in particular), > I > > would doubt that. > > FWIW, mac doesn't have the regression in CSS.ClassInvalidation. > https://257e68df4dae930120a4c67b6f7a507d5042fbc3-www.googledrive.com/host/0B4... I can reproduce this on a linux workstation w/ chrome; no difference either way on windows. If I change trunk's code from bool result = judgeGCThreshold(0, 1.5); return result; to volatile bool result = judgeGCThreshold(0, 1.5); return result; the same speedup can be seen also. Inlining of shouldForceMemoryPressureGC() ending up being counter-productive?
On 2015/09/21 14:27:04, sof wrote: > On 2015/09/21 09:28:49, haraken wrote: > > On 2015/09/21 09:25:51, sof wrote: > > > On 2015/09/21 09:12:54, haraken wrote: > > > > On 2015/09/21 05:59:28, sof wrote: > > > > > > > > > > > > > > > https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... > > > > > File Source/platform/heap/ThreadState.cpp (right): > > > > > > > > > > > > > > > > > > > > https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... > > > > > Source/platform/heap/ThreadState.cpp:634: return judgeGCThreshold(1024 * > > > 1024, > > > > > 1.5); > > > > > On 2015/09/20 13:36:41, haraken wrote: > > > > > > > > > > > > Note: This change will be harmless, so I'd like to just land this. > > > > > > > > > > > > Note: 0 causes the regression. Any value larger than 0 fixes the > > > regression. > > > > > It > > > > > > seems that we're hitting some compiler bug? > > > > > > > > > > This is a good find; how about changing judgeGCThreshold()'s first > > argument > > > > (the > > > > > minimum allocation it requires) to non-zero values. > > > > > > > > ^^^ Is this not what PS4 is doing? > > > > > > > > > Do you have logs for when GCs hit (and for what cause) with & without > this > > > > > change (ps4)? The sequene and ordering of those might help shed light on > > > this > > > > > one, possibly. > > > > > > > > Nothing changes with PS4. With or without PS4, no memory pressure GCs are > > > > triggered during the benchmark. This implies that we're just hitting a > > > subtlety > > > > of compiler generated code, which seems not worth spending a lot of time > on > > > this > > > > IMHO (i.e., we're just gaming micro-benchmarks in this particular case). > > > > > > Thanks, it wasn't clear if GCs were triggered via this path or not. Yes, > this > > > whole activity is pointless overall. > > > > > > So you're arguing that the difference is due to code layout, which impacts > > that > > > test. If it reproduces with another toolchain/target (Windows, in > particular), > > I > > > would doubt that. > > > > FWIW, mac doesn't have the regression in CSS.ClassInvalidation. > > > https://257e68df4dae930120a4c67b6f7a507d5042fbc3-www.googledrive.com/host/0B4... > > I can reproduce this on a linux workstation w/ chrome; no difference either way > on windows. > > If I change trunk's code from > > bool result = judgeGCThreshold(0, 1.5); > return result; > > to > > volatile bool result = judgeGCThreshold(0, 1.5); > return result; > > the same speedup can be seen also. Inlining of shouldForceMemoryPressureGC() > ending up being counter-productive? At first I thought so. So I tried to add NEVER_INLINE to shouldForceMemoryPressureGC(), but it didn't change the performance. This issue may be related to https://codereview.chromium.org/1351063002/.
On 2015/09/21 14:29:07, haraken wrote: > On 2015/09/21 14:27:04, sof wrote: > > On 2015/09/21 09:28:49, haraken wrote: > > > On 2015/09/21 09:25:51, sof wrote: > > > > On 2015/09/21 09:12:54, haraken wrote: > > > > > On 2015/09/21 05:59:28, sof wrote: > > > > > > > > > > > > > > > > > > > > > https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... > > > > > > File Source/platform/heap/ThreadState.cpp (right): > > > > > > > > > > > > > > > > > > > > > > > > > > > https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... > > > > > > Source/platform/heap/ThreadState.cpp:634: return judgeGCThreshold(1024 > * > > > > 1024, > > > > > > 1.5); > > > > > > On 2015/09/20 13:36:41, haraken wrote: > > > > > > > > > > > > > > Note: This change will be harmless, so I'd like to just land this. > > > > > > > > > > > > > > Note: 0 causes the regression. Any value larger than 0 fixes the > > > > regression. > > > > > > It > > > > > > > seems that we're hitting some compiler bug? > > > > > > > > > > > > This is a good find; how about changing judgeGCThreshold()'s first > > > argument > > > > > (the > > > > > > minimum allocation it requires) to non-zero values. > > > > > > > > > > ^^^ Is this not what PS4 is doing? > > > > > > > > > > > Do you have logs for when GCs hit (and for what cause) with & without > > this > > > > > > change (ps4)? The sequene and ordering of those might help shed light > on > > > > this > > > > > > one, possibly. > > > > > > > > > > Nothing changes with PS4. With or without PS4, no memory pressure GCs > are > > > > > triggered during the benchmark. This implies that we're just hitting a > > > > subtlety > > > > > of compiler generated code, which seems not worth spending a lot of time > > on > > > > this > > > > > IMHO (i.e., we're just gaming micro-benchmarks in this particular case). > > > > > > > > Thanks, it wasn't clear if GCs were triggered via this path or not. Yes, > > this > > > > whole activity is pointless overall. > > > > > > > > So you're arguing that the difference is due to code layout, which impacts > > > that > > > > test. If it reproduces with another toolchain/target (Windows, in > > particular), > > > I > > > > would doubt that. > > > > > > FWIW, mac doesn't have the regression in CSS.ClassInvalidation. > > > > > > https://257e68df4dae930120a4c67b6f7a507d5042fbc3-www.googledrive.com/host/0B4... > > > > I can reproduce this on a linux workstation w/ chrome; no difference either > way > > on windows. > > > > If I change trunk's code from > > > > bool result = judgeGCThreshold(0, 1.5); > > return result; > > > > to > > > > volatile bool result = judgeGCThreshold(0, 1.5); > > return result; > > > > the same speedup can be seen also. Inlining of shouldForceMemoryPressureGC() > > ending up being counter-productive? > > At first I thought so. So I tried to add NEVER_INLINE to > shouldForceMemoryPressureGC(), but it didn't change the performance. > Yes, scheduleGCIfNeeded() is not called upon often for this inlining to matter. However, I do notice that the check for (< 1024*1024) that this brings about in judgeGCThreshold() is inlined into scheduleIfGCNeeded(), which improves the branch target offset wrt instruction cache line reads. A code alignment / inlining problem in very common Heap code paths seems like a good lead.
On 2015/09/21 19:03:34, sof wrote: > On 2015/09/21 14:29:07, haraken wrote: > > On 2015/09/21 14:27:04, sof wrote: > > > On 2015/09/21 09:28:49, haraken wrote: > > > > On 2015/09/21 09:25:51, sof wrote: > > > > > On 2015/09/21 09:12:54, haraken wrote: > > > > > > On 2015/09/21 05:59:28, sof wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... > > > > > > > File Source/platform/heap/ThreadState.cpp (right): > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://codereview.chromium.org/1350483005/diff/60001/Source/platform/heap/Th... > > > > > > > Source/platform/heap/ThreadState.cpp:634: return > judgeGCThreshold(1024 > > * > > > > > 1024, > > > > > > > 1.5); > > > > > > > On 2015/09/20 13:36:41, haraken wrote: > > > > > > > > > > > > > > > > Note: This change will be harmless, so I'd like to just land this. > > > > > > > > > > > > > > > > Note: 0 causes the regression. Any value larger than 0 fixes the > > > > > regression. > > > > > > > It > > > > > > > > seems that we're hitting some compiler bug? > > > > > > > > > > > > > > This is a good find; how about changing judgeGCThreshold()'s first > > > > argument > > > > > > (the > > > > > > > minimum allocation it requires) to non-zero values. > > > > > > > > > > > > ^^^ Is this not what PS4 is doing? > > > > > > > > > > > > > Do you have logs for when GCs hit (and for what cause) with & > without > > > this > > > > > > > change (ps4)? The sequene and ordering of those might help shed > light > > on > > > > > this > > > > > > > one, possibly. > > > > > > > > > > > > Nothing changes with PS4. With or without PS4, no memory pressure GCs > > are > > > > > > triggered during the benchmark. This implies that we're just hitting a > > > > > subtlety > > > > > > of compiler generated code, which seems not worth spending a lot of > time > > > on > > > > > this > > > > > > IMHO (i.e., we're just gaming micro-benchmarks in this particular > case). > > > > > > > > > > Thanks, it wasn't clear if GCs were triggered via this path or not. Yes, > > > this > > > > > whole activity is pointless overall. > > > > > > > > > > So you're arguing that the difference is due to code layout, which > impacts > > > > that > > > > > test. If it reproduces with another toolchain/target (Windows, in > > > particular), > > > > I > > > > > would doubt that. > > > > > > > > FWIW, mac doesn't have the regression in CSS.ClassInvalidation. > > > > > > > > > > https://257e68df4dae930120a4c67b6f7a507d5042fbc3-www.googledrive.com/host/0B4... > > > > > > I can reproduce this on a linux workstation w/ chrome; no difference either > > way > > > on windows. > > > > > > If I change trunk's code from > > > > > > bool result = judgeGCThreshold(0, 1.5); > > > return result; > > > > > > to > > > > > > volatile bool result = judgeGCThreshold(0, 1.5); > > > return result; > > > > > > the same speedup can be seen also. Inlining of shouldForceMemoryPressureGC() > > > ending up being counter-productive? > > > > At first I thought so. So I tried to add NEVER_INLINE to > > shouldForceMemoryPressureGC(), but it didn't change the performance. > > > > Yes, scheduleGCIfNeeded() is not called upon often for this inlining to matter. > > However, I do notice that the check for (< 1024*1024) that this brings about in > judgeGCThreshold() is inlined into scheduleIfGCNeeded(), which improves the > branch target offset wrt instruction cache line reads. A code alignment / > inlining problem in very common Heap code paths seems like a good lead. Thanks, what I'm still wondering is that the number of scheduleGCIfNeeded() called in the benchmark is still limited. It's called only in outOfLineAllocate(), so it shouldn't be in a hot call path...
I would just ignore this benchmark, it's designed to be an early warning system for possible regressions, but it's also very fast so I'm totally convinced you regressed it with code locality.
On 2015/09/22 00:22:44, esprehn wrote: > I would just ignore this benchmark, it's designed to be an early warning system > for possible regressions, but it's also very fast so I'm totally convinced you > regressed it with code locality. Makes sense to me. It won't be productive to game micro-benchmarks. |