Chromium Code Reviews

Created: 4 years, 1 month ago by DmitrySkiba
Modified: 4 years ago
Reviewers: Primiano Tucci (use gerrit)
CC: chromium-reviews, tracing+reviews_chromium.org, wfh+watch_chromium.org
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public
Description:
[tracing] Use EstimateMemoryUsage() in deduplicators.
Both type name and stack deduplicators need to estimate their memory
overhead. Previously they were using simplistic models and estimated
only half of their actual memory usage. This CL changes estimation
methods to use EstimateMemoryUsage(), resulting in better accuracy.
Committed: https://crrev.com/ea1fd31aef31820bbff6440338e039028ddc9b04
Cr-Commit-Position: refs/heads/master@{#434728}
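The idea behind the CL is that instead of hand-rolled size arithmetic, each deduplicator asks a family of EstimateMemoryUsage() overloads to walk its containers. The following is a simplified standalone sketch of that pattern (the real version lives in base/trace_event/memory_usage_estimator.h and covers many more types); all names here are illustrative, not Chromium's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Fallback: a flat value owns no heap memory beyond itself.
template <typename T>
size_t EstimateMemoryUsage(const T&) {
  return 0;
}

// std::string: count the heap buffer. This is approximate -- it ignores
// the small-string optimization, which keeps short strings inline.
inline size_t EstimateMemoryUsage(const std::string& s) {
  return s.capacity();
}

// std::vector: count the whole allocated buffer, then recurse into the
// elements so nested heap allocations are included too.
template <typename T>
size_t EstimateMemoryUsage(const std::vector<T>& v) {
  size_t usage = v.capacity() * sizeof(T);
  for (const T& item : v)
    usage += EstimateMemoryUsage(item);
  return usage;
}
```

Because the overloads recurse, a container of strings reports both its own buffer and every string's buffer — which is exactly the memory the old simplistic models missed, explaining the roughly 2x difference discussed below.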
Patch Set 1 #
Total comments: 1
Patch Set 2 : Rebase #
Patch Set 3 : git cl format #
Messages
Total messages: 24 (9 generated)
dskiba@chromium.org changed reviewers: + primiano@chromium.org
https://codereview.chromium.org/2514913002/diff/1/base/trace_event/heap_profi...
File base/trace_event/heap_profiler_stack_frame_deduplicator.cc (left):

base/trace_event/heap_profiler_stack_frame_deduplicator.cc:130: sizeof(StackFrameDeduplicator) + maps_size + frames_allocated,
The previous code was differentiating between resident and allocated memory, which is quite easy due to the layout of the heap profiler. Why is this being dropped?
On 2016/11/22 01:15:13, Primiano Tucci wrote:
> the previous code was differentiating between resident and allocated memory,
> which is quite easy due to the layout of the heap profiler. Why is this being
> dropped?

I think that distinction was pure speculation - it's impossible to know what part of a vector's buffer is dirty. The answer changes between buffer sizes and platforms. For our case of ~15 items on each level, the vector's buffer will probably always be dirty, since it will probably be reusing someone's dirty allocation.
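The point about speculation can be illustrated with a small standalone sketch (hypothetical helper names, not code from this CL): from inside the process a vector exposes how much it has allocated and how much of that it is using, but not which pages of the allocation are actually dirty.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// What the allocator handed out for the buffer.
template <typename T>
size_t AllocatedBytes(const std::vector<T>& v) {
  return v.capacity() * sizeof(T);
}

// What the code has actually written into the buffer.
template <typename T>
size_t UsedBytes(const std::vector<T>& v) {
  return v.size() * sizeof(T);
}

// Whether the (capacity - size) tail is resident depends on whether the
// allocator recycled an already-dirty block -- not observable from here,
// which is why any resident-vs-allocated split is a guess.
```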
On 2016/11/22 01:48:59, DmitrySkiba wrote:
> I think that distinction was pure speculation - it's impossible to know what
> part of a vector's buffer is dirty.

Can you take a heap profile of about:blank for 20-30 seconds with and without this CL and check if in one of the two cases malloc diverges over time (i.e. heap profiling when idle should not move malloc significantly over time)?
On 2016/11/22 14:56:13, Primiano Tucci wrote:
> can you take a heap profile of about:blank for 20-30 seconds with and without
> this CL and check if in one of the two cases malloc diverges over time?

OK, I can do that, but what do you expect to see (or catch)? My changes only affect the value that is then surfaced in the "tracing" column - do you expect my version to grow over time? Why would it be different from master?

BTW, I took some measurements before submitting, and with my changes the Estimate() functions report twice as much (the magnitude is 400K vs 800K for the stack deduplicator) - so previously we were underreporting.
On 2016/11/22 16:46:24, DmitrySkiba wrote:
> OK, I can do that, but what do you expect to see (or catch)? My changes only
> affect the value that is then surfaced in the "tracing" column

Correct. The problem is that if you under- or over-estimate it, the values of malloc and PSS and resident size will start having an artificial slope, which confuses people a lot. In an ideal situation, if you just trace about:blank, malloc/pss/rss should be flat lines (% marginal Chrome activity, but whatever).

So what I want to see here is:
- Are we screwed in both cases (both estimations are wrong and the trace deviates anyway)? In which case, whatever - simplicity wins and we land this as-is.
- Is one of the two cases better? Then we keep that one.

> do you expect my version to grow over time? Why would it be different from
> master?

Because the tracing overhead is subtracted from the malloc effective size (which is plotted), and the resident tracing overhead is subtracted from pss/resident (which is plotted).

> BTW, I took some measurements before submitting, and with my changes the
> Estimate() functions report twice as much (magnitude is 400K vs 800K for the
> stack deduplicator) - so previously we were underreporting.

I don't care if under or over. I care which one gets us closer to a flat line. I know that this is a huge pain. I'm working very hard to get us out of here with TV2.
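The subtraction mechanism described here is why a consistently wrong estimate shows up as a slope. A toy model (all numbers assumed, not from the actual traces):

```cpp
#include <cassert>
#include <cstddef>

// The dashboard plots effective_malloc = total_malloc - estimated_overhead.
// If the estimator reports only half of the real tracing overhead, the
// unreported half stays inside effective_malloc; since overhead grows with
// every dump, the plotted line acquires an artificial upward slope even when
// the rest of the process is idle.
size_t EffectiveMalloc(size_t total_malloc, size_t estimated_overhead) {
  return total_malloc - estimated_overhead;
}
```

With a correct estimate the plotted value stays flat across dumps; with a half-sized estimate it climbs by half of the overhead growth per dump.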
On 2016/11/22 18:27:56, Primiano Tucci wrote:
> I don't care if under or over. I care which one gets us closer to a flat line.

All right, I measured 30 seconds of NTP (startup tracing, detailed dump every 2 seconds):

master: https://drive.google.com/file/d/0B_Hmi138MnbJSGg2ZUVGM0VjUU0/view?usp=sharing
patch: https://drive.google.com/file/d/0B_Hmi138MnbJMjgzVlNRc1hDSVk/view?usp=sharing

For the browser process, the main_trace_log/*Deduplicator values (which this CL affects) are reasonable:

master: second dump: 499.1 + 0.5 KiB, last dump: 501.7 + 0.7 KiB
patch: second dump: 613.3 + 1.6 KiB, last dump: 658.6 + 2.0 KiB

What is not reasonable is main_trace_log/TracedValue - check out the disturbance it caused in the graphs on the 4th (~25 MiB) and 8th (~48 MiB) dumps.

There is also malloc/tracing_overhead, which keeps increasing: from 269.2 KiB in the first dump to 58,511.8 KiB in the last dump.

So compared to those things, the *Deduplicator values' contribution is negligible.
On 2016/11/24 00:00:37, DmitrySkiba wrote:
> All right, I measured 30 seconds of NTP (startup tracing, detailed dump every
> 2 seconds):

Thanks a lot for checking.

> master: second dump: 499.1 + 0.5 KiB, last dump: 501.7 + 0.7 KiB
> patch: second dump: 613.3 + 1.6 KiB, last dump: 658.6 + 2.0 KiB

Yeah, OK, not worth even bikeshedding - we definitely have bigger problems. Thanks for the data; definitely LGTM, given that this makes the code more readable.

> There is also malloc/tracing_overhead, which keeps increasing: from 269.2 KiB
> in the first dump to 58,511.8 KiB in the last dump.

Yeah, ssid and I looked at that a while ago, but never got to the root. My theory is that on some occasions we invert the order of when we do the estimations (before or after mallinfo()) and so are off by one dump. Anyway, I'm going to get rid of this estimation logic once I fix tracing.

> So compared to those things *Deduplicator values contribution is negligible.

Agreed. Thanks again for checking.
The CQ bit was checked by dskiba@chromium.org
CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on the following builders:
  cast_shell_linux on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
  chromeos_x86-generic_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
  chromium_presubmit on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
  linux_chromium_asan_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
  linux_chromium_chromeos_compile_dbg_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
  linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
  linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
  linux_chromium_clobber_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
  linux_chromium_compile_dbg_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
The CQ bit was checked by dskiba@chromium.org
CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on the following builders:
  chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
The CQ bit was checked by dskiba@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from primiano@chromium.org
Link to the patchset: https://codereview.chromium.org/2514913002/#ps40001 (title: "git cl format")
CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing the patch.
Bot data: {"patchset_id": 40001, "attempt_start_ts": 1480360312271080, "parent_rev": "3eec1b0353a7c4c6b266bd6e38383b1b534e167d", "commit_rev": "4147d028924a831bbf86dc3c57caa3f2f410f2c6"}
Message was sent while issue was closed.
Committed patchset #3 (id:40001)
Message was sent while issue was closed.
Description was changed: the commit information was appended to the original description above:
Committed: https://crrev.com/ea1fd31aef31820bbff6440338e039028ddc9b04
Cr-Commit-Position: refs/heads/master@{#434728}
Message was sent while issue was closed.
Patchset 3 (id:??) landed as https://crrev.com/ea1fd31aef31820bbff6440338e039028ddc9b04
Cr-Commit-Position: refs/heads/master@{#434728}
