|
|
Created:
6 years, 9 months ago by Daniel Bratell Modified:
6 years, 9 months ago Reviewers:
gavinp, rvargas (doing something else), jar (doing other things) CC:
chromium-reviews, cbentzel+watch_chromium.org, gavinp+disk_chromium.org, mmenke, asanka Base URL:
https://chromium.googlesource.com/chromium/src.git@master Visibility:
Public. |
DescriptionPut histogram code in disk_cache on a diet.
The CACHE_UMA macro expanded wildly and with all the static variables
in all blocks it was impossible for the compiler to shrink it. This
removes the static variables (for a minor performance hit) to save 75%
of the size of BackendImpl::ReportStats() and other changes elsewhere.
Examples (sorry, don't have data for symbols smaller than 6 KB so not exact):
disk_cache::BackendImpl::ReportStats() 55423 bytes -> 9788 bytes
disk_cache::BackendImpl::FirstEviction() 21361 bytes -> < 6 KB
disk_cache::BackendImpl::OpenEntryImpl: 16381 bytes -> < 6 KB
disk_cache::EntryImpl::ReportIOTime: 14626 bytes -> < 6 KB
disk_cache::Eviction::ReportListStats: 10099 bytes -> < 6 KB
disk_cache::Eviction::TrimCacheV2: 7820 bytes -> < 6 KB
disk_cache::Eviction::TrimCache: 7095 bytes -> < 6 KB
In total my stripped Linux x64 Release content_shell binary shrinks by 132 KB with this change (0.25%).
BUG=352555
R=gavinp@chromium.org
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=257637
Patch Set 1 #
Total comments: 9
Patch Set 2 : Same change also in v3 #
Total comments: 1
Patch Set 3 : Fewer comment lines. #Patch Set 4 : Restored switches with assert. #
Total comments: 1
Patch Set 5 : Restored more code. #
Created: 6 years, 9 months ago
Messages
Total messages: 20 (0 generated)
mmenke, can you please review this or forward to someone else if you are not the right person? (asanka?). This is a modification to a macro that expanded to crazy amounts of machine code for a total save of 132 KB (x64) binary size.
Redirecting to gavinp.
https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... File net/disk_cache/blockfile/histogram_macros.h (right): https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... net/disk_cache/blockfile/histogram_macros.h:26: base::HistogramBase* counter = base::Histogram::FactoryGet( \ Does this mean that we should change UMA_HISTOGRAM_XXX to eliminate the static object? I mean, these macros basically generate n independent histograms (ignore the experiment part as that's not currently used and unlikely to be used again) that could be as well named "AppCache.Foo", "HttpCache.Foo" and "ShaderCache.Foo". So yes, there's a lot of histograms here, but there's a lot of histograms in general.
https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... File net/disk_cache/blockfile/histogram_macros.h (right): https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... net/disk_cache/blockfile/histogram_macros.h:26: base::HistogramBase* counter = base::Histogram::FactoryGet( \ On 2014/03/14 21:48:41, rvargas wrote: > Does this mean that we should change UMA_HISTOGRAM_XXX to eliminate the static > object? > > I mean, these macros basically generate n independent histograms (ignore the > experiment part as that's not currently used and unlikely to be used again) that > could be as well named "AppCache.Foo", "HttpCache.Foo" and "ShaderCache.Foo". > > So yes, there's a lot of histograms here, but there's a lot of histograms in > general. I think removing static from all UVA macros could be the right thing to do but there might be one or two places where static is needed to not make the lookup become a bottleneck. This code was somewhat of a special case because of the use in the switch which amplified the overhead. I basically found this by noticing that a few disk cache functions were on the list of largest functions in Chromium (before the patch, not afterwards).
Daniel, This looks really good. I think histogram_macros_v3.h in the same directory probably needs to have the same edits. As well, simple/simple_histogram_macros.h is another example of this (bloating) pattern. You can be forgiven for not changing that one, but this is a reminder to me at least to follow up with that soon. https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... File net/disk_cache/blockfile/histogram_macros.h (right): https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... net/disk_cache/blockfile/histogram_macros.h:26: base::HistogramBase* counter = base::Histogram::FactoryGet( \ On 2014/03/14 21:48:41, rvargas wrote: > > I mean, these macros basically generate n independent histograms (ignore the > experiment part as that's not currently used and unlikely to be used again) that > could be as well named "AppCache.Foo", "HttpCache.Foo" and "ShaderCache.Foo". > I think this would be great. It's very difficult for people to understand what .0, .1, etc.. mean. I imagine there's a small benefit in binary size (because histograms store the strings explicitly), but even this need not be so if we generate them programatically. https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... net/disk_cache/blockfile/histogram_macros.h:26: base::HistogramBase* counter = base::Histogram::FactoryGet( \ On 2014/03/14 21:48:41, rvargas wrote: > Does this mean that we should change UMA_HISTOGRAM_XXX to eliminate the static > object? A few weeks ago, I started looking at this. It turns out that not only are these UMA a cause of bloat, but they are unusually slow on ARM because of very conservative dependent load semantics. If you're interested, see https://codereview.chromium.org/148063009/ for the start of this work (which keeps the semantics, but saves space on ARM at very little cost on x86). I'm hoping to continue that, but I wanted to look more carefully at the perf gains from being more aggressive with memory barriers. https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... net/disk_cache/blockfile/histogram_macros.h:96: switch (CACHE_UMA_BACKEND_IMPL_OBJ->cache_type()) {\ Does it make sense to keep this as a switch statement now? I imagine a DCHECK would work well for this now.
Uploaded a new version where the same change was done to the v3 code and removed the switches. A bit hard (as in "ugly") to do a DCHECK with a similar change so I just skipped checking completely. It's not like it matters anymore. As for renaming the histogram values (outside the scope of this patch) - remember that it means losing the history. Apart from that I think it's a good idea. The numbers are not very helpful.
This change LGTM, with or without my last nit. Thanks for looking into this and fixing it. https://codereview.chromium.org/196383016/diff/20001/net/disk_cache/blockfile... File net/disk_cache/blockfile/histogram_macros_v3.h (right): https://codereview.chromium.org/196383016/diff/20001/net/disk_cache/blockfile... net/disk_cache/blockfile/histogram_macros_v3.h:19: // Note: These macros are only run on one thread, so the declarations of Nit: I would just eliminate this entire comment here, because it seems to be recapitulating some history of how we wrote this. But I think that's a minor style thing, so take it or leave it.
The CQ bit was checked by bratell@opera.com
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bratell@opera.com/196383016/40001
https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... File net/disk_cache/blockfile/histogram_macros.h (right): https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... net/disk_cache/blockfile/histogram_macros.h:26: base::HistogramBase* counter = base::Histogram::FactoryGet( \ On 2014/03/17 10:15:28, gavinp wrote: > On 2014/03/14 21:48:41, rvargas wrote: > > Does this mean that we should change UMA_HISTOGRAM_XXX to eliminate the static > > object? > > A few weeks ago, I started looking at this. It turns out that not only are these > UMA a cause of bloat, but they are unusually slow on ARM because of very > conservative dependent load semantics. > > If you're interested, see https://codereview.chromium.org/148063009/ for the > start of this work (which keeps the semantics, but saves space on ARM at very > little cost on x86). I would be more interested in an open bug. > > I'm hoping to continue that, but I wanted to look more carefully at the perf > gains from being more aggressive with memory barriers. https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... net/disk_cache/blockfile/histogram_macros.h:26: base::HistogramBase* counter = base::Histogram::FactoryGet( \ On 2014/03/17 09:46:57, Daniel Bratell wrote: > On 2014/03/14 21:48:41, rvargas wrote: > > Does this mean that we should change UMA_HISTOGRAM_XXX to eliminate the static > > object? > > > > I mean, these macros basically generate n independent histograms (ignore the > > experiment part as that's not currently used and unlikely to be used again) > that > > could be as well named "AppCache.Foo", "HttpCache.Foo" and "ShaderCache.Foo". > > > > So yes, there's a lot of histograms here, but there's a lot of histograms in > > general. > > I think removing static from all UVA macros could be the right thing to do but > there might be one or two places where static is needed to not make the lookup > become a bottleneck. > > This code was somewhat of a special case because of the use in the switch which > amplified the overhead. I basically found this by noticing that a few disk cache > functions were on the list of largest functions in Chromium (before the patch, > not afterwards). How does the switch amplifies the overhead? What it enables is to generate more histograms with less lines of source code, but if function xx wants to generate 5 different histograms using standard base UMA macros (with statics), I would expect it to generate roughly the same binary code than one call of this macro. Is that not the case? https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... net/disk_cache/blockfile/histogram_macros.h:96: switch (CACHE_UMA_BACKEND_IMPL_OBJ->cache_type()) {\ On 2014/03/17 10:15:28, gavinp wrote: > Does it make sense to keep this as a switch statement now? I imagine a DCHECK > would work well for this now. I would actually prefer to keep a switch (or equivalent), as I don't want new cache types to generate new histograms automatically. In fact, having PNACL histograms was probably a mistake (likely to be removed soon). That intention was more clear with the v3 version (although to be fair, irrelevant as that code is not currently used in prod)
The CQ bit was unchecked by rvargas@chromium.org
I forgot to mention that this looks fine as long as the plan is to remove all (or most) statics from histograms. I want to avoid diverging behavior on the macros.
I opened http://code.google.com/p/chromium/issues/detail?id=353244 about looking at the default macros. Personally I wouldn't mind dropping static and then optimizing any fallout individually, but that may just be coming from an environment where we were much more careful with memory usage and binary sizes. It's clear to me that the disk_cache macros is a much larger portion of the net binary size than I think you net people like, but it's possible you know a better way of solving that issue. This seemed to be the most obvious solution. Downside being that the disk cache macros diverge further from the default macros but they were already different so it's a problem in scale rather than in principle. https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... File net/disk_cache/blockfile/histogram_macros.h (right): https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... net/disk_cache/blockfile/histogram_macros.h:26: base::HistogramBase* counter = base::Histogram::FactoryGet( \ On 2014/03/17 18:03:08, rvargas wrote: > On 2014/03/17 09:46:57, Daniel Bratell wrote: > > On 2014/03/14 21:48:41, rvargas wrote: > > > Does this mean that we should change UMA_HISTOGRAM_XXX to eliminate the > static > > > object? > > > > > > I mean, these macros basically generate n independent histograms (ignore the > > > experiment part as that's not currently used and unlikely to be used again) > > that > > > could be as well named "AppCache.Foo", "HttpCache.Foo" and > "ShaderCache.Foo". > > > > > > So yes, there's a lot of histograms here, but there's a lot of histograms in > > > general. > > > > I think removing static from all UVA macros could be the right thing to do but > > there might be one or two places where static is needed to not make the lookup > > become a bottleneck. > > > > This code was somewhat of a special case because of the use in the switch > which > > amplified the overhead. I basically found this by noticing that a few disk > cache > > functions were on the list of largest functions in Chromium (before the patch, > > not afterwards). > > How does the switch amplifies the overhead? What it enables is to generate more > histograms with less lines of source code, but if function xx wants to generate > 5 different histograms using standard base UMA macros (with statics), I would > expect it to generate roughly the same binary code than one call of this macro. > Is that not the case? The "static" just prevented the optimizer from reusing the same code for all paths in the code, even though the code looked the same because the value of the histogram (hidden) variable depended on how the input the the first time that code path was reached. Having five distinct blocks for the five different histograms, distinct enough to prevent any merge of code, would probably cause the same accidental bloating. There is also a cost an overhead to static variables in that there has to be machine code to check if it's initialized or not every time you reach it as well as other things. If you look at the size savings, you see that ReportStats shrunk by more than 80% so it's not just that there a fifth of the macro left, there is more to it as well.
On 2014/03/17 19:35:53, Daniel Bratell wrote: > I opened http://code.google.com/p/chromium/issues/detail?id=353244 about looking > at the default macros. Personally I wouldn't mind dropping static and then > optimizing any fallout individually, but that may just be coming from an > environment where we were much more careful with memory usage and binary sizes. > > It's clear to me that the disk_cache macros is a much larger portion of the net > binary size than I think you net people like, but it's possible you know a > better way of solving that issue. This seemed to be the most obvious solution. > Downside being that the disk cache macros diverge further from the default > macros but they were already different so it's a problem in scale rather than in > principle. > > https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... > File net/disk_cache/blockfile/histogram_macros.h (right): > > https://codereview.chromium.org/196383016/diff/1/net/disk_cache/blockfile/his... > net/disk_cache/blockfile/histogram_macros.h:26: base::HistogramBase* counter = > base::Histogram::FactoryGet( \ > On 2014/03/17 18:03:08, rvargas wrote: > > On 2014/03/17 09:46:57, Daniel Bratell wrote: > > > On 2014/03/14 21:48:41, rvargas wrote: > > > > Does this mean that we should change UMA_HISTOGRAM_XXX to eliminate the > > static > > > > object? > > > > > > > > I mean, these macros basically generate n independent histograms (ignore > the > > > > experiment part as that's not currently used and unlikely to be used > again) > > > that > > > > could be as well named "AppCache.Foo", "HttpCache.Foo" and > > "ShaderCache.Foo". > > > > > > > > So yes, there's a lot of histograms here, but there's a lot of histograms > in > > > > general. > > > > > > I think removing static from all UVA macros could be the right thing to do > but > > > there might be one or two places where static is needed to not make the > lookup > > > become a bottleneck. > > > > > > This code was somewhat of a special case because of the use in the switch > > which > > > amplified the overhead. I basically found this by noticing that a few disk > > cache > > > functions were on the list of largest functions in Chromium (before the > patch, > > > not afterwards). > > > > How does the switch amplifies the overhead? What it enables is to generate > more > > histograms with less lines of source code, but if function xx wants to > generate > > 5 different histograms using standard base UMA macros (with statics), I would > > expect it to generate roughly the same binary code than one call of this > macro. > > Is that not the case? > > The "static" just prevented the optimizer from reusing the same code for all > paths in the code, even though the code looked the same because the value of the > histogram (hidden) variable depended on how the input the the first time that > code path was reached. > > Having five distinct blocks for the five different histograms, distinct enough > to prevent any merge of code, would probably cause the same accidental bloating. > > There is also a cost an overhead to static variables in that there has to be > machine code to check if it's initialized or not every time you reach it as well > as other things. > > If you look at the size savings, you see that ReportStats shrunk by more than > 80% so it's not just that there a fifth of the macro left, there is more to it > as well. OK, we're in the same page then.
LGTM after fixing nit. https://codereview.chromium.org/196383016/diff/60001/net/disk_cache/blockfile... File net/disk_cache/blockfile/histogram_macros_v3.h (right): https://codereview.chromium.org/196383016/diff/60001/net/disk_cache/blockfile... net/disk_cache/blockfile/histogram_macros_v3.h:98: NOTREACHED();\ Don't add a NOTREACHED here... I'm going with no histograms by default :)
On 2014/03/17 19:40:59, rvargas wrote: > LGTM after fixing nit. > > https://codereview.chromium.org/196383016/diff/60001/net/disk_cache/blockfile... > File net/disk_cache/blockfile/histogram_macros_v3.h (right): > > https://codereview.chromium.org/196383016/diff/60001/net/disk_cache/blockfile... > net/disk_cache/blockfile/histogram_macros_v3.h:98: NOTREACHED();\ > Don't add a NOTREACHED here... I'm going with no histograms by default :) Do you want the default moved to the bottom as well? So that calling it with a non-supported type will do nothing at all?
On 2014/03/17 19:57:06, Daniel Bratell wrote: > On 2014/03/17 19:40:59, rvargas wrote: > > LGTM after fixing nit. > > > > > https://codereview.chromium.org/196383016/diff/60001/net/disk_cache/blockfile... > > File net/disk_cache/blockfile/histogram_macros_v3.h (right): > > > > > https://codereview.chromium.org/196383016/diff/60001/net/disk_cache/blockfile... > > net/disk_cache/blockfile/histogram_macros_v3.h:98: NOTREACHED();\ > > Don't add a NOTREACHED here... I'm going with no histograms by default :) > > Do you want the default moved to the bottom as well? So that calling it with a > non-supported type will do nothing at all? yes, please :)
The CQ bit was checked by bratell@opera.com
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bratell@opera.com/196383016/80001
Message was sent while issue was closed.
Change committed as 257637 |