|
|
Created:
8 years, 9 months ago by Dai Mikurube (NOT FULLTIME) Modified:
8 years, 9 months ago CC:
chromium-reviews, willchan no longer on Chromium Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionTry fixing virtual memory regression by new tcmalloc.
It uses the old values before tcmalloc update in r126412.
BUG=118329
TEST=run perf tests.
Patch Set 1 #
Total comments: 6
Patch Set 2 : modified. #
Total comments: 3
Messages
Total messages: 22 (0 generated)
This change is trying to fix http://crbug.com/118329.
http://codereview.chromium.org/9699084/diff/1/third_party/tcmalloc/chromium/s... File third_party/tcmalloc/chromium/src/common.h (right): http://codereview.chromium.org/9699084/diff/1/third_party/tcmalloc/chromium/s... third_party/tcmalloc/chromium/src/common.h:84: static const size_t kNumClasses = 86 - kSkippedClasses; I have the recollection we used to have fewer classes as well (some where close to 60 or 64). How does this all work out?? I think the scaling factor between classes used to be 1.125 (1 + 1/8th). Does this change when we have additional classes? http://codereview.chromium.org/9699084/diff/1/third_party/tcmalloc/chromium/s... third_party/tcmalloc/chromium/src/common.h:89: // Chromium tunes kMaxSize: 256 * 1024 -> 8u * kPageSize (256k -> 32k) Please use prose, rather than the symbolic "->" to try to mean "changed to" or something. For instance, you might say: Original TCMalloc code used kMaxSize == 256 * 1024. In Chrome, we changed this to 32K, and represent it in terms of page size (as was done in prior versions of TCMalloc). http://codereview.chromium.org/9699084/diff/1/third_party/tcmalloc/chromium/s... third_party/tcmalloc/chromium/src/common.h:183: (((1 << kPageShift) * 8u + 127 + (120 << 7)) >> 7) + 1; I don't think you need to change this line. I think it is derived from the constant. In fact, in this case, kPageShift == 12 so (line 88) PageSize == 1 << kPageShift == 4K Hence this change (highlighted) is indeed 32K, and the old code: kMaxSize == 8u * kPageSize would also have been 32K. Why did you decide this was a better way of stating this, than the current TCMalloc code?
Thanks, Jim. Updated the patch. http://codereview.chromium.org/9699084/diff/1/third_party/tcmalloc/chromium/s... File third_party/tcmalloc/chromium/src/common.h (right): http://codereview.chromium.org/9699084/diff/1/third_party/tcmalloc/chromium/s... third_party/tcmalloc/chromium/src/common.h:84: static const size_t kNumClasses = 86 - kSkippedClasses; On 2012/03/15 19:36:49, jar wrote: > I have the recollection we used to have fewer classes as well (some where close > to 60 or 64). How does this all work out?? I think the scaling factor between > classes used to be 1.125 (1 + 1/8th). Does this change when we have additional > classes? It fails at an assertion in common.cc... I sent a try with a new value 54. http://codereview.chromium.org/9699084/diff/1/third_party/tcmalloc/chromium/s... third_party/tcmalloc/chromium/src/common.h:89: // Chromium tunes kMaxSize: 256 * 1024 -> 8u * kPageSize (256k -> 32k) On 2012/03/15 19:36:49, jar wrote: > Please use prose, rather than the symbolic "->" to try to mean "changed to" or > something. For instance, you might say: > > Original TCMalloc code used kMaxSize == 256 * 1024. In Chrome, we changed this > to 32K, and represent it in terms of page size (as was done in prior versions of > TCMalloc). Done. http://codereview.chromium.org/9699084/diff/1/third_party/tcmalloc/chromium/s... third_party/tcmalloc/chromium/src/common.h:183: (((1 << kPageShift) * 8u + 127 + (120 << 7)) >> 7) + 1; On 2012/03/15 19:36:49, jar wrote: > I don't think you need to change this line. I think it is derived from the > constant. In fact, in this case, > kPageShift == 12 > so (line 88) PageSize == 1 << kPageShift == 4K > Hence this change (highlighted) is indeed 32K, > and the old code: > kMaxSize == 8u * kPageSize > would also have been 32K. > > Why did you decide this was a better way of stating this, than the current > TCMalloc code? I thought keeping the original code implies the original authors' thoughts, if we use the old code. But, using kMaxSize sounds better here.
http://codereview.chromium.org/9699084/diff/6001/third_party/tcmalloc/chromiu... File third_party/tcmalloc/chromium/src/common.h (right): http://codereview.chromium.org/9699084/diff/6001/third_party/tcmalloc/chromiu... third_party/tcmalloc/chromium/src/common.h:85: static const size_t kNumClasses = 54 - kSkippedClasses; We need a comment for this, too, if it works.
http://codereview.chromium.org/9699084/diff/6001/third_party/tcmalloc/chromiu... File third_party/tcmalloc/chromium/src/common.h (right): http://codereview.chromium.org/9699084/diff/6001/third_party/tcmalloc/chromiu... third_party/tcmalloc/chromium/src/common.h:85: static const size_t kNumClasses = 54 - kSkippedClasses; On 2012/03/15 20:18:18, Dai Mikurube wrote: > We need a comment for this, too, if it works. Again.... I'm surprised we're not over 60. What else changed?
http://codereview.chromium.org/9699084/diff/6001/third_party/tcmalloc/chromiu... File third_party/tcmalloc/chromium/src/common.h (right): http://codereview.chromium.org/9699084/diff/6001/third_party/tcmalloc/chromiu... third_party/tcmalloc/chromium/src/common.h:85: static const size_t kNumClasses = 54 - kSkippedClasses; On 2012/03/15 20:56:37, jar wrote: > On 2012/03/15 20:18:18, Dai Mikurube wrote: > > We need a comment for this, too, if it works. > > Again.... I'm surprised we're not over 60. > > What else changed? > At first, I tried picking the number from its error log. I'll try to find the reason...
I tried to add Log(kLog, __FILE__, __LINE__, "SizeMap::Init -- ", sc, size, alignment); in common.cc:SizeMap::Init line 119 (in the loop finding size classes). http://src.chromium.org/viewvc/chrome/trunk/src/third_party/tcmalloc/chromium... Actually, it has 72-73 lines. But sometimes "continue"d in line common.cc:line 146. ... SizeMap::Init -- 1 16 8 ... SizeMap::Init -- 2 32 16 ... SizeMap::Init -- 3 48 16 ... SizeMap::Init -- 4 64 16 ... SizeMap::Init -- 5 80 16 ... SizeMap::Init -- 6 96 16 ... SizeMap::Init -- 7 112 16 ... SizeMap::Init -- 8 128 16 ... SizeMap::Init -- 9 144 16 ... SizeMap::Init -- 10 160 16 ... SizeMap::Init -- 11 176 16 ... SizeMap::Init -- 12 192 16 ... SizeMap::Init -- 13 208 16 ... SizeMap::Init -- 14 224 16 ... SizeMap::Init -- 15 240 16 ... SizeMap::Init -- 16 256 16 ... SizeMap::Init -- 17 288 32 ... SizeMap::Init -- 18 320 32 ... SizeMap::Init -- 19 352 32 ... SizeMap::Init -- 20 384 32 ... SizeMap::Init -- 21 416 32 ... SizeMap::Init -- 22 448 32 ... SizeMap::Init -- 22 480 32 ... SizeMap::Init -- 23 512 32 ... SizeMap::Init -- 23 576 64 ... SizeMap::Init -- 24 640 64 ... SizeMap::Init -- 25 704 64 ... SizeMap::Init -- 26 768 64 ... SizeMap::Init -- 27 832 64 ... SizeMap::Init -- 28 896 64 ... SizeMap::Init -- 28 960 64 ... SizeMap::Init -- 29 1024 64 ... SizeMap::Init -- 29 1152 128 ... SizeMap::Init -- 30 1280 128 ... SizeMap::Init -- 31 1408 128 ... SizeMap::Init -- 32 1536 128 ... SizeMap::Init -- 32 1664 128 ... SizeMap::Init -- 33 1792 128 ... SizeMap::Init -- 33 1920 128 ... SizeMap::Init -- 34 2048 128 ... SizeMap::Init -- 34 2304 256 ... SizeMap::Init -- 35 2560 256 ... SizeMap::Init -- 36 2816 256 ... SizeMap::Init -- 37 3072 256 ... SizeMap::Init -- 38 3328 256 ... SizeMap::Init -- 39 3584 256 ... SizeMap::Init -- 40 3840 256 ... SizeMap::Init -- 40 4096 256 ... SizeMap::Init -- 40 4608 512 ... SizeMap::Init -- 41 5120 512 ... SizeMap::Init -- 42 5632 512 ... SizeMap::Init -- 43 6144 512 ... SizeMap::Init -- 43 6656 512 ... SizeMap::Init -- 44 7168 512 ... SizeMap::Init -- 45 7680 512 ... SizeMap::Init -- 45 8192 512 ... SizeMap::Init -- 45 9216 1024 ... SizeMap::Init -- 46 10240 1024 ... SizeMap::Init -- 46 11264 1024 ... SizeMap::Init -- 47 12288 1024 ... SizeMap::Init -- 47 13312 1024 ... SizeMap::Init -- 48 14336 1024 ... SizeMap::Init -- 49 15360 1024 ... SizeMap::Init -- 49 16384 1024 ... SizeMap::Init -- 49 18432 2048 ... SizeMap::Init -- 50 20480 2048 ... SizeMap::Init -- 50 22528 2048 ... SizeMap::Init -- 51 24576 2048 ... SizeMap::Init -- 51 26624 2048 ... SizeMap::Init -- 52 28672 2048 ... SizeMap::Init -- 52 30720 2048 ... SizeMap::Init -- 53 32768 2048
Code changes around line 125 might affect it. http://codereview.chromium.org/9584046/diff/10001/third_party/tcmalloc/chromi...
I believe that code impacts how many pages we use for an allocation class, and ensures (now) that when we do a carving, we get at least enough to be able to immediately service a call to load a thread cache from the central cache. I don't think it will impact the number of distinct classes. Could you suggest how it could impact class count? It is interesting to note that this new code, in doing this work on line 125, may *require* that we get a larger contiguous set of pages in one swoop for a given class. Since this code will presumably only be an issue for the larger classes, this *might* require rather large contiguous blocks be available. This in turn *could* cause OOM when we have fragmentation, or at least cause it sooner than the previous version. Note that I *think* OOM scenarios are less common for a server, that is doing the same thing again and again and again and again. In contrast, a browser has a lot more variation... potentially leading to fragmentation. We have to keep this thought in mind if we start seeing higher OOM rates from the field. Jim On Thu, Mar 15, 2012 at 2:49 PM, <dmikurube@chromium.org> wrote: > Code changes around line 125 might affect it. > http://codereview.chromium.**org/9584046/diff/10001/third_** > party/tcmalloc/chromium/src/**common.cc<http://codereview.chromium.org/9584046/diff/10001/third_party/tcmalloc/chromium/src/common.cc> > > http://codereview.chromium.**org/9699084/<http://codereview.chromium.org/9699... >
Also notice, when I do an "about:tcmalloc" the page displays stats for 60 classes (at least in the version of chrome I'm running... which is I think a dev build). Jim On Thu, Mar 15, 2012 at 3:33 PM, Jim Roskind <jar@chromium.org> wrote: > I believe that code impacts how many pages we use for an allocation class, > and ensures (now) that when we do a carving, we get at least enough to be > able to immediately service a call to load a thread cache from the central > cache. > > I don't think it will impact the number of distinct classes. Could you > suggest how it could impact class count? > > It is interesting to note that this new code, in doing this work on line > 125, may *require* that we get a larger contiguous set of pages in one > swoop for a given class. Since this code will presumably only be an issue > for the larger classes, this *might* require rather large contiguous blocks > be available. This in turn *could* cause OOM when we have fragmentation, or > at least cause it sooner than the previous version. > > Note that I *think* OOM scenarios are less common for a server, that is > doing the same thing again and again and again and again. In contrast, > a browser has a lot more variation... potentially leading to fragmentation. > > We have to keep this thought in mind if we start seeing higher OOM rates > from the field. > > Jim > > > > On Thu, Mar 15, 2012 at 2:49 PM, <dmikurube@chromium.org> wrote: > >> Code changes around line 125 might affect it. >> http://codereview.chromium.**org/9584046/diff/10001/third_** >> party/tcmalloc/chromium/src/**common.cc<http://codereview.chromium.org/9584046/diff/10001/third_party/tcmalloc/chromium/src/common.cc> >> >> http://codereview.chromium.**org/9699084/<http://codereview.chromium.org/9699... >> > >
Though I don't still understand the behavior... (I'm trying): But, I think another idea could be using the entire old common.{cc|h} in Chromium. I made sure it works locally. With the old common.{cc|h}, I observed it has 60 classes.
Found that changes in AlignmentForSize(common.cc) are the key of # of classes. Experimentally, with new AlignmentForSize, # of classes is 53 while 60 with old AlignmentForSize. Looking at the code, it changes |alignment| for every iteration. It'll change the loop.
Interesting.... the alignment for size might be the difference. I assume the reason they did this was to get better cache locality. I *think* that 128 byte blocks are regularly loaded from memory into cache. Getting this header alignment *might*(?) mean that for many objects, the head gets loaded with a single cache load. I can't see the reason for a bloat difference... but here are at least some thoughts and notes on what they are (now) doing: The cost of this insisting on alignment of 128 bytes is wasted space between allocation units (carved out of a span), when the class size is not a multiple of 128. To avoid that complete wastage, they might have rounded sizes up to the next multiple of 128 (when they chase that alignment.. which probably happens for blocks over 1024 bytes). Forcing this sizing at 1024 means that we start wasting as much as 128 bytes in 1024 allocated, or about 1/8 overhead. Looking at the current about:tcmalloc I see that class 29 is 832 bytes, while class 30 is the full 1024 bytes. I think the "intermediate" would-be class got absorbed into the 1024 byte class size (there are some rules about using a near-by size it if works out better). IMO, they made a mistake(?) when they left such a big gap between 832 and 1024 bytes as viable class sizes. It violates their target rule of "never wasting more than 1/8th" of the allocated space. If a user asks for 833 bytes, they'll get 1024, wasting 191 bytes (far more than 128 bytes, for 1/18th!). looking at our TCMalloc allocation (in my browser), Jim On Thu, Mar 15, 2012 at 5:36 PM, <dmikurube@chromium.org> wrote: > Found that changes in AlignmentForSize(common.cc) are the key of # of > classes. > > Experimentally, with new AlignmentForSize, # of classes is 53 while 60 > with old > AlignmentForSize. > > Looking at the code, it changes |alignment| for every iteration. It'll > change > the loop. > > http://codereview.chromium.**org/9699084/<http://codereview.chromium.org/9699... >
Link to the current common.cc: http://codereview.chromium.org/9584046/diff/10001/third_party/tcmalloc/chromi... Well, we have 3 options to try. 1) Keep the latest and use 53 (54 - kSkippedClasses) as the value. 2) Get AlignmentForSize() back and use 60 (61 - kSkippedClasses). 3) Get the entire common.{cc|h} back and use 60 (61 - kSkippedClasses). I think (1) is good since we know the reason now and it has less difference from gperftools. In case of low performance or bloat, (3) would be the next choice. What do you think, Jim?
Wow, sorry, we crossed in the post. ;)
The more I re-read the perf paper, the less convinced the optimizations are tuned for us, especially since we are mostly Windows... and only currently ship a 32 bit binary there. I see very similar class sizes in both... so that is probably not the difference. The page size may play a role in allocating larger blocks (from spans), and that may lead to fragmentation (or failure to return a span when it is deallocated). I'm quite confused about the "class count" variable, and probably need to read the code to understand better. Jim On Thu, Mar 15, 2012 at 6:22 PM, <dmikurube@chromium.org> wrote: > Link to the current common.cc: > http://codereview.chromium.**org/9584046/diff/10001/third_** > party/tcmalloc/chromium/src/**common.cc<http://codereview.chromium.org/9584046/diff/10001/third_party/tcmalloc/chromium/src/common.cc> > > Well, we have 3 options to try. > 1) Keep the latest and use 53 (54 - kSkippedClasses) as the value. > 2) Get AlignmentForSize() back and use 60 (61 - kSkippedClasses). > 3) Get the entire common.{cc|h} back and use 60 (61 - kSkippedClasses). > > I think (1) is good since we know the reason now and it has less > difference from > gperftools. In case of low performance or bloat, (3) would be the next > choice. > > What do you think, Jim? > > http://codereview.chromium.**org/9699084/<http://codereview.chromium.org/9699... >
The class sizes are very similar.. so my comment about the 1024 size class is probably wrong... and I now think we always had the 128 byte alignment code (somewhere), since the class sizes in the new vs old list are pretty darn similar (identical in all but one class I think). Jim On Thu, Mar 15, 2012 at 6:18 PM, Jim Roskind <jar@chromium.org> wrote: > Interesting.... the alignment for size might be the difference. I assume > the reason they did this was to get better cache locality. I *think* that > 128 byte blocks are regularly loaded from memory into cache. Getting this > header alignment *might*(?) mean that for many objects, the head gets > loaded with a single cache load. > > I can't see the reason for a bloat difference... but here are at least > some thoughts and notes on what they are (now) doing: > > The cost of this insisting on alignment of 128 bytes is wasted space > between allocation units (carved out of a span), when the class size is not > a multiple of 128. To avoid that complete wastage, they might have rounded > sizes up to the next multiple of 128 (when they chase that alignment.. > which probably happens for blocks over 1024 bytes). Forcing this sizing at > 1024 means that we start wasting as much as 128 bytes in 1024 allocated, or > about 1/8 overhead. Looking at the current about:tcmalloc I see that class > 29 is 832 bytes, while class 30 is the full 1024 bytes. I think the > "intermediate" would-be class got absorbed into the 1024 byte class size > (there are some rules about using a near-by size it if works out better). > > IMO, they made a mistake(?) when they left such a big gap between 832 and > 1024 bytes as viable class sizes. It violates their target rule of "never > wasting more than 1/8th" of the allocated space. If a user asks for 833 > bytes, they'll get 1024, wasting 191 bytes (far more than 128 bytes, for > 1/18th!). looking at our TCMalloc allocation (in my browser), > > Jim > > > > On Thu, Mar 15, 2012 at 5:36 PM, <dmikurube@chromium.org> wrote: > >> Found that changes in AlignmentForSize(common.cc) are the key of # of >> classes. >> >> Experimentally, with new AlignmentForSize, # of classes is 53 while 60 >> with old >> AlignmentForSize. >> >> Looking at the code, it changes |alignment| for every iteration. It'll >> change >> the loop. >> >> http://codereview.chromium.**org/9699084/<http://codereview.chromium.org/9699... >> > >
I'm reading the code, but I created CLs for each fixing options in parallel. 1) Keep the latest and use 53 (54 - kSkippedClasses) as the value. http://codereview.chromium.org/9722001/ 2) Get AlignmentForSize() back and use 60 (61 - kSkippedClasses). http://codereview.chromium.org/9717009/ 3) Get the entire common.{cc|h} back and use 60 (61 - kSkippedClasses). http://codereview.chromium.org/9717007/
Though we may have to investigate its behavior more, I found that the choice (1) fixes this memory bloat by an experimental commit. See : > http://build.chromium.org/f/chromium/perf/xp-release-dual-core/moz/report.htm... We can find that r126412 made memory consumption larger, and it got recovered in r127301. r127301 is http://codereview.chromium.org/9722001/. I'll once revert it (since it's not reviewed well) later, and commit it again officially. On 2012/03/16 21:23:30, Dai Mikurube wrote: > I'm reading the code, but I created CLs for each fixing options in parallel. > > 1) Keep the latest and use 53 (54 - kSkippedClasses) as the value. > http://codereview.chromium.org/9722001/ > > 2) Get AlignmentForSize() back and use 60 (61 - kSkippedClasses). > http://codereview.chromium.org/9717009/ > > 3) Get the entire common.{cc|h} back and use 60 (61 - kSkippedClasses). > http://codereview.chromium.org/9717007/
r127301 is reverted in r127379. On 2012/03/17 04:14:39, Dai Mikurube wrote: > Though we may have to investigate its behavior more, I found that the choice (1) > fixes this memory bloat by an experimental commit. > > See : > > > http://build.chromium.org/f/chromium/perf/xp-release-dual-core/moz/report.htm... > > We can find that r126412 made memory consumption larger, and it got recovered in > r127301. r127301 is http://codereview.chromium.org/9722001/. I'll once revert > it (since it's not reviewed well) later, and commit it again officially. > > On 2012/03/16 21:23:30, Dai Mikurube wrote: > > I'm reading the code, but I created CLs for each fixing options in parallel. > > > > 1) Keep the latest and use 53 (54 - kSkippedClasses) as the value. > > http://codereview.chromium.org/9722001/ > > > > 2) Get AlignmentForSize() back and use 60 (61 - kSkippedClasses). > > http://codereview.chromium.org/9717009/ > > > > 3) Get the entire common.{cc|h} back and use 60 (61 - kSkippedClasses). > > http://codereview.chromium.org/9717007/
Finally, (1) was committed as a TBR change http://codereview.chromium.org/9722025/. I wonder if you could take a look at it. Details are there.
We chose the option (1) : http://codereview.chromium.org/9722025/. Closing this change. |