|
|
Created:
6 years, 2 months ago by jochen (gone - plz use gerrit) Modified:
5 years, 4 months ago CC:
aandrey+blink_chromium.org, blink-reviews, blink-reviews-wtf_chromium.org, Mikhail, mkwst+moarreviews_chromium.org, Tom Sepez Base URL:
svn://svn.chromium.org/blink/trunk Project:
blink Visibility:
Public. |
DescriptionAdd retry logic to partition alloc
If an allocation fails, we'll trigger an emergency GC and retry once.
We'll only try to free up memory on one thread - this means that if
multiple threads run out of memory at the same time, we're still out of
luck, but it's better than nothing.
BUG=397037
R=cevans@chromium.org
Patch Set 1 #
Total comments: 6
Messages
Total messages: 15 (1 generated)
+jl/tsepez fyi
jl@opera.com changed reviewers: + jl@opera.com
https://codereview.chromium.org/657113006/diff/1/Source/wtf/PartitionAlloc.cpp File Source/wtf/PartitionAlloc.cpp (right): https://codereview.chromium.org/657113006/diff/1/Source/wtf/PartitionAlloc.cp... Source/wtf/PartitionAlloc.cpp:723: void* ptr = partitionAllocSlowPath(root, flags | PartitionAllocRetry, size, bucket); Wouldn't it be better to retry partitionBucketAlloc() instead of partitionAllocSlowPath()? By trying partitionAllocSlowPath() you bypass the per-bucket freelist check, and partitionAllocSlowPath() begins with an assert that that freelist is empty. If memory has actually been freed, typically the freelist wouldn't be empty.
On 2014/10/22 11:24:05, Jens Widell wrote: > https://codereview.chromium.org/657113006/diff/1/Source/wtf/PartitionAlloc.cpp > File Source/wtf/PartitionAlloc.cpp (right): > > https://codereview.chromium.org/657113006/diff/1/Source/wtf/PartitionAlloc.cp... > Source/wtf/PartitionAlloc.cpp:723: void* ptr = partitionAllocSlowPath(root, > flags | PartitionAllocRetry, size, bucket); > Wouldn't it be better to retry partitionBucketAlloc() instead of > partitionAllocSlowPath()? > > By trying partitionAllocSlowPath() you bypass the per-bucket freelist check, and > partitionAllocSlowPath() begins with an assert that that freelist is empty. If > memory has actually been freed, typically the freelist wouldn't be empty. I just realized why not; sorry for the noise. At this point in partitionAllocSlowPath(), partitionSetNewActivePage() will have failed and will have set bucket->activePagesHead to null, which would make partitionBucketAlloc() crash. It also means that the freelist (which is per page in the bucket, not per bucket, as I said above) can't be checked, although some page could have an non-empty freelist. But that will be checked by partitionAllocSlowPath(). So again, sorry for the noise.
I wonder what will happen when the GC frees up memory but also causes some allocations (it shouldn't, but there's no way to guarantee that).
On 2014/10/22 11:38:45, jochen wrote: > I wonder what will happen when the GC frees up memory but also causes some > allocations (it shouldn't, but there's no way to guarantee that). Trying to allocate from the same bucket (i.e. similar size) would crash, since bucket->activePagesHead is null, unless partitionFreeSlowPath() was called first for a previously full page belonging to this bucket, and added that page to bucket->activePagesHead. Actually, partitionAllocSlowPath() will crash too when you retry, for the same reason; first in that ASSERT() and then also later in the call to partitionSetNewActivePage(). That could be fixed by checking if bucket->activePagesHead is still null, and only call partitionAllocSlowPath() if it isn't. If bucket->activePagesHead is still null, all you'd need to retry would be partitionAllocPartitionPages().
hum, I wonder whether partitionAllocBucket should do the retrying. or maybe the retry in partitionAllocSlowPath should invoke partitionAllocBucket()
On 2014/10/22 15:22:45, jochen wrote: > hum, I wonder whether partitionAllocBucket should do the retrying. or maybe the > retry in partitionAllocSlowPath should invoke partitionAllocBucket() Since partitionAllocBucket() is part of the fast path I guess we want to keep this kind of logic out of it, really. And I don't know if that would really solve the problem, either. The problem is, I think, that there's no state that the slow path can currently leave the bucket in (while failing) that both the allocation machinery and the freeing machinery is happy with. For allocation, you could probably set bucket->activePagesHead to &gSeedPage and have it behave correctly. But that's not quite acceptable to partitionFreeSlowPath() (it has an ASSERT()), since how could you be freeing something allocated from a bucket that's in its initial empty state? It's probably quite simple to adjust partitionFreeSlowPath() to accept (not assert against) bucket->activePagesHead==&gSeedPage and simply overwrite bucket->activePagesHead with the page you just freed something from in that case.
On 2014/10/22 at 15:31:40, jl wrote: > On 2014/10/22 15:22:45, jochen wrote: > > hum, I wonder whether partitionAllocBucket should do the retrying. or maybe the > > retry in partitionAllocSlowPath should invoke partitionAllocBucket() > > Since partitionAllocBucket() is part of the fast path I guess we want to keep this kind of logic out of it, really. > > And I don't know if that would really solve the problem, either. The problem is, I think, that there's no state that the slow path can currently leave the bucket in (while failing) that both the allocation machinery and the freeing machinery is happy with. > > For allocation, you could probably set bucket->activePagesHead to &gSeedPage and have it behave correctly. But that's not quite acceptable to partitionFreeSlowPath() (it has an ASSERT()), since how could you be freeing something allocated from a bucket that's in its initial empty state? > > It's probably quite simple to adjust partitionFreeSlowPath() to accept (not assert against) bucket->activePagesHead==&gSeedPage and simply overwrite bucket->activePagesHead with the page you just freed something from in that case. so what I don't understand is this. If we set the flag ReturnNull, then partitionAllocSlowPath has to at least leave the partition in a state that will make it return nullptr again if you retry, no? are you saying that if partitionAllocSlowPath() fails, the partition is unusable in any case?
On 2014/10/22 15:33:43, jochen wrote: > so what I don't understand is this. If we set the flag ReturnNull, then > partitionAllocSlowPath has to at least leave the partition in a state that will > make it return nullptr again if you retry, no? > > are you saying that if partitionAllocSlowPath() fails, the partition is unusable > in any case? Yes, apparently. Just that bucket, not the whole partition. But that's not good, either way. If you repeat an allocation with PartitionAllocReturnNull that returned null, the second attempt will crash, since bucket->activePagesHead is null. Allocation sizes above kGenericMaxBucketed (1 MB) take a different path and don't have that problem, though.
On 2014/10/22 15:41:37, Jens Widell wrote: > On 2014/10/22 15:33:43, jochen wrote: > > are you saying that if partitionAllocSlowPath() fails, the partition is > unusable > > in any case? > > Yes, apparently. Just that bucket, not the whole partition. But that's not good, > either way. Here's a test case that confirms that this bug exists: https://codereview.chromium.org/645223007
Generally, looking nice and simple. I agree with Jens' comment that we can use partitionBucketAlloc() to do a re-try that is more likely to succeed. I'm a bit worried by the re-entreancy that we expect to occur from within the callback, though. I think we have an unsolved double-lock problem and it's gnarly. I also worry in general about adding re-entrancy but maybe that's just me. Finally, we'll definitely want to add a test of all the different interesting situations if we move forward. https://codereview.chromium.org/657113006/diff/1/Source/wtf/PartitionAlloc.cpp File Source/wtf/PartitionAlloc.cpp (right): https://codereview.chromium.org/657113006/diff/1/Source/wtf/PartitionAlloc.cp... Source/wtf/PartitionAlloc.cpp:324: s_partitionFreeUpMemoryCallback(); Ah, so s_partitionFreeUpMemoryCallback() is going to by definition re-enter the allocator, for example partitionFreeGeneric(). Currently, this won't work -- partitionFreeGeneric() will attempt to take the lock, which will deadlock because it is already held. We're using a plain old spinlock and not a recursive lock. We could drop and then re-acquire the lock but that does make me a little nervous. It may also be tricky, actually: at this level we don't know what type of partition we're dealing with (generic, which has locks vs. sized which is lockless). I bet the code paths within the callback will be a little complicated. So I would not be surprised if something tried to allocate memory. Aside from the deadlock concern above, this would now raise the spectre of re-entering partitionFreeUpMemory(). If we do proceed with this patch, we might want to define what our exact re-entrancy expectations are and add some ASSERT()s to make sure the callback implementation is not being naughty. https://codereview.chromium.org/657113006/diff/1/Source/wtf/PartitionAlloc.cp... Source/wtf/PartitionAlloc.cpp:722: partitionFreeUpMemory(); Nit: I wonder if we want a comment here to say that this is tail re-entrancy and to be careful not to rely on previously calculated state after this call? https://codereview.chromium.org/657113006/diff/1/Source/wtf/PartitionAlloc.cp... Source/wtf/PartitionAlloc.cpp:723: void* ptr = partitionAllocSlowPath(root, flags | PartitionAllocRetry, size, bucket); On 2014/10/22 11:24:05, Jens Widell wrote: > Wouldn't it be better to retry partitionBucketAlloc() instead of > partitionAllocSlowPath()? > > By trying partitionAllocSlowPath() you bypass the per-bucket freelist check, and > partitionAllocSlowPath() begins with an assert that that freelist is empty. If > memory has actually been freed, typically the freelist wouldn't be empty. I agree that partitionBucketAlloc() is the correct API to call. https://codereview.chromium.org/657113006/diff/1/Source/wtf/PartitionAlloc.h File Source/wtf/PartitionAlloc.h (right): https://codereview.chromium.org/657113006/diff/1/Source/wtf/PartitionAlloc.h#... Source/wtf/PartitionAlloc.h:202: Nit: looks like a double newline here?
https://codereview.chromium.org/657113006/diff/1/Source/wtf/PartitionAlloc.cpp File Source/wtf/PartitionAlloc.cpp (right): https://codereview.chromium.org/657113006/diff/1/Source/wtf/PartitionAlloc.cp... Source/wtf/PartitionAlloc.cpp:320: { One more thing. Can we add here a // TODO comment that says we should also take the opportunity to flush the ring list of unused but still committed pages?
On 2014/11/06 21:45:34, Chris Evans wrote: > https://codereview.chromium.org/657113006/diff/1/Source/wtf/PartitionAlloc.cpp > File Source/wtf/PartitionAlloc.cpp (right): > > https://codereview.chromium.org/657113006/diff/1/Source/wtf/PartitionAlloc.cp... > Source/wtf/PartitionAlloc.cpp:320: { > One more thing. Can we add here a // TODO comment that says we should also take > the opportunity to flush the ring list of unused but still committed pages? Hmm, the more I think about this, the more nervous I'm becoming. What if we were using standard system malloc() for all allocations? We would simply have to put up with a NULL result being returned to the caller, or a std::bad_alloc for operator new. There's a similar story with tcmalloc, where I don't think there's a callback + retry facility? You'll just get a crash. So I worry that we're adding re-entrancy-based complexity just because we can in this case, because it's easy to modify an in-Blink-tree allocator. I also worry that adding this complexity won't solve the problem fully. If the system is low on memory, there's every chance that a tcmalloc-backed allocation will be the one that pushes things over the edge, e.g. perhaps from an image decode or IPC message deserialize etc. |