|
|
DescriptionmacOS: Shim all malloc zones.
All information about malloc zones [and their original functions] is stored in
MallocZoneAggregator. When the allocator shim is ready to dispatch back to the
original system implementations, it calls the MallocZoneAggregator with
|context|, which provides enough information to dispatch to the correct malloc
zone.
BUG=693237
Review-Url: https://codereview.chromium.org/2703803004
Cr-Commit-Position: refs/heads/master@{#452784}
Committed: https://chromium.googlesource.com/chromium/src/+/7a54483e22a7e4578870d144b7d4a71380e63a10
Patch Set 1 #Patch Set 2 : clean up #Patch Set 3 : Add base export. #Patch Set 4 : Change location of BASE_EXPORT. #Patch Set 5 : Add a memory barrier. #
Total comments: 24
Patch Set 6 : Comments from mark. #Patch Set 7 : Minor formatting. #
Total comments: 9
Patch Set 8 : Comments from primiano. #Patch Set 9 : Style. #Patch Set 10 : Minor cleanup. #Patch Set 11 : Add base_export #Patch Set 12 : more base export. #
Total comments: 32
Patch Set 13 : Comments from primiano. #Patch Set 14 : Nits. #
Total comments: 10
Patch Set 15 : Comments from primiano, mark. #Patch Set 16 : Rebase. #Dependent Patchsets: Messages
Total messages: 61 (37 generated)
The CQ bit was checked by erikchen@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: mac_chromium_compile_dbg_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...)
The CQ bit was checked by erikchen@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: mac_chromium_compile_dbg_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...)
The CQ bit was checked by erikchen@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
erikchen@chromium.org changed reviewers: + primiano@chromium.org
primiano: Please review.
The CQ bit was checked by erikchen@chromium.org to run a CQ dry run
erikchen@chromium.org changed reviewers: + mark@chromium.org
mark: Please review.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2703803004/diff/80001/base/allocator/allocato... File base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc (right): https://codereview.chromium.org/2703803004/diff/80001/base/allocator/allocato... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:17: // zones. Avoid using a LeakyInstance since performance is very important, and Did you mean LazyInstance (or perhaps a leaky LazyInstance)? https://codereview.chromium.org/2703803004/diff/80001/base/allocator/allocato... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:18: // we want to avoid an extra pointer dereference. Hmm. This isn’t wonderful. We don’t want to have an initializer or finalizer run either. https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... File base/allocator/malloc_zone_aggregator_mac.cc (right): https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.cc:30: // Not all custom malloc zones have a memalign. Is this relevant to us anymore, or is this some old-OS stuff that we no longer need to worry about? https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.cc:42: memset(zones_, 0, sizeof(MallocZoneFunctions) * kMaxZoneCount); This one can be sizeof(zones_) or sizeof(MallocZoneFunctions) * arraysize(zones_). https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.cc:47: for (int i = 0; i < kMaxZoneCount; ++i) { All of the rest can be arraysize(zones_). https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.cc:60: for (int i = 0; i < kMaxZoneCount; ++i) { You’re concerned with performance and avoiding one level of pointer indirection, yet you’ve got a loop to find a matching zone? I don’t buy that. The extra dereference you’re trying hard to avoid probably won’t amount to anything measurable. By the way, leaky LazyInstances are so three weeks ago. Now you can just write this: // static MallocZoneAllocator* MallocZoneAllocator::GetMallocZoneAllocator() { static MallocZoneAllocator* malloc_zone_allocator = new MallocZoneAllocator(); // which can now be private return malloc_zone_allocator; } Some people prefer a reference-returning variant because it affirmatively states that you’ll get a non-nullptr object. I don‘t have a preference either way. https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.cc:80: void MallocZoneAggregator::DispatchFreeToZone(void* zone, void* ptr) { By the time I got here, I thought that you should have templated this. https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.cc:131: void MallocZoneAggregator::DispatchBatchFreeToZone(void* zone, Blank line between functions. https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... File base/allocator/malloc_zone_aggregator_mac.h (right): https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.h:18: using MallocZonePointer = struct _malloc_zone_t*; Unused https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.h:55: void* context = nullptr; Is this a _malloc_zone_t* or a ChromeMallocZone*? https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.h:85: void* DispatchMallocToZone(void* zone, size_t size); and are these zone parameters that type too? https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.h:105: static const int kMaxZoneCount = 30; Can be constexpr.
Patchset #6 (id:100001) has been deleted
https://codereview.chromium.org/2703803004/diff/80001/base/allocator/allocato... File base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc (right): https://codereview.chromium.org/2703803004/diff/80001/base/allocator/allocato... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:17: // zones. Avoid using a LeakyInstance since performance is very important, and On 2017/02/21 22:52:50, Mark Mentovai wrote: > Did you mean LazyInstance (or perhaps a leaky LazyInstance)? created a static member instead. https://codereview.chromium.org/2703803004/diff/80001/base/allocator/allocato... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:18: // we want to avoid an extra pointer dereference. On 2017/02/21 22:52:50, Mark Mentovai wrote: > Hmm. This isn’t wonderful. We don’t want to have an initializer or finalizer run > either. created a static member instead. https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... File base/allocator/malloc_zone_aggregator_mac.cc (right): https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.cc:30: // Not all custom malloc zones have a memalign. On 2017/02/21 22:52:51, Mark Mentovai wrote: > Is this relevant to us anymore, or is this some old-OS stuff that we no longer > need to worry about? Happens on 10.11. https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.cc:42: memset(zones_, 0, sizeof(MallocZoneFunctions) * kMaxZoneCount); On 2017/02/21 22:52:51, Mark Mentovai wrote: > This one can be sizeof(zones_) or sizeof(MallocZoneFunctions) * > arraysize(zones_). Done. https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.cc:47: for (int i = 0; i < kMaxZoneCount; ++i) { On 2017/02/21 22:52:51, Mark Mentovai wrote: > All of the rest can be arraysize(zones_). Done. https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.cc:60: for (int i = 0; i < kMaxZoneCount; ++i) { On 2017/02/21 22:52:51, Mark Mentovai wrote: > You’re concerned with performance and avoiding one level of pointer indirection, > yet you’ve got a loop to find a matching zone? I don’t buy that. The extra > dereference you’re trying hard to avoid probably won’t amount to anything > measurable. > > By the way, leaky LazyInstances are so three weeks ago. Now you can just write > this: > > // static > MallocZoneAllocator* MallocZoneAllocator::GetMallocZoneAllocator() { > static MallocZoneAllocator* malloc_zone_allocator = new MallocZoneAllocator(); > // which can now be private > return malloc_zone_allocator; > } > > Some people prefer a reference-returning variant because it affirmatively states > that you’ll get a non-nullptr object. I don‘t have a preference either way. premature optimization is the *mumble mumble mumble*. I've pulled out GetFunctionsForZone. https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.cc:80: void MallocZoneAggregator::DispatchFreeToZone(void* zone, void* ptr) { On 2017/02/21 22:52:50, Mark Mentovai wrote: > By the time I got here, I thought that you should have templated this. After factoring out GetFunctionsForZone(), I no longer think templates will help improve clarity. https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.cc:131: void MallocZoneAggregator::DispatchBatchFreeToZone(void* zone, On 2017/02/21 22:52:51, Mark Mentovai wrote: > Blank line between functions. Done. https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... File base/allocator/malloc_zone_aggregator_mac.h (right): https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.h:18: using MallocZonePointer = struct _malloc_zone_t*; On 2017/02/21 22:52:51, Mark Mentovai wrote: > Unused removed https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.h:55: void* context = nullptr; On 2017/02/21 22:52:51, Mark Mentovai wrote: > Is this a _malloc_zone_t* or a ChromeMallocZone*? ChromeMallocZone*. https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.h:85: void* DispatchMallocToZone(void* zone, size_t size); On 2017/02/21 22:52:51, Mark Mentovai wrote: > and are these zone parameters that type too? No, we currently pass void* through the allocator shim. I originally did this because I thought that windows might need some type of context as well [for shimming HeapAlloc]. Plus malloc_zone_t isn't defined on other platforms [obviously we could forward declare it, or add ifdef onion soup everywhere], but using void* just seemed cleaner. https://codereview.chromium.org/2703803004/diff/80001/base/allocator/malloc_z... base/allocator/malloc_zone_aggregator_mac.h:105: static const int kMaxZoneCount = 30; On 2017/02/21 22:52:51, Mark Mentovai wrote: > Can be constexpr. Done.
Did a first pass, I think I am not fully grasping the intended design. My top-level comments so far are: 1) That MallocZoneFunctions seems to be a copy of ChromeMallocZone. Why not using just that one? 2) Not sure I fully buy the extra layer of malloc_zone_aggregator_mac (but very likely it's because I'm missing something still). Right now it seems to introduce just another hopping layer. Why instead don't you do that job in allocator_shim_default_dispatch_to_mac_zoned_malloc.cc ? With my current understanding of the situation, this is what I would have done here: - replace that g_default_zone in ...to_mac_zoned_malloc.cc with an array (g_default_zones[kMax]) - put the lookup logic that you have right now in malloc_zone_aggregator inside each of the various MallocImpl() in ...to_mac_zoned_malloc.cc (I don't see any other interesting functionality in the aggregator which couldn't be directly baked in to_mac_zoned_malloc.cc) That would: - reduce the number of files involved in this magic - reduce the number of layers and calls. in the current state it's going to be quite hard to get rid of the actual call between MallocImpl() -> MallocZoneAggregator::GetMallocZoneAggregator()->DispatchMallocToZone unless there is some link-time-optimization on OSX I am not aware of. - reduce the size of this CL, which always makes everybody happy :) Happy to talk more over IM/VC https://codereview.chromium.org/2703803004/diff/140001/base/allocator/allocat... File base/allocator/allocator_interception_mac.mm (right): https://codereview.chromium.org/2703803004/diff/140001/base/allocator/allocat... base/allocator/allocator_interception_mac.mm:31: #include "base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.h" What is this for? We shouldn't have cyclic logical dependencies (shim depending on interception_mac and viceversa). Honestly can't figure out why we have even alloctator_shim.h above? Would be great if this this file remains a lower level building block for the shim and the dependency is only in the shim -> interception_mac direction https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... File base/allocator/malloc_zone_aggregator_mac.cc (right): https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... base/allocator/malloc_zone_aggregator_mac.cc:49: memset(zones_, 0, sizeof(zones_)); can you add a static_assert(is_pod(zones_[0])) ? https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... base/allocator/malloc_zone_aggregator_mac.cc:162: __builtin_unreachable(); this seems redundant, CHECK(false) will already expand to __builtin_unreachable. Also using __builtin_unreachable seems unprecedented in the codebase % logging.h where we use it in CHECK. https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... File base/allocator/malloc_zone_aggregator_mac.h (right): https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... base/allocator/malloc_zone_aggregator_mac.h:42: struct MallocZoneFunctions { TIL about the existence of ChromeMallocZone. Now my question is: what is the rationale of the extra layer of MallocZoneFunctions? It seems to be a copy of ChromeMallocZone. can't we use just that?
> 1) That MallocZoneFunctions seems to be a copy of ChromeMallocZone. Why not > using just that one? MallocZoneFunctions has a void* context, which ChromeMallocZone does not. ChromeMallocZone has a bunch of otherwise unused fields. > 2) Not sure I fully buy the extra layer of malloc_zone_aggregator_mac (but very > likely it's because I'm missing something still). > > Right now it seems to introduce just another hopping layer. Why instead don't > you do that job in allocator_shim_default_dispatch_to_mac_zoned_malloc.cc ? > With my current understanding of the situation, this is what I would have done > here: > > - replace that g_default_zone in ...to_mac_zoned_malloc.cc with an array > (g_default_zones[kMax]) > - put the lookup logic that you have right now in malloc_zone_aggregator inside > each of the various MallocImpl() in ...to_mac_zoned_malloc.cc > (I don't see any other interesting functionality in the aggregator which > couldn't be directly baked in to_mac_zoned_malloc.cc) Separating functionality into separate files/classes is a standard paradigm. If your main concern is performance, see patch set #5, where I also avoid an extra pointer dereference without needing to combine files. I switched to the current code on mark@'s advice, on the premise that we can optimize more in the future. > > That would: > - reduce the number of files involved in this magic > - reduce the number of layers and calls. in the current state it's going to be > quite hard to get rid of the actual call between MallocImpl() -> > MallocZoneAggregator::GetMallocZoneAggregator()->DispatchMallocToZone unless > there is some link-time-optimization on OSX I am not aware of. > - reduce the size of this CL, which always makes everybody happy :) I thought, based on our previous discussions, that you wanted the default_dispatch_to_mac_zoned_malloc.cc file purely for dispatch back into default malloc zone, and not to also *store*/intercept the malloc zone, which the aggregator has to do. In the past, you've specified an explicit request to separate interception/dispatch logic. When something needs to do both, you prefer to have it be separated into its own file [allocator_interception_mac]. I'm going to talk with mark@, and assuming he's happy with what you've proposed, I'm going to go ahead and implement it, in an attempt to shave another 24-hour review cycle off of this CL. > > Happy to talk more over IM/VC > > https://codereview.chromium.org/2703803004/diff/140001/base/allocator/allocat... > File base/allocator/allocator_interception_mac.mm (right): > > https://codereview.chromium.org/2703803004/diff/140001/base/allocator/allocat... > base/allocator/allocator_interception_mac.mm:31: #include > "base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.h" > What is this for? We shouldn't have cyclic logical dependencies (shim depending > on interception_mac and viceversa). Honestly can't figure out why we have even > alloctator_shim.h above? > Would be great if this this file remains a lower level building block for the > shim and the dependency is only in the shim -> interception_mac direction > > https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... > File base/allocator/malloc_zone_aggregator_mac.cc (right): > > https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... > base/allocator/malloc_zone_aggregator_mac.cc:49: memset(zones_, 0, > sizeof(zones_)); > can you add a static_assert(is_pod(zones_[0])) ? > > https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... > base/allocator/malloc_zone_aggregator_mac.cc:162: __builtin_unreachable(); > this seems redundant, CHECK(false) will already expand to __builtin_unreachable. > Also using __builtin_unreachable seems unprecedented in the codebase % logging.h > where we use it in CHECK. > > https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... > File base/allocator/malloc_zone_aggregator_mac.h (right): > > https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... > base/allocator/malloc_zone_aggregator_mac.h:42: struct MallocZoneFunctions { > TIL about the existence of ChromeMallocZone. Now my question is: what is the > rationale of the extra layer of MallocZoneFunctions? It seems to be a copy of > ChromeMallocZone. can't we use just that?
chatted with mark@, he seems willing to accept primiano's proposals, so I'm going to go ahead and implement.
On 2017/02/22 18:47:15, erikchen wrote: > chatted with mark@, he seems willing to accept primiano's proposals, so I'm > going to go ahead and implement. Okay, I think I have a solution that will make everyone happy. I started implemented primiano's suggestion, but quickly ran into something that I think will make primiano unhappy. AFAICT, primiano's constraints are the following: 1) Avoid all unnecessary pointer dereferences/function calls. 2) Avoid cyclical dependencies. Right now, default_dispatch* and overide.* both depend on interception.*, but not vise versa. Adding a global to default_dispatch* will cause cyclical dependencies, since this global will need to be touched during post-launch interception. Instead, I keep all the logic in aggregator.*, and make it a dependency of all other files. It exposes a global which default_dispatch* can use directly. This also keeps the logic separated from default_dispatch*, which I think is cleaner.
mark, primiano: Please review. https://codereview.chromium.org/2703803004/diff/140001/base/allocator/allocat... File base/allocator/allocator_interception_mac.mm (right): https://codereview.chromium.org/2703803004/diff/140001/base/allocator/allocat... base/allocator/allocator_interception_mac.mm:31: #include "base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.h" On 2017/02/22 12:25:39, Primiano Tucci wrote: > What is this for? We shouldn't have cyclic logical dependencies (shim depending > on interception_mac and viceversa). Honestly can't figure out why we have even > alloctator_shim.h above? > Would be great if this this file remains a lower level building block for the > shim and the dependency is only in the shim -> interception_mac direction whoops, unnecessary include. https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... File base/allocator/malloc_zone_aggregator_mac.cc (right): https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... base/allocator/malloc_zone_aggregator_mac.cc:49: memset(zones_, 0, sizeof(zones_)); On 2017/02/22 12:25:39, Primiano Tucci wrote: > can you add a static_assert(is_pod(zones_[0])) ? MallocZoneFunctions has a non-default constructor, because it's "weight" exceeds 10. I think the static_assert doesn't provide additional clarity, especially since the default constructor can use non-0 values [header initialization] [compared to the alternative of suppressing the default-constructor warning]. https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... base/allocator/malloc_zone_aggregator_mac.cc:162: __builtin_unreachable(); On 2017/02/22 12:25:39, Primiano Tucci wrote: > this seems redundant, CHECK(false) will already expand to __builtin_unreachable. > Also using __builtin_unreachable seems unprecedented in the codebase % logging.h > where we use it in CHECK. Switched to IMMEDIATE_CRASH(); https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... File base/allocator/malloc_zone_aggregator_mac.h (right): https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... base/allocator/malloc_zone_aggregator_mac.h:42: struct MallocZoneFunctions { On 2017/02/22 12:25:39, Primiano Tucci wrote: > TIL about the existence of ChromeMallocZone. Now my question is: what is the > rationale of the extra layer of MallocZoneFunctions? It seems to be a copy of > ChromeMallocZone. can't we use just that? The latter doesn't have a |context| parameter.
The CQ bit was checked by erikchen@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: mac_chromium_compile_dbg_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...)
The CQ bit was checked by erikchen@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: mac_chromium_compile_dbg_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...)
The CQ bit was checked by erikchen@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
layering and naming looks such better! thanks a lot for the patience of rewrapping this. I might have developed the classic CL Stockholm Syndrome but now it feels easier to follow now. Mostly minor comments left. The bigger question left from my side is about threading (read comment inline: what happens if a zone is destroyed while you iterate on it) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... File base/allocator/allocator_interception_mac.mm (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_interception_mac.mm:351: for (unsigned int i = 0; i < count; ++i) { just checking: what happens if a zone goes away while we iterate? there is no locking here. Asking because we had some similar problem on windows and the conclusion was that the winapi don't have any way to safely enumerate the heaps (similar concept to these zones, really), as the winapi would crash if a heap is destroyed while iterating. (sorry too late to dig in crbug, but if you git blame the malloc_dump_provider under the win section I'm pretty sure you'll find the CL with all the context, if interested) I hope we're not in the same situation here :) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... File base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:18: MallocZoneFunctions& GetFunctionsForZone(void* zone) { IMHO it makes more sense for this to be an inline function in /malloc_zone_functions_mac.h and hiding (i.e. un-externing) the g_malloc_zones. This requires the awkard bit where here in this file you need to worry abound and match the acquire/release semantic (see my comment about threading in malloc_zone_functions_mac.h) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:23: IMMEDIATE_CRASH(); out of curiosity, did you use this because the compiler is not smart enough to realize that CHECK(false) is noreturn? (well actually it might not be on non-official builds) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... File base/allocator/malloc_zone_functions_mac.cc (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:47: base::Lock* GetLock() { do you really need all this locking here? Isn't easier to just make it so that all the zone replacements are done from the same thread and just add a threadchecker here? https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:55: bool IsMallocZoneAlreadyStoredLockAcquired(ChromeMallocZone* zone) { I think the naming pattern in the rest of the codebase is xxxLocked not xxxLockAcquired: cs stats: \b\w+Locked\( -f:third_party -> 723 resuts \b\w+LockAcquired\( -f:third_party -> 8 resuts https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:70: memset(g_malloc_zones, 0, sizeof(g_malloc_zones)); why you need this? global structs and pods are zero-initialized by default. maybe just add a static_assert(is_pod(g_malloc_zones)) to make sure that won't accidentally become a dynamic initializer. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:80: base::subtle::MemoryBarrier(); This memory barrier is not enough in theory and is non-necessary in practice on x86-TSO (but we don't want to rely on this because TSan will slap us). What you really want to guarantee here is that the |context| is the last field getting written. so that on the reader side either you read a null context or (non-null context AND consistent function ptrs). The way you epxress this is by making |context| a subtle::AtomicWord and by doing a ReleaseStore here (well, in StoreZoneFunctions) AND an AcquireLoad in GetFunctionsForZone. Now, I say this is non necessary on x86-TSO because, in the Total Store Order model, the reordering engine guarantees that stores and loads appear to be always linearized, and the only visible thing that can happen IIRC is that stores get reorder w.r.t loads. But you don't want to think to this micro-architecture detail and depend about all this (also because in the meantime the compiler could take other reordering licenses). Thankfully you don't need to. in fact if you look at the x86 implementation both AcquireLoad and REleaseStore are just conventional load and store operations (+ the required volatile and such to keep to compiler quiet) :) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:97: g_zeroed = false; as per the comment above I don't think you need this g_zeroed. But if you end up really needing it, I find quite hard to believe that this is = false, since you just memset-ed it to 0 two lines above :) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... File base/allocator/malloc_zone_functions_mac.h (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.h:56: void StoreZoneFunctions(ChromeMallocZone* zone, MallocZoneFunctions* functions); +const ChromeMallocZone? Mostly for readability purposes, as it becomes IMHO way clearer what gets stored into this. If you are annoyed by the 80col wrapping you can strip the arg names here and just do: void StoreZoneFunctions(const ChromeMallocZone*, MallocZoneFunctions*);
layering and naming looks such better! thanks a lot for the patience of rewrapping this. I might have developed the classic CL Stockholm Syndrome but now it feels easier to follow now. Mostly minor comments left. The bigger question left from my side is about threading (read comment inline: what happens if a zone is destroyed while you iterate on it) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... File base/allocator/allocator_interception_mac.mm (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_interception_mac.mm:351: for (unsigned int i = 0; i < count; ++i) { just checking: what happens if a zone goes away while we iterate? there is no locking here. Asking because we had some similar problem on windows and the conclusion was that the winapi don't have any way to safely enumerate the heaps (similar concept to these zones, really), as the winapi would crash if a heap is destroyed while iterating. (sorry too late to dig in crbug, but if you git blame the malloc_dump_provider under the win section I'm pretty sure you'll find the CL with all the context, if interested) I hope we're not in the same situation here :) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... File base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:18: MallocZoneFunctions& GetFunctionsForZone(void* zone) { IMHO it makes more sense for this to be an inline function in /malloc_zone_functions_mac.h and hiding (i.e. un-externing) the g_malloc_zones. This requires the awkard bit where here in this file you need to worry abound and match the acquire/release semantic (see my comment about threading in malloc_zone_functions_mac.h) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:23: IMMEDIATE_CRASH(); out of curiosity, did you use this because the compiler is not smart enough to realize that CHECK(false) is noreturn? (well actually it might not be on non-official builds) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... File base/allocator/malloc_zone_functions_mac.cc (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:47: base::Lock* GetLock() { do you really need all this locking here? Isn't easier to just make it so that all the zone replacements are done from the same thread and just add a threadchecker here? https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:55: bool IsMallocZoneAlreadyStoredLockAcquired(ChromeMallocZone* zone) { I think the naming pattern in the rest of the codebase is xxxLocked not xxxLockAcquired: cs stats: \b\w+Locked\( -f:third_party -> 723 resuts \b\w+LockAcquired\( -f:third_party -> 8 resuts https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:70: memset(g_malloc_zones, 0, sizeof(g_malloc_zones)); why you need this? global structs and pods are zero-initialized by default. maybe just add a static_assert(is_pod(g_malloc_zones)) to make sure that won't accidentally become a dynamic initializer. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:80: base::subtle::MemoryBarrier(); This memory barrier is not enough in theory and is non-necessary in practice on x86-TSO (but we don't want to rely on this because TSan will slap us). What you really want to guarantee here is that the |context| is the last field getting written. so that on the reader side either you read a null context or (non-null context AND consistent function ptrs). The way you epxress this is by making |context| a subtle::AtomicWord and by doing a ReleaseStore here (well, in StoreZoneFunctions) AND an AcquireLoad in GetFunctionsForZone. Now, I say this is non necessary on x86-TSO because, in the Total Store Order model, the reordering engine guarantees that stores and loads appear to be always linearized, and the only visible thing that can happen IIRC is that stores get reorder w.r.t loads. But you don't want to think to this micro-architecture detail and depend about all this (also because in the meantime the compiler could take other reordering licenses). Thankfully you don't need to. in fact if you look at the x86 implementation both AcquireLoad and REleaseStore are just conventional load and store operations (+ the required volatile and such to keep to compiler quiet) :) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:97: g_zeroed = false; as per the comment above I don't think you need this g_zeroed. But if you end up really needing it, I find quite hard to believe that this is = false, since you just memset-ed it to 0 two lines above :) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... File base/allocator/malloc_zone_functions_mac.h (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.h:56: void StoreZoneFunctions(ChromeMallocZone* zone, MallocZoneFunctions* functions); +const ChromeMallocZone? Mostly for readability purposes, as it becomes IMHO way clearer what gets stored into this. If you are annoyed by the 80col wrapping you can strip the arg names here and just do: void StoreZoneFunctions(const ChromeMallocZone*, MallocZoneFunctions*);
layering and naming looks such better! thanks a lot for the patience of rewrapping this. I might have developed the classic CL Stockholm Syndrome but now it feels easier to follow now. Mostly minor comments left. The bigger question left from my side is about threading (read comment inline: what happens if a zone is destroyed while you iterate on it) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... File base/allocator/allocator_interception_mac.mm (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_interception_mac.mm:351: for (unsigned int i = 0; i < count; ++i) { just checking: what happens if a zone goes away while we iterate? there is no locking here. Asking because we had some similar problem on windows and the conclusion was that the winapi don't have any way to safely enumerate the heaps (similar concept to these zones, really), as the winapi would crash if a heap is destroyed while iterating. (sorry too late to dig in crbug, but if you git blame the malloc_dump_provider under the win section I'm pretty sure you'll find the CL with all the context, if interested) I hope we're not in the same situation here :) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... File base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:18: MallocZoneFunctions& GetFunctionsForZone(void* zone) { IMHO it makes more sense for this to be an inline function in /malloc_zone_functions_mac.h and hiding (i.e. un-externing) the g_malloc_zones. This requires the awkard bit where here in this file you need to worry abound and match the acquire/release semantic (see my comment about threading in malloc_zone_functions_mac.h) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:23: IMMEDIATE_CRASH(); out of curiosity, did you use this because the compiler is not smart enough to realize that CHECK(false) is noreturn? (well actually it might not be on non-official builds) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... File base/allocator/malloc_zone_functions_mac.cc (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:47: base::Lock* GetLock() { do you really need all this locking here? Isn't easier to just make it so that all the zone replacements are done from the same thread and just add a threadchecker here? https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:55: bool IsMallocZoneAlreadyStoredLockAcquired(ChromeMallocZone* zone) { I think the naming pattern in the rest of the codebase is xxxLocked not xxxLockAcquired: cs stats: \b\w+Locked\( -f:third_party -> 723 resuts \b\w+LockAcquired\( -f:third_party -> 8 resuts https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:70: memset(g_malloc_zones, 0, sizeof(g_malloc_zones)); why you need this? global structs and pods are zero-initialized by default. maybe just add a static_assert(is_pod(g_malloc_zones)) to make sure that won't accidentally become a dynamic initializer. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:80: base::subtle::MemoryBarrier(); This memory barrier is not enough in theory and is non-necessary in practice on x86-TSO (but we don't want to rely on this because TSan will slap us). What you really want to guarantee here is that the |context| is the last field getting written. so that on the reader side either you read a null context or (non-null context AND consistent function ptrs). The way you epxress this is by making |context| a subtle::AtomicWord and by doing a ReleaseStore here (well, in StoreZoneFunctions) AND an AcquireLoad in GetFunctionsForZone. Now, I say this is non necessary on x86-TSO because, in the Total Store Order model, the reordering engine guarantees that stores and loads appear to be always linearized, and the only visible thing that can happen IIRC is that stores get reorder w.r.t loads. But you don't want to think to this micro-architecture detail and depend about all this (also because in the meantime the compiler could take other reordering licenses). Thankfully you don't need to. in fact if you look at the x86 implementation both AcquireLoad and REleaseStore are just conventional load and store operations (+ the required volatile and such to keep to compiler quiet) :) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:97: g_zeroed = false; as per the comment above I don't think you need this g_zeroed. But if you end up really needing it, I find quite hard to believe that this is = false, since you just memset-ed it to 0 two lines above :) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... File base/allocator/malloc_zone_functions_mac.h (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.h:56: void StoreZoneFunctions(ChromeMallocZone* zone, MallocZoneFunctions* functions); +const ChromeMallocZone? Mostly for readability purposes, as it becomes IMHO way clearer what gets stored into this. If you are annoyed by the 80col wrapping you can strip the arg names here and just do: void StoreZoneFunctions(const ChromeMallocZone*, MallocZoneFunctions*);
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... File base/allocator/allocator_interception_mac.mm (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_interception_mac.mm:351: for (unsigned int i = 0; i < count; ++i) { On 2017/02/23 00:40:17, Primiano Tucci wrote: > just checking: what happens if a zone goes away while we iterate? there is no > locking here. > Asking because we had some similar problem on windows and the conclusion was > that the winapi don't have any way to safely enumerate the heaps (similar > concept to these zones, really), as the winapi would crash if a heap is > destroyed while iterating. (sorry too late to dig in crbug, but if you git blame > the malloc_dump_provider under the win section I'm pretty sure you'll find the > CL with all the context, if interested) > > I hope we're not in the same situation here :) same problem, sorry. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... File base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:18: MallocZoneFunctions& GetFunctionsForZone(void* zone) { On 2017/02/23 00:40:17, Primiano Tucci wrote: > IMHO it makes more sense for this to be an inline function in > /malloc_zone_functions_mac.h and hiding (i.e. un-externing) the g_malloc_zones. > This requires the awkard bit where here in this file you need to worry abound > and match the acquire/release semantic (see my comment about threading in > malloc_zone_functions_mac.h) inline function in malloc_zone_functions_mac.h doesn't allow us to un-extern g_malloc_zones. Instead I've marked this function inline [which is really just a compiler hint anyways, which is why I mostly avoid use of it].. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:23: IMMEDIATE_CRASH(); On 2017/02/23 00:40:18, Primiano Tucci wrote: > out of curiosity, did you use this because the compiler is not smart enough to > realize that CHECK(false) is noreturn? (well actually it might not be on > non-official builds) yes https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... File base/allocator/malloc_zone_functions_mac.cc (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:47: base::Lock* GetLock() { On 2017/02/23 00:40:18, Primiano Tucci wrote: > do you really need all this locking here? Isn't easier to just make it so that > all the zone replacements are done from the same thread and just add a > threadchecker here? Actually, the new Task Scheduler doesn't guarantee thread-affinity, just ordering. Adding a usage requirement in the header [which would only loosely be enforced by dchecks/checks] add complexity to the interface for no benefit, whereas using a lock ensures correctness, and doesn't pollute the interface.] Note that there's no real performance hit, since there's not going to be contention on this lock. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:55: bool IsMallocZoneAlreadyStoredLockAcquired(ChromeMallocZone* zone) { On 2017/02/23 00:40:18, Primiano Tucci wrote: > I think the naming pattern in the rest of the codebase is xxxLocked not > xxxLockAcquired: > > cs stats: > \b\w+Locked\( -f:third_party -> 723 resuts > \b\w+LockAcquired\( -f:third_party -> 8 resuts Done. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:70: memset(g_malloc_zones, 0, sizeof(g_malloc_zones)); On 2017/02/23 00:40:18, Primiano Tucci wrote: > why you need this? global structs and pods are zero-initialized by default. > maybe just add a static_assert(is_pod(g_malloc_zones)) to make sure that won't > accidentally become a dynamic initializer. This isn't pod. Notice that MallocZoneFunctions has an explicit constructor. See my response to your previous set of comments. But you're right that we don't need to initialize this array. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:80: base::subtle::MemoryBarrier(); On 2017/02/23 00:40:18, Primiano Tucci wrote: > This memory barrier is not enough in theory and is non-necessary in practice on Sorry, but I disagree on both points. Comments inline. > x86-TSO (but we don't want to rely on this because TSan will slap us). > > What you really want to guarantee here is that the |context| is the last field > getting written. so that on the reader side either you read a null context or > (non-null context AND consistent function ptrs). Not quite. Your suggestion is a stronger guarantee than we need. We just need to ensure that each malloc-zone that we've shimmed will see a consistent result for the MallocZoneFunctions that has the appropriate |context|. This just means that we need a memory barrier before inserting the shim. I commented on this in the header file. > The way you epxress this is by making |context| a subtle::AtomicWord and by > doing a ReleaseStore here (well, in StoreZoneFunctions) AND an AcquireLoad in > GetFunctionsForZone. This potentially has very significant performance implications [depends on implementation of AcquireLoad/ReleaseStore], which is why I've avoided it. > > Now, I say this is non necessary on x86-TSO because, in the Total Store Order > model, the reordering engine guarantees that stores and loads appear to be > always linearized, and the only visible thing that can happen IIRC is that > stores get reorder w.r.t loads. ia-64 allows all sorts of reorderings. https://en.wikipedia.org/wiki/Memory_ordering > > But you don't want to think to this micro-architecture detail and depend about > all this (also because in the meantime the compiler could take other reordering > licenses). Thankfully you don't need to. in fact if you look at the x86 > implementation both AcquireLoad and REleaseStore are just conventional load and > store operations (+ the required volatile and such to keep to compiler quiet) :) As you've noted, it's very hard to get correct semantics for AcquireLoad and ReleaseStore, whereas MemoryBarrier is very well defined. I'll try to come online early tomorrow so we can discuss live.. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:97: g_zeroed = false; On 2017/02/23 00:40:18, Primiano Tucci wrote: > as per the comment above I don't think you need this g_zeroed. But if you end up > really needing it, I find quite hard to believe that this is = false, since you > just memset-ed it to 0 two lines above :) removed. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... File base/allocator/malloc_zone_functions_mac.h (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.h:56: void StoreZoneFunctions(ChromeMallocZone* zone, MallocZoneFunctions* functions); On 2017/02/23 00:40:18, Primiano Tucci wrote: > +const ChromeMallocZone? Mostly for readability purposes, as it becomes IMHO way > clearer what gets stored into this. If you are annoyed by the 80col wrapping > you can strip the arg names here and just do: > > void StoreZoneFunctions(const ChromeMallocZone*, MallocZoneFunctions*); Done. I just use clang-format.
OK, LGTM. https://codereview.chromium.org/2703803004/diff/280001/base/allocator/allocat... File base/allocator/allocator_interception_mac.h (right): https://codereview.chromium.org/2703803004/diff/280001/base/allocator/allocat... base/allocator/allocator_interception_mac.h:17: // Saves the function pointers currently used by default zone. the default zone https://codereview.chromium.org/2703803004/diff/280001/base/allocator/allocat... File base/allocator/allocator_interception_mac.mm (right): https://codereview.chromium.org/2703803004/diff/280001/base/allocator/allocat... base/allocator/allocator_interception_mac.mm:360: kern_return_t kr = malloc_get_all_zones(mach_task_self(), 0, &zones, &count); memory_reader_t (the second argument) is a function pointer, so nullptr it is. https://codereview.chromium.org/2703803004/diff/280001/base/allocator/allocat... base/allocator/allocator_interception_mac.mm:368: if (zone->malloc == functions->malloc) You can merge this with the previous with an ||, or for even one line fewer, do that and then invert the logic to make ReplaceZoneFunctions() conditioned and get rid of the “continue” altogether. https://codereview.chromium.org/2703803004/diff/280001/base/allocator/allocat... File base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc (right): https://codereview.chromium.org/2703803004/diff/280001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:23: IMMEDIATE_CRASH(); Too unsafe to log something if this goes bad? https://codereview.chromium.org/2703803004/diff/280001/base/allocator/malloc_... File base/allocator/malloc_zone_functions_mac.cc (right): https://codereview.chromium.org/2703803004/diff/280001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:47: base::Lock* GetLock() { I know I told you that I didn’t care which, but now that I see how this worked out, I think it’d be better if you returned a & instead of a *. Four out of five callers want the ref anyway, and the fifth could easily use the ref.
https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... File base/allocator/malloc_zone_aggregator_mac.cc (right): https://codereview.chromium.org/2703803004/diff/140001/base/allocator/malloc_... base/allocator/malloc_zone_aggregator_mac.cc:49: memset(zones_, 0, sizeof(zones_)); On 2017/02/22 21:03:09, erikchen wrote: > On 2017/02/22 12:25:39, Primiano Tucci wrote: > > can you add a static_assert(is_pod(zones_[0])) ? > > MallocZoneFunctions has a non-default constructor, I just realized now about that that default constructor. By not initializing the pod fields explicitly, that CTOR will prevent the struct from being zero-initialized, both in the case where you define as a temporary variable on the stack or as a global variable. > because it's "weight" exceeds 10. It should be possible to define a pure struct which has as many fields as you want as long as they are all pod. that is the only way to avoid a static initializer AFAIK. > I think the static_assert doesn't provide additional clarity, Is not just about clarity, is about avoiding a static initializer. See discussion on https://groups.google.com/a/chromium.org/d/msg/chromium-dev/688aPLsA5jU/lwJm7... The TL;DR is that: the only way to have a solid guarantee that a struct/class won't cause a static initializer is: - not having any constructors - checking that is_pod > especially > since the default constructor can use non-0 values [header initialization] See my other comment on this. I think the safest and cleanest way here is to not having a constructor at all and initialize this: - as a globlal variable: this will be implicitly zero-initialized - as a local variable: just do MallocZone x = {}; > [compared to the alternative of suppressing the default-constructor warning]. I think this is the only choice here to really prevent a static initializer. better to ask this to thakis@ https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... File base/allocator/allocator_interception_mac.mm (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_interception_mac.mm:351: for (unsigned int i = 0; i < count; ++i) { On 2017/02/23 01:43:14, erikchen wrote: > On 2017/02/23 00:40:17, Primiano Tucci wrote: > > just checking: what happens if a zone goes away while we iterate? there is no > > locking here. > > Asking because we had some similar problem on windows and the conclusion was > > that the winapi don't have any way to safely enumerate the heaps (similar > > concept to these zones, really), as the winapi would crash if a heap is > > destroyed while iterating. (sorry too late to dig in crbug, but if you git > blame > > the malloc_dump_provider under the win section I'm pretty sure you'll find the > > CL with all the context, if interested) > > > > I hope we're not in the same situation here :) > > same problem, sorry. Are you saying that we know upfront that this code will likely make chrome crash? https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... File base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:18: MallocZoneFunctions& GetFunctionsForZone(void* zone) { On 2017/02/23 01:43:14, erikchen wrote: > On 2017/02/23 00:40:17, Primiano Tucci wrote: > > IMHO it makes more sense for this to be an inline function in > > /malloc_zone_functions_mac.h and hiding (i.e. un-externing) the > g_malloc_zones. > > This requires the awkard bit where here in this file you need to worry abound > > and match the acquire/release semantic (see my comment about threading in > > malloc_zone_functions_mac.h) > > inline function in malloc_zone_functions_mac.h doesn't allow us to un-extern > g_malloc_zones. sorry, I really meant "allow us to not have to access the extern g_malloc_zone outside of that file" > Instead I've marked this function inline [which is really just a > compiler hint anyways, which is why I mostly avoid use of it].. Yup agree about inline being pretty much useless. My suggestion was to define this function in the malloc_zone_functions_mac.h, so that all the logic that accesses g_malloc_zones stays there. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:23: IMMEDIATE_CRASH(); On 2017/02/23 01:43:14, erikchen wrote: > On 2017/02/23 00:40:18, Primiano Tucci wrote: > > out of curiosity, did you use this because the compiler is not smart enough to > > realize that CHECK(false) is noreturn? (well actually it might not be on > > non-official builds) > > yes got it, sg https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... File base/allocator/malloc_zone_functions_mac.cc (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:47: base::Lock* GetLock() { On 2017/02/23 01:43:14, erikchen wrote: > On 2017/02/23 00:40:18, Primiano Tucci wrote: > > do you really need all this locking here? Isn't easier to just make it so that > > all the zone replacements are done from the same thread and just add a > > threadchecker here? > > Actually, the new Task Scheduler doesn't guarantee thread-affinity, just > ordering. Well depends wheteher you use SingleThreadTaskRunner or just a seqTaskRunner. > Adding a usage requirement in the header [which would only loosely be > enforced by dchecks/checks] add complexity to the interface for no benefit, > whereas using a lock ensures correctness, and doesn't pollute the interface.] > Note that there's no real performance hit, since there's not going to be > contention on this lock. My worry is not about contention, is just when possible I prefer to not think about thread safety (by guaranteeing that everything happens on a thread) instead of trying to deal with it. Nothing is more thread safe than things that happen on a single thread :) Ahyhow, not strongly opinionated on this, so feel free to keep this. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:70: memset(g_malloc_zones, 0, sizeof(g_malloc_zones)); On 2017/02/23 01:43:14, erikchen wrote: > On 2017/02/23 00:40:18, Primiano Tucci wrote: > > why you need this? global structs and pods are zero-initialized by default. > > maybe just add a static_assert(is_pod(g_malloc_zones)) to make sure that won't > > accidentally become a dynamic initializer. > > This isn't pod. Notice that MallocZoneFunctions has an explicit constructor. Uhm, this means that g_malloc_zones will likely cause an initializer, which is: - forbidden by the style guide (see below) - going to be intercepted by Nico's static intializer's check in the main waterfall (there is no check in CQ) and cause a revert, unless the compiler is smart enough to figure out that the ctor doesn't really require a static initializer (but, as per above, you shouldn't rely on that. From style guide: Variables of class type with static storage duration are forbidden (...) Objects with static storage duration, including global variables, static variables, static class member variables, and function static variables, must be Plain Old Data (POD): only ints, chars, floats, or pointers, or arrays/structs of POD. Also, I just realized that by having a default ctor for MallocZoneFunctions that does nothing, you are essentially preventing the global struct to be zero initialized. This explains why the existing code didn't hit the static initialzier check. In summary, I think the cleanest thing to do is to drop that constructor, make MallocZoneFunctions a POD, and initialize with {} in the places where is not used as a global. > See > my response to your previous set of comments. But you're right that we don't > need to initialize this array. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:80: base::subtle::MemoryBarrier(); On 2017/02/23 01:43:14, erikchen wrote: > On 2017/02/23 00:40:18, Primiano Tucci wrote: > > This memory barrier is not enough in theory and is non-necessary in practice > on > Sorry, but I disagree on both points. Comments inline. No need to be sorry :) > > x86-TSO (but we don't want to rely on this because TSan will slap us). > > > > What you really want to guarantee here is that the |context| is the last field > > getting written. so that on the reader side either you read a null context or > > (non-null context AND consistent function ptrs). > Not quite. Your suggestion is a stronger guarantee than we need. We just need to > ensure that each malloc-zone that we've shimmed will see a consistent result for > the MallocZoneFunctions that has the appropriate |context|. Let me see if I visualize the problem correctly before we discuss the solution. My understanding is that, in the very essence, the problem we have is the following (I'm deliberately ignoring the "array" part and considering the simpler case of 1-item array): on one writer thread you do: g_zone.malloc_function_ptr = .... g_zone.calloc_function_ptr = .... g_zone.context = .... On the reader thread (in allocator_shim_default_dispatch_to_mac_zoned_malloc.cc) if (g_zone.context != nullptr) .. use the fields of the zone such as malloc_function_ptr So my understanding is that either you don't want to see the zone being populated, or you want to see it populated consistently. is this right? or am I misunderstanding the problem? > we need a memory barrier before inserting the shim. I commented on this in the > header file. Yup saw that. > > > The way you epxress this is by making |context| a subtle::AtomicWord and by > > doing a ReleaseStore here (well, in StoreZoneFunctions) AND an AcquireLoad in > > GetFunctionsForZone. > This potentially has very significant performance implications [depends on > implementation of AcquireLoad/ReleaseStore], which is why I've avoided it. There are no performance implication on AcquireLoad/ReleaseStore on x86/x86_64 (which are the only archs we care about on Mac). AFAIL there is a penalty on Linux desktop, but that's a bug and just because we are using a mismatching sysroot (crbug.com/592903) > > > > > Now, I say this is non necessary on x86-TSO because, in the Total Store Order > > model, the reordering engine guarantees that stores and loads appear to be > > always linearized, and the only visible thing that can happen IIRC is that > > stores get reorder w.r.t loads. > ia-64 allows all sorts of reorderings. Correct, but ia-64 == itanium != x86_64. (I know, but... ia-32 == x86. Lovely isn't it?) > https://en.wikipedia.org/wiki/Memory_ordering Precisely. That confirms my point. You did just look at the wrong column :) amd64 is what you want to look for here. the "amd" part is irrelevant and historic. amd64 == x86_64 (regardless of the brand, Intel or AMD) (And to add more excitement, since we are on the topic, amd64 is also often referred to as x64) > > > > > But you don't want to think to this micro-architecture detail and depend about > > all this (also because in the meantime the compiler could take other > reordering > > licenses). Thankfully you don't need to. in fact if you look at the x86 > > implementation both AcquireLoad and REleaseStore are just conventional load > and > > store operations (+ the required volatile and such to keep to compiler quiet) > :) > As you've noted, it's very hard to get correct semantics for AcquireLoad and > ReleaseStore, whereas MemoryBarrier is very well defined. Also AcquireLoad and ReleaseStore semantic is "well defined". But I don't want to get lost in linguistic or phylosophical discussions, so I guess what you meant here is: "MemoryBarrier" is easier to reason about than Acquire/Release semantics. Which I agree. but still MB itself is hard to reason about. MB guarantees: - that all the stores before the barrier are committed and are seen by other threads before the next writers AND - all the reads after the barrier won't be reordered with the one before *on the current thread* Unfortunately, if my understanding of the problem is correct, this MB is still not enough to guarantee what you need (unless, again, you assume the x86/64 TSO model). With a memory barrier, the following linearization of the code above is possible: W -> Writer thread R -> Reader thread W: g_zone.malloc = ... R: g_zone.context W: g_zone.context = ... R: g_zone.malloc W: MemoryBarrier Essentially, the MB does NOT guarantee that the stores will NOT be reordered before hitting the barrier (if you think about that, a MB cannot have retroactive effect). Hence the other thread can still see the writes in opposite order. I created a repro that you can compile and run for this example here: https://gist.github.com/primiano/0936c4740a28a26d22d41fff8abb0e6e If you run that code on x86/64 it will never hit the race, even if you remove completely the memory barrier. If you run that code on arm (which, as noted in the wikipedia page you linked, has a weaker ordering model) it will fail even in presence of the one memory barrier. Which is the reason why both Acq/Rel and MB are under the "subtle" namespace and the reason why we strongly discourage the use of atomic and use only locks. However, I agree with you, that for this specific case, a lock is a no-go. By direct experience when writing the Linux shim I know that even one extra mfence in the malloc path will be detected by the perf waterfall (true story: https://codereview.chromium.org/1777363002/) (Also, in all this, we are completely ignoring optimizations that the compiler does on top, which are often the biggest reason why you want base::subtle::*Load/Store) > I'll try to come online early tomorrow so we can discuss live.. Yup happy to talk live. I will be later today anyways.
LGTM with the rest of the comment addressed (rest == ignore my comment on the MemoryBarrier as we figured out the MB is sufficient because of the causality relationship between the writer and reader) https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... File base/allocator/malloc_zone_functions_mac.cc (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:80: base::subtle::MemoryBarrier(); On 2017/02/23 11:19:41, Primiano Tucci wrote: > On 2017/02/23 01:43:14, erikchen wrote: > > On 2017/02/23 00:40:18, Primiano Tucci wrote: > > > This memory barrier is not enough in theory and is non-necessary in practice > > on > > Sorry, but I disagree on both points. Comments inline. > No need to be sorry :) > > > > x86-TSO (but we don't want to rely on this because TSan will slap us). > > > > > > What you really want to guarantee here is that the |context| is the last > field > > > getting written. so that on the reader side either you read a null context > or > > > (non-null context AND consistent function ptrs). > > Not quite. Your suggestion is a stronger guarantee than we need. We just need > to > > ensure that each malloc-zone that we've shimmed will see a consistent result > for > > the MallocZoneFunctions that has the appropriate |context|. > Let me see if I visualize the problem correctly before we discuss the solution. > My understanding is that, in the very essence, the problem we have is the > following (I'm deliberately ignoring the "array" part and considering the > simpler case of 1-item array): > on one writer thread you do: > g_zone.malloc_function_ptr = .... > g_zone.calloc_function_ptr = .... > g_zone.context = .... > > On the reader thread (in allocator_shim_default_dispatch_to_mac_zoned_malloc.cc) > if (g_zone.context != nullptr) > .. use the fields of the zone such as malloc_function_ptr > > So my understanding is that either you don't want to see the zone being > populated, or you want to see it populated consistently. > is this right? or am I misunderstanding the problem? > > > we need a memory barrier before inserting the shim. I commented on this in the > > header file. > Yup saw that. > > > > > > The way you epxress this is by making |context| a subtle::AtomicWord and by > > > doing a ReleaseStore here (well, in StoreZoneFunctions) AND an AcquireLoad > in > > > GetFunctionsForZone. > > This potentially has very significant performance implications [depends on > > implementation of AcquireLoad/ReleaseStore], which is why I've avoided it. > There are no performance implication on AcquireLoad/ReleaseStore on x86/x86_64 > (which are the only archs we care about on Mac). AFAIL there is a penalty on > Linux desktop, but that's a bug and just because we are using a mismatching > sysroot (crbug.com/592903) > > > > > > > > > > Now, I say this is non necessary on x86-TSO because, in the Total Store > Order > > > model, the reordering engine guarantees that stores and loads appear to be > > > always linearized, and the only visible thing that can happen IIRC is that > > > stores get reorder w.r.t loads. > > ia-64 allows all sorts of reorderings. > Correct, but ia-64 == itanium != x86_64. > (I know, but... ia-32 == x86. Lovely isn't it?) > > > > https://en.wikipedia.org/wiki/Memory_ordering > Precisely. That confirms my point. You did just look at the wrong column :) > amd64 is what you want to look for here. the "amd" part is irrelevant and > historic. amd64 == x86_64 (regardless of the brand, Intel or AMD) > > (And to add more excitement, since we are on the topic, amd64 is also often > referred to as x64) > > > > > > > > > > But you don't want to think to this micro-architecture detail and depend > about > > > all this (also because in the meantime the compiler could take other > > reordering > > > licenses). Thankfully you don't need to. in fact if you look at the x86 > > > implementation both AcquireLoad and REleaseStore are just conventional load > > and > > > store operations (+ the required volatile and such to keep to compiler > quiet) > > :) > > As you've noted, it's very hard to get correct semantics for AcquireLoad and > > ReleaseStore, whereas MemoryBarrier is very well defined. > > Also AcquireLoad and ReleaseStore semantic is "well defined". But I don't want > to get lost in linguistic or phylosophical discussions, so I guess what you > meant here is: "MemoryBarrier" is easier to reason about than Acquire/Release > semantics. Which I agree. but still MB itself is hard to reason about. > > MB guarantees: > - that all the stores before the barrier are committed and are seen by other > threads before the next writers AND > - all the reads after the barrier won't be reordered with the one before *on the > current thread* > > Unfortunately, if my understanding of the problem is correct, this MB is still > not enough to guarantee what you need (unless, again, you assume the x86/64 TSO > model). With a memory barrier, the following linearization of the code above is > possible: > > W -> Writer thread > R -> Reader thread > > W: g_zone.malloc = ... > R: g_zone.context > W: g_zone.context = ... > R: g_zone.malloc > W: MemoryBarrier > > Essentially, the MB does NOT guarantee that the stores will NOT be reordered > before hitting the barrier (if you think about that, a MB cannot have > retroactive effect). Hence the other thread can still see the writes in opposite > order. > I created a repro that you can compile and run for this example here: > > https://gist.github.com/primiano/0936c4740a28a26d22d41fff8abb0e6e > > If you run that code on x86/64 it will never hit the race, even if you remove > completely the memory barrier. > If you run that code on arm (which, as noted in the wikipedia page you linked, > has a weaker ordering model) it will fail even in presence of the one memory > barrier. > > Which is the reason why both Acq/Rel and MB are under the "subtle" namespace and > the reason why we strongly discourage the use of atomic and use only locks. > However, I agree with you, that for this specific case, a lock is a no-go. By > direct experience when writing the Linux shim I know that even one extra mfence > in the malloc path will be detected by the perf waterfall (true story: > https://codereview.chromium.org/1777363002/) > > (Also, in all this, we are completely ignoring optimizations that the compiler > does on top, which are often the biggest reason why you want > base::subtle::*Load/Store) > > > I'll try to come online early tomorrow so we can discuss live.. > Yup happy to talk live. I will be later today anyways. Erik and I had a chat offline, summarizing here. There is a causality relationship that I was not seeing. Essentially I was worrying about the case when another thread starts seeing the stores before we actually get to the MemoryBarrier. That would be a problem and that is the repro I did create in that snippet above. However, this cannot happen here. The only code that reads the malloc functions is triggered when the malloc zone is replaced, which happens after this point. So nothing can see the stores before this point, hence the MB here is correct as you only want to guarantee that whatever happens next sees everything you have written, but you don't have to worry about what happens before. (unrelatedly, there is a possibility that ASan will still bark on this, because it won't be able to infer this causality. but let's not worry about this until this happens first)
The CQ bit was checked by erikchen@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from primiano@chromium.org, mark@chromium.org Link to the patchset: https://codereview.chromium.org/2703803004/#ps300001 (title: "Comments from primiano, mark.")
https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... File base/allocator/allocator_interception_mac.mm (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_interception_mac.mm:351: for (unsigned int i = 0; i < count; ++i) { On 2017/02/23 11:19:41, Primiano Tucci wrote: > On 2017/02/23 01:43:14, erikchen wrote: > > On 2017/02/23 00:40:17, Primiano Tucci wrote: > > > just checking: what happens if a zone goes away while we iterate? there is > no > > > locking here. > > > Asking because we had some similar problem on windows and the conclusion was > > > that the winapi don't have any way to safely enumerate the heaps (similar > > > concept to these zones, really), as the winapi would crash if a heap is > > > destroyed while iterating. (sorry too late to dig in crbug, but if you git > > blame > > > the malloc_dump_provider under the win section I'm pretty sure you'll find > the > > > CL with all the context, if interested) > > > > > > I hope we're not in the same situation here :) > > > > same problem, sorry. > > Are you saying that we know upfront that this code will likely make chrome > crash? there is a race condition that could cause a chrome crash, if a library destroys a malloc zone while we iterate over all zones. malloc_get_all_zones is almost really helpful, as it takes a memory_reader_t argument that theoretically can copy into a safe region, but it doesn't acquire the malloc zone lock! Ugh. I'm not worried about this CL, since it shims very early on in the life cycle. I am however worried about the next CL. Will have a follow-up discussion there. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... File base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:18: MallocZoneFunctions& GetFunctionsForZone(void* zone) { On 2017/02/23 11:19:41, Primiano Tucci wrote: > On 2017/02/23 01:43:14, erikchen wrote: > > On 2017/02/23 00:40:17, Primiano Tucci wrote: > > > IMHO it makes more sense for this to be an inline function in > > > /malloc_zone_functions_mac.h and hiding (i.e. un-externing) the > > g_malloc_zones. > > > This requires the awkard bit where here in this file you need to worry > abound > > > and match the acquire/release semantic (see my comment about threading in > > > malloc_zone_functions_mac.h) > > > > inline function in malloc_zone_functions_mac.h doesn't allow us to un-extern > > g_malloc_zones. > sorry, I really meant "allow us to not have to access the extern g_malloc_zone > outside of that file" > > > Instead I've marked this function inline [which is really just a > > compiler hint anyways, which is why I mostly avoid use of it].. > Yup agree about inline being pretty much useless. My suggestion was to define > this function in the malloc_zone_functions_mac.h, so that all the logic that > accesses g_malloc_zones stays there. > Done. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... File base/allocator/malloc_zone_functions_mac.cc (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:47: base::Lock* GetLock() { On 2017/02/23 11:19:41, Primiano Tucci wrote: > On 2017/02/23 01:43:14, erikchen wrote: > > On 2017/02/23 00:40:18, Primiano Tucci wrote: > > > do you really need all this locking here? Isn't easier to just make it so > that > > > all the zone replacements are done from the same thread and just add a > > > threadchecker here? > > > > Actually, the new Task Scheduler doesn't guarantee thread-affinity, just > > ordering. > Well depends wheteher you use SingleThreadTaskRunner or just a seqTaskRunner. > > > Adding a usage requirement in the header [which would only loosely be > > enforced by dchecks/checks] add complexity to the interface for no benefit, > > whereas using a lock ensures correctness, and doesn't pollute the interface.] > > Note that there's no real performance hit, since there's not going to be > > contention on this lock. > My worry is not about contention, is just when possible I prefer to not think > about thread safety (by guaranteeing that everything happens on a thread) > instead of trying to deal with it. Nothing is more thread safe than things that > happen on a single thread :) > Ahyhow, not strongly opinionated on this, so feel free to keep this. There's no way to guarantee things happen on a single thread without high-performance-impact checks. Locks guarantee that we have the right behavior, regardless of how consumers use us. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:70: memset(g_malloc_zones, 0, sizeof(g_malloc_zones)); On 2017/02/23 11:19:41, Primiano Tucci wrote: > On 2017/02/23 01:43:14, erikchen wrote: > > On 2017/02/23 00:40:18, Primiano Tucci wrote: > > > why you need this? global structs and pods are zero-initialized by default. > > > maybe just add a static_assert(is_pod(g_malloc_zones)) to make sure that > won't > > > accidentally become a dynamic initializer. > > > > This isn't pod. Notice that MallocZoneFunctions has an explicit constructor. > > Uhm, this means that g_malloc_zones will likely cause an initializer, which is: > - forbidden by the style guide (see below) > - going to be intercepted by Nico's static intializer's check in the main > waterfall (there is no check in CQ) and cause a revert, unless the compiler is > smart enough to figure out that the ctor doesn't really require a static > initializer (but, as per above, you shouldn't rely on that. > > From style guide: > Variables of class type with static storage duration are forbidden (...) > Objects with static storage duration, including global variables, static > variables, static class member variables, and function static variables, must be > Plain Old Data (POD): only ints, chars, floats, or pointers, or arrays/structs > of POD. > > Also, I just realized that by having a default ctor for MallocZoneFunctions that > does nothing, you are essentially preventing the global struct to be zero > initialized. This explains why the existing code didn't hit the static > initialzier check. > > In summary, I think the cleanest thing to do is to drop that constructor, make > MallocZoneFunctions a POD, and initialize with {} in the places where is not > used as a global. > > > > See > > my response to your previous set of comments. But you're right that we don't > > need to initialize this array. 1. Our custom clang module forces us to make a default constructor because there are >= 10 members. 2. Default constructor means non-POD 3. + Global = static initializers. AFAICT, there's no way to turn off the custom clang warning. This seems like an issue with our custom clang plugin. I'll chat with erg@. That being said, given that we want a global array, I actually think it's slightly better for us to dynamically allocate the space [lower binary size]. I've gone ahead and done this. https://codereview.chromium.org/2703803004/diff/280001/base/allocator/allocat... File base/allocator/allocator_interception_mac.h (right): https://codereview.chromium.org/2703803004/diff/280001/base/allocator/allocat... base/allocator/allocator_interception_mac.h:17: // Saves the function pointers currently used by default zone. On 2017/02/23 02:20:12, Mark Mentovai wrote: > the default zone Done. https://codereview.chromium.org/2703803004/diff/280001/base/allocator/allocat... File base/allocator/allocator_interception_mac.mm (right): https://codereview.chromium.org/2703803004/diff/280001/base/allocator/allocat... base/allocator/allocator_interception_mac.mm:360: kern_return_t kr = malloc_get_all_zones(mach_task_self(), 0, &zones, &count); On 2017/02/23 02:20:12, Mark Mentovai wrote: > memory_reader_t (the second argument) is a function pointer, so nullptr it is. Done. https://codereview.chromium.org/2703803004/diff/280001/base/allocator/allocat... base/allocator/allocator_interception_mac.mm:368: if (zone->malloc == functions->malloc) On 2017/02/23 02:20:12, Mark Mentovai wrote: > You can merge this with the previous with an ||, or for even one line fewer, do > that and then invert the logic to make ReplaceZoneFunctions() conditioned and > get rid of the “continue” altogether. Done. https://codereview.chromium.org/2703803004/diff/280001/base/allocator/allocat... File base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc (right): https://codereview.chromium.org/2703803004/diff/280001/base/allocator/allocat... base/allocator/allocator_shim_default_dispatch_to_mac_zoned_malloc.cc:23: IMMEDIATE_CRASH(); On 2017/02/23 02:20:12, Mark Mentovai wrote: > Too unsafe to log something if this goes bad? right - we're about to cause a malloc-related function to fail, and any logging that might use malloc can cause recursion. https://codereview.chromium.org/2703803004/diff/280001/base/allocator/malloc_... File base/allocator/malloc_zone_functions_mac.cc (right): https://codereview.chromium.org/2703803004/diff/280001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:47: base::Lock* GetLock() { On 2017/02/23 02:20:12, Mark Mentovai wrote: > I know I told you that I didn’t care which, but now that I see how this worked > out, I think it’d be better if you returned a & instead of a *. Four out of five > callers want the ref anyway, and the fifth could easily use the ref. Done.
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
erikchen@chromium.org changed reviewers: + erg@chromium.org
Chatted with erg@. His response was that given that wasn't a blocker for this CL [since I actually wanted a global array, which I can malloc onto the heap], let's continue to not worry about this. I asked if we could add a mechanism to explicitly turn off this warning for a struct. erg@ says not easily. Maybe something using C++11 attributes.
https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... File base/allocator/allocator_interception_mac.mm (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/allocat... base/allocator/allocator_interception_mac.mm:351: for (unsigned int i = 0; i < count; ++i) { On 2017/02/23 20:06:44, erikchen wrote: > On 2017/02/23 11:19:41, Primiano Tucci wrote: > > On 2017/02/23 01:43:14, erikchen wrote: > > > On 2017/02/23 00:40:17, Primiano Tucci wrote: > > > > just checking: what happens if a zone goes away while we iterate? there is > > no > > > > locking here. > > > > Asking because we had some similar problem on windows and the conclusion > was > > > > that the winapi don't have any way to safely enumerate the heaps (similar > > > > concept to these zones, really), as the winapi would crash if a heap is > > > > destroyed while iterating. (sorry too late to dig in crbug, but if you git > > > blame > > > > the malloc_dump_provider under the win section I'm pretty sure you'll find > > the > > > > CL with all the context, if interested) > > > > > > > > I hope we're not in the same situation here :) > > > > > > same problem, sorry. > > > > Are you saying that we know upfront that this code will likely make chrome > > crash? > > there is a race condition that could cause a chrome crash, if a library destroys > a malloc zone while we iterate over all zones. > > malloc_get_all_zones is almost really helpful, as it takes a memory_reader_t > argument that theoretically can copy into a safe region, but it doesn't acquire > the malloc zone lock! Ugh. > > I'm not worried about this CL, since it shims very early on in the life cycle. I > am however worried about the next CL. Will have a follow-up discussion there. Acknowledged. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... File base/allocator/malloc_zone_functions_mac.cc (right): https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:47: base::Lock* GetLock() { On 2017/02/23 20:06:44, erikchen wrote: > On 2017/02/23 11:19:41, Primiano Tucci wrote: > > On 2017/02/23 01:43:14, erikchen wrote: > > > On 2017/02/23 00:40:18, Primiano Tucci wrote: > > > > do you really need all this locking here? Isn't easier to just make it so > > that > > > > all the zone replacements are done from the same thread and just add a > > > > threadchecker here? > > > > > > Actually, the new Task Scheduler doesn't guarantee thread-affinity, just > > > ordering. > > Well depends wheteher you use SingleThreadTaskRunner or just a seqTaskRunner. > > > > > Adding a usage requirement in the header [which would only loosely be > > > enforced by dchecks/checks] add complexity to the interface for no benefit, > > > whereas using a lock ensures correctness, and doesn't pollute the > interface.] > > > Note that there's no real performance hit, since there's not going to be > > > contention on this lock. > > My worry is not about contention, is just when possible I prefer to not think > > about thread safety (by guaranteeing that everything happens on a thread) > > instead of trying to deal with it. Nothing is more thread safe than things > that > > happen on a single thread :) > > Ahyhow, not strongly opinionated on this, so feel free to keep this. > > There's no way to guarantee things happen on a single thread without > high-performance-impact checks. ThreadChecker (and SequenceChecker) have zero cost in production as they are dcheck-only, which is usually IMHO a reasonable compromise. > Locks guarantee that we have the right behavior, > regardless of how consumers use us. locks induce people to make false assumptions on the threadsafety of the code, using it for use-cases it was never designed to deal with. By experience they are never a problem at the time you land the CL. but one day, 6 months from now, somebody will refactor this code (or add some other calls). to be clear, i'm perfectly fine today with this, but my warn is that one day you'll see a bug on this :) I prefer explicit "this code doesn't support multi-thread scenario" because it discourage future people (sometimes that being future-myself) from doing adventurous things. https://codereview.chromium.org/2703803004/diff/240001/base/allocator/malloc_... base/allocator/malloc_zone_functions_mac.cc:70: memset(g_malloc_zones, 0, sizeof(g_malloc_zones)); On 2017/02/23 20:06:44, erikchen wrote: > On 2017/02/23 11:19:41, Primiano Tucci wrote: > > On 2017/02/23 01:43:14, erikchen wrote: > > > On 2017/02/23 00:40:18, Primiano Tucci wrote: > > > > why you need this? global structs and pods are zero-initialized by > default. > > > > maybe just add a static_assert(is_pod(g_malloc_zones)) to make sure that > > won't > > > > accidentally become a dynamic initializer. > > > > > > This isn't pod. Notice that MallocZoneFunctions has an explicit constructor. > > > > > Uhm, this means that g_malloc_zones will likely cause an initializer, which > is: > > - forbidden by the style guide (see below) > > - going to be intercepted by Nico's static intializer's check in the main > > waterfall (there is no check in CQ) and cause a revert, unless the compiler is > > smart enough to figure out that the ctor doesn't really require a static > > initializer (but, as per above, you shouldn't rely on that. > > > > From style guide: > > Variables of class type with static storage duration are forbidden (...) > > Objects with static storage duration, including global variables, static > > variables, static class member variables, and function static variables, must > be > > Plain Old Data (POD): only ints, chars, floats, or pointers, or arrays/structs > > of POD. > > > > Also, I just realized that by having a default ctor for MallocZoneFunctions > that > > does nothing, you are essentially preventing the global struct to be zero > > initialized. This explains why the existing code didn't hit the static > > initialzier check. > > > > In summary, I think the cleanest thing to do is to drop that constructor, make > > MallocZoneFunctions a POD, and initialize with {} in the places where is not > > used as a global. > > > > > > > See > > > my response to your previous set of comments. But you're right that we don't > > > need to initialize this array. > > 1. Our custom clang module forces us to make a default constructor because there > are >= 10 members. > 2. Default constructor means non-POD > 3. + Global = static initializers. > AFAICT, there's no way to turn off the custom clang warning. This seems like an > issue with our custom clang plugin. I'll chat with erg@. > Alright I see, there seems to be a problem in our constraints. essentially: - clang plugin requires a struct with too many field to have an explicit ctor. - the only way to avoid static initializers for a stuct is to not have a ctor. Will start a thread on chromium-dev about this. > That being said, given that we want a global array, I actually think it's > slightly better for us to dynamically allocate the space [lower binary size]. zero-initialized data does not cause any binary increase. See discussion in: https://groups.google.com/a/chromium.org/d/msg/project-trim/vfX5Ohn1Ees/gLLZ2... > I've gone ahead and done this. okay for me, but you just added an extra layer of indirection when accessing the array. Proof here: https://gist.github.com/primiano/2473ac16db614ffa1fc1f33b4dc0e15b (yes, I am nerdsnipable that easily :P) essentially accessing an heap allocated array requires to first fetch the pointer into a register, and then use the register+off*displacement mov to read/write. Instead when accessing a global object you can directly do the math based on the the instruction pointer, because in position in dependent code, the array will be laid out at a fixed distance from the IP of the current instruction ;-) Having said that this still is okay to me. Just saying because you were the one obsessed with perf impact of AcquireLoad/ReleaseStore :)
The CQ bit was unchecked by commit-bot@chromium.org
Failed to apply patch for base/BUILD.gn: While running git apply --index -p1; error: patch failed: base/BUILD.gn:152 error: base/BUILD.gn: patch does not apply Patch: base/BUILD.gn Index: base/BUILD.gn diff --git a/base/BUILD.gn b/base/BUILD.gn index 833d611c6283fdf9421fb8cfef3c89379a900b14..42896931cb058bca159f15be249ae5979c500520 100644 --- a/base/BUILD.gn +++ b/base/BUILD.gn @@ -152,6 +152,8 @@ component("base") { "allocator/allocator_interception_mac.h", "allocator/allocator_interception_mac.mm", "allocator/allocator_shim.h", + "allocator/malloc_zone_functions_mac.cc", + "allocator/malloc_zone_functions_mac.h", "allocator/oom.h", "android/animation_frame_time_histogram.cc", "android/animation_frame_time_histogram.h", @@ -1885,6 +1887,7 @@ if (is_ios || is_mac) { test("base_unittests") { sources = [ + "allocator/malloc_zone_functions_mac_unittest.cc", "allocator/tcmalloc_unittest.cc", "android/application_status_listener_unittest.cc", "android/content_uri_utils_unittest.cc",
> okay for me, but you just added an extra layer of indirection when accessing the > array. > Proof here: > https://gist.github.com/primiano/2473ac16db614ffa1fc1f33b4dc0e15b > (yes, I am nerdsnipable that easily :P) > > essentially accessing an heap allocated array requires to first fetch the > pointer into a register, and then use the register+off*displacement mov to > read/write. > Instead when accessing a global object you can directly do the math based on the > the instruction pointer, because in position in dependent code, the array will > be laid out at a fixed distance from the IP of the current instruction ;-) > > Having said that this still is okay to me. Just saying because you were the one > obsessed with perf impact of AcquireLoad/ReleaseStore :) Wasn't aware of the performance hit there. :(
The CQ bit was checked by erikchen@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from primiano@chromium.org, mark@chromium.org Link to the patchset: https://codereview.chromium.org/2703803004/#ps320001 (title: "Rebase.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 320001, "attempt_start_ts": 1487927532994620, "parent_rev": "0ec18db7916d7549ee8fe61bb6063ca8922ed86e", "commit_rev": "7a54483e22a7e4578870d144b7d4a71380e63a10"}
Message was sent while issue was closed.
Description was changed from ========== macOS: Shim all malloc zones. All information about malloc zones [and their original functions] is stored in MallocZoneAggregator. When the allocator shim is ready to dispatch back to the original system implementations, it calls the MallocZoneAggregator with |context|, which provides enough information to dispatch to the correct malloc zone. BUG=693237 ========== to ========== macOS: Shim all malloc zones. All information about malloc zones [and their original functions] is stored in MallocZoneAggregator. When the allocator shim is ready to dispatch back to the original system implementations, it calls the MallocZoneAggregator with |context|, which provides enough information to dispatch to the correct malloc zone. BUG=693237 Review-Url: https://codereview.chromium.org/2703803004 Cr-Commit-Position: refs/heads/master@{#452784} Committed: https://chromium.googlesource.com/chromium/src/+/7a54483e22a7e4578870d144b7d4... ==========
Message was sent while issue was closed.
Committed patchset #16 (id:320001) as https://chromium.googlesource.com/chromium/src/+/7a54483e22a7e4578870d144b7d4... |