Description
New internal cache (stack) class for recording available memories.
This is extracted to its own class for simplicity and ease of reuse. It was originally intended to be a lock-free object, but that ended up being overly complicated, so it has been put on indefinite hold.
BUG=620813
Committed: https://crrev.com/4c361942b436b45ddcaab02d6829921b8f3c9e81
Cr-Commit-Position: refs/heads/master@{#416282}
Patch Set 1 #
Patch Set 2 : some clean up #
Total comments: 23
Patch Set 3 : addressed some review comments by manzagop #
Total comments: 1
Patch Set 4 : simplify fifo design #
Total comments: 8
Patch Set 5 : refactor fifo (now queue) to allocate slot before updating it #
Total comments: 8
Patch Set 6 : addressed review comments by manzagop #
Patch Set 7 : fix broken pop operation #
Total comments: 6
Patch Set 8 : new test #
Patch Set 9 : switch from queue to stack (only one atomic pointer) #
Patch Set 10 : fixed some compile problems #
Total comments: 10
Patch Set 11 : comment improvements #
Total comments: 4
Patch Set 12 : more comment improvements #
Total comments: 3
Patch Set 13 : forgo lock-free; use regular lock to create thread-safe stack #
Total comments: 2
Patch Set 14 : fixed comment #
Messages
Total messages: 147 (92 generated)
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...)
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Patchset #2 (id:20001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...)
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Patchset #2 (id:40001) has been deleted
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...)
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Patchset #2 (id:60001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
bcwhite@chromium.org changed reviewers: + manzagop@chromium.org
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Nothing of substance but I still need to go over push/pop in detail. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... File base/debug/activity_tracker.cc (right): https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:506: // Get the first available memory from the top of the stack. nit: top of the stack -> fifo. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:512: // fail if something else has already taken the block and changed its type. Who could have taken the block, and how? Ah, by iterating directly over the allocator's blocks below? https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:529: // of available memories and try to claim it. Maybe clarify the comment to say the claimed block might be in the cache, eg: Expand the search for free blocks by iterating over the allocator's free blocks. This is because the cache's capacity may be inferior to the number of allocated blocks. Note that any block found may actually be in the cache (if it was inserted after the last check). https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:627: // Push this on the internal list of available memory blocks so it can nit: list -> cache https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:629: // size of the stack, it will fail but a fallback check in CreateTracker stack -> fifo / free block cache https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... 
base/debug/activity_tracker.h:50: template <typename T, size_t SIZE, T INVALID_VALUE> Would it be better for generated code size to exclude SIZE and INVALID_VALUE from the template and have them be constructor params? https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:53: CAPACITY = SIZE + 1 // +1 because one slot must always be free. nit: isn't "size = capacity + 1" more accurate? https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:95: // Reports if the stack is empty/full based on explit head/tail values. nit: explit? https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:98: return (tail + CAPACITY - 1) % CAPACITY == head; nit: perhaps comment on the modulo, since that varies per language eg: // Note: adding CAPACITY to ensure result of modulo is positive.
https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:151: // false failure doesn't occur. I'm wondering if there's a problem. Being able to store a value at head (l.129) means that we wrote to a free spot, but it doesn't mean we wrote at the real head_. It's possible that in between loading the head (l.115) and writing to it (l.129) there were lots of pushes and pops and we're actually writing in the middle of the free range?
https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... File base/debug/activity_tracker.cc (right): https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:506: // Get the first available memory from the top of the stack. On 2016/08/16 21:44:31, manzagop wrote: > nit: top of the stack -> fifo. Done. Though a FIFO and LIFO are both types of stack. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:512: // fail if something else has already taken the block and changed its type. On 2016/08/16 21:44:31, manzagop wrote: > Who could have taken the block, and how? Ah, by iterating directly over the > allocator's blocks below? Correct. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:529: // of available memories and try to claim it. On 2016/08/16 21:44:31, manzagop wrote: > Maybe clarify the comment to say the claimed block might be in the cache, eg: > > Expand the search for free blocks by iterating over the allocator's free blocks. > This is because the cache's capacity may be inferior to the number of allocated > blocks. Note that any block found may actually be in the cache (if it was > inserted after the last check). Done. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:627: // Push this on the internal list of available memory blocks so it can On 2016/08/16 21:44:31, manzagop wrote: > nit: list -> cache Done. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:629: // size of the stack, it will fail but a fallback check in CreateTracker On 2016/08/16 21:44:31, manzagop wrote: > stack -> fifo / free block cache Done. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... 
File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:50: template <typename T, size_t SIZE, T INVALID_VALUE> On 2016/08/16 21:44:31, manzagop wrote: > Would it be better for generated code size to exclude SIZE and INVALID_VALUE > from the template and have them be constructor params? Seems reasonable. I'll look into it. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:95: // Reports if the stack is empty/full based on explit head/tail values. On 2016/08/16 21:44:31, manzagop wrote: > nit: explit? Done. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:98: return (tail + CAPACITY - 1) % CAPACITY == head; On 2016/08/16 21:44:32, manzagop wrote: > nit: perhaps comment on the modulo, since that varies per language eg: > // Note: adding CAPACITY to ensure result of modulo is positive. Done. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:151: // false failure doesn't occur. On 2016/08/16 22:05:00, manzagop wrote: > I'm wondering if there's a problem. Being able to store a value at head (l.129) > means that we wrote to a free spot, but it doesn't mean we wrote at the real > head_. It's possible that in between loading the head (l.115) and writing to it > (l.129) there were lots of pushes and pops and we're actually writing in the > middle of the free range? So... multiple pushes and pops occur such that tail advances past head and the store succeeds but the head-increment fails. Good catch. I'm going to have to think on this. One solution that comes to mind would be to keep head/tail as raw values and only do the modulus when referencing stack_. Then, if this increment fails then I can see if it failed because the tail has gone by and undo the store of the "invalid" value. 
That would also eliminate the size/capacity (or, "How do I differentiate between empty and full?") problem. I don't *think* a limit of 4 billion push operations (on a 32-bit architecture) would be a real limitation. You?
https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:151: // false failure doesn't occur. On 2016/08/17 19:29:23, bcwhite wrote: > On 2016/08/16 22:05:00, manzagop wrote: > > I'm wondering if there's a problem. Being able to store a value at head > (l.129) > > means that we wrote to a free spot, but it doesn't mean we wrote at the real > > head_. It's possible that in between loading the head (l.115) and writing to > it > > (l.129) there were lots of pushes and pops and we're actually writing in the > > middle of the free range? > > So... multiple pushes and pops occur such that tail advances past head and the > store succeeds but the head-increment fails. > > Good catch. > > I'm going to have to think on this. One solution that comes to mind would be to > keep head/tail as raw values and only do the modulus when referencing stack_. > Then, if this increment fails then I can see if it failed because the tail has > gone by and undo the store of the "invalid" value. That would also eliminate > the size/capacity (or, "How do I differentiate between empty and full?") > problem. > > I don't *think* a limit of 4 billion push operations (on a 32-bit architecture) > would be a real limitation. You? > I agree this fixes the problem cases. However, I'm wary of baking a limit on the number of operations even though 4B does seem high. Is a lock really too expensive in this scenario? We're talking once per thread creation / destruction. https://codereview.chromium.org/2255503002/diff/100001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/100001/base/debug/activity_tr... 
base/debug/activity_tracker.h:141: head_.compare_exchange_strong(head, head + 1, I've just noticed this could do a spurious push, if: - t1 puts a valid value but suspends before incrementing head - t2 fails to do the exchange and enters the if - t1 increments head. - some threads do enough complete pushes to wrap around back to head which now contains an invalid value - t2 resumes and increments head which effectively pushes an invalid value. I think the suggestion to store head "not modulo-ed" addresses it though.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:50: template <typename T, size_t SIZE, T INVALID_VALUE> On 2016/08/16 21:44:31, manzagop wrote: > Would it be better for generated code size to exclude SIZE and INVALID_VALUE > from the template and have them be constructor params? Done. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:53: CAPACITY = SIZE + 1 // +1 because one slot must always be free. On 2016/08/16 21:44:32, manzagop wrote: > nit: isn't "size = capacity + 1" more accurate? Done. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:151: // false failure doesn't occur. On 2016/08/18 14:14:46, manzagop wrote: > On 2016/08/17 19:29:23, bcwhite wrote: > > On 2016/08/16 22:05:00, manzagop wrote: > > > I'm wondering if there's a problem. Being able to store a value at head > > (l.129) > > > means that we wrote to a free spot, but it doesn't mean we wrote at the real > > > head_. It's possible that in between loading the head (l.115) and writing to > > it > > > (l.129) there were lots of pushes and pops and we're actually writing in the > > > middle of the free range? > > > > So... multiple pushes and pops occur such that tail advances past head and > the > > store succeeds but the head-increment fails. > > > > Good catch. > > > > I'm going to have to think on this. One solution that comes to mind would be > to > > keep head/tail as raw values and only do the modulus when referencing stack_. > > Then, if this increment fails then I can see if it failed because the tail has > > gone by and undo the store of the "invalid" value. That would also eliminate > > the size/capacity (or, "How do I differentiate between empty and full?") > > problem. 
> > > > I don't *think* a limit of 4 billion push operations (on a 32-bit > architecture) > > would be a real limitation. You? > > > > I agree this fixes the problem cases. However, I'm wary of baking a limit on the > number of operations even though 4B does seem high. > > Is a lock really too expensive in this scenario? We're talking once per thread > creation / destruction. I've simplified things now that thread-death is no longer a concern. It precludes ever adapting this to memory shared between processes but that's a problem for another day, such as when there is actually a demand for it.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...) ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:123: // Pushing the "invalid" value is not allowed; it would be cause an infinite nit: "it would be cause" https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:150: // If the new head maches the old one then another thread is currently typo: maches https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:164: // for this to fail since only one thread at a time can be between the Ugh... I think we still have the issue where after reading the head, the tail can pass you by, and it's possible to write somewhere other than at head. I wonder if an undo would work. Another option might be if you have another invalid value which could be used to signal head's location?
https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:164: // for this to fail since only one thread at a time can be between the > Ugh... I think we still have the issue where after reading the head, the tail > can pass you by, and it's possible to write somewhere other than at head. I don't think that can happen any more. Only the thread that successfully writes the value can now increment the head so there is no way for the head to be anything but what this thread expects. The tail can't advance past the head because there is an "empty()" check at the top of its loop so pop() will never even try to read the value we've written here until after the increment can occur on line 168. Is there a different case you're thinking of?
https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:164: // for this to fail since only one thread at a time can be between the On 2016/08/22 12:59:41, bcwhite wrote: > > Ugh... I think we still have the issue where after reading the head, the tail > > can pass you by, and it's possible to write somewhere other than at head. > > I don't think that can happen any more. Only the thread that successfully > writes the value can now increment the head so there is no way for the head to > be anything but what this thread expects. > > The tail can't advance past the head because there is an "empty()" check at the > top of its loop so pop() will never even try to read the value we've written > here until after the increment can occur on line 168. > > Is there a different case you're thinking of? Here's what I have in mind: T1 stores a copy of head at l.128, then a T2 does multiple push/pops leading tail_ to pass in front of T1's copy of "head". T1 then attempts to do it's push: it succeeds in swapping it's value with an invalid value, but it's not at the real head. It then keeps yielding because its head doesn't match the real head. Eventually, pushes on other threads bring head back to just before T1's belief of head, but it's no longer possible for anyone to push.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: mac_chromium_compile_dbg_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...)
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Patchset #5 (id:140001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:123: // Pushing the "invalid" value is not allowed; it would be cause an infinite On 2016/08/19 20:44:50, manzagop wrote: > nit: "it would be cause" Done. https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:150: // If the new head maches the old one then another thread is currently On 2016/08/19 20:44:50, manzagop wrote: > typo: maches Acknowledged. https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:164: // for this to fail since only one thread at a time can be between the On 2016/08/22 14:49:29, manzagop wrote: > On 2016/08/22 12:59:41, bcwhite wrote: > > > Ugh... I think we still have the issue where after reading the head, the > tail > > > can pass you by, and it's possible to write somewhere other than at head. > > > > I don't think that can happen any more. Only the thread that successfully > > writes the value can now increment the head so there is no way for the head to > > be anything but what this thread expects. > > > > The tail can't advance past the head because there is an "empty()" check at > the > > top of its loop so pop() will never even try to read the value we've written > > here until after the increment can occur on line 168. > > > > Is there a different case you're thinking of? > > Here's what I have in mind: T1 stores a copy of head at l.128, then a T2 does > multiple push/pops leading tail_ to pass in front of T1's copy of "head". T1 > then attempts to do it's push: it succeeds in swapping it's value with an > invalid value, but it's not at the real head. It then keeps yielding because its > head doesn't match the real head. 
Eventually, pushes on other threads bring head > back to just before T1's belief of head, but it's no longer possible for anyone > to push. Right. I'm going about this all wrong. A slot should be allocated from the head before anything is written. Once a slot belongs to a thread, then it can be updated free of contention.
lgtm! \o/ https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... base/debug/activity_tracker.h:195: PlatformThread::YieldCurrentThread(); Do you need to reload tail in case it was an ongoing pop? https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... File base/debug/activity_tracker_unittest.cc (right): https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... base/debug/activity_tracker_unittest.cc:58: if (i % 1 == 0) { This is always true? https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... base/debug/activity_tracker_unittest.cc:78: while (!queue_->empty()) Is it guaranteed that the popper is still alive? Same question below.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tra... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tra... base/debug/activity_tracker.h:195: PlatformThread::YieldCurrentThread(); On 2016/08/23 16:02:03, manzagop wrote: > Do you need to reload tail in case it was an ongoing pop? Done. https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... File base/debug/activity_tracker_unittest.cc (right): https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... base/debug/activity_tracker_unittest.cc:58: if (i % 1 == 0) { On 2016/08/23 16:02:03, manzagop wrote: > This is always true? Fixed. I had changed it while testing the test and forgot to put it back. https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... base/debug/activity_tracker_unittest.cc:78: while (!queue_->empty()) On 2016/08/23 16:02:03, manzagop wrote: > Is it guaranteed that the popper is still alive? Same question below. There's no protection against thread-death in the Queue (or anywhere else I've seen) so I don't make any protections here, either.
bcwhite@chromium.org changed reviewers: + glider@chromium.org
Alex, would you mind reviewing a lock-free fifo/queue in activity_tracker.h? The reason for not using a lock is that this class tracks blocking operations such as locks, and it seemed better to avoid the recursion.
glider@chromium.org changed reviewers: + dvyukov@chromium.org
Sorry, heading home already, will take a look tomorrow. +dvyukov@, who is in MTV and may have a minute for this (and is far more lock-free-savvy than I am).
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Patchset #7 (id:200001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Patchset #7 (id:220001) has been deleted
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... File base/debug/activity_tracker_unittest.cc (right): https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... base/debug/activity_tracker_unittest.cc:78: while (!queue_->empty()) On 2016/08/23 16:12:41, bcwhite wrote: > On 2016/08/23 16:02:03, manzagop wrote: > > Is it guaranteed that the popper is still alive? Same question below. > > There's no protection against thread-death in the Queue (or anywhere else I've > seen) so I don't make any protections here, either. Ah, what I meant is could it have exited already, but the answer is no, because the number of pushes and pops is equal. https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... base/debug/activity_tracker.h:177: // In short: deallocate the slot at the tail of the queue, read the value I think it's possible that slot wasn't written to yet (a push is in flight). By deallocating, we're making it possible for another push to happen (requires 2 pops), in which case either one of the push values could get written first?
https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... base/debug/activity_tracker.h:177: // In short: deallocate the slot at the tail of the queue, read the value On 2016/08/23 19:46:08, manzagop wrote: > I think it's possible that slot wasn't written to yet (a push is in flight). By > deallocating, we're making it possible for another push to happen (requires 2 > pops), in which case either one of the push values could get written first? Either push could get written first but they'll get written to numbered slots based on the order in which they succeed in the increment-of-head exchange. Waiting for the value to be stored is on line 207. Pop will wait for the lowest-numbered one to be written and return that, though parallel pops could return in any order.
https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... File base/debug/activity_tracker_unittest.cc (right): https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... base/debug/activity_tracker_unittest.cc:78: while (!queue_->empty()) On 2016/08/23 19:46:08, manzagop wrote: > On 2016/08/23 16:12:41, bcwhite wrote: > > On 2016/08/23 16:02:03, manzagop wrote: > > > Is it guaranteed that the popper is still alive? Same question below. > > > > There's no protection against thread-death in the Queue (or anywhere else I've > > seen) so I don't make any protections here, either. > > Ah, what I meant is could it have exited already, but the answer is no, because > the number of pushes and pops is equal. Acknowledged.
https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... base/debug/activity_tracker.h:177: // In short: deallocate the slot at the tail of the queue, read the value On 2016/08/23 20:16:26, bcwhite wrote: > On 2016/08/23 19:46:08, manzagop wrote: > > I think it's possible that slot wasn't written to yet (a push is in flight). > By > > deallocating, we're making it possible for another push to happen (requires 2 > > pops), in which case either one of the push values could get written first? > > Either push could get written first but they'll get written to numbered slots > based on the order in which they succeed in the increment-of-head exchange. > Waiting for the value to be stored is on line 207. > > Pop will wait for the lowest-numbered one to be written and return that, though > parallel pops could return in any order. Sorry, I wasn't super clear. I mean that if there are lots of pushes that complete while that first push is in flight (granted that's unlikely) then you could wrap around and start performing a second push at the same location, while the first is still in flight. Either of those might complete first. Again, it's unlikely but possible?
https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... base/debug/activity_tracker.h:177: // In short: deallocate the slot at the tail of the queue, read the value On 2016/08/23 21:20:28, manzagop wrote: > On 2016/08/23 20:16:26, bcwhite wrote: > > On 2016/08/23 19:46:08, manzagop wrote: > > > I think it's possible that slot wasn't written to yet (a push is in flight). > > By > > > deallocating, we're making it possible for another push to happen (requires > 2 > > > pops), in which case either one of the push values could get written first? > > > > Either push could get written first but they'll get written to numbered slots > > based on the order in which they succeed in the increment-of-head exchange. > > Waiting for the value to be stored is on line 207. > > > > Pop will wait for the lowest-numbered one to be written and return that, > though > > parallel pops could return in any order. > > Sorry, I wasn't super clear. I mean that if there are lots of pushes that > complete while that first push is in flight (granted that's unlikely) then you > could wrap around and start performing a second push at the same location, while > the first is still in flight. Either of those might complete first. Again, it's > unlikely but possible? Let's see... 1) T1 push allocates slot X but doesn't write 2) other threads successfully push N-1 values (more will fail) 3) T2 pop deallocates slot X but can't read so yields 4) other threads successfully pop some values; queue no longer full 5) T3 push allocates contended slot X... Going forward, either T1 or T3 (but not both) can complete the write of slot X and T2 will read it. As soon as it's read, though, the other can then complete its write at any time (we're back at step #1) and be available for the next pop operation. 
Ordering would go to hell and we'd have two threads go into an extended stall (one due to the scheduler and one waiting for it)... but I don't think anything gets lost or otherwise breaks.
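The two-step slot handoff being discussed above can be sketched as follows. This is a hypothetical illustration, not the CL's code: names like `kUnused`, `kSlots`, `complete_push`, and `complete_pop` are invented here. Step one (claiming the slot index) is assumed to have already happened; the sketch shows only the second step, the value transfer against a sentinel, which is the "in flight" window the scenario walks through.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>

constexpr uint32_t kUnused = UINT32_MAX;  // sentinel: slot holds no value
constexpr size_t kSlots = 8;

std::atomic<uint32_t> slots[kSlots];

void init() {
  for (auto& s : slots) s.store(kUnused, std::memory_order_relaxed);
}

// Writer half of a push that has already been assigned |slot|. The store can
// be arbitrarily delayed by the scheduler, which is the stalled-push window
// discussed above.
void complete_push(size_t slot, uint32_t value) {
  uint32_t expected = kUnused;
  // Spins if a previously pushed value has not yet been consumed
  // (the wrap-around case from the scenario).
  while (!slots[slot].compare_exchange_strong(expected, value,
                                              std::memory_order_release,
                                              std::memory_order_relaxed)) {
    expected = kUnused;
    std::this_thread::yield();
  }
}

// Reader half of a pop that has already been assigned |slot|. It must wait
// for the writer, which is why a stalled push stalls this pop.
uint32_t complete_pop(size_t slot) {
  for (;;) {
    uint32_t v = slots[slot].exchange(kUnused, std::memory_order_acquire);
    if (v != kUnused) return v;
    std::this_thread::yield();
  }
}
```

Because the value only leaves the slot via the exchange-with-sentinel in `complete_pop`, a value is consumed at most once even when two pushes contend for the same wrapped-around slot; what is lost is ordering, not data, matching the conclusion above.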
Could you add in the CL description a short blurb about the problems with the code before, and the motivation for the current approach (eg vs using a lock). (To ensure there's a written trace.) https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... base/debug/activity_tracker.h:177: // In short: deallocate the slot at the tail of the queue, read the value On 2016/08/24 11:56:28, bcwhite wrote: > On 2016/08/23 21:20:28, manzagop wrote: > > On 2016/08/23 20:16:26, bcwhite wrote: > > > On 2016/08/23 19:46:08, manzagop wrote: > > > > I think it's possible that slot wasn't written to yet (a push is in > flight). > > > By > > > > deallocating, we're making it possible for another push to happen > (requires > > 2 > > > > pops), in which case either one of the push values could get written > first? > > > > > > Either push could get written first but they'll get written to numbered > slots > > > based on the order in which they succeed in the increment-of-head exchange. > > > Waiting for the value to be stored is on line 207. > > > > > > Pop will wait for the lowest-numbered one to be written and return that, > > though > > > parallel pops could return in any order. > > > > Sorry, I wasn't super clear. I mean that if there are lots of pushes that > > complete while that first push is in flight (granted that's unlikely) then you > > could wrap around and start performing a second push at the same location, > while > > the first is still in flight. Either of those might complete first. Again, > it's > > unlikely but possible? > > Let's see... > > 1) T1 push allocates slot X but doesn't write > 2) other threads successfully push N-1 values (more will fail) > 3) T2 pop deallocates slot X but can't read so yields > 4) other threads successfully pop some values; queue no longer full > 5) T3 push allocates contended slot X... 
> > Going forward, either T1 or T3 (but not both) can complete the write of slot X > and T2 will read it. As soon as it's read, though, the other can then complete > its write at any time (we're back at step #1) and be available for the next pop > operation. > > Ordering would go to hell and we had two threads go into an extended stall (for > due to the scheduler and one waiting for it)... but I don't think anything gets > lost or otherwise breaks. That's my understanding as well. In the current use case it doesn't matter if the order isn't preserved. So we could go with a disclaimer comment.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Added a new parallel pushes & pops test... and found a bug. My understanding of the problem is... - there was no ordering of head/tail increments and their reads for checking empty/full - one thread could read a not-full "tail" value and add to the queue, filling it - another thread could read an earlier not-full "tail" value and add to the queue, over-filling it - queue now reads "empty" because tail==head even though it has values I've fixed this (I think -- it's not an easily reproducible failure) by using "sequentially consistent" operations between the head/tail increments and the head/tail reads for checking empty/full. https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... base/debug/activity_tracker.h:177: // In short: deallocate the slot at the tail of the queue, read the value On 2016/08/24 13:17:52, manzagop wrote: > On 2016/08/24 11:56:28, bcwhite wrote: > > On 2016/08/23 21:20:28, manzagop wrote: > > > On 2016/08/23 20:16:26, bcwhite wrote: > > > > On 2016/08/23 19:46:08, manzagop wrote: > > > > > I think it's possible that slot wasn't written to yet (a push is in > > flight). > > > > By > > > > > deallocating, we're making it possible for another push to happen > > (requires > > > 2 > > > > > pops), in which case either one of the push values could get written > > first? > > > > > > > > Either push could get written first but they'll get written to numbered > > slots > > > > based on the order in which they succeed in the increment-of-head > exchange. > > > > Waiting for the value to be stored is on line 207. > > > > > > > > Pop will wait for the lowest-numbered one to be written and return that, > > > though > > > > parallel pops could return in any order. > > > > > > Sorry, I wasn't super clear. 
I mean that if there are lots of pushes that > > > complete while that first push is in flight (granted that's unlikely) then > you > > > could wrap around and start performing a second push at the same location, > > while > > > the first is still in flight. Either of those might complete first. Again, > > it's > > > unlikely but possible? > > > > Let's see... > > > > 1) T1 push allocates slot X but doesn't write > > 2) other threads successfully push N-1 values (more will fail) > > 3) T2 pop deallocates slot X but can't read so yields > > 4) other threads successfully pop some values; queue no longer full > > 5) T3 push allocates contended slot X... > > > > Going forward, either T1 or T3 (but not both) can complete the write of slot X > > and T2 will read it. As soon as it's read, though, the other can then > complete > > its write at any time (we're back at step #1) and be available for the next > pop > > operation. > > > > Ordering would go to hell and we had two threads go into an extended stall > (for > > due to the scheduler and one waiting for it)... but I don't think anything > gets > > lost or otherwise breaks. > > That's my understanding as well. In the current use case it doesn't matter if > the order isn't preserved. So we could go with a disclaimer comment. I think it goes without saying that parallel operations don't have a guaranteed order in any case.
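The over-fill bug described above comes from computing full/empty from two separate atomic loads. A minimal sketch of the flawed check (hypothetical names, not the CL's code):

```cpp
#include <atomic>
#include <cstddef>

constexpr size_t kCapacity = 4;
std::atomic<size_t> head_{0};  // next slot to push
std::atomic<size_t> tail_{0};  // next slot to pop

bool queue_full() {
  // Two separate loads: even with seq_cst, another thread's increment can
  // land between them, so the (head, tail) pair used for the comparison may
  // never have existed simultaneously. A stale not-full answer lets a push
  // over-fill the queue, after which head == tail reads as "empty".
  size_t head = head_.load(std::memory_order_seq_cst);
  size_t tail = tail_.load(std::memory_order_seq_cst);
  return head - tail >= kCapacity;
}

bool queue_empty() {
  size_t head = head_.load(std::memory_order_seq_cst);
  size_t tail = tail_.load(std::memory_order_seq_cst);
  return head == tail;
}
```

Note that seq_cst orders the individual operations but does not fuse the two loads into one atomic read, which is why the fix attempted here turned out to be insufficient (as discovered later in the review).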
Can you please run the linux_tsan trybot?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromeos_daisy_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_daisy_...)
On 2016/08/24 13:47:01, Alexander Potapenko wrote: > Can you please run the linux_tsan trybot? Right! I forgot that was there now. I managed to reproduce the problem even with seq_cst so reverting first...
Patchset #8 (id:260001) has been deleted
On 2016/08/24 14:12:44, bcwhite wrote: > On 2016/08/24 13:47:01, Alexander Potapenko wrote: > > Can you please run the linux_tsan trybot? > > Right! I forgot that was there now. I managed to reproduce the problem even > with seq_cst so reverting first... linux_chromium_tsan_rel_ng has finished base_unittests. No problems found. I'll run it again; crashes on my own PC don't happen very often.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Patchset #9 (id:300001) has been deleted
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2016/08/24 16:19:10, bcwhite wrote: > On 2016/08/24 14:12:44, bcwhite wrote: > > On 2016/08/24 13:47:01, Alexander Potapenko wrote: > > > Can you please run the linux_tsan trybot? > > > > Right! I forgot that was there now. I managed to reproduce the problem even > > with seq_cst so reverting first... > > linux_chromium_tsan_rel_ng has finished base_unittests. No problems found. > I'll run it again; crashes on my own PC don't happen very often. I figured out the problem. It's necessary for head & tail to both be read together in a single atomic operation; otherwise full/empty checks aren't reliable and can cause overflow/underflow. I could do this by reducing them to 16 bits and merging them into a 32-bit structure which is then accessed in a single read. Or, I can change it from a queue to a stack, since the latter's head & tail are always the same value. This seemed easier, and since my own code doesn't care about ordering, the loss of FIFO ordering is acceptable.
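The "merge head & tail into one word" alternative mentioned above could look roughly like this (an illustrative sketch, not the CL's code; `Ends`, `pack`, and `unpack` are invented names). Both 16-bit indices live in a single 32-bit atomic, so one load yields a consistent pair and a full/empty check can never mix stale values:

```cpp
#include <atomic>
#include <cstdint>

// Head and tail packed into one 32-bit word so they are read atomically
// together.
struct Ends {
  uint16_t head;  // next slot to push
  uint16_t tail;  // next slot to pop
};

std::atomic<uint32_t> ends_{0};

uint32_t pack(Ends e) {
  return (static_cast<uint32_t>(e.head) << 16) | e.tail;
}

Ends unpack(uint32_t v) {
  return {static_cast<uint16_t>(v >> 16), static_cast<uint16_t>(v & 0xFFFF)};
}

bool queue_full(uint16_t capacity) {
  // One atomic load: the pair is guaranteed to have existed at some instant.
  Ends e = unpack(ends_.load(std::memory_order_acquire));
  return static_cast<uint16_t>(e.head - e.tail) >= capacity;
}
```

Updates would then go through a compare-exchange on `ends_` as a whole. The stack approach taken instead avoids all of this because a stack has only one index.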
The CQ bit was unchecked by bcwhite@chromium.org
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by bcwhite@chromium.org
Description was changed from ========== New lock-free FIFO for the activity tracker. BUG=620813 ========== to ========== New lock-free stack for the activity tracker. BUG=620813 ==========
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Patchset #10 (id:340001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by bcwhite@chromium.org
To the best of my understanding of "lock-free", this stack isn't lock-free. You're basically locking every stack slot with a spinlock, so if one of the threads dies after it has updated the head but before it has changed the corresponding slot, all other threads that will be accessing this slot will get stuck. https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:46: //============================================================================= Have you seen LockFreeCircularQueue (https://cs.chromium.org/chromium/src/content/renderer/devtools/lock_free_circ...) ? Doesn't it meet your needs? (Not sure whether it's correct or not, never saw it before) https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:66: CHECK(head_.is_lock_free()); Can these be converted to static_assert() then? https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:107: // Get the head of the stack and acquire its contents. What does the "acquire its contents" part mean? The below load is a relaxed one. https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:128: // The exchange will have loaded the latest "head" into |slot|. That's "head_", correct? https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:139: while (!values_[slot].compare_exchange_strong(expected_value, value, What is the principal difference between this compare_exchange and the previous one? Why is this one strong?
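On the static_assert question raised above: `std::atomic<T>::is_lock_free()` is a runtime member function (the answer can depend on the object's alignment), which is why it cannot feed a `static_assert`. C++17 added the constexpr `is_always_lock_free` member for exactly this purpose. A small sketch (the Chromium code at the time predated C++17, so this is an assumption about what a modern fix would look like):

```cpp
#include <atomic>
#include <cstdint>

// C++17 and later: compile-time guarantee, usable in static_assert.
#if defined(__cpp_lib_atomic_is_always_lock_free)
static_assert(std::atomic<uint32_t>::is_always_lock_free,
              "32-bit atomics must be lock-free on this platform");
#endif

// Pre-C++17: only a runtime check is possible, as in the CL's CHECK().
bool head_is_lock_free() {
  std::atomic<uint32_t> head{0};
  return head.is_lock_free();
}
```

The pre-C++17 `ATOMIC_*_LOCK_FREE` macros are preprocessor constants, but they can evaluate to 1 ("sometimes lock-free"), so they don't give the unconditional guarantee a static_assert wants either.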
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Every lock-free algorithm I've seen (though I'm certainly no expert) involves "spinning" or "retry" on some sort of resource. But whatever you call it, it doesn't involve base::Lock or any other official kind of mutex. There is a problem if a thread dies but looking through Chrome code, I didn't see anything that handles such. If it did, any "AutoLock" would be a danger. This is different than the PersistentMemoryAllocator which had to deal with other processes that may die. https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:46: //============================================================================= On 2016/08/25 13:36:08, Alexander Potapenko wrote: > Have you seen LockFreeCircularQueue > (https://cs.chromium.org/chromium/src/content/renderer/devtools/lock_free_circ...) > ? Doesn't it meet your needs? > > (Not sure whether it's correct or not, never saw it before) I looked at it quickly but (a) it wasn't in src/base and (b) was for large buffer transfers. So didn't really meet my needs. https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:66: CHECK(head_.is_lock_free()); On 2016/08/25 13:36:08, Alexander Potapenko wrote: > Can these be converted to static_assert() then? I tried but it doesn't compile. Either it isn't truly a constant or it's not "constant enough" for static_assert. https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:107: // Get the head of the stack and acquire its contents. On 2016/08/25 13:36:08, Alexander Potapenko wrote: > What does the "acquire its contents" part mean? The below load is a relaxed one. It used to be an "acquire" but I reduced it because the acquire actually occurs during the increment on line 125. 
Comment fixed (here and in pop). https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:128: // The exchange will have loaded the latest "head" into |slot|. On 2016/08/25 13:36:08, Alexander Potapenko wrote: > That's "head_", correct? "head" or |head_|, yes. https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:139: while (!values_[slot].compare_exchange_strong(expected_value, value, On 2016/08/25 13:36:08, Alexander Potapenko wrote: > What is the principal difference between this compare_exchange and the previous > one? Why is this one strong? A weak CAS is less expensive but can fail on some architectures even though the compare would succeed. Above, the cost of retrying is very small (though I really don't know enough to say if it's "small enough" to justify using "weak") but here the cost of retrying is releasing the CPU. That seems certain to be more expensive than "weak" is worth.
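The weak-vs-strong trade-off described above is the standard idiom: `compare_exchange_weak` may fail spuriously, which is harmless inside a tight retry loop, while `compare_exchange_strong` is preferred when a failure triggers something expensive such as yielding the CPU. A standalone illustration (hypothetical counters, not the CL's code):

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

std::atomic<uint32_t> counter{0};

// Tight retry loop: a spurious failure costs one extra cheap iteration, so
// the weak form is appropriate.
void increment_if_even() {
  uint32_t old = counter.load(std::memory_order_relaxed);
  while (old % 2 == 0 &&
         !counter.compare_exchange_weak(old, old + 1,
                                        std::memory_order_acq_rel,
                                        std::memory_order_relaxed)) {
    // |old| was reloaded by the failed CAS; the loop re-checks the predicate.
  }
}

std::atomic<uint32_t> slot{0};  // 0 acts as the "empty" sentinel here

// Failure path yields the CPU, so paying for the strong form is cheaper than
// taking a spurious trip through the scheduler.
void store_value(uint32_t value) {
  uint32_t expected = 0;
  while (!slot.compare_exchange_strong(expected, value,
                                       std::memory_order_release,
                                       std::memory_order_relaxed)) {
    expected = 0;
    std::this_thread::yield();
  }
}
```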
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm https://codereview.chromium.org/2255503002/diff/370001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/370001/base/debug/activity_tr... base/debug/activity_tracker.h:93: std::atomic<size_t> head_; // One past the newest value; where to push. nit: is "index of the first empty slot" clearer? https://codereview.chromium.org/2255503002/diff/370001/base/debug/activity_tr... base/debug/activity_tracker.h:162: // If the stack is full, fail. nit: full -> empty
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2255503002/diff/370001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/370001/base/debug/activity_tr... base/debug/activity_tracker.h:93: std::atomic<size_t> head_; // One past the newest value; where to push. On 2016/08/25 15:26:55, manzagop wrote: > nit: is "index of the first empty slot" clearer? Done. https://codereview.chromium.org/2255503002/diff/370001/base/debug/activity_tr... base/debug/activity_tracker.h:162: // If the stack is full, fail. On 2016/08/25 15:26:55, manzagop wrote: > nit: full -> empty Done.
On 2016/08/25 14:54:38, bcwhite wrote: > Every lock-free algorithm I've seen (though I'm certainly no expert) involves > "spinning" or "retry" on some sort of resource. > > But whatever you call it, it doesn't involve base::Lock or any other official > kind of mutex. > > There is a problem if a thread dies but looking through Chrome code, I didn't > see anything that handles such. If it did, any "AutoLock" would be a danger. Stated the other way around: the two-step push and pop operations in your stack make it possible for one thread to affect the progress of other threads, which is what people usually try to avoid with lock-free algorithms. By the way, what was the prime cause of this CL? Have you tried a simple array guarded with a mutex? With some access patterns it should be at least not worse. > This is different than the PersistentMemoryAllocator which had to deal with > other processes that may die. Ok, thread death aside. A more relevant example is: if one thread starts to starve in the middle of a push(), other threads cannot make a pop() and are doomed to wait for it. > https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... > File base/debug/activity_tracker.h (right): > > https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... 
> base/debug/activity_tracker.h:66: CHECK(head_.is_lock_free()); > On 2016/08/25 13:36:08, Alexander Potapenko wrote: > > Can these be converted to static_assert() then? > > I tried but it doesn't compile. Either it isn't truly a constant or it's not > "constant enough" for static_assert. > > https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... > base/debug/activity_tracker.h:107: // Get the head of the stack and acquire its > contents. > On 2016/08/25 13:36:08, Alexander Potapenko wrote: > > What does the "acquire its contents" part mean? The below load is a relaxed > one. > > It used to be an "acquire" but I reduced it because the acquire actually occurs > during the increment on line 125. Comment fixed (here and in pop). > > https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... > base/debug/activity_tracker.h:128: // The exchange will have loaded the latest > "head" into |slot|. > On 2016/08/25 13:36:08, Alexander Potapenko wrote: > > That's "head_", correct? > > "head" or |head_|, yes. > > https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... > base/debug/activity_tracker.h:139: while > (!values_[slot].compare_exchange_strong(expected_value, value, > On 2016/08/25 13:36:08, Alexander Potapenko wrote: > > What is the principal difference between this compare_exchange and the > previous > > one? Why is this one strong? > > Weak CAS are less expensive but can fail on some architectures even though the > compare would succeed. Above, the cost of retrying is very small (though I > really don't know enough to say if it's "small enough" to justify using "weak") > but here the cost of retrying is releasing the CPU. That seems certain to be > more expensive that "weak" is worth. Sounds reasonable.
> > Every lock-free algorithm I've seen (though I'm certainly no expert) involves > > "spinning" or "retry" on some sort of resource. > > > > But whatever you call it, it doesn't involve base::Lock or any other official > > kind of mutex. > > > > There is a problem if a thread dies but looking through Chrome code, I didn't > > see anything that handles such. If it did, any "AutoLock" would be a danger. > State this the other way around, the two-step push and pop operations in your > stack make it possible for one thread affect the progress of other threads, > which is what people usually try to avoid with lock-free algorithms. Ah. If I understand you correctly... True "lock-free" could never cause one thread to stall because another thread got paused somewhere in the middle of its operation. Suggestions? > By the way, what was the prime cause of this CL? Have you tried a simple array > guarded with a mutex? > With some access patterns it should be at least not worse. The activity tracker tracks operations on mutex so it seemed beneficial to avoid calling such here as it would likely lead to recursive behavior. This recursion can be detected and stopped (it actually already is in another place) but it seemed better to avoid it if reasonably simple to do so. Perhaps a mutex is the better way. Perhaps I'm just a sucker for punishment! :-) (no "perhaps" -- it's nice to work on something truly difficult)
https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... base/debug/activity_tracker.h:126: std::memory_order_acquire, I have a gut feeling that the memory ordering in exchange statements is wrong, but I'm struggling to build an example proving that. Let's say threads T1 and T2 are performing a push() and a pop() on the stack, so the following operations are going to happen: 1. T1 loads H from head_, T1 stores H+1 to head_ 2. T1 loads -1 from values_[H], T1 stores V to values_[H] 3. T2 loads H+1 from head_, T2 stores H to head_ 4. T2 loads V from values_[H], T2 stores -1 to values_[H] Right now the std::memory_order_acquire make sure that 2 cannot be ordered before 1, and 4 cannot go before 3. However there are no guarantees on the mutual order of 2 and 3 or 2 and 4. Therefore it's possible that the accesses happen in the following order: 1-3-4-2, and the value loaded by T2 from values_[H] is stale. This example is not really convincing, but if several threads are involved, it occurs to me that an ABA is possible here.
https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... base/debug/activity_tracker.h:126: std::memory_order_acquire, On 2016/08/25 17:07:42, Alexander Potapenko wrote: > I have a gut feeling that the memory ordering in exchange statements is wrong, > but I'm struggling to build an example proving that. > > Let's say threads T1 and T2 are performing a push() and a pop() on the stack, so > the following operations are going to happen: > 1. T1 loads H from head_, T1 stores H+1 to head_ > 2. T1 loads -1 from values_[H], T1 stores V to values_[H] > 3. T2 loads H+1 from head_, T2 stores H to head_ > 4. T2 loads V from values_[H], T2 stores -1 to values_[H] > > Right now the std::memory_order_acquire make sure that 2 cannot be ordered > before 1, and 4 cannot go before 3. However there are no guarantees on the > mutual order of 2 and 3 or 2 and 4. > Therefore it's possible that the accesses happen in the following order: > 1-3-4-2, and the value loaded by T2 from values_[H] is stale. > > This example is not really convincing, but if several threads are involved, it > occurs to me that an ABA is possible here. In that ordering, #4 will read "invalid", yield the CPU, and retry. ABA is certainly possible but I think its okay because all we care about is if the slot is empty when we're trying to write to it (or not-empty when trying to read). It really doesn't matter how many times it switched in between.
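The "only atomically swapped with invalid" property defended above is what makes the ABA interleavings benign: a value leaves a slot in the same atomic step that marks the slot empty, so it can be observed by at most one popper. A minimal sketch of that single-slot invariant (illustrative names, not the CL's code):

```cpp
#include <atomic>
#include <cstdint>

constexpr uint32_t kInvalid = UINT32_MAX;  // sentinel for "slot is empty"
std::atomic<uint32_t> slot_{kInvalid};

// Store a value only into an empty slot; fails if the slot is occupied.
bool put(uint32_t value) {
  uint32_t expected = kInvalid;
  return slot_.compare_exchange_strong(expected, value,
                                       std::memory_order_release,
                                       std::memory_order_relaxed);
}

// Remove-and-read in one atomic step. Returns kInvalid if the slot was
// empty (caller retries); otherwise the value, which no other popper can
// now observe because the same exchange emptied the slot.
uint32_t take() {
  return slot_.exchange(kInvalid, std::memory_order_acquire);
}
```

However many times the slot flips between values in between, a second `take()` sees the sentinel rather than a stale copy, which matches the claim that ABA cannot duplicate a value here.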
@dvyukov: I've read in a couple of places that it's quite hard to make an array-backed lock-free stack; is that really so?
On 2016/08/25 17:19:06, bcwhite wrote: > https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... > File base/debug/activity_tracker.h (right): > > https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... > base/debug/activity_tracker.h:126: std::memory_order_acquire, > On 2016/08/25 17:07:42, Alexander Potapenko wrote: > > I have a gut feeling that the memory ordering in exchange statements is wrong, > > but I'm struggling to build an example proving that. > > > > Let's say threads T1 and T2 are performing a push() and a pop() on the stack, > so > > the following operations are going to happen: > > 1. T1 loads H from head_, T1 stores H+1 to head_ > > 2. T1 loads -1 from values_[H], T1 stores V to values_[H] > > 3. T2 loads H+1 from head_, T2 stores H to head_ > > 4. T2 loads V from values_[H], T2 stores -1 to values_[H] > > > > Right now the std::memory_order_acquire make sure that 2 cannot be ordered > > before 1, and 4 cannot go before 3. However there are no guarantees on the > > mutual order of 2 and 3 or 2 and 4. > > Therefore it's possible that the accesses happen in the following order: > > 1-3-4-2, and the value loaded by T2 from values_[H] is stale. > > > > This example is not really convincing, but if several threads are involved, it > > occurs to me that an ABA is possible here. > > In that ordering, #4 will read "invalid", yield the CPU, and retry. > > ABA is certainly possible but I think its okay because all we care about is if > the slot is empty when we're trying to write to it (or not-empty when trying to > read). It really doesn't matter how many times it switched in between. My understanding is that here the second thread may see any of the previous slot values written by other push/pop operations before. I'll try to construct a better example.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2016/08/25 17:57:25, Alexander Potapenko wrote: > On 2016/08/25 17:19:06, bcwhite wrote: > > > https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... > > File base/debug/activity_tracker.h (right): > > > > > https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... > > base/debug/activity_tracker.h:126: std::memory_order_acquire, > > On 2016/08/25 17:07:42, Alexander Potapenko wrote: > > > I have a gut feeling that the memory ordering in exchange statements is > wrong, > > > but I'm struggling to build an example proving that. > > > > > > Let's say threads T1 and T2 are performing a push() and a pop() on the > stack, > > so > > > the following operations are going to happen: > > > 1. T1 loads H from head_, T1 stores H+1 to head_ > > > 2. T1 loads -1 from values_[H], T1 stores V to values_[H] > > > 3. T2 loads H+1 from head_, T2 stores H to head_ > > > 4. T2 loads V from values_[H], T2 stores -1 to values_[H] > > > > > > Right now the std::memory_order_acquire make sure that 2 cannot be ordered > > > before 1, and 4 cannot go before 3. However there are no guarantees on the > > > mutual order of 2 and 3 or 2 and 4. > > > Therefore it's possible that the accesses happen in the following order: > > > 1-3-4-2, and the value loaded by T2 from values_[H] is stale. > > > > > > This example is not really convincing, but if several threads are involved, > it > > > occurs to me that an ABA is possible here. > > > > In that ordering, #4 will read "invalid", yield the CPU, and retry. > > > > ABA is certainly possible but I think its okay because all we care about is if > > the slot is empty when we're trying to write to it (or not-empty when trying > to > > read). It really doesn't matter how many times it switched in between. > > My understanding is that here the second thread may see any of the previous slot > values written by other push/pop operations before. > I'll try to construct a better example. 
Since values are never read or written but only atomically swapped with "invalid", I don't think there is any way for a value to be returned multiple times.
I can look at it over next few days if you need more eyeballs.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2016/08/25 18:11:34, Dmitry Vyukov wrote: > I can look at it over next few days if you need more eyeballs. Have not looked at the lock-free code itself yet. It is unclear to me why this component requires custom, complex lock-free algorithms. What is the thread creation ratio? What's the total number of threads? What's the number of processes? What are the chances that we get a memory block from the lock-free cache? If thread creation ratio is not high (which I would expect), then we can just scan the allocator blocks all the time and claim a free block with CAS. Similarly, if there are lots of processes with few threads, there are chances that there is a free block in a cache, but it is in a different process. So in such case we better scan allocator blocks as well. Even if we need a per-process cache, why do we need it to be lock-free? There is no termination-safety concerns here as the cache is per-process, so it seems to me that a mutex-protected cache will perfectly do.
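The "just scan the allocator blocks and claim a free one with CAS" alternative suggested above could be sketched like this (all names here are hypothetical; the real PersistentMemoryAllocator API is not shown). It is O(N) per acquisition but trivially correct, and termination-safe because no thread ever holds a multi-step reservation:

```cpp
#include <atomic>
#include <cstddef>

constexpr size_t kBlockCount = 16;

struct Block {
  std::atomic<bool> in_use{false};
  // ... tracked activity data would live here ...
};

Block g_blocks[kBlockCount];

// Linear scan; the CAS guarantees exactly one thread wins each free block.
Block* ClaimFreeBlock() {
  for (Block& b : g_blocks) {
    bool expected = false;
    if (b.in_use.compare_exchange_strong(expected, true,
                                         std::memory_order_acq_rel))
      return &b;
  }
  return nullptr;  // all blocks currently claimed
}

void ReleaseBlock(Block* b) {
  b->in_use.store(false, std::memory_order_release);
}
```

If a thread dies while owning a block, the block leaks until cleanup but no other thread stalls, which is the progress property being contrasted with the two-step stack.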
On 2016/08/26 09:19:47, Dmitry Vyukov wrote: > On 2016/08/25 18:11:34, Dmitry Vyukov wrote: > > I can look at it over next few days if you need more eyeballs. > > Have not looked at the lock-free code itself yet. > It is unclear to me why this component requires custom, complex lock-free > algorithms. What is the thread creation ratio? What's the total number of > threads? What's the number of processes? What are the chances that we get a > memory block from the lock-free cache? > If thread creation ratio is not high (which I would expect), then we can just > scan the allocator blocks all the time and claim a free block with CAS. > Similarly, if there are lots of processes with few threads, there are chances > that there is a free block in a cache, but it is in a different process. So in > such case we better scan allocator blocks as well. > Even if we need a per-process cache, why do we need it to be lock-free? There is > no termination-safety concerns here as the cache is per-process, so it seems to > me that a mutex-protected cache will perfectly do. For some reason I assumed that this is called once per thread and cached in a TLS variable. Not sure if it's the case...
> Have not looked at the lock-free code itself yet. > It is unclear to me why this component requires custom, complex lock-free > algorithms. What is the thread creation ratio? What's the total number of > threads? What's the number of processes? What are the chances that we get a > memory block from the lock-free cache? > If thread creation ratio is not high (which I would expect), then we can just > scan the allocator blocks all the time and claim a free block with CAS. I have a personal policy to always write reusable code where reasonable to do so. It seemed to me at the time that such a class could be helpful in other situations, hence the "general purpose" nature of this. If that's not practical, then yes, there are other ways. Performance is not the issue in this use-case; avoiding a base::Lock was the original desire but even that is more for simplicity rather than necessity. > Similarly, if there are lots of processes with few threads, there are chances > that there is a free block in a cache, but it is in a different process. So in > such case we better scan allocator blocks as well. > Even if we need a per-process cache, why do we need it to be lock-free? There is > no termination-safety concerns here as the cache is per-process, so it seems to > me that a mutex-protected cache will perfectly do. This "stack" object doesn't operate between processes. All threads accessing a particular instance of such are within the same process. Unless there is a way to make this work that would support thread death, at which point "general purpose" could be extended to inter-process.
On 2016/08/26 14:05:44, bcwhite wrote: > > Have not looked at the lock-free code itself yet. > > It is unclear to me why this component requires custom, complex lock-free > > algorithms. What is the thread creation ratio? What's the total number of > > threads? What's the number of processes? What are the chances that we get a > > memory block from the lock-free cache? > > If thread creation ratio is not high (which I would expect), then we can just > > scan the allocator blocks all the time and claim a free block with CAS. > > I have a personal policy to always write reusable code where reasonable to do > so. It seemed to me at the time that such a class could be helpful in other > situations, hence the "general purpose" nature of this. > > If that's not practical, then yes, there are other ways. Performance is not the > issue in this use-case; avoiding a base::Lock was the original desire but even > that is more for simplicity rather than necessity. It can be a reusable, mutex-based cache of objects. Reusability does not imply lock-free. > > Similarly, if there are lots of processes with few threads, there are chances > > that there is a free block in a cache, but it is in a different process. So in > > such case we better scan allocator blocks as well. > > Even if we need a per-process cache, why do we need it to be lock-free? There > is > > no termination-safety concerns here as the cache is per-process, so it seems > to > > me that a mutex-protected cache will perfectly do. > > This "stack" object doesn't operate between processes. All threads accessing a > particular instance of such are within the same process. > > Unless there is way to make this work that would support thread-death at which > point "general purpose" could be extended to inter-process. There is a way to make it termination-safe. But complexity of it (and of the current code) makes the benefit very questionable. So, is there any reason to not just scan all allocator blocks?
> > If that's not practical, then yes, there are other ways. Performance is not > the > > issue in this use-case; avoiding a base::Lock was the original desire but even > > that is more for simplicity rather than necessity. > > It can be a reusable, mutex-based cache of objects. Reusability does not imply > lock-free. Of course. But if one is to create a lock-free cache, why not make it reusable? The immediate purpose of "lock-free" was because this class using it tracks locks and so using a lock would create a recursive behavior. Yes, it can be dealt with (far easier than writing this class) but creates a conceptual complexity that I had hoped to avoid. > > > Similarly, if there are lots of processes with few threads, there are > chances > > > that there is a free block in a cache, but it is in a different process. So > in > > > such case we better scan allocator blocks as well. > > > Even if we need a per-process cache, why do we need it to be lock-free? > There > > is > > > no termination-safety concerns here as the cache is per-process, so it seems > > to > > > me that a mutex-protected cache will perfectly do. > > > > This "stack" object doesn't operate between processes. All threads accessing > a > > particular instance of such are within the same process. > > > > Unless there is way to make this work that would support thread-death at which > > point "general purpose" could be extended to inter-process. > > There is a way to make it termination-safe. But complexity of it (and of the > current code) makes the benefit very questionable. Understood. My PersistentMemoryAllocator had to be termination-safe because it operates inter-process. > So, is there any reason to not just scan all allocator blocks? An O(N) algorithm isn't generally useful and reusable. Also, there is a planned second use-case within this same file that will deal with individual resources (locks, events, etc) that will happen more frequently than the creation of threads. 
It's not clear that an O(N) algorithm or acquiring a mutex in those cases is practical. The basic problem is that the alloc/free of these memory spaces, coming as they do from the PersistentMemoryAllocator, are expensive. Keeping a cache of free'd spaces saves a lot of cycles and that is important for something measuring frequent low-level activities. I designed the cache as a stack but it doesn't have to be.
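The cache-as-a-stack design being discussed is essentially a Treiber stack: one atomic head pointer updated with compare-and-swap. A minimal sketch with illustrative names (not the code under review); note the comment on the ABA problem, which is part of what makes a production lock-free pop genuinely hard:

```cpp
#include <atomic>

// Intrusive node: each cached entry links to the next free one.
struct Node {
  Node* next = nullptr;
};

class LockFreeStack {
 public:
  void Push(Node* node) {
    Node* old_head = head_.load(std::memory_order_relaxed);
    do {
      node->next = old_head;
      // On failure, compare_exchange_weak reloads |old_head| so the loop
      // retries against the freshly observed head.
    } while (!head_.compare_exchange_weak(old_head, node,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  // Returns the popped node, or nullptr if the stack was empty. NOTE:
  // if a popped node can be freed and reused concurrently, this naive
  // pop suffers the classic ABA problem; real implementations need
  // version counters, hazard pointers, or similar machinery.
  Node* Pop() {
    Node* old_head = head_.load(std::memory_order_acquire);
    while (old_head &&
           !head_.compare_exchange_weak(old_head, old_head->next,
                                        std::memory_order_acquire,
                                        std::memory_order_acquire)) {
    }
    return old_head;
  }

 private:
  std::atomic<Node*> head_{nullptr};
};
```

Even this "simple" version illustrates the review's point: every push and pop still serializes on the cacheline holding |head_|.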
https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... File base/debug/activity_tracker.cc (right): https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... base/debug/activity_tracker.cc:543: // Dobule Failure. This shouldn't happen. But be graceful if it does, s/Dobule/Double.
On 2016/08/26 14:51:46, bcwhite wrote: > > > If that's not practical, then yes, there are other ways. Performance is not > > the > > > issue in this use-case; avoiding a base::Lock was the original desire but > even > > > that is more for simplicity rather than necessity. > > > > It can be a reusable, mutex-based cache of objects. Reusability does not imply > > lock-free. > > Of course. But if one is to create a lock-free cache, why not make it reusable? One reason is that it doesn't guarantee the lock-free properties while claiming to do so. E.g. the users may expect that push() and pop() operations on one thread do not block other threads. > The immediate purpose of "lock-free" was because this class using it tracks > locks and so using a lock would create a recursive behavior. Yes, it can be > dealt with (far easier than writing this class) but creates a conceptual > complexity that I had hoped to avoid. Since this class is already dealing with atomics, it could probably use a custom spinlock implementation to protect the cache. That's a single CAS loop and a lot less code. > > > > Similarly, if there are lots of processes with few threads, there are > > chances > > > > that there is a free block in a cache, but it is in a different process. > So > > in > > > > such case we better scan allocator blocks as well. > > > > Even if we need a per-process cache, why do we need it to be lock-free? > > There > > > is > > > > no termination-safety concerns here as the cache is per-process, so it > seems > > > to > > > > me that a mutex-protected cache will perfectly do. > > > > > > This "stack" object doesn't operate between processes. All threads > accessing > > a > > > particular instance of such are within the same process. > > > > > > Unless there is way to make this work that would support thread-death at > which > > > point "general purpose" could be extended to inter-process. > > > > There is a way to make it termination-safe. 
But complexity of it (and of the > > current code) makes the benefit very questionable. > > Understood. My PersistentMemoryAllocator had to be termination-safe because it > operates inter-process. > > > > So, is there any reason to not just scan all allocator blocks? > > An O(N) algorithm isn't generally useful and reusable. > > Also, there is a planned second use-case within this same file that will deal > with individual resources (locks, events, etc) that will happen more frequently > than the creation of threads. It's not clear that an O(N) algorithm or acquiring > a mutex in those cases is practical. Having every thread CAS a shared memory location is somewhat similar to a mutex. The thread still has to obtain exclusive access to the cacheline where |head_| resides. > > The basic problem is that the alloc/free of these memory spaces, coming as they > do from the PersistentMemoryAllocator, are expensive. Keeping a cache of free'd > spaces saves a lot of cycles and that is important for something measuring > frequent low-level activities. > TCMalloc also has to deal with this problem. It uses per-thread freelists, which are periodically flushed into the global freelist (at a much smaller cost). Have you considered per-thread caches? > I designed the cache as a stack but it doesn't have to be.
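The spinlock suggestion above amounts to a one-flag atomic loop. A generic sketch, assuming plain std::atomic rather than the WTF::SpinLock or tcmalloc versions linked later in this review; `ContendedIncrements` is an invented demonstration helper:

```cpp
#include <atomic>
#include <thread>

class SpinLock {
 public:
  void Acquire() {
    // Test-and-set loop: exchange() returns the previous value, so the
    // loop exits once this thread flips the flag from false to true.
    while (lock_.exchange(true, std::memory_order_acquire)) {
      // A production version would insert a CPU pause here and
      // eventually yield the thread instead of spinning hot.
    }
  }

  void Release() { lock_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> lock_{false};
};

// Demonstration: two threads increment a shared counter under the lock;
// with correct mutual exclusion no increments are lost.
int ContendedIncrements(int per_thread) {
  SpinLock lock;
  int counter = 0;
  auto work = [&] {
    for (int i = 0; i < per_thread; ++i) {
      lock.Acquire();
      ++counter;
      lock.Release();
    }
  };
  std::thread t1(work);
  std::thread t2(work);
  t1.join();
  t2.join();
  return counter;
}
```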
> > Of course. But if one is to create a lock-free cache, why not make it > reusable? > One reason is that it doesn't guarantee the lock-free properties while claiming > to do so. Fair enough, but that was a misunderstanding on my part and the purpose of the extended review. That doesn't mean it's not a worthwhile goal if it is something that would be generally useful. > E.g. the users may expect that push() and pop() operations on one thread do not > block other threads. > > > The immediate purpose of "lock-free" was because this class using it tracks > > locks and so using a lock would create a recursive behavior. Yes, it can be > > dealt with (far easier than writing this class) but creates a conceptual > > complexity that I had hoped to avoid. > Since this class is already dealing with atomics, it could probably use a custom > spinlock implementation to protect the cache. That's a single CAS loop and a lot > less code. Yup. I looked at some implementations of those. Sadly, nobody ever bothered to make one that is reusable (i.e. a class in base). They're not complicated but why re-implement it anew each time? > > > So, is there any reason to not just scan all allocator blocks? > > > > An O(N) algorithm isn't generally useful and reusable. > > > > Also, there is a planned second use-case within this same file that will deal > > with individual resources (locks, events, etc) that will happen more > frequently > > than the creation of threads. It's not clear that an O(N) algorithm or > acquiring > > a mutex in those cases is practical. > Having every thread CAS a shared memory location is somewhat similar to a mutex. > The thread still has to obtain exclusive access to the cacheline where |head_| > resides. At one point in the development I had a queue with separate head/tail to reduce that contention. It had problems, though, when it came to comparisons between them to determine full/empty. 
Perhaps there is a better way that isn't so rigorous: Something that would be tolerant of contention by allowing a slot to be skipped (left "invalid") in certain push conditions knowing that pop would just deal with it. But I fear that I may be reaching the point of wanting to do this just for the sake of conquering a hard problem. > > The basic problem is that the alloc/free of these memory spaces, coming as > they > > do from the PersistentMemoryAllocator, are expensive. Keeping a cache of > free'd > > spaces saves a lot of cycles and that is important for something measuring > > frequent low-level activities. > > > TCMalloc also has to deal with this problem. It uses per-thread freelists, which > are periodically flushed into the global freelist (at a much smaller cost). Have > you considered per-thread caches? No, I didn't. The complexity seems a bit high. It's something to think about, though.
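The TCMalloc-style idea of per-thread freelists that flush into a shared list can be sketched as below. All names are hypothetical and the flush threshold is arbitrary; the point is that the lock is only taken on the infrequent overflow/refill paths:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Two-level freelist: a thread-local list serves most pushes and pops;
// the mutex-protected global list is touched only when the local list
// overflows or runs dry.
class TwoLevelFreeList {
 public:
  void* Pop() {
    if (!local_.empty()) {
      void* p = local_.back();
      local_.pop_back();
      return p;  // Fast path: no lock taken.
    }
    std::lock_guard<std::mutex> hold(global_lock_);
    if (global_.empty())
      return nullptr;
    void* p = global_.back();
    global_.pop_back();
    return p;
  }

  void Push(void* p) {
    local_.push_back(p);
    if (local_.size() > kLocalLimit) {
      // Flush half of the private list into the shared one.
      std::lock_guard<std::mutex> hold(global_lock_);
      while (local_.size() > kLocalLimit / 2) {
        global_.push_back(local_.back());
        local_.pop_back();
      }
    }
  }

 private:
  static constexpr size_t kLocalLimit = 8;
  static thread_local std::vector<void*> local_;
  std::mutex global_lock_;
  std::vector<void*> global_;
};

thread_local std::vector<void*> TwoLevelFreeList::local_;
```

The complexity concern raised above is visible even in the sketch: lifetime of the thread-local list on thread exit, and migration of entries between threads, both need care.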
On 2016/08/26 17:35:48, bcwhite wrote: > > > Of course. But if one is to create a lock-free cache, why not make it > > reusable? > > One reason is that it doesn't guarantee the lock-free properties while > claiming > > to do so. > > Fair enough but that was a misunderstanding on my part and the purpose of the > extended review. That doesn't mean it's not a worthwhile goal if it is > something that would be generally useful. > > > > E.g. the users may expect that push() and pop() operations on one thread do > not > > block other threads. > > > > > The immediate purpose of "lock-free" was because this class using it tracks > > > locks and so using a lock would create a recursive behavior. Yes, it can be > > > dealt with (far easier than writing this class) but creates a conceptual > > > complexity that I had hoped to avoid. > > Since this class is already dealing with atomics, it could probably use a > custom > > spinlock implementation to protect the cache. That's a single CAS loop and a > lot > > less code. > > Yup. I looked at some implementation of those. Sadly, nobody ever bother to > make one that is reusable (i.e. a class in base). They're not complicated but > why re-implement it anew each time? Maybe there just was no need for them till now. There're https://cs.chromium.org/chromium/src/third_party/WebKit/Source/wtf/SpinLock.h... and https://cs.chromium.org/chromium/src/third_party/tcmalloc/vendor/src/base/spi... in different subprojects though. > > > > > So, is there any reason to not just scan all allocator blocks? > > > > > > An O(N) algorithm isn't generally useful and reusable. > > > > > > Also, there is a planned second use-case within this same file that will > deal > > > with individual resources (locks, events, etc) that will happen more > > frequently > > > than the creation of threads. It's not clear that an O(N) algorithm or > > acquiring > > > a mutex in those cases is practical. 
> > Having every thread CAS a shared memory location is somewhat similar to a > mutex. > > The thread still has to obtain exclusive access to the cacheline where |head_| > > resides. > > At one point in the development I had a queue with separate head/tail to reduce > that contention. It had problems, though, when it came to comparisons between > them to determine full/empty. > > Perhaps there is a better waythat isn't so rigorous: Something that would be > tolerant of contention by allowing a slot to be skipped (left "invalid") in > certain push conditions knowing that pop would just deal with it. I wonder how big is the actual contention. Is it worth the effort? > But I fear that I may be reaching the point of wanting to do this just for the > sake of conquering a hard problem. Beware, any other person who'll be maintaining that code may have a hard time :) > > > > The basic problem is that the alloc/free of these memory spaces, coming as > > they > > > do from the PersistentMemoryAllocator, are expensive. Keeping a cache of > > free'd > > > spaces saves a lot of cycles and that is important for something measuring > > > frequent low-level activities. > > > > > TCMalloc also has to deal with this problem. It uses per-thread freelists, > which > > are periodically flushed into the global freelist (at a much smaller cost). > Have > > you considered per-thread caches? > > No, I didn't. The complexity seems a bit high. It's something to think about, > though.
> > Yup. I looked at some implementation of those. Sadly, nobody ever bother to > > make one that is reusable (i.e. a class in base). They're not complicated but > > why re-implement it anew each time? > Maybe there just was no need for them till now. Which of course assumes we do need them now. Always a valid question. I think I'll take the weekend to rethink this. > > But I fear that I may be reaching the point of wanting to do this just for the > > sake of conquering a hard problem. > > Beware, any other person who'll be maintaining that code may have a hard time :) Yeah... The code is always so simple once it's done that you'd expect it wouldn't need any maintenance. But the number of subtle interactions grows so quickly with every operation that it's difficult (if not impossible) to prove it correct.
On 2016/08/26 14:51:46, bcwhite wrote: > > > If that's not practical, then yes, there are other ways. Performance is not > > the > > > issue in this use-case; avoiding a base::Lock was the original desire but > even > > > that is more for simplicity rather than necessity. > > > > It can be a reusable, mutex-based cache of objects. Reusability does not imply > > lock-free. > > Of course. But if one is to create a lock-free cache, why not make it reusable? > > The immediate purpose of "lock-free" was because this class using it tracks > locks and so using a lock would create a recursive behavior. Yes, it can be > dealt with (far easier than writing this class) but creates a conceptual > complexity that I had hoped to avoid. > > > > > > Similarly, if there are lots of processes with few threads, there are > > chances > > > > that there is a free block in a cache, but it is in a different process. > So > > in > > > > such case we better scan allocator blocks as well. > > > > Even if we need a per-process cache, why do we need it to be lock-free? > > There > > > is > > > > no termination-safety concerns here as the cache is per-process, so it > seems > > > to > > > > me that a mutex-protected cache will perfectly do. > > > > > > This "stack" object doesn't operate between processes. All threads > accessing > > a > > > particular instance of such are within the same process. > > > > > > Unless there is way to make this work that would support thread-death at > which > > > point "general purpose" could be extended to inter-process. > > > > There is a way to make it termination-safe. But complexity of it (and of the > > current code) makes the benefit very questionable. > > Understood. My PersistentMemoryAllocator had to be termination-safe because it > operates inter-process. > > > > So, is there any reason to not just scan all allocator blocks? > > An O(N) algorithm isn't generally useful and reusable. 
> > Also, there is a planned second use-case within this same file that will deal > with individual resources (locks, events, etc) that will happen more frequently > than the creation of threads. It's not clear that an O(N) algorithm or acquiring > a mutex in those cases is practical. > > > The basic problem is that the alloc/free of these memory spaces, coming as they > do from the PersistentMemoryAllocator, are expensive. Keeping a cache of free'd > spaces saves a lot of cycles and that is important for something measuring > frequent low-level activities. > > I designed the cache as a stack but it doesn't have to be. If it's going to be used inside of mutexes, then this implementation is too slow. You are adding 4 atomic RMW per mutex lock/unlock, which makes it at least 5 times slower. Worse in reality, because user mutex can be local, but this thing is global and produces lots of contention. For frequent operations you need a per-thread cache.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
Patchset #13 (id:410001) has been deleted
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Patchset #13 (id:430001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Okay, giving up on the lock-free stack/queue concept. I'll just use a regular lock for managing the cache since it's only used on thread-create and thread-exit. I'll investigate thread-local storage for caching resource trackers within a thread.
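The direction settled on here, a plain stack guarded by an ordinary lock, looks roughly like the sketch below. The landed class uses base::Lock; std::mutex stands in for it, and the names are illustrative:

```cpp
#include <mutex>
#include <vector>

// Thread-safe stack: every operation takes the lock. Acceptable here
// because pushes and pops happen only on thread creation and exit, so
// contention is negligible.
template <typename T>
class ThreadSafeStack {
 public:
  void push(T value) {
    std::lock_guard<std::mutex> hold(lock_);
    values_.push_back(value);
  }

  // Pops into |*out| and returns true, or returns false if empty.
  bool pop(T* out) {
    std::lock_guard<std::mutex> hold(lock_);
    if (values_.empty())
      return false;
    *out = values_.back();
    values_.pop_back();
    return true;
  }

 private:
  std::mutex lock_;
  std::vector<T> values_;
};
```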
lgtm Could you add the motivation to the CL description? https://codereview.chromium.org/2255503002/diff/450001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/450001/base/debug/activity_tr... base/debug/activity_tracker.h:550: // These have to be lock-free because lock activity is tracked and causes Update comment?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== New lock-free stack for the activity tracker. BUG=620813 ========== to ========== New internal cache (stack) class for recording available memories. This is extracted to its own class for simplicity and ease-of-reuse. It was also originally intended to be a lock-free object but that ended up being overly complicated so has been put on indefinite hold. BUG=620813 ==========
The CQ bit was unchecked by bcwhite@chromium.org
https://codereview.chromium.org/2255503002/diff/450001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/450001/base/debug/activity_tr... base/debug/activity_tracker.h:550: // These have to be lock-free because lock activity is tracked and causes On 2016/09/01 15:46:48, manzagop wrote: > Update comment? Done.
The CQ bit was checked by bcwhite@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from manzagop@chromium.org Link to the patchset: https://codereview.chromium.org/2255503002/#ps470001 (title: "fixed comment")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== New internal cache (stack) class for recording available memories. This is extracted to its own class for simplicity and ease-of-reuse. It was also originally intended to be a lock-free object but that ended up being overly complicated so has been put on indefinite hold. BUG=620813 ========== to ========== New internal cache (stack) class for recording available memories. This is extracted to its own class for simplicity and ease-of-reuse. It was also originally intended to be a lock-free object but that ended up being overly complicated so has been put on indefinite hold. BUG=620813 ==========
Message was sent while issue was closed.
Committed patchset #14 (id:470001)
Message was sent while issue was closed.
Description was changed from ========== New internal cache (stack) class for recording available memories. This is extracted to its own class for simplicity and ease-of-reuse. It was also originally intended to be a lock-free object but that ended up being overly complicated so has been put on indefinite hold. BUG=620813 ========== to ========== New internal cache (stack) class for recording available memories. This is extracted to its own class for simplicity and ease-of-reuse. It was also originally intended to be a lock-free object but that ended up being overly complicated so has been put on indefinite hold. BUG=620813 Committed: https://crrev.com/4c361942b436b45ddcaab02d6829921b8f3c9e81 Cr-Commit-Position: refs/heads/master@{#416282} ==========
Message was sent while issue was closed.
Patchset 14 (id:??) landed as https://crrev.com/4c361942b436b45ddcaab02d6829921b8f3c9e81 Cr-Commit-Position: refs/heads/master@{#416282} |