Description
New internal cache (stack) class for recording available memories.
This is extracted to its own class for simplicity and ease of reuse. It was originally intended to be a lock-free object, but that ended up being overly complicated, so it has been put on indefinite hold.
BUG=620813
Committed: https://crrev.com/4c361942b436b45ddcaab02d6829921b8f3c9e81
Cr-Commit-Position: refs/heads/master@{#416282}
Patch Set 1 #
Patch Set 2 : some clean up #
Total comments: 23
Patch Set 3 : addressed some review comments by manzagop #
Total comments: 1
Patch Set 4 : simplify fifo design #
Total comments: 8
Patch Set 5 : refactor fifo (now queue) to allocate slot before updating it #
Total comments: 8
Patch Set 6 : addressed review comments by manzagop #
Patch Set 7 : fix broken pop operation #
Total comments: 6
Patch Set 8 : new test #
Patch Set 9 : switch from queue to stack (only one atomic pointer) #
Patch Set 10 : fixed some compile problems #
Total comments: 10
Patch Set 11 : comment improvements #
Total comments: 4
Patch Set 12 : more comment improvements #
Total comments: 3
Patch Set 13 : forgo lock-free; use regular lock to create thread-safe stack #
Total comments: 2
Patch Set 14 : fixed comment #
Messages
Total messages: 147 (92 generated)
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...)
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Patchset #2 (id:20001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...)
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Patchset #2 (id:40001) has been deleted
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...)
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Patchset #2 (id:60001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
bcwhite@chromium.org changed reviewers: + manzagop@chromium.org
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Nothing of substance but I still need to go over push/pop in detail. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... File base/debug/activity_tracker.cc (right): https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:506: // Get the first available memory from the top of the stack. nit: top of the stack -> fifo. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:512: // fail if something else has already taken the block and changed its type. Who could have taken the block, and how? Ah, by iterating directly over the allocator's blocks below? https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:529: // of available memories and try to claim it. Maybe clarify the comment to say the claimed block might be in the cache, eg: Expand the search for free blocks by iterating over the allocator's free blocks. This is because the cache's capacity may be inferior to the number of allocated blocks. Note that any block found may actually be in the cache (if it was inserted after the last check). https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:627: // Push this on the internal list of available memory blocks so it can nit: list -> cache https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:629: // size of the stack, it will fail but a fallback check in CreateTracker stack -> fifo / free block cache https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... 
base/debug/activity_tracker.h:50: template <typename T, size_t SIZE, T INVALID_VALUE> Would it be better for generated code size to exclude SIZE and INVALID_VALUE from the template and have them be constructor params? https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:53: CAPACITY = SIZE + 1 // +1 because one slot must always be free. nit: isn't "size = capacity + 1" more accurate? https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:95: // Reports if the stack is empty/full based on explit head/tail values. nit: explit? https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:98: return (tail + CAPACITY - 1) % CAPACITY == head; nit: perhaps comment on the modulo, since that varies per language eg: // Note: adding CAPACITY to ensure result of modulo is positive.
https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:151: // false failure doesn't occur. I'm wondering if there's a problem. Being able to store a value at head (l.129) means that we wrote to a free spot, but it doesn't mean we wrote at the real head_. It's possible that in between loading the head (l.115) and writing to it (l.129) there were lots of pushes and pops and we're actually writing in the middle of the free range?
https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... File base/debug/activity_tracker.cc (right): https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:506: // Get the first available memory from the top of the stack. On 2016/08/16 21:44:31, manzagop wrote: > nit: top of the stack -> fifo. Done. Though a FIFO and LIFO are both types of stack. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:512: // fail if something else has already taken the block and changed its type. On 2016/08/16 21:44:31, manzagop wrote: > Who could have taken the block, and how? Ah, by iterating directly over the > allocator's blocks below? Correct. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:529: // of available memories and try to claim it. On 2016/08/16 21:44:31, manzagop wrote: > Maybe clarify the comment to say the claimed block might be in the cache, eg: > > Expand the search for free blocks by iterating over the allocator's free blocks. > This is because the cache's capacity may be inferior to the number of allocated > blocks. Note that any block found may actually be in the cache (if it was > inserted after the last check). Done. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:627: // Push this on the internal list of available memory blocks so it can On 2016/08/16 21:44:31, manzagop wrote: > nit: list -> cache Done. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.cc:629: // size of the stack, it will fail but a fallback check in CreateTracker On 2016/08/16 21:44:31, manzagop wrote: > stack -> fifo / free block cache Done. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... 
File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:50: template <typename T, size_t SIZE, T INVALID_VALUE> On 2016/08/16 21:44:31, manzagop wrote: > Would it be better for generated code size to exclude SIZE and INVALID_VALUE > from the template and have them be constructor params? Seems reasonable. I'll look into it. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:95: // Reports if the stack is empty/full based on explit head/tail values. On 2016/08/16 21:44:31, manzagop wrote: > nit: explit? Done. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:98: return (tail + CAPACITY - 1) % CAPACITY == head; On 2016/08/16 21:44:32, manzagop wrote: > nit: perhaps comment on the modulo, since that varies per language eg: > // Note: adding CAPACITY to ensure result of modulo is positive. Done. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:151: // false failure doesn't occur. On 2016/08/16 22:05:00, manzagop wrote: > I'm wondering if there's a problem. Being able to store a value at head (l.129) > means that we wrote to a free spot, but it doesn't mean we wrote at the real > head_. It's possible that in between loading the head (l.115) and writing to it > (l.129) there were lots of pushes and pops and we're actually writing in the > middle of the free range? So... multiple pushes and pops occur such that tail advances past head and the store succeeds but the head-increment fails. Good catch. I'm going to have to think on this. One solution that comes to mind would be to keep head/tail as raw values and only do the modulus when referencing stack_. Then, if this increment fails then I can see if it failed because the tail has gone by and undo the store of the "invalid" value. 
That would also eliminate the size/capacity (or, "How do I differentiate between empty and full?") problem. I don't *think* a limit of 4 billion push operations (on a 32-bit architecture) would be a real limitation. You?
https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:151: // false failure doesn't occur. On 2016/08/17 19:29:23, bcwhite wrote: > On 2016/08/16 22:05:00, manzagop wrote: > > I'm wondering if there's a problem. Being able to store a value at head > (l.129) > > means that we wrote to a free spot, but it doesn't mean we wrote at the real > > head_. It's possible that in between loading the head (l.115) and writing to > it > > (l.129) there were lots of pushes and pops and we're actually writing in the > > middle of the free range? > > So... multiple pushes and pops occur such that tail advances past head and the > store succeeds but the head-increment fails. > > Good catch. > > I'm going to have to think on this. One solution that comes to mind would be to > keep head/tail as raw values and only do the modulus when referencing stack_. > Then, if this increment fails then I can see if it failed because the tail has > gone by and undo the store of the "invalid" value. That would also eliminate > the size/capacity (or, "How do I differentiate between empty and full?") > problem. > > I don't *think* a limit of 4 billion push operations (on a 32-bit architecture) > would be a real limitation. You? > I agree this fixes the problem cases. However, I'm wary of baking a limit on the number of operations even though 4B does seem high. Is a lock really too expensive in this scenario? We're talking once per thread creation / destruction. https://codereview.chromium.org/2255503002/diff/100001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/100001/base/debug/activity_tr... 
base/debug/activity_tracker.h:141: head_.compare_exchange_strong(head, head + 1, I've just noticed this could do a spurious push, if: - t1 puts a valid value but suspends before incrementing head - t2 fails to do the exchange and enters the if - t1 increments head. - some threads do enough complete pushes to wrap around back to head which now contains an invalid value - t2 resumes and increments head which effectively pushes an invalid value. I think the suggestion to store head "not modulo-ed" addresses it though.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:50: template <typename T, size_t SIZE, T INVALID_VALUE> On 2016/08/16 21:44:31, manzagop wrote: > Would it be better for generated code size to exclude SIZE and INVALID_VALUE > from the template and have them be constructor params? Done. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:53: CAPACITY = SIZE + 1 // +1 because one slot must always be free. On 2016/08/16 21:44:32, manzagop wrote: > nit: isn't "size = capacity + 1" more accurate? Done. https://codereview.chromium.org/2255503002/diff/80001/base/debug/activity_tra... base/debug/activity_tracker.h:151: // false failure doesn't occur. On 2016/08/18 14:14:46, manzagop wrote: > On 2016/08/17 19:29:23, bcwhite wrote: > > On 2016/08/16 22:05:00, manzagop wrote: > > > I'm wondering if there's a problem. Being able to store a value at head > > (l.129) > > > means that we wrote to a free spot, but it doesn't mean we wrote at the real > > > head_. It's possible that in between loading the head (l.115) and writing to > > it > > > (l.129) there were lots of pushes and pops and we're actually writing in the > > > middle of the free range? > > > > So... multiple pushes and pops occur such that tail advances past head and > the > > store succeeds but the head-increment fails. > > > > Good catch. > > > > I'm going to have to think on this. One solution that comes to mind would be > to > > keep head/tail as raw values and only do the modulus when referencing stack_. > > Then, if this increment fails then I can see if it failed because the tail has > > gone by and undo the store of the "invalid" value. That would also eliminate > > the size/capacity (or, "How do I differentiate between empty and full?") > > problem. 
> > > > I don't *think* a limit of 4 billion push operations (on a 32-bit > architecture) > > would be a real limitation. You? > > > > I agree this fixes the problem cases. However, I'm wary of baking a limit on the > number of operations even though 4B does seem high. > > Is a lock really too expensive in this scenario? We're talking once per thread > creation / destruction. I've simplified things now that thread-death is no longer a concern. It precludes ever adapting this to memory shared between processes but that's a problem for another day, such as when there is actually a demand for it.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...) ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:123: // Pushing the "invalid" value is not allowed; it would be cause an infinite nit: "it would be cause" https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:150: // If the new head maches the old one then another thread is currently typo: maches https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:164: // for this to fail since only one thread at a time can be between the Ugh... I think we still have the issue where after reading the head, the tail can pass you by, and it's possible to write somewhere other than at head. I wonder if an undo would work. Another option might be if you have another invalid value which could be used to signal head's location?
https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:164: // for this to fail since only one thread at a time can be between the > Ugh... I think we still have the issue where after reading the head, the tail > can pass you by, and it's possible to write somewhere other than at head. I don't think that can happen any more. Only the thread that successfully writes the value can now increment the head so there is no way for the head to be anything but what this thread expects. The tail can't advance past the head because there is an "empty()" check at the top of its loop so pop() will never even try to read the value we've written here until after the increment can occur on line 168. Is there a different case you're thinking of?
https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:164: // for this to fail since only one thread at a time can be between the On 2016/08/22 12:59:41, bcwhite wrote: > > Ugh... I think we still have the issue where after reading the head, the tail > > can pass you by, and it's possible to write somewhere other than at head. > > I don't think that can happen any more. Only the thread that successfully > writes the value can now increment the head so there is no way for the head to > be anything but what this thread expects. > > The tail can't advance past the head because there is an "empty()" check at the > top of its loop so pop() will never even try to read the value we've written > here until after the increment can occur on line 168. > > Is there a different case you're thinking of? Here's what I have in mind: T1 stores a copy of head at l.128, then a T2 does multiple push/pops leading tail_ to pass in front of T1's copy of "head". T1 then attempts to do it's push: it succeeds in swapping it's value with an invalid value, but it's not at the real head. It then keeps yielding because its head doesn't match the real head. Eventually, pushes on other threads bring head back to just before T1's belief of head, but it's no longer possible for anyone to push.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: mac_chromium_compile_dbg_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...)
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Patchset #5 (id:140001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:123: // Pushing the "invalid" value is not allowed; it would be cause an infinite On 2016/08/19 20:44:50, manzagop wrote: > nit: "it would be cause" Done. https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:150: // If the new head maches the old one then another thread is currently On 2016/08/19 20:44:50, manzagop wrote: > typo: maches Acknowledged. https://codereview.chromium.org/2255503002/diff/120001/base/debug/activity_tr... base/debug/activity_tracker.h:164: // for this to fail since only one thread at a time can be between the On 2016/08/22 14:49:29, manzagop wrote: > On 2016/08/22 12:59:41, bcwhite wrote: > > > Ugh... I think we still have the issue where after reading the head, the > tail > > > can pass you by, and it's possible to write somewhere other than at head. > > > > I don't think that can happen any more. Only the thread that successfully > > writes the value can now increment the head so there is no way for the head to > > be anything but what this thread expects. > > > > The tail can't advance past the head because there is an "empty()" check at > the > > top of its loop so pop() will never even try to read the value we've written > > here until after the increment can occur on line 168. > > > > Is there a different case you're thinking of? > > Here's what I have in mind: T1 stores a copy of head at l.128, then a T2 does > multiple push/pops leading tail_ to pass in front of T1's copy of "head". T1 > then attempts to do it's push: it succeeds in swapping it's value with an > invalid value, but it's not at the real head. It then keeps yielding because its > head doesn't match the real head. 
Eventually, pushes on other threads bring head > back to just before T1's belief of head, but it's no longer possible for anyone > to push. Right. I'm going about this all wrong. A slot should be allocated from the head before anything is written. Once a slot belongs to a thread, then it can be updated free of contention.
lgtm! \o/ https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... base/debug/activity_tracker.h:195: PlatformThread::YieldCurrentThread(); Do you need to reload tail in case it was an ongoing pop? https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... File base/debug/activity_tracker_unittest.cc (right): https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... base/debug/activity_tracker_unittest.cc:58: if (i % 1 == 0) { This is always true? https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... base/debug/activity_tracker_unittest.cc:78: while (!queue_->empty()) Is it guaranteed that the popper is still alive? Same question below.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tra... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tra... base/debug/activity_tracker.h:195: PlatformThread::YieldCurrentThread(); On 2016/08/23 16:02:03, manzagop wrote: > Do you need to reload tail in case it was an ongoing pop? Done. https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... File base/debug/activity_tracker_unittest.cc (right): https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... base/debug/activity_tracker_unittest.cc:58: if (i % 1 == 0) { On 2016/08/23 16:02:03, manzagop wrote: > This is always true? Fixed. I had changed it while testing the test and forgot to put it back. https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... base/debug/activity_tracker_unittest.cc:78: while (!queue_->empty()) On 2016/08/23 16:02:03, manzagop wrote: > Is it guaranteed that the popper is still alive? Same question below. There's no protection against thread-death in the Queue (or anywhere else I've seen) so I don't make any protections here, either.
bcwhite@chromium.org changed reviewers: + glider@chromium.org
Alex, would you mind reviewing a lock-free fifo/queue in activity_tracker.h? The reason for not using a lock is that this class tracks blocking operations such as locks, and it seemed better to avoid the recursion.
glider@chromium.org changed reviewers: + dvyukov@chromium.org
Sorry, heading home already, will take a look tomorrow. +dvyukov@, who is in MTV and may have a minute for this (and is far more lock-free-savvy than I am).
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Patchset #7 (id:200001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Patchset #7 (id:220001) has been deleted
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... File base/debug/activity_tracker_unittest.cc (right): https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... base/debug/activity_tracker_unittest.cc:78: while (!queue_->empty()) On 2016/08/23 16:12:41, bcwhite wrote: > On 2016/08/23 16:02:03, manzagop wrote: > > Is it guaranteed that the popper is still alive? Same question below. > > There's no protection against thread-death in the Queue (or anywhere else I've > seen) so I don't make any protections here, either. Ah, what I meant is could it have exited already, but the answer is no, because the number of pushes and pops is equal. https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... base/debug/activity_tracker.h:177: // In short: deallocate the slot at the tail of the queue, read the value I think it's possible that slot wasn't written to yet (a push is in flight). By deallocating, we're making it possible for another push to happen (requires 2 pops), in which case either one of the push values could get written first?
https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... base/debug/activity_tracker.h:177: // In short: deallocate the slot at the tail of the queue, read the value On 2016/08/23 19:46:08, manzagop wrote: > I think it's possible that slot wasn't written to yet (a push is in flight). By > deallocating, we're making it possible for another push to happen (requires 2 > pops), in which case either one of the push values could get written first? Either push could get written first but they'll get written to numbered slots based on the order in which they succeed in the increment-of-head exchange. Waiting for the value to be stored is on line 207. Pop will wait for the lowest-numbered one to be written and return that, though parallel pops could return in any order.
https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... File base/debug/activity_tracker_unittest.cc (right): https://codereview.chromium.org/2255503002/diff/160001/base/debug/activity_tr... base/debug/activity_tracker_unittest.cc:78: while (!queue_->empty()) On 2016/08/23 19:46:08, manzagop wrote: > On 2016/08/23 16:12:41, bcwhite wrote: > > On 2016/08/23 16:02:03, manzagop wrote: > > > Is it guaranteed that the popper is still alive? Same question below. > > > > There's no protection against thread-death in the Queue (or anywhere else I've > > seen) so I don't make any protections here, either. > > Ah, what I meant is could it have exited already, but the answer is no, because > the number of pushes and pops is equal. Acknowledged.
https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... base/debug/activity_tracker.h:177: // In short: deallocate the slot at the tail of the queue, read the value On 2016/08/23 20:16:26, bcwhite wrote: > On 2016/08/23 19:46:08, manzagop wrote: > > I think it's possible that slot wasn't written to yet (a push is in flight). > By > > deallocating, we're making it possible for another push to happen (requires 2 > > pops), in which case either one of the push values could get written first? > > Either push could get written first but they'll get written to numbered slots > based on the order in which they succeed in the increment-of-head exchange. > Waiting for the value to be stored is on line 207. > > Pop will wait for the lowest-numbered one to be written and return that, though > parallel pops could return in any order. Sorry, I wasn't super clear. I mean that if there are lots of pushes that complete while that first push is in flight (granted that's unlikely) then you could wrap around and start performing a second push at the same location, while the first is still in flight. Either of those might complete first. Again, it's unlikely but possible?
https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... base/debug/activity_tracker.h:177: // In short: deallocate the slot at the tail of the queue, read the value On 2016/08/23 21:20:28, manzagop wrote: > On 2016/08/23 20:16:26, bcwhite wrote: > > On 2016/08/23 19:46:08, manzagop wrote: > > > I think it's possible that slot wasn't written to yet (a push is in flight). > > By > > > deallocating, we're making it possible for another push to happen (requires > 2 > > > pops), in which case either one of the push values could get written first? > > > > Either push could get written first but they'll get written to numbered slots > > based on the order in which they succeed in the increment-of-head exchange. > > Waiting for the value to be stored is on line 207. > > > > Pop will wait for the lowest-numbered one to be written and return that, > though > > parallel pops could return in any order. > > Sorry, I wasn't super clear. I mean that if there are lots of pushes that > complete while that first push is in flight (granted that's unlikely) then you > could wrap around and start performing a second push at the same location, while > the first is still in flight. Either of those might complete first. Again, it's > unlikely but possible? Let's see... 1) T1 push allocates slot X but doesn't write 2) other threads successfully push N-1 values (more will fail) 3) T2 pop deallocates slot X but can't read so yields 4) other threads successfully pop some values; queue no longer full 5) T3 push allocates contended slot X... Going forward, either T1 or T3 (but not both) can complete the write of slot X and T2 will read it. As soon as it's read, though, the other can then complete its write at any time (we're back at step #1) and be available for the next pop operation. 
Ordering would go to hell and we'd have two threads go into an extended stall (one due to the scheduler and one waiting for it)... but I don't think anything gets lost or otherwise breaks.
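The two-step slot handoff being discussed above can be sketched as follows. This is a hypothetical illustration, not the CL's code: names like `kUnused`, `kSlots`, `complete_push`, and `complete_pop` are invented here. Step one (claiming the slot index) is assumed to have already happened; the sketch shows only the second step, the value transfer against a sentinel, which is the "in flight" window the scenario walks through.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>

constexpr uint32_t kUnused = UINT32_MAX;  // sentinel: slot holds no value
constexpr size_t kSlots = 8;

std::atomic<uint32_t> slots[kSlots];

void init() {
  for (auto& s : slots) s.store(kUnused, std::memory_order_relaxed);
}

// Writer half of a push that has already been assigned |slot|. The store can
// be arbitrarily delayed by the scheduler, which is the stalled-push window
// discussed above.
void complete_push(size_t slot, uint32_t value) {
  uint32_t expected = kUnused;
  // Spins if a previously pushed value has not yet been consumed
  // (the wrap-around case from the scenario).
  while (!slots[slot].compare_exchange_strong(expected, value,
                                              std::memory_order_release,
                                              std::memory_order_relaxed)) {
    expected = kUnused;
    std::this_thread::yield();
  }
}

// Reader half of a pop that has already been assigned |slot|. It must wait
// for the writer, which is why a stalled push stalls this pop.
uint32_t complete_pop(size_t slot) {
  for (;;) {
    uint32_t v = slots[slot].exchange(kUnused, std::memory_order_acquire);
    if (v != kUnused) return v;
    std::this_thread::yield();
  }
}
```

Because the value only leaves the slot via the exchange-with-sentinel in `complete_pop`, a value is consumed at most once even when two pushes contend for the same wrapped-around slot; what is lost is ordering, not data, matching the conclusion above.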
Could you add in the CL description a short blurb about the problems with the code before, and the motivation for the current approach (eg vs using a lock). (To ensure there's a written trace.) https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... base/debug/activity_tracker.h:177: // In short: deallocate the slot at the tail of the queue, read the value On 2016/08/24 11:56:28, bcwhite wrote: > On 2016/08/23 21:20:28, manzagop wrote: > > On 2016/08/23 20:16:26, bcwhite wrote: > > > On 2016/08/23 19:46:08, manzagop wrote: > > > > I think it's possible that slot wasn't written to yet (a push is in > flight). > > > By > > > > deallocating, we're making it possible for another push to happen > (requires > > 2 > > > > pops), in which case either one of the push values could get written > first? > > > > > > Either push could get written first but they'll get written to numbered > slots > > > based on the order in which they succeed in the increment-of-head exchange. > > > Waiting for the value to be stored is on line 207. > > > > > > Pop will wait for the lowest-numbered one to be written and return that, > > though > > > parallel pops could return in any order. > > > > Sorry, I wasn't super clear. I mean that if there are lots of pushes that > > complete while that first push is in flight (granted that's unlikely) then you > > could wrap around and start performing a second push at the same location, > while > > the first is still in flight. Either of those might complete first. Again, > it's > > unlikely but possible? > > Let's see... > > 1) T1 push allocates slot X but doesn't write > 2) other threads successfully push N-1 values (more will fail) > 3) T2 pop deallocates slot X but can't read so yields > 4) other threads successfully pop some values; queue no longer full > 5) T3 push allocates contended slot X... 
> > Going forward, either T1 or T3 (but not both) can complete the write of slot X > and T2 will read it. As soon as it's read, though, the other can then complete > its write at any time (we're back at step #1) and be available for the next pop > operation. > > Ordering would go to hell and we had two threads go into an extended stall (for > due to the scheduler and one waiting for it)... but I don't think anything gets > lost or otherwise breaks. That's my understanding as well. In the current use case it doesn't matter if the order isn't preserved. So we could go with a disclaimer comment.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Added a new parallel pushes & pops test... and found a bug. My understanding of the problem is... - there was no ordering of head/tail increments and their reads for checking empty/full - one thread could read a not-full "tail" value and add to the queue, filling it - another thread could read an earlier not-full "tail" value and add to the queue, over-filling it - queue now reads "empty" because tail==head even though it has values I've fixed this (I think -- it's not an easily reproducible failure) by using "sequentially consistent" operations between the head/tail increments and the head/tail reads for checking empty/full. https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/240001/base/debug/activity_tr... base/debug/activity_tracker.h:177: // In short: deallocate the slot at the tail of the queue, read the value On 2016/08/24 13:17:52, manzagop wrote: > On 2016/08/24 11:56:28, bcwhite wrote: > > On 2016/08/23 21:20:28, manzagop wrote: > > > On 2016/08/23 20:16:26, bcwhite wrote: > > > > On 2016/08/23 19:46:08, manzagop wrote: > > > > > I think it's possible that slot wasn't written to yet (a push is in > > flight). > > > > By > > > > > deallocating, we're making it possible for another push to happen > > (requires > > > 2 > > > > > pops), in which case either one of the push values could get written > > first? > > > > > > > > Either push could get written first but they'll get written to numbered > > slots > > > > based on the order in which they succeed in the increment-of-head > exchange. > > > > Waiting for the value to be stored is on line 207. > > > > > > > > Pop will wait for the lowest-numbered one to be written and return that, > > > though > > > > parallel pops could return in any order. > > > > > > Sorry, I wasn't super clear. 
I mean that if there are lots of pushes that > > > complete while that first push is in flight (granted that's unlikely) then > you > > > could wrap around and start performing a second push at the same location, > > while > > > the first is still in flight. Either of those might complete first. Again, > > it's > > > unlikely but possible? > > > > Let's see... > > > > 1) T1 push allocates slot X but doesn't write > > 2) other threads successfully push N-1 values (more will fail) > > 3) T2 pop deallocates slot X but can't read so yields > > 4) other threads successfully pop some values; queue no longer full > > 5) T3 push allocates contended slot X... > > > > Going forward, either T1 or T3 (but not both) can complete the write of slot X > > and T2 will read it. As soon as it's read, though, the other can then > complete > > its write at any time (we're back at step #1) and be available for the next > pop > > operation. > > > > Ordering would go to hell and we had two threads go into an extended stall > (for > > due to the scheduler and one waiting for it)... but I don't think anything > gets > > lost or otherwise breaks. > > That's my understanding as well. In the current use case it doesn't matter if > the order isn't preserved. So we could go with a disclaimer comment. I think it goes without saying that parallel operations don't have a guaranteed order in any case.
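The over-fill bug described above comes from computing full/empty from two separate atomic loads. A minimal sketch of the flawed check (hypothetical names, not the CL's code):

```cpp
#include <atomic>
#include <cstddef>

constexpr size_t kCapacity = 4;
std::atomic<size_t> head_{0};  // next slot to push
std::atomic<size_t> tail_{0};  // next slot to pop

bool queue_full() {
  // Two separate loads: even with seq_cst, another thread's increment can
  // land between them, so the (head, tail) pair used for the comparison may
  // never have existed simultaneously. A stale not-full answer lets a push
  // over-fill the queue, after which head == tail reads as "empty".
  size_t head = head_.load(std::memory_order_seq_cst);
  size_t tail = tail_.load(std::memory_order_seq_cst);
  return head - tail >= kCapacity;
}

bool queue_empty() {
  size_t head = head_.load(std::memory_order_seq_cst);
  size_t tail = tail_.load(std::memory_order_seq_cst);
  return head == tail;
}
```

Note that seq_cst orders the individual operations but does not fuse the two loads into one atomic read, which is why the fix attempted here turned out to be insufficient (as discovered later in the review).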
Can you please run the linux_tsan trybot?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromeos_daisy_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_daisy_...)
On 2016/08/24 13:47:01, Alexander Potapenko wrote: > Can you please run the linux_tsan trybot? Right! I forgot that was there now. I managed to reproduce the problem even with seq_cst so reverting first...
Patchset #8 (id:260001) has been deleted
On 2016/08/24 14:12:44, bcwhite wrote: > On 2016/08/24 13:47:01, Alexander Potapenko wrote: > > Can you please run the linux_tsan trybot? > > Right! I forgot that was there now. I managed to reproduce the problem even > with seq_cst so reverting first... linux_chromium_tsan_rel_ng has finished base_unittests. No problems found. I'll run it again; crashes on my own PC don't happen very often.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Patchset #9 (id:300001) has been deleted
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2016/08/24 16:19:10, bcwhite wrote: > On 2016/08/24 14:12:44, bcwhite wrote: > > On 2016/08/24 13:47:01, Alexander Potapenko wrote: > > > Can you please run the linux_tsan trybot? > > > > Right! I forgot that was there now. I managed to reproduce the problem even > > with seq_cst so reverting first... > > linux_chromium_tsan_rel_ng has finished base_unittests. No problems found. > I'll run it again; crashes on my own PC don't happen very often. I figured out the problem. It's necessary for head & tail to both be read together in a single atomic operation; otherwise full/empty checks aren't reliable and can cause overflow/underflow. I could do this by reducing them to 16 bits and merging them into a 32-bit structure which is then accessed in a single read. Or, I can change it from a queue to a stack, since the latter's head & tail are always the same value. This seemed easier, and since my own code doesn't care about ordering, the loss of FIFO ordering is acceptable.
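The "merge head & tail into one word" alternative mentioned above could look roughly like this (an illustrative sketch, not the CL's code; `Ends`, `pack`, and `unpack` are invented names). Both 16-bit indices live in a single 32-bit atomic, so one load yields a consistent pair and a full/empty check can never mix stale values:

```cpp
#include <atomic>
#include <cstdint>

// Head and tail packed into one 32-bit word so they are read atomically
// together.
struct Ends {
  uint16_t head;  // next slot to push
  uint16_t tail;  // next slot to pop
};

std::atomic<uint32_t> ends_{0};

uint32_t pack(Ends e) {
  return (static_cast<uint32_t>(e.head) << 16) | e.tail;
}

Ends unpack(uint32_t v) {
  return {static_cast<uint16_t>(v >> 16), static_cast<uint16_t>(v & 0xFFFF)};
}

bool queue_full(uint16_t capacity) {
  // One atomic load: the pair is guaranteed to have existed at some instant.
  Ends e = unpack(ends_.load(std::memory_order_acquire));
  return static_cast<uint16_t>(e.head - e.tail) >= capacity;
}
```

Updates would then go through a compare-exchange on `ends_` as a whole. The stack approach taken instead avoids all of this because a stack has only one index.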
The CQ bit was unchecked by bcwhite@chromium.org
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by bcwhite@chromium.org
Description was changed from ========== New lock-free FIFO for the activity tracker. BUG=620813 ========== to ========== New lock-free stack for the activity tracker. BUG=620813 ==========
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Patchset #10 (id:340001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by bcwhite@chromium.org
To the best of my understanding of "lock-free", this stack isn't lock-free. You're basically locking every stack slot with a spinlock, so if one of the threads dies after it has updated the head but before it has changed the corresponding slot, all other threads that will be accessing this slot will get stuck. https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:46: //============================================================================= Have you seen LockFreeCircularQueue (https://cs.chromium.org/chromium/src/content/renderer/devtools/lock_free_circ...) ? Doesn't it meet your needs? (Not sure whether it's correct or not, never saw it before) https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:66: CHECK(head_.is_lock_free()); Can these be converted to static_assert() then? https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:107: // Get the head of the stack and acquire its contents. What does the "acquire its contents" part mean? The below load is a relaxed one. https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:128: // The exchange will have loaded the latest "head" into |slot|. That's "head_", correct? https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:139: while (!values_[slot].compare_exchange_strong(expected_value, value, What is the principal difference between this compare_exchange and the previous one? Why is this one strong?
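On the static_assert question raised above: `std::atomic<T>::is_lock_free()` is a runtime member function (the answer can depend on the object's alignment), which is why it cannot feed a `static_assert`. C++17 added the constexpr `is_always_lock_free` member for exactly this purpose. A small sketch (the Chromium code at the time predated C++17, so this is an assumption about what a modern fix would look like):

```cpp
#include <atomic>
#include <cstdint>

// C++17 and later: compile-time guarantee, usable in static_assert.
#if defined(__cpp_lib_atomic_is_always_lock_free)
static_assert(std::atomic<uint32_t>::is_always_lock_free,
              "32-bit atomics must be lock-free on this platform");
#endif

// Pre-C++17: only a runtime check is possible, as in the CL's CHECK().
bool head_is_lock_free() {
  std::atomic<uint32_t> head{0};
  return head.is_lock_free();
}
```

The pre-C++17 `ATOMIC_*_LOCK_FREE` macros are preprocessor constants, but they can evaluate to 1 ("sometimes lock-free"), so they don't give the unconditional guarantee a static_assert wants either.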
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Every lock-free algorithm I've seen (though I'm certainly no expert) involves "spinning" or "retry" on some sort of resource. But whatever you call it, it doesn't involve base::Lock or any other official kind of mutex. There is a problem if a thread dies but looking through Chrome code, I didn't see anything that handles such. If it did, any "AutoLock" would be a danger. This is different than the PersistentMemoryAllocator which had to deal with other processes that may die. https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:46: //============================================================================= On 2016/08/25 13:36:08, Alexander Potapenko wrote: > Have you seen LockFreeCircularQueue > (https://cs.chromium.org/chromium/src/content/renderer/devtools/lock_free_circ...) > ? Doesn't it meet your needs? > > (Not sure whether it's correct or not, never saw it before) I looked at it quickly but (a) it wasn't in src/base and (b) was for large buffer transfers. So didn't really meet my needs. https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:66: CHECK(head_.is_lock_free()); On 2016/08/25 13:36:08, Alexander Potapenko wrote: > Can these be converted to static_assert() then? I tried but it doesn't compile. Either it isn't truly a constant or it's not "constant enough" for static_assert. https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:107: // Get the head of the stack and acquire its contents. On 2016/08/25 13:36:08, Alexander Potapenko wrote: > What does the "acquire its contents" part mean? The below load is a relaxed one. It used to be an "acquire" but I reduced it because the acquire actually occurs during the increment on line 125. 
Comment fixed (here and in pop). https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:128: // The exchange will have loaded the latest "head" into |slot|. On 2016/08/25 13:36:08, Alexander Potapenko wrote: > That's "head_", correct? "head" or |head_|, yes. https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... base/debug/activity_tracker.h:139: while (!values_[slot].compare_exchange_strong(expected_value, value, On 2016/08/25 13:36:08, Alexander Potapenko wrote: > What is the principal difference between this compare_exchange and the previous > one? Why is this one strong? A weak CAS is less expensive but can fail on some architectures even though the compare would succeed. Above, the cost of retrying is very small (though I really don't know enough to say if it's "small enough" to justify using "weak") but here the cost of retrying is releasing the CPU. That seems certain to be more expensive than "weak" is worth.
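The weak-vs-strong trade-off described above is the standard idiom: `compare_exchange_weak` may fail spuriously, which is harmless inside a tight retry loop, while `compare_exchange_strong` is preferred when a failure triggers something expensive such as yielding the CPU. A standalone illustration (hypothetical counters, not the CL's code):

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

std::atomic<uint32_t> counter{0};

// Tight retry loop: a spurious failure costs one extra cheap iteration, so
// the weak form is appropriate.
void increment_if_even() {
  uint32_t old = counter.load(std::memory_order_relaxed);
  while (old % 2 == 0 &&
         !counter.compare_exchange_weak(old, old + 1,
                                        std::memory_order_acq_rel,
                                        std::memory_order_relaxed)) {
    // |old| was reloaded by the failed CAS; the loop re-checks the predicate.
  }
}

std::atomic<uint32_t> slot{0};  // 0 acts as the "empty" sentinel here

// Failure path yields the CPU, so paying for the strong form is cheaper than
// taking a spurious trip through the scheduler.
void store_value(uint32_t value) {
  uint32_t expected = 0;
  while (!slot.compare_exchange_strong(expected, value,
                                       std::memory_order_release,
                                       std::memory_order_relaxed)) {
    expected = 0;
    std::this_thread::yield();
  }
}
```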
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm https://codereview.chromium.org/2255503002/diff/370001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/370001/base/debug/activity_tr... base/debug/activity_tracker.h:93: std::atomic<size_t> head_; // One past the newest value; where to push. nit: is "index of the first empty slot" clearer? https://codereview.chromium.org/2255503002/diff/370001/base/debug/activity_tr... base/debug/activity_tracker.h:162: // If the stack is full, fail. nit: full -> empty
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2255503002/diff/370001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/370001/base/debug/activity_tr... base/debug/activity_tracker.h:93: std::atomic<size_t> head_; // One past the newest value; where to push. On 2016/08/25 15:26:55, manzagop wrote: > nit: is "index of the first empty slot" clearer? Done. https://codereview.chromium.org/2255503002/diff/370001/base/debug/activity_tr... base/debug/activity_tracker.h:162: // If the stack is full, fail. On 2016/08/25 15:26:55, manzagop wrote: > nit: full -> empty Done.
On 2016/08/25 14:54:38, bcwhite wrote: > Every lock-free algorithm I've seen (though I'm certainly no expert) involves > "spinning" or "retry" on some sort of resource. > > But whatever you call it, it doesn't involve base::Lock or any other official > kind of mutex. > > There is a problem if a thread dies but looking through Chrome code, I didn't > see anything that handles such. If it did, any "AutoLock" would be a danger. Stated the other way around: the two-step push and pop operations in your stack make it possible for one thread to affect the progress of other threads, which is what people usually try to avoid with lock-free algorithms. By the way, what was the prime cause of this CL? Have you tried a simple array guarded with a mutex? With some access patterns it should be at least not worse. > This is different than the PersistentMemoryAllocator which had to deal with > other processes that may die. Ok, thread death aside. A more relevant example is: if one thread starts to starve in the middle of a push(), other threads cannot make a pop() and are doomed to wait for it. > https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... > File base/debug/activity_tracker.h (right): > > https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... 
> base/debug/activity_tracker.h:66: CHECK(head_.is_lock_free()); > On 2016/08/25 13:36:08, Alexander Potapenko wrote: > > Can these be converted to static_assert() then? > > I tried but it doesn't compile. Either it isn't truly a constant or it's not > "constant enough" for static_assert. > > https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... > base/debug/activity_tracker.h:107: // Get the head of the stack and acquire its > contents. > On 2016/08/25 13:36:08, Alexander Potapenko wrote: > > What does the "acquire its contents" part mean? The below load is a relaxed > one. > > It used to be an "acquire" but I reduced it because the acquire actually occurs > during the increment on line 125. Comment fixed (here and in pop). > > https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... > base/debug/activity_tracker.h:128: // The exchange will have loaded the latest > "head" into |slot|. > On 2016/08/25 13:36:08, Alexander Potapenko wrote: > > That's "head_", correct? > > "head" or |head_|, yes. > > https://codereview.chromium.org/2255503002/diff/270004/base/debug/activity_tr... > base/debug/activity_tracker.h:139: while > (!values_[slot].compare_exchange_strong(expected_value, value, > On 2016/08/25 13:36:08, Alexander Potapenko wrote: > > What is the principal difference between this compare_exchange and the > previous > > one? Why is this one strong? > > Weak CAS are less expensive but can fail on some architectures even though the > compare would succeed. Above, the cost of retrying is very small (though I > really don't know enough to say if it's "small enough" to justify using "weak") > but here the cost of retrying is releasing the CPU. That seems certain to be > more expensive that "weak" is worth. Sounds reasonable.
> > Every lock-free algorithm I've seen (though I'm certainly no expert) involves > > "spinning" or "retry" on some sort of resource. > > > > But whatever you call it, it doesn't involve base::Lock or any other official > > kind of mutex. > > > > There is a problem if a thread dies but looking through Chrome code, I didn't > > see anything that handles such. If it did, any "AutoLock" would be a danger. > State this the other way around, the two-step push and pop operations in your > stack make it possible for one thread affect the progress of other threads, > which is what people usually try to avoid with lock-free algorithms. Ah. If I understand you correctly... True "lock-free" could never cause one thread to stall because another thread got paused somewhere in the middle of its operation. Suggestions? > By the way, what was the prime cause of this CL? Have you tried a simple array > guarded with a mutex? > With some access patterns it should be at least not worse. The activity tracker tracks operations on mutex so it seemed beneficial to avoid calling such here as it would likely lead to recursive behavior. This recursion can be detected and stopped (it actually already is in another place) but it seemed better to avoid it if reasonably simple to do so. Perhaps a mutex is the better way. Perhaps I'm just a sucker for punishment! :-) (no "perhaps" -- it's nice to work on something truly difficult)
https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... base/debug/activity_tracker.h:126: std::memory_order_acquire, I have a gut feeling that the memory ordering in exchange statements is wrong, but I'm struggling to build an example proving that. Let's say threads T1 and T2 are performing a push() and a pop() on the stack, so the following operations are going to happen: 1. T1 loads H from head_, T1 stores H+1 to head_ 2. T1 loads -1 from values_[H], T1 stores V to values_[H] 3. T2 loads H+1 from head_, T2 stores H to head_ 4. T2 loads V from values_[H], T2 stores -1 to values_[H] Right now the std::memory_order_acquire make sure that 2 cannot be ordered before 1, and 4 cannot go before 3. However there are no guarantees on the mutual order of 2 and 3 or 2 and 4. Therefore it's possible that the accesses happen in the following order: 1-3-4-2, and the value loaded by T2 from values_[H] is stale. This example is not really convincing, but if several threads are involved, it occurs to me that an ABA is possible here.
https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... base/debug/activity_tracker.h:126: std::memory_order_acquire, On 2016/08/25 17:07:42, Alexander Potapenko wrote: > I have a gut feeling that the memory ordering in exchange statements is wrong, > but I'm struggling to build an example proving that. > > Let's say threads T1 and T2 are performing a push() and a pop() on the stack, so > the following operations are going to happen: > 1. T1 loads H from head_, T1 stores H+1 to head_ > 2. T1 loads -1 from values_[H], T1 stores V to values_[H] > 3. T2 loads H+1 from head_, T2 stores H to head_ > 4. T2 loads V from values_[H], T2 stores -1 to values_[H] > > Right now the std::memory_order_acquire make sure that 2 cannot be ordered > before 1, and 4 cannot go before 3. However there are no guarantees on the > mutual order of 2 and 3 or 2 and 4. > Therefore it's possible that the accesses happen in the following order: > 1-3-4-2, and the value loaded by T2 from values_[H] is stale. > > This example is not really convincing, but if several threads are involved, it > occurs to me that an ABA is possible here. In that ordering, #4 will read "invalid", yield the CPU, and retry. ABA is certainly possible but I think its okay because all we care about is if the slot is empty when we're trying to write to it (or not-empty when trying to read). It really doesn't matter how many times it switched in between.
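The "only atomically swapped with invalid" property defended above is what makes the ABA interleavings benign: a value leaves a slot in the same atomic step that marks the slot empty, so it can be observed by at most one popper. A minimal sketch of that single-slot invariant (illustrative names, not the CL's code):

```cpp
#include <atomic>
#include <cstdint>

constexpr uint32_t kInvalid = UINT32_MAX;  // sentinel for "slot is empty"
std::atomic<uint32_t> slot_{kInvalid};

// Store a value only into an empty slot; fails if the slot is occupied.
bool put(uint32_t value) {
  uint32_t expected = kInvalid;
  return slot_.compare_exchange_strong(expected, value,
                                       std::memory_order_release,
                                       std::memory_order_relaxed);
}

// Remove-and-read in one atomic step. Returns kInvalid if the slot was
// empty (caller retries); otherwise the value, which no other popper can
// now observe because the same exchange emptied the slot.
uint32_t take() {
  return slot_.exchange(kInvalid, std::memory_order_acquire);
}
```

However many times the slot flips between values in between, a second `take()` sees the sentinel rather than a stale copy, which matches the claim that ABA cannot duplicate a value here.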
@dvyukov: I've read in a couple of places that it's quite hard to make an array-backed lock-free stack; is that really so?
On 2016/08/25 17:19:06, bcwhite wrote: > https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... > File base/debug/activity_tracker.h (right): > > https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... > base/debug/activity_tracker.h:126: std::memory_order_acquire, > On 2016/08/25 17:07:42, Alexander Potapenko wrote: > > I have a gut feeling that the memory ordering in exchange statements is wrong, > > but I'm struggling to build an example proving that. > > > > Let's say threads T1 and T2 are performing a push() and a pop() on the stack, > so > > the following operations are going to happen: > > 1. T1 loads H from head_, T1 stores H+1 to head_ > > 2. T1 loads -1 from values_[H], T1 stores V to values_[H] > > 3. T2 loads H+1 from head_, T2 stores H to head_ > > 4. T2 loads V from values_[H], T2 stores -1 to values_[H] > > > > Right now the std::memory_order_acquire make sure that 2 cannot be ordered > > before 1, and 4 cannot go before 3. However there are no guarantees on the > > mutual order of 2 and 3 or 2 and 4. > > Therefore it's possible that the accesses happen in the following order: > > 1-3-4-2, and the value loaded by T2 from values_[H] is stale. > > > > This example is not really convincing, but if several threads are involved, it > > occurs to me that an ABA is possible here. > > In that ordering, #4 will read "invalid", yield the CPU, and retry. > > ABA is certainly possible but I think its okay because all we care about is if > the slot is empty when we're trying to write to it (or not-empty when trying to > read). It really doesn't matter how many times it switched in between. My understanding is that here the second thread may see any of the previous slot values written by other push/pop operations before. I'll try to construct a better example.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2016/08/25 17:57:25, Alexander Potapenko wrote: > On 2016/08/25 17:19:06, bcwhite wrote: > > > https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... > > File base/debug/activity_tracker.h (right): > > > > > https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... > > base/debug/activity_tracker.h:126: std::memory_order_acquire, > > On 2016/08/25 17:07:42, Alexander Potapenko wrote: > > > I have a gut feeling that the memory ordering in exchange statements is > wrong, > > > but I'm struggling to build an example proving that. > > > > > > Let's say threads T1 and T2 are performing a push() and a pop() on the > stack, > > so > > > the following operations are going to happen: > > > 1. T1 loads H from head_, T1 stores H+1 to head_ > > > 2. T1 loads -1 from values_[H], T1 stores V to values_[H] > > > 3. T2 loads H+1 from head_, T2 stores H to head_ > > > 4. T2 loads V from values_[H], T2 stores -1 to values_[H] > > > > > > Right now the std::memory_order_acquire make sure that 2 cannot be ordered > > > before 1, and 4 cannot go before 3. However there are no guarantees on the > > > mutual order of 2 and 3 or 2 and 4. > > > Therefore it's possible that the accesses happen in the following order: > > > 1-3-4-2, and the value loaded by T2 from values_[H] is stale. > > > > > > This example is not really convincing, but if several threads are involved, > it > > > occurs to me that an ABA is possible here. > > > > In that ordering, #4 will read "invalid", yield the CPU, and retry. > > > > ABA is certainly possible but I think its okay because all we care about is if > > the slot is empty when we're trying to write to it (or not-empty when trying > to > > read). It really doesn't matter how many times it switched in between. > > My understanding is that here the second thread may see any of the previous slot > values written by other push/pop operations before. > I'll try to construct a better example. 
Since values are never read or written but only atomically swapped with "invalid", I don't think there is any way for a value to be returned multiple times.
I can look at it over next few days if you need more eyeballs.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2016/08/25 18:11:34, Dmitry Vyukov wrote: > I can look at it over next few days if you need more eyeballs. Have not looked at the lock-free code itself yet. It is unclear to me why this component requires custom, complex lock-free algorithms. What is the thread creation ratio? What's the total number of threads? What's the number of processes? What are the chances that we get a memory block from the lock-free cache? If thread creation ratio is not high (which I would expect), then we can just scan the allocator blocks all the time and claim a free block with CAS. Similarly, if there are lots of processes with few threads, there are chances that there is a free block in a cache, but it is in a different process. So in such case we better scan allocator blocks as well. Even if we need a per-process cache, why do we need it to be lock-free? There is no termination-safety concerns here as the cache is per-process, so it seems to me that a mutex-protected cache will perfectly do.
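The "just scan the allocator blocks and claim a free one with CAS" alternative suggested above could be sketched like this (all names here are hypothetical; the real PersistentMemoryAllocator API is not shown). It is O(N) per acquisition but trivially correct, and termination-safe because no thread ever holds a multi-step reservation:

```cpp
#include <atomic>
#include <cstddef>

constexpr size_t kBlockCount = 16;

struct Block {
  std::atomic<bool> in_use{false};
  // ... tracked activity data would live here ...
};

Block g_blocks[kBlockCount];

// Linear scan; the CAS guarantees exactly one thread wins each free block.
Block* ClaimFreeBlock() {
  for (Block& b : g_blocks) {
    bool expected = false;
    if (b.in_use.compare_exchange_strong(expected, true,
                                         std::memory_order_acq_rel))
      return &b;
  }
  return nullptr;  // all blocks currently claimed
}

void ReleaseBlock(Block* b) {
  b->in_use.store(false, std::memory_order_release);
}
```

If a thread dies while owning a block, the block leaks until cleanup but no other thread stalls, which is the progress property being contrasted with the two-step stack.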
On 2016/08/26 09:19:47, Dmitry Vyukov wrote: > On 2016/08/25 18:11:34, Dmitry Vyukov wrote: > > I can look at it over next few days if you need more eyeballs. > > Have not looked at the lock-free code itself yet. > It is unclear to me why this component requires custom, complex lock-free > algorithms. What is the thread creation ratio? What's the total number of > threads? What's the number of processes? What are the chances that we get a > memory block from the lock-free cache? > If thread creation ratio is not high (which I would expect), then we can just > scan the allocator blocks all the time and claim a free block with CAS. > Similarly, if there are lots of processes with few threads, there are chances > that there is a free block in a cache, but it is in a different process. So in > such case we better scan allocator blocks as well. > Even if we need a per-process cache, why do we need it to be lock-free? There is > no termination-safety concerns here as the cache is per-process, so it seems to > me that a mutex-protected cache will perfectly do. For some reason I assumed that this is called once per thread and cached in a TLS variable. Not sure if it's the case...
> Have not looked at the lock-free code itself yet. > It is unclear to me why this component requires custom, complex lock-free > algorithms. What is the thread creation ratio? What's the total number of > threads? What's the number of processes? What are the chances that we get a > memory block from the lock-free cache? > If thread creation ratio is not high (which I would expect), then we can just > scan the allocator blocks all the time and claim a free block with CAS. I have a personal policy to always write reusable code where reasonable to do so. It seemed to me at the time that such a class could be helpful in other situations, hence the "general purpose" nature of this. If that's not practical, then yes, there are other ways. Performance is not the issue in this use-case; avoiding a base::Lock was the original desire but even that is more for simplicity rather than necessity. > Similarly, if there are lots of processes with few threads, there are chances > that there is a free block in a cache, but it is in a different process. So in > such case we better scan allocator blocks as well. > Even if we need a per-process cache, why do we need it to be lock-free? There is > no termination-safety concerns here as the cache is per-process, so it seems to > me that a mutex-protected cache will perfectly do. This "stack" object doesn't operate between processes. All threads accessing a particular instance of such are within the same process. Unless there is a way to make this work that would support thread death, at which point "general purpose" could be extended to inter-process.
On 2016/08/26 14:05:44, bcwhite wrote: > > Have not looked at the lock-free code itself yet. > > It is unclear to me why this component requires custom, complex lock-free > > algorithms. What is the thread creation ratio? What's the total number of > > threads? What's the number of processes? What are the chances that we get a > > memory block from the lock-free cache? > > If thread creation ratio is not high (which I would expect), then we can just > > scan the allocator blocks all the time and claim a free block with CAS. > > I have a personal policy to always write reusable code where reasonable to do > so. It seemed to me at the time that such a class could be helpful in other > situations, hence the "general purpose" nature of this. > > If that's not practical, then yes, there are other ways. Performance is not the > issue in this use-case; avoiding a base::Lock was the original desire but even > that is more for simplicity rather than necessity. It can be a reusable, mutex-based cache of objects. Reusability does not imply lock-free. > > Similarly, if there are lots of processes with few threads, there are chances > > that there is a free block in a cache, but it is in a different process. So in > > such case we better scan allocator blocks as well. > > Even if we need a per-process cache, why do we need it to be lock-free? There > is > > no termination-safety concerns here as the cache is per-process, so it seems > to > > me that a mutex-protected cache will perfectly do. > > This "stack" object doesn't operate between processes. All threads accessing a > particular instance of such are within the same process. > > Unless there is way to make this work that would support thread-death at which > point "general purpose" could be extended to inter-process. There is a way to make it termination-safe. But complexity of it (and of the current code) makes the benefit very questionable. So, is there any reason to not just scan all allocator blocks?
> > If that's not practical, then yes, there are other ways. Performance is not > the > > issue in this use-case; avoiding a base::Lock was the original desire but even > > that is more for simplicity rather than necessity. > > It can be a reusable, mutex-based cache of objects. Reusability does not imply > lock-free. Of course. But if one is to create a lock-free cache, why not make it reusable? The immediate purpose of "lock-free" was because this class using it tracks locks and so using a lock would create a recursive behavior. Yes, it can be dealt with (far easier than writing this class) but creates a conceptual complexity that I had hoped to avoid. > > > Similarly, if there are lots of processes with few threads, there are > chances > > > that there is a free block in a cache, but it is in a different process. So > in > > > such case we better scan allocator blocks as well. > > > Even if we need a per-process cache, why do we need it to be lock-free? > There > > is > > > no termination-safety concerns here as the cache is per-process, so it seems > > to > > > me that a mutex-protected cache will perfectly do. > > > > This "stack" object doesn't operate between processes. All threads accessing > a > > particular instance of such are within the same process. > > > > Unless there is way to make this work that would support thread-death at which > > point "general purpose" could be extended to inter-process. > > There is a way to make it termination-safe. But complexity of it (and of the > current code) makes the benefit very questionable. Understood. My PersistentMemoryAllocator had to be termination-safe because it operates inter-process. > So, is there any reason to not just scan all allocator blocks? An O(N) algorithm isn't generally useful and reusable. Also, there is a planned second use-case within this same file that will deal with individual resources (locks, events, etc) that will happen more frequently than the creation of threads. 
It's not clear that an O(N) algorithm or acquiring a mutex in those cases is practical. The basic problem is that the alloc/free of these memory spaces, coming as they do from the PersistentMemoryAllocator, are expensive. Keeping a cache of free'd spaces saves a lot of cycles and that is important for something measuring frequent low-level activities. I designed the cache as a stack but it doesn't have to be.
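The cache-as-a-stack design being discussed is essentially a Treiber stack: one atomic head pointer updated with compare-and-swap. A minimal sketch with illustrative names (not the code under review); note the comment on the ABA problem, which is part of what makes a production lock-free pop genuinely hard:

```cpp
#include <atomic>

// Intrusive node: each cached entry links to the next free one.
struct Node {
  Node* next = nullptr;
};

class LockFreeStack {
 public:
  void Push(Node* node) {
    Node* old_head = head_.load(std::memory_order_relaxed);
    do {
      node->next = old_head;
      // On failure, compare_exchange_weak reloads |old_head| so the loop
      // retries against the freshly observed head.
    } while (!head_.compare_exchange_weak(old_head, node,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  // Returns the popped node, or nullptr if the stack was empty. NOTE:
  // if a popped node can be freed and reused concurrently, this naive
  // pop suffers the classic ABA problem; real implementations need
  // version counters, hazard pointers, or similar machinery.
  Node* Pop() {
    Node* old_head = head_.load(std::memory_order_acquire);
    while (old_head &&
           !head_.compare_exchange_weak(old_head, old_head->next,
                                        std::memory_order_acquire,
                                        std::memory_order_acquire)) {
    }
    return old_head;
  }

 private:
  std::atomic<Node*> head_{nullptr};
};
```

Even this "simple" version illustrates the review's point: every push and pop still serializes on the cacheline holding |head_|.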
https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... File base/debug/activity_tracker.cc (right): https://codereview.chromium.org/2255503002/diff/390001/base/debug/activity_tr... base/debug/activity_tracker.cc:543: // Dobule Failure. This shouldn't happen. But be graceful if it does, s/Dobule/Double.
On 2016/08/26 14:51:46, bcwhite wrote: > > > If that's not practical, then yes, there are other ways. Performance is not > > the > > > issue in this use-case; avoiding a base::Lock was the original desire but > even > > > that is more for simplicity rather than necessity. > > > > It can be a reusable, mutex-based cache of objects. Reusability does not imply > > lock-free. > > Of course. But if one is to create a lock-free cache, why not make it reusable? One reason is that it doesn't guarantee the lock-free properties while claiming to do so. E.g. the users may expect that push() and pop() operations on one thread do not block other threads. > The immediate purpose of "lock-free" was because this class using it tracks > locks and so using a lock would create a recursive behavior. Yes, it can be > dealt with (far easier than writing this class) but creates a conceptual > complexity that I had hoped to avoid. Since this class is already dealing with atomics, it could probably use a custom spinlock implementation to protect the cache. That's a single CAS loop and a lot less code. > > > > Similarly, if there are lots of processes with few threads, there are > > chances > > > > that there is a free block in a cache, but it is in a different process. > So > > in > > > > such case we better scan allocator blocks as well. > > > > Even if we need a per-process cache, why do we need it to be lock-free? > > There > > > is > > > > no termination-safety concerns here as the cache is per-process, so it > seems > > > to > > > > me that a mutex-protected cache will perfectly do. > > > > > > This "stack" object doesn't operate between processes. All threads > accessing > > a > > > particular instance of such are within the same process. > > > > > > Unless there is way to make this work that would support thread-death at > which > > > point "general purpose" could be extended to inter-process. > > > > There is a way to make it termination-safe. 
But complexity of it (and of the > > current code) makes the benefit very questionable. > > Understood. My PersistentMemoryAllocator had to be termination-safe because it > operates inter-process. > > > > So, is there any reason to not just scan all allocator blocks? > > An O(N) algorithm isn't generally useful and reusable. > > Also, there is a planned second use-case within this same file that will deal > with individual resources (locks, events, etc) that will happen more frequently > than the creation of threads. It's not clear that an O(N) algorithm or acquiring > a mutex in those cases is practical. Having every thread CAS a shared memory location is somewhat similar to a mutex. The thread still has to obtain exclusive access to the cacheline where |head_| resides. > > The basic problem is that the alloc/free of these memory spaces, coming as they > do from the PersistentMemoryAllocator, are expensive. Keeping a cache of free'd > spaces saves a lot of cycles and that is important for something measuring > frequent low-level activities. > TCMalloc also has to deal with this problem. It uses per-thread freelists, which are periodically flushed into the global freelist (at a much smaller cost). Have you considered per-thread caches? > I designed the cache as a stack but it doesn't have to be.
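The spinlock suggestion above amounts to a one-flag atomic loop. A generic sketch, assuming plain std::atomic rather than the WTF::SpinLock or tcmalloc versions linked later in this review; `ContendedIncrements` is an invented demonstration helper:

```cpp
#include <atomic>
#include <thread>

class SpinLock {
 public:
  void Acquire() {
    // Test-and-set loop: exchange() returns the previous value, so the
    // loop exits once this thread flips the flag from false to true.
    while (lock_.exchange(true, std::memory_order_acquire)) {
      // A production version would insert a CPU pause here and
      // eventually yield the thread instead of spinning hot.
    }
  }

  void Release() { lock_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> lock_{false};
};

// Demonstration: two threads increment a shared counter under the lock;
// with correct mutual exclusion no increments are lost.
int ContendedIncrements(int per_thread) {
  SpinLock lock;
  int counter = 0;
  auto work = [&] {
    for (int i = 0; i < per_thread; ++i) {
      lock.Acquire();
      ++counter;
      lock.Release();
    }
  };
  std::thread t1(work);
  std::thread t2(work);
  t1.join();
  t2.join();
  return counter;
}
```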
> > Of course. But if one is to create a lock-free cache, why not make it > reusable? > One reason is that it doesn't guarantee the lock-free properties while claiming > to do so. Fair enough, but that was a misunderstanding on my part and the purpose of the extended review. That doesn't mean it's not a worthwhile goal if it is something that would be generally useful. > E.g. the users may expect that push() and pop() operations on one thread do not > block other threads. > > > The immediate purpose of "lock-free" was because this class using it tracks > > locks and so using a lock would create a recursive behavior. Yes, it can be > > dealt with (far easier than writing this class) but creates a conceptual > > complexity that I had hoped to avoid. > Since this class is already dealing with atomics, it could probably use a custom > spinlock implementation to protect the cache. That's a single CAS loop and a lot > less code. Yup. I looked at some implementations of those. Sadly, nobody ever bothered to make one that is reusable (i.e. a class in base). They're not complicated but why re-implement it anew each time? > > > So, is there any reason to not just scan all allocator blocks? > > > > An O(N) algorithm isn't generally useful and reusable. > > > > Also, there is a planned second use-case within this same file that will deal > > with individual resources (locks, events, etc) that will happen more > frequently > > than the creation of threads. It's not clear that an O(N) algorithm or > acquiring > > a mutex in those cases is practical. > Having every thread CAS a shared memory location is somewhat similar to a mutex. > The thread still has to obtain exclusive access to the cacheline where |head_| > resides. At one point in the development I had a queue with separate head/tail to reduce that contention. It had problems, though, when it came to comparisons between them to determine full/empty. 
Perhaps there is a better way that isn't so rigorous: Something that would be tolerant of contention by allowing a slot to be skipped (left "invalid") in certain push conditions knowing that pop would just deal with it. But I fear that I may be reaching the point of wanting to do this just for the sake of conquering a hard problem. > > The basic problem is that the alloc/free of these memory spaces, coming as > they > > do from the PersistentMemoryAllocator, are expensive. Keeping a cache of > free'd > > spaces saves a lot of cycles and that is important for something measuring > > frequent low-level activities. > > > TCMalloc also has to deal with this problem. It uses per-thread freelists, which > are periodically flushed into the global freelist (at a much smaller cost). Have > you considered per-thread caches? No, I didn't. The complexity seems a bit high. It's something to think about, though.
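The TCMalloc-style idea of per-thread freelists that flush into a shared list can be sketched as below. All names are hypothetical and the flush threshold is arbitrary; the point is that the lock is only taken on the infrequent overflow/refill paths:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Two-level freelist: a thread-local list serves most pushes and pops;
// the mutex-protected global list is touched only when the local list
// overflows or runs dry.
class TwoLevelFreeList {
 public:
  void* Pop() {
    if (!local_.empty()) {
      void* p = local_.back();
      local_.pop_back();
      return p;  // Fast path: no lock taken.
    }
    std::lock_guard<std::mutex> hold(global_lock_);
    if (global_.empty())
      return nullptr;
    void* p = global_.back();
    global_.pop_back();
    return p;
  }

  void Push(void* p) {
    local_.push_back(p);
    if (local_.size() > kLocalLimit) {
      // Flush half of the private list into the shared one.
      std::lock_guard<std::mutex> hold(global_lock_);
      while (local_.size() > kLocalLimit / 2) {
        global_.push_back(local_.back());
        local_.pop_back();
      }
    }
  }

 private:
  static constexpr size_t kLocalLimit = 8;
  static thread_local std::vector<void*> local_;
  std::mutex global_lock_;
  std::vector<void*> global_;
};

thread_local std::vector<void*> TwoLevelFreeList::local_;
```

The complexity concern raised above is visible even in the sketch: lifetime of the thread-local list on thread exit, and migration of entries between threads, both need care.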
On 2016/08/26 17:35:48, bcwhite wrote: > > > Of course. But if one is to create a lock-free cache, why not make it > > reusable? > > One reason is that it doesn't guarantee the lock-free properties while > claiming > > to do so. > > Fair enough but that was a misunderstanding on my part and the purpose of the > extended review. That doesn't mean it's not a worthwhile goal if it is > something that would be generally useful. > > > > E.g. the users may expect that push() and pop() operations on one thread do > not > > block other threads. > > > > > The immediate purpose of "lock-free" was because this class using it tracks > > > locks and so using a lock would create a recursive behavior. Yes, it can be > > > dealt with (far easier than writing this class) but creates a conceptual > > > complexity that I had hoped to avoid. > > Since this class is already dealing with atomics, it could probably use a > custom > > spinlock implementation to protect the cache. That's a single CAS loop and a > lot > > less code. > > Yup. I looked at some implementation of those. Sadly, nobody ever bother to > make one that is reusable (i.e. a class in base). They're not complicated but > why re-implement it anew each time? Maybe there just was no need for them till now. There're https://cs.chromium.org/chromium/src/third_party/WebKit/Source/wtf/SpinLock.h... and https://cs.chromium.org/chromium/src/third_party/tcmalloc/vendor/src/base/spi... in different subprojects though. > > > > > So, is there any reason to not just scan all allocator blocks? > > > > > > An O(N) algorithm isn't generally useful and reusable. > > > > > > Also, there is a planned second use-case within this same file that will > deal > > > with individual resources (locks, events, etc) that will happen more > > frequently > > > than the creation of threads. It's not clear that an O(N) algorithm or > > acquiring > > > a mutex in those cases is practical. 
> > Having every thread CAS a shared memory location is somewhat similar to a > mutex. > > The thread still has to obtain exclusive access to the cacheline where |head_| > > resides. > > At one point in the development I had a queue with separate head/tail to reduce > that contention. It had problems, though, when it came to comparisons between > them to determine full/empty. > > Perhaps there is a better waythat isn't so rigorous: Something that would be > tolerant of contention by allowing a slot to be skipped (left "invalid") in > certain push conditions knowing that pop would just deal with it. I wonder how big is the actual contention. Is it worth the effort? > But I fear that I may be reaching the point of wanting to do this just for the > sake of conquering a hard problem. Beware, any other person who'll be maintaining that code may have a hard time :) > > > > The basic problem is that the alloc/free of these memory spaces, coming as > > they > > > do from the PersistentMemoryAllocator, are expensive. Keeping a cache of > > free'd > > > spaces saves a lot of cycles and that is important for something measuring > > > frequent low-level activities. > > > > > TCMalloc also has to deal with this problem. It uses per-thread freelists, > which > > are periodically flushed into the global freelist (at a much smaller cost). > Have > > you considered per-thread caches? > > No, I didn't. The complexity seems a bit high. It's something to think about, > though.
> > Yup. I looked at some implementation of those. Sadly, nobody ever bother to > > make one that is reusable (i.e. a class in base). They're not complicated but > > why re-implement it anew each time? > Maybe there just was no need for them till now. Which of course assumes we do need them now. Always a valid question. I think I'll take the weekend to rethink this. > > But I fear that I may be reaching the point of wanting to do this just for the > > sake of conquering a hard problem. > > Beware, any other person who'll be maintaining that code may have a hard time :) Yeah... The code is always so simple once it's done that you'd expect it wouldn't need any maintenance. But the number of subtle interactions grows so quickly with every operation that it's difficult (if not impossible) to prove it correct.
On 2016/08/26 14:51:46, bcwhite wrote: > > > If that's not practical, then yes, there are other ways. Performance is not > > the > > > issue in this use-case; avoiding a base::Lock was the original desire but > even > > > that is more for simplicity rather than necessity. > > > > It can be a reusable, mutex-based cache of objects. Reusability does not imply > > lock-free. > > Of course. But if one is to create a lock-free cache, why not make it reusable? > > The immediate purpose of "lock-free" was because this class using it tracks > locks and so using a lock would create a recursive behavior. Yes, it can be > dealt with (far easier than writing this class) but creates a conceptual > complexity that I had hoped to avoid. > > > > > > Similarly, if there are lots of processes with few threads, there are > > chances > > > > that there is a free block in a cache, but it is in a different process. > So > > in > > > > such case we better scan allocator blocks as well. > > > > Even if we need a per-process cache, why do we need it to be lock-free? > > There > > > is > > > > no termination-safety concerns here as the cache is per-process, so it > seems > > > to > > > > me that a mutex-protected cache will perfectly do. > > > > > > This "stack" object doesn't operate between processes. All threads > accessing > > a > > > particular instance of such are within the same process. > > > > > > Unless there is way to make this work that would support thread-death at > which > > > point "general purpose" could be extended to inter-process. > > > > There is a way to make it termination-safe. But complexity of it (and of the > > current code) makes the benefit very questionable. > > Understood. My PersistentMemoryAllocator had to be termination-safe because it > operates inter-process. > > > > So, is there any reason to not just scan all allocator blocks? > > An O(N) algorithm isn't generally useful and reusable. 
> > Also, there is a planned second use-case within this same file that will deal > with individual resources (locks, events, etc) that will happen more frequently > than the creation of threads. It's not clear that an O(N) algorithm or acquiring > a mutex in those cases is practical. > > > The basic problem is that the alloc/free of these memory spaces, coming as they > do from the PersistentMemoryAllocator, are expensive. Keeping a cache of free'd > spaces saves a lot of cycles and that is important for something measuring > frequent low-level activities. > > I designed the cache as a stack but it doesn't have to be. If it's going to be used inside of mutexes, then this implementation is too slow. You are adding 4 atomic RMW per mutex lock/unlock, which makes it at least 5 times slower. Worse in reality, because user mutex can be local, but this thing is global and produces lots of contention. For frequent operations you need a per-thread cache.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
Patchset #13 (id:410001) has been deleted
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Patchset #13 (id:430001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Okay, giving up on the lock-free stack/queue concept. I'll just use a regular lock for managing the cache since it's only used on thread-create and thread-exit. I'll investigate thread-local storage for caching resource trackers within a thread.
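The direction settled on here, a plain stack guarded by an ordinary lock, looks roughly like the sketch below. The landed class uses base::Lock; std::mutex stands in for it, and the names are illustrative:

```cpp
#include <mutex>
#include <vector>

// Thread-safe stack: every operation takes the lock. Acceptable here
// because pushes and pops happen only on thread creation and exit, so
// contention is negligible.
template <typename T>
class ThreadSafeStack {
 public:
  void push(T value) {
    std::lock_guard<std::mutex> hold(lock_);
    values_.push_back(value);
  }

  // Pops into |*out| and returns true, or returns false if empty.
  bool pop(T* out) {
    std::lock_guard<std::mutex> hold(lock_);
    if (values_.empty())
      return false;
    *out = values_.back();
    values_.pop_back();
    return true;
  }

 private:
  std::mutex lock_;
  std::vector<T> values_;
};
```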
lgtm Could you add the motivation to the CL description? https://codereview.chromium.org/2255503002/diff/450001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/450001/base/debug/activity_tr... base/debug/activity_tracker.h:550: // These have to be lock-free because lock activity is tracked and causes Update comment?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by bcwhite@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== New lock-free stack for the activity tracker. BUG=620813 ========== to ========== New internal cache (stack) class for recording available memories. This is extracted to its own class for simplicity and ease-of-reuse. It was also originally intended to be a lock-free object but that ended up being overly complicated so has been put on indefinite hold. BUG=620813 ==========
The CQ bit was unchecked by bcwhite@chromium.org
https://codereview.chromium.org/2255503002/diff/450001/base/debug/activity_tr... File base/debug/activity_tracker.h (right): https://codereview.chromium.org/2255503002/diff/450001/base/debug/activity_tr... base/debug/activity_tracker.h:550: // These have to be lock-free because lock activity is tracked and causes On 2016/09/01 15:46:48, manzagop wrote: > Update comment? Done.
The CQ bit was checked by bcwhite@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from manzagop@chromium.org Link to the patchset: https://codereview.chromium.org/2255503002/#ps470001 (title: "fixed comment")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== New internal cache (stack) class for recording available memories. This is extracted to its own class for simplicity and ease-of-reuse. It was also originally intended to be a lock-free object but that ended up being overly complicated so has been put on indefinite hold. BUG=620813 ========== to ========== New internal cache (stack) class for recording available memories. This is extracted to its own class for simplicity and ease-of-reuse. It was also originally intended to be a lock-free object but that ended up being overly complicated so has been put on indefinite hold. BUG=620813 ==========
Message was sent while issue was closed.
Committed patchset #14 (id:470001)
Message was sent while issue was closed.
Description was changed from ========== New internal cache (stack) class for recording available memories. This is extracted to its own class for simplicity and ease-of-reuse. It was also originally intended to be a lock-free object but that ended up being overly complicated so has been put on indefinite hold. BUG=620813 ========== to ========== New internal cache (stack) class for recording available memories. This is extracted to its own class for simplicity and ease-of-reuse. It was also originally intended to be a lock-free object but that ended up being overly complicated so has been put on indefinite hold. BUG=620813 Committed: https://crrev.com/4c361942b436b45ddcaab02d6829921b8f3c9e81 Cr-Commit-Position: refs/heads/master@{#416282} ==========
Message was sent while issue was closed.
Patchset 14 (id:??) landed as https://crrev.com/4c361942b436b45ddcaab02d6829921b8f3c9e81 Cr-Commit-Position: refs/heads/master@{#416282} |