Issue 1327063002: [tracing] Add sqlite memory statistics to tracing.

ssid

ssid@chromium.org changed reviewers: + primiano@chromium.org

5 years, 3 months ago (2015-09-08 10:34:03 UTC) #1

Primiano Tucci (use gerrit)

Thanks for doing this Looks generally reasonable. I have some general comments. After which we ...

5 years, 2 months ago (2015-09-30 10:26:56 UTC) #3

ssid

Made changes, ptal. https://codereview.chromium.org/1327063002/diff/60001/sql/BUILD.gn File sql/BUILD.gn (right): https://codereview.chromium.org/1327063002/diff/60001/sql/BUILD.gn#newcode17 sql/BUILD.gn:17: "process_memory_dump_provider.h", On 2015/09/30 10:26:56, Primiano Tucci ...

5 years, 2 months ago (2015-10-02 17:06:10 UTC) #5

Primiano Tucci (use gerrit)

The patch looks good, but before landing would be great if you could figure out ...

5 years, 2 months ago (2015-10-06 16:21:21 UTC) #6

ssid

On 2015/10/06 16:21:21, Primiano Tucci wrote: > The patch looks good, but before landing would ...

5 years, 2 months ago (2015-10-07 09:23:03 UTC) #7

Primiano Tucci (use gerrit)

primiano@chromium.org changed reviewers: + shess@chromium.org

5 years, 2 months ago (2015-10-07 09:38:27 UTC) #8

Primiano Tucci (use gerrit)

Ok LGTM from my side but I am not an OWNER nor too familiar with ...

5 years, 2 months ago (2015-10-07 09:38:27 UTC) #9

Scott Hess - ex-Googler

LGTM, minor comments in the code. On 2015/10/07 09:38:27, Primiano Tucci wrote: > +Scott: essentially ...

5 years, 2 months ago (2015-10-07 23:08:57 UTC) #10

LGTM, minor comments in the code.

On 2015/10/07 09:38:27, Primiano Tucci wrote:
> +Scott: essentially the main question here, beyond the code itself is: are the
> memory stats ssid is reporting here complete? are we missing some source of
> memory-usage or over-counting something?
> The main goal of all of this is to report the memory used by chrome via
sqlite.

They are complete-ish, I'm less sure about how useful they'll be :-).  As
mentioned, I'm landing an mmap change which should greatly change the CACHE
values.  The STMT values I have often wondered about, but the actual values
aren't amazingly useful - what I want an angle on is cache performance (if a
statement is never/seldom hit, I don't care if it doesn't take much space
because that probably means it's cheap to regenerate).  My experience is that
the SCHEMA value is generally small enough that it's hard to care about, and
it's also hard to see what to do about it anyhow.

I don't know if it make sense to reference Sqlite.MemoryKB.* histograms
anywhere, since they track the overall memory used.

https://codereview.chromium.org/1327063002/diff/100001/sql/connection.cc
File sql/connection.cc (right):

https://codereview.chromium.org/1327063002/diff/100001/sql/connection.cc#newc...
sql/connection.cc:239: base::StringPrintf("sqlite/connection_%p", this));
You might consider injecting histogram_tag_, if non-empty.  This is used to
automatically spray stats across Sqlite.<class>.<tag> histograms.

https://codereview.chromium.org/1327063002/diff/100001/sql/sql_memory_dump_pr...
File sql/sql_memory_dump_provider.cc (right):

https://codereview.chromium.org/1327063002/diff/100001/sql/sql_memory_dump_pr...
sql/sql_memory_dump_provider.cc:34: sqlite3_memory_highwater(1 /*resetFlag */));
sqlite3_memory_used() is literally sqlite3_status(SQLITE_STATUS_MEMORY_USED,
...) discarding the highwater mark, and vice versau for _highwater().  Since
you're already doing sqlite3_status() for malloc count below, for consistency
this could/should probably use sqlite3_status also.

Scott Hess - ex-Googler

I don't see my "as mentioned" mention. Sigh. http://crbug.com/489784 . It starts using SQLite's mmap ...

5 years, 2 months ago (2015-10-07 23:11:37 UTC) #11

ssid

On 2015/10/07 23:11:37, Scott Hess wrote: > I don't see my "as mentioned" mention. Sigh. ...

5 years, 2 months ago (2015-10-08 16:53:59 UTC) #14

Primiano Tucci (use gerrit)

https://codereview.chromium.org/1327063002/diff/180001/content/browser/browser_main_loop.cc File content/browser/browser_main_loop.cc (right): https://codereview.chromium.org/1327063002/diff/180001/content/browser/browser_main_loop.cc#newcode666 content/browser/browser_main_loop.cc:666: sql::SqlMemoryDumpProvider::GetInstance()); Don't we need this also in child processes?

5 years, 2 months ago (2015-10-08 17:18:20 UTC) #17

ssid

https://codereview.chromium.org/1327063002/diff/180001/content/browser/browser_main_loop.cc File content/browser/browser_main_loop.cc (right): https://codereview.chromium.org/1327063002/diff/180001/content/browser/browser_main_loop.cc#newcode666 content/browser/browser_main_loop.cc:666: sql::SqlMemoryDumpProvider::GetInstance()); On 2015/10/08 17:18:20, Primiano Tucci wrote: > Don't ...

5 years, 2 months ago (2015-10-08 17:20:17 UTC) #18

Scott Hess - ex-Googler

On 2015/10/08 17:20:17, ssid wrote: > https://codereview.chromium.org/1327063002/diff/180001/content/browser/browser_main_loop.cc > File content/browser/browser_main_loop.cc (right): > > https://codereview.chromium.org/1327063002/diff/180001/content/browser/browser_main_loop.cc#newcode666 > ...

5 years, 2 months ago (2015-10-08 17:41:17 UTC) #19

Primiano Tucci (use gerrit)

On 2015/10/08 17:41:17, Scott Hess wrote: > The only direct SQLite user in children should ...

5 years, 2 months ago (2015-10-08 17:43:14 UTC) #20

Primiano Tucci (use gerrit)

On 2015/10/08 17:41:17, Scott Hess wrote: > The only direct SQLite user in children should ...

5 years, 2 months ago (2015-10-08 17:43:15 UTC) #21

Scott Hess - ex-Googler

On 2015/10/08 16:53:59, ssid wrote: > On 2015/10/07 23:11:37, Scott Hess wrote: > > I ...

5 years, 2 months ago (2015-10-08 17:44:05 UTC) #22

Scott Hess - ex-Googler

On 2015/10/08 17:43:15, Primiano Tucci wrote: > On 2015/10/08 17:41:17, Scott Hess wrote: > > ...

5 years, 2 months ago (2015-10-08 17:50:43 UTC) #24

Primiano Tucci (use gerrit)

On 2015/10/08 17:50:43, Scott Hess wrote: > WebSQL has been deprecated for many years, and ...

5 years, 2 months ago (2015-10-08 17:54:30 UTC) #25

ssid

On 2015/10/08 17:54:30, Primiano Tucci wrote: > On 2015/10/08 17:50:43, Scott Hess wrote: > > ...

5 years, 2 months ago (2015-10-08 17:57:46 UTC) #26

Scott Hess - ex-Googler

On 2015/10/08 17:57:46, ssid wrote: > On 2015/10/08 17:54:30, Primiano Tucci wrote: > > On ...

5 years, 2 months ago (2015-10-08 18:05:41 UTC) #27

ssid

The patchset sent to the CQ was uploaded after l-g-t-m from primiano@chromium.org Link to the ...

5 years, 2 months ago (2015-10-08 20:03:06 UTC) #29

ssid

Filed a bug to track the issue about the renderer databases memory usage. https://code.google.com/p/chromium/issues/detail?id=542333.

5 years, 2 months ago (2015-10-12 16:46:11 UTC) #33

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1327063002/180001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1327063002/180001

5 years, 2 months ago (2015-10-12 16:46:34 UTC) #35

commit-bot: I haz the power

Patchset 7 (id:??) landed as https://crrev.com/9f8022f2aadfd86989231e6d367c2c2b9a53ae4b Cr-Commit-Position: refs/heads/master@{#353549}

5 years, 2 months ago (2015-10-12 17:50:02 UTC) #37

michaeln

michaeln@chromium.org changed reviewers: + michaeln@chromium.org

5 years, 2 months ago (2015-10-15 23:03:28 UTC) #38

michaeln

https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc#newcode222 sql/connection.cc:222: bool Connection::OnMemoryDump(const base::trace_event::MemoryDumpArgs& args, what prevents this method from ...

5 years, 2 months ago (2015-10-15 23:03:28 UTC) #39

Scott Hess - ex-Googler

https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc#newcode222 sql/connection.cc:222: bool Connection::OnMemoryDump(const base::trace_event::MemoryDumpArgs& args, On 2015/10/15 23:03:28, michaeln wrote: ...

5 years, 2 months ago (2015-10-15 23:50:29 UTC) #40

Primiano Tucci (use gerrit)

https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc#newcode222 sql/connection.cc:222: bool Connection::OnMemoryDump(const base::trace_event::MemoryDumpArgs& args, On 2015/10/15 23:50:29, Scott Hess ...

5 years, 2 months ago (2015-10-16 08:37:44 UTC) #41

michaeln

On 2015/10/16 08:37:44, Primiano Tucci wrote: > https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc > File sql/connection.cc (right): > > https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc#newcode222 ...

5 years, 2 months ago (2015-10-16 20:06:43 UTC) #42

Primiano Tucci (use gerrit)

On 2015/10/16 20:06:43, michaeln wrote: > On 2015/10/16 08:37:44, Primiano Tucci wrote: > > https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc ...

5 years, 2 months ago (2015-10-19 14:00:52 UTC) #43

Scott Hess - ex-Googler

On 2015/10/19 14:00:52, Primiano Tucci wrote: > Ok if this is the case we should ...

5 years, 2 months ago (2015-10-19 16:39:35 UTC) #44

michaeln

On 2015/10/19 16:39:35, Scott Hess wrote: > On 2015/10/19 14:00:52, Primiano Tucci wrote: > > ...

5 years, 2 months ago (2015-10-19 19:59:54 UTC) #45

Scott Hess - ex-Googler

On 2015/10/19 19:59:54, michaeln wrote: > On 2015/10/19 16:39:35, Scott Hess wrote: > > On ...

5 years, 2 months ago (2015-10-19 20:07:12 UTC) #46

ssid

On 2015/10/19 20:07:12, Scott Hess wrote: > On 2015/10/19 19:59:54, michaeln wrote: > > On ...

5 years, 2 months ago (2015-10-21 12:26:38 UTC) #47

michaeln

Without something to prevent the sequence of events suggested below, there's definitely a latent bug. ...

5 years, 2 months ago (2015-10-21 20:20:32 UTC) #48

Scott Hess - ex-Googler

On 2015/10/21 20:20:32, michaeln wrote: > The easiest safest change might be to require callers ...

5 years, 2 months ago (2015-10-21 21:01:05 UTC) #49

Primiano Tucci (use gerrit)

On 2015/10/21 21:01:05, Scott Hess wrote: > On 2015/10/21 20:20:32, michaeln wrote: > > The ...

5 years, 1 month ago (2015-10-27 18:47:34 UTC) #50

Message was sent while issue was closed.

On 2015/10/21 21:01:05, Scott Hess wrote:
> On 2015/10/21 20:20:32, michaeln wrote:
> > The easiest safest change might be to require callers provide a
> > sequencedTaskRunner and that they call unreg only on that taskrunner. But
that
> > makes it impossible to instrument sql::Connection directly w/o involving the
> > clients of the class (get them to give you a task runner).
> 
> I was going to suggest shifting the client to a separate object which
> cross-references with the Connection, and then they could have independent
> lifetimes.  BUT, there would still be a race when the reporter wants to
generate
> stats.
> 
> That said, since Connection is generally already in an AssertIOAllowed()
thread,
> having a lock to handle this wouldn't be absurd.  I'm just not sure where
you'd
> put the lock, so it might have to be global.  I guess the init lock might make
> sense, it's only used on open.
> 
> I'm not sure what the ownership situation on that new object would be, though.

Ok I had some time to look at the code.
michaeln is definitely right, this code as it is is racy for the precise reason
explained in #42.
My understanding is that even if we specify a taskrunner affinity for the
OnMemoryDump method, that still won't solve the problem, as there is no
guarantee that a Connection is destroyed on the same thread of the ctor (still
would reduce the chances of the race above).
I don't see how adding a SequenceChecker would solve any problem. It would be
totally different if we were guaranteed that all Connection clients use it on a
SequencedTaskRunner and they respect that (read: a Connection is always created,
accessed and destroyed on the same SequencedTaskRunner). Is that the case
though?
It doesn't look clear to me, from the comments above my guess is that the answer
is: no, it happens to be sequentially-safish, but there is no
SequencedTaskRunner enforcement from the client side (please correct me if I am
wrong here).

If all the statements above are true, the only option that I can see is to:
- Move the dump provider for Connection to its own, RefCountedThreadSafe object,
and having Connection own it and destroy it via PostTask on the a single thread.
- Having a lock shared between the ~Connection dtor and the dump provider above
(+ a flag on the dump provider side to tell whether the connection went away)
- The lock will never be contended unless tracing is disabled.

Otherwise, if we are in the state where Connection can be destroyed on any
thread, and we don't know upfront on which one, I don't see how we can safely
dump its contents.

The main question is, how does this plan sound to you?

Primiano Tucci (use gerrit)

On 2015/10/27 18:47:34, Primiano Tucci wrote: > On 2015/10/21 21:01:05, Scott Hess wrote: > > ...

5 years, 1 month ago (2015-10-27 19:11:52 UTC) #51

Message was sent while issue was closed.

On 2015/10/27 18:47:34, Primiano Tucci wrote:
> On 2015/10/21 21:01:05, Scott Hess wrote:
> > On 2015/10/21 20:20:32, michaeln wrote:
> > > The easiest safest change might be to require callers provide a
> > > sequencedTaskRunner and that they call unreg only on that taskrunner. But
> that
> > > makes it impossible to instrument sql::Connection directly w/o involving
the
> > > clients of the class (get them to give you a task runner).
> > 
> > I was going to suggest shifting the client to a separate object which
> > cross-references with the Connection, and then they could have independent
> > lifetimes.  BUT, there would still be a race when the reporter wants to
> generate
> > stats.
> > 
> > That said, since Connection is generally already in an AssertIOAllowed()
> thread,
> > having a lock to handle this wouldn't be absurd.  I'm just not sure where
> you'd
> > put the lock, so it might have to be global.  I guess the init lock might
make
> > sense, it's only used on open.
> > 
> > I'm not sure what the ownership situation on that new object would be,
though.
> 
> Ok I had some time to look at the code.
> michaeln is definitely right, this code as it is is racy for the precise
reason
> explained in #42.
> My understanding is that even if we specify a taskrunner affinity for the
> OnMemoryDump method, that still won't solve the problem, as there is no
> guarantee that a Connection is destroyed on the same thread of the ctor (still
> would reduce the chances of the race above).
> I don't see how adding a SequenceChecker would solve any problem. It would be
> totally different if we were guaranteed that all Connection clients use it on
a
> SequencedTaskRunner and they respect that (read: a Connection is always
created,
> accessed and destroyed on the same SequencedTaskRunner). Is that the case
> though?
> It doesn't look clear to me, from the comments above my guess is that the
answer
> is: no, it happens to be sequentially-safish, but there is no
> SequencedTaskRunner enforcement from the client side (please correct me if I
am
> wrong here).
> 
> If all the statements above are true, the only option that I can see is to:
> - Move the dump provider for Connection to its own, RefCountedThreadSafe
object,
> and having Connection own it and destroy it via PostTask on the a single
thread.
> - Having a lock shared between the ~Connection dtor and the dump provider
above
> (+ a flag on the dump provider side to tell whether the connection went away)
> - The lock will never be contended unless tracing is disabled.
> 
> Otherwise, if we are in the state where Connection can be destroyed on any
> thread, and we don't know upfront on which one, I don't see how we can safely
> dump its contents.
> 
> The main question is, how does this plan sound to you?

Ok I think I have a slightly better plan.
- I can change the MemoryDumpManager to ensure that all the dump providers
without a task runner affinity happen on a dedicated thread (used only by the
MemoryDumpManager). I need this thread anyways for other reasons.
- All the OnMemoryDump calls for non-taskrunner bound threads will happen on
this thread. This will be the case for Connection().
- In this way I can make the UnregisterDumpProvider call locked. Essentially
~Connection dtor will call MemoryDumpManager::UnregisterDumpProvider as first
thing in the constructor. That will take the internal lock of the MDM and will
prevent that something else tries to OnMemoryDump while we are being destroyed.
Nothing in the ~Connection dtor can possibly depend on a thread which is used
exclusively by MDM, so no deadlocks should happen in this direction.

In the other direction, we have to guarantee that the body of OnMemoryDump never
takes any other lock shared with the UnregisterDumpProvider, but that should be
easy to guarantee as the Unregister call is pretty dumb.

Does all of this make sense? I'll try to think again tomorrow with a fresher
mind to this.

Scott Hess - ex-Googler

On 2015/10/27 19:11:52, Primiano Tucci wrote: > On 2015/10/27 18:47:34, Primiano Tucci wrote: > > ...

5 years, 1 month ago (2015-10-27 19:34:25 UTC) #52

Message was sent while issue was closed.

On 2015/10/27 19:11:52, Primiano Tucci wrote:
> On 2015/10/27 18:47:34, Primiano Tucci wrote:
> > On 2015/10/21 21:01:05, Scott Hess wrote:
> > > On 2015/10/21 20:20:32, michaeln wrote:
> > > > The easiest safest change might be to require callers provide a
> > > > sequencedTaskRunner and that they call unreg only on that taskrunner.
But
> > that
> > > > makes it impossible to instrument sql::Connection directly w/o involving
> the
> > > > clients of the class (get them to give you a task runner).
> > > 
> > > I was going to suggest shifting the client to a separate object which
> > > cross-references with the Connection, and then they could have independent
> > > lifetimes.  BUT, there would still be a race when the reporter wants to
> > generate
> > > stats.
> > > 
> > > That said, since Connection is generally already in an AssertIOAllowed()
> > thread,
> > > having a lock to handle this wouldn't be absurd.  I'm just not sure where
> > you'd
> > > put the lock, so it might have to be global.  I guess the init lock might
> make
> > > sense, it's only used on open.
> > > 
> > > I'm not sure what the ownership situation on that new object would be,
> though.
> > 
> > Ok I had some time to look at the code.
> > michaeln is definitely right, this code as it is is racy for the precise
> reason
> > explained in #42.
> > My understanding is that even if we specify a taskrunner affinity for the
> > OnMemoryDump method, that still won't solve the problem, as there is no
> > guarantee that a Connection is destroyed on the same thread of the ctor
(still
> > would reduce the chances of the race above).
> > I don't see how adding a SequenceChecker would solve any problem. It would
be
> > totally different if we were guaranteed that all Connection clients use it
on
> a
> > SequencedTaskRunner and they respect that (read: a Connection is always
> created,
> > accessed and destroyed on the same SequencedTaskRunner). Is that the case
> > though?
> > It doesn't look clear to me, from the comments above my guess is that the
> answer
> > is: no, it happens to be sequentially-safish, but there is no
> > SequencedTaskRunner enforcement from the client side (please correct me if I
> am
> > wrong here).
> > 
> > If all the statements above are true, the only option that I can see is to:
> > - Move the dump provider for Connection to its own, RefCountedThreadSafe
> object,
> > and having Connection own it and destroy it via PostTask on the a single
> thread.
> > - Having a lock shared between the ~Connection dtor and the dump provider
> above
> > (+ a flag on the dump provider side to tell whether the connection went
away)
> > - The lock will never be contended unless tracing is disabled.
> > 
> > Otherwise, if we are in the state where Connection can be destroyed on any
> > thread, and we don't know upfront on which one, I don't see how we can
safely
> > dump its contents.
> > 
> > The main question is, how does this plan sound to you?
> 
> Ok I think I have a slightly better plan.
> - I can change the MemoryDumpManager to ensure that all the dump providers
> without a task runner affinity happen on a dedicated thread (used only by the
> MemoryDumpManager). I need this thread anyways for other reasons.
> - All the OnMemoryDump calls for non-taskrunner bound threads will happen on
> this thread. This will be the case for Connection().
> - In this way I can make the UnregisterDumpProvider call locked. Essentially
> ~Connection dtor will call MemoryDumpManager::UnregisterDumpProvider as first
> thing in the constructor. That will take the internal lock of the MDM and will
> prevent that something else tries to OnMemoryDump while we are being
destroyed.
> Nothing in the ~Connection dtor can possibly depend on a thread which is used
> exclusively by MDM, so no deadlocks should happen in this direction.
> 
> In the other direction, we have to guarantee that the body of OnMemoryDump
never
> takes any other lock shared with the UnregisterDumpProvider, but that should
be
> easy to guarantee as the Unregister call is pretty dumb.
> 
> Does all of this make sense? I'll try to think again tomorrow with a fresher
> mind to this.

The main concern I have with this is that it's not clear that OnMemoryDump() can
call into a Connection object which may be blocked in ~Connection.  At this time
there are no virtual methods, but it's still a subtle issue.  At minimum it
would probably want to mark all Connection methods off-limits.  The current set
of sqlite3 functions used by OnMemoryDump() should be fine as long as it's all
under the OnMemoryDump() lock preventing ~Connection from reaching Close() (*). 
Of course, any extension to other sqlite3 functions should be carefully
considered, because doing I/O under the lock is probably a bad idea.

(*) It may be that the unregister should happen at the top of CloseInternal().

michaeln

On 2015/10/27 19:11:52, Primiano Tucci wrote: > On 2015/10/27 18:47:34, Primiano Tucci wrote: > > ...

5 years, 1 month ago (2015-10-27 19:57:50 UTC) #53

Message was sent while issue was closed.

On 2015/10/27 19:11:52, Primiano Tucci wrote:
> On 2015/10/27 18:47:34, Primiano Tucci wrote:
> > On 2015/10/21 21:01:05, Scott Hess wrote:
> > > On 2015/10/21 20:20:32, michaeln wrote:
> > > > The easiest safest change might be to require callers provide a
> > > > sequencedTaskRunner and that they call unreg only on that taskrunner.
But
> > that
> > > > makes it impossible to instrument sql::Connection directly w/o involving
> the
> > > > clients of the class (get them to give you a task runner).
> > > 
> > > I was going to suggest shifting the client to a separate object which
> > > cross-references with the Connection, and then they could have independent
> > > lifetimes.  BUT, there would still be a race when the reporter wants to
> > generate
> > > stats.
> > > 
> > > That said, since Connection is generally already in an AssertIOAllowed()
> > thread,
> > > having a lock to handle this wouldn't be absurd.  I'm just not sure where
> > you'd
> > > put the lock, so it might have to be global.  I guess the init lock might
> make
> > > sense, it's only used on open.
> > > 
> > > I'm not sure what the ownership situation on that new object would be,
> though.
> > 
> > Ok I had some time to look at the code.
> > michaeln is definitely right, this code as it is is racy for the precise
> reason
> > explained in #42.
> > My understanding is that even if we specify a taskrunner affinity for the
> > OnMemoryDump method, that still won't solve the problem, as there is no
> > guarantee that a Connection is destroyed on the same thread of the ctor
(still
> > would reduce the chances of the race above).
> > I don't see how adding a SequenceChecker would solve any problem. It would
be
> > totally different if we were guaranteed that all Connection clients use it
on
> a
> > SequencedTaskRunner and they respect that (read: a Connection is always
> created,
> > accessed and destroyed on the same SequencedTaskRunner). Is that the case
> > though?
> > It doesn't look clear to me, from the comments above my guess is that the
> answer
> > is: no, it happens to be sequentially-safish, but there is no
> > SequencedTaskRunner enforcement from the client side (please correct me if I
> am
> > wrong here).

We need to change the interfce between the mdm and the dumpees is some way to
make this work. I see a couple general options.

1) Change the OnDumpMemory(callback) method to be more async in nature and to
invoke a callback when done and avoid registering rawptrs to objects with ill
defined lifetimes. Instead at register time, xfer ownership of a client dumpee
shim with an async OnDumpMemory() method. The MDM can serialize the addition of
that owned object to the set of all dumpees. The shim could be called on a
thread that is fileio safe. The shim could handle any thread hopping details
needed and eventually reply. One kind of reply could be "dumpee gone" and the
MDM could remove the shim from its set and delete it. There could also be a
threadsafe way to unreg, reg returns an id which can be used to unreg later. 

2)Require that dumpees provide a seqtaskrunner and gaurantee that reg/unreg are
called on that sequence. THe MDM would then be requred to call the dumpee on
that seq.

I think the second is hard to do given your trying to graft dumpee behavior onto
existing objects that do not have thread or sequence affinity. The shape of the
first one seems more promising to me.

> > 
> > If all the statements above are true, the only option that I can see is to:
> > - Move the dump provider for Connection to its own, RefCountedThreadSafe
> object,
> > and having Connection own it and destroy it via PostTask on the a single
> thread.
> > - Having a lock shared between the ~Connection dtor and the dump provider
> above
> > (+ a flag on the dump provider side to tell whether the connection went
away)
> > - The lock will never be contended unless tracing is disabled.
> > 
> > Otherwise, if we are in the state where Connection can be destroyed on any
> > thread, and we don't know upfront on which one, I don't see how we can
safely
> > dump its contents.
> > 
> > The main question is, how does this plan sound to you?
> 
> Ok I think I have a slightly better plan.
> - I can change the MemoryDumpManager to ensure that all the dump providers
> without a task runner affinity happen on a dedicated thread (used only by the
> MemoryDumpManager). I need this thread anyways for other reasons.

Do you really need a dedicated thread to tally memory consumption? Can you
explain what the other reasons are? We generally try  to avoid one-off threads.

> - All the OnMemoryDump calls for non-taskrunner bound threads will happen on
> this thread. This will be the case for Connection().
> - In this way I can make the UnregisterDumpProvider call locked. Essentially
> ~Connection dtor will call MemoryDumpManager::UnregisterDumpProvider as first
> thing in the constructor. That will take the internal lock of the MDM and will
> prevent that something else tries to OnMemoryDump while we are being
destroyed.
> Nothing in the ~Connection dtor can possibly depend on a thread which is used
> exclusively by MDM, so no deadlocks should happen in this direction.

I think that would eliminate the raciness, but may open us up to janking the UI
or IO threads on disk io? I think sqlite internally does locking to accomodate
multi-threaded use of an open sqlite3 db handle. A call to mdm reg/unreg could
end up blocking on OnDump which is blocking on sqlite to finish reading or
writing something.
 
> In the other direction, we have to guarantee that the body of OnMemoryDump
never
> takes any other lock shared with the UnregisterDumpProvider, but that should
be
> easy to guarantee as the Unregister call is pretty dumb.
> 
> Does all of this make sense? I'll try to think again tomorrow with a fresher
> mind to this.

michaeln

> 1) Change the OnDumpMemory(callback) method to be more async in nature and to > ...

5 years, 1 month ago (2015-10-27 20:50:18 UTC) #54

ssid

5 years, 1 month ago (2015-11-02 19:36:47 UTC) #55

Message was sent while issue was closed.

> The main concern I have with this is that it's not clear that OnMemoryDump()
can
> call into a Connection object which may be blocked in ~Connection.  At this
time
> there are no virtual methods, but it's still a subtle issue.  At minimum it
> would probably want to mark all Connection methods off-limits.  The current
set
> of sqlite3 functions used by OnMemoryDump() should be fine as long as it's all
> under the OnMemoryDump() lock preventing ~Connection from reaching Close()
(*). 
> Of course, any extension to other sqlite3 functions should be carefully
> considered, because doing I/O under the lock is probably a bad idea.
> 
> (*) It may be that the unregister should happen at the top of CloseInternal().


Yes, I think it is better to unregister at CloseInternal. Since this would block
exactly what we want (database gettin destroyed when dump happens) instead of
blocking ~Connection.

On 2015/10/27 19:57:50, michaeln wrote:
> On 2015/10/27 19:11:52, Primiano Tucci wrote:
> > On 2015/10/27 18:47:34, Primiano Tucci wrote:
> > > On 2015/10/21 21:01:05, Scott Hess wrote:
> > > > On 2015/10/21 20:20:32, michaeln wrote:
> > > > > The easiest safest change might be to require callers provide a
> > > > > sequencedTaskRunner and that they call unreg only on that taskrunner.
> But
> > > that
> > > > > makes it impossible to instrument sql::Connection directly w/o
involving
> > the
> > > > > clients of the class (get them to give you a task runner).
> > > > 
> > > > I was going to suggest shifting the client to a separate object which
> > > > cross-references with the Connection, and then they could have
independent
> > > > lifetimes.  BUT, there would still be a race when the reporter wants to
> > > generate
> > > > stats.
> > > > 
> > > > That said, since Connection is generally already in an AssertIOAllowed()
> > > thread,
> > > > having a lock to handle this wouldn't be absurd.  I'm just not sure
where
> > > you'd
> > > > put the lock, so it might have to be global.  I guess the init lock
might
> > make
> > > > sense, it's only used on open.
> > > > 
> > > > I'm not sure what the ownership situation on that new object would be,
> > though.
> > > 
> > > Ok I had some time to look at the code.
> > > michaeln is definitely right, this code as it is is racy for the precise
> > reason
> > > explained in #42.
> > > My understanding is that even if we specify a taskrunner affinity for the
> > > OnMemoryDump method, that still won't solve the problem, as there is no
> > > guarantee that a Connection is destroyed on the same thread of the ctor
> (still
> > > would reduce the chances of the race above).
> > > I don't see how adding a SequenceChecker would solve any problem. It would
> be
> > > totally different if we were guaranteed that all Connection clients use it
> on
> > a
> > > SequencedTaskRunner and they respect that (read: a Connection is always
> > created,
> > > accessed and destroyed on the same SequencedTaskRunner). Is that the case
> > > though?
> > > It doesn't look clear to me, from the comments above my guess is that the
> > answer
> > > is: no, it happens to be sequentially-safish, but there is no
> > > SequencedTaskRunner enforcement from the client side (please correct me if
I
> > am
> > > wrong here).
> 
> We need to change the interfce between the mdm and the dumpees is some way to
> make this work. I see a couple general options.
> 
> 1) Change the OnDumpMemory(callback) method to be more async in nature and to
> invoke a callback when done and avoid registering rawptrs to objects with ill
> defined lifetimes. Instead at register time, xfer ownership of a client dumpee
> shim with an async OnDumpMemory() method. The MDM can serialize the addition
of
> that owned object to the set of all dumpees. The shim could be called on a
> thread that is fileio safe. The shim could handle any thread hopping details
> needed and eventually reply. One kind of reply could be "dumpee gone" and the
> MDM could remove the shim from its set and delete it. There could also be a
> threadsafe way to unreg, reg returns an id which can be used to unreg later. 
> 
> 2)Require that dumpees provide a seqtaskrunner and gaurantee that reg/unreg
are
> called on that sequence. THe MDM would then be requred to call the dumpee on
> that seq.
> 
> I think the second is hard to do given your trying to graft dumpee behavior
onto
> existing objects that do not have thread or sequence affinity. The shape of
the
> first one seems more promising to me.
> 
> > > 
> > > If all the statements above are true, the only option that I can see is
to:
> > > - Move the dump provider for Connection to its own, RefCountedThreadSafe
> > object,
> > > and having Connection own it and destroy it via PostTask on the a single
> > thread.
> > > - Having a lock shared between the ~Connection dtor and the dump provider
> > above
> > > (+ a flag on the dump provider side to tell whether the connection went
> away)
> > > - The lock will never be contended unless tracing is disabled.
> > > 
> > > Otherwise, if we are in the state where Connection can be destroyed on any
> > > thread, and we don't know upfront on which one, I don't see how we can
> safely
> > > dump its contents.
> > > 
> > > The main question is, how does this plan sound to you?
> > 
> > Ok I think I have a slightly better plan.
> > - I can change the MemoryDumpManager to ensure that all the dump providers
> > without a task runner affinity happen on a dedicated thread (used only by
the
> > MemoryDumpManager). I need this thread anyways for other reasons.
> 
> Do you really need a dedicated thread to tally memory consumption? Can you
> explain what the other reasons are? We generally try  to avoid one-off
threads.
> 

The dedicated thread is needed because when tracing is enabled the ui thread is
janked since the dump providers perform heavy operations. This has already been
done.

> > - All the OnMemoryDump calls for non-taskrunner bound threads will happen on
> > this thread. This will be the case for Connection().
> > - In this way I can make the UnregisterDumpProvider call locked. Essentially
> > ~Connection dtor will call MemoryDumpManager::UnregisterDumpProvider as
first
> > thing in the constructor. That will take the internal lock of the MDM and
will
> > prevent that something else tries to OnMemoryDump while we are being
> destroyed.
> > Nothing in the ~Connection dtor can possibly depend on a thread which is
used
> > exclusively by MDM, so no deadlocks should happen in this direction.
> 
> I think that would eliminate the raciness, but may open us up to janking the
UI
> or IO threads on disk io? I think sqlite internally does locking to accomodate
> multi-threaded use of an open sqlite3 db handle. A call to mdm reg/unreg could
> end up blocking on OnDump which is blocking on sqlite to finish reading or
> writing something.

Since we now have a dedicated thread for memory-infra dumpers, the janking due
to dump on main threads is avoided. Yes it is true that ~Connection can be
blocked by onDump call which is doing a read or write operation. This cannot be
avoided in any case. This could happen even if ~Connection is called after a
insert into table, due to internal locking of sqlite3. This can't be avoided
since the user is trying to take memory traces, which inherently slows down the
performance. This only adds a wrapper to internal locks, and does not cause a
lot of extra janking. Also as shess@ pointed out, the sqlite functions currently
called do not use io operations.

>  
> > In the other direction, we have to guarantee that the body of OnMemoryDump
> never
> > takes any other lock shared with the UnregisterDumpProvider, but that should
> be
> > easy to guarantee as the Unregister call is pretty dumb.
> > 
> > Does all of this make sense? I'll try to think again tomorrow with a fresher
> > mind to this.

Issue 1327063002: [tracing] Add sqlite memory statistics to tracing. (Closed)

Description

Patch Set 1 #

Patch Set 2 : Nits. #

Patch Set 3 : Add test #

Patch Set 4 : Nits. #

Patch Set 5 : Fixes. #

Patch Set 6 : Fixes. #

Patch Set 7 : Rebase. #

Messages