|
|
Description[tracing] Add sqlite memory statistics to tracing.
The process-wide memory usage statistics of sqlite library is added to
chrome://tracing. The memory usage of sqlite library is mainly through
sqlite_malloc. The total usage of the process is recorded by
sqlite3_memory_used() api. This CL also adds per-connection memory
usage to tracing. Each connection uses memory for cache, schema and
statement, and these usages are recorded. sqlit3_malloc uses malloc
internally to allocate memory. So, thie memory is traced as
sub-allocation from system_allocator(malloc).
This CL lets us keep track of sqlite memory usage in chrome telemetry.
BUG=466141
Committed: https://crrev.com/9f8022f2aadfd86989231e6d367c2c2b9a53ae4b
Cr-Commit-Position: refs/heads/master@{#353549}
Patch Set 1 #Patch Set 2 : Nits. #Patch Set 3 : Add test #Patch Set 4 : Nits. #
Total comments: 21
Patch Set 5 : Fixes. #
Total comments: 4
Patch Set 6 : Fixes. #Patch Set 7 : Rebase. #
Total comments: 5
Messages
Total messages: 55 (12 generated)
ssid@chromium.org changed reviewers: + primiano@chromium.org
PTAL, thanks
Thanks for doing this Looks generally reasonable. I have some general comments. After which we will need a round from the owners. Also, can you please expand the cl description, describing what you keep track of here? https://codereview.chromium.org/1327063002/diff/60001/sql/BUILD.gn File sql/BUILD.gn (right): https://codereview.chromium.org/1327063002/diff/60001/sql/BUILD.gn#newcode17 sql/BUILD.gn:17: "process_memory_dump_provider.h", maybe remove process_ from the file names? Makes it a bit misleading. https://codereview.chromium.org/1327063002/diff/60001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/1327063002/diff/60001/sql/connection.cc#newco... sql/connection.cc:229: sqlite3_db_status(db_, SQLITE_DBSTATUS_CACHE_USED, &cache_size, &dummy_int, why not dumping also the high watermarks (your dummy_int) since you're here? https://codereview.chromium.org/1327063002/diff/60001/sql/connection.h File sql/connection.h (right): https://codereview.chromium.org/1327063002/diff/60001/sql/connection.h#newcod... sql/connection.h:107: private: Just to have a rough idea, how many connections can we have open? Is it a few, dozens or hundreds? https://codereview.chromium.org/1327063002/diff/60001/sql/connection_unittest.cc File sql/connection_unittest.cc (right): https://codereview.chromium.org/1327063002/diff/60001/sql/connection_unittest... sql/connection_unittest.cc:1307: EXPECT_NE(0u, pmd.allocator_dumps().size()); Doesn't make a huge difference, but probably more readable if you say _GE(0u) https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... File sql/process_memory_dump_provider.cc (right): https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... sql/process_memory_dump_provider.cc:30: &dummy_high_water, 0 /* resetFlag */); ditto: report also the high water mark. Also, dummy_high_water is a very debatable variable name. Perhaps unused_max, or unused_high_watermark (But I'd just use and report it) https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... sql/process_memory_dump_provider.cc:49: dump->AddScalar("sqlite_malloc_size", sqlite_ prefix here seems redundant. This is already in the contexct of a sqlite dump https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... sql/process_memory_dump_provider.cc:52: dump->AddScalar("high_water_malloc", I'd call this malloc_high_wmark_size if you to keep the column name short https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... sql/process_memory_dump_provider.cc:61: base::trace_event::MemoryAllocatorDump::kUnitsObjects, Hmm not sure the number of pages is really relevant. Is there any way to get their size? https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... sql/process_memory_dump_provider.cc:78: pmd->AddSuballocation(dump->guid(), system_allocator_name); Are you sure that all the scratch + pagecache are sub-allocated from malloc? By reading the comments looks like only the "malloc" part is malloc-allocated, and for the rest it might have a page allocator (mmap) If unsure ask the OWNERS, they'll know more than myself. https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... File sql/process_memory_dump_provider.h (right): https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... sql/process_memory_dump_provider.h:14: class SQL_EXPORT ProcessMemoryDumpProvider Can you please add a comment explaining what this is dumping? HOw is that different from the per-connection MDP ?
Patchset #5 (id:80001) has been deleted
Made changes, ptal. https://codereview.chromium.org/1327063002/diff/60001/sql/BUILD.gn File sql/BUILD.gn (right): https://codereview.chromium.org/1327063002/diff/60001/sql/BUILD.gn#newcode17 sql/BUILD.gn:17: "process_memory_dump_provider.h", On 2015/09/30 10:26:56, Primiano Tucci wrote: > maybe remove process_ from the file names? Makes it a bit misleading. It was process wide stats, so added process_. removed it. https://codereview.chromium.org/1327063002/diff/60001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/1327063002/diff/60001/sql/connection.cc#newco... sql/connection.cc:229: sqlite3_db_status(db_, SQLITE_DBSTATUS_CACHE_USED, &cache_size, &dummy_int, On 2015/09/30 10:26:56, Primiano Tucci wrote: > why not dumping also the high watermarks (your dummy_int) since you're here? The high water mark is not tracked by the sqlite library for these parameters. It always returns 0 according to api definition. That is why dummy_int (it is unusable). https://codereview.chromium.org/1327063002/diff/60001/sql/connection.h File sql/connection.h (right): https://codereview.chromium.org/1327063002/diff/60001/sql/connection.h#newcod... sql/connection.h:107: private: On 2015/09/30 10:26:56, Primiano Tucci wrote: > Just to have a rough idea, how many connections can we have open? > Is it a few, dozens or hundreds? It is usually less than 2 dozens. https://codereview.chromium.org/1327063002/diff/60001/sql/connection_unittest.cc File sql/connection_unittest.cc (right): https://codereview.chromium.org/1327063002/diff/60001/sql/connection_unittest... sql/connection_unittest.cc:1307: EXPECT_NE(0u, pmd.allocator_dumps().size()); On 2015/09/30 10:26:56, Primiano Tucci wrote: > Doesn't make a huge difference, but probably more readable if you say _GE(0u) Done. https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... File sql/process_memory_dump_provider.cc (right): https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... sql/process_memory_dump_provider.cc:30: &dummy_high_water, 0 /* resetFlag */); On 2015/09/30 10:26:56, Primiano Tucci wrote: > ditto: report also the high water mark. > Also, dummy_high_water is a very debatable variable name. > Perhaps unused_max, or unused_high_watermark (But I'd just use and report it) I was actually dumping the high water using different method. Turns out that there are 2 methods to get the total size. using that now instead of this status. Verified both returns the same value. https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... sql/process_memory_dump_provider.cc:49: dump->AddScalar("sqlite_malloc_size", On 2015/09/30 10:26:56, Primiano Tucci wrote: > sqlite_ prefix here seems redundant. This is already in the contexct of a sqlite > dump removed this since it was actually the total. https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... sql/process_memory_dump_provider.cc:52: dump->AddScalar("high_water_malloc", On 2015/09/30 10:26:56, Primiano Tucci wrote: > I'd call this malloc_high_wmark_size if you to keep the column name short Done. https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... sql/process_memory_dump_provider.cc:61: base::trace_event::MemoryAllocatorDump::kUnitsObjects, On 2015/09/30 10:26:56, Primiano Tucci wrote: > Hmm not sure the number of pages is really relevant. Is there any way to get > their size? hm no. I don't find any way to get the sizes. I removed them now. Scratch is allocated from the scratch buffer passed by chrome and chrome does not set the buffer, so it is always empty. Cache sizes are added by each connection, and there is no api to get the total cache size, so removing it. https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... sql/process_memory_dump_provider.cc:78: pmd->AddSuballocation(dump->guid(), system_allocator_name); On 2015/09/30 10:26:56, Primiano Tucci wrote: > Are you sure that all the scratch + pagecache are sub-allocated from malloc? > By reading the comments looks like only the "malloc" part is malloc-allocated, > and for the rest it might have a page allocator (mmap) > If unsure ask the OWNERS, they'll know more than myself. Sorry about the scratch, I misunderstood. Yes sqlite3_malloc() uses malloc internally. I verified. https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... File sql/process_memory_dump_provider.h (right): https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... sql/process_memory_dump_provider.h:14: class SQL_EXPORT ProcessMemoryDumpProvider On 2015/09/30 10:26:56, Primiano Tucci wrote: > Can you please add a comment explaining what this is dumping? HOw is that > different from the per-connection MDP ? Done.
The patch looks good, but before landing would be great if you could figure out (ping some owner) where the page cache is allocated from and if there is a way to get the size. https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... File sql/process_memory_dump_provider.cc (right): https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... sql/process_memory_dump_provider.cc:61: base::trace_event::MemoryAllocatorDump::kUnitsObjects, On 2015/10/02 17:06:10, ssid wrote: > On 2015/09/30 10:26:56, Primiano Tucci wrote: > > Hmm not sure the number of pages is really relevant. Is there any way to get > > their size? > > hm no. I don't find any way to get the sizes. I removed them now. Maybe check with OWNERS. If that is actually using memory feels pretty bad not reporting that.
On 2015/10/06 16:21:21, Primiano Tucci wrote: > The patch looks good, but before landing would be great if you could figure out > (ping some owner) where the page cache is allocated from and if there is a way > to get the size. > > https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... > File sql/process_memory_dump_provider.cc (right): > > https://codereview.chromium.org/1327063002/diff/60001/sql/process_memory_dump... > sql/process_memory_dump_provider.cc:61: > base::trace_event::MemoryAllocatorDump::kUnitsObjects, > On 2015/10/02 17:06:10, ssid wrote: > > On 2015/09/30 10:26:56, Primiano Tucci wrote: > > > Hmm not sure the number of pages is really relevant. Is there any way to get > > > their size? > > > > hm no. I don't find any way to get the sizes. I removed them now. > > Maybe check with OWNERS. If that is actually using memory feels pretty bad not > reporting that. The cache usage per connection is reported, I am guessing this is only the total of them. I will check with the owners.
primiano@chromium.org changed reviewers: + shess@chromium.org
Ok LGTM from my side but I am not an OWNER nor too familiar with sqlite. Would be nice to get a confirmation from an owner that we are reporting the right numbers here. Let's see if we can spot a +shess into the wild :) +Scott: essentially the main question here, beyond the code itself is: are the memory stats ssid is reporting here complete? are we missing some source of memory-usage or over-counting something? The main goal of all of this is to report the memory used by chrome via sqlite. Thanks
LGTM, minor comments in the code. On 2015/10/07 09:38:27, Primiano Tucci wrote: > +Scott: essentially the main question here, beyond the code itself is: are the > memory stats ssid is reporting here complete? are we missing some source of > memory-usage or over-counting something? > The main goal of all of this is to report the memory used by chrome via sqlite. They are complete-ish, I'm less sure about how useful they'll be :-). As mentioned, I'm landing an mmap change which should greatly change the CACHE values. The STMT values I have often wondered about, but the actual values aren't amazingly useful - what I want an angle on is cache performance (if a statement is never/seldom hit, I don't care if it doesn't take much space because that probably means it's cheap to regenerate). My experience is that the SCHEMA value is generally small enough that it's hard to care about, and it's also hard to see what to do about it anyhow. I don't know if it make sense to reference Sqlite.MemoryKB.* histograms anywhere, since they track the overall memory used. https://codereview.chromium.org/1327063002/diff/100001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/1327063002/diff/100001/sql/connection.cc#newc... sql/connection.cc:239: base::StringPrintf("sqlite/connection_%p", this)); You might consider injecting histogram_tag_, if non-empty. This is used to automatically spray stats across Sqlite.<class>.<tag> histograms. https://codereview.chromium.org/1327063002/diff/100001/sql/sql_memory_dump_pr... File sql/sql_memory_dump_provider.cc (right): https://codereview.chromium.org/1327063002/diff/100001/sql/sql_memory_dump_pr... sql/sql_memory_dump_provider.cc:34: sqlite3_memory_highwater(1 /*resetFlag */)); sqlite3_memory_used() is literally sqlite3_status(SQLITE_STATUS_MEMORY_USED, ...) discarding the highwater mark, and vice versau for _highwater(). Since you're already doing sqlite3_status() for malloc count below, for consistency this could/should probably use sqlite3_status also.
I don't see my "as mentioned" mention. Sigh. http://crbug.com/489784 . It starts using SQLite's mmap support, which I've configured in a way which should mostly shift all of the CACHE usage to the filesystem buffers. CACHE is still used during transactions, and we have some code which uses long-running transactions for various reasons. I have plans for that code, too, but that's longer term.
Patchset #7 (id:140001) has been deleted
Patchset #7 (id:160001) has been deleted
On 2015/10/07 23:11:37, Scott Hess wrote: > I don't see my "as mentioned" mention. Sigh. http://crbug.com/489784 . It > starts using SQLite's mmap support, which I've configured in a way which should > mostly shift all of the CACHE usage to the filesystem buffers. > > CACHE is still used during transactions, and we have some code which uses > long-running transactions for various reasons. I have plans for that code, too, > but that's longer term. Thanks. I see from your CL that sql is going to use mmap for file IO, which accounts for a lot of private clean memory in browser, since the limit is set to 256Mb. Not sure if it is really useful to trace. https://codereview.chromium.org/1327063002/diff/100001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/1327063002/diff/100001/sql/connection.cc#newc... sql/connection.cc:239: base::StringPrintf("sqlite/connection_%p", this)); On 2015/10/07 23:08:57, Scott Hess wrote: > You might consider injecting histogram_tag_, if non-empty. This is used to > automatically spray stats across Sqlite.<class>.<tag> histograms. Done thanks. https://codereview.chromium.org/1327063002/diff/100001/sql/sql_memory_dump_pr... File sql/sql_memory_dump_provider.cc (right): https://codereview.chromium.org/1327063002/diff/100001/sql/sql_memory_dump_pr... sql/sql_memory_dump_provider.cc:34: sqlite3_memory_highwater(1 /*resetFlag */)); On 2015/10/07 23:08:57, Scott Hess wrote: > sqlite3_memory_used() is literally sqlite3_status(SQLITE_STATUS_MEMORY_USED, > ...) discarding the highwater mark, and vice versau for _highwater(). Since > you're already doing sqlite3_status() for malloc count below, for consistency > this could/should probably use sqlite3_status also. I see. I thought getting them separately was faster for some reason. Changed it.
ssid@chromium.org changed reviewers: + avi@chromium.org
+avi: For a 2 line change in browser_main_loop.cc. PTAL.
https://codereview.chromium.org/1327063002/diff/180001/content/browser/browse... File content/browser/browser_main_loop.cc (right): https://codereview.chromium.org/1327063002/diff/180001/content/browser/browse... content/browser/browser_main_loop.cc:666: sql::SqlMemoryDumpProvider::GetInstance()); Don't we need this also in child processes?
https://codereview.chromium.org/1327063002/diff/180001/content/browser/browse... File content/browser/browser_main_loop.cc (right): https://codereview.chromium.org/1327063002/diff/180001/content/browser/browse... content/browser/browser_main_loop.cc:666: sql::SqlMemoryDumpProvider::GetInstance()); On 2015/10/08 17:18:20, Primiano Tucci wrote: > Don't we need this also in child processes? No, there are no connections opened in child processes as far as I have seen. It connects to databases via IPC. ("content/common/database_messages.h")
On 2015/10/08 17:20:17, ssid wrote: > https://codereview.chromium.org/1327063002/diff/180001/content/browser/browse... > File content/browser/browser_main_loop.cc (right): > > https://codereview.chromium.org/1327063002/diff/180001/content/browser/browse... > content/browser/browser_main_loop.cc:666: > sql::SqlMemoryDumpProvider::GetInstance()); > On 2015/10/08 17:18:20, Primiano Tucci wrote: > > Don't we need this also in child processes? > > No, there are no connections opened in child processes as far as I have seen. It > connects to databases via IPC. ("content/common/database_messages.h") The only direct SQLite user in children should be WebSQL. That works by shipping file handles across IPC, so it would still have the same memory-usage issues. But I'm not sure WebSQL is worth wiring this in to track. It's possible that non-renderer non-browser utility processes could use SQLite, but most databases are opened in a locking mode which would prevent that from working. I'm not aware of other processes using it.
On 2015/10/08 17:41:17, Scott Hess wrote: > The only direct SQLite user in children should be WebSQL. That works by > shipping file handles across IPC, so it would still have the same memory-usage > issues. But I'm not sure WebSQL is worth wiring this in to track. Well but on the other side we are talking about adding two extra lines to content_main to get more complete data? Or would we need extra wiring for WebSQL?
On 2015/10/08 17:41:17, Scott Hess wrote: > The only direct SQLite user in children should be WebSQL. That works by > shipping file handles across IPC, so it would still have the same memory-usage > issues. But I'm not sure WebSQL is worth wiring this in to track. Well but on the other side we are talking about adding two extra lines to content_main to get more complete data? Or would we need extra wiring for WebSQL?
On 2015/10/08 16:53:59, ssid wrote: > On 2015/10/07 23:11:37, Scott Hess wrote: > > I don't see my "as mentioned" mention. Sigh. http://crbug.com/489784 . It > > starts using SQLite's mmap support, which I've configured in a way which > should > > mostly shift all of the CACHE usage to the filesystem buffers. > > > > CACHE is still used during transactions, and we have some code which uses > > long-running transactions for various reasons. I have plans for that code, > too, > > but that's longer term. > > Thanks. > I see from your CL that sql is going to use mmap for file IO, which accounts for > a lot of private clean memory in browser, since the limit is set to 256Mb. Not > sure if it is really useful to trace. 256MB per connection, and our 99th percentile on db size is like 50MB. Effectively all of our cache usage will be removed. I mostly mention it because if the change sticks, it's going to make a big change to the metrics. Likewise if the change ends up being reverted, or if it has to be selectively reverted due to some sort of platform-specific issue.
lgtm
On 2015/10/08 17:43:15, Primiano Tucci wrote: > On 2015/10/08 17:41:17, Scott Hess wrote: > > The only direct SQLite user in children should be WebSQL. That works by > > shipping file handles across IPC, so it would still have the same memory-usage > > issues. But I'm not sure WebSQL is worth wiring this in to track. > > Well but on the other side we are talking about adding two extra lines to > content_main to get more complete data? > Or would we need extra wiring for WebSQL? WebSQL has been deprecated for many years, and is used by very few sites. Rather than ask "How hard is it to wire in?", I'd rather ask "What value is there to wiring it in?"
On 2015/10/08 17:50:43, Scott Hess wrote: > WebSQL has been deprecated for many years, and is used by very few sites. > Rather than ask "How hard is it to wire in?", I'd rather ask "What value is > there to wiring it in?" The value from my side is: if there is a site out there which, for whatever reason, ends up making some massive use of WebSQL we have a way to say "aha" rather than rambling in the dark.
On 2015/10/08 17:54:30, Primiano Tucci wrote: > On 2015/10/08 17:50:43, Scott Hess wrote: > > WebSQL has been deprecated for many years, and is used by very few sites. > > Rather than ask "How hard is it to wire in?", I'd rather ask "What value is > > there to wiring it in?" > > The value from my side is: if there is a site out there which, for whatever > reason, ends up making some massive use of WebSQL we have a way to say "aha" > rather than rambling in the dark. The WebSQL as I can see communicates with the browser databases and will be accounted inside browser process. ("components/webdata/common/web_database.cc").
On 2015/10/08 17:57:46, ssid wrote: > On 2015/10/08 17:54:30, Primiano Tucci wrote: > > On 2015/10/08 17:50:43, Scott Hess wrote: > > > WebSQL has been deprecated for many years, and is used by very few sites. > > > Rather than ask "How hard is it to wire in?", I'd rather ask "What value is > > > there to wiring it in?" > > > > The value from my side is: if there is a site out there which, for whatever > > reason, ends up making some massive use of WebSQL we have a way to say "aha" > > rather than rambling in the dark. > > The WebSQL as I can see communicates with the browser databases and will be > accounted inside browser process. ("components/webdata/common/web_database.cc"). That's meta-information about databases. The actual databases are in at: third_party/WebKit/Source/modules/webdatabase/sqlite/SQLiteFileSystem.cpp and SQLiteFileSystemPosix.cpp/SQLiteFileSystemWin.cpp. In any case, the code uses SQLite APIs directly, not our sql::Connection wrapper, so I guess wiring it in isn't as simple as a few lines in the renderer startup.
The CQ bit was checked by ssid@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from primiano@chromium.org Link to the patchset: https://codereview.chromium.org/1327063002/#ps180001 (title: "Rebase.")
The CQ bit was unchecked by ssid@chromium.org
The CQ bit was unchecked by ssid@chromium.org
lgtm stampity stamp
Filed a bug to track the issue about the renderer databases memory usage. https://code.google.com/p/chromium/issues/detail?id=542333.
The CQ bit was checked by ssid@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1327063002/180001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1327063002/180001
Message was sent while issue was closed.
Committed patchset #7 (id:180001)
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/9f8022f2aadfd86989231e6d367c2c2b9a53ae4b Cr-Commit-Position: refs/heads/master@{#353549}
Message was sent while issue was closed.
michaeln@chromium.org changed reviewers: + michaeln@chromium.org
Message was sent while issue was closed.
https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc#newc... sql/connection.cc:222: bool Connection::OnMemoryDump(const base::trace_event::MemoryDumpArgs& args, what prevents this method from being called after ~Connection has happened?
Message was sent while issue was closed.
https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc#newc... sql/connection.cc:222: bool Connection::OnMemoryDump(const base::trace_event::MemoryDumpArgs& args, On 2015/10/15 23:03:28, michaeln wrote: > what prevents this method from being called after ~Connection has happened? ~Connection unregisters, and the !db_ test below should handle cases where the db closed but the instance is still live. [I should have suggested is_open() there, but they mean the same thing at this time.]
Message was sent while issue was closed.
https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc#newc... sql/connection.cc:222: bool Connection::OnMemoryDump(const base::trace_event::MemoryDumpArgs& args, On 2015/10/15 23:50:29, Scott Hess wrote: > On 2015/10/15 23:03:28, michaeln wrote: > > what prevents this method from being called after ~Connection has happened? As long as the dtor happens on the same thread of the ctor, the OnMemoryDump is guaranteed to not be called adter the UnregisterDumpProvider call (which happens on the dtor)
Message was sent while issue was closed.
On 2015/10/16 08:37:44, Primiano Tucci wrote: > https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc > File sql/connection.cc (right): > > https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc#newc... > sql/connection.cc:222: bool Connection::OnMemoryDump(const > base::trace_event::MemoryDumpArgs& args, > On 2015/10/15 23:50:29, Scott Hess wrote: > > On 2015/10/15 23:03:28, michaeln wrote: > > > what prevents this method from being called after ~Connection has happened? > > As long as the dtor happens on the same thread of the ctor, the OnMemoryDump is > guaranteed to not be called adter the UnregisterDumpProvider call (which happens > on the dtor) The sql::Connection classes don't register with thread affinity and don't necessarily get deleted on the same thread they were created on. So Reg(), Unreg(), and Dump() can on different threads. ThreadDump() { { AutoLock lock(lock_); // compute skip_dump while holding the lock_ skip_dump = is_unregistred; } ???? what prevents a different thread from entering Unreg() and deleting the provider between execution of the block above and the block below ???? // Invoke the dump provider without holding the |lock_|. if (!skip_dump) { MemoryDumpArgs args = {pmd_async_state->req_args.level_of_detail}; dump_successful = mdp->OnMemoryDump(args, &pmd_async_state->process_memory_dump); }
Message was sent while issue was closed.
On 2015/10/16 20:06:43, michaeln wrote: > On 2015/10/16 08:37:44, Primiano Tucci wrote: > > https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc > > File sql/connection.cc (right): > > > > > https://codereview.chromium.org/1327063002/diff/180001/sql/connection.cc#newc... > > sql/connection.cc:222: bool Connection::OnMemoryDump(const > > base::trace_event::MemoryDumpArgs& args, > > On 2015/10/15 23:50:29, Scott Hess wrote: > > > On 2015/10/15 23:03:28, michaeln wrote: > > > > what prevents this method from being called after ~Connection has > happened? > > > > As long as the dtor happens on the same thread of the ctor, the OnMemoryDump > is > > guaranteed to not be called adter the UnregisterDumpProvider call (which > happens > > on the dtor) > > The sql::Connection classes don't register with thread affinity and don't > necessarily get deleted on the same thread they were created on. So Reg(), > Unreg(), and Dump() can on different threads. > > ThreadDump() { > { > AutoLock lock(lock_); > // compute skip_dump while holding the lock_ > skip_dump = is_unregistred; > } > > ???? > what prevents a different thread from entering Unreg() and deleting the > provider > between execution of the block above and the block below > ???? > > // Invoke the dump provider without holding the |lock_|. > if (!skip_dump) { > MemoryDumpArgs args = {pmd_async_state->req_args.level_of_detail}; > dump_successful = > mdp->OnMemoryDump(args, &pmd_async_state->process_memory_dump); > } Ok if this is the case we should revert this CL until we get the threading right.
Message was sent while issue was closed.
On 2015/10/19 14:00:52, Primiano Tucci wrote: > Ok if this is the case we should revert this CL until we get the threading > right. sql::Connection is not intended to be thread-safe. Unfortunately, by the time it occurred to anyone to add ThreadChecker annotations, clever coders had already used it in ways which made ThreadChecker less easy to add in. Unfortunately, while I believe that the cross-thread uses _are_ predominantly thread-safe (a PostTask is between uses on different threads), there's nothing in there to enforce it.
Message was sent while issue was closed.
On 2015/10/19 16:39:35, Scott Hess wrote: > On 2015/10/19 14:00:52, Primiano Tucci wrote: > > Ok if this is the case we should revert this CL until we get the threading > > right. > > sql::Connection is not intended to be thread-safe. Unfortunately, by the time > it occurred to anyone to add ThreadChecker annotations, clever coders had > already used it in ways which made ThreadChecker less easy to add in. > > Unfortunately, while I believe that the cross-thread uses _are_ predominantly > thread-safe (a PostTask is between uses on different threads), there's nothing > in there to enforce it. see base::SequenceChecker, it can verify both single-threaded'ness and single-sequence'ness
Message was sent while issue was closed.
On 2015/10/19 19:59:54, michaeln wrote: > On 2015/10/19 16:39:35, Scott Hess wrote: > > On 2015/10/19 14:00:52, Primiano Tucci wrote: > > > Ok if this is the case we should revert this CL until we get the threading > > > right. > > > > sql::Connection is not intended to be thread-safe. Unfortunately, by the time > > it occurred to anyone to add ThreadChecker annotations, clever coders had > > already used it in ways which made ThreadChecker less easy to add in. > > > > Unfortunately, while I believe that the cross-thread uses _are_ predominantly > > thread-safe (a PostTask is between uses on different threads), there's nothing > > in there to enforce it. > > see base::SequenceChecker, it can verify both single-threaded'ness and > single-sequence'ness "SequenceChecker is a helper class used to help verify that some methods of a class are called in sequence -- that is, called from the same SequencedTaskRunner." I'm not sure that's better than ThreadChecker in terms of constraining things. Looking at impl.h, it sounds like it devolves to ThreadChecker if not in a SequencedWorkerPool. Many of these clients handle their threading manually. Put another way, the well-behaved clients don't make the problem, it's the not-well-behaved ones which make the problem :-).
Message was sent while issue was closed.
On 2015/10/19 20:07:12, Scott Hess wrote: > On 2015/10/19 19:59:54, michaeln wrote: > > On 2015/10/19 16:39:35, Scott Hess wrote: > > > On 2015/10/19 14:00:52, Primiano Tucci wrote: > > > > Ok if this is the case we should revert this CL until we get the threading > > > > right. > > > > > > sql::Connection is not intended to be thread-safe. Unfortunately, by the > time > > > it occurred to anyone to add ThreadChecker annotations, clever coders had > > > already used it in ways which made ThreadChecker less easy to add in. > > > > > > Unfortunately, while I believe that the cross-thread uses _are_ > predominantly > > > thread-safe (a PostTask is between uses on different threads), there's > nothing > > > in there to enforce it. > > > > see base::SequenceChecker, it can verify both single-threaded'ness and > > single-sequence'ness > > "SequenceChecker is a helper class used to help verify that some methods of a > class are called in sequence -- that is, called from the same > SequencedTaskRunner." > > I'm not sure that's better than ThreadChecker in terms of constraining things. > Looking at impl.h, it sounds like it devolves to ThreadChecker if not in a > SequencedWorkerPool. Many of these clients handle their threading manually. > > Put another way, the well-behaved clients don't make the problem, it's the > not-well-behaved ones which make the problem :-). Is there a bug or bot that is broken? Should I revert this CL till we figure out how to handle the threads?
Message was sent while issue was closed.
Without something to prevent the sequence of events suggested below, there's definitely a latent bug. I don't really see how registering a dump provider without thread affinity is safe regardless of same thread reg/unreg behavior of the caller. The snipped logic below can be run on any thread if the caller does not express a thread affinity. <snip> ThreadDump() { { AutoLock lock(lock_); // compute skip_dump while holding the lock_ skip_dump = is_unregistred; } ???? what prevents a different thread from entering Unreg() and deleting the provider between execution of the block above and below ???? // Invoke the dump provider without holding the |lock_|. if (!skip_dump) { MemoryDumpArgs args = {pmd_async_state->req_args.level_of_detail}; dump_successful = mdp->OnMemoryDump(args, &pmd_async_state->process_memory_dump); } </snip> The easiest safest change might be to require callers provide a sequencedTaskRunner and that they call unreg only on that taskrunner. But that makes it impossible to instrument sql::Connection directly w/o involving the clients of the class (get them to give you a task runner).
Message was sent while issue was closed.
On 2015/10/21 20:20:32, michaeln wrote: > The easiest safest change might be to require callers provide a > sequencedTaskRunner and that they call unreg only on that taskrunner. But that > makes it impossible to instrument sql::Connection directly w/o involving the > clients of the class (get them to give you a task runner). I was going to suggest shifting the client to a separate object which cross-references with the Connection, and then they could have independent lifetimes. BUT, there would still be a race when the reporter wants to generate stats. That said, since Connection is generally already in an AssertIOAllowed() thread, having a lock to handle this wouldn't be absurd. I'm just not sure where you'd put the lock, so it might have to be global. I guess the init lock might make sense, it's only used on open. I'm not sure what the ownership situation on that new object would be, though.
Message was sent while issue was closed.
On 2015/10/21 21:01:05, Scott Hess wrote: > On 2015/10/21 20:20:32, michaeln wrote: > > The easiest safest change might be to require callers provide a > > sequencedTaskRunner and that they call unreg only on that taskrunner. But that > > makes it impossible to instrument sql::Connection directly w/o involving the > > clients of the class (get them to give you a task runner). > > I was going to suggest shifting the client to a separate object which > cross-references with the Connection, and then they could have independent > lifetimes. BUT, there would still be a race when the reporter wants to generate > stats. > > That said, since Connection is generally already in an AssertIOAllowed() thread, > having a lock to handle this wouldn't be absurd. I'm just not sure where you'd > put the lock, so it might have to be global. I guess the init lock might make > sense, it's only used on open. > > I'm not sure what the ownership situation on that new object would be, though. Ok I had some time to look at the code. michaeln is definitely right, this code as it is is racy for the precise reason explained in #42. My understanding is that even if we specify a taskrunner affinity for the OnMemoryDump method, that still won't solve the problem, as there is no guarantee that a Connection is destroyed on the same thread of the ctor (still would reduce the chances of the race above). I don't see how adding a SequenceChecker would solve any problem. It would be totally different if we were guaranteed that all Connection clients use it on a SequencedTaskRunner and they respect that (read: a Connection is always created, accessed and destroyed on the same SequencedTaskRunner). Is that the case though? It doesn't look clear to me, from the comments above my guess is that the answer is: no, it happens to be sequentially-safish, but there is no SequencedTaskRunner enforcement from the client side (please correct me if I am wrong here). If all the statements above are true, the only option that I can see is to: - Move the dump provider for Connection to its own, RefCountedThreadSafe object, and having Connection own it and destroy it via PostTask on the a single thread. - Having a lock shared between the ~Connection dtor and the dump provider above (+ a flag on the dump provider side to tell whether the connection went away) - The lock will never be contended unless tracing is disabled. Otherwise, if we are in the state where Connection can be destroyed on any thread, and we don't know upfront on which one, I don't see how we can safely dump its contents. The main question is, how does this plan sound to you?
Message was sent while issue was closed.
On 2015/10/27 18:47:34, Primiano Tucci wrote: > On 2015/10/21 21:01:05, Scott Hess wrote: > > On 2015/10/21 20:20:32, michaeln wrote: > > > The easiest safest change might be to require callers provide a > > > sequencedTaskRunner and that they call unreg only on that taskrunner. But > that > > > makes it impossible to instrument sql::Connection directly w/o involving the > > > clients of the class (get them to give you a task runner). > > > > I was going to suggest shifting the client to a separate object which > > cross-references with the Connection, and then they could have independent > > lifetimes. BUT, there would still be a race when the reporter wants to > generate > > stats. > > > > That said, since Connection is generally already in an AssertIOAllowed() > thread, > > having a lock to handle this wouldn't be absurd. I'm just not sure where > you'd > > put the lock, so it might have to be global. I guess the init lock might make > > sense, it's only used on open. > > > > I'm not sure what the ownership situation on that new object would be, though. > > Ok I had some time to look at the code. > michaeln is definitely right, this code as it is is racy for the precise reason > explained in #42. > My understanding is that even if we specify a taskrunner affinity for the > OnMemoryDump method, that still won't solve the problem, as there is no > guarantee that a Connection is destroyed on the same thread of the ctor (still > would reduce the chances of the race above). > I don't see how adding a SequenceChecker would solve any problem. It would be > totally different if we were guaranteed that all Connection clients use it on a > SequencedTaskRunner and they respect that (read: a Connection is always created, > accessed and destroyed on the same SequencedTaskRunner). Is that the case > though? > It doesn't look clear to me, from the comments above my guess is that the answer > is: no, it happens to be sequentially-safish, but there is no > SequencedTaskRunner enforcement from the client side (please correct me if I am > wrong here). > > If all the statements above are true, the only option that I can see is to: > - Move the dump provider for Connection to its own, RefCountedThreadSafe object, > and having Connection own it and destroy it via PostTask on the a single thread. > - Having a lock shared between the ~Connection dtor and the dump provider above > (+ a flag on the dump provider side to tell whether the connection went away) > - The lock will never be contended unless tracing is disabled. > > Otherwise, if we are in the state where Connection can be destroyed on any > thread, and we don't know upfront on which one, I don't see how we can safely > dump its contents. > > The main question is, how does this plan sound to you? Ok I think I have a slightly better plan. - I can change the MemoryDumpManager to ensure that all the dump providers without a task runner affinity happen on a dedicated thread (used only by the MemoryDumpManager). I need this thread anyways for other reasons. - All the OnMemoryDump calls for non-taskrunner bound threads will happen on this thread. This will be the case for Connection(). - In this way I can make the UnregisterDumpProvider call locked. Essentially ~Connection dtor will call MemoryDumpManager::UnregisterDumpProvider as first thing in the constructor. That will take the internal lock of the MDM and will prevent that something else tries to OnMemoryDump while we are being destroyed. Nothing in the ~Connection dtor can possibly depend on a thread which is used exclusively by MDM, so no deadlocks should happen in this direction. In the other direction, we have to guarantee that the body of OnMemoryDump never takes any other lock shared with the UnregisterDumpProvider, but that should be easy to guarantee as the Unregister call is pretty dumb. Does all of this make sense? I'll try to think again tomorrow with a fresher mind to this.
Message was sent while issue was closed.
On 2015/10/27 19:11:52, Primiano Tucci wrote: > On 2015/10/27 18:47:34, Primiano Tucci wrote: > > On 2015/10/21 21:01:05, Scott Hess wrote: > > > On 2015/10/21 20:20:32, michaeln wrote: > > > > The easiest safest change might be to require callers provide a > > > > sequencedTaskRunner and that they call unreg only on that taskrunner. But > > that > > > > makes it impossible to instrument sql::Connection directly w/o involving > the > > > > clients of the class (get them to give you a task runner). > > > > > > I was going to suggest shifting the client to a separate object which > > > cross-references with the Connection, and then they could have independent > > > lifetimes. BUT, there would still be a race when the reporter wants to > > generate > > > stats. > > > > > > That said, since Connection is generally already in an AssertIOAllowed() > > thread, > > > having a lock to handle this wouldn't be absurd. I'm just not sure where > > you'd > > > put the lock, so it might have to be global. I guess the init lock might > make > > > sense, it's only used on open. > > > > > > I'm not sure what the ownership situation on that new object would be, > though. > > > > Ok I had some time to look at the code. > > michaeln is definitely right, this code as it is is racy for the precise > reason > > explained in #42. > > My understanding is that even if we specify a taskrunner affinity for the > > OnMemoryDump method, that still won't solve the problem, as there is no > > guarantee that a Connection is destroyed on the same thread of the ctor (still > > would reduce the chances of the race above). > > I don't see how adding a SequenceChecker would solve any problem. It would be > > totally different if we were guaranteed that all Connection clients use it on > a > > SequencedTaskRunner and they respect that (read: a Connection is always > created, > > accessed and destroyed on the same SequencedTaskRunner). Is that the case > > though? > > It doesn't look clear to me, from the comments above my guess is that the > answer > > is: no, it happens to be sequentially-safish, but there is no > > SequencedTaskRunner enforcement from the client side (please correct me if I > am > > wrong here). > > > > If all the statements above are true, the only option that I can see is to: > > - Move the dump provider for Connection to its own, RefCountedThreadSafe > object, > > and having Connection own it and destroy it via PostTask on the a single > thread. > > - Having a lock shared between the ~Connection dtor and the dump provider > above > > (+ a flag on the dump provider side to tell whether the connection went away) > > - The lock will never be contended unless tracing is disabled. > > > > Otherwise, if we are in the state where Connection can be destroyed on any > > thread, and we don't know upfront on which one, I don't see how we can safely > > dump its contents. > > > > The main question is, how does this plan sound to you? > > Ok I think I have a slightly better plan. > - I can change the MemoryDumpManager to ensure that all the dump providers > without a task runner affinity happen on a dedicated thread (used only by the > MemoryDumpManager). I need this thread anyways for other reasons. > - All the OnMemoryDump calls for non-taskrunner bound threads will happen on > this thread. This will be the case for Connection(). > - In this way I can make the UnregisterDumpProvider call locked. Essentially > ~Connection dtor will call MemoryDumpManager::UnregisterDumpProvider as first > thing in the constructor. That will take the internal lock of the MDM and will > prevent that something else tries to OnMemoryDump while we are being destroyed. > Nothing in the ~Connection dtor can possibly depend on a thread which is used > exclusively by MDM, so no deadlocks should happen in this direction. > > In the other direction, we have to guarantee that the body of OnMemoryDump never > takes any other lock shared with the UnregisterDumpProvider, but that should be > easy to guarantee as the Unregister call is pretty dumb. > > Does all of this make sense? I'll try to think again tomorrow with a fresher > mind to this. The main concern I have with this is that it's not clear that OnMemoryDump() can call into a Connection object which may be blocked in ~Connection. At this time there are no virtual methods, but it's still a subtle issue. At minimum it would probably want to mark all Connection methods off-limits. The current set of sqlite3 functions used by OnMemoryDump() should be fine as long as it's all under the OnMemoryDump() lock preventing ~Connection from reaching Close() (*). Of course, any extension to other sqlite3 functions should be carefully considered, because doing I/O under the lock is probably a bad idea. (*) It may be that the unregister should happen at the top of CloseInternal().
Message was sent while issue was closed.
On 2015/10/27 19:11:52, Primiano Tucci wrote: > On 2015/10/27 18:47:34, Primiano Tucci wrote: > > On 2015/10/21 21:01:05, Scott Hess wrote: > > > On 2015/10/21 20:20:32, michaeln wrote: > > > > The easiest safest change might be to require callers provide a > > > > sequencedTaskRunner and that they call unreg only on that taskrunner. But > > that > > > > makes it impossible to instrument sql::Connection directly w/o involving > the > > > > clients of the class (get them to give you a task runner). > > > > > > I was going to suggest shifting the client to a separate object which > > > cross-references with the Connection, and then they could have independent > > > lifetimes. BUT, there would still be a race when the reporter wants to > > generate > > > stats. > > > > > > That said, since Connection is generally already in an AssertIOAllowed() > > thread, > > > having a lock to handle this wouldn't be absurd. I'm just not sure where > > you'd > > > put the lock, so it might have to be global. I guess the init lock might > make > > > sense, it's only used on open. > > > > > > I'm not sure what the ownership situation on that new object would be, > though. > > > > Ok I had some time to look at the code. > > michaeln is definitely right, this code as it is is racy for the precise > reason > > explained in #42. > > My understanding is that even if we specify a taskrunner affinity for the > > OnMemoryDump method, that still won't solve the problem, as there is no > > guarantee that a Connection is destroyed on the same thread of the ctor (still > > would reduce the chances of the race above). > > I don't see how adding a SequenceChecker would solve any problem. It would be > > totally different if we were guaranteed that all Connection clients use it on > a > > SequencedTaskRunner and they respect that (read: a Connection is always > created, > > accessed and destroyed on the same SequencedTaskRunner). Is that the case > > though? > > It doesn't look clear to me, from the comments above my guess is that the > answer > > is: no, it happens to be sequentially-safish, but there is no > > SequencedTaskRunner enforcement from the client side (please correct me if I > am > > wrong here). We need to change the interfce between the mdm and the dumpees is some way to make this work. I see a couple general options. 1) Change the OnDumpMemory(callback) method to be more async in nature and to invoke a callback when done and avoid registering rawptrs to objects with ill defined lifetimes. Instead at register time, xfer ownership of a client dumpee shim with an async OnDumpMemory() method. The MDM can serialize the addition of that owned object to the set of all dumpees. The shim could be called on a thread that is fileio safe. The shim could handle any thread hopping details needed and eventually reply. One kind of reply could be "dumpee gone" and the MDM could remove the shim from its set and delete it. There could also be a threadsafe way to unreg, reg returns an id which can be used to unreg later. 2)Require that dumpees provide a seqtaskrunner and gaurantee that reg/unreg are called on that sequence. THe MDM would then be requred to call the dumpee on that seq. I think the second is hard to do given your trying to graft dumpee behavior onto existing objects that do not have thread or sequence affinity. The shape of the first one seems more promising to me. > > > > If all the statements above are true, the only option that I can see is to: > > - Move the dump provider for Connection to its own, RefCountedThreadSafe > object, > > and having Connection own it and destroy it via PostTask on the a single > thread. > > - Having a lock shared between the ~Connection dtor and the dump provider > above > > (+ a flag on the dump provider side to tell whether the connection went away) > > - The lock will never be contended unless tracing is disabled. > > > > Otherwise, if we are in the state where Connection can be destroyed on any > > thread, and we don't know upfront on which one, I don't see how we can safely > > dump its contents. > > > > The main question is, how does this plan sound to you? > > Ok I think I have a slightly better plan. > - I can change the MemoryDumpManager to ensure that all the dump providers > without a task runner affinity happen on a dedicated thread (used only by the > MemoryDumpManager). I need this thread anyways for other reasons. Do you really need a dedicated thread to tally memory consumption? Can you explain what the other reasons are? We generally try to avoid one-off threads. > - All the OnMemoryDump calls for non-taskrunner bound threads will happen on > this thread. This will be the case for Connection(). > - In this way I can make the UnregisterDumpProvider call locked. Essentially > ~Connection dtor will call MemoryDumpManager::UnregisterDumpProvider as first > thing in the constructor. That will take the internal lock of the MDM and will > prevent that something else tries to OnMemoryDump while we are being destroyed. > Nothing in the ~Connection dtor can possibly depend on a thread which is used > exclusively by MDM, so no deadlocks should happen in this direction. I think that would eliminate the raciness, but may open us up to janking the UI or IO threads on disk io? I think sqlite internally does locking to accomodate multi-threaded use of an open sqlite3 db handle. A call to mdm reg/unreg could end up blocking on OnDump which is blocking on sqlite to finish reading or writing something. > In the other direction, we have to guarantee that the body of OnMemoryDump never > takes any other lock shared with the UnregisterDumpProvider, but that should be > easy to guarantee as the Unregister call is pretty dumb. > > Does all of this make sense? I'll try to think again tomorrow with a fresher > mind to this.
Message was sent while issue was closed.
> 1) Change the OnDumpMemory(callback) method to be more async in nature and to > invoke a callback when done and avoid registering rawptrs to objects with ill > defined lifetimes. Instead at register time, xfer ownership of a client dumpee > shim with an async OnDumpMemory() method. The MDM can serialize the addition of > that owned object to the set of all dumpees. The shim could be called on a > thread that is fileio safe. The shim could handle any thread hopping details > needed and eventually reply. One kind of reply could be "dumpee gone" and the > MDM could remove the shim from its set and delete it. There could also be a > threadsafe way to unreg, reg returns an id which can be used to unreg later. That wouldn't exactly solve the problem of how to instrument sql::Connection class, but it would push responsibility for doing it safely onto the 'shim' registered for each connection. The MDM could be simpler, just serialize all reg/unreg/dump activity with a seqtaskrunner. Multi threading is the dumpees problem.
Message was sent while issue was closed.
> The main concern I have with this is that it's not clear that OnMemoryDump() can > call into a Connection object which may be blocked in ~Connection. At this time > there are no virtual methods, but it's still a subtle issue. At minimum it > would probably want to mark all Connection methods off-limits. The current set > of sqlite3 functions used by OnMemoryDump() should be fine as long as it's all > under the OnMemoryDump() lock preventing ~Connection from reaching Close() (*). > Of course, any extension to other sqlite3 functions should be carefully > considered, because doing I/O under the lock is probably a bad idea. > > (*) It may be that the unregister should happen at the top of CloseInternal(). Yes, I think it is better to unregister at CloseInternal. Since this would block exactly what we want (database gettin destroyed when dump happens) instead of blocking ~Connection. On 2015/10/27 19:57:50, michaeln wrote: > On 2015/10/27 19:11:52, Primiano Tucci wrote: > > On 2015/10/27 18:47:34, Primiano Tucci wrote: > > > On 2015/10/21 21:01:05, Scott Hess wrote: > > > > On 2015/10/21 20:20:32, michaeln wrote: > > > > > The easiest safest change might be to require callers provide a > > > > > sequencedTaskRunner and that they call unreg only on that taskrunner. > But > > > that > > > > > makes it impossible to instrument sql::Connection directly w/o involving > > the > > > > > clients of the class (get them to give you a task runner). > > > > > > > > I was going to suggest shifting the client to a separate object which > > > > cross-references with the Connection, and then they could have independent > > > > lifetimes. BUT, there would still be a race when the reporter wants to > > > generate > > > > stats. > > > > > > > > That said, since Connection is generally already in an AssertIOAllowed() > > > thread, > > > > having a lock to handle this wouldn't be absurd. I'm just not sure where > > > you'd > > > > put the lock, so it might have to be global. I guess the init lock might > > make > > > > sense, it's only used on open. > > > > > > > > I'm not sure what the ownership situation on that new object would be, > > though. > > > > > > Ok I had some time to look at the code. > > > michaeln is definitely right, this code as it is is racy for the precise > > reason > > > explained in #42. > > > My understanding is that even if we specify a taskrunner affinity for the > > > OnMemoryDump method, that still won't solve the problem, as there is no > > > guarantee that a Connection is destroyed on the same thread of the ctor > (still > > > would reduce the chances of the race above). > > > I don't see how adding a SequenceChecker would solve any problem. It would > be > > > totally different if we were guaranteed that all Connection clients use it > on > > a > > > SequencedTaskRunner and they respect that (read: a Connection is always > > created, > > > accessed and destroyed on the same SequencedTaskRunner). Is that the case > > > though? > > > It doesn't look clear to me, from the comments above my guess is that the > > answer > > > is: no, it happens to be sequentially-safish, but there is no > > > SequencedTaskRunner enforcement from the client side (please correct me if I > > am > > > wrong here). > > We need to change the interfce between the mdm and the dumpees is some way to > make this work. I see a couple general options. > > 1) Change the OnDumpMemory(callback) method to be more async in nature and to > invoke a callback when done and avoid registering rawptrs to objects with ill > defined lifetimes. Instead at register time, xfer ownership of a client dumpee > shim with an async OnDumpMemory() method. The MDM can serialize the addition of > that owned object to the set of all dumpees. The shim could be called on a > thread that is fileio safe. The shim could handle any thread hopping details > needed and eventually reply. One kind of reply could be "dumpee gone" and the > MDM could remove the shim from its set and delete it. There could also be a > threadsafe way to unreg, reg returns an id which can be used to unreg later. > > 2)Require that dumpees provide a seqtaskrunner and gaurantee that reg/unreg are > called on that sequence. THe MDM would then be requred to call the dumpee on > that seq. > > I think the second is hard to do given your trying to graft dumpee behavior onto > existing objects that do not have thread or sequence affinity. The shape of the > first one seems more promising to me. > > > > > > > If all the statements above are true, the only option that I can see is to: > > > - Move the dump provider for Connection to its own, RefCountedThreadSafe > > object, > > > and having Connection own it and destroy it via PostTask on the a single > > thread. > > > - Having a lock shared between the ~Connection dtor and the dump provider > > above > > > (+ a flag on the dump provider side to tell whether the connection went > away) > > > - The lock will never be contended unless tracing is disabled. > > > > > > Otherwise, if we are in the state where Connection can be destroyed on any > > > thread, and we don't know upfront on which one, I don't see how we can > safely > > > dump its contents. > > > > > > The main question is, how does this plan sound to you? > > > > Ok I think I have a slightly better plan. > > - I can change the MemoryDumpManager to ensure that all the dump providers > > without a task runner affinity happen on a dedicated thread (used only by the > > MemoryDumpManager). I need this thread anyways for other reasons. > > Do you really need a dedicated thread to tally memory consumption? Can you > explain what the other reasons are? We generally try to avoid one-off threads. > The dedicated thread is needed because when tracing is enabled the ui thread is janked since the dump providers perform heavy operations. This has already been done. > > - All the OnMemoryDump calls for non-taskrunner bound threads will happen on > > this thread. This will be the case for Connection(). > > - In this way I can make the UnregisterDumpProvider call locked. Essentially > > ~Connection dtor will call MemoryDumpManager::UnregisterDumpProvider as first > > thing in the constructor. That will take the internal lock of the MDM and will > > prevent that something else tries to OnMemoryDump while we are being > destroyed. > > Nothing in the ~Connection dtor can possibly depend on a thread which is used > > exclusively by MDM, so no deadlocks should happen in this direction. > > I think that would eliminate the raciness, but may open us up to janking the UI > or IO threads on disk io? I think sqlite internally does locking to accomodate > multi-threaded use of an open sqlite3 db handle. A call to mdm reg/unreg could > end up blocking on OnDump which is blocking on sqlite to finish reading or > writing something. Since we now have a dedicated thread for memory-infra dumpers, the janking due to dump on main threads is avoided. Yes it is true that ~Connection can be blocked by onDump call which is doing a read or write operation. This cannot be avoided in any case. This could happen even if ~Connection is called after a insert into table, due to internal locking of sqlite3. This can't be avoided since the user is trying to take memory traces, which inherently slows down the performance. This only adds a wrapper to internal locks, and does not cause a lot of extra janking. Also as shess@ pointed out, the sqlite functions currently called do not use io operations. > > > In the other direction, we have to guarantee that the body of OnMemoryDump > never > > takes any other lock shared with the UnregisterDumpProvider, but that should > be > > easy to guarantee as the Unregister call is pretty dumb. > > > > Does all of this make sense? I'll try to think again tomorrow with a fresher > > mind to this. |