|
|
Created:
8 years, 9 months ago by Scott Hess - ex-Googler Modified:
8 years, 8 months ago Reviewers:
Greg Billock CC:
chromium-reviews Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionImplement sql::Connection::Raze() in terms of sqlite3_backup API.
Wraps up the notion of reseting a database in a way which respects
SQLite locking constraints and doesn't require closing the database.
A similar outcome could be managed using filesystem operations, which
requires coordination between clients of the database to make sure
that no corruption occurs due to incorrect handling of -journal files.
Also, Windows pins files until the last handle closes, making that
approach challenging in some cases.
BUG=none
TEST=none
ADDENDUM: had to git cl dcommit, and things froze, it looked weird, I thought maybe I needed to re-sync with trunk, so hit Ctrl-C and resync'ed. Code already landed:
http://src.chromium.org/viewvc/chrome?view=rev&revision=131167
Patch Set 1 #Patch Set 2 : cleanup and moar testz. #
Total comments: 21
Patch Set 3 : gbillock comments. #
Total comments: 1
Messages
Total messages: 15 (0 generated)
I also have a sqlite3_raze() implementation using btree.c internals, but this seemed safer. I plan to suggest it to the SQLite team as a reasonable function to add, since it's safer than filesystem operations.
http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc File sql/connection.cc (right): http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc#newcode190 sql/connection.cc:190: // Propagate the page size to the new database. Maybe say that this gets the page_size from the current connection, and the next stanze propagates it. How likely is the DB to respond to this PRAGMA if it is totally goofed up? http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc#newcode200 sql/connection.cc:200: // one. The backup API tracks the original schema from the so the new DB after the backup will have schema_version = 2? http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc#newcode214 sql/connection.cc:214: busy_timeout.SetTimeout(base::TimeDelta::FromSeconds(1)); Just rely on callers to use RazeWithTimeout here? http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc#newcode216 sql/connection.cc:216: int rc = sqlite3_backup_step(backup, 1); doc says pass -1 to copy all remaining pages. Would that be better? http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc#newcode220 sql/connection.cc:220: if (rc == SQLITE_BUSY) { Is there a way to force it open? Presumably something might be wedged hard when this method is called, right? http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc#newcode224 sql/connection.cc:224: // There should only be a single page to copy. Using -1 would mean you don't need this, I think. http://codereview.chromium.org/9768006/diff/5001/sql/connection.h File sql/connection.h (right): http://codereview.chromium.org/9768006/diff/5001/sql/connection.h#newcode187 sql/connection.h:187: // filesystem operations problematic). Returns true if the database Does this mean it won't necessarily work? It seems like this would be used in extreme situations, so wouldn't that mean it should take steps to try to make sure that collisions get nuked as well? http://codereview.chromium.org/9768006/diff/5001/sql/connection.h#newcode191 sql/connection.h:191: // process. RazeWithTimeout() may be used if appropriate. What's the use case for that kind of call? http://codereview.chromium.org/9768006/diff/5001/sql/connection_unittest.cc File sql/connection_unittest.cc (right): http://codereview.chromium.org/9768006/diff/5001/sql/connection_unittest.cc#n... sql/connection_unittest.cc:138: EXPECT_EQ(2, s.ColumnInt(0)); is this impl-dependent? that is, won't the page count vary depending on sqlite version, build params, or whatever else?
Thanks for the review - I had entirely forgotten about this CL :-). I'm thinking on the "backup exactly one page" thing. I maybe might think that backing up -1 pages would adapt reasonably in case SQLite changes in a way which results in more than 1 page in the empty database. As indicated, I am mixed on that. http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc File sql/connection.cc (right): http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc#newcode190 sql/connection.cc:190: // Propagate the page size to the new database. On 2012/03/31 01:23:50, Greg Billock wrote: > Maybe say that this gets the page_size from the current connection, and the next > stanze propagates it. OK. > How likely is the DB to respond to this PRAGMA if it is totally goofed up? AFAICT, 'PRAGMA page_size' only returns info from the connection structure. So this would imply that the database could not even be opened. In that case, this approach won't work and something else would need to happen. http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc#newcode200 sql/connection.cc:200: // one. The backup API tracks the original schema from the On 2012/03/31 01:23:50, Greg Billock wrote: > so the new DB after the backup will have schema_version = 2? The backup API restores the backup destination's schema_version and adds one. This guarantees that other readers see a schema change. I'll clarify the comment. It needs to force the null database to have a page, otherwise it will back up zero pages, in which case it won't honor the existing page_size. http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc#newcode214 sql/connection.cc:214: busy_timeout.SetTimeout(base::TimeDelta::FromSeconds(1)); On 2012/03/31 01:23:50, Greg Billock wrote: > Just rely on callers to use RazeWithTimeout here? That is a really good suggestion. I must have been testing or otherwise distracted! http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc#newcode216 sql/connection.cc:216: int rc = sqlite3_backup_step(backup, 1); On 2012/03/31 01:23:50, Greg Billock wrote: > doc says pass -1 to copy all remaining pages. Would that be better? Partly because I must have missed the -1 setting. Even so, I'm mixed. If more than one page is backed up, I don't know what that would mean, so I am somewhat inclined to be conservative. http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc#newcode220 sql/connection.cc:220: if (rc == SQLITE_BUSY) { On 2012/03/31 01:23:50, Greg Billock wrote: > Is there a way to force it open? Presumably something might be wedged hard when > this method is called, right? Unfortunately, there's no way to force it open without the possibility of the other writer writing crap over it. If this is a problem in practice, the w/timeout version would be reasonable to try. I think it might be best to try it out, histogram the results, and see what happened. Profile databases are exclusive-access for the most part, and the caller should know that. http://codereview.chromium.org/9768006/diff/5001/sql/connection.h File sql/connection.h (right): http://codereview.chromium.org/9768006/diff/5001/sql/connection.h#newcode187 sql/connection.h:187: // filesystem operations problematic). Returns true if the database On 2012/03/31 01:23:50, Greg Billock wrote: > Does this mean it won't necessarily work? It seems like this would be used in > extreme situations, so wouldn't that mean it should take steps to try to make > sure that collisions get nuked as well? My broad background with this change is: A) corruption isn't that common in the first place, and has varied causes, which make this hard to test for real. B) we have a baseline presumption that SQLite's core operations like opening journal files and writing pages are appropriate. Tracing through the SQLite code, there are places where this code can fall down, and for the most part resolving the problem in those cases probably requires changes in the caller, or additional gruntwork to track down what's really happening (by adding histograms or DumpWithoutCrash() calls or something). So this CL aims to provide a best-effort, I-think-this-should-work-well starting point, which can be built on. My expectation is that we'll find that this doesn't work in a tiny number of cases, and that they'll be "I have no idea what we can do about that" types of cases. Like "profile not writable" or "disk is horribly hosed". If there's a systemic case, like Database X turns out to have two writers instead of the expected one, then we'll (hopefully) notice and drill down and fix the problem. http://codereview.chromium.org/9768006/diff/5001/sql/connection.h#newcode191 sql/connection.h:191: // process. RazeWithTimeout() may be used if appropriate. On 2012/03/31 01:23:50, Greg Billock wrote: > What's the use case for that kind of call? Most (all?) profile databases are exclusive-access, so they should NEVER receive SQLITE_BUSY and Raze() should work. Web SQL databases can be opened in multiple renderers, in which case SQLITE_BUSY must be expected and handled. http://codereview.chromium.org/9768006/diff/5001/sql/connection_unittest.cc File sql/connection_unittest.cc (right): http://codereview.chromium.org/9768006/diff/5001/sql/connection_unittest.cc#n... sql/connection_unittest.cc:138: EXPECT_EQ(2, s.ColumnInt(0)); On 2012/03/31 01:23:50, Greg Billock wrote: > is this impl-dependent? that is, won't the page count vary depending on sqlite > version, build params, or whatever else? Shouldn't ever vary. There should be the special page 1 containing the file header and sqlite_master, and page 2 as the root (and also child) page for table foo. It will contain a single one-byte value, so it should fit fine no matter the page size. I actually don't think it needs any data to have a root page. If it ever does generate more than 2 total pages, then someone probably needs to look at the code and make sure that the assumptions about how databases are constructed still hold.
commentated and modified to use -1 and DCHECK on not-one-page case. http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc File sql/connection.cc (right): http://codereview.chromium.org/9768006/diff/5001/sql/connection.cc#newcode216 sql/connection.cc:216: int rc = sqlite3_backup_step(backup, 1); On 2012/04/04 01:28:25, shess wrote: > On 2012/03/31 01:23:50, Greg Billock wrote: > > doc says pass -1 to copy all remaining pages. Would that be better? > > Partly because I must have missed the -1 setting. Even so, I'm mixed. If more > than one page is backed up, I don't know what that would mean, so I am somewhat > inclined to be conservative. I modified this code to use sqlite3_backup_pagecount() and a DCHECK.
lgtm http://codereview.chromium.org/9768006/diff/5001/sql/connection.h File sql/connection.h (right): http://codereview.chromium.org/9768006/diff/5001/sql/connection.h#newcode187 sql/connection.h:187: // filesystem operations problematic). Returns true if the database Sounds good. I don't know if any of that's worth mentioning in the comment. Since you've already written it, perhaps so... On 2012/04/04 01:28:25, shess wrote: > On 2012/03/31 01:23:50, Greg Billock wrote: > > Does this mean it won't necessarily work? It seems like this would be used in > > extreme situations, so wouldn't that mean it should take steps to try to make > > sure that collisions get nuked as well? > > My broad background with this change is: > A) corruption isn't that common in the first place, and has varied causes, which > make this hard to test for real. > B) we have a baseline presumption that SQLite's core operations like opening > journal files and writing pages are appropriate. > > Tracing through the SQLite code, there are places where this code can fall down, > and for the most part resolving the problem in those cases probably requires > changes in the caller, or additional gruntwork to track down what's really > happening (by adding histograms or DumpWithoutCrash() calls or something). So > this CL aims to provide a best-effort, I-think-this-should-work-well starting > point, which can be built on. > > My expectation is that we'll find that this doesn't work in a tiny number of > cases, and that they'll be "I have no idea what we can do about that" types of > cases. Like "profile not writable" or "disk is horribly hosed". If there's a > systemic case, like Database X turns out to have two writers instead of the > expected one, then we'll (hopefully) notice and drill down and fix the problem. http://codereview.chromium.org/9768006/diff/5001/sql/connection.h#newcode191 sql/connection.h:191: // process. RazeWithTimeout() may be used if appropriate. Do you envision using this most typically for WebSQL disaster recovery? My mental image was this was for profile DBs, but perhaps that's completely off. On 2012/04/04 01:28:25, shess wrote: > On 2012/03/31 01:23:50, Greg Billock wrote: > > What's the use case for that kind of call? > > Most (all?) profile databases are exclusive-access, so they should NEVER receive > SQLITE_BUSY and Raze() should work. Web SQL databases can be opened in multiple > renderers, in which case SQLITE_BUSY must be expected and handled. http://codereview.chromium.org/9768006/diff/11001/sql/connection.cc File sql/connection.cc (right): http://codereview.chromium.org/9768006/diff/11001/sql/connection.cc#newcode234 sql/connection.cc:234: DCHECK_EQ(pages, 1); Likes it. That should get the attention of a version roll or something that breaks the assumption.
I'm going to stop adding comments, because I could spend eternity debating with myself about how many teeth the horse has, and readers are probably already freaked out. http://codereview.chromium.org/9768006/diff/5001/sql/connection.h File sql/connection.h (right): http://codereview.chromium.org/9768006/diff/5001/sql/connection.h#newcode191 sql/connection.h:191: // process. RazeWithTimeout() may be used if appropriate. On 2012/04/04 19:49:21, Greg Billock wrote: > Do you envision using this most typically for WebSQL disaster recovery? My > mental image was this was for profile DBs, but perhaps that's completely off. Mostly it's for profile dbs, but I believe websql has a case where the database needs to be deleted, except there they don't have exclusive access so they have to wait around for things to clear. In the Gears days we had a similar problem, where many processes could have connections, so it was unclear what to do when corruption was detected. We added sqlite3_poison() which broke the database in a transactional fashion, so that on next access all the other connections broke and reset.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/shess@chromium.org/9768006/11001
Try job failure for 9768006-11001 (retry) on win_rel for step "browser_tests". It's a second try, previously, step "browser_tests" failed. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win_rel&nu...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/shess@chromium.org/9768006/11001
Try job failure for 9768006-11001 (retry) on linux_chromeos for step "compile" (clobber build). It's a second try, previously, step "compile" failed. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=linux_chro...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/shess@chromium.org/9768006/11001
Try job failure for 9768006-11001 (retry) on win_rel for step "browser_tests". It's a second try, previously, step "browser_tests" failed. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win_rel&nu...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/shess@chromium.org/9768006/11001
Try job failure for 9768006-11001 (retry) on win_rel for step "browser_tests". It's a second try, previously, step "browser_tests" failed. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win_rel&nu...
On 2012/04/06 19:15:14, I haz the power (commit-bot) wrote: > Try job failure for 9768006-11001 (retry) on win_rel for step "browser_tests". > It's a second try, previously, step "browser_tests" failed. > http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win_rel&nu... Awesome. Lessee if these things can still be landed the old-fashioned way... |