|
|
Created:
4 years, 4 months ago by afakhry Modified:
3 years, 9 months ago CC:
chromium-reviews, rouslan+autofill_chromium.org, estade+watch_chromium.org, vabr+watchlistautofill_chromium.org, jdonnelly+autofillwatch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionRecreate the WebData database on a catastrophic SQL error
Currently, if the WebData database gets corrupted, it will remain like so
and users will keep getting the profile error dialog every time they
sign in. The only way for the user to fix this is to recreate the profile
or delete the corrupt file.
This CL does for the WebData DB as is done for the History DB; when a catastrophic
error is seen, the DB is razed.
Example feedback report filed with this CL applied: https://docs.google.com/a/google.com/document/d/1qt-AwWVHZTtcJoFPNq2_yI-Tl2pIiOwEQCaIf23iL4g/edit?usp=sharing
BUG=455749, 635734
Committed: https://crrev.com/a592d29c2a4667b4ec95bd091e860bd01662908e
Cr-Commit-Position: refs/heads/master@{#413500}
Patch Set 1 #
Total comments: 4
Patch Set 2 : derat's comments #
Total comments: 7
Patch Set 3 : shess@' comments #
Total comments: 1
Patch Set 4 : Recover DB and show profile error dialog #
Total comments: 10
Patch Set 5 : shess' comments #
Total comments: 5
Patch Set 6 : Rebase on shess change in https://codereview.chromium.org/2258703004/ #Patch Set 7 : Move to c/b/profiles #Patch Set 8 : Fix typo #Messages
Total messages: 60 (36 generated)
The CQ bit was checked by afakhry@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
derat@chromium.org changed reviewers: + derat@chromium.org
(drive-by review; i'm unfamiliar with this code.) it'd be nice to have tests for the new behavior, but it looks like components/webdata/common/ is pretty short on unit tests. :-( https://codereview.chromium.org/2225333003/diff/1/components/webdata/common/w... File components/webdata/common/web_database_service.cc (right): https://codereview.chromium.org/2225333003/diff/1/components/webdata/common/w... components/webdata/common/web_database_service.cc:141: auto loaded_callback = loaded_callbacks_.back(); same comment as below https://codereview.chromium.org/2225333003/diff/1/components/webdata/common/w... components/webdata/common/web_database_service.cc:154: auto error_callback = error_callbacks_.back(); the c++ rules around auto and references seem arcane and terrify me. is this safe or a use-after-free? my initial concern is that since back() usually returns a reference, the 'auto' type here may be a reference as well, in which case it may no longer be valid after the pop_back() call on the next line. in any case, i think our rules for 'auto' say that it can only be used when the variable's type is obvious, which isn't the case here (since error_callbacks_ is declared in the header), so please switch this to use an explicit non-reference callback type.
afakhry@chromium.org changed reviewers: + shess@chromium.org
derat@ Thanks! I'll try to add tests once I finalize the solution. Adding shess@. shess@, this is CL is a follow-up to the CL https://codereview.chromium.org/2107493002/, as agreed with michaeln@. I found 2 issues while working on the previous CL: 1- The WebData DB when corrupted remains like so, and the user keeps getting the profile error dialog on each sign in. 2- A crash happens when the user exits the profile error message box, which I explained here https://bugs.chromium.org/p/chromium/issues/detail?id=635734#c2 https://codereview.chromium.org/2225333003/diff/1/components/webdata/common/w... File components/webdata/common/web_database_service.cc (right): https://codereview.chromium.org/2225333003/diff/1/components/webdata/common/w... components/webdata/common/web_database_service.cc:141: auto loaded_callback = loaded_callbacks_.back(); On 2016/08/09 21:05:41, Daniel Erat wrote: > same comment as below Done. https://codereview.chromium.org/2225333003/diff/1/components/webdata/common/w... components/webdata/common/web_database_service.cc:154: auto error_callback = error_callbacks_.back(); On 2016/08/09 21:05:41, Daniel Erat wrote: > the c++ rules around auto and references seem arcane and terrify me. is this > safe or a use-after-free? my initial concern is that since back() usually > returns a reference, the 'auto' type here may be a reference as well, in which > case it may no longer be valid after the pop_back() call on the next line. > > in any case, i think our rules for 'auto' say that it can only be used when the > variable's type is obvious, which isn't the case here (since error_callbacks_ is > declared in the header), so please switch this to use an explicit non-reference > callback type. Yes, absolutely. Thanks! Done. https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc#newco... sql/connection.cc:1708: return false; shess@, I need your advice about this code. The reason I added it is because I want to avoid the following crash on Debug builds when the WebData DB file is corrupted: [4470:4489:0809/140648:FATAL:statement.cc(52)] Cannot call mutating statements on an invalid statement. #0 0x2b105313d14e base::debug::StackTrace::StackTrace() #1 0x2b1053198b1c logging::LogMessage::~LogMessage() #2 0x2b1055b4f892 sql::Statement::CheckValid() #3 0x2b1055b4f8e2 sql::Statement::StepInternal() #4 0x2b1055b4fd8a sql::Statement::Step() #5 0x2b1055b4a178 sql::MetaTable::RazeIfDeprecated() #6 0x2b1055b0afa4 WebDatabase::Init() #7 0x2b1055b0e785 WebDatabaseBackend::LoadDatabaseIfNecessary() #8 0x2b1055b0e549 WebDatabaseBackend::InitDatabase() #9 0x2b1055b13382 _ZN4base8internal13FunctorTraitsIM18WebDatabaseBackendFvvEvE6InvokeIRK13scoped_refptrIS2_EJEEEvS4_OT_DpOT0_ #10 0x2b1055b132d1 _ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIRKM18WebDatabaseBackendFvvEJRK13scoped_refptrIS4_EEEEvOT_DpOT0_ #11 0x2b1055b13272 _ZN4base8internal7InvokerINS0_9BindStateIM18WebDatabaseBackendFvvEJ13scoped_refptrIS3_EEEEFvvEE7RunImplIRKS5_RKSt5tupleIJS7_EEJLm0EEEEvOT_OT0_NS_13IndexSequenceIJXspT1_EEEE #12 0x2b1055b12e4c _ZN4base8internal7InvokerINS0_9BindStateIM18WebDatabaseBackendFvvEJ13scoped_refptrIS3_EEEEFvvEE3RunEPNS0_13BindStateBaseE #13 0x2b1053120d5e base::Callback<>::Run() #14 0x2b1053142a4e base::debug::TaskAnnotator::RunTask() #15 0x2b10531b5551 base::MessageLoop::RunTask() #16 0x2b10531b57d4 base::MessageLoop::DeferOrRunPendingTask() #17 0x2b10531b5a9e base::MessageLoop::DoWork() #18 0x2b10531ca8a4 base::MessagePumpDefault::Run() #19 0x2b10531b4fba base::MessageLoop::RunHandler() #20 0x2b105324d694 base::RunLoop::Run() #21 0x2b10532e26d1 base::Thread::Run() #22 0x2b1057e88648 content::BrowserThreadImpl::DBThreadRun() #23 0x2b1057e88c38 content::BrowserThreadImpl::Run() #24 0x2b10532e2cab base::Thread::ThreadMain() #25 0x2b10532cf58a base::(anonymous namespace)::ThreadFunc() #26 0x2b1068615184 start_thread #27 0x2b106892537d clone I wanted to make sure that a DB with file that's either SQLITE_NOTADB or SQLITE_CORRUPT is not openable to return early at: https://cs.chromium.org/chromium/src/components/webdata/common/web_database.c... and skip the sql::MetaTable::RazeIfDeprecated() https://cs.chromium.org/chromium/src/components/webdata/common/web_database.c... that causes the crash. Unfortunately a lot of tests expect that opening a DB in this state is successful, so I'm not sure how to overcome this problem.
Description was changed from ========== Recreate the WebData database on a catastrophic SQL error Currently, if the WebData database gets corrupted, it will remain like so and the user will keep getting the profile error dialog every time he signs in. The only way for the user to fix this is to recreate his profile or delete the corrupt file. This CL does for the WebData DB as is done for the History DB; when a catstrophic error is seen, the DB is razed. BUG=455749 ========== to ========== Recreate the WebData database on a catastrophic SQL error Currently, if the WebData database gets corrupted, it will remain like so and the user will keep getting the profile error dialog every time he signs in. The only way for the user to fix this is to recreate his profile or delete the corrupt file. This CL does for the WebData DB as is done for the History DB; when a catstrophic error is seen, the DB is razed. Example feedback report filed with this CL applied: https://docs.google.com/a/google.com/document/d/1qt-AwWVHZTtcJoFPNq2_yI-Tl2pI... BUG=455749, 635734 ==========
Note that the doc linked in the CL comment takes me to a doc containing a link, and that link doesn't work for me, even though I have a google.com account. Maybe you could just describe the information inline. https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm... File components/webdata/common/web_database_backend.cc (right): https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm... components/webdata/common/web_database_backend.cc:141: db_.reset(); These three lines are essentially identical to what sql::Connection::RazeAndClose() does. Additionally, RazeAndClose() is intended to work correctly when called from within the error callback, so using that would mean you don't have to post a task and carry around a database handle you know to reference a potentially-corrupt database. https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm... File components/webdata/common/web_database_service.cc (right): https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm... components/webdata/common/web_database_service.cc:153: // |error_callbacks_| before it accesses it. If I understand this correctly, another option might be something like: ErrorCallbacks callbacks; std::swap(callbacks, error_callbacks_); for (const auto& error_callback : callbacks) { ... } https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc#newco... sql/connection.cc:1708: return false; On 2016/08/09 21:36:17, afakhry wrote: > shess@, I need your advice about this code. > > The reason I added it is because I want to avoid the following crash on Debug > builds when the WebData DB file is corrupted: I'm not sure what your question is. I wouldn't expect you to see enough corruption that having this corrupt ever comes up as a problem in debug builds. So what's the context where the corruption is happening? A test or something like that? If I read the existing code correctly, it's probably using ExecuteAndReturnErrorCode() Specifically to avoid calling the error callback. Otherwise it could have just used plain Execute().
The CQ bit was checked by afakhry@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Patchset #3 (id:40001) has been deleted
The CQ bit was checked by afakhry@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
shess@ With the solution in this new patch, we RazeAndClose() the DB on a catastrophic error as you suggested, and the addition of checking whether the DB was poisoned in OpenInternal() and retry again, makes the profile error dialog entirely go away. The DB is opened and initialized successfully as if the file was never corrupt. The user signs in successfully, but is faced with another sync error asking them to re-signin, since the recreated DB contains nothing: https://drive.google.com/file/d/0B9NoD5ZlZAEMbGR3amwzY3pKaDA/view?usp=sharing Pros: - User is not faced with a confusing dialog saying "some features will be missing" and the decision to file a feedback report or not. All he needs to do is to re-signin to get back sync. Cons: - The DB can still get corrupted, but we don't get to see that from the stats as it fixes itself before we know about it. This might not be important though. https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm... File components/webdata/common/web_database_backend.cc (right): https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm... components/webdata/common/web_database_backend.cc:141: db_.reset(); On 2016/08/10 17:16:19, Scott Hess wrote: > These three lines are essentially identical to what > sql::Connection::RazeAndClose() does. Additionally, RazeAndClose() is intended > to work correctly when called from within the error callback, so using that > would mean you don't have to post a task and carry around a database handle you > know to reference a potentially-corrupt database. Interesting. Thanks! Done. https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm... File components/webdata/common/web_database_service.cc (right): https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm... components/webdata/common/web_database_service.cc:153: // |error_callbacks_| before it accesses it. On 2016/08/10 17:16:19, Scott Hess wrote: > If I understand this correctly, another option might be something like: > ErrorCallbacks callbacks; > std::swap(callbacks, error_callbacks_); > for (const auto& error_callback : callbacks) { > ... > } This works too, but it exposes a problem that was already there. Now we get the profile error dialog 4 times, once after every "OK! The nested runloop of the message box blocks the execution until it's dismissed, and then the next callback is executed, which shows the profile error dialog again! This pop_back() implementation *hides* the problem, since all other callbacks are executed fast while the message box is still being shown. This is good but that's not a robust solution! I'm not sure why the code is written that way. Is it expected that we get one profile error dialog for each WebData DB type (autofill, keyword, token, password ...)? I'd need more time to think about how to fix this cleanly (if that's important). https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc#newco... sql/connection.cc:1708: return false; On 2016/08/10 17:16:19, Scott Hess wrote: > On 2016/08/09 21:36:17, afakhry wrote: > > shess@, I need your advice about this code. > > > > The reason I added it is because I want to avoid the following crash on Debug > > builds when the WebData DB file is corrupted: > > I'm not sure what your question is. I wouldn't expect you to see enough > corruption that having this corrupt ever comes up as a problem in debug builds. > So what's the context where the corruption is happening? A test or something > like that? > > If I read the existing code correctly, it's probably using > ExecuteAndReturnErrorCode() Specifically to avoid calling the error callback. > Otherwise it could have just used plain Execute(). I'm forcing a corruption in the Web Data file on a debug ChromeOS Linux build just for testing. My question was if there's a way to avoid that crash by making sure opening the corrupt db fails so that we can skip running sql::MetaTable::RazeIfDeprecated() which causes the crash. But it seems that opening a corrupt db is expected to succeed! With the solution in the new patch, this is not needed. However I found that razing a NOTADB database always fails at: sqlite3_backup_step(backup, -1); [https://cs.chromium.org/chromium/src/sql/connection.cc?type=cs&q=%22sqlite3_backup_step(backup,+-1);%22&sq=package:chromium&l=102] which returns NOTADB as well, and hence we hit this DCHECK here https://cs.chromium.org/chromium/src/sql/connection.cc?type=cs&q=Connection::.... So maybe the assumptions need to be revised? If debug builds are not something we need to worry about, then we can ignore this comment.
On 2016/08/11 17:32:54, afakhry wrote: > shess@ > With the solution in this new patch, we RazeAndClose() the DB on a catastrophic > error as you suggested, and the addition of checking whether the DB was poisoned > in OpenInternal() and retry again, makes the profile error dialog entirely go > away. The DB is opened and initialized successfully as if the file was never > corrupt. > The user signs in successfully, but is faced with another sync error asking them > to re-signin, since the recreated DB contains nothing: > https://drive.google.com/file/d/0B9NoD5ZlZAEMbGR3amwzY3pKaDA/view?usp=sharing > > Pros: > - User is not faced with a confusing dialog saying "some features will be > missing" and the decision to file a feedback report or not. All he needs to do > is to re-signin to get back sync. > > Cons: > - The DB can still get corrupted, but we don't get to see that from the stats as > it fixes itself before we know about it. This might not be important though. > Actually, I just spoke with Albert about this, and we think that it's still better to show the profile error page so that we get the visibility that we need through the UMA stats and the feedback reports, and also to clearly indicate to the user that a data loss might have happened. The user will still see the sync error notification (I showed in the above screenshot) which has a button that signs the user out to do the re-signin. Since we do a RazeAndClose() anyways, the subsequent signin will be problem-free. What do you think? > https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm... > File components/webdata/common/web_database_backend.cc (right): > > https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm... > components/webdata/common/web_database_backend.cc:141: db_.reset(); > On 2016/08/10 17:16:19, Scott Hess wrote: > > These three lines are essentially identical to what > > sql::Connection::RazeAndClose() does. Additionally, RazeAndClose() is > intended > > to work correctly when called from within the error callback, so using that > > would mean you don't have to post a task and carry around a database handle > you > > know to reference a potentially-corrupt database. > > Interesting. Thanks! Done. > > https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm... > File components/webdata/common/web_database_service.cc (right): > > https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm... > components/webdata/common/web_database_service.cc:153: // |error_callbacks_| > before it accesses it. > On 2016/08/10 17:16:19, Scott Hess wrote: > > If I understand this correctly, another option might be something like: > > ErrorCallbacks callbacks; > > std::swap(callbacks, error_callbacks_); > > for (const auto& error_callback : callbacks) { > > ... > > } > > This works too, but it exposes a problem that was already there. Now we get the > profile error dialog 4 times, once after every "OK! The nested runloop of the > message box blocks the execution until it's dismissed, and then the next > callback is executed, which shows the profile error dialog again! > > This pop_back() implementation *hides* the problem, since all other callbacks > are executed fast while the message box is still being shown. This is good but > that's not a robust solution! I'm not sure why the code is written that way. Is > it expected that we get one profile error dialog for each WebData DB type > (autofill, keyword, token, password ...)? I'd need more time to think about how > to fix this cleanly (if that's important). > > https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc > File sql/connection.cc (right): > > https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc#newco... > sql/connection.cc:1708: return false; > On 2016/08/10 17:16:19, Scott Hess wrote: > > On 2016/08/09 21:36:17, afakhry wrote: > > > shess@, I need your advice about this code. > > > > > > The reason I added it is because I want to avoid the following crash on > Debug > > > builds when the WebData DB file is corrupted: > > > > I'm not sure what your question is. I wouldn't expect you to see enough > > corruption that having this corrupt ever comes up as a problem in debug > builds. > > So what's the context where the corruption is happening? A test or something > > like that? > > > > If I read the existing code correctly, it's probably using > > ExecuteAndReturnErrorCode() Specifically to avoid calling the error callback. > > Otherwise it could have just used plain Execute(). > > I'm forcing a corruption in the Web Data file on a debug ChromeOS Linux build > just for testing. My question was if there's a way to avoid that crash by making > sure opening the corrupt db fails so that we can skip running > sql::MetaTable::RazeIfDeprecated() which causes the crash. But it seems that > opening a corrupt db is expected to succeed! > > With the solution in the new patch, this is not needed. However I found that > razing a NOTADB database always fails at: sqlite3_backup_step(backup, -1); > [https://cs.chromium.org/chromium/src/sql/connection.cc?type=cs&q=%22sqlite3_backup_step(backup,+-1);%22&sq=package:chromium&l=102] > which returns NOTADB as well, and hence we hit this DCHECK here > https://cs.chromium.org/chromium/src/sql/connection.cc?type=cs&q=Connection::.... > So maybe the assumptions need to be revised? > > If debug builds are not something we need to worry about, then we can ignore > this comment.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
shess@, friendly ping. Thank you!
OK, I'm not _entirely_ sure I follow at this point. I spent the balance of yesterday doing archaeology, and I think the "PRAGMA auto_vacuum" line exists entirely for metrics. So IMHO it could be modified to do the detect-errors-and-retry logic (and I have a CL waiting to layer in a test for this). Looking at histograms, I think that this would be useful mainly to detect SQLITE_NOTADB type cases, perhaps with a few SQLITE_CORRUPT cases. Anyhow, I suspect that that may be a reasonable alternative to your connection.cc modification. Raze was supposed to work for SQLITE_NOTADB, I think there's even a test for it. That should fall to the truncate case, but I might be mis-remembering. I'll look at that, though, there may be cases where it gets defeated. When you say you're forcing a corruption just for testing, you mean manually forcing it for manual testing? In that case, feel free to comment out the DCHECKS in your build, or mark them ERROR, rather than reworking the code. This is actually the DCHECK working as intended to notify you that something you're doing is going awry. If the code works around the warning, then later someone might unintentionally cause a corruption and _not_ receive a warning. WRT posting the UI thing - I guess that ends up being up to you. My suggestion with RazeAndClose() was mostly that it looked like the code was trying to safely post tasks outside the database setup so that it could clean up the database, and RazeAndClose() is intended to work for exactly that case. You may need a latch between the database error callback and your handler, as it would be very hard for the database error callback to apply any sort of post-callback suppression (some errors should be retried, others will never succeed, some are ambiguous depending on what data is being read, etc).
I'm liking this better, but I need to go over OpenInternal() because I'm concerned that if we keep pushing arbitrary snippets into there, it will explode. https://codereview.chromium.org/2225333003/diff/60001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/2225333003/diff/60001/sql/connection.cc#newco... sql/connection.cc:1742: } I'm going to post you an alternative, once I've finished the test for it. I'm a little concerned about this blanket check. It's plausible, but it feels like it should be at the end of the method, or it should be outside the scope of the method, so that it covers all the cases where someone poisoned the database during the open/setup process.
The CQ bit was checked by afakhry@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...)
Hello shess@, Did you manage to write tests to see where the block I added in Connection::OpenInternal() should go? This new patch handles what Albert and I agreed upon, which is to show the profile error dialog even if we manage to Raze and re-open the DB file successfully, since data loss is a high possibility. We need users to be able to file feedback reports from the dialog, and UMA stats should be reported. For the dialog in the case of a recovery from a catastrophic failure, I chose an actionable message to tell the user to "sign out the sign in again" as seen in the screenshot: https://drive.google.com/a/google.com/file/d/0B6G_-uQnf1_LSWhNcmF2WE5yLU0/vie... PTAL. Thanks!
The CQ bit was checked by afakhry@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Patchset #4 (id:80001) has been deleted
Patchset #4 (id:100001) has been deleted
The CQ bit was checked by afakhry@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
LGTM with suggested changes. Sorry for the delay. https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com... File components/webdata/common/web_database_backend.cc (right): https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com... components/webdata/common/web_database_backend.cc:108: diagnostics_.clear(); Also clear catastrophic_error_occurred_, since it's a new open? Or is the above just to save memory if everything has gone poorly? https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com... components/webdata/common/web_database_backend.cc:125: db_.reset(); It might make more sense to put this in an init_status != sql::INIT_OK block, with early return, and put the common/expected case down here. I think that's closer to common Chromium style. https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com... components/webdata/common/web_database_backend.cc:135: db_->GetSQLConnection()->RazeAndClose(); Makes sense to me. Note that this callback could be called after LoadDatabaseIfNecessary() has run successfully to completion. This can happen if there is corruption in the database data, but not in the header data. The levels I can think of are: - header corruption, which should be touched off in sql::Connection::Open(). - schema corruption, which should be touched off by table creation and checking. - data corruption, which is only noticed when the corrupt data is queried. I _think_ the second bit is covered by db_->Init(), assuming that does queries like "CREATE TABLE IF NOT EXISTS" or sql::Connection:DoesTableExist() type calls. Anyhow, I think what will happen is that data corruption will result in a raze, but the analysis data will never be surfaced to the user or sent up. This _does_ happen, but I can't really say how often. It may be small enough not to worry about. At one point I thought I had determined that ~25% of corruptions were detect in scope of sql::Connection::Open(), which is why I introduced the retry logic, but it's hard to make general statements based on research I did a few years ago ... https://codereview.chromium.org/2225333003/diff/120001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/2225333003/diff/120001/sql/connection.cc#newc... sql/connection.cc:1743: I'd prefer the version I posted to you under separate CL to this. Partly because I think it does 95% the same thing, but partly because I feel like this is kind of arbitrary to be bundled into your CL (I want to have a clear idea of where the change came from in two years when it's breaking someone's sql code :-). https://codereview.chromium.org/2225333003/diff/120001/sql/init_status.h File sql/init_status.h (right): https://codereview.chromium.org/2225333003/diff/120001/sql/init_status.h#newc... sql/init_status.h:18: Any reason this isn't at the end of the list? Also, the comment is wishy-washy. At this time, the database was recovered but data was almost certainly lost. Maybe just say "The database was deleted and re-opened successfully" or something to that end.
BTW, crbug.com/597785 is about an attempt to have a helper function which automagically does best-effort data recovery. So far it hasn't made it to bet channel, so I'm unwilling to declare success, but OTOH it hasn't detonated the canary/dev channels, so I'm optimistic. RazeAndClose() is probably more reliable for now, if you can live with the consequences, but something to keep in mind for later.
afakhry@chromium.org changed reviewers: + thestig@chromium.org
Thanks shess@ +thestig@ for chrome/. https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com... File components/webdata/common/web_database_backend.cc (right): https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com... components/webdata/common/web_database_backend.cc:108: diagnostics_.clear(); On 2016/08/18 21:51:38, Scott Hess wrote: > Also clear catastrophic_error_occurred_, since it's a new open? Or is the above > just to save memory if everything has gone poorly? Done. https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com... components/webdata/common/web_database_backend.cc:125: db_.reset(); On 2016/08/18 21:51:38, Scott Hess wrote: > It might make more sense to put this in an init_status != sql::INIT_OK block, > with early return, and put the common/expected case down here. I think that's > closer to common Chromium style. Done. https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com... components/webdata/common/web_database_backend.cc:135: db_->GetSQLConnection()->RazeAndClose(); On 2016/08/18 21:51:38, Scott Hess wrote: > Makes sense to me. > > Note that this callback could be called after LoadDatabaseIfNecessary() has run > successfully to completion. This can happen if there is corruption in the > database data, but not in the header data. The levels I can think of are: > - header corruption, which should be touched off in sql::Connection::Open(). > - schema corruption, which should be touched off by table creation and > checking. > - data corruption, which is only noticed when the corrupt data is queried. > I _think_ the second bit is covered by db_->Init(), assuming that does queries > like "CREATE TABLE IF NOT EXISTS" or sql::Connection:DoesTableExist() type > calls. > > Anyhow, I think what will happen is that data corruption will result in a raze, > but the analysis data will never be surfaced to the user or sent up. This > _does_ happen, but I can't really say how often. It may be small enough not to > worry about. At one point I thought I had determined that ~25% of corruptions > were detect in scope of sql::Connection::Open(), which is why I introduced the > retry logic, but it's hard to make general statements based on research I did a > few years ago ... It seems that WebDatabase::Init() does a lot of the things that make it most likely that a DB corruption will be detected by the time it returns. Things like Connection::Open(), MetaTable::Init(), and WebDatabaseTable::CreateTablesIfNecessary(), which you already mentioned above. This makes me less worried about the scenario you mentioned. Am I correct? https://codereview.chromium.org/2225333003/diff/120001/sql/connection.cc File sql/connection.cc (right): https://codereview.chromium.org/2225333003/diff/120001/sql/connection.cc#newc... sql/connection.cc:1743: On 2016/08/18 21:51:38, Scott Hess wrote: > I'd prefer the version I posted to you under separate CL to this. Partly > because I think it does 95% the same thing, but partly because I feel like this > is kind of arbitrary to be bundled into your CL (I want to have a clear idea of > where the change came from in two years when it's breaking someone's sql code > :-). Agreed. I'll wait until you land yours and rebase once that happens. https://codereview.chromium.org/2225333003/diff/120001/sql/init_status.h File sql/init_status.h (right): https://codereview.chromium.org/2225333003/diff/120001/sql/init_status.h#newc... sql/init_status.h:18: On 2016/08/18 21:51:38, Scott Hess wrote: > Any reason this isn't at the end of the list? > > Also, the comment is wishy-washy. At this time, the database was recovered but > data was almost certainly lost. Maybe just say "The database was deleted and > re-opened successfully" or something to that end. No, no reason. I put up there to be close to INIT_OK. I moved it to the end, and changed the comment. Done.
https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_ini... File chrome/browser/sql_init_error_message_ids.h (right): https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_ini... chrome/browser/sql_init_error_message_ids.h:1: // Copyright 2016 The Chromium Authors. All rights reserved. Is there no appropriate chrome/browser/foo sub-directory to put this?
Description was changed from ========== Recreate the WebData database on a catastrophic SQL error Currently, if the WebData database gets corrupted, it will remain like so and the user will keep getting the profile error dialog every time he signs in. The only way for the user to fix this is to recreate his profile or delete the corrupt file. This CL does for the WebData DB as is done for the History DB; when a catstrophic error is seen, the DB is razed. Example feedback report filed with this CL applied: https://docs.google.com/a/google.com/document/d/1qt-AwWVHZTtcJoFPNq2_yI-Tl2pI... BUG=455749, 635734 ========== to ========== Recreate the WebData database on a catastrophic SQL error Currently, if the WebData database gets corrupted, it will remain like so and users will keep getting the profile error dialog every time they sign in. The only way for the user to fix this is to recreate the profile or delete the corrupt file. This CL does for the WebData DB as is done for the History DB; when a catstrophic error is seen, the DB is razed. Example feedback report filed with this CL applied: https://docs.google.com/a/google.com/document/d/1qt-AwWVHZTtcJoFPNq2_yI-Tl2pI... BUG=455749, 635734 ==========
https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_ini... File chrome/browser/sql_init_error_message_ids.h (right): https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_ini... chrome/browser/sql_init_error_message_ids.h:1: // Copyright 2016 The Chromium Authors. All rights reserved. On 2016/08/19 18:15:02, Lei Zhang wrote: > Is there no appropriate chrome/browser/foo sub-directory to put this? Couldn't find any suitable one. :(
The CQ bit was checked by afakhry@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
I rebased on shess' change in https://codereview.chromium.org/2258703004/ and tested that the behavior is exactly the same (DB corruption is detected and DB is Razed). I'm now ready to proceed with this change. https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_ini... File chrome/browser/sql_init_error_message_ids.h (right): https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_ini... chrome/browser/sql_init_error_message_ids.h:1: // Copyright 2016 The Chromium Authors. All rights reserved. On 2016/08/19 20:07:50, afakhry wrote: > On 2016/08/19 18:15:02, Lei Zhang wrote: > > Is there no appropriate chrome/browser/foo sub-directory to put this? > > Couldn't find any suitable one. :( Any suggestions though?
https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_ini... File chrome/browser/sql_init_error_message_ids.h (right): https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_ini... chrome/browser/sql_init_error_message_ids.h:1: // Copyright 2016 The Chromium Authors. All rights reserved. On 2016/08/20 00:54:11, afakhry wrote: > On 2016/08/19 20:07:50, afakhry wrote: > > On 2016/08/19 18:15:02, Lei Zhang wrote: > > > Is there no appropriate chrome/browser/foo sub-directory to put this? > > > > Couldn't find any suitable one. :( > > Any suggestions though? c/b/profiles maybe?
https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_ini... File chrome/browser/sql_init_error_message_ids.h (right): https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_ini... chrome/browser/sql_init_error_message_ids.h:1: // Copyright 2016 The Chromium Authors. All rights reserved. On 2016/08/20 01:17:52, Lei Zhang wrote: > On 2016/08/20 00:54:11, afakhry wrote: > > On 2016/08/19 20:07:50, afakhry wrote: > > > On 2016/08/19 18:15:02, Lei Zhang wrote: > > > > Is there no appropriate chrome/browser/foo sub-directory to put this? > > > > > > Couldn't find any suitable one. :( > > > > Any suggestions though? > > c/b/profiles maybe? Done.
On 2016/08/20 02:00:02, afakhry wrote: > Done. Tnanks. Want to see if c/b/profiles/OWNERS will take it? Also typo: "catstrophic"
Description was changed from ========== Recreate the WebData database on a catastrophic SQL error Currently, if the WebData database gets corrupted, it will remain like so and users will keep getting the profile error dialog every time they sign in. The only way for the user to fix this is to recreate the profile or delete the corrupt file. This CL does for the WebData DB as is done for the History DB; when a catstrophic error is seen, the DB is razed. Example feedback report filed with this CL applied: https://docs.google.com/a/google.com/document/d/1qt-AwWVHZTtcJoFPNq2_yI-Tl2pI... BUG=455749, 635734 ========== to ========== Recreate the WebData database on a catastrophic SQL error Currently, if the WebData database gets corrupted, it will remain like so and users will keep getting the profile error dialog every time they sign in. The only way for the user to fix this is to recreate the profile or delete the corrupt file. This CL does for the WebData DB as is done for the History DB; when a catastrophic error is seen, the DB is razed. Example feedback report filed with this CL applied: https://docs.google.com/a/google.com/document/d/1qt-AwWVHZTtcJoFPNq2_yI-Tl2pI... BUG=455749, 635734 ==========
afakhry@chromium.org changed reviewers: + erg@chromium.org
Fixed typo. +erg@chromium.org: To approve the addition of sql_init_error_message_ids.[cc/h] in c/b/profiles.
lgtm
lgtm
The CQ bit was checked by afakhry@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from shess@chromium.org Link to the patchset: https://codereview.chromium.org/2225333003/#ps200001 (title: "Fix typo")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Recreate the WebData database on a catastrophic SQL error Currently, if the WebData database gets corrupted, it will remain like so and users will keep getting the profile error dialog every time they sign in. The only way for the user to fix this is to recreate the profile or delete the corrupt file. This CL does for the WebData DB as is done for the History DB; when a catastrophic error is seen, the DB is razed. Example feedback report filed with this CL applied: https://docs.google.com/a/google.com/document/d/1qt-AwWVHZTtcJoFPNq2_yI-Tl2pI... BUG=455749, 635734 ========== to ========== Recreate the WebData database on a catastrophic SQL error Currently, if the WebData database gets corrupted, it will remain like so and users will keep getting the profile error dialog every time they sign in. The only way for the user to fix this is to recreate the profile or delete the corrupt file. This CL does for the WebData DB as is done for the History DB; when a catastrophic error is seen, the DB is razed. Example feedback report filed with this CL applied: https://docs.google.com/a/google.com/document/d/1qt-AwWVHZTtcJoFPNq2_yI-Tl2pI... BUG=455749, 635734 ==========
Message was sent while issue was closed.
Committed patchset #8 (id:200001)
Message was sent while issue was closed.
Description was changed from ========== Recreate the WebData database on a catastrophic SQL error Currently, if the WebData database gets corrupted, it will remain like so and users will keep getting the profile error dialog every time they sign in. The only way for the user to fix this is to recreate the profile or delete the corrupt file. This CL does for the WebData DB as is done for the History DB; when a catastrophic error is seen, the DB is razed. Example feedback report filed with this CL applied: https://docs.google.com/a/google.com/document/d/1qt-AwWVHZTtcJoFPNq2_yI-Tl2pI... BUG=455749, 635734 ========== to ========== Recreate the WebData database on a catastrophic SQL error Currently, if the WebData database gets corrupted, it will remain like so and users will keep getting the profile error dialog every time they sign in. The only way for the user to fix this is to recreate the profile or delete the corrupt file. This CL does for the WebData DB as is done for the History DB; when a catastrophic error is seen, the DB is razed. Example feedback report filed with this CL applied: https://docs.google.com/a/google.com/document/d/1qt-AwWVHZTtcJoFPNq2_yI-Tl2pI... BUG=455749, 635734 Committed: https://crrev.com/a592d29c2a4667b4ec95bd091e860bd01662908e Cr-Commit-Position: refs/heads/master@{#413500} ==========
Message was sent while issue was closed.
Patchset 8 (id:??) landed as https://crrev.com/a592d29c2a4667b4ec95bd091e860bd01662908e Cr-Commit-Position: refs/heads/master@{#413500} |