Issue 2225333003: Recreate the WebData database on a catastrophic SQL error

afakhry

The CQ bit was checked by afakhry@chromium.org to run a CQ dry run

4 years, 4 months ago (2016-08-09 19:44:20 UTC) #1

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2225333003/1

4 years, 4 months ago (2016-08-09 19:45:33 UTC) #2

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 4 months ago (2016-08-09 20:06:11 UTC) #3

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_chromeos_ozone_rel_ng/builds/215308)

4 years, 4 months ago (2016-08-09 20:06:12 UTC) #4

Daniel Erat

derat@chromium.org changed reviewers: + derat@chromium.org

4 years, 4 months ago (2016-08-09 21:05:41 UTC) #5

Daniel Erat

(drive-by review; i'm unfamiliar with this code.) it'd be nice to have tests for the ...

4 years, 4 months ago (2016-08-09 21:05:42 UTC) #6

afakhry

afakhry@chromium.org changed reviewers: + shess@chromium.org

4 years, 4 months ago (2016-08-09 21:36:14 UTC) #7

afakhry

derat@ Thanks! I'll try to add tests once I finalize the solution. Adding shess@. shess@, ...

4 years, 4 months ago (2016-08-09 21:36:18 UTC) #8

derat@ Thanks! I'll try to add tests once I finalize the solution. 

Adding shess@.
shess@, this is CL is a follow-up to the CL
https://codereview.chromium.org/2107493002/, as agreed with michaeln@.

I found 2 issues while working on the previous CL:
1- The WebData DB when corrupted remains like so, and the user keeps getting the
profile error dialog on each sign in.
2- A crash happens when the user exits the profile error message box, which I
explained here https://bugs.chromium.org/p/chromium/issues/detail?id=635734#c2

https://codereview.chromium.org/2225333003/diff/1/components/webdata/common/w...
File components/webdata/common/web_database_service.cc (right):

https://codereview.chromium.org/2225333003/diff/1/components/webdata/common/w...
components/webdata/common/web_database_service.cc:141: auto loaded_callback =
loaded_callbacks_.back();
On 2016/08/09 21:05:41, Daniel Erat wrote:
> same comment as below

Done.

https://codereview.chromium.org/2225333003/diff/1/components/webdata/common/w...
components/webdata/common/web_database_service.cc:154: auto error_callback =
error_callbacks_.back();
On 2016/08/09 21:05:41, Daniel Erat wrote:
> the c++ rules around auto and references seem arcane and terrify me. is this
> safe or a use-after-free? my initial concern is that since back() usually
> returns a reference, the 'auto' type here may be a reference as well, in which
> case it may no longer be valid after the pop_back() call on the next line.
> 
> in any case, i think our rules for 'auto' say that it can only be used when
the
> variable's type is obvious, which isn't the case here (since error_callbacks_
is
> declared in the header), so please switch this to use an explicit
non-reference
> callback type.

Yes, absolutely. Thanks!
Done.

https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc
File sql/connection.cc (right):

https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc#newco...
sql/connection.cc:1708: return false;
shess@, I need your advice about this code.

The reason I added it is because I want to avoid the following crash on Debug
builds when the WebData DB file is corrupted:

[4470:4489:0809/140648:FATAL:statement.cc(52)] Cannot call mutating statements
on an invalid statement.
#0 0x2b105313d14e base::debug::StackTrace::StackTrace()
#1 0x2b1053198b1c logging::LogMessage::~LogMessage()
#2 0x2b1055b4f892 sql::Statement::CheckValid()
#3 0x2b1055b4f8e2 sql::Statement::StepInternal()
#4 0x2b1055b4fd8a sql::Statement::Step()
#5 0x2b1055b4a178 sql::MetaTable::RazeIfDeprecated()
#6 0x2b1055b0afa4 WebDatabase::Init()
#7 0x2b1055b0e785 WebDatabaseBackend::LoadDatabaseIfNecessary()
#8 0x2b1055b0e549 WebDatabaseBackend::InitDatabase()
#9 0x2b1055b13382
_ZN4base8internal13FunctorTraitsIM18WebDatabaseBackendFvvEvE6InvokeIRK13scoped_refptrIS2_EJEEEvS4_OT_DpOT0_
#10 0x2b1055b132d1
_ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIRKM18WebDatabaseBackendFvvEJRK13scoped_refptrIS4_EEEEvOT_DpOT0_
#11 0x2b1055b13272
_ZN4base8internal7InvokerINS0_9BindStateIM18WebDatabaseBackendFvvEJ13scoped_refptrIS3_EEEEFvvEE7RunImplIRKS5_RKSt5tupleIJS7_EEJLm0EEEEvOT_OT0_NS_13IndexSequenceIJXspT1_EEEE
#12 0x2b1055b12e4c
_ZN4base8internal7InvokerINS0_9BindStateIM18WebDatabaseBackendFvvEJ13scoped_refptrIS3_EEEEFvvEE3RunEPNS0_13BindStateBaseE
#13 0x2b1053120d5e base::Callback<>::Run()
#14 0x2b1053142a4e base::debug::TaskAnnotator::RunTask()
#15 0x2b10531b5551 base::MessageLoop::RunTask()
#16 0x2b10531b57d4 base::MessageLoop::DeferOrRunPendingTask()
#17 0x2b10531b5a9e base::MessageLoop::DoWork()
#18 0x2b10531ca8a4 base::MessagePumpDefault::Run()
#19 0x2b10531b4fba base::MessageLoop::RunHandler()
#20 0x2b105324d694 base::RunLoop::Run()
#21 0x2b10532e26d1 base::Thread::Run()
#22 0x2b1057e88648 content::BrowserThreadImpl::DBThreadRun()
#23 0x2b1057e88c38 content::BrowserThreadImpl::Run()
#24 0x2b10532e2cab base::Thread::ThreadMain()
#25 0x2b10532cf58a base::(anonymous namespace)::ThreadFunc()
#26 0x2b1068615184 start_thread
#27 0x2b106892537d clone

I wanted to make sure that a DB with file that's either SQLITE_NOTADB or
SQLITE_CORRUPT is not openable to return early at:
https://cs.chromium.org/chromium/src/components/webdata/common/web_database.c...

and skip the sql::MetaTable::RazeIfDeprecated()
https://cs.chromium.org/chromium/src/components/webdata/common/web_database.c...
that causes the crash.

Unfortunately a lot of tests expect that opening a DB in this state is
successful, so I'm not sure how to overcome this problem.

afakhry

Description was changed from ========== Recreate the WebData database on a catastrophic SQL error Currently, ...

4 years, 4 months ago (2016-08-09 21:37:29 UTC) #9

Scott Hess - ex-Googler

Note that the doc linked in the CL comment takes me to a doc containing ...

4 years, 4 months ago (2016-08-10 17:16:19 UTC) #10

Note that the doc linked in the CL comment takes me to a doc containing a link,
and that link doesn't work for me, even though I have a google.com account. 
Maybe you could just describe the information inline.

https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm...
File components/webdata/common/web_database_backend.cc (right):

https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm...
components/webdata/common/web_database_backend.cc:141: db_.reset();
These three lines are essentially identical to what
sql::Connection::RazeAndClose() does.  Additionally, RazeAndClose() is intended
to work correctly when called from within the error callback, so using that
would mean you don't have to post a task and carry around a database handle you
know to reference a potentially-corrupt database.

https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm...
File components/webdata/common/web_database_service.cc (right):

https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm...
components/webdata/common/web_database_service.cc:153: // |error_callbacks_|
before it accesses it.
If I understand this correctly, another option might be something like:
  ErrorCallbacks callbacks;
  std::swap(callbacks, error_callbacks_);
  for (const auto& error_callback : callbacks) {
    ...
  }

https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc
File sql/connection.cc (right):

https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc#newco...
sql/connection.cc:1708: return false;
On 2016/08/09 21:36:17, afakhry wrote:
> shess@, I need your advice about this code.
> 
> The reason I added it is because I want to avoid the following crash on Debug
> builds when the WebData DB file is corrupted:

I'm not sure what your question is.  I wouldn't expect you to see enough
corruption that having this corrupt ever comes up as a problem in debug builds. 
So what's the context where the corruption is happening?  A test or something
like that?

If I read the existing code correctly, it's probably using
ExecuteAndReturnErrorCode() Specifically to avoid calling the error callback. 
Otherwise it could have just used plain Execute().

afakhry

The CQ bit was checked by afakhry@chromium.org to run a CQ dry run

4 years, 4 months ago (2016-08-11 17:14:12 UTC) #11

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2225333003/40001

4 years, 4 months ago (2016-08-11 17:15:02 UTC) #12

afakhry

The CQ bit was checked by afakhry@chromium.org to run a CQ dry run

4 years, 4 months ago (2016-08-11 17:22:51 UTC) #14

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2225333003/60001

4 years, 4 months ago (2016-08-11 17:23:34 UTC) #15

afakhry

shess@ With the solution in this new patch, we RazeAndClose() the DB on a catastrophic ...

4 years, 4 months ago (2016-08-11 17:32:54 UTC) #16

shess@
With the solution in this new patch, we RazeAndClose() the DB on a catastrophic
error as you suggested, and the addition of checking whether the DB was poisoned
in OpenInternal() and retry again, makes the profile error dialog entirely go
away. The DB is opened and initialized successfully as if the file was never
corrupt.
The user signs in successfully, but is faced with another sync error asking them
to re-signin, since the recreated DB contains nothing:
https://drive.google.com/file/d/0B9NoD5ZlZAEMbGR3amwzY3pKaDA/view?usp=sharing

Pros:
- User is not faced with a confusing dialog saying "some features will be
missing" and the decision to file a feedback report or not. All he needs to do
is to re-signin to get back sync.

Cons:
- The DB can still get corrupted, but we don't get to see that from the stats as
it fixes itself before we know about it. This might not be important though.

https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm...
File components/webdata/common/web_database_backend.cc (right):

https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm...
components/webdata/common/web_database_backend.cc:141: db_.reset();
On 2016/08/10 17:16:19, Scott Hess wrote:
> These three lines are essentially identical to what
> sql::Connection::RazeAndClose() does.  Additionally, RazeAndClose() is
intended
> to work correctly when called from within the error callback, so using that
> would mean you don't have to post a task and carry around a database handle
you
> know to reference a potentially-corrupt database.

Interesting. Thanks! Done.

https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm...
File components/webdata/common/web_database_service.cc (right):

https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm...
components/webdata/common/web_database_service.cc:153: // |error_callbacks_|
before it accesses it.
On 2016/08/10 17:16:19, Scott Hess wrote:
> If I understand this correctly, another option might be something like:
>   ErrorCallbacks callbacks;
>   std::swap(callbacks, error_callbacks_);
>   for (const auto& error_callback : callbacks) {
>     ...
>   }

This works too, but it exposes a problem that was already there. Now we get the
profile error dialog 4 times, once after every "OK! The nested runloop of the
message box blocks the execution until it's dismissed, and then the next
callback is executed, which shows the profile error dialog again!

This pop_back() implementation *hides* the problem, since all other callbacks
are executed fast while the message box is still being shown. This is good but
that's not a robust solution! I'm not sure why the code is written that way. Is
it expected that we get one profile error dialog for each WebData DB type
(autofill, keyword, token, password ...)? I'd need more time to think about how
to fix this cleanly (if that's important).

https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc
File sql/connection.cc (right):

https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc#newco...
sql/connection.cc:1708: return false;
On 2016/08/10 17:16:19, Scott Hess wrote:
> On 2016/08/09 21:36:17, afakhry wrote:
> > shess@, I need your advice about this code.
> > 
> > The reason I added it is because I want to avoid the following crash on
Debug
> > builds when the WebData DB file is corrupted:
> 
> I'm not sure what your question is.  I wouldn't expect you to see enough
> corruption that having this corrupt ever comes up as a problem in debug
builds. 
> So what's the context where the corruption is happening?  A test or something
> like that?
> 
> If I read the existing code correctly, it's probably using
> ExecuteAndReturnErrorCode() Specifically to avoid calling the error callback. 
> Otherwise it could have just used plain Execute().

I'm forcing a corruption in the Web Data file on a debug ChromeOS Linux build
just for testing. My question was if there's a way to avoid that crash by making
sure opening the corrupt db fails so that we can skip running
sql::MetaTable::RazeIfDeprecated() which causes the crash. But it seems that
opening a corrupt db is expected to succeed!

With the solution in the new patch, this is not needed. However I found that
razing a NOTADB database always fails at: sqlite3_backup_step(backup, -1);
[https://cs.chromium.org/chromium/src/sql/connection.cc?type=cs&q=%22sqlite3_backup_step(backup,+-1);%22&sq=package:chromium&l=102]
which returns NOTADB as well, and hence we hit this DCHECK here
https://cs.chromium.org/chromium/src/sql/connection.cc?type=cs&q=Connection::....
So maybe the assumptions need to be revised?

If debug builds are not something we need to worry about, then we can ignore
this comment.

afakhry

On 2016/08/11 17:32:54, afakhry wrote: > shess@ > With the solution in this new patch, ...

4 years, 4 months ago (2016-08-11 18:03:51 UTC) #17

On 2016/08/11 17:32:54, afakhry wrote:
> shess@
> With the solution in this new patch, we RazeAndClose() the DB on a
catastrophic
> error as you suggested, and the addition of checking whether the DB was
poisoned
> in OpenInternal() and retry again, makes the profile error dialog entirely go
> away. The DB is opened and initialized successfully as if the file was never
> corrupt.
> The user signs in successfully, but is faced with another sync error asking
them
> to re-signin, since the recreated DB contains nothing:
> https://drive.google.com/file/d/0B9NoD5ZlZAEMbGR3amwzY3pKaDA/view?usp=sharing
> 
> Pros:
> - User is not faced with a confusing dialog saying "some features will be
> missing" and the decision to file a feedback report or not. All he needs to do
> is to re-signin to get back sync.
> 
> Cons:
> - The DB can still get corrupted, but we don't get to see that from the stats
as
> it fixes itself before we know about it. This might not be important though.
> 

Actually, I just spoke with Albert about this, and we think that it's still
better to show the profile error page so that we get the visibility that we need
through the UMA stats and the feedback reports, and also to clearly indicate to
the user that a data loss might have happened. The user will still see the sync
error notification (I showed in the above screenshot) which has a button that
signs the user out to do the re-signin. Since we do a RazeAndClose() anyways,
the subsequent signin will be problem-free.

What do you think?

>
https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm...
> File components/webdata/common/web_database_backend.cc (right):
> 
>
https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm...
> components/webdata/common/web_database_backend.cc:141: db_.reset();
> On 2016/08/10 17:16:19, Scott Hess wrote:
> > These three lines are essentially identical to what
> > sql::Connection::RazeAndClose() does.  Additionally, RazeAndClose() is
> intended
> > to work correctly when called from within the error callback, so using that
> > would mean you don't have to post a task and carry around a database handle
> you
> > know to reference a potentially-corrupt database.
> 
> Interesting. Thanks! Done.
> 
>
https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm...
> File components/webdata/common/web_database_service.cc (right):
> 
>
https://codereview.chromium.org/2225333003/diff/20001/components/webdata/comm...
> components/webdata/common/web_database_service.cc:153: // |error_callbacks_|
> before it accesses it.
> On 2016/08/10 17:16:19, Scott Hess wrote:
> > If I understand this correctly, another option might be something like:
> >   ErrorCallbacks callbacks;
> >   std::swap(callbacks, error_callbacks_);
> >   for (const auto& error_callback : callbacks) {
> >     ...
> >   }
> 
> This works too, but it exposes a problem that was already there. Now we get
the
> profile error dialog 4 times, once after every "OK! The nested runloop of the
> message box blocks the execution until it's dismissed, and then the next
> callback is executed, which shows the profile error dialog again!
> 
> This pop_back() implementation *hides* the problem, since all other callbacks
> are executed fast while the message box is still being shown. This is good but
> that's not a robust solution! I'm not sure why the code is written that way.
Is
> it expected that we get one profile error dialog for each WebData DB type
> (autofill, keyword, token, password ...)? I'd need more time to think about
how
> to fix this cleanly (if that's important).
> 
> https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc
> File sql/connection.cc (right):
> 
>
https://codereview.chromium.org/2225333003/diff/20001/sql/connection.cc#newco...
> sql/connection.cc:1708: return false;
> On 2016/08/10 17:16:19, Scott Hess wrote:
> > On 2016/08/09 21:36:17, afakhry wrote:
> > > shess@, I need your advice about this code.
> > > 
> > > The reason I added it is because I want to avoid the following crash on
> Debug
> > > builds when the WebData DB file is corrupted:
> > 
> > I'm not sure what your question is.  I wouldn't expect you to see enough
> > corruption that having this corrupt ever comes up as a problem in debug
> builds. 
> > So what's the context where the corruption is happening?  A test or
something
> > like that?
> > 
> > If I read the existing code correctly, it's probably using
> > ExecuteAndReturnErrorCode() Specifically to avoid calling the error
callback. 
> > Otherwise it could have just used plain Execute().
> 
> I'm forcing a corruption in the Web Data file on a debug ChromeOS Linux build
> just for testing. My question was if there's a way to avoid that crash by
making
> sure opening the corrupt db fails so that we can skip running
> sql::MetaTable::RazeIfDeprecated() which causes the crash. But it seems that
> opening a corrupt db is expected to succeed!
> 
> With the solution in the new patch, this is not needed. However I found that
> razing a NOTADB database always fails at: sqlite3_backup_step(backup, -1);
>
[https://cs.chromium.org/chromium/src/sql/connection.cc?type=cs&q=%22sqlite3_backup_step(backup,+-1);%22&sq=package:chromium&l=102]
> which returns NOTADB as well, and hence we hit this DCHECK here
>
https://cs.chromium.org/chromium/src/sql/connection.cc?type=cs&q=Connection::....
> So maybe the assumptions need to be revised?
> 
> If debug builds are not something we need to worry about, then we can ignore
> this comment.

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 4 months ago (2016-08-11 18:40:07 UTC) #18

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/276205)

4 years, 4 months ago (2016-08-11 18:40:08 UTC) #19

Scott Hess - ex-Googler

OK, I'm not _entirely_ sure I follow at this point. I spent the balance of ...

4 years, 4 months ago (2016-08-12 21:05:33 UTC) #21

Scott Hess - ex-Googler

I'm liking this better, but I need to go over OpenInternal() because I'm concerned that ...

4 years, 4 months ago (2016-08-12 21:11:53 UTC) #22

afakhry

The CQ bit was checked by afakhry@chromium.org to run a CQ dry run

4 years, 4 months ago (2016-08-17 23:19:59 UTC) #23

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2225333003/80001

4 years, 4 months ago (2016-08-17 23:20:28 UTC) #24

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 4 months ago (2016-08-17 23:30:02 UTC) #25

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds/53873)

4 years, 4 months ago (2016-08-17 23:30:02 UTC) #26

afakhry

Hello shess@, Did you manage to write tests to see where the block I added ...

4 years, 4 months ago (2016-08-17 23:30:38 UTC) #27

afakhry

The CQ bit was checked by afakhry@chromium.org to run a CQ dry run

4 years, 4 months ago (2016-08-17 23:34:09 UTC) #28

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2225333003/100001

4 years, 4 months ago (2016-08-17 23:34:52 UTC) #29

afakhry

The CQ bit was checked by afakhry@chromium.org to run a CQ dry run

4 years, 4 months ago (2016-08-17 23:43:30 UTC) #32

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2225333003/120001

4 years, 4 months ago (2016-08-17 23:43:58 UTC) #33

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 4 months ago (2016-08-18 01:45:27 UTC) #34

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

4 years, 4 months ago (2016-08-18 01:45:27 UTC) #35

Scott Hess - ex-Googler

LGTM with suggested changes. Sorry for the delay. https://codereview.chromium.org/2225333003/diff/120001/components/webdata/common/web_database_backend.cc File components/webdata/common/web_database_backend.cc (right): https://codereview.chromium.org/2225333003/diff/120001/components/webdata/common/web_database_backend.cc#newcode108 components/webdata/common/web_database_backend.cc:108: diagnostics_.clear(); ...

4 years, 4 months ago (2016-08-18 21:51:38 UTC) #36

LGTM with suggested changes.  Sorry for the delay.

https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com...
File components/webdata/common/web_database_backend.cc (right):

https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com...
components/webdata/common/web_database_backend.cc:108: diagnostics_.clear();
Also clear catastrophic_error_occurred_, since it's a new open?  Or is the above
just to save memory if everything has gone poorly?

https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com...
components/webdata/common/web_database_backend.cc:125: db_.reset();
It might make more sense to put this in an init_status != sql::INIT_OK block,
with early return, and put the common/expected case down here.  I think that's
closer to common Chromium style.

https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com...
components/webdata/common/web_database_backend.cc:135:
db_->GetSQLConnection()->RazeAndClose();
Makes sense to me.

Note that this callback could be called after LoadDatabaseIfNecessary() has run
successfully to completion.  This can happen if there is corruption in the
database data, but not in the header data.  The levels I can think of are:
 - header corruption, which should be touched off in sql::Connection::Open().
 - schema corruption, which should be touched off by table creation and
checking.
 - data corruption, which is only noticed when the corrupt data is queried.
I _think_ the second bit is covered by db_->Init(), assuming that does queries
like "CREATE TABLE IF NOT EXISTS" or sql::Connection:DoesTableExist() type
calls.

Anyhow, I think what will happen is that data corruption will result in a raze,
but the analysis data will never be surfaced to the user or sent up.  This
_does_ happen, but I can't really say how often.  It may be small enough not to
worry about.  At one point I thought I had determined that ~25% of corruptions
were detect in scope of sql::Connection::Open(), which is why I introduced the
retry logic, but it's hard to make general statements based on research I did a
few years ago ...

https://codereview.chromium.org/2225333003/diff/120001/sql/connection.cc
File sql/connection.cc (right):

https://codereview.chromium.org/2225333003/diff/120001/sql/connection.cc#newc...
sql/connection.cc:1743: 
I'd prefer the version I posted to you under separate CL to this.  Partly
because I think it does 95% the same thing, but partly because I feel like this
is kind of arbitrary to be bundled into your CL (I want to have a clear idea of
where the change came from in two years when it's breaking someone's sql code
:-).

https://codereview.chromium.org/2225333003/diff/120001/sql/init_status.h
File sql/init_status.h (right):

https://codereview.chromium.org/2225333003/diff/120001/sql/init_status.h#newc...
sql/init_status.h:18: 
Any reason this isn't at the end of the list?

Also, the comment is wishy-washy.  At this time, the database was recovered but
data was almost certainly lost.  Maybe just say "The database was deleted and
re-opened successfully" or something to that end.

Scott Hess - ex-Googler

BTW, crbug.com/597785 is about an attempt to have a helper function which automagically does best-effort ...

4 years, 4 months ago (2016-08-18 21:54:31 UTC) #37

afakhry

afakhry@chromium.org changed reviewers: + thestig@chromium.org

4 years, 4 months ago (2016-08-19 18:01:57 UTC) #38

afakhry

Thanks shess@ +thestig@ for chrome/. https://codereview.chromium.org/2225333003/diff/120001/components/webdata/common/web_database_backend.cc File components/webdata/common/web_database_backend.cc (right): https://codereview.chromium.org/2225333003/diff/120001/components/webdata/common/web_database_backend.cc#newcode108 components/webdata/common/web_database_backend.cc:108: diagnostics_.clear(); On 2016/08/18 21:51:38, ...

4 years, 4 months ago (2016-08-19 18:01:58 UTC) #39

Thanks shess@

+thestig@ for chrome/.

https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com...
File components/webdata/common/web_database_backend.cc (right):

https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com...
components/webdata/common/web_database_backend.cc:108: diagnostics_.clear();
On 2016/08/18 21:51:38, Scott Hess wrote:
> Also clear catastrophic_error_occurred_, since it's a new open?  Or is the
above
> just to save memory if everything has gone poorly?

Done.

https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com...
components/webdata/common/web_database_backend.cc:125: db_.reset();
On 2016/08/18 21:51:38, Scott Hess wrote:
> It might make more sense to put this in an init_status != sql::INIT_OK block,
> with early return, and put the common/expected case down here.  I think that's
> closer to common Chromium style.

Done.

https://codereview.chromium.org/2225333003/diff/120001/components/webdata/com...
components/webdata/common/web_database_backend.cc:135:
db_->GetSQLConnection()->RazeAndClose();
On 2016/08/18 21:51:38, Scott Hess wrote:
> Makes sense to me.
> 
> Note that this callback could be called after LoadDatabaseIfNecessary() has
run
> successfully to completion.  This can happen if there is corruption in the
> database data, but not in the header data.  The levels I can think of are:
>  - header corruption, which should be touched off in sql::Connection::Open().
>  - schema corruption, which should be touched off by table creation and
> checking.
>  - data corruption, which is only noticed when the corrupt data is queried.
> I _think_ the second bit is covered by db_->Init(), assuming that does queries
> like "CREATE TABLE IF NOT EXISTS" or sql::Connection:DoesTableExist() type
> calls.
> 
> Anyhow, I think what will happen is that data corruption will result in a
raze,
> but the analysis data will never be surfaced to the user or sent up.  This
> _does_ happen, but I can't really say how often.  It may be small enough not
to
> worry about.  At one point I thought I had determined that ~25% of corruptions
> were detect in scope of sql::Connection::Open(), which is why I introduced the
> retry logic, but it's hard to make general statements based on research I did
a
> few years ago ...

It seems that WebDatabase::Init() does a lot of the things that make it most
likely that a DB corruption will be detected by the time it returns. Things like
Connection::Open(), MetaTable::Init(), and
WebDatabaseTable::CreateTablesIfNecessary(), which you already mentioned above.
This makes me less worried about the scenario you mentioned. Am I correct?

https://codereview.chromium.org/2225333003/diff/120001/sql/connection.cc
File sql/connection.cc (right):

https://codereview.chromium.org/2225333003/diff/120001/sql/connection.cc#newc...
sql/connection.cc:1743: 
On 2016/08/18 21:51:38, Scott Hess wrote:
> I'd prefer the version I posted to you under separate CL to this.  Partly
> because I think it does 95% the same thing, but partly because I feel like
this
> is kind of arbitrary to be bundled into your CL (I want to have a clear idea
of
> where the change came from in two years when it's breaking someone's sql code
> :-).

Agreed. I'll wait until you land yours and rebase once that happens.

https://codereview.chromium.org/2225333003/diff/120001/sql/init_status.h
File sql/init_status.h (right):

https://codereview.chromium.org/2225333003/diff/120001/sql/init_status.h#newc...
sql/init_status.h:18: 
On 2016/08/18 21:51:38, Scott Hess wrote:
> Any reason this isn't at the end of the list?
> 
> Also, the comment is wishy-washy.  At this time, the database was recovered
but
> data was almost certainly lost.  Maybe just say "The database was deleted and
> re-opened successfully" or something to that end.

No, no reason. I put up there to be close to INIT_OK.
I moved it to the end, and changed the comment.
Done.

Lei Zhang

https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_init_error_message_ids.h File chrome/browser/sql_init_error_message_ids.h (right): https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_init_error_message_ids.h#newcode1 chrome/browser/sql_init_error_message_ids.h:1: // Copyright 2016 The Chromium Authors. All rights reserved. ...

4 years, 4 months ago (2016-08-19 18:15:03 UTC) #40

afakhry

Description was changed from ========== Recreate the WebData database on a catastrophic SQL error Currently, ...

4 years, 4 months ago (2016-08-19 19:58:44 UTC) #41

afakhry

https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_init_error_message_ids.h File chrome/browser/sql_init_error_message_ids.h (right): https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_init_error_message_ids.h#newcode1 chrome/browser/sql_init_error_message_ids.h:1: // Copyright 2016 The Chromium Authors. All rights reserved. ...

4 years, 4 months ago (2016-08-19 20:07:50 UTC) #42

afakhry

The CQ bit was checked by afakhry@chromium.org to run a CQ dry run

4 years, 4 months ago (2016-08-20 00:51:28 UTC) #43

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2225333003/160001

4 years, 4 months ago (2016-08-20 00:51:53 UTC) #44

afakhry

I rebased on shess' change in https://codereview.chromium.org/2258703004/ and tested that the behavior is exactly the ...

4 years, 4 months ago (2016-08-20 00:54:11 UTC) #45

Lei Zhang

https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_init_error_message_ids.h File chrome/browser/sql_init_error_message_ids.h (right): https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_init_error_message_ids.h#newcode1 chrome/browser/sql_init_error_message_ids.h:1: // Copyright 2016 The Chromium Authors. All rights reserved. ...

4 years, 4 months ago (2016-08-20 01:17:52 UTC) #46

afakhry

https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_init_error_message_ids.h File chrome/browser/sql_init_error_message_ids.h (right): https://codereview.chromium.org/2225333003/diff/140001/chrome/browser/sql_init_error_message_ids.h#newcode1 chrome/browser/sql_init_error_message_ids.h:1: // Copyright 2016 The Chromium Authors. All rights reserved. ...

4 years, 4 months ago (2016-08-20 02:00:02 UTC) #47

Lei Zhang

On 2016/08/20 02:00:02, afakhry wrote: > Done. Tnanks. Want to see if c/b/profiles/OWNERS will take ...

4 years, 4 months ago (2016-08-20 02:08:29 UTC) #48

afakhry

Description was changed from ========== Recreate the WebData database on a catastrophic SQL error Currently, ...

4 years, 4 months ago (2016-08-20 02:27:33 UTC) #49

afakhry

afakhry@chromium.org changed reviewers: + erg@chromium.org

4 years, 4 months ago (2016-08-20 02:32:11 UTC) #50

afakhry

Fixed typo. +erg@chromium.org: To approve the addition of sql_init_error_message_ids.[cc/h] in c/b/profiles.

4 years, 4 months ago (2016-08-20 02:32:12 UTC) #51

afakhry

The patchset sent to the CQ was uploaded after l-g-t-m from shess@chromium.org Link to the ...

4 years, 4 months ago (2016-08-22 17:51:06 UTC) #55

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2225333003/200001

4 years, 4 months ago (2016-08-22 17:51:37 UTC) #56

commit-bot: I haz the power

Description was changed from ========== Recreate the WebData database on a catastrophic SQL error Currently, ...

4 years, 4 months ago (2016-08-22 19:07:16 UTC) #57

commit-bot: I haz the power

Description was changed from ========== Recreate the WebData database on a catastrophic SQL error Currently, ...

4 years, 4 months ago (2016-08-22 19:09:52 UTC) #59

commit-bot: I haz the power

4 years, 4 months ago (2016-08-22 19:09:54 UTC) #60

Message was sent while issue was closed.

Patchset 8 (id:??) landed as
https://crrev.com/a592d29c2a4667b4ec95bd091e860bd01662908e
Cr-Commit-Position: refs/heads/master@{#413500}

Issue 2225333003: Recreate the WebData database on a catastrophic SQL error (Closed)

Description

Patch Set 1 #

Patch Set 2 : derat's comments #

Patch Set 3 : shess@' comments #

Patch Set 4 : Recover DB and show profile error dialog #

Patch Set 5 : shess' comments #

Patch Set 6 : Rebase on shess change in https://codereview.chromium.org/2258703004/ #

Patch Set 7 : Move to c/b/profiles #

Patch Set 8 : Fix typo #

Messages