|
|
Created:
6 years, 7 months ago by jiayl Modified:
6 years, 7 months ago CC:
chromium-reviews, feature-media-reviews_chromium.org, jam, darin-cc_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Visibility:
Public. |
DescriptionDo not CHECK on the result of Sql::Statement::Run.
It may fail on database errors. We should not crash Chrome for such errors.
The CHECK is replaced by a log message. We already handles database error in general in OnDatabaseError, so no additional error handling is needed.
BUG=376884
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=273135
Patch Set 1 #
Total comments: 6
Patch Set 2 : #
Total comments: 8
Patch Set 3 : raze on all errors #
Total comments: 2
Patch Set 4 : #
Total comments: 2
Patch Set 5 : use ScopedVector #
Total comments: 1
Patch Set 6 : #
Messages
Total messages: 23 (0 generated)
PTAL
https://codereview.chromium.org/289343005/diff/1/content/browser/media/webrtc... File content/browser/media/webrtc_identity_store_backend.cc (right): https://codereview.chromium.org/289343005/diff/1/content/browser/media/webrtc... content/browser/media/webrtc_identity_store_backend.cc:490: if (!sql::IsErrorCatastrophic(error)) CL description implies that all the things that previously were CHECK failures would now be handled by OnDatabaseError but this has the potential to falsify that. Do you have good reason to believe this is safe? Should you track an "end_the_world_" bool that any detected failure should cause to make this function treat as catastrophic? https://codereview.chromium.org/289343005/diff/1/content/browser/media/webrtc... content/browser/media/webrtc_identity_store_backend.cc:574: DVLOG(2) << "Failed to add the identity to DB."; return? https://codereview.chromium.org/289343005/diff/1/content/browser/media/webrtc... content/browser/media/webrtc_identity_store_backend.cc:582: DVLOG(2) << "Failed to delete the identity from DB."; ditto
Thanks! https://codereview.chromium.org/289343005/diff/1/content/browser/media/webrtc... File content/browser/media/webrtc_identity_store_backend.cc (right): https://codereview.chromium.org/289343005/diff/1/content/browser/media/webrtc... content/browser/media/webrtc_identity_store_backend.cc:490: if (!sql::IsErrorCatastrophic(error)) On 2014/05/23 23:05:37, Ami Fischman wrote: > CL description implies that all the things that previously were CHECK failures > would now be handled by OnDatabaseError but this has the potential to falsify > that. Do you have good reason to believe this is safe? > > Should you track an "end_the_world_" bool that any detected failure should cause > to make this function treat as catastrophic? Changed to always RazeAndClose for "DELETE" failures. Transient "INSERT" failures are safe to ignore. https://codereview.chromium.org/289343005/diff/1/content/browser/media/webrtc... content/browser/media/webrtc_identity_store_backend.cc:574: DVLOG(2) << "Failed to add the identity to DB."; On 2014/05/23 23:05:37, Ami Fischman wrote: > return? Done. https://codereview.chromium.org/289343005/diff/1/content/browser/media/webrtc... content/browser/media/webrtc_identity_store_backend.cc:582: DVLOG(2) << "Failed to delete the identity from DB."; On 2014/05/23 23:05:37, Ami Fischman wrote: > ditto Done.
I'd be more comfortable if you got a SQL/DB expert to review this instead of me... https://codereview.chromium.org/289343005/diff/20001/content/browser/media/we... File content/browser/media/webrtc_identity_store_backend.cc (right): https://codereview.chromium.org/289343005/diff/20001/content/browser/media/we... content/browser/media/webrtc_identity_store_backend.cc:498: !StartsWithASCII(statement, "DELETE", false)) { ಠ_ಠ What about "dElEte"? (should you ToLower() the statement before text-compare?) More generally this seems quite brittle for something that's supposed to be managing security-related things. What's the cost of getting this wrong (using a corrupt/out-of-date DB)? What's the cost of being overzealous (always nuking the DB on error)?
https://codereview.chromium.org/289343005/diff/20001/content/browser/media/we... File content/browser/media/webrtc_identity_store_backend.cc (right): https://codereview.chromium.org/289343005/diff/20001/content/browser/media/we... content/browser/media/webrtc_identity_store_backend.cc:498: !StartsWithASCII(statement, "DELETE", false)) { On 2014/05/23 23:30:30, Ami Fischman wrote: > ಠ_ಠ > > What about "dElEte"? > (should you ToLower() the statement before text-compare?) > > More generally this seems quite brittle for something that's supposed to be > managing security-related things. > What's the cost of getting this wrong (using a corrupt/out-of-date DB)? > What's the cost of being overzealous (always nuking the DB on error)? The comparison is case insensitive. DB corruption will be a catastrophic error. See https://code.google.com/p/chromium/codesearch#chromium/src/sql/error_delegate... What do you mean by "out of date"? We always try to delete expired identifies at the startup of the class.
Adding shess to review the DB operations, based on the revision log of statement.h.
https://codereview.chromium.org/289343005/diff/20001/content/browser/media/we... File content/browser/media/webrtc_identity_store_backend.cc (right): https://codereview.chromium.org/289343005/diff/20001/content/browser/media/we... content/browser/media/webrtc_identity_store_backend.cc:498: !StartsWithASCII(statement, "DELETE", false)) { On 2014/05/23 23:34:54, jiayl wrote: > On 2014/05/23 23:30:30, Ami Fischman wrote: > > ಠ_ಠ > > > > What about "dElEte"? > > (should you ToLower() the statement before text-compare?) > > > > More generally this seems quite brittle for something that's supposed to be > > managing security-related things. > > What's the cost of getting this wrong (using a corrupt/out-of-date DB)? > > What's the cost of being overzealous (always nuking the DB on error)? > > > The comparison is case insensitive. So it is! (I missed the "false") > DB corruption will be a catastrophic error. See > https://code.google.com/p/chromium/codesearch#chromium/src/sql/error_delegate... > What do you mean by "out of date"? We always try to delete expired identifies at > the startup of the class. When I said "corrupt" I was referring to a DB whose contents may be correct as far as the DB layer is concerned but incorrect as far as this class is concerned. In other words, does this class expect semantics from the DB not expressed in the schema? Or is the DB just a list of records with no inter-relation, and is only used a serialization mechanism for persistence?
https://codereview.chromium.org/289343005/diff/20001/content/browser/media/we... File content/browser/media/webrtc_identity_store_backend.cc (right): https://codereview.chromium.org/289343005/diff/20001/content/browser/media/we... content/browser/media/webrtc_identity_store_backend.cc:498: !StartsWithASCII(statement, "DELETE", false)) { On 2014/05/23 23:47:01, Ami Fischman wrote: > On 2014/05/23 23:34:54, jiayl wrote: > > On 2014/05/23 23:30:30, Ami Fischman wrote: > > > ಠ_ಠ > > > > > > What about "dElEte"? > > > (should you ToLower() the statement before text-compare?) > > > > > > More generally this seems quite brittle for something that's supposed to be > > > managing security-related things. > > > What's the cost of getting this wrong (using a corrupt/out-of-date DB)? > > > What's the cost of being overzealous (always nuking the DB on error)? > > > > > > The comparison is case insensitive. > > So it is! > (I missed the "false") > > > DB corruption will be a catastrophic error. See > > > https://code.google.com/p/chromium/codesearch#chromium/src/sql/error_delegate... > > What do you mean by "out of date"? We always try to delete expired identifies > at > > the startup of the class. > > When I said "corrupt" I was referring to a DB whose contents may be correct as > far as the DB layer is concerned but incorrect as far as this class is > concerned. In other words, does this class expect semantics from the DB not > expressed in the schema? Or is the DB just a list of records with no > inter-relation, and is only used a serialization mechanism for persistence? The expectations of this class and not expressed in the DB schema are 1. the (origin, identit_name) can uniquely identify a record in the DB 2. expired records are deleted on startup These can only be violated if "DELETE" fails. "ADD" or "SELECT" failures will not cause "corruption".
SG. Will defer to shess@ for review. On Fri, May 23, 2014 at 4:53 PM, <jiayl@chromium.org> wrote: > > https://codereview.chromium.org/289343005/diff/20001/ > content/browser/media/webrtc_identity_store_backend.cc > File content/browser/media/webrtc_identity_store_backend.cc (right): > > https://codereview.chromium.org/289343005/diff/20001/ > content/browser/media/webrtc_identity_store_backend.cc#newcode498 > content/browser/media/webrtc_identity_store_backend.cc:498: > !StartsWithASCII(statement, "DELETE", false)) { > On 2014/05/23 23:47:01, Ami Fischman wrote: > >> On 2014/05/23 23:34:54, jiayl wrote: >> > On 2014/05/23 23:30:30, Ami Fischman wrote: >> > > ಠ_ಠ >> > > >> > > What about "dElEte"? >> > > (should you ToLower() the statement before text-compare?) >> > > >> > > More generally this seems quite brittle for something that's >> > supposed to be > >> > > managing security-related things. >> > > What's the cost of getting this wrong (using a corrupt/out-of-date >> > DB)? > >> > > What's the cost of being overzealous (always nuking the DB on >> > error)? > >> > >> > >> > The comparison is case insensitive. >> > > So it is! >> (I missed the "false") >> > > > DB corruption will be a catastrophic error. See >> > >> > > https://code.google.com/p/chromium/codesearch#chromium/ > src/sql/error_delegate_util.cc&rcl=1400145541&l=11 > >> > What do you mean by "out of date"? We always try to delete expired >> > identifies > >> at >> > the startup of the class. >> > > When I said "corrupt" I was referring to a DB whose contents may be >> > correct as > >> far as the DB layer is concerned but incorrect as far as this class is >> concerned. In other words, does this class expect semantics from the >> > DB not > >> expressed in the schema? Or is the DB just a list of records with no >> inter-relation, and is only used a serialization mechanism for >> > persistence? > > > The expectations of this class and not expressed in the DB schema are > 1. the (origin, identit_name) can uniquely identify a record in the DB > 2. expired records are deleted on startup > > These can only be violated if "DELETE" fails. "ADD" or "SELECT" failures > will not cause "corruption". > > https://codereview.chromium.org/289343005/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Another option in the DELETE case might be to actually encode your sensitive deletes in a testable fashion. SQLite allows for c-style comments, so could could write: DELETE /* sensitive */ ... and test for that. Clearly you want to make sure that you're testing each such sensitive delete, so that people don't break your guarantees on this front. https://codereview.chromium.org/289343005/diff/20001/content/browser/media/we... File content/browser/media/webrtc_identity_store_backend.cc (right): https://codereview.chromium.org/289343005/diff/20001/content/browser/media/we... content/browser/media/webrtc_identity_store_backend.cc:383: DVLOG(2) << "Unable to open DB."; For all of these log lines, please verify that you're not just duplicating something already covered in sql/. These are debug-only, so they won't bloat up the release binary, but it's still annoying to get 12 lines of redundant information when you trip something. Hmm, I guess more to the point, are there any circumstances where you imagine someone will enable the --verbose flags to see these AND will even have compiled things with the right flags that the code is even there AND won't be in a position to just layer in some additional spot logging or run it in a debugger? https://codereview.chromium.org/289343005/diff/20001/content/browser/media/we... content/browser/media/webrtc_identity_store_backend.cc:498: !StartsWithASCII(statement, "DELETE", false)) { On 2014/05/23 23:47:01, Ami Fischman wrote: > On 2014/05/23 23:34:54, jiayl wrote: > > On 2014/05/23 23:30:30, Ami Fischman wrote: > > > ಠ_ಠ > > > > > > What about "dElEte"? > > > (should you ToLower() the statement before text-compare?) > > > > > > More generally this seems quite brittle for something that's supposed to be > > > managing security-related things. > > > What's the cost of getting this wrong (using a corrupt/out-of-date DB)? > > > What's the cost of being overzealous (always nuking the DB on error)? > > > > > > The comparison is case insensitive. > > So it is! > (I missed the "false") If you're going to use this kind of assumption to test results, then please add some sort of DCHECK somewhere to prove that the code is adhering to your assumptions (as opposed to containing leading whitespace or a leading comment or something). Obviously putting that DCHECK here has coverage issues, I'd be comfortable with having it in sql/, but am not offering to do the gruntwork to flush out whoever is already doing that. Also, there is support code in sql/test to allow injecting some types of corruption into databases. If you're committed to handling error cases, you really should test them. There have been cases where the error-handling code was simply wrong, and we found out when thousands of people started crashing. [I haven't read your code thoroughly, perhaps there are tests of this sort out there. But obviously they can't be testing _these_ cases, because of the CHECKs.] > > DB corruption will be a catastrophic error. See > > > https://code.google.com/p/chromium/codesearch#chromium/src/sql/error_delegate... > > What do you mean by "out of date"? We always try to delete expired identifies > at > > the startup of the class. > > When I said "corrupt" I was referring to a DB whose contents may be correct as > far as the DB layer is concerned but incorrect as far as this class is > concerned. In other words, does this class expect semantics from the DB not > expressed in the schema? Or is the DB just a list of records with no > inter-relation, and is only used a serialization mechanism for persistence? I'm not sure what you're getting at with this comment, but keep in mind that: - in the fleet, corruption happens because hardware is broken. - corruption can manifest as broken schema. Long-term, you might want to tighten the screws past what the "is catastrophic" test returns. The other cases aren't necessarily going to recover themselves, they were just deemed too uncertain to act on. Personally I've never been convinced that we can have a global sense of unrecoverable, due to differences in how sub-systems are structured.
> > Long-term, you might want to tighten the screws past what the "is > catastrophic" test returns. The other cases aren't necessarily going to > recover themselves, they were just deemed too uncertain to act on. > Personally I've never been convinced that we can have a global sense of > unrecoverable, due to differences in how sub-systems are structured. > I suspect I agree. Jiayang: does it make sense to make all SQL errors catastrophic to this class, but not to the browser as a whole? In other words, make _any_ detected SQL issues wipe the DB completely and start over, but not CHECK? (IIRC this class is a performance optimization; nuking the DB will slow down webrtc connection establishment but not affect correctness or behavior other than that, right?) > > https://codereview.chromium.org/289343005/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2014/05/26 05:32:18, Ami Fischman wrote: > > > > Long-term, you might want to tighten the screws past what the "is > > catastrophic" test returns. The other cases aren't necessarily going to > > recover themselves, they were just deemed too uncertain to act on. > > Personally I've never been convinced that we can have a global sense of > > unrecoverable, due to differences in how sub-systems are structured. > > > > I suspect I agree. > Jiayang: does it make sense to make all SQL errors catastrophic to this > class, but not to the browser as a whole? > In other words, make _any_ detected SQL issues wipe the DB completely and > start over, but not CHECK? > (IIRC this class is a performance optimization; nuking the DB will slow > down webrtc connection establishment but not affect correctness or behavior > other than that, right?) > > > > > > https://codereview.chromium.org/289343005/ > > > > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org. Yeah, that makes sense. Done.
https://codereview.chromium.org/289343005/diff/20001/content/browser/media/we... File content/browser/media/webrtc_identity_store_backend.cc (right): https://codereview.chromium.org/289343005/diff/20001/content/browser/media/we... content/browser/media/webrtc_identity_store_backend.cc:383: DVLOG(2) << "Unable to open DB."; On 2014/05/24 05:06:41, shess wrote: > For all of these log lines, please verify that you're not just duplicating > something already covered in sql/. These are debug-only, so they won't bloat up > the release binary, but it's still annoying to get 12 lines of redundant > information when you trip something. > > Hmm, I guess more to the point, are there any circumstances where you imagine > someone will enable the --verbose flags to see these AND will even have compiled > things with the right flags that the code is even there AND won't be in a > position to just layer in some additional spot logging or run it in a debugger? I don't think it's redundant with the sql/ logs, which does not tell which caller is hit by the DB error and gives no hint on what feature may be broken. It's true that these logs will be not be shown/useful for most of the time, but it will still make it more convenient for the developers to nail down problems without touching the code. E.g. a webrtc/libjingle developer may expierience issues with DTLS but has no idea about this part of the code in Chrome. Maybe I should raise the level of the log, say DLOG(WARNING)? https://codereview.chromium.org/289343005/diff/20001/content/browser/media/we... content/browser/media/webrtc_identity_store_backend.cc:498: !StartsWithASCII(statement, "DELETE", false)) { On 2014/05/24 05:06:41, shess wrote: > On 2014/05/23 23:47:01, Ami Fischman wrote: > > On 2014/05/23 23:34:54, jiayl wrote: > > > On 2014/05/23 23:30:30, Ami Fischman wrote: > > > > ಠ_ಠ > > > > > > > > What about "dElEte"? > > > > (should you ToLower() the statement before text-compare?) > > > > > > > > More generally this seems quite brittle for something that's supposed to > be > > > > managing security-related things. > > > > What's the cost of getting this wrong (using a corrupt/out-of-date DB)? > > > > What's the cost of being overzealous (always nuking the DB on error)? > > > > > > > > > The comparison is case insensitive. > > > > So it is! > > (I missed the "false") > > If you're going to use this kind of assumption to test results, then please add > some sort of DCHECK somewhere to prove that the code is adhering to your > assumptions (as opposed to containing leading whitespace or a leading comment or > something). Obviously putting that DCHECK here has coverage issues, I'd be > comfortable with having it in sql/, but am not offering to do the gruntwork to > flush out whoever is already doing that. > > Also, there is support code in sql/test to allow injecting some types of > corruption into databases. If you're committed to handling error cases, you > really should test them. There have been cases where the error-handling code > was simply wrong, and we found out when thousands of people started crashing. > > [I haven't read your code thoroughly, perhaps there are tests of this sort out > there. But obviously they can't be testing _these_ cases, because of the > CHECKs.] > > > > DB corruption will be a catastrophic error. See > > > > > > https://code.google.com/p/chromium/codesearch#chromium/src/sql/error_delegate... > > > What do you mean by "out of date"? We always try to delete expired > identifies > > at > > > the startup of the class. > > > > When I said "corrupt" I was referring to a DB whose contents may be correct as > > far as the DB layer is concerned but incorrect as far as this class is > > concerned. In other words, does this class expect semantics from the DB not > > expressed in the schema? Or is the DB just a list of records with no > > inter-relation, and is only used a serialization mechanism for persistence? > > I'm not sure what you're getting at with this comment, but keep in mind that: > - in the fleet, corruption happens because hardware is broken. > - corruption can manifest as broken schema. > > Long-term, you might want to tighten the screws past what the "is catastrophic" > test returns. The other cases aren't necessarily going to recover themselves, > they were just deemed too uncertain to act on. Personally I've never been > convinced that we can have a global sense of unrecoverable, due to differences > in how sub-systems are structured. Now all DB errors will trigger RazeAndClose. A test was added.
@shess: is it guaranteed that a stmt.Run() returning false will result in an OnDatabaseError() callback? https://codereview.chromium.org/289343005/diff/40001/content/browser/media/we... File content/browser/media/webrtc_identity_store_backend.cc (right): https://codereview.chromium.org/289343005/diff/40001/content/browser/media/we... content/browser/media/webrtc_identity_store_backend.cc:573: return; In case of error here or at l.583 or 595 below, the rest of pending_operations_ will not be tried, and p_o_ will not be cleared (it may contain a mix of ptrs pointing to live & freed memory, too). Is that ok? (should the first two return's be continue's? Should p_o_.clear() be called no matter what?)
On 2014/05/27 19:25:05, Ami Fischman wrote: > @shess: is it guaranteed that a stmt.Run() returning false will result in an > OnDatabaseError() callback? It may not get such a callback in-scope for the Run(), for instance if corruption causes an invalid statement in prepare. But the intention is that it should have caused a callback at _that_ point. Unfortunately, it's hard to reach "guaranteed", because these cases almost invariably do not happen on a developer workstation. I don't see sql::Connection::set_histogram_tag() being called in here. That plus a mod in histograms.xml will enable a couple histograms, including Sqlite.Error.<Tag>, which logs the errors seen in the fleet. It probably wouldn't hurt to spin that up.
On Tue, May 27, 2014 at 1:30 PM, <shess@chromium.org> wrote: > I don't see sql::Connection::set_histogram_tag() being called in here. > That > plus a mod in histograms.xml will enable a couple histograms, including > Sqlite.Error.<Tag>, which logs the errors seen in the fleet. It probably > wouldn't hurt to spin that up. > To be clear - I don't think that call will help or hinder this CL. It's just useful to have so you can reason a little bit about what kinds of errors you can expect to see. > https://codereview.chromium.org/289343005/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
https://codereview.chromium.org/289343005/diff/40001/content/browser/media/we... File content/browser/media/webrtc_identity_store_backend.cc (right): https://codereview.chromium.org/289343005/diff/40001/content/browser/media/we... content/browser/media/webrtc_identity_store_backend.cc:573: return; On 2014/05/27 19:25:06, Ami Fischman wrote: > In case of error here or at l.583 or 595 below, the rest of pending_operations_ > will not be tried, and p_o_ will not be cleared (it may contain a mix of ptrs > pointing to live & freed memory, too). Is that ok? > (should the first two return's be continue's? Should p_o_.clear() be called no > matter what?) Fixed now. Always clear p_o_.
https://codereview.chromium.org/289343005/diff/50003/content/browser/media/we... File content/browser/media/webrtc_identity_store_backend.cc (right): https://codereview.chromium.org/289343005/diff/50003/content/browser/media/we... content/browser/media/webrtc_identity_store_backend.cc:557: // |pending_operations_| is always cleared in case of DB errors. It'll be clear, but you'll leak the PendingOperations that don't get executed b/c of the returns below. That's ok?
https://codereview.chromium.org/289343005/diff/50003/content/browser/media/we... File content/browser/media/webrtc_identity_store_backend.cc (right): https://codereview.chromium.org/289343005/diff/50003/content/browser/media/we... content/browser/media/webrtc_identity_store_backend.cc:557: // |pending_operations_| is always cleared in case of DB errors. On 2014/05/27 22:16:26, Ami Fischman wrote: > It'll be clear, but you'll leak the PendingOperations that don't get executed > b/c of the returns below. That's ok? Argh, there must be something wrong with me today. Fixed.
LGTM https://codereview.chromium.org/289343005/diff/70001/content/browser/media/we... File content/browser/media/webrtc_identity_store_backend.cc (right): https://codereview.chromium.org/289343005/diff/70001/content/browser/media/we... content/browser/media/webrtc_identity_store_backend.cc:565: PendingOperation* po = *it; FWIW could also const& it instead.
The CQ bit was checked by jiayl@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jiayl@chromium.org/289343005/90001
Message was sent while issue was closed.
Change committed as 273135 |