Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(13)

Issue 1742693003: [sqlite] Backport icuLikeCompare patch from SQLite.

Created:
4 years, 9 months ago by Scott Hess - ex-Googler
Modified:
4 years, 8 months ago
Reviewers:
jungshik at Google
CC:
chromium-reviews, chromium-apps-reviews_chromium.org, jshin+watch_chromium.org, extensions-reviews_chromium.org
Base URL:
https://chromium.googlesource.com/chromium/src.git@zzsql_patch_recover
Target Ref:
refs/pending/heads/master
Project:
chromium
Visibility:
Public.

Description

[sqlite] Backport icuLikeCompare patch from SQLite. Original Chromium CL at https://codereview.chromium.org/1643803003 "Use safe macros for UTF-8 iteration in sqlite" SQLite interpretation: https://www.sqlite.org/src/info/424b7aee3310b978 "Fix the ICU extension LIKE function so that it does not read past the end of a buffer if it it passed malformed utf-8." BUG=575205

Patch Set 1 #

Total comments: 4
Unified diffs Side-by-side diffs Delta from patch set Stats (+237 lines, -99 lines) Patch
M third_party/sqlite/amalgamation/sqlite3.c View 5 chunks +47 lines, -17 lines 0 comments Download
D third_party/sqlite/patches/0012-Use-safe-macros-for-UTF-8-iteration-in-sqlite-icu-ex.patch View 1 chunk +0 lines, -65 lines 0 comments Download
A third_party/sqlite/patches/0012-backport-Fix-buffer-overrun-in-ICU-extension-s-LIKE-.patch View 1 chunk +143 lines, -0 lines 0 comments Download
M third_party/sqlite/src/ext/icu/icu.c View 5 chunks +47 lines, -17 lines 4 comments Download

Depends on Patchset:

Dependent Patchsets:

Messages

Total messages: 7 (2 generated)
Scott Hess - ex-Googler
This backports the patch SQLite uses to fix the problem. AFAICT, it does actually fix ...
4 years, 9 months ago (2016-02-26 22:31:46 UTC) #2
Scott Hess - ex-Googler
On 2016/02/26 22:31:46, Scott Hess wrote: > This backports the patch SQLite uses to fix ...
4 years, 9 months ago (2016-03-02 18:40:14 UTC) #4
jungshik at Google
Sorry for the late reply. The upstream change has some issues with malformed UTF-8. https://codereview.chromium.org/1742693003/diff/1/third_party/sqlite/src/ext/icu/icu.c ...
4 years, 9 months ago (2016-03-11 21:37:08 UTC) #5
Scott Hess - ex-Googler
I think there are two levels, here. At one level is whether the new code ...
4 years, 9 months ago (2016-03-11 23:32:33 UTC) #6
jungshik at Google
4 years, 8 months ago (2016-04-05 20:00:52 UTC) #7
Sorry for the terribly late reply. I drafted one and then never sent it out. 

On 2016/03/11 23:32:33, Scott Hess wrote:
> I think there are two levels, here.  At one level is whether the new code
makes
> invalid assumptions based on the input that lead to buffer overruns, which I
> think does not happen.

I agree. 

 
> The second level is about whether it does the "right" thing.  In the code you
> originally wrote, the invalid runs would all be  converted U+FFFD.  Since the
> overall comparison code does not flag these values, it will simply compare
> between the strings.  As best I understand things, with the new code, invalid
> code points will be returned, and, again, the invalidness will be ignored and
> they'll be compared between the strings.


> AFAICT, in either case strings broken in the same way will end up comparing
> correctly.  In your code there are cases where strings will look equal when
they
> aren't.  The same is true of their code, where an invalidly decoded code point
> could equal a validly decoded code point.  Is that about right?

Yeah, that's about right *if* strings are sequences of *bytes* as opposed to 
*Unicode characters*.  

The upstream change treats over-long UTF-8 sequences (invalid per UTF-8 spec) as
equivalent to their canonical representation.  As you know well, a single
character can be 
represented in multiple ways *if* overlong sequences are allowed. For instance, 
'A' (0x41) can be represented in 8(4)  different ways (1-byte ASCII, 2 ~ 8 bytes
in the old UTF-8 
and 2 ~ 4 bytes in the new but still invalid UTF-8) *if* over-long sequences are
allowed. 
This makes me worried (of a possible security issue: I don't have any concrete
case in mind, though).

Powered by Google App Engine
This is Rietveld 408576698