Issue 1742693003: [sqlite] Backport icuLikeCompare patch from SQLite.

Created:
4 years, 9 months ago by Scott Hess - ex-Googler

Modified:
4 years, 8 months ago

Reviewers:
jungshik at Google

CC:
chromium-reviews, chromium-apps-reviews_chromium.org, jshin+watch_chromium.org, extensions-reviews_chromium.org

Base URL:
https://chromium.googlesource.com/chromium/src.git@zzsql_patch_recover

Target Ref:
refs/pending/heads/master

Project:
chromium

Visibility:
Public.

More Reviews

Description

[sqlite] Backport icuLikeCompare patch from SQLite. Original Chromium CL at https://codereview.chromium.org/1643803003 "Use safe macros for UTF-8 iteration in sqlite" SQLite interpretation: https://www.sqlite.org/src/info/424b7aee3310b978 "Fix the ICU extension LIKE function so that it does not read past the end of a buffer if it it passed malformed utf-8." BUG=575205

Patch Set 1 #

Total comments: 4

Created: 4 years, 9 months ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Stats (+237 lines, -99 lines)			Patch
M	third_party/sqlite/amalgamation/sqlite3.c	View	5 chunks	+47 lines, -17 lines	0 comments	Download
D	third_party/sqlite/patches/0012-Use-safe-macros-for-UTF-8-iteration-in-sqlite-icu-ex.patch	View	1 chunk	+0 lines, -65 lines	0 comments	Download
A	third_party/sqlite/patches/0012-backport-Fix-buffer-overrun-in-ICU-extension-s-LIKE-.patch	View	1 chunk	+143 lines, -0 lines	0 comments	Download
M	third_party/sqlite/src/ext/icu/icu.c	View	5 chunks	+47 lines, -17 lines	4 comments	Download

Depends on Patchset:

Issue 1742763003 Patch 1

Dependent Patchsets:

Issue 1746453002 Patch 1

Messages

Total messages: 7 (2 generated)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

Scott Hess - ex-Googler

This backports the patch SQLite uses to fix the problem. AFAICT, it does actually fix ...

4 years, 9 months ago (2016-02-26 22:31:46 UTC) #2

Scott Hess - ex-Googler

On 2016/02/26 22:31:46, Scott Hess wrote: > This backports the patch SQLite uses to fix ...

4 years, 9 months ago (2016-03-02 18:40:14 UTC) #4

jungshik at Google

Sorry for the late reply. The upstream change has some issues with malformed UTF-8. https://codereview.chromium.org/1742693003/diff/1/third_party/sqlite/src/ext/icu/icu.c ...

4 years, 9 months ago (2016-03-11 21:37:08 UTC) #5

Scott Hess - ex-Googler

I think there are two levels, here. At one level is whether the new code ...

4 years, 9 months ago (2016-03-11 23:32:33 UTC) #6

jungshik at Google

4 years, 8 months ago (2016-04-05 20:00:52 UTC) #7

Sorry for the terribly late reply. I drafted one and then never sent it out. 

On 2016/03/11 23:32:33, Scott Hess wrote:
> I think there are two levels, here.  At one level is whether the new code
makes
> invalid assumptions based on the input that lead to buffer overruns, which I
> think does not happen.

I agree. 

> The second level is about whether it does the "right" thing.  In the code you
> originally wrote, the invalid runs would all be  converted U+FFFD.  Since the
> overall comparison code does not flag these values, it will simply compare
> between the strings.  As best I understand things, with the new code, invalid
> code points will be returned, and, again, the invalidness will be ignored and
> they'll be compared between the strings.

> AFAICT, in either case strings broken in the same way will end up comparing
> correctly.  In your code there are cases where strings will look equal when
they
> aren't.  The same is true of their code, where an invalidly decoded code point
> could equal a validly decoded code point.  Is that about right?

Yeah, that's about right *if* strings are sequences of *bytes* as opposed to 
*Unicode characters*.  

The upstream change treats over-long UTF-8 sequences (invalid per UTF-8 spec) as
equivalent to their canonical representation.  As you know well, a single
character can be 
represented in multiple ways *if* overlong sequences are allowed. For instance, 
'A' (0x41) can be represented in 8(4)  different ways (1-byte ASCII, 2 ~ 8 bytes
in the old UTF-8 
and 2 ~ 4 bytes in the new but still invalid UTF-8) *if* over-long sequences are
allowed. 
This makes me worried (of a possible security issue: I don't have any concrete
case in mind, though).

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages