Issue 1165793007: Make sure multi character codepoints are deleted correctly (Closed)

Created:
5 years, 6 months ago by Changwan Ryu

Modified:
5 years, 6 months ago

Reviewers:
bcwhite, Ted C, aelias_OOO_until_Jul13, aurimas (slooooooooow)

CC:
chromium-reviews, darin-cc_chromium.org, jam

Base URL:
https://chromium.googlesource.com/chromium/src.git@master

Target Ref:
refs/pending/heads/master

Project:
chromium

Visibility:
Public.

Description

Make sure multi character codepoints are deleted correctly

deleteSurroundingText() only deletes one character even for a multi-character codepoint.

On the blink side, we have InputMethodController::extendSelectionAndDelete() to make sure selection and deletion respect Unicode boundaries. However, AdapterInputConnection keeps track of the selection region separately, and this value is incorrectly updated.

On top of adding a new test for this case, this change extends waitAndVerifyEditorCallback to also check the outbound calls to InputMethodManager.

The above extension found that testEnterKeyEventWhileComposingText fails because there is a hidden discrepancy between the blink implementation and what we report to InputMethodManager. So I've added a TODO for that.

BUG=497091

Committed: https://crrev.com/5f6e036312bc4a978768f5b5971eee1a5ec9f272
Cr-Commit-Position: refs/heads/master@{#333704}
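For illustration only, the kind of length adjustment described above could look like the following sketch. It is not the code from this CL: the class name SurrogateAwareDelete and both method names are hypothetical, and it assumes the text and cursor are expressed in UTF-16 code units, as they are in a Java CharSequence. If the requested deletion range would start or end in the middle of a surrogate pair, the length is widened by one code unit so the whole code point is removed.

```java
/** Hypothetical helper; a sketch, not the implementation from this CL. */
final class SurrogateAwareDelete {
    private SurrogateAwareDelete() {}

    /**
     * Returns beforeLength, widened by one code unit if deleting exactly
     * beforeLength units before the cursor would leave a dangling high surrogate.
     */
    static int adjustBeforeLength(CharSequence text, int cursor, int beforeLength) {
        int start = cursor - beforeLength;
        if (beforeLength > 0 && start > 0 && start < text.length()
                && Character.isLowSurrogate(text.charAt(start))
                && Character.isHighSurrogate(text.charAt(start - 1))) {
            return beforeLength + 1;
        }
        return beforeLength;
    }

    /**
     * Returns afterLength, widened by one code unit if deleting exactly
     * afterLength units after the cursor would leave a dangling low surrogate.
     */
    static int adjustAfterLength(CharSequence text, int cursor, int afterLength) {
        int end = cursor + afterLength;
        if (afterLength > 0 && end > 0 && end < text.length()
                && Character.isHighSurrogate(text.charAt(end - 1))
                && Character.isLowSurrogate(text.charAt(end))) {
            return afterLength + 1;
        }
        return afterLength;
    }
}
```

With the text "a\uD83D\uDE00b" (an emoji between two ASCII letters) and the cursor right after the emoji, a request to delete one unit before the cursor is widened to two, so the tracked selection would need to move back by two code units as well.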

Patch Set 1 #

Patch Set 2 : #

Patch Set 3 : #

Total comments: 3

Patch Set 4 : addressed brian's comment and TODOs #

Patch Set 5 : fixed findbug error #

Patch Set 6 : do not check updateSelection for some of the tests #

Created: 5 years, 6 months ago

Stats (+221 lines, -59 lines)

Status	File	Changed in patch sets	Chunks	Lines	Comments
M	content/public/android/java/src/org/chromium/content/browser/input/AdapterInputConnection.java	1 2 3	5 chunks	+31 lines, -6 lines	0 comments
M	content/public/android/javatests/src/org/chromium/content/browser/input/AdapterInputConnectionTest.java	1 2 3	2 chunks	+18 lines, -0 lines	0 comments
M	content/public/android/javatests/src/org/chromium/content/browser/input/ImeTest.java	1 2 3 4 5	27 chunks	+112 lines, -52 lines	0 comments
M	content/public/test/android/javatests/src/org/chromium/content/browser/test/util/TestInputMethodManagerWrapper.java	1 2 3 4	3 chunks	+60 lines, -1 line	0 comments

Messages

Total messages: 28 (7 generated)

bcwhite

LGTM https://codereview.chromium.org/1165793007/diff/40001/content/public/android/java/src/org/chromium/content/browser/input/AdapterInputConnection.java File content/public/android/java/src/org/chromium/content/browser/input/AdapterInputConnection.java (right): https://codereview.chromium.org/1165793007/diff/40001/content/public/android/java/src/org/chromium/content/browser/input/AdapterInputConnection.java#newcode362 content/public/android/java/src/org/chromium/content/browser/input/AdapterInputConnection.java:362: public static int adjustLengthIfAtUtf16Boundary(CharSequence str, int index) { ...

5 years, 6 months ago (2015-06-05 18:27:32 UTC) #4

aelias_OOO_until_Jul13

Hmm, this fix doesn't seem like the right approach to me. First of all, the ...

5 years, 6 months ago (2015-06-05 19:31:12 UTC) #5

aelias_OOO_until_Jul13

"not lgtm" to clear out previous lgtm (particularly Aurimas's which was on a test-only variant ...

5 years, 6 months ago (2015-06-05 19:49:56 UTC) #6

Changwan Ryu

On 2015/06/05 19:31:12, aelias wrote: > Hmm, this fix doesn't seem like the right approach ...

5 years, 6 months ago (2015-06-05 22:19:33 UTC) #7

aelias_OOO_until_Jul13

5 years, 6 months ago (2015-06-05 22:54:14 UTC) #8

On 2015/06/05 at 22:19:33, changwan wrote:
> On 2015/06/05 19:31:12, aelias wrote:
> > Hmm, this fix doesn't seem like the right approach to me.  First of all, the
> > "adjustLengthIfAtUtf16Boundary" method feels like reinventing the wheel.
> > There has got to be existing standard-library methods to do anything like
> > this.  If there isn't any similar standard-library method, then it's
> > probably not a particularly legit thing to do with Unicode.
>
> Very similar functions are found in some libraries that we aren't importing:
>
> com.google.android.common.base#unicodePreservingIndex
> com.google.base.Strings#validSurrogatePairAt

Hmm, I'm only finding that unicodePreservingIndex in code related to the email
app, is that actually in a base library?

> > More fundamentally, the emoji is only one character so the selection size
> > should be only 1, not 2.  Some part(s) of our system got confused and
> > thought it's 2 characters instead of 1 because it happens to be made out of
> > 2 UTF-16 words.  I think the backspace/selection code should all be thinking
> > it's only acting on one character, and there should be some lower level
> > charset/encoding layer that takes care of removing the 4 bytes from the
> > underlying data structure, invisibly to the higher-level code.
>
> I might be wrong, but I think the selection size should be 2 because of the
> following observations:
>
> - All the basic implementations of CharSequence and CharSequenceIterator are
>   in UTF-16 encoding, even though it is still theoretically possible that
>   some third-party implementations may not be. So CharSequence.charAt() and
>   length() both take the emoji as two characters.
> - Some functions in android.view.inputmethod.InputConnection explicitly
>   mention 'characters', which IMO refers to CharSequence characters.
> - ExtractedText has CharSequence and offsets. I think it would be confusing
>   to IME developers if they cannot apply basic offset functions to the
>   CharSequence text.
>
> I think we just need to be more careful at deletion and selection (maybe
> setComposingRegion should be fixed as well in the future).

Hmm, you have a point since
https://docs.oracle.com/javase/7/docs/api/java/lang/CharSequence.html says each
character "represents a character in the Basic Multilingual Plane (BMP) or a
surrogate".  That seems like an awful legacy of the 90s but I guess we're going
to have to work with it.

Note that another place says "In the Java SE API documentation, Unicode code
point is used for character values in the range between U+0000 and U+10FFFF, and
Unicode code unit is used for 16-bit char values that are code units of the
UTF-16 encoding."  In future discussions, let's use these terms instead of
"character" when it's ambiguous.

What happens in a TextView when emoji are inputted?  Does it take up 2 spaces in
the selection coordinates or 1?
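To make the code unit versus code point distinction concrete, here is a small sketch using only java.lang methods (String.length, String.codePointCount, String.offsetByCodePoints). The class and method names are hypothetical and are not part of the CL; the point is only that the standard library already exposes both views of the same string.

```java
/** Hypothetical demo class; only the java.lang calls are real APIs. */
final class CodeUnitsVsCodePoints {
    private CodeUnitsVsCodePoints() {}

    /** Length in UTF-16 code units, which is what CharSequence.length() reports. */
    static int codeUnitLength(String s) {
        return s.length();
    }

    /** Length in Unicode code points; a surrogate pair counts once. */
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    /**
     * Returns the code unit index one code point before index.
     * Throws IndexOutOfBoundsException if there is no code point before index.
     */
    static int previousBoundary(String s, int index) {
        return s.offsetByCodePoints(index, -1);
    }
}
```

For a string holding 'a', one emoji outside the BMP, and 'b', codeUnitLength reports 4 while codePointLength reports 3, and previousBoundary steps over the emoji in one move (from index 3 back to index 1).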

Changwan Ryu

5 years, 6 months ago (2015-06-08 06:05:57 UTC) #9

On 2015/06/05 22:54:14, aelias wrote:
> Hmm, I'm only finding that unicodePreservingIndex in code related to the email
> app, is that actually in a base library?

Please check http://crbug.com/497091

> What happens in a TextView when emoji are inputted?  Does it take up 2 spaces
> in the selection coordinates or 1?

I'll have to either hook up Android or an emoji-enabled IME to figure this out.
I'm starting to download Android source code.

Changwan Ryu

5 years, 6 months ago (2015-06-08 07:08:49 UTC) #10

On 2015/06/08 06:05:57, Changwan Ryu wrote:
> On 2015/06/05 22:54:14, aelias wrote:
> > What happens in a TextView when emoji are inputted?  Does it take up 2
> > spaces in the selection coordinates or 1?
>
> I'll have to either hook up Android or an emoji-enabled IME to figure this out.
> I'm starting to download Android source code.

Please check the crbug. Android's EditText also takes the emoji as two
'characters'.
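A minimal model of that behavior, assuming the cursor is tracked in UTF-16 code units the way the thread describes, is sketched below. TrackedSelection is a hypothetical class built on a plain StringBuilder; it is not Android's EditText or Chromium's AdapterInputConnection. A backspace removes one whole code point, and the cursor moves back by however many code units that code point occupied.

```java
/** Hypothetical model of a cursor tracked in UTF-16 code units. */
final class TrackedSelection {
    private final StringBuilder text;
    private int cursor; // offset in UTF-16 code units

    TrackedSelection(String initial, int cursor) {
        if (cursor < 0 || cursor > initial.length()) {
            throw new IllegalArgumentException("cursor out of range: " + cursor);
        }
        this.text = new StringBuilder(initial);
        this.cursor = cursor;
    }

    /** Deletes the code point before the cursor; returns the code units removed. */
    int backspace() {
        if (cursor == 0) {
            return 0;
        }
        int codePoint = Character.codePointBefore(text, cursor);
        int units = Character.charCount(codePoint); // 2 for a surrogate pair, else 1
        text.delete(cursor - units, cursor);
        cursor -= units;
        return units;
    }

    String text() {
        return text.toString();
    }

    int cursor() {
        return cursor;
    }
}
```

Starting from 'a' followed by one emoji with the cursor at offset 3, the first backspace removes two code units and leaves the cursor at 1, which matches the "two characters" accounting reported for EditText.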

aelias_OOO_until_Jul13

OK, I suppose the approach is fine, just a minor comment below. Please also address ...

5 years, 6 months ago (2015-06-09 03:52:53 UTC) #11

Changwan Ryu

PTAL I've extended waitAndVerifyEditableCallback to also check outbound calls to InputMethodManager, and removed some of ...

5 years, 6 months ago (2015-06-09 12:34:12 UTC) #12

Changwan Ryu

Ted, could you review TestInputMethodManagerWrapper.java? Thanks.

5 years, 6 months ago (2015-06-10 00:14:55 UTC) #15

Ted C

On 2015/06/10 00:14:55, Changwan Ryu wrote: > Ted, could you review TestInputMethodManagerWrapper.java? Thanks. TestInputMethodManagerWrapper.java - ...

5 years, 6 months ago (2015-06-10 00:18:17 UTC) #16

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1165793007/80001

5 years, 6 months ago (2015-06-10 00:24:35 UTC) #19

Changwan Ryu

On 2015/06/10 00:18:17, Ted C wrote: > On 2015/06/10 00:14:55, Changwan Ryu wrote: > > ...

5 years, 6 months ago (2015-06-10 00:29:18 UTC) #20

commit-bot: I haz the power

Try jobs failed on following builders: linux_android_rel_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_android_rel_ng/builds/31938)

5 years, 6 months ago (2015-06-10 03:02:28 UTC) #22

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1165793007/100001

5 years, 6 months ago (2015-06-10 08:06:24 UTC) #25

commit-bot: I haz the power

Patchset 6 (id:??) landed as https://crrev.com/5f6e036312bc4a978768f5b5971eee1a5ec9f272 Cr-Commit-Position: refs/heads/master@{#333704}

5 years, 6 months ago (2015-06-10 09:11:15 UTC) #27

David Trainor- moved to gerrit

5 years, 6 months ago (2015-06-10 18:28:41 UTC) #28

Message was sent while issue was closed.

A revert of this CL (patchset #6 id:100001) has been created in
https://codereview.chromium.org/1173083004/ by dtrainor@chromium.org.

The reason for reverting is: Failing Android Tests (dbg) builder:
http://build.chromium.org/p/chromium.linux/builders/Android%20Tests%20%28dbg%...

The newly added test seems to be flaky.
