Issue 1340423006: Improve text elision in TruncateString().

Issue 1340423006: Improve text elision in TruncateString(). (Closed)

Created:
5 years, 3 months ago by Peter Kasting

Modified:
5 years, 3 months ago

Reviewers:
msw

CC:
chromium-reviews

Base URL:
https://chromium.googlesource.com/chromium/src.git@master

Target Ref:
refs/pending/heads/master

Project:
chromium

Visibility:
Public.

More Reviews

Description

Improve text elision in TruncateString(). * When doing word-breaking in the middle of the first word in a string with leading whitespace, we would incorrectly truncate the entire string. * When doing word-breaking in whitespace after a word, we would incorrectly omit that word from the truncated string. This also adds more tests and organizes them a bit more readably. BUG=none TEST=Select up to but not including the 'f' in the string "aaaaaaaaa bbbbbbbbb ccccccccc ddddddddd eeeeeeeee f". Right-click the selection and look at the "Search Google for..." string. It should include 'e's. Committed: https://crrev.com/fc336926514596ca46b1c1e2973f48c1b6766df1 Cr-Commit-Position: refs/heads/master@{#350310}

Patch Set 1 #

Total comments: 22

Patch Set 2 : Review comments #

Total comments: 3

Created: 5 years, 3 months ago

Download [raw] [tar.bz2]

		Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+94 lines, -77 lines)			Patch
	M	ui/gfx/text_elider.cc	View	1	2 chunks	+30 lines, -30 lines	1 comment	Download
	M	ui/gfx/text_elider_unittest.cc	View	1	1 chunk	+64 lines, -47 lines	2 comments	Download

Messages

Total messages: 18 (4 generated)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

msw

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc File ui/gfx/text_elider.cc (right): https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc#newcode759 ui/gfx/text_elider.cc:759: // String fits, return it. unrelated nit: please move ...

5 years, 3 months ago (2015-09-17 18:42:45 UTC) #3

Peter Kasting

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc File ui/gfx/text_elider.cc (right): https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc#newcode773 ui/gfx/text_elider.cc:773: int32_t index = static_cast<int32_t>(length) - (word_break ? 0 : ...

5 years, 3 months ago (2015-09-17 19:39:03 UTC) #4

msw

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc File ui/gfx/text_elider.cc (right): https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc#newcode773 ui/gfx/text_elider.cc:773: int32_t index = static_cast<int32_t>(length) - (word_break ? 0 : ...

5 years, 3 months ago (2015-09-17 20:45:10 UTC) #5

Peter Kasting

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc File ui/gfx/text_elider.cc (right): https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc#newcode773 ui/gfx/text_elider.cc:773: int32_t index = static_cast<int32_t>(length) - (word_break ? 0 : ...

5 years, 3 months ago (2015-09-17 20:58:05 UTC) #6

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc
File ui/gfx/text_elider.cc (right):

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc#newco...
ui/gfx/text_elider.cc:773: int32_t index = static_cast<int32_t>(length) -
(word_break ? 0 : 1);
On 2015/09/17 20:45:10, msw wrote:
> On 2015/09/17 19:39:03, Peter Kasting wrote:
> > On 2015/09/17 18:42:44, msw wrote:
> > > I'd be a lot more comfortable with this code using iterators to set
indices
> > and
> > > avoiding +/-1 increments that might chop surrogates. It shouldn't be too
> hard
> > to
> > > rewrite the char iterator loop to start at |length| instead of |length-1|,
> but
> > > the successful word break code path might need to return early; WDYT?
> > 
> > I tried a couple times yesterday to rewrite the char iterator loop and
> couldn't
> > do it correctly.  Also, depending on whether you're getting into it from the
> > word break loop or not, you want to start at (length) or (length - 1).  Just
> > changing that to (length + 1) and (length) doesn't help avoid potential
> > surrogate breaking.
> > 
> > I might be able to do a more extensive rewrite of this function that gets
all
> > this more right.  I wouldn't mind addressing all your
> > non-surrogate-breaking-related comments, checking this in (since the
existing
> > code already suffers from these issues, so we wouldn't be making them worse,
> but
> > we would at least improve test coverage and fix some bugs), and then doing
the
> > rewrite pass as a second CL.  Does that seem OK?
> 
> That seems fine, except lines 804 and 814 seem to potentially introduce new
> opportunities to break surrogates.

804 is definitely a regression I need to fix.  I meant to comment about that
before and forgot, but basically I need to restore the use of
char_iterator.next() there instead of adding 1.

814 can break a surrogate, but it's the same break point (length - 1) we already
have a problem with a couple other places in this code.  I don't really consider
that a new bug; I think the right fix there is basically going to be the same as
all the other places, which is pretty much, at the top of the function we need
to use a character iterator to find the right character break before length, and
then use that everywhere.  So I'd rather fix that at the same time as I fix the
similar instances.

msw

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc File ui/gfx/text_elider.cc (right): https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc#newcode773 ui/gfx/text_elider.cc:773: int32_t index = static_cast<int32_t>(length) - (word_break ? 0 : ...

5 years, 3 months ago (2015-09-17 21:06:26 UTC) #7

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc
File ui/gfx/text_elider.cc (right):

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc#newco...
ui/gfx/text_elider.cc:773: int32_t index = static_cast<int32_t>(length) -
(word_break ? 0 : 1);
On 2015/09/17 20:58:05, Peter Kasting wrote:
> On 2015/09/17 20:45:10, msw wrote:
> > On 2015/09/17 19:39:03, Peter Kasting wrote:
> > > On 2015/09/17 18:42:44, msw wrote:
> > > > I'd be a lot more comfortable with this code using iterators to set
> indices
> > > and
> > > > avoiding +/-1 increments that might chop surrogates. It shouldn't be too
> > hard
> > > to
> > > > rewrite the char iterator loop to start at |length| instead of
|length-1|,
> > but
> > > > the successful word break code path might need to return early; WDYT?
> > > 
> > > I tried a couple times yesterday to rewrite the char iterator loop and
> > couldn't
> > > do it correctly.  Also, depending on whether you're getting into it from
the
> > > word break loop or not, you want to start at (length) or (length - 1). 
Just
> > > changing that to (length + 1) and (length) doesn't help avoid potential
> > > surrogate breaking.
> > > 
> > > I might be able to do a more extensive rewrite of this function that gets
> all
> > > this more right.  I wouldn't mind addressing all your
> > > non-surrogate-breaking-related comments, checking this in (since the
> existing
> > > code already suffers from these issues, so we wouldn't be making them
worse,
> > but
> > > we would at least improve test coverage and fix some bugs), and then doing
> the
> > > rewrite pass as a second CL.  Does that seem OK?
> > 
> > That seems fine, except lines 804 and 814 seem to potentially introduce new
> > opportunities to break surrogates.
> 
> 804 is definitely a regression I need to fix.  I meant to comment about that
> before and forgot, but basically I need to restore the use of
> char_iterator.next() there instead of adding 1.
> 
> 814 can break a surrogate, but it's the same break point (length - 1) we
already
> have a problem with a couple other places in this code.  I don't really
consider
> that a new bug; I think the right fix there is basically going to be the same
as
> all the other places, which is pretty much, at the top of the function we need
> to use a character iterator to find the right character break before length,
and
> then use that everywhere.  So I'd rather fix that at the same time as I fix
the
> similar instances.

OK, sgtm

Peter Kasting

PTAL https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc File ui/gfx/text_elider.cc (right): https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc#newcode759 ui/gfx/text_elider.cc:759: // String fits, return it. On 2015/09/17 18:42:44, ...

5 years, 3 months ago (2015-09-19 02:43:25 UTC) #8

PTAL

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc
File ui/gfx/text_elider.cc (right):

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc#newco...
ui/gfx/text_elider.cc:759: // String fits, return it.
On 2015/09/17 18:42:44, msw wrote:
> unrelated nit: please move the comment above the if statement or add curlies;
> ditto for the other two early returns below.

Fixed by exchanging these for some EOL comments (which I rewrote slightly).

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc#newco...
ui/gfx/text_elider.cc:773: int32_t index = static_cast<int32_t>(length) -
(word_break ? 0 : 1);
On 2015/09/17 21:06:26, msw wrote:
> On 2015/09/17 20:58:05, Peter Kasting wrote:
> > On 2015/09/17 20:45:10, msw wrote:
> > > On 2015/09/17 19:39:03, Peter Kasting wrote:
> > > > On 2015/09/17 18:42:44, msw wrote:
> > > > > I'd be a lot more comfortable with this code using iterators to set
> > indices
> > > > and
> > > > > avoiding +/-1 increments that might chop surrogates. It shouldn't be
too
> > > hard
> > > > to
> > > > > rewrite the char iterator loop to start at |length| instead of
> |length-1|,
> > > but
> > > > > the successful word break code path might need to return early; WDYT?
> > > > 
> > > > I tried a couple times yesterday to rewrite the char iterator loop and
> > > couldn't
> > > > do it correctly.  Also, depending on whether you're getting into it from
> the
> > > > word break loop or not, you want to start at (length) or (length - 1). 
> Just
> > > > changing that to (length + 1) and (length) doesn't help avoid potential
> > > > surrogate breaking.
> > > > 
> > > > I might be able to do a more extensive rewrite of this function that
gets
> > all
> > > > this more right.  I wouldn't mind addressing all your
> > > > non-surrogate-breaking-related comments, checking this in (since the
> > existing
> > > > code already suffers from these issues, so we wouldn't be making them
> worse,
> > > but
> > > > we would at least improve test coverage and fix some bugs), and then
doing
> > the
> > > > rewrite pass as a second CL.  Does that seem OK?
> > > 
> > > That seems fine, except lines 804 and 814 seem to potentially introduce
new
> > > opportunities to break surrogates.
> > 
> > 804 is definitely a regression I need to fix.  I meant to comment about that
> > before and forgot, but basically I need to restore the use of
> > char_iterator.next() there instead of adding 1.
> > 
> > 814 can break a surrogate, but it's the same break point (length - 1) we
> already
> > have a problem with a couple other places in this code.  I don't really
> consider
> > that a new bug; I think the right fix there is basically going to be the
same
> as
> > all the other places, which is pretty much, at the top of the function we
need
> > to use a character iterator to find the right character break before length,
> and
> > then use that everywhere.  So I'd rather fix that at the same time as I fix
> the
> > similar instances.
> 
> OK, sgtm

Fixed line 804 by restoring the next() call.

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc#newco...
ui/gfx/text_elider.cc:781: return string.substr(0, length - 1) + kElideString;
On 2015/09/17 18:42:44, msw wrote:
> It's unrelated (okay for follow-up), but I think this error case should still
> try to use the character iterator codepath below instead of potentially
chopping
> a surrogate pair or similar.

Yep, will address in the followup.

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc#newco...
ui/gfx/text_elider.cc:783: index = bi->preceding(index);
On 2015/09/17 18:42:44, msw wrote:
> nit: consider using |index = bi->preceding(length);| here and just
> unconditionally initializing index to |length - 1| on line 773

Done.

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider.cc#newco...
ui/gfx/text_elider.cc:803: // this character, plus the ellipsis.
On 2015/09/17 18:42:44, msw wrote:
> nit: s/plus/and append/

Done.

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider_unittest.cc
File ui/gfx/text_elider_unittest.cc (right):

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider_unittest...
ui/gfx/text_elider_unittest.cc:1073: TEST(TextEliderTest, MAYBE_TruncateString)
{
On 2015/09/17 18:42:44, msw wrote:
> Adding a test with surrogate pairs, etc. would be nice.

Since the code can get those wrong today, let me do that as part of my next
patch.  I agree we should be testing them.

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider_unittest...
ui/gfx/text_elider_unittest.cc:1081: EXPECT_EQ(L"\x2026",
UTF16ToWide(TruncateString(str, 1, WORD_BREAK)));
On 2015/09/17 18:42:44, msw wrote:
> nit: maybe EXPECT_STREQ would make UTF16ToWide unnecessary?

Nah, that just means you need to supply two pointers, and they have to be the
same type.  The issue is that on Mac there's no way to do a raw string16 pointer
literal.

https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider_unittest...
ui/gfx/text_elider_unittest.cc:1093: // Test breaking at the start of a later
word.
On 2015/09/17 18:42:44, msw wrote:
> It's probably worthwhile to test index 12 too (but maybe that's mostly covered
> by testing index 5 below).

Done.  Expanded the tests below by a case as well.

msw

lgtm with an optional nit/idea for followup test changes. https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider_unittest.cc File ui/gfx/text_elider_unittest.cc (right): https://codereview.chromium.org/1340423006/diff/1/ui/gfx/text_elider_unittest.cc#newcode1073 ui/gfx/text_elider_unittest.cc:1073: ...

5 years, 3 months ago (2015-09-21 18:28:10 UTC) #9

Peter Kasting

https://codereview.chromium.org/1340423006/diff/20001/ui/gfx/text_elider_unittest.cc File ui/gfx/text_elider_unittest.cc (right): https://codereview.chromium.org/1340423006/diff/20001/ui/gfx/text_elider_unittest.cc#newcode1101 ui/gfx/text_elider_unittest.cc:1101: UTF16ToWide(TruncateString(str, 12, CHARACTER_BREAK))); On 2015/09/21 18:28:10, msw wrote: > ...

5 years, 3 months ago (2015-09-22 23:10:55 UTC) #10

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1340423006/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1340423006/20001

5 years, 3 months ago (2015-09-22 23:12:45 UTC) #12

commit-bot: I haz the power

Try jobs failed on following builders: win8_chromium_ng on tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win8_chromium_ng/builds/44162)

5 years, 3 months ago (2015-09-22 23:39:54 UTC) #14

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1340423006/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1340423006/20001

5 years, 3 months ago (2015-09-23 00:11:03 UTC) #16

commit-bot: I haz the power

5 years, 3 months ago (2015-09-23 06:00:18 UTC) #18

Message was sent while issue was closed.

Patchset 2 (id:??) landed as
https://crrev.com/fc336926514596ca46b1c1e2973f48c1b6766df1
Cr-Commit-Position: refs/heads/master@{#350310}

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages