|
|
Chromium Code Reviews
DescriptionSplit script runs at 'COMMON' blocks except for "white space" character. RTHB merges text runs
if they're connected by 'COMMON' blocks (like symbols and numbers), which might cause the "tofu" issue.
Also Rewrite the line breaker algorithm based on words iteration approach.
However, the root reason for this issue is that partial (sub text-runs) font fallback
algorithm, which will be overhauled in crbug.com/396415.
BUG=473946, 448909
TEST=Manually check the website http://th.wikipedia.org/wiki/เยอรมันวิงส์_เที่ยวบินที่_9525 on device Pixel and Peach Pit.
Committed: https://crrev.com/149bee40c5cc0bd72cee9a5550b0d178a449e03e
Cr-Commit-Position: refs/heads/master@{#333845}
Patch Set 1 #Patch Set 2 : Make sure multiline breaker behavior is correct. #
Total comments: 6
Patch Set 3 : Address msw's comments. Also fixed TRUNCATE_LONG_WORDS behavior. #Patch Set 4 : Rebase #
Total comments: 6
Patch Set 5 : Delete the useless function. #Patch Set 6 : Address Mukai's comments. #
Total comments: 9
Patch Set 7 : Fix broken tests. Address Mukai's comments. #
Total comments: 12
Patch Set 8 : Address the comments. #
Total comments: 1
Patch Set 9 : Don't break run at white space. #Patch Set 10 : Rebase. #
Total comments: 10
Patch Set 11 : Address Mukai's comments. #
Total comments: 3
Patch Set 12 : A new implementation of HarfBuzzLineBreaker. #Patch Set 13 : small glitches #
Total comments: 45
Patch Set 14 : Address mukai@'s comments. #
Total comments: 8
Patch Set 15 : Address Mukai's comments. #
Total comments: 62
Patch Set 16 : Address msw@'s comments. #
Total comments: 10
Patch Set 17 : Address msw@'s comments. #
Total comments: 16
Patch Set 18 : Address msw@'s comments. Rebase #Patch Set 19 : Try to figure out the zero width space behavior on different OS... #Patch Set 20 : fix broken unittest? #Patch Set 21 : Fix broken unittest. #
Total comments: 11
Patch Set 22 : Address msw@'s comments. #Patch Set 23 : Address msw@'s comments. #
Total comments: 1
Patch Set 24 : address msw@'s comment. #
Messages
Total messages: 68 (17 generated)
xdai@chromium.org changed reviewers: + msw@chromium.org, mukai@chromium.org
Could you help to review the CL please, thanks!
msw@chromium.org changed reviewers: + ckocagil@chromium.org
lgtm, but please wait for Jun and hopefully Cem to review.
On 2015/04/16 22:58:02, msw wrote: > lgtm, but please wait for Jun and hopefully Cem to review. the code itself looks okay, but that's not what we've discussed... Could this cause side effects (choosing wrong glyphs) on some languages? Should we do jshin's suggestion first? (rescanning the font list for per-character if it fails).
On 2015/04/16 23:02:25, Jun Mukai wrote: > On 2015/04/16 22:58:02, msw wrote: > > lgtm, but please wait for Jun and hopefully Cem to review. > > the code itself looks okay, but that's not what we've discussed... > Could this cause side effects (choosing wrong glyphs) on some languages? > Should we do jshin's suggestion first? (rescanning the font list for > per-character if it fails). Yes, I know. I think jshin@'s suggestion is most about the improvement of the sub text-runs font fallback algorithm, which I believe is the main purpose of crbug.com/396415. Since ckocagil@ is the owner of the RTHB class, it might be better if he can do this work in crbug.com/396415.
My biggest concern with this CL is performance. Splitting script runs at COMMON characters might easily cause in 2x-4x more runs, mainly due to splitting at whitespace. We *definitely* need to show that this change doesn't cause in a noticable regression. Can you verify that with at least a very crude metric?
On 2015/04/17 00:17:02, ckocagil wrote: > My biggest concern with this CL is performance. Splitting script runs at COMMON > characters might easily cause in 2x-4x more runs, mainly due to splitting at > whitespace. We *definitely* need to show that this change doesn't cause in a > noticable regression. Can you verify that with at least a very crude metric? Can we avoid splitting at whitespace?
On 2015/04/17 01:53:54, msw wrote: > On 2015/04/17 00:17:02, ckocagil wrote: > > My biggest concern with this CL is performance. Splitting script runs at > COMMON > > characters might easily cause in 2x-4x more runs, mainly due to splitting at > > whitespace. We *definitely* need to show that this change doesn't cause in a > > noticable regression. Can you verify that with at least a very crude metric? > > Can we avoid splitting at whitespace? Yes I think we can, by special handling of whitespace. ckocagil@, what do you think if we handle whitespace separately?
On 2015/04/17 17:25:13, xdai1 wrote: > On 2015/04/17 01:53:54, msw wrote: > > On 2015/04/17 00:17:02, ckocagil wrote: > > > My biggest concern with this CL is performance. Splitting script runs at > > COMMON > > > characters might easily cause in 2x-4x more runs, mainly due to splitting at > > > whitespace. We *definitely* need to show that this change doesn't cause in a > > > noticable regression. Can you verify that with at least a very crude metric? > > > > Can we avoid splitting at whitespace? > > Yes I think we can, by special handling of whitespace. > ckocagil@, what do you think if we handle whitespace separately? That should make it better. Perhaps we should handle '.' and ',' too.
On 2015/04/17 19:05:55, ckocagil wrote:
> On 2015/04/17 17:25:13, xdai1 wrote:
> > On 2015/04/17 01:53:54, msw wrote:
> > > On 2015/04/17 00:17:02, ckocagil wrote:
> > > > My biggest concern with this CL is performance. Splitting script runs at
> > > COMMON
> > > > characters might easily cause in 2x-4x more runs, mainly due to
splitting
> at
> > > > whitespace. We *definitely* need to show that this change doesn't cause
in
> a
> > > > noticable regression. Can you verify that with at least a very crude
> metric?
> > >
> > > Can we avoid splitting at whitespace?
> >
> > Yes I think we can, by special handling of whitespace.
> > ckocagil@, what do you think if we handle whitespace separately?
>
> That should make it better. Perhaps we should handle '.' and ',' too.
I'm a bit nervous on comma or period. One of the bug reports in crbug.com/473946
says underscore ('_') is garbled when combined with Thai characters.
Whitespace would be fine and it's special because it doesn't have to 'draw'
actual pixels, and it is fine with Thai language as far as seen in
crbug.com/473946.
On 2015/04/17 19:37:10, Jun Mukai wrote:
> On 2015/04/17 19:05:55, ckocagil wrote:
> > On 2015/04/17 17:25:13, xdai1 wrote:
> > > On 2015/04/17 01:53:54, msw wrote:
> > > > On 2015/04/17 00:17:02, ckocagil wrote:
> > > > > My biggest concern with this CL is performance. Splitting script runs
at
> > > > COMMON
> > > > > characters might easily cause in 2x-4x more runs, mainly due to
> splitting
> > at
> > > > > whitespace. We *definitely* need to show that this change doesn't
cause
> in
> > a
> > > > > noticable regression. Can you verify that with at least a very crude
> > metric?
> > > >
> > > > Can we avoid splitting at whitespace?
> > >
> > > Yes I think we can, by special handling of whitespace.
> > > ckocagil@, what do you think if we handle whitespace separately?
> >
> > That should make it better. Perhaps we should handle '.' and ',' too.
>
> I'm a bit nervous on comma or period. One of the bug reports in
crbug.com/473946
> says underscore ('_') is garbled when combined with Thai characters.
> Whitespace would be fine and it's special because it doesn't have to 'draw'
> actual pixels, and it is fine with Thai language as far as seen in
> crbug.com/473946.
Hm, but on the other hand, splitting runs for period could cause line breaking
such like
this
.
that's not expected (also, similar thing could happen on "that's" or something).
xdai, please write a test case for this and modify the line breaker to prevent
this happening.
mukai@, ckocagil@, could you help to take another look at this CL? Thanks!
Is the line breaking change necessary to bundle with the run breaking change? https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf... ui/gfx/render_text_harfbuzz.cc:167: // Special handling for white space. White space should be merged into the nit: "Special handling to merge white space into the previous run." https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf... ui/gfx/render_text_harfbuzz.cc:169: if (u_isUWhiteSpace(char_iterator.get())) { Just "continue;" instead of faking out |scripts| and |scripts_size|? https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf... ui/gfx/render_text_harfbuzz.cc:324: SkScalar word_width = Is this necessary for the run breaking change? Can you split this out?
On 2015/04/23 00:00:38, msw wrote:
> Is the line breaking change necessary to bundle with the run breaking change?
>
I haven't seen the new code, but it turns out that it has to be changed.
Let's think the input is "haven't". Previously it was a single run because "'"
is a common script. But the new code it's split into three runs ("haven" + "'" +
"t" because "'" is not Latin), but we want to keep these three runs in a single
line usually.
Things will be complicated if the text is "I haven't seen", where runs are "I
haven" + "'" + "t seen" but the word breaking is "I" + "haven't" + "seen".
>
https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf...
> File ui/gfx/render_text_harfbuzz.cc (right):
>
>
https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf...
> ui/gfx/render_text_harfbuzz.cc:167: // Special handling for white space. White
> space should be merged into the
> nit: "Special handling to merge white space into the previous run."
>
>
https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf...
> ui/gfx/render_text_harfbuzz.cc:169: if (u_isUWhiteSpace(char_iterator.get()))
{
> Just "continue;" instead of faking out |scripts| and |scripts_size|?
>
>
https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf...
> ui/gfx/render_text_harfbuzz.cc:324: SkScalar word_width =
> Is this necessary for the run breaking change? Can you split this out?
On 2015/04/23 00:26:13, Jun Mukai wrote:
> On 2015/04/23 00:00:38, msw wrote:
> > Is the line breaking change necessary to bundle with the run breaking
change?
> >
>
> I haven't seen the new code, but it turns out that it has to be changed.
> Let's think the input is "haven't". Previously it was a single run because "'"
> is a common script. But the new code it's split into three runs ("haven" + "'"
+
> "t" because "'" is not Latin), but we want to keep these three runs in a
single
> line usually.
> Things will be complicated if the text is "I haven't seen", where runs are "I
> haven" + "'" + "t seen" but the word breaking is "I" + "haven't" + "seen".
>
> >
>
https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf...
> > File ui/gfx/render_text_harfbuzz.cc (right):
> >
> >
>
https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf...
> > ui/gfx/render_text_harfbuzz.cc:167: // Special handling for white space.
White
> > space should be merged into the
> > nit: "Special handling to merge white space into the previous run."
> >
> >
>
https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf...
> > ui/gfx/render_text_harfbuzz.cc:169: if
(u_isUWhiteSpace(char_iterator.get()))
> {
> > Just "continue;" instead of faking out |scripts| and |scripts_size|?
> >
> >
>
https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf...
> > ui/gfx/render_text_harfbuzz.cc:324: SkScalar word_width =
> > Is this necessary for the run breaking change? Can you split this out?
Thanks for the explanation mukai@! Yes, the line breaker change is necessary,
because we changed the run breaker behavior. Otherwise we might encounter the
unexpected line breaks as described in the CL description.
Could you help to take another look at this CL? thanks! https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf... ui/gfx/render_text_harfbuzz.cc:167: // Special handling for white space. White space should be merged into the On 2015/04/23 00:00:38, msw wrote: > nit: "Special handling to merge white space into the previous run." Done. https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf... ui/gfx/render_text_harfbuzz.cc:169: if (u_isUWhiteSpace(char_iterator.get())) { On 2015/04/23 00:00:38, msw wrote: > Just "continue;" instead of faking out |scripts| and |scripts_size|? Actually this is depending on how we want to break the text into runs. For example, suppose we have text "abc defg". If we use "continue;" here, then we'll get two runs: "abc " and "defg", but if we fake out |scripts| and |scripts_size|, then we'll get one single run: "abc defg", which is same with the behavior before this change. I think these two approaches are both fine. If we decide to use the former one (continue;), we'll need to modify some corresponding tests. What do you think? https://codereview.chromium.org/1070223004/diff/20001/ui/gfx/render_text_harf... ui/gfx/render_text_harfbuzz.cc:324: SkScalar word_width = On 2015/04/23 00:00:38, msw wrote: > Is this necessary for the run breaking change? Can you split this out? Yes, it has to be modified as described in the CL description.
kindly ping?
https://codereview.chromium.org/1070223004/diff/60001/ui/gfx/render_text_harf... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/60001/ui/gfx/render_text_harf... ui/gfx/render_text_harfbuzz.cc:168: if (u_isUWhiteSpace(char_iterator.get())) { I am not so sure about other whitespace variations supports. Did you confirm that the Thai font can draw fullwidth space (U+3000) or some other spaces? Otherwise directly comparing with ' ' would be safer. https://codereview.chromium.org/1070223004/diff/60001/ui/gfx/render_text_harf... ui/gfx/render_text_harfbuzz.cc:337: start_char == 0 ? 0 : GetGlyphWidth(word->first, start_char - 1); I think you don't have to add a method like 'GetGlyphWidth', the width in the past runs (but in a same word) are already added and computed, you can just remember them. GetGlyphWidth() name and interface sounds too generic and I'm afraid it could include tricky corner cases like mixture of RTL and LTR, in the middle of runs and/or words. Those are not our expected cases I think. https://codereview.chromium.org/1070223004/diff/60001/ui/gfx/render_text_harf... ui/gfx/render_text_harfbuzz.cc:369: (line_x_ == 0 && word_width < *width)) { I don't get these conditions. If line_x_ is 0 (i.e. this run appears at the beginning of a line), it should never wrap the word unnecessary because it causes to leave an empty line. Am I wrong?
Patchset #5 (id:80001) has been deleted
Patchset #5 (id:100001) has been deleted
Mukai, could you please take another look at the CL? Thanks! https://codereview.chromium.org/1070223004/diff/60001/ui/gfx/render_text_harf... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/60001/ui/gfx/render_text_harf... ui/gfx/render_text_harfbuzz.cc:168: if (u_isUWhiteSpace(char_iterator.get())) { On 2015/04/28 01:57:36, Jun Mukai wrote: > I am not so sure about other whitespace variations supports. Did you confirm > that the Thai font can draw fullwidth space (U+3000) or some other spaces? > Otherwise directly comparing with ' ' would be safer. I added a test case for fullwidth space in RenderTextTest.Multiline_LineBreakerBehavior. Based on the observation, it works exactly the same as the normal space. I think it should be fine? https://codereview.chromium.org/1070223004/diff/60001/ui/gfx/render_text_harf... ui/gfx/render_text_harfbuzz.cc:337: start_char == 0 ? 0 : GetGlyphWidth(word->first, start_char - 1); On 2015/04/28 01:57:36, Jun Mukai wrote: > I think you don't have to add a method like 'GetGlyphWidth', the width in the > past runs (but in a same word) are already added and computed, you can just > remember them. > > GetGlyphWidth() name and interface sounds too generic and I'm afraid it could > include tricky corner cases like mixture of RTL and LTR, in the middle of runs > and/or words. Those are not our expected cases I think. Done. https://codereview.chromium.org/1070223004/diff/60001/ui/gfx/render_text_harf... ui/gfx/render_text_harfbuzz.cc:369: (line_x_ == 0 && word_width < *width)) { On 2015/04/28 01:57:36, Jun Mukai wrote: > I don't get these conditions. If line_x_ is 0 (i.e. this run appears at the > beginning of a line), it should never wrap the word unnecessary because it > causes to leave an empty line. Am I wrong? For example, "I am a student", suppose the line width is 10 (which means it should break before "d" character). When we come to "d", line_x_ equals 0, the word_width_ equals 4, and the *width equals 11, then we know we should roll back one word and advance a new line.
Please also have a look at the failure of views_unittests in the trybots. It indicates something is being wrong around line breaking. By seeing troublesome corner cases, honestly I'm feeling that we may want to hold this CL and consider doing jshin's approach first, which is to try per-character font search only for missing glyphs. https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (left): https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:331: if (*width > available_width) { I am not sure if removing this block works correctly. Consider the input is "aaaaa-b ccc" and the given width is for 3 characters, behavior is IGNORE_LONG_WORDS. In that case: - first run "aaaaa" runs and no problem - second run "-" has no problem - third run "b ccc". It first scan "b ", it's *width is already bigger than available_width, it needs to return true here, otherwise it continues to show "ccc" in the same line. https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:165: while (char_iterator.Advance()) { Hm, why not just doing if (u_isUWhiteSpace(char_iterator.get())) continue; ? You don't have to compute ScriptSetIntersect for whitespaces, anyways it shouldn't separate the text runs. https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:333: next_word_start_pos_ = Probably word_end_pos_ would be a better name?
Patchset #7 (id:160001) has been deleted
Patchset #7 (id:180001) has been deleted
Mukai, please take another look, thanks for your review! https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (left): https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:331: if (*width > available_width) { On 2015/04/29 21:09:07, Jun Mukai wrote: > I am not sure if removing this block works correctly. Consider the input is > "aaaaa-b ccc" and the given width is for 3 characters, behavior is > IGNORE_LONG_WORDS. > > In that case: > - first run "aaaaa" runs and no problem > - second run "-" has no problem > - third run "b ccc". It first scan "b ", it's *width is already bigger than > available_width, it needs to return true here, otherwise it continues to show > "ccc" in the same line. In third run it won't come inside this inner if because it can't pass the outer if (next_word == words_->breaks().end()). I also tested it by writing a test case and it works as expected. It creates three lines (because it contains three words: [aaaaa-], [b ], [ccc]): aaaaa- b ccc Actually it just postpones returning at line 387. https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:165: while (char_iterator.Advance()) { On 2015/04/29 21:09:07, Jun Mukai wrote: > Hm, why not just doing > if (u_isUWhiteSpace(char_iterator.get())) > continue; > ? > You don't have to compute ScriptSetIntersect for whitespaces, anyways it > shouldn't separate the text runs. As I said in the previous comment to msw@, this is depending on how we want to break the text into runs. For example, suppose we have text "abc defg". If we use "continue;" here, then we'll get two runs: "abc " and "defg", but if we fake out |scripts| and |scripts_size|, then we'll get one single run: "abc defg". I think these two approaches are both fine. But since the previous one makes more sense, I'll accept your suggestion and use "continue;" here. https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:333: next_word_start_pos_ = On 2015/04/29 21:09:07, Jun Mukai wrote: > Probably word_end_pos_ would be a better name? Done.
https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (left): https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:331: if (*width > available_width) { On 2015/05/05 18:27:49, xdai1 wrote: > On 2015/04/29 21:09:07, Jun Mukai wrote: > > I am not sure if removing this block works correctly. Consider the input is > > "aaaaa-b ccc" and the given width is for 3 characters, behavior is > > IGNORE_LONG_WORDS. > > > > In that case: > > - first run "aaaaa" runs and no problem > > - second run "-" has no problem > > - third run "b ccc". It first scan "b ", it's *width is already bigger than > > available_width, it needs to return true here, otherwise it continues to show > > "ccc" in the same line. > > In third run it won't come inside this inner if because it can't pass the outer > if (next_word == words_->breaks().end()). > > I also tested it by writing a test case and it works as expected. It creates > three lines (because it contains three words: [aaaaa-], [b ], [ccc]): > aaaaa- > b > ccc > Actually it just postpones returning at line 387. For line 387, the behavior has to be TRUNCATE_LONG_WORDS. Consider the behavior is IGNORE_LONG_WORDS. In the case I commented above, - *width > available_width, but - line_x_ > 0 and word_width_ > *width - word_wrap_behavior_ is neither WRAP_LONG_WORDS nor TRUNCATE_LONG_WORDS Therefore, it doesn't match any cases in the if-clause of line 365-388. It eventually arrives here. https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:261: if (current_word_end_pos <= word_end_pos_ && first_char != '\n') { This will ignore too much if TRUNCATE_LONG_WORDS is set and the run contains multiple words? Think about the example like "averyverylongword's test". The "s " should be truncated but "test" should appear in the next line. https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:493: BreakList<size_t>::const_iterator word_for_current_run = This isn't correct if the segment contains more than two words. Consider example of "foo bar baz, quox". For the first segment, |char_range| is (0, 11) but the word_width should be computed from the last word "baz" whose range is (8, 11) https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:502: for (size_t pos = char_range.start(); pos < char_range.end(); ++pos) { TextRunHarfBuzz::CharRangeToGlyphRange() can be used. https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_uni... File ui/gfx/render_text_unittest.cc (right): https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2125: { L"abcdefg hijkl", Range(0, 8), Range(8, 13), true }, Why concatenating them? What was wrong in the test case? https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2209: {L"a \n b ", 2ul, { Range(0, 2), Range(3, 4), Range(4, 6) } }, Consider updating the data structure a bit. The struct just assumes a single run per a line, but now it's not, right? By the way how does that come? In this test case, the run splitting must not change, it's still "a ", "\n", " b ". Am I missing something?
Patchset #8 (id:220001) has been deleted
Mukai, please take another look, thanks! https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (left): https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:331: if (*width > available_width) { On 2015/05/06 17:38:52, Jun Mukai wrote: > On 2015/05/05 18:27:49, xdai1 wrote: > > On 2015/04/29 21:09:07, Jun Mukai wrote: > > > I am not sure if removing this block works correctly. Consider the input is > > > "aaaaa-b ccc" and the given width is for 3 characters, behavior is > > > IGNORE_LONG_WORDS. > > > > > > In that case: > > > - first run "aaaaa" runs and no problem > > > - second run "-" has no problem > > > - third run "b ccc". It first scan "b ", it's *width is already bigger than > > > available_width, it needs to return true here, otherwise it continues to > show > > > "ccc" in the same line. > > > > In third run it won't come inside this inner if because it can't pass the > outer > > if (next_word == words_->breaks().end()). > > > > I also tested it by writing a test case and it works as expected. It creates > > three lines (because it contains three words: [aaaaa-], [b ], [ccc]): > > aaaaa- > > b > > ccc > > Actually it just postpones returning at line 387. > > For line 387, the behavior has to be TRUNCATE_LONG_WORDS. Consider the behavior > is IGNORE_LONG_WORDS. In the case I commented above, > - *width > available_width, but > - line_x_ > 0 and word_width_ > *width > - word_wrap_behavior_ is neither WRAP_LONG_WORDS nor TRUNCATE_LONG_WORDS > > Therefore, it doesn't match any cases in the if-clause of line 365-388. It > eventually arrives here. In this case when it comes to "b", it will return in line 375. Because: 1) "b " is a single run, it doesn't combined into the previous "aaaaa-" run. 2) line_x_ != 0 && word_width_ == *width I also added this test case to RenderTextTest.Multiline_LineBreakerBehavior. https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:261: if (current_word_end_pos <= word_end_pos_ && first_char != '\n') { On 2015/05/06 17:38:52, Jun Mukai wrote: > This will ignore too much if TRUNCATE_LONG_WORDS is set and the run contains > multiple words? > Think about the example like "averyverylongword's test". The "s " should be > truncated but "test" should appear in the next line. Because of the way we handle the white space, it won't get this trouble. For this example, there are four runs here: "averyverylongword", "'", "s ", "test", and two words here: "averyverylongword's " and "test". So it only ignore "'" and "s ", but doesn't ignore the run "test". https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:493: BreakList<size_t>::const_iterator word_for_current_run = On 2015/05/06 17:38:52, Jun Mukai wrote: > This isn't correct if the segment contains more than two words. > Consider example of "foo bar baz, quox". For the first segment, |char_range| is > (0, 11) but the word_width should be computed from the last word "baz" whose > range is (8, 11) Because of the way we handle white space, there are 5 runs here: "foo ", "bar ", "baz", ", ", "quox". Since we add runs one by one, so it should be fine here? https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:502: for (size_t pos = char_range.start(); pos < char_range.end(); ++pos) { On 2015/05/06 17:38:52, Jun Mukai wrote: > TextRunHarfBuzz::CharRangeToGlyphRange() can be used. Done. https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_uni... File ui/gfx/render_text_unittest.cc (right): https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2125: { L"abcdefg hijkl", Range(0, 8), Range(8, 13), true }, On 2015/05/06 17:38:52, Jun Mukai wrote: > Why concatenating them? > What was wrong in the test case? See line 2153: ASSERT_EQ(1U, render_text.lines_[0].segments.size()); Before special handling of white space, they all belong to a single run. But now "abc defg" are split to two runs: "abc ", "defg", thus will fail the ASSERT in line 2153. https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2209: {L"a \n b ", 2ul, { Range(0, 2), Range(3, 4), Range(4, 6) } }, On 2015/05/06 17:38:52, Jun Mukai wrote: > Consider updating the data structure a bit. The struct just assumes a single run > per a line, but now it's not, right? > By the way how does that come? In this test case, the run splitting must not > change, it's still "a ", "\n", " b ". Am I missing something? No, the run split does change to: "a ", "\n", " ", "b ". That's why the second line contains two runs: " " and "b ". I changed the test to merge all the segments that belongs to one line to a single range, and then compare it with the expected line range.
I see why I was confused. Please do NOT split runs at whitespaces, as ckocagil pointed out. "this is a sentence" must be a single run. Splitting them at whitespace would indeed reduce some complexities, but if doing so, please address the performance concern which ckocagil raised. https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (left): https://codereview.chromium.org/1070223004/diff/140001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:331: if (*width > available_width) { On 2015/05/06 20:23:44, xdai1 wrote: > On 2015/05/06 17:38:52, Jun Mukai wrote: > > On 2015/05/05 18:27:49, xdai1 wrote: > > > On 2015/04/29 21:09:07, Jun Mukai wrote: > > > > I am not sure if removing this block works correctly. Consider the input > is > > > > "aaaaa-b ccc" and the given width is for 3 characters, behavior is > > > > IGNORE_LONG_WORDS. > > > > > > > > In that case: > > > > - first run "aaaaa" runs and no problem > > > > - second run "-" has no problem > > > > - third run "b ccc". It first scan "b ", it's *width is already bigger > than > > > > available_width, it needs to return true here, otherwise it continues to > > show > > > > "ccc" in the same line. > > > > > > In third run it won't come inside this inner if because it can't pass the > > outer > > > if (next_word == words_->breaks().end()). > > > > > > I also tested it by writing a test case and it works as expected. It creates > > > three lines (because it contains three words: [aaaaa-], [b ], [ccc]): > > > aaaaa- > > > b > > > ccc > > > Actually it just postpones returning at line 387. > > > > For line 387, the behavior has to be TRUNCATE_LONG_WORDS. Consider the > behavior > > is IGNORE_LONG_WORDS. In the case I commented above, > > - *width > available_width, but > > - line_x_ > 0 and word_width_ > *width > > - word_wrap_behavior_ is neither WRAP_LONG_WORDS nor TRUNCATE_LONG_WORDS > > > > Therefore, it doesn't match any cases in the if-clause of line 365-388. It > > eventually arrives here. > > In this case when it comes to "b", it will return in line 375. Because: > 1) "b " is a single run, it doesn't combined into the previous "aaaaa-" run. No, "b " and "ccc" must be in a single run, they are only separated by a whitespace. > 2) line_x_ != 0 && word_width_ == *width word_width_ shouldn't equal to *width. *width starts from 0 in each iteration of this method, but word_width_ should start from 'aaaaa-' (oh, if hyphen "-" splits into words, think that "aaaaa'b ccc"). > I also added this test case to RenderTextTest.Multiline_LineBreakerBehavior. My point is, think that a run contains multiple words (like "b ccc"), and its first word ("b ") is actually a part of a larger word including previous runs ("aaaaa'"), and it's involved by linebreaks. I don't get how your code deals with such cases. https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:261: if (current_word_end_pos <= word_end_pos_ && first_char != '\n') { On 2015/05/06 20:23:44, xdai1 wrote: > On 2015/05/06 17:38:52, Jun Mukai wrote: > > This will ignore too much if TRUNCATE_LONG_WORDS is set and the run contains > > multiple words? > > Think about the example like "averyverylongword's test". The "s " should be > > truncated but "test" should appear in the next line. > > Because of the way we handle the white space, it won't get this trouble. For > this example, there are four runs here: "averyverylongword", "'", "s ", "test", > and two words here: "averyverylongword's " and "test". So it only ignore "'" and > "s ", but doesn't ignore the run "test". Wait. Do we split runs by whitespaces? Don't we want to concatenate them due to the performance reason ckocagil pointed out? https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_uni... File ui/gfx/render_text_unittest.cc (right): https://codereview.chromium.org/1070223004/diff/200001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2125: { L"abcdefg hijkl", Range(0, 8), Range(8, 13), true }, On 2015/05/06 20:23:44, xdai1 wrote: > On 2015/05/06 17:38:52, Jun Mukai wrote: > > Why concatenating them? > > What was wrong in the test case? > > See line 2153: ASSERT_EQ(1U, render_text.lines_[0].segments.size()); > Before special handling of white space, they all belong to a single run. But now > "abc defg" are split to two runs: "abc ", "defg", thus will fail the ASSERT in > line 2153. Please do not do this. Do not update test scenario unless it is wrong. If your code needs the assumption of the test, update the test data and its structure, not the test case itself. https://codereview.chromium.org/1070223004/diff/240001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/240001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:166: ScriptSetIntersect(char_iterator.get(), scripts, &scripts_size); You should move ScriptSetIntersect below of if-clause, otherwise whitespace data affects |scripts| and |scripts_size|.
Patchset #9 (id:260001) has been deleted
Mukai, please help to take another look at the CL, thanks!
https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (left): https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:331: if (*width > available_width) { And then, you will have to keep this logic right? What will happen on the example of "aaaaa'bb ccc" I previously commented? https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:54: block_code == UBLOCK_EMOTICONS; Remove these special cases of DINGBATS and EMOTICIONS, which was added by me for a workaround of weird emoji characters. They are 'COMMON' scripts, therefore these special cases are not necessary after your patch. https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:260: if (multiline_ && word_wrap_behavior_ == TRUNCATE_LONG_WORDS ) { This logic looks fairly complicated. I don't get why. Why don't you integrate this into BreakRunAtWidth? I think your logic here is to skip runs when its width is totally hidden by available_width. You can do that with modifying BreakRunAtWidth(). This way, you'll have to have a minor fix on BreakRun() to avoid adding segments if BreakRunAtWidth()'s result is an empty segment. https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:505: void AddSegment(int run_index, Range char_range, float width) { I think all of your problem is that treating the 'added' segments are final. If you can make it rollback and moving the added segments into the next line, I believe the result code would get much simpler. https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_uni... File ui/gfx/render_text_unittest.cc (right): https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2672: ASSERT_EQ(4U, run_list->size()); After removing the spcial case of EMOTICON and DINGBAT, this isn't necessary anymore (although the runs are segmented at 0-2, 2-4, 4-5 instead of 2-3 and 3-5.
Patchset #11 (id:310001) has been deleted
Patchset #11 (id:330001) has been deleted
Mukai, thanks for the review, please take another look! https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (left): https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:331: if (*width > available_width) { On 2015/05/11 05:14:38, Jun Mukai wrote: > And then, you will have to keep this logic right? > What will happen on the example of "aaaaa'bb ccc" I previously commented? As we discussed offline, "aaaaa'bb ccc" has two words: "aaaaa'bb " and "ccc", three runs: "aaaaa", "'", and "bb ccc", the first two runs are fine, when it comes to the third run, because of "bb " still belongs to the first word, "bb " will expand to the first line (because of logic line 423 in Patch 10) but "ccc" will be put into the second line (also because of logic 423 in Patch 10). https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:54: block_code == UBLOCK_EMOTICONS; On 2015/05/11 05:14:38, Jun Mukai wrote: > Remove these special cases of DINGBATS and EMOTICIONS, which was added by me for > a workaround of weird emoji characters. They are 'COMMON' scripts, therefore > these special cases are not necessary after your patch. Done. https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:260: if (multiline_ && word_wrap_behavior_ == TRUNCATE_LONG_WORDS ) { On 2015/05/11 05:14:38, Jun Mukai wrote: > This logic looks fairly complicated. I don't get why. Why don't you integrate > this into BreakRunAtWidth? > > I think your logic here is to skip runs when its width is totally hidden by > available_width. You can do that with modifying BreakRunAtWidth(). This way, > you'll have to have a minor fix on BreakRun() to avoid adding segments if > BreakRunAtWidth()'s result is an empty segment. Done. https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_uni... File ui/gfx/render_text_unittest.cc (right): https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2672: ASSERT_EQ(4U, run_list->size()); On 2015/05/11 05:14:38, Jun Mukai wrote: > After removing the spcial case of EMOTICON and DINGBAT, this isn't necessary > anymore (although the runs are segmented at 0-2, 2-4, 4-5 instead of 2-3 and > 3-5. Actually this is still the case. '\x25B6' ("play character" ) doesn't belong to 'COMMON' scripts.
I still feel like very similar logic spreads around the code. Possibly the problem is you're trying to do something like word-level iteration with run-level iteration. Previously a word cannot span over multiple runs, therefore the linebreaker code works as: AddRun() --> break the run into words --> adds the words into line or advances line. Now the assumption has changed completely. I think that there are two approaches: a. keep the run-level iteration, but sometimes roll back happens, an already-added segment could move to the next line. b. or, word-level iterations I am commenting on the way of a., but your code is a kind of adding b. but keeping run-level iterations. That's why some confusion happens. For example, I think the "GetWordWidth" method is the concept of b. approach. So if you want to proceed your patch in that way, I suggest you to forget my comments, and to rewrite the HarfBuzzLineBreaker interface entirely. Remove "AddRun()" method. Remove BreakRun(). Simply iterate over words, then things will be much simpler in that way. https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_uni... File ui/gfx/render_text_unittest.cc (right): https://codereview.chromium.org/1070223004/diff/290001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2672: ASSERT_EQ(4U, run_list->size()); On 2015/05/11 21:55:22, xdai1 wrote: > On 2015/05/11 05:14:38, Jun Mukai wrote: > > After removing the spcial case of EMOTICON and DINGBAT, this isn't necessary > > anymore (although the runs are segmented at 0-2, 2-4, 4-5 instead of 2-3 and > > 3-5. > > Actually this is still the case. '\x25B6' ("play character" ) doesn't belong to > 'COMMON' scripts. It's common actually. See http://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt 2500..25B6 ; Common # So [183] BOX DRAWINGS LIGHT HORIZONTAL..BLACK RIGHT-POINTING TRIANGLE Ah, but it's in UBLOCK_GEOMETRIC_SHAPES which is still special-cased by IsUnusualBlockCode(). I'm not sure we still have to keep this special case though (msw@ or ckocagil@ should know). https://codereview.chromium.org/1070223004/diff/350001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/350001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:50: block_code == UBLOCK_MISCELLANEOUS_SYMBOLS; ckocagil@ or msw@ -- is this function still necessary? https://codereview.chromium.org/1070223004/diff/350001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:253: if (multiline_ && line_x_ != 0) { Why does this have to be here? This break a run and call AdvanceLine(), which sounds like a role of BreakRun(). That way, a benefit is that you don't have to implement GetWordWidth() by yourself, it's a part of BreakRunAtWidth(). https://codereview.chromium.org/1070223004/diff/350001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:532: // Calculate |word_width_|. If you modify AddRun() to invoke always BreakRun() if |multiline_| (regardless of run's width), then BreakRunAtWidth() can compute the word width. Then you don't have to compute here. Right?
Mukai, could you help to take another look at this CL please? This is the new implementation of HarfBuzzLineBreaker based on word-level iteration.
mostly nits https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:246: std::vector<internal::Line> GetLines() const { return lines_; } Do not do this. This copies all of the line structure. The point of old Finalize() is avoid copying but simply passing the line structures. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:258: void RunLineBreaking() { I don't think this method is necessary. - revert back to AdvanceLine() in the constructor - modify the invocation of HarfBuzzLineBreaker as: HarfBuzzLineBreaker line_breaker(...); if (multiline()) line_breaker.ConstructSingleLine(); else line_breaker.ConstructMultiLines(); line_breaker.FinalizeLines(&lines, &size); - then remove 'multiline_' flag from HarfBuzzLineBreaker. This way you can even remove multiline_ flag. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:270: for (auto iter = run_list_.runs().begin(); iter != run_list_.runs().end(); It's better to use 'for (size_t i = 0; i < run_list_.size(); ++i) { ... }' https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:285: iter++) { for (const auto& word : words_->breaks()) { ... } https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:293: internal::LineSegment last_segment = word_segments.back(); const internal::LineSegment& https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:294: base::char16 last_char = text_[last_segment.char_range.start()]; const base::char16 https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:305: AdvanceLine(); I think this is not necessary, and if behavior is WRAP_LONG_WORDS and input is "aa bbbbbb cc" (with max_width_ = 4 chars), your result is: aa bbbb bb cc But expected result is aa b bbbb b cc Just remove these two lines, AddWordToLine() can handle AdvanceLine(). https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:338: void AddWordToLine(std::vector<internal::LineSegment>& segments) { const std::vector<internal::LineSegment>& https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:339: if (segments.empty() || lines_.empty()) lines_ shouldn't be empty, because AdvanceLine() must be called, right? DCHECK(!lines_.empty()); and remove lines_.empty() check in if. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:343: for (auto iter = segments.begin(); iter != segments.end(); iter++) { for (const internal::LineSegment& segment : segments) { ... } https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:351: DCHECK(word_wrap_behavior_ != IGNORE_LONG_WORDS); DCHECK(word_wrap_behavior_ == TRUNCATE_LONG_WORDS || word_wrap_behavior_ == WRAP_LONG_WORDS); IGNORE_LONG_WORDS is already tested at line 348, you don't have to check again. I believe the point is, this section only supports TRUNCATE_LONG_WORDS and WRAP_LONG_WORDS. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:355: size_t cutoff_pos; size_t cutoff_pos = 0; https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:370: run.positions[glyph_range.start()].x(); Looks like it's better to add a new method to HarfBuzzTextRun for SkScalar GetWidthForGlyphRange(const Range& range); to do this. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:398: void AddLineSegment(internal::LineSegment& segment) { const internal::LineSegment& https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:403: internal::LineSegment last_segment = line->segments.back(); internal::LineSegment& last_segment = line_segments->back(); (or internal::LineSegment* last_segment = &line_segments->back(); ), then modify it directly, and return. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:447: SkScalar GetCutoffWidth(internal::LineSegment segment, const internal::LineSegment& https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:448: size_t st_pos, Remove st_pos. It always starts from segment.char_range.start() right? https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:449: size_t& en_pos) { Return value is passed by a pointer, do not use nonconst reference as a parameter. See: https://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Reference_Ar... Also I prefer naming 'end_pos' Also make this method a const method -- GetCutoffWidth(...) const { ... } https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:458: SkScalar char_width = ((glyph_range.end() >= run.glyph_count) const SkScalar https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:473: std::vector<internal::LineSegment>& segments) { std::vector<internal::LineSegment>* Also make this a const method.
Thanks for the review! please take another look at the CL. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:246: std::vector<internal::Line> GetLines() const { return lines_; } On 2015/05/15 20:16:01, Jun Mukai wrote: > Do not do this. This copies all of the line structure. The point of old > Finalize() is avoid copying but simply passing the line structures. Removed it and passed values through FinalizeLines(). https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:258: void RunLineBreaking() { On 2015/05/15 20:16:00, Jun Mukai wrote: > I don't think this method is necessary. > > - revert back to AdvanceLine() in the constructor > - modify the invocation of HarfBuzzLineBreaker as: > HarfBuzzLineBreaker line_breaker(...); > if (multiline()) > line_breaker.ConstructSingleLine(); > else > line_breaker.ConstructMultiLines(); > line_breaker.FinalizeLines(&lines, &size); > > - then remove 'multiline_' flag from HarfBuzzLineBreaker. > > This way you can even remove multiline_ flag. Done. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:270: for (auto iter = run_list_.runs().begin(); iter != run_list_.runs().end(); On 2015/05/15 20:16:00, Jun Mukai wrote: > It's better to use 'for (size_t i = 0; i < run_list_.size(); ++i) { ... }' Done. but I don't know why... https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:285: iter++) { On 2015/05/15 20:16:00, Jun Mukai wrote: > for (const auto& word : words_->breaks()) { ... } Done. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:293: internal::LineSegment last_segment = word_segments.back(); On 2015/05/15 20:15:59, Jun Mukai wrote: > const internal::LineSegment& Done. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:294: base::char16 last_char = text_[last_segment.char_range.start()]; On 2015/05/15 20:16:00, Jun Mukai wrote: > const base::char16 Done. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:305: AdvanceLine(); On 2015/05/15 20:16:00, Jun Mukai wrote: > I think this is not necessary, and if behavior is WRAP_LONG_WORDS and input is > "aa bbbbbb cc" (with max_width_ = 4 chars), your result is: > aa > bbbb > bb > cc > > But expected result is > aa b > bbbb > b cc > > > Just remove these two lines, AddWordToLine() can handle AdvanceLine(). As we discussed offline, this logic is still needed. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:338: void AddWordToLine(std::vector<internal::LineSegment>& segments) { On 2015/05/15 20:16:00, Jun Mukai wrote: > const std::vector<internal::LineSegment>& This can't be const since the element inside of the vector might be modified. See line 381-384 in this patch. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:339: if (segments.empty() || lines_.empty()) On 2015/05/15 20:16:00, Jun Mukai wrote: > lines_ shouldn't be empty, because AdvanceLine() must be called, right? > DCHECK(!lines_.empty()); and remove lines_.empty() check in if. Done. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:343: for (auto iter = segments.begin(); iter != segments.end(); iter++) { On 2015/05/15 20:16:00, Jun Mukai wrote: > for (const internal::LineSegment& segment : segments) { ... } Same as previous one, segment can't be const since it might need to be modified in line 381-384. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:351: DCHECK(word_wrap_behavior_ != IGNORE_LONG_WORDS); On 2015/05/15 20:16:01, Jun Mukai wrote: > DCHECK(word_wrap_behavior_ == TRUNCATE_LONG_WORDS || word_wrap_behavior_ == > WRAP_LONG_WORDS); > > IGNORE_LONG_WORDS is already tested at line 348, you don't have to check again. > I believe the point is, this section only supports TRUNCATE_LONG_WORDS and > WRAP_LONG_WORDS. Done. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:355: size_t cutoff_pos; On 2015/05/15 20:16:00, Jun Mukai wrote: > size_t cutoff_pos = 0; Done. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:370: run.positions[glyph_range.start()].x(); On 2015/05/15 20:16:01, Jun Mukai wrote: > Looks like it's better to add a new method to HarfBuzzTextRun for SkScalar > GetWidthForGlyphRange(const Range& range); to do this. Done. Also adjusted some functions' sequence to consist with the header file. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:398: void AddLineSegment(internal::LineSegment& segment) { On 2015/05/15 20:16:00, Jun Mukai wrote: > const internal::LineSegment& It can't be const since it might be modified in line 405-409 in this patch. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:403: internal::LineSegment last_segment = line->segments.back(); On 2015/05/15 20:16:01, Jun Mukai wrote: > internal::LineSegment& last_segment = line_segments->back(); (or > internal::LineSegment* last_segment = &line_segments->back(); ), then modify it > directly, and return. Done. And also make last_segment const. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:447: SkScalar GetCutoffWidth(internal::LineSegment segment, On 2015/05/15 20:16:00, Jun Mukai wrote: > const internal::LineSegment& Done. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:448: size_t st_pos, On 2015/05/15 20:16:00, Jun Mukai wrote: > Remove st_pos. It always starts from segment.char_range.start() right? Done. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:449: size_t& en_pos) { On 2015/05/15 20:16:00, Jun Mukai wrote: > Return value is passed by a pointer, do not use nonconst reference as a > parameter. > See: > https://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Reference_Ar... > > Also I prefer naming 'end_pos' > > Also make this method a const method -- GetCutoffWidth(...) const { ... } I see. Thanks! Done. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:458: SkScalar char_width = ((glyph_range.end() >= run.glyph_count) On 2015/05/15 20:16:00, Jun Mukai wrote: > const SkScalar Done. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:473: std::vector<internal::LineSegment>& segments) { On 2015/05/15 20:16:00, Jun Mukai wrote: > std::vector<internal::LineSegment>* > Also make this a const method. Done.
almost there! https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:338: void AddWordToLine(std::vector<internal::LineSegment>& segments) { On 2015/05/15 23:41:18, xdai1 wrote: > On 2015/05/15 20:16:00, Jun Mukai wrote: > > const std::vector<internal::LineSegment>& > > This can't be const since the element inside of the vector might be modified. > See line 381-384 in this patch. If you really need to modify this parameter, pass it as a pointer. Again, any non-const reference parameters are disallowed explicitly. No exceptions. However, I think the point of modifying segment is char_range, x_range, and width. You can declare the local variables right before while such like Range char_range = segment.char_range; Range x_range = segment.x_range; SkScalar width = segment.width; while (...) { } This way you don't have to modify segments, and word_segments can be const. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:398: void AddLineSegment(internal::LineSegment& segment) { On 2015/05/15 23:41:18, xdai1 wrote: > On 2015/05/15 20:16:00, Jun Mukai wrote: > > const internal::LineSegment& > > It can't be const since it might be modified in line 405-409 in this patch. In that case please pass as the pointer. Our coding style explicitly disallows non-const reference parameters. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:403: internal::LineSegment last_segment = line->segments.back(); On 2015/05/15 23:41:17, xdai1 wrote: > On 2015/05/15 20:16:01, Jun Mukai wrote: > > internal::LineSegment& last_segment = line_segments->back(); (or > > internal::LineSegment* last_segment = &line_segments->back(); ), then modify > it > > directly, and return. > > Done. And also make last_segment const. No. My intention was, if the last_segment and segment shares the same run, just modify the last_segment and abandon segment. computation in line 420-440 is per-run, so you don't have to do it again. What I'm thinking is (I may miss some but): if (last_segment.run == segment.run) { // Simply adding |segment| data into |last_segment|. last_segment->char_range.set_end(segment.run.char_range.end()); last_segment->width += segment.width; last_segment->x_range.set_end(last_segment->x_range.start() + SkScalarCeilToInt(last_segment->width)); line->size.set_width(line->size.width() + segment.width()); text_x_ += segment.width; available_width_ -= segment.width; return; } https://codereview.chromium.org/1070223004/diff/410001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/410001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:260: for (const auto& word : words_->breaks()) { Oops, I think previous code could be better if we want to pass the range, as I commented below. https://codereview.chromium.org/1070223004/diff/410001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:262: SkScalar word_width = GetWordWidth(&word, &word_segments); I think it's better to pass the range rather than the word. The range can be obtained through BreakList::GetRange (you need to pass the iterator rather than the Break though) https://codereview.chromium.org/1070223004/diff/410001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:464: size_t run_en_idx = run_list_.GetRunIndexAt(word_en); Let's rename run_start_index and run_end_index. Usually acronyms and shortened terms are confusing. https://codereview.chromium.org/1070223004/diff/410001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:468: const Range char_range = run.range.Intersect(Range(word_st, word_en + 1)); if (char_range.IsEmpty()) continue; I am not sure this is really necessary, but it's a good thing to make sure not-adding empty segments.
Patchset #15 (id:430001) has been deleted
Patchset #15 (id:450001) has been deleted
Patchset #15 (id:470001) has been deleted
Mukai, please take another look at the CL, thanks for the review! https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:338: void AddWordToLine(std::vector<internal::LineSegment>& segments) { On 2015/05/16 00:21:56, Jun Mukai wrote: > On 2015/05/15 23:41:18, xdai1 wrote: > > On 2015/05/15 20:16:00, Jun Mukai wrote: > > > const std::vector<internal::LineSegment>& > > > > This can't be const since the element inside of the vector might be modified. > > See line 381-384 in this patch. > > If you really need to modify this parameter, pass it as a pointer. Again, any > non-const reference parameters are disallowed explicitly. No exceptions. > > > However, I think the point of modifying segment is char_range, x_range, and > width. You can declare the local variables right before while such like > Range char_range = segment.char_range; > Range x_range = segment.x_range; > SkScalar width = segment.width; > while (...) { > } > > This way you don't have to modify segments, and word_segments can be const. Done. https://codereview.chromium.org/1070223004/diff/390001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:403: internal::LineSegment last_segment = line->segments.back(); On 2015/05/16 00:21:56, Jun Mukai wrote: > On 2015/05/15 23:41:17, xdai1 wrote: > > On 2015/05/15 20:16:01, Jun Mukai wrote: > > > internal::LineSegment& last_segment = line_segments->back(); (or > > > internal::LineSegment* last_segment = &line_segments->back(); ), then modify > > it > > > directly, and return. > > > > Done. And also make last_segment const. > > No. My intention was, if the last_segment and segment shares the same run, just > modify the last_segment and abandon segment. > > computation in line 420-440 is per-run, so you don't have to do it again. What > I'm thinking is (I may miss some but): > > if (last_segment.run == segment.run) { > // Simply adding |segment| data into |last_segment|. > last_segment->char_range.set_end(segment.run.char_range.end()); > last_segment->width += segment.width; > last_segment->x_range.set_end(last_segment->x_range.start() + > SkScalarCeilToInt(last_segment->width)); > line->size.set_width(line->size.width() + segment.width()); > text_x_ += segment.width; > available_width_ -= segment.width; > return; > } You're right. We can modify last_segment and keep segment const. Done. https://codereview.chromium.org/1070223004/diff/410001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/410001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:260: for (const auto& word : words_->breaks()) { On 2015/05/16 00:21:57, Jun Mukai wrote: > Oops, I think previous code could be better if we want to pass the range, as I > commented below. Done. https://codereview.chromium.org/1070223004/diff/410001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:262: SkScalar word_width = GetWordWidth(&word, &word_segments); On 2015/05/16 00:21:57, Jun Mukai wrote: > I think it's better to pass the range rather than the word. The range can be > obtained through BreakList::GetRange (you need to pass the iterator rather than > the Break though) Done. https://codereview.chromium.org/1070223004/diff/410001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:464: size_t run_en_idx = run_list_.GetRunIndexAt(word_en); On 2015/05/16 00:21:57, Jun Mukai wrote: > Let's rename run_start_index and run_end_index. Usually acronyms and shortened > terms are confusing. Done. https://codereview.chromium.org/1070223004/diff/410001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:468: const Range char_range = run.range.Intersect(Range(word_st, word_en + 1)); On 2015/05/16 00:21:57, Jun Mukai wrote: > if (char_range.IsEmpty()) continue; > I am not sure this is really necessary, but it's a good thing to make sure > not-adding empty segments. Done.
lgtm! But please get yet another review from msw.
Thanks Mukai! msw@, could you help to take another look at this CL? Thanks.
On 2015/05/19 16:14:08, xdai1 wrote: > Thanks Mukai! msw@, could you help to take another look at this CL? Thanks. msw@, kingly ping?
I'm really sorry for my delay here. I haven't finished looking over the main HarfBuzzLineBreaker changes, but I thought I'd send along the comments I have so far. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:93: // USCRIPT_INVALID_CODE. nit: fits on line above. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:250: segment.x_range = Range(SkScalarCeilToInt(text_x_), The segment ranges should probably be RangeF, but that's not critical here. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:387: // Add line segment to current line. Note in order to keep the visual order nit: "Add a line segment to the current line. Note that, in order..." https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:389: // a same run. nit: "the same run" https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:519: SkScalar available_width_; If |available_width_| == |max_width_ - text_x_|, why do we need both? https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:662: return ((glyph_range.end() >= glyph_count) When would the glyph range include anything beyond glyph_count? Shouldn't we DHECK_LE(glyph_range.start(), glyph_count) and DHECK_LE(glyph_range.end(), glyph_count) or DHECK(Range(0, glyph_count).contains(glyph_range))? https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:663: ? SkFloatToScalar(width) This seems wrong; if the glyph_range is (4, 7) and glyph_count is 7, I don't think this should return the full run width. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:665: positions[glyph_range.start()].x(); nit: this indentation seems wrong. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:702: size_t TextRunList::GetRunIndexAt(size_t position) const { Hmm, it'd be nice if this could share logic with GetRunContainingCaret or RangeContainsCaret, but that doesn't seem terribly straightforward. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_uni... File ui/gfx/render_text_unittest.cc (right): https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2222: // There might be multiple segments in one line. Merge all the segments Are there actually multiple segments in any of these test cases? (they only appear to contain 'a'-'f', '\n', and ' ', so I'm not sure why your change would increase the runs/segments here) https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2222: // There might be multiple segments in one line. Merge all the segments nit: "Merge all the segments ranges in the same line." https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2224: size_t segment_size = render_text.lines()[j].segments.size(); nit: const https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2229: render_text.lines()[j].segments[segment_size - 1].char_range.end()); Would we need to union all the ranges for RTL? Probably not worth fixing now. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2370: const size_t num_lines; nit: Why not just assume 3 in the code; we assume 3 char_ranges. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2417: // Merges all the segments ranges in the same line. nit: "Merge"
Some more comments, I'll wait for your response after this. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:264: SkScalar word_width = GetWordWidth(word_range, &word_segments); The old algorithm created one segment per run, or more if the run was split across multiple lines. This new algorithm creates a segment for every word! Segments aren't too expensive, but this seems wasteful and may have negative performance impacts. Would it make sense to add entire runs when the run width will fit, then resort to word breaking when the whole run doesn't fit? Hmm, I guess that would break the case where a word is split across multiple runs... And perhaps we don't actually increase the final number of segments, since AddLineSegment merges segments from the same run... Perhaps this is okay, but did you consider any alternative solutions? https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:268: // If the word is end with '\n', we should advance a new line after nit: s/is end/ends/ or consider "If the last word is '\n'" https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:269: // adding the word to current line. nit: "to the current" https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:272: if (last_char == '\n') { nit: inline if (text_[word_segments.back().char_range.start()] == '\n') you can also combine that with the if statement above. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:335: if (word_segments.empty()) nit: this probably shouldn't happen either; make this a dcheck? https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:350: internal::LineSegment remain_segment = segment; nit: remaining_segment or truncated_segment https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:357: // See RenderTextTest.Multiline_MinWidth for example. nit: for an example https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:358: cutoff_pos = remain_segment.char_range.start() + 1; I wonder if this is correct/sufficient, what if the character is a surrogate pair, a zero width character, or something else? https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:371: SkScalarCeilToInt(text_x_ + width)); Should this be remain_segment.x_range.start() + width? https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:398: last_segment.char_range.set_end(segment.char_range.end()); nit: can we dcheck that the segments actually contain adjacent indices (ie. last_segment.char_range.end() == segment.char_range.start())? https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:400: last_segment.x_range.set_end( ditto: can we dcheck that the x_ranges are adjacent? https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:437: // Finds the end position |end_pos| in |segment| that the preceding width is nit: s/that/where/ https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:451: *end_pos += 1; nit: (*end_pos)++; https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:466: for (size_t idx = run_start_index; idx <= run_end_index; idx++) { nit: why |idx| instead of |i|? https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:469: if (char_range.is_empty()) It seems like this shouldn't be possible, maybe DCHECK instead?
msw@, thanks for the review! I've addressed your comments, please take another look at the CL. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:93: // USCRIPT_INVALID_CODE. On 2015/05/27 17:38:00, msw wrote: > nit: fits on line above. Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:250: segment.x_range = Range(SkScalarCeilToInt(text_x_), On 2015/05/27 17:37:59, msw wrote: > The segment ranges should probably be RangeF, but that's not critical here. Yes, we probably should use RangeF for the X coordinates of a line segment. I can address that in another CL, if you don't mind. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:264: SkScalar word_width = GetWordWidth(word_range, &word_segments); On 2015/05/28 18:16:16, msw wrote: > The old algorithm created one segment per run, or more if the run was split > across multiple lines. This new algorithm creates a segment for every word! > Segments aren't too expensive, but this seems wasteful and may have negative > performance impacts. Would it make sense to add entire runs when the run width > will fit, then resort to word breaking when the whole run doesn't fit? > > Hmm, I guess that would break the case where a word is split across multiple > runs... And perhaps we don't actually increase the final number of segments, > since AddLineSegment merges segments from the same run... Perhaps this is okay, > but did you consider any alternative solutions? Yes, the number of the segments is the same with that in the old algorithm. Breaking text into lines based on words seems the most natural way to me, which is also used by many modern word processors, such as OpenOffice and Microsoft Word (http://en.wikipedia.org/wiki/Line_wrap_and_word_wrap#Algorithm). Actually combining these two approach together was what I did in old patches (see Patch 11): keep the run-level iteration, but resort to word-level approach when the current run doesn't fit in the current line. As you could see, the logic is much much more complicated. After discussed with Mukai for several times, we agreed that it might be better to not mix them up. The time complexity of these two approaches should be the same: in worst case we both need to process each character one by one. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:268: // If the word is end with '\n', we should advance a new line after On 2015/05/28 18:16:16, msw wrote: > nit: s/is end/ends/ or consider "If the last word is '\n'" Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:269: // adding the word to current line. On 2015/05/28 18:16:16, msw wrote: > nit: "to the current" Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:272: if (last_char == '\n') { On 2015/05/28 18:16:16, msw wrote: > nit: inline if (text_[word_segments.back().char_range.start()] == '\n') > you can also combine that with the if statement above. Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:335: if (word_segments.empty()) On 2015/05/28 18:16:15, msw wrote: > nit: this probably shouldn't happen either; make this a dcheck? This might happen, see test function: RenderTextTest.Multiline_SufficientWidth. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:350: internal::LineSegment remain_segment = segment; On 2015/05/28 18:16:16, msw wrote: > nit: remaining_segment or truncated_segment Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:357: // See RenderTextTest.Multiline_MinWidth for example. On 2015/05/28 18:16:16, msw wrote: > nit: for an example Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:358: cutoff_pos = remain_segment.char_range.start() + 1; On 2015/05/28 18:16:16, msw wrote: > I wonder if this is correct/sufficient, what if the character is a surrogate > pair, a zero width character, or something else? You're right, we should special handle these cases here. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:371: SkScalarCeilToInt(text_x_ + width)); On 2015/05/28 18:16:16, msw wrote: > Should this be remain_segment.x_range.start() + width? It's preciser to use text_x_ instead of remain_segment.x_range.start() since the x_range was floored due to type coercion of float -> size_t. text_x_ will be updated each time when calling AddLineSegment(). https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:387: // Add line segment to current line. Note in order to keep the visual order On 2015/05/27 17:37:59, msw wrote: > nit: "Add a line segment to the current line. Note that, in order..." Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:389: // a same run. On 2015/05/27 17:37:59, msw wrote: > nit: "the same run" Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:398: last_segment.char_range.set_end(segment.char_range.end()); On 2015/05/28 18:16:16, msw wrote: > nit: can we dcheck that the segments actually contain adjacent indices (ie. > last_segment.char_range.end() == segment.char_range.start())? Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:400: last_segment.x_range.set_end( On 2015/05/28 18:16:16, msw wrote: > ditto: can we dcheck that the x_ranges are adjacent? Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:437: // Finds the end position |end_pos| in |segment| that the preceding width is On 2015/05/28 18:16:16, msw wrote: > nit: s/that/where/ Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:451: *end_pos += 1; On 2015/05/28 18:16:15, msw wrote: > nit: (*end_pos)++; Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:466: for (size_t idx = run_start_index; idx <= run_end_index; idx++) { On 2015/05/28 18:16:16, msw wrote: > nit: why |idx| instead of |i|? Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:469: if (char_range.is_empty()) On 2015/05/28 18:16:15, msw wrote: > It seems like this shouldn't be possible, maybe DCHECK instead? Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:519: SkScalar available_width_; On 2015/05/27 17:37:59, msw wrote: > If |available_width_| == |max_width_ - text_x_|, why do we need both? |available_width_| will be reset when advancing a new line but |text_x_| won't. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:662: return ((glyph_range.end() >= glyph_count) On 2015/05/27 17:37:59, msw wrote: > When would the glyph range include anything beyond glyph_count? Shouldn't we > DHECK_LE(glyph_range.start(), glyph_count) and DHECK_LE(glyph_range.end(), > glyph_count) or DHECK(Range(0, glyph_count).contains(glyph_range))? Yes, it is not possible that glyph_range.end() include anything beyond glyph_count. It should be better using "==" instead of ">=" in the conditional ternary operator. Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:663: ? SkFloatToScalar(width) On 2015/05/27 17:37:59, msw wrote: > This seems wrong; if the glyph_range is (4, 7) and glyph_count is 7, I don't > think this should return the full run width. It will not return the full run width, it will return width - positions[glyph_range.start()].x(). https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:665: positions[glyph_range.start()].x(); On 2015/05/27 17:37:59, msw wrote: > nit: this indentation seems wrong. It seems right. It's not inside of the conditional ternary operator. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:702: size_t TextRunList::GetRunIndexAt(size_t position) const { On 2015/05/27 17:37:59, msw wrote: > Hmm, it'd be nice if this could share logic with GetRunContainingCaret or > RangeContainsCaret, but that doesn't seem terribly straightforward. Sorry, I didn't see how can we do that... https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_uni... File ui/gfx/render_text_unittest.cc (right): https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2222: // There might be multiple segments in one line. Merge all the segments On 2015/05/27 17:38:00, msw wrote: > nit: "Merge all the segments ranges in the same line." Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2222: // There might be multiple segments in one line. Merge all the segments On 2015/05/27 17:38:00, msw wrote: > Are there actually multiple segments in any of these test cases? (they only > appear to contain 'a'-'f', '\n', and ' ', so I'm not sure why your change would > increase the runs/segments here) Yes, for test case "a \n b", there are four runs here: "a ", "\n", " " and "b". The reason is we always merge white space into the previous run and then we special handle "\n" to form a single run. And that's why we have two segments in the second line: " " and "b". If there is no special character, for example, "a bc" is a single run. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2224: size_t segment_size = render_text.lines()[j].segments.size(); On 2015/05/27 17:38:00, msw wrote: > nit: const Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2229: render_text.lines()[j].segments[segment_size - 1].char_range.end()); On 2015/05/27 17:38:00, msw wrote: > Would we need to union all the ranges for RTL? Probably not worth fixing now. I don't think so. This test function doesn't test RTL related features. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2370: const size_t num_lines; On 2015/05/27 17:38:00, msw wrote: > nit: Why not just assume 3 in the code; we assume 3 char_ranges. Done. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2417: // Merges all the segments ranges in the same line. On 2015/05/27 17:38:00, msw wrote: > nit: "Merge" Done.
Thanks for the clarifications and fixes. https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:335: if (word_segments.empty()) On 2015/06/01 16:51:14, xdai1 wrote: > On 2015/05/28 18:16:15, msw wrote: > > nit: this probably shouldn't happen either; make this a dcheck? > > This might happen, see test function: RenderTextTest.Multiline_SufficientWidth. When the overall text is empty? Should the code bail earlier then? https://codereview.chromium.org/1070223004/diff/510001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/510001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:370: FindValidBoundaryBefore(remaining_segment, &cutoff_pos); Can this be part of GetCutoffPos? https://codereview.chromium.org/1070223004/diff/510001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:372: remaining_segment.char_range.start() == cutoff_pos nit: put this ==/0 logic in GetGlyphWidthForCharRange https://codereview.chromium.org/1070223004/diff/510001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:377: // |max_width_| might be smaller than a single character. In this case Ditto, can this be part of GetCutoffPos? https://codereview.chromium.org/1070223004/diff/510001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:431: DCHECK(last_segment.char_range.end() == segment.char_range.start()); nit: DCHECK_EQ here and below. https://codereview.chromium.org/1070223004/diff/510001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:493: size_t FindValidBoundaryAfter(const internal::LineSegment& segment, Can we avoid copying modified versions of text_elider.cc code (CharIsMark, GetCodePointAt, FindValidBoundary[Before|After]) here? Maybe expose the needed helpers on StringSlicer or as text_elider/text_util functions?
msw@, please help to take another look at the CL, thanks for the review! https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/490001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:335: if (word_segments.empty()) On 2015/06/01 23:27:18, msw wrote: > On 2015/06/01 16:51:14, xdai1 wrote: > > On 2015/05/28 18:16:15, msw wrote: > > > nit: this probably shouldn't happen either; make this a dcheck? > > > > This might happen, see test function: > RenderTextTest.Multiline_SufficientWidth. > > When the overall text is empty? Should the code bail earlier then? For example, if the entire text is "\n" or just "", then we might encounter this situation (word_segments is empty). But yes, we could make it an early return. https://codereview.chromium.org/1070223004/diff/510001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/510001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:370: FindValidBoundaryBefore(remaining_segment, &cutoff_pos); On 2015/06/01 23:27:18, msw wrote: > Can this be part of GetCutoffPos? Done. https://codereview.chromium.org/1070223004/diff/510001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:372: remaining_segment.char_range.start() == cutoff_pos On 2015/06/01 23:27:18, msw wrote: > nit: put this ==/0 logic in GetGlyphWidthForCharRange Done. https://codereview.chromium.org/1070223004/diff/510001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:377: // |max_width_| might be smaller than a single character. In this case On 2015/06/01 23:27:18, msw wrote: > Ditto, can this be part of GetCutoffPos? Done. https://codereview.chromium.org/1070223004/diff/510001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:431: DCHECK(last_segment.char_range.end() == segment.char_range.start()); On 2015/06/01 23:27:18, msw wrote: > nit: DCHECK_EQ here and below. Done. https://codereview.chromium.org/1070223004/diff/510001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:493: size_t FindValidBoundaryAfter(const internal::LineSegment& segment, On 2015/06/01 23:27:18, msw wrote: > Can we avoid copying modified versions of text_elider.cc code (CharIsMark, > GetCodePointAt, FindValidBoundary[Before|After]) here? Maybe expose the needed > helpers on StringSlicer or as text_elider/text_util functions? Put these functions in text_util.h and text_util.cc files.
https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:445: // Don't separate surrogate pair or combined characters. Can we test for this behavior? Please ensure that it actually fails without this line or without the FindValidBoundaryAfter for the zero-width case below. https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:452: if (width == 0 && available_width_ == max_width_) { Does this need to loop while the width is zero? Isn't that what the last patch set did? Can we add a test for this behavior (multiple zero-width chars put on the same line until the first non-zero-width char)? https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:453: end_pos++; nit: |inline end_pos + 1| below instead, drop curly braces. https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_elider.cc File ui/gfx/text_elider.cc (right): https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_elider.cc#... ui/gfx/text_elider.cc:28: #include "third_party/icu/source/common/unicode/uchar.h" nit: can we remove some includes, like this one, now? https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_elider.cc#... ui/gfx/text_elider.cc:31: #include "third_party/icu/source/common/unicode/utf16.h" ditto https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_utils.cc File ui/gfx/text_utils.cc (right): https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_utils.cc#n... ui/gfx/text_utils.cc:88: size_t FindValidBoundaryAfter(const base::string16& text, size_t index) { Why did you drop the checked_casts here? Is https://codereview.chromium.org/681123006 somehow no longer applicable to this code? Be careful about changing code when you move it, and be sure to call it out in a CL description when you do so. I wonder why FindValidBoundaryBefore doesn't need a similar cast. https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_utils.cc#n... ui/gfx/text_utils.cc:98: } nit: drop curly braces https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_utils.h File ui/gfx/text_utils.h (right): https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_utils.h#ne... ui/gfx/text_utils.h:37: FindValidBoundaryBefore(const base::string16& text, size_t index); nit: Is this the correct line breaking according to the style guide? I thought we only put the export and return type on a separate line if they didn't fit on the same line as the function name without args. I suspect this should be something like this (and ditto below): GFX_EXPORT size_t FindValidBoundaryBefore(const base::string16& text, size_t index);
Patchset #18 (id:550001) has been deleted
msw@, I've addressed your comments. Please take another look, thanks for your review. https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:445: // Don't separate surrogate pair or combined characters. On 2015/06/03 00:32:03, msw wrote: > Can we test for this behavior? Please ensure that it actually fails without this > line or without the FindValidBoundaryAfter for the zero-width case below. Added two unit tests: RenderTextTest.Multiline_SurrogatePairsOrCombiningChars and RenderTextTest.Multiline_ZeroWidthChars. https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:452: if (width == 0 && available_width_ == max_width_) { On 2015/06/03 00:32:03, msw wrote: > Does this need to loop while the width is zero? Isn't that what the last patch > set did? Can we add a test for this > behavior (multiple zero-width chars put on the same line until the first > non-zero-width char)? No need, this case (zero-width chars) has been addressed in line 436-443 (in this patch). |end_pos| will loop until encounter a non-zero width character. Actually after the second thought, we could also remove FindValidBoundaryBefore() call here because it's been taken care in line 438 GetGlyphWidthForCharRange() function. GetGlyphWidthForCharRange() receives a char_range parameter, and maps the char_range to a valid glyph range and calculates the corresponding glyph width. So we won't come to a situation that separates a surrogate pair or combining characters. But FindValidBoundaryAfter() call is still necessary and RenderTextTest.Multiline_SurrogatePairsOrCombiningChars will fail without this call. https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:453: end_pos++; On 2015/06/03 00:32:03, msw wrote: > nit: |inline end_pos + 1| below instead, drop curly braces. Done. https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_elider.cc File ui/gfx/text_elider.cc (right): https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_elider.cc#... ui/gfx/text_elider.cc:28: #include "third_party/icu/source/common/unicode/uchar.h" On 2015/06/03 00:32:03, msw wrote: > nit: can we remove some includes, like this one, now? Done. https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_elider.cc#... ui/gfx/text_elider.cc:31: #include "third_party/icu/source/common/unicode/utf16.h" On 2015/06/03 00:32:03, msw wrote: > ditto Done. https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_utils.cc File ui/gfx/text_utils.cc (right): https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_utils.cc#n... ui/gfx/text_utils.cc:88: size_t FindValidBoundaryAfter(const base::string16& text, size_t index) { On 2015/06/03 00:32:04, msw wrote: > Why did you drop the checked_casts here? Is > https://codereview.chromium.org/681123006 somehow no longer applicable to this > code? Be careful about changing code when you move it, and be sure to call it > out in a CL description when you do so. I wonder why FindValidBoundaryBefore > doesn't need a similar cast. Sorry my bad, I should not have done that. I saw FindValidBoundaryBefore() doesn't have this checked_casts, and thought it should be safe to remove it here as well... The reason that FindValidBoundaryBefore() doesn't need this cast is that U16_SET_CP_START() doesn't take text.length() as an argument, but U16_SET_CP_LIMIT() does. As described in https://codereview.chromium.org/681123006, U16_SET_CP_LIMIT() expect the argument to be an int32, so if we pass a size_t value to it, when compiling with --Werror=type-limits, it will result in compilation error. https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_utils.cc#n... ui/gfx/text_utils.cc:98: } On 2015/06/03 00:32:04, msw wrote: > nit: drop curly braces After bring back the checked_casts, it still requires curly braces here. https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_utils.h File ui/gfx/text_utils.h (right): https://codereview.chromium.org/1070223004/diff/530001/ui/gfx/text_utils.h#ne... ui/gfx/text_utils.h:37: FindValidBoundaryBefore(const base::string16& text, size_t index); On 2015/06/03 00:32:04, msw wrote: > nit: Is this the correct line breaking according to the style guide? I thought > we only put the export and return type on a separate line if they didn't fit on > the same line as the function name without args. I suspect this should be > something like this (and ditto below): > GFX_EXPORT size_t FindValidBoundaryBefore(const base::string16& text, > size_t index); Yes, this is the correct line breaking. It's generated by running "git cl format". I agree it's weird...
Patchset #19 (id:590001) has been deleted
https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:447: // not separate surrogate pair or combining characters. You removed the preceding call to FindValidBoundaryBefore above. Won't this potentially split pairs if the conditional below doesn't fire? https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_uni... File ui/gfx/render_text_unittest.cc (right): https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2474: render_text.SetFontList(FontList("Arial, 13px")); Avoid relying on a particular fonts, it might break on Win XP, etc. https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2481: const Range char_ranges[3] = {Range(0, 6), Range(6, 10), Range(10, 11)}; My concern was actually for zero width spaces at the start of a line, but I guess that's pretty obscure if it's even possible ("\n" + kZeroWidthSpace?)... https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2494: if (segment_size > 0) nit: instead ASSERT_GT(segment_size, 0), and "Range line_range(render_text..."
msw@, I've addressed your comments. Please take another look, thanks! https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:447: // not separate surrogate pair or combining characters. On 2015/06/09 01:15:50, msw wrote: > You removed the preceding call to FindValidBoundaryBefore above. > Won't this potentially split pairs if the conditional below doesn't fire? We don't need to worry about this case, because GetGlyhpWidthForCharRange() function in line 438 will take care of it. GetGlyphWidthForCharRange() receives a char_range parameter, and maps the char_range to a VALID glyph_range and calculates the corresponding glyph width. If the glyph width can't be fit into the current line, break from the loop so we won't come to a situation that separates a surrogate pair or combining characters. We also don't need to worry about zero-width characters because this case will be addressed in line 436-443. |end_pos| will loop until encounter a non-zero width character. https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_uni... File ui/gfx/render_text_unittest.cc (right): https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2474: render_text.SetFontList(FontList("Arial, 13px")); On 2015/06/09 01:15:50, msw wrote: > Avoid relying on a particular fonts, it might break on Win XP, etc. Done. https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2481: const Range char_ranges[3] = {Range(0, 6), Range(6, 10), Range(10, 11)}; On 2015/06/09 01:15:50, msw wrote: > My concern was actually for zero width spaces at the start of a line, but I > guess that's pretty obscure if it's even possible ("\n" + kZeroWidthSpace?)... I'm not quite understand your concern... Why would "\n" + kZeroWidthSpace cause any problem? I modified the test RenderTextTest.Multiline_ZeroWidthChars to include the case that zero width space at the start of a line. Is that what you mean? https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2494: if (segment_size > 0) On 2015/06/09 01:15:50, msw wrote: > nit: instead ASSERT_GT(segment_size, 0), and "Range line_range(render_text..." Done.
https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:447: // not separate surrogate pair or combining characters. On 2015/06/09 17:59:21, xdai1 wrote: > On 2015/06/09 01:15:50, msw wrote: > > You removed the preceding call to FindValidBoundaryBefore above. > > Won't this potentially split pairs if the conditional below doesn't fire? > > We don't need to worry about this case, because GetGlyhpWidthForCharRange() > function in line 438 will take care of it. GetGlyphWidthForCharRange() receives > a char_range parameter, and maps the char_range to a VALID glyph_range and > calculates the corresponding glyph width. If the glyph width can't be fit into > the current line, break from the loop so we won't come to a situation that > separates a surrogate pair or combining characters. Sorry, I don't understand; this function returns |end_pos|, the value of |end_pos| isn't guaranteed to be valid, is it? Even if GetGlyphWidthForCharRange uses the nearest valid glyph range for the invalid character range passed in, I worry that returning an invalid |end_pos| from here could break the line between a surrogate pair. > We also don't need to worry about zero-width characters because this case will > be addressed in line 436-443. |end_pos| will loop until encounter a non-zero > width character. That's fair. https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_uni... File ui/gfx/render_text_unittest.cc (right): https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_uni... ui/gfx/render_text_unittest.cc:2481: const Range char_ranges[3] = {Range(0, 6), Range(6, 10), Range(10, 11)}; On 2015/06/09 17:59:21, xdai1 wrote: > On 2015/06/09 01:15:50, msw wrote: > > My concern was actually for zero width spaces at the start of a line, but I > > guess that's pretty obscure if it's even possible ("\n" + kZeroWidthSpace?)... > > I'm not quite understand your concern... Why would "\n" + kZeroWidthSpace cause > any problem? I modified the test RenderTextTest.Multiline_ZeroWidthChars to > include the case that zero width space at the start of a line. Is that what you > mean? I think my concern was that the line breaker might put a zero-width space on its own line... but I feel like this concern isn't worth investigating further at this point.
msw@, please take another look, thanks! https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/650001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:447: // not separate surrogate pair or combining characters. On 2015/06/09 22:51:53, msw wrote: > On 2015/06/09 17:59:21, xdai1 wrote: > > On 2015/06/09 01:15:50, msw wrote: > > > You removed the preceding call to FindValidBoundaryBefore above. > > > Won't this potentially split pairs if the conditional below doesn't fire? > > > > We don't need to worry about this case, because GetGlyhpWidthForCharRange() > > function in line 438 will take care of it. GetGlyphWidthForCharRange() > receives > > a char_range parameter, and maps the char_range to a VALID glyph_range and > > calculates the corresponding glyph width. If the glyph width can't be fit into > > the current line, break from the loop so we won't come to a situation that > > separates a surrogate pair or combining characters. > > Sorry, I don't understand; this function returns |end_pos|, the value of > |end_pos| isn't guaranteed to be valid, is it? Even if GetGlyphWidthForCharRange > uses the nearest valid glyph range for the invalid character range passed in, I > worry that returning an invalid |end_pos| from here could break the line > between a surrogate pair. > Yes, I think you're right! I should not remove the call to FindValidBoundaryBefore() here. I also modified the test RenderTextTest.Multiline_SurrogatePairsOrCombiningChars to include this case.
lgtm with a nit https://codereview.chromium.org/1070223004/diff/690001/ui/gfx/render_text_har... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/1070223004/diff/690001/ui/gfx/render_text_har... ui/gfx/render_text_harfbuzz.cc:446: width = run.GetGlyphWidthForCharRange( nit: If GetGlyphWidthForCharRange is expensive, perhaps it should only be called on the rare cases where |end_pos| changes in the line above, like this: const size_t valid_end_pos = FindValidBoundaryBefore(text_, end_pos); if (end_pos != valid_end_pos) { end_pos = valid_end_pos; width = run.GetGlyphWidthForCharRange( Range(segment.char_range.start(), end_pos)); }
On 2015/06/10 18:56:48, msw wrote: > lgtm with a nit > > https://codereview.chromium.org/1070223004/diff/690001/ui/gfx/render_text_har... > File ui/gfx/render_text_harfbuzz.cc (right): > > https://codereview.chromium.org/1070223004/diff/690001/ui/gfx/render_text_har... > ui/gfx/render_text_harfbuzz.cc:446: width = run.GetGlyphWidthForCharRange( > nit: If GetGlyphWidthForCharRange is expensive, perhaps it should only be called > on the rare cases where |end_pos| changes in the line above, like this: > const size_t valid_end_pos = FindValidBoundaryBefore(text_, end_pos); > if (end_pos != valid_end_pos) { > end_pos = valid_end_pos; > width = run.GetGlyphWidthForCharRange( > Range(segment.char_range.start(), end_pos)); > } Addressed, thanks for your review!
The CQ bit was checked by xdai@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from mukai@chromium.org, msw@chromium.org Link to the patchset: https://codereview.chromium.org/1070223004/#ps710001 (title: "address msw@'s comment.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1070223004/710001
Message was sent while issue was closed.
Committed patchset #24 (id:710001)
Message was sent while issue was closed.
Patchset 24 (id:??) landed as https://crrev.com/149bee40c5cc0bd72cee9a5550b0d178a449e03e Cr-Commit-Position: refs/heads/master@{#333845} |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
