|
|
Created:
6 years, 7 months ago by ckocagil Modified:
6 years, 6 months ago Reviewers:
msw CC:
chromium-reviews Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionFix the Char to Glyph mapping logic in RenderTextHarfBuzz
BUG=321868
R=msw@chromium.org
NOTRY=true
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=275754
Patch Set 1 #
Total comments: 12
Patch Set 2 : comment #Patch Set 3 : comment #Patch Set 4 : rebased #Patch Set 5 : add tests, fix implementation #
Total comments: 13
Patch Set 6 : comments #Patch Set 7 : elaborate #Patch Set 8 : compile fix #Patch Set 9 : compile fix 2 #
Messages
Total messages: 31 (0 generated)
https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.cc File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.... ui/gfx/render_text_harfbuzz.cc:378: glyph_count : CharToGlyph(char_range.end()); This doesn't seem right. It seems like if the end character is the last in the run, this will include all of its glyphs (glyph_count exceeds the index value of the last glyph associated with the end character), but if it's mid-run, CharToGlyph will return "the index of the first first glyph that corresponds to the character", and thus not really include the glyphs of that character.
https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.cc File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.... ui/gfx/render_text_harfbuzz.cc:378: glyph_count : CharToGlyph(char_range.end()); |char_range.end() == range.end()| doesn't mean "last character in the run", it means "one past the last character". Remember that these are half-open intervals.
lgtm with a nit. https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.cc File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.... ui/gfx/render_text_harfbuzz.cc:371: // For RTL runs, we subtract 1 from |char_range| to get the trailing edges. nit: the meaning of "to get the trailing edges" isn't entirely clear to me, can you clarify or elaborate on this comment? https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.... ui/gfx/render_text_harfbuzz.cc:378: glyph_count : CharToGlyph(char_range.end()); On 2014/05/28 01:02:02, ckocagil wrote: > |char_range.end() == range.end()| doesn't mean "last character in the run", it > means "one past the last character". Remember that these are half-open > intervals. Ah, that makes more way more sense.
https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.cc File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.... ui/gfx/render_text_harfbuzz.cc:371: // For RTL runs, we subtract 1 from |char_range| to get the trailing edges. On 2014/05/28 01:18:30, msw wrote: > nit: the meaning of "to get the trailing edges" isn't entirely clear to me, can > you clarify or elaborate on this comment? I tried to find a better way to explain this, but couldn't. Let's say we're trying to select [1,2) in a 3-char string. LTR: chars: A[B)C glyphs: a[b)c RTL: chars: A[B)C glyphs: (c]b a oops! it should be "c(b]a" and that's what subtracting 1 does.
Having some unit testing of these functions would help a lot! https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.cc File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.... ui/gfx/render_text_harfbuzz.cc:348: if (pos < glyph_to_char[i + 1]) For logical LTR string "a" that yields two glyphs "xy", it looks like this incorrectly returns 1 for input |pos|==0 (I'd expect it to return 0 for the first glyph, 'x', not 1 for the second glyph 'y'). Am I missing something? Should this be: if (direction == UBIDI_LTR) { for (size_t i = 0; i < glyph_count; ++i) { if (pos <= glyph_to_char[i + 1]) return i; return glyph_count - 1; } https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.... ui/gfx/render_text_harfbuzz.cc:371: // For RTL runs, we subtract 1 from |char_range| to get the trailing edges. On 2014/06/02 02:57:42, ckocagil wrote: > On 2014/05/28 01:18:30, msw wrote: > > nit: the meaning of "to get the trailing edges" isn't entirely clear to me, > can > > you clarify or elaborate on this comment? > > I tried to find a better way to explain this, but couldn't. > > Let's say we're trying to select [1,2) in a 3-char string. > > LTR: > chars: A[B)C > glyphs: a[b)c > > RTL: > chars: A[B)C > glyphs: (c]b a oops! > > it should be "c(b]a" and that's what subtracting 1 does. So a char range of [1,2) would yield "[c)ba" without -1's like: first = CharToGlyph(char_range.start()) => CharToGlyph(1) => 1 last = CharToGlyph(char_range.end()) => CharToGlyph(2) => 0 return Range(std::min(first, last), std::max(first, last)); return Range(std::min(1, 0), std::max(1, 0)); return Range(0, 1) => "[c)ba" But a char range of [1,2) would yield "c[b)a" with -1's like: first = CharToGlyph(char_range.start() - 1) => CharToGlyph(0) => 2 last = CharToGlyph(char_range.end() - 1) => CharToGlyph(1) => 1 return Range(std::min(first, last), std::max(first, last)); return Range(std::min(2, 1), std::max(2, 1)); return Range(1, 2) => "c[b)a" It looks right for odd dual-glyph or surrogate pair cases too: String A[B)D where 'B' is a two-glyph char for "d[cb)a" yields: first = CharToGlyph(char_range.start() - 1) => CharToGlyph(0) => 3 last = CharToGlyph(char_range.end() - 1) => CharToGlyph(1) => 1 return Range(std::min(first, last), std::max(first, last)); return Range(std::min(3, 1), std::max(3, 1)); return Range(1, 3) => "d[cb)a" String A[BC)D where 'BC' yields just one glyph for "d[b)a" yields: first = CharToGlyph(char_range.start() - 1) => CharToGlyph(0) => 2 last = CharToGlyph(char_range.end() - 1) => CharToGlyph(1) => 1 return Range(std::min(first, last), std::max(first, last)); return Range(std::min(2, 1), std::max(2, 1)); return Range(1, 2) => "d[b)a" So, I guess that subtracting 1 makes sense, but I'm not sure that it's really "to get the trailing edges". It almost seems like you're trying to get the *leading* edges, at least in the case of "A[B)C", where the *leading* edges of chars 1 ('B') and 2 ('C') are "A|B|C", and their trailing edges are "AB|C|". Even if you apply the term to the visual string "cba", you want the leading edges of the corresponding glyphs "c|b|a", where the trailing edges would be the incorrect "|c|ba". But perhaps you mean the trailing edges of the preceding characters? Can you change it to "leading", explain why "trailing" makes more sense, or perhaps say "For RTL runs, we subtract 1 from |char_range| to get the trailing edges of the preceding characters" or similar?
https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.cc File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.... ui/gfx/render_text_harfbuzz.cc:371: // For RTL runs, we subtract 1 from |char_range| to get the trailing edges. What I mean by trailing is: imagine if CharToGlyph returned a range of glyphs. For LTR we'd use CharToGlyph(i).start(), for RTL we'd use CharToGlyph(i).end(). Okay, I changed it to "leading". lgty?
We really need unit tests for behavior like this. It'll be easier and it'll yield more more comprehensive coverage to implement some as you go for specific behavior nuances like this, rather than a big independent push before we flip HarfBuzz on-by-default, but I guess I won't prevent you from landing without a test for this... https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.cc File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.... ui/gfx/render_text_harfbuzz.cc:348: if (pos < glyph_to_char[i + 1]) On 2014/06/02 19:37:58, msw wrote: > For logical LTR string "a" that yields two glyphs "xy", it looks like this > incorrectly returns 1 for input |pos|==0 (I'd expect it to return 0 for the > first glyph, 'x', not 1 for the second glyph 'y'). Am I missing something? > Should this be: > if (direction == UBIDI_LTR) { > for (size_t i = 0; i < glyph_count; ++i) { > if (pos <= glyph_to_char[i + 1]) > return i; > return glyph_count - 1; > } What about this? LMK if I'm wrong or how we should follow-up.
(Actually you should prevent me from landing this without a proper unit test. Sorry for making you remind me, I had totally forgotten about writing a test.) So, here's the test and a better implementation. https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.cc File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.... ui/gfx/render_text_harfbuzz.cc:348: if (pos < glyph_to_char[i + 1]) On 2014/06/03 22:32:09, msw wrote: > On 2014/06/02 19:37:58, msw wrote: > > For logical LTR string "a" that yields two glyphs "xy", it looks like this > > incorrectly returns 1 for input |pos|==0 (I'd expect it to return 0 for the > > first glyph, 'x', not 1 for the second glyph 'y'). Am I missing something? > > Should this be: > > if (direction == UBIDI_LTR) { > > for (size_t i = 0; i < glyph_count; ++i) { > > if (pos <= glyph_to_char[i + 1]) > > return i; > > return glyph_count - 1; > > } > > What about this? LMK if I'm wrong or how we should follow-up. This should be fixed now.
Sorry for my confusion, thanks for addressing my earlier comments to fix that error, writing a test, and bearing with me here. https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.cc File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.... ui/gfx/render_text_harfbuzz.cc:943: run->glyph_to_char.reset(new uint32[run->glyph_count]); Would it be helpful and prudent to make a similar char_to_cluster_start array? https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... ui/gfx/render_text_harfbuzz.cc:396: size_t cluster_start = 0; Q: Should RTL find the cluster start too? https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... ui/gfx/render_text_harfbuzz.cc:398: if (pos < glyph_to_char[i]) nit: make this part of the loop's terminating condition. https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... ui/gfx/render_text_harfbuzz.cc:425: for (size_t i = char_range.start(); i > range.start(); --i) { Why do we need these loops now? Maybe add a comment? https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_unitt... File ui/gfx/render_text_unittest.cc (right): https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_unitt... ui/gfx/render_text_unittest.cc:1928: TEST_F(RenderTextTest, HarfBuzz_CharToGlyph) { Very nice test; thank you for adding this! https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_unitt... ui/gfx/render_text_unittest.cc:1936: // From string "A B C D" to glyphs "a b c d". nit: move the comments up to share their open curly lines.
https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.cc File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/1/ui/gfx/render_text_harfbuzz.... ui/gfx/render_text_harfbuzz.cc:943: run->glyph_to_char.reset(new uint32[run->glyph_count]); On 2014/06/06 02:00:42, msw wrote: > Would it be helpful and prudent to make a similar char_to_cluster_start array? It would be helpful, but this is a trade off between memory and cpu (and code complexity). I think this CL is currently fine (not too complex), as long as the tests are sufficient. https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... ui/gfx/render_text_harfbuzz.cc:396: size_t cluster_start = 0; On 2014/06/06 02:00:42, msw wrote: > Q: Should RTL find the cluster start too? It already returns the cluster start without any special handling. https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... ui/gfx/render_text_harfbuzz.cc:398: if (pos < glyph_to_char[i]) On 2014/06/06 02:00:42, msw wrote: > nit: make this part of the loop's terminating condition. Done. https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... ui/gfx/render_text_harfbuzz.cc:425: for (size_t i = char_range.start(); i > range.start(); --i) { On 2014/06/06 02:00:42, msw wrote: > Why do we need these loops now? Maybe add a comment? Every cluster has a non-empty set of glyphs. CharToGlyph gives us the cluster begin. To figure out the cluster end, we loop until we find a non-empty glyph range. Added a comment. https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_unitt... File ui/gfx/render_text_unittest.cc (right): https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_unitt... ui/gfx/render_text_unittest.cc:1936: // From string "A B C D" to glyphs "a b c d". On 2014/06/06 02:00:42, msw wrote: > nit: move the comments up to share their open curly lines. Done.
lgtm with a question for my own clarification. https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... ui/gfx/render_text_harfbuzz.cc:425: for (size_t i = char_range.start(); i > range.start(); --i) { On 2014/06/07 05:40:33, ckocagil wrote: > On 2014/06/06 02:00:42, msw wrote: > > Why do we need these loops now? Maybe add a comment? > > Every cluster has a non-empty set of glyphs. CharToGlyph gives us the cluster > begin. To figure out the cluster end, we loop until we find a non-empty glyph > range. > > Added a comment. Hmm, okay I suppose. Can you give me an example where |first = CharToGlyph(char_range.start() -1)| wouldn't yield something distinct from |last = CharToGlyph(char_range.end() -1)| and the loop would be needed? Also, is such a case tested in your new unit test?
https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... ui/gfx/render_text_harfbuzz.cc:425: for (size_t i = char_range.start(); i > range.start(); --i) { On 2014/06/08 00:10:17, msw wrote: > On 2014/06/07 05:40:33, ckocagil wrote: > > On 2014/06/06 02:00:42, msw wrote: > > > Why do we need these loops now? Maybe add a comment? > > > > Every cluster has a non-empty set of glyphs. CharToGlyph gives us the cluster > > begin. To figure out the cluster end, we loop until we find a non-empty glyph > > range. > > > > Added a comment. > > Hmm, okay I suppose. Can you give me an example where |first = > CharToGlyph(char_range.start() -1)| wouldn't yield something distinct from |last > = CharToGlyph(char_range.end() -1)| and the loop would be needed? Also, is such > a case tested in your new unit test? The loops are needed for glyphs that correspond to multiple characters. Let me give you an LTR example (same idea for RTL): A B C D -> ab c d (where ab is a single glyph) We want CharRangeToGlyphRange [0, 1) to be [0, 1). However, both CharToGlyph(0) and CharToGlyph(1) give us the cluster start, glyph 0, which gives us the empty glyph range [0, 0). This is because the cluster doesn't end at char 1, it ends at char 2. To find the actual cluster end, we need to loop. Yes, both loops are tested by our cases. I actually wrote the tests first and then added the loops to fix this problem.
Still LGTM! https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... ui/gfx/render_text_harfbuzz.cc:425: for (size_t i = char_range.start(); i > range.start(); --i) { On 2014/06/08 01:30:07, ckocagil wrote: > On 2014/06/08 00:10:17, msw wrote: > > On 2014/06/07 05:40:33, ckocagil wrote: > > > On 2014/06/06 02:00:42, msw wrote: > > > > Why do we need these loops now? Maybe add a comment? > > > > > > Every cluster has a non-empty set of glyphs. CharToGlyph gives us the > cluster > > > begin. To figure out the cluster end, we loop until we find a non-empty > glyph > > > range. > > > > > > Added a comment. > > > > Hmm, okay I suppose. Can you give me an example where |first = > > CharToGlyph(char_range.start() -1)| wouldn't yield something distinct from > |last > > = CharToGlyph(char_range.end() -1)| and the loop would be needed? Also, is > such > > a case tested in your new unit test? > > The loops are needed for glyphs that correspond to multiple characters. Let me > give you an LTR example (same idea for RTL): > > A B C D -> ab c d (where ab is a single glyph) > We want CharRangeToGlyphRange [0, 1) to be [0, 1). However, both CharToGlyph(0) > and CharToGlyph(1) give us the cluster start, glyph 0, which gives us the empty > glyph range [0, 0). This is because the cluster doesn't end at char 1, it ends > at char 2. To find the actual cluster end, we need to loop. > > Yes, both loops are tested by our cases. I actually wrote the tests first and > then added the loops to fix this problem. Great explanation, thank you! (it might almost be worth adding as a comment to the code).
Landing now https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... File ui/gfx/render_text_harfbuzz.cc (right): https://codereview.chromium.org/300623003/diff/80001/ui/gfx/render_text_harfb... ui/gfx/render_text_harfbuzz.cc:425: for (size_t i = char_range.start(); i > range.start(); --i) { Expanded the comment a bit without getting too detailed
The CQ bit was checked by ckocagil@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/ckocagil@chromium.org/300623003/120001
FYI, CQ is re-trying this CL (attempt #1). The failing builders are: linux_chromium_chromeos_clang_dbg on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/linux_chromium_chrome...) linux_chromium_chromeos_rel on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/linux_chromium_chrome...) linux_chromium_clang_dbg on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/linux_chromium_clang_...) linux_chromium_rel on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/linux_chromium_rel/bu...) mac_chromium_compile_dbg on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/mac_chromium_compile_...) mac_chromium_rel on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/mac_chromium_rel/buil...)
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_clang_dbg on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/linux_chromium_clang_...)
The CQ bit was checked by ckocagil@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/ckocagil@chromium.org/300623003/140001
The CQ bit was unchecked by ckocagil@chromium.org
The CQ bit was checked by ckocagil@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/ckocagil@chromium.org/300623003/150004
FYI, CQ is re-trying this CL (attempt #1). The failing builders are: android_dbg_triggered_tests on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/android_dbg_triggered...)
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: android_dbg_triggered_tests on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/android_dbg_triggered...)
The CQ bit was checked by ckocagil@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/ckocagil@chromium.org/300623003/150004
Message was sent while issue was closed.
Change committed as 275754 |