|
|
Created:
7 years, 5 months ago by Stephen Chennney Modified:
7 years, 5 months ago CC:
blink-reviews, jamesr, eae+blinkwatch, danakj, dglazkov+blink, Rik, jeez, pdr., abarth-chromium Base URL:
svn://svn.chromium.org/blink/trunk Visibility:
Public. |
DescriptionOptimize Font CodePath selection and more unit testing.
Replaces the long sequences of if statements in Font::characterRangeCodePath, Font::isCJKIdeograph and Font::isCJKIdeographOrSymbol
with a hash set and/or binary tree lookup (with a little bit of early out and surrogate glyph
testing).
Also here is additional unit testing for these font code path checks.
Perf tests show no regression and no speedup, although we are unlikely to have much perf coverage of these glyphs, if any. I don't see how we could possible be slower for most cases, and we would only be slower in complex cases if the glyphs happen to hit one of the early if tests. Note that in the old code, Font::isCJKIdeograph and Font::isCJKIdeographOrSymbol went through every if statement before returning false, despite the fact that the bulk of common glyphs could be trivially rejected as being lower than any of the ranges.
There is an obvious error at the very end of
Font::isCJKIdeographOrSymbol which is fixed in this patch.
R=eseidel@chromium.org
BUG=251328
Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=154838
Patch Set 1 #Patch Set 2 : Remove extraneous code. #Patch Set 3 : Fix Windows compile. Simpler too. #Patch Set 4 : Fix test check macros. #
Total comments: 1
Patch Set 5 : Fixed tests and rest of optimization #
Total comments: 4
Messages
Total messages: 21 (0 generated)
On 2013/07/11 19:10:48, Stephen Chenney wrote: New patch pending to fix Windows compile error.
Eric, you might like where this is going.
I find the unittests insanity-verbose. :) I do support better testing however. Do we have an official list of these from unicode or something that we could just read from a file instead of copying into c++? I don't trust my ability to copy/paste very well, so I prefer to use official test files when possible. :) https://codereview.chromium.org/18949006/diff/5891733057437696/Source/core/pl... File Source/core/platform/graphics/FontTest.cpp (right): https://codereview.chromium.org/18949006/diff/5891733057437696/Source/core/pl... Source/core/platform/graphics/FontTest.cpp:647: c = 0xA000; Is it some sort of gtest preferred style to not just write these out as one line? EXPECT_TRUE(FONT::isCJKIdeograph(0x9FFF)); seems easier to read. expectIsCGKIdeograph(0x9FFF, true); might even be cleaner.
I really must have been sleep deprived when I wrote those tests. Here's the sane version.
On 2013/07/15 21:32:32, Stephen Chenney wrote: > I really must have been sleep deprived when I wrote those tests. Here's the sane > version. Bump.
omg, so much better. It looks like LiteralSEt never landed, sadly: https://codereview.chromium.org/15622005/ https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/grap... File Source/core/platform/graphics/Font.cpp (right): https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/grap... Source/core/platform/graphics/Font.cpp:527: static HashSet<UChar32>* cjkIsolatedSymbols = 0; abarth made us a LiteralHashSet/StaticHashSet at some point to handle thsi kind of thing if you need.
https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/grap... File Source/core/platform/graphics/Font.cpp (right): https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/grap... Source/core/platform/graphics/Font.cpp:659: UChar32* boundingCharacter = approximateBinarySearch<UChar32, UChar32>( You mentioned you fixed an error here? Can it be tested?
On 2013/07/23 20:49:16, eseidel wrote: > omg, so much better. > > It looks like LiteralSEt never landed, sadly: > https://codereview.chromium.org/15622005/ > > https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/grap... > File Source/core/platform/graphics/Font.cpp (right): > > https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/grap... > Source/core/platform/graphics/Font.cpp:527: static HashSet<UChar32>* > cjkIsolatedSymbols = 0; > abarth made us a LiteralHashSet/StaticHashSet at some point to handle thsi kind > of thing if you need. I've CCed myself on the patch should it land. It seems that it needs to go in manually.
On 2013/07/23 20:50:51, eseidel wrote: > https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/grap... > File Source/core/platform/graphics/Font.cpp (right): > > https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/grap... > Source/core/platform/graphics/Font.cpp:659: UChar32* boundingCharacter = > approximateBinarySearch<UChar32, UChar32>( > You mentioned you fixed an error here? Can it be tested? I fixed this line: if (c >= 0x1F200 && c <= 0x1F6F) This is obviously wrong because 0x1F6F < 0x1F200. I believe it should be 0x1F6FF, based on looking at unicode character sets and considering the most likely way in which the error arose (someone missing the final character). That's what I changed it to. I don't think it's worth testing outside of the unit test (which does cover it).
lgtm I'm mostly trusting that you've stared at this code much more than I have here.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/schenney@chromium.org/18949006/15001
Failed to apply patch for Source/core/platform/graphics/Font.cpp: While running patch -p1 --forward --force --no-backup-if-mismatch; patching file Source/core/platform/graphics/Font.cpp Hunk #1 FAILED at 28. 1 out of 3 hunks FAILED -- saving rejects to file Source/core/platform/graphics/Font.cpp.rej Patch: Source/core/platform/graphics/Font.cpp Index: Source/core/platform/graphics/Font.cpp diff --git a/Source/core/platform/graphics/Font.cpp b/Source/core/platform/graphics/Font.cpp index f948c7e8c9de9c02fc4e0ea40e31a48f61e23e9a..b6074ecb8e66c391e09308bcc5c75f8d640c19b1 100644 --- a/Source/core/platform/graphics/Font.cpp +++ b/Source/core/platform/graphics/Font.cpp @@ -28,10 +28,11 @@ #include "core/platform/graphics/TextRun.h" #include "core/platform/graphics/WidthIterator.h" #include "core/platform/text/transcoder/FontTranscoder.h" -#include <wtf/MainThread.h> -#include <wtf/MathExtras.h> -#include <wtf/text/StringBuilder.h> -#include <wtf/UnusedParam.h> +#include "wtf/MainThread.h" +#include "wtf/MathExtras.h" +#include "wtf/StdLibExtras.h" +#include "wtf/UnusedParam.h" +#include "wtf/text/StringBuilder.h" using namespace WTF; using namespace Unicode; @@ -353,122 +354,84 @@ Font::CodePath Font::codePath(const TextRun& run) const return characterRangeCodePath(run.characters16(), run.length()); } -Font::CodePath Font::characterRangeCodePath(const UChar* characters, unsigned len) +static inline UChar keyExtractorUChar(const UChar* value) { - // FIXME: Should use a UnicodeSet in ports where ICU is used. Note that we - // can't simply use UnicodeCharacter Property/class because some characters - // are not 'combining', but still need to go to the complex path. - // Alternatively, we may as well consider binary search over a sorted - // list of ranges. - CodePath result = Simple; - for (unsigned i = 0; i < len; i++) { - const UChar c = characters[i]; - if (c < 0x2E5) // U+02E5 through U+02E9 (Modifier Letters : Tone letters) - continue; - if (c <= 0x2E9) - return Complex; - - if (c < 0x300) // U+0300 through U+036F Combining diacritical marks - continue; - if (c <= 0x36F) - return Complex; + return *value; +} - if (c < 0x0591 || c == 0x05BE) // U+0591 through U+05CF excluding U+05BE Hebrew combining marks, Hebrew punctuation Paseq, Sof Pasuq and Nun Hafukha - continue; - if (c <= 0x05CF) - return Complex; +static inline UChar32 keyExtractorUChar32(const UChar32* value) +{ + return *value; +} +Font::CodePath Font::characterRangeCodePath(const UChar* characters, unsigned len) +{ + static UChar complexCodePathRanges[] = { + // U+02E5 through U+02E9 (Modifier Letters : Tone letters) + 0x2E5, 0x2E9, + // U+0300 through U+036F Combining diacritical marks + 0x300, 0x36F, + // U+0591 through U+05CF excluding U+05BE Hebrew combining marks, ... + 0x0591, 0x05BD, + // ... Hebrew punctuation Paseq, Sof Pasuq and Nun Hafukha + 0x05BF, 0x05CF, // U+0600 through U+109F Arabic, Syriac, Thaana, NKo, Samaritan, Mandaic, - // Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, + // Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, // Malayalam, Sinhala, Thai, Lao, Tibetan, Myanmar - if (c < 0x0600) - continue; - if (c <= 0x109F) - return Complex; + 0x0600, 0x109F, + // U+1100 through U+11FF Hangul Jamo (only Ancient Korean should be left + // here if you precompose; Modern Korean will be precomposed as a result of step A) + 0x1100, 0x11FF, + // U+135D through U+135F Ethiopic combining marks + 0x135D, 0x135F, + // U+1780 through U+18AF Tagalog, Hanunoo, Buhid, Taghanwa,Khmer, Mongolian + 0x1700, 0x18AF, + // U+1900 through U+194F Limbu (Unicode 4.0) + 0x1900, 0x194F, + // U+1980 through U+19DF New Tai Lue + 0x1980, 0x19DF, + // U+1A00 through U+1CFF Buginese, Tai Tham, Balinese, Batak, Lepcha, Vedic + 0x1A00, 0x1CFF, + // U+1DC0 through U+1DFF Comining diacritical mark supplement + 0x1DC0, 0x1DFF, + // U+20D0 through U+20FF Combining marks for symbols + 0x20D0, 0x20FF, + // U+2CEF through U+2CF1 Combining marks for Coptic + 0x2CEF, 0x2CF1, + // U+302A through U+302F Ideographic and Hangul Tone marks + 0x302A, 0x302F, + // U+A67C through U+A67D Combining marks for old Cyrillic + 0xA67C, 0xA67D, + // U+A6F0 through U+A6F1 Combining mark for Bamum + 0xA6F0, 0xA6F1, + // U+A800 through U+ABFF Nagri, Phags-pa, Saurashtra, Devanagari Extended, + // Hangul Jamo Ext. A, Javanese, Myanmar Extended A, Tai Viet, Meetei Mayek + 0xA800, 0xABFF, + // U+D7B0 through U+D7FF Hangul Jamo Ext. B + 0xD7B0, 0xD7FF, + // U+FE00 through U+FE0F Unicode variation selectors + 0xFE00, 0xFE0F, + // U+FE20 through U+FE2F Combining half marks + 0xFE20, 0xFE2F + }; + static size_t complexCodePathRangesCount = WTF_ARRAY_LENGTH(complexCodePathRanges); - // U+1100 through U+11FF Hangul Jamo (only Ancient Korean should be left here if you precompose; - // Modern Korean will be precomposed as a result of step A) - if (c < 0x1100) - continue; - if (c <= 0x11FF) - return Complex; - - if (c < 0x135D) // U+135D through U+135F Ethiopic combining marks - continue; - if (c <= 0x135F) - return Complex; - - if (c < 0x1700) // U+1780 through U+18AF Tagalog, Hanunoo, Buhid, Taghanwa,Khmer, Mongolian - continue; - if (c <= 0x18AF) - return Complex; - - if (c < 0x1900) // U+1900 through U+194F Limbu (Unicode 4.0) - continue; - if (c <= 0x194F) - return Complex; - - if (c < 0x1980) // U+1980 through U+19DF New Tai Lue - continue; - if (c <= 0x19DF) - return Complex; - - if (c < 0x1A00) // U+1A00 through U+1CFF Buginese, Tai Tham, Balinese, Batak, Lepcha, Vedic - continue; - if (c <= 0x1CFF) - return Complex; + CodePath result = Simple; + for (unsigned i = 0; i < len; i++) { + const UChar c = characters[i]; - if (c < 0x1DC0) // U+1DC0 through U+1DFF Comining diacritical mark supplement + // Shortcut for common case + if (c < 0x2E5) continue; - if (c <= 0x1DFF) - return Complex; // U+1E00 through U+2000 characters with diacritics and stacked diacritics - if (c <= 0x2000) { + if (c >= 0x1E00 && c <= 0x2000) { result = SimpleWithGlyphOverflow; continue; } - if (c < 0x20D0) // U+20D0 through U+20FF Combining marks for symbols - continue; - if (c <= 0x20FF) - return Complex; - - if (c < 0x2CEF) // U+2CEF through U+2CF1 Combining marks for Coptic - continue; - if (c <= 0x2CF1) - return Complex; - - if (c < 0x302A) // U+302A through U+302F Ideographic and Hangul Tone marks - continue; - if (c <= 0x302F) - return Complex; - - if (c < 0xA67C) // U+A67C through U+A67D Combining marks for old Cyrillic - continue; - if (c <= 0xA67D) - return Complex; - - if (c < 0xA6F0) // U+A6F0 through U+A6F1 Combining mark for Bamum - continue; - if (c <= 0xA6F1) - return Complex; - - // U+A800 through U+ABFF Nagri, Phags-pa, Saurashtra, Devanagari Extended, - // Hangul Jamo Ext. A, Javanese, Myanmar Extended A, Tai Viet, Meetei Mayek, - if (c < 0xA800) - continue; - if (c <= 0xABFF) - return Complex; - - if (c < 0xD7B0) // U+D7B0 through U+D7FF Hangul Jamo Ext. B - continue; - if (c <= 0xD7FF) - return Complex; - - if (c <= 0xDBFF) { - // High surrogate - + // Surrogate pairs + if (c > 0xD7FF && c <= 0xDBFF) { if (i == len - 1) continue; @@ -494,219 +457,214 @@ Font::CodePath Font::characterRangeCodePath(const UChar* characters, unsigned le continue; } - if (c < 0xFE00) // U+FE00 through U+FE0F Unicode variation selectors - continue; - if (c <= 0xFE0F) + // Search for other Complex cases + UChar* boundingCharacter = approximateBinarySearch<UChar, UChar>( + (UChar*)complexCodePathRanges, complexCodePathRangesCount, c, keyExtractorUChar); + // Exact matches are complex + if (*boundingCharacter == c) return Complex; - - if (c < 0xFE20) // U+FE20 through U+FE2F Combining half marks + bool isEndOfRange = ((boundingCharacter - complexCodePathRanges) % 2); + if (*boundingCharacter < c) { + // Determine if we are in a range or out + if (!isEndOfRange) + return Complex; continue; - if (c <= 0xFE2F) + } + ASSERT(*boundingCharacter > c); + // Determine if we are in a range or out - opposite condition to above + if (isEndOfRange) return Complex; } + return result; } bool Font::isCJKIdeograph(UChar32 c) { - // The basic CJK Unified Ideographs block. - if (c >= 0x4E00 && c <= 0x9FFF) - return true; - - // CJK Unified Ideographs Extension A. - if (c >= 0x3400 && c <= 0x4DBF) - return true; - - // CJK Radicals Supplement. - if (c >= 0x2E80 && c <= 0x2EFF) - return true; - - // Kangxi Radicals. - if (c >= 0x2F00 && c <= 0x2FDF) - return true; … (message too large)
On 2013/07/11 19:10:48, Stephen Chenney wrote: I suspect you could get a much better performance improvement by making use of locality of reference, i.e., it is more likely than not that the search match for a character will be the same as the match for the previous character (particularly true among CJK text).
Message was sent while issue was closed.
Committed patchset #5 manually as r154838 (presubmit successful).
Message was sent while issue was closed.
https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/grap... File Source/core/platform/graphics/Font.cpp (right): https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/grap... Source/core/platform/graphics/Font.cpp:612: cjkIsolatedSymbols->add(0x1F100); This is likely to blow up the binary. Can you add these in a loop instead?
Message was sent while issue was closed.
https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/grap... File Source/core/platform/graphics/Font.cpp (right): https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/grap... Source/core/platform/graphics/Font.cpp:612: cjkIsolatedSymbols->add(0x1F100); On 2013/07/24 22:34:24, abarth wrote: > This is likely to blow up the binary. Can you add these in a loop instead? You mean make a static array and then iterate over it to add?
Message was sent while issue was closed.
Yep!
Message was sent while issue was closed.
On 2013/07/24 23:10:35, abarth wrote: > Yep! OK. Tomorrow's TODO. |