Issue 18949006: Optimize Font CodePath selection and more unit testing.

Issue 18949006: Optimize Font CodePath selection and more unit testing. (Closed)

Created:
7 years, 5 months ago by Stephen Chennney

Modified:
7 years, 5 months ago

Reviewers:
bungeman-chromium, eseidel, eae, glenn, abarth-chromium

CC:
blink-reviews, jamesr, eae+blinkwatch, danakj, dglazkov+blink, Rik, jeez, pdr., abarth-chromium

Base URL:
svn://svn.chromium.org/blink/trunk

Visibility:
Public.

More Reviews

Description

Optimize Font CodePath selection and more unit testing. Replaces the long sequences of if statements in Font::characterRangeCodePath, Font::isCJKIdeograph and Font::isCJKIdeographOrSymbol with a hash set and/or binary tree lookup (with a little bit of early out and surrogate glyph testing). Also here is additional unit testing for these font code path checks. Perf tests show no regression and no speedup, although we are unlikely to have much perf coverage of these glyphs, if any. I don't see how we could possible be slower for most cases, and we would only be slower in complex cases if the glyphs happen to hit one of the early if tests. Note that in the old code, Font::isCJKIdeograph and Font::isCJKIdeographOrSymbol went through every if statement before returning false, despite the fact that the bulk of common glyphs could be trivially rejected as being lower than any of the ranges. There is an obvious error at the very end of Font::isCJKIdeographOrSymbol which is fixed in this patch. R=eseidel@chromium.org BUG=251328 Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=154838

Patch Set 1 #

Patch Set 2 : Remove extraneous code. #

Patch Set 3 : Fix Windows compile. Simpler too. #

Patch Set 4 : Fix test check macros. #

Total comments: 1

Patch Set 5 : Fixed tests and rest of optimization #

Total comments: 4

Created: 7 years, 5 months ago

Download [raw] [tar.bz2]

		Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+528 lines, -848 lines)			Patch
	M	Source/core/platform/graphics/Font.cpp	View	1 2 3 4	3 chunks	+261 lines, -303 lines	4 comments	Download
	M	Source/core/platform/graphics/FontTest.cpp	View	1 2 3 4	2 chunks	+267 lines, -545 lines	0 comments	Download

Messages

Total messages: 21 (0 generated)

Expand Messages | Collapse Messages

Stephen Chennney

On 2013/07/11 19:10:48, Stephen Chenney wrote: New patch pending to fix Windows compile error.

7 years, 5 months ago (2013-07-11 19:29:41 UTC) #2

eseidel

I find the unittests insanity-verbose. :) I do support better testing however. Do we have ...

7 years, 5 months ago (2013-07-12 22:32:28 UTC) #4

Stephen Chennney

I really must have been sleep deprived when I wrote those tests. Here's the sane ...

7 years, 5 months ago (2013-07-15 21:32:32 UTC) #5

Stephen Chennney

On 2013/07/15 21:32:32, Stephen Chenney wrote: > I really must have been sleep deprived when ...

7 years, 5 months ago (2013-07-23 20:23:24 UTC) #6

eseidel

omg, so much better. It looks like LiteralSEt never landed, sadly: https://codereview.chromium.org/15622005/ https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/graphics/Font.cpp File Source/core/platform/graphics/Font.cpp ...

7 years, 5 months ago (2013-07-23 20:49:16 UTC) #7

eseidel

https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/graphics/Font.cpp File Source/core/platform/graphics/Font.cpp (right): https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/graphics/Font.cpp#newcode659 Source/core/platform/graphics/Font.cpp:659: UChar32* boundingCharacter = approximateBinarySearch<UChar32, UChar32>( You mentioned you fixed ...

7 years, 5 months ago (2013-07-23 20:50:51 UTC) #8

Stephen Chennney

On 2013/07/23 20:49:16, eseidel wrote: > omg, so much better. > > It looks like ...

7 years, 5 months ago (2013-07-23 21:16:41 UTC) #10

Stephen Chennney

On 2013/07/23 20:50:51, eseidel wrote: > https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/graphics/Font.cpp > File Source/core/platform/graphics/Font.cpp (right): > > https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/graphics/Font.cpp#newcode659 > ...

7 years, 5 months ago (2013-07-23 21:20:57 UTC) #11

eseidel

lgtm I'm mostly trusting that you've stared at this code much more than I have ...

7 years, 5 months ago (2013-07-23 21:37:36 UTC) #12

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/schenney@chromium.org/18949006/15001

7 years, 5 months ago (2013-07-23 21:38:21 UTC) #13

commit-bot: I haz the power

Failed to apply patch for Source/core/platform/graphics/Font.cpp: While running patch -p1 --forward --force --no-backup-if-mismatch; patching file ...

7 years, 5 months ago (2013-07-23 21:38:23 UTC) #14

Failed to apply patch for Source/core/platform/graphics/Font.cpp:
While running patch -p1 --forward --force --no-backup-if-mismatch;
  patching file Source/core/platform/graphics/Font.cpp
  Hunk #1 FAILED at 28.
  1 out of 3 hunks FAILED -- saving rejects to file
Source/core/platform/graphics/Font.cpp.rej

Patch:       Source/core/platform/graphics/Font.cpp
Index: Source/core/platform/graphics/Font.cpp
diff --git a/Source/core/platform/graphics/Font.cpp
b/Source/core/platform/graphics/Font.cpp
index
f948c7e8c9de9c02fc4e0ea40e31a48f61e23e9a..b6074ecb8e66c391e09308bcc5c75f8d640c19b1
100644
--- a/Source/core/platform/graphics/Font.cpp
+++ b/Source/core/platform/graphics/Font.cpp
@@ -28,10 +28,11 @@
 #include "core/platform/graphics/TextRun.h"
 #include "core/platform/graphics/WidthIterator.h"
 #include "core/platform/text/transcoder/FontTranscoder.h"
-#include <wtf/MainThread.h>
-#include <wtf/MathExtras.h>
-#include <wtf/text/StringBuilder.h>
-#include <wtf/UnusedParam.h>
+#include "wtf/MainThread.h"
+#include "wtf/MathExtras.h"
+#include "wtf/StdLibExtras.h"
+#include "wtf/UnusedParam.h"
+#include "wtf/text/StringBuilder.h"
 
 using namespace WTF;
 using namespace Unicode;
@@ -353,122 +354,84 @@ Font::CodePath Font::codePath(const TextRun& run) const
     return characterRangeCodePath(run.characters16(), run.length());
 }
 
-Font::CodePath Font::characterRangeCodePath(const UChar* characters, unsigned
len)
+static inline UChar keyExtractorUChar(const UChar* value)
 {
-    // FIXME: Should use a UnicodeSet in ports where ICU is used. Note that we 
-    // can't simply use UnicodeCharacter Property/class because some characters
-    // are not 'combining', but still need to go to the complex path.
-    // Alternatively, we may as well consider binary search over a sorted
-    // list of ranges.
-    CodePath result = Simple;
-    for (unsigned i = 0; i < len; i++) {
-        const UChar c = characters[i];
-        if (c < 0x2E5) // U+02E5 through U+02E9 (Modifier Letters : Tone
letters)  
-            continue;
-        if (c <= 0x2E9) 
-            return Complex;
-
-        if (c < 0x300) // U+0300 through U+036F Combining diacritical marks
-            continue;
-        if (c <= 0x36F)
-            return Complex;
+    return *value;
+}
 
-        if (c < 0x0591 || c == 0x05BE) // U+0591 through U+05CF excluding
U+05BE Hebrew combining marks, Hebrew punctuation Paseq, Sof Pasuq and Nun
Hafukha
-            continue;
-        if (c <= 0x05CF)
-            return Complex;
+static inline UChar32 keyExtractorUChar32(const UChar32* value)
+{
+    return *value;
+}
 
+Font::CodePath Font::characterRangeCodePath(const UChar* characters, unsigned
len)
+{
+    static UChar complexCodePathRanges[] = {
+        // U+02E5 through U+02E9 (Modifier Letters : Tone letters)
+        0x2E5, 0x2E9,
+        // U+0300 through U+036F Combining diacritical marks
+        0x300, 0x36F,
+        // U+0591 through U+05CF excluding U+05BE Hebrew combining marks, ...
+        0x0591, 0x05BD,
+        // ... Hebrew punctuation Paseq, Sof Pasuq and Nun Hafukha
+        0x05BF, 0x05CF,
         // U+0600 through U+109F Arabic, Syriac, Thaana, NKo, Samaritan,
Mandaic,
-        // Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu,
Kannada, 
+        // Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu,
Kannada,
         // Malayalam, Sinhala, Thai, Lao, Tibetan, Myanmar
-        if (c < 0x0600) 
-            continue;
-        if (c <= 0x109F)
-            return Complex;
+        0x0600, 0x109F,
+        // U+1100 through U+11FF Hangul Jamo (only Ancient Korean should be
left
+        // here if you precompose; Modern Korean will be precomposed as a
result of step A)
+        0x1100, 0x11FF,
+        // U+135D through U+135F Ethiopic combining marks
+        0x135D, 0x135F,
+        // U+1780 through U+18AF Tagalog, Hanunoo, Buhid, Taghanwa,Khmer,
Mongolian
+        0x1700, 0x18AF,
+        // U+1900 through U+194F Limbu (Unicode 4.0)
+        0x1900, 0x194F,
+        // U+1980 through U+19DF New Tai Lue
+        0x1980, 0x19DF,
+        // U+1A00 through U+1CFF Buginese, Tai Tham, Balinese, Batak, Lepcha,
Vedic
+        0x1A00, 0x1CFF,
+        // U+1DC0 through U+1DFF Comining diacritical mark supplement
+        0x1DC0, 0x1DFF,
+        // U+20D0 through U+20FF Combining marks for symbols
+        0x20D0, 0x20FF,
+        // U+2CEF through U+2CF1 Combining marks for Coptic
+        0x2CEF, 0x2CF1,
+        // U+302A through U+302F Ideographic and Hangul Tone marks
+        0x302A, 0x302F,
+        // U+A67C through U+A67D Combining marks for old Cyrillic
+        0xA67C, 0xA67D,
+        // U+A6F0 through U+A6F1 Combining mark for Bamum
+        0xA6F0, 0xA6F1,
+        // U+A800 through U+ABFF Nagri, Phags-pa, Saurashtra, Devanagari
Extended,
+        // Hangul Jamo Ext. A, Javanese, Myanmar Extended A, Tai Viet, Meetei
Mayek
+        0xA800, 0xABFF,
+        // U+D7B0 through U+D7FF Hangul Jamo Ext. B
+        0xD7B0, 0xD7FF,
+        // U+FE00 through U+FE0F Unicode variation selectors
+        0xFE00, 0xFE0F,
+        // U+FE20 through U+FE2F Combining half marks
+        0xFE20, 0xFE2F
+    };
+    static size_t complexCodePathRangesCount =
WTF_ARRAY_LENGTH(complexCodePathRanges);
 
-        // U+1100 through U+11FF Hangul Jamo (only Ancient Korean should be
left here if you precompose;
-        // Modern Korean will be precomposed as a result of step A)
-        if (c < 0x1100)
-            continue;
-        if (c <= 0x11FF)
-            return Complex;
-
-        if (c < 0x135D) // U+135D through U+135F Ethiopic combining marks
-            continue;
-        if (c <= 0x135F)
-            return Complex;
-
-        if (c < 0x1700) // U+1780 through U+18AF Tagalog, Hanunoo, Buhid,
Taghanwa,Khmer, Mongolian
-            continue;
-        if (c <= 0x18AF)
-            return Complex;
-
-        if (c < 0x1900) // U+1900 through U+194F Limbu (Unicode 4.0)
-            continue;
-        if (c <= 0x194F)
-            return Complex;
-
-        if (c < 0x1980) // U+1980 through U+19DF New Tai Lue
-            continue;
-        if (c <= 0x19DF)
-            return Complex;
-
-        if (c < 0x1A00) // U+1A00 through U+1CFF Buginese, Tai Tham, Balinese,
Batak, Lepcha, Vedic
-            continue;
-        if (c <= 0x1CFF)
-            return Complex;
+    CodePath result = Simple;
+    for (unsigned i = 0; i < len; i++) {
+        const UChar c = characters[i];
 
-        if (c < 0x1DC0) // U+1DC0 through U+1DFF Comining diacritical mark
supplement
+        // Shortcut for common case
+        if (c < 0x2E5)
             continue;
-        if (c <= 0x1DFF)
-            return Complex;
 
         // U+1E00 through U+2000 characters with diacritics and stacked
diacritics
-        if (c <= 0x2000) {
+        if (c >= 0x1E00 && c <= 0x2000) {
             result = SimpleWithGlyphOverflow;
             continue;
         }
 
-        if (c < 0x20D0) // U+20D0 through U+20FF Combining marks for symbols
-            continue;
-        if (c <= 0x20FF)
-            return Complex;
-
-        if (c < 0x2CEF) // U+2CEF through U+2CF1 Combining marks for Coptic
-            continue;
-        if (c <= 0x2CF1)
-            return Complex;
-
-        if (c < 0x302A) // U+302A through U+302F Ideographic and Hangul Tone
marks
-            continue;
-        if (c <= 0x302F)
-            return Complex;
-
-        if (c < 0xA67C) // U+A67C through U+A67D Combining marks for old
Cyrillic
-            continue;
-        if (c <= 0xA67D)
-            return Complex;
-
-        if (c < 0xA6F0) // U+A6F0 through U+A6F1 Combining mark for Bamum
-            continue;
-        if (c <= 0xA6F1)
-            return Complex;
-
-       // U+A800 through U+ABFF Nagri, Phags-pa, Saurashtra, Devanagari
Extended,
-       // Hangul Jamo Ext. A, Javanese, Myanmar Extended A, Tai Viet, Meetei
Mayek,
-        if (c < 0xA800) 
-            continue;
-        if (c <= 0xABFF)
-            return Complex;
-
-        if (c < 0xD7B0) // U+D7B0 through U+D7FF Hangul Jamo Ext. B
-            continue;
-        if (c <= 0xD7FF)
-            return Complex;
-
-        if (c <= 0xDBFF) {
-            // High surrogate
-
+        // Surrogate pairs
+        if (c > 0xD7FF && c <= 0xDBFF) {
             if (i == len - 1)
                 continue;
 
@@ -494,219 +457,214 @@ Font::CodePath Font::characterRangeCodePath(const UChar*
characters, unsigned le
             continue;
         }
 
-        if (c < 0xFE00) // U+FE00 through U+FE0F Unicode variation selectors
-            continue;
-        if (c <= 0xFE0F)
+        // Search for other Complex cases
+        UChar* boundingCharacter = approximateBinarySearch<UChar, UChar>(
+            (UChar*)complexCodePathRanges, complexCodePathRangesCount, c,
keyExtractorUChar);
+        // Exact matches are complex
+        if (*boundingCharacter == c)
             return Complex;
-
-        if (c < 0xFE20) // U+FE20 through U+FE2F Combining half marks
+        bool isEndOfRange = ((boundingCharacter - complexCodePathRanges) % 2);
+        if (*boundingCharacter < c) {
+            // Determine if we are in a range or out
+            if (!isEndOfRange)
+                return Complex;
             continue;
-        if (c <= 0xFE2F)
+        }
+        ASSERT(*boundingCharacter > c);
+        // Determine if we are in a range or out - opposite condition to above
+        if (isEndOfRange)
             return Complex;
     }
+
     return result;
 }
 
 bool Font::isCJKIdeograph(UChar32 c)
 {
-    // The basic CJK Unified Ideographs block.
-    if (c >= 0x4E00 && c <= 0x9FFF)
-        return true;
-    
-    // CJK Unified Ideographs Extension A.
-    if (c >= 0x3400 && c <= 0x4DBF)
-        return true;
-    
-    // CJK Radicals Supplement.
-    if (c >= 0x2E80 && c <= 0x2EFF)
-        return true;
-    
-    // Kangxi Radicals.
-    if (c >= 0x2F00 && c <= 0x2FDF)
-        return true;
…
(message too large)

glenn

On 2013/07/11 19:10:48, Stephen Chenney wrote: I suspect you could get a much better performance ...

7 years, 5 months ago (2013-07-24 15:49:58 UTC) #16

Stephen Chennney

Committed patchset #5 manually as r154838 (presubmit successful).

7 years, 5 months ago (2013-07-24 17:50:57 UTC) #17

abarth-chromium

https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/graphics/Font.cpp File Source/core/platform/graphics/Font.cpp (right): https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/graphics/Font.cpp#newcode612 Source/core/platform/graphics/Font.cpp:612: cjkIsolatedSymbols->add(0x1F100); This is likely to blow up the binary. ...

7 years, 5 months ago (2013-07-24 22:34:24 UTC) #18

Stephen Chennney

https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/graphics/Font.cpp File Source/core/platform/graphics/Font.cpp (right): https://codereview.chromium.org/18949006/diff/15001/Source/core/platform/graphics/Font.cpp#newcode612 Source/core/platform/graphics/Font.cpp:612: cjkIsolatedSymbols->add(0x1F100); On 2013/07/24 22:34:24, abarth wrote: > This is ...

7 years, 5 months ago (2013-07-24 23:02:18 UTC) #19

Stephen Chennney

7 years, 5 months ago (2013-07-24 23:22:50 UTC) #21

Message was sent while issue was closed.

On 2013/07/24 23:10:35, abarth wrote:
> Yep!

OK. Tomorrow's TODO.

Expand Messages | Collapse Messages