Issue 196353018: Smaller CSSParser UTF16 buffers for escaped strings.

Issue 196353018: Smaller CSSParser UTF16 buffers for escaped strings. (Closed)

Created:
6 years, 9 months ago by Daniel Bratell

Modified:
6 years, 8 months ago

Reviewers:
Julien - ping for review, mstensho (USE GERRIT), eseidel, rune, darktears

CC:
blink-reviews, ed+blinkwatch_opera.com, dglazkov+blink, apavlov+blink_chromium.org, rune+blink, rwlbuis

Base URL:
https://chromium.googlesource.com/chromium/blink.git@master

Visibility:
Public.

More Reviews

Description

Smaller CSSParser UTF16 buffers for escaped strings. If an ASCII stylesheet contains escaped characters those need to be unescaped and stored somewhere. The code used to allocate a buffer the size of the whole CSS file * sizeof(UChar). That meant 700 KB on arstechnica.com to store a single character. Instead store the supposedly rare characters in special buffers of appropriate sizes. BUG=352544 Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=170451

Patch Set 1 #

Total comments: 21

Patch Set 2 : Dropped /* static */ and rewrote comment. #

Total comments: 6

Patch Set 3 : Rewrote comments. #

Total comments: 9

Patch Set 4 : More comments. #

Total comments: 2

Patch Set 5 : Reused existing code. #

Patch Set 6 : Rebased #

Created: 6 years, 8 months ago

Download [raw] [tar.bz2]

		Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+91 lines, -23 lines)			Patch
	M	Source/core/css/CSSTokenizer.h	View	1 2 3	3 chunks	+21 lines, -10 lines	0 comments	Download
	M	Source/core/css/CSSTokenizer-in.cpp	View	1 2 3 4 5	9 chunks	+70 lines, -13 lines	0 comments	Download

Messages

Total messages: 16 (0 generated)

Expand Messages | Collapse Messages

Daniel Bratell

jchaffraix, could you please have a look at this patch to stop using the full-stylesheet-unicode ...

6 years, 9 months ago (2014-03-14 08:59:21 UTC) #1

rune

https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer-in.cpp File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer-in.cpp#newcode389 Source/core/css/CSSTokenizer-in.cpp:389: /* static */ There's no precedence for indication of ...

6 years, 9 months ago (2014-03-14 12:50:23 UTC) #2

Daniel Bratell

https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer-in.cpp File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer-in.cpp#newcode449 Source/core/css/CSSTokenizer-in.cpp:449: // write it down so use that count as ...

6 years, 9 months ago (2014-03-17 15:55:35 UTC) #3

Daniel Bratell

https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSTokenizer-in.cpp File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSTokenizer-in.cpp#newcode666 Source/core/css/CSSTokenizer-in.cpp:666: if (unicode > 0xff && sizeof(DestCharacterType) == 1) Note: ...

6 years, 9 months ago (2014-03-17 16:01:06 UTC) #4

Daniel Bratell

6 years, 9 months ago (2014-03-17 20:15:35 UTC) #5

rune

https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSTokenizer-in.cpp File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSTokenizer-in.cpp#newcode514 Source/core/css/CSSTokenizer-in.cpp:514: // it down so use that as the upper ...

6 years, 9 months ago (2014-03-17 20:18:35 UTC) #6

Daniel Bratell

https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSTokenizer-in.cpp File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSTokenizer-in.cpp#newcode514 Source/core/css/CSSTokenizer-in.cpp:514: // it down so use that as the upper ...

6 years, 9 months ago (2014-03-17 20:25:17 UTC) #7

Julien - ping for review

I don't know this code really well so it would take me quite some time ...

6 years, 9 months ago (2014-03-18 23:32:19 UTC) #9

Julien - ping for review

https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSTokenizer-in.cpp File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSTokenizer-in.cpp#newcode307 Source/core/css/CSSTokenizer-in.cpp:307: UChar* CSSTokenizer::getStringBuffer16(size_t len) We usually don't put the word ...

6 years, 9 months ago (2014-03-20 20:57:38 UTC) #10

Daniel Bratell

Now even better. https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSTokenizer-in.cpp File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSTokenizer-in.cpp#newcode307 Source/core/css/CSSTokenizer-in.cpp:307: UChar* CSSTokenizer::getStringBuffer16(size_t len) On 2014/03/20 20:57:39, ...

6 years, 9 months ago (2014-03-21 15:14:38 UTC) #11

Now even better.

https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken...
File Source/core/css/CSSTokenizer-in.cpp (right):

https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken...
Source/core/css/CSSTokenizer-in.cpp:307: UChar*
CSSTokenizer::getStringBuffer16(size_t len)
On 2014/03/20 20:57:39, Julien Chaffraix - PST wrote:
> We usually don't put the word "get" on getters as it's redundant. Here we
could
> write allocateStringBuffer16 as it would better match the intent.
> 
> Maybe we could mention somehow that the string is owned by the tokenizer in
the
> name too but we don't want to make the name too long.

Done.

https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken...
Source/core/css/CSSTokenizer-in.cpp:389: unsigned
CSSTokenizer::parseEscape(CharacterType*& src)
On 2014/03/20 20:57:39, Julien Chaffraix - PST wrote:
> This function should now have static linkage.

Hmm, not sure I understand. A function can have internal or external linkage.
This has external linkage since it's part of a class.

I did make it a static method in the class in this patch though, mostly to
prevent it from modifying the tokenizer state by accident which it used to do
("src" and "currentCharacter()" happened to be the same pointer so it
"worked(tm)".)

I think it's doing fine as a static member function, but I can make it a file
level internal function though if you think that matches the style better.

https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken...
File Source/core/css/CSSTokenizer.h (right):

https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken...
Source/core/css/CSSTokenizer.h:162: Vector<OwnPtr<UChar[]> > m_cssStrings16;
On 2014/03/20 20:57:39, Julien Chaffraix - PST wrote:
> Really not a huge fan of the Vector approach but we need to hold onto the
> characters as the intermediate strings don't do it.
> 
> Would you add a comment to explain this optimization in the code and why this
is
> needed? (if we revisit how we handle CSSParserString's, we could remove this
> giant Vector).

Done. - Though I wouldn't call it giant. It will be 20 bytes and empty in most
cases. For Ars Technica it will also contain a 1 character string for another
~20 bytes. Those 40 bytes should be compared to the 700 KB we used to allocate
into m_currentCharacter16 before this patch.

Julien - ping for review

lgtm but please try to share some code. https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSTokenizer-in.cpp File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSTokenizer-in.cpp#newcode389 Source/core/css/CSSTokenizer-in.cpp:389: unsigned ...

6 years, 9 months ago (2014-03-21 17:56:26 UTC) #12

lgtm but please try to share some code.

https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken...
File Source/core/css/CSSTokenizer-in.cpp (right):

https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken...
Source/core/css/CSSTokenizer-in.cpp:389: unsigned
CSSTokenizer::parseEscape(CharacterType*& src)
On 2014/03/21 15:14:39, Daniel Bratell wrote:
> On 2014/03/20 20:57:39, Julien Chaffraix - PST wrote:
> > This function should now have static linkage.
> 
> Hmm, not sure I understand. A function can have internal or external linkage.
> This has external linkage since it's part of a class.
> 
> I did make it a static method in the class in this patch though, mostly to
> prevent it from modifying the tokenizer state by accident which it used to do
> ("src" and "currentCharacter()" happened to be the same pointer so it
> "worked(tm)".)
> 
> I think it's doing fine as a static member function, but I can make it a file
> level internal function though if you think that matches the style better.

I was talking about file level internal function as you pointed out. All in all,
it's not really blocking and it works either way.

https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken...
File Source/core/css/CSSTokenizer.h (right):

https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken...
Source/core/css/CSSTokenizer.h:162: Vector<OwnPtr<UChar[]> > m_cssStrings16;
On 2014/03/21 15:14:39, Daniel Bratell wrote:
> On 2014/03/20 20:57:39, Julien Chaffraix - PST wrote:
> > Really not a huge fan of the Vector approach but we need to hold onto the
> > characters as the intermediate strings don't do it.
> > 
> > Would you add a comment to explain this optimization in the code and why
this
> is
> > needed? (if we revisit how we handle CSSParserString's, we could remove this
> > giant Vector).
> 
> Done. - Though I wouldn't call it giant. It will be 20 bytes and empty in most
> cases. For Ars Technica it will also contain a 1 character string for another
> ~20 bytes. Those 40 bytes should be compared to the 700 KB we used to allocate
> into m_currentCharacter16 before this patch.

I meant giant *hack* Vector but forgot one word. Ideally we shouldn't require
that but due to the way CSSParserString works, we have to retain the memory
until parsing is done :-(

https://codereview.chromium.org/196353018/diff/60001/Source/core/css/CSSToken...
File Source/core/css/CSSTokenizer-in.cpp (right):

https://codereview.chromium.org/196353018/diff/60001/Source/core/css/CSSToken...
Source/core/css/CSSTokenizer-in.cpp:540: parseEscape<SrcCharacterType>(src);
I really think this inner loop should be replaced by a call to
checkAndSkipString as it seems to do the exact same checks.

Daniel Bratell

I don't think it's a hack at all to have a string table. It's the ...

6 years, 8 months ago (2014-03-31 15:07:15 UTC) #13

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bratell@opera.com/196353018/100001

6 years, 8 months ago (2014-03-31 15:07:35 UTC) #15

Message was sent while issue was closed.

Change committed as 170451

Expand Messages | Collapse Messages