Description

Allocate stack item for fragment context elements once
Instead of allocating stack items for fragment context elements on
demand, do it once when creating the fragment context and re-use the
stack item during parsing. This avoids excessive calls to malloc when
parsing for innerHTML.
BUG=318711
Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=162047
Patch Set 1
Patch Set 2: Allocate stack item in fragment constructor
Total comments: 2
Patch Set 3: Get rid of PassRefPtr
Messages
Total messages: 18 (0 generated)
Eric, was it something like this you had in mind for avoiding the malloc just for namespace checks? I tried to verify that this did indeed affect http://jsperf.com/parse-html-type but didn't come up with any conclusive numbers; the difference seemed to be in the noise for me. But looking at the results from 'perf', and annotating time spent inside HTMLTreeBuilder::constructTree, this absolutely looks like an improvement.
Side-note: HTMLElementStack::hasOnlyOneElement looks like a candidate for inlining (now).
This is OK, but I think it would be better to just keep m_adjustedCurrentStackItemForFragment as a member, and (lazily?) allocate it only for fragments. Then this can return a reference (instead of a RefPtr!) and all of these allocs go away.
Thanks for writing this up. Sorry I was slow to review.
On 2013/11/14 01:49:43, eseidel wrote:
> This is OK, but I think it would be better to just keep
> m_adjustedCurrentStackItemForFragment as a member, and (lazily?) allocate it
> only for fragments.
>
> Then this can return a reference (instead of a RefPtr!) and all of these allocs
> go away.

That actually sounds like the first patch I wrote, but I worried that it wouldn't get rid of the malloc, just push it earlier. It does sound like the right solution, though. I'll update the patch.
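For reference, a rough sketch of the member-cached approach being discussed (illustrative only, not the committed patch; the exact constructor arguments and the ItemForContextElement tag are assumed here):

    // Sketch: build the HTMLStackItem for the fragment's context element once,
    // when the fragment parsing context is created, instead of calling
    // HTMLStackItem::create() for every namespace check during parsing.
    class FragmentParsingContext {
    public:
        FragmentParsingContext(DocumentFragment* fragment, Element* contextElement)
            : m_fragment(fragment)
            , m_contextElement(contextElement)
            // One allocation, up front, reused for the whole parse.
            , m_contextElementStackItem(HTMLStackItem::create(contextElement, HTMLStackItem::ItemForContextElement))
        {
        }

        // Non-owning accessor: callers only inspect the item, so there is no
        // per-call allocation or ref-count churn.
        HTMLStackItem* contextElementStackItem() const { return m_contextElementStackItem.get(); }

    private:
        DocumentFragment* m_fragment;
        Element* m_contextElement;
        RefPtr<HTMLStackItem> m_contextElementStackItem;
    };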
We could also just include a stack-allocated adjusted stack item? Or can't we just return m_stack[0]? Won't that always be the context element? Or I guess that's the <html>?
Keeping innerHTML down to the minimum number of mallocs is actually important. When we very first refactored the HTML parser for threading we added a bunch of mallocs to the null-case (innerHTML='') and it was a severe perf regression. These days we special case innerHTML='' to clear fast, and at one point we special cased innerHTML='text' or talked about it. innerHTML of just a couple tags can be quite common and will be dominated by all the rest of the parser overhead if we're not careful.
On 2013/11/14 06:46:20, eseidel wrote:
> [...] at one point we special cased innerHTML='text' or talked about it.

Indeed we special case it (and the empty string case). https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit...
We could do better than that, though. We could short-cut even earlier by scanning the innerHTML string for the characters '<' (tags), '&' (entities) and '\0' (which the parser would have to strip), and if none are present, just cut over to setInnerText/setTextContent instead.
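A rough sketch of that kind of short-cut (hypothetical helper names, simplified signatures; not existing Blink code):

    // If the markup contains no tags, no entities and no NUL characters, it is
    // plain text and can go through setTextContent() instead of the HTML parser.
    static bool isPlainTextMarkup(const String& markup)
    {
        for (unsigned i = 0; i < markup.length(); ++i) {
            UChar c = markup[i];
            if (c == '<' || c == '&' || c == '\0')
                return false;
        }
        return true;
    }

    void setInnerHTMLFast(Element& element, const String& markup)
    {
        if (isPlainTextMarkup(markup)) {
            // Skips the tokenizer and tree builder entirely.
            element.setTextContent(markup);
            return;
        }
        // Otherwise fall back to the regular fragment-parsing path.
        element.setInnerHTML(markup);
    }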
On 2013/11/14 06:44:47, eseidel wrote:
> We could also just include a stack-allocated adjusted stack item? Or can't we
> just return m_stack[0]? Won't that always be the context element? Or I guess
> that's the <html>?

Right, a new empty <html> seems to be what the spec says, even though we seem to have a slight variation on this: https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit... So not usable for the context element, I believe.
Yeah, that's one example of the crazy we had to add in order to keep innerHTML fast after refactoring the parser. :)
I've conducted some performance measurements comparing before/after this patch.

The short test is:

    <script>
    var testString = "<span>apa</span>";
    var elm = document.createElement('div');
    t = function() { elm.innerHTML = testString; };
    for (var i = 0; i < 10000; i++) { t(); }
    console.log('done');
    </script>

or the long test, which is identical except for the testString line:

    var testString = (new Array(500)).join("<span>apa</span>");

I measure using 'perf record -f $content_shell_binary --single-process <url-to-test>'

(Excuse the cropped lines...)

The top of the profile stack, before patch, long test:

    7.34%  content_shell-n  content_shell-no-patch  [.] WebCore::HTMLTokenizer::nextToken(WebCore:
    5.71%  content_shell-n  content_shell-no-patch  [.] WebCore::HTMLConstructionSite::executeQueu
    5.48%  content_shell-n  content_shell-no-patch  [.] WebCore::HTMLTreeBuilder::constructTree(We
    3.95%  content_shell-n  content_shell-no-patch  [.] tc_malloc
    3.87%  content_shell-n  content_shell-no-patch  [.] WebCore::HTMLTreeBuilder::processStartTagF
    2.67%  content_shell-n  content_shell-no-patch  [.] tc_free

The top of the profile stack, after patch, long test:

    7.43%  content_shell  content_shell  [.] WebCore::HTMLConstructionSite::executeQu
    7.10%  content_shell  content_shell  [.] WebCore::HTMLTokenizer::nextToken(WebCor
    4.01%  content_shell  content_shell  [.] WebCore::HTMLTreeBuilder::processStartTa
    3.01%  content_shell  content_shell  [.] WebCore::HTMLTreeBuilder::constructTree(
    2.83%  content_shell  content_shell  [.] WebCore::HTMLDocumentParser::constructTr
    2.65%  content_shell  content_shell  [.] WTF::HashTableAddResult<WTF::HashTableIt
    2.65%  content_shell  content_shell  [.] WebCore::ContainerNode::removeChildren()
    2.48%  content_shell  content_shell  [.] tc_malloc

Observation: less malloc pressure, constructTree down from 5.48% to 3.01%, and more of the time goes to executeQueuedTasks and nextToken.

The top of the profile stack, before patch, short test:

    6.79%  content_shell-n  content_shell-no-patch  [.] WebCore::HTMLTokenizer::nextToken(WebCore::
    5.88%  content_shell-n  content_shell-no-patch  [.] WebCore::HTMLTreeBuilder::constructTree(Web
    5.69%  content_shell-n  content_shell-no-patch  [.] WebCore::HTMLConstructionSite::executeQueue
    4.49%  content_shell-n  content_shell-no-patch  [.] tc_malloc
    3.94%  content_shell-n  content_shell-no-patch  [.] WebCore::HTMLTreeBuilder::processStartTagFo
    2.81%  content_shell-n  content_shell-no-patch  [.] tc_free
    2.68%  content_shell-n  content_shell-no-patch  [.] WebCore::ContainerNode::removeChildren()

The top of the profile stack, after patch, short test:

    7.24%  content_shell  content_shell  [.] WebCore::HTMLTokenizer::nextToken(WebCore
    7.08%  content_shell  content_shell  [.] WebCore::HTMLConstructionSite::executeQue
    4.42%  content_shell  content_shell  [.] WebCore::HTMLTreeBuilder::processStartTag
    2.84%  content_shell  content_shell  [.] WebCore::HTMLTreeBuilder::constructTree(W
    2.71%  content_shell  content_shell  [.] WTF::HashTableAddResult<WTF::HashTableIte
    2.47%  content_shell  content_shell  [.] WebCore::ContainerNode::removeChildren()
    2.47%  content_shell  content_shell  [.] WebCore::HTMLDocumentParser::constructTre
    2.26%  content_shell  content_shell  [.] tc_malloc

Observations: an even more dramatic drop for malloc and constructTree; otherwise in line with the results above.

(Feedback on the testing procedure appreciated. There are probably many with much more experience than me doing these sorts of tests.)
lgtm

Definitely better.

https://codereview.chromium.org/68893014/diff/100001/Source/core/html/parser/...
File Source/core/html/parser/HTMLTreeBuilder.cpp (left):

https://codereview.chromium.org/68893014/diff/100001/Source/core/html/parser/...
Source/core/html/parser/HTMLTreeBuilder.cpp:343: , m_contextElement(0)
Oh, I love it.

https://codereview.chromium.org/68893014/diff/100001/Source/core/html/parser/...
File Source/core/html/parser/HTMLTreeBuilder.h (right):

https://codereview.chromium.org/68893014/diff/100001/Source/core/html/parser/...
Source/core/html/parser/HTMLTreeBuilder.h:205: PassRefPtr<HTMLStackItem> contextElementStackItem() const { ASSERT(m_fragment); return m_contextElementStackItem; }
I think this can just be HTMLStackItem*; PassRefPtr is for when you expect the caller to take ownership. In this case, you expect contextElementStackItem() to live exactly as long as HTMLTreeBuilder does, hence the raw ptr.
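A minimal before/after illustration of that ownership point (simplified; roughly what "Get rid of PassRefPtr" in patch set 3 amounts to):

    // Before: PassRefPtr implies the caller takes shared ownership and adds
    // ref-count churn on every call.
    PassRefPtr<HTMLStackItem> contextElementStackItem() const { ASSERT(m_fragment); return m_contextElementStackItem; }

    // After: the item is owned by the tree builder and outlives every caller,
    // so a raw pointer (or a reference) is the idiomatic return type.
    HTMLStackItem* contextElementStackItem() const { ASSERT(m_fragment); return m_contextElementStackItem.get(); }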
Testing procedure sgtm. You invented your own microbenchmark, because you're fixing something very microbenchmarkable -- sounds like a great strategy. I would expect nextToken to be top of the stack, but as much as possible we should get rid of all the mallocs on that stack. Generally when doing this sort of work I count any malloc at 3x its reported cost. 1x for the malloc call itself, 1x for the free and ~1x for the pressure of having the malloc'd memory lying around. Total rough estimate, but helps me remember to focus on removing mallocs first when looking at profiles like this.
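As a back-of-the-envelope illustration of that heuristic against the before-patch long-test profile above: tc_malloc shows 3.95% and tc_free 2.67%, so allocation already accounts for roughly 6.6% of samples directly; adding about another malloc's worth for memory pressure, per the 3x rule of thumb, puts the estimated total cost of allocation somewhere around 10% of the profile.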
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/davve@opera.com/68893014/210001
On 2013/11/14 15:39:41, eseidel wrote:
> lgtm
>
> Definitely better.

Thanks for your help! It has definitely been a learning experience touching innerHTML and the HTML parser.
Message was sent while issue was closed.
Change committed as 162047