|
|
DescriptionImplement StringImpl sharing for StringView::toString().
Prior to https://codereview.chromium.org/2007103003 StringBuilder had an
optimization that would share the StringImpl when calling append() with
a String the first time. This meant that doing element.textContent on
an Element with a single Text node child wouldn't need to string copy.
The optimization also helps the parser, removing a string copy when
flushing text in the HTML parser (flushPendingText) when there's a
single text that was appended, and the text is a reasonable length
(<= 65536 bytes).
This optimization further helps any code that would potentially append
an AtomicString to a StringBuilder and then do toAtomicString()
or toString() by ensuring we return the same AtomicString again
avoiding a bunch of work, and allowing the atomic string optimization
to propagate to the result.
This patch restores the StringBuilder optimization by adding a
StringImpl* member to StringView and allowing toString() and
toAtomicString() to return that member when the offset is zero and the
length is the same as the input string.
This patch also collapses the m_is8Bit flag into the m_impl by just
reusing the field to know if the StringView is 8bit. We now store the
shared empty 8bit or 16bit strings in the impl field for views created
from sources that are not backed by StringImpl. Doing this keeps the
StringView class to be only 2 ptrs and an unsigned wide.
I also further simplified the code by using the union members directly
and by adding a void* bytes() getter to StringImpl which removed the
branch on is8Bit for the common case of constructing a StringView from
a String or StringImpl with no offset, and the full length.
BUG=615174
Committed: https://crrev.com/580cbf5d215fee758dd5c447cf03885b07bc9d2e
Cr-Commit-Position: refs/heads/master@{#397986}
Patch Set 1 #Patch Set 2 : add getptr. #Patch Set 3 : Fix preload scanner bug. #
Total comments: 11
Messages
Total messages: 28 (13 generated)
The CQ bit was checked by esprehn@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2033293002/1 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/2033293002/1
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by esprehn@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2033293002/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/2033293002/20001
Description was changed from ========== Implement StringImpl sharing for StringView::toString(). Prior to https://codereview.chromium.org/2007103003 StringBuilder had an optimization that would share the StringImpl when calling append() with a String the first time. This meant that doing element.textContent on an Element with a single Text node child wouldn't need to string copy. The optimization also helps the parser, removing a string copy when flushing text in the HTML parser (flushPendingText) when there's a single text that was appended, and the text is a reasonable length (<= 65536 bytes). This optimization further helps any code that would potentially do append an AtomicString to a StringBuilder and then do toAtomicString() or toString() by ensuring we return the same AtomicString again avoiding a bunch of work, and allowing the atomic string optimization to propagate to the result. This patch restores the StringBuilder optimization by adding a StringImpl* member to StringView and allowing toString() and toAtomicString() to return that member when the offset is zero and the length is the same as the input string. This patch also collapses the m_is8Bit flag into the m_impl by just reusing the field to know if the StringView is 8bit. We now store the shared empty 8bit or 16bit strings in the impl field for views created from sources that are not backed by StringImpl. Doing this keeps the StringView class to be only 2 ptrs and an unsigned wide. I also further simplified the code by using the union members directly and by adding a void* bytes() getter to StringImpl removed the branch on is8Bit for the common case of constructing a StringView from a String or StringImpl with no offset, and the full length. This patch is a 10% improvement on the Parser/textarea-parsing.html PerformanceTest on my Retina Macbook Pro. BUG=615174 ========== to ========== Implement StringImpl sharing for StringView::toString(). Prior to https://codereview.chromium.org/2007103003 StringBuilder had an optimization that would share the StringImpl when calling append() with a String the first time. This meant that doing element.textContent on an Element with a single Text node child wouldn't need to string copy. The optimization also helps the parser, removing a string copy when flushing text in the HTML parser (flushPendingText) when there's a single text that was appended, and the text is a reasonable length (<= 65536 bytes). This optimization further helps any code that would potentially do an append an AtomicString to a StringBuilder and then do toAtomicString() or toString() by ensuring we return the same AtomicString again avoiding a bunch of work, and allowing the atomic string optimization to propagate to the result. This patch restores the StringBuilder optimization by adding a StringImpl* member to StringView and allowing toString() and toAtomicString() to return that member when the offset is zero and the length is the same as the input string. This patch also collapses the m_is8Bit flag into the m_impl by just reusing the field to know if the StringView is 8bit. We now store the shared empty 8bit or 16bit strings in the impl field for views created from sources that are not backed by StringImpl. Doing this keeps the StringView class to be only 2 ptrs and an unsigned wide. I also further simplified the code by using the union members directly and by adding a void* bytes() getter to StringImpl removed the branch on is8Bit for the common case of constructing a StringView from a String or StringImpl with no offset, and the full length. This patch is a 10% improvement on the Parser/textarea-parsing.html PerformanceTest on my Retina Macbook Pro. BUG=615174 ==========
Description was changed from ========== Implement StringImpl sharing for StringView::toString(). Prior to https://codereview.chromium.org/2007103003 StringBuilder had an optimization that would share the StringImpl when calling append() with a String the first time. This meant that doing element.textContent on an Element with a single Text node child wouldn't need to string copy. The optimization also helps the parser, removing a string copy when flushing text in the HTML parser (flushPendingText) when there's a single text that was appended, and the text is a reasonable length (<= 65536 bytes). This optimization further helps any code that would potentially do an append an AtomicString to a StringBuilder and then do toAtomicString() or toString() by ensuring we return the same AtomicString again avoiding a bunch of work, and allowing the atomic string optimization to propagate to the result. This patch restores the StringBuilder optimization by adding a StringImpl* member to StringView and allowing toString() and toAtomicString() to return that member when the offset is zero and the length is the same as the input string. This patch also collapses the m_is8Bit flag into the m_impl by just reusing the field to know if the StringView is 8bit. We now store the shared empty 8bit or 16bit strings in the impl field for views created from sources that are not backed by StringImpl. Doing this keeps the StringView class to be only 2 ptrs and an unsigned wide. I also further simplified the code by using the union members directly and by adding a void* bytes() getter to StringImpl removed the branch on is8Bit for the common case of constructing a StringView from a String or StringImpl with no offset, and the full length. This patch is a 10% improvement on the Parser/textarea-parsing.html PerformanceTest on my Retina Macbook Pro. BUG=615174 ========== to ========== Implement StringImpl sharing for StringView::toString(). Prior to https://codereview.chromium.org/2007103003 StringBuilder had an optimization that would share the StringImpl when calling append() with a String the first time. This meant that doing element.textContent on an Element with a single Text node child wouldn't need to string copy. The optimization also helps the parser, removing a string copy when flushing text in the HTML parser (flushPendingText) when there's a single text that was appended, and the text is a reasonable length (<= 65536 bytes). This optimization further helps any code that would potentially do an append an AtomicString to a StringBuilder and then do toAtomicString() or toString() by ensuring we return the same AtomicString again avoiding a bunch of work, and allowing the atomic string optimization to propagate to the result. This patch restores the StringBuilder optimization by adding a StringImpl* member to StringView and allowing toString() and toAtomicString() to return that member when the offset is zero and the length is the same as the input string. This patch also collapses the m_is8Bit flag into the m_impl by just reusing the field to know if the StringView is 8bit. We now store the shared empty 8bit or 16bit strings in the impl field for views created from sources that are not backed by StringImpl. Doing this keeps the StringView class to be only 2 ptrs and an unsigned wide. I also further simplified the code by using the union members directly and by adding a void* bytes() getter to StringImpl which removed the branch on is8Bit for the common case of constructing a StringView from a String or StringImpl with no offset, and the full length. This patch is a 10% improvement on the Parser/textarea-parsing.html PerformanceTest on my Retina Macbook Pro. BUG=615174 ==========
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by esprehn@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2033293002/40001
https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... File third_party/WebKit/Source/core/html/parser/HTMLPreloadScanner.cpp (right): https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/core/html/parser/HTMLPreloadScanner.cpp:199: // ref to a string we're going to send will fail isSafeToSendToAnotherThread(). This was a bug that already existed, but StringView::toString() used to always vend a new string, even when the m_length was the same as the full string which was hiding this bug by creating a new string inside toString(). All the other stuff that the preload scanner stores already do this, ex. PreloadRequest makes an isolatedCopy() of any String it might send.
Description was changed from ========== Implement StringImpl sharing for StringView::toString(). Prior to https://codereview.chromium.org/2007103003 StringBuilder had an optimization that would share the StringImpl when calling append() with a String the first time. This meant that doing element.textContent on an Element with a single Text node child wouldn't need to string copy. The optimization also helps the parser, removing a string copy when flushing text in the HTML parser (flushPendingText) when there's a single text that was appended, and the text is a reasonable length (<= 65536 bytes). This optimization further helps any code that would potentially do an append an AtomicString to a StringBuilder and then do toAtomicString() or toString() by ensuring we return the same AtomicString again avoiding a bunch of work, and allowing the atomic string optimization to propagate to the result. This patch restores the StringBuilder optimization by adding a StringImpl* member to StringView and allowing toString() and toAtomicString() to return that member when the offset is zero and the length is the same as the input string. This patch also collapses the m_is8Bit flag into the m_impl by just reusing the field to know if the StringView is 8bit. We now store the shared empty 8bit or 16bit strings in the impl field for views created from sources that are not backed by StringImpl. Doing this keeps the StringView class to be only 2 ptrs and an unsigned wide. I also further simplified the code by using the union members directly and by adding a void* bytes() getter to StringImpl which removed the branch on is8Bit for the common case of constructing a StringView from a String or StringImpl with no offset, and the full length. This patch is a 10% improvement on the Parser/textarea-parsing.html PerformanceTest on my Retina Macbook Pro. BUG=615174 ========== to ========== Implement StringImpl sharing for StringView::toString(). Prior to https://codereview.chromium.org/2007103003 StringBuilder had an optimization that would share the StringImpl when calling append() with a String the first time. This meant that doing element.textContent on an Element with a single Text node child wouldn't need to string copy. The optimization also helps the parser, removing a string copy when flushing text in the HTML parser (flushPendingText) when there's a single text that was appended, and the text is a reasonable length (<= 65536 bytes). This optimization further helps any code that would potentially do an append an AtomicString to a StringBuilder and then do toAtomicString() or toString() by ensuring we return the same AtomicString again avoiding a bunch of work, and allowing the atomic string optimization to propagate to the result. This patch restores the StringBuilder optimization by adding a StringImpl* member to StringView and allowing toString() and toAtomicString() to return that member when the offset is zero and the length is the same as the input string. This patch also collapses the m_is8Bit flag into the m_impl by just reusing the field to know if the StringView is 8bit. We now store the shared empty 8bit or 16bit strings in the impl field for views created from sources that are not backed by StringImpl. Doing this keeps the StringView class to be only 2 ptrs and an unsigned wide. I also further simplified the code by using the union members directly and by adding a void* bytes() getter to StringImpl which removed the branch on is8Bit for the common case of constructing a StringView from a String or StringImpl with no offset, and the full length. BUG=615174 ==========
esprehn@chromium.org changed reviewers: + haraken@chromium.org, jyasskin@chromium.org, yutak@chromium.org
Description was changed from ========== Implement StringImpl sharing for StringView::toString(). Prior to https://codereview.chromium.org/2007103003 StringBuilder had an optimization that would share the StringImpl when calling append() with a String the first time. This meant that doing element.textContent on an Element with a single Text node child wouldn't need to string copy. The optimization also helps the parser, removing a string copy when flushing text in the HTML parser (flushPendingText) when there's a single text that was appended, and the text is a reasonable length (<= 65536 bytes). This optimization further helps any code that would potentially do an append an AtomicString to a StringBuilder and then do toAtomicString() or toString() by ensuring we return the same AtomicString again avoiding a bunch of work, and allowing the atomic string optimization to propagate to the result. This patch restores the StringBuilder optimization by adding a StringImpl* member to StringView and allowing toString() and toAtomicString() to return that member when the offset is zero and the length is the same as the input string. This patch also collapses the m_is8Bit flag into the m_impl by just reusing the field to know if the StringView is 8bit. We now store the shared empty 8bit or 16bit strings in the impl field for views created from sources that are not backed by StringImpl. Doing this keeps the StringView class to be only 2 ptrs and an unsigned wide. I also further simplified the code by using the union members directly and by adding a void* bytes() getter to StringImpl which removed the branch on is8Bit for the common case of constructing a StringView from a String or StringImpl with no offset, and the full length. BUG=615174 ========== to ========== Implement StringImpl sharing for StringView::toString(). Prior to https://codereview.chromium.org/2007103003 StringBuilder had an optimization that would share the StringImpl when calling append() with a String the first time. This meant that doing element.textContent on an Element with a single Text node child wouldn't need to string copy. The optimization also helps the parser, removing a string copy when flushing text in the HTML parser (flushPendingText) when there's a single text that was appended, and the text is a reasonable length (<= 65536 bytes). This optimization further helps any code that would potentially append an AtomicString to a StringBuilder and then do toAtomicString() or toString() by ensuring we return the same AtomicString again avoiding a bunch of work, and allowing the atomic string optimization to propagate to the result. This patch restores the StringBuilder optimization by adding a StringImpl* member to StringView and allowing toString() and toAtomicString() to return that member when the offset is zero and the length is the same as the input string. This patch also collapses the m_is8Bit flag into the m_impl by just reusing the field to know if the StringView is 8bit. We now store the shared empty 8bit or 16bit strings in the impl field for views created from sources that are not backed by StringImpl. Doing this keeps the StringView class to be only 2 ptrs and an unsigned wide. I also further simplified the code by using the union members directly and by adding a void* bytes() getter to StringImpl which removed the branch on is8Bit for the common case of constructing a StringView from a String or StringImpl with no offset, and the full length. BUG=615174 ==========
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... File third_party/WebKit/Source/wtf/text/StringBuilder.h (right): https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/wtf/text/StringBuilder.h:103: m_string = impl; Don't we need to set m_buffer to impl? https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... File third_party/WebKit/Source/wtf/text/StringView.h (right): https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/wtf/text/StringView.h:126: if (m_impl->bytes() == bytes() && m_length == m_impl->length()) Would you help me understand why do you need to check both bytes() and length()? Wouldn't it be enough to check bytes()? https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/wtf/text/StringView.h:143: StringImpl* m_impl; Who guarantees that someone else keeps a reference to the StringImpl while StringView is used? I guess it's a little bit fragile assumption. https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/wtf/text/StringView.h:148: const void* m_bytes; Worth adding a comment that how each of the fields are used.
https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... File third_party/WebKit/Source/wtf/text/StringBuilder.h (right): https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/wtf/text/StringBuilder.h:103: m_string = impl; On 2016/06/06 at 02:28:52, haraken wrote: > Don't we need to set m_buffer to impl? Nope, m_buffer is for when we have an overallocated string created with the allocate uninitialized method. Since the string we're adding here is coming from outside the builder its immutable and we can't set m_buffer, and it's also correctly sized already, so there's no extra room. https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... File third_party/WebKit/Source/wtf/text/StringView.h (right): https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/wtf/text/StringView.h:126: if (m_impl->bytes() == bytes() && m_length == m_impl->length()) On 2016/06/06 at 02:28:52, haraken wrote: > Would you help me understand why do you need to check both bytes() and length()? Wouldn't it be enough to check bytes()? A StringView can be a shorter length than the underlying string, for example: String s("xyz"); StringView(s, 0, 1) == "x"; // offset of 0, length is limited to 1. So for the StringImpl to be identical to this StringView it needs to be the same length, and have the same offset. https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/wtf/text/StringView.h:143: StringImpl* m_impl; On 2016/06/06 at 02:28:52, haraken wrote: > Who guarantees that someone else keeps a reference to the StringImpl while StringView is used? I guess it's a little bit fragile assumption. The caller does, this is the same as base::StringPiece. Early on I had this own m_impl, but Jeffery didn't like it because sometimes it has ownership and sometimes it doesn't. ex. If we take a ref we have these semantics: char* buffer = allocateBuffer(); StringView view1(buffer, length); // doesn't own. String s("x"); StringView view2(s); // owns. Jeffery wanted it to either always own, or never own, which meant either always allocating a StringImpl for literal buffers (too slow), or never taking ownership like base::StringPiece. Taking ownership also means ref churn which CSSParserString was trying to avoid by also not taking ownership. StringView is meant to be really cheap, so we can replace lots of usage of (LChar*, length) and String arguments all over the engine with it so passing a literal string like "foo" or passing a String object are both cheap. https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/wtf/text/StringView.h:148: const void* m_bytes; On 2016/06/06 at 02:28:52, haraken wrote: > Worth adding a comment that how each of the fields are used. This is the same as String's semantics. In the next patch I was thinking of simplifying this down to just m_bytes and removing the union and just having the reinterpret_cast directly inside characters8() and characters16(). I think that's more clear.
LGTM https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... File third_party/WebKit/Source/wtf/text/StringView.h (right): https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/wtf/text/StringView.h:143: StringImpl* m_impl; On 2016/06/06 04:42:21, esprehn wrote: > On 2016/06/06 at 02:28:52, haraken wrote: > > Who guarantees that someone else keeps a reference to the StringImpl while > StringView is used? I guess it's a little bit fragile assumption. > > The caller does, this is the same as base::StringPiece. Early on I had this own > m_impl, but Jeffery didn't like it because sometimes it has ownership and > sometimes it doesn't. > > ex. If we take a ref we have these semantics: > > char* buffer = allocateBuffer(); > StringView view1(buffer, length); // doesn't own. > > String s("x"); > StringView view2(s); // owns. > > Jeffery wanted it to either always own, or never own, which meant either always > allocating a StringImpl for literal buffers (too slow), or never taking > ownership like base::StringPiece. > > Taking ownership also means ref churn which CSSParserString was trying to avoid > by also not taking ownership. StringView is meant to be really cheap, so we can > replace lots of usage of (LChar*, length) and String arguments all over the > engine with it so passing a literal string like "foo" or passing a String object > are both cheap. Thanks for the details. Makes sense. The assertion you added to ~StringView() would be enough to catch ownership errors in common cases :) https://codereview.chromium.org/2033293002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/wtf/text/StringView.h:148: const void* m_bytes; On 2016/06/06 04:42:21, esprehn wrote: > On 2016/06/06 at 02:28:52, haraken wrote: > > Worth adding a comment that how each of the fields are used. > > This is the same as String's semantics. In the next patch I was thinking of > simplifying this down to just m_bytes and removing the union and just having the > reinterpret_cast directly inside characters8() and characters16(). I think > that's more clear. That sounds great.
The CQ bit was checked by esprehn@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2033293002/40001
lgtm
Message was sent while issue was closed.
Committed patchset #3 (id:40001)
Message was sent while issue was closed.
Description was changed from ========== Implement StringImpl sharing for StringView::toString(). Prior to https://codereview.chromium.org/2007103003 StringBuilder had an optimization that would share the StringImpl when calling append() with a String the first time. This meant that doing element.textContent on an Element with a single Text node child wouldn't need to string copy. The optimization also helps the parser, removing a string copy when flushing text in the HTML parser (flushPendingText) when there's a single text that was appended, and the text is a reasonable length (<= 65536 bytes). This optimization further helps any code that would potentially append an AtomicString to a StringBuilder and then do toAtomicString() or toString() by ensuring we return the same AtomicString again avoiding a bunch of work, and allowing the atomic string optimization to propagate to the result. This patch restores the StringBuilder optimization by adding a StringImpl* member to StringView and allowing toString() and toAtomicString() to return that member when the offset is zero and the length is the same as the input string. This patch also collapses the m_is8Bit flag into the m_impl by just reusing the field to know if the StringView is 8bit. We now store the shared empty 8bit or 16bit strings in the impl field for views created from sources that are not backed by StringImpl. Doing this keeps the StringView class to be only 2 ptrs and an unsigned wide. I also further simplified the code by using the union members directly and by adding a void* bytes() getter to StringImpl which removed the branch on is8Bit for the common case of constructing a StringView from a String or StringImpl with no offset, and the full length. BUG=615174 ========== to ========== Implement StringImpl sharing for StringView::toString(). Prior to https://codereview.chromium.org/2007103003 StringBuilder had an optimization that would share the StringImpl when calling append() with a String the first time. This meant that doing element.textContent on an Element with a single Text node child wouldn't need to string copy. The optimization also helps the parser, removing a string copy when flushing text in the HTML parser (flushPendingText) when there's a single text that was appended, and the text is a reasonable length (<= 65536 bytes). This optimization further helps any code that would potentially append an AtomicString to a StringBuilder and then do toAtomicString() or toString() by ensuring we return the same AtomicString again avoiding a bunch of work, and allowing the atomic string optimization to propagate to the result. This patch restores the StringBuilder optimization by adding a StringImpl* member to StringView and allowing toString() and toAtomicString() to return that member when the offset is zero and the length is the same as the input string. This patch also collapses the m_is8Bit flag into the m_impl by just reusing the field to know if the StringView is 8bit. We now store the shared empty 8bit or 16bit strings in the impl field for views created from sources that are not backed by StringImpl. Doing this keeps the StringView class to be only 2 ptrs and an unsigned wide. I also further simplified the code by using the union members directly and by adding a void* bytes() getter to StringImpl which removed the branch on is8Bit for the common case of constructing a StringView from a String or StringImpl with no offset, and the full length. BUG=615174 Committed: https://crrev.com/580cbf5d215fee758dd5c447cf03885b07bc9d2e Cr-Commit-Position: refs/heads/master@{#397986} ==========
Message was sent while issue was closed.
Patchset 3 (id:??) landed as https://crrev.com/580cbf5d215fee758dd5c447cf03885b07bc9d2e Cr-Commit-Position: refs/heads/master@{#397986} |