|
|
Chromium Code Reviews|
Created:
4 years, 2 months ago by Charlie Harrison Modified:
4 years, 1 month ago CC:
blink-reviews, chromium-reviews Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
Description[KURL] Avoid re-hashing input if is already canonicalized
KURL will re-hash the raw canonicalized output and check the
AtomicStringTable (addWithTranslator) for the string. This can be very
expensive for large URLs.
However, since many URLs are generated from existing AtomicStrings
(which already have their hashes computed), we can use a fast path
and just return the input if it was already canonicalized and hashed.
This optimization shaves 25% off the overhead of loading 1mb data
url images (if the data url is properly canonicalized). Data from
profiling results on Linux.
BUG=348655, 657978
Committed: https://crrev.com/495bfff108b8c1b2f67bfbd0a7fffceb436b31f8
Cr-Commit-Position: refs/heads/master@{#430166}
Patch Set 1 #Patch Set 2 : comment + AtomicString #
Total comments: 2
Patch Set 3 : fix comment typo (trybots prev) #
Total comments: 2
Patch Set 4 : fix threaded parser, random refactoring #Patch Set 5 : lets just go wild and see what happens (remove isUnusedPreload() check) #Patch Set 6 : uh ignore PS 5 I have no idea how that got there :P #
Total comments: 4
Patch Set 7 : Major tweaks, CheckedNumeric #Patch Set 8 : null check :P #
Total comments: 6
Patch Set 9 : esprehn review: use clampTo and tweak comment #Patch Set 10 : remove <limits> #
Total comments: 2
Messages
Total messages: 49 (32 generated)
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_arm64_dbg_recipe on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_arm6...) android_clang_dbg_recipe on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_clan...) android_compile_dbg on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_comp...) android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_...) cast_shell_android on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/cast_shell_a...) linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...) cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...) chromeos_amd64-generic_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_amd64-...) chromeos_daisy_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_daisy_...) chromeos_x86-generic_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_x86-ge...) chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...) linux_chromium_asan_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) linux_chromium_chromeos_compile_dbg_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) linux_chromium_clobber_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) linux_chromium_compile_dbg_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) mac_chromium_compile_dbg_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...) mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...) win_chromium_compile_dbg_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_comp...) win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...) win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...) win_clang on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_clang/builds/...)
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
csharrison@chromium.org changed reviewers: + haraken@chromium.org
haraken, WDYT? BTW the benchmark came from loading 100 XHRs with data URLs. There may be slight differences for e.g. image loading that I am looking into.
haraken@chromium.org changed reviewers: + esprehn@chromium.org
I think Elliott is the best reviewer for String stuff :)
https://codereview.chromium.org/2428093002/diff/20001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/weborigin/KURL.cpp (right): https://codereview.chromium.org/2428093002/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/weborigin/KURL.cpp:764: // Note: because we are canonicalization into narrow characters, this check we are canonicalizing
Thanks! BTW I did an updated benchmark using:
for (i < 100) {
var img = new Image();
img.src = <1mb long data url>
document.append(img);
}
and saw similar numbers: 25-30% of the workload is spent in
AtomicString::fromUTF8. Profiler says the majority of time there is spent in
WTF::Unicode::calculateStringHashAndLengthFromUTF8MaskingTop8Bits.
https://codereview.chromium.org/2428093002/diff/20001/third_party/WebKit/Sour...
File third_party/WebKit/Source/platform/weborigin/KURL.cpp (right):
https://codereview.chromium.org/2428093002/diff/20001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/weborigin/KURL.cpp:764: // Note: because we
are canonicalization into narrow characters, this check
On 2016/10/18 19:30:37, haraken wrote:
>
> we are canonicalizing
Done.
bot failures look real, I'm looking into it.
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Ok tests are fixed, but the failures brought about how this change is a bit problematic. Namely, we pass in a const String, and KURL secretly adds its StringImpl to the Atomic table and marks it Atomic. I'm guessing underlying KURL strings should be atomic because we do tons of comparisons with them. This causes issues if we construct KURLs from Strings that want to stay thread safe, and is why the BG parser had to pass copies of href attributes to construct predicted base URLs . It resulted in the tokens being unsafe to pass to another thread. I'd like some advice on what's the best solution here. I think there a few options: - Keep something like PS 4, which just silently changes Strings to be Atomic (we can probably add documentation). - Add another KURL constructor which is thread safe, and will make the underlying m_string either non-atomic, or just avoid this optimization entirely. - Just make the underlying m_string non-atomic for all cases which hit this optimization. - Abandon this patch. Due to the big wins for data URLs I'd like to avoid this option if possible :P What do people think? How important is it for KURLs to be comparable using the AtomicString table?
Description was changed from ========== [KURL] Avoid re-allocating buffers if input is already canonicalized. Rather than directly allocating an AtomicString with the canonicalized output, we memcmp that the output is the same as the input. If it is, we don't need to make another allocation. For a workload of loading a sequence of large (~1mb) base64 encoded data URLs, this patch reduces CPU load by 25% (via Linux profiler). Note that for URLs where canonicalization *does* do something, this patch adds overhead of scanning the buffers with memcmp. BUG=348655 ========== to ========== [KURL] Avoid re-allocating buffers if input is already canonicalized. Rather than directly allocating an AtomicString with the canonicalized output, we memcmp that the output is the same as the input. If it is, we don't need to make another allocation. For a workload of loading a sequence of large (~1mb) base64 encoded data URLs, this patch reduces CPU load by 25% (via Linux profiler). Note that for URLs where canonicalization *does* do something, this patch adds overhead of scanning the buffers with memcmp. BUG=348655,657978 ==========
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2428093002/diff/40001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/weborigin/KURL.cpp (left): https://codereview.chromium.org/2428093002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/weborigin/KURL.cpp:724: init(base, relative.characters16(), relative.length(), queryEncoding); We go down this path when relative.isNull() which is strange. https://codereview.chromium.org/2428093002/diff/100001/third_party/WebKit/Sou... File third_party/WebKit/Source/platform/weborigin/KURL.cpp (right): https://codereview.chromium.org/2428093002/diff/100001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/weborigin/KURL.cpp:723: int relativeLength = static_cast<int>(relativeUTF8.length()); This is scary, our strings can be unsigned in length, but net URL is using int? I think this means you can overflow this thing. I think you'd want to clamp? https://codereview.chromium.org/2428093002/diff/100001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/weborigin/KURL.cpp:730: if (!relative.isEmpty() && relative.containsOnlyASCII() && Why does containsOnlyASCII() matter here? I don't think there's any situation where the length is the same and the latin1 representation and the utf8 representation mean different things.
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Description was changed from ========== [KURL] Avoid re-allocating buffers if input is already canonicalized. Rather than directly allocating an AtomicString with the canonicalized output, we memcmp that the output is the same as the input. If it is, we don't need to make another allocation. For a workload of loading a sequence of large (~1mb) base64 encoded data URLs, this patch reduces CPU load by 25% (via Linux profiler). Note that for URLs where canonicalization *does* do something, this patch adds overhead of scanning the buffers with memcmp. BUG=348655,657978 ========== to ========== [KURL] Avoid re-hashing input if is already canonicalized KURL will re-hash the raw canonicalized output and check the AtomicStringTable (addWithTranslator) for the string. This can be very expensive for large URLs. However, since many URLs are generated from existing AtomicStrings (which already have their hashes computed), we can use a fast path and just return the input if it was already canonicalized and hashed. BUG=348655,657978 ==========
Thanks a lot for the review. I've updated the CL to reflect an offline conversion: - CL description updated to reflect what is *really* being optimized. This actually doesn't have anything to do with allocations, I was mistaken :P - I've edited the PS to be simpler per your suggestion here: https://codereview.chromium.org/2450083002 I've also used CheckedNumeric to do clamping. Please let me know if there's a more idiomatic way. Also let me know if you disagree with checking if the impl is Atomic. https://codereview.chromium.org/2428093002/diff/40001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/weborigin/KURL.cpp (left): https://codereview.chromium.org/2428093002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/weborigin/KURL.cpp:724: init(base, relative.characters16(), relative.length(), queryEncoding); On 2016/10/25 22:23:50, esprehn wrote: > We go down this path when relative.isNull() which is strange. The null check is just for the is8Bit check which reaches into m_impl. Other than that the code is robust to nulls. https://codereview.chromium.org/2428093002/diff/100001/third_party/WebKit/Sou... File third_party/WebKit/Source/platform/weborigin/KURL.cpp (right): https://codereview.chromium.org/2428093002/diff/100001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/weborigin/KURL.cpp:723: int relativeLength = static_cast<int>(relativeUTF8.length()); On 2016/10/25 22:23:50, esprehn wrote: > This is scary, our strings can be unsigned in length, but net URL is using int? > I think this means you can overflow this thing. I think you'd want to clamp? Done. https://codereview.chromium.org/2428093002/diff/100001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/weborigin/KURL.cpp:730: if (!relative.isEmpty() && relative.containsOnlyASCII() && On 2016/10/25 22:23:50, esprehn wrote: > Why does containsOnlyASCII() matter here? I don't think there's any situation > where the length is the same and the latin1 representation and the utf8 > representation mean different things. You're right. removed.
Description was changed from ========== [KURL] Avoid re-hashing input if is already canonicalized KURL will re-hash the raw canonicalized output and check the AtomicStringTable (addWithTranslator) for the string. This can be very expensive for large URLs. However, since many URLs are generated from existing AtomicStrings (which already have their hashes computed), we can use a fast path and just return the input if it was already canonicalized and hashed. BUG=348655,657978 ========== to ========== [KURL] Avoid re-hashing input if is already canonicalized KURL will re-hash the raw canonicalized output and check the AtomicStringTable (addWithTranslator) for the string. This can be very expensive for large URLs. However, since many URLs are generated from existing AtomicStrings (which already have their hashes computed), we can use a fast path and just return the input if it was already canonicalized and hashed. This optimization shaves 25% off the overhead of loading 1mb data url images (if the data url is properly canonicalized). Data from profiling results on Linux. BUG=348655,657978 ==========
esprehn, friendly ping?
The CheckedNumeric looks weird, I think you want clampTo? otherwise lgtm https://codereview.chromium.org/2428093002/diff/140001/third_party/WebKit/Sou... File third_party/WebKit/Source/platform/weborigin/KURL.cpp (right): https://codereview.chromium.org/2428093002/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/weborigin/KURL.cpp:742: CheckedNumeric<int> relativeLength( I think you want to use clampTo() in here instead of CheckedNumeric? https://codereview.chromium.org/2428093002/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/weborigin/KURL.cpp:749: CheckedNumeric<int> relativeLength( ditto https://codereview.chromium.org/2428093002/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/weborigin/KURL.cpp:767: // the string thread unsafe. fwiw I don't think the thread safety is an issue, since KURL itself is not thread safe since it uses AtomicString.
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
thanks!! clampTo makes this code way nicer. https://codereview.chromium.org/2428093002/diff/140001/third_party/WebKit/Sou... File third_party/WebKit/Source/platform/weborigin/KURL.cpp (right): https://codereview.chromium.org/2428093002/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/weborigin/KURL.cpp:742: CheckedNumeric<int> relativeLength( On 2016/11/04 22:55:59, esprehn wrote: > I think you want to use clampTo() in here instead of CheckedNumeric? Aha, thanks. Done. https://codereview.chromium.org/2428093002/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/weborigin/KURL.cpp:749: CheckedNumeric<int> relativeLength( On 2016/11/04 22:55:59, esprehn wrote: > ditto Done. https://codereview.chromium.org/2428093002/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/weborigin/KURL.cpp:767: // the string thread unsafe. On 2016/11/04 22:55:59, esprehn wrote: > fwiw I don't think the thread safety is an issue, since KURL itself is not > thread safe since it uses AtomicString. The KURL is not thread safe, but the String passed as input could be (and needs to be in some cases when the html parser is off-thread). If the String is not Atomic, the code below would do something like `m_string = AtomicString(relative)`, which would mark the underlying impl as atomic, making the input no longer thread safe. I tried to edit the comment to be a bit clearer here.
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
yay, still lgtm https://codereview.chromium.org/2428093002/diff/180001/third_party/WebKit/Sou... File third_party/WebKit/Source/platform/weborigin/KURL.cpp (right): https://codereview.chromium.org/2428093002/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/weborigin/KURL.cpp:765: m_string = relative; btw why does this compile? explicit AtomicString(const String& s) I thought you had to be explicit and write AtomicString(relative). C++ is so confusing. :/
https://codereview.chromium.org/2428093002/diff/180001/third_party/WebKit/Sou... File third_party/WebKit/Source/platform/weborigin/KURL.cpp (right): https://codereview.chromium.org/2428093002/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/weborigin/KURL.cpp:765: m_string = relative; On 2016/11/05 02:33:14, esprehn wrote: > btw why does this compile? > > explicit AtomicString(const String& s) > > I thought you had to be explicit and write AtomicString(relative). > > C++ is so confusing. :/ m_string is just a String on KURL, not an explicit AtomicString (for KURL::copy we sometimes have it be non atomic). I just tried to go through some of the string constructors and got lost finding how each case automatically converts though. It is quite confusing. I can certainly add AtomicString(relative) here to be a bit clearer. I believe the semantics are identical.
On 2016/11/05 at 03:20:16, csharrison wrote: > https://codereview.chromium.org/2428093002/diff/180001/third_party/WebKit/Sou... > File third_party/WebKit/Source/platform/weborigin/KURL.cpp (right): > > https://codereview.chromium.org/2428093002/diff/180001/third_party/WebKit/Sou... > third_party/WebKit/Source/platform/weborigin/KURL.cpp:765: m_string = relative; > On 2016/11/05 02:33:14, esprehn wrote: > > btw why does this compile? > > > > explicit AtomicString(const String& s) > > > > I thought you had to be explicit and write AtomicString(relative). > > > > C++ is so confusing. :/ > > m_string is just a String on KURL, not an explicit AtomicString (for KURL::copy we sometimes have it be non atomic). > > I just tried to go through some of the string constructors and got lost finding how each case automatically converts though. It is quite confusing. I can certainly add AtomicString(relative) here to be a bit clearer. I believe the semantics are identical. Oh I get it, yeah it's fine to leave it as is then. :)
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by csharrison@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== [KURL] Avoid re-hashing input if is already canonicalized KURL will re-hash the raw canonicalized output and check the AtomicStringTable (addWithTranslator) for the string. This can be very expensive for large URLs. However, since many URLs are generated from existing AtomicStrings (which already have their hashes computed), we can use a fast path and just return the input if it was already canonicalized and hashed. This optimization shaves 25% off the overhead of loading 1mb data url images (if the data url is properly canonicalized). Data from profiling results on Linux. BUG=348655,657978 ========== to ========== [KURL] Avoid re-hashing input if is already canonicalized KURL will re-hash the raw canonicalized output and check the AtomicStringTable (addWithTranslator) for the string. This can be very expensive for large URLs. However, since many URLs are generated from existing AtomicStrings (which already have their hashes computed), we can use a fast path and just return the input if it was already canonicalized and hashed. This optimization shaves 25% off the overhead of loading 1mb data url images (if the data url is properly canonicalized). Data from profiling results on Linux. BUG=348655,657978 ==========
Message was sent while issue was closed.
Committed patchset #10 (id:180001)
Message was sent while issue was closed.
Description was changed from ========== [KURL] Avoid re-hashing input if is already canonicalized KURL will re-hash the raw canonicalized output and check the AtomicStringTable (addWithTranslator) for the string. This can be very expensive for large URLs. However, since many URLs are generated from existing AtomicStrings (which already have their hashes computed), we can use a fast path and just return the input if it was already canonicalized and hashed. This optimization shaves 25% off the overhead of loading 1mb data url images (if the data url is properly canonicalized). Data from profiling results on Linux. BUG=348655,657978 ========== to ========== [KURL] Avoid re-hashing input if is already canonicalized KURL will re-hash the raw canonicalized output and check the AtomicStringTable (addWithTranslator) for the string. This can be very expensive for large URLs. However, since many URLs are generated from existing AtomicStrings (which already have their hashes computed), we can use a fast path and just return the input if it was already canonicalized and hashed. This optimization shaves 25% off the overhead of loading 1mb data url images (if the data url is properly canonicalized). Data from profiling results on Linux. BUG=348655,657978 Committed: https://crrev.com/495bfff108b8c1b2f67bfbd0a7fffceb436b31f8 Cr-Commit-Position: refs/heads/master@{#430166} ==========
Message was sent while issue was closed.
Patchset 10 (id:??) landed as https://crrev.com/495bfff108b8c1b2f67bfbd0a7fffceb436b31f8 Cr-Commit-Position: refs/heads/master@{#430166} |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
