|
|
Created:
3 years, 11 months ago by Leszek Swirski Modified:
3 years, 11 months ago Reviewers:
rmcilroy CC:
v8-reviews_googlegroups.com Target Ref:
refs/pending/heads/master Project:
v8 Visibility:
Public. |
Description[conversions] Make "DoubleToUint32IfEqualToSelf" use bit magic
Uses the structure of an IEEE float to speed up
DoubleToUint32IfEqualToSelf, similar to FastD2UI. Micro-benchmarks show
a ~1.2x-2x speed-up, depending on the input.
Review-Url: https://codereview.chromium.org/2636453003
Cr-Commit-Position: refs/heads/master@{#42420}
Committed: https://chromium.googlesource.com/v8/v8/+/5dbc1ba0f999161bb10548eba78aa9d22199d4be
Patch Set 1 #
Total comments: 6
Patch Set 2 : Change. Everything. #
Total comments: 2
Patch Set 3 : Use FastUI2D #Messages
Total messages: 24 (14 generated)
The CQ bit was checked by leszeks@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
leszeks@chromium.org changed reviewers: + rmcilroy@chromium.org
Hi Ross, This is the bit magic I was talking about for the double to uint32 checked conversion. WDYT?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Looks good overall, one question. https://codereview.chromium.org/2636453003/diff/1/src/conversions-inl.h File src/conversions-inl.h (right): https://codereview.chromium.org/2636453003/diff/1/src/conversions-inl.h#newco... src/conversions-inl.h:154: memcpy(&check, top_ptr, sizeof(uint32_t)); Could you just do a reinterpreter_cast on the address and compare directly in memory rather than a memcpy? https://codereview.chromium.org/2636453003/diff/1/src/conversions-inl.h#newco... src/conversions-inl.h:156: memcpy(uint32_value, bottom_ptr, sizeof(uint32_t)); ditto
https://codereview.chromium.org/2636453003/diff/1/src/conversions-inl.h File src/conversions-inl.h (right): https://codereview.chromium.org/2636453003/diff/1/src/conversions-inl.h#newco... src/conversions-inl.h:154: memcpy(&check, top_ptr, sizeof(uint32_t)); On 2017/01/16 14:12:11, rmcilroy wrote: > Could you just do a reinterpreter_cast on the address and compare directly in > memory rather than a memcpy? Good question -- short version, no, because of strict aliasing. Longer explanation, see the giant comment above bit_cast in macros.h. I could use bitcast to cast the float to a uint64_t, and check/shift that around, but I actually profiled this in microbenchmarks and it was slower. https://codereview.chromium.org/2636453003/diff/1/src/conversions-inl.h#newco... src/conversions-inl.h:156: memcpy(uint32_value, bottom_ptr, sizeof(uint32_t)); On 2017/01/16 14:12:11, rmcilroy wrote: > ditto As above.
https://codereview.chromium.org/2636453003/diff/1/src/conversions-inl.h File src/conversions-inl.h (right): https://codereview.chromium.org/2636453003/diff/1/src/conversions-inl.h#newco... src/conversions-inl.h:154: memcpy(&check, top_ptr, sizeof(uint32_t)); On 2017/01/16 14:26:54, Leszek Swirski wrote: > On 2017/01/16 14:12:11, rmcilroy wrote: > > Could you just do a reinterpreter_cast on the address and compare directly in > > memory rather than a memcpy? > > Good question -- short version, no, because of strict aliasing. Longer > explanation, see the giant comment above bit_cast in macros.h. Ahh I see, thanks for the explination. >I could use > bitcast to cast the float to a uint64_t, and check/shift that around, but I > actually profiled this in microbenchmarks and it was slower. How much slower? I would rather keep all the mempy nastyness in one place and just use bit_cast here if it wasn't a significant perf difference. Maybe you could shift the mask / valid constants instead so that constant folding hides any of the shifting overheads?
On 2017/01/16 15:59:51, rmcilroy wrote: > >I could use > > bitcast to cast the float to a uint64_t, and check/shift that around, but I > > actually profiled this in microbenchmarks and it was slower. > > How much slower? I would rather keep all the mempy nastyness in one place and > just use bit_cast here if it wasn't a significant perf difference. Maybe you > could shift the mask / valid constants instead so that constant folding hides > any of the shifting overheads? Like I say, microbenchmarks and on x64, but slower than the original function: Benchmark Time CPU Iterations ------------------------------------------------------------------------------------------------ BM_Double2Uint32_Baseline/0 235 ns 235 ns 2818466 BM_Double2Uint32_Baseline/1 236 ns 236 ns 2977739 BM_Double2Uint32_Baseline/2 95 ns 95 ns 7451615 [Ed: this one is on an input of -1, there's a fast path] BM_Double2Uint32_Baseline/3 262 ns 262 ns 2979588 BM_Double2Uint32_Baseline/4 228 ns 228 ns 2704763 BM_Double2Uint32_Baseline/5 213 ns 213 ns 3304381 BM_Double2Uint32_into_check_then_copy_out_unconditional/0 117 ns 117 ns 6041176 BM_Double2Uint32_into_check_then_copy_out_unconditional/1 118 ns 118 ns 5989923 BM_Double2Uint32_into_check_then_copy_out_unconditional/2 117 ns 117 ns 6055617 BM_Double2Uint32_into_check_then_copy_out_unconditional/3 116 ns 116 ns 6040589 BM_Double2Uint32_into_check_then_copy_out_unconditional/4 116 ns 116 ns 6005077 BM_Double2Uint32_into_check_then_copy_out_unconditional/5 118 ns 118 ns 6040763 BM_Double2Uint32_into_uint64_then_check/0 356 ns 356 ns 1966876 BM_Double2Uint32_into_uint64_then_check/1 356 ns 356 ns 1967444 BM_Double2Uint32_into_uint64_then_check/2 356 ns 356 ns 1964732 BM_Double2Uint32_into_uint64_then_check/3 358 ns 358 ns 1969686 BM_Double2Uint32_into_uint64_then_check/4 361 ns 361 ns 1966370 BM_Double2Uint32_into_uint64_then_check/5 357 ns 357 ns 1909102
The CQ bit was checked by leszeks@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== [conversions] Make "DoubleToUint32IfEqualToSelf" use bit magic Uses the structure of an IEEE float to speed up DoubleToUint32IfEqualToSelf, similar to FastD2UI. Micro-benchmarks show a ~2x speed-up. ========== to ========== [conversions] Make "DoubleToUint32IfEqualToSelf" use bit magic Uses the structure of an IEEE float to speed up DoubleToUint32IfEqualToSelf, similar to FastD2UI. Micro-benchmarks show a ~1.2x-2x speed-up, depending on the input. ==========
https://codereview.chromium.org/2636453003/diff/1/src/conversions-inl.h File src/conversions-inl.h (right): https://codereview.chromium.org/2636453003/diff/1/src/conversions-inl.h#newco... src/conversions-inl.h:154: memcpy(&check, top_ptr, sizeof(uint32_t)); On 2017/01/16 15:59:51, rmcilroy wrote: > On 2017/01/16 14:26:54, Leszek Swirski wrote: > > On 2017/01/16 14:12:11, rmcilroy wrote: > > > Could you just do a reinterpreter_cast on the address and compare directly > in > > > memory rather than a memcpy? > > > > Good question -- short version, no, because of strict aliasing. Longer > > explanation, see the giant comment above bit_cast in macros.h. > > Ahh I see, thanks for the explination. > > >I could use > > bitcast to cast the float to a uint64_t, and check/shift that around, but I > > actually profiled this in microbenchmarks and it was slower. > > How much slower? I would rather keep all the mempy nastyness in one place and > just use bit_cast here if it wasn't a significant perf difference. Maybe you > could shift the mask / valid constants instead so that constant folding hides > any of the shifting overheads? Whoops, it turns out my microbenchmarks were super broken. bit_cast to uint64 is actually faster, so updated the code to do it.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
LGTM once comment addressed. https://codereview.chromium.org/2636453003/diff/20001/src/conversions-inl.h File src/conversions-inl.h (right): https://codereview.chromium.org/2636453003/diff/20001/src/conversions-inl.h#n... src/conversions-inl.h:150: return (result & kBottomBitMask) == value; nit - please do: return FastUI2D(*uint32_value) == value
https://codereview.chromium.org/2636453003/diff/20001/src/conversions-inl.h File src/conversions-inl.h (right): https://codereview.chromium.org/2636453003/diff/20001/src/conversions-inl.h#n... src/conversions-inl.h:150: return (result & kBottomBitMask) == value; On 2017/01/17 15:46:07, rmcilroy wrote: > nit - please do: > > return FastUI2D(*uint32_value) == value Changed to FastUI2D, but still using "result & kBottomBitMask" because fast (reliably 5-7% faster on microbenchmarks).
The CQ bit was checked by leszeks@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from rmcilroy@chromium.org Link to the patchset: https://codereview.chromium.org/2636453003/#ps40001 (title: "Use FastUI2D")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 40001, "attempt_start_ts": 1484669215152700, "parent_rev": "23fb6cf07da51a446fc6d90876d4ec3519862460", "commit_rev": "5dbc1ba0f999161bb10548eba78aa9d22199d4be"}
Message was sent while issue was closed.
Description was changed from ========== [conversions] Make "DoubleToUint32IfEqualToSelf" use bit magic Uses the structure of an IEEE float to speed up DoubleToUint32IfEqualToSelf, similar to FastD2UI. Micro-benchmarks show a ~1.2x-2x speed-up, depending on the input. ========== to ========== [conversions] Make "DoubleToUint32IfEqualToSelf" use bit magic Uses the structure of an IEEE float to speed up DoubleToUint32IfEqualToSelf, similar to FastD2UI. Micro-benchmarks show a ~1.2x-2x speed-up, depending on the input. Review-Url: https://codereview.chromium.org/2636453003 Cr-Commit-Position: refs/heads/master@{#42420} Committed: https://chromium.googlesource.com/v8/v8/+/5dbc1ba0f999161bb10548eba78aa9d2219... ==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://chromium.googlesource.com/v8/v8/+/5dbc1ba0f999161bb10548eba78aa9d2219... |