|
|
Descriptiontry to speed-up maprect + round2i + contains
We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/b42b785d1cbc98bd34aceae338060831b974f9c5
Patch Set 1 #Patch Set 2 : fix debug build #Patch Set 3 : add unittest #Patch Set 4 : rebase #
Total comments: 3
Patch Set 5 : fix prev rebase #
Total comments: 3
Patch Set 6 : un-move SkCpu.h #Patch Set 7 : and un-move it in gyp #
Total comments: 3
Patch Set 8 : add dox #
Total comments: 1
Messages
Total messages: 56 (32 generated)
Description was changed from ========== try to speed-up maprect + round2i + contains BUG=skia: ========== to ========== try to speed-up maprect + round2i + contains BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 ==========
The CQ bit was checked by reed@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
reed@google.com changed reviewers: + msarett@google.com, mtklein@google.com
I can't get the Sk4i <= guy to compile :(
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug-Trybot on master.client.skia (JOB_FAILED, http://build.chromium.org/p/client.skia/builders/Test-Ubuntu-GCC-GCE-CPU-AVX2...)
The CQ bit was checked by reed@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Pretty cool: Sk4f rd = (src + Sk4f(0.5)).floor(); This generates 1 instruction on INTEL!!! roundps
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Description was changed from ========== try to speed-up maprect + round2i + contains BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 ========== to ========== try to speed-up maprect + round2i + contains BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
The CQ bit was checked by reed@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
There were warnings when CQ was processing your CL: * CQ_EXTRA_TRYBOTS flag is deprecated and will soon be removed. Use CQ_INCLUDE_TRYBOTS instead.
To inline round2i() I need to move SkNx.h (and its dependencies) into include/private ...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Build-Win-MSVC-x86-Debug-Trybot on master.client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86-D...)
Description was changed from ========== try to speed-up maprect + round2i + contains BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== try to speed-up maprect + round2i + contains We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
https://codereview.chromium.org/2133413002/diff/60001/include/core/SkRect.h File include/core/SkRect.h (right): https://codereview.chromium.org/2133413002/diff/60001/include/core/SkRect.h#n... include/core/SkRect.h:874: SkIRect round2i() const { On Intel this disassembles to roundps and cvtt? There is a single Intel instruction to convert float->int with rounding. I made an attempt to expose that here: https://codereview.chromium.org/2134753006/ https://codereview.chromium.org/2133413002/diff/60001/include/core/SkRect.h#n... include/core/SkRect.h:876: Sk4s rd = (Sk4s::Load(&fLeft) + Sk4s(0.5)).floor(); Why call floor when the cast will floor anyway?
On 2016/07/11 12:50:06, msarett wrote: > https://codereview.chromium.org/2133413002/diff/60001/include/core/SkRect.h > File include/core/SkRect.h (right): > > https://codereview.chromium.org/2133413002/diff/60001/include/core/SkRect.h#n... > include/core/SkRect.h:874: SkIRect round2i() const { > On Intel this disassembles to roundps and cvtt? > > There is a single Intel instruction to convert float->int with rounding. I made > an attempt to expose that here: > https://codereview.chromium.org/2134753006/ > > https://codereview.chromium.org/2133413002/diff/60001/include/core/SkRect.h#n... > include/core/SkRect.h:876: Sk4s rd = (Sk4s::Load(&fLeft) + Sk4s(0.5)).floor(); > Why call floor when the cast will floor anyway? Doh! -- I was just so excited knowing I *could* call floor()
https://codereview.chromium.org/2133413002/diff/60001/include/core/SkRect.h File include/core/SkRect.h (right): https://codereview.chromium.org/2133413002/diff/60001/include/core/SkRect.h#n... include/core/SkRect.h:876: Sk4s rd = (Sk4s::Load(&fLeft) + Sk4s(0.5)).floor(); On 2016/07/11 12:50:06, msarett wrote: > Why call floor when the cast will floor anyway? We've got to make sure we're careful to distinguish the behavior of floor(), ceil(), round(), and trunc() when dealing with these spatial types which may contain negative values. I think it's probably best that we add round(), ceil(), and floor() calls to Sk4f, so that we're not left wondering whether implementations in terms of each other are correct. For what it's worth, truncating from float->int is cheap and available everywhere. Anything else is more complicated. Generally floor() and ceil() and round() are going to be about this expensive and error prone: - truncate to integer - convert back to float - see if that result is too small or big - add back (wrong ? kFixupConstant : 0) to fix it up The main caveat is that we cannot round floats to values that are representable as floats but not integers. The safest thing might be to never expose an API that does float -> float rounding. When we have SSE 4.1 or better on Intel, roundps lifts the roundtrip-through-ints limitation, and is considerably faster. We check dynamically for this today, but it's unclear whether that approach is reliably faster than the slow caveated SSE2 version. ARMv8 added similar instructions to NEON. On ARMv7 we always do the slow caveated thing.
On 2016/07/11 12:54:24, reed1 wrote: > On 2016/07/11 12:50:06, msarett wrote: > > https://codereview.chromium.org/2133413002/diff/60001/include/core/SkRect.h > > File include/core/SkRect.h (right): > > > > > https://codereview.chromium.org/2133413002/diff/60001/include/core/SkRect.h#n... > > include/core/SkRect.h:874: SkIRect round2i() const { > > On Intel this disassembles to roundps and cvtt? > > > > There is a single Intel instruction to convert float->int with rounding. I > made > > an attempt to expose that here: > > https://codereview.chromium.org/2134753006/ > > > > > https://codereview.chromium.org/2133413002/diff/60001/include/core/SkRect.h#n... > > include/core/SkRect.h:876: Sk4s rd = (Sk4s::Load(&fLeft) + Sk4s(0.5)).floor(); > > Why call floor when the cast will floor anyway? > > Doh! -- I was just so excited knowing I *could* call floor() ... and my original thoughts were correct -- we need floor() semantics (round to -inf)
The CQ bit was checked by reed@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
There were warnings when CQ was processing your CL: * CQ_EXTRA_TRYBOTS flag is deprecated and will soon be removed. Use CQ_INCLUDE_TRYBOTS instead.
ptal
There were warnings when CQ was processing your CL: * CQ_EXTRA_TRYBOTS flag is deprecated and will soon be removed. Use CQ_INCLUDE_TRYBOTS instead.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Description was changed from ========== try to speed-up maprect + round2i + contains We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== try to speed-up maprect + round2i + contains We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
https://codereview.chromium.org/2133413002/diff/80001/gyp/core.gypi File gyp/core.gypi (right): https://codereview.chromium.org/2133413002/diff/80001/gyp/core.gypi#newcode421 gyp/core.gypi:421: '<(skia_include_path)/private/SkCpu.h', I think at head you should find now that SkCpu.h does not need to move. SkNx and co still will need to.
https://codereview.chromium.org/2133413002/diff/80001/include/core/SkRect.h File include/core/SkRect.h (right): https://codereview.chromium.org/2133413002/diff/80001/include/core/SkRect.h#n... include/core/SkRect.h:876: Sk4s rd = (Sk4s::Load(&fLeft) + Sk4s(0.5)).floor(); I can see at least three different methods of rounding which agree for most numbers, but differ when rounding x = N.5f, for various N. 1) floorf(x + 0.5f) (round to nearest, breaking .5 towards +inf) 2) roundf(x) (round to nearest, breaking .5 away from zero) 3) nearbyintf(x) (round to nearest, breaking .5 towards even) Do we care about what happens on the .5 border? When we do have instructions to implement these functions, those instructions implement roundf(x), floorf(x), ceilf(x), and nearbyintf(x) directly, but floorf(x+0.5f) requires two steps. That's true for both x86 and ARMv8.
I will try un-moving SkCpu.h https://codereview.chromium.org/2133413002/diff/80001/include/core/SkRect.h File include/core/SkRect.h (right): https://codereview.chromium.org/2133413002/diff/80001/include/core/SkRect.h#n... include/core/SkRect.h:876: Sk4s rd = (Sk4s::Load(&fLeft) + Sk4s(0.5)).floor(); On 2016/07/11 17:28:34, mtklein wrote: > I can see at least three different methods of rounding which agree for most > numbers, but differ when rounding x = N.5f, for various N. > > 1) floorf(x + 0.5f) (round to nearest, breaking .5 towards +inf) > 2) roundf(x) (round to nearest, breaking .5 away from zero) > 3) nearbyintf(x) (round to nearest, breaking .5 towards even) > > Do we care about what happens on the .5 border? When we do have instructions to > implement these functions, those instructions implement roundf(x), floorf(x), > ceilf(x), and nearbyintf(x) directly, but floorf(x+0.5f) requires two steps. > That's true for both x86 and ARMv8. For coordinates (cpu and gpu), we need a direction that is invariant for all values. Towards +inf is nice in that it is the same as our fixed-point behavior (x + 0x8000 >> 16). I think we can only use #1 for coordinates.
The CQ bit was checked by reed@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
correct, no need to move SkCpu.h -- ptal
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
> For coordinates (cpu and gpu), we need a direction that is invariant for all > values. Towards +inf is nice in that it is the same as our fixed-point behavior > (x + 0x8000 >> 16). I think we can only use #1 for coordinates. Interesting. Might want to note that this needs to agree with SkScalarRoundToInt?
lgtm https://codereview.chromium.org/2133413002/diff/120001/include/core/SkRect.h File include/core/SkRect.h (right): https://codereview.chromium.org/2133413002/diff/120001/include/core/SkRect.h#... include/core/SkRect.h:877: SkNx_cast<int32_t>(rd).store(&dst.fLeft); SkIRect dst; SkNx_cast<int32_t>(this->round2s()).store(&dst.fLeft); return dst; ? https://codereview.chromium.org/2133413002/diff/120001/include/core/SkRect.h#... include/core/SkRect.h:883: (Sk4s::Load(&fLeft) + Sk4s(0.5)).floor().store(&dst.fLeft); It's okay if you like writing this out, but you can also write (Sk4s::Load(&fLeft) + 0.5f)
https://codereview.chromium.org/2133413002/diff/120001/include/core/SkRect.h File include/core/SkRect.h (right): https://codereview.chromium.org/2133413002/diff/120001/include/core/SkRect.h#... include/core/SkRect.h:877: SkNx_cast<int32_t>(rd).store(&dst.fLeft); On 2016/07/11 19:43:18, mtklein wrote: > SkIRect dst; > SkNx_cast<int32_t>(this->round2s()).store(&dst.fLeft); > return dst; > > ? round2s returns SkRect, not Sk4s
The CQ bit was checked by reed@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
add dox
The CQ bit was unchecked by reed@google.com
The CQ bit was checked by reed@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from mtklein@google.com Link to the patchset: https://codereview.chromium.org/2133413002/#ps140001 (title: "add dox")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== try to speed-up maprect + round2i + contains We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== try to speed-up maprect + round2i + contains We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/b42b785d1cbc98bd34aceae338060831b974f9c5 ==========
Message was sent while issue was closed.
Committed patchset #8 (id:140001) as https://skia.googlesource.com/skia/+/b42b785d1cbc98bd34aceae338060831b974f9c5
Message was sent while issue was closed.
A revert of this CL (patchset #8 id:140001) has been created in https://codereview.chromium.org/2136343002/ by msarett@google.com. The reason for reverting is: Breaking the roll... https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel....
Message was sent while issue was closed.
bungeman@google.com changed reviewers: + bungeman@google.com
Message was sent while issue was closed.
https://codereview.chromium.org/2133413002/diff/140001/include/private/SkNx_s... File include/private/SkNx_sse.h (right): https://codereview.chromium.org/2133413002/diff/140001/include/private/SkNx_s... include/private/SkNx_sse.h:285: auto flip = _mm_set1_epi8(char(0x80)); Heh, maybe something like the following would keep msvc happy? Too bad _mm_set1_epi8 takes char instead of 'signed char' or int8_t or something sane. #include <climits> #if CHAR_MAX == UCHAR_MAX auto flip = _mm_set1_epi8(char(0x80)); #else auto flip = _mm_set1_epi8(char(-0x80)); #endif
Message was sent while issue was closed.
On 2016/07/11 22:07:49, bungeman-skia wrote: > https://codereview.chromium.org/2133413002/diff/140001/include/private/SkNx_s... > File include/private/SkNx_sse.h (right): > > https://codereview.chromium.org/2133413002/diff/140001/include/private/SkNx_s... > include/private/SkNx_sse.h:285: auto flip = _mm_set1_epi8(char(0x80)); > Heh, maybe something like the following would keep msvc happy? Too bad > _mm_set1_epi8 takes char instead of 'signed char' or int8_t or something sane. > > #include <climits> > > #if CHAR_MAX == UCHAR_MAX > auto flip = _mm_set1_epi8(char(0x80)); > #else > auto flip = _mm_set1_epi8(char(-0x80)); > #endif It's been enough previously to just declare a value on the stack. Have a look at SkNx<8,uint16_t>::Min(): const uint16_t top = 0x8000; // Keep this separate from _mm_set1_epi16 or MSVC will whine. const __m128i top_8x = _mm_set1_epi16(top); |