Issue 982123002: SKPMFloat: we can beat the naive loops when clamping (Closed)

Created:
5 years, 9 months ago by mtklein_C

Modified:
5 years, 9 months ago

Reviewers:
msarett, mtklein, reed1

CC:
reviews_skia.org

Base URL:
https://skia.googlesource.com/skia.git@master

Target Ref:
refs/heads/master

Project:
skia

Visibility:
Public.

Description

SKPMFloat: we can beat the naive loops when clamping

Clamping 4 at a time is now about 15% faster than 1 at a time with SSSE3.
Clamping 4 at a time is now about 20% faster with SSE2, and this applies to non-clamping too (we still just clamp there).

In all cases, 4 at a time is never worse than 1 at a time, and not clamping is never slower than clamping.

Here's all the bench results, with the numbers for portable code as a fun point of reference:

SSSE3:
maxrss  loops  min     median  mean    max     stddev  samples     config        bench
10M     2291   4.66ns  4.66ns  4.66ns  4.68ns  0%      ▆█▁▁▁▇▁▇▁▃  nonrendering  SkPMFloat_get_1x
10M     2040   5.29ns  5.3ns   5.3ns   5.32ns  0%      ▃▆▃▃▁▁▆▃▃█  nonrendering  SkPMFloat_clamp_1x
10M     7175   4.62ns  4.62ns  4.62ns  4.63ns  0%      ▁▄▃████▃▄▇  nonrendering  SkPMFloat_get_4x
10M     5801   4.89ns  4.89ns  4.89ns  4.91ns  0%      █▂▄▃▁▃▄█▁▁  nonrendering  SkPMFloat_clamp_4x

SSE2:
maxrss  loops  min     median  mean    max     stddev  samples     config        bench
10M     1601   6.02ns  6.05ns  6.04ns  6.08ns  0%      █▅▄▅▄▂▁▂▂▂  nonrendering  SkPMFloat_get_1x
10M     2918   6.05ns  6.06ns  6.05ns  6.06ns  0%      ▂▇▁▇▇▁▇█▇▂  nonrendering  SkPMFloat_clamp_1x
10M     3569   5.43ns  5.45ns  5.44ns  5.45ns  0%      ▄█▂██▇▁▁▇▇  nonrendering  SkPMFloat_get_4x
10M     4168   5.43ns  5.43ns  5.43ns  5.44ns  0%      █▄▇▁▇▄▁▁▁▁  nonrendering  SkPMFloat_clamp_4x

Portable:
maxrss  loops  min     median  mean    max     stddev  samples     config        bench
10M     500    27.8ns  28.1ns  28ns    28.2ns  0%      ▃█▆▃▇▃▆▁▇▂  nonrendering  SkPMFloat_get_1x
10M     770    40.1ns  40.2ns  40.2ns  40.3ns  0%      ▅▁▃▂▆▄█▂▅▂  nonrendering  SkPMFloat_clamp_1x
10M     1269   28.4ns  28.8ns  29.1ns  32.7ns  4%      ▂▂▂█▂▁▁▂▁▁  nonrendering  SkPMFloat_get_4x
10M     1439   40.2ns  40.4ns  40.4ns  40.5ns  0%      ▆▆▆█▁▆▅█▅▆  nonrendering  SkPMFloat_clamp_4x

SkPMFloat_neon.h is still one big TODO as far as 4-at-a-time APIs go.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/91fd7371ec80724ec53aae8f2d5a6753499d8963
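For readers without the diff at hand, here is a rough sketch of what "clamping 4 at a time" can look like. It is not the code from this patch. The function names `clamp_pixel_1x` and `clamp_pixels_4x` are hypothetical, the channel layout (channel 0 in the low byte) is an assumption, and using saturating packs to perform the clamp is one plausible SSE2 technique rather than a confirmed description of what SkPMFloat_SSE2.h does.

```cpp
#include <cmath>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper, not Skia's API: clamp one pixel (4 float channels,
// nominally in [0,255]) to bytes, packing channel 0 into the low byte.
static uint32_t clamp_pixel_1x(const float ch[4]) {
    uint32_t packed = 0;
    for (int i = 0; i < 4; i++) {
        float v = ch[i] < 0.0f ? 0.0f : (ch[i] > 255.0f ? 255.0f : ch[i]);
        // lrintf rounds using the current rounding mode, like cvtps2dq below.
        packed |= static_cast<uint32_t>(std::lrintf(v)) << (8 * i);
    }
    return packed;
}

// Hypothetical helper: clamp four pixels (16 floats) in one go.
static void clamp_pixels_4x(const float ch[16], uint32_t out[4]) {
#if defined(__SSE2__)
    // Round each pixel's 4 floats to 32-bit ints.
    __m128i p0 = _mm_cvtps_epi32(_mm_loadu_ps(ch + 0));
    __m128i p1 = _mm_cvtps_epi32(_mm_loadu_ps(ch + 4));
    __m128i p2 = _mm_cvtps_epi32(_mm_loadu_ps(ch + 8));
    __m128i p3 = _mm_cvtps_epi32(_mm_loadu_ps(ch + 12));
    // Narrow 32 -> 16 bits with signed saturation, then 16 -> 8 bits with
    // unsigned saturation. The second pack clamps every channel to [0,255].
    __m128i lo = _mm_packs_epi32(p0, p1);
    __m128i hi = _mm_packs_epi32(p2, p3);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_packus_epi16(lo, hi));
#else
    // Portable fallback: just run the 1-at-a-time path four times.
    for (int i = 0; i < 4; i++) {
        out[i] = clamp_pixel_1x(ch + 4 * i);
    }
#endif
}
```

In this sketch the clamp comes for free from the saturating packs, and the two pack steps are shared across four pixels, which is one way a 4-at-a-time path could pull ahead of a per-pixel loop. The SSE2 branch assumes each rounded value fits in a signed 32-bit int. Out-of-range inputs and NaN convert to INT_MIN and come out as 0, whereas the portable branch clamps large values to 255.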

Patch Set 1

Patch Set 2: restore some asserts

Created: 5 years, 9 months ago

Stats (+81 lines, -13 lines)

Status  File                        Chunks    Lines changed         Comments
M       src/core/SkPMFloat.h        2 chunks  +3 lines, -12 lines   0 comments
M       src/opts/SkPMFloat_SSE2.h   2 chunks  +28 lines, -1 line    0 comments
M       src/opts/SkPMFloat_SSSE3.h  1 chunk   +25 lines, -0 lines   0 comments
M       src/opts/SkPMFloat_neon.h   1 chunk   +13 lines, -0 lines   0 comments
M       src/opts/SkPMFloat_none.h   1 chunk   +12 lines, -0 lines   0 comments

Messages

Total messages: 7 (2 generated)

mtklein_C

Turns out I was looking in the wrong places to beat the naive loop. I ...

5 years, 9 months ago (2015-03-05 20:03:26 UTC) #2

msarett

lgtm I'm still looking to see if we can beat the naive loops. One issue ...

5 years, 9 months ago (2015-03-06 13:48:03 UTC) #3

mtklein

On 2015/03/06 13:48:03, msarett wrote: > lgtm > > I'm still looking to see if ...

5 years, 9 months ago (2015-03-06 13:58:36 UTC) #4

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/982123002/20001

5 years, 9 months ago (2015-03-06 14:10:19 UTC) #6

commit-bot: I haz the power

5 years, 9 months ago (2015-03-06 14:15:47 UTC) #7

Message was sent while issue was closed.

Committed patchset #2 (id:20001) as
https://skia.googlesource.com/skia/+/91fd7371ec80724ec53aae8f2d5a6753499d8963
