Chromium Code Reviews

Issue 982123002: SKPMFloat: we can beat the naive loops when clamping (Closed)

Created:
5 years, 9 months ago by mtklein_C
Modified:
5 years, 9 months ago
Reviewers:
msarett, mtklein, reed1
CC:
reviews_skia.org
Base URL:
https://skia.googlesource.com/skia.git@master
Target Ref:
refs/heads/master
Project:
skia
Visibility:
Public.

Description

SKPMFloat: we can beat the naive loops when clamping

Clamping 4 at a time is now about 15% faster than 1 at a time with SSSE3.
Clamping 4 at a time is now about 20% faster with SSE2, and this applies to non-clamping too (we still just clamp there).

In all cases, 4 at a time is never worse than 1 at a time, and not clamping is never slower than clamping.

Here's all the bench results, with the numbers for portable code as a fun point of reference:

SSSE3:
maxrss  loops  min     median  mean    max     stddev  samples     config        bench
10M     2291   4.66ns  4.66ns  4.66ns  4.68ns  0%      ▆█▁▁▁▇▁▇▁▃  nonrendering  SkPMFloat_get_1x
10M     2040   5.29ns  5.3ns   5.3ns   5.32ns  0%      ▃▆▃▃▁▁▆▃▃█  nonrendering  SkPMFloat_clamp_1x
10M     7175   4.62ns  4.62ns  4.62ns  4.63ns  0%      ▁▄▃████▃▄▇  nonrendering  SkPMFloat_get_4x
10M     5801   4.89ns  4.89ns  4.89ns  4.91ns  0%      █▂▄▃▁▃▄█▁▁  nonrendering  SkPMFloat_clamp_4x

SSE2:
maxrss  loops  min     median  mean    max     stddev  samples     config        bench
10M     1601   6.02ns  6.05ns  6.04ns  6.08ns  0%      █▅▄▅▄▂▁▂▂▂  nonrendering  SkPMFloat_get_1x
10M     2918   6.05ns  6.06ns  6.05ns  6.06ns  0%      ▂▇▁▇▇▁▇█▇▂  nonrendering  SkPMFloat_clamp_1x
10M     3569   5.43ns  5.45ns  5.44ns  5.45ns  0%      ▄█▂██▇▁▁▇▇  nonrendering  SkPMFloat_get_4x
10M     4168   5.43ns  5.43ns  5.43ns  5.44ns  0%      █▄▇▁▇▄▁▁▁▁  nonrendering  SkPMFloat_clamp_4x

Portable:
maxrss  loops  min     median  mean    max     stddev  samples     config        bench
10M     500    27.8ns  28.1ns  28ns    28.2ns  0%      ▃█▆▃▇▃▆▁▇▂  nonrendering  SkPMFloat_get_1x
10M     770    40.1ns  40.2ns  40.2ns  40.3ns  0%      ▅▁▃▂▆▄█▂▅▂  nonrendering  SkPMFloat_clamp_1x
10M     1269   28.4ns  28.8ns  29.1ns  32.7ns  4%      ▂▂▂█▂▁▁▂▁▁  nonrendering  SkPMFloat_get_4x
10M     1439   40.2ns  40.4ns  40.4ns  40.5ns  0%      ▆▆▆█▁▆▅█▅▆  nonrendering  SkPMFloat_clamp_4x

SkPMFloat_neon.h is still one big TODO as far as 4-at-a-time APIs go.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/91fd7371ec80724ec53aae8f2d5a6753499d8963
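
For readers who have not seen the trick, here is a minimal sketch of why converting 4 pixels at a time can get clamping almost for free on SSE2. It is not the code from this patch: FloatPixel, clampTo8888 and clampTo8888x4 are hypothetical names made up for illustration, and the byte order (component 0 in the low byte) is an assumption. The idea is that the saturating pack instructions clamp to [0, 255] as a side effect of narrowing, which would also explain why a "non-clamping" SSE2 path might as well clamp.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical stand-in for a premultiplied float pixel: four components,
// each nominally in [0, 255]. This is NOT Skia's actual SkPMFloat.
struct FloatPixel {
    float c[4];
};

// Naive 1-at-a-time reference: round, clamp, and pack one pixel,
// with component 0 in the low byte (an assumed layout).
inline uint32_t clampTo8888(const FloatPixel& p) {
    uint32_t packed = 0;
    for (int i = 0; i < 4; i++) {
        long v = std::lrintf(p.c[i]);  // rounds using the current mode (nearest by default)
        if (v < 0)   v = 0;
        if (v > 255) v = 255;
        packed |= static_cast<uint32_t>(v) << (8 * i);
    }
    return packed;
}

// 4-at-a-time version. Assumes inputs are only moderately out of range;
// floats too large for int32 are not handled by this sketch.
inline void clampTo8888x4(uint32_t out[4], const FloatPixel px[4]) {
    assert(out != nullptr && px != nullptr);
#if defined(__SSE2__)
    // Convert each pixel's four floats to four rounded 32-bit ints.
    __m128i c0 = _mm_cvtps_epi32(_mm_loadu_ps(px[0].c));
    __m128i c1 = _mm_cvtps_epi32(_mm_loadu_ps(px[1].c));
    __m128i c2 = _mm_cvtps_epi32(_mm_loadu_ps(px[2].c));
    __m128i c3 = _mm_cvtps_epi32(_mm_loadu_ps(px[3].c));
    // Narrow 32 -> 16 bits with signed saturation, then 16 -> 8 bits with
    // unsigned saturation. The second pack is what clamps to [0, 255].
    __m128i c10 = _mm_packs_epi32(c0, c1);
    __m128i c32 = _mm_packs_epi32(c2, c3);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out),
                     _mm_packus_epi16(c10, c32));
#else
    // Portable fallback: just run the naive loop.
    for (int i = 0; i < 4; i++) {
        out[i] = clampTo8888(px[i]);
    }
#endif
}
```

With this shape, sixteen components are narrowed with three pack instructions in total, where a per-pixel loop would pay for its own narrowing on every pixel. Whether that matches the instruction counts in the real patch is not something this sketch claims.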

Patch Set 1

Patch Set 2: restore some asserts

Stats: +81 lines, -13 lines

M src/core/SkPMFloat.h         2 chunks  +3 lines, -12 lines  0 comments
M src/opts/SkPMFloat_SSE2.h    2 chunks  +28 lines, -1 line   0 comments
M src/opts/SkPMFloat_SSSE3.h   1 chunk   +25 lines, -0 lines  0 comments
M src/opts/SkPMFloat_neon.h    1 chunk   +13 lines, -0 lines  0 comments
M src/opts/SkPMFloat_none.h    1 chunk   +12 lines, -0 lines  0 comments

Messages

Total messages: 7 (2 generated)
mtklein_C
Turns out I was looking in the wrong places to beat the naive loop. I ...
5 years, 9 months ago (2015-03-05 20:03:26 UTC) #2
msarett
lgtm I'm still looking to see if we can beat the naive loops. One issue ...
5 years, 9 months ago (2015-03-06 13:48:03 UTC) #3
mtklein
On 2015/03/06 13:48:03, msarett wrote: > lgtm > > I'm still looking to see if ...
5 years, 9 months ago (2015-03-06 13:58:36 UTC) #4
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/982123002/20001
5 years, 9 months ago (2015-03-06 14:10:19 UTC) #6
commit-bot: I haz the power
5 years, 9 months ago (2015-03-06 14:15:47 UTC) #7
Message was sent while issue was closed.
Committed patchset #2 (id:20001) as
https://skia.googlesource.com/skia/+/91fd7371ec80724ec53aae8f2d5a6753499d8963
