Chromium Code Reviews

Issue 2430313008: scale by 1 for neon implemented (Closed)

Created: 4 years, 2 months ago by fbarchard1
Modified: 4 years, 2 months ago
Reviewers: wangcheng, hubbe
Target Ref: refs/heads/master
Project: libyuv
Visibility: Public

Description

scale by 1 for neon implemented

void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) {
  asm volatile (
  "1:                                          \n"
    MEMACCESS(0)
    "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
    "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
    "uxtl       v2.4s, v1.4h                   \n"  // 8 ints
    "uxtl2      v1.4s, v1.8h                   \n"
    "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
    "scvtf      v1.4s, v1.4s                   \n"
    "fcvtn      v4.4h, v2.4s                   \n"  // 8 half floats
    "fcvtn2     v4.8h, v1.4s                   \n"
    MEMACCESS(1)
    "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
    "b.gt       1b                             \n"
  : "+r"(src),    // %0
    "+r"(dst),    // %1
    "+r"(width)   // %2
  :
  : "cc", "memory", "v1", "v2", "v4"
  );
}

void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) {
  asm volatile (
  "1:                                          \n"
    MEMACCESS(0)
    "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
    "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
    "uxtl       v2.4s, v1.4h                   \n"  // 8 ints
    "uxtl2      v1.4s, v1.8h                   \n"
    "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
    "scvtf      v1.4s, v1.4s                   \n"
    "fmul       v2.4s, v2.4s, %3.s[0]          \n"  // adjust exponent
    "fmul       v1.4s, v1.4s, %3.s[0]          \n"
    "uqshrn     v4.4h, v2.4s, #13              \n"  // isolate halffloat
    "uqshrn2    v4.8h, v1.4s, #13              \n"
    MEMACCESS(1)
    "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
    "b.gt       1b                             \n"
  : "+r"(src),    // %0
    "+r"(dst),    // %1
    "+r"(width)   // %2
  : "w"(scale * 1.9259299444e-34f)  // %3
  : "cc", "memory", "v1", "v2", "v4"
  );
}

TEST=LibYUVPlanarTest.TestHalfFloatPlane_One
BUG=libyuv:560
R=hubbe@chromium.org

Committed: https://chromium.googlesource.com/libyuv/libyuv/+/451af5e922e026c266d25abc92e7519acfc9a4c5
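For readers who do not read NEON assembly, the multiply-and-shift path in HalfFloatRow_NEON can be sketched in scalar C. This is an illustration only, not code from the CL: the names HalfFloatFromU16, HalfFloatRow_Ref and kExpRebias are hypothetical. The sketch assumes the constant 1.9259299444e-34f is 2^-112, which moves the float exponent bias (127) down to the half-float bias (15), so that shifting the float bits right by 13 leaves the half-float sign, exponent and top 10 mantissa bits in the low 16 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 1.9259299444e-34f == 2^-112, i.e. 2^(15 - 127). */
static const float kExpRebias = 1.9259299444e-34f;

/* Hypothetical scalar helper (not libyuv API): convert one 16-bit sample. */
static uint16_t HalfFloatFromU16(uint16_t v, float scale) {
  /* "adjust exponent": scale the value and rebias its exponent. */
  float f = (float)v * (scale * kExpRebias);
  uint32_t bits;
  memcpy(&bits, &f, sizeof(bits)); /* reinterpret the float as raw bits */
  /* "isolate halffloat": drop the 13 low mantissa bits (23 -> 10). */
  bits >>= 13;
  /* Saturate to 16 bits, mirroring the unsigned saturating narrow. */
  return (uint16_t)(bits > 0xFFFFu ? 0xFFFFu : bits);
}

/* Hypothetical row wrapper; unlike the NEON loop it accepts any width. */
static void HalfFloatRow_Ref(const uint16_t* src, uint16_t* dst,
                             float scale, int width) {
  assert(src != NULL && dst != NULL);
  for (int i = 0; i < width; ++i) {
    dst[i] = HalfFloatFromU16(src[i], scale);
  }
}
```

Note that the shift truncates the mantissa rather than rounding it, so results from this path may differ in the last bit from a rounding float-to-half conversion.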

Patch Set 1

Patch Set 2 : avx2 version of unscaled halffloat

Patch Set 3 : f16c version without scaling

Patch Set 4 : bump version

Patch Set 5 : xmm4 unused

Patch Set 6 : one allocation for halffloat test

Stats: +138 lines, -20 lines
M README.chromium             1 chunk   +1 line, -1 line
M include/libyuv/row.h        1 chunk   +6 lines, -0 lines
M include/libyuv/version.h    1 chunk   +1 line, -1 line
M source/planar_functions.cc  1 chunk   +6 lines, -4 lines
M source/row_any.cc           1 chunk   +3 lines, -1 line
M source/row_gcc.cc           1 chunk   +30 lines, -0 lines
M source/row_neon64.cc        1 chunk   +49 lines, -0 lines
M unit_test/planar_test.cc    2 chunks  +42 lines, -13 lines

Messages

Total messages: 6 (2 generated)
fbarchard1
util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*$* -a "--libyuv_width=640 --libyuv_height=360 --libyuv_repeat=39999 --libyuv_flags=-1 --libyuv_cpu_info=-1" ...
4 years, 2 months ago (2016-10-20 23:38:42 UTC) #2
fbarchard1
Test performance is not super consistent. In this CL I changed the 3 allocations to ...
4 years, 2 months ago (2016-10-21 19:29:37 UTC) #3
hubbe
lgtm
4 years, 2 months ago (2016-10-21 19:32:52 UTC) #4
fbarchard1
4 years, 2 months ago (2016-10-21 21:30:07 UTC) #6
Message was sent while issue was closed.
Committed patchset #6 (id:100001) manually as
451af5e922e026c266d25abc92e7519acfc9a4c5 (presubmit successful).
