|
|
Created:
7 years, 3 months ago by Bangfu Modified:
7 years, 3 months ago CC:
v8-dev Base URL:
git://github.com/v8/v8.git@master Visibility:
Public. |
DescriptionARM: Add tests for CopyBytes.
TEST=cctest/test-macro-assembler-arm.cc
R=ulan@chromium.org
Committed: https://code.google.com/p/v8/source/detail?r=16751
Patch Set 1 : ARM fix: CopyBytes() #
Total comments: 3
Patch Set 2 : add test case #
Total comments: 2
Patch Set 3 : change branch condition #Patch Set 4 : revert copybytes #Patch Set 5 : bug fix #Patch Set 6 : bug fix for debug #
Messages
Total messages: 24 (0 generated)
CC'ing Sven for additional feedback. The patch looks good so far, however: (1) While we are at it: Wouldn't it be faster to compute src + length in the length register on startup, yiedling send, and afterwards just compare src and send, so you don't need to increment src and decrement length? That'd at least save a few instructions. (2) Please add a test case for the function. https://codereview.chromium.org/23480027/diff/3001/src/arm/macro-assembler-ar... File src/arm/macro-assembler-arm.cc (right): https://codereview.chromium.org/23480027/diff/3001/src/arm/macro-assembler-ar... src/arm/macro-assembler-arm.cc:3194: Please add ASSERTs that src, dst, length and scratch are distinct registers.
DBC https://codereview.chromium.org/23480027/diff/3001/src/arm/macro-assembler-ar... File src/arm/macro-assembler-arm.cc (right): https://codereview.chromium.org/23480027/diff/3001/src/arm/macro-assembler-ar... src/arm/macro-assembler-arm.cc:3199: tst(src, Operand(kPointerSize - 1)); This code could be improve further, see the tail end of the copy in codegen-arm.cc:CreateMemCopyUint8Function. https://codereview.chromium.org/23480027/diff/3001/src/arm/macro-assembler-ar... src/arm/macro-assembler-arm.cc:3247: bind(&byte_loop); here too you could avoid the small loop.
On 2013/09/03 14:47:06, Rodolph Perfetta wrote: > DBC > > https://codereview.chromium.org/23480027/diff/3001/src/arm/macro-assembler-ar... > File src/arm/macro-assembler-arm.cc (right): > > https://codereview.chromium.org/23480027/diff/3001/src/arm/macro-assembler-ar... > src/arm/macro-assembler-arm.cc:3199: tst(src, Operand(kPointerSize - 1)); > This code could be improve further, see the tail end of the copy in > codegen-arm.cc:CreateMemCopyUint8Function. > > https://codereview.chromium.org/23480027/diff/3001/src/arm/macro-assembler-ar... > src/arm/macro-assembler-arm.cc:3247: bind(&byte_loop); > here too you could avoid the small loop. Good suggestion, I am working on it. Thanks a lot.
On 2013/09/03 11:54:35, Benedikt Meurer wrote: > CC'ing Sven for additional feedback. > > The patch looks good so far, however: > > (1) While we are at it: Wouldn't it be faster to compute src + length in the > length register on startup, yiedling send, and afterwards just compare src and > send, so you don't need to increment src and decrement length? That'd at least > save a few instructions. > > (2) Please add a test case for the function. > > https://codereview.chromium.org/23480027/diff/3001/src/arm/macro-assembler-ar... > File src/arm/macro-assembler-arm.cc (right): > > https://codereview.chromium.org/23480027/diff/3001/src/arm/macro-assembler-ar... > src/arm/macro-assembler-arm.cc:3194: > Please add ASSERTs that src, dst, length and scratch are distinct registers. It's good idea to remove the decrement length, I am working on it. Currently there are 21 test cases are calling CopyBytes function: === mjsunit/array-constructor === === mjsunit/array-functions-prototype-misc === === mjsunit/array-join === === mjsunit/array-tostring === === mjsunit/cyclic-array-to-string === === mjsunit/debug-evaluate-locals-optimized-double === === mjsunit/debug-evaluate-locals-optimized === === mjsunit/debug-handle === === mjsunit/json === === mjsunit/json2 === === mjsunit/regexp-call-as-function === === mjsunit/regexp === === mjsunit/compiler/optimized-for-in === === mjsunit/compiler/regress-closures-with-eval === === mjsunit/harmony/proxies-json === === mjsunit/regress/regress-1209 === === mjsunit/regress/regress-2470 === === mjsunit/regress/regress-2612 === === mjsunit/regress/regress-crbug-3184 === === mjsunit/third_party/regexp-pcre === === mjsunit/tools/tickprocessor === I would be more than happy to add a new test case if you still think it's necessary. Thanks a lot.
> Currently there are 21 test cases are calling CopyBytes function: > === mjsunit/array-constructor === > === mjsunit/array-functions-prototype-misc === > === mjsunit/array-join === > === mjsunit/array-tostring === > === mjsunit/cyclic-array-to-string === > === mjsunit/debug-evaluate-locals-optimized-double === > === mjsunit/debug-evaluate-locals-optimized === > === mjsunit/debug-handle === > === mjsunit/json === > === mjsunit/json2 === > === mjsunit/regexp-call-as-function === > === mjsunit/regexp === > === mjsunit/compiler/optimized-for-in === > === mjsunit/compiler/regress-closures-with-eval === > === mjsunit/harmony/proxies-json === > === mjsunit/regress/regress-1209 === > === mjsunit/regress/regress-2470 === > === mjsunit/regress/regress-2612 === > === mjsunit/regress/regress-crbug-3184 === > === mjsunit/third_party/regexp-pcre === > === mjsunit/tools/tickprocessor === > > I would be more than happy to add a new test case if you still think it's > necessary. Yes, please add a dedicated test in cctest/test-macro-assembler-arm.cc, you'll have to create that file. We need more explicit test coverage for the macro assemblers.
On 2013/09/04 10:17:50, Benedikt Meurer wrote: > > Currently there are 21 test cases are calling CopyBytes function: > > === mjsunit/array-constructor === > > === mjsunit/array-functions-prototype-misc === > > === mjsunit/array-join === > > === mjsunit/array-tostring === > > === mjsunit/cyclic-array-to-string === > > === mjsunit/debug-evaluate-locals-optimized-double === > > === mjsunit/debug-evaluate-locals-optimized === > > === mjsunit/debug-handle === > > === mjsunit/json === > > === mjsunit/json2 === > > === mjsunit/regexp-call-as-function === > > === mjsunit/regexp === > > === mjsunit/compiler/optimized-for-in === > > === mjsunit/compiler/regress-closures-with-eval === > > === mjsunit/harmony/proxies-json === > > === mjsunit/regress/regress-1209 === > > === mjsunit/regress/regress-2470 === > > === mjsunit/regress/regress-2612 === > > === mjsunit/regress/regress-crbug-3184 === > > === mjsunit/third_party/regexp-pcre === > > === mjsunit/tools/tickprocessor === > > > > I would be more than happy to add a new test case if you still think it's > > necessary. > > Yes, please add a dedicated test in cctest/test-macro-assembler-arm.cc, you'll > have to create that file. We need more explicit test coverage for the macro > assemblers. cctest/test-macro-assembler-arm.cc is added and some updates, could you please have a look?
https://codereview.chromium.org/23480027/diff/10001/src/arm/macro-assembler-a... File src/arm/macro-assembler-arm.cc (right): https://codereview.chromium.org/23480027/diff/10001/src/arm/macro-assembler-a... src/arm/macro-assembler-arm.cc:3200: sub(ip, length, Operand(4)); Hm, this use of ip doesn't sit well with me. Using ip in the ARM backend is like calling for trouble, because it's not always obvious to anyone whether it is preserved across a bunch of operations or not (an inherent design flaw in the ARM backend). That being said, I don't like the new code. Looking at this from a highlevel point of view, I think the main optimization opportunity is copying by word. But although we optimize depending on src alignment, we never check to see whether dst is actually word aligned. I also don't like the overlap with the custom MemCopy function in codegen-arm.cc.
On 2013/09/10 11:29:21, Benedikt Meurer wrote: > https://codereview.chromium.org/23480027/diff/10001/src/arm/macro-assembler-a... > File src/arm/macro-assembler-arm.cc (right): > > https://codereview.chromium.org/23480027/diff/10001/src/arm/macro-assembler-a... > src/arm/macro-assembler-arm.cc:3200: sub(ip, length, Operand(4)); > Hm, this use of ip doesn't sit well with me. Using ip in the ARM backend is like > calling for trouble, because it's not always obvious to anyone whether it is > preserved across a bunch of operations or not (an inherent design flaw in the > ARM backend). > > That being said, I don't like the new code. Looking at this from a highlevel > point of view, I think the main optimization opportunity is copying by word. But > although we optimize depending on src alignment, we never check to see whether > dst is actually word aligned. > > I also don't like the overlap with the custom MemCopy function in > codegen-arm.cc. Is it possible to commit only test case code if you think there is no need to optimize this function?
CC'ing danno and ulan for additional feedback. I think we should really get this CopyBytes() thing right. Maybe you should try by investigating the following question first: How often is CopyBytes() called with compatible alignment of src and dst? And we should base our optimizations on the result of that.
Please split the test case into a separate CL, so we can commit that.
I think the change is correct. Do you know how it affects the benchmarks? If there is improvement, we can land it. https://codereview.chromium.org/23480027/diff/10001/src/arm/macro-assembler-a... File src/arm/macro-assembler-arm.cc (right): https://codereview.chromium.org/23480027/diff/10001/src/arm/macro-assembler-a... src/arm/macro-assembler-arm.cc:3244: b(lt, &word_loop); Any reason why this is not "b(le, &word_loop)"? I think it should work and would be consistent with the unaligned_accesses case.
On 2013/09/13 09:26:26, ulan wrote: > I think the change is correct. Do you know how it affects the benchmarks? If > there is improvement, we can land it. > > https://codereview.chromium.org/23480027/diff/10001/src/arm/macro-assembler-a... > File src/arm/macro-assembler-arm.cc (right): > > https://codereview.chromium.org/23480027/diff/10001/src/arm/macro-assembler-a... > src/arm/macro-assembler-arm.cc:3244: b(lt, &word_loop); > Any reason why this is not "b(le, &word_loop)"? I think it should work and would > be consistent with the unaligned_accesses case. It speeds up sun spider benchmark by 0.3% with this optimization. I have updated the branch condition to be consistent. Thanks for review.
> It speeds up sun spider benchmark by 0.3% with this optimization. That looks within the noise to me. I ran this CL on other benchmarks, and it was performance neutral there too. Let's land the tests and skip the change CopyBytes, since its complexity doesn't payoff.
On 2013/09/16 12:34:11, ulan wrote: > > It speeds up sun spider benchmark by 0.3% with this optimization. > That looks within the noise to me. I ran this CL on other benchmarks, and it was > performance neutral there too. > > Let's land the tests and skip the change CopyBytes, since its complexity doesn't > payoff. copybytes() is reverted, thanks.
LGTM, I will land it for you. I edited the description to avoid confusion in logs. For the record, the original description: > ARM: CopyBytes() Optimization > The current CopyBytes() doesn't fully support unaligned > copies. This patch supports unaligned word copy with > software alignment if ARM HW supports. > In worst case, the optimized CopyBytes() could speed up > by 3 times.
Message was sent while issue was closed.
Committed patchset #4 manually as r16751 (presubmit successful).
Message was sent while issue was closed.
On 2013/09/17 09:01:20, ulan wrote: > Committed patchset #4 manually as r16751 (presubmit successful). Thanks a lot.
Message was sent while issue was closed.
On 2013/09/17 09:21:43, Bangfu wrote: > On 2013/09/17 09:01:20, ulan wrote: > > Committed patchset #4 manually as r16751 (presubmit successful). > > Thanks a lot. The test does not pass in arm.debug, we had to revert.
Message was sent while issue was closed.
On 2013/09/17 10:35:44, Benedikt Meurer wrote: > On 2013/09/17 09:21:43, Bangfu wrote: > > On 2013/09/17 09:01:20, ulan wrote: > > > Committed patchset #4 manually as r16751 (presubmit successful). > > > > Thanks a lot. > > The test does not pass in arm.debug, we had to revert. There is a bug in the current copyBytes(), which is fixed in my previous patch. Now my test case tricked the bug. Please look at the code, the assert occurred with my new test case: // Copy bytes in word size chunks. if (emit_debug_code()) { tst(src, Operand(kPointerSize - 1)); Assert(eq, kExpectingAlignmentForCopyBytes); } The assert would occur if length is 1 and src address ends with "01 or 10". Do you want me to fix the bug in copyBytes or use my previous patch instead?
Message was sent while issue was closed.
Indeed, there a bug in the current implementation. Could you please reupload your tests with a fix? One possible fix would be to replace: sub(length, length, Operand(1), SetCC); b(ne, &byte_loop_1); // Copy bytes in word size chunks. bind(&word_loop); with sub(length, length, Operand(1), SetCC); b(ne, &align_loop_1); // Copy bytes in word size chunks. bind(&word_loop); And remove the unused aligh_loop label.
Message was sent while issue was closed.
On 2013/09/17 11:24:23, ulan wrote: > Indeed, there a bug in the current implementation. > > Could you please reupload your tests with a fix? > > One possible fix would be to replace: > sub(length, length, Operand(1), SetCC); > b(ne, &byte_loop_1); > // Copy bytes in word size chunks. > bind(&word_loop); > > with > > sub(length, length, Operand(1), SetCC); > b(ne, &align_loop_1); > // Copy bytes in word size chunks. > bind(&word_loop); > > And remove the unused aligh_loop label. Your suggestion will improve things, but the bug won't go away. If the length reaches zero before src get aligned, assert might occur.
Message was sent while issue was closed.
On 2013/09/17 12:27:47, Bangfu wrote: > On 2013/09/17 11:24:23, ulan wrote: > > Indeed, there a bug in the current implementation. > > > > Could you please reupload your tests with a fix? > > > > One possible fix would be to replace: > > sub(length, length, Operand(1), SetCC); > > b(ne, &byte_loop_1); > > // Copy bytes in word size chunks. > > bind(&word_loop); > > > > with > > > > sub(length, length, Operand(1), SetCC); > > b(ne, &align_loop_1); > > // Copy bytes in word size chunks. > > bind(&word_loop); > > > > And remove the unused aligh_loop label. > > Your suggestion will improve things, but the bug won't go away. If the length > reaches zero before src get aligned, assert might occur. Right, then it is better to compare the length with kPointerSize and jump to byte_loop if it smaller at the very beginning instead of comparing the length with 0. This would also remove the need to check the length in the align_loop.
Message was sent while issue was closed.
On 2013/09/17 12:39:27, ulan wrote: > On 2013/09/17 12:27:47, Bangfu wrote: > > On 2013/09/17 11:24:23, ulan wrote: > > > Indeed, there a bug in the current implementation. > > > > > > Could you please reupload your tests with a fix? > > > > > > One possible fix would be to replace: > > > sub(length, length, Operand(1), SetCC); > > > b(ne, &byte_loop_1); > > > // Copy bytes in word size chunks. > > > bind(&word_loop); > > > > > > with > > > > > > sub(length, length, Operand(1), SetCC); > > > b(ne, &align_loop_1); > > > // Copy bytes in word size chunks. > > > bind(&word_loop); > > > > > > And remove the unused aligh_loop label. > > > > Your suggestion will improve things, but the bug won't go away. If the length > > reaches zero before src get aligned, assert might occur. > Right, then it is better to compare the length with kPointerSize and jump to > byte_loop if it smaller at the very beginning instead of comparing the length > with 0. This would also remove the need to check the length in the align_loop. Done. By the way, do you want me to add more test cases for ARM macro assembler to be similar like X64 macro?
Message was sent while issue was closed.
> Done. Thank you! Could you please create a new CL issue with description "Reland r16751 with fix." And upload this fix and test there. This would be better for logs. If you are using git-svn, you can do it in the old git branch with: > git cl issue 0 > git cl upload > By the way, do you want me to add more test cases for ARM macro assembler to be > similar like X64 macro? That might uncover more bugs. If you have time, then please go ahead with it. |