|
|
Chromium Code Reviews|
Created:
4 years, 3 months ago by Prashant.Patil Modified:
4 years, 2 months ago Reviewers:
Ken Russell (switch to Gerrit), Dirk Pranke, Raymond Toy, Zhenyao Mo, prashant.patil, Stephen Chennney CC:
chromium-reviews, blink-reviews, hongchan, raghu.gandham_imgtec.com, gordana.cmiljanovic_imgtec.com, manojkumar.bhosale, parag.salasakar_imgtec.com, kaustubh.raste_imgtec.com Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionAdd MSA (MIPS SIMD Arch) optimized VectorMath functions
We add following MSA optimized VectorMath functions:
- vsma
- vsmul
- vadd
- vmul
- vmaxmgv
- vclip
R=kbr@chromium.org
BUG=642653
Committed: https://crrev.com/95d7e1276aed8f2d2687de388d83919b813c78dc
Cr-Commit-Position: refs/heads/master@{#424977}
Patch Set 1 #Patch Set 2 : Removing zvmul and vsvesq #
Total comments: 25
Patch Set 3 : Incorporate review comments #
Total comments: 1
Patch Set 4 : Rebase #Patch Set 5 : Incorporate review comments #
Total comments: 4
Patch Set 6 : Incorporate review comments #Patch Set 7 : Add MSA (MIPS SIMD Arch) optimized VectorMath functions #Patch Set 8 : coding consistency changes #
Total comments: 2
Patch Set 9 : Incorporated review comments #Patch Set 10 : formatting changes #
Messages
Total messages: 61 (17 generated)
Description was changed from ========== Add MSA (MIPS SIMD Arch) optimized VectorMath functions We add following MSA optimized VectorMath functions: - vsma - vsmul - vadd - vmul - zvmul - vsvesq - vmaxmgv - vclip R=kbr@chromium.org BUG=642653 ========== to ========== Add MSA (MIPS SIMD Arch) optimized VectorMath functions We add following MSA optimized VectorMath functions: - vsma - vsmul - vadd - vmul - zvmul - vsvesq - vmaxmgv - vclip R=kbr@chromium.org BUG=642653 ==========
prashant.patil@imgtec.com changed reviewers: + dpranke@chromium.org, schenney@chromium.org, zmo@chromium.org
dpranke@chromium.org changed reviewers: - dpranke@chromium.org
rtoy@chromium.org changed reviewers: + dpranke@chromium.org, rtoy@chromium.org
It will take me some time to review this. Can you please run the webaudio layout tests to make sure things still pass with these optimizations? Can you also provide performance numbers for each of these routines compared to the non-optimize versions? If the gain is too small, we'd rather not have to have these hard-to-read and maintain code.
On 2016/09/14 17:14:11, Raymond Toy wrote: > It will take me some time to review this. > > Can you please run the webaudio layout tests to make sure things still pass with > these optimizations? > > Can you also provide performance numbers for each of these routines compared to > the non-optimize versions? If the gain is too small, we'd rather not have to > have these hard-to-read and maintain code. Tested compliance tests with these MSA optimizations. The performance numbers compared to non-optimized version are: vsma - 10x vsmul, vadd, vmul, zvmul - 5x vsvesq - 4x vmaxmgv - 10x vclip - 7x
On 2016/09/16 07:56:01, Prashant.Patil wrote: > On 2016/09/14 17:14:11, Raymond Toy wrote: > > It will take me some time to review this. > > > > Can you please run the webaudio layout tests to make sure things still pass > with > > these optimizations? > > > > Can you also provide performance numbers for each of these routines compared > to > > the non-optimize versions? If the gain is too small, we'd rather not have to > > have these hard-to-read and maintain code. > > > Tested compliance tests with these MSA optimizations. > > The performance numbers compared to non-optimized version are: > vsma - 10x > vsmul, vadd, vmul, zvmul - 5x > vsvesq - 4x > vmaxmgv - 10x > vclip - 7x Very nice speed improvements! Did you just write a small test to compare the performance of these routines with and without MSA? What compliance tests did you run?
On 2016/09/16 15:17:35, Raymond Toy wrote: > On 2016/09/16 07:56:01, Prashant.Patil wrote: > > On 2016/09/14 17:14:11, Raymond Toy wrote: > > > It will take me some time to review this. > > > > > > Can you please run the webaudio layout tests to make sure things still pass > > with > > > these optimizations? > > > > > > Can you also provide performance numbers for each of these routines compared > > to > > > the non-optimize versions? If the gain is too small, we'd rather not have > to > > > have these hard-to-read and maintain code. > > > > > > Tested compliance tests with these MSA optimizations. > > > > The performance numbers compared to non-optimized version are: > > vsma - 10x > > vsmul, vadd, vmul, zvmul - 5x > > vsvesq - 4x > > vmaxmgv - 10x > > vclip - 7x > > Very nice speed improvements! Did you just write a small test to compare the > performance > of these routines with and without MSA? > > What compliance tests did you run? Yes. We had written standalone tests to compare the performance with and without MSA. Also tested for compliance with c version using standalone tests.
On 2016/09/19 06:55:11, Prashant.Patil wrote: > On 2016/09/16 15:17:35, Raymond Toy wrote: > > On 2016/09/16 07:56:01, Prashant.Patil wrote: > > > On 2016/09/14 17:14:11, Raymond Toy wrote: > > > > It will take me some time to review this. > > > > > > > > Can you please run the webaudio layout tests to make sure things still > pass > > > with > > > > these optimizations? > > > > > > > > Can you also provide performance numbers for each of these routines > compared > > > to > > > > the non-optimize versions? If the gain is too small, we'd rather not have > > to > > > > have these hard-to-read and maintain code. > > > > > > > > > Tested compliance tests with these MSA optimizations. > > > > > > The performance numbers compared to non-optimized version are: > > > vsma - 10x > > > vsmul, vadd, vmul, zvmul - 5x > > > vsvesq - 4x > > > vmaxmgv - 10x > > > vclip - 7x > > > > Very nice speed improvements! Did you just write a small test to compare the > > performance > > of these routines with and without MSA? > > > > What compliance tests did you run? > > > Yes. We had written standalone tests to compare the performance with and without > MSA. > > Also tested for compliance with c version using standalone tests. Thanks for the info. Can you also run the webaudio layout tests? They've been pretty good at finding small changes, and caught a few issues when SSE2 support was added.
On 2016/09/19 15:37:31, Raymond Toy-OOO TPAC Sep20-24 wrote: > On 2016/09/19 06:55:11, Prashant.Patil wrote: > > On 2016/09/16 15:17:35, Raymond Toy wrote: > > > On 2016/09/16 07:56:01, Prashant.Patil wrote: > > > > On 2016/09/14 17:14:11, Raymond Toy wrote: > > > > > It will take me some time to review this. > > > > > > > > > > Can you please run the webaudio layout tests to make sure things still > > pass > > > > with > > > > > these optimizations? > > > > > > > > > > Can you also provide performance numbers for each of these routines > > compared > > > > to > > > > > the non-optimize versions? If the gain is too small, we'd rather not > have > > > to > > > > > have these hard-to-read and maintain code. > > > > > > > > > > > > Tested compliance tests with these MSA optimizations. > > > > > > > > The performance numbers compared to non-optimized version are: > > > > vsma - 10x > > > > vsmul, vadd, vmul, zvmul - 5x > > > > vsvesq - 4x > > > > vmaxmgv - 10x > > > > vclip - 7x > > > > > > Very nice speed improvements! Did you just write a small test to compare > the > > > performance > > > of these routines with and without MSA? > > > > > > What compliance tests did you run? > > > > > > Yes. We had written standalone tests to compare the performance with and > without > > MSA. > > > > Also tested for compliance with c version using standalone tests. > > Thanks for the info. > > Can you also run the webaudio layout tests? They've been pretty good at finding > small changes, and caught a few issues when SSE2 support was added. As i mentioned earlier, we have tested these MSA optimized kernels for all possible combinations/corner cases locally. As of today we don't have android device with MSA to run webaudio layout tests.
On 2016/09/21 12:14:20, Prashant.Patil wrote: > On 2016/09/19 15:37:31, Raymond Toy-OOO TPAC Sep20-24 wrote: > > On 2016/09/19 06:55:11, Prashant.Patil wrote: > > > On 2016/09/16 15:17:35, Raymond Toy wrote: > > > > On 2016/09/16 07:56:01, Prashant.Patil wrote: > > > > > On 2016/09/14 17:14:11, Raymond Toy wrote: > > > > > > It will take me some time to review this. > > > > > > > > > > > > Can you please run the webaudio layout tests to make sure things still > > > pass > > > > > with > > > > > > these optimizations? > > > > > > > > > > > > Can you also provide performance numbers for each of these routines > > > compared > > > > > to > > > > > > the non-optimize versions? If the gain is too small, we'd rather not > > have > > > > to > > > > > > have these hard-to-read and maintain code. > > > > > > > > > > > > > > > Tested compliance tests with these MSA optimizations. > > > > > > > > > > The performance numbers compared to non-optimized version are: > > > > > vsma - 10x > > > > > vsmul, vadd, vmul, zvmul - 5x > > > > > vsvesq - 4x > > > > > vmaxmgv - 10x > > > > > vclip - 7x > > > > > > > > Very nice speed improvements! Did you just write a small test to compare > > the > > > > performance > > > > of these routines with and without MSA? > > > > > > > > What compliance tests did you run? > > > > > > > > > Yes. We had written standalone tests to compare the performance with and > > without > > > MSA. > > > > > > Also tested for compliance with c version using standalone tests. > > > > Thanks for the info. > > > > Can you also run the webaudio layout tests? They've been pretty good at > finding > > small changes, and caught a few issues when SSE2 support was added. > > As i mentioned earlier, we have tested these MSA optimized kernels for all > possible combinations/corner cases locally. As of today we don't have android > device with MSA to run webaudio layout tests. Raymond, if you give a thumbs-up I'll approve this.
On 2016/09/22 00:46:04, Ken Russell wrote: > On 2016/09/21 12:14:20, Prashant.Patil wrote: > > On 2016/09/19 15:37:31, Raymond Toy-OOO TPAC Sep20-24 wrote: > > > On 2016/09/19 06:55:11, Prashant.Patil wrote: > > > > On 2016/09/16 15:17:35, Raymond Toy wrote: > > > > > On 2016/09/16 07:56:01, Prashant.Patil wrote: > > > > > > On 2016/09/14 17:14:11, Raymond Toy wrote: > > > > > > > It will take me some time to review this. > > > > > > > > > > > > > > Can you please run the webaudio layout tests to make sure things > still > > > > pass > > > > > > with > > > > > > > these optimizations? > > > > > > > > > > > > > > Can you also provide performance numbers for each of these routines > > > > compared > > > > > > to > > > > > > > the non-optimize versions? If the gain is too small, we'd rather > not > > > have > > > > > to > > > > > > > have these hard-to-read and maintain code. > > > > > > > > > > > > > > > > > > Tested compliance tests with these MSA optimizations. > > > > > > > > > > > > The performance numbers compared to non-optimized version are: > > > > > > vsma - 10x > > > > > > vsmul, vadd, vmul, zvmul - 5x > > > > > > vsvesq - 4x > > > > > > vmaxmgv - 10x > > > > > > vclip - 7x > > > > > > > > > > Very nice speed improvements! Did you just write a small test to > compare > > > the > > > > > performance > > > > > of these routines with and without MSA? > > > > > > > > > > What compliance tests did you run? > > > > > > > > > > > > Yes. We had written standalone tests to compare the performance with and > > > without > > > > MSA. > > > > > > > > Also tested for compliance with c version using standalone tests. > > > > > > Thanks for the info. > > > > > > Can you also run the webaudio layout tests? They've been pretty good at > > finding > > > small changes, and caught a few issues when SSE2 support was added. > > > > As i mentioned earlier, we have tested these MSA optimized kernels for all > > possible combinations/corner cases locally. As of today we don't have android > > device with MSA to run webaudio layout tests. > > Raymond, if you give a thumbs-up I'll approve this. Raymond, your view on this? should we go for commit?
On 2016/09/28 09:18:07, Prashant.Patil wrote: > On 2016/09/22 00:46:04, Ken Russell wrote: > > On 2016/09/21 12:14:20, Prashant.Patil wrote: > > > On 2016/09/19 15:37:31, Raymond Toy-OOO TPAC Sep20-24 wrote: > > > > On 2016/09/19 06:55:11, Prashant.Patil wrote: > > > > > On 2016/09/16 15:17:35, Raymond Toy wrote: > > > > > > On 2016/09/16 07:56:01, Prashant.Patil wrote: > > > > > > > On 2016/09/14 17:14:11, Raymond Toy wrote: > > > > > > > > It will take me some time to review this. > > > > > > > > > > > > > > > > Can you please run the webaudio layout tests to make sure things > > still > > > > > pass > > > > > > > with > > > > > > > > these optimizations? > > > > > > > > > > > > > > > > Can you also provide performance numbers for each of these > routines > > > > > compared > > > > > > > to > > > > > > > > the non-optimize versions? If the gain is too small, we'd rather > > not > > > > have > > > > > > to > > > > > > > > have these hard-to-read and maintain code. > > > > > > > > > > > > > > > > > > > > > Tested compliance tests with these MSA optimizations. > > > > > > > > > > > > > > The performance numbers compared to non-optimized version are: > > > > > > > vsma - 10x > > > > > > > vsmul, vadd, vmul, zvmul - 5x > > > > > > > vsvesq - 4x > > > > > > > vmaxmgv - 10x > > > > > > > vclip - 7x > > > > > > > > > > > > Very nice speed improvements! Did you just write a small test to > > compare > > > > the > > > > > > performance > > > > > > of these routines with and without MSA? > > > > > > > > > > > > What compliance tests did you run? > > > > > > > > > > > > > > > Yes. We had written standalone tests to compare the performance with and > > > > without > > > > > MSA. > > > > > > > > > > Also tested for compliance with c version using standalone tests. > > > > > > > > Thanks for the info. > > > > > > > > Can you also run the webaudio layout tests? They've been pretty good at > > > finding > > > > small changes, and caught a few issues when SSE2 support was added. > > > > > > As i mentioned earlier, we have tested these MSA optimized kernels for all > > > possible combinations/corner cases locally. As of today we don't have > android > > > device with MSA to run webaudio layout tests. > > > > Raymond, if you give a thumbs-up I'll approve this. > > Raymond, your view on this? should we go for commit? The general idea is great and we should do this. However, I've been away at TPAC so I haven't reviewed the code yet. Could you tell me how much difference in numerical results there is between the scalar and simd code? Perhaps as an SNR measure between the scalar and simd results? Or were the results exactly identical?
On 2016/09/28 16:01:23, Raymond Toy wrote: > On 2016/09/28 09:18:07, Prashant.Patil wrote: > > On 2016/09/22 00:46:04, Ken Russell wrote: > > > On 2016/09/21 12:14:20, Prashant.Patil wrote: > > > > On 2016/09/19 15:37:31, Raymond Toy-OOO TPAC Sep20-24 wrote: > > > > > On 2016/09/19 06:55:11, Prashant.Patil wrote: > > > > > > On 2016/09/16 15:17:35, Raymond Toy wrote: > > > > > > > On 2016/09/16 07:56:01, Prashant.Patil wrote: > > > > > > > > On 2016/09/14 17:14:11, Raymond Toy wrote: > > > > > > > > > It will take me some time to review this. > > > > > > > > > > > > > > > > > > Can you please run the webaudio layout tests to make sure things > > > still > > > > > > pass > > > > > > > > with > > > > > > > > > these optimizations? > > > > > > > > > > > > > > > > > > Can you also provide performance numbers for each of these > > routines > > > > > > compared > > > > > > > > to > > > > > > > > > the non-optimize versions? If the gain is too small, we'd > rather > > > not > > > > > have > > > > > > > to > > > > > > > > > have these hard-to-read and maintain code. > > > > > > > > > > > > > > > > > > > > > > > > Tested compliance tests with these MSA optimizations. > > > > > > > > > > > > > > > > The performance numbers compared to non-optimized version are: > > > > > > > > vsma - 10x > > > > > > > > vsmul, vadd, vmul, zvmul - 5x > > > > > > > > vsvesq - 4x > > > > > > > > vmaxmgv - 10x > > > > > > > > vclip - 7x > > > > > > > > > > > > > > Very nice speed improvements! Did you just write a small test to > > > compare > > > > > the > > > > > > > performance > > > > > > > of these routines with and without MSA? > > > > > > > > > > > > > > What compliance tests did you run? > > > > > > > > > > > > > > > > > > Yes. We had written standalone tests to compare the performance with > and > > > > > without > > > > > > MSA. > > > > > > > > > > > > Also tested for compliance with c version using standalone tests. > > > > > > > > > > Thanks for the info. > > > > > > > > > > Can you also run the webaudio layout tests? They've been pretty good at > > > > finding > > > > > small changes, and caught a few issues when SSE2 support was added. > > > > > > > > As i mentioned earlier, we have tested these MSA optimized kernels for all > > > > possible combinations/corner cases locally. As of today we don't have > > android > > > > device with MSA to run webaudio layout tests. > > > > > > Raymond, if you give a thumbs-up I'll approve this. > > > > Raymond, your view on this? should we go for commit? > > The general idea is great and we should do this. However, I've been away at > TPAC so I haven't reviewed the code yet. > > Could you tell me how much difference in numerical results there is between the > scalar and simd code? Perhaps as an SNR measure between the scalar and simd > results? Or were the results exactly identical? 1. The scalar and simd code for following functions generate exactly identical results - vsma , vsmul, vadd, vmul, vmaxmgv, vclip 2. zvmul - In scalar code : The operation is (a*b) +/- (c*d). Multiplications are done first and then addition/subtraction. So 3 rounding errors (2 for multiply, 1 for add/sub) In simd code : The first multiplication (a*b) is done, then fused multiply-add instruciton is used. Here only 2 rounding errors. Hence the simd results are more precise than scalar counterpart. The absolute max error observed between scalar and simd code is 0.005 3. vsvesq - Due to difference in accumulation order in simd and scalar code(Please refer the msa simd code for vsvesq), the standard rounding error is accumulated differently hence the result diverts from that of scalar counterpart.
On 2016/09/29 13:25:00, Prashant.Patil wrote: > On 2016/09/28 16:01:23, Raymond Toy wrote: > > On 2016/09/28 09:18:07, Prashant.Patil wrote: > > > On 2016/09/22 00:46:04, Ken Russell wrote: > > > > On 2016/09/21 12:14:20, Prashant.Patil wrote: > > > > > On 2016/09/19 15:37:31, Raymond Toy-OOO TPAC Sep20-24 wrote: > > > > > > On 2016/09/19 06:55:11, Prashant.Patil wrote: > > > > > > > On 2016/09/16 15:17:35, Raymond Toy wrote: > > > > > > > > On 2016/09/16 07:56:01, Prashant.Patil wrote: > > > > > > > > > On 2016/09/14 17:14:11, Raymond Toy wrote: > > > > > > > > > > It will take me some time to review this. > > > > > > > > > > > > > > > > > > > > Can you please run the webaudio layout tests to make sure > things > > > > still > > > > > > > pass > > > > > > > > > with > > > > > > > > > > these optimizations? > > > > > > > > > > > > > > > > > > > > Can you also provide performance numbers for each of these > > > routines > > > > > > > compared > > > > > > > > > to > > > > > > > > > > the non-optimize versions? If the gain is too small, we'd > > rather > > > > not > > > > > > have > > > > > > > > to > > > > > > > > > > have these hard-to-read and maintain code. > > > > > > > > > > > > > > > > > > > > > > > > > > > Tested compliance tests with these MSA optimizations. > > > > > > > > > > > > > > > > > > The performance numbers compared to non-optimized version are: > > > > > > > > > vsma - 10x > > > > > > > > > vsmul, vadd, vmul, zvmul - 5x > > > > > > > > > vsvesq - 4x > > > > > > > > > vmaxmgv - 10x > > > > > > > > > vclip - 7x > > > > > > > > > > > > > > > > Very nice speed improvements! Did you just write a small test to > > > > compare > > > > > > the > > > > > > > > performance > > > > > > > > of these routines with and without MSA? > > > > > > > > > > > > > > > > What compliance tests did you run? > > > > > > > > > > > > > > > > > > > > > Yes. We had written standalone tests to compare the performance with > > and > > > > > > without > > > > > > > MSA. > > > > > > > > > > > > > > Also tested for compliance with c version using standalone tests. > > > > > > > > > > > > Thanks for the info. > > > > > > > > > > > > Can you also run the webaudio layout tests? They've been pretty good > at > > > > > finding > > > > > > small changes, and caught a few issues when SSE2 support was added. > > > > > > > > > > As i mentioned earlier, we have tested these MSA optimized kernels for > all > > > > > possible combinations/corner cases locally. As of today we don't have > > > android > > > > > device with MSA to run webaudio layout tests. > > > > > > > > Raymond, if you give a thumbs-up I'll approve this. > > > > > > Raymond, your view on this? should we go for commit? > > > > The general idea is great and we should do this. However, I've been away at > > TPAC so I haven't reviewed the code yet. > > > > Could you tell me how much difference in numerical results there is between > the > > scalar and simd code? Perhaps as an SNR measure between the scalar and simd > > results? Or were the results exactly identical? > > 1. The scalar and simd code for following functions generate exactly identical > results - > vsma , vsmul, vadd, vmul, vmaxmgv, vclip Fantastic! I have no problems including these simd optimizations for these operations. > > 2. zvmul - > In scalar code : The operation is (a*b) +/- (c*d). Multiplications are done > first and then addition/subtraction. So 3 rounding errors (2 for multiply, 1 for > add/sub) > In simd code : The first multiplication (a*b) is done, then fused > multiply-add instruciton is used. Here only 2 rounding errors. > Hence the simd results are more precise than scalar counterpart. The absolute > max error observed between scalar and simd code is 0.005 Without knowing the magnitude of the input numbers, I can't really conclude anything meaningful, but on the face of it an error of 0.005 is huge and unacceptable. Also, I recall some paper by Prof. Kahan about fused multiplies for these kinds of computations. Depending on which product is done in the FMA (a*b or c*d), the result could be erroneously negative for something that should have been positive. Unfortunately, I can't seem to find that paper. Thus, I'm a bit concerned about using an FMA here. > > 3. vsvesq - > Due to difference in accumulation order in simd and scalar code(Please refer > the msa simd code for vsvesq), the standard rounding error is accumulated > differently hence the result diverts from that of scalar counterpart. Sure, but how much? It's really important to make sure the simd optimizations don't change the computed results by too much. Such differences might be audible if combined in certain ways.
On 2016/09/29 18:12:24, Raymond Toy wrote: > On 2016/09/29 13:25:00, Prashant.Patil wrote: > > On 2016/09/28 16:01:23, Raymond Toy wrote: > > > On 2016/09/28 09:18:07, Prashant.Patil wrote: > > > > On 2016/09/22 00:46:04, Ken Russell wrote: > > > > > On 2016/09/21 12:14:20, Prashant.Patil wrote: > > > > > > On 2016/09/19 15:37:31, Raymond Toy-OOO TPAC Sep20-24 wrote: > > > > > > > On 2016/09/19 06:55:11, Prashant.Patil wrote: > > > > > > > > On 2016/09/16 15:17:35, Raymond Toy wrote: > > > > > > > > > On 2016/09/16 07:56:01, Prashant.Patil wrote: > > > > > > > > > > On 2016/09/14 17:14:11, Raymond Toy wrote: > > > > > > > > > > > It will take me some time to review this. > > > > > > > > > > > > > > > > > > > > > > Can you please run the webaudio layout tests to make sure > > things > > > > > still > > > > > > > > pass > > > > > > > > > > with > > > > > > > > > > > these optimizations? > > > > > > > > > > > > > > > > > > > > > > Can you also provide performance numbers for each of these > > > > routines > > > > > > > > compared > > > > > > > > > > to > > > > > > > > > > > the non-optimize versions? If the gain is too small, we'd > > > rather > > > > > not > > > > > > > have > > > > > > > > > to > > > > > > > > > > > have these hard-to-read and maintain code. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Tested compliance tests with these MSA optimizations. > > > > > > > > > > > > > > > > > > > > The performance numbers compared to non-optimized version are: > > > > > > > > > > vsma - 10x > > > > > > > > > > vsmul, vadd, vmul, zvmul - 5x > > > > > > > > > > vsvesq - 4x > > > > > > > > > > vmaxmgv - 10x > > > > > > > > > > vclip - 7x > > > > > > > > > > > > > > > > > > Very nice speed improvements! Did you just write a small test > to > > > > > compare > > > > > > > the > > > > > > > > > performance > > > > > > > > > of these routines with and without MSA? > > > > > > > > > > > > > > > > > > What compliance tests did you run? > > > > > > > > > > > > > > > > > > > > > > > > Yes. We had written standalone tests to compare the performance > with > > > and > > > > > > > without > > > > > > > > MSA. > > > > > > > > > > > > > > > > Also tested for compliance with c version using standalone tests. > > > > > > > > > > > > > > Thanks for the info. > > > > > > > > > > > > > > Can you also run the webaudio layout tests? They've been pretty > good > > at > > > > > > finding > > > > > > > small changes, and caught a few issues when SSE2 support was added. > > > > > > > > > > > > As i mentioned earlier, we have tested these MSA optimized kernels for > > all > > > > > > possible combinations/corner cases locally. As of today we don't have > > > > android > > > > > > device with MSA to run webaudio layout tests. > > > > > > > > > > Raymond, if you give a thumbs-up I'll approve this. > > > > > > > > Raymond, your view on this? should we go for commit? > > > > > > The general idea is great and we should do this. However, I've been away at > > > TPAC so I haven't reviewed the code yet. > > > > > > Could you tell me how much difference in numerical results there is between > > the > > > scalar and simd code? Perhaps as an SNR measure between the scalar and simd > > > results? Or were the results exactly identical? > > > > 1. The scalar and simd code for following functions generate exactly identical > > results - > > vsma , vsmul, vadd, vmul, vmaxmgv, vclip > > Fantastic! I have no problems including these simd optimizations for these > operations. > > > > > 2. zvmul - > > In scalar code : The operation is (a*b) +/- (c*d). Multiplications are done > > first and then addition/subtraction. So 3 rounding errors (2 for multiply, 1 > for > > add/sub) > > In simd code : The first multiplication (a*b) is done, then fused > > multiply-add instruciton is used. Here only 2 rounding errors. > > Hence the simd results are more precise than scalar counterpart. The > absolute > > max error observed between scalar and simd code is 0.005 > > Without knowing the magnitude of the input numbers, I can't really conclude > anything meaningful, but on the face of it an error of 0.005 is huge and > unacceptable. > > Also, I recall some paper by Prof. Kahan about fused multiplies for these kinds > of computations. Depending on which product is done in the FMA (a*b or c*d), > the result could be erroneously negative for something that should have been > positive. Unfortunately, I can't seem to find that paper. Thus, I'm a bit > concerned about using an FMA here. > > > > > 3. vsvesq - > > Due to difference in accumulation order in simd and scalar code(Please refer > > the msa simd code for vsvesq), the standard rounding error is accumulated > > differently hence the result diverts from that of scalar counterpart. > > Sure, but how much? > > It's really important to make sure the simd optimizations don't change the > computed results by too much. Such differences might be audible if combined in > certain ways. Thanks for the inputs. I will create the new patch set for 6 matching functions. Meanwhile i will do some more analysis and come up with data later for remaining 2 functions i.e. zvmul and vsvesq.
On 2016/09/30 09:01:42, Prashant.Patil wrote: > On 2016/09/29 18:12:24, Raymond Toy wrote: > > On 2016/09/29 13:25:00, Prashant.Patil wrote: > > > On 2016/09/28 16:01:23, Raymond Toy wrote: > > > > On 2016/09/28 09:18:07, Prashant.Patil wrote: > > > > > On 2016/09/22 00:46:04, Ken Russell wrote: > > > > > > On 2016/09/21 12:14:20, Prashant.Patil wrote: > > > > > > > On 2016/09/19 15:37:31, Raymond Toy-OOO TPAC Sep20-24 wrote: > > > > > > > > On 2016/09/19 06:55:11, Prashant.Patil wrote: > > > > > > > > > On 2016/09/16 15:17:35, Raymond Toy wrote: > > > > > > > > > > On 2016/09/16 07:56:01, Prashant.Patil wrote: > > > > > > > > > > > On 2016/09/14 17:14:11, Raymond Toy wrote: > > > > > > > > > > > > It will take me some time to review this. > > > > > > > > > > > > > > > > > > > > > > > > Can you please run the webaudio layout tests to make sure > > > things > > > > > > still > > > > > > > > > pass > > > > > > > > > > > with > > > > > > > > > > > > these optimizations? > > > > > > > > > > > > > > > > > > > > > > > > Can you also provide performance numbers for each of these > > > > > routines > > > > > > > > > compared > > > > > > > > > > > to > > > > > > > > > > > > the non-optimize versions? If the gain is too small, we'd > > > > rather > > > > > > not > > > > > > > > have > > > > > > > > > > to > > > > > > > > > > > > have these hard-to-read and maintain code. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Tested compliance tests with these MSA optimizations. > > > > > > > > > > > > > > > > > > > > > > The performance numbers compared to non-optimized version > are: > > > > > > > > > > > vsma - 10x > > > > > > > > > > > vsmul, vadd, vmul, zvmul - 5x > > > > > > > > > > > vsvesq - 4x > > > > > > > > > > > vmaxmgv - 10x > > > > > > > > > > > vclip - 7x > > > > > > > > > > > > > > > > > > > > Very nice speed improvements! Did you just write a small test > > to > > > > > > compare > > > > > > > > the > > > > > > > > > > performance > > > > > > > > > > of these routines with and without MSA? > > > > > > > > > > > > > > > > > > > > What compliance tests did you run? > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes. We had written standalone tests to compare the performance > > with > > > > and > > > > > > > > without > > > > > > > > > MSA. > > > > > > > > > > > > > > > > > > Also tested for compliance with c version using standalone > tests. > > > > > > > > > > > > > > > > Thanks for the info. > > > > > > > > > > > > > > > > Can you also run the webaudio layout tests? They've been pretty > > good > > > at > > > > > > > finding > > > > > > > > small changes, and caught a few issues when SSE2 support was > added. > > > > > > > > > > > > > > As i mentioned earlier, we have tested these MSA optimized kernels > for > > > all > > > > > > > possible combinations/corner cases locally. As of today we don't > have > > > > > android > > > > > > > device with MSA to run webaudio layout tests. > > > > > > > > > > > > Raymond, if you give a thumbs-up I'll approve this. > > > > > > > > > > Raymond, your view on this? should we go for commit? > > > > > > > > The general idea is great and we should do this. However, I've been away > at > > > > TPAC so I haven't reviewed the code yet. > > > > > > > > Could you tell me how much difference in numerical results there is > between > > > the > > > > scalar and simd code? Perhaps as an SNR measure between the scalar and > simd > > > > results? Or were the results exactly identical? > > > > > > 1. The scalar and simd code for following functions generate exactly > identical > > > results - > > > vsma , vsmul, vadd, vmul, vmaxmgv, vclip > > > > Fantastic! I have no problems including these simd optimizations for these > > operations. > > > > > > > > 2. zvmul - > > > In scalar code : The operation is (a*b) +/- (c*d). Multiplications are > done > > > first and then addition/subtraction. So 3 rounding errors (2 for multiply, 1 > > for > > > add/sub) > > > In simd code : The first multiplication (a*b) is done, then fused > > > multiply-add instruciton is used. Here only 2 rounding errors. > > > Hence the simd results are more precise than scalar counterpart. The > > absolute > > > max error observed between scalar and simd code is 0.005 > > > > Without knowing the magnitude of the input numbers, I can't really conclude > > anything meaningful, but on the face of it an error of 0.005 is huge and > > unacceptable. > > > > Also, I recall some paper by Prof. Kahan about fused multiplies for these > kinds > > of computations. Depending on which product is done in the FMA (a*b or c*d), > > the result could be erroneously negative for something that should have been > > positive. Unfortunately, I can't seem to find that paper. Thus, I'm a bit > > concerned about using an FMA here. > > > > > > > > 3. vsvesq - > > > Due to difference in accumulation order in simd and scalar code(Please > refer > > > the msa simd code for vsvesq), the standard rounding error is accumulated > > > differently hence the result diverts from that of scalar counterpart. > > > > Sure, but how much? > > > > It's really important to make sure the simd optimizations don't change the > > computed results by too much. Such differences might be audible if combined > in > > certain ways. > > Thanks for the inputs. > I will create the new patch set for 6 matching functions. > Meanwhile i will do some more analysis and come up with data later for remaining > 2 functions i.e. zvmul and vsvesq. Created new patch set (#2) with 6 matching functions only (vsma , vsmul, vadd, vmul, vmaxmgv, vclip) so that we can go with commit.
https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/audio/VectorMath.cpp (right): https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:197: LD_SP8(sourceP, 4, vSrc0, vSrc1, vSrc2, vSrc3, vSrc4, vSrc5, vSrc6, vSrc7); Are there alignment constraints for sourceP and destPCopy? In general AudioArrays are on 16-byte boundaries (I think), but I'm not sure this hold everywhere for all callers. Same issue applies to all of the routines below. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:205: if (n >= 28) { Is there really much to be gained in having this complicated unrolling for all of the possible remainders that aren't a multiple of 32 but are a multiple of 4? The primary use case is for blocks of 128, so this code is never executed. for other uses, I think they're always a power of two (greater than 128), so this code is never executed there either. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:333: for (; n >= 40; n -= 40) { Is it really worth doing blocks of 40 instead of 32? Since the typical use case is powers of two, this guarantees the messy code below is run. Using blocks of 32 keeps a nice symmetry with the other code that uses 32. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:512: LD_SP8(source2P, 4, vSrc4, vSrc5, vSrc6, vSrc7, vSrc12, vSrc13, vSrc14, vSrc15); Can we pick better names for vSrc[n]? It's really odd to see source1P loaded into vSrc0-3 and vSrc8-11. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:519: if (n >= 20) { Is this really worth doing? In the typical use case n is always a multiple of 32. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:679: LD_SP8(source1P, 4, vSrc0, vSrc1, vSrc2, vSrc3, vSrc8, vSrc9, vSrc10, vSrc11); Same comment as in line 512. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:959: LD_SP6(sourceP, 4, vSrc0, vSrc1, vSrc2, vSrc3, vSrc4, vSrc5); Same comment as in line 333. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h (right): https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h:526: */ Comment style here seems inconsistent with the nearby commenting style. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h:532: #define ADD4(in0, in1, in2, in3, in4, in5, in6, in7, out0, out1, out2, out3) \ I know the naming here is consistent with the rest of the file, but it's almost impossible to know which if the in's get added to produce which out. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h:577: Arguments : Inputs - in0, in1, scale Based on the code, out0, and out1 are also inputs. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h:612: Outputs - out0, out1 There are not outputs out0, out1, just the one output, max. And there's also the input mask. Impossible to tell what mask is for other than looking at the code. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h:628: Arguments : Inputs - in0, in1 You forgot min, max as inputs.
https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/audio/VectorMath.cpp (right): https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:197: LD_SP8(sourceP, 4, vSrc0, vSrc1, vSrc2, vSrc3, vSrc4, vSrc5, vSrc6, vSrc7); On 2016/10/03 16:47:06, Raymond Toy wrote: > Are there alignment constraints for sourceP and destPCopy? In general > AudioArrays are on 16-byte boundaries (I think), but I'm not sure this hold > everywhere for all callers. Same issue applies to all of the routines below. There are no alignment constraints https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:205: if (n >= 28) { On 2016/10/03 16:47:06, Raymond Toy wrote: > Is there really much to be gained in having this complicated unrolling for all > of the possible remainders that aren't a multiple of 32 but are a multiple of 4? > > The primary use case is for blocks of 128, so this code is never executed. for > other uses, I think they're always a power of two (greater than 128), so this > code is never executed there either. OK. I shall remove all cases below 32. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:333: for (; n >= 40; n -= 40) { On 2016/10/03 16:47:06, Raymond Toy wrote: > Is it really worth doing blocks of 40 instead of 32? Since the typical use case > is powers of two, this guarantees the messy code below is run. > > Using blocks of 32 keeps a nice symmetry with the other code that uses 32. OK. This was done considering 10 cycle vector load latency in one of the mips variants. However most of the variants have ~8 cycle latency now, hence will change this to 32. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:512: LD_SP8(source2P, 4, vSrc4, vSrc5, vSrc6, vSrc7, vSrc12, vSrc13, vSrc14, vSrc15); On 2016/10/03 16:47:06, Raymond Toy wrote: > Can we pick better names for vSrc[n]? It's really odd to see source1P loaded > into vSrc0-3 and vSrc8-11. Done. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:519: if (n >= 20) { On 2016/10/03 16:47:06, Raymond Toy wrote: > Is this really worth doing? In the typical use case n is always a multiple of > 32. Done. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:679: LD_SP8(source1P, 4, vSrc0, vSrc1, vSrc2, vSrc3, vSrc8, vSrc9, vSrc10, vSrc11); On 2016/10/03 16:47:06, Raymond Toy wrote: > Same comment as in line 512. Done. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:959: LD_SP6(sourceP, 4, vSrc0, vSrc1, vSrc2, vSrc3, vSrc4, vSrc5); On 2016/10/03 16:47:06, Raymond Toy wrote: > Same comment as in line 333. Done. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h (right): https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h:526: */ On 2016/10/03 16:47:06, Raymond Toy wrote: > Comment style here seems inconsistent with the nearby commenting style. Done. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h:532: #define ADD4(in0, in1, in2, in3, in4, in5, in6, in7, out0, out1, out2, out3) \ On 2016/10/03 16:47:06, Raymond Toy wrote: > I know the naming here is consistent with the rest of the file, but it's almost > impossible to know which if the in's get added to produce which out. I will add macro description for better understanding. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h:577: Arguments : Inputs - in0, in1, scale On 2016/10/03 16:47:06, Raymond Toy wrote: > Based on the code, out0, and out1 are also inputs. Done. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h:612: Outputs - out0, out1 On 2016/10/03 16:47:06, Raymond Toy wrote: > There are not outputs out0, out1, just the one output, max. And there's also > the input mask. > > Impossible to tell what mask is for other than looking at the code. I will update the macro usage for better code readability. https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h:628: Arguments : Inputs - in0, in1 On 2016/10/03 16:47:06, Raymond Toy wrote: > You forgot min, max as inputs. Done.
Description was changed from ========== Add MSA (MIPS SIMD Arch) optimized VectorMath functions We add following MSA optimized VectorMath functions: - vsma - vsmul - vadd - vmul - zvmul - vsvesq - vmaxmgv - vclip R=kbr@chromium.org BUG=642653 ========== to ========== Add MSA (MIPS SIMD Arch) optimized VectorMath functions We add following MSA optimized VectorMath functions: - vsma - vsmul - vadd - vmul - vmaxmgv - vclip R=kbr@chromium.org BUG=642653 ==========
https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/audio/VectorMath.cpp (right): https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:333: for (; n >= 40; n -= 40) { On 2016/10/04 11:47:27, Prashant.Patil wrote: > On 2016/10/03 16:47:06, Raymond Toy wrote: > > Is it really worth doing blocks of 40 instead of 32? Since the typical use > case > > is powers of two, this guarantees the messy code below is run. > > > > Using blocks of 32 keeps a nice symmetry with the other code that uses 32. > > OK. This was done considering 10 cycle vector load latency in one of the mips > variants. However most of the variants have ~8 cycle latency now, hence will > change this to 32. If there is significant gain in doing this, by all means let's do it. It just makes thinking about the code much harder.
https://codereview.chromium.org/2340583003/diff/40001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/audio/VectorMath.cpp (right): https://codereview.chromium.org/2340583003/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:859: const v4f32 vHighThr = (v4f32)__msa_fill_w(*((int32_t*)highThresholdP)); Use C++ casting here instead of C. However, is this guaranteed to do the right thing with this kind of type punning? I always forget the complicated rules. I always use a union of int and float for this kind of punning.
On 2016/10/04 15:37:44, Raymond Toy wrote: > https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... > File third_party/WebKit/Source/platform/audio/VectorMath.cpp (right): > > https://codereview.chromium.org/2340583003/diff/20001/third_party/WebKit/Sour... > third_party/WebKit/Source/platform/audio/VectorMath.cpp:333: for (; n >= 40; n > -= 40) { > On 2016/10/04 11:47:27, Prashant.Patil wrote: > > On 2016/10/03 16:47:06, Raymond Toy wrote: > > > Is it really worth doing blocks of 40 instead of 32? Since the typical use > > case > > > is powers of two, this guarantees the messy code below is run. > > > > > > Using blocks of 32 keeps a nice symmetry with the other code that uses 32. > > > > OK. This was done considering 10 cycle vector load latency in one of the mips > > variants. However most of the variants have ~8 cycle latency now, hence will > > change this to 32. > > If there is significant gain in doing this, by all means let's do it. It just > makes thinking about the code much harder. Considering the readability and minimal gain we will stick to this change (block of 32)
On 2016/10/04 15:48:07, Raymond Toy wrote: > https://codereview.chromium.org/2340583003/diff/40001/third_party/WebKit/Sour... > File third_party/WebKit/Source/platform/audio/VectorMath.cpp (right): > > https://codereview.chromium.org/2340583003/diff/40001/third_party/WebKit/Sour... > third_party/WebKit/Source/platform/audio/VectorMath.cpp:859: const v4f32 > vHighThr = (v4f32)__msa_fill_w(*((int32_t*)highThresholdP)); > Use C++ casting here instead of C. > > However, is this guaranteed to do the right thing with this kind of type > punning? I always forget the complicated rules. I always use a union of int and > float for this kind of punning. As you suggested, i shall use union for this change.
Incorporated review comments
https://codereview.chromium.org/2340583003/diff/80001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/audio/VectorMath.cpp (right): https://codereview.chromium.org/2340583003/diff/80001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:858: const v4f32 vLowThr = (v4f32)__msa_fill_w(FLOAT2INT(lowThreshold)); Isn't this some kind of gcc/clang extension? FLOAT2INT is a macro that expands to a bunch of code which is used as the argument to a function. We don't normally allow such extensions. https://codereview.chromium.org/2340583003/diff/80001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h (right): https://codereview.chromium.org/2340583003/diff/80001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h:785: Arguments : Inputs - in0, in1, in2, in3, max Nit: the dash following Inputs and Output doesn't line up like other places in this code.
https://codereview.chromium.org/2340583003/diff/80001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/audio/VectorMath.cpp (right): https://codereview.chromium.org/2340583003/diff/80001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/audio/VectorMath.cpp:858: const v4f32 vLowThr = (v4f32)__msa_fill_w(FLOAT2INT(lowThreshold)); On 2016/10/05 17:26:53, Raymond Toy wrote: > Isn't this some kind of gcc/clang extension? FLOAT2INT is a macro that expands > to a bunch of code which is used as the argument to a function. > > We don't normally allow such extensions. I will remove this macro usage. https://codereview.chromium.org/2340583003/diff/80001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h (right): https://codereview.chromium.org/2340583003/diff/80001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h:785: Arguments : Inputs - in0, in1, in2, in3, max On 2016/10/05 17:26:53, Raymond Toy wrote: > Nit: the dash following Inputs and Output doesn't line up like other places in > this code. Done.
incorporated review comments
Thanks for your patience in this! It's great to have these nice speedups for the basic vector operations. lgtm with one minor nit. https://codereview.chromium.org/2340583003/diff/140001/third_party/WebKit/Sou... File third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h (right): https://codereview.chromium.org/2340583003/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h:19: }FloatInt; Nit: "}FloatInt" -> "} FloatInt"
https://codereview.chromium.org/2340583003/diff/140001/third_party/WebKit/Sou... File third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h (right): https://codereview.chromium.org/2340583003/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h:19: }FloatInt; On 2016/10/06 15:42:48, Raymond Toy wrote: > Nit: "}FloatInt" -> "} FloatInt" Done.
Incorporated review comments
The CQ bit was checked by prashant.patil@imgtec.com
The patchset sent to the CQ was uploaded after l-g-t-m from rtoy@chromium.org Link to the patchset: https://codereview.chromium.org/2340583003/#ps160001 (title: "Incorporated review comments")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
On 2016/10/07 08:20:45, commit-bot: I haz the power wrote: > Try jobs failed on following builders: > chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, > http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...) Raymond, plz help me to figure out this error (chromium_presubmit) ** Presubmit ERRORS ** Missing LGTM from an OWNER for these files: third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h
On 2016/10/07 13:00:37, Prashant.Patil wrote: > On 2016/10/07 08:20:45, commit-bot: I haz the power wrote: > > Try jobs failed on following builders: > > chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, > > > http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...) > > Raymond, plz help me to figure out this error (chromium_presubmit) > > ** Presubmit ERRORS ** > Missing LGTM from an OWNER for these files: > third_party/WebKit/Source/platform/cpu/mips/CommonMacrosMSA.h kbr@: Can you take a look at CommonMacrosMSA.h?
kbr@chromium.org changed reviewers: + prashant.patil@imgtec.com - Prashant.Patil@imgtec.com
lgtm based on rtoy's review.
Prashant: This CL should be ready to land; all approvals have been granted, I think.
The CQ bit was checked by prashant.patil@imgtec.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
The CQ bit was checked by prashant.patil@imgtec.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
On 2016/10/12 08:47:18, commit-bot: I haz the power wrote: > Try jobs failed on following builders: > chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, > http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...) Raymond, any idea about this failure?
On 2016/10/12 15:05:21, Prashant.Patil wrote: > On 2016/10/12 08:47:18, commit-bot: I haz the power wrote: > > Try jobs failed on following builders: > > chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, > > > http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...) > > Raymond, any idea about this failure? Look at https://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presu... You need to run git cl format. I would just do git cl format on just the two files you modified. This is a very recent change in blink code; the code must conform to the Chromium style now.
Done formatting changes (git cl format)
The CQ bit was checked by prashant.patil@imgtec.com
The patchset sent to the CQ was uploaded after l-g-t-m from kbr@chromium.org, rtoy@chromium.org Link to the patchset: https://codereview.chromium.org/2340583003/#ps180001 (title: "formatting changes")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Add MSA (MIPS SIMD Arch) optimized VectorMath functions We add following MSA optimized VectorMath functions: - vsma - vsmul - vadd - vmul - vmaxmgv - vclip R=kbr@chromium.org BUG=642653 ========== to ========== Add MSA (MIPS SIMD Arch) optimized VectorMath functions We add following MSA optimized VectorMath functions: - vsma - vsmul - vadd - vmul - vmaxmgv - vclip R=kbr@chromium.org BUG=642653 ==========
Message was sent while issue was closed.
Committed patchset #10 (id:180001)
Message was sent while issue was closed.
Description was changed from ========== Add MSA (MIPS SIMD Arch) optimized VectorMath functions We add following MSA optimized VectorMath functions: - vsma - vsmul - vadd - vmul - vmaxmgv - vclip R=kbr@chromium.org BUG=642653 ========== to ========== Add MSA (MIPS SIMD Arch) optimized VectorMath functions We add following MSA optimized VectorMath functions: - vsma - vsmul - vadd - vmul - vmaxmgv - vclip R=kbr@chromium.org BUG=642653 Committed: https://crrev.com/95d7e1276aed8f2d2687de388d83919b813c78dc Cr-Commit-Position: refs/heads/master@{#424977} ==========
Message was sent while issue was closed.
Patchset 10 (id:??) landed as https://crrev.com/95d7e1276aed8f2d2687de388d83919b813c78dc Cr-Commit-Position: refs/heads/master@{#424977} |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
