|
|
Created:
4 years, 6 months ago by chfremer Modified:
4 years, 5 months ago CC:
avayvod+watch_chromium.org, chromium-reviews, darin-cc_chromium.org, feature-media-reviews_chromium.org, imcheng+watch_chromium.org, isheriff+watch_chromium.org, jam, jasonroberts+watch_google.com, miu+watch_chromium.org, mkwst+moarreviews-renderer_chromium.org, mlamouri+watch-content_chromium.org, posciak+watch_chromium.org, xjz+watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionApplied diff from eklavyamirani's cl [1] and rebased to latest master revision.
This is the starting point for taking over the cl and continue working on it.
[1] https://codereview.chromium.org/1769373003/
BUG=580391
R=mcasas@chromium.org
Committed: https://crrev.com/c9e8ada83949be0314cc2d1d90df7935c3dc9534
Cr-Commit-Position: refs/heads/master@{#401952}
Patch Set 1 #
Total comments: 6
Patch Set 2 : Revise comments/names in code #
Total comments: 8
Patch Set 3 : Refactor AudioBus but not yet its usages #Patch Set 4 : Add original signatures as deprecated #Patch Set 5 : Switched to templated method approach #
Total comments: 6
Patch Set 6 : Generalize AudioSampleTypeTraits using templates #Patch Set 7 : Performance Improvements #Patch Set 8 : Added unit tests #
Total comments: 23
Patch Set 9 : Addressed comments #Patch Set 10 : Rebase-update #Patch Set 11 : Performance tweak for Arm #
Total comments: 7
Patch Set 12 : Removed a todo #
Messages
Total messages: 71 (15 generated)
Description was changed from ========== Applied diff from eklavyamirani's cl BUG= ========== to ========== Applied diff from eklavyamirani's cl [1] and rebased to latest master revision. This is the starting point for taking over the cl and continue working on it. [1] https://codereview.chromium.org/1769373003/ BUG=580391 R=mcasas@chromium.org ==========
chfremer@chromium.org changed reviewers: + mcasas@chromium.org
bots please (git cl try) https://codereview.chromium.org/2024993004/diff/1/AUTHORS File AUTHORS (right): https://codereview.chromium.org/2024993004/diff/1/AUTHORS#newcode180 AUTHORS:180: Eklavya Mirani <eklavyamirani@gmail.com> Remove this entry if eklavyamirani@ is not the one submitting the patch. https://codereview.chromium.org/2024993004/diff/1/content/renderer/media/audi... File content/renderer/media/audio_track_recorder.cc (right): https://codereview.chromium.org/2024993004/diff/1/content/renderer/media/audi... content/renderer/media/audio_track_recorder.cc:250: buffer_.get()); Fits in one line? audio_bus->ToInterleavedFloat(0, 0, audio_bus->frames(), buffer_.get()); (Tip: run `git cl format` before uploading). https://codereview.chromium.org/2024993004/diff/1/media/base/audio_bus.cc File media/base/audio_bus.cc (right): https://codereview.chromium.org/2024993004/diff/1/media/base/audio_bus.cc#new... media/base/audio_bus.cc:314: for (; src < src_end; ++src, dest += channels()) nit: every cycle counts, so get channels() evaluation out of both loops: const int channels = channels();
Patchset #2 (id:20001) has been deleted
* Revised comments in audio_bus.h/cc after going through a read/understand cycle as someone previously unfamiliar with the code. * Revised some names/comments in audio_bus_unittest.cc to make intent more clear. * Removed eklavyamirani from AUTHORS Note: Not yet looking for sign-off. This is still just a preparation step before revising the code changes proposed by eklavyamirani. I will post at least one subsequent patches to this review with a proposed solution to the original issue. https://codereview.chromium.org/2024993004/diff/1/AUTHORS File AUTHORS (right): https://codereview.chromium.org/2024993004/diff/1/AUTHORS#newcode180 AUTHORS:180: Eklavya Mirani <eklavyamirani@gmail.com> On 2016/06/02 01:13:14, mcasas wrote: > Remove this entry if eklavyamirani@ is not the one > submitting the patch. Done. https://codereview.chromium.org/2024993004/diff/1/content/renderer/media/audi... File content/renderer/media/audio_track_recorder.cc (right): https://codereview.chromium.org/2024993004/diff/1/content/renderer/media/audi... content/renderer/media/audio_track_recorder.cc:250: buffer_.get()); On 2016/06/02 01:13:14, mcasas wrote: > Fits in one line? > audio_bus->ToInterleavedFloat(0, 0, audio_bus->frames(), buffer_.get()); > > (Tip: run `git cl format` before uploading). Done. https://codereview.chromium.org/2024993004/diff/1/media/base/audio_bus.cc File media/base/audio_bus.cc (right): https://codereview.chromium.org/2024993004/diff/1/media/base/audio_bus.cc#new... media/base/audio_bus.cc:314: for (; src < src_end; ++src, dest += channels()) On 2016/06/02 01:13:14, mcasas wrote: > nit: every cycle counts, so get channels() evaluation out > of both loops: > const int channels = channels(); As per the discussion in the corresponding bug entry [1], I am not sure whether or not we will keep this newly added code. Will address this if we decide to keep it. [1] https://bugs.chromium.org/p/chromium/issues/detail?id=580391
miu@chromium.org changed reviewers: + miu@chromium.org
https://codereview.chromium.org/2024993004/diff/40001/media/base/audio_bus.cc File media/base/audio_bus.cc (right): https://codereview.chromium.org/2024993004/diff/40001/media/base/audio_bus.cc... media/base/audio_bus.cc:312: int num_samples, s/num_samples/frames/ (for naming consistency with the rest of this class) https://codereview.chromium.org/2024993004/diff/40001/media/base/audio_bus.h File media/base/audio_bus.h (right): https://codereview.chromium.org/2024993004/diff/40001/media/base/audio_bus.h#... media/base/audio_bus.h:84: // nicer to reuse the existing methods and replace |int bytes_per_sample" with FYI--We're actually trying to get rid of |bytes_per_sample| throughout the audio stack. :) The reason is that we want the audio stack to standardize on planar float PCM. The conversion to/from interleaved int PCM is almost always a platform-interface-glue detail. https://codereview.chromium.org/2024993004/diff/40001/media/base/audio_bus.h#... media/base/audio_bus.h:86: void ToInterleavedFloat(int source_offset, I agree with Miguel here, but realize it's a much larger change: Seems that we should have templated From/ToInterleaved methods instead. Something like: template <typename InterleavedPODType> void FromInterleaved(const InterleavedPODType* source, int frames); template <typename InterleavedPODType> void ToInterleaved(int frames, InterleavedPODType* dest) const; ...where the only implemented InterleavedPODTypes would be int16_t and float (unless there is actually code that uses 8-bit or 24-bit integer PCM?). https://codereview.chromium.org/2024993004/diff/40001/media/base/audio_bus.h#... media/base/audio_bus.h:87: int destination_offset, nit: Please remove the |destination_offset| argument. Instead the caller should do pointer math on |buffer| so that it points at the correct starting destination location. https://codereview.chromium.org/2024993004/diff/40001/media/base/audio_bus.h#... media/base/audio_bus.h:88: int num_channels, This argument should be named |frames| (it's definitely not the number of channels!). https://codereview.chromium.org/2024993004/diff/40001/media/cast/sender/audio... File media/cast/sender/audio_encoder.cc (right): https://codereview.chromium.org/2024993004/diff/40001/media/cast/sender/audio... media/cast/sender/audio_encoder.cc:285: // Opus requires channel-interleaved samples in a single array. nit: You can remove this comment now. The call to ToInterleavedFloat() is a statement of this fact. ;-)
Patchset #3 (id:60001) has been deleted
* Refactored From/ToInterleaved/Partial to use an enum SampleFormat instead of |bytes_per_sample|. * Simplified and renamed From/ToInterleavedInternal by removing two template parameters that did not seem to be needed. Scale factor and bias are now applied in the float domain instead of a custom numeric type that needed to be passed in. * Added support to From/ToInterleaved/Partial for float SampleFormat. * Revised comments describing what the methods do. * Added a TODO for discussing the desired behavior regarding zeroing out frames in FromInterleavedPartial(). Note: The next step is to update all usages of the changed methods. Before adding the patch for this, I would like to agree on what is the most suitable way of doing this. Currently, all usages pass in a number for |bytes_per_sample|. We could simply offer a static method AudioBus::LookupSampleTypeFromBytesPerSample() and call it everywhere. But the cleaner solution would be to have all usages know what SampleType they actually want (as opposed to just how many bytes per sample the format has). Doing this thoroughly would require refactoring of the AudioParameters class [1], which already has a TODO for cleaning up an enum that looks like it describes sample formats, but then really doesn't. [1] https://code.google.com/p/chromium/codesearch#chromium/src/media/base/audio_p...
On 2016/06/02 23:06:09, chfremer wrote: > * Refactored From/ToInterleaved/Partial to use an enum SampleFormat instead of > |bytes_per_sample|. > * Simplified and renamed From/ToInterleavedInternal by removing two template > parameters that did not seem to be needed. Scale factor and bias are now applied > in the float domain instead of a custom numeric type that needed to be passed > in. > * Added support to From/ToInterleaved/Partial for float SampleFormat. > * Revised comments describing what the methods do. > * Added a TODO for discussing the desired behavior regarding zeroing out frames > in FromInterleavedPartial(). > > Note: The next step is to update all usages of the changed methods. Before > adding the patch for this, I would like to agree on what is the most suitable > way of doing this. > > Currently, all usages pass in a number for |bytes_per_sample|. We could simply > offer a static method AudioBus::LookupSampleTypeFromBytesPerSample() and call it > everywhere. But the cleaner solution would be to have all usages know what > SampleType they actually want (as opposed to just how many bytes per sample the > format has). > > Doing this thoroughly would require refactoring of the AudioParameters class > [1], which already has a TODO for cleaning up an enum that looks like it > describes sample formats, but then really doesn't. > > [1] > https://code.google.com/p/chromium/codesearch#chromium/src/media/base/audio_p... After thinking about it for a bit, I believe a good approach to handle the call sites that currently depend on the old signature of From/ToInterleaved/Partial is to keep the old signatures around, mark them as deprecated, and have them forward to the new methods. We then have the option to update the call sites step-by-step in subsequent modifications.
I'm not a fan of the "enum arg" proposal. The template-based approach I mentioned before allows for: 1) No pointer casting (to void* and back); 2) Linker can throw out unused implementations, like the int8_t version that is not being used by any client code; 3) Compile-time checking of this "argument" rather than having to account for failure possibilities at run-time.
dalecurtis@chromium.org changed reviewers: + dalecurtis@chromium.org
If you decide to modify the existing methods, make sure you run the AudioBus perf tests in media_perftests to avoid regressions. They might be run on bots still somewhere, but I don't recall.
On 2016/06/03 18:14:20, miu wrote: > I'm not a fan of the "enum arg" proposal. The template-based approach I > mentioned before allows for: 1) No pointer casting (to void* and back); 2) > Linker can throw out unused implementations, like the int8_t version that is not > being used by any client code; 3) Compile-time checking of this "argument" > rather than having to account for failure possibilities at run-time. @miu: Thanks for your input on this. I agree with the advantages you list for the template-based approach. And it is true that currently all our supported sample formats happen to use a unique underlying C++ data type, which makes it sufficiently unique to indicate which sample format we mean. However, the 1:1 mapping between data types and sample format is a little implicit and does not hold in general. Also, when using data types to express the sample format, we have to use typeids whenever we need to do runtime logic that involves evaluating the sample type. The other concern I have is about exposing templated methods in the public API of a class (in this case AudioBus). Since templated methods cannot be virtual, they cannot be used in C++ interfaces. By using templated methods here, we would force all clients to be strongly coupled to the concrete class. I have a proposal for how we could get the benefits and avoid some of the concerns: AudioBus { public: void FromInterleaved8BitUnsignedInt(const uint8_t* source, int num_frames_to_write); void FromInterleaved16BitSignedInt(const uint16_t* source, int num_frames_to_write); void FromInterleaved32BitSignedInt(const uint32_t* source, int num_frames_to_write); void FromInterleaved32BitFloat(const float* source, int num_frames_to_write); } This way, we can make the sample type explicit through the name of the method, and we get rid of the void* as well as the potential failures at run-time. How does that look to you?
On 2016/06/03 21:53:55, chfremer wrote: > On 2016/06/03 18:14:20, miu wrote: > > I'm not a fan of the "enum arg" proposal. The template-based approach I > > mentioned before allows for: 1) No pointer casting (to void* and back); 2) > > Linker can throw out unused implementations, like the int8_t version that is > not > > being used by any client code; 3) Compile-time checking of this "argument" > > rather than having to account for failure possibilities at run-time. > > @miu: Thanks for your input on this. I agree with the advantages you > list for the template-based approach. And it is true that currently > all our supported sample formats happen to use a unique underlying C++ data > type, which makes it sufficiently unique to indicate which sample format we > mean. > However, the 1:1 mapping between data types and sample format is a little > implicit > and does not hold in general. Also, when using data types to express the sample > format, we have to use typeids whenever we need to do runtime logic that > involves > evaluating the sample type. > > The other concern I have is about exposing templated methods in the public API > of a class (in this case AudioBus). Since templated methods cannot be virtual, > they cannot be used in C++ interfaces. By using templated methods here, we would > > force all clients to be strongly coupled to the concrete class. > > I have a proposal for how we could get the benefits and avoid some of the > concerns: > > AudioBus { > public: > void FromInterleaved8BitUnsignedInt(const uint8_t* source, int > num_frames_to_write); > void FromInterleaved16BitSignedInt(const uint16_t* source, int > num_frames_to_write); > void FromInterleaved32BitSignedInt(const uint32_t* source, int > num_frames_to_write); > void FromInterleaved32BitFloat(const float* source, int num_frames_to_write); > } > > This way, we can make the sample type explicit through the name of the method, > and > we get rid of the void* as well as the potential failures at run-time. > > How does that look to you? All this SGTM. :)
Patchset #5 (id:120001) has been deleted
After thinking about it a bit more, I decided that going with the templated methods in the public interface of AudioBus (as suggested by both mcasas and miu) is probably the best approach. Since the main purpose of the AudioBus class is to represent data (and not behavior), it is acceptable that other classes depend on it directly instead of using an interface to have them depend on an abstraction. Instead of using the C++ data type as the template parameter, I am proposing to introduce a concept "SampleTypeTraits" and providing corresponding classes for all supported sample types (see file audio_sample_types.h [1]). This has several advantages: 1.) It does not require a 1:1 mapping between C++ data types and sample formats. 2.) It allows decoupling of the sample conversion logic (which is somewhat specific to the sample type) from the (de-)interleaving logic. The sample conversion logic can live inside the SampleTypeTraits. I have uploaded the proposed changes in Patch Set 5. The media_unittests pass. My next steps are to run the AudioBus perf tests as suggested by DaleCurtis and to use the compiler to check if UnsignedInt8 and SignedInt32 are actually used anywhere (as per suggestion by mcasas). [1] https://codereview.chromium.org/2024993004/patch/140001/150006 https://codereview.chromium.org/2024993004/diff/40001/media/base/audio_bus.h File media/base/audio_bus.h (right): https://codereview.chromium.org/2024993004/diff/40001/media/base/audio_bus.h#... media/base/audio_bus.h:86: void ToInterleavedFloat(int source_offset, On 2016/06/02 20:40:21, miu (OOO June 11-19) wrote: > I agree with Miguel here, but realize it's a much larger change: Seems that we > should have templated From/ToInterleaved methods instead. Something like: > > template <typename InterleavedPODType> > void FromInterleaved(const InterleavedPODType* source, int frames); > > template <typename InterleavedPODType> > void ToInterleaved(int frames, InterleavedPODType* dest) const; > > ...where the only implemented InterleavedPODTypes would be int16_t and float > (unless there is actually code that uses 8-bit or 24-bit integer PCM?). Done. https://codereview.chromium.org/2024993004/diff/40001/media/cast/sender/audio... File media/cast/sender/audio_encoder.cc (right): https://codereview.chromium.org/2024993004/diff/40001/media/cast/sender/audio... media/cast/sender/audio_encoder.cc:285: // Opus requires channel-interleaved samples in a single array. On 2016/06/02 20:40:21, miu (OOO June 11-19) wrote: > nit: You can remove this comment now. The call to ToInterleavedFloat() is a > statement of this fact. ;-) Done.
A quick note on the perf tests. I did 3 runs of AudioBusPerfTest both with and without the CL. It seems the proposed CL makes things quit a bit slower. I will look into why that is and see if I can find a solution. Here are the results: Before change: RUN 1 chfremer@chfremer0:~/code/chromium/src$ out/gn_release/media_perftests --gtest_filter="AudioBusPerfTest.*" IMPORTANT DEBUGGING NOTE: batches of tests are run inside their own process. For debugging a test inside a debugger, use the --gtest_filter=<your_test_name> flag along with --single-process-tests. Using sharding settings from environment. This is shard 0/1 Using 1 parallel jobs. Note: Google Test filter = AudioBusPerfTest.Interleave [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from AudioBusPerfTest [ RUN ] AudioBusPerfTest.Interleave *RESULT audio_bus_to_interleaved: int8_t= 18.38665 ms *RESULT audio_bus_from_interleaved: int8_t= 17.373150000000003 ms *RESULT audio_bus_to_interleaved: int16_t= 18.74655 ms *RESULT audio_bus_from_interleaved: int16_t= 17.10665 ms *RESULT audio_bus_to_interleaved: int32_t= 22.68235 ms *RESULT audio_bus_from_interleaved: int32_t= 18.4614 ms [ OK ] AudioBusPerfTest.Interleave (2852 ms) [----------] 1 test from AudioBusPerfTest (2852 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (2852 ms total) [ PASSED ] 1 test. [1/1] AudioBusPerfTest.Interleave (2852 ms) SUCCESS: all tests passed. Tests took 2 seconds. RUN 2 chfremer@chfremer0:~/code/chromium/src$ out/gn_release/media_perftests --gtest_filter="AudioBusPerfTest.*" IMPORTANT DEBUGGING NOTE: batches of tests are run inside their own process. For debugging a test inside a debugger, use the --gtest_filter=<your_test_name> flag along with --single-process-tests. Using sharding settings from environment. This is shard 0/1 Using 1 parallel jobs. Note: Google Test filter = AudioBusPerfTest.Interleave [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from AudioBusPerfTest [ RUN ] AudioBusPerfTest.Interleave *RESULT audio_bus_to_interleaved: int8_t= 22.22195 ms *RESULT audio_bus_from_interleaved: int8_t= 13.9913 ms *RESULT audio_bus_to_interleaved: int16_t= 22.48115 ms *RESULT audio_bus_from_interleaved: int16_t= 13.4035 ms *RESULT audio_bus_to_interleaved: int32_t= 15.54395 ms *RESULT audio_bus_from_interleaved: int32_t= 12.88245 ms [ OK ] AudioBusPerfTest.Interleave (2645 ms) [----------] 1 test from AudioBusPerfTest (2645 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (2645 ms total) [ PASSED ] 1 test. [1/1] AudioBusPerfTest.Interleave (2645 ms) SUCCESS: all tests passed. Tests took 2 seconds. RUN 3 chfremer@chfremer0:~/code/chromium/src$ out/gn_release/media_perftests --gtest_filter="AudioBusPerfTest.*" IMPORTANT DEBUGGING NOTE: batches of tests are run inside their own process. For debugging a test inside a debugger, use the --gtest_filter=<your_test_name> flag along with --single-process-tests. Using sharding settings from environment. This is shard 0/1 Using 1 parallel jobs. Note: Google Test filter = AudioBusPerfTest.Interleave [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from AudioBusPerfTest [ RUN ] AudioBusPerfTest.Interleave *RESULT audio_bus_to_interleaved: int8_t= 16.1233 ms *RESULT audio_bus_from_interleaved: int8_t= 13.9184 ms *RESULT audio_bus_to_interleaved: int16_t= 16.06805 ms *RESULT audio_bus_from_interleaved: int16_t= 15.7369 ms *RESULT audio_bus_to_interleaved: int32_t= 23.42435 ms *RESULT audio_bus_from_interleaved: int32_t= 24.16545 ms [ OK ] AudioBusPerfTest.Interleave (2787 ms) [----------] 1 test from AudioBusPerfTest (2787 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (2788 ms total) [ PASSED ] 1 test. [1/1] AudioBusPerfTest.Interleave (2787 ms) SUCCESS: all tests passed. Tests took 2 seconds. After change: RUN 1 chfremer@chfremer0:~/code/chromium/src$ out/gn_release/media_perftests --gtest_filter="AudioBusPerfTest.*" IMPORTANT DEBUGGING NOTE: batches of tests are run inside their own process. For debugging a test inside a debugger, use the --gtest_filter=<your_test_name> flag along with --single-process-tests. Using sharding settings from environment. This is shard 0/1 Using 1 parallel jobs. Note: Google Test filter = AudioBusPerfTest.Interleave [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from AudioBusPerfTest [ RUN ] AudioBusPerfTest.Interleave *RESULT audio_bus_to_interleaved: int8_t= 73.5419 ms *RESULT audio_bus_from_interleaved: int8_t= 23.9568 ms *RESULT audio_bus_to_interleaved: int16_t= 25.65945 ms *RESULT audio_bus_from_interleaved: int16_t= 25.419999999999998 ms *RESULT audio_bus_to_interleaved: int32_t= 21.818450000000002 ms *RESULT audio_bus_from_interleaved: int32_t= 18.5058 ms [ OK ] AudioBusPerfTest.Interleave (4387 ms) [----------] 1 test from AudioBusPerfTest (4387 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (4387 ms total) [ PASSED ] 1 test. [1/1] AudioBusPerfTest.Interleave (4387 ms) SUCCESS: all tests passed. Tests took 4 seconds. RUN 2 chfremer@chfremer0:~/code/chromium/src$ out/gn_release/media_perftests --gtest_filter="AudioBusPerfTest.*" IMPORTANT DEBUGGING NOTE: batches of tests are run inside their own process. For debugging a test inside a debugger, use the --gtest_filter=<your_test_name> flag along with --single-process-tests. Using sharding settings from environment. This is shard 0/1 Using 1 parallel jobs. Note: Google Test filter = AudioBusPerfTest.Interleave [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from AudioBusPerfTest [ RUN ] AudioBusPerfTest.Interleave *RESULT audio_bus_to_interleaved: int8_t= 73.73075 ms *RESULT audio_bus_from_interleaved: int8_t= 23.8739 ms *RESULT audio_bus_to_interleaved: int16_t= 22.5804 ms *RESULT audio_bus_from_interleaved: int16_t= 25.044249999999998 ms *RESULT audio_bus_to_interleaved: int32_t= 22.32145 ms *RESULT audio_bus_from_interleaved: int32_t= 13.37345 ms [ OK ] AudioBusPerfTest.Interleave (4156 ms) [----------] 1 test from AudioBusPerfTest (4156 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (4156 ms total) [ PASSED ] 1 test. [1/1] AudioBusPerfTest.Interleave (4155 ms) SUCCESS: all tests passed. Tests took 4 seconds. RUN 3 chfremer@chfremer0:~/code/chromium/src$ out/gn_release/media_perftests --gtest_filter="AudioBusPerfTest.*" IMPORTANT DEBUGGING NOTE: batches of tests are run inside their own process. For debugging a test inside a debugger, use the --gtest_filter=<your_test_name> flag along with --single-process-tests. Using sharding settings from environment. This is shard 0/1 Using 1 parallel jobs. Note: Google Test filter = AudioBusPerfTest.Interleave [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from AudioBusPerfTest [ RUN ] AudioBusPerfTest.Interleave *RESULT audio_bus_to_interleaved: int8_t= 77.11555 ms *RESULT audio_bus_from_interleaved: int8_t= 25.28095 ms *RESULT audio_bus_to_interleaved: int16_t= 22.5516 ms *RESULT audio_bus_from_interleaved: int16_t= 25.037799999999997 ms *RESULT audio_bus_to_interleaved: int32_t= 15.694049999999999 ms *RESULT audio_bus_from_interleaved: int32_t= 18.9465 ms [ OK ] AudioBusPerfTest.Interleave (4328 ms) [----------] 1 test from AudioBusPerfTest (4328 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (4328 ms total) [ PASSED ] 1 test. [1/1] AudioBusPerfTest.Interleave (4328 ms) SUCCESS: all tests passed. Tests took 4 seconds.
I like the idea of providing sample type traits. Though, I think there's a lot of duplicated code there that could be simplified (idea presented in comments below): https://codereview.chromium.org/2024993004/diff/140001/media/base/audio_sampl... File media/base/audio_sample_types.h (right): https://codereview.chromium.org/2024993004/diff/140001/media/base/audio_sampl... media/base/audio_sample_types.h:27: typedef uint8_t ValueType; Chromium's preferring the new syntax for new code: using ValueType = uint8_t; https://codereview.chromium.org/2024993004/diff/140001/media/base/audio_sampl... media/base/audio_sample_types.h:28: static uint8_t min_value() { return 0; } All the zero-arg static methods in these classes should be declared with constexpr. Actually, do they even need to be methods? For example: static constexpr uint8_t min_value() { return 0; } ... Even better, IMHO: constexpr uint8_t kMinValue = 0; constexpr uint8_t kMaxValue = 255; constexpr uint8_t kZeroPointValue = 128; https://codereview.chromium.org/2024993004/diff/140001/media/base/audio_sampl... media/base/audio_sample_types.h:67: class Linear16BitSignedIntSampleTypeTraits { Seems all these classes for the integer types are identical code, sans the type. This suggests templating (write less code that does more). Modifying your code, here's what I'm proposing (also with minor changes for Google and Chromium C++ style): // For uint8_t, int16_t, int32_t... template <typename SampleType> class FixedSampleTypeTraits { static_assert(std::numeric_limits<SampleType>::is_integer(), "Template is only valid for integer types."); public: using ValueType = SampleType; constexpr SampleType kMinValue = std::numeric_limits<SampleType>::min(); constexpr SampleType kMaxValue = std::numeric_limits<SampleType>::max(); constexpr SampleType kZeroPointValue = (kMinValue == 0) ? (kMaxValue / 2 + 1) : 0; template <typename FloatType> static SampleType ConvertFromFloat(FloatType source_value) { if (source_value <= FloatSampleTypeTraits<FloatType>::kMinValue) return kMinValue; if (source_value >= FloatSampleTypeTraits<FloatType>::kMaxValue) return kMaxValue; return static_cast<SampleType>( (source_value * GetScaleFactor(source_value)) + kZeroPointValue); } template <typename FloatType> static FloatType ConvertToFloat(SampleType source_value) { FloatType offset_value = static_cast<FloatType>(source_value) - kZeroPointValue; return offset_value / GetScaleFactor(offset_value); } private: template <typename FloatType> static FloatType GetScaleFactor(FloatType input_value) { return (input_value < 0.0f) ? (static_cast<FloatType>(kZeroPointValue) - kMinValue) : (static_cast<FloatType>(kMaxValue) - kZeroPointValue); } }; // For float or double. template <typename SampleType> class FloatSampleTypeTraits { static_assert(std::is_floating_point<SampleType>::value, "Template is only valid for float types."); public: using ValueType = SampleType; constexpr SampleType kMinValue = -1.0f; constexpr SampleType kMaxValue = +1.0f; constexpr SampleType kZeroPointValue = 0.0f; template <typename FloatType> static SampleType ConvertFromFloat(FloatType source_value) { return static_cast<SampleType>(source_value); } template <typename FloatType> static FloatType ConvertToFloat(SampleType source_value) { return static_cast<FloatType>(source_value) } };
Patchset #6 (id:160001) has been deleted
I incorporated miu's suggestion for generalizing the individual SampleTypeTraits classes using templates and getting rid of duplicate code this way. The only modifications I did were to add some comments calling out important edge cases as well as providing some convenience aliases for commonly used types. The design feels good to me at this point, but the perf tests currently show some regression. Haven't investigated yet, but I suspect it is because it is recomputing values a lot that could be cached instead. I will try to optimize that and post a subsequent Patch Set with those changes. https://codereview.chromium.org/2024993004/diff/140001/media/base/audio_sampl... File media/base/audio_sample_types.h (right): https://codereview.chromium.org/2024993004/diff/140001/media/base/audio_sampl... media/base/audio_sample_types.h:27: typedef uint8_t ValueType; On 2016/06/07 22:16:36, miu (OOO June 11-19) wrote: > Chromium's preferring the new syntax for new code: > > using ValueType = uint8_t; Done. https://codereview.chromium.org/2024993004/diff/140001/media/base/audio_sampl... media/base/audio_sample_types.h:28: static uint8_t min_value() { return 0; } On 2016/06/07 22:16:36, miu (OOO June 11-19) wrote: > All the zero-arg static methods in these classes should be declared with > constexpr. Actually, do they even need to be methods? For example: > > static constexpr uint8_t min_value() { return 0; } > ... > > Even better, IMHO: > > constexpr uint8_t kMinValue = 0; > constexpr uint8_t kMaxValue = 255; > constexpr uint8_t kZeroPointValue = 128; Done. https://codereview.chromium.org/2024993004/diff/140001/media/base/audio_sampl... media/base/audio_sample_types.h:67: class Linear16BitSignedIntSampleTypeTraits { Thanks for this beautiful suggestion. I will create another Path Set to incorporate it.
On 2016/06/07 22:18:30, DaleCurtis wrote: > See also https://codereview.chromium.org/1854433002 Thanks for bringing this to my attention. I was not aware of the AudioBuffer class before (forgive me, I am new here), and I am a little shocked to see the amount of duplication between AudioBuffer and AudioBus. Both seem to serve pretty much the same purpose except that AudioBuffer knows something about time stamps. There are probably historical reasons for why it is this way, but do we really need the two different classes? If yes, why? If no, is the goal to unify them at some point?
On 2016/06/08 18:46:39, chfremer wrote: > The design feels good to me at this point, but the perf tests currently show > some regression. Haven't investigated yet, but I suspect it is because it is > recomputing values a lot that could be cached instead. I will try to optimize > that and post a subsequent Patch Set with those changes. Sometimes it helps to see the assembly code being generated. It's possible there are things the compiler isn't optimizing well, like not inlining function calls and/or extra branching or comparisons that you could work around. Also, do we know if this code is being compiled for speed or size? By default, all code is compiled for size, and a special BUILD file foo is needed to compile for speed. Generally, performance on data like this is usually achieved when the compiler finds a way to operate on multiple vector elements simultaneously (e.g., via loop unrolling, etc.). It might be worth looking into whether the use of SSE2 and/or ARM-NEON instructions would help accelerate the interleave/de-interleave algorithms. For example: https://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--...
On 2016/06/08 19:29:37, chfremer wrote: > I was not aware of the AudioBuffer class before (forgive me, I am new here), and > I am a little shocked to see the amount of duplication between AudioBuffer and > AudioBus. Both seem to serve pretty much the same purpose except that > AudioBuffer knows something about time stamps. There are probably historical > reasons for why it is this way, but do we really need the two different classes? > If yes, why? If no, is the goal to unify them at some point? I could swear I've had this conversation before, but can't find the e-mail thread. Anyway, IMHO, it would be great to see AudioBus and AudioBuffer replaced with a new "AudioFrame" that behaves just like VideoFrame (where reference timestamps and other metadata can be carried alongside the audio data). Also, perhaps an "AudioFrame" may be one of those things that *should* be ref-counted (because it holds shared data passed between threads to possibly multiple consumers) instead of scoped-owned.
On 2016/06/08 at 19:44:26, miu wrote: > On 2016/06/08 19:29:37, chfremer wrote: > > I was not aware of the AudioBuffer class before (forgive me, I am new here), and > > I am a little shocked to see the amount of duplication between AudioBuffer and > > AudioBus. Both seem to serve pretty much the same purpose except that > > AudioBuffer knows something about time stamps. There are probably historical > > reasons for why it is this way, but do we really need the two different classes? > > If yes, why? If no, is the goal to unify them at some point? > > I could swear I've had this conversation before, but can't find the e-mail thread. Anyway, IMHO, it would be great to see AudioBus and AudioBuffer replaced with a new "AudioFrame" that behaves just like VideoFrame (where reference timestamps and other metadata can be carried alongside the audio data). Also, perhaps an "AudioFrame" may be one of those things that *should* be ref-counted (because it holds shared data passed between threads to possibly multiple consumers) instead of scoped-owned. The benefit of AudioBuffer is that it does no conversion of the data until necessary, where as AudioBus is used in contexts where it must always be float planar data -- greatly simplifying the container and clients using it. Absolutely we should get rid of the format conversion duplicity between the two, but I'm less sold on a unification of the two. Certainly no one is happy with the state of VideoFrame and all of its use cases, so I don't think that's a good example :)
Applied Performance Improvements: * In FixedSampleTypeTraits::ConvertFromFloatType(), use a nested if/else structure instead of handling the clipping cases first. * Pre-compute all scaling factors at compile-time in order to save computation time. * Use pre-computed inverse scaling factors to replace floating point division by multiplication. Note, that my performance measurements were not very thorough, so I am not sure how all this translates to other platforms. To assess performance changes, I looked at the results of the AudioBusPerfTest using a release build on my Linux dev box. Numbers are listed below. Before committing, I would like to add one more Patch Set where I will add unit tests for the sample conversion logic in audio_sample_types.h, since it is now nicely separated from the interleaving logic and can be tested in isolation. -- Performance Numbers: Before Patch Set 1: Run 1 *RESULT audio_bus_to_interleaved: int8_t= 18.38665 ms *RESULT audio_bus_from_interleaved: int8_t= 17.373150000000003 ms *RESULT audio_bus_to_interleaved: int16_t= 18.74655 ms *RESULT audio_bus_from_interleaved: int16_t= 17.10665 ms *RESULT audio_bus_to_interleaved: int32_t= 22.68235 ms *RESULT audio_bus_from_interleaved: int32_t= 18.4614 ms Run 2 *RESULT audio_bus_to_interleaved: int8_t= 22.22195 ms *RESULT audio_bus_from_interleaved: int8_t= 13.9913 ms *RESULT audio_bus_to_interleaved: int16_t= 22.48115 ms *RESULT audio_bus_from_interleaved: int16_t= 13.4035 ms *RESULT audio_bus_to_interleaved: int32_t= 15.54395 ms *RESULT audio_bus_from_interleaved: int32_t= 12.88245 ms Run 3 *RESULT audio_bus_to_interleaved: int8_t= 16.1233 ms *RESULT audio_bus_from_interleaved: int8_t= 13.9184 ms *RESULT audio_bus_to_interleaved: int16_t= 16.06805 ms *RESULT audio_bus_from_interleaved: int16_t= 15.7369 ms *RESULT audio_bus_to_interleaved: int32_t= 23.42435 ms *RESULT audio_bus_from_interleaved: int32_t= 24.16545 ms After Patch Set 6: Run 1 (the only run I did): *RESULT audio_bus_to_interleaved: int8_t= 77.3463 ms *RESULT audio_bus_from_interleaved: int8_t= 27.853050000000003 ms *RESULT audio_bus_to_interleaved: int16_t= 83.40355 ms *RESULT audio_bus_from_interleaved: int16_t= 26.1024 ms *RESULT audio_bus_to_interleaved: int32_t= 16.43055 ms *RESULT audio_bus_from_interleaved: int32_t= 12.95375 ms After Patch Set 7 (current) Run 1 *RESULT audio_bus_to_interleaved: int8_t= 16.7107 ms *RESULT audio_bus_from_interleaved: int8_t= 13.05325 ms *RESULT audio_bus_to_interleaved: int16_t= 14.842099999999999 ms *RESULT audio_bus_from_interleaved: int16_t= 13.95645 ms *RESULT audio_bus_to_interleaved: int32_t= 20.8999 ms *RESULT audio_bus_from_interleaved: int32_t= 19.71995 ms Run 2 *RESULT audio_bus_to_interleaved: int8_t= 20.16935 ms *RESULT audio_bus_from_interleaved: int8_t= 15.592949999999998 ms *RESULT audio_bus_to_interleaved: int16_t= 15.4225 ms *RESULT audio_bus_from_interleaved: int16_t= 12.59485 ms *RESULT audio_bus_to_interleaved: int32_t= 15.379900000000001 ms *RESULT audio_bus_from_interleaved: int32_t= 21.4834 ms Run 3 *RESULT audio_bus_to_interleaved: int8_t= 19.024700000000003 ms *RESULT audio_bus_from_interleaved: int8_t= 13.556000000000001 ms *RESULT audio_bus_to_interleaved: int16_t= 14.853800000000001 ms *RESULT audio_bus_from_interleaved: int16_t= 12.750350000000001 ms *RESULT audio_bus_to_interleaved: int32_t= 20.5277 ms *RESULT audio_bus_from_interleaved: int32_t= 14.934800000000001 ms
* Added unit tests * Fixed a bug in audio_sample_types.h, where a negative float was cast to an unsigned int leading to an undefined value. Assuming the try bots are green, this CL is now ready for commit from my side. Please review for sign-off.
On 2016/06/10 at 22:58:42, chfremer wrote: > * Added unit tests > * Fixed a bug in audio_sample_types.h, where a negative float was cast to an unsigned int leading to an undefined value. > > Assuming the try bots are green, this CL is now ready for commit from my side. Please review for sign-off. Do you have an arm system to test on? An issue we had in the past was accidental promotion to 64 bit types causing large slowdowns.
Overall approach lgtm. I'll be out next week, so I'm delegating to the other reviewers to review the finer details. I'll just re-iterate once more: There are ARM-NEON and SSE instructions for doing high-performance interleave/de-intervleaved memory loads and stores. If not this change, it might be worth looking into optimizing further in a later change. Often you can get 4X or more improvement in performance (which usually also translates into improved battery life). We do this for a number of things already: https://cs.chromium.org/chromium/src/media/base/vector_math.h?sq=package:chro...
A few comments, nothing major; if the audio-guys are happy, this is a Lima-Golf-Tango-Mike to me https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.cc File media/base/audio_bus.cc (right): https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.c... media/base/audio_bus.cc:55: void AudioBus::CheckOverflow(int start_frame, int frames, int total_frames) { This doesn't need to be a member right? (Uses not member vars and is private). https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.c... media/base/audio_bus.cc:232: NOTREACHED() << "Unsupported bytes per sample encountered."; nit: ... encountered: " << bytes_per_sample; https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.c... media/base/audio_bus.cc:256: NOTREACHED() << "Unsupported bytes per sample encountered."; idem nit https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.h File media/base/audio_bus.h (right): https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.h... media/base/audio_bus.h:118: // DEPRECATED nit: mention the bug // DEPRECATED (https://crbug.com/580391) And maybe you can even: // TODO(crfremer): deprecated, remove (https://crbug.com/abcdef) :) https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.h... media/base/audio_bus.h:130: // DEPRECATED Idem. https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.h... media/base/audio_bus.h:285: void AudioBus::CopyConvertFromAudioBusToInterleavedTarget( I understand that miu@ commented on the appropriateness of using NEON instructions here; in that case, make a http://crbug.com/new and note it here: //TODO(chfremer): Consider using super duper vector // instructions here, https://crbug.com/abcdef. (and/or in the previous deinterlacing method). https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus_u... File media/base/audio_bus_unittest.cc (right): https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus_u... media/base/audio_bus_unittest.cc:71: VerifyAreEqualWithEpsilon(result, expected, 0); micro nit: I wonder if AudioBus should have an equality operator instead. I wonder if it was considered and ruled out to keep the footprint of that class smaller ... https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_sampl... File media/base/audio_sample_types.h (right): https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_sampl... media/base/audio_sample_types.h:1: // Copyright (c) 2016 The Chromium Authors. All rights reserved. No (c) -- got removed this (last?) year https://www.chromium.org/developers/coding-style#TOC-File-headers https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_sampl... media/base/audio_sample_types.h:67: static FloatType ConvertToFloatType(SampleType source_value) { When writing e.g. mojo traits or MediaStream traits, these relatively simple adapters are just called From() and To() since anyway when instantiating, it's clear what's being done: blabla = FloatSampleTypeTraits::To<MySuperType>(...) or blabla.From<MyUberType>(...) So I'd do the same here and elsewhere in this file. https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_sampl... media/base/audio_sample_types.h:158: if (source_value <= -1) { s/-1/-1.0/ ? And/or shouldn't it be called |FloatSampleTypeTraits::kMinValue| ? Maybe take those out to file-private and call it |kMinFloatSampleValue| or something like that. https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_sampl... media/base/audio_sample_types.h:160: } 1-line bodies are usually not in {} here and elsewhere in this file. https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_sampl... media/base/audio_sample_types.h:165: if (source_value >= 1) { Same here, this smells like a 1.0f constant :)
On 2016/06/10 23:08:48, DaleCurtis wrote: > Do you have an arm system to test on? An issue we had in the past was accidental > promotion to 64 bit types causing large slowdowns. I have a Nexus 7 at my desk, and I am sure I can get other devices for testing if needed. Do you have any pointers for me as for what tests to run and how to run them? Is it possible to run the AudioBus perf tests directly on an arm system?
https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.cc File media/base/audio_bus.cc (right): https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.c... media/base/audio_bus.cc:55: void AudioBus::CheckOverflow(int start_frame, int frames, int total_frames) { On 2016/06/12 09:41:47, mcasas wrote: > This doesn't need to be a member right? > (Uses not member vars and is private). I made this a member, because it called from a now templated method, so it must be defined in the header. I agree that it is not specific to AudioBus and could be moved to a more general place. However, this is true for any private static method anywhere, and I didn't want to move too much code around. https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.c... media/base/audio_bus.cc:232: NOTREACHED() << "Unsupported bytes per sample encountered."; On 2016/06/12 09:41:47, mcasas wrote: > nit: > ... encountered: " << bytes_per_sample; Done. https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.c... media/base/audio_bus.cc:256: NOTREACHED() << "Unsupported bytes per sample encountered."; On 2016/06/12 09:41:48, mcasas wrote: > idem nit Done. https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.h File media/base/audio_bus.h (right): https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.h... media/base/audio_bus.h:118: // DEPRECATED On 2016/06/12 09:41:48, mcasas wrote: > nit: mention the bug > // DEPRECATED (https://crbug.com/580391) > > And maybe you can even: > // TODO(crfremer): deprecated, remove (https://crbug.com/abcdef) > > :) Done. https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.h... media/base/audio_bus.h:130: // DEPRECATED On 2016/06/12 09:41:48, mcasas wrote: > Idem. Done. https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_bus.h... media/base/audio_bus.h:285: void AudioBus::CopyConvertFromAudioBusToInterleavedTarget( On 2016/06/12 09:41:48, mcasas wrote: > I understand that miu@ commented on the > appropriateness of using NEON instructions here; > in that case, make a http://crbug.com/new and > note it here: > //TODO(chfremer): Consider using super duper vector > // instructions here, https://crbug.com/abcdef. > > (and/or in the previous deinterlacing method). Done. https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_sampl... File media/base/audio_sample_types.h (right): https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_sampl... media/base/audio_sample_types.h:1: // Copyright (c) 2016 The Chromium Authors. All rights reserved. On 2016/06/12 09:41:48, mcasas wrote: > No (c) -- got removed this (last?) year > https://www.chromium.org/developers/coding-style#TOC-File-headers Done. https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_sampl... media/base/audio_sample_types.h:67: static FloatType ConvertToFloatType(SampleType source_value) { On 2016/06/12 09:41:48, mcasas wrote: > When writing e.g. mojo traits or MediaStream traits, > these relatively simple adapters are just called From() > and To() since anyway when instantiating, it's clear > what's being done: > > blabla = FloatSampleTypeTraits::To<MySuperType>(...) > or > blabla.From<MyUberType>(...) > > So I'd do the same here and elsewhere in this file. Done. https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_sampl... media/base/audio_sample_types.h:158: if (source_value <= -1) { Thanks for catching that. Fixed. https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_sampl... media/base/audio_sample_types.h:160: } On 2016/06/12 09:41:48, mcasas wrote: > 1-line bodies are usually not in {} > here and elsewhere in this file. Done. https://codereview.chromium.org/2024993004/diff/220001/media/base/audio_sampl... media/base/audio_sample_types.h:165: if (source_value >= 1) { On 2016/06/12 09:41:48, mcasas wrote: > Same here, this smells like a 1.0f constant :) Done.
On 2016/06/13 at 16:27:55, chfremer wrote: > On 2016/06/10 23:08:48, DaleCurtis wrote: > > Do you have an arm system to test on? An issue we had in the past was accidental > > promotion to 64 bit types causing large slowdowns. > > I have a Nexus 7 at my desk, and I am sure I can get other devices for testing if needed. Do you have any pointers for me as for what tests to run and how to run them? Is it possible to run the AudioBus perf tests directly on an arm system? Yes, just build the Android version of media_perftests_apk and you can run them as you did on desktop.
On 2016/06/13 16:27:55, chfremer wrote: > On 2016/06/10 23:08:48, DaleCurtis wrote: > > Do you have an arm system to test on? An issue we had in the past was > accidental > > promotion to 64 bit types causing large slowdowns. > > I have a Nexus 7 at my desk, and I am sure I can get other devices for testing > if needed. Do you have any pointers for me as for what tests to run and how to > run them? Is it possible to run the AudioBus perf tests directly on an arm > system? A quick update about my attempt to run performance tests on a Nexus 7: I was able to run media_unittests on the device, but when trying to run media_perftests on it, it gives me an error trace: E 4.197s list_tests(0a608a6b) E 4.198s list_tests(0a608a6b) INSTRUMENTATION_CODE: -1 E 4.239s Main Error occurred. Traceback (most recent call last): File "/usr/local/google/home/chfremer/code/chromium/src/build/android/test_runner.py", line 978, in main return RunTestsCommand(args) File "/usr/local/google/home/chfremer/code/chromium/src/build/android/test_runner.py", line 805, in RunTestsCommand return RunTestsInPlatformMode(args) File "/usr/local/google/home/chfremer/code/chromium/src/build/android/test_runner.py", line 869, in RunTestsInPlatformMode raw_results = test_run.RunTests() File "/usr/local/google/home/chfremer/code/chromium/src/build/android/pylib/local/device/local_device_test_run.py", line 89, in RunTests tests = self._GetTests() File "/usr/local/google/home/chfremer/code/chromium/src/build/android/pylib/local/device/local_device_gtest_run.py", line 341, in _GetTests 'Failed to list tests on any device') CommandFailedError: Failed to list tests on any device I added some debug output to the Python script in local_device_gtest_run.py to look at the list of tests returned by the gtest binary, and instead of an actual list of tests it returns the following error trace: raw_test_list = [ '[ERROR:icu_util.cc(173)] Invalid file descriptor to ICU data received.', '[FATAL:jni_android.cc(123)] Failed to find class org/chromium/ui/gl/SurfaceTextureListener', '#00 0xa38b5f8f /data/app/org.chromium.native_test-2/lib/arm/libbase.cr.so+0x0007bf8f', '#01 0xa389d7b5 /data/app/org.chromium.native_test-2/lib/arm/libbase.cr.so+0x000637b5', '#02 0xa389d849 /data/app/org.chromium.native_test-2/lib/arm/libbase.cr.so+0x00063849', '#03 0xa3247879 /data/app/org.chromium.native_test-2/lib/arm/libgl_wrapper.cr.so+0x00022879', '#04 0xa389ebc9 /data/app/org.chromium.native_test-2/lib/arm/libbase.cr.so+0x00064bc9', '#05 0xa2f9249b /data/app/org.chromium.native_test-2/lib/arm/lib_media_perftests__library.cr.so+0x0001249b', '#06 0xa2f90291 /data/app/org.chromium.native_test-2/lib/arm/lib_media_perftests__library.cr.so+0x00010291', '#07 0xa2f9198d /data/app/org.chromium.native_test-2/lib/arm/lib_media_perftests__library.cr.so+0x0001198d', '#08 0xa2f924fd /data/app/org.chromium.native_test-2/lib/arm/lib_media_perftests__library.cr.so+0x000124fd', '#09 0xa2f952ed /data/app/org.chromium.native_test-2/lib/arm/lib_media_perftests__library.cr.so+0x000152ed', '#10 0xaf2da999 /data/dalvik-cache/arm/data@app@org.chromium.native_test-2@base.apk@classes.d..., '', 'INSTRUMENTATION_CODE: -1'] I am trying to find someone who can help me figure out why that is happening. Please let me know if you have any ideas.
Try this: ninja -C <out> media_perftests_incremental <out>/bin/run_media_perftests_incremental --gtest_filter=AudioBus* You can drop the _incremental if you like, but it will speed up your builds.
On 2016/06/14 at 18:48:13, DaleCurtis wrote: > Try this: > > ninja -C <out> media_perftests_incremental > <out>/bin/run_media_perftests_incremental --gtest_filter=AudioBus* > > You can drop the _incremental if you like, but it will speed up your builds. Actually that failed for me too. Will take a look.
On 2016/06/14 19:14:31, DaleCurtis wrote: > Fix here: https://codereview.chromium.org/2066943002 Thanks so much for the fix and unblocking me! With the fix, I was able to run the AudioBus perf tests. I ran it on three devices: Nexus 7 and two Nexus 5 phones I borrowed from my teammate. Except for one outlier run, which must have been some issue with the device, results are fairly similar. It seems the patch slightly improves performance for the to_interleaved for int_8 (which is really uint_8) case. Here are the numbers: Nexus 7 Before Patch Run 1 '*RESULT audio_bus_to_interleaved: int8_t= 326.68455 ms', '*RESULT audio_bus_from_interleaved: int8_t= 114.99935 ms', '*RESULT audio_bus_to_interleaved: int16_t= 226.0422 ms', '*RESULT audio_bus_from_interleaved: int16_t= 161.80575 ms', '*RESULT audio_bus_to_interleaved: int32_t= 188.55745 ms', '*RESULT audio_bus_from_interleaved: int32_t= 124.81535 ms' Run 2 '*RESULT audio_bus_to_interleaved: int8_t= 329.30755 ms', '*RESULT audio_bus_from_interleaved: int8_t= 116.18955000000001 ms', '*RESULT audio_bus_to_interleaved: int16_t= 223.3002 ms', '*RESULT audio_bus_from_interleaved: int16_t= 163.48115 ms', '*RESULT audio_bus_to_interleaved: int32_t= 188.15765 ms', '*RESULT audio_bus_from_interleaved: int32_t= 124.34694999999999 ms' Run 3 '*RESULT audio_bus_to_interleaved: int8_t= 323.92425 ms', '*RESULT audio_bus_from_interleaved: int8_t= 117.218 ms', '*RESULT audio_bus_to_interleaved: int16_t= 220.46509999999998 ms', '*RESULT audio_bus_from_interleaved: int16_t= 160.6018 ms', '*RESULT audio_bus_to_interleaved: int32_t= 187.14600000000002 ms', '*RESULT audio_bus_from_interleaved: int32_t= 122.95075 ms' After Patch Set 9 Run 1 '*RESULT audio_bus_to_interleaved: int8_t= 219.63649999999998 ms', '*RESULT audio_bus_from_interleaved: int8_t= 104.50129999999999 ms', '*RESULT audio_bus_to_interleaved: int16_t= 230.394 ms', '*RESULT audio_bus_from_interleaved: int16_t= 160.83679999999998 ms', '*RESULT audio_bus_to_interleaved: int32_t= 188.94195 ms', '*RESULT audio_bus_from_interleaved: int32_t= 124.527 ms', Run 2 '*RESULT audio_bus_to_interleaved: int8_t= 219.72355 ms', '*RESULT audio_bus_from_interleaved: int8_t= 103.83144999999999 ms', '*RESULT audio_bus_to_interleaved: int16_t= 230.11165 ms', '*RESULT audio_bus_from_interleaved: int16_t= 162.37945 ms', '*RESULT audio_bus_to_interleaved: int32_t= 190.744 ms', '*RESULT audio_bus_from_interleaved: int32_t= 125.9125 ms', Run 3 '*RESULT audio_bus_to_interleaved: int8_t= 217.7368 ms', '*RESULT audio_bus_from_interleaved: int8_t= 103.36455000000001 ms', '*RESULT audio_bus_to_interleaved: int16_t= 229.10459999999998 ms', '*RESULT audio_bus_from_interleaved: int16_t= 160.3409 ms', '*RESULT audio_bus_to_interleaved: int32_t= 188.47655 ms', '*RESULT audio_bus_from_interleaved: int32_t= 126.55485000000002 ms' Nexus 5 (Android M) Before Patch Run 1 '*RESULT audio_bus_to_interleaved: int8_t= 170.91969999999998 ms', '*RESULT audio_bus_from_interleaved: int8_t= 51.9375 ms', '*RESULT audio_bus_to_interleaved: int16_t= 87.69550000000001 ms', '*RESULT audio_bus_from_interleaved: int16_t= 44.765550000000005 ms', '*RESULT audio_bus_to_interleaved: int32_t= 98.33574999999999 ms', '*RESULT audio_bus_from_interleaved: int32_t= 38.7813 ms' Run 2 '*RESULT audio_bus_to_interleaved: int8_t= 167.9789 ms', '*RESULT audio_bus_from_interleaved: int8_t= 49.779250000000005 ms', '*RESULT audio_bus_to_interleaved: int16_t= 85.5103 ms', '*RESULT audio_bus_from_interleaved: int16_t= 45.96925 ms', '*RESULT audio_bus_to_interleaved: int32_t= 99.11395 ms', '*RESULT audio_bus_from_interleaved: int32_t= 37.87245 ms' After Patch Set 9 Run 1 '*RESULT audio_bus_to_interleaved: int8_t= 98.99095 ms', '*RESULT audio_bus_from_interleaved: int8_t= 42.988749999999996 ms', '*RESULT audio_bus_to_interleaved: int16_t= 90.7886 ms', '*RESULT audio_bus_from_interleaved: int16_t= 45.52315 ms', '*RESULT audio_bus_to_interleaved: int32_t= 87.92255 ms', '*RESULT audio_bus_from_interleaved: int32_t= 40.897000000000006 ms' Run 2 '*RESULT audio_bus_to_interleaved: int8_t= 96.7388 ms', '*RESULT audio_bus_from_interleaved: int8_t= 42.51375 ms', '*RESULT audio_bus_to_interleaved: int16_t= 90.0026 ms', '*RESULT audio_bus_from_interleaved: int16_t= 44.35495 ms', '*RESULT audio_bus_to_interleaved: int32_t= 85.3197 ms', '*RESULT audio_bus_from_interleaved: int32_t= 35.7469 ms' Nexus 5 (Android L) Before Patch Run 1 (fluke?) '*RESULT audio_bus_to_interleaved: int8_t= 384.76155 ms', '*RESULT audio_bus_from_interleaved: int8_t= 218.16244999999998 ms', '*RESULT audio_bus_to_interleaved: int16_t= 396.83105 ms', '*RESULT audio_bus_from_interleaved: int16_t= 295.78975 ms', '*RESULT audio_bus_to_interleaved: int32_t= 584.10225 ms', '*RESULT audio_bus_from_interleaved: int32_t= 467.44899999999996 ms' Run 2 '*RESULT audio_bus_to_interleaved: int8_t= 170.3535 ms', '*RESULT audio_bus_from_interleaved: int8_t= 49.04495 ms', '*RESULT audio_bus_to_interleaved: int16_t= 85.83250000000001 ms', '*RESULT audio_bus_from_interleaved: int16_t= 44.96785 ms', '*RESULT audio_bus_to_interleaved: int32_t= 92.8631 ms', '*RESULT audio_bus_from_interleaved: int32_t= 41.85615 ms', After Patch Set 9 Run 1 '*RESULT audio_bus_to_interleaved: int8_t= 100.4884 ms', '*RESULT audio_bus_from_interleaved: int8_t= 44.93555 ms', '*RESULT audio_bus_to_interleaved: int16_t= 93.90765 ms', '*RESULT audio_bus_from_interleaved: int16_t= 45.8017 ms', '*RESULT audio_bus_to_interleaved: int32_t= 89.9615 ms', '*RESULT audio_bus_from_interleaved: int32_t= 42.11305 ms' Run 2 '*RESULT audio_bus_to_interleaved: int8_t= 96.85175000000001 ms', '*RESULT audio_bus_from_interleaved: int8_t= 42.85655 ms', '*RESULT audio_bus_to_interleaved: int16_t= 90.22285000000001 ms', '*RESULT audio_bus_from_interleaved: int16_t= 45.55935 ms', '*RESULT audio_bus_to_interleaved: int32_t= 85.89489999999999 ms', '*RESULT audio_bus_from_interleaved: int32_t= 41.51075 ms'
So it seems like on x86 we're roughly at parity or better while arm is slightly slower after your changes. The numbers are close enough that it's not a big deal, but if you can poke around and see if there is anything you can clean up, that'd be helpful.
On 2016/06/14 21:06:56, DaleCurtis wrote: > So it seems like on x86 we're roughly at parity or better while arm is slightly > slower after your changes. The numbers are close enough that it's not a big > deal, but if you can poke around and see if there is anything you can clean up, > that'd be helpful. Hmm, my interpretation of the numbers would be that after the change arm is actually slightly faster than before. Please point me to the concrete lines that indicate the opposite in case I overlooked something.
On 2016/06/14 at 21:41:24, chfremer wrote: > On 2016/06/14 21:06:56, DaleCurtis wrote: > > So it seems like on x86 we're roughly at parity or better while arm is slightly > > slower after your changes. The numbers are close enough that it's not a big > > deal, but if you can poke around and see if there is anything you can clean up, > > that'd be helpful. > > Hmm, my interpretation of the numbers would be that after the change arm is > actually slightly faster than before. Please point me to the concrete lines > that indicate the opposite in case I overlooked something. Sorry, I was only looking at the int16_t results since those are the most commonly used. You're right that the rest are faster though. Great work!
On 2016/06/14 21:47:29, DaleCurtis wrote: > On 2016/06/14 at 21:41:24, chfremer wrote: > > On 2016/06/14 21:06:56, DaleCurtis wrote: > > > So it seems like on x86 we're roughly at parity or better while arm is > slightly > > > slower after your changes. The numbers are close enough that it's not a big > > > deal, but if you can poke around and see if there is anything you can clean > up, > > > that'd be helpful. > > > > Hmm, my interpretation of the numbers would be that after the change arm is > > actually slightly faster than before. Please point me to the concrete lines > > that indicate the opposite in case I overlooked something. > > Sorry, I was only looking at the int16_t results since those are the most > commonly used. You're right that the rest are faster though. Great work! Oh, you are right. The to_interleaved for int16_t indeed seems to be a little bit slower after the patch throughout all test runs. I hope that difference is small enough to be acceptable. I will spend a few minutes checking if I can find a way to improve it, but I don't see any obvious wasted cycles right now.
On 2016/06/14 21:56:07, chfremer wrote: > On 2016/06/14 21:47:29, DaleCurtis wrote: > > On 2016/06/14 at 21:41:24, chfremer wrote: > > > On 2016/06/14 21:06:56, DaleCurtis wrote: > > > > So it seems like on x86 we're roughly at parity or better while arm is > > slightly > > > > slower after your changes. The numbers are close enough that it's not a > big > > > > deal, but if you can poke around and see if there is anything you can > clean > > up, > > > > that'd be helpful. > > > > > > Hmm, my interpretation of the numbers would be that after the change arm is > > > actually slightly faster than before. Please point me to the concrete lines > > > that indicate the opposite in case I overlooked something. > > > > Sorry, I was only looking at the int16_t results since those are the most > > commonly used. You're right that the rest are faster though. Great work! > > Oh, you are right. The to_interleaved for int16_t indeed seems to be a little > bit slower after the patch throughout all test runs. I hope that difference is > small enough to be acceptable. I will spend a few minutes checking if I can find > > a way to improve it, but I don't see any obvious wasted cycles right now. Okay, so I found that when doing Variant B instead of Variant A (see below) the performance for the to_interleaved int16_t case improves to be on par with what we had before the patch. Only difference is to not store scaling_factor in a temp variable. This kind of makes me doubt my approach to performance testing, since I have a hard time believing that after compiler optimizations one version would really be more efficient than the other. Am I maybe missing any flags I need to set in order to get all the optimizations? Here are the contents of my args.gn file: target_os = "android" target_cpu = "arm" is_debug = false is_component_build = true is_clang = true symbol_level = 1 enable_incremental_javac = true use_goma = true Code for Variant A: FloatType scaling_factor; if (source_value < 0) { if (source_value <= FloatSampleTypeTraits<float>::kMinValue) return kMinValue; scaling_factor = ScalingFactors<FloatType>::kForNegativeInput; } else { if (source_value >= FloatSampleTypeTraits<float>::kMaxValue) return kMaxValue; scaling_factor = ScalingFactors<FloatType>::kForPositiveInput; } return static_cast<SampleType>((source_value * scaling_factor) + static_cast<FloatType>(kZeroPointValue)); Code for Variant B: if (source_value < 0) { if (source_value <= FloatSampleTypeTraits<float>::kMinValue) return kMinValue; return static_cast<SampleType>( (source_value * ScalingFactors<FloatType>::kForNegativeInput) + static_cast<FloatType>(kZeroPointValue)); } else { if (source_value >= FloatSampleTypeTraits<float>::kMaxValue) return kMaxValue; return static_cast<SampleType>( (source_value * ScalingFactors<FloatType>::kForPositiveInput) + static_cast<FloatType>(kZeroPointValue)); }
On 2016/06/14 at 22:20:03, chfremer wrote: > On 2016/06/14 21:56:07, chfremer wrote: > > On 2016/06/14 21:47:29, DaleCurtis wrote: > > > On 2016/06/14 at 21:41:24, chfremer wrote: > > > > On 2016/06/14 21:06:56, DaleCurtis wrote: > > > > > So it seems like on x86 we're roughly at parity or better while arm is > > > slightly > > > > > slower after your changes. The numbers are close enough that it's not a > > big > > > > > deal, but if you can poke around and see if there is anything you can > > clean > > > up, > > > > > that'd be helpful. > > > > > > > > Hmm, my interpretation of the numbers would be that after the change arm is > > > > actually slightly faster than before. Please point me to the concrete lines > > > > that indicate the opposite in case I overlooked something. > > > > > > Sorry, I was only looking at the int16_t results since those are the most > > > commonly used. You're right that the rest are faster though. Great work! > > > > Oh, you are right. The to_interleaved for int16_t indeed seems to be a little > > bit slower after the patch throughout all test runs. I hope that difference is > > small enough to be acceptable. I will spend a few minutes checking if I can find > > > > a way to improve it, but I don't see any obvious wasted cycles right now. > > Okay, so I found that when doing Variant B instead of Variant A (see below) the > performance for the to_interleaved int16_t case improves to be on par with what > we had before the patch. Only difference is to not store scaling_factor in a temp > variable. This kind of makes me doubt my approach to performance > testing, since I have a hard time believing that after compiler optimizations > one version would really be more efficient than the other. Am I maybe missing any > flags I need to set in order to get all the optimizations? Here are the contents > of my args.gn file: > > target_os = "android" > target_cpu = "arm" > is_debug = false > is_component_build = true > is_clang = true > symbol_level = 1 > enable_incremental_javac = true > use_goma = true > > Code for Variant A: > > FloatType scaling_factor; > if (source_value < 0) { > if (source_value <= FloatSampleTypeTraits<float>::kMinValue) > return kMinValue; > scaling_factor = ScalingFactors<FloatType>::kForNegativeInput; > } else { > if (source_value >= FloatSampleTypeTraits<float>::kMaxValue) > return kMaxValue; > scaling_factor = ScalingFactors<FloatType>::kForPositiveInput; > } > > return static_cast<SampleType>((source_value * scaling_factor) + > static_cast<FloatType>(kZeroPointValue)); > > > Code for Variant B: > > if (source_value < 0) { > if (source_value <= FloatSampleTypeTraits<float>::kMinValue) > return kMinValue; > return static_cast<SampleType>( > (source_value * ScalingFactors<FloatType>::kForNegativeInput) + > static_cast<FloatType>(kZeroPointValue)); > } else { > if (source_value >= FloatSampleTypeTraits<float>::kMaxValue) > return kMaxValue; > return static_cast<SampleType>( > (source_value * ScalingFactors<FloatType>::kForPositiveInput) + > static_cast<FloatType>(kZeroPointValue)); > } The best is to set is_official_build=true and make sure dcheck_always_on=false w/ a Release build. DCHECKs will kill the perf tests.
Also, when I wrote this a long time ago, things like temp vars and conditional ordering mattered a lot unfortunately... hence the perf test :)
On 2016/06/14 22:30:21, DaleCurtis wrote: > Also, when I wrote this a long time ago, things like temp vars and conditional > ordering mattered a lot unfortunately... hence the perf test :) Thanks, Dale. I applied the two extra build flags, and this seem to make a difference. However, it did not eliminate the fact that removing the temp variable also still has an effect. With the new build flags, Patch Set 9 seems to no longer show any regression for the to_interleaved int16_t case, which is good. However, with variant B (i.e., temp var removed), that case improves even more, but at the cost of most other cases. Since the conversion from/to int16_t seems to be the most relevant, I think we may want to go with this variant. If you agree, then I will upload it as a new Patch Set. Please let me know what you think. Nexus 7 Before Patch (with new build flags) Run 1 audio_bus_to_interleaved: int8_t= 274.11195 ms audio_bus_from_interleaved: int8_t= 111.98119999999999 ms audio_bus_to_interleaved: int16_t= 220.282 ms audio_bus_from_interleaved: int16_t= 163.9038 ms audio_bus_to_interleaved: int32_t= 189.8346 ms audio_bus_from_interleaved: int32_t= 123.92729999999999 ms Run 2 audio_bus_to_interleaved: int8_t= 270.90610000000004 ms audio_bus_from_interleaved: int8_t= 113.98164999999999 ms audio_bus_to_interleaved: int16_t= 221.67055 ms audio_bus_from_interleaved: int16_t= 163.8657 ms audio_bus_to_interleaved: int32_t= 188.89770000000001 ms audio_bus_from_interleaved: int32_t= 124.144 ms Run 3 audio_bus_to_interleaved: int8_t= 278.2364 ms audio_bus_from_interleaved: int8_t= 113.31790000000001 ms audio_bus_to_interleaved: int16_t= 220.5719 ms audio_bus_from_interleaved: int16_t= 163.60935 ms audio_bus_to_interleaved: int32_t= 188.1424 ms audio_bus_from_interleaved: int32_t= 126.90885 ms After Patch Set 9 (with new build flags) Run 1 audio_bus_to_interleaved: int8_t= 220.46965 ms audio_bus_from_interleaved: int8_t= 106.7825 ms audio_bus_to_interleaved: int16_t= 218.46005 ms audio_bus_from_interleaved: int16_t= 163.2629 ms audio_bus_to_interleaved: int32_t= 180.26125 ms audio_bus_from_interleaved: int32_t= 125.06105 ms Run 2 audio_bus_to_interleaved: int8_t= 219.71435000000002 ms audio_bus_from_interleaved: int8_t= 104.2816 ms audio_bus_to_interleaved: int16_t= 218.1183 ms audio_bus_from_interleaved: int16_t= 161.62415000000001 ms audio_bus_to_interleaved: int32_t= 178.5828 ms audio_bus_from_interleaved: int32_t= 124.02955 ms Run 3 audio_bus_to_interleaved: int8_t= 221.3074 ms audio_bus_from_interleaved: int8_t= 105.70219999999999 ms audio_bus_to_interleaved: int16_t= 219.17114999999998 ms audio_bus_from_interleaved: int16_t= 164.0411 ms audio_bus_to_interleaved: int32_t= 180.18644999999998 ms audio_bus_from_interleaved: int32_t= 126.37180000000001 ms After Patch Set 9 Variant B (with new build flags) Run 1 audio_bus_to_interleaved: int8_t= 242.61935 ms audio_bus_from_interleaved: int8_t= 104.21905 ms audio_bus_to_interleaved: int16_t= 210.63235 ms audio_bus_from_interleaved: int16_t= 164.27304999999998 ms audio_bus_to_interleaved: int32_t= 190.03910000000002 ms audio_bus_from_interleaved: int32_t= 125.43945000000001 ms Run 2 audio_bus_to_interleaved: int8_t= 242.78564999999998 ms audio_bus_from_interleaved: int8_t= 105.7648 ms audio_bus_to_interleaved: int16_t= 210.20815 ms audio_bus_from_interleaved: int16_t= 160.2432 ms audio_bus_to_interleaved: int32_t= 187.85705000000002 ms audio_bus_from_interleaved: int32_t= 122.93244999999999 ms Run 3 audio_bus_to_interleaved: int8_t= 243.4433 ms audio_bus_from_interleaved: int8_t= 104.08324999999999 ms audio_bus_to_interleaved: int16_t= 211.29305 ms audio_bus_from_interleaved: int16_t= 162.18875 ms audio_bus_to_interleaved: int32_t= 191.1148 ms audio_bus_from_interleaved: int32_t= 123.74114999999999 ms Before Patch (with old build flags) Run 1 '*RESULT audio_bus_to_interleaved: int8_t= 326.68455 ms', '*RESULT audio_bus_from_interleaved: int8_t= 114.99935 ms', '*RESULT audio_bus_to_interleaved: int16_t= 226.0422 ms', '*RESULT audio_bus_from_interleaved: int16_t= 161.80575 ms', '*RESULT audio_bus_to_interleaved: int32_t= 188.55745 ms', '*RESULT audio_bus_from_interleaved: int32_t= 124.81535 ms' Run 2 '*RESULT audio_bus_to_interleaved: int8_t= 329.30755 ms', '*RESULT audio_bus_from_interleaved: int8_t= 116.18955000000001 ms', '*RESULT audio_bus_to_interleaved: int16_t= 223.3002 ms', '*RESULT audio_bus_from_interleaved: int16_t= 163.48115 ms', '*RESULT audio_bus_to_interleaved: int32_t= 188.15765 ms', '*RESULT audio_bus_from_interleaved: int32_t= 124.34694999999999 ms' Run 3 '*RESULT audio_bus_to_interleaved: int8_t= 323.92425 ms', '*RESULT audio_bus_from_interleaved: int8_t= 117.218 ms', '*RESULT audio_bus_to_interleaved: int16_t= 220.46509999999998 ms', '*RESULT audio_bus_from_interleaved: int16_t= 160.6018 ms', '*RESULT audio_bus_to_interleaved: int32_t= 187.14600000000002 ms', '*RESULT audio_bus_from_interleaved: int32_t= 122.95075 ms' After Patch Set 9 (with old build flags) Run 1 '*RESULT audio_bus_to_interleaved: int8_t= 219.63649999999998 ms', '*RESULT audio_bus_from_interleaved: int8_t= 104.50129999999999 ms', '*RESULT audio_bus_to_interleaved: int16_t= 230.394 ms', '*RESULT audio_bus_from_interleaved: int16_t= 160.83679999999998 ms', '*RESULT audio_bus_to_interleaved: int32_t= 188.94195 ms', '*RESULT audio_bus_from_interleaved: int32_t= 124.527 ms', Run 2 '*RESULT audio_bus_to_interleaved: int8_t= 219.72355 ms', '*RESULT audio_bus_from_interleaved: int8_t= 103.83144999999999 ms', '*RESULT audio_bus_to_interleaved: int16_t= 230.11165 ms', '*RESULT audio_bus_from_interleaved: int16_t= 162.37945 ms', '*RESULT audio_bus_to_interleaved: int32_t= 190.744 ms', '*RESULT audio_bus_from_interleaved: int32_t= 125.9125 ms', Run 3 '*RESULT audio_bus_to_interleaved: int8_t= 217.7368 ms', '*RESULT audio_bus_from_interleaved: int8_t= 103.36455000000001 ms', '*RESULT audio_bus_to_interleaved: int16_t= 229.10459999999998 ms', '*RESULT audio_bus_from_interleaved: int16_t= 160.3409 ms', '*RESULT audio_bus_to_interleaved: int32_t= 188.47655 ms', '*RESULT audio_bus_from_interleaved: int32_t= 126.55485000000002 ms'
Doesn't matter much either way, but out of curiosity does the following change things? I remember using the ternary explicitly to help performance: return source_value < 0 ? (source_value <= FloatSampleTypeTraits<float>::kMinValue ? kMinValue : static_cast<SampleType>( (source_value * ScalingFactors<FloatType>::kForNegativeInput) + static_cast<FloatType>(kZeroPointValue))) : (source_value >= FloatSampleTypeTraits<float>::kMaxValue ? kMaxValue : static_cast<SampleType>( (source_value * ScalingFactors<FloatType>::kForPositiveInput) + static_cast<FloatType>(kZeroPointValue)));
On 2016/06/14 23:25:26, DaleCurtis wrote: > Doesn't matter much either way, but out of curiosity does the following change > things? I remember using the ternary explicitly to help performance: > > return source_value < 0 > ? (source_value <= FloatSampleTypeTraits<float>::kMinValue > ? kMinValue > : static_cast<SampleType>( > (source_value * > ScalingFactors<FloatType>::kForNegativeInput) + > static_cast<FloatType>(kZeroPointValue))) > : (source_value >= FloatSampleTypeTraits<float>::kMaxValue > ? kMaxValue > : static_cast<SampleType>( > (source_value * > ScalingFactors<FloatType>::kForPositiveInput) + > static_cast<FloatType>(kZeroPointValue))); Interesting experiment (let us call it Variant C). Just tried it, and found that it does not change anything compared to the Variant B from before. In that case, I think we should go with Variant B, because I feel readability is slightly better than Variant C. I will go ahead and prepare the patch set for it.
Patchset #10 (id:260001) has been deleted
Finished performance tweaks.
lgtm assuming the next step is to remove the deprecated methods and cull the unused variants? https://codereview.chromium.org/2024993004/diff/300001/media/base/audio_bus.h File media/base/audio_bus.h (right): https://codereview.chromium.org/2024993004/diff/300001/media/base/audio_bus.h... media/base/audio_bus.h:226: // Delegates to FromInterleavedPartial() I'm a little unhappy about all this code in the header; IIRC this means each caller will get a copy of this code instead of doing a function call to perform this work. My template-fu is a bit out of date, but is it possible to put these in the .cc file these days? https://codereview.chromium.org/2024993004/diff/300001/media/base/audio_bus.h... media/base/audio_bus.h:256: // TODO(dalecurtis): See if intrinsic optimizations help any here (for each We can drop this. They do help a lot, but are crufty to maintain. https://codereview.chromium.org/2024993004/diff/300001/media/base/audio_bus.h... media/base/audio_bus.h:268: // TODO(chfremer): Consider using vector instructions to speed this up, Ditto for the same reason, you'd have to write x86 and arm versions. https://codereview.chromium.org/2024993004/diff/300001/media/base/audio_bus.h... media/base/audio_bus.h:290: // TODO(chfremer): Consider using vector instructions to speed this up, Ditto.
https://codereview.chromium.org/2024993004/diff/300001/media/base/audio_bus.h File media/base/audio_bus.h (right): https://codereview.chromium.org/2024993004/diff/300001/media/base/audio_bus.h... media/base/audio_bus.h:226: // Delegates to FromInterleavedPartial() I share your concern, but I am not aware of any technique to work around this other than not using templated methods. It seems for using the templated approach, we have to pay with stronger coupling of the calling code to this class. But I am also not an expert on the topic of template features of C++. If there is a better way, I would love to learn about it. https://codereview.chromium.org/2024993004/diff/300001/media/base/audio_bus.h... media/base/audio_bus.h:256: // TODO(dalecurtis): See if intrinsic optimizations help any here (for each On 2016/06/15 20:50:00, DaleCurtis wrote: > We can drop this. They do help a lot, but are crufty to maintain. Done. https://codereview.chromium.org/2024993004/diff/300001/media/base/audio_bus.h... media/base/audio_bus.h:268: // TODO(chfremer): Consider using vector instructions to speed this up, Agreed that it is harder to maintain. But if performance is a high-enough priority, it may still make sense to do this. But I am not currently anticipating to do this work next, so I am not sure whether or not it makes sense to keep the TODOs in the code.
The CQ bit was checked by chfremer@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from miu@chromium.org, dalecurtis@chromium.org Link to the patchset: https://codereview.chromium.org/2024993004/#ps320001 (title: "Removed a todo")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2024993004/320001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: win_clang on tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_clang/builds/...)
re the performance conversation: 1. Regarding the local temporary variable affecting performance: The only way to really understand what's happening here is to look at the assembly output from the compiler. FWIW, performance should be better when not using the temporary because there ARE code paths where it was not needed (the clipping tests). 2. Are we compiling for speed or size? If the latter, that could mean performance is highly-dependent on constructs and order-of-operations in the code. Regardless, have we explicitly set the "compile for speed" options in the build files?
On 2016/06/20 20:31:38, miu wrote: > 2. Are we compiling for speed or size? If the latter, that could mean > performance is highly-dependent on constructs and order-of-operations in the > code. Regardless, have we explicitly set the "compile for speed" options in the > build files? What exactly are these "compile for speed" options? Is there a list or document you could point me to? For the tests on ARM, here are the contents of the args.gn file I used: target_os = "android" target_cpu = "arm" is_debug = false is_component_build = true is_official_build=true is_clang = true dcheck_always_on=false symbol_level = 1 enable_incremental_javac = true use_goma = true For the tests on Linux my args.gn file looked as follows: is_component_build = true is_debug = false use_goma = true
is_official_build should be sufficient.
On 2016/06/20 22:36:55, DaleCurtis wrote: > is_official_build should be sufficient. Does that actually enable "compile for speed?" I thought, for official release builds, the default was "compile for size" unless build files explicitly configure things otherwise?
On 2016/06/20 at 22:48:18, miu wrote: > On 2016/06/20 22:36:55, DaleCurtis wrote: > > is_official_build should be sufficient. > > Does that actually enable "compile for speed?" I thought, for official release builds, the default was "compile for size" unless build files explicitly configure things otherwise? Not sure about that, there are some things we do differently, but here are the options: https://cs.chromium.org/chromium/src/build/config/compiler/BUILD.gn?q=is_offi... Note: We only build android official builds with GN though. If we want to be pedantic we'll need to build with .gyp and retest.
Just wanted to check back with you. Are we okay to check in or do you feel more testing is required?
I think this is fine to land.
The CQ bit was checked by chfremer@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Applied diff from eklavyamirani's cl [1] and rebased to latest master revision. This is the starting point for taking over the cl and continue working on it. [1] https://codereview.chromium.org/1769373003/ BUG=580391 R=mcasas@chromium.org ========== to ========== Applied diff from eklavyamirani's cl [1] and rebased to latest master revision. This is the starting point for taking over the cl and continue working on it. [1] https://codereview.chromium.org/1769373003/ BUG=580391 R=mcasas@chromium.org ==========
Message was sent while issue was closed.
Committed patchset #12 (id:320001)
Message was sent while issue was closed.
Description was changed from ========== Applied diff from eklavyamirani's cl [1] and rebased to latest master revision. This is the starting point for taking over the cl and continue working on it. [1] https://codereview.chromium.org/1769373003/ BUG=580391 R=mcasas@chromium.org ========== to ========== Applied diff from eklavyamirani's cl [1] and rebased to latest master revision. This is the starting point for taking over the cl and continue working on it. [1] https://codereview.chromium.org/1769373003/ BUG=580391 R=mcasas@chromium.org Committed: https://crrev.com/c9e8ada83949be0314cc2d1d90df7935c3dc9534 Cr-Commit-Position: refs/heads/master@{#401952} ==========
Message was sent while issue was closed.
Patchset 12 (id:??) landed as https://crrev.com/c9e8ada83949be0314cc2d1d90df7935c3dc9534 Cr-Commit-Position: refs/heads/master@{#401952} |