|
|
Chromium Code Reviews|
Created:
7 years, 3 months ago by sheu Modified:
7 years, 1 month ago CC:
chromium-reviews, jam, apatrick_chromium, joi+watch-content_chromium.org, feature-media-reviews_chromium.org, darin-cc_chromium.org, piman+watch_chromium.org, Ami GONE FROM CHROMIUM, wuchengli Base URL:
https://chromium.googlesource.com/chromium/src.git@git-svn Visibility:
Public. |
DescriptionRemove GSC usage from ExynosVideoDecodeAccelerator.
With support for compositing from GL_TEXTURE_EXTERNAL_OES image (and support
from the Exynos GL stack for creating those textures), the video decoder
does not have to perform an explicit YUV->RGB color conversion step.
+27% throughput on birds0.h264 through vda_unittest.
BUG=167417
TEST=local build, run on CrOS snow, vda_unittest
TEST=local build, run on Android/clank
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=235128
Patch Set 1 #Patch Set 2 : 85f017d7 Initial. #Patch Set 3 : 7e4634ee Functional. #
Total comments: 15
Patch Set 4 : 935aea25 Nits + rebase + unittests #Patch Set 5 : d3e0d67b Rebase; reuploading deleted patchset. #Patch Set 6 : 9a29ff6c Unittest fixes. #Patch Set 7 : 11352e20 Rebase. #
Total comments: 4
Patch Set 8 : d1d7ea86 std::list<> -> std::queue<> #
Total comments: 6
Patch Set 9 : 11845b4b crop fix, rebase. #
Total comments: 2
Messages
Total messages: 34 (0 generated)
Done. This works on both HTML at ToT, and an old build of Flash from ihf@ with GL_TEXTURE_EXTERNAL_OES support.
Please make sure that you test: - DASH (H264 and VP8) - raw stream resolution change (H264 and VP8) - WebRTC decode. https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... File content/common/gpu/media/exynos_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.cc:380: EGLint attrs[] = { Can some of this not concerning output_record be extracted to outside of the loop? https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.cc:1144: DCHECK(dqbuf.timestamp.tv_sec >= 0); DCHECK_GE possible? https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.cc:1613: // Gsc will get enqueued in AssignPictureBuffersTask(). Not anymore. https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... File content/common/gpu/media/exynos_video_decode_accelerator.h (right): https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.h:242: // Create MFC output and GSC input and output buffers for the given |format|. Remove GSC reference. https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.h:369: // Output buffers ready to use, as a FIFO since we want oldest-first. Document why oldest-first please. https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.h:370: std::list<int> mfc_free_output_buffers_; std::queue?
Rebased, updated, also unittests. Can you point me at the test cases for what you want tested above? DASH in particular I keep hearing about and not knowing. https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... File content/common/gpu/media/exynos_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.cc:380: EGLint attrs[] = { On 2013/09/30 05:16:25, posciak wrote: > Can some of this not concerning output_record be extracted to outside of the > loop? Done. https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.cc:1144: DCHECK(dqbuf.timestamp.tv_sec >= 0); On 2013/09/30 05:16:25, posciak wrote: > DCHECK_GE possible? Done. https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.cc:1613: // Gsc will get enqueued in AssignPictureBuffersTask(). On 2013/09/30 05:16:25, posciak wrote: > Not anymore. Done. https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... File content/common/gpu/media/exynos_video_decode_accelerator.h (right): https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.h:242: // Create MFC output and GSC input and output buffers for the given |format|. On 2013/09/30 05:16:25, posciak wrote: > Remove GSC reference. Done. https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.h:369: // Output buffers ready to use, as a FIFO since we want oldest-first. On 2013/09/30 05:16:25, posciak wrote: > Document why oldest-first please. Done. https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.h:370: std::list<int> mfc_free_output_buffers_; On 2013/09/30 05:16:25, posciak wrote: > std::queue? Can't clear() them.
FYI, as mentioned, I'm easily getting 25% framerate improvements using this: https://code.google.com/p/chromium/issues/detail?id=302870#c28
https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... File content/common/gpu/media/exynos_video_decode_accelerator.h (right): https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.h:370: std::list<int> mfc_free_output_buffers_; On 2013/10/04 23:25:56, sheu wrote: > On 2013/09/30 05:16:25, posciak wrote: > > std::queue? > > Can't clear() them. https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gpu...
IIRC, VDA test failed when I ran it with this patch.
Rebased and updated. vda_unittest needed some fixing too, to fix up the Thumbnail* test cases. That particular test case is still failing, but I think it's because the old "golden image" checksum is incorrect. The previous run only decoded 247 out of 250 frames of test-25fps.h264; now it decodes all 250 frames. https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... File content/common/gpu/media/exynos_video_decode_accelerator.h (right): https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.h:370: std::list<int> mfc_free_output_buffers_; On 2013/10/05 00:06:33, posciak wrote: > On 2013/10/04 23:25:56, sheu wrote: > > On 2013/09/30 05:16:25, posciak wrote: > > > std::queue? > > > > Can't clear() them. > > https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gpu... Seems kind of... inefficient?
https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... File content/common/gpu/media/exynos_video_decode_accelerator.h (right): https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... content/common/gpu/media/exynos_video_decode_accelerator.h:370: std::list<int> mfc_free_output_buffers_; On 2013/11/07 02:37:27, sheu wrote: > On 2013/10/05 00:06:33, posciak wrote: > > On 2013/10/04 23:25:56, sheu wrote: > > > On 2013/09/30 05:16:25, posciak wrote: > > > > std::queue? > > > > > > Can't clear() them. > > > > > https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gpu... > > Seems kind of... inefficient? Seems kind of an... overoptimization? ;) Especially since it's only on cleanup. I say way worth it, instead of having to remember this one comment everywhere in the code and risking not maintaining the FIFO requirement.
All the prereqs are in place for landing this. posciak@: any more comments on this CL?
On 2013/11/11 22:39:49, sheu wrote: > All the prereqs are in place for landing this. posciak@: any more comments on > this CL? Did you miss my latest one at https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... ?
On 2013/11/11 23:43:25, posciak wrote: > On 2013/11/11 22:39:49, sheu wrote: > > All the prereqs are in place for landing this. posciak@: any more comments on > > this CL? > > Did you miss my latest one at > https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... > ? Yeah, I can do that. I was just wondering if there's any more I should fold into my next upload.
On 2013/11/11 23:44:58, sheu wrote: > On 2013/11/11 23:43:25, posciak wrote: > > On 2013/11/11 22:39:49, sheu wrote: > > > All the prereqs are in place for landing this. posciak@: any more comments > on > > > this CL? > > > > Did you miss my latest one at > > > https://chromiumcodereview.appspot.com/23526070/diff/5001/content/common/gpu/... > > ? > > Yeah, I can do that. I was just wondering if there's any more I should fold > into my next upload. Did another pass. Small nits + the queue stuff above, but otherwise lgtm. Not really familiar with the GL stuff though. Would be good if Wu-Cheng could doublecheck PictureReady paths. Please test DASH and raw stream res change on both H264 and VP8 and WebRTC decode.
Sorry forgot to publish comments. https://chromiumcodereview.appspot.com/23526070/diff/268002/content/common/gp... File content/common/gpu/media/exynos_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/23526070/diff/268002/content/common/gp... content/common/gpu/media/exynos_video_decode_accelerator.cc:407: attrs[11] = frame_buffer_size_.width(); To be supercorrect, I think we should use bytesperline from v4l2_format struct instead of this? Here and below. https://chromiumcodereview.appspot.com/23526070/diff/268002/content/common/gp... content/common/gpu/media/exynos_video_decode_accelerator.cc:1918: } while (i < mfc_output_buffer_map_.size()); Why do-while and not for?
Updated. https://chromiumcodereview.appspot.com/23526070/diff/268002/content/common/gp... File content/common/gpu/media/exynos_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/23526070/diff/268002/content/common/gp... content/common/gpu/media/exynos_video_decode_accelerator.cc:407: attrs[11] = frame_buffer_size_.width(); On 2013/11/12 02:20:25, posciak wrote: > To be supercorrect, I think we should use bytesperline from v4l2_format struct > instead of this? Here and below. I don't think it's worth more state, especially since we already assume NV12M. In fact, mfc_output_buffer_size_[] wasn't being used either, so I'm going to remove it. https://chromiumcodereview.appspot.com/23526070/diff/268002/content/common/gp... content/common/gpu/media/exynos_video_decode_accelerator.cc:1918: } while (i < mfc_output_buffer_map_.size()); On 2013/11/12 02:20:25, posciak wrote: > Why do-while and not for? Saves one conditional up above, since we already check that mfc_output_buffer_map_.size() != 0. Is awesome.
fischman@: OWNERS check. Thanks.
piman@: PTAL at content/gpu/gpu_main.cc. This is just about removing a presandbox bit so it should be trivial.
lgtm
LGTM++ https://chromiumcodereview.appspot.com/23526070/diff/578001/content/common/gp... File content/common/gpu/media/exynos_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/23526070/diff/578001/content/common/gp... content/common/gpu/media/exynos_video_decode_accelerator.cc:331: format.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_NV12M; I thought 16X16 was a perf win? https://chromiumcodereview.appspot.com/23526070/diff/578001/content/common/gp... File content/common/gpu/media/rendering_helper.cc (right): https://chromiumcodereview.appspot.com/23526070/diff/578001/content/common/gp... content/common/gpu/media/rendering_helper.cc:334: "#extension GL_OES_EGL_image_external : enable\n" Sure would be nice if you could use STRINGIZE https://chromiumcodereview.appspot.com/23526070/diff/578001/content/common/gp... content/common/gpu/media/rendering_helper.cc:334: "#extension GL_OES_EGL_image_external : enable\n" This works on windows/angle?
https://chromiumcodereview.appspot.com/23526070/diff/578001/content/common/gp... File content/common/gpu/media/exynos_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/23526070/diff/578001/content/common/gp... content/common/gpu/media/exynos_video_decode_accelerator.cc:331: format.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_NV12M; On 2013/11/12 19:36:44, Ami Fischman wrote: > I thought 16X16 was a perf win? It is a perf win... for those hardware blocks that can read it. GSC understands the tiled format, so we can MFC -> GSC in this format. Unfortunately, the GPU does not, so direct MFC -> GPU cannot use it. Fortunately, the perf win from cutting out GSC entirely is mega epic compared to what we have here. https://chromiumcodereview.appspot.com/23526070/diff/578001/content/common/gp... File content/common/gpu/media/rendering_helper.cc (right): https://chromiumcodereview.appspot.com/23526070/diff/578001/content/common/gp... content/common/gpu/media/rendering_helper.cc:334: "#extension GL_OES_EGL_image_external : enable\n" On 2013/11/12 19:36:44, Ami Fischman wrote: > Sure would be nice if you could use STRINGIZE Can't, since we have an "#extension" preprocessor bit. (Not sure if you meant that in passing or what.) https://chromiumcodereview.appspot.com/23526070/diff/578001/content/common/gp... content/common/gpu/media/rendering_helper.cc:334: "#extension GL_OES_EGL_image_external : enable\n" On 2013/11/12 19:36:44, Ami Fischman wrote: > This works on windows/angle? It should. To the trybot!
On Tue, Nov 12, 2013 at 11:41 AM, <sheu@chromium.org> wrote: > Fortunately, the perf win from cutting out GSC entirely is mega epic > compared to what we have here. > Give that CL description some perf win details. CL descriptions love perf win details. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
danakj@: I'd like to make sure you're fine with the change just to cc/layers/video_layer_impl.cc. For what it's worth, I actually did build and test the change on an Android device.
https://chromiumcodereview.appspot.com/23526070/diff/718001/cc/layers/video_l... File cc/layers/video_layer_impl.cc (right): https://chromiumcodereview.appspot.com/23526070/diff/718001/cc/layers/video_l... cc/layers/video_layer_impl.cc:217: gfx::Transform scale; I think the old code would be preferable. This initializes an identity matrix, then scales it, then multiplies it. The old code initializes a matrx from the stream texture, then scales it. Avoids the identity step and the extra multiply.
On 2013/11/13 23:27:18, danakj wrote: > https://chromiumcodereview.appspot.com/23526070/diff/718001/cc/layers/video_l... > File cc/layers/video_layer_impl.cc (right): > > https://chromiumcodereview.appspot.com/23526070/diff/718001/cc/layers/video_l... > cc/layers/video_layer_impl.cc:217: gfx::Transform scale; > I think the old code would be preferable. > > This initializes an identity matrix, then scales it, then multiplies it. > > The old code initializes a matrx from the stream texture, then scales it. Avoids > the identity step and the extra multiply. (They are functionally the exact same otherwise, right? I think?)
https://chromiumcodereview.appspot.com/23526070/diff/718001/cc/layers/video_l... File cc/layers/video_layer_impl.cc (right): https://chromiumcodereview.appspot.com/23526070/diff/718001/cc/layers/video_l... cc/layers/video_layer_impl.cc:217: gfx::Transform scale; On 2013/11/13 23:27:19, danakj wrote: > I think the old code would be preferable. > > This initializes an identity matrix, then scales it, then multiplies it. > > The old code initializes a matrx from the stream texture, then scales it. Avoids > the identity step and the extra multiply. Unfortunately, the two operations are not identical, which is why I had to put this in :-( What we need for sampling the stream texture is to do a stream texture transform ("T" matrix below), then a scale ("S" matrix below). So: (u', v') = S * T * (u, v) gfx::Transform::Scale() is a little backwards. It has this comment: // Applies the current transformation on a scaling and assigns the result // to |this|. void Scale(SkMScalar x, SkMScalar y); and does a SkMatrix44::preScale(), instead of an SkMatrix44::postScale(). So what we get incorrectly is: (u', v') = T * S * (u, v) It's a little confusing.
On 2013/11/13 23:37:33, sheu wrote: > https://chromiumcodereview.appspot.com/23526070/diff/718001/cc/layers/video_l... > File cc/layers/video_layer_impl.cc (right): > > https://chromiumcodereview.appspot.com/23526070/diff/718001/cc/layers/video_l... > cc/layers/video_layer_impl.cc:217: gfx::Transform scale; > On 2013/11/13 23:27:19, danakj wrote: > > I think the old code would be preferable. > > > > This initializes an identity matrix, then scales it, then multiplies it. > > > > The old code initializes a matrx from the stream texture, then scales it. > Avoids > > the identity step and the extra multiply. > > Unfortunately, the two operations are not identical, which is why I had to put > this in :-( > > What we need for sampling the stream texture is to do a stream texture transform > ("T" matrix below), then a scale ("S" matrix below). So: > > (u', v') = S * T * (u, v) > > gfx::Transform::Scale() is a little backwards. It has this comment: > > // Applies the current transformation on a scaling and assigns the result > // to |this|. > void Scale(SkMScalar x, SkMScalar y); > > and does a SkMatrix44::preScale(), instead of an SkMatrix44::postScale(). So > what we get incorrectly is: > > (u', v') = T * S * (u, v) > > It's a little confusing. Oh I see, I didn't catch the order reversal, my bad. You could save the scale by initializing the first matrix with the scale components directly, but meh. LGTM.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/sheu@chromium.org/23526070/718001
Retried try job too often on mac for step(s) compile http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=mac&number...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/sheu@chromium.org/23526070/718001
Retried try job too often on mac for step(s) compile http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=mac&number...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/sheu@chromium.org/23526070/718001
Message was sent while issue was closed.
Change committed as 235128 |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
