Created: 6 years, 5 months ago by sandersd (OOO until July 31)
Modified: 6 years, 4 months ago
CC: chromium-reviews, posciak+watch_chromium.org, jam, mcasas+watch_chromium.org, feature-media-reviews_chromium.org, darin-cc_chromium.org, piman+watch_chromium.org, wjia+watch_chromium.org
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Project: chromium
Visibility: Public.
Description:
Implement actually decoding frames in VTVideoDecodeAccelerator.
This adds translation from Annex B to AVCC format along with decoding
frames and binding them to textures.
It seems that kVTDecodeFrame_EnableTemporalProcessing is just a
suggestion to VideoToolbox, and one that it ignores. That means that,
for now, this code only outputs frames in the correct order for I-frame-only
video.
BUG=133828
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=287451
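For context on the Annex B to AVCC translation mentioned in the description: AVCC replaces each NALU's start code with a 4-byte big-endian length prefix. Below is a minimal sketch, assuming the parser has already split the bitstream into NALU pointers and sizes; the helper name is illustrative, not the CL's actual code.

#include <stddef.h>
#include <stdint.h>
#include <vector>

// Concatenates NALUs into an AVCC-format buffer: a 4-byte big-endian length
// prefix followed by the NALU payload, with no Annex B start codes.
std::vector<uint8_t> BuildAvccBuffer(const std::vector<const uint8_t*>& nalus,
                                     const std::vector<size_t>& nalu_sizes) {
  std::vector<uint8_t> out;
  for (size_t i = 0; i < nalus.size(); ++i) {
    const size_t size = nalu_sizes[i];
    out.push_back((size >> 24) & 0xff);
    out.push_back((size >> 16) & 0xff);
    out.push_back((size >> 8) & 0xff);
    out.push_back(size & 0xff);
    out.insert(out.end(), nalus[i], nalus[i] + size);
  }
  return out;
}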
Patch Set 1
Patch Set 2: Fixup header.
  Total comments: 25
Patch Set 3: Address comments.
  Total comments: 14
Patch Set 4: Fix refcounting and address comments.
  Total comments: 17
Patch Set 5: Fix race on coded_size_.
  Total comments: 48
Patch Set 6: Fix header comments.
  Total comments: 2
Patch Set 7: Vertical space.
Patch Set 8: Nits.
  Total comments: 36
Patch Set 9: Fix nits.
  Total comments: 5
Patch Set 10: Rebase
Patch Set 11: Rebase again
Patch Set 12: Fixes
Messages
Total messages: 34 (0 generated)
https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
File content/common/gpu/media/vt_video_decode_accelerator.cc (right):

https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:37: // Will be guaranteed to be valid once flush before delete is implemented.
nit: this reads more like a TODO -- is that the case?

https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:41: CFRetain(image_buffer);
does this call need to be balanced with a CFRelease()?

https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:49: frames_pending_decode_(0),
while you increment/decrement this variable, I don't see it being actually used anywhere

https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:97: const std::vector<size_t>& nalu_data_sizes) {
add DCHECK(decoder_thread_.message_loop_proxy()->BelongsToCurrentThread());

https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:133: int32_t pixel_format = '2vuy';
this is used in multiple places -- can you make this a constant at the top of this file?

https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:266: // kVTDecodeFrame_EnableAsynchronousDecompression |
does it make sense to define these somewhere? perhaps in the fragment header file along w/ the other forward declarations?

https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:268: VTDecodeFrameFlags decode_flags = (1<<0) | (1<<3);
spaces around binary operators:
(1 << 0) | (1 << 3)

https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:293: decoded_frames_.push(DecodedFrame(bitstream_id, image_buffer));
do we need locking around decoded_frames_? alternatively we can include the DecodedFrame in the posted task to SendPictures() and avoid locking for the entirety of that method.

relating to my comment above about frames_pending_decode_ ... if we don't need that either we can remove all locking

https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:337: DecodedFrame frame = decoded_frames_.front(); decoded_frames_.pop();
put the pop() calls on separate lines

https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:339: int32_t picture_id = picture_ids_.front(); picture_ids_.pop();
ditto

https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
File content/common/gpu/media/vt_video_decode_accelerator.h (right):

https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:86: std::map<int32_t, int32_t> texture_ids_;
Chromium uses int32 w/o the _t

https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:105: int32_t frames_pending_decode_;
for a count that shouldn't be negative, something like size_t would be more appropriate
http://dev.chromium.org/developers/coding-style#TOC-Types
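Aside on the flag-definition comment above: the two bits being OR'd appear to correspond to VideoToolbox's asynchronous-decompression and temporal-processing flags. If the real enum is not visible to the build, naming them locally might look like the following; the values mirror VTDecompressionSession.h and should be treated as an assumption rather than something taken from this CL.

// Mirrors the relevant VTDecodeFrameFlags bits from VTDecompressionSession.h
// so call sites can use names instead of magic numbers.
enum {
  kVTDecodeFrame_EnableAsynchronousDecompression = 1 << 0,
  kVTDecodeFrame_EnableTemporalProcessing = 1 << 3,
};

const uint32_t kDecodeFlags = kVTDecodeFrame_EnableAsynchronousDecompression |
                              kVTDecodeFrame_EnableTemporalProcessing;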
I've addressed the comments, but not tested the new code yet as building on Mac is not cooperating at the moment. https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.cc (right): https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:37: // Will be guaranteed to be valid once flush before delete is implemented. On 2014/07/16 18:18:51, scherkus wrote: > nit: this reads more like a TODO -- is that the case? Done. https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:41: CFRetain(image_buffer); On 2014/07/16 18:18:51, scherkus wrote: > does this call need to be balanced with a CFRelease()/ No, that is handled by the destruction of what is now called picture_bindings_ (was retained_images_). I've moved this call to VTVDA::Output and tried to make it more clear by using a ScopedCFTypeRef. https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:49: frames_pending_decode_(0), On 2014/07/16 18:18:51, scherkus wrote: > while you increment/decrement this variable, I don't see it being actually used > anywhere It will be used to determine when a request to flush has completed. https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:97: const std::vector<size_t>& nalu_data_sizes) { On 2014/07/16 18:18:51, scherkus wrote: > add DCHECK(decoder_thread_.message_loop_proxy()->BelongsToCurrentThread()); Done. https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:133: int32_t pixel_format = '2vuy'; On 2014/07/16 18:18:51, scherkus wrote: > this is used in multiple places -- can you make this a constant at the top of > this file? Done. https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:266: // kVTDecodeFrame_EnableAsynchronousDecompression | On 2014/07/16 18:18:51, scherkus wrote: > does it make send to define these somewhere? > > perhaps in the fragment header file along w/ the other forward declarations? Done. https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:268: VTDecodeFrameFlags decode_flags = (1<<0) | (1<<3); On 2014/07/16 18:18:51, scherkus wrote: > spaces around binary operators > > (1 << 0) | (1 << 3) Done. https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:293: decoded_frames_.push(DecodedFrame(bitstream_id, image_buffer)); On 2014/07/16 18:18:51, scherkus wrote: > do we need locking around decoded_frames_? > > alternatively we can include the DecodedFrame in the posted task to > SendPictures() and avoid locking for the entirety of that method > > > relating to my comment above about frames_pending_decode_ ... if we don't need > that either we can remove all locking I've implemented a task-posting version of this. It's much more refcount-heavy, but it does avoid doing anything expensive inside a lock. Probably a much better plan where the OpenGL API is involved. 
Most of the excess refcounting can be removed through clever use of references and maybe a little bit of C++11, if necessary. https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:337: DecodedFrame frame = decoded_frames_.front(); decoded_frames_.pop(); On 2014/07/16 18:18:51, scherkus wrote: > put the pop() calls on separate lines Done. https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:339: int32_t picture_id = picture_ids_.front(); picture_ids_.pop(); On 2014/07/16 18:18:51, scherkus wrote: > ditto Done. https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.h (right): https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:86: std::map<int32_t, int32_t> texture_ids_; On 2014/07/16 18:18:51, scherkus wrote: > Chromium uses int32 w/o the _t My understanding is those types are deprecated. https://code.google.com/p/chromium/codesearch#chromium/src/base/basictypes.h&... https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:105: int32_t frames_pending_decode_; On 2014/07/16 18:18:51, scherkus wrote: > for a count that shouldn't be negative, something like size_t would be more > appropriate > > http://dev.chromium.org/developers/coding-style#TOC-Types Done.
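To illustrate the retain/release balance discussed above: wrapping the callback's image buffer in a base::ScopedCFTypeRef with an explicit RETAIN policy makes the ownership transfer visible and removes the need for a manual CFRelease(). A rough sketch under the assumption that the buffer ends up stored in |picture_bindings_|; not the CL's exact code.

#include "base/mac/scoped_cftyperef.h"

// Inside the VideoToolbox output callback: the callback does not own
// |image_buffer|, so take an explicit reference. The matching release happens
// automatically when the ScopedCFTypeRef is destroyed, e.g. when the entry is
// erased from |picture_bindings_| or the map itself is destroyed.
base::ScopedCFTypeRef<CVImageBufferRef> image(image_buffer,
                                              base::scoped_policy::RETAIN);
picture_bindings_[picture_id] = image;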
https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
File content/common/gpu/media/vt_video_decode_accelerator.h (right):

https://codereview.chromium.org/397883002/diff/20001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:86: std::map<int32_t, int32_t> texture_ids_;
On 2014/07/17 00:22:22, sandersd wrote:
> On 2014/07/16 18:18:51, scherkus wrote:
> > Chromium uses int32 w/o the _t
>
> My understanding is those types are deprecated.
>
> https://code.google.com/p/chromium/codesearch#chromium/src/base/basictypes.h&...

huh! go figure! https://code.google.com/p/chromium/issues/detail?id=138542

https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media...
File content/common/gpu/media/vt_video_decode_accelerator.cc (right):

https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:62: frames_pending_decode_(0),
can you remove this for this CL and add it back in when it's actually used by Flush()? it'll make it easier to understand the subsequent review for Flush()

https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:301: // software decoder is selected.
this comment is confusing to me ... is this a TODO? should we be reading the info_flags? if so, what should we be doing with it if it wasn't corrupted? when/why are we selecting the software decoder?

https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:303: CHECK(CFGetTypeID(image_buffer) == CVPixelBufferGetTypeID());
nit: is it possible to CHECK_EQ() these two, or do we get type errors?

https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:347: gpu_task_runner_->PostTask(FROM_HERE, base::Bind(
does this have to be made async? it looks like we should be on the gpu thread already. if it has to be async a comment or something would be nice, similar to the one above in AssignPictureBuffers() ... unless the PostTask() is also not needed in AssignPictureBuffers :)

https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:354: if (!picture_ids_.empty() && !decoded_frames_.empty()) {
nit: prefer early return when possible so we avoid deeply nesting code

if (picture_ids_.empty() || decoded_frames_.empty())
  return;

https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:381: gpu_task_runner_->PostTask(FROM_HERE, base::Bind(
do these need to be async? we're already on gpu_task_runner_ and should be able to call the client directly. the only time weak_client_factory_ gets invalidated is when our destructor is running, which should mean it's safe to store a raw VDA::Client* pointer and call it directly (if we're somehow executing this code after our dtor has run we've got much bigger problems to solve!)

https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media...
File content/common/gpu/media/vt_video_decode_accelerator.h (right):

https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:69: // Configure a VideoToolbox decompression session from parameter set NALUs.
general comment on comments: comments that are an english sentence equivalent of their method names aren't helpful. more helpful comments would include (as an example) grouping functions together by the thread they run on:

// Methods interacting with VideoToolbox. Run on |decoder_thread_|.
void ConfigureDecoder(...);
void DecodeTask(...);

// Methods interacting with |weak_client_factory_|. Run on |gpu_task_runner_|.
void OutputTask(...);
void SendPictures();
It turns out that the task-posting version of Output() was decrementing the |image_buffer| reference one-too-many times (still not sure exactly how). I've added a DCHECK to make sure similar issues can't come back. https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.cc (right): https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:62: frames_pending_decode_(0), On 2014/07/17 02:08:26, scherkus wrote: > can you remove this for this CL and add it back in when it's actually used by > Flush()? > > it'll make it easier to understand the subsequent review for Flush() Done. https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:301: // software decoder is selected. On 2014/07/17 02:08:26, scherkus wrote: > this comment is confusing to me ... is this a TODO? should be reading the > info_flags? if so, what should we be doing it if it wasn't corrupted? when/why > are we selecting the software decoder? The only bit of interest in info_flags is the frame dropped bit, but we don't expect any frames to be dropped (and I suspect that image_buffer will be of type CFNull in that case). The software decoder is selected for us automatically if the hardware can't be initialized or does not support the parameters of the video. https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:303: CHECK(CFGetTypeID(image_buffer) == CVPixelBufferGetTypeID()); On 2014/07/17 02:08:25, scherkus wrote: > nit: is it possible to CHECK_EQ() these two, or do we get type errors? Done. https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:347: gpu_task_runner_->PostTask(FROM_HERE, base::Bind( On 2014/07/17 02:08:26, scherkus wrote: > does this have to be made async? it looks like we should be on the gpu thread > already > > if it has to be async a comment or something would be nice, similar to one above > in AssignPictureBuffers() ... unless the PostTask() is also not needed in > AssignPictureBuffers :) Done. I've tried to avoid any calling back into the host on the same stack, as the results are not documented, but some of these are indeed trivial. https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:354: if (!picture_ids_.empty() && !decoded_frames_.empty()) { On 2014/07/17 02:08:25, scherkus wrote: > nit: prefer early return when possible so we avoid deeply nesting code > > if (picture_ids_.empty() || decoded_frames_.empty()) > return; Done. https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:381: gpu_task_runner_->PostTask(FROM_HERE, base::Bind( On 2014/07/17 02:08:25, scherkus wrote: > do these need to be async? > > we're already on gpu_task_runner_ and should be able to call the client directly > > the only time weak_client_factory_ gets invalidated is when our destructor is > running, which should mean it's safe to store a raw VDA::Client* pointer and > call it directly > > (if we're somehow executing this code after our dtor has run we've got much > bigger problems to solve!) Done. 
It looks like both the Chrome and Pepper implementations are trivial enough to call directly (but I don't like it because I had to audit the code to be sure). https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.h (right): https://codereview.chromium.org/397883002/diff/40001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:69: // Configure a VideoToolbox decompression session from parameter set NALUs. On 2014/07/17 02:08:26, scherkus wrote: > general comment on comments: comments that are an english sentence equivalent of > their method names aren't helpful. > > more helpful comments would include (as an example), grouping functions together > by the thread they run on > > // Methods interacting with VideoToolbox. Run on |decoder_thread_|. > void ConfigureDecoder(...); > void DecodeTask(...); > > // Methods interacting with |weak_client_factory_|. Run on |gpu_task_runner_|. > void OutputTask(...); > void SendPictures(); Done.
few more nits but looking good

https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media...
File content/common/gpu/media/vt_video_decode_accelerator.cc (right):

https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:159: session_.reset();
shouldn't this already be NULL / DCHECK it is?

https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:170: if (coded_size_ != new_coded_size) {
FYI coded_size_ is being read/written to from multiple threads. also, is this a code path you hit today? it seems like this code can only be called once to initialize session_ and then never again.

perhaps you'd be better off:
1) removing weak_client_factory_ entirely
2) posting a task to a helper method on the gpu thread (e.g., CodedSizeChanged(...) or DecoderConfigured(...) or what have you)
3) updating coded_size_ on the GPU thread
4) calling client_->ProvidePictureBuffers(...) as needed

that eliminates reading/writing coded_size_ on multiple threads and keeps us using a single weak pointer

https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:320: // become broken if they are used before that happens.
to confirm -- we need the post task here, right?

https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media...
File content/common/gpu/media/vt_video_decode_accelerator.h (right):

https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:53: // Called by OutputThunk when VideoToolbox finishes decoding a frame.
nit: we have a convention of adding a "()" suffix to function names referenced in comments (similar to how we surround variable names with |'s -- makes it easier to scan and disambiguates functions from types/classes)

https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:68: // Methods for interacting with VideoToolbox. Run on |decoded_thread_|.
typo: decoded -> decoder

https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:90: std::map<int32_t, base::ScopedCFTypeRef<CVImageBufferRef>> picture_bindings_;
interesting fact: apparently because we compile with -std=gnu++11 we no longer need spaces between nested template arguments ">>"

hooray!

https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:109: // state starts being destroyed.
the old comment was more accurate. the situation you describe here is impossible (the dtor executes on the GPU thread so no other tasks can be running until the current task that triggered destruction yields back to the message loop. at that point all WeakPtrs have been invalidated and any pending tasks referencing said WeakPtrs will no-op)
https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.cc (right): https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:159: session_.reset(); On 2014/07/17 21:17:26, scherkus wrote: > shouldn't this already be NULL / DCHECK it is? That's true now that I've changed DecodeTask to only call once, but that is temporary. It turns out that the previous SPS/PPS may need to be saved because it's possible to update only one of them (but I need to read the spec to be sure I get this right), but the code was assuming they always come together. https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:170: if (coded_size_ != new_coded_size) { On 2014/07/17 21:17:26, scherkus wrote: > FYI coded_size_ is being read/written to from multiple threads > > also, is this a code path you hit today? it seems like this code can only be > called once to initialize session_ and then never again > > perhaps you'd be better off: > 1) removing weak_client_factory_ entirely > 2) posting a task to a helper method on the gpu thread (e.g., > CodedSizeChanged(...) or DecoderConfigured(...) or what have you) > 3) updating coded_size_ on the GPU thread > 4) calling client_->ProvidePictureBuffers(...) as needed > > > that eliminates reading/writing coded_size_ on multiple threads and keeps us > using a single weak pointer Good catch on the data race. This is a tricky operation, because we have to make sure that the current pictures are all released before we send the current frame to be decoded. This is complicated by the fact that we can't immediately release pictures that are backed by images (that can't happen safely until they are passed back in ReusePictureBuffer). I've changed this to use SizeChangedTask, which I believe will be safe with the addition of a SIZE_CHANGED state (states are part of the flushing logic coming next). https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:320: // become broken if they are used before that happens. On 2014/07/17 21:17:26, scherkus wrote: > to confirm -- we need the post task here, right? Yes.
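A sketch of the pattern being agreed on here: hop to the GPU thread with the single weak pointer, update |coded_size_| there, and only then ask the client for picture buffers. Method and member names follow the discussion above, but the body is illustrative rather than the CL's actual code, and kNumPictureBuffers is an assumed constant.

// Posted from the decoder thread once the new dimensions are known:
//   gpu_task_runner_->PostTask(
//       FROM_HERE, base::Bind(&VTVideoDecodeAccelerator::SizeChangedTask,
//                             weak_this_factory_.GetWeakPtr(), new_coded_size));

// Runs on |gpu_task_runner_|; the only place |coded_size_| is touched, so no
// locking is needed and the single WeakPtrFactory suffices.
void VTVideoDecodeAccelerator::SizeChangedTask(gfx::Size coded_size) {
  if (coded_size_ == coded_size)
    return;
  coded_size_ = coded_size;
  client_->ProvidePictureBuffers(kNumPictureBuffers,  // assumed constant
                                 coded_size_,
                                 GL_TEXTURE_RECTANGLE_ARB);
}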
looks like you missed the comments in the .h https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.h (right): https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:53: // Called by OutputThunk when VideoToolbox finishes decoding a frame. On 2014/07/17 21:17:26, scherkus wrote: > nit: we have a convention of adding a "()" suffix to function names referenced > in comments (similar to how we surround variables names with |'s -- makes it > easier to scan and disambiguates functions from types/classes) FYI not done https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:68: // Methods for interacting with VideoToolbox. Run on |decoded_thread_|. On 2014/07/17 21:17:26, scherkus wrote: > typo: decoded -> decoder FYI not done https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:109: // state starts being destroyed. On 2014/07/17 21:17:26, scherkus wrote: > the old comment was more accurate > > the situation you describe here is impossible (the dtor executes on the GPU > thread so no other tasks can be running until the current task that triggered > destruction yields back to the message loop. at that point all WeakPtrs have > been invalidated and any pending tasks referencing said WeakPtrs will no-op) FYI not done
Oops, sorry about that! https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.h (right): https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:53: // Called by OutputThunk when VideoToolbox finishes decoding a frame. On 2014/07/17 23:32:51, scherkus wrote: > On 2014/07/17 21:17:26, scherkus wrote: > > nit: we have a convention of adding a "()" suffix to function names referenced > > in comments (similar to how we surround variables names with |'s -- makes it > > easier to scan and disambiguates functions from types/classes) > > FYI not done Done. https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:68: // Methods for interacting with VideoToolbox. Run on |decoded_thread_|. On 2014/07/17 23:32:51, scherkus wrote: > On 2014/07/17 21:17:26, scherkus wrote: > > typo: decoded -> decoder > > FYI not done Done. https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:90: std::map<int32_t, base::ScopedCFTypeRef<CVImageBufferRef>> picture_bindings_; On 2014/07/17 21:17:26, scherkus wrote: > interesting fact: apparently because we compile with -std=gnu++11 we no longer > need spaces between nested template arguments ">>" > > hooray! Acknowledged. https://codereview.chromium.org/397883002/diff/60001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:109: // state starts being destroyed. On 2014/07/17 23:32:51, scherkus wrote: > On 2014/07/17 21:17:26, scherkus wrote: > > the old comment was more accurate > > > > the situation you describe here is impossible (the dtor executes on the GPU > > thread so no other tasks can be running until the current task that triggered > > destruction yields back to the message loop. at that point all WeakPtrs have > > been invalidated and any pending tasks referencing said WeakPtrs will no-op) > > FYI not done Ah yes, now that this is correctly used from only the GPU thread, the destruction order doesn't actually matter anymore (as long as it is destructed after |decoder_thread_|). Done.
https://codereview.chromium.org/397883002/diff/100001/content/common/gpu/medi...
File content/common/gpu/media/vt_video_decode_accelerator.h (right):

https://codereview.chromium.org/397883002/diff/100001/content/common/gpu/medi...
content/common/gpu/media/vt_video_decode_accelerator.h:107: // This WeakPtrFactory does not need to be last as its pointers are bound to
final nit (I swear! this will all become automatic with time!): we make judicious use of vertical whitespace. general guideline is a blank line before each // comment, except for when it's the first line in a new indentation block.

tl;dr version: add a blank line before this + other member variables
https://codereview.chromium.org/397883002/diff/100001/content/common/gpu/medi... File content/common/gpu/media/vt_video_decode_accelerator.h (right): https://codereview.chromium.org/397883002/diff/100001/content/common/gpu/medi... content/common/gpu/media/vt_video_decode_accelerator.h:107: // This WeakPtrFactory does not need to be last as its pointers are bound to On 2014/07/18 01:08:41, scherkus wrote: > final nit (I swear! this will all become automatic with time!): we make > judicious use of vertical whitespace > > general guideline is a blank line before each // comment, except for when it's > the first line in a new indentation block > > tl;dr version: add a blank line before this + other member variables Done.
you rock! lgtm! posciak: I think we can get the CQ started on this and if you have any other comments we can fix them up in a follow up CL
https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
File content/common/gpu/media/vt_video_decode_accelerator.cc (right):

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:73: client_ = client;
Sanity check: why is it safe to use client_ as a non-weakptr? client becomes invalid after Destroy() is called.

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:167: if (coded_size_ != new_coded_size) {
Since you don't do dismiss/allow resolution change, you should error out here if !coded_size_.Empty(). Or will we never get here because session_ will be not-null then?

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:193: // NALUs are stored with Annex B format in the bitstream buffer (3-byte start
Actually, start codes can be either 3- or 4-byte.

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:294: weak_this_factory_.GetWeakPtr(),
Perhaps use one cached weakptr instead of generating one every time?

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:307: client_->ProvidePictureBuffers(
Do we want if (client_) ?

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:351: glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texture_ids_[picture_id]);
Please use ScopedPictureBinder so that the currently bound texture gets rebound after you are done, instead of setting it to invalid texture id at the end.

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.cc:366: client_->NotifyEndOfBitstreamBuffer(frame.bitstream_id);
Could you do this in output task? What happens when frames are out of order? We'd probably want to return bitstream buffers faster than their frames are output.

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
File content/common/gpu/media/vt_video_decode_accelerator.h (right):

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:14: #include "base/memory/scoped_ptr.h"
I don't see any usage of scoped_ptr in this class. There is however scoped_refptr, for which the correct header I believe is ref_counted.h

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:17: #include "base/synchronization/lock.h"
Unused?

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:47: virtual void ReusePictureBuffer(int32_t picture_id) OVERRIDE;
Need to include stdint.h for int32_t.

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:69: void ConfigureDecoder(
s/ConfigureDecoder/ConfigureDecoderTask/ ?

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:77: void SendPictures();
Task?

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:84: gfx::Size texture_size_;
Shouldn't this be the same as coded_size_ ?

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:86: std::map<int32_t, int32_t> texture_ids_;
texture ids are uint32

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:88: std::queue<int32_t> picture_ids_;
s/picture_ids_/available_picture_ids_/ ?

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:92: std::map<int32_t, base::ScopedCFTypeRef<CVImageBufferRef>> picture_bindings_;
Is the first parameter a texture id? -> uint32.

Also, how about having texture id, picture id and ScopedCFTypeRef<CVImageBufferRef> in one struct, instead of two maps? Looks like both maps are indexed by picture_id. Would make it easier to manage this. You could also save a lookup in SendPictures.

https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media...
content/common/gpu/media/vt_video_decode_accelerator.h:104: // Unprotected shared state (set up and torn down on GPU thread).
What does "unprotected" mean?
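A sketch of the single-struct bookkeeping suggested above, replacing the two maps keyed by picture id; the type and field names are illustrative, not from the CL.

// One record per assigned picture buffer, indexed by picture id.
struct PictureInfo {
  uint32_t texture_id;
  // Non-null while the picture is bound to a decoded image; cleared (and the
  // image released) when the picture is reused.
  base::ScopedCFTypeRef<CVImageBufferRef> image;
};

std::map<int32_t, PictureInfo> pictures_;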
https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.cc (right): https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:73: client_ = client; On 2014/07/18 01:22:12, Pawel Osciak wrote: > Sanity check: why is it safe to use client_ as a non-weakptr? client becomes > invalid after Destroy() is called. Destroy() calls "delete this" In other words, this object doesn't exist anymore after Destroy().
https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.cc (right): https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:73: client_ = client; On 2014/07/18 01:26:17, scherkus wrote: > On 2014/07/18 01:22:12, Pawel Osciak wrote: > > Sanity check: why is it safe to use client_ as a non-weakptr? client becomes > > invalid after Destroy() is called. > > Destroy() calls "delete this" > > In other words, this object doesn't exist anymore after Destroy(). The danger is if any of the methods running on the decoder_thread_ would be posting to GPU ChildThread directly using client_. Then it wouldn't help. Can't see a scenario like that directly here, but I'm asking this to make sure the above is understood and has been thought through/verified.
https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.cc (right): https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:73: client_ = client; On 2014/07/18 01:45:05, Pawel Osciak wrote: > On 2014/07/18 01:26:17, scherkus wrote: > > On 2014/07/18 01:22:12, Pawel Osciak wrote: > > > Sanity check: why is it safe to use client_ as a non-weakptr? client becomes > > > invalid after Destroy() is called. > > > > Destroy() calls "delete this" > > > > In other words, this object doesn't exist anymore after Destroy(). > > The danger is if any of the methods running on the decoder_thread_ would be > posting to GPU ChildThread directly using client_. Then it wouldn't help. > > Can't see a scenario like that directly here, but I'm asking this to make sure > the above is understood and has been thought through/verified. Yep! I recommended routing everything through methods on VTVDA on the GPU thread. Posting to other threads using raw pointers is too tricky / error-prone and I don't feel we need to have duplicate WeakPtrFactories when one should be good enough for the class lifetime as a whole.
On 2014/07/18 01:47:48, scherkus wrote: > https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... > File content/common/gpu/media/vt_video_decode_accelerator.cc (right): > > https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... > content/common/gpu/media/vt_video_decode_accelerator.cc:73: client_ = client; > On 2014/07/18 01:45:05, Pawel Osciak wrote: > > On 2014/07/18 01:26:17, scherkus wrote: > > > On 2014/07/18 01:22:12, Pawel Osciak wrote: > > > > Sanity check: why is it safe to use client_ as a non-weakptr? client > becomes > > > > invalid after Destroy() is called. > > > > > > Destroy() calls "delete this" > > > > > > In other words, this object doesn't exist anymore after Destroy(). > > > > The danger is if any of the methods running on the decoder_thread_ would be > > posting to GPU ChildThread directly using client_. Then it wouldn't help. > > > > Can't see a scenario like that directly here, but I'm asking this to make sure > > the above is understood and has been thought through/verified. > > Yep! I recommended routing everything through methods on VTVDA on the GPU > thread. > > Posting to other threads using raw pointers is too tricky / error-prone and I > don't feel we need to have duplicate WeakPtrFactories when one should be good > enough for the class lifetime as a whole. Agreed. Thanks.
https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.cc (right): https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:167: if (coded_size_ != new_coded_size) { On 2014/07/18 01:22:12, Pawel Osciak wrote: > Since you don't do dismiss/allow resolution change, you should error out here if > !coded_size_.Empty(). > Or will we never get here because session_ will be not-null then? Resolution change will be supported as soon as I figure out how to keep track of the correct SPS/PPS units. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:193: // NALUs are stored with Annex B format in the bitstream buffer (3-byte start On 2014/07/18 01:22:12, Pawel Osciak wrote: > Actually, start codes can be either 3- or 4-byte. Done. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:294: weak_this_factory_.GetWeakPtr(), On 2014/07/18 01:22:12, Pawel Osciak wrote: > Perhaps use one cached weakptr instead of generating one every time? I don't see much difference between copying a WeakPtr and generating a new one. Is there a significant performance difference? https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:307: client_->ProvidePictureBuffers( On 2014/07/18 01:22:12, Pawel Osciak wrote: > Do we want if (client_) ? This is always safe. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:351: glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texture_ids_[picture_id]); On 2014/07/18 01:22:12, Pawel Osciak wrote: > Please use ScopedPictureBinder so that the currently bound texture gets rebound > after you are done, instead of setting it to invalid texture id at the end. Done. This is an area I don't know much about and can't find documentation. Do you know who can tell me if I am doing the Context management correctly? https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:366: client_->NotifyEndOfBitstreamBuffer(frame.bitstream_id); On 2014/07/18 01:22:12, Pawel Osciak wrote: > Could you do this in output task? What happens when frames are out of order, > we'd want to return bitstream buffers faster then their frames are output > probably. Re-ordering of out of order frames is an unsolved problem right now, but if these are decoupled then unlimited frames are sent for decoding. I have been interpreting the API as: NotifyEndOfBitstreamBuffer implies PictureReady will not be called (again) for the bitstream_id. Obviously not what all the VDAs are doing right now, but very similar to PPAPI. This results in up to kMaxInFlightDecodes (4) pending frames. The alternative is to implement our own rate limiting. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.h (right): https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:14: #include "base/memory/scoped_ptr.h" On 2014/07/18 01:22:13, Pawel Osciak wrote: > I don't see any usage of scoped_ptr in this class. 
> There is however scoped_refptr, for which the correct header I believe is > ref_counted.h Done. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:17: #include "base/synchronization/lock.h" On 2014/07/18 01:22:12, Pawel Osciak wrote: > Unused? Done. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:47: virtual void ReusePictureBuffer(int32_t picture_id) OVERRIDE; On 2014/07/18 01:22:13, Pawel Osciak wrote: > Need to include stdint.h for int32_t. Done. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:69: void ConfigureDecoder( On 2014/07/18 01:22:13, Pawel Osciak wrote: > s/ConfigureDecoder/ConfigureDecoderTask/ ? ConfigureDecoder is a utility method that is only called directly. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:77: void SendPictures(); On 2014/07/18 01:22:12, Pawel Osciak wrote: > Task? This one I'm not sure of. It's usually called as a regular method, but it one case it's posted as a task to work around an API limitation. It's not intrinsically a task though. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:84: gfx::Size texture_size_; On 2014/07/18 01:22:12, Pawel Osciak wrote: > Shouldn't this be the same as coded_size_ ? It's nearly the same, but one is owned by the decoder thread and one is owned by the GPU thread. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:86: std::map<int32_t, int32_t> texture_ids_; On 2014/07/18 01:22:12, Pawel Osciak wrote: > texture ids are uint32 Done. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:88: std::queue<int32_t> picture_ids_; On 2014/07/18 01:22:13, Pawel Osciak wrote: > s/picture_ids_/available_picture_ids_/ ? Done. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:92: std::map<int32_t, base::ScopedCFTypeRef<CVImageBufferRef>> picture_bindings_; On 2014/07/18 01:22:12, Pawel Osciak wrote: > Is the first parameter a texture id? -> uint32. > > Also, how about having texture id, picture id and > ScopedCFTypeRef<CVImageBufferRef> in one struct, instead of two maps? Looks like > both maps are indexed by picture_id. Would make it easier to manage this. You > could also save a lookup in SendPictures. It's a picture ID. This is a reasonable idea (it's what I did for DecodedFrames), but I'd like to hold off until the flushing logic is done to be sure it doesn't impact that. I've added a TODO. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.h:104: // Unprotected shared state (set up and torn down on GPU thread). On 2014/07/18 01:22:13, Pawel Osciak wrote: > What does "unprotected" mean? It means there is no lock (there will be lock-protected shared state to handle flushing correctly).
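On the texture-binding point above, a generic RAII binder that restores whatever rectangle texture was previously bound would look roughly like this; it is a sketch, not necessarily the ScopedPictureBinder helper referred to in the review.

// Binds |texture_id| to GL_TEXTURE_RECTANGLE_ARB for the lifetime of the
// object and restores the previous binding on destruction.
class ScopedRectangleTextureBinder {
 public:
  explicit ScopedRectangleTextureBinder(GLuint texture_id) {
    glGetIntegerv(GL_TEXTURE_BINDING_RECTANGLE_ARB, &old_texture_);
    glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texture_id);
  }
  ~ScopedRectangleTextureBinder() {
    glBindTexture(GL_TEXTURE_RECTANGLE_ARB, static_cast<GLuint>(old_texture_));
  }

 private:
  GLint old_texture_;
};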
https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.cc (right): https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:167: if (coded_size_ != new_coded_size) { On 2014/07/18 21:50:14, sandersd wrote: > On 2014/07/18 01:22:12, Pawel Osciak wrote: > > Since you don't do dismiss/allow resolution change, you should error out here > if > > !coded_size_.Empty(). > > Or will we never get here because session_ will be not-null then? > > Resolution change will be supported as soon as I figure out how to keep track of > the correct SPS/PPS units. What exactly do you need to do? H264Parser stores SPSes/PPSes if you keep its state. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:294: weak_this_factory_.GetWeakPtr(), On 2014/07/18 21:50:14, sandersd wrote: > On 2014/07/18 01:22:12, Pawel Osciak wrote: > > Perhaps use one cached weakptr instead of generating one every time? > > I don't see much difference between copying a WeakPtr and generating a new one. > Is there a significant performance difference? It's about correctness. See https://code.google.com/p/chromium/codesearch#chromium/src/base/memory/weak_p.... So if you create the pointer here, then if OutputTask is not run on the same thread as this method, it's a bug. Also, if this method is called on more than one thread, as the comment above it suggests, it's probably also a bug. It suggests you don't test in Debug build. Debug build will DCHECK if this happens. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:307: client_->ProvidePictureBuffers( On 2014/07/18 21:50:14, sandersd wrote: > On 2014/07/18 01:22:12, Pawel Osciak wrote: > > Do we want if (client_) ? > > This is always safe. When I ask questions like this, I usually imply "if yes, please prove why" :) But I guess we just had a conversation with Andrew about this above. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:351: glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texture_ids_[picture_id]); On 2014/07/18 21:50:14, sandersd wrote: > On 2014/07/18 01:22:12, Pawel Osciak wrote: > > Please use ScopedPictureBinder so that the currently bound texture gets > rebound > > after you are done, instead of setting it to invalid texture id at the end. > > Done. > > This is an area I don't know much about and can't find documentation. Do you > know who can tell me if I am doing the Context management correctly? I'm not an expert, but a rule of thumb is to always make your context current before you call the methods that require it. I don't think you need to restore the previous context afterwards, but this may be different on Mac. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:366: client_->NotifyEndOfBitstreamBuffer(frame.bitstream_id); On 2014/07/18 21:50:14, sandersd wrote: > On 2014/07/18 01:22:12, Pawel Osciak wrote: > > Could you do this in output task? What happens when frames are out of order, > > we'd want to return bitstream buffers faster then their frames are output > > probably. 
> > Re-ordering of out of order frames is an unsolved problem right now, but if > these are decoupled then unlimited frames are sent for decoding. > The client will keep the VDA fed with a limited number of input buffers at a time. If reordering happens, you may potentially risk performance issues by not keeping your input pipeline full enough, or starving/freezing yourself in the worst case, depending on the choice of input buffer count by the client. There is no low limit on the input buffers you may be fed with in the API, so if a client decides to give you only a small number, you may just deadlock yourself waiting for inputs. > I have been interpreting the API as: NotifyEndOfBitstreamBuffer implies > PictureReady will not be called (again) for the bitstream_id. No, this is not true. NotifyEndOfBitstreamBuffer means I'm done with the input you gave me, I'm ready for more. There is no relation to PictureReady whatsoever, and there should be no relation between handling of inputs and outputs, they should be separate. I'm not sure what you mean by "again"? PictureReady is called only once for each id... > Obviously not what > all the VDAs are doing right now, but very similar to PPAPI. > This results in up > to kMaxInFlightDecodes (4) pending frames. > > The alternative is to implement our own rate limiting. What do you mean by rate limiting? Limiting the number of inputs in flight? Client will do this for you.
https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.cc (right): https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:167: if (coded_size_ != new_coded_size) { On 2014/07/22 11:19:35, Pawel Osciak wrote: > On 2014/07/18 21:50:14, sandersd wrote: > > On 2014/07/18 01:22:12, Pawel Osciak wrote: > > > Since you don't do dismiss/allow resolution change, you should error out > here > > if > > > !coded_size_.Empty(). > > > Or will we never get here because session_ will be not-null then? > > > > Resolution change will be supported as soon as I figure out how to keep track > of > > the correct SPS/PPS units. > > What exactly do you need to do? H264Parser stores SPSes/PPSes if you keep its > state. I need to know which SPS/SPSExt/PPS records to bundle up into the avcC atom. I suspect it's as simple as the most recent ones, but I have not read up on it yet (and my original code that just used the ones from the most recent frame that had them broke when only one of SPS/PPS was present). https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:294: weak_this_factory_.GetWeakPtr(), On 2014/07/22 11:19:35, Pawel Osciak wrote: > On 2014/07/18 21:50:14, sandersd wrote: > > On 2014/07/18 01:22:12, Pawel Osciak wrote: > > > Perhaps use one cached weakptr instead of generating one every time? > > > > I don't see much difference between copying a WeakPtr and generating a new > one. > > Is there a significant performance difference? > > It's about correctness. See > https://code.google.com/p/chromium/codesearch#chromium/src/base/memory/weak_p.... > > So if you create the pointer here, then if OutputTask is not run on the same > thread as this method, it's a bug. Also, if this method is called on more than > one thread, as the comment above it suggests, it's probably also a bug. > > It suggests you don't test in Debug build. Debug build will DCHECK if this > happens. The pointers are not bound to the thread they are created on, they are bound to the thread they are dereferenced on (always the GPU thread). https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:307: client_->ProvidePictureBuffers( On 2014/07/22 11:19:35, Pawel Osciak wrote: > On 2014/07/18 21:50:14, sandersd wrote: > > On 2014/07/18 01:22:12, Pawel Osciak wrote: > > > Do we want if (client_) ? > > > > This is always safe. > > When I ask questions like this, I usually imply "if yes, please prove why" :) > But I guess we just had a conversation with Andrew about this above. Acknowledged. https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:351: glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texture_ids_[picture_id]); On 2014/07/22 11:19:35, Pawel Osciak wrote: > On 2014/07/18 21:50:14, sandersd wrote: > > On 2014/07/18 01:22:12, Pawel Osciak wrote: > > > Please use ScopedPictureBinder so that the currently bound texture gets > > rebound > > > after you are done, instead of setting it to invalid texture id at the end. > > > > Done. > > > > This is an area I don't know much about and can't find documentation. Do you > > know who can tell me if I am doing the Context management correctly? 
> > I'm not an expert, but a rule of thumb is to always make your context current > before you call the methods that require it. I don't think you need to restore > the previous context afterwards, but this may be different on Mac. Makes sense. It crashes if I don't restore though, so I'll leave that alone. (Strangely it works fine I don't make the context current in the first place...) https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:366: client_->NotifyEndOfBitstreamBuffer(frame.bitstream_id); On 2014/07/22 11:19:35, Pawel Osciak wrote: > On 2014/07/18 21:50:14, sandersd wrote: > > On 2014/07/18 01:22:12, Pawel Osciak wrote: > > > Could you do this in output task? What happens when frames are out of order, > > > we'd want to return bitstream buffers faster then their frames are output > > > probably. > > > > Re-ordering of out of order frames is an unsolved problem right now, but if > > these are decoupled then unlimited frames are sent for decoding. > > > > The client will keep the VDA fed with a limited number of input buffers at a > time. If reordering happens, you may potentially risk performance issues by not > keeping your input pipeline full enough, or starving/freezing yourself in the > worst case, depending on the choice of input buffer count by the client. > > There is no low limit on the input buffers you may be fed with in the API, so if > a client decides to give you only a small number, you may just deadlock yourself > waiting for inputs. > > > I have been interpreting the API as: NotifyEndOfBitstreamBuffer implies > > PictureReady will not be called (again) for the bitstream_id. > > No, this is not true. NotifyEndOfBitstreamBuffer means I'm done with the input > you gave me, I'm ready for more. There is no relation to PictureReady > whatsoever, and there should be no relation between handling of inputs and > outputs, they should be separate. > > I'm not sure what you mean by "again"? PictureReady is called only once for each > id... > > > Obviously not what > > all the VDAs are doing right now, but very similar to PPAPI. > > This results in up > > to kMaxInFlightDecodes (4) pending frames. > > > > The alternative is to implement our own rate limiting. > > What do you mean by rate limiting? Limiting the number of inputs in flight? > Client will do this for you. > "Again" was parenthetical because it does not apply in this case, but the last iteration of the PPAPI I saw did not have that restriction. Client does implement rate limiting, but it does it by tracking the number of bitstream buffers that have been sent for decoding that it has not been notified are done with via. NotifyEndOfBitstreamBuffer. You can see here that the |done_cb| is called as part of that callback: https://code.google.com/p/chromium/codesearch#chromium/src/media/filters/gpu_... When I experimented with this, ACKing sooner resulted in extreme system load while it tried to demux and decode at an unbounded rate.
https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.cc (right): https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:366: client_->NotifyEndOfBitstreamBuffer(frame.bitstream_id); On 2014/07/22 19:00:23, sandersd wrote: > On 2014/07/22 11:19:35, Pawel Osciak wrote: > > On 2014/07/18 21:50:14, sandersd wrote: > > > On 2014/07/18 01:22:12, Pawel Osciak wrote: > > > > Could you do this in output task? What happens when frames are out of > order, > > > > we'd want to return bitstream buffers faster then their frames are output > > > > probably. > > > > > > Re-ordering of out of order frames is an unsolved problem right now, but if > > > these are decoupled then unlimited frames are sent for decoding. > > > > > > > The client will keep the VDA fed with a limited number of input buffers at a > > time. If reordering happens, you may potentially risk performance issues by > not > > keeping your input pipeline full enough, or starving/freezing yourself in the > > worst case, depending on the choice of input buffer count by the client. > > > > There is no low limit on the input buffers you may be fed with in the API, so > if > > a client decides to give you only a small number, you may just deadlock > yourself > > waiting for inputs. > > > > > I have been interpreting the API as: NotifyEndOfBitstreamBuffer implies > > > PictureReady will not be called (again) for the bitstream_id. > > > > No, this is not true. NotifyEndOfBitstreamBuffer means I'm done with the input > > you gave me, I'm ready for more. There is no relation to PictureReady > > whatsoever, and there should be no relation between handling of inputs and > > outputs, they should be separate. > > > > I'm not sure what you mean by "again"? PictureReady is called only once for > each > > id... > > > > > Obviously not what > > > all the VDAs are doing right now, but very similar to PPAPI. > > > This results in up > > > to kMaxInFlightDecodes (4) pending frames. > > > > > > The alternative is to implement our own rate limiting. > > > > What do you mean by rate limiting? Limiting the number of inputs in flight? > > Client will do this for you. > > > > "Again" was parenthetical because it does not apply in this case, but the last > iteration of the PPAPI I saw did not have that restriction. > > Client does implement rate limiting, but it does it by tracking the number of > bitstream buffers that have been sent for decoding that it has not been notified > are done with via. NotifyEndOfBitstreamBuffer. > > You can see here that the |done_cb| is called as part of that callback: > https://code.google.com/p/chromium/codesearch#chromium/src/media/filters/gpu_... > > When I experimented with this, ACKing sooner resulted in extreme system load > while it tried to demux and decode at an unbounded rate. The key here is that rate limiting is done on the outputs by the client. The client will display and return PictureBuffers according to the timestamps in the stream. Rate-limiting inputs on buffer IDs in output order may result in problems I mentioned above, because the output/display order may not be the same as decode order. 
Moreover, if the stream does not provide max_num_reorder_frames syntax element and you can't infer that from H.264 profile (https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gpu...), the decoder has to fill the whole DPB before it can output (sliding window over DPB contents), because it doesn't know if a negative POC will be given to it a bit later on. This means it will have to decode at least DPB size of frames before it can output a frame. If the client allows less than DPB size input buffers in flight, you will probably deadlock. Usually videos on YT etc. include max_num_reorder_frames syntax element, but this is not a requirement of the standard. In your case you should simply just limit yourself on the outputs, but not return inputs in output order. I would suggest just holding off decode if no output buffers are available.
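(For reference, the DPB sizing mentioned above reduces to MaxDpbFrames = min(MaxDpbMbs / (pic_width_in_mbs * frame_height_in_mbs), 16), with MaxDpbMbs taken from Table A-1 of the H.264 spec for the stream's level. A minimal sketch of that calculation — not the linked h264_decoder.cc code, and the level table is abbreviated:)

#include <algorithm>

// Sketch: worst-case DPB size in frames, given level_idc and picture
// dimensions in macroblocks (both taken from a parsed SPS).
int EstimateMaxDpbFrames(int level_idc, int pic_width_in_mbs,
                         int frame_height_in_mbs) {
  // Abbreviated subset of Table A-1 (MaxDpbMbs per level).
  int max_dpb_mbs;
  switch (level_idc) {
    case 30: max_dpb_mbs = 8100;   break;  // Level 3.0
    case 31: max_dpb_mbs = 18000;  break;  // Level 3.1
    case 40:                               // Level 4.0
    case 41: max_dpb_mbs = 32768;  break;  // Level 4.1
    case 50: max_dpb_mbs = 110400; break;  // Level 5.0
    default: max_dpb_mbs = 184320; break;  // Level 5.1 / unknown: worst case.
  }
  int frames = max_dpb_mbs / (pic_width_in_mbs * frame_height_in_mbs);
  return std::min(frames, 16);  // The DPB never exceeds 16 frames.
}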
Now it's my fault that there was a long delay, it took me a while to think about this one ;-) https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... File content/common/gpu/media/vt_video_decode_accelerator.cc (right): https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... content/common/gpu/media/vt_video_decode_accelerator.cc:366: client_->NotifyEndOfBitstreamBuffer(frame.bitstream_id); On 2014/07/23 04:14:23, Pawel Osciak wrote: > The key here is that rate limiting is done on the outputs by the client. The > client will display and return PictureBuffers according to the timestamps in the > stream. Currently not true (it's still in delivery order), but it certainly should be in the future. > Rate-limiting inputs on buffer IDs in output order may result in problems I > mentioned above, because the output/display order may not be the same as decode > order. Moreover, if the stream does not provide max_num_reorder_frames syntax > element and you can't infer that from H.264 profile Quite true, but solving this requires that we know the display order to make sure we don't bind later frames first and run out of buffers. VideoToolbox does not provide that information, so we would have to parse it from the slice headers ourselves. That's not a bad idea long-term, but it's rather over-complicated for this CL. A medium-term solution would be to request kDPBMaxSize buffers, and then dismiss buffers as soon as we can map them to a ready picture, even if they are not decoded yet (as I believe you were describing). Essentially it would mean always using the worst-case amount of memory. All of this is unnecessary for the current CL though, as we don't do any reordering anywhere in the pipeline.
On 2014/07/23 18:03:39, sandersd wrote: > Now it's my fault that there was a long delay, it took me a while to think about > this one ;-) > > https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... > File content/common/gpu/media/vt_video_decode_accelerator.cc (right): > > https://codereview.chromium.org/397883002/diff/80001/content/common/gpu/media... > content/common/gpu/media/vt_video_decode_accelerator.cc:366: > client_->NotifyEndOfBitstreamBuffer(frame.bitstream_id); > On 2014/07/23 04:14:23, Pawel Osciak wrote: > > The key here is that rate limiting is done on the outputs by the client. The > > client will display and return PictureBuffers according to the timestamps in > the > > stream. > > Currently not true (it's still in delivery order), but it certainly should be in > the future. I meant rate limiting, not the order. Otherwise you'd have playback at crazy (or too slow) fps and not be synchronized with audio. See https://code.google.com/p/chromium/codesearch#chromium/src/media/filters/vide.... What do you mean about the future? Is there some change planned here? > > Rate-limiting inputs on buffer IDs in output order may result in problems I > > mentioned above, because the output/display order may not be the same as > decode > > order. Moreover, if the stream does not provide max_num_reorder_frames syntax > > element and you can't infer that from H.264 profile > > Quite true, but solving this requires that we know the display order to make > sure we don't bind later frames first and run out of buffers. VideoToolbox does > not provide that information, so we would have to parse it from the slice > headers ourselves. That's not a bad idea long-term, but it's rather > over-complicated for this CL. Right, but doesn't VT call your callbacks in output order? And you bind on output, so it should be fine... > A medium-term solution would be to request kDPBMaxSize buffers, and then dismiss > buffers as soon as we can map them to a ready picture, even if they are not > decoded yet (as I believe you were describing). Essentially it would mean always > using the worst-case amount of memory. Correct. So VT doesn't provide any information on required number of output buffers? I'm having trouble Googling any API reference for VT. Is this zero-copy and you share the memory when you bind the output image buffer to the texture? Also, it's a simple calculation (https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gpu...), and you parse SPS already anyway, so you could just use the calculation above at that time. > All of this is unnecessary for the current CL though, as we don't do any > reordering anywhere in the pipeline. Hm? VT should be doing reordering for you if the stream requires it...
> > > The key here is that rate limiting is done on the outputs by the client. The > > > client will display and return PictureBuffers according to the timestamps in > > > the stream. > > > > Currently not true (it's still in delivery order), but it certainly should be > > in the future. > > I meant rate limiting, not the order. Otherwise you'd have playback at crazy (or > too slow) fps and not be synchronized with audio. See > https://code.google.com/p/chromium/codesearch#chromium/src/media/filters/vide.... > > What do you mean about the future? Is there some change planned here? Understood. Andrew is working on changes to allow re-ordering in the client. > > > Rate-limiting inputs on buffer IDs in output order may result in problems I > > > mentioned above, because the output/display order may not be the same as > > > decode order. Moreover, if the stream does not provide max_num_reorder_frames > > > syntax element and you can't infer that from H.264 profile > > > > Quite true, but solving this requires that we know the display order to make > > sure we don't bind later frames first and run out of buffers. VideoToolbox > > does not provide that information, so we would have to parse it from the slice > > headers ourselves. That's not a bad idea long-term, but it's rather > > over-complicated for this CL. > > Right, but doesn't VT call your callbacks in output order? And you bind on > output, so it should be fine... It does not. > > A medium-term solution would be to request kDPBMaxSize buffers, and then > > dismiss buffers as soon as we can map them to a ready picture, even if they > > are not decoded yet (as I believe you were describing). Essentially it would > > mean always using the worst-case amount of memory. > > Correct. So VT doesn't provide any information on required number of output > buffers? I'm having trouble Googling any API reference for VT. > Is this zero-copy and you share the memory when you bind the output image buffer > to the texture? Indeed, the decoded frames are allocated by VideoToolbox and are bound to the textures without copying. > Also, it's a simple calculation > (https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gpu...), > and you parse SPS already anyway, so you could just use the calculation above at > that time. Right now there is no SPS parsing, they are only located and copied. > > All of this is unnecessary for the current CL though, as we don't do any > > reordering anywhere in the pipeline. > > Hm? VT should be doing reordering for you if the stream requires it... There is a flag to "allow" the reordering, but it isn't a required feature and it doesn't have an effect with the hardware I tested on.
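(To illustrate the zero-copy path described above: decoded CVImageBuffers from VideoToolbox are IOSurface-backed, so they can be attached to a rectangle texture without copying via CGLTexImageIOSurface2D. This is a sketch only — it assumes the '2vuy' pixel format mentioned earlier in the review, and the function name and error handling are not the CL's.)

#include <CoreVideo/CoreVideo.h>
#include <OpenGL/CGLIOSurface.h>
#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

// Sketch: bind an IOSurface-backed image buffer to a GL rectangle texture.
bool BindImageToTexture(CGLContextObj context,
                        CVImageBufferRef image,
                        GLuint texture) {
  IOSurfaceRef surface = CVPixelBufferGetIOSurface(image);
  if (!surface)
    return false;
  glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texture);
  CGLError status = CGLTexImageIOSurface2D(
      context,
      GL_TEXTURE_RECTANGLE_ARB,
      GL_RGB,                                              // internal format
      static_cast<GLsizei>(CVPixelBufferGetWidth(image)),
      static_cast<GLsizei>(CVPixelBufferGetHeight(image)),
      GL_YCBCR_422_APPLE,                                  // format for '2vuy'
      GL_UNSIGNED_SHORT_8_8_APPLE,                         // type for '2vuy'
      surface,
      0);                                                  // plane
  return status == kCGLNoError;
}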
Ok, maybe I'm crazy, but I am sure, I clearly remember I did respond to this same day you posted these comments. I have no idea where my comments have gone... I guess I have to do it again. Sorry. On 2014/07/24 02:37:17, sandersd wrote: > > > > The key here is that rate limiting is done on the outputs by the client. > The > > > > client will display and return PictureBuffers according to the timestamps > in > > > > the stream. > > > > > > Currently not true (it's still in delivery order), but it certainly should > be > > > in the future. > > > > I meant rate limiting, not the order. Otherwise you'd have playback at crazy > (or > > too slow) fps and not be synchronized with audio. See > > > https://code.google.com/p/chromium/codesearch#chromium/src/media/filters/vide.... > > > > What do you mean about the future? Is there some change planned here? > > Understood. > > Andrew is working on changes to allow re-ordering in the client. There should be no need for that. The VDA API requires the VDA impl to reorder. Is there a bug for this I can discuss this with Andrew? > > > > Rate-limiting inputs on buffer IDs in output order may result in problems > I > > > > mentioned above, because the output/display order may not be the same as > > > > decode order. Moreover, if the stream does not provide > max_num_reorder_frames > > > > syntax element and you can't infer that from H.264 profile > > > > > > Quite true, but solving this requires that we know the display order to make > > > sure we don't bind later frames first and run out of buffers. VideoToolbox > > > does not provide that information, so we would have to parse it from the > slice > > > headers ourselves. That's not a bad idea long-term, but it's rather > > > over-complicated for this CL. > > > > Right, but doesn't VT call your callbacks in output order? And you bind on > > output, so it should be fine... > > It does not. If it doesn't, then it's bad. PictureReady is supposed to be called in the output order. Existing VDAs work this way, and really only the VDA should be able to tell the display order from the stream. > > > A medium-term solution would be to request kDPBMaxSize buffers, and then > > > dismiss buffers as soon as we can map them to a ready picture, even if they > > > are not decoded yet (as I believe you were describing). Essentially it would > > > mean always using the worst-case amount of memory. > > > > Correct. So VT doesn't provide any information on required number of output > > buffers? I'm having trouble Googling any API reference for VT. > > Is this zero-copy and you share the memory when you bind the output image > buffer > > to the texture? > > Indeed, the decoded frames are allocated by VideoToolbox and are bound to the > textures without copying. > > > Also, it's a simple calculation > > > (https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gpu...), > > and you parse SPS already anyway, so you could just use the calculation above > at > > that time. > > Right now there is no SPS parsing, they are only located and copied. > It's just a matter of calling ParseSPS() after AdvanceToNextNALU() and then using the formula above. Done :) > > > All of this is unnecessary for the current CL though, as we don't do any > > > reordering anywhere in the pipeline. > > > > Hm? VT should be doing reordering for you if the stream requires it... > > There is a flag to "allow" the reordering, but it isn't a required feature and > it doesn't have an effect with the hardware I tested on. 
How can it "allow" the ordering or not? Reordering is a property of the stream. I'm assuming you mean B-frames by reordering. Could you point me to the doc for this? You can't really say B-frames are not a required feature...
On 2014/07/29 07:32:53, Pawel Osciak wrote: > Ok, maybe I'm crazy, but I am sure, I clearly remember I did respond to this > same day you posted these comments. I have no idea where my comments have > gone... I guess I have to do it again. Sorry. > > On 2014/07/24 02:37:17, sandersd wrote: > > > > > The key here is that rate limiting is done on the outputs by the client. > > The > > > > > client will display and return PictureBuffers according to the > timestamps > > in > > > > > the stream. > > > > > > > > Currently not true (it's still in delivery order), but it certainly should > > be > > > > in the future. > > > > > > I meant rate limiting, not the order. Otherwise you'd have playback at crazy > > (or > > > too slow) fps and not be synchronized with audio. See > > > > > > https://code.google.com/p/chromium/codesearch#chromium/src/media/filters/vide.... > > > > > > What do you mean about the future? Is there some change planned here? > > > > Understood. > > > > Andrew is working on changes to allow re-ordering in the client. > > There should be no need for that. The VDA API requires the VDA impl to reorder. > Is there a bug for this I can discuss this with Andrew? http://crbug.com/110814 is the bug tracking implementing frame scheduling > > > > > Rate-limiting inputs on buffer IDs in output order may result in > problems > > I > > > > > mentioned above, because the output/display order may not be the same as > > > > > decode order. Moreover, if the stream does not provide > > max_num_reorder_frames > > > > > syntax element and you can't infer that from H.264 profile > > > > > > > > Quite true, but solving this requires that we know the display order to > make > > > > sure we don't bind later frames first and run out of buffers. VideoToolbox > > > > does not provide that information, so we would have to parse it from the > > slice > > > > headers ourselves. That's not a bad idea long-term, but it's rather > > > > over-complicated for this CL. > > > > > > Right, but doesn't VT call your callbacks in output order? And you bind on > > > output, so it should be fine... > > > > It does not. > > If it doesn't, then it's bad. PictureReady is supposed to be called in the > output order. Existing VDAs work this way, and really only the VDA should be > able to tell the display order from the stream. > > > > > A medium-term solution would be to request kDPBMaxSize buffers, and then > > > > dismiss buffers as soon as we can map them to a ready picture, even if > they > > > > are not decoded yet (as I believe you were describing). Essentially it > would > > > > mean always using the worst-case amount of memory. > > > > > > Correct. So VT doesn't provide any information on required number of output > > > buffers? I'm having trouble Googling any API reference for VT. > > > Is this zero-copy and you share the memory when you bind the output image > > buffer > > > to the texture? > > > > Indeed, the decoded frames are allocated by VideoToolbox and are bound to the > > textures without copying. > > > > > Also, it's a simple calculation > > > > > > (https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gpu...), > > > and you parse SPS already anyway, so you could just use the calculation > above > > at > > > that time. > > > > Right now there is no SPS parsing, they are only located and copied. > > > > It's just a matter of calling ParseSPS() after AdvanceToNextNALU() and then > using the formula above. 
Done :) > > > > > All of this is unnecessary for the current CL though, as we don't do any > > > > reordering anywhere in the pipeline. > > > > > > Hm? VT should be doing reordering for you if the stream requires it... > > > > There is a flag to "allow" the reordering, but it isn't a required feature and > > it doesn't have an effect with the hardware I tested on. > > How can it "allow" the ordering or not? Reordering is a property of the stream. > I'm assuming you mean mean B-frames by reordering. Could you point me to the doc > for this? > You can't really say B-frames are not a required feature...
> There should be no need for that. The VDA API requires the VDA impl to reorder. > Is there a bug for this I can discuss this with Andrew? > > If it doesn't, then it's bad. PictureReady is supposed to be called in the > output order. Existing VDAs work this way, and really only the VDA should be > able to tell the display order from the stream. I don't recall seeing any comments in the API that require this. While it's certainly a desirable behavior, I doubt very much that the current code guarantees this, as PostTask() is used as the primary means of synchronization. And it's not true that only the VDA knows this, as it's GPUVideoDecoder that's adding the timestamps back on. Again though, this is not an issue I think needs to be solved in this CL. > It's just a matter of calling ParseSPS() after AdvanceToNextNALU() and then > using the formula above. Done :) SGTM, but I think the state machine required to do this correctly is a little more involved than you suggest ;-) > How can it "allow" the ordering or not? Reordering is a property of the stream. > I'm assuming you mean mean B-frames by reordering. Could you point me to the doc > for this? > You can't really say B-frames are not a required feature... Reordering is also a property of the decoder. A decoder can minimize its memory usage by delivering frames in stream order, which works as long as they can be processed in that order. The VT API does not require the decoder to emit frames in output order. The closest thing to documentation that exists is in the header file: @param decodeFlags A bitfield of directives to the decompression session and decoder. The kVTDecodeFrame_EnableAsynchronousDecompression bit indicates whether the video decoder may decompress the frame asynchronously. The kVTDecodeFrame_EnableTemporalProcessing bit indicates whether the decoder may delay calls to the output callback so as to enable processing in temporal (display) order. If both flags are clear, the decompression shall complete and your output callback function will be called before VTDecompressionSessionDecodeFrame returns. If either flag is set, VTDecompressionSessionDecodeFrame may return before the output callback function is called. @constant kVTDecodeFrame_EnableTemporalProcessing With the kVTDecodeFrame_EnableTemporalProcessing bit clear, the video decoder should emit every frame once that frame's decoding is done -- frames may not be delayed indefinitely. With the bit set, it is legal for the decoder to delay frames indefinitely -- at least until VTDecompressionSessionFinishDelayedFrames or VTDecompressionSessionInvalidate is called.
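(For context, the flags being discussed are bits of VTDecodeFrameFlags passed to VTDecompressionSessionDecodeFrame. A sketch of a decode call with both bits set — the helper name is illustrative, and per the header text quoted above the temporal-processing bit is only a hint:)

#include <VideoToolbox/VideoToolbox.h>

// Sketch: submit one sample with asynchronous decompression and temporal
// (display-order) processing requested.
OSStatus DecodeSample(VTDecompressionSessionRef session,
                      CMSampleBufferRef sample,
                      void* source_frame_refcon) {
  VTDecodeFrameFlags decode_flags =
      kVTDecodeFrame_EnableAsynchronousDecompression |  // 1 << 0
      kVTDecodeFrame_EnableTemporalProcessing;          // 1 << 3
  VTDecodeInfoFlags info_flags = 0;
  return VTDecompressionSessionDecodeFrame(
      session, sample, decode_flags, source_frame_refcon, &info_flags);
}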
https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... File content/common/gpu/media/vt_video_decode_accelerator.cc (right):

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:41: intptr_t bitstream_id = reinterpret_cast<intptr_t>(source_frame_refcon); Would it work to cast to int32_t directly?

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:166: // If the size has changed, trigger a request for new picture buffers. Nit: I'd CHECK(coded_size_.empty()) in case you get a resolution change mid-stream, but up to you. Not a finished product anyway :)

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:210: CHECK_EQ(result, media::H264Parser::kOk); In general this means that any bad stream provided by the user/from the wild can crash the GPU process. We have to handle it as an error properly, so at least please add a TODO for this.

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:227: ConfigureDecoder(config_nalu_data_ptrs, config_nalu_data_sizes); What if there were no SPS and/or PPS in this bitstream buffer?

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:244: for (size_t i = 0; i < nalus.size(); i++) { Do we require at least one NALU to be there? Maybe we need to check for that if so...

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:246: uint8_t header[4] = {0xff & nalu.size >> 24, Please base::HostToNet32(checked_cast<uint32_t>(nalu.size)).

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:250: CHECK(!CMBlockBufferReplaceDataBytes(header, data, offset, 4)); s/4/kNALUHeaderLength/

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:251: offset += 4; ditto

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:272: VTDecodeFrameFlags decode_flags = Please comment what this does.

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:292: CFRetain(image_buffer); TODO for later, I guess if we get killed before this reaches OutputTask or we early return somewhere on the way, or before it is returned via ReusePictureBuffer, we'll leak image_buffer? If so, we need a TODO to keep track of those... And also clean up the lists of decoded frames properly on destruction of this object? By the way, did we just lose CFRelease(image_buffer) from the code? How is it released once it's no longer used? Does CVImageBufferRef work as a scoper? But then we wouldn't use CFRetain...?

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:305: void VTVideoDecodeAccelerator::SizeChangedTask(gfx::Size coded_size) { DCHECK(CalledOnValidThread()); ?

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g...
content/common/gpu/media/vt_video_decode_accelerator.cc:307: // TODO(sandersd): Dismiss existing picture buffers. And todo hold the decode thread while we replace pictures. https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:317: available_picture_ids_.push(pictures[i].id()); check these are empty first just in case maybe... https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:321: // Pictures are not marked as uncleared until this method returns. They will I don't understand this. How are they marked by the act of returning from this method? https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:341: CGLContextObj prev_context = CGLGetCurrentContext(); Looks like we have a gfx::ScopedCGLSetCurrentContext, could we use it? https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... File content/common/gpu/media/vt_video_decode_accelerator.h (right): https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.h:60: DecodedFrame(uint32_t bitstream_id, CVImageBufferRef image_buffer); s/uint32_t/int32_t/ https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.h:95: // Image buffers retained while they are bound to pictures. Nit: I'd explain this is used to keep a ref to CVImageBuffer while at display. Right?
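(The header[4] comment above, on line 246 of the patch, is about replacing Annex B start codes with AVCC 4-byte big-endian length prefixes. A sketch of the suggested base::HostToNet32 form, assuming the |data| CMBlockBuffer and |offset| bookkeeping from the patch; the helper name is illustrative:)

#include <stdint.h>

#include <CoreMedia/CoreMedia.h>

#include "base/numerics/safe_conversions.h"
#include "base/sys_byteorder.h"

const size_t kNALUHeaderLength = 4;

// Sketch: write one NALU's AVCC length prefix, then its payload, into |data|
// at |offset|. CMBlockBufferReplaceDataBytes() returns 0 on success.
bool WriteAVCCNALU(CMBlockBufferRef data,
                   size_t* offset,
                   const uint8_t* nalu_data,
                   size_t nalu_size) {
  uint32_t header = base::HostToNet32(base::checked_cast<uint32_t>(nalu_size));
  if (CMBlockBufferReplaceDataBytes(&header, data, *offset, kNALUHeaderLength))
    return false;
  *offset += kNALUHeaderLength;
  if (CMBlockBufferReplaceDataBytes(nalu_data, data, *offset, nalu_size))
    return false;
  *offset += nalu_size;
  return true;
}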
https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... File content/common/gpu/media/vt_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:41: intptr_t bitstream_id = reinterpret_cast<intptr_t>(source_frame_refcon); On 2014/08/02 00:28:14, Pawel Osciak wrote: > Would it work to cast to int32_t directly? Done. https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:166: // If the size has changed, trigger a request for new picture buffers. On 2014/08/02 00:28:14, Pawel Osciak wrote: > Nit: I'd CHECK(coded_size_.empty()) in case you get a resolution change > mid-stream, but up to you. Not a finished product anyway :) This should be the correct behavior for a mid-stream size change (except for flushing which is not implemented yet). https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:210: CHECK_EQ(result, media::H264Parser::kOk); On 2014/08/02 00:28:14, Pawel Osciak wrote: > In general this means that any bad stream provided by the user/from the wild can > crash GPU process. We have to handle it as an error properly, so at least please > add a TODO for this. Done. https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:227: ConfigureDecoder(config_nalu_data_ptrs, config_nalu_data_sizes); On 2014/08/02 00:28:14, Pawel Osciak wrote: > What if there were no SPS and/or PPS in this bitstream buffer? This code is knowingly wrong in that case. The logic will be corrected at the same time as parameter set changes. https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:244: for (size_t i = 0; i < nalus.size(); i++) { On 2014/08/02 00:28:14, Pawel Osciak wrote: > Do we require at least one NALU to be there? Maybe we need to check for that if > so... If we don't get any NALUs, it means something has gone wrong at a lower layer already. I'm not sure what VT will do if we give it 0 NALUs, but either failing or not failing is fine. https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:246: uint8_t header[4] = {0xff & nalu.size >> 24, On 2014/08/02 00:28:14, Pawel Osciak wrote: > Please base::HostToNet32(checked_cast<uint32_t>(nalu.size)). I have a deep personal aversion to byte swapping, but we'll try it your way ;-) Done. https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:250: CHECK(!CMBlockBufferReplaceDataBytes(header, data, offset, 4)); On 2014/08/02 00:28:14, Pawel Osciak wrote: > s/4/kNALUHeaderLength/ Done. https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:251: offset += 4; On 2014/08/02 00:28:14, Pawel Osciak wrote: > ditto Done. https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:272: VTDecodeFrameFlags decode_flags = On 2014/08/02 00:28:14, Pawel Osciak wrote: > Please comment what this does. Done. https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... 
content/common/gpu/media/vt_video_decode_accelerator.cc:292: CFRetain(image_buffer); On 2014/08/02 00:28:14, Pawel Osciak wrote: > TODO for later, I guess if we get killed before this is reaches OutputTask or we > early return somewhere on the way, or before it is returned via > ReusePictureBuffer, we'll leak image_buffer? If so, we need a TODO to keep track > of those... And also clean up the lists of decoded frames properly on > destruction of this object? > > By the way, did we just lose CFRelease(image_buffer) from the code? How it it > released once it's no longer used? Does CVImageBufferRef work as scoper? But > then we wouldn't use CFRetain...? This is the standard mode for CoreFramework APIs; they pass the object and then decrement the refcount when the callback returns. What's going on here is that the ScopedCFTypeRef in DecodedFrame is operating in the default mode of ASSUME, which means that it assumes ownership without increasing the refcount. The image will be free'd in the case that the callback is discarded, when that ScopedCFTypeRef is deleted. https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:305: void VTVideoDecodeAccelerator::SizeChangedTask(gfx::Size coded_size) { On 2014/08/02 00:28:14, Pawel Osciak wrote: > DCHECK(CalledOnValidThread()); ? Done. https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:307: // TODO(sandersd): Dismiss existing picture buffers. On 2014/08/02 00:28:14, Pawel Osciak wrote: > And todo hold the decode thread while we replace pictures. This one is going to be a rather tricky dance, but I'm not certain that SizeChangedTask() will be involved in it directly. The core functionality will be a condition variable that blocks inside ConfigureDecoder() until all pending images have been sent by SendPictures(). https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:317: available_picture_ids_.push(pictures[i].id()); On 2014/08/02 00:28:14, Pawel Osciak wrote: > check these are empty first just in case maybe... Done. https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:321: // Pictures are not marked as uncleared until this method returns. They will On 2014/08/02 00:28:14, Pawel Osciak wrote: > I don't understand this. How are they marked by the act of returning from this > method? https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gpu... https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:341: CGLContextObj prev_context = CGLGetCurrentContext(); On 2014/08/02 00:28:14, Pawel Osciak wrote: > Looks like we have a gfx::ScopedCGLSetCurrentContext, could we use it? Done.
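(To spell out the refcounting discussed above: base::ScopedCFTypeRef's default ASSUME policy takes over an existing reference without retaining again, while RETAIN adds a reference of its own. A sketch of both patterns around the VT output callback — the function names are illustrative; only the DecodedFrame behavior it alludes to is from the CL:)

#include <CoreVideo/CoreVideo.h>

#include "base/mac/scoped_cftyperef.h"

// Inside the output callback the decoder still owns |image_buffer| and will
// drop its reference when the callback returns, so take a reference first
// and let the scoper ASSUME ownership of that new reference.
void OnOutput(CVImageBufferRef image_buffer) {
  CFRetain(image_buffer);
  base::ScopedCFTypeRef<CVImageBufferRef> image(image_buffer);  // ASSUME.
  // ... move |image| into a DecodedFrame; if the frame is discarded before
  // SendPictures(), the scoper releases the image automatically.
}

// Equivalent alternative: let the scoper do the retain itself.
void OnOutputAlt(CVImageBufferRef image_buffer) {
  base::ScopedCFTypeRef<CVImageBufferRef> image(image_buffer,
                                                base::scoped_policy::RETAIN);
  // ...
}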
lgtm! Thank you for your patience :) https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... File content/common/gpu/media/vt_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:166: // If the size has changed, trigger a request for new picture buffers. On 2014/08/02 01:43:13, sandersd wrote: > On 2014/08/02 00:28:14, Pawel Osciak wrote: > > Nit: I'd CHECK(coded_size_.empty()) in case you get a resolution change > > mid-stream, but up to you. Not a finished product anyway :) > > This should be the correct behavior for a mid-stream size change (except for > flushing which is not implemented yet). coded_size_.empty() would mean not mid-stream, but beginning of stream. So basically CHECK that this is not a mid-stream change. https://chromiumcodereview.appspot.com/397883002/diff/160001/content/common/g... File content/common/gpu/media/vt_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/397883002/diff/160001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:43: int32_t bitstream_id = reinterpret_cast<intptr_t>(source_frame_refcon); I meant reinterpret_cast<int32_t>... https://chromiumcodereview.appspot.com/397883002/diff/160001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:325: CHECK(!available_picture_ids_.count(pictures[i].id())); Nice, pretty crafty! https://chromiumcodereview.appspot.com/397883002/diff/160001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:327: texture_ids_[pictures[i].id()] = pictures[i].texture_id(); Should probably check this is empty too though...?
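(On the cast question above: a void* cannot be reinterpret_cast directly to int32_t on 64-bit builds, since the destination integer type must be large enough to hold a pointer, so the usual pattern is to round-trip through intptr_t and then narrow. A sketch with illustrative helper names:)

#include <stdint.h>

// Pack a 32-bit bitstream id into the sourceFrameRefCon void*.
void* BitstreamIdToRefCon(int32_t bitstream_id) {
  return reinterpret_cast<void*>(static_cast<intptr_t>(bitstream_id));
}

// Unpack it again in the VT output callback.
int32_t RefConToBitstreamId(void* source_frame_refcon) {
  return static_cast<int32_t>(reinterpret_cast<intptr_t>(source_frame_refcon));
}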
Some fixes, and reply to comments on the .h that I failed to reply to before.

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... File content/common/gpu/media/vt_video_decode_accelerator.cc (right):

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:166: // If the size has changed, trigger a request for new picture buffers. On 2014/08/03 04:26:30, Pawel Osciak wrote: > On 2014/08/02 01:43:13, sandersd wrote: > > On 2014/08/02 00:28:14, Pawel Osciak wrote: > > > Nit: I'd CHECK(coded_size_.empty()) in case you get a resolution change > > > mid-stream, but up to you. Not a finished product anyway :) > > > > This should be the correct behavior for a mid-stream size change (except for > > flushing which is not implemented yet). > > coded_size_.empty() would mean not mid-stream, but beginning of stream. So > basically CHECK that this is not a mid-stream change. Right, but my intention is that this code will execute at any place in the stream, and this is the correct implementation for that. For now, it's the call to ConfigureDecoder() that is restricted.

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... File content/common/gpu/media/vt_video_decode_accelerator.h (right):

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.h:60: DecodedFrame(uint32_t bitstream_id, CVImageBufferRef image_buffer); On 2014/08/02 00:28:15, Pawel Osciak wrote: > s/uint32_t/int32_t/ Done.

https://chromiumcodereview.appspot.com/397883002/diff/140001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.h:95: // Image buffers retained while they are bound to pictures. On 2014/08/02 00:28:15, Pawel Osciak wrote: > Nit: I'd explain this is used to keep a ref to CVImageBuffer while at display. > Right? I was using the word 'retained' to mean that precisely. I've reworded it to use more generic terms.

https://chromiumcodereview.appspot.com/397883002/diff/160001/content/common/g... File content/common/gpu/media/vt_video_decode_accelerator.cc (right):

https://chromiumcodereview.appspot.com/397883002/diff/160001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:325: CHECK(!available_picture_ids_.count(pictures[i].id())); On 2014/08/03 04:26:30, Pawel Osciak wrote: > Nice, pretty crafty! Turns out to be wrong too :-(

https://chromiumcodereview.appspot.com/397883002/diff/160001/content/common/g... content/common/gpu/media/vt_video_decode_accelerator.cc:327: texture_ids_[pictures[i].id()] = pictures[i].texture_id(); On 2014/08/03 04:26:30, Pawel Osciak wrote: > Should probably check this is empty too though...? This is a better plan.
The CQ bit was checked by sandersd@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/sandersd@chromium.org/397883002/220001
Message was sent while issue was closed.
Change committed as 287451