Created: 7 years, 10 months ago by reveman
Modified: 7 years, 10 months ago
CC: chromium-reviews, apatrick_chromium
Base URL: svn://svn.chromium.org/chrome/trunk/src
Visibility: Public.
Description
Re-land: Mark async texture uploads as completed from the upload thread.
This reduces the latency between when an upload completes and when the
client is notified.
BUG=173802
NOTRY=True
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=183010
Patch Set 1 #
Total comments: 17
Patch Set 2 : Keep replies. #
Total comments: 2
Patch Set 3 : Avoid unnecessary polling by GpuCommandBufferStub. #
Total comments: 2
Patch Set 4 : add check to make sure shared memory is valid in AsyncPixelTransfersCompletedQuery::End. #
Patch Set 5 : Fix shutdown issue #
Total comments: 2
Patch Set 6 : fix typo #
Patch Set 7 : rebase #
Messages
Total messages: 34 (0 generated)
https://codereview.chromium.org/12213073/diff/1/ui/gl/DEPS File ui/gl/DEPS (right): https://codereview.chromium.org/12213073/diff/1/ui/gl/DEPS#newcode4 ui/gl/DEPS:4: "+gpu/command_buffer/common", I assume that this is not cool. We need to know what the QuerySync struct looks like to set it directly from the upload thread in async_pixel_transfer_delegate_android.cc. Should we move this struct to ui/gl?
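For context, the QuerySync struct discussed here is the per-query block of shared memory that the service writes and the client polls. A rough sketch of its shape, with field types approximated rather than copied from the tree:

// gpu/command_buffer/common - approximate shape of the shared-memory query
// block; the service writes process_count/result, the client polls them.
struct QuerySync {
  void Reset() {
    process_count = 0;
    result = 0;
  }
  uint32 process_count;  // compared against the query's submit_count()
  uint64 result;
};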
https://codereview.chromium.org/12213073/diff/1/ui/gl/async_pixel_transfer_de... File ui/gl/async_pixel_transfer_delegate_android.cc (right): https://codereview.chromium.org/12213073/diff/1/ui/gl/async_pixel_transfer_de... ui/gl/async_pixel_transfer_delegate_android.cc:306: base::Lock transfer_in_progress_lock_; we could use base::subtle::Atomic32 here instead if performance is a concern.
https://codereview.chromium.org/12213073/diff/1/gpu/command_buffer/service/as... File gpu/command_buffer/service/async_pixel_transfer_delegate_mock.h (right): https://codereview.chromium.org/12213073/diff/1/gpu/command_buffer/service/as... gpu/command_buffer/service/async_pixel_transfer_delegate_mock.h:37: uint32 submit_count)); It was a bit unclear to me before what submit_count meant, and it's even less clear in this context. Could there be a better name? https://codereview.chromium.org/12213073/diff/1/gpu/command_buffer/service/qu... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/12213073/diff/1/gpu/command_buffer/service/qu... gpu/command_buffer/service/query_manager.cc:200: mem_params.shm_data_size = sizeof(QuerySync); I was debating whether we needed to do this in this instance. The Query class is actually ref-counted... So if that is legit, we could ensure a gpu-thread reference is kept for the duration of the async call. However, I'm suspicious of the Query being reference counted in the first place. Can it really outlive its manager safely? This is part of the reason I was hesitant to do query completion on the thread. I also used a weak pointer rather than holding a reference, to be sure. If we keep the dupe, we should investigate removing it along with the texture memory dupe. We just need an intermediate class that can be reference counted within a process. https://codereview.chromium.org/12213073/diff/1/gpu/command_buffer/service/qu... gpu/command_buffer/service/query_manager.cc:204: manager()->decoder()->GetAsyncPixelTransferDelegate()->AsyncNotifyCompletion( I think to avoid the layering violation, we could do the query setting via a callback that we create right here and pass along with the memory etc. The delegate can decide whether it needs to dupe the memory, and when/what-thread to execute the callback on. The callback will need to be very minimal in what it does, since it could happen on another thread. https://codereview.chromium.org/12213073/diff/1/gpu/command_buffer/service/qu... gpu/command_buffer/service/query_manager.cc:217: if (sync->process_count != submit_count()) Is this thread safe? Given its simplicity maybe it is? I think we should add a comment at a minimum that we are comparing something that is being changed on another thread. https://codereview.chromium.org/12213073/diff/1/ui/gl/async_pixel_transfer_de... File ui/gl/async_pixel_transfer_delegate_android.cc (right): https://codereview.chromium.org/12213073/diff/1/ui/gl/async_pixel_transfer_de... ui/gl/async_pixel_transfer_delegate_android.cc:306: base::Lock transfer_in_progress_lock_; On 2013/02/07 19:59:02, David Reveman wrote: > we could use base::subtle::Atomic32 here instead if performance is a concern. Can you explain a bit why we need a lock or atomic int at all? https://codereview.chromium.org/12213073/diff/1/ui/gl/async_pixel_transfer_de... File ui/gl/async_pixel_transfer_delegate_stub.cc (right): https://codereview.chromium.org/12213073/diff/1/ui/gl/async_pixel_transfer_de... ui/gl/async_pixel_transfer_delegate_stub.cc:83: gpu::gles2::QuerySync* sync = static_cast<gpu::gles2::QuerySync*>( See my earlier comment. I think the layering violation can be avoided with a callback.
https://codereview.chromium.org/12213073/diff/1/ui/gl/async_pixel_transfer_de... File ui/gl/async_pixel_transfer_delegate_android.cc (right): https://codereview.chromium.org/12213073/diff/1/ui/gl/async_pixel_transfer_de... ui/gl/async_pixel_transfer_delegate_android.cc:36: class TextureUploadStats Can this be another change? I don't understand how upload stats are related to this change. And it adds another lock, each instance of which requires a lot of justification, IMO.
I also forgot to mention that this patch changes the async query result from always being 1 to always being 0. We don't use the result so it shouldn't matter. https://codereview.chromium.org/12213073/diff/1/gpu/command_buffer/service/as... File gpu/command_buffer/service/async_pixel_transfer_delegate_mock.h (right): https://codereview.chromium.org/12213073/diff/1/gpu/command_buffer/service/as... gpu/command_buffer/service/async_pixel_transfer_delegate_mock.h:37: uint32 submit_count)); On 2013/02/07 20:22:56, epenner wrote: > It was a bit unclear to me before what submit_count meant, and it's even less > clear in this context. Could there be a better name? This is what the query system uses for the value that should be written to the shared memory at query completion. I wouldn't want to change it unless we change all of the query system. https://codereview.chromium.org/12213073/diff/1/gpu/command_buffer/service/qu... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/12213073/diff/1/gpu/command_buffer/service/qu... gpu/command_buffer/service/query_manager.cc:200: mem_params.shm_data_size = sizeof(QuerySync); On 2013/02/07 20:22:56, epenner wrote: > I was debating whether we needed to do this in this instance. The Query > class is actually ref-counted... So if that is legit, we could ensure a > gpu-thread reference is kept for the duration of the async call. > > However, I'm suspicious of the Query being reference counted in the first place. > Can it really outlive its manager safely? This is part of the reason I was > hesitant to do query completion on the thread. I also used a weak pointer rather > than holding a reference, to be sure. > > If we keep the dupe, we should investigate removing it along with the texture > memory dupe. We just need an intermediate class that can be reference counted > within a process. I think the safe way to do this for now is to duplicate it, and we can later change it if possible along with the texture memory. https://codereview.chromium.org/12213073/diff/1/gpu/command_buffer/service/qu... gpu/command_buffer/service/query_manager.cc:204: manager()->decoder()->GetAsyncPixelTransferDelegate()->AsyncNotifyCompletion( On 2013/02/07 20:22:56, epenner wrote: > I think to avoid the layering violation, we could do the query setting via a > callback that we create right here and pass along with the memory etc. The > delegate can decide whether it needs to dupe the memory, and when/what-thread to > execute the callback on. The callback will need to be very minimal in what it > does, since it could happen on another thread. Yea, sounds like that might be a pretty clean solution. And it will get rid of that inappropriate include, yay! https://codereview.chromium.org/12213073/diff/1/gpu/command_buffer/service/qu... gpu/command_buffer/service/query_manager.cc:217: if (sync->process_count != submit_count()) On 2013/02/07 20:22:56, epenner wrote: > Is this thread safe? Given its simplicity maybe it is? I think we should add a > comment at a minimum that we are comparing something that is being changed on > another thread. This is consistent with the code in the client-side query tracker. The query system already guarantees that this is atomic across processes. I think that adding some additional thread-safety guards would just be confusing. I'll add a comment though. https://codereview.chromium.org/12213073/diff/1/ui/gl/async_pixel_transfer_de...
File ui/gl/async_pixel_transfer_delegate_android.cc (right): https://codereview.chromium.org/12213073/diff/1/ui/gl/async_pixel_transfer_de... ui/gl/async_pixel_transfer_delegate_android.cc:36: class TextureUploadStats On 2013/02/07 20:36:34, epenner wrote: > Can this be another change? I don't understand how upload stats are related to > this change. And it adds another lock, each instance of which requires a lot of > justification, IMO. I can't get rid of the replies and keep collecting upload stats correctly without this. We can of course remove the stats, but that breaks the benchmarks extension. https://codereview.chromium.org/12213073/diff/1/ui/gl/async_pixel_transfer_de... ui/gl/async_pixel_transfer_delegate_android.cc:306: base::Lock transfer_in_progress_lock_; On 2013/02/07 20:22:56, epenner wrote: > On 2013/02/07 19:59:02, David Reveman wrote: > > we could use base::subtle::Atomic32 here instead if performance is a concern. > > Can you explain a bit why we need a lock or atomic int at all? Both threads will access this. The main thread reads it to determine when the transfer is done; the upload thread sets it when done.
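For reference, a minimal sketch of the base::subtle::Atomic32 alternative mentioned above; the method names are illustrative, not the code in this patch:

#include "base/atomicops.h"

class TransferStateInternal {
 public:
  TransferStateInternal() : transfer_in_progress_(1) {}

  // Upload thread: publish completion with release semantics once the
  // glTexSubImage2D call has returned.
  void MarkTransferCompleted() {
    base::subtle::Release_Store(&transfer_in_progress_, 0);
  }

  // GPU (main) thread: poll completion with acquire semantics.
  bool TransferIsInProgress() const {
    return base::subtle::Acquire_Load(&transfer_in_progress_) != 0;
  }

 private:
  volatile base::subtle::Atomic32 transfer_in_progress_;
};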
I think I understand this more and I think it could be hugely simplified by keeping the reply to do non-latency-critical stuff... Every atomic int and lock adds a huge mental burden to someone unfamiliar with this code. https://codereview.chromium.org/12213073/diff/1/ui/gl/async_pixel_transfer_de... File ui/gl/async_pixel_transfer_delegate_android.cc (right): https://codereview.chromium.org/12213073/diff/1/ui/gl/async_pixel_transfer_de... ui/gl/async_pixel_transfer_delegate_android.cc:36: class TextureUploadStats I see, let's not get rid of the reply then? We should minimize the impact of this latency-savings optimization on design cleanliness. Could the query code also benefit from the reply to reduce a lot of the changes there? Could we do *only* the query completion on the upload thread and handle the rest in a reply to the GPU thread? It appears a lot of GPU-side code was written to work around not having the reply task. https://codereview.chromium.org/12213073/diff/1/ui/gl/async_pixel_transfer_de... ui/gl/async_pixel_transfer_delegate_android.cc:306: base::Lock transfer_in_progress_lock_; That's what atomic ints are used for, yes, but I'm asking why it's required from a design perspective. Could this benefit from the reply as well? If the reply is posted to the GPU thread before we mark the completion, it's impossible for the renderer to do something on the GPU thread before the GPU thread handles the reply.
Dang. I realized the normal PostTaskAndReply doesn't do what I said. It will mark the completion and then post the reply task *after*, which leaves open the smallest of smallest of races. To get around that we can post one task to the upload thread followed by one PostTaskAndReply which handles the reply part... This is ugly, but I think the right solution for 'too many tasks' is batching.
Or maybe post the reply manually from the upload thread.
On 2013/02/07 22:52:23, epenner wrote: > Dang. I realized the normal PostTaskAndReply doesn't do what I said. It will > mark the completion and then post the reply task *after*, which leaves open the > smallest of smallest of races. To get around that we can post one task to the > upload thread followed by one PostTaskAndReply which handles the reply part... > This is ugly, but I think the right solution for 'too many tasks' is batching. And sorry again, my suggestion was in the wrong order. My suggestion is:
- Dupe Memory
- PostTaskAndReply(NoOpUploadtask, NonCriticalGpuAccountingTask)
- PostTask(LatencyCriticalCompletionTask)
If there are any races in that, please feel free to poke holes in this idea. But from what I can tell it's race-free.
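A minimal sketch of that ordering, using the placeholder task names from the suggestion rather than anything in the actual patch; the key point is that the reply is already queued on the GPU thread before the latency-critical task runs:

#include "base/bind.h"
#include "base/message_loop_proxy.h"

void NoOpUploadTask() {}
void NonCriticalGpuAccountingTask() { /* stats, cleanup; runs on GPU thread */ }
void LatencyCriticalCompletionTask() { /* mark the query/sync complete */ }

void ScheduleCompletion(base::MessageLoopProxy* upload_loop) {
  // 1. (Dupe the shared memory first so it stays valid for the async work.)
  // 2. The no-op task runs first on the upload thread, which posts the
  //    accounting reply back to the calling (GPU) thread right away.
  upload_loop->PostTaskAndReply(FROM_HERE,
                                base::Bind(&NoOpUploadTask),
                                base::Bind(&NonCriticalGpuAccountingTask));
  // 3. Because the upload thread runs its tasks in order, by the time this
  //    task marks the transfer complete the reply is already in the GPU
  //    thread's queue.
  upload_loop->PostTask(FROM_HERE,
                        base::Bind(&LatencyCriticalCompletionTask));
}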
I think you're right, removing the replies from AsyncPixelTransferDelegateAndroid as part of this CL is not necessary. At least not on android where we have a gpu thread. However, it's not clear to me that this would be safe if we were using a gpu process. In that case, can we make the assumption that a task posted to the message loop will run before an IO handler triggered by an event just a microsecond later? If it's not appropriate to make that assumption on non-android then maybe it's not best to do this on android either. Anyhow, the latest patch gets the replies back but keeps the modifications to the query manager, which I think are good as they allow us to avoid posting yet another task to the upload thread (I'm seeing 0.15-0.25ms delay between each task so not adding another seems important). I also like how this change makes async transfer queries behave more like other types of queries and we no longer have a set of pending transfer queries that the query manager is unaware of. PTAL. Btw, doesn't seem like TransferStateInternal need to be RefCountedThreadSafe. Latest patch makes it just RefCounted. Am I missing something here?
From a quick look it's looking really good! I'll look at it closely tonight. > In that case, can we make the assumption that a task posted to the message loop will run before an IO handler triggered by an event just a microsecond later? I'm not sure if there is a distinction on Android, as the 'GPU process' still exists. It still communicates via IPCs where it would previously, it's just that it shares the same process as the browser. Can you describe the race you are thinking of in detail? Which message / IPCs specifically are you thinking of? > Btw, doesn't seem like TransferStateInternal need to be RefCountedThreadSafe. Latest patch makes it just RefCounted. Am I missing something here? I'm very curious about this myself! The question is, is this safe? otherThread->PostTaskAndReply( SomeClosure, SomeClosureWithReferenceParameter) The reference parameter *should* only be accessed on the calling thread and not the other thread. However, what if the closure is copied-by-value or something while on the other thread? I'd just like to know for sure that it is *NOT* touched by the other thread.
On 2013/02/08 05:23:28, epenner wrote: > From a quick look it's looking really good! I'll look at it closely tonight. > > > In that case, can we make the assumption that a task posted to the message > loop will run before an IO handler triggered by an event just a microsecond > later? > > I'm not sure if there is a distinction on Android, as the 'GPU process' still > exists. It still communicates via IPCs where it would previously, it's just that > it shares the same process as the browser. Can you describe the race you are > thinking of in detail? Which message / IPCs specifically are you thinking of? Let's say our message loop looks something like this: while (!quit_) { while (renderer_channel->CanReadWithoutBlocking()) renderer_channel->Dispatch(); while (pending_tasks_.size()) pending_tasks_.front.Run(); pending_tasks_.front. } > > > Btw, doesn't seem like TransferStateInternal need to be RefCountedThreadSafe. > Latest patch makes it just RefCounted. Am I missing something here? > > I'm very curious about this myself! The question is, is this safe? > otherThread->PostTaskAndReply( > SomeClosure, > SomeClosureWithReferenceParameter) > The reference parameter *should* only be accessed on the calling thread and not > the other thread. However, what if the closure is copied-by-value or something > while on the other thread? I'd just like to know for sure that it is *NOT* > touched by the other thread.
LGTM. However, I think we need a GPU reviewer for the Query changes. The query changes might have merit on their own, but if it's not solving a problem for M25 I'd be tempted to keep this minimal and just have AsyncNotifyCompletion take two callbacks (one fast and one slow doing the old stuff). But it's up to you. Either way, we should discuss any race possibilities you are worried about in depth. If I've missed something I'd like to understand it so we can address it. My understanding was that the only race risk is that the renderer could receive the completion signal and render something using the texture before one of the replies executes on the main GPU thread. But that should be prevented entirely by the reply being added to the message loop before the render process is even notified of completion.
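To illustrate the "two callbacks" idea, a hypothetical interface sketch; this is not the signature in the tree, just the shape being suggested:

#include "base/callback.h"

class AsyncPixelTransferDelegate {
 public:
  virtual ~AsyncPixelTransferDelegate() {}
  // |fast_callback| may run directly on the upload thread as soon as the
  // transfer finishes and should only write the query's shared-memory state.
  // |slow_callback| runs later on the GPU thread and does the old,
  // non-latency-critical bookkeeping that the replies used to do.
  virtual void AsyncNotifyCompletion(const base::Closure& fast_callback,
                                     const base::Closure& slow_callback) = 0;
};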
On 2013/02/08 05:55:55, David Reveman wrote: > On 2013/02/08 05:23:28, epenner wrote: > > From a quick look it's looking really good! I'll look at it closely tonight. > > > > > In that case, can we make the assumption that a task posted to the message > > loop will run before an IO handler triggered by an event just a microsecond > > later? > > > > I'm not sure if there is a distinction on Android, as the 'GPU process' still > > exists. It still communicates via IPCs where it would previously, it's just > that > > it shares the same process as the browser. Can you describe the race you are > > thinking of in detail? Which message / IPCs specifically are you thinking of? > > Let's say our message loop looks something like this: > > while (!quit_) { > while (renderer_channel->CanReadWithoutBlocking()) > renderer_channel->Dispatch(); > while (pending_tasks_.size()) > pending_tasks_.front.Run(); > pending_tasks_.front. > } Oops, accidentally hit send. So let's say it looks something like this: while (!quit_) { while (renderer_channel->CanReadWithoutBlocking()) renderer_channel->Dispatch(); while (pending_tasks_.size()) { pending_tasks_.front.Run(); pending_tasks_.pop_back(); } } which of course is very far from the real thing but we should be able to think of the message loop as a black box and it could be replaced by something that resembles the code above. In this case, just because a task was added to the pending task list before renderer_channel->CanReadWithoutBlocking() would return true doesn't guarantee the task will be run first. > > > > > > Btw, doesn't seem like TransferStateInternal need to be > RefCountedThreadSafe. > > Latest patch makes it just RefCounted. Am I missing something here? > > > > I'm very curious about this myself! The question is, is this safe? > > otherThread->PostTaskAndReply( > > SomeClosure, > > SomeClosureWithReferenceParameter) > > The reference parameter *should* only be accessed on the calling thread and > not > > the other thread. However, what if the closure is copied-by-value or something > > while on the other thread? I'd just like to know for sure that it is *NOT* > > touched by the other thread. We should be safe if we use base::Unretained() and make sure not to ref it on the other thread. We should add the const modifier to prevent future mistakes. I'll fix that.
> We should be safe if we use base::Unretained() and make sure not to ref it on > the other thread. We should add the const modifier to prevent future mistakes. > I'll fix that. Nevermind, we obviously have members in TransferStateInternal that we modify from the upload thread, so we can't use the const modifier.
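For illustration, the binding pattern being discussed might look roughly like this; the function names are placeholders, and the point is that no AddRef/Release happens on the upload thread while the reply closure keeps the object alive from the GPU thread:

#include "base/bind.h"
#include "base/memory/ref_counted.h"
#include "base/message_loop_proxy.h"

struct TransferStateInternal : public base::RefCounted<TransferStateInternal> {
  // Members written by the upload thread live here.
};

void PerformUpload(TransferStateInternal* state) {}  // runs on upload thread
void UploadCompleted(scoped_refptr<TransferStateInternal> state) {}  // GPU thread

void Schedule(base::MessageLoopProxy* upload_loop,
              const scoped_refptr<TransferStateInternal>& state) {
  upload_loop->PostTaskAndReply(
      FROM_HERE,
      // The upload task only sees a raw pointer, so no AddRef/Release
      // happens on the upload thread.
      base::Bind(&PerformUpload, base::Unretained(state.get())),
      // The reply holds a reference that is taken and dropped on the GPU
      // thread, keeping |state| alive until after the upload task has run.
      base::Bind(&UploadCompleted, state));
}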
Gregg, please have a look at the QueryManager changes in the patch when you have a chance.
> Oops, accidentally hit send. So let's say it looks something like this: > > while (!quit_) { > while (renderer_channel->CanReadWithoutBlocking()) > renderer_channel->Dispatch(); > > while (pending_tasks_.size()) { > pending_tasks_.front.Run(); > pending_tasks_.pop_back(); > } > } > > which of course is very far from the real thing but we should be able to think > of the message loop as a black box and it could be replaced by something that > resembles the code above. > > In this case, just because a task was added to the pending task list before > renderer_channel->CanReadWithoutBlocking() would return true doesn't guarantee > the task will be run first. So you're saying basically that the GPU thread's raw tasks could be reordered. Yes, task order on the GPU thread is assumed right now. I think such a re-ordering message loop should probably use a different type/interface to make clear this massive semantic change, but in any event I guess the main thing for now is to confirm that this doesn't happen like you say.
On 2013/02/08 18:59:46, epenner wrote: > > Oops, accidentally hit send. So let's say it looks something like this: > > > > while (!quit_) { > > while (renderer_channel->CanReadWithoutBlocking()) > > renderer_channel->Dispatch(); > > > > while (pending_tasks_.size()) { > > pending_tasks_.front.Run(); > > pending_tasks_.pop_back(); > > } > > } > > > > which of course is very far from the real thing but we should be able to think > > of the message loop as a black box and it could be replaced by something that > > resembles the code above. > > > > In this case, just because a task was added to the pending task list before > > renderer_channel->CanReadWithoutBlocking() would return true doesn't guarantee > > the task will be run first. > > So you're saying basically that the GPU thread's raw tasks could be reordered. Not really. There are two assumptions that need to be correct for the current patch to be without any race conditions when using this:
upload_thread->PostTaskAndReply(PerformUpload, SetTransferComplete)
upload_thread->PostTask(NotifyCompletion)
1. The SetTransferComplete callback needs to be in the caller thread's pending task queue at the time NotifyCompletion is run.
2. Any IPC events from the renderer need to go through the same task queue.
I'm guessing it's OK to make these assumptions. It felt a bit suspicious since, if we were to use an IPC channel for the SetTransferComplete notification, condition 1 would break. > task order on the GPU thread is assumed right now. I think such a re-ordering > message loop should probably use a different type/interface to make clear this > massive semantic change, but in any event I guess the main thing for now is to > confirm that this doesn't happen like you say. Right. I've done some more tracing and I think it's critical to get rid of these replies for good performance. The upload thread has low priority so avoiding waking up other threads after each upload significantly impacts how packed the uploads will be on the thread. So we probably want to get rid of these replies after all, but I think it's better to do that in a follow-up patch unless the assumptions made in the current patch are determined to be inappropriate.
Sounds good. Assumptions seem solid to me unless I'm missing something big. Personally I'd prefer a "mega-task" to locks if we need to optimize it, but let's get this in first, true.
https://codereview.chromium.org/12213073/diff/7002/gpu/command_buffer/service... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/12213073/diff/7002/gpu/command_buffer/service... gpu/command_buffer/service/query_manager.cc:210: Buffer buffer = manager()->decoder()->GetSharedMemoryBuffer(shm_id()); Is there any chance this will fail? https://codereview.chromium.org/12213073/diff/12002/gpu/command_buffer/servic... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/12213073/diff/12002/gpu/command_buffer/servic... gpu/command_buffer/service/query_manager.cc:188: void *data = static_cast<int8*>(mem_params.shared_memory->memory()) + how do you know this memory is still valid?
https://codereview.chromium.org/12213073/diff/7002/gpu/command_buffer/service... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/12213073/diff/7002/gpu/command_buffer/service... gpu/command_buffer/service/query_manager.cc:210: Buffer buffer = manager()->decoder()->GetSharedMemoryBuffer(shm_id()); On 2013/02/11 17:34:09, greggman wrote: > Is there any chance this will fail? Good point. Added a check for this. https://codereview.chromium.org/12213073/diff/12002/gpu/command_buffer/servic... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/12213073/diff/12002/gpu/command_buffer/servic... gpu/command_buffer/service/query_manager.cc:188: void *data = static_cast<int8*>(mem_params.shared_memory->memory()) + On 2013/02/11 17:34:09, greggman wrote: > how do you know this memory is still valid? AsyncPixelTransferDelegate is responsible for making sure this function is called with valid memory. The stub implementation makes this callback synchronously, so if the memory is valid in AsyncPixelTransfersCompletedQuery::End, it's also valid here. The Android implementation dups the shared memory to ensure it's still valid when this callback is run.
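For readers unfamiliar with the dup trick: a rough sketch of duplicating shared memory within the same process, assuming base::SharedMemory::ShareToProcess is used for the handle dup (an approximation of the era's common pattern, not a copy of the patch):

#include "base/basictypes.h"
#include "base/memory/scoped_ptr.h"
#include "base/memory/shared_memory.h"
#include "base/process_util.h"

// Returns a new, independently owned mapping of |shared_memory| that stays
// valid even if the original object is destroyed, or NULL on failure.
base::SharedMemory* DuplicateSharedMemory(base::SharedMemory* shared_memory,
                                          uint32 size) {
  base::SharedMemoryHandle duped_handle;
  if (!shared_memory->ShareToProcess(base::GetCurrentProcessHandle(),
                                     &duped_handle))
    return NULL;
  scoped_ptr<base::SharedMemory> duped(
      new base::SharedMemory(duped_handle, false /* read_only */));
  if (!duped->Map(size))
    return NULL;
  return duped.release();
}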
lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/reveman@chromium.org/12213073/5003
Message was sent while issue was closed.
Change committed as 181883
This latest patch fixes the problem reported here: https://code.google.com/p/chromium/issues/detail?id=175863 During shutdown we were calling ProcessPendingTransferQueries() after having destroyed the scheduler. Not sure this is the best way to handle this. Maybe all this async upload related code should be moved out of MakeCurrent?
lgtm https://codereview.chromium.org/12213073/diff/16001/ui/gl/async_pixel_transfe... File ui/gl/async_pixel_transfer_delegate_android.cc (right): https://codereview.chromium.org/12213073/diff/16001/ui/gl/async_pixel_transfe... ui/gl/async_pixel_transfer_delegate_android.cc:437: // In practice, they are complete when the CPU glSubTexImage2D completes. s/SubTex/TexSub/?
https://codereview.chromium.org/12213073/diff/16001/ui/gl/async_pixel_transfe... File ui/gl/async_pixel_transfer_delegate_android.cc (right): https://codereview.chromium.org/12213073/diff/16001/ui/gl/async_pixel_transfe... ui/gl/async_pixel_transfer_delegate_android.cc:437: // In practice, they are complete when the CPU glSubTexImage2D completes. On 2013/02/13 20:31:10, greggman wrote: > s/SubTex/TexSub/? Done.
Eric, I just rebased this onto your safe shared memory change. PTAL.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/reveman@chromium.org/12213073/27001
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/reveman@chromium.org/12213073/27001
lgtm
Commit queue rejected this change because the description was changed between the time the change entered the commit queue and the time it was ready to commit. You can safely check the commit box again.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/reveman@chromium.org/12213073/27001
Message was sent while issue was closed.
Change committed as 183010