|
|
Chromium Code Reviews|
Created:
8 years ago by danakj Modified:
7 years, 11 months ago CC:
chromium-reviews, piman, backer, cc-bugs_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
Descriptioncc: Unify context losing machinery
The lost context callback is only used in the thread proxy, and it
ends up calling all the way into the scheduler, which can cause some
weird re-entrancy. Instead, post a task to check for the lost context
both in the single and threaded proxy classes.
Tested by cc_unittests:LayerTreeHostContextTest* unit tests.
I beefed up the tests some to also test losing the context during
draw.
BUG=123444
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=178239
Patch Set 1 #Patch Set 2 : #Patch Set 3 : #
Total comments: 10
Patch Set 4 : #Patch Set 5 : #
Total comments: 6
Patch Set 6 : #Patch Set 7 : reboot #
Messages
Total messages: 30 (0 generated)
Something happened to cc-bugs..
I like fixing the timing but think we should still route this call down through to the output surface itself rather than branching inside the proxy.
Gregg recently added a glLoseContextCHROMIUM() call to the command buffer: http://code.google.com/p/chromium/issues/detail?id=166020 / http://src.chromium.org/viewvc/chrome?view=rev&revision=173441. What if we plumbed the loseCompositorContext machinery out to output surface and either used the actual extension (when using the command buffer) or emulated it in the OutputSurface implementation in tests? That'd be a lot more realistic. On Mon, Dec 17, 2012 at 3:50 PM, <jamesr@chromium.org> wrote: > I like fixing the timing but think we should still route this call down > through > to the output surface itself rather than branching inside the proxy. > > https://codereview.chromium.**org/11606012/<https://codereview.chromium.org/1... >
PTAL
https://codereview.chromium.org/11606012/diff/9024/cc/layer_tree_host.h File cc/layer_tree_host.h (left): https://codereview.chromium.org/11606012/diff/9024/cc/layer_tree_host.h#oldco... cc/layer_tree_host.h:139: void loseOutputSurface(int numTimes); webkit/compositor_bindings/web_layer_tree_view_impl.cc#214: void WebLayerTreeViewImpl::loseCompositorContext(int numTimes) { m_layerTreeHost->loseOutputSurface(numTimes); } sorry :(
https://codereview.chromium.org/11606012/diff/9024/cc/layer_tree_host.h File cc/layer_tree_host.h (left): https://codereview.chromium.org/11606012/diff/9024/cc/layer_tree_host.h#oldco... cc/layer_tree_host.h:139: void loseOutputSurface(int numTimes); On 2012/12/18 02:00:16, jamesr wrote: > webkit/compositor_bindings/web_layer_tree_view_impl.cc#214: > > > void WebLayerTreeViewImpl::loseCompositorContext(int numTimes) > { > m_layerTreeHost->loseOutputSurface(numTimes); > } > > sorry :( Yeh. I'm going to just make it a no-op in the bindings. All the unit tests seem fine there. And we can remove it from WLTV. SG?
On Mon, Dec 17, 2012 at 6:01 PM, <danakj@chromium.org> wrote: > > https://codereview.chromium.**org/11606012/diff/9024/cc/** > layer_tree_host.h<https://codereview.chromium.org/11606012/diff/9024/cc/layer_tree_host.h> > File cc/layer_tree_host.h (left): > > https://codereview.chromium.**org/11606012/diff/9024/cc/** > layer_tree_host.h#oldcode139<https://codereview.chromium.org/11606012/diff/9024/cc/layer_tree_host.h#oldcode139> > cc/layer_tree_host.h:139: void loseOutputSurface(int numTimes); > On 2012/12/18 02:00:16, jamesr wrote: > >> webkit/compositor_bindings/**web_layer_tree_view_impl.cc#**214: >> > > > void WebLayerTreeViewImpl::**loseCompositorContext(int numTimes) >> { >> m_layerTreeHost->**loseOutputSurface(numTimes); >> } >> > > sorry :( >> > > Yeh. I'm going to just make it a no-op in the bindings. All the unit > tests seem fine there. > > And we can remove it from WLTV. SG? > > No, we have layout tests that call this. If we can convince ourselves that unit tests cover all cases that the current layout tests hit then we can remove both, but we'd have to see if we're losing coverage there. The layout tests have definitely caught real issues in the past. We could also potentially route the layout tests in a different way. > https://codereview.chromium.**org/11606012/<https://codereview.chromium.org/1... >
On 2012/12/18 02:03:30, jamesr1 wrote: > On Mon, Dec 17, 2012 at 6:01 PM, <mailto:danakj@chromium.org> wrote: > > > > > https://codereview.chromium.**org/11606012/diff/9024/cc/** > > > layer_tree_host.h<https://codereview.chromium.org/11606012/diff/9024/cc/layer_tree_host.h> > > File cc/layer_tree_host.h (left): > > > > https://codereview.chromium.**org/11606012/diff/9024/cc/** > > > layer_tree_host.h#oldcode139<https://codereview.chromium.org/11606012/diff/9024/cc/layer_tree_host.h#oldcode139> > > cc/layer_tree_host.h:139: void loseOutputSurface(int numTimes); > > On 2012/12/18 02:00:16, jamesr wrote: > > > >> webkit/compositor_bindings/**web_layer_tree_view_impl.cc#**214: > >> > > > > > > void WebLayerTreeViewImpl::**loseCompositorContext(int numTimes) > >> { > >> m_layerTreeHost->**loseOutputSurface(numTimes); > >> } > >> > > > > sorry :( > >> > > > > Yeh. I'm going to just make it a no-op in the bindings. All the unit > > tests seem fine there. > > > > And we can remove it from WLTV. SG? > > > > > No, we have layout tests that call this. If we can convince ourselves that > unit tests cover all cases that the current layout tests hit then we can > remove both, but we'd have to see if we're losing coverage there. The > layout tests have definitely caught real issues in the past. > > We could also potentially route the layout tests in a different way. Oh, gross. Do they need to be able to pass something other than "1" to this function? I'll look at different routing. > > > > > https://codereview.chromium.**org/11606012/%3Chttps://codereview.chromium.org...> > >
On Mon, Dec 17, 2012 at 6:04 PM, <danakj@chromium.org> wrote: > On 2012/12/18 02:03:30, jamesr1 wrote: > >> On Mon, Dec 17, 2012 at 6:01 PM, <mailto:danakj@chromium.org> wrote: >> > > > >> > https://codereview.chromium.****org/11606012/diff/9024/cc/** >> > >> > > layer_tree_host.h<https://**codereview.chromium.org/** > 11606012/diff/9024/cc/layer_**tree_host.h<https://codereview.chromium.org/11606012/diff/9024/cc/layer_tree_host.h> > > > >> > File cc/layer_tree_host.h (left): >> > >> > https://codereview.chromium.****org/11606012/diff/9024/cc/** >> > >> > > layer_tree_host.h#oldcode139<h**ttps://codereview.chromium.** > org/11606012/diff/9024/cc/**layer_tree_host.h#oldcode139<https://codereview.chromium.org/11606012/diff/9024/cc/layer_tree_host.h#oldcode139> > > > >> > cc/layer_tree_host.h:139: void loseOutputSurface(int numTimes); >> > On 2012/12/18 02:00:16, jamesr wrote: >> > >> >> webkit/compositor_bindings/****web_layer_tree_view_impl.cc#****214: >> >> >> > >> > >> > void WebLayerTreeViewImpl::****loseCompositorContext(int numTimes) >> >> { >> >> m_layerTreeHost->****loseOutputSurface(numTimes); >> >> >> } >> >> >> > >> > sorry :( >> >> >> > >> > Yeh. I'm going to just make it a no-op in the bindings. All the unit >> > tests seem fine there. >> > >> > And we can remove it from WLTV. SG? >> > >> > >> No, we have layout tests that call this. If we can convince ourselves >> that >> unit tests cover all cases that the current layout tests hit then we can >> remove both, but we'd have to see if we're losing coverage there. The >> layout tests have definitely caught real issues in the past. >> > > We could also potentially route the layout tests in a different way. >> > > Oh, gross. Do they need to be able to pass something other than "1" to this > function? I'll look at different routing. > Or decide that the tests that use values > 1 are superfluous or are covered by our unit tests. I'd be perfectly fine with that. > > > > > >> > > https://codereview.chromium.****org/11606012/%3Chttps://codere** > view.chromium.org/11606012/ <http://codereview.chromium.org/11606012/>> > >> > >> > > > > https://codereview.chromium.**org/11606012/<https://codereview.chromium.org/1... >
Okay, this depends on https://bugs.webkit.org/show_bug.cgi?id=105238 I verified the layout tests still work. They actually lose the context now, so bonus points. Left in the plumbing for loseOutputSurface thru to the impl thread. Once there, it has the resource provider lose its context3d.
+gman for gpu/
Am I hitting the wrong command_buffer_impl though? It doesn't seem to be part of the DRT build target. (But then how do the tests pass.)
Ok, it seems that the DRT tests for lost context don't actually test anything at the moment. So I'm not sure what to do with that, and I'll drop the gpu/ changes from this.
On 2012/12/18 02:44:07, danakj wrote: > Am I hitting the wrong command_buffer_impl though? It doesn't seem to be part of > the DRT build target. > > (But then how do the tests pass.) DRT uses webkit/gpu/webgraphicscontext3d_in_process_command_buffer_impl Try commenting out parts of cc's context recovery and see if the tests pass. I'm not surprised the tests pass if the context is not lost at all, they're intended to verify that we recover.
OK! I have this now losing the context with DRT and things failed once I had the context actually getting lost. Boy, did things fail... I found a bug, I guess, in the PrioritizedResourceManager. If the context was lost and did't recreate on the first try, we have a NULL m_resourceManager in layerTreeHostImpl. Then the resource manager would crash trying to free the contents textures (which are gone anyways). I'm scared that the webkit/gpu changes are a big hack, but I'm not sure if that's actually true. I had to make the PumpCommands avoid CHECK-failing when the context is lost, because GL errors will definitely happen in that case.
https://codereview.chromium.org/11606012/diff/8/cc/gl_renderer.cc File cc/gl_renderer.cc (right): https://codereview.chromium.org/11606012/diff/8/cc/gl_renderer.cc#newcode1513 cc/gl_renderer.cc:1513: if (m_context->checkFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) this is a sync call and pretty expensive in some drivers. we definitely cannot afford to make this call in production builds https://codereview.chromium.org/11606012/diff/8/cc/test/fake_web_graphics_con... File cc/test/fake_web_graphics_context_3d.cc (right): https://codereview.chromium.org/11606012/diff/8/cc/test/fake_web_graphics_con... cc/test/fake_web_graphics_context_3d.cc:43: return m_contextLost ? GL_INVALID_VALUE : GL_NO_ERROR; i don't think GL_INVALID_VALUE is a valid return value - http://www.opengl.org/registry/specs/ARB/robustness.txt says this will return NO_ERROR or one of the UNKNOWN_CONTEXT_RESET_ARB family. let's do UNKNOWN https://codereview.chromium.org/11606012/diff/8/webkit/gpu/webgraphicscontext... File webkit/gpu/webgraphicscontext3d_in_process_command_buffer_impl.cc (right): https://codereview.chromium.org/11606012/diff/8/webkit/gpu/webgraphicscontext... webkit/gpu/webgraphicscontext3d_in_process_command_buffer_impl.cc:1671: context_->set_context_lost_reason(context_lost_reason_); why do we need to store the context loss reason on both WGC3DIPCBI and GLInProcessContext? can this just use context_->context_lost_reason() ?
https://codereview.chromium.org/11606012/diff/8/cc/thread_proxy.cc File cc/thread_proxy.cc (right): https://codereview.chromium.org/11606012/diff/8/cc/thread_proxy.cc#newcode313 cc/thread_proxy.cc:313: Proxy::implThread()->postTask(base::Bind(&ThreadProxy::checkLostOutputSurfaceOnImplThread, base::Unretained(this))); hmmm - how is this base::Unretained(this) safe? i.e. how do you know 'this' will exist when the task runs? the calls from the main thread to the impl thread that do postTask(..., base::Unretained(this)) are (i think) safe because when we shut down the proxy we do it by posting a task from the main thread to the impl thread and then blocking the main thread until that task runs, which will be after all other tasks already posted from main->impl thread can run. if you're already on the impl thread, however, you have no idea what tasks may be in the queue - perhaps the main thread has already posted the stop task and it's sitting on the impl thread queue. the other postTask calls in this file from the impl thread to ThreadProxy are probably also all unsafe as well :(. https://codereview.chromium.org/11606012/diff/8/cc/thread_proxy.h File cc/thread_proxy.h (right): https://codereview.chromium.org/11606012/diff/8/cc/thread_proxy.h#newcode135 cc/thread_proxy.h:135: void checkLostOutputSurfaceOnImplThread(); naming nit: checkOutputSurfaceStatus ?
https://codereview.chromium.org/11606012/diff/8/cc/gl_renderer.cc File cc/gl_renderer.cc (right): https://codereview.chromium.org/11606012/diff/8/cc/gl_renderer.cc#newcode1513 cc/gl_renderer.cc:1513: if (m_context->checkFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) On 2012/12/18 05:40:11, jamesr wrote: > this is a sync call and pretty expensive in some drivers. we definitely cannot > afford to make this call in production builds Then we should probably just remove this line. The dcheck fires if you lose the context, which isn't a "bug" in here.
On 2012/12/18 06:06:57, danakj wrote: > https://codereview.chromium.org/11606012/diff/8/cc/gl_renderer.cc > File cc/gl_renderer.cc (right): > > https://codereview.chromium.org/11606012/diff/8/cc/gl_renderer.cc#newcode1513 > cc/gl_renderer.cc:1513: if (m_context->checkFramebufferStatus(GL_FRAMEBUFFER) != > GL_FRAMEBUFFER_COMPLETE) > On 2012/12/18 05:40:11, jamesr wrote: > > this is a sync call and pretty expensive in some drivers. we definitely > cannot > > afford to make this call in production builds > > Then we should probably just remove this line. The dcheck fires if you lose the > context, which isn't a "bug" in here. Can you just || it with a lost context check but keep it inside the DCHECK? It is valuable to know if a framebuffer's incomplete when developing.
On Tue, Dec 18, 2012 at 1:07 AM, <jamesr@chromium.org> wrote: > On 2012/12/18 06:06:57, danakj wrote: >> >> https://codereview.chromium.org/11606012/diff/8/cc/gl_renderer.cc >> File cc/gl_renderer.cc (right): > > >> >> https://codereview.chromium.org/11606012/diff/8/cc/gl_renderer.cc#newcode1513 >> cc/gl_renderer.cc:1513: if >> (m_context->checkFramebufferStatus(GL_FRAMEBUFFER) > > != >> >> GL_FRAMEBUFFER_COMPLETE) >> On 2012/12/18 05:40:11, jamesr wrote: >> > this is a sync call and pretty expensive in some drivers. we definitely >> cannot >> > afford to make this call in production builds > > >> Then we should probably just remove this line. The dcheck fires if you >> lose > > the >> >> context, which isn't a "bug" in here. > > > Can you just || it with a lost context check but keep it inside the DCHECK? > It > is valuable to know if a framebuffer's incomplete when developing. Ya, okay that sounds reasonable.
https://codereview.chromium.org/11606012/diff/17001/cc/prioritized_resource_m... File cc/prioritized_resource_manager.cc (right): https://codereview.chromium.org/11606012/diff/17001/cc/prioritized_resource_m... cc/prioritized_resource_manager.cc:314: DCHECK(m_proxy->isImplThread() && m_proxy->isMainThreadBlocked()); Or alternatively, if resourceProvider is NULL, we should assert that there are no non-evicted backings. I'd still prefer to just never allow a NULL resourceProvider. https://codereview.chromium.org/11606012/diff/17001/cc/thread_proxy.cc File cc/thread_proxy.cc (right): https://codereview.chromium.org/11606012/diff/17001/cc/thread_proxy.cc#newcod... cc/thread_proxy.cc:982: m_layerTreeHost->deleteContentsTexturesOnImplThread(m_layerTreeHostImpl->resourceProvider()); I'd prefer to do if (m_layerTreeHostImpl->resourceProvider()) m_layerTreeHost->deleteContentsTextures()... Because the PRM assumes assumes that the resourceProvider passed in is non-NULL and the same as the resourceProvider that created resources.
https://codereview.chromium.org/11606012/diff/17001/cc/prioritized_resource.cc File cc/prioritized_resource.cc (right): https://codereview.chromium.org/11606012/diff/17001/cc/prioritized_resource.c... cc/prioritized_resource.cc:161: resourceProvider->deleteResource(id()); this is a 4-space indent file https://codereview.chromium.org/11606012/diff/17001/cc/resource_provider.cc File cc/resource_provider.cc (right): https://codereview.chromium.org/11606012/diff/17001/cc/resource_provider.cc#n... cc/resource_provider.cc:960: WebGraphicsContext3D* context3d = m_outputSurface->Context3D(); this file's currently 4-space indent. i think being consistent within a file will make automated (or semi-automated) conversion easier so i'd like to stay consistent https://codereview.chromium.org/11606012/diff/17001/cc/thread_proxy.cc File cc/thread_proxy.cc (right): https://codereview.chromium.org/11606012/diff/17001/cc/thread_proxy.cc#newcod... cc/thread_proxy.cc:313: Proxy::implThread()->postTask(base::Bind(&ThreadProxy::checkOutputSurfaceStatusOnImplThread, base::Unretained(this))); this is crashy - there's no promises that the ThreadProxy will still be alive when the task runs. when posting main->impl we can use base::Unretained(this) since shutdown blocks the main thread. I think we should keep a WeakPtr usable on the impl thread and have all tasks posted to the impl thread use that as the target.
Sorry don't worry about this CL too much. I'm going to delete all the layout tests and make unit tests for them instead. https://codereview.chromium.org/11606012/diff/8/cc/gl_renderer.cc File cc/gl_renderer.cc (right): https://codereview.chromium.org/11606012/diff/8/cc/gl_renderer.cc#newcode1513 cc/gl_renderer.cc:1513: if (m_context->checkFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) On 2012/12/18 06:06:57, danakj wrote: > On 2012/12/18 05:40:11, jamesr wrote: > > this is a sync call and pretty expensive in some drivers. we definitely > cannot > > afford to make this call in production builds > > Then we should probably just remove this line. The dcheck fires if you lose the > context, which isn't a "bug" in here. Done. https://codereview.chromium.org/11606012/diff/8/cc/test/fake_web_graphics_con... File cc/test/fake_web_graphics_context_3d.cc (right): https://codereview.chromium.org/11606012/diff/8/cc/test/fake_web_graphics_con... cc/test/fake_web_graphics_context_3d.cc:43: return m_contextLost ? GL_INVALID_VALUE : GL_NO_ERROR; On 2012/12/18 05:40:11, jamesr wrote: > i don't think GL_INVALID_VALUE is a valid return value - > http://www.opengl.org/registry/specs/ARB/robustness.txt says this will return > NO_ERROR or one of the UNKNOWN_CONTEXT_RESET_ARB family. let's do UNKNOWN Done. https://codereview.chromium.org/11606012/diff/8/cc/thread_proxy.h File cc/thread_proxy.h (right): https://codereview.chromium.org/11606012/diff/8/cc/thread_proxy.h#newcode135 cc/thread_proxy.h:135: void checkLostOutputSurfaceOnImplThread(); On 2012/12/18 05:44:41, jamesr wrote: > naming nit: checkOutputSurfaceStatus ? Done. https://codereview.chromium.org/11606012/diff/8/webkit/gpu/webgraphicscontext... File webkit/gpu/webgraphicscontext3d_in_process_command_buffer_impl.cc (right): https://codereview.chromium.org/11606012/diff/8/webkit/gpu/webgraphicscontext... webkit/gpu/webgraphicscontext3d_in_process_command_buffer_impl.cc:1671: context_->set_context_lost_reason(context_lost_reason_); On 2012/12/18 05:40:11, jamesr wrote: > why do we need to store the context loss reason on both WGC3DIPCBI and > GLInProcessContext? can this just use context_->context_lost_reason() ? Done. https://codereview.chromium.org/11606012/diff/17001/cc/thread_proxy.cc File cc/thread_proxy.cc (right): https://codereview.chromium.org/11606012/diff/17001/cc/thread_proxy.cc#newcod... cc/thread_proxy.cc:313: Proxy::implThread()->postTask(base::Bind(&ThreadProxy::checkOutputSurfaceStatusOnImplThread, base::Unretained(this))); On 2012/12/18 21:24:01, jamesr wrote: > this is crashy - there's no promises that the ThreadProxy will still be alive > when the task runs. when posting main->impl we can use base::Unretained(this) > since shutdown blocks the main thread. > > I think we should keep a WeakPtr usable on the impl thread and have all tasks > posted to the impl thread use that as the target. Ok, great plan! Thanks. I'll get to this.. First I'm yak shaving layer_tree_host_unittests a bit so I can have a sane environment to write these layout-test-replacing-unit-tests.
I rebooted this CL now that all the context loss testing has has been moved to cc/. Now it just unifies the single vs multi thread context-loss behaviours. Along with some unit test improvements.
lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/danakj@chromium.org/11606012/30001
Step "update" is always a major failure. Look at the try server FAQ for more details. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=ios_rel_de...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/danakj@chromium.org/11606012/30001
Message was sent while issue was closed.
Change committed as 178239 |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
