|
|
Chromium Code Reviews|
Created:
8 years, 12 months ago by Josh Horwich Modified:
8 years, 11 months ago CC:
chromium-reviews, darin-cc_chromium.org, jam Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionReimplement ReceivedSyncMsgQueue::DispatchMessages
Implementation of IPC::SyncChannel::ReceivedSyncMsgQueue::DispatchMessages that
does not hold any messages in a local stack-frame's delayed_queue, which was
causing me to see an inbound sync message from a plugin not dispatched while
the renderer was waiting for replies from the plugin. This was causing the
plugin and renderer to deadlock waiting for each other.
BUG=108491
TEST=Run Pepperized O3D and observe for tab hangs
TEST=Run ipc_tests unittests
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=117309
Patch Set 1 #Patch Set 2 : Try to avoid rescanning entire queue. Added unittest #
Total comments: 8
Patch Set 3 : Better detection of modified message queues #Patch Set 4 : Add descriptive comment to unittest #
Total comments: 1
Messages
Total messages: 14 (0 generated)
Please take a look
It looks correct to me. Adding John, who is an OWNER and may have some opinion. I have 2 questions: - do you think it would be possible to add a test for the case that was causing a deadlock? - when doing restricted dispatch, if there are a lot of messages but only a few that are on the right channel, the code behaves in O(n^2) where n is the number of messages in the queue, because after every dispatched message we always restart from the start of the queue. Do you think it would be possible to keep track of the position in the queue across iterations of the loop? At the same time, I guess conditions that would previously trigger the deadlock might invalidate the iterator, mmh... thoughts?
rubberstamp lgtm, i defer to piman since he's more familiar with the restrict_dispatch mode. looking at the code, i shared the same concern which is that each time we have to loop through all the messages. I don't know how often this happens when you're using the channel in this mode though.
On 2012/01/03 19:15:06, piman wrote: > It looks correct to me. Adding John, who is an OWNER and may have some opinion. > I have 2 questions: > - do you think it would be possible to add a test for the case that was causing > a deadlock? Test added to ipc_sync_channel_unittest.cc. I was able to observe the deadlock, including gdb backtraces consistent with my real-world failure - with the old implementation, and see the test succeed with my new implementations (both patch set 1 and the newest in patch set 2) > - when doing restricted dispatch, if there are a lot of messages but only a few > that are on the right channel, the code behaves in O(n^2) where n is the number > of messages in the queue, because after every dispatched message we always > restart from the start of the queue. Do you think it would be possible to keep > track of the position in the queue across iterations of the loop? At the same > time, I guess conditions that would previously trigger the deadlock might > invalidate the iterator, mmh... thoughts? Patch set 2 also has a way to try to avoid a full rescan of the queue if the queue hasn't been modified between loops in DispatchMessages - and if the queue has been modified I reset the iterator to the front. Let me know if you think this is better, or if the added complexity (fragility if someone else adds code to modify the queue but doesn't set message_queue_modified_) isn't worth it...
Please take a look - updated to include a unittest, also a candidate implementation on trying to (safely) avoid re-scanning the queue from the beginning each iteration.
http://codereview.chromium.org/9022038/diff/5001/ipc/ipc_sync_channel.cc File ipc/ipc_sync_channel.cc (right): http://codereview.chromium.org/9022038/diff/5001/ipc/ipc_sync_channel.cc#newc... ipc/ipc_sync_channel.cc:103: message_queue_modified_ = false; In case of nested dispatches, I'm not sure we can safely reset message_queue_modified_, some dispatches higher in the stack may need to know as well. Possibly you need to keep that it has been modified in any iteration and then reset message_queue_modified_ back to true at the end of this function if it has. http://codereview.chromium.org/9022038/diff/5001/ipc/ipc_sync_channel_unittes... File ipc/ipc_sync_channel_unittest.cc (right): http://codereview.chromium.org/9022038/diff/5001/ipc/ipc_sync_channel_unittes... ipc/ipc_sync_channel_unittest.cc:1363: void OnDoServerTask() { I was a bit confused about the synchronization. It's worth a comment above that the synchronization puts the test in a state where at some point: - client1 is in Run() about to send an unblocking NoArgs message to server1 - client2 is in OnDoClient2Task() about to send an unblocking NoArgs message to server2 (which is blocked by server1's processing). - server1 is in OnDoServerTask() about to send an non-unblocking NoArgs message to client1. If I understand correctly the resolution should be: - server1 handles the NoArgs from client1, gets server2 to send a non-unblockign NoArgs message to client2. - re-entering from that, server2 handles the NoArgs from client2, does nothing, replies, and returns to server1 - client2 gets the reply, goes back to the main loop - client2 then gets the NoArgs message from server2 and tells it to terminate (and terminates itself). - meanwhile, the send from server2 returned, server1 replies to client1 and return from the main loop - client1 gets the reply, goes back to the main loop - client1 then gets the NoArgs message from server1, and tells it to terminate (and terminates itself). The issue was that when server2 tries its send, client2's message might have been put into the delayed_queue and not inspected for re-entrancy, but your patch fixes that. This seems reasonable to me... If you can find a way to sum it up in a comment, that would be useful (I had to go over it in depth to understand, and my brain hurts). http://codereview.chromium.org/9022038/diff/5001/ipc/ipc_sync_channel_unittes... ipc/ipc_sync_channel_unittest.cc:1410: WaitableEvent* server_ready_event, nit: indentation http://codereview.chromium.org/9022038/diff/5001/ipc/ipc_sync_channel_unittes... ipc/ipc_sync_channel_unittest.cc:1412: : Worker("channel2", Channel::MODE_CLIENT), nit: indentation (the : should be an extra 4 space indent, or 6 total).
Please take a look. Comment in the unittest expanded. Implementation should account for changes to queue from within nested dispatches. http://codereview.chromium.org/9022038/diff/5001/ipc/ipc_sync_channel.cc File ipc/ipc_sync_channel.cc (right): http://codereview.chromium.org/9022038/diff/5001/ipc/ipc_sync_channel.cc#newc... ipc/ipc_sync_channel.cc:103: message_queue_modified_ = false; On 2012/01/10 04:33:17, piman wrote: > In case of nested dispatches, I'm not sure we can safely reset > message_queue_modified_, some dispatches higher in the stack may need to know as > well. > Possibly you need to keep that it has been modified in any iteration and then > reset message_queue_modified_ back to true at the end of this function if it > has. Done. But in a different manner - now I maintain a queue 'version' each time the queue gets changed (insert or delete) and compare that in DispatchMessages so it knows if either another thread adds to the queue, or if a nested dispatch removes, etc. However, I could be swayed to restore this to patchset 1's simpler "always scan the full list" if we don't want this complexity/fragility. http://codereview.chromium.org/9022038/diff/5001/ipc/ipc_sync_channel_unittes... File ipc/ipc_sync_channel_unittest.cc (right): http://codereview.chromium.org/9022038/diff/5001/ipc/ipc_sync_channel_unittes... ipc/ipc_sync_channel_unittest.cc:1363: void OnDoServerTask() { On 2012/01/10 04:33:17, piman wrote: > I was a bit confused about the synchronization. > It's worth a comment above that the synchronization puts the test in a state > where at some point: > - client1 is in Run() about to send an unblocking NoArgs message to server1 > - client2 is in OnDoClient2Task() about to send an unblocking NoArgs message to > server2 (which is blocked by server1's processing). > - server1 is in OnDoServerTask() about to send an non-unblocking NoArgs message > to client1. > > > If I understand correctly the resolution should be: > - server1 handles the NoArgs from client1, gets server2 to send a non-unblockign > NoArgs message to client2. > - re-entering from that, server2 handles the NoArgs from client2, does nothing, > replies, and returns to server1 > - client2 gets the reply, goes back to the main loop > - client2 then gets the NoArgs message from server2 and tells it to terminate > (and terminates itself). > - meanwhile, the send from server2 returned, server1 replies to client1 and > return from the main loop > - client1 gets the reply, goes back to the main loop > - client1 then gets the NoArgs message from server1, and tells it to terminate > (and terminates itself). > > > The issue was that when server2 tries its send, client2's message might have > been put into the delayed_queue and not inspected for re-entrancy, but your > patch fixes that. > > This seems reasonable to me... If you can find a way to sum it up in a comment, > that would be useful (I had to go over it in depth to understand, and my brain > hurts). Done. Sorry for the brain pain, Your understanding is correct. I went through it again and described it in painful detail now. http://codereview.chromium.org/9022038/diff/5001/ipc/ipc_sync_channel_unittes... ipc/ipc_sync_channel_unittest.cc:1410: WaitableEvent* server_ready_event, On 2012/01/10 04:33:17, piman wrote: > nit: indentation Done. http://codereview.chromium.org/9022038/diff/5001/ipc/ipc_sync_channel_unittes... ipc/ipc_sync_channel_unittest.cc:1412: : Worker("channel2", Channel::MODE_CLIENT), On 2012/01/10 04:33:17, piman wrote: > nit: indentation (the : should be an extra 4 space indent, or 6 total). Done.
LGTM+nit, thanks. http://codereview.chromium.org/9022038/diff/14001/ipc/ipc_sync_channel_unitte... File ipc/ipc_sync_channel_unittest.cc (right): http://codereview.chromium.org/9022038/diff/14001/ipc/ipc_sync_channel_unitte... ipc/ipc_sync_channel_unittest.cc:1345: // delayed_queue. Specifically, we start with client1 about so send an typo: "about so send" -> "about to send"
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jhorwich@chromium.org/9022038/14001
Try job failure for 9022038-14001 (retry) on linux_rel for step "browser_tests". It's a second try, previously, step "browser_tests" failed. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=linux_rel&...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jhorwich@chromium.org/9022038/14001
Try job failure for 9022038-14001 (retry) on win_rel for step "update". It's a second try, previously, step "base_unittests" failed. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win_rel&nu... Step "update" is always a major failure. Look at the try server FAQ for more details.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jhorwich@chromium.org/9022038/14001
Change committed as 117309 |
