|
|
DescriptionFix for HTTP2 request hanging bug.
If, when a socket request completed asynchronously, the next
socket request in the queue failed synchronously,
ClientSocketPoolBase wouldn't continue to try to service other
socket requests in the queue.
As a result, the socket pools could end up with pending connect
requests and free socket slots, but no ConnectJobs would be made
to service them, if this happened 6 times in a row to a socket
group.
This was a relatively obscure issue, until H2 started depending
on this path for sharing sessions when different domains map to
the same IP.
BUG=723748
Review-Url: https://codereview.chromium.org/2888623011
Cr-Commit-Position: refs/heads/master@{#472952}
Committed: https://chromium.googlesource.com/chromium/src/+/9d72fe409a214856ec4d9700105b4664bb4c0861
Patch Set 1 #
Total comments: 7
Messages
Total messages: 16 (9 generated)
The CQ bit was checked by mmenke@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== Fix for HTTP2 request hanging bug. If, when a socket request completed asynchronously, the next socket request in the queue failed synchronously, we wouldn't continue to try to service other socket requests in the queue. As a result, the socket pools could end up with pending connect requests and free socket slots, but no ConnectJobs would be made to service them, if this happened 6 times in a row to a socket group. This was a relatively obscure issue, until H2 started depending on this path for sharing sessions when different domains map to the same IP. BUG=723748 ========== to ========== Fix for HTTP2 request hanging bug. If, when a socket request completed asynchronously, the next socket request in the queue failed synchronously, ClientSocketPoolBase wouldn't continue to try to service other socket requests in the queue. As a result, the socket pools could end up with pending connect requests and free socket slots, but no ConnectJobs would be made to service them, if this happened 6 times in a row to a socket group. This was a relatively obscure issue, until H2 started depending on this path for sharing sessions when different domains map to the same IP. BUG=723748 ==========
mmenke@chromium.org changed reviewers: + davidben@chromium.org
[davidben]: PTAL. Since things are notified of request completion asynchronously, having a loop here shouldn't cause any reentrancy issue. [bnc]: FYI. https://codereview.chromium.org/2888623011/diff/1/net/socket/client_socket_po... File net/socket/client_socket_pool_base.cc (right): https://codereview.chromium.org/2888623011/diff/1/net/socket/client_socket_po... net/socket/client_socket_pool_base.cc:1082: if (rv != ERR_IO_PENDING) { Could add a loop here, too, but: 1) group may be deleted if it becomes empty. 2) Every call to OnAvailableSocketSlot is followed by a call to CheckForStalledSocketGroups() (Except the call in CheckForStalledSocketGroups itself). Since user callbacks aren't invoked synchronously here, 1) isn't a blocker, it just makes things a bit messy. But I'm also not sure if there's any benefit to doing looping here instead/additionally, other than perhaps a marginal performance benefit, if we don't look through all groups for one in need of another socket request quite as often.
Okay, I think I understand what's going on? Two questions. https://codereview.chromium.org/2888623011/diff/1/net/socket/client_socket_po... File net/socket/client_socket_pool_base.cc (left): https://codereview.chromium.org/2888623011/diff/1/net/socket/client_socket_po... net/socket/client_socket_pool_base.cc:934: // the looping we leave it at this. Previously we tried to avoid infinite looping here. Should that be a concern? I guess the assumption is that, since we invoke all user callbacks here on a PostTask, we can only drain through our existing set of requests and not grow new ones in processing? https://codereview.chromium.org/2888623011/diff/1/net/socket/client_socket_po... File net/socket/client_socket_pool_base.cc (right): https://codereview.chromium.org/2888623011/diff/1/net/socket/client_socket_po... net/socket/client_socket_pool_base.cc:916: break; What's causing the socket pools to react to the free socket slot in this code?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
https://codereview.chromium.org/2888623011/diff/1/net/socket/client_socket_po... File net/socket/client_socket_pool_base.cc (left): https://codereview.chromium.org/2888623011/diff/1/net/socket/client_socket_po... net/socket/client_socket_pool_base.cc:934: // the looping we leave it at this. On 2017/05/18 20:22:51, davidben wrote: > Previously we tried to avoid infinite looping here. Should that be a concern? I > guess the assumption is that, since we invoke all user callbacks here on a > PostTask, we can only drain through our existing set of requests and not grow > new ones in processing? Correct. I can't see how we'd get into an infinite loop, or have re-entrancy issues here because of the PostTask. Also note that we're already potentially creating two ConnectJobs in a row, since we always call this method after another OnAvailableSocketSlot call. I'm also not seeing how we'd end up looping more than once, in general, other than on sync failure/success. https://codereview.chromium.org/2888623011/diff/1/net/socket/client_socket_po... File net/socket/client_socket_pool_base.cc (right): https://codereview.chromium.org/2888623011/diff/1/net/socket/client_socket_po... net/socket/client_socket_pool_base.cc:916: break; On 2017/05/18 20:22:51, davidben wrote: > What's causing the socket pools to react to the free socket slot in this code? Destroying the socket will do that. So suppose this is the SSL pool, and the client socket pool is full. SSLClientSockets own a ClientSocketHandle, which owns a TCPClientSocket. So deleting the idle SSLClientSocket will delete the ClientSocketHandle, which will free up a slot in the lower layer pool. Which will then call its own ProcessPendingRequest and CheckForStalledSocketGroups functions. Hrm...there's another question: Should we also do a loop for checking for stalled lower layer pools? Think it's beyond the scope of this CL, but worth figuring out.
lgtm https://codereview.chromium.org/2888623011/diff/1/net/socket/client_socket_po... File net/socket/client_socket_pool_base.cc (right): https://codereview.chromium.org/2888623011/diff/1/net/socket/client_socket_po... net/socket/client_socket_pool_base.cc:916: break; On 2017/05/18 20:44:57, mmenke wrote: > On 2017/05/18 20:22:51, davidben wrote: > > What's causing the socket pools to react to the free socket slot in this code? > > Destroying the socket will do that. > > So suppose this is the SSL pool, and the client socket pool is full. > SSLClientSockets own a ClientSocketHandle, which owns a TCPClientSocket. So > deleting the idle SSLClientSocket will delete the ClientSocketHandle, which will > free up a slot in the lower layer pool. Which will then call its own > ProcessPendingRequest and CheckForStalledSocketGroups functions. > > Hrm...there's another question: Should we also do a loop for checking for > stalled lower layer pools? Think it's beyond the scope of this CL, but worth > figuring out. I guess the vague analogy to our sync-return || ERR_IO_PENDING calling convention would be that all of these functions should return a signal whether they synchronously did something and expect you to loop or whether they'll handle all their obligations asynchornously.
https://codereview.chromium.org/2888623011/diff/1/net/socket/client_socket_po... File net/socket/client_socket_pool_base.cc (right): https://codereview.chromium.org/2888623011/diff/1/net/socket/client_socket_po... net/socket/client_socket_pool_base.cc:916: break; On 2017/05/18 20:52:31, davidben wrote: > On 2017/05/18 20:44:57, mmenke wrote: > > On 2017/05/18 20:22:51, davidben wrote: > > > What's causing the socket pools to react to the free socket slot in this > code? > > > > Destroying the socket will do that. > > > > So suppose this is the SSL pool, and the client socket pool is full. > > SSLClientSockets own a ClientSocketHandle, which owns a TCPClientSocket. So > > deleting the idle SSLClientSocket will delete the ClientSocketHandle, which > will > > free up a slot in the lower layer pool. Which will then call its own > > ProcessPendingRequest and CheckForStalledSocketGroups functions. > > > > Hrm...there's another question: Should we also do a loop for checking for > > stalled lower layer pools? Think it's beyond the scope of this CL, but worth > > figuring out. > > I guess the vague analogy to our sync-return || ERR_IO_PENDING calling > convention would be that all of these functions should return a signal whether > they synchronously did something and expect you to loop or whether they'll > handle all their obligations asynchornously. Ideally, we really should only have at most one (new) idle socket when there's a higher layer stalled group, so only one call is needed. I'm skeptical that's currently the case, though, so yea, that's one approach. We could also just look at idle_socket_count(), like we do below, and then do: if (idle_socket_count() == 0) return; for (std::set<LowerLayeredPool*>::iterator it = lower_pools_.begin(); it != lower_pools_.end(); ++it) { while ((*it)->IsStalled()) { CloseOneIdleSocket(); if (idle_socket_count() == 0) return; } }
The CQ bit was checked by mmenke@chromium.org
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 1, "attempt_start_ts": 1495141384588740, "parent_rev": "40ab4cb23bce1e6502719e82c3b058021cb149dc", "commit_rev": "9d72fe409a214856ec4d9700105b4664bb4c0861"}
Message was sent while issue was closed.
Description was changed from ========== Fix for HTTP2 request hanging bug. If, when a socket request completed asynchronously, the next socket request in the queue failed synchronously, ClientSocketPoolBase wouldn't continue to try to service other socket requests in the queue. As a result, the socket pools could end up with pending connect requests and free socket slots, but no ConnectJobs would be made to service them, if this happened 6 times in a row to a socket group. This was a relatively obscure issue, until H2 started depending on this path for sharing sessions when different domains map to the same IP. BUG=723748 ========== to ========== Fix for HTTP2 request hanging bug. If, when a socket request completed asynchronously, the next socket request in the queue failed synchronously, ClientSocketPoolBase wouldn't continue to try to service other socket requests in the queue. As a result, the socket pools could end up with pending connect requests and free socket slots, but no ConnectJobs would be made to service them, if this happened 6 times in a row to a socket group. This was a relatively obscure issue, until H2 started depending on this path for sharing sessions when different domains map to the same IP. BUG=723748 Review-Url: https://codereview.chromium.org/2888623011 Cr-Commit-Position: refs/heads/master@{#472952} Committed: https://chromium.googlesource.com/chromium/src/+/9d72fe409a214856ec4d9700105b... ==========
Message was sent while issue was closed.
Committed patchset #1 (id:1) as https://chromium.googlesource.com/chromium/src/+/9d72fe409a214856ec4d9700105b... |