|
|
Created:
7 years, 9 months ago by wtc Modified:
7 years, 9 months ago CC:
chromium-reviews, cbentzel+watch_chromium.org, darin-cc_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src/ Visibility:
Public. |
Description[Abandoned. Please see https://codereview.chromium.org/12468002/
instead.]
Retrieve more accurate error code (WSAECONNRESET vs.
WSAECONNABORTED) than what WSAEnumNetworkEvents reported
in network_events.iErrorCode[FD_CLOSE_BIT} by calling
recv() on the socket.
Eliminate (or at least greatly reduce) spurious wakeups
by reseting the event object when recv() returns a
positive byte count.
R=pmeenan@chromium.org,rvargas@chromium.org
BUG=180313
TEST=none
Patch Set 1 #
Total comments: 16
Messages
Total messages: 4 (0 generated)
Here is an alternative patch for the WSAECONNABORTED vs. WSAECONNRESET problem when non-blocking reads are used. It is a drastic change to how we handle FD_CLOSE. (No change to how we handle FD_READ.) The CL contains an independent change to eliminate/reduce the spurious wakeups. Please express your opinions of these two changes separately. Thanks! https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... File net/socket/tcp_client_socket_win.cc (left): https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:1037: CreateNetLogSocketErrorCallback(rv, os_error)); If you prefer a smaller change, I can call DoRead() only inside this if block. I would replace lines 1035-1037 with rv = DoRead(core_->read_iobuffer_, core_->read_buffer_length_, read_callback_); DCHECK(rv < 0 && rv != ERR_IO_PENDING); https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... File net/socket/tcp_client_socket_win.cc (right): https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:269: disable_overlapped_reads_(true), Please ignore this change. I need this temporary change to force non-blocking reads in my debugging. https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:1030: AssertEventNotSignaled(core_->read_overlapped_.hEvent); This asserts that WSAEnumNetworkEvents resets the event object (on success), as documented. https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:1037: read_callback_); I got the inspiration of this change from the old-fashioned way of using select(), or the modern way of using libevent. In those models, an error condition also causes the socket to be reported as readable, and we need to call recv() on the socket and let it fail to find out what the error is. I tried this and indeed this retrieves the more accurate WSAECONNRESET error after the recv() call failed. I also saw recv() returns 0 on EOF. https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:1042: // signaled but the network events flags are all clear (0). The WSAResetEvent call I added on line 864 eliminates this spurious wakeup in my debugging. I also verified the spurious wakeup was not caused by the recv(MSG_PEEK) call made in IsConnected() and IsConnectedAndIdle().
https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... File net/socket/tcp_client_socket_win.cc (right): https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:864: BOOL ok = WSAResetEvent(core_->read_overlapped_.hEvent); I think this can cause unwanted hangs (unlikely, but possible). There can be more bytes available right after we come back from recv, and clearing the event would prevent us from knowing about them, unless new bytes come from the net. We could manually reset the event right before calling recv, but that would be reducing the size of the race window, not eliminating it. https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:1030: AssertEventNotSignaled(core_->read_overlapped_.hEvent); On 2013/03/06 17:33:43, wtc wrote: > > This asserts that WSAEnumNetworkEvents resets the event object > (on success), as documented. Sounds like a race to me. https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:1032: NOTREACHED(); I'm assuming this is for development purposes... I don't think we should leave this checked in. https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:1042: // signaled but the network events flags are all clear (0). On 2013/03/06 17:33:43, wtc wrote: > > The WSAResetEvent call I added on line 864 eliminates this > spurious wakeup in my debugging. > > I also verified the spurious wakeup was not caused by the > recv(MSG_PEEK) call made in IsConnected() and IsConnectedAndIdle(). I'm thinking that we either accept a spurious wake up, or we pay th eprice of an extra recv that should return would_block. https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:1043: NOTREACHED(); same thing here. If this triggers for a developer actively working on this code, everything is fine. If this triggers for a random developer things are not good, and this doesn't really check a code invariant (it depends on OS behavior), so I expect it to hit at some point.
https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... File net/socket/tcp_client_socket_win.cc (right): https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:864: BOOL ok = WSAResetEvent(core_->read_overlapped_.hEvent); On 2013/03/06 23:08:49, rvargas wrote: > I think this can cause unwanted hangs (unlikely, but possible). There can be > more bytes available right after we come back from recv, and clearing the event > would prevent us from knowing about them, unless new bytes come from the net. > > We could manually reset the event right before calling recv, but that would be > reducing the size of the race window, not eliminating it. What are you trying to solve by manually resetting the event here? It gets reset automatically in WSAEnumNetworkEvents which should be atomic and eliminate any race conditions, keeping the state management all within winsock. From: http://msdn.microsoft.com/en-us/library/windows/desktop/ms741690(v=vs.85).aspx "The proper way to reset the state of an event object used with the WSAEventSelect function is to pass the handle of the event object to the WSAEnumNetworkEvents function in the hEventObject parameter. This will reset the event object and adjust the status of active FD events on the socket in an atomic fashion." https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:1030: AssertEventNotSignaled(core_->read_overlapped_.hEvent); On 2013/03/06 23:08:49, rvargas wrote: > On 2013/03/06 17:33:43, wtc wrote: > > > > This asserts that WSAEnumNetworkEvents resets the event object > > (on success), as documented. > > Sounds like a race to me. Agreed, it's not atomic so it's theoretically possible for data to arrive between the WSAEnumNetworkEvents call and the Assert. Not likely but still possible. https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:1037: read_callback_); On 2013/03/06 17:33:43, wtc wrote: > > I got the inspiration of this change from the old-fashioned > way of using select(), or the modern way of using libevent. > In those models, an error condition also causes the socket > to be reported as readable, and we need to call recv() on > the socket and let it fail to find out what the error is. > > I tried this and indeed this retrieves the more accurate > WSAECONNRESET error after the recv() call failed. > > I also saw recv() returns 0 on EOF. Is the last error from the recv() here as granular as the error code in the FD_CLOSE_BIT in the case of a close? Generally I like this because it does handle the case of incoming data in a close event which was a bug in my implementation but I'm just wondering if you also need to keep the FD_CLOSE error code in the case that the recv fails or if the error in the recv is good enough. https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:1042: // signaled but the network events flags are all clear (0). On 2013/03/06 23:08:49, rvargas wrote: > On 2013/03/06 17:33:43, wtc wrote: > > > > The WSAResetEvent call I added on line 864 eliminates this > > spurious wakeup in my debugging. > > > > I also verified the spurious wakeup was not caused by the > > recv(MSG_PEEK) call made in IsConnected() and IsConnectedAndIdle(). > > I'm thinking that we either accept a spurious wake up, or we pay th eprice of an > extra recv that should return would_block. How often do the Spurious wake-ups happen from IsConnected() and IsConnectedAndIdle()? It's light enough in here that I'm much more comfortable just letting them happen than trying to manually toggle the event. I'd just change the comment since you know what causes them to happen.
Thank you both for your comments. I will abandon this CL and move this approach to the original CL. https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... File net/socket/tcp_client_socket_win.cc (right): https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:1037: read_callback_); On 2013/03/07 15:11:04, pmeenan1 wrote: > > Is the last error from the recv() here as granular as the error code in the > FD_CLOSE_BIT in the case of a close? Yes, in fact it is better. When the peer resets the connection, the error code in the FD_CLOSE_BIT is WSAECONNABORTED, but the last error from recv() is WSAECONNRESET, which is more accurate. https://codereview.chromium.org/12546004/diff/1/net/socket/tcp_client_socket_... net/socket/tcp_client_socket_win.cc:1042: // signaled but the network events flags are all clear (0). On 2013/03/07 15:11:04, pmeenan1 wrote: > > How often do the Spurious wake-ups happen from IsConnected() and > IsConnectedAndIdle()? I confirmed that the spurious wake-ups are NOT caused by IsConnected() and IsConnectedAndIdle(). Under the current code, spurious wake-ups are to be expected. > It's light enough in here that I'm much more comfortable > just letting them happen than trying to manually toggle the event. I'd just > change the comment since you know what causes them to happen. Done. |