|
|
Created:
10 years, 7 months ago by eroman Modified:
9 years, 6 months ago CC:
chromium-reviews, cbentzel+watch_chromium.org, darin-cc_chromium.org Base URL:
svn://chrome-svn/chrome/trunk/src/ Visibility:
Public. |
DescriptionAlways fallback to the next address when doing TCP connect (libevent impl).
Before it would only try the next address in the list, for specific OS errors.
Also did a slight refactoring to use a state machine for Connect().
BUG=44490
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=47728
Patch Set 1 #
Total comments: 20
Patch Set 2 : Address wtc and wilchan comments #Patch Set 3 : fix a typo #
Total comments: 20
Patch Set 4 : address wtc's comments #Patch Set 5 : address comments #
Total comments: 1
Patch Set 6 : sync #
Messages
Total messages: 12 (0 generated)
Please review this change carefully. I added two reviewers to be double sure this won't regress something. Still testing manually to be sure.
http://codereview.chromium.org/2132014/diff/1/3 File net/socket/tcp_client_socket_libevent.h (right): http://codereview.chromium.org/2132014/diff/1/3#newcode53 net/socket/tcp_client_socket_libevent.h:53: STATE_CREATE_SOCKET_COMPLETE, Since DoCreateSocket cannot possibly return ERR_IO_PENDING, we don't need the STATE_CREATE_SOCKET_COMPLETE state. Do we have a precedence of using a STATE_xxx_COMPLETE state for a non-blocking operation? If so, it's fine to do this for consistency. Otherwise, you can merge DoCreateSocket into DoConnectSocket. We can still have a DoCreateSocket function; I just don't think it needs to be a state.
http://codereview.chromium.org/2132014/diff/1/3 File net/socket/tcp_client_socket_libevent.h (right): http://codereview.chromium.org/2132014/diff/1/3#newcode53 net/socket/tcp_client_socket_libevent.h:53: STATE_CREATE_SOCKET_COMPLETE, On 2010/05/19 01:56:19, wtc wrote: > Since DoCreateSocket cannot possibly return ERR_IO_PENDING, > we don't need the STATE_CREATE_SOCKET_COMPLETE state. > > Do we have a precedence of using a STATE_xxx_COMPLETE state > for a non-blocking operation? If so, it's fine to do this > for consistency. Otherwise, you can merge DoCreateSocket > into DoConnectSocket. We can still have a DoCreateSocket > function; I just don't think it needs to be a state. You are correct that this does not *need* to be a state, since it can only complete synchronously. I do like having it at least as a separate function, since it simplifies the multiple "return" statements in the DoConnectSocket().
http://codereview.chromium.org/2132014/diff/1/2 File net/socket/tcp_client_socket_libevent.cc (left): http://codereview.chromium.org/2132014/diff/1/2#oldcode157 net/socket/tcp_client_socket_libevent.cc:157: // Synchronous operation not supported. Why'd you get rid of this comment? I think it's still valid. http://codereview.chromium.org/2132014/diff/1/2 File net/socket/tcp_client_socket_libevent.cc (right): http://codereview.chromium.org/2132014/diff/1/2#newcode149 net/socket/tcp_client_socket_libevent.cc:149: DCHECK(next_state_ != STATE_NONE); DCHECK_NE http://codereview.chromium.org/2132014/diff/1/2#newcode171 net/socket/tcp_client_socket_libevent.cc:171: NOTREACHED() << "bad state"; My preference is to LOG(DFATAL) here so we get ERROR messages in release builds. http://codereview.chromium.org/2132014/diff/1/2#newcode238 net/socket/tcp_client_socket_libevent.cc:238: if (result == OK) { No need for braces. http://codereview.chromium.org/2132014/diff/1/2#newcode274 net/socket/tcp_client_socket_libevent.cc:274: close(socket_); HANDLE_EINTR http://codereview.chromium.org/2132014/diff/1/2#newcode424 net/socket/tcp_client_socket_libevent.cc:424: // TODO(eroman): Is this check really necessary? I'm skeptical too, but whatever. http://codereview.chromium.org/2132014/diff/1/2#newcode430 net/socket/tcp_client_socket_libevent.cc:430: int rv = DoLoop(MapConnectError(os_error)); We should be recording these os_errors :( Maybe file a bug for this? I'm happy to take it on if you don't have time. http://codereview.chromium.org/2132014/diff/1/3 File net/socket/tcp_client_socket_libevent.h (right): http://codereview.chromium.org/2132014/diff/1/3#newcode53 net/socket/tcp_client_socket_libevent.h:53: STATE_CREATE_SOCKET_COMPLETE, On 2010/05/19 01:56:19, wtc wrote: > Since DoCreateSocket cannot possibly return ERR_IO_PENDING, > we don't need the STATE_CREATE_SOCKET_COMPLETE state. > > Do we have a precedence of using a STATE_xxx_COMPLETE state > for a non-blocking operation? If so, it's fine to do this > for consistency. Otherwise, you can merge DoCreateSocket > into DoConnectSocket. We can still have a DoCreateSocket > function; I just don't think it needs to be a state. We do indeed have STATE_xxx_COMPLETE states for non-blocking operations, but I don't think it's clear that it's necessary in this case. I have a mild preference for merging DoCreateSocket() and DoConnectSocket(). The only error handling that is different is the additional StopWatchingDescriptor() call. Since we can safely call it, I think it'd be simpler to just merge the functions (so my mind doesn't have to run through the state machine as much). http://codereview.chromium.org/2132014/diff/1/3#newcode157 net/socket/tcp_client_socket_libevent.h:157: State next_state_; I wonder if this would be cleaner to read if it were named next_connect_state_. I'm just imagining some who is unfamiliar with the code reading this for the first time. It's not clear that reading/writing wouldn't get states.
Thanks for the reviews! I have collapsed the connect and create states. http://codereview.chromium.org/2132014/diff/1/2 File net/socket/tcp_client_socket_libevent.cc (left): http://codereview.chromium.org/2132014/diff/1/2#oldcode157 net/socket/tcp_client_socket_libevent.cc:157: // Synchronous operation not supported. On 2010/05/19 02:07:38, willchan wrote: > Why'd you get rid of this comment? I think it's still valid. Done. http://codereview.chromium.org/2132014/diff/1/2 File net/socket/tcp_client_socket_libevent.cc (right): http://codereview.chromium.org/2132014/diff/1/2#newcode149 net/socket/tcp_client_socket_libevent.cc:149: DCHECK(next_state_ != STATE_NONE); On 2010/05/19 02:07:38, willchan wrote: > DCHECK_NE Done. (The evils of copy-paste). http://codereview.chromium.org/2132014/diff/1/2#newcode171 net/socket/tcp_client_socket_libevent.cc:171: NOTREACHED() << "bad state"; On 2010/05/19 02:07:38, willchan wrote: > My preference is to LOG(DFATAL) here so we get ERROR messages in release builds. Done. http://codereview.chromium.org/2132014/diff/1/2#newcode238 net/socket/tcp_client_socket_libevent.cc:238: if (result == OK) { On 2010/05/19 02:07:38, willchan wrote: > No need for braces. Done. http://codereview.chromium.org/2132014/diff/1/2#newcode274 net/socket/tcp_client_socket_libevent.cc:274: close(socket_); On 2010/05/19 02:07:38, willchan wrote: > HANDLE_EINTR Done. http://codereview.chromium.org/2132014/diff/1/2#newcode424 net/socket/tcp_client_socket_libevent.cc:424: // TODO(eroman): Is this check really necessary? On 2010/05/19 02:07:38, willchan wrote: > I'm skeptical too, but whatever. Yeah, after the refactor, I want to convert this to a CHECK for a dev release -- if that passes, remove this alltogether. http://codereview.chromium.org/2132014/diff/1/2#newcode430 net/socket/tcp_client_socket_libevent.cc:430: int rv = DoLoop(MapConnectError(os_error)); On 2010/05/19 02:07:38, willchan wrote: > We should be recording these os_errors :( Maybe file a bug for this? I'm happy > to take it on if you don't have time. I fully agree, and even filed the bug crbug.com/44488 to cover that. I want to keep the refactor separate from the logging change though, so I will revisit that after this is done. http://codereview.chromium.org/2132014/diff/1/3 File net/socket/tcp_client_socket_libevent.h (right): http://codereview.chromium.org/2132014/diff/1/3#newcode53 net/socket/tcp_client_socket_libevent.h:53: STATE_CREATE_SOCKET_COMPLETE, On 2010/05/19 02:07:38, willchan wrote: > On 2010/05/19 01:56:19, wtc wrote: > > Since DoCreateSocket cannot possibly return ERR_IO_PENDING, > > we don't need the STATE_CREATE_SOCKET_COMPLETE state. > > > > Do we have a precedence of using a STATE_xxx_COMPLETE state > > for a non-blocking operation? If so, it's fine to do this > > for consistency. Otherwise, you can merge DoCreateSocket > > into DoConnectSocket. We can still have a DoCreateSocket > > function; I just don't think it needs to be a state. > > We do indeed have STATE_xxx_COMPLETE states for non-blocking operations, but I > don't think it's clear that it's necessary in this case. I have a mild > preference for merging DoCreateSocket() and DoConnectSocket(). The only error > handling that is different is the additional StopWatchingDescriptor() call. > Since we can safely call it, I think it'd be simpler to just merge the functions > (so my mind doesn't have to run through the state machine as much). Done. http://codereview.chromium.org/2132014/diff/1/3#newcode157 net/socket/tcp_client_socket_libevent.h:157: State next_state_; On 2010/05/19 02:07:38, willchan wrote: > I wonder if this would be cleaner to read if it were named next_connect_state_. > I'm just imagining some who is unfamiliar with the code reading this for the > first time. It's not clear that reading/writing wouldn't get states. Done.
LGTM. eroman: I'm sorry I wrote my comments last night in a hurry so they're not clear. What I meant was that we don't need a _COMPLETE state for DoCreateSocket because DoCreateSocket never returns ERR_IO_PENDING. However, it should be OK to use a state for DoCreateSocket. Please consider resurrecting your DoCreateSocket function. I compared the original code with the new code carefully as you asked, to avoid introducing regressions. Please see my comments below. Nit: I don't really like the _ATTEMPT state names. I know why you named them that way though. I think it's fine to say CONNECT_STATE_CONNECT CONNECT_STATE_CONNECT_COMPLETE and DoConnect DoConnectComplete http://codereview.chromium.org/2132014/diff/13001/5002 File net/socket/tcp_client_socket_libevent.cc (right): http://codereview.chromium.org/2132014/diff/13001/5002#newcode92 net/socket/tcp_client_socket_libevent.cc:92: case 0: Remove this case because it is already in MapPosixError. http://codereview.chromium.org/2132014/diff/13001/5002#newcode132 net/socket/tcp_client_socket_libevent.cc:132: // We will try to connect to each address in address_. Start with the Typo: address_ => addresses_ http://codereview.chromium.org/2132014/diff/13001/5002#newcode182 net/socket/tcp_client_socket_libevent.cc:182: if (socket_ == kInvalidSocket || SetNonBlocking(socket_)) If SetNonBlocking(socket_) fails, it seems better to still close socket_ and set socket_ to kInvalidSocket as the original code does. Similarly for the error returns at lines 199 and 207 below. http://codereview.chromium.org/2132014/diff/13001/5002#newcode214 net/socket/tcp_client_socket_libevent.cc:214: write_socket_watcher_.StopWatchingFileDescriptor(); This is redundant with the write_socket_watcher_.StopWatchingFileDescriptor() call in the DoDisconnect() call at line 220 below. Please verify the following: 1. Redundant calls to StopWatchingFileDescriptor are OK. See if it's easy to avoid them. 2. It's OK to call StopWatchingFileDescriptor without a prior call to WatchFileDescriptor. http://codereview.chromium.org/2132014/diff/13001/5002#newcode223 net/socket/tcp_client_socket_libevent.cc:223: const addrinfo* next = current_ai_->ai_next; Nit: you don't need to use the local variable 'next' now. The original code needs to save current_ai_->ai_next in a variable because it calls Disconnect(), which moves current_ai_. Also, it's good to try the next address only for the errors that shows the current address is unreachable. I guess you removed ShouldTryNextAddress because it had almost all the error codes connect() may fail with. Nit: say "fall back" instead of "fall-back" or "fallback". http://codereview.chromium.org/2132014/diff/13001/5002#newcode235 net/socket/tcp_client_socket_libevent.cc:235: TRACE_EVENT_INSTANT("socket.disconnect", this, ""); Since you removed all the TRACE_EVENT macros for Connect, perhaps you should also remove this TRACE_EVENT macro for Disconnect. http://codereview.chromium.org/2132014/diff/13001/5002#newcode398 net/socket/tcp_client_socket_libevent.cc:398: // TODO(eroman): Is this check really necessary? You can remove this check. This requires mapping EALREADY (a bug in our code) to ERR_UNEXPECTED in MapConnectError. I don't know how to handle EINPROGRESS (a bug in libevent) here. Perhaps also map it to ERR_UNEXPECTED? http://codereview.chromium.org/2132014/diff/13001/5003 File net/socket/tcp_client_socket_libevent.h (right): http://codereview.chromium.org/2132014/diff/13001/5003#newcode115 net/socket/tcp_client_socket_libevent.h:115: return next_connect_state_ != CONNECT_STATE_NONE; This test is correct. But I believe this test can be made tighter as next_connect_state_ == CONNECT_STATE_ATTEMPT_COMPLETE It should be impossible for WriteWatcher to fire while the socket is in the CONNECT_STATE_ATTEMPT state, correct? I just wanted to verify my understanding of the code.
Sending some more comments now, before you rush off to implement what wtc said. RE: wtc's comments, I like not having the CreateSocket() state, since I prefer not to have states where you don't have a complete state for it too. I like that all the error handling is down in DoConnectComplete(). I agree with wtc's comments about the naming though. I prefer CONNECT_STATE_CONNECT to ATTEMPT. http://codereview.chromium.org/2132014/diff/13001/5002 File net/socket/tcp_client_socket_libevent.cc (left): http://codereview.chromium.org/2132014/diff/13001/5002#oldcode148 net/socket/tcp_client_socket_libevent.cc:148: DCHECK(!waiting_connect_); Please add back something like this. DCHECK_EQ(CONNECT_STATE_NONE, next_connect_state_); perhaps? http://codereview.chromium.org/2132014/diff/13001/5002#oldcode150 net/socket/tcp_client_socket_libevent.cc:150: TRACE_EVENT_BEGIN("socket.connect", this, ""); Is this gone? Frankly, we should just kill them all and replace with NetLog. I'm sure if you ask mbelshe/erikkay, they'd agree.
Thanks for the detailed comments wtc; all addressed. http://codereview.chromium.org/2132014/diff/13001/5002 File net/socket/tcp_client_socket_libevent.cc (right): http://codereview.chromium.org/2132014/diff/13001/5002#newcode92 net/socket/tcp_client_socket_libevent.cc:92: case 0: On 2010/05/19 18:51:17, wtc wrote: > Remove this case because it is already in MapPosixError. Done. http://codereview.chromium.org/2132014/diff/13001/5002#newcode132 net/socket/tcp_client_socket_libevent.cc:132: // We will try to connect to each address in address_. Start with the On 2010/05/19 18:51:17, wtc wrote: > Typo: address_ => addresses_ Done. http://codereview.chromium.org/2132014/diff/13001/5002#newcode182 net/socket/tcp_client_socket_libevent.cc:182: if (socket_ == kInvalidSocket || SetNonBlocking(socket_)) On 2010/05/19 18:51:17, wtc wrote: > If SetNonBlocking(socket_) fails, it seems better to still > close socket_ and set socket_ to kInvalidSocket as the > original code does. > > Similarly for the error returns at lines 199 and 207 below. Note that we still close the sockets: it happens in DoConnectAttemptComplete(), as part of the DoDisconnect() call. This was an intentional move, since I wanted to consolidate all the location that were calling close(). http://codereview.chromium.org/2132014/diff/13001/5002#newcode214 net/socket/tcp_client_socket_libevent.cc:214: write_socket_watcher_.StopWatchingFileDescriptor(); On 2010/05/19 18:51:17, wtc wrote: > This is redundant with the write_socket_watcher_.StopWatchingFileDescriptor() > call in the DoDisconnect() call at line 220 below. > > Please verify the following: > 1. Redundant calls to StopWatchingFileDescriptor are OK. > See if it's easy to avoid them. > 2. It's OK to call StopWatchingFileDescriptor without a > prior call to WatchFileDescriptor. This only redundant on errors. In the success case, we need to unregister the watcher that connect added. I could move this into the "if (result == OK)" block if you prefer? http://codereview.chromium.org/2132014/diff/13001/5002#newcode223 net/socket/tcp_client_socket_libevent.cc:223: const addrinfo* next = current_ai_->ai_next; On 2010/05/19 18:51:17, wtc wrote: > Nit: you don't need to use the local variable 'next' now. > The original code needs to save current_ai_->ai_next in a > variable because it calls Disconnect(), which moves > current_ai_. > > Also, it's good to try the next address only for the errors > that shows the current address is unreachable. I guess you > removed ShouldTryNextAddress because it had almost all the > error codes connect() may fail with. > > Nit: say "fall back" instead of "fall-back" or "fallback". I don't think there is a situation where we wouldn't want to try the next address in the list based on how an earlier one failed. There may well be some situation where that is necessary, but since I couldn't find any record of that, I would rather change the policy and discover if it has problems in the field. http://codereview.chromium.org/2132014/diff/13001/5002#newcode235 net/socket/tcp_client_socket_libevent.cc:235: TRACE_EVENT_INSTANT("socket.disconnect", this, ""); On 2010/05/19 18:51:17, wtc wrote: > Since you removed all the TRACE_EVENT macros for Connect, > perhaps you should also remove this TRACE_EVENT macro for > Disconnect. Done. I need to do another pass and remove all of these, since I don't think this macro is actually being used anymore. http://codereview.chromium.org/2132014/diff/13001/5002#newcode398 net/socket/tcp_client_socket_libevent.cc:398: // TODO(eroman): Is this check really necessary? On 2010/05/19 18:51:17, wtc wrote: > You can remove this check. > > This requires mapping EALREADY (a bug in our code) to > ERR_UNEXPECTED in MapConnectError. > > I don't know how to handle EINPROGRESS (a bug in > libevent) here. Perhaps also map it to ERR_UNEXPECTED? I will investigate this in a follow-up.
willchan: I agree that CreateSocket() doesn't need to be a state, but it can still be a function because it does several things (opens a socket, sets it in non-blocking mode, and sets the TCP_NODELAY socket option).
On 2010/05/19 19:17:43, wtc wrote: > willchan: I agree that CreateSocket() doesn't need to be > a state, but it can still be a function because it does > several things (opens a socket, sets it in non-blocking mode, > and sets the TCP_NODELAY socket option). Ah, I see, ok! I think the existing function is short enough (30-40 lines) that breaking out to another function isn't necessary, but if you find it more readable, then I won't object. Up to you guys.
* I renamed the state to CONNECT_STATE_CONNECT / DoConnect() * I put back the CreateSocket() function. http://codereview.chromium.org/2132014/diff/13001/5002 File net/socket/tcp_client_socket_libevent.cc (left): http://codereview.chromium.org/2132014/diff/13001/5002#oldcode148 net/socket/tcp_client_socket_libevent.cc:148: DCHECK(!waiting_connect_); On 2010/05/19 18:55:08, willchan wrote: > Please add back something like this. DCHECK_EQ(CONNECT_STATE_NONE, > next_connect_state_); perhaps? Added back: DCHECK(!waiting_connect()); (Not sure how this got lost). http://codereview.chromium.org/2132014/diff/13001/5002#oldcode150 net/socket/tcp_client_socket_libevent.cc:150: TRACE_EVENT_BEGIN("socket.connect", this, ""); On 2010/05/19 18:55:08, willchan wrote: > Is this gone? Frankly, we should just kill them all and replace with NetLog. > I'm sure if you ask mbelshe/erikkay, they'd agree. Will do. http://codereview.chromium.org/2132014/diff/13001/5002 File net/socket/tcp_client_socket_libevent.cc (right): http://codereview.chromium.org/2132014/diff/13001/5002#newcode223 net/socket/tcp_client_socket_libevent.cc:223: const addrinfo* next = current_ai_->ai_next; On 2010/05/19 18:51:17, wtc wrote: > Nit: you don't need to use the local variable 'next' now. > The original code needs to save current_ai_->ai_next in a > variable because it calls Disconnect(), which moves > current_ai_. Done. > > Also, it's good to try the next address only for the errors > that shows the current address is unreachable. I guess you > removed ShouldTryNextAddress because it had almost all the > error codes connect() may fail with. > > Nit: say "fall back" instead of "fall-back" or "fallback". Done.
LGTM http://codereview.chromium.org/2132014/diff/7002/18002 File net/socket/tcp_client_socket_libevent.cc (right): http://codereview.chromium.org/2132014/diff/7002/18002#newcode128 net/socket/tcp_client_socket_libevent.cc:128: DCHECK(!waiting_connect()); We need logging.h for DCHECK |