|
|
Created:
9 years, 2 months ago by ramant (doing other things) Modified:
9 years, 2 months ago CC:
chromium-reviews, cbentzel+watch_chromium.org, eroman, darin-cc_chromium.org, mmenke Base URL:
svn://chrome-svn/chrome/trunk/src/ Visibility:
Public. |
DescriptionSend PING to check the status of the SPDY connection.
Make SpdySessions more resilient to connections dying.
The following comment describes the reasoning behind the preface and trailing PINGs (design evolved during discussions with/between jar, roberto and willchan).
http://code.google.com/p/chromium/issues/detail?id=89725#c6
connection_at_risk_of_loss_ms_ is a knob which could be used to control the sending of preface-PING. It is set to 0ms so that a PING is sent whenever we send a SYN_STREAM.
BUG=89725, 34752
R=willchan,jar
TEST=network unit tests
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=105723
Patch Set 1 #
Total comments: 8
Patch Set 2 : '' #Patch Set 3 : '' #Patch Set 4 : '' #Patch Set 5 : '' #
Total comments: 14
Patch Set 6 : '' #Patch Set 7 : '' #Patch Set 8 : '' #
Total comments: 12
Patch Set 9 : '' #Patch Set 10 : '' #Patch Set 11 : '' #Patch Set 12 : '' #Patch Set 13 : '' #Patch Set 14 : '' #
Total comments: 56
Patch Set 15 : '' #Patch Set 16 : '' #Patch Set 17 : '' #
Total comments: 24
Patch Set 18 : '' #Patch Set 19 : '' #Patch Set 20 : '' #Patch Set 21 : '' #Patch Set 22 : '' #
Messages
Total messages: 19 (0 generated)
http://codereview.chromium.org/8230037/diff/1/net/base/net_log_event_type_list.h File net/base/net_log_event_type_list.h (right): http://codereview.chromium.org/8230037/diff/1/net/base/net_log_event_type_lis... net/base/net_log_event_type_list.h:857: // "stream_id": <The stream ID for the window update>, There shouldn't be a stream_id. http://codereview.chromium.org/8230037/diff/1/net/spdy/spdy_http_stream.cc File net/spdy/spdy_http_stream.cc (right): http://codereview.chromium.org/8230037/diff/1/net/spdy/spdy_http_stream.cc#ne... net/spdy/spdy_http_stream.cc:286: DoCallback(ERR_CONNECTION_ABORTED); Need a different error code. http://codereview.chromium.org/8230037/diff/1/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/1/net/spdy/spdy_session.cc#newcod... net/spdy/spdy_session.cc:506: make_scoped_refptr(new NetLogSpdyPingParameter(stream_id, unique_id))); There's no need to log the stream_id, just the unique_id. The unique_id should probably be the stream_id. http://codereview.chromium.org/8230037/diff/1/net/spdy/spdy_session.h File net/spdy/spdy_session.h (right): http://codereview.chromium.org/8230037/diff/1/net/spdy/spdy_session.h#newcode204 net/spdy/spdy_session.h:204: if (sent_ping_) This isn't good enough, you're going to need to track multiple pings in flight.
http://codereview.chromium.org/8230037/diff/10006/net/base/net_log_event_type... File net/base/net_log_event_type_list.h (right): http://codereview.chromium.org/8230037/diff/10006/net/base/net_log_event_type... net/base/net_log_event_type_list.h:857: // "unique_id": <The unique_id of the PING message>, You don't need the underscore in the description of the key. http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:495: int result = SendPing(stream_id); You should queue this _after_ the SYN_STREAM. We don't want to slow down the SYN_STREAM by the serialization latency of the PING. And is there a reason you aren't doing this in SpdyHttpStream like you did previously? http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:663: void SpdySession::OnWriteCompleteInternal(int result, bool ping_frame) { You shouldn't need this extra bool parameter. http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:780: bool ping_frame = false; You shouldn't need this. http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1500: const base::TimeDelta kInterval = base::TimeDelta::FromMilliseconds(10000); Wow, this logic is not what we had discussed in the meeting. Why is there all this pre-PING and post-PING stuff? Is this a new change after Jim and Roberto talked? Can someone tell me what's going on? http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1555: QueueFrame(ping_frame.get(), SPDY_PRIORITY_HIGHEST, stream); Here's the problem. Pass a NULL stream to QueueFrame(). A PING frame shouldn't be tied to a stream.
http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1500: const base::TimeDelta kInterval = base::TimeDelta::FromMilliseconds(10000); On 2011/10/13 06:38:39, willchan wrote: > Wow, this logic is not what we had discussed in the meeting. Why is there all > this pre-PING and post-PING stuff? Is this a new change after Jim and Roberto > talked? Can someone tell me what's going on? From Raman: """ pre-PING and post-PING came after Jim and Roberto talked. Before we send the request, send a pre-PING if there are no ping's in flight. Send a post-PING at the end of the request (if we didn't post a post-PING already). Send post-PING's once in ?? (5) secs. """ Raman, just reply here directly so I don't have to copy / paste it in.
I'll be out until after noon at training. Thanks for working on this! I'm worried that we might need to do validation of the PING unique_ids received from the server. http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:495: int result = SendPing(stream_id); On 2011/10/13 06:38:39, willchan wrote: > You should queue this _after_ the SYN_STREAM. We don't want to slow down the > SYN_STREAM by the serialization latency of the PING. You didn't address this comment. > > And is there a reason you aren't doing this in SpdyHttpStream like you did > previously? You didn't address this question. http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1351: --pings_in_flight_; If you aren't doing any PING validation of the unique_id, what happens if the server is broken and sends us a PING immediately? Will this underflow? http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1510: if (!follower_ping_pending_) { What's the difference between a follower_ping and a post-PING? http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1531: // frames. What does this comment mean? What is supposed to happen with this code? http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1544: if (!IsStreamActive(stream_id)) Why is WritePingFrame() looking up the stream? http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1586: base::TimeDelta delay = now - received_data_time_ + kHungInterval; Is this math right? Shouldn't it be kHungInterval - (now - received_data_time_)? http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.h File net/spdy/spdy_session.h (right): http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:297: int SendPing(spdy::SpdyStreamId stream_id); SendPing() shouldn't be tied to a |stream_id|. It should take a unique_id. The only caller right now should be the code that happens to use a unique_id that equals the stream_id. Same thing applies for the rest of these Send/Write methods.
Hi willchan, Changed the code to send PING from SpdyHttpStream::OnSendHeadersComplete and made all the other changes you had suggested. This is still work in progress. thanks, raman http://codereview.chromium.org/8230037/diff/1/net/base/net_log_event_type_list.h File net/base/net_log_event_type_list.h (right): http://codereview.chromium.org/8230037/diff/1/net/base/net_log_event_type_lis... net/base/net_log_event_type_list.h:857: // "stream_id": <The stream ID for the window update>, On 2011/10/12 05:41:06, willchan wrote: > There shouldn't be a stream_id. Done. http://codereview.chromium.org/8230037/diff/1/net/spdy/spdy_http_stream.cc File net/spdy/spdy_http_stream.cc (right): http://codereview.chromium.org/8230037/diff/1/net/spdy/spdy_http_stream.cc#ne... net/spdy/spdy_http_stream.cc:286: DoCallback(ERR_CONNECTION_ABORTED); On 2011/10/12 05:41:06, willchan wrote: > Need a different error code. Done. http://codereview.chromium.org/8230037/diff/1/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/1/net/spdy/spdy_session.cc#newcod... net/spdy/spdy_session.cc:506: make_scoped_refptr(new NetLogSpdyPingParameter(stream_id, unique_id))); On 2011/10/12 05:41:06, willchan wrote: > There's no need to log the stream_id, just the unique_id. The unique_id should > probably be the stream_id. Done. http://codereview.chromium.org/8230037/diff/1/net/spdy/spdy_session.h File net/spdy/spdy_session.h (right): http://codereview.chromium.org/8230037/diff/1/net/spdy/spdy_session.h#newcode204 net/spdy/spdy_session.h:204: if (sent_ping_) On 2011/10/12 05:41:06, willchan wrote: > This isn't good enough, you're going to need to track multiple pings in flight. Done. http://codereview.chromium.org/8230037/diff/10006/net/base/net_log_event_type... File net/base/net_log_event_type_list.h (right): http://codereview.chromium.org/8230037/diff/10006/net/base/net_log_event_type... net/base/net_log_event_type_list.h:857: // "unique_id": <The unique_id of the PING message>, On 2011/10/13 06:38:39, willchan wrote: > You don't need the underscore in the description of the key. Done. http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:495: int result = SendPing(stream_id); On 2011/10/13 15:47:17, willchan wrote: > On 2011/10/13 06:38:39, willchan wrote: > > You should queue this _after_ the SYN_STREAM. We don't want to slow down the > > SYN_STREAM by the serialization latency of the PING. > > You didn't address this comment. > > > > > And is there a reason you aren't doing this in SpdyHttpStream like you did > > previously? > > You didn't address this question. Jim said Roberto wanted us to send the pre-ping request early. Would be great if you, Jim and Roberto can have a quick meeting. It is a very small change to move the call into spdy_http_stream. thanks much. http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:663: void SpdySession::OnWriteCompleteInternal(int result, bool ping_frame) { On 2011/10/13 06:38:39, willchan wrote: > You shouldn't need this extra bool parameter. Done. http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:780: bool ping_frame = false; On 2011/10/13 06:38:39, willchan wrote: > You shouldn't need this. Done. http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1500: const base::TimeDelta kInterval = base::TimeDelta::FromMilliseconds(10000); On 2011/10/13 15:10:53, willchan wrote: > On 2011/10/13 06:38:39, willchan wrote: > > Wow, this logic is not what we had discussed in the meeting. Why is there all > > this pre-PING and post-PING stuff? Is this a new change after Jim and Roberto > > talked? Can someone tell me what's going on? > > From Raman: > """ > pre-PING and post-PING came after Jim and Roberto talked. > > Before we send the request, send a pre-PING if there are no ping's in flight. > Send a post-PING at the end of the request (if we didn't post a post-PING > already). Send post-PING's once in ?? (5) secs. > """ > > Raman, just reply here directly so I don't have to copy / paste it in. Sorry about that. http://codereview.chromium.org/8230037/diff/10006/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1555: QueueFrame(ping_frame.get(), SPDY_PRIORITY_HIGHEST, stream); On 2011/10/13 06:38:39, willchan wrote: > Here's the problem. Pass a NULL stream to QueueFrame(). A PING frame shouldn't > be tied to a stream. Done. http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1351: --pings_in_flight_; On 2011/10/13 15:47:17, willchan wrote: > If you aren't doing any PING validation of the unique_id, what happens if the > server is broken and sends us a PING immediately? Will this underflow? Decrements pings_in_flight_ only if client had sent (we send odd unique_ids). Added DCHECK for pings_in_flight_ to be > 0 (made sure it doesn't go below a negative number). Would like to talk to you and Jim about the above. thanks. http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1510: if (!follower_ping_pending_) { On 2011/10/13 15:47:17, willchan wrote: > What's the difference between a follower_ping and a post-PING? Used the word post_ping everywhere (sorry for using follower_ping). Done. http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1531: // frames. On 2011/10/13 15:47:17, willchan wrote: > What does this comment mean? What is supposed to happen with this code? My intention is not to send PING frame as a separate packet (wanted it to bo sent along with other data). Would like to talk to you if I got the above logic right or not. thanks. http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1544: if (!IsStreamActive(stream_id)) On 2011/10/13 15:47:17, willchan wrote: > Why is WritePingFrame() looking up the stream? Deleted the use of stream_id from all PING related code. Done. http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1586: base::TimeDelta delay = now - received_data_time_ + kHungInterval; On 2011/10/13 15:47:17, willchan wrote: > Is this math right? Shouldn't it be kHungInterval - (now - received_data_time_)? Done. http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.h File net/spdy/spdy_session.h (right): http://codereview.chromium.org/8230037/diff/11010/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:297: int SendPing(spdy::SpdyStreamId stream_id); On 2011/10/13 15:47:17, willchan wrote: > SendPing() shouldn't be tied to a |stream_id|. It should take a unique_id. The > only caller right now should be the code that happens to use a unique_id that > equals the stream_id. > > Same thing applies for the rest of these Send/Write methods. Dropped the stream_id argument. As you had suggested earlier, didn't want to use stream_id for unique_id. Added unique_id_counter_ to determine the unique_id. Hope that is ok. thanks. Done.
Comments below based on our team-programming discussion. http://codereview.chromium.org/8230037/diff/25001/net/http/http_network_layer.cc File net/http/http_network_layer.cc (right): http://codereview.chromium.org/8230037/diff/25001/net/http/http_network_layer... net/http/http_network_layer.cc:89: SpdySession::set_send_ping_for_every_request(true); As per discussion... I think you should disable this in the few tests that are currently failing. Then you can (in the default case) have true without this call. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:187: ~NetLogSpdyPingParameter() {} Question for WillChan: Does making this private preclude every destroying an instance? (since this class doesn't do the deed). Does this me we (by design) leak all such created instances? http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:287: verify_domain_authentication_(verify_domain_authentication) { Since we just got data over the underlying TCP/IP connection to set this up.... please initialize received_data_ to TimeTicks::Now(). http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:512: SendPing(); We need a better name, since this confused me into thinking ew always send a ping in front of every message. Perhaps: SendPrefacePingIfNoneInFlight() http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:539: // Find our stream We should probably (just for safety and correctness) call: SendPrefacePingIfNoneInFlight() http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1367: make_scoped_refptr(new NetLogSpdyPingParameter(frame.unique_id()))); We probably need to condition this on logging... especially if the new'ed item would leak otherwise. WillChan: What am I missing?? http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1371: WritePingFrame(); You need to pass the frame.unique_id http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1535: base::TimeDelta::FromMilliseconds(hung_interval_ms_); This needs a different timer name. It is really meant to be an optimization to avoid sending wasteful preface pings (when we just got some data). Perhaps we could call it: connection_at_risk_of_loss_ms If it is zero (the most conservative figure), then we always send the preface ping (when none are in flight). It is common for TCP/IP sessions to time out in about 3-5 minutes. Certainly if it has been more than 3 minutes, we do want to send a preface ping. I don't think any connection will time out in under about 10 seconds. So this might as well be set to something conservative like 10 seconds. Later, we could adjust it to send fewer pings perhaps. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1549: ++pings_in_flight_; You should pass the next_ping_id_ to WritePingFrame. Then you can move ++pings_in_flight_ into WritePingFrame(n), and condition it on incrementing only if we have an odd number. Similarly, last_sent_was_ping_ can be set in that function (when it is an odd number). http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1582: check_status_delay_time_ms_); This should be our hung_interval_ms (or newer name) http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1595: unique_id_counter_ += 2; This is only done if we were sending (an odd number) which will come as an argument. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1610: DCHECK(base::TimeTicks::Now() - received_data_time_ > kHungInterval); I think this should be >= kHungInterval http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1616: base::TimeTicks now = base::TimeTicks::Now(); You mentioned this code might be clearer if you just did the now call up around line 1606. You can then calculate time since received_data_time, and subtract kHungInterval (which is the amount of time you still need to wait). If that is negative, then we fail and close connection. If it is positive, then we post delayed task for that amount of tim. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h File net/spdy/spdy_session.h (right): http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:108: void SendPing(); In cases where you ONLY use methods for tests, please make these public methods, which should not be used in chrome, private. It is important that we don't lure users (in chrome) into calling these methods. Friends are bad when used in regular code, but not for test code, that needs to indecently twiddle interior state etc. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:346: void SendPostPing(); You talked about replacing the word "Post" This could mean this would be perchance: SendTrailingPing If you like that picture, we could change SendPrePing to: SendPrefacePing http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:354: void PlanToCheckStatus(); When talking, you suggested adding the word "Ping" to this name, such as: PlanToCheckPingStatus http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:509: // This list keeps track of all pings that are in flight. nit: Suggest changing comment to: Count of all pings on the wire, for which we have not gotten a response. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:514: uint32 unique_id_counter_; suggest: next_ping_id_; (and adjust comment) http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:519: // Flag if we have a pending post-PING. Indicate if we have already scheduled a delayed task to send a trailing ping (and we never have more than one scheduled at a time). http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:522: // Flag if we have a check status pending. Indicate if we have already scheduled a delayed task to check the ping status. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:526: // sent any other data). Indicate if the last data we sent was a ping (generally, a follower ping). This helps us decided if we need yet another follower ping, or if it would be a waste of effort (and MUST not be done). http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:549: static bool send_ping_for_every_request_; That name is no longer valid (since we don't send one per request). You use it to enable and disable this whole connection health checking system. Perhaps better would be: enable_ping_based_connection_checking_ http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:552: static int hung_interval_ms_; Use only one var for line 551 and 552. Suggest using the "hung_interval_ms_". A comment could be: The amount of time (in milliseconds) that we are willing to tolerate with no data received (of any form), while there is a ping in flight, before we declare the connection to be hung.
http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:187: ~NetLogSpdyPingParameter() {} On 2011/10/14 19:59:07, jar wrote: > Question for WillChan: Does making this private preclude every destroying an > instance? (since this class doesn't do the deed). Does this me we (by design) > leak all such created instances? Refcounted http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1447: last_sent_was_ping_ = false; Why don't we just call this in QueueFrame()? You can identify the frame type there. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1531: if (pings_in_flight_ || post_ping_pending_ || !send_ping_for_every_request_) So, if we are sending a ping for every request, how does that get sent out? http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1546: AutoReset<bool> reset(&delayed_write_pending_, true); Please explain how this works. How does this coalesce the PING and SYN_STREAM into the same packet? The SpdySession::WriteSocket() function is still only calling Socket::Write() with a buffer for a single frame.
http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1610: DCHECK(base::TimeTicks::Now() - received_data_time_ > kHungInterval); On 2011/10/14 19:59:07, jar wrote: > I think this should be >= kHungInterval Can you use DCHECK_GT?
Hi willchan and jar, Made all the changes you had suggested. All net unittests have passed in my dev env. Would appreciate if you could take another look at this CL. thanks very much, raman http://codereview.chromium.org/8230037/diff/25001/net/http/http_network_layer.cc File net/http/http_network_layer.cc (right): http://codereview.chromium.org/8230037/diff/25001/net/http/http_network_layer... net/http/http_network_layer.cc:89: SpdySession::set_send_ping_for_every_request(true); On 2011/10/14 19:59:07, jar wrote: > As per discussion... I think you should disable this in the few tests that are > currently failing. Then you can (in the default case) have true without this > call. Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:187: ~NetLogSpdyPingParameter() {} On 2011/10/14 22:49:21, willchan wrote: > On 2011/10/14 19:59:07, jar wrote: > > Question for WillChan: Does making this private preclude every destroying an > > instance? (since this class doesn't do the deed). Does this me we (by > design) > > leak all such created instances? > > Refcounted Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:187: ~NetLogSpdyPingParameter() {} On 2011/10/14 19:59:07, jar wrote: > Question for WillChan: Does making this private preclude every destroying an > instance? (since this class doesn't do the deed). Does this me we (by design) > leak all such created instances? Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:287: verify_domain_authentication_(verify_domain_authentication) { On 2011/10/14 19:59:07, jar wrote: > Since we just got data over the underlying TCP/IP connection to set this up.... > please initialize received_data_ to TimeTicks::Now(). Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:512: SendPing(); On 2011/10/14 19:59:07, jar wrote: > We need a better name, since this confused me into thinking ew always send a > ping in front of every message. Perhaps: > > SendPrefacePingIfNoneInFlight() Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:539: // Find our stream On 2011/10/14 19:59:07, jar wrote: > We should probably (just for safety and correctness) call: > > SendPrefacePingIfNoneInFlight() Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1367: make_scoped_refptr(new NetLogSpdyPingParameter(frame.unique_id()))); On 2011/10/14 19:59:07, jar wrote: > We probably need to condition this on logging... especially if the new'ed item > would leak otherwise. > > WillChan: What am I missing?? Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1371: WritePingFrame(); On 2011/10/14 19:59:07, jar wrote: > You need to pass the frame.unique_id Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1447: last_sent_was_ping_ = false; On 2011/10/14 22:49:21, willchan wrote: > Why don't we just call this in QueueFrame()? You can identify the frame type > there. We don't want to set last_sent_was_ping_ to true when we send ping response for the ping request from server. Wasn't sure how to differentiate if a ping frame is client generated vs a response to server. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1531: if (pings_in_flight_ || post_ping_pending_ || !send_ping_for_every_request_) On 2011/10/14 22:49:21, willchan wrote: > So, if we are sending a ping for every request, how does that get sent out? The variable name is wrong. We don't send ping for every request. Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1535: base::TimeDelta::FromMilliseconds(hung_interval_ms_); On 2011/10/14 19:59:07, jar wrote: > This needs a different timer name. It is really meant to be an optimization to > avoid sending wasteful preface pings (when we just got some data). Perhaps we > could call it: > > connection_at_risk_of_loss_ms > > > If it is zero (the most conservative figure), then we always send the preface > ping (when none are in flight). > > It is common for TCP/IP sessions to time out in about 3-5 minutes. Certainly if > it has been more than 3 minutes, we do want to send a preface ping. > > I don't think any connection will time out in under about 10 seconds. So this > might as well be set to something conservative like 10 seconds. Later, we could > adjust it to send fewer pings perhaps. Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1546: AutoReset<bool> reset(&delayed_write_pending_, true); On 2011/10/14 22:49:21, willchan wrote: > Please explain how this works. How does this coalesce the PING and SYN_STREAM > into the same packet? The SpdySession::WriteSocket() function is still only > calling Socket::Write() with a buffer for a single frame. Deleted the AutoReset. You are right, we are writing into socket for every frame. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1549: ++pings_in_flight_; On 2011/10/14 19:59:07, jar wrote: > You should pass the next_ping_id_ to WritePingFrame. Then you can move > ++pings_in_flight_ into WritePingFrame(n), and condition it on incrementing only > if we have an odd number. > > Similarly, last_sent_was_ping_ can be set in that function (when it is an odd > number). Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1582: check_status_delay_time_ms_); On 2011/10/14 19:59:07, jar wrote: > This should be our hung_interval_ms (or newer name) Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1595: unique_id_counter_ += 2; On 2011/10/14 19:59:07, jar wrote: > This is only done if we were sending (an odd number) which will come as an > argument. Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1610: DCHECK(base::TimeTicks::Now() - received_data_time_ > kHungInterval); On 2011/10/14 19:59:07, jar wrote: > I think this should be >= kHungInterval Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1610: DCHECK(base::TimeTicks::Now() - received_data_time_ > kHungInterval); On 2011/10/14 23:13:11, willchan wrote: > On 2011/10/14 19:59:07, jar wrote: > > I think this should be >= kHungInterval > Can you use DCHECK_GT? DCHECK_GE was giving errors for TimeDelta arguments (logging.h was trying to create a string). http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1616: base::TimeTicks now = base::TimeTicks::Now(); On 2011/10/14 19:59:07, jar wrote: > You mentioned this code might be clearer if you just did the now call up around > line 1606. You can then calculate time since received_data_time, and subtract > kHungInterval (which is the amount of time you still need to wait). > > If that is negative, then we fail and close connection. > > If it is positive, then we post delayed task for that amount of tim. Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h File net/spdy/spdy_session.h (right): http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:108: void SendPing(); On 2011/10/14 19:59:07, jar wrote: > In cases where you ONLY use methods for tests, please make these public methods, > which should not be used in chrome, private. > > It is important that we don't lure users (in chrome) into calling these methods. > Friends are bad when used in regular code, but not for test code, that needs to > indecently twiddle interior state etc. Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:346: void SendPostPing(); On 2011/10/14 19:59:07, jar wrote: > You talked about replacing the word "Post" This could mean this would be > perchance: > SendTrailingPing > > > > If you like that picture, we could change SendPrePing to: > SendPrefacePing Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:354: void PlanToCheckStatus(); On 2011/10/14 19:59:07, jar wrote: > When talking, you suggested adding the word "Ping" to this name, such as: > PlanToCheckPingStatus > Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:509: // This list keeps track of all pings that are in flight. On 2011/10/14 19:59:07, jar wrote: > nit: Suggest changing comment to: > > Count of all pings on the wire, for which we have not gotten a response. Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:514: uint32 unique_id_counter_; On 2011/10/14 19:59:07, jar wrote: > suggest: next_ping_id_; > > (and adjust comment) Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:519: // Flag if we have a pending post-PING. On 2011/10/14 19:59:07, jar wrote: > Indicate if we have already scheduled a delayed task to send a trailing ping > (and we never have more than one scheduled at a time). Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:522: // Flag if we have a check status pending. On 2011/10/14 19:59:07, jar wrote: > Indicate if we have already scheduled a delayed task to check the ping status. Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:526: // sent any other data). On 2011/10/14 19:59:07, jar wrote: > Indicate if the last data we sent was a ping (generally, a follower ping). This > helps us decided if we need yet another follower ping, or if it would be a waste > of effort (and MUST not be done). Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:549: static bool send_ping_for_every_request_; On 2011/10/14 19:59:07, jar wrote: > That name is no longer valid (since we don't send one per request). You use it > to enable and disable this whole connection health checking system. Perhaps > better would be: > > enable_ping_based_connection_checking_ Done. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:552: static int hung_interval_ms_; On 2011/10/14 19:59:07, jar wrote: > Use only one var for line 551 and 552. > > Suggest using the "hung_interval_ms_". > > A comment could be: > The amount of time (in milliseconds) that we are willing to tolerate with no > data received (of any form), while there is a ping in flight, before we declare > the connection to be hung. Done.
I have to run off. I'll do another pass later tonight. I'll take a look at some point on Saturday too, and once more on Sunday perhaps.
Raman, there doesn't appear to be any unit testing of the timeout and hang recovery stuff. Have you manually tested it? I see some basic tests of whether or not we even send out the PING. http://codereview.chromium.org/8230037/diff/27005/net/http/http_network_layer.cc File net/http/http_network_layer.cc (right): http://codereview.chromium.org/8230037/diff/27005/net/http/http_network_layer... net/http/http_network_layer.cc:45: static const char kDisablePing[] = "no-ping"; Not that I really care, but why are we adding a flag to disable this? Is there a good reason for users/devs to want to do this? http://codereview.chromium.org/8230037/diff/27005/net/http/http_stream_factor... File net/http/http_stream_factory_impl_unittest.cc (right): http://codereview.chromium.org/8230037/diff/27005/net/http/http_stream_factor... net/http/http_stream_factory_impl_unittest.cc:348: SpdySession::set_enable_ping_based_connection_checking(false); It looks weird that we are only disabling PING for this single test. I think we probably want to disable it for all tests by default. I think you should take out all of these calls and move them into net/base/run_all_unittests.cc, and then only enable it for the unit tests that specifically test PING. http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:615: last_sent_was_ping_ = false; I feel like it'd be safer if you set this in QueueFrame(). http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1371: // Send reponse to a PING from Server. s/reponse/response/, s/Server/server/g http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h File net/spdy/spdy_session.h (right): http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:570: static int connection_at_risk_of_loss_ms_; Is this a new optimization? I haven't heard about this one. Raman, in the future, please stop adding changes we haven't discussed beforehand when we're trying to get changes in by the milestone cutoff. It would have been much easier if we had gotten the quick and dirty change in right at the get go and then could relax and get an updated version in later at a more leisurely schedule. What's the failure mode of this change? If a NAT router reboots within those 10 seconds and thus we lose the connection mapping in its table, what happens? Will we ever catch this hang without a user-initiated refresh or something? Is that reasonable? What about connections for non-user initiated requests? Will those only time out after many minutes? http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:573: // trailing ping. You should explain why we use a trailing ping. http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session_unitt... File net/spdy/spdy_session_unittest.cc (right): http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session_unitt... net/spdy/spdy_session_unittest.cc:202: const char* kStreamUrl = "http://www.google.com/"; This is not a constant. It's a non-const pointer to a const char buffer. You should just declare a static const char kStreamUrl[] = ...
The protocol logic LGTM. I'm quite convinced the state machine logic is correct, but I'm not as savvy as Will is about the implementation interfaces. Comments below are directed at some of the points raised by Will, and may be helpful in moving forward. http://codereview.chromium.org/8230037/diff/27005/net/http/http_network_layer.cc File net/http/http_network_layer.cc (right): http://codereview.chromium.org/8230037/diff/27005/net/http/http_network_layer... net/http/http_network_layer.cc:45: static const char kDisablePing[] = "no-ping"; The concern was that this was being added late in the feature cycle. IF it is seen to be broken, we can disable it via the flag. The flag will probably go away after we have more field experience. Putting new features "under control of a flag" was requested as best-practice for larger features, but seems to make sense with smaller features landing this late. We also needed the infrastructure to make testing easier... so it was almost a freebie. On 2011/10/15 08:08:03, willchan wrote: > Not that I really care, but why are we adding a flag to disable this? Is there a > good reason for users/devs to want to do this? http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:615: last_sent_was_ping_ = false; On 2011/10/15 08:08:03, willchan wrote: > I feel like it'd be safer if you set this in QueueFrame(). The problem(?) is that we send pings via QueueFrame() as well. Hence we'd have to have QueueFrame() filter on the type of frame, and only do the increment some of the time. Worse yet, QueueFrame() function is used to send both respond pings as well as client initiated pings. We only want to set this boolean to false if we don't send a client initiated ping. Bottom line: If we put this logic into QueueFrame() then we'd have to teach QueueFrame to not only parse out Pings, but also to parse out odd vs even (server initiated vs client initiated) pings. Raman and I thought we'd then be putting too much Ping specific awareness into that general function. http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h File net/spdy/spdy_session.h (right): http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:570: static int connection_at_risk_of_loss_ms_; Recall that the back-and-forth discussion was about whether to send a preface ping, or a follower ping. Either one was roughly sufficient, but the preface ping was helpful in finding out sooner if the channel was long since dead, and the follower ping was better at providing assurance that the last bytes had reached the server (and thus justifying an arbitrary wait-for-response (with no additional pinging). The argument against a preface ping was that it delays (ever so slightly) the sending of the first bits of the request. This logic relating to line 570 simply states that IF we just very recently received data on the wire, then the preface ping serves no valuable function, and only delays the data. Recall that all parties to the discussion agreed that fewer pings were better, and this parameter allows for reducing the number of pings. If this is set to zero, then we always send a preface ping (when we have no current pings in flight). Hence it can be set to instigate maximal pinging, or to allow more conservative (fewer) pings when starting a new session on the back (just after finishing) receiving data from a previous session. I think this was in the spirit of all of our discussions. On 2011/10/15 08:08:03, willchan wrote: > Is this a new optimization? I haven't heard about this one. > > Raman, in the future, please stop adding changes we haven't discussed beforehand > when we're trying to get changes in by the milestone cutoff. It would have been > much easier if we had gotten the quick and dirty change in right at the get go > and then could relax and get an updated version in later at a more leisurely > schedule. > > What's the failure mode of this change? If a NAT router reboots within those 10 > seconds and thus we lose the connection mapping in its table, what happens? Will > we ever catch this hang without a user-initiated refresh or something? Is that > reasonable? What about connections for non-user initiated requests? Will those > only time out after many minutes? http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:573: // trailing ping. On 2011/10/15 08:08:03, willchan wrote: > You should explain why we use a trailing ping. Suggested comment: We use a trailing ping (sent after all data) to get an effective acknowlegement from the server that it has indeed received all (prior) data frames. With that assurance, we are willing to enter into a wait state for responses to our last data frame(s) without further pings.
LGTM. Please make sure you manually test this properly. I want to make sure this doesn't need to get reverted/disabled since this is our top SPDY bug (drives users nuts when it happens). http://codereview.chromium.org/8230037/diff/27005/net/http/http_network_layer.cc File net/http/http_network_layer.cc (right): http://codereview.chromium.org/8230037/diff/27005/net/http/http_network_layer... net/http/http_network_layer.cc:45: static const char kDisablePing[] = "no-ping"; OK, that sounds reasonable. Although I really really don't want to turn this off. It'd be great if instead we could just fall back easily to a dumber / less optimized code path, because this is a pretty important bugfix. On 2011/10/15 09:00:52, jar wrote: > The concern was that this was being added late in the feature cycle. IF it is > seen to be broken, we can disable it via the flag. The flag will probably go > away after we have more field experience. > > Putting new features "under control of a flag" was requested as best-practice > for larger features, but seems to make sense with smaller features landing this > late. > > We also needed the infrastructure to make testing easier... so it was almost a > freebie. > > On 2011/10/15 08:08:03, willchan wrote: > > Not that I really care, but why are we adding a flag to disable this? Is there > a > > good reason for users/devs to want to do this? > http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:615: last_sent_was_ping_ = false; On 2011/10/15 09:00:52, jar wrote: > On 2011/10/15 08:08:03, willchan wrote: > > I feel like it'd be safer if you set this in QueueFrame(). > > The problem(?) is that we send pings via QueueFrame() as well. Hence we'd have > to have QueueFrame() filter on the type of frame, and only do the increment some > of the time. Worse yet, QueueFrame() function is used to send both respond > pings as well as client initiated pings. We only want to set this boolean to > false if we don't send a client initiated ping. > > Bottom line: If we put this logic into QueueFrame() then we'd have to teach > QueueFrame to not only parse out Pings, but also to parse out odd vs even > (server initiated vs client initiated) pings. Raman and I thought we'd then be > putting too much Ping specific awareness into that general function. Yes, this is one of those typical tradeoffs. Do you impose more burden on new callers who all have to remember to set this bool? Or do you do it at a single chokepoint but potentially cross abstraction/layering boundaries. My sense is that the internal details within the .cc file don't constitute an important layering boundary, so it's better to make it easier for implementers of new frame types, since the failure to set this bool is very easy for a reviewer to miss. But your concern is noted. Probably a good middle ground is to keep QueueFrame() as is, create a QueueFrameInternal() with a bool indicating whether or not it's a ping frame, and make the PING code call QueueFrameInternal() with ping set to true. http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h File net/spdy/spdy_session.h (right): http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:570: static int connection_at_risk_of_loss_ms_; Please recall http://code.google.com/p/chromium/issues/detail?id=89725#c6 which makes no mention of this. I'm disappointed we are changing that proposal so late in the cycle. Were we really planning to discuss this over the weekend prior to the cutoff? If you guys really want it, let's do it. Let's not wait another review roundtrip, let's just get this in. That said, let me express my qualms. In short, this is premature optimization that violates Keep It Simple Stupid and may lead to improper handling of exceptional cases. I feel like this is unnecessary complexity for such a little performance gain. With the preface and trailer PING frames coming in at 12 bytes each, and the trailer coming in one second after the preface, that means the maximum PING overhead is 24 bytes + TLS application record overhead per second. Now, with this change setting connection_at_risk_of_loss_ms_ to 10s, we are *at best* saving 9 * (24 bytes + TLS application record overhead) over a 10 second period. Over 10 seconds, do we really care about saving a few hundred bytes in the absolute best case? Now, if a router does reboot or whatever and we send a request out during this 9 second window and don't send anymore requests for the session, that SPDY connection will hang and not recover until the TCP retransmit timeouts. For pages, this isn't too big of a deal since it's likely a new request will happen (more browsing or a refresh or something). For system requests that aren't driven by users, this is pretty harmful. And we have plenty of those nowadays which talk to Google over SPDY. On 2011/10/15 09:00:52, jar wrote: > Recall that the back-and-forth discussion was about whether to send a preface > ping, or a follower ping. Either one was roughly sufficient, but the preface > ping was helpful in finding out sooner if the channel was long since dead, and > the follower ping was better at providing assurance that the last bytes had > reached the server (and thus justifying an arbitrary wait-for-response (with no > additional pinging). > > The argument against a preface ping was that it delays (ever so slightly) the > sending of the first bits of the request. > > This logic relating to line 570 simply states that IF we just very recently > received data on the wire, then the preface ping serves no valuable function, > and only delays the data. > > Recall that all parties to the discussion agreed that fewer pings were better, > and this parameter allows for reducing the number of pings. If this is set to > zero, then we always send a preface ping (when we have no current pings in > flight). Hence it can be set to instigate maximal pinging, or to allow more > conservative (fewer) pings when starting a new session on the back (just after > finishing) receiving data from a previous session. > > I think this was in the spirit of all of our discussions. > > > > On 2011/10/15 08:08:03, willchan wrote: > > Is this a new optimization? I haven't heard about this one. > > > > Raman, in the future, please stop adding changes we haven't discussed > beforehand > > when we're trying to get changes in by the milestone cutoff. It would have > been > > much easier if we had gotten the quick and dirty change in right at the get go > > and then could relax and get an updated version in later at a more leisurely > > schedule. > > > > What's the failure mode of this change? If a NAT router reboots within those > 10 > > seconds and thus we lose the connection mapping in its table, what happens? > Will > > we ever catch this hang without a user-initiated refresh or something? Is that > > reasonable? What about connections for non-user initiated requests? Will those > > only time out after many minutes? >
Response to comments by WillChan. http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:615: last_sent_was_ping_ = false; I'm fine with the factoring you suggested if it makes you feel more comfortable. The only two re/setting of this flag that are IMO critical are: a) Set to true when we send a ping. b) Set to false when we send a session frame (content or request header, or any "user" data). We don't (I think) need a "follower ping" after we send some meta data (such as configuration info, or a reply to a ping etc.) As a result, it is not significant to update the flag in such cases. There is no big loss (I think) if we later find out the connection is black holed, and meta-data never reached the server. As it is, we're getting very near to hacking this protocol to universally ack all frames... and this seems like a bad level to work toward such. As a result, I'm mostly (only?) concerned with setting the flag in those two cases, and we probably should (in our leisure time) nail this policy and the related abstractions (kindred to what you suggest). For other transmission, we don't need to set the status. Again, I'm fine with the factoring you suggested. Given the current code, it will achieve the re/setting more clearly to a reader (and harder to mess up as a programmer), with the same semantics. As a result, it really should only be those two classes of frames that we key on to modify this state... but it seemed too hard in our rushed time to get it perfect... and we opted for more pings than "needed" in such cases. On 2011/10/15 20:17:40, willchan wrote: > On 2011/10/15 09:00:52, jar wrote: > > On 2011/10/15 08:08:03, willchan wrote: > > > I feel like it'd be safer if you set this in QueueFrame(). > > > > The problem(?) is that we send pings via QueueFrame() as well. Hence we'd > have > > to have QueueFrame() filter on the type of frame, and only do the increment > some > > of the time. Worse yet, QueueFrame() function is used to send both respond > > pings as well as client initiated pings. We only want to set this boolean to > > false if we don't send a client initiated ping. > > > > Bottom line: If we put this logic into QueueFrame() then we'd have to teach > > QueueFrame to not only parse out Pings, but also to parse out odd vs even > > (server initiated vs client initiated) pings. Raman and I thought we'd then be > > putting too much Ping specific awareness into that general function. > > Yes, this is one of those typical tradeoffs. Do you impose more burden on new > callers who all have to remember to set this bool? Or do you do it at a single > chokepoint but potentially cross abstraction/layering boundaries. > > My sense is that the internal details within the .cc file don't constitute an > important layering boundary, so it's better to make it easier for implementers > of new frame types, since the failure to set this bool is very easy for a > reviewer to miss. But your concern is noted. Probably a good middle ground is to > keep QueueFrame() as is, create a QueueFrameInternal() with a bool indicating > whether or not it's a ping frame, and make the PING code call > QueueFrameInternal() with ping set to true. http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h File net/spdy/spdy_session.h (right): http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:570: static int connection_at_risk_of_loss_ms_; If you are concerned with it, we can set the time to zero. It was coded with that as a viable option. Raman wanted to land some histograms to see how often these things would be useful. In a protocol which is deriving a lot of its benefit based on compression, consistent care about not pushing extra messages or bytes seems to always be prudent. ...and even if we don't choose to use the knobs, having them close at hand may prove very helpful. Note that the expectation is that a follower ping would be sent very soon after the data. I'd suggest times for follower ping transmission in the range of 500ms to 1000ms, and hence your mention of 10 seconds is perchance misplaced.. Follower-ping is just used to aggregate a few requests automatically between a pair of pings, so omission of a preface ping is not a big deal (again, we talked about *never* sending preface pings, and all parties were ok with it). Performing an optimization that sometimes (when it would IMO be embarrassing to look at a trace and see consecutive follower-ping followed almost immediately by a preface ping), seems like a reasonable option. Again, if you're concerned, I'm fine with setting the "connection_at_risk_of_loss_ms" to be 0. That means that we're always nervous, and always send a preface-ping even if we just got back a reply to the last follower-ping. Although not appropriate to this CL and goal, the whole use of pings to stimulate (effectively) acks seems like an element of the protocol that should have been implicitly provided at a lower level. Specifically, we could insist on a server sending ACKs for received data at various times, which would mean that a ping would never be needed to stimulate such a pseudo-ACK. Again, this can't be in this weekend's timeline, but this should be discussed for SPDY 3. On 2011/10/15 20:17:40, willchan wrote: > Please recall http://code.google.com/p/chromium/issues/detail?id=89725#c6 which > makes no mention of this. I'm disappointed we are changing that proposal so late > in the cycle. Were we really planning to discuss this over the weekend prior to > the cutoff? > > If you guys really want it, let's do it. Let's not wait another review > roundtrip, let's just get this in. That said, let me express my qualms. In > short, this is premature optimization that violates Keep It Simple Stupid and > may lead to improper handling of exceptional cases. > > I feel like this is unnecessary complexity for such a little performance gain. > With the preface and trailer PING frames coming in at 12 bytes each, and the > trailer coming in one second after the preface, that means the maximum PING > overhead is 24 bytes + TLS application record overhead per second. Now, with > this change setting connection_at_risk_of_loss_ms_ to 10s, we are *at best* > saving 9 * (24 bytes + TLS application record overhead) over a 10 second period. > Over 10 seconds, do we really care about saving a few hundred bytes in the > absolute best case? > > Now, if a router does reboot or whatever and we send a request out during this 9 > second window and don't send anymore requests for the session, that SPDY > connection will hang and not recover until the TCP retransmit timeouts. For > pages, this isn't too big of a deal since it's likely a new request will happen > (more browsing or a refresh or something). For system requests that aren't > driven by users, this is pretty harmful. And we have plenty of those nowadays > which talk to Google over SPDY. > > On 2011/10/15 09:00:52, jar wrote: > > Recall that the back-and-forth discussion was about whether to send a preface > > ping, or a follower ping. Either one was roughly sufficient, but the preface > > ping was helpful in finding out sooner if the channel was long since dead, and > > the follower ping was better at providing assurance that the last bytes had > > reached the server (and thus justifying an arbitrary wait-for-response (with > no > > additional pinging). > > > > The argument against a preface ping was that it delays (ever so slightly) the > > sending of the first bits of the request. > > > > This logic relating to line 570 simply states that IF we just very recently > > received data on the wire, then the preface ping serves no valuable function, > > and only delays the data. > > > > Recall that all parties to the discussion agreed that fewer pings were better, > > and this parameter allows for reducing the number of pings. If this is set to > > zero, then we always send a preface ping (when we have no current pings in > > flight). Hence it can be set to instigate maximal pinging, or to allow more > > conservative (fewer) pings when starting a new session on the back (just after > > finishing) receiving data from a previous session. > > > > I think this was in the spirit of all of our discussions. > > > > > > > > On 2011/10/15 08:08:03, willchan wrote: > > > Is this a new optimization? I haven't heard about this one. > > > > > > Raman, in the future, please stop adding changes we haven't discussed > > beforehand > > > when we're trying to get changes in by the milestone cutoff. It would have > > been > > > much easier if we had gotten the quick and dirty change in right at the get > go > > > and then could relax and get an updated version in later at a more leisurely > > > schedule. > > > > > > What's the failure mode of this change? If a NAT router reboots within those > > 10 > > > seconds and thus we lose the connection mapping in its table, what happens? > > Will > > > we ever catch this hang without a user-initiated refresh or something? Is > that > > > reasonable? What about connections for non-user initiated requests? Will > those > > > only time out after many minutes? > > >
hi willchan and jar, Many many thanks for your comments. Kudos to you both. Tested the failure case manually (by black holing all the data read from the server). CheckPingStatus sent ERR_SPDY_PING_FAILED back and closed all streams and HttpNetworkTransaction::HandleIOError retried the URL. One thing I am not sure is how often we should send preface-PING. We could adjust the time as we find more data and get some input from server folks (could depend on how much load we are adding to the overall network traffic and to the server processing). Currently it is set to send preface-PING if we haven't received any data in 10 secs. Many many thanks for suggesting run_all_unittests idea. thanks very much, raman http://codereview.chromium.org/8230037/diff/27005/net/http/http_stream_factor... File net/http/http_stream_factory_impl_unittest.cc (right): http://codereview.chromium.org/8230037/diff/27005/net/http/http_stream_factor... net/http/http_stream_factory_impl_unittest.cc:348: SpdySession::set_enable_ping_based_connection_checking(false); On 2011/10/15 08:08:03, willchan wrote: > It looks weird that we are only disabling PING for this single test. I think we > probably want to disable it for all tests by default. I think you should take > out all of these calls and move them into net/base/run_all_unittests.cc, and > then only enable it for the unit tests that specifically test PING. Done. http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:1371: // Send reponse to a PING from Server. On 2011/10/15 08:08:03, willchan wrote: > s/reponse/response/, s/Server/server/g Done. http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h File net/spdy/spdy_session.h (right): http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:570: static int connection_at_risk_of_loss_ms_; On 2011/10/15 08:08:03, willchan wrote: > Is this a new optimization? I haven't heard about this one. > > Raman, in the future, please stop adding changes we haven't discussed beforehand > when we're trying to get changes in by the milestone cutoff. It would have been > much easier if we had gotten the quick and dirty change in right at the get go > and then could relax and get an updated version in later at a more leisurely > schedule. > > What's the failure mode of this change? If a NAT router reboots within those 10 > seconds and thus we lose the connection mapping in its table, what happens? Will > we ever catch this hang without a user-initiated refresh or something? Is that > reasonable? What about connections for non-user initiated requests? Will those > only time out after many minutes? I think it is just variable name change. In patch 14, we were using hunginterval to send the preface ping in SendPing. http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc I am not sure what should be right interval time. http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:573: // trailing ping. On 2011/10/15 09:00:52, jar wrote: > On 2011/10/15 08:08:03, willchan wrote: > > You should explain why we use a trailing ping. > > Suggested comment: > > We use a trailing ping (sent after all data) to get an effective acknowlegement > from the server that it has indeed received all (prior) data frames. With that > assurance, we are willing to enter into a wait state for responses to our last > data frame(s) without further pings. Done. http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session_unitt... File net/spdy/spdy_session_unittest.cc (right): http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session_unitt... net/spdy/spdy_session_unittest.cc:202: const char* kStreamUrl = "http://www.google.com/"; On 2011/10/15 08:08:03, willchan wrote: > This is not a constant. It's a non-const pointer to a const char buffer. You > should just declare a static const char kStreamUrl[] = ... Done.
Great, everything sounds good. The only change I think that's worth making at this time is setting the default connection_at_risk_of_loss_ms_ value to 0. The QueueFrame() refactoring can be postponed and discussed more fully later. Let's commit this and then work on unit tests and histograms so we can better evaluate timers. Unit tests and histograms should be in 2 separate CLs, it'd be great to get histograms in Monday too, but is not a blocker for M16. Oh, and updating the changelist description with the algorithm or providing a link to the algorithm would be great. And add mention of the connection_at_risk_of_loss_ms_ knob to the bug thread too. http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.cc File net/spdy/spdy_session.cc (right): http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.cc#ne... net/spdy/spdy_session.cc:615: last_sent_was_ping_ = false; OK On 2011/10/15 23:25:11, jar wrote: > I'm fine with the factoring you suggested if it makes you feel more comfortable. > The only two re/setting of this flag that are IMO critical are: > a) Set to true when we send a ping. > b) Set to false when we send a session frame (content or request header, or any > "user" data). > > We don't (I think) need a "follower ping" after we send some meta data (such as > configuration info, or a reply to a ping etc.) As a result, it is not > significant to update the flag in such cases. There is no big loss (I think) if > we later find out the connection is black holed, and meta-data never reached the > server. As it is, we're getting very near to hacking this protocol to > universally ack all frames... and this seems like a bad level to work toward > such. > > As a result, I'm mostly (only?) concerned with setting the flag in those two > cases, and we probably should (in our leisure time) nail this policy and the > related abstractions (kindred to what you suggest). For other transmission, we > don't need to set the status. > > Again, I'm fine with the factoring you suggested. Given the current code, it > will achieve the re/setting more clearly to a reader (and harder to mess up as a > programmer), with the same semantics. > > As a result, it really should only be those two classes of frames that we key on > to modify this state... but it seemed too hard in our rushed time to get it > perfect... and we opted for more pings than "needed" in such cases. > > On 2011/10/15 20:17:40, willchan wrote: > > On 2011/10/15 09:00:52, jar wrote: > > > On 2011/10/15 08:08:03, willchan wrote: > > > > I feel like it'd be safer if you set this in QueueFrame(). > > > > > > The problem(?) is that we send pings via QueueFrame() as well. Hence we'd > > have > > > to have QueueFrame() filter on the type of frame, and only do the increment > > some > > > of the time. Worse yet, QueueFrame() function is used to send both respond > > > pings as well as client initiated pings. We only want to set this boolean > to > > > false if we don't send a client initiated ping. > > > > > > Bottom line: If we put this logic into QueueFrame() then we'd have to teach > > > QueueFrame to not only parse out Pings, but also to parse out odd vs even > > > (server initiated vs client initiated) pings. Raman and I thought we'd then > be > > > putting too much Ping specific awareness into that general function. > > > > Yes, this is one of those typical tradeoffs. Do you impose more burden on new > > callers who all have to remember to set this bool? Or do you do it at a single > > chokepoint but potentially cross abstraction/layering boundaries. > > > > My sense is that the internal details within the .cc file don't constitute an > > important layering boundary, so it's better to make it easier for implementers > > of new frame types, since the failure to set this bool is very easy for a > > reviewer to miss. But your concern is noted. Probably a good middle ground is > to > > keep QueueFrame() as is, create a QueueFrameInternal() with a bool indicating > > whether or not it's a ping frame, and make the PING code call > > QueueFrameInternal() with ping set to true. > http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h File net/spdy/spdy_session.h (right): http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:570: static int connection_at_risk_of_loss_ms_; On 2011/10/15 23:25:11, jar wrote: > If you are concerned with it, we can set the time to zero. It was coded with > that as a viable option. Great, let's do that. > > Raman wanted to land some histograms to see how often these things would be > useful. > > In a protocol which is deriving a lot of its benefit based on compression, > consistent care about not pushing extra messages or bytes seems to always be > prudent. ...and even if we don't choose to use the knobs, having them close at > hand may prove very helpful. Agreed. I just don't want to miss the deadline by adding in extra complexity that is untested (all of this timeout based code is currently untested). > > Note that the expectation is that a follower ping would be sent very soon after > the data. I'd suggest times for follower ping transmission in the range of > 500ms to 1000ms, and hence your mention of 10 seconds is perchance misplaced.. You might want to re-read my comment. I said the trailer PING happens in 1 second. The 10s comes from this new knob added |connection_at_risk_of_loss_ms_|. > > Follower-ping is just used to aggregate a few requests automatically between a > pair of pings, so omission of a preface ping is not a big deal (again, we talked > about *never* sending preface pings, and all parties were ok with it). > Performing an optimization that sometimes (when it would IMO be embarrassing to > look at a trace and see consecutive follower-ping followed almost immediately by > a preface ping), seems like a reasonable option. > > Again, if you're concerned, I'm fine with setting the > "connection_at_risk_of_loss_ms" to be 0. That means that we're always nervous, > and always send a preface-ping even if we just got back a reply to the last > follower-ping. > > Although not appropriate to this CL and goal, the whole use of pings to > stimulate (effectively) acks seems like an element of the protocol that should > have been implicitly provided at a lower level. Specifically, we could insist > on a server sending ACKs for received data at various times, which would mean > that a ping would never be needed to stimulate such a pseudo-ACK. Again, this > can't be in this weekend's timeline, but this should be discussed for SPDY 3. > > > On 2011/10/15 20:17:40, willchan wrote: > > Please recall http://code.google.com/p/chromium/issues/detail?id=89725#c6 > which > > makes no mention of this. I'm disappointed we are changing that proposal so > late > > in the cycle. Were we really planning to discuss this over the weekend prior > to > > the cutoff? > > > > If you guys really want it, let's do it. Let's not wait another review > > roundtrip, let's just get this in. That said, let me express my qualms. In > > short, this is premature optimization that violates Keep It Simple Stupid and > > may lead to improper handling of exceptional cases. > > > > I feel like this is unnecessary complexity for such a little performance gain. > > With the preface and trailer PING frames coming in at 12 bytes each, and the > > trailer coming in one second after the preface, that means the maximum PING > > overhead is 24 bytes + TLS application record overhead per second. Now, with > > this change setting connection_at_risk_of_loss_ms_ to 10s, we are *at best* > > saving 9 * (24 bytes + TLS application record overhead) over a 10 second > period. > > Over 10 seconds, do we really care about saving a few hundred bytes in the > > absolute best case? > > > > Now, if a router does reboot or whatever and we send a request out during this > 9 > > second window and don't send anymore requests for the session, that SPDY > > connection will hang and not recover until the TCP retransmit timeouts. For > > pages, this isn't too big of a deal since it's likely a new request will > happen > > (more browsing or a refresh or something). For system requests that aren't > > driven by users, this is pretty harmful. And we have plenty of those nowadays > > which talk to Google over SPDY. > > > > On 2011/10/15 09:00:52, jar wrote: > > > Recall that the back-and-forth discussion was about whether to send a > preface > > > ping, or a follower ping. Either one was roughly sufficient, but the preface > > > ping was helpful in finding out sooner if the channel was long since dead, > and > > > the follower ping was better at providing assurance that the last bytes had > > > reached the server (and thus justifying an arbitrary wait-for-response (with > > no > > > additional pinging). > > > > > > The argument against a preface ping was that it delays (ever so slightly) > the > > > sending of the first bits of the request. > > > > > > This logic relating to line 570 simply states that IF we just very recently > > > received data on the wire, then the preface ping serves no valuable > function, > > > and only delays the data. > > > > > > Recall that all parties to the discussion agreed that fewer pings were > better, > > > and this parameter allows for reducing the number of pings. If this is set > to > > > zero, then we always send a preface ping (when we have no current pings in > > > flight). Hence it can be set to instigate maximal pinging, or to allow more > > > conservative (fewer) pings when starting a new session on the back (just > after > > > finishing) receiving data from a previous session. > > > > > > I think this was in the spirit of all of our discussions. > > > > > > > > > > > > On 2011/10/15 08:08:03, willchan wrote: > > > > Is this a new optimization? I haven't heard about this one. > > > > > > > > Raman, in the future, please stop adding changes we haven't discussed > > > beforehand > > > > when we're trying to get changes in by the milestone cutoff. It would have > > > been > > > > much easier if we had gotten the quick and dirty change in right at the > get > > go > > > > and then could relax and get an updated version in later at a more > leisurely > > > > schedule. > > > > > > > > What's the failure mode of this change? If a NAT router reboots within > those > > > 10 > > > > seconds and thus we lose the connection mapping in its table, what > happens? > > > Will > > > > we ever catch this hang without a user-initiated refresh or something? Is > > that > > > > reasonable? What about connections for non-user initiated requests? Will > > those > > > > only time out after many minutes? > > > > > > http://codereview.chromium.org/8230037/diff/27005/net/spdy/spdy_session.h#new... net/spdy/spdy_session.h:570: static int connection_at_risk_of_loss_ms_; On 2011/10/15 23:33:11, ramant wrote: > On 2011/10/15 08:08:03, willchan wrote: > > Is this a new optimization? I haven't heard about this one. > > > > Raman, in the future, please stop adding changes we haven't discussed > beforehand > > when we're trying to get changes in by the milestone cutoff. It would have > been > > much easier if we had gotten the quick and dirty change in right at the get go > > and then could relax and get an updated version in later at a more leisurely > > schedule. > > > > What's the failure mode of this change? If a NAT router reboots within those > 10 > > seconds and thus we lose the connection mapping in its table, what happens? > Will > > we ever catch this hang without a user-initiated refresh or something? Is that > > reasonable? What about connections for non-user initiated requests? Will those > > only time out after many minutes? > > I think it is just variable name change. In patch 14, we were using hunginterval > to send the preface ping in SendPing. Oh wow, I totally missed that previously. My bad. I didn't see it as one of the optimizations in http://code.google.com/p/chromium/issues/detail?id=89725, so I thought it was new. > > http://codereview.chromium.org/8230037/diff/25001/net/spdy/spdy_session.cc > > I am not sure what should be right interval time.
> The only change I think that's worth making at this time is setting > the default connection_at_risk_of_loss_ms_ value to 0. Done. thanks much willchan, raman
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/rtenneti@chromium.org/8230037/43007
Change committed as 105723 |