Issue 174287: Resend on IO errors on late bound sockets that were idle.

willchan no longer on Chromium

11 years, 4 months ago (2009-08-22 00:46:26 UTC) #1

wtc

LGTM. You should document the three SocketReuseType values in client_socket_handle.h.

11 years, 4 months ago (2009-08-22 01:07:12 UTC) #2

willchan no longer on Chromium

On 2009/08/22 01:07:12, wtc wrote: > LGTM. > > You should document the three SocketReuseType ...

11 years, 4 months ago (2009-08-22 01:39:32 UTC) #3

wtc

http://codereview.chromium.org/174287/diff/4/1003 File net/http/http_network_transaction.cc (right): http://codereview.chromium.org/174287/diff/4/1003#newcode1557 Line 1557: // We didn't use an idle socket. I ...

11 years, 4 months ago (2009-08-22 02:05:13 UTC) #4

willchan no longer on Chromium

http://codereview.chromium.org/174287/diff/4/1003 File net/http/http_network_transaction.cc (right): http://codereview.chromium.org/174287/diff/4/1003#newcode1557 Line 1557: // We didn't use an idle socket. On ...

11 years, 4 months ago (2009-08-22 02:17:09 UTC) #5

darin (slow to review)

I'm confused by this change. If the idle time is very short for a late-bound ...

11 years, 4 months ago (2009-08-24 16:54:41 UTC) #6

willchan no longer on Chromium

I don't think the RST errors are a function of whether or not the newly ...

11 years, 4 months ago (2009-08-24 17:34:37 UTC) #7

I don't think the RST errors are a function of whether or not the newly
connected socket was immediately used, but how much time it sits idle before
use.  I think it's perfectly fine to keep unused idle late-bound sockets up to X
idle time.  I'm trying to figure out what X should be based on histogram data.

If you look at
http://www.corp.google.com/~chrome-eng/dashboards/histograms/index.html?data_...,
you'll see that 90% of RSTs occur on a idle, unused socket happen after 10
seconds.  I misread the code and thought the idle sockets were timed out after
10 seconds, but I see now that it's just that the timer fires every 10 seconds
to time out idle sockets after 5 minutes, so I'm changing the histogram to
measure up to 6 minutes.  I'm also adding histograms to see how long a socket
sits idle before it gets used.  I plan to compare these two histograms to
determine the appropriate amount of time to let sockets sit idle before they get
closed.  I have the same data for already used (HTTP keep-alive), idle sockets,
so it's possible we might want to time those out earlier than 5 seconds as well.

This change is just to mitigate user visible ERR_CONNECTION_RESETs while I add
the extra histograms this week to figure out an appropriate idle time.  Given
the above data, are you still sure we never want to keep unused, idle late-bound
sockets?  I haven't made that change yet, because I can't gather the data I want
then.

On 2009/08/24 16:54:41, darin wrote:
> I'm confused by this change.  If the idle time is very short for a late-bound
> socket, then won't it look a lot like a newly opened socket for a normal HTTP
> request?  In which case, it seems like you are saying that we should fallback
to
> retrying for RST errors that occur on a newly opened socket.  That doesn't
seem
> correct to me.
> 
> I think it is a mistake to keep idle late-bound sockets that are not used.  (I
> don't know if I am using the right terminology.)
> 
> http://codereview.chromium.org/174287/diff/7/1008
> File net/socket/client_socket_handle.h (right):
> 
> http://codereview.chromium.org/174287/diff/7/1008#newcode31
> Line 31: UNUSED = 0,   // unused socket that just finished connectin
> nit: "connecting"

willchan no longer on Chromium

Woops, I forgot to mention which histograms to look at. Take a look at the ...

11 years, 4 months ago (2009-08-24 17:40:38 UTC) #8

Woops, I forgot to mention which histograms to look at.  Take a look at the
Net.SocketIdleTimeOnIOError* histograms to see how long a socket is idle before
it RSTs (or otherwise closes/aborts).

On 2009/08/24 17:34:37, willchan wrote:
> I don't think the RST errors are a function of whether or not the newly
> connected socket was immediately used, but how much time it sits idle before
> use.  I think it's perfectly fine to keep unused idle late-bound sockets up to
X
> idle time.  I'm trying to figure out what X should be based on histogram data.
> 
> If you look at
>
http://www.corp.google.com/%7Echrome-eng/dashboards/histograms/index.html?dat...,
> you'll see that 90% of RSTs occur on a idle, unused socket happen after 10
> seconds.  I misread the code and thought the idle sockets were timed out after
> 10 seconds, but I see now that it's just that the timer fires every 10 seconds
> to time out idle sockets after 5 minutes, so I'm changing the histogram to
> measure up to 6 minutes.  I'm also adding histograms to see how long a socket
> sits idle before it gets used.  I plan to compare these two histograms to
> determine the appropriate amount of time to let sockets sit idle before they
get
> closed.  I have the same data for already used (HTTP keep-alive), idle
sockets,
> so it's possible we might want to time those out earlier than 5 seconds as
well.
> 
> This change is just to mitigate user visible ERR_CONNECTION_RESETs while I add
> the extra histograms this week to figure out an appropriate idle time.  Given
> the above data, are you still sure we never want to keep unused, idle
late-bound
> sockets?  I haven't made that change yet, because I can't gather the data I
want
> then.
> 
> On 2009/08/24 16:54:41, darin wrote:
> > I'm confused by this change.  If the idle time is very short for a
late-bound
> > socket, then won't it look a lot like a newly opened socket for a normal
HTTP
> > request?  In which case, it seems like you are saying that we should
fallback
> to
> > retrying for RST errors that occur on a newly opened socket.  That doesn't
> seem
> > correct to me.
> > 
> > I think it is a mistake to keep idle late-bound sockets that are not used. 
(I
> > don't know if I am using the right terminology.)
> > 
> > http://codereview.chromium.org/174287/diff/7/1008
> > File net/socket/client_socket_handle.h (right):
> > 
> > http://codereview.chromium.org/174287/diff/7/1008#newcode31
> > Line 31: UNUSED = 0,   // unused socket that just finished connectin
> > nit: "connecting"

wtc

On 2009/08/24 16:54:41, darin wrote: > > I think it is a mistake to keep ...

11 years, 4 months ago (2009-08-24 17:49:02 UTC) #9

willchan no longer on Chromium

Yeah, I know both of you said this :) Do you guys still stand by ...

11 years, 4 months ago (2009-08-24 17:54:53 UTC) #10

willchan no longer on Chromium

That's a very valid concern. I'm happy to change it only to retry on the ...

11 years, 4 months ago (2009-08-24 17:56:32 UTC) #11

That's a very valid concern.  I'm happy to change it only to retry on
the ERR_CONNECTION_CLOSED case (I've been meaning to gather data on
what the distribution is of ERR_CONNECTION_* is too).  I'll go ahead
and make that change.  I'm not clear though why this concern is only
valid for a never-before-used socket.  If we get ERR_CONNECTION_CLOSED
or ERR_CONNECTION_ABORTED, doesn't it seem possible that the same
problem occurred?  Or perhaps it's just less likely?

On Mon, Aug 24, 2009 at 10:41 AM, Darin Fisher<darin@chromium.org> wrote:
> So, I think what you are doing is safe for the RST case since that error =
is
> an indication that the other side did not receive the data. =A0But what a=
bout
> ERR_CONNECTION_CLOSED and ERR_CONNECTION_ABORTED?
> My concern with retrying on an error observed after attempting to write t=
o a
> never-before-used socket is that we might end up issuing the HTTP request
> twice. =A0That could be very bad.
> -Darin
>
> On Mon, Aug 24, 2009 at 10:34 AM, <willchan@chromium.org> wrote:
>>
>> I don't think the RST errors are a function of whether or not the newly
>> connected socket was immediately used, but how much time it sits idle
>> before use. =A0I think it's perfectly fine to keep unused idle late-boun=
d
>> sockets up to X idle time. =A0I'm trying to figure out what X should be
>> based on histogram data.
>>
>> If you look at
>>
>> http://www.corp.google.com/~chrome-eng/dashboards/histograms/index.html?=
data_source=3Dyesterday.js&version=3D4.0.202.0-.&group=3DNet&histogram=3DNe=
t,
>> you'll see that 90% of RSTs occur on a idle, unused socket happen after
>> 10 seconds. =A0I misread the code and thought the idle sockets were time=
d
>> out after 10 seconds, but I see now that it's just that the timer fires
>> every 10 seconds to time out idle sockets after 5 minutes, so I'm
>> changing the histogram to measure up to 6 minutes. =A0I'm also adding
>> histograms to see how long a socket sits idle before it gets used. =A0I
>> plan to compare these two histograms to determine the appropriate amount
>> of time to let sockets sit idle before they get closed. =A0I have the sa=
me
>> data for already used (HTTP keep-alive), idle sockets, so it's possible
>> we might want to time those out earlier than 5 seconds as well.
>>
>> This change is just to mitigate user visible ERR_CONNECTION_RESETs while
>> I add the extra histograms this week to figure out an appropriate idle
>> time. =A0Given the above data, are you still sure we never want to keep
>> unused, idle late-bound sockets? =A0I haven't made that change yet,
>> because I can't gather the data I want then.
>>
>> On 2009/08/24 16:54:41, darin wrote:
>>>
>>> I'm confused by this change. =A0If the idle time is very short for a
>>
>> late-bound
>>>
>>> socket, then won't it look a lot like a newly opened socket for a
>>
>> normal HTTP
>>>
>>> request? =A0In which case, it seems like you are saying that we should
>>
>> fallback to
>>>
>>> retrying for RST errors that occur on a newly opened socket. =A0That
>>
>> doesn't seem
>>>
>>> correct to me.
>>
>>> I think it is a mistake to keep idle late-bound sockets that are not
>>
>> used. =A0(I
>>>
>>> don't know if I am using the right terminology.)
>>
>>> http://codereview.chromium.org/174287/diff/7/1008
>>> File net/socket/client_socket_handle.h (right):
>>
>>> http://codereview.chromium.org/174287/diff/7/1008#newcode31
>>> Line 31: UNUSED =3D 0, =A0 // unused socket that just finished connecti=
n
>>> nit: "connecting"
>>
>>
>>
>> http://codereview.chromium.org/174287
>
>

darin (slow to review)

So, I think what you are doing is safe for the RST case since that ...

11 years, 4 months ago (2009-08-24 17:57:13 UTC) #12

So, I think what you are doing is safe for the RST case since that error is
an indication that the other side did not receive the data.  But what about
ERR_CONNECTION_CLOSED and ERR_CONNECTION_ABORTED?
My concern with retrying on an error observed after attempting to write to a
never-before-used socket is that we might end up issuing the HTTP request
twice.  That could be very bad.

-Darin


On Mon, Aug 24, 2009 at 10:34 AM, <willchan@chromium.org> wrote:

> I don't think the RST errors are a function of whether or not the newly
> connected socket was immediately used, but how much time it sits idle
> before use.  I think it's perfectly fine to keep unused idle late-bound
> sockets up to X idle time.  I'm trying to figure out what X should be
> based on histogram data.
>
> If you look at
>
>
http://www.corp.google.com/~chrome-eng/dashboards/histograms/index.html?data_...
> ,
> you'll see that 90% of RSTs occur on a idle, unused socket happen after
> 10 seconds.  I misread the code and thought the idle sockets were timed
> out after 10 seconds, but I see now that it's just that the timer fires
> every 10 seconds to time out idle sockets after 5 minutes, so I'm
> changing the histogram to measure up to 6 minutes.  I'm also adding
> histograms to see how long a socket sits idle before it gets used.  I
> plan to compare these two histograms to determine the appropriate amount
> of time to let sockets sit idle before they get closed.  I have the same
> data for already used (HTTP keep-alive), idle sockets, so it's possible
> we might want to time those out earlier than 5 seconds as well.
>
> This change is just to mitigate user visible ERR_CONNECTION_RESETs while
> I add the extra histograms this week to figure out an appropriate idle
> time.  Given the above data, are you still sure we never want to keep
> unused, idle late-bound sockets?  I haven't made that change yet,
> because I can't gather the data I want then.
>
>
> On 2009/08/24 16:54:41, darin wrote:
>
>> I'm confused by this change.  If the idle time is very short for a
>>
> late-bound
>
>> socket, then won't it look a lot like a newly opened socket for a
>>
> normal HTTP
>
>> request?  In which case, it seems like you are saying that we should
>>
> fallback to
>
>> retrying for RST errors that occur on a newly opened socket.  That
>>
> doesn't seem
>
>> correct to me.
>>
>
>  I think it is a mistake to keep idle late-bound sockets that are not
>>
> used.  (I
>
>> don't know if I am using the right terminology.)
>>
>
>  http://codereview.chromium.org/174287/diff/7/1008
>> File net/socket/client_socket_handle.h (right):
>>
>
>  http://codereview.chromium.org/174287/diff/7/1008#newcode31
>> Line 31: UNUSED = 0,   // unused socket that just finished connectin
>> nit: "connecting"
>>
>
>
>
> http://codereview.chromium.org/174287
>

willchan no longer on Chromium

On Mon, Aug 24, 2009 at 10:54 AM, Darin Fisher<darin@chromium.org> wrote: > Backing up a ...

11 years, 4 months ago (2009-08-24 17:57:27 UTC) #13

On Mon, Aug 24, 2009 at 10:54 AM, Darin Fisher<darin@chromium.org> wrote:
> Backing up a second... I agree that finding the right timeout value is th=
e
> best answer here. =C2=A0If we can discover that, then it makes sense to r=
euse
> sockets that were connected but never used.
> If the only way to do that is to have this code in place temporarily, the=
n
> so be it.

Ok, cool.

> If we can narrow the experiment to only the error condition we are
> interested in, then that is probably best.

Sounds great

> Q: =C2=A0Why do you want to change it to only retry on ERR_CONNECTION_CLO=
SED and
> not ERR_CONNECTION_RESET? =C2=A0I thought this was about RST errors.

Because I clearly haven't woken up yet :)  Thanks for correcting that,
I meant to say ERR_CONNECTION_RESET.

>
> -Darin
>
>
> On Mon, Aug 24, 2009 at 10:48 AM, William Chan (=E9=99=88=E6=99=BA=E6=98=
=8C) <willchan@chromium.org>
> wrote:
>>
>> That's a very valid concern. =C2=A0I'm happy to change it only to retry =
on
>> the ERR_CONNECTION_CLOSED case (I've been meaning to gather data on
>> what the distribution is of ERR_CONNECTION_* is too). =C2=A0I'll go ahea=
d
>> and make that change. =C2=A0I'm not clear though why this concern is onl=
y
>> valid for a never-before-used socket. =C2=A0If we get ERR_CONNECTION_CLO=
SED
>> or ERR_CONNECTION_ABORTED, doesn't it seem possible that the same
>> problem occurred? =C2=A0Or perhaps it's just less likely?
>>
>> On Mon, Aug 24, 2009 at 10:41 AM, Darin Fisher<darin@chromium.org> wrote=
:
>> > So, I think what you are doing is safe for the RST case since that err=
or
>> > is
>> > an indication that the other side did not receive the data. =C2=A0But =
what
>> > about
>> > ERR_CONNECTION_CLOSED and ERR_CONNECTION_ABORTED?
>> > My concern with retrying on an error observed after attempting to writ=
e
>> > to a
>> > never-before-used socket is that we might end up issuing the HTTP
>> > request
>> > twice. =C2=A0That could be very bad.
>> > -Darin
>> >
>> > On Mon, Aug 24, 2009 at 10:34 AM, <willchan@chromium.org> wrote:
>> >>
>> >> I don't think the RST errors are a function of whether or not the new=
ly
>> >> connected socket was immediately used, but how much time it sits idle
>> >> before use. =C2=A0I think it's perfectly fine to keep unused idle lat=
e-bound
>> >> sockets up to X idle time. =C2=A0I'm trying to figure out what X shou=
ld be
>> >> based on histogram data.
>> >>
>> >> If you look at
>> >>
>> >>
>> >> http://www.corp.google.com/~chrome-eng/dashboards/histograms/index.ht=
ml?data_source=3Dyesterday.js&version=3D4.0.202.0-.&group=3DNet&histogram=
=3DNet,
>> >> you'll see that 90% of RSTs occur on a idle, unused socket happen aft=
er
>> >> 10 seconds. =C2=A0I misread the code and thought the idle sockets wer=
e timed
>> >> out after 10 seconds, but I see now that it's just that the timer fir=
es
>> >> every 10 seconds to time out idle sockets after 5 minutes, so I'm
>> >> changing the histogram to measure up to 6 minutes. =C2=A0I'm also add=
ing
>> >> histograms to see how long a socket sits idle before it gets used. =
=C2=A0I
>> >> plan to compare these two histograms to determine the appropriate
>> >> amount
>> >> of time to let sockets sit idle before they get closed. =C2=A0I have =
the
>> >> same
>> >> data for already used (HTTP keep-alive), idle sockets, so it's possib=
le
>> >> we might want to time those out earlier than 5 seconds as well.
>> >>
>> >> This change is just to mitigate user visible ERR_CONNECTION_RESETs
>> >> while
>> >> I add the extra histograms this week to figure out an appropriate idl=
e
>> >> time. =C2=A0Given the above data, are you still sure we never want to=
 keep
>> >> unused, idle late-bound sockets? =C2=A0I haven't made that change yet=
,
>> >> because I can't gather the data I want then.
>> >>
>> >> On 2009/08/24 16:54:41, darin wrote:
>> >>>
>> >>> I'm confused by this change. =C2=A0If the idle time is very short fo=
r a
>> >>
>> >> late-bound
>> >>>
>> >>> socket, then won't it look a lot like a newly opened socket for a
>> >>
>> >> normal HTTP
>> >>>
>> >>> request? =C2=A0In which case, it seems like you are saying that we s=
hould
>> >>
>> >> fallback to
>> >>>
>> >>> retrying for RST errors that occur on a newly opened socket. =C2=A0T=
hat
>> >>
>> >> doesn't seem
>> >>>
>> >>> correct to me.
>> >>
>> >>> I think it is a mistake to keep idle late-bound sockets that are not
>> >>
>> >> used. =C2=A0(I
>> >>>
>> >>> don't know if I am using the right terminology.)
>> >>
>> >>> http://codereview.chromium.org/174287/diff/7/1008
>> >>> File net/socket/client_socket_handle.h (right):
>> >>
>> >>> http://codereview.chromium.org/174287/diff/7/1008#newcode31
>> >>> Line 31: UNUSED =3D 0, =C2=A0 // unused socket that just finished co=
nnectin
>> >>> nit: "connecting"
>> >>
>> >>
>> >>
>> >> http://codereview.chromium.org/174287
>> >
>> >
>
>

darin (slow to review)

On Mon, Aug 24, 2009 at 10:56 AM, William Chan (=B3=C2=D6=C7=B2=FD) <willchan@chromium.org>wrote: > On Mon, ...

11 years, 4 months ago (2009-08-24 17:59:06 UTC) #14

On Mon, Aug 24, 2009 at 10:56 AM, William Chan (=B3=C2=D6=C7=B2=FD)
<willchan@chromium.org>wrote:

> On Mon, Aug 24, 2009 at 10:54 AM, Darin Fisher<darin@chromium.org> wrote:
> > Backing up a second... I agree that finding the right timeout value is
> the
> > best answer here.  If we can discover that, then it makes sense to reus=
e
> > sockets that were connected but never used.
> > If the only way to do that is to have this code in place temporarily,
> then
> > so be it.
>
> Ok, cool.
>
> > If we can narrow the experiment to only the error condition we are
> > interested in, then that is probably best.
>
> Sounds great
>
> > Q:  Why do you want to change it to only retry on ERR_CONNECTION_CLOSED
> and
> > not ERR_CONNECTION_RESET?  I thought this was about RST errors.
>
> Because I clearly haven't woken up yet :)  Thanks for correcting that,
> I meant to say ERR_CONNECTION_RESET.
>
>
okay, cool.
-darin




> >
> > -Darin
> >
> >
> > On Mon, Aug 24, 2009 at 10:48 AM, William Chan (=B3=C2=D6=C7=B2=FD) <
> willchan@chromium.org>
> > wrote:
> >>
> >> That's a very valid concern.  I'm happy to change it only to retry on
> >> the ERR_CONNECTION_CLOSED case (I've been meaning to gather data on
> >> what the distribution is of ERR_CONNECTION_* is too).  I'll go ahead
> >> and make that change.  I'm not clear though why this concern is only
> >> valid for a never-before-used socket.  If we get ERR_CONNECTION_CLOSED
> >> or ERR_CONNECTION_ABORTED, doesn't it seem possible that the same
> >> problem occurred?  Or perhaps it's just less likely?
> >>
> >> On Mon, Aug 24, 2009 at 10:41 AM, Darin Fisher<darin@chromium.org>
> wrote:
> >> > So, I think what you are doing is safe for the RST case since that
> error
> >> > is
> >> > an indication that the other side did not receive the data.  But wha=
t
> >> > about
> >> > ERR_CONNECTION_CLOSED and ERR_CONNECTION_ABORTED?
> >> > My concern with retrying on an error observed after attempting to
> write
> >> > to a
> >> > never-before-used socket is that we might end up issuing the HTTP
> >> > request
> >> > twice.  That could be very bad.
> >> > -Darin
> >> >
> >> > On Mon, Aug 24, 2009 at 10:34 AM, <willchan@chromium.org> wrote:
> >> >>
> >> >> I don't think the RST errors are a function of whether or not the
> newly
> >> >> connected socket was immediately used, but how much time it sits id=
le
> >> >> before use.  I think it's perfectly fine to keep unused idle
> late-bound
> >> >> sockets up to X idle time.  I'm trying to figure out what X should =
be
> >> >> based on histogram data.
> >> >>
> >> >> If you look at
> >> >>
> >> >>
> >> >>
> http://www.corp.google.com/~chrome-eng/dashboards/histograms/index.html?d=
ata_source=3Dyesterday.js&version=3D4.0.202.0-.&group=3DNet&histogram=3DNet
> ,
> >> >> you'll see that 90% of RSTs occur on a idle, unused socket happen
> after
> >> >> 10 seconds.  I misread the code and thought the idle sockets were
> timed
> >> >> out after 10 seconds, but I see now that it's just that the timer
> fires
> >> >> every 10 seconds to time out idle sockets after 5 minutes, so I'm
> >> >> changing the histogram to measure up to 6 minutes.  I'm also adding
> >> >> histograms to see how long a socket sits idle before it gets used. =
 I
> >> >> plan to compare these two histograms to determine the appropriate
> >> >> amount
> >> >> of time to let sockets sit idle before they get closed.  I have the
> >> >> same
> >> >> data for already used (HTTP keep-alive), idle sockets, so it's
> possible
> >> >> we might want to time those out earlier than 5 seconds as well.
> >> >>
> >> >> This change is just to mitigate user visible ERR_CONNECTION_RESETs
> >> >> while
> >> >> I add the extra histograms this week to figure out an appropriate
> idle
> >> >> time.  Given the above data, are you still sure we never want to ke=
ep
> >> >> unused, idle late-bound sockets?  I haven't made that change yet,
> >> >> because I can't gather the data I want then.
> >> >>
> >> >> On 2009/08/24 16:54:41, darin wrote:
> >> >>>
> >> >>> I'm confused by this change.  If the idle time is very short for a
> >> >>
> >> >> late-bound
> >> >>>
> >> >>> socket, then won't it look a lot like a newly opened socket for a
> >> >>
> >> >> normal HTTP
> >> >>>
> >> >>> request?  In which case, it seems like you are saying that we shou=
ld
> >> >>
> >> >> fallback to
> >> >>>
> >> >>> retrying for RST errors that occur on a newly opened socket.  That
> >> >>
> >> >> doesn't seem
> >> >>>
> >> >>> correct to me.
> >> >>
> >> >>> I think it is a mistake to keep idle late-bound sockets that are n=
ot
> >> >>
> >> >> used.  (I
> >> >>>
> >> >>> don't know if I am using the right terminology.)
> >> >>
> >> >>> http://codereview.chromium.org/174287/diff/7/1008
> >> >>> File net/socket/client_socket_handle.h (right):
> >> >>
> >> >>> http://codereview.chromium.org/174287/diff/7/1008#newcode31
> >> >>> Line 31: UNUSED =3D 0,   // unused socket that just finished conne=
ctin
> >> >>> nit: "connecting"
> >> >>
> >> >>
> >> >>
> >> >> http://codereview.chromium.org/174287
> >> >
> >> >
> >
> >
>

darin (slow to review)

Backing up a second... I agree that finding the right timeout value is the best ...

11 years, 4 months ago (2009-08-24 18:06:08 UTC) #15

Backing up a second... I agree that finding the right timeout value is the
best answer here.  If we can discover that, then it makes sense to reuse
sockets that were connected but never used.
If the only way to do that is to have this code in place temporarily, then
so be it.

If we can narrow the experiment to only the error condition we are
interested in, then that is probably best.

Q:  Why do you want to change it to only retry on ERR_CONNECTION_CLOSED and
not ERR_CONNECTION_RESET?  I thought this was about RST errors.

-Darin



On Mon, Aug 24, 2009 at 10:48 AM, William Chan (=E9=99=88=E6=99=BA=E6=98=8C=
)
<willchan@chromium.org>wrote:

> That's a very valid concern.  I'm happy to change it only to retry on
> the ERR_CONNECTION_CLOSED case (I've been meaning to gather data on
> what the distribution is of ERR_CONNECTION_* is too).  I'll go ahead
> and make that change.  I'm not clear though why this concern is only
> valid for a never-before-used socket.  If we get ERR_CONNECTION_CLOSED
> or ERR_CONNECTION_ABORTED, doesn't it seem possible that the same
> problem occurred?  Or perhaps it's just less likely?
>
> On Mon, Aug 24, 2009 at 10:41 AM, Darin Fisher<darin@chromium.org> wrote:
> > So, I think what you are doing is safe for the RST case since that erro=
r
> is
> > an indication that the other side did not receive the data.  But what
> about
> > ERR_CONNECTION_CLOSED and ERR_CONNECTION_ABORTED?
> > My concern with retrying on an error observed after attempting to write
> to a
> > never-before-used socket is that we might end up issuing the HTTP reque=
st
> > twice.  That could be very bad.
> > -Darin
> >
> > On Mon, Aug 24, 2009 at 10:34 AM, <willchan@chromium.org> wrote:
> >>
> >> I don't think the RST errors are a function of whether or not the newl=
y
> >> connected socket was immediately used, but how much time it sits idle
> >> before use.  I think it's perfectly fine to keep unused idle late-boun=
d
> >> sockets up to X idle time.  I'm trying to figure out what X should be
> >> based on histogram data.
> >>
> >> If you look at
> >>
> >>
> http://www.corp.google.com/~chrome-eng/dashboards/histograms/index.html?d=
ata_source=3Dyesterday.js&version=3D4.0.202.0-.&group=3DNet&histogram=3DNet
> ,
> >> you'll see that 90% of RSTs occur on a idle, unused socket happen afte=
r
> >> 10 seconds.  I misread the code and thought the idle sockets were time=
d
> >> out after 10 seconds, but I see now that it's just that the timer fire=
s
> >> every 10 seconds to time out idle sockets after 5 minutes, so I'm
> >> changing the histogram to measure up to 6 minutes.  I'm also adding
> >> histograms to see how long a socket sits idle before it gets used.  I
> >> plan to compare these two histograms to determine the appropriate amou=
nt
> >> of time to let sockets sit idle before they get closed.  I have the sa=
me
> >> data for already used (HTTP keep-alive), idle sockets, so it's possibl=
e
> >> we might want to time those out earlier than 5 seconds as well.
> >>
> >> This change is just to mitigate user visible ERR_CONNECTION_RESETs whi=
le
> >> I add the extra histograms this week to figure out an appropriate idle
> >> time.  Given the above data, are you still sure we never want to keep
> >> unused, idle late-bound sockets?  I haven't made that change yet,
> >> because I can't gather the data I want then.
> >>
> >> On 2009/08/24 16:54:41, darin wrote:
> >>>
> >>> I'm confused by this change.  If the idle time is very short for a
> >>
> >> late-bound
> >>>
> >>> socket, then won't it look a lot like a newly opened socket for a
> >>
> >> normal HTTP
> >>>
> >>> request?  In which case, it seems like you are saying that we should
> >>
> >> fallback to
> >>>
> >>> retrying for RST errors that occur on a newly opened socket.  That
> >>
> >> doesn't seem
> >>>
> >>> correct to me.
> >>
> >>> I think it is a mistake to keep idle late-bound sockets that are not
> >>
> >> used.  (I
> >>>
> >>> don't know if I am using the right terminology.)
> >>
> >>> http://codereview.chromium.org/174287/diff/7/1008
> >>> File net/socket/client_socket_handle.h (right):
> >>
> >>> http://codereview.chromium.org/174287/diff/7/1008#newcode31
> >>> Line 31: UNUSED =3D 0,   // unused socket that just finished connecti=
n
> >>> nit: "connecting"
> >>
> >>
> >>
> >> http://codereview.chromium.org/174287
> >
> >
>

wtc

Yes, ERR_CONNECTION_CLOSED isn't returned by socket functions when TCP is used. I looked into ERR_CONNECTION_CLOSED ...

11 years, 4 months ago (2009-08-24 18:18:38 UTC) #16

darin (slow to review)

On Mon, Aug 24, 2009 at 11:18 AM, <wtc@chromium.org> wrote: > Yes, ERR_CONNECTION_CLOSED isn't returned ...

11 years, 4 months ago (2009-08-24 20:56:29 UTC) #17

wtc

On 2009/08/24 20:56:29, darin wrote: > On Mon, Aug 24, 2009 at 11:18 AM, <mailto:wtc@chromium.org> ...

11 years, 4 months ago (2009-08-26 00:48:30 UTC) #18

darin (slow to review)

11 years, 4 months ago (2009-08-26 04:51:33 UTC) #19

On Tue, Aug 25, 2009 at 5:48 PM, <wtc@chromium.org> wrote:

> On 2009/08/24 20:56:29, darin wrote:
>
>> On Mon, Aug 24, 2009 at 11:18 AM, <mailto:wtc@chromium.org> wrote:
>>
>
>  > We should take this opportunity to define exactly what
>> > ERR_CONNECTION_CLOSED means because our error handling
>> > depends on this error code.  Perhaps we should add a new
>> > error code for SOCKS and other layered protocols, and remove
>> > ERR_CONNECTION_CLOSED.
>> >
>> >
>> +1
>>
>
>  What is the SOCKS use case?
>>
>
> SOCKS uses ERR_CONNECTION_CLOSED to mean the SOCKS
> layer gets an unexpected EOF (0, as opposed to an
> error like ECONNRESET) from the underlying socket.
> This is a reasonable interpretation of ERR_CONNECTION_CLOSED
> in a layered protocol.
>
> Should we define ERR_CONNECTION_CLOSED that way?
>
>
Maybe



> Or should we add a new error code ERR_UNEXPECTED_EOF
> (or more verbosely, ERR_UNDERLYING_CONNECTION_CLOSED)?
>


It seems like what we want is a signal from the socket that tells us whether
or not data that we wrote to it was delivered.  This is what TCP RST means
(closed without all sent packets receiving an ACK), so maybe the SOCKS layer
should just be returning ERR_CONNECTION_RESET if it finds the underlying
connection to be prematurely closed after it has accepted data from the
application layer.

-Darin





>
> Note: we could also use an *internal* error code ERR_EOF
> in two places to disambiguate a rv of 0 in DoLoop, which
> may mean
> - we read and reached EOF, or
> - we didn't need to read for some reason, for example,
>  in a HTTP response with a known length, we've reached
>  the end of the response body on a keep-alive connection
>  (hence no EOF).
> Right now we're using additional bool members to help
> disambiguate the 0 rv value.
>
>
> http://codereview.chromium.org/174287
>

Issue 174287: Resend on IO errors on late bound sockets that were idle. (Closed)

Description

Patch Set 1 #

Patch Set 2 : Add comments for ClientSocketHandle::SocketReuseType. #

Patch Set 3 : Update comment. #

Messages