|
|
Chromium Code Reviews|
Created:
11 years, 4 months ago by willchan no longer on Chromium Modified:
9 years, 7 months ago CC:
chromium-reviews_googlegroups.com Visibility:
Public. |
DescriptionResend on IO errors on late bound sockets that were idle.
According to UMA data, late bound sockets that were idle are significantly more likely to get (reset/close/abort) errors. Currently, we don't resend on late bound sockets that were idle because they weren't reused. This changes that.
TODO: determine how long a socket has to be idle before it is likely to get a TCP RST if we try to reuse it.
Also document the ClientSocketHandle::ReuseSocketType values.
BUG=http://crbug.com/18192.
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=24084
Patch Set 1 #Patch Set 2 : Add comments for ClientSocketHandle::SocketReuseType. #
Total comments: 2
Patch Set 3 : Update comment. #
Total comments: 1
Messages
Total messages: 19 (0 generated)
LGTM. You should document the three SocketReuseType values in client_socket_handle.h.
On 2009/08/22 01:07:12, wtc wrote: > LGTM. > > You should document the three SocketReuseType > values in client_socket_handle.h. I've added documentation and will update the changelist description too before landing this.
http://codereview.chromium.org/174287/diff/4/1003 File net/http/http_network_transaction.cc (right): http://codereview.chromium.org/174287/diff/4/1003#newcode1557 Line 1557: // We didn't use an idle socket. I suggest that we modify this comment to match the new code more closely. One reason it took me a while to review this CL, other than it's Friday afternoon, was that this comment has a negative sense ("didn't use"), but the conditional expression is in a positive sense (==). How about "We used a socket that was never idle"?
http://codereview.chromium.org/174287/diff/4/1003 File net/http/http_network_transaction.cc (right): http://codereview.chromium.org/174287/diff/4/1003#newcode1557 Line 1557: // We didn't use an idle socket. On 2009/08/22 02:05:13, wtc wrote: > I suggest that we modify this comment to match the new > code more closely. > > One reason it took me a while to review this CL, > other than it's Friday afternoon, was that this comment > has a negative sense ("didn't use"), but the conditional > expression is in a positive sense (==). > > How about "We used a socket that was never idle"? Done.
I'm confused by this change. If the idle time is very short for a late-bound socket, then won't it look a lot like a newly opened socket for a normal HTTP request? In which case, it seems like you are saying that we should fallback to retrying for RST errors that occur on a newly opened socket. That doesn't seem correct to me. I think it is a mistake to keep idle late-bound sockets that are not used. (I don't know if I am using the right terminology.) http://codereview.chromium.org/174287/diff/7/1008 File net/socket/client_socket_handle.h (right): http://codereview.chromium.org/174287/diff/7/1008#newcode31 Line 31: UNUSED = 0, // unused socket that just finished connectin nit: "connecting"
I don't think the RST errors are a function of whether or not the newly connected socket was immediately used, but how much time it sits idle before use. I think it's perfectly fine to keep unused idle late-bound sockets up to X idle time. I'm trying to figure out what X should be based on histogram data. If you look at http://www.corp.google.com/~chrome-eng/dashboards/histograms/index.html?data_..., you'll see that 90% of RSTs occur on a idle, unused socket happen after 10 seconds. I misread the code and thought the idle sockets were timed out after 10 seconds, but I see now that it's just that the timer fires every 10 seconds to time out idle sockets after 5 minutes, so I'm changing the histogram to measure up to 6 minutes. I'm also adding histograms to see how long a socket sits idle before it gets used. I plan to compare these two histograms to determine the appropriate amount of time to let sockets sit idle before they get closed. I have the same data for already used (HTTP keep-alive), idle sockets, so it's possible we might want to time those out earlier than 5 seconds as well. This change is just to mitigate user visible ERR_CONNECTION_RESETs while I add the extra histograms this week to figure out an appropriate idle time. Given the above data, are you still sure we never want to keep unused, idle late-bound sockets? I haven't made that change yet, because I can't gather the data I want then. On 2009/08/24 16:54:41, darin wrote: > I'm confused by this change. If the idle time is very short for a late-bound > socket, then won't it look a lot like a newly opened socket for a normal HTTP > request? In which case, it seems like you are saying that we should fallback to > retrying for RST errors that occur on a newly opened socket. That doesn't seem > correct to me. > > I think it is a mistake to keep idle late-bound sockets that are not used. (I > don't know if I am using the right terminology.) > > http://codereview.chromium.org/174287/diff/7/1008 > File net/socket/client_socket_handle.h (right): > > http://codereview.chromium.org/174287/diff/7/1008#newcode31 > Line 31: UNUSED = 0, // unused socket that just finished connectin > nit: "connecting"
Woops, I forgot to mention which histograms to look at. Take a look at the Net.SocketIdleTimeOnIOError* histograms to see how long a socket is idle before it RSTs (or otherwise closes/aborts). On 2009/08/24 17:34:37, willchan wrote: > I don't think the RST errors are a function of whether or not the newly > connected socket was immediately used, but how much time it sits idle before > use. I think it's perfectly fine to keep unused idle late-bound sockets up to X > idle time. I'm trying to figure out what X should be based on histogram data. > > If you look at > http://www.corp.google.com/%7Echrome-eng/dashboards/histograms/index.html?dat..., > you'll see that 90% of RSTs occur on a idle, unused socket happen after 10 > seconds. I misread the code and thought the idle sockets were timed out after > 10 seconds, but I see now that it's just that the timer fires every 10 seconds > to time out idle sockets after 5 minutes, so I'm changing the histogram to > measure up to 6 minutes. I'm also adding histograms to see how long a socket > sits idle before it gets used. I plan to compare these two histograms to > determine the appropriate amount of time to let sockets sit idle before they get > closed. I have the same data for already used (HTTP keep-alive), idle sockets, > so it's possible we might want to time those out earlier than 5 seconds as well. > > This change is just to mitigate user visible ERR_CONNECTION_RESETs while I add > the extra histograms this week to figure out an appropriate idle time. Given > the above data, are you still sure we never want to keep unused, idle late-bound > sockets? I haven't made that change yet, because I can't gather the data I want > then. > > On 2009/08/24 16:54:41, darin wrote: > > I'm confused by this change. If the idle time is very short for a late-bound > > socket, then won't it look a lot like a newly opened socket for a normal HTTP > > request? In which case, it seems like you are saying that we should fallback > to > > retrying for RST errors that occur on a newly opened socket. That doesn't > seem > > correct to me. > > > > I think it is a mistake to keep idle late-bound sockets that are not used. (I > > don't know if I am using the right terminology.) > > > > http://codereview.chromium.org/174287/diff/7/1008 > > File net/socket/client_socket_handle.h (right): > > > > http://codereview.chromium.org/174287/diff/7/1008#newcode31 > > Line 31: UNUSED = 0, // unused socket that just finished connectin > > nit: "connecting"
On 2009/08/24 16:54:41, darin wrote: > > I think it is a mistake to keep idle late-bound sockets that are not used. (I > don't know if I am using the right terminology.) I also suggested this as a fix. Wan-Teh
Yeah, I know both of you said this :) Do you guys still stand by this, in spite of the histogram data on the IO errors corresponding to idle times? Perhaps I'm being too aggressive about trying to keep the sockets around? On Mon, Aug 24, 2009 at 10:49 AM, <wtc@chromium.org> wrote: > On 2009/08/24 16:54:41, darin wrote: > >> I think it is a mistake to keep idle late-bound sockets that are not > > used. =A0(I >> >> don't know if I am using the right terminology.) > > I also suggested this as a fix. > > Wan-Teh > > http://codereview.chromium.org/174287 >
That's a very valid concern. I'm happy to change it only to retry on the ERR_CONNECTION_CLOSED case (I've been meaning to gather data on what the distribution is of ERR_CONNECTION_* is too). I'll go ahead and make that change. I'm not clear though why this concern is only valid for a never-before-used socket. If we get ERR_CONNECTION_CLOSED or ERR_CONNECTION_ABORTED, doesn't it seem possible that the same problem occurred? Or perhaps it's just less likely? On Mon, Aug 24, 2009 at 10:41 AM, Darin Fisher<darin@chromium.org> wrote: > So, I think what you are doing is safe for the RST case since that error = is > an indication that the other side did not receive the data. =A0But what a= bout > ERR_CONNECTION_CLOSED and ERR_CONNECTION_ABORTED? > My concern with retrying on an error observed after attempting to write t= o a > never-before-used socket is that we might end up issuing the HTTP request > twice. =A0That could be very bad. > -Darin > > On Mon, Aug 24, 2009 at 10:34 AM, <willchan@chromium.org> wrote: >> >> I don't think the RST errors are a function of whether or not the newly >> connected socket was immediately used, but how much time it sits idle >> before use. =A0I think it's perfectly fine to keep unused idle late-boun= d >> sockets up to X idle time. =A0I'm trying to figure out what X should be >> based on histogram data. >> >> If you look at >> >> http://www.corp.google.com/~chrome-eng/dashboards/histograms/index.html?= data_source=3Dyesterday.js&version=3D4.0.202.0-.&group=3DNet&histogram=3DNe= t, >> you'll see that 90% of RSTs occur on a idle, unused socket happen after >> 10 seconds. =A0I misread the code and thought the idle sockets were time= d >> out after 10 seconds, but I see now that it's just that the timer fires >> every 10 seconds to time out idle sockets after 5 minutes, so I'm >> changing the histogram to measure up to 6 minutes. =A0I'm also adding >> histograms to see how long a socket sits idle before it gets used. =A0I >> plan to compare these two histograms to determine the appropriate amount >> of time to let sockets sit idle before they get closed. =A0I have the sa= me >> data for already used (HTTP keep-alive), idle sockets, so it's possible >> we might want to time those out earlier than 5 seconds as well. >> >> This change is just to mitigate user visible ERR_CONNECTION_RESETs while >> I add the extra histograms this week to figure out an appropriate idle >> time. =A0Given the above data, are you still sure we never want to keep >> unused, idle late-bound sockets? =A0I haven't made that change yet, >> because I can't gather the data I want then. >> >> On 2009/08/24 16:54:41, darin wrote: >>> >>> I'm confused by this change. =A0If the idle time is very short for a >> >> late-bound >>> >>> socket, then won't it look a lot like a newly opened socket for a >> >> normal HTTP >>> >>> request? =A0In which case, it seems like you are saying that we should >> >> fallback to >>> >>> retrying for RST errors that occur on a newly opened socket. =A0That >> >> doesn't seem >>> >>> correct to me. >> >>> I think it is a mistake to keep idle late-bound sockets that are not >> >> used. =A0(I >>> >>> don't know if I am using the right terminology.) >> >>> http://codereview.chromium.org/174287/diff/7/1008 >>> File net/socket/client_socket_handle.h (right): >> >>> http://codereview.chromium.org/174287/diff/7/1008#newcode31 >>> Line 31: UNUSED =3D 0, =A0 // unused socket that just finished connecti= n >>> nit: "connecting" >> >> >> >> http://codereview.chromium.org/174287 > >
So, I think what you are doing is safe for the RST case since that error is an indication that the other side did not receive the data. But what about ERR_CONNECTION_CLOSED and ERR_CONNECTION_ABORTED? My concern with retrying on an error observed after attempting to write to a never-before-used socket is that we might end up issuing the HTTP request twice. That could be very bad. -Darin On Mon, Aug 24, 2009 at 10:34 AM, <willchan@chromium.org> wrote: > I don't think the RST errors are a function of whether or not the newly > connected socket was immediately used, but how much time it sits idle > before use. I think it's perfectly fine to keep unused idle late-bound > sockets up to X idle time. I'm trying to figure out what X should be > based on histogram data. > > If you look at > > http://www.corp.google.com/~chrome-eng/dashboards/histograms/index.html?data_... > , > you'll see that 90% of RSTs occur on a idle, unused socket happen after > 10 seconds. I misread the code and thought the idle sockets were timed > out after 10 seconds, but I see now that it's just that the timer fires > every 10 seconds to time out idle sockets after 5 minutes, so I'm > changing the histogram to measure up to 6 minutes. I'm also adding > histograms to see how long a socket sits idle before it gets used. I > plan to compare these two histograms to determine the appropriate amount > of time to let sockets sit idle before they get closed. I have the same > data for already used (HTTP keep-alive), idle sockets, so it's possible > we might want to time those out earlier than 5 seconds as well. > > This change is just to mitigate user visible ERR_CONNECTION_RESETs while > I add the extra histograms this week to figure out an appropriate idle > time. Given the above data, are you still sure we never want to keep > unused, idle late-bound sockets? I haven't made that change yet, > because I can't gather the data I want then. > > > On 2009/08/24 16:54:41, darin wrote: > >> I'm confused by this change. If the idle time is very short for a >> > late-bound > >> socket, then won't it look a lot like a newly opened socket for a >> > normal HTTP > >> request? In which case, it seems like you are saying that we should >> > fallback to > >> retrying for RST errors that occur on a newly opened socket. That >> > doesn't seem > >> correct to me. >> > > I think it is a mistake to keep idle late-bound sockets that are not >> > used. (I > >> don't know if I am using the right terminology.) >> > > http://codereview.chromium.org/174287/diff/7/1008 >> File net/socket/client_socket_handle.h (right): >> > > http://codereview.chromium.org/174287/diff/7/1008#newcode31 >> Line 31: UNUSED = 0, // unused socket that just finished connectin >> nit: "connecting" >> > > > > http://codereview.chromium.org/174287 >
On Mon, Aug 24, 2009 at 10:54 AM, Darin Fisher<darin@chromium.org> wrote: > Backing up a second... I agree that finding the right timeout value is th= e > best answer here. =C2=A0If we can discover that, then it makes sense to r= euse > sockets that were connected but never used. > If the only way to do that is to have this code in place temporarily, the= n > so be it. Ok, cool. > If we can narrow the experiment to only the error condition we are > interested in, then that is probably best. Sounds great > Q: =C2=A0Why do you want to change it to only retry on ERR_CONNECTION_CLO= SED and > not ERR_CONNECTION_RESET? =C2=A0I thought this was about RST errors. Because I clearly haven't woken up yet :) Thanks for correcting that, I meant to say ERR_CONNECTION_RESET. > > -Darin > > > On Mon, Aug 24, 2009 at 10:48 AM, William Chan (=E9=99=88=E6=99=BA=E6=98= =8C) <willchan@chromium.org> > wrote: >> >> That's a very valid concern. =C2=A0I'm happy to change it only to retry = on >> the ERR_CONNECTION_CLOSED case (I've been meaning to gather data on >> what the distribution is of ERR_CONNECTION_* is too). =C2=A0I'll go ahea= d >> and make that change. =C2=A0I'm not clear though why this concern is onl= y >> valid for a never-before-used socket. =C2=A0If we get ERR_CONNECTION_CLO= SED >> or ERR_CONNECTION_ABORTED, doesn't it seem possible that the same >> problem occurred? =C2=A0Or perhaps it's just less likely? >> >> On Mon, Aug 24, 2009 at 10:41 AM, Darin Fisher<darin@chromium.org> wrote= : >> > So, I think what you are doing is safe for the RST case since that err= or >> > is >> > an indication that the other side did not receive the data. =C2=A0But = what >> > about >> > ERR_CONNECTION_CLOSED and ERR_CONNECTION_ABORTED? >> > My concern with retrying on an error observed after attempting to writ= e >> > to a >> > never-before-used socket is that we might end up issuing the HTTP >> > request >> > twice. =C2=A0That could be very bad. >> > -Darin >> > >> > On Mon, Aug 24, 2009 at 10:34 AM, <willchan@chromium.org> wrote: >> >> >> >> I don't think the RST errors are a function of whether or not the new= ly >> >> connected socket was immediately used, but how much time it sits idle >> >> before use. =C2=A0I think it's perfectly fine to keep unused idle lat= e-bound >> >> sockets up to X idle time. =C2=A0I'm trying to figure out what X shou= ld be >> >> based on histogram data. >> >> >> >> If you look at >> >> >> >> >> >> http://www.corp.google.com/~chrome-eng/dashboards/histograms/index.ht= ml?data_source=3Dyesterday.js&version=3D4.0.202.0-.&group=3DNet&histogram= =3DNet, >> >> you'll see that 90% of RSTs occur on a idle, unused socket happen aft= er >> >> 10 seconds. =C2=A0I misread the code and thought the idle sockets wer= e timed >> >> out after 10 seconds, but I see now that it's just that the timer fir= es >> >> every 10 seconds to time out idle sockets after 5 minutes, so I'm >> >> changing the histogram to measure up to 6 minutes. =C2=A0I'm also add= ing >> >> histograms to see how long a socket sits idle before it gets used. = =C2=A0I >> >> plan to compare these two histograms to determine the appropriate >> >> amount >> >> of time to let sockets sit idle before they get closed. =C2=A0I have = the >> >> same >> >> data for already used (HTTP keep-alive), idle sockets, so it's possib= le >> >> we might want to time those out earlier than 5 seconds as well. >> >> >> >> This change is just to mitigate user visible ERR_CONNECTION_RESETs >> >> while >> >> I add the extra histograms this week to figure out an appropriate idl= e >> >> time. =C2=A0Given the above data, are you still sure we never want to= keep >> >> unused, idle late-bound sockets? =C2=A0I haven't made that change yet= , >> >> because I can't gather the data I want then. >> >> >> >> On 2009/08/24 16:54:41, darin wrote: >> >>> >> >>> I'm confused by this change. =C2=A0If the idle time is very short fo= r a >> >> >> >> late-bound >> >>> >> >>> socket, then won't it look a lot like a newly opened socket for a >> >> >> >> normal HTTP >> >>> >> >>> request? =C2=A0In which case, it seems like you are saying that we s= hould >> >> >> >> fallback to >> >>> >> >>> retrying for RST errors that occur on a newly opened socket. =C2=A0T= hat >> >> >> >> doesn't seem >> >>> >> >>> correct to me. >> >> >> >>> I think it is a mistake to keep idle late-bound sockets that are not >> >> >> >> used. =C2=A0(I >> >>> >> >>> don't know if I am using the right terminology.) >> >> >> >>> http://codereview.chromium.org/174287/diff/7/1008 >> >>> File net/socket/client_socket_handle.h (right): >> >> >> >>> http://codereview.chromium.org/174287/diff/7/1008#newcode31 >> >>> Line 31: UNUSED =3D 0, =C2=A0 // unused socket that just finished co= nnectin >> >>> nit: "connecting" >> >> >> >> >> >> >> >> http://codereview.chromium.org/174287 >> > >> > > >
On Mon, Aug 24, 2009 at 10:56 AM, William Chan (=B3=C2=D6=C7=B2=FD) <willchan@chromium.org>wrote: > On Mon, Aug 24, 2009 at 10:54 AM, Darin Fisher<darin@chromium.org> wrote: > > Backing up a second... I agree that finding the right timeout value is > the > > best answer here. If we can discover that, then it makes sense to reus= e > > sockets that were connected but never used. > > If the only way to do that is to have this code in place temporarily, > then > > so be it. > > Ok, cool. > > > If we can narrow the experiment to only the error condition we are > > interested in, then that is probably best. > > Sounds great > > > Q: Why do you want to change it to only retry on ERR_CONNECTION_CLOSED > and > > not ERR_CONNECTION_RESET? I thought this was about RST errors. > > Because I clearly haven't woken up yet :) Thanks for correcting that, > I meant to say ERR_CONNECTION_RESET. > > okay, cool. -darin > > > > -Darin > > > > > > On Mon, Aug 24, 2009 at 10:48 AM, William Chan (=B3=C2=D6=C7=B2=FD) < > willchan@chromium.org> > > wrote: > >> > >> That's a very valid concern. I'm happy to change it only to retry on > >> the ERR_CONNECTION_CLOSED case (I've been meaning to gather data on > >> what the distribution is of ERR_CONNECTION_* is too). I'll go ahead > >> and make that change. I'm not clear though why this concern is only > >> valid for a never-before-used socket. If we get ERR_CONNECTION_CLOSED > >> or ERR_CONNECTION_ABORTED, doesn't it seem possible that the same > >> problem occurred? Or perhaps it's just less likely? > >> > >> On Mon, Aug 24, 2009 at 10:41 AM, Darin Fisher<darin@chromium.org> > wrote: > >> > So, I think what you are doing is safe for the RST case since that > error > >> > is > >> > an indication that the other side did not receive the data. But wha= t > >> > about > >> > ERR_CONNECTION_CLOSED and ERR_CONNECTION_ABORTED? > >> > My concern with retrying on an error observed after attempting to > write > >> > to a > >> > never-before-used socket is that we might end up issuing the HTTP > >> > request > >> > twice. That could be very bad. > >> > -Darin > >> > > >> > On Mon, Aug 24, 2009 at 10:34 AM, <willchan@chromium.org> wrote: > >> >> > >> >> I don't think the RST errors are a function of whether or not the > newly > >> >> connected socket was immediately used, but how much time it sits id= le > >> >> before use. I think it's perfectly fine to keep unused idle > late-bound > >> >> sockets up to X idle time. I'm trying to figure out what X should = be > >> >> based on histogram data. > >> >> > >> >> If you look at > >> >> > >> >> > >> >> > http://www.corp.google.com/~chrome-eng/dashboards/histograms/index.html?d= ata_source=3Dyesterday.js&version=3D4.0.202.0-.&group=3DNet&histogram=3DNet > , > >> >> you'll see that 90% of RSTs occur on a idle, unused socket happen > after > >> >> 10 seconds. I misread the code and thought the idle sockets were > timed > >> >> out after 10 seconds, but I see now that it's just that the timer > fires > >> >> every 10 seconds to time out idle sockets after 5 minutes, so I'm > >> >> changing the histogram to measure up to 6 minutes. I'm also adding > >> >> histograms to see how long a socket sits idle before it gets used. = I > >> >> plan to compare these two histograms to determine the appropriate > >> >> amount > >> >> of time to let sockets sit idle before they get closed. I have the > >> >> same > >> >> data for already used (HTTP keep-alive), idle sockets, so it's > possible > >> >> we might want to time those out earlier than 5 seconds as well. > >> >> > >> >> This change is just to mitigate user visible ERR_CONNECTION_RESETs > >> >> while > >> >> I add the extra histograms this week to figure out an appropriate > idle > >> >> time. Given the above data, are you still sure we never want to ke= ep > >> >> unused, idle late-bound sockets? I haven't made that change yet, > >> >> because I can't gather the data I want then. > >> >> > >> >> On 2009/08/24 16:54:41, darin wrote: > >> >>> > >> >>> I'm confused by this change. If the idle time is very short for a > >> >> > >> >> late-bound > >> >>> > >> >>> socket, then won't it look a lot like a newly opened socket for a > >> >> > >> >> normal HTTP > >> >>> > >> >>> request? In which case, it seems like you are saying that we shou= ld > >> >> > >> >> fallback to > >> >>> > >> >>> retrying for RST errors that occur on a newly opened socket. That > >> >> > >> >> doesn't seem > >> >>> > >> >>> correct to me. > >> >> > >> >>> I think it is a mistake to keep idle late-bound sockets that are n= ot > >> >> > >> >> used. (I > >> >>> > >> >>> don't know if I am using the right terminology.) > >> >> > >> >>> http://codereview.chromium.org/174287/diff/7/1008 > >> >>> File net/socket/client_socket_handle.h (right): > >> >> > >> >>> http://codereview.chromium.org/174287/diff/7/1008#newcode31 > >> >>> Line 31: UNUSED =3D 0, // unused socket that just finished conne= ctin > >> >>> nit: "connecting" > >> >> > >> >> > >> >> > >> >> http://codereview.chromium.org/174287 > >> > > >> > > > > > >
Backing up a second... I agree that finding the right timeout value is the best answer here. If we can discover that, then it makes sense to reuse sockets that were connected but never used. If the only way to do that is to have this code in place temporarily, then so be it. If we can narrow the experiment to only the error condition we are interested in, then that is probably best. Q: Why do you want to change it to only retry on ERR_CONNECTION_CLOSED and not ERR_CONNECTION_RESET? I thought this was about RST errors. -Darin On Mon, Aug 24, 2009 at 10:48 AM, William Chan (=E9=99=88=E6=99=BA=E6=98=8C= ) <willchan@chromium.org>wrote: > That's a very valid concern. I'm happy to change it only to retry on > the ERR_CONNECTION_CLOSED case (I've been meaning to gather data on > what the distribution is of ERR_CONNECTION_* is too). I'll go ahead > and make that change. I'm not clear though why this concern is only > valid for a never-before-used socket. If we get ERR_CONNECTION_CLOSED > or ERR_CONNECTION_ABORTED, doesn't it seem possible that the same > problem occurred? Or perhaps it's just less likely? > > On Mon, Aug 24, 2009 at 10:41 AM, Darin Fisher<darin@chromium.org> wrote: > > So, I think what you are doing is safe for the RST case since that erro= r > is > > an indication that the other side did not receive the data. But what > about > > ERR_CONNECTION_CLOSED and ERR_CONNECTION_ABORTED? > > My concern with retrying on an error observed after attempting to write > to a > > never-before-used socket is that we might end up issuing the HTTP reque= st > > twice. That could be very bad. > > -Darin > > > > On Mon, Aug 24, 2009 at 10:34 AM, <willchan@chromium.org> wrote: > >> > >> I don't think the RST errors are a function of whether or not the newl= y > >> connected socket was immediately used, but how much time it sits idle > >> before use. I think it's perfectly fine to keep unused idle late-boun= d > >> sockets up to X idle time. I'm trying to figure out what X should be > >> based on histogram data. > >> > >> If you look at > >> > >> > http://www.corp.google.com/~chrome-eng/dashboards/histograms/index.html?d= ata_source=3Dyesterday.js&version=3D4.0.202.0-.&group=3DNet&histogram=3DNet > , > >> you'll see that 90% of RSTs occur on a idle, unused socket happen afte= r > >> 10 seconds. I misread the code and thought the idle sockets were time= d > >> out after 10 seconds, but I see now that it's just that the timer fire= s > >> every 10 seconds to time out idle sockets after 5 minutes, so I'm > >> changing the histogram to measure up to 6 minutes. I'm also adding > >> histograms to see how long a socket sits idle before it gets used. I > >> plan to compare these two histograms to determine the appropriate amou= nt > >> of time to let sockets sit idle before they get closed. I have the sa= me > >> data for already used (HTTP keep-alive), idle sockets, so it's possibl= e > >> we might want to time those out earlier than 5 seconds as well. > >> > >> This change is just to mitigate user visible ERR_CONNECTION_RESETs whi= le > >> I add the extra histograms this week to figure out an appropriate idle > >> time. Given the above data, are you still sure we never want to keep > >> unused, idle late-bound sockets? I haven't made that change yet, > >> because I can't gather the data I want then. > >> > >> On 2009/08/24 16:54:41, darin wrote: > >>> > >>> I'm confused by this change. If the idle time is very short for a > >> > >> late-bound > >>> > >>> socket, then won't it look a lot like a newly opened socket for a > >> > >> normal HTTP > >>> > >>> request? In which case, it seems like you are saying that we should > >> > >> fallback to > >>> > >>> retrying for RST errors that occur on a newly opened socket. That > >> > >> doesn't seem > >>> > >>> correct to me. > >> > >>> I think it is a mistake to keep idle late-bound sockets that are not > >> > >> used. (I > >>> > >>> don't know if I am using the right terminology.) > >> > >>> http://codereview.chromium.org/174287/diff/7/1008 > >>> File net/socket/client_socket_handle.h (right): > >> > >>> http://codereview.chromium.org/174287/diff/7/1008#newcode31 > >>> Line 31: UNUSED =3D 0, // unused socket that just finished connecti= n > >>> nit: "connecting" > >> > >> > >> > >> http://codereview.chromium.org/174287 > > > > >
Yes, ERR_CONNECTION_CLOSED isn't returned by socket functions when TCP is used. I looked into ERR_CONNECTION_CLOSED when Darin added that error code. I remember my research showed that it was based on the Winsock error code WSAEDISCONN: http://msdn.microsoft.com/en-us/library/ms741688(VS.85).aspx which is only used by message-oriented protocols. This is why I added a NOTREACHED assertion in tcp_client_socket_win.cc. But it is now being used by SOCKS sockets. We should take this opportunity to define exactly what ERR_CONNECTION_CLOSED means because our error handling depends on this error code. Perhaps we should add a new error code for SOCKS and other layered protocols, and remove ERR_CONNECTION_CLOSED.
On Mon, Aug 24, 2009 at 11:18 AM, <wtc@chromium.org> wrote: > Yes, ERR_CONNECTION_CLOSED isn't returned by socket functions > when TCP is used. > > I looked into ERR_CONNECTION_CLOSED when Darin added that error > code. I remember my research showed that it was based on the > Winsock error code WSAEDISCONN: > http://msdn.microsoft.com/en-us/library/ms741688(VS.85).aspx > which is only used by message-oriented protocols. This is why > I added a NOTREACHED assertion in tcp_client_socket_win.cc. > But it is now being used by SOCKS sockets. > > We should take this opportunity to define exactly what > ERR_CONNECTION_CLOSED means because our error handling > depends on this error code. Perhaps we should add a new > error code for SOCKS and other layered protocols, and remove > ERR_CONNECTION_CLOSED. > > +1 What is the SOCKS use case? -Darin > > http://codereview.chromium.org/174287 >
On 2009/08/24 20:56:29, darin wrote: > On Mon, Aug 24, 2009 at 11:18 AM, <mailto:wtc@chromium.org> wrote: > > > We should take this opportunity to define exactly what > > ERR_CONNECTION_CLOSED means because our error handling > > depends on this error code. Perhaps we should add a new > > error code for SOCKS and other layered protocols, and remove > > ERR_CONNECTION_CLOSED. > > > > > +1 > > What is the SOCKS use case? SOCKS uses ERR_CONNECTION_CLOSED to mean the SOCKS layer gets an unexpected EOF (0, as opposed to an error like ECONNRESET) from the underlying socket. This is a reasonable interpretation of ERR_CONNECTION_CLOSED in a layered protocol. Should we define ERR_CONNECTION_CLOSED that way? Or should we add a new error code ERR_UNEXPECTED_EOF (or more verbosely, ERR_UNDERLYING_CONNECTION_CLOSED)? Note: we could also use an *internal* error code ERR_EOF in two places to disambiguate a rv of 0 in DoLoop, which may mean - we read and reached EOF, or - we didn't need to read for some reason, for example, in a HTTP response with a known length, we've reached the end of the response body on a keep-alive connection (hence no EOF). Right now we're using additional bool members to help disambiguate the 0 rv value.
On Tue, Aug 25, 2009 at 5:48 PM, <wtc@chromium.org> wrote: > On 2009/08/24 20:56:29, darin wrote: > >> On Mon, Aug 24, 2009 at 11:18 AM, <mailto:wtc@chromium.org> wrote: >> > > > We should take this opportunity to define exactly what >> > ERR_CONNECTION_CLOSED means because our error handling >> > depends on this error code. Perhaps we should add a new >> > error code for SOCKS and other layered protocols, and remove >> > ERR_CONNECTION_CLOSED. >> > >> > >> +1 >> > > What is the SOCKS use case? >> > > SOCKS uses ERR_CONNECTION_CLOSED to mean the SOCKS > layer gets an unexpected EOF (0, as opposed to an > error like ECONNRESET) from the underlying socket. > This is a reasonable interpretation of ERR_CONNECTION_CLOSED > in a layered protocol. > > Should we define ERR_CONNECTION_CLOSED that way? > > Maybe > Or should we add a new error code ERR_UNEXPECTED_EOF > (or more verbosely, ERR_UNDERLYING_CONNECTION_CLOSED)? > It seems like what we want is a signal from the socket that tells us whether or not data that we wrote to it was delivered. This is what TCP RST means (closed without all sent packets receiving an ACK), so maybe the SOCKS layer should just be returning ERR_CONNECTION_RESET if it finds the underlying connection to be prematurely closed after it has accepted data from the application layer. -Darin > > Note: we could also use an *internal* error code ERR_EOF > in two places to disambiguate a rv of 0 in DoLoop, which > may mean > - we read and reached EOF, or > - we didn't need to read for some reason, for example, > in a HTTP response with a known length, we've reached > the end of the response body on a keep-alive connection > (hence no EOF). > Right now we're using additional bool members to help > disambiguate the 0 rv value. > > > http://codereview.chromium.org/174287 > |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
