|
|
Created:
9 years, 5 months ago by Sergey Ulanov Modified:
9 years, 5 months ago CC:
chromium-reviews, jamiewalch+watch_chromium.org, hclam+watch_chromium.org, cbentzel+watch_chromium.org, simonmorris+watch_chromium.org, wez+watch_chromium.org, Paweł Hajdan Jr., dmaclach+watch_chromium.org, garykac+watch_chromium.org, lambroslambrou+watch_chromium.org, darin-cc_chromium.org, ajwong+watch_chromium.org, sergeyu+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionFix instability in SSL client/server sockets
When writing to a socket, net::StreamSocket::Write() may return
immediately, but it may not write the whole buffer passed to it.
In this case SSL sockets were not trying to call Write() again
to send rest of the data, until client tries to call Write()
next time. If client doesn't call Write the remaining data is
never sent to the transport socket.
remoting_unittests was flaky due to this bug.
BUG=88726
TEST=updated net_unittest to catch the bug, remoting_unittests is not flaky
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=93445
Patch Set 1 #Patch Set 2 : - #
Total comments: 1
Patch Set 3 : Fix similar issue in OnSendComplete() #
Total comments: 11
Patch Set 4 : Simpler fix #
Total comments: 4
Patch Set 5 : Fix test failure #Patch Set 6 : address feedback, update tests #Patch Set 7 : Update FakeSocketTest. #
Total comments: 19
Patch Set 8 : - #
Total comments: 2
Patch Set 9 : - #Patch Set 10 : - #
Messages
Total messages: 18 (0 generated)
http://codereview.chromium.org/7399025/diff/4001/net/socket/ssl_client_socket... File net/socket/ssl_client_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/4001/net/socket/ssl_client_socket... net/socket/ssl_client_socket_nss.cc:1753: network_moved = (nsent > 0 || nreceived > 0); I'm not sure why why we were returning true for nreceived=0 here. Changed it here because otherwise we may try to read indefinitely. Let me know if this is not correct.
http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_client_socket... File net/socket/ssl_client_socket_nss.cc (left): http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_client_socket... net/socket/ssl_client_socket_nss.cc:1745: network_moved = (nsent > 0 || nreceived >= 0); A zero return value from the underlying net::Socket::Read() indicates EOF. Changing the |network_moved| test to |nreceived| > 0 means that DoTransportIO() will return false (no I/O performed) at EOF, which seems counter-intuitive. If the underlying transport connection returns EOF, the SSLClientSocket should try to ensure that if the SSL teardown handshake was processed then the caller gets a zero result from the next (or pending) Read, otherwise they get an error. Do we care about that, though? The result of DoTransportIO() should be consistent between SSLClientSocketNSS and SSLServerSocketNSS. http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_client_socket... File net/socket/ssl_client_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_client_socket... net/socket/ssl_client_socket_nss.cc:1196: } while (network_moved); This loop used to exit on error, or if one or other of the payload operations succeeded. If the loop continues after DoPayloadRead() or DoPayloadWrite() have returned success, because |network_moved| is true, then they will end up writing the same data to the channel again, or reading over the data that they just read; neither function clears the buffer pointer on success. Since it's also possible for *both* DoPayloadRead() and DoPayloadWrite() to succeed (or fail), we may end up calling both DoReadCallback() *and* DoWriteCallback(), which will cause a crash if the caller deletes the SSLClientSocket in response to the DoReadCallback(). http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_client_socket... net/socket/ssl_client_socket_nss.cc:1323: } while (network_moved); This will continue to loop on PR_Write() errors. http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_server_socket... File net/socket/ssl_server_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_server_socket... net/socket/ssl_server_socket_nss.cc:431: } while (network_moved); I take it that while (DoTransportIO()) { } is bad style? http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_server_socket... net/socket/ssl_server_socket_nss.cc:654: } while (network_moved); The loop now won't exit on failure (or success!) of DoPayloadWrite.
Implemented a simpler fix: now synchronous read/write are handled in DoTransportIO() instead of the code that calls DoTransportIO(). http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_client_socket... File net/socket/ssl_client_socket_nss.cc (left): http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_client_socket... net/socket/ssl_client_socket_nss.cc:1745: network_moved = (nsent > 0 || nreceived >= 0); On 2011/07/18 21:53:33, Wez wrote: > A zero return value from the underlying net::Socket::Read() indicates EOF. > Changing the |network_moved| test to |nreceived| > 0 means that DoTransportIO() > will return false (no I/O performed) at EOF, which seems counter-intuitive. The previous behavior may be counter-intuitive depending on how you look at it. E.g. nreceived = 0 means that we haven't received anything, and we shouldn't try reading again. > If the underlying transport connection returns EOF, the SSLClientSocket should > try to ensure that if the SSL teardown handshake was processed then the caller > gets a zero result from the next (or pending) Read, otherwise they get an error. > Do we care about that, though? We pass nreceived = 0 to memio (see memio_PutReadResult() call in BufferRecv()), and I think it should be properly propagated to the client layer. > > The result of DoTransportIO() should be consistent between SSLClientSocketNSS > and SSLServerSocketNSS. http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_client_socket... File net/socket/ssl_client_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_client_socket... net/socket/ssl_client_socket_nss.cc:1196: } while (network_moved); On 2011/07/18 21:53:33, Wez wrote: > This loop used to exit on error, or if one or other of the payload operations > succeeded. > > If the loop continues after DoPayloadRead() or DoPayloadWrite() have returned > success, because |network_moved| is true, then they will end up writing the same > data to the channel again, or reading over the data that they just read; neither > function clears the buffer pointer on success. > > Since it's also possible for *both* DoPayloadRead() and DoPayloadWrite() to > succeed (or fail), we may end up calling both DoReadCallback() *and* > DoWriteCallback(), which will cause a crash if the caller deletes the > SSLClientSocket in response to the DoReadCallback(). Actually the code below may crash even when only DoPayloadRead() succeeds: if the socket is destroyed when we call DoReadCallback() then we may crash when checking |user_write_buf_|. In any case, this should be dealt with in a separate CL. http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_client_socket... net/socket/ssl_client_socket_nss.cc:1323: } while (network_moved); On 2011/07/18 21:53:33, Wez wrote: > This will continue to loop on PR_Write() errors. Why can't we continue pushing data to the transport socket when PR_Write() fails? What can cause PR_Write() to fail? Anyway, I reverted this change. http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_server_socket... File net/socket/ssl_server_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_server_socket... net/socket/ssl_server_socket_nss.cc:431: } while (network_moved); On 2011/07/18 21:53:33, Wez wrote: > I take it that > > while (DoTransportIO()) { } > > is bad style? reverted this change http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_server_socket... net/socket/ssl_server_socket_nss.cc:654: } while (network_moved); On 2011/07/18 21:53:33, Wez wrote: > The loop now won't exit on failure (or success!) of DoPayloadWrite. Same as in client sockets: why can't we continue pushing data to the transport socket? It won't try calling DoPayloadWrite again after it returned anything other than ERR_IO_PENDING.
LGTM, for what it's worth. Can you create a bug for the delete-in-callback issue, though, please? http://codereview.chromium.org/7399025/diff/9001/net/socket/ssl_client_socket... File net/socket/ssl_client_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/9001/net/socket/ssl_client_socket... net/socket/ssl_client_socket_nss.cc:1738: // Return true if some I/O performed, false otherwise (error or ERR_IO_PENDING) nit: Update this comment to indicate that true is returned only if one or more bytes are read or written. I would regard reading EOF from the wire as "some I/O performed", which will now return false.
http://codereview.chromium.org/7399025/diff/9001/net/socket/ssl_client_socket... File net/socket/ssl_client_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/9001/net/socket/ssl_client_socket... net/socket/ssl_client_socket_nss.cc:1742: if (nss_bufs_ != NULL) { nit: Actually, a comment here to explain the need for the two loops would help.
Addressed the comments. Also updated the tests so that they would catch this bug. http://codereview.chromium.org/7399025/diff/9001/net/socket/ssl_client_socket... File net/socket/ssl_client_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/9001/net/socket/ssl_client_socket... net/socket/ssl_client_socket_nss.cc:1738: // Return true if some I/O performed, false otherwise (error or ERR_IO_PENDING) On 2011/07/19 20:32:18, Wez wrote: > nit: Update this comment to indicate that true is returned only if one or more > bytes are read or written. I would regard reading EOF from the wire as "some > I/O performed", which will now return false. Updated the comment. I didn't changed meaning of the result in case of EOF. http://codereview.chromium.org/7399025/diff/9001/net/socket/ssl_client_socket... net/socket/ssl_client_socket_nss.cc:1742: if (nss_bufs_ != NULL) { On 2011/07/19 20:33:07, Wez wrote: > nit: Actually, a comment here to explain the need for the two loops would help. Done.
Wan-Teh, ping? I'd like to fix this problem in M14, i.e. need to land it by EOD next Monday, and I'll be out on Friday.
LGTM. Thanks! willchan: I ask you to review a piece of code. Please search for "willchan" in my comments below. sergeyu: I haven't read Wez's review comments yet. I will do that next. Please read the two comments below marked with "IMPORTANT" to verify my understanding of this bug. If I am right, this CL won't fix the bug completely, and you should update the CL's commit message to describe the bug accurately. Since the CL fixes the flaky tests, you should still check it in. http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... File net/socket/ssl_server_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_nss.cc:428: DoTransportIO(); IMPORTANT: We may get here via this call stack: BufferSendComplete > OnSendComplete > DoTransportIO This DoTransportIO call does call Write() again to send the rest of the data. So I believe your diagnosis of the bug is not entirely correct. If the SSLServerSocketNSS object stays around, it will flush the rest of the data. So I believe the bug occurs only if we destroy the SSLServerSocketNSS object when we think the SSL Write() has succeeded. http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_nss.cc:531: // Do as much as possible network I/O between the buffer and the as much as possible network I/O => as much network I/O as possible Please make the same change to ssl_client_socket_nss.cc. http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_nss.cc:544: } while (rv > 0); IMPORTANT: Unless this BufferSend loop empties the nss_bufs_, it won't fix the bug completely. I believe the last SSLServerSocket::Write() call before the SSLServerSocket is destroyed is still vulnerable to this bug. The real fix is for SSLServerSocket::Write to not complete until the nss_bufs_ has been emptied. Do you agree? http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_nss.cc:549: } while (rv > 0); The loop around BufferRecv should not be necessary. http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... File net/socket/ssl_server_socket_unittest.cc (right): http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_unittest.cc:74: &FakeDataChannel::DoReadCallback)); willchan: can you review the use of ScopedRunnableMethodFactory in this class? sergeyu: why do you need to post a task to run DoReadCallback()? http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_unittest.cc:80: if (!read_callback_ || !data_.size()) Nit: some people prefer data_.empty(). Why do you need to check if data_ is empty? It seems to imply that you are calling DoReadCallback more times than you should. http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_unittest.cc:135: buf_len = rand() % buf_len + 1; Are you using rand() instead of base::RandInt() because you need a repeatable sequence of pseudorandom numbers? http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_unittest.cc:225: int read = client.Read(read_buf, kReadBufSize, NULL); Nit: name these local variables nwritten and nread or bytes_written and bytes_read. "read" is the name of a Unix system call.
http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... File net/socket/ssl_server_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_nss.cc:428: DoTransportIO(); On 2011/07/21 00:27:12, wtc wrote: > IMPORTANT: We may get here via this call stack: > BufferSendComplete > OnSendComplete > DoTransportIO > > This DoTransportIO call does call Write() again to send > the rest of the data. So I believe your diagnosis of the > bug is not entirely correct. If the SSLServerSocketNSS > object stays around, it will flush the rest of the data. > So I believe the bug occurs only if we destroy the > SSLServerSocketNSS object when we think the SSL Write() > has succeeded. BufferSendComplete() is called from Write callback, but the write callback is never called when write completes synchronously. http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_nss.cc:544: } while (rv > 0); On 2011/07/21 00:27:12, wtc wrote: > IMPORTANT: Unless this BufferSend loop empties the > nss_bufs_, it won't fix the bug completely. I believe the > last SSLServerSocket::Write() call before the > SSLServerSocket is destroyed is still vulnerable to this > bug. > > The real fix is for SSLServerSocket::Write to not complete > until the nss_bufs_ has been emptied. > > Do you agree? This loop can exit in 3 cases: 1. There is no pending data in nss_bufs_ - that's what we want, 2. BufferSend() returned ERR_IO_PENDING - in that case BufferSendComplete will be called later, which will call DoTransportIO() again. 3. BufferSend() returned another error - in that case we don't care about writing rest of the data.
http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_client_socket... File net/socket/ssl_client_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/5002/net/socket/ssl_client_socket... net/socket/ssl_client_socket_nss.cc:1196: } while (network_moved); On 2011/07/18 21:53:33, Wez wrote: > > Since it's also possible for *both* DoPayloadRead() and DoPayloadWrite() to > succeed (or fail), we may end up calling both DoReadCallback() *and* > DoWriteCallback(), which will cause a crash if the caller deletes the > SSLClientSocket in response to the DoReadCallback(). Wez, you are right on. I recently noticed this problem during a code review and filed http://crbug.com/88108.
http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... File net/socket/ssl_server_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_nss.cc:531: // Do as much as possible network I/O between the buffer and the On 2011/07/21 00:27:12, wtc wrote: > as much as possible network I/O => as much network I/O as possible > > Please make the same change to ssl_client_socket_nss.cc. Done. http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_nss.cc:549: } while (rv > 0); On 2011/07/21 00:27:12, wtc wrote: > The loop around BufferRecv should not be necessary. Yes, agree, this loop isn't neccessary, but it may make reading more efficient, don't think it matters much though. Removed it. http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... File net/socket/ssl_server_socket_unittest.cc (right): http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_unittest.cc:74: &FakeDataChannel::DoReadCallback)); On 2011/07/21 00:27:12, wtc wrote: > willchan: can you review the use of ScopedRunnableMethodFactory > in this class? > > sergeyu: why do you need to post a task to run DoReadCallback()? Because DoReadCallback() calls callback on the socket on the other end, which then may then send some data back which would invoke callback on the first socket. I.e. SSL socket calls Write() on the fake socket, and receives read callback before Write() returns - this is not expected by SSL sockets code, so it crashes. http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_unittest.cc:80: if (!read_callback_ || !data_.size()) On 2011/07/21 00:27:12, wtc wrote: > Nit: some people prefer data_.empty(). > > Why do you need to check if data_ is empty? > It seems to imply that you are calling DoReadCallback > more times than you should. Done.
sergeyu: thank you for the explanation. I now understand the bug you encountered, and agree that your commit message describes the bug accurately. May I suggest a small change? Please change may not write the whole buffer passed to it to may not write the whole encrypted buffer The problem I pointed out in my last comment is different. See below. http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... File net/socket/ssl_server_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_nss.cc:544: } while (rv > 0); On 2011/07/21 00:33:28, sergeyu wrote: > > This loop can exit in 3 cases: > 1. There is no pending data in nss_bufs_ - that's what we want, > 2. BufferSend() returned ERR_IO_PENDING - in that case BufferSendComplete will > be called later, which will call DoTransportIO() again. > 3. BufferSend() returned another error - in that case we don't care about > writing rest of the data. It is case 2 that I am worried about. The caller may destroy this SSLServerSocketNSS object before the transport socket Write() completes asynchronously. Then the peer of the SSL connection won't receive all the data. But now I know case 1 will fix the bug you encountered.
http://codereview.chromium.org/7399025/diff/24001/net/socket/ssl_client_socke... File net/socket/ssl_client_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/24001/net/socket/ssl_client_socke... net/socket/ssl_client_socket_nss.cc:1761: // because Read() and Write() may return synchronously. This comment should be updated to only describe the loop around BufferSend. Make the same change to ssl_server_socket_nss.cc.
>sergeyu: thank you for the explanation. I now understand >the bug you encountered, and agree that your commit message >describes the bug accurately. May I suggest a small change? >Please change > may not write the whole buffer passed to it >to > may not write the whole encrypted buffer The first sentence is about net::Socket interface in general, so I believe it's correct. http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... File net/socket/ssl_server_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_nss.cc:544: } while (rv > 0); On 2011/07/21 00:56:45, wtc wrote: > On 2011/07/21 00:33:28, sergeyu wrote: > > > > This loop can exit in 3 cases: > > 1. There is no pending data in nss_bufs_ - that's what we want, > > 2. BufferSend() returned ERR_IO_PENDING - in that case BufferSendComplete > will > > be called later, which will call DoTransportIO() again. > > 3. BufferSend() returned another error - in that case we don't care about > > writing rest of the data. > > It is case 2 that I am worried about. > > The caller may destroy this SSLServerSocketNSS object > before the transport socket Write() completes asynchronously. > Then the peer of the SSL connection won't receive all > the data. But now I know case 1 will fix the bug you > encountered. Yes, write callback doesn't mean that the data was delivered to the other end, but it's also true for TCP sockets: if write() completes for a TCP socket it doesn't mean that the data was actually sent, it just gives you signal that you may try to send next chunk of data. There isn't much we can do about it without affecting performance.
http://codereview.chromium.org/7399025/diff/24001/net/socket/ssl_client_socke... File net/socket/ssl_client_socket_nss.cc (right): http://codereview.chromium.org/7399025/diff/24001/net/socket/ssl_client_socke... net/socket/ssl_client_socket_nss.cc:1761: // because Read() and Write() may return synchronously. On 2011/07/21 01:06:07, wtc wrote: > This comment should be updated to only describe the loop > around BufferSend. > > Make the same change to ssl_server_socket_nss.cc. Done.
http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... File net/socket/ssl_server_socket_unittest.cc (right): http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_unittest.cc:55: ALLOW_THIS_IN_INITIALIZER_LIST(task_factory_(this)) { Need compiler_specific.h for this macro. http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_unittest.cc:74: &FakeDataChannel::DoReadCallback)); On 2011/07/21 00:27:12, wtc wrote: > willchan: can you review the use of ScopedRunnableMethodFactory > in this class? Looks reasonable to me. > > sergeyu: why do you need to post a task to run DoReadCallback()?
http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... File net/socket/ssl_server_socket_unittest.cc (right): http://codereview.chromium.org/7399025/diff/18001/net/socket/ssl_server_socke... net/socket/ssl_server_socket_unittest.cc:55: ALLOW_THIS_IN_INITIALIZER_LIST(task_factory_(this)) { On 2011/07/21 08:16:06, willchan wrote: > Need compiler_specific.h for this macro. Done. |