Chromium Code Reviews
Description: Pass the content size in the content-length header for file:// streams.
BUG=577981
Patch Set 1 #
Patch Set 2: int64 #
Total comments: 5
Messages
Total messages: 27 (2 generated)
jbreiden@google.com changed reviewers: + jbreiden@google.com
lgtm
thestig@chromium.org changed reviewers: + mmenke@chromium.org, zork@chromium.org
+zork who I believe originally wrote this block of code +mmenk for OWNERS
On 2016/01/15 22:10:24, Lei Zhang wrote: > +mmenk for OWNERS +mmenke, even
https://codereview.chromium.org/1585403004/diff/20001/content/browser/loader/... File content/browser/loader/resource_dispatcher_host_impl.cc (right): https://codereview.chromium.org/1585403004/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:832: base::Int64ToString(response->head.content_length)); The PDF viewer uses the content-length header to make some sort of caching decisions? That's not the actual length of the content after filtering. I suppose we don't really need to harden the PDF reader against this sort of thing as an attack (Or against PDFs without content-lengths...), but it seems like really flaky logic.
https://codereview.chromium.org/1585403004/diff/20001/content/browser/loader/... File content/browser/loader/resource_dispatcher_host_impl.cc (right): https://codereview.chromium.org/1585403004/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:832: base::Int64ToString(response->head.content_length)); On 2016/01/15 22:20:59, mmenke wrote: > The PDF viewer uses the content-length header to make some sort of caching > decisions? That's not the actual length of the content after filtering. I > suppose we don't really need to harden the PDF reader against this sort of thing > as an attack (Or against PDFs without content-lengths...), but it seems like > really flaky logic. If we go the bogus header route, wonder if we should add them for all file URLs. Looks like we do add bogus headers for blobs (Though, curiously, they don't include the content length).
This is where PDF uses content-length for memory allocation. https://code.google.com/p/chromium/codesearch#chromium/src/pdf/document_loade...
https://codereview.chromium.org/1585403004/diff/20001/content/browser/loader/... File content/browser/loader/resource_dispatcher_host_impl.cc (right): https://codereview.chromium.org/1585403004/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:832: base::Int64ToString(response->head.content_length)); On 2016/01/15 22:20:59, mmenke wrote: > The PDF viewer uses the content-length header to make some sort of caching > decisions? That's not the actual length of the content after filtering. I > suppose we don't really need to harden the PDF reader against this sort of thing > as an attack (Or against PDFs without content-lengths...), but it seems like > really flaky logic. Sorry, I don't know anything about file URLs and "filtering", but at least in the simple case of opening a PDF file from disk, this is the size of the file. In terms of an attack, the PDF viewer already handles the case of not having a content-length. I suppose sending a large content-length may cause extra memory allocation in the PDF viewer. https://codereview.chromium.org/1585403004/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:832: base::Int64ToString(response->head.content_length)); On 2016/01/15 22:29:14, mmenke wrote: > On 2016/01/15 22:20:59, mmenke wrote: > > The PDF viewer uses the content-length header to make some sort of caching > > decisions? That's not the actual length of the content after filtering. I > > suppose we don't really need to harden the PDF reader against this sort of > thing > > as an attack (Or against PDFs without content-lengths...), but it seems like > > really flaky logic. > > If we go the bogus header route, wonder if we should add them for all file URLs. > Looks like we do add bogus headers for blobs (Though, curiously, they don't > include the content length). I can't speak for other places. The StreamInfo is the data structure that gets sent to the PDF viewer, so I can't think of another way to convey the file size other than this or adding a new field to StreamInfo.
On 2016/01/15 23:13:29, Lei Zhang wrote: > https://codereview.chromium.org/1585403004/diff/20001/content/browser/loader/... > File content/browser/loader/resource_dispatcher_host_impl.cc (right): > > https://codereview.chromium.org/1585403004/diff/20001/content/browser/loader/... > content/browser/loader/resource_dispatcher_host_impl.cc:832: > base::Int64ToString(response->head.content_length)); > On 2016/01/15 22:20:59, mmenke wrote: > > The PDF viewer uses the content-length header to make some sort of caching > > decisions? That's not the actual length of the content after filtering. I > > suppose we don't really need to harden the PDF reader against this sort of > thing > > as an attack (Or against PDFs without content-lengths...), but it seems like > > really flaky logic. > > Sorry, I don't know anything about file URLs and "filtering", but at least in > the simple case of opening a PDF file from disk, this is the size of the file. > > In terms of an attack, the PDF viewer already handles the case of not having a > content-length. I suppose sending a large content-length may cause extra memory > allocation in the PDF viewer. The issue is that for web requests, the content-length (If we even have one, which we often don't) is the compressed length of the response, not the uncompressed length. So if the response decompresses to be 5 million times the size of what we receive over the wire... > https://codereview.chromium.org/1585403004/diff/20001/content/browser/loader/... > content/browser/loader/resource_dispatcher_host_impl.cc:832: > base::Int64ToString(response->head.content_length)); > On 2016/01/15 22:29:14, mmenke wrote: > > On 2016/01/15 22:20:59, mmenke wrote: > > > The PDF viewer uses the content-length header to make some sort of caching > > > decisions? That's not the actual length of the content after filtering. I > > > suppose we don't really need to harden the PDF reader against this sort of > > thing > > > as an attack (Or against PDFs without content-lengths...), but it seems like > > > really flaky logic. > > > > If we go the bogus header route, wonder if we should add them for all file > URLs. > > Looks like we do add bogus headers for blobs (Though, curiously, they don't > > include the content length). > > I can't speak for other places. The StreamInfo is the data structure that gets > sent to the PDF viewer, so I can't think of another way to convey the file size > other than this or adding a new field to StreamInfo.
On 2016/01/15 23:16:52, mmenke wrote: > The issue is that for web requests, the content-length (If we even have one, > which we often don't) is the compressed length of the response, not the > uncompressed length. So if the response decompresses to be 5 million times the > size of what we receive over the wire... Ah, that. I didn't write the PDF viewer's DocumentLoader, so I'm not sure how this is handled. If it is an issue, it's pre-existing. The new code is for file:// schemes only, where there is presumably no compression.
On 2016/01/15 23:20:32, Lei Zhang wrote: > On 2016/01/15 23:16:52, mmenke wrote: > > The issue is that for web requests, the content-length (If we even have one, > > which we often don't) is the compressed length of the response, not the > > uncompressed length. So if the response decompresses to be 5 million times > the > > size of what we receive over the wire... > > Ah, that. I didn't write the PDF viewer's DocumentLoader, so I'm not sure how > this is handled. If it is an issue, it's pre-existing. > > The new code is for file:// schemes only, where there is presumably no > compression. Right, but the problem doesn't only affect file schemes. I'm wondering if we can come up with something more general, rather than a hack for this specific case. I'm not against fundamentally sending the exact file sizes for file URLs, I'm against doing that and considering this issue resolved.
On 2016/01/15 23:22:54, mmenke wrote: > On 2016/01/15 23:20:32, Lei Zhang wrote: > > On 2016/01/15 23:16:52, mmenke wrote: > > > The issue is that for web requests, the content-length (If we even have one, > > > which we often don't) is the compressed length of the response, not the > > > uncompressed length. So if the response decompresses to be 5 million times > > the > > > size of what we receive over the wire... > > > > Ah, that. I didn't write the PDF viewer's DocumentLoader, so I'm not sure how > > this is handled. If it is an issue, it's pre-existing. > > > > The new code is for file:// schemes only, where there is presumably no > > compression. > > Right, but the problem doesn't only affect file schemes. I'm wondering if we > can come up with something more general, rather than a hack for this specific > case. I'm not against fundamentally sending the exact file sizes for file URLs, > I'm against doing that and considering this issue resolved. Maybe try to load the entire resource with the current logic, but if it looks to be getting too big, switch to partial mode?
On 2016/01/15 23:26:18, mmenke wrote: > On 2016/01/15 23:22:54, mmenke wrote: > > On 2016/01/15 23:20:32, Lei Zhang wrote: > > > On 2016/01/15 23:16:52, mmenke wrote: > > > > The issue is that for web requests, the content-length (If we even have > one, > > > > which we often don't) is the compressed length of the response, not the > > > > uncompressed length. So if the response decompresses to be 5 million > times > > > the > > > > size of what we receive over the wire... > > > > > > Ah, that. I didn't write the PDF viewer's DocumentLoader, so I'm not sure > how > > > this is handled. If it is an issue, it's pre-existing. > > > > > > The new code is for file:// schemes only, where there is presumably no > > > compression. > > > > Right, but the problem doesn't only affect file schemes. I'm wondering if we > > can come up with something more general, rather than a hack for this specific > > case. I'm not against fundamentally sending the exact file sizes for file > URLs, > > I'm against doing that and considering this issue resolved. > > Maybe try to load the entire resource with the current logic, but if it looks to > be getting too big, switch to partial mode? Actually, if the server supports range requests, range requests don't work with compression, so depending on how we get content-length, it may be uncompressed content-length. Not an area I know that well. I'll dig around on Tuesday.
I understand the range request situation very well and can answer any questions about it. But today only. Will be travelling and incommunicado Jan 19-29.
The design intent is: no range requests if PDF is compressed during transport.
On 2016/01/15 23:59:18, jbreiden wrote: > I understand the range request situation very well and can answer > any questions about it. But today only. Will be travelling and > incommunicado Jan 19-29. I'm out all next week too. I'm in no rush with this CL.
On 2016/01/15 23:59:18, jbreiden wrote: > I understand the range request situation very well and can answer > any questions about it. But today only. Will be travelling and > incommunicado Jan 19-29. I'm EST...So looks like we can revisit this next month. I want to get a better understanding of the relevant code in the meantime, and figure out if we can reasonably do something better.
Okay. Spelchat can also answer questions if you desperately need to understand something in the meantime.
Everyone is back.
On 2016/02/02 03:21:07, jbreiden wrote: > Everyone is back. I haven't done the digging I planned to do yet. :( One thing I think we should also do, at a minimum, is to make https://code.google.com/p/chromium/codesearch#chromium/src/pdf/document_loade... treat the lack of a content-length header as being above the kMinFileSize. Also...how does this actually fix the issue? Looking at the code you linked before, I don't see how we'd end up with accept_byte_ranges set as true for file URLs, so not sure how we'd end up in the partial path. Link you gave, for easy reference: https://code.google.com/p/chromium/codesearch#chromium/src/pdf/document_loade...
Should we pick up this CL again? Does seem like something worth fixing. While the existing code does seem a bit broken, I'm happy to talk about fixes for just the file URL case. What exactly is a "Stream" anyways? What other cases does adding bogus headers to one affect?
Ya, we should get back to this CL. zork@ should be able to answer some of these questions about streams?
On 2016/05/09 21:49:37, Lei Zhang wrote: > Ya, we should get back to this CL. zork@ should be able to answer some of these > questions about streams? A stream takes a dispatched resource, and provides the data from it to an extension. https://code.google.com/p/chromium/codesearch#chromium/src/chrome/common/exte... The bogus headers will be passed to the extension, and would probably fill the expected_file_size. It shouldn't affect anything in the extension, though. The size is a hint, and isn't guaranteed to be correct.
https://codereview.chromium.org/1585403004/diff/20001/content/browser/loader/... File content/browser/loader/resource_dispatcher_host_impl.cc (right): https://codereview.chromium.org/1585403004/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:832: base::Int64ToString(response->head.content_length)); On 2016/01/15 23:13:29, Lei Zhang wrote: > On 2016/01/15 22:29:14, mmenke wrote: > > On 2016/01/15 22:20:59, mmenke wrote: > > > The PDF viewer uses the content-length header to make some sort of caching > > > decisions? That's not the actual length of the content after filtering. I > > > suppose we don't really need to harden the PDF reader against this sort of > > thing > > > as an attack (Or against PDFs without content-lengths...), but it seems like > > > really flaky logic. > > > > If we go the bogus header route, wonder if we should add them for all file > URLs. > > Looks like we do add bogus headers for blobs (Though, curiously, they don't > > include the content length). > > I can't speak for other places. The StreamInfo is the data structure that gets > sent to the PDF viewer, so I can't think of another way to convey the file size > other than this or adding a new field to StreamInfo. My preference, from a correctness standpoint, would be to add a field to StreamInfo - it seems more robust, and would also fix the case where content-length is different from the uncompressed body length. Adding this header just for file URLs just for this API seems hackish to me, and misses some more obscure cases (PDFs in data URLs, PDFs in blob URLs, PDFs in Chrome URLs [If anyone does that], etc). That having been said, I'm not sure how much of a pain passing this information all the way down to pdf/document_loader.cc is. I guess we'd have to update a NaCl (ppapi?) interface to pass the information there as well?
On 2016/05/09 22:23:07, Zachary Kuznia wrote: > On 2016/05/09 21:49:37, Lei Zhang wrote: > > Ya, we should get back to this CL. zork@ should be able to answer some of > these > > questions about streams? > > A stream takes a dispatched resource, and provides the data from it to an > extension. > https://code.google.com/p/chromium/codesearch#chromium/src/chrome/common/exte... > The bogus headers will be passed to the extension, and would probably fill the > expected_file_size. It shouldn't affect anything in the extension, though. The > size is a hint, and isn't guaranteed to be correct. I can't find where we set expectedContentSize, but it really should be set from URLRequest::GetExpectedContentSize, which is the actual size of the body, if there's no error, and the server did not lie to us, or -1. It's -1 whenever any compression is applied.
