|
|
Chromium Code Reviews|
Created:
4 years, 10 months ago by gabadie Modified:
4 years, 9 months ago CC:
chromium-reviews, gabadie+watch_chromium.org, jbudorick+watch_chromium.org, mikecase+watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@i08 Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
Descriptionsandwich: Implements patch-wpr subcommand.
The patch-wpr sub-command patches all resources response headers of a WPR archive, to make sure they all will be going into the chrome cache on disk if not already in, and also takes care of making sure that this cached resources will not be invalidated or re-validated in the next 10 years.
BUG=582080
Committed: https://crrev.com/4bfeb32fdb5d16888edca6eeabb694496ccf6fe7
Cr-Commit-Position: refs/heads/master@{#378452}
Patch Set 1 #
Total comments: 13
Patch Set 2 : Addresses pasko's comments #
Total comments: 2
Patch Set 3 : Lists x-cache and Vary response headers as a TODO. #
Dependent Patchsets: Messages
Total messages: 22 (8 generated)
Description was changed from ========== sandwich: Implements patch-wpr subcommand. BUG=582080 ========== to ========== sandwich: Implements patch-wpr subcommand. The patch-wpr sub-command patches a resources response headers of an WPR archive, to make sure they all will be going into the chrome cache on disk if not already in, and also takes care of making sure that this cached resources will not be invalidated or re-validated in the next 10 years. BUG=582080 ==========
gabadie@chromium.org changed reviewers: + lizeb@chromium.org, mattcary@chromium.org, pasko@chromium.org
gabadie@chromium.org changed reviewers: + lizeb@chromium.org, mattcary@chromium.org, pasko@chromium.org
Sorry for the double email, the form got mistakenly submitted too quickly. The sandwich patch-wpr sub-command patches a resources response headers of an WPR archive, to make sure they all will be going into the chrome cache on disk if not already in, and also takes care of making sure that this cached resources will not be invalidated or re-validated in the next 10 years. mattcary@chromium.org, lizeb@chromium.org, pasko@chromium.org: PTAL.
Description was changed from ========== sandwich: Implements patch-wpr subcommand. The patch-wpr sub-command patches a resources response headers of an WPR archive, to make sure they all will be going into the chrome cache on disk if not already in, and also takes care of making sure that this cached resources will not be invalidated or re-validated in the next 10 years. BUG=582080 ========== to ========== sandwich: Implements patch-wpr subcommand. The patch-wpr sub-command patches all resources response headers of a WPR archive, to make sure they all will be going into the chrome cache on disk if not already in, and also takes care of making sure that this cached resources will not be invalidated or re-validated in the next 10 years. BUG=582080 ==========
lgtm
https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... File tools/android/loading/sandwich.py (right): https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... tools/android/loading/sandwich.py:369: for url_entry in wpr_archive.ListUrlEntries(): Is Cache-Control the only caching directive? If not, does it prevail?
https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... File tools/android/loading/sandwich.py (right): https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... tools/android/loading/sandwich.py:369: for url_entry in wpr_archive.ListUrlEntries(): On 2016/02/25 15:24:54, Benoit L wrote: > Is Cache-Control the only caching directive? > If not, does it prevail? It is not the only cache-directive but according to section 13.2.4 (https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html), it prevail from the Expire header in the freshness_lifetime computation.
Right, this would relax the freshness checks. However, validators can be applied as well that are checked in addition to freshness. I listed all validator-causing response headers in comments, please take a look. I might have missed some. I learned some of these just today. I think these corner cases will probably not affect Wikipedia, btu for larger scale experiments we'd need this extra care. https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... File tools/android/loading/sandwich.py (right): https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... tools/android/loading/sandwich.py:369: for url_entry in wpr_archive.ListUrlEntries(): On 2016/02/25 15:31:26, gabadie wrote: > On 2016/02/25 15:24:54, Benoit L wrote: > > Is Cache-Control the only caching directive? > > If not, does it prevail? > > It is not the only cache-directive but according to section 13.2.4 > (https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html), it prevail from the > Expire header in the freshness_lifetime computation. afaict, 13.2.4 says that the old Expires comes into play if Cache-Control:max-age is not present. There is also s-maxage that may override max-age (though I was not aware of this 5 minutes ago). Also, there is Cache-Control:no-store that needs to be patched out. And Cache-Control:must-revalidate. I would also drop all Vary: headers because verifying whether they'd lead to expiration/revalidation is less trivial. Also, web developers occasionally put JS-generated stuff into ETag, it will be just safer to remove all ETag: headers than to debug why occasionally somewhere a different ETag was popped in when stars aligned. Also, the Last-Modified header is a validator. We should replace it with a 10 year max-age, I believe. See more about cache related headers in: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... tools/android/loading/sandwich.py:371: if 'cache-control' in response_headers and \ Field names are case-insensitive, see: https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2 https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... tools/android/loading/sandwich.py:377: if name == 'no-cache': This feel unsafe as it can remove a valid header. Is there anything beyond Cache-Control:no-cache that has "no-cache" in it? https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... tools/android/loading/sandwich.py:387: url_entry.SetResponseHeader('cache-control', CACHE_CONTROL) the spec allows only one cache-control header, can we assert on it for additional clarity here?
Thanks Egor for review. And good catch for the case in-sensitiveness! I added comments as agreed offline in the new revision. Please take a look. https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... File tools/android/loading/sandwich.py (right): https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... tools/android/loading/sandwich.py:369: for url_entry in wpr_archive.ListUrlEntries(): On 2016/02/26 15:54:47, pasko wrote: > On 2016/02/25 15:31:26, gabadie wrote: > > On 2016/02/25 15:24:54, Benoit L wrote: > > > Is Cache-Control the only caching directive? > > > If not, does it prevail? > > > > It is not the only cache-directive but according to section 13.2.4 > > (https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html), it prevail from the > > Expire header in the freshness_lifetime computation. > > afaict, 13.2.4 says that the old Expires comes into play if > Cache-Control:max-age is not present. Yes, but the point of this loop is to add max-age on all entries anyway... > > There is also s-maxage that may override max-age (though I was not aware of this > 5 minutes ago). Yes but s-maxage is a cache-control directive, so we don't have any chance to have a s-maxage poping because we simply override any cache-control headers with CACHE_CONTROL. > > Also, there is Cache-Control:no-store that needs to be patched out. Same as previous. > > And Cache-Control:must-revalidate. Same as previous. > > I would also drop all Vary: headers because verifying whether they'd lead to > expiration/revalidation is less trivial. My understanding was that the Vary response header was meant to be used by proxy caching, and not client caching. > > Also, web developers occasionally put JS-generated stuff into ETag, it will be > just safer to remove all ETag: headers than to debug why occasionally somewhere > a different ETag was popped in when stars aligned. Not sure if this is a good idea. WPR freeze the ETag and the ressource content. From my understanding, ETag are useful for resource re-validations. But we are going to avoid them anyway for the NoState-Prefetch simulation. Adding a TODO. > > Also, the Last-Modified header is a validator. We should replace it with a 10 > year max-age, I believe. Note sure if it is required because it is written "An origin server MUST NOT send a Last-Modified date which is later than the server's time of message origination.", so the max-age cache directive should be enough. Adding a TODO as agreed offline. > > See more about cache related headers in: > https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html > https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... tools/android/loading/sandwich.py:371: if 'cache-control' in response_headers and \ On 2016/02/26 15:54:47, pasko wrote: > Field names are case-insensitive, see: > https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2 The point of this branch is to just not be logging this entry because we already patched it. But this extra care of case-insensitiveness should be taken care of in the WprBackend. Good catch! Will make the change in a following CL. https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... tools/android/loading/sandwich.py:377: if name == 'no-cache': On 2016/02/26 15:54:47, pasko wrote: > This feel unsafe as it can remove a valid header. Is there anything beyond > Cache-Control:no-cache that has "no-cache" in it? Yes, x-cache, and wikipedia is actually generating it, the reason why I have added this. https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... tools/android/loading/sandwich.py:387: url_entry.SetResponseHeader('cache-control', CACHE_CONTROL) On 2016/02/26 15:54:47, pasko wrote: > the spec allows only one cache-control header, can we assert on it for > additional clarity here? I already have seen cache-control several time in one response header while debugging this. According to the spec: "Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma. The order in which header fields with the same field-name are received is therefore significant to the interpretation of the combined field value, and thus a proxy MUST NOT change the order of these field values when a message is forwarded.", https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html
https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... File tools/android/loading/sandwich.py (right): https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... tools/android/loading/sandwich.py:371: if 'cache-control' in response_headers and \ On 2016/02/29 17:45:23, gabadie wrote: > On 2016/02/26 15:54:47, pasko wrote: > > Field names are case-insensitive, see: > > https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2 > > The point of this branch is to just not be logging this entry because we already > patched it. > > But this extra care of case-insensitiveness should be taken care of in the > WprBackend. Good catch! Will make the change in a following CL. Acknowledged. https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... tools/android/loading/sandwich.py:377: if name == 'no-cache': On 2016/02/29 17:45:23, gabadie wrote: > On 2016/02/26 15:54:47, pasko wrote: > > This feel unsafe as it can remove a valid header. Is there anything beyond > > Cache-Control:no-cache that has "no-cache" in it? > > Yes, x-cache, and wikipedia is actually generating it, the reason why I have > added this. Does overwriting it make a noticeable difference for Wikipedia in terms of speed or reloaded resources? I want to avoid hours of debugging if this plays against us unintentionally. https://codereview.chromium.org/1740653002/diff/1/tools/android/loading/sandw... tools/android/loading/sandwich.py:387: url_entry.SetResponseHeader('cache-control', CACHE_CONTROL) On 2016/02/29 17:45:23, gabadie wrote: > On 2016/02/26 15:54:47, pasko wrote: > > the spec allows only one cache-control header, can we assert on it for > > additional clarity here? > > I already have seen cache-control several time in one response header while > debugging this. According to the spec: "Multiple message-header fields with the > same field-name MAY be present in a message if and only if the entire > field-value for that header field is defined as a comma-separated list [i.e., > #(values)]. It MUST be possible to combine the multiple header fields into one > "field-name: field-value" pair, without changing the semantics of the message, > by appending each subsequent field-value to the first, each separated by a > comma. The order in which header fields with the same field-name are received is > therefore significant to the interpretation of the combined field value, and > thus a proxy MUST NOT change the order of these field values when a message is > forwarded.", https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html yeah, I just did not look what the function does; replacing the first header and removing all others is OK https://codereview.chromium.org/1740653002/diff/20001/tools/android/loading/s... File tools/android/loading/sandwich.py (right): https://codereview.chromium.org/1740653002/diff/20001/tools/android/loading/s... tools/android/loading/sandwich.py:382: # TODO(gabadie), may need to delete ETag. please add Vary to the list
https://codereview.chromium.org/1740653002/diff/20001/tools/android/loading/s... File tools/android/loading/sandwich.py (right): https://codereview.chromium.org/1740653002/diff/20001/tools/android/loading/s... tools/android/loading/sandwich.py:382: # TODO(gabadie), may need to delete ETag. On 2016/03/01 00:48:10, pasko wrote: > please add Vary to the list Done.
lgtm, thank you
The CQ bit was checked by gabadie@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from mattcary@chromium.org Link to the patchset: https://codereview.chromium.org/1740653002/#ps40001 (title: "Lists x-cache and Vary response headers as a TODO.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1740653002/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1740653002/40001
Message was sent while issue was closed.
Description was changed from ========== sandwich: Implements patch-wpr subcommand. The patch-wpr sub-command patches all resources response headers of a WPR archive, to make sure they all will be going into the chrome cache on disk if not already in, and also takes care of making sure that this cached resources will not be invalidated or re-validated in the next 10 years. BUG=582080 ========== to ========== sandwich: Implements patch-wpr subcommand. The patch-wpr sub-command patches all resources response headers of a WPR archive, to make sure they all will be going into the chrome cache on disk if not already in, and also takes care of making sure that this cached resources will not be invalidated or re-validated in the next 10 years. BUG=582080 ==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001)
Message was sent while issue was closed.
Description was changed from ========== sandwich: Implements patch-wpr subcommand. The patch-wpr sub-command patches all resources response headers of a WPR archive, to make sure they all will be going into the chrome cache on disk if not already in, and also takes care of making sure that this cached resources will not be invalidated or re-validated in the next 10 years. BUG=582080 ========== to ========== sandwich: Implements patch-wpr subcommand. The patch-wpr sub-command patches all resources response headers of a WPR archive, to make sure they all will be going into the chrome cache on disk if not already in, and also takes care of making sure that this cached resources will not be invalidated or re-validated in the next 10 years. BUG=582080 Committed: https://crrev.com/4bfeb32fdb5d16888edca6eeabb694496ccf6fe7 Cr-Commit-Position: refs/heads/master@{#378452} ==========
Message was sent while issue was closed.
Patchset 3 (id:??) landed as https://crrev.com/4bfeb32fdb5d16888edca6eeabb694496ccf6fe7 Cr-Commit-Position: refs/heads/master@{#378452} |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
