|
|
Created:
9 years, 11 months ago by gavinp Modified:
9 years, 6 months ago CC:
chromium-reviews, cbentzel+watch_chromium.org, darin-cc_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionAdd histogram for HTTP response codes
This adds an histogram to the HTTP response header parser. I reviewed
the code, and I've convinced myself that this captures the two main
cases of HTTP header parsing (tunneling, and normal HTTP), and that
double-calls aren't occuring. I liked having all the code in the one
translation unit. But, I might have missed one, and we might still
be getting called twice for the same load: so the statistics might be off.
Alternatively, I could put a static function on HttpResponseHeader, to
be called from the site of parse. But this might miss a parse.
All in all, I wanted to be sure to catch all distinct response codes
we get, so we can be sure not to stomp on anything. So I chose the
approach using the HttpResponseHeaders constructor.
TEST=none
BUG=70428
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=72801
Patch Set 1 : fix lint + range #
Total comments: 8
Patch Set 2 : remediation to review #
Total comments: 9
Patch Set 3 : remedation to reivew #
Total comments: 10
Patch Set 4 : address reviews from wtc and jar #Messages
Total messages: 23 (0 generated)
Made a second upload after realising a lint boo-boo plus a mistake with a range.
I only see one upload. Did you really do a second upload? Jim On Fri, Jan 21, 2011 at 11:22 AM, <gavinp@chromium.org> wrote: > Made a second upload after realising a lint boo-boo plus a mistake with a > range. > > > http://codereview.chromium.org/6317011/ >
http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... File net/http/http_response_headers.cc (right): http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... net/http/http_response_headers.cc:96: std::vector<int> codes; How expensive is this? It looks like this fairly large vector allocation and copy is done everytime HttpResponseHeaders is called. Can you only generate it once?
Is this code called from anywhere? Are there additional files in this CL? Also, please see comment about motivation below. Thanks, Jim http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... File net/http/http_response_headers.cc (right): http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... net/http/http_response_headers.cc:93: kHistogramMaxHttpResponseCode = 599 }; This is going to make a slightly large histogram. Most common are 50-100 buckets, but this will be about 500 buckets. Memory cost for these things is around 4 bytes per bucket for a counter, plus about 4 bytes per bucket for the range. That totals around 2K bytes. That gets doubled because we have to snapshot this array, for about 4K, plus of course a bit of string name overhead etc. Can you motivate this gathering of data a bit? Will you need to gather this forever? The amount of memory is not the end of the world, but a death by a million cuts is achieved one slice at a time. http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... net/http/http_response_headers.cc:109: return code; Probably want something more like code - kHistogramMinHttpResponseCode + 1 http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... net/http/http_response_headers.cc:110: else style nit: avoid else clauses when you returned in the previous clause.
http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... File net/http/http_response_headers.cc (right): http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... net/http/http_response_headers.cc:96: std::vector<int> codes; This is only done during the initialization of the HISTOGRAM. The macro only evals its arg once, creating a static ref to the instance, and will never re-evaluate this. On 2011/01/21 20:33:25, cbentzel wrote: > How expensive is this? > > It looks like this fairly large vector allocation and copy is done everytime > HttpResponseHeaders is called. Can you only generate it once? > http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... net/http/http_response_headers.cc:109: return code; <doh> My mistake. Please ignore. The relocation of the numbers are handled by the histogram. <sorry... not thinking> On 2011/01/21 20:37:46, jar wrote: > Probably want something more like > > code - kHistogramMinHttpResponseCode + 1
One concern - is HttpResponseHeaders too low level a location for this? It seems like UrlRequestHttpJob may be a better place. On 2011/01/21 20:40:36, jar wrote: > http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... > File net/http/http_response_headers.cc (right): > > http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... > net/http/http_response_headers.cc:96: std::vector<int> codes; > This is only done during the initialization of the HISTOGRAM. The macro only > evals its arg once, creating a static ref to the instance, and will never > re-evaluate this. > > On 2011/01/21 20:33:25, cbentzel wrote: > > How expensive is this? > > > > It looks like this fairly large vector allocation and copy is done everytime > > HttpResponseHeaders is called. Can you only generate it once? > > > > http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... > net/http/http_response_headers.cc:109: return code; > <doh> My mistake. Please ignore. The relocation of the numbers are handled by > the histogram. <sorry... not thinking> > > On 2011/01/21 20:37:46, jar wrote: > > Probably want something more like > > > > code - kHistogramMinHttpResponseCode + 1
I really did do the second upload talked about earlier Jim, but to "reduce confusion" I deleted the bad first upload. Sorry for creating confusion by that action. This histogram is updated every time an HttpResponseHeaders is constructed from a bare string (i.e. not pickled). That covers numerous tests (which don't upload), but also http;/http_stream_parser.cc:511 and socket_stream/socket_stream.cc:720. I'll do another upload in just a minute, it's baking as we speak. http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... File net/http/http_response_headers.cc (right): http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... net/http/http_response_headers.cc:93: kHistogramMaxHttpResponseCode = 599 }; On 2011/01/21 20:37:46, jar wrote: > This is going to make a slightly large histogram. Most common are 50-100 > buckets, but this will be about 500 buckets. Memory cost for these things is > around 4 bytes per bucket for a counter, plus about 4 bytes per bucket for the > range. That totals around 2K bytes. That gets doubled because we have to > snapshot this array, for about 4K, plus of course a bit of string name overhead > etc. > > Can you motivate this gathering of data a bit? Will you need to gather this > forever? > > The amount of memory is not the end of the world, but a death by a million cuts > is achieved one slice at a time. I appreciate your point Jim. I do not believe we'll need to gather this forever. The goal is to learn which HTTP response codes are used in the wild; for prerendering we might need to add a new HTTP response code, and I wanted to learn which codes are used (in each category) in the wild before proposing a new number. I think eventually I'll be able to reduce this greatly: we can cut it back to just the response codes that are actually used in practice, and add bins in each category for "other", which we can then dynamically expand to find out what's happening. I'll add a TODO() to cut this back. I believe the majority of bins we collect will be zero almost every time, but since the point is to find out which ones aren't, initially (for a few weeks) I'd like to leave it this wide. http://codereview.chromium.org/6317011/diff/7001/net/http/http_response_heade... net/http/http_response_headers.cc:110: else On 2011/01/21 20:37:46, jar wrote: > style nit: avoid else clauses when you returned in the previous clause. Done.
On 2011/01/21 21:33:08, cbentzel wrote: > One concern - is HttpResponseHeaders too low level a location for this? It seems > like UrlRequestHttpJob may be a better place. > I considered, and rejected that approach. I'm happy to return to it though. The goal wascapture all HTTP response codes, from all sources. If one is being used, for instance, in tunnels (net/socket_stream/socket_stream.cc), I wanted to capture that. Does that end up in an UrlRequestHttpJob ultimately? What about websockets? My concern here was coverage. If we want to know what codes are available, we shouldn't overlap other protocols.
*PING*
Adding willchan, wtc, and eroman who may also have good ideas for where to place this. I'm still in favor of UrlRequestHttpJob, for the purpose of finding lowly-used response codes in the wild. In the case of a tunnel establishment, the 200 response code will be handled at a lower layer than UrlRequestHttpJob. However, any code explicitly handled at a lower layer than UrlRequestHttpJob is one that's already in widespread usage and not a candidate.
On Tue, Jan 25, 2011 at 8:47 AM, <cbentzel@chromium.org> wrote: > Adding willchan, wtc, and eroman who may also have good ideas for where to > place > this. > > I'm still in favor of UrlRequestHttpJob, for the purpose of finding > lowly-used > response codes in the wild. > > In the case of a tunnel establishment, the 200 response code will be > handled at > a lower layer than UrlRequestHttpJob. However, any code explicitly handled > at a > lower layer than UrlRequestHttpJob is one that's already in widespread > usage and > not a candidate. What does "not a candidate" mean? Note, if you do it at the URLRequestHttpJob level, you will include cached HTTP responses. Is this what you want? I'll go look at the code in a bit, but IIRC, we do some amount of copying of HttpResponseInfos, so we should make sure not to double count. > > > > http://codereview.chromium.org/6317011/ >
On 2011/01/25 16:47:22, cbentzel wrote: > > In the case of a tunnel establishment, the 200 response code will be handled at > a lower layer than UrlRequestHttpJob. However, any code explicitly handled at a > lower layer than UrlRequestHttpJob is one that's already in widespread usage and > not a candidate. My thinking on this was: any code explicitly handled, or ever provided or used at a lower layer than UrlRequestHttpJob is not a candidate, or in a layer that sites beside UrlRequestJob (WebSockets, etc...). We'd like to know about these codes being in use; this histogram will discover them, and the nonzero usage will clearly show us to stay clear of that code.
LGTM
Ok, I was wrong about the copying, so don't worry about that. Ok, here's the list of places I see that create it: chrome/common/security_filter_peer.cc (I don't know what this is for) net/http/http_network_transaction.cc (used to handle 100 Continue responses, except we initialize it with an empty HttpResponseHeaders(""), so it does your histogram no good, needs to be fixed) chrome/browser/automation/url_request_automation_job.cc (I think this is for ChromeFrame when it used to use the host network stack instead of the Chrome network stack, need to ask CF folks what the status of this is) net/spdy/spdy_http_util.cc (spdy pathway, covers both proxies and normal requests) net/socket_stream/socket_stream.cc (websockets?) net/websockets/websocket_handshake* (websockets?) webkit/blob/blob_url_request_job.cc (webkit blobs, I don't understand it well) For your change to work as is, the HttpNetworkTransaction case needs to fixed and the security_filter_peer.cc case needs to be understood. If you don't care about absolute accuracy of distribution, and mostly care about just knowing which response codes are in use, then I think this is a fine solution. I'm a bit worried about this in terms of cleanliness of API, since constructing a HttpResponseHeaders, specifically with that constructor, is not obvious that it is contributing to the histogram. I could easily see someone abusing it to create a temporary HttpResponseHeaders for some reason. That said, it's definitely the best place if you want to make sure you catch everything. I just would be wary about claiming accuracy of distribution. Please clarify what you intend to use this distribution for, and when you update the histogram dashboard, you might want to add a caveat on use of the distribution.
It seems useful to have separate histograms for the HTTP GET, POST, and CONNECT methods. Please find a way to avoid constructing the std::vector<int> of codes every time. http://codereview.chromium.org/6317011/diff/17001/net/http/http_response_head... File net/http/http_response_headers.cc (right): http://codereview.chromium.org/6317011/diff/17001/net/http/http_response_head... net/http/http_response_headers.cc:95: kHistogramMaxHttpResponseCode = 599 }; This should be formatted as enum { kHistogramMinHttpResponseCode = 100, kHistogramMaxHttpResponseCode = 599 }; http://codereview.chromium.org/6317011/diff/17001/net/http/http_response_head... net/http/http_response_headers.cc:103: i <= kHistogramMaxHttpResponseCode;++i) Document what the 0 status code added on line 101 means. Nit: add a space before ++i. http://codereview.chromium.org/6317011/diff/17001/net/http/http_response_head... net/http/http_response_headers.cc:108: int MapHttpResponseCode(const int code) { Nit: remove the 'const' from the function parameter. http://codereview.chromium.org/6317011/diff/17001/net/http/http_response_head... net/http/http_response_headers.cc:141: GetAllHttpResponseCodes()); It is wasteful to build the std::vector<int> of codes every time. There must be a way to avoid this.
On 2011/01/25 18:38:00, wtc wrote: > It seems useful to have separate histograms for the HTTP > GET, POST, and CONNECT methods. > > Please find a way to avoid constructing the std::vector<int> > of codes every time. > It is not constructed each time; the macro expands to a static object initialization which calls the function exactly once, at the first time the PC reaches this point. I will add a comment explaining this, since you're the second reviewer to make this complaint, so clearly it deserves addressing.
On 2011/01/25 18:30:58, willchan wrote: > Ok, I was wrong about the copying, so don't worry about that. > > Ok, here's the list of places I see that create it: > chrome/common/security_filter_peer.cc (I don't know what this is for) This looks to be code to blank out the renderer when going to a disallowed/aborted site. The parse is of a static string, and so the result will be an extra, spurious 200 code in the histogram. > net/http/http_network_transaction.cc (used to handle 100 Continue responses, > except we initialize it with an empty HttpResponseHeaders(""), so it does your > histogram no good, needs to be fixed) This is fine. The parse occurs in HttpStreamParser::DoParseResponseHeaders, which has HttpNetworkTransaction::DoReadHeaders() on the stack, sort of (IO completion). The first time 1xx is read, the histogram gets the 1xx code. The empty string initialization might as well be NULL, since the state is driven right back to read the headers, and another call in the stream parser will seed the histogram. > chrome/browser/automation/url_request_automation_job.cc (I think this is for > ChromeFrame when it used to use the host network stack instead of the Chrome > network stack, need to ask CF folks what the status of this is) > net/spdy/spdy_http_util.cc (spdy pathway, covers both proxies and normal > requests) > net/socket_stream/socket_stream.cc (websockets?) > net/websockets/websocket_handshake* (websockets?) > webkit/blob/blob_url_request_job.cc (webkit blobs, I don't understand it well) > > For your change to work as is, the HttpNetworkTransaction case needs to fixed > and the security_filter_peer.cc case needs to be understood. > > If you don't care about absolute accuracy of distribution, and mostly care about > just knowing which response codes are in use, then I think this is a fine > solution. I'm a bit worried about this in terms of cleanliness of API, since > constructing a HttpResponseHeaders, specifically with that constructor, is not > obvious that it is contributing to the histogram. I could easily see someone > abusing it to create a temporary HttpResponseHeaders for some reason. > > That said, it's definitely the best place if you want to make sure you catch > everything. I just would be wary about claiming accuracy of distribution. > > Please clarify what you intend to use this distribution for, and when you update > the histogram dashboard, you might want to add a caveat on use of the > distribution. Our intention is to see which does are in use. I'd prefer we catch a code used in an uncommon case, and potentially double count (although I concur that there aren't many of these now, but of course it's fragile exactly like you said: double use could some up any time). When I add this to the histogram dashboard, I'll make sure the description points out the problems with the distribution.
http://codereview.chromium.org/6317011/diff/17001/net/http/http_response_head... File net/http/http_response_headers.cc (right): http://codereview.chromium.org/6317011/diff/17001/net/http/http_response_head... net/http/http_response_headers.cc:95: kHistogramMaxHttpResponseCode = 599 }; On 2011/01/25 18:38:00, wtc wrote: > This should be formatted as > > enum { > kHistogramMinHttpResponseCode = 100, > kHistogramMaxHttpResponseCode = 599 > }; Done. http://codereview.chromium.org/6317011/diff/17001/net/http/http_response_head... net/http/http_response_headers.cc:103: i <= kHistogramMaxHttpResponseCode;++i) On 2011/01/25 18:38:00, wtc wrote: > Document what the 0 status code added on line 101 means. > > Nit: add a space before ++i. Done. http://codereview.chromium.org/6317011/diff/17001/net/http/http_response_head... net/http/http_response_headers.cc:103: i <= kHistogramMaxHttpResponseCode;++i) On 2011/01/25 18:38:00, wtc wrote: > Document what the 0 status code added on line 101 means. > > Nit: add a space before ++i. Done. http://codereview.chromium.org/6317011/diff/17001/net/http/http_response_head... net/http/http_response_headers.cc:108: int MapHttpResponseCode(const int code) { On 2011/01/25 18:38:00, wtc wrote: > Nit: remove the 'const' from the function parameter. Done. http://codereview.chromium.org/6317011/diff/17001/net/http/http_response_head... net/http/http_response_headers.cc:141: GetAllHttpResponseCodes()); On 2011/01/25 18:38:00, wtc wrote: > It is wasteful to build the std::vector<int> of codes every > time. There must be a way to avoid this. Added a comment documenting that this is not what's occuring.
remediated to latest reviews. Thanks everyone!
LGTM. http://codereview.chromium.org/6317011/diff/29001/net/http/http_response_head... File net/http/http_response_headers.cc (right): http://codereview.chromium.org/6317011/diff/29001/net/http/http_response_head... net/http/http_response_headers.cc:92: // ares are not sent in practice, to reduce upload size & memory use. Typo: remove 'ares', or change it to 'ones' http://codereview.chromium.org/6317011/diff/29001/net/http/http_response_head... net/http/http_response_headers.cc:95: kHistogramMaxHttpResponseCode = 599, The formatting is still wrong. It should be: enum { kHistogramMinHttpResponseCode = 100, kHistogramMaxHttpResponseCode = 599, }; http://codereview.chromium.org/6317011/diff/29001/net/http/http_response_head... net/http/http_response_headers.cc:103: // range [kHistogramMinHttpResponseCode, kHistogramMaxHttpResponseCode]. I see that you already explained the 0 histogram value at lines 89-90 above. So you may remove this comment I asked you to add. http://codereview.chromium.org/6317011/diff/29001/net/http/http_response_head... net/http/http_response_headers.cc:141: // instantiating the histogram in two places. You should also explain why the other constructors don't call this histogram. I would not do the histogram in a HttpResponseHeaders constructor because this side effect is not obvious, but this may be a matter of personal taste. Nit: don't abbreviate "def'n".
http://codereview.chromium.org/6317011/diff/29001/net/http/http_response_head... File net/http/http_response_headers.cc (right): http://codereview.chromium.org/6317011/diff/29001/net/http/http_response_head... net/http/http_response_headers.cc:95: kHistogramMaxHttpResponseCode = 599, nit: On Chrome, we use MACRO_STYLE_NAMING for enums (contrary to google3). see: https://sites.google.com/a/chromium.org/dev/developers/coding-style
Thanks for the reviews everyone! Willchan, I put together a python server that provides 100 response codes, and I've confirmed the writeup above: when 100 response code is provided, nothing is double counted. http://codereview.chromium.org/6317011/diff/29001/net/http/http_response_head... File net/http/http_response_headers.cc (right): http://codereview.chromium.org/6317011/diff/29001/net/http/http_response_head... net/http/http_response_headers.cc:92: // ares are not sent in practice, to reduce upload size & memory use. On 2011/01/26 19:03:11, wtc wrote: > Typo: remove 'ares', or change it to 'ones' Done. http://codereview.chromium.org/6317011/diff/29001/net/http/http_response_head... net/http/http_response_headers.cc:95: kHistogramMaxHttpResponseCode = 599, On 2011/01/26 19:03:11, wtc wrote: > The formatting is still wrong. It should be: > > enum { > kHistogramMinHttpResponseCode = 100, > kHistogramMaxHttpResponseCode = 599, > }; Done. http://codereview.chromium.org/6317011/diff/29001/net/http/http_response_head... net/http/http_response_headers.cc:95: kHistogramMaxHttpResponseCode = 599, On 2011/01/26 21:33:14, jar wrote: > nit: On Chrome, we use MACRO_STYLE_NAMING for enums (contrary to google3). > > see: https://sites.google.com/a/chromium.org/dev/developers/coding-style Done. http://codereview.chromium.org/6317011/diff/29001/net/http/http_response_head... net/http/http_response_headers.cc:103: // range [kHistogramMinHttpResponseCode, kHistogramMaxHttpResponseCode]. On 2011/01/26 19:03:11, wtc wrote: > I see that you already explained the 0 histogram value at lines 89-90 above. > So you may remove this comment I asked you to add. Done. http://codereview.chromium.org/6317011/diff/29001/net/http/http_response_head... net/http/http_response_headers.cc:141: // instantiating the histogram in two places. On 2011/01/26 19:03:11, wtc wrote: > You should also explain why the other constructors don't call this histogram. > > I would not do the histogram in a HttpResponseHeaders constructor because > this side effect is not obvious, but this may be a matter of personal taste. > > Nit: don't abbreviate "def'n". Done. |