|
|
Created:
10 years ago by Scott Hess - ex-Googler Modified:
9 years, 6 months ago CC:
chromium-reviews Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionConsolidate some safe-browsing histograms.
Combine SB2.GetHash200 and SB2.GetHash204 into SB2.GetHashResult enum.
Rework SB2.GetHashServerMiss to increment only once per request,
rather than for each client. Also fold it into SB2.GetHashResult.
BUG=none
TEST=Monitor histograms.
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=70243
Patch Set 1 #
Total comments: 5
Patch Set 2 : Histogram hits instead of misses, also empty responses. #
Total comments: 3
Patch Set 3 : git try #Patch Set 4 : tweaking #
Messages
Total messages: 14 (0 generated)
The all-misses version has been wrong forever, I _think_ I've matched the intent.
LGTM. http://codereview.chromium.org/6025005/diff/1/chrome/browser/safe_browsing/pr... File chrome/browser/safe_browsing/protocol_manager.h (right): http://codereview.chromium.org/6025005/diff/1/chrome/browser/safe_browsing/pr... chrome/browser/safe_browsing/protocol_manager.h:166: GET_HASH_NO_HITS, // None of the full hashes matched. Can you add some more comments to clarify that GET_HASH_NO_HITS could be hit for both GET_HASH_STATUS_200 and GET_HASH_STATUS_204? Otherwise, it could be misunderstood that the three enum covers three exclusive sets.
LGTM http://codereview.chromium.org/6025005/diff/1/chrome/browser/safe_browsing/pr... File chrome/browser/safe_browsing/protocol_manager.h (right): http://codereview.chromium.org/6025005/diff/1/chrome/browser/safe_browsing/pr... chrome/browser/safe_browsing/protocol_manager.h:164: GET_HASH_STATUS_200, Is it worth tracking other status codes here? It might be interesting to get an idea of how often people are being affected by odd errors (proxy auth, 500s, etc.). Or maybe that should be a separate histogram?
http://codereview.chromium.org/6025005/diff/1/chrome/browser/safe_browsing/pr... File chrome/browser/safe_browsing/protocol_manager.h (right): http://codereview.chromium.org/6025005/diff/1/chrome/browser/safe_browsing/pr... chrome/browser/safe_browsing/protocol_manager.h:164: GET_HASH_STATUS_200, On 2010/12/22 19:29:19, Erik Kay wrote: > Is it worth tracking other status codes here? It might be interesting to get an > idea of how often people are being affected by odd errors (proxy auth, 500s, > etc.). Or maybe that should be a separate histogram? Hmm. I vote for a separate histogram. It can enumerate the codes in a generic fashion so the we could either have a histogram for all safe-browsing requests, or one for each different request, all using the same layout. Also, it's really more of a transit-level bit of information, whereas this is more of a protocol-level item. http://codereview.chromium.org/6025005/diff/1/chrome/browser/safe_browsing/pr... chrome/browser/safe_browsing/protocol_manager.h:166: GET_HASH_NO_HITS, // None of the full hashes matched. On 2010/12/22 00:24:58, lzheng wrote: > Can you add some more comments to clarify that GET_HASH_NO_HITS could be hit for > both GET_HASH_STATUS_200 and GET_HASH_STATUS_204? Otherwise, it could be > misunderstood that the three enum covers three exclusive sets. In thinking this through, my understanding is that 204 has an empty body, so full_hashes will be empty, so the current value will be GET_HASH_STATUS_204 + epsilon. WDYT of inverting this to instead measure the number of hits, which will be strictly a subset of GET_HASH_STATUS_200?
http://codereview.chromium.org/6025005/diff/1/chrome/browser/safe_browsing/pr... File chrome/browser/safe_browsing/protocol_manager.h (right): http://codereview.chromium.org/6025005/diff/1/chrome/browser/safe_browsing/pr... chrome/browser/safe_browsing/protocol_manager.h:166: GET_HASH_NO_HITS, // None of the full hashes matched. On 2010/12/22 20:41:51, shess wrote: > On 2010/12/22 00:24:58, lzheng wrote: > > Can you add some more comments to clarify that GET_HASH_NO_HITS could be hit > for > > both GET_HASH_STATUS_200 and GET_HASH_STATUS_204? Otherwise, it could be > > misunderstood that the three enum covers three exclusive sets. > > In thinking this through, my understanding is that 204 has an empty body, so > full_hashes will be empty, so the current value will be GET_HASH_STATUS_204 + > epsilon. WDYT of inverting this to instead measure the number of hits, which > will be strictly a subset of GET_HASH_STATUS_200? Glad I thought more, because GET_HASH_STATUS_200 can also return no full hashes. This is the case we were recently in, I think, where the prefix was recognized but had been logically deleted on the server side. I think that's reasonable to track separately from real misses.
On Wed, Dec 22, 2010 at 12:49 PM, <shess@chromium.org> wrote: > > > http://codereview.chromium.org/6025005/diff/1/chrome/browser/safe_browsing/pr... > File chrome/browser/safe_browsing/protocol_manager.h (right): > > > http://codereview.chromium.org/6025005/diff/1/chrome/browser/safe_browsing/pr... > chrome/browser/safe_browsing/protocol_manager.h:166: GET_HASH_NO_HITS, > // None of the full hashes matched. > On 2010/12/22 20:41:51, shess wrote: > >> On 2010/12/22 00:24:58, lzheng wrote: >> > Can you add some more comments to clarify that GET_HASH_NO_HITS >> > could be hit > >> for >> > both GET_HASH_STATUS_200 and GET_HASH_STATUS_204? Otherwise, it >> > could be > >> > misunderstood that the three enum covers three exclusive sets. >> > > In thinking this through, my understanding is that 204 has an empty >> > body, so > >> full_hashes will be empty, so the current value will be >> > GET_HASH_STATUS_204 + > >> epsilon. WDYT of inverting this to instead measure the number of >> > hits, which > >> will be strictly a subset of GET_HASH_STATUS_200? >> > > Glad I thought more, because GET_HASH_STATUS_200 can also return no full > hashes. This is the case we were recently in, I think, where the prefix > was recognized but had been logically deleted on the server side. I > think that's reasonable to track separately from real misses. > > Sounds good. > > http://codereview.chromium.org/6025005/ >
OK, modified to record hits to full_hashes and empty responses.
http://codereview.chromium.org/6025005/diff/8001/chrome/browser/safe_browsing... File chrome/browser/safe_browsing/protocol_manager.h (right): http://codereview.chromium.org/6025005/diff/8001/chrome/browser/safe_browsing... chrome/browser/safe_browsing/protocol_manager.h:170: // case, and also 200 responses for prefixes which are logically So should we split these up? GET_HASH_FULL_HASH_EMPTY_200/204? Without them being split, I'm not sure how you would know what we were getting more of one than the other.
http://codereview.chromium.org/6025005/diff/8001/chrome/browser/safe_browsing... File chrome/browser/safe_browsing/protocol_manager.h (right): http://codereview.chromium.org/6025005/diff/8001/chrome/browser/safe_browsing... chrome/browser/safe_browsing/protocol_manager.h:170: // case, and also 200 responses for prefixes which are logically On 2010/12/22 22:07:22, Erik Kay wrote: > So should we split these up? GET_HASH_FULL_HASH_EMPTY_200/204? Without them > being split, I'm not sure how you would know what we were getting more of one > than the other. My understanding is that every GET_HASH_STATUS_204 should be empty (that's what http status code 204 means), so that shouldn't add any info about what is already here. Would it?
http://codereview.chromium.org/6025005/diff/8001/chrome/browser/safe_browsing... File chrome/browser/safe_browsing/protocol_manager.h (right): http://codereview.chromium.org/6025005/diff/8001/chrome/browser/safe_browsing... chrome/browser/safe_browsing/protocol_manager.h:170: // case, and also 200 responses for prefixes which are logically On 2010/12/22 22:13:20, shess wrote: > On 2010/12/22 22:07:22, Erik Kay wrote: > > So should we split these up? GET_HASH_FULL_HASH_EMPTY_200/204? Without them > > being split, I'm not sure how you would know what we were getting more of one > > than the other. > > My understanding is that every GET_HASH_STATUS_204 should be empty (that's what > http status code 204 means), so that shouldn't add any info about what is > already here. Would it? That's my understanding too. So, GET_HASH_FULL_HASH_EMPTY = (GET_HASH_STATUS_200 - GET_HASH_FULL_HASH_HIT) + GET_HASH_STATUS_204. I think you could export (GET_HASH_STATUS_200 - GET_HASH_FULL_HASH_HIT) too (what's Erik wants, I believe), but since it could be calculated out, I am fine either way.
The spot where the hits are known doesn't know what of its inputs are 200 versus 204. I could bump the "empty" histogram up a level, and have it record 200_empty distinct from 200_all and 204. As currently phrased, "empty" plus "one or more hits" plus "not empty but no hits" should sum up to 200+204, which is why I put the "empty" case down near the "one or more hits" case. I could record the "not empty but no hits" directly by having an else on both the "one or more hits" cases. I wasn't sure that was clear, though, because it's not simply "no hits". -scott On Wed, Dec 22, 2010 at 3:56 PM, <lzheng@chromium.org> wrote: > > http://codereview.chromium.org/6025005/diff/8001/chrome/browser/safe_browsing... > File chrome/browser/safe_browsing/protocol_manager.h (right): > > http://codereview.chromium.org/6025005/diff/8001/chrome/browser/safe_browsing... > chrome/browser/safe_browsing/protocol_manager.h:170: // case, and also > 200 responses for prefixes which are logically > On 2010/12/22 22:13:20, shess wrote: >> >> On 2010/12/22 22:07:22, Erik Kay wrote: >> > So should we split these up? GET_HASH_FULL_HASH_EMPTY_200/204? > > Without them >> >> > being split, I'm not sure how you would know what we were getting > > more of one >> >> > than the other. > >> My understanding is that every GET_HASH_STATUS_204 should be empty > > (that's what >> >> http status code 204 means), so that shouldn't add any info about what > > is >> >> already here. Would it? > > That's my understanding too. So, GET_HASH_FULL_HASH_EMPTY = > (GET_HASH_STATUS_200 - GET_HASH_FULL_HASH_HIT) + GET_HASH_STATUS_204. I > think you could export (GET_HASH_STATUS_200 - GET_HASH_FULL_HASH_HIT) > too (what's Erik wants, I believe), but since it could be calculated > out, I am fine either way. > > http://codereview.chromium.org/6025005/ >
Lei's point about you being able to calculate the result is all I was talking about. If you wanted to at a glance count "recognized hash that was deleted on the server", you simply need to look at GET_HASH_FULL_HASH_EMPTY - GET_HASH_STATUS_204. I guess that's good enough, but it seems like semantically these are separate values. Don't we only really care about these three buckets? FALSE_POSITIVE (204) STALE (200, but empty) MATCH (200, non-empty) On Wed, Dec 22, 2010 at 4:03 PM, Scott Hess <shess@chromium.org> wrote: > The spot where the hits are known doesn't know what of its inputs are > 200 versus 204. I could bump the "empty" histogram up a level, and > have it record 200_empty distinct from 200_all and 204. As currently > phrased, "empty" plus "one or more hits" plus "not empty but no hits" > should sum up to 200+204, which is why I put the "empty" case down > near the "one or more hits" case. > > I could record the "not empty but no hits" directly by having an else > on both the "one or more hits" cases. I wasn't sure that was clear, > though, because it's not simply "no hits". > > -scott > > > On Wed, Dec 22, 2010 at 3:56 PM, <lzheng@chromium.org> wrote: > > > > > http://codereview.chromium.org/6025005/diff/8001/chrome/browser/safe_browsing... > > File chrome/browser/safe_browsing/protocol_manager.h (right): > > > > > http://codereview.chromium.org/6025005/diff/8001/chrome/browser/safe_browsing... > > chrome/browser/safe_browsing/protocol_manager.h:170: // case, and also > > 200 responses for prefixes which are logically > > On 2010/12/22 22:13:20, shess wrote: > >> > >> On 2010/12/22 22:07:22, Erik Kay wrote: > >> > So should we split these up? GET_HASH_FULL_HASH_EMPTY_200/204? > > > > Without them > >> > >> > being split, I'm not sure how you would know what we were getting > > > > more of one > >> > >> > than the other. > > > >> My understanding is that every GET_HASH_STATUS_204 should be empty > > > > (that's what > >> > >> http status code 204 means), so that shouldn't add any info about what > > > > is > >> > >> already here. Would it? > > > > That's my understanding too. So, GET_HASH_FULL_HASH_EMPTY = > > (GET_HASH_STATUS_200 - GET_HASH_FULL_HASH_HIT) + GET_HASH_STATUS_204. I > > think you could export (GET_HASH_STATUS_200 - GET_HASH_FULL_HASH_HIT) > > too (what's Erik wants, I believe), but since it could be calculated > > out, I am fine either way. > > > > http://codereview.chromium.org/6025005/ > > >
You can also miss on a non-empty 200 (like if yahoo.com and malware.com end up with the same prefix. I mentioned to Lei that I would rather have names be clean rather than ambiguous, even if the result is that you need a little work to derive some values. That way 9 months out the new guy doesn't have to read between the lines of what was _meant_. But maybe it would be easier to just make the three check bins (STALE, MATCH, MISS) entirely independent from the two result-status bins (200, 204). Then both sets should sum to the same value and a bunch of explaining is not necessary. -scott On 2010/12/23 00:18:36, Erik Kay wrote: > Lei's point about you being able to calculate the result is all I was > talking about. If you wanted to at a glance count "recognized hash that was > deleted on the server", you simply need to look at GET_HASH_FULL_HASH_EMPTY > - GET_HASH_STATUS_204. I guess that's good enough, but it seems like > semantically these are separate values. Don't we only really care about > these three buckets? > > FALSE_POSITIVE (204) > STALE (200, but empty) > MATCH (200, non-empty) > > > > > On Wed, Dec 22, 2010 at 4:03 PM, Scott Hess <mailto:shess@chromium.org> wrote: > > > The spot where the hits are known doesn't know what of its inputs are > > 200 versus 204. I could bump the "empty" histogram up a level, and > > have it record 200_empty distinct from 200_all and 204. As currently > > phrased, "empty" plus "one or more hits" plus "not empty but no hits" > > should sum up to 200+204, which is why I put the "empty" case down > > near the "one or more hits" case. > > > > I could record the "not empty but no hits" directly by having an else > > on both the "one or more hits" cases. I wasn't sure that was clear, > > though, because it's not simply "no hits". > > > > -scott > > > > > > On Wed, Dec 22, 2010 at 3:56 PM, <mailto:lzheng@chromium.org> wrote: > > > > > > > > > http://codereview.chromium.org/6025005/diff/8001/chrome/browser/safe_browsing... > > > File chrome/browser/safe_browsing/protocol_manager.h (right): > > > > > > > > > http://codereview.chromium.org/6025005/diff/8001/chrome/browser/safe_browsing... > > > chrome/browser/safe_browsing/protocol_manager.h:170: // case, and also > > > 200 responses for prefixes which are logically > > > On 2010/12/22 22:13:20, shess wrote: > > >> > > >> On 2010/12/22 22:07:22, Erik Kay wrote: > > >> > So should we split these up? GET_HASH_FULL_HASH_EMPTY_200/204? > > > > > > Without them > > >> > > >> > being split, I'm not sure how you would know what we were getting > > > > > > more of one > > >> > > >> > than the other. > > > > > >> My understanding is that every GET_HASH_STATUS_204 should be empty > > > > > > (that's what > > >> > > >> http status code 204 means), so that shouldn't add any info about what > > > > > > is > > >> > > >> already here. Would it? > > > > > > That's my understanding too. So, GET_HASH_FULL_HASH_EMPTY = > > > (GET_HASH_STATUS_200 - GET_HASH_FULL_HASH_HIT) + GET_HASH_STATUS_204. I > > > think you could export (GET_HASH_STATUS_200 - GET_HASH_FULL_HASH_HIT) > > > too (what's Erik wants, I believe), but since it could be calculated > > > out, I am fine either way. > > > > > > http://codereview.chromium.org/6025005/ > > > > >
On Wed, Dec 22, 2010 at 4:31 PM, <shess@chromium.org> wrote: > You can also miss on a non-empty 200 (like if yahoo.com and malware.comend up > with the same prefix. > > I mentioned to Lei that I would rather have names be clean rather than > ambiguous, even if the result is that you need a little work to derive some > values. That way 9 months out the new guy doesn't have to read between the > lines of what was _meant_. But maybe it would be easier to just make the > three > check bins (STALE, MATCH, MISS) entirely independent from the two > result-status > bins (200, 204). Then both sets should sum to the same value and a bunch > of > explaining is not necessary. > This sounds like the right approach. It feels much more clear. > > -scott > > > > On 2010/12/23 00:18:36, Erik Kay wrote: > >> Lei's point about you being able to calculate the result is all I was >> talking about. If you wanted to at a glance count "recognized hash that >> was >> deleted on the server", you simply need to look at >> GET_HASH_FULL_HASH_EMPTY >> - GET_HASH_STATUS_204. I guess that's good enough, but it seems like >> semantically these are separate values. Don't we only really care about >> these three buckets? >> > > FALSE_POSITIVE (204) >> STALE (200, but empty) >> MATCH (200, non-empty) >> > > > > > On Wed, Dec 22, 2010 at 4:03 PM, Scott Hess <mailto:shess@chromium.org> >> wrote: >> > > > The spot where the hits are known doesn't know what of its inputs are >> > 200 versus 204. I could bump the "empty" histogram up a level, and >> > have it record 200_empty distinct from 200_all and 204. As currently >> > phrased, "empty" plus "one or more hits" plus "not empty but no hits" >> > should sum up to 200+204, which is why I put the "empty" case down >> > near the "one or more hits" case. >> > >> > I could record the "not empty but no hits" directly by having an else >> > on both the "one or more hits" cases. I wasn't sure that was clear, >> > though, because it's not simply "no hits". >> > >> > -scott >> > >> > >> > On Wed, Dec 22, 2010 at 3:56 PM, <mailto:lzheng@chromium.org> wrote: >> > > >> > > >> > >> > > > http://codereview.chromium.org/6025005/diff/8001/chrome/browser/safe_browsing... > >> > > File chrome/browser/safe_browsing/protocol_manager.h (right): >> > > >> > > >> > >> > > > http://codereview.chromium.org/6025005/diff/8001/chrome/browser/safe_browsing... > >> > > chrome/browser/safe_browsing/protocol_manager.h:170: // case, and also >> > > 200 responses for prefixes which are logically >> > > On 2010/12/22 22:13:20, shess wrote: >> > >> >> > >> On 2010/12/22 22:07:22, Erik Kay wrote: >> > >> > So should we split these up? GET_HASH_FULL_HASH_EMPTY_200/204? >> > > >> > > Without them >> > >> >> > >> > being split, I'm not sure how you would know what we were getting >> > > >> > > more of one >> > >> >> > >> > than the other. >> > > >> > >> My understanding is that every GET_HASH_STATUS_204 should be empty >> > > >> > > (that's what >> > >> >> > >> http status code 204 means), so that shouldn't add any info about >> what >> > > >> > > is >> > >> >> > >> already here. Would it? >> > > >> > > That's my understanding too. So, GET_HASH_FULL_HASH_EMPTY = >> > > (GET_HASH_STATUS_200 - GET_HASH_FULL_HASH_HIT) + GET_HASH_STATUS_204. >> I >> > > think you could export (GET_HASH_STATUS_200 - GET_HASH_FULL_HASH_HIT) >> > > too (what's Erik wants, I believe), but since it could be calculated >> > > out, I am fine either way. >> > > >> > > http://codereview.chromium.org/6025005/ >> > > >> > >> > > > > http://codereview.chromium.org/6025005/ > |