|
|
Created:
6 years, 10 months ago by noé Modified:
6 years, 9 months ago CC:
chromium-reviews, lucasballard_google.com Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionSeparate pre-classification checks for client-side malware and phishing detection and enable the client-side malware
feature in all channels (not just dev and canary).
BUG=352782
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=260171
Patch Set 1 #
Total comments: 18
Patch Set 2 : Address Matt's comments. #Patch Set 3 : Remove Done() method which isn't used. #Patch Set 4 : Remove done() #
Total comments: 8
Patch Set 5 : add unit-testing #Patch Set 6 : browse_info_ isn't always valid. #Patch Set 7 : Fix the service unit-test. #
Total comments: 30
Patch Set 8 : Address Matt's comments. #
Total comments: 6
Patch Set 9 : More comments #
Total comments: 2
Patch Set 10 : final comments from matt. #Patch Set 11 : Merge and fix issues in the histograms file. #
Total comments: 6
Patch Set 12 : add histogram #Patch Set 13 : Change some of the histograms to bool histograms when appropriate. #
Total comments: 2
Patch Set 14 : Set right enum type for boolean histograms. #Messages
Total messages: 28 (0 generated)
Hi Matt, This CL splits the pre-classification checks for client-side detection in two. We want to do separate pre-classification checks for the phishing and malware classifiers. Could you take a first look and tell me if you think that makes sense? Changing unit-tests is notoriously laborious in Chrome code so I would appreciate getting high level comments before making changes to the large unit-tests. Thanks for your help, noe.
sorry for delay. This code is somewhat hard to follow (before too). https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... File chrome/browser/safe_browsing/client_side_detection_host.cc (right): https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:93: UMA_HISTOGRAM_COUNTS("SBClientMalware.ClassificationStart", 1); Don't forget to update histograms.xml (it's in the same repo now so you don't need a separate CL) https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:98: MaybeClassifyForPhishing()) { It is confusing that some of the tests like this one check the MaybeClassifyForX() as part of the condition, whereas others just depend on the DontClassifyForX doing the check internally. https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:188: // Track the first reason why we stopped classifying for phishing. s/phishing/malware/ https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:478: should_classify_for_malware_ = should_classify; I think there are going to be some races between this callback, ClientSideDetectionHost::Observe, and DocumentOnLoadCompletedInMainFrame. https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:482: int32 page_id) { Add thread dcheck (probably in more other methods too) https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:486: // that except when there is an error. Not super clear, but it sounds like if there are slow loading resources or the page's onload handler takes an arbitrarily long amount of time, this could happen? Seems like it would be better to just only clear the browse_info when both are done with it. https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:490: if (browse_info_->page_id == page_id && Is there an expected case where the pageids don't match? Should there be a dcheck or uma log ? https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... File chrome/browser/safe_browsing/client_side_detection_host.h (right): https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.h:126: virtual void DocumentOnLoadCompletedInMainFrame(int32 page_id); OVERRIDE
Thanks Matt, I agree that this code isn't very readable. Sorry for not making it much cleaner. I believe I've addressed most of your comments. Thanks for your input! n. https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... File chrome/browser/safe_browsing/client_side_detection_host.cc (right): https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:93: UMA_HISTOGRAM_COUNTS("SBClientMalware.ClassificationStart", 1); On 2014/02/21 00:35:29, mattm wrote: > Don't forget to update histograms.xml (it's in the same repo now so you don't > need a separate CL) Done. https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:98: MaybeClassifyForPhishing()) { On 2014/02/21 00:35:29, mattm wrote: > It is confusing that some of the tests like this one check the > MaybeClassifyForX() as part of the condition, whereas others just depend on the > DontClassifyForX doing the check internally. That's a leftover. We shouldn't need to check Maybe* anywhere except to know whether we should continue at all. For example, right before we post a new task to the IO thread. https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:188: // Track the first reason why we stopped classifying for phishing. On 2014/02/21 00:35:29, mattm wrote: > s/phishing/malware/ Done. https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:478: should_classify_for_malware_ = should_classify; On 2014/02/21 00:35:29, mattm wrote: > I think there are going to be some races between this callback, > ClientSideDetectionHost::Observe, and DocumentOnLoadCompletedInMainFrame. Good point. Created a separate method with a member variable that keeps the state around. https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:482: int32 page_id) { On 2014/02/21 00:35:29, mattm wrote: > Add thread dcheck (probably in more other methods too) Done. https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:486: // that except when there is an error. On 2014/02/21 00:35:29, mattm wrote: > Not super clear, but it sounds like if there are slow loading resources or the > page's onload handler takes an arbitrarily long amount of time, this could > happen? Seems like it would be better to just only clear the browse_info when > both are done with it. Removed that part. Is there a help class to do reference counting by hand? I'd rather avoid yet another flag or member variable to track that both feature extractions are done. Any thoughts? Right now, I decided not to delete the browse info. It's a pretty small object and it will be deleted once a new page loads. https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:490: if (browse_info_->page_id == page_id && On 2014/02/21 00:35:29, mattm wrote: > Is there an expected case where the pageids don't match? Should there be a > dcheck or uma log ? Added a UMA stat for that instead. https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... File chrome/browser/safe_browsing/client_side_detection_host.h (right): https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.h:126: virtual void DocumentOnLoadCompletedInMainFrame(int32 page_id); On 2014/02/21 00:35:29, mattm wrote: > OVERRIDE Done.
Yeah, general approach seems good. https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... File chrome/browser/safe_browsing/client_side_detection_host.cc (right): https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:486: // that except when there is an error. On 2014/02/21 19:04:16, noé wrote: > On 2014/02/21 00:35:29, mattm wrote: > > Not super clear, but it sounds like if there are slow loading resources or the > > page's onload handler takes an arbitrarily long amount of time, this could > > happen? Seems like it would be better to just only clear the browse_info when > > both are done with it. > > Removed that part. > > Is there a help class to do reference counting by hand? I'd rather avoid yet > another flag or member variable to track that both feature extractions are done. > Any thoughts? Right now, I decided not to delete the browse info. It's a > pretty small object and it will be deleted once a new page loads. Don't think there's too much like that. Could make it a RefCounted object and keep separate refptrs for the malware and phishing. I guess there is the RefCountedData wrapper but I'm not sure that is much better than just making it RefCounted (and RefCountedData is thread-safe which isn't necessary here.) https://codereview.chromium.org/173133004/diff/130001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host.cc (right): https://codereview.chromium.org/173133004/diff/130001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:253: // phishing we want to give ourselves a chance to fix false positives. Guess it's not really related to this CL, but this comment seems confusing. Maybe I'm not clear on what "fix false positives" means. Seems like it's really something more like "if we had a cached result but it's too old, always re-check it". Another interesting thing is if the classifier doesn't think it's phishing, the entry in the cache won't actually be updated, so this would keep checking until the cache entry gets expired. (If you agree with any of that, you don't need to fix in this CL.) https://codereview.chromium.org/173133004/diff/130001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:352: weak_factory_.InvalidateWeakPtrs(); I can't remember, does invalidating weak pointers make the callback.is_null() return true? If so you might need to move the classification_request_->Cancel(); above this. https://codereview.chromium.org/173133004/diff/130001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:377: should_classify_for_malware_.reset(); Seems like a plain bool initialized to false here would achieve the same effect.
Sorry for the delay. Thanks for your comments Matt. Please take another look. I've added unit-testing. noe. https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... File chrome/browser/safe_browsing/client_side_detection_host.cc (right): https://codereview.chromium.org/173133004/diff/1/chrome/browser/safe_browsing... chrome/browser/safe_browsing/client_side_detection_host.cc:486: // that except when there is an error. On 2014/02/22 02:49:03, mattm wrote: > On 2014/02/21 19:04:16, noé wrote: > > On 2014/02/21 00:35:29, mattm wrote: > > > Not super clear, but it sounds like if there are slow loading resources or > the > > > page's onload handler takes an arbitrarily long amount of time, this could > > > happen? Seems like it would be better to just only clear the browse_info > when > > > both are done with it. > > > > Removed that part. > > > > Is there a help class to do reference counting by hand? I'd rather avoid yet > > another flag or member variable to track that both feature extractions are > done. > > Any thoughts? Right now, I decided not to delete the browse info. It's a > > pretty small object and it will be deleted once a new page loads. > > Don't think there's too much like that. Could make it a RefCounted object and > keep separate refptrs for the malware and phishing. > > I guess there is the RefCountedData wrapper but I'm not sure that is much better > than just making it RefCounted (and RefCountedData is thread-safe which isn't > necessary here.) > RefCounted isn't a good fit here. The object is a member of the host and should be deleted once both malware and phishing classification is done. Since I can't manually increment the reference counter it's not that helpful. I think the object is small enough to keep it around. https://codereview.chromium.org/173133004/diff/130001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host.cc (right): https://codereview.chromium.org/173133004/diff/130001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:253: // phishing we want to give ourselves a chance to fix false positives. On 2014/02/22 02:49:03, mattm wrote: > Guess it's not really related to this CL, but this comment seems confusing. > Maybe I'm not clear on what "fix false positives" means. > > Seems like it's really something more like "if we had a cached result but it's > too old, always re-check it". If the cached value is phishing we want to re-send a server ping to make sure it's still phishing. That's how we fix false positives. Changed the comment slightly. > Another interesting thing is if the classifier doesn't think it's phishing, the > entry in the cache won't actually be updated, so this would keep checking until > the cache entry gets expired. > > (If you agree with any of that, you don't need to fix in this CL.) That's a good observation. I don't think we want to cache any URL that isn't classified as phishing on the client side. Otherwise, the cache will blow up quickly. The client verdict is unlikely to change from phishing to non-phishing unless the model changes so I think we're OK here. https://codereview.chromium.org/173133004/diff/130001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:352: weak_factory_.InvalidateWeakPtrs(); On 2014/02/22 02:49:03, mattm wrote: > I can't remember, does invalidating weak pointers make the callback.is_null() > return true? If so you might need to move the classification_request_->Cancel(); > above this. Done. https://codereview.chromium.org/173133004/diff/130001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:377: should_classify_for_malware_.reset(); On 2014/02/22 02:49:03, mattm wrote: > Seems like a plain bool initialized to false here would achieve the same effect. MaybeStartMalwareFeatureExtraction() needs to know whether or not OnMalwarePreClassificationDone() was called already. It does so by checking whether that flag is NULL or not before checking its actual value.
https://codereview.chromium.org/173133004/diff/130001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host.cc (right): https://codereview.chromium.org/173133004/diff/130001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:377: should_classify_for_malware_.reset(); On 2014/03/14 22:21:32, noé wrote: > On 2014/02/22 02:49:03, mattm wrote: > > Seems like a plain bool initialized to false here would achieve the same > effect. > > MaybeStartMalwareFeatureExtraction() needs to know whether or not > OnMalwarePreClassificationDone() was called already. It does so by checking > whether that flag is NULL or not before checking its actual value. But the only thing it actually checks is "should_classify_for_malware_.get() && *should_classify_for_malware_" The possible states are 1. false && n/a = false (OnMalwarePreClassificationDone not called yet) 2. true && false = false (OnMalwarePreClassificationDone called and shouldn't classify) 3. true && true = true (OnMalwarePreClassificationDone called and should classify) These states could all be represented by a single bool. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/browser_feature_extractor.cc (left): https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/browser_feature_extractor.cc:265: DCHECK_EQ(0U, request->url().find("http:")); why is this removed? https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host.cc (left): https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:275: channel == chrome::VersionInfo::CHANNEL_CANARY); CL description should mention that it is removing the channel restriction. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host.cc (right): https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:22: #include "chrome/common/chrome_switches.h" unused? https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:23: #include "chrome/common/chrome_version_info.h" unused? https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:122: // the csd-whitelist we won't phishing start classification. The word order? https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:168: bool MaybeClassifyForPhishing() const { Calling these MaybeFoo is a bit confusing, usually that means the function method might do something itself. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:184: VLOG(2) << "Failed phishing pre-classification checks. Reason: " prefer DVLOG instead of VLOG, unless there is a reason for VLOG https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:204: void CheckCsdWhitelist(const GURL& url) { update name https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:520: scoped_ptr<ClientMalwareRequest> malware_verdict( malware_request? https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host_unittest.cc (right): https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:275: if (match_csd_whitelist) { hm, does it make sense to make these dependent on the same value? Would you want to test matching the csd whitelist but not the malware kill switch, or vice versa? https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:279: .WillRepeatedly(Return(*match_csd_whitelist)); reason this is Repeatedly instead of Once? https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:752: DocumentOnLoadCompletedInMainFrameMalwareIP) { what is this case testing / how is it different than (the first part of) DocumentOnLoadCompletedInMainFrameShowMalwareInterstitial? https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_service_unittest.cc (right): https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_service_unittest.cc:484: EXPECT_EQ(5U, report_times.size()); changed from 4 to 5 because the limit is not enforced in CSDS now? Should test the limit in client_side_detection_host_unittest.cc.
Thanks Matt for the thorough review! Much appreciated. PTAL. n. https://codereview.chromium.org/173133004/diff/130001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host.cc (right): https://codereview.chromium.org/173133004/diff/130001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:377: should_classify_for_malware_.reset(); On 2014/03/18 02:19:06, mattm wrote: > On 2014/03/14 22:21:32, noé wrote: > > On 2014/02/22 02:49:03, mattm wrote: > > > Seems like a plain bool initialized to false here would achieve the same > > effect. > > > > MaybeStartMalwareFeatureExtraction() needs to know whether or not > > OnMalwarePreClassificationDone() was called already. It does so by checking > > whether that flag is NULL or not before checking its actual value. > > But the only thing it actually checks is "should_classify_for_malware_.get() && > *should_classify_for_malware_" > > The possible states are > 1. false && n/a = false (OnMalwarePreClassificationDone not called yet) > 2. true && false = false (OnMalwarePreClassificationDone called and shouldn't > classify) > 3. true && true = true (OnMalwarePreClassificationDone called and should > classify) > > These states could all be represented by a single bool. You are absolutely right. So sorry about the unnecessary back and forth. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/browser_feature_extractor.cc (left): https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/browser_feature_extractor.cc:265: DCHECK_EQ(0U, request->url().find("http:")); On 2014/03/18 02:19:06, mattm wrote: > why is this removed? For malware we also look at HTTPS websites. This is similar to what we do for malicious download protection. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host.cc (left): https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:275: channel == chrome::VersionInfo::CHANNEL_CANARY); On 2014/03/18 02:19:06, mattm wrote: > CL description should mention that it is removing the channel restriction. Done. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host.cc (right): https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:22: #include "chrome/common/chrome_switches.h" On 2014/03/18 02:19:06, mattm wrote: > unused? Done. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:23: #include "chrome/common/chrome_version_info.h" On 2014/03/18 02:19:06, mattm wrote: > unused? Done. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:122: // the csd-whitelist we won't phishing start classification. The On 2014/03/18 02:19:06, mattm wrote: > word order? Done. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:168: bool MaybeClassifyForPhishing() const { On 2014/03/18 02:19:06, mattm wrote: > Calling these MaybeFoo is a bit confusing, usually that means the function > method might do something itself. Done. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:184: VLOG(2) << "Failed phishing pre-classification checks. Reason: " On 2014/03/18 02:19:06, mattm wrote: > prefer DVLOG instead of VLOG, unless there is a reason for VLOG Done. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:204: void CheckCsdWhitelist(const GURL& url) { On 2014/03/18 02:19:06, mattm wrote: > update name Done. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:520: scoped_ptr<ClientMalwareRequest> malware_verdict( On 2014/03/18 02:19:06, mattm wrote: > malware_request? Done. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host_unittest.cc (right): https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:275: if (match_csd_whitelist) { On 2014/03/18 02:19:06, mattm wrote: > hm, does it make sense to make these dependent on the same value? Would you want > to test matching the csd whitelist but not the malware kill switch, or vice > versa? Separated them. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:279: .WillRepeatedly(Return(*match_csd_whitelist)); On 2014/03/18 02:19:06, mattm wrote: > reason this is Repeatedly instead of Once? That's because some tests call the Expect method multiple times. Because that mocked function doesn't have a URL to match expectations against I don't see another way than making it repeatedly return the same result. Note: this is the same reason we have IsOffTheRecord as repeated above. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:752: DocumentOnLoadCompletedInMainFrameMalwareIP) { On 2014/03/18 02:19:06, mattm wrote: > what is this case testing / how is it different than (the first part of) > DocumentOnLoadCompletedInMainFrameShowMalwareInterstitial? Good point. The other test is more complete. Removing that test. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_service_unittest.cc (right): https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_service_unittest.cc:484: EXPECT_EQ(5U, report_times.size()); On 2014/03/18 02:19:06, mattm wrote: > changed from 4 to 5 because the limit is not enforced in CSDS now? Should test > the limit in client_side_detection_host_unittest.cc. The actual limit is checked below. In the host unit-test we do check that OverMalwareReportLimit is called on the mock object.
https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/browser_feature_extractor.cc (left): https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/browser_feature_extractor.cc:265: DCHECK_EQ(0U, request->url().find("http:")); On 2014/03/20 17:01:45, noé wrote: > On 2014/03/18 02:19:06, mattm wrote: > > why is this removed? > > For malware we also look at HTTPS websites. This is similar to what we do for > malicious download protection. Ah, I had forgotten the HTTP check done in PhishingClassifier. (Seems a little weird that's not part of the pre-classification checks here. ah well.) https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_service_unittest.cc (right): https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_service_unittest.cc:484: EXPECT_EQ(5U, report_times.size()); On 2014/03/20 17:01:45, noé wrote: > On 2014/03/18 02:19:06, mattm wrote: > > changed from 4 to 5 because the limit is not enforced in CSDS now? Should test > > the limit in client_side_detection_host_unittest.cc. > > The actual limit is checked below. In the host unit-test we do check that > OverMalwareReportLimit is called on the mock object. I only see OverPhishingReportLimit tested in the host unittest. https://codereview.chromium.org/173133004/diff/260001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host_unittest.cc (right): https://codereview.chromium.org/173133004/diff/260001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:944: // RenderViewHost that won't have the mime type set. Given the comment, is this test still correct? https://codereview.chromium.org/173133004/diff/260001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:1067: &kFalse, &kFalse, &kFalse); The flags seem to be wrong on the new version of this test https://codereview.chromium.org/173133004/diff/260001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:1094: ExpectPreClassificationChecks(url, &kFalse, &kFalse, &kFalse, &kTrue, &kTrue, Is the malware_killswitch here on purpose?
thanks, noe. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/browser_feature_extractor.cc (left): https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/browser_feature_extractor.cc:265: DCHECK_EQ(0U, request->url().find("http:")); On 2014/03/20 22:51:40, mattm wrote: > On 2014/03/20 17:01:45, noé wrote: > > On 2014/03/18 02:19:06, mattm wrote: > > > why is this removed? > > > > For malware we also look at HTTPS websites. This is similar to what we do for > > malicious download protection. > > Ah, I had forgotten the HTTP check done in PhishingClassifier. (Seems a little > weird that's not part of the pre-classification checks here. ah well.) You're right. Added a pre-classification check for that but didn't change the classifier for now. https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_service_unittest.cc (right): https://codereview.chromium.org/173133004/diff/240001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_service_unittest.cc:484: EXPECT_EQ(5U, report_times.size()); On 2014/03/20 22:51:40, mattm wrote: > On 2014/03/20 17:01:45, noé wrote: > > On 2014/03/18 02:19:06, mattm wrote: > > > changed from 4 to 5 because the limit is not enforced in CSDS now? Should > test > > > the limit in client_side_detection_host_unittest.cc. > > > > The actual limit is checked below. In the host unit-test we do check that > > OverMalwareReportLimit is called on the mock object. > > I only see OverPhishingReportLimit tested in the host unittest. You're right again. fixed. https://codereview.chromium.org/173133004/diff/260001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host_unittest.cc (right): https://codereview.chromium.org/173133004/diff/260001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:944: // RenderViewHost that won't have the mime type set. On 2014/03/20 22:51:41, mattm wrote: > Given the comment, is this test still correct? Done. https://codereview.chromium.org/173133004/diff/260001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:1067: &kFalse, &kFalse, &kFalse); On 2014/03/20 22:51:41, mattm wrote: > The flags seem to be wrong on the new version of this test Done. https://codereview.chromium.org/173133004/diff/260001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:1094: ExpectPreClassificationChecks(url, &kFalse, &kFalse, &kFalse, &kTrue, &kTrue, On 2014/03/20 22:51:41, mattm wrote: > Is the malware_killswitch here on purpose? Nop. Copy&Paste error. Good catch!
lgtm with nit https://codereview.chromium.org/173133004/diff/280001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host_unittest.cc (right): https://codereview.chromium.org/173133004/diff/280001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:1146: EXPECT_TRUE(Mock::VerifyAndClear(ui_manager_.get())); WaitAndCheckPreClassificationChecks already does VerifyAndClear on the ui_manager
On 2014/03/21 01:15:22, mattm wrote: > lgtm with nit > > https://codereview.chromium.org/173133004/diff/280001/chrome/browser/safe_bro... > File chrome/browser/safe_browsing/client_side_detection_host_unittest.cc > (right): > > https://codereview.chromium.org/173133004/diff/280001/chrome/browser/safe_bro... > chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:1146: > EXPECT_TRUE(Mock::VerifyAndClear(ui_manager_.get())); > WaitAndCheckPreClassificationChecks already does VerifyAndClear on the > ui_manager oh, and csd_host_ doesn't actually look like it's a mock object.. Looks like some places that were doing VerifyAndClear(csd_service_) were changed to csd_host_ in https://codereview.chromium.org/42553002 ... I'm a bit surprised that doesn't cause an error.
https://codereview.chromium.org/173133004/diff/280001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host_unittest.cc (right): https://codereview.chromium.org/173133004/diff/280001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host_unittest.cc:1146: EXPECT_TRUE(Mock::VerifyAndClear(ui_manager_.get())); On 2014/03/21 01:15:22, mattm wrote: > WaitAndCheckPreClassificationChecks already does VerifyAndClear on the > ui_manager Done.
+jar for histograms approval. thanks, n.
Merged with histogram changes. PTAL.
+asvitkine to review histogram changes. Thanks, n.
https://codereview.chromium.org/173133004/diff/320001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host.cc (right): https://codereview.chromium.org/173133004/diff/320001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:92: UMA_HISTOGRAM_COUNTS("SBClientMalware.ClassificationStart", 1); Is there a plan for this to eventually be something other than 1? It seems wasteful to allocate a 50-bucket counts histogram just to ever log a value to one of those buckets. https://codereview.chromium.org/173133004/diff/320001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:516: UMA_HISTOGRAM_COUNTS("SBClientMalware.UnexpectedPageId", 1); Are you missing a histograms.xml entry for this one?
Thanks for the feedback. Excuse my ignorance about stats collection best practices in Chrome. Thanks, n. https://codereview.chromium.org/173133004/diff/320001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host.cc (right): https://codereview.chromium.org/173133004/diff/320001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:92: UMA_HISTOGRAM_COUNTS("SBClientMalware.ClassificationStart", 1); On 2014/03/25 13:51:39, Alexei Svitkine wrote: > Is there a plan for this to eventually be something other than 1? > > It seems wasteful to allocate a 50-bucket counts histogram just to ever log a > value to one of those buckets. Mostly copy & paste from the other histogram above. We don't expect to use the other buckets. What are you suggesting I should use instead? https://codereview.chromium.org/173133004/diff/320001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:516: UMA_HISTOGRAM_COUNTS("SBClientMalware.UnexpectedPageId", 1); On 2014/03/25 13:51:39, Alexei Svitkine wrote: > Are you missing a histograms.xml entry for this one? Done.
https://codereview.chromium.org/173133004/diff/320001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host.cc (right): https://codereview.chromium.org/173133004/diff/320001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:92: UMA_HISTOGRAM_COUNTS("SBClientMalware.ClassificationStart", 1); On 2014/03/25 15:54:25, noé wrote: > On 2014/03/25 13:51:39, Alexei Svitkine wrote: > > Is there a plan for this to eventually be something other than 1? > > > > It seems wasteful to allocate a 50-bucket counts histogram just to ever log a > > value to one of those buckets. > > Mostly copy & paste from the other histogram above. We don't expect to use the > other buckets. What are you suggesting I should use instead? I suggest UMA_HISTOGRAM_BOOLEAN(). If you look at the bucket distribution for UMA_HISTOGRAM_COUNTS(), the bucket for value 1 will have range [1-2). (You can see this by looking at the data for another UMA_HISTOGRAM_COUNTS() that logs a broad range of values, e.g.: https://uma.googleplex.com/histograms/?dayCount=1&endDate=03-25-2014&version=... ) Since the bucket only has size 1, you can change it to UMA_HISTOGRAM_BOOLEAN which has two buckets - one for 0 and another for 1. You can change both of these to it, since it should be backwards compatible in terms of reported buckets to the old one if only 1 has ever been logged.
https://codereview.chromium.org/173133004/diff/320001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/client_side_detection_host.cc (right): https://codereview.chromium.org/173133004/diff/320001/chrome/browser/safe_bro... chrome/browser/safe_browsing/client_side_detection_host.cc:92: UMA_HISTOGRAM_COUNTS("SBClientMalware.ClassificationStart", 1); On 2014/03/25 16:43:03, Alexei Svitkine wrote: > On 2014/03/25 15:54:25, noé wrote: > > On 2014/03/25 13:51:39, Alexei Svitkine wrote: > > > Is there a plan for this to eventually be something other than 1? > > > > > > It seems wasteful to allocate a 50-bucket counts histogram just to ever log > a > > > value to one of those buckets. > > > > Mostly copy & paste from the other histogram above. We don't expect to use > the > > other buckets. What are you suggesting I should use instead? > > I suggest UMA_HISTOGRAM_BOOLEAN(). > > If you look at the bucket distribution for UMA_HISTOGRAM_COUNTS(), the bucket > for value 1 will have range [1-2). > > (You can see this by looking at the data for another UMA_HISTOGRAM_COUNTS() that > logs a broad range of values, e.g.: > https://uma.googleplex.com/histograms/?dayCount=1&endDate=03-25-2014&version=... > ) > > Since the bucket only has size 1, you can change it to UMA_HISTOGRAM_BOOLEAN > which has two buckets - one for 0 and another for 1. > > You can change both of these to it, since it should be backwards compatible in > terms of reported buckets to the old one if only 1 has ever been logged. Thanks for the tip. I changed a handful of histograms that are in that file.
LGTM with a final suggestion. Thanks! https://codereview.chromium.org/173133004/diff/410001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/173133004/diff/410001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:21302: +<histogram name="SBClientMalware.UnexpectedPageId"> Probably this (and the other boolean ones) should have enum="BooleanHit" specified.
https://codereview.chromium.org/173133004/diff/410001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/173133004/diff/410001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:21302: +<histogram name="SBClientMalware.UnexpectedPageId"> On 2014/03/26 19:43:02, Alexei Svitkine wrote: > Probably this (and the other boolean ones) should have enum="BooleanHit" > specified. Done.
The CQ bit was checked by noelutz@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/noelutz@chromium.org/173133004/430001
The CQ bit was unchecked by commit-bot@chromium.org
Retried try job too often on win_rel for step(s) unit_tests http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win_rel&nu...
The CQ bit was checked by noelutz@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/noelutz@chromium.org/173133004/430001
Message was sent while issue was closed.
Change committed as 260171 |