Chromium Code Reviews

Side by Side Diff: chrome/renderer/safe_browsing/scorer.h

Issue 268673007: Extracting page shingle hashes for similarity detection. (Closed)
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Patch Set: Address 1st round comment (created 6 years, 7 months ago)
OLD | NEW
1 // Copyright (c) 2011 The Chromium Authors. All rights reserved. 1 // Copyright (c) 2011 The Chromium Authors. All rights reserved.
2 // Use of this source code is governed by a BSD-style license that can be 2 // Use of this source code is governed by a BSD-style license that can be
3 // found in the LICENSE file. 3 // found in the LICENSE file.
4 // 4 //
5 // This class loads a client-side model and lets you compute a phishing score 5 // This class loads a client-side model and lets you compute a phishing score
6 // for a set of previously extracted features. The phishing score corresponds 6 // for a set of previously extracted features. The phishing score corresponds
7 // to the probability that the features are indicative of a phishing site. 7 // to the probability that the features are indicative of a phishing site.
8 // 8 //
9 // For more details on how the score is actually computed for a given model 9 // For more details on how the score is actually computed for a given model
10 // and a given set of features read the comments in client_model.proto file. 10 // and a given set of features read the comments in client_model.proto file.
(...skipping 39 matching lines...)
50 // Returns a set of hashed page words that appear in the model in binary 50 // Returns a set of hashed page words that appear in the model in binary
51 // format. 51 // format.
52 const base::hash_set<uint32>& page_words() const; 52 const base::hash_set<uint32>& page_words() const;
53 53
54 // Return the maximum number of words per term for the loaded model. 54 // Return the maximum number of words per term for the loaded model.
55 size_t max_words_per_term() const; 55 size_t max_words_per_term() const;
56 56
57 // Returns the murmurhash3 seed for the loaded model. 57 // Returns the murmurhash3 seed for the loaded model.
58 uint32 murmurhash3_seed() const; 58 uint32 murmurhash3_seed() const;
59 59
60 // Return the maximum number of unique shingle hashes per page.
61 size_t max_shingles_per_page() const;
62
63 // Return the number of words in a shingle.
64 size_t shingle_size() const;
65
60 protected: 66 protected:
61 // Most clients should use the factory method. This constructor is public 67 // Most clients should use the factory method. This constructor is public
62 // to allow for mock implementations. 68 // to allow for mock implementations.
63 Scorer(); 69 Scorer();
64 70
65 private: 71 private:
66 friend class PhishingScorerTest; 72 friend class PhishingScorerTest;
67 73
68 // Computes the score for a given rule and feature map. The score is computed 74 // Computes the score for a given rule and feature map. The score is computed
69 // by multiplying the rule weight with the product of feature weights for the 75 // by multiplying the rule weight with the product of feature weights for the
70 // given rule. The feature weights are stored in the feature map. If a 76 // given rule. The feature weights are stored in the feature map. If a
71 // particular feature does not exist in the feature map we set its weight to 77 // particular feature does not exist in the feature map we set its weight to
72 // zero. 78 // zero.
73 double ComputeRuleScore(const ClientSideModel::Rule& rule, 79 double ComputeRuleScore(const ClientSideModel::Rule& rule,
74 const FeatureMap& features) const; 80 const FeatureMap& features) const;
75 81
76 ClientSideModel model_; 82 ClientSideModel model_;
77 base::hash_set<std::string> page_terms_; 83 base::hash_set<std::string> page_terms_;
78 base::hash_set<uint32> page_words_; 84 base::hash_set<uint32> page_words_;
79 85
80 DISALLOW_COPY_AND_ASSIGN(Scorer); 86 DISALLOW_COPY_AND_ASSIGN(Scorer);
81 }; 87 };
82 } // namepsace safe_browsing 88 } // namespace safe_browsing
83 89
84 #endif // CHROME_RENDERER_SAFE_BROWSING_SCORER_H_ 90 #endif // CHROME_RENDERER_SAFE_BROWSING_SCORER_H_
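
The two accessors added in this patch set, max_shingles_per_page() and shingle_size(), suggest how page shingles are formed: a sliding window of shingle_size words is hashed, and the number of unique hashes kept per page is capped. The following is only a rough sketch of that idea, not the code under review. The function name ShingleHashesSketch is hypothetical, std::hash stands in for the seeded MurmurHash3 that the header mentions, and keeping the smallest hashes when the cap is exceeded is an assumption made here for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of scorer.h. Returns at most |max_shingles|
// unique hashes of consecutive |shingle_size|-word windows in |text|.
std::set<std::size_t> ShingleHashesSketch(const std::string& text,
                                          std::size_t shingle_size,
                                          std::size_t max_shingles) {
  std::set<std::size_t> hashes;
  if (shingle_size == 0 || max_shingles == 0)
    return hashes;

  // Split the text on whitespace.
  std::vector<std::string> words;
  std::istringstream stream(text);
  for (std::string word; stream >> word;)
    words.push_back(word);

  // Slide a window of |shingle_size| words over the page.
  for (std::size_t i = 0; i + shingle_size <= words.size(); ++i) {
    std::string shingle;
    for (std::size_t j = 0; j < shingle_size; ++j) {
      if (j > 0)
        shingle += ' ';
      shingle += words[i + j];
    }
    // Assumption: std::hash replaces the seeded MurmurHash3 of the real model.
    hashes.insert(std::hash<std::string>{}(shingle));
    // Assumption: when over the cap, drop the largest hash seen so far.
    if (hashes.size() > max_shingles)
      hashes.erase(std::prev(hashes.end()));
  }
  return hashes;
}
```

Calling ShingleHashesSketch("a b c d", 2, 10) hashes the shingles "a b", "b c" and "c d". A page with fewer words than shingle_size yields an empty set.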
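
For readers unfamiliar with the scoring described in the ComputeRuleScore comment, the rule score is the rule weight multiplied by the weight of every feature the rule references, with a missing feature counting as zero. A minimal sketch under stated assumptions follows. RuleSketch and FeatureWeights are hypothetical stand-ins, since the real ClientSideModel::Rule and FeatureMap types are not shown in this diff.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for FeatureMap: feature name -> feature weight.
using FeatureWeights = std::unordered_map<std::string, double>;

// Hypothetical stand-in for ClientSideModel::Rule.
struct RuleSketch {
  std::vector<std::string> features;  // Names of the features the rule uses.
  double weight;                      // Weight of the rule itself.
};

// Multiplies the rule weight by the weight of each feature in the rule.
// A feature absent from the map has weight zero, so the score becomes zero.
double ComputeRuleScoreSketch(const RuleSketch& rule,
                              const FeatureWeights& features) {
  double score = rule.weight;
  for (const std::string& name : rule.features) {
    auto it = features.find(name);
    if (it == features.end())
      return 0.0;
    score *= it->second;
  }
  return score;
}
```

Returning early on a missing feature is equivalent to multiplying by zero and avoids further lookups.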
