Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(307)

Side by Side Diff: chrome/renderer/safe_browsing/phishing_classifier.h

Issue 2667343006: Componentize safe_browsing [X+1] : move the renderer part to component.
Patch Set: Created 3 years, 10 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View unified diff | Download patch
OLDNEW
(Empty)
1 // Copyright (c) 2012 The Chromium Authors. All rights reserved.
2 // Use of this source code is governed by a BSD-style license that can be
3 // found in the LICENSE file.
4 //
5 // This class handles the process of extracting all of the features from a
6 // page and computing a phishyness score. The basic steps are:
7 // - Run each feature extractor over the page, building up a FeatureMap of
8 // feature -> value.
9 // - SHA-256 hash all of the feature names in the map so that they match the
10 // supplied model.
11 // - Hand the hashed map off to a Scorer, which computes the probability that
12 // the page is phishy.
13 // - If the page is phishy, run the supplied callback.
14 //
15 // For more details, see phishing_*_feature_extractor.h, scorer.h, and
16 // client_model.proto.
17
18 #ifndef CHROME_RENDERER_SAFE_BROWSING_PHISHING_CLASSIFIER_H_
19 #define CHROME_RENDERER_SAFE_BROWSING_PHISHING_CLASSIFIER_H_
20
21 #include <stdint.h>
22
23 #include <memory>
24 #include <set>
25
26 #include "base/callback.h"
27 #include "base/macros.h"
28 #include "base/memory/weak_ptr.h"
29 #include "base/strings/string16.h"
30
31 namespace content {
32 class RenderFrame;
33 }
34
35 namespace safe_browsing {
36 class ClientPhishingRequest;
37 class FeatureExtractorClock;
38 class FeatureMap;
39 class PhishingDOMFeatureExtractor;
40 class PhishingTermFeatureExtractor;
41 class PhishingUrlFeatureExtractor;
42 class Scorer;
43
44 class PhishingClassifier {
45 public:
46 // Callback to be run when phishing classification finishes. The verdict
47 // is a ClientPhishingRequest which contains the verdict computed by the
48 // classifier as well as the extracted features. If the verdict.is_phishing()
49 // is true, the page is considered phishy by the client-side model,
50 // and the browser should ping back to get a final verdict. The
51 // verdict.client_score() is set to kInvalidScore if classification failed.
52 typedef base::Callback<void(const ClientPhishingRequest& /* verdict */)>
53 DoneCallback;
54
55 static const float kInvalidScore;
56
57 // Creates a new PhishingClassifier object that will operate on
58 // |render_view|. |clock| is used to time feature extractor operations, and
59 // the PhishingClassifier takes ownership of this object. Note that the
60 // classifier will not be 'ready' until set_phishing_scorer() is called.
61 PhishingClassifier(content::RenderFrame* render_frame,
62 FeatureExtractorClock* clock);
63 virtual ~PhishingClassifier();
64
65 // Sets a scorer for the classifier to use in computing the phishiness score.
66 // This must live at least as long as the PhishingClassifier. The caller is
67 // expected to cancel any pending classification before setting a phishing
68 // scorer.
69 void set_phishing_scorer(const Scorer* scorer);
70
71 // Returns true if the classifier is ready to classify pages, i.e. it
72 // has had a scorer set via set_phishing_scorer().
73 bool is_ready() const;
74
75 // Called by the RenderView when a page has finished loading. This begins
76 // the feature extraction and scoring process. |page_text| should contain
77 // the plain text of a web page, including any subframes, as returned by
78 // RenderView::CaptureText(). |page_text| is owned by the caller, and must
79 // not be destroyed until either |done_callback| is run or
80 // CancelPendingClassification() is called.
81 //
82 // To avoid blocking the render thread for too long, phishing classification
83 // may run in several chunks of work, posting a task to the current
84 // MessageLoop to continue processing. Once the scoring process is complete,
85 // |done_callback| is run on the current thread. PhishingClassifier takes
86 // ownership of the callback.
87 //
88 // It is an error to call BeginClassification if the classifier is not yet
89 // ready.
90 virtual void BeginClassification(const base::string16* page_text,
91 const DoneCallback& callback);
92
93 // Called by the RenderView (on the render thread) when a page is unloading
94 // or the RenderView is being destroyed. This cancels any extraction that
95 // is in progress. It is an error to call CancelPendingClassification if
96 // the classifier is not yet ready.
97 virtual void CancelPendingClassification();
98
99 private:
100 // Any score equal to or above this value is considered phishy.
101 static const float kPhishyThreshold;
102
103 // Begins the feature extraction process, by extracting URL features and
104 // beginning DOM feature extraction.
105 void BeginFeatureExtraction();
106
107 // Callback to be run when DOM feature extraction is complete.
108 // If it was successful, begins term feature extraction, otherwise
109 // runs the DoneCallback with a non-phishy verdict.
110 void DOMExtractionFinished(bool success);
111
112 // Callback to be run when term feature extraction is complete.
113 // If it was successful, computes a score and runs the DoneCallback.
114 // If extraction was unsuccessful, runs the DoneCallback with a
115 // non-phishy verdict.
116 void TermExtractionFinished(bool success);
117
118 // Helper to verify that there is no pending phishing classification. Dies
119 // in debug builds if the state is not as expected. This is a no-op in
120 // release builds.
121 void CheckNoPendingClassification();
122
123 // Helper method to run the DoneCallback and clear the state.
124 void RunCallback(const ClientPhishingRequest& verdict);
125
126 // Helper to run the DoneCallback when feature extraction has failed.
127 // This always signals a non-phishy verdict for the page, with kInvalidScore.
128 void RunFailureCallback();
129
130 // Clears the current state of the PhishingClassifier.
131 void Clear();
132
133 content::RenderFrame* render_frame_; // owns us
134 const Scorer* scorer_; // owned by the caller
135 std::unique_ptr<FeatureExtractorClock> clock_;
136 std::unique_ptr<PhishingUrlFeatureExtractor> url_extractor_;
137 std::unique_ptr<PhishingDOMFeatureExtractor> dom_extractor_;
138 std::unique_ptr<PhishingTermFeatureExtractor> term_extractor_;
139
140 // State for any in-progress extraction.
141 std::unique_ptr<FeatureMap> features_;
142 std::unique_ptr<std::set<uint32_t>> shingle_hashes_;
143 const base::string16* page_text_; // owned by the caller
144 DoneCallback done_callback_;
145
146 // Used in scheduling BeginFeatureExtraction tasks.
147 // These pointers are invalidated if classification is cancelled.
148 base::WeakPtrFactory<PhishingClassifier> weak_factory_;
149
150 DISALLOW_COPY_AND_ASSIGN(PhishingClassifier);
151 };
152
153 } // namespace safe_browsing
154
155 #endif // CHROME_RENDERER_SAFE_BROWSING_PHISHING_CLASSIFIER_H_
OLDNEW
« no previous file with comments | « chrome/renderer/safe_browsing/murmurhash3_util_unittest.cc ('k') | chrome/renderer/safe_browsing/phishing_classifier.cc » ('j') | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698