Description
Add a term feature extractor for client-side phishing detection.
This class creates features for n-grams in the page text that appear in the
phishing classification model. It will eventually operate on the plain text
that is extracted by RenderView::CaptureText().
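A minimal sketch of the n-gram matching step. It assumes the model terms are available as a plain-text set (ignoring the hashing described below) and that words are split on non-alphanumeric ASCII characters. `SplitWords`, `ExtractTermFeatures`, and `max_words` are hypothetical names for illustration, not the API of the extractor in this CL.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: splits text into lowercase words, treating any
// non-alphanumeric ASCII character as a separator (real word breaking
// would need to be Unicode-aware).
std::vector<std::string> SplitWords(const std::string& text) {
  std::vector<std::string> words;
  std::string current;
  for (unsigned char ch : text) {
    if (std::isalnum(ch)) {
      current.push_back(static_cast<char>(std::tolower(ch)));
    } else if (!current.empty()) {
      words.push_back(current);
      current.clear();
    }
  }
  if (!current.empty()) words.push_back(current);
  return words;
}

// Hypothetical extractor: returns every n-gram of 1..max_words consecutive
// words that appears in model_terms. Each match becomes one feature.
std::unordered_set<std::string> ExtractTermFeatures(
    const std::string& text,
    const std::unordered_set<std::string>& model_terms,
    std::size_t max_words) {
  std::unordered_set<std::string> features;
  const std::vector<std::string> words = SplitWords(text);
  for (std::size_t start = 0; start < words.size(); ++start) {
    std::string ngram;
    // Grow the n-gram one word at a time, checking each length.
    for (std::size_t n = 0; n < max_words && start + n < words.size(); ++n) {
      if (n > 0) ngram.push_back(' ');
      ngram += words[start + n];
      if (model_terms.count(ngram)) features.insert(ngram);
    }
  }
  return features;
}
```

Capping the n-gram length at `max_words` bounds the work at roughly `words.size() * max_words` set lookups per document.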
To make it harder for phishers to enumerate the terms in the classification
model, the terms will be supplied as SHA-256 hashes rather than plain text. The
term feature extractor hashes the words in the document in order to check
whether they match the model. Since this is potentially expensive, the term
feature extractor limits how long it will run on each iteration, similar to
the PhishingDOMFeatureExtractor.
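A rough sketch of both ideas together: words are hashed with SHA-256 and compared against hashed model terms, and each call stops once a time budget is spent, keeping its position for the next call. `Sha256` is a compact FIPS 180-4 implementation included only so the example is self-contained; `HashedTermMatcher`, `Run`, and the `budget` parameter are hypothetical. For brevity it hashes single words only (no n-grams) and checks the clock after every word.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Compact SHA-256 (FIPS 180-4); returns the 32-byte binary digest.
std::string Sha256(const std::string& input) {
  static const std::uint32_t k[64] = {
      0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
      0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
      0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
      0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
      0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
      0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
      0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
      0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};
  std::uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                        0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
  auto rotr = [](std::uint32_t x, int n) { return (x >> n) | (x << (32 - n)); };
  // Pad: 0x80, zeros up to 56 mod 64, then the 64-bit big-endian bit length.
  std::string msg = input;
  const std::uint64_t bit_len = static_cast<std::uint64_t>(input.size()) * 8;
  msg.push_back(static_cast<char>(0x80));
  while (msg.size() % 64 != 56) msg.push_back('\0');
  for (int i = 7; i >= 0; --i) msg.push_back(static_cast<char>((bit_len >> (8 * i)) & 0xff));
  for (std::size_t off = 0; off < msg.size(); off += 64) {  // one 512-bit block
    std::uint32_t w[64];
    for (int i = 0; i < 16; ++i) {
      w[i] = 0;
      for (int j = 0; j < 4; ++j)
        w[i] = (w[i] << 8) | static_cast<unsigned char>(msg[off + 4 * i + j]);
    }
    for (int i = 16; i < 64; ++i) {
      std::uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
      std::uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
      w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    std::uint32_t v[8];  // working variables a..h
    for (int i = 0; i < 8; ++i) v[i] = h[i];
    for (int i = 0; i < 64; ++i) {
      std::uint32_t S1 = rotr(v[4], 6) ^ rotr(v[4], 11) ^ rotr(v[4], 25);
      std::uint32_t ch = (v[4] & v[5]) ^ (~v[4] & v[6]);
      std::uint32_t t1 = v[7] + S1 + ch + k[i] + w[i];
      std::uint32_t S0 = rotr(v[0], 2) ^ rotr(v[0], 13) ^ rotr(v[0], 22);
      std::uint32_t maj = (v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]);
      for (int j = 7; j > 0; --j) v[j] = v[j - 1];  // h=g, g=f, ..., b=a
      v[4] += t1;             // e = d + t1
      v[0] = t1 + S0 + maj;   // a = t1 + t2
    }
    for (int i = 0; i < 8; ++i) h[i] += v[i];
  }
  std::string digest;
  for (std::uint32_t word : h)
    for (int shift = 24; shift >= 0; shift -= 8)
      digest.push_back(static_cast<char>((word >> shift) & 0xff));
  return digest;
}

// Lowercase hex encoding of a binary string, for readable digests.
std::string ToHex(const std::string& bytes) {
  static const char kDigits[] = "0123456789abcdef";
  std::string hex;
  for (unsigned char b : bytes) { hex.push_back(kDigits[b >> 4]); hex.push_back(kDigits[b & 0xf]); }
  return hex;
}

// Hypothetical resumable matcher: hashes one word at a time and returns once
// the per-call time budget is used up, remembering where it stopped.
class HashedTermMatcher {
 public:
  HashedTermMatcher(std::vector<std::string> words, std::unordered_set<std::string> model_hashes)
      : words_(std::move(words)), model_hashes_(std::move(model_hashes)) {}

  // Returns true once every word has been checked; false means "call again".
  bool Run(std::chrono::microseconds budget, std::unordered_set<std::string>* matched_hashes) {
    const auto start = std::chrono::steady_clock::now();
    while (next_ < words_.size()) {
      std::string digest = Sha256(words_[next_++]);
      if (model_hashes_.count(digest)) matched_hashes->insert(digest);
      if (std::chrono::steady_clock::now() - start >= budget) return next_ >= words_.size();
    }
    return true;
  }

 private:
  std::vector<std::string> words_;
  std::unordered_set<std::string> model_hashes_;
  std::size_t next_ = 0;
};
```

Because the matcher keeps `next_` between calls, a caller can resume it later (for example from a posted task; how the real extractor yields is not shown here). Checking the clock less often than once per word would reduce the timing overhead.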
TEST=PhishingTermFeatureExtractorTest
BUG=none
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=58537
Patch Set 1
Total comments: 29
Patch Set 2: address noe's comments
Total comments: 10
Patch Set 3: address lei's comments
Patch Set 4: Add an extra comment/TODO about performance.
Messages
Total messages: 8 (0 generated)