|
|
Created:
9 years, 3 months ago by Brian Ryner Modified:
9 years, 3 months ago CC:
chromium-reviews, Paweł Hajdan Jr., darin-cc_chromium.org, brettw-cc_chromium.org, Garrett Casto Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionChange PhishingDOMFeatureExtractor to cache the WebDocument rather than a WebFrame pointer.
WebFrames are not guaranteed to remain valid across calls to ExtractFeaturesWithTmeout.
BUG=97148
TEST=PhishingDOMFeatureExtractorTest.SubframeRemoval
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=102541
Patch Set 1 #
Total comments: 14
Patch Set 2 : address mattm's and darin's review comments #
Total comments: 2
Patch Set 3 : Address Noe's review comment #
Messages
Total messages: 16 (0 generated)
http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor.cc (right): http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... chrome/renderer/safe_browsing/phishing_dom_feature_extractor.cc:400: if (frame) { Does this mean that if a frame is removed while processing it, any further frames won't be processed? http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor.h (right): http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... chrome/renderer/safe_browsing/phishing_dom_feature_extractor.h:135: scoped_ptr<WebKit::WebDocument> cur_document_; WebDocument is already basically refcounted ptr, so we don't really need to wrap it in a scoped_ptr too, do we? http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor_browsertest.cc (right): http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... chrome/renderer/safe_browsing/phishing_dom_feature_extractor_browsertest.cc:363: // iframe document. Should we also have a test for when the iframe is removed before the extractor gets to it?
http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor.cc (right): http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... chrome/renderer/safe_browsing/phishing_dom_feature_extractor.cc:400: if (frame) { On 2011/09/22 02:31:38, mattm wrote: > Does this mean that if a frame is removed while processing it, any further > frames won't be processed? Yes, that's correct -- not ideal, but I don't know how big of a problem it's likely to be. It might be possible to avoid this by traversing all of the frames up-front to create an array of WebDocuments... thoughts? http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor.h (right): http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... chrome/renderer/safe_browsing/phishing_dom_feature_extractor.h:135: scoped_ptr<WebKit::WebDocument> cur_document_; On 2011/09/22 02:31:38, mattm wrote: > WebDocument is already basically refcounted ptr, so we don't really need to wrap > it in a scoped_ptr too, do we? The only reason I'm doing that is so that we can use NULL to represent the "extraction hasn't started yet" state. If you think this is too confusing, I could make that a separate boolean, or use page_feature_tate_->num_iterations to figure this out. http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor_browsertest.cc (right): http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... chrome/renderer/safe_browsing/phishing_dom_feature_extractor_browsertest.cc:363: // iframe document. On 2011/09/22 02:31:38, mattm wrote: > Should we also have a test for when the iframe is removed before the extractor > gets to it? That code path has never really had a problem, but I could add a test for this if you'd prefer. It's certainly something we'd need to exercise if we changed to building up an array of WebDocuments as I mentioned above.
http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor.cc (right): http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... chrome/renderer/safe_browsing/phishing_dom_feature_extractor.cc:400: if (frame) { On 2011/09/22 02:42:45, Brian Ryner wrote: > On 2011/09/22 02:31:38, mattm wrote: > > Does this mean that if a frame is removed while processing it, any further > > frames won't be processed? > > Yes, that's correct -- not ideal, but I don't know how big of a problem it's > likely to be. It might be possible to avoid this by traversing all of the > frames up-front to create an array of WebDocuments... thoughts? hmm, that's a good thought. Maybe add a histogram for how many times it happens so we can see if it's worth doing? http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor.h (right): http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... chrome/renderer/safe_browsing/phishing_dom_feature_extractor.h:135: scoped_ptr<WebKit::WebDocument> cur_document_; On 2011/09/22 02:42:45, Brian Ryner wrote: > On 2011/09/22 02:31:38, mattm wrote: > > WebDocument is already basically refcounted ptr, so we don't really need to > wrap > > it in a scoped_ptr too, do we? > > The only reason I'm doing that is so that we can use NULL to represent the > "extraction hasn't started yet" state. If you think this is too confusing, I > could make that a separate boolean, or use page_feature_tate_->num_iterations to > figure this out. mm, maybe could use one of the other pointers? features_ seems to have the same lifecycle, and page_feature_state_ itself could if it was reset in Clear() ? http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor_browsertest.cc (right): http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... chrome/renderer/safe_browsing/phishing_dom_feature_extractor_browsertest.cc:363: // iframe document. On 2011/09/22 02:42:45, Brian Ryner wrote: > On 2011/09/22 02:31:38, mattm wrote: > > Should we also have a test for when the iframe is removed before the extractor > > gets to it? > > That code path has never really had a problem, but I could add a test for this > if you'd prefer. It's certainly something we'd need to exercise if we changed > to building up an array of WebDocuments as I mentioned above. Okay, lets leave that for if we change it later.
http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor.cc (right): http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... chrome/renderer/safe_browsing/phishing_dom_feature_extractor.cc:400: if (frame) { On 2011/09/22 03:08:23, mattm wrote: > On 2011/09/22 02:42:45, Brian Ryner wrote: > > On 2011/09/22 02:31:38, mattm wrote: > > > Does this mean that if a frame is removed while processing it, any further > > > frames won't be processed? > > > > Yes, that's correct -- not ideal, but I don't know how big of a problem it's > > likely to be. It might be possible to avoid this by traversing all of the > > frames up-front to create an array of WebDocuments... thoughts? > > hmm, that's a good thought. Maybe add a histogram for how many times it happens > so we can see if it's worth doing? Good idea, done. http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor.h (right): http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... chrome/renderer/safe_browsing/phishing_dom_feature_extractor.h:135: scoped_ptr<WebKit::WebDocument> cur_document_; On 2011/09/22 03:08:23, mattm wrote: > On 2011/09/22 02:42:45, Brian Ryner wrote: > > On 2011/09/22 02:31:38, mattm wrote: > > > WebDocument is already basically refcounted ptr, so we don't really need to > > wrap > > > it in a scoped_ptr too, do we? > > > > The only reason I'm doing that is so that we can use NULL to represent the > > "extraction hasn't started yet" state. If you think this is too confusing, I > > could make that a separate boolean, or use page_feature_tate_->num_iterations > to > > figure this out. > > mm, maybe could use one of the other pointers? features_ seems to have the same > lifecycle, and page_feature_state_ itself could if it was reset in Clear() ? These are both currently created in ExtractFeatures(), not ExtractFeaturesWithTimeout(). Maybe we should just initialize cur_document_ there as well -- I don't remember a particular reason to do that in ExtractFeaturesWithTimeout. Do either of you see a problem with that change?
lgtm
http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor_browsertest.cc (right): http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... chrome/renderer/safe_browsing/phishing_dom_feature_extractor_browsertest.cc:39: method_factory_.NewRunnableMethod( nit: use base::Bind instead. http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... chrome/renderer/safe_browsing/phishing_dom_feature_extractor_browsertest.cc:46: ALLOW_THIS_IN_INITIALIZER_LIST(method_factory_(this)) {} nit: instead of ScopedRunnableMethodFactory, use WeakPtrFactory (or SupportsWeakPtr)
http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor.h (right): http://codereview.chromium.org/7976039/diff/1/chrome/renderer/safe_browsing/p... chrome/renderer/safe_browsing/phishing_dom_feature_extractor.h:135: scoped_ptr<WebKit::WebDocument> cur_document_; On 2011/09/22 05:09:56, Brian Ryner wrote: > On 2011/09/22 03:08:23, mattm wrote: > > On 2011/09/22 02:42:45, Brian Ryner wrote: > > > On 2011/09/22 02:31:38, mattm wrote: > > > > WebDocument is already basically refcounted ptr, so we don't really need > to > > > wrap > > > > it in a scoped_ptr too, do we? > > > > > > The only reason I'm doing that is so that we can use NULL to represent the > > > "extraction hasn't started yet" state. If you think this is too confusing, > I > > > could make that a separate boolean, or use > page_feature_tate_->num_iterations > > to > > > figure this out. > > > > mm, maybe could use one of the other pointers? features_ seems to have the > same > > lifecycle, and page_feature_state_ itself could if it was reset in Clear() ? > > These are both currently created in ExtractFeatures(), not > ExtractFeaturesWithTimeout(). Maybe we should just initialize cur_document_ > there as well -- I don't remember a particular reason to do that in > ExtractFeaturesWithTimeout. Do either of you see a problem with that change? sgtm
Please take another look.
http://codereview.chromium.org/7976039/diff/8001/chrome/renderer/safe_browsin... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor.cc (right): http://codereview.chromium.org/7976039/diff/8001/chrome/renderer/safe_browsin... chrome/renderer/safe_browsing/phishing_dom_feature_extractor.cc:138: cur_document_ = web_view->mainFrame()->document(); Alternatively, you could also only set cur_document_ if there is a view and a frame and check whether cur_document_ is null in the extraction with timeout method below. That would avoid the additional PostTask. Might be a bit easier?
Please take another look. http://codereview.chromium.org/7976039/diff/8001/chrome/renderer/safe_browsin... File chrome/renderer/safe_browsing/phishing_dom_feature_extractor.cc (right): http://codereview.chromium.org/7976039/diff/8001/chrome/renderer/safe_browsin... chrome/renderer/safe_browsing/phishing_dom_feature_extractor.cc:138: cur_document_ = web_view->mainFrame()->document(); On 2011/09/23 01:13:15, noelutz wrote: > Alternatively, you could also only set cur_document_ if there is a view and a > frame and check whether cur_document_ is null in the extraction with timeout > method below. That would avoid the additional PostTask. Might be a bit easier? Done.
lgtm Splendid. thanks, noé.
OK,LGTM
lgtm
CQ is trying tha patch. Follow status at https://chromium-status.appspot.com/cq/bryner@chromium.org/7976039/11002
Change committed as 102541 |