|
|
Created:
6 years, 5 months ago by Avi (use Gerrit) Modified:
6 years, 5 months ago CC:
chromium-reviews, Takashi Toyoshima Base URL:
svn://svn.chromium.org/chrome/trunk/src Project:
chromium Visibility:
Public. |
DescriptionRemove page id from "page capture".
ChromeRenderViewObserver::CapturePageInfo is a relic. It hasn't captured page content for indexing for about a year now (r212459), and the fancy footwork to avoid re-indexing isn't needed any more.
BUG=371151
TEST=everything still works
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=282479
Patch Set 1 #
Total comments: 10
Patch Set 2 : fixier #Patch Set 3 : fixes #
Total comments: 2
Patch Set 4 : fix #Patch Set 5 : test #Patch Set 6 : test4sure #
Total comments: 5
Patch Set 7 : comments #Patch Set 8 : revert to patch set 1 #
Messages
Total messages: 19 (0 generated)
John, WDYT? This should be safe.
John is out this week; finding someone in.
drive-by comments. https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... File chrome/renderer/chrome_render_view_observer.cc (left): https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... chrome/renderer/chrome_render_view_observer.cc:390: if (render_view()->GetPageId() != page_id) This looks like a guard against the RenderView loading a new page between the post task in CapturePageInfoLater() and the invocation. I would expect us to need to replace the page_id based cancellation with a WeakPtrFactory or something other similar flag. Might even be able to just remove the timer and use PostDelayedTask directly. https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... chrome/renderer/chrome_render_view_observer.cc:452: GURL stripped_url(StripRef(url)); In this observer, the page_id seems to be used as a coarse grain indication of "we've been here before." In particular, this block of code seems to be avoiding reclassification of pages with the same top-level URL. I think the classifier itself does some squashing here, but not 100% sure if we've done the same job of de-duping.
thestig: for review ajwong: fyi https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... File chrome/renderer/chrome_render_view_observer.cc (left): https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... chrome/renderer/chrome_render_view_observer.cc:390: if (render_view()->GetPageId() != page_id) I'm not convinced this check for page id is even necessary. When a navigation happens, DidCommitProvisionalLoad will get called, and it will call CapturePageInfoLater, which will reset the timer. How is is possible that a timed capture request with the wrong page id would even happen? https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... chrome/renderer/chrome_render_view_observer.cc:452: GURL stripped_url(StripRef(url)); On 2014/07/07 18:36:59, awong wrote: > In this observer, the page_id seems to be used as a coarse grain indication of > "we've been here before." > > In particular, this block of code seems to be avoiding reclassification of pages > with the same top-level URL. I think the classifier itself does some squashing > here, but not 100% sure if we've done the same job of de-duping. The classifier does check for PAGE_TRANSITION_FORWARD_BACK. It's roughly equivalent. What's irritating is that that check for PAGE_TRANSITION_FORWARD_BACK could never have been used before, because forward-back pages were never passed to it in the first place. I'd rather rely on the checks that the classifier people put in rather than the upstream checks that it appears that they weren't even aware of.
I don't know this code well either, so I had to do some archaeology and poke it a bit. +toyoshim who touched some of this code last year. https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... File chrome/renderer/chrome_render_view_observer.cc (left): https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... chrome/renderer/chrome_render_view_observer.cc:390: if (render_view()->GetPageId() != page_id) On 2014/07/07 18:49:46, Avi (OOO July 11-27) wrote: > I'm not convinced this check for page id is even necessary. > > When a navigation happens, DidCommitProvisionalLoad will get called, and it will > call CapturePageInfoLater, which will reset the timer. How is is possible that a > timed capture request with the wrong page id would even happen? Can the page id change if |is_new_navigation| in DidCommitProvisionalLoad() is false? That would be the only possible cause. https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... chrome/renderer/chrome_render_view_observer.cc:452: GURL stripped_url(StripRef(url)); On 2014/07/07 18:49:46, Avi (OOO July 11-27) wrote: > On 2014/07/07 18:36:59, awong wrote: > > In this observer, the page_id seems to be used as a coarse grain indication of > > "we've been here before." > > > > In particular, this block of code seems to be avoiding reclassification of > pages > > with the same top-level URL. I think the classifier itself does some squashing > > here, but not 100% sure if we've done the same job of de-duping. > > The classifier does check for PAGE_TRANSITION_FORWARD_BACK. It's roughly > equivalent. > > What's irritating is that that check for PAGE_TRANSITION_FORWARD_BACK could > never have been used before, because forward-back pages were never passed to it > in the first place. I'd rather rely on the checks that the classifier people put > in rather than the upstream checks that it appears that they weren't even aware > of. The PAGE_TRANSITION_FORWARD_BACK was added in r83366 and this check here was added afterwards in r99025 for bug 80742. Given a foo.html that calls location.replace("foo.html"), currently this check gets hit and the phishing classifier code never runs. When removed, the phishing classifier code runs and it passes the PAGE_TRANSITION_FORWARD_BACK test. https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... File chrome/renderer/chrome_render_view_observer.h (right): https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... chrome/renderer/chrome_render_view_observer.h:77: // Captures the thumbnail and text contents for indexing for the given load The comment still refers to the id.
https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... File chrome/renderer/chrome_render_view_observer.cc (left): https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... chrome/renderer/chrome_render_view_observer.cc:390: if (render_view()->GetPageId() != page_id) On 2014/07/07 19:37:04, Lei Zhang wrote: > On 2014/07/07 18:49:46, Avi (OOO July 11-27) wrote: > > I'm not convinced this check for page id is even necessary. > > > > When a navigation happens, DidCommitProvisionalLoad will get called, and it > will > > call CapturePageInfoLater, which will reset the timer. How is is possible that > a > > timed capture request with the wrong page id would even happen? > > Can the page id change if |is_new_navigation| in DidCommitProvisionalLoad() is > false? That would be the only possible cause. I don't know. I can reset the timer no matter what. https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... chrome/renderer/chrome_render_view_observer.cc:452: GURL stripped_url(StripRef(url)); So the phishing classifier runs an extra time if the page does a location.replace? I would think that's a good thing; what if different, fishier, content was loaded this next time? It would be nice to get a cleaner signal here as to whether things changed or not. https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... File chrome/renderer/chrome_render_view_observer.h (right): https://codereview.chromium.org/363293005/diff/1/chrome/renderer/chrome_rende... chrome/renderer/chrome_render_view_observer.h:77: // Captures the thumbnail and text contents for indexing for the given load On 2014/07/07 19:37:04, Lei Zhang wrote: > The comment still refers to the id. Done.
lgtm assuming mattm agrees that it is ok to call the phishing classifier one more time.
On 2014/07/07 20:33:44, Lei Zhang wrote: > lgtm assuming mattm agrees that it is ok to call the phishing classifier one > more time. FYI if you look at PhishingClassifierDelegate::MaybeStartClassification line 254, it already ignores repeats of the top-level URL. That should take care of the case where a page reloads itself via location replacement.
On 2014/07/08 00:50:29, Avi (OOO July 11-27) wrote: > On 2014/07/07 20:33:44, Lei Zhang wrote: > > lgtm assuming mattm agrees that it is ok to call the phishing classifier one > > more time. > > FYI if you look at PhishingClassifierDelegate::MaybeStartClassification line > 254, it already ignores repeats of the top-level URL. That should take care of > the case where a page reloads itself via location replacement. Indeed, I didn't look further down.
https://codereview.chromium.org/363293005/diff/40001/chrome/renderer/safe_bro... File chrome/renderer/safe_browsing/phishing_classifier_delegate.cc (right): https://codereview.chromium.org/363293005/diff/40001/chrome/renderer/safe_bro... chrome/renderer/safe_browsing/phishing_classifier_delegate.cc:157: if (preliminary_capture || *page_text != classifier_page_text_) should that be "==" ?
Addressed. https://codereview.chromium.org/363293005/diff/40001/chrome/renderer/safe_bro... File chrome/renderer/safe_browsing/phishing_classifier_delegate.cc (right): https://codereview.chromium.org/363293005/diff/40001/chrome/renderer/safe_bro... chrome/renderer/safe_browsing/phishing_classifier_delegate.cc:157: if (preliminary_capture || *page_text != classifier_page_text_) On 2014/07/08 23:35:50, mattm wrote: > should that be "==" ? D'oh. Yes.
On 2014/07/08 23:43:18, Avi (OOO July 11-27) wrote: > Addressed. > > https://codereview.chromium.org/363293005/diff/40001/chrome/renderer/safe_bro... > File chrome/renderer/safe_browsing/phishing_classifier_delegate.cc (right): > > https://codereview.chromium.org/363293005/diff/40001/chrome/renderer/safe_bro... > chrome/renderer/safe_browsing/phishing_classifier_delegate.cc:157: if > (preliminary_capture || *page_text != classifier_page_text_) > On 2014/07/08 23:35:50, mattm wrote: > > should that be "==" ? > > D'oh. Yes. Looks like some tests need updating too
And this is where it starts scaring me. The tests very deliberately test the inner workings and behaviors, and I'm intentionally changing the behavior of the code to a new one that I'm pretty sure is safe but that doesn't match what the test is expecting. I hacked the tests to pass, but I have much less confidence in what I did to the tests than in what I did to the code. (PTAL)
https://codereview.chromium.org/363293005/diff/100001/chrome/renderer/safe_br... File chrome/renderer/safe_browsing/phishing_classifier_delegate_browsertest.cc (left): https://codereview.chromium.org/363293005/diff/100001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/phishing_classifier_delegate_browsertest.cc:293: // content is being replaced. It would be good to update and keep more of these comments. Like this could be "Reloading the same page should not cancel classification or trigger a reclassification". https://codereview.chromium.org/363293005/diff/100001/chrome/renderer/safe_br... File chrome/renderer/safe_browsing/phishing_classifier_delegate_browsertest.cc (right): https://codereview.chromium.org/363293005/diff/100001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/phishing_classifier_delegate_browsertest.cc:299: // see the TODO in PhishingClassifierDelegate::DidCommitProvisionalLoad. that TODO no longer exists
Those comments are done. I'm at the edge of my understanding here, btw. I hope I got them correct. https://codereview.chromium.org/363293005/diff/100001/chrome/renderer/safe_br... File chrome/renderer/safe_browsing/phishing_classifier_delegate_browsertest.cc (left): https://codereview.chromium.org/363293005/diff/100001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/phishing_classifier_delegate_browsertest.cc:293: // content is being replaced. On 2014/07/10 03:21:01, mattm wrote: > It would be good to update and keep more of these comments. Like this could be > "Reloading the same page should not cancel classification or trigger a > reclassification". Done. https://codereview.chromium.org/363293005/diff/100001/chrome/renderer/safe_br... File chrome/renderer/safe_browsing/phishing_classifier_delegate_browsertest.cc (right): https://codereview.chromium.org/363293005/diff/100001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/phishing_classifier_delegate_browsertest.cc:299: // see the TODO in PhishingClassifierDelegate::DidCommitProvisionalLoad. On 2014/07/10 03:21:01, mattm wrote: > that TODO no longer exists Done. https://codereview.chromium.org/363293005/diff/100001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/phishing_classifier_delegate_browsertest.cc:299: // see the TODO in PhishingClassifierDelegate::DidCommitProvisionalLoad. On 2014/07/10 03:21:01, mattm wrote: > that TODO no longer exists Done.
patch set 8 lgtm
The CQ bit was checked by avi@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/avi@chromium.org/363293005/140001
Message was sent while issue was closed.
Change committed as 282479 |