|
|
Created:
4 years, 8 months ago by kolos1 Modified:
4 years, 6 months ago CC:
chromium-reviews, mlamouri+watch-content_chromium.org, vabr+watchlistpasswordmanager_chromium.org, rouslan+autofill_chromium.org, jam, blink-reviews, browser-components-watch_chromium.org, jdonnelly+autofillwatch_chromium.org, bondd+autofillwatch_chromium.org, darin-cc_chromium.org, mkwst+moarreviews-renderer_chromium.org, estade+watch_chromium.org, gcasto+watchlist_chromium.org, vabr+watchlistautofill_chromium.org, mkwst+watchlist-passwords_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
Description[Password Manager] HTML parsing based client-side form type classifier
This CL introduces the classifier that detects forms where password generation should be offered (i.e. signup and change password forms).
BUG=582434
Committed: https://crrev.com/1a8666ef27c1fbee0f0d57a79b533dc51ba6e605
Cr-Commit-Position: refs/heads/master@{#399899}
Patch Set 1 #Patch Set 2 : Fields number and text features. #Patch Set 3 : Testing system #Patch Set 4 : #Patch Set 5 : Baseline 4 #Patch Set 6 : Only id/name, only visible elements #Patch Set 7 : Smart parent checking. All attributes #Patch Set 8 : #Patch Set 9 : #Patch Set 10 : #Patch Set 11 : Moved the classifier to new files #Patch Set 12 : Reseted unrelated files to origin/master #Patch Set 13 : Reseted unrelated files to origin/master 2 #Patch Set 14 : #Patch Set 15 : #Patch Set 16 : #Patch Set 17 : #Patch Set 18 : The patch sent to reviewer #Patch Set 19 : Changed the signature of ClassifyFormAndFindGenerationField #
Total comments: 50
Patch Set 20 : Changes addressed to reviewer comments #
Total comments: 5
Patch Set 21 : Changes addressed to reviewers' comments 2 #
Total comments: 2
Patch Set 22 : Don't check the ancestors of <form> #Patch Set 23 : Check <form>'s attributes #
Dependent Patchsets: Messages
Total messages: 34 (12 generated)
Patchset #3 (id:40001) has been deleted
kolos@chromium.org changed reviewers: + vabr@chromium.org
Hi Vaclav, Please review HTML parsing based form classifier. In this CL, there is no calls of classifier from PasswordGenerationAgent. Only the classifier and its tests. I will add sending votes in a separate CL that we will discuss with Vadym. Please review all files except chrome/chrome_tests.gypi. I put form_classifier.* in the same folder with PasswordGenerationAgent and form_classifier_browsertests.cc in the same folder with password_generation_agent_browsertest.cc. Is it appropriate locations? Best regards, Maxim
On 2016/06/08 14:52:07, kolos1 wrote: > Hi Vaclav, > > Please review HTML parsing based form classifier. In this CL, there is no calls > of classifier from PasswordGenerationAgent. Only the classifier and its tests. I > will add sending votes in a separate CL that we will discuss with Vadym. > > Please review all files except chrome/chrome_tests.gypi. > > I put form_classifier.* in the same folder with PasswordGenerationAgent and > form_classifier_browsertests.cc in the same folder with > password_generation_agent_browsertest.cc. Is it appropriate locations? > > Best regards, > Maxim Thanks, Maxim! I'm unfortunately out of time today, and OOO until late afternoon tomorrow. Will have a look as soon as I get a chance. Cheers, Vaclav
Description was changed from ========== [Password Manager] Client-side form type classifier BUG=582434 ========== to ========== [Password Manager] HTML parsing based client-side form type classifier This CL introduces the classifier that detects forms where password generation should be offered (i.e. signup and change password forms). BUG=582434 ==========
dvadym@chromium.org changed reviewers: + dvadym@chromium.org
https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... File components/autofill/content/renderer/form_classifier.cc (right): https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:61: for (char d : kCharactersToBeRemoved) Can we use regexp for removing symbols? https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:102: void FindTextFeaturesInElement(bool* found_signin_text_features, Input arguments should be before output arguments https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:103: bool* found_signup_text_features, Could you please add DCHECK for found_*_text_features? https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:117: CR_DEFINE_STATIC_LOCAL(WebString, kButton, ("button")); All input type names const are in InputTypeNames.h, but I'm not sure can we use them non in blink code. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:165: void FindTextFeaturesInFormAndItsAncestors(const blink::WebFormElement& form, The same comment as to FindTextFeaturesInElement arguments https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:172: for (; !parent.isNull();) { I'm concerning a little bit in performance of this function. A loop can go till top of the DOM and the question of running time of parent.getElementsByHTMLTagName for different sites. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:280: password_creation_field = passwords[0]; What's about case when passwords.size() > 3? It could be for example when sign-in + sign-up form are in one <form> element or where there are hidden from the user password fields, that we incorrectly consider to be visible. Probably we should do the same as in password_form_conversion_utils.cc:LocateSpecificPasswords, namely skip such form.
https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... File components/autofill/content/renderer/form_classifier.cc (right): https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:61: for (char d : kCharactersToBeRemoved) On 2016/06/09 12:40:59, dvadym wrote: > Can we use regexp for removing symbols? Done. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:102: void FindTextFeaturesInElement(bool* found_signin_text_features, On 2016/06/09 12:40:59, dvadym wrote: > Input arguments should be before output arguments Done. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:103: bool* found_signup_text_features, On 2016/06/09 12:40:59, dvadym wrote: > Could you please add DCHECK for found_*_text_features? Done. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:117: CR_DEFINE_STATIC_LOCAL(WebString, kButton, ("button")); On 2016/06/09 12:40:59, dvadym wrote: > All input type names const are in InputTypeNames.h, but I'm not sure can we use > them non in blink code. I'm not sure we could use this file. As I understand, it for WebKit/Source/core classes, but for WebFormControlElement static local variables are used. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:165: void FindTextFeaturesInFormAndItsAncestors(const blink::WebFormElement& form, On 2016/06/09 12:40:59, dvadym wrote: > The same comment as to FindTextFeaturesInElement arguments Done. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:172: for (; !parent.isNull();) { On 2016/06/09 12:40:59, dvadym wrote: > I'm concerning a little bit in performance of this function. A loop can go till > top of the DOM and the question of running time of > parent.getElementsByHTMLTagName for different sites. I will ask to review Webkit specialist. Thanks. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:280: password_creation_field = passwords[0]; I saw one site where there were 4 password fields (2 fields for account password, and 2 fields for security questions or so). But probably this is an exception, > 3 passwords is a mistake. I will add "number_of_password_input_fields > 3" above.
Hi Maxim, This looks mostly good to me, but I am concerned about the performance. Cheers, Vaclav https://codereview.chromium.org/1883183002/diff/380001/chrome/renderer/autofi... File chrome/renderer/autofill/form_classifier_browsertest.cc (right): https://codereview.chromium.org/1883183002/diff/380001/chrome/renderer/autofi... chrome/renderer/autofill/form_classifier_browsertest.cc:176: // i.e. if one more text/password/checkbox/other field is added, the form nit: Please keep horizontal spaces to one character and also fix the line breaks to make most of the 80 character per line limit. https://codereview.chromium.org/1883183002/diff/380001/chrome/renderer/autofi... chrome/renderer/autofill/form_classifier_browsertest.cc:256: // signal of signup form. So, this form should classified as signup. typo: should classified -> should be classified https://codereview.chromium.org/1883183002/diff/380001/components/autofill.gypi File components/autofill.gypi (right): https://codereview.chromium.org/1883183002/diff/380001/components/autofill.gy... components/autofill.gypi:493: 'autofill/content/renderer/form_classifier.cc', Where do we plan to use form_classifier? If only inside the password_manager component, then we should place it there. Having password code in the autofill component is an unfortunate result of some dependencies related to how the agents in the renderer code work. We should not put more password manager code to autofill than necessary. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... File components/autofill/content/renderer/form_classifier.cc (right): https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:9: #include "base/strings/string_util.h" Please also #include "base/string16.h". https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:28: const int kNumberOfSigninFeatures = arraysize(kSigninTextFeatures); Please #include "base/macros.h" for arraysize. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:28: const int kNumberOfSigninFeatures = arraysize(kSigninTextFeatures); Please use constexpr here and with the constants below. It is allowed and encouraged by http://chromium-cpp.appspot.com/#core-whitelist . https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:28: const int kNumberOfSigninFeatures = arraysize(kSigninTextFeatures); Please use size_t instead of int. Unlike int, size_t is appropriate for non-negative counts and compatible with sizeof() and the STL size()/length() return types. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:41: // Minimal number of input fields to detect signup/change password form. The comment does not make it clear how the minimal counts are applied. Assuming that T, P, C and O are the respective counts of fields for a given form. If all four of them are >= the given minimal counts, does that guarantee that the classifier will say the form is a signup/change password form? Can the classifier say that a form is signup/change password even if at least one of T, P, C or O are less than the minimal count? https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:65: for (size_t j = 0; j < number_of_features; j++) { nit: ++j (Let's be consistent and use prefix increment. We don't need the expression to represent the pre-increment value, so no need for post-increment.) https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:162: // Finds text features in <form> tag of |form| and its ancestors. nit: The comment sounds not completely correct. What is "<form> tag" of an arbitrary ancestor (which may not be a form)? https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:173: if (parent.getElementsByHTMLTagName(kInput).length() > number_of_inputs) +1 to Vadym's worry about performance. This line in particular worries me, as it recounts the form elements and costs time quadratic in the number of elements in the DOM. We must no underestimate the performance impact, because form classification runs on all pages. http://crbug.com/614856 is an example of where this might be problematic. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:223: for (size_t i = 0; i < control_elements.size(); ++i) { nit: Compress the first two line into: for (const WebControlElement& control_element : control_elements) {... https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:225: bool element_is_invisible = !form_util::IsWebNodeVisible(control_element); IsWebNodeVisible might be expensive, please only call it if |ignore_invisible_elements| is true. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:227: IsHiddenElement(control_element)) nit: "if" statements with more than 2 lines in total should use { and }. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:241: number_of_other_input_fields++; nit: Here and below: please use prefix ++ unless you need the whole expression to represent the value before incrementing. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... File components/autofill/content/renderer/form_classifier.h (right): https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.h:10: // of the generation field to |generation_field| and returns true. Otherwise, nit: It might not be completely clear what a "generation field" is. Perhaps we could call it "field where Chrome should generate a password"? https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.h:11: // returns false (w/o any changes of |generation_field|). nit: changes of -> changes to https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.h:12: bool ClassifyFormAndFindGenerationField(const blink::WebFormElement& form, You should just forward-declare blink::WebFormElement (keep the #include for the .cc file) and you should also forward-declare base::string16. http://dev.chromium.org/developers/coding-style#TOC-Forward-declarations-vs.-... https://codereview.chromium.org/1883183002/diff/400001/components/autofill/co... File components/autofill/content/renderer/form_classifier.cc (right): https://codereview.chromium.org/1883183002/diff/400001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:58: RE2::GlobalReplace(&filtered_value, kCharactersToBeRemoved, ""); I'm afraid both regexps and the substring replacement here incur performance cost compared to your previous elegant solution with the erase-remove idiom. You know that you are replacing single characters, so the power of regexps goes mostly in vain here. In a similar situation [1] we also went with the simple solution. Please drop the regexps and revert to your previous removal code. [1] https://cs.chromium.org/chromium/src/components/autofill/core/common/save_pas...
https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... File components/autofill/content/renderer/form_classifier.cc (right): https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:117: CR_DEFINE_STATIC_LOCAL(WebString, kButton, ("button")); On 2016/06/10 12:18:08, kolos1 wrote: > On 2016/06/09 12:40:59, dvadym wrote: > > All input type names const are in InputTypeNames.h, but I'm not sure can we > use > > them non in blink code. > > I'm not sure we could use this file. As I understand, it for WebKit/Source/core > classes, but for WebFormControlElement static local variables are used. Acknowledged https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:173: if (parent.getElementsByHTMLTagName(kInput).length() > number_of_inputs) On 2016/06/10 13:22:11, vabr (Chromium) wrote: > +1 to Vadym's worry about performance. This line in particular worries me, as it > recounts the form elements and costs time quadratic in the number of elements in > the DOM. > > We must no underestimate the performance impact, because form classification > runs on all pages. http://crbug.com/614856 is an example of where this might be > problematic. Yeah, perfomance impact is not clear. Independently for what solution we will come up, probably it's worth to check impact with UMA on Dev and Canary https://codereview.chromium.org/1883183002/diff/400001/components/autofill/co... File components/autofill/content/renderer/form_classifier.cc (right): https://codereview.chromium.org/1883183002/diff/400001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:58: RE2::GlobalReplace(&filtered_value, kCharactersToBeRemoved, ""); On 2016/06/10 13:22:12, vabr (Chromium) wrote: > I'm afraid both regexps and the substring replacement here incur performance > cost compared to your previous elegant solution with the erase-remove idiom. You > know that you are replacing single characters, so the power of regexps goes > mostly in vain here. > > In a similar situation [1] we also went with the simple solution. > > Please drop the regexps and revert to your previous removal code. > > [1] > https://cs.chromium.org/chromium/src/components/autofill/core/common/save_pas... I wouldn't mind returning to erase-remove, but I'd like to mention that regexp is fast, it work in linear time in one iteration through string. But of course it's needed to pass to ::GlobalReplace not a string but RE2 object in order to avoid parsing of string each time.
https://codereview.chromium.org/1883183002/diff/400001/components/autofill/co... File components/autofill/content/renderer/form_classifier.cc (right): https://codereview.chromium.org/1883183002/diff/400001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:58: RE2::GlobalReplace(&filtered_value, kCharactersToBeRemoved, ""); On 2016/06/10 14:12:09, dvadym wrote: > On 2016/06/10 13:22:12, vabr (Chromium) wrote: > > I'm afraid both regexps and the substring replacement here incur performance > > cost compared to your previous elegant solution with the erase-remove idiom. > You > > know that you are replacing single characters, so the power of regexps goes > > mostly in vain here. > > > > In a similar situation [1] we also went with the simple solution. > > > > Please drop the regexps and revert to your previous removal code. > > > > [1] > > > https://cs.chromium.org/chromium/src/components/autofill/core/common/save_pas... > > I wouldn't mind returning to erase-remove, but I'd like to mention that regexp > is fast, it work in linear time in one iteration through string. But of course > it's needed to pass to ::GlobalReplace not a string but RE2 object in order to > avoid parsing of string each time. Parsing is indeed fast, but building/compiling the parser from the string representation is not. The regexp gets turned into a non-deterministic automaton, which gets determinised, etc. Also, removing a character from the string one at a time is asymptotically worse than erase-remove, which avoids moving the non-erased characters more than once each.
https://codereview.chromium.org/1883183002/diff/400001/components/autofill/co... File components/autofill/content/renderer/form_classifier.cc (right): https://codereview.chromium.org/1883183002/diff/400001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:58: RE2::GlobalReplace(&filtered_value, kCharactersToBeRemoved, ""); On 2016/06/10 14:32:14, vabr (Chromium) wrote: > On 2016/06/10 14:12:09, dvadym wrote: > > On 2016/06/10 13:22:12, vabr (Chromium) wrote: > > > I'm afraid both regexps and the substring replacement here incur performance > > > cost compared to your previous elegant solution with the erase-remove idiom. > > You > > > know that you are replacing single characters, so the power of regexps goes > > > mostly in vain here. > > > > > > In a similar situation [1] we also went with the simple solution. > > > > > > Please drop the regexps and revert to your previous removal code. > > > > > > [1] > > > > > > https://cs.chromium.org/chromium/src/components/autofill/core/common/save_pas... > > > > I wouldn't mind returning to erase-remove, but I'd like to mention that regexp > > is fast, it work in linear time in one iteration through string. But of course > > it's needed to pass to ::GlobalReplace not a string but RE2 object in order to > > avoid parsing of string each time. > > Parsing is indeed fast, but building/compiling the parser from the string > representation is not. The regexp gets turned into a non-deterministic > automaton, which gets determinised, etc. Also, removing a character from the > string one at a time is asymptotically worse than erase-remove, which avoids > moving the non-erased characters more than once each. Sure regexp parsing is slow, that exactly what I meant in the previous comment, when wrote that it's needed to pass RE2 object to GlobalReplace. This object should be created once and that allows to avoid parsing each time. GlobalReplace has version with RE2 object as an argument https://cs.chromium.org/chromium/src/third_party/re2/src/re2/re2.h?q=re2&sq=p...
Patchset #21 (id:420001) has been deleted
Patchset #21 (id:440001) has been deleted
Hi Vadym and Vaclav, Great thanks for your comments. I made some fixes and tested the performance (see the comment). I am also going to talk with Jochen about using Webkit and performance. Regards, Maxim https://codereview.chromium.org/1883183002/diff/380001/chrome/renderer/autofi... File chrome/renderer/autofill/form_classifier_browsertest.cc (right): https://codereview.chromium.org/1883183002/diff/380001/chrome/renderer/autofi... chrome/renderer/autofill/form_classifier_browsertest.cc:176: // i.e. if one more text/password/checkbox/other field is added, the form Removed double space. Did I exceed 80 character limit somewhere? https://codereview.chromium.org/1883183002/diff/380001/chrome/renderer/autofi... chrome/renderer/autofill/form_classifier_browsertest.cc:256: // signal of signup form. So, this form should classified as signup. On 2016/06/10 13:22:11, vabr (Chromium) wrote: > typo: should classified -> should be classified Done. https://codereview.chromium.org/1883183002/diff/380001/components/autofill.gypi File components/autofill.gypi (right): https://codereview.chromium.org/1883183002/diff/380001/components/autofill.gy... components/autofill.gypi:493: 'autofill/content/renderer/form_classifier.cc', As we discussed in person, the form classifier will be called from PasswordGenerationAgent. So, I put the classifier in the same location and the same targets as the agent. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... File components/autofill/content/renderer/form_classifier.cc (right): https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:9: #include "base/strings/string_util.h" On 2016/06/10 13:22:12, vabr (Chromium) wrote: > Please also #include "base/string16.h". Done. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:28: const int kNumberOfSigninFeatures = arraysize(kSigninTextFeatures); On 2016/06/10 13:22:11, vabr (Chromium) wrote: > Please #include "base/macros.h" for arraysize. Done. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:28: const int kNumberOfSigninFeatures = arraysize(kSigninTextFeatures); On 2016/06/10 13:22:12, vabr (Chromium) wrote: > Please use size_t instead of int. Unlike int, size_t is appropriate for > non-negative counts and compatible with sizeof() and the STL size()/length() > return types. Done. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:28: const int kNumberOfSigninFeatures = arraysize(kSigninTextFeatures); On 2016/06/10 13:22:11, vabr (Chromium) wrote: > Please #include "base/macros.h" for arraysize. Done. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:41: // Minimal number of input fields to detect signup/change password form. Fixed the comment. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:65: for (size_t j = 0; j < number_of_features; j++) { On 2016/06/10 13:22:12, vabr (Chromium) wrote: > nit: ++j > (Let's be consistent and use prefix increment. We don't need the expression to > represent the pre-increment value, so no need for post-increment.) Done. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:162: // Finds text features in <form> tag of |form| and its ancestors. Fixed the comment. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:173: if (parent.getElementsByHTMLTagName(kInput).length() > number_of_inputs) I reduced the number of calls "getElementsByHTMLTagName(kInput)". Now it is called at least once and at maximum twice if the feature was found. I measured the work time of the classifier on my collection of annotated forms and also on an artificial form with 200 inputs (the whole page contains 1K inputs). The measured time is 0 ms in almost all cases. The only exception is bet365.com where signup form contains very long text attributes. I tried to regexes. The performance is more or less the same. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:223: for (size_t i = 0; i < control_elements.size(); ++i) { On 2016/06/10 13:22:11, vabr (Chromium) wrote: > nit: Compress the first two line into: > > for (const WebControlElement& control_element : control_elements) {... Done. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:225: bool element_is_invisible = !form_util::IsWebNodeVisible(control_element); On 2016/06/10 13:22:12, vabr (Chromium) wrote: > IsWebNodeVisible might be expensive, please only call it if > |ignore_invisible_elements| is true. Done. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:227: IsHiddenElement(control_element)) On 2016/06/10 13:22:11, vabr (Chromium) wrote: > nit: "if" statements with more than 2 lines in total should use { and }. Done. https://codereview.chromium.org/1883183002/diff/380001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:241: number_of_other_input_fields++; On 2016/06/10 13:22:12, vabr (Chromium) wrote: > nit: Here and below: please use prefix ++ unless you need the whole expression > to represent the value before incrementing. Done. https://codereview.chromium.org/1883183002/diff/400001/components/autofill/co... File components/autofill/content/renderer/form_classifier.cc (right): https://codereview.chromium.org/1883183002/diff/400001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:58: RE2::GlobalReplace(&filtered_value, kCharactersToBeRemoved, ""); On 2016/06/10 15:05:16, dvadym wrote: > On 2016/06/10 14:32:14, vabr (Chromium) wrote: > > On 2016/06/10 14:12:09, dvadym wrote: > > > On 2016/06/10 13:22:12, vabr (Chromium) wrote: > > > > I'm afraid both regexps and the substring replacement here incur > performance > > > > cost compared to your previous elegant solution with the erase-remove > idiom. > > > You > > > > know that you are replacing single characters, so the power of regexps > goes > > > > mostly in vain here. > > > > > > > > In a similar situation [1] we also went with the simple solution. > > > > > > > > Please drop the regexps and revert to your previous removal code. > > > > > > > > [1] > > > > > > > > > > https://cs.chromium.org/chromium/src/components/autofill/core/common/save_pas... > > > > > > I wouldn't mind returning to erase-remove, but I'd like to mention that > regexp > > > is fast, it work in linear time in one iteration through string. But of > course > > > it's needed to pass to ::GlobalReplace not a string but RE2 object in order > to > > > avoid parsing of string each time. > > > > Parsing is indeed fast, but building/compiling the parser from the string > > representation is not. The regexp gets turned into a non-deterministic > > automaton, which gets determinised, etc. Also, removing a character from the > > string one at a time is asymptotically worse than erase-remove, which avoids > > moving the non-erased characters more than once each. > > Sure regexp parsing is slow, that exactly what I meant in the previous comment, > when wrote that it's needed to pass RE2 object to GlobalReplace. This object > should be created once and that allows to avoid parsing each time. GlobalReplace > has version with RE2 object as an argument > https://cs.chromium.org/chromium/src/third_party/re2/src/re2/re2.h?q=re2&sq=p... > Replaced with erase/remove_if solution.
Thank you, Maxim! Your changes LGTM, and I'm glad you plan to talk about Blink experts about performance (note that BlinkOn is in our office this week, so it's the ideal opportunity!). Cheers, Vaclav https://codereview.chromium.org/1883183002/diff/380001/chrome/renderer/autofi... File chrome/renderer/autofill/form_classifier_browsertest.cc (right): https://codereview.chromium.org/1883183002/diff/380001/chrome/renderer/autofi... chrome/renderer/autofill/form_classifier_browsertest.cc:176: // i.e. if one more text/password/checkbox/other field is added, the form On 2016/06/13 14:27:34, kolos1 wrote: > Removed double space. > > Did I exceed 80 character limit somewhere? No, you did not, quite the opposite :), which was what I was commenting about. In other words, a good part of line 176 could have been fitted at the end of line 175. https://codereview.chromium.org/1883183002/diff/460001/components/autofill/co... File components/autofill/content/renderer/form_classifier.cc (right): https://codereview.chromium.org/1883183002/diff/460001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:244: bool element_is_invisible = !form_util::IsWebNodeVisible(control_element); nit: Shorter form is: if (!form_util::IsWebNodeVisible(control_element)) continue;
On 2016/06/13 16:01:57, vabr (Chromium) wrote: > Thank you, Maxim! > > Your changes LGTM, and I'm glad you plan to talk about Blink experts about > performance (note that BlinkOn is in our office this week, so it's the ideal > opportunity!). To be clear -- my understanding is that you will discuss the performance impact with some Blink experts before landing this. > > Cheers, > Vaclav
Thanks! LGTM % perfomance questions of FindTextFeaturesInFormAndItsAncestors and correctness of using Blink API (I'm not an expert in Blink, so I can't be sure). That's better to talk with somebody who knows Blink very well.
kolos@chromium.org changed reviewers: + esprehn@chromium.org
esprehn@: FYI, this is the form classifier I mentioned in chat.
On 2016/06/14 at 09:11:35, kolos wrote: > esprehn@: FYI, this is the form classifier I mentioned in chat. Note that all the Web* APIs used in this patch are going away, this patch also doesn't handle shadow dom. The code here is also very expensive, when will run it? How does it connect to blink?
On 2016/06/14 09:31:28, esprehn wrote: > On 2016/06/14 at 09:11:35, kolos wrote: > > esprehn@: FYI, this is the form classifier I mentioned in chat. > > Note that all the Web* APIs used in this patch are going away, this patch also > doesn't handle shadow dom. The code here is also very expensive, when will run > it? How does it connect to blink? Shall this code handle shadow DOM? We will run it each time we see a <form> on a page. The code reads the attributes of the form control elements.
kolos@chromium.org changed reviewers: - esprehn@chromium.org
Hi Vaclav, I removed checking attributes of the ancestors of <form>. Now the classifier uses blink in the same way as the rest of Password manager code. Please review. Best regards, Maxim https://codereview.chromium.org/1883183002/diff/460001/components/autofill/co... File components/autofill/content/renderer/form_classifier.cc (right): https://codereview.chromium.org/1883183002/diff/460001/components/autofill/co... components/autofill/content/renderer/form_classifier.cc:244: bool element_is_invisible = !form_util::IsWebNodeVisible(control_element); On 2016/06/13 16:01:57, vabr (Chromium) wrote: > nit: Shorter form is: > if (!form_util::IsWebNodeVisible(control_element)) > continue; Done.
Thank you, Maxim, LGTM, this removed my concern about performance. Cheers, Vaclav
The CQ bit was checked by kolos@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from dvadym@chromium.org Link to the patchset: https://codereview.chromium.org/1883183002/#ps480002 (title: "Check <form>'s attributes")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1883183002/480002
Message was sent while issue was closed.
Description was changed from ========== [Password Manager] HTML parsing based client-side form type classifier This CL introduces the classifier that detects forms where password generation should be offered (i.e. signup and change password forms). BUG=582434 ========== to ========== [Password Manager] HTML parsing based client-side form type classifier This CL introduces the classifier that detects forms where password generation should be offered (i.e. signup and change password forms). BUG=582434 ==========
Message was sent while issue was closed.
Committed patchset #23 (id:480002)
Message was sent while issue was closed.
CQ bit was unchecked
Message was sent while issue was closed.
Description was changed from ========== [Password Manager] HTML parsing based client-side form type classifier This CL introduces the classifier that detects forms where password generation should be offered (i.e. signup and change password forms). BUG=582434 ========== to ========== [Password Manager] HTML parsing based client-side form type classifier This CL introduces the classifier that detects forms where password generation should be offered (i.e. signup and change password forms). BUG=582434 Committed: https://crrev.com/1a8666ef27c1fbee0f0d57a79b533dc51ba6e605 Cr-Commit-Position: refs/heads/master@{#399899} ==========
Message was sent while issue was closed.
Patchset 23 (id:??) landed as https://crrev.com/1a8666ef27c1fbee0f0d57a79b533dc51ba6e605 Cr-Commit-Position: refs/heads/master@{#399899} |