|
|
Created:
8 years, 3 months ago by mrossetti Modified:
8 years, 2 months ago CC:
chromium-reviews, MAD, James Su, jar (doing other things) Base URL:
svn://svn.chromium.org/chrome/trunk/src/ Visibility:
Public. |
DescriptionImplement Bookmark Autocomplete Provider
Introduce a new autocomplete provider that searches bookmarks by bookmark title.
Note: The behavior of QueryNodeWord::HasMatchIn in query_parser.cc has been changed to return all match positions rather than just the first.
Note: CoalesceMatchesFrom (in query_parser.cc) has been changed to replace the intersection of two match positions with the union of those two matches. Previously, it would behave differently based on ordering of the terms.
BUG=13308, 154582, 155873
TEST=New unit tests added. Existing unit tests enhanced.
pkasting: everything under autocomplete/.
mpearson: scoring in bookmark_provider, also unit test.
isherman: metrics (Done).
sky: query_parser.h/.cc and bookmark_index.h/.cc.
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=162807
Patch Set 1 #
Total comments: 39
Patch Set 2 : #Patch Set 3 : #
Total comments: 16
Patch Set 4 : Address clint complaints. #
Total comments: 41
Patch Set 5 : #
Total comments: 27
Patch Set 6 : Now handles duplicate term matching. Comments, etc. #
Total comments: 33
Patch Set 7 : Fixed match position coalescing. #
Total comments: 12
Patch Set 8 : #Patch Set 9 : Edited comments. #
Total comments: 10
Patch Set 10 : Final upload. #Messages
Total messages: 31 (0 generated)
Mark, would you please take a look at the scoring of the bookmark result found in bookmark_provider.cc around lines 108:125 in the TitleMatchToACMatch and suggest a more appropriate approach/algorithm? Your review of the rest is also welcome. The unit test has not been finished. (I ran out of time.) I'll finish up the unit tests when I get back. Thanks.
Some initial comments. I didn't get far into bookmark_provider.cc, but I did look through everything else (but the test file and the gypi files). --mark http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/au... File chrome/browser/autocomplete/autocomplete_controller.cc (right): http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/au... chrome/browser/autocomplete/autocomplete_controller.cc:57: default: return 64; Please look into whether you need to add something here. (I think the answer is probably yes.) http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/au... File chrome/browser/autocomplete/autocomplete_provider.cc (right): http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/au... chrome/browser/autocomplete/autocomplete_provider.cc:95: // TODO: Add to OmniboxEventProto::ProviderType. You need to do something here, else ZERO_SUGGEST types will get reutrned as BOOKMARKS. (No return on break in section.) http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/au... File chrome/browser/autocomplete/autocomplete_provider.h (right): http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/au... chrome/browser/autocomplete/autocomplete_provider.h:39: // UNKNOWN input type: When you're ready, please update the below tables with your provider and a rough range. http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... File chrome/browser/autocomplete/bookmark_provider.cc (right): http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... chrome/browser/autocomplete/bookmark_provider.cc:114: // Add 1199 - score * total match length / title length I can't follow this code from 114-125. Perhaps in this comment you meant to have an open parenthesis before the 1199 and a close one before the slash? Then I think that makes sense with the code. But I still find the structure of this scoring odd. In particular, you're computing (base score + #bookmark boost) and then subtracting off a discounted version of this. I find it hard to wrap my head around what effect this has. Can you simplify this / decompose this quantity of title match code? http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... File chrome/browser/autocomplete/bookmark_provider.h (right): http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... chrome/browser/autocomplete/bookmark_provider.h:27: void set_bookmark_model_for_testing(BookmarkModel* bookmark_model) { Should this be private and a friend to your test class? http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... chrome/browser/autocomplete/bookmark_provider.h:35: // matches. If |best_match| then only return the single best match, otherwise matches -> |matches_| http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... chrome/browser/autocomplete/bookmark_provider.h:36: // return the top |kMaxMatches| matches. Using return (both times) in the above sentence is odd because this function doesn't return anything. http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... chrome/browser/autocomplete/bookmark_provider.h:41: AutocompleteMatch TitleMatchToACMatch( Can this be static or at least const? http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... chrome/browser/autocomplete/bookmark_provider.h:46: // terminating classification. I don't understand the last sentence. (What is a terminating classification and why might you need one?) http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/hi... File chrome/browser/autocomplete/history_provider.cc (right): http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/hi... chrome/browser/autocomplete/history_provider.cc:11: #include "chrome/browser/autocomplete/autocomplete_input.h" I'm curious: are either of the above two things necessary for this changelist? http://codereview.chromium.org/10913262/diff/1/chrome/common/metrics/proto/om... File chrome/common/metrics/proto/omnibox_event.proto (right): http://codereview.chromium.org/10913262/diff/1/chrome/common/metrics/proto/om... chrome/common/metrics/proto/omnibox_event.proto:119: CONTACT = 12; // One of the user's contacts You need to add BOOKMARK_TITLE here too I think. You also need to modify metrics_service AsOmniboxEventResultType
Still haven't finished the rest of the .cc, but here are a few more comments, mostly not significant. --mark http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... File chrome/browser/autocomplete/bookmark_provider.cc (right): http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... chrome/browser/autocomplete/bookmark_provider.cc:7: #include "base/string_number_conversions.h" Why do you need this? http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... chrome/browser/autocomplete/bookmark_provider.cc:42: UMA_HISTOGRAM_TIMES("Autocomplete.BookmarkProviderMatchTime", Perhaps you want two different histograms, one for searching for the best match and one for all other cases? http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... chrome/browser/autocomplete/bookmark_provider.cc:52: string16 search_string = input.text(); It makes more sense (to me) to have these two lines closer to where the first use of this variable is. http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... chrome/browser/autocomplete/bookmark_provider.cc:55: bookmark_model_ = BookmarkModelFactory::GetForProfile(profile_); Is this fast enough that it can be done synchronously on the first omnibox keystroke? http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... chrome/browser/autocomplete/bookmark_provider.cc:57: return; Is this worth a DCHECK NOT REACHED? http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... chrome/browser/autocomplete/bookmark_provider.cc:61: bookmark_model_->GetBookmarksWithTitlesMatching(search_string, 99, &matches); Comment on the meaning of 99. http://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/bo... chrome/browser/autocomplete/bookmark_provider.cc:207: } I don't see you ever touching |done_|. Does it get set correctly even if you're not doing anything to it explicitly?
Thanks for the comments Mark. Issues addressed one way or the other. Please see my plea for scoring guidance in bookmark_provider.cc. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/a... File chrome/browser/autocomplete/autocomplete_controller.cc (right): https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/a... chrome/browser/autocomplete/autocomplete_controller.cc:57: default: return 64; On 2012/09/28 00:55:13, Mark P wrote: > Please look into whether you need to add something here. (I think the answer is > probably yes.) Done. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/a... File chrome/browser/autocomplete/autocomplete_provider.cc (right): https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/a... chrome/browser/autocomplete/autocomplete_provider.cc:95: // TODO: Add to OmniboxEventProto::ProviderType. On 2012/09/28 00:55:13, Mark P wrote: > You need to do something here, else ZERO_SUGGEST types will get reutrned as > BOOKMARKS. (No return on break in section.) Done. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/a... File chrome/browser/autocomplete/autocomplete_provider.h (right): https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/a... chrome/browser/autocomplete/autocomplete_provider.h:39: // UNKNOWN input type: On 2012/09/28 00:55:13, Mark P wrote: > When you're ready, please update the below tables with your provider and a rough > range. I'll add comments now and can refine them later when we settle on a scoring algo. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... File chrome/browser/autocomplete/bookmark_provider.cc (right): https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.cc:7: #include "base/string_number_conversions.h" On 2012/09/28 23:10:22, Mark P wrote: > Why do you need this? I don't any longer. Removed. Good catch. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.cc:42: UMA_HISTOGRAM_TIMES("Autocomplete.BookmarkProviderMatchTime", On 2012/09/28 23:10:22, Mark P wrote: > Perhaps you want two different histograms, one for searching for the best match > and one for all other cases? It doesn't make a significant difference in execution time so I don't see a need to differentiate. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.cc:52: string16 search_string = input.text(); On 2012/09/28 23:10:22, Mark P wrote: > It makes more sense (to me) to have these two lines closer to where the first > use of this variable is. Good point! And I was able to get rid of the temporary completely. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.cc:55: bookmark_model_ = BookmarkModelFactory::GetForProfile(profile_); On 2012/09/28 23:10:22, Mark P wrote: > Is this fast enough that it can be done synchronously on the first omnibox > keystroke? Right now, the BookmarkProvider will returning suggestions for search terms with at least 3 characters. The response time where there are hundreds of bookmarks (I don't have an exact number for my test set but it's close to a thousand) is under 10 ms. I'd like to get this settled with the 3 limit and then experiment with reducing it to 1 after this is stable. There is one gotcha with the 3 limit tho: All search terms must be at least 3 characters long. So if 'cou' brings up your favorite bookmark with a title of "Country Lyrics" and you then type an 'l' as in "cou l" or even "cou ly" then the BP will return no results. Once the 'r' is typed, "cou lyr", the BP will once again start showing results. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.cc:57: return; On 2012/09/28 23:10:22, Mark P wrote: > Is this worth a DCHECK NOT REACHED? Not really, because there might not be a bookmark model for some unit tests. I did, however, add a check for |profile_| which might also be NULL for some unit tests. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.cc:61: bookmark_model_->GetBookmarksWithTitlesMatching(search_string, 99, &matches); On 2012/09/28 23:10:22, Mark P wrote: > Comment on the meaning of 99. Done. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.cc:114: // Add 1199 - score * total match length / title length I threw in a very simple scoring algorithm here as a placeholder in hopes you would suggest a more appropriate approach. Scoring of suggestions is already totally wacky so any stab I take at it couldn't be much worse but, unfortunately, not any better either. That is the whole purpose of my asking if you could suggest a general approach and I'll fill in the code. On the other hand, I could use the HUP approach and simply boost the score because it's a bookmark. But, frankly, I didn't not want to take any historical usage into consideration for the BookmarkProvider simply because that is already being done by the HQP. _This_ provider will probably be most valuable for picking up bookmarks that have not been visited for a while. With all of that said, here was the (simple-minded) thinking I put into this placeholder scoring scheme: 1. If it's a bookmark start with a base score of 900. (Line 115) 2. If the URL in the bookmark is referenced by other bookmarks then boost the score. (Lines 112-115) 3. The maximum possible score is 1199 so take the difference of the current score and that max. Calculate the fraction of the bookmark's title that is spanned by the search terms and boost the score by multiplying that fraction by the difference in the max and current score. For example, if the current score is 900 and the search terms span 21% of the bookmark title that gives (1199 - 900) * 21% ==> 63 points. (Lines 116-125) 4. If there were multiple search terms and they appeared out of order in the bookmark title then 50 points for each out-of-order search term. As I said, simple-minded. I'm sure you can proposed a much more rational approach! On 2012/09/28 00:55:13, Mark P wrote: > I can't follow this code from 114-125. Perhaps in this comment you meant to > have an open parenthesis before the 1199 and a close one before the slash? Then > I think that makes sense with the code. > > But I still find the structure of this scoring odd. In particular, you're > computing (base score + #bookmark boost) and then subtracting off a discounted > version of this. I find it hard to wrap my head around what effect this has. > Can you simplify this / decompose this quantity of title match code? https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.cc:207: } On 2012/09/28 23:10:22, Mark P wrote: > I don't see you ever touching |done_|. Does it get set correctly even if you're > not doing anything to it explicitly? Excellent question. |done_| is always true unless the provider specifically sets it to false so it is irrelevant in the case of a synchronous-only provider. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... File chrome/browser/autocomplete/bookmark_provider.h (right): https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.h:27: void set_bookmark_model_for_testing(BookmarkModel* bookmark_model) { On 2012/09/28 00:55:13, Mark P wrote: > Should this be private and a friend to your test class? Not necessarily. There is no style guidance for this and, in this where there is a very limited number of for_testing functions I'd prefer to not have a friend. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.h:35: // matches. If |best_match| then only return the single best match, otherwise On 2012/09/28 00:55:13, Mark P wrote: > matches -> |matches_| Done. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.h:36: // return the top |kMaxMatches| matches. On 2012/09/28 00:55:13, Mark P wrote: > Using return (both times) in the above sentence is odd because this function > doesn't return anything. Done. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.h:41: AutocompleteMatch TitleMatchToACMatch( On 2012/09/28 00:55:13, Mark P wrote: > Can this be static or at least const? Unfortunately, neither. Can't be static because it needs access to |languages_|. Can't be const because the AutocompleteMatch created has a non-const pointer to the provider. (And I don't want to go down the const rabbit hole trying to make everything happy with a const pointer.) https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.h:46: // terminating classification. On 2012/09/28 00:55:13, Mark P wrote: > I don't understand the last sentence. (What is a terminating classification and > why might you need one?) I've updated the comment a bit, but for your edification what this means is that if the last substring match doesn't extend to the end of the source string then we need to add one final classification to turn off the match highlighting so that the tail of the source string is not highlighted when it shouldn't be. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/h... File chrome/browser/autocomplete/history_provider.cc (right): https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/h... chrome/browser/autocomplete/history_provider.cc:11: #include "chrome/browser/autocomplete/autocomplete_input.h" On 2012/09/28 00:55:13, Mark P wrote: > I'm curious: are either of the above two things necessary for this changelist? No, it's just cleanup that I happened across. I can submit separately if you'd prefer. https://codereview.chromium.org/10913262/diff/1/chrome/common/metrics/proto/o... File chrome/common/metrics/proto/omnibox_event.proto (right): https://codereview.chromium.org/10913262/diff/1/chrome/common/metrics/proto/o... chrome/common/metrics/proto/omnibox_event.proto:119: CONTACT = 12; // One of the user's contacts On 2012/09/28 00:55:13, Mark P wrote: > You need to add BOOKMARK_TITLE here too I think. > > You also need to modify metrics_service AsOmniboxEventResultType Ah! Excellent!
Mark, I've addressed the issues you mentioned. I need some guidance on the scoring, please.
This is ready for general review: pkasting: all things autocomplete. mpearson: scoring approach in BookmarkProvider::TitleMatchToACMatch. sky: metrics/metrics_log.cc. Once mpearson advises and LGs on the scoring I will add a unit test for such.
My remaining comments on everything (including scoring). I've looked at all the files except the unittest. Sorry for the delay; I got distracted with non-omnibox work this week. --mark https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... File chrome/browser/autocomplete/bookmark_provider.cc (right): https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.cc:42: UMA_HISTOGRAM_TIMES("Autocomplete.BookmarkProviderMatchTime", On 2012/10/02 00:44:16, mrossetti wrote: > On 2012/09/28 23:10:22, Mark P wrote: > > Perhaps you want two different histograms, one for searching for the best > match > > and one for all other cases? > > It doesn't make a significant difference in execution time so I don't see a need > to differentiate. Okay. https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.cc:114: // Add 1199 - score * total match length / title length On 2012/10/02 00:44:16, mrossetti wrote: > I threw in a very simple scoring algorithm here as a placeholder in hopes you > would suggest a more appropriate approach. Scoring of suggestions is already > totally wacky so any stab I take at it couldn't be much worse but, > unfortunately, not any better either. That is the whole purpose of my asking if > you could suggest a general approach and I'll fill in the code. On the other > hand, I could use the HUP approach and simply boost the score because it's a > bookmark. But, frankly, I didn't not want to take any historical usage into > consideration for the BookmarkProvider simply because that is already being done > by the HQP. _This_ provider will probably be most valuable for picking up > bookmarks that have not been visited for a while. > > With all of that said, here was the (simple-minded) thinking I put into this > placeholder scoring scheme: > > 1. If it's a bookmark start with a base score of 900. (Line 115) > 2. If the URL in the bookmark is referenced by other bookmarks then boost the > score. (Lines 112-115) > 3. The maximum possible score is 1199 so take the difference of the current > score and that max. Calculate the fraction of the bookmark's title that is > spanned by the search terms and boost the score by multiplying that fraction by > the difference in the max and current score. For example, if the current score > is 900 and the search terms span 21% of the bookmark title that gives (1199 - > 900) * 21% ==> 63 points. (Lines 116-125) > 4. If there were multiple search terms and they appeared out of order in the > bookmark title then 50 points for each out-of-order search term. > > As I said, simple-minded. I'm sure you can proposed a much more rational > approach! I don't think I can. :) Your approach sounds reasonable. I mainly was complaining about step 3. It seems conceptually odd to me that an input that matches one bookmark and covers the bookmark text by 21% should get more credit for covering it than an input that matches two bookmarks and covers them by 21%. I know the scores end up in a reasonable order, but it seems odd that you give the first a larger boost than the second because the first has more headroom before it hits 1199. I'll give specific suggestions to this on patchset 3. > On 2012/09/28 00:55:13, Mark P wrote: > > I can't follow this code from 114-125. Perhaps in this comment you meant to > > have an open parenthesis before the 1199 and a close one before the slash? > Then > > I think that makes sense with the code. > > > > But I still find the structure of this scoring odd. In particular, you're > > computing (base score + #bookmark boost) and then subtracting off a discounted > > version of this. I find it hard to wrap my head around what effect this has. > > Can you simplify this / decompose this quantity of title match code? > https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... File chrome/browser/autocomplete/bookmark_provider.h (right): https://codereview.chromium.org/10913262/diff/1/chrome/browser/autocomplete/b... chrome/browser/autocomplete/bookmark_provider.h:27: void set_bookmark_model_for_testing(BookmarkModel* bookmark_model) { On 2012/10/02 00:44:16, mrossetti wrote: > On 2012/09/28 00:55:13, Mark P wrote: > > Should this be private and a friend to your test class? > > Not necessarily. There is no style guidance for this and, in this where there is > a very limited number of for_testing functions I'd prefer to not have a friend. You can make a function a friend without having to make the whole class friends. Limited friendship I suppose it can be called. If you still decline, that's okay. https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.cc (right): https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:59: bookmark_model_ = BookmarkModelFactory::GetForProfile(profile_); On 2012/10/02 00:44:16, mrossetti wrote: > On 2012/09/28 23:10:22, Mark P wrote: > > Is this fast enough that it can be done synchronously on the first omnibox > > keystroke? > > Right now, the BookmarkProvider will returning suggestions for search terms with > at least 3 characters. The response time where there are hundreds of bookmarks > (I don't have an exact number for my test set but it's close to a thousand) is > under 10 ms. I'd like to get this settled with the 3 limit and then experiment > with reducing it to 1 after this is stable. This sounds fine to me. Where is the 3 set in code? I don't seem to see it in this file. My question, however, was much simpler than you took it to mean. I wanted to know if this line in particular "BookmarkModelFactory::GetForProfile(profile_)" is fast enough that it can be done synchronously on the first omnibox keystroke. Can it? > There is one gotcha with the 3 limit tho: All search terms must be at least 3 > characters long. So if 'cou' brings up your favorite bookmark with a title of > "Country Lyrics" and you then type an 'l' as in "cou l" or even "cou ly" then > the BP will return no results. Once the 'r' is typed, "cou lyr", the BP will > once again start showing results. Ugh, what icky behavior. Will this go away if you can resolve the 3 -> 1 issue? https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:65: // Retrieve enough bookmarks so that we can reasonably score the best. we can reasonably score the best -> we have a reasonable chance of getting one the user may have meant https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:94: class PositionFunctor { When / how does last_pos_ get set? https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:109: }; What will happen with an omnibox input foobar foobar for input foobar foo? Do we get two matches at each term foobar in the title or one? How do they sort? How do we calculate disorders and total length of matches? (Are we overcounting length of matched text?) I know I can answer all this by looking carefully at the code, but my job as a code reviewer is to make sure you're doing the right thing and to have you make it clear the code is doing the right thing (by adding comments or code as appropriate). :) https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:117: // Base score = 900. Add 100 for first match, 75 for next, 50, 25, 0. I like boosting for matching the URL in multiple *bookmarks*. Is that what you're doing here? Or are you boosting for matching multiple times in the title to one particular bookmark? It's not clear to me from either the code or the comments. (I'd have to dive into other files to see how match_positions is defined.) Can you clarify this code 117-121? Also, it's misleading to say base score = 900 because you know there will always be at least one match. Make the base score correspond to one match, then increment from there. P.S. I prefer boosting for matching in multiple bookmarks and leaving the percent coverage code to reflect the benefit of having multiple matches (multiple coverage) within one particular title. https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:122: // Add 1199 - score * total match length / title length I don't like this idea of using the magic number 1199 here. I'm trying to move away from this kind of stuff. I also don't like how we can boost two different matches that have 21% coverage differently depending on other factors (as I explained in a different comment). I'd prefer to see lines 122-131 be simpler, something more like relevance += (percent of title that's matched) i.e., 100 points for a full title match, 50 for half. To make this all stay under 1199 without having to explicitly use a cap, I'd suggest compressing the 900-1150 range you currently have based on number of matches. Perhaps simply go from 900 (one match), 975 (two matches), 1025 (three matches), 1050 (four or more matches). https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:133: relevance -= 50 * position_functor.TotalDisorders(); Lines 132-133 are okay for now. There's one thing I'm worried about. Suppose the user enters aaa bbb and you have two possible contenders with the same percent coverage): aaa bbb aaa foo foo bar and aaa bbb foo bar I'd think we should prefer the first because it has an extra match, even though it's out-of-order. This code prefers the second. I'm comfortable leaving this as-is because people tend not to enter multiple term inputs in the omnibox and because bookmark titles I think tend to be short, so it's not worth spending much time on this. However, for completeness I'd like to mention that a plausible way to do this would be to count the number of term pairs that never appear in the correct order in the title and only penalize for those. If a term pair appears at least once in the right order, we shouldn't count it as a disorder. https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:135: AutocompleteMatch match(this, relevance, false, comment on the meaning of false https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:186: for (Snippet::MatchPositions::const_iterator b_iter = ++a_iter; I think you can right this loop with only one iterator but if you want to leave it as is that's okay with me. https://codereview.chromium.org/10913262/diff/21001/chrome/common/metrics/pro... File chrome/common/metrics/proto/omnibox_event.proto (right): https://codereview.chromium.org/10913262/diff/21001/chrome/common/metrics/pro... chrome/common/metrics/proto/omnibox_event.proto:120: BOOKMARK_TITLE = 13; // A bookmark whose title contains the input. Please remember to check in the corresponding change in google's codebase (or ask me to do it right around when you submit).
On 2012/10/03 19:11:16, mrossetti wrote: > sky: metrics/metrics_log.cc. Metrics changes LGTM. Please prefer people in the metrics OWNERS file for reviewing future metrics changes -- there's a lot of subtle stuff here that we are more accustomed to being watchful for. Keep in mind that you'll also need server-side changes to match the protocol buffer definition changes.
You want Ilya to review metrics, not me.
Sorry about that. I scanned the OWNERS and saw 'sky' but it was a figment of my imagination. On Wed, Oct 3, 2012 at 2:54 PM, <isherman@chromium.org> wrote: > On 2012/10/03 19:11:16, mrossetti wrote: > >> sky: metrics/metrics_log.cc. >> > > Metrics changes LGTM. Please prefer people in the metrics OWNERS file for > reviewing future metrics changes -- there's a lot of subtle stuff here > that we > are more accustomed to being watchful for. > > Keep in mind that you'll also need server-side changes to match the > protocol > buffer definition changes. > > https://codereview.chromium.**org/10913262/<https://codereview.chromium.org/1... >
https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/autocomplete_classifier.cc (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/autocomplete_classifier.cc:28: const int AutocompleteClassifier::kInstantExtendedOmniboxProviders = Talk to dcblack about whether you should be in this list. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/autocomplete_controller.cc (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/autocomplete_controller.cc:58: // NOTE: Default is equivalent to OMNIBOX_OTHER and must remain 64. OMNIBOX_OTHER? What is that? https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/autocomplete_controller.cc:144: // Disable the BookmarkProvider if there the switch says so. Nit: Comment adds nothing https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/autocomplete_match.h (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/autocomplete_match.h:93: BOOKMARK_TITLE, // A bookmark whose title contains the input. Nit: I'd list this just below HISTORY_KEYWORD https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/autocomplete_provider.h (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/autocomplete_provider.h:39: // UNKNOWN input type: These tables don't even list the HQP. Maybe we should just scrap them altogether. I'm not sure how much value they add anymore. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/autocomplete_provider.h:192: TYPE_BOOKMARK = 1 << 10, Nit: Can we keep these in alphabetical order? https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.cc (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:36: matches_.clear(); Seems like if |minimal_changes| is true we can just return immediately before clearing this, no? https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:38: // Only bother with UNKNOWN and QUERY. What about REQUESTED_URL? That's similar to UNKNOWN... https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:45: input.matches_requested() == AutocompleteInput::BEST_MATCH); We should probably bail if BEST_MATCH and input.prevent_inline_autocomplete(), the way the HQP does. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:55: if (!bookmark_model_) { Does setting this lazily here speed up startup? If so, document that. If not, just do this in the constructor. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:67: bookmark_model_->GetBookmarksWithTitlesMatching(input.text(), Does this match only exact prefixes, or is it smarter than that? The comments on its declaration don't say, and this file doesn't really have any comments about how the overall matching/sorting/scoring pipeline works. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:113: AutocompleteMatch BookmarkProvider::TitleMatchToACMatch( Nit: This function could really use a description of how, in general, it matches and scores. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:165: size_t text_length) { We do similar things in contact_provider_chromeos.cc:WordIsNamePrefix() and to a lesser degree ShortcutsProvider::ClassifyAllMatchesInString(). Can we share code somehow? Especially the "collapse overlapping positions and convert to classifications" part below can all be handled by repeated calls to AutocompleteMatch::MergeClassifications() as the ContactsProvider does. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:167: if (positions.size() == 0) { Nit: empty() https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.h (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.h:20: // provides autocomplete suggestions based on matches between the search terms Nit: "matches between the search terms entered by the user into the omnibox and" can be removed. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.h:22: // titles and URLs of the bookmarks are not matched. Nit: Mention why. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider_unittest.cc (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:76: TestBookmarkPosition(size_t from, size_t to) Nit: Use the same arg names as your member names ("begin(begin)" will work as an initializer) https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:189: // WHAT??? ?? https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:191: ASCIIToUTF16(""), false, false, false, Nit: string16()
mpearson: I've completely reworked scoring (thank you for the advice). jar: please sign off on the gypi changes. pkasting: please review changes generally. https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.cc (right): https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:59: bookmark_model_ = BookmarkModelFactory::GetForProfile(profile_); On 2012/10/03 20:40:14, Mark P wrote: > On 2012/10/02 00:44:16, mrossetti wrote: > > On 2012/09/28 23:10:22, Mark P wrote: > > > Is this fast enough that it can be done synchronously on the first omnibox > > > keystroke? > > > > Right now, the BookmarkProvider will returning suggestions for search terms > with > > at least 3 characters. The response time where there are hundreds of bookmarks > > (I don't have an exact number for my test set but it's close to a thousand) is > > under 10 ms. I'd like to get this settled with the 3 limit and then experiment > > with reducing it to 1 after this is stable. > > This sounds fine to me. > > Where is the 3 set in code? I don't seem to see it in this file. > > My question, however, was much simpler than you took it to mean. I wanted to > know if this line in particular "BookmarkModelFactory::GetForProfile(profile_)" > is fast enough that it can be done synchronously on the first omnibox keystroke. > Can it? Yes, because if the bookmark model has not been loaded then this will cause it to be loaded. The data, however, will be loaded on the FILE thread so it won't slow down omnibox response, it just won't be able to provide any data for this particular query. > > There is one gotcha with the 3 limit tho: All search terms must be at least 3 > > characters long. So if 'cou' brings up your favorite bookmark with a title of > > "Country Lyrics" and you then type an 'l' as in "cou l" or even "cou ly" then > > the BP will return no results. Once the 'r' is typed, "cou lyr", the BP will > > once again start showing results. > > Ugh, what icky behavior. Will this go away if you can resolve the 3 -> 1 issue? It's complicated behavior down in the BookmarkIndex. An exact match is performed for queries with fewer than the minimum needed for a prefix match. That minimum is determined by the QueryParser::IsWordLongEnoughForPrefixSearch(term) function and has to do with optimizing SQLite queries. Minimizing this particular irritation is the ShortcutsProvider, which should serve up historical searches on foreshortened search strings. So for subsequent searches for "Country Lyrics" merely typing a 'c' or 'co' ought to bring up the most frequently accessed sites. (Frankly, I'd like to get to a point where a single-word search term is also considered as a possible acronym, matching the first letters in a list of words. Typing something like 'qsb' could find results like "Quick Search Box". But this is besides the point.) https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:65: // Retrieve enough bookmarks so that we can reasonably score the best. On 2012/10/03 20:40:14, Mark P wrote: > we can reasonably score the best > -> > we have a reasonable chance of getting one the user may have meant Done. https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:94: class PositionFunctor { On 2012/10/03 20:40:14, Mark P wrote: > When / how does last_pos_ get set? Good catch! It _was_ in there, I swear. I'll add a unit test that catches this once scoring is finalized. https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:109: }; On 2012/10/03 20:40:14, Mark P wrote: > What will happen with an omnibox input > foobar foobar > for input foobar foo? > Do we get two matches at each term foobar in the title or one? > How do theysort? > How do we calculate disorders and total length of matches? > (Are we overcounting length of matched text?) > > I know I can answer all this by looking carefully at the code, but my job as a > code reviewer is to make sure you're doing the right thing and to have you make > it clear the code is doing the right thing (by adding comments or code as > appropriate). :) The problem with relying on comments is that it's far too easy for the comments to get out-of-synch with the implementation. Also, I was trying to avoid influencing your thinking about how scoring ought to be done. In fact, I was hesitant to put _any_ scoring at all into this function and was simply going to give each result a score of 900. I don't know what came over me. :) https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:135: AutocompleteMatch match(this, relevance, false, On 2012/10/03 20:40:14, Mark P wrote: > comment on the meaning of false That's documented in the header comments for the AutocompleteMatch constructor. The false means that the item to which the AutocompleteMatch refers cannot be deleted. We don't want to allow the user to accidentally delete bookmarks from the suggestions list. https://codereview.chromium.org/10913262/diff/21001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:186: for (Snippet::MatchPositions::const_iterator b_iter = ++a_iter; On 2012/10/03 20:40:14, Mark P wrote: > I think you can right this loop with only one iterator but if you want to leave > it as is that's okay with me. OK, left as-is. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/autocomplete_classifier.cc (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/autocomplete_classifier.cc:28: const int AutocompleteClassifier::kInstantExtendedOmniboxProviders = On 2012/10/04 20:40:31, Peter Kasting wrote: > Talk to dcblack about whether you should be in this list. email sent to dcblack who approves. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/autocomplete_controller.cc (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/autocomplete_controller.cc:58: // NOTE: Default is equivalent to OMNIBOX_OTHER and must remain 64. On 2012/10/04 20:40:31, Peter Kasting wrote: > OMNIBOX_OTHER? What is that? Please see email sent separately. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/autocomplete_controller.cc:144: // Disable the BookmarkProvider if there the switch says so. On 2012/10/04 20:40:31, Peter Kasting wrote: > Nit: Comment adds nothing Done. (I was just following the pattern used for ZeroSuggest.) https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/autocomplete_match.h (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/autocomplete_match.h:93: BOOKMARK_TITLE, // A bookmark whose title contains the input. On 2012/10/04 20:40:31, Peter Kasting wrote: > Nit: I'd list this just below HISTORY_KEYWORD If it's okay with you, I think I'll leave it here so that it matches the positioning of BOOKMARK_TITLE found in omnibox_event.proto/ https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/autocomplete_provider.h (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/autocomplete_provider.h:39: // UNKNOWN input type: On 2012/10/04 20:40:31, Peter Kasting wrote: > These tables don't even list the HQP. Maybe we should just scrap them > altogether. I'm not sure how much value they add anymore. Yeah, I know. As the comment above indicates, neither the HQP nor the SP are documented and, frankly, it's getting a lot tougher to adequately quantify the possibilities. On the other hand, we ought to have some guidance here to assist in the proper scoring of autocomplete results. I'll consult with mpearson and propose a replacement. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/autocomplete_provider.h:192: TYPE_BOOKMARK = 1 << 10, On 2012/10/04 20:40:31, Peter Kasting wrote: > Nit: Can we keep these in alphabetical order? Sure. Done. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.cc (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:36: matches_.clear(); On 2012/10/04 20:40:31, Peter Kasting wrote: > Seems like if |minimal_changes| is true we can just return immediately before > clearing this, no? Ah! Excellent point. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:38: // Only bother with UNKNOWN and QUERY. On 2012/10/04 20:40:31, Peter Kasting wrote: > What about REQUESTED_URL? That's similar to UNKNOWN... Done. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:45: input.matches_requested() == AutocompleteInput::BEST_MATCH); On 2012/10/04 20:40:31, Peter Kasting wrote: > We should probably bail if BEST_MATCH and input.prevent_inline_autocomplete(), > the way the HQP does. Done. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:55: if (!bookmark_model_) { On 2012/10/04 20:40:31, Peter Kasting wrote: > Does setting this lazily here speed up startup? If so, document that. If not, > just do this in the constructor. Done. (Premature optimization!) https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:67: bookmark_model_->GetBookmarksWithTitlesMatching(input.text(), On 2012/10/04 20:40:31, Peter Kasting wrote: > Does this match only exact prefixes, or is it smarter than that? The comments > on its declaration don't say, and this file doesn't really have any comments > about how the overall matching/sorting/scoring pipeline works. Yes, you are correct. The BookmarkIndex::GetBookmarksWithTitlesMatching behavior is not very well documented and has been a learning process. I will elaborate here as I rework the scoring with mpearson's input. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:113: AutocompleteMatch BookmarkProvider::TitleMatchToACMatch( On 2012/10/04 20:40:31, Peter Kasting wrote: > Nit: This function could really use a description of how, in general, it matches > and scores. Forthcoming as I rework the scoring. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:165: size_t text_length) { On 2012/10/04 20:40:31, Peter Kasting wrote: > We do similar things in contact_provider_chromeos.cc:WordIsNamePrefix() and to a > lesser degree ShortcutsProvider::ClassifyAllMatchesInString(). Can we share > code somehow? > > Especially the "collapse overlapping positions and convert to classifications" > part below can all be handled by repeated calls to > AutocompleteMatch::MergeClassifications() as the ContactsProvider does. Done. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:167: if (positions.size() == 0) { On 2012/10/04 20:40:31, Peter Kasting wrote: > Nit: empty() Gah! https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.h (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.h:20: // provides autocomplete suggestions based on matches between the search terms On 2012/10/04 20:40:31, Peter Kasting wrote: > Nit: "matches between the search terms entered by the user into the omnibox and" > can be removed. Done. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.h:22: // titles and URLs of the bookmarks are not matched. On 2012/10/04 20:40:31, Peter Kasting wrote: > Nit: Mention why. Done, hopefully it's clear enough of an explanation. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider_unittest.cc (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:76: TestBookmarkPosition(size_t from, size_t to) On 2012/10/04 20:40:31, Peter Kasting wrote: > Nit: Use the same arg names as your member names ("begin(begin)" will work as an > initializer) Done. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:189: // WHAT??? On 2012/10/04 20:40:31, Peter Kasting wrote: > ?? Leftovers removed. https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:191: ASCIIToUTF16(""), false, false, false, On 2012/10/04 20:40:31, Peter Kasting wrote: > Nit: string16() Done.
https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... File chrome/browser/autocomplete/autocomplete_match.h (right): https://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomple... chrome/browser/autocomplete/autocomplete_match.h:93: BOOKMARK_TITLE, // A bookmark whose title contains the input. On 2012/10/05 22:10:04, mrossetti wrote: > On 2012/10/04 20:40:31, Peter Kasting wrote: > > Nit: I'd list this just below HISTORY_KEYWORD > > If it's okay with you, I think I'll leave it here so that it matches the > positioning of BOOKMARK_TITLE found in omnibox_event.proto/ It's nice to have these grouped as (URLs, Searches), but I guess it's not a huge deal. Although we could list the proto enum values "out of order" since they're all explicitly assigned...
I'm not a chrome/* owner, so I can't do much with a review of of the gypi files.
.gyp* LGTM
http://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_classifier.cc (right): http://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_classifier.cc:28: const int AutocompleteClassifier::kInstantExtendedOmniboxProviders = On 2012/10/05 22:10:04, mrossetti wrote: > On 2012/10/04 20:40:31, Peter Kasting wrote: > > Talk to dcblack about whether you should be in this list. > > email sent to dcblack who approves. If you said dcblack approves of being in this list, then please remember to add yourself to the list. http://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_provider.h (right): http://codereview.chromium.org/10913262/diff/29002/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_provider.h:39: // UNKNOWN input type: On 2012/10/05 22:10:04, mrossetti wrote: > On 2012/10/04 20:40:31, Peter Kasting wrote: > > These tables don't even list the HQP. Maybe we should just scrap them > > altogether. I'm not sure how much value they add anymore. > > Yeah, I know. As the comment above indicates, neither the HQP nor the SP are > documented and, frankly, it's getting a lot tougher to adequately quantify the > possibilities. On the other hand, we ought to have some guidance here to assist > in the proper scoring of autocomplete results. I'll consult with mpearson and > propose a replacement. Something is better than nothing. I'd strongly prefer we keep them. I use them for quick reference sometime rather than hunting through code. http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_provider.h (right): http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_provider.h:182: TYPE_BOOKMARK = 1 << 10, On 2012/10/05 22:10:04, mrossetti wrote: > On 2012/10/04 20:40:31, Peter Kasting wrote: > > Nit: Can we keep these in alphabetical order? > > Sure. Done. Side comment: alphabetic order without renumbering the values will make it easier for people to screw something up when adding a new provider later. Consider renumbering (if it's allowed). http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... File chrome/browser/autocomplete/bookmark_provider.cc (right): http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:13: #include "base/utf_string_conversions.h" What do you need this for? http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:75: // separately. In the following I use the 'term' to refer to a search word. Perhaps replace this bullet with - The search text is broken up into search terms. http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:85: // exact order to match. with no intervening words (I assume) http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:86: // - There are no wildcards. Consider dropping this bullet. http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:88: // in order to gain a good understanding of how title searches are performed gain a good understanding of -> understand or perhaps learn http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:152: // Compose a match that has the URL of the bookmar and the bookmark's title, bookmark http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:155: AutocompleteMatch match(this, 0, false, AutocompleteMatch::BOOKMARK_TITLE); On 2012/10/05 22:10:04, mrossetti wrote: > On 2012/10/03 20:40:14, Mark P wrote: > > comment on the meaning of false > > That's documented in the header comments for the AutocompleteMatch constructor. > > The false means that the item to which the AutocompleteMatch refers cannot be > deleted. We don't want to allow the user to accidentally delete bookmarks from > the suggestions list. Perhaps I wasn't clear. Please add a comment here that you pass false to make the item not deletable because you don't want to let user delete a bookmark by deleting the autocomplete entry (and you don't want to add a class of bookmarks that cannot be searched). http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:157: if (title.empty()) How can this happen? http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:178: // 2) how many times the bookmark's URL is referenced by other bookmarks. 3) http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:184: title_match.match_positions.end(), ScoringFunctor(title.size())); Can you please comment here or elsewhere what will match_position look like for an omnibox input foobar foobar for input foobar foobar foo? I'm trying to figure out what score this will get (e.g., could it get something >1.0, which it sounds like you might be worried about--see lines 189-190.) A good comment will also help explain in general how match_position looks when the omnibox input is multiple terms. This will help a reader estimate the value of the factor you calculate. http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:203: kURLCountBoost[std::min(arraysize(kURLCountBoost), nodes.size()) - 1]; Do you want a max(, 0) in there as a precaution, or do you trust the bookmark model will always still exist and be able to re-find your URL (and hence the count will always be at least 1)? http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... File chrome/browser/autocomplete/bookmark_provider.h (right): http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.h:29: // asynch completion performed. |minimal_changes| is not ignored. Please comment on how it's used.
http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_provider.h (right): http://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_provider.h:182: TYPE_BOOKMARK = 1 << 10, On 2012/10/08 22:51:59, Mark P wrote: > On 2012/10/05 22:10:04, mrossetti wrote: > > On 2012/10/04 20:40:31, Peter Kasting wrote: > > > Nit: Can we keep these in alphabetical order? > > > > Sure. Done. > > Side comment: alphabetic order without renumbering the values will make it > easier for people to screw something up when adding a new provider later. > Consider renumbering (if it's allowed). Agreed, my intent was that these also be renumbered.
All issues addressed. Scott: Please note that I have made changes to bookmark_index.h/.cc and query_parser.h/.cc that will need your review. The change to QueryNodeWord::HasMatchIn(...) is the most significant. I've looked over clients of that function and this change looks harmless but I need your reassurance. https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... File chrome/browser/autocomplete/autocomplete_provider.h (right): https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... chrome/browser/autocomplete/autocomplete_provider.h:182: TYPE_BOOKMARK = 1 << 10, It looks like renumber would be harmless. Done. On 2012/10/08 22:51:59, Mark P wrote: > On 2012/10/05 22:10:04, mrossetti wrote: > > On 2012/10/04 20:40:31, Peter Kasting wrote: > > > Nit: Can we keep these in alphabetical order? > > > > Sure. Done. > > Side comment: alphabetic order without renumbering the values will make it > easier for people to screw something up when adding a new provider later. > Consider renumbering (if it's allowed). https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.cc (right): https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:13: #include "base/utf_string_conversions.h" Don't. Removed. On 2012/10/08 22:51:59, Mark P wrote: > What do you need this for? https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:75: // separately. In the following I use the 'term' to refer to a search word. On 2012/10/08 22:51:59, Mark P wrote: > Perhaps replace this bullet with > - The search text is broken up into search terms. Done. https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:85: // exact order to match. Yes. On 2012/10/08 22:51:59, Mark P wrote: > with no intervening words > (I assume) https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:86: // - There are no wildcards. On 2012/10/08 22:51:59, Mark P wrote: > Consider dropping this bullet. Done. https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:88: // in order to gain a good understanding of how title searches are performed Reworded. On 2012/10/08 22:51:59, Mark P wrote: > gain a good understanding of > -> > understand > or perhaps > learn https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:152: // Compose a match that has the URL of the bookmar and the bookmark's title, Oops. On 2012/10/08 22:51:59, Mark P wrote: > bookmark https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:155: AutocompleteMatch match(this, 0, false, AutocompleteMatch::BOOKMARK_TITLE); I've added some comments. Please note that deletable really means that the source of the suggestion, the bookmark in this case, can or cannot be deleted. When a history item suggestion is deleted that underlying history record is deleted from the HUP and/or HQP. On 2012/10/08 22:51:59, Mark P wrote: > On 2012/10/05 22:10:04, mrossetti wrote: > > On 2012/10/03 20:40:14, Mark P wrote: > > > comment on the meaning of false > > > > That's documented in the header comments for the AutocompleteMatch > constructor. > > > > The false means that the item to which the AutocompleteMatch refers cannot be > > deleted. We don't want to allow the user to accidentally delete bookmarks from > > the suggestions list. > > Perhaps I wasn't clear. Please add a comment here that you pass false to make > the item not deletable because you don't want to let user delete a bookmark by > deleting the autocomplete entry (and you don't want to add a class of bookmarks > that cannot be searched). https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:157: if (title.empty()) It probably cannot happen. I'll change it to a DCHECK. On 2012/10/08 22:51:59, Mark P wrote: > How can this happen? https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:178: // 2) how many times the bookmark's URL is referenced by other bookmarks. I'm glad one of us can count! On 2012/10/08 22:51:59, Mark P wrote: > 3) https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:184: title_match.match_positions.end(), ScoringFunctor(title.size())); On 2012/10/08 22:51:59, Mark P wrote: > Can you please comment here or elsewhere what will match_position look like for an omnibox input > foobar foobar > for input foobar foobar foo? I'm trying to figure out what score this will get > (e.g., could it get something >1.0, which it sounds like you might be worried > about--see lines 189-190.) I don't think I'll add any consideration for repeated search terms. The case of a search query of "foobar foobar" will be equivalent to a search query of "foobar" and both will have the same score when thrown against "foobar foobar foo". This means that a query of "foobar foobar" will match against a bookmark with a single instance of the word "foobar". If this causes the user any consternation then we could certainly add logic that would require two separate instances of 'foobar' to occur in the bookmark name but I'm not anxious to add that logic unless necessary. (Note that a quoted query of "\"foobar foobar\"" _will_ require both words.) > A good comment will also help explain in general how match_position looks when > the omnibox input is multiple terms. This will help a reader estimate the value > of the factor you calculate. I'll add some comments, but comments, even good ones, do not substitute for a serious reading of the code. A relevant comment on ACMatchClassification being compressed and combined was added just above the call to GetBookmarksWithTitlesMatching in BookmarkProvider::DoAutocomplete. https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:203: kURLCountBoost[std::min(arraysize(kURLCountBoost), nodes.size()) - 1]; I'm sure we can trust that there will always be at least 1 bookmark with the given URL, but I'll add a DCHECK just to be safe. On 2012/10/08 22:51:59, Mark P wrote: > Do you want a max(, 0) in there as a precaution, or do you trust the bookmark > model will always still exist and be able to re-find your URL (and hence the > count will always be at least 1)? https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.h (right): https://codereview.chromium.org/10913262/diff/38001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.h:29: // asynch completion performed. Great catch! Thanks! On 2012/10/08 22:51:59, Mark P wrote: > |minimal_changes| is not ignored. Please comment on how it's used.
LGTM
https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.cc (right): https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:47: (input.matches_requested() == AutocompleteInput::BEST_MATCH && Nit: Be consistent about whether boolean subexpressions always get their own set of parens https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:48: input.prevent_inline_autocomplete())) Nit: Might want a comment about why we're bailing in this last case, a la the similar HQP comment. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:143: (title_length_ - match.first) / title_length_; Nit: Indent 4 (only function args get indented even) https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:159: // is never changed from 0 that the match will be discarded. false is passed Nit: Kill passive voice: "The AutocompleteMatch we construct is non-deletable because the only way to support this would be to delete the underlying bookmark, which the user probably didn't intend." https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.h (right): https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.h:21: // titles, URLs, and typed and visit counts of the bookmarks are not taken into The old comment said "Page titles and URLs are not matched", which makes sense given this larger comment. But I don't get the bit about typed and visit counts you added. Why wouldn't we still use typed/visit counts as an input to our scoring algorithm? After all, the real goal of this provider is to match bookmarks whose titles differ from their URL/page title, and thus would be missed by the HQP. Right? https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider_unittest.cc (right): https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:80: // Log the positions for unit test diagnostics. Nit: Is this a comment about the class as a whole? In which case it should be above the class. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:89: LOG(WARNING) << title << ":"; Using LOG(WARNING) directly doesn't seem right. Can we maybe return a string instead, so that the caller can do something more like: EXPECT_TRUE(...) << "EXPECTED: " << BookmarkPositions(...) << "ACTUAL: " << BookmarkPositions(...); https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:134: const size_t expectations[9][2], int* score) { Nit: One arg per line https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:209: // This next set is for scoring validation of single term matches. I would really rather not expect precise scores from the scoring algorithm. This will break any time anyone makes a change to the algorithm for good or ill, and people will just blindly paste the new values in. Instead can we write the test so it expects more general conditions, things like "longer matches score higher than shorter ones" and the like? These will require less maintenance but be more likely to actually catch regressions.
I think we're almost there. :) --mark https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.cc (right): https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:86: // Note that the AutocompleteMatch::ACMatchClassification for each match I don't see how this paragraph is relevant here. Or are you talking about match_positions, not ACMatchClassifications? https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:185: // ACMatchClassifications) calculate a 'factor', sum up those factors, then I'm confused. I thought these matches were given by MatchPositions. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:189: // The relevance calculation is based on: Please divide this into section things: things that go into the factor (#1, #2) and other things under consideration (#3). https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:191: // (term length / title length), term length -> total match length https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:202: // partial factor of 8/14 = 0.571) than 'abcde' (5/14 = 0.357). Please revise the text to make it clearer that this is an example of the factor from clause 1 only. I didn't make this connection at first. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:209: // See ScoringFunctor, above, for the details on how the factor is calculated. Only the first two clauses above go into the factor, not the third. Once you revise in response to my comment at line 189, this sentence here should be clearer. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:212: title_match.match_positions.end(), ScoringFunctor(title.size())); On 2012/10/10 02:26:42, mrossetti wrote: > On 2012/10/08 22:51:59, Mark P wrote: > > Can you please comment here or elsewhere what will match_position look like > for an omnibox input > > foobar foobar > > for input foobar foobar foo? I'm trying to figure out what score this will > get > > (e.g., could it get something >1.0, which it sounds like you might be worried > > about--see lines 189-190.) > > I don't think I'll add any consideration for repeated search terms. Actually, let me change my request. Suppose the bookmark title is foobar foobar and the omnibox input is foobar foo. What do match_position look like? Will it looks like foo doesn't match at all because all of its matches are subsumed by the longer foobar matches? > The case of > a search query of "foobar foobar" will be equivalent to a search query of > "foobar" and both will have the same score when thrown against "foobar foobar > foo". This means that a query of "foobar foobar" will match against a bookmark > with a single instance of the word "foobar". If this causes the user any > consternation then we could certainly add logic that would require two separate > instances of 'foobar' to occur in the bookmark name but I'm not anxious to add > that logic unless necessary. (Note that a quoted query of "\"foobar foobar\"" > _will_ require both words.) > > > A good comment will also help explain in general how match_position looks when > > the omnibox input is multiple terms. This will help a reader estimate the > value > > of the factor you calculate. > > I'll add some comments, but comments, even good ones, do not substitute for a > serious reading of the code. > > A relevant comment on ACMatchClassification being compressed and combined was > added just above the call to GetBookmarksWithTitlesMatching in > BookmarkProvider::DoAutocomplete. I don't think that helps much. There's a big different between match position coming back from the bookmark search (this involves term matching) and the autocomplete system's internal match classifications (this involves highlighting only).
Issues hopefully addressed. sky@: I found another issue in the QueryParser so would appreciate your taking a look at the changes I made to the CoalesceMatchesFrom(…) function in query_parser.cc. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.cc (right): https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:47: (input.matches_requested() == AutocompleteInput::BEST_MATCH && On 2012/10/10 18:29:06, Peter Kasting wrote: > Nit: Be consistent about whether boolean subexpressions always get their own set > of parens Done. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:48: input.prevent_inline_autocomplete())) On 2012/10/10 18:29:06, Peter Kasting wrote: > Nit: Might want a comment about why we're bailing in this last case, a la the > similar HQP comment. Done. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:86: // Note that the AutocompleteMatch::ACMatchClassification for each match On 2012/10/10 22:28:57, Mark P wrote: > I don't see how this paragraph is relevant here. > > Or are you talking about match_positions, not ACMatchClassifications? Reworded. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:143: (title_length_ - match.first) / title_length_; On 2012/10/10 18:29:06, Peter Kasting wrote: > Nit: Indent 4 (only function args get indented even) Done. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:159: // is never changed from 0 that the match will be discarded. false is passed On 2012/10/10 18:29:06, Peter Kasting wrote: > Nit: Kill passive voice: "The AutocompleteMatch we construct is non-deletable > because the only way to support this would be to delete the underlying bookmark, > which the user probably didn't intend." Done. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:185: // ACMatchClassifications) calculate a 'factor', sum up those factors, then On 2012/10/10 22:28:57, Mark P wrote: > I'm confused. I thought these matches were given by MatchPositions. Correct. I've edited and re-edited these comments so much that I'm losing track... ;^) https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:189: // The relevance calculation is based on: On 2012/10/10 22:28:57, Mark P wrote: > Please divide this into section things: things that go into the factor (#1, #2) > and other things under consideration (#3). Done. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:191: // (term length / title length), On 2012/10/10 22:28:57, Mark P wrote: > term length -> total match length "term length". I've changed a couple of references to "input text" and "matches" to "term" in this section to help make this clear. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:202: // partial factor of 8/14 = 0.571) than 'abcde' (5/14 = 0.357). On 2012/10/10 22:28:57, Mark P wrote: > Please revise the text to make it clearer that this is an example of the factor > from clause 1 only. I didn't make this connection at first. I moved the examples to be adjacent to the points that each is describing. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:209: // See ScoringFunctor, above, for the details on how the factor is calculated. On 2012/10/10 22:28:57, Mark P wrote: > Only the first two clauses above go into the factor, not the third. Once you > revise in response to my comment at line 189, this sentence here should be > clearer. Done. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:212: title_match.match_positions.end(), ScoringFunctor(title.size())); On 2012/10/10 22:28:57, Mark P wrote: > On 2012/10/10 02:26:42, mrossetti wrote: > > On 2012/10/08 22:51:59, Mark P wrote: > > > Can you please comment here or elsewhere what will match_position look like for an omnibox input > > > foobar foobar > > > for input foobar foobar foo? I'm trying to figure out what score this will get (e.g., could it get something >1.0, which it sounds like you might be worried about--see lines 189-190.) > > > > I don't think I'll add any consideration for repeated search terms. > > Actually, let me change my request. Suppose the bookmark title is foobar foobar and the omnibox input is foobar foo. What do match_position look like? Will it looks like foo doesn't match at all because all of its matches are subsumed by the longer foobar matches? There will be two match positions representing both matches against both 'foobar's and where the intervening space is not matched. The addition term foo is, essentially, irrelevant. To be clear, the BookmarkIndex is doing the matching. It breaks down the query into words (regardless of quoting). It will scan for one of the words from "foobar foo" and then the other. The match positions for each term will then be coalesced. "foo" will match as follows: foobar foobar ^^^ ^^^ and "foobar" will match as follows: foobar foobar ^^^^^^ ^^^^^^ The two matches are then coalesced. And the results are as if there were no 'foo' in the query. > > The case of > > a search query of "foobar foobar" will be equivalent to a search query of > > "foobar" and both will have the same score when thrown against "foobar foobar > > foo". This means that a query of "foobar foobar" will match against a bookmark > > with a single instance of the word "foobar". If this causes the user any > > consternation then we could certainly add logic that would require two > separate > > instances of 'foobar' to occur in the bookmark name but I'm not anxious to add > > that logic unless necessary. (Note that a quoted query of "\"foobar foobar\"" > > _will_ require both words.) > > > > > A good comment will also help explain in general how match_position looks > when > > > the omnibox input is multiple terms. This will help a reader estimate the > > value > > > of the factor you calculate. > > > > I'll add some comments, but comments, even good ones, do not substitute for a > > serious reading of the code. > > > > A relevant comment on ACMatchClassification being compressed and combined was > > added just above the call to GetBookmarksWithTitlesMatching in > > BookmarkProvider::DoAutocomplete. > > I don't think that helps much. There's a big different between match position > coming back from the bookmark search (this involves term matching) and the > autocomplete system's internal match classifications (this involves highlighting > only). Let me know if the new wording is an improvement. As far as match positions from the BookmarkIndex search is concerned, it serves dual-duty. The match position information is used in scoring but also for highlighting. I'm sorry if my mixed and mingled use of that phrase along with ACMatchClassifications and synonyms caused confusion. Hopefully I've cleared those up. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.h (right): https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.h:21: // titles, URLs, and typed and visit counts of the bookmarks are not taken into Exactly right. But I wanted to get this basic implementation into place before complicating it with interactions with the HQP (or, potentially, the History database—not my preferred option). I am sorry I did not make that intention clear initially. I'm also trying to figure out how to best coordinate the results form the BP with those from the HQP and think I'll have a better handle on that once this gets landed. This might be done in the merge phase by the AutocompleteController. Or it might be done by the BP while indexing a bookmark by recognizing that the HQP also knows about the URL. Or it might be done by the HQP during indexing by recognizing that the URL is also referenced by a bookmark and adding the bookmark title to the indexed corpus. There are a number of options and I'm not prepared to choose one at the moment. And finally, I'm trying to avoid tossing a massive change onto your lap for review; as I've been known to do in the past. I'll add a TODO hinting at this. If it's okay with you, I'd like to get this settled as-is and then propose in a future CL how to best coordinate with the history. On 2012/10/10 18:29:06, Peter Kasting wrote: > The old comment said "Page titles and URLs are not matched", which makes sense > given this larger comment. But I don't get the bit about typed and visit counts > you added. Why wouldn't we still use typed/visit counts as an input to our > scoring algorithm? After all, the real goal of this provider is to match > bookmarks whose titles differ from their URL/page title, and thus would be > missed by the HQP. Right? https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider_unittest.cc (right): https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:80: // Log the positions for unit test diagnostics. Removed leftovers. On 2012/10/10 18:29:06, Peter Kasting wrote: > Nit: Is this a comment about the class as a whole? In which case it should be > above the class. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:89: LOG(WARNING) << title << ":"; Great idea! Done. On 2012/10/10 18:29:06, Peter Kasting wrote: > Using LOG(WARNING) directly doesn't seem right. Can we maybe return a string > instead, so that the caller can do something more like: > > EXPECT_TRUE(...) << "EXPECTED: " << BookmarkPositions(...) << "ACTUAL: " << > BookmarkPositions(...); https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:134: const size_t expectations[9][2], int* score) { On 2012/10/10 18:29:06, Peter Kasting wrote: > Nit: One arg per line Done. https://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:209: // This next set is for scoring validation of single term matches. Okay. Done. On 2012/10/10 18:29:06, Peter Kasting wrote: > I would really rather not expect precise scores from the scoring algorithm. > This will break any time anyone makes a change to the algorithm for good or ill, > and people will just blindly paste the new values in. > > Instead can we write the test so it expects more general conditions, things like > "longer matches score higher than shorter ones" and the like? These will > require less maintenance but be more likely to actually catch regressions.
LGTM https://codereview.chromium.org/10913262/diff/65001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.cc (right): https://codereview.chromium.org/10913262/diff/65001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:44: // none of the BookmarkProvider's matches can score high enough to qualify. Nit: This comment is not actually technically accurate -- you mean something like "...when looking for BEST_MATCH and inline autocompletion is disabled..." https://codereview.chromium.org/10913262/diff/65001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:162: // that the AutocompleteMatch we construct is non-deletable because the only Nit: For clarity, I'd remove everything in this sentence before the words "...the AutocompleteMatch we construct...". https://codereview.chromium.org/10913262/diff/65001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider_unittest.cc (right): https://codereview.chromium.org/10913262/diff/65001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:108: if (first) Nit: Or just eliminate |first|: if (i != positions.begin()) position_string += ", "; https://codereview.chromium.org/10913262/diff/65001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:115: position_string += "}"; Nit: Might as well tack on a \n too since callers want it and MatchesAsString16() does something similar https://codereview.chromium.org/10913262/diff/65001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:332: if (j >= query_data[i].match_count) { Nit: Why not something more like: EXPECT_LT(j, query_data[i].match_count) << "Unexpected match" ... EXPECT_LT(j, matches.size()) << "Missing match" ... ... ? If you really want to preserve that we only report the first error we find in this loop you can either switch from EXPECT to ASSERT or add conditionals like "if (j >= query_data[i].match_count) continue;" after the EXPECTs.
https://codereview.chromium.org/10913262/diff/65001/chrome/browser/history/qu... File chrome/browser/history/query_parser.cc (right): https://codereview.chromium.org/10913262/diff/65001/chrome/browser/history/qu... chrome/browser/history/query_parser.cc:39: mp.first = std::min(mp.first, i->first); I get the need for std::max on 40, bug 39 shouldn't be needed, right? The snippets are ordered by 'first'.
Addressed all comments. mpearson, any last words? ;^) https://codereview.chromium.org/10913262/diff/65001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.cc (right): https://codereview.chromium.org/10913262/diff/65001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:44: // none of the BookmarkProvider's matches can score high enough to qualify. On 2012/10/15 22:17:22, Peter Kasting wrote: > Nit: This comment is not actually technically accurate -- you mean something > like "...when looking for BEST_MATCH and inline autocompletion is disabled..." Ah, but of course. https://codereview.chromium.org/10913262/diff/65001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:162: // that the AutocompleteMatch we construct is non-deletable because the only On 2012/10/15 22:17:22, Peter Kasting wrote: > Nit: For clarity, I'd remove everything in this sentence before the words > "...the AutocompleteMatch we construct...". Sounds good. https://codereview.chromium.org/10913262/diff/65001/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider_unittest.cc (right): https://codereview.chromium.org/10913262/diff/65001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:108: if (first) On 2012/10/15 22:17:22, Peter Kasting wrote: > Nit: Or just eliminate |first|: > > if (i != positions.begin()) > position_string += ", "; Done. https://codereview.chromium.org/10913262/diff/65001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:115: position_string += "}"; On 2012/10/15 22:17:22, Peter Kasting wrote: > Nit: Might as well tack on a \n too since callers want it and > MatchesAsString16() does something similar Done. https://codereview.chromium.org/10913262/diff/65001/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider_unittest.cc:332: if (j >= query_data[i].match_count) { On 2012/10/15 22:17:22, Peter Kasting wrote: > Nit: Why not something more like: > > EXPECT_LT(j, query_data[i].match_count) << "Unexpected match" ... > EXPECT_LT(j, matches.size()) << "Missing match" ... > ... > > ? If you really want to preserve that we only report the first error we find in > this loop you can either switch from EXPECT to ASSERT or add conditionals like > "if (j >= query_data[i].match_count) continue;" after the EXPECTs. Okay, I can make that work. There really is only one error condition possible for each iteration so the continue handles things just fine. https://codereview.chromium.org/10913262/diff/65001/chrome/browser/history/qu... File chrome/browser/history/query_parser.cc (right): https://codereview.chromium.org/10913262/diff/65001/chrome/browser/history/qu... chrome/browser/history/query_parser.cc:39: mp.first = std::min(mp.first, i->first); On 2012/10/15 23:32:34, sky wrote: > I get the need for std::max on 40, bug 39 shouldn't be needed, right? The > snippets are ordered by 'first'. Makes sense. Thanks.
lgtm Minor comments. --mark http://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomplet... File chrome/browser/autocomplete/bookmark_provider.cc (right): http://codereview.chromium.org/10913262/diff/51001/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:212: title_match.match_positions.end(), ScoringFunctor(title.size())); On 2012/10/15 19:22:46, mrossetti wrote: > On 2012/10/10 22:28:57, Mark P wrote: > > On 2012/10/10 02:26:42, mrossetti wrote: > > > On 2012/10/08 22:51:59, Mark P wrote: > > > > Can you please comment here or elsewhere what will match_position look > like for an omnibox input > > > > foobar foobar > > > > for input foobar foobar foo? I'm trying to figure out what score this > will get (e.g., could it get something >1.0, which it sounds like you might be > worried about--see lines 189-190.) > > > > > > I don't think I'll add any consideration for repeated search terms. > > > > Actually, let me change my request. Suppose the bookmark title is foobar > foobar and the omnibox input is foobar foo. What do match_position look like? > Will it looks like foo doesn't match at all because all of its matches are > subsumed by the longer foobar matches? > > There will be two match positions representing both matches against both > 'foobar's and where the intervening space is not matched. The addition term foo > is, essentially, irrelevant. > > To be clear, the BookmarkIndex is doing the matching. It breaks down the query > into words (regardless of quoting). It will scan for one of the words from > "foobar foo" and then the other. The match positions for each term will then be > coalesced. > > "foo" will match as follows: > > foobar foobar > ^^^ ^^^ > > and "foobar" will match as follows: > > foobar foobar > ^^^^^^ ^^^^^^ > > The two matches are then coalesced. And the results are as if there were no > 'foo' in the query. > > > > The case of > > > a search query of "foobar foobar" will be equivalent to a search query of > > > "foobar" and both will have the same score when thrown against "foobar > foobar > > > foo". This means that a query of "foobar foobar" will match against a > bookmark > > > with a single instance of the word "foobar". If this causes the user any > > > consternation then we could certainly add logic that would require two > > separate > > > instances of 'foobar' to occur in the bookmark name but I'm not anxious to > add > > > that logic unless necessary. (Note that a quoted query of "\"foobar > foobar\"" > > > _will_ require both words.) > > > > > > > A good comment will also help explain in general how match_position looks > > when > > > > the omnibox input is multiple terms. This will help a reader estimate the > > > value > > > > of the factor you calculate. > > > > > > I'll add some comments, but comments, even good ones, do not substitute for > a > > > serious reading of the code. > > > > > > A relevant comment on ACMatchClassification being compressed and combined > was > > > added just above the call to GetBookmarksWithTitlesMatching in > > > BookmarkProvider::DoAutocomplete. > > > > I don't think that helps much. There's a big different between match position > > coming back from the bookmark search (this involves term matching) and the > > autocomplete system's internal match classifications (this involves > highlighting > > only). > > Let me know if the new wording is an improvement. > > As far as match positions from the BookmarkIndex search is concerned, it serves > dual-duty. The match position information is used in scoring but also for > highlighting. I'm sorry if my mixed and mingled use of that phrase along with > ACMatchClassifications and synonyms caused confusion. Hopefully I've cleared > those up. Thanks for the explanation. http://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomplet... File chrome/browser/autocomplete/bookmark_provider.cc (right): http://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:157: AutocompleteMatch BookmarkProvider::TitleMatchToACMatch( I found this comment that you deleted useful: Note that if the relevance is never changed from 0 that the match will be discarded. Please restore it somewhere. http://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:208: // Once all term factors have been calculated they are summed. The resulting This first sentence isn't true. You multiple the factors, then sum those products over all matches. http://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:209: // sum will never be greater than 1.0. This sum is then multiplied against Why will it never be greater than 1.0? I think it has to do with how duplicates/overlaps are eliminated in GetBookmarksWithTitlesMatching(), but it would help to claim that directly if that's what you're relying in. And if indeed it's true, please DCHECK it. http://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomplet... chrome/browser/autocomplete/bookmark_provider.cc:216: // it can be boosted if the URL referenced by the bookmark is also referenced boosted if -> boosted up to the maximum possible score if If you make this change, drop the last sentence of the paragraph.
I see you revised the code in response to most of my comments but forgot to publish your replies. Not a big deal, but I notice it looks like you ignored two of my comments (see below). --mark > http://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomplet... > chrome/browser/autocomplete/bookmark_provider.cc:208: // Once all term factors > have been calculated they are summed. The resulting > This first sentence isn't true. You multiple the factors, then sum those > products over all matches. > > http://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomplet... > chrome/browser/autocomplete/bookmark_provider.cc:209: // sum will never be > greater than 1.0. This sum is then multiplied against > Why will it never be greater than 1.0? > > I think it has to do with how duplicates/overlaps are eliminated in > GetBookmarksWithTitlesMatching(), but it would help to claim that directly if > that's what you're relying in. > > And if indeed it's true, please DCHECK it. >
Final comments. Sorry for the delay. https://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.cc (right): https://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:157: AutocompleteMatch BookmarkProvider::TitleMatchToACMatch( On 2012/10/18 18:37:54, Mark P wrote: > I found this comment that you deleted useful: Note that if the relevance is > never changed from 0 that the match will be discarded. > > Please restore it somewhere. I moved part of the original comment you liked to the header where TitleMatchToACMatch() is declared and another part to where TitleMatchToACMatch() is called in DoAutocomplete(). https://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:208: // Once all term factors have been calculated they are summed. The resulting On 2012/10/18 18:37:54, Mark P wrote: > This first sentence isn't true. You multiple the factors, then sum those > products over all matches. Please take a look at ScoringFunctor::operator() above, specifically line 144-145: 144 scoring_factor_ += term_length / title_length_ * 145 (title_length_ - match.first) / title_length_; the "+=" is doing the summing. A factor is calculated for each _term_ in the query. Those factors are summed. We're running into an inadequate lexicon I believe. It's easy to mix up 'term', 'word', 'match', 'result', etc. https://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:209: // sum will never be greater than 1.0. This sum is then multiplied against On 2012/10/18 18:37:54, Mark P wrote: > Why will it never be greater than 1.0? > > I think it has to do with how duplicates/overlaps are eliminated in > GetBookmarksWithTitlesMatching(), but it would help to claim that directly if > that's what you're relying in. > > And if indeed it's true, please DCHECK it. The math is pure. It can never be greater than 1.0 other than the potential for a very minor fractional rounding error that might make it 1.0+epsilon. Even with that very slight difference the resulting partial score will always be clamped to the maximum possible of 299. https://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:216: // it can be boosted if the URL referenced by the bookmark is also referenced On 2012/10/18 18:37:54, Mark P wrote: > boosted if > -> > boosted up to the maximum possible score if > > If you make this change, drop the last sentence of the paragraph. Done.
https://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomple... File chrome/browser/autocomplete/bookmark_provider.cc (right): https://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:208: // Once all term factors have been calculated they are summed. The resulting On 2012/10/19 17:33:07, mrossetti wrote: > On 2012/10/18 18:37:54, Mark P wrote: > > This first sentence isn't true. You multiple the factors, then sum those > > products over all matches. > > Please take a look at ScoringFunctor::operator() above, specifically line > 144-145: > > 144 scoring_factor_ += term_length / title_length_ * > 145 (title_length_ - match.first) / title_length_; > > the "+=" is doing the summing. > > A factor is calculated for each _term_ in the query. Those factors are summed. > > We're running into an inadequate lexicon I believe. It's easy to mix up 'term', > 'word', 'match', 'result', etc. Ah, I see why I thought your comment was misleading. I read the above list (1 & 2) as a list of factors, then when I saw this sentence "Once all term factors have been calculated they are summed." I said "oh no, they're multiplied first, then summed." Perhaps change line 188 from The factor for each term is calculated based on: to The factor for each term is the product of: https://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomple... chrome/browser/autocomplete/bookmark_provider.cc:209: // sum will never be greater than 1.0. This sum is then multiplied against On 2012/10/19 17:33:07, mrossetti wrote: > On 2012/10/18 18:37:54, Mark P wrote: > > Why will it never be greater than 1.0? > > > > I think it has to do with how duplicates/overlaps are eliminated in > > GetBookmarksWithTitlesMatching(), but it would help to claim that directly if > > that's what you're relying in. > > > > And if indeed it's true, please DCHECK it. > > The math is pure. It can never be greater than 1.0 other than the potential for > a very minor fractional rounding error that might make it 1.0+epsilon. Even with > that very slight difference the resulting partial score will always be clamped > to the maximum possible of 299. Consider the title "food" and omnibox input "foo ood". If term matches ends up with a three character match foo and a three character match ood, then your factors will be 3/4 * 1 3/4 * 3/4 The sum of these is 1.3125 > 1.0. The only way the match works out ("is pure") is the bookmarks search never reports overlapping matching. If this is true, you should assert it in comments and claim that the math works out precisely for this reason. Regardless, I think a DCHECK is appropriate. (Even if it's true, the DCHECK may help prevent regressions in how the bookmark search works.)
On 2012/10/19 17:50:43, Mark P wrote: > Ah, I see why I thought your comment was misleading. I read the above list (1 & > 2) as a list of factors, then when I saw this sentence "Once all term factors > have been calculated they are summed." I said "oh no, they're multiplied first, > then summed." > > Perhaps change line 188 from > The factor for each term is calculated based on: > to > The factor for each term is the product of: Fair enough. I'll update the comment next time I'm in the code. > https://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomple... > chrome/browser/autocomplete/bookmark_provider.cc:209: // sum will never be > greater than 1.0. This sum is then multiplied against > On 2012/10/19 17:33:07, mrossetti wrote: > > On 2012/10/18 18:37:54, Mark P wrote: > > > Why will it never be greater than 1.0? > > > > > > I think it has to do with how duplicates/overlaps are eliminated in > > > GetBookmarksWithTitlesMatching(), but it would help to claim that directly > if > > > that's what you're relying in. > > > > > > And if indeed it's true, please DCHECK it. > > > > The math is pure. It can never be greater than 1.0 other than the potential > for > > a very minor fractional rounding error that might make it 1.0+epsilon. Even > with > > that very slight difference the resulting partial score will always be clamped > > to the maximum possible of 299. > > Consider the title "food" and omnibox input "foo ood". If term matches ends up > with a three character match foo and a three character match ood, then your > factors will be > 3/4 * 1 > 3/4 * 3/4 > The sum of these is 1.3125 > 1.0. I fear I'm being pedantic, but this particular combination will not occur. Given omnibox input of "foo ood" and a bookmark title of "food" the match will fail because only prefix matches are valid with the BookmarkIndex; the "ood" will not match anything since it's not a prefix. This is tested in the BookmarkProviderTest.Positions unit test (see "foo bar" and "a d"). I will point out, however, that the QueryParser coalescing code will properly handle this case should we ever decide to allow non-prefix matches against bookmark titles. Nevertheless, there _are_ title/query combinations that _will_ demonstrate your point. This case is also tested in the unit tests. Two queries of particular interest are "foobar foo" and "foo foobar", both of which have overlapping matches against the title "foobar foobar". In the case of query "foo foobar" the initial match positions set will be {{0,3}, {7,10}, {0,6}, {7,13}}. These are coalesced into the match positions set {{0,6}, {7,13}}. > The only way the match works out ("is pure") is the bookmarks search never > reports overlapping matching. If this is true, you should assert it in comments > and claim that the math works out precisely for this reason. > > Regardless, I think a DCHECK is appropriate. (Even if it's true, the DCHECK may > help prevent regressions in how the bookmark search works.) I believe that a DCHECK should usually be reserved for potentially catastrophic error conditions and where unit tests cannot normally be counted on to catch a change in environment. In this case, should the BookmarkIndex search or QueryParser coalescing be changed such that the match position coalescing no longer works as this scoring algorithm expects, the BookmarkProvider unit tests will fail, quickly pointing out a problem.
> > > > https://codereview.chromium.**org/10913262/diff/78003/** > chrome/browser/autocomplete/**bookmark_provider.cc#**newcode209<https://codereview.chromium.org/10913262/diff/78003/chrome/browser/autocomplete/bookmark_provider.cc#newcode209> > >> chrome/browser/autocomplete/**bookmark_provider.cc:209: // sum will >> never be >> greater than 1.0. This sum is then multiplied against >> On 2012/10/19 17:33:07, mrossetti wrote: >> > On 2012/10/18 18:37:54, Mark P wrote: >> > > Why will it never be greater than 1.0? >> > > >> > > I think it has to do with how duplicates/overlaps are eliminated in >> > > GetBookmarksWithTitlesMatching**(), but it would help to claim that >> directly >> if >> > > that's what you're relying in. >> > > >> > > And if indeed it's true, please DCHECK it. >> > >> > The math is pure. It can never be greater than 1.0 other than the >> potential >> for >> > a very minor fractional rounding error that might make it 1.0+epsilon. >> Even >> with >> > that very slight difference the resulting partial score will always be >> > clamped > >> > to the maximum possible of 299. >> > > Consider the title "food" and omnibox input "foo ood". If term matches >> ends >> > up > >> with a three character match foo and a three character match ood, then >> your >> factors will be >> 3/4 * 1 >> 3/4 * 3/4 >> The sum of these is 1.3125 > 1.0. >> > > I fear I'm being pedantic, but this particular combination will not occur. > Given > omnibox input of "foo ood" and a bookmark title of "food" the match will > fail > because only prefix matches are valid with the BookmarkIndex; the "ood" > will not > match anything since it's not a prefix. This is tested in the > BookmarkProviderTest.Positions unit test (see "foo bar" and "a d"). I will > point > out, however, that the QueryParser coalescing code will properly handle > this > case should we ever decide to allow non-prefix matches against bookmark > titles. > > Nevertheless, there _are_ title/query combinations that _will_ demonstrate > your > point. This case is also tested in the unit tests. Two queries of > particular > interest are "foobar foo" and "foo foobar", both of which have overlapping > matches against the title "foobar foobar". In the case of query "foo > foobar" the > initial match positions set will be {{0,3}, {7,10}, {0,6}, {7,13}}. These > are > coalesced into the match positions set {{0,6}, {7,13}}. Okay. Next time you're in the code, I'd love if you can something after the sentence "the resulting sum will never be greater than 1." Perhaps something like: ... because of the way the bookmark model matches and removes overlaps. (In particular, the bookmarks models only matches terms to the beginning of words and it removes all overlapping matches, keeping only the longest. Together these mean that each character is included in at most one match. This property ensures the sum of factors is at most 1.) --mark > > > The only way the match works out ("is pure") is the bookmarks search never >> reports overlapping matching. If this is true, you should assert it in >> > comments > >> and claim that the math works out precisely for this reason. >> > > Regardless, I think a DCHECK is appropriate. (Even if it's true, the >> DCHECK >> > may > >> help prevent regressions in how the bookmark search works.) >> > > I believe that a DCHECK should usually be reserved for potentially > catastrophic > error conditions and where unit tests cannot normally be counted on to > catch a > change in environment. In this case, should the BookmarkIndex search or > QueryParser coalescing be changed such that the match position coalescing > no > longer works as this scoring algorithm expects, the BookmarkProvider unit > tests > will fail, quickly pointing out a problem. > > https://codereview.chromium.**org/10913262/<https://codereview.chromium.org/1... > |