|
|
DescriptionCleaning up before switching to flat sets in HQP.
BUG=693225
Review-Url: https://codereview.chromium.org/2690303012
Cr-Commit-Position: refs/heads/master@{#453269}
Committed: https://chromium.googlesource.com/chromium/src/+/950ccead606c70df01cb13625fdb2da24729345d
Patch Set 1 : Rough sketch. #
Total comments: 37
Patch Set 2 : Review, round 1. #
Total comments: 21
Patch Set 3 : Removing sorting, utilitiy for sets intersection. #
Total comments: 32
Patch Set 4 : Review, round 2. #Patch Set 5 : Review, round 3. #
Total comments: 24
Patch Set 6 : Review, round 4. #
Total comments: 8
Patch Set 7 : Compilation issues with old standard library. #Patch Set 8 : Dropping noexcept from move operations. #
Messages
Total messages: 55 (14 generated)
Description was changed from ========== Switching to use flat sets in HQP. BUG=693225 ========== to ========== Switching to flat sets in HQP. BUG=693225 ==========
dyaroshev@yandex-team.ru changed reviewers: + mpearson@google.com, pkasting@chromium.org
@pkasting, @mpearson. Hi. So, there are more changes in this CL than just switching to flat_sets. I can revert them back but I suggest we clean up url_index_private_data.cc even if just in this modest way it has some very messy code. PS: First patch is just a sketch, there are at the very least one bug, I'll find it tomorrow. Since communication is slow, I wanted to get some initial feedback. PPS: For some reason the corresponding issue keeps deleting my comments, I can't post a link to the CL there.
I really like this cleanup. It makes the code simpler and better. Thank you for doing it. It seems like about 99% of the change is the cleanup. I'd extract out the flat_set bits and land those as a dependent patch after landing the cleanup. That way if either patch introduces a change (functional or performance), it's easier to revert, fix, and reland without worrying about the other bit. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... File components/omnibox/browser/in_memory_url_index_types.cc (right): https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/in_memory_url_index_types.cc:101: std::make_move_iterator(words.end())}; This dropped the lowercasing and character-clamping of the old code. That seems like a functional issue, in particular the removal of ToLower(). https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... File components/omnibox/browser/in_memory_url_index_types.h (right): https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/in_memory_url_index_types.h:148: HistoryInfoMapValue(); Since we're just trying to declare all these as "= default", should we just eliminate all the constructor and destructor declarations entirely so the compiler can do this automatically? (2 places) https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/in_memory_url_index_types.h:150: HistoryInfoMapValue& operator=(const HistoryInfoMapValue& other); Nit: Google style guide suggests listing constructors before assignment operators (i.e. I'd swap this with the line below) (2 places) https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:213: history_id_set.end()}; Nit: Prefer the direct constructor: HistoryIDVector history_ids(history_id_set.begin(), history_id_set.end()); https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:218: std::nth_element(history_ids.begin(), I've never seen this algorithm before. Good use of it. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:514: if (term_history_set.empty()) { Nit: No {} https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:515: return {}; Nit: I'd return HistoryIDSet(); (3 similar places) https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:603: word_id_set.end()); Nit: Slightly shorter, reads a bit less awkwardly to me: const auto filter = [this, &term](WordID id) { return word_list_[id].find(term) == base::string16::npos; } const auto end = word_id_set.end(); word_id_set.erase(std::remove_if(word_id_set.begin(), end, filter), end); https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:611: auto history_id_set = [&]() -> HistoryIDSet { Why do this in a lambda? Why not just "inline" the body of the lambda here? https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:612: HistoryIDVector buffer; Nit: Deserves a comment about why it's important to append to the vector before moving into the set all at once. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:613: Nit: No blank line https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:617: continue; Nit: I probably wouldn't change the old form here to the new -- they're about equal in readability, though personally I tend to prefer avoiding the continue in this case. But mostly it's just about not churning the code needlessly. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:640: if (char_iter == char_word_map_.end()) Nit: Preserve the old comments https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:648: base::STLSetIntersection<WordIDSet>(word_id_set, char_word_id_set); Does dropping the assignment the old code used in the begin() case actually work? That looks like a functional change (it looks like it'd result in never having any results). https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:846: for (base::char16 uni_char : Char16SetFromString16(term)) { Nit: No {} https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:847: char_word_map_[uni_char].insert(word_id); Nice simplification! https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:882: Nit: No blank line https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:883: if (word_id_history_map_iter == word_id_history_map_.end()) Can this conditional succeed? The old code doesn't bother checking for it, but maybe that's because that code will silently no-op in this case? https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:1042: for (const auto& entrie : word_starts_map_) { Nit: Did you mean "entry"? (many places) https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:1171: GURL url(entrie.url()); Nit: Prefer = to () here https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:1176: if (entrie.has_title()) { Nit: No {}
Description was changed from ========== Switching to flat sets in HQP. BUG=693225 ========== to ========== Cleaning up before switching to flat sets in HQP. BUG=693225 ==========
https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... File components/omnibox/browser/in_memory_url_index_types.cc (right): https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/in_memory_url_index_types.cc:101: std::make_move_iterator(words.end())}; On 2017/02/17 01:33:45, Peter Kasting wrote: > This dropped the lowercasing and character-clamping of the old code. That seems > like a functional issue, in particular the removal of ToLower(). Done. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... File components/omnibox/browser/in_memory_url_index_types.h (right): https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/in_memory_url_index_types.h:148: HistoryInfoMapValue(); On 2017/02/17 01:33:45, Peter Kasting wrote: > Since we're just trying to declare all these as "= default", should we just > eliminate all the constructor and destructor declarations entirely so the > compiler can do this automatically? (2 places) Checkers forbidding inlining don't allow me to do that. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/in_memory_url_index_types.h:150: HistoryInfoMapValue& operator=(const HistoryInfoMapValue& other); On 2017/02/17 01:33:46, Peter Kasting wrote: > Nit: Google style guide suggests listing constructors before assignment > operators (i.e. I'd swap this with the line below) (2 places) Done. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:213: history_id_set.end()}; On 2017/02/17 01:33:46, Peter Kasting wrote: > Nit: Prefer the direct constructor: > > HistoryIDVector history_ids(history_id_set.begin(), history_id_set.end()); Done. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:218: std::nth_element(history_ids.begin(), On 2017/02/17 01:33:46, Peter Kasting wrote: > I've never seen this algorithm before. Good use of it. thx) While we are here - I'm not 100% sure, but it seems we don't need elements to be sorted after this? We can probably just use the vector? Sure it will require coping always but the code will be clearer and I think I'll be able to land some conversions to/from vector in base::flat_set later, so we'll fix it. It doesn't affect the performance either way - I've checked. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:514: if (term_history_set.empty()) { On 2017/02/17 01:33:46, Peter Kasting wrote: > Nit: No {} Done. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:515: return {}; On 2017/02/17 01:33:46, Peter Kasting wrote: > Nit: I'd return HistoryIDSet(); (3 similar places) Done. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:640: if (char_iter == char_word_map_.end()) On 2017/02/17 01:33:46, Peter Kasting wrote: > Nit: Preserve the old comments Done. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:648: base::STLSetIntersection<WordIDSet>(word_id_set, char_word_id_set); On 2017/02/17 01:33:46, Peter Kasting wrote: > Does dropping the assignment the old code used in the begin() case actually > work? That looks like a functional change (it looks like it'd result in never > having any results). Yes, my mistake. Done. Reverted almost everything except for swap. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:847: char_word_map_[uni_char].insert(word_id); On 2017/02/17 01:33:46, Peter Kasting wrote: > Nice simplification! I rewrote AddWordToIndex, AddWordHistory and UpdateWordHistory again in the second patch. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:882: On 2017/02/17 01:33:46, Peter Kasting wrote: > Nit: No blank line Done. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:883: if (word_id_history_map_iter == word_id_history_map_.end()) On 2017/02/17 01:33:46, Peter Kasting wrote: > Can this conditional succeed? The old code doesn't bother checking for it, but > maybe that's because that code will silently no-op in this case? Seems like no, we always add word to this map. And even if it is end() it should be the other way around to match old behaviour. Changed to DCHECK. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:1042: for (const auto& entrie : word_starts_map_) { On 2017/02/17 01:33:46, Peter Kasting wrote: > Nit: Did you mean "entry"? (many places) Oops, spelling( Done. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:1171: GURL url(entrie.url()); On 2017/02/17 01:33:46, Peter Kasting wrote: > Nit: Prefer = to () here Removed the temporary altogether. https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:1176: if (entrie.has_title()) { On 2017/02/17 01:33:46, Peter Kasting wrote: > Nit: No {} Done. https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:1057: // allow synced entrys to now appear, expired entries to disappear, etc. Fix with next patch!
@pkasting Not in this patch, but I wondered how would you feel about small algorithmic library in history? Landing something like this in base is near to impossible and making any changes/updates would be super hard but this doesn't mean that we can't use something like this. Few examples: erase_if from library fundamentals 2: http://en.cppreference.com/w/cpp/experimental/vector/erase_if // parallel_set: we have two associative containers with the same key. Right now what we do is: for (const auto& key : first_set) { second_set.find(key); //.. } We can utilise hint from the previous search in very easy way. // set_intersection(first, last) - intersects a range of sets (union is not so easy, but maybe we can do union two). // Projection predicates: less_by, equal_by - smth like that. Than we can instead of writing: std::sort(v.begin(), v.end(), [](const elem_type& lhs, const elem_type& rhs) { return do_smth(lhs) < do_smth(rhs); }); just write: std::sort(v.begin(), v.end(), less_by(do_smt)); // equal_ranges: this: https://cs.chromium.org/chromium/src/components/history/core/browser/delete_d... is less than ideal way to group elements. Consider: std::sort(v.begin(), v.end(), less_by(Func)); equal_ranges(v.begin(), v.end(), equal_by(Func), [] (v_iter first, v_iter last) { }); Where group_by can, for example, utilise std::upper_bound to minimise number of predicate calls. https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (left): https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:512: for (String16Vector::iterator iter = words.begin(); iter != words.end(); set_intersection. https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:644: for (Char16Set::const_iterator c_iter = term_chars.begin(); set_intersection. https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:595: for (WordIDSet::iterator word_set_iter = word_id_set.begin(); erase_if https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:839: for (base::char16 uni_char : Char16SetFromString16(term)) parallel_set. https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:886: for (base::char16 uni_char : Char16SetFromString16(word)) { parallel_set. https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:1103: word_map_[base::UTF8ToUTF16(entry.word())] = entry.word_id(); parallel_set. https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:1147: for (HistoryID history_id : history_ids) parallel_set. https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:1214: for (const auto& entry : history_info_map_) { parallel_set.
On 2017/02/17 20:44:38, dyaroshev wrote: > // > set_intersection(first, last) - intersects a range of sets (union is not so > easy, but maybe we can do union two). too. > Where group_by can equal_ranges.
On 2017/02/17 20:44:38, dyaroshev wrote: > @pkasting > > Not in this patch, but I wondered how would you feel about small algorithmic > library in history? > > Landing something like this in base is near to impossible and making any > changes/updates would be super hard but this doesn't mean that we can't use > something like this. > > Few examples: > > erase_if from library fundamentals 2: > http://en.cppreference.com/w/cpp/experimental/vector/erase_if This one I would put in base/stl_util.h. It's generic, widely useful, and the fact that it's in library fundamentals 2 should make it pretty non-controversial. Could call it STLEraseIf(). > // > parallel_set: we have two associative containers with the same key. > Right now what we do is: > for (const auto& key : first_set) { > second_set.find(key); //.. > } > We can utilise hint from the previous search in very easy way. Can you spell out more about what you imagine your function doing? Something like invoking std::binary_search(previous_match + 1, end())? Or are you constructing a new container type? I'm not clear on the proposal. > // > set_intersection(first, last) - intersects a range of sets (union is not so > easy, but maybe we can do union two). This seems pretty narrowly focused. I'm not even sure whether it would make your life a lot easier, because you'd have to generate the range to intersect first (but maybe with std::generate() or std::transform()?). If it's really useful, I'd add this specifically to url_index_private_data.cc as an anonymous helper unless it's useful outside that. > // > Projection predicates: > less_by, equal_by - smth like that. > > Than we can instead of writing: > std::sort(v.begin(), v.end(), [](const elem_type& lhs, const elem_type& rhs) { > return do_smth(lhs) < do_smth(rhs); > }); > > just write: > std::sort(v.begin(), v.end(), less_by(do_smt)); Honestly this feels hard for me to read and pretty specialized as well. I'm skeptical of this, but again, if we want it I'd suggest making it an anonymous function local to where it's used. > // > equal_ranges: > this: > https://cs.chromium.org/chromium/src/components/history/core/browser/delete_d... > is less than ideal way to group elements. > Consider: > std::sort(v.begin(), v.end(), less_by(Func)); > equal_ranges(v.begin(), v.end(), equal_by(Func), > [] (v_iter first, v_iter last) { > }); > > Where group_by can, for example, utilise std::upper_bound to minimise number of > predicate calls. I don't think I understand the suggestion. What is equal_ranges() doing? It looks like std::equal(). And how does it help the code you linked to? (I spent a few minutes figuring out what that code seemed to be doing, since I haven't seen it before.)
https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/1/components/omnibox/browser/... components/omnibox/browser/url_index_private_data.cc:218: std::nth_element(history_ids.begin(), On 2017/02/17 20:17:22, dyaroshev wrote: > On 2017/02/17 01:33:46, Peter Kasting wrote: > > I've never seen this algorithm before. Good use of it. > > thx) While we are here - I'm not 100% sure, but it seems we don't need elements > to be sorted after this? Correct, we're going to score them and then sort the scored items. > We can probably just use the vector? Sure it will > require coping always but the code will be clearer and I think I'll be able to > land some conversions to/from vector in base::flat_set later, so we'll fix it. I don't know that I grasp what you're proposing, I'd say write it up and I'll look at the implementation.
On 2017/02/17 21:44:10, Peter Kasting wrote: > On 2017/02/17 20:44:38, dyaroshev wrote: > > @pkasting > > > > Not in this patch, but I wondered how would you feel about small algorithmic > > library in history? > > > > Landing something like this in base is near to impossible and making any > > changes/updates would be super hard but this doesn't mean that we can't use > > something like this. > > > > Few examples: > > > > erase_if from library fundamentals 2: > > http://en.cppreference.com/w/cpp/experimental/vector/erase_if > > This one I would put in base/stl_util.h. It's generic, widely useful, and the > fact that it's in library fundamentals 2 should make it pretty > non-controversial. Could call it STLEraseIf(). Commiting in base takes months because we demand extrime qulity for anything there. This is strange and unpleasant because if I just copy paste algorithm many times - no problem, we can lend it in a few days - while in fact: test coverage is poorer, more code involved an we make the same mistakes all over again. I do not think that base should not be perfect - it's a very important library in the bottom of the stack but I would like to write algorithms instead of loops in regular CLs and not be forced to get them to perfection (if allowed at all). > > > // > > parallel_set: we have two associative containers with the same key. > > Right now what we do is: > > for (const auto& key : first_set) { > > second_set.find(key); //.. > > } > > We can utilise hint from the previous search in very easy way. > > Can you spell out more about what you imagine your function doing? Something > like invoking std::binary_search(previous_match + 1, end())? Or are you > constructing a new container type? I'm not clear on the proposal. > Sorry, didn't check beforehand, for some weired reason, std::set/map do not have lookup with a hint but for flat versions it could smth like this: template <typename I1, typename I2, typename Compare typename Action> // Requires InputIterator<I1>, InputIterator<I2>, BinaryPredicate<Compare>, UnaryFunction<Action> Action for_each_in_parallel_sorted(I1 first1, I1 last1, I2 first2, I2 last2, Compare comp, Action action) { I2 last_hit = first2; while(first1 != last1) { last_hit = std::lower_bound(last_hit, last2, comp, *first1); DCHECK(last_hit != last2); action(*last_hit); } } template <typename Cont1, typename Cont2, typename Action> // Requires AsociativeContainer<Cont1>, AsociativeContainer<Cont2> // RandomAccessContainer<Cont2>, UnaryFunction<Action> Action for_each_in_parrallel_set(const Cont1& lookup_by, Cont2& action_on, Action action) { for_each_in_parrallel_sorted(lookup_by.begin(), lookup_by.end(), action_on.begin(), action_on.end(), detail::smart_compare(action_on), // smart compare can handle key to value compare. action); } Since won't work for std::maps in reasonalbe way, take this one of the table. > > // > > set_intersection(first, last) - intersects a range of sets (union is not so > > easy, but maybe we can do union two). > > This seems pretty narrowly focused. I'm not even sure whether it would make > your life a lot easier, because you'd have to generate the range to intersect > first (but maybe with std::generate() or std::transform()?). If it's really > useful, I'd add this specifically to url_index_private_data.cc as an anonymous > helper unless it's useful outside that. > Well, it's duplicated in 2-3 places. We could do it in one. And I've already made a bug in it. We could do it in one header, write couple of tests and reuse it. > > // > > Projection predicates: > > less_by, equal_by - smth like that. > > > > Than we can instead of writing: > > std::sort(v.begin(), v.end(), [](const elem_type& lhs, const elem_type& rhs) > { > > return do_smth(lhs) < do_smth(rhs); > > }); > > > > just write: > > std::sort(v.begin(), v.end(), less_by(do_smt)); > > Honestly this feels hard for me to read and pretty specialized as well. I'm > skeptical of this, but again, if we want it I'd suggest making it an anonymous > function local to where it's used. Is it? smth like this https://cs.chromium.org/chromium/src/tools/gn/xcode_writer.cc?type=cs&q=std::... Doesn't read better to you as: std::sort(targets->begin(), targets->end(), helpers::less_by([](const Target* a) { return a->label().name(); })); Ok. > > > // > > equal_ranges: > > this: > > > https://cs.chromium.org/chromium/src/components/history/core/browser/delete_d... > > is less than ideal way to group elements. > > Consider: > > std::sort(v.begin(), v.end(), less_by(Func)); > > equal_ranges(v.begin(), v.end(), equal_by(Func), > > [] (v_iter first, v_iter last) { > > }); > > > > Where group_by can, for example, utilise std::upper_bound to minimise number > of > > predicate calls. > > I don't think I understand the suggestion. What is equal_ranges() doing? It > looks like std::equal(). And how does it help the code you linked to? (I spent > a few minutes figuring out what that code seemed to be doing, since I haven't > seen it before.) Oh, sorry, I assumed that because it's in history - you are familiar. Well, there are deleting directives, if ordered by time and grouped (there a two different grouping strategies for two types of directives) they can process better. The way they do it for the first one is to collect directives in a separate map. For the second one they do sort and then some type of an adjacent find. We could do some form of grouping as a generic algorithm (either with linear or with binary search). eqaul_ranges would be a version of an std::equal_range that instead of finding one range for one key would call you back on each range of equivalent keys in the sequence. Maybe it should be better done with less either than with equals for consistency with std::equal_range. Smth like: template <typename I, typename Comp, typename RangeAction> // requires RandomAccessIterator<I>, BinaryPredicate<Comp>, BinaryFunction<RangeAction> on (I, I) RangeAction equal_ranges(I first, I last, Comp comp, RangeAction action) { while (first != last) { I equals_end = std::upper_bound(first, last, comp, *first); action(first, equals_end); first = equals_end; } return action; } It is a question whethere std::is_sorted_until is better than std::upper_bound but this could be a good start anyways and then, if we see a performance problem, we could tune the algorithm. Anyways, this is just a thought.
On 2017/02/17 23:17:46, dyaroshev wrote: > On 2017/02/17 21:44:10, Peter Kasting wrote: > > On 2017/02/17 20:44:38, dyaroshev wrote: > > > @pkasting > > > > > > Not in this patch, but I wondered how would you feel about small algorithmic > > > library in history? > > > > > > Landing something like this in base is near to impossible and making any > > > changes/updates would be super hard but this doesn't mean that we can't use > > > something like this. > > > > > > Few examples: > > > > > > erase_if from library fundamentals 2: > > > http://en.cppreference.com/w/cpp/experimental/vector/erase_if > > > > This one I would put in base/stl_util.h. It's generic, widely useful, and the > > fact that it's in library fundamentals 2 should make it pretty > > non-controversial. Could call it STLEraseIf(). > > Commiting in base takes months because we demand extrime qulity for anything > there. Committing an entirely new container took a long time because there wasn't general agreement it was a good idea, the API had to be worked out, and multiple base/ reviewers were on holiday. Committing _this_ to base/ should take a day. base/ is no different than the rest of Chromium in this regard. If it is, something is wrong; but in my experience it isn't. Put it in base/. > > > // > > > set_intersection(first, last) - intersects a range of sets (union is not so > > > easy, but maybe we can do union two). > > > > This seems pretty narrowly focused. I'm not even sure whether it would make > > your life a lot easier, because you'd have to generate the range to intersect > > first (but maybe with std::generate() or std::transform()?). If it's really > > useful, I'd add this specifically to url_index_private_data.cc as an anonymous > > helper unless it's useful outside that. > > > > Well, it's duplicated in 2-3 places. We could do it in one. And I've already > made a bug in it. > We could do it in one header, write couple of tests and reuse it. I'm all for refactoring things, but don't put it in a header unless you're actually aware of and planning to use it in other places. Otherwise, keep it local. I was just saying that once you do the refactoring, you might find it's less useful than you hoped. But maybe still a win. > > > // > > > Projection predicates: > > > less_by, equal_by - smth like that. > > > > > > Than we can instead of writing: > > > std::sort(v.begin(), v.end(), [](const elem_type& lhs, const elem_type& > rhs) > > { > > > return do_smth(lhs) < do_smth(rhs); > > > }); > > > > > > just write: > > > std::sort(v.begin(), v.end(), less_by(do_smt)); > > > > Honestly this feels hard for me to read and pretty specialized as well. I'm > > skeptical of this, but again, if we want it I'd suggest making it an anonymous > > function local to where it's used. > > Is it? smth like this > https://cs.chromium.org/chromium/src/tools/gn/xcode_writer.cc?type=cs&q=std::... > Doesn't read better to you as: > std::sort(targets->begin(), targets->end(), > helpers::less_by([](const Target* a) { return a->label().name(); })); The problem here is that "less_by" doesn't mean much to me grammatically or as idiomatic C++. So I have to figure out what it means. By that point I could just read the original. It's still worthwhile to do if it comes up a lot and we can think of a good name. > > > // > > > equal_ranges: > > > this: > > > > > > https://cs.chromium.org/chromium/src/components/history/core/browser/delete_d... > > > is less than ideal way to group elements. > > > Consider: > > > std::sort(v.begin(), v.end(), less_by(Func)); > > > equal_ranges(v.begin(), v.end(), equal_by(Func), > > > [] (v_iter first, v_iter last) { > > > }); > > > > > > Where group_by can, for example, utilise std::upper_bound to minimise number > > of > > > predicate calls. > > > > I don't think I understand the suggestion. What is equal_ranges() doing? It > > looks like std::equal(). And how does it help the code you linked to? (I > spent > > a few minutes figuring out what that code seemed to be doing, since I haven't > > seen it before.) > > Oh, sorry, I assumed that because it's in history - you are familiar. The majority of history code is unfamiliar to me :). I've only touched small pieces of it. > We could do some form of grouping as a generic algorithm (either with linear or > with binary search). Again, how broadly applicable is this? I'd start by simply rewriting the specific code here in a better way, and only go for the generic solution where it's clearly widely useful.
https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:595: for (WordIDSet::iterator word_set_iter = word_id_set.begin(); How come you ended up reverting the changes here? https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:607: // the sets from each word. And here? https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:695: for (auto it = history_id_set.begin();;) { And here? https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:845: word_id_history_map_[word_pos->second].insert(history_id); This looks functionally different than the old code, in that the old code reset the set here to have one entry, while this just adds the new entry to the existing set. Does that matter? https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:857: available_words_.pop(); Doesn't this need to actually place |term| in the |word_list_|?
https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:595: for (WordIDSet::iterator word_set_iter = word_id_set.begin(); On 2017/02/18 00:40:40, Peter Kasting wrote: > How come you ended up reverting the changes here? I reverted flat_set to std::set, which has const keys. remove_if is a mutating algorithm. https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:607: // the sets from each word. On 2017/02/18 00:40:40, Peter Kasting wrote: > And here? Putting in a vector and then inserting makes sense mostly for flat_set. https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:695: for (auto it = history_id_set.begin();;) { On 2017/02/18 00:40:40, Peter Kasting wrote: > And here? Same story with remove_if. https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:845: word_id_history_map_[word_pos->second].insert(history_id); On 2017/02/18 00:40:40, Peter Kasting wrote: > This looks functionally different than the old code, in that the old code reset > the set here to have one entry, while this just adds the new entry to the > existing set. Does that matter? No - previous code reseted set only for new entries, for old entries it inserted. I united both pieces of code to just insert because for a new entry set would be empty. https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:857: available_words_.pop(); On 2017/02/18 00:40:40, Peter Kasting wrote: > Doesn't this need to actually place |term| in the |word_list_|? Yes. Tests that I ran didn't catch it. Sorry.
On 2017/02/17 23:33:57, Peter Kasting wrote: > > Committing an entirely new container took a long time because there wasn't > general agreement it was a good idea, the API had to be worked out, and multiple > base/ reviewers were on holiday. > > Committing _this_ to base/ should take a day. base/ is no different than the > rest of Chromium in this regard. If it is, something is wrong; but in my > experience it isn't. > > Put it in base/. > Ok, when I get around to it. > I'm all for refactoring things, but don't put it in a header unless you're > actually aware of and planning to use it in other places. Otherwise, keep it > local. I was just saying that once you do the refactoring, you might find it's > less useful than you hoped. But maybe still a win. > Done. > The problem here is that "less_by" doesn't mean much to me grammatically or as > idiomatic C++. So I have to figure out what it means. By that point I could > just read the original. > > It's still worthwhile to do if it comes up a lot and we can think of a good > name. > It does come up with many of calls to sort, unique etc. Maybe the name should be better. > > Again, how broadly applicable is this? I'd start by simply rewriting the > specific code here in a better way, and only go for the generic solution where > it's clearly widely useful. Ok, done locally.
On 2017/02/18 01:00:14, dyaroshev wrote: > On 2017/02/17 23:33:57, Peter Kasting wrote: > > Again, how broadly applicable is this? I'd start by simply rewriting the > > specific code here in a better way, and only go for the generic solution where > > it's clearly widely useful. > > Ok, done locally. I meant - will do locally if I'll get to actually rewrite that code, it doesn't seem to be used often, not clear what's with tests and the last was God knows when. I'll fix the avaliable_ids tomorrow. Sorry - it's pretty late here.
https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:90: ResultingSet res(tmp_set.begin(), tmp_set.end()); If we keep this, I should write smth like ContainerCast: template <typename To, typename From> To ContainerCast(From from) { return To(std::make_move_iterator(from.begin()), std::make_move_iterator(from.end())); } template <typename To> To ContainerCast(To from) { return from; } https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:516: HistoryIDVector history_ids; Unused variable.
https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:595: for (WordIDSet::iterator word_set_iter = word_id_set.begin(); On 2017/02/18 00:57:36, dyaroshev wrote: > On 2017/02/18 00:40:40, Peter Kasting wrote: > > How come you ended up reverting the changes here? > > I reverted flat_set to std::set, which has const keys. remove_if is a mutating > algorithm. Seems like the issue is that std::set is an inherently sorted container, so an algorithm like "remove_if" doesn't really make sense on it at all. Conceptually, that worries me about flat_set too -- if it lets you "un-sort" it via remove_if, that seems kinda like a contract violation. Maybe the answer is to expose erase_if in the flat_set API itself (rather than as a helper in base/)? I suppose the main idea of putting this in base/ itself might need specializations for things like set<>. https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:607: // the sets from each word. On 2017/02/18 00:57:36, dyaroshev wrote: > On 2017/02/18 00:40:40, Peter Kasting wrote: > > And here? > > Putting in a vector and then inserting makes sense mostly for flat_set. Right, but if it's perf-neutral on std::set and perf-positive on flat_set, it makes some sense to do as a preparatory change, so the functional change afterwards is as small as possible. And since std::set does have a range-based insert(), maybe this would even be better than perf-neutral for std::set? https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:82: template <typename ResultingSet, typename I, typename Transformation> Nit: I -> Iter? https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:83: // Requires ResultingSet is a Container, I is an InputIterator, Transformation Nit: template<...> is part of the function declaration, so move this comment above it. Add (above this requirement) what the function actually does: "Constructs sets from each element in a range of inputs and returns the intersection of those sets." https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:85: ResultingSet IntersectSets(I first, I last, Transformation get_set) { Nit: GetTransformIntersection()? IntersectConstructedSets()? (I'm looking for a name that makes clear that the inputs to this function are not sets) https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:96: break; Shouldn't this return the empty set instead of breaking and returning the existing intersection? That seems very wrong. I'd expect unit tests to catch this sort of thing (if they don't maybe we need more tests). https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:234: post_filter_item_count_ += history_ids.size(); This line seems like a functional change (it was adding |pre_filter_item_count_|), but the old code looks wrong to me and your new code looks correct. (The old code was introduced as-is in https://codereview.chromium.org/2364553004 .) You might want to make this functional fix separately; I'm not sure. Mark would understand better the consequence of overcounting the post_filter_item_count_, if that is indeed what this was doing. Presumably, not clearing the cached results often enough, leading to lower result quality, but in what way I'm not sure. https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:509: // old data valid. Should we worry about this? Maybe it's OK to just change this? Ask Mark. https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:531: Nit: No blank line https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:656: return char_iter->second; Nit: Shorter: auto GetSet = [this](base::char16 c) { auto char_iter = char_word_map_.find(c); return char_iter == char_word_map_.end() ? WordIDSet() : char_iter->second; } return IntersectSets<WordIDSet>(term_chars.begin(), term_chars.end(), GetSet); https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:699: history_ids.end()); Nit: Pulling history_ids.end() out into a temp |end| above this call saves one line on net (since then the whole call fits on one line), and should be safe since remove_if() won't invalidate end(). https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.h (right): https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.h:212: // Trim the candidate pool if it is large. Note that we do not filter out Nit: Trims Might want to give a little idea of how/why, e.g. "Trims the candidate pool in advance of doing proper substring searching, to cap the cost of such searching. Discards the least-relevant items (based on visit stats), which are least likely to score highly in the end. To minimize the risk of discarding a valuable URL, the candidate pool is still left two orders of magnitude larger than the final number of results returned from the HQP."
https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:82: template <typename ResultingSet, typename I, typename Transformation> On 2017/02/18 01:46:31, Peter Kasting wrote: > Nit: I -> Iter? Done. https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:83: // Requires ResultingSet is a Container, I is an InputIterator, Transformation On 2017/02/18 01:46:31, Peter Kasting wrote: > Nit: template<...> is part of the function declaration, so move this comment > above it. > > Add (above this requirement) what the function actually does: "Constructs sets > from each element in a range of inputs and returns the intersection of those > sets." This comment describes concepts. Used concepts here are from Concepts TS and when (in some distant future) concepts come, this comments could be rewritten with requires. If you want I can drop them. https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:85: ResultingSet IntersectSets(I first, I last, Transformation get_set) { On 2017/02/18 01:46:31, Peter Kasting wrote: > Nit: GetTransformIntersection()? IntersectConstructedSets()? > > (I'm looking for a name that makes clear that the inputs to this function are > not sets) Done. https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:90: ResultingSet res(tmp_set.begin(), tmp_set.end()); On 2017/02/18 01:22:14, dyaroshev wrote: > If we keep this, I should write smth like ContainerCast: > > template <typename To, typename From> > To ContainerCast(From from) { > return To(std::make_move_iterator(from.begin()), > std::make_move_iterator(from.end())); > } > > template <typename To> > To ContainerCast(To from) { > return from; > } Done. @pkasting: there are number of places here where one container is constructed from the other. What's better - to keep them as is or to rewrite as smth like: auto history_id_set = ContainerCast<HistoryIdSet>(std::move(buffer)); https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:96: break; On 2017/02/18 01:46:31, Peter Kasting wrote: > Shouldn't this return the empty set instead of breaking and returning the > existing intersection? That seems very wrong. > > I'd expect unit tests to catch this sort of thing (if they don't maybe we need > more tests). Done. One of the advantages of keeping algorithms in a different header is that we can test them very easily and make sure that we've covered all the corner cases. https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:234: post_filter_item_count_ += history_ids.size(); On 2017/02/18 01:46:31, Peter Kasting wrote: > This line seems like a functional change (it was adding > |pre_filter_item_count_|), but the old code looks wrong to me and your new code > looks correct. (The old code was introduced as-is in > https://codereview.chromium.org/2364553004 .) > > You might want to make this functional fix separately; I'm not sure. Mark would > understand better the consequence of overcounting the post_filter_item_count_, > if that is indeed what this was doing. Presumably, not clearing the cached > results often enough, leading to lower result quality, but in what way I'm not > sure. Moved this logic into TrimHistoryIdsPool and revert to the way it was before. https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:509: // old data valid. On 2017/02/18 01:46:31, Peter Kasting wrote: > Should we worry about this? Maybe it's OK to just change this? Ask Mark. Well, it's for you mostly - I don't have access to your histograms. @Mark? https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:516: HistoryIDVector history_ids; On 2017/02/18 01:22:14, dyaroshev wrote: > Unused variable. Done. https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:531: On 2017/02/18 01:46:31, Peter Kasting wrote: > Nit: No blank line Done. https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:656: return char_iter->second; On 2017/02/18 01:46:31, Peter Kasting wrote: > Nit: Shorter: > > auto GetSet = [this](base::char16 c) { > auto char_iter = char_word_map_.find(c); > return char_iter == char_word_map_.end() ? WordIDSet() : char_iter->second; > } > return IntersectSets<WordIDSet>(term_chars.begin(), term_chars.end(), GetSet); Done. https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:699: history_ids.end()); On 2017/02/18 01:46:31, Peter Kasting wrote: > Nit: Pulling history_ids.end() out into a temp |end| above this call saves one > line on net (since then the whole call fits on one line), and should be safe > since remove_if() won't invalidate end(). I don't like invalid iterators lying around and it seems like I should declare begin than as well. I put remove_if on a separate line - is it ok? https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.h (right): https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.h:212: // Trim the candidate pool if it is large. Note that we do not filter out On 2017/02/18 01:46:31, Peter Kasting wrote: > Nit: Trims > > Might want to give a little idea of how/why, e.g. "Trims the candidate pool in > advance of doing proper substring searching, to cap the cost of such searching. > Discards the least-relevant items (based on visit stats), which are least likely > to score highly in the end. To minimize the risk of discarding a valuable URL, > the candidate pool is still left two orders of magnitude larger than the final > number of results returned from the HQP." Done.
On 2017/02/18 01:46:32, Peter Kasting wrote: > > Seems like the issue is that std::set is an inherently sorted container, so an > algorithm like "remove_if" doesn't really make sense on it at all. > > Conceptually, that worries me about flat_set too -- if it lets you "un-sort" it > via remove_if, that seems kinda like a contract violation. > > Maybe the answer is to expose erase_if in the flat_set API itself (rather than > as a helper in base/)? I suppose the main idea of putting this in base/ itself > might need specializations for things like set<>. > I've argued about this with Sean Middleditch - the author of proposal for standardisation - he doesn't seem to have a solution too. Consider: a) const keys for flat_set is very hard to implement: vector of const elements does not work, and any tricks with const_cast are not portable (I am not sure what the precise problem there, I just know tha c++17 has some hacks around that). b) std::set of smth like unique_ptr is essentially useless. This doesn't have to be the case for flat_sets (especially after we add templated lookup functions. c) remove_if, unique etc do not invalidate sort. d) actually doing erase(remove_if) on a set could very much make sence because it doesn't affect cache locality. e) we forbid writing new remove_if like algorithms by trying to improve safety. To conclude: it's a rare bug, protecting against it is hard and does more bad than good. Like Bjarne Stroustrup: "For me it has always been more important to enable good programming than to forbid bad one".
On 2017/02/18 01:46:32, Peter Kasting wrote: > https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... > components/omnibox/browser/url_index_private_data.cc:607: // the sets from each > word. > On 2017/02/18 00:57:36, dyaroshev wrote: > > On 2017/02/18 00:40:40, Peter Kasting wrote: > > > And here? > > > > Putting in a vector and then inserting makes sense mostly for flat_set. > > Right, but if it's perf-neutral on std::set and perf-positive on flat_set, it > makes some sense to do as a preparatory change, so the functional change > afterwards is as small as possible. > > And since std::set does have a range-based insert(), maybe this would even be > better than perf-neutral for std::set? > FYI: as far as I understand - no it's not: https://github.com/llvm-mirror/libcxx/blob/master/include/set#L620
Please try to send replies inline on the original comments, it makes it harder to reply to you when I can do some things inline but have to collect other things from multiple separate emails in order to keep it all in one spot. On 2017/02/18 12:48:58, dyaroshev wrote: > Consider: > a) const keys for flat_set is very hard to implement: vector of const elements > does not work, Keeping in mind I don't ever look at or reimplement fundamental containers and their specs, so I'm feeling my way here: * std::set uses const keys to prevent things like swap() from working, so it can ensure things stay where it expects in the tree (or whatever it uses to implement its ordering), right? That's why "const keys" are relevant here. * Meanwhile, I assume vector-of-const-T doesn't work because if vector needs to resize and thus move all the elements, the compile breaks. I don't have any magic ideas of how flat_set can enforce that callers cannot muck with the elements, the way set does. Although the idea of casting-to-add-const seems like it could go somewhere, but I'm not sure, and I'm not sure how to declare it even then. Not convinced it's "not portable", but it's not something I want to spend my time trying to figure out :) > b) std::set of smth like unique_ptr is essentially useless. This doesn't have to > be the case for flat_sets (especially after we add templated lookup functions. I think your point here is "forcing flat_set to contain const Ts would also eliminate meaningful capabilities it could have that can't be implemented in std::set". Which is true, but I sort of feel like one has to make a decision as to whether a flat_set is "a set, with different perf characteristics", or "a container that has similar ordering properties to a set but is fundamentally its own thing". The argument here goes along with the latter viewpoint; mentally I tend to think in terms of the former. > c) remove_if, unique etc do not invalidate sort. They don't? If I have a set of 1, 2, 3, 4, 5, and I remove_if(even numbers), I'll get 1, 3, 5, 2, 4, and end() will point past 4. Now things like upper_bound() are broken, aren't they? I don't know that flat_set can easily prevent someone doing this, but I would think it causes the set to violate its ordering invariant. > d) actually doing erase(remove_if) on a set could very much make sence because > it doesn't affect cache locality. Yes, in terms of implementation, it's reasonable. My thought was that flat_set::erase_if() could be implemented like this under the hood. It's reasonable for the set to violate its own invariants within the body of a single function as long as it wouldn't throw an exception and be left in a violated state. > e) we forbid writing new remove_if like algorithms by trying to improve safety. I was proposing writing flat_set::erase_if() (since LF2 has it on std::set anyway so we'll get it eventually) instead of writing a non-member STLEraseIf()-type helper and using it on flat_sets. It sounds like you're thinking I was saying not to write something at all. I'm OK with not trying to solve the general bug here, I'd just rather make it easiest on people to avoid tripping over it by providing helpers that keep any invariant violations inside flat_set's own code. On 2017/02/19 22:31:12, dyaroshev wrote: > On 2017/02/18 01:46:32, Peter Kasting wrote: > > > https://codereview.chromium.org/2690303012/diff/20001/components/omnibox/brow... > > components/omnibox/browser/url_index_private_data.cc:607: // the sets from > each > > word. > > On 2017/02/18 00:57:36, dyaroshev wrote: > > > On 2017/02/18 00:40:40, Peter Kasting wrote: > > > > And here? > > > > > > Putting in a vector and then inserting makes sense mostly for flat_set. > > > > Right, but if it's perf-neutral on std::set and perf-positive on flat_set, it > > makes some sense to do as a preparatory change, so the functional change > > afterwards is as small as possible. > > > > And since std::set does have a range-based insert(), maybe this would even be > > better than perf-neutral for std::set? > > > > FYI: as far as I understand - no it's not: > https://github.com/llvm-mirror/libcxx/blob/master/include/set#L620 Since the question here is merely whether to do this now or later, let's just drop and, and if you feel it's better to do it in your subsequent patch, that's fine with me. https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:83: // Requires ResultingSet is a Container, I is an InputIterator, Transformation On 2017/02/18 11:48:14, dyaroshev wrote: > On 2017/02/18 01:46:31, Peter Kasting wrote: > > Nit: template<...> is part of the function declaration, so move this comment > > above it. > > > > Add (above this requirement) what the function actually does: "Constructs sets > > from each element in a range of inputs and returns the intersection of those > > sets." > > This comment describes concepts. Used concepts here are from Concepts TS and > when (in some distant future) concepts come, this comments could be rewritten > with requires. > > If you want I can drop them. You can leave them if you want, but don't insert them mid-declaration to save one line of reordering many years down the line when Concepts are in C++, we support that version of the standard, and someone wants to rewrite this. Basically, write your comments for reader clarity, not to ease hypothetical future refactoring. Judge what to say at all, and where to say it, based solely on that reader clarity. https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:90: ResultingSet res(tmp_set.begin(), tmp_set.end()); On 2017/02/18 11:48:14, dyaroshev wrote: > On 2017/02/18 01:22:14, dyaroshev wrote: > > If we keep this, I should write smth like ContainerCast: > > > > template <typename To, typename From> > > To ContainerCast(From from) { > > return To(std::make_move_iterator(from.begin()), > > std::make_move_iterator(from.end())); > > } > > > > template <typename To> > > To ContainerCast(To from) { > > return from; > > } > > Done. > @pkasting: there are number of places here where one container is constructed > from the other. > What's better - to keep them as is or to rewrite as smth like: > auto history_id_set = ContainerCast<HistoryIdSet>(std::move(buffer)); I find things more readable without ContainerCast. (Looks like in your new patch set you implemented but did not use it, which makes it slightly hard to tell how many places it's useful, but it looks to me like an extremely small change that buys arguable readability on the caller side, with templates + boilerplate on the callee side.) https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:96: break; On 2017/02/18 11:48:14, dyaroshev wrote: > On 2017/02/18 01:46:31, Peter Kasting wrote: > > Shouldn't this return the empty set instead of breaking and returning the > > existing intersection? That seems very wrong. > > > > I'd expect unit tests to catch this sort of thing (if they don't maybe we need > > more tests). > > Done. > > One of the advantages of keeping algorithms in a different header is that we can > test them very easily and make sure that we've covered all the corner cases. If there's really a need to test them, they can be exposed in this class' header too, as a compromise between keeping implementation detail scoped locally and being testable. But I was more saying, I'd hope the tests for the caller of this code are sufficient to have caught this error. https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:234: post_filter_item_count_ += history_ids.size(); On 2017/02/18 11:48:14, dyaroshev wrote: > On 2017/02/18 01:46:31, Peter Kasting wrote: > > This line seems like a functional change (it was adding > > |pre_filter_item_count_|), but the old code looks wrong to me and your new > code > > looks correct. (The old code was introduced as-is in > > https://codereview.chromium.org/2364553004 .) > > > > You might want to make this functional fix separately; I'm not sure. Mark > would > > understand better the consequence of overcounting the post_filter_item_count_, > > if that is indeed what this was doing. Presumably, not clearing the cached > > results often enough, leading to lower result quality, but in what way I'm not > > sure. > > Moved this logic into TrimHistoryIdsPool and revert to the way it was before. Ugh, don't to that. Now the code is as wrong as before but the wrongness is far less obvious to the reader, so it's liable to stay broken. What you were doing here before is vastly better. To simplify this, I would pull out just changing post_filter_item_count_ to always add history_id_set.size() into its own CL that lands before this one, and having mpearson review that. https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:699: history_ids.end()); On 2017/02/18 11:48:14, dyaroshev wrote: > On 2017/02/18 01:46:31, Peter Kasting wrote: > > Nit: Pulling history_ids.end() out into a temp |end| above this call saves one > > line on net (since then the whole call fits on one line), and should be safe > > since remove_if() won't invalidate end(). > > I don't like invalid iterators lying around That's a reasonable objection to have. > and it seems like I should declare > begin than as well. I feel no sense of "if I declare a temp for one concept I use, I have to declare a temp for every concept I use", so I don't resonate with this. > I put remove_if on a separate line - is it ok? What you've done now is just as long as what you had before, but has more temps, and an extra scope that makes us ask what's going out of scope and why the scoping matters. So it's more work for the brain to process. So I would consider it worse than before. If you dislike my suggestion, I would just not implement it :).
On 2017/02/20 10:02:54, Peter Kasting wrote: > Please try to send replies inline on the original comments, ... I'm trying, it's not always easy( Especially after I've made a mistake. And there is a 1000 character restriction. > On 2017/02/18 12:48:58, dyaroshev wrote: > > Consider: > > a) const keys for flat_set is very hard to implement: vector of const elements > > does not work, > > Keeping in mind I don't ever look at or reimplement fundamental containers and > their specs, so I'm feeling my way here: > > * std::set uses const keys to prevent things like swap() from working, so it can > ensure things stay where it expects in the tree (or whatever it uses to > implement its ordering), right? That's why "const keys" are relevant here. > * Meanwhile, I assume vector-of-const-T doesn't work because if vector needs to > resize and thus move all the elements, the compile breaks. > > I don't have any magic ideas of how flat_set can enforce that callers cannot > muck with the elements, the way set does. Although the idea of > casting-to-add-const seems like it could go somewhere, but I'm not sure, and I'm > not sure how to declare it even then. Not convinced it's "not portable", but > it's not something I want to spend my time trying to figure out :) > Yes, std::set tries to avoid breaking sorted order since C++11. Questionable decision, since it doesn't allow us to do useful things. > > b) std::set of smth like unique_ptr is essentially useless. This doesn't have > to > > be the case for flat_sets (especially after we add templated lookup functions. > > ... to whether a flat_set is "a set, with different perf characteristics", or "a > container that has similar ordering properties to a set but is fundamentally its > own thing"... > Well it's almost a set) Ok, not this patch's question anyways. > > c) remove_if, unique etc do not invalidate sort. > > They don't? > > If I have a set of 1, 2, 3, 4, 5, and I remove_if(even numbers), I'll get 1, 3, > 5, 2, 4, and end() will point past 4. ... Well accessing removed elements in any container I would say invalidate the invariant that elements in a container are fully formed, so this is consistent with everything. FYI: I think that remove is implemented in terms of move, so we would get 1, 3, 5, 5, 5, 5. > > d) actually doing erase(remove_if) on a set could very much make sence because > > it doesn't affect cache locality. > > Yes, in terms of implementation, it's reasonable. My thought was that > flat_set::erase_if() could be implemented like this under the hood. It's > reasonable for the set to violate its own invariants within the body of a single > function as long as it wouldn't throw an exception and be left in a violated > state. > The whole idea why we have algorithms separately is to avoid the permutation problem: we would have to implement every algorithm for every container. Of course we have sometimes to step away from this rule and then we get list::sort and staff like that but this is unfortunate. flat_set::erase_if is inconsistent with the standard and doesn't solve the problem. Consider the case when we have the lower bound: our_set.erase(std::remove_if(some_element, out_set.end(), predicate), our_set.end()); It would be unpleasant to try to make this case bad. I think non-member helper for common case is ok, but a member function implies closer relations than it has to be. > > e) we forbid writing new remove_if like algorithms by trying to improve > safety. > > I was proposing writing flat_set::erase_if() (since LF2 has it on std::set > anyway so we'll get it eventually) instead of writing a non-member > STLEraseIf()-type helper and using it on flat_sets. It sounds like you're > thinking I was saying not to write something at all. > > I'm OK with not trying to solve the general bug here, I'd just rather make it > easiest on people to avoid tripping over it by providing helpers that keep any > invariant violations inside flat_set's own code. > Anyways, this is not the problem of this bug. Holy wars are a lot of fun though) > Since the question here is merely whether to do this now or later, let's just > drop and, and if you feel it's better to do it in your subsequent patch, that's > fine with me. > I'll revert it then. > You can leave them if you want, but don't insert them mid-declaration to save > one line of reordering many years down the line when Concepts are in C++, we > support that version of the standard, and someone wants to rewrite this. > > Basically, write your comments for reader clarity, not to ease hypothetical > future refactoring. Judge what to say at all, and where to say it, based solely > on that reader clarity. > This is just a variation on modern style writing templates: https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#... I don't care that much for them - not very helpful here but I don't like putting them before template. > I find things more readable without ContainerCast. (Looks like in your new > patch set you implemented but did not use it, which makes it slightly hard to > tell how many places it's useful, but it looks to me like an extremely small > change that buys arguable readability on the caller side, with templates + > boilerplate on the callee side.) > I agree about more readable but I need it for Intersect algorithm: it's important to call copy/move over range constructor for sets. Seems very complicated for a local function - I would move it to util header. > https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... > components/omnibox/browser/url_index_private_data.cc:96: break; > On 2017/02/18 11:48:14, dyaroshev wrote: > > On 2017/02/18 01:46:31, Peter Kasting wrote: > > > Shouldn't this return the empty set instead of breaking and returning the > > > existing intersection? That seems very wrong. > > > > > > I'd expect unit tests to catch this sort of thing (if they don't maybe we > need > > > more tests). > > > > Done. > > > > One of the advantages of keeping algorithms in a different header is that we > can > > test them very easily and make sure that we've covered all the corner cases. > > If there's really a need to test them, they can be exposed in this class' header > too, as a compromise between keeping implementation detail scoped locally and > being testable. > > But I was more saying, I'd hope the tests for the caller of this code are > sufficient to have caught this error. > Exposing in url_index_private_data.h IntersectSets ? Seems like a poor placement. Well, it's quite hard to get all possible details of an algorithm tested, when it's buried deep down in the class. I prefer algorithms to be separate from application logic where it makes sense. > > To simplify this, I would pull out just changing post_filter_item_count_ to > always add history_id_set.size() into its own CL that lands before this one, and > having mpearson review that. > And a test for it if possible. Ok. > > I feel no sense of "if I declare a temp for one concept I use, I have to declare > a temp for every concept I use", so I don't resonate with this. Not for every concept, but begin/end come in pairs. If I were to see end, I would immediately look for a reason why it might be not the end of a container. > > > I put remove_if on a separate line - is it ok? > > What you've done now is just as long as what you had before, but has more temps, > and an extra scope that makes us ask what's going out of scope and why the > scoping matters. So it's more work for the brain to process. So I would > consider it worse than before. > > If you dislike my suggestion, I would just not implement it :). Ok, I'll drop it)
https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:234: post_filter_item_count_ += history_ids.size(); On 2017/02/20 10:02:54, Peter Kasting wrote: > On 2017/02/18 11:48:14, dyaroshev wrote: > > On 2017/02/18 01:46:31, Peter Kasting wrote: > > > This line seems like a functional change (it was adding > > > |pre_filter_item_count_|), but the old code looks wrong to me and your new > > code > > > looks correct. (The old code was introduced as-is in > > > https://codereview.chromium.org/2364553004 .) > > > > > > You might want to make this functional fix separately; I'm not sure. Mark > > would > > > understand better the consequence of overcounting the > post_filter_item_count_, > > > if that is indeed what this was doing. Presumably, not clearing the cached > > > results often enough, leading to lower result quality, but in what way I'm > not > > > sure. > > > > Moved this logic into TrimHistoryIdsPool and revert to the way it was before. > > Ugh, don't to that. Now the code is as wrong as before but the wrongness is far > less obvious to the reader, so it's liable to stay broken. > > What you were doing here before is vastly better. > > To simplify this, I would pull out just changing post_filter_item_count_ to > always add history_id_set.size() into its own CL that lands before this one, and > having mpearson review that. Other than creating a separate CL with this fix seems like I'm done. That patch requires some digging - for now I have no idea why would pre/post counts be member variables - seems like right now we'll never clean the cache (I might be wrong - haven't looked at it too closely).
https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:104: ContainerTo ContinerCast(ContainerFrom&& cont) { ContainerCast? https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:118: // Creates a new set by intersecting all the sets in range defined by |fist|, Can you make this comment more human readable? Whats the customization point? I am not sure every mere mortal knows whats it. Until I saw how this function is used I could not figure what it was doing. Maybe example would be useful? I would prefer this function to be inlined in both places where its used instead of having all the ContainerCast magic. Peter, Mark? https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.h (right): https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.h:347: // case the slots for those words are added to available_words_ for resuse reuse
On 2017/02/21 07:07:15, Alexander Yashkin wrote: https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... > components/omnibox/browser/url_index_private_data.cc:118: // Creates a new set > by intersecting all the sets in range defined by |fist|, > Can you make this comment more human readable? > Whats the customization point? I am not sure every mere mortal knows whats it. > > Until I saw how this function is used I could not figure what it was doing. > Maybe example would be useful? > > I would prefer this function to be inlined in both places where its used instead > of having > all the ContainerCast magic. > > Peter, Mark? > Sure, it's probably too complex for a local function. However: It's better than previous iterations of this algorithm. It has a name that tells it's purpose. User's code looks better. I think that this kind of ContainerCast magic would be completly ok in a detail namespace of an appropriatly named utility header and that such function should live there.
https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:234: post_filter_item_count_ += history_ids.size(); On 2017/02/20 22:21:31, dyaroshev wrote: > On 2017/02/20 10:02:54, Peter Kasting wrote: > > On 2017/02/18 11:48:14, dyaroshev wrote: > > > On 2017/02/18 01:46:31, Peter Kasting wrote: > > > > This line seems like a functional change (it was adding > > > > |pre_filter_item_count_|), but the old code looks wrong to me and your new > > > code > > > > looks correct. (The old code was introduced as-is in > > > > https://codereview.chromium.org/2364553004 .) > > > > > > > > You might want to make this functional fix separately; I'm not sure. Mark > > > would > > > > understand better the consequence of overcounting the > > post_filter_item_count_, > > > > if that is indeed what this was doing. Presumably, not clearing the > cached > > > > results often enough, leading to lower result quality, but in what way I'm > > not > > > > sure. > > > > > > Moved this logic into TrimHistoryIdsPool and revert to the way it was > before. > > > > Ugh, don't to that. Now the code is as wrong as before but the wrongness is > far > > less obvious to the reader, so it's liable to stay broken. > > > > What you were doing here before is vastly better. > > > > To simplify this, I would pull out just changing post_filter_item_count_ to > > always add history_id_set.size() into its own CL that lands before this one, > and > > having mpearson review that. > > Other than creating a separate CL with this fix seems like I'm done. > That patch requires some digging - for now I have no idea why would pre/post > counts be member variables - seems like right now we'll never clean the cache (I > might be wrong - haven't looked at it too closely). New CL: https://codereview.chromium.org/2702413007/ https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:118: // Creates a new set by intersecting all the sets in range defined by |fist|, On 2017/02/21 07:07:15, Alexander Yashkin wrote: > Can you make this comment more human readable? > Whats the customization point? I am not sure every mere mortal knows whats it. > > Until I saw how this function is used I could not figure what it was doing. > Maybe example would be useful? > > I would prefer this function to be inlined in both places where its used instead > of having > all the ContainerCast magic. > > Peter, Mark? Another good argument for not inlining this -> this seems to be a bad way to implement multiple sets intersect (still better than the previous one) - it calls an allocator multiple times an does many unnecessary copies. A good way could grow out of smth like this: https://godbolt.org/g/V67d6R - it's definitely not inlinable. And if we put a method in an independent location, we could easily improve it later and affect all the places of use.
https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:118: // Creates a new set by intersecting all the sets in range defined by |fist|, On 2017/02/22 22:57:25, dyaroshev wrote: > On 2017/02/21 07:07:15, Alexander Yashkin wrote: > > Can you make this comment more human readable? > > Whats the customization point? I am not sure every mere mortal knows whats it. > > > > Until I saw how this function is used I could not figure what it was doing. > > Maybe example would be useful? > > > > I would prefer this function to be inlined in both places where its used > instead > > of having > > all the ContainerCast magic. > > > > Peter, Mark? > > Another good argument for not inlining this -> this seems to be a bad way to > implement multiple sets intersect (still better than the previous one) - it > calls an allocator multiple times an does many unnecessary copies. > > A good way could grow out of smth like this: https://godbolt.org/g/V67d6R - it's > definitely not inlinable. And if we put a method in an independent location, we > could easily improve it later and affect all the places of use. Significantly revised version: https://godbolt.org/g/vgCVmr
I will try to get to this tomorrow.
On 2017/02/23 02:07:07, Peter Kasting wrote: > I will try to get to this tomorrow. The algorithm has to be further improved, there are some naming issues (for example now I don't think that linear_upper_bound is actually an upper_bound) so please don't get cought up in it. My main point is that this is not such an easy algorithm and it would be nice to have it in one place and maybe improve it later, if we need to.
mpearson@chromium.org changed reviewers: + mpearson@chromium.org
Only scanned this long thread; saw one thing that clearly asked for my opinion. Replied below. --mark https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/40001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:509: // old data valid. On 2017/02/18 11:48:14, dyaroshev wrote: > On 2017/02/18 01:46:31, Peter Kasting wrote: > > Should we worry about this? Maybe it's OK to just change this? Ask Mark. > > Well, it's for you mostly - I don't have access to your histograms. > @Mark? Might as well keep the name of the histogram to keep history being valid. These names are close enough that one is discoverable if the user is searching for the other.
LGTM. I'd still rather kill ContainerCast (see below), but I've taken the debate as far as I can either way. https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:128: auto res = ContinerCast<ResultingSet>(get_set(*first)); It seems like the main reason you need ContainerCast is because one of the callers is trying to supply a vector type for ResultingSet. This sort of makes the type names and comments of this function a lie. They shouldn't explicitly say "set" if this is really just about working on arbitrary containers. But the other fix would be to intersect to a set, and then pull a vector out of that at the end to return from the caller that wants to do so. This would allow removing all the ContainerCast stuff, which still seems like a desirable goal to me in terms of lowering boilerplate and improving readability. It's not instantly obvious how it would affect efficiency, but it seems like it would lower it slightly (but maybe not enough to matter). https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:542: // old data valid. Nit: Sounds like we're leaving the old name. If so I'd just say: "This histogram name reflects the historic name of this function." https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:681: auto GetSet = [&](base::char16 c) -> const WordIDSet& { Nit: I would explicitly capture [this, &empty_set]. That said; can you return the set by value and not const ref, and avoid declaring |empty_set| at all (just return WordIDSet() in the lambda)? https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:685: return char_iter->second; Nit: Simpler: return char_iter == char_word_map_.end() ? empty_set : char_iter->second; https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:725: { Nit: {} not necessary https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:1153: const RepeatedField<int32_t>& word_ids(entry.word_id()); Nit: Use = rather than () to init references like this (multiple places) https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.h (right): https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.h:264: // Adds a new entry to word_list. Uses previously freed positions if Nit: word_list -> |word_list_|
@pkasting The changes seem to be big enough that I would appreciate if you confirmed that you are OK with them. Maybe I should change the comment to HistoryIdsToScoredMatches? It currently says: Helper function for HistoryItemsForTerms(). Fills in |scored_items| from the matches listed in |history_ids|. https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:104: ContainerTo ContinerCast(ContainerFrom&& cont) { On 2017/02/21 07:07:15, Alexander Yashkin wrote: > ContainerCast? Not applicable. https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:128: auto res = ContinerCast<ResultingSet>(get_set(*first)); On 2017/02/25 04:54:42, Peter Kasting wrote: > It seems like the main reason you need ContainerCast is because one of the > callers is trying to supply a vector type for ResultingSet. > > This sort of makes the type names and comments of this function a lie. They > shouldn't explicitly say "set" if this is really just about working on arbitrary > containers. > > But the other fix would be to intersect to a set, and then pull a vector out of > that at the end to return from the caller that wants to do so. This would allow > removing all the ContainerCast stuff, which still seems like a desirable goal to > me in terms of lowering boilerplate and improving readability. It's not > instantly obvious how it would affect efficiency, but it seems like it would > lower it slightly (but maybe not enough to matter). Since this is the cleaning up patch and I don't see a 100% satisfying solution - I've reverted the intersect patch, created a task for writing a generic union/intersect functions and added a todo. Is this ok? https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:542: // old data valid. On 2017/02/25 04:54:41, Peter Kasting wrote: > Nit: Sounds like we're leaving the old name. If so I'd just say: "This > histogram name reflects the historic name of this function." Done. https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:681: auto GetSet = [&](base::char16 c) -> const WordIDSet& { On 2017/02/25 04:54:42, Peter Kasting wrote: > Nit: I would explicitly capture [this, &empty_set]. > > That said; can you return the set by value and not const ref, and avoid > declaring |empty_set| at all (just return WordIDSet() in the lambda)? Not applicable. Returning by value would add an extra copy which can be expensive. https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:685: return char_iter->second; On 2017/02/25 04:54:41, Peter Kasting wrote: > Nit: Simpler: > > return char_iter == char_word_map_.end() ? empty_set : char_iter->second; Not applicable. https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:725: { On 2017/02/25 04:54:42, Peter Kasting wrote: > Nit: {} not necessary Done. https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:1153: const RepeatedField<int32_t>& word_ids(entry.word_id()); On 2017/02/25 04:54:41, Peter Kasting wrote: > Nit: Use = rather than () to init references like this (multiple places) Done. https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.h (right): https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.h:264: // Adds a new entry to word_list. Uses previously freed positions if On 2017/02/25 04:54:42, Peter Kasting wrote: > Nit: word_list -> |word_list_| Done. https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.h:347: // case the slots for those words are added to available_words_ for resuse On 2017/02/21 07:07:15, Alexander Yashkin wrote: > reuse Done. https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... components/omnibox/browser/url_index_private_data.cc:206: &scored_items); In fixed version of HistoryIdsToScoredMatches() we don't actually need a temporary. https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... components/omnibox/browser/url_index_private_data.cc:699: First transforming and then removing seems unnecessary - it's better to filter prior to putting in.
https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:681: auto GetSet = [&](base::char16 c) -> const WordIDSet& { On 2017/02/26 01:12:17, dyaroshev wrote: > On 2017/02/25 04:54:42, Peter Kasting wrote: > > Nit: I would explicitly capture [this, &empty_set]. > > > > That said; can you return the set by value and not const ref, and avoid > > declaring |empty_set| at all (just return WordIDSet() in the lambda)? > > Not applicable. > > Returning by value would add an extra copy which can be expensive. True. But is it expensive enough to matter? One of the themes of my review comments should be taken to be "use simpler and more readable forms when the performance difference has little practical effect". I honestly don't know the answer here, that's just my general guidance for the future.
https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:681: auto GetSet = [&](base::char16 c) -> const WordIDSet& { On 2017/02/26 01:35:18, Peter Kasting wrote: > On 2017/02/26 01:12:17, dyaroshev wrote: > > On 2017/02/25 04:54:42, Peter Kasting wrote: > > > Nit: I would explicitly capture [this, &empty_set]. > > > > > > That said; can you return the set by value and not const ref, and avoid > > > declaring |empty_set| at all (just return WordIDSet() in the lambda)? > > > > Not applicable. > > > > Returning by value would add an extra copy which can be expensive. > > True. But is it expensive enough to matter? > > One of the themes of my review comments should be taken to be "use simpler and > more readable forms when the performance difference has little practical > effect". I honestly don't know the answer here, that's just my general guidance > for the future. Well, for std::set I've seen cases where it does. I agree with the general guidance.
The CQ bit was checked by dyaroshev@yandex-team.ru
The patchset sent to the CQ was uploaded after l-g-t-m from pkasting@chromium.org Link to the patchset: https://codereview.chromium.org/2690303012/#ps100001 (title: "Review, round 4.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: android_compile_dbg on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_comp...)
The CQ bit was checked by dyaroshev@yandex-team.ru
The patchset sent to the CQ was uploaded after l-g-t-m from pkasting@chromium.org Link to the patchset: https://codereview.chromium.org/2690303012/#ps120001 (title: "Compilation issues with old standard library.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: android_compile_dbg on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_comp...)
The CQ bit was checked by dyaroshev@yandex-team.ru
The patchset sent to the CQ was uploaded after l-g-t-m from pkasting@chromium.org Link to the patchset: https://codereview.chromium.org/2690303012/#ps140001 (title: "Dropping noexcept from move operations.")
On 2017/02/26 18:24:32, commit-bot: I haz the power wrote: > Try jobs failed on following builders: > android_compile_dbg on master.tryserver.chromium.android (JOB_FAILED, > https://build.chromium.org/p/tryserver.chromium.android/builders/android_comp...) Move operations with noexcept do not compile on android. I've reported it on the discussion thread: https://groups.google.com/a/chromium.org/forum/#!topic/chromium-dev/8i4tMqNpHhg
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 140001, "attempt_start_ts": 1488217114168340, "parent_rev": "3f42bbc56b1f750a4f4df136bacb5b079dcc50df", "commit_rev": "950ccead606c70df01cb13625fdb2da24729345d"}
Message was sent while issue was closed.
Description was changed from ========== Cleaning up before switching to flat sets in HQP. BUG=693225 ========== to ========== Cleaning up before switching to flat sets in HQP. BUG=693225 Review-Url: https://codereview.chromium.org/2690303012 Cr-Commit-Position: refs/heads/master@{#453269} Committed: https://chromium.googlesource.com/chromium/src/+/950ccead606c70df01cb13625fdb... ==========
Message was sent while issue was closed.
Committed patchset #8 (id:140001) as https://chromium.googlesource.com/chromium/src/+/950ccead606c70df01cb13625fdb...
Message was sent while issue was closed.
LGTM Most of the comments below ended up being about the original code you restored, which I only reviewed because it was a diff from the previous patch set. They're all optional, you're welcome to do them in some following patch if you think they're a win, or ignore them. https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... components/omnibox/browser/url_index_private_data.cc:128: auto res = ContinerCast<ResultingSet>(get_set(*first)); On 2017/02/26 01:12:16, dyaroshev wrote: > On 2017/02/25 04:54:42, Peter Kasting wrote: > > It seems like the main reason you need ContainerCast is because one of the > > callers is trying to supply a vector type for ResultingSet. > > > > This sort of makes the type names and comments of this function a lie. They > > shouldn't explicitly say "set" if this is really just about working on > arbitrary > > containers. > > > > But the other fix would be to intersect to a set, and then pull a vector out > of > > that at the end to return from the caller that wants to do so. This would > allow > > removing all the ContainerCast stuff, which still seems like a desirable goal > to > > me in terms of lowering boilerplate and improving readability. It's not > > instantly obvious how it would affect efficiency, but it seems like it would > > lower it slightly (but maybe not enough to matter). > > Since this is the cleaning up patch and I don't see a 100% satisfying solution - > I've reverted the intersect patch, created a task for writing a generic > union/intersect functions and added a todo. Is this ok? It's fine. https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... File components/omnibox/browser/url_index_private_data.cc (right): https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... components/omnibox/browser/url_index_private_data.cc:486: base::string16 uni_word = *iter; Nit: Inline into next statement. https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... components/omnibox/browser/url_index_private_data.cc:489: history_ids.clear(); Nit: Just "return HistoryIDVector();" rather than clear() + break; + return. (here, plus two similar places below) https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... components/omnibox/browser/url_index_private_data.cc:632: word_id_set.clear(); Nit: Same comment as above https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... components/omnibox/browser/url_index_private_data.cc:635: WordIDSet& char_word_id_set(char_iter->second); Nit: Can be const https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... components/omnibox/browser/url_index_private_data.cc:644: // First character results becomes base set of results. Nit: These comments just restate the code, remove. https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... components/omnibox/browser/url_index_private_data.cc:712: if (new_scored_match.raw_score != 0) Nit: > 0?
Message was sent while issue was closed.
On 2017/02/28 00:33:55, Peter Kasting wrote: > LGTM > > Most of the comments below ended up being about the original code you restored, > which I only reviewed because it was a diff from the previous patch set. > They're all optional, you're welcome to do them in some following patch if you > think they're a win, or ignore them. > > https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... > File components/omnibox/browser/url_index_private_data.cc (right): > > https://codereview.chromium.org/2690303012/diff/80001/components/omnibox/brow... > components/omnibox/browser/url_index_private_data.cc:128: auto res = > ContinerCast<ResultingSet>(get_set(*first)); > On 2017/02/26 01:12:16, dyaroshev wrote: > > On 2017/02/25 04:54:42, Peter Kasting wrote: > > > It seems like the main reason you need ContainerCast is because one of the > > > callers is trying to supply a vector type for ResultingSet. > > > > > > This sort of makes the type names and comments of this function a lie. They > > > shouldn't explicitly say "set" if this is really just about working on > > arbitrary > > > containers. > > > > > > But the other fix would be to intersect to a set, and then pull a vector out > > of > > > that at the end to return from the caller that wants to do so. This would > > allow > > > removing all the ContainerCast stuff, which still seems like a desirable > goal > > to > > > me in terms of lowering boilerplate and improving readability. It's not > > > instantly obvious how it would affect efficiency, but it seems like it would > > > lower it slightly (but maybe not enough to matter). > > > > Since this is the cleaning up patch and I don't see a 100% satisfying solution > - > > I've reverted the intersect patch, created a task for writing a generic > > union/intersect functions and added a todo. Is this ok? > > It's fine. > > https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... > File components/omnibox/browser/url_index_private_data.cc (right): > > https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... > components/omnibox/browser/url_index_private_data.cc:486: base::string16 > uni_word = *iter; > Nit: Inline into next statement. > > https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... > components/omnibox/browser/url_index_private_data.cc:489: history_ids.clear(); > Nit: Just "return HistoryIDVector();" rather than clear() + break; + return. > (here, plus two similar places below) > > https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... > components/omnibox/browser/url_index_private_data.cc:632: word_id_set.clear(); > Nit: Same comment as above > > https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... > components/omnibox/browser/url_index_private_data.cc:635: WordIDSet& > char_word_id_set(char_iter->second); > Nit: Can be const > > https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... > components/omnibox/browser/url_index_private_data.cc:644: // First character > results becomes base set of results. > Nit: These comments just restate the code, remove. > > https://codereview.chromium.org/2690303012/diff/100001/components/omnibox/bro... > components/omnibox/browser/url_index_private_data.cc:712: if > (new_scored_match.raw_score != 0) > Nit: > 0? Sorry that I merged without waiting - I thought that since we have another patch to go - I'll do your suggestions there. |