|
|
DescriptionAdd range insertion for base::flat_tree
This change implements an efficient algorithm for base::flat_tree::insert(Iter, Iter).
BUG=671759, 682249
Review-Url: https://codereview.chromium.org/2853333004
Cr-Commit-Position: refs/heads/master@{#473148}
Committed: https://chromium.googlesource.com/chromium/src/+/154dc49655238f0fc9928763bcf47930823a8e02
Patch Set 1 #Patch Set 2 : Faster Algorithm #
Total comments: 3
Patch Set 3 : Rename prev_end to middle and check for sortedness #
Total comments: 13
Patch Set 4 : Add iterator arguments to sort_and_unique and test for empty range #
Total comments: 4
Patch Set 5 : Reuse Binary Search Result and Dupes #
Total comments: 11
Patch Set 6 : Next Round #
Total comments: 6
Patch Set 7 : Address comments #
Total comments: 2
Patch Set 8 : Next Round #
Total comments: 12
Patch Set 9 : Next Round of Comments #
Total comments: 10
Patch Set 10 : Another round of comments #
Messages
Total messages: 83 (53 generated)
The CQ bit was checked by jdoerrie@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_compile_dbg on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_comp...)
The CQ bit was checked by jdoerrie@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_tsan_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by jdoerrie@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Description was changed from ========== Add range insertion for base::flat_tree BUG= ========== to ========== Add range insertion for base::flat_tree This change implements an efficient algorithm for base::flat_tree::insert(Iter, Iter). BUG=671759, 682249 ==========
jdoerrie@chromium.org changed reviewers: + brettw@chromium.org, danakj@chromium.org, dyaroshev@yandex-team.ru
Dear all, please review. This change follows the discussion between dyaroshev@ and me in http://crbug.com/682249. https://codereview.chromium.org/2853333004/diff/20001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/20001/base/containers/flat_tr... base/containers/flat_tree.h:614: size_type prev_size = size(); Here we could add checks to immediately return if first == last or to dispatch to single element insert when std::next(first) == last. I did not do this, because in the current implementation all operations are no-ops if first == last and explicitly checking if std::next(first) == last breaks this algorithm for InputIterators (strictly speaking, the optimization would only work for ForwardIterators). This is also one of the reasons why I did not add a call to reserve here. This also would break for ForwardIterators and could potentially lead to very bad performance if insert(Iter, Iter) and thus reserve() would be called in a loop. https://codereview.chromium.org/2853333004/diff/20001/base/containers/flat_tr... base/containers/flat_tree.h:636: std::lower_bound(begin(), prev_end, *prev_end, value_comp()), prev_end, Here I'm assuming standard library implementations are smart enough to simply return immediately when lower_bound() == prev_end. We could add explicit checks for this, and also add a O(1) check to skip the binary search when it's clearly not necessary (i.e. when the whole range is already sorted). In this version I decided against it, because I wanted to keep explicit checks to a minimum and just make the algorithm as fast as possible for the general case. What is your opinion here?
I didn't look at this yet, but I'm noticing I got a similar code review: https://codereview.chromium.org/2859593002/ (it looks like that one was abandoned, not sure what the decision was).
The CQ bit was checked by jdoerrie@chromium.org to run a CQ dry run
jdoerrie@chromium.org changed reviewers: + vmpstr@chromium.org
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thanks for the link to the other CL, I wasn't aware of it. It looks like it was abandoned because it wasn't clear how it would perform for many inserts of very small ranges. In this implementation I took extra care to be as efficient as possible for that case, but I still won't be able to match the O(n) single element insert with a hint can provide for already sorted ranges. However, I do like vmpstr@'s idea of providing an API that merges two flat_trees. There we would be able to guarantee O(n), given that we're only merging two already sorted ranges. This does actually mirror the STL, since as of C++17 there are std::set::merge and std::map::merge: - http://en.cppreference.com/w/cpp/container/set/merge - http://en.cppreference.com/w/cpp/container/map/merge The implementation would be different and we couldn't guarantee stable pointers and iterators, but the API is the same. https://codereview.chromium.org/2853333004/diff/20001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/20001/base/containers/flat_tr... base/containers/flat_tree.h:636: std::lower_bound(begin(), prev_end, *prev_end, value_comp()), prev_end, On 2017/05/08 14:52:52, jdoerrie (slow this week) wrote: > Here I'm assuming standard library implementations are smart enough to simply > return immediately when lower_bound() == prev_end. We could add explicit checks > for this, and also add a O(1) check to skip the binary search when it's clearly > not necessary (i.e. when the whole range is already sorted). In this version I > decided against it, because I wanted to keep explicit checks to a minimum and > just make the algorithm as fast as possible for the general case. What is your > opinion here? I ended up implementing the extra check for sorted-ness, because I realized I could do it in the same if condition and it removes the need to hope library implementations are doing the optimization.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...) win_clang on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_clang/builds/...)
https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:619: }); You are doing binary search for the first non-unique element twice. Save the result. When you do - please split into find_if and copy_if, so you do not do unnecessary if checks for remaining elements, they seem to be expensive. https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:631: end()); Just add begin/end to sort_and_unque method?
On 2017/05/08 23:11:43, jdoerrie (slow this week) wrote: > On 2017/05/08 14:52:52, jdoerrie (slow this week) wrote: > > Here I'm assuming standard library implementations are smart enough to simply > > return immediately when lower_bound() == prev_end. We could add explicit > checks > > for this, and also add a O(1) check to skip the binary search when it's > clearly > > not necessary (i.e. when the whole range is already sorted). In this version I > > decided against it, because I wanted to keep explicit checks to a minimum and > > just make the algorithm as fast as possible for the general case. What is your > > opinion here? > > I ended up implementing the extra check for sorted-ness, because I realized I > could do it in the same if condition and it removes the need to hope library > implementations are doing the optimization. The standard library does not check for lower bound, but it does check for empty ranges (maybe a few layers deeper than you want). Check doesn't seem to make that much sence. https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:637: std::inplace_merge(std::lower_bound(begin(), middle, *middle, value_comp()), !value_comp()(*std::prev(middle), *middle)) <==> lower_bound == middle. For an empty range standard library should shortcut efficiently.
https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... File base/containers/flat_tree_unittest.cc (right): https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... base/containers/flat_tree_unittest.cc:772: } An empty range check is important. Maybe we want to generate some inputs and check against those? Doesn't seem that hard.
https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:619: }); On 2017/05/09 01:38:49, dyaroshev wrote: > You are doing binary search for the first non-unique element twice. > Save the result. > When you do - please split into find_if and copy_if, so you do not do > unnecessary if checks for remaining elements, they seem to be expensive. I thought about it a bit - you would have to find new minimal element and cache binary search for it, not the first guy in new elements.
The CQ bit was checked by jdoerrie@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:619: }); On 2017/05/09 11:31:42, dyaroshev wrote: > On 2017/05/09 01:38:49, dyaroshev wrote: > > You are doing binary search for the first non-unique element twice. > > Save the result. > > When you do - please split into find_if and copy_if, so you do not do > > unnecessary if checks for remaining elements, they seem to be expensive. > > I thought about it a bit - you would have to find new minimal element and cache > binary search for it, not the first guy in new elements. Yeah, I gave this some though as well. Given that the vector might reallocate during insertion, we would need to cache the index instead of an iterator to the first insertion point and use that later. It's doable, but it makes the code a bit more ugly. What do you think? https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:631: end()); On 2017/05/09 01:38:49, dyaroshev wrote: > Just add begin/end to sort_and_unque method? Done. https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:637: std::inplace_merge(std::lower_bound(begin(), middle, *middle, value_comp()), On 2017/05/09 01:52:50, dyaroshev wrote: > !value_comp()(*std::prev(middle), *middle)) <==> lower_bound == middle. > For an empty range standard library should shortcut efficiently. > That is true, however an advantage with the current implementation is that it also skips the binary search when its not necessary. This can result in a speed-up of about 25% for doing single element range inserts in ascending order: Number of Items: 1048576 Method: Without Check= 121599 us Number of Items: 1048576 Method: With Check= 94437 us Number of Items: 2097152 Method: Without Check= 242699 us Number of Items: 2097152 Method: With Check= 152297 us Number of Items: 4194304 Method: Without Check= 385009 us Number of Items: 4194304 Method: With Check= 293204 us Number of Items: 8388608 Method: Without Check= 788019 us Number of Items: 8388608 Method: With Check= 609145 us Number of Items: 16777216 Method: Without Check= 1628433 us Number of Items: 16777216 Method: With Check= 1240901 us Number of Items: 33554432 Method: Without Check= 3341725 us Number of Items: 33554432 Method: With Check= 2590313 us https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... File base/containers/flat_tree_unittest.cc (right): https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... base/containers/flat_tree_unittest.cc:772: } On 2017/05/09 09:32:46, dyaroshev wrote: > An empty range check is important. > > Maybe we want to generate some inputs and check against those? Doesn't seem that > hard. Done.
https://codereview.chromium.org/2853333004/diff/60001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/60001/base/containers/flat_tr... base/containers/flat_tree.h:204: void insert(InputIterator first, InputIterator last); I hate diverging from STL, but we've been consistent in the other "range" cases to specify a FlatContainerDupes parameter. We probably want that here also.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_arm64_dbg_recipe on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_arm6...)
https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:619: }); On 2017/05/09 17:03:30, jdoerrie (slow this week) wrote: > On 2017/05/09 11:31:42, dyaroshev wrote: > > On 2017/05/09 01:38:49, dyaroshev wrote: > > > You are doing binary search for the first non-unique element twice. > > > Save the result. > > > When you do - please split into find_if and copy_if, so you do not do > > > unnecessary if checks for remaining elements, they seem to be expensive. > > > > I thought about it a bit - you would have to find new minimal element and > cache > > binary search for it, not the first guy in new elements. > > Yeah, I gave this some though as well. Given that the vector might reallocate > during insertion, we would need to cache the index instead of an iterator to the > first insertion point and use that later. It's doable, but it makes the code a > bit more ugly. What do you think? I think, that this binary search might be why you are noticably slower than insert(value_type). It's base - I say - worth it. https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:637: std::inplace_merge(std::lower_bound(begin(), middle, *middle, value_comp()), On 2017/05/09 17:03:30, jdoerrie (slow this week) wrote: > On 2017/05/09 01:52:50, dyaroshev wrote: > > !value_comp()(*std::prev(middle), *middle)) <==> lower_bound == middle. > > For an empty range standard library should shortcut efficiently. > > > > That is true, however an advantage with the current implementation is that it > also skips the binary search when its not necessary. This can result in a > speed-up of about 25% for doing single element range inserts in ascending order: > > Number of Items: 1048576 Method: Without Check= 121599 us > Number of Items: 1048576 Method: With Check= 94437 us > > Number of Items: 2097152 Method: Without Check= 242699 us > Number of Items: 2097152 Method: With Check= 152297 us > > Number of Items: 4194304 Method: Without Check= 385009 us > Number of Items: 4194304 Method: With Check= 293204 us > > Number of Items: 8388608 Method: Without Check= 788019 us > Number of Items: 8388608 Method: With Check= 609145 us > > Number of Items: 16777216 Method: Without Check= 1628433 us > Number of Items: 16777216 Method: With Check= 1240901 us > > Number of Items: 33554432 Method: Without Check= 3341725 us > Number of Items: 33554432 Method: With Check= 2590313 us Is it your binary search or is it the one in inplace_merge? If it yours, caching it would help. I'm all for caching. https://codereview.chromium.org/2853333004/diff/60001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/60001/base/containers/flat_tr... base/containers/flat_tree.h:204: void insert(InputIterator first, InputIterator last); On 2017/05/09 20:25:45, brettw (behind--catching up) wrote: > I hate diverging from STL, but we've been consistent in the other "range" cases > to specify a FlatContainerDupes parameter. We probably want that here also. I thought about that and I don't see how to do it efficiently (at the very least right away). It's consistent with insert(value_type) - can we say that it's consisten enough?
https://codereview.chromium.org/2853333004/diff/60001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/60001/base/containers/flat_tr... base/containers/flat_tree.h:204: void insert(InputIterator first, InputIterator last); On 2017/05/09 21:37:19, dyaroshev wrote: > On 2017/05/09 20:25:45, brettw (behind--catching up) wrote: > > I hate diverging from STL, but we've been consistent in the other "range" > cases > > to specify a FlatContainerDupes parameter. We probably want that here also. > > I thought about that and I don't see how to do it efficiently (at the very least > right away). It's consistent with insert(value_type) - can we say that it's > consisten enough? I thought about it a bit more - probably we can make it work. Will KEEP_LAST replace the elements that are already in?
On 2017/05/09 22:04:05, dyaroshev wrote: > I thought about it a bit more - probably we can make it work. Will KEEP_LAST > replace the elements that are already in? I suppose that would be nice.
The CQ bit was checked by jdoerrie@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:619: }); On 2017/05/09 21:37:19, dyaroshev wrote: > On 2017/05/09 17:03:30, jdoerrie (slow this week) wrote: > > On 2017/05/09 11:31:42, dyaroshev wrote: > > > On 2017/05/09 01:38:49, dyaroshev wrote: > > > > You are doing binary search for the first non-unique element twice. > > > > Save the result. > > > > When you do - please split into find_if and copy_if, so you do not do > > > > unnecessary if checks for remaining elements, they seem to be expensive. > > > > > > I thought about it a bit - you would have to find new minimal element and > > cache > > > binary search for it, not the first guy in new elements. > > > > Yeah, I gave this some though as well. Given that the vector might reallocate > > during insertion, we would need to cache the index instead of an iterator to > the > > first insertion point and use that later. It's doable, but it makes the code a > > bit more ugly. What do you think? > > I think, that this binary search might be why you are noticably slower than > insert(value_type). It's base - I say - worth it. Done. https://codereview.chromium.org/2853333004/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:637: std::inplace_merge(std::lower_bound(begin(), middle, *middle, value_comp()), On 2017/05/09 21:37:19, dyaroshev wrote: > On 2017/05/09 17:03:30, jdoerrie (slow this week) wrote: > > On 2017/05/09 01:52:50, dyaroshev wrote: > > > !value_comp()(*std::prev(middle), *middle)) <==> lower_bound == middle. > > > For an empty range standard library should shortcut efficiently. > > > > > > > That is true, however an advantage with the current implementation is that it > > also skips the binary search when its not necessary. This can result in a > > speed-up of about 25% for doing single element range inserts in ascending > order: > > > > Number of Items: 1048576 Method: Without Check= 121599 us > > Number of Items: 1048576 Method: With Check= 94437 us > > > > Number of Items: 2097152 Method: Without Check= 242699 us > > Number of Items: 2097152 Method: With Check= 152297 us > > > > Number of Items: 4194304 Method: Without Check= 385009 us > > Number of Items: 4194304 Method: With Check= 293204 us > > > > Number of Items: 8388608 Method: Without Check= 788019 us > > Number of Items: 8388608 Method: With Check= 609145 us > > > > Number of Items: 16777216 Method: Without Check= 1628433 us > > Number of Items: 16777216 Method: With Check= 1240901 us > > > > Number of Items: 33554432 Method: Without Check= 3341725 us > > Number of Items: 33554432 Method: With Check= 2590313 us > > Is it your binary search or is it the one in inplace_merge? If it yours, caching > it would help. I'm all for caching. Yeah, this was my binary search. I think (buffered) inplace_merge can't do binary search because of the at most n-1 comparisons requirement. In the worst case binary search does not help at all and you have to do N - 1 + O(log N) comparisons. I did implement the caching solution in the new patch set, however. https://codereview.chromium.org/2853333004/diff/60001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/60001/base/containers/flat_tr... base/containers/flat_tree.h:204: void insert(InputIterator first, InputIterator last); On 2017/05/09 22:04:05, dyaroshev wrote: > On 2017/05/09 21:37:19, dyaroshev wrote: > > On 2017/05/09 20:25:45, brettw (behind--catching up) wrote: > > > I hate diverging from STL, but we've been consistent in the other "range" > > cases > > > to specify a FlatContainerDupes parameter. We probably want that here also. > > > > I thought about that and I don't see how to do it efficiently (at the very > least > > right away). It's consistent with insert(value_type) - can we say that it's > > consisten enough? > > I thought about it a bit more - probably we can make it work. Will KEEP_LAST > replace the elements that are already in? Done.
owners LGTM I didn't look at the logic details, I think dyaroshev was looking at that so please wait for his OK. Can you add the new insert function to the quick reference in flat_set.h and flat_map.h? The problem is most people don't look in the base class when they want to know what they can call. https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... base/containers/flat_tree.h:204: void insert(InputIterator first, Can you add a comment here about FlatContainerDupes applying to both the existing and inserted items (i.e. it overwrites on "LAST").
https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... base/containers/flat_tree.h:621: difference_type pos_first_new = std::distance(begin(), end()); Nit: Just say size()? Or are there problems with difference_type/size_type conversions? I would use two different variables for pos_first_new and original_size, it's slightly confusing this way. (Do not insist). https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... base/containers/flat_tree.h:624: }; How do you feel about reserve if we can compute distance? Folly does it. You can do it with an if, by the way, if you want to: if (std::is_same<typename std::iterator_traits<InputIterator>::iterator_category, std::random_access_iterator_tag>::value) reserve(std::distance(first, last)); https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... base/containers/flat_tree.h:632: } else if (lower != middle() && dupes == KEEP_LAST_OF_DUPES) { 1 - The check for (dupes == KEEP_LAST_OF_DUPES) should not be inside a loop. 2 - Maybe factor this piece out (I do not insist) https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... base/containers/flat_tree.h:643: value_comp()); Have you measured? Are you closer now to inserting by one element?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by jdoerrie@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... base/containers/flat_tree.h:204: void insert(InputIterator first, On 2017/05/10 16:52:06, brettw (behind--catching up) wrote: > Can you add a comment here about FlatContainerDupes applying to both the > existing and inserted items (i.e. it overwrites on "LAST"). Done. https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... base/containers/flat_tree.h:621: difference_type pos_first_new = std::distance(begin(), end()); On 2017/05/10 17:12:09, dyaroshev wrote: > Nit: > Just say size()? Or are there problems with difference_type/size_type > conversions? > I would use two different variables for pos_first_new and original_size, it's > slightly confusing this way. > > (Do not insist). Done both now. There usually is a difference, difference_type is commonly ptrdiff_t, while size_type is size_t. However, I don't think there should be issues initializing one with the other. https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... base/containers/flat_tree.h:624: }; On 2017/05/10 17:12:09, dyaroshev wrote: > How do you feel about reserve if we can compute distance? > Folly does it. > > You can do it with an if, by the way, if you want to: > if (std::is_same<typename > std::iterator_traits<InputIterator>::iterator_category, > std::random_access_iterator_tag>::value) > reserve(std::distance(first, last)); I don't like using reserve, because it can lead to very bad performance when this method is called in a loop. Commonly reserve(n) only allocates space for exactly n elements, causing a reallocation every single time. This leads to quadratic performance as the benchmark shows: Number of Items: 16384 Method: With Reserve= 163415 us Number of Items: 16384 Method: Without Reserve= 2556 us Number of Items: 32768 Method: With Reserve= 654739 us Number of Items: 32768 Method: Without Reserve= 4798 us Number of Items: 65536 Method: With Reserve= 2017561 us Number of Items: 65536 Method: Without Reserve= 9691 us Number of Items: 131072 Method: With Reserve= 8027410 us Number of Items: 131072 Method: Without Reserve= 18062 us Number of Items: 262144 Method: With Reserve= 32167225 us Number of Items: 262144 Method: Without Reserve= 32479 us We could fix this by doing reserve(std::max(size() + last - first, growth_factor * size()), but I don't like the idea of defining a growth factor ourself and messing with what standard library implementations intended. However, I do like your idea of checking for multipass iterators in order to optimize for insertions of one element. With the introduction of dupes we cannot simply dispatch to insert(val) anymore, but we can provide a simplified implementation for this case. Here are the benchmarks for it: Number of Items: 1048576 Method: One By One = 43246 us Number of Items: 1048576 Method: Special Case = 38016 us Number of Items: 1048576 Method: No Special Case = 159393 us Number of Items: 2097152 Method: One By One = 95512 us Number of Items: 2097152 Method: Special Case = 80637 us Number of Items: 2097152 Method: No Special Case = 306446 us Number of Items: 4194304 Method: One By One = 204112 us Number of Items: 4194304 Method: Special Case = 167022 us Number of Items: 4194304 Method: No Special Case = 488665 us Number of Items: 8388608 Method: One By One = 309807 us Number of Items: 8388608 Method: Special Case = 265266 us Number of Items: 8388608 Method: No Special Case = 990554 us Number of Items: 16777216 Method: One By One = 645032 us Number of Items: 16777216 Method: Special Case = 570092 us Number of Items: 16777216 Method: No Special Case = 2005552 us Number of Items: 33554432 Method: One By One = 1355454 us Number of Items: 33554432 Method: Special Case = 1177451 us Number of Items: 33554432 Method: No Special Case = 4051799 us Doing the special case is 3 - 4x faster and can even out perform one-by-one, therefore I added this now in this patch set. https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... base/containers/flat_tree.h:632: } else if (lower != middle() && dupes == KEEP_LAST_OF_DUPES) { On 2017/05/10 17:12:09, dyaroshev wrote: > 1 - The check for (dupes == KEEP_LAST_OF_DUPES) should not be inside a loop. > 2 - Maybe factor this piece out (I do not insist) Done. https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... base/containers/flat_tree.h:643: value_comp()); On 2017/05/10 17:12:09, dyaroshev wrote: > Have you measured? Are you closer now to inserting by one element? See above.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/80001/base/containers/flat_tr... base/containers/flat_tree.h:624: }; On 2017/05/11 06:34:57, jdoerrie (slow this week) wrote: > On 2017/05/10 17:12:09, dyaroshev wrote: > > How do you feel about reserve if we can compute distance? > > Folly does it. > > > > You can do it with an if, by the way, if you want to: > > if (std::is_same<typename > > std::iterator_traits<InputIterator>::iterator_category, > > std::random_access_iterator_tag>::value) > > reserve(std::distance(first, last)); > > I don't like using reserve, because it can lead to very bad performance when > this method is called in a loop. Commonly reserve(n) only allocates space for > exactly n elements, causing a reallocation every single time. This leads to > quadratic performance as the benchmark shows: > > Number of Items: 16384 Method: With Reserve= 163415 us > Number of Items: 16384 Method: Without Reserve= 2556 us > > Number of Items: 32768 Method: With Reserve= 654739 us > Number of Items: 32768 Method: Without Reserve= 4798 us > > Number of Items: 65536 Method: With Reserve= 2017561 us > Number of Items: 65536 Method: Without Reserve= 9691 us > > Number of Items: 131072 Method: With Reserve= 8027410 us > Number of Items: 131072 Method: Without Reserve= 18062 us > > Number of Items: 262144 Method: With Reserve= 32167225 us > Number of Items: 262144 Method: Without Reserve= 32479 us > > We could fix this by doing reserve(std::max(size() + last - first, growth_factor > * size()), but I don't like the idea of defining a growth factor ourself and > messing with what standard library implementations intended. > > However, I do like your idea of checking for multipass iterators in order to > optimize for insertions of one element. With the introduction of dupes we cannot > simply dispatch to insert(val) anymore, but we can provide a simplified > implementation for this case. Here are the benchmarks for it: > > Number of Items: 1048576 Method: One By One = 43246 us > Number of Items: 1048576 Method: Special Case = 38016 us > Number of Items: 1048576 Method: No Special Case = 159393 us > > Number of Items: 2097152 Method: One By One = 95512 us > Number of Items: 2097152 Method: Special Case = 80637 us > Number of Items: 2097152 Method: No Special Case = 306446 us > > Number of Items: 4194304 Method: One By One = 204112 us > Number of Items: 4194304 Method: Special Case = 167022 us > Number of Items: 4194304 Method: No Special Case = 488665 us > > Number of Items: 8388608 Method: One By One = 309807 us > Number of Items: 8388608 Method: Special Case = 265266 us > Number of Items: 8388608 Method: No Special Case = 990554 us > > Number of Items: 16777216 Method: One By One = 645032 us > Number of Items: 16777216 Method: Special Case = 570092 us > Number of Items: 16777216 Method: No Special Case = 2005552 us > > Number of Items: 33554432 Method: One By One = 1355454 us > Number of Items: 33554432 Method: Special Case = 1177451 us > Number of Items: 33554432 Method: No Special Case = 4051799 us > > Doing the special case is 3 - 4x faster and can even out perform one-by-one, > therefore I added this now in this patch set. That's a good point. https://codereview.chromium.org/2853333004/diff/100001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/100001/base/containers/flat_t... base/containers/flat_tree.h:636: if (is_multipass<InputIterator>() && std::next(first) == last) { We probably need this to be smth along the lines of if (distance < boundary) -> do insert by one. But not in this patch. https://codereview.chromium.org/2853333004/diff/100001/base/containers/flat_t... base/containers/flat_tree.h:656: difference_type pos_first_new = size(); Nit: declare pos_first_new where you need it. https://codereview.chromium.org/2853333004/diff/100001/base/containers/flat_t... base/containers/flat_tree.h:669: case KEEP_LAST_OF_DUPES: Don't do this check inside a loop. Even if it's optimised away: a) it's a good practice, b) the debug build would still be slower.
The CQ bit was checked by jdoerrie@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Next round, I tried to de-duplicate the code a bit. Performance for single element insertions is still on-par with inserts one by one. https://codereview.chromium.org/2853333004/diff/100001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/100001/base/containers/flat_t... base/containers/flat_tree.h:636: if (is_multipass<InputIterator>() && std::next(first) == last) { On 2017/05/11 09:46:43, dyaroshev wrote: > We probably need this to be smth along the lines of if (distance < boundary) -> > do insert by one. But not in this patch. Acknowledged, although even doing inplace inserts for a range of size two can already be very bad if you insert two new elements at the very beginning of an already big set. https://codereview.chromium.org/2853333004/diff/100001/base/containers/flat_t... base/containers/flat_tree.h:656: difference_type pos_first_new = size(); On 2017/05/11 09:46:43, dyaroshev wrote: > Nit: declare pos_first_new where you need it. Done. https://codereview.chromium.org/2853333004/diff/100001/base/containers/flat_t... base/containers/flat_tree.h:669: case KEEP_LAST_OF_DUPES: On 2017/05/11 09:46:43, dyaroshev wrote: > Don't do this check inside a loop. Even if it's optimised away: a) it's a good > practice, b) the debug build would still be slower. Done.
https://codereview.chromium.org/2853333004/diff/120001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/120001/base/containers/flat_t... base/containers/flat_tree.h:663: } Ok, I seem to fail to explain. Check out this CL for example: https://codereview.chromium.org/2807933002/ It gains up to two times speed up by removing the if statement in the middle of the loop. What I suggest is some form of: ``` if (insert_inplace) { std::copy(first, last, std::inserter(*this, end()); } if (overwrite_existing) { // replace_or_pushback } else { std::copy_if(first, last, ...); } ``` Branches in the middle of the loop cause problems. Other than that I'm good.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by jdoerrie@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
https://codereview.chromium.org/2853333004/diff/120001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/120001/base/containers/flat_t... base/containers/flat_tree.h:663: } On 2017/05/12 11:23:28, dyaroshev wrote: > Ok, I seem to fail to explain. Check out this CL for example: > https://codereview.chromium.org/2807933002/ > It gains up to two times speed up by removing the if statement in the middle of > the loop. > > What I suggest is some form of: > > ``` > if (insert_inplace) { > std::copy(first, last, std::inserter(*this, end()); > } > if (overwrite_existing) { > // replace_or_pushback > } else { > std::copy_if(first, last, ...); > } > ``` > > Branches in the middle of the loop cause problems. Other than that I'm good. Thanks for the clarification. I tried my best to remove branches within loops while still keeping code duplication to a minimum. The performance is still very good, please let me know what you think.
Move only won't work( (std::make_move_iterator allows you to insert ranges of move only elements) Nit: Maybe you want to factor some methods out? Like replace_or_append or insert_or_replace? They can be private? It's up to you, just the method is very long. https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... base/containers/flat_tree.h:640: if (insert_inplace) { Maybe factor this piece out? Function seems quite big? (Do not insist) https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... base/containers/flat_tree.h:642: std::for_each(first, last, [this](const value_type& val) { This won't work for move only types, and you will invoke copy even for move_iterators. Sinse we don't have generic lambdas and you need perfect forwarding, I don't think it's possible to use a lambda here. https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... base/containers/flat_tree.h:664: const value_type& val) { Again, move is problematic. If you take the push_back out of here - you won't need it.
The CQ bit was checked by jdoerrie@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... base/containers/flat_tree.h:640: if (insert_inplace) { On 2017/05/12 22:10:03, dyaroshev wrote: > Maybe factor this piece out? Function seems quite big? (Do not insist) Done. https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... base/containers/flat_tree.h:642: std::for_each(first, last, [this](const value_type& val) { On 2017/05/12 22:10:03, dyaroshev wrote: > This won't work for move only types, and you will invoke copy even for > move_iterators. > Sinse we don't have generic lambdas and you need perfect forwarding, I don't > think it's possible to use a lambda here. Done. https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... base/containers/flat_tree.h:648: std::copy(first, last, std::inserter(*this, end())); I dropped the call to copy in favor of an explicit loop, because copy can be unsafe if std::distance(first, last) >= 2. Since end() is only evaluated once, it will be invalidated should the first inserted element trigger a reallocation. For the second element we then would try to dereference an invalid iterator. https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... base/containers/flat_tree.h:656: difference_type pos_first_new = original_size; I dropped this optimization, because it would have made the interface of append_* more awkward. Furthermore, the reason for this optimization was mainly efficiency for small ranges, however given that we do insertion inplace in this case it is no longer critical. Getting rid of it makes the code cleaner and has to measurable performance implication for bigger ranges. https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... base/containers/flat_tree.h:664: const value_type& val) { On 2017/05/12 22:10:03, dyaroshev wrote: > Again, move is problematic. If you take the push_back out of here - you won't > need it. Done. https://codereview.chromium.org/2853333004/diff/160001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/160001/base/containers/flat_t... base/containers/flat_tree.h:691: is_multipass<InputIterator>() && std::next(first) == last; You were right with your earlier comment that this should be generalized to std::distance(first, last) <= kSomeThreshold. I did some quick testing and for me the sweet spot seems to be around 6 elements. However, as you said, this change should be done in a future CL.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
LGTM with nits. (not going to torture you anymore)) Great work. I think that caching lower_bound result is important, I wrote my suggestions on how you can do this, but I won't hold it against you if you don't. I've shared our experience with SG14 (study group for the standardization committee) here: https://groups.google.com/a/isocpp.org/forum/#!topic/sg14/E-glkgcgCaY The conversation has been somewhat hijacked with esoteric cases, but the question of insert(first, last) for sure is of value. It would be great to share your results there. Are you willing to do it? If you aren't - can I? (obviously I'd say that it's your research). https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... base/containers/flat_tree.h:648: std::copy(first, last, std::inserter(*this, end())); On 2017/05/17 06:41:16, jdoerrie wrote: > I dropped the call to copy in favor of an explicit loop, because copy can be > unsafe if std::distance(first, last) >= 2. Since end() is only evaluated once, > it will be invalidated should the first inserted element trigger a reallocation. > For the second element we then would try to dereference an invalid iterator. Nah, it assigns the iterator. Smth like: pos = container.insert(pos, val); https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... base/containers/flat_tree.h:656: difference_type pos_first_new = original_size; On 2017/05/17 06:41:15, jdoerrie wrote: > I dropped this optimization, because it would have made the interface of > append_* more awkward. Furthermore, the reason for this optimization was mainly > efficiency for small ranges, however given that we do insertion inplace in this > case it is no longer critical. > > Getting rid of it makes the code cleaner and has to measurable performance > implication for bigger ranges. I think that inserting small ranges is actually an important use case. https://codereview.chromium.org/2853333004/diff/160001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/160001/base/containers/flat_t... base/containers/flat_tree.h:357: std::pair<iterator, bool> append_or_assign(iterator first, I don't actually believe, that in these methods returning where the element landed is more useful than returning lower_bound result. This solves the awkward interface problem for cashing binary search result. https://codereview.chromium.org/2853333004/diff/160001/base/containers/flat_t... base/containers/flat_tree.h:683: FlatContainerDupes dupes) -> void { Nit: we just wrote void, cause the type doesn't depend on the flat_tree. https://codereview.chromium.org/2853333004/diff/160001/base/containers/flat_t... base/containers/flat_tree.h:691: is_multipass<InputIterator>() && std::next(first) == last; On 2017/05/17 at 06:41:16, jdoerrie wrote: > You were right with your earlier comment that this should be generalized to std::distance(first, last) <= kSomeThreshold. I did some quick testing and for me the sweet spot seems to be around 6 elements. However, as you said, this change should be done in a future CL. I suspect it depends on two numbers: size() and distance(first, last). https://codereview.chromium.org/2853333004/diff/160001/base/containers/flat_t... base/containers/flat_tree.h:699: insert(*first); Nit: std::copy(first, last, std::inserter(*this, end()); I've seen your comment and will explain there. https://codereview.chromium.org/2853333004/diff/160001/base/containers/flat_t... base/containers/flat_tree.h:729: std::lower_bound(begin(), middle(), *middle(), value_comp()), middle(), I really think you ought to cache binary search.
The CQ bit was checked by jdoerrie@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... base/containers/flat_tree.h:648: std::copy(first, last, std::inserter(*this, end())); On 2017/05/17 10:15:19, dyaroshev wrote: > On 2017/05/17 06:41:16, jdoerrie wrote: > > I dropped the call to copy in favor of an explicit loop, because copy can be > > unsafe if std::distance(first, last) >= 2. Since end() is only evaluated once, > > it will be invalidated should the first inserted element trigger a > reallocation. > > For the second element we then would try to dereference an invalid iterator. > > Nah, it assigns the iterator. Smth like: > pos = container.insert(pos, val); You are right, my bad. Here's libc++'s implementation: https://github.com/llvm-mirror/libcxx/blob/master/include/iterator#L852-L857 When initially looking this up I missed the assignment and only saw ++iter. https://codereview.chromium.org/2853333004/diff/140001/base/containers/flat_t... base/containers/flat_tree.h:656: difference_type pos_first_new = original_size; On 2017/05/17 10:15:19, dyaroshev wrote: > On 2017/05/17 06:41:15, jdoerrie wrote: > > I dropped this optimization, because it would have made the interface of > > append_* more awkward. Furthermore, the reason for this optimization was > mainly > > efficiency for small ranges, however given that we do insertion inplace in > this > > case it is no longer critical. > > > > Getting rid of it makes the code cleaner and has to measurable performance > > implication for bigger ranges. > > I think that inserting small ranges is actually an important use case. Acknowledged. https://codereview.chromium.org/2853333004/diff/160001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2853333004/diff/160001/base/containers/flat_t... base/containers/flat_tree.h:357: std::pair<iterator, bool> append_or_assign(iterator first, On 2017/05/17 10:15:19, dyaroshev wrote: > I don't actually believe, that in these methods returning where the element > landed is more useful than returning lower_bound result. > This solves the awkward interface problem for cashing binary search result. Done. https://codereview.chromium.org/2853333004/diff/160001/base/containers/flat_t... base/containers/flat_tree.h:683: FlatContainerDupes dupes) -> void { On 2017/05/17 10:15:19, dyaroshev wrote: > Nit: we just wrote void, cause the type doesn't depend on the flat_tree. Done. https://codereview.chromium.org/2853333004/diff/160001/base/containers/flat_t... base/containers/flat_tree.h:691: is_multipass<InputIterator>() && std::next(first) == last; On 2017/05/17 10:15:19, dyaroshev wrote: > On 2017/05/17 at 06:41:16, jdoerrie wrote: > > You were right with your earlier comment that this should be generalized to > std::distance(first, last) <= kSomeThreshold. I did some quick testing and for > me the sweet spot seems to be around 6 elements. However, as you said, this > change should be done in a future CL. > > I suspect it depends on two numbers: size() and distance(first, last). Acknowledged. https://codereview.chromium.org/2853333004/diff/160001/base/containers/flat_t... base/containers/flat_tree.h:729: std::lower_bound(begin(), middle(), *middle(), value_comp()), middle(), On 2017/05/17 10:15:19, dyaroshev wrote: > I really think you ought to cache binary search. Done.
I just saw your LGTM, thanks for the rigorous review and insightful comments :) I'll hold off landing just a bit longer so that brettw@ can have another look at this. There have been many changes since his last review. Please go ahead and share our results in the SG14 thread :)
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
LGTM, I didn't look at the details.
The CQ bit was checked by jdoerrie@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from dyaroshev@yandex-team.ru Link to the patchset: https://codereview.chromium.org/2853333004/#ps180001 (title: "Another round of comments")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
The CQ bit was checked by jdoerrie@chromium.org
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 180001, "attempt_start_ts": 1495180690160300, "parent_rev": "890c278213ee759d9ac2063847168209c38c32e9", "commit_rev": "154dc49655238f0fc9928763bcf47930823a8e02"}
Message was sent while issue was closed.
Description was changed from ========== Add range insertion for base::flat_tree This change implements an efficient algorithm for base::flat_tree::insert(Iter, Iter). BUG=671759, 682249 ========== to ========== Add range insertion for base::flat_tree This change implements an efficient algorithm for base::flat_tree::insert(Iter, Iter). BUG=671759, 682249 Review-Url: https://codereview.chromium.org/2853333004 Cr-Commit-Position: refs/heads/master@{#473148} Committed: https://chromium.googlesource.com/chromium/src/+/154dc49655238f0fc9928763bcf4... ==========
Message was sent while issue was closed.
Committed patchset #10 (id:180001) as https://chromium.googlesource.com/chromium/src/+/154dc49655238f0fc9928763bcf4... |