|
|
DescriptionMake flat containers stable, allow constructing from vector.
The stability matters more for maps that it did for sets, so I think it makes sense if all of the flat_set/map containers are stable sorted.
Since we are unlikely to switch out the storage type, I made constructors that take a moved vector. I want to use this pattern to
collect base::Values and then construct a map from it all at once.
Review-Url: https://codereview.chromium.org/2776793002
Cr-Commit-Position: refs/heads/master@{#463094}
Committed: https://chromium.googlesource.com/chromium/src/+/5c72fdadcbe0e5914a2a1c935992975e31ff3e67
Patch Set 1 #Patch Set 2 : Comment update #Patch Set 3 : Checkpoint #
Total comments: 6
Patch Set 4 : Update #Patch Set 5 : Comments #Patch Set 6 : . #Patch Set 7 : A few more fixes #Patch Set 8 : More #Patch Set 9 : GCC fixes #Patch Set 10 : Merge #
Total comments: 25
Patch Set 11 : Review comments #Patch Set 12 : Move to internal #Patch Set 13 : Put back media log change lost in merge #
Messages
Total messages: 101 (40 generated)
brettw@chromium.org changed reviewers: + danakj@chromium.org, dyaroshev@yandex-team.ru
I went to try to use this efficiently and ran into a problem where the "all at once" constructors weren't as efficient or predictable as I needed.
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Comment update
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-device-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device-xcode-...)
On 2017/03/24 22:56:28, brettw (plz ping after 24h) wrote: > Comment update 1. Please link to https://bugs.chromium.org/p/chromium/issues/detail?id=682254 2. I am against making range constructor stable because one would have to choose between using sorted vector and a flat container, and I think that flat_set/map should be a pure replacement for those cases where it makes sense. I agree that stability for maps is important but in a case of consecutive insertions not for one bulk. So insert(first, last) (if we ever have it) should not override elements that are already in but it's ok for any of the same new elements to land. We can have a special overload flat_map(base::ordered, first, last) and do stable_sort before calling it, if you really need it.
Underlying type can be exposed in the interface. Smth like: base::flat_set<int, std::vector<int>>. Than if you have to conversion it makes sense to have a from conversion. And maybe I would do both l and rvalue in one go (for construction). explicit flat_set(underlying_type body, const KeyCompare& comp) : impl_(comp, std::move(body)) { sort_and_unique(); } explicit operator underlying_type() & { return impl_.body_; } explicit operator underlying_type() && { return std::move(impl_.body_); }
On 2017/03/25 12:36:30, dyaroshev wrote: > explicit operator underlying_type() & { > return impl_.body_; > } > > explicit operator underlying_type() && { > return std::move(impl_.body_); > } explicit operator underlying_type&() & { return impl_.body_; } explicit operator underlying_type&&() && { return std::move(impl_.body_); }
On 2017/03/25 12:39:09, dyaroshev wrote: > On 2017/03/25 12:36:30, dyaroshev wrote: > > explicit operator underlying_type() & { > > return impl_.body_; > > } > > > > explicit operator underlying_type() && { > > return std::move(impl_.body_); > > } > > explicit operator underlying_type&() & { > return impl_.body_; > } > > explicit operator underlying_type&&() && { > return std::move(impl_.body_); > } Attempt number 3) Sorry. explicit operator const underlying_type&() const & { return impl_.body_; } explicit operator underlying_type&&() && { return std::move(impl_.body_); }
I used to think that keeping it unordered was fine and the obvious answer for callers is "don't do that, silly". I was assuming that people would be doing this from static data like the unit tests. But in trying to figure out how to use this in practice, I have changed my mind. In order for performance to be good, you have to use the bulk-insertion operations. This means that the common pattern will be to collect the inputs, and then put them in the set/map all at once. With this mode being the recommended way to use the class, it needs to be well-defined. And given the trivial difference between the performance of sort() and stable_sort() for the sizes of lists we're talking about, we shouldn't be making these classes more surprising or difficult to use. Unstable behavior will just come back to bite unsuspecting people in the future, and this class is already hard enough to use properly (preferring the bulk-insertion operations).
On 2017/03/27 17:36:31, brettw (plz ping after 24h) wrote: > I used to think that keeping it unordered was fine and the obvious answer for > callers is "don't do that, silly". I was assuming that people would be doing > this from static data like the unit tests. > > But in trying to figure out how to use this in practice, I have changed my mind. > In order for performance to be good, you have to use the bulk-insertion > operations. This means that the common pattern will be to collect the inputs, > and then put them in the set/map all at once. > > With this mode being the recommended way to use the class, it needs to be > well-defined. And given the trivial difference between the performance of > sort() and stable_sort() for the sizes of lists we're talking about, we > shouldn't be making these classes more surprising or difficult to use. Unstable > behavior will just come back to bite unsuspecting people in the future, and this > class is already hard enough to use properly (preferring the bulk-insertion > operations). I strongly disagree. a) I'm not sure that the std::stable_sort and std::sort are all the same for small data - std::stable_sort is not an inplace algorithm in common case, so it would call std::get_temporary_buffer which is (for many implementations) is a pure new call - so it's an overhead of calling malloc. For small sizes cost of malloc is significant. b) I think that if you rely on the fact that first element passed in a map's constructor is the one that landed, you should be very vocal about it - it's neither a well known or obvious property. c) The number of cases where you would actually care which one is landed in a bulk insertion is practically negligible. Let's not make people pay for what they don't use.
I think it will be helpful for Dana to weigh in as a base owner. I don't want to keep the current unstable sorting API because I think it's dangerous to have undefined behavior in a fundamental container like this. I think we should also match std::map's behavior which is "keep the first if there are duplicates." There is an alternative option I'm happy with but I'm worried it's tending toward overdesign. The context is I want to try using flat_map for base::Value dictionaries. I did some initial measurements and I think performance will be improved, but I need to make the parser construct optimally rather than doing the O(n^2) insertion-one-at-a-time dance. This seems like a good test of this container. If there are duplicate values, the JSON parser keeps the *last* one. I don't know if this is in the spec (it may not be spec'ed at all) but I think this is too risky to change. So even making the sort stable doesn't really solve my problem and the JSON parser will have to process the values somehow. The option that would work for the JSON case and may also make dyaroshev happy is a parameter that specifies what to do about duplicates. Because of my JSON requirements, I either need to be able to pass in a vector that's guaranteed sorted already, or have a way to specify "keep the last." A maximal interface is: enum class Process { // Better name? KEEP_FIRST, // Keep first instance of dupes (like map). KEEP_LAST, // Keep last instance of dupes (what I need for JSON). KEEP_ANY, // Unstable sort, faster. PRE_PROCESSED, // Input is already sorted and free of duplicates (maybe don't need?) }; flat_map(InputIterator begin, InputIterator end, Process process = Process::KEEP_FIRST, Compare...); flat_map(std::vector&& storage, Process process = Process::KEEP_FIRST, Compare...); ... If we think the JSON use-case of needing the last one is unusual (I don't think it will be), then only PRE_PROCESSED and KEEP_FIRST values will solve my problem. Alternatively, the minimal interface supporting JSON is KEEP_FIRST and KEEP_LAST. dyaroshev is arguing for KEEP_ANY as an option which I'm ambivalent about.
On 2017/03/27 19:58:58, brettw (plz ping after 24h) wrote: > I think it will be helpful for Dana to weigh in as a base owner. > > I don't want to keep the current unstable sorting API because I think it's > dangerous to have undefined behavior in a fundamental container like this. I > think we should also match std::map's behavior which is "keep the first if there > are duplicates." There is an alternative option I'm happy with but I'm worried > it's tending toward overdesign. > > The context is I want to try using flat_map for base::Value dictionaries. I did > some initial measurements and I think performance will be improved, but I need > to make the parser construct optimally rather than doing the O(n^2) > insertion-one-at-a-time dance. This seems like a good test of this container. > > If there are duplicate values, the JSON parser keeps the *last* one. I don't > know if this is in the spec (it may not be spec'ed at all) but I think this is > too risky to change. So even making the sort stable doesn't really solve my > problem and the JSON parser will have to process the values somehow. > > The option that would work for the JSON case and may also make dyaroshev happy > is a parameter that specifies what to do about duplicates. Because of my JSON > requirements, I either need to be able to pass in a vector that's guaranteed > sorted already, or have a way to specify "keep the last." A maximal interface > is: > > enum class Process { // Better name? > KEEP_FIRST, // Keep first instance of dupes (like map). > KEEP_LAST, // Keep last instance of dupes (what I need for JSON). > KEEP_ANY, // Unstable sort, faster. > PRE_PROCESSED, // Input is already sorted and free of duplicates (maybe > don't need?) > }; > flat_map(InputIterator begin, InputIterator end, Process process = > Process::KEEP_FIRST, Compare...); > flat_map(std::vector&& storage, Process process = Process::KEEP_FIRST, > Compare...); > > ... > > If we think the JSON use-case of needing the last one is unusual (I don't think > it will be), then only PRE_PROCESSED and KEEP_FIRST values will solve my > problem. Alternatively, the minimal interface supporting JSON is KEEP_FIRST and > KEEP_LAST. dyaroshev is arguing for KEEP_ANY as an option which I'm ambivalent > about. We can have overloads: flat_set(first, last); flat_set(base::unstable_t, first, last); flat_set(base::ordered_unique_t, first, last); ordered_unique_t - this is how it's called in boost.
On 2017/03/27 19:58:58, brettw (plz ping after 24h) wrote: > I think it will be helpful for Dana to weigh in as a base owner. > > I don't want to keep the current unstable sorting API because I think it's > dangerous to have undefined behavior in a fundamental container like this. I > think we should also match std::map's behavior which is "keep the first if there > are duplicates." There is an alternative option I'm happy with but I'm worried > it's tending toward overdesign. > > The context is I want to try using flat_map for base::Value dictionaries. I did > some initial measurements and I think performance will be improved, but I need > to make the parser construct optimally rather than doing the O(n^2) > insertion-one-at-a-time dance. This seems like a good test of this container. > > If there are duplicate values, the JSON parser keeps the *last* one. I don't > know if this is in the spec (it may not be spec'ed at all) but I think this is > too risky to change. So even making the sort stable doesn't really solve my > problem and the JSON parser will have to process the values somehow. > > The option that would work for the JSON case and may also make dyaroshev happy > is a parameter that specifies what to do about duplicates. Because of my JSON > requirements, I either need to be able to pass in a vector that's guaranteed > sorted already, or have a way to specify "keep the last." A maximal interface > is: > > enum class Process { // Better name? > KEEP_FIRST, // Keep first instance of dupes (like map). > KEEP_LAST, // Keep last instance of dupes (what I need for JSON). > KEEP_ANY, // Unstable sort, faster. > PRE_PROCESSED, // Input is already sorted and free of duplicates (maybe > don't need?) > }; > flat_map(InputIterator begin, InputIterator end, Process process = > Process::KEEP_FIRST, Compare...); > flat_map(std::vector&& storage, Process process = Process::KEEP_FIRST, > Compare...); > > ... > > If we think the JSON use-case of needing the last one is unusual (I don't think > it will be), then only PRE_PROCESSED and KEEP_FIRST values will solve my > problem. Alternatively, the minimal interface supporting JSON is KEEP_FIRST and > KEEP_LAST. dyaroshev is arguing for KEEP_ANY as an option which I'm ambivalent > about. So okay high level thoughts: - This is deviating from std::map/std::set. I'm okay with that. - Why is this specific to constructors? Should you not be able to do bulk insert the same way? - In terms of insert.. this conversation of KEEP_FIRST vs KEEP_LAST reminds me entirely of insert() vs insert_or_assign(). - If we had insert-from-vector and insert_or_assign-from-vector (and insert-from-unique-vector?), would we still need the constructor? We can encode he behaviour differences in function names but not in constructor names. (C++ please...) - If we need the constructor, maybe we could make the enum (or whatever we use to differentiate) terminology there closer to insert vs insert_and_assign (and/or whatever we use for bulk-insert)? - I'm not really diffing the KEEP_ANY case, I'd combine it with PRE_PROCESSED. If you want to make things fast, then ensure your vector is unique. We can do unstable sort there rather than forcing the caller to do a sort though, it reduces the number of footguns by one. - The idea of splitting into diff constructors with a type like in boost is decent. It reminds me more of other cases where we choose a constructor with a type (gfx::Transform) Thoughts?
On 2017/03/28 21:16:12, danakj wrote: > On 2017/03/27 19:58:58, brettw (plz ping after 24h) wrote: > > I think it will be helpful for Dana to weigh in as a base owner. > > > > I don't want to keep the current unstable sorting API because I think it's > > dangerous to have undefined behavior in a fundamental container like this. I > > think we should also match std::map's behavior which is "keep the first if > there > > are duplicates." There is an alternative option I'm happy with but I'm worried > > it's tending toward overdesign. > > > > The context is I want to try using flat_map for base::Value dictionaries. I > did > > some initial measurements and I think performance will be improved, but I need > > to make the parser construct optimally rather than doing the O(n^2) > > insertion-one-at-a-time dance. This seems like a good test of this container. > > > > If there are duplicate values, the JSON parser keeps the *last* one. I don't > > know if this is in the spec (it may not be spec'ed at all) but I think this is > > too risky to change. So even making the sort stable doesn't really solve my > > problem and the JSON parser will have to process the values somehow. > > > > The option that would work for the JSON case and may also make dyaroshev happy > > is a parameter that specifies what to do about duplicates. Because of my JSON > > requirements, I either need to be able to pass in a vector that's guaranteed > > sorted already, or have a way to specify "keep the last." A maximal interface > > is: > > > > enum class Process { // Better name? > > KEEP_FIRST, // Keep first instance of dupes (like map). > > KEEP_LAST, // Keep last instance of dupes (what I need for JSON). > > KEEP_ANY, // Unstable sort, faster. > > PRE_PROCESSED, // Input is already sorted and free of duplicates (maybe > > don't need?) > > }; > > flat_map(InputIterator begin, InputIterator end, Process process = > > Process::KEEP_FIRST, Compare...); > > flat_map(std::vector&& storage, Process process = Process::KEEP_FIRST, > > Compare...); > > > > ... > > > > If we think the JSON use-case of needing the last one is unusual (I don't > think > > it will be), then only PRE_PROCESSED and KEEP_FIRST values will solve my > > problem. Alternatively, the minimal interface supporting JSON is KEEP_FIRST > and > > KEEP_LAST. dyaroshev is arguing for KEEP_ANY as an option which I'm ambivalent > > about. > > So okay high level thoughts: > > - This is deviating from std::map/std::set. I'm okay with that. > - Why is this specific to constructors? Should you not be able to do bulk insert > the same way? > - In terms of insert.. this conversation of KEEP_FIRST vs KEEP_LAST reminds me > entirely of insert() vs insert_or_assign(). > - If we had insert-from-vector and insert_or_assign-from-vector (and > insert-from-unique-vector?), would we still need the constructor? We can encode > he behaviour differences in function names but not in constructor names. (C++ > please...) > - If we need the constructor, maybe we could make the enum (or whatever we use > to differentiate) terminology there closer to insert vs insert_and_assign > (and/or whatever we use for bulk-insert)? > - I'm not really diffing the KEEP_ANY case, I'd combine it with PRE_PROCESSED. My brain short circuited there. "I'm not really feeling the" is how that should read. > If you want to make things fast, then ensure your vector is unique. We can do > unstable sort there rather than forcing the caller to do a sort though, it > reduces the number of footguns by one. > - The idea of splitting into diff constructors with a type like in boost is > decent. It reminds me more of other cases where we choose a constructor with a > type (gfx::Transform) > > Thoughts?
On 2017/03/28 21:16:12, danakj wrote: > On 2017/03/27 19:58:58, brettw (plz ping after 24h) wrote: > > I think it will be helpful for Dana to weigh in as a base owner. > > > > I don't want to keep the current unstable sorting API because I think it's > > dangerous to have undefined behavior in a fundamental container like this. I > > think we should also match std::map's behavior which is "keep the first if > there > > are duplicates." There is an alternative option I'm happy with but I'm worried > > it's tending toward overdesign. > > > > The context is I want to try using flat_map for base::Value dictionaries. I > did > > some initial measurements and I think performance will be improved, but I need > > to make the parser construct optimally rather than doing the O(n^2) > > insertion-one-at-a-time dance. This seems like a good test of this container. > > > > If there are duplicate values, the JSON parser keeps the *last* one. I don't > > know if this is in the spec (it may not be spec'ed at all) but I think this is > > too risky to change. So even making the sort stable doesn't really solve my > > problem and the JSON parser will have to process the values somehow. > > > > The option that would work for the JSON case and may also make dyaroshev happy > > is a parameter that specifies what to do about duplicates. Because of my JSON > > requirements, I either need to be able to pass in a vector that's guaranteed > > sorted already, or have a way to specify "keep the last." A maximal interface > > is: > > > > enum class Process { // Better name? > > KEEP_FIRST, // Keep first instance of dupes (like map). > > KEEP_LAST, // Keep last instance of dupes (what I need for JSON). > > KEEP_ANY, // Unstable sort, faster. > > PRE_PROCESSED, // Input is already sorted and free of duplicates (maybe > > don't need?) > > }; > > flat_map(InputIterator begin, InputIterator end, Process process = > > Process::KEEP_FIRST, Compare...); > > flat_map(std::vector&& storage, Process process = Process::KEEP_FIRST, > > Compare...); > > > > ... > > > > If we think the JSON use-case of needing the last one is unusual (I don't > think > > it will be), then only PRE_PROCESSED and KEEP_FIRST values will solve my > > problem. Alternatively, the minimal interface supporting JSON is KEEP_FIRST > and > > KEEP_LAST. dyaroshev is arguing for KEEP_ANY as an option which I'm ambivalent > > about. > > So okay high level thoughts: > > - This is deviating from std::map/std::set. I'm okay with that. > - Why is this specific to constructors? Should you not be able to do bulk insert > the same way? The problem with bulk insert is that I don't know of any decent implementation. Even benchmarking it is very hard. As soon as we have one - sure we have special cases for it. However, insert(vector) doesn't seem to make a lot sense - we do not need it's memory and it seems to be misleading. The only useful thing I might think of is using it as a buffer where we can sort new elements but since I do not know the algorithm, I don't know how useful it actually is. > - In terms of insert.. this conversation of KEEP_FIRST vs KEEP_LAST reminds me > entirely of insert() vs insert_or_assign(). > - If we had insert-from-vector and insert_or_assign-from-vector (and > insert-from-unique-vector?), would we still need the constructor? We can encode > he behaviour differences in function names but not in constructor names. (C++ > please...) > - If we need the constructor, maybe we could make the enum (or whatever we use > to differentiate) terminology there closer to insert vs insert_and_assign > (and/or whatever we use for bulk-insert)? > - I'm not really diffing the KEEP_ANY case, I'd combine it with PRE_PROCESSED. I really think it should be very easy to do it: it's the fastest way and why to leave performarnse on the table just because we don't have a good enough api. > If you want to make things fast, then ensure your vector is unique. We can do > unstable sort there rather than forcing the caller to do a sort though, it > reduces the number of footguns by one. > - The idea of splitting into diff constructors with a type like in boost is > decent. It reminds me more of other cases where we choose a constructor with a > type (gfx::Transform) > > Thoughts? I think that special constructors for different sortings are a good way to go here. There is a question whether to keep plain insert(first, last) or force people to always choose one. I don't think I have a strong enough opinion on this one.
On Tue, Mar 28, 2017 at 5:25 PM, <dyaroshev@yandex-team.ru> wrote: > On 2017/03/28 21:16:12, danakj wrote: > > On 2017/03/27 19:58:58, brettw (plz ping after 24h) wrote: > > > I think it will be helpful for Dana to weigh in as a base owner. > > > > > > I don't want to keep the current unstable sorting API because I think > it's > > > dangerous to have undefined behavior in a fundamental container like > this. I > > > think we should also match std::map's behavior which is "keep the > first if > > there > > > are duplicates." There is an alternative option I'm happy with but I'm > worried > > > it's tending toward overdesign. > > > > > > The context is I want to try using flat_map for base::Value > dictionaries. I > > did > > > some initial measurements and I think performance will be improved, > but I > need > > > to make the parser construct optimally rather than doing the O(n^2) > > > insertion-one-at-a-time dance. This seems like a good test of this > container. > > > > > > If there are duplicate values, the JSON parser keeps the *last* one. I > don't > > > know if this is in the spec (it may not be spec'ed at all) but I think > this > is > > > too risky to change. So even making the sort stable doesn't really > solve my > > > problem and the JSON parser will have to process the values somehow. > > > > > > The option that would work for the JSON case and may also make > dyaroshev > happy > > > is a parameter that specifies what to do about duplicates. Because of > my > JSON > > > requirements, I either need to be able to pass in a vector that's > guaranteed > > > sorted already, or have a way to specify "keep the last." A maximal > interface > > > is: > > > > > > enum class Process { // Better name? > > > KEEP_FIRST, // Keep first instance of dupes (like map). > > > KEEP_LAST, // Keep last instance of dupes (what I need for JSON). > > > KEEP_ANY, // Unstable sort, faster. > > > PRE_PROCESSED, // Input is already sorted and free of duplicates (maybe > > > don't need?) > > > }; > > > flat_map(InputIterator begin, InputIterator end, Process process = > > > Process::KEEP_FIRST, Compare...); > > > flat_map(std::vector&& storage, Process process = Process::KEEP_FIRST, > > > Compare...); > > > > > > ... > > > > > > If we think the JSON use-case of needing the last one is unusual (I > don't > > think > > > it will be), then only PRE_PROCESSED and KEEP_FIRST values will solve > my > > > problem. Alternatively, the minimal interface supporting JSON is > KEEP_FIRST > > and > > > KEEP_LAST. dyaroshev is arguing for KEEP_ANY as an option which I'm > ambivalent > > > about. > > > > So okay high level thoughts: > > > > - This is deviating from std::map/std::set. I'm okay with that. > > - Why is this specific to constructors? Should you not be able to do bulk > insert > > the same way? > > The problem with bulk insert is that I don't know of any decent > implementation. > Even benchmarking it is very hard. > > As soon as we have one - sure we have special cases for it. > > However, insert(vector) doesn't seem to make a lot sense - we do not need > it's > memory and it seems to be misleading. > Can you explain what would be bad about insert(vector)? I don't see how that is any different than using flat_set(vector) really. The only difference would be items are moved out of it instead of stealing the storage. You'd append them to the flat_set's storage all in one move, and sort them all together, what is misleading? > The only useful thing I might think of is using it as a buffer where we > can sort > new elements but since I do not know the algorithm, I don't know how > useful it > actually is. > > > - In terms of insert.. this conversation of KEEP_FIRST vs KEEP_LAST > reminds me > > entirely of insert() vs insert_or_assign(). > > - If we had insert-from-vector and insert_or_assign-from-vector (and > > insert-from-unique-vector?), would we still need the constructor? We can > encode > > he behaviour differences in function names but not in constructor names. > (C++ > > please...) > > - If we need the constructor, maybe we could make the enum (or whatever > we use > > to differentiate) terminology there closer to insert vs insert_and_assign > > (and/or whatever we use for bulk-insert)? > > - I'm not really diffing the KEEP_ANY case, I'd combine it with > PRE_PROCESSED. > > I really think it should be very easy to do it: it's the fastest way and > why to > leave performarnse on the table just because we don't have a good enough > api. > > If you want to make things fast, then ensure your vector is unique. We > can do > > unstable sort there rather than forcing the caller to do a sort though, > it > > reduces the number of footguns by one. > > - The idea of splitting into diff constructors with a type like in boost > is > > decent. It reminds me more of other cases where we choose a constructor > with a > > type (gfx::Transform) > > > > Thoughts? > > I think that special constructors for different sortings are a good way to > go > here. There is a question whether to keep plain insert(first, last) or > force > people to always choose one. I don't think I have a strong enough opinion > on > this one. > > > > https://codereview.chromium.org/2776793002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2017/03/28 21:29:07, danakj wrote: > Can you explain what would be bad about insert(vector)? I don't see how > that is any different than > using flat_set(vector) really. The only difference would be items are moved > out of it instead of > stealing the storage. You'd append them to the flat_set's storage all in > one move, and sort them > all together, what is misleading? > Well, this is a job for std::move_iterator. Why would you want it to be a vector? Or underlying_type? It should probably be any container. Or maybe range. So, I think it's strange.
On Tue, Mar 28, 2017 at 5:25 PM, <dyaroshev@yandex-team.ru> wrote: > On 2017/03/28 21:16:12, danakj wrote: > > On 2017/03/27 19:58:58, brettw (plz ping after 24h) wrote: > > > I think it will be helpful for Dana to weigh in as a base owner. > > > > > > I don't want to keep the current unstable sorting API because I think > it's > > > dangerous to have undefined behavior in a fundamental container like > this. I > > > think we should also match std::map's behavior which is "keep the > first if > > there > > > are duplicates." There is an alternative option I'm happy with but I'm > worried > > > it's tending toward overdesign. > > > > > > The context is I want to try using flat_map for base::Value > dictionaries. I > > did > > > some initial measurements and I think performance will be improved, > but I > need > > > to make the parser construct optimally rather than doing the O(n^2) > > > insertion-one-at-a-time dance. This seems like a good test of this > container. > > > > > > If there are duplicate values, the JSON parser keeps the *last* one. I > don't > > > know if this is in the spec (it may not be spec'ed at all) but I think > this > is > > > too risky to change. So even making the sort stable doesn't really > solve my > > > problem and the JSON parser will have to process the values somehow. > > > > > > The option that would work for the JSON case and may also make > dyaroshev > happy > > > is a parameter that specifies what to do about duplicates. Because of > my > JSON > > > requirements, I either need to be able to pass in a vector that's > guaranteed > > > sorted already, or have a way to specify "keep the last." A maximal > interface > > > is: > > > > > > enum class Process { // Better name? > > > KEEP_FIRST, // Keep first instance of dupes (like map). > > > KEEP_LAST, // Keep last instance of dupes (what I need for JSON). > > > KEEP_ANY, // Unstable sort, faster. > > > PRE_PROCESSED, // Input is already sorted and free of duplicates (maybe > > > don't need?) > > > }; > > > flat_map(InputIterator begin, InputIterator end, Process process = > > > Process::KEEP_FIRST, Compare...); > > > flat_map(std::vector&& storage, Process process = Process::KEEP_FIRST, > > > Compare...); > > > > > > ... > > > > > > If we think the JSON use-case of needing the last one is unusual (I > don't > > think > > > it will be), then only PRE_PROCESSED and KEEP_FIRST values will solve > my > > > problem. Alternatively, the minimal interface supporting JSON is > KEEP_FIRST > > and > > > KEEP_LAST. dyaroshev is arguing for KEEP_ANY as an option which I'm > ambivalent > > > about. > > > > So okay high level thoughts: > > > > - This is deviating from std::map/std::set. I'm okay with that. > > - Why is this specific to constructors? Should you not be able to do bulk > insert > > the same way? > > The problem with bulk insert is that I don't know of any decent > implementation. > Even benchmarking it is very hard. > > As soon as we have one - sure we have special cases for it. > > However, insert(vector) doesn't seem to make a lot sense - we do not need > it's > memory and it seems to be misleading. > The only useful thing I might think of is using it as a buffer where we > can sort > new elements but since I do not know the algorithm, I don't know how > useful it > actually is. > > > - In terms of insert.. this conversation of KEEP_FIRST vs KEEP_LAST > reminds me > > entirely of insert() vs insert_or_assign(). > > - If we had insert-from-vector and insert_or_assign-from-vector (and > > insert-from-unique-vector?), would we still need the constructor? We can > encode > > he behaviour differences in function names but not in constructor names. > (C++ > > please...) > > - If we need the constructor, maybe we could make the enum (or whatever > we use > > to differentiate) terminology there closer to insert vs insert_and_assign > > (and/or whatever we use for bulk-insert)? > > - I'm not really diffing the KEEP_ANY case, I'd combine it with > PRE_PROCESSED. > > I really think it should be very easy to do it: it's the fastest way and > why to > leave performarnse on the table just because we don't have a good enough > api. > It's not that it's hard, it's that it adds more options to the API and I'm not convinced of the value proposition. How often do we have sorted data, as opposed to data that we may choose to sort before inserting or may choose not to sort and let the container do it? Also the big red flag to me is that sorting is the implementation details of the container. It could be implemented in any way. It also depends on the comparator of the container, which is getting into very sketchy territory IMO. Whereas unique-ness is much less tied to the container's implementation (sure it's related to the comparator but more weakly). > > > If you want to make things fast, then ensure your vector is unique. We > can do > > unstable sort there rather than forcing the caller to do a sort though, > it > > reduces the number of footguns by one. > > - The idea of splitting into diff constructors with a type like in boost > is > > decent. It reminds me more of other cases where we choose a constructor > with a > > type (gfx::Transform) > > > > Thoughts? > > I think that special constructors for different sortings are a good way to > go > here. There is a question whether to keep plain insert(first, last) or > force > people to always choose one. I don't think I have a strong enough opinion > on > this one. > I would keep them, people may use something other than a vector. > > > > https://codereview.chromium.org/2776793002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On Tue, Mar 28, 2017 at 5:31 PM, <dyaroshev@yandex-team.ru> wrote: > On 2017/03/28 21:29:07, danakj wrote: > > Can you explain what would be bad about insert(vector)? I don't see how > > that is any different than > > using flat_set(vector) really. The only difference would be items are > moved > > out of it instead of > > stealing the storage. You'd append them to the flat_set's storage all in > > one move, and sort them > > all together, what is misleading? > > > > Well, this is a job for std::move_iterator. > > Why would you want it to be a vector? Or underlying_type? It should > probably be > any container. Or maybe range. > So, I think it's strange. > I see, this is a good point, there's no need to tie it to a type. The constructor is, I suppose, different because we're providing the initial storage. I do think that if we're going to add bulk-construct it would be appropriate to add bulk-insert at the same time though. You mentioned that we don't have a good method. Does "append everything and call sort" not work for insert in the same way it would work for constructing? Keeping the (first, end) constructor(s) would more closely match the bulk-insert methods then if we went that route. But it would be appropriate to make different (first, end) constructors in the same way that there'd be different (vector) constructors IMO. > > https://codereview.chromium.org/2776793002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2017/03/28 21:32:05, danakj wrote: > On Tue, Mar 28, 2017 at 5:25 PM, <mailto:dyaroshev@yandex-team.ru> wrote: > > > On 2017/03/28 21:16:12, danakj wrote: > > > On 2017/03/27 19:58:58, brettw (plz ping after 24h) wrote: > > > > I think it will be helpful for Dana to weigh in as a base owner. > > > > > > > > I don't want to keep the current unstable sorting API because I think > > it's > > > > dangerous to have undefined behavior in a fundamental container like > > this. I > > > > think we should also match std::map's behavior which is "keep the > > first if > > > there > > > > are duplicates." There is an alternative option I'm happy with but I'm > > worried > > > > it's tending toward overdesign. > > > > > > > > The context is I want to try using flat_map for base::Value > > dictionaries. I > > > did > > > > some initial measurements and I think performance will be improved, > > but I > > need > > > > to make the parser construct optimally rather than doing the O(n^2) > > > > insertion-one-at-a-time dance. This seems like a good test of this > > container. > > > > > > > > If there are duplicate values, the JSON parser keeps the *last* one. I > > don't > > > > know if this is in the spec (it may not be spec'ed at all) but I think > > this > > is > > > > too risky to change. So even making the sort stable doesn't really > > solve my > > > > problem and the JSON parser will have to process the values somehow. > > > > > > > > The option that would work for the JSON case and may also make > > dyaroshev > > happy > > > > is a parameter that specifies what to do about duplicates. Because of > > my > > JSON > > > > requirements, I either need to be able to pass in a vector that's > > guaranteed > > > > sorted already, or have a way to specify "keep the last." A maximal > > interface > > > > is: > > > > > > > > enum class Process { // Better name? > > > > KEEP_FIRST, // Keep first instance of dupes (like map). > > > > KEEP_LAST, // Keep last instance of dupes (what I need for JSON). > > > > KEEP_ANY, // Unstable sort, faster. > > > > PRE_PROCESSED, // Input is already sorted and free of duplicates (maybe > > > > don't need?) > > > > }; > > > > flat_map(InputIterator begin, InputIterator end, Process process = > > > > Process::KEEP_FIRST, Compare...); > > > > flat_map(std::vector&& storage, Process process = Process::KEEP_FIRST, > > > > Compare...); > > > > > > > > ... > > > > > > > > If we think the JSON use-case of needing the last one is unusual (I > > don't > > > think > > > > it will be), then only PRE_PROCESSED and KEEP_FIRST values will solve > > my > > > > problem. Alternatively, the minimal interface supporting JSON is > > KEEP_FIRST > > > and > > > > KEEP_LAST. dyaroshev is arguing for KEEP_ANY as an option which I'm > > ambivalent > > > > about. > > > > > > So okay high level thoughts: > > > > > > - This is deviating from std::map/std::set. I'm okay with that. > > > - Why is this specific to constructors? Should you not be able to do bulk > > insert > > > the same way? > > > > The problem with bulk insert is that I don't know of any decent > > implementation. > > Even benchmarking it is very hard. > > > > As soon as we have one - sure we have special cases for it. > > > > However, insert(vector) doesn't seem to make a lot sense - we do not need > > it's > > memory and it seems to be misleading. > > The only useful thing I might think of is using it as a buffer where we > > can sort > > new elements but since I do not know the algorithm, I don't know how > > useful it > > actually is. > > > > > - In terms of insert.. this conversation of KEEP_FIRST vs KEEP_LAST > > reminds me > > > entirely of insert() vs insert_or_assign(). > > > - If we had insert-from-vector and insert_or_assign-from-vector (and > > > insert-from-unique-vector?), would we still need the constructor? We can > > encode > > > he behaviour differences in function names but not in constructor names. > > (C++ > > > please...) > > > - If we need the constructor, maybe we could make the enum (or whatever > > we use > > > to differentiate) terminology there closer to insert vs insert_and_assign > > > (and/or whatever we use for bulk-insert)? > > > - I'm not really diffing the KEEP_ANY case, I'd combine it with > > PRE_PROCESSED. > > > > I really think it should be very easy to do it: it's the fastest way and > > why to > > leave performarnse on the table just because we don't have a good enough > > api. > > > > It's not that it's hard, it's that it adds more options to the API and I'm > not convinced of > the value proposition. How often do we have sorted data, as opposed to data > that we > may choose to sort before inserting or may choose not to sort and let the > container > do it? Also the big red flag to me is that sorting is the implementation > details of the > container. It could be implemented in any way. It also depends on the > comparator of > the container, which is getting into very sketchy territory IMO. Whereas > unique-ness > is much less tied to the container's implementation (sure it's related to > the comparator > but more weakly). > It's not that hard when it does matters. What I mean is flat_set/flat_map should be widely used for small data sets. In these rear cases the programmer would not try to squeeze performance out and would use the default one. This means slowing down in every small us usecase. For example: https://cs.chromium.org/chromium/src/components/omnibox/browser/in_memory_url... Char16Set Char16SetFromString16(const base::string16& term) { return Char16Set(term.begin(), term.end()); } would transform into: Char16Set Char16SetFromString16(const base::string16& term) { auto tmp = term; std::sort(tmp.begin(), tmp.end()); tmp.erase(std::unique(tmp.begin(), tmp.end()), tmp.end()); return Char16Set(base::ordered_unique, std::move(tmp)); } So - this would become a trade off between short code and efficient code - very unpleasant. If we have special constructor, we can do: Char16Set Char16SetFromString16(const base::string16& term) { return Char16Set(base::unstable, term.begin(), term.end()); } And it's not that much of a choice. > > > > > > If you want to make things fast, then ensure your vector is unique. We > > can do > > > unstable sort there rather than forcing the caller to do a sort though, > > it > > > reduces the number of footguns by one. > > > - The idea of splitting into diff constructors with a type like in boost > > is > > > decent. It reminds me more of other cases where we choose a constructor > > with a > > > type (gfx::Transform) > > > > > > Thoughts? > > > > I think that special constructors for different sortings are a good way to > > go > > here. There is a question whether to keep plain insert(first, last) or > > force > > people to always choose one. I don't think I have a strong enough opinion > > on > > this one. > > > > I would keep them, people may use something other than a vector. > I'm not sure we are on the same page here. This is a list of constructors that I have in mind: flat_set(base::unstable, first, last) flat_set(base::unstable, underlying_type) flat_set(base::stable, first, last) flat_set(base::stable, underlying_type) flat_set(base::ordered_unique, first, last) flat_set(first, last) // ? this is the one I have doubts about. We can even, instead of a type tag make it an action template <typename Sorter, typename I> flat_tree(Sorter s, I first, I last) : impl_(Compare(), first, last) { s(begin(), end(), value_compare()); }
On 2017/03/28 21:42:57, danakj wrote: > I see, this is a good point, there's no need to tie it to a type. The > constructor is, I suppose, > different because we're providing the initial storage. > > I do think that if we're going to add bulk-construct it would be > appropriate to add bulk-insert > at the same time though. You mentioned that we don't have a good method. > Does "append > everything and call sort" not work for insert in the same way it would work > for constructing? > > Keeping the (first, end) constructor(s) would more closely match the > bulk-insert methods then > if we went that route. But it would be appropriate to make different > (first, end) constructors in > the same way that there'd be different (vector) constructors IMO. > With constructor you do not have an issue of elements that are already in. Consider appending two elements to a 100 - do you really want to sort 102 elements? + this is where stability would be nice. I had a better algorithm: size_type old_size = size(); impl_.body_,insert(end(), first, last); auto mid_point = begin() + old_size; std::sort(mid_point, end, value_compare()); std::inplace_merge(begin(), mid_point, end()); erase(std::unique(begin(), end(), std::not_fn(value_compare()), end()); But it still less than ideal and for some examples it was slower than: std::copy(first, last, std::inserter(*this)); for some examples. This algorithm should be stable (it's not - there is a bug in libc++ inplace_merge) but it should be. We need to find a good algorithm. Choose the correct merge. But I still can't make my benchmarks produce stable results even for simple cases, and we need benchmarks.
On Tue, Mar 28, 2017 at 5:47 PM, <dyaroshev@yandex-team.ru> wrote: > On 2017/03/28 21:32:05, danakj wrote: > > > On Tue, Mar 28, 2017 at 5:25 PM, <mailto:dyaroshev@yandex-team.ru> > wrote: > > > > > On 2017/03/28 21:16:12, danakj wrote: > > > > On 2017/03/27 19:58:58, brettw (plz ping after 24h) wrote: > > > > > I think it will be helpful for Dana to weigh in as a base owner. > > > > > > > > > > I don't want to keep the current unstable sorting API because I > think > > > it's > > > > > dangerous to have undefined behavior in a fundamental container > like > > > this. I > > > > > think we should also match std::map's behavior which is "keep the > > > first if > > > > there > > > > > are duplicates." There is an alternative option I'm happy with but > I'm > > > worried > > > > > it's tending toward overdesign. > > > > > > > > > > The context is I want to try using flat_map for base::Value > > > dictionaries. I > > > > did > > > > > some initial measurements and I think performance will be improved, > > > but I > > > need > > > > > to make the parser construct optimally rather than doing the O(n^2) > > > > > insertion-one-at-a-time dance. This seems like a good test of this > > > container. > > > > > > > > > > If there are duplicate values, the JSON parser keeps the *last* > one. I > > > don't > > > > > know if this is in the spec (it may not be spec'ed at all) but I > think > > > this > > > is > > > > > too risky to change. So even making the sort stable doesn't really > > > solve my > > > > > problem and the JSON parser will have to process the values > somehow. > > > > > > > > > > The option that would work for the JSON case and may also make > > > dyaroshev > > > happy > > > > > is a parameter that specifies what to do about duplicates. Because > of > > > my > > > JSON > > > > > requirements, I either need to be able to pass in a vector that's > > > guaranteed > > > > > sorted already, or have a way to specify "keep the last." A maximal > > > interface > > > > > is: > > > > > > > > > > enum class Process { // Better name? > > > > > KEEP_FIRST, // Keep first instance of dupes (like map). > > > > > KEEP_LAST, // Keep last instance of dupes (what I need for JSON). > > > > > KEEP_ANY, // Unstable sort, faster. > > > > > PRE_PROCESSED, // Input is already sorted and free of duplicates > (maybe > > > > > don't need?) > > > > > }; > > > > > flat_map(InputIterator begin, InputIterator end, Process process = > > > > > Process::KEEP_FIRST, Compare...); > > > > > flat_map(std::vector&& storage, Process process = > Process::KEEP_FIRST, > > > > > Compare...); > > > > > > > > > > ... > > > > > > > > > > If we think the JSON use-case of needing the last one is unusual (I > > > don't > > > > think > > > > > it will be), then only PRE_PROCESSED and KEEP_FIRST values will > solve > > > my > > > > > problem. Alternatively, the minimal interface supporting JSON is > > > KEEP_FIRST > > > > and > > > > > KEEP_LAST. dyaroshev is arguing for KEEP_ANY as an option which I'm > > > ambivalent > > > > > about. > > > > > > > > So okay high level thoughts: > > > > > > > > - This is deviating from std::map/std::set. I'm okay with that. > > > > - Why is this specific to constructors? Should you not be able to do > bulk > > > insert > > > > the same way? > > > > > > The problem with bulk insert is that I don't know of any decent > > > implementation. > > > Even benchmarking it is very hard. > > > > > > As soon as we have one - sure we have special cases for it. > > > > > > However, insert(vector) doesn't seem to make a lot sense - we do not > need > > > it's > > > memory and it seems to be misleading. > > > The only useful thing I might think of is using it as a buffer where we > > > can sort > > > new elements but since I do not know the algorithm, I don't know how > > > useful it > > > actually is. > > > > > > > - In terms of insert.. this conversation of KEEP_FIRST vs KEEP_LAST > > > reminds me > > > > entirely of insert() vs insert_or_assign(). > > > > - If we had insert-from-vector and insert_or_assign-from-vector (and > > > > insert-from-unique-vector?), would we still need the constructor? We > can > > > encode > > > > he behaviour differences in function names but not in constructor > names. > > > (C++ > > > > please...) > > > > - If we need the constructor, maybe we could make the enum (or > whatever > > > we use > > > > to differentiate) terminology there closer to insert vs > insert_and_assign > > > > (and/or whatever we use for bulk-insert)? > > > > - I'm not really diffing the KEEP_ANY case, I'd combine it with > > > PRE_PROCESSED. > > > > > > I really think it should be very easy to do it: it's the fastest way > and > > > why to > > > leave performarnse on the table just because we don't have a good > enough > > > api. > > > > > > > It's not that it's hard, it's that it adds more options to the API and > I'm > > not convinced of > > the value proposition. How often do we have sorted data, as opposed to > data > > that we > > may choose to sort before inserting or may choose not to sort and let the > > container > > do it? Also the big red flag to me is that sorting is the implementation > > details of the > > container. It could be implemented in any way. It also depends on the > > comparator of > > the container, which is getting into very sketchy territory IMO. Whereas > > unique-ness > > is much less tied to the container's implementation (sure it's related to > > the comparator > > but more weakly). > > > > It's not that hard when it does matters. > What I mean is flat_set/flat_map should be widely used for small data > sets. In > these rear cases the programmer would not try to squeeze performance out > and > would use the default one. This means slowing down in every small us > usecase. > > For example: > https://cs.chromium.org/chromium/src/components/ > omnibox/browser/in_memory_url_index_types.cc?q=in_memory_ > url_index_type+package:%5Echromium$&l=141 > > Char16Set Char16SetFromString16(const base::string16& term) { > return Char16Set(term.begin(), term.end()); > } > > would transform into: > > Char16Set Char16SetFromString16(const base::string16& term) { > auto tmp = term; > std::sort(tmp.begin(), tmp.end()); > tmp.erase(std::unique(tmp.begin(), tmp.end()), tmp.end()); > return Char16Set(base::ordered_unique, std::move(tmp)); > } > > So - this would become a trade off between short code and efficient code - > very > unpleasant. > > If we have special constructor, we can do: > > Char16Set Char16SetFromString16(const base::string16& term) { > return Char16Set(base::unstable, term.begin(), term.end()); > } > > And it's not that much of a choice. > I'm not super buying the argument that stable vs unstable will matter for small sets. It's mostly a case of the behaviour you want in that case which won't matter there, so they can just pick whatever and let the container do it. I'm mostly feeling very wary of making users of the class also write code that explicitly uses the class' comparator, as I have very little expectation of people doing that right, and if the comparator ever changed, I'd super expect things to break in subtle ways with no support from the compiler detecting anything going wrong. > > > > > > > > If you want to make things fast, then ensure your vector is unique. > We > > > can do > > > > unstable sort there rather than forcing the caller to do a sort > though, > > > it > > > > reduces the number of footguns by one. > > > > - The idea of splitting into diff constructors with a type like in > boost > > > is > > > > decent. It reminds me more of other cases where we choose a > constructor > > > with a > > > > type (gfx::Transform) > > > > > > > > Thoughts? > > > > > > I think that special constructors for different sortings are a good > way to > > > go > > > here. There is a question whether to keep plain insert(first, last) or > > > force > > > people to always choose one. I don't think I have a strong enough > opinion > > > on > > > this one. > > > > > > > I would keep them, people may use something other than a vector. > > > > I'm not sure we are on the same page here. > > This is a list of constructors that I have in mind: > flat_set(base::unstable, first, last) > flat_set(base::unstable, underlying_type) > flat_set(base::stable, first, last) > flat_set(base::stable, underlying_type) > flat_set(base::ordered_unique, first, last) > flat_set(first, last) // ? this is the one I have doubts about. > Oh thanks! I assumed this is being replaced by the others, and I think it should be. Was (ordered_unique, underlying_type) omitted on purpose here? We can even, instead of a type tag make it an action > > template <typename Sorter, typename I> > flat_tree(Sorter s, I first, I last) : impl_(Compare(), first, last) { > s(begin(), end(), value_compare()); > } > This is very interesting, but makes the API a bit harder to hold, as now it's not so obvious what is a valid sorter, its harder to find, and it may contradict the comparator of the flat_tree class.. it feels like it's too easy to get very wrong. > > > https://codereview.chromium.org/2776793002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On Tue, Mar 28, 2017 at 5:58 PM, <dyaroshev@yandex-team.ru> wrote: > On 2017/03/28 21:42:57, danakj wrote: > > I see, this is a good point, there's no need to tie it to a type. The > > constructor is, I suppose, > > different because we're providing the initial storage. > > > > I do think that if we're going to add bulk-construct it would be > > appropriate to add bulk-insert > > at the same time though. You mentioned that we don't have a good method. > > Does "append > > everything and call sort" not work for insert in the same way it would > work > > for constructing? > > > > Keeping the (first, end) constructor(s) would more closely match the > > bulk-insert methods then > > if we went that route. But it would be appropriate to make different > > (first, end) constructors in > > the same way that there'd be different (vector) constructors IMO. > > > > With constructor you do not have an issue of elements that are already in. > Consider appending two elements to a 100 - do you really want to sort 102 > elements? > This feels like a poor use of the class, similar to calling insert(). I'm not sure that inserting 2 things is fundamentally really different. If you're doing it enough that performance matters you're doing it wrong? Do you disagree? > + this is where stability would be nice. > I had a better algorithm: > > size_type old_size = size(); > impl_.body_,insert(end(), first, last); > auto mid_point = begin() + old_size; > std::sort(mid_point, end, value_compare()); > std::inplace_merge(begin(), mid_point, end()); > erase(std::unique(begin(), end(), std::not_fn(value_compare()), end()); > > But it still less than ideal and for some examples it was slower than: > > std::copy(first, last, std::inserter(*this)); > > for some examples. > For small inserts only? Like they'd be better off calling individual inserts? > > This algorithm should be stable (it's not - there is a bug in libc++ > inplace_merge) but it should be. > > We need to find a good algorithm. Choose the correct merge. > But I still can't make my benchmarks produce stable results even for simple > cases, and we need benchmarks. > > > > https://codereview.chromium.org/2776793002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2017/03/28 22:09:29, danakj wrote: > On Tue, Mar 28, 2017 at 5:58 PM, <mailto:dyaroshev@yandex-team.ru> wrote: > > > On 2017/03/28 21:42:57, danakj wrote: > > > I see, this is a good point, there's no need to tie it to a type. The > > > constructor is, I suppose, > > > different because we're providing the initial storage. > > > > > > I do think that if we're going to add bulk-construct it would be > > > appropriate to add bulk-insert > > > at the same time though. You mentioned that we don't have a good method. > > > Does "append > > > everything and call sort" not work for insert in the same way it would > > work > > > for constructing? > > > > > > Keeping the (first, end) constructor(s) would more closely match the > > > bulk-insert methods then > > > if we went that route. But it would be appropriate to make different > > > (first, end) constructors in > > > the same way that there'd be different (vector) constructors IMO. > > > > > > > With constructor you do not have an issue of elements that are already in. > > Consider appending two elements to a 100 - do you really want to sort 102 > > elements? > > > > This feels like a poor use of the class, similar to calling insert(). I'm > not sure that inserting 2 things is fundamentally really different. If > you're doing it enough that performance matters you're doing it wrong? Do > you disagree? > I do. You are doing it wrong if replacing flat_set with std::set gains you a win. I think that flat_set should provide the best way to insert many elements in it, possible for a flat_set. I don't believe it's acceptable for insert(first, last) to be slower than std::copy(first, last, std::inserter) for any input (within reason). Doing sort for a full container is bad - I did some computations: (worst case - move all the elements) Sum<i=0,n>{log(size + i)+ size} = n * size + Sum<i = 0, n>{log(size + i)} < n * size + n * log(n + size) Sorting full container (let's say you have a really good sort, so that coefficients are the same as for insert) (n + size) * log(n + size) So, (n + size) * log(n + size) > n * size + n (log(n + size)) size * log(n + size) > n * size log(n + size) > n So, for example, inserting 4 elements in 16 element set in the worst possible case is better done by one element. > > > + this is where stability would be nice. > > I had a better algorithm: > > > > size_type old_size = size(); > > impl_.body_,insert(end(), first, last); > > auto mid_point = begin() + old_size; > > std::sort(mid_point, end, value_compare()); > > std::inplace_merge(begin(), mid_point, end()); > > erase(std::unique(begin(), end(), std::not_fn(value_compare()), end()); > > > > But it still less than ideal and for some examples it was slower than: > > > > std::copy(first, last, std::inserter(*this)); > > > > for some examples. > > > > For small inserts only? Like they'd be better off calling individual > inserts? > Not only. When we always hit near the end, it's good too.
I did some research. In MS's implementation, both std::sort and std::stable_sort both resolve to insertion sort for counts less than 32. In MS at least, I don't think a temporary buffer is created in this case. Our glibc seems to use 16 for the same thing (I didn't check as much as on Windows). This is well within the expected range of flat_maps so I don't think we should be worrying about sort performance. Thanks for the long discussion. Let's put this to rest. - We'll write some types for the different constructor modes rather than have an enum. I don't care and it seems like you two agree. - We'll have a vector constructor to re-use the memory. - We won't add a vector insert. There's no benefit since we can't re-use the storage. Such callers should use the range-based implementation which we'll keep. - We'll add "first" and "last" modes to the constructor and bulk insert functions - "Preprocessed" mode exposes internals in a bad way and probably not super helpful. We'll skip that. - Le't skip "pick any". Our design philosophy for base is that there needs to be as few sharp edges as possible. It will be used by a lot of people, many of whom aren't fully paying attention. The performance benefits of stable sorting aren't compelling enough to justify surprising or incorrect behavior, and I think Dana and I agree here. This is especially the case given the typical sizes we expect for flat_maps don't matter for sort vs stable_sort. - We can work on better bulk-insertions separately.
On 2017/03/29 00:18:57, brettw (plz ping after 24h) wrote: > I did some research. In MS's implementation, both std::sort and std::stable_sort > both resolve to insertion sort for counts less than 32. In MS at least, I don't > think a temporary buffer is created in this case. Our glibc seems to use 16 for > the same thing (I didn't check as much as on Windows). This is well within the > expected range of flat_maps so I don't think we should be worrying about sort > performance. > > > Thanks for the long discussion. Let's put this to rest. > > - We'll write some types for the different constructor modes rather than have an > enum. I don't care and it seems like you two agree. > > - We'll have a vector constructor to re-use the memory. > How about defining it in terms of underlying_type? You can provide a default for it. > - We won't add a vector insert. There's no benefit since we can't re-use the > storage. Such callers should use the range-based implementation which we'll > keep. > > - We'll add "first" and "last" modes to the constructor and bulk insert > functions > > - "Preprocessed" mode exposes internals in a bad way and probably not super > helpful. We'll skip that. > > - Le't skip "pick any". Our design philosophy for base is that there needs to be > as few sharp edges as possible. It will be used by a lot of people, many of whom > aren't fully paying attention. The performance benefits of stable sorting aren't > compelling enough to justify surprising or incorrect behavior, and I think Dana > and I agree here. This is especially the case given the typical sizes we expect > for flat_maps don't matter for sort vs stable_sort. > I would really like the unstable option. I think that my sorter idea is not that bad. I have big enough sets that don't fit in a typical usecase in history. I don't want to pay for what I don't use.
Thanks Brett, your approach sounds good to me. Thanks for looking into sort implementations. On Wed, Mar 29, 2017 at 6:22 AM, <dyaroshev@yandex-team.ru> wrote: > On 2017/03/29 00:18:57, brettw (plz ping after 24h) wrote: > > I did some research. In MS's implementation, both std::sort and > std::stable_sort > > both resolve to insertion sort for counts less than 32. In MS at least, I > don't > > think a temporary buffer is created in this case. Our glibc seems to use > 16 > for > > the same thing (I didn't check as much as on Windows). This is well > within the > > expected range of flat_maps so I don't think we should be worrying about > sort > > performance. > > > > > > Thanks for the long discussion. Let's put this to rest. > > > > - We'll write some types for the different constructor modes rather than > have > an > > enum. I don't care and it seems like you two agree. > > > > - We'll have a vector constructor to re-use the memory. > > > > How about defining it in terms of underlying_type? > You can provide a default for it. > > > - We won't add a vector insert. There's no benefit since we can't re-use > the > > storage. Such callers should use the range-based implementation which > we'll > > keep. > > > > - We'll add "first" and "last" modes to the constructor and bulk insert > > functions > > > > - "Preprocessed" mode exposes internals in a bad way and probably not > super > > helpful. We'll skip that. > > > > - Le't skip "pick any". Our design philosophy for base is that there > needs to > be > > as few sharp edges as possible. It will be used by a lot of people, many > of > whom > > aren't fully paying attention. The performance benefits of stable sorting > aren't > > compelling enough to justify surprising or incorrect behavior, and I > think > Dana > > and I agree here. This is especially the case given the typical sizes we > expect > > for flat_maps don't matter for sort vs stable_sort. > > > > I would really like the unstable option. I think that my sorter idea is > not that > bad. > I have big enough sets that don't fit in a typical usecase in history. > I don't want to pay for what I don't use. > I'd like to see what kind of performance time/power differences you're looking at to justify it. -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2017/03/29 18:50:03, danakj wrote: > I'd like to see what kind of performance time/power differences you're > looking at to justify it. > I don't have the measurements on hand, I'll try by tomorrow. I know that when profiling HQP (the reason why I pushed for flat sets in the first place) now - this sort dominates everything else: https://cs.chromium.org/chromium/src/components/omnibox/browser/url_index_pri... I don't expect this to be a huge impact but: 1 - style: being forced to choose between sorted vector and a flat_set is unbelievably painful. 2 - Adding unstable_sort when we already have this kind of api (keep first, keep last etc) doesn't seem to be harmful. 3 - Not using unstable_sort when you can, would have small performance impact all over the application that would never show up in a profiler. 4 - Not carrying about order is a common case for bulk construction. 5 - Now you would force people to write base::keep_first or base::keep_last and the reader would have to figure out wether it's important. By having another version we would have a self documenting code.
On Wed, Mar 29, 2017 at 3:02 PM, <dyaroshev@yandex-team.ru> wrote: > On 2017/03/29 18:50:03, danakj wrote: > > I'd like to see what kind of performance time/power differences you're > > looking at to justify it. > > > > I don't have the measurements on hand, I'll try by tomorrow. > I know that when profiling HQP (the reason why I pushed for flat sets in > the > first place) now - this sort dominates everything else: > https://cs.chromium.org/chromium/src/components/omnibox/browser/url_index_ > private_data.cc?q=url_index_priv&l=607 > > I don't expect this to be a huge impact but: > > 1 - style: being forced to choose between sorted vector and a flat_set is > unbelievably painful. > 2 - Adding unstable_sort when we already have this kind of api (keep > first, keep > last etc) doesn't seem to be harmful. > 3 - Not using unstable_sort when you can, would have small performance > impact > all over the application that would never show up in a profiler. > 4 - Not carrying about order is a common case for bulk construction. > 5 - Now you would force people to write base::keep_first or > base::keep_last and > the reader would have to figure out wether it's important. By having > another > version we would have a self documenting code. > > Thanks, I can see the sort dominating, but I'm not sure that implies stable vs not would really matter. I will grant that choosing unstable would at least not leak knowledge about sorting behaviour out of the class, even less so than saying "everything i have is unique" would kinda leak that. If there's justification for it, providing an unstable option seems like the best choice here. I'm just trying to be skeptical until proven otherwise. > > > > https://codereview.chromium.org/2776793002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2017/03/29 19:15:10, danakj wrote: > On Wed, Mar 29, 2017 at 3:02 PM, <mailto:dyaroshev@yandex-team.ru> wrote: > > > On 2017/03/29 18:50:03, danakj wrote: > > > I'd like to see what kind of performance time/power differences you're > > > looking at to justify it. > > > > > > > I don't have the measurements on hand, I'll try by tomorrow. > > I know that when profiling HQP (the reason why I pushed for flat sets in > > the > > first place) now - this sort dominates everything else: > > https://cs.chromium.org/chromium/src/components/omnibox/browser/url_index_ > > private_data.cc?q=url_index_priv&l=607 > > > > I don't expect this to be a huge impact but: > > > > 1 - style: being forced to choose between sorted vector and a flat_set is > > unbelievably painful. > > 2 - Adding unstable_sort when we already have this kind of api (keep > > first, keep > > last etc) doesn't seem to be harmful. > > 3 - Not using unstable_sort when you can, would have small performance > > impact > > all over the application that would never show up in a profiler. > > 4 - Not carrying about order is a common case for bulk construction. > > 5 - Now you would force people to write base::keep_first or > > base::keep_last and > > the reader would have to figure out wether it's important. By having > > another > > version we would have a self documenting code. > > > > > Thanks, I can see the sort dominating, but I'm not sure that implies stable > vs not would really matter. > > I will grant that choosing unstable would at least not leak knowledge about > sorting behaviour out of the class, even less so than saying "everything i > have is unique" would kinda leak that. If there's justification for it, > providing an unstable option seems like the best choice here. I'm just > trying to be skeptical until proven otherwise. > Don't you agree that having to do stable_sort for no good reason is annoying?
On Wed, Mar 29, 2017 at 3:32 PM, <dyaroshev@yandex-team.ru> wrote: > On 2017/03/29 19:15:10, danakj wrote: > > > On Wed, Mar 29, 2017 at 3:02 PM, <mailto:dyaroshev@yandex-team.ru> > wrote: > > > > > On 2017/03/29 18:50:03, danakj wrote: > > > > I'd like to see what kind of performance time/power differences > you're > > > > looking at to justify it. > > > > > > > > > > I don't have the measurements on hand, I'll try by tomorrow. > > > I know that when profiling HQP (the reason why I pushed for flat sets > in > > > the > > > first place) now - this sort dominates everything else: > > > https://cs.chromium.org/chromium/src/components/ > omnibox/browser/url_index_ > > > private_data.cc?q=url_index_priv&l=607 > > > > > > I don't expect this to be a huge impact but: > > > > > > 1 - style: being forced to choose between sorted vector and a flat_set > is > > > unbelievably painful. > > > 2 - Adding unstable_sort when we already have this kind of api (keep > > > first, keep > > > last etc) doesn't seem to be harmful. > > > 3 - Not using unstable_sort when you can, would have small performance > > > impact > > > all over the application that would never show up in a profiler. > > > 4 - Not carrying about order is a common case for bulk construction. > > > 5 - Now you would force people to write base::keep_first or > > > base::keep_last and > > > the reader would have to figure out wether it's important. By having > > > another > > > version we would have a self documenting code. > > > > > > > > Thanks, I can see the sort dominating, but I'm not sure that implies > stable > > vs not would really matter. > > > > I will grant that choosing unstable would at least not leak knowledge > about > > sorting behaviour out of the class, even less so than saying "everything > i > > have is unique" would kinda leak that. If there's justification for it, > > providing an unstable option seems like the best choice here. I'm just > > trying to be skeptical until proven otherwise. > > > > Don't you agree that having to do stable_sort for no good reason is > annoying? > Not if its needed in most cases and the perf difference isn't important. -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2017/03/29 19:42:56, danakj wrote: > On Wed, Mar 29, 2017 at 3:32 PM, <mailto:dyaroshev@yandex-team.ru> wrote: > > > On 2017/03/29 19:15:10, danakj wrote: > > > > > On Wed, Mar 29, 2017 at 3:02 PM, <mailto:dyaroshev@yandex-team.ru> > > wrote: > > > > > > > On 2017/03/29 18:50:03, danakj wrote: > > > > > I'd like to see what kind of performance time/power differences > > you're > > > > > looking at to justify it. > > > > > > > > > > > > > I don't have the measurements on hand, I'll try by tomorrow. > > > > I know that when profiling HQP (the reason why I pushed for flat sets > > in > > > > the > > > > first place) now - this sort dominates everything else: > > > > https://cs.chromium.org/chromium/src/components/ > > omnibox/browser/url_index_ > > > > private_data.cc?q=url_index_priv&l=607 > > > > > > > > I don't expect this to be a huge impact but: > > > > > > > > 1 - style: being forced to choose between sorted vector and a flat_set > > is > > > > unbelievably painful. > > > > 2 - Adding unstable_sort when we already have this kind of api (keep > > > > first, keep > > > > last etc) doesn't seem to be harmful. > > > > 3 - Not using unstable_sort when you can, would have small performance > > > > impact > > > > all over the application that would never show up in a profiler. > > > > 4 - Not carrying about order is a common case for bulk construction. > > > > 5 - Now you would force people to write base::keep_first or > > > > base::keep_last and > > > > the reader would have to figure out wether it's important. By having > > > > another > > > > version we would have a self documenting code. > > > > > > > > > > > Thanks, I can see the sort dominating, but I'm not sure that implies > > stable > > > vs not would really matter. > > > > > > I will grant that choosing unstable would at least not leak knowledge > > about > > > sorting behaviour out of the class, even less so than saying "everything > > i > > > have is unique" would kinda leak that. If there's justification for it, > > > providing an unstable option seems like the best choice here. I'm just > > > trying to be skeptical until proven otherwise. > > > > > > > Don't you agree that having to do stable_sort for no good reason is > > annoying? > > > > Not if its needed in most cases and the perf difference isn't important. > In most cases, especially for sets, there is zero need for stability. Even for maps this is true, when you can make that switch from insert by one to construct with a buffer you probably do no need stability. This is why in the original design it was unstable. brettw has some weird case in base::Value and this is the first one I know of. And once again, this is one of those things that would slow down code in general, mostly not hitting one particular spot.
Dana and I are optimizing for developer surprise and predictability. Being on a ~1000 person team for a very long time, this becomes the most important thing to optimize for, and is the biggest impediment for building a quality product. Incremental performance is actually easier to come by. An unstable option is easy to add later in a later pass if we want. I'm not opposed to it if we have data supporting its usefulness. If we can show that unstable sorting makes a nontrivial performance difference, then people that are paying attention and can use it can select that mode. If people aren't paying attention, we need to have the most predictable thing being the default. I didn't parameterize the container type because I couldn't imagine what one might want to replace it with later. The only potential addition is some vector with inline capacity, but I don't think we'ready for that kind of thing. This is also easy to add later if we need it.
On Wed, Mar 29, 2017 at 5:27 PM, <brettw@chromium.org> wrote: > Dana and I are optimizing for developer surprise and predictability. Being > on a > ~1000 person team for a very long time, this becomes the most important > thing to > optimize for, and is the biggest impediment for building a quality product. > Incremental performance is actually easier to come by. > > An unstable option is easy to add later in a later pass if we want. I'm not > opposed to it if we have data supporting its usefulness. If we can show > that > unstable sorting makes a nontrivial performance difference, then people > that are > paying attention and can use it can select that mode. If people aren't > paying > attention, we need to have the most predictable thing being the default. > > I didn't parameterize the container type because I couldn't imagine what > one > might want to replace it with later. The only potential addition is some > vector > with inline capacity, but I don't think we'ready for that kind of thing. > This is > also easy to add later if we need it. > I think this is fair. I think this would primarily impact any utilities we built around flat_map/set, as ordinary consumers are highly likely to just write std::vector anyways (and I don't know that any such utilities will ever exist). > > https://codereview.chromium.org/2776793002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2017/03/29 21:33:02, danakj wrote: > On Wed, Mar 29, 2017 at 5:27 PM, <mailto:brettw@chromium.org> wrote: > > > Dana and I are optimizing for developer surprise and predictability. Being > > on a > > ~1000 person team for a very long time, this becomes the most important > > thing to > > optimize for, and is the biggest impediment for building a quality product. > > Incremental performance is actually easier to come by. > > > > An unstable option is easy to add later in a later pass if we want. I'm not > > opposed to it if we have data supporting its usefulness. If we can show > > that > > unstable sorting makes a nontrivial performance difference, then people > > that are > > paying attention and can use it can select that mode. If people aren't > > paying > > attention, we need to have the most predictable thing being the default. > > > > I didn't parameterize the container type because I couldn't imagine what > > one > > might want to replace it with later. The only potential addition is some > > vector > > with inline capacity, but I don't think we'ready for that kind of thing. > > This is > > also easy to add later if we need it. > > > > I think this is fair. I think this would primarily impact any utilities we > built around flat_map/set, as ordinary consumers are highly likely to just > write std::vector anyways (and I don't know that any such utilities will > ever exist). > > > > > > https://codereview.chromium.org/2776793002/ > > > > -- > You received this message because you are subscribed to the Google Groups > "Chromium-reviews" group. > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org. a) We have a stack_optimized vector in the codebase already: https://cs.chromium.org/chromium/src/base/containers/stack_container.h?q=stac... It doesn't have an exactly correct interface, but the most important part is there. b) If I'll do the patch with unstable sort, will you accept it? This explicitly slows down browser for no good reason and it's extrimly irritating. I still don't think that reading flat_set<int> my_set(base::keep_first, {1, 2, 3}); is any better than: flat_set<int> my_set(base::keep_any, {1, 2, 3}); and the only thing it does is misleads the reader into thinking that which one you keep is important. However, I can't force you to do what I think is right and it doesn't seem like my attempts to convince you help, so do what you think is right.
On Wed, Mar 29, 2017 at 5:50 PM, <dyaroshev@yandex-team.ru> wrote: > On 2017/03/29 21:33:02, danakj wrote: > > > On Wed, Mar 29, 2017 at 5:27 PM, <mailto:brettw@chromium.org> wrote: > > > > > Dana and I are optimizing for developer surprise and predictability. > Being > > > on a > > > ~1000 person team for a very long time, this becomes the most important > > > thing to > > > optimize for, and is the biggest impediment for building a quality > product. > > > Incremental performance is actually easier to come by. > > > > > > An unstable option is easy to add later in a later pass if we want. > I'm not > > > opposed to it if we have data supporting its usefulness. If we can show > > > that > > > unstable sorting makes a nontrivial performance difference, then people > > > that are > > > paying attention and can use it can select that mode. If people aren't > > > paying > > > attention, we need to have the most predictable thing being the > default. > > > > > > I didn't parameterize the container type because I couldn't imagine > what > > > one > > > might want to replace it with later. The only potential addition is > some > > > vector > > > with inline capacity, but I don't think we'ready for that kind of > thing. > > > This is > > > also easy to add later if we need it. > > > > > > > I think this is fair. I think this would primarily impact any utilities > we > > built around flat_map/set, as ordinary consumers are highly likely to > just > > write std::vector anyways (and I don't know that any such utilities will > > ever exist). > > > > > > > > > > https://codereview.chromium.org/2776793002/ > > > > > > > -- > > You received this message because you are subscribed to the Google Groups > > "Chromium-reviews" group. > > To unsubscribe from this group and stop receiving emails from it, send an > email > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > a) We have a stack_optimized vector in the codebase already: > https://cs.chromium.org/chromium/src/base/containers/ > stack_container.h?q=stack_con+package:%5Echromium$&l=232 > It doesn't have an exactly correct interface, but the most important part > is > there. > > b) If I'll do the patch with unstable sort, will you accept it? This > explicitly > slows down browser for no good reason and it's extrimly irritating. > I think it should be motivated by performance results showing its value, just as any optimization should. > I still don't think that reading > > flat_set<int> my_set(base::keep_first, {1, 2, 3}); > > is any better than: > > flat_set<int> my_set(base::keep_any, {1, 2, 3}); > > and the only thing it does is misleads the reader into thinking that which > one > you keep is important. > > However, I can't force you to do what I think is right and it doesn't seem > like > my attempts to convince you help, so do what you think is right. > It creates arbitrary behaviour which people may accidentally rely on and then the order changes on some other platform/library/device and creates a bug. It's a potential footgun and it's cost should have a value tradeoff IMO. -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Ok, so I did the measurements. On mac std::stable_sort makes HQP about 0-7% slower. On windows it's up to 10% faster!!!! I have two possible suggestions: a - std::sort is more of a "quickish" on Windows than on Mac sort and it's bad on sorted ranges (in history we use sort to merge many sets). b - It's a bug in windows stl. Ok, I'm willing to post pone unstable until I have good benchmarking proves.
Checkpoint
https://codereview.chromium.org/2776793002/diff/40001/base/containers/flat_map.h File base/containers/flat_map.h (right): https://codereview.chromium.org/2776793002/diff/40001/base/containers/flat_ma... base/containers/flat_map.h:162: flat_map(std::vector<value_type>&& items, Accept by value and mke it explicit please. https://codereview.chromium.org/2776793002/diff/40001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2776793002/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:16: }; I would prefer delegates that do the algorithm. https://codereview.chromium.org/2776793002/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:117: const key_compare& comp = key_compare()); I think this parametr should go first. And should always be stated explicitly. https://codereview.chromium.org/2776793002/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:339: } Just do the unique algorithm first. And, I still think it should be Sorter. And then we call sorter to do sort and unique for us.
https://codereview.chromium.org/2776793002/diff/40001/base/containers/flat_tr... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2776793002/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:16: }; On 2017/03/31 07:46:52, dyaroshev wrote: > I would prefer delegates that do the algorithm. Possible implementation: https://godbolt.org/g/RddXRt https://codereview.chromium.org/2776793002/diff/40001/base/containers/flat_tr... base/containers/flat_tree.h:42: } A - this should be defined as std::unique - it uses the negation of compare as a predicate. B - this is a generic algorithm - it should be tested individually for different cases What about: 1, 2, 2? Will this algorithm works? I'm concerned about keeping dest == first until the first !compare. I don't like dest comparisons - this should be two loops. Here is my take on it: https://godbolt.org/g/XQyhHA
On 2017/03/31 13:58:46, dyaroshev wrote: > https://codereview.chromium.org/2776793002/diff/40001/base/containers/flat_tr... > File base/containers/flat_tree.h (right): > > https://codereview.chromium.org/2776793002/diff/40001/base/containers/flat_tr... > base/containers/flat_tree.h:16: }; > On 2017/03/31 07:46:52, dyaroshev wrote: > > I would prefer delegates that do the algorithm. > > Possible implementation: > > https://godbolt.org/g/RddXRt > > https://codereview.chromium.org/2776793002/diff/40001/base/containers/flat_tr... > base/containers/flat_tree.h:42: } > A - this should be defined as std::unique - it uses the negation of compare as a > predicate. > > B - this is a generic algorithm - it should be tested individually for different > cases > > What about: 1, 2, 2? Will this algorithm works? I'm concerned about keeping > dest == first until the first !compare. > > I don't like dest comparisons - this should be two loops. > Here is my take on it: > https://godbolt.org/g/XQyhHA By no means I insist on my version, it's just a view on how we can get rid of dest compare. And there is at least one bug -unique_last should use move_iterator when calling unique_copy_last.
Update
Sorry, you weren't supposed to review the "Checkpoint" one, that was just me uploading some code I wrote at home to get to work. New snap up. I didn't quite understand all of your comments in the last one. I did the enum for the duplicate handling instead of a type tag because the type caused the number of constructors to explode. I passed the vector in the constructor only by rvalue reference. If you have a normal vector where you make a copy, I think you should use the constructor that looks like other STL containers (the range one). The new one is specifically for transferring ownership so I wanted to make it harder to screw up (it's easy to forget std::move at the call site and to make the extra copy). I did the enum after the range/vector in the constructor because I felt like it should look like STL. Currently STL is undefined ordering but does FIRST in practice, and there's an open item in the standards committee to specify this behavior.
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...) ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-device-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device-xcode-...) ios-simulator-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator-xco...)
On 2017/03/31 22:31:56, brettw (plz ping after 24h) wrote: > Sorry, you weren't supposed to review the "Checkpoint" one, that was just me > uploading some code I wrote at home to get to work. > > New snap up. I didn't quite understand all of your comments in the last one. > > I did the enum for the duplicate handling instead of a type tag because the type > caused the number of constructors to explode. > Not if you pass an object as a template argument and give the sort/unique responsobility to it. Like this: struct keep_first_t { void operator()(...) const { }; }; contexpr keep_first_t keep_first; template <typename SortingPolicy, typename I> flat_tree(SortingPolicy policy, I f, I l) :... { erase(policy(begin(), end()), end); } Call looks like: flat_set<int> test(base::keep_first, {1, 3, 5}); > I passed the vector in the constructor only by rvalue reference. If you have a > normal vector where you make a copy, I think you should use the constructor that > looks like other STL containers (the range one). The new one is specifically for > transferring ownership so I wanted to make it harder to screw up (it's easy to > forget std::move at the call site and to make the extra copy). > I don't think it's the correct way to do it. It should be an explicit constructor - that's for sure, but there is nothing wrong with: flat_set<int> (my_vec); And oh so much wrong with: flat_set<int>(std::vector<int>(my_vec)); If you actually want to copy. Rvalue references should really be used in a few very rare cases. I don't think that explicitly only allowing rvalue references help anything and it is a serious deviation from usual interfaces. > I did the enum after the range/vector in the constructor because I felt like it > should look like STL. Currently STL is undefined ordering but does FIRST in > practice, and there's an open item in the standards committee to specify this > behavior. I'm not sure about having flat_tree(first, last), but I'm pretty sure that I would like to have sorting policy first. You can overload instead of a default value.
I uploaded a new snapshot. I made the duplicate handling explicit which will have the same practical affect of making it first. It really makes more sense to me as the last parameter since I want to match STL for the first parameters when possible. I passed vector by value. I kept the enum. Passing in a templatized algorithm seems to me to make everything overlay abstract and difficult to understand and provides no significant benefit.
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...) ios-simulator-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator-xco...) mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
A few more fixes
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device-xcode-...) mac_chromium_compile_dbg_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...) mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
On 2017/04/03 23:00:09, brettw (plz ping after 24h) wrote: > I uploaded a new snapshot. I made the duplicate handling explicit which will > have the same practical affect of making it first. It really makes more sense to > me as the last parameter since I want to match STL for the first parameters when > possible. > > I passed vector by value. > > I kept the enum. Passing in a templatized algorithm seems to me to make > everything overlay abstract and difficult to understand and provides no > significant benefit. Ok. Clean up and test the algorithm please. I think that initiliazer_list constructor/assignment should also accept unique policy.
More
I initially didn't add the dupe handling to the initializer list constructors because I thought it might interfere with the syntax you liked (I don't care either way). I changed this and updated the callers. I didn't change the operator= initializer list since that really doesn't make any sense. If you think it's important to have the dupe handling be explicit everywhere, we should remove operator=(initializer_list) and have people use operator=(flat_set&&)
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-device-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device-xcode-...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
GCC fixes
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_android on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/cast_shell_a...) cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...)
Merge
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
I'm cool with the API, mostly questions about comments, and about LastUnique https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_m... File base/containers/flat_map.h (right): https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_m... base/containers/flat_map.h:41: // Constructors (inputs need not be sorted, first of duplicates will be kept): I don't understand this comment in the presence of FlatContainerDupes. The first may not be kept. https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_m... base/containers/flat_map.h:148: // the input be sorted. If there are duplcates, the first one in the input Same here https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_m... base/containers/flat_map.h:167: // Takes the first if there are duplicates in the initializer list. and same here https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_s... File base/containers/flat_set.h (right): https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_s... base/containers/flat_set.h:25: // Usage guidance: Noticing that guidance of some sort like this is conspicuously missing from flat_map.h https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_s... base/containers/flat_set.h:31: // Using the cosntructor that takes a vector rvalue allows you to re-use no rvalue https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_s... base/containers/flat_set.h:45: // constructor that moves a vector (not required to be sorted) is the most receives a vector? https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_s... base/containers/flat_set.h:82: // Constructors (inputs need not be sorted, first of duplicates will be kept): Like in the map, the comments saying first is kept seems wrong https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_s... File base/containers/flat_set_unittest.cc (right): https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_s... base/containers/flat_set_unittest.cc:41: base::KEEP_FIRST_OF_DUPES); how come only keep first is tested? https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... base/containers/flat_tree.h:21: Iterator LastUnique(Iterator first, Iterator last, BinaryPredicate compare) { Maybe this belongs in stl_util.h? Or somewhere.. but I don't think in flat_tree.h https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... base/containers/flat_tree.h:22: if (first == last) Curious why you wrote your own LastUnique instead of using reverse iterators and std::unique? https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... base/containers/flat_tree.h:106: // the input be sorted. If there are duplcates, the first one in the input Same as from map, the first is kept part seems wrong? https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... base/containers/flat_tree.h:319: // We need to preserve stability here and take only the first element. "first element" isn't always correct, right?
https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... base/containers/flat_tree.h:22: if (first == last) On 2017/04/05 21:33:43, danakj wrote: > Curious why you wrote your own LastUnique instead of using reverse iterators and > std::unique? It moves elements in the wrong direction - you'll get elements collected in the beginning of the vector. PS: I still think this algorithm can be implemented better.
Review comments
Review comments addressed. dyaroshev: I honestly couldn't follow the example you posted. It seems the one simple loop in on function is both efficient and correct. https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_m... File base/containers/flat_map.h (right): https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_m... base/containers/flat_map.h:41: // Constructors (inputs need not be sorted, first of duplicates will be kept): On 2017/04/05 21:33:42, danakj wrote: > I don't understand this comment in the presence of FlatContainerDupes. The first > may not be kept. Done. https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_m... base/containers/flat_map.h:148: // the input be sorted. If there are duplcates, the first one in the input On 2017/04/05 21:33:42, danakj wrote: > Same here Done. https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_m... base/containers/flat_map.h:167: // Takes the first if there are duplicates in the initializer list. On 2017/04/05 21:33:42, danakj wrote: > and same here Done. https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_s... File base/containers/flat_set.h (right): https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_s... base/containers/flat_set.h:31: // Using the cosntructor that takes a vector rvalue allows you to re-use On 2017/04/05 21:33:43, danakj wrote: > no rvalue Done. https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_s... base/containers/flat_set.h:45: // constructor that moves a vector (not required to be sorted) is the most On 2017/04/05 21:33:43, danakj wrote: > receives a vector? Done. https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_s... base/containers/flat_set.h:82: // Constructors (inputs need not be sorted, first of duplicates will be kept): On 2017/04/05 21:33:42, danakj wrote: > Like in the map, the comments saying first is kept seems wrong Done. https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_s... File base/containers/flat_set_unittest.cc (right): https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_s... base/containers/flat_set_unittest.cc:41: base::KEEP_FIRST_OF_DUPES); On 2017/04/05 21:33:43, danakj wrote: > how come only keep first is tested? Since set is just a typedef for tree, most of the tests are there and I was trying to avoid duplicating huge amounts of code. This file contains a few cases to make sure the typedef for unique elements is getting used properly. https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... base/containers/flat_tree.h:21: Iterator LastUnique(Iterator first, Iterator last, BinaryPredicate compare) { On 2017/04/05 21:33:43, danakj wrote: > Maybe this belongs in stl_util.h? Or somewhere.. but I don't think in > flat_tree.h I started by writing this in a separate file (I think stl_util is too generic) but I couldn't imagine when this would be used outside of this context. In the interest of not cluttering up the common places with an algorithm that will probably never be re-used, I just put it as an internal implementation detail of flat_tree. https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... base/containers/flat_tree.h:106: // the input be sorted. If there are duplcates, the first one in the input On 2017/04/05 21:33:43, danakj wrote: > Same as from map, the first is kept part seems wrong? Done. https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... base/containers/flat_tree.h:319: // We need to preserve stability here and take only the first element. On 2017/04/05 21:33:43, danakj wrote: > "first element" isn't always correct, right? Done.
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
LGTM but: https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... base/containers/flat_tree.h:21: Iterator LastUnique(Iterator first, Iterator last, BinaryPredicate compare) { On 2017/04/07 21:59:04, brettw (plz ping after 24h) wrote: > On 2017/04/05 21:33:43, danakj wrote: > > Maybe this belongs in stl_util.h? Or somewhere.. but I don't think in > > flat_tree.h > > I started by writing this in a separate file (I think stl_util is too generic) > but I couldn't imagine when this would be used outside of this context. In the > interest of not cluttering up the common places with an algorithm that will > probably never be re-used, I just put it as an internal implementation detail of > flat_tree. I think it should be in base::internal if you want it to be an implementation detail.
Move to internal
The CQ bit was checked by brettw@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from danakj@chromium.org Link to the patchset: https://codereview.chromium.org/2776793002/#ps220001 (title: "Move to internal")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... File base/containers/flat_tree.h (right): https://codereview.chromium.org/2776793002/diff/180001/base/containers/flat_t... base/containers/flat_tree.h:21: Iterator LastUnique(Iterator first, Iterator last, BinaryPredicate compare) { On 2017/04/07 22:02:53, danakj wrote: > On 2017/04/07 21:59:04, brettw (plz ping after 24h) wrote: > > On 2017/04/05 21:33:43, danakj wrote: > > > Maybe this belongs in stl_util.h? Or somewhere.. but I don't think in > > > flat_tree.h > > > > I started by writing this in a separate file (I think stl_util is too generic) > > but I couldn't imagine when this would be used outside of this context. In the > > interest of not cluttering up the common places with an algorithm that will > > probably never be re-used, I just put it as an internal implementation detail > of > > flat_tree. > > I think it should be in base::internal if you want it to be an implementation > detail. Oh, I meant to do that actually. Thanks.
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...) mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
Put back media log change lost in merge
The CQ bit was checked by brettw@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from danakj@chromium.org Link to the patchset: https://codereview.chromium.org/2776793002/#ps240001 (title: "Put back media log change lost in merge")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 240001, "attempt_start_ts": 1491604333337090, "parent_rev": "d3dcdb05fa128fc8958a5772734847198324e25b", "commit_rev": "5c72fdadcbe0e5914a2a1c935992975e31ff3e67"}
Message was sent while issue was closed.
Description was changed from ========== Make flat containers stable, allow constructing from vector. The stability matters more for maps that it did for sets, so I think it makes sense if all of the flat_set/map containers are stable sorted. Since we are unlikely to switch out the storage type, I made constructors that take a moved vector. I want to use this pattern to collect base::Values and then construct a map from it all at once. ========== to ========== Make flat containers stable, allow constructing from vector. The stability matters more for maps that it did for sets, so I think it makes sense if all of the flat_set/map containers are stable sorted. Since we are unlikely to switch out the storage type, I made constructors that take a moved vector. I want to use this pattern to collect base::Values and then construct a map from it all at once. Review-Url: https://codereview.chromium.org/2776793002 Cr-Commit-Position: refs/heads/master@{#463094} Committed: https://chromium.googlesource.com/chromium/src/+/5c72fdadcbe0e5914a2a1c935992... ==========
Message was sent while issue was closed.
Committed patchset #13 (id:240001) as https://chromium.googlesource.com/chromium/src/+/5c72fdadcbe0e5914a2a1c935992... |