|
|
Created:
5 years, 9 months ago by robliao Modified:
5 years, 8 months ago CC:
chromium-reviews, miu+watch_chromium.org, chromium-apps-reviews_chromium.org, markusheintz_, erikwright+watch_chromium.org, extensions-reviews_chromium.org, rkaplow, tfarina Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionRefactor Uses of std::set to std::vector in SimpleFeature
Local profiling suggests that converting std::set to std::vector will save
around 10% of the overall startup time, likely due to the reduced allocs
involved with std::vector.
This change also reduces lookups by traversing the incoming dictionary
and iterating on the available values instead of looking through all
values for all features.
Finally, cleaned up and enforced the testing accessors via code and friend
declarations
BUG=460987
Committed: https://crrev.com/43eb02c2ffcfa18adfcbf34cd0507a3d7494febe
Cr-Commit-Position: refs/heads/master@{#322609}
Patch Set 1 #Patch Set 2 : IsIdInArray and other Cleanup #
Total comments: 6
Patch Set 3 : Convert STLCount to const ref argument #Patch Set 4 : Sync to Latest #Patch Set 5 : Remove Unsigned From Checks with STLCount #
Total comments: 2
Messages
Total messages: 41 (10 generated)
robliao@chromium.org changed reviewers: + rockot@chromium.org
rockot: Please review the changes in c/b/e/* and e/*. Thanks.
Have you considered using a base::hash_set instead? While it may consume more memory overall, I suspect it might have better allocation patterns (wrt time) than either std::set or std::vector, with the bonus that it doesn't lose fast lookups.
On 2015/03/26 01:14:58, Ken Rockot wrote: > Have you considered using a base::hash_set instead? While it may consume more > memory overall, I suspect it might have better allocation patterns (wrt time) > than either std::set or std::vector, with the bonus that it doesn't lose fast > lookups. I disagree. All callers of IsIdInSet() are copying all elements into std::strings *and* then building a container to pass as an argument. This is very wasteful. My suggestion: The 2nd arg should just be a const char* array: static bool IsIdInList( const std::string& extension_id, const char* list[], int list_length); Then, for example, the call from tab_capture_api.cc achieves zero copying: !SimpleFeature::IsIdInList( extension_id, kWhitelist, arraysize(kWhitelist)) BTW--Thanks for fixing this! :)
On 2015/03/26 01:31:58, miu wrote: > On 2015/03/26 01:14:58, Ken Rockot wrote: > > Have you considered using a base::hash_set instead? While it may consume more > > memory overall, I suspect it might have better allocation patterns (wrt time) > > than either std::set or std::vector, with the bonus that it doesn't lose fast > > lookups. > > I disagree. All callers of IsIdInSet() are copying all elements into > std::strings *and* then building a container to pass as an argument. This is > very wasteful. Sorry, I missed that part. I was actually referring to the internal storage of blacklists and whitelists, etc.
Also: I do agree with your suggestion for IsIdInList! On Wed, Mar 25, 2015 at 6:35 PM, <rockot@chromium.org> wrote: > On 2015/03/26 01:31:58, miu wrote: > >> On 2015/03/26 01:14:58, Ken Rockot wrote: >> > Have you considered using a base::hash_set instead? While it may consume >> > more > >> > memory overall, I suspect it might have better allocation patterns (wrt >> > time) > >> > than either std::set or std::vector, with the bonus that it doesn't lose >> > fast > >> > lookups. >> > > I disagree. All callers of IsIdInSet() are copying all elements into >> std::strings *and* then building a container to pass as an argument. >> This is >> very wasteful. >> > > Sorry, I missed that part. I was actually referring to the internal > storage of > blacklists and whitelists, etc. > > https://codereview.chromium.org/1010973013/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2015/03/26 01:37:17, Ken Rockot wrote: > Also: I do agree with your suggestion for IsIdInList! > > On Wed, Mar 25, 2015 at 6:35 PM, <mailto:rockot@chromium.org> wrote: > > > On 2015/03/26 01:31:58, miu wrote: > > > >> On 2015/03/26 01:14:58, Ken Rockot wrote: > >> > Have you considered using a base::hash_set instead? While it may consume > >> > > more > > > >> > memory overall, I suspect it might have better allocation patterns (wrt > >> > > time) > > > >> > than either std::set or std::vector, with the bonus that it doesn't lose > >> > > fast > > > >> > lookups. > >> > > > > I disagree. All callers of IsIdInSet() are copying all elements into > >> std::strings *and* then building a container to pass as an argument. > >> This is > >> very wasteful. > >> > > > > Sorry, I missed that part. I was actually referring to the internal > > storage of > > blacklists and whitelists, etc. > > > > https://codereview.chromium.org/1010973013/ > > > > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org. I went with std::vector for the preallocation support with the search tradeoff, especially since heap ops showed up in some of the debug builds (less of an issue in the release builds). I figured the n in the O(n) search would be small enough to be okay. Since I am sorting them on the way out, I could change the search to std::binary_search, and we'd at least get O(lg n) lookups. Do we expect large n in these lists? The IsIdInList looks like a good suggestion. I didn't know if there were other plans for it. I'll change this to IsIdInArray and that looks like we can make IsIdInList private.
The n might be small enough to be OK. These functions are called pretty often though and hash_set's preallocation behavior should be similar to vector's, so I'm having trouble seeing why we shouldn't just use hash_set here instead of vector. What's the risk? On Thu, Mar 26, 2015 at 10:36 AM, <robliao@chromium.org> wrote: > On 2015/03/26 01:37:17, Ken Rockot wrote: > >> Also: I do agree with your suggestion for IsIdInList! >> > > On Wed, Mar 25, 2015 at 6:35 PM, <mailto:rockot@chromium.org> wrote: >> > > > On 2015/03/26 01:31:58, miu wrote: >> > >> >> On 2015/03/26 01:14:58, Ken Rockot wrote: >> >> > Have you considered using a base::hash_set instead? While it may >> consume >> >> >> > more >> > >> >> > memory overall, I suspect it might have better allocation patterns >> (wrt >> >> >> > time) >> > >> >> > than either std::set or std::vector, with the bonus that it doesn't >> lose >> >> >> > fast >> > >> >> > lookups. >> >> >> > >> > I disagree. All callers of IsIdInSet() are copying all elements into >> >> std::strings *and* then building a container to pass as an argument. >> >> This is >> >> very wasteful. >> >> >> > >> > Sorry, I missed that part. I was actually referring to the internal >> > storage of >> > blacklists and whitelists, etc. >> > >> > https://codereview.chromium.org/1010973013/ >> > >> > > To unsubscribe from this group and stop receiving emails from it, send an >> > email > >> to mailto:chromium-reviews+unsubscribe@chromium.org. >> > > I went with std::vector for the preallocation support with the search > tradeoff, > especially > since heap ops showed up in some of the debug builds (less of an issue in > the > release builds). > I figured the n in the O(n) search would be small enough to be okay. Since > I am > sorting them on > the way out, I could change the search to std::binary_search, and we'd at > least > get O(lg n) lookups. > Do we expect large n in these lists? > > The IsIdInList looks like a good suggestion. I didn't know if there were > other > plans for it. > I'll change this to IsIdInArray and that looks like we can make IsIdInList > private. > > https://codereview.chromium.org/1010973013/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2015/03/26 17:51:44, Ken Rockot wrote: > The n might be small enough to be OK. These functions are called pretty > often though and hash_set's preallocation behavior should be similar to > vector's, so I'm having trouble seeing why we shouldn't just use hash_set > here instead of vector. What's the risk? > > On Thu, Mar 26, 2015 at 10:36 AM, <mailto:robliao@chromium.org> wrote: > > > On 2015/03/26 01:37:17, Ken Rockot wrote: > > > >> Also: I do agree with your suggestion for IsIdInList! > >> > > > > On Wed, Mar 25, 2015 at 6:35 PM, <mailto:rockot@chromium.org> wrote: > >> > > > > > On 2015/03/26 01:31:58, miu wrote: > >> > > >> >> On 2015/03/26 01:14:58, Ken Rockot wrote: > >> >> > Have you considered using a base::hash_set instead? While it may > >> consume > >> >> > >> > more > >> > > >> >> > memory overall, I suspect it might have better allocation patterns > >> (wrt > >> >> > >> > time) > >> > > >> >> > than either std::set or std::vector, with the bonus that it doesn't > >> lose > >> >> > >> > fast > >> > > >> >> > lookups. > >> >> > >> > > >> > I disagree. All callers of IsIdInSet() are copying all elements into > >> >> std::strings *and* then building a container to pass as an argument. > >> >> This is > >> >> very wasteful. > >> >> > >> > > >> > Sorry, I missed that part. I was actually referring to the internal > >> > storage of > >> > blacklists and whitelists, etc. > >> > > >> > https://codereview.chromium.org/1010973013/ > >> > > >> > > > > To unsubscribe from this group and stop receiving emails from it, send an > >> > > email > > > >> to mailto:chromium-reviews+unsubscribe@chromium.org. > >> > > > > I went with std::vector for the preallocation support with the search > > tradeoff, > > especially > > since heap ops showed up in some of the debug builds (less of an issue in > > the > > release builds). > > I figured the n in the O(n) search would be small enough to be okay. Since > > I am > > sorting them on > > the way out, I could change the search to std::binary_search, and we'd at > > least > > get O(lg n) lookups. > > Do we expect large n in these lists? > > > > The IsIdInList looks like a good suggestion. I didn't know if there were > > other > > plans for it. > > I'll change this to IsIdInArray and that looks like we can make IsIdInList > > private. > > > > https://codereview.chromium.org/1010973013/ > > > > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org. My understanding of hash_set preallocation is that it gives you number of buckets but doesn't guarantee that you won't get collisions on insertion, meaning you'd have to start allocating space to accommodate the collisions. Searches will also have to start traversing on collisions. We do lose ordering with hash_set, which will be an annoyance to the tests, but not insurmountable. Is the platforms supposed to be test only as well? It wasn't grouped in with the original set.
On 2015/03/26 18:08:31, robliao wrote: > On 2015/03/26 17:51:44, Ken Rockot wrote: > > The n might be small enough to be OK. These functions are called pretty > > often though and hash_set's preallocation behavior should be similar to > > vector's, so I'm having trouble seeing why we shouldn't just use hash_set > > here instead of vector. What's the risk? > > > > On Thu, Mar 26, 2015 at 10:36 AM, <mailto:robliao@chromium.org> wrote: > > > > > On 2015/03/26 01:37:17, Ken Rockot wrote: > > > > > >> Also: I do agree with your suggestion for IsIdInList! > > >> > > > > > > On Wed, Mar 25, 2015 at 6:35 PM, <mailto:rockot@chromium.org> wrote: > > >> > > > > > > > On 2015/03/26 01:31:58, miu wrote: > > >> > > > >> >> On 2015/03/26 01:14:58, Ken Rockot wrote: > > >> >> > Have you considered using a base::hash_set instead? While it may > > >> consume > > >> >> > > >> > more > > >> > > > >> >> > memory overall, I suspect it might have better allocation patterns > > >> (wrt > > >> >> > > >> > time) > > >> > > > >> >> > than either std::set or std::vector, with the bonus that it doesn't > > >> lose > > >> >> > > >> > fast > > >> > > > >> >> > lookups. > > >> >> > > >> > > > >> > I disagree. All callers of IsIdInSet() are copying all elements into > > >> >> std::strings *and* then building a container to pass as an argument. > > >> >> This is > > >> >> very wasteful. > > >> >> > > >> > > > >> > Sorry, I missed that part. I was actually referring to the internal > > >> > storage of > > >> > blacklists and whitelists, etc. > > >> > > > >> > https://codereview.chromium.org/1010973013/ > > >> > > > >> > > > > > > To unsubscribe from this group and stop receiving emails from it, send an > > >> > > > email > > > > > >> to mailto:chromium-reviews+unsubscribe@chromium.org. > > >> > > > > > > I went with std::vector for the preallocation support with the search > > > tradeoff, > > > especially > > > since heap ops showed up in some of the debug builds (less of an issue in > > > the > > > release builds). > > > I figured the n in the O(n) search would be small enough to be okay. Since > > > I am > > > sorting them on > > > the way out, I could change the search to std::binary_search, and we'd at > > > least > > > get O(lg n) lookups. > > > Do we expect large n in these lists? > > > > > > The IsIdInList looks like a good suggestion. I didn't know if there were > > > other > > > plans for it. > > > I'll change this to IsIdInArray and that looks like we can make IsIdInList > > > private. > > > > > > https://codereview.chromium.org/1010973013/ > > > > > > > To unsubscribe from this group and stop receiving emails from it, send an > email > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > My understanding of hash_set preallocation is that it gives you number of > buckets but doesn't guarantee that you won't get collisions on insertion, > meaning > you'd have to start allocating space to accommodate the collisions. Searches > will also have to start traversing on collisions. > > We do lose ordering with hash_set, which will be an annoyance to the tests, but > not > insurmountable. Is the platforms supposed to be test only as well? It wasn't > grouped in > with the original set. Well it doesn't guarantee that you won't get collisions, but it should allocate more buckets appropriately during rehash if the load factor gets too high. In any case I didn't think about the tests and really don't want to block this on a minor yak shave that might only amount to a trivial difference in practical performance, so lgtm.
miu@chromium.org changed reviewers: + miu@chromium.org
Before you commit, please don't forget to change IsIdInList() as discussed. ;-)
On 2015/03/26 18:46:16, miu wrote: > Before you commit, please don't forget to change IsIdInList() as discussed. ;-) Definitely not! It's in progress :-D
Patchset #2 (id:20001) has been deleted
Added IsIdInArray, made IsIdInList private, and made platforms private. I considered unifying IsIdInArray and IsIdInList, but I couldn't find a suitable non-copying std container array wrapper :-/
robliao@chromium.org changed reviewers: + thestig@chromium.org
thestig@chromium.org: Please review changes in stl_util.h and c/b/content* Thanks!
https://codereview.chromium.org/1010973013/diff/40001/base/stl_util.h File base/stl_util.h (right): https://codereview.chromium.org/1010973013/diff/40001/base/stl_util.h#newcode97 base/stl_util.h:97: STLCount(const Container* const container, const T& val) { Can |container| be a const ref instead?
https://codereview.chromium.org/1010973013/diff/40001/base/stl_util.h File base/stl_util.h (right): https://codereview.chromium.org/1010973013/diff/40001/base/stl_util.h#newcode97 base/stl_util.h:97: STLCount(const Container* const container, const T& val) { On 2015/03/26 21:48:45, Lei Zhang wrote: > Can |container| be a const ref instead? My initial iteration started out like that, but I changed it to const * for the tests. Check out https://codereview.chromium.org/1032023002/diff/40001/extensions/common/featu... for the old iteration. The const * allows for the removal of all the * references.
On 2015/03/26 21:51:48, robliao wrote: > https://codereview.chromium.org/1010973013/diff/40001/base/stl_util.h > File base/stl_util.h (right): > > https://codereview.chromium.org/1010973013/diff/40001/base/stl_util.h#newcode97 > base/stl_util.h:97: STLCount(const Container* const container, const T& val) { > On 2015/03/26 21:48:45, Lei Zhang wrote: > > Can |container| be a const ref instead? > > My initial iteration started out like that, but I changed it to const * for the > tests. > > Check out > https://codereview.chromium.org/1032023002/diff/40001/extensions/common/featu... > for the old iteration. > > The const * allows for the removal of all the * references. My reasoning that that STLCount() should be the same as c_count() in google3, just as base/stl_util.h is similar to the google3 version.
On 2015/03/26 22:15:47, Lei Zhang wrote: > On 2015/03/26 21:51:48, robliao wrote: > > https://codereview.chromium.org/1010973013/diff/40001/base/stl_util.h > > File base/stl_util.h (right): > > > > > https://codereview.chromium.org/1010973013/diff/40001/base/stl_util.h#newcode97 > > base/stl_util.h:97: STLCount(const Container* const container, const T& val) { > > On 2015/03/26 21:48:45, Lei Zhang wrote: > > > Can |container| be a const ref instead? > > > > My initial iteration started out like that, but I changed it to const * for > the > > tests. > > > > Check out > > > https://codereview.chromium.org/1032023002/diff/40001/extensions/common/featu... > > for the old iteration. > > > > The const * allows for the removal of all the * references. > > My reasoning that that STLCount() should be the same as c_count() in google3, > just as base/stl_util.h is similar to the google3 version. Fun. Well, if we want to maintain that parity, I can go ahead and update the callers.
On 2015/03/26 22:19:51, robliao wrote: > On 2015/03/26 22:15:47, Lei Zhang wrote: > > On 2015/03/26 21:51:48, robliao wrote: > > > https://codereview.chromium.org/1010973013/diff/40001/base/stl_util.h > > > File base/stl_util.h (right): > > > > > > > > > https://codereview.chromium.org/1010973013/diff/40001/base/stl_util.h#newcode97 > > > base/stl_util.h:97: STLCount(const Container* const container, const T& val) > { > > > On 2015/03/26 21:48:45, Lei Zhang wrote: > > > > Can |container| be a const ref instead? > > > > > > My initial iteration started out like that, but I changed it to const * for > > the > > > tests. > > > > > > Check out > > > > > > https://codereview.chromium.org/1032023002/diff/40001/extensions/common/featu... > > > for the old iteration. > > > > > > The const * allows for the removal of all the * references. > > > > My reasoning that that STLCount() should be the same as c_count() in google3, > > just as base/stl_util.h is similar to the google3 version. > > Fun. Well, if we want to maintain that parity, I can go ahead and update the > callers. It would be nice to. One can also argue that taking a pointer means some other set of callers will have to add a '&' to their references. Passing a reference also conveys the implication that what you are passing in is not NULL.
tab_capture_apitest.cc lgtm Other concerns: https://codereview.chromium.org/1010973013/diff/40001/base/stl_util.h File base/stl_util.h (right): https://codereview.chromium.org/1010973013/diff/40001/base/stl_util.h#newcode97 base/stl_util.h:97: STLCount(const Container* const container, const T& val) { On 2015/03/26 21:51:48, robliao wrote: > On 2015/03/26 21:48:45, Lei Zhang wrote: > > Can |container| be a const ref instead? I gotta agree with thestig@ here. "const Container& container" would be a declaration that the container argument is not optional (i.e., cannot be null), whereas the pointer version you have here leaves doubt. https://codereview.chromium.org/1010973013/diff/40001/extensions/common/featu... File extensions/common/features/simple_feature.h (right): https://codereview.chromium.org/1010973013/diff/40001/extensions/common/featu... extensions/common/features/simple_feature.h:167: static bool IsIdInList(const std::string& extension_id, Is this still needed now that you have IsIdInArray()?
Patchset #3 (id:60001) has been deleted
https://codereview.chromium.org/1010973013/diff/40001/base/stl_util.h File base/stl_util.h (right): https://codereview.chromium.org/1010973013/diff/40001/base/stl_util.h#newcode97 base/stl_util.h:97: STLCount(const Container* const container, const T& val) { On 2015/03/26 22:24:43, miu wrote: > On 2015/03/26 21:51:48, robliao wrote: > > On 2015/03/26 21:48:45, Lei Zhang wrote: > > > Can |container| be a const ref instead? > > I gotta agree with thestig@ here. "const Container& container" would be a > declaration that the container argument is not optional (i.e., cannot be null), > whereas the pointer version you have here leaves doubt. Changed and Updated. https://codereview.chromium.org/1010973013/diff/40001/extensions/common/featu... File extensions/common/features/simple_feature.h (right): https://codereview.chromium.org/1010973013/diff/40001/extensions/common/featu... extensions/common/features/simple_feature.h:167: static bool IsIdInList(const std::string& extension_id, On 2015/03/26 22:24:43, miu wrote: > Is this still needed now that you have IsIdInArray()? Unfortunately yes. We'd need to convert an array of std::string's into an array of const char *s for this combine.
base/ lgtm
The CQ bit was checked by robliao@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from rockot@chromium.org, miu@chromium.org Link to the patchset: https://codereview.chromium.org/1010973013/#ps80001 (title: "Convert STLCount to const ref argument")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1010973013/80001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_compile_dbg_32_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
On 2015/03/27 02:48:23, I haz the power (commit-bot) wrote: > Try jobs failed on following builders: > linux_chromium_compile_dbg_32_ng on tryserver.chromium.linux (JOB_FAILED, > http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) What a picky builder. Every other builder was fine with it lol.
On 2015/03/27 04:10:34, robliao wrote: > On 2015/03/27 02:48:23, I haz the power (commit-bot) wrote: > > Try jobs failed on following builders: > > linux_chromium_compile_dbg_32_ng on tryserver.chromium.linux (JOB_FAILED, > > > http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) > > What a picky builder. Every other builder was fine with it lol. Looking at c_count, I'm inclined to drop the unsigned modifiers from the tests. c_count operates on decltype std::distance, which is what std::iterator_traits::difference_type delivers. I would expect that to include plausible negative distances.
Goofy part is that this doesn't repro on my local machine... Removed the unsigned constant from the test anyways.
The CQ bit was checked by robliao@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from thestig@chromium.org, rockot@chromium.org, miu@chromium.org Link to the patchset: https://codereview.chromium.org/1010973013/#ps120001 (title: "Remove Unsigned From Checks with STLCount")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1010973013/120001
Message was sent while issue was closed.
Committed patchset #5 (id:120001)
Message was sent while issue was closed.
Patchset 5 (id:??) landed as https://crrev.com/43eb02c2ffcfa18adfcbf34cd0507a3d7494febe Cr-Commit-Position: refs/heads/master@{#322609}
Message was sent while issue was closed.
https://codereview.chromium.org/1010973013/diff/120001/base/stl_util.h File base/stl_util.h (right): https://codereview.chromium.org/1010973013/diff/120001/base/stl_util.h#newcod... base/stl_util.h:209: bool ContainsValue(const Collection& collection, const Value& value) { why these and the above functions were not added to base namespace?
Message was sent while issue was closed.
https://codereview.chromium.org/1010973013/diff/120001/base/stl_util.h File base/stl_util.h (right): https://codereview.chromium.org/1010973013/diff/120001/base/stl_util.h#newcod... base/stl_util.h:209: bool ContainsValue(const Collection& collection, const Value& value) { On 2015/04/03 01:26:37, tfarina wrote: > why these and the above functions were not added to base namespace? Consistency was the major driving factor. ContainsKey already lived in the global namespace, so it was natural that ContainsValue would live there as well. As for STLCount more STL* items lived outside of base namespace, so it was majority wins. I would agree that putting all of the functions here into something like base::stl would be nice. That refactor would certainly be a large undertaking. |