|
|
Description[Courgette] PagedArray: Add Iterators and Parametrize Page Size as int Template.
This is a refactoring CL to enable PagedArray usage by libdivsufsort.
In addition to overloading operator[], for more general usage we need
need pointer-like accessors to PagedArray. To this end we implement
PagedArray_const_iterator and PagedArray_const_iterator, which merely
wraps a PagedArray pointer along with an index. We also add various
operators needed by libdivsufsort. For optimization, '<' and '<='
operators omits pointer checks.
By default PagedArray page size is 2**18 elements (1 MiB for int32_t).
To enable better testing, we made (log) page size a tepmlate parameter.
BUG=608885
Committed: https://crrev.com/804ed8a1e0f236f22e19abef7013e9641a3ad49d
Cr-Commit-Position: refs/heads/master@{#397311}
Patch Set 1 #
Total comments: 22
Patch Set 2 : Make better use of C++11. #Patch Set 3 : Make Clang happy. #Patch Set 4 : Follow etiennep@'s suggestion to use single iterator class. #
Total comments: 12
Patch Set 5 : More cleanup. #Patch Set 6 : git cl format #
Total comments: 8
Patch Set 7 : Adding more checks for Debug build only. #Patch Set 8 : Reverting 2 versions of operator[]() to duplicate code. #Patch Set 9 : Add operators > and >= to make Clang std::sort() happy. #
Messages
Total messages: 41 (15 generated)
huangs@chromium.org changed reviewers: + chrisha@chromium.org, etiennep@google.com
Prerequisite for adopting libdivsufsort, PTAL. Thanks!
Great, I love iterators! Round one. https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... File courgette/third_party/bsdiff/paged_array.h (right): https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:41: struct iterator_traits<class courgette::PagedArray_iterator<T, LOG_PAGE_SIZE>> { We can declare these typedefs directly in the iterator class, and iterator_traits does not need to be specialized. Optionally, we can get them by making the iterator class derived from std::iterator<random_access_iterator_tag, T>. https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:64: class PagedArray_const_iterator { This does a lot of duplicate code. I don't think we need to define two iterators. ContainerType could be injected through a template parameter. We can then define iterators in PagedArray class as const_iterator = PagedArray_iterator<const PagedArray, const T, LOG_PAGE_SIZE>; iterator = PagedArray_iterator<PagedArray, T, LOG_PAGE_SIZE>; https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:98: ConstItType operator+=(size_t delta) { NIT: the iterator concept requires delta to be difference_type (std::ptrdiff_t) https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:115: bool operator!=(const ConstItType& it) const { The implementation could be {return !(*this == it);} https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:145: using ConstItType = PagedArray_const_iterator<T, LOG_PAGE_SIZE>; If you were to merge both iterator, we can get ConstItType with PagedArray_const_iterator< typename add_const<ContainerType>::type, typename add_const<T>::type, LOG_PAGE_SIZE> Even if ConstItType is the same type as the iterator itself, conversion operator overload works in any case. https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:154: ~PagedArray_iterator() { } Would be more consistent = default https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:160: We could put operator-> as it is a requirement for iterators (it would probably never be used though) https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:161: const T& operator[](size_t idx) const { return (*array_)[index_ + idx]; } Not very useful. A const-qualified iterator (could not increment) does not mean data is const. https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:184: ItType& operator=(void*) { The arguments should be std::nullptr_t instead. https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:195: bool operator==(const void* p) const { std::nullptr_t again. There would be non need to check. https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:255: const T& operator[](size_t i) const { This is me asking : If PagedArray is meant to support array bigger than 4GB on 32bit systems, wouldn't size_t (probably equivalent to uint32_t) overflow anyway (I'm guessing i'm wrong because it is being used in bsdiff, but why)?
Thanks! Looked at std::vector::iterator and did it a more conventional way of having non-const iterator inherit from const iterator. Gonna do dry run see if this compiles uniformly. Going to try the suggested template method next. https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... File courgette/third_party/bsdiff/paged_array.h (right): https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:41: struct iterator_traits<class courgette::PagedArray_iterator<T, LOG_PAGE_SIZE>> { Went with directly declaring, using "using". https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:64: class PagedArray_const_iterator { For this revision, went with the standard way of defining const iterator with mutable element, then have non-const iterator inherited class. Still have to implement many operators though. Maybe the way you described is more modern, and should try that next? Does it allow one-way cast from non-const to const? https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:98: ConstItType operator+=(size_t delta) { On 2016/05/25 14:19:01, etiennep wrote: > NIT: the iterator concept requires delta to be difference_type (std::ptrdiff_t) Done. https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:115: bool operator!=(const ConstItType& it) const { On 2016/05/25 14:19:02, etiennep wrote: > The implementation could be {return !(*this == it);} Done. https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:145: using ConstItType = PagedArray_const_iterator<T, LOG_PAGE_SIZE>; May try this next. So conversion operator will seamlessly handle non-const to const cast, and raise error if we try the reverse? https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:154: ~PagedArray_iterator() { } On 2016/05/25 14:19:02, etiennep wrote: > Would be more consistent = default Done. https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:160: Added, with tests. https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:161: const T& operator[](size_t idx) const { return (*array_)[index_ + idx]; } Good point; removed! Also removing const T& operator*(). And adding "const" to functions that don't affect the iterator itself (but perhaps the data it points to). https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:184: ItType& operator=(void*) { On 2016/05/25 14:19:01, etiennep wrote: > The arguments should be std::nullptr_t instead. Done. https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:195: bool operator==(const void* p) const { On 2016/05/25 14:19:02, etiennep wrote: > std::nullptr_t again. There would be non need to check. Done. https://codereview.chromium.org/2008553007/diff/1/courgette/third_party/bsdif... courgette/third_party/bsdiff/paged_array.h:255: const T& operator[](size_t i) const { PagedArray is meant to solve memory fragmentation problem; otherwise we'll need 1 GB of consecutive memory address, and that's likely to be unavailable.
The CQ bit was checked by huangs@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2008553007/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/2008553007/20001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...)
The CQ bit was checked by huangs@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2008553007/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/2008553007/40001
Dry run: None
Updated. There were more details to be flush out, but should be good. PTAL. Thanks!
The CQ bit was checked by huangs@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2008553007/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/2008553007/60001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Dry run: None
Round two! I think conversions between different iterator types is handled in a clever way. https://codereview.chromium.org/2008553007/diff/60001/courgette/third_party/b... File courgette/third_party/bsdiff/paged_array.h (right): https://codereview.chromium.org/2008553007/diff/60001/courgette/third_party/b... courgette/third_party/bsdiff/paged_array.h:16: #include <cstddef> Redundant with stddef.h https://codereview.chromium.org/2008553007/diff/60001/courgette/third_party/b... courgette/third_party/bsdiff/paged_array.h:53: virtual ~PagedArray_iterator() = default; I don't think virtual is necessary anymore. https://codereview.chromium.org/2008553007/diff/60001/courgette/third_party/b... courgette/third_party/bsdiff/paged_array.h:117: difference_type operator-(const PagedArray_iterator<ContainerType2, T2>& it) const { Line too long? https://codereview.chromium.org/2008553007/diff/60001/courgette/third_party/b... courgette/third_party/bsdiff/paged_array.h:143: PagedArray() { } = default https://codereview.chromium.org/2008553007/diff/60001/courgette/third_party/b... courgette/third_party/bsdiff/paged_array.h:159: const T& operator[](size_t i) const { Optionally, implement this one using the other one: return const_cast<PagedArray&>(*this)[i]; https://codereview.chromium.org/2008553007/diff/60001/courgette/third_party/b... courgette/third_party/bsdiff/paged_array.h:205: size_t page_count_ = 0U; If we have size_, we don't really need page_count_, as it can be easily computed (and only used when calling clear)
Updated, PTAL. https://codereview.chromium.org/2008553007/diff/60001/courgette/third_party/b... File courgette/third_party/bsdiff/paged_array.h (right): https://codereview.chromium.org/2008553007/diff/60001/courgette/third_party/b... courgette/third_party/bsdiff/paged_array.h:16: #include <cstddef> On 2016/05/26 14:03:29, etiennep wrote: > Redundant with stddef.h Done. https://codereview.chromium.org/2008553007/diff/60001/courgette/third_party/b... courgette/third_party/bsdiff/paged_array.h:53: virtual ~PagedArray_iterator() = default; On 2016/05/26 14:03:30, etiennep wrote: > I don't think virtual is necessary anymore. Done. https://codereview.chromium.org/2008553007/diff/60001/courgette/third_party/b... courgette/third_party/bsdiff/paged_array.h:117: difference_type operator-(const PagedArray_iterator<ContainerType2, T2>& it) const { On 2016/05/26 14:03:29, etiennep wrote: > Line too long? Done. https://codereview.chromium.org/2008553007/diff/60001/courgette/third_party/b... courgette/third_party/bsdiff/paged_array.h:143: PagedArray() { } On 2016/05/26 14:03:29, etiennep wrote: > = default Done. https://codereview.chromium.org/2008553007/diff/60001/courgette/third_party/b... courgette/third_party/bsdiff/paged_array.h:159: const T& operator[](size_t i) const { I'd rather avoid unneeded use of const_cast<>. Adding a common inline helper routine that's used by both. https://codereview.chromium.org/2008553007/diff/60001/courgette/third_party/b... courgette/third_party/bsdiff/paged_array.h:205: size_t page_count_ = 0U; True. However, note that page_count_ is used in the Allocate()'s "ramp-up" loop, and on failure is used to "ramp-down". Changing how this works is slightly risky. On principle, the change should not be bundled into this CL.
lgtm
Ping chrisha@ for review, PTAL.
lgtm with nits https://codereview.chromium.org/2008553007/diff/100001/courgette/third_party/... File courgette/third_party/bsdiff/paged_array.h (right): https://codereview.chromium.org/2008553007/diff/100001/courgette/third_party/... courgette/third_party/bsdiff/paged_array.h:30: template <typename ContainerType, typename T> A class comment would be nice: ie, this provides a random access iterator with pointer-like semantics, for the PagedArray class. https://codereview.chromium.org/2008553007/diff/100001/courgette/third_party/... courgette/third_party/bsdiff/paged_array.h:104: // Not bothering to check array_, for performance. Could put such a check behind #ifdef DEBUG or #ifndef NDEBUG constants, so at least it would run in debug-build tests. https://codereview.chromium.org/2008553007/diff/100001/courgette/third_party/... courgette/third_party/bsdiff/paged_array.h:109: // Not bothering to check array_, for performance. Ditto https://codereview.chromium.org/2008553007/diff/100001/courgette/third_party/... courgette/third_party/bsdiff/paged_array.h:192: // bsdiff_create run 2x slower (even when compiled optimized.) Ditto
Thanks! Updated and committing soon. Local benchmarking shows that having operator[]() use common code via at() slows BSDiff phase by 5% in the Release build . Decided to be conservative, so reverted to having duplicated code for these. https://codereview.chromium.org/2008553007/diff/100001/courgette/third_party/... File courgette/third_party/bsdiff/paged_array.h (right): https://codereview.chromium.org/2008553007/diff/100001/courgette/third_party/... courgette/third_party/bsdiff/paged_array.h:30: template <typename ContainerType, typename T> On 2016/05/31 20:15:17, chrisha (slow) wrote: > A class comment would be nice: ie, this provides a random access iterator with > pointer-like semantics, for the PagedArray class. Done. https://codereview.chromium.org/2008553007/diff/100001/courgette/third_party/... courgette/third_party/bsdiff/paged_array.h:104: // Not bothering to check array_, for performance. On 2016/05/31 20:15:16, chrisha (slow) wrote: > Could put such a check behind #ifdef DEBUG or #ifndef NDEBUG constants, so at > least it would run in debug-build tests. Done. https://codereview.chromium.org/2008553007/diff/100001/courgette/third_party/... courgette/third_party/bsdiff/paged_array.h:109: // Not bothering to check array_, for performance. On 2016/05/31 20:15:16, chrisha (slow) wrote: > Ditto Done. https://codereview.chromium.org/2008553007/diff/100001/courgette/third_party/... courgette/third_party/bsdiff/paged_array.h:192: // bsdiff_create run 2x slower (even when compiled optimized.) Done. Remeasured impact of DCHECK on Release: About 1.4x slowdown for the bsdiff phase, so updated comment.
The CQ bit was checked by huangs@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from etiennep@google.com, chrisha@chromium.org Link to the patchset: https://codereview.chromium.org/2008553007/#ps140001 (title: "Reverting 2 versions of operator[]() to duplicate code.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2008553007/140001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/2008553007/140001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: android_arm64_dbg_recipe on tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_arm6...)
The CQ bit was checked by huangs@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from chrisha@chromium.org, etiennep@google.com Link to the patchset: https://codereview.chromium.org/2008553007/#ps160001 (title: "Add operators > and >= to make Clang std::sort() happy.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2008553007/160001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/2008553007/160001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by huangs@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2008553007/160001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/2008553007/160001
Message was sent while issue was closed.
Committed patchset #9 (id:160001)
Message was sent while issue was closed.
Description was changed from ========== [Courgette] PagedArray: Add Iterators and Parametrize Page Size as int Template. This is a refactoring CL to enable PagedArray usage by libdivsufsort. In addition to overloading operator[], for more general usage we need need pointer-like accessors to PagedArray. To this end we implement PagedArray_const_iterator and PagedArray_const_iterator, which merely wraps a PagedArray pointer along with an index. We also add various operators needed by libdivsufsort. For optimization, '<' and '<=' operators omits pointer checks. By default PagedArray page size is 2**18 elements (1 MiB for int32_t). To enable better testing, we made (log) page size a tepmlate parameter. BUG=608885 ========== to ========== [Courgette] PagedArray: Add Iterators and Parametrize Page Size as int Template. This is a refactoring CL to enable PagedArray usage by libdivsufsort. In addition to overloading operator[], for more general usage we need need pointer-like accessors to PagedArray. To this end we implement PagedArray_const_iterator and PagedArray_const_iterator, which merely wraps a PagedArray pointer along with an index. We also add various operators needed by libdivsufsort. For optimization, '<' and '<=' operators omits pointer checks. By default PagedArray page size is 2**18 elements (1 MiB for int32_t). To enable better testing, we made (log) page size a tepmlate parameter. BUG=608885 Committed: https://crrev.com/804ed8a1e0f236f22e19abef7013e9641a3ad49d Cr-Commit-Position: refs/heads/master@{#397311} ==========
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/804ed8a1e0f236f22e19abef7013e9641a3ad49d Cr-Commit-Position: refs/heads/master@{#397311}
Message was sent while issue was closed.
thakis@chromium.org changed reviewers: + thakis@chromium.org
Message was sent while issue was closed.
changes to third_party code should update some README.chromium to mention in what ways we're forking upstream code
Message was sent while issue was closed.
PagedArray is purely a Google addition. It's kinda weird that it lives in this directory. There will be more activities in the area though, specifically, I will: - Make licensing comments more uniform in \courgette\third_party\bsdiff, and remove ad hoc exceptions in tools/checklicenses/checklicenses.py - Move BSDiff-specific code from courgette namespace to bsdiff namespace - Replace QSufSort with divsufsort, which will require third_party code review. thakis@: Would be interested in being CC'ed on these issues? |