|
|
DescriptionModify DocumentMarkerController to support overlapping and nested DocumentMarkers
I have a CL in progress right now that adds support for Android SuggestionSpans
(underlined ranges of text marked by the IME that a user can tap on to get a
list of suggested replacements):
https://codereview.chromium.org/2650113004
I was refactoring it and noticed DocumentMarkerController::removeMarkers()
doesn't work properly with nested markers. There's a comment above addMarker()
noting that markers of the same type should not overlap, but then in the very
same method, mergeOverlapping() is not called for TextMatch or Composition
markers, which suggests that those marker types *are* allowed to overlap. So I
think this comment is out-of-date, and someone added support for overlapping
markers without fixing removeMarkers(), which is unfortunate.
Basically the issue is that markers are stored ordered by start position, and if
they're allowed to overlap, we can't assume that ordering them by end position
results in the same order, so I've just removed any code from this file that
tries to make that faulty assumption (so I also updated setMarkersActive()).
BUG=686850
Patch Set 1 #
Total comments: 1
Patch Set 2 : Fix DocumentMarkerController::setMarkersActive() #Patch Set 3 : Update commit message #Patch Set 4 : Make sure DocumentMarker lists stay sorted after removing a range that overlaps multiple markers #Patch Set 5 : Remove comment that accidentally got left in #Patch Set 6 : Remove debug code accidentally left in test #
Total comments: 2
Patch Set 7 : Clean up according to yosin's comments #Patch Set 8 : Fix bug (caught by layout test) #Patch Set 9 : Fix bug resulting from potential iterator invalidation (one of the trybots seems to have run into t… #
Total comments: 4
Patch Set 10 : Use range-for statement in setMarkersActive() #
Messages
Total messages: 73 (43 generated)
rlanday@chromium.org changed reviewers: + yosin@chromium.org
rlanday@chromium.org changed required reviewers: + yosin@chromium.org
https://codereview.chromium.org/2665923002/diff/1/third_party/WebKit/Source/c... File third_party/WebKit/Source/core/editing/markers/DocumentMarkerControllerTest.cpp (right): https://codereview.chromium.org/2665923002/diff/1/third_party/WebKit/Source/c... third_party/WebKit/Source/core/editing/markers/DocumentMarkerControllerTest.cpp:285: markerController().removeMarkers( This actually crashes DocumentMarkerController without my patch!
The CQ bit was checked by rlanday@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
The CQ bit was checked by rlanday@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
xiaochengh@chromium.org changed reviewers: + xiaochengh@chromium.org
Could you give an example of why overlapping markers are needed? The no-merging behavior for Composition markers is just for adjacent markers, not overlapping ones: https://codereview.chromium.org/1347313002 The behavior for TextMatch seems introduced a long time ago, though, and I'm not sure why it is there.
On 2017/02/01 at 08:27:57, xiaochengh wrote: > Could you give an example of why overlapping markers are needed? > > The no-merging behavior for Composition markers is just for adjacent markers, not overlapping ones: https://codereview.chromium.org/1347313002 > > The behavior for TextMatch seems introduced a long time ago, though, and I'm not sure why it is there. Sure. I'm adding a marker type to support Android SuggestionSpans, which are essentially ranges of text the IME can mark as having a list of suggested replacements associated with them. In particular, these are used by the Voice IME. For example, I dictated some gibberish into my phone and got the result "Spell seems been posting". The Voice IME has added SuggestionSpans for the ranges "Spell seems", "seems", "seems been", and "been posting" (and probably a few more as well). When you tap on a word, we look at all the spans that contain where you tapped and get a list of suggestions to display to the user, and highlight the union of those spans.
The example is reasonable. I'm OK with the code change. Please also fix the patch description because: - This patch also modifies setMarkersActive - This patch adds support for overlapping markers, not just nesting
Description was changed from ========== Modify DocumentMarkerController::removeMarkers() to support nested DocumentMarkers I have a CL in progress right now that adds support for Android SuggestionSpans (underlined ranges of text marked by the IME that a user can tap on to get a list of suggested replacements): https://codereview.chromium.org/2650113004 I was refactoring it and noticed DocumentMarkerController::removeMarkers() doesn't work properly with nested markers. There's a comment above addMarker() noting that markers of the same type should not overlap, but then in the very same method, mergeOverlapping() is not called for TextMatch or Composition markers, which suggests that those marker types *are* allowed to overlap. So I think this comment is out-of-date, and someone added support for overlapping markers without fixing removeMarkers(), which is unfortunate. Basically the issue is that markers are stored ordered by start position, and if they're allowed to overlap, we can't assume that ordering them by end position results in the same order, so I've just removed any code from this file that tries to make that faulty assumption. BUG=686850 ========== to ========== Modify DocumentMarkerController to support overlapping and nested DocumentMarkers I have a CL in progress right now that adds support for Android SuggestionSpans (underlined ranges of text marked by the IME that a user can tap on to get a list of suggested replacements): https://codereview.chromium.org/2650113004 I was refactoring it and noticed DocumentMarkerController::removeMarkers() doesn't work properly with nested markers. There's a comment above addMarker() noting that markers of the same type should not overlap, but then in the very same method, mergeOverlapping() is not called for TextMatch or Composition markers, which suggests that those marker types *are* allowed to overlap. So I think this comment is out-of-date, and someone added support for overlapping markers without fixing removeMarkers(), which is unfortunate. Basically the issue is that markers are stored ordered by start position, and if they're allowed to overlap, we can't assume that ordering them by end position results in the same order, so I've just removed any code from this file that tries to make that faulty assumption (so I also updated setMarkersActive()). BUG=686850 ==========
Ok, I've updated the description (and updated the commit message, if that's necessary)
xiaochengh@chromium.org changed reviewers: + tkent@chromium.org
lgtm +tkent as owner
lgtm
The CQ bit was checked by rlanday@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
All required reviewers (with asterisk prefixes) have not yet approved this CL. No L-G-T-M from a valid reviewer yet. CQ run can only be started by full committers or once the patch has received an L-G-T-M from a full committer. Even if an L-G-T-M may have been provided, it was from a non-committer, _not_ a full super star committer. Committers are members of the group "project-chromium-committers". Note that this has nothing to do with OWNERS files.
rlanday@chromium.org changed required reviewers: - yosin@chromium.org
I just noticed there's actually another thing I think I should fix in this CL: If you call removeMarkers() with the DoNotRemovePartiallyOverlappingMarker flag, it will sort of slice up a marker that contains the entire range you're removing, but isn't exactly equal to the range you're removing (i.e. it contains extra text). As an example, let's say you have the following markers: 32:[0:14](0) 32:[0:24](0) 32:[0:8](0) We call removeMarkers with an offset of 13 and a count 5 to prepare for deleting that range of text: 32:[0:14](0) => 32:[0:13](0) 32:[0:24](0) => 32:[0:13](0), 32:[18:24](0) 32:[0:8](0) => 32:[0:8](0) Now we have, in this order: 32:[0:13](0) 32:[0:13](0) 32:[18:24](0) 32:[0:8](0) Problematically, the markers are no longer sorted by start offset. I think the most straightforward way to fix this is to sort the list immediately after we're done with all the inserts. I will add a test case for this and update the CL.
The CQ bit was checked by rlanday@chromium.org to run a CQ dry run
The CQ bit was checked by rlanday@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2017/02/03 at 19:29:17, rlanday wrote: > I just noticed there's actually another thing I think I should fix in this CL: > > If you call removeMarkers() with the DoNotRemovePartiallyOverlappingMarker flag, it will sort of slice up a marker that contains the entire range you're removing, but isn't exactly equal to the range you're removing (i.e. it contains extra text). > > As an example, let's say you have the following markers: > 32:[0:14](0) 32:[0:24](0) 32:[0:8](0) > > We call removeMarkers with an offset of 13 and a count 5 to prepare for deleting that range of text: > > 32:[0:14](0) => 32:[0:13](0) > 32:[0:24](0) => 32:[0:13](0), 32:[18:24](0) > 32:[0:8](0) => 32:[0:8](0) > > Now we have, in this order: > 32:[0:13](0) 32:[0:13](0) 32:[18:24](0) 32:[0:8](0) > > Problematically, the markers are no longer sorted by start offset. I think the most straightforward way to fix this is to sort the list immediately after we're done with all the inserts. I will add a test case for this and update the CL. Ok, I've fixed this bug (by sorting the list after inserting a marker slice), and also added some more test cases (DeleteSplittingNestedMarkers is the only one that currently fails on master), does this still look good?
Dry run: Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_asan_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by rlanday@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2665923002/diff/100001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/editing/markers/DocumentMarkerController.cpp (right): https://codereview.chromium.org/2665923002/diff/100001/third_party/WebKit/Sou... third_party/WebKit/Source/core/editing/markers/DocumentMarkerController.cpp:418: std::sort(list->begin(), list->end(), compareByStart); Can we do smarter way rather than sorting entire list? Also, using flag |insertSlice| looks not sophisticated.
https://codereview.chromium.org/2665923002/diff/100001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/editing/markers/DocumentMarkerController.cpp (right): https://codereview.chromium.org/2665923002/diff/100001/third_party/WebKit/Sou... third_party/WebKit/Source/core/editing/markers/DocumentMarkerController.cpp:418: std::sort(list->begin(), list->end(), compareByStart); On 2017/02/06 at 04:54:58, yosin_UTC9 wrote: > Can we do smarter way rather than sorting entire list? > Also, using flag |insertSlice| looks not sophisticated. I thought about trying to directly insert the slices into the correct locations, but was thinking that would be weird/not work because we're doing this inside a loop over all the markers, so if we're inserting markers further on in the list, the loop might do something it shouldn't with the new markers. But I guess since the new markers shouldn't have any overlap with the range we're removing, that should probably work?
@yosin, I think I've addressed your comments
The CQ bit was checked by rlanday@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
The CQ bit was checked by rlanday@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_asan_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by rlanday@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2665923002/diff/160001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/editing/markers/DocumentMarkerController.cpp (right): https://codereview.chromium.org/2665923002/diff/160001/third_party/WebKit/Sou... third_party/WebKit/Source/core/editing/markers/DocumentMarkerController.cpp:373: for (size_t markerIndex = 0; markerIndex < list->size();) { Can we use std::remove_if() for improving code health? I feel fair to see for-statement without step part. https://codereview.chromium.org/2665923002/diff/160001/third_party/WebKit/Sou... third_party/WebKit/Source/core/editing/markers/DocumentMarkerController.cpp:805: for (MarkerList::iterator marker = list->begin(); marker != list->end(); Can we use range-for statement?
>There's a comment above addMarker() >noting that markers of the same type should not overlap, but then in the very >same method, mergeOverlapping() is not called for TextMatch or Composition >markers, which suggests that those marker types *are* allowed to overlap. So I >think this comment is out-of-date, and someone added support for overlapping >markers without fixing removeMarkers(), which is unfortunate. I think the comment is correct. We should not have overlapping markers. Composition and TextMatch markers aren't overlapped as of these nature. Not to call mergeOverlapping() for Composition and TextMatch is a kind of optimization. We should have DCHECK() to make sure inserting marker of Composition and TextMatch not overlapped.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2017/02/10 at 07:30:42, yosin wrote: > >There's a comment above addMarker() > >noting that markers of the same type should not overlap, but then in the very > >same method, mergeOverlapping() is not called for TextMatch or Composition > >markers, which suggests that those marker types *are* allowed to overlap. So I > >think this comment is out-of-date, and someone added support for overlapping > >markers without fixing removeMarkers(), which is unfortunate. > > I think the comment is correct. We should not have overlapping markers. > Composition and TextMatch markers aren't overlapped as of these nature. > Not to call mergeOverlapping() for Composition and TextMatch is a kind of > optimization. We should have DCHECK() to make sure inserting marker of > Composition and TextMatch not overlapped. Is there any way to check that efficiently?
https://codereview.chromium.org/2665923002/diff/160001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/editing/markers/DocumentMarkerController.cpp (right): https://codereview.chromium.org/2665923002/diff/160001/third_party/WebKit/Sou... third_party/WebKit/Source/core/editing/markers/DocumentMarkerController.cpp:373: for (size_t markerIndex = 0; markerIndex < list->size();) { On 2017/02/10 at 07:06:27, yosin_UTC9 wrote: > Can we use std::remove_if() for improving code health? > I feel fair to see for-statement without step part. I don't know of a good way we can use remove_if() and also handle re-inserting the slices. I think we would do remove_if(), then all the markers we're removing would be at the end of the vector, and we'd have to check those to see if we have to insert slices for them, then insert slices in the correct positions all over the vector, then finally erase the elements that remove_if() put at the end, which seems really messy. https://codereview.chromium.org/2665923002/diff/160001/third_party/WebKit/Sou... third_party/WebKit/Source/core/editing/markers/DocumentMarkerController.cpp:805: for (MarkerList::iterator marker = list->begin(); marker != list->end(); On 2017/02/10 at 07:06:27, yosin_UTC9 wrote: > Can we use range-for statement? Probably, I'll switch it
The CQ bit was checked by rlanday@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2017/02/10 at 18:59:35, rlanday wrote: > https://codereview.chromium.org/2665923002/diff/160001/third_party/WebKit/Sou... > File third_party/WebKit/Source/core/editing/markers/DocumentMarkerController.cpp (right): > > https://codereview.chromium.org/2665923002/diff/160001/third_party/WebKit/Sou... > third_party/WebKit/Source/core/editing/markers/DocumentMarkerController.cpp:373: for (size_t markerIndex = 0; markerIndex < list->size();) { > On 2017/02/10 at 07:06:27, yosin_UTC9 wrote: > > Can we use std::remove_if() for improving code health? > > I feel fair to see for-statement without step part. > > I don't know of a good way we can use remove_if() and also handle re-inserting the slices. I think we would do remove_if(), then all the markers we're removing would be at the end of the vector, and we'd have to check those to see if we have to insert slices for them, then insert slices in the correct positions all over the vector, then finally erase the elements that remove_if() put at the end, which seems really messy. > How about using wtf/LinkedHashSet.h? To move elements after vector takes time time. So, tree is faster than vector in our use case. It is trade off between memory and speed. > https://codereview.chromium.org/2665923002/diff/160001/third_party/WebKit/Sou... > third_party/WebKit/Source/core/editing/markers/DocumentMarkerController.cpp:805: for (MarkerList::iterator marker = list->begin(); marker != list->end(); > On 2017/02/10 at 07:06:27, yosin_UTC9 wrote: > > Can we use range-for statement? > > Probably, I'll switch it
> > I don't know of a good way we can use remove_if() and also handle re-inserting the slices. I think we would do remove_if(), then all the markers we're removing would be at the end of the vector, and we'd have to check those to see if we have to insert slices for them, then insert slices in the correct positions all over the vector, then finally erase the elements that remove_if() put at the end, which seems really messy. > > > How about using wtf/LinkedHashSet.h? To move elements after vector takes time time. So, tree is faster than vector in our use case. It is trade off between memory and speed. To clarify, you're saying we should use a LinkedHashSet instead of storing the markers sorted in a vector by start offset? Or are you just describing some way of using remove_if()?
On 2017/02/13 at 21:53:28, rlanday wrote: > > > I don't know of a good way we can use remove_if() and also handle re-inserting the slices. I think we would do remove_if(), then all the markers we're removing would be at the end of the vector, and we'd have to check those to see if we have to insert slices for them, then insert slices in the correct positions all over the vector, then finally erase the elements that remove_if() put at the end, which seems really messy. > > > > > How about using wtf/LinkedHashSet.h? To move elements after vector takes time time. So, tree is faster than vector in our use case. It is trade off between memory and speed. > > To clarify, you're saying we should use a LinkedHashSet instead of storing the markers sorted in a vector by start offset? Or are you just describing some way of using remove_if()? Sorry for confusion. I say we would like to use |LinkedHashSet|, which is ordered set, to represents list of markers. Using |LinkedHashSet| might resolve slowness issue for big text editing with lots of spelling markers. DocumentMarker and DocumentMarkerController is very old code, 10+ years old, and it is technical debt and it is pain to work with. It is better to clean up at this time rather after making code more complex.
On 2017/02/14 at 07:49:03, yosin wrote: > On 2017/02/13 at 21:53:28, rlanday wrote: > > > > I don't know of a good way we can use remove_if() and also handle re-inserting the slices. I think we would do remove_if(), then all the markers we're removing would be at the end of the vector, and we'd have to check those to see if we have to insert slices for them, then insert slices in the correct positions all over the vector, then finally erase the elements that remove_if() put at the end, which seems really messy. > > > > > > > How about using wtf/LinkedHashSet.h? To move elements after vector takes time time. So, tree is faster than vector in our use case. It is trade off between memory and speed. > > > > To clarify, you're saying we should use a LinkedHashSet instead of storing the markers sorted in a vector by start offset? Or are you just describing some way of using remove_if()? > > Sorry for confusion. I say we would like to use |LinkedHashSet|, which is ordered set, to represents list of markers. Using |LinkedHashSet| might resolve slowness issue for big text editing with lots of spelling markers. > > DocumentMarker and DocumentMarkerController is very old code, 10+ years old, and it is technical debt and it is pain to work with. > It is better to clean up at this time rather after making code more complex. So then we're no longer going to keep the list of markers sorted (since there's no advantage to keeping a linked list in sorted order since we have to traverse it linearly anyway)? How do we know this is actually a performance advantage?
On 2017/02/14 at 21:01:44, rlanday wrote: > On 2017/02/14 at 07:49:03, yosin wrote: > > On 2017/02/13 at 21:53:28, rlanday wrote: > > > > > I don't know of a good way we can use remove_if() and also handle re-inserting the slices. I think we would do remove_if(), then all the markers we're removing would be at the end of the vector, and we'd have to check those to see if we have to insert slices for them, then insert slices in the correct positions all over the vector, then finally erase the elements that remove_if() put at the end, which seems really messy. > > > > > > > > > How about using wtf/LinkedHashSet.h? To move elements after vector takes time time. So, tree is faster than vector in our use case. It is trade off between memory and speed. > > > > > > To clarify, you're saying we should use a LinkedHashSet instead of storing the markers sorted in a vector by start offset? Or are you just describing some way of using remove_if()? > > > > Sorry for confusion. I say we would like to use |LinkedHashSet|, which is ordered set, to represents list of markers. Using |LinkedHashSet| might resolve slowness issue for big text editing with lots of spelling markers. > > > > DocumentMarker and DocumentMarkerController is very old code, 10+ years old, and it is technical debt and it is pain to work with. > > It is better to clean up at this time rather after making code more complex. > > So then we're no longer going to keep the list of markers sorted (since there's no advantage to keeping a linked list in sorted order since we have to traverse it linearly anyway)? How do we know this is actually a performance advantage? Sorry for confusion, LinkedHashSet and ListHashSet are *not* replacement of std::set<T>.
> > So then we're no longer going to keep the list of markers sorted (since there's no advantage to keeping a linked list in sorted order since we have to traverse it linearly anyway)? How do we know this is actually a performance advantage? > > Sorry for confusion, LinkedHashSet and ListHashSet are *not* replacement of std::set<T>. They're linked lists that allow removal of elements in O(1) time, right? The issue I see is that presumably we store DocumentMarkers in a vector ordered by start offset because there are certain operations that the sorting makes faster. E.g. when we add a marker and we check if there are any overlapping markers, we can do this in O(log n) time since the list is sorted. But if we use a linked list, regardless of how we keep it sorted, all the operations that can currently be done in O(log n) time will require O(n) time because we'll have to traverse the list linearly. So if we switch to using LinkedHashSet/ListHashSet, we'll just be making a different set of performance tradeoffs, and I don't think we have actual data with which to make that decision, do we? Another option would be to keep using a vector but not worry about keeping it sorted, that may or may not be faster than using a linked list depending on how many elements we typically have, right?
On 2017/02/15 at 02:26:56, rlanday wrote: > > > So then we're no longer going to keep the list of markers sorted (since there's no advantage to keeping a linked list in sorted order since we have to traverse it linearly anyway)? How do we know this is actually a performance advantage? > > > > Sorry for confusion, LinkedHashSet and ListHashSet are *not* replacement of std::set<T>. > > They're linked lists that allow removal of elements in O(1) time, right? > > The issue I see is that presumably we store DocumentMarkers in a vector ordered by start offset because there are certain operations that the sorting makes faster. E.g. when we add a marker and we check if there are any overlapping markers, we can do this in O(log n) time since the list is sorted. > > But if we use a linked list, regardless of how we keep it sorted, all the operations that can currently be done in O(log n) time will require O(n) time because we'll have to traverse the list linearly. So if we switch to using LinkedHashSet/ListHashSet, we'll just be making a different set of performance tradeoffs, and I don't think we have actual data with which to make that decision, do we? > > Another option would be to keep using a vector but not worry about keeping it sorted, that may or may not be faster than using a linked list depending on how many elements we typically have, right? I got same suggestion from Architecture team. They don't want introduce WTF::OrderedSet<T>, WTF version std::set<T>, due by memory locality. This is kind of transaction log. We record "add" and "remove" operations into log vector then we process log to get sorted marker list. In this way, complex operation is encapsulated into log processing function, say DMC::processLog(). processLog() is invoked by: - AddMarker: when log is bigger than threshold - RemoveMarker: process log before RemoveOperation == process Add operation, then remove markers in range. - Before Paint and other: They should call DMC::sortMarkers() before calling DMC::markerFor(Node) Markers are: 1. IME marker: only one for each Text node, no overlap 2. Find highlights: Many, no overlap. may be added monotonically increased start offset. Remove all rather than individual one. There is only one focus highlight. Need to manage tick marks. 3. Spelling/Grammar: Many. Adding markers at random position, but ordered in paragraph. Remove markers for paragraph/range. 4. Suggestion: not many, overlap. I'll compile this idea into [1]. I think it is better to have different implementation for each marker type. And dispatched in DMC, e.g. DMC::AddMarker(MarkerType markerType, const EphomeralRange& range, Details details) { MarkerListImpl* listImpl = MarkerListImplFor(markerType); listImpl->addMarker(range, details); } WDYT? [1] https://goo.gl/VVJlDz
On 2017/02/15 at 04:02:53, yosin_UTC9 wrote: > On 2017/02/15 at 02:26:56, rlanday wrote: > > > > So then we're no longer going to keep the list of markers sorted (since there's no advantage to keeping a linked list in sorted order since we have to traverse it linearly anyway)? How do we know this is actually a performance advantage? > > > > > > Sorry for confusion, LinkedHashSet and ListHashSet are *not* replacement of std::set<T>. > > > > They're linked lists that allow removal of elements in O(1) time, right? > > > > The issue I see is that presumably we store DocumentMarkers in a vector ordered by start offset because there are certain operations that the sorting makes faster. E.g. when we add a marker and we check if there are any overlapping markers, we can do this in O(log n) time since the list is sorted. > > > > But if we use a linked list, regardless of how we keep it sorted, all the operations that can currently be done in O(log n) time will require O(n) time because we'll have to traverse the list linearly. So if we switch to using LinkedHashSet/ListHashSet, we'll just be making a different set of performance tradeoffs, and I don't think we have actual data with which to make that decision, do we? > > > > Another option would be to keep using a vector but not worry about keeping it sorted, that may or may not be faster than using a linked list depending on how many elements we typically have, right? > I got same suggestion from Architecture team. They don't want introduce WTF::OrderedSet<T>, WTF version std::set<T>, due by memory locality. > This is kind of transaction log. We record "add" and "remove" operations into log vector then we process log to get sorted marker list. > In this way, complex operation is encapsulated into log processing function, say DMC::processLog(). > > processLog() is invoked by: > - AddMarker: when log is bigger than threshold > - RemoveMarker: process log before RemoveOperation == process Add operation, then remove markers in range. > - Before Paint and other: They should call DMC::sortMarkers() before calling DMC::markerFor(Node) > > Markers are: > 1. IME marker: only one for each Text node, no overlap > 2. Find highlights: Many, no overlap. may be added monotonically increased start offset. Remove all rather than individual one. There is only one focus highlight. Need to manage tick marks. > 3. Spelling/Grammar: Many. Adding markers at random position, but ordered in paragraph. Remove markers for paragraph/range. > 4. Suggestion: not many, overlap. > > I'll compile this idea into [1]. > > I think it is better to have different implementation for each marker type. And dispatched in DMC, e.g. > > > DMC::AddMarker(MarkerType markerType, const EphomeralRange& range, Details details) { > MarkerListImpl* listImpl = MarkerListImplFor(markerType); > listImpl->addMarker(range, details); > } > > WDYT? > > [1] https://goo.gl/VVJlDz Another motivation of having different list implementation is not using RenderedDocumentMarker except for TextMatch.
> > I think it is better to have different implementation for each marker type. And dispatched in DMC, e.g. > > > > > > DMC::AddMarker(MarkerType markerType, const EphomeralRange& range, Details details) { > > MarkerListImpl* listImpl = MarkerListImplFor(markerType); > > listImpl->addMarker(range, details); > > } > > > > WDYT? > > > > [1] https://goo.gl/VVJlDz > > Another motivation of having different list implementation is not using RenderedDocumentMarker except for TextMatch. What's the rationale for having different implementations for each marker type? It seems like we're talking about introducing a lot of complexity to solve something I'm not sure is really a problem (performance).
On 2017/02/15 at 04:53:18, rlanday wrote: > > > I think it is better to have different implementation for each marker type. And dispatched in DMC, e.g. > > > > > > > > > DMC::AddMarker(MarkerType markerType, const EphomeralRange& range, Details details) { > > > MarkerListImpl* listImpl = MarkerListImplFor(markerType); > > > listImpl->addMarker(range, details); > > > } > > > > > > WDYT? > > > > > > [1] https://goo.gl/VVJlDz > > > > Another motivation of having different list implementation is not using RenderedDocumentMarker except for TextMatch. > > What's the rationale for having different implementations for each marker type? It seems like we're talking about introducing a lot of complexity to solve something I'm not sure is really a problem (performance). It should be implementation simpler, reduce chance of regression, and some optimization. Since requirement of marker list is different: Spell/Grammer: We need to have fast insert/remove/query with many markers IME: Single marker Find: Adding monotonically increased marker, remove all once with many markers, need to support tick marks. Suggestion: Few markers overlap In this way, introducing new marker type doesn't make regression for other marker types.
aelias@chromium.org changed reviewers: + aelias@chromium.org
OK, that makes sense. We wrote a specific class hierarchy proposal for DocumentMarker and DocumentMarkerList at the bottom of rlanday@'s doc https://docs.google.com/document/d/1IQQTZS3T0QdP8eb9tOea2HNhbjOJxmRDiWJAHMzdm... . yosin@, please let us know if that looks OK, and rlanday@ will start implementing the refactoring. |