|
|
Created:
5 years, 8 months ago by please use gerrit instead Modified:
5 years, 6 months ago Reviewers:
groby-ooo-7-16 CC:
chromium-reviews, groby+spellwatch_chromium.org, rlp+watch_chromium.org, rouslan+spellwatch_chromium.org, Julius Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionHandle typographical apostrophe with spelling service client
Do not mark typographical apostrophe as misspelled, if the server wants
to replace it with a typewriter apostrophe.
TEST=SpellCheckTest.CreateTextCheckingResults
TEST=SpellCheckTest.IsValidContraction
TEST=SpellcheckWordIteratorTest.TypographicalApostropheIsPartOfWord
TEST=SpellingServiceClientTest.RequestTextCheck
BUG=165079
Committed: https://crrev.com/e12b92b09e5aa238758e3f2f0d7c195709fbf217
Cr-Commit-Position: refs/heads/master@{#335193}
Patch Set 1 : #Patch Set 2 : Slight refactor. #
Total comments: 3
Patch Set 3 : SpellcheckWordIteratorTest.TypographicalApostropheIsPartOfWord #Patch Set 4 : Preserve apostrophes. #Patch Set 5 : Ignore apostrophe-only suggestions from browser process. #Patch Set 6 : Fix ozone and mac #
Total comments: 16
Patch Set 7 : Address comments. Avoid server confusion with typographical apostrophes. #
Total comments: 7
Patch Set 8 : Address comments. #
Total comments: 4
Patch Set 9 : base::char16 #
Messages
Total messages: 27 (7 generated)
Patchset #1 (id:1) has been deleted
Patchset #1 (id:20001) has been deleted
Patchset #1 (id:40001) has been deleted
rouslan@chromium.org changed reviewers: + groby@chromium.org
Rachel, PTAL.
This does not account for the concatenation system for the native spellchecker. As the bug says, that'd require fixing SpellcheckWordIterator https://codereview.chromium.org/1092243002/diff/80001/chrome/renderer/spellch... File chrome/renderer/spellchecker/spellcheck.cc (right): https://codereview.chromium.org/1092243002/diff/80001/chrome/renderer/spellch... chrome/renderer/spellchecker/spellcheck.cc:398: // Ignore typographical apostrophe misspellings. Eeew. Are all languages handling typographic apostrophes the same?
On 2015/04/20 21:02:22, groby wrote: > This does not account for the concatenation system for the native spellchecker. > As the bug says, that'd require fixing SpellcheckWordIterator SpellcheckWordIterator appears to be working correctly in all English languages that we support (US, CA, GB, AU). I added SpellcheckWordIteratorTest.TypographicalApostropheIsPartOfWord to verify this in Patch Set 3. Do you know of any other languages that use apostrophes? The other two languages I'm familiar with (Russian and Chinese) do not use apostrophes, if I remember correctly.
Rachel, PTAL Patch Set 3. https://codereview.chromium.org/1092243002/diff/80001/chrome/renderer/spellch... File chrome/renderer/spellchecker/spellcheck.cc (right): https://codereview.chromium.org/1092243002/diff/80001/chrome/renderer/spellch... chrome/renderer/spellchecker/spellcheck.cc:398: // Ignore typographical apostrophe misspellings. On 2015/04/20 21:02:22, groby wrote: > Eeew. Are all languages handling typographic apostrophes the same? SpellcheckWordIterator says it's using ICU for word iteration, so that's out of our hands, as far as I understand.
https://codereview.chromium.org/1092243002/diff/80001/chrome/renderer/spellch... File chrome/renderer/spellchecker/spellcheck.cc (right): https://codereview.chromium.org/1092243002/diff/80001/chrome/renderer/spellch... chrome/renderer/spellchecker/spellcheck.cc:398: // Ignore typographical apostrophe misspellings. On 2015/04/20 21:02:22, groby wrote: > Eeew. By the way, we can remove this hack once the server treats typographical apostrophes correctly. I've filed a bug for that.
Another solution is to replace typographical apostrophes with typewriter apostrophes before sending text to the spelling server. Pro: change is concentrated in SpellingServiceClient. Con: additional O(n) pass through text sent to spelling server. WDYT?
What happens for words that change length? I.e. check Ik've been here Does that preserve the quotation mark? What about IsValidContraction? If this is for the server only, why not handle that in the server-specific code? As for performance, I don't care about the cost of apostrophe replacement - but it still doesn't handle the issue where it contextually replaces something. See Ik've above - RU calls it out as a mistake, but the correction is not typographically correct. This is, unfortunately, not quite as trivial as one would hope :)
Rachel, PTAL Patch Set 5, which preserves the original apostrophe types and adds a test for IsValidContraction. On 2015/04/21 21:42:44, groby wrote: > What happens for words that change length? I.e. check > Ik've been here > Does that preserve the quotation mark? Yes, SpellCheckTest.CreateTextCheckingResults checks for this. If the original word is "Ik’ve" (typographical), but browser process returns suggestion "I've" (typewriter), then the renderer will fix the apostrophe in the suggestion to match the original text. The suggestion becomes "I’ve" (typographical). > What about IsValidContraction? It works. I add SpellCheckTest.IsValidContraction to be sure. > If this is for the server only, why not handle that in the server-specific code? Server-specific code does not keep track of checked text. I think my current approach compartmentalizes the solution well. See PreserveOriginalApostropheTypes() function in spellcheck.cc. > As for performance, I don't care about the cost of apostrophe replacement - but > it still doesn't handle the issue where it contextually replaces something. See > Ik've above - RU calls it out as a mistake, but the correction is not > typographically correct. SpellCheckTest.CreateTextCheckingResults checks for this now. It should be working correctly.
Sorry, still being picky :) https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... File chrome/renderer/spellchecker/spellcheck.cc (right): https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:103: const base::string16 kApostrophes(base::WideToUTF16(L"\x2019'")); Please declare as constant https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:106: for (base::string16::iterator to_it = std::find_first_of( You'll need to handle the case of differing apostrophe counts somehow. (I.e. if a spelling correction inserts/removes an apostrophe) https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:108: from_it != from_end && to_it != to->end(); Also, this is the for-loop from hell. For readability purposes, I'd suggest something like for (base::char16& c : to_string) { if (isApostrophe(c)) { // Find the next one in from_string, replace } } (If you use find_if, you can reuse IsApostrophe :) https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:409: for (size_t i = 0; i < spellcheck_results.size(); ++i) { While you're here: for (const SpellCheckResult& spellcheck_result : spellcheck_results) { https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:410: if (custom_dictionary_.SpellCheckWord(line_text, Makes sense to move this up - thanks! (If this CL drags on, might want to do this separately) https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:418: line_text.begin() + spellcheck_results[i].location, Given that we refer to the substring several times, I'm wondering if we shouldn't simply copy it out into a separate string. https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... File chrome/renderer/spellchecker/spellcheck_unittest.cc (right): https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck_unittest.cc:19: #define TYPOGRAPHICAL_APOSTROPHE L"\x2019" Eeeeeeeek. I suppose we need the define, but... eeek. https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck_unittest.cc:1187: SpellCheckResult::SPELLING, 0, 5, base::WideToUTF16(L"I've"))); UTF8ToUTF16 is enough here. No need to go wide char. (Here and elsewhere)
Can we wrap this up? Can you give this to Julius, if you're swamped?
Patchset #7 (id:180001) has been deleted
Patchset #7 (id:200001) has been deleted
Rachel, PTAL Patch Set 7. https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... File chrome/renderer/spellchecker/spellcheck.cc (right): https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:103: const base::string16 kApostrophes(base::WideToUTF16(L"\x2019'")); On 2015/04/25 00:19:02, groby wrote: > Please declare as constant No longer applicable after using literals in IsApostrophe(). https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:106: for (base::string16::iterator to_it = std::find_first_of( On 2015/04/25 00:19:02, groby wrote: > You'll need to handle the case of differing apostrophe counts somehow. (I.e. if > a spelling correction inserts/removes an apostrophe) I think that's an edge case that requires some more thinking on how to properly handle it. As such, it's out of scope of this quick fix, but let's discuss it further in https://goo.gl/em8lwm. https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:108: from_it != from_end && to_it != to->end(); On 2015/04/25 00:19:02, groby wrote: > Also, this is the for-loop from hell. > > For readability purposes, I'd suggest something like > > for (base::char16& c : to_string) { > if (isApostrophe(c)) { > // Find the next one in from_string, replace > } > } > > (If you use find_if, you can reuse IsApostrophe :) Done. https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:409: for (size_t i = 0; i < spellcheck_results.size(); ++i) { On 2015/04/25 00:19:02, groby wrote: > While you're here: > for (const SpellCheckResult& spellcheck_result : spellcheck_results) { Done. https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:410: if (custom_dictionary_.SpellCheckWord(line_text, On 2015/04/25 00:19:02, groby wrote: > Makes sense to move this up - thanks! (If this CL drags on, might want to do > this separately) Ack https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:418: line_text.begin() + spellcheck_results[i].location, On 2015/04/25 00:19:02, groby wrote: > Given that we refer to the substring several times, I'm wondering if we > shouldn't simply copy it out into a separate string. Done. https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... File chrome/renderer/spellchecker/spellcheck_unittest.cc (right): https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck_unittest.cc:19: #define TYPOGRAPHICAL_APOSTROPHE L"\x2019" On 2015/04/25 00:19:02, groby wrote: > Eeeeeeeek. I suppose we need the define, but... eeek. Ack https://codereview.chromium.org/1092243002/diff/160001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck_unittest.cc:1187: SpellCheckResult::SPELLING, 0, 5, base::WideToUTF16(L"I've"))); On 2015/04/25 00:19:02, groby wrote: > UTF8ToUTF16 is enough here. No need to go wide char. (Here and elsewhere) Done. https://codereview.chromium.org/1092243002/diff/220001/chrome/browser/spellch... File chrome/browser/spellchecker/spelling_service_client_unittest.cc (left): https://codereview.chromium.org/1092243002/diff/220001/chrome/browser/spellch... chrome/browser/spellchecker/spelling_service_client_unittest.cc:155: EXPECT_EQ(text, request_text); Cannot compare sanitized request text with actual request text, because they are not the same anymore (due to the sanitation.)
Tiny nits, plus the question of char16 vs. wchar_t https://codereview.chromium.org/1092243002/diff/220001/chrome/browser/spellch... File chrome/browser/spellchecker/spelling_service_client.cc (right): https://codereview.chromium.org/1092243002/diff/220001/chrome/browser/spellch... chrome/browser/spellchecker/spelling_service_client.cc:72: for (base::char16& c : text_copy) { std::replace(text_copy.begin(), text_copy.end(), L'\x2019', '\'); // should work as well https://codereview.chromium.org/1092243002/diff/220001/chrome/browser/spellch... File chrome/browser/spellchecker/spelling_service_client_unittest.cc (right): https://codereview.chromium.org/1092243002/diff/220001/chrome/browser/spellch... chrome/browser/spellchecker/spelling_service_client_unittest.cc:218: const wchar_t* request_text; Sigh. I double-checked, utf literals are still not allowed, so I suppose this is it... https://codereview.chromium.org/1092243002/diff/220001/chrome/renderer/spellc... File chrome/renderer/spellchecker/spellcheck.cc (right): https://codereview.chromium.org/1092243002/diff/220001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:100: bool IsApostrophe(base::char16 c) { The literal below is wchar_t, not char16 And on some systems, wchar_t is utf32 instead: https://code.google.com/p/chromium/codesearch#chromium/src/build/build_config... I'm not sure if that _really_ affects us, or if the build_config is uber-careful. Feel free to investigate, or switch to wchar_t :)
Rachel, PTAL Patch Set 8. https://codereview.chromium.org/1092243002/diff/220001/chrome/browser/spellch... File chrome/browser/spellchecker/spelling_service_client.cc (right): https://codereview.chromium.org/1092243002/diff/220001/chrome/browser/spellch... chrome/browser/spellchecker/spelling_service_client.cc:72: for (base::char16& c : text_copy) { On 2015/06/17 19:12:02, groby wrote: > std::replace(text_copy.begin(), text_copy.end(), L'\x2019', '\'); // should > work as well Ooh, I like. https://codereview.chromium.org/1092243002/diff/220001/chrome/browser/spellch... File chrome/browser/spellchecker/spelling_service_client_unittest.cc (right): https://codereview.chromium.org/1092243002/diff/220001/chrome/browser/spellch... chrome/browser/spellchecker/spelling_service_client_unittest.cc:218: const wchar_t* request_text; On 2015/06/17 19:12:02, groby wrote: > Sigh. I double-checked, utf literals are still not allowed, so I suppose this is > it... > Acknowledged. This also what we're doing in https://code.google.com/p/chromium/codesearch#chromium/src/chrome/renderer/sp... https://codereview.chromium.org/1092243002/diff/220001/chrome/renderer/spellc... File chrome/renderer/spellchecker/spellcheck.cc (right): https://codereview.chromium.org/1092243002/diff/220001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:100: bool IsApostrophe(base::char16 c) { On 2015/06/17 19:12:03, groby wrote: > The literal below is wchar_t, not char16 > > And on some systems, wchar_t is utf32 instead: > https://code.google.com/p/chromium/codesearch#chromium/src/build/build_config... > > I'm not sure if that _really_ affects us, or if the build_config is > uber-careful. Feel free to investigate, or switch to wchar_t :) Switched to wchar_t.
One more thing... (I _really_ don't trust wchar_t) https://codereview.chromium.org/1092243002/diff/240001/chrome/renderer/spellc... File chrome/renderer/spellchecker/spellcheck.cc (left): https://codereview.chromium.org/1092243002/diff/240001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:388: // are probably contextually-misspelled words. Why are we removing this comment? It's making at least somewhat clear what we do here :) https://codereview.chromium.org/1092243002/diff/240001/chrome/renderer/spellc... File chrome/renderer/spellchecker/spellcheck.cc (right): https://codereview.chromium.org/1092243002/diff/240001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:100: bool IsApostrophe(wchar_t c) { This still doesn't work - the for-loop in Preserve... uses char16. wchar_t and char16 are not necessarily convertible. (Again, I don't know if that's my paranoia or not - I'd ask brettw around this) Personally, I'd take in a char16, and have two constants const base::char16 kRightSingleQuotationMark = 0x2019 const base::char16 kApostrophe = 0x27; and then compare against these, which avoids all wchar_t nonsense
Rachel, PTAL Patch Set 9. https://codereview.chromium.org/1092243002/diff/240001/chrome/renderer/spellc... File chrome/renderer/spellchecker/spellcheck.cc (left): https://codereview.chromium.org/1092243002/diff/240001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:388: // are probably contextually-misspelled words. On 2015/06/18 21:52:26, groby wrote: > Why are we removing this comment? It's making at least somewhat clear what we do > here :) There're three things happening in this function now, but this comment is talking about only one. I moved this comment down to line 426 and "trimmed" it a bit in the process. I can see how that can get confusing. I've copied the comment verbatim now. https://codereview.chromium.org/1092243002/diff/240001/chrome/renderer/spellc... File chrome/renderer/spellchecker/spellcheck.cc (right): https://codereview.chromium.org/1092243002/diff/240001/chrome/renderer/spellc... chrome/renderer/spellchecker/spellcheck.cc:100: bool IsApostrophe(wchar_t c) { On 2015/06/18 21:52:26, groby wrote: > This still doesn't work - the for-loop in Preserve... uses char16. wchar_t and > char16 are not necessarily convertible. (Again, I don't know if that's my > paranoia or not - I'd ask brettw around this) > > Personally, I'd take in a char16, and have two constants > > const base::char16 kRightSingleQuotationMark = 0x2019 > const base::char16 kApostrophe = 0x27; > > and then compare against these, which avoids all wchar_t nonsense > > > Done.
Gimme an L! Gimme a G!... OK, fine, I'll just spell it out directly: LGTM :)
The CQ bit was checked by rouslan@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1092243002/260001
Message was sent while issue was closed.
Committed patchset #9 (id:260001)
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/e12b92b09e5aa238758e3f2f0d7c195709fbf217 Cr-Commit-Position: refs/heads/master@{#335193}
Message was sent while issue was closed.
A revert of this CL (patchset #9 id:260001) has been created in https://codereview.chromium.org/1201563004/ by rouslan@chromium.org. The reason for reverting is: Suspected to be causing crashes in http://crbug.com/502786.. |