Chromium Code Reviews
Description
Switch language detection to use CLD2's DetectLanguageCheckUTF8 method.
Issue 444258 is a long-standing bug describing a crash in
CLD2::QuadHashV3Lookup4. The hypothesis that invalid UTF-8 is reaching the
CLD2 code layer has been confirmed in that bug, so we must use the "safe"
(CheckUTF8) variant of language detection instead.
The existing code uses DetectLanguageSummary, but CLD2 has no "safe" variant
of this method. However, the existing code ignores the extra data returned by
DetectLanguageSummary (i.e., multiple language guesses with accompanying
probabilities), so there's no reason to stick with it. Switching to the
simpler DetectLanguageCheckUTF8 should produce the same results, protect
against invalid UTF-8 input, and have comparable performance for Chromium's
use cases.
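
To illustrate what "invalid UTF-8 reaching the detector" means, here is a minimal, self-contained sketch of a valid-prefix check. It is not the code in this CL and not CLD2's implementation: ValidUtf8PrefixLength is a hypothetical helper, written under the assumption that a caller wants to hand the detector only the longest well-formed UTF-8 prefix of its input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of CLD2 or Chromium): returns the length in
// bytes of the longest prefix of |text| that is well-formed UTF-8.
std::size_t ValidUtf8PrefixLength(const std::string& text) {
  const std::size_t n = text.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char lead = static_cast<unsigned char>(text[i]);
    std::size_t len = 0;      // total bytes in this encoded character
    unsigned int cp = 0;      // decoded code point
    unsigned int min_cp = 0;  // smallest code point allowed for this length
    if (lead < 0x80) {
      len = 1; cp = lead; min_cp = 0;
    } else if ((lead & 0xE0) == 0xC0) {
      len = 2; cp = lead & 0x1F; min_cp = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3; cp = lead & 0x0F; min_cp = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4; cp = lead & 0x07; min_cp = 0x10000;
    } else {
      break;  // stray continuation byte or invalid lead byte
    }
    if (i + len > n) break;  // sequence truncated at end of input

    bool ok = true;
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char c = static_cast<unsigned char>(text[i + k]);
      if ((c & 0xC0) != 0x80) { ok = false; break; }  // not a continuation
      cp = (cp << 6) | (c & 0x3F);
    }
    if (!ok) break;

    // Reject overlong encodings, surrogates, and out-of-range code points.
    if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) break;

    i += len;
  }
  return i;
}
```

A caller could compare the returned length with text.size() to decide whether the input is safe to pass on as-is, or truncate to the valid prefix with text.substr(0, ValidUtf8PrefixLength(text)) before retrying detection. Whether the retry in this CL works that way is an assumption here, not something the description states.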
BUG=444258
Committed: https://crrev.com/90bb2b934366c595c1c979e0b2363f0a822e1b92
Cr-Commit-Position: refs/heads/master@{#328947}
Patch Set 1
Total comments: 7
Patch Set 2 : droger@ comments
Patch Set 3 : Remove DCHECK and reintroduce retry behavior
Patch Set 4 : Fix compile error under CLD1 for Android
Patch Set 5 : git cl format
Messages
Total messages: 17 (3 generated)