Chromium Code Reviews

Unified Diff: extensions/renderer/i18n_custom_bindings.cc

Issue 2780323002: Sets is_reliable for CLD3 if below a minimum byte threshold of 50 bytes. (Closed)
Patch Set: Removes comments specific to Translate Extension (created 3 years, 9 months ago)
Index: extensions/renderer/i18n_custom_bindings.cc
diff --git a/extensions/renderer/i18n_custom_bindings.cc b/extensions/renderer/i18n_custom_bindings.cc
index 34dd6c89c0f9be01558bf8098b37fab95dc18ce9..57e0e96e2e8bfb4f1037fc3a2523056540731e56 100644
--- a/extensions/renderer/i18n_custom_bindings.cc
+++ b/extensions/renderer/i18n_custom_bindings.cc
@@ -41,6 +41,10 @@ namespace {
 // Max number of languages to detect.
 const int kCldNumLangs = 3;
 
+// CLD3 minimum reliable byte threshold. Predictions for inputs below this size
+// in bytes will be considered unreliable.
+const int kCld3MinimumByteThreshold = 50;
+
 struct DetectedLanguage {
   DetectedLanguage(const std::string& language, int percentage)
       : language(language), percentage(percentage) {}
@@ -310,8 +314,17 @@ void I18NCustomBindings::DetectTextLanguage(
 #elif BUILDFLAG(CLD_VERSION) == 3
   chrome_lang_id::NNetLanguageIdentifier nnet_lang_id(/*min_num_bytes=*/0,
                                                       /*max_num_bytes=*/512);
-  const std::vector<chrome_lang_id::NNetLanguageIdentifier::Result>
-      lang_results = nnet_lang_id.FindTopNMostFreqLangs(text, kCldNumLangs);
+  std::vector<chrome_lang_id::NNetLanguageIdentifier::Result> lang_results =
+      nnet_lang_id.FindTopNMostFreqLangs(text, kCldNumLangs);
+
+  // is_reliable is set to false if we believe the input is too short to be
+  // accurately identified by the current model.
+  if (text.size() < kCld3MinimumByteThreshold) {
+    for (auto& result : lang_results) {
+      result.is_reliable = false;
+    }
+  }
+
   LanguageDetectionResult result;
   // Populate LanguageDetectionResult with prediction reliability, languages,
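
The thresholding step added by this patch can be tried in isolation with a minimal sketch like the one below. It is not the Chromium code: `Result` is a hypothetical stand-in for `chrome_lang_id::NNetLanguageIdentifier::Result`, and `MarkShortInputUnreliable` and `kMinimumByteThreshold` are names invented for illustration. The sketch types the threshold as `std::size_t` (an assumption, the patch uses `int`) so that the comparison with `text.size()` stays unsigned.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for chrome_lang_id::NNetLanguageIdentifier::Result;
// only the fields needed for this illustration are modeled.
struct Result {
  std::string language;
  bool is_reliable;
};

// Same value as kCld3MinimumByteThreshold in the patch. Typed as size_t here
// (an assumption of this sketch) to avoid a signed/unsigned comparison.
constexpr std::size_t kMinimumByteThreshold = 50;

// Marks every result as unreliable when the input is shorter than the
// threshold. std::string::size() counts bytes, not characters, so multi-byte
// UTF-8 text reaches the threshold with fewer visible characters.
void MarkShortInputUnreliable(const std::string& text,
                              std::vector<Result>& results) {
  if (text.size() < kMinimumByteThreshold) {
    for (auto& result : results) {
      result.is_reliable = false;
    }
  }
}
```

With this sketch, a 49-byte input clears `is_reliable` on every result, while a 50-byte input leaves the flags exactly as the identifier set them.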