Chromium Code Reviews

Side by Side Diff: components/translate/core/language_detection/chinese_script_classifier.h

Issue 2732023003: Adds ChineseScriptClassifier to predict zh-Hant or zh-Hans for input detected as zh. (Closed)
Patch Set: Fixes broken test (created 3 years, 9 months ago)
// Copyright 2017 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

#ifndef COMPONENTS_TRANSLATE_CORE_LANGUAGE_DETECTION_CHINESE_SCRIPT_CLASSIFIER_H_
#define COMPONENTS_TRANSLATE_CORE_LANGUAGE_DETECTION_CHINESE_SCRIPT_CLASSIFIER_H_

#include <memory>
#include <string>
#include "third_party/icu/source/i18n/unicode/translit.h"

namespace translate {

class ChineseScriptClassifier {
 public:
  // Initializes both the Hant-to-Hans ICU transliterator and the
  // Hans-to-Hant ICU transliterator.
  ChineseScriptClassifier();
  ~ChineseScriptClassifier();

  // Given Chinese text as input, returns either zh-Hant or zh-Hans.
  // When the input is ambiguous, i.e. not completely zh-Hans and not
  // completely zh-Hant, this function returns the closest language code
  // matching the input.
  //
  // Behavior is undefined for non-Chinese input.
  std::string Classify(const std::string& input) const;

  // Returns true if the underlying transliterators were properly initialized
  // by the constructor.
  bool IsInitialized() const;
groby-ooo-7-16 2017/03/08 00:41:55 Maybe use a factory mode instead? (It's a personal
riesa 2017/03/08 01:47:42 My understanding is that the factory model is not

 private:
  // BCP 47 language code representing Chinese in Han Simplified script.
  static const char kChineseSimplifiedCode[];
groby-ooo-7-16 2017/03/08 00:41:55 Why keep those as class members, as opposed to ano
riesa 2017/03/08 01:47:42 Just a style/organization preference. But in a mas

  // BCP 47 language code representing Chinese in Han Traditional script.
  static const char kChineseTraditionalCode[];

  // ICU Transliterator that does Hans to Hant conversion.
  std::unique_ptr<icu::Transliterator> hans2hant_;

  // ICU Transliterator that does Hant to Hans conversion.
  std::unique_ptr<icu::Transliterator> hant2hans_;
};

}  // namespace translate

#endif  // COMPONENTS_TRANSLATE_CORE_LANGUAGE_DETECTION_CHINESE_SCRIPT_CLASSIFIER_H_
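
The header says what Classify() returns but not how it chooses the "closest" code for mixed input. One plausible approach, given that the class holds a transliterator for each direction, is to count how many characters each direction would rewrite: text that Hant-to-Hans conversion changes more than Hans-to-Hant conversion is probably Traditional. The sketch below illustrates that idea only and is not the code under review. It assumes tiny hand-written lookup tables in place of the ICU transliterators, takes a std::u32string rather than UTF-8, and breaks ties toward zh-Hans. The names ScriptMap, CountChanged, and ClassifySketch are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an ICU transliterator: maps a code point in one
// script to its counterpart in the other. A real table would be far larger.
using ScriptMap = std::unordered_map<char32_t, char32_t>;

// Returns how many code points of |input| the given map would rewrite.
std::size_t CountChanged(const std::u32string& input, const ScriptMap& map) {
  std::size_t changed = 0;
  for (char32_t c : input) {
    auto it = map.find(c);
    if (it != map.end() && it->second != c)
      ++changed;
  }
  return changed;
}

// Sketch of a classifier: compares how much each conversion direction would
// alter the input and reports the script that needs more rewriting to leave.
std::string ClassifySketch(const std::u32string& input) {
  // Toy tables (assumption): four Traditional/Simplified pairs.
  static const ScriptMap kHantToHans = {
      {U'\u570B', U'\u56FD'},  // Traditional "country" -> Simplified
      {U'\u8A9E', U'\u8BED'},  // Traditional "language" -> Simplified
      {U'\u5B78', U'\u5B66'},  // Traditional "study" -> Simplified
      {U'\u66F8', U'\u4E66'},  // Traditional "book" -> Simplified
  };
  static const ScriptMap kHansToHant = {
      {U'\u56FD', U'\u570B'},
      {U'\u8BED', U'\u8A9E'},
      {U'\u5B66', U'\u5B78'},
      {U'\u4E66', U'\u66F8'},
  };

  // Characters rewritten by Hant-to-Hans are Traditional-only forms, and
  // characters rewritten by Hans-to-Hant are Simplified-only forms.
  const std::size_t hant_chars = CountChanged(input, kHantToHans);
  const std::size_t hans_chars = CountChanged(input, kHansToHant);

  // Tie-break toward zh-Hans is an assumption of this sketch.
  return hant_chars > hans_chars ? "zh-Hant" : "zh-Hans";
}
```

Characters shared by both scripts, such as U+4E2D and U+6587, are rewritten by neither table, so they do not affect the result. Mixed input is decided by whichever script contributes more script-specific characters.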