|
|
Created:
4 years, 5 months ago by nicolaso Modified:
4 years, 5 months ago CC:
chromium-reviews, rouslan+autofill_chromium.org, estade+watch_chromium.org, vabr+watchlistautofill_chromium.org, browser-components-watch_chromium.org, jdonnelly+autofillwatch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionSplit CJK full names into name parts correctly.
This a a partial, somewhat naive, fix for a bug where CJK names were
split the same way as western names. CJK names have the family name
(surname) first, and don't usually have a space in-between the two parts.
For a complete fix, we might need to improve Japanese name detection,
when there is no space between the two names. It's unclear how often
users actually enter their name without spaces in a form field.
We might also want to fix the opposite use-case: when the user enters
their first & last name in separate fields, we should infer their full
name in the right order (with ordering based on the script used).
BUG=89111
Committed: https://crrev.com/4a655982616cfd5c05744f509b5d088f9d93bb81
Cr-Commit-Position: refs/heads/master@{#405997}
Patch Set 1 #
Total comments: 8
Patch Set 2 : Fixed some comments and added a TODO. #
Total comments: 2
Patch Set 3 : Move a TODO. #
Total comments: 11
Patch Set 4 : Improve precision for splitting Korean names. #
Total comments: 8
Patch Set 5 : Assume 2-character CJK names are 1/1. #Patch Set 6 : nits #Patch Set 7 : Fix build on Windows. #
Messages
Total messages: 31 (11 generated)
Description was changed from ========== Split CJK full names into name parts correctly. This a a partial, somewhat naive, fix for a bug where CJK names were split the same way as western names. CJK names have the family name (surname) first, and don't usually have a space in-between the two parts. For a complete fix, we might need to improve Japanese name detection, when there is no space between the two names. It's unclear how often users actually enter their name without spaces in a form field. We might also want to fix the opposite use-case: when the user enters their first & last name in separate fields, we should infer their full name in the right order (with ordering based on the script used). BUG=89111 ========== to ========== Split CJK full names into name parts correctly. This a a partial, somewhat naive, fix for a bug where CJK names were split the same way as western names. CJK names have the family name (surname) first, and don't usually have a space in-between the two parts. For a complete fix, we might need to improve Japanese name detection, when there is no space between the two names. It's unclear how often users actually enter their name without spaces in a form field. We might also want to fix the opposite use-case: when the user enters their first & last name in separate fields, we should infer their full name in the right order (with ordering based on the script used). BUG=89111 ==========
nicolaso@google.com changed reviewers: + mathp@chromium.org, sebsg@chromium.org
Looks good, one question https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... File components/autofill/core/browser/autofill_data_util.cc (right): https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... components/autofill/core/browser/autofill_data_util.cc:130: // Returns true if |name| looks like a Korean name, in Hangul characters. "in Hangul characters" -> "made up entirely of Hangul characters or spaces"? https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... components/autofill/core/browser/autofill_data_util.cc:151: // FIXME(89111): Japanese names with no space will be mis-split, since we TODO(crbug.com/89111) https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... components/autofill/core/browser/autofill_data_util.cc:211: // TODO(89111): Hungarian, Tamil, Telugu, and Vietnamese also have the format is TODO(crbug.com/89111) https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... components/autofill/core/browser/autofill_data_util.cc:215: if (IsCJKName(name) && SplitCJKName(name_tokens, &parts)) { Are there cases where a name is made up of CJK characters of different scripts? If not, I'm thinking IsCJKName could instead return the script, which could be used in SplitCJKName (would avoid going over the name twice). Let me know if such a thing could be possible, thanks
On 2016/07/11 13:49:15, Mathieu Perreault wrote: > Looks good, one question > > https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... > File components/autofill/core/browser/autofill_data_util.cc (right): > > https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... > components/autofill/core/browser/autofill_data_util.cc:130: // Returns true if > |name| looks like a Korean name, in Hangul characters. > "in Hangul characters" -> "made up entirely of Hangul characters or spaces"? > > https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... > components/autofill/core/browser/autofill_data_util.cc:151: // FIXME(89111): > Japanese names with no space will be mis-split, since we > TODO(crbug.com/89111) > > https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... > components/autofill/core/browser/autofill_data_util.cc:211: // TODO(89111): > Hungarian, Tamil, Telugu, and Vietnamese also have the > format is TODO(crbug.com/89111) > > https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... > components/autofill/core/browser/autofill_data_util.cc:215: if (IsCJKName(name) > && SplitCJKName(name_tokens, &parts)) { > Are there cases where a name is made up of CJK characters of different scripts? > If not, I'm thinking IsCJKName could instead return the script, which could be > used in SplitCJKName (would avoid going over the name twice). > > Let me know if such a thing could be possible, thanks I noticed another edge-case while addressing comments: foreign names in Japanese are written in Katakana, with '・' (KATAKANA MIDDLE DOT U+30FB) as a separator, and the western ordering. This shouldn't be too hard to account for, so I added a TODO. I'll fix it in a follow-up CL.
https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... File components/autofill/core/browser/autofill_data_util.cc (right): https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... components/autofill/core/browser/autofill_data_util.cc:130: // Returns true if |name| looks like a Korean name, in Hangul characters. On 2016/07/11 13:49:15, Mathieu Perreault wrote: > "in Hangul characters" -> "made up entirely of Hangul characters or spaces"? Done. https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... components/autofill/core/browser/autofill_data_util.cc:151: // FIXME(89111): Japanese names with no space will be mis-split, since we On 2016/07/11 13:49:15, Mathieu Perreault wrote: > TODO(crbug.com/89111) Done. https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... components/autofill/core/browser/autofill_data_util.cc:211: // TODO(89111): Hungarian, Tamil, Telugu, and Vietnamese also have the On 2016/07/11 13:49:15, Mathieu Perreault wrote: > format is TODO(crbug.com/89111) Done. https://codereview.chromium.org/2132103002/diff/1/components/autofill/core/br... components/autofill/core/browser/autofill_data_util.cc:215: if (IsCJKName(name) && SplitCJKName(name_tokens, &parts)) { On 2016/07/11 13:49:15, Mathieu Perreault wrote: > Are there cases where a name is made up of CJK characters of different scripts? > If not, I'm thinking IsCJKName could instead return the script, which could be > used in SplitCJKName (would avoid going over the name twice). > > Let me know if such a thing could be possible, thanks Off the top of my head, the only one I can think of, is someone whose name is like "Bruce Takeshi". Japanese uses one script for Japanese names, and another for foreign names. "Bruce" and "Takeshi" *could* be in different scripts. As far as performance is concerned (since we re-traverse the name twice, with `IsHangulName()'), I don't think we need to worry too much. Most CJK names are made up of about 3 characters. A quick benchmark also says `uscript_getScript(c)' is even faster than looking up `c' in an `std::unordered_map', meaning that `IsCJK(c)' and `IsHangul(c)' are pretty fast as well.
lgtm, thanks https://codereview.chromium.org/2132103002/diff/20001/components/autofill/cor... File components/autofill/core/browser/autofill_data_util.cc (right): https://codereview.chromium.org/2132103002/diff/20001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:157: // TODO(crbug.com/89111): Japanese names with no space will be mis-split, I would put the TODO paragraph after the description paragraph.
It LGTM for me but I'd wait to see if someone from a Korean office is available to take a look. Great work btw.
https://codereview.chromium.org/2132103002/diff/20001/components/autofill/cor... File components/autofill/core/browser/autofill_data_util.cc (right): https://codereview.chromium.org/2132103002/diff/20001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:157: // TODO(crbug.com/89111): Japanese names with no space will be mis-split, On 2016/07/12 14:39:00, Mathieu Perreault wrote: > I would put the TODO paragraph after the description paragraph. Done.
nicolaso@google.com changed reviewers: + gogerald@chromium.org
jinsukkim@chromium.org changed reviewers: + jinsukkim@chromium.org
https://codereview.chromium.org/2132103002/diff/40001/components/autofill/cor... File components/autofill/core/browser/autofill_data_util.cc (right): https://codereview.chromium.org/2132103002/diff/40001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:38: // https://namu.wiki/w/%ED%95%9C%EA%B5%AD%EC%9D%98%20%EC%84%B1%EC%94%A8#s-6 Consider changing the reference to the article in wikipedia. https://ko.wikipedia.org/wiki/%ED%95%9C%EA%B5%AD%EC%9D%98_%EC%84%B1%EC%94%A8_... Namu Wiki, though getting more and more popular, allows lots of geeky, controversial, cult-like contents only available through unofficial channels. wikipedia is a more trustworthy source. https://codereview.chromium.org/2132103002/diff/40001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:168: // Korean full names always have at least 3 characters. So, if there are This is not entirely true. Names with 1-char surname + 1-char given name (1/1 for short) are also quite common. You need to check if the first letter is one of the surnames in the reference. It is not always possible to tell 1/1 from 0/2 (given name only) though. But this is worth checking if the input is supposed to contain both sur-/given name combo. https://codereview.chromium.org/2132103002/diff/40001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:175: // 4-character Korean full names default to a 2-character surname. It's You should still check if the first 2-character word is one of the surnames in |cjk_multi_char_surnames|. There are people with 3-character (or longer) given name. Korean surname is a well-defined list as shown in the reference(i.e. you cannot invent your own surname and use it). The first 2 char is not a surname if it is not in the list. https://codereview.chromium.org/2132103002/diff/40001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:186: if (base::StartsWith(name, surname, base::CompareCase::SENSITIVE)) { This is more complicated because the name is more likely to be 1/2 than 2/1 statistics wise. I suggest the input be categorized in following cases: A) 1/2: Checking the first character against the list of 1-char surname list will give much better result. A name like "강전희(Kang Jeon Hee)" (an old friend of mine) has the first 2 letters that can be a surname in |cjk_multi_char_surnames| but actually her surname is only 'Kang' which is one of the most common surnames in Korea, and her given name is 'Jeon Hee'. Same goes for "동방", "소봉", "어금", "장곡", and "황목" whose first letter is a surname more common that whole 2-letter one. In fact I've never heard/met people with these 2-char surnames. They are quite obscure ones. https://ko.wikipedia.org/wiki/%ED%95%9C%EA%B5%AD%EC%9D%98_%EB%B3%B5%EC%84%B1 shows that there are about 70 families of the name "어금". Likewise, There are only 70 동방, 50 소봉, 17 장곡, 12 강전 families, etc. B) 2/1: "남궁", "사공", "서문", "선우", "제갈" also have the first char that can be a surname but the whole 2-char one is much more likely to be a right surname. C) Easy decision 2/1: "독고", "망절" don't need such checking since their first letter cannot be a surname by itself. And I'm not really certain about "황보" - it can be either A) or B). I'd suggest B). Note that this is a subjective rule, but I don't think it's very biased. In conclusion, the surname to suggest for each input are: A) "강전", "동방", "소봉", "어금", "장곡", "황목" : first char only: i.e., "강", "동", "소", "어", "장", "황" B) "남궁", "사공", "서문", "선우", "제갈", "황보" : the whole 2 chars C) "독고", "망절" : ditto
https://codereview.chromium.org/2132103002/diff/40001/components/autofill/cor... File components/autofill/core/browser/autofill_data_util.cc (right): https://codereview.chromium.org/2132103002/diff/40001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:38: // https://namu.wiki/w/%ED%95%9C%EA%B5%AD%EC%9D%98%20%EC%84%B1%EC%94%A8#s-6 On 2016/07/12 22:50:48, Jinsuk wrote: > Consider changing the reference to the article in wikipedia. > https://ko.wikipedia.org/wiki/%ED%95%9C%EA%B5%AD%EC%9D%98_%EC%84%B1%EC%94%A8_... > Namu Wiki, though getting more and more popular, allows lots of geeky, > controversial, cult-like contents only available through unofficial channels. > wikipedia is a more trustworthy source. Done. https://codereview.chromium.org/2132103002/diff/40001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:168: // Korean full names always have at least 3 characters. So, if there are On 2016/07/12 22:50:48, Jinsuk wrote: > This is not entirely true. Names with 1-char surname + 1-char given name (1/1 > for short) are also quite common. You need to check if the first letter is one > of the surnames in the reference. It is not always possible to tell 1/1 from 0/2 > (given name only) though. But this is worth checking if the input is supposed to > contain both sur-/given name combo. Should we check that the first character is a very common surname? (like 김, 이, 최, 박, ...) If it is, it's more likely to be 1/1 than 0/2. If it's not a common surname, then we could use 0/2. https://codereview.chromium.org/2132103002/diff/40001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:175: // 4-character Korean full names default to a 2-character surname. It's On 2016/07/12 22:50:48, Jinsuk wrote: > You should still check if the first 2-character word is one of the surnames in > |cjk_multi_char_surnames|. There are people with 3-character (or longer) given > name. Korean surname is a well-defined list as shown in the reference(i.e. you > cannot invent your own surname and use it). The first 2 char is not a surname if > it is not in the list. I used two separate lists, |common_cjk_multi_char_surnames| and |korean_multi_char_surnames|. The first one only has the common ones (marked as B and C in your other comment), and the second one includes every one from the list, even rare ones like 강전 and 장곡. Even though 1/2 is more likely than 2/1 for a 3-character name, it might not be true for 2/2 vs. 1/3. This is why I used |korean_multi_char_surnames| here. I could be completely wrong. If I am, we can just remove this `if' (and the |korean_multi_char_surnames| declaration) completely. https://codereview.chromium.org/2132103002/diff/40001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:186: if (base::StartsWith(name, surname, base::CompareCase::SENSITIVE)) { On 2016/07/12 22:50:48, Jinsuk wrote: > This is more complicated because the name is more likely to be 1/2 than 2/1 > statistics wise. I suggest the input be categorized in following cases: > > A) 1/2: Checking the first character against the list of 1-char surname list > will give much better result. A name like "강전희(Kang Jeon Hee)" (an old friend of > mine) has the first 2 letters that can be a surname in |cjk_multi_char_surnames| > but actually her surname is only 'Kang' which is one of the most common surnames > in Korea, and her given name is 'Jeon Hee'. Same goes for "동방", "소봉", "어금", > "장곡", and "황목" whose first letter is a surname more common that whole 2-letter > one. In fact I've never heard/met people with these 2-char surnames. They are > quite obscure ones. > https://ko.wikipedia.org/wiki/%ED%95%9C%EA%B5%AD%EC%9D%98_%EB%B3%B5%EC%84%B1 > shows that there are about 70 families of the name "어금". Likewise, There are > only 70 동방, 50 소봉, 17 장곡, 12 강전 families, etc. > > B) 2/1: "남궁", "사공", "서문", "선우", "제갈" also have the first char that can be a > surname but the whole 2-char one is much more likely to be a right surname. > > C) Easy decision 2/1: "독고", "망절" don't need such checking since their first > letter cannot be a surname by itself. > > And I'm not really certain about "황보" - it can be either A) or B). I'd suggest > B). > > Note that this is a subjective rule, but I don't think it's very biased. > > In conclusion, the surname to suggest for each input are: > > A) "강전", "동방", "소봉", "어금", "장곡", "황목" : first char only: i.e., "강", "동", "소", > "어", "장", "황" > B) "남궁", "사공", "서문", "선우", "제갈", "황보" : the whole 2 chars > C) "독고", "망절" : ditto > I renamed |cjk_multi_char_surnames| to |common_cjk_multi_char_surnames|, and removed the surnames you listed in A). Also added 강전희 as a test case.
Thanks for the update. Looks great. https://codereview.chromium.org/2132103002/diff/40001/components/autofill/cor... File components/autofill/core/browser/autofill_data_util.cc (right): https://codereview.chromium.org/2132103002/diff/40001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:168: // Korean full names always have at least 3 characters. So, if there are On 2016/07/13 16:31:06, nicolaso wrote: > On 2016/07/12 22:50:48, Jinsuk wrote: > > This is not entirely true. Names with 1-char surname + 1-char given name (1/1 > > for short) are also quite common. You need to check if the first letter is one > > of the surnames in the reference. It is not always possible to tell 1/1 from > 0/2 > > (given name only) though. But this is worth checking if the input is supposed > to > > contain both sur-/given name combo. > > Should we check that the first character is a very common surname? (like 김, 이, > 최, 박, ...) If it is, it's more likely to be 1/1 than 0/2. If it's not a common > surname, then we could use 0/2. I wonder if it is safe to assume that people in most cases put their full name (with or without space) for autofill setting - isn't it what autofill expects? If it is, we can just go with 1/1 split, leaving a wrong input (only family name or nickname in the name field) leading to a wrong output. If not, I agree on using only common names (like top 10) for checking. No point using all of them - it would still give lots of false positives even with them all. https://codereview.chromium.org/2132103002/diff/40001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:175: // 4-character Korean full names default to a 2-character surname. It's On 2016/07/13 16:31:06, nicolaso wrote: > On 2016/07/12 22:50:48, Jinsuk wrote: > > You should still check if the first 2-character word is one of the surnames in > > |cjk_multi_char_surnames|. There are people with 3-character (or longer) given > > name. Korean surname is a well-defined list as shown in the reference(i.e. you > > cannot invent your own surname and use it). The first 2 char is not a surname > if > > it is not in the list. > > I used two separate lists, |common_cjk_multi_char_surnames| and > |korean_multi_char_surnames|. The first one only has the common ones (marked as > B and C in your other comment), and the second one includes every one from the > list, even rare ones like 강전 and 장곡. > > Even though 1/2 is more likely than 2/1 for a 3-character name, it might not be > true for 2/2 vs. 1/3. This is why I used |korean_multi_char_surnames| here. I > could be completely wrong. If I am, we can just remove this `if' (and the > |korean_multi_char_surnames| declaration) completely. Acknowledged. Looks good. https://codereview.chromium.org/2132103002/diff/40001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:186: if (base::StartsWith(name, surname, base::CompareCase::SENSITIVE)) { On 2016/07/13 16:31:06, nicolaso wrote: > On 2016/07/12 22:50:48, Jinsuk wrote: > > This is more complicated because the name is more likely to be 1/2 than 2/1 > > statistics wise. I suggest the input be categorized in following cases: > > > > A) 1/2: Checking the first character against the list of 1-char surname list > > will give much better result. A name like "강전희(Kang Jeon Hee)" (an old friend > of > > mine) has the first 2 letters that can be a surname in > |cjk_multi_char_surnames| > > but actually her surname is only 'Kang' which is one of the most common > surnames > > in Korea, and her given name is 'Jeon Hee'. Same goes for "동방", "소봉", "어금", > > "장곡", and "황목" whose first letter is a surname more common that whole 2-letter > > one. In fact I've never heard/met people with these 2-char surnames. They are > > quite obscure ones. > > https://ko.wikipedia.org/wiki/%ED%95%9C%EA%B5%AD%EC%9D%98_%EB%B3%B5%EC%84%B1 > > shows that there are about 70 families of the name "어금". Likewise, There are > > only 70 동방, 50 소봉, 17 장곡, 12 강전 families, etc. > > > > B) 2/1: "남궁", "사공", "서문", "선우", "제갈" also have the first char that can be a > > surname but the whole 2-char one is much more likely to be a right surname. > > > > C) Easy decision 2/1: "독고", "망절" don't need such checking since their first > > letter cannot be a surname by itself. > > > > And I'm not really certain about "황보" - it can be either A) or B). I'd suggest > > B). > > > > Note that this is a subjective rule, but I don't think it's very biased. > > > > In conclusion, the surname to suggest for each input are: > > > > A) "강전", "동방", "소봉", "어금", "장곡", "황목" : first char only: i.e., "강", "동", "소", > > "어", "장", "황" > > B) "남궁", "사공", "서문", "선우", "제갈", "황보" : the whole 2 chars > > C) "독고", "망절" : ditto > > > > I renamed |cjk_multi_char_surnames| to |common_cjk_multi_char_surnames|, and > removed the surnames you listed in A). Also added 강전희 as a test case. Looks good. How about adding a comment on the test case "강전희". It tests a different flow in your logic. comment in the sense of "..choose 강 over 강전 since the former is much more common". https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... File components/autofill/core/browser/autofill_data_util.cc (right): https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:35: // The most common CJK surnames (last names) that have more than one character. In fact it contains two groups. How about updating the comment "s/common/common or non-ambiguous/" so other people reading the code won't stop and stare at "독고"/"망절" https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... File components/autofill/core/browser/autofill_data_util_unittest.cc (right): https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... components/autofill/core/browser/autofill_data_util_unittest.cc:56: // Korean full names always have at least 3 characters, so a 2-character Please update the comment here.
https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... File components/autofill/core/browser/autofill_data_util.cc (right): https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:35: // The most common CJK surnames (last names) that have more than one character. On 2016/07/13 21:51:09, Jinsuk wrote: > In fact it contains two groups. How about updating the comment "s/common/common > or non-ambiguous/" so other people reading the code won't stop and stare at > "독고"/"망절" Done. https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... File components/autofill/core/browser/autofill_data_util_unittest.cc (right): https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... components/autofill/core/browser/autofill_data_util_unittest.cc:56: // Korean full names always have at least 3 characters, so a 2-character On 2016/07/13 21:51:09, Jinsuk wrote: > Please update the comment here. Done.
lgtm with comments https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... File components/autofill/core/browser/autofill_data_util.cc (right): https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:214: } else if (name_tokens.size() == 2) { break this 'else if' into a separate 'if' statement since you have returned, https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... File components/autofill/core/browser/autofill_data_util_unittest.cc (right): https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... components/autofill/core/browser/autofill_data_util_unittest.cc:48: {"劉翔", "翔", "", "劉"}, // (Simplified) Chinese name, Unihan nit: I believe this ("劉") is traditional and above ("刘") is simplified,
https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... File components/autofill/core/browser/autofill_data_util.cc (right): https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... components/autofill/core/browser/autofill_data_util.cc:214: } else if (name_tokens.size() == 2) { On 2016/07/14 18:28:52, gogerald1 wrote: > break this 'else if' into a separate 'if' statement since you have returned, Done. https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... File components/autofill/core/browser/autofill_data_util_unittest.cc (right): https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... components/autofill/core/browser/autofill_data_util_unittest.cc:48: {"劉翔", "翔", "", "劉"}, // (Simplified) Chinese name, Unihan On 2016/07/14 18:28:52, gogerald1 wrote: > nit: I believe this ("劉") is traditional and above ("刘") is simplified, Fixed.
On 2016/07/14 18:12:43, nicolaso wrote: > https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... > File components/autofill/core/browser/autofill_data_util.cc (right): > > https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... > components/autofill/core/browser/autofill_data_util.cc:35: // The most common > CJK surnames (last names) that have more than one character. > On 2016/07/13 21:51:09, Jinsuk wrote: > > In fact it contains two groups. How about updating the comment > "s/common/common > > or non-ambiguous/" so other people reading the code won't stop and stare at > > "독고"/"망절" > > Done. > > https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... > File components/autofill/core/browser/autofill_data_util_unittest.cc (right): > > https://codereview.chromium.org/2132103002/diff/60001/components/autofill/cor... > components/autofill/core/browser/autofill_data_util_unittest.cc:56: // Korean > full names always have at least 3 characters, so a 2-character > On 2016/07/13 21:51:09, Jinsuk wrote: > > Please update the comment here. > > Done. lgtm
The CQ bit was checked by nicolaso@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from mathp@chromium.org, sebsg@chromium.org, gogerald@chromium.org Link to the patchset: https://codereview.chromium.org/2132103002/#ps100001 (title: "nits")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: win_chromium_compile_dbg_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_comp...)
The CQ bit was checked by nicolaso@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from mathp@chromium.org, sebsg@chromium.org, jinsukkim@chromium.org, gogerald@chromium.org Link to the patchset: https://codereview.chromium.org/2132103002/#ps120001 (title: "Fix build on Windows.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Split CJK full names into name parts correctly. This a a partial, somewhat naive, fix for a bug where CJK names were split the same way as western names. CJK names have the family name (surname) first, and don't usually have a space in-between the two parts. For a complete fix, we might need to improve Japanese name detection, when there is no space between the two names. It's unclear how often users actually enter their name without spaces in a form field. We might also want to fix the opposite use-case: when the user enters their first & last name in separate fields, we should infer their full name in the right order (with ordering based on the script used). BUG=89111 ========== to ========== Split CJK full names into name parts correctly. This a a partial, somewhat naive, fix for a bug where CJK names were split the same way as western names. CJK names have the family name (surname) first, and don't usually have a space in-between the two parts. For a complete fix, we might need to improve Japanese name detection, when there is no space between the two names. It's unclear how often users actually enter their name without spaces in a form field. We might also want to fix the opposite use-case: when the user enters their first & last name in separate fields, we should infer their full name in the right order (with ordering based on the script used). BUG=89111 ==========
Message was sent while issue was closed.
Committed patchset #7 (id:120001)
Message was sent while issue was closed.
CQ bit was unchecked.
Message was sent while issue was closed.
Description was changed from ========== Split CJK full names into name parts correctly. This a a partial, somewhat naive, fix for a bug where CJK names were split the same way as western names. CJK names have the family name (surname) first, and don't usually have a space in-between the two parts. For a complete fix, we might need to improve Japanese name detection, when there is no space between the two names. It's unclear how often users actually enter their name without spaces in a form field. We might also want to fix the opposite use-case: when the user enters their first & last name in separate fields, we should infer their full name in the right order (with ordering based on the script used). BUG=89111 ========== to ========== Split CJK full names into name parts correctly. This a a partial, somewhat naive, fix for a bug where CJK names were split the same way as western names. CJK names have the family name (surname) first, and don't usually have a space in-between the two parts. For a complete fix, we might need to improve Japanese name detection, when there is no space between the two names. It's unclear how often users actually enter their name without spaces in a form field. We might also want to fix the opposite use-case: when the user enters their first & last name in separate fields, we should infer their full name in the right order (with ordering based on the script used). BUG=89111 Committed: https://crrev.com/4a655982616cfd5c05744f509b5d088f9d93bb81 Cr-Commit-Position: refs/heads/master@{#405997} ==========
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/4a655982616cfd5c05744f509b5d088f9d93bb81 Cr-Commit-Position: refs/heads/master@{#405997} |