|
|
Created:
4 years, 8 months ago by jungshik at Google Modified:
4 years, 7 months ago Reviewers:
drott CC:
blink-reviews, blink-reviews-platform-graphics_chromium.org, Rik, chromium-reviews, danakj+watch_chromium.org, dshwang, drott+blinkwatch_chromium.org, krit, f(malita), jbroman, Justin Novosad, kinuko+watch, pdr+graphicswatchlist_chromium.org, rwlbuis, Stephen Chennney Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionUse ICU's emoji character properties
ICU 57 has added support for Emoji-related character properites.
It's ported to Chrome's copy of ICU 56 so that we can use them.
For system ICU (56 or earlier), we continue to use the home-grown
implementation.
BUG=579552
TEST=blink_platform_unittests --gtest_filter=Charact*
Committed: https://crrev.com/ee583775d8a9e16b71810a9fe305c5f4db638a57
Cr-Commit-Position: refs/heads/master@{#392429}
Patch Set 1 #Patch Set 2 : uchar.h: include explicitly #
Total comments: 1
Patch Set 3 : rebased #Patch Set 4 : rebased #Messages
Total messages: 13 (4 generated)
jshin@chromium.org changed reviewers: + drott@chromium.org
drott@: currently, isEmojiTextPresentation fails because the hardcoded set you have is different from what I get with "|Emoji=yes| && |EmojiPresentation=no|". I saw your comment somewhere else that both isEmojiEmojiPresentation and isEmojiTextPresentation return true for some characters overlap each other. Indeed, the two hard-coded sets (kEmojiEmojiPattern and kEmojiTextPattern) in CharacterEmoji.cpp do overlap each other. It's not clear to me what use case you have in mind by making them overlap each other. How did you derive the two sets from the emoji data at http://www.unicode.org/Public/emoji/2.0//emoji-data.txt ? http://www.unicode.org/reports/tr51/#Presentation_Style has the following: There had been no clear line for implementers between three categories of Unicode characters: emoji-default: those expected to have an emoji presentation by default, but can also have a text presentation text-default: those expected to have a text presentation by default, but could also have an emoji presentation text-only: those that should only have a text presentation These categories can be distinguished using properties listed in Annex A: Emoji Properties and Data Files. The first category are characters with Emoji=Yes and Emoji_Presentation=Yes. The second category are characters with Emoji=Yes and Emoji_Presentation=No. The third category are characters with Emoji=No.
On 2016/04/05 at 09:56:00, jshin wrote: > drott@: currently, isEmojiTextPresentation fails because the hardcoded set you have is different from what I get with "|Emoji=yes| && |EmojiPresentation=no|". Thanks for pointing this out. Thinking about it more, I'll change the implementation to align it a bit closer with the terminology in UTR51. I'll rename the functions to isEmojiTextDefault and isEmojiEmojiDefault, and make the sets not overlap each other and change the failing test case to TestEmojiTextDefault and change these lines: // These are codepoints that have Emoji=Yes and EmojiPresentation=Yes EXPECT_FALSE(Character::isEmojiTextDefault(0x1F9C0)); EXPECT_FALSE(Character::isEmojiTextDefault(0x26BD)); EXPECT_FALSE(Character::isEmojiTextDefault(0x26BE)); > I saw your comment somewhere else that both isEmojiEmojiPresentation and isEmojiTextPresentation return true for some characters overlap each other. Indeed, the two hard-coded sets (kEmojiEmojiPattern and kEmojiTextPattern) in CharacterEmoji.cpp do overlap each other. Yes, that comment is in SymbolsIterator. > It's not clear to me what use case you have in mind by making them overlap each other. How did you derive the two sets from the emoji data at http://www.unicode.org/Public/emoji/2.0//emoji-data.txt ? I did not have a particular use case in mind, quite to the contrary, in SymbolsIterator I was subtracting the other set intentionally because of how those methods behave - the reason was probably rather a bit of a misconception on my side. I did a straightforward transformation from the range sets defined in emoji_data.txt file to these patterns/ranges in CharacterEmoji.cpp. > http://www.unicode.org/reports/tr51/#Presentation_Style has the following: > > There had been no clear line for implementers between three categories of Unicode characters: > > emoji-default: those expected to have an emoji presentation by default, but can also have a text presentation > text-default: those expected to have a text presentation by default, but could also have an emoji presentation > text-only: those that should only have a text presentation > These categories can be distinguished using properties listed in Annex A: Emoji Properties and Data Files. The first category are characters with Emoji=Yes and Emoji_Presentation=Yes. The second category are characters with Emoji=Yes and Emoji_Presentation=No. The third category are characters with Emoji=No. Okay, yes, thanks for pointing that out. So, as mentioned above, I would suggest to bring the method names closer to this part of the TR in order to avoid further confusion.
https://codereview.chromium.org/1856353002/diff/20001/third_party/WebKit/Sour... File third_party/WebKit/Source/platform/fonts/CharacterEmoji.cpp (right): https://codereview.chromium.org/1856353002/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/platform/fonts/CharacterEmoji.cpp:10: #if defined(USING_SYSTEM_ICU) && (U_ICU_VERSION_MAJOR_NUM <= 56) Can we avoid #ifdef and only keep the ICU property functions? I am slightly worried about maintaining two sets of these properties. On which platforms are we not controlling the ICU version?
On 2016/04/05 19:40:40, drott wrote: > https://codereview.chromium.org/1856353002/diff/20001/third_party/WebKit/Sour... > File third_party/WebKit/Source/platform/fonts/CharacterEmoji.cpp (right): > > https://codereview.chromium.org/1856353002/diff/20001/third_party/WebKit/Sour... > third_party/WebKit/Source/platform/fonts/CharacterEmoji.cpp:10: #if > defined(USING_SYSTEM_ICU) && (U_ICU_VERSION_MAJOR_NUM <= 56) > Can we avoid #ifdef and only keep the ICU property functions? I am slightly > worried about maintaining two sets of these properties. On which platforms are > we not controlling the ICU version? We control everything in Google Chrome. USING...ICU is for Chromium. Linux distros packages their own builds of Chromium with various versions of ICU.
Description was changed from ========== Use ICU's emoji character properties This has to be landed after ICU update to cherry-pick Emoji properties support. BUG=579552 TEST=blink_platform_unittests --gtest_filter=Charact* ========== to ========== Use ICU's emoji character properties ICU 57 has added support for Emoji-related character properites. It's ported to Chrome's copy of ICU 56 so that we can use them. For system ICU (56 or earlier), we continue to use the home-grown implementation. BUG=579552 TEST=blink_platform_unittests --gtest_filter=Charact* ==========
Sorry for the delay. PTAL. Thanks
LGTM, thanks!
The CQ bit was checked by jshin@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1856353002/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1856353002/60001
Message was sent while issue was closed.
Committed patchset #4 (id:60001)
Message was sent while issue was closed.
Description was changed from ========== Use ICU's emoji character properties ICU 57 has added support for Emoji-related character properites. It's ported to Chrome's copy of ICU 56 so that we can use them. For system ICU (56 or earlier), we continue to use the home-grown implementation. BUG=579552 TEST=blink_platform_unittests --gtest_filter=Charact* ========== to ========== Use ICU's emoji character properties ICU 57 has added support for Emoji-related character properites. It's ported to Chrome's copy of ICU 56 so that we can use them. For system ICU (56 or earlier), we continue to use the home-grown implementation. BUG=579552 TEST=blink_platform_unittests --gtest_filter=Charact* Committed: https://crrev.com/ee583775d8a9e16b71810a9fe305c5f4db638a57 Cr-Commit-Position: refs/heads/master@{#392429} ==========
Message was sent while issue was closed.
Patchset 4 (id:??) landed as https://crrev.com/ee583775d8a9e16b71810a9fe305c5f4db638a57 Cr-Commit-Position: refs/heads/master@{#392429} |