|
|
Created:
9 years, 10 months ago by Nebojša Ćirić Modified:
9 years, 7 months ago CC:
v8-dev, gabor_google.com Visibility:
Public. |
DescriptionAdding break iterator support to the i18n api extension.
This is vendor specific, and is prefixed by v8.
WebKit layout tests will be added in a separate CL.
Committed: http://code.google.com/p/v8/source/detail?r=7006
Patch Set 1 #
Total comments: 11
Patch Set 2 : '' #
Total comments: 10
Patch Set 3 : '' #
Total comments: 2
Patch Set 4 : '' #
Total comments: 8
Patch Set 5 : '' #
Total comments: 4
Patch Set 6 : '' #
Messages
Total messages: 12 (0 generated)
Mads, please take a close look at MakeWeak/Delete logic. Also, I would like to check with you if I understand HandleScope properly: 1. If I do any of ::New or NewInstance and those are local vars (not to be returned) I should always use HandleScope for auto cleanup. 2. Plain returns of say v8::String::New(...) don't need HandleScope::Close(allocated-string) 3. If HandleScope is used, then I need to call Close(var) for any variable I would like to return back to the caller. Does that sound good? Jungshik please take a look at i18n stuff.
Thank you for the CL. Below are some comments. http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/bre... File src/extensions/experimental/break-iterator.cc (right): http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/bre... src/extensions/experimental/break-iterator.cc:63: text_value->Utf8Length()); Chrome's ICU didn't define a macro (U_CHARSET_IS_UTF8) to get ICU to assume that |char *| is UTF-8. Perhaps, we'll in the future. At least for now (and perhaps in the future, too because we don't control how other embedders will build ICU), you should explicitly specify the encoding for UnicodeString ctor. You didn't get caught with this because you tested it on Mac (or Linux under UTF-8 locale). Under non-UTF-8 locale on Linux or Windows, your test would have failed. You have three alternatives: 1. use UnicodeString:: fromUTF8 2. use a ctor accepting the codepage name : pass "UTF-8" 3. use a ctor accepting UChar* instead of char*. #1 is faster than #2. http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/bre... src/extensions/experimental/break-iterator.cc:81: return v8::Int32::New(break_iterator->next()); When there's no more item, ICU's next() will returb UBRK_DONE (or sth like that). How would you signal that to your API users? If you want to use '-1' for that, perhaps you need to check the return value from bi->next() and return '-1' explicitly for UBRK_DONE. http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/bre... src/extensions/experimental/break-iterator.cc:110: return v8::Int32::New(UBRK_WORD_IDEO); We need to give a bit more thought on what types to expose to Javascript. BTW, line/setnence breakers can also have rule_status ( http://icu-project.org/apiref/icu4c/ubrk_8h.html#ad03d8e27f121bcf11eaed0a2887... ) although they may not be very useful. http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/bre... src/extensions/experimental/break-iterator.cc:154: raw_template->Set(v8::String::New("assign"), How about 'adopt' or 'adoptText'? 'assign' is kinda misleading (assigning 'string' to 'break iterator' ?) http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/bre... src/extensions/experimental/break-iterator.cc:163: v8::FunctionTemplate::New(BreakIteratorBreakType)); One method is missing; a method that returns the word at the current position. A customer of this API can certainly find that out, but wouldn't it be nice to offer ourselves? Yeah, we discussed this and I said that next() is sufficient without nextWord(). Alternatives: 1. Do nothing. Customers can figure that out 2. Add currentWord() method (either in C++ or JavaScript). 3. Add nextWord() (and remove next()) What do you say?
I'll make code changes and get back to you on the rest... http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/bre... File src/extensions/experimental/break-iterator.cc (right): http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/bre... src/extensions/experimental/break-iterator.cc:81: return v8::Int32::New(break_iterator->next()); UBRK_DONE is actually int32_t and is represented as -1, so this works as intended, and ICU is probably not going to change public API any time soon. I would leave it as is. On 2011/02/26 00:31:38, Jungshik Shin wrote: > When there's no more item, ICU's next() will returb UBRK_DONE (or sth like > that). How would you signal that to your API users? If you want to use '-1' for > that, perhaps you need to check the return value from bi->next() and return '-1' > explicitly for UBRK_DONE. http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/bre... src/extensions/experimental/break-iterator.cc:110: return v8::Int32::New(UBRK_WORD_IDEO); We can take them out at any time - it's still experimental. On 2011/02/26 00:31:38, Jungshik Shin wrote: > We need to give a bit more thought on what types to expose to Javascript. > > BTW, line/setnence breakers can also have rule_status ( > http://icu-project.org/apiref/icu4c/ubrk_8h.html#ad03d8e27f121bcf11eaed0a2887... > ) although they may not be very useful. > http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/bre... src/extensions/experimental/break-iterator.cc:154: raw_template->Set(v8::String::New("assign"), I'll fix that. On 2011/02/26 00:31:38, Jungshik Shin wrote: > How about 'adopt' or 'adoptText'? 'assign' is kinda misleading (assigning > 'string' to 'break iterator' ?) http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/bre... src/extensions/experimental/break-iterator.cc:163: v8::FunctionTemplate::New(BreakIteratorBreakType)); JavaScript has two (actually 3, but one is deprecated) methods to slice a string: slice(from, to) and substring(from, to). So simple prev_pos and pos with slice/substring give you that new method. I think we should keep the API minimal, and I don't think it's going to put any burden on the users given the existing JS apis. On 2011/02/26 00:31:38, Jungshik Shin wrote: > One method is missing; a method that returns the word at the current position. > A customer of this API can certainly find that out, but wouldn't it be nice to > offer ourselves? > > Yeah, we discussed this and I said that next() is sufficient without nextWord(). > > > Alternatives: > > 1. Do nothing. Customers can figure that out > 2. Add currentWord() method (either in C++ or JavaScript). > 3. Add nextWord() (and remove next()) > > What do you say? > >
If we decide we need line and sentence break types we can add them later, but for now I think we should skip them (not terribly useful for our use case). http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/bre... File src/extensions/experimental/break-iterator.cc (right): http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/bre... src/extensions/experimental/break-iterator.cc:63: text_value->Utf8Length()); On 2011/02/26 00:31:38, Jungshik Shin wrote: > Chrome's ICU didn't define a macro (U_CHARSET_IS_UTF8) to get ICU to assume that > |char *| is UTF-8. Perhaps, we'll in the future. > > At least for now (and perhaps in the future, too because we don't control how > other embedders will build ICU), you should explicitly specify the encoding for > UnicodeString ctor. You didn't get caught with this because you tested it on Mac > (or Linux under UTF-8 locale). Under non-UTF-8 locale on Linux or Windows, your > test would have failed. > > You have three alternatives: > > 1. use UnicodeString:: fromUTF8 > 2. use a ctor accepting the codepage name : pass "UTF-8" > 3. use a ctor accepting UChar* instead of char*. > > #1 is faster than #2. Done. http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/bre... src/extensions/experimental/break-iterator.cc:154: raw_template->Set(v8::String::New("assign"), On 2011/02/26 00:31:38, Jungshik Shin wrote: > How about 'adopt' or 'adoptText'? 'assign' is kinda misleading (assigning > 'string' to 'break iterator' ?) Done.
This is headed in the right direction on the use of the V8 API. There are a couple of issues that we need to address. Some of this might be easier to discuss face to face. Please feel free to setup a video conference meeting with me if needed. The main issue is that we need to hook up the lifetime of the JS break iterator object to the lifetime of the C++ object. When there are no more JS references to the JS wrapper we can delete the C++ break iterator. For that we need a persistent handle pr. BreakIterator to its wrapper object. That persistent handle is the one that needs to be weak and have a callback that deletes the BreakIterator *and* disposes the persistent handle. http://codereview.chromium.org/6598014/diff/4003/src/extensions/experimental/... File src/extensions/experimental/break-iterator.cc (right): http://codereview.chromium.org/6598014/diff/4003/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:42: return static_cast<icu::BreakIterator*>(obj->GetPointerFromInternalField(0)); You need to be more careful here. The users of UnpackBreakIterator are functions. These functions can be moved to other objects and therefore the holder does not need to be a break iterator object at all. Instead of using an ObjectTemplate for you break iterator wrappers I think you should use a FunctionTemplate instead. Then here you can check that the argument is an instance of that function template. If it is, it will have be internal field and the value in it will be a BreakIterator pointer. Have a look at FunctionTemplate. You add the methods on the instance_template as you do now. Then you can use templ->HasInstance(obj). http://codereview.chromium.org/6598014/diff/4003/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:49: delete UnpackBreakIterator(v8::Persistent<v8::Object>::Cast(object)); You need to do object.Dispose() as well to delete the persistent handle as well. This callback needs to be used on all break iterator instances instead of on the template (see below). http://codereview.chromium.org/6598014/diff/4003/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:150: v8::HandleScope handle_scope; Is there a reason not have have this handle scope earlier in this method? You are also allocating strings above for instance. The handles for those might as well be local as well. http://codereview.chromium.org/6598014/diff/4003/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:167: // Make template weak so we can delete iterator once GC kicks in. This is not the thing that you want to make weak. The break_iterator_template does not correspond to a concrete break iterator? http://codereview.chromium.org/6598014/diff/4003/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:174: wrapper->SetPointerInInternalField(0, break_iterator); This is where you set up the correspondence. Here you need to decide on the memory management story that you want. I think your BreakIterator instance needs to have a reference (persistent handle) to the wrapper so it can return it again for the next access to it. Also, that will allow you to tie the life-time of the wrapper to the lifetime of the breakiterator C++ instance. Is the breakiterator C++ instance used by anything else than JavaScript? If not, then each break iterator C++ instance should have a persistent handle to the JS wrapper. This persistent handle should be weak and in the callback you should call dispose on the persistent handle *and* delete the break iterator object.
I've chatted with Mads today to make sure FunctionTemplate would work for me. It actually lets me do things I wanted to do with ObjectTemplate but didn't see an easy way of doing it, like specifying the prototype methods. I now: 1. Create FunctionTemplate 2. Add internal field to its InstanceTemplate 3. Add all methods to its PrototypeTemplate 4. Create each break iterator as persistent 5. Check the type on upack 6. Delete the pointer and Dispose of the handle on delete callback http://codereview.chromium.org/6598014/diff/4003/src/extensions/experimental/... File src/extensions/experimental/break-iterator.cc (right): http://codereview.chromium.org/6598014/diff/4003/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:42: return static_cast<icu::BreakIterator*>(obj->GetPointerFromInternalField(0)); On 2011/02/28 13:03:32, Mads Ager wrote: > You need to be more careful here. The users of UnpackBreakIterator are > functions. These functions can be moved to other objects and therefore the > holder does not need to be a break iterator object at all. > > Instead of using an ObjectTemplate for you break iterator wrappers I think you > should use a FunctionTemplate instead. Then here you can check that the argument > is an instance of that function template. If it is, it will have be internal > field and the value in it will be a BreakIterator pointer. > > Have a look at FunctionTemplate. You add the methods on the instance_template as > you do now. Then you can use templ->HasInstance(obj). Done. http://codereview.chromium.org/6598014/diff/4003/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:49: delete UnpackBreakIterator(v8::Persistent<v8::Object>::Cast(object)); On 2011/02/28 13:03:32, Mads Ager wrote: > You need to do object.Dispose() as well to delete the persistent handle as well. > This callback needs to be used on all break iterator instances instead of on the > template (see below). Done. http://codereview.chromium.org/6598014/diff/4003/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:150: v8::HandleScope handle_scope; On 2011/02/28 13:03:32, Mads Ager wrote: > Is there a reason not have have this handle scope earlier in this method? You > are also allocating strings above for instance. The handles for those might as > well be local as well. Done. http://codereview.chromium.org/6598014/diff/4003/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:167: // Make template weak so we can delete iterator once GC kicks in. Yip, it felt weird when I did it. Never occurred to me I should make actual Object persistent. Done. On 2011/02/28 13:03:32, Mads Ager wrote: > This is not the thing that you want to make weak. The break_iterator_template > does not correspond to a concrete break iterator? http://codereview.chromium.org/6598014/diff/4003/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:174: wrapper->SetPointerInInternalField(0, break_iterator); On 2011/02/28 13:03:32, Mads Ager wrote: > This is where you set up the correspondence. Here you need to decide on the > memory management story that you want. I think your BreakIterator instance needs > to have a reference (persistent handle) to the wrapper so it can return it again > for the next access to it. Also, that will allow you to tie the life-time of the > wrapper to the lifetime of the breakiterator C++ instance. > > Is the breakiterator C++ instance used by anything else than JavaScript? If not, > then each break iterator C++ instance should have a persistent handle to the JS > wrapper. This persistent handle should be weak and in the callback you should > call dispose on the persistent handle *and* delete the break iterator object. > > > Done.
lgtm http://codereview.chromium.org/6598014/diff/6007/src/extensions/experimental/... File src/extensions/experimental/break-iterator.cc (right): http://codereview.chromium.org/6598014/diff/6007/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:33: #include "unicode/uloc.h" nit: You don't need this here, do you?
http://codereview.chromium.org/6598014/diff/6007/src/extensions/experimental/... File src/extensions/experimental/break-iterator.cc (right): http://codereview.chromium.org/6598014/diff/6007/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:33: #include "unicode/uloc.h" On 2011/02/28 23:02:24, Jungshik Shin wrote: > nit: You don't need this here, do you? Done.
Getting really close. One type checking thing and then we are ready to land. :) http://codereview.chromium.org/6598014/diff/6008/src/extensions/experimental/... File src/extensions/experimental/break-iterator.cc (right): http://codereview.chromium.org/6598014/diff/6008/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:65: v8::String::New("Internal - Unexpected object type."))); Might be nice to make the error message more informative? How about "BreakIterator method called on an object that is not a BreakIterator". http://codereview.chromium.org/6598014/diff/6008/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:147: // No need to check args here, we do it in JavaScript. I actually think that you do want to check that the arguments are strings here and throw an exception otherwise. In the JS code, you only check that the arguments are not undefined. You can get arbitrary objects in here and converting them to string does not necessarily succeed. Consider the following JavaScript code: var o = { valueOf: function() { throw 42; } }; var s = "" + o; This will perform a ToString operation on 'o' which will result in an exception being thrown. You can get in this situation for the two ToString calls here and you do not deal with the exception case (where you will get a null handle back from V8). I think it is fine to not deal with it by checking that the two arguments are strings before the conversion and throwing an exception otherwise. http://codereview.chromium.org/6598014/diff/6008/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:182: raw_template->SetClassName(v8::String::New("v8Locale.v8BreakIterator")); Maybe just v8BreakIterator? http://codereview.chromium.org/6598014/diff/6008/src/extensions/experimental/... File src/extensions/experimental/i18n-extension.cc (right): http://codereview.chromium.org/6598014/diff/6008/src/extensions/experimental/... src/extensions/experimental/i18n-extension.cc:92: "v8Locale.v8BreakIterator = function(locale, type) {" Do you need to have the v8 prefix on everything in here? I would guess that having a V8 prefix on the v8Locale object is enough. You cannot get to any of this without going through that which makes it clear that this is v8 specific. I would remove the v8 prefix from v8BreakIterator and from v8CreateBreakIterator (which would be consistent with using names like displayName instead of v8DisplayName for other methods).
All done. Thanks for the review! I'll have to create a new client to land this once I get lgtm from you. http://codereview.chromium.org/6598014/diff/6008/src/extensions/experimental/... File src/extensions/experimental/break-iterator.cc (right): http://codereview.chromium.org/6598014/diff/6008/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:65: v8::String::New("Internal - Unexpected object type."))); On 2011/03/01 09:10:14, Mads Ager wrote: > Might be nice to make the error message more informative? How about > "BreakIterator method called on an object that is not a BreakIterator". Done. http://codereview.chromium.org/6598014/diff/6008/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:147: // No need to check args here, we do it in JavaScript. On 2011/03/01 09:10:14, Mads Ager wrote: > I actually think that you do want to check that the arguments are strings here > and throw an exception otherwise. In the JS code, you only check that the > arguments are not undefined. You can get arbitrary objects in here and > converting them to string does not necessarily succeed. > > Consider the following JavaScript code: > > var o = { valueOf: function() { throw 42; } }; > var s = "" + o; > > This will perform a ToString operation on 'o' which will result in an exception > being thrown. You can get in this situation for the two ToString calls here and > you do not deal with the exception case (where you will get a null handle back > from V8). > > I think it is fine to not deal with it by checking that the two arguments are > strings before the conversion and throwing an exception otherwise. Done. http://codereview.chromium.org/6598014/diff/6008/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:182: raw_template->SetClassName(v8::String::New("v8Locale.v8BreakIterator")); I did some testing yesterday before I decided on this name. If you do this: Foo = function() {}; Foo.Bar = function() {}; var bar = new Foo.Bar(); Inspector in Chrome will show Foo.Bar for the type, not just Bar. If you still think I should rename, I'll do it. On 2011/03/01 09:10:14, Mads Ager wrote: > Maybe just v8BreakIterator? http://codereview.chromium.org/6598014/diff/6008/src/extensions/experimental/... File src/extensions/experimental/i18n-extension.cc (right): http://codereview.chromium.org/6598014/diff/6008/src/extensions/experimental/... src/extensions/experimental/i18n-extension.cc:92: "v8Locale.v8BreakIterator = function(locale, type) {" I'll rename v8Locale into LocaleInfo soon so we'll lose the vendor branding there. BreakIterator is different from other methods under Locale object because it's not part of the current EcmaScript proposal and I want to keep all vendor specific methods/object under v8 prefix for now. So for now we have ugly(?) v8Locale.v8BreakIterator, but as soon as I rename v8Locale we will have only LocaleInfo.v8BreakIterator. On 2011/03/01 09:10:14, Mads Ager wrote: > Do you need to have the v8 prefix on everything in here? I would guess that > having a V8 prefix on the v8Locale object is enough. You cannot get to any of > this without going through that which makes it clear that this is v8 specific. > > I would remove the v8 prefix from v8BreakIterator and from v8CreateBreakIterator > (which would be consistent with using names like displayName instead of > v8DisplayName for other methods).
LGTM with a couple of comment nits. http://codereview.chromium.org/6598014/diff/6013/src/extensions/experimental/... File src/extensions/experimental/break-iterator.cc (right): http://codereview.chromium.org/6598014/diff/6013/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:54: // First delete the hidden C++ object. Deleting NULL is ok. Unpacking should never give NULL here. That will only happen if you use this as the weak callback for persistent handles not pointing to a break iterator. Maybe update the comment with that info instead? http://codereview.chromium.org/6598014/diff/6013/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:61: // Returns a ThrowException value in case we encounter invalid ThrowException returns undefined and schedules an exception to be thrown (which means that the undefined is ignored and the stack unwinded). Change the comment: "Returns a ThrowEception value" -> "Throws an exception"?
Thanks. I kept the comments long for educational purposes :) http://codereview.chromium.org/6598014/diff/6013/src/extensions/experimental/... File src/extensions/experimental/break-iterator.cc (right): http://codereview.chromium.org/6598014/diff/6013/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:54: // First delete the hidden C++ object. Deleting NULL is ok. On 2011/03/01 18:58:32, Mads Ager wrote: > Unpacking should never give NULL here. That will only happen if you use this as > the weak callback for persistent handles not pointing to a break iterator. Maybe > update the comment with that info instead? Done. http://codereview.chromium.org/6598014/diff/6013/src/extensions/experimental/... src/extensions/experimental/break-iterator.cc:61: // Returns a ThrowException value in case we encounter invalid On 2011/03/01 18:58:32, Mads Ager wrote: > ThrowException returns undefined and schedules an exception to be thrown (which > means that the undefined is ignored and the stack unwinded). Change the comment: > "Returns a ThrowEception value" -> "Throws an exception"? Done. |