-
Notifications
You must be signed in to change notification settings - Fork 7.3k
build: add i18n build option #6371
Comments
I don't think you have to set os_posix, deps/v8/build/standalone.gypi is supposed to figure it out by itself, depending on what the OS variable is set to. |
@bnoordhuis |
Hrm, weird. Now that I think about it, os_posix is only used in standalone.gypi but node doesn't use that file. At least, it shouldn't - we don't build v8 binaries. |
Ping @trevnorris, any update on this? |
@davglass Master is now using v8 3.22, but we've disabled i18n by default. @isaacs @bnoordhuis Do we have any plans to include basic icu support? if nothing else as a build flag? |
I see no reason we couldn't make it a configure flag |
That's all we are really looking for is a "supported" configure flag without all the little hacks :) |
We could make it a configure flag. Caveat emptor, V8 currently only builds against the libicu that's bundled with chromium. |
Adds a --with-icu-path= switch to the configure script. Requires that the user checks out the copy of libicu that's bundled with chromium to a fixed directory. It's still a little rough around the edges but it works. Fixes nodejs#6371.
Seems like i18n isn't really a necessity given the NumberFormat and Number.toLocalString |
@iandrewfuchs toLocaleString was updated along with the creation of the other Intl.* APIs as part of Ecma 402 Internationalization which is part of ECMAScript 5.1. You can see these comparability details on the MDN page you linked to. The idea is to bring Node's Internationalization support on par with Firefox, Chrome, and IE 11. |
this is marked as closed, this means we can now install a node v8 with the i18n support? |
@lwelti you'll have to build Node from the master branch using the options specified in the README. |
You can't be serious in not including Unicode support? What kind of an USA-centric view is this you're propagating with your software? I'm trying to build a Node.js web application with some nicely sorted sorted lists. var list = [ 'Z', 'A', 'Á', 'Ä' ];
list.sort(function(a, b) {
return a.localeCompare(b);
});
list; Node.js (v0.11.12) output:
Chrome (34.0.1847.116) output:
It took JavaScript long enough to evolve to a state where it has something like |
@JannesMeyer You're blatant attitude is unwelcome. Next time just ask for a reason, rather than start off with accusations. Node bundles all of its dependencies. This adds extra code weight, but it makes development and debugging much simpler. Especially in the few cases where we have to float a patch (f9ced08 for example). Right now the only icu package v8 supports is https://src.chromium.org/chrome/trunk/deps/third_party/icu46. Which adds an additional 350MB of dependency files. Judging by the lack of support tickets or mailing list questions on the matter generally demonstrates this is a feature a small minority use, and this minority has thus far found it acceptable to build their own in order to enable this feature. IIRC we discussed that once an icu package can be included that isn't so dammed large it would be acceptable to include it as a dependency. Note: accidentally posted the wrong has in the floating patch case. this has been changed. |
@trevnorris I am sorry for jumping to conclusions a bit too quickly. I guess that was unfair. But on the other hand I think it's also unfair to assume that nobody wants these features just because nobody is finding the right place to ask questions about it (an English-speaking mailing list, English-speaking github issues, et cetera...) I think these problems disproportionally affect people whose primary language might be something else than English. Furthermore, it took me about an hour to track this broken behavior down to a Node.js issue. At first I thought that there was a bug in my code, because in some cases the sorting of some Unicode characters actually seemed to work correctly. Other developers might not be that patient and just move on and leave their users with a less than optimal sorting behaviour or they might implement nasty workarounds. I think that Unicode support indeed is a neccessity to provide a good user experience. Some people who are asking for it:
An effort to create a polyfill for Node.js: Intl.js: Node.js support and npm package
And this is just for sorting. I'm not even talking about all the other Unicode issues which affect JavaScript in general: JavaScript has a Unicode problem I understand your reasons for not wanting to pull in 350MB of source files directly into the Node repo. But why not make it a git submodule or an external dependency? |
For the sake of transparency, let's NOT use the |
@trevnorris Hi, ICU developer here. Do you have a target size? I'm working on slicing things up, some idea of what's acceptable would be helpful. (By the way - |
@trevnorris so you can use the customizer tool at http://apps.icu-project.org/datacustom/ICUData46.html and deselect data items, and then download an ICU4C data file. Unzip the resulting file, drop it in as |
@trevnorris okay. four megabytes, that's the increase with just English, just by using the data customizer and building ICU tools on the side to rebuild the data file. (i.e. no specific code changes). So the score:
You can get the .S file at https://ssl.icu-project.org/~srl/tmp/icudt46l_dat.S_ENGLISH.tar.xz and then it replaces I'm sure there is code and data that node doesn't need being included there, but I'd like to get the conversation going here. |
@JannesMeyer Fair enough about non-English speakers not being able to post their concerns. As far as allowing it to be an external dependency, instead of being included like all other dependencies, would have to be discussed with the remainder core team. @caridy I'll concede that by removing @srl295 As far as "target size" are you referring to additional size of the binary, or the additional amount of data that would be included in the repository? AFAIK, no one on the core team opposes having ICU support. The issue is the fact we include all external dependencies, and including ICU for all supported platforms would include a massive amount of code. @tjfontaine @indutny I'd like you're guys' feedback on this matter. |
@trevnorris as noted above - ICU4C 4.6 is a Is it acceptable to have the base have a |
@srl295 As a data point, a release tarball is currently about 15 MB. Once extracted, it's around 85 MB. My initial objection against bundling libicu was that it would double both numbers. If you can keep the increase down to, let's say 20 MB / 100 MB (not hard numbers, just guidelines), that would be perfectly acceptable to me. It'd be great to have i18n enabled by default. @trevnorris Sounds reasonable? |
I agree with @bnoordhuis. But I think it should be possible to build node with any ICU configuration using |
@bnoordhuis @indutny The max size of the binary increase is 12MB not 400MB. So ICU (thanks for using that name instead of "libicu") and node both have pretty comparable binary (12M) and source (15MB expanding to 85MB) sizes - please think about them as comparable. There's no 400Lb or 400MB gorilla. Good question. The problem I see is balancing custom modifications to ICU vs. being able to just drop in a new version. If doubling the source size is acceptable, we can keep the complexity lower and thus ease of updating. What we can't do is have english-only data in the source and then enable additional data without hitting the network. |
Could the ICU data be loaded through a npm module?
To update all ICU data:
To build without ICU data, think of ./configure --without-i18n && make && make install
npm install -g icu-data-fr We could assume all I don't know the feasibility/difficulty of doing such a thing but well, it's my 2 cents on the problem, from a not-only-English-speaking-app developer ;-) |
@oncletom ICU's source data has 3,500 elements to it in 500+ locales (here I go w/ big numbers) - icu-data- may be too fine grained. Perhaps one module that just adds "everything" for <10MB? ICU does have api such as [ |
Update: I'm able to build node from an arbitrary (non-chrome) ICU. |
@srl295 that's good to hear. Now, how often will we have to update the ICU library? Just curious. Also, it seems there is the issue of language support. How many languages do we support? etc. As @oncletom noted, it could be advantageous to allow users to install additional language support via npm. It's late and I'm tired, so here go the stupid questions. Would it be possible to only include the code that allows ICU to work, but not include any language support (since it seems that is where the bulk of the size comes from)? And would it then be possible to install whatever language support someone desired via npm? Issue is that the person would still have to do some type of Just crapping out ideas. While it's convenient all dependencies are included, I'm annoyed that the |
@trevnorris if I've even partially done this right, the branch I'm working on is at https://github.com/srl295/node/compare/smaller-icu - can build from any ICU, but calls out to shell/makefile (just to get things moving). This is WIP/experimental/proof of concept, not a looming pull request. Numbers: node tip is now 14MB for me as I write. I was able to get a node build with the latest ICU (from ICU trunk), + just english, at 18MB (so +4 still). With "everything" is 38MB, but, I have not tried "all locales, but only what Node actually will use". I would guess it would be on the order of +10MB from non-i18n. how many languages? ICU out of the box supports ~200 languages, ~500 locales total. [CLDR][http://cldr.unicode.org] has some others that ICU doesn't take by default. how often? ICU (and CLDR) update roughly yearly. Minor updates at other times of course. I'd say node could determine what schedule works, perhaps watching CLDR especially for any language change of interest. *.. just the code that allows ICU to work.. * my "18MB count" above ( 4 MB over stock node ) is what I can build at 11PM that's node + all code needed + english/root data. ...install language support .. I don't know enough about the loading process in node or v8. As is, ICU has to be called first thing with a "here's some new data" function, not mid-operation. But, that's as-is, it's all just code. I would question how fine to slice the installed packages. Adding French, say, (in all territories) adds 67k. Of that, French in Canada is 4.5k. Hundreds of tiny packages seem like they could create more overhead than flexibility. Maybe I can return with stupid questions of my own: is there any concept of persistent state, a "node.js home/configuration directory", etc where data could be loaded from? I think the answers are no, if not a resounding no. Anyways, thanks for the thoughts here. Tomorrow I'll figure out what the upper limit is using "only stuff ecma402 cares about". Then we can really compare: 14MB with nothing, 18MB with English, vs ?? with "all locales". |
Indeed. It's either baked into the binary or loaded through the module system - but that's long after the VM is up and running. Idle speculation: with icu_use_data_file_flag=1, V8::InitializeICU() calls udata_setCommonData(). I know you can call that function multiple times but does that also work when ICU is already being used? |
Right, the timing could be a problem here.
No, unless ICU is |
I just want to make sure I understand what's going on. How much source code would be added for just English support? (if it came bundled like other deps) Are we willing to include an external dependency? (@tjfontaine) How much would it add to the binary size? Is it possible to include other languages after the program has started up? |
This will be the path forward:
This is basically the same work that needs to be done for certificate stores for crypto. |
@tjfontaine recappipng here from our IRC conversation. Please correct me if I misunderstood anything.
So, ICU supports all of this today, just need to make sure the right calls get made.
|
I feel strongly that This is clearly a complex language feature to implement, but it should be noted that Firefox and Chrome were able to ship Intl with data for the common locales in ~4MB of overhead. It would be great to also get clarity from @tjfontaine on whether or not it's a core project principle/requirement for Node.js to be an ECMAScript runtime and plan to follow/implement updates to the spec. (I get that most of the language features will come in via v8, but things like |
The statement "otherwise Node.js will not be a ECMAScript 5.1 runtime" is not correct. ECMAScript 5.1 is defined by standard ECMA-262 edition 5.1, while the ECMAScript Internationalization API is defined by a separate complementary standard, ECMA-402. A runtime can implement ECMA-262 edition 5.1 and thus ECMAScript 5.1 without implementing ECMA-402. That said, I'm with Eric on the practical aspect: It's not clear why adding the internationalization API and ICU (with a combined download size of a few MB) is such a big issue for Node. Download size is a real issue for browsers because they're downloaded by end-users, many of whom (especially in developing countries) don't have broadband access yet. It shouldn't be an issue for server software, which is usually only downloaded by people (developers, ops people) who have broadband access. It seems at least Ebay, Yahoo, and IBM need the internationalization API in Node, and I can't imagine they're the only ones. |
@NorbertLindenberg thanks for clarifying the relationship between the ECMA-262 and ECMA-402 specs, and how they relate to ECMAScript 5.1. |
I don't find it controversial at all -- I will happily review PRs that work in the way I described |
Hi, just a late remark on all this: I strongly support that i18n support (including Collations and proper sorting) should be a standard part of node or at least should be easyly configurable, without doing a custom build. It is obvious that this is a central feature for lots and lots of non english languages (including my own norwegian language). Chrome, Firefox and IE are already supporting this on the client side. |
Addresses nodejs#5566. The `ee.once()` function is currently documented as invoking the listener, and then removing it when the event is triggered. However, this is not really the case. The listener is removed and _then_ invoked. This only matters in a narrow set of use cases, but when it matters, it matters that the docs are correct. See the issue (nodejs#5566) for a discussion on why the code has not been modified to match the documentation, but instead the documentation has been modified to match the code. Fixes: nodejs#5566 Ref: nodejs#6371 PR-URL: nodejs/node#7103 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Robert Lindstaedt <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Сковорода Никита Андреевич <[email protected]>
Addresses nodejs#5566. The `ee.once()` function is currently documented as invoking the listener, and then removing it when the event is triggered. However, this is not really the case. The listener is removed and _then_ invoked. This only matters in a narrow set of use cases, but when it matters, it matters that the docs are correct. See the issue (nodejs#5566) for a discussion on why the code has not been modified to match the documentation, but instead the documentation has been modified to match the code. Fixes: nodejs#5566 Ref: nodejs#6371 PR-URL: nodejs/node#7103 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Robert Lindstaedt <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Сковорода Никита Андреевич <[email protected]>
This is something we'll have to address when we upgrade to v8
3.22
anyways, since it's enabled by default.IMO it should be disabled by default, but add a
configure
flag like--with-{intl,icu,i18n}
(choose one) that'll automatically take care of adding it to the build. Currently in v8 if you do amake dependencies
it checks out a substantial amount of code for icu. I'm told that there's a much slimmed down version that can be used. Either way I don't think it should be bundled with Node by default. Instead it'll just grab the dependency before the build process.I was able to build current
master
with i18n support by doing the following steps:'os_posix': 1
to thevariables
section ofcommon.gypi
.'v8_enable_i18n_support': 1
to thevariables
section ofcommon.gypi
.This gave me a build of Node that had the
Intl
global object.possible dependency: #6370
extension of: #4689
/cc @tjfontaine @TooTallNate @davglass
The text was updated successfully, but these errors were encountered: