Consider supporting retrieval of the language preference list from the system #3990
Comments
Related: #3059 |
Is this in scope? Seeking feedback from: |
I think this is in scope as a utils crate we publish separately, perhaps with icu_locid integration. I don't consider this a priority. |
I would like to consider this out of scope of ICU. I would name such crate |
Based on feedback from the i18n unconference at RustConf, devs want libraries that "just work" and integrating nicely with the operating system is part of that. We are in a decent position to write this type of code. I don't know where exactly this code lands, but if it lands in the icu4x repository, it should probably be under |
Ok, I'm comfortable with this as long as it's explicit. |
This is likely a "good first issue" because the API surface is small, mostly a function that returns the system locale as a |
Comment from @VorpalBlade in #4580:
|
Here is my finding so far:
Thoughts regarding this @ everyone in this thread? |
@ashu26jha I'm only going to comment on Linux / non-Mac *nix, since that is the only thing I'm even remotely qualified to talk about: It shouldn't be too hard to implement the resolution logic for *nix in pure rust after reading the environment variables (that may or may not be set). It is after all standardised by POSIX. There is a really long section of the POSIX standard on locales: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html A lot of that is not needed here as it is about a definition language for locales. In this case see https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08 (section 8.2) for locale resolution order. On modern systems I believe the fallback in practice when nothing is set is C.UTF-8, it used to be just the C locale (ASCII or indeterminate encoding I believe).
At least glibc also has some additional categories (LC_PAPER for example). But I'm on my phone so I can't easily check which other ones. You could also consider trying to interpret locale definitions from POSIX (e.g. what the names of the months are etc.), but I'm under the impression that ICU4X would probably prefer to use their own mapping from |
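For illustration, the resolution order from section 8.2 of the POSIX text linked above could be sketched roughly as follows. This is a minimal sketch under assumptions, not a proposed API: the function names and the injectable `lookup` closure are hypothetical, only the messages category is consulted, and the final fallback is assumed to be "C" (POSIX leaves the default implementation-defined).

```rust
use std::env;

/// Resolve the locale name for the messages category using the POSIX
/// precedence: LC_ALL first, then LC_MESSAGES, then LANG.
/// A variable that is set to the empty string is treated as unset.
///
/// `lookup` is a hypothetical injection point so the logic can be exercised
/// without touching the real process environment.
fn resolve_posix_locale<F>(lookup: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    for name in ["LC_ALL", "LC_MESSAGES", "LANG"] {
        if let Some(value) = lookup(name) {
            if !value.is_empty() {
                return value;
            }
        }
    }
    // Assumption: use "C" when nothing is set. POSIX only says the default
    // is implementation-defined.
    "C".to_string()
}

/// Read the variables through std::env, so no C function that touches the
/// environment is called.
fn system_posix_locale() -> String {
    resolve_posix_locale(|name| env::var(name).ok())
}
```

Calling `system_posix_locale()` returns the raw POSIX name (for example a value of the form `language_TERRITORY.codeset`), which still has to be converted before it can be used as an ICU4X locale.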
Hello, I'm hoping to tackle this issue for GSoC, and in my research, I believe I've found reasonable retrieval methods for each OS:
I feel that writing the FFI bindings from scratch for Windows and Mac should be relatively trivial if we wish to reduce dependencies, though this will of course involve more unsafe code. PS: I feel this issue is quite detached from the rest of the crate. Could someone point me towards some relevant tasks I can attempt, to gain some familiarity with the crate? |
@JMoogs I don't believe that would work on Unix, due to rust-lang/rust#27970. Basically, it is unsound to call C functions that read the environment. If you can read the environment variables from Rust instead it should be fine (as std has a lock internally). |
@VorpalBlade I think the proper fix is on the Rust side, by having the setter functions |
That would be great, unfortunately not much has happened in recent years with that bug. :( |
I was under the impression that a locale could be set without having an associated environment variable. It seems this isn't the case, so a pure Rust implementation should work on Linux. |
In glibc, |
@VorpalBlade I went through your links. They are a good resource, but I happened to find one corner case:
I think we need to think about this case, we could have an enum which looks something like this:
I personally feel the crux of this feature is not getting the locales but making sure they map correctly (for locale names that don't match), which will require thorough testing. |
To keep up with the modularity of the proposed crate, I think the following should be the workflow:
The converter is needed because it provides a common ground and a standard way to add more operating systems to our coverage. It needs to take care of the cases where we don't have a direct mapping, e.g.: let locale: Locale = locale!("C"); The above code will fail, so we need to build a mapping to handle these corner cases. |
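As a rough illustration of what such a converter might look like for POSIX-style names, here is a minimal std-only sketch. Everything in it is an assumption made for the example: the `Converted` enum and `posix_to_bcp47` are hypothetical names, the output is a plain string rather than an ICU4X `Locale`, only the `language[_TERRITORY][.codeset][@modifier]` shape is handled, and the modifier is simply dropped rather than mapped to a script or variant.

```rust
/// Outcome of converting a POSIX locale name (hypothetical type).
#[derive(Debug, PartialEq)]
enum Converted {
    /// A BCP 47 style tag such as "en-US".
    Tag(String),
    /// "C" or "POSIX": the name carries no language information.
    NoLanguage,
    /// The name did not have the expected shape; the input is kept for logging.
    Unrecognized(String),
}

fn posix_to_bcp47(name: &str) -> Converted {
    // Drop "@modifier" first, then ".codeset".
    let base = name.split('@').next().unwrap_or("");
    let base = base.split('.').next().unwrap_or("");

    // The corner case from the thread: "C" (and its alias "POSIX") has no
    // direct mapping, so it is reported separately instead of failing.
    if base == "C" || base == "POSIX" {
        return Converted::NoLanguage;
    }

    let mut parts = base.splitn(2, '_');
    let language = parts.next().unwrap_or("");
    let territory = parts.next();

    // Accept only 2 or 3 lowercase ASCII letters as the language part.
    let language_ok = (2..=3).contains(&language.len())
        && language.chars().all(|c| c.is_ascii_lowercase());
    if !language_ok {
        return Converted::Unrecognized(name.to_string());
    }

    match territory {
        None => Converted::Tag(language.to_string()),
        // Accept only 2 uppercase ASCII letters as the territory part.
        Some(t) if t.len() == 2 && t.chars().all(|c| c.is_ascii_uppercase()) => {
            Converted::Tag(format!("{}-{}", language, t))
        }
        Some(_) => Converted::Unrecognized(name.to_string()),
    }
}
```

Returning a dedicated variant for "C" leaves the choice of fallback to the caller, and each additional operating system would only need to produce input for a converter of its own behind the same output type.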
As highlighted by @hsivonen for android:
We could introduce a C/C++ layer between Java and Rust. Handling JNI directly from Rust is not the best way forward. Most of the overhead would be handled by this layer: it would retrieve the results from the JNI call and convert them into a format suitable for C/C++ (and ultimately for Rust). |
Consider providing functionality (with std, not with no_std) for retrieving the user's system-level preference list of languages as ICU4X locales.
On Windows, Gecko prefers https://learn.microsoft.com/en-us/uwp/api/windows.system.userprofile.globalizationpreferences.languages?view=winrt-22621#windows-system-userprofile-globalizationpreferences-languages and adds region with likely subtags if the system gives a language only.
On Mac, Gecko uses https://developer.apple.com/documentation/corefoundation/1542887-cflocalecopypreferredlanguages
On Android, Gecko prefers https://developer.android.com/reference/android/os/LocaleList#getDefault() . Not sure if it's practical to call a Java method, even a static one, deep within Rust code when the Rust code isn't responsible for the whole app's JNI setup.
On Gtk, Gecko delegates to ICU4C, which AFAICT calls setlocale(LC_MESSAGES, NULL); and performs fixup. It appears (note the author of the answer) that it's OK to call glibc setlocale to read (not actually set) a value, and nothing else in the process actually sets the value, either, so that it's constant for the lifetime of the process. Obviously, this code path retrieves only one locale.
(Note: Gecko already has non-ICU4C C++ code for this (except on Gtk), so whereas #3059 is deliberately U-gecko-tagged, I'm filing this as a general U-ecma402 courtesy without the usual implication that everything U-ecma402 is implicitly U-gecko.)
(Note 2: ECMA-402 default locale isn't a preference list and is implied to have data available for it across all the ECMA-402 objects, so implementing ECMA-402 on top of what's suggested above would involve further filtering.)