Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Baked data for icu_plurals, icu_timezone #3462

Merged
merged 19 commits into from
Jun 7, 2023
Merged
2 changes: 2 additions & 0 deletions components/plurals/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ categories = ["internationalization"]
# Keep this in sync with other crates unless there are exceptions
include = [
"src/**/*",
"data/macros.rs",
"examples/**/*",
"benches/**/*",
"tests/**/*",
Expand Down Expand Up @@ -53,6 +54,7 @@ serde = ["dep:serde", "zerovec/serde", "icu_locid/serde", "icu_provider/serde"]
datagen = ["serde", "zerovec/databake", "dep:databake"]
experimental = []
bench = ["serde"]
data = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (nb, followup): data feels very generic to me and calling it globaldata might help

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really "global" though, it's just for this crate. If data is too general, I'd prefer "builtin-data".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works for me

I think we've been calling it globaldata with global being real-world global

Copy link
Member Author

@robertbastian robertbastian Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it should be moderndata :)


[lib]
path = "src/lib.rs"
Expand Down
5 changes: 2 additions & 3 deletions components/plurals/README.md

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 12 additions & 0 deletions components/plurals/build.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
// This file is part of ICU4X. For terms of use, please see the file
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

fn main() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: really not a fan of introducing more build scripts; they're a pain for non-Cargo users, and people in general do not like them.

Would prefer the crate based solution

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the issue we decided

globaldata can be replaced with a custom baked rust source using an env var.

There's no way to do defaulting env vars without build scripts.

Copy link
Member Author

@robertbastian robertbastian Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does a separate crate solve this? It would only move the problem, you'd still have a build script there. It would also introduce another problem around semver, we'd need to use a = version range, but that's also frowned upon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I was under the impression we were still considering doing crates. I think crate replacement works out far easier.

I would ideally like a non-build-script ICU4X to still work with the data feature. To do this you'd have to use a cfg that's detected from the env var. Basically:

  • ICU4X_PLURALS_REPLACE_DATA = ... is the input
  • Build script detects it and sets --cfg data-replacement (see CPTBuilder's icu4c_enable_renaming for how this works)
  • Code uses cfg(data-replacement) to replace the relative path include with include!(env!("ICU4X_PLURALS_REPLACE_DATA"))

This means code using build scripts needs to set an env var and a CFG, but that's not an undue burden IMO.

(If we're not doing crates, we don't really need macros anyway, right?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try the cfg approach.

(If we're not doing crates, we don't really need macros anyway, right?)

We do for cross-crate dependencies. icu_datetime will load macros from icu_calendar, icu_decimal, and icu_plurals.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could actually do this without a build script if we had a builtin-data and a custom-data feature instead of feature + automatic cfg.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah dueling features are annoying

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two separate discussions in this thread.

Custom globaldata path: In #2992 I proposed the dual features approach, but I like the cfg better. A disadvantage of the dual features is that you have the --all-features problem.

Macros in crates: Unless my assumptions are wrong, I see it as a non-starter to call macros in crates they don't belong to. It makes the crates heavier and much more difficult for DCE. There are a couple solution we discussed previously:

  1. The crates export a DataProvider implementing the bounds they need, and client crates fork between them manually (fork-by-key DataProvider works if you manually list out every key you need to fork on).
  2. The globaldata constructors call other globaldata constructors directly. This creates two code paths, one for globaldata and one for custom data, which isn't great, but most constructors are thin so it's probably fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now thinking of it, it's really not correct that _with_any_provider and _with_buffer_provider call _unstable unconditionally; we see this problem in icu_locid_transform. It is more correct to call the current constructor type all the way down the stack, only converting to a DataProvider at the last minute when calling actual load functions. So, all four constructors should follow the same pattern of calling the same constructor of child classes and combining the results with load calls on the current crate's data structs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to agree on the Custom globaldata path issue, and Macros in crates is not a discussion we need to have on this PR. I'm not calling macros from other crates.

if std::env::var("CARGO_FEATURE_DATA").is_ok() {
if std::env::var("ICU4X_DATA_DIR").is_ok() {
println!("cargo:rustc-cfg=icu4x_custom_data");
}
println!("cargo:rerun-if-env-changed=ICU4X_DATA_DIR");
}
}
23 changes: 23 additions & 0 deletions components/plurals/data.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
{
robertbastian marked this conversation as resolved.
Show resolved Hide resolved
"keys": {
"Explicit": [
"plurals/ordinal@1",
"plurals/cardinal@1"
]
},
"locales": {
"CldrSet": [
"modern"
]
},
"cldr": "Latest",
"icu_export": "Latest",
"segmenter_lstm": "Latest",
"export": {
"Baked": {
"output_path": "data",
"use_separate_crates": true
}
},
"overwrite": true
}
2 changes: 2 additions & 0 deletions components/plurals/data/any.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
// @generated
impl_any_provider ! (BakedDataProvider) ;
2 changes: 2 additions & 0 deletions components/plurals/data/macros.rs

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions components/plurals/data/mod.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
// @generated
include ! ("macros.rs") ; # [clippy :: msrv = "1.61"] pub struct BakedDataProvider ; impl_data_provider ! (BakedDataProvider) ;
Loading