-
Notifications
You must be signed in to change notification settings - Fork 335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable choice from multiple Unicode back ends #965
Conversation
The CI failures are due to the |
And the intent, of course, is to make |
Looks great so far! |
Per Zulip regarding Servo TSC discussion about the Unicode-3.0 part here: "everyone was fine, and it uses an open source license so no issues with that" |
Per today's Servo TSC meeting, I'll publish idna_adapter and idna_mapping to crates.io with the repos remaining under my GitHub account at least for now. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## idna-v1x #965 +/- ##
============================================
+ Coverage 79.60% 80.48% +0.88%
============================================
Files 23 24 +1
Lines 4221 4253 +32
============================================
+ Hits 3360 3423 +63
+ Misses 861 830 -31 ☔ View full report in Codecov by Sentry. |
Looks like the coverage check is bad at dealing with expressions formatted over multiple lines. |
Merged per out-of-band check to interpret this comment as review. |
(This PR is intentionally to the
idna-v1x
branch as an intermediate step and not directly tomain
.)This PR is a draft due to the dependency declaration using
path
.Build time charts of
reqwest
without application codeLogically, the review here involves reviewing also:
main
branch ofidna_mapping
.unicode-joining-type
dependency is behind this crate to hide its upcoming semver break fromidna_adapter
and to keep open the option to provide the data directly inidna_mapping
if needed.idna_adapter
branches:main
branch (ICU4X 1.x fromicu_normalizer
1.4.3 andicu_properties
1.4.2 onwards but not 2.x. Currently in practice 1.5.)unicode-rs
branch (idna_mapping
,unicode-normalization
,unicode-bidi
)no-unicode
branch (No Unicode back end; rejects Unicode-form input and does not check Punycode input for rules that require Unicode data)url
andidna
.idna_adapter
also has a branch calledicu4x-trunk
. It has apath
dependency to an adjacenticu4x
checkout. Once ICU4X 2.0 has been released, the expectation is to merge this code to themain
branch ofidna_adapter
as version 1.2.1.I have omitted previously-prototyped ICU4X-1.x-relevant performance hacks from
idna
when the performance issues will be remedied by ICU4X 2.0.On AMD Ryzen Threadripper PRO 5975WX, the fastest path that does not call into
idna_adapter
at all apart from its constructor is a nanosecond slower with ICU4X 1.x or 2.x compared withunicode-rs
orno-unicode
. In the ICU4X 2.0 case, it would be possible to put the adapter in astatic
, which logically is supposed to avoid running the constructor in the fastest-path case, but that makes it another nanosecond slower, which generally makes no sense. I didn't investigate this further, but from this, it's reasonable to guess that there are likely some issues with the code layout interacting with the instruction cache or somesuch in an unlucky way, and it's not worthwhile to try to optimize for such effects. (When there's any Unicode involved at all, ICU4X performs better thanunicode-rs
.)Technically,
idna_adapter
andidna_mapping
could be published to crates.io with the repos being under my GitHub account. My understanding of Servo policies is that moving the repos to under github.com/servo/ requires an OK from the TSC as a matter of formality, since the licenses are not MPL2. I have posted a request to get such an OK when the TSC meets on September 23rd.