Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make characters with Line_Break=Ambiguous ambiguous #61

Merged

Conversation

Jules-Bertholet
Copy link
Contributor

UAX 14:

As originally defined, the line break class AI contained all characters with East_Asian_Width value A (ambiguous width) that would otherwise be AL in this classification. For more information on East_Asian_Width and how to resolve it, see Unicode Standard Annex #11, East Asian Width [UAX11].

The original definition included many Latin, Greek, and Cyrillic characters. These characters are now classified by default as AL because use of the AL line breaking class better corresponds to modern practice. Where strict compatibility with older legacy implementations is desired, some of these characters need to be treated as ID in certain contexts. This can be done by always tailoring them to ID or by continuing to classify them as AI and resolving them to ID where required.

As part of the same revision, the set of ambiguous characters has been extended to completely encompass the enclosed alphanumeric characters used for numbering of bullets.

As updated, the AI line breaking class includes all characters with East Asian Width A that are outside the range U+0000..U+1FFF, plus the following characters:

24EA CIRCLED DIGIT ZERO
2780..2793 DINGBAT CIRCLED SANS-SERIF DIGIT ONE..DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN

Copy link

@defihook defihook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved

Copy link

@defihook defihook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved

@Manishearth Manishearth merged commit e77b292 into unicode-rs:master Nov 1, 2024
2 checks passed
@Jules-Bertholet Jules-Bertholet deleted the ambiguous-line-break branch November 1, 2024 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants