ICU-22986 GL takes CM
eggrobin committed Dec 10, 2024
1 parent 515d0a7 commit 23d9a3e
Showing 4 changed files with 10 additions and 2 deletions.
2 changes: 1 addition & 1 deletion icu4c/source/data/brkitr/rules/line.txt
@@ -297,7 +297,7 @@ $LB20NonBreaks = [$LB18NonBreaks - $CB];
 # and then to default UAX #14 behaviour (UTC-179-C32).
 #
 ^($HY | $HH) $CM* $ALPlus;
-$GL ($HY | $HH) $CM* $ALPlus;
+$GL $CM* ($HY | $HH) $CM* $ALPlus;
 # Non-breaking CB from LB8a:
 $CB $CM* $ZWJ ($HY | $HH) $CM* $ALPlus;
 # Non-breaking SP from LB14:
2 changes: 1 addition & 1 deletion icu4c/source/test/testdata/break_rules/line.txt
@@ -176,7 +176,7 @@ LB11.2: SP WJ;
 LB11.3: WJ CM* [^CM];

 # Needs to apply before LB12, because the new monkeys are not greedy.
-LB20a.2: GL (HY | HH) CM* AL;
+LB20a.2: GL CM* (HY | HH) CM* AL;
 LB12: GL CM* [^CM];

 LB12a: [^SP BA HY] CM* GL;
4 changes: 4 additions & 0 deletions icu4c/source/test/testdata/rbbitst.txt
@@ -2214,3 +2214,7 @@ Bangkok)•</data>
 <data>•« Complex »« chaining » •</data>
 <data>•« .618 »•</data> # Interaction with the ICU tailoring to break before such numbers.

+# A hyphen following non-breaking space that carries an intervening combining
+# mark is treated as word-initial; by LB20a it has no break opportunity after
+# it. A bug in ICU 76 incorrectly handled that case (ICU-22986).
+<data>• ̄-k•</data>
@@ -2214,3 +2214,7 @@ Bangkok)•</data>
 <data>•« Complex »« chaining » •</data>
 <data>•« .618 »•</data> # Interaction with the ICU tailoring to break before such numbers.

+# A hyphen following non-breaking space that carries an intervening combining
+# mark is treated as word-initial; by LB20a it has no break opportunity after
+# it. A bug in ICU 76 incorrectly handled that case (ICU-22986).
+<data>• ̄-k•</data>
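
The effect of the rule change can be illustrated with a small sketch outside ICU. It is not ICU code: the class table `lb_class` is a hypothetical, heavily simplified stand-in for the real UAX #14 property data, the two regular expressions only mimic the shape of the old and new `LB20a.2` rules over space-separated class names, and the sample string assumes the new test case is NBSP, combining macron, hyphen-minus, `k`, as its comment describes.

```python
import re


def lb_class(ch):
    """Hypothetical, simplified line-break classes (not the real UAX #14 data)."""
    if ch == "\u00A0":   # NO-BREAK SPACE
        return "GL"
    if ch == "\u0304":   # COMBINING MACRON
        return "CM"
    if ch == "-":        # HYPHEN-MINUS
        return "HY"
    if ch == "\u2010":   # HYPHEN, assumed here to be what the rules call HH
        return "HH"
    if ch.isalpha():
        return "AL"
    return "XX"          # anything this sketch does not model


def classes(text):
    """Render a string as a space-separated sequence of class names."""
    return " ".join(lb_class(c) for c in text)


# Shape of LB20a.2 before the commit: no combining marks allowed after GL.
OLD_RULE = re.compile(r"GL (HY|HH) (CM )*AL")
# Shape of LB20a.2 after the commit: GL may carry combining marks.
NEW_RULE = re.compile(r"GL (CM )*(HY|HH) (CM )*AL")

sample = "\u00A0\u0304-k"   # assumed content of the new test case
seq = classes(sample)
old_matches = OLD_RULE.search(seq) is not None
new_matches = NEW_RULE.search(seq) is not None

plain_seq = classes("\u00A0-k")   # same text without the combining mark
plain_old = OLD_RULE.search(plain_seq) is not None
plain_new = NEW_RULE.search(plain_seq) is not None
```

In this model the old pattern fails on the sample because a `CM` sits between `GL` and `HY`, while the new pattern accepts it; both patterns accept the same text without the combining mark, which is consistent with the change only affecting a non-breaking space that carries combining marks.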
