Unicode 16.0 support #201

Closed
hpjansson opened this issue May 2, 2024 · 13 comments
Labels
feature New feature or request symbols Fonts and symbols

@hpjansson
Owner

hpjansson commented May 2, 2024

We need to add support for new Unicode 16.0 legacy symbols, chiefly:

Large type builtins will probably require manual definitions, but those for octants can be generated at runtime. We need new tags CHAFA_SYMBOL_TAG_OCTANT/octant and CHAFA_SYMBOL_TAG_LARGETYPE/largetype.
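As a rough illustration of generating octant builtins at runtime, the sketch below expands each of the 256 possible 2×4 patterns into an 8×8 coverage bitmap. The bitmap size, the bit ordering and the function name `octant_bitmap` are assumptions made for this example; they are not taken from Chafa's source, and the mapping from pattern to Unicode code point (which would need its own table) is left out.

```python
# Build 8x8 coverage bitmaps for all 256 octant patterns (2 columns x 4 rows).
# Assumption: bit i of the pattern index selects sub-cell i, counted
# left-to-right, top-to-bottom. This ordering is illustrative only and is
# not claimed to match Unicode code point order or Chafa's internals.

CELL_W, CELL_H = 8, 8      # assumed bitmap size per character cell
COLS, ROWS = 2, 4          # an octant cell is 2 wide by 4 tall


def octant_bitmap(pattern):
    """Return an 8x8 list of 0/1 rows for one of the 256 octant patterns."""
    sub_w = CELL_W // COLS   # 4 pixels per octant column
    sub_h = CELL_H // ROWS   # 2 pixels per octant row
    bitmap = []
    for y in range(CELL_H):
        row = []
        for x in range(CELL_W):
            bit = (y // sub_h) * COLS + (x // sub_w)
            row.append((pattern >> bit) & 1)
        bitmap.append(row)
    return bitmap


all_octants = [octant_bitmap(p) for p in range(256)]

# Pattern with the top-right and second-row-left octants set.
for row in octant_bitmap(0b00000110):
    print("".join("#" if v else "." for v in row))
```

Because the patterns are purely combinatorial, nothing has to be drawn by hand, unlike the large type pieces.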

We may want to extend the coverage of the legacy tag, but it may be wise to hold off on this until terminal/font support is more widespread.

It'd also be a good idea to test our total coverage with Cascadia Code and look for any obvious gaps.

@hpjansson hpjansson added feature New feature or request symbols Fonts and symbols labels May 2, 2024
@PhMajerus

PhMajerus commented May 2, 2024

I don't think the Large Type Pieces make sense for your project, unless I missed some text rendering feature besides the image conversions. They are really designed to build large text, and their weight and exact design may differ from one font to another, so I really wouldn't use them in a bitmap to ANSI/VT converter.
If you want more details than in my Cascadia feature request, the following document is more complete: curl https://raw.githubusercontent.com/PhMajerus/Documents/main/HowTos/HowTo%20Large%20Type%20Pieces.txt (run it from a terminal using a font that supports Large Type Pieces).

On the other hand, octants are definitely something you'll want to support in a bitmap to ANSI/VT converter. Here is a comparison of all the pseudo-pixel mosaics:
image
These are half-blocks, quadrants, sextants, octants, separated quadrants, separated sextants, and braille.

Another set of characters coming in Unicode 16.0 that you'll want to take advantage of, and that are predictable regardless of the font, are the sedecimants and eights sets, which add some 4×4 and 8×8 patterns.
They don't provide all the patterns possible, but adding them to improve the resolution would be great:
image
curl https://raw.githubusercontent.com/PhMajerus/Documents/main/CheatSheets/More%20blocks%20tables.txt

@hpjansson
Owner Author

> I don't think the Large Type Pieces make sense for your project, unless I missed some text rendering feature besides the image conversions. They are really designed to build large text, and their weight and exact design may differ from one font to another, so I really wouldn't use them in a bitmap to ANSI/VT converter.

Well - Chafa is kind of two-pronged. On one hand (and by default) it does straight MSE minimization to code points that look similar across terminals and fonts. On the other, it also supports alternative symbol sets and escape sequences for more traditional/artistic flavors, e.g. ASCII, some CJK, and custom fonts. I'd like to push further in both directions.
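To make "straight MSE minimization" concrete, here is a minimal sketch of picking a glyph for one cell by exhaustive search. It works on a flattened 4×4 grayscale cell and a tiny hand-written glyph table; the names `cell_error`, `best_glyph` and `GLYPHS` are hypothetical, and the real implementation works on color data and is not shown in this thread.

```python
def cell_error(cell, mask):
    """Best-case mean squared error when a 0/1 mask approximates `cell`.

    For a fixed mask, the error-minimizing foreground and background
    values are the means of the pixels under each part of the mask.
    """
    fg = [p for p, m in zip(cell, mask) if m]
    bg = [p for p, m in zip(cell, mask) if not m]
    err = 0.0
    for group in (fg, bg):
        if group:
            mean = sum(group) / len(group)
            err += sum((p - mean) ** 2 for p in group)
    return err / len(cell)


def best_glyph(cell, glyphs):
    """Try every glyph and return the character with the lowest error."""
    return min(glyphs, key=lambda ch: cell_error(cell, glyphs[ch]))


# Illustrative 4x4 masks, flattened row by row.
GLYPHS = {
    " ": [0] * 16,
    "\u2580": [1] * 8 + [0] * 8,    # UPPER HALF BLOCK
    "\u258c": [1, 1, 0, 0] * 4,     # LEFT HALF BLOCK
}

cell = [200] * 8 + [20] * 8         # bright top half, dark bottom half
print(best_glyph(cell, GLYPHS))
```

The search cost grows with the number of candidate glyphs, which is why the size of the symbol set matters for speed as well as quality.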

There's a lot of interesting research on structural character art rendering - see #150 for examples and ideas. Some of those could benefit from a greater selection of "imperfect" connective glyphs.

> If you want more details than in my Cascadia feature request, the following document is more complete:

> On the other hand, octants are definitely something you'll want to support in a bitmap to ANSI/VT converter. Here is a comparison of all the pseudo-pixel mosaics:

> Another set of characters coming in Unicode 16.0 that you'll want to take advantage of, and that are predictable regardless of the font, are the sedecimants and eights sets, which add some 4×4 and 8×8 patterns. They don't provide all the patterns possible, but adding them to improve the resolution would be great:

Brilliant - definitely adding support for these!

@hpjansson hpjansson added this to the 1.16 milestone May 3, 2024
@PhMajerus

PhMajerus commented May 9, 2024

I've been thinking about your idea of using all possible characters by loading fonts and analyzing the glyphs.
Did you already include color emojis in your renderers?
This could work like those pieces of art creating a large picture using a patchwork of smaller pictures. Emojis could provide some shape and colors contributing to a larger image.

This example only uses heart emojis for their colors as pseudo-pixels:
image

It doesn't provide any benefit over VT colors in a terminal, but works in plain-text:
🧡🧡🧡🧡🧡🧡🧡🧡🧡🧡🧡🖤🖤🖤🖤🖤🖤🧡🧡🧡
🧡🧡🧡🧡🧡🧡🧡🧡🧡🖤🖤🖤🖤🖤🖤🖤🖤🖤🧡🧡
🧡🧡🧡🧡🧡🧡🖤🖤🖤🖤🖤🖤🖤🖤🖤🖤🖤🖤🧡🧡
🧡🧡🧡🧡🧡🖤🖤🖤💙💙🖤🖤🖤🖤🖤🖤🖤🖤🖤🧡
🧡🧡🧡🖤🖤🖤🖤🖤💙🩵💚🩶💙🖤🖤🖤🖤🖤🖤🧡
🧡🧡🖤🖤🖤🖤🖤🖤💙🩵🩵💛💚🩵💙🖤🖤🖤🖤🧡
🧡🧡🖤🖤💜🖤🖤🖤🖤🩵🩵🩵🩵💙🩶🩵💙🖤🤎🧡
🧡🖤🖤🖤🖤🖤🖤🖤🖤🖤💙🩵🩵💜💜🩷🩵💙🧡🧡
🧡🖤🖤🖤💜💜💜💜💜💜🖤🖤💙💜💙💜🩶🩵🧡🧡
🧡💜🖤🖤💜💜💜💜💜💜🩷🩷🖤🖤🖤💜🩵🩵🧡🧡
🤎💜💜🖤💜💜💜🩷💜🩷🩷🩷💜💜🩷💜🖤💙🧡🧡
🖤🩷🖤💜💜💜💜🩷💜🩷🩷🩷🖤🖤🩷🩷🩶🧡🧡🧡
🤎💜💜💜💜💜💜💜💜💜🩷🩷🩷🩷🩷🩷🩷🧡🧡🧡
🧡💜💜💜💜💜💜💜💜💜💜🩷🩷💜🖤🩷🩷🧡🧡🧡
🧡🖤💜🖤💜💜💜💜💜💜💜🖤🩷🩷🩷🩷🩷🧡🧡🧡
🧡🤎🖤💜💜💜💜💜💜💜🖤🖤💜🖤💜🩷🩷🧡🧡🧡
🧡🧡🖤💜💜💜💜💜🖤🖤🖤🖤🖤🖤🖤💜🩶🧡🧡🧡
🧡🧡🤎🩷💜💜💜💜💜💜🖤🖤🖤🖤🖤🤎🧡🧡🧡🧡
🧡🧡🧡🩷💜💜💜💜💜💜🖤🖤🖤🖤🤎🧡🧡🧡🧡🧡
🧡🧡🖤🩷💜💜💜💜💜💜🖤💜💜💜🧡🧡🧡🧡🧡🧡

You could achieve something more detailed by using all the emoji patterns and colors to do this at a higher resolution, and it would still work in plain-text.

@hpjansson
Owner Author

Yes - I kept the door open to this in the API, so when implemented, you will be able to add multicolor glyphs while remaining backwards compatible.

See chafa_symbol_map_add_glyph() - it takes a number of pixel formats, although currently it's rendered to mono bitmaps internally. The internals will need some work.

@oshaboy
Contributor

oshaboy commented Sep 3, 2024

@PhMajerus I've tried doing that before. The main problem is different emoji fonts use slightly different colors for the emoji. So it's almost impossible to get a consistent shade. Also Emoji hearts get the terminal really confused because the red one is traditionally half width while the rest are traditionally full width. Though this isn't a problem with modern font rendering.
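The width mismatch can be seen in the Unicode East Asian Width property. The sketch below is a simplified guess at how many columns a single code point occupies; the helper name `cell_width` is made up for this example, real terminals apply additional rules (variation selectors, emoji presentation, their own width tables), and the result depends on the Unicode version bundled with Python.

```python
import unicodedata


def cell_width(ch):
    """Guess terminal columns for one code point from East Asian Width."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


# U+2764 HEAVY BLACK HEART (the classic red heart, no variation selector)
print(cell_width("\u2764"))
# U+1F9E1 ORANGE HEART
print(cell_width("\U0001F9E1"))
```

A renderer that mixes both kinds of hearts in one row would have to account for this difference, or stick to hearts that share the same width.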

Though I guess the same problem exists with the 8 and 16 bit colors as shown here https://en.wikipedia.org/wiki/ANSI_escape_code#3-bit_and_4-bit. Still, at least the colors are way closer in ANSI than they are with emoji.

@PhMajerus

> @PhMajerus I've tried doing that before. The main problem is different emoji fonts use slightly different colors for the emoji. So it's almost impossible to get a consistent shade.

Of course images using colored hearts will not be exact, but a red stays a red and a yellow stays a yellow. It still provides some color information:
image
(A 125×125 colored hearts image)

Of course, that is at the expense of resolution, as we can show 4×4 pseudo-pixels for each colored heart if we use octants:
image
(A 256×125 octants image)

> Though I guess the same problem exists with the 8 and 16 bit colors as shown here https://en.wikipedia.org/wiki/ANSI_escape_code#3-bit_and_4-bit. Still, at least the colors are way closer in ANSI than they are with emoji.

I don't know of a 16-bit color ANSI mode; AFAIK there are the 16 base colors (4-bit), 256 colors (8-bit), and RGB (24-bit).
Note that 24-bit color should be reliable and 8-bit slightly less so: although the 6×6×6 color cube and grayscale ramp are supposed to be consistent, they can differ by terminal or be modified by users or other apps.
I'd argue the ANSI 16-color palette is worse than the colored hearts, because it differs between the standard DOS colors and the Windows legacy console (used for about 30 years), and even between two CGA systems depending on the attached RGBI monitor (see the whole dark yellow vs brown/ochre issue):

image
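For reference, the 8-bit palette layout mentioned above can be sketched as follows. The cube levels and gray ramp used here are xterm's defaults; as noted, other terminals or user settings may deviate, and indices 0-15 are deliberately rejected because they are the terminal-defined base colors. The function name is an assumption for this example.

```python
CUBE_LEVELS = (0, 95, 135, 175, 215, 255)  # xterm's default cube steps


def xterm256_to_rgb(index):
    """Map a 256-color index (16-255) to xterm's default RGB value."""
    if not 0 <= index <= 255:
        raise ValueError("index must be in 0..255")
    if index < 16:
        raise ValueError("indices 0-15 are the terminal-defined base palette")
    if index < 232:
        # 6x6x6 color cube: index = 16 + 36*r + 6*g + b
        i = index - 16
        return (CUBE_LEVELS[i // 36],
                CUBE_LEVELS[(i // 6) % 6],
                CUBE_LEVELS[i % 6])
    # 24-step grayscale ramp
    gray = 8 + 10 * (index - 232)
    return (gray, gray, gray)


print(xterm256_to_rgb(196))
print(xterm256_to_rgb(244))
```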

@hpjansson
Owner Author

> I'd argue the ANSI 16-color palette is worse than the colored hearts, because it differs between the standard DOS colors and the Windows legacy console (used for about 30 years), and even between two CGA systems depending on the attached RGBI monitor (see the whole dark yellow vs brown/ochre issue):

Even worse - many terminal emulators (TEs) have configurable presets for these. A common one on Linux is Tango (GNOME default):

term-pal-tango

And here's Solarized (dark):

term-pal-solarized-dark

@oshaboy
Contributor

oshaboy commented Sep 10, 2024

At least 16 color ANSI has an ad-hoc standard that has the 16 colors specified. Most people who do chafa style stuff won't have their terminal set to a funky color scheme.

Meanwhile, with emoji, a quick glance at Emojipedia will tell you how unspecified emoji actually are, especially green, blue and purple.

This issue is somewhat solvable by targeting a specific font, by making the font selectable and keeping a table of each font's colors, or by creating an emoji font with well-specified colors specifically for this purpose. But this feels beyond the scope of chafa. The way I solved it was to first clamp all the colors to 3-level RGB and then use a lookup table to approximate the right color. This is far from an ideal or even good solution.
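A minimal sketch of the clamp-then-lookup approach described above might look like this. The table contents are a hypothetical, deliberately incomplete guess at which heart suits which quantized color; they are not oshaboy's actual table, and a real one would cover all 27 combinations and be tuned for a specific emoji font.

```python
def quantize3(value):
    """Reduce an 8-bit channel to one of three levels: 0, 1 or 2."""
    return value * 3 // 256


# Hypothetical lookup table keyed by quantized (r, g, b) levels.
HEART_LUT = {
    (0, 0, 0): "\U0001F5A4",    # BLACK HEART
    (1, 1, 1): "\U0001FA76",    # GREY HEART
    (2, 2, 2): "\U0001F90D",    # WHITE HEART
    (2, 0, 0): "\u2764\ufe0f",  # RED HEART (with emoji variation selector)
    (2, 1, 0): "\U0001F9E1",    # ORANGE HEART
    (2, 2, 0): "\U0001F49B",    # YELLOW HEART
    (0, 2, 0): "\U0001F49A",    # GREEN HEART
    (0, 0, 2): "\U0001F499",    # BLUE HEART
    (1, 0, 2): "\U0001F49C",    # PURPLE HEART
    (1, 0, 0): "\U0001F90E",    # BROWN HEART
}


def heart_for(rgb, fallback="\U0001FA76"):
    """Pick a heart for an (r, g, b) pixel; unknown keys use the fallback."""
    key = tuple(quantize3(c) for c in rgb)
    return HEART_LUT.get(key, fallback)


print(heart_for((255, 128, 0)) + heart_for((0, 0, 0)))
```

The coarse quantization keeps the table small, at the cost of merging colors that a nearest-color search would keep apart.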

@hpjansson
Owner Author

I think the point is that color matching for 16-color ANSI and emojis (if/when implemented) are both approximate. Yes - emoji (and other colored glyph) output would target a specific font, cf. --glyph-file, potentially with built-ins for emojis that are similar between many fonts (I think hearts could qualify, but I wouldn't mind looking at counterexamples of common fonts where their representations are wildly different).

I hope everyone is having a great day :-)

@acxz

acxz commented Oct 28, 2024

I want to share @mafik's ansi-art as it uses the font to generate "24-bit, Unicode-capable" art for terminal output. See this reddit post: https://www.reddit.com/r/unixporn/comments/wgpxu3/oc_ive_been_working_on_extending_ansiart_with/

and mafik's website where he has an interactive version hosted: https://mrogalski.eu/ansi-art/
He uses JuliaMono due to the font's large support of Unicode characters (the largest that I'm aware of)

It is the highest quality terminal art I've seen (excluding sixels)

@hpjansson
Owner Author

That's pretty sweet! At a glance, we use the same algorithm at -w 9; MSE exhaustive search. However, ansi-art stores the font glyphs with higher fidelity (15pt -> a bit more than 10x20 with 256 gray levels vs. our 8x8 bitmap). It may be able to capture more detail that way.

That said, I had a branch at one point where I experimented with 16x16 bitmaps, but didn't see enough of an improvement to justify the added complexity of multiple glyph resolutions -- and variable glyph resolution is a big performance hit, since iteration counts wouldn't be known at compile time anymore, and you couldn't fit the bitmaps into an exact multiple of CPU registers.
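To illustrate why a fixed 8×8 mono bitmap is convenient, note that it has exactly 64 pixels and so fits one 64-bit word, whereas a 16×16 bitmap would need 256 bits. The sketch below packs a bitmap into a single integer and compares two bitmaps with XOR plus a bit count. It is only an illustration of the register-sized layout argument, with made-up helper names, and does not reproduce Chafa's actual code, which is written in C.

```python
def pack_8x8(rows):
    """Pack eight rows of eight 0/1 values into one 64-bit integer."""
    word = 0
    for row in rows:
        for bit in row:
            word = (word << 1) | (bit & 1)
    return word


def hamming(a, b):
    """Number of differing pixels between two packed bitmaps."""
    return bin(a ^ b).count("1")


upper_half = pack_8x8([[1] * 8] * 4 + [[0] * 8] * 4)
full_block = pack_8x8([[1] * 8] * 8)

print(hex(upper_half))
print(hamming(upper_half, full_block))
```

With a fixed size, the loop bounds are constants and the comparison is a couple of machine instructions in a compiled language; a variable glyph resolution gives up both properties.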

Maybe I'm wrong about the quality gap; I'd love to look at side-by-side comparisons using -w 9 --symbols all (preferably in a new issue).

@hpjansson
Owner Author

hpjansson commented Nov 6, 2024

I added support for octants in c23d8bc. Haven't decided what to do about the rest yet.

Edit: Actually, I did decide to include the sedecimants and eights. Just gotta do it.

@hpjansson
Owner Author

I went over the remaining block symbols just now and added those we were missing. Had to skip some of the sedecimants, because VTE (and thus likely other common TEs) is confused as to their halfwidth/fullwidth status.

See 2af2e04 and 5290444. I'll address multicolor glyphs in a separate issue. Thanks again!
