Define `parsers.punctuation` in a streaming fashion #432

Witiko · 2024-04-03T19:35:31Z

Since #416, we have first read all Unicode punctuation characters into a table `punctuation`, then defined `parsers.punctuation` using that table, and finally deleted the table. Since #416, we have also been experiencing steady out-of-memory issues with our capybara runner, as discussed with @TeXhackse earlier today.

I have disabled capybara, since it has been having intermittent out-of-memory issues ever since Markdown 3.0.0 and its speed has also been an issue lately. Nevertheless, this indicates a potential cost of the current approach, which may eventually impact our users as well.

This PR removes the table `punctuation` and defines `parsers.punctuation` directly from the file `UnicodeData.txt` without any intermediate data structure. This should alleviate any memory issues caused by #416.
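To illustrate the difference between the two approaches, here is a minimal sketch in Python. It is not the code from this PR. The function names, the sample excerpt, and the choice to treat every general category starting with `P` as punctuation are assumptions made for the example. It only shows the general idea of folding each line of a `UnicodeData.txt`-style file straight into the final lookup structure, instead of first collecting all code points in a temporary table.

```python
import io
from typing import Dict, Iterable, Iterator, Set

# Hypothetical excerpt in the UnicodeData.txt format (semicolon-separated
# fields: code point, name, general category, ...). A real run would iterate
# over the opened file instead of this string.
SAMPLE = """\
0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00BF;INVERTED QUESTION MARK;Po;0;ON;;;;;N;;;;;
2010;HYPHEN;Pd;0;ON;;;;;N;;;;;
"""


def punctuation_code_points(lines: Iterable[str]) -> Iterator[int]:
    """Lazily yield code points whose general category starts with "P".

    Assumption: "punctuation" means the P* categories; the project may use
    a different definition.
    """
    for line in lines:
        fields = line.split(";")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        if fields[2].startswith("P"):
            yield int(fields[0], 16)


def build_matcher(lines: Iterable[str]) -> Dict[int, Set[bytes]]:
    """Consume the stream and build the final structure in a single pass.

    The result maps a UTF-8 byte length to the set of encoded punctuation
    characters of that length. No intermediate list of code points is kept.
    """
    by_length: Dict[int, Set[bytes]] = {}
    for code_point in punctuation_code_points(lines):
        encoded = chr(code_point).encode("utf-8")
        by_length.setdefault(len(encoded), set()).add(encoded)
    return by_length


def is_punctuation(matcher: Dict[int, Set[bytes]], char: str) -> bool:
    """Check a single character against the matcher built above."""
    encoded = char.encode("utf-8")
    return encoded in matcher.get(len(encoded), set())


matcher = build_matcher(io.StringIO(SAMPLE))
```

With the real file, `build_matcher(open("UnicodeData.txt", encoding="utf-8"))` would read one line at a time, so peak memory is bounded by the final structure rather than by the final structure plus a temporary table.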



Witiko added the commonmark (Related to making the syntax of markdown follow the CommonMark spec) and speed (Related to speed improvements) labels Apr 3, 2024

Witiko added this to the 3.4.3 milestone Apr 3, 2024

Witiko self-assigned this Apr 3, 2024

Witiko marked this pull request as ready for review April 3, 2024 21:42

Update CHANGES.md

dad61ee

Witiko added the automerge (This pull request will be automatically merged after continuous integration has succeeded) label Apr 3, 2024

Witiko force-pushed the fix/parsers-punctuation-memory-issues branch from c711c61 to dad61ee April 3, 2024 21:43

Witiko merged commit e2c6be1 into main Apr 3, 2024
9 of 12 checks passed

Witiko deleted the fix/parsers-punctuation-memory-issues branch April 3, 2024 22:48

Witiko mentioned this pull request Apr 6, 2024

Support ConTeXt standalone #402

Open

Witiko mentioned this pull request Aug 13, 2024

Improve the speed of the Markdown package #474

Closed
