Define parsers.punctuation in a streaming fashion #432

Merged (2 commits) on Apr 3, 2024

@Witiko (owner) commented on Apr 3, 2024

Since #416, we have first read all Unicode punctuation characters into a table `punctuation`, then defined `parsers.punctuation` using this table, and finally deleted the table. Since then, we have also been experiencing steady out-of-memory issues with our capybara runner, as discussed with @TeXhackse earlier today.

I have disabled capybara, since it has been having intermittent out-of-memory issues ever since Markdown 3.0.0, and its speed has lately also been an issue. Nevertheless, this indicates a potential cost of the current approach, which may eventually impact our users as well.

This PR removes the table `punctuation` and defines `parsers.punctuation` directly from the file `UnicodeData.txt` without any intermediate data structure. This should alleviate any memory issues caused by #416.
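The difference between the table-based and the streaming definition can be sketched in Python as a loose analogy. The real `parsers.punctuation` lives in the package's Lua code, and nothing below is taken from it. The helper names `punctuation_chars`, `matcher_via_table`, and `matcher_streaming` are hypothetical. Treating the `P*` general categories as "punctuation" is also a simplifying assumption. Both matchers read the semicolon-separated lines of `UnicodeData.txt`, where the first field is the hexadecimal code point and the third is the general category. The first materializes a full list before building the matcher. The second feeds a generator straight into the final set, so only one collection is ever held in memory.

```python
from typing import Callable, Iterable, Iterator, List

# Assumption: a "punctuation character" is one whose Unicode general
# category is one of the P* categories. The package's actual definition
# may differ (for example, by also including symbol categories).
PUNCTUATION_CATEGORIES = frozenset({"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"})


def punctuation_chars(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: lazily yield punctuation characters.

    Each UnicodeData.txt line is semicolon-separated: field 0 is the code
    point in hex, field 1 the character name, field 2 the general category.
    Range entries (<..., First>/<..., Last>) are not expanded in this sketch.
    """
    for line in lines:
        fields = line.split(";")
        if len(fields) > 2 and fields[2] in PUNCTUATION_CATEGORIES:
            yield chr(int(fields[0], 16))


def matcher_via_table(lines: Iterable[str]) -> Callable[[str], bool]:
    # Table-based approach: collect every character into a list first,
    # build the matcher from it, then drop the list. While the set is
    # being built, both the list and the set are alive at once.
    table: List[str] = list(punctuation_chars(lines))
    chars = frozenset(table)
    del table
    return lambda ch: ch in chars


def matcher_streaming(lines: Iterable[str]) -> Callable[[str], bool]:
    # Streaming approach: frozenset() pulls one character at a time from
    # the generator, so no intermediate list is ever built.
    chars = frozenset(punctuation_chars(lines))
    return lambda ch: ch in chars
```

Because iterating an open file yields one line at a time, something like `matcher_streaming(open("UnicodeData.txt", encoding="utf-8"))` would read the file without ever holding all of its lines, or all of its punctuation characters, in a separate intermediate collection.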

@Witiko added the labels `commonmark` (related to making the syntax of markdown follow the CommonMark spec) and `speed` (related to speed improvements) on Apr 3, 2024
@Witiko added this to the 3.4.3 milestone on Apr 3, 2024
@Witiko self-assigned this on Apr 3, 2024
@Witiko marked this pull request as ready for review on Apr 3, 2024 at 21:42
@Witiko added the label `automerge` (this pull request will be automatically merged after continuous integration has succeeded) on Apr 3, 2024
@Witiko force-pushed the `fix/parsers-punctuation-memory-issues` branch from c711c61 to dad61ee on Apr 3, 2024 at 21:43
@Witiko merged commit e2c6be1 into `main` on Apr 3, 2024 (9 of 12 checks passed)
@Witiko deleted the `fix/parsers-punctuation-memory-issues` branch on Apr 3, 2024 at 22:48