Define `parsers.punctuation` in a streaming fashion #432
Since #416, we have first read all Unicode punctuation characters into a table `punctuation`, then defined `parsers.punctuation` using the table `punctuation`, and finally deleted the table `punctuation`. Also since #416, we have been experiencing steady out-of-memory issues with our capybara runner, as discussed with @TeXhackse earlier today. I have disabled capybara, since it has had intermittent out-of-memory issues ever since Markdown 3.0.0, and its speed has lately been an issue as well. Nevertheless, the memory issues point to a potential cost of the current approach, which may eventually affect our users too.
This PR removes the table `punctuation` and defines `parsers.punctuation` directly from the file `UnicodeData.txt`, without any intermediate data structure. This should alleviate any memory issues caused by #416.
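To illustrate the difference between the two approaches, here is a minimal Python sketch of the streaming idea. It is not the package's actual Lua code. It assumes that a "punctuation character" is any code point whose general category (the third `;`-separated field of a `UnicodeData.txt` line) starts with `P`, and it uses a small hypothetical parser-combinator API (`fail`, `literal`, `choice`, `punctuation_parser`) in place of LPeg. Each line is read, checked, and folded into the parser immediately, so no table of all punctuation characters is ever built.

```python
import io
from typing import Callable, Iterable, Optional

# Hypothetical parser type: takes (text, pos) and returns the position
# after a successful match, or None if the parser does not match.
Parser = Callable[[str, int], Optional[int]]


def fail(text: str, pos: int) -> Optional[int]:
    """Parser that never matches; the starting point of the fold."""
    return None


def literal(char: str) -> Parser:
    """Parser that matches exactly `char` at the current position."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(char) if text.startswith(char, pos) else None
    return parse


def choice(first: Parser, second: Parser) -> Parser:
    """Ordered choice: try `first`, fall back to `second` if it fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        result = first(text, pos)
        return result if result is not None else second(text, pos)
    return parse


def punctuation_parser(lines: Iterable[str]) -> Parser:
    """Fold UnicodeData.txt-style lines into a single parser, one line at a time.

    Assumption: punctuation = general category starting with "P".
    """
    parser: Parser = fail
    for line in lines:
        fields = line.split(";")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        code_point, category = fields[0], fields[2]
        if category.startswith("P"):
            # Extend the parser immediately instead of storing the character.
            parser = choice(parser, literal(chr(int(code_point, 16))))
    return parser


# Three real UnicodeData.txt lines; in practice you would pass an open file.
SAMPLE = (
    "0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;\n"
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n"
    "002D;HYPHEN-MINUS;Pd;0;ES;;;;;N;;;;;\n"
)

punctuation = punctuation_parser(io.StringIO(SAMPLE))
print(punctuation("!", 0))   # → 1
print(punctuation("A", 0))   # → None
```

In LPeg, the analogous fold would presumably start from `lpeg.P(false)` and extend the pattern with `pattern = pattern + lpeg.P(char)` for each punctuation line. Note that in this Python sketch every `choice` adds one level of closure nesting, so matching against the full file recurses roughly once per punctuation character and can approach Python's default recursion limit; it is meant to show the data flow, not to be used as-is.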