Parse `&str` instead of `&[u8]` #541

Kijewski · 2021-10-11T13:30:25Z

Askama takes valid UTF-8 files as input, so why operate on byte slices
instead of strings? Parsing `&str` makes some functions a lot simpler to write.
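
To make the motivation concrete, here is a small nom-free sketch of the same "take an identifier prefix" step written against both input types. `ident_bytes` and `ident_str` are hypothetical names chosen for illustration; they are not functions from Askama's parser, and the ASCII-only identifier rule is an assumption.

```rust
use std::str;

/// Byte-slice version (hypothetical): the matched prefix has to be
/// converted back to `&str` before it can be used as a name.
fn ident_bytes(input: &[u8]) -> Option<(&str, &[u8])> {
    let first = *input.first()?;
    if !(first.is_ascii_alphabetic() || first == b'_') {
        return None;
    }
    let len = input
        .iter()
        .position(|b| !(b.is_ascii_alphanumeric() || *b == b'_'))
        .unwrap_or(input.len());
    let (head, rest) = input.split_at(len);
    // Only ASCII bytes were accepted, so this conversion succeeds,
    // but the type system still forces the extra step.
    let head = str::from_utf8(head).ok()?;
    Some((head, rest))
}

/// String version (hypothetical): no conversion step is needed.
fn ident_str(input: &str) -> Option<(&str, &str)> {
    let first = input.chars().next()?;
    if !(first.is_ascii_alphabetic() || first == '_') {
        return None;
    }
    // `find` returns the byte index where a char starts,
    // so `split_at` is always called on a char boundary.
    let len = input
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    Some(input.split_at(len))
}
```

Both versions accept the same inputs; the string version simply drops the round trip through `str::from_utf8`.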

djc

Thanks, I think this makes sense directionally. Some questions about the new implementations though.

Kijewski · 2021-10-11T15:49:22Z

Should I turn the third commit https://github.com/djc/askama/pull/541/commits/edc3cf80d6708800322f716b90ce1ae3a939f6c0 into a new PR?

djc · 2021-10-12T08:03:10Z

No, the third commit can stay in this PR.

So I guess my feeling is that a lot of the substantial changes to take_content(), split_ws_parts(), identifier() and nested_parenthesis() are really kind of orthogonal to this PR, that is, there is probably a way to rewrite those to deal with &str instead of &[u8] while keeping their structure substantially similar (but do tell me if I'm wrong!). If so, I'd prefer to keep them that way and consider their refactoring in a separate PR.

Kijewski · 2021-10-12T17:31:31Z

I refactored the commits and opened PR #545. This PR is now based on #545.

djc · 2021-10-13T08:13:56Z

So this is kind of the other way around from what I intended. I'm actually more interested in the &[u8] to &str conversion and less interested in the other parser function changes -- I'd like to consider each of the latter separately. Would you be willing to do the minimal modification for each of these that gets them working in the &str world?

(If that is harder than I'm making it out to be or if you're just not interested in doing the work, that's fine too.)

Kijewski · 2021-10-13T08:54:08Z

Iterating over strings is just no fun in Rust. One little mistake and you are splitting a codepoint, which causes a panic. That's why I wanted to let nom do the error-prone stuff instead of porting the &[u8] implementation directly.
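
The codepoint hazard mentioned above can be shown with the standard library alone. The sketch below uses two hypothetical helpers, `take_chars` and `checked_split`, neither of which comes from the PR; they only illustrate how to stay on char boundaries when working with byte offsets into a `&str`.

```rust
/// Returns the first `n` characters of `s` (hypothetical helper).
/// `char_indices` only yields byte offsets at which a char starts,
/// so the slice can never cut through a codepoint.
fn take_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((idx, _)) => &s[..idx],
        None => s, // fewer than `n + 1` chars: return everything
    }
}

/// Splits at a byte offset, returning `None` instead of panicking
/// when the offset is out of range or inside a multi-byte codepoint.
fn checked_split(s: &str, byte_idx: usize) -> Option<(&str, &str)> {
    if s.is_char_boundary(byte_idx) {
        Some(s.split_at(byte_idx))
    } else {
        None
    }
}
```

Calling `"héllo".split_at(2)` directly would panic, because byte 2 lies inside the two-byte `é`; `checked_split("héllo", 2)` returns `None` for the same input.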

Also, I'm not sure the current implementation in main is fully correct. For example:

ws() uses the allocating function many0().
take_content() needs two ASCII characters for a block delimiter, and the first characters of the three variants are treated more like a set. E.g. if you have the delimiters "{{", "<%" and "($", then "<%" works too (but that's mentioned in the documentation).
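
Both concerns can be sketched without nom. This is an illustration under stated assumptions, not Askama's code: `split_leading_ws` shows one way to skip whitespace without allocating, and `starts_with_delim_loose` models the "set" behaviour as I read it, assuming the second character is matched the same way as the first. All names and the `DELIMS` constant are hypothetical, and the mixed pair `<$` is my own example rather than one from the thread.

```rust
/// Splits leading whitespace off `s` without allocating (hypothetical helper).
fn split_leading_ws(s: &str) -> (&str, &str) {
    let is_ws = |c: char| matches!(c, ' ' | '\t' | '\r' | '\n');
    let idx = s.find(|c: char| !is_ws(c)).unwrap_or(s.len());
    s.split_at(idx)
}

/// The three delimiters used as an example in the comment above.
const DELIMS: [&str; 3] = ["{{", "<%", "($"];

/// Strict check: the input must start with one complete delimiter.
fn starts_with_delim(s: &str) -> bool {
    DELIMS.iter().any(|d| s.starts_with(*d))
}

/// Set-like check (assumed reading of the concern): the first byte may
/// come from any delimiter and the second byte from any delimiter.
fn starts_with_delim_loose(s: &str) -> bool {
    let b = s.as_bytes();
    if b.len() < 2 {
        return false;
    }
    let first = DELIMS.iter().any(|d| d.as_bytes()[0] == b[0]);
    let second = DELIMS.iter().any(|d| d.as_bytes()[1] == b[1]);
    first && second
}
```

Under these assumptions the loose check accepts `<$`, which is not one of the three configured delimiters, while the strict check rejects it.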

djc

Again sorry for the slow review. Your contributions are highly valued, I've just found it hard to find the time to spend enough attention on these nuanced changes.

djc · 2021-11-10T09:12:28Z

askama_shared/src/parser.rs

+/// Skips over any amount of `inner` until `end` is found.
+/// Returns a tuple of the skipped string and the found end marker.
+fn skip_till<'a, I, O>(
+    mut inner: impl FnMut(&'a [u8]) -> IResult<&'a [u8], I>,


I think take_content() is now the only caller for skip_till, which means inner is always anychar. Can we simplify? Maybe even inline skip_till() into take_content()?

Kijewski · 2021-11-19

The function is used in #546, too.
I removed the "inner" parameter, because it's "anychar" for both its uses.
I cleaned up the implementation a bit. I think it's much easier on the eyes now. :)
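
For readers without the diff at hand, the idea behind a `skip_till` with the `inner` parameter removed can be sketched without nom. This is not the implementation from the PR: `skip_till_sketch` and `tag` are hypothetical names, and the `Option`-based matcher stands in for nom's `IResult`.

```rust
/// Advances one `char` at a time until `end` matches (hypothetical sketch).
/// On success returns `(rest, (skipped, found))`, mirroring the documented
/// "tuple of the skipped string and the found end marker".
/// Note: `end` is never tried at the very end of the input.
fn skip_till_sketch<'a, O>(
    input: &'a str,
    mut end: impl FnMut(&'a str) -> Option<(&'a str, O)>,
) -> Option<(&'a str, (&'a str, O))> {
    for (idx, _) in input.char_indices() {
        // `idx` is always a char boundary, so slicing cannot panic.
        if let Some((rest, found)) = end(&input[idx..]) {
            return Some((rest, (&input[..idx], found)));
        }
    }
    None
}

/// Hypothetical stand-in for a literal matcher such as nom's `tag`.
fn tag<'a>(t: &'a str) -> impl FnMut(&'a str) -> Option<(&'a str, &'a str)> {
    move |i: &'a str| i.strip_prefix(t).map(|rest| (rest, t))
}
```

With this sketch, `skip_till_sketch("héllo {% end", tag("{%"))` yields the remaining input `" end"`, the skipped text `"héllo "`, and the matched marker `"{%"`.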

djc · 2021-11-19T14:37:44Z

@Kijewski do you have a moment to work on this soon? Otherwise I'll probably take it over, I think it's the last blocker to 0.11.

Kijewski · 2021-11-19T16:39:36Z

I rebased #541 and #546.

djc

Looks great, thanks for all the work on this!

djc reviewed Oct 11, 2021

Kijewski mentioned this pull request Oct 12, 2021

Allow whitespace trimming in {{raw}} blocks #546

Merged

djc reviewed Oct 13, 2021

Kijewski mentioned this pull request Oct 13, 2021

Simply multiple parsing functions #545

Closed

djc reviewed Nov 10, 2021

Kijewski added 4 commits November 19, 2021 16:28

use nom::error::ErrorKind

df34f81

Simplify ws() and split_ws_parts()

5a8acd1

Simplify identifier() implementation

059952f

Parse &str instead of &[u8]

6fd7a9b

Simplify take_content() implementation

58a24c8

djc approved these changes Nov 24, 2021

djc merged commit 3ef2869 into rinja-rs:main Nov 24, 2021

Kijewski deleted the pr-nom-str branch September 26, 2022 13:00

djc mentioned this pull request Feb 26, 2023

0.12 release planning #722

Closed
