Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse &str instead of &[u8] #541

Merged
merged 5 commits into from
Nov 24, 2021
Merged

Parse &str instead of &[u8] #541

merged 5 commits into from
Nov 24, 2021

Conversation

Kijewski
Copy link
Collaborator

Askama's takes valid UTF-8 files as input. So why operate on byte slices
instead of strings? This makes writing some functions a lot simpler.

Copy link
Collaborator

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think this makes sense directionally. Some questions about the new implementations though.

askama_shared/src/parser.rs Show resolved Hide resolved
askama_shared/src/parser.rs Outdated Show resolved Hide resolved
askama_shared/src/parser.rs Outdated Show resolved Hide resolved
askama_shared/src/parser.rs Outdated Show resolved Hide resolved
askama_shared/src/parser.rs Outdated Show resolved Hide resolved
@Kijewski
Copy link
Collaborator Author

Should I turn the third commit https://github.com/djc/askama/pull/541/commits/edc3cf80d6708800322f716b90ce1ae3a939f6c0 into a new PR?

@djc
Copy link
Collaborator

djc commented Oct 12, 2021

No, the third commit can stay in this PR.

So I guess my feeling is that a lot of the substantial changes to take_content(), split_ws_parts(), identifier() and nested_parenthesis() are really kind of orthogonal to this PR, that is, there is probably a way to rewrite those to deal with &str instead of &[u8] while keeping their structure substantially similar (but do tell me if I'm wrong!). If so, I'd prefer to keep them that way and consider their refactoring in a separate PR.

@Kijewski
Copy link
Collaborator Author

I refactored the commits and opened PR #545. This PR is now based on #545.

@djc
Copy link
Collaborator

djc commented Oct 13, 2021

So this is kind of the other way around from what I intended. I'm actually more interested in the &[u8] to &str conversion and less interested in the other parser function changes -- I'd like to consider each of the latter separately. Would you be willing to do the minimal modification for each of these that gets them working in the &str world?

(If that is harder than I'm making it out to be or if you're just not interested in doing the work, that's fine too.)

@Kijewski
Copy link
Collaborator Author

Iterating strings is just no fun in Rust. One little mistake and you are splitting a codepoint, which causes a panic. That's why I wanted to let nom do the error prone stuff instead of porting the &[u8] implementation directly.

Also I'm not 100% sure the current implementation in main is 100% correct. E.g.

  • ws() uses the allocating function many0(),
  • take_content() needs two ASCII characters a block where is first character of the three variants is more of a set, e.g. if you have the delimitters "{{", "<%" and "($", then "<%" works too (but that's mentioned in the documentation)

askama_shared/src/parser.rs Show resolved Hide resolved
askama_shared/src/parser.rs Outdated Show resolved Hide resolved
askama_shared/src/parser.rs Outdated Show resolved Hide resolved
askama_shared/src/parser.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again sorry for the slow review. Your contributions are highly valued, I've just found it hard to find the time to spend enough attention on these nuanced changes.

askama_shared/src/parser.rs Outdated Show resolved Hide resolved
askama_shared/src/parser.rs Outdated Show resolved Hide resolved
/// Skips over the any amout of `inner` until `end` was found.
/// Returns a tuple of the skipped string and the found end marker.
fn skip_till<'a, I, O>(
mut inner: impl FnMut(&'a [u8]) -> IResult<&'a [u8], I>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think take_content() is now the only caller for skip_till, which means inner is always anychar. Can we simplify? Maybe even inline skip_till() into take_content()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is used in #546, too.
I removed the "inner" parameter, because it's "anychar" for both its uses.
I cleaned up the implementation a bit. I think it's much easier on the eyes now. :)

@djc
Copy link
Collaborator

djc commented Nov 19, 2021

@Kijewski do you have a moment to work on this soon? Otherwise I'll probably take it over, I think it's the last blocker to 0.11.

Askama's takes valid UTF-8 files as input. So why operate on byte slices
instead of strings? This makes writing some functions a lot simpler.
@Kijewski
Copy link
Collaborator Author

I rebased #541 and #546.

Copy link
Collaborator

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for all the work on this!

@djc djc merged commit 3ef2869 into rinja-rs:main Nov 24, 2021
@Kijewski Kijewski deleted the pr-nom-str branch September 26, 2022 13:00
@djc djc mentioned this pull request Feb 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants