More forgiving tag names #33
Comments
I'm not sure this can be implemented, because non-standard tag names give you a completely different grammar.
I'm not sure what you mean by "different grammar". How about adding a flag to allow it, or perhaps going as far as making the parser customizable, like skunks?
I mean the formal grammar.
OK, but what if that grammar were customizable? This brings us back to skunks.
Then you will end up with a parser generator. Looks like
Would it be silly to make parse5 both forgiving and unforgiving?
A spec-compatible parser is a parser that can parse HTML; a "forgiving" parser is a parser that can "somehow parse some subset of HTML". Having both in one package doesn't make sense to me.
OK, thank you. My reasoning would be for parsing spec-compatible HTML (nested comments,
Hi - I'm also quite interested in this, in this case for a new iteration of the WordPress editor that we're working on. We plan to use HTML comments as a "pseudo-block-tag" to store post content, and these "pseudo-tags" will contain HTML content inside of them. Here's an example. I'm curious about your thoughts on how to parse a structure like this - it's not too different from Handlebars templates, but I agree that the robustness of
@nylen I'd switched projects and haven't yet gotten around to solving this one.
@nylen We don't have any plans to support grammars other than HTML. I recommend sticking with another project or creating a fork of parse5 that supports your custom grammar.
Why not use custom HTML elements for this purpose? E.g.
We want to preserve the structure of existing markup as much as possible, so that browsers will render it correctly without modification. We also want to avoid adding extra container tags because this will break things like CSS rules that apply to specific sections of post content. A bit more context about the parsing specifically and what I would like to achieve there: WordPress/gutenberg#391
@nylen It would be nice to have some context before giving any advice. What's the lifecycle of these "pseudo-block-tags": who creates them, how are they processed, how is their content displayed, is any sanity check required for the content, etc.?
Probably the easiest way to explain that is to point you to one of our prototypes: https://wordpress.github.io/gutenberg/tinymce-per-block/ There's a lot of needed functionality/UX missing from the prototype, but the basic idea is there: to re-work editing a WordPress post into editing a series of "blocks". These "blocks" will be delimited by HTML comments. You can see how this is serialized by clicking the "Html" button. However, block delimiters have changed since then, to be more robust and look as follows:
<!-- wp:core/text -->
Welcome to WordPress. This is your first post. Edit or delete it, then start writing!
<!-- /wp:core/text -->
If you're interested in reading further, I'd recommend taking a look at the links in the Overview section of our project readme.
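For illustration, here is a minimal TypeScript sketch of pulling blocks out of a serialized post by scanning for the comment delimiters shown above. It is not how Gutenberg or parse5 does it: the `RawBlock` interface and the `splitBlocks` function are hypothetical names, and the sketch assumes blocks are not nested and that delimiters carry nothing beyond the block type.

```typescript
// Hypothetical shape for one delimited block; not part of parse5 or Gutenberg.
interface RawBlock {
  type: string;      // e.g. "core/text"
  innerHtml: string; // markup between the opening and closing comment
}

// Scan a serialized post for `<!-- wp:TYPE --> ... <!-- /wp:TYPE -->` pairs.
// Assumption: blocks are not nested and delimiters have no attributes.
function splitBlocks(post: string): RawBlock[] {
  const pattern = /<!--\s*wp:([\w\/-]+)\s*-->([\s\S]*?)<!--\s*\/wp:\1\s*-->/g;
  const blocks: RawBlock[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(post)) !== null) {
    blocks.push({ type: match[1], innerHtml: match[2].trim() });
  }
  return blocks;
}
```

This only extracts substrings; it does not check that the markup inside a block is well-formed HTML.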
Seems like I've got it. As I recall, there was something similar on Tumblr.
Well, if those parts are always edited separately, then the workflow is quite simple: parse the document, get all child nodes between matching comment nodes, serialize them and dump them to the editor. On save parse given fragment with
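The "child nodes between matching comment nodes" step could be sketched roughly as below. The node shapes are defined locally for illustration and are an assumption, not parse5's actual tree types; `TreeNode`, `BlockGroup` and `groupByDelimiters` are hypothetical names. The sketch assumes the delimiters are siblings at one level of the tree and that blocks are not nested.

```typescript
// Minimal node shapes for illustration only; defined locally rather than
// imported from parse5, whose real node types differ.
interface CommentNode { kind: "comment"; data: string; }
interface OtherNode { kind: "text" | "element"; html: string; }
type TreeNode = CommentNode | OtherNode;

// Hypothetical result type: one entry per matched pair of delimiters.
interface BlockGroup { type: string; children: OtherNode[]; }

// Walk a flat list of sibling nodes and collect the nodes that sit between
// a `wp:TYPE` comment and the matching `/wp:TYPE` comment.
function groupByDelimiters(siblings: TreeNode[]): BlockGroup[] {
  const groups: BlockGroup[] = [];
  let open: BlockGroup | null = null;
  for (const node of siblings) {
    if (node.kind === "comment") {
      const text = node.data.trim();
      if (open === null && text.startsWith("wp:")) {
        open = { type: text.slice(3), children: [] };
      } else if (open !== null && text === "/wp:" + open.type) {
        groups.push(open);
        open = null;
      }
      continue; // comments are never collected, delimiter or not
    }
    if (open !== null) open.children.push(node);
  }
  return groups;
}
```

Nodes outside any delimiter pair are ignored, and a block whose closing comment is missing is dropped rather than reported; a real implementation would need to decide how to handle both cases.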
Or, even better:
Rather than just extracting substrings, I think parsing the HTML inside of the block delimiters is part of the task. We expect to have many different types of blocks, including implementations by third-party code via plugins. It seems much better to me to provide a "recommended" way to handle parsing and to verify that the markup inside of a block is actually valid for that block type, providing a fallback otherwise. I had hoped to achieve this in a single parsing step by extending a library like
@nylen Let me know if you need any assistance.
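One way to picture "verify that the markup is valid for that block type, providing a fallback otherwise" is a registry of per-type validators. The sketch below is only an assumption about how that might look: `BlockValidator`, `validators`, `resolveBlockType`, the `"fallback"` type name and the sample rule for `core/text` are all invented for illustration and do not come from Gutenberg or parse5.

```typescript
// Hypothetical validator signature: returns true when the markup suits the block type.
type BlockValidator = (innerHtml: string) => boolean;

// Registry that core code or third-party plugins could add entries to.
const validators = new Map<string, BlockValidator>();

// Invented example rule: a "core/text" block must not contain a <script> element.
validators.set("core/text", (html) => !/<script\b/i.test(html));

// Return the declared type when a registered validator accepts the markup;
// otherwise return the invented "fallback" type, which also covers unknown types.
function resolveBlockType(type: string, innerHtml: string): string {
  const validate = validators.get(type);
  return validate !== undefined && validate(innerHtml) ? type : "fallback";
}
```

Here validation runs as a second pass over already-extracted markup, which is a different design from the single parsing step hoped for above.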
is currently parsed as text.
I realize that this is not standard HTML5, but it'd be nice to benefit from many of this lib's features when parsing HTML variants such as Handlebars templates.