More forgiving tag names #33

stevenvachon · 2015-01-05T21:39:50Z

<{{tag}}>asdf</{{tag}}>

is currently parsed as text.

I realize that this is not standard HTML5, but it'd be nice to benefit from many of this lib's features when parsing HTML variants such as Handlebars templates.
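For readers wondering why the start tag above ends up as text: in the HTML tokenizer's "tag open" state, a `<` only begins a start tag when the next character is an ASCII letter. The sketch below is a simplified model of that single rule, not parse5's actual code; the function name `opensStartTag` is made up for illustration, and end tags, comments, and other markup are ignored.

```javascript
// Simplified model of the HTML "tag open" rule for start tags only.
// Hypothetical helper; parse5 implements the full tokenizer state machine.
function opensStartTag(html, index) {
  if (html[index] !== '<') return false;
  const next = html[index + 1];
  // A start tag name must begin with an ASCII letter.
  return next !== undefined && /^[A-Za-z]$/.test(next);
}

console.log(opensStartTag('<div>asdf</div>', 0));
console.log(opensStartTag('<{{tag}}>asdf</{{tag}}>', 0));
```

Since `{` is not an ASCII letter, the `<` is emitted as a plain character and the tokenizer carries on in its data state.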

inikulin · 2015-01-10T15:54:47Z

I'm not sure if this can be implemented, because with non-standard tag names you will get a completely different grammar.

stevenvachon · 2015-01-10T18:02:57Z

I'm not sure what you mean by "different grammar".

How about adding some flag to allow it, or perhaps going as far as making the parser customizable like skunks?

inikulin · 2015-01-10T18:07:01Z

I mean formal grammar.

stevenvachon · 2015-01-10T18:09:04Z

Ok, but what if that grammar were customizable? This brings us back to skunks.

inikulin · 2015-01-10T18:16:21Z

Then you will end up with a parser generator. Looks like Skunks is a so-called 'forgiving' tokenizer, while parse5 was designed to be a precise, spec-compatible parser.

stevenvachon · 2015-01-10T18:17:24Z

Would it be silly to make parse5 both forgiving and unforgiving?

inikulin · 2015-01-10T18:34:39Z

A spec-compatible parser is a parser that can parse HTML; a "forgiving" parser is a parser that can "somehow parse some subset of the HTML". Having both in one package doesn't make sense to me.

stevenvachon · 2015-01-10T18:37:41Z

Ok, thank you.

My reasoning would be for parsing spec-compatible HTML (nested comments, checked="checked", etc) but with non-spec additions. htmlparser2 is a good "forgiving" parser, but it has trouble in some areas. Handlebars is a good example of the need for both forgiving and spec-compatible, without having to venture into a completely custom parser.

nylen · 2017-04-11T14:49:39Z

Hi - I'm also quite interested in this, in this case for a new iteration of the WordPress editor that we're working on. We plan to use HTML comments as a "pseudo-block-tag" to store post content, and these "pseudo-tags" will contain HTML content inside of them. Here's an example.

I'm curious about your thoughts on how to parse a structure like this - it's not too different from Handlebars templates, but I agree that the robustness of parse5 would be a big benefit. @stevenvachon what did you end up doing here?

stevenvachon · 2017-04-11T14:56:14Z

@nylen I'd switched projects and haven't yet gotten around to solving this one.

inikulin · 2017-04-11T15:01:04Z

@nylen We don't have any plans to support grammars other than HTML. I recommend sticking with another project or creating a fork of parse5 that supports your custom grammar.

inikulin · 2017-04-11T15:03:42Z

@nylen

> We plan to use HTML comments as a "pseudo-block-tag" to store post content, and these "pseudo-tags" will contain HTML content inside of them.

Why not use custom HTML elements for this purpose? E.g. <wordpress-post-content>

nylen · 2017-04-11T15:17:51Z

> Why not use custom HTML elements for this purpose?

We want to preserve the structure of existing markup as much as possible, so that browsers will render it correctly without modification. We also want to avoid adding extra container tags because this will break things like CSS rules that apply to specific sections of post content.

A bit more context about the parsing specifically and what I would like to achieve there: WordPress/gutenberg#391

inikulin · 2017-04-11T15:26:26Z

@nylen It would be nice to have some context before giving any advice. What's the lifecycle of these "pseudo-block-tags": who creates them, how are they processed, how is their content displayed, is any sanity check required for the content, etc.?

nylen · 2017-04-11T15:33:32Z

Probably the easiest way to explain that is to point you to one of our prototypes: https://wordpress.github.io/gutenberg/tinymce-per-block/

There's a lot of needed functionality/UX missing from the prototype, but the basic idea is there: to re-work editing a WordPress post into editing a series of "blocks". These "blocks" will be delimited by HTML comments. You can see how this is serialized by clicking the "Html" button. However, block delimiters have changed since then, to be more robust and look as follows:

<!-- wp:core/text -->
Welcome to WordPress. This is your first post. Edit or delete it, then start writing!
<!-- /wp:core/text -->
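To make the delimiter format above concrete, here is a minimal, dependency-free sketch that scans a string for matching opener/closer comments and reports each block's name and inner content. It is not how Gutenberg or parse5 does it: the regular expression, the `namespace/name` naming pattern, and the helper `findBlocks` are assumptions based only on the one example shown, and nested blocks are not handled.

```javascript
// Assumed delimiter shape: "<!-- wp:ns/name -->" opens, "<!-- /wp:ns/name -->" closes.
const DELIMITER = /<!--\s+(\/)?wp:([a-z][a-z0-9_-]*\/[a-z][a-z0-9_-]*)\s+-->/g;

// Hypothetical helper: returns non-nested blocks with their content offsets.
function findBlocks(html) {
  const blocks = [];
  let open = null;
  for (const match of html.matchAll(DELIMITER)) {
    const isCloser = match[1] === '/';
    const name = match[2];
    if (!isCloser && open === null) {
      // Content starts right after the opening comment.
      open = { name, contentStart: match.index + match[0].length };
    } else if (isCloser && open !== null && open.name === name) {
      blocks.push({
        name,
        contentStart: open.contentStart,
        contentEnd: match.index,
        content: html.slice(open.contentStart, match.index),
      });
      open = null;
    }
  }
  return blocks;
}

const post = [
  '<!-- wp:core/text -->',
  'Welcome to WordPress. This is your first post. Edit or delete it, then start writing!',
  '<!-- /wp:core/text -->',
].join('\n');

console.log(findBlocks(post));
```

A regex scan like this can be fooled by delimiter-looking text inside `<script>` or attribute values, which is one reason a real tokenizer is attractive here.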

If you're interested in reading further, I'd recommend taking a look at the links in the Overview section of our project readme.

inikulin · 2017-04-11T15:37:49Z

> There's a lot of needed functionality/UX missing from the prototype, but the basic idea is there: to re-work editing a WordPress post into editing a series of "blocks". These "blocks" will be delimited by HTML comments. You can see how this is serialized by clicking the "Html" button

Seems like I've got it. As I recall, there was something similar on Tumblr.

inikulin · 2017-04-11T15:44:49Z

Well, if those parts are always edited separately, then the workflow is quite simple: parse the document, get all child nodes between the matching comment nodes, serialize them, and dump them to the editor. On save, parse the given fragment with parseFragment (this will automatically strip unwanted elements like <head>), perform sanity checks if necessary, then insert those nodes into the parsed document and serialize it.
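The "get all child nodes between matching comment nodes" step could look roughly like the sketch below. To stay runnable without installing anything, it works on hand-built plain objects rather than a real parse result; the node shape (`nodeName: '#comment'` with `data`, `nodeName: '#text'` with `value`) is assumed to mirror parse5's default tree format, and `nodesBetweenComments` is a made-up helper name.

```javascript
// Hypothetical helper: returns the sibling nodes that sit between
// "<!-- wp:NAME -->" and "<!-- /wp:NAME -->" in a list of child nodes.
function nodesBetweenComments(childNodes, blockName) {
  const isComment = (node, text) =>
    node.nodeName === '#comment' && node.data.trim() === text;

  const start = childNodes.findIndex((n) => isComment(n, `wp:${blockName}`));
  if (start === -1) return [];

  const end = childNodes.findIndex(
    (n, i) => i > start && isComment(n, `/wp:${blockName}`)
  );
  if (end === -1) return [];

  return childNodes.slice(start + 1, end);
}

// Hand-built stand-in for the child nodes of a parsed fragment (assumed shape).
const childNodes = [
  { nodeName: '#comment', data: ' wp:core/text ' },
  { nodeName: '#text', value: '\nWelcome to WordPress.\n' },
  { nodeName: '#comment', data: ' /wp:core/text ' },
];

console.log(nodesBetweenComments(childNodes, 'core/text'));
```

With a real tree, the returned nodes would then be serialized for the editor, and the edited markup re-parsed with parseFragment before being spliced back in.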

inikulin · 2017-04-11T15:52:33Z

Or, even better:

1. Use SAXParser with location info enabled to get the locations of the content between two matching comments for the required section. (You can stop parsing once you've found what you need.)
2. Dump the found substring to the editor.
3. On save, parse the given fragment with parseFragment (this will automatically strip unwanted elements like <head>) and perform sanity checks if necessary.
4. Insert the new content in place of the substring that was obtained earlier.
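The substring-based steps above can be sketched as follows. Note the substitution: instead of SAXParser location info, this dependency-free version finds the offsets with a plain string search for the exact delimiter text, and it skips the parseFragment sanity pass entirely. `replaceBlockContent` is a hypothetical name, and the delimiter spelling is taken from the single example in this thread.

```javascript
// Hypothetical helper: swaps the content between a block's opening and
// closing comments, leaving the rest of the document byte-for-byte intact.
function replaceBlockContent(html, blockName, newContent) {
  const opener = `<!-- wp:${blockName} -->`;
  const closer = `<!-- /wp:${blockName} -->`;

  const openAt = html.indexOf(opener);
  if (openAt === -1) return html; // block not found: leave document untouched

  const contentStart = openAt + opener.length;
  const contentEnd = html.indexOf(closer, contentStart);
  if (contentEnd === -1) return html; // unmatched opener: leave untouched

  return html.slice(0, contentStart) + newContent + html.slice(contentEnd);
}

const before = '<h1>Title</h1><!-- wp:core/text -->old text<!-- /wp:core/text --><p>footer</p>';
const after = replaceBlockContent(before, 'core/text', '<p>new text</p>');
console.log(after);
```

Working on offsets rather than a re-serialized tree means markup outside the edited block is never rewritten, which matches the goal of preserving existing post content as-is.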

nylen · 2017-04-11T16:37:44Z

Rather than just extracting substrings, I think parsing the HTML inside of the block delimiters is part of the task. We expect to have many different types of blocks, including implementations by third-party code via plugins. It seems much better to me to provide a "recommended" way to handle parsing and verify that the markup inside of a block is actually valid for that block type, providing a fallback otherwise.

I had hoped to achieve this in a single parsing step by extending a library like parse5, but that may not be possible. Another reason for wanting a single step is that WordPress post content can also contain shortcodes, yet another type of tag that needs its own grammar extension. Eventually we'd like to detect these and transparently upgrade them to the equivalent "block" representation.

inikulin · 2017-04-12T09:13:55Z

@nylen Let me know if you'll need any assistance.

stevenvachon mentioned this issue Jan 5, 2015

parse5 and streaming #26

Closed

inikulin closed this as completed Jan 10, 2015

nylen mentioned this issue Apr 13, 2017

Propose block APIs for backwards compatibility WordPress/gutenberg#413

Closed
