Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Future of this repo #127

Open
gsnedders opened this issue May 26, 2020 · 19 comments
Open

Future of this repo #127

gsnedders opened this issue May 26, 2020 · 19 comments

Comments

@gsnedders
Copy link
Member

gsnedders commented May 26, 2020

@whatwg/html-parser (i.e., @wycats @gsnedders @sideshowbarker @zcorpan @inikulin @hsivonen) + other parser maintainers I'm aware of (@lexborisov, @jdm who seems to review html5ever stuff now, @iabudiab) this repo has been pretty unloved for a while (see also: all the open PRs), so we should probably decide what we're doing with it.

As far as I'm aware, the encoding tests are only used by html5lib, the serializer tests are entirely unused (and in the browser case essentially supplanted by html/syntax/serializing-html-fragments/ in WPT), so outright dropping both of these costs us nothing.

The tree-construction tests could quite easily move to WPT, which would also make it less likely that we continue to grow WPT-only parsing tests.

The tokenizer tests are the hardest ones; as far as I'm aware there's various consumers, but these are tests that cannot be run within the browser setting (as the tokenizer isn't even exposed). With my WPT hat on, I don't really object to them living there, but it's also not obvious they should live there.

@sideshowbarker
Copy link
Contributor

The tokenizer tests are the hardest ones; as far as I'm aware there's various consumers, but these are tests that cannot be run within the browser setting (as the tokenizer isn't even exposed). With my WPT hat on, I don't really object to them living there, but it's also not obvious they should live there.

I think we should move them, because there’s compelling value to moving them (together).

Rationale:

Given that we already have other test suites in WPT that cannot be run with the browser setting, and no WPT policy that prohibits adding more such test suites, at least it’s clear that there would be nothing blocking us from choosing to move the tokenizer tests into WPT along with the tree-construction tests.

Among the good reasons for doing that would be, if we’re going to move the tests at all, we should move the tokenizers tests together with the tree-construction tests — not split them up.

I guess it’s implicit that nobody would want to see the tests split up. So I guess the question comes down to whether we should move them into WPT or continue to keep them here.

I think it’s clear that one big advantage of moving them is that they’re more easily findable by the set of people who already knows about WPT or who are likely to discover WPT — which is a much bigger set than the people who already know about this repo or who are likely to find it.

And getting the tree-construction tests into WPT seems like an obvious big win that strongly argues for moving them out from here and into WPT. And if we want to keep the tests together, then that necessarily means moving the tokenizer tests along with the tree-construction tests.

@gsnedders
Copy link
Member Author

FWIW, I agree with @sideshowbarker here; I was trying to avoid leading anyone to any specific decision in the OP.

On the whole my inclination is to migrate this all over in a few weeks unless anyone objects.

@jgraham
Copy link
Member

jgraham commented May 27, 2020

I don't object.

@nolanw
Copy link
Contributor

nolanw commented May 27, 2020

I use the encoding tests in my lil toy parser. That's no reason to keep them, just wanted to mention that the work wasn't unused :)

I don't know what WPT is (I'm pretty out of the loop), so a handy link to wherever the tests end up would be awesome.

@sideshowbarker
Copy link
Contributor

@nolanw
Copy link
Contributor

nolanw commented May 28, 2020

@sideshowbarker Thank you!

@Ms2ger
Copy link
Contributor

Ms2ger commented May 28, 2020

Moving to wpt SGTM.

@craigbarnes
Copy link

If these tests are moved to WPT, does that imply changing the format at all?

@gsnedders
Copy link
Member Author

@craigbarnes no, it doesn't

@hsivonen
Copy link
Member

hsivonen commented Jun 1, 2020

Since the non-browser test harnesses for the Validator.nu HTML Parser use the present input formats and are more sensitive to format changes that, as I understand it, the html5lib harness, I'd prefer to avoid format changes and I'd like to keep the non-scripted tree construction tests clearly separate from the scripted ones.

I'm OK with moving these to the WPT repo if it's up to the WPT harness to consume the tests in their current format for in-browser testing.

@zcorpan
Copy link
Contributor

zcorpan commented Jun 4, 2020

I'm ok with moving to wpt.

@untitaker
Copy link
Contributor

untitaker commented Mar 1, 2022

these are tests that cannot be run within the browser setting (as the tokenizer isn't even exposed)

not to derail the thread too much, but couldn't you "run" them in the browser by:

  1. constructing tree/dom from the raw input of the test using browser
  2. constructing tree/dom from the expected output tokens of the test using some unrelated, non-browser "reference" tree builder implementation
  3. compare the resulting doms

@gsnedders
Copy link
Member Author

these are tests that cannot be run within the browser setting (as the tokenizer isn't even exposed)

not to derail the thread too much, but couldn't you "run" them in the browser by:

  1. constructing tree/dom from the raw input of the test using browser
  2. constructing tree/dom from the expected output tokens of the test using some unrelated, non-browser "reference" tree builder implementation
  3. compare the resulting doms

Assuming we have a correct reference, yes.

But this often doesn't actually handle a whole load of the tests (e.g., if the last token didn't actually change the tree in any way—like if it's most closing tags).

@gsnedders
Copy link
Member Author

#141 is perhaps the most interesting potential complexity here—various people have added some CI recently, partly so better pay attention to what line/character offsets parse errors occur at.

cc @fb55, @flavorjones given you're the other people recently active who maintain non-browser implementations

@fb55
Copy link
Contributor

fb55 commented Mar 30, 2023

I opened several PRs in this repo last year that ended up breaking some parser implementation due to aspects that parse5 doesn't use (usually error locations), prompting a fix after the PR was merged. That experience is quite negative, and having CI set up for relevant parsers would be a big win.

As this issue is about porting this repo to WPT: WPT can definitely run the CI jobs that were proposed, but given how small of a part parser tests would be, I can see how this can become an issue.

I do enjoy the simplicity of using the parser tests as a submodule, and how I can use dependabot to open a PR when something was merged. The availability of more reviewers might outweigh all downsides though.

@zcorpan
Copy link
Contributor

zcorpan commented Mar 31, 2023

Which parsers are the relevant parsers that we should cover in CI? (Edit: oh, this is #141 )

I think a benefit of having the test data in wpt is that it's easier for browser engineers to fix and add tests, given existing syncing infra with wpt. For the dependabot usecase, would it work to automatically import test data to this repo from wpt when the test data changes in wpt?

@fb55
Copy link
Contributor

fb55 commented Mar 31, 2023

For the dependabot usecase, would it work to automatically import test data to this repo from wpt when the test data changes in wpt?

Dependabot can update submodules automatically. Submodules are always for an entire repository, so a move to WPT would add a lot of noise and I'd have to find another solution. I like the convenience of the current setup, but care much more about html5lib-tests being maintained.

@zcorpan
Copy link
Contributor

zcorpan commented Mar 31, 2023

Yeah so if we move the test data to wpt but also set up automation in this repo to copy the test data back to this repo whenever the test data in wpt changes, it seems to me that it would work the same for downstream projects in terms of updating tests. It would only change where to contribute test changes (wpt instead of this repo).

@fb55
Copy link
Contributor

fb55 commented Mar 31, 2023

That makes a lot of sense to me. As a contributor, I'd still like to see downstream tests, but from a consumer's perspective I don't think moving the WPT would be much of an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants