Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Produce HTML documentation using unified/remark/rehype #21490

Closed
wants to merge 9 commits into from

Conversation

rubys
Copy link
Member

@rubys rubys commented Jun 23, 2018

First baby step towards implementing #21476

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are updated
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added doc Issues and PRs related to the documentations. tools Issues and PRs related to the tools directory. labels Jun 23, 2018
@rubys
Copy link
Member Author

rubys commented Jun 23, 2018

Replacing the remainder of calls to marked in tools/doc/html.js would be straightforward; basically remark/unified is a pipeline of filters, and depending on where you insert a processor, you have access to raw input, a tree representing the markdown, a tree representing the HTML, and the resulting HTML as a string. Processors can modify the data (as is done in this commit by adding class attributes to the TOC links). The resulting code should be more robust as we have the opportunity to deal with the data in tree form as opposed to attacking strings with regular expressions.

tools/doc/json.js in theory should be simpler as we can deal with the markdown in tree form, and converting a tree to into another tree, and then stringifying that tree as JSON.

The coding parts I'm comfortable with. It is the process by which code gets merged into the baseline that I'm interested in. I figured working on a documentation tool would be a good way to get started.

@@ -24,6 +24,11 @@
const common = require('./common.js');
const fs = require('fs');
const marked = require('marked');
var unified = require('unified');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed and squashed.

@rubys rubys force-pushed the process_gtoc_with_remark branch from 515edb5 to 9b5344b Compare June 23, 2018 22:04
@Trott
Copy link
Member

Trott commented Jun 23, 2018

From within tools/doc, you might want to run npx dmn -f clean to see if it removes a few files.

tools/license-build.sh will need an update before this can land.

I'm going to add the in progress label on this.

If we can get all the code changes in that would result in marked being removed, that would help us make a decent evaluation as to trade-off. But that may be asking a lot.

@Trott Trott added the wip Issues and PRs that are still a work in progress. label Jun 23, 2018
@rubys
Copy link
Member Author

rubys commented Jun 23, 2018

From within tools/doc, you might want to run npx dmn -f clean to see if it removes a few files.

Unlikely at this point as I have only added dependencies so far. Removing comes later.

tools/license-build.sh will need an update before this can land.

Please explain?

I'm going to add the in progress label on this.

Cool.

If we can get all the code changes in that would result in marked being removed, that would help us make a decent evaluation as to trade-off. But that may be asking a lot.

It is possible that I will be able to complete this tomorrow, but if not I don't expect to have much time Monday through Wednesday, but I should be able to wrap this up by next weekend.

I'm happiest if changes are merged incrementally, but that may be asking a lot. I will say that this change should be easy to revert should it come to that. If not, should I have separate branches, or a single branch for the total set of changes?

@rubys
Copy link
Member Author

rubys commented Jun 23, 2018

tools/license-build.sh will need an update before this can land.

Please explain?

Nevermind. Looks straightforward enough.

@Trott
Copy link
Member

Trott commented Jun 24, 2018

Unlikely at this point as I have only added dependencies so far. Removing comes later.

npx dmn -f clean will remove these files:

	deleted:    tools/doc/node_modules/define-properties/.editorconfig
	deleted:    tools/doc/node_modules/define-properties/.eslintrc
	deleted:    tools/doc/node_modules/define-properties/.jscs.json
	deleted:    tools/doc/node_modules/define-properties/.npmignore
	deleted:    tools/doc/node_modules/define-properties/.travis.yml
	deleted:    tools/doc/node_modules/define-properties/CHANGELOG.md
	deleted:    tools/doc/node_modules/define-properties/test/index.js
	deleted:    tools/doc/node_modules/extend/.eslintrc
	deleted:    tools/doc/node_modules/extend/.jscs.json
	deleted:    tools/doc/node_modules/extend/.npmignore
	deleted:    tools/doc/node_modules/extend/.travis.yml
	deleted:    tools/doc/node_modules/extend/CHANGELOG.md
	deleted:    tools/doc/node_modules/extend/component.json
	deleted:    tools/doc/node_modules/foreach/.npmignore
	deleted:    tools/doc/node_modules/foreach/component.json
	deleted:    tools/doc/node_modules/foreach/test.js
	deleted:    tools/doc/node_modules/is-nan/.eslintrc
	deleted:    tools/doc/node_modules/is-nan/.jscs.json
	deleted:    tools/doc/node_modules/is-nan/.npmignore
	deleted:    tools/doc/node_modules/is-nan/.travis.yml
	deleted:    tools/doc/node_modules/is-nan/CHANGELOG.md
	deleted:    tools/doc/node_modules/is-nan/test/index.js
	deleted:    tools/doc/node_modules/is-nan/test/shimmed.js
	deleted:    tools/doc/node_modules/is-nan/test/tests.js
	deleted:    tools/doc/node_modules/kebab-case/.editorconfig
	deleted:    tools/doc/node_modules/kebab-case/.npmignore
	deleted:    tools/doc/node_modules/kebab-case/.travis.yml
	deleted:    tools/doc/node_modules/kebab-case/test.js
	deleted:    tools/doc/node_modules/marked/.npmignore
	deleted:    tools/doc/node_modules/marked/.travis.yml
	deleted:    tools/doc/node_modules/marked/Gulpfile.js
	deleted:    tools/doc/node_modules/marked/bower.json
	deleted:    tools/doc/node_modules/marked/component.json
	deleted:    tools/doc/node_modules/marked/doc/broken.md
	deleted:    tools/doc/node_modules/marked/doc/todo.md
	deleted:    tools/doc/node_modules/mdurl/CHANGELOG.md
	deleted:    tools/doc/node_modules/object-keys/.editorconfig
	deleted:    tools/doc/node_modules/object-keys/.eslintrc
	deleted:    tools/doc/node_modules/object-keys/.jscs.json
	deleted:    tools/doc/node_modules/object-keys/.travis.yml
	deleted:    tools/doc/node_modules/object-keys/CHANGELOG.md
	deleted:    tools/doc/node_modules/object-keys/test/index.js
	deleted:    tools/doc/node_modules/trim/.npmignore
	deleted:    tools/doc/node_modules/trim/History.md
	deleted:    tools/doc/node_modules/trim/component.json
	deleted:    tools/doc/node_modules/x-is-array/.npmignore
	deleted:    tools/doc/node_modules/x-is-array/.travis.yml
	deleted:    tools/doc/node_modules/x-is-array/test/index.js
	deleted:    tools/doc/node_modules/x-is-string/.npmignore
	deleted:    tools/doc/node_modules/x-is-string/.travis.yml
	deleted:    tools/doc/node_modules/x-is-string/test/index.js
	deleted:    tools/doc/node_modules/xtend/.jshintrc
	deleted:    tools/doc/node_modules/xtend/.npmignore
	deleted:    tools/doc/node_modules/xtend/test.js

@Trott
Copy link
Member

Trott commented Jun 24, 2018

It is possible that I will be able to complete this tomorrow, but if not I don't expect to have much time Monday through Wednesday, but I should be able to wrap this up by next weekend.

Do it at whatever pace suits you. Having just one markdown processor in our dependencies is a nice-to-have for sure, but there's certainly no deadline or anything.

Also, be aware that there's always the chance it won't get merged. There may be reasons for using marked that I'm unaware or, or shortcomings of remark that I don't know about. Just mentioning this now because I don't want you to do a ton of work, have it ultimately not merged, and feel like that was never broached as a possibility. :-D

I'm happiest if changes are merged incrementally, but that may be asking a lot. I will say that this change should be easy to revert should it come to that. If not, should I have separate branches, or a single branch for the total set of changes?

Maybe one PR/branch but multiple atomic/incremental commits? Just as long as every commit contains a fully working system. No commit should be broken, of course.

@@ -36,13 +41,31 @@ marked.setOptions({ renderer });

const docPath = path.resolve(__dirname, '..', '..', 'doc');

// add class attributes to index navigation links
Copy link
Contributor

@vsemozhetbyt vsemozhetbyt Jun 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually use a first uppercase letter and a period in the end in comments lately)

// Add class attributes to index navigation links.

@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented Jun 24, 2018

I've checked the diff between the results of this PR and the current master and it seems we have no significant changes so far, just these nits:

  1. The order of attributes inside <a> elements is changed (class after or before href).
  2. HTML entities format sometimes is different (&amp; vs &#x26;).
  3. Some empty lines are deleted between elements.

@TimothyGu
Copy link
Member

Is there a way to reuse the remark modules that are already included in our source tree?

@rubys
Copy link
Member Author

rubys commented Jun 24, 2018

Hit a speed bump. I've converted the code to the point where HTML is produced that resembles the input (always a good sign!) for all but one of the files. That file is all.md. Processing that file runs out of heap space.

A small test case: https://gist.github.com/rubys/a061064e0744a5792b3e8331b0dae37d

The file in question is 52K lines. The problem is the size - processing either the first 26K or the last 26K lines works just fine.

Possible solutions include increasing the heap size; processing this one file in chunks, building this file using other means, or aborting this effort.

@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented Jun 24, 2018

BTW, all.html generation is currently broken in some way, so it would be good to find any solution for this issue during experiments with other markdown modules.

@rubys rubys changed the title Process gtoc with remark Produce HTML documentation using unified/remark/rehype Jun 25, 2018
@rubys
Copy link
Member Author

rubys commented Jun 25, 2018

There appears to be very few LICENSE files in the node_modules themselves:

~/git/node/tools/doc/node_modules$ find . | grep -i license  | egrep 'unified|remark|rehype'
./unified/LICENSE
./remark/node_modules/unified/LICENSE
./remark/node_modules/vfile/LICENSE
./remark-rehype/LICENSE
./rehype-raw/LICENSE

@Trott
Copy link
Member

Trott commented Jun 25, 2018

This is fantastic so far. Thanks for getting it moving so far so fast. I don't know what the right answer is to the 52K-line heap-exhaustion dilemma, but processing it in chunks seems like a promising idea given that the all.md file itself is a bunch of @includes of other .md files. Additionally, processing it one @include at a time would seem to make it possible to fix the issue noted by @vsemozhetbyt, since the origin of that problem is naming collisions between files.

@vsemozhetbyt
Copy link
Contributor

It seems the cause of the issue is a collision between link references: now we join sources first and then parse/convert them, so all homonymous links are rewritten by the last one. If we could parse/convert the sources first and then join them, the issue may be fixed.

@rubys
Copy link
Member Author

rubys commented Jun 25, 2018

@vsemozhetbyt perhaps something like this: https://gist.github.com/rubys/4a140b5b61eab4cea34aca56098e900e ? Seems to be able to handle the full document without running out of heap space. Perhaps it could be cleaned up and landed separately as a fix for #20100 .

@@ -7,7 +7,15 @@
"node": ">=6"
},
"dependencies": {
"marked": "^0.3.5"
"jsdom": "^11.11.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this dependency intended for this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Fixed.

@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented Jun 28, 2018

What we also need to estimate is doc building time with various toolchains, as doc building is run on each CI and local tests. Currently on my machine, master with marked vs this PR branch with unified/remark/rehype:

real    0m46.983s

real    1m32.970s

I.e. this approach is twice as slow. If we use it also for JSON generation, it can be three times as slow. However, it can be faster (or slower) with another all.md processing (as in #21568 or #21544).

@rubys
Copy link
Member Author

rubys commented Jun 28, 2018

Some performance data:

{ gtoc: 31,
  parse: 207,
  firstHeader: 1,
  preprocessText: 8,
  preprocessElements: 131,
  buildToc: 46,
  remark2rehype: 26,
  raw: 262,
  serialize: 41 }

See https://gist.github.com/rubys/4798e9c5418814c8810b295dfb5cbbaa for how I instrumented the code. Here's the command I used:

node tools/doc/generate.js --node-version=v11.0.0 --format=html --analytics= doc/api/fs.md > /dev/null

I picked fs.md as it was the largest. The calls to .bind{} are because unified seems to optimize out multiple calls to the same function, so I made new functions.

The biggest times are for parsing HTML (raw) followed by parsing markdown (parse). Together, they make up 62% of the time.

I haven't looked closely at the JSON processing yet, but given that there is but a single occurrence of lexed.forEach((tok), it seems likely to me that it could be implemented as a single Unified plugin -- a plugin that doesn't modify the tree, but does write to a side file. If so, this plugin could be fed the same parsed tree structure; meaning that a single call to generate could produce both html and json output.

This one change won't eliminate the time difference between marked and remark tool chains, but would significantly reduce it.

Copy link
Contributor

@vsemozhetbyt vsemozhetbyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only started to learn the toolchain and the new html.js, so this is just some nits so far) I will continue on Saturday if I find some more time)

function navClasses() {
return walker;

function walker(node, file) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we do not use the file argument, maybe we can drop it here and in the recursive calls?

.use(markdown)
.use(firstHeader)
.use(preprocessText)
.use(preprocessElements, { filename: filename })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just { filename } here and below?

.replace(/__VERSION__/g, nodeVersion)
.replace('__TOC__', toc)
.replace('__TOC__', content.toc)
.replace('__CONTENT__', content.toString())
Copy link
Contributor

@vsemozhetbyt vsemozhetbyt Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this is significant, but is it worth to make this change at the very end? We have some replacing below, and content.toString() is the biggest insertion here. Would the __GTOC__, analytics and docCreated insertions be faster if it is made before __CONTENT__ replacement? Because this will make .replace() calls operate on a smaller string.

@vsemozhetbyt
Copy link
Contributor

I've read the rest of html.js, LGTM. If #21568 is incorporated and this approach is preferred to the #21081, this may be landable (JSON generation should also be considered though). But we may need to wait for more collaborators chime in.

@vsemozhetbyt
Copy link
Contributor

@rubys
Copy link
Member Author

rubys commented Jun 30, 2018

If #21568 is incorporated, I'll clean up and complete this pull request (for example, the increase in heap size can be removed). I'd rather wait on the JSON generation until a decision is made on remark vs marked, but if this one is approved, I will start on the JSON conversion.

@Trott
Copy link
Member

Trott commented Jul 1, 2018

Should the in progress label be removed?

Trott pushed a commit to Trott/io.js that referenced this pull request Jul 20, 2018
PR-URL: nodejs#21490
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
@Trott
Copy link
Member

Trott commented Jul 20, 2018

Landed in 9466865. Fantastic work, @rubys!!!! 🎉

@Trott Trott closed this Jul 20, 2018
@Trott
Copy link
Member

Trott commented Jul 20, 2018

make test-doc and make doc-only now leave some cruft around in the source tree that git does not ignore. Not the end of the world or anything, but if someone feels motivated to clean that up in a subsequent PR, 👍.

@rubys
Copy link
Member Author

rubys commented Jul 20, 2018

I'll try to clean it up as a part of #21697, which is now unblocked.

@vsemozhetbyt

This comment has been minimized.

rubys added a commit to rubys/node that referenced this pull request Jul 21, 2018
Markdown interprets the syntax for optional arguments as a form of a link,
so instead of trying to build up the contents using the node value, use
grab the raw original markup instead.

Fixes regression noted in nodejs#21490 (comment)
vsemozhetbyt pushed a commit that referenced this pull request Jul 21, 2018
Markdown interprets the syntax for optional arguments as a form
of a link, so instead of trying to build up the contents using
the node value, use grab the raw original markup instead.

Fixes regression noted in
#21490 (comment)

PR-URL: #21922
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
targos pushed a commit that referenced this pull request Jul 21, 2018
PR-URL: #21490
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
targos pushed a commit that referenced this pull request Jul 21, 2018
Markdown interprets the syntax for optional arguments as a form
of a link, so instead of trying to build up the contents using
the node value, use grab the raw original markup instead.

Fixes regression noted in
#21490 (comment)

PR-URL: #21922
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented Jul 22, 2018

@rubys Another breaking change: in headings, some optional arguments are linkified and next arguments disappear.

Compare, for example, this doc heading before this PR landed and in the last Nightly (the last one is after the #21922 has been landed so that PR has not fixed this issue).

rubys added a commit to rubys/node that referenced this pull request Jul 22, 2018
ensure optional parameters are not treated as markedown links by
replacing the children of headers nodes with a single text node
containing the raw markup;

Fixes issue identified in nodejs#21490 (comment)
@rubys rubys mentioned this pull request Jul 22, 2018
2 tasks
vsemozhetbyt pushed a commit that referenced this pull request Jul 25, 2018
ensure optional parameters are not treated as markedown links by
replacing the children of headers nodes with a single text node
containing the raw markup;

Fixes issue identified in
#21490 (comment)

PR-URL: #21936
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Minwoo Jung <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
targos pushed a commit that referenced this pull request Jul 26, 2018
ensure optional parameters are not treated as markedown links by
replacing the children of headers nodes with a single text node
containing the raw markup;

Fixes issue identified in
#21490 (comment)

PR-URL: #21936
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Minwoo Jung <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
@targos targos mentioned this pull request Jul 31, 2018
@refack refack mentioned this pull request Aug 19, 2018
2 tasks
refack added a commit to refack/node that referenced this pull request Aug 23, 2018
* remove obsolete `node_modules/js-yaml/package.json` target
* remove `@touch` since `npm ci` is always destructive

PR-URL: nodejs#22399
Refs: nodejs#21802
Refs: nodejs#21490
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
Reviewed-By: Sam Ruby <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Daijiro Wachi <[email protected]>
targos pushed a commit that referenced this pull request Aug 24, 2018
* remove obsolete `node_modules/js-yaml/package.json` target
* remove `@touch` since `npm ci` is always destructive

PR-URL: #22399
Refs: #21802
Refs: #21490
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
Reviewed-By: Sam Ruby <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Daijiro Wachi <[email protected]>
targos pushed a commit that referenced this pull request Sep 3, 2018
* remove obsolete `node_modules/js-yaml/package.json` target
* remove `@touch` since `npm ci` is always destructive

PR-URL: #22399
Refs: #21802
Refs: #21490
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
Reviewed-By: Sam Ruby <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Daijiro Wachi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Issues and PRs related to the documentations. tools Issues and PRs related to the tools directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.