docgen: Optimize README update script #18840

Merged
merged 17 commits into from
Mar 26, 2020

aduth
Member

@aduth aduth commented Nov 30, 2019

Previously: #15679, #15200

This pull request seeks to optimize and refactor the bin/update-readmes.js script. The resulting changes should decrease the runtime of this script by 90% or more for typical usage: from a baseline of 6.65s down to 0.6s for a lint-staged run touching a single package, or to an average of 1.275s for a complete run.

The goal here was largely to reduce the delay in pre-commit tasks.

Implementation Notes:

Influenced by previous optimizations to the Gutenberg build script in #15230, the approach here uses a fast-glob stream, allowing compilation to begin even before all files are known.
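As a rough illustration of the streaming idea, the sketch below starts work on each file as soon as it is discovered instead of waiting for the full list. It does not use fast-glob: `discoverReadmes` is a hypothetical async generator standing in for the glob stream, and `processReadme` is a hypothetical placeholder for the per-file work.

```javascript
// Hypothetical stand-in for a glob stream: yields paths one at a time,
// the way a directory walk would discover them.
async function* discoverReadmes( paths ) {
	for ( const path of paths ) {
		// Yield to the event loop to mimic asynchronous file discovery.
		await new Promise( ( resolve ) => setImmediate( resolve ) );
		yield path;
	}
}

// Hypothetical per-file work; the actual script would invoke docgen here.
async function processReadme( path ) {
	return `processed ${ path }`;
}

async function run( paths ) {
	const pending = [];
	for await ( const path of discoverReadmes( paths ) ) {
		// Start the work immediately, without waiting for discovery to end.
		pending.push( processReadme( path ) );
	}
	// Resolve once every started task has completed.
	return Promise.all( pending );
}
```

The promise from `processReadme` is deliberately not awaited inside the loop, so processing overlaps with the rest of the discovery.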

Like in #15200, the process is now again asynchronous, but incorporates necessary revisions from #15679 to ensure that tokens within a single file can only be replaced in sequence, not in parallel, to avoid conflict.
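The sequencing constraint can be sketched as follows, under the assumption that each token replacement is an asynchronous operation on a file. `replaceToken` is a hypothetical stand-in for the docgen invocation; it only records start and end events so that the ordering is visible.

```javascript
// Hypothetical stand-in for running docgen on one token of one file.
async function replaceToken( file, token, log ) {
	log.push( `start ${ file }:${ token }` );
	await new Promise( ( resolve ) => setTimeout( resolve, 5 ) );
	log.push( `end ${ file }:${ token }` );
}

async function updateFile( file, tokens, log ) {
	// Sequential: each token waits for the previous one on the same file,
	// so two writers never touch the same document at once.
	for ( const token of tokens ) {
		await replaceToken( file, token, log );
	}
}

async function updateAll( tokensByFile ) {
	const log = [];
	// Parallel: all files begin processing at once.
	await Promise.all(
		Object.entries( tokensByFile ).map( ( [ file, tokens ] ) =>
			updateFile( file, tokens, log )
		)
	);
	return log;
}
```

Calling `updateAll( { 'a.md': [ 'actions', 'selectors' ], 'b.md': [ 'docs' ] } )` starts both files right away, while the second token of `a.md` only starts after the first has ended.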

It avoids a hard-coded list in favor of reading in the contents of discovered README files to determine whether replacement tokens exist. A "new" syntax was incorporated for files which contain multiple tokens that source from separate files (see "Autogenerated actions", "Autogenerated selectors"). This is really more a convention considered by update-readmes.js, and doesn't require any revisions to the docgen tool.
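A minimal sketch of how tokens might be discovered in a README's contents. It assumes the START marker mirrors the END TOKEN form that appears in this pull request's diff, with an optional file reference after a pipe. The `findTokens` name and the returned shape are illustrative, not the script's actual API.

```javascript
// Matches "<!-- START TOKEN(Name) -->" and "<!-- START TOKEN(Name|path) -->".
// The START form is an assumption based on the END TOKEN comments in the diff.
const TOKEN_PATTERN = /<!-- START TOKEN\(([^|)]+)(?:\|([^)]+))?\) -->/g;

// Hypothetical helper: lists every token in the given README contents,
// along with its file reference when one is present.
function findTokens( contents ) {
	const tokens = [];
	for ( const match of contents.matchAll( TOKEN_PATTERN ) ) {
		tokens.push( { token: match[ 1 ], path: match[ 2 ] || null } );
	}
	return tokens;
}
```

A README without any match yields an empty array, which is how a file can be skipped without consulting a hard-coded list.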

Finally, since lint-staged will already provide the staged files as arguments to the script, we can use this to filter packages to those where modifications have occurred.
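For illustration, filtering by staged files could look roughly like this. The `packages/<name>/...` layout is taken from the paths in this thread; `getChangedPackages` is a hypothetical helper name, not necessarily what the script uses.

```javascript
// Returns the names of the packages touched by the given file paths,
// assuming a packages/<name>/... layout. Both relative and absolute paths
// are accepted, with either slash direction.
function getChangedPackages( files ) {
	const packages = new Set();
	for ( const file of files ) {
		const match = /(?:^|[\\\/])packages[\\\/]([^\\\/]+)[\\\/]/.exec( file );
		if ( match ) {
			packages.add( match[ 1 ] );
		}
	}
	return packages;
}

// lint-staged appends the staged file paths to the command it runs, so they
// arrive as ordinary command-line arguments.
const changedPackages = getChangedPackages( process.argv.slice( 2 ) );
```

When no files are passed (a manual full run), the set is empty, which a caller could treat as "process every package".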

Testing Instructions:

Verify that changes to a file covered by docgen will update the corresponding README, including those with multiple or custom "Autogenerated" tokens (e.g. core-data).

Verify that the behavior of lint-staged is not regressed. You can do this by editing a JavaScript file, staging it (git add), then running npx lint-staged.

@aduth aduth added [Type] Build Tooling Issues or PRs related to build tooling [Type] Performance Related to performance efforts [Tool] Docgen /packages/docgen labels Nov 30, 2019
@aduth aduth requested a review from oandregal November 30, 2019 04:18
@oandregal
Member

oandregal commented Dec 4, 2019

The performance gains that something like this could yield are great! Thanks for working on this.

A couple of things:

  • This doesn't work with node latest v10 LTS (it does with v12 LTS). Executing the script directly node ./bin/update-readmes.js yields some error messages that can help investigate.
  • I'd suggest testing with the worst-case scenario: that both the package's README and the handbook's data API docs need to be updated. Are the changes in bin/update-readmes.js portable to ./docs/tool/update-data.js?

Testing instructions for the worst-case scenario

# edit the JSDoc comment of an exported entity in `packages/core-data/src/actions.js`
git add packages/core-data/src/actions.js
git commit -m 'testing'

The expected result is that:

  • two files should have been updated: packages/core-data/README.md and docs/designers-developers/developers/data/data-core.md.
  • the exit code is 1 (echo $? gives you the exit code of the last command).

bin/update-readmes.js (outdated)
// Each target operates over the same file, so it needs to be processed synchronously,
// as to make sure the processes don't overwrite each other.
const { status, stderr } = spawnSync(
join( __dirname, '..', 'node_modules', '.bin', 'docgen' ).replace( / /g, '\\ ' ),
Member Author

@aduth aduth Dec 4, 2019

Note: The changes here remove the String#replace fix which was introduced by @MarkMarzeotti in #18253 to improve Windows support.

My reading of the documentation of execa leads me to believe that this should be handled by default in the library, as well as potential other issues we could have with Windows support:

Node has issues when using spawn on Windows:

[...]

  • Has problems running commands with spaces

https://github.com/sindresorhus/execa#why
https://github.com/moxystudio/node-cross-spawn#why

@aduth
Member Author

aduth commented Dec 4, 2019

  • This doesn't work with node latest v10 LTS (it does with v12 LTS). Executing the script directly node ./bin/update-readmes.js yields some error messages that can help investigate.

Well, my understanding is that we target the current LTS for support. This was documented previously, but apparently was removed as part of the changes in #17004. I'm not really sure why, because it leaves some ambiguity as to what we support (commented at #17004 (comment)).

Personally, I think we should restore this documented expectation, and not care to support v10 and earlier.

@aduth
Member Author

aduth commented Dec 4, 2019

  • I'd suggest testing with the worst-case scenario: that both the package's README and the handbook's data API docs need to be updated. Are the changes in bin/update-readmes.js portable to ./docs/tool/update-data.js?

I wasn't aware of this other script. It seems it's not updated correctly, as per your suspicion. Looking at the code, I'd guess that at the very least the keys of "Autogenerated selectors" would need to be updated, or else we would need to adapt into update-data.js changes similar to those applied here to update-readmes.js.

@oandregal
Member

oandregal commented Dec 5, 2019

Personally, I think we should restore this documented expectation, and not care to support v10 and earlier.

I'm fine with this. However, I wanted to note that, at the moment, Node has two active LTS lines and v10 will be LTS until April 2020. I don't know why they do this, and I also don't see why we wouldn't upgrade to v12, given that it's easy to install on any system with an nvm/n kind of setup. I wanted to share this anyway so as not to leave anyone behind.

@aduth
Member Author

aduth commented Dec 6, 2019

However, I wanted to note that, at the moment, Node has two active LTS lines and v10 will be LTS until April 2020.

Followed-up at #18923 (comment).

That's a good observation. In the updated documentation in #18923, I suggest we note this as "latest active LTS", especially when we're already promoting use of nvm which will install on this basis anyways.

@aduth
Member Author

aduth commented Dec 6, 2019

On the topic of docs/tool/index.js, I would really like it if we could find some way to consolidate these into a single script. I'm still trying to think through how that might look. Do you have any thoughts on this?

I think the use of "Autogenerated actions" and "Autogenerated selectors" could serve as some sort of distinction of what should qualify as data documentation, but there are a few edge cases to consider:

  • We include core-data action/selector documentation in its README.md
  • We effectively rename the core-data package to "core" for its data documentation as a hard-coded exception

@oandregal oandregal force-pushed the update/fast-update-readmes branch from 3958615 to 21963bf Compare December 19, 2019 12:23
@oandregal
Member

@aduth now that #18820 has landed, I've rebased this in an attempt to ease the burden on you. I hope it's useful, but I still have the old branch locally, in case you would rather have it back.

This doesn't work as expected as per the instructions at #18840 (comment) (readme package docs aren't updated). We can port the changes to docs/tool/update-data when it does.

@aduth
Member Author

aduth commented Dec 20, 2019

@nosolosw Thanks for helping out with the refresh! I haven't had a chance yet to review the updates in much detail, but it looks good at a glance.

A few things I'd like to do here:

  • Eliminate bin/api-docs/packages.js
  • Eliminate docs/tool/update-data.js
    • I'm thinking this can be collapsed into the existing behavior of update-readmes.js, where the file glob is updated to consider both package READMEs and this data documentation, and the data documentation can leverage the "file reference" syntax introduced here as a means to point to the actions/selectors files from where the documentation is to be extracted.

@github-actions

github-actions bot commented Mar 21, 2020

Size Change: 0 B

Total Size: 860 kB

ℹ️ View Unchanged
Filename Size Change
build/a11y/index.js 998 B 0 B
build/annotations/index.js 3.43 kB 0 B
build/api-fetch/index.js 3.39 kB 0 B
build/autop/index.js 2.58 kB 0 B
build/blob/index.js 620 B 0 B
build/block-directory/index.js 6.02 kB 0 B
build/block-directory/style-rtl.css 760 B 0 B
build/block-directory/style.css 760 B 0 B
build/block-editor/index.js 102 kB 0 B
build/block-editor/style-rtl.css 11 kB 0 B
build/block-editor/style.css 11 kB 0 B
build/block-library/editor-rtl.css 7.22 kB 0 B
build/block-library/editor.css 7.23 kB 0 B
build/block-library/index.js 110 kB 0 B
build/block-library/style-rtl.css 7.44 kB 0 B
build/block-library/style.css 7.45 kB 0 B
build/block-library/theme-rtl.css 669 B 0 B
build/block-library/theme.css 671 B 0 B
build/block-serialization-default-parser/index.js 1.65 kB 0 B
build/block-serialization-spec-parser/index.js 3.1 kB 0 B
build/blocks/index.js 57.5 kB 0 B
build/components/index.js 191 kB 0 B
build/components/style-rtl.css 15.8 kB 0 B
build/components/style.css 15.7 kB 0 B
build/compose/index.js 6.21 kB 0 B
build/core-data/index.js 10.6 kB 0 B
build/data-controls/index.js 1.04 kB 0 B
build/data/index.js 8.25 kB 0 B
build/date/index.js 5.37 kB 0 B
build/deprecated/index.js 771 B 0 B
build/dom-ready/index.js 568 B 0 B
build/dom/index.js 3.06 kB 0 B
build/edit-post/index.js 91.2 kB 0 B
build/edit-post/style-rtl.css 8.47 kB 0 B
build/edit-post/style.css 8.46 kB 0 B
build/edit-site/index.js 6.72 kB 0 B
build/edit-site/style-rtl.css 2.88 kB 0 B
build/edit-site/style.css 2.88 kB 0 B
build/edit-widgets/index.js 4.43 kB 0 B
build/edit-widgets/style-rtl.css 2.58 kB 0 B
build/edit-widgets/style.css 2.58 kB 0 B
build/editor/editor-styles-rtl.css 428 B 0 B
build/editor/editor-styles.css 431 B 0 B
build/editor/index.js 43.8 kB 0 B
build/editor/style-rtl.css 4 kB 0 B
build/editor/style.css 3.98 kB 0 B
build/element/index.js 4.44 kB 0 B
build/escape-html/index.js 733 B 0 B
build/format-library/index.js 6.95 kB 0 B
build/format-library/style-rtl.css 502 B 0 B
build/format-library/style.css 502 B 0 B
build/hooks/index.js 1.93 kB 0 B
build/html-entities/index.js 622 B 0 B
build/i18n/index.js 3.49 kB 0 B
build/is-shallow-equal/index.js 710 B 0 B
build/keyboard-shortcuts/index.js 2.3 kB 0 B
build/keycodes/index.js 1.69 kB 0 B
build/list-reusable-blocks/index.js 2.99 kB 0 B
build/list-reusable-blocks/style-rtl.css 226 B 0 B
build/list-reusable-blocks/style.css 226 B 0 B
build/media-utils/index.js 4.84 kB 0 B
build/notices/index.js 1.57 kB 0 B
build/nux/index.js 3.01 kB 0 B
build/nux/style-rtl.css 616 B 0 B
build/nux/style.css 613 B 0 B
build/plugins/index.js 2.54 kB 0 B
build/primitives/index.js 1.5 kB 0 B
build/priority-queue/index.js 781 B 0 B
build/redux-routine/index.js 2.84 kB 0 B
build/rich-text/index.js 14.5 kB 0 B
build/server-side-render/index.js 2.55 kB 0 B
build/shortcode/index.js 1.7 kB 0 B
build/token-list/index.js 1.27 kB 0 B
build/url/index.js 4.01 kB 0 B
build/viewport/index.js 1.61 kB 0 B
build/warning/index.js 1.14 kB 0 B
build/wordcount/index.js 1.18 kB 0 B

compressed-size-action

@aduth
Member Author

aduth commented Mar 21, 2020

I've refreshed this pull request. As I mentioned in previous comments, I was able to absorb all of the functionality of data and package documentation to a single script and eliminate the hard-coded listing of packages.

Average run times are ~1.8s for a full run (down from ~6.3s). However, I expect the most common usage will only need to regenerate documentation for a single package (via pre-commit arguments), where the average run time is more in the range of ~0.7s.

aduth added a commit that referenced this pull request Mar 21, 2020
@oandregal
Member

I'm reviewing this (and pushed a path fix).

Member

@oandregal oandregal left a comment

Very excited about the performance gains and simplification this brings in: in my testing, it takes 1/3 the time it took to validate and generate the files. Thanks for your work, Andrew!

@aduth
Member Author

aduth commented Mar 25, 2020

@sirreal This may be an interesting case to consider for your work in #18942. The changes here seek to opt-in the bin/ file to type-checking. In #18942, we're removing this configuration, effectively only allowing packages to be type-checked.

Can you imagine there to be any compromise? Should we just accept that these scripts won't be type-checked? Would individual top-of-file // @ts-check flags suffice?
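For reference, a per-file opt-in would look something like the sketch below. The `// @ts-check` directive asks TypeScript tooling to type-check a plain JavaScript file using its JSDoc annotations; `isPackageFile` is a hypothetical helper invented only for this illustration.

```javascript
// @ts-check

/**
 * Hypothetical helper, typed through JSDoc only.
 *
 * @param {string} file Path to a source file.
 * @return {boolean} Whether the path is inside the packages/ directory.
 */
function isPackageFile( file ) {
	return file.startsWith( 'packages/' );
}
```

With the directive in place, a call such as `isPackageFile( 42 )` would be reported as a type error by the checker, while the file remains ordinary JavaScript at runtime.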

<!-- END TOKEN(Autogenerated actions) -->

<!-- END TOKEN(Autogenerated actions|../../../../packages/core-data/src/actions.js) -->
Member Author

Full transparency: I'm not entirely sure that the extra newline here would be expected. And, if I recall correctly, these were some of the symptoms we saw in the earlier parallelization efforts that ultimately led to their revert. But when I debugged the code flow of the documentation build script, it was always the case that no two docgen processes would run on the same document at the same time, which was the problem we had in the earlier implementation.

Member

This is something that I've looked at and tested specifically by modifying core-data's selectors and actions, whose changes should be reflected in the package's README and the handbook's data-core doc. It's working as expected: different files are processed asynchronously (good for performance), but different tokens within the same file are processed synchronously (required for docgen). So that's 👍

I don't have an answer for the extra line, though. It gives me confidence that it's consistent and doesn't depend on input order: note how the extra line is always added before the end of the second token for all files, while in the master implementation that extra line was always added at the end of the first token. In data-core.md the last item is the actions, while in README.md it is the selectors. I've also tested for things like input variance (with reversed input data, the output is the same):

node ./bin/api-docs/update-api-docs.js packages/core-data/src/selectors.js packages/core-data/src/actions.js

and

node ./bin/api-docs/update-api-docs.js packages/core-data/src/actions.js packages/core-data/src/selectors.js

produce the same results.

@aduth aduth force-pushed the update/fast-update-readmes branch from 4119f47 to 278ae89 Compare March 25, 2020 21:03
@aduth
Member Author

aduth commented Mar 25, 2020

Rebased, since there were failures related to yesterday's issues (#21118).

It's end of day for me, so I'll plan to merge this tomorrow assuming everything is in order, and to allow time for the build to pass and potential responses to #18840 (comment) and/or #18840 (comment). I'm not too worried if we just have to remove this from type-checking if it becomes a problem for #18942.

@sirreal
Member

sirreal commented Mar 25, 2020

@sirreal This may be an interesting case to consider for your work in #18942.

Thanks for bringing this up.

The changes here seek to opt-in the bin/ file to type-checking. In #18942, we're removing this configuration, effectively only allowing packages to be type-checked.

I don't believe anything in #18942 will be compatible with the changes here. The latest iterations in #18942 continue to include "index" into all the typed projects (not necessarily packages). Scripts in bin could be typechecked as well, either as part of the main build or independently.

gutenberg/tsconfig.json

Lines 2 to 13 in 31ea1e7

"references": [
	{ "path": "packages/a11y" },
	{ "path": "packages/blob" },
	{ "path": "packages/dom-ready" },
	{ "path": "packages/i18n" },
	{ "path": "packages/is-shallow-equal" },
	{ "path": "packages/priority-queue" },
	{ "path": "packages/project-management-automation" },
	{ "path": "packages/token-list" },
	{ "path": "packages/url" },
	{ "path": "packages/warning" }
],

@aduth
Member Author

aduth commented Mar 26, 2020

I don't believe anything in #18942 will be compatible with the changes here.

Just to clarify: Do you mean "incompatible"? The remainder of your comment would seem to suggest it'll be possible to include.

@aduth aduth merged commit 00c92c7 into master Mar 26, 2020
@aduth aduth deleted the update/fast-update-readmes branch March 26, 2020 12:22
@github-actions github-actions bot added this to the Gutenberg 7.9 milestone Mar 26, 2020
@sirreal
Member

sirreal commented Mar 26, 2020

I don't believe anything in #18942 will be compatible with the changes here.

Just to clarify: Do you mean "incompatible"? The remainder of your comment would seem to suggest it'll be possible to include.

🤦‍♂ Completely compatible, no incompatibilities 👍

#18942 has been rebased and includes this in its type build/check setup.

@aduth
Member Author

aduth commented Apr 17, 2020

Regarding 4179376: I think you may be correct in the current state of affairs that it's unnecessary. However, if I recall correctly, the reason I had implemented it this way initially was in anticipation of potential undesirable warnings / behavior, related to:

I'd need to dig into those resources again to fully understand the possible implications and whether we're "safe" without the try / catch, but just highlighting as the reason it was there in the first place.

It turns out that it is necessary 😅 See #21467 (comment). I'll be preparing a pull request shortly.

Labels
[Tool] Docgen /packages/docgen [Type] Build Tooling Issues or PRs related to build tooling [Type] Performance Related to performance efforts

3 participants