Output multiple resources #1
Refs #1. This represents the path segment of a URL in a "valid by construction" pattern. Any type using a `UrlPath` object knows that the `path` string starts with a slash and is consistent.
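A minimal sketch of what that "valid by construction" pattern might look like (the private-constructor-plus-factory shape here is an assumption, not the actual implementation):

```typescript
/**
 * Sketch of a "valid by construction" URL path. The private constructor
 * forces all instances through `of()`, which validates its input, so any
 * code holding a `UrlPath` can assume `path` starts with a slash.
 */
class UrlPath {
    private constructor(public readonly path: string) {}

    public static of(path: string): UrlPath {
        if (!path.startsWith('/')) {
            throw new Error(`URL path must start with a slash: "${path}".`);
        }
        return new UrlPath(path);
    }
}

console.log(UrlPath.of('/blog/my-post.html').path); // "/blog/my-post.html"
```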
Refs #1. This type holds a file's contents and its associated URL path. This will be leveraged by users to associate a particular path with its file contents, which `rules_prerender` tooling will generate. It is exported as public API because users will construct the object to pass to `rules_prerender` to generate it.

The file contents are expressed as an `ArrayBuffer` to support binary files without requiring a specific representation like `Uint8Array` would. `string` construction is supported, but bound to UTF-8 encoding, which is usually what users want. If users care about a specific encoding, they can easily convert a string to their own `ArrayBuffer` using their desired encoding and pass that in, so I don't think binding `string` inputs to UTF-8 will be a maintenance issue.

I opted for `ArrayBuffer` as the type representing binary files because `Blob` doesn't appear to be supported by NodeJS and I wanted to use something that is actually part of the web spec. `Buffer` is the obvious NodeJS choice, but I didn't see much value in binding to the NodeJS API when an open web specification alternative was just as usable. The downside of not using `Blob` is that file contents are forced to be in-memory and are not easily streamable. I think this is fine for now, but in the future we may want to switch to `Stream`, which would couple us to NodeJS, or to an `AsyncIterable<string | ArrayBuffer>`, which would be a bit less common but at least use standardized APIs.

The most awkward part here is that `Uint8Array` is structurally compatible with `ArrayBuffer` but is **not** a subclass of it. This means that the following compiles and runs without error:

```typescript
function normalize(input: ArrayBuffer | string): ArrayBuffer {
    if (input instanceof ArrayBuffer) return input;
    return new TextEncoder().encode(input);
}

console.log(normalize(new Uint8Array([ 0, 1, 2, 3 ])));
// Uint8Array(7) [ 48, 44, 49, 44, 50, 44, 51 ] - Wat?
```

Because `Uint8Array` is not a subclass of `ArrayBuffer`, the `instanceof` check returns `false`. `TextEncoder` happily encodes the value, effectively encoding the input as a UTF-8 string, when it was intended to be left as-is. This is particularly rough because the input comes from user code, so users are very likely to pass in a `Uint8Array`, and TypeScript won't complain. The best solution to this problem is to accept all `TypedArray` types as input, even though `rules_prerender` truly doesn't care about any of them and we only convert to an `ArrayBuffer` anyways. This unfortunately leaks into our public API and means we have to support another input type, but the alternative is a bunch of wasted debugging time for users, which I think is the worse option.
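As a sketch of that fix, widening the input type and checking for typed-array views explicitly avoids the mis-encoding (the real normalization code isn't shown here, so this exact signature is an assumption):

```typescript
function normalize(input: ArrayBuffer | ArrayBufferView | string): ArrayBuffer {
    if (ArrayBuffer.isView(input)) {
        // Typed arrays (and `DataView`) are views over a buffer, not
        // `ArrayBuffer` subclasses, so they need their own branch. Copying
        // respects the view's offset and length.
        const copy = new Uint8Array(input.byteLength);
        copy.set(new Uint8Array(input.buffer, input.byteOffset, input.byteLength));
        return copy.buffer;
    }
    if (input instanceof ArrayBuffer) return input;
    return new TextEncoder().encode(input).buffer;
}

console.log(new Uint8Array(normalize(new Uint8Array([ 0, 1, 2, 3 ]))));
// Uint8Array(4) [ 0, 1, 2, 3 ] - binary input preserved as-is.
```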
Refs #1. The `entry_point` module now allows dynamically imported user code to return an `Iterable<PrerenderResource>`, `Promise<Iterable<PrerenderResource>>`, or an `AsyncIterable<PrerenderResource>`. I considered allowing returning a single `PrerenderResource`, but this seems like an unnecessary expansion of the API surface, since it can be trivially wrapped in an array by user code. `renderer` is updated to disallow all the new `PrerenderResource` types that might be returned by `entry_point.invoke(/* ... */)`. This makes it functionally unchanged, though some error messages are different. I also updated these failure cases to print directly to `stderr` and fail rather than throwing an error. This hides the stack trace, which would just be noise to the user. A stack trace should only be shown for an internal error, not for bad user input. Because `renderer` undoes the change from `entry_point`, this commit is effectively a no-op. However, a future multi-renderer tool can re-use the `entry_point` module and simply allow `PrerenderResource` results.
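A sketch of how those three result shapes might be normalized into a single type (illustrative names only; the `PrerenderResource` interface below is a stand-in for the real type):

```typescript
// Stand-in for the real `PrerenderResource` type described above.
interface PrerenderResource {
    readonly path: string;
    readonly contents: ArrayBuffer;
}

type EntryPointResult =
    | Iterable<PrerenderResource>
    | Promise<Iterable<PrerenderResource>>
    | AsyncIterable<PrerenderResource>;

/** Normalizes any supported result shape into a single `AsyncIterable`. */
async function* normalizeResult(
    result: EntryPointResult,
): AsyncIterable<PrerenderResource> {
    if (Symbol.asyncIterator in Object(result)) {
        // Already an `AsyncIterable`, forward it directly.
        yield* result as AsyncIterable<PrerenderResource>;
    } else {
        // Either an `Iterable` or a `Promise<Iterable>`; `await` handles both.
        yield* await (result as Promise<Iterable<PrerenderResource>>);
    }
}
```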
Refs #1. This tool is roughly equivalent to the existing renderer tool in purpose; however, it allows user code to return `PrerenderResource` objects and will write all of them to disk under a single output directory. This effectively allows users to render multiple files at once rather than being limited to just one.

This tool is exported to users, though it has the same awkward caveats as the existing renderer. Because we dynamically import user code, we must compile user code into the `nodejs_binary()` which runs the tool. As a result, we can't publish a `nodejs_binary()` directly, but must instead build one as needed, including both multi-renderer code and user code in a single binary. This means the published version of the tool is not actually a `nodejs_binary()` as one might expect; instead it is just a `filegroup()` of the code required to run the tool. For simplicity we "round up" to all the code in the package, since it makes little difference to users, who have to install the whole package anyways, and ensures we won't forget to list a critical file.
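The core write loop of such a tool might look roughly like this (a sketch, not the actual implementation; `PrerenderResource` is again a stand-in):

```typescript
import { promises as fs } from 'fs';
import * as path from 'path';

// Stand-in for the real `PrerenderResource` type.
interface PrerenderResource {
    readonly path: string;
    readonly contents: ArrayBuffer;
}

/** Writes each rendered resource under `outputDir` at its URL path. */
async function writeResources(
    outputDir: string,
    resources: AsyncIterable<PrerenderResource>,
): Promise<void> {
    for await (const resource of resources) {
        // `resource.path` starts with a slash by construction, so joining it
        // onto the output directory nests it correctly.
        const outputPath = path.join(outputDir, resource.path);
        await fs.mkdir(path.dirname(outputPath), { recursive: true });
        await fs.writeFile(outputPath, new Uint8Array(resource.contents));
    }
}
```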
Refs #1. This is mostly a copy of `prerender_page()`, except that it outputs a directory of generated files rather than a single file. Currently, we don't have the infrastructure to generate entry points for scripts or styles within the context of multiple generated pages. For now, `prerender_multi_page()` just emits an aggregated set of scripts and styles with no specific entry point. Support for that will come later.
Refs #1. This is a simple example site which uses `prerender_multi_page()` to generate multiple HTML files in a single Bazel target. This only generates HTML and does not include any client-side scripts, styles, or other resources. Support for that will come later and the example will be updated accordingly.
Refs #1. This makes the `extract_annotations()` macro implementation re-usable between `prerender_page()` and `prerender_multi_page()`.
Refs #1. This tool acts as a new version of the existing annotation extractor tool which supports processing multiple input files at once. It accepts a directory of input files and extracts annotations from all of them, combining them into a single metadata JSON file. The input files are copied to an output directory with annotations stripped out. Non-HTML files are copied over directly with no changes.

Ideally, we would have a unique metadata JSON file *for each* input HTML file. However, `rollup_bundle()` and `postcss_binary()` do not have usable implementations which support bundling multiple files when the file set is not known until execution time. As a result, having unique metadata files is not useful at the moment, so instead we generate one metadata file that contains all the scripts and styles from all HTML files in the input. This is overly broad, since if one HTML file includes a script, that script is attributed to *every* HTML file. We will explore making this more precise and enabling better tree-shaking of unused scripts and styles at a later time.
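A rough sketch of the tool's main loop; the `extract()` placeholder stands in for the real implementation in `extractor.ts`, whose signature isn't shown here, and the recursive `readdir` requires Node 18.17+:

```typescript
import { promises as fs } from 'fs';
import * as path from 'path';

// Placeholder for the real `extract()` in `extractor.ts`; the actual
// annotation format and signature are assumptions for illustration.
function extract(html: string): [ strippedHtml: string, annotations: unknown[] ] {
    return [ html, [] ];
}

async function extractAll(
    inputDir: string,
    outputDir: string,
    metadataFile: string,
): Promise<void> {
    const annotations: unknown[] = [];
    for (const relPath of await fs.readdir(inputDir, { recursive: true })) {
        const inputPath = path.join(inputDir, relPath);
        if (!(await fs.stat(inputPath)).isFile()) continue;
        const outputPath = path.join(outputDir, relPath);
        await fs.mkdir(path.dirname(outputPath), { recursive: true });
        if (relPath.endsWith('.html')) {
            // Strip annotations from HTML and collect them for the metadata.
            const [ stripped, extracted ] =
                extract(await fs.readFile(inputPath, 'utf8'));
            annotations.push(...extracted);
            await fs.writeFile(outputPath, stripped);
        } else {
            // Non-HTML files are copied over unchanged.
            await fs.copyFile(inputPath, outputPath);
        }
    }
    // One combined metadata file for all inputs, as described above.
    await fs.writeFile(metadataFile, JSON.stringify({ annotations }));
}
```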
Refs #1. This performs a deep equality check between annotations.
Refs #1. This provides a reusable implementation for all code which needs to deduplicate an array of items based on an equality comparison.
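A sketch of what such a helper might look like (the actual signature is an assumption):

```typescript
/**
 * Returns a new array with duplicates removed, where two items are considered
 * duplicates when `equals()` returns true. O(n^2), which is fine for the
 * small annotation lists involved.
 */
function unique<T>(items: readonly T[], equals: (a: T, b: T) => boolean): T[] {
    const result: T[] = [];
    for (const item of items) {
        if (!result.some((existing) => equals(existing, item))) {
            result.push(item);
        }
    }
    return result;
}

// Example with a deep-equality comparison like `annotationEquals()`.
const deduped = unique(
    [ { type: 'script', path: 'foo.js' }, { type: 'script', path: 'foo.js' } ],
    (a, b) => a.type === b.type && a.path === b.path,
);
console.log(deduped.length); // 1
```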
Refs #1. We are going to add a new multi-annotation extractor binary which also needs to deduplicate annotations. However, the multi-annotation extractor must make multiple calls to `extract()`, one for each file it is processing. This means `extract()`'s deduplication is not sufficient, because the full set of extracted annotations across all processed files must be deduplicated. As a result, `extractor.ts`'s deduplication logic is not actually useful; instead this needs to be done at the binary level. This commit simply moves the logic from `extractor.ts` to `annotation_extractor.ts`. A similar implementation will be included in the new multi-annotation extractor. I also took the opportunity to update the implementation to use the new common `unique()` function with a real `annotationEquals()` implementation rather than relying on Node's deep equality assertion. This is cross-platform and a more authoritative implementation.
Refs #1. This processes all the HTML files generated by the user and extracts annotations into a single metadata file. Added a script to the multi-page example in order to exercise and test this behavior.
Refs #1. Previously I forgot to shift visibility to the unannotated HTML directory. This is the new "public" target that will be depended upon and should have the `visibility` field.
… own file. Refs #1. Unlike `_extract_annotations()`, both of these macros are usable for `prerender_multi_page()` because there is still only one metadata file. As a result, pulling these macros into their own file enables `prerender_multi_page()` to share the implementation.
Refs #1. This uses the existing `script_entry_point()` to generate an entry point for client-side scripts based on the generated metadata file. This works identically to `prerender_page()`. I also took the opportunity to add a `bzl_library()` target for `prerender_multi_page()` which I had previously forgotten to do.
Refs #1. This generates the CSS entry point for styles from `prerender_multi_page()` and includes it in the exported `%{name}_styles` target. Also added a CSS file to the multi-page example to test the behavior.
…es into multiple files. Refs #1. This modifies the existing logic to accept an input directory and output directory as arguments. Every file in the input directory is injected with the provided resources and then written to the same relative path in the output directory. Non-HTML files are copied to the output directory unchanged. We unfortunately need to support non-HTML files because a user might generate other files in the same tool which generates HTML, so we can't assume the input directory only has HTML files. Copy-pasted some helper utilities from the multi-annotation extractor for iterating over input files and creating the parent directory for output files. If these utilities continue to be useful in other contexts, it may be worth factoring them out to a shared library.
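The copied helper utilities might look something like this (a sketch, not the exact code):

```typescript
import { promises as fs } from 'fs';
import * as path from 'path';

/** Yields the path of every file under `root`, relative to `root`. */
async function* listRecursiveFiles(
    root: string,
    relative = '',
): AsyncIterable<string> {
    const dir = path.join(root, relative);
    for (const entry of await fs.readdir(dir, { withFileTypes: true })) {
        const relPath = path.join(relative, entry.name);
        if (entry.isDirectory()) {
            yield* listRecursiveFiles(root, relPath);
        } else {
            yield relPath;
        }
    }
}

/** Creates the parent directory of `file` if it does not already exist. */
async function mkParentDir(file: string): Promise<void> {
    await fs.mkdir(path.dirname(file), { recursive: true });
}
```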
Refs #1. This introduces a new `prerender_multi_page_bundled()` as a corollary to `prerender_page_bundled()`, generating multiple files. It calls through to `prerender_multi_page()` to generate the HTML, JS, and CSS files, using Rollup and PostCSS to bundle these resources into individual JS and CSS files. The resource injector tool is currently not compatible with injecting multiple files, so that step is skipped for now. Instead, the JS and CSS are bundled but then left alone and not used. Only the HTML and resources are merged together.

Updated the multi-page example to use `prerender_multi_page_bundled()`. Functionally, all this means for now is that we can drop the excess `web_resources()` target. Also added a `build_test()` for the bundled JS and CSS since nothing else uses them at the moment. I also took the opportunity to update the documentation of the pre-existing `prerender_multi_page()`, `prerender_page()`, and `prerender_page_bundled()` macros to be consistent between them and fix any minor copy-paste errors I found.
Refs #1. For now, this is equivalent to the existing `resource_injector`. Future commits will modify it to support multiple input HTML files.
…ge_bundled()`. Refs #1. `multi_inject_resources()` is a rough copy of `inject_resources()`, except that it uses directories for inputs and outputs. It returns a `WebResourceInfo()` in order to be compatible with `web_resources()`. The multi-resource injector tool is also published to the final NPM package directory.

Tested this with the multi-page example. This requires that all HTML files *actually* adhere to the HTML spec, so I had to update the example to create properly formatted HTML files. Also tested with `ref/external` and confirmed that `multi_inject_resources()` and the multi-resource injector tool both function when used by external repositories via `@npm//rules_prerender`.

Currently, `multi_inject_resources()` only deals with CSS files, as this is all that is supported by the multi-resource injector tool. The rule will be updated to support scripts once the tool is able to inject JavaScript in a usable manner.
Refs #1. This option gives a single JavaScript file which gets copied alongside every HTML file processed. This allows each file to own and link to its own JS, even if they all happen to be the same. Long term, these files will be different, so while we could optimize things to use a single JS file, it would be temporary anyways. For now, the JavaScript file is only copied alongside each HTML file; a `<script />` tag is not yet injected into the HTML.
Refs #1. This makes every HTML file link to its sibling JavaScript bundle and load it appropriately.
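A simplified sketch of that injection; the real injector presumably parses the HTML rather than doing string surgery, so treat this as illustrative only:

```typescript
/**
 * Injects a `<script />` tag referencing the sibling bundle into an HTML
 * document, just before the closing `</body>` tag.
 */
function injectSiblingScript(html: string, bundleName: string): string {
    // The bundle is copied next to each HTML file, so a relative URL works
    // no matter where the page lives in the output tree.
    const scriptTag = `<script src="./${bundleName}"></script>`;
    return html.replace('</body>', `${scriptTag}</body>`);
}

console.log(injectSiblingScript('<html><body>Hello!</body></html>', 'index.js'));
// <html><body>Hello!<script src="./index.js"></script></body></html>
```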
…i_page_bundled()`. Refs #1. This means that `prerender_multi_page_bundled()` will bundle all the associated JavaScript to a single file and then inject it into each HTML page, applying the scripts to all of them. Updated the existing multi-page example to include visible JavaScript on the page and assert on it in tests.
Refs #1. This exports the macro for public use.
Was able to make a bit more progress today than expected. A few things are still TODO:
I'd also like to try simplifying some of the infrastructure. As of now there are a few tools with multiple versions: a basic version which processes a single HTML file, and then a "multi" version which processes multiple. I'm curious if I can drop the basic version and update the BUILD rules to use the "multi" versions with just a single input. This is a bit awkward for a few reasons, but it may be worth doing if we can drop a lot of the extra complexity. I'll file separate issues for those points, but I think this is a good enough MVP to consider this issue closed. 🥳
Some follow-up for today:
Currently, `prerender_page()` and `prerender_page_bundled()` output a single HTML page (and other included resources). However, users may want to prerender multiple pages from a single prerender file. A common example might be a blogging site, which has a number of blog posts which use mostly equivalent HTML/JS/CSS, but are built from different markdown. Rather than having a `prerender_page_bundled()` for each blog post, there should be a way to generate all of them from a single rule.

This could work in Node by simply having the default export return an `Iterable` or `AsyncIterable` of `PrerenderResource` objects, where `PrerenderResource` simply correlates a URL path to its contents. It could look something like:
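A sketch of the general shape (with a hypothetical `getBlogPosts()` helper; `PrerenderResource.of()` is an assumed factory, not a confirmed API):

```typescript
import { PrerenderResource } from 'rules_prerender';

// Hypothetical helper for illustration only.
declare function getBlogPosts(): Promise<Array<{ slug: string, html: string }>>;

/** Renders one HTML page per blog post from a single default export. */
export default async function* (): AsyncIterable<PrerenderResource> {
    for (const post of await getBlogPosts()) {
        // `PrerenderResource.of()` is an assumed factory; the real
        // construction API may differ.
        yield PrerenderResource.of(`/blog/${post.slug}.html`, post.html);
    }
}
```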
In Starlark, we would probably need another macro, since the output is completely different. A corollary to `prerender_page()` is pretty straightforward, since we don't have to worry about bundling. `prerender_page_bundled()` is a little trickier but would look the same to the user. We would need to get Rollup and PostCSS to bundle each page individually and then inject the resources into each one. Alternatively, we could treat all pages as sharing the same resources (which is likely mostly true in practice, but there would certainly be edge cases) and simply do a single bundle step to generate one JavaScript file and another CSS file, then inject just those two into every page. That would likely be easier from a tooling perspective, but would also be less optimal, since one page including a large dep would cause that dep to be included in all pages.