
Install themes and plugins using the ReadableStream API #919

Closed

Conversation

@adamziel (Collaborator) commented Jan 8, 2024

What is this PR doing?

Ports the plugin and theme installation process over from the main exploratory branch https://github.com/WordPress/wordpress-playground/pull/851/files#diff-9d46207f1e5fb029ef3d3006b551da2345dacb15a6cf3d07e736e11e18499b2b

🚧 Work in progress 🚧

@adamziel requested a review from a team as a code owner on January 8, 2024, 14:21
@adamziel (Collaborator Author)

Here's a problem I stumbled upon:

Buffering the entire .zip file and passing it to PHP.wasm in one go takes around 300ms on my machine. Here's how I measure it:

console.time("installAsset");
// Buffered path: hand the whole zip to PHP.wasm in one go.
const { assetFolderPath } = await installAsset(playground, {
	zipFile: pluginZipFile,
	targetPath: `${await playground.documentRoot}/wp-content/plugins`,
});
console.timeEnd("installAsset");

However, using browser streams and writing each file as it becomes available takes around 3.5s:

console.time("writeToPHP");
// Streamed path: decode the zip and write each file to PHP as it arrives.
files = files || decodeZip(pluginZipFile.stream());
const zipFileName = pluginZipFile?.name.split('/').pop() || 'plugin.zip';
const zipNiceName = zipNameToHumanName(zipFileName);
const assetName = zipFileName.replace(/\.zip$/, '');

progress?.tracker.setCaption(`Installing the ${zipNiceName} plugin`);
try {
	// Extract into a unique directory to avoid collisions.
	const extractTo = joinPaths(
		await php.documentRoot,
		'wp-content/plugins',
		crypto.randomUUID()
	);
	await iteratorToStream(files!).pipeTo(streamWriteToPhp(php, extractTo));

	console.timeEnd("writeToPHP");
} catch (error) {
	// (error handling elided in this snippet)
}

I'm measuring with the zip file already cached, to eliminate download time as a factor.

There are two possible explanations:

  • WASM really is that much faster
  • Something in the streams implementation is unnecessarily slow

I am leaning towards the second one. I'll return with more detailed measurements.

@adamziel (Collaborator Author)

Profile:

[screenshot: profiler capture, 2024-01-15]

@adamziel (Collaborator Author) commented Jan 15, 2024

Part of the slowness came from mistakenly using the Blob.stream() polyfill in the browser even when the native method was available. I fixed that in b285cb6, which trimmed the time from 3.5s down to 2s.
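
The shape of that fix, roughly (a sketch, not the actual commit; the real code paths may differ): only fall back to a polyfill when the native method is missing.

function blobToStream(blob: Blob): ReadableStream<Uint8Array> {
	if (typeof blob.stream === 'function') {
		// Native implementation: the fast path in modern browsers.
		return blob.stream();
	}
	// Polyfilled fallback for older environments: read the whole Blob
	// with a FileReader and expose it as a single-chunk stream.
	return new ReadableStream({
		start(controller) {
			const reader = new FileReader();
			reader.onload = () => {
				controller.enqueue(new Uint8Array(reader.result as ArrayBuffer));
				controller.close();
			};
			reader.readAsArrayBuffer(blob);
		},
	});
}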

@adamziel (Collaborator Author)

Out of the remaining 2s:

  • ~50% is spent decoding the ZIP file, using ArrayBuffers, ReadableStreams, and such
  • ~50% is spent sending each decompressed file to the PHP worker via postMessage
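
For reference, the decoding half can be isolated with the helpers from the snippets above by piping the decoded entries into a no-op sink, so no postMessage traffic is involved (a sketch; decodeZip and iteratorToStream are the same helpers used earlier):

console.time("decodeOnly");
// Drain the decoded entries without writing them anywhere: this measures
// ZIP decoding alone, with no worker round-trips.
await iteratorToStream(decodeZip(pluginZipFile.stream())).pipeTo(
	new WritableStream({ write() {} })
);
console.timeEnd("decodeOnly");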

There's potentially a lot of room for improvement, but I'm not too keen on tweaking this further.

Tweaking the stream-based implementation would take a lot of time, and I'm not convinced of the benefits. The lower bound for the execution time seems to be set by the native versions of libzip and zlib. My intuition is that:

  • the JavaScript-based zip decoder is that lower bound plus a lot of overhead: executing JavaScript code, copying the data inside the JS VM more than necessary, marshalling and sending the data via postMessage many times over, etc.
  • the WASM zip decoder is that lower bound times some wasmSlowerThanNative factor (which could be nuanced, non-linear, etc.)

All of this is hand-wavy and based on my intuition. I don't have any actual measurements; I could spend a dozen or more hours here to prove or disprove these assumptions, but I think there's a more promising avenue to explore instead.

I wonder if we could stream all the downloaded bytes directly into WASM memory and handle them there as a stream. JavaScript would become an API layer over WASM operations, much like NumPy in Python. We wouldn't be passing data around, just orchestrating what happens at the lower level. The API could look like this:

php
    .acceptDataStream( fetch( pluginFileURL ).body )
    .unzip()
    .writeTo( "/wordpress/wp-content/plugins/gutenberg" );

I wonder what you think, @dmsnell.

@adamziel mentioned this pull request Jan 15, 2024
@dmsnell (Member) commented Jan 15, 2024

this is all very good with the measurements, @adamziel. unfortunately I don't think I have much to add here, other than to ask whether you measured the direct unzipping operation in JS and in WASM to compare the decompression itself (with a full Buffer and no slicing or other transformation).

I've seen cases where the overhead of something like streaming was too much to make it worthwhile, but here I did not expect it.

we aren't converting the data to UTF-8 on transfer into the worker, are we? I would imagine things would break if we did, since the compressed data is unlikely to be valid UTF-8, but something like that could slow it down this much.

I'm also leaning towards the second point, that something is wrong here, but I don't know what to look for, other than noting slicing and buffer operations.

could it be that we anticipate a certain length up front and could then allocate an ArrayBuffer of the full size all at once, filling in the data as we receive chunks? that is, if we aren't doing this already and are instead appending buffer chunks? it could very well be that we've hit some O(n²) copy because we have to constantly expand the array buffer. you could test this theory by comparing the runtimes for files spanning several orders of magnitude: a 1 KB zip, a 100 KB zip, a 1 MB zip, a 10 MB zip. the picture should already be clear if it's quadratic in nature.
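
as a sketch of the preallocation idea (assuming the total size is known up front, e.g. from Content-Length; the helper name is made up):

async function readAllPreallocated(
	stream: ReadableStream<Uint8Array>,
	totalBytes: number
): Promise<Uint8Array> {
	// one allocation up front; each chunk is copied exactly once
	const buffer = new Uint8Array(totalBytes);
	let offset = 0;
	const reader = stream.getReader();
	while (true) {
		const { done, value } = await reader.read();
		if (done) break;
		buffer.set(value, offset);
		offset += value.length;
	}
	return buffer.subarray(0, offset);
}

repeatedly concatenating Uint8Arrays instead re-copies every previously received byte on each append, which is where the quadratic behavior would come from.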

@adamziel (Collaborator Author) commented Jan 26, 2024

One more idea: consuming these ReadableStreams in the worker thread would save the time required for the postMessage calls. Then, if @dmsnell's hunch about the quadratic complexity is right, perhaps they could achieve speeds comparable to, or just a bit slower than, the WASM implementation. Also, perhaps we could transfer some of that data into the worker and the WASM heap instead of copying it.
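
The transfer part could look something like this (the message shape here is made up, not Playground's actual worker protocol): listing the ArrayBuffer in the transfer list moves it to the worker instead of structured-cloning it, so no byte copy happens.

function sendFileToWorker(worker: Worker, path: string, bytes: Uint8Array) {
	worker.postMessage(
		{ type: 'writeFile', path, buffer: bytes.buffer },
		// Transfer list: ownership of the buffer moves to the worker and
		// `bytes` becomes detached (unusable) in this thread afterwards.
		[bytes.buffer]
	);
}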

@adamziel (Collaborator Author) commented Jan 27, 2024

I spent some time tweaking this implementation and I got the numbers down to:

  • ~540ms for unzipping alone (without creating the files in VFS)
  • ~1500ms for unzipping and sending the files to PHP via postMessage

Unzipping and creating the files directly in the web worker takes ~700ms, so the speed is comparable to the WASM implementation. Cool! And we get to work with the Streams API, which is quite convenient.

@adamziel (Collaborator Author)

Closing in favor of https://github.com/WordPress/blueprints-library/, which uses native PHP streams for all file transfers.

@adamziel closed this Apr 16, 2024