-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Research the feasibility to providing a VFS extension mechanism to fs
, require()
, child_process
, etc
#37
Comments
Btw, after nodejs/node#44537 landed, we are having some related conversation in nodejs/node#44713. |
Sounds like nodejs/node#39711 (comment) is a potential approach we can try experimenting with: // node.h
class UvDelegate {
public:
explicit UvDelegate(node::Environment* env);
virtual int uv_fs_open(...);
virtual int uv_fs_close(...);
};
// node.cc
void InternalModuleReadJSON() {
...
int fd = env->delegate()->uv_fs_open();
...
}
// user.cc
class EmbedderUvDelegate : node::UvDelegate {
int uv_fs_open(...) override;
int uv_fs_close(...) override;
};
node::CreateEnvironment(..., delegate, ...); or something simpler: // node.h
struct UvAPIs {
UvFsOpenFunc uv_fs_open = uv_fs_open;
UvFsCloseFunc uv_fs_close = uv_fs_close;
...
};
// user.cc
UvAPIs uv_apis;
uv_apis.uv_fs_open = ...;
uv_apis.uv_fs_close = ...;
node::CreateEnvironment(..., uv_apis, ...); but I'm not sure if this is actually "easier". |
@RaisinTen Yeah, this is exactly what I'm talking about!
I do think it's easier from an "amount of code" point of view and I'd be interested in exploring this further. Maybe do a demo patch if you are up for it @RaisinTen ? The only limitation I can see here is that while it does make it potentially easier to implement a VFS on C++, it would be hard to use the same APIs from JavaScript, right? I wonder if we could also expose the same "delegate" concept or something like that to JS too? |
My understanding was that the patching would take place completely at the C++ level and there should be no extra APIs exposed to JS? |
It's still unclear to me why you want/need C++ hooks? Why isn't patching |
Patching |
Yes, I'm familiar with this problem; still:
And for experimentations, patching |
I think that was the general agreement.
I don't think doing C++ hooks is a strict requirement. We're trying to find the most optimal level where allowing the configuration of these APIs would reduce maintenance burden for both node and userland. It probably should be somewhere between monkey-patching of node's fs apis and allowing overloads for libuv c++ functions. Today @jviotti and I were discussing about doing this in the We also realized that supporting the plain monkey-patching approach would require patching:
I wonder if we can come up with a single API that allows users to configure all of these at once. |
This wasn't my understanding, and under this prompt I believe the work drastically changes. If, instead of finding a blessed format, and integrating it transparently inside Node itself (similar to phar, eggs, or jar archives), the intent is to simply make it easier to support multiple VFS implementations, then:
|
cc @nodejs/loaders |
The observation here is that even if we provide a "blessed" VFS implementation in the context of SEAs, there are other projects that implement VFSs for reasons other than SEAs, and they still monkey-patch a lot of functions. If there is a way we can provide hooking capabilities in Node.js, then we would both solve the SEA problem AND solve the overall monkey-patching situation that everybody seems to agree sucks and should not happen (and it's incompatible with ESM). I think that if we widen our scope a bit more, we might be able to come up with a solution that fixes the entire problem for once and for all. That said, I don't think blocks the rest of the SEA work. For now, we can monkey patch as everybody else is doing (or patch Node.js directly for the sake of our experiments) while we try to find a proper solution to the problem, and hopefully use that later on.
We want to find a "blessed" one, but based on all the previous discussions we had over the past weeks, it seems that identifying a VFS that checks all the boxes will be tricky. The decision that came out of the last meeting was:
For point 3, we will need to integrate with many VFS (for the time being) as we experiment with them, so the problem of hooking in into Node.js becomes more problematic for us. This is all internal experimentation and we would hope to ship a blessed VFS at some point, but we don't know which yet. |
I wrote a VFS a long time ago https://github.com/matrixai/js-virtualfs and indeed Node's FS API has changed quite a bit since then. My company hasn't had resources to keep it up to date since we moved to js-encryptedfs. If nodejs itself supported a FS interface "blessed", then enabled different underlying implementations, that might be quite useful. |
I think to support VFS there are two things that need to be added to Node.js core:
How 2 can be used on top of 1 to build a fully functional VFS can be either built-in or be left to the user land. But at least 1 and 2 have to live in core for a reliable VFS implementation. It seems to me 2 is probably more like a Node.js core issue, as it also matters for use cases beyond SEA e.g. tests mocking the file system, and it needs some proper mechanism to e.g. work with the permissions model (instead of introducing a leak in that model). |
IMO it makes the most sense to implement VFS as a provider of the FileSystem standard API along with a provider for direct fs access. The API is a lot more conducive to virtualization. We could also modify workers to take a FileSystem to "jail" its fs access to that virtual file system, routing all fs access (including require/import) through it. When no FileSystem is provided it would just inherit the direct-to-disk FileSystem as the root which everything hangs off. This could be useful for flexible sandboxing; could technically even be nestable. Might also be worth providing helpers to build jailed FileSystem instances scoped to particular directories, or build archive files to pass around. |
Modelling the hooks based on FileSystem sounds like a good idea, that can save us a lot of design work. I think for a proper VFS that works with existing code though we may need some sort of synchronous extensions as FileSystem is mostly async, while many of the APIs that users need to intercept are probably synchronous. |
Oh actually there is also a synchronous variant of the FileSystem handles https://fs.spec.whatwg.org/#api-filesystemsyncaccesshandle |
Those interfaces are high level and lack many features of the typical |
There could be an internal handle with more power which the FileSystem instance wraps which could be extracted for the more power-user bits to use. For example, when used as a "jail" for a worker it could extract that handle as a base for the fs module to do all the normal fs things on. From the user perspective though, they would have just constructed the standard FileSystem on the parent thread. I think as much as possible we should try to stick to exposing standard interfaces, even if they're just wrappers of more powerful interfaces for the cases where standard APIs are insufficient. |
I think a lot can be obtained by registering new URL schemes in our filesystem API, e.g. This has a few benefits:
An alternative with similar properties is "mounting" a filesystem on a given path. I would stay away from providing an "overlay" filesystem, as I think it would have a significant issues with performance. |
I don't think the Node API should enforce any kind of format over the paths. If it goes through Mounting paths would be vastly better than a protocol1, but it still feels like an arbitrary restriction to me; whether the filtering is performed by Node or by the userland code, the performances will be the same. Also keep in mind we have prior work here. Both Electron and Yarn have been operating production-grade virtual filesystems for many years now, and perhaps there's no need to reinvent the wheel. Footnotes
|
I agree with you on mounting a path. We do not agree on the performance of filtering on top on an existing path. The problem is that in that case, you'd need to check both the vfs and the actual file system for each API call, and mix the result. This is going to be very complex, likely bug prone and difficult for certain APIs. Have you got links to the vfs implementations of electron and yarn handy? |
I imagine we would just implement it as a if (UNLIKELY(realm->has_fs_delegate()) kind of check in C++ and if there is a delegate, we can call a user-defined delegate (we can call into JS too if that’s configured). And for certain commonly used fs operations (fs.open etc) we can just coalesce the calls in JS. That looks sufficiently performant to me if we are only checking a boolean. |
Electron: bundles an entire The core problem of tying
Then we put
Where does the path |
Here is the kind of API that I would like to see: import { mount } from 'node:fs'
await mount('/path/to/nonexistent/dir', {
async open (path) {
return new FakeFDHandle()
},
async createReadStream (fakeFdHandle) {
// ...
}
// not sure I would support anything else as most things can be derived by those two
}) |
Generally the ASAR implementation is a much more limited version of Yarn's one; since Yarn is used for a very large variety of tools (Eslint, Parcel, Next, ...) we follow very closely the Node.js
We do something similar, it's just that the check is left up to the implementer (because what constitutes a virtual file is rarely decided based on the folder; it's rather the file being accessed): async function readFile(p, opts, nextReadFile) {
if (p.includes('.zip/')) {
return readFileFromZip(p, opts);
} else {
return nextReadFile(p, opts);
}
} This mechanism is also closer from how other loader hooks work, as they delegate to the next resolver: async function resolve(specifier, context, nextResolve) {
if (specifier.match(/whatever/)) {
return myCustomResolve(specifier, context);
} else {
return nextResolve(specifier, context);
}
} |
I think the API node.js should implement is the minimal one needed to allow 3rd party modules to hook it to specific file formats. The reason why I mention mounting something to a specific folder is that that can be checked extremely quickly via a prefix trie. |
Okay, what is the expected behaviour of the resulting VFS (and by extension SEA) file? |
In order for all things in the ecosystem to work correctly, support needs to be built in inside the The name of the function can be whatever we want: |
I'd be a bit wary of jumping quickly to perf things like a prefix trie; for small numbers of mounts (I suspect this will be the normal case) there are likely other more efficient checks that could be done. |
Agreed. What I want to avoid is to have to iterate through a full list of regexp for each mount for every file system operation. |
When we designed |
Closing this a I think we did a lot of research already. Any action items can be extracted a new issues |
Refer to #31 (comment) for a better explanation of the problem.
Monkey-patching all of these functions is non-trivial and requires a lot of code. The user-facing surface of this is pretty large, and I wonder if providing an extension point at the lower levels is actually easier.
Can we do some research to evaluate the feasibility of this?
@RaisinTen You have dug a bit on this on Postman's internal bootstrapper patch, but we do welcome any help or insights from the community.
The text was updated successfully, but these errors were encountered: