Add `vfs` API and `memory_vfs` implementation #639

tsoutsman · 2022-09-18T11:39:34Z

This PR reworks the file system API. It also includes a port of memfs. The new API should also be simple to integrate with fatfs and other physical file systems.

The API uses forest mounting (i.e. Windows flavour), which is explained in vfs's crate-level documentation. I chose this mounting method because it's simpler to implement.

Key differences from fs_node:

Nodes use interior mutability - Certain operations may not need mutable access, e.g. when interacting with physical file systems; renaming the file doesn't require mutable access to the struct as the changes are made on the storage device, not in memory. Interior mutability simplifies the API and means mutexes are only used where necessary. Also, data can be stored behind multiple mutexes allowing for more parallel access, e.g. the contents and name of a memory file can be changed simultaneously.
FileSystem trait - The critical thing lacking from fs_node is a clear distinction between different file systems, which is what this trait and the mount points aim to solve.
Arbitrary nodes can't be added to a directory - insert_file and insert_directory (and their entry counterparts) take a path (or name) and return a new empty node. This means consumers can't add nodes from one file system into another, e.g. memory file into a FAT directory. Copying should be implemented by creating a new node and then writing all the bytes from the source file into the destination file.

Some issues:

Trailing slashes
Invalid file/dir names e.g. ., .., /
Format of paths for get_node - I chose the file_system:/path/to/file because it's simple and is similar to what Redox uses, but I don't have any excellent reasons to use it.
Resolving . and .. in paths relies on the trait_upcasting feature, which is incomplete. This should be resolved soon Stablize trait_upcasting feature rust-lang/rust#101718 (aaaaaaand it's closed :))

Signed-off-by: Klim Tsoutsman <[email protected]>

NathanRoyer

Overall: great work imo 👌

kernel/vfs/src/lib.rs

kernel/vfs/src/node/directory.rs

NathanRoyer · 2022-09-19T08:21:49Z

kernel/vfs/src/node/mod.rs

+/// Gets the node at the `path`.
+///
+/// The path consists of a file system id, followed by a colon, followed by the
+/// path. For example, `tmp:/a/b` or `nvme:/foo/bar`.


The use of colon here could be debated, I'm not particularly against it though.

As I said in the PR, I just copied Redox but this is definitely up for debate if anyone has any better ideas.

If we used the first path node name as fs name (/root-fs/usr/bin/gcc or /usb-drive-2/photos/camping-trip-2016 for instance), that would ease filesystem enumeration but could also result in confusement and vulnerabilities (php's path traversal issues come to mind).

i'd probably prefer to have a more Unix-style set of mount points, but for simplicity's sake it doesn't have to be arbitrary. We could have all FSes mounted at / or /mnt/, for example.

NathanRoyer · 2022-09-19T08:23:40Z

kernel/vfs/src/path.rs

+/// A file system path.
+#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
+pub struct Path<'a> {
+    inner: &'a str,


We could allow filesystems to use slashes in node names if we used an iterable here directly rather than a consolidated path, a vec for instance

Allowing slashes in node names sounds like asking for trouble. Especially because the std fs API uses string paths and so we'd somehow have to escape slashes in node names.

yeah we cannot allow slashes in fs node names, that's a fairly typical limitation.

Signed-off-by: Klim Tsoutsman <[email protected]>

kevinaboos

Good start so far. Agreed that we obviously do need a way to register and support multiple distinct filesystem instances. I also like the idea of forcing interior mutability from the primary VFS interface. There is one major issue I see that requires a redesign: we should not and cannot use Arc everywhere.

Aside from being inefficient, copious usage of Arc only really works in the context of an in-memory FS; a real physical FS wouldn't be able to use Arcs to represent a relationship between nodes when stored on disk, e.g., a directory entry containing multiple file entries. For that, inodes are commonly used, but we don't have to explicitly include the concept of inodes in the VFS layer. Instead, we just need an abstract way to represent the concept of one entry that refers to one or more other entries.

So i'd recommend that we create an actual concrete outer type for File and Directory instead of using trait objects like Arc<dyn File> and Arc<dyn Directory>, and then the inner fields of that struct are dependent on what the real filesystem needs to represent the inter-node relationships. For a simple heap-based fs (like the currentheapfile), the inner field of the top-level Directory could indeed by an Arc reference to a set of directory entries, which would make sense because it's an efficient way to represent those relationships when you know everything is in heap memory.

Other than that, I left a few small comments on existing discussions.

Since you're focused on other things right now, I'd be happy to take a crack at extending this PR with a new attempt at the design I've described above. Feedback also welcome here or on Discord.

tsoutsman · 2022-10-04T14:16:16Z

Hm, yea that's definitely an issue.

Even if we have concrete outer types, eventually we will have to use trait objects. Generics won't work because the type of filesystem returned by something like open("/dev1/foo/bar") is only decided at runtime.

I don't have any particularly good ideas at the moment so definitely feel free to extend the PR.

tsoutsman · 2022-11-07T13:05:57Z

I've been thinking about this a bit, and I've come up with a potential solution.

We can't guarantee that the nodes returned from get are valid because another thread may immediately delete them, so we can't return a reference. We can work around this by tracking open files.

If we do something like this:

pub trait FileSystemImplementation {
    type File: File;
    type FileRef: FileRef<File = Self::File>;
    type Directory: Directory;

    fn root(&self) -> Self::Directory;
}

pub trait FileRef {
    type File: File;

    fn upgrade(self) -> Option<Arc<dyn File>>;
}

pub struct FileSystem {
    inner: Box<
        dyn FileSystemImplementation<
            File = dyn File,
            FileRef = dyn FileRef<File = dyn File>,
            Directory = dyn Directory,
        >,
    >,
    opened: HashMap<Path, OpenFile>,
}

pub struct OpenFile {
    inner: Arc<dyn File>,
}

pub trait File {
    // ...
}

pub trait Directory {
    // ...
}

pub struct Path {
    // ..
}

We can enforce that files aren't deleted if they are open. File system operations would be done on the FileSystem struct, which would perform the necessary modifications to the opened field. For example, deletion would first check the strong count of OpenFile.inner.

Directory::get would return a FileSystem::FileRef whereas Directory::open would return an OpenFile. For example, in a FAT file system, the FileRef would contain a reference to the filesystem, which it would then query in upgrade. A memory filesystem FileRef would be a weak reference to the file system node.

Although this implementation also extensively uses Arcs, they're used as a layer on top of the filesystem to prevent/reduce race conditions rather than a way to model the file system's structure.

Hypothetically, we can also just return a FileSystem::File and assume it won't be deleted. E.g. fatfs::File or in the case of memfs an Arc<MemoryFile>.

tsoutsman · 2022-12-18T01:16:25Z

I've played around with this some more, and I don't think it will be possible till traits with GATs are object safe.

Add vfs API and memory_vfs implementation

bd2aa84

Signed-off-by: Klim Tsoutsman <[email protected]>

tsoutsman requested review from kevinaboos and NathanRoyer September 18, 2022 11:39

NathanRoyer reviewed Sep 19, 2022

View reviewed changes

tsoutsman added 2 commits September 19, 2022 21:10

Use swap_remove when unmounting fs

ce2df52

Signed-off-by: Klim Tsoutsman <[email protected]>

Rename methods to use 'create' over 'insert'

cfa9452

Signed-off-by: Klim Tsoutsman <[email protected]>

kevinaboos requested changes Sep 29, 2022

View reviewed changes

tsoutsman mentioned this pull request Oct 4, 2022

Implement std::sys::theseus theseus-os/rust#12

Open

25 tasks

kevinaboos marked this pull request as draft October 11, 2022 00:58

tsoutsman closed this Dec 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `vfs` API and `memory_vfs` implementation #639

Add `vfs` API and `memory_vfs` implementation #639

tsoutsman commented Sep 18, 2022 •

edited

Loading

NathanRoyer left a comment

NathanRoyer Sep 19, 2022

tsoutsman Sep 19, 2022

NathanRoyer Sep 19, 2022 •

edited

Loading

kevinaboos Sep 29, 2022

NathanRoyer Sep 19, 2022

tsoutsman Sep 19, 2022

kevinaboos Sep 29, 2022

kevinaboos left a comment

tsoutsman commented Oct 4, 2022

tsoutsman commented Nov 7, 2022

tsoutsman commented Dec 18, 2022

Add vfs API and memory_vfs implementation #639

Add vfs API and memory_vfs implementation #639

Conversation

tsoutsman commented Sep 18, 2022 • edited Loading

NathanRoyer left a comment

Choose a reason for hiding this comment

NathanRoyer Sep 19, 2022

Choose a reason for hiding this comment

tsoutsman Sep 19, 2022

Choose a reason for hiding this comment

NathanRoyer Sep 19, 2022 • edited Loading

Choose a reason for hiding this comment

kevinaboos Sep 29, 2022

Choose a reason for hiding this comment

NathanRoyer Sep 19, 2022

Choose a reason for hiding this comment

tsoutsman Sep 19, 2022

Choose a reason for hiding this comment

kevinaboos Sep 29, 2022

Choose a reason for hiding this comment

kevinaboos left a comment

Choose a reason for hiding this comment

tsoutsman commented Oct 4, 2022

tsoutsman commented Nov 7, 2022

tsoutsman commented Dec 18, 2022

Add `vfs` API and `memory_vfs` implementation #639

Add `vfs` API and `memory_vfs` implementation #639

tsoutsman commented Sep 18, 2022 •

edited

Loading

NathanRoyer Sep 19, 2022 •

edited

Loading