Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vfs API and memory_vfs implementation #639

Closed

Conversation

tsoutsman
Copy link
Member

@tsoutsman tsoutsman commented Sep 18, 2022

This PR reworks the file system API. It also includes a port of memfs. The new API should also be simple to integrate with fatfs and other physical file systems.

The API uses forest mounting (i.e. Windows flavour), which is explained in vfs's crate-level documentation. I chose this mounting method because it's simpler to implement.

Key differences from fs_node:

  • Nodes use interior mutability - Certain operations may not need mutable access, e.g. when interacting with physical file systems; renaming the file doesn't require mutable access to the struct as the changes are made on the storage device, not in memory. Interior mutability simplifies the API and means mutexes are only used where necessary. Also, data can be stored behind multiple mutexes allowing for more parallel access, e.g. the contents and name of a memory file can be changed simultaneously.
  • FileSystem trait - The critical thing lacking from fs_node is a clear distinction between different file systems, which is what this trait and the mount points aim to solve.
  • Arbitrary nodes can't be added to a directory - insert_file and insert_directory (and their entry counterparts) take a path (or name) and return a new empty node. This means consumers can't add nodes from one file system into another, e.g. memory file into a FAT directory. Copying should be implemented by creating a new node and then writing all the bytes from the source file into the destination file.

Some issues:

  • Trailing slashes
  • Invalid file/dir names e.g. ., .., /
  • Format of paths for get_node - I chose the file_system:/path/to/file because it's simple and is similar to what Redox uses, but I don't have any excellent reasons to use it.
  • Resolving . and .. in paths relies on the trait_upcasting feature, which is incomplete. This should be resolved soon Stablize trait_upcasting feature rust-lang/rust#101718 (aaaaaaand it's closed :))

Copy link
Member

@NathanRoyer NathanRoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall: great work imo 👌

kernel/vfs/src/lib.rs Outdated Show resolved Hide resolved
kernel/vfs/src/node/directory.rs Show resolved Hide resolved
kernel/vfs/src/node/directory.rs Outdated Show resolved Hide resolved
/// Gets the node at the `path`.
///
/// The path consists of a file system id, followed by a colon, followed by the
/// path. For example, `tmp:/a/b` or `nvme:/foo/bar`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of colon here could be debated, I'm not particularly against it though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said in the PR, I just copied Redox but this is definitely up for debate if anyone has any better ideas.

Copy link
Member

@NathanRoyer NathanRoyer Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we used the first path node name as fs name (/root-fs/usr/bin/gcc or /usb-drive-2/photos/camping-trip-2016 for instance), that would ease filesystem enumeration but could also result in confusement and vulnerabilities (php's path traversal issues come to mind).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd probably prefer to have a more Unix-style set of mount points, but for simplicity's sake it doesn't have to be arbitrary. We could have all FSes mounted at / or /mnt/, for example.

/// A file system path.
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
pub struct Path<'a> {
inner: &'a str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could allow filesystems to use slashes in node names if we used an iterable here directly rather than a consolidated path, a vec for instance

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing slashes in node names sounds like asking for trouble. Especially because the std fs API uses string paths and so we'd somehow have to escape slashes in node names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we cannot allow slashes in fs node names, that's a fairly typical limitation.

Copy link
Member

@kevinaboos kevinaboos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start so far. Agreed that we obviously do need a way to register and support multiple distinct filesystem instances. I also like the idea of forcing interior mutability from the primary VFS interface. There is one major issue I see that requires a redesign: we should not and cannot use Arc everywhere.

Aside from being inefficient, copious usage of Arc only really works in the context of an in-memory FS; a real physical FS wouldn't be able to use Arcs to represent a relationship between nodes when stored on disk, e.g., a directory entry containing multiple file entries. For that, inodes are commonly used, but we don't have to explicitly include the concept of inodes in the VFS layer. Instead, we just need an abstract way to represent the concept of one entry that refers to one or more other entries.

So i'd recommend that we create an actual concrete outer type for File and Directory instead of using trait objects like Arc<dyn File> and Arc<dyn Directory>, and then the inner fields of that struct are dependent on what the real filesystem needs to represent the inter-node relationships. For a simple heap-based fs (like the currentheapfile), the inner field of the top-level Directory could indeed by an Arc reference to a set of directory entries, which would make sense because it's an efficient way to represent those relationships when you know everything is in heap memory.

Other than that, I left a few small comments on existing discussions.

Since you're focused on other things right now, I'd be happy to take a crack at extending this PR with a new attempt at the design I've described above. Feedback also welcome here or on Discord.

@tsoutsman
Copy link
Member Author

Hm, yea that's definitely an issue.

Even if we have concrete outer types, eventually we will have to use trait objects. Generics won't work because the type of filesystem returned by something like open("/dev1/foo/bar") is only decided at runtime.

I don't have any particularly good ideas at the moment so definitely feel free to extend the PR.

@kevinaboos kevinaboos marked this pull request as draft October 11, 2022 00:58
@tsoutsman
Copy link
Member Author

I've been thinking about this a bit, and I've come up with a potential solution.

We can't guarantee that the nodes returned from get are valid because another thread may immediately delete them, so we can't return a reference. We can work around this by tracking open files.

If we do something like this:

pub trait FileSystemImplementation {
    type File: File;
    type FileRef: FileRef<File = Self::File>;
    type Directory: Directory;

    fn root(&self) -> Self::Directory;
}

pub trait FileRef {
    type File: File;

    fn upgrade(self) -> Option<Arc<dyn File>>;
}

pub struct FileSystem {
    inner: Box<
        dyn FileSystemImplementation<
            File = dyn File,
            FileRef = dyn FileRef<File = dyn File>,
            Directory = dyn Directory,
        >,
    >,
    opened: HashMap<Path, OpenFile>,
}

pub struct OpenFile {
    inner: Arc<dyn File>,
}

pub trait File {
    // ...
}

pub trait Directory {
    // ...
}

pub struct Path {
    // ..
}

We can enforce that files aren't deleted if they are open. File system operations would be done on the FileSystem struct, which would perform the necessary modifications to the opened field. For example, deletion would first check the strong count of OpenFile.inner.

Directory::get would return a FileSystem::FileRef whereas Directory::open would return an OpenFile. For example, in a FAT file system, the FileRef would contain a reference to the filesystem, which it would then query in upgrade. A memory filesystem FileRef would be a weak reference to the file system node.

Although this implementation also extensively uses Arcs, they're used as a layer on top of the filesystem to prevent/reduce race conditions rather than a way to model the file system's structure.

Hypothetically, we can also just return a FileSystem::File and assume it won't be deleted. E.g. fatfs::File or in the case of memfs an Arc<MemoryFile>.

@tsoutsman
Copy link
Member Author

I've played around with this some more, and I don't think it will be possible till traits with GATs are object safe.

@tsoutsman tsoutsman closed this Dec 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants