Server: Storage layer refactor #1011

Open
bo0tzz opened this issue Nov 23, 2022 · 5 comments · Fixed by #1098

Comments

@bo0tzz
Member

bo0tzz commented Nov 23, 2022

Feature detail

At the moment, Immich does not have an abstracted storage layer. On upload, files are stored in the semi-hardcoded library path under a randomly generated filename, and that path is saved in the database. Every later read operation (file serving, thumbnail generation, etc.) uses the stored path.
Several of the features we (potentially) mean to implement in the future (e.g. #34, #418, #451) would benefit a lot from refactoring and abstracting the storage layer. For some of them, like supporting multiple storage backends, it is a hard requirement. In this issue I want to propose a design, although it will need more discussion and refinement before it is complete.

As mentioned above, currently the storage path for a file is generated once and stored in the database. I propose that we instead move to a model where storage paths are built on the fly based on the data we have for an asset. We already use some of that data to build the path on upload right now:

const originalUploadFolder = join(basePath, req.user.id, 'original', sanitizedDeviceId);

Instead, the storage layer would expose functions for reading and writing an asset that accept the AssetEntity (or a more limited set of data, if desired). The storage implementation uses that data internally, together with some configuration, to build the actual path. That way, the storage path becomes an implementation detail that does not need to be exposed to the rest of Immich.
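
To make that concrete, here is a minimal sketch of deriving a path from asset data on the fly. The `AssetRef` fields, the sanitization rule and the directory layout are all assumptions for illustration, not the existing Immich schema or layout.

```typescript
// Hypothetical subset of asset data; the real AssetEntity has more fields.
interface AssetRef {
  id: string;
  userId: string;
  deviceId: string;
  extension: string; // e.g. "jpg", without the dot
}

type FileKind = 'original' | 'thumbnail';

// Replace characters that are unsafe in a path segment (assumed sanitization rule).
function sanitizeSegment(segment: string): string {
  return segment.replace(/[^a-zA-Z0-9_-]/g, '_');
}

// Derive the storage path from asset data instead of reading it from the database.
function buildAssetPath(basePath: string, asset: AssetRef, kind: FileKind): string {
  return [
    basePath.replace(/\/+$/, ''), // drop trailing slashes from the base path
    sanitizeSegment(asset.userId),
    kind,
    sanitizeSegment(asset.deviceId),
    `${sanitizeSegment(asset.id)}.${asset.extension}`,
  ].join('/');
}

const exampleAsset: AssetRef = { id: 'a1', userId: 'u1', deviceId: 'Pixel 7', extension: 'jpg' };
const examplePath = buildAssetPath('/usr/src/app/upload/', exampleAsset, 'original');
```

Because the path is a pure function of the asset data and the base path, callers never need to store or pass it around.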

I think it would be good to keep the storage providers as self-contained as we can, and avoid having them do things like access the database. Instead, a provider would take in a configuration when initializing (e.g. the root path under which to store files, S3 access credentials, or a template for the filename). That configuration can of course be read from the database by whatever code initializes the provider.
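
A rough sketch of what configuration-driven initialization could look like. The config fields, the `{placeholder}` template syntax and the settings keys are hypothetical, and a plain `Map` stands in for whatever reads the settings from the database.

```typescript
// Hypothetical configuration shape; the field names are illustrative only.
interface LocalStorageConfig {
  rootPath: string;
  // Placeholders in braces are filled from asset data, e.g. "{userId}/{assetId}.{ext}".
  filenameTemplate: string;
}

// A resolver that only knows its configuration; it never touches the database.
class LocalPathResolver {
  private readonly config: LocalStorageConfig;

  constructor(config: LocalStorageConfig) {
    this.config = config;
  }

  // Fill the template from the given variables and prefix it with the root path.
  resolve(vars: Record<string, string>): string {
    const relative = this.config.filenameTemplate.replace(/\{(\w+)\}/g, (_match, key: string) => {
      const value = vars[key];
      if (value === undefined) {
        throw new Error(`Missing template variable: ${key}`);
      }
      return value;
    });
    return `${this.config.rootPath}/${relative}`;
  }
}

// Stand-in for the code that reads settings (e.g. from the database) and
// initializes the provider. Defaults are assumptions for this sketch.
function createResolver(settings: Map<string, string>): LocalPathResolver {
  return new LocalPathResolver({
    rootPath: settings.get('storage.rootPath') ?? '/data',
    filenameTemplate: settings.get('storage.filenameTemplate') ?? '{userId}/{assetId}.{ext}',
  });
}

const resolver = createResolver(new Map([['storage.rootPath', '/mnt/photos']]));
const resolved = resolver.resolve({ userId: 'u1', assetId: 'a1', ext: 'jpg' });
```

The provider itself stays testable without a database, since everything it needs arrives through the constructor.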

This will allow for a multitude of nice things:

  • Storage backends become swappable. We can then relatively easily add support for storing files on S3, etc.
  • We can use multiple storage backends for different file types (original file, thumbnail, etc.), even on different media (e.g. thumbnails on disk, originals on S3)
  • Code for things like path templating can be cleanly contained inside a single storage provider
  • Migrations between storage types! Just instantiate both the old and the new configuration, and copy between them. No need to even update the database.
  • Probably other stuff :)
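
The migration point could be sketched roughly like this. The `StorageProvider` contract and `AssetKey` shape are assumptions made for this example, and the in-memory class is only a stand-in for a real backend such as local disk or S3.

```typescript
// Minimal provider contract assumed for this sketch (not an existing Immich interface).
interface AssetKey {
  assetId: string;
  kind: 'original' | 'thumbnail';
}

interface StorageProvider {
  read(key: AssetKey): Promise<Uint8Array>;
  write(key: AssetKey, data: Uint8Array): Promise<void>;
}

// In-memory stand-in for a real backend such as local disk or S3.
class InMemoryProvider implements StorageProvider {
  private readonly files = new Map<string, Uint8Array>();

  async read(key: AssetKey): Promise<Uint8Array> {
    const data = this.files.get(`${key.kind}/${key.assetId}`);
    if (data === undefined) {
      throw new Error(`Not found: ${key.kind}/${key.assetId}`);
    }
    return data;
  }

  async write(key: AssetKey, data: Uint8Array): Promise<void> {
    this.files.set(`${key.kind}/${key.assetId}`, data);
  }
}

// Copy every asset from one provider to another. No database rows change,
// because each provider derives its own storage location from the key.
async function migrate(
  keys: AssetKey[],
  from: StorageProvider,
  to: StorageProvider,
): Promise<number> {
  let copied = 0;
  for (const key of keys) {
    await to.write(key, await from.read(key));
    copied++;
  }
  return copied;
}
```

A real migration would also want verification and resumability, which this sketch leaves out.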

tbd:

  1. What interface does a storage provider need? Probably at least create, delete and stat. What about backends like S3, which might be able to provide URLs for direct access (bypassing Immich)?
  2. The description here (partly) covers multiple features. What is the exact scope of the initial refactor (and how do we anticipate those future features in it)?
  3. ???
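
As a starting point for question 1, one possible shape for the interface is sketched below. Every name here is hypothetical, `read` is added on top of the three operations listed above, and the optional `getDirectUrl` is just one way an S3-like backend might expose direct access.

```typescript
interface AssetKey {
  assetId: string;
  kind: 'original' | 'thumbnail';
}

interface FileStat {
  size: number; // size in bytes
}

// One possible provider contract; all names are hypothetical.
interface StorageProvider {
  create(key: AssetKey, data: Uint8Array): Promise<void>;
  read(key: AssetKey): Promise<Uint8Array>;
  delete(key: AssetKey): Promise<void>;
  stat(key: AssetKey): Promise<FileStat | null>;
  // Optional: backends like S3 could return a URL the client fetches directly.
  getDirectUrl?(key: AssetKey): Promise<string>;
}

type Download =
  | { type: 'redirect'; url: string }
  | { type: 'proxy'; data: Uint8Array };

// Prefer a direct URL when the backend offers one; otherwise serve through the server.
async function resolveDownload(provider: StorageProvider, key: AssetKey): Promise<Download> {
  if (provider.getDirectUrl) {
    return { type: 'redirect', url: await provider.getDirectUrl(key) };
  }
  return { type: 'proxy', data: await provider.read(key) };
}

// In-memory implementation without direct URLs, standing in for local disk.
class MemoryProvider implements StorageProvider {
  private readonly files = new Map<string, Uint8Array>();

  private toName(key: AssetKey): string {
    return `${key.kind}/${key.assetId}`;
  }

  async create(key: AssetKey, data: Uint8Array): Promise<void> {
    this.files.set(this.toName(key), data);
  }

  async read(key: AssetKey): Promise<Uint8Array> {
    const data = this.files.get(this.toName(key));
    if (data === undefined) {
      throw new Error(`Not found: ${this.toName(key)}`);
    }
    return data;
  }

  async delete(key: AssetKey): Promise<void> {
    this.files.delete(this.toName(key));
  }

  async stat(key: AssetKey): Promise<FileStat | null> {
    const data = this.files.get(this.toName(key));
    return data === undefined ? null : { size: data.length };
  }
}
```

Making the direct URL optional keeps simple backends simple, while the calling code decides between redirecting and proxying in a single place.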

Platform

Server

@Cellivar

Cellivar commented Dec 8, 2022

If I might be so bold as to add some unsolicited advice...

You're very close to the concept of a generic blob storage interface, where filesystem storage is just a slightly weird-looking blob storage API. Blob storage for large files, like images, is a very common design pattern for modern networked systems. Though my link hasn't been updated for a few years, it's a good example of the direction you may want to take with your implementation. I suspect you can find more modern options for TypeScript out there; searching from my phone is difficult.

The abstraction layer you described becomes blob operations, which then translate into actual blob API calls (filesystem write, S3 write, NFS share write, etc.).

@bo0tzz
Member Author

bo0tzz commented Dec 8, 2022

That's very helpful, thank you!

@pinpox

pinpox commented Jul 11, 2023

Does Immich currently support S3 storage? I saw that a related PR was merged some time ago, but I can't figure out how to set it up.

@jrasm91
Contributor

jrasm91 commented Jul 11, 2023

This is probably what you are thinking of: #1683 (comment)

@ibotty

ibotty commented Aug 28, 2024

I just wanted to point to Apache OpenDAL, which is used in the big data ecosystem quite a bit. It is a unified storage layer supporting many different storage systems, among them S3 and local POSIX file systems.

It also has node bindings.

https://opendal.apache.org/docs/nodejs/
