fix(volumes): increase volume count and decrease size to ensure available volumes for s3 buckets #15
Welcome to Cryostat3! 👋

Before contributing, make sure you have:

- rebased your branch onto the `main` branch
- attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
- signed your commits with a GPG signature

To recreate commits with GPG signature: `git fetch upstream && git rebase --force --gpg-sign upstream/main`
Fixes: cryostatio/cryostat#370
Related to #14
Description of the change:
After some more review of the SeaweedFS docs, it is expected that each S3 bucket will have multiple volumes allocated to it, even when those volumes are not yet filled. Once the volumes fill up, each bucket will claim more of the free volumes. In practice with the Cryostat 3 smoketest I find that 28 volumes are needed for the `archivedreports`, `archivedrecordings`, and `eventtemplates` set of buckets after some light exercising: uploading a custom template, creating some active recordings, creating multiple copies of archived recordings, and generating reports for each of those recordings. The number of claimed volumes reaches 28 as soon as each bucket has a single file of content, and it seems to stay at 28 (presumably until the volumes become full). Therefore, 40 seems like a reasonable number of volumes to allocate.

The maximum size of each volume is defaulted to a relatively small 256MB here (Seaweed's default is 30GB), but this number should be tuned to suit the disk space actually allocated to storage. Seaweed's default strategy is to divide the available disk space by the volume size and assign that many volumes, but this strategy does not work well when using the S3 interface, which actually imposes a minimum number of volumes. Instead, we should assign a minimum number of volumes, then divide the desired total storage capacity by that number of volumes, and assign the result as the maximum volume size. The tuning knobs to do so are therefore exposed as environment variables, so that the Operator (and Helm chart?) can do this math based on the PVC supplied to the storage container.

The default selections of 40 volumes and 256MB per volume imply a total maximum storage capacity of 10GB. The container should happily run with less disk space available than that - it will just complain about not having disk space if it does fill up sooner. However, if the desired storage capacity is larger than 10GB (i.e. a PVC larger than that is being assigned), then it becomes important to tune the maximum volume size and increase it past 256MB. Otherwise, assigning a 50GB PVC will not help: the container will still only store and manage up to 10GB of content, leaving the rest of the disk unused and unavailable.
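For illustration, here is a minimal sketch of how the entrypoint might consume these knobs, assuming hypothetical environment variable names (`STORAGE_VOLUME_COUNT`, `STORAGE_VOLUME_SIZE_LIMIT_MB`) and assuming the container launches `weed server` - neither is necessarily what this PR actually does:

```sh
#!/bin/sh
# Hypothetical sketch - variable names and defaults are illustrative only.

# Number of volumes to pre-allocate. 40 leaves headroom over the 28 observed
# to be claimed by the archivedreports/archivedrecordings/eventtemplates buckets.
STORAGE_VOLUME_COUNT="${STORAGE_VOLUME_COUNT:-40}"

# Maximum size of each volume in MB. 40 * 256MB is roughly 10GB of total capacity.
STORAGE_VOLUME_SIZE_LIMIT_MB="${STORAGE_VOLUME_SIZE_LIMIT_MB:-256}"

# weed server takes component-prefixed flags: -volume.max caps the volume count,
# -master.volumeSizeLimitMB caps each volume's size (SeaweedFS's default is 30000).
exec weed server -s3 \
    -volume.max="${STORAGE_VOLUME_COUNT}" \
    -master.volumeSizeLimitMB="${STORAGE_VOLUME_SIZE_LIMIT_MB}"
```

With this shape, the Operator or Helm chart only has to divide the PVC's requested size by the volume count and pass the result in as the size limit.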
Question: should the "disk space / 40" calculation just be done in this entrypoint script, as sketched below, instead of putting that work onto the Helm chart/Operator? Should we always assume that whatever disk space this container sees available should be fully allocated to it? (This also has implications for cryostatio/cryostat-operator#727, where a single PVC is assigned and shared between the cryostat-storage container and a postgresql database.)
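If the calculation were done in the entrypoint itself, it could look roughly like the following sketch, which divides whatever capacity the storage directory's filesystem reports by the fixed volume count (again, the variable names and storage path are hypothetical):

```sh
#!/bin/sh
# Hypothetical sketch of doing the "disk space / volume count" math in the entrypoint.
STORAGE_DIR="${STORAGE_DIR:-/data}"
STORAGE_VOLUME_COUNT="${STORAGE_VOLUME_COUNT:-40}"

# Total size (in 1KB blocks) of the filesystem backing the storage directory,
# i.e. the PVC mounted into the container, converted to MB.
total_kb="$(df -Pk "${STORAGE_DIR}" | awk 'NR==2 {print $2}')"
volume_size_limit_mb=$(( total_kb / 1024 / STORAGE_VOLUME_COUNT ))

# Spread the full capacity evenly across the fixed number of volumes.
exec weed server -s3 \
    -dir="${STORAGE_DIR}" \
    -volume.max="${STORAGE_VOLUME_COUNT}" \
    -master.volumeSizeLimitMB="${volume_size_limit_mb}"
```

The trade-off is exactly the question above: this bakes in the assumption that the whole filesystem belongs to cryostat-storage, which breaks down if the PVC is shared with another workload such as the database in cryostatio/cryostat-operator#727.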