A short example of using LakeFS with MAST data.
We have decided not to pursue this approach for data versioning on the FAIR MAST project, and this repo exists purely as a guide for any future work that may want to use LakeFS as a service. The justification for not using LakeFS is outlined in lakefs_downsides.docx.
If you are developing on a Mac, use Podman instead of Docker. Follow the Podman installation guide to set it up, then follow the setup steps below.
If you are using Linux or Windows, make sure you have docker and docker-compose
installed on your system.
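A quick way to confirm the prerequisites above is to check that each tool is on your PATH. This is a minimal sketch: the helper name check_tool is hypothetical and not part of this repository, and on a Mac you would substitute podman for the tool names in the loop.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the tools named above; on macOS replace these with "podman".
for tool in docker docker-compose; do
    check_tool "$tool" || true
done
```

Anything reported as missing needs to be installed before the containers can be started.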
We will be using the Python package manager uv to install our dependencies. As a first step, make sure this is installed with:
pip install uv
Secondly, clone the repository:
git clone [email protected]:jameshod5/lakefs-trial.git
cd lakefs-trial
You can use either conda
or venv
to set up the environment. The instructions below use venv:
Ensure you are using Python version 3.11:
uv venv venv
source venv/bin/activate
uv pip install -r requirements.txt
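To confirm that the activated environment really is on Python 3.11, you can inspect the interpreter version. This is a sketch only: is_python_311 is a hypothetical helper, and it assumes the python command on your PATH is the one from the activated venv.

```shell
# Hypothetical helper: succeed only for a "Python 3.11" or "Python 3.11.x" version string.
is_python_311() {
    case "$1" in
        "Python 3.11"|"Python 3.11."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Capture the version string; "|| true" keeps the script going if python is absent.
version="$(python --version 2>&1 || true)"

if is_python_311 "$version"; then
    echo "OK: $version"
else
    echo "Expected Python 3.11, found: $version"
fi
```

If the version is wrong, one option is to recreate the environment with an explicit interpreter request, for example uv venv --python 3.11 venv.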
Use uv --help
for additional commands, or refer to the documentation if needed.
Run the development container to start the LakeFS repo, MinIO storage and the bucket. You need to populate the bucket yourself: for this test case, place the 30420.zarr
store inside the bucket. Use the s5cmd command below, as this ensures the hidden metadata files are copied into your bucket. The bucket must be named zarr-example
for the notebooks to work.
s5cmd --endpoint-url http://localhost:9000 cp 30420.zarr s3://zarr-example/
podman compose \
--env-file dev/docker/.env.dev \
-f dev/docker/docker-compose.yml \
up \
--build
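Once the containers are running and the store has been copied, you can check that the hidden metadata files made it into the bucket by listing its contents. This is a sketch under stated assumptions: has_zarr_metadata is a hypothetical helper, and it assumes the store uses the Zarr v2 layout, where metadata lives in dotfiles such as .zgroup, .zarray, .zattrs and .zmetadata. The endpoint and bucket name are taken from the copy command above.

```shell
# Hypothetical helper: succeed if a listing on stdin names any Zarr v2 metadata dotfile.
has_zarr_metadata() {
    grep -E -q '\.(zgroup|zarray|zattrs|zmetadata)$'
}

ENDPOINT="http://localhost:9000"   # MinIO endpoint from the copy command above
BUCKET="zarr-example"              # bucket name the notebooks expect

# Only attempt the listing if s5cmd is installed.
if command -v s5cmd >/dev/null 2>&1; then
    if s5cmd --endpoint-url "$ENDPOINT" ls "s3://$BUCKET/*" 2>/dev/null | has_zarr_metadata; then
        echo "metadata files found in s3://$BUCKET"
    else
        echo "no Zarr metadata files listed in s3://$BUCKET"
    fi
fi
```

If no metadata files are listed, repeat the copy step and check that the bucket name and endpoint match your setup.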
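Unlike Docker, Podman does not shut down containers on its own. To shut everything down, run the commands below. Note that the second command removes all Podman volumes on the machine, not only those created for this project: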
podman compose -f dev/docker/docker-compose.yml down
podman volume rm --all