splitgraph/seafowl-gcsfuse

Scale to zero Seafowl via gcsfuse

Because Seafowl was architected with the cloud in mind, it's a good candidate for serverless ("scale to zero") hosting on platforms such as Cloud Run.

Database objects can be stored in S3 buckets, which avoids depending on a persistent volume. Seafowl's catalog can be backed by SQLite, which also suits scale to zero because it avoids the usual always-running Postgres process.

Platforms like Lambda and Cloud Run forward incoming HTTP requests to the waiting Seafowl service. That is the ideal moment for Seafowl to fetch the up-to-date SQLite catalog, so the query is handled with fresh data.

Adding gcsfuse to the Seafowl container lets the SQLite file be mounted from the bucket into the container's filesystem. This carries a performance penalty, but the catalog holds only metadata, not the raw database objects, so the penalty (at least as observed so far) is negligible. In addition, as long as traffic stays within the same region, GCP doesn't charge for traffic between the bucket and Cloud Run, which helps keep hosting costs down.
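The mount-then-start idea described above can be sketched as a small container entrypoint. This is only an illustration, not the repo's actual gcsfuse_run.sh: the bucket name, mount point, and config path are hypothetical, and it assumes the gcsfuse and seafowl binaries are on the PATH. By default it runs in dry-run mode and only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: mount the catalog bucket with gcsfuse, then start Seafowl.
# All names below are hypothetical placeholders, not values from the repo.
set -eu

BUCKET="${BUCKET:-my-seafowl-catalog}"                         # hypothetical bucket
MNT_DIR="${MNT_DIR:-/mnt/gcs}"                                 # hypothetical mount point
SEAFOWL_CONFIG="${SEAFOWL_CONFIG:-/etc/seafowl/seafowl.toml}"  # hypothetical config path
DRY_RUN="${DRY_RUN:-1}"                                        # 1 = print commands only

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

start_seafowl() {
    run mkdir -p "$MNT_DIR"
    # gcsfuse takes the bucket and the mount point; without --foreground
    # it returns once the mount is in place.
    run gcsfuse "$BUCKET" "$MNT_DIR"
    # The config is assumed to point the SQLite catalog at a file under $MNT_DIR.
    run seafowl -c "$SEAFOWL_CONFIG"
}

start_seafowl
```

Setting DRY_RUN=0 inside the container would run the commands for real. Because only the small catalog file lives on the FUSE mount, the slow path is limited to metadata reads and writes.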

How to use

Please check out the blog post for step-by-step details on how to set this up.

In the end you'll have a bucket like this:

and an endpoint similar to

https://seafowl-gcsfuse-YourEndpointHere.a.run.app/q

which you can query from a browser or from your backend.

Steps

  • Build the Docker image (or use the prebuilt image splitgraph/seafowl-gcsfuse)
  • Make it available to Cloud Run (e.g. push it to hub.docker.com or your own repo)
  • Deploy a new Cloud Run instance using this image, including the secrets
  • Test the endpoint using e.g. curl
    curl -i -H "Content-Type: application/json" \
    -X POST "https://your-endpoint-goes-here.a.run.app/q" -d@- <<EOF
    {"query": "
    SELECT now()
    "}
    EOF
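The build, push, and deploy steps above could look roughly like the following. This is a sketch under assumptions: the image name, service name, region, secret name, and config mount path are all hypothetical, and the script only prints the commands unless DRY_RUN=0 is set. The gen2 execution environment is requested because FUSE mounts need Cloud Run's second generation environment.

```shell
#!/bin/sh
# Dry-run sketch of the build, push and deploy steps.
# Every name below is a hypothetical placeholder; adjust to your project.
set -eu

IMAGE="${IMAGE:-docker.io/your-user/seafowl-gcsfuse:latest}"  # hypothetical image
SERVICE="${SERVICE:-seafowl-gcsfuse}"                         # hypothetical service name
REGION="${REGION:-europe-west1}"                              # hypothetical region
CONFIG_SECRET="${CONFIG_SECRET:-seafowl-config}"              # hypothetical secret name
CONFIG_PATH="${CONFIG_PATH:-/etc/seafowl/seafowl.toml}"       # hypothetical mount path
DRY_RUN="${DRY_RUN:-1}"                                       # 1 = print commands only

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

deploy_seafowl() {
    run docker build -t "$IMAGE" .
    run docker push "$IMAGE"
    # A --set-secrets key starting with "/" mounts the secret as a file at that path.
    run gcloud run deploy "$SERVICE" \
        --image "$IMAGE" \
        --region "$REGION" \
        --execution-environment gen2 \
        --set-secrets "$CONFIG_PATH=$CONFIG_SECRET:latest"
}

deploy_seafowl
```

Reviewing the printed commands before switching off dry-run is a cheap way to catch a wrong region or image tag.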

What's in the repo

This repo provides a Dockerfile which runs the gcsfuse_run.sh file at init time.

If you want to try separate endpoints for read-only and read/write access, two example config files (read-only and write-enabled) are provided. You could upload them as secrets, which lets them be mounted into your Cloud Run instance's filesystem, much like the FUSE mount. If you prefer, you can mount these configs some other way, or use env vars to avoid depending on Secret Manager.