
datasette publish lambda plugin #236

Open
simonw opened this issue Apr 23, 2018 · 11 comments

simonw commented Apr 23, 2018

Refs #217 - create a publish plugin that can deploy to AWS Lambda.

https://docs.aws.amazon.com/lambda/latest/dg/limits.html says Lambda packages can be up to 50 MB, so this would only work with smaller databases (the command can check the file size before attempting to package and deploy it).

Lambda functions also get a 512 MB /tmp directory, so for larger databases the function could start up and then download up to 512 MB from an S3 bucket. The plugin could take an optional S3 bucket to write to, know how to upload the .db file there, and then have the function download it on startup.
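A sketch of what those two strategies could look like (the function names and the pre-signed-URL download approach are illustrative, not a final design):

```python
import os
import shutil
import urllib.request

LAMBDA_PACKAGE_LIMIT = 50 * 1024 * 1024   # 50 MB zipped-package ceiling
TMP_LIMIT = 512 * 1024 * 1024             # size of Lambda's /tmp scratch space

def check_package_size(db_path):
    """Refuse to package databases that can never fit in a Lambda bundle."""
    size = os.path.getsize(db_path)
    if size > LAMBDA_PACKAGE_LIMIT:
        raise ValueError(
            f"{db_path} is {size} bytes, over the 50 MB Lambda package limit - "
            "use the S3 download strategy instead"
        )
    return size

def download_db_on_startup(url, dest="/tmp/data.db"):
    """Fetch the .db into /tmp on cold start, e.g. from a pre-signed S3 URL."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```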

@simonw simonw added the feature label Jul 10, 2018
simonw added a commit that referenced this issue Jul 26, 2018
This change introduces a new plugin hook, publish_subcommand, which can be
used to implement new subcommands for the "datasette publish" command family.

I've used this new hook to refactor out the "publish now" and "publish heroku"
implementations into separate modules. I've also added unit tests for these
two publishers, mocking the subprocess.call and subprocess.check_output
functions.

As part of this, I introduced a mechanism for loading default plugins. These
are defined in the new "default_plugins" list inside datasette/app.py

Closes #217 (Plugin support for datasette publish)
Closes #348 (Unit tests for "datasette publish")
Refs #14, #59, #102, #103, #146, #236, #347
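The shape of the publish_subcommand mechanism can be illustrated with a stdlib-only stand-in (the real implementation uses pluggy hooks and a click command group; all names below are simplified for illustration):

```python
# Stand-in for the click group that Datasette passes to publish_subcommand hooks.
class PublishGroup:
    def __init__(self):
        self.commands = {}

    def command(self, name=None):
        def register(fn):
            self.commands[name or fn.__name__] = fn
            return fn
        return register

def publish_subcommand(publish):
    """A plugin implements this hook to add e.g. 'datasette publish lambda'."""
    @publish.command("lambda")
    def publish_lambda(files):
        return f"would deploy {files} to AWS Lambda"

# Core collects hook implementations from default and installed plugins:
publish = PublishGroup()
for hook in (publish_subcommand,):  # the default_plugins list plays this role
    hook(publish)
```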
simonw pushed a commit that referenced this issue Jul 26, 2018
… heroku/now (#349)


cldellow commented Apr 3, 2020

Hi Simon,

I'm thinking of attempting this. Can you clarify some questions I have?

  1. I assume the goal is to have a CORS-friendly HTTPS endpoint that hosts the datasette service + user's db.

  2. If that's the goal, I think Lambda alone is insufficient. Lambda provides the compute fabric, but not the HTTP routing. You'd also need to add Application Load Balancer or API Gateway to provide an HTTP endpoint that routes to the lambda function.

Do you have a preference between ALB or API GW? ALB has better economics at scale, but has a minimum monthly cost. API GW has worse per-request economics, but scales to zero when no requests are happening.

  3. Does Datasette have any native components, or is it all pure python? If it has native bits, they'll likely need to be recompiled to work on Amazon Linux 2.

  4. There are a few disparate services that need to be wired together to expose a Python service securely to the web. If I was doing this outside of the datasette publish system, I'd use an AWS CloudFormation template. Even within datasette, I think it still makes sense to use a CloudFormation template and just have the publish plugin invoke it (via the standard aws cli) with user-specified parameters. Does that sound reasonable to you?

Thanks for your help!

@cldellow

I made a repo at https://github.com/code402/datasette-lambda to demonstrate the idea, and to scratch my personal itch for this.

The demo relies on some central authority having already published a public, reusable Lambda layer with Datasette & its dependencies. I think that differs from the other publish plugins which seem to mainly publish Dockerfiles that the host will interpret to install deps from a requirements.txt file.

I chose that approach because uvloop appears to be a dependency with native code that needs to be compiled for the target runtime environment. In this case, that's Amazon Linux 2. I'm not 100% clear on whether that's still required, because:

  • maybe uvloop is only needed for uvicorn, which the demo doesn't actually use since HTTP routing is handled by API Gateway
  • it seems like uvloop may be an optional, drop-in optimization for asyncio in any case (but I may be misreading this; I'm very much a Python noob)

If it's the case that uvloop is truly optional, then I think the publish plugin could do the packaging on the user's machine, regardless of what flavour of operating system they're on. That'd be a bit slower for the user, but would provide the most long-term flexibility in terms of supporting plugins.
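If uvloop really is an optional drop-in, the usual pattern is to prefer it when importable and fall back to the stdlib event loop (a generic sketch, not Datasette's actual code):

```python
import asyncio

try:
    import uvloop  # optional C extension; would need an Amazon Linux 2 build on Lambda
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
except ImportError:
    pass  # the stdlib asyncio loop is a functional, if slightly slower, fallback

async def main():
    await asyncio.sleep(0)
    return "running on whichever loop is available"

print(asyncio.run(main()))
```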


simonw commented Jun 16, 2020

Hi Colin,

Sorry I didn't see this sooner! I've just started digging into this myself, to try and play with the new EFS Lambda support: #850.

Yes, uvloop is only needed because of uvicorn. I have a branch here that removes that dependency just for trying out Lambda: https://github.com/simonw/datasette/tree/no-uvicorn - so you can run pip install https://github.com/simonw/datasette/archive/no-uvicorn.zip to get that.

I'm going to try out your datasette-lambda project next - really excited to see how far you've got with it.


simonw commented Jun 16, 2020

As for your other questions:

  1. I assume the goal is to have a CORS-friendly HTTPS endpoint that hosts the datasette service + user's db.

Yes, exactly. I know this will limit the size of database that can be deployed (since Lambda has a 50MB total package limit as far as I can tell) but there are plenty of interesting databases that are small enough to fit there.

The new EFS support for Lambda means that theoretically the size of database is now unlimited, which is really interesting. That's what got me inspired to take a look at a proof of concept in #850.

  2. If that's the goal, I think Lambda alone is insufficient. Lambda provides the compute fabric, but not the HTTP routing. You'd also need to add Application Load Balancer or API Gateway to provide an HTTP endpoint that routes to the lambda function.

Do you have a preference between ALB or API GW? ALB has better economics at scale, but has a minimum monthly cost. API GW has worse per-request economics, but scales to zero when no requests are happening.

I personally like scale-to-zero because many of my projects are likely to receive very little traffic. So API GW first, and maybe ALB as an option later on for people operating at scale?

  3. Does Datasette have any native components, or is it all pure python? If it has native bits, they'll likely need to be recompiled to work on Amazon Linux 2.

As you've found, the only native component is uvloop which is only needed if uvicorn is being used to serve requests.

  4. There are a few disparate services that need to be wired together to expose a Python service securely to the web. If I was doing this outside of the datasette publish system, I'd use an AWS CloudFormation template. Even within datasette, I think it still makes sense to use a CloudFormation template and just have the publish plugin invoke it (via the standard aws cli) with user-specified parameters. Does that sound reasonable to you?

For the eventual "datasette publish lambda" command I want whatever results in the smallest amount of inconvenience for users. I've been trying out Amazon SAM in #850 and it requires users to run Docker on their machines, which is a pretty huge barrier to entry! I don't have much experience with CloudFormation but it's probably a better bet, especially if you can "pip install" the dependencies needed to deploy with it.
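Driving a CloudFormation template through the standard CLI could look something like this dry-run sketch (the stack name, template file, and parameter names are all made up for illustration):

```python
import shlex

def build_deploy_command(stack_name, template, parameters):
    """Compose an 'aws cloudformation deploy' invocation without running it."""
    cmd = [
        "aws", "cloudformation", "deploy",
        "--stack-name", stack_name,
        "--template-file", template,
        "--capabilities", "CAPABILITY_IAM",
        "--parameter-overrides",
    ] + [f"{key}={value}" for key, value in parameters.items()]
    return cmd

cmd = build_deploy_command(
    "datasette-demo", "datasette.template.yaml", {"DbFileName": "fixtures.db"}
)
# hand the list to subprocess.run(cmd, check=True) to actually deploy
print(shlex.join(cmd))
```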

@jacobian

Now that Lambda supports Docker, this is probably a bit easier, and may be able to build on top of the existing package command.

There are weirdnesses in how the command actually gets invoked; the aws-lambda-python image shows a bit of that. So Datasette would probably need some sort of Lambda-specific entry point to make this work.

@jacobian

Oh, and the container image can be up to 10GB, so the EFS step might not be needed except for pretty big stuff.


simonw commented Mar 15, 2021

Yeah the Lambda Docker stuff is pretty odd - you still don't get to speak HTTP, you have to speak their custom event protocol instead.

https://github.com/glassechidna/serverlessish looks interesting here - it adds a proxy inside the container which allows your existing HTTP Docker image to run within Docker-on-Lambda. I've not tried it out yet though.

@sethvincent

👋 I just put together a small example using the lambda container image support: https://github.com/sethvincent/datasette-aws-lambda-example

It uses mangum and AWS's python runtime interface client to handle the lambda event stuff.

I'd be happy to help with a publish plugin for AWS lambda as I plan to use this for upcoming projects.

The example uses the serverless cli for deployment, but there might be a more suitable deployment approach for the plugin. It would be cool if users didn't have to install anything beyond the aws cli and its associated config/credentials setup.
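The core of what mangum does is translate the Lambda event into an ASGI scope for the application. A simplified sketch of that mapping, assuming an API Gateway HTTP API (v2) event shape (this is an illustration of the idea, not mangum's actual code):

```python
def event_to_asgi_scope(event):
    """Map the key parts of an API Gateway v2 event to an ASGI HTTP scope."""
    http = event["requestContext"]["http"]
    return {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": http["method"],
        "path": event.get("rawPath", "/"),
        "query_string": event.get("rawQueryString", "").encode(),
        "headers": [
            (key.lower().encode(), value.encode())
            for key, value in event.get("headers", {}).items()
        ],
    }

sample_event = {
    "rawPath": "/fixtures",
    "rawQueryString": "sql=select+1",
    "headers": {"Host": "example.execute-api.us-east-1.amazonaws.com"},
    "requestContext": {"http": {"method": "GET"}},
}
scope = event_to_asgi_scope(sample_event)
```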


simonw commented Sep 17, 2021

That's so useful @sethvincent! Really interesting reading your code there - it's especially clever how you're using the base_url config.

I'd be very interested to see what your demo looks like without using serverless - I completely agree that the fewer additional dependencies there are for this, the better.

I'm also very interested in figuring out a way to run Datasette in Lambda but with the SQLite database on an EFS volume. Do you have a feel for how hard that would be?

@jordaneremieff

Hi @simonw,

I've received some inquiries over the last year or so about Datasette and how it might be supported by Mangum. I maintain Mangum which is, as far as I know, the only project that provides support for ASGI applications in AWS Lambda.

If there is anything that I can help with here, please let me know because I think what Datasette provides to the community (even beyond OSS) is noble and worthy of special consideration.


sopel commented Mar 12, 2023

I keep coming back to this in search for the related exploration, so I'll just link it now:

@simonw has meanwhile researched how to deploy Datasette to AWS Lambda using function URLs and Mangum via simonw/public-notes#6 and concluded "that's everything I need to know in order to build a datasette-publish-lambda plugin".
