-
Notifications
You must be signed in to change notification settings - Fork 804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python codegen support #923
Conversation
Before I get into the actual code, I just wanted to express how happy I am to see pull requests like this. I've wanted sqlc to support more than Go and Kotlin for a long time, and I think Python is a great addition. I'm happy to merge this as-s behind an Can you tell me more about how much time you're looking to spend on Python support? If you'd just like to contribute the initial implementation, I'm happy to use it. If you're looking to maintain it, there are a number of changes to be made before I'd be comfortable marking it as stable. I'll leave an initial review and we discuss from there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Go implementation, I've tried to generate code that doesn't rely on external packages. This isn't always possible, but in most cases the generated code just uses the standard library.
With that in mind, I don't think we should use pydantic or a custom sqlc-runtime package. Instead, I'd like to use dataclasses for models and SQLAlchemy for the runtime.
SQLAlchemy solves the issue of supporting multiple database engines. It also gives us a consistent interface across engines. The ORM mode is too high-level for our use case, but we can rely on the core package. This also allows people already using SQLAlchemy and easy way to integrate the code.
Lastly, I'd prefer the classes and methods over bare functions. The Go code was initially modeled after gRPC services. gRPC services in Python look like this. Ignoring the obvious style issues, I think simple classes like this are easier to pass around and mock out.
# Enums | ||
|
||
# Models |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't generate this comments, they're mainly just noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, these are leftover from my initial single mega-file implementation, I'll remove them.
@overload | ||
def create_author(conn: sqlc.Connection, name: str, bio: Optional[str]) -> Optional[models.Author]: | ||
pass | ||
|
||
|
||
@overload | ||
def create_author(conn: sqlc.AsyncConnection, name: str, bio: Optional[str]) -> Awaitable[Optional[models.Author]]: | ||
pass | ||
|
||
|
||
def create_author(conn: sqlc.GenericConnection, name: str, bio: Optional[str]) -> sqlc.ReturnType[Optional[models.Author]]: | ||
return conn.execute_one_model(models.Author, CREATE_AUTHOR, name, bio) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never used or seen the @overload
operator. How common is it? Are there other ways to expose a similar sync / async function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how common it is, I'm using it here so a single function can be used with both sync and async engines and type checkers properly understand it. It's definitely something I cooked up myself though, I haven't seen this specific pattern for sync+async support anywhere else. Without it I'd have to define separate sync and async functions, and with bare functions I definitely didn't want to just prefix (or suffix) every function with async_
.
However, if the query functions are moved into classes it's a lot simpler to just define two classes, one with sync functions and another with async functions, no method prefix/suffix is needed since they are contained within separate classes. The generation of async (or sync) query classes can also be placed behind a config option, so users who don't care about async (or sync) support don't get that generated code.
LIST_AUTHORS = """-- name: list_authors :many | ||
SELECT id, name, bio FROM authors | ||
ORDER BY name | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like to generate the query strings closed to the functions themselves so it's easier to read. If I'm looking at list_authors
, I can then see the query it's referencing without having to jump around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's nice to keep them close together, but how would this work if the query functions are in a class? I could define these constants as class properties above each method, but I can't say I've seen other Python code that does that, class properties usually go at the top of the class definition.
I'm definitely on-board for maintaining Python support. I'm not totally sold on the current state of the generated code, so I'm happy to work on improvements. Concerning pydantic: One of the use cases I had in mind was using sqlc + FastAPI to create a simple REST API (this is also what the asyncio support is for), directly using the sqlc generated models for the API. I certainly see the merit in having zero external dependencies for the generated code though. What about an option to generate pydantic models (off by default)? To the query code dataclasses and pydantic models will be identical, so an option should be very simple. Concerning Sqlalchemy: I considered it initially, but didn't want to require it for the generated code to work. I went though a few iterations since then though, so I'm not sure that reasoning makes sense anymore, especially since using it could eliminate the runtime package. The only potential problem is asyncio support was only added in 1.4, which is still in beta and has some breaking changes. The non-asyncio code won't require any 1.4 features though, so it shouldn't be a big deal. I think I will switch to it, that also simplifies parameter handling (I can just change Classes vs functions: Wrapping the queries up into a class would make unit test mocks (and assertions they're called) easier. I can change that so each query function is a class method. I do want to stick to a file/class per |
Great! We can discuss the rest of the changes in future PRs. I'm going to merge in the support as-is and then put it behind an experimental command line flag. Sound good? |
Yeah, that sounds good to me |
* Initial python codegen support * Add python tests CI job
I've come to really like using sqlc for Go projects and wanted something similar for Python. There's a few existing Python libraries that load queries from sql files, but they lack the type information, models, and query validation that sqlc can do. Since sqlc already has support for multiple languages I figured adding Python support should be fairly straight forward, which turned out to be accurate.
So basically this PR adds support for generating fully type-hinted Python code. I modeled my changes after the existing Kotlin support, although I think it ended up looking at lot more like the Go codegen. There's a bunch of code duplication between the Python and Go codegen, but I think that's unavoidable until some kind of general codegen interface can be developed. I wanted to keep my changes limited to just adding Python support.
The generated code relies on a small runtime library: sqlc-python-runtime. This keeps the generated code simple and allows for one function to support synchronous and asynchronous executors.
What's working:
examples/
(including tests I've added for them) and the features they useWhat doesn't work:
emit_prepared_queries
option (although theasyncpg
executor automatically prepares and caches queries)As far as additional testing goes, I've started using this in an existing project with a couple dozen queries against a moderately complicated schema and it's been great so far.
What's different:
psycopg2
doesn't support numbered parameters. Howeverasyncpg
does, so the regex hack the Kotlin codegen uses is applied at runtime for onlypsycopg2
. I'm planning on investigatingpsycopg2
s named parameters as another potential solution.Thanks again for sqlc! Hopefully this is something you're interested in for sqlc.