Python codegen support #923

robholt · 2021-03-03T21:22:58Z

I've come to really like using sqlc for Go projects and wanted something similar for Python. There's a few existing Python libraries that load queries from sql files, but they lack the type information, models, and query validation that sqlc can do. Since sqlc already has support for multiple languages I figured adding Python support should be fairly straight forward, which turned out to be accurate.

So basically this PR adds support for generating fully type-hinted Python code. I modeled my changes after the existing Kotlin support, although I think it ended up looking at lot more like the Go codegen. There's a bunch of code duplication between the Python and Go codegen, but I think that's unavoidable until some kind of general codegen interface can be developed. I wanted to keep my changes limited to just adding Python support.

The generated code relies on a small runtime library: sqlc-python-runtime. This keeps the generated code simple and allows for one function to support synchronous and asynchronous executors.

What's working:

postgresql
All the projects under examples/ (including tests I've added for them) and the features they use
Dates
Json fields
Type overrides

What doesn't work:

mysql (I focused on postgresql because the project I'm using this on is postgresql only, but mysql support could definitely be added)
json/db tag options (not a thing for Python)
The emit_prepared_queries option (although the asyncpg executor automatically prepares and caches queries)

As far as additional testing goes, I've started using this in an existing project with a couple dozen queries against a moderately complicated schema and it's been great so far.

What's different:

Like JDBC in Kotlin/Java, psycopg2 doesn't support numbered parameters. However asyncpg does, so the regex hack the Kotlin codegen uses is applied at runtime for only psycopg2. I'm planning on investigating psycopg2s named parameters as another potential solution.
The generated query functions aren't part of a class or anything, so their first parameter is the database connection object. This made multiple python query files (instead of just one big one) and the sync/async support much simpler.

Thanks again for sqlc! Hopefully this is something you're interested in for sqlc.

kyleconroy · 2021-03-04T03:21:31Z

Before I get into the actual code, I just wanted to express how happy I am to see pull requests like this. I've wanted sqlc to support more than Go and Kotlin for a long time, and I think Python is a great addition. I'm happy to merge this as-s behind an --experimental flag. I want to get some people using it before we commit to the structure of the generated code.

Can you tell me more about how much time you're looking to spend on Python support? If you'd just like to contribute the initial implementation, I'm happy to use it. If you're looking to maintain it, there are a number of changes to be made before I'd be comfortable marking it as stable.

I'll leave an initial review and we discuss from there.

kyleconroy

In the Go implementation, I've tried to generate code that doesn't rely on external packages. This isn't always possible, but in most cases the generated code just uses the standard library.

With that in mind, I don't think we should use pydantic or a custom sqlc-runtime package. Instead, I'd like to use dataclasses for models and SQLAlchemy for the runtime.

SQLAlchemy solves the issue of supporting multiple database engines. It also gives us a consistent interface across engines. The ORM mode is too high-level for our use case, but we can rely on the core package. This also allows people already using SQLAlchemy and easy way to integrate the code.

Lastly, I'd prefer the classes and methods over bare functions. The Go code was initially modeled after gRPC services. gRPC services in Python look like this. Ignoring the obvious style issues, I think simple classes like this are easier to pass around and mock out.

kyleconroy · 2021-03-04T03:22:54Z

examples/python/src/authors/models.py

+# Enums
+
+# Models


We shouldn't generate this comments, they're mainly just noise.

I agree, these are leftover from my initial single mega-file implementation, I'll remove them.

kyleconroy · 2021-03-04T03:25:18Z

examples/python/src/authors/query.py

+@overload
+def create_author(conn: sqlc.Connection, name: str, bio: Optional[str]) -> Optional[models.Author]:
+    pass
+
+
+@overload
+def create_author(conn: sqlc.AsyncConnection, name: str, bio: Optional[str]) -> Awaitable[Optional[models.Author]]:
+    pass
+
+
+def create_author(conn: sqlc.GenericConnection, name: str, bio: Optional[str]) -> sqlc.ReturnType[Optional[models.Author]]:
+    return conn.execute_one_model(models.Author, CREATE_AUTHOR, name, bio)


I've never used or seen the @overload operator. How common is it? Are there other ways to expose a similar sync / async function?

I'm not sure how common it is, I'm using it here so a single function can be used with both sync and async engines and type checkers properly understand it. It's definitely something I cooked up myself though, I haven't seen this specific pattern for sync+async support anywhere else. Without it I'd have to define separate sync and async functions, and with bare functions I definitely didn't want to just prefix (or suffix) every function with async_.

However, if the query functions are moved into classes it's a lot simpler to just define two classes, one with sync functions and another with async functions, no method prefix/suffix is needed since they are contained within separate classes. The generation of async (or sync) query classes can also be placed behind a config option, so users who don't care about async (or sync) support don't get that generated code.

kyleconroy · 2021-03-04T03:26:16Z

examples/python/src/authors/query.py

+LIST_AUTHORS = """-- name: list_authors :many
+SELECT id, name, bio FROM authors
+ORDER BY name
+"""


I like to generate the query strings closed to the functions themselves so it's easier to read. If I'm looking at list_authors, I can then see the query it's referencing without having to jump around.

I agree it's nice to keep them close together, but how would this work if the query functions are in a class? I could define these constants as class properties above each method, but I can't say I've seen other Python code that does that, class properties usually go at the top of the class definition.

examples/python/src/booktest/models.py

robholt · 2021-03-04T15:15:52Z

I'm definitely on-board for maintaining Python support. I'm not totally sold on the current state of the generated code, so I'm happy to work on improvements.

Concerning pydantic: One of the use cases I had in mind was using sqlc + FastAPI to create a simple REST API (this is also what the asyncio support is for), directly using the sqlc generated models for the API. I certainly see the merit in having zero external dependencies for the generated code though. What about an option to generate pydantic models (off by default)? To the query code dataclasses and pydantic models will be identical, so an option should be very simple.

Concerning Sqlalchemy: I considered it initially, but didn't want to require it for the generated code to work. I went though a few iterations since then though, so I'm not sure that reasoning makes sense anymore, especially since using it could eliminate the runtime package. The only potential problem is asyncio support was only added in 1.4, which is still in beta and has some breaking changes. The non-asyncio code won't require any 1.4 features though, so it shouldn't be a big deal. I think I will switch to it, that also simplifies parameter handling (I can just change $1 to :p1 in the generated code).

Classes vs functions: Wrapping the queries up into a class would make unit test mocks (and assertions they're called) easier. I can change that so each query function is a class method. I do want to stick to a file/class per .sql file though, my first implementation generated a single file with a single Queries mega-class and I think that got unwieldy quick.

kyleconroy · 2021-03-04T16:24:40Z

I'm definitely on-board for maintaining Python support

Great! We can discuss the rest of the changes in future PRs. I'm going to merge in the support as-is and then put it behind an experimental command line flag. Sound good?

robholt · 2021-03-04T17:02:45Z

Yeah, that sounds good to me

* Initial python codegen support * Add python tests CI job

Robert Holt added 2 commits March 3, 2021 13:50

Initial python codegen support

8956e2a

Add python tests CI job

0a382c3

kyleconroy requested changes Mar 4, 2021

View reviewed changes

kyleconroy approved these changes Mar 4, 2021

View reviewed changes

kyleconroy merged commit 3f131b7 into sqlc-dev:master Mar 5, 2021

victoraugustolls pushed a commit to Streppel/sqlc that referenced this pull request May 6, 2021

Python codegen support (sqlc-dev#923)

bdb0bd0

* Initial python codegen support * Add python tests CI job

stephen mentioned this pull request Mar 1, 2022

Refactor go codegen to plugin types #1460

Merged

danicc097 mentioned this pull request Apr 10, 2022

Pydantic models config field #1538

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python codegen support #923

Python codegen support #923

robholt commented Mar 3, 2021

kyleconroy commented Mar 4, 2021

kyleconroy left a comment

kyleconroy Mar 4, 2021

robholt Mar 4, 2021

kyleconroy Mar 4, 2021

robholt Mar 4, 2021

kyleconroy Mar 4, 2021

robholt Mar 4, 2021

robholt commented Mar 4, 2021

kyleconroy commented Mar 4, 2021

robholt commented Mar 4, 2021

Python codegen support #923

Python codegen support #923

Conversation

robholt commented Mar 3, 2021

kyleconroy commented Mar 4, 2021

kyleconroy left a comment

Choose a reason for hiding this comment

kyleconroy Mar 4, 2021

Choose a reason for hiding this comment

robholt Mar 4, 2021

Choose a reason for hiding this comment

kyleconroy Mar 4, 2021

Choose a reason for hiding this comment

robholt Mar 4, 2021

Choose a reason for hiding this comment

kyleconroy Mar 4, 2021

Choose a reason for hiding this comment

robholt Mar 4, 2021

Choose a reason for hiding this comment

robholt commented Mar 4, 2021

kyleconroy commented Mar 4, 2021

robholt commented Mar 4, 2021