Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLite support #161

Closed
RadhiFadlillah opened this issue Dec 13, 2019 · 35 comments
Closed

SQLite support #161

RadhiFadlillah opened this issue Dec 13, 2019 · 35 comments
Labels
enhancement New feature or request future In the year 3000...

Comments

@RadhiFadlillah
Copy link
Contributor

Hi all, great works on this project. Is it possible to add SQLite support ? Right now I'm working on embedded device and often uses SQLite, so it will be awesome for me.

@kyleconroy kyleconroy added the enhancement New feature or request label Dec 13, 2019
@kyleconroy
Copy link
Collaborator

Agreed! I'd love to add SQLite support.

The biggest blocker right now is access to a high-quality SQLite query parser. The SQLite C API doesn't expose a parsing function as far as I can tell, which means we'll probably need to create a project similar to pg_query_go.

We'll make this the official issue for tracking SQLite.

@jhaungs
Copy link

jhaungs commented Dec 14, 2019

Just a suggestion. The amalgamated sqlite source file can be used as a static or dynamic library. If you wanted to write to the C interface, you could get a parser for free if you're willing to track their development changes over time. Even if all you did was separate out the lemon parser generator and the generated parser, you'd have a leg up on writing your own, and a simpler path to future compliance with their changes.

@kyleconroy
Copy link
Collaborator

I spent a few days looking into the feasibility of reusing the parser present in the sqlite source code. I don't think it's going to work. The parse.y file is littered with C / C++ calls, making using the existing parser very difficult. The internal structs are not cleaning exposed during parsing.

Instead, my current plan is to use the ANTLR, as they have an existing SQLite grammar. I played around with the generated parser today. It's not perfect, but it should fit the bill.

@douglas-plezentek
Copy link
Contributor

Hi @kyleconroy, has there been any progress on this? I'm close to publishing build rules for sqlc in bazel, and it would be useful for writing and running tests that don't require a backend.

@mvdan
Copy link

mvdan commented May 14, 2021

In case it helps, note that someone has a pretty advanced cgo-free implementation of sqlite at https://pkg.go.dev/modernc.org/sqlite. I seem to recall that it's translated from C, so the internal APIs will likely not be nicer than the C ones, but perhaps you can reuse some of the code at least.

The author is not on GitHub, but they are on Gitlab if you want to start a conversation: https://gitlab.com/cznic/sqlite

@mvdan
Copy link

mvdan commented May 14, 2021

In particular, the internal parser seems to be at https://pkg.go.dev/modernc.org/sqlite/lib#Xsqlite3Parser.

@maxhawkins
Copy link
Contributor

I did a little investigation of using the internal parser from modernc's sqlite implementation. I wasn't able to get anything working that could be used in sqlc, but I wanted to share my progress in case this helps someone else.

https://gist.github.com/maxhawkins/f162eaacddabbe380c7fb75791d85a56

$ go run *.go 'WITH foo AS (SELECT 1) SELECT * FROM foo'
valid sql

$ go run *.go 'bad sql'
2021/08/02 13:09:53 parse error: near "bad": syntax error
exit status 1

This program uses the sqlite3RunParser function to parse an input sql string and return a syntax error if it fails to parse. I learned that sqlite3RunParser is a higher-level parsing function that relies on information from the database schema and parses directly into bytecode instead of creating an AST. This means my program fails if the input SQL references any tables or tries to create them.

A better approach may be using the tokenizer (sqlite3GetToken) and LALR(1) parser (yyParser) directly and then pulling the parsed data from the yyParser struct. There's an example of that in the source for sqlite3RunParser.

I won't have a chunk of time to dive into this for a few weeks, but I'm hoping this inspires someone else to poke around with modernc.org/sqlite and maybe get a parser working.

@kyleconroy kyleconroy added the future In the year 3000... label Aug 28, 2021
@StevenACoffman
Copy link
Contributor

@maxhawkins I don't know if you had any further explorations, but I summarized this issue and included your investigations in https://gitlab.com/cznic/sqlite/-/issues/75

@StevenACoffman
Copy link
Contributor

Current state is: #1260 (comment)

As you've discovered, the SQLite support is very rough right now. The parser we're using is low level and difficult to use, so I don't think SQLite is going to be improved until we find a better parser.

@StevenACoffman
Copy link
Contributor

Update from @cznic - https://gitlab.com/cznic/sqlite/-/issues/75#note_720410495 :

I wonder if there's a chance to rewrite the lemon parser generator version of the formal grammar to the yacc form and use goyacc or equivalent to produce a Go parser that just builds the AST (dropping all actions). I'm not sure it's possible, ie. if the grammar is of the right type per se, without the actions helping the parser and I did not try this way.

Instead I tried to manually write down the grammar from scratch and I failed. But maybe someone else can fix the problems I ran into or dig something useful out of it. Also, if the requirement is just for some subset of the SQLite grammar then it might be possible to adjust. The failures that stopped me were in some more exotic corners of the syntax, IIRC. See the attached archive. A haven't seen the code in almost a year, not sure about how much I can help with understanding it, but feel free to ask me anything I might be able to answer.

sqlite-parser.tar.gz

@graf0
Copy link

graf0 commented Dec 9, 2021

Maybe this will help:

https://github.com/CovenantSQL/sqlparser

@PadraigK
Copy link
Contributor

It looks like there's a more maintained and up-to-date version of the antlr grammar for SQLite here:
https://github.com/antlr/grammars-v4/tree/master/sql/sqlite

I started playing with migrating sqlc to use this. There are some naming differences that will require a bit of work to figure out, but I'm interested in doing this work if the maintainers think it would be useful? Is the antlr parser thought of as a dead end for this, or is it worth investing in improving it? Let me know and I'll see if I can get a PR together this week.

@zellyn
Copy link

zellyn commented Jan 25, 2022

FWIW, I just completed a port of pikchr to Go: gopikchr. As part of that effort, I also ported the Lemon Parser to Go: golemon.

In all, it was almost 12,000 lines of code, hand-ported from C to Go.

After that, the 849 lines in tokenize.c and the 1904 lines in parse.y don't seem that daunting, if someone wanted to hand-port the sqlite parser to Go. I'm planning on working on a github action to notice file changes in the sqlite code (fossil has per-file rss feeds) and automatically create github issues, although for me, it's much simpler, since lemon.c and pikchr.y change infrequently.

Anyway, just a thought. I'd do the port myself, but I've just exhausted my quota of idiotic-hand-porting-to-Go energy 🙂

@jmillerv
Copy link

I understand that hand porting is tedious, but what's the difficulty here? I have pretty limited c exposure but write enough go in the day to day. I have many personal projects that could benefit from generating sqlite and am willing to contribute.

@zellyn
Copy link

zellyn commented Jan 26, 2022

You're right, it's not difficult, just tedious. After playing around with string and []rune, I eventually replaced char * with []byte. I actually zero-terminated it, because there are lots of places the code uses z[i+1] without checkout length, trusting that the zero takes a byte. z += 2 becomes z = z[2:]. That left a few instances of pointer comparisons and pointer math, but it wasn't that bad.

@StevenACoffman
Copy link
Contributor

StevenACoffman commented Jan 26, 2022

I wonder how easy it is to run ccgo/v3 on just the 849 lines in tokenize.c and the 1904 lines in parse.y and then wire it up to golemon. I'm a little out of my depth though.

@zellyn
Copy link

zellyn commented Jan 26, 2022

ccgo/v3 does not understand .y files, so that wouldn't work. I found it generates fairly ugly code (for good reasons: it's an amazing tool).

@jmillerv
Copy link

You're right, it's not difficult, just tedious. After playing around with string and []rune, I eventually replaced char * with []byte. I actually zero-terminated it, because there are lots of places the code uses z[i+1] without checkout length, trusting that the zero takes a byte. z += 2 becomes z = z[2:]. That left a few instances of pointer comparisons and pointer math, but it wasn't that bad.

Alright. I will update here if I end up taking a crack at this.

@zellyn
Copy link

zellyn commented Jan 26, 2022

One other lesson: where a dynamically sized array is only ever cleanly appended to, you can replace the pointer/current-index/current-size triple with a simple slice in Go. If the current index moves around, or the code frequently references arr[n-1], arr[n], and arr[n+1], then replace pointer+current-size with a slice, but keep current-index around. The changes get too complicated and finicky otherwise, and it's desirable to keep the code as similar as possible, to make future changes easy to port over.

@PadraigK
Copy link
Contributor

PadraigK commented Feb 2, 2022

I've been chipping away at implementing support using the antlr generated parser. Latest work here: #1410 — I can write up some issues if anyone else wants to jump in, otherwise I'll keep going on this for a few hours / week.

@danthegoodman1
Copy link

Would love to see this soon!

@StevenACoffman
Copy link
Contributor

It looks like benbjohnson created a Pure Go sqlite parser here: https://github.com/benbjohnson/sql that is not currently maintained, but caught my eye.

@PadraigK it looks like your PRs have been merged so #1397 #1410 #1443 #1447 #1455 how is the current support looking?

@kyleconroy
Copy link
Collaborator

@zellyn @jmillerv I attempted to port over parse.y and tokenize.c this weekend. I made some progress here: https://github.com/kyleconroy/golite. It turns out you also need the structs and functions defined in sqliteInt.h. I've never ported C to Go, so I'm not sure if I'm actually close to a working solution.

@PadraigK
Copy link
Contributor

I'm still interested in continuing on the path I'm on, I was just on vacation for a few weeks and haven't found time to get back into it yet. The support is coming along well, but not finished — the main missing thing is support for where clauses, but a lot of the pieces for that are in place. After that, I think it's just small details to handle more obscure SQL, so it may even be beta-testable by then.

As a heads up, it is possible my work situation will change in the next month or so, and in that case I might not get to finish, but I'll let you know @kyleconroy.

For what it's worth, I don't have any concerns about the ANTLR parsing side — 95% of the work is in mapping the ANTLR output into sqlc's AST, and I think the same will be true with a different parser. Occasionally it is necessary to tweak the grammar to produce something that's easier to work with. That said, I get that using a port of SQLite's own parser feels better overall so it's probably worth continuing to investigate.

My process has been to work down through the existing tests and make SQLite versions of them where appropriate then I examine the output structs/functions and fix any bugs. This makes it quite easy to jump in and get something impactful done without much ramp up time. The test suite will also be useful to test other parsers / mapping approaches, so even if the native SQLite parser gets done, some of the work should still be useful at least :D

@kyleconroy
Copy link
Collaborator

@PadraigK I'm unsure that my experiment with golemon will work, so I still think the ANTLR based approach has the best chance to succeed.

@j0holo
Copy link

j0holo commented May 10, 2022

@PadraigK I saw that #1410 has already been merged into the main branch. Is there anything I can do to help?

Edit I saw in the changes that that there is some form of todo list.

@hakobera
Copy link
Contributor

@PadraigK @kyleconroy I'm not sure what is the best SQLite parser for sqlc, but I can write code #1687 to support more SQL Syntax using current ANTLR based parser.

I also added some examples and many end to end test data for SQLite in the PR. I think these examples and tests could help us for check backward compatibility, even if you decided changed parse implementation in the future.

@avalonbits
Copy link
Contributor

We now have beta support for sqlite! Gonna test it out this weekend.

@oliverpool
Copy link
Contributor

Looks good, thank you for the amazing work!

Is there any issue summarizing what is supported and what is not supported on sqlite? (I am personally missing RETURNING - but using the postgres engine works just fine for now, see https://codeberg.org/dpsgmuf/buchungssystem/src/branch/main/database/queries :)

@avalonbits
Copy link
Contributor

@oliverpool FTR I'm not a maintainer of sqlc, just a very happy user that is eager to get SQLite support :)

@aaa8212
Copy link

aaa8212 commented Jul 13, 2022

Great work on the project and perfect timing on SQLite support, thank you! :)

I gave it a go (latest commit on main: 8618c39) and it works nicely. As a feedback, there was one minor issue along the way... the project no longer works with Go 1.17:

/go/pkg/mod/github.com/kyleconroy/[email protected]/internal/config/go_type.go:175:15: undefined: strings.Cut
note: module requires Go 1.18

I was able to work around it (built a Docker container and used sqlc inside it) so it is not a big issue for me, just wanted to share. Otherwise works great.

@timohuovinen
Copy link

timohuovinen commented Jul 19, 2022

Amazing work, I can't thank you enough!

@marcusirgens
Copy link

Do you want issues with the SQLite beta registered on the issue tracker?

@kyleconroy
Copy link
Collaborator

Do you want issues with the SQLite beta registered on the issue tracker?

Yep!

@kyleconroy
Copy link
Collaborator

SQLite will be in beta for 1.15.0, so I'm going to close this out :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request future In the year 3000...
Projects
None yet
Development

No branches or pull requests