
INTEGER PRIMARY KEY in SQLite #852

Closed
lnicola opened this issue Apr 8, 2017 · 30 comments · Fixed by #3940

Comments

@lnicola

lnicola commented Apr 8, 2017

Diesel deduces a column's type as BigInt if it's declared as BIGINT in the database. This is generally fine in SQLite because INTEGER and BIGINT are synonyms, but INTEGER PRIMARY KEY is treated differently and is always 64-bit according to https://sqlite.org/autoinc.html. Moreover, BIGINT PRIMARY KEY AUTOINCREMENT is not accepted.

It might be a good idea to detect these columns and use BigInt for them. Note that the SQLite documentation says that INTEGER PRIMARY KEY is special only for tables that aren't declared with WITHOUT ROWID, but BIGINT PRIMARY KEY is still not allowed even when using that.
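A quick way to see the 64-bit behavior described above (shown here with Python's stdlib sqlite3 rather than diesel, purely for illustration): an INTEGER PRIMARY KEY column accepts and round-trips rowids well above i32::MAX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

# One past i32::MAX -- INTEGER PRIMARY KEY is an alias for the 64-bit rowid,
# so this insert succeeds and the value round-trips exactly.
big_id = 2**31 + 1
conn.execute("INSERT INTO items (id, name) VALUES (?, ?)", (big_id, "x"))
(stored,) = conn.execute("SELECT id FROM items").fetchone()
print(stored)  # 2147483649
```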

@Eijebong
Member

Is this really an issue? I just tried, and I was able to insert 2^256 into an integer field. SQLite does not really care about types. It would be confusing for users to deserialize what they think is an integer to an i64, though, as I'm sure nobody knows about this bug. If you need to retrieve a BigInt from SQLite, you can use the table! macro and say it's a BigInt.
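The point about SQLite not enforcing declared types is easy to reproduce (again using Python's stdlib sqlite3 for illustration; note that Python refuses to bind integers wider than 64 bits, so this sketch stores a non-numeric string instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")

# SQLite type affinity is advisory: a column declared INTEGER happily
# stores a value that is not a number at all, and returns it as text.
conn.execute("INSERT INTO t (n) VALUES (?)", ("not a number",))
(value,) = conn.execute("SELECT n FROM t").fetchone()
print(type(value).__name__)  # str
```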

@lnicola
Author

lnicola commented Jul 17, 2017

It's not an issue for SQLite, but diesel deduces the type as i32 IIRC.

If you need to retrieve a BigInt from SQLite, you can use the table! macro and say it's a BigInt

Yeah, that works.

@sgrif
Member

sgrif commented Jul 17, 2017

I'm a bit torn on what to do here. It's really unfortunate that SQLite requires the type name to be exactly INTEGER in this case. As @Eijebong mentioned, our inference on SQLite is really more just trying to deduce intent, as types are just suggestions more than anything in SQLite. Ultimately I'd prefer not to special case this, as the use cases for SQLite trend towards not needing 64 bit PKs, and you can override manually in the cases where it matters. Ultimately though I think keeping our inference rules simple and consistent here takes priority.

@sgrif sgrif closed this as completed Jul 17, 2017
thpts added a commit to thpts/a_or_cname that referenced this issue May 14, 2019
Despite SQLite considering BIGINT and INTEGER synonymous, we have to declare
this field as BigInt in order to have more than 32 bits of resolution.

See also: diesel-rs/diesel#852
@kpcyrd
Contributor

kpcyrd commented Nov 20, 2019

Is there a reason i32 was picked instead of i64? I can store an i32 in an i64 but not the other way around, so using an i64 seems like the obviously better choice.

Are there any docs I can read on how to work around this? I'm currently stuck with a number that I need to store but can't.

@lnicola
Author

lnicola commented Nov 20, 2019

I think you can use diesel print-schema and edit the resulting file to use your own types.

@weiznich
Member

@kpcyrd To quote Sean from the reply directly above yours:

Ultimately I'd prefer not to special case this, as the use cases for SQLite trend towards not needing 64 bit PKs, and you can override manually in the cases where it matters. Ultimately though I think keeping our inference rules simple and consistent here takes priority.

@kpcyrd
Contributor

kpcyrd commented Nov 21, 2019

My column is not a primary key, should I raise a separate issue for this?

For anybody with the same issue, I'm currently editing my src/schema.rs manually like this:

-        balance -> Nullable<Integer>,
+        balance -> Nullable<BigInt>,

Since this file is auto-generated, it is overwritten every time diesel migration run or diesel migration redo is used.

@lnicola
Author

lnicola commented Nov 21, 2019

Since this file is auto-generated, it is overwritten every time diesel migration run or diesel migration redo is used.

I didn't realize that. How does diesel migration run know which file is the schema file? Are you sure you don't have diesel print-schema somewhere?

@kpcyrd
Contributor

kpcyrd commented Nov 21, 2019

% cat diesel.toml 
# For documentation on how to configure this file,
# see diesel.rs/guides/configuring-diesel-cli

[print_schema]
file = "src/schema.rs"

I think it's correct that the file is rewritten to pick up possible schema changes, the problem is that I need to edit this file afterwards. Allowing me to overwrite the type for certain fields with a config file might work as a solution.

@lnicola
Author

lnicola commented Nov 21, 2019

Ah, okay. My solution was to not run print-schema and maintain it manually.

@weiznich
Member

@kpcyrd If your column is not a primary key, you can just use BigInt as the column type on the SQL side; diesel print-schema will then pick it up automatically.

About changing the auto-generated schema: This guide covers the options to modify the schema after generation.

@tobx

tobx commented Jan 10, 2020

@weiznich I have a primary key and want it to be i64. The guide you proposed mentions the patch_file field. This works, but only until I change the database schema. Then the patch file no longer matches the newly generated schema.rs, and the following error occurs:

Failed to apply schema patch. stdout: patching file src/schema.rs

So after changing the database schema I would still have to manually change the schema.rs and additionally rebuild the schema.patch file.

@tobx

tobx commented Jan 10, 2020

Ok, now I understand. First, deactivate the auto-generation of schema.rs in diesel.toml:

[print_schema]
# file = "src/schema.rs"

Then use the command diesel print-schema and edit schema.rs manually.

I can live with that, but I still do not understand how patch_file should generally work after a schema change.

@weiznich
Member

@tobx patch_file just applies the given file via patch to the generated schema file, so there is no magic involved. This means applying the patch is only possible as long as your schema didn't change everywhere at once (the context lines around each hunk must remain unchanged). Whether that holds depends entirely on your changes to the database schema, so it may or may not work.

@Boscop

Boscop commented Jan 20, 2020

@weiznich patch_file is not working for me in Git shell / msys:

$ diesel print-schema
Failed to apply schema patch. stdout: patching file 'C:\Users\me\AppData\Local\Temp\.tmp0VJQNn'
Hunk #1 FAILED at 43.
Hunk #2 FAILED at 67.
2 out of 2 hunks FAILED -- saving rejects to file 'C:\Users\me\AppData\Local\Temp\.tmp0VJQNn.rej'
 stderr:

How can I make it work on Windows?

I followed the guide and did git diff -U6 > src/schema.patch and I have

[print_schema]
file = "src/schema.rs"
patch_file = "src/schema.patch"

@ahicks92

I want to propose reopening this. While it is true that most applications using sqlite won't ever have 4 billion rows in a table at a time, sqlite's autoincrement algorithm can and will generate integers above the i32 limit. This means that if you end up with around 2 billion inserts to a table, you'll end up over the i32 limit and something will happen. It would be nice to at least clarify what that something is. This isn't rows in the table, just inserts.

You can hit this in a month with a mere 1000 updates per second, which effectively renders schema inference unsuitable for any application with 1000 updates a second. That seems like a pretty low limit. You could hit that just using sqlite to store metrics data or logs or something with a retention policy in a pretty reasonable time frame. For more reasonable sqlite volumes, say 100 or 200 updates a second, you can hit it in a year or so. Your database needn't even go over a few megabytes to make that happen.
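As a back-of-the-envelope check of those rates (assuming one rowid per insert and no explicit ids), the time for an auto-incrementing rowid to pass i32::MAX works out as:

```python
# Days for an auto-incrementing rowid to pass i32::MAX at a given insert rate.
I32_MAX = 2**31 - 1
for rate in (1000, 200, 100):  # inserts per second
    days = I32_MAX / rate / 86_400
    print(f"{rate}/s -> {days:.0f} days")
# 1000/s -> 25 days   (about a month)
# 200/s  -> 124 days
# 100/s  -> 249 days  (under a year)
```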

I understand why you wouldn't want to make the schema inference special, but it's already bad enough to maintain two places on every schema change, and for production applications you appear to literally have no choice but to do so if you don't want to hit the above problem. If there's also an underlying ticking time bomb that's going to fail silently, I strongly suggest that fixing said problem would be worth the special case. This needs to at least fail loudly, otherwise you just get spurious failures that you'd have to spend an afternoon tracking down.

@weiznich
Member

@camlorn As it is simple to change the corresponding field type in the autogenerated schema later, I do not see any reason to change the behavior here. Additionally, the CLI tool provides multiple ways to automate such changes for future schema updates, so I really do not see why we should break a lot of already existing code to fix a "special" case.

@lnicola
Author

lnicola commented Sep 16, 2020

N.B. I'm the one who filed this issue.

I've come to think that this is actually a bug in diesel: if the SQLite docs say that INTEGER PRIMARY KEY is a 64-bit integer, that's what diesel print-schema should output, and a 32-bit Integer is incorrect. It may seem like a special case, but it's a special case in the SQLite specification, not in diesel.

As for the use cases for SQLite and whether they need 64-bit surrogate keys or not, I think that's up to the application developers to decide, and not the ORM's choice. Consider that SQLite is actually used in embedded systems.

On the other hand, manually updating the generated schema on changes will usually take less time than installing the diesel CLI from scratch. If the CLI doesn't (or can't) produce the right types, it's worth being explicit. You still have to update the database and the model, so one more change for the schema isn't that much of a hassle. So don't blindly trust the ORM generated code, but use it as a starting point and then maintain it like you maintain your models.

I really do not see why we should break a lot of already existing code to fix a "special" case.

It could be done for a 2.0 release, whenever that happens.

@tobx

tobx commented Sep 16, 2020

I really do not see why we should break a lot of already existing code to fix a "special" case.

As @camlorn said, a silent failure in a production system would be a good reason to fix this case. I think not everyone should be forced to understand such a detail of the ORM just to be sure their production system will not randomly break after a month or a year.

To avoid breaking code, maybe for now the system could just print a big warning for this case?

@weiznich
Member

First of all, diesel's type mapping for sqlite has always been what we as diesel maintainers have defined as reasonable, ever since diesel started supporting sqlite. That's because, in the end, types do not really matter for sqlite. From a theoretical point of view, we as developers cannot assume that a column defined as Integer in sqlite will actually contain numbers, as sqlite happily accepts any value there. So, following the reasoning behind this issue that the generated schema.rs should be correct, the only fully "correct" solution would be not to generate any schema at all, as we cannot do so without assuming a lot of things.

So let's assume we only want to "fix" this specific issue here:
We currently generate column types from a mapping based on the type names used by postgres/mysql. See here for the exact mapping code. If you look a bit more closely at the surrounding code, you will see that we get the column type names via PRAGMA TABLE_INFO('table_name') from sqlite.
So let's just do a small experiment: Let's create the following two tables and have a look at the output of PRAGMA TABLE_INFO:

create table test(id integer primary key autoincrement);
create table test2(id integer primary key);

Now let's have a look at the corresponding output:

sqlite> pragma table_info('test');
cid|name|type|notnull|dflt_value|pk
0|id|integer|0||1

sqlite> pragma table_info('test2');
cid|name|type|notnull|dflt_value|pk
0|id|integer|0||1

As you can see, both return the same table info. That means we have no way to even infer whether a primary key is autoincrement or not. The only place where this information is available is the corresponding CREATE TABLE statement contained in the sqlite_master table. Getting the information from there would require parsing those statements. At least I do not plan to implement a SQL parser any time soon, as there are quite a few things that are more important to me.
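The experiment above can be reproduced directly (here from Python's stdlib sqlite3, purely for illustration): the two PRAGMA TABLE_INFO results are identical, so AUTOINCREMENT is invisible at this level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY AUTOINCREMENT)")
conn.execute("CREATE TABLE test2 (id INTEGER PRIMARY KEY)")

info_auto = conn.execute("PRAGMA table_info('test')").fetchall()
info_plain = conn.execute("PRAGMA table_info('test2')").fetchall()
# Both report (cid, name, type, notnull, dflt_value, pk) identically,
# so the AUTOINCREMENT keyword cannot be recovered from this pragma.
print(info_auto == info_plain)  # True
```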

@lnicola
Author

lnicola commented Sep 16, 2020

That's a fair reason. Pre-emptively (since others will probably point them out anyway), you can use some heuristics:

  • look in sqlite_sequence for a sequence with the same name as the table
  • look for primary key autoincrement in the table's sqlite_master entry

Neither of them is perfect (e.g. the first one breaks when you rename the table or if it has never had any data), but they would probably cover most users. And it only reinforces my point about "declare your column types instead of letting the ORM infer them".

But I agree that both of the solutions above are icky and you're not unreasonable if you don't want to implement something like that.
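Both heuristics can be sketched in a few lines (Python's stdlib sqlite3 again for illustration; the table name t is just an example), including the caveat that sqlite_sequence only gains a row after the first insert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT)")

# Heuristic 2: the verbatim CREATE TABLE statement lives in sqlite_master,
# but checking it means string matching (or real SQL parsing).
(sql,) = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 't'"
).fetchone()
print("AUTOINCREMENT" in sql.upper())  # True

# Heuristic 1: sqlite_sequence has no row for the table until the first
# insert, which is why it fails on a freshly migrated, empty database.
before = conn.execute(
    "SELECT name FROM sqlite_sequence WHERE name = 't'"
).fetchall()
conn.execute("INSERT INTO t DEFAULT VALUES")
after = conn.execute(
    "SELECT name FROM sqlite_sequence WHERE name = 't'"
).fetchall()
print(before, after)  # [] [('t',)]
```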

@weiznich
Member

Just for the sake of completeness:

  • look in sqlite_sequence for a sequence with the same name as the table

As already noted, that works only for tables that already have entries. I would say that assumption does not hold for a significant number of tables generated via diesel migration run where schema.rs is generated directly afterwards (by diesel migration run, for example). Therefore this approach is not acceptable, as it would break a relatively common use case.

  • look for primary key autoincrement in the table's sqlite_master entry

As written above: that requires parsing sql, because otherwise this will fail as soon as someone has something like the following table:

create table test3(text default 'primary key autoincrement');

As already stated: at least I do not want to write or maintain a sql parser for sqlite's create table statements.

And it only reinforces my point about "declare your column types instead of letting the ORM infer them".

That's definitely a valid strategy. To reiterate: Diesel explicitly supports this use case; it's always possible to write your own table! macro calls or to edit those generated by diesel-cli.

@ahicks92

For anyone else who finds this later and who wants to use schema inference, the solution appears to be as follows:

Create a sqlite_mapping.rs containing the following:

pub use diesel::sql_types::*;

pub type Integer = BigInt;

Then, in your print-schema invocation, add --import-types mycrate::sqlite_mapping::*. I'm typing this from memory, sorry if there are typos.

If it's obvious, I missed where it's documented. There's a couple offhand mentions of --import-types in various places but that's all I could find.

I now understand that it breaks code to change this (and why), but I'm not too enthused about the silent breakage. I almost missed this point myself. Your autogenerated schema will work fine until it doesn't, and "until it doesn't" is far enough out that it would likely be in prod by that point. At the moment, the only warning you get is that i64 doesn't work. Maybe this should be called out somewhere in the documentation?

Though beside the point for this issue, "maintain everything yourself" doesn't make Diesel competitive with options in other languages. Checking is a good idea, but going down that road gives manually written sql migrations, manually written table! macros, and a set of structs whose fields have to be kept in sync with both of the preceding. I understand why these should be decoupled components, but it feels like taking the stance of doing all of this yourself makes it easier only on the developers of Diesel.

@weiznich
Member

For anyone else who finds this later and who wants to use schema inference, the solution appears to be as follows:

Create a sqlite_mapping.rs containing the following:

To call that out at least once explicitly: This will change the mapping for all Integer columns in your database to i64, not only the PRIMARY KEY AUTOINCREMENT ones. Depending on your use case this can be fine or not...

Then, in your print-schema, add --import-types mycrate::sqlite_mapping::*. I'm typing this off memory, sorry if there's typos.

If it's obvious, I missed where it's documented. There's a couple offhand mentions of --import-types in various places but that's all I could find.

As for where options like this for manipulating the generated schema are documented: have a look at this guide on the official web page. Otherwise, I'm fine with getting a PR extending that documentation to make this clearer.

Though beside the point for this issue, "maintain everything yourself" doesn't make Diesel competitive with options in other languages. Checking is a good idea, but going down that road gives manually written sql migrations, manually written table! macros, and a set of structs whose fields have to be kept in order with both of the proceeding. I understand why these should be decoupled components, but it feels like taking the stance of doing all of this yourself makes it easier only on the developers of Diesel.

I mean, nobody forces you to use diesel. If you don't like it, just use something different or write your own ORM 🤷. I do not consider it friendly to call out design decisions as "making it easier only on the developers of Diesel". We provide this crate in our free time, so maybe recheck your wording.
That said, the guide I've mentioned above lists two possibilities for incorporating that into automatic schema generation.

@ahicks92

I'm not saying that I have a problem with the diesel developers, only that the justification here seems weak and the usability much less than ideal. Not having the resources and developing something for free are indeed good reasons to not do something, but the justification here seems to be that manually maintaining these is fine, and if it's a resource issue then in general I'd have expected someone to say "that's a resource issue". Perhaps they did and I missed it.

Not having one authoritative location for configuration in your code is a good way to introduce bugs. But if there's wider discussion to be had, that's probably out of scope for this issue in any case, and regardless I found (and documented) a workaround which at least temporarily reduces the pain points for people who need to use SQLite. Maybe I will revisit this topic in general somewhere else in a month or two, after we've finished adopting the Rust stack, when I may be able to propose concrete action.

@curtisleefulton

Upvote for re-opening this issue - it's clearly a bug in print-schema.

The fact that print-schema depends on a single-source, Pg-derived lookup table for type mapping is an implementation detail. The chosen implementation does not scale to the sqlite use case.

Manually maintaining the schema may be an acceptable work-around and the cost of fixing this bug may exceed the benefit. If so, I'd propose a loud warning from print-schema when it encounters an integer in the sqlite context.

@weiznich
Member

@curtisleefulton As outlined above: I do not see this issue as actionable, therefore it remains closed as long as no one can point out a solution that is:

  • backward compatible as we will not release a breaking diesel version anytime soon
  • addresses all the problems listed above

I'm happy to read your suggestions on how to address these points. Otherwise, it's discouraged to post on old issues just to repeat what has already been said.

I do not consider adding a bold warning here a valid solution, as I expect that only a fraction of our users will ever have more than i32::MAX table entries.

@gx0r

gx0r commented Mar 28, 2023

Hello, my first post here so first thank you to the Diesel developers and contributors for this great library. I also ran into this issue. Good discussion and I agree it's a tricky one.

I also worked around it via diesel.toml:

[print_schema]
import_types = ["crate::sqlite_mapping::*"]
file = "src/schema.rs"

sqlite_mapping.rs:

// https://github.com/diesel-rs/diesel/issues/852
pub use diesel::sql_types::*;
// This will change the mapping for all Integer columns in your database to i64, not only the PRIMARY KEY AUTOINCREMENT ones.
// Depending on your use case this can be fine or not...
pub type Integer = BigInt;

Perhaps it's arguably a bug in SQLite, as it should, instead of

CREATE TABLE items (
  id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
)

let you write:

CREATE TABLE items (
  id BIGINT PRIMARY KEY AUTOINCREMENT NOT NULL
)

because in the BIGINT case SQLite will fail with "AUTOINCREMENT is only allowed on an INTEGER PRIMARY KEY".
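That failure is reproducible in a few lines (Python's stdlib sqlite3 shown for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
try:
    # BIGINT is a synonym for INTEGER everywhere except here: AUTOINCREMENT
    # demands the literal type name INTEGER.
    conn.execute(
        "CREATE TABLE items (id BIGINT PRIMARY KEY AUTOINCREMENT NOT NULL)"
    )
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)  # AUTOINCREMENT is only allowed on an INTEGER PRIMARY KEY
```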

@stormshield-kg
Contributor

I made a PR to fix this issue: #3940.

@stormshield-kg
Contributor

This issue is now fixed.
