Use integer primary keys for smaller tables #20

Merged: 4 commits, Nov 21, 2021

simonw
Owner

@simonw simonw commented Nov 19, 2021

Refs #12. Still needs a bit more work:

  • See if I can come up with a better column name than _item_hash_id (I went with _item_id)
  • Ship sqlite-utils 3.19 and update dependency
  • Update schema description in README
  • Update reserved columns

Owner Author

simonw commented Nov 19, 2021

This is also using sqlite-utils 3.19a0: https://github.com/simonw/sqlite-utils/releases/tag/3.19a0

install_requires=["click", "GitPython", "sqlite-utils>=3.19a0"],

I need to turn that into a non-alpha release before landing this branch.

Owner Author

simonw commented Nov 19, 2021

Here's the new schema produced by this code:

assert db.schema == (
    "CREATE TABLE [commits] (\n"
    " [id] INTEGER PRIMARY KEY,\n"
    " [hash] TEXT,\n"
    " [commit_at] TEXT\n"
    ");\n"
    "CREATE UNIQUE INDEX [idx_commits_hash]\n"
    " ON [commits] ([hash]);\n"
    "CREATE TABLE [items] (\n"
    " [_id] INTEGER PRIMARY KEY,\n"
    " [_item_hash_id] TEXT,\n"
    " [item_id] INTEGER,\n"
    " [name] TEXT,\n"
    " [_commit] INTEGER\n"
    ");\n"
    "CREATE UNIQUE INDEX [idx_items__item_hash_id]\n"
    " ON [items] ([_item_hash_id]);\n"
    "CREATE TABLE [item_versions] (\n"
    " [_item] INTEGER REFERENCES [items]([_id]),\n"
    " [_version] INTEGER,\n"
    " [_commit] INTEGER REFERENCES [commits]([id]),\n"
    " [item_id] INTEGER,\n"
    " [name] TEXT,\n"
    " PRIMARY KEY ([_item], [_version])\n"
    ");"
)
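
To illustrate how the integer keys in this schema tie the three tables together, here is a minimal sketch using only the standard-library sqlite3 module. The sample rows (commit hashes, names, the placeholder value for _item_hash_id) and the helper names build_demo_db and version_history are invented for this example and are not part of the project.

```python
import sqlite3

# Same schema as in the assertion above, written out as one SQL script.
SCHEMA = """
CREATE TABLE [commits] (
   [id] INTEGER PRIMARY KEY,
   [hash] TEXT,
   [commit_at] TEXT
);
CREATE UNIQUE INDEX [idx_commits_hash]
    ON [commits] ([hash]);
CREATE TABLE [items] (
   [_id] INTEGER PRIMARY KEY,
   [_item_hash_id] TEXT,
   [item_id] INTEGER,
   [name] TEXT,
   [_commit] INTEGER
);
CREATE UNIQUE INDEX [idx_items__item_hash_id]
    ON [items] ([_item_hash_id]);
CREATE TABLE [item_versions] (
   [_item] INTEGER REFERENCES [items]([_id]),
   [_version] INTEGER,
   [_commit] INTEGER REFERENCES [commits]([id]),
   [item_id] INTEGER,
   [name] TEXT,
   PRIMARY KEY ([_item], [_version])
);
"""


def build_demo_db():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO commits (id, hash, commit_at) VALUES (?, ?, ?)",
        [
            (1, "aaa111", "2021-11-19T00:00:00"),
            (2, "bbb222", "2021-11-20T00:00:00"),
        ],
    )
    # The role of items._commit is not spelled out in this thread,
    # so it is left NULL in this sketch.
    conn.execute(
        "INSERT INTO items (_id, _item_hash_id, item_id, name) "
        "VALUES (1, 'placeholder-hash', 1, 'Gin v2')"
    )
    conn.executemany(
        "INSERT INTO item_versions (_item, _version, _commit, item_id, name) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 1, 1, "Gin"),
            (1, 2, 2, 1, "Gin v2"),
        ],
    )
    return conn


def version_history(conn, item_pk):
    """Return (version, commit hash, name) rows for one item, oldest first."""
    return conn.execute(
        """
        SELECT item_versions._version, commits.hash, item_versions.name
        FROM item_versions
        JOIN commits ON commits.id = item_versions._commit
        WHERE item_versions._item = ?
        ORDER BY item_versions._version
        """,
        (item_pk,),
    ).fetchall()
```

Each item_versions row stores two small integers (_item and _commit) rather than repeating a long text hash, which is the point of the change in this PR's title.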

Owner Author

simonw commented Nov 19, 2021

The _item_hash_id column holds the hash of the ID column values for that specific item.

I called it _item_hash_id because the underlying code also calculates a hash of ALL of the column values, to detect whether the item has changed since the previous version. In the Python code that's called item_hash, and I was getting it confused with the other hash. Hence item_hash_id as a variable name, which turned into a column name.

I'm sure I can come up with a better name for this column - and refactor the code once I do.
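
The difference between the two hashes can be sketched as follows. This is an illustration only, not the project's code: the function names, the choice of SHA-1, and the JSON serialization are all assumptions made for the example.

```python
import hashlib
import json


def _hash(values):
    # Serialize deterministically so equal values always hash identically.
    payload = json.dumps(values, sort_keys=True, default=str)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def item_id_hash(item, id_columns):
    """Hash of only the ID column values: identifies WHICH item this is."""
    return _hash([item[col] for col in id_columns])


def item_full_hash(item):
    """Hash of all column values: changes whenever any value changes."""
    return _hash(item)


# Two versions of the same item: same ID, different name.
v1 = {"item_id": 1, "name": "Gin"}
v2 = {"item_id": 1, "name": "Gin v2"}
```

Here item_id_hash(v1, ["item_id"]) equals item_id_hash(v2, ["item_id"]), so both rows map to the same item, while item_full_hash(v1) differs from item_full_hash(v2), which signals that a new version should be recorded.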

Owner Author

simonw commented Nov 19, 2021

I should update this reserved column list:

RESERVED = ("_id", "_item", "_version", "_commit", "rowid")
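
One way a reserved list like this might be used is to detect incoming columns that would collide with the tool's own bookkeeping columns. The sketch below is hypothetical: find_reserved_collisions is not a function from this project, and how the project actually handles a collision (raising an error, renaming the column, or something else) is not stated in this thread.

```python
RESERVED = ("_id", "_item", "_version", "_commit", "rowid")


def find_reserved_collisions(item, reserved=RESERVED):
    """Return the sorted keys of `item` that clash with reserved column names."""
    return sorted(key for key in item if key in reserved)
```

For example, find_reserved_collisions({"_id": 1, "name": "x"}) reports "_id" as a clash, while an item with only ordinary column names yields an empty list.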

Owner Author

simonw commented Nov 19, 2021

The README talks about column names and needs to be updated, but I may also extend the README to show a full schema (maybe generated with cog) for as clear an explanation as possible.

I should rig up a live demo too, à la datasette-graphql and friends.

Owner Author

simonw commented Nov 21, 2021

I'm going to rename _item_hash_id to _item_id and in the code I'll differentiate that from _item_hash.

Owner Author

simonw commented Nov 21, 2021

> The README talks about column names and needs to be updated, but I may also extend the README to show a full schema (maybe generated with cog) for as clear an explanation as possible.

I decided not to use cog: generating an example schema requires running the full test fixtures mechanism, which creates a temporary Git repository, and that feels like too much complexity here.

@simonw simonw marked this pull request as ready for review November 21, 2021 05:02
@simonw simonw merged commit 6aec163 into main Nov 21, 2021
simonw added a commit that referenced this pull request Nov 21, 2021