Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

table.transform() should preserve rowid values #592

Closed
simonw opened this issue Sep 8, 2023 · 6 comments
Closed

table.transform() should preserve rowid values #592

simonw opened this issue Sep 8, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@simonw
Copy link
Owner

simonw commented Sep 8, 2023

I just spotted a bug when using https://datasette.io/plugins/datasette-configure-fts and https://datasette.io/plugins/datasette-edit-schema at the same time.

Steps to reproduce:

  • Configure FTS for a table, then run a test search
  • Edit the schema for that table and change the order of columns
  • Run the test search again

I got the wrong search results, which I think is because the _fts table pointed to the first table by rowid but those rowid values were entirely rewritten as a consequence of running table.transform() on the table.

Reconfiguring FTS on the table fixed the problem.

I think table.transform() should be able to preserve rowid values.

@simonw simonw added the bug Something isn't working label Sep 8, 2023
@simonw
Copy link
Owner Author

simonw commented Sep 8, 2023

That's odd, I wrote a test for this just now and it passes already:

def test_transform_preserves_rowids(fresh_db):
    # Create a rowid table
    fresh_db["places"].insert_all(
        (
            {"name": "Paris", "country": "France"},
            {"name": "London", "country": "UK"},
            {"name": "New York", "country": "USA"},
        ),
    )
    assert fresh_db["places"].use_rowid
    previous_rows = list(
        tuple(row) for row in fresh_db.execute("select rowid, name from places")
    )
    # Transform it
    fresh_db["places"].transform(column_order=("country", "name"))
    # Should be the same
    next_rows = list(
        tuple(row) for row in fresh_db.execute("select rowid, name from places")
    )
    assert previous_rows == next_rows

So maybe I'm wrong about the cause of that bug?

@simonw
Copy link
Owner Author

simonw commented Sep 8, 2023

I tried bumping that up to 10,000 rows instead of just 3 but the test still passed.

@simonw
Copy link
Owner Author

simonw commented Sep 8, 2023

I just noticed that the table where I encountered this bug wasn't actually a rowid table after all - it had an id column that was a text primary key.

The reason the rowid was important is that's how the FTS mechanism in Datasette relates FTS entries to their rows.

But I tried this test and it passed, too:

def test_transform_preserves_rowids(fresh_db):
    fresh_db["places"].insert_all(
        [
            {"id": "1", "name": "Paris", "country": "France"},
            {"id": "2", "name": "London", "country": "UK"},
            {"id": "3", "name": "New York", "country": "USA"},
        ],
        pk="id",
    )
    previous_rows = list(
        tuple(row) for row in fresh_db.execute("select rowid, id, name from places")
    )
    # Transform it
    fresh_db["places"].transform(column_order=("country", "name"))
    # Should be the same
    next_rows = list(
        tuple(row) for row in fresh_db.execute("select rowid, id, name from places")
    )
    assert previous_rows == next_rows

@simonw
Copy link
Owner Author

simonw commented Sep 8, 2023

Oh! Maybe the row ID preservation here is a coincidence because the tables are created from scratch and count 1, 2, 3.

If I delete a row from the table and then insert some more - breaking the rowid sequence - it might show the bug.

@simonw
Copy link
Owner Author

simonw commented Sep 8, 2023

Yes! That recreated the bug:

>       assert previous_rows == next_rows
E       AssertionError: assert equals failed
E         [                                                                [                                                               
E           (1, '1', 'Paris'),                                               (1, '1', 'Paris'),                                            
E           (3, '3', 'New York'),                                            (2, '3', 'New York'),                                         
E           (4, '4', 'London'),                                              (3, '4', 'London'),                                           
E         ]                                                       ...
E         

@simonw
Copy link
Owner Author

simonw commented Sep 10, 2023

In working on this I learned that rowid values in SQLite are way less stable than I had thought - in particular, they are often entirely rewritten on a VACUUM:

https://www.sqlite.org/lang_vacuum.html#how_vacuum_works

The VACUUM command may change the ROWIDs of entries in any tables that do not have an explicit INTEGER PRIMARY KEY.

So this fix wasn't as valuable as I thought. I need to move away from ever assuming that a rowid is a useful foreign key for anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant