added_ms and updated_ms columns are not reliable in face of operations like INSERT OR REPLACE INTO #7
Comments
Questions about this:
I think there are actually two consequences here:
Both of these are serious limitations in the chronicle system. At the very least they need to be documented, but they have knock-on effects for ways I want to use this system as well. Open questions:
Some use-cases are mostly unaffected: if you're using chronicle to sync copies of tables it's not harmful if you occasionally pull down duplicate data that hasn't actually changed. It's the audit log style functionality that's most affected by this. |
Ideally the system would reliably avoid incrementing version and resetting
I imagine there are other edge cases like this too. |
The |
One option for
CREATE TRIGGER after_update_multi_column_data
AFTER UPDATE ON multi_column_data
WHEN OLD.col1 IS NOT NEW.col1 OR
OLD.col2 IS NOT NEW.col2 OR
OLD.col3 IS NOT NEW.col3 OR
OLD.col4 IS NOT NEW.col4 OR
OLD.col5 IS NOT NEW.col5 OR
OLD.col6 IS NOT NEW.col6
BEGIN
INSERT INTO update_log (last_update) VALUES (datetime('now'));
END;
The problem with this is that it will break any time a new column is added, unless the trigger itself is updated. So maybe |
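A rough sketch of one way to live with that limitation: build the WHEN clause from the table's current columns and re-run the builder after every schema change. The helper name `install_update_trigger` is made up for this illustration, and `update_log` follows the example above. This is not how chronicle does it.

```python
import sqlite3


def install_update_trigger(conn, table, log_table="update_log"):
    """(Re)create an AFTER UPDATE trigger that compares every current column."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    # IS NOT treats NULL as an ordinary value, unlike != or <>.
    changed = " OR ".join(f'OLD."{c}" IS NOT NEW."{c}"' for c in columns)
    trigger = f"after_update_{table}"
    conn.execute(f'DROP TRIGGER IF EXISTS "{trigger}"')
    conn.execute(
        f"""
        CREATE TRIGGER "{trigger}"
        AFTER UPDATE ON "{table}"
        WHEN {changed}
        BEGIN
            INSERT INTO "{log_table}" (last_update) VALUES (datetime('now'));
        END;
        """
    )
    return columns


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE multi_column_data (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)"
)
conn.execute("CREATE TABLE update_log (last_update TEXT)")
conn.execute("INSERT INTO multi_column_data VALUES (1, 'a', NULL)")

install_update_trigger(conn, "multi_column_data")
conn.execute("UPDATE multi_column_data SET col1 = 'a' WHERE id = 1")  # no real change
conn.execute("UPDATE multi_column_data SET col2 = 'b' WHERE id = 1")  # real change

# After a schema change the trigger has to be regenerated to see the new column.
conn.execute("ALTER TABLE multi_column_data ADD COLUMN col3 TEXT")
columns_after = install_update_trigger(conn, "multi_column_data")
conn.execute("UPDATE multi_column_data SET col3 = 'c' WHERE id = 1")

log_count = conn.execute("SELECT count(*) FROM update_log").fetchone()[0]
print(columns_after, log_count)
```

Only the two updates that change a value are logged. The cost is that something has to remember to call the builder again whenever a column is added.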
This is now the case: |
Could the |
Cases I want to cover:
That last one may turn out to be impossible, because I don't think the |
Put together a quick demo script to show exactly what is visible to the various SQLite triggers at different points: https://gist.github.com/simonw/7f7bf70f4732f5952ab39059d8c069e7 Output (plus extra comments):
Where each indented line represents the captured log message from the trigger, showing the OLD and NEW rows rendered as JSON using Note how the duplicated |
How about if in the It could work by putting a note in some kind of table along with a timestamp, then having the The fundamental problem here is that |
I've been learning more about SQLite triggers recently here: https://til.simonwillison.net/sqlite/json-audit-log |
Got this: https://chat.openai.com/share/e6697bdc-e7a4-4b44-9232-d1f4ad1f6361
import sqlite3
# Recreate the database connection and cursor
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
# Recreate the tables and the updated trigger
cursor.execute('''
CREATE TABLE IF NOT EXISTS log_messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
message TEXT
);
''')
cursor.execute('''
CREATE TABLE your_table (
id INTEGER PRIMARY KEY,
col1 TEXT,
col2 TEXT
);
''')
cursor.execute('''
CREATE TRIGGER before_insert_your_table
BEFORE INSERT ON your_table
FOR EACH ROW
BEGIN
INSERT INTO log_messages (message)
SELECT
CASE
WHEN EXISTS (SELECT 1 FROM your_table WHERE id = NEW.id) THEN
json_object(
'id', NEW.id,
'differences', json_object(
'col1', CASE WHEN NEW.col1 != your_table.col1 THEN NEW.col1 ELSE NULL END,
'col2', CASE WHEN NEW.col2 != your_table.col2 THEN NEW.col2 ELSE NULL END
)
)
ELSE
'record created'
END
FROM (SELECT 1) -- Dummy table for CASE WHEN structure
LEFT JOIN your_table ON your_table.id = NEW.id;
END;
''')
# Insert initial data into your_table
cursor.execute("INSERT INTO your_table (id, col1, col2) VALUES (1, 'A', 'B');")
# Try to insert a new record with the same primary key but different values in other columns
cursor.execute("INSERT OR REPLACE INTO your_table (id, col1, col2) VALUES (1, 'C', 'D');")
# Insert a new record with no existing primary key
cursor.execute("INSERT INTO your_table (id, col1, col2) VALUES (2, 'E', 'F');")
# Query the log_messages table to see the logged differences
cursor.execute("SELECT * FROM log_messages;")
log_messages = cursor.fetchall()
# Close the database connection
conn.close()
# Print the log messages
print(log_messages)
Which prints:
|
Got it to determine if a row has been inserted, updated or left unmodified:
import sqlite3
# Recreate the database connection and cursor
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
# Recreate the tables and the updated trigger
cursor.execute('''
CREATE TABLE IF NOT EXISTS log_messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
message TEXT
);
''')
cursor.execute('''
CREATE TABLE your_table (
id INTEGER PRIMARY KEY,
col1 TEXT,
col2 TEXT
);
''')
cursor.execute('''
CREATE TRIGGER before_insert_your_table
BEFORE INSERT ON your_table
FOR EACH ROW
BEGIN
INSERT INTO log_messages (message)
SELECT
CASE
WHEN EXISTS (SELECT 1 FROM your_table WHERE id = NEW.id) THEN
CASE
WHEN NEW.col1 = your_table.col1 AND NEW.col2 = your_table.col2 THEN
'Record unchanged'
ELSE
json_object(
'id', NEW.id,
'message', 'Record updated',
'differences', json_object(
'col1', CASE WHEN NEW.col1 != your_table.col1 THEN NEW.col1 ELSE NULL END,
'col2', CASE WHEN NEW.col2 != your_table.col2 THEN NEW.col2 ELSE NULL END
)
)
END
ELSE
json_object(
'id', NEW.id,
'message', 'Record created',
'columns', json_object(
'col1', NEW.col1,
'col2', NEW.col2
)
)
END
FROM (SELECT 1) -- Dummy table for CASE WHEN structure
LEFT JOIN your_table ON your_table.id = NEW.id;
END;
''')
# Insert initial data into your_table
cursor.execute("INSERT INTO your_table (id, col1, col2) VALUES (1, 'A', 'B');")
# Try to insert a new record with the same primary key but different values in other columns
cursor.execute("INSERT OR REPLACE INTO your_table (id, col1, col2) VALUES (1, 'C', 'D');")
# Insert a new record with no existing primary key
cursor.execute("INSERT INTO your_table (id, col1, col2) VALUES (2, 'E', 'F');")
# Insert a record with the same primary key and same values in other columns
cursor.execute("INSERT OR REPLACE INTO your_table (id, col1, col2) VALUES (1, 'C', 'D');")
# Query the log_messages table to see the logged differences
cursor.execute("SELECT * FROM log_messages;")
log_messages = cursor.fetchall()
# Close the database connection
conn.close()
# Print the log messages
print(log_messages)
https://chat.openai.com/share/e6697bdc-e7a4-4b44-9232-d1f4ad1f6361 |
That's inserting rows into Interesting that this ends up being logic in the |
The |
There may be another option here. I ran this all through Claude 3 Opus and it spat out code that doesn't actually work yet but that suggests a potential alternative route:
CREATE TRIGGER "_chronicle_{table_name}_ai"
AFTER INSERT ON "{table_name}"
FOR EACH ROW
WHEN NOT EXISTS (
SELECT 1 FROM "_chronicle_{table_name}"
WHERE {' AND '.join([f'"{col[0]}" = NEW."{col[0]}"' for col in primary_key_columns])}
)
BEGIN
INSERT INTO "_chronicle_{table_name}" ({', '.join([f'"{col[0]}"' for col in primary_key_columns])}, added_ms, updated_ms, version)
VALUES ({', '.join(['NEW.' + f'"{col[0]}"' for col in primary_key_columns])}, {current_time_expr}, {current_time_expr}, {next_version_expr});
END;
And:
c.execute(
f"""
CREATE TRIGGER "_chronicle_{table_name}_au"
AFTER UPDATE ON "{table_name}"
FOR EACH ROW
WHEN EXISTS (
SELECT 1 FROM "_chronicle_{table_name}"
WHERE {' AND '.join([f'"{col[0]}" = OLD."{col[0]}"' for col in primary_key_columns])}
AND (
{' OR '.join([f'OLD."{col[0]}" IS NOT NEW."{col[0]}"' for col in primary_key_columns if col[0] not in [pk[0] for pk in primary_key_columns]])}
)
)
BEGIN
UPDATE "_chronicle_{table_name}"
SET updated_ms = {current_time_expr},
version = {next_version_expr},
{', '.join([f'"{col[0]}" = NEW."{col[0]}"' for col in primary_key_columns])}
WHERE { ' AND '.join([f'"{col[0]}" = OLD."{col[0]}"' for col in primary_key_columns]) };
END;
"""
)
Full transcript: https://gist.github.com/simonw/d800c38df975c7d768b425532c48f1fe
Like I said, this code doesn't actually work - but the idea of using |
No doing this in
So my original idea involving some kind of note left in a table by |
The thing that matters most here is detecting if the record was either inserted or updated (modified) in some way. So maybe the trick is to serialize the row as JSON in the before trigger and then do that again in the after trigger and run a dumb string comparison? |
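A minimal sketch of that idea, under some loud assumptions: the `docs`, `_pending` and `change_log` tables are invented for the example, the row is serialized with a plain `json_array()` (which handles NULLs but raises an error on BLOB values), and the primary key is always supplied explicitly. The before trigger leaves a note holding the JSON of any existing row, and the after trigger compares it with the JSON of the new row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
-- Hypothetical scratch table: holds one note per in-flight insert.
CREATE TABLE _pending (id INTEGER PRIMARY KEY, before_json TEXT);
CREATE TABLE change_log (id INTEGER, outcome TEXT);

CREATE TRIGGER docs_bi BEFORE INSERT ON docs
BEGIN
    -- The row being replaced (if any) is still in the table at this point.
    INSERT INTO _pending (id, before_json)
    VALUES (
        NEW.id,
        (SELECT json_array(title, body) FROM docs WHERE id = NEW.id)
    );
END;

CREATE TRIGGER docs_ai AFTER INSERT ON docs
BEGIN
    -- Dumb string comparison of the before and after serializations.
    INSERT INTO change_log (id, outcome)
    SELECT NEW.id,
        CASE
            WHEN _pending.before_json IS NULL THEN 'inserted'
            WHEN _pending.before_json = json_array(NEW.title, NEW.body) THEN 'unchanged'
            ELSE 'updated'
        END
    FROM _pending WHERE _pending.id = NEW.id;
    DELETE FROM _pending WHERE id = NEW.id;
END;
""")

conn.execute("INSERT OR REPLACE INTO docs (id, title, body) VALUES (1, 'a', 'b')")
conn.execute("INSERT OR REPLACE INTO docs (id, title, body) VALUES (1, 'a', 'b')")
conn.execute("INSERT OR REPLACE INTO docs (id, title, body) VALUES (1, 'a', 'c')")
conn.execute("INSERT INTO docs (id, title, body) VALUES (2, 'x', 'y')")

outcomes = [
    row[0] for row in conn.execute("SELECT outcome FROM change_log ORDER BY rowid")
]
print(outcomes)
```

In this sketch the note is deleted as soon as the after trigger has read it, so `_pending` stays empty between statements. A real version would need the longer serialization pattern to cope with BLOB columns.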
I figured out a robust (if long-winded) pattern for JSON serializing a row - including nulls and BLOB columns - in this TIL: https://til.simonwillison.net/sqlite/json-audit-log |
I'm going to try a |
An edge case to worry about: what happens if you update just one of the values that is part of a compound primary key? I think logically that should be seen as a deletion of the previous primary key and the insertion of a brand new one - so for the purposes of chronicle, it should be seen as the new row being marked as being both inserted and updated, and the previous row should get an updated set to now and a
I'd be OK with saying in the documentation that this edge case is not supported, though. But I'm not sure I can enforce that within Datasette, since that supports arbitrary
It might be possible to have a before update trigger which raises an error if you attempt to update any of the primary key column values. It is possible to trigger an |
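A small sketch of what such a guard could look like, using SQLite's `RAISE(ABORT, ...)` inside a before update trigger. The `items` table and the trigger name are invented for the example, and this is only an illustration of the mechanism, not something chronicle does today.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (a INTEGER, b INTEGER, value TEXT, PRIMARY KEY (a, b));

-- Hypothetical guard: refuse any update that changes a primary key column.
CREATE TRIGGER items_block_pk_update
BEFORE UPDATE ON items
WHEN OLD.a IS NOT NEW.a OR OLD.b IS NOT NEW.b
BEGIN
    SELECT RAISE(ABORT, 'primary key columns cannot be updated');
END;

INSERT INTO items VALUES (1, 1, 'first');
""")

# Updating a non-key column is still allowed.
conn.execute("UPDATE items SET value = 'edited' WHERE a = 1 AND b = 1")

# Updating part of the compound primary key is rejected by the trigger.
try:
    conn.execute("UPDATE items SET b = 2 WHERE a = 1 AND b = 1")
    blocked = False
except sqlite3.DatabaseError as exc:
    blocked = True
    print("rejected:", exc)

rows = conn.execute("SELECT a, b, value FROM items").fetchall()
print(rows)
```

The rejected statement is rolled back on its own, so the earlier edit to `value` survives and the key stays at `(1, 1)`.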
Original title: Using insert-or-replace sets added_ms and updated_ms to the same value
I just noticed that on this demo:
That's populated by this script https://github.com/simonw/federal-register-to-datasette/blob/fb848b0e05ff79ca60a9d9d8adb0c9a36a938751/fetch_documents.py which makes this API call:
Which results in a call to the sqlite-utils method .insert_all(..., replace=True)
Which does this: https://github.com/simonw/sqlite-utils/blob/88bd37220593f46ad2221601d6724dd0198400ad/sqlite_utils/db.py#L2983-L2988
INSERT OR REPLACE INTO documents ...
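To illustrate why this matters for trigger-based tracking, here is a small standalone sketch. The `fired` table and the trigger names are invented for the demo and these are not chronicle's actual triggers. It shows that `INSERT OR REPLACE` on an existing primary key fires the insert trigger again and never the update trigger, even when the row's values are identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
-- Hypothetical log of which trigger fired, in order.
CREATE TABLE fired (trigger_name TEXT);

CREATE TRIGGER documents_ai AFTER INSERT ON documents
BEGIN
    INSERT INTO fired VALUES ('after insert');
END;

CREATE TRIGGER documents_au AFTER UPDATE ON documents
BEGIN
    INSERT INTO fired VALUES ('after update');
END;
""")

# The same row written twice, as a repeated fetch script would do.
conn.execute("INSERT OR REPLACE INTO documents (id, title) VALUES (1, 'Notice')")
conn.execute("INSERT OR REPLACE INTO documents (id, title) VALUES (1, 'Notice')")

fired = [
    row[0] for row in conn.execute("SELECT trigger_name FROM fired ORDER BY rowid")
]
print(fired)
```

Any bookkeeping done in an after insert trigger therefore treats the second, unchanged write as a brand new row.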