-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add document TTL for SQLiteYStore #50
Changes from all commits
ad1d270
406023f
ee3ba41
5f1268d
645b46c
d3b2cf2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -127,3 +127,7 @@ dmypy.json | |
|
||
# Pyre type checker | ||
.pyre/ | ||
|
||
# test JS dependencies | ||
tests/node_modules | ||
tests/package-lock.json |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
import asyncio | ||
import tempfile | ||
import time | ||
from abc import ABC, abstractmethod | ||
from datetime import datetime | ||
from pathlib import Path | ||
from typing import AsyncIterator, Callable, Optional, Tuple | ||
|
||
|
@@ -126,6 +126,10 @@ class MySQLiteYStore(SQLiteYStore): | |
""" | ||
|
||
db_path: str = "ystore.db" | ||
# Determines the "time to live" for all documents, i.e. how recent the | ||
# latest update of a document must be before purging document history. | ||
# Defaults to 1 day. | ||
document_ttl: int = 24 * 60 * 60 | ||
path: str | ||
db_created: asyncio.Event | ||
|
||
|
@@ -138,7 +142,10 @@ def __init__(self, path: str, metadata_callback: Optional[Callable] = None): | |
async def create_db(self): | ||
async with aiosqlite.connect(self.db_path) as db: | ||
await db.execute( | ||
"CREATE TABLE IF NOT EXISTS yupdates (path TEXT, yupdate BLOB, metadata BLOB, timestamp TEXT)" | ||
"CREATE TABLE IF NOT EXISTS yupdates (path TEXT NOT NULL, yupdate BLOB, metadata BLOB, timestamp REAL NOT NULL)" | ||
) | ||
await db.execute( | ||
"CREATE INDEX IF NOT EXISTS idx_yupdates_path_timestamp ON yupdates (path, timestamp)" | ||
) | ||
await db.commit() | ||
self.db_created.set() | ||
|
@@ -163,8 +170,21 @@ async def write(self, data: bytes) -> None: | |
await self.db_created.wait() | ||
metadata = await self.get_metadata() | ||
async with aiosqlite.connect(self.db_path) as db: | ||
# first, determine time elapsed since last update | ||
cursor = await db.execute( | ||
"SELECT timestamp FROM yupdates WHERE path = ? ORDER BY timestamp DESC LIMIT 1", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand why we need to order by timestamp, the updates are supposed to be already ordered, right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not necessarily, and even if it is, I think it's better to be explicit about the order this query requires, to avoid breaking this query if the table schema were to change in the future. AFAIK, tables without a primary key are ordered simply by insertion order. Thus, the oldest update would be returned without the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But this also probably comes with an extra cost. In our case, insertion order is already ordered by time. Isn't it possible to get the last row in the query? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
You're right, but this is addressed by the composite index that you commented on earlier. I'm not expert in SQLite performance characteristics, but there are some justifications for this:
No, ascending order is assumed if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do keep in mind that yes, I don't have benchmarks, so these rationalizations could be completely false. However, I think there is plenty of good justification for this implementation, and we shouldn't let performance ambiguity steer us away. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the details David 👍 |
||
(self.path,), | ||
) | ||
row = await cursor.fetchone() | ||
diff = (time.time() - row[0]) if row else 0 | ||
|
||
# if diff > document_ttl, delete document history | ||
if diff > self.document_ttl: | ||
await db.execute("DELETE FROM yupdates WHERE path = ?", (self.path,)) | ||
|
||
# finally, write this update to the DB | ||
await db.execute( | ||
"INSERT INTO yupdates VALUES (?, ?, ?, ?)", | ||
(self.path, data, metadata, datetime.utcnow()), | ||
(self.path, data, metadata, time.time()), | ||
dlqqq marked this conversation as resolved.
Show resolved
Hide resolved
|
||
) | ||
await db.commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you describe what this does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This creates a composite index that makes the
SELECT ... WHERE path = ? ORDER BY timestamp
query more efficient. The mental model is that this constructs a B-tree where records are first sorted by path, and then ties are resolved by the timestamp, which is the best data structure for this query.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.sqlite.org/queryplanner.html#_multi_column_indices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍