fix(waku-store): added an index to improve messages query time #1120

Merged: 1 commit, Sep 5, 2022

@LNSD LNSD commented Sep 5, 2022

This PR contains surgical changes to unblock Status's waku store testing. Some deficiencies have been identified during the analysis and will be addressed shortly.

The existing SQLite index on the messages table, i_rt, covers only receiverTimestamp, so it did not help the store queries.

  • Added a DB migration and increased the database schema version:
    • Drop the previous i_rt index
    • Create a new index, i_msg, that contains the fields relevant to the store query
  • Update the messages index creation query
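
The migration steps listed above can be sketched with Python's sqlite3 module. This is only an illustration, not the PR's Nim migration: the table name `message`, the helper `migrate_message_index`, and the use of `PRAGMA user_version` as the schema version are all assumptions made for the example.

```python
import sqlite3


def migrate_message_index(conn: sqlite3.Connection, table: str = "message") -> int:
    """Hypothetical helper: replace the i_rt index with i_msg and bump the schema version."""
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    # Drop the index that only covers receiverTimestamp.
    conn.execute("DROP INDEX IF EXISTS i_rt")
    # Create the index on the columns named in the PR diff.
    conn.execute(
        f"CREATE INDEX IF NOT EXISTS i_msg ON {table} "
        "(contentTopic, pubsubTopic, senderTimestamp, id)"
    )
    # Assumption: the schema version is tracked in SQLite's user_version header field.
    conn.execute(f"PRAGMA user_version = {version + 1}")
    conn.commit()
    return version + 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message (id BLOB, receiverTimestamp INTEGER NOT NULL, "
    "contentTopic BLOB, pubsubTopic BLOB, senderTimestamp INTEGER)"
)
conn.execute("CREATE INDEX i_rt ON message (receiverTimestamp)")

new_version = migrate_message_index(conn)
index_names = {
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
}
```

Both statements use IF EXISTS / IF NOT EXISTS, so running the helper twice leaves the indexes unchanged and only increments the version again.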

@LNSD LNSD self-assigned this Sep 5, 2022
@@ -76,7 +76,7 @@ proc createTable*(db: SqliteDatabase): DatabaseResult[void] {.inline.} =
 ## Create index

 template createIndexQuery(table: string): SqlQueryStr =
-  "CREATE INDEX IF NOT EXISTS i_rt ON " & table & " (receiverTimestamp);"
+  "CREATE INDEX IF NOT EXISTS i_msg ON " & table & " (contentTopic, pubsubTopic, senderTimestamp, id);"
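
One way to see the effect of this index change is to compare SQLite's query plans under each index. The sketch below is an assumption-laden illustration: the query shape (equality filters on contentTopic and pubsubTopic, ordered by senderTimestamp and id), the table name `message`, and the topic strings are hypothetical stand-ins, since the actual store query is not shown in this PR.

```python
import sqlite3

# Assumed shape of a store query; the real query lives in the nwaku source.
QUERY = (
    "SELECT id FROM message "
    "WHERE contentTopic = ? AND pubsubTopic = ? "
    "ORDER BY senderTimestamp, id LIMIT 20"
)


def plan_for(index_sql: str) -> str:
    """Return the EXPLAIN QUERY PLAN detail text for QUERY under the given index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message (id BLOB, receiverTimestamp INTEGER, "
        "contentTopic BLOB, pubsubTopic BLOB, senderTimestamp INTEGER)"
    )
    conn.execute(index_sql)
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + QUERY, ("example-content-topic", "example-pubsub-topic")
    ).fetchall()
    conn.close()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


old_plan = plan_for("CREATE INDEX i_rt ON message (receiverTimestamp)")
new_plan = plan_for(
    "CREATE INDEX i_msg ON message (contentTopic, pubsubTopic, senderTimestamp, id)"
)
print("with i_rt :", old_plan)
print("with i_msg:", new_plan)
```

Under these assumptions, the plan with i_rt cannot use the index because the query never references receiverTimestamp, while the plan with i_msg searches the index on the two topic columns. The exact plan wording varies between SQLite versions.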

Note: nice 👍.

@felicio felicio Sep 5, 2022

Non-blocking:

The queried node MUST sort the WakuMessages based on their Index, where the senderTime constitutes the most significant part and the digest comes next, and then perform pagination on the sorted result. As such, the retrieved page contains an ordered list of WakuMessages from the oldest message to the most recent one.

– https://rfc.vac.dev/spec/13/

ReceiverTimestamp is guaranteed to be set, while senders could omit setting senderTimestamp.

– https://github.com/status-im/nwaku/blob/2cb7123df84b18ab7e4df0be0d77353cca4820f2/waku/v2/node/storage/message/waku_message_store.nim

  1. Since the field is optional, where would messages without it end up in the resulting list? At the end?
  2. If the value is null, isn't it worth falling back to receiverTimestamp when sorting, and thus keeping it in the index?
  3. Or shouldn't senderTimestamp be required?
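
The first two questions can be explored directly in SQLite. The sketch below assumes a missing senderTimestamp is stored as NULL, which this thread does not confirm; the table and row values are made up for illustration. In SQLite, NULL sorts before any other value in ascending order, and COALESCE gives the fallback suggested in question 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message (id TEXT, receiverTimestamp INTEGER NOT NULL, "
    "senderTimestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO message VALUES (?, ?, ?)",
    [
        ("a", 30, 10),
        ("b", 20, None),  # assumption: sender omitted the timestamp, stored as NULL
        ("c", 40, 25),
    ],
)

# Sorting on senderTimestamp alone: the NULL row comes first in ascending order.
plain_order = [
    row[0] for row in conn.execute("SELECT id FROM message ORDER BY senderTimestamp, id")
]

# Falling back to receiverTimestamp when senderTimestamp is NULL.
fallback_order = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM message ORDER BY COALESCE(senderTimestamp, receiverTimestamp), id"
    )
]

print(plain_order)     # ['b', 'a', 'c']
print(fallback_order)  # ['a', 'b', 'c']
```

Note that an ORDER BY on a COALESCE expression would not be served by an index on the plain senderTimestamp column, so the fallback would need either an expression index or a single precomputed timestamp column.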

@LNSD (Author) replied:

  1. Since the field is optional, where would messages like that end up in the resulting list? At the end?

I am not sure about that. I need to review the current implementation.

  2. In case of a null value, isn't it worth falling back to receiverTimestamp, thus keeping it in the index?

What I am thinking is to simplify the current implementation by keeping a single column with a timestamp.

The new column will be named storeTimestamp or storedAt. This timestamp will be set to senderTimestamp if present, and otherwise to the epoch time at the moment of message insertion. The message will be dropped, and not inserted into the store, if it is "too old" (older than the current time minus ~10s (?)) or "from the future" (later than the current time plus ~10s (?)).

Optionally we can keep the senderTimestamp column just in case we want to use it to extract metrics of the message propagation. But the reference timestamp column will be the storeTimestamp column for query purposes (and store restore functionality).
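
The proposal above could look roughly like the following. This is a sketch of the idea only, not an implementation from the PR: the function name `store_timestamp`, the nanosecond unit, the use of `None` for an absent sender timestamp, and the exact 10-second window are all assumptions.

```python
from typing import Optional

NS_PER_SECOND = 1_000_000_000
# Assumption: the "~10s (?)" tolerance from the comment, expressed in nanoseconds.
MAX_SKEW_NS = 10 * NS_PER_SECOND


def store_timestamp(
    sender_ts: Optional[int], now_ns: int, max_skew_ns: int = MAX_SKEW_NS
) -> Optional[int]:
    """Return the timestamp to store, or None if the message should be dropped."""
    if sender_ts is None:
        # No sender timestamp: use the insertion time instead.
        return now_ns
    if sender_ts < now_ns - max_skew_ns:
        return None  # "too old"
    if sender_ts > now_ns + max_skew_ns:
        return None  # "from the future"
    return sender_ts


now = 1_662_336_000 * NS_PER_SECOND  # arbitrary example instant
print(store_timestamp(None, now) == now)               # True
print(store_timestamp(now - 2 * NS_PER_SECOND, now))   # kept: within the window
print(store_timestamp(now - 11 * NS_PER_SECOND, now))  # None
print(store_timestamp(now + 11 * NS_PER_SECOND, now))  # None
```

With a single always-populated column like this, queries can sort on one non-null value.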

  3. Or shouldn't senderTimestamp be required?

IIRC, the sender timestamp can be omitted for privacy-preserving reasons, so it makes no sense to make it mandatory.

@kaiserd @jm-clius WDYT?

@LNSD LNSD commented Sep 5, 2022

As the load test results indicate a substantial improvement in query times, I am merging these changes into the master branch. Further improvements to the waku store protocol (e.g., readability, metrics, implementation details) will come in future pull requests.

See: status-im/status-web#306
