Allow reindexing a Storage #24

Open
albe opened this issue Aug 18, 2017 · 3 comments
Labels
enhancement · P: Index (affects the indexing layer) · P: Storage (affects the storage layer)

Comments

@albe
Owner

albe commented Aug 18, 2017

This is a very expensive operation, as it involves a full database scan, but it can help when indexes get broken and would technically allow for rewriting storage partitions.

Edit: This is not easily possible without completely losing the original global document order. The global version number either needs to be stored in the document record itself (which means writing the index before the document), or the reindex has to assume that the documents haven't changed in number or order, walking the existing index and keeping its order.

A midway alternative would be to guarantee only a chronological order and reindex on commit timestamp (which could still be unrelated to the actual document creation/event occurrence order).

@albe
Owner Author

albe commented Sep 26, 2019

Alternatives:

  • no global order, only per partition
  • store global sequence inside document (still means writers can never be distributed per partition)
  • partial global order by timestamp/vector clock/TrueTime (Google Spanner)
  • global order by [monotonic timestamp, partitionId] (with distributed writers, only consistent if their clocks are synchronized)
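To make the last alternative concrete, here is a minimal sketch of reading several partitions in [monotonic timestamp, partitionId] order. It assumes each partition is already sorted by timestamp (append-only, stamped by a monotonic clock); the `Record` fields and `merge_partitions` are hypothetical names for illustration, not part of the storage API.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    timestamp: int      # monotonic commit timestamp (hypothetical field)
    partition_id: int   # numeric partition identifier (hypothetical field)
    payload: str


def merge_partitions(partitions: Iterable[Iterable[Record]]) -> Iterator[Record]:
    """Lazily merge per-partition streams into one global order.

    Assumes every partition is already sorted by timestamp. Equal
    timestamps are ordered by partition id, which makes the order
    deterministic but not necessarily causal.
    """
    return heapq.merge(*partitions, key=lambda r: (r.timestamp, r.partition_id))


partition_a = [Record(1, 0, "a1"), Record(4, 0, "a2")]
partition_b = [Record(1, 1, "b1"), Record(3, 1, "b2")]
global_order = [r.payload for r in merge_partitions([partition_a, partition_b])]
print(global_order)  # ['a1', 'b1', 'b2', 'a2']
```

The partition id only breaks ties, so two writers with drifting clocks would still produce an order that disagrees with real time.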

@albe
Owner Author

albe commented Sep 29, 2019

Depending on the goals, the following solutions are viable:

  • strong consistency for all reads over multiple partitions:
    • single writer for all partitions / global sequence number
    • TrueTime (write latency grows with clock uncertainty, so this becomes a performance issue unless clock drift between synchronized clocks is guaranteed to stay in the order of a few ms)
  • guaranteed consistency only for reads per partition:
    • no global order
    • partial global order (cross-partition reads are consistent most of the time)

Partial global order gives the best results relative to implementation cost, but no strong guarantees. The partial global order could be "upgraded" to TrueTime though, by storing two timestamps. As long as a single writer is still used (no replication in place), this would still provide strong consistency, given a monotonic clock in the system and a write timestamp assigned by the writer.
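For the single-writer case, a rough sketch of how the writer could hand out write timestamps that never go backwards, even if the wall clock does. `MonotonicStamper` is a hypothetical helper, and the sketch covers only the single-writer ordering, not the two-timestamp TrueTime upgrade.

```python
import time
from typing import Callable


class MonotonicStamper:
    """Hands out strictly increasing write timestamps for one writer."""

    def __init__(self, now: Callable[[], int] = time.time_ns) -> None:
        self._now = now
        self._last = 0

    def next(self) -> int:
        # If the wall clock stalls or jumps backwards, step past the last
        # issued value so the write order never goes backwards.
        self._last = max(self._now(), self._last + 1)
        return self._last


# Simulated clock that jumps backwards once (105 -> 103).
readings = iter([100, 105, 103, 110])
stamper = MonotonicStamper(now=lambda: next(readings))
stamps = [stamper.next() for _ in range(4)]
print(stamps)  # [100, 105, 106, 110]
```

Because every stamp is unique and increasing, sorting by it reproduces the write order exactly, which is what makes cross-partition reads consistent while there is only one writer.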

@albe added the enhancement, P: Index, and P: Storage labels on Oct 5, 2019
@albe
Owner Author

albe commented May 31, 2020

With #80 a global order can be established via the monotonic clock stamp and/or the sequence number in the document header.
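As an illustration only, a reindex based on the sequence number could look roughly like the sketch below. The `Header` shape, the `rebuild_index` function, and the assumption that sequence numbers start at 1 without gaps are all hypothetical and not taken from #80.

```python
from typing import Dict, List, NamedTuple, Tuple


class Header(NamedTuple):
    sequence_number: int  # global sequence number stored with the document
    position: int         # byte offset inside the partition (hypothetical)


def rebuild_index(partitions: Dict[str, List[Header]]) -> List[Tuple[str, int]]:
    """Full scan of all partitions, returning (partition, position) entries
    ordered by global sequence number."""
    entries = [
        (header.sequence_number, name, header.position)
        for name, headers in partitions.items()
        for header in headers
    ]
    entries.sort()
    # A gap or duplicate means a document is missing or was written twice.
    # Assumes sequence numbers start at 1.
    for expected, (sequence_number, _, _) in enumerate(entries, start=1):
        if sequence_number != expected:
            raise ValueError(f"expected sequence {expected}, found {sequence_number}")
    return [(name, position) for _, name, position in entries]


scanned = {
    "orders": [Header(1, 0), Header(3, 64)],
    "users": [Header(2, 0), Header(4, 48)],
}
index = rebuild_index(scanned)
print(index)  # [('orders', 0), ('users', 0), ('orders', 64), ('users', 48)]
```

This is still the expensive full scan described in the first comment, but the original global order is recoverable from the documents alone.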
