This repository has been archived by the owner on Jul 13, 2022. It is now read-only.

3-part solution to OpenData postgres database duplicates #2

Open: wants to merge 10 commits into master
Conversation

@komasing commented Jun 6, 2018

Reason: the Postgres database might accumulate duplicates when the anonymization process runs exceedingly long or MongoDB fails mid-run.

Solution:

  1. last_mongodb_timestamp is updated after each successful batch, reducing the number of records that must be recalculated after a MongoDB failure;
  2. a data lock is introduced for the Postgres database: a new anonymization process cannot start before the previous one has finished;
  3. the corresponding MongoDB ID is stored in the Postgres database in a mongoid column, which makes duplicates easier to identify. Note, however, that two Postgres rows may legitimately share the same MongoID at any given time, because the producer and client sides are stored as separate rows in Postgres but as a single document in MongoDB.
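The three safeguards above can be sketched as a single batch loop. This is an illustrative mock, not the anonymizer's actual code: the `anonymize` function, the `store` dictionary, and all field names are hypothetical stand-ins for the real Postgres/MongoDB calls.

```python
def anonymize(batches, store):
    """Hypothetical sketch of the PR's three safeguards.

    `batches` is an iterable of lists of MongoDB-like records;
    `store` is a dict standing in for the Postgres database.
    """
    # 2. Data lock: refuse to start while a previous run is active.
    if store["lock"]:
        raise RuntimeError("previous anonymization process still running")
    store["lock"] = True
    try:
        for batch in batches:
            for record in batch:
                # 3. Keep the MongoDB _id alongside each Postgres row
                # (a mongoid column) so duplicates can be traced back;
                # up to two rows per _id are legitimate (producer + client).
                store["rows"].append({"mongoid": record["_id"],
                                      "data": record["data"]})
            # 1. Checkpoint after every successful batch, so a MongoDB
            # failure only forces recalculation of the unfinished batch.
            store["last_mongodb_timestamp"] = batch[-1]["timestamp"]
    finally:
        # Release the lock whether the run succeeded or failed.
        store["lock"] = False
```

In the real implementation the lock and the timestamp would live in Postgres itself (e.g. a status table or an advisory lock), so that the check survives process restarts.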

…every successful batch, and mongo IDs stored in opendata postgres database for maintenance purposes
Restore variables DAYS and LOGS_TIME_BUFFER