Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DB: Uniqueness constraint in "waveforms_continuoustrace" needs to be extended #84

Closed
1 of 2 tasks
megies opened this issue Jul 26, 2018 · 7 comments
Closed
1 of 2 tasks

Comments

@megies
Copy link
Collaborator

megies commented Jul 26, 2018

In real life, it unfortunately sometimes happens that data is duplicated in local files, e.g. mostly identical miniseed packets (maybe with just a different sequence number) duplicated in the same file. I know this can happen e.g. when accidentally running two slarchive processes (to save data from seedlink buffer to disk) where only a single one should run.

Example:

In [1]: st = read('Z3.A104A..HHE.D.2018.042')

In [2]: st
Out[2]: 
2 Trace(s) in Stream:
Z3.A104A..HHE | 2018-02-11T00:00:00.400000Z - 2018-02-12T00:00:00.340000Z | 100.0 Hz, 8639995 samples
Z3.A104A..HHE | 2018-02-11T00:00:00.400000Z - 2018-02-12T00:00:00.340000Z | 100.0 Hz, 8639995 samples

Right now, jane can not index such files (if at least one contiguous data chunk in the file is fully duplicated), because during indexing a unique constraint on waveforms_contiguoustrace gets violated:

2018-07-26 08:56:54 UTC [10285-70] jane@jane ERROR:  duplicate key value violates unique constraint "waveforms_continuoustrace_file_id_1a3787d9_uniq"
2018-07-26 08:56:54 UTC [10285-71] jane@jane DETAIL:  Key (file_id, network, station, location, channel, timerange)=(15868162, Z3, A104A, , HHE, ["2018-04-10 00:00:00.45+00","2018-04-11 00:00:03.4+00")) already exists.

To add to initial confusion, the conflicting and supposedly "existing" database entry mentioned in the postgres error (here some file with id 15868162 in waveforms_file) can not even be found in the database, because the conflict happens during one transaction that is then never committed but instantly rolled back.

Two problematic outcomes of this are:

  • postgres error log blows up at enormous speed (if indexer is looping), we had it fill up the whole partition (which can lead to even nastier problems in consequence)
  • id field in waveforms_contiguoustrace gets blown up (incremented by entries that are immediately rolled back and that never make it into the table actually) which might lead to further problems when hitting the maximum possible number of the id field (if this problem is not detected soon enough)

Two implications:

  • must: unique constraint on waveforms_contiguoustrace needs to be changed, it has to include pos field (which I assume gives index of Trace when reading the file into a Stream in ObsPy). then jane should be able to index such problematic files
  • "would be nice": ideally, the indexer should remember what files could not be indexed and when starting another loop should skip those files if they did not change in the meantime (checking modification time)
@megies
Copy link
Collaborator Author

megies commented Jul 26, 2018

Main question I have right now is how to properly change the unique constraint on the live production jane instance.. @krischer @barsch I'm guessing via that migration mechanic I don't really know well enough right now?

@barsch
Copy link
Collaborator

barsch commented Jul 26, 2018

https://docs.djangoproject.com/en/1.11/topics/migrations/

basically change your model - after changes are done run python manage.py makemigrations [module] (module is optional) - after that apply your migrations via python manage.py migrate

@barsch
Copy link
Collaborator

barsch commented Jul 26, 2018

in order to set the unique constraint you also need to use unique_together in the Meta class, see

https://stackoverflow.com/questions/2201598/how-to-define-two-fields-unique-as-couple

@megies
Copy link
Collaborator Author

megies commented Jul 26, 2018

And django knows which migrations were already acted on because it stores that info in the DB in "django_migrations" table, I guess?

in order to set the unique constraint you also need to use unique_together in the Meta class, see

Mhm.. that's the spot in source code I saw will need changing

So.. django creates the migration automatically?

  • in jane source code in a PR, change model in jane/waveforms/models.py to include pos in unique_together
  • on production server..
    • stop jane
    • run python manage.py makemigrations
    • run python manage.py migrate
    • start jane
    • EDIT: then commit those migration files django creates and push them to github??

?

@barsch
Copy link
Collaborator

barsch commented Jul 26, 2018

And django knows which migrations were already acted on because it stores that info in the DB in "django_migrations" table, I guess?

correct

So.. django creates the migration automatically?

it tries hard - it detects all kind of changes - adding/changing the unique_together constraint shouldn't be an issue

So.. django creates the migration automatically?
in jane source code in a PR, change model in jane/waveforms/models.py to include pos in unique_together
on production server..
stop jane
run python manage.py makemigrations
run python manage.py migrate
start jane

careful!

  • development server:
    • stop jane
    • run python manage migrate (makes sure all current migrations are applied)
    • change model
    • run python manage.py makemigrations (check if no error, new migrations file should be created)
    • run python manage migrate (apply new migration)
    • check if database has changed
    • start jane
    • test everything
  • production server
    • stop jane
    • make a backup of the database!
    • run python manage migrate (makes sure all current migrations are applied)
    • git pull latest changes
    • run python manage migrate (apply new migration)
    • start jane

-> so essentiall generating migrations should be done on development server, on productive you only apply the migrations ....

@megies
Copy link
Collaborator Author

megies commented Jul 26, 2018

-> so essentiall generating migrations should be done on development server, on productive you only apply the migrations ....

alright.. fingers crossed 🙈

megies pushed a commit that referenced this issue Aug 22, 2018
@megies
Copy link
Collaborator Author

megies commented Aug 23, 2018

Closed by 956cf03 and fccfcaf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants