Observation-graph loading should be transactional #120

Open
Robsteranium opened this issue Jun 10, 2022 · 0 comments
Labels
etl Related to the etl/pipelines

Comments

@Robsteranium
Contributor

#115 introduces a graph-index (for #17) which is dropped and re-inserted before the observation-pipeline runs. If the ETL process is interrupted while loading observations, it leaves the graph and observation indices in an inconsistent state. Re-running the ETL process then does nothing, because it finds the graph-index to be up to date, even though the observation index is out of date and could include partially loaded graphs.

To recover from this sort of interruption we need to manually delete the observation and graph indices before restarting, e.g.

curl -X DELETE http://localhost:9200/observation
curl -X DELETE http://localhost:9200/graph
sudo systemctl start etl

It'd be nice if loading were a bit more transactional, or at least didn't leave the indices in an inconsistent state after an interruption. For example, we could update the graph index one doc at a time, and only after all the observations for that graph have been loaded. That way, even if the observation-pipeline were interrupted mid-graph, it'd redo that graph on the next run (and retain any completed ones).
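
A rough sketch of that idea, not the project's actual pipeline: plain Python dicts stand in for the two Elasticsearch indices, and every name here (`load_observations`, `Interrupted`, the `modified` marker, `interrupt_at`) is hypothetical. It also assumes the graph index decides whether a graph is up to date by comparing some per-graph marker, which the issue doesn't spell out.

```python
class Interrupted(Exception):
    """Stands in for the ETL process being killed part-way through a load."""


def load_observations(source, graph_index, observation_index, interrupt_at=None):
    """Load each stale graph, recording it in graph_index only once complete.

    source:            graph URI -> {"modified": str, "observations": list}
    graph_index:       graph URI -> "modified" value of the last complete load
    observation_index: graph URI -> list of loaded observation docs
    interrupt_at:      graph URI at which to simulate an interruption
    Returns the URIs of the graphs that were (re)loaded.
    """
    loaded = []
    for uri, graph in source.items():
        if graph_index.get(uri) == graph["modified"]:
            continue  # fully loaded on an earlier run, so keep it
        # Discard anything left over from a partial load of this graph.
        observation_index[uri] = []
        for i, obs in enumerate(graph["observations"]):
            if uri == interrupt_at and i == 1:
                raise Interrupted(uri)  # simulate dying after one observation
            observation_index[uri].append(obs)
        # Only now is the graph marked as up to date.
        graph_index[uri] = graph["modified"]
        loaded.append(uri)
    return loaded


source = {
    "graph-a": {"modified": "2022-06-01", "observations": [1, 2, 3]},
    "graph-b": {"modified": "2022-06-02", "observations": [4, 5, 6]},
}
graph_index, observation_index = {}, {}

try:
    load_observations(source, graph_index, observation_index, interrupt_at="graph-b")
except Interrupted:
    pass

# graph-a is complete and recorded; graph-b is partial and unrecorded.
after_interrupt = dict(graph_index)
partial_b = list(observation_index["graph-b"])

# The next run skips graph-a and redoes graph-b from scratch.
reloaded = load_observations(source, graph_index, observation_index)
```

Because the graph index entry is written last, an interruption can only ever leave a graph looking stale, never looking complete when it isn't, so a plain re-run recovers without deleting either index by hand.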
