optimize the long running clearmash download pipelines #86

Open
OriHoch opened this issue Aug 8, 2017 · 2 comments

OriHoch commented Aug 8, 2017

The download and download_related pipelines might take a long time to run (especially on the first run).

This is problematic because:

  • you have to wait a long time before you see any results
  • if it fails, everything fails
  • it overloads the DB
  • if more urgent items come in while it's running, they have to wait for it to complete

The solution:

  • use the time delay processor (see Update delay_limit #85) in combination with the schedule, to limit how long each processor runs
  • change the dump.to_sql processor to a version that updates and commits every few rows (as opposed to committing everything at the end)
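
The two points above could be sketched roughly as follows. This is only an illustration using the standard-library sqlite3 module, not the actual dump.to_sql processor or the processor from #85; the `entities` table, its columns, and the function names `limit_run_time` and `dump_in_batches` are all hypothetical.

```python
import sqlite3
import time
from itertools import islice


def limit_run_time(rows, max_seconds, clock=time.monotonic):
    """Yield rows until max_seconds have elapsed, then stop quietly.

    Hypothetical stand-in for the run-time limit idea; the remaining
    rows would be picked up by the next scheduled run.
    """
    deadline = clock() + max_seconds
    for row in rows:
        if clock() >= deadline:
            break
        yield row


def dump_in_batches(conn, rows, batch_size=100):
    """Upsert (id, doc) rows, committing after every batch_size rows."""
    # Hypothetical table layout, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entities (id INTEGER PRIMARY KEY, doc TEXT)"
    )
    rows = iter(rows)
    total = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany(
            "INSERT OR REPLACE INTO entities (id, doc) VALUES (?, ?)", batch
        )
        conn.commit()  # batches committed so far survive a later failure
        total += len(batch)
    return total
```

With something like this, a failure partway through only loses the batch in progress, and wrapping the row source in `limit_run_time(...)` keeps a single run short so more urgent work is not blocked for long.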

OriHoch commented Aug 14, 2017

  • clearmash/download-entities - 5 minutes
  • clearmash/download-related-entities - 1.5 hours
  • clearmash/entities-delete - 5 minutes
  • clearmash/entities-sync - <1 minute
  • clearmash/entity-ids - 5 minutes

OriHoch removed their assignment Aug 14, 2017

OriHoch commented Aug 14, 2017

According to the last run times above, this is not critical.
