reduce number of parallel data retrieving tasks #33

Open · brad wants to merge 1 commit into master from less-parallelism
Conversation

@brad (Member) commented Apr 25, 2016

@orcasgit/orcas-developers Please review. This greatly reduces the number of tasks we have running simultaneously, making it far less likely that conflicting tasks will result in bad refresh tokens.

@brad mentioned this pull request Apr 25, 2016
@coveralls commented

Coverage Status

Coverage increased (+0.4%) to 93.995% when pulling 1f2c606 on less-parallelism into ae6b524 on master.

  raise Reject(sys.exc_info()[1], requeue=False)

  # Create a lock so we don't try to run the same task multiple times
  sdat = date.strftime('%Y-%m-%d') if date else 'ALL'
- lock_id = '{0}-lock-{1}-{2}-{3}'.format(__name__, fitbit_user, _type, sdat)
+ cats = '-'.join('%s' % i for i in categories)
+ lock_id = '{0}-lock-{1}-{2}-{3}'.format(__name__, fitbit_user, cats, sdat)
  if not cache.add(lock_id, 'true', LOCK_EXPIRE):
@grokcode (Contributor) commented

@brad I don't think we can use the Django cache for the lock and guarantee that it will work with the various setups people are likely to have. For example, Django's default caching method is local-memory caching, which is a per-process cache. Depending on the celery setup, this code can be executed by more than one process, each with its own cache and no way to see the locks created by the other workers.
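For illustration, a minimal settings sketch of the difference (the backend paths are the stock Django ones; which shared backend is actually available depends entirely on the installing project):

```python
# settings.py -- illustration only; the right backend depends on the deployment.

# Django's default is a per-process cache, so each celery worker process would
# hold its own copy of the lock keys:
# CACHES = {
#     'default': {'BACKEND': 'django.core.cache.backends.locmem.LocMemCache'},
# }

# A shared backend (memcached here) makes cache.add() visible to every process:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    },
}
```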

@brad (Member, Author) commented

@grokcode What can I do then? Would it be safe to get rid of this lock and decorate get_fitbit_data with @transaction.atomic()?

@grokcode (Contributor) commented

@brad I think the easiest solution is to punt on it for now: make a note in the README that the fitbit tasks shouldn't be run concurrently, and then give an example of how to set up celery to do that. I think we can use celery's manual routing feature to create a new queue and then, when starting celery, make sure there is only one thread working on that queue.
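Roughly something like this (untested sketch; the task path 'fitapp.tasks.get_fitbit_data' and the project name are guesses, not the confirmed locations):

```python
# settings.py -- route the fitbit tasks to a dedicated queue.
CELERY_ROUTES = {
    'fitapp.tasks.get_fitbit_data': {'queue': 'fitbit'},
}

# Then start a single-process worker for that queue so its tasks run one at a time:
#   celery -A myproject worker -Q fitbit --concurrency=1
```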

It would be much nicer to support concurrent tasks (but trickier too). I think we can do the locking with the db. One idea is to store the lock in the DB, use the @transaction.atomic() decorator like you said, and use Django's select_for_update to acquire the lock. I think it would be enough to have one lock per user, so that only one task per user can execute at a time. That way we shouldn't have multiple processes trying to renew the token at the same time and stepping on each other.
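Something along these lines, maybe (just a sketch, not tested; the model import path and the wrapper function are assumptions for illustration):

```python
from django.db import transaction

from fitapp.models import UserFitbit  # assumed location of the per-user model


def fetch_for_user(fitbit_user):
    """Hypothetical wrapper showing the per-user row lock."""
    with transaction.atomic():
        # select_for_update() makes any other transaction selecting this row
        # block until we commit, so only one task per user can run at a time.
        fbuser = UserFitbit.objects.select_for_update().get(fitbit_user=fitbit_user)
        # ...call the Fitbit API here and save any refreshed token while the
        # row lock is still held...
        return fbuser
```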
