Fix realtime sims in distributed architecture #56

Open
wants to merge 8 commits into
base: distributed_simulation_per_worker

Conversation

Dinika
Collaborator

@Dinika Dinika commented Nov 13, 2024

Implements the following changes to fix the new architecture:

  • Replace Celery state updates with Redis pub/sub. Motivations:

    • Celery state updates are not event-driven: the server cannot react to a new state update and can only poll continuously. Polling returns the same state repeatedly, so the state updates were duplicated (which is why a "hash" was needed).
    • If we add a sleep() (or similar) between polls, we risk missing state updates that carried data.
    • The server polled for state updates continuously, and until all the subtasks were complete it could not attend to any other simulation request.
  • Start a subprocess to run the simulation inside the Celery worker. If the simulation is not started in a child process, the NEURON simulator is not reset: it still holds state from the previous simulations that the same Celery worker ran. This is why we were seeing repeated data for current.

  • Reduce the frequency of health checks of Celery workers. The command celery -A bluenaas.infrastructure.celery status takes around 7 seconds to reply. The workers get busy replying to these health checks, and simulation performance goes down.
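
To illustrate the first change in the list above, here is a minimal sketch of event-driven updates over Redis pub/sub. It is not the code from this PR: the channel naming scheme, the function names, and the "SUCCESS"/"FAILURE" terminal states are assumptions made for illustration. The client and pubsub arguments are expected to be a redis-py redis.Redis instance and the object returned by its pubsub() method.

```python
import json
from typing import Any, Dict, Iterator


def channel_for(task_id: str) -> str:
    """Hypothetical naming scheme: one channel per simulation task."""
    return f"simulation:{task_id}"


def publish_update(client: Any, task_id: str, update: Dict[str, Any]) -> None:
    """Worker side: push one update to whoever is subscribed right now."""
    client.publish(channel_for(task_id), json.dumps(update))


def iter_updates(pubsub: Any, task_id: str) -> Iterator[Dict[str, Any]]:
    """Server side: block until a message arrives instead of polling.

    Each published update is delivered once, so no de-duplication is needed.
    """
    channel = channel_for(task_id)
    pubsub.subscribe(channel)
    try:
        for message in pubsub.listen():
            if message["type"] != "message":
                continue  # skip subscribe/unsubscribe confirmations
            update = json.loads(message["data"])
            yield update
            # Assumed terminal markers; the real payload format may differ.
            if update.get("state") in ("SUCCESS", "FAILURE"):
                break
    finally:
        pubsub.unsubscribe(channel)
```

With redis-py this would be wired up roughly as client = redis.Redis(...), then iter_updates(client.pubsub(), task_id) on the server and publish_update(client, task_id, {...}) in the worker. Redis pub/sub does not store messages for subscribers that are not connected yet, so the server should subscribe before the task is dispatched.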

The NEURON simulator does not reset itself automatically after a call to
simulation.run(). This causes incorrect simulation data because the data
also contains information from previous simulations that ran on the same
worker. The best way to avoid this (as suggested on the NEURON forums) is
to run the simulation inside a child process.

Since the multiprocessing module cannot be used inside a Celery worker,
billiard is used instead.

See:
- https://www.neuron.yale.edu/phpBB/viewtopic.php?t=4039
- https://stackoverflow.com/questions/54858326/python-multiprocessing-billiard-vs-multiprocessing
- celery/celery#5362 (comment)

Signed-off-by: Dinika Saxena <[email protected]>
@pgetta
Collaborator

pgetta commented Nov 13, 2024

Great work, Dinika, the change looks good to me.
