Bbpp134 2298/distributed bluenaas #45

bilalesi · 2024-10-04T10:40:36Z

refactor simulations:

separate the current- and frequency-specific functionality in /core/stimulation.
refactor common methods into /core/stimulation/utils.
refactor the stimulation runners into /core/stimulation/runners.
refactor the basic stimulation methods (prepare parameters, basic config, apply stimulation) into /core/stimulation/common.
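One way to read the split between current- and frequency-specific code and the shared `common` step: both kinds of sweep expand into a list of independent runs that differ in a single parameter. The sketch below illustrates that idea only. `RunSpec`, `expand_runs` and the units are hypothetical and do not come from the PR.

```python
from dataclasses import dataclass, replace
from typing import List, Literal, Sequence


@dataclass(frozen=True)
class RunSpec:
    """One independent simulation run (hypothetical shape, not the PR's model)."""

    varying: Literal["current", "frequency"]
    amplitude_na: float  # stimulus amplitude in nA (assumed unit)
    frequency_hz: float  # stimulus frequency in Hz (assumed unit)
    duration_ms: float   # simulated time in ms (assumed unit)


def expand_runs(base: RunSpec, values: Sequence[float]) -> List[RunSpec]:
    """Common step: one run per swept value, all other parameters copied from `base`."""
    # Current sweeps vary the amplitude; frequency sweeps vary the frequency.
    field_name = "amplitude_na" if base.varying == "current" else "frequency_hz"
    return [replace(base, **{field_name: v}) for v in values]


if __name__ == "__main__":
    base = RunSpec(varying="current", amplitude_na=0.0, frequency_hz=0.0, duration_ms=100.0)
    for run in expand_runs(base, [0.1, 0.2, 0.3]):
        print(run)
```

Each `RunSpec` could then be handed to a runner (for example a child process) without the runner needing to know which kind of sweep produced it.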

add a /scripts folder containing the entrypoints for the main application and the Celery worker containers.
add redis, redis-insight and flower containers:

redis as the broker and result backend (for transmitting task state).
redis-insight for debugging the queue/state of the application.
flower for tracking the broker queue and task status.
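Using one Redis server for both roles could look like the sketch below, which only builds the settings dictionary. `broker_url` and `result_backend` are real Celery setting names; the `REDIS_URL` environment variable, the `redis` host name and the split into database 0 (broker) and database 1 (results) are assumptions, not necessarily what this PR does.

```python
import os
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit


def celery_settings(env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build Celery broker/backend settings that share one Redis server.

    Assumption: REDIS_URL points at the redis container, e.g. redis://redis:6379.
    """
    env = dict(os.environ) if env is None else env
    parts = urlsplit(env.get("REDIS_URL", "redis://redis:6379"))

    def with_db(db: int) -> str:
        # Replace any existing path with an explicit Redis database index.
        return urlunsplit((parts.scheme, parts.netloc, f"/{db}", "", ""))

    return {
        "broker_url": with_db(0),      # queue of pending tasks
        "result_backend": with_db(1),  # task state and return values
    }


if __name__ == "__main__":
    print(celery_settings({"REDIS_URL": "redis://localhost:6379"}))
    # With Celery installed: app = Celery("bluenaas"); app.conf.update(**celery_settings())
```

Keeping broker and results in separate databases is a design choice that makes it easier to inspect each one in redis-insight; a single database would also work.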

add a monitor container to track worker events (DEV only).
create NFS storage to share the models between the main app and the workers.
use Celery for distributed tasks (simulations).
add a new Celery task class that helps to init/save/update simulations in Nexus.
create 2 endpoints (start/stop simulations).
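The custom task class in the list above can be pictured with the plain-Python stand-in below. In Celery, a base class passed via `@app.task(base=...)` can override hooks such as `on_success(retval, task_id, args, kwargs)` and `on_failure(exc, task_id, args, kwargs, einfo)`; this sketch only mimics that shape without importing Celery. `InMemoryNexus`, `SimulationTask` and the status strings are hypothetical, not the PR's actual names or the Nexus API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class InMemoryNexus:
    """Hypothetical stand-in for a Nexus client: resource id -> fields."""

    resources: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def create(self, sim_id: str) -> None:
        self.resources[sim_id] = {"status": "PENDING"}

    def update(self, sim_id: str, **fields: Any) -> None:
        self.resources[sim_id].update(fields)


class SimulationTask:
    """Mimics a Celery base task: the simulation wrapped by lifecycle hooks."""

    def __init__(self, nexus: InMemoryNexus, fn: Callable[..., Any]) -> None:
        self.nexus = nexus
        self.fn = fn

    def before_start(self, sim_id: str) -> None:
        self.nexus.update(sim_id, status="STARTED")

    def on_success(self, result: Any, sim_id: str) -> None:
        self.nexus.update(sim_id, status="SUCCESS", results=result)

    def on_failure(self, exc: Exception, sim_id: str) -> None:
        # Persist the error so the client can see why the simulation failed.
        self.nexus.update(sim_id, status="FAILURE", error=str(exc))

    def __call__(self, sim_id: str, *args: Any, **kwargs: Any) -> Any:
        self.before_start(sim_id)
        try:
            result = self.fn(*args, **kwargs)
        except Exception as exc:
            self.on_failure(exc, sim_id)
            raise
        self.on_success(result, sim_id)
        return result
```

Keeping the Nexus bookkeeping in hooks means the simulation function itself stays free of persistence code, which is the main benefit of a dedicated task base class.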


Signed-off-by: Dinika Saxena <[email protected]>


then update with results on finish

The idea was to create a distribution in S3 for a simulation as soon as we get a request to launch it. The Celery task would later update this distribution to also include the simulation result. However, it looks like it's not possible to update a file stored in S3 (see https://bluebrainproject.slack.com/archives/G013PKBUHT2/p1728567806810799), so I will create the distribution at the end, when the simulation is complete. This distribution will contain the simulation config, the simulation result and the stimulus plot data.
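Because the distribution is written only once, everything it needs has to be assembled after the simulation finishes. A minimal sketch of that single write, assuming a JSON payload with the hypothetical keys `simulation_config`, `results` and `stimulus` and hypothetical terminal status names; the actual upload to Nexus/S3 is not shown.

```python
import json
from typing import Any, Dict, List

# Hypothetical terminal states; the real SimulationStatus values may differ.
FINISHED_STATES = {"SUCCESS", "FAILURE"}


def build_final_distribution(
    status: str,
    simulation_config: Dict[str, Any],
    results: Dict[str, Any],
    stimulus: List[Dict[str, Any]],
) -> bytes:
    """Serialize everything the distribution needs in one write."""
    if status not in FINISHED_STATES:
        # The file cannot be patched later, so refuse to build it too early.
        raise ValueError("distribution is only created once the simulation has finished")
    payload = {
        "simulation_config": simulation_config,
        "results": results,
        "stimulus": stimulus,
    }
    # sort_keys keeps the output byte-stable for identical inputs.
    return json.dumps(payload, sort_keys=True).encode("utf-8")
```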

Response:
id: str
status: SimulationStatus
results: Any
name: str
description: str
created_by: str
simulation_config: Optional[SingleNeuronSimulationConfig]
type: SimulationType
me_model_self: str
synaptome_model_self: Optional[str]

Signed-off-by: Dinika Saxena <[email protected]>
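The response fields above map naturally onto a typed model. The dataclass below is a sketch only: the `Literal` value sets for `SimulationStatus` and `SimulationType` are hypothetical, `simulation_config` is typed as a plain dict standing in for `SingleNeuronSimulationConfig`, and `results` uses a dict (a later commit in this PR changes the results type to a dictionary). Plain dataclasses do not enforce `Literal` values at runtime.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Literal, Optional

# Hypothetical value sets; the real enums may differ.
SimulationStatus = Literal["PENDING", "STARTED", "SUCCESS", "FAILURE"]
SimulationType = Literal["single-neuron-simulation", "synaptome-simulation"]


@dataclass
class SimulationResponse:
    id: str
    status: SimulationStatus
    results: Optional[Dict[str, Any]]
    name: str
    description: str
    created_by: str
    simulation_config: Optional[Dict[str, Any]]  # stands in for SingleNeuronSimulationConfig
    type: SimulationType
    me_model_self: str
    synaptome_model_self: Optional[str] = None  # only set for synaptome simulations
```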

Signed-off-by: Dinika Saxena <[email protected]>


pgetta

Hi Bilal, this is very impressive. I've added some comments/suggestions.

Feel free to decide yourself what's important to address and when.

.vscode/settings.json

bluenaas/celery.sh

bluenaas/config/settings.py

bluenaas/routes/simulation.py

bluenaas/services/simulation/run_simulation.py

pgetta · 2024-10-17T10:54:10Z

docker-compose.yml

+volumes:
+  nfs_share:
+    driver_opts:
+      type: "nfs"


What's the benefit of having NFS for local development?

From the application's perspective, the model cache is just another folder in the filesystem that the app can use.
In the AWS deployment it is implemented via NFS, but locally we can use a normal Docker volume/mount. Maybe there is something I'm missing.
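To illustrate the point that the cache is "just another folder": code like the sketch below behaves identically whether the directory is an NFS mount in AWS or a plain Docker volume locally. `MODEL_CACHE_DIR`, the `/app/model-cache` default and the hashed sub-folder naming are assumptions for illustration, not the project's actual configuration.

```python
import hashlib
import os
from pathlib import Path
from typing import Dict, Optional


def model_cache_dir(env: Optional[Dict[str, str]] = None) -> Path:
    """Resolve the shared cache root; what backs the mount is invisible here."""
    env = dict(os.environ) if env is None else env
    root = Path(env.get("MODEL_CACHE_DIR", "/app/model-cache"))  # hypothetical default
    root.mkdir(parents=True, exist_ok=True)
    return root


def cached_model_path(model_id: str, env: Optional[Dict[str, str]] = None) -> Path:
    # Model ids may be URLs, so hash them into a filesystem-safe folder name.
    key = hashlib.sha256(model_id.encode("utf-8")).hexdigest()[:16]
    path = model_cache_dir(env) / key
    path.mkdir(exist_ok=True)
    return path
```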

Just mimicking the same architecture as prod (the next step is to use Swarm or another technology to replicate workers in local dev).

IMHO, the idea of using normal Docker volumes instead of NFS locally is worth considering.

It helps keep the local setup simple (no need to set up an NFS server locally) and consistent across developers and operating systems, and whether the filesystem is local or remote doesn't change anything from the Docker or application perspective (so we don't gain much from a more complicated local setup).

I used normal docker volumes when working on the new endpoints for ML team and it did seem to work without any issues.

Thanks for the details, Bilal.

I'm not sure how we benefit from this (closer to production) implementation, especially given that the NFS server platform/version/configuration will still differ between our local machines and AWS.

The current implementation has a few limitations:

It works only on macOS.

It relies on a local NFS server and modifies its global configuration.

A better approach would be to have another container with NFS server with docker mount for local filesystem - so that everything is containerised and easy to manage.

At the same time, I think, as Dinika mentioned, it's best to leave NFS out of scope just to keep everything simple.

pgetta · 2024-10-17T10:56:03Z

pyproject.toml


 [tool.poetry.group.types.dependencies]
 types-requests = "^2.32.0.20240712"
+
 [build-system]
 requires = ["poetry-core"]
 build-backend = "poetry.core.masonry.api"

 [tool.pyright]
 venvPath = "/Users/meddah/Library/Caches/pypoetry/virtualenvs"


Just noticed something that we probably should not have there.

What is it?

Ah yeah, it's the absolute path from your machine: "/Users/meddah/Library/Caches/pypoetry/virtualenvs".
There should be a way to avoid hardcoding it for everybody.

bilalesi added 3 commits October 4, 2024 11:21

configure distributed sims

4060a22

finalizing the worker arch

098610f

update: stop simulation

4d469ad

bilalesi self-assigned this Oct 4, 2024

bilalesi and others added 26 commits October 15, 2024 17:06

update: add scaling functionalities

604f477

update: build top level functionality for scaling

fe0b3e4

update: add ecs task protection

73bdb30

update: scale based on queue size

58d11b0

update: extend protection of a worker based on queue depth

Create and update simulation results to nexus

5b2a6d8

Signed-off-by: Dinika Saxena <[email protected]>

Add endpoint to fetch simulation status

9d246e9

Get status of launched simulation

2d89002

Use literal for status

d4716cc

Get results when simulation completes

ad27454

Reuse same celery task for realtime as well as bg simulation

7fcfbbb

Use same function to split simulation into child process for realtime and non-realtime sims

5db56d1

Use same function to run simulations for diff curr/Hz in child processes

6c5d18e

Use synaptome instead of synapses in distribution to be consistent with frontend

aec1355

Rename me_model to model since it can also be of type SingleNeuronSynaptome

f760d5a

Enable non-realtime simulation for current varying simulations

48ffc3d

Remove unnecessary check for _rev when prepping distribution

bcf4fb2

Save error in simulation resource

f07ab67

Save stimulus plot data to distribution too

0f0fb45

Save stimulus data for synaptome simulations

e09adcc

End simulation config and model selfs to to simualtionresponse

504bd4a

Reuse logic to create simulation response

2de608b

Add endpoint to fetch all simulations of type

12a34b3

Add endpoint to delete simulations

58477d3

Signed-off-by: Dinika Saxena <[email protected]>

Update results type to be a dictionary to be a bit more helpful than Any

12a5a7b
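Several commits above deal with scaling workers on queue size and extending a worker's ECS task protection based on queue depth. The pure functions below sketch one plausible policy under stated assumptions: `ScalingPolicy`, its defaults and both function names are hypothetical, and the actual AWS calls (changing the ECS service's desired count, toggling task scale-in protection) are deliberately left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    tasks_per_worker: int = 1  # hypothetical: simulations one worker handles at once
    min_workers: int = 1
    max_workers: int = 10


def desired_workers(queue_depth: int, running: int, policy: ScalingPolicy) -> int:
    """Workers needed for queued plus in-flight tasks, clamped to the policy bounds."""
    load = max(queue_depth, 0) + max(running, 0)
    needed = math.ceil(load / policy.tasks_per_worker)
    return max(policy.min_workers, min(policy.max_workers, needed))


def should_extend_protection(queue_depth: int, worker_busy: bool) -> bool:
    # Keep a worker protected from scale-in while it runs a task or work is still queued.
    return worker_busy or queue_depth > 0
```

Keeping the decision logic free of AWS calls makes it easy to unit-test; a small loop can then poll the queue depth and apply the result.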

bilalesi added 6 commits October 15, 2024 17:06

fix: typing and urls

7e8f0a2

update: add autosave to realtime simulation

81c77d0

update: add autosave to run-realtime endpoint

6d9f972

update: add some docs to service fns

9280de0

fix: deprecate simulation

a97e453

update: add filter by creation date for list simulations

update: docs for endpoints

af9fdbf
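The "filter by creation date for list simulations" change can be illustrated with an in-memory filter. In the service the filter would more likely be pushed into the Nexus query; this sketch only shows the intended semantics (inclusive bounds, either bound optional). The function name, the parameter names and the `created_at` key are assumptions.

```python
from datetime import datetime
from typing import List, Optional, TypedDict


class SimulationSummary(TypedDict):
    id: str
    created_at: datetime


def filter_by_creation_date(
    sims: List[SimulationSummary],
    created_at_start: Optional[datetime] = None,
    created_at_end: Optional[datetime] = None,
) -> List[SimulationSummary]:
    """Keep simulations created within [start, end]; either bound may be omitted."""
    return [
        s
        for s in sims
        if (created_at_start is None or s["created_at"] >= created_at_start)
        and (created_at_end is None or s["created_at"] <= created_at_end)
    ]
```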

bilalesi force-pushed the BBPP134-2298/distributed-bluenaas branch from 9d75052 to 53ebaf8 Compare October 16, 2024 20:57

fix: docs and vars

1a41fea

bilalesi force-pushed the BBPP134-2298/distributed-bluenaas branch from 53ebaf8 to 1a41fea Compare October 17, 2024 07:18

pgetta reviewed Oct 17, 2024


fix: review

fd79114

bilalesi force-pushed the BBPP134-2298/distributed-bluenaas branch from e95decb to fd79114 Compare October 19, 2024 08:45


bilalesi commented Oct 4, 2024

pgetta left a comment

pgetta Oct 17, 2024

bilalesi Oct 18, 2024 (edited)

Dinika Oct 18, 2024

pgetta Oct 21, 2024

pgetta Oct 17, 2024

bilalesi Oct 18, 2024

pgetta Oct 21, 2024
