Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bbpp134 2298/distributed bluenaas #45

Open
wants to merge 37 commits into
base: develop
Choose a base branch
from

Conversation

bilalesi
Copy link
Collaborator

@bilalesi bilalesi commented Oct 4, 2024

  1. refactor simulations:
  • separate current and frequency specific functionalities in /core/stimulation.
  • refactor common methods into /core/stimulation/utils.
  • refactor the stimulation runners into /core/stimulation/runners.
  • refactor stimulation basic method (prepare parameters, basic config, apply stimulation) into /core/stimulation/common.
  1. add /scripts folder contains the entrypoints for main application and celery workers containers .
  2. add redis-insight, flower and redis container
  • redis as broker and backend (for transmitting state).
  • redis-insight for debugging the queue/state of the application.
  • flower better to use it to track the broker queue and tasks status.
  1. add monitor container to track events of the workers (DEV only).
  2. create nfs storage to share the models between main app and workers.
  3. use celery for distributed tasks (simulation).
  4. add new celery task class that will help for init/saving/update simulation in nexus.
  5. create 2 endpoints (start/stop simulations).

@bilalesi bilalesi self-assigned this Oct 4, 2024
bilalesi and others added 26 commits October 15, 2024 17:06
update: extend protection of a worker based on queue depth
then update with results on finish

The idea was to already create a distribution in s3 for a simulation as
soon as we get a request to launch a simulation.
This distribution would later be updated by the celery task to also
include simulation result. However, it looks like it's not possible to
update a file stored in s3 (see https://bluebrainproject.slack.com/archives/G013PKBUHT2/p1728567806810799)
so I will create a distribution at the end, when the simulation is
complete. This distribution will contain the simulation config,
simulation result as well as stimulus plot data.
Reponse:
    id: str
    status: SimulationStatus
    results: Any

    name: str
    description: str
    created_by: str
    simulation_config: Optional[SingleNeuronSimulationConfig]
    type: SimulationType

    me_model_self: str
    synaptome_model_self: Optional[str]

Signed-off-by: Dinika Saxena <[email protected]>
@bilalesi bilalesi force-pushed the BBPP134-2298/distributed-bluenaas branch from 9d75052 to 53ebaf8 Compare October 16, 2024 20:57
@bilalesi bilalesi force-pushed the BBPP134-2298/distributed-bluenaas branch from 53ebaf8 to 1a41fea Compare October 17, 2024 07:18
Copy link
Collaborator

@pgetta pgetta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Bilal, this is very impressive. I've added some comments/suggestions.

Feel free to decide yourself what's important to address and when.

.vscode/settings.json Outdated Show resolved Hide resolved
bluenaas/celery.sh Outdated Show resolved Hide resolved
bluenaas/config/settings.py Outdated Show resolved Hide resolved
bluenaas/config/settings.py Outdated Show resolved Hide resolved
bluenaas/config/settings.py Outdated Show resolved Hide resolved
bluenaas/routes/simulation.py Show resolved Hide resolved
bluenaas/routes/simulation.py Show resolved Hide resolved
volumes:
nfs_share:
driver_opts:
type: "nfs"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit to have NFS for local development?

From the application perspective, the model cache is just another folder in the filesystem that the app can use.
In AWS deployment it is implemented via NFS, but locally we can use normal docker volume/mount. Maybe there is something I'm missing.

Copy link
Collaborator Author

@bilalesi bilalesi Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just mimicking the same arch as prod (next step is to use swarm or other tech for replicating workers in local dev)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imho, the idea of using normal docker volumes instead of nfs locally is worth considering.

It helps to keep the local setup easy (no need to setup an nfs server locally), similar between developers and OS, and having fs locally or remotely doesn't change anything from docker or application perspective (so we don't gain much from having a more complicated local setup).

I used normal docker volumes when working on the new endpoints for ML team and it did seem to work without any issues.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the details, Bilal.

I'm not sure how we can benefit from this (closer to production) implementation, especially given that the NFS server platform/version/configuration is going to be still different in our local machine and AWS.

The current implementation has few limitations:

  • Works only in macOS.
  • Relies on local NFS server and updates it's global configuration.

A better approach would be to have another container with NFS server with docker mount for local filesystem - so that everything is containerised and easy to manage.

At the same time, I think, as Dinika mentioned, it's best to leave NFS out of scope just to keep everything simple.


[tool.poetry.group.types.dependencies]
types-requests = "^2.32.0.20240712"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

[tool.pyright]
venvPath = "/Users/meddah/Library/Caches/pypoetry/virtualenvs"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed something that probably we should not have there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's it ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, it's the absolute path from your machine: "/Users/meddah/Library/Caches/pypoetry/virtualenvs".
There should be a way not to hardcode it for everybody.

@bilalesi bilalesi force-pushed the BBPP134-2298/distributed-bluenaas branch from e95decb to fd79114 Compare October 19, 2024 08:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants