apolo-actions/mlflow

Run your personal MLFlow server on the Apolo platform with apolo-flow

This is an apolo-flow action that launches an instance of the MLFlow server. You can use it to track your ML experiments and model trainings, track models in the model registry, and deploy models to production using our in-job-deployments or MLFlow2Seldon integrations, or build your own integrations.

The MLFlow action exposes several arguments, one of which is mandatory: artifacts_destination.

A usage example can be found in the .neuro/live.yaml file.
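
For orientation, a minimal sketch of a live workflow calling this action might look as follows (the @master ref and job name are illustrative assumptions; here the artifacts land on the job's local filesystem — see the artifacts_destination argument below for persisting them on platform storage):

kind: live
jobs:
  mlflow:
    action: gh:apolo-actions/mlflow@master
    args:
      artifacts_destination: /usr/local/share/mlflow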

Arguments

mode

Mode of operation of the MLFlow server. Allowed values are "server" and "ui".

  • server mode. The MLFlow server must be running while an MLFlow client is connected to it. This corresponds to Scenario 5 from the official MLFlow docs. This mode is also required for in-job ML model deployments.

  • ui mode. In this case, the MLFlow server only serves the artifacts and metadata previously stored by clients in a backend store. See the official MLFlow docs describing this use case. The benefit is that you don't need to keep the MLFlow server running while the training happens, which helps save costs and hardware resources in a constrained environment.

Example

Running the server in UI mode:

args:
    mode: "ui"
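
Running the server in server mode (the other allowed value):

args:
    mode: "server"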

artifacts_destination

A local path within the MLFlow server where artifacts such as model dumps are stored. You can find more information here.

Example

You can use platform storage as a backend. To do this, use a local path backed by a mounted volume as the artifact store:

  1. Set this input's value to the mount path of the needed volume.
  2. Add the volume's read-write reference to the volumes input.
args:
    artifacts_destination: ${{ volumes.mlflow_artifacts.mount }}
    volumes: "${{ to_json( [volumes.mlflow_artifacts.ref_rw] ) }}"
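
The example above assumes a volume named mlflow_artifacts is defined in the workflow's volumes section, for instance (the remote path and mount point are illustrative assumptions):

volumes:
  mlflow_artifacts:
    remote: storage:mlflow/artifacts
    mount: /usr/local/share/mlflow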

backend_store_uri

URI of the storage used to store experiment metadata, metrics, registered models, etc. You can find more information here.

Examples

  • The argument is not set. In this case, the --backend-store-uri MLFlow flag will be omitted and the default value will be used (see the regular file case below).

  • Postgres server running as a job within the same project (see the sketch after this list for a possible postgres job definition):

args:
    backend_store_uri: postgresql://postgres:password@${{ inspect_job('postgres').internal_hostname_named }}:5432
  • SQLite persisted on a platform disk or storage. This also implies adding the respective disk or volume reference to the volumes argument.
args:
    artifacts_destination: ${{ volumes.mlflow_artifacts.mount }}
    backend_store_uri: sqlite:///${{ volumes.mlflow_artifacts.mount }}/mlflow.db
    volumes: "${{ to_json( [ volumes.mlflow_artifacts.ref_rw ] ) }}"
  • Regular file. In this case, the MLFlow registered models functionality will not work.
args:
    backend_store_uri: /path/to/store 
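
For the Postgres case above, a companion postgres job could be defined in the same workflow. A minimal sketch (the image tag and password are illustrative assumptions; in practice, the password should come from a platform secret):

jobs:
  postgres:
    image: postgres:14
    life_span: 10d
    env:
      POSTGRES_PASSWORD: password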

volumes

A JSON-encoded list of volume references to mount to the MLFlow server job. Empty by default.

Example

args:
    volumes: "${{ to_json(
        [
          volumes.mlflow_artifacts.ref_rw,
          volumes.mlflow_storage.ref_rw
        ]
      ) }}"

envs

A JSON-encoded mapping of environment variables to add to the job. Empty by default.

Example

args:
    envs: "${{ to_json(
        {
          'ENV1': 'env_1_value',
          'ENV2': 'env_2_value'
        }
      ) }}"

http_auth

Boolean value specifying whether to use HTTP authentication for the MLFlow server or not. "False" by default.

Example

Enable HTTP authentication by setting this argument to True.

args:
    http_auth: "True"

Note: your training job should be able to communicate with an MLFlow server guarded by the Apolo platform authentication solution. To do so, put a token of a user (or a service account) that has access to the corresponding MLFlow server into the MLFLOW_TRACKING_TOKEN environment variable within the training job.
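
For example, in the training job's definition (a minimal sketch; the secret name mlflow-token is an illustrative assumption, and the secret must be created on the platform beforehand):

env:
  MLFLOW_TRACKING_TOKEN: secret:mlflow-token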

life_span

A value specifying how long the MLFlow server job should be running. "10d" by default.

Example

args:
    life_span: 1d2h3m

port

HTTP port to use for the MLFlow server. "5000" by default.

Example

args:
    port: "4444"

job_name

Predictable subdomain name which replaces the job's ID in the full job URI. "" by default (i.e., the job ID will be used).

Example

args:
    job_name: "mlflow-server"

preset

Resource preset to use when running the MLFlow server job. "" by default (i.e., the first preset listed by apolo config show will be used).

Example

args:
    preset: cpu-small

extra_params

Additional parameters passed to the mlflow server command. "" by default. Check the full list of accepted parameters via mlflow server --help.

Example

args:
    extra_params: "--workers 2"

Known issues

sqlite3.OperationalError: database is locked

This might happen under the following circumstances:

  1. the mlflow server parameter --backend-store-uri is not set (by default, SQLite is used) or is set to use SQLite or a regular file, and
  2. the filesystem backing the --backend-store-uri file does not support file locking (observed with the Azure Files NFS solution).

To confirm whether you're running in the Azure cloud, run apolo admin get-clusters.

A workaround is to host the SQLite data on a platform disk, or to use a dedicated SQL database, for instance PostgreSQL hosted on the platform; see the sketch below.
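
A minimal sketch of the platform disk variant, combining the SQLite example above with a disk-backed volume (the disk name mlflow-db and mount path are illustrative assumptions; the disk must be created beforehand, e.g. with apolo disk create):

volumes:
  mlflow_db:
    remote: disk:mlflow-db
    mount: /var/mlflow

args:
    backend_store_uri: sqlite:///${{ volumes.mlflow_db.mount }}/mlflow.db
    volumes: "${{ to_json( [ volumes.mlflow_db.ref_rw ] ) }}"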