Skip to content
This repository has been archived by the owner on Jan 14, 2025. It is now read-only.

Update refactor #274

Merged
merged 11 commits into from
Oct 19, 2023
3 changes: 2 additions & 1 deletion docs/requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -6,4 +6,5 @@ ipython
jupytext
jupyter
matplotlib
eztao
eztao
ray
1 change: 1 addition & 0 deletions docs/tutorials.rst
Original file line number Diff line number Diff line change
Expand Up @@ -13,3 +13,4 @@ functionality.
Binning Sources in the Ensemble <tutorials/binning_slowly_changing_sources>
Structure Function Showcase <tutorials/structure_function_showcase>
Loading Data into the Ensemble <tutorials/tape_datasets>
Using Ray with the Ensemble <tutorials/using_ray_with_the_ensemble>
184 changes: 184 additions & 0 deletions docs/tutorials/using_ray_with_the_ensemble.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,184 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "bcb10f72-948f-475e-a856-4f5c9516fd5e",
"metadata": {},
"source": [
"# Using Dask on Ray with the Ensemble\n",
"\n",
"[Ray](https://docs.ray.io/en/latest/ray-overview/index.html) is an open-source unified framework for scaling AI and Python applications. Ray provides a scheduler for Dask ([dask_on_ray](https://docs.ray.io/en/latest/ray-more-libs/dask-on-ray.html)) which allows you to build data analyses using Dask’s collections and execute the underlying tasks on a Ray cluster. We have found with TAPE that the Ray scheduler is often more performant than Dasks scheduler. Ray can be used on TAPE using the setup shown in the following example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ace065cd-5c75-4282-bca5-36ebe6868234",
"metadata": {},
"outputs": [],
"source": [
"import ray\n",
"from ray.util.dask import enable_dask_on_ray, disable_dask_on_ray\n",
"from tape import Ensemble\n",
"from tape.analysis.structurefunction2 import calc_sf2\n",
"\n",
"context = ray.init()\n",
"\n",
"# Use the Dask config helper to set the scheduler to ray_dask_get globally,\n",
"# without having to specify it on each compute call.\n",
"enable_dask_on_ray()"
]
},
{
"cell_type": "markdown",
"id": "e6e9fa72-5811-4750-8ba8-bcd762eb80fa",
"metadata": {},
"source": [
"We import ray, and just need to invoke two commands. `context = ray.init()` starts a local ray cluster, and we can use this context object to retrieve the url of the ray dashboard, as shown below. `enable_dask_on_ray()` is a dask configuration function that sets up all Dask work to use the established Ray cluster."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "04453edd-b22b-43cb-abc3-e61e0c958b04",
"metadata": {},
"outputs": [],
"source": [
"print(context.dashboard_url)"
]
},
{
"cell_type": "markdown",
"id": "f9ad55cc-2203-4145-be1c-0af331805624",
"metadata": {},
"source": [
"For TAPE, the only needed change is to specify `client=False` when initializing an `Ensemble` object. Because the Dask configuration has been set, the Ensemble will automatically use the established Ray cluster."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2cf7c608-fc46-455e-a7f7-04e8b64d52ec",
"metadata": {},
"outputs": [],
"source": [
"ens=Ensemble(client=False) # Do not use a client"
]
},
{
"cell_type": "markdown",
"id": "6a1b904e-7bf6-4dd5-b1e6-0c6229a98739",
"metadata": {},
"source": [
"From here, we are free to work with TAPE as normal."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0e3bf1a-f9b9-45be-9fea-390d25380794",
"metadata": {},
"outputs": [],
"source": [
"ens.from_dataset(\"s82_qso\")\n",
"ens._source = ens._source.repartition(npartitions=10)\n",
"ens.batch(calc_sf2, use_map=False) # use_map is false as we repartition naively, splitting per-object sources across partitions"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c5692d75",
"metadata": {},
"source": [
"## Timing Comparison\n",
"\n",
"As mentioned above, we generally see that Ray is more performant than Dask. Below is a simple timing comparison."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f128cdbf",
"metadata": {},
"source": [
"### Ray Timing"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd960e10",
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"\n",
"ens=Ensemble(client=False) # Do not use a client\n",
"ens.from_dataset(\"s82_qso\")\n",
"ens._source = ens._source.repartition(npartitions=10)\n",
"ens.batch(calc_sf2, use_map=False)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "228e5114",
"metadata": {},
"source": [
"### Dask Timing"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "24a8f466",
"metadata": {},
"outputs": [],
"source": [
"disable_dask_on_ray() # unsets the dask_on_ray configuration settings"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1552c2b8",
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"\n",
"ens = Ensemble()\n",
"ens.from_dataset(\"s82_qso\")\n",
"ens._source = ens._source.repartition(npartitions=10)\n",
"ens.batch(calc_sf2, use_map=False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
},
"vscode": {
"interpreter": {
"hash": "83afbb17b435d9bf8b0d0042367da76f26510da1c5781f0ff6e6c518eab621ec"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}
3 changes: 2 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,8 @@ dev = [
"ipython", # Also used in building notebooks into Sphinx
"matplotlib", # Used in sample notebook intro_notebook.ipynb
"eztao==0.4.1", # Used in Structure Function example notebook
"bokeh", # Used to render dask client dashboard in Scaling to Large Data notebook
"bokeh", # Used to render dask client dashboard in Scaling to Large Data notebook
"ray[default]" # Used in the Ray on Ensemble notebook
]

[project.urls]
Expand Down
Loading