1. fed_stats is broken due to the simulator changes; change the workspace directory in the notebooks and examples (#2691)

2. For the df_stats job, the new datasets have an empty line at the end, which causes pandas to mistake numerical columns for string columns. prepare_data.py now removes the empty line.
3. Update the df_stats notebook to show the data in pandas data table format.

Co-authored-by: Sean Yang <[email protected]>
chesterxgchen and SYangster authored Jul 12, 2024
1 parent f7ca641 commit dd248ca
Showing 7 changed files with 95 additions and 205 deletions.
64 changes: 58 additions & 6 deletions examples/advanced/federated-statistics/df_stats.ipynb
@@ -13,12 +13,7 @@
"\n",
"## Set Up NVFLARE\n",
"\n",
-"Follow [Getting Started](https://nvflare.readthedocs.io/en/main/getting_started.html) to set up a virtual environment and install NVFLARE.\n",
-"\n",
-"You can also follow this [notebook](../../nvflare_setup.ipynb) to get set up.\n",
-"\n",
-"\n",
-"> make sure you have installed nvflare from the terminal first"
+"Follow [Getting Started](https://nvflare.readthedocs.io/en/main/getting_started.html) to set up a virtual environment and install NVFLARE.\n"
]
},
{
@@ -77,6 +72,63 @@
"prepare_data(data_root_dir = \"/tmp/nvflare/df_stats/data\")"
]
},
+{
+"cell_type": "markdown",
+"id": "c5444d8f-4938-4759-bd43-831013043c23",
+"metadata": {},
+"source": [
+"#### Let's take a look at the data"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "a1cf37d0-7555-4818-9963-ca7342161a4d",
+"metadata": {},
+"outputs": [],
+"source": [
+"import pandas as pd\n",
+"data_path =\"/tmp/nvflare/df_stats/data/site-1/data.csv\"\n",
+"data_features = [\n",
+" \"Age\",\n",
+" \"Workclass\",\n",
+" \"fnlwgt\",\n",
+" \"Education\",\n",
+" \"Education-Num\",\n",
+" \"Marital Status\",\n",
+" \"Occupation\",\n",
+" \"Relationship\",\n",
+" \"Race\",\n",
+" \"Sex\",\n",
+" \"Capital Gain\",\n",
+" \"Capital Loss\",\n",
+" \"Hours per week\",\n",
+" \"Country\",\n",
+" \"Target\",\n",
+" ]\n",
+"\n",
+" # the original dataset has no header,\n",
+" # we will use the adult.train dataset for site-1, the adult.test dataset for site-2\n",
+" # the adult.test dataset has incorrect formatted row at 1st line, we will skip it.\n",
+"skip_rows = {\n",
+" \"site-1\": [],\n",
+" \"site-2\": [0],\n",
+" }\n",
+"\n",
+"df= pd.read_csv(data_path, names=data_features, sep=r\"\\s*,\\s*\", skiprows=skip_rows, engine=\"python\", na_values=\"?\")\n",
+"df\n"
+]
+},
+{
+"cell_type": "markdown",
+"id": "81f6d572-7dc0-4cec-8382-f25555f52af9",
+"metadata": {
+"jp-MarkdownHeadingCollapsed": true
+},
+"source": [
+"> Note **We will only calculate the statistics of numerical features, categorical features will be skipped**"
+]
+},
{
"cell_type": "markdown",
"id": "f00de5e4-4360-4fc5-a819-4eb156e56341",
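The preview cell added in this hunk defines `skip_rows` as a per-site dictionary but passes the whole dictionary to `pd.read_csv(..., skiprows=skip_rows)`. pandas documents `skiprows` as list-like, int, or callable, so one option is to select the list for the site being read. A minimal sketch, assuming the site name is the parent directory of `data.csv` (as in `/tmp/nvflare/df_stats/data/site-1/data.csv`); the helper name `skiprows_for` is hypothetical and not part of the notebook:

```python
from pathlib import PurePosixPath

# Per-site rows to skip, copied from the notebook cell.
SKIP_ROWS = {"site-1": [], "site-2": [0]}


def skiprows_for(data_path: str) -> list:
    """Return the rows to skip for the site that owns data_path.

    Hypothetical helper: assumes the layout <data_root>/<site-name>/data.csv.
    """
    site = PurePosixPath(data_path).parent.name  # e.g. "site-1"
    return SKIP_ROWS[site]


rows = skiprows_for("/tmp/nvflare/df_stats/data/site-1/data.csv")
print(rows)
```

The returned list could then be passed as `skiprows=skiprows_for(data_path)` in the `pd.read_csv` call.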
6 changes: 3 additions & 3 deletions examples/advanced/federated-statistics/df_stats/README.md
@@ -50,12 +50,12 @@ With FL simulator, we can just run the example with CLI command

```
cd NVFlare/examples/advanced/federated-statistics
-nvflare simulator df_stats/jobs/df_stats -w /tmp/nvflare/df_stats -n 2 -t 2
+nvflare simulator df_stats/jobs/df_stats -w /tmp/nvflare/workspace/df_stats -n 2 -t 2
```

The results are stored in workspace "/tmp/nvflare"
```
-/tmp/nvflare/df_stats/simulate_job/statistics/adults_stats.json
+/tmp/nvflare/workspace/df_stats/simulate_job/statistics/adults_stats.json
```

## 3. Visualization
@@ -66,7 +66,7 @@ The results are stored in workspace "/tmp/nvflare"
assuming NVFLARE_HOME env variable point to the GitHub project location (NVFlare) which contains current example.

```bash
-cp /tmp/nvflare/df_stats/simulate_job/advanced/statistics/adults_stats.json $NVFLARE_HOME/examples/advanced/federated-statistics/df_stats/demo/.
+cp /tmp/nvflare/workspace/df_stats/simulate_job/advanced/statistics/adults_stats.json $NVFLARE_HOME/examples/advanced/federated-statistics/df_stats/demo/.

cd $NVFLARE_HOME/examples/advanced/federated-statistics/df_stats/demo

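With the workspace moved to `/tmp/nvflare/workspace/df_stats`, scripts that read the simulator output need the new prefix. The sketch below builds the result path from a workspace directory, following the `simulate_job/statistics/adults_stats.json` layout shown in the README's results section. Note that the `cp` command in the Visualization section uses a different path (`simulate_job/advanced/statistics/...`), so the layout here is an assumption to check against an actual run. The function name `stats_path` is hypothetical:

```python
from pathlib import PurePosixPath


def stats_path(workspace: str, filename: str = "adults_stats.json") -> PurePosixPath:
    """Build the expected statistics file path under a simulator workspace.

    Assumes the <workspace>/simulate_job/statistics/<filename> layout
    from the README's results section.
    """
    return PurePosixPath(workspace) / "simulate_job" / "statistics" / filename


print(stats_path("/tmp/nvflare/workspace/df_stats"))
```

Keeping the workspace as a single parameter means a later directory change only has to be made in one place.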
@@ -46,6 +46,10 @@ def __init__(self, data_path):
             "Country",
             "Target",
         ]
+
+        # the original dataset has no header,
+        # we will use the adult.train dataset for site-1, the adult.test dataset for site-2
+        # the adult.test dataset has incorrect formatted row at 1st line, we will skip it.
         self.skip_rows = {
             "site-1": [],
             "site-2": [0],
@@ -67,7 +67,10 @@ def prepare_data(data_root_dir: str):
             writer = csv.writer(f)
             r = requests.get(url, allow_redirects=True)
             for line in r.iter_lines():
-                writer.writerow(line.decode("utf-8").split(","))
+                if line:
+                    writer.writerow(line.decode("utf-8").split(","))
+                else:
+                    print("skip empty line\n")
     print("\ndone with prepare data")


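The effect of the new `if line:` guard can be reproduced without a network call. The sketch below is an illustration rather than the example's code: it replaces `requests.get(url).iter_lines()` with a hard-coded list of byte strings (an assumption made for the demo) and writes to an in-memory buffer. The helper name `write_rows` is hypothetical. Without the guard, `"".split(",")` yields `[""]`, which `csv.writer` emits as a row containing a single quoted empty field (`""`).

```python
import csv
import io


def write_rows(lines, out) -> int:
    """Write non-empty byte lines as CSV rows; return how many were skipped."""
    writer = csv.writer(out)
    skipped = 0
    for line in lines:
        if line:
            writer.writerow(line.decode("utf-8").split(","))
        else:
            skipped += 1  # an empty line would otherwise become a '""' row
    return skipped


# Stand-in for r.iter_lines(): two data rows followed by a trailing empty line.
raw_lines = [b"39, State-gov, 77516", b"50, Self-emp-not-inc, 83311", b""]
buffer = io.StringIO()
skipped = write_rows(raw_lines, buffer)
print(skipped)
print(buffer.getvalue())
```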
21 changes: 7 additions & 14 deletions examples/advanced/federated-statistics/image_stats.ipynb
@@ -11,11 +11,7 @@
"\n",
"## Setup NVFLARE\n",
"\n",
-"Follow [Getting Started](https://nvflare.readthedocs.io/en/main/getting_started.html) to set up a virtual environment and install NVFLARE.\n",
-"\n",
-"You can also follow this [notebook](../../nvflare_setup.ipynb) to get set up.\n",
-"\n",
-"> Make sure you have installed nvflare from **terminal** \n"
+"Follow [Getting Started](https://nvflare.readthedocs.io/en/main/getting_started.html) to set up a virtual environment and install NVFLARE.\n"
]
},
{
@@ -147,7 +143,7 @@
"outputs": [],
"source": [
"from nvflare.private.fed.app.simulator.simulator_runner import SimulatorRunner\n",
-"runner = SimulatorRunner(job_folder=\"image_stats/jobs/image_stats\", workspace=\"/tmp/nvflare/image_stats\", n_clients = 4, threads=4)\n",
+"runner = SimulatorRunner(job_folder=\"image_stats/jobs/image_stats\", workspace=\"/tmp/nvflare/workspace/image_stats\", n_clients = 4, threads=4)\n",
"runner.run()"
]
},
@@ -163,17 +159,14 @@
"From a **terminal** one can also run the following equivalent CLI\n",
"\n",
"```\n",
-"nvflare simulator image_stats/jobs/image_stats -w /tmp/nvflare/image_stats -n 4 -t 4\n",
+"nvflare simulator image_stats/jobs/image_stats -w /tmp/nvflare/workspace/image_stats -n 4 -t 4\n",
"\n",
"```\n",
"\n",
"assuming the nvflare is installed from a **terminal**. doing pip install from the notebook cell directory with bash command (! or %%bash) may or may not work depending on which python runtime kernel selected. Also %pip install or %pip install from notebook cell doesn't register the console_scripts in the PATH. \n",
"\n",
"\n",
-"## Examine the result\n",
-"\n",
-"\n",
-"\n"
+"## Examine the result\n"
]
},
{
@@ -194,7 +187,7 @@
},
"outputs": [],
"source": [
-"! ls -al /tmp/nvflare/image_stats/simulate_job/statistics/image_statistics.json"
+"! ls -al /tmp/nvflare/workspace/image_stats/simulate_job/statistics/image_statistics.json"
]
},
{
@@ -217,7 +210,7 @@
},
"outputs": [],
"source": [
-"! cp /tmp/nvflare/image_stats/simulate_job/statistics/image_statistics.json image_stats/demo/."
+"! cp /tmp/nvflare/workspace/image_stats/simulate_job/statistics/image_statistics.json image_stats/demo/."
]
},
{
@@ -266,7 +259,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.8.17"
+"version": "3.10.2"
}
},
"nbformat": 4,

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions nvflare/app_common/workflows/statistics_controller.py
@@ -309,6 +309,7 @@ def results_cb(self, client_task: ClientTask, fl_ctx: FLContext):

         result = client_task.result
         rc = result.get_return_code()
+        ds_features = None
         if rc == ReturnCode.OK:
             self.log_info(fl_ctx, f"Received result entries from client:{client_name}, " f"for task {task_name}")
             dxo = from_shareable(result)
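This hunk shows only the added `ds_features = None`; the code that later reads `ds_features` is outside the visible diff. A plausible reading is that the name was previously bound only inside the `rc == ReturnCode.OK` branch, so any later use on a failed result would raise `UnboundLocalError`. The sketch below illustrates that general Python pattern with hypothetical names (`handle_result`, string return codes, a plain dict payload). It is not the controller's actual logic:

```python
from typing import Optional


def handle_result(rc: str, payload: dict) -> Optional[dict]:
    """Return the features from a successful result, or None otherwise."""
    ds_features = None  # default, so the name is bound on every code path
    if rc == "OK":
        ds_features = payload.get("ds_features")
    # Without the default above, this check would raise UnboundLocalError
    # whenever rc != "OK", because ds_features would never have been assigned.
    if ds_features:
        return ds_features
    return None


print(handle_result("ERROR", {}))
print(handle_result("OK", {"ds_features": {"train": ["Age"]}}))
```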
