Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/SK-658 | Add gRPC healthchecking #515

Merged
merged 10 commits into from
Feb 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@
ARG BASE_IMG=python:3.10-slim
FROM $BASE_IMG

ARG GRPC_HEALTH_PROBE_VERSION=""

# Requirements (use MNIST Keras as default)
ARG REQUIREMENTS=""

Expand All @@ -15,6 +17,17 @@ COPY $REQUIREMENTS /app/config/requirements.txt
# Install developer tools (needed for psutil)
RUN apt-get update && apt-get install -y python3-dev gcc

# Install grpc health probe checker
RUN if [ ! -z "$GRPC_HEALTH_PROBE_VERSION" ]; then \
apt-get install -y wget && \
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64 && \
chmod +x /bin/grpc_health_probe && \
apt-get remove -y wget && apt autoremove -y; \
else \
echo "No grpc_health_probe version specified, skipping installation"; \
fi


# Create FEDn app directory
SHELL ["/bin/bash", "-c"]
RUN mkdir -p /app \
Expand Down
15 changes: 13 additions & 2 deletions docker-compose.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,7 @@ services:
context: .
args:
BASE_IMG: ${BASE_IMG:-python:3.10-slim}
GRPC_HEALTH_PROBE_VERSION: v0.4.24
working_dir: /app
volumes:
- ${HOST_REPO_DIR:-.}/fedn:/app/fedn
Expand All @@ -99,6 +100,16 @@ services:
- "/venv/bin/pip install --no-cache-dir -e /app/fedn && /venv/bin/fedn run combiner --init config/settings-combiner.yaml"
ports:
- 12080:12080
healthcheck:
test:
[
"CMD",
"/bin/grpc_health_probe",
"-addr=localhost:12080"
]
interval: 2s
timeout: 10s
retries: 5
depends_on:
- api-server

Expand All @@ -119,5 +130,5 @@ services:
deploy:
replicas: 0
depends_on:
- api-server
- combiner
combiner:
condition: service_healthy
5 changes: 5 additions & 0 deletions docs/quickstart.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,11 @@
Quickstart Tutorial PyTorch (MNIST)
===================================

.. note::
This tutorial is a quickstart guide to FEDn where the federated learning project has been defined
using an already implemented ``compute package``. This turorial is a good starting point for developers, however,
head over to :ref:`tutorial-label` when you are ready to define your own federated learning projects.

This classic example of hand-written text recognition is well suited as a lightweight test when developing on FEDn in pseudo-distributed mode.
A normal high-end laptop or a workstation should be able to sustain a few clients.
The example automates the partitioning of data and deployment of a variable number of clients on a single host.
Expand Down
10 changes: 5 additions & 5 deletions examples/mnist-pytorch/API_Example.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "743dfe47",
"metadata": {},
"outputs": [],
Expand All @@ -38,7 +38,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"id": "1061722d",
"metadata": {},
"outputs": [],
Expand All @@ -58,7 +58,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"id": "5107f6f9",
"metadata": {},
"outputs": [],
Expand All @@ -78,7 +78,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"id": "f0380d35",
"metadata": {},
"outputs": [],
Expand All @@ -90,7 +90,7 @@
" \"session_id\": session_id,\n",
" \"aggregator\": \"fedavg\",\n",
" \"model_id\": seed_model['model_id'],\n",
" \"rounds\": 10\n",
" \"rounds\": 2\n",
" }\n",
"\n",
"result_fedavg = client.start_session(**session_config_fedavg)"
Expand Down
2 changes: 1 addition & 1 deletion fedn/fedn/network/combiner/combiner.py
Original file line number Diff line number Diff line change
Expand Up @@ -398,7 +398,7 @@ def _flush_model_update_queue(self):

#####################################################################################################################

# Control Service
# Controller Service

def Start(self, control: fedn.ControlRequest, context):
""" Start a round of federated learning"
Expand Down
4 changes: 4 additions & 0 deletions fedn/fedn/network/grpc/server.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
from concurrent import futures

import grpc
from grpc_health.v1 import health, health_pb2_grpc

import fedn.network.grpc.fedn_pb2_grpc as rpc
from fedn.common.log_config import (logger, set_log_level_from_string,
Expand All @@ -17,6 +18,7 @@ def __init__(self, servicer, modelservicer, config):

self.server = grpc.server(futures.ThreadPoolExecutor(max_workers=350))
self.certificate = None
self.health_servicer = health.HealthServicer()

if isinstance(servicer, rpc.CombinerServicer):
rpc.add_CombinerServicer_to_server(servicer, self.server)
Expand All @@ -29,6 +31,8 @@ def __init__(self, servicer, modelservicer, config):
if isinstance(servicer, rpc.CombinerServicer):
rpc.add_ControlServicer_to_server(servicer, self.server)

health_pb2_grpc.add_HealthServicer_to_server(self.health_servicer, self.server)

if config['secure']:
logger.info(f'Creating secure gRPCS server using certificate: {config["certificate"]}')
server_credentials = grpc.ssl_server_credentials(
Expand Down
8 changes: 4 additions & 4 deletions fedn/setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,13 @@

setup(
name='fedn',
version='0.7.2',
version='0.8.0',
description="""Scaleout Federated Learning""",
author='Scaleout Systems AB',
author_email='[email protected]',
url='https://www.scaleoutsystems.com',
py_modules=['fedn'],
python_requires='>=3.7,<3.11',
python_requires='>=3.8,<3.11',
install_requires=[
"PyYAML>=5.4",
"requests",
Expand All @@ -31,7 +31,8 @@
"plotly",
"pandas",
"bokeh<3.0.0",
"networkx"
"networkx",
"grpcio-health-checking"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this must be pinned, since grpcio is pinned at v1.57

],
license='Apache 2.0',
zip_safe=False,
Expand All @@ -42,7 +43,6 @@
packages=find_packages(exclude=["tests", "tests.*"]),
classifiers=[
'Natural Language :: English',
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: 3.9',
'Programming Language :: Python :: 3.10',
Expand Down
Loading