Docs/update (#581)

scaleoutsystems · Apr 18, 2024 · d59d6d8
1 parent 2c06fc8

Showing 24 changed files with 381 additions and 383 deletions.
diff --git a/.ci/tests/examples/print_logs.sh b/.ci/tests/examples/print_logs.sh
@@ -12,7 +12,8 @@ echo "Combiner logs"
 docker logs "$(basename $PWD)-combiner-1"
 
 echo "Client 1 logs"
-docker logs "$(basename $PWD)-client-1"
-
-echo "Client 2 logs"
-docker logs "$(basename $PWD)-client-2"
+if [ "$example" == "mnist-keras" ]; then
+    docker logs "$(basename $PWD)-client-1"
+else
+    docker logs "$(basename $PWD)-client1-1"
+fi
diff --git a/.ci/tests/examples/run.sh b/.ci/tests/examples/run.sh
@@ -19,10 +19,17 @@ pushd "examples/$example"
 "../../.$example/bin/fedn" package create --path client
 "../../.$example/bin/fedn" run build --path client
 
-docker compose \
-    -f ../../docker-compose.yaml \
-    -f docker-compose.override.yaml \
-    up -d --build --scale client=1
+if [ "$example" == "mnist-keras" ]; then
+    docker compose \
+        -f ../../docker-compose.yaml \
+        -f docker-compose.override.yaml \
+        up -d --build --scale client=1
+else
+    docker compose \
+        -f ../../docker-compose.yaml \
+        -f docker-compose.override.yaml \
+        up -d --build combiner api-server mongo minio client1
+fi
 
 >&2 echo "Wait for reducer to start"
 python ../../.ci/tests/examples/wait_for.py reducer

diff --git a/.github/workflows/build-containers.yaml b/.github/workflows/build-containers.yaml
@@ -35,40 +35,14 @@ jobs:
             type=semver,pattern={{version}}
             type=semver,pattern={{major}}.{{minor}}
             type=sha
-      
-      - name: Docker meta mnist-keras
-        id: meta2
-        uses: docker/metadata-action@v4
-        with:
-          images: |
-            docker.pkg.github.com/${{ github.repository }}/fedn
-          tags: |
-            type=ref,event=branch,suffix=-mnist-keras
-            type=ref,event=pr,suffix=-mnist-keras
-            type=semver,pattern={{version}},suffix=-mnist-keras
-            type=semver,pattern={{major}}.{{minor}},suffix=-mnist-keras
-            type=sha,suffix=-mnist-keras
-      
-      - name: Docker meta mnist-pytorch
-        id: meta3
-        uses: docker/metadata-action@v4
-        with:
-          images: |
-            docker.pkg.github.com/${{ github.repository }}/fedn
-          tags: |
-            type=ref,event=branch,suffix=-mnist-pytorch
-            type=ref,event=pr,suffix=-mnist-pytorch
-            type=semver,pattern={{version}},suffix=-mnist-pytorch
-            type=semver,pattern={{major}}.{{minor}},suffix=-mnist-pytorch
-            type=sha,suffix=-mnist-pytorch
-
 
       - name: Log in to GitHub Container Registry
         uses: docker/login-action@v2
         with:
           registry: docker.pkg.github.com
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}
+
 
       - name: Build and push
         uses: docker/build-push-action@v4
@@ -77,21 +51,3 @@ jobs:
           tags: ${{ steps.meta1.outputs.tags }}
           labels: ${{ steps.meta1.outputs.labels }}
           file: Dockerfile
-
-      - name: Build and push (mnist-keras)
-        uses: docker/build-push-action@v4
-        with:
-          push: "${{ github.event_name != 'pull_request' }}"
-          tags: ${{ steps.meta2.outputs.tags }}
-          labels: ${{ steps.meta2.outputs.labels }}
-          file: Dockerfile
-          build-args: |
-            REQUIREMENTS=examples/mnist-keras/requirements.txt
-
-      - name: Build and push (mnist-pytorch)
-        uses: docker/build-push-action@v4
-        with:
-          push: "${{ github.event_name != 'pull_request' }}"
-          tags: ${{ steps.meta3.outputs.tags }}
-          labels: ${{ steps.meta3.outputs.labels }}
-          file: Dockerfile
diff --git a/.github/workflows/code-checks.yaml b/.github/workflows/code-checks.yaml
@@ -45,6 +45,7 @@ jobs:
           --exclude-dir='docs'
           --exclude-dir='flower-client'
           --exclude='tests.py'
+          --exclude='README.rst'
           '^[ \t]+(import|from) ' -I . 
 
       # TODO: add linting/formatting for all file types
diff --git a/.github/workflows/integration-tests.yaml b/.github/workflows/integration-tests.yaml
@@ -17,7 +17,7 @@ jobs:
         to_test:
           - "mnist-keras numpyhelper"
           - "mnist-pytorch numpyhelper"
-        python_version: ["3.9","3.10", "3.11"]
+        python_version: ["3.8","3.9","3.10", "3.11"]
         os:
           - ubuntu-22.04
     runs-on: ${{ matrix.os }}

diff --git a/README.rst b/README.rst
@@ -57,7 +57,7 @@ Getting started
 
 The best way to get started is to take the quickstart tutorial: 
 
-- `Quickstart <https://fedn.readthedocs.io/en/latest/quickstart.html>`__
+- `Quickstart <https://fedn.readthedocs.io/en/stable/quickstart.html>`__
 
 Documentation
 =============
@@ -72,8 +72,8 @@ Running your project in FEDn Studio (SaaS or on-premise)
 
 The FEDn Studio SaaS is free for development, testing and research (one project per user, backend compute resources sized for dev/test):   
 
-- `Register for a free account in FEDn Studio <https://studio.scaleoutsystems.com/signup/>`__
-- `Take the tutorial to deploy your project on FEDn Studio <https://guide.scaleoutsystems.com/#/docs>`__  
+- `Register for a free account in FEDn Studio <https://fedn.scaleoutsystems.com/signup/>`__
+- `Take the tutorial to deploy your project on FEDn Studio <https://fedn.readthedocs.io/en/stable/studio.html>`__  
 
 Scaleout can also support users to scale up experiments and demonstrators on Studio, by granting custom resource quotas. Additonally, charts are available for self-managed deployment on-premise or in your cloud VPC (all major cloud providers). Contact the Scaleout team for more information.
 
@@ -91,7 +91,7 @@ Making contributions
 
 All pull requests will be considered and are much appreciated. For
 more details please refer to our `contribution
-guidelines <https://github.com/scaleoutsystems/fedn/blob/develop/CONTRIBUTING.md>`__.
+guidelines <https://github.com/scaleoutsystems/fedn/blob/master/CONTRIBUTING.md>`__.
 
 Citation
 ========

diff --git a/docker-compose.yaml b/docker-compose.yaml
@@ -1,5 +1,5 @@
 # Compose schema version
-version: '3.3'
+version: '3.4'
 
 # Setup network
 networks:

diff --git a/docs/_static/css/text.css b/docs/_static/css/text.css
@@ -34,6 +34,7 @@ body {
 
 a {
     color: var(--scaleout-black);
+    font-weight: bold;
     text-decoration: none;
     display: inline-block;
 }

diff --git a/docs/conf.py b/docs/conf.py
@@ -23,7 +23,8 @@
     'sphinx.ext.mathjax',
     'sphinx.ext.ifconfig',
     'sphinx.ext.viewcode',
-    'sphinx_rtd_theme'
+    'sphinx_rtd_theme',
+    'sphinx_code_tabs'
 ]
 
 # The master toctree document.

diff --git a/docs/distributed.rst b/docs/distributed.rst
@@ -1,13 +1,13 @@
-Distributed deployment
-======================
+Self-managed distributed deployment
+===================================
 
 This tutorial outlines the steps for deploying the FEDn framework over a **local network**, using a single workstation or laptop as 
-the host, and different devices as clients. For general steps on how to run FEDn, see one of the quickstart tutorials. 
+the host for the server-side components, and other hosts or devices as clients. For general steps on how to run FEDn, see the quickstart tutorials.
 
 
 .. note::
    For a secure and production-grade deployment solution over **public networks**, explore the FEDn Studio service at 
-   **studio.scaleoutsystems.com**. 
+   **fedn.scaleoutsystems.com**. 
 
    Alternatively follow this tutorial substituting the hosts local IP with your public IP, open the neccesary 
    ports (see which ports are used in docker-compose.yaml), and ensure you have taken additional neccesary security 

diff --git a/docs/faq.rst b/docs/faq.rst
@@ -19,17 +19,6 @@ However, during development of a new model it will be necessary to reinitialize.
 
    2. Restart the clients. 
 
-Q: Can I skip fetching the remote package and instead use a local folder when developing the compute package
-------------------------------------------------------------------------------------------------------------
-
-Yes, to facilitate interactive development of the compute package you can start a client that uses a local folder 'client' in your current working directory by: 
-
-.. code-block:: bash
-
-    fedn run client --remote=False -in client.yaml 
-
-
-Note that in production federations this options should in most cases be disallowed. 
 
 Q: How can other aggregation algorithms can be defined?
 -------------------------------------------------------
@@ -39,10 +28,10 @@ There is a plugin interface for extending the framework with new aggregators. Se
 :ref:`agg-label`
 
 
-Q: What is needed to include other ML frameworks in FEDn like sklearn, xgboost, etc.?
+Q: What is needed to include additional ML frameworks in FEDn?
 -------------------------------------------------------------------------------------
 
-You need to make sure that FEDn knows how to serialize and deseralize the model object into paramters. If you can 
+You need to make sure that FEDn knows how to serialize and deserialize the model object. If you can 
 serialize to a list of numpy ndarrays in your compute package entrypoint (see the Quickstart Tutorial code), you 
 can use the built in "numpyhelper". If this is not possible, you can extend the framework with a custom helper, 
 see the section about model marshaling: 
@@ -62,27 +51,27 @@ Yes! You can toggle which message streams a client subscibes to when starting th
 Q: How do you approach the question of output privacy? 
 ----------------------------------------------------------------------------------
 
-We take security in (federated) machine learning very seriously. Federated learning is a foundational technology that impoves input privacy 
+We take security in (federated) machine learning seriously. Federated learning is a foundational technology that improves input privacy 
 in machine learning by allowing datasets to stay local and private, and not copied to a server. FEDn is designed to provide an industry grade
 implementation of the core communication and aggregration layers of federated learning, as well as configurable modules for traceability, logging
 etc, to allow the developer balance between privacy and auditability. With `FEDn Studio <https://scaleoutsystems.com/framework>`__ we add 
 functionality for user authentication, authorization, and federated client identity management. As such, The FEDn Framework provides
 a comprehensive software suite for implemeting secure federated learning following industry best-practices.     
 
-Going beyond input privacy, there are several additional considerations relating to output privacy and potential attacks on (federated) machine learning systems. For an
-introduction to the topic, see this blog post: 
+Going beyond input privacy, there are several additional considerations relating to output privacy and potential attacks on (federated) machine learning systems. 
+For an introduction to the topic, see this blog post: 
 
 - `Output Privacy and Federated Machine Learning <https://www.scaleoutsystems.com/post/output-privacy-and-federated-machine-learning>`__
 
-Striking the appropriate balance between system complexity and secturity becomes a use-case dependent endeavor, and we are happy to 
-engage in detailed conversations about this. As an example, one might consider layering differential privacy on top of the aggregation 
-to protect against a honest-but-curious server, at the price of a loss of accuracy for the global model. Depending on the privacy requirements, 
+Striking the appropriate balance between system complexity and security becomes a use-case dependent endeavor, and we are happy to 
+support projects with guidance on these matters. For example, one might consider layering differential privacy on top of the aggregation 
+to protect against an honest-but-curious server, at the price of reduced accuracy for the global model. Depending on the privacy requirements, 
 the model type, the amount of data, the number of local updates possible during training etc, this may or may not be necessary. 
 
 We are engaged in several cybersecurity projects focused on federated machine learning, do not hesitate to reach out to discuss further
 with the Scaleout team.  
 
 - `LEAKPRO: Leakage Profiling and Risk Oversight for Machine Learning Models <https://www.vinnova.se/en/p/leakpro-leakage-profiling-and-risk-oversight-for-machine-learning-models/>`__
 - `Validating a System Development Kit for edge federated learning <https://www.vinnova.se/en/p/validating-a-system-development-kit-for-edge-federated-learning/>`__
-- `Truseted Execution Environments for Federated Learning: <https://www.vinnova.se/en/p/trusted-execution-environments-for-federated-learning/>`__
+- `Trusted Execution Environments for Federated Learning <https://www.vinnova.se/en/p/trusted-execution-environments-for-federated-learning/>`__
 - `Robust IoT Security: Intrusion Detection Leveraging Contributions from Multiple Systems <https://www.vinnova.se/en/p/robust-iot-security-intrusion-detection-leveraging-contributions-from-multiple-systems/>`__
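The updated FAQ answer says that if the compute package entrypoint can serialize the model to a list of numpy ndarrays, the built-in "numpyhelper" can be used. The following is a minimal, hypothetical sketch of that round trip, not FEDn's API: `get_parameters` and `set_parameters` are illustrative names, the toy "model" is a mapping of layer names to flat parameter lists, and plain Python lists stand in for ndarrays so the example runs without dependencies.

```python
from collections import OrderedDict
from typing import Dict, List

# Hypothetical toy model: layer name -> flat list of parameters.
# A real compute package would wrap a Keras or PyTorch model instead.
Model = Dict[str, List[float]]


def get_parameters(model: Model) -> List[List[float]]:
    """Serialize: return one array per layer, in a fixed layer order."""
    # A deterministic order is what lets the receiver map arrays back to layers.
    return [list(values) for values in model.values()]


def set_parameters(model: Model, parameters: List[List[float]]) -> Model:
    """Deserialize: write arrays back into a model with the same layout."""
    if len(parameters) != len(model):
        raise ValueError("parameter count does not match model layout")
    updated: Model = OrderedDict()
    for (name, old), new in zip(model.items(), parameters):
        if len(old) != len(new):
            raise ValueError(f"shape mismatch for layer {name!r}")
        updated[name] = list(new)
    return updated


# Usage: serialize, (transfer), then load into a model of the same shape.
model: Model = OrderedDict([("dense.weight", [0.1, 0.2]), ("dense.bias", [0.0])])
params = get_parameters(model)  # [[0.1, 0.2], [0.0]]
restored = set_parameters(model, params)
```

The shape checks reflect the key requirement of this approach: sender and receiver must agree on the order and size of the arrays, since only the arrays, not the layer names, are exchanged.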
diff --git a/docs/helpers.rst b/docs/helpers.rst
@@ -1,7 +1,7 @@
 .. _helper-label:
 
-Model Serialization/Deserialization - Helpers
-=============================================
+Model Serialization/Deserialization
+===================================
 
 In federated learning, model updates need to be serialized and deserialized in order to be 
 transferred between clients and server/combiner. There is also a need to write and load models 

diff --git a/docs/introduction.rst b/docs/introduction.rst
@@ -7,53 +7,51 @@ Federated Learning allows for collaborative model training while keeping data lo
 scenarios where data cannot be easily shared due to privacy regulations, network limitations, or ownership concerns.
 
 At its core, Federated Learning orchestrates model training across distributed devices or servers, referred to as clients or participants. 
-These participants could be diverse endpoints such as mobile devices, IoT gadgets, or remote servers. Rather than transmitting raw data to a central location, 
+These participants could be diverse endpoints such as mobile devices, IoT gateways, or remote servers. Rather than transmitting raw data to a central location, 
 each participant computes gradients locally based on its data. These gradients are then communicated to a server, often called the aggregator. 
 The server aggregates and combines the gradients from multiple participants to update a global model. 
 This iterative process allows the global model to improve without the need to share the raw data.
 
-**FEDn: the SDK for scalable federated learning**
+FEDn empowers users to create federated learning applications that seamlessly transition from local proofs-of-concept to secure distributed deployments. 
+We develop the FEDn framework following these core design principles:
 
-FEDn serves as a System Development Kit (SDK) enabling scalable federated learning. 
-It is used to implement the core server side logic (including model aggregation) and the client side integrations. 
-Developers and ML engineers can use FEDn to build custom federated learning systems and bespoke deployments.
+-  **Seamless transition from proofs-of-concept to real-world FL**. FEDn has been designed to make the journey from R&D to real-world deployments as smooth as possible. Develop your federated learning use case in a pseudo-local environment, then deploy it to FEDn Studio (cloud or on-premise) for real-world scenarios. No code changes are required to go from development and testing to production.
 
+-  **Designed for scalability and resilience.** FEDn enables model aggregation through multiple aggregation servers sharing the workload. A hierarchical architecture makes the framework well suited for both cross-silo and cross-device use cases. FEDn seamlessly recovers from failures in all critical components and manages intermittent client connections, ensuring robust deployment in production environments.
 
-One of the standout features of FEDn is its ability to deploy and scale the server-side in geographically distributed setups,
-adapting to varying project needs and geographical considerations.
+-  **Secure by design.** FL clients do not need to open any ingress ports, facilitating distributed deployments across a wide variety of settings. Additionally, FEDn utilizes secure, industry-standard communication protocols and supports token-based authentication and RBAC for FL clients (JWT), providing flexible integration in production environments.   
 
+-  **Developer and data scientist friendly.** Extensive event logging and distributed tracing enable developers to monitor experiments in real time, simplifying troubleshooting and auditing. Machine learning metrics can be accessed via a Python API and visualized in an intuitive UI that helps data scientists analyze and communicate ML model training progress.
 
-**Scalable and Resilient**
 
-FEDn exhibits scalability and resilience, thanks to its tiered architecture. Multiple aggregation servers, in FEDn called combiners, 
-form a network to divide the workload of coordinating clients and aggregating models. 
-This architecture allows for high performance in various settings, from thousands of clients in a cross-device environment to 
-large model updates in a cross-silo scenario. Importantly, FEDn has built-in recovery capabilities for all critical components, enhancing system reliability.
+Features
+=========
 
-**ML-Framework Agnostic**
+Federated machine learning: 
 
-With FEDn, model updates are treated as black-box computations, meaning it can support any ML model type or framework. 
-This flexibility allows for out-of-the-box support for popular frameworks like Keras and PyTorch, making it a versatile tool for any machine learning project.
+- Support for any ML framework (e.g. PyTorch, TensorFlow/Keras and scikit-learn)
+- Extendable via a plug-in architecture (aggregators, load balancers, object storage backends, databases, etc.)
+- Built-in federated algorithms (FedAvg, FedAdam, FedYogi, FedAdaGrad, etc.)
+- CLI and Python API client for running FEDn networks and coordinating experiments. 
+- Implement clients in any language (Python, C++, Kotlin etc.)
+- No open ports needed client-side.
 
-**Security**
 
-A key security feature of FEDn is its client protection capabilities - clients do not need to expose any ingress ports, 
-thus reducing potential security vulnerabilities.
+FEDn Studio - From development to FL in production: 
 
-**Event Tracking and Training progress**
+-  Leverage Scaleout's free managed service for development and testing in real-world scenarios (SaaS).      
+-  Token-based authentication (JWT) and role-based access control (RBAC) for FL clients.  
+-  REST API and UI. 
+-  Data science dashboard for orchestrating experiments and visualizing results.
+-  Admin dashboard for managing the FEDn network and users/clients.
+-  View extensive logging and tracing information. 
+-  Collaborate with other data scientists on the project specification in a shared workspace.
+-  Cloud or on-premise deployment (cloud-native design, deploy to any Kubernetes cluster)
 
-To ensure transparency and control over the training process, as well as to provide means to troubleshoot distributed deployments, 
-FEDn logs events and does real-time tracking of training progress. A flexible API lets the user define validation strategies locally on clients. 
-Data is logged as JSON to MongoDB, enabling users to create custom dashboards and visualizations easily.
+Support
+=========
 
-**REST-API and Python API Client and CLI**
+Community support is available in our `Discord
+server <https://discord.gg/KMg4VwszAd>`__.
 
-FEDn comes with an REST-API, a CLI and a Python API Client for programmatic interaction with a FEDn network. This allows for flexible automation of experiments, for integration with 
-other systems, and for easy integration with external dashboards and visualization tools.
-
-FEDn Studio
------------
-
-FEDn Studio is a web-based tool for managing and monitoring federated learning experiments. It provides the FEDn network as a managed service, as well as a user-friendly interface for monitoring the progress of training and visualizing the results. FEDn Studio is available as a SaaS at fedn.scaleoutsystems.com . It is free for development, testing and research (one project per user, backend compute resources sized for dev/test).
-
-Scaleout can also support users to scale up experiments and demonstrators on Studio, by granting custom resource quotas. Additonally, charts are available for self-managed deployment on-premise or in your cloud VPC (all major cloud providers). Contact the Scaleout team for more information.
+Options are available for `Enterprise support <https://www.scaleoutsystems.com/start#pricing>`__.
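The rewritten introduction describes participants computing updates locally and a server combining them into a global model, and lists FedAvg among the built-in algorithms. As a hedged illustration only (this is not FEDn's aggregator code, and its plug-in interface is not shown in this diff), the sketch below computes a sample-count-weighted average of client parameters, which is the core of FedAvg. It averages model parameters rather than raw gradients, a common formulation. The function name `fedavg`, the `(parameters, num_examples)` input format, and the use of plain lists instead of arrays are all assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: one flat list of floats per layer.
Params = List[List[float]]


def fedavg(updates: Sequence[Tuple[Params, int]]) -> Params:
    """Average client parameters, weighting each client by its example count."""
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    first, _ = updates[0]
    layout = [len(layer) for layer in first]
    # Start from zeros with the same layer layout as the first client.
    global_params = [[0.0] * size for size in layout]

    for params, n in updates:
        if [len(layer) for layer in params] != layout:
            raise ValueError("all clients must share the same parameter layout")
        weight = n / total
        for g_layer, c_layer in zip(global_params, params):
            for i, value in enumerate(c_layer):
                g_layer[i] += weight * value
    return global_params


# Usage: client B trained on 3x as much data, so it gets weight 0.75.
client_a = ([[1.0, 2.0]], 100)
client_b = ([[3.0, 4.0]], 300)
global_model = fedavg([client_a, client_b])  # [[2.5, 3.5]]
```

Weighting by example count lets clients with more data pull the global model further. Adaptive variants such as FedAdam or FedYogi instead treat the averaged client update as a pseudo-gradient for a server-side optimizer.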