Commit

Add architecture description and plot (#628)
PhilippeMoussalli authored Nov 15, 2023
1 parent f89f9c6 commit 092e967
Showing 4 changed files with 66 additions and 10 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -250,6 +250,8 @@ We welcome contributions of different kinds:
For a detailed view on the roadmap and day to day development, you can check our [github project
board](https://github.com/orgs/ml6team/projects/1).

You can also check out our [architecture](docs/architecture.md) page to familiarize yourself with the Fondant architecture and repository structure.

### Environment setup

We use [poetry](https://python-poetry.org/docs/) and [pre-commit](https://pre-commit.com/) to enable a smooth developer flow. Run the following commands to
60 changes: 56 additions & 4 deletions docs/architecture.md
@@ -1,10 +1,62 @@
# Architecture

An overview of the architecture of Fondant
### Fondant architecture overview

![Fondant architecture](art/architecture.png)

At a high level, Fondant consists of three main parts:

* The `/core` directory serves as the foundational backbone of Fondant, containing essential
  shared functionality:
    * `component_spec.py`: Defines the component spec class, which is used to define the
      specification of a component. These specifications mainly include the component image
      location, its arguments, and the columns it consumes and produces.
    * `manifest.py`: Describes the dataset content, facilitating reference passing between
      components. It evolves during pipeline execution and aids static evaluation.
    * `schema.py`: Defines the `Type` class, used to define dataset data types.
    * `/schema`: Directory containing JSON schema specifications for the component spec and
      manifest.
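
To make the relationship between the component spec and the manifest more concrete, here is a minimal Python sketch. It is not Fondant's actual `ComponentSpec` or `Manifest` API: the class names `ComponentSpecSketch` and `ManifestSketch`, their fields, and the storage-location layout are hypothetical. It assumes a spec declares typed `consumes` and `produces` columns, and shows how a manifest could be evolved through a sequence of components without running any of them, which is what makes static evaluation possible.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ComponentSpecSketch:
    """Hypothetical component spec: image, arguments and typed columns."""

    name: str
    image: str
    arguments: Dict[str, str] = field(default_factory=dict)  # argument name -> type name
    consumes: Dict[str, str] = field(default_factory=dict)   # column name -> data type
    produces: Dict[str, str] = field(default_factory=dict)   # column name -> data type


@dataclass
class ManifestSketch:
    """Hypothetical manifest: which columns the dataset has and where they live."""

    base_path: str
    run_id: str
    fields: Dict[str, str] = field(default_factory=dict)     # column name -> data type
    locations: Dict[str, str] = field(default_factory=dict)  # column name -> storage path

    def evolve(self, spec: ComponentSpecSketch) -> "ManifestSketch":
        """Return the manifest as it would look after `spec` has run.

        Nothing is executed: the check below only uses the declared columns,
        so a broken pipeline can be detected before any component runs.
        """
        missing = [column for column in spec.consumes if column not in self.fields]
        if missing:
            raise ValueError(f"{spec.name} consumes missing columns: {missing}")
        fields = dict(self.fields)
        locations = dict(self.locations)
        for column, dtype in spec.produces.items():
            fields[column] = dtype
            # Produced columns point at this component's output location.
            locations[column] = f"{self.base_path}/{self.run_id}/{spec.name}"
        return ManifestSketch(self.base_path, self.run_id, fields, locations)


load = ComponentSpecSketch(
    name="load_data", image="example/load:latest", produces={"text": "string"}
)
embed = ComponentSpecSketch(
    name="embed_text",
    image="example/embed:latest",
    arguments={"model": "str"},
    consumes={"text": "string"},
    produces={"embedding": "list<float32>"},
)

manifest = ManifestSketch(base_path="/tmp/fondant-demo", run_id="run-1")
for spec in (load, embed):
    manifest = manifest.evolve(spec)
print(sorted(manifest.fields))  # ['embedding', 'text']
```

Returning a new manifest from `evolve` instead of mutating it in place keeps each intermediate state available, so every component can be handed the exact manifest it should see.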

* The `/component` directory contains the modules for implementing Fondant components:
    * `component.py`: Defines the `Component` class, which is the base class for all Fondant
      components. It is used to define interfaces for different component types (Load,
      Transform, Write) across different data processing frameworks (Dask, Pandas, ...). Users
      inherit from these classes to implement their own components.
    * `data_io.py`: Defines the `DataIO` class, which handles reading from and writing to a
      dataset. This includes optimizing the read and write operations as well as selecting which
      columns to read/write according to the manifest.
    * `executor.py`: Defines the `Executor` class, which handles the execution of a component.
      This includes parsing the component arguments, executing the component, and
      evolving/writing the manifest. Each executor subclasses a corresponding `Component` class
      to implement the execution logic for a specific component type.
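
The split between `Component`, `DataIO` and `Executor` can be illustrated with a small, dependency-free Python sketch. All names here (`TransformComponentSketch`, `DataIOSketch`, `TransformExecutorSketch`) are hypothetical and are not Fondant's real classes; a plain dict of column lists stands in for a Dask or Pandas dataframe, and manifest evolution is left out. It only shows the division of responsibilities: the user implements `transform`, the data layer reads and writes just the declared columns, and the executor parses arguments and wires the two together.

```python
import abc
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class TransformComponentSketch(abc.ABC):
    """Hypothetical base class; users subclass it and implement `transform`."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs  # already-parsed component arguments

    @abc.abstractmethod
    def transform(self, rows: List[Row]) -> List[Row]:
        """Turn the consumed columns into the produced columns."""


class DataIOSketch:
    """Hypothetical data layer: reads and writes only the declared columns."""

    def __init__(self, storage: Dict[str, List[Any]]) -> None:
        self.storage = storage  # column name -> values; stands in for files on disk

    def read(self, columns: List[str]) -> List[Row]:
        n_rows = len(self.storage[columns[0]]) if columns else 0
        return [{c: self.storage[c][i] for c in columns} for i in range(n_rows)]

    def write(self, rows: List[Row], columns: List[str]) -> None:
        for c in columns:
            self.storage[c] = [row[c] for row in rows]


class TransformExecutorSketch:
    """Hypothetical executor: parse arguments, read, transform, write."""

    def __init__(
        self,
        component_cls: type,
        argument_types: Dict[str, Callable[[str], Any]],
        consumes: List[str],
        produces: List[str],
        data_io: DataIOSketch,
    ) -> None:
        self.component_cls = component_cls
        self.argument_types = argument_types
        self.consumes = consumes
        self.produces = produces
        self.data_io = data_io

    def execute(self, raw_arguments: Dict[str, str]) -> None:
        # Arguments arrive as strings (e.g. from a command line) and are cast here.
        parsed = {k: self.argument_types[k](v) for k, v in raw_arguments.items()}
        component = self.component_cls(**parsed)
        rows = self.data_io.read(self.consumes)
        self.data_io.write(component.transform(rows), self.produces)


class RepeatUppercase(TransformComponentSketch):
    """Example user component: upper-cases `text` and repeats it."""

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{"upper": row["text"].upper() * self.kwargs["repeat"]} for row in rows]


storage = {"text": ["a", "b"], "id": [1, 2]}
executor = TransformExecutorSketch(
    RepeatUppercase, {"repeat": int}, ["text"], ["upper"], DataIOSketch(storage)
)
executor.execute({"repeat": "2"})
print(storage["upper"])  # ['AA', 'BB']
```

Note that the `id` column is never read or rewritten: only the consumed and produced columns pass through the data layer.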


* The `/pipeline` directory contains the modules for implementing a Fondant pipeline:
    * `pipeline.py`: Defines the `Pipeline` class, which is used to define the pipeline graph and
      the pipeline run. The resulting pipeline is then consumed by the compiler, which compiles it
      for a specific pipeline runner. This module also implements the `ComponentOp` class, which
      defines a component operation in the pipeline graph.
    * `compiler.py`: Defines the `Compiler` class, which compiles the pipeline graph for a
      specific runner.
    * `runner.py`: Defines the `Runner` class, which executes the compiled pipeline graph.
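
The pipeline, compiler and runner roles can likewise be sketched in a few lines of Python. This is an illustration under assumptions, not Fondant's `Pipeline`, `Compiler` or `Runner` API: the `*Sketch` classes are hypothetical, the "compiled" output is just a dict, and the runner executes steps locally in order. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+) to order the graph.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Sequence


@dataclass
class ComponentOpSketch:
    """Hypothetical graph node: a component together with its arguments."""

    name: str
    arguments: Dict[str, str] = field(default_factory=dict)


@dataclass
class PipelineSketch:
    """Hypothetical pipeline: a graph of operations and their dependencies."""

    name: str
    ops: Dict[str, ComponentOpSketch] = field(default_factory=dict)
    dependencies: Dict[str, List[str]] = field(default_factory=dict)  # op -> upstream ops

    def add_op(
        self, op: ComponentOpSketch, dependencies: Sequence[ComponentOpSketch] = ()
    ) -> None:
        self.ops[op.name] = op
        self.dependencies[op.name] = [dep.name for dep in dependencies]


class LocalCompilerSketch:
    """Hypothetical compiler: turns the graph into a runner-specific spec (a dict here)."""

    def compile(self, pipeline: PipelineSketch) -> dict:
        # static_order() yields every op after all of its upstream ops.
        order = TopologicalSorter(pipeline.dependencies).static_order()
        return {
            "pipeline": pipeline.name,
            "steps": [
                {
                    "name": name,
                    "arguments": pipeline.ops[name].arguments,
                    "after": pipeline.dependencies[name],
                }
                for name in order
            ],
        }


class LocalRunnerSketch:
    """Hypothetical runner: executes the compiled steps one after another."""

    def __init__(self, execute_step: Callable[[dict], None]) -> None:
        self.execute_step = execute_step

    def run(self, compiled: dict) -> None:
        for step in compiled["steps"]:
            self.execute_step(step)


pipeline = PipelineSketch(name="demo")
load = ComponentOpSketch("load_data", {"dataset": "example"})
embed = ComponentOpSketch("embed_text")
write = ComponentOpSketch("write_data")
pipeline.add_op(load)
pipeline.add_op(embed, dependencies=[load])
pipeline.add_op(write, dependencies=[embed])

compiled = LocalCompilerSketch().compile(pipeline)
executed: List[str] = []
LocalRunnerSketch(lambda step: executed.append(step["name"])).run(compiled)
print(executed)  # ['load_data', 'embed_text', 'write_data']
```

Keeping compilation separate from execution means the same pipeline definition could be compiled into different runner-specific formats, while the runner only needs to understand the compiled output.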

### Additional modules

Additional modules in Fondant include:

* `cli.py`: Defines the CLI for interacting with Fondant. This includes the `fondant` command line
  tool, which is used to build components, compile and run pipelines, and explore datasets.
* `explore.py`: Runs the explorer, a web application that allows users to explore the content of
  a dataset.
* `build.py`: Defines the `build` command which is used to build and publish a component.
* `testing.py`: Contains common testing utilities for testing components and pipelines.
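
A command line tool like this is typically a single entry point that dispatches to subcommands. The sketch below uses `argparse` subparsers for the four actions named above (build, compile, run, explore); the program name, the positional arguments and the handlers are hypothetical and do not reflect the real `fondant` CLI's options. Each handler just returns a description instead of calling into the build, compiler, runner or explorer modules.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with one subcommand per action described above."""
    parser = argparse.ArgumentParser(prog="fondant-sketch")
    subparsers = parser.add_subparsers(dest="command", required=True)

    build = subparsers.add_parser("build", help="build and publish a component")
    build.add_argument("component_dir")
    build.set_defaults(handler=lambda args: f"build {args.component_dir}")

    compile_ = subparsers.add_parser("compile", help="compile a pipeline")
    compile_.add_argument("pipeline_ref")
    compile_.set_defaults(handler=lambda args: f"compile {args.pipeline_ref}")

    run = subparsers.add_parser("run", help="run a pipeline")
    run.add_argument("pipeline_ref")
    run.set_defaults(handler=lambda args: f"run {args.pipeline_ref}")

    explore = subparsers.add_parser("explore", help="explore a dataset")
    explore.add_argument("base_path")
    explore.set_defaults(handler=lambda args: f"explore {args.base_path}")

    return parser


def main(argv: Optional[List[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    # Each subparser registered its own handler via set_defaults.
    return args.handler(args)


print(main(["explore", "/tmp/base"]))  # explore /tmp/base
```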
Binary file added docs/art/architecture.png
14 changes: 8 additions & 6 deletions docs/caching.md
@@ -4,15 +4,17 @@ Fondant supports caching of pipeline executions. If a certain component and its
are exactly the same as in some previous execution, then its execution can be skipped and the output
dataset of the previous execution can be used instead.

Caching offers the following benefits:
1) **Reduced costs.** Skipping the execution of certain components can help avoid unnecessary costly computations.
2) **Faster pipeline runs.** Skipping the execution of certain components results in faster pipeline runs.
3) **Faster pipeline development.** Caching allows you to develop and test your pipeline faster.
4) **Reproducibility.** Caching allows you to reproduce the results of a pipeline run by reusing
the outputs of a previous pipeline run.

!!! note "IMPORTANT"

    The cached runs are tied to the base path, which stores the cache keys of previous component
    runs. Changing the base path will invalidate the cache of previously executed pipelines.

The caching feature is **enabled** by default.
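
The sketch below illustrates the caching idea under stated assumptions. It is not Fondant's implementation: `cache_key`, `run_component`, the SHA-256 hash and the `cache/<key>.txt` layout under the base path are all hypothetical. A component's spec and arguments are serialized deterministically and hashed; if an entry for that key already exists under the base path, execution is skipped and the previous output is reused. Because the lookup happens under the base path, a run pointed at a different base path finds no previous keys, which is why changing the base path invalidates earlier cache entries.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def cache_key(spec: Dict[str, Any], arguments: Dict[str, Any]) -> str:
    """Deterministically hash a component spec together with its arguments."""
    payload = json.dumps({"spec": spec, "arguments": arguments}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def _entry(base_path: Path, key: str) -> Path:
    # Hypothetical layout: one small file per cache key under the base path.
    return base_path / "cache" / f"{key}.txt"


def run_component(
    base_path: Path,
    spec: Dict[str, Any],
    arguments: Dict[str, Any],
    execute: Callable[[], str],
    cache: bool = True,  # enabled by default, as described above
) -> str:
    """Run a component, or reuse a previous output with the same cache key."""
    key = cache_key(spec, arguments)
    entry = _entry(base_path, key)
    if cache and entry.exists():
        return entry.read_text()  # skip execution, reuse the previous output
    output_manifest = execute()
    entry.parent.mkdir(parents=True, exist_ok=True)
    entry.write_text(output_manifest)
    return output_manifest


calls = []


def execute() -> str:
    calls.append(1)
    return f"manifest-{len(calls)}.json"


spec = {"name": "embed_text", "image": "example/embed:latest"}
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "base_a"
    first = run_component(base, spec, {"model": "small"}, execute)   # executes
    second = run_component(base, spec, {"model": "small"}, execute)  # cache hit
    run_component(base, spec, {"model": "large"}, execute)  # new arguments: executes
    run_component(Path(tmp) / "base_b", spec, {"model": "small"}, execute)  # new base path: executes
print(first, second, len(calls))  # manifest-1.json manifest-1.json 3
```

Sorting the JSON keys makes the hash independent of dictionary insertion order, so two logically identical configurations always map to the same cache entry.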
