Add docs
RobbeSneyders committed Feb 15, 2024
1 parent 737da5b commit ee00432
Showing 1 changed file with 28 additions and 0 deletions.
28 changes: 28 additions & 0 deletions docs/components/components.md
@@ -1,3 +1,5 @@
from distributed import Client

# Components

Fondant makes it easy to build data preparation pipelines leveraging reusable components. Fondant
@@ -65,6 +67,32 @@ this data can be accessed using `dataframe["image"]`.
The `transform` method should return a single dataframe whose columns comply with the
schema defined by the `produces` section of the component specification.

### Configuring Dask

You can configure the Dask client based on the needs of your component by overriding the
`dask_client` method:

```python
import os

from dask.distributed import Client, LocalCluster
from fondant.component import PandasTransformComponent

class Component(PandasTransformComponent):

    def dask_client(self) -> Client:
        """Initialize the dask client to use for this component."""
        cluster = LocalCluster(
            processes=True,
            n_workers=os.cpu_count(),
            threads_per_worker=1,
        )
        return Client(cluster)
```

The default Dask client is configured to use processes, with as many workers as there are
logical CPUs available and one thread per worker.
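
As a rough illustration of adapting these defaults, the sketch below builds the keyword
arguments for `LocalCluster` while optionally capping the number of workers, which may help
when each worker needs a lot of memory. The helper `local_cluster_kwargs` and its
`max_workers` parameter are hypothetical names introduced for this example; they are not part
of Fondant or Dask.

```python
import os
from typing import Any, Dict, Optional


def local_cluster_kwargs(max_workers: Optional[int] = None) -> Dict[str, Any]:
    """Build keyword arguments for `dask.distributed.LocalCluster`.

    Hypothetical helper: it mirrors the defaults described above (processes,
    one worker per logical CPU, one thread per worker) and optionally caps
    the number of workers at `max_workers`.
    """
    # os.cpu_count() can return None when the count cannot be determined.
    cpu_count = os.cpu_count() or 1
    if max_workers is None:
        n_workers = cpu_count
    else:
        # Never go below one worker or above the number of logical CPUs.
        n_workers = max(1, min(cpu_count, max_workers))
    return {
        "processes": True,
        "n_workers": n_workers,
        "threads_per_worker": 1,
    }


# Inside an overridden `dask_client` method, this could be used as:
#     return Client(LocalCluster(**local_cluster_kwargs(max_workers=2)))
print(local_cluster_kwargs(max_workers=2))
```

Keeping the configuration in a plain dictionary makes it easy to inspect or log before the
cluster is created.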

## Component types

We can distinguish two different types of components: