Add component hub doc page (#487)
This PR adds a script and template to automatically add a component hub
page to our docs.
RobbeSneyders authored Oct 5, 2023
1 parent aa41774 commit 52cdbb2
Showing 28 changed files with 274 additions and 44 deletions.
6 changes: 4 additions & 2 deletions components/caption_images/README.md
@@ -6,12 +6,14 @@ This component captions images using a BLIP model from the Hugging Face hub
### Inputs / outputs

**This component consumes:**

- images
- data: binary
- data: binary

**This component produces:**

- captions
- text: string
- text: string

### Arguments

10 changes: 6 additions & 4 deletions components/download_images/README.md
@@ -13,14 +13,16 @@ from the img2dataset library.
### Inputs / outputs

**This component consumes:**

- images
- url: string
- url: string

**This component produces:**

- images
- data: binary
- width: int32
- height: int32
- data: binary
- width: int32
- height: int32

### Arguments

6 changes: 4 additions & 2 deletions components/embed_images/README.md
@@ -6,12 +6,14 @@ Component that generates CLIP embeddings from images
### Inputs / outputs

**This component consumes:**

- images
- data: binary
- data: binary

**This component produces:**

- embeddings
- data: list<item: float>
- data: list<item: float>

### Arguments

6 changes: 4 additions & 2 deletions components/embedding_based_laion_retrieval/README.md
@@ -8,12 +8,14 @@ used to find images similar to the embedded images / captions.
### Inputs / outputs

**This component consumes:**

- embeddings
- data: list<item: float>
- data: list<item: float>

**This component produces:**

- images
- url: string
- url: string

### Arguments

3 changes: 2 additions & 1 deletion components/filter_comments/README.md
@@ -6,8 +6,9 @@ Component that filters code based on the code to comment ratio
### Inputs / outputs

**This component consumes:**

- code
- content: string
- content: string

**This component produces no data.**

5 changes: 3 additions & 2 deletions components/filter_image_resolution/README.md
@@ -6,9 +6,10 @@ Component that filters images based on minimum size and max aspect ratio
### Inputs / outputs

**This component consumes:**

- images
- width: int32
- height: int32
- width: int32
- height: int32

**This component produces no data.**

7 changes: 4 additions & 3 deletions components/filter_line_length/README.md
@@ -6,10 +6,11 @@ Component that filters code based on line length
### Inputs / outputs

**This component consumes:**

- code
- avg_line_length: double
- max_line_length: int32
- alphanum_fraction: double
- avg_line_length: double
- max_line_length: int32
- alphanum_fraction: double

**This component produces no data.**

10 changes: 6 additions & 4 deletions components/image_cropping/README.md
@@ -21,14 +21,16 @@ right side is border-cropped image.
### Inputs / outputs

**This component consumes:**

- images
- data: binary
- data: binary

**This component produces:**

- images
- data: binary
- width: int32
- height: int32
- data: binary
- width: int32
- height: int32

### Arguments

10 changes: 6 additions & 4 deletions components/image_resolution_extraction/README.md
@@ -6,14 +6,16 @@ Component that extracts image resolution data from the images
### Inputs / outputs

**This component consumes:**

- images
- data: binary
- data: binary

**This component produces:**

- images
- data: binary
- width: int32
- height: int32
- data: binary
- width: int32
- height: int32

### Arguments

3 changes: 2 additions & 1 deletion components/language_filter/README.md
@@ -6,8 +6,9 @@ A component that filters text based on the provided language.
### Inputs / outputs

**This component consumes:**

- text
- data: string
- data: string

**This component produces no data.**

5 changes: 3 additions & 2 deletions components/load_from_files/README.md
@@ -10,9 +10,10 @@ location. It supports the following formats: .zip, gzip, tar and tar.gz.
**This component consumes no data.**

**This component produces:**

- file
- filename: string
- content: binary
- filename: string
- content: binary

### Arguments

3 changes: 2 additions & 1 deletion components/load_from_hf_hub/README.md
@@ -8,8 +8,9 @@ Component that loads a dataset from the hub
**This component consumes no data.**

**This component produces:**

- dummy_variable
- data: binary
- data: binary

### Arguments

3 changes: 2 additions & 1 deletion components/load_from_parquet/README.md
@@ -8,8 +8,9 @@ Component that loads a dataset from a parquet uri
**This component consumes no data.**

**This component produces:**

- dummy_variable
- data: binary
- data: binary

### Arguments

6 changes: 4 additions & 2 deletions components/minhash_generator/README.md
@@ -6,12 +6,14 @@ A component that generates minhashes of text.
### Inputs / outputs

**This component consumes:**

- text
- data: string
- data: string

**This component produces:**

- text
- minhash: list<item: uint64>
- minhash: list<item: uint64>

### Arguments

6 changes: 4 additions & 2 deletions components/pii_redaction/README.md
@@ -26,12 +26,14 @@ code.
### Inputs / outputs

**This component consumes:**

- code
- content: string
- content: string

**This component produces:**

- code
- content: string
- content: string

### Arguments

6 changes: 4 additions & 2 deletions components/prompt_based_laion_retrieval/README.md
@@ -11,12 +11,14 @@ This component doesn’t return the actual images, only URLs.
### Inputs / outputs

**This component consumes:**

- prompts
- text: string
- text: string

**This component produces:**

- images
- url: string
- url: string

### Arguments

6 changes: 4 additions & 2 deletions components/segment_images/README.md
@@ -6,12 +6,14 @@ Component that creates segmentation masks for images using a model from the Hugg
### Inputs / outputs

**This component consumes:**

- images
- data: binary
- data: binary

**This component produces:**

- segmentations
- data: binary
- data: binary

### Arguments

3 changes: 2 additions & 1 deletion components/text_length_filter/README.md
@@ -6,8 +6,9 @@ A component that filters out text based on their length
### Inputs / outputs

**This component consumes:**

- text
- data: string
- data: string

**This component produces no data.**

3 changes: 2 additions & 1 deletion components/text_normalization/README.md
@@ -18,8 +18,9 @@ the training of large language models.
### Inputs / outputs

**This component consumes:**

- text
- data: string
- data: string

**This component produces no data.**

3 changes: 2 additions & 1 deletion components/write_to_hf_hub/README.md
@@ -6,8 +6,9 @@ Component that writes a dataset to the hub
### Inputs / outputs

**This component consumes:**

- dummy_variable
- data: binary
- data: binary

**This component produces no data.**

5 changes: 4 additions & 1 deletion docs/.readthedocs.yaml
@@ -21,4 +21,7 @@ build:
- poetry config virtualenvs.create false
post_install:
# Install dependencies with 'docs' dependency group
- poetry install --with docs
- poetry install --with docs
pre_build:
# Generate hub documentation
- python scripts/component_readme/generate_hub.py
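
Note on the `pre_build` step: the contents of `scripts/component_readme/generate_hub.py` are not shown in this part of the diff. Purely as an illustration of the kind of work such a step might do, here is a minimal sketch that builds a Markdown index from component READMEs. The function name `build_hub_page`, the page title, and the link format are all assumptions, not the script's actual implementation.

```python
from pathlib import Path


def build_hub_page(components_dir: Path) -> str:
    """Render a Markdown index with one link per component README.

    Hypothetical helper: the real generate_hub.py may use a template
    and a different output layout.
    """
    lines = ["# Component hub", ""]
    # Sort so the page order is stable from one docs build to the next.
    for readme in sorted(components_dir.glob("*/README.md")):
        name = readme.parent.name
        lines.append(f"- [{name}]({readme.as_posix()})")
    return "\n".join(lines) + "\n"
```

Calling `build_hub_page(Path("components"))` and writing the result to a docs page before the build would give a page that components.md could link to; directories without a `README.md` are skipped.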
2 changes: 1 addition & 1 deletion docs/components/components.md
@@ -2,7 +2,7 @@

Fondant makes it easy to build data preparation pipelines leveraging reusable components. Fondant
provides a lot of components out of the box
([overview](https://github.com/ml6team/fondant/tree/main/components)), but you can also define your
([overview](hub.md)), but you can also define your
own custom components.

## The anatomy of a component