Update generic readme generation #737

Merged · 5 commits · Dec 22, 2023
Changes from 3 commits
22 changes: 14 additions & 8 deletions components/caption_images/README.md
@@ -1,19 +1,25 @@
 # Caption images

-### Description
+## Description
 This component captions images using a BLIP model from the Hugging Face hub

-### Inputs / outputs
+## Inputs / outputs

+### Consumes
 **This component consumes:**

 - image: binary

-**This component produces:**
+### Produces
+**This component produces:**
 - caption: string

-### Arguments
+## Arguments

 The component takes the following arguments to alter its behavior:

@@ -23,7 +29,7 @@ The component takes the following arguments to alter its behavior:
 | batch_size | int | Batch size to use for inference | 8 |
 | max_new_tokens | int | Maximum token length of each caption | 50 |

-### Usage
+## Usage

 You can add this component to your pipeline using the following code:

@@ -42,11 +48,11 @@ dataset = dataset.apply(
 # "model_id": "Salesforce/blip-image-captioning-base",
 # "batch_size": 8,
 # "max_new_tokens": 50,
-}
+},
 )
 ```

-### Testing
+## Testing

 You can run the tests using docker with BuildKit. From this directory, run:
 ```
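For reference, here is a filled-in version of the usage snippet from the diff above, using the defaults listed in the argument table. This is a sketch rather than part of the diff: the `dataset` object comes from pipeline setup that is collapsed in the diff, and referencing the component by its directory name is an assumption.

```python
# Sketch: caption_images wired in via the dataset.apply(...) call shown in the diff.
# `dataset` is assumed to exist from earlier (collapsed) pipeline setup; the
# component reference string is assumed to be the component directory name.
dataset = dataset.apply(
    "caption_images",
    arguments={
        "model_id": "Salesforce/blip-image-captioning-base",  # default from the table
        "batch_size": 8,                                      # default
        "max_new_tokens": 50,                                 # default
    },
)
```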
22 changes: 14 additions & 8 deletions components/chunk_text/README.md
@@ -1,24 +1,30 @@
 # Chunk text

-### Description
+## Description
 Component that chunks text into smaller segments

 This component takes a body of text and chunks into small chunks. The id of the returned dataset
 consists of the id of the original document followed by the chunk index.


-### Inputs / outputs
+## Inputs / outputs

+### Consumes
 **This component consumes:**

 - text: string

-**This component produces:**
+### Produces
+**This component produces:**
 - text: string
 - original_document_id: string

-### Arguments
+## Arguments

 The component takes the following arguments to alter its behavior:

@@ -27,7 +33,7 @@ The component takes the following arguments to alter its behavior:
 | chunk_size | int | Maximum size of chunks to return | / |
 | chunk_overlap | int | Overlap in characters between chunks | / |

-### Usage
+## Usage

 You can add this component to your pipeline using the following code:

@@ -45,11 +51,11 @@ dataset = dataset.apply(
 # Add arguments
 # "chunk_size": 0,
 # "chunk_overlap": 0,
-}
+},
 )
 ```

-### Testing
+## Testing

 You can run the tests using docker with BuildKit. From this directory, run:
 ```
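Similarly, a filled-in sketch of the usage snippet above, under the same assumptions. `chunk_size` and `chunk_overlap` have no defaults in the argument table, so the values here are purely illustrative.

```python
# Sketch: chunk_text applied to a dataset. Argument values are illustrative,
# not defaults; the component reference string is assumed.
dataset = dataset.apply(
    "chunk_text",
    arguments={
        "chunk_size": 512,    # illustrative: maximum size of chunks to return
        "chunk_overlap": 32,  # illustrative: overlap in characters between chunks
    },
)
```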
20 changes: 13 additions & 7 deletions components/crop_images/README.md
@@ -1,6 +1,6 @@
 # Image cropping

-### Description
+## Description
 This component crops out image borders. This is typically useful when working with graphical
 images that have single-color borders (e.g. logos, icons, etc.).

@@ -18,19 +18,25 @@ right side is border-cropped image.
 ![Example of image cropping by removing the single-color border. Left side is original, right side is cropped image](../../docs/art/components/image_cropping/component_border_crop_0.png)


-### Inputs / outputs
+## Inputs / outputs

+### Consumes
 **This component consumes:**

 - images_data: binary

-**This component produces:**
+### Produces
+**This component produces:**
 - image: binary
 - image_width: int32
 - image_height: int32

-### Arguments
+## Arguments

 The component takes the following arguments to alter its behavior:

@@ -39,7 +45,7 @@ The component takes the following arguments to alter its behavior:
 | cropping_threshold | int | Threshold parameter used for detecting borders. A lower (negative) parameter results in a more performant border detection, but can cause overcropping. Default is -30 | -30 |
 | padding | int | Padding for the image cropping. The padding is added to all borders of the image. | 10 |

-### Usage
+## Usage

 You can add this component to your pipeline using the following code:

@@ -57,7 +63,7 @@ dataset = dataset.apply(
 # Add arguments
 # "cropping_threshold": -30,
 # "padding": 10,
-}
+},
 )
 ```
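A filled-in sketch of the usage snippet above with the defaults from the argument table (same assumptions as for the previous components).

```python
# Sketch: crop_images with its documented defaults. The component reference
# string is assumed to be the component directory name.
dataset = dataset.apply(
    "crop_images",
    arguments={
        "cropping_threshold": -30,  # default border-detection threshold
        "padding": 10,              # default padding added to all borders
    },
)
```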

22 changes: 14 additions & 8 deletions components/download_images/README.md
@@ -1,6 +1,6 @@
 # Download images

-### Description
+## Description
 Component that downloads images from a list of URLs.

 This component takes in image URLs as input and downloads the images, along with some metadata
@@ -10,19 +10,25 @@ component also resizes the images using the
 from the img2dataset library.


-### Inputs / outputs
+## Inputs / outputs

+### Consumes
 **This component consumes:**

 - image_url: string

-**This component produces:**
+### Produces
+**This component produces:**
 - image: binary
 - image_width: int32
 - image_height: int32

-### Arguments
+## Arguments

 The component takes the following arguments to alter its behavior:

@@ -37,7 +43,7 @@ The component takes the following arguments to alter its behavior:
 | min_image_size | int | Minimum size of the images. | / |
 | max_aspect_ratio | float | Maximum aspect ratio of the images. | 99.9 |

-### Usage
+## Usage

 You can add this component to your pipeline using the following code:

@@ -61,11 +67,11 @@ dataset = dataset.apply(
 # "resize_only_if_bigger": False,
 # "min_image_size": 0,
 # "max_aspect_ratio": 99.9,
-}
+},
 )
 ```

-### Testing
+## Testing

 You can run the tests using docker with BuildKit. From this directory, run:
 ```
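A filled-in sketch of the usage snippet above, limited to the argument values visible in the diff; the remaining arguments are collapsed in the diff and omitted here (same assumptions as above).

```python
# Sketch: download_images with only the arguments visible in the diff.
dataset = dataset.apply(
    "download_images",
    arguments={
        "resize_only_if_bigger": False,  # value shown in the diff
        "min_image_size": 0,             # value shown in the diff
        "max_aspect_ratio": 99.9,        # default from the table
    },
)
```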
20 changes: 13 additions & 7 deletions components/embed_images/README.md
@@ -1,19 +1,25 @@
 # Embed images

-### Description
+## Description
 Component that generates CLIP embeddings from images

-### Inputs / outputs
+## Inputs / outputs

+### Consumes
 **This component consumes:**

 - image: binary

-**This component produces:**
+### Produces
+**This component produces:**
 - embedding: list<item: float>

-### Arguments
+## Arguments

 The component takes the following arguments to alter its behavior:

@@ -22,7 +28,7 @@ The component takes the following arguments to alter its behavior:
 | model_id | str | Model id of a CLIP model on the Hugging Face hub | openai/clip-vit-large-patch14 |
 | batch_size | int | Batch size to use when embedding | 8 |

-### Usage
+## Usage

 You can add this component to your pipeline using the following code:

@@ -40,7 +46,7 @@ dataset = dataset.apply(
 # Add arguments
 # "model_id": "openai/clip-vit-large-patch14",
 # "batch_size": 8,
-}
+},
 )
 ```
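The equivalent filled-in sketch for this component, using the defaults from the argument table (same assumptions as above).

```python
# Sketch: embed_images with its documented defaults; component reference assumed.
dataset = dataset.apply(
    "embed_images",
    arguments={
        "model_id": "openai/clip-vit-large-patch14",  # default CLIP model
        "batch_size": 8,                              # default
    },
)
```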

22 changes: 14 additions & 8 deletions components/embed_text/README.md
@@ -1,19 +1,25 @@
 # Embed text

-### Description
+## Description
 Component that generates embeddings of text passages.

-### Inputs / outputs
+## Inputs / outputs

+### Consumes
 **This component consumes:**

 - text: string

-**This component produces:**
+### Produces
+**This component produces:**
 - embedding: list<item: float>

-### Arguments
+## Arguments

 The component takes the following arguments to alter its behavior:

@@ -24,7 +30,7 @@ The component takes the following arguments to alter its behavior:
 | api_keys | dict | The API keys to use for the model provider that are written to environment variables.Pass only the keys required by the model provider or conveniently pass all keys you will ever need. Pay attention how to name the dictionary keys so that they can be used by the model provider. | / |
 | auth_kwargs | dict | Additional keyword arguments required for api initialization/authentication. | / |

-### Usage
+## Usage

 You can add this component to your pipeline using the following code:

@@ -44,11 +50,11 @@ dataset = dataset.apply(
 # "model": ,
 # "api_keys": {},
 # "auth_kwargs": {},
-}
+},
 )
 ```

-### Testing
+## Testing

 You can run the tests using docker with BuildKit. From this directory, run:
 ```
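A filled-in sketch of the usage snippet above. The diff leaves the `model` value blank and collapses part of the argument table, so the values below are hypothetical placeholders (same assumptions as above).

```python
# Sketch: embed_text. The model value and API key name are placeholders; the
# diff does not specify them, and the expected key names depend on the provider.
dataset = dataset.apply(
    "embed_text",
    arguments={
        "model": "<your-embedding-model>",        # placeholder, not from the diff
        "api_keys": {"OPENAI_API_KEY": "<key>"},  # hypothetical key name
        "auth_kwargs": {},
    },
)
```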
31 changes: 23 additions & 8 deletions components/evaluate_ragas/README.md
@@ -1,18 +1,29 @@
 # retriever_eval_ragas

-### Description
+## Description
 Component that evaluates the retriever using RAGAS

-### Inputs / outputs
+## Inputs / outputs

+### Consumes
 **This component consumes:**

 - question: string
 - retrieved_chunks: list<item: string>

-**This component produces no data.**
-
-### Arguments
+### Produces
+
+**This component can produce additional fields**
+- <field_name>: <field_schema>
+This defines a mapping to update the fields produced by the operation as defined in the component spec.
+The keys are the names of the fields to be produced by the component, while the values are
+the type of the field that should be used to write the output dataset.
+
+## Arguments

 The component takes the following arguments to alter its behavior:

@@ -22,7 +33,7 @@ The component takes the following arguments to alter its behavior:
 | llm_name | str | Name of the selected llm | / |
 | llm_kwargs | dict | Arguments of the selected llm | / |

-### Usage
+## Usage

 You can add this component to your pipeline using the following code:

@@ -41,11 +52,11 @@ dataset = dataset.apply(
 # "module": "langchain.llms",
 # "llm_name": ,
 # "llm_kwargs": {},
-}
+},
+produces={
+<field_name>: <field_schema>,
+..., # Add fields
+},
 )
 ```

-### Testing
+## Testing

 You can run the tests using docker with BuildKit. From this directory, run:
 ```
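Finally, a filled-in sketch for the RAGAS evaluation component. The `llm_name` value and the `produces` mapping are placeholders: the diff shows them only as a blank and as `<field_name>: <field_schema>`, so the concrete names below are hypothetical.

```python
# Sketch: evaluate_ragas. llm_name and the produces mapping are hypothetical
# placeholders; the diff only shows the general shape of the call.
dataset = dataset.apply(
    "evaluate_ragas",
    arguments={
        "module": "langchain.llms",  # value shown in the diff
        "llm_name": "<llm-name>",    # placeholder, not from the diff
        "llm_kwargs": {},
    },
    produces={
        "<field_name>": "<field_schema>",  # placeholder mapping, as in the diff
    },
)
```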