Update generic readme generation #737

Merged · 5 commits · Dec 22, 2023
Changes from 3 commits
22 changes: 14 additions & 8 deletions components/caption_images/README.md
@@ -1,19 +1,25 @@
 # Caption images

-### Description
+## Description
 This component captions images using a BLIP model from the Hugging Face hub

-### Inputs / outputs
+## Inputs / outputs

+### Consumes
 **This component consumes:**

 - image: binary

-**This component produces:**
+### Produces
+**This component produces:**
 - caption: string

-### Arguments
+## Arguments

 The component takes the following arguments to alter its behavior:

@@ -23,7 +29,7 @@ The component takes the following arguments to alter its behavior:
 | batch_size | int | Batch size to use for inference | 8 |
 | max_new_tokens | int | Maximum token length of each caption | 50 |

-### Usage
+## Usage

 You can add this component to your pipeline using the following code:

@@ -42,11 +48,11 @@ dataset = dataset.apply(
 # "model_id": "Salesforce/blip-image-captioning-base",
 # "batch_size": 8,
 # "max_new_tokens": 50,
-}
+},
 )
 ```

-### Testing
+## Testing

 You can run the tests using docker with BuildKit. From this directory, run:
 ```
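For reference, here is a filled-in version of the usage snippet from the diff above, using the defaults listed in the argument table. This is a sketch rather than part of the diff: the `dataset` object comes from pipeline setup that is collapsed in the diff, and referencing the component by its directory name is an assumption.

```python
# Sketch: caption_images wired in via the dataset.apply(...) call shown in the diff.
# `dataset` is assumed to exist from earlier (collapsed) pipeline setup; the
# component reference string is assumed to be the component directory name.
dataset = dataset.apply(
    "caption_images",
    arguments={
        "model_id": "Salesforce/blip-image-captioning-base",  # default from the table
        "batch_size": 8,                                      # default
        "max_new_tokens": 50,                                 # default
    },
)
```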
22 changes: 14 additions & 8 deletions components/chunk_text/README.md
@@ -1,24 +1,30 @@
 # Chunk text

-### Description
+## Description
 Component that chunks text into smaller segments

 This component takes a body of text and chunks into small chunks. The id of the returned dataset
 consists of the id of the original document followed by the chunk index.


-### Inputs / outputs
+## Inputs / outputs

+### Consumes
 **This component consumes:**

 - text: string

-**This component produces:**
+### Produces
+**This component produces:**
 - text: string
 - original_document_id: string

-### Arguments
+## Arguments

 The component takes the following arguments to alter its behavior:

@@ -27,7 +33,7 @@ The component takes the following arguments to alter its behavior:
 | chunk_size | int | Maximum size of chunks to return | / |
 | chunk_overlap | int | Overlap in characters between chunks | / |

-### Usage
+## Usage

 You can add this component to your pipeline using the following code:

@@ -45,11 +51,11 @@ dataset = dataset.apply(
 # Add arguments
 # "chunk_size": 0,
 # "chunk_overlap": 0,
-}
+},
 )
 ```

-### Testing
+## Testing

 You can run the tests using docker with BuildKit. From this directory, run:
 ```
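Similarly, a filled-in sketch of the usage snippet above, under the same assumptions. `chunk_size` and `chunk_overlap` have no defaults in the argument table, so the values here are purely illustrative.

```python
# Sketch: chunk_text applied to a dataset. Argument values are illustrative,
# not defaults; the component reference string is assumed.
dataset = dataset.apply(
    "chunk_text",
    arguments={
        "chunk_size": 512,    # illustrative: maximum size of chunks to return
        "chunk_overlap": 32,  # illustrative: overlap in characters between chunks
    },
)
```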
20 changes: 13 additions & 7 deletions components/crop_images/README.md
@@ -1,6 +1,6 @@
 # Image cropping

-### Description
+## Description
 This component crops out image borders. This is typically useful when working with graphical
 images that have single-color borders (e.g. logos, icons, etc.).

@@ -18,19 +18,25 @@ right side is border-cropped image.
 ![Example of image cropping by removing the single-color border. Left side is original, right side is cropped image](../../docs/art/components/image_cropping/component_border_crop_0.png)


-### Inputs / outputs
+## Inputs / outputs

+### Consumes
 **This component consumes:**

 - images_data: binary

-**This component produces:**
+### Produces
+**This component produces:**
 - image: binary
 - image_width: int32
 - image_height: int32

-### Arguments
+## Arguments

 The component takes the following arguments to alter its behavior:

@@ -39,7 +45,7 @@ The component takes the following arguments to alter its behavior:
 | cropping_threshold | int | Threshold parameter used for detecting borders. A lower (negative) parameter results in a more performant border detection, but can cause overcropping. Default is -30 | -30 |
 | padding | int | Padding for the image cropping. The padding is added to all borders of the image. | 10 |

-### Usage
+## Usage

 You can add this component to your pipeline using the following code:

@@ -57,7 +63,7 @@ dataset = dataset.apply(
 # Add arguments
 # "cropping_threshold": -30,
 # "padding": 10,
-}
+},
 )
 ```
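A filled-in sketch of the usage snippet above with the defaults from the argument table (same assumptions as for the previous components).

```python
# Sketch: crop_images with its documented defaults. The component reference
# string is assumed to be the component directory name.
dataset = dataset.apply(
    "crop_images",
    arguments={
        "cropping_threshold": -30,  # default border-detection threshold
        "padding": 10,              # default padding added to all borders
    },
)
```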

22 changes: 14 additions & 8 deletions components/download_images/README.md
@@ -1,6 +1,6 @@
 # Download images

-### Description
+## Description
 Component that downloads images from a list of URLs.

 This component takes in image URLs as input and downloads the images, along with some metadata
@@ -10,19 +10,25 @@ component also resizes the images using the
 from the img2dataset library.


-### Inputs / outputs
+## Inputs / outputs

+### Consumes
 **This component consumes:**

 - image_url: string

-**This component produces:**
+### Produces
+**This component produces:**
 - image: binary
 - image_width: int32
 - image_height: int32

-### Arguments
+## Arguments

 The component takes the following arguments to alter its behavior:

@@ -37,7 +43,7 @@ The component takes the following arguments to alter its behavior:
 | min_image_size | int | Minimum size of the images. | / |
 | max_aspect_ratio | float | Maximum aspect ratio of the images. | 99.9 |

-### Usage
+## Usage

 You can add this component to your pipeline using the following code:

@@ -61,11 +67,11 @@ dataset = dataset.apply(
 # "resize_only_if_bigger": False,
 # "min_image_size": 0,
 # "max_aspect_ratio": 99.9,
-}
+},
 )
 ```

-### Testing
+## Testing

 You can run the tests using docker with BuildKit. From this directory, run:
 ```
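A filled-in sketch of the usage snippet above, limited to the argument values visible in the diff; the remaining arguments are collapsed in the diff and omitted here (same assumptions as above).

```python
# Sketch: download_images with only the arguments visible in the diff.
dataset = dataset.apply(
    "download_images",
    arguments={
        "resize_only_if_bigger": False,  # value shown in the diff
        "min_image_size": 0,             # value shown in the diff
        "max_aspect_ratio": 99.9,        # default from the table
    },
)
```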
20 changes: 13 additions & 7 deletions components/embed_images/README.md
@@ -1,19 +1,25 @@
 # Embed images

-### Description
+## Description
 Component that generates CLIP embeddings from images

-### Inputs / outputs
+## Inputs / outputs

+### Consumes
 **This component consumes:**

 - image: binary

-**This component produces:**
+### Produces
+**This component produces:**
 - embedding: list<item: float>

-### Arguments
+## Arguments

 The component takes the following arguments to alter its behavior:

@@ -22,7 +28,7 @@ The component takes the following arguments to alter its behavior:
 | model_id | str | Model id of a CLIP model on the Hugging Face hub | openai/clip-vit-large-patch14 |
 | batch_size | int | Batch size to use when embedding | 8 |

-### Usage
+## Usage

 You can add this component to your pipeline using the following code:

@@ -40,7 +46,7 @@ dataset = dataset.apply(
 # Add arguments
 # "model_id": "openai/clip-vit-large-patch14",
 # "batch_size": 8,
-}
+},
 )
 ```
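The equivalent filled-in sketch for this component, using the defaults from the argument table (same assumptions as above).

```python
# Sketch: embed_images with its documented defaults; component reference assumed.
dataset = dataset.apply(
    "embed_images",
    arguments={
        "model_id": "openai/clip-vit-large-patch14",  # default CLIP model
        "batch_size": 8,                              # default
    },
)
```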

22 changes: 14 additions & 8 deletions components/embed_text/README.md
@@ -1,19 +1,25 @@
 # Embed text

-### Description
+## Description
 Component that generates embeddings of text passages.

-### Inputs / outputs
+## Inputs / outputs

+### Consumes
 **This component consumes:**

 - text: string

-**This component produces:**
+### Produces
+**This component produces:**
 - embedding: list<item: float>

-### Arguments
+## Arguments

 The component takes the following arguments to alter its behavior:

@@ -24,7 +30,7 @@ The component takes the following arguments to alter its behavior:
 | api_keys | dict | The API keys to use for the model provider that are written to environment variables.Pass only the keys required by the model provider or conveniently pass all keys you will ever need. Pay attention how to name the dictionary keys so that they can be used by the model provider. | / |
 | auth_kwargs | dict | Additional keyword arguments required for api initialization/authentication. | / |

-### Usage
+## Usage

 You can add this component to your pipeline using the following code:

@@ -44,11 +50,11 @@ dataset = dataset.apply(
 # "model": ,
 # "api_keys": {},
 # "auth_kwargs": {},
-}
+},
 )
 ```

-### Testing
+## Testing

 You can run the tests using docker with BuildKit. From this directory, run:
 ```
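A filled-in sketch of the usage snippet above. The diff leaves the `model` value blank and collapses part of the argument table, so the values below are hypothetical placeholders (same assumptions as above).

```python
# Sketch: embed_text. The model value and API key name are placeholders; the
# diff does not specify them, and the expected key names depend on the provider.
dataset = dataset.apply(
    "embed_text",
    arguments={
        "model": "<your-embedding-model>",        # placeholder, not from the diff
        "api_keys": {"OPENAI_API_KEY": "<key>"},  # hypothetical key name
        "auth_kwargs": {},
    },
)
```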
31 changes: 23 additions & 8 deletions components/evaluate_ragas/README.md
@@ -1,18 +1,29 @@
 # retriever_eval_ragas

-### Description
+## Description
 Component that evaluates the retriever using RAGAS

-### Inputs / outputs
+## Inputs / outputs

+### Consumes
 **This component consumes:**

 - question: string
 - retrieved_chunks: list<item: string>

-**This component produces no data.**
-
-### Arguments
+### Produces
+
+**This component can produce additional fields**
+- <field_name>: <field_schema>
+This defines a mapping to update the fields produced by the operation as defined in the component spec.
+The keys are the names of the fields to be produced by the component, while the values are
+the type of the field that should be used to write the output dataset.
+
+## Arguments

 The component takes the following arguments to alter its behavior:

@@ -22,7 +33,7 @@ The component takes the following arguments to alter its behavior:
 | llm_name | str | Name of the selected llm | / |
 | llm_kwargs | dict | Arguments of the selected llm | / |

-### Usage
+## Usage

 You can add this component to your pipeline using the following code:

@@ -41,11 +52,11 @@ dataset = dataset.apply(
 # "module": "langchain.llms",
 # "llm_name": ,
 # "llm_kwargs": {},
-}
+},
+produces={
+<field_name>: <field_schema>,
+..., # Add fields
+},
 )
 ```

-### Testing
+## Testing

 You can run the tests using docker with BuildKit. From this directory, run:
 ```
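Finally, a filled-in sketch for the RAGAS evaluation component. The `llm_name` value and the `produces` mapping are placeholders: the diff shows them only as a blank and as `<field_name>: <field_schema>`, so the concrete names below are hypothetical.

```python
# Sketch: evaluate_ragas. llm_name and the produces mapping are hypothetical
# placeholders; the diff only shows the general shape of the call.
dataset = dataset.apply(
    "evaluate_ragas",
    arguments={
        "module": "langchain.llms",  # value shown in the diff
        "llm_name": "<llm-name>",    # placeholder, not from the diff
        "llm_kwargs": {},
    },
    produces={
        "<field_name>": "<field_schema>",  # placeholder mapping, as in the diff
    },
)
```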