[Docs] : fix typos in docs (#5612)
# Description
<!-- Please include a summary of the changes and the related issue.
Please also include relevant motivation and context. List any
dependencies that are required for this change. -->

Closes #<issue_number>

**Type of change**
<!-- Please delete options that are not relevant. Remember to title the
PR according to the type of change -->

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- Refactor (change restructuring the codebase without changing
functionality)
- Improvement (change adding some improvement to an existing
functionality)
- Documentation update

**How Has This Been Tested**
<!-- Please add some reference about how your feature has been tested.
-->

**Checklist**
<!-- Please go over the list and make sure you've taken everything into
account -->

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)

---------

Co-authored-by: Sara Han <[email protected]>
FarukhS52 and sdiazlor authored Oct 31, 2024
1 parent 038172c commit 02413d1
Showing 8 changed files with 15 additions and 15 deletions.
4 changes: 2 additions & 2 deletions docs/_source/conceptual_guides/data_model.md
@@ -133,7 +133,7 @@ record = rg.TextClassificationRecord(

##### Token classification

- Tasks of the kind of token classification are NLP tasks aimed at dividing the input text into words, or syllables, and assigning certain values to them. Think about giving each word in a sentence its grammatical category or highlight which parts of a medical report belong to a certain specialty. There are some popular ones like NER or POS-tagging.
+ Tasks of the kind of token classification are NLP tasks aimed at dividing the input text into words, or syllables, and assigning certain values to them. Think about giving each word in a sentence its grammatical category or highlight which parts of a medical report belong to a certain speciality. There are some popular ones like NER or POS-tagging.

```python
record = rg.TokenClassificationRecord(
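    # Continuation sketch: the text, tokens, labels, and character offsets below
    # are illustrative assumptions, not the original snippet from the doc.
    text="Paris is the capital of France",
    tokens=["Paris", "is", "the", "capital", "of", "France"],
    prediction=[("LOC", 0, 5), ("LOC", 24, 30)],
)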
@@ -190,4 +190,4 @@ You can see our supported tasks at {ref}`tasks`.

### Settings

For now, only a set of predefined labels (labels schema) is configurable. Still, other settings like annotators, and metadata schema, are planned to be supported as part of dataset settings.
2 changes: 1 addition & 1 deletion docs/_source/getting_started/argilla.md
@@ -138,7 +138,7 @@ Finally, platforms like Snorkel, Prodigy or Scale, while more comprehensive, oft
<summary>What is Argilla currently working on?</summary>
<p>

- We are continuously working on improving Argilla's features and usability, focusing now concentrating on a three-pronged vision: the development of Argilla Core (open-source), Distilabel, and Argilla JS/TS. You can find a list of our current projects <a href="https://github.com/orgs/argilla-io/projects/10/views/1">here</a>.
+ We are continuously working on improving Argilla's features and usability, focusing now on a three-pronged vision: the development of Argilla Core (open-source), Distilabel, and Argilla JS/TS. You can find a list of our current projects <a href="https://github.com/orgs/argilla-io/projects/10/views/1">here</a>.

</p>
</details>
@@ -157,7 +157,7 @@ gcloud auth login

### 2. Build and deploy the container

- We will use the `gcloud run deploy` command to deploy the Argilla container directly from the Docker Hub. We can point the cloud run url to the container's default port (6900) and define relevant compute resouces.
+ We will use the `gcloud run deploy` command to deploy the Argilla container directly from the Docker Hub. We can point the cloud run url to the container's default port (6900) and define relevant compute resources.

```bash
gcloud run deploy <deployment-name> \
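    --image=<dockerhub-image> \
    --port=6900 \
    --cpu=2 \
    --memory=4Gi \
    --region=<region> \
    --allow-unauthenticated
# Illustrative continuation of the command: the flags and values above are
# assumptions, not taken from the original doc; substitute your own image
# reference, region, and resource sizes before deploying.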
8 changes: 4 additions & 4 deletions docs/_source/practical_guides/annotate_dataset.md
@@ -90,7 +90,7 @@ You can track your progress and the number of `Pending`, `Draft`, `Submitted` an

In Argilla's Feedback Task datasets, you can annotate and process records in two ways:

- - **Focus view**: you can only see, respond and perfom actions on one record at a time. This is better for records that need to be examined closely and individually before responding.
+ - **Focus view**: you can only see, respond and perform actions on one record at a time. This is better for records that need to be examined closely and individually before responding.
- **Bulk view**: you can see multiple records in a list so you can respond and perform actions on more than one record at a time. This is useful for actions that can be taken on many records that have similar characteristics e.g., apply the same label to the results of a similarity search, discard all records in a specific language or save/submit records with a suggestion score over a safe threshold.

```{hint}
@@ -105,7 +105,7 @@ If you have a Span question in your dataset, you can always answer other questio

In the queue of **Pending** records, you can change from _Focus_ to _Bulk_ view. Once in the _Bulk view_, you can expand or collapse records --i.e. see the full length of all records in the page or set a fixed height-- and select the number of records you want to see per page.

- To select or unselect all records in the page, click on the checkbox above the record list. To select or unselect specific records, click on the checkbox inside the individual record card. When you use filters inside the bulk view and the results are higher than the records visible in the page but lower than 1000, you will also have the option to select all of the results after you click on the checkbox. You can cancel this selection clicking on the _Cancel_ button.
+ To select or unselect all records in the page, click on the checkbox above the record list. To select or unselect specific records, click on the checkbox inside the individual record card. When you use filters inside the bulk view and the results are higher than the records visible in the page but lower than 1000, you will also have the option to select all of the results after you click on the checkbox. You can cancel this selection by clicking on the _Cancel_ button.

Once records are selected, choose the responses that apply to all selected records (if any) and do the desired action: _Discard_, _Save as draft_ or even _Submit_. Note that you can only submit the records if all required questions have been answered.

@@ -169,7 +169,7 @@ Not all filters listed below are available for all tasks.

##### Predictions filter

- This filter allows you to filter records with respect of their predictions:
+ This filter allows you to filter records with respect to their predictions:

- **Predicted as**: filter records by their predicted labels.
- **Predicted ok**: filter records whose predictions do, or do not, match the annotations.
@@ -291,4 +291,4 @@ If you struggle to increase the overall coverage, try to filter for the records
#### Manage rules

Here you will see a list of your saved rules.
You can edit a rule by clicking on its name, or delete it by clicking on the trash icon.
4 changes: 2 additions & 2 deletions docs/_source/practical_guides/collect_responses.md
@@ -183,7 +183,7 @@ We plan on adding more support for other metrics so feel free to reach out on ou

#### Model Metrics

- In contrast to agreement metrics, where we compare the responses of annotators with each other, it is a good practice to evaluate the suggestions of models against the annotators as ground truths. As `FeedbackDataset` already offers the possibility to add `suggestions` to the responses, we can compare these initial predictions against the verified reponses. This will give us two important insights: how reliable the responses of a given annotator are, and how good the suggestions we are giving to the annotators are. This way, we can take action to improve the quality of the responses by making changes to the guidelines or the structure, and the suggestions given to the annotators by changing or updating the model we use. Note that each question type has a different set of metrics available.
+ In contrast to agreement metrics, where we compare the responses of annotators with each other, it is a good practice to evaluate the suggestions of models against the annotators as ground truths. As `FeedbackDataset` already offers the possibility to add `suggestions` to the responses, we can compare these initial predictions against the verified responses. This will give us two important insights: how reliable the responses of a given annotator are, and how good the suggestions we are giving to the annotators are. This way, we can take action to improve the quality of the responses by making changes to the guidelines or the structure, and the suggestions given to the annotators by changing or updating the model we use. Note that each question type has a different set of metrics available.

Here is an example use of the `compute` function to calculate the metrics for a `FeedbackDataset`:
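A rough sketch of what such a call might look like (the `compute_model_metrics` method name, the metric names, and the question name below are assumptions, not taken from this page):

```python
import argilla as rg

# Load the annotated dataset from Argilla (dataset and workspace names are placeholders).
dataset = rg.FeedbackDataset.from_argilla("my-dataset", workspace="my-workspace")

# Compare model suggestions against annotator responses for one question.
# NOTE: the method name and metric names are assumptions about the Argilla 1.x metrics API.
model_metrics = dataset.compute_model_metrics(
    question_name="quality",
    metric_names=["accuracy", "f1-score"],
)
print(model_metrics)
```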

@@ -495,4 +495,4 @@ f1(name="sst2").visualize()
# now compute metrics for negation ( -> negative precision and positive recall go down)
f1(name="sst2", query="n't OR not").visualize()
```
![F1 metrics from query](/_static/images/guides/metrics/negation_f1.png)
4 changes: 2 additions & 2 deletions docs/_source/practical_guides/export_dataset.md
@@ -20,7 +20,7 @@ remote_dataset = rg.FeedbackDataset.from_argilla("my-dataset", workspace="my-wor
local_dataset = remote_dataset.pull(max_records=100) # get first 100 records
```

- If your dataset includes vectors, by default these will **not** get pulled with the rest of the dataset in order to improve performace. If you would like to pull the vectors in your records, you will need to specify it like so:
+ If your dataset includes vectors, by default these will **not** get pulled with the rest of the dataset in order to improve performance. If you would like to pull the vectors in your records, you will need to specify it like so:
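A rough sketch in Python (the `with_vectors` argument and its accepted values are assumptions, not confirmed by this page):

```python
import argilla as rg

remote_dataset = rg.FeedbackDataset.from_argilla("my-dataset", workspace="my-workspace")

# Pull records together with their vectors; "all" is assumed to request every
# configured vector setting (a list of vector names may also be accepted).
local_dataset = remote_dataset.pull(max_records=100, with_vectors="all")
```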

::::{tab-set}

@@ -204,4 +204,4 @@ df = dataset_rg.to_pandas()
df.to_csv("my_dataset.csv") # Save as CSV
df.to_json("my_dataset.json") # Save as JSON
df.to_parquet("my_dataset.parquet") # Save as Parquet
```
4 changes: 2 additions & 2 deletions docs/_source/practical_guides/fine_tune.md
@@ -533,7 +533,7 @@ task = TrainingTask.for_sentence_similarity(
)
```

- For datasets that where annotated with numerical values we could also pass the label strategy we want to use (let's assume we have another question in the dataset named "other-question" that contains values that come from rated answers):
+ For datasets that were annotated with numerical values we could also pass the label strategy we want to use (let's assume we have another question in the dataset named "other-question" that contains values that come from rated answers):

```python
task = TrainingTask.for_sentence_similarity(
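    # Continuation sketch: the argument names and values below are assumptions
    # (not taken from this page) meant to illustrate passing a label strategy.
    texts=[dataset.field_by_name("premise"), dataset.field_by_name("hypothesis")],
    label=dataset.question_by_name("other-question"),
    label_strategy="majority",
)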
@@ -1547,4 +1547,4 @@ Options:
--update-config-kwargs TEXT update_config() kwargs to be passed as a dictionary. [default: {}]
--help Show this message and exit.

```
@@ -23,7 +23,7 @@
"\n",
"The basic idea is to use a pre-trained model to generate a vector representation for each relevant `TextFields` within the records. These vectors are then indexed within our database and can then be used to search based on the similarity between texts. This should be useful for searching similar records based on the semantic meaning of the text.\n",
"\n",
- "To get the these vectors and config, we will use the `SentenceTransformersExtractor` based on the [sentence-transformers](https://www.sbert.net/index.html) library. The default model we use for this is the [TaylorAI/bge-micro-v2](https://huggingface.co/TaylorAI/bge-micro-v2), which offers a nice trade-off between speed and accuracy, but you can use any model from the [sentence-transformers](https://www.sbert.net/index.html) library or from the [Hugging Face Hub](https://huggingface.co/models?library=sentence-transformers)."
+ "To get these vectors and config, we will use the `SentenceTransformersExtractor` based on the [sentence-transformers](https://www.sbert.net/index.html) library. The default model we use for this is the [TaylorAI/bge-micro-v2](https://huggingface.co/TaylorAI/bge-micro-v2), which offers a nice trade-off between speed and accuracy, but you can use any model from the [sentence-transformers](https://www.sbert.net/index.html) library or from the [Hugging Face Hub](https://huggingface.co/models?library=sentence-transformers)."
]
},
{
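To make the extractor usage concrete, a minimal sketch (the import path and method names are assumptions about the Argilla 1.x integration, not quotes from this notebook):

```python
import argilla as rg
from argilla.client.feedback.integrations.sentencetransformers import (
    SentenceTransformersExtractor,
)

dataset = rg.FeedbackDataset.from_argilla("my-dataset", workspace="my-workspace")

# Compute vectors for the relevant text fields with the default model and
# attach both the vector settings and the vectors to the dataset.
ste = SentenceTransformersExtractor(model="TaylorAI/bge-micro-v2")
dataset = ste.update_dataset(dataset)
```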
