[Docs] : fix typos in docs (#5612)
# Description
<!-- Please include a summary of the changes and the related issue.
Please also include relevant motivation and context. List any
dependencies that are required for this change. -->

Closes #<issue_number>

**Type of change**
<!-- Please delete options that are not relevant. Remember to title the
PR according to the type of change -->

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- Refactor (change restructuring the codebase without changing
functionality)
- Improvement (change adding some improvement to an existing
functionality)
- Documentation update

**How Has This Been Tested**
<!-- Please add some reference about how your feature has been tested.
-->

**Checklist**
<!-- Please go over the list and make sure you've taken everything into
account -->

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)

---------

Co-authored-by: Sara Han <[email protected]>
FarukhS52 and sdiazlor authored Oct 31, 2024
1 parent 038172c commit 02413d1
Showing 8 changed files with 15 additions and 15 deletions.
4 changes: 2 additions & 2 deletions docs/_source/conceptual_guides/data_model.md
@@ -133,7 +133,7 @@ record = rg.TextClassificationRecord(

##### Token classification

- Tasks of the kind of token classification are NLP tasks aimed at dividing the input text into words, or syllables, and assigning certain values to them. Think about giving each word in a sentence its grammatical category or highlight which parts of a medical report belong to a certain specialty. There are some popular ones like NER or POS-tagging.
+ Tasks of the kind of token classification are NLP tasks aimed at dividing the input text into words, or syllables, and assigning certain values to them. Think about giving each word in a sentence its grammatical category or highlight which parts of a medical report belong to a certain speciality. There are some popular ones like NER or POS-tagging.

```python
record = rg.TokenClassificationRecord(
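    # Continuation sketch: the text, tokens, labels, and character offsets below
    # are illustrative assumptions, not the original snippet from the doc.
    text="Paris is the capital of France",
    tokens=["Paris", "is", "the", "capital", "of", "France"],
    prediction=[("LOC", 0, 5), ("LOC", 24, 30)],
)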
@@ -190,4 +190,4 @@ You can see our supported tasks at {ref}`tasks`.

### Settings

For now, only a set of predefined labels (labels schema) is configurable. Still, other settings like annotators, and metadata schema, are planned to be supported as part of dataset settings.
2 changes: 1 addition & 1 deletion docs/_source/getting_started/argilla.md
@@ -138,7 +138,7 @@ Finally, platforms like Snorkel, Prodigy or Scale, while more comprehensive, oft
<summary>What is Argilla currently working on?</summary>
<p>

- We are continuously working on improving Argilla's features and usability, focusing now concentrating on a three-pronged vision: the development of Argilla Core (open-source), Distilabel, and Argilla JS/TS. You can find a list of our current projects <a href="https://github.com/orgs/argilla-io/projects/10/views/1">here</a>.
+ We are continuously working on improving Argilla's features and usability, focusing now on a three-pronged vision: the development of Argilla Core (open-source), Distilabel, and Argilla JS/TS. You can find a list of our current projects <a href="https://github.com/orgs/argilla-io/projects/10/views/1">here</a>.

</p>
</details>
@@ -157,7 +157,7 @@ gcloud auth login

### 2. Build and deploy the container

- We will use the `gcloud run deploy` command to deploy the Argilla container directly from the Docker Hub. We can point the cloud run url to the container's default port (6900) and define relevant compute resouces.
+ We will use the `gcloud run deploy` command to deploy the Argilla container directly from the Docker Hub. We can point the cloud run url to the container's default port (6900) and define relevant compute resources.

```bash
gcloud run deploy <deployment-name> \
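    --image=<dockerhub-image> \
    --port=6900 \
    --cpu=2 \
    --memory=4Gi \
    --region=<region> \
    --allow-unauthenticated
# Illustrative continuation of the command: the flags and values above are
# assumptions, not taken from the original doc; substitute your own image
# reference, region, and resource sizes before deploying.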
8 changes: 4 additions & 4 deletions docs/_source/practical_guides/annotate_dataset.md
@@ -90,7 +90,7 @@ You can track your progress and the number of `Pending`, `Draft`, `Submitted` an

In Argilla's Feedback Task datasets, you can annotate and process records in two ways:

- - **Focus view**: you can only see, respond and perfom actions on one record at a time. This is better for records that need to be examined closely and individually before responding.
+ - **Focus view**: you can only see, respond and perform actions on one record at a time. This is better for records that need to be examined closely and individually before responding.
- **Bulk view**: you can see multiple records in a list so you can respond and perform actions on more than one record at a time. This is useful for actions that can be taken on many records that have similar characteristics e.g., apply the same label to the results of a similarity search, discard all records in a specific language or save/submit records with a suggestion score over a safe threshold.

```{hint}
@@ -105,7 +105,7 @@ If you have a Span question in your dataset, you can always answer other questio

In the queue of **Pending** records, you can change from _Focus_ to _Bulk_ view. Once in the _Bulk view_, you can expand or collapse records --i.e. see the full length of all records in the page or set a fixed height-- and select the number of records you want to see per page.

- To select or unselect all records in the page, click on the checkbox above the record list. To select or unselect specific records, click on the checkbox inside the individual record card. When you use filters inside the bulk view and the results are higher than the records visible in the page but lower than 1000, you will also have the option to select all of the results after you click on the checkbox. You can cancel this selection clicking on the _Cancel_ button.
+ To select or unselect all records in the page, click on the checkbox above the record list. To select or unselect specific records, click on the checkbox inside the individual record card. When you use filters inside the bulk view and the results are higher than the records visible in the page but lower than 1000, you will also have the option to select all of the results after you click on the checkbox. You can cancel this selection by clicking on the _Cancel_ button.

Once records are selected, choose the responses that apply to all selected records (if any) and do the desired action: _Discard_, _Save as draft_ or even _Submit_. Note that you can only submit the records if all required questions have been answered.

@@ -169,7 +169,7 @@ Not all filters listed below are available for all tasks.

##### Predictions filter

- This filter allows you to filter records with respect of their predictions:
+ This filter allows you to filter records with respect to their predictions:

- **Predicted as**: filter records by their predicted labels.
- **Predicted ok**: filter records whose predictions do, or do not, match the annotations.
@@ -291,4 +291,4 @@ If you struggle to increase the overall coverage, try to filter for the records
#### Manage rules

Here you will see a list of your saved rules.
You can edit a rule by clicking on its name, or delete it by clicking on the trash icon.
4 changes: 2 additions & 2 deletions docs/_source/practical_guides/collect_responses.md
@@ -183,7 +183,7 @@ We plan on adding more support for other metrics so feel free to reach out on ou

#### Model Metrics

- In contrast to agreement metrics, where we compare the responses of annotators with each other, it is a good practice to evaluate the suggestions of models against the annotators as ground truths. As `FeedbackDataset` already offers the possibility to add `suggestions` to the responses, we can compare these initial predictions against the verified reponses. This will give us two important insights: how reliable the responses of a given annotator are, and how good the suggestions we are giving to the annotators are. This way, we can take action to improve the quality of the responses by making changes to the guidelines or the structure, and the suggestions given to the annotators by changing or updating the model we use. Note that each question type has a different set of metrics available.
+ In contrast to agreement metrics, where we compare the responses of annotators with each other, it is a good practice to evaluate the suggestions of models against the annotators as ground truths. As `FeedbackDataset` already offers the possibility to add `suggestions` to the responses, we can compare these initial predictions against the verified responses. This will give us two important insights: how reliable the responses of a given annotator are, and how good the suggestions we are giving to the annotators are. This way, we can take action to improve the quality of the responses by making changes to the guidelines or the structure, and the suggestions given to the annotators by changing or updating the model we use. Note that each question type has a different set of metrics available.

Here is an example use of the `compute` function to calculate the metrics for a `FeedbackDataset`:
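A rough sketch of what such a call might look like (the `compute_model_metrics` method name, the metric names, and the question name below are assumptions, not taken from this page):

```python
import argilla as rg

# Load the annotated dataset from Argilla (dataset and workspace names are placeholders).
dataset = rg.FeedbackDataset.from_argilla("my-dataset", workspace="my-workspace")

# Compare model suggestions against annotator responses for one question.
# NOTE: the method name and metric names are assumptions about the Argilla 1.x metrics API.
model_metrics = dataset.compute_model_metrics(
    question_name="quality",
    metric_names=["accuracy", "f1-score"],
)
print(model_metrics)
```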

@@ -495,4 +495,4 @@ f1(name="sst2").visualize()
# now compute metrics for negation ( -> negative precision and positive recall go down)
f1(name="sst2", query="n't OR not").visualize()
```
![F1 metrics from query](/_static/images/guides/metrics/negation_f1.png)
4 changes: 2 additions & 2 deletions docs/_source/practical_guides/export_dataset.md
@@ -20,7 +20,7 @@ remote_dataset = rg.FeedbackDataset.from_argilla("my-dataset", workspace="my-wor
local_dataset = remote_dataset.pull(max_records=100) # get first 100 records
```

- If your dataset includes vectors, by default these will **not** get pulled with the rest of the dataset in order to improve performace. If you would like to pull the vectors in your records, you will need to specify it like so:
+ If your dataset includes vectors, by default these will **not** get pulled with the rest of the dataset in order to improve performance. If you would like to pull the vectors in your records, you will need to specify it like so:
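A rough sketch in Python (the `with_vectors` argument and its accepted values are assumptions, not confirmed by this page):

```python
import argilla as rg

remote_dataset = rg.FeedbackDataset.from_argilla("my-dataset", workspace="my-workspace")

# Pull records together with their vectors; "all" is assumed to request every
# configured vector setting (a list of vector names may also be accepted).
local_dataset = remote_dataset.pull(max_records=100, with_vectors="all")
```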

::::{tab-set}

@@ -204,4 +204,4 @@ df = dataset_rg.to_pandas()
df.to_csv("my_dataset.csv") # Save as CSV
df.to_json("my_dataset.json") # Save as JSON
df.to_parquet("my_dataset.parquet") # Save as Parquet
```
4 changes: 2 additions & 2 deletions docs/_source/practical_guides/fine_tune.md
@@ -533,7 +533,7 @@ task = TrainingTask.for_sentence_similarity(
)
```

- For datasets that where annotated with numerical values we could also pass the label strategy we want to use (let's assume we have another question in the dataset named "other-question" that contains values that come from rated answers):
+ For datasets that were annotated with numerical values we could also pass the label strategy we want to use (let's assume we have another question in the dataset named "other-question" that contains values that come from rated answers):

```python
task = TrainingTask.for_sentence_similarity(
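    # Continuation sketch: the argument names and values below are assumptions
    # (not taken from this page) meant to illustrate passing a label strategy.
    texts=[dataset.field_by_name("premise"), dataset.field_by_name("hypothesis")],
    label=dataset.question_by_name("other-question"),
    label_strategy="majority",
)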
@@ -1547,4 +1547,4 @@ Options:
--update-config-kwargs TEXT update_config() kwargs to be passed as a dictionary. [default: {}]
--help Show this message and exit.

```
@@ -23,7 +23,7 @@
"\n",
"The basic idea is to use a pre-trained model to generate a vector representation for each relevant `TextFields` within the records. These vectors are then indexed within our database and can then be used to search based on the similarity between texts. This should be useful for searching similar records based on the semantic meaning of the text.\n",
"\n",
- "To get the these vectors and config, we will use the `SentenceTransformersExtractor` based on the [sentence-transformers](https://www.sbert.net/index.html) library. The default model we use for this is the [TaylorAI/bge-micro-v2](https://huggingface.co/TaylorAI/bge-micro-v2), which offers a nice trade-off between speed and accuracy, but you can use any model from the [sentence-transformers](https://www.sbert.net/index.html) library or from the [Hugging Face Hub](https://huggingface.co/models?library=sentence-transformers)."
+ "To get these vectors and config, we will use the `SentenceTransformersExtractor` based on the [sentence-transformers](https://www.sbert.net/index.html) library. The default model we use for this is the [TaylorAI/bge-micro-v2](https://huggingface.co/TaylorAI/bge-micro-v2), which offers a nice trade-off between speed and accuracy, but you can use any model from the [sentence-transformers](https://www.sbert.net/index.html) library or from the [Hugging Face Hub](https://huggingface.co/models?library=sentence-transformers)."
]
},
{
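To make the extractor usage concrete, a minimal sketch (the import path and method names are assumptions about the Argilla 1.x integration, not quotes from this notebook):

```python
import argilla as rg
from argilla.client.feedback.integrations.sentencetransformers import (
    SentenceTransformersExtractor,
)

dataset = rg.FeedbackDataset.from_argilla("my-dataset", workspace="my-workspace")

# Compute vectors for the relevant text fields with the default model and
# attach both the vector settings and the vectors to the dataset.
ste = SentenceTransformersExtractor(model="TaylorAI/bge-micro-v2")
dataset = ste.update_dataset(dataset)
```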
