From 2f7d2d826d600e06bb95a2ea7ce7a7507148c6d2 Mon Sep 17 00:00:00 2001
From: gramirez-prompsit <32385845+gramirez-prompsit@users.noreply.github.com>
Date: Tue, 29 Aug 2023 15:36:09 +0200
Subject: [PATCH] Update README.md: add bad encoding to noise distribution rules

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index a7b9a7d4..66905302 100644
--- a/README.md
+++ b/README.md
@@ -35,7 +35,7 @@ Code and data are located in `/work`
 - Sentence length distribution: tokens per sentence for each language, showing total, unique and duplicate sentences.
 - Language distribution: shows percentage of automatically identified languages.
 - Quality Score distribution: as per language models (monolingual) or bicleaner scores (tool that computes the likelihood of two sentences of being mutual translations)
-- Noise distribution: the result of applying hard rules and computing which percentage is affected by them (too short or too long sentences, sentences being URLs, sentences containing poor language, etc.)
+- Noise distribution: the percentage of sentences affected by hard rules (sentences that are too short or too long, sentences that are URLs, bad encoding, sentences containing poor language, etc.)
 - Common n-grams: 1-5 more frequent n-grams
 
 - MORE TO BE ADDED, SUGGESTIONS WELCOME!