Commit 61aa3ca
chore: update documentation format and some key errors
MayorX500 committed Dec 20, 2024 (1 parent: 6ab8edd)
Showing 22 changed files with 405 additions and 280 deletions.
80 changes: 56 additions & 24 deletions README.md
<img width="50" src="./docs/src/images/EEUM_logo_EN.jpg">
</div>

# PI24 - Real-Time Speech Synthesis

**Authors:**

- [Beatriz Monteiro](https://github.com/5ditto)
- [Daniel Du](https://github.com/ddu72)
- [Daniel Furtado](https://github.com/danielfurtado11)
- [Miguel Gomes](https://github.com/MayorX500)
- [Moisés Antunes](https://github.com/MoisesA14)
- [Telmo Maciel](https://github.com/telmomaciel9)

## A more readable version of the documentation can be found [here](https://ddu72.github.io/PI24/)

## Description

This project aims to create a text-to-speech system that can be used in real time. The system is divided into three main components: the client, the proxy, and the server. The client is responsible for sending the text to be synthesized to the proxy. The proxy is responsible for redirecting the requests to the available servers. The server is responsible for normalizing the text, synthesizing it, and sending the audio back to the client.
The project was proposed by the company Agentifai and was developed by a group of students from the University of Minho.

The Text-to-Speech (TTS) system follows a modular pipeline to convert input text or Speech Synthesis Markup Language (SSML) into audio files or real-time streams. Below is a description of the key components and their roles in the process:

1. User Input or SSML File:

   Users can provide natural text or structured SSML files. However, due to limitations in supporting SSML files, this functionality has not been implemented. Currently, the system only processes natural text inputs.

2. SSML Parser:

   (Not implemented) This component was intended to extract relevant text data from SSML files and prepare it for further processing.

3. Normalizer:

   The normalizer standardizes the input text (e.g., expanding abbreviations, handling numbers) to ensure it is ready for phonemization.

4. Phonemizer:

   The phonemizer converts normalized text into phonetic representations, enabling accurate pronunciation during synthesis. This component is also subject to training and evaluation cycles to improve accuracy and performance.

5. TTS Model:

   The text's phonetic representation is passed to the TTS model, which generates audio output. The model has been optimized for both file-based outputs and real-time streaming scenarios.

6. Audio Output:

   The generated audio can be delivered as a downloadable file or streamed directly, depending on user requirements.

The system's modular architecture ensures flexibility, enabling enhancements to individual components without disrupting the entire workflow. While SSML support was initially planned, the current implementation focuses solely on natural text processing.
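
The sketch below is a rough Python illustration of how these stages chain together; all class and method names are placeholders chosen for this example, not the project's actual interfaces.

```python
# Minimal sketch of the pipeline described above. Every name here is an
# illustrative placeholder, not an interface defined in this repository.

class Normalizer:
    def normalize(self, text: str) -> str:
        # Expand abbreviations, spell out numbers, etc.
        return text.replace("Dr.", "Doutor")


class Phonemizer:
    def to_phonemes(self, text: str) -> list[str]:
        # Convert normalized text into a phonetic representation.
        return text.split()  # placeholder: one "phoneme" per word


class TTSModel:
    def synthesize(self, phonemes: list[str]) -> bytes:
        # Generate audio (e.g., WAV bytes) from the phonetic representation.
        return b"\x00" * len(phonemes)  # placeholder audio payload


def text_to_speech(text: str) -> bytes:
    normalizer, phonemizer, tts = Normalizer(), Phonemizer(), TTSModel()
    normalized = normalizer.normalize(text)
    phonemes = phonemizer.to_phonemes(normalized)
    return tts.synthesize(phonemes)


if __name__ == "__main__":
    audio = text_to_speech("O Dr. Silva chega amanhã.")
    print(f"Generated {len(audio)} bytes of placeholder audio")
```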

## Components

### Client

The client is a simple TUI program that sends the text to be synthesized to the proxy. It is implemented in Python.

### Proxy

The proxy receives the text to be synthesized from the client and redirects it to the available servers. It is implemented in Python.

### Server

The server receives the text to be synthesized from the proxy, sends it to be normalized, receives the normalized text, synthesizes it and sends the audio back to the client. It is implemented in Python.

### Normalizer

The normalizer receives the text to be synthesized from the server, normalizes it and sends it back to the server. It is implemented in Python.

### API & Frontend

The API is responsible for receiving the text to be synthesized and returning the audio. The frontend is a simple web interface that allows the user to interact with the API.

## Architecture

The system was implemented using a microservices architecture. Each component is a separate service that communicates with the others using gRPC. Each component is implemented in Python and is dockerized.

![Architecture](./docs/src/images/architecture.png)

The black arrows represent the flow of the text to be synthesized. The blue arrows represent the flow of the audio.

## Requirements

- Python 3.12
- Docker
- Docker Compose

## Installation

### Standalone Program

This allows the user to synthesize text using the Intlex Module. This version is a standalone (single-service) version of the implementation.

#### Steps

1. Install the requirements: `pip install -r enviroments/server_requirements.txt`
2. Run the program: `python intlex.py [TEXT] [CONFIG] --output [OUTPUT] --lang [LANG] --kwargs [KWARGS]`
3. The output will be saved in the output file if provided, otherwise it will be stored in the default output file.

##### Arguments

- `TEXT`: Text to be synthesized
- `CONFIG`: Configuration file
- `OUTPUT`: Output file (optional)
- `LANG`: Language [pt, en] (optional)
- `KWARGS`: Additional arguments (optional)
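
For example (the text, configuration file, and output file names are illustrative): `python intlex.py "Olá, bom dia" config.json --output audio.wav --lang pt`.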

### Docker

This allows the user to synthesize text using the Intlex Program. This version is a dockerized version of the implementation. It uses a microservices architecture.

#### Steps

1. Build the docker images:

   `docker compose build`

2. Initialize proxy and required services:

   `docker compose up proxy -d`

3. Initialize the client:

   `docker compose run -e PROXY_SERVER_PORT={PROXY_SERVER_PORT} -e PROXY_SERVER_ADDRESS={PROXY_SERVER_ADDRESS} client`

4. The output will be displayed in the terminal.
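
For example, assuming the proxy service is reachable inside the compose network at address `proxy` on port `50051` (both values are illustrative), the client step above could be run as: `docker compose run -e PROXY_SERVER_PORT=50051 -e PROXY_SERVER_ADDRESS=proxy client`.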

- To stop the services:

  `docker compose down`

## Improvements

##### General

- [TODO] Tests
- [TODO] Documentation
- [TODO] More languages

##### Client

- [TODO] Voice option in client
- [TODO] Better user interface

##### Proxy

- [FIX] Prints in proxy
- [TODO] Logfile

##### Server

- [TODO] Logfile

##### Normalizer

- [TODO] Logfile

##### API & Frontend

- [FIX] Not connecting using Docker
Binary file removed docs/README.pdf
16 changes: 11 additions & 5 deletions docs/src/app_flow.md
# Flow

The system, as designed, is composed of several components, each responsible for a specific task.

## App Flow

Based on the [architecture diagram](architecture.md#architecture-image), the flow of the system is as follows:

#### Frontend API flow:

- The user interacts with the system through the [Frontend](components/app.md), which sends requests to the `API`.

- The [API](components/app_api.md) processes the requests and sends them to the `Proxy`.

- The [API](components/app_api.md) sends the results back to the `Frontend`, which displays the results to the user.

#### Client flow:

- The user interacts with the system through the [Client](components/app_client.md), which sends requests directly to the `Proxy`.

- The [Proxy](components/app_proxy.md) routes the requests to an available `Server`.

## Data Flow

The flow of data between these components is crucial for the system to function correctly. The following diagram illustrates the flow of data between the components of the system:

![Data Flow](images/data_flow.png)

#### 2. Normalizer:

The input text is sent to the Normalizer, which standardizes it for further processing. For example:

- Expanding abbreviations.
- Converting numbers into words.

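A toy Python illustration of this kind of normalization (the rules and vocabulary below are invented for the example and are far smaller than the real component's):

```python
import re

# Toy normalizer covering only the two examples above; the real component's
# rules and vocabulary are much more extensive.
ABBREVIATIONS = {"Dr.": "Doutor", "Sr.": "Senhor"}
NUMBERS = {"9": "nove", "10": "dez"}

def normalize(text: str) -> str:
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    # Spell out standalone digits; a real normalizer also handles arbitrary
    # numbers, dates, times, and so on.
    return re.sub(r"\b\d+\b", lambda m: NUMBERS.get(m.group(0), m.group(0)), text)

print(normalize("O Dr. Silva chega às 10"))  # -> "O Doutor Silva chega às dez"
```
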
#### 3. TTS Model:

The normalized text is then processed by the TTS Model, which converts the text into audio data. This includes:

- Generating phonetic representations.
- Applying prosody to ensure naturalness.

The audio data is finalized and saved as an output file (e.g., .wav or .mp3).
**Streaming**: The system generates complete audio files and sends them directly to the user, but streaming capabilities could be added in future iterations.

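If streaming were added, one simple approach would be to send the synthesized audio in fixed-size chunks rather than as a single file; the sketch below only illustrates that idea and is not an existing feature.

```python
from typing import Iterator

CHUNK_SIZE = 4096  # bytes per chunk; illustrative value

def stream_audio(audio: bytes) -> Iterator[bytes]:
    """Yield already-synthesized audio in fixed-size chunks instead of one file."""
    for offset in range(0, len(audio), CHUNK_SIZE):
        yield audio[offset:offset + CHUNK_SIZE]
```
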
## Communication

### Main Components Communication

To handle the communication between the main components, the system uses `gRPC` as the communication protocol. This allows for fast and efficient communication between the components, ensuring that the system can handle the real-time requirements of the audio synthesis process.

The use of `gRPC` also allows for a technology-agnostic approach to the system, as it can be used with a wide variety of programming languages and platforms.

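As a rough sketch of what a call between two components might look like in Python, assuming hypothetical generated modules `tts_pb2`/`tts_pb2_grpc` and a `Synthesize` RPC (none of these names are taken from the project's actual `.proto` files):

```python
import grpc

# Hypothetical generated modules; the real proto definitions, service names,
# and message fields used in this project may differ.
import tts_pb2
import tts_pb2_grpc

def request_synthesis(text: str, address: str = "proxy:50051") -> bytes:
    # Open a channel to the proxy and invoke the (hypothetical) Synthesize RPC.
    with grpc.insecure_channel(address) as channel:
        stub = tts_pb2_grpc.TTSStub(channel)
        reply = stub.Synthesize(tts_pb2.SynthesisRequest(text=text, language="pt"))
        return reply.audio
```
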
### Frontend API Communication

To handle the communication between the **Frontend** and the **API**, the system uses `HTTP` as the communication protocol. This allows for easy integration with web-based applications and ensures that the system can be easily accessed by a wide variety of devices.

16 changes: 9 additions & 7 deletions docs/src/architecture.md
The system was designed to be simple, modular, and scalable.

![Architecture](images/architecture.png)

In the diagram above, it is possible to see the main components of the system:

- **Black Arrows**: Represent the flow of "text" between the components.
- **Blue Arrows**: Represent the flow of audio between the components.
- **Components**:
  - [**Frontend**](components/app.md): The user interface, responsible for sending requests to the API and displaying the results.
  - [**API**](components/app_api.md): One of the possible interfaces of the system, responsible for processing requests from the frontend and sending them to the server.
  - [**Client**](components/app_client.md): The client is a Terminal-based interface that allows users to interact with the system without being dependent on the API.
  - [**Proxy**](components/app_proxy.md): The proxy is responsible for routing requests to the server and returning the results to the client and/or API.
  - [**Server**](components/app_server.md): The server is responsible for processing requests from the proxy and generating the audio output.
  - [**Normalizer**](components/app_normalizer.md): The normalizer is responsible for processing the input text and preparing it for synthesis.

## Docker

Each component is encapsulated in a Docker container, allowing for easy deployment and scaling. The provided docker-compose file allows for easy deployment of the system on a single machine.

3 changes: 2 additions & 1 deletion docs/src/closing.md
# Closing Notes

## To whom it may concern

This project is open-source and welcomes contributions from the community.
This project was developed by a team of students from the University of Minho, as part of the curricular unit: 14602 - Informatics Project for the course [Master's in Informatics Engineering](https://www.uminho.pt/EN/education/educational-offer/Cursos-Conferentes-a-Grau/_layouts/15/UMinho.PortalUM.UI/Pages/CatalogoCursoDetail.aspx?itemId=5067&catId=15).

The project was developed under the supervision of [Professor João Miguel Fernandes](https://www.di.uminho.pt/~jmf/) and [Professor Victor Manuel Rodrigues Alves](https://www.di.uminho.pt/~vma/), with a proposal from [Agentifai](https://agentifai.com/) to create a high-quality, low-latency, and modular TTS system for their virtual assistant.

We would also like to thank our Agentifai mentor João Cunha for his guidance and support throughout the project.
13 changes: 7 additions & 6 deletions docs/src/components/README.md
In this section, we will discuss the system components, how they function, and how they interact with each other.

## Components

- [Intlex Module](./app_standalone.md)
- [Frontend](./app.md)
- [API](./app_api.md)
- [Client](./app_client.md)
The components communicate with each other in the following ways:

- With the use of gRPC:
  - `API` <---> `Proxy`
  - `Client` <---> `Proxy`
  - `Proxy` <---> `Server`
  - `Server` <---> `Normalizer`
- With the use of HTTP Requests and Responses:
  - `Frontend` <---> `API`
45 changes: 22 additions & 23 deletions docs/src/components/app.md
The frontend is the user interface of the application.

The frontend requires the following:

- ENV: The environment variables must be defined in the `.env` file or as environment variables. The following variables must be defined:
  - `PORT`: The port on which the frontend will run.
  - `REACT_APP_API_IP_PORT`: The port of the API server.
  - `REACT_APP_API_IP_ADDRESS`: The address of the API server.

- Dependencies: The frontend requires the following dependencies:
  - [Node.js](https://nodejs.org/en/)
  - [npm](https://www.npmjs.com/get-npm)
  - [axios](https://www.npmjs.com/package/axios)
  - [react-audio-player](https://www.npmjs.com/package/react-audio-player)

### Usage

To run the frontend, you can do the following:

1. Define the API address and port in the `.env` file or as environment variables. The port on which the frontend is served can also be defined. For example:

   ```bash
   export PORT=3000
   export REACT_APP_API_IP_PORT=5000
   export REACT_APP_API_IP_ADDRESS={api_address}
   ```

2. Install the required dependencies by running:

   ```bash
   cd app
   npm install
   ```

3. Start the frontend by running:

   ```bash
   cd app
   npm start
   ```

   The frontend will be available at `http://localhost:3000`.

### Communication

The frontend sends a POST request to the API with the text to be synthesized.

```json
{
"text":"O exame de Época Especial realiza-se no dia 10 de Julho, às 9:00, na sala Ed.2 1.03.",
"language":"pt"
"text": "O exame de Época Especial realiza-se no dia 10 de Julho, às 9:00, na sala Ed.2 1.03.",
"language": "pt"
}
```
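
For reference, the same request could be issued directly, e.g. from Python; the port, endpoint path, and response handling below are assumptions for illustration, not documented behaviour of the API:

```python
import requests

# Assumed values: the API port, the "/synthesize" path, and a raw-audio
# response body are illustrative, not confirmed by this documentation.
response = requests.post(
    "http://localhost:5000/synthesize",
    json={
        "text": "O exame de Época Especial realiza-se no dia 10 de Julho, às 9:00, na sala Ed.2 1.03.",
        "language": "pt",
    },
)
response.raise_for_status()

with open("output.wav", "wb") as f:
    f.write(response.content)
```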

The frontend is composed of only one page, which contains the text input and the audio player.
![Frontend Waiting](../images/frontend/frontend_sent.png)

![Frontend With Audio Player](../images/frontend/frontend_audio.png)

