
MT API

API for serving machine translation models.

It can run three types of translation systems:

  • ctranslate2 models
  • Certain transformer-based models provided through Hugging Face
  • Custom translators specified as a Python module (experimental)

Model specifications need to go in config.json.

Model configuration

Configuration file syntax

The API configuration file (config.json) specifies the models to load and their pipelines. It is a JSON file containing a dictionary languages and a list models. languages is an optional mapping between language codes (e.g. en) and language names (e.g. English); models lists the model configurations as dictionaries. A minimal example configuration file:

{
  "languages": {
    "es": "Spanish",
    "ca": "Catalan",
    "en": "English"
  },
  "models": [
    {
      "src": "ca",
      "tgt": "es",
      "model_type": "ctranslator2",
      "hugging_face_repo_id": "id...." // repo id to download the model (optional)",
      "model_path": "model...", //model directory name under models
      "src_sentencepiece_model": "spm.model",
      "tgt_sentencepiece_model": "spm.model",
      "sentence_split": "nltk",
      "pipeline": {
        "sentencepiece": true,
        "translate": true
      }
    },
    ...
  ]
}
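As a rough sketch of how such a file could be validated before loading (the required field names here are taken from the example above and are an assumption, not the API's actual schema), a loader might look like this:

```python
import json

# Fields every model entry in the example above carries.
# Illustrative check only, not the API's real validation logic.
REQUIRED_FIELDS = {"src", "tgt", "model_type", "pipeline"}

def load_config(path):
    """Load config.json and check each model entry for required fields."""
    with open(path) as f:
        config = json.load(f)
    for i, model in enumerate(config.get("models", [])):
        missing = REQUIRED_FIELDS - model.keys()
        if missing:
            raise ValueError(f"model #{i} is missing fields: {sorted(missing)}")
    return config
```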

Setup development environment

Set the environment variables:

MT_API_CONFIG=config.json
MODELS_ROOT=./models
MT_API_DEVICE=cpu #or "gpu"
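A minimal sketch of how a service might pick these variables up (the names come from the list above; the fallback defaults are assumptions):

```python
import os

# Read the API's environment variables, falling back to defaults
# when a variable is unset (the defaults here are assumptions).
config_path = os.environ.get("MT_API_CONFIG", "config.json")
models_root = os.environ.get("MODELS_ROOT", "./models")
device = os.environ.get("MT_API_DEVICE", "cpu")  # "cpu" or "gpu"

print(config_path, models_root, device)
```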

Create a virtual environment and run the API:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py [--load es-ca ca-es ...] [--models ./models]

It takes two optional arguments. With --load you can specify the ids of the models to load; if this argument contains all, all the models are loaded. By default it loads es-ca and ca-es.

python main.py --load all

The --models argument, on the other hand, specifies the path to preexisting models; when given, the API uses those models instead of downloading them.

python main.py --models ./path_to_models
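The two flags described above can be modeled with argparse. This is a sketch of the command-line interface, not the actual contents of main.py:

```python
import argparse

# Sketch of the CLI described above: --load takes model ids (defaulting
# to es-ca and ca-es), --models points at a directory of existing models.
parser = argparse.ArgumentParser(description="MT API server (CLI sketch)")
parser.add_argument("--load", nargs="+", default=["es-ca", "ca-es"],
                    help="model ids to load, or 'all' to load everything")
parser.add_argument("--models", default=None,
                    help="path to preexisting models (skips downloading)")

args = parser.parse_args(["--load", "all", "--models", "./models"])
print(args.load, args.models)
```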

Download models (required to load models from a local path)

Create a directory for the models if it does not exist:

mkdir models && cd models

Install Git LFS:

git lfs install

Now you can download any model from this repository MT-MODELS.

The following commands show how to download these models:

  • Ex. 1: download the ca-es model to translate from Catalan to Spanish
git clone https://huggingface.co/projecte-aina/aina-translator-ca-es
  • Ex. 2: download the es-ca model to translate from Spanish to Catalan
git clone https://huggingface.co/projecte-aina/aina-translator-es-ca

Docker

Docker launch from the hub

To launch using the latest version available on Docker Hub:

docker run -p 8000:8000 -v ./models:/app/models projecteaina/mt-api:latest

Check out the documentation available on Docker Hub.

Run offline mode with docker

docker run -p 8000:8000 -e HF_HUB_OFFLINE=True  -v ./models:/app/models projecteaina/mt-api:latest  [--load es-ca ca-es ...]

The --load argument prevents downloading all the models; by default only ca-es and es-ca are loaded. If you need to load all the models, add:

--load all

Deploy with docker-compose

make deploy

To use GPU on docker-compose

Make the following edits to the docker-compose file:

  1. Uncomment the runtime: nvidia line.
  2. Under environment, set MT_API_DEVICE=gpu.
  3. Build and run.
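After those edits, the relevant part of the docker-compose file would look roughly like this (the service name and exact layout are assumptions based on the image and ports used above):

```yaml
services:
  mt-api:
    image: projecteaina/mt-api:latest
    runtime: nvidia        # step 1: uncommented to expose the GPU
    environment:
      - MT_API_DEVICE=gpu  # step 2: switch inference to the GPU
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models
```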

Rest API Endpoints

1. Translate text

Method Endpoint Description
POST /api/v1/translate Translate text.

Request Parameters:

Parameter Type Description
src string Source language code (e.g., "es")
tgt string Target language code (e.g., "ca")
text string Text to translate (e.g., "Hola cómo estás")

2. Translate batch of text

Method Endpoint Description
POST /api/v1/translate/batch Translate batch of texts.

Request Parameters:

Parameter Type Description
src string Source language code (e.g., "es")
tgt string Target language code (e.g., "ca")
texts list List of texts (e.g., ["Hola", "Cómo estás"])

3. Check API health

Method Endpoint Description
POST /health Check that the API is working correctly.

Example calls

cURL

curl --location --request POST 'http://127.0.0.1:8000/api/v1/translate' \
--header 'Content-Type: application/json' \
--data-raw '{"src":"ca", "tgt":"es", "text":"Això es una prova."}'

Python

import httpx
translate_service_url = "http://127.0.0.1:8000/api/v1/translate"
json_data = {"src":"ca", "tgt":"es", "text":"Això es una prova."}
r = httpx.post(translate_service_url, json=json_data)
response = r.json()
print("Translation:", response['translation'])

Using alternative models

You can specify usage of alternative models with the alt parameter in your requests.

cURL

curl --location --request POST 'http://127.0.0.1:8000/api/v1/translate' \
--header 'Content-Type: application/json' \
--data-raw '{"src":"en", "tgt":"fr", "alt":"big", "text":"this is a test."}'

Python

import httpx
translate_service_url = "http://127.0.0.1:8000/api/v1/translate"
json_data = {'src':'en', 'tgt':'fr', 'alt':'big', 'text':"this is a test."}
r = httpx.post(translate_service_url, json=json_data)
response = r.json()
print("Translation:", response['translation'])

Batch translation

Endpoint for translating a list of sentences.

cURL

curl --location --request POST 'http://127.0.0.1:8000/api/v1/translate/batch' \
--header 'Content-Type: application/json' \
--data-raw '{"src":"ca", "tgt":"es", "texts":["This is a sentence", "this is another sentence"]}'

Python

import httpx
translate_service_url = "http://127.0.0.1:8000/api/v1/translate/batch"
json_data = {'src':'fr', 'tgt':'en', 'texts':["This is a sentence", "this is another sentence"]}
r = httpx.post(translate_service_url, json=json_data)
response = r.json()
print("Translation:", response['translation'])

Retrieve languages

Retrieves the list of supported languages and model pairs.

cURL

curl 'http://127.0.0.1:8000/api/v1/translate/'

Python

import httpx
translate_service_url = "http://127.0.0.1:8000/api/v1/translate/"
r = httpx.get(translate_service_url)
response = r.json()
print(response)

Response description

{
  "languages": {
    "ar": "Levantine Arabic",
    "en": "English",
    "fr": "French",
    "swc": "Congolese Swahili",
    "ti": "Tigrinya"
  },
  "models": {
    "fr": {"en": ["fr_en"], "swc": ["fr_swc"]},
    "swc": {"fr": ["swc_fr"]},
    "ti": {"en": ["ti_en"]},
    "en": {"ti": ["en_ti"]},
    "ar": {"en": ["ar_en", "ar_en_domainspecific"]}
  }
}
  • languages: All language codes used in the system and their respective language names.
  • models: Three level dictionary listing the available models with the structure:
    <source-language>
        ↳ <target-language>
            ↳ List of <model-id>s associated with the language pair

Note: The third-level model list exposes the different versions of a model. There is always a default model for the language pair with id <src>_<tgt>, and there may be alternative models with id <src>_<tgt>_<alt-tag>.

For example, in the setup above there are two models in the Arabic-English direction: a default ar_en model and a domain-specific ar_en_domainspecific model, where domainspecific is the alternative model tag. The rest of the language pairs each have only one default model.
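The model-id resolution implied by this structure can be sketched as a plain dictionary lookup. The helper below is illustrative, not part of the API; the models dictionary mirrors the response above:

```python
# Two-level lookup into the "models" dictionary from the response above,
# optionally appending an alternative tag (illustrative helper only).
def resolve_model_id(models, src, tgt, alt=None):
    candidates = models.get(src, {}).get(tgt, [])
    wanted = f"{src}_{tgt}" if alt is None else f"{src}_{tgt}_{alt}"
    return wanted if wanted in candidates else None

models = {
    "fr": {"en": ["fr_en"], "swc": ["fr_swc"]},
    "ar": {"en": ["ar_en", "ar_en_domainspecific"]},
}
print(resolve_model_id(models, "ar", "en"))                    # ar_en
print(resolve_model_id(models, "ar", "en", "domainspecific"))  # ar_en_domainspecific
```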

API testing

To test the API, run the following command in a terminal:

sh run_test.sh

Authors and acknowledgment

Developed by the Language Technologies Unit at the Barcelona Supercomputing Center. The code is based on TWB-MT-fastapi, which is released under the GNU General Public License.

License

GNU General Public License v3.0

Funding

This work is funded by the Generalitat de Catalunya within the framework of Projecte AINA.
