feat: streamlit page transcription
leoguillaumegouv committed Nov 7, 2024
1 parent ee92568 commit d3e7afa
Showing 11 changed files with 394 additions and 44 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -15,6 +15,7 @@ All notable changes to the application are documented in this file.
## [Alpha] - 2024-11-05

- 🎉 Added the POST `/audio/transcriptions` endpoint for audio transcription
- 🎉 Added an audio transcription page to the UI

## [Alpha] - 2024-10-23

30 changes: 24 additions & 6 deletions CONTRIBUTING.md
@@ -2,9 +2,11 @@

To contribute to the project, please follow the instructions below.

> ⚠️ **Warning**: You need access to a language model API and an embeddings API to run the API locally.
# Commit

Please follow this convention for your commits:
Please follow this convention for your commits:

```
[doc|feat|fix](*) commit object (in english)
@@ -17,25 +19,41 @@ feat(collections): collection name retriever

# Packages

1. Install [libmagic](https://man7.org/linux/man-pages/man3/libmagic.3.html)

2. In a Python virtual environment, install the Python packages listed in *[pyproject.toml](./pyproject.toml)*
1. In a Python virtual environment, install the Python packages listed in *[pyproject.toml](./pyproject.toml)*

```bash
pip install ".[ui,app,dev,test]"
pre-commit install
```

# Tests
# Launching the services

Before each pull request, please verify that your API deploys correctly by running the unit tests.
For more information on deploying the services, see the [dedicated documentation](./docs/deployment.md).

## API (FastAPI)

1. After creating a *config.yml* file, launch the API locally

```bash
uvicorn app.main:app --port 8080 --log-level debug --reload
```

## User interface (Streamlit)

1. Launch the API locally (see the [Launching the API](#lancement-de-l-api) section)

2. Launch the UI locally
```bash
python -m streamlit run ui/chat.py --server.port 8501 --browser.gatherUsageStats false --theme.base light
```
# Tests
Before each pull request, please verify that your API deploys correctly by running the unit tests.
1. Launch the API locally (see the [Launching the API](#lancement-de-l-api) section)
2. Run the unit tests
```bash
3 changes: 1 addition & 2 deletions README.md
@@ -2,8 +2,7 @@
<summary><h1>Albert API</h1></summary>

![](https://img.shields.io/badge/version-alpha-yellow) ![](https://img.shields.io/badge/Python-3.12-green) ![](https://img.shields.io/badge/vLLM-v0.6.3.post1-blue) ![](https://img.shields.io/badge/HuggingFace%20Text%20Embeddings%20Inference-1.5-red)<br>
<a href="https://albert.api.etalab.gouv.fr/documentation"><b>Documentation</b></a> | <a href="https://github.com/etalab-ia/albert-api/blob/main/CHANGELOG.md"><b>Changelog</b></a> | <a href="https://huggingface.co/AgentPublic"><b>HuggingFace</b></a>
| <a href="https://albert.api.etalab.gouv.fr/swagger"><b>Swagger</b></a> <br><br>
<a href="https://github.com/etalab-ia/albert-api/blob/main/CHANGELOG.md"><b>Changelog</b></a> | <a href="https://albert.api.etalab.gouv.fr/documentation"><b>Documentation</b></a> | <a href="https://albert.api.etalab.gouv.fr/status"><b>Status</b></a> | <a href="https://albert.api.etalab.gouv.fr/swagger"><b>Swagger</b></a> <br><br>
</ul></div>

Albert API is an initiative of [Etalab](https://www.etalab.gouv.fr/). It is an open source generative AI API developed by Etalab. It acts as a proxy between language models and your data. It aggregates the following services:
6 changes: 4 additions & 2 deletions app/endpoints/audio.py
@@ -1,4 +1,4 @@
from typing import List
from typing import List, Literal

from fastapi import APIRouter, Form, Security, Request, UploadFile, File

@@ -8,9 +8,11 @@
from app.utils.security import check_api_key, check_rate_limit, User
from app.utils.lifespan import clients, limiter
from app.utils.exceptions import ModelNotFoundException
from app.utils.variables import SUPPORTED_LANGUAGES


router = APIRouter()
SUPPORTED_LANGUAGES_VALUES = sorted(set(SUPPORTED_LANGUAGES.values())) + sorted(set(SUPPORTED_LANGUAGES.keys()))


@router.post("/audio/transcriptions")
@@ -19,7 +21,7 @@ async def audio_transcriptions(
request: Request,
file: UploadFile = File(...),
model: str = Form(...),
language: str = Form(None),
language: Literal[*SUPPORTED_LANGUAGES_VALUES] = Form("fr"),
prompt: str = Form(None),
response_format: str = Form("json"),
temperature: float = Form(0),
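
The new `language` annotation in this hunk is built from one flat list: the de-duplicated language codes first, then the language names. The sketch below reproduces that construction on a four-entry excerpt of `SUPPORTED_LANGUAGES`. `LanguageLiteral` and `is_supported` are hypothetical names that do not appear in the commit, and `Literal[tuple(...)]` is used here as an equivalent of the star-unpacking form so the snippet also runs on Python versions older than 3.11.

```python
from typing import Literal, get_args

# Excerpt of the mapping added in app/utils/variables.py (language name -> code).
SUPPORTED_LANGUAGES = {"dutch": "nl", "english": "en", "flemish": "nl", "french": "fr"}

# Same construction as in the diff: de-duplicated codes first, then names.
SUPPORTED_LANGUAGES_VALUES = sorted(set(SUPPORTED_LANGUAGES.values())) + sorted(set(SUPPORTED_LANGUAGES.keys()))

# Subscripting with a tuple is equivalent to Literal[*SUPPORTED_LANGUAGES_VALUES].
LanguageLiteral = Literal[tuple(SUPPORTED_LANGUAGES_VALUES)]


def is_supported(language: str) -> bool:
    """Return True if `language` is one of the accepted codes or names (hypothetical helper)."""
    return language in get_args(LanguageLiteral)
```

With this shape, both `"fr"` and `"french"` are accepted values, while anything outside the mapping is rejected at validation time.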
116 changes: 115 additions & 1 deletion app/utils/variables.py
@@ -10,7 +10,121 @@
JSON_TYPE = "application/json"
TXT_TYPE = "text/plain"
HTML_TYPE = "text/html"
# @TODO : add DOCX_TYPE (application/vnd.openxmlformats-officedocument.wordprocessingml.document)
ROLE_LEVEL_0 = 0
ROLE_LEVEL_1 = 1
ROLE_LEVEL_2 = 2
# @TODO : add DOCX_TYPE (application/vnd.openxmlformats-officedocument.wordprocessingml.document)
SUPPORTED_LANGUAGES = {
"afrikaans": "af",
"albanian": "sq",
"amharic": "am",
"arabic": "ar",
"armenian": "hy",
"assamese": "as",
"azerbaijani": "az",
"bashkir": "ba",
"basque": "eu",
"belarusian": "be",
"bengali": "bn",
"bosnian": "bs",
"breton": "br",
"bulgarian": "bg",
"burmese": "my",
"cantonese": "yue",
"castilian": "es",
"catalan": "ca",
"chinese": "zh",
"croatian": "hr",
"czech": "cs",
"danish": "da",
"dutch": "nl",
"english": "en",
"estonian": "et",
"faroese": "fo",
"finnish": "fi",
"flemish": "nl",
"french": "fr",
"galician": "gl",
"georgian": "ka",
"german": "de",
"greek": "el",
"gujarati": "gu",
"haitian": "ht",
"haitian creole": "ht",
"hausa": "ha",
"hawaiian": "haw",
"hebrew": "he",
"hindi": "hi",
"hungarian": "hu",
"icelandic": "is",
"indonesian": "id",
"italian": "it",
"japanese": "ja",
"javanese": "jw",
"kannada": "kn",
"kazakh": "kk",
"khmer": "km",
"korean": "ko",
"lao": "lo",
"latin": "la",
"latvian": "lv",
"letzeburgesch": "lb",
"lingala": "ln",
"lithuanian": "lt",
"luxembourgish": "lb",
"macedonian": "mk",
"malagasy": "mg",
"malay": "ms",
"malayalam": "ml",
"maltese": "mt",
"mandarin": "zh",
"maori": "mi",
"marathi": "mr",
"moldavian": "ro",
"moldovan": "ro",
"mongolian": "mn",
"myanmar": "my",
"nepali": "ne",
"norwegian": "no",
"nynorsk": "nn",
"occitan": "oc",
"panjabi": "pa",
"pashto": "ps",
"persian": "fa",
"polish": "pl",
"portuguese": "pt",
"punjabi": "pa",
"pushto": "ps",
"romanian": "ro",
"russian": "ru",
"sanskrit": "sa",
"serbian": "sr",
"shona": "sn",
"sindhi": "sd",
"sinhala": "si",
"sinhalese": "si",
"slovak": "sk",
"slovenian": "sl",
"somali": "so",
"spanish": "es",
"sundanese": "su",
"swahili": "sw",
"swedish": "sv",
"tagalog": "tl",
"tajik": "tg",
"tamil": "ta",
"tatar": "tt",
"telugu": "te",
"thai": "th",
"tibetan": "bo",
"turkish": "tr",
"turkmen": "tk",
"ukrainian": "uk",
"urdu": "ur",
"uzbek": "uz",
"valencian": "ca",
"vietnamese": "vi",
"welsh": "cy",
"yiddish": "yi",
"yoruba": "yo",
}
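
Since several names map to the same code (for example `castilian` and `spanish` both give `es`), a caller may want to normalise user input to a code before sending it. The following is a minimal sketch under that assumption; `to_language_code` is a hypothetical helper that is not part of this commit, and the dictionary is only an excerpt of the mapping above.

```python
# Excerpt of SUPPORTED_LANGUAGES (language name -> code).
SUPPORTED_LANGUAGES = {
    "castilian": "es",
    "catalan": "ca",
    "french": "fr",
    "spanish": "es",
    "valencian": "ca",
}


def to_language_code(language: str) -> str:
    """Map a language name or code to its code (hypothetical helper)."""
    key = language.strip().lower()
    if key in SUPPORTED_LANGUAGES:
        # A full name such as "french" was given.
        return SUPPORTED_LANGUAGES[key]
    if key in SUPPORTED_LANGUAGES.values():
        # A code such as "fr" was given; return it unchanged.
        return key
    raise ValueError(f"Unsupported language: {language!r}")
```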
5 changes: 3 additions & 2 deletions pyproject.toml
@@ -6,12 +6,13 @@ requires-python = ">=3.12"
license = { text = "MIT" }
dependencies = [
"openai==1.43.0",
"requests==2.32.3",
]

[project.optional-dependencies]
ui = [
"requests==2.32.3",
"streamlit==1.38.0",
"streamlit==1.39.0",
"streamlit-extras==0.5.0",
]
app = [
"langchain==0.2.15",
41 changes: 25 additions & 16 deletions ui/chat.py
@@ -4,6 +4,7 @@
from openai import OpenAI
import requests
import streamlit as st
from streamlit_extras.stylable_container import stylable_container

from config import BASE_URL
from utils import get_collections, get_models, header, set_config
@@ -14,7 +15,7 @@

# Data
try:
language_models, embeddings_models = get_models(api_key=API_KEY)
language_models, embeddings_models, _ = get_models(api_key=API_KEY)
collections = get_collections(api_key=API_KEY)
except Exception:
st.error("Error to fetch user data.")
@@ -25,11 +26,10 @@

# Sidebar
with st.sidebar:
st.title("Model parameters")
params = {"sampling_params": dict(), "rag": dict()}

st.title("Chat parameters")
params["sampling_params"]["model"] = st.selectbox("Language model", language_models)
params["sampling_params"]["temperature"] = st.number_input("Temperature", value=0.1)
params["sampling_params"]["temperature"] = st.slider("Temperature", value=0.2, min_value=0.0, max_value=1.0, step=0.1)
params["sampling_params"]["max_tokens"] = st.number_input("Max tokens (optional)", value=400)

st.title("RAG parameters")
@@ -43,18 +43,27 @@
params["rag"]["k"] = st.number_input("Top K", value=3)

# Main
col1, col2 = st.columns([0.85, 0.15])
with col1:
new_chat = st.button("New chat")
with col2:
if model_collections:
rag = st.toggle("Activated RAG", value=False, disabled=not bool(params["rag"]["collections"]))
else:
rag = st.toggle("Activated RAG", value=False, disabled=True)
if new_chat:
st.session_state.pop("messages", None)
st.session_state.pop("sources", None)
st.rerun()
with stylable_container(
key="Chat",
css_styles="""
button{
float: right;
}
""",
):
col1, col2 = st.columns(2)
with col2:
new_chat = st.button("New chat")
with col1:
if model_collections:
rag = st.toggle("Activated RAG", value=True, disabled=not bool(params["rag"]["collections"]))
else:
rag = st.toggle("Activated RAG", value=False, disabled=True)

if new_chat:
st.session_state.pop("messages", None)
st.session_state.pop("sources", None)
st.rerun()

if "messages" not in st.session_state:
st.session_state.messages = []
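
The "New chat" handling in this hunk follows a reset-then-reinitialise pattern: the chat keys are removed, the script reruns, and `messages` is recreated as an empty list. The sketch below shows that pattern in isolation, assuming a plain `dict` as a stand-in for `st.session_state`. `reset_chat` and `ensure_messages` are hypothetical names, and the `st.rerun()` call from the diff is deliberately left out.

```python
from typing import Any, MutableMapping


def reset_chat(session_state: MutableMapping[str, Any]) -> None:
    """Drop the chat-related keys; the None default makes a missing key harmless."""
    session_state.pop("messages", None)
    session_state.pop("sources", None)


def ensure_messages(session_state: MutableMapping[str, Any]) -> list:
    """Create an empty message list on first use and return it."""
    if "messages" not in session_state:
        session_state["messages"] = []
    return session_state["messages"]
```

Passing a default to `pop` is what lets the reset run safely on a fresh session where neither key exists yet.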