- Clone the repository
- Navigate to the project folder
- Build the Docker container: `bash deploy/build.sh`
- Run the curation and translation job: `bash deploy/run.sh`
which uses the following:
  - `publisher.py`: Main job control loop
  - `sources.json`: Source site configuration
  - `browser.py`: Extracts text from sites
  - `finder.txt`: Prompt for finding articles to translate
  - `summarizer.txt`: Prompt for summarizing and ranking articles
  - `llm.py`: Handles LLM connections and formatting
  - `templater.py`: Creates HTML and deploys to AWS S3
- Output appears in the `debug` folder (can also push to S3 with AWS credentials).
- Access logs in the `tt-logs` Docker volume (NOTE: this directory might be different on your machine; run `docker volume inspect tt-logs` to confirm): `less /var/lib/docker/volumes/tt-logs/_data/publisher.log`
- Or, follow logs in real time: `tail -f /var/lib/docker/volumes/tt-logs/_data/publisher.log`
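Because the volume path can differ between machines, the lookup can be scripted. The following is a minimal sketch and not part of the repository: it parses the JSON array that `docker volume inspect tt-logs` prints and reads its `Mountpoint` field. The helper names `log_path_from_inspect` and `find_log_path` are hypothetical, and the `publisher.log` file name is taken from the paths above.

```python
import json
import subprocess
from pathlib import PurePosixPath


def log_path_from_inspect(inspect_json: str, log_name: str = "publisher.log") -> str:
    """Build the log file path from the JSON printed by `docker volume inspect`."""
    volumes = json.loads(inspect_json)  # Docker prints a JSON array, one object per volume
    if not volumes:
        raise ValueError("no volume found in inspect output")
    mountpoint = volumes[0]["Mountpoint"]  # host directory backing the volume
    return str(PurePosixPath(mountpoint) / log_name)


def find_log_path(volume: str = "tt-logs") -> str:
    """Ask the docker CLI where the volume lives (requires docker on PATH)."""
    result = subprocess.run(
        ["docker", "volume", "inspect", volume],
        check=True,
        capture_output=True,
        text=True,
    )
    return log_path_from_inspect(result.stdout)
```

Calling `find_log_path()` on the host returns the path to pass to `less` or `tail -f`. Reading files under `/var/lib/docker` usually requires root privileges.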
TranslateTribune uses various AI APIs, but can also run 100% locally via `open-mixtral-8x7b` or other open models of similar quality.
To change models, edit `sources_debug.json` (for local testing), `sources.json`, or `sources_finance_technology.json`.
See `llm.py` for the list of supported models.
TT (Translate Tribune) uses approximately 10 million tokens per day for article curation and summarization (see `finder.txt` and `summarizer.txt` for prompt details). Claude 3 Haiku generally outperforms other models at these tasks across all languages and, at $0.25 per million tokens, would cost only around $2.50 per day to publish from roughly 30 sources into 19 languages. However, Anthropic's closed models and cumbersome API access approval process pose significant challenges.
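The daily cost figure follows directly from the token volume and the per-token price. As a quick check, here is a minimal sketch of that arithmetic; the function name `daily_cost_usd` is hypothetical, and it assumes a single flat rate per million tokens, as the text above does, rather than separate input and output prices.

```python
def daily_cost_usd(tokens_per_day: int, usd_per_million_tokens: float) -> float:
    """Estimate daily spend from token volume and a flat per-million-token price."""
    return tokens_per_day / 1_000_000 * usd_per_million_tokens


# Figures from the text: about 10 million tokens per day at $0.25 per million tokens.
cost = daily_cost_usd(10_000_000, 0.25)
print(f"${cost:.2f} per day")  # prints "$2.50 per day"
```

Swapping in another provider's price per million tokens gives a comparable estimate for that model.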
As strong advocates for free and open software, we strive to use it whenever possible. With this philosophy in mind, TT is designed to be model-agnostic and supports various popular model providers. Moreover, TT has been rigorously tested using exclusively free and open models, such as `open-mixtral-8x7b`, enabling it to run on consumer hardware from anywhere in the world without requiring approval from any company or government.
While we do not condone copyright infringement, we believe that our approach of consistently translating, summarizing, providing links to source material, and maintaining transparency through our free and open codebase places us on the right side of the law and any ethical debates. However, we also recognize that ethics can be subjective and often nonsensical. We refuse to be held hostage by any company or government's half-baked ethical theories regarding our work. Instead, we remain committed to our mission of providing accessible and open language translation and summarization services.
- Mistral AI (Usage) 🌬️
  - Very good at European languages 🇪🇺
  - Free and open-source models available 🆓
- Anthropic (Usage) 🤖
  - Free evaluation period 🎉
  - Very good at Asian languages
  - Cheapest acceptable model available via API, at USD 0.25 per 1M tokens
  - Annoying application/approval process 😒
- OpenAI (Usage) 🧠
  - $50 in free credits 💰
- together.ai (Usage) 🤝
  - $25 in free credits 💸
  - Free and open-source models available 🆓
- cohere (Usage) 🧩
  - $25 in free credits 💳
  - Annoying application/approval process 😕
- Not Diamond (Usage) 💎
  - First 100,000 in query routing free 🎁