This project reads the text content of PDF files, translates it, and saves the translated text as a formatted PDF file.
Two translation options are available:
- The googletrans Python library for Google Translate (can be installed via pip). To use this option, run:
  python translator.py
- AWS Translate. This option requires an AWS account. Obtain an AWS access key ID and a secret access key, and configure them either as environment variables or in local files (for example ~/.aws/credentials). To use AWS Translate, run:
  python aws_translator.py
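For the AWS option, a minimal sketch of the credential check and the translation call could look like the following. It is not the code from aws_translator.py: the function names, the region "us-east-1", and the use of automatic source-language detection are assumptions made for illustration. The environment variable names are the standard ones boto3 reads.

```python
import os

# Standard environment variables that boto3 reads for credentials.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(env=None):
    """Return the names of credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def translate_with_aws(text, target_lang="lv", region="us-east-1"):
    """Translate text with AWS Translate (hypothetical helper).

    The region is an assumption; use the one configured for your account.
    """
    # Imported lazily so the credential check works without boto3 installed.
    import boto3

    client = boto3.client("translate", region_name=region)
    response = client.translate_text(
        Text=text,
        SourceLanguageCode="auto",  # let AWS detect the source language
        TargetLanguageCode=target_lang,
    )
    return response["TranslatedText"]
```

An empty list from missing_aws_env_vars() means both variables are set. A non-empty list is not necessarily an error, because the credentials may instead come from the local files.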
The following additional Python packages must be installed:
- PyPDF2
- reportlab
- boto3 (AWS Python SDK)
AWS offers a free tier for trying out AWS Translate. The free tier includes 2 million characters per month for the first 12 months. Please note that after that AWS charges $15 per million characters. See the AWS Translate pricing page for more information.
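To get a feel for the pricing, the figures quoted above can be turned into a rough estimate. This is only a sketch: the function name and the free_chars_left parameter are hypothetical, and actual billing is determined by AWS.

```python
FREE_TIER_CHARS = 2_000_000    # free-tier allowance quoted above
PRICE_PER_MILLION_USD = 15.0   # price per million characters quoted above


def estimate_cost_usd(char_count, free_chars_left=FREE_TIER_CHARS):
    """Estimate the charge for translating char_count characters.

    free_chars_left is how much of the free allowance is still unused;
    pass 0 once the free tier no longer applies.
    """
    billable = max(0, char_count - free_chars_left)
    return billable * PRICE_PER_MILLION_USD / 1_000_000


print(estimate_cost_usd(500_000))    # 0.0, still inside the free allowance
print(estimate_cost_usd(3_000_000))  # 15.0, one million billable characters
```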
Please note that there was a breaking change in googletrans. If you get the error
AttributeError: 'NoneType' object has no attribute 'group'
then you need to do 2 things to fix it:
- Change the URL to URL_COM = 'translate.googleapis.com'
- Install version 3.1.0a0 of googletrans, which fixes the issue: pip install googletrans==3.1.0a0
More information about the issue: https://stackoverflow.com/questions/52455774/googletrans-stopped-working-with-error-nonetype-object-has-no-attribute-group#52456197
Before running either script, adjust two settings:
- Set the language you would like to translate to: LANG = "lv" (line 19 in translator.py, line 13 in aws_translator.py).
- Set the input file: in file_name = "example.pdf", replace example.pdf with the name of your PDF file (line 61 in translator.py, line 59 in aws_translator.py).
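As a rough illustration of how the googletrans fix and the LANG setting fit together, the sketch below builds a translator that uses the replacement URL. It is not the code from translator.py: make_translator and translate_text are hypothetical names, and passing the URL through the service_urls argument of googletrans.Translator is an assumption about how URL_COM is used in the script. It also assumes googletrans==3.1.0a0 is installed.

```python
URL_COM = "translate.googleapis.com"  # service URL from the fix above
LANG = "lv"                           # target language code


def make_translator():
    """Build a googletrans client pointed at URL_COM (hypothetical helper)."""
    # Imported lazily so this module loads even when googletrans is missing.
    from googletrans import Translator

    return Translator(service_urls=[URL_COM])


def translate_text(text, dest=LANG, translator=None):
    """Translate text into dest.

    A ready-made translator object can be passed in, which makes the
    function easy to test without network access.
    """
    if translator is None:
        translator = make_translator()
    return translator.translate(text, dest=dest).text
```

With network access, calling translate_text("Good morning") would return the Latvian translation of the phrase.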