Gemini implementation #216

NikitaVrnv · 2024-12-17T05:48:06Z

This is an unbelievably good OCR service, just take a look @MLhotak @mduda100871 @DanielaIwashita

NikitaVrnv · 2024-12-17T05:51:33Z

The advantages are:

Multi-language support;
Low cost;
Ability to segment letter metadata;
AI-based: it is the latest Google model, and in my view the best one;
Support for any handwritten letter, in any language.
Thanks ;)

NikitaVrnv · 2024-12-17T06:02:06Z

Pricing Table (Estimated):

Prices are in US dollars ($). Please note that these are only estimates and should be verified against the official documentation. The approximation is based on the gemini-1.5-flash-latest model, which we have been using in the code.

| Feature/Component | Unit | Approximate Price | Notes |
| --- | --- | --- | --- |
| Gemini 1.5 Flash Input (Text) | 1,000 tokens | ~$0.0003 - $0.0007 | Depends on the size of the prompt, i.e. the text sent to the API, including instructions. |
| Gemini 1.5 Flash Input (Image) | 1 image | ~$0.0007 - $0.002 | Depends on the size of the image, since images are converted into tokens. This is the image of the letter being scanned. |
| Gemini 1.5 Flash Output (Text) | 1,000 tokens | ~$0.0012 | Covers the text generated by Gemini: the summary, metadata and full text. |
| Gemini 1.5 Pro Input (Text) | 1,000 tokens | ~$0.00035 | Same as the Flash text input, if you decide to use this model. |
| Gemini 1.5 Pro Input (Image) | 1 image | ~$0.0015 | Same as the Flash image input, if you decide to use this model. |
| Gemini 1.5 Pro Output (Text) | 1,000 tokens | ~$0.0015 | Same as the Flash text output, if you decide to use this model. |
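To make the table easier to apply, here is a rough sketch of how the Flash figures combine into a per-letter cost. The prices are copied from the estimates above; the token counts (500 prompt tokens, 1,000 output tokens) and the function name `estimate_letter_cost` are illustrative assumptions, not measured values.

```python
# Estimated Gemini 1.5 Flash prices in USD, copied from the table above.
# Each entry is a (low, high) range; these are estimates, not official prices.
FLASH_PRICES = {
    "input_text_per_1k_tokens": (0.0003, 0.0007),
    "input_image_per_image": (0.0007, 0.002),
    "output_text_per_1k_tokens": (0.0012, 0.0012),
}


def estimate_letter_cost(prompt_tokens, output_tokens, prices=FLASH_PRICES):
    """Return the (low, high) estimated USD cost of processing one letter image."""
    bounds = []
    for i in (0, 1):  # 0 = low end of each range, 1 = high end
        text_in = prompt_tokens / 1000 * prices["input_text_per_1k_tokens"][i]
        image_in = prices["input_image_per_image"][i]  # one image per letter
        text_out = output_tokens / 1000 * prices["output_text_per_1k_tokens"][i]
        bounds.append(text_in + image_in + text_out)
    return bounds[0], bounds[1]


if __name__ == "__main__":
    # Assumed workload: a 500-token prompt and about 1,000 output tokens per letter.
    low, high = estimate_letter_cost(prompt_tokens=500, output_tokens=1000)
    print(f"Per letter:     ${low:.5f} - ${high:.5f}")
    print(f"10,000 letters: ${low * 10_000:.2f} - ${high * 10_000:.2f}")
```

With these assumed inputs the estimate comes to roughly $0.002 to $0.0036 per letter, so the image and the generated text dominate the cost rather than the prompt.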

Assumptions:

Handwritten Letter: A typical scanned handwritten letter is a JPEG or PNG image containing approximately 100-200 words. A full A4 page might contain approximately 200-300 words, which translates into approximately 1,000 tokens or more.
Gemini Task: The Gemini API will perform the OCR, language detection, meta
@MLhotak @mduda100871
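For reference, the task described above (one scanned letter plus instructions sent to the model) could be packaged roughly like this. It is only a sketch: the prompt wording and the helper name `build_ocr_request` are hypothetical, and the request shape assumes the public `generateContent` REST endpoint, so please check it against the official documentation before relying on it. Nothing is sent over the network here.

```python
import base64
import json

# Model name taken from this thread; the endpoint format is an assumption
# based on the public Gemini REST API and should be verified.
MODEL = "gemini-1.5-flash-latest"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)

# Hypothetical prompt wording, not the prompt used in the project.
PROMPT = (
    "Transcribe the handwritten letter in this image. "
    "Detect its language and return a summary, metadata and the full text."
)


def build_ocr_request(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build the JSON body for one letter: the instructions plus the image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": PROMPT},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real scan read from disk.
    body = build_ocr_request(b"\xff\xd8\xff", mime_type="image/jpeg")
    print(ENDPOINT)
    print(json.dumps(body)[:100] + "...")
```

The body would then be POSTed to `ENDPOINT` with an API key. Keeping the payload builder separate from the HTTP call makes it easy to test without spending any quota.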

EliskaMullerova · 2024-12-27T19:00:00Z

@NikitaVrnv

;-)

NikitaVrnv added the enhancement New feature or request label Dec 17, 2024

NikitaVrnv self-assigned this Dec 17, 2024

NikitaVrnv changed the title ~~Gemini simplementation~~ Gemini implementation Dec 17, 2024

NikitaVrnv closed this as completed Jan 7, 2025
