Gemini implementation #216

Closed
NikitaVrnv opened this issue Dec 17, 2024 · 3 comments

Comments

@NikitaVrnv
Collaborator

image
119-2

This is an unbelievable OCR service, just take a look @MLhotak @mduda100871 @DanielaIwashita

@NikitaVrnv NikitaVrnv added the enhancement New feature or request label Dec 17, 2024
@NikitaVrnv NikitaVrnv self-assigned this Dec 17, 2024
@NikitaVrnv NikitaVrnv changed the title Gemini simplementation Gemini implementation Dec 17, 2024
@NikitaVrnv
Collaborator Author

NikitaVrnv commented Dec 17, 2024

The advantages are:

  1. Multi-language support;
  2. Low cost;
  3. Ability to segment the letters' metadata;
  4. AI-AI-AI. The latest Google model. Totally the best.
  5. Thanks ;)
  6. Support for any handwritten letter, in ANY LANGUAGE;
  7. TOTALLY THE BEST.

@NikitaVrnv
Collaborator Author

Pricing Table (Estimated):

Prices are given in US dollars ($). Please note that these are only estimates and should be verified against the official documentation. The approximation is based on the gemini-1.5-flash-latest model, which we have been using for the code.

| Feature/Component | Unit | Approximate Price | Notes |
| --- | --- | --- | --- |
| Gemini 1.5 Flash Input (Text) | 1,000 tokens | ~$0.0003 - $0.0007 | The cost depends on the size of the prompt. This is the text that is sent to the API, including instructions. |
| Gemini 1.5 Flash Input (Image) | 1 image | ~$0.0007 - $0.002 | The cost depends on the size of the image, as they are converted into tokens. This is the image of the letter being scanned. |
| Gemini 1.5 Flash Output (Text) | 1,000 tokens | ~$0.0012 | This includes the text that is generated by Gemini, in the form of the summary, metadata and full text. |
| Gemini 1.5 Pro Input (Text) | 1,000 tokens | ~$0.00035 | The cost depends on the size of the prompt. This is the text that is sent to the API, including instructions. If you decide to use this model. |
| Gemini 1.5 Pro Input (Image) | 1 image | ~$0.0015 | The cost depends on the size of the image, as they are converted into tokens. This is the image of the letter being scanned. If you decide to use this model. |
| Gemini 1.5 Pro Output (Text) | 1,000 tokens | ~$0.0015 | This includes the text that is generated by Gemini, in the form of the summary, metadata and full text. If you decide to use this model. |
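
To make the estimates above easier to play with, here is a rough sketch of the per-letter arithmetic. The prices are the upper-bound figures from the table and are still only estimates. The names `ModelPricing` and `cost_per_letter` and the token counts in the usage lines are illustrative assumptions, not measured values or part of our code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """Estimated prices in USD, copied from the table above (unverified)."""
    input_per_1k_tokens: float
    per_image: float
    output_per_1k_tokens: float


# Upper-bound estimates from the table; check official pricing before relying on them.
FLASH = ModelPricing(input_per_1k_tokens=0.0007, per_image=0.002, output_per_1k_tokens=0.0012)
PRO = ModelPricing(input_per_1k_tokens=0.00035, per_image=0.0015, output_per_1k_tokens=0.0015)


def cost_per_letter(pricing: ModelPricing, prompt_tokens: int,
                    output_tokens: int, images: int = 1) -> float:
    """Estimated cost in USD of processing one scanned letter."""
    text_in = prompt_tokens / 1000 * pricing.input_per_1k_tokens
    image_in = images * pricing.per_image
    text_out = output_tokens / 1000 * pricing.output_per_1k_tokens
    return text_in + image_in + text_out


if __name__ == "__main__":
    # Hypothetical workload: a 500-token prompt, one image, 1,000 output tokens.
    for name, pricing in (("flash", FLASH), ("pro", PRO)):
        per_letter = cost_per_letter(pricing, prompt_tokens=500, output_tokens=1000)
        print(f"{name}: ${per_letter:.5f} per letter, ${per_letter * 10_000:.2f} per 10,000 letters")
```

Swapping in the lower-bound prices or different token counts gives a quick range for a whole collection.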

Assumptions:

  • Handwritten Letter: A typical scanned handwritten letter is a JPEG or PNG image containing approximately 100-200 words. A full A4 page might contain approximately 200-300 words, which translates into approximately 1,000 tokens or more.
  • Gemini Task: The Gemini API will perform the OCR, language detection, meta
    @MLhotak @mduda100871
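
For reference, a single request covering the task described above could look roughly like the sketch below. It assumes the `google-generativeai` Python SDK and Pillow are installed and that the API key is in a `GEMINI_API_KEY` environment variable. The prompt wording and the function names `build_prompt` and `transcribe_letter` are hypothetical and are not the code used in this project.

```python
import os

# Hypothetical instructions; the wording actually used in the project may differ.
BASE_PROMPT = (
    "You are given a scan of a handwritten letter. "
    "1) Transcribe the full text. "
    "2) Detect the language. "
    "3) Extract metadata and write a short summary."
)


def build_prompt(extra_instructions: str = "") -> str:
    """Combine the base instructions with optional extra instructions."""
    return f"{BASE_PROMPT} {extra_instructions}".strip()


def transcribe_letter(image_path: str, model_name: str = "gemini-1.5-flash-latest") -> str:
    """Send one scanned letter to Gemini and return the raw text response."""
    # Imported lazily so the module loads even where the SDK is not installed.
    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel(model_name)
    with Image.open(image_path) as image:
        response = model.generate_content([build_prompt(), image])
    return response.text
```

Calling `transcribe_letter("letter.jpg")` (a placeholder file name) returns free-form text, so any structured metadata would still need to be parsed out of the response.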

@EliskaMullerova

@NikitaVrnv

;-)

1200px-The_Isolator
