GitHub - MITTALBHAVYA/InvoiceDetailsExtractor: Invoice Extraction Application is a Python-based tool built with Streamlit for extracting and processing invoice details from PDFs and images. It uses OCR via PaddleOCR and Generative AI with Google's Gemini API to provide structured data, including customer details, product information, and total amounts

# Invoice Extraction Application

## Overview

The Invoice Extraction Application is a Python tool built with Streamlit that extracts and processes invoice details from PDFs and images. It uses Optical Character Recognition (OCR) via PaddleOCR and Generative AI via Google's Gemini API to produce structured data from invoices.

## Features

- **File Upload**: Supports uploading of PDF and image files.
- **Extraction Methods**: Offers two methods of extraction:
  - **Direct Extraction (Image-based)**: Processes images directly for extraction.
  - **Text Extraction (Text-based)**: Extracts text from PDFs and then processes the text.
- **Structured Output**: Provides extracted invoice details in a well-organized JSON format.
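
As a rough illustration of how the two extraction methods relate to file types, the sketch below suggests a method from a file name. The helper `suggest_extraction_method` and its return labels are hypothetical and not taken from the project's code; the extension sets mirror the supported formats listed under Usage.

```python
from pathlib import Path

# Extensions mirror the supported formats listed under Usage.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff"}
PDF_EXTENSIONS = {".pdf"}


def suggest_extraction_method(filename: str) -> str:
    """Suggest an extraction method for an uploaded file name.

    Hypothetical helper: the labels correspond to the two methods
    described above, but this function is not part of the project.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "direct"  # Direct Extraction (Image-based)
    if suffix in PDF_EXTENSIONS:
        return "text"  # Text Extraction (Text-based)
    raise ValueError(f"Unsupported file type: {suffix or filename!r}")


if __name__ == "__main__":
    for name in ("invoice2.png", "statement.PDF"):
        print(name, "->", suggest_extraction_method(name))
```

The comparison is case-insensitive, so `statement.PDF` is treated like `statement.pdf`. Unsupported types raise an error rather than silently falling back to one of the methods.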

## Requirements

Ensure you have Python 3.7+ installed. Create a virtual environment and install the necessary packages listed in `requirements.txt`:

```bash
pip install -r requirements.txt
```

## Setup

### Clone the Repository

```bash
git clone https://github.com/MITTALBHAVYA/InvoiceDetailsExtractor
cd InvoiceDetailsExtractor
```

### Set Up Environment Variables

Create a `.env` file in the project root directory with the following content:

```
GEMINI_API_KEY=your_api_key_here
```

Replace `your_api_key_here` with your actual API key.

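
For reference, here is a minimal sketch of how a Python program could read this key at startup. It uses only the standard library as a stand-in for a dotenv-style loader; the function names `load_env_file` and `get_gemini_api_key` are hypothetical, and the project itself may load the key differently.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Hypothetical stand-in for a dotenv-style loader. Variables that are
    already set in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        os.environ.setdefault(key.strip(), value.strip().strip("\"'"))


def get_gemini_api_key() -> str:
    """Return GEMINI_API_KEY, or raise a clear error if it is missing."""
    key = os.environ.get("GEMINI_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file.")
    return key
```

Failing early with a clear message when the placeholder value is still in place makes a misconfigured `.env` file easier to diagnose than a rejected API call later on.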
### Run the Application

Start the Streamlit app:

```bash
streamlit run app.py
```

This will launch the application in your default web browser.

## Usage

### Upload a File

Use the file uploader to choose an invoice file. Supported formats include PDF and common image formats (PNG, JPG, JPEG, GIF, BMP, TIFF).

### Select Extraction Method

Choose between:
- Direct Extraction (Image-based): Suitable for image files.
- Text Extraction (Text-based): Suitable for PDF files.

### Process the File

Click the "Process" button to start the extraction. The application will process the file and display the extracted details.

### View Results

The extracted details are displayed in a structured format: customer details, product information, and the total amount extracted from the invoice.
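
To give a sense of what such structured output could look like, the sketch below builds a made-up invoice record and checks that its line items add up to the stated total. The field names (`customer_details`, `products`, `total_amount`) and all values are assumptions for illustration; the application's actual JSON keys may differ.

```python
import json

# Made-up example record; the field names are assumptions, not the app's schema.
invoice = {
    "customer_details": {"name": "Example Customer", "address": "123 Example Street"},
    "products": [
        {"description": "Widget", "quantity": 2, "unit_price": 10.0},
        {"description": "Gadget", "quantity": 1, "unit_price": 5.5},
    ],
    "total_amount": 25.5,
}


def line_items_total(data: dict) -> float:
    """Sum quantity * unit_price over all products, rounded to cents."""
    return round(sum(p["quantity"] * p["unit_price"] for p in data["products"]), 2)


def total_matches(data: dict, tolerance: float = 0.01) -> bool:
    """Check whether the stated total agrees with the line items."""
    return abs(line_items_total(data) - data["total_amount"]) <= tolerance


if __name__ == "__main__":
    print(json.dumps(invoice, indent=2))
    print("Total consistent with line items:", total_matches(invoice))
```

A check like this can flag likely extraction errors, though real invoices often include taxes, discounts, or shipping, so a mismatch is a prompt for review rather than proof of a mistake.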

## Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your changes. Ensure your code adheres to the project's coding standards and includes relevant tests.

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Contact

For any questions or issues, please reach out to:

- Author: Bhavya Mittal
- Email: [email protected]
- GitHub: INVOICE_DETAILS_EXTRACTOR

## Acknowledgments

- PaddleOCR: For Optical Character Recognition.
- PyMuPDF: For PDF text extraction.
- Streamlit: For creating the web application interface.
- Google Generative AI: For AI-powered text extraction.



Folders and files in the repository (6 commits):

- `Sample invoices`
- `convertedfolder`
- `.gitignore`
- `README.md`
- `check_file_type.py`
- `finalProduct.py`
- `front.png`
- `image_prompt_call.py`
- `image_to_text_exractor.py`
- `images_to_pdf.py`
- `invoice2.png`
- `jetpack.jpg`
- `main.py`
- `pdf_to_images.py`
- `requirements.txt`
- `selectablepdftextextraction.py`
- `task.docx`
- `temp_image.png`
- `text_prompt_call.py`
- `trial3.py`
- `workflow.jpg`
