Transform Receipt Images into Structured Data with OCR & AI
# 🚀 Receipts Unlocked
**Transform Receipt Images into Structured Data with OCR & AI**
## 🚀 Features
- 📸 Extract text from receipt images using OCR
- 🤖 AI-powered data structuring
- 📊 Convert receipt data to JSON format
- 🎯 High accuracy text recognition
- ⚡ Fast processing
## 📋 Requirements
- Python 3.x
- OpenCV
- Pytesseract
- OpenAI API key
## 🛠️ Installation
### 1. Install Tesseract OCR
Choose your operating system:
- **Windows**: Download from [UB-Mannheim/tesseract](https://github.com/UB-Mannheim/tesseract/wiki)
- **Linux**: `sudo apt-get install tesseract-ocr`
- **Mac**: `brew install tesseract`

### 2. Set Up Project
- Clone the repository:
  ```bash
  git clone <repository-url>
  cd <repository-directory>
  ```
- Install Python dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Create a `.env` file in the project root and add your OpenAI API key:
  ```
  OPENAI_API_KEY=your_api_key_here
  ```
- Update the Tesseract path in `config.py` to match your system's installation.
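
The two configuration steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual `config.py`: the helper names `load_env_file`, `get_api_key`, and `find_tesseract` are hypothetical, and the sketch assumes the `.env` file contains plain `KEY=VALUE` lines.

```python
import os
import shutil
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(env_file=".env"):
    """Prefer a real environment variable, then fall back to the .env file."""
    return os.environ.get("OPENAI_API_KEY") or load_env_file(env_file).get("OPENAI_API_KEY")


def find_tesseract(configured_path=None):
    """Return a usable Tesseract executable path, or None if nothing is found."""
    # An explicitly configured path wins if it points at an existing file.
    if configured_path and Path(configured_path).is_file():
        return str(configured_path)
    # Otherwise use whatever `tesseract` resolves to on the system PATH.
    return shutil.which("tesseract")
```

With Pytesseract, the resolved path is usually assigned to `pytesseract.pytesseract.tesseract_cmd`. On Linux and macOS the `PATH` lookup is often enough; on Windows an explicit path in `config.py` is more likely to be needed.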
## 💻 Usage
### Method 1: Using the Python Module
```python
from src.main import process_receipt

result = process_receipt("path/to/receipt.jpg")
print(result)
```
### Method 2: Running the Script Directly
- Add an image of a receipt named `receipt.jpg` to the project directory.
- Run the script:
  ```bash
  python main.py
  ```
- The extracted JSON data will be saved in `receipt.json`.
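
Once the script has written `receipt.json`, the file can be read back with Python's `json` module. The sketch below is one possible way to do that; `load_receipt` and `summarize` are hypothetical helpers, and the field names are assumed to follow the example in the Output Format section.

```python
import json
from pathlib import Path


def load_receipt(path="receipt.json"):
    """Read the JSON file produced by the script into a dict."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def summarize(receipt):
    """Return a short, human-readable summary of a receipt dict."""
    lines = [f"{receipt.get('store_name', 'Unknown store')} ({receipt.get('date', 'no date')})"]
    for item in receipt.get("items", []):
        # Assumes price is numeric, as in the documented example output.
        lines.append(f"  {item.get('quantity', 1)} x {item.get('name', '?')} @ {item.get('price', 0):.2f}")
    lines.append(f"Total: {receipt.get('total', 0):.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    output_file = Path("receipt.json")
    if output_file.exists():
        print(summarize(load_receipt(output_file)))
```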
## 📊 Output Format
The program outputs a JSON object with structured receipt data. Example format:
```json
{
  "store_name": "Example Store",
  "date": "2023-01-01",
  "items": [
    {
      "name": "Product 1",
      "price": 10.99,
      "quantity": 1
    }
  ],
  "total": 10.99
}
```
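
Because the structured data comes from OCR followed by an AI model, it may be worth checking the output before relying on it. The following is a minimal sketch, assuming the fields shown in the example above; the `Item` and `Receipt` classes and both helper functions are hypothetical and not part of the project.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    name: str
    price: float
    quantity: int


@dataclass
class Receipt:
    store_name: str
    date: str
    items: List[Item]
    total: float


def parse_receipt(data):
    """Build a Receipt from a dict; raises KeyError, TypeError, or ValueError on malformed input."""
    items = [
        Item(str(entry["name"]), float(entry["price"]), int(entry["quantity"]))
        for entry in data["items"]
    ]
    return Receipt(str(data["store_name"]), str(data["date"]), items, float(data["total"]))


def items_match_total(receipt, tolerance=0.01):
    """Check whether the line items add up to the stated total, within a tolerance."""
    computed = sum(item.price * item.quantity for item in receipt.items)
    return math.isclose(computed, receipt.total, abs_tol=tolerance)
```

A mismatch does not necessarily mean the extraction failed, since real receipts often include tax, discounts, or tips that are not listed as items. It is best treated as a hint that the receipt deserves a manual look.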
## 📁 Project Structure
- `main.py`: The main script that processes the image, extracts text, and converts it to JSON
- `src/`: Source code directory
- `requirements.txt`: List of required Python packages
- `config.py`: Configuration settings including Tesseract path
- `README.md`: Project documentation
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.

This post is licensed under CC BY 4.0 by the author.