AI-powered document organization tool that helps teams discover similar documents, generate summaries, and efficiently manage content across cloud storage platforms.

opportunity-hack/document-compass

Document Compass 🧭

Intelligent Document Organization & Discovery Platform

Document Compass is an open-source platform that helps organizations intelligently organize, discover, and utilize their documents through AI-powered similarity matching and smart grouping. Built with both enterprise and nonprofit use cases in mind, it specifically addresses challenges in low-bandwidth environments and offers integration with popular cloud storage providers.

🎯 Project Goals

Primary Objectives

  • Enable intelligent document discovery across large document collections
  • Reduce time spent searching for related documents by 70%
  • Make document management accessible in low-bandwidth environments
  • Provide actionable insights through document summarization and grouping
  • Integrate seamlessly with existing cloud storage solutions

Target Users

  • Nonprofits managing program documentation
  • Organizations with distributed teams
  • Educational institutions organizing learning materials
  • Research teams managing related papers and studies
  • Any team struggling with document discovery and organization

🌟 Key Features

Core Functionality

  • Smart Document Grouping: Automatically identify and group similar documents
  • Intelligent Summarization: Generate concise summaries at multiple detail levels
  • Low-Bandwidth Optimization: Compressed previews and progressive loading
  • Cloud Storage Integration: Native support for Google Drive and Dropbox
  • Flexible Search: Find documents by content, metadata, or similarity
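One way to read the low-bandwidth bullet above, as a minimal sketch: send a small compressed preview first, then stream the full document in chunks. The names `make_preview`, `read_preview`, and `iter_chunks`, the 500-character preview length, and the 1 KiB chunk size are illustrative assumptions, not part of the project's API.

```python
import zlib
from typing import Iterator


def make_preview(text: str, max_chars: int = 500) -> bytes:
    """Return a zlib-compressed preview of the first `max_chars` characters."""
    return zlib.compress(text[:max_chars].encode("utf-8"), level=9)


def read_preview(blob: bytes) -> str:
    """Decompress a preview produced by `make_preview`."""
    return zlib.decompress(blob).decode("utf-8")


def iter_chunks(data: bytes, chunk_size: int = 1024) -> Iterator[bytes]:
    """Yield `data` in fixed-size chunks so a client can render progressively."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


if __name__ == "__main__":
    document = "Quarterly program report. " * 200
    preview = make_preview(document)
    print(len(document.encode("utf-8")), "bytes in full,", len(preview), "bytes in preview")
    print(sum(1 for _ in iter_chunks(document.encode("utf-8"))), "chunks to stream")
```

A client on a slow connection could show the preview immediately and request further chunks only when the user opens the document.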

Technical Highlights

  • Machine learning-powered similarity detection
  • Efficient document vectorization and indexing
  • Scalable architecture supporting millions of documents
  • REST API for easy integration
  • Containerized deployment for simple scaling
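To make the similarity-detection idea concrete, here is a minimal sketch of cosine-similarity search over document vectors. In practice the vectors would presumably come from an embedding model such as the Sentence Transformers library named in the technology stack; the hand-written three-dimensional vectors, the in-memory dictionary index, and the function names below are assumptions made for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the `top_k` (document id, score) pairs, highest score first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy vectors standing in for real document embeddings.
    index = {
        "grant-report.pdf": [0.9, 0.1, 0.0],
        "grant-budget.xlsx": [0.8, 0.3, 0.1],
        "volunteer-handbook.docx": [0.0, 0.2, 0.9],
    }
    for doc_id, score in most_similar([1.0, 0.0, 0.0], index, top_k=2):
        print(f"{doc_id}: {score:.3f}")
```

A linear scan like this is only reasonable for small collections; scaling to millions of documents would need an approximate nearest-neighbor index, which this sketch does not attempt.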

🛠 Technology Stack

Backend

  • Python 3.9+
  • FastAPI for REST API
  • Sentence Transformers for document embedding
  • PostgreSQL for metadata storage
  • Redis for caching

Frontend

  • React 18+
  • Next.js for server-side rendering
  • TailwindCSS for styling
  • ShadcnUI for components

Infrastructure

  • Docker for containerization
  • GitHub Actions for CI/CD
  • Fly.io for deployment
  • MinIO for object storage

📋 Prerequisites

# Backend
Python 3.9+
PostgreSQL 13+
Redis 6+

# Frontend
Node.js 18+
npm 8+

# Infrastructure
Docker 20.10+
docker-compose 2.0+

🚀 Quick Start

# Clone the repository
git clone https://github.com/opportunity-hack/document-compass.git

# Install dependencies
cd document-compass
pip install -r requirements.txt
(cd packages/interface && npm install)  # subshell: stay in the repository root afterwards

# Set up environment
cp .env.example .env
# Edit .env with your configurations

# Start development environment
docker-compose up -d

# Run migrations
python manage.py migrate

# Start backend
python manage.py runserver

# Start frontend (new terminal)
cd packages/interface && npm run dev

📊 Project Structure

document-compass/
├── packages/
│   ├── core/              # Core similarity engine
│   ├── navigator/         # Search & grouping
│   ├── api/               # FastAPI application
│   └── interface/         # React frontend
├── docs/                  # Documentation
├── examples/              # Usage examples
├── tests/                 # Test suites
└── deployment/            # Deployment configs

🤝 Contributing

We welcome contributions! See our Contributing Guide for details.

Development Process

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

Code Quality Standards

  • 100% test coverage for core functionality
  • Type hints for Python code
  • ESLint compliance for JavaScript/TypeScript
  • Comprehensive documentation

📈 Roadmap

Phase 1

As a user, I would like to upload or sync documents from Google Drive. I would like the app to show me which of those documents are similar to one another and to group them into folders based on similarity.

  • Core similarity engine
  • Basic document grouping
  • Google Drive integration
  • Initial API release
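The Phase 1 grouping story can be sketched as a greedy, threshold-based pass: each document joins the first group whose representative is similar enough, otherwise it starts a new group. This is a simple stand-in chosen for illustration, not the project's actual grouping algorithm; the 0.8 threshold, the use of the first member as the group representative, and the function names are assumptions.

```python
import math
from typing import Dict, List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_similarity(
    vectors: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> List[List[str]]:
    """Greedily assign each document to the first group whose first member is similar enough."""
    groups: List[List[str]] = []
    for doc_id, vec in vectors.items():
        for group in groups:
            representative = vectors[group[0]]
            if _cosine(vec, representative) >= threshold:
                group.append(doc_id)
                break
        else:
            # No existing group matched, so this document starts its own.
            groups.append([doc_id])
    return groups


if __name__ == "__main__":
    # Toy vectors standing in for real document embeddings.
    vectors = {
        "intake-form-2023.pdf": [1.0, 0.0],
        "intake-form-2024.pdf": [0.95, 0.05],
        "board-minutes.docx": [0.0, 1.0],
    }
    for number, members in enumerate(group_by_similarity(vectors), start=1):
        print(f"folder {number}: {members}")
```

The result depends on document order and on the threshold; a clustering method that reconsiders assignments would give more stable folders at higher cost.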

Phase 2

As a product manager, I would like to support both Dropbox and Google Drive so that users of the most common cloud drive platforms can use what we have built. As a user, I would like my documents to be summarized and easy to search. I would also like to use this application from my mobile device.

  • Dropbox integration
  • Advanced summarization
  • Batch processing
  • Mobile-responsive UI
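As a rough illustration of summaries at multiple detail levels, the sketch below uses a plain word-frequency extractive method: it scores each sentence by the average frequency of its words and keeps the top few in their original order. This technique is a deliberately simple stand-in, not the project's summarization approach, and the level names `brief`, `standard`, and `detailed` are hypothetical.

```python
import re
from collections import Counter
from typing import List

# Hypothetical detail levels mapped to a maximum number of sentences.
DETAIL_LEVELS = {"brief": 1, "standard": 3, "detailed": 5}


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows '.', '!' or '?' (a naive sentence splitter)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text: str, level: str = "standard") -> str:
    """Keep the highest-scoring sentences for the given level, in document order."""
    sentences = split_sentences(text)
    limit = DETAIL_LEVELS[level]
    if len(sentences) <= limit:
        return " ".join(sentences)

    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:limit])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    report = (
        "Grants fund programs. Grants need reports. "
        "The weather was nice. Reports describe programs and grants."
    )
    for level in DETAIL_LEVELS:
        print(f"{level}: {summarize(report, level)}")
```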

Phase 3 TODO - update this

  • Enterprise features
  • Advanced permission system
  • Custom ML model training
  • API rate limiting
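API rate limiting is only a roadmap item, so the following is a sketch of one common approach, a token bucket, rather than a description of anything the project has built. The class name, the parameters, and the injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if enough are available."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(float(self.capacity), self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5.0, capacity=10)
    allowed = sum(1 for _ in range(25) if bucket.allow())
    print(f"{allowed} of 25 back-to-back requests allowed")
```

A per-client limiter would keep one bucket per API key or user; with several API processes, that state would need to live in a shared store rather than in process memory.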

Phase 4 TODO - update this

  • Additional storage providers
  • Advanced analytics
  • Workflow automation
  • Enterprise SSO

📊 Success Metrics

We track the following metrics to measure project success:

User Impact

  • Document discovery time reduction
  • Bandwidth savings
  • User engagement with summaries
  • Group accuracy rates

Technical Performance

  • API response times
  • Processing speed
  • System uptime
  • Error rates

🔒 Security

  • JWT-based authentication
  • Role-based access control
  • Document encryption at rest
  • Regular security audits
  • GDPR compliance built-in
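To show how JWT-based authentication and role checks fit together, here is a standard-library-only sketch of HS256 signing and verification. It is illustrative rather than the project's implementation: a real deployment would more likely use a maintained JWT library, and the `roles` claim name and the helper names are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: Dict[str, Any], secret: bytes) -> str:
    """Build a compact HS256 JWT from `claims`."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [_b64url(json.dumps(p, separators=(",", ":")).encode("utf-8")) for p in (header, claims)]
    signing_input = ".".join(parts)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Dict[str, Any]:
    """Return the claims if the signature and expiry check out, else raise ValueError."""
    pieces = token.split(".")
    if len(pieces) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = pieces
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims


def has_role(claims: Dict[str, Any], required: str) -> bool:
    """Role check against a hypothetical `roles` claim."""
    return required in claims.get("roles", [])


if __name__ == "__main__":
    secret = b"change-me"  # placeholder; load a real secret from configuration
    token = sign_token({"sub": "user-1", "roles": ["editor"], "exp": time.time() + 3600}, secret)
    claims = verify_token(token, secret)
    print(claims["sub"], "can edit:", has_role(claims, "editor"))
```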

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

Special thanks to:

  • The Opportunity Hack community
  • Our open-source contributors
  • Organizations providing valuable feedback

📞 Contact


Made with ❤️ by the Opportunity Hack Team
