Document Compass 🧭

Intelligent Document Organization & Discovery Platform

License: MIT | Python 3.9+ | Node.js 18+ | PRs Welcome

Document Compass is an open-source platform that helps organizations intelligently organize, discover, and utilize their documents through AI-powered similarity matching and smart grouping. Built with both enterprise and nonprofit use cases in mind, it specifically addresses challenges in low-bandwidth environments and offers integration with popular cloud storage providers.

🎯 Project Goals

Primary Objectives

  • Enable intelligent document discovery across large document collections
  • Reduce time spent searching for related documents by 70%
  • Make document management accessible in low-bandwidth environments
  • Provide actionable insights through document summarization and grouping
  • Integrate seamlessly with existing cloud storage solutions

Target Users

  • Nonprofits managing program documentation
  • Organizations with distributed teams
  • Educational institutions organizing learning materials
  • Research teams managing related papers and studies
  • Any team struggling with document discovery and organization

🌟 Key Features

Core Functionality

  • Smart Document Grouping: Automatically identify and group similar documents
  • Intelligent Summarization: Generate concise summaries at multiple detail levels
  • Low-Bandwidth Optimization: Compressed previews and progressive loading
  • Cloud Storage Integration: Native support for Google Drive and Dropbox
  • Flexible Search: Find documents by content, metadata, or similarity
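To make the idea behind Smart Document Grouping concrete, here is a minimal sketch, not the project's actual engine. It assumes each document has already been turned into an embedding vector (the stack lists Sentence Transformers for that step) and uses a simple greedy, threshold-based grouping over cosine similarity. The function names, the `0.8` threshold, and the toy vectors are all hypothetical.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_documents(
    embeddings: Dict[str, List[float]], threshold: float = 0.8
) -> List[List[str]]:
    """Greedy grouping: a document joins the first group whose first member
    (the group's representative) is at least `threshold` similar to it;
    otherwise it starts a new group."""
    groups: List[List[str]] = []
    for doc_id, vector in embeddings.items():
        for group in groups:
            representative = embeddings[group[0]]
            if cosine_similarity(vector, representative) >= threshold:
                group.append(doc_id)
                break
        else:
            # No existing group was similar enough.
            groups.append([doc_id])
    return groups


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
toy_embeddings = {
    "grant_report_2023.pdf": [0.9, 0.1, 0.0],
    "grant_report_2024.pdf": [0.8, 0.2, 0.1],
    "volunteer_handbook.docx": [0.1, 0.9, 0.2],
}
groups = group_documents(toy_embeddings)
print(groups)
```

With these toy vectors the two grant reports end up in one group and the handbook in its own. The greedy approach depends on insertion order; a production system would more likely use a proper clustering algorithm.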

Technical Highlights

  • Machine learning-powered similarity detection
  • Efficient document vectorization and indexing
  • Scalable architecture supporting millions of documents
  • REST API for easy integration
  • Containerized deployment for simple scaling
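As an illustration of vectorization and indexing, the sketch below shows a tiny in-memory index that stores unit-length vectors and answers "most similar documents" queries. It is an assumption-laden toy: `VectorIndex`, its methods, and the sample data are hypothetical, and the search is a brute-force scan, whereas a collection of millions of documents would likely need an approximate nearest-neighbor index.

```python
import heapq
import math
from typing import Dict, List, Tuple


def normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


class VectorIndex:
    """Hypothetical in-memory index: stores unit vectors keyed by document ID."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, doc_id: str, vector: List[float]) -> None:
        # Normalizing once at insert time keeps each query a plain dot product.
        self._vectors[doc_id] = normalize(vector)

    def search(self, query: List[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return up to k (doc_id, score) pairs, highest cosine similarity first."""
        q = normalize(query)
        scored = (
            (doc_id, sum(a * b for a, b in zip(q, vec)))
            for doc_id, vec in self._vectors.items()
        )
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])


index = VectorIndex()
index.add("budget_2024.xlsx", [1.0, 0.0])
index.add("budget_2023.xlsx", [0.9, 0.1])
index.add("meeting_notes.txt", [0.0, 1.0])
results = index.search([1.0, 0.0], k=2)
print([doc_id for doc_id, _ in results])
```

Here the query matches the two budget files ahead of the meeting notes.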

🛠 Technology Stack

Backend

  • Python 3.9+
  • FastAPI for REST API
  • Sentence Transformers for document embedding
  • PostgreSQL for metadata storage
  • Redis for caching

Frontend

  • React 18+
  • Next.js for server-side rendering
  • TailwindCSS for styling
  • ShadcnUI for components

Infrastructure

  • Docker for containerization
  • GitHub Actions for CI/CD
  • Fly.io for deployment
  • MinIO for object storage

📋 Prerequisites

# Backend
Python 3.9+
PostgreSQL 13+
Redis 6+

# Frontend
Node.js 18+
npm 8+

# Infrastructure
Docker 20.10+
docker-compose 2.0+

🚀 Quick Start

# Clone the repository
git clone https://github.com/opportunity-hack/document-compass.git

# Install dependencies
cd document-compass
pip install -r requirements.txt
(cd packages/interface && npm install)  # subshell: you stay in the repository root

# Set up environment
cp .env.example .env
# Edit .env with your configurations

# Start development environment
docker-compose up -d

# Run migrations
python manage.py migrate

# Start backend
python manage.py runserver

# Start frontend (new terminal)
cd packages/interface && npm run dev

📊 Project Structure

document-compass/
├── packages/
│   ├── core/             # Core similarity engine
│   ├── navigator/        # Search & grouping
│   ├── api/              # FastAPI application
│   └── interface/        # React frontend
├── docs/                 # Documentation
├── examples/             # Usage examples
├── tests/                # Test suites
└── deployment/           # Deployment configs

🤝 Contributing

We welcome contributions! See our Contributing Guide for details.

Development Process

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

Code Quality Standards

  • 100% test coverage for core functionality
  • Type hints for Python code
  • ESLint compliance for JavaScript/TypeScript
  • Comprehensive documentation

📈 Roadmap

Phase 1

As a user, I would like to upload or sync documents from Google Drive. I would like the app to show me which documents are similar to one another and to group them into folders based on similarity.

  • Core similarity engine
  • Basic document grouping
  • Google Drive integration
  • Initial API release

Phase 2

As a product manager, I would like to support both Dropbox and Google Drive so that users of the most common cloud drive platforms can use what we have built. As a user, I would like my documents to be summarized and easy to search. I would also like to use this application from my mobile device.

  • Dropbox integration
  • Advanced summarization
  • Batch processing
  • Mobile-responsive UI

Phase 3 (TODO: update this)

  • Enterprise features
  • Advanced permission system
  • Custom ML model training
  • API rate limiting

Phase 4 (TODO: update this)

  • Additional storage providers
  • Advanced analytics
  • Workflow automation
  • Enterprise SSO

📊 Success Metrics

We track the following metrics to measure project success:

User Impact

  • Document discovery time reduction
  • Bandwidth savings
  • User engagement with summaries
  • Group accuracy rates
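One possible way to compute the bandwidth savings metric is sketched below, assuming a preview is a truncated, zlib-compressed slice of the document text. The helper names, the 500-character preview size, and the sample document are hypothetical and only illustrate the arithmetic.

```python
import zlib


def make_preview(text: str, max_chars: int = 500) -> bytes:
    """Truncate the text to a preview and compress it with zlib (DEFLATE)."""
    return zlib.compress(text[:max_chars].encode("utf-8"), 9)


def bandwidth_savings(original_bytes: int, transferred_bytes: int) -> float:
    """Fraction of bytes saved relative to sending the original.
    Negative if the transfer was larger than the original."""
    if original_bytes == 0:
        return 0.0
    return 1.0 - transferred_bytes / original_bytes


document = "Quarterly program report. " * 400  # hypothetical document body
preview = make_preview(document)
savings = bandwidth_savings(len(document.encode("utf-8")), len(preview))

# The client can restore the preview text exactly.
restored = zlib.decompress(preview).decode("utf-8")
print(f"preview is {len(preview)} bytes, savings {savings:.1%}")
```

Real documents compress far less well than this repetitive sample, so measured savings should come from production traffic, not from synthetic text.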

Technical Performance

  • API response times
  • Processing speed
  • System uptime
  • Error rates

🔒 Security

  • JWT-based authentication
  • Role-based access control
  • Document encryption at rest
  • Regular security audits
  • GDPR compliance built-in
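To show how JWT-based authentication and role-based access control can fit together, here is a minimal standard-library sketch of HS256 signing and verification plus a role check. It is illustrative only: the role names, the permission table, and the function names are hypothetical, and a real deployment would more likely rely on a vetted JWT library than on hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Dict, Optional, Set

# Hypothetical role-to-permission table.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete"},
}


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: Dict, secret: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes) -> Optional[Dict]:
    """Return the claims if the signature is valid and the token has not expired.
    Tokens without an `exp` claim are treated as expired."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # wrong segment count, bad base64, bad JSON, non-ASCII input
        return None
    if not isinstance(claims, dict) or claims.get("exp", 0) < time.time():
        return None
    return claims


def is_allowed(claims: Dict, action: str) -> bool:
    """Role-based check: is `action` granted to the role named in the claims?"""
    return action in ROLE_PERMISSIONS.get(claims.get("role", ""), set())


SECRET = b"change-me"  # placeholder; load a real secret from configuration
token = sign_token(
    {"sub": "user-123", "role": "editor", "exp": int(time.time()) + 3600}, SECRET
)
claims = verify_token(token, SECRET)
print(claims is not None and is_allowed(claims, "write"))
```

The verifier always checks an HMAC-SHA256 signature and ignores the `alg` field in the header, which avoids trusting the token to choose its own algorithm.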

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Special thanks to:

  • The Opportunity Hack community
  • Our open-source contributors
  • Organizations providing valuable feedback

📞 Contact


Made with ❤️ by the Opportunity Hack Team