This project automatically retrieves, filters, and summarizes recent academic papers from arXiv based on user-specified categories, then sends notifications to the user.
- ArXiv Integration: Fetches recent papers from arXiv using their API, ensuring up-to-date access to the latest research.
- Customizable Filtering: Filters papers based on user-defined preferences, including keywords, categories, and exclusion criteria.
- Intelligent Summarization (BETA): Generates concise summaries or extracts abstracts, providing quick insights into paper content. Note: This feature is currently in beta and may have some limitations.
- Flexible Notification System: Notifies users via email, with potential for expansion to other notification methods.
- Configurable Settings: Allows users to fine-tune the application's behavior through a YAML configuration file.
- Getting Started
- Installation
- Quick Start
- Usage
- Configuration
- FAQ and Troubleshooting
- Roadmap
- Glossary
- License
- Contributing
- Acknowledgments
- Python 3.10 or higher
- Required Python packages:
- pypdf
- python-dotenv
- PyYAML
- requests
- simplerllm
-
Clone the repository:
git clone https://github.com/seanbrar/paperweight.git cd paperweight
-
Install the package:
pip install .
- Copy
config-base.yaml
toconfig.yaml
and edit it with your preferences. - Create a
.env
file in the project root and add your API keys (if using the summarization functionality):Note: While .env files are supported for local development, using system environment variables is recommended for enhanced security in production environments.OPENAI_API_KEY=your_openai_api_key_here GEMINI_API_KEY=your_gemini_api_key_here
- Important: Configure valid SMTP settings in
config.yaml
for email notifications. - Run the application:
paperweight
Note: paperweight uses a last_processed_date.txt
file to track when it last processed papers, ensuring efficient updates on subsequent runs.
After installation, you can run paperweight from anywhere using:
paperweight
Recommended usage: Run paperweight daily for optimal paper tracking. Automatic scheduling is not currently built-in.
Note: Runtime may vary based on the number of categories, papers, and whether summarization is enabled. Check the log file for progress updates during execution.
--force-refresh
: Forces paperweight to fetch and process papers regardless of the last processed date.
For detailed information on configuration options, please see the configuration guide.
For details on environment variables and handling sensitive information, refer to the environment variables guide.
For quick solutions to common issues:
- Email Notifications Not Sending: Ensure your email configuration is correct and that you've allowed less secure app access if using Gmail.
- Paper Content Not Downloading: Check your internet connection and verify that the arXiv API is accessible from your network.
For a comprehensive list of frequently asked questions, including setup instructions, usage details, and troubleshooting steps, please refer to the FAQ.
If you can't find an answer to your question or solution to your problem in the FAQ, please open an issue on GitHub.
Key upcoming features:
- Implement machine learning-based paper recommendations
- Add support for additional academic paper sources
- Expand notification methods
For a full list of proposed features and known issues, see the open issues page or the detailed roadmap.
- arXiv: An open-access repository of electronic preprints for scientific papers.
- API: Application Programming Interface; a way for different software to communicate.
- YAML: A human-readable data serialization format used for configuration files.
- SMTP: Simple Mail Transfer Protocol; used for sending emails.
- LLM: Large Language Model; an AI model used for text generation and analysis.
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! If you're interested in contributing to paperweight, please refer to the contributing guide for detailed information on:
- Setting up the development environment
- Running tests
- Our coding standards
- The pull request process
We appreciate all forms of contribution, from code to documentation to bug reports. Thank you for helping to improve paperweight!
- arXiv for providing the API
- simplerllm for the LLM interface