paperweight

This project automatically retrieves, filters, and summarizes recent academic papers from arXiv based on user-specified categories, then sends notifications to the user.

Features

ArXiv Integration: Fetches recent papers from arXiv through its API, so each run works from the latest submissions.
Customizable Filtering: Filters papers based on user-defined preferences, including keywords, categories, and exclusion criteria.
Intelligent Summarization (BETA): Generates concise summaries or extracts abstracts, providing quick insights into paper content. Note: This feature is currently in beta and may have some limitations.
Flexible Notification System: Notifies users via email, with potential for expansion to other notification methods.
Configurable Settings: Allows users to fine-tune the application's behavior through a YAML configuration file.
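
As a rough illustration of the filtering idea in the list above, the sketch below keeps papers that fall in an allowed category, mention at least one wanted keyword, and mention no excluded term. The `Paper` dataclass, the `filter_papers` function, and its parameters are hypothetical names chosen for this example; they are not taken from paperweight's code, and the project's actual filtering logic may differ.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record; paperweight's internal representation may differ.
    title: str
    abstract: str
    category: str


def filter_papers(papers, keywords, categories, exclude):
    """Keep papers in an allowed category that mention a keyword and no excluded term."""
    keywords = [k.lower() for k in keywords]
    exclude = [e.lower() for e in exclude]
    kept = []
    for paper in papers:
        # Match case-insensitively against title and abstract together.
        text = f"{paper.title} {paper.abstract}".lower()
        if paper.category not in categories:
            continue
        if any(term in text for term in exclude):
            continue
        if any(word in text for word in keywords):
            kept.append(paper)
    return kept


papers = [
    Paper("Sparse attention for long documents", "We study transformers.", "cs.CL"),
    Paper("A survey of attention", "A survey of transformer variants.", "cs.CL"),
    Paper("Graph colouring bounds", "Combinatorial results.", "math.CO"),
]
selected = filter_papers(
    papers, keywords=["transformer"], categories={"cs.CL"}, exclude=["survey"]
)
print([p.title for p in selected])
```

Plain substring matching is used here for brevity; a real filter might weight keywords or match whole words only.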

Getting Started

Prerequisites

Python 3.10 or higher
Required Python packages:
- pypdf
- python-dotenv
- PyYAML
- requests
- simplerllm

Installation

Clone the repository:

```
git clone https://github.com/seanbrar/paperweight.git
cd paperweight
```

Install the package:
```
pip install .
```

Quick Start

Copy config-base.yaml to config.yaml and edit it with your preferences.
Create a .env file in the project root and add your API keys (if using the summarization functionality):
```
OPENAI_API_KEY=your_openai_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
```
Note: While .env files are supported for local development, using system environment variables is recommended for enhanced security in production environments.
Important: Configure valid SMTP settings in config.yaml for email notifications.
Run the application:
```
paperweight
```
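
The API keys from the .env step above end up as ordinary environment variables, so a quick way to confirm they are visible before running is to look them up directly. The sketch below uses only the standard library. The `get_api_key` helper is a hypothetical name for this example, and how paperweight itself reads the keys is an assumption not confirmed here.

```python
import os


def get_api_key(name):
    """Return the named key from the environment, or None if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    return value or None


# Report which of the keys named in the Quick Start are currently visible.
for name in ("OPENAI_API_KEY", "GEMINI_API_KEY"):
    status = "set" if get_api_key(name) else "missing"
    print(f"{name}: {status}")
```

This only sees variables already exported in the current process. Values stored in a .env file would first need to be loaded, for example with python-dotenv's `load_dotenv()`.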

Note: paperweight uses a last_processed_date.txt file to track when it last processed papers, ensuring efficient updates on subsequent runs.
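
To make the date-tracking idea concrete, here is a minimal sketch of how a marker file like this could work. It assumes the file holds a single ISO-formatted date (YYYY-MM-DD); the actual file format and the function names below are assumptions for illustration, not paperweight's implementation.

```python
import tempfile
from datetime import date
from pathlib import Path


def read_last_processed(path):
    """Return the stored date, or None if the file is missing or unparsable."""
    try:
        return date.fromisoformat(Path(path).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def write_last_processed(path, day):
    """Store the date in ISO format (assumed format, see lead-in)."""
    Path(path).write_text(day.isoformat())


def should_process(path, today):
    """Process when there is no record yet or the record is older than today."""
    last = read_last_processed(path)
    return last is None or last < today


# Demonstrate with a throwaway directory rather than the real marker file.
with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "last_processed_date.txt"
    today = date(2024, 1, 2)
    print("first run:", should_process(marker, today))
    write_last_processed(marker, today)
    print("same day again:", should_process(marker, today))
```

Under this scheme, a second run on the same day does nothing unless the check is bypassed.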

Usage

After installation, you can run paperweight from anywhere using:

```
paperweight
```

Recommended usage: Run paperweight daily for optimal paper tracking. Automatic scheduling is not currently built-in.

Note: Runtime may vary based on the number of categories, papers, and whether summarization is enabled. Check the log file for progress updates during execution.

Command-line Arguments

--force-refresh: Forces paperweight to fetch and process papers regardless of the last processed date.
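
The flag is passed on the command line as `paperweight --force-refresh`. For readers curious how a boolean switch like this is typically wired up, the sketch below uses the standard library's `argparse`. It is an illustration only: `build_parser` is a hypothetical name, and paperweight's own argument handling may be structured differently.

```python
import argparse


def build_parser():
    """Build a parser with a single on/off switch, mirroring the flag above."""
    parser = argparse.ArgumentParser(prog="paperweight")
    parser.add_argument(
        "--force-refresh",
        action="store_true",  # False unless the flag is present
        help="fetch and process papers regardless of the last processed date",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--force-refresh"])
print(args.force_refresh)
```

With `action="store_true"`, omitting the flag leaves `args.force_refresh` as `False`, which matches the default behaviour of respecting the last processed date.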

Configuration

For detailed information on configuration options, please see the configuration guide.

For details on environment variables and handling sensitive information, refer to the environment variables guide.

FAQ and Troubleshooting

For quick solutions to common issues:

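Email Notifications Not Sending: Ensure your email configuration is correct. If using Gmail, note that Google has retired the "less secure app access" setting for personal accounts; sign in with an app password instead, which requires 2-Step Verification on the account.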
Paper Content Not Downloading: Check your internet connection and verify that the arXiv API is accessible from your network.
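
One way to check whether the arXiv API is reachable from your network is to send it a small query yourself. The sketch below builds a request against arXiv's public query endpoint using only the standard library. The function names, the `cs.AI` default category, and the timeout are assumptions for this example, and this is not necessarily how paperweight issues its requests.

```python
import urllib.parse
import urllib.request

# Public arXiv API query endpoint.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(category, max_results=1):
    """Build a query URL for the newest submissions in one category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def arxiv_reachable(category="cs.AI", timeout=10):
    """Return True if the API answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(build_query_url(category), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and timeouts are all OSError subclasses.
        return False


print(build_query_url("cs.AI"))
```

Calling `arxiv_reachable()` performs the actual request. A `False` result points to a network, proxy, or firewall problem rather than a paperweight configuration issue.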

For a comprehensive list of frequently asked questions, including setup instructions, usage details, and troubleshooting steps, please refer to the FAQ.

If you can't find an answer to your question or solution to your problem in the FAQ, please open an issue on GitHub.

Roadmap

Key upcoming features:

Implement machine learning-based paper recommendations
Add support for additional academic paper sources
Expand notification methods

For a full list of proposed features and known issues, see the open issues page or the detailed roadmap.

Glossary

arXiv: An open-access repository of electronic preprints for scientific papers.
API: Application Programming Interface; a way for different software to communicate.
YAML: A human-readable data serialization format used for configuration files.
SMTP: Simple Mail Transfer Protocol; used for sending emails.
LLM: Large Language Model; an AI model used for text generation and analysis.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! If you're interested in contributing to paperweight, please refer to the contributing guide for detailed information on:

Setting up the development environment
Running tests
Our coding standards
The pull request process

We appreciate all forms of contribution, from code to documentation to bug reports. Thank you for helping to improve paperweight!

Acknowledgments

arXiv for providing the API
simplerllm for the LLM interface
