Fix typos, tweak grammer, linkify README.md #8

Open · wants to merge 4 commits into base: master
22 changes: 11 additions & 11 deletions README.md
@@ -1,35 +1,35 @@
# Big-Cases
This is the basic Python code behind the [@big_cases Twitter bot](http://twitter.com/big_cases).
This is the basic Python code behind the [@big_cases Twitter bot](https://twitter.com/big_cases).

The bot uses PACER RSS feeds to gather the latest filings from 74 U.S. District Courts and five federal courts of appeals and stores the docket entries in a database. It matches new filings against a preselected list of major cases, scrapes matching documents from PACER, uploads them to both a [DocumentCloud](http://www.documentcloud.org) project and the RECAP archive, then posts the results on Twitter.
The bot uses PACER RSS feeds to gather the latest filings from 74 U.S. District Courts and five federal courts of appeals and stores the docket entries in a database. It matches new filings against a preselected list of major cases, scrapes matching documents from PACER, uploads them to both a [DocumentCloud](https://www.documentcloud.org/) project and the [RECAP archive](https://www.courtlistener.com/recap/), and then posts the results on [big_cases' Twitter feed](https://twitter.com/big_cases).
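The matching step above can be sketched as a simple lookup against the followed-case list. This is a minimal illustration, not the bot's actual code; the `FOLLOWED_CASES` structure and `match_filing` helper are hypothetical stand-ins for whatever `bigcases_list.py` actually defines.

```python
# Hypothetical shape of the followed-cases list; the real bigcases_list.py
# may use different field names and holds many more cases.
FOLLOWED_CASES = [
    {"court": "dcd", "case_number": "1:17-cr-00201", "name": "United States v. Manafort"},
    {"court": "nysd", "case_number": "1:18-cv-02921", "name": "New York v. Commerce"},
]

def match_filing(court, case_number, followed=FOLLOWED_CASES):
    """Return the followed case matching a new docket entry, or None."""
    for case in followed:
        if case["court"] == court and case["case_number"] == case_number:
            return case
    return None
```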

Note that some federal district courts -- including those in the Eastern District of Virginia, the District of Maryland and the District of Hawaii -- have elected not to publish an RSS feed from their dockets. As a result, Big Cases can't follow cases in those districts. Most federal courts of appeal also do not pulbish RSS feeds of new docket entries.
Note that some federal district courts -- including those in the Eastern District of Virginia, the District of Maryland and the District of Hawaii -- do not publish an RSS feed from their dockets. As a result, Big Cases can't follow cases in those districts. Most federal courts of appeals also do not publish RSS feeds of new docket entries.

(A separate part of the bot gathers docket information from the U.S. Supreme Court's electronic filing system.)

## How it basically works

Big-Cases uses three main Python scripts to gather docket entries, scrape dockets and tweet the results. This is mostly a vestigate of the fact that the bot was built as an add-on to a larger scraping tool. The scripts run in order, every four minutes:
Big-Cases uses three main Python scripts to gather docket entries, scrape dockets and tweet the results. This is mostly because the bot was built as an add-on to a larger scraping tool. The scripts run in order, every four minutes:

* **pacer_rss.py** - Scrapes PACER's RSS feeds, matches them against a list of selected cases and stores the results in a MySQL database.
* **bigcases_scrape_docs.py** - Attempts to scrape documents from PACER based on new matches identified from RSS feeds. It also publishes the results on DocumentCloud and the RECAP archive.
* **bigcases.py** - Tweets the results.
* **[pacer-rss.py](../../blob/master/pacer-rss.py)** - Scrapes PACER's RSS feeds, matches them against a list of selected cases and stores the results in a MySQL database.
* **[bigcases_scrape_docs.py](../../blob/master/bigcases_scrape_docs.py)** - Attempts to scrape documents from PACER based on new matches identified from RSS feeds. It also publishes the results on DocumentCloud and the RECAP archive.
* **[bigcases.py](../../blob/master/bigcases.py)** - Tweets the results.
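The first step -- pulling docket entries out of a PACER RSS feed -- can be sketched with the standard library alone. The sample XML and field layout below are simplified assumptions about PACER's feed format, not a copy of what `pacer-rss.py` does:

```python
import xml.etree.ElementTree as ET

# A minimal PACER-style RSS fragment for illustration; real feeds differ in detail.
SAMPLE = """<rss version="2.0"><channel>
  <item>
    <title>1:17-cr-00201 USA v. MANAFORT et al</title>
    <link>https://ecf.dcd.uscourts.gov/cgi-bin/DktRpt.pl?123456</link>
    <description>[Motion] filed</description>
  </item>
</channel></rss>"""

def parse_entries(xml_text):
    """Yield (case_number, case_title, link) tuples from an RSS document.

    Assumes the item title starts with the case number, which holds for
    the sample above but is an assumption about real PACER feeds."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        title = item.findtext("title", "")
        case_number, _, case_title = title.partition(" ")
        yield case_number, case_title, item.findtext("link", "")
```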

## Settings and lists of stuff

The list of cases Big-Cases is following is stored in **bigcases_list.py**; the list of available PACER RSS feeds is stored in **pacer_rss_feeds.py**.
The list of cases Big-Cases follows is stored in **[bigcases_list.py](../../blob/master/bigcases_list.py)**; the list of available PACER RSS feeds is stored in **[pacer_rss_feeds.py](../../blob/master/pacer_rss_feeds.py)**.
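The feed list can be pictured as a court-to-URL mapping. The dictionary shape and `feed_url` helper below are hypothetical; the real `pacer_rss_feeds.py` covers many more courts and may be organized differently:

```python
# Hypothetical sketch of the feed table's shape; PACER district courts
# commonly expose feeds at a cgi-bin path like the ones below.
PACER_RSS_FEEDS = {
    "dcd": "https://ecf.dcd.uscourts.gov/cgi-bin/rss_outside.pl",
    "nysd": "https://ecf.nysd.uscourts.gov/cgi-bin/rss_outside.pl",
}

def feed_url(court_id):
    """Look up the RSS feed URL for a court, or None if no feed is listed."""
    return PACER_RSS_FEEDS.get(court_id)
```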

## Yes, the robot listens

A separate secript, **bigcases_listen.py** periodically checks the bot's Twitter mentions for new cases to follow, parses the requests and matches them against courts with active data feeds and cases that already exist in the database.
A separate script, **[bigcases_listen.py](../../blob/master/bigcases_listen.py)**, periodically checks the bot's Twitter mentions for new cases to follow, parses the requests and matches them against courts with active data feeds and cases that already exist in the database.
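Parsing such a request and checking it against courts with active feeds might look like the sketch below. The mention format, regex and `ACTIVE_FEEDS` set are all assumptions for illustration; the real parser in `bigcases_listen.py` may accept other phrasings:

```python
import re

# Hypothetical request format: "@big_cases follow 1:18-cv-02921 nysd".
MENTION_RE = re.compile(r"follow\s+(\d+:\d{2}-[a-z]{2}-\d{5})\s+([a-z]+)", re.I)

ACTIVE_FEEDS = {"dcd", "nysd"}  # sample of courts with working RSS feeds

def parse_request(text):
    """Return (case_number, court) for a valid follow request aimed at a
    court with an active feed, else None."""
    m = MENTION_RE.search(text)
    if not m:
        return None
    case_number, court = m.group(1), m.group(2).lower()
    return (case_number, court) if court in ACTIVE_FEEDS else None
```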

## Other stuff to know

Big-Cases has only been tested on a machine running CentOS Linux 7.

## Dependencies

pip install python-documentcloud
`pip install python-documentcloud`

Customize `bigcases_settings` with credential information for PACER,
Customize **[bigcases_settings.py](../../blob/master/bigcases_settings.py)** with credential information for PACER,
DocumentCloud, Twitter, RECAP, &c.
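A settings module of that kind might look roughly like this. Every attribute name below is a hypothetical placeholder; the real `bigcases_settings.py` defines its own names for these credentials:

```python
# Hypothetical sketch of the settings object; fill in real credentials
# before running the bot.
class BigCasesSettings:
    pacer_user = "your-pacer-login"
    pacer_pass = "your-pacer-password"
    dc_user = "you@example.com"            # DocumentCloud login
    dc_pass = "your-documentcloud-password"
    twitter_consumer_key = "placeholder"
    twitter_consumer_secret = "placeholder"
    twitter_access_token = "placeholder"
    twitter_access_secret = "placeholder"

settings = BigCasesSettings()
```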