docs: add download_agency notes #141

Open · wants to merge 3 commits into `dev`
docs/contributing.md (7 changes: 4 additions & 3 deletions)
@@ -116,8 +116,8 @@ When coding a new scraper, there are a few important conventions to follow:
- If it's a new state folder, add an empty `__init__.py` to the folder
- Create a `Site` class inside the agency's scraper module with the following attributes/methods:
- `name` - Official name of the agency
- - `scrape_meta` - generates a CSV with metadata about videos and other available files (file name, URL, and size at minimum)
- - `scrape` - uses the CSV generated by `scrape_meta` to download videos and other files
+ - `scrape_meta` - generates a JSON with metadata about videos and other available files (file name, URL, and size at minimum)
+ - `download_agency` - uses the JSON generated by `scrape_meta` to download videos and other files

Below is a pared down version of San Diego's [Site](https://github.com/biglocalnews/clean-scraper/blob/main/clean/ca/san_diego_pd.py) class to illustrate these conventions.

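The pared-down example itself sits in a collapsed part of this diff. Purely as an illustration of the conventions above, a new scraper module could be shaped roughly like the sketch below; the class body, file names, and URLs are assumed for the example and are not copied from `clean/ca/san_diego_pd.py`.

```python
import json
from pathlib import Path


class Site:
    """Hypothetical agency scraper following the conventions described above."""

    name = "Example Police Department"  # official agency name (placeholder)

    def __init__(self, data_dir: Path = Path("data")):
        self.data_dir = data_dir

    def scrape_meta(self) -> Path:
        """Write a JSON file listing each asset's name, URL, and size."""
        assets = [
            {
                "name": "incident_001.mp4",
                "url": "https://example.com/incident_001.mp4",
                "size": 1_048_576,
            },
        ]
        outfile = self.data_dir / "example_pd.json"
        outfile.parent.mkdir(parents=True, exist_ok=True)
        outfile.write_text(json.dumps(assets, indent=2))
        return outfile

    def download_agency(self) -> list[Path]:
        """Download every file listed in the JSON produced by scrape_meta()."""
        assets = json.loads((self.data_dir / "example_pd.json").read_text())
        downloaded = []
        for asset in assets:
            # A real scraper would stream the file from asset["url"] here,
            # throttling requests so it doesn't hammer the agency's site.
            downloaded.append(self.data_dir / asset["name"])
        return downloaded
```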
@@ -278,6 +278,7 @@ Options:
Commands:
list List all available agencies and their slugs.
scrape-meta Command-line interface for generating metadata CSV about...
+ download_agency Downloads assets retrieved in scrape-meta
```

Running a state is as simple as passing arguments to the appropriate subcommand.
@@ -292,7 +293,7 @@ pipenv run python -m clean.cli list
pipenv run python -m clean.cli scrape-meta ca_san_diego_pd

# Trigger file downloads using agency slug
- pipenv run python -m clean.cli scrape ca_san_diego_pd
+ pipenv run python -m clean.cli download_agency ca_san_diego_pd
```

For more verbose logging, you can ask the system to show debugging information.
docs/usage.md (9 changes: 5 additions & 4 deletions)
@@ -31,14 +31,14 @@ You can then run a scraper for an agency using its slug:
clean-scraper scrape-meta ca_san_diego_pd
```

- > **NOTE**: Always run `scrape-meta` at least once initially. It generates output required by the `scrape` subcommand.
+ > **NOTE**: Always run `scrape-meta` at least once initially. It generates output required by the `download_agency` subcommand.

To use the `clean` library in Python, import an agency's scraper and run it directly.

```python
from clean.ca import san_diego_pd

- san_diego_pd.scrape()
+ san_diego_pd.download_agency()
```
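The metadata step has a module-level counterpart on the CLI (`scrape-meta`); assuming the package exposes it as a `scrape_meta()` function as well (an assumption based on the CLI subcommands, not something this diff confirms), a first run would chain the two calls:

```python
from clean.ca import san_diego_pd

# First run: build the metadata JSON, then download the assets it lists.
# scrape_meta() is assumed here to mirror the scrape-meta CLI subcommand.
san_diego_pd.scrape_meta()
san_diego_pd.download_agency()
```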

## Configuration
@@ -56,6 +56,7 @@ Options:
--help Show this message and exit.

Commands:
- list List all available agencies and their slugs.
- scrape-meta Command-line interface for downloading CLEAN files.
+ list List all available agencies and their slugs.
+ scrape-meta Command-line interface for generating metadata CSV about...
+ download_agency Downloads assets retrieved in scrape-meta
```