Commit

Merge pull request #319 from biglocalnews/precommit
Added pre-commit with first round of fixes
palewire authored Jan 19, 2022
2 parents 6d9cc84 + c1b6c58 commit 57a2f0e
Showing 53 changed files with 192 additions and 92 deletions.
49 changes: 49 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,49 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    hooks:
      - id: trailing-whitespace
        language_version: python3
      - id: end-of-file-fixer
        language_version: python3
      - id: check-yaml
        language_version: python3
      - id: check-added-large-files
        language_version: python3
      - id: check-byte-order-marker
        language_version: python3
      - id: check-case-conflict
        language_version: python3
      - id: check-json
        language_version: python3
      - id: mixed-line-ending
        language_version: python3
  - repo: https://github.com/psf/black
    rev: 21.12b0
    hooks:
      - id: black
        language_version: python3
  - repo: https://github.com/asottile/blacken-docs
    rev: v1.8.0
    hooks:
      - id: blacken-docs
        additional_dependencies: [black]
        language_version: python3
  - repo: https://github.com/timothycrosley/isort
    rev: 5.7.0
    hooks:
      - id: isort
        language_version: python3
  - repo: https://gitlab.com/pycqa/flake8
    rev: 3.8.4
    hooks:
      - id: flake8
        language_version: python3
  - repo: https://github.com/asottile/pyupgrade
    rev: v2.7.2
    hooks:
      - id: pyupgrade
        language_version: python3
        args: [--py37-plus]
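
Most of the documentation changes in this commit look like the output of the `trailing-whitespace` and `end-of-file-fixer` hooks. The following is a minimal sketch of what those two fixes amount to. It is not the hooks' actual implementation, and the function names are hypothetical.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line, keeping line endings."""
    fixed = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")      # the line without its newline characters
        ending = line[len(body):]       # the original newline characters, if any
        fixed.append(body.rstrip(" \t") + ending)
    return "".join(fixed)


def fix_end_of_file(text: str) -> str:
    """Make non-empty text end with exactly one newline."""
    stripped = text.rstrip("\r\n")
    if not stripped:
        # Nothing but newlines (or nothing at all): leave the file empty.
        return ""
    return stripped + "\n"
```

Running both functions over a file and writing the result back only when it differs roughly mirrors the single-line changes seen in the `docs/scrapers/*.md` hunks below.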
2 changes: 1 addition & 1 deletion Makefile
@@ -1,6 +1,6 @@
#
# Colors
#
#

# Define ANSI color codes
RESET_COLOR = \033[m
1 change: 1 addition & 0 deletions Pipfile
@@ -17,6 +17,7 @@ flake8-docstrings = "*"
setuptools-scm = "*"
us = "*"
jinja2 = "*"
pre-commit = "*"

[packages]
bs4 = "*"
56 changes: 55 additions & 1 deletion Pipfile.lock

Some generated files are not rendered by default.

11 changes: 6 additions & 5 deletions docs/contributing.rst
@@ -41,6 +41,12 @@ The `pipenv` package manager can install all of the Python tools necessary to ru
    pipenv install --dev

Now install `pre-commit` to run a battery of automatic quick fixes against your work.

.. code-block:: bash

    pipenv run pre-commit install

Create an issue
###############

@@ -221,8 +227,3 @@ Send a pull request
###################

The final step is to submit a `pull request <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests>`_ to the main repository, asking the maintainers to consider integrating your patch. GitHub has `a short guide <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request>`_ that can walk you through the process. You should tag your issue number in the request so that they are linked in GitHub’s system.





4 changes: 2 additions & 2 deletions docs/scrapers/al.md
@@ -8,7 +8,7 @@
-Media Inquiries: 334.242.0400
-Email: [email protected]
- WARN Data Contact found by Cody: Jessica D. (phone # unknown)

### July 21, 2021 12PM
I had an intuition that I needed to call earlier in order to get a response, so I did! I called back and I was transferred to Jessica D. (sounded like Jessica Dent or Dint?). Unfortunately I had to leave a voicemail again, but at least I got somewhere.

@@ -21,7 +21,7 @@ Email follow-up
### July 1, 2021
Called the Media Inquiries number, left a voicemail with my callback number. Called the Toll Free number, left a voicemail with my callback number. Sent an email to the contact address with my questions.

### Jun 30th, 2021
### Jun 30th, 2021

The following are some specific issues with this data that needs further processing and phone calls:
- Q: I noticed in the last 8 lines of the data there are alternating rows of future dates and jan 1 1970, with some strange characters in the other fields. Do these lines mean anything, or can we discard them?
1 change: 0 additions & 1 deletion docs/scrapers/ca.md
@@ -4,4 +4,3 @@ California provides historical data as PDFs and data for the current fiscal year
an Excel file.

- [Home page](https://edd.ca.gov/Jobs_and_Training/Layoff_Services_WARN.htm)

4 changes: 2 additions & 2 deletions docs/scrapers/co.md
@@ -7,10 +7,10 @@ WARN site: https://cdle.colorado.gov/employers/layoff-separations/layoff-warn-li
Website feedback google form: https://docs.google.com/forms/d/e/1FAIpQLSdlZ6HyYpV4wzE-BMBS3S9YDclg5hfyaJlDblwXkETdMWFs1A/viewform (in case the other email isnt useful)

### Email sent to [email protected] Aug 6, 2021
>Hi, I'm emailing on behalf of Stanford's Big Local News project. We're trying to increase journalists' access to WARN data, and we had a question about Colorado's WARN data.
>Hi, I'm emailing on behalf of Stanford's Big Local News project. We're trying to increase journalists' access to WARN data, and we had a question about Colorado's WARN data.
>In comparing your 2021 data with previous years, I'm wondering if 'layoff total' in the archived data (2015-2019) represents a sum of the 'permanent layoffs', 'temporary layoffs', and 'furloughs' columns in the 2021 data. Would that be a fair assessment?
>Thanks for your help!
>(P.S: if you happen to have any data for Colorado before 2015, that would be nice to have!)
Response Received:
> Thank you for your question. You are correct. The Total layoffs from 2015-2019 includes all permanent, temporary and furloughs reported under WARN for those years. We evolved the data as time progressed and we better understood the data needs of the public.
> Thank you for your question. You are correct. The Total layoffs from 2015-2019 includes all permanent, temporary and furloughs reported under WARN for those years. We evolved the data as time progressed and we better understood the data needs of the public.
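
The agency's answer implies that a 2021 row can be made comparable with the 2015-2019 archive by summing three columns. A minimal sketch, assuming each row is a dictionary keyed by the column names quoted in the email (the function name and the handling of blank cells are assumptions):

```python
def layoff_total(row: dict) -> int:
    """Sum the three 2021 columns into a single archive-style layoff total."""
    # Column names are taken from the email above; blank or missing cells count as 0.
    parts = ("permanent layoffs", "temporary layoffs", "furloughs")
    return sum(int(row.get(name) or 0) for name in parts)
```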
2 changes: 1 addition & 1 deletion docs/scrapers/dc.md
@@ -4,4 +4,4 @@

### Jul 1st, 2021

On the most recent page for WARN notices, the link for 2014 data directs to the page for 2018 data. To account for this problem, url for 2014 page is hard-coded in the code at the moment. Reported this problem to the [website maintenance](https://dc.gov/page/dcgovernmentwebsite).
On the most recent page for WARN notices, the link for 2014 data directs to the page for 2018 data. To account for this problem, url for 2014 page is hard-coded in the code at the moment. Reported this problem to the [website maintenance](https://dc.gov/page/dcgovernmentwebsite).
2 changes: 1 addition & 1 deletion docs/scrapers/in.md
@@ -13,7 +13,7 @@ Indianapolis, IN 46204
[email protected]

### July 21, 2021
Response:
Response:

>Cody:
>This error should be fixed soon.
1 change: 0 additions & 1 deletion docs/scrapers/ks.md
@@ -13,4 +13,3 @@
The Kansas Job Center site does not display Non-WARN layoff notices by default but our scraper captures both WARN and Non-WARN notices. Therefore, our record counts are significantly higher than what you'll get from a count of records displayed online.

[All data]: https://www.kansasworks.com/search/warn_lookups?commit=Search&page=1&q%5Bemployer_name_cont%5D=&q%5Bmain_contact_contact_info_addresses_full_location_city_matches%5D=&q%5Bnotice_eq%5D=true&q%5Bnotice_on_gteq%5D=&q%5Bnotice_on_lteq%5D=&q%5Bs%5D=notice_on+desc&q%5Bservice_delivery_area_id_eq%5D=&q%5Bzipcode_code_start%5D=&utf8=%E2%9C%93

2 changes: 1 addition & 1 deletion docs/scrapers/mo.md
@@ -22,7 +22,7 @@ Response from MO Department of Higher Education and Workforce Devlopment about t
![image](https://user-images.githubusercontent.com/56002814/124181019-fc64cc00-da82-11eb-87ed-d3c6cd2e3021.png)


### Jun 29th, 2021
### Jun 29th, 2021

The following are some specific issues with this data that needs further processing and phone calls:
- year 2015-2019 data is in fiscal year, but year 2020 and 2021 data is in calendar year. This produced some number of duplicate data entries between 2019 and 2020 data page. Some of these duplicate entries have minor differences in number affected. This issue requires us to inquiry about the difference between the 2020 and 2019 data page as well as how updated date works for different companies (some companies have multiple entries with same date but different numbers). Currently, I am waiting for a response from the state agency.
10 changes: 5 additions & 5 deletions docs/scrapers/ny.md
@@ -3,18 +3,18 @@
- [Homepage](https://dol.ny.gov/warn-notices)
- To request for full documentation of any notices prior to 2021, email [email protected]
- [Contacts](https://dol.ny.gov/worker-adjustment-and-retraining-notification-warn)
- NYS WARN Coordinator: Janet Faraone (phone: 518-457-1518, email: [[email protected]]([email protected]))
- NYS WARN Coordinator: Janet Faraone (phone: 518-457-1518, email: [[email protected]]([email protected]))

### Jan 5, 2022
We've implemented historical document scraping for NY (issue #136) and removed the old website scraping system while we wait for their website to be upgraded. As mentioned in July, we should implement a system of periodically requesting a historical document from NY. We are hosting this historical document at https://storage.googleapis.com/bln-data-public/warn-layoffs/ny_historical.xlsx .

### Jul 12th, 2021
reaching out to the NYS WARN coordinator was fruitful. the state agency prepared and sent over an excel containing records from 2016 to 2021 within a day upon email request. Here is a snippet of the data:
reaching out to the NYS WARN coordinator was fruitful. the state agency prepared and sent over an excel containing records from 2016 to 2021 within a day upon email request. Here is a snippet of the data:
![image](https://user-images.githubusercontent.com/56002814/125342121-736f4f80-e322-11eb-8c22-b1f22346ded7.png)
The full dataset is currently attached to issue #136, it should be hosted on some platform in the future.
When requesting the data, the state agency asked for a timeframe from which data is needed (i responded with 2016-2021), but it might be possible to request data prior to 2016 as well.
When requesting the data, the state agency asked for a timeframe from which data is needed (i responded with 2016-2021), but it might be possible to request data prior to 2016 as well.
Regarding updating data in the future, the NYS WARN website is undergoing updates, and it should allow scraping in the future. In the meantime, we should periodically request updated data from NYS.

### Jul 6th, 2021
- 2021 data doesn't directly contain # employee affected in the html table on the website, needs to parse the corresponding pdf to obtain the # affected.
- for data prior to 2021, each year is contained in a pdf and doesn't contain # affected, needs to reach out and ask if there is a better form of data/access company specific data.
- 2021 data doesn't directly contain # employee affected in the html table on the website, needs to parse the corresponding pdf to obtain the # affected.
- for data prior to 2021, each year is contained in a pdf and doesn't contain # affected, needs to reach out and ask if there is a better form of data/access company specific data.
2 changes: 1 addition & 1 deletion docs/scrapers/sc.md
@@ -1,6 +1,6 @@
# South Carolina

## Site:
## Site:
https://scworks.org/employer/employer-programs/at-risk-of-closing/layoff-notification-reports

### 10/8/2021
8 changes: 4 additions & 4 deletions docs/scrapers/tx.md
@@ -4,7 +4,7 @@ URL: https://www.twc.texas.gov/businesses/worker-adjustment-and-retraining-notif
"To access older WARN notices or if you have any questions regarding WARN notices, contact TWC at [email protected]."

### 1/5/22
Just want to document here that TX is an edge-case scraper, meaning that it has a hybrid strategy of both scraping the website and downloading a historical document, merging the two data. We've hosted the document at https://storage.googleapis.com/bln-data-public/warn-layoffs/tx_historical.xlsx
Just want to document here that TX is an edge-case scraper, meaning that it has a hybrid strategy of both scraping the website and downloading a historical document, merging the two data. We've hosted the document at https://storage.googleapis.com/bln-data-public/warn-layoffs/tx_historical.xlsx
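
The hybrid strategy described above can be sketched as a keyed merge of the two sources. This is only an illustration, not the project's actual scraper code: the field names `company` and `notice_date` are hypothetical, and letting historical rows win on a key collision is an assumption.

```python
def merge_records(historical, scraped, key_fields=("company", "notice_date")):
    """Combine two lists of row dictionaries, dropping rows that repeat a key."""
    merged = {}
    # Historical rows are visited first, so they win when both sources share a key.
    for row in list(historical) + list(scraped):
        key = tuple(str(row.get(field, "")).strip().lower() for field in key_fields)
        merged.setdefault(key, row)
    return list(merged.values())
```

Normalizing case and surrounding whitespace in the key keeps trivially different spellings of the same notice from being counted twice. Rows that differ in other ways would still need manual review.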

### 8/19/21
**Cody:** "Hi Francisco,
@@ -23,14 +23,14 @@ So it seems like our guess was correct, that the xtra data included in the histo

### 8/18/21

Noticed historical data seemed more comprehensive than yearly data for 2019 (see WARN/#227). Sent an email bringing to their attention & asking which is the best source of the WARN data.
Noticed historical data seemed more comprehensive than yearly data for 2019 (see WARN/#227). Sent an email bringing to their attention & asking which is the best source of the WARN data.

"Thanks so much for this data. I noticed something interesting that I thought you might like to be aware of. After looking at the differences between the WARN historical file you provided and the 2019 data available on the public-facing website, I noticed some interesting discrepancies: https://www.diffchecker.com/LBUWndL9I was surprised to find that the historical data includes about 12 additional rows.I think this trend might extend to other years as well. I am wondering if you all are aware of this, and if so, whether the historical data or the data on the website would be a more accurate source for Texas' WARN information."


### 8/16/2021

Received a reply from TX from [email protected], with the following .xlsx file.
Received a reply from TX from [email protected], with the following .xlsx file.

[Warns 01-01-89-09-30-19.xlsx](https://github.com/biglocalnews/WARN/files/6994307/Warns.01-01-89-09-30-19.xlsx)

@@ -39,5 +39,5 @@ This file seems to overlap quite a bit with our website scraper, maybe the websi

### 8/13/21

Sent an email to the provided email address requesting WARN from prior years. Received an email back from [email protected] with the historical WARN data. He also CC'd "Hession,Margaret" <[email protected]> and
Sent an email to the provided email address requesting WARN from prior years. Received an email back from [email protected] with the historical WARN data. He also CC'd "Hession,Margaret" <[email protected]> and
"Bernsen,James" <[email protected]>
2 changes: 1 addition & 1 deletion docs/scrapers/vt.md
@@ -13,4 +13,4 @@
The Vermont Job Center site does not display Non-WARN layoff notices by default but our scraper captures both WARN and Non-WARN notices. Therefore, our record counts are significantly higher than what you'll get from a count of records displayed online.


[All data]: https://www.vermontjoblink.com/search/warn_lookups?utf8=%E2%9C%93&q%5Bemployer_name_cont%5D=&q%5Bmain_contact_contact_info_addresses_full_location_city_matches%5D=&q%5Bzipcode_code_start%5D=&q%5Bservice_delivery_area_id_eq%5D=&q%5Bnotice_on_gteq%5D=&q%5Bnotice_on_lteq%5D=&q%5Bnotice_eq%5D=true&commit=Search
[All data]: https://www.vermontjoblink.com/search/warn_lookups?utf8=%E2%9C%93&q%5Bemployer_name_cont%5D=&q%5Bmain_contact_contact_info_addresses_full_location_city_matches%5D=&q%5Bzipcode_code_start%5D=&q%5Bservice_delivery_area_id_eq%5D=&q%5Bnotice_on_gteq%5D=&q%5Bnotice_on_lteq%5D=&q%5Bnotice_eq%5D=true&commit=Search
2 changes: 1 addition & 1 deletion docs/scrapers/wi.md
@@ -15,7 +15,7 @@ Madison, WI 53707
### July 23, 2021
Sent an email asking about the 'Y' column. Hopefully we get a response! Seems like it's valid data, just unmarked.

RESPONSE:
RESPONSE:
Hi Cody –
The unlabeled column you cite in your inquiry denotes whether an update to the notice was received at some point in time after receiving the initial notice.
Thanks!
6 changes: 3 additions & 3 deletions docs/usage.md
@@ -13,9 +13,9 @@ warn-scraper AK CT
To use the `warn` library in Python, import a state's scraper and run it directly.

```python
>>> # Scrape Alaska to the default download directory
>>> from warn.scrapers import ak
>>> ak.scrape()
from warn.scrapers import ak

ak.scrape()
```
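
To run several states from Python, the same import-and-call pattern could be wrapped in a loop. The sketch below assumes that every state's scraper lives in `warn.scrapers` under its lowercase postal code and exposes a `scrape()` function, as `ak` does above. The `scrape_states` helper is hypothetical and not part of the library.

```python
import importlib


def scrape_states(codes, package="warn.scrapers"):
    """Import each state's scraper module by postal code and run its scrape()."""
    results = {}
    for code in codes:
        # e.g. "AK" -> warn.scrapers.ak (assumed naming convention)
        module = importlib.import_module(f"{package}.{code.lower()}")
        results[code.upper()] = module.scrape()
    return results


# Example, with the warn-scraper package installed:
# scrape_states(["AK", "CT"])
```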

## Configuration
12 changes: 7 additions & 5 deletions setup.py
@@ -1,13 +1,14 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Configure the package for distribution."""
import distutils.cmd
import os
from pathlib import Path

import jinja2
import us
from setuptools import find_packages, setup

import warn
import jinja2
import distutils.cmd
from pathlib import Path
from setuptools import setup, find_packages


def read(file_name):
@@ -27,6 +28,7 @@ def version_scheme(version):
If that issue is resolved, this method can be removed.
"""
import time

from setuptools_scm.version import guess_next_version

if version.exact: