fix(dc): use archive for 2014, links for others #444

Merged (1 commit) on Feb 27, 2022


chriszs commented Feb 26, 2022

Improves on the hack for #238: the scraper now uses the mostly correct links from the agency site and patches in an Archive.org URL for 2014 only while the URL for 2018 continues to match, so the hack sunsets itself if the agency corrects the link. This also expands the list of years we scrape, filters out an empty row, and corrects a reference to `md` in the cache key.
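The conditional patch described above might look roughly like the sketch below. This is not the code from the commit. It assumes that "continues to match" means the agency's 2014 link still points at the same URL as its 2018 link. The function name `patch_2014_link`, the year-to-URL dict, and the placeholder URLs are all hypothetical.

```python
from typing import Dict

# Hypothetical placeholder; the real Archive.org URL is not given in this thread.
ARCHIVE_2014_URL = "https://web.archive.org/web/2014/https://agency.example/2014"


def patch_2014_link(
    links: Dict[int, str], archive_url: str = ARCHIVE_2014_URL
) -> Dict[int, str]:
    """Return a copy of `links` with 2014 swapped for the archive URL if needed."""
    patched = dict(links)  # leave the caller's dict untouched
    # Assumption: the 2014 link is wrong because it duplicates the 2018 link.
    # Once the agency fixes it, the two differ and the patch stops applying.
    if 2014 in patched and patched.get(2018) == patched[2014]:
        patched[2014] = archive_url
    return patched
```

Keying the patch on the duplicate link rather than hard-coding the year means the workaround disappears on its own once the upstream page is fixed, with no follow-up change to the scraper.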


chriszs commented Feb 26, 2022

Note there's some inconsistent date formatting that will have to be dealt with in the transformer:

[Screenshot: Screen Shot 2022-02-26 at 12 17 39 PM]
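One way the transformer could cope with mixed date formats is to try a short list of known patterns in turn. The sketch below is illustrative only. The formats in `DATE_FORMATS` are assumptions, since the actual variants appear only in the screenshot, and `parse_date` is a hypothetical helper name.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical formats; the real ones are visible only in the screenshot.
DATE_FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%B %d, %Y", "%Y-%m-%d")


def parse_date(raw: str) -> Optional[date]:
    """Try each known format in order; return None if none of them fit."""
    cleaned = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    return None
```

Returning `None` for unparseable values, rather than raising, would let the transformer log or flag odd rows without aborting the whole run.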

@palewire palewire merged commit 6bb13e7 into biglocalnews:main Feb 27, 2022
@chriszs chriszs deleted the dc-gh-238-use-current-links branch February 27, 2022 16:21