Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add censusfips metadata #4006

Merged
merged 8 commits into from
Jan 17, 2025
Merged

add censusfips metadata #4006

merged 8 commits into from
Jan 17, 2025

Conversation

cmgosnell
Copy link
Member

@cmgosnell cmgosnell commented Jan 9, 2025

Overview

Working on #3884.

What problem does this address?

-just add the fips codes metadata

OOS/for this other PR #4019:

  • add the extractor and relevant metadata/column mapping
  • [working towards] replacing addfips! make pudl.helpers.add_fips_ids work like it does rn but with our newer archived data and doesn't have the error found in Incorrect county FIPS code for Bedford, VA #3531

Documentation

Make sure to update relevant aspects of the documentation.

Tasks

Preview Give feedback

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

Preview Give feedback

@cmgosnell cmgosnell requested a review from e-belfer January 9, 2025 15:03
@cmgosnell cmgosnell self-assigned this Jan 9, 2025
@cmgosnell cmgosnell added the bug Things that are just plain broken. label Jan 9, 2025
@e-belfer e-belfer added the new-data Requests for integration of new data. label Jan 9, 2025
Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to approve so you can merge this without re-review but the working partitions need to get fixed!

"Reference files for Federal Information Processing Series (FIPS) Geographic Codes. "
"These FIPS Codes are a subset of a broader Population Estimates dataset."
),
"working_partitions": {"years": sorted(set(range(2011, 2024)))},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"working_partitions": {"years": sorted(set(range(2011, 2024)))},
"working_partitions": {"years": sorted(set(2000, range(2011, 2024)))},

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good flag! but since for now we are only planning on using the one freshest year i think it should only be 2023

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we anticipate updating our fipsification to deal with the time variability of these mappings? There'll definitely be inconsistencies in any individual vintage if we apply it across all years of data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the plan for this issue rn is to effectively replicate addfips which is to say use one vintage. This vintage will be much more recent (2023 instead of addfips most recent 2015). It should be pretty easy with this overall setup to add the additional vintages as a next step.

src/pudl/metadata/sources.py Show resolved Hide resolved
@e-belfer e-belfer self-requested a review January 16, 2025 13:57
Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the extractor in here is going to create a horrible loop - you can't merge the archiver PR without having the metadata in PUDL main, and you can't extract the data without having a production archive published or else zenodo-cache-sync will fail. I suppose you could work around this by pinning to your PUDL with the metadata locally, but it'll make it hard for anyone else to reproduce this. So I'd propose splitting the extraction into a separate PR and getting the metadata piece merged in first (which I think is pretty much good to go?).

@cmgosnell
Copy link
Member Author

yes good point! i will remove all the extractor stuff and make a new pr.

@cmgosnell cmgosnell marked this pull request as ready for review January 17, 2025 13:22
@e-belfer e-belfer added this pull request to the merge queue Jan 17, 2025
Merged via the queue into main with commit 948d3a9 Jan 17, 2025
17 checks passed
@e-belfer e-belfer deleted the fips-fix branch January 17, 2025 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Things that are just plain broken. new-data Requests for integration of new data.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants