Add additional methods to base classes to let users support additional sources #77
Here are some examples from customized subclasses I've implemented for my own data-loading project to support additional sources. They might help illustrate what I'm describing in this issue.
I love the idea of integrating features you've added, but I think we'd be best off taking things on a case-by-case basis, with a clear vision for the new use case each change would enable. Is there a feature you would propose for the end user? Supporting a data source beyond ACS? Something else?
@palewire To clarify, this is less about adding a feature or support for a specific data source to the CLI and more about making backward-compatible changes to the Python API that would make it easier for users of the Python API to add support for other Census Bureau API data sources in their own projects. These changes may also make it easier to add support for additional sources in this tool (see #2). I ran into this need when writing code to download and process data from the self-response rate endpoint. That support definitely doesn't need to be in this library or the CLI, but it would be great if it were easier to reuse this package's code and conventions to consume data from other Census API endpoints. The changes I describe above address these two needs (for Python API users, not CLI users):
I'm not sure whether the approaches I've taken in my code are the best way to address these needs, but I wanted to document them in case you all have run into this internally when thinking about how to pull data from the Census API for sources that aren't ACS tables.
Gotcha. I'm not opposed to such changes; I just want them pegged to new features for users of this library, which I think could bring some focus to the work. That way the edits aren't academic but are integrated with the code here from the start. In other words, I don't want to prematurely optimize. For instance, if we set the goal of integrating the three- and one-year samples from the ACS into this library, could adding that feature naturally include some of the refactoring you propose?
Supporting other ACS releases wouldn't require these changes. Supporting decennial tables, like SF1, would require a way to specify field types on a per-source or per-table basis. Neither addition would require a hook for a different client class; that's only needed for API data sources that the census package doesn't support.
Got it. With the decennial census coming out this year, maybe it's a good time to figure out SF1. Have you integrated it downstream in any of your stuff?
I haven't integrated SF1 yet, but I'm likely going to be using some tables (e.g. P1) soon. I'll update this issue with any relevant findings or bits of example code.
This is somewhat related to #2.
I find this project to be extremely useful and a great framework for a task that I have to do often. In my projects, I've found myself using the base classes and concepts from this project when I want to download and process data from other Census Bureau API sources.
However, for non-ACS sources, I find myself reimplementing many of the methods on my geotype downloader classes from scratch, because the changes in functionality can't be achieved by just calling super() and then adding logic.

I think adding these methods to BaseGeoTypeDownloader could make adding data sources easier, both in this project and for other users in their own projects:

- BaseGeoTypeDownloader.get_api_client(): This would be called from the constructor to set self.api, and would let subclasses specify a customized subclass of census.Census that supports additional API endpoints.
- BaseGeoTypeDownloader.get_field_type_map(): This would be similar to BaseGeoTypeDownloader.get_raw_field_map(), except it would map raw field names to types to be passed to pd.Series.astype(). Like BaseGeoTypeDownloader.get_raw_field_map(), it would be called from BaseGeoTypeDownloader.process() when setting the column types after reading in the raw table. The implementation could check for a FIELD_TYPES attribute on the table configuration class and, if that doesn't exist, fall back to the existing logic for ACS tables that checks the field name suffix. Explicitly setting type conversions would allow supporting non-ACS tables whose field names don't follow the ACS suffix convention.
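A minimal sketch of how the two proposed hooks might fit together, written as standalone Python so it runs on its own. StubCensus stands in for census.Census so no census package or API key is needed. SelfResponseCensus, SelfResponseDownloader, ACSConfig, SF1PopulationConfig, the constructor signature, and the columns parameter of get_field_type_map() are all hypothetical, not the library's actual API. The ACS fallback assumes the existing suffix check treats variable names ending in _NNNE (estimate) or _NNNM (margin of error) as numeric and casts them to float64; the library's real logic may differ.

```python
import re
from typing import Dict, Iterable, Optional

# Assumed ACS naming convention: detailed-table variables end in _NNNE
# (estimate) or _NNNM (margin of error), e.g. B01003_001E.
ACS_NUMERIC_FIELD = re.compile(r"_\d{3}[EM]$")


class StubCensus:
    """Stand-in for census.Census so this sketch runs without the census package."""

    def __init__(self, api_key: str, year: Optional[int] = None):
        self.api_key = api_key
        self.year = year


class SelfResponseCensus(StubCensus):
    """Hypothetical client subclass that would add a non-ACS endpoint."""


class BaseGeoTypeDownloader:
    """Sketch of the two proposed hooks; not the library's actual class."""

    def __init__(self, config, year: int, api_key: str = "demo-key"):
        self.config = config
        self.year = year
        # Proposed hook: subclasses choose the client without overriding __init__.
        self.api = self.get_api_client(api_key)

    def get_api_client(self, api_key: str) -> StubCensus:
        # Default keeps the current behavior: the standard client.
        return StubCensus(api_key, year=self.year)

    def get_field_type_map(self, columns: Iterable[str]) -> Dict[str, str]:
        columns = list(columns)
        explicit = getattr(self.config, "FIELD_TYPES", None)
        if explicit is not None:
            # Per-table types, limited to columns actually present in the raw table.
            return {name: dtype for name, dtype in explicit.items() if name in columns}
        # Fallback: assumed ACS suffix check; non-matching fields keep their type.
        return {name: "float64" for name in columns if ACS_NUMERIC_FIELD.search(name)}


class SelfResponseDownloader(BaseGeoTypeDownloader):
    # Only the client changes; everything else is inherited unchanged.
    def get_api_client(self, api_key: str) -> StubCensus:
        return SelfResponseCensus(api_key, year=self.year)


class ACSConfig:
    """Hypothetical ACS table config with no FIELD_TYPES, so the fallback applies."""


class SF1PopulationConfig:
    """Hypothetical SF1 table config; names like P001001 lack the ACS suffix."""

    FIELD_TYPES = {"P001001": "int64"}


if __name__ == "__main__":
    acs = BaseGeoTypeDownloader(ACSConfig, year=2018)
    print(acs.get_field_type_map(["NAME", "B01003_001E", "B01003_001M", "state"]))
    sf1 = BaseGeoTypeDownloader(SF1PopulationConfig, year=2010)
    print(sf1.get_field_type_map(["NAME", "P001001", "state"]))
    print(type(SelfResponseDownloader(ACSConfig, year=2020).api).__name__)
```

Because get_api_client() defaults to the standard client and get_field_type_map() only changes behavior when FIELD_TYPES is present, existing ACS subclasses would keep working unchanged. The returned dict could be passed directly to pandas DataFrame.astype(), which accepts a column-to-dtype mapping; filtering the explicit map to columns actually present avoids a KeyError there.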