This is a list of datasets that are initially collected by states and then provided to federal agencies.
I'm curious to see how many federal datasets originate at the state level. In the context of open data, states may be reluctant to publish datasets that are collected under federal programs or with federal funds. Federal rules, laws, or guidance may dictate what states can do with this data. This also raises questions for operators of open data repositories, such as: can I, or should I, publish data on a state repository that is already published by a federal agency?
Categories are based on the G8 National Action Plan definition of "high value datasets" used in the U.S. Open Data Institute State Data Census: sunlightpolicy/State-Open-Data-Census#1 (comment)
The accompanying CSV file, "state fed data.csv", contains the datasets that I have identified thus far. Contributions via pull request are gladly accepted.
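For anyone who wants a quick look at the file before contributing, here is a minimal Python sketch that lists the column names, counts the rows, and optionally tallies rows by one column. The function name `summarize_csv` is made up for illustration, and the column layout of "state fed data.csv" is not described here, so the sketch reads whatever header the file has rather than assuming one.

```python
import csv
from collections import Counter


def summarize_csv(path, group_by=None):
    """Return (column_names, row_count, counts) for a CSV file.

    `counts` tallies the values found in the `group_by` column. It stays
    empty when `group_by` is None or is not a column in the file.
    """
    # utf-8-sig strips a byte-order mark if a spreadsheet tool added one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        counts = Counter()
        row_count = 0
        for row in reader:
            row_count += 1
            if group_by in columns:
                # Short rows yield None for missing cells; treat as empty.
                counts[(row[group_by] or "").strip()] += 1
    return columns, row_count, counts
```

Calling `summarize_csv("state fed data.csv")` returns the header and row count. Passing `group_by="Category"` would tally rows per category, but only if the file actually has a column with that name, which is an assumption.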
There is an additional CSV file, federal funding programs - for reference, which is derived from the Catalog of Federal Domestic Assistance and filtered for state programs. It can be used to help find datasets. The assumption is that a state receiving funds under a program likely provides some data to the funding agency.
Contributions are welcome. If you have a GitHub account and know how pull requests work, pull requests will be gladly accepted. Otherwise, feel free to submit additions as an issue.
Don't have a GitHub account, or just prefer to do it the old-fashioned way? Download the "state fed data.csv" file, make your additions, and email them to me: tyler.kleykamp[at]georgetown.edu