Pre-processing coverage data for Data Visualizations #6

Open
flyingzumwalt opened this issue Jul 11, 2017 · 0 comments

@mhucka has been exploring ways to make it easier to visually drill down into the coverage data (i.e. the public record of all the data held by participating orgs). The dataviz options are discussed here: https://github.com/datatogether/research/tree/master/data_visualization

This will inevitably require pre-processing the data, partly because you often end up with tens of thousands of items (i.e. URLs) at a given layer of the navigation tree. Beyond pre-processing based on simple analysis of the content, such as running files through FITS to extract content types, there is clearly a need for deeper machine analysis. At the very least, you could use entity extraction to identify patterns/topics within a corpus.
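As a rough illustration of the "simple analysis" kind of pre-processing, here is a minimal sketch that buckets URLs by host plus leading path segments, so a layer with tens of thousands of URLs collapses into far fewer groups. The function name `group_urls` and the `depth` parameter are hypothetical, and grouping by path prefix is just one assumed strategy; nothing here reflects how the coverage data is actually processed.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_urls(urls, depth=1):
    """Bucket URLs by host plus the first `depth` path segments.

    Hypothetical helper: the grouping key is an assumption, chosen only
    to show how one crowded layer can be split into sub-layers.
    """
    buckets = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        # Drop empty segments produced by leading or doubled slashes.
        segments = [s for s in parts.path.split("/") if s]
        key = "/".join([parts.netloc] + segments[:depth])
        buckets[key].append(url)
    return dict(buckets)


if __name__ == "__main__":
    sample = [
        "https://example.org/data/a.csv",
        "https://example.org/data/b.csv",
        "https://example.org/docs/readme.html",
    ]
    for key, members in sorted(group_urls(sample).items()):
        print(key, len(members))
```

Raising `depth` produces finer-grained buckets, which could map onto deeper levels of the navigation tree.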

@mhucka has already been working on some of this. Let's rope in a few more people. @chrpr and @mejackreed come to mind.

The ETL pattern seems pretty applicable, and opens opportunities for experimenting with incorporating distributed data and distributed tools into machine analysis pipelines:

  1. aggregate the essential info into a workable dataset (the info is currently tracked in a SQL database; eventually it will be distributed)
  2. analyze that dataset
  3. write the analyzed/reformatted result (e.g. to IPFS)
  4. pass around a reference to the updated/processed/extended dataset (e.g. an IPFS hash)
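
The four steps above can be sketched roughly as follows. This is a sketch under stated assumptions, not the project's pipeline: the `urls` table and its `url` / `content_type` columns are hypothetical, an in-memory SQLite database stands in for the SQL database, and a plain SHA-256 digest over a dict stands in for writing to IPFS (a real IPFS hash is a CID and is computed differently).

```python
import hashlib
import json
import sqlite3
from collections import Counter


def make_demo_db():
    """Build an in-memory database with a hypothetical `urls` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE urls (url TEXT, content_type TEXT)")
    conn.executemany(
        "INSERT INTO urls VALUES (?, ?)",
        [
            ("https://example.org/a.csv", "text/csv"),
            ("https://example.org/b.csv", "text/csv"),
            ("https://example.org/index.html", "text/html"),
        ],
    )
    return conn


def extract(conn):
    """Step 1: aggregate the essential info into a workable dataset."""
    return conn.execute("SELECT url, content_type FROM urls").fetchall()


def transform(rows):
    """Step 2: analyze the dataset (here, just count content types)."""
    counts = Counter(content_type for _, content_type in rows)
    return {"total": len(rows), "by_content_type": dict(counts)}


def load(result, store):
    """Step 3: write the result to a content-addressed store.

    `store` is a dict standing in for IPFS; the key is a SHA-256 hex
    digest of the serialized result, NOT a real IPFS CID.
    """
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    ref = hashlib.sha256(payload).hexdigest()
    store[ref] = payload
    return ref


if __name__ == "__main__":
    store = {}
    ref = load(transform(extract(make_demo_db())), store)
    # Step 4: pass `ref` around; anyone with the store can resolve it.
    print(ref, json.loads(store[ref]))
```

Because the reference is derived from the content, the same analysis result always yields the same reference, which is what makes it safe to hand around between tools.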

flyingzumwalt changed the title from "Pre-processing coverage date for Data Visualizations" to "Pre-processing coverage data for Data Visualizations" on Jul 11, 2017