@mhucka has been exploring ways to facilitate visually drilling down into the coverage data (aka the public record of all the data held by participating orgs). Discussion of dataviz options is here: https://github.com/datatogether/research/tree/master/data_visualization

This will inevitably require pre-processing of the data, partly because a single layer of the navigation tree can contain tens of thousands of items (i.e. URLs). In addition to pre-processing based on simple analysis of the content, such as running files through FITS to extract content types, there is clearly a need for deeper machine analysis. At the very least, entity extraction could be used to identify patterns/topics within a corpus.
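To make the "tens of thousands of items at one layer" problem concrete, here is a minimal sketch of one possible pre-processing step (the helper below is hypothetical, not part of any existing repo): collapsing a flat list of URLs into a small number of path-prefix buckets that a visualization can actually render.

```python
from collections import Counter
from urllib.parse import urlparse

def collapse_layer(urls, depth=1, max_groups=50):
    """Group a large list of URLs by their first `depth` path segments,
    so a navigation layer with tens of thousands of entries can be
    rendered as a manageable set of buckets."""
    counts = Counter()
    for url in urls:
        segments = urlparse(url).path.strip("/").split("/")
        prefix = "/".join(segments[:depth]) or "(root)"
        counts[prefix] += 1
    # Keep the largest buckets; fold the long tail into "(other)".
    top = counts.most_common(max_groups)
    other = sum(counts.values()) - sum(n for _, n in top)
    if other:
        top.append(("(other)", other))
    return top

# e.g. collapse_layer(["https://example.gov/water/a", "https://example.gov/air/b"])
```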
@mhucka has already been working on some of this. Let's rope in a few more people. @chrpr and @mejackreed come to mind.
The ETL (extract, transform, load) pattern seems pretty applicable, and it opens opportunities for experimenting with incorporating distributed data and distributed tools into machine analysis pipelines (a rough sketch follows the list below):
1. Aggregate the essential info into a workable dataset (tracking info currently lives in a SQL database; eventually it will be distributed).
2. Analyze that dataset.
3. Write the analyzed/reformatted result somewhere durable (e.g. to IPFS).
4. Pass around a reference to the updated/processed/extended dataset (e.g. an IPFS hash).
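As a hedged illustration only, the whole loop might look like the sketch below. The `urls` table, its columns, and the use of the go-ipfs CLI are assumptions made for the example, not settled choices.

```python
import json
import sqlite3
import subprocess
import tempfile

def run_pipeline(db_path):
    # Extract: pull the essential coverage info out of the SQL database.
    # The `urls` table and its columns are hypothetical placeholders.
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT url, content_type FROM urls").fetchall()
    conn.close()

    # Transform: any analysis step goes here; grouping by content type
    # stands in for deeper machine analysis.
    summary = {}
    for url, content_type in rows:
        summary.setdefault(content_type or "unknown", []).append(url)

    # Load: write the result to IPFS via the go-ipfs CLI and capture
    # the content hash (-Q prints only the final hash).
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(summary, f)
        path = f.name
    ipfs_hash = subprocess.check_output(
        ["ipfs", "add", "-Q", path], text=True
    ).strip()

    # The hash is the reference that gets passed around.
    return ipfs_hash
```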
(flyingzumwalt changed the title from "Pre-processing coverage date for Data Visualizations" to "Pre-processing coverage data for Data Visualizations" on Jul 11, 2017.)