Common Crawl to sentiment to Chernoff faces to world map superposition

Installation needs quite a few dependencies; ask me for the full list.

Install warctools from https://github.com/internetarchive/warctools, which provides warcfilter and warcextract.
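For a feel of what record extraction involves, here is a minimal sketch in plain Python that walks the records of an uncompressed WARC stream. It is not the warctools implementation and not necessarily what we ran at the hackathon; the function name `iter_warc_records` and the sample record are made up for illustration. Common Crawl files are gzip-compressed, so a real file would need to be opened with something like `gzip.open` first.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC byte stream.

    Hypothetical helper for illustration only; warcextract is the real tool.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip the blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)

        # Header block: "Name: value" lines up to the first blank line.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # The payload is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


# Made-up sample record, not taken from the Common Crawl corpus.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 11\r\n"
    b"\r\n"
    b"hello world"
    b"\r\n\r\n"
)

for headers, payload in iter_warc_records(io.BytesIO(sample)):
    print(headers["WARC-Target-URI"], len(payload))
```

For real work, prefer warcfilter and warcextract; this sketch ignores compression, malformed records, and header continuation lines.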

This repo documents our 2nd-place finish at the Big Open Data Hackathon, where we dissected the Common Crawl corpus of WARC files.
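The middle of the pipeline named in the title (text to sentiment to Chernoff face) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the tiny word lists, the function names `sentiment_score` and `face_params`, the choice of face features, and the sample texts keyed by country code. It is not the scoring method or the face encoding the team actually used.

```python
import re

# Hypothetical toy lexicon; a real run would use a proper sentiment resource.
POSITIVE = {"good", "great", "happy", "love", "win"}
NEGATIVE = {"bad", "sad", "hate", "lose", "awful"}


def sentiment_score(text):
    """Return a score in [-1, 1]: (positive - negative) / (positive + negative)."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def face_params(score):
    """Map a sentiment score to a few Chernoff-style face features (assumed set)."""
    return {
        "mouth_curvature": score,             # smile for positive, frown for negative
        "eye_size": 0.5 + 0.25 * (score + 1),  # ranges from 0.5 to 1.0
        "brow_slant": -score,                 # slanted brows for negative scores
    }


# Made-up sample texts keyed by country code, one face per country on the map.
texts_by_country = {
    "US": "I love this great day",
    "FR": "bad and sad but good",
}

for country, text in texts_by_country.items():
    print(country, face_params(sentiment_score(text)))
```

Drawing each face at its country's position on a world map is left out here; that step depends on the plotting library chosen.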

Team Members (in alphabetical order):

Alex Aruj, Andrew Defries, Adam Ericksen, Trent Robbins, Ed Tsang, Amir Youssefi