Miscellaneous examples of using Spark to analyze Common Crawl data.
These scripts were originally written for some simple evaluations. Use them at your own risk, or as an example of how to work with the data.
I copied the Common Crawl datasets from S3 to a local HDFS cluster.
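The exact copy procedure is not described here. As a rough sketch, assuming you start from a Common Crawl paths listing (a file of object keys such as a downloaded `warc.paths.gz`), the Python below pairs each key in the public `commoncrawl` bucket with a destination under an HDFS root. The `s3a://` scheme, the `hdfs:///data/common-crawl/` root and the helper names `copy_pairs` and `read_paths_listing` are hypothetical choices for illustration, not the setup used for these scripts.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple

# Assumption: the public Common Crawl bucket, addressed via Hadoop's s3a connector.
S3_PREFIX = "s3a://commoncrawl/"
# Hypothetical HDFS root; adjust to your own cluster layout.
HDFS_ROOT = "hdfs:///data/common-crawl/"


def copy_pairs(keys: Iterable[str], hdfs_root: str = HDFS_ROOT) -> Iterator[Tuple[str, str]]:
    """Map each object key to an (S3 source, HDFS destination) pair.

    The key's relative path is kept under the HDFS root, so the
    crawl/segment directory structure is preserved.
    """
    root = hdfs_root.rstrip("/") + "/"
    for key in keys:
        key = key.strip()
        if not key:
            continue  # skip blank lines in the listing
        yield S3_PREFIX + key, root + key


def read_paths_listing(path: str) -> List[str]:
    """Read a gzipped paths listing, returning one object key per non-empty line."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


if __name__ == "__main__":
    sample = ["crawl-data/example/segment-0/warc/file-00000.warc.gz", ""]
    for src, dst in copy_pairs(sample):
        print(src, "->", dst)
```

The resulting pairs could then be handed to a bulk copy tool such as `hadoop distcp`; which tool and options to use depends on your cluster and is left open here.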