MartinArens/open_web_data_mining

Domain Metadata Analysis

  • Root Domain Crawl

    • JavaScript / Cookie Tracking
    • JavaScript Libs
    • SSL available
    • Page Speed
    • Domain Whois Data
    • Security Issues
    • HTTP Server
    • HTTP Protocol
    • Structured Data (schema.org)
    • Used HTML Tags ("iframe", "svg", ...)
    • Content Management Systems
    • PHP Versions
    • RSS/Atom feeds
  • Full Domain Crawl

    • Match Tracking Data with data privacy statement
    • Referrer
    • Redirects
    • Broken Links
  • Time-Consuming Crawl

    • SSL Implementation / Rating
    • HTML Validation (w3.org)
    • Ports (MySQL, MongoDB, ...)
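As an illustration of one root-domain check from the list above ("Used HTML Tags"), here is a minimal sketch that counts the tags in an already downloaded page using only the Python standard library. The names `TagCounter` and `count_tags` are hypothetical and not taken from this repository; fetching the page (for example through Splash) is assumed to happen elsewhere.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every opening tag seen while parsing (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        self.tags[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name -> number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    sample = '<html><body><iframe src="a"></iframe><svg></svg><svg></svg></body></html>'
    counts = count_tags(sample)
    # Report only the tags of interest, e.g. "iframe" and "svg".
    for name in ("iframe", "svg"):
        print(name, counts[name])
```

The same counter could be aggregated per domain to see how widespread a given tag is across the crawl.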

Other similar Projects

Domain Lists

Used Libs and Formats

Splash - Lightweight, scriptable browser as a service with an HTTP API

adblockparser - Parser for Adblock Plus rules

HTTP Archive format (HAR)

HTTP Archive format (HAR) Viewer

Publish

Keywords

"Webometrie", "Webometrics", "Cybermetrics", "Web Mining", "Internet Data Mining", "Internet Research", "Internet Technology Trends"

Crawler Performance without Threads

avg sec. per domain * domain count = duration in sec.; duration in sec. / 86400 = duration in days

Example: 5 * 1000000 = 5000000 sec.; 5000000 / 86400 = 57.9 days
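The estimate above can be reproduced with a few lines of Python. This is only a sketch: the function name `crawl_duration_days` and the `workers` parameter are assumptions added for illustration, and dividing by `workers` assumes ideal parallel scaling, which a real crawler will not reach.

```python
SECONDS_PER_DAY = 86400


def crawl_duration_days(avg_seconds_per_domain, domain_count, workers=1):
    """Estimate crawl duration in days.

    workers=1 corresponds to the single-threaded case; higher values
    assume (optimistically) that the work divides evenly.
    """
    total_seconds = avg_seconds_per_domain * domain_count
    return total_seconds / workers / SECONDS_PER_DAY


if __name__ == "__main__":
    # 5 seconds per domain, one million domains, no threads.
    print(round(crawl_duration_days(5, 1_000_000), 1))  # 57.9
```

Under the same idealized assumption, 10 workers would bring the estimate down to roughly 5.8 days.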
