User community insight using an improved crawler #1750
http://statistics.tribler.org/ is back, with IPv8 showing user communities; we just need longer-term statistics now.
qstokkink added the "type: enhancement" label and removed the "type: MSc Thesis Work" and "needs volunteer" labels on Nov 10, 2017
A 2024 update: we now have multiple crawlers but they do not meet the original goal of OP. They are semi-validated, not documented, and not reliable. Frankly, we have too many crawlers: I have a hard time remembering what we even have running.
Goal: a validated, documented and reliable crawler to understand user behavior. This enables the future step of measuring behavioral change.
We have an existing crawler for Dispersy communities and Tribler. The general Tribler crawler has not been updated since 2013. See: http://Statistics.tribler.org
The site is annotated with our releases and major news events. However, it is completely unmaintained and difficult to maintain.
This crawler needs to be moved to a Proxmox machine and improved. Improved insight will help us understand the network health and shape the roadmap.
Expected results: real-time daily graphs of Tribler network size:
User upgrade behavior:
Examples taken from: http://crawler.doxu.org/uptimes.html
ToDo: NAT type as reported by Dispersy in our community and evolution in time.
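As a rough illustration of how the daily network-size and upgrade-behavior graphs could be derived, here is a minimal sketch. It assumes the crawler logs one `(unix_timestamp, peer_id, version)` tuple per sighting; that record format and the `daily_stats` name are hypothetical and not taken from the existing crawler code.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_stats(observations):
    """Aggregate crawler sightings into per-day network size and version counts.

    `observations` is an iterable of (unix_timestamp, peer_id, version)
    tuples. This record format is an assumption made for illustration.
    """
    # day (ISO date string, UTC) -> {peer_id: last version seen that day}
    peers_by_day = defaultdict(dict)
    for ts, peer_id, version in sorted(observations, key=lambda o: o[0]):
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        peers_by_day[day][peer_id] = version

    stats = {}
    for day, peers in sorted(peers_by_day.items()):
        versions = defaultdict(int)
        for version in peers.values():
            versions[version] += 1
        stats[day] = {
            "network_size": len(peers),   # unique peers seen that day
            "versions": dict(versions),   # peers per client version
        }
    return stats
```

Each peer is counted once per day under the last version it reported, so a peer that upgrades during the day shows up under the new version only. Plotting `network_size` over time gives the network-size graph, and the `versions` counts give the upgrade curve.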
According to these GitHub download statistics, Tribler has 302,000 downloads:
http://www.somsubhra.com/github-release-stats/?username=tribler&repository=tribler
However, our non-validated, many-years-old crawler only sees a few thousand users.
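To re-check the download figure without the third-party stats page, the per-asset `download_count` values from the GitHub REST API releases endpoint can be summed directly. The sketch below assumes unauthenticated access (which is rate-limited); the function names are made up for illustration.

```python
import json
import urllib.request


def total_downloads(releases):
    """Sum `download_count` over every asset of every release."""
    return sum(
        asset.get("download_count", 0)
        for release in releases
        for asset in release.get("assets", [])
    )


def fetch_releases(owner="Tribler", repo="tribler", per_page=100):
    """Fetch all releases via GET /repos/{owner}/{repo}/releases, page by page."""
    releases, page = [], 1
    while True:
        url = (
            f"https://api.github.com/repos/{owner}/{repo}/releases"
            f"?per_page={per_page}&page={page}"
        )
        req = urllib.request.Request(
            url, headers={"Accept": "application/vnd.github+json"}
        )
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:  # an empty page means we have everything
            return releases
        releases.extend(batch)
        page += 1
```

Calling `total_downloads(fetch_releases())` gives the current total. Note that downloads count installer fetches, not active users, so a large gap with the crawler's numbers is expected to some degree.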
The thesis of Niels contains an extensive user community evaluation and "data science" portion.
http://www.tribler.org/SimilarityFunction/
Thesis.pdf: http://kayapo.tribler.org/trac/raw-attachment/wiki/SimilarityFunction/thesis.pdf
Current setup:
Kayapo web space: /var/www/statistics.tribler.org/htdocs/img/
Soft links to: /home/tribler/generate-periodic-statistics
kayapo:/home/tribler/generate-periodic-statistics# wc -l *.py
193 first_last.py
191 parse.py
169 reduce.py
553 total
Some crawlers died a few years ago: