Docs metrics trial with Plausible #204
Comments
Missed the thread from discuss, I don't spend a lot of time connected these days, but I'm against all forms of user tracking:
I think the investment in these resources is worthwhile because it motivates translators to stay connected with the impact of their work. By enabling them to monitor recent views on their translations, we create an environment that encourages their engagement and improves the quality of their contributions. Despite these costs, the value gained from this investment outweighs them.
This is what Ee said in our conversation about the traffic data of docs.python.org:
I believe going with Plausible is the best option when considering its advantages and disadvantages.
And https://plausible.io/lightweight-web-analytics says the https://plausible.io/js/script.js is <1 KB. Picking a docs page at random, 76.4 KB is transferred for https://docs.python.org/3/library/json.html. 77.4 KB instead should be a negligible difference.
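The size comparison above can be checked with a few lines of arithmetic. This is only a sketch: it takes the "<1 KB" figure as an upper bound of 1.0 KB, uses the 76.4 KB measured for the json page, and counts transferred bytes only, not connection setup. The names `PAGE_KB`, `SCRIPT_KB` and `relative_overhead` are made up for illustration.

```python
# Figures quoted in the comment above; SCRIPT_KB is an assumed upper
# bound, since Plausible only states that the script is "<1 KB".
PAGE_KB = 76.4
SCRIPT_KB = 1.0


def relative_overhead(page_kb: float, script_kb: float) -> float:
    """Return the extra transfer as a fraction of the original page size."""
    return script_kb / page_kb


total_kb = PAGE_KB + SCRIPT_KB
overhead_pct = 100 * relative_overhead(PAGE_KB, SCRIPT_KB)

print(f"total transferred: {total_kb:.1f} KB")
print(f"relative overhead: {overhead_pct:.2f}%")
```

Under these assumptions the script adds a little over 1% to the bytes transferred for that page.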
Plausible say of their hosted option:
Their Data Processing Agreement says (again about their hosted option):
As @egeakman mentioned, we started off asking Ee if it's possible to get fresh stats out, but unfortunately it's no longer feasible. Thanks for pointing out the old top 50, that gives a good rough idea, but the docs have changed in the past 7 years and there are new pages. It would also be good to know more than just the top 50 (which includes Python 2 pages which we're no longer interested in).
I agree that Plausible seems way way way better than GA, yet it hardly changes my mind. About the costs for example, OK it's way smaller than GA, yet it's infinitely bigger than no tracking. The costs are far from negligible: for every page view it's a DNS query, a 3-way TCP handshake, the certificate verification (which costs non-negligible CPU usage), potentially another DNS query if the analytics script, at runtime, hits another domain, ...

Calculating the exact cost of it is near impossible. The AFNIC (.fr operator) tried to compute the cost of a single DNS query (and answer); they went deep down that road, even trying to compute the amortized cost of network hardware along the path. The conclusion IIRC is: we keep adding things, it forces the network to adapt in the long term (upgrade hardware, which itself has an (amortized) cost), but removing one thing does not allow us to "downgrade"/"reimburse" already installed hardware.

OHHH you know what could change my mind a little bit about the costs? Client-side Bernoulli sampling. If we could host the client-side script on d.p.o (to avoid the DNS query and the new TCP connection) and make it only log 1 out of n visits, it would divide some of the involved costs by n. Not the human costs though. Put n=10_000 or so and we get non-negligible "gains" vs 1 query per page view (again, infinitely more than not having analytics at all).
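To make the sampling idea concrete, here is a minimal simulation sketch in Python. It is not Plausible's script or any existing API: `should_log` and `simulate` are hypothetical names, and a real version would run in the browser. Each page view independently decides, with probability 1/n, whether to send an analytics request, and the total is estimated afterwards by multiplying the logged count by n.

```python
import random
from typing import Tuple


def should_log(n: int, rng: random.Random) -> bool:
    """One Bernoulli trial: True with probability 1/n."""
    return rng.random() < 1.0 / n


def simulate(page_views: int, n: int, seed: int = 0) -> Tuple[int, int]:
    """Simulate sampled logging of `page_views` visits.

    Returns (requests actually sent, estimated total page views).
    """
    rng = random.Random(seed)
    logged = sum(should_log(n, rng) for _ in range(page_views))
    # Each logged visit stands in for roughly n real visits.
    return logged, logged * n


if __name__ == "__main__":
    sent, estimate = simulate(page_views=1_000_000, n=100)
    print(f"requests sent: {sent}, estimated page views: {estimate}")
```

The trade-off is precision: the larger n is, the fewer requests are sent, but rarely visited pages may log nothing at all during the sampling window.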
That sounds like a lot, but opening the query inspector in any browser, I bet that the cost of downloading just the Python logo is way more than that. When people say "negligible", I suppose they mean "negligible with respect to what's already in there".
Measuring https://hugovk-cpython.readthedocs.io/en/plausible/ which has Plausible's
Here's analysis from PageSpeed Insights (like Lighthouse in Chrome): I think the main problem is the page itself is so big at 2,636 DOM elements (16.3 kB, 33 ms) and not caching our own static assets. No mention of Looking at a "treemap" of the JavaScript, |
Seems like this discussion should be happening at https://discuss.python.org/t/docs-metrics-trial-with-plausible/28896?u=hugovk, not here.
The steering council resolved this internally via chat, we all agree: Go ahead with a Plausible trial for the docs.
Can that be discussed in the appropriate tracker? (not sure if it’s sphinx, cpython, psf infra…)
In the Docs Community monthly meetings, we’ve discussed the need for gathering some metrics of page views from the docs:
Plausible looks like a promising choice:
The Docs Community would like to run a 30-day trial of the hosted version for https://docs.python.org/. This would let us know how many page views we’d get, so we can see what pricing level we’d need for paid hosting, and whether to consider self-hosting.
I asked first on DPO and there were no objections (and 10x👍).
Would a 30-day trial be okay with the Steering Council?
Thanks!