remove analytics #408
Is there a different way to keep track of traffic on the website? (Sorry if that was discussed in the steerco conversation, that link isn't available to non members) |
Hi @choldgraf! Here's some context: a little over a year ago, removing Google Analytics from ipython.org came up on the Jupyter Steering Committee list, specifically pointing to this article. There was agreement to do so, which is what happened in ipython/ipython-website#150, and also a proposal to do the same for jupyter.org, with some discussion about possible alternatives to GA that could be used instead. So far, no one has stepped up to do that work, and in the meantime jupyter.org has continued to track users. I have just pinged the @jupyter/steeringcouncil on the list pointing to this PR, encouraging further discussion to take place in the open here. |
per discussion in 2020 on the steering council list https://groups.google.com/u/1/g/jupyter-steering/c/7j7F0lyQY84/m/Ch9Qj9nMAAAJ
4d49b7d to 32951af
+1 from me - in particular, the cookies point seems like a clear violation with the current setup. Basically we're collecting cookies without offering either disclosure or opt-out options. These days just about every website at least offers a consent/opt-out pop-up (which I always open to deactivate all I can). At first I thought we should try to add an alternative option as part of the removal, but after reading the article it seems pretty clear to me that right now, if we wanted to keep Google Analytics at all, we'd have to at least add the tools for cookie consent/opt-out. Since we're obviously not going to do that (it would probably be ~ as much work as replacing GA with something less intrusive), we might as well get rid of GA immediately. Then we have the issue of how to do the work to add back some minimal analytics with a different tool - I'm all for at least tracking basic metrics of site access b/c that's useful to know, but we should do it with something better than GA. In summary, +1 for this PR in its current form, even if it leaves us without analytics for now. The cookies/GDPR argument I think is very strong and calls for immediate action. |
Thank you!
Rough website analytics have just come up in information requested for the CZI proposals, so it would be nice to deploy something like Matomo, which we do for mybinder.org. Matomo lets you do the kind of much cruder, privacy-respecting analytics that we are actually interested in (a very coarse hit counter with some vague geographic distributions), and when self-hosted we don't need the "trust a third party to tell the truth" part. FWIW, I believe the It's relatively easy for mybinder.org to deploy Matomo, since that's already a big Kubernetes application, so one more pod and a GCP-managed SQL server is no big deal. But for what's so far a static web site, adding a persistent managed server is a big step in continuous maintenance and cost. We could pay for Matomo, but at our current traffic level of ~2M pageviews/month, that would be around $500/month. Plausible Analytics (the source of that "don't use GA" article) would be closer to $70/month. I don't know anything about it, but would be happy to give it a try, and less than $1k/year is probably fine for us? Self-hosting Matomo (or Plausible) would likely not be a lot cheaper than that (possibly more), especially taking maintenance time into account. We also have Google Analytics sprinkled on some other sites like nbviewer and readthedocs, with cookies disabled where possible, so we might not be done stripping this out. |
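For a quick sanity check of the numbers quoted in the comment above, here is a minimal sketch that annualizes the two monthly figures. The dictionary keys and the `annual_cost` helper are illustrative names only, and the dollar amounts are the rough estimates from the comment, not vendor quotes.

```python
# Rough annual-cost comparison using the monthly estimates quoted in the thread.
# The names below are hypothetical; the figures are estimates, not price quotes.
MONTHLY_ESTIMATES_USD = {
    "matomo_hosted": 500,    # "around $500/month" at ~2M pageviews/month
    "plausible_hosted": 70,  # "closer to $70/month"
}


def annual_cost(monthly_usd: float) -> float:
    """Convert a monthly price into a yearly price."""
    return monthly_usd * 12


annual = {name: annual_cost(usd) for name, usd in MONTHLY_ESTIMATES_USD.items()}

for name, cost in annual.items():
    print(f"{name}: ${cost:,.0f}/year")
```

On these estimates the hosted Plausible plan comes to $840/year, which is where the "less than $1k/year" figure comes from, while hosted Matomo would be about $6,000/year.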
I'm pretty sure we still use GA on mybinder.org as well; it is the easiest way to demonstrate the impact of that service, IMO. It would be great if we could have an alternative strategy for demonstrating "who is using the jupyter website, where are they coming from, etc." because this can be very important in grant proposals. Also, while I generally agree that we should try to move away from Google Analytics, I didn't realize that the article linked above was from a company that directly competes with Google Analytics. Does anybody know of a good writeup from someone without an obvious conflict of interest? |
I think "conflict of interest" does not properly describe the linked article. You can say the piece may be biased due to the vested interest of the authors - but everything here is above board and transparent - it's literally on their company blog. I think you'd be hard-pressed to find someone describing the wound without proposing a salve. Surely we can separate the two: we don't have to buy the salve to acknowledge the wound. There's company A, which makes its money selling advertisements and provides an analytics service for free, thereby increasing the quality and quantity of its product (ad space and eyeballs). And there's company B, which makes its money charging for an analytics service and describes the hidden costs associated with using company A's free analytics service, "surveillance capitalism" being chief among them. To their credit, company B's article does not limit the salve to their own paid solutions: this is what gives the article credibility. They point the reader to alternatives: using server logs directly, a competitor company C, and even an alternative way to get relevant data from company A, before proposing company B's own offering. Getting back to "conflict of interest" - I think the kind of article you seek, "good writeup from someone without an obvious [vested] interest", would be the kind of article that is susceptible to having a conflict of interest. Suppose someone unaffiliated with company A writes an article "Don't believe the haters: free analytics from company A is just fine." You won't know if that article was written because the author was approached and compensated by company A for the piece, is trying to get a job there or has a family member who works there, or owns a bunch of company A stock, and so on. |
Sorry if my statement came across as strongly skeptical - a better question would have been asking about a writeup of many alternatives that didn't also come from one of those alternatives. I am just bummed that we don't know how much traffic the site is getting anymore, which pages, etc, and trying to figure out if there's another easy option (I asked in the Plausible repo if they had plans for an open source plan, but no dice) |
Not a write-up, but just a list of analytics tools: https://github.com/0xnr/awesome-analytics |
It is general knowledge that using stock google analytics poses issues wrt GDPR. |
One thing unrelated to GDPR is that other analytics tools like plausible, simple-analytics and co allow the metrics to be public (not sure about matomo), which I think is good. I believe having the community be able to look at metrics is important for them to be able to bring up issues. Second thing: most of the above-mentioned analytics tools allow multiple domains; it could be a good idea to have an account at the NumFOCUS level to make it easy for other/new projects that don't have many visits to ride on the plan of the bigger projects. |
I am supportive of moving to a more privacy respecting analytics/telemetry tool, and do see the value in having some data to help demonstrate impact and reach in a quantitative manner. |
Just for reference, it appears that using Google Analytics is now really illegal in France and Austria (and other EU countries will probably follow). |
Do we have any uses of it left still? If so, I think there's reasonable agreement on proceeding with the removal. @choldgraf is it still on Binder, you think? This legal change raises the priority of making the removal; we'll need to figure out an alternative that's not as invasive (and hopefully easy to use), as obviously getting some analytics on usage is very important both internally and regarding funders. |
Yes, it's still on Binder; I'll open an issue to share this context. I probably won't act on anything myself this week though, as I am on vacation. |
issue here: jupyterhub/team-compass#491 Can we find some funds to pay for something like Plausible analytics across the project? It would really be a shame if we can't keep track of which pages our users are hitting anymore. For those of us that apply for grants, it is one of the only quantifiable metrics we have to demonstrate impact and reach. Note: I also suspect it is being used on several documentation sites that use ReadTheDocs, because they let you embed GA links directly on pages. Here's their issue where they concluded that Google Analytics wasn't an issue as long as you respected "do not track". I am not an expert in this at all, so I defer to others on how this impacts Jupyter. |
As a side-note: I'd be happy to explore whether we can use grant funds to pay for Plausible, if it would help simplify our workflows. I bet that this would be in-scope for most grants focused on supporting Jupyter. In my opinion, paying $99/mo for something is totally worth it if it means we don't have to spend ~any time thinking about maintaining it or providing access. |
I'm pretty sure we have the funds for something like that, but I can't confirm right now - pinging @afshin @jasongrout @Ruv7 @ellisonbg so we don't forget to check on this at the Friday call (our best place right now for things like this). Totally legitimate points Chris, thx! |
I am strongly in favor of removing Google Analytics, but would love to see us use a legal and privacy-responsible analytics service.
--
Brian E. Granger
Senior Principal Technologist, AWS AI/ML
On Leave - Professor of Physics and Data Science, Cal Poly
@ellisonbg on GitHub
|
As a follow-up: I'm sitting in a SciPy talk right now about scientific-python.org, which is (if I understand correctly) setting up a plausible instance for various projects in the community to share and use: https://views.scientific-python.org/login |
To add to Jason's message: the instance does not track users, only page views, and it runs on a server that scientific-python.org owns and that is GDPR and European compliant. |
In the example image in the talk, it did have a statistic for "unique visitors", but I'm not sure how they do that. |
I think we should open an issue to track setting up plausible (or some other analytics tracker like matomo) for the jupyter website - I think it'd be well worth the price if we can find a way to use analytics data as part of fundraising etc. |
I think what is missing from Jason's message and my reply is that such a server is already set up, at https://views.scientific-python.org, and we can just ask them for a tracking code for jupyter.org. And really when I say "them" it's also "us", as the folks who did that are Stefan, Jarod, ... |
@Carreau ah that is excellent, I was just asking about this in the scientific-python Discord. It seems like a good idea to me. For what it's worth we are also adding support for Plausible to the PyData theme, so we could re-use that across the Jupyter projects: |
per discussion a year ago on the steering council list
https://groups.google.com/u/1/g/jupyter-steering/c/7j7F0lyQY84/m/Ch9Qj9nMAAAJ