Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do Not Track support #4046

Merged
merged 16 commits into from
May 30, 2018
Merged

Conversation

davidfischer
Copy link
Contributor

@davidfischer davidfischer commented May 1, 2018

What DNT means for us:

  • No behavioral ad targeting. We weren't doing that anyway.
  • When DNT is enabled, do not load Google analytics or send anything to analytics
  • No long lasting unique cookies
  • Logs must be deleted after 10 days (we are going to do that anyway, see GDPR Meta Issue #3954 (comment))

New endpoints

  • /.well-known/dnt/ - this shows the DNT status of a particular user and links to our privacy policy.
  • /.well-known/dnt-policy.txt - this is the EFF's DNT policy verbatim (as they recommend) including trailing whitespace and all

Cookie changes

  • (Edit: a previous version of this PR had the CSRF cookie as a session cookie) The CSRF cookie is changed to last 30 days instead of a 1 year. The EFF policy says only a little about CSRF protection and it doesn't seem to match what Django does at all. I think this is a simple change that won't cause problems and will proactively reduce discussion the CSRF cookie being used as a unique identifier.
  • The longest lived GA cookie is set to 30 days instead of 2 years. This isn't strictly required by DNT (GA will be disabled if DNT is enabled). The biggest drawback to this is that users who visit once every 31 days will be counted as "new" instead of "returning". I don't think we care about that.

This doesn't delete any existing cookies so users who are cookied with GA already will have a long lasting cookie.

Advantages to us

More

For more details on Do Not Track, see:

NOTE: This should not be merged until after #3978

- Handle navigator.doNotTrack == 'unspecified'
- Make GA actually store persistent cookies
@agjohnson agjohnson added this to the 2.4 milestone May 3, 2018
@@ -59,6 +59,8 @@ class CommunityBaseSettings(Settings):
SESSION_COOKIE_DOMAIN = 'readthedocs.org'
SESSION_COOKIE_HTTPONLY = True
CSRF_COOKIE_HTTPONLY = True
# See: docs/advertising-details.rst
CSRF_COOKIE_AGE = None # session cookie (expires on browser quit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break user experience

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have closed the browser and restore the window, the page will be loaded from cache. but the CSRF cookie will not be there. So it may make submittion CSRF Error. maybe you can try using django session CSRF?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is potentially true and is warned about in the Django docs. I don't think it will be a big issue for us since it only affects form submissions of which the only ones on a non-authed page are the login/signup forms. However, we could make the CSRF cookie age match the logged in cookie age (~2 weeks) to mitigate it. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I set it to 30 days so it matches the GA cookie. I think that's a reasonable balance so it's pretty obvious we aren't using it to track users.

@@ -2,33 +2,40 @@
// https://docs.readthedocs.io/en/latest/advertising-details.html#analytics


// RTD Analytics Code
// Skip analytics for users with Do Not Track enabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use something like this script to check if do not track is enabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That entire script is basically to handle an IE10 bug. I'm not sure it's worth the effort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be in a script or a function that we can call from any scipt. Regarding IE, we may still support IE10.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IE10 is not supported by Microsoft with a couple exceptions and it is a tiny fraction of our users (sub-0.1%). I don't think it is unreasonable to not support a privacy feature for users who are using a browser unsupported by the vendor. In addition, the "support" that the linked script offers is mostly just to mark IE10 as "unspecified" for DNT which for our purpose would be off.

I lean toward simplicity here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understand. then we can keep window.doNotTrack === '1' || navigator.doNotTrack === '1') in a function and call it from everywhere!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after testing this I'm reconsidering. It looks like IE11 on Windows 7 and Windows 8 set the DNT default to on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that to use this script would mean marking IE11 and IE10 as having DNT as "unspecified" I'm leaning toward maybe just checking navigator.doNotTrack === '1' and that's it. This would mean that no versions of IE can opt-out of tracking. It would be supported in MS Edge, however.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can keep window.doNotTrack === '1' || navigator.doNotTrack === '1') in a function and call it from everywhere!

Not very easily actually. readthedocs-analytics.js is loaded on the docs pages and should not have any outside dependencies apart from READTHEDOCS_DATA.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the PR to just check navigator.doNotTrack === '1'.

@davidfischer davidfischer mentioned this pull request May 8, 2018
6 tasks

* Users can opt-out of analytics by using the Do Not Track feature of their browser.
* Read the Docs instructs Google to anonymize IPs sent to them before they are stored.
* The cookies set by GA expire more rapidly (30 days) than the default.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"We configured the cookies set by GA to only last 30 days, instead of the default of 2 years" reads better.

@@ -110,13 +110,50 @@ However, we always give advance notice in our issue tracker
and via email about showing ads where none were shown before.


.. _do-not-track:

Do Not Track Policy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is the right part of the docs for this. It feels like it should probably be it's own thing, since it applies to RTD itself, and not just ads.

I realize there are ad-specific things, so maybe a DNT page in the docs, and then also a section here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps having it in the Privacy Policy is enough as an additional section. I guess it depends how heavily we want to promote the fact that we support it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a section in the privacy policy is good and the advertising details will link there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@davidfischer
Copy link
Contributor Author

A small update to the plan:

  • DNT will also apply to the .com site. There aren't ads there so really all this means is the logging duration must be 10 days and DNT opts-out of GA.
  • I added a setting to settings.py to enable DNT. It is off by default so that other people using our code base don't accidentally declare DNT support.

@@ -2,33 +2,40 @@
// https://docs.readthedocs.io/en/latest/advertising-details.html#analytics


// RTD Analytics Code
// Skip analytics for users with Do Not Track enabled
if (navigator.doNotTrack === '1') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the JS respect the setting as well? Probably needs to be added to the context manager.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment. Can you elaborate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't add this JS opt out unless the DO_NOT_TRACK_ENABLED setting is True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, yes we want to only respect DNT if people running RTD have the setting enabled. However, the use case we are supporting by changing that is to help people taking the readthedocs.org code base and don't want to support DNT.

To do this, we would need to pass the DNT setting through the READTHEDOCS_DATA object. We can do that, but I don't think it's worth it. Do you think it is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that makes sense for user docs. I guess I was thinking for the base.html more than the analytics.js.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the only tidbit I'd add before it's deployable -- just so that all our users dashboards don't get DNT'd automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to add this to base.html, but user dashboards are going to get DNT'd. The only thing this will change is that people who take the readthedocs.org codebase and use it in their infrastructure won't get DNT'd automatically on their installation except on docs pages where they still will.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants