GDPR Meta Issue #3954

Closed
6 tasks done
davidfischer opened this issue Apr 16, 2018 · 14 comments

@davidfischer
Contributor

davidfischer commented Apr 16, 2018

The GDPR comes into effect on May 25, 2018 and Read the Docs is going to use this to get our house in order. Read the Docs currently does not plan to do anything different for EU citizens than for anybody else. We want to respect user privacy as much as possible and so we're going to apply the stricter protections mandated by the GDPR to everybody.

It is unclear precisely what it means to be in compliance with the GDPR (if you are a lawyer with expertise in this subject, let us know!) but we are not going to use that as an excuse to throw up our hands and do nothing. Some of its provisions are clear enough.

The goal of this issue is to frame the discussion around the GDPR and how it applies to Read the Docs and to communicate what we are doing around data protections and privacy. This issue will be edited as more things are identified.

PERSONAL DATA

While Read the Docs tries not to collect very much personal information on users, we do collect some. Specifically, we collect at least:

  • Names and emails when somebody creates an account
  • Logged-in users can tie their account to 3rd party code hosting services like GitHub and Bitbucket
  • Names and emails exist in code repositories we have synced in order to build the docs. These code repositories are public. Does that affect things?
  • IPs in our web server log files
  • IPs are collected when users click on ads to combat ad fraud

We do not collect any data that is "considered sensitive" under the GDPR.

VARIOUS TASKS

These are things that I'm committing to by the May 25 deadline. This is a living list and should link to other issues where possible.

  • Get a privacy policy in place (No privacy policy #2602). This means mentioning all personal data we collect, when they are collected, the reasons for collection, and how long they are stored.
  • Ensure all user data is deleted when a user deletes their account
  • Limit timeframe of personally identifiable data in web server logs (see GDPR Meta Issue #3954 (comment))
  • Remove/anonymize/pseudonymize IPs collected for advertising (see GDPR Meta Issue #3954 (comment))
  • Enumerate our list of partners with whom we share any data, verify we are sharing only as much as necessary, and verify their compliance
  • Update internal data policies

QUESTIONS

  • Do we have the appropriate level of consent for the data we collect?
  • What do we need to do to make sure we aren't inadvertently collecting data on minors?
  • Is our cookie policy defensible (which cookies for which reasons and for how long)? Do we need an explicit "cookie agreement" (this obviously depends on the first question here)?
  • Do we need a "Data Protection Officer" or is just having an open line of communication through this public issue tracker sufficient? We have a team for privacy issues available at [email protected]
  • Can users edit and see all their personal data? Do we need a way to extract it? For example, if it is just a name and email, we probably don't need to do anything additional here. Users can control their data in their dashboard. We collect so little data that extraction means copy/pasting their name and email.

LINKS

EDIT HISTORY

  • 2018-05-22: Added Moz's blog on GDPR and online marketing
  • 2018-05-18: handle advertising
  • 2018-05-18: answer a few questions
  • 2018-05-02: note about not collecting any sensitive data
  • 2018-04-26: changes around plan for web server logging
  • 2018-04-18: added EFF DNT guide
  • 2018-04-16: fix typos
@davidfischer
Contributor Author

Our current plan with respect to logs is to:

  • Retain only 10 days of logs which will be encrypted on rotation (days 2-10 will be encrypted). This will apply to .org as well as documentation sites. IPs and user agents will be present in the logs. 10 days is the limit of EFF's DNT policy which we are working toward from a compliance perspective.
  • Have a separate log for POST/PUT/DELETE/PATCH requests for .org only which will not have personally identifiable data in it (no IPs) but is retained for 90 days.
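As a minimal sketch of the two rules above (not the actual Read the Docs log tooling), the retention check and the IP-free write log could look roughly like this. The function names and the sample log line are hypothetical, and only IPv4 addresses are scrubbed to keep the example short:

```python
import re
from datetime import date, timedelta

RETENTION_DAYS = 10  # retention window from the plan above

# Matches dotted-quad IPv4 addresses. IPv6 is left out of this sketch.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def is_expired(log_date, today, retention_days=RETENTION_DAYS):
    """Return True if a log file dated log_date falls outside the window.

    Today's log has age 0, so a 10-day window keeps ages 0 through 9.
    """
    return (today - log_date) >= timedelta(days=retention_days)


def scrub_ips(line):
    """Replace any IPv4 address in a log line with a placeholder."""
    return IPV4_RE.sub("-", line)
```

A rotation job would delete every file for which `is_expired` returns True, and the longer-lived POST/PUT/DELETE/PATCH log would pass each line through something like `scrub_ips` before writing it.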

@davidfischer
Contributor Author

As of today, we are only keeping 10 days of logs.

@davidfischer
Contributor Author

Our privacy policy PR is here: #3978

@davidfischer
Contributor Author

Here's the code that governs when a user deletes their account: https://github.com/rtfd/readthedocs.org/blob/dc96c6d/readthedocs/profiles/views.py#L197-L213

It looks like this does in fact delete the user model (where the name and email are) and the deletion cascades to their social account connections (GitHub/Bitbucket) as well as their user profile (which doesn't have anything personal).

This does not immediately delete documentation build artifacts or version control checkouts. These are public code repositories so they are probably not very sensitive but ideally they eventually get deleted.
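To illustrate the cascade described above, here is a small standalone sketch using SQLite rather than the Django models in the linked view. The table and column names are made up for the example; the point is only that deleting the user row takes the dependent rows with it:

```python
import sqlite3


def make_db():
    """Create an in-memory database with a user and two dependent tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE social_accounts (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
            provider TEXT
        );
        CREATE TABLE profiles (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE
        );
        """
    )
    return conn


def delete_account(conn, user_id):
    """Delete the user row; dependent rows are removed by the cascade."""
    with conn:
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))


def row_counts(conn):
    """Return the number of rows left in each table."""
    return {
        "users": conn.execute("SELECT COUNT(*) FROM users").fetchone()[0],
        "social_accounts": conn.execute(
            "SELECT COUNT(*) FROM social_accounts"
        ).fetchone()[0],
        "profiles": conn.execute("SELECT COUNT(*) FROM profiles").fetchone()[0],
    }
```

Anything stored outside the database, such as build artifacts or repository checkouts on disk, is untouched by a cascade like this and needs its own cleanup step.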

@davidfischer
Contributor Author

In our WIP privacy policy (#3978), I have detailed all the 3rd parties with whom we share data and what is shared.

@davidfischer
Contributor Author

Currently Read the Docs can set the following 1st party cookies with the following durations:

  • CSRF cookie for all users (1 year)
  • Login cookie for logged in users only (2 weeks)
  • GA cookies for all users (up to 2 years)
  • Stripe cookies when visiting donation/subscription pages only (1 year)

The only 3rd party cookie I could find was a session cookie set by New Relic:

  • New Relic session cookie on all pages (session - deleted on browser close)

The CSRF and login cookies are definitely exempt from requiring a cookie agreement based on information here. Arguably the CSRF cookie should have a shorter timeframe but that's a separate issue.

Because the New Relic cookie is a session cookie, it may be exempt.
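The difference between the persistent cookies and the session cookie comes down to whether a lifetime is set. A rough sketch using Python's standard `http.cookies` module follows; the cookie names and values are placeholders, not necessarily what Read the Docs sends:

```python
from http.cookies import SimpleCookie

ONE_YEAR = 365 * 24 * 60 * 60   # CSRF cookie duration listed above
TWO_WEEKS = 14 * 24 * 60 * 60   # login cookie duration listed above


def build_cookie(name, value, max_age=None):
    """Return the value of a Set-Cookie header for the given cookie.

    With max_age=None no lifetime attribute is emitted, which makes it a
    session cookie that the browser discards when it closes.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    if max_age is not None:
        cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()


# Placeholder names and values for illustration only.
csrf_header = build_cookie("csrftoken", "abc123", max_age=ONE_YEAR)
login_header = build_cookie("sessionid", "def456", max_age=TWO_WEEKS)
session_header = build_cookie("monitoring", "ghi789")
```

Shortening the CSRF cookie's lifetime, as suggested above, would then be a matter of passing a smaller `max_age`.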

@csadorf

csadorf commented May 8, 2018

Do we need to obtain consent from users for each individual project if the project uses a Google Analytics Tracking ID?

@davidfischer
Contributor Author

@csadorf Docs authors will not need to do anything. I'm aiming to avoid a specific cookie/consent notice on docs sites and that will probably involve some changes. Any necessary changes though will be made in the Read the Docs codebase and rolled out to all docs sites automatically.

From a cookie standpoint, GA sets 1st party cookies which are not compliant currently but with some changes they may be. By default, the longest cookie lasts 2 years which is definitely unacceptable without consent. However, all of this is configurable and I should have this dialed in within the next couple of weeks. GA can be run with a session cookie or even with no cookies whatsoever. In the cookieless case, you'll lose things like the ability to differentiate new vs. returning visitors but all the rest of the data is there.

From a sharing personally identifiable data standpoint, I don't believe anything is required since Read the Docs is already instructing GA to anonymize IPs (the only personally identifiable data under GDPR shared). I do think we can do better, but from a legal standpoint, I don't believe anything is required.
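For readers unfamiliar with IP anonymization, the idea is to zero out the host part of the address before it is stored. Google documents its own feature as zeroing the last octet of an IPv4 address and the last 80 bits of an IPv6 address; the sketch below mimics that truncation with Python's `ipaddress` module and is not Google's or Read the Docs' code:

```python
import ipaddress


def anonymize_ip(raw):
    """Zero the low-order bits of an IP address.

    IPv4 keeps the first 24 bits (last octet zeroed); IPv6 keeps the
    first 48 bits (last 80 bits zeroed).
    """
    ip = ipaddress.ip_address(raw)
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)
```

For example, `anonymize_ip("203.0.113.77")` returns `"203.0.113.0"`.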

Ideally, I think the solution is to proxy GA requests on Read the Docs' servers before sending to GA to anonymize data and generate a non-personal client ID in order to differentiate new vs. returning users. I think this solves the problem of sharing personal data and of the privacy complaints of visiting a docs site resulting in a request to google-analytics.com. It will result in tens of millions of extra requests though so it needs to be worked out.

We are also in the process of making people who have Do Not Track enabled not load GA whatsoever (#4046) so that might affect things as well.
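A server-side version of the Do Not Track check could be as small as the following. This is only a sketch of the idea; the function name is hypothetical and the change in #4046 may well be implemented differently (for example in client-side JavaScript):

```python
def dnt_enabled(headers):
    """Return True if the request headers carry `DNT: 1`.

    Header names are case-insensitive, so they are lowercased first.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


def should_load_analytics(headers):
    """Only load analytics when the visitor has not enabled Do Not Track."""
    return not dnt_enabled(headers)
```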

@csadorf

csadorf commented May 8, 2018

@davidfischer Thank you very much for clarifying.

@davidfischer
Contributor Author

With respect to a data protection officer, we have created a small team internally to handle privacy related things. The email is [email protected].

@davidfischer
Contributor Author

With respect to advertising, when somebody clicks an ad, we store some data to prevent fraud, handle billing, and to report aggregated statistics to advertisers (more on that below). We store a user agent, an anonymized version of a user's IP address, and a client ID which will change periodically per user but will be unique for a limited period of time. We believe this is in line with the GDPR and it is acceptable from a Do Not Track perspective.
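The comment above does not say how the periodically changing client ID is generated. One possible approach, shown purely as an assumption, is to hash the anonymized IP and user agent together with the current date under a secret key, so the same visitor gets a stable ID within a day and an unrelated one the next day:

```python
import hashlib
import hmac
from datetime import date


def rotating_client_id(secret, anonymized_ip, user_agent, day):
    """Derive an ID that is stable for one day and changes the next.

    `secret` is a server-side key (bytes). Daily rotation is an assumption
    made for this sketch, not something stated in the comment above.
    """
    message = f"{day.isoformat()}|{anonymized_ip}|{user_agent}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:16]
```

Because the date is part of the hashed message, IDs from different days cannot be linked to each other without the secret and the original inputs.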

We do not share personally identifiable information with advertisers such as users' IP addresses or possibly identifying info like user agents. We do not share even the anonymized IP address. We may share aggregated data (e.g. a pie chart of countries where users clicked on the ad, % mobile vs. desktop, etc.).
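As an illustration of what aggregated data means here, the sketch below reduces individual click records to per-country percentages, which is the kind of summary a pie chart would be built from. The record layout (a dict with a `country` key) is hypothetical:

```python
from collections import Counter


def country_shares(clicks):
    """Return the percentage of clicks per country, rounded to one decimal.

    Only the aggregate leaves this function; individual click records,
    IPs, and user agents are never included in the result.
    """
    counts = Counter(click["country"] for click in clicks)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {country: round(100 * n / total, 1) for country, n in counts.items()}
```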

@davidfischer
Contributor Author

The privacy policy went live today: https://docs.readthedocs.io/en/latest/privacy-policy.html

@davidfischer
Contributor Author

Moz had a pretty good blog post yesterday regarding the GDPR and online marketing. Here's a brief summary:

  • They believe Google Analytics is good to go as long as IP anonymization is on (RTD has it on)
  • Email newsletters must be opt-in. No list buying/sharing. (our newsletter is double opt-in)
  • The privacy policy must be in plain language (ours is pretty plain)
  • No vague cookie statements like "We use cookies to give you a better experience and by using this site".

@davidfischer
Contributor Author

We published our blog post on the GDPR and merged the somewhat related Do Not Track PR. As a result, I think we can close this.

If issues related to compliance arise, we are committed to addressing them as separate action items.
