Request for Stats for the WDTK Transparency Report 2021 - 3 #923

Closed
sallytay opened this issue Nov 9, 2021 · 9 comments
Assignees
Labels
administrative-task non-developer Tasks suitable for non-developers

Comments

@sallytay
Contributor

sallytay commented Nov 9, 2021

Is it possible to get stats from the site for the items below, for the period 1 November 2020 - 31 October 2021?

GDPR concerns and High Risk Concerns emailed to the User Support Inbox (data from inbox; @sallytay will source this):
Data Breach
Data Breach - internal
Right of Access
Right to Erasure
Right to Rectification
Defamation
Harassment

Plus: number of GDPR personal information take-down requests in the year / average per day, from the GDPR spreadsheet. @mdeuk, would you be able to advise the best way to get this data from the spreadsheet?

Deadlines are:
The annual report is scheduled to go out on 16 December
Design is scheduled to be completed by 9 December
Ideally this means that copy should be ready by 2 December

Linked to:
#910

Sally

@sallytay sallytay self-assigned this Nov 9, 2021
@mdeuk
Collaborator

mdeuk commented Nov 9, 2021

Is it possible to get stats from the site for the items below, for the period 1 November 2020 - 31 October 2021?

GDPR concerns and High Risk Concerns emailed to the User Support Inbox (data from inbox; @sallytay will source this):
Data Breach
Data Breach - internal
Right of Access
Right to Erasure
Right to Rectification
Defamation
Harassment

Plus: number of GDPR personal information take-down requests in the year / average per day, from the GDPR spreadsheet. @mdeuk, would you be able to advise the best way to get this data from the spreadsheet?

There is certainly a correlation between the number of emails and cases recorded in the tracker; but I would add some caution as there are quite often multiple threads surrounding a similar case.

We track the following types of cases:

  1. Access (Art 15)
  2. Rectification (Art 16)
  3. Erasure (Art 17)
  4. Restrict Processing (Art 18)
  5. Portability (Art 20)
  6. Object (Art 21)
  7. Pending (PE)
  8. Data Breach - reporting (BR)
  9. Data Breach - WDTK (BI)

We can clearly exclude "Pending (PE)", as this is only used for cases where the category was never specified (and there are no relevant cases within the period). However, I assume you would want the number of cases for each category in points 2 - 6 (i.e. UK GDPR articles 16 - 18 and 20 - 21); would that be right?

Two things to note: a single case could encompass a number of WDTK requests, as the cases we see are typically specific to a user rather than to a particular request. Additionally, the level of detail in the tracking isn't always robust enough to allow us to extract how the case was closed, and precisely when. There is some work to be done surrounding this.

Query:

Are you also looking for the total number of Right of Access requests (RoAR / SAR)?
We record the total case numbers for these cases, and additionally, we do keep records of requests that we have refused outright for whatever reason. Typically, this would be a case where we have required a user to provide identification, or in a rare scenario, where their request may have been deemed to be vexatious.

Query 2:

Additionally, are you looking for the number of data breach cases?

To explain the terminology - we use the "reporting (BR)" category for cases where we are reporting a third party breach, e.g. one made by a public body. This can involve reporting to the public body, and also to the Information Commissioner's Office, based on circumstances.

We use the "WDTK (BI)" category when recording a breach made by a WhatDoTheyKnow administrator. There have been <5 in the reporting period, with no special category data present (minor administrative issues / near-misses). We keep records of any such instance in line with our obligations under UK GDPR.

@sallytay
Contributor Author

Data from the inbox has been collected and added to the draft report

Still to do: GDPR spreadsheet

@sallytay
Contributor Author

I've tried to extract the data from the spreadsheet for the following columns, for dates between 1 November 2020 and 31 October 2021:

Right (all types)
Decision (all types)
Erased? (all types)

I've made a copy of the master sheet so as not to disrupt the main spreadsheet:
https://docs.google.com/spreadsheets/d/1phit3pF1WItAUm7_c78cbwDSPZlV6LXtDSzaZPuTO44/edit#gid=0

I've done this by filtering the main request log by the date range and then using a pivot table to get the results. However, these don't seem to be reliable. If anyone has any advice on what I might be doing wrong, that would be appreciated.

@mdeuk mdeuk self-assigned this Nov 26, 2021
@mdeuk
Collaborator

mdeuk commented Nov 26, 2021

Right (all types)
Decision (all types)
Erased? (all types)

@sallytay, would the data requested roughly take the format of:

  • Date logged (so we can calculate daily stats)
  • Which GDPR right (or rights) were invoked
  • What the decision / outcome was

Do you need any additional headings / data, or is this high level data enough?

Notes:

We don't, at present, record closure dates, or dates of when the next action is due. I can, however, work out the closure dates manually.

Reason: historically, we've never had a specific need to. Recording next-action dates is a planned improvement, and I can't see why we couldn't extend this to include logging when a case is closed.

Interpreting the decision isn't always possible based on the options chosen. I can make an attempt to manually interpret them.

Reason: there is quite a bit of latitude for the person closing a case to not record certain elements - this is a known issue so to speak, and certainly something we will need to look at improving, as part of a review of our IRM records management. mysociety/whatdotheyknow-private#239, mysociety/whatdotheyknow-private#238

Drop me a line if you identify anything else and I'll see what I can do. 😃

@sallytay
Contributor Author

@mdeuk

I was hoping that, from the data on the spreadsheet, we could get a view of:

How many cases were recorded on the spreadsheet between 1 November 2020 and 31 October 2021
Of these, how many cases were for each GDPR right
For each GDPR right, how many of each decision type

So for example it might be that there were:

An overall total of 100 cases were recorded on the spreadsheet; of these, 50 were GDPR Right to Erasure requests, and of those we complied with 10, complied in part with 10, refused 10, etc.

For the purposes of this I don't need anything more detailed than this. I can work out averages from the top level figures.
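The breakdown described above (total cases in the period, cases per right, and decisions per right) can be sketched without a pivot table. This is a minimal illustration, not the method used for the report: the sample rows are made up, the "Date logged" column name and the ISO date format are assumptions, and splitting multiple rights on ";" is a guess at how a multi-right cell might be recorded.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical rows standing in for the case log; values are invented.
ROWS = [
    {"Date logged": "2020-10-30", "Right": "Erasure", "Decision": "Complied"},
    {"Date logged": "2020-11-01", "Right": "Erasure", "Decision": "Complied"},
    {"Date logged": "2021-03-14", "Right": "Erasure; Rectification", "Decision": "Refused"},
    {"Date logged": "2021-10-31", "Right": "Access", "Decision": ""},
    {"Date logged": "2021-11-02", "Right": "Access", "Decision": "Complied"},
]

START, END = date(2020, 11, 1), date(2021, 10, 31)  # reporting period, inclusive


def summarise(rows, start=START, end=END):
    """Count cases in the period, per right, and per (right, decision)."""
    total = 0
    by_right = Counter()
    by_right_and_decision = Counter()
    for row in rows:
        # Parse to a real date so the comparison is not done on text.
        logged = datetime.strptime(row["Date logged"], "%Y-%m-%d").date()
        if not (start <= logged <= end):
            continue
        total += 1
        decision = row["Decision"].strip() or "Not recorded"
        # A case invoking several rights is counted once under each right.
        for right in row["Right"].split(";"):
            right = right.strip()
            if right:
                by_right[right] += 1
                by_right_and_decision[(right, decision)] += 1
    return total, by_right, by_right_and_decision


total, by_right, by_right_and_decision = summarise(ROWS)
print("Cases in period:", total)
for (right, decision), count in sorted(by_right_and_decision.items()):
    print(right, "/", decision, ":", count)
```

Because a case that invokes several rights is counted under each of them, the per-right figures can add up to more than the case total. That, along with dates stored as text and blank decision cells, is one possible reason a pivot table might look unreliable.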

However, I understand that, because of nuances in how people fill out the spreadsheet, the data may not be accurate enough to use in the Transparency Report this time; this is just our initial attempt. Reporting also wasn't really the purpose of the spreadsheet when it was created, so it may be unfair to expect to source data from it at the moment. As you say, this is something to look at as part of the suggested improvements.

With the deadline fast approaching, and as this is a trial run anyway, I was just after data that was easy to retrieve from the spreadsheet with a simple search and filter, but I've not managed to do this accurately. If this isn't possible, perhaps this is something to leave for this year and work on for next year. We do have data from other sources that is useful.

@mdeuk
Collaborator

mdeuk commented Nov 26, 2021

With the deadline fast approaching, and as this is a trial run anyway, I was just after data that was easy to retrieve from the spreadsheet with a simple search and filter, but I've not managed to do this accurately. If this isn't possible, perhaps this is something to leave for this year and work on for next year. We do have data from other sources that is useful.

I shall have a look over the weekend and let you know if I can wrangle the dataset together to produce what you need 😀

@mdeuk
Collaborator

mdeuk commented Dec 1, 2021

@sallytay based on some Data Studio reporting, we had 415 cases in the period - I've sent some high level data to volunteers@, but I would note that it doesn't include closure reasons, as the data wasn't accurate enough.

Once we've standardised how cases are closed, this should improve - onwards and upwards!

Hopefully, however, it does provide something of use.
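For the "average per day" figure asked for at the top of this issue, the arithmetic on the 415 cases can be sketched as follows. It assumes the reporting period is counted inclusively at both ends (1 November 2020 to 31 October 2021); the variable names are illustrative only.

```python
from datetime import date

CASES_IN_PERIOD = 415  # figure from the Data Studio reporting mentioned above

first_day = date(2020, 11, 1)
last_day = date(2021, 10, 31)

# +1 because both the first and the last day are part of the period.
days_in_period = (last_day - first_day).days + 1
average_per_day = CASES_IN_PERIOD / days_in_period

print(days_in_period)              # 365
print(round(average_per_day, 2))   # 1.14
```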

@sallytay
Contributor Author

sallytay commented Dec 1, 2021

Data has been retrieved from the inbox and added to the draft report, using label and date filters.

Filter used: label:XXX after:2020/11/01 before:2021/10/31

Total threads were counted using the date filter alone. Full workings are here for reference: https://docs.google.com/spreadsheets/d/1RhUxgXz-ue6VdJFF5L9x-XvGnX0QR5OeMUybv9IYWFI/edit#gid=0
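The filter above can be generated per label so that every category uses identical date bounds. The sketch below is an illustration only: `build_filter` and its `before_is_exclusive` flag are hypothetical names, and `XXX` is kept as the placeholder from the comment. It makes no claim about how the mail client treats the `before:` bound. If that bound turns out to exclude the named day, threads from 31 October would be missed, and the flag moves the end date forward by one day to compensate.

```python
from datetime import date, timedelta


def build_filter(label: str, first_day: date, last_day: date,
                 before_is_exclusive: bool = False) -> str:
    """Build a search string in the same shape as the filter quoted above."""
    # Assumption: if `before:` excludes the named day, search up to the day after.
    end = last_day + timedelta(days=1) if before_is_exclusive else last_day
    return f"label:{label} after:{first_day:%Y/%m/%d} before:{end:%Y/%m/%d}"


period = (date(2020, 11, 1), date(2021, 10, 31))
print(build_filter("XXX", *period))
print(build_filter("XXX", *period, before_is_exclusive=True))
```

Checking a known message dated on the first and last day of the period against each variant is a quick way to confirm which bounds the inbox search actually applies.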

@sallytay
Contributor Author

The published Transparency Report can be found at https://www.mysociety.org/2021/12/16/whatdotheyknow-transparency-report/
