
Request for Stats for the WDTK Transparency Report 2021 - 2 #922

Closed
sallytay opened this issue Nov 9, 2021 · 12 comments

@sallytay
Contributor

sallytay commented Nov 9, 2021

Is it possible to get stats from the site for the following, covering the period 1 November 2020 to 31 October 2021?

Total number of requests removed from the site, broken down by:

Visibility:
- Hidden requests
- Requester-only requests
- Backpage requests

Reason:
- Not a valid FOI request
- Personal correspondence
- Subject access request
- A vexatious request

Deadlines are:
The annual report is scheduled to go out on 16 December
Design is scheduled to be completed by 9 December
Ideally this means that copy should be ready by 2 December

Linked to:
#910

Sally

@sallytay
Contributor Author

sallytay commented Nov 9, 2021

Would it also be possible to get:

Total number of requests ‘censored’ in 2021 - maybe number of censor rules applied?

Sally

@RichardTaylor

Requests which are backpages aren't really removed from the site.

There will be reasons other than those listed above, but those are the only reasons the site records, so there should be an "other" category.
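Because backpage requests stay reachable, one way to keep them out of a "removed" total is to classify prominence values explicitly. This is a minimal plain-Ruby sketch, not Alaveteli code: the helper name is hypothetical, and it assumes "normal" and "backpage" count as publicly visible (as the censor-rule query later in this thread does) while every other value counts as removed.

```ruby
# Hypothetical helper: prominence values assumed to keep a request publicly
# visible. "backpage" requests are unlisted but not removed.
PUBLIC_PROMINENCES = %w[normal backpage].freeze

# Returns true if a request with this prominence should count as "removed"
# from public view in the report (assumption for this sketch).
def removed_from_public_view?(prominence)
  !PUBLIC_PROMINENCES.include?(prominence)
end

%w[normal backpage hidden requester_only].each do |prominence|
  puts "#{prominence}: #{removed_from_public_view?(prominence)}"
end
```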

@RichardTaylor

> Total number of requests ‘censored’ in 2021 - maybe number of censor rules applied?

Including or excluding those not visible to the public? (Sorry for adding complexity!)

@garethrees
Member

WIP on this; just recording so that I can get back to it.

```ruby
from_date = Time.zone.parse('2020-11-01').at_beginning_of_day
to_date = Time.zone.parse('2021-10-31').at_end_of_day
period = from_date..to_date

hide_events =
  InfoRequestEvent.
    where(event_type: 'hide').
    where(created_at: period)

# Total number of hide events in the period
hide_events.count
# => 824

# Events grouped by report reason
hide_events.each_with_object(Hash.new(0)) do |event, memo|
  reason = event.params[:reason]
  memo[reason] += 1
end
# => { "vexatious" => 29,
# =>   "not_foi"   => 671,
# =>   nil         => 124 }

# Events grouped by new prominence value
hide_events.each_with_object(Hash.new(0)) do |event, memo|
  prominence = event.params[:prominence]
  memo[prominence] += 1
end
# => { nil              => 700,
# =>   "hidden"         => 8,
# =>   "backpage"       => 11,
# =>   "requester_only" => 105 }
```
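The period above runs from the very start of 1 November 2020 to the very end of 31 October 2021, so activity late on the final day is included. As an alternative (not what the comment above uses), the same window can be written in plain Ruby as a half-open range that stops just before midnight on 1 November 2021. This sketch assumes UTC rather than the app's configured `Time.zone`.

```ruby
# Half-open range: includes everything from 2020-11-01 00:00 up to, but not
# including, 2021-11-01 00:00. UTC is assumed for this sketch.
period = Time.utc(2020, 11, 1)...Time.utc(2021, 11, 1)

last_moment = Time.utc(2021, 10, 31, 23, 59, 59)
next_day    = Time.utc(2021, 11, 1)

puts period.cover?(last_moment) # => true
puts period.cover?(next_day)    # => false
```

The half-open form avoids having to pick a "last instant" of the final day.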

@garethrees garethrees self-assigned this Nov 25, 2021
@garethrees
Member

I've been trying to figure out why we have hide events with no reason:

```ruby
# Events grouped by report reason
hide_events.each_with_object(Hash.new(0)) do |event, memo|
  reason = event.params[:reason]
  memo[reason] += 1
end
# => { "vexatious" => 29,
# =>   "not_foi"   => 671,
# =>   nil         => 124 }
```

It looks like this happens when we change prominence via "Edit metadata" rather than "Hide the request and notify the user".

Ordinarily editing through "Edit metadata" records an "edit" event…

https://github.com/mysociety/alaveteli/blob/0.40.0.0/app/controllers/admin_request_controller.rb#L54

…but we have a callback that checks the changed attributes recorded in the event, and if only prominence has changed to something other than "normal", we swap out the event type to "hide":
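The rule described above can be sketched in plain Ruby. This is a hypothetical illustration of the rule as stated (only `prominence` changed, and the new value isn't "normal"), not the actual callback in `info_request_event.rb`; the `changes` hash shape (attribute name mapped to `[old, new]`) is an assumption.

```ruby
# Hypothetical sketch: decide the event type recorded for an admin edit.
# `changes` maps attribute names to [old_value, new_value] pairs (assumed shape).
def event_type_for(changes)
  only_prominence = changes.keys == ['prominence']
  new_prominence  = changes.dig('prominence', 1)

  if only_prominence && new_prominence != 'normal'
    'hide'
  else
    'edit'
  end
end

puts event_type_for('prominence' => %w[normal requester_only]) # => hide
puts event_type_for('prominence' => %w[normal requester_only],
                    'title' => %w[Old New])                     # => edit
puts event_type_for('prominence' => %w[hidden normal])          # => edit
```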

https://github.com/mysociety/alaveteli/blob/0.40.0.0/app/models/info_request_event.rb#L89-L91

I suppose we'll just have to say these were hidden for "other reasons".
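For the report, the `nil` bucket can simply be relabelled. A minimal sketch using the counts from the query above; the "other" label is an assumption about how the report might present these events.

```ruby
# Hide-event counts by recorded reason, as returned by the query above.
reason_counts = { 'vexatious' => 29, 'not_foi' => 671, nil => 124 }

# Relabel the nil bucket (prominence changed via "Edit metadata") as "other";
# the label itself is an assumption for the report.
report_counts = reason_counts.transform_keys { |reason| reason || 'other' }

p report_counts
```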

Why do we only have "vexatious" and "not_foi" when we have 4 options in the "Hide the request and notify the user" actions form?

The reason maps to a state (i.e. classification) that the request gets set to after the action, so we decided that other than the vexatious reason, the others are just particular cases of "not_foi".
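The collapse from four form options to two recorded reasons can be pictured as a lookup table. The keys below are hypothetical labels based on the options listed in the original request, not the actual values used in `_hidden_user_explanation_reasons.html.erb`.

```ruby
# Hypothetical mapping from the reason chosen in the admin form to the
# reason (and request state) that ends up recorded on the hide event.
RECORDED_REASON = {
  'not_a_valid_foi_request' => 'not_foi',
  'personal_correspondence' => 'not_foi',
  'subject_access_request'  => 'not_foi',
  'vexatious_request'       => 'vexatious'
}.freeze

RECORDED_REASON.each do |form_reason, recorded|
  puts "#{form_reason} -> #{recorded}"
end
```

Because three of the four options collapse into "not_foi", the recorded counts can't be split back into the four categories.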

https://github.com/mysociety/alaveteli/blob/0.40.0.0/app/views/admin_request/_hidden_user_explanation_reasons.html.erb

@garethrees
Member

Note that the above query may count multiple hide events against a single request, so it's not quite true to say "we hid 824 requests" – looks like it's 822 requests, with one or more requests being hidden multiple times for whatever reason.
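The difference between counting events and counting requests can be shown in plain Ruby. This sketch uses made-up sample data (the request IDs are hypothetical); `tally` requires Ruby 2.7 or later.

```ruby
# Hypothetical sample: request IDs of hide events (request 102 hidden twice).
hide_event_request_ids = [101, 102, 103, 102]

event_count   = hide_event_request_ids.size      # counts events
request_count = hide_event_request_ids.uniq.size # counts distinct requests

# Requests that were hidden more than once, with how many times.
repeats = hide_event_request_ids.tally.select { |_id, n| n > 1 }

puts event_count   # => 4
puts request_count # => 3
p repeats
```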

```ruby
hide_events.count
# => 824

hide_events.pluck(:info_request_id).uniq.count
# => 822
```

@garethrees
Member

So let's look at why we have lots of "hide" events with no prominence:

```ruby
prominence_changes = hide_events.each_with_object(Hash.new(0)) do |event, memo|
  prominence = event.params[:prominence]
  memo[prominence] += 1
end
# => { nil              => 700,
# =>   "hidden"         => 8,
# =>   "backpage"       => 11,
# =>   "requester_only" => 105 }
```

These events look like they were created through the "Hide the request and notify the user" action.

That action logs a "hide" event but doesn't record the change in prominence:

https://github.com/mysociety/alaveteli/blob/0.40.0.0/app/controllers/admin_request_controller.rb#L164-L171

That action does, however, set the request's prominence to "requester_only", so we can add the nil count to the "requester_only" count.
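Folding the `nil` bucket into "requester_only" is a small hash operation. A sketch using the counts from the query above; it assumes, as the comment does, that every `nil` event came from the "Hide the request and notify the user" action.

```ruby
# Hide-event counts by new prominence, as returned by the query above.
prominence_changes = {
  nil              => 700,
  'hidden'         => 8,
  'backpage'       => 11,
  'requester_only' => 105
}

# Assumption: a nil prominence means the request was set to "requester_only".
adjusted = prominence_changes.each_with_object(Hash.new(0)) do |(prominence, count), memo|
  memo[prominence || 'requester_only'] += count
end

puts adjusted['requester_only'] # => 805
puts adjusted.values.sum        # => 824
```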

@garethrees
Member

Finally, censor rules:

```ruby
from_date = Time.zone.parse('2020-11-01').at_beginning_of_day
to_date = Time.zone.parse('2021-10-31').at_end_of_day
period = from_date..to_date

censor_rules = CensorRule.where(created_at: period)

# Total number created within the period
censor_rules.count
# => 881

# Number of individual requests affected by censor rules created within
# the period
censor_rules.pluck(:info_request_id).uniq.compact.size
# => 196

# Number of individual users affected by censor rules created within
# the period
censor_rules.pluck(:user_id).uniq.compact.size
# => 122

# Number of individual public bodies affected by censor rules created
# within the period
censor_rules.pluck(:public_body_id).uniq.compact.size
# => 1

# ---

censor_rules_applied_to_visible_requests =
  censor_rules.
    joins(:info_request).
    references(:info_request).
    where(info_requests: { prominence: %w(normal backpage) })

# Total number created within the period that are linked to visible
# requests
censor_rules_applied_to_visible_requests.count
# => 595

# Number of visible individual requests affected by censor rules
# created within the period
censor_rules_applied_to_visible_requests.
  pluck(:info_request_id).uniq.compact.size
# => 188
```
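The censor-rule counts drop `nil` with `.compact` before counting. Reading the query, this is presumably because a rule need not be attached to every kind of record, leaving some ID columns empty; that reading is an assumption. The sketch below shows the effect on plain Ruby objects standing in for hypothetical rows.

```ruby
# Hypothetical censor-rule rows: in this sketch each row has at most one
# owner ID set and the others are nil (assumed, not taken from the schema).
CensorRuleRow = Struct.new(:info_request_id, :user_id, :public_body_id)

rows = [
  CensorRuleRow.new(1, nil, nil),
  CensorRuleRow.new(1, nil, nil),
  CensorRuleRow.new(2, nil, nil),
  CensorRuleRow.new(nil, 7, nil)
]

# Plain-Ruby equivalent of pluck(:column).uniq.compact.size.
affected_requests = rows.map(&:info_request_id).uniq.compact.size
affected_users    = rows.map(&:user_id).uniq.compact.size

puts affected_requests # => 2
puts affected_users    # => 1
```

Without `compact`, the `nil` would be counted as one extra request and one extra user.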

@garethrees
Member

This wasn't requested, but while I was thinking about it, here's a little info on hiding individual messages:

```ruby
from_date = Time.zone.parse('2020-11-01').at_beginning_of_day
to_date = Time.zone.parse('2021-10-31').at_end_of_day
period = from_date..to_date

# ---

edit_outgoing_events =
  InfoRequestEvent.
    where(event_type: 'edit_outgoing').
    where(created_at: period)

hide_outgoing_events =
  edit_outgoing_events.select { |e| e.params[:prominence] != 'normal' }

# Number of times we've hidden outgoing messages within the period
hide_outgoing_events.each_with_object(Hash.new(0)) do |event, memo|
  prominence = event.params[:prominence]
  memo[prominence] += 1
end
# => { "hidden"         => 10,
# =>   "requester_only" => 35 }

# ---

destroy_outgoing_events =
  InfoRequestEvent.
    where(event_type: 'destroy_outgoing').
    where(created_at: period)

# Number of times we've destroyed outgoing messages within the period
destroy_outgoing_events.count
# => 0

# ---

edit_incoming_events =
  InfoRequestEvent.
    where(event_type: 'edit_incoming').
    where(created_at: period)

hide_incoming_events =
  edit_incoming_events.select { |e| e.params[:prominence] != 'normal' }

# Number of times we've hidden incoming messages within the period
hide_incoming_events.each_with_object(Hash.new(0)) do |event, memo|
  prominence = event.params[:prominence]
  memo[prominence] += 1
end
# => { "hidden"         => 43,
# =>   "requester_only" => 90 }

# ---

destroy_incoming_events =
  InfoRequestEvent.
    where(event_type: 'destroy_incoming').
    where(created_at: period)

# Number of times we've destroyed incoming messages within the period
destroy_incoming_events.count
# => 105
```
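The `each_with_object(Hash.new(0))` grouping used throughout this thread gives the same result as `Enumerable#tally`, which is available from Ruby 2.7 onwards (whether the site's Ruby version supports it isn't stated here). A plain-Ruby sketch on hypothetical prominence values, filtering out "normal" as the message queries do:

```ruby
# Hypothetical prominence values taken from a set of edit events.
prominences = %w[normal hidden requester_only requester_only normal hidden requester_only]

# Grouping pattern used in the queries above.
manual = prominences.
  reject { |prominence| prominence == 'normal' }.
  each_with_object(Hash.new(0)) { |prominence, memo| memo[prominence] += 1 }

# Same counts with Enumerable#tally (Ruby 2.7+).
tallied = prominences.reject { |prominence| prominence == 'normal' }.tally

puts manual == tallied # => true
```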

@garethrees garethrees assigned sallytay and unassigned garethrees Nov 26, 2021
@sallytay
Contributor Author

Thank you for all the data; I've now added it to the draft report.

Apologies, I seem to have forgotten to request one data set: the total number of requests made.

Sally

@garethrees
Member

> Apologies I seem to have forgotten to request one data set, which was total number of requests made.

No problem!

```ruby
from_date = Time.zone.parse('2020-11-01').at_beginning_of_day
to_date = Time.zone.parse('2021-10-31').at_end_of_day
period = from_date..to_date

# Total number of requests created in the period
InfoRequest.where(created_at: period).count
# => 100092

# Total number of outgoing messages created in the period
OutgoingMessage.where(created_at: period).count
# => 141131

# Total number of incoming messages created in the period
IncomingMessage.where(created_at: period).count
# => 237154
```

@sallytay
Contributor Author

The published transparency report can be found at https://www.mysociety.org/2021/12/16/whatdotheyknow-transparency-report/
