Request for Stats for the WDTK Transparency Report 2021 - 5 #925

Closed
sallytay opened this issue Nov 9, 2021 · 8 comments

@sallytay

sallytay commented Nov 9, 2021

Is it possible to get the stats below from the site, for the period 1 November 2020 - 31 October 2021?

User Information
Total number of WDTK users
Number of new users
Number of banned users
Number of user accounts anonymised at user request in 2021

Deadlines are:
The annual report is scheduled to go out on 16 December
Design is scheduled to be completed by 9 December
Ideally this means that copy should be ready by 2 December

Linked to:
#910

Sally

@RichardTaylor

There's a user count at

https://www.whatdotheyknow.com/admin/stats

@sallytay

sallytay commented Nov 9, 2021

In addition to the user information above, would it also be possible to get the data below?

Total number of users banned for site misuse (not including for spamming)

@sallytay

I've added the total number of users, using the link suggested: https://www.whatdotheyknow.com/admin/stats

However, that figure is 'real time', whereas the rest of the data covers 1 November 2020 to 31 October 2021. Should we use the figure as at 31 October instead, if that's possible?

@RichardTaylor

The data should ideally be from the correct point in time of course.

I suspect that whether or not spam accounts are included in the user count would introduce a more significant error than a couple of weeks' delay in the count.

@garethrees

garethrees commented Nov 25, 2021

from_date = Time.zone.parse('2020-11-01').at_beginning_of_day
to_date = Time.zone.parse('2021-10-31').at_end_of_day
period = from_date..to_date

users_created_before_cutoff = User.where('created_at <= ?', to_date)
users_created_in_period = User.where(created_at: period)
users_updated_in_period = User.where(updated_at: period)


# Total number of users created before the end date
users_created_before_cutoff.count
# => 241329

# Total number of users created before the end date who've confirmed their email
users_created_before_cutoff.where(email_confirmed: true).count
# => 222694

# Total number of users created before the end date who are still active
# i.e. confirmed their email, have not been banned, and have not closed
# their account.
users_created_before_cutoff.active.count
# => 212553

# ---

# Total number of users created within the given period.
users_created_in_period.count
# => 26405

# Number of users created within the given period who are still active
# i.e. confirmed their email, have not been banned, and have not closed
# their account.
users_created_in_period.active.count
# => 22847

# Number of users created within the given period who have
# subsequently confirmed their email address and are marked
# as banned for spamming.
#
# Note that these are *identified* spammers; there are likely to be
# significantly more in reality.
users_created_in_period.where(ban_text: 'Banned for spamming').count
# => 3392

# Number of users created within the given period who have been banned
# for some other reason than obvious spam. 
users_created_in_period.
  where.not(ban_text: '').
  where.not(ban_text: 'Banned for spamming').
  count
# => 126

# Number of users created within the given period who have been
# anonymised
users_created_in_period.where(name: '[Name Removed]').count
# => 43

# ---

# Number of users where the last update was in the period who are marked
# as banned for spamming.
#
# Note that these are *identified* spammers; there are likely to be
# significantly more in reality.
users_updated_in_period.where(ban_text: 'Banned for spamming').count
# => 3936 (EDIT: initially recorded here as 3392, but I must have mistakenly copied the stat from the users_created_in_period version)

# Number of users where the last update was in the period who have been
# banned for some other reason than obvious spam. 
users_updated_in_period.
  where.not(ban_text: '').
  where.not(ban_text: 'Banned for spamming').
  count
# => 166

# Number of users where the last update was in the period who have been
# anonymised
users_updated_in_period.where(name: '[Name Removed]').count
# => 127

@sallytay

I've now added these figures to the draft report.

@sallytay

sallytay commented Dec 2, 2021

Suggestion from @mdeuk, arising from the report, for collecting this data going forward:

Could we perhaps collect some metadata within Alaveteli when creating a ban, similar to how we set a prominence reason on a request (a dropdown of pre-defined options, then a freeform text box)?

This might allow us to automate production of this statistic with a degree of certainty.
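
As a rough illustration of that idea, here is a minimal plain-Ruby sketch of a ban recorded with a category from a fixed list plus an optional free-text note, and a per-category count for a reporting period. The names `Ban`, `BAN_REASONS` and `ban_counts`, and the category list itself, are hypothetical; none of this is existing Alaveteli code, and a real implementation would presumably live in the database alongside the user record.

```ruby
require 'date'

# Hypothetical pre-defined ban categories (an assumption for illustration).
BAN_REASONS = %i[spam site_misuse other].freeze

# Hypothetical ban record: a category from the fixed list, an optional
# free-text note, and the date the ban was applied.
Ban = Struct.new(:reason, :note, :banned_on) do
  def initialize(reason, note, banned_on)
    unless BAN_REASONS.include?(reason)
      raise ArgumentError, "unknown ban reason: #{reason.inspect}"
    end
    super
  end
end

# Count bans per category, considering only bans applied within the period.
# Every category appears in the result, even when its count is zero.
def ban_counts(bans, period)
  counts = BAN_REASONS.to_h { |reason| [reason, 0] }
  bans.each do |ban|
    counts[ban.reason] += 1 if period.cover?(ban.banned_on)
  end
  counts
end

# Example usage with made-up data.
period = Date.new(2020, 11, 1)..Date.new(2021, 10, 31)
bans = [
  Ban.new(:spam, nil, Date.new(2021, 3, 1)),
  Ban.new(:site_misuse, 'Example note', Date.new(2021, 6, 15)),
  Ban.new(:spam, nil, Date.new(2019, 1, 1)) # outside the period
]
counts = ban_counts(bans, period)
puts counts[:spam]        # spam bans within the period
puts counts[:site_misuse] # misuse bans within the period
```

Because the category is chosen from a fixed list rather than parsed out of the ban text, the "banned for misuse, excluding spam" figure would no longer depend on matching an exact string such as 'Banned for spamming'.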

@sallytay

The published Transparency Report can be found at https://www.mysociety.org/2021/12/16/whatdotheyknow-transparency-report/
