Prevent 'calculated homepage' from being generated for certain domains #6434

Closed
mdeuk opened this issue Aug 4, 2021 · 5 comments · Fixed by #7836
Labels
easier-admin Make issues easier to resolve f:authorities improvement Improves existing functionality (UI tweaks, refactoring, performance, etc) re-user-experience reduce-admin Reduce issues coming to us in the first place x:uk

Comments

@mdeuk
Collaborator

mdeuk commented Aug 4, 2021

Originally posted by @RichardTaylor in #427 (comment)

On a very closely related point, we should stop calculating home pages for domains such as googlemail.com.

Do we need a list of exceptions?

This could be assembled from a list of the most common domains used in request addresses. Presumably, after excluding .ac.uk / .nhs.uk / .gov.uk, then hotmail / aol / gmail / outlook would come top, and we could treat the latter specially?
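As a rough illustration of how such a list might be assembled, here is a minimal sketch in plain Ruby. It assumes the request addresses are available as a simple array of strings; the method name, the suffix list and the sample addresses are all hypothetical and not taken from Alaveteli.

```ruby
# Hypothetical sketch: rank the domains used in request email addresses,
# skipping suffixes that clearly belong to official bodies.
OFFICIAL_SUFFIXES = %w[.ac.uk .nhs.uk .gov.uk].freeze

def common_unofficial_domains(emails, limit: 10)
  domains = emails.filter_map do |email|
    # Take everything after the first "@", normalised to lower case.
    domain = email.to_s.split('@', 2)[1]&.strip&.downcase
    next if domain.nil? || domain.empty?
    next if OFFICIAL_SUFFIXES.any? { |suffix| domain.end_with?(suffix) }

    domain
  end

  # Most frequent first; ties are broken alphabetically.
  domains.tally.sort_by { |domain, count| [-count, domain] }.first(limit)
end

# Made-up sample addresses, for illustration only.
sample = %w[
  clerk@example-parish.gov.uk
  foi@example.ac.uk
  parishclerk@gmail.com
  Clerk.Example@GMAIL.com
  towncouncil@btinternet.com
]

common_unofficial_domains(sample).each do |domain, count|
  puts "#{domain}: #{count}"
end
```

The domains at the top of the resulting list would be candidates for the exceptions list, subject to a manual check.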

We previously did some rough statistics on this in mysociety/whatdotheyknow-theme#690. It is certainly causing a data quality issue in the UK, where a rather surprising number of public bodies rely on either free or ISP-provided email addresses!

Naturally, we should be surprised that our public bodies are conducting official business using free(mium) email products which are unlikely to meet the standards required by public records legislation, but at the same time, we should ensure our software doesn't unintentionally mislead people by sending them in the wrong direction.

This data quality issue causes problems for re-users of our data, such as the use case which prompted the WDTK ticket, where our data was being used to populate a Wikidata dataset. Given that the prevalence of non-official email addresses for public bodies is unlikely to be a UK-specific 'feature', I'd suggest it causes problems for re-users of data on other Alaveteli websites as well.

@mdeuk mdeuk added easier-admin Make issues easier to resolve f:authorities re-user-experience reduce-admin Reduce issues coming to us in the first place x:uk labels Aug 4, 2021
@garethrees garethrees added the improvement Improves existing functionality (UI tweaks, refactoring, performance, etc) label Aug 5, 2021
@garethrees
Member

our data being used to populate a Wikidata dataset

Linking to #6535.

@garethrees
Member

Some feedback from a former colleague:

I pointed a colleague at the all-authorities.csv file on WDTK (linked from https://www.whatdotheyknow.com/help/api; takes a while to download), and they noticed that there are a lot of repeated and obviously-not-right homepages for the authorities listed. e.g. Wigton Town Council (https://www.whatdotheyknow.com/body/wigton_town_council) is listed as having a homepage of http://btinternet.com/.

@RichardTaylor

Perhaps calculated homepages should be excluded from the CSV download?

If a calculated homepage is in the CSV, and the CSV gets downloaded and re-uploaded, the calculated homepage would become a normal entry in the homepage field.

Also, when this issue is fixed, any homepages set to e.g. gmail.com or btinternet.com should be removed. There might not be many (or any) manually set to such domains; if there are, they probably arose via the spreadsheet download/upload mechanism described in the previous paragraph.
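A minimal sketch of what such a clean-up could look like, in plain Ruby. It assumes each authority is a simple record with a `home_page` attribute; the `Authority` struct, the `WEBMAIL_DOMAINS` list and both method names are hypothetical stand-ins rather than Alaveteli's real model or API.

```ruby
require 'uri'

# Hypothetical stand-in for an authority record; not the real PublicBody model.
Authority = Struct.new(:name, :home_page)

# Illustrative list only; a real list would come from the site's own data.
WEBMAIL_DOMAINS = %w[gmail.com googlemail.com btinternet.com hotmail.com].freeze

# True when the stored homepage points at a general purpose email provider.
def webmail_home_page?(home_page)
  host = URI.parse(home_page.to_s.strip).host
  return false if host.nil?

  WEBMAIL_DOMAINS.include?(host.downcase.delete_prefix('www.'))
rescue URI::InvalidURIError
  false
end

# Blanks the homepage of every affected authority and returns those records,
# so the changes can be reviewed afterwards.
def clear_webmail_home_pages(authorities)
  authorities.select { |authority| webmail_home_page?(authority.home_page) }
             .each { |authority| authority.home_page = nil }
end

authorities = [
  Authority.new('Wigton Town Council', 'http://btinternet.com/'),
  Authority.new('Example Council', 'https://www.example.org/') # made-up entry
]

clear_webmail_home_pages(authorities).each do |authority|
  puts "Cleared homepage for #{authority.name}"
end
```

On a live site this would more likely be a one-off task run against the database, with the list of affected authorities reviewed before anything is saved.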

@garethrees
Member

I think we should only calculate a homepage as a default value in the form, which could then be deleted from the field should it not look sensible, rather than the current system that dynamically generates it.

@garethrees
Member

We now list over 8000 parish councils on WDTK. Many (1000s) have gmail, hotmail, outlook, btinternet, etc. email addresses.

mysociety-pusher pushed a commit that referenced this issue Jul 21, 2023
Many smaller authorities use a request email address from a general
purpose public email provider. We don't want e.g. gmail.com being set as
the home page for the authority.

This is configurable in the theme by setting the class attribute:

    PublicBody.excluded_calculated_home_page_domains = %w[
      example.org
      example.net
    ]

Fixes #6434
mysociety-pusher pushed a commit that referenced this issue Jul 24, 2023
Many smaller authorities use a request email address from a general
purpose public email provider. We don't want e.g. gmail.com being set as
the home page for the authority.

This is configurable in the theme by setting the class attribute:

    # Add to the defaults
    PublicBody.excluded_calculated_home_page_domains += %w[
      example.net
    ]

    # Clear the defaults and set your own list
    PublicBody.excluded_calculated_home_page_domains = %w[
      example.org
      example.net
    ]

Fixes #6434
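
To illustrate how an exclusion list like this might be consulted when calculating a homepage, here is a minimal, self-contained sketch. `SketchPublicBody` is a hypothetical stand-in, not the real `PublicBody` model; only the attribute name `excluded_calculated_home_page_domains` comes from the commit message above, and the default list and the shape of the generated URL are assumptions.

```ruby
# Hypothetical stand-in for the real model; only the attribute name
# excluded_calculated_home_page_domains is taken from the commit message.
class SketchPublicBody
  class << self
    attr_accessor :excluded_calculated_home_page_domains
  end

  # Assumed defaults, for illustration only.
  self.excluded_calculated_home_page_domains = %w[gmail.com btinternet.com]

  attr_reader :request_email, :home_page

  def initialize(request_email:, home_page: nil)
    @request_email = request_email
    @home_page = home_page
  end

  # Prefer an explicitly set homepage; otherwise derive one from the request
  # email domain, unless that domain is on the exclusion list.
  def calculated_home_page
    return home_page unless home_page.to_s.empty?

    domain = request_email.to_s.split('@', 2)[1]&.downcase
    return nil if domain.nil? || domain.empty?
    return nil if self.class.excluded_calculated_home_page_domains.include?(domain)

    "http://#{domain}/" # assumed URL shape
  end
end

p SketchPublicBody.new(request_email: 'clerk@gmail.com').calculated_home_page
p SketchPublicBody.new(request_email: 'foi@example.org').calculated_home_page

# `+=` adds each domain as its own entry in the list.
SketchPublicBody.excluded_calculated_home_page_domains += %w[example.net]
p SketchPublicBody.new(request_email: 'foi@example.net').calculated_home_page
```

In this sketch an excluded domain yields no homepage at all, so a page or CSV export would show nothing rather than a misleading link.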
@gbp gbp closed this as completed in 5cc84cd Oct 19, 2023