New london.gov.uk breaks TheyWorkForYou questions scraper #1687
Comments
Have sent a message about a data feed.
Had a reply, and they've suggested they should be creating an RSS feed:
I think this would work well for us? We'd still have to make a new scraper, but it should be more stable over the long term. I can have a look at the fields the scraper was originally extracting to pass back to them - do we have any other suggestions about format, e.g. the ability to query by day/month?
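To make the format discussion concrete, here's a minimal sketch of how a scraper might consume such a feed. The feed URL and the day/month query parameters are hypothetical - they illustrate what we could ask them to support, not anything that exists yet:

```python
# Sketch of consuming a hypothetical RSS feed of Mayor's questions.
import feedparser

# Hypothetical endpoint and query parameters - the real ones would
# come out of the conversation with the London Assembly.
FEED_URL = "https://www.london.gov.uk/questions.rss?year=2023&month=1"

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    # Fields the old scraper extracted, which we'd want the feed to carry:
    # question title, permalink and publication date.
    print(entry.get("title"), entry.get("link"), entry.get("published"))
```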
I guess the main issue is the one we had with their site - if it's an RSS feed of questions, how do we get the answers? Questions appear before there are answers, e.g. https://www.london.gov.uk/who-we-are/what-london-assembly-does/questions-mayor/find-an-answer/lfb-staff-progression-2 (as opposed to https://www.london.gov.uk/who-we-are/what-london-assembly-does/questions-mayor/find-an-answer/responding-climate-breakdown, which has an answer on the page).
I've had a go at the scraper; the big inefficiency is having to requery all unanswered questions. If we merge it, I'll go back to the London Assembly and see if we can still get a feed for that (it speeds us up and means fewer queries for them).
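For reference, the requery pattern looks roughly like the sketch below: keep a set of question URLs we've seen without an answer, and re-fetch each one on every run. The `.answer` selector is an assumption - the real markup on london.gov.uk would need checking:

```python
# Sketch of requerying unanswered questions on each scraper run.
import requests
from bs4 import BeautifulSoup

unanswered = {
    "https://www.london.gov.uk/who-we-are/what-london-assembly-does/"
    "questions-mayor/find-an-answer/lfb-staff-progression-2",
}

still_unanswered = set()
for url in unanswered:
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    answer = soup.select_one(".answer")  # hypothetical selector
    if answer is not None:
        print(url, "answered:", answer.get_text(strip=True)[:80])
    else:
        # No answer yet, so we pay for this request again next run -
        # this is the inefficiency a feed from them would remove.
        still_unanswered.add(url)
```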
London has a new website: https://www.london.gov.uk/
This breaks the previous scraper we were using to get Mayor's questions. https://www.theyworkforyou.com/london/
The new site doesn't have a page per session like the previous one - we would need to query date ranges through the search for the equivalent. https://www.london.gov.uk/who-we-are/what-london-assembly-does/questions-mayor/find-an-answer
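As a rough illustration, the equivalent query might look like the sketch below. The date parameter names are assumptions based on typical search forms and would need checking against the real site:

```python
# Sketch of querying the new site's search by date range instead of
# the old per-session pages.
import requests

SEARCH_URL = (
    "https://www.london.gov.uk/who-we-are/what-london-assembly-does/"
    "questions-mayor/find-an-answer"
)
params = {
    "date_from": "2023-01-01",  # hypothetical parameter name
    "date_to": "2023-01-31",    # hypothetical parameter name
    "page": 0,
}
response = requests.get(SEARCH_URL, params=params)
response.raise_for_status()
# The listing would then be parsed for question links, one page at a time.
```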
The members feed comes via Wikidata and is unaffected.
As a first action, we should contact them, raise awareness of the issue, and see if we can get a nicer data feed to work with rather than writing a new scraper. Assigning myself to keep track.