Check / update data for indoor swimming pools #1

Open
joergreichert opened this issue Jul 10, 2024 · 3 comments

@joergreichert
joergreichert commented Jul 10, 2024

Official data

https://www.leipzig.de/freizeit-kultur-und-tourismus/sport/sportstaetten/schwimmhallen

JSON data

scraped via yarn run scrape-swimming-pools

Scraped output

public/data/leipzig-swimming-pools.json

Transformed output

no additional transformation needed

Referenced in code

src/components/Map.js
showMarker -> swimming_pools

App

image

@joergreichert joergreichert changed the title Check / update data for indoor and out door swimming pools Check / update data for indoor swimming pools Jul 10, 2024
@alex-fdias

Hi!

I don't fully understand why running the script "src/scrapers/leipzig-swimming-pools.js" writes an empty JSON file ("public/data/leipzig-swimming-pools.json").

If I understand correctly, the script "leipzig-swimming-pools.js" uses the module "scrape-it" to retrieve the swimming pool data from two URLs, and I assumed that data would be loaded via AJAX requests. However, when I checked for AJAX requests in Chrome's Developer Tools (Network tab -> Fetch/XHR), the only data fetched that way had nothing to do with swimming pools (it is the advertisement for "Stadtliche Karriere" shown at the bottom right). I also checked the HTML source of both URLs: the lists of Schwimmhallen (indoor pools) and Freibäder (outdoor pools) are hard-coded there anyway, although the information on the webpages is neither as structured nor as detailed as in this repository's file "public/data/leipzig-swimming-pools.json".

My guess is that the webpages behind the two URLs were changed, or that the existing data ("public/data/leipzig-swimming-pools.json") was obtained some other way (OpenStreetMap?).
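
The "is the list already in the static HTML?" check described above can also be scripted instead of read off the browser. This is only a rough sketch: `findInStaticHtml` is a hypothetical helper (not part of this repository), and the sample markup and pool names are made up for illustration.

```javascript
// Hypothetical helper: reports which expected names occur in the static HTML.
// If every name is found, the list is rendered server-side and no AJAX
// request is involved in delivering it.
function findInStaticHtml(html, expectedNames) {
  const haystack = html.toLowerCase();
  const found = [];
  const missing = [];
  for (const name of expectedNames) {
    (haystack.includes(name.toLowerCase()) ? found : missing).push(name);
  }
  return { found, missing };
}

// Inline sample standing in for the downloaded page source (made-up markup).
const sampleHtml = `
  <ul>
    <li><a href="/pool-a">Example Pool A</a></li>
    <li><a href="/pool-b">Example Pool B</a></li>
  </ul>`;

const report = findInStaticHtml(sampleHtml, ["Example Pool A", "Example Pool C"]);
console.log(report);
```

With Node 18 or newer, the real page source could be obtained via the built-in `fetch` (for example `await (await fetch(url)).text()`) and passed in as `html`.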

@joergreichert

leipzig-swimming-pools.js uses the npm library https://www.npmjs.com/package/scrape-it to scrape the swimming pool data from https://www.leipzig.de/freizeit-kultur-und-tourismus/sport/sportstaetten/freibaeder. It selects each entry on that page and then follows each entry's hyperlink to the details page to scrape the detail data as well. Evidently the web page was redesigned, so the matchers no longer match and the scraper produces an empty JSON file.
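
One way to keep a redesign from silently overwriting the data file with an empty result would be a small guard before writing. The following is a minimal sketch, not the code in this repository: `toJsonOrThrow` is a hypothetical name, and it assumes the scraper yields an array of entries.

```javascript
// Hypothetical guard: refuse to serialize an empty scrape result, so that a
// site redesign surfaces as an error instead of an empty JSON file.
function toJsonOrThrow(entries, sourceUrl) {
  if (!Array.isArray(entries) || entries.length === 0) {
    throw new Error(
      `No entries scraped from ${sourceUrl}; the selectors may be outdated`
    );
  }
  // Pretty-print with two spaces so diffs of the data file stay readable.
  return JSON.stringify(entries, null, 2);
}
```

The returned string could then be passed to `fs.writeFileSync("public/data/leipzig-swimming-pools.json", ...)`, so the existing file is only replaced when at least one entry was scraped.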

@joergreichert

I've fixed the scraper: https://github.com/CodeforLeipzig/leipzigmaps/blob/main/src/scrapers/leipzig-swimming-pools.js

To test the scraper against exactly one entry, run node ./test/scrapers/leipzig-swimming-pools-test.js (source: https://github.com/CodeforLeipzig/leipzigmaps/blob/main/test/scrapers/leipzig-swimming-pools-test.js).
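
For illustration, a single-entry check could look roughly like the sketch below. It is not the contents of the linked test file: `missingFields` is a hypothetical helper, and the field names `name`, `address`, and `url` are assumptions about the entry shape.

```javascript
// Hypothetical check for one scraped entry: returns the names of required
// fields that are absent or blank. The field names used here are assumptions.
function missingFields(entry, requiredFields) {
  return requiredFields.filter((field) => {
    const value = entry[field];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Made-up entry: "address" is blank and "url" is absent.
const sampleEntry = { name: "Example Pool", address: "  " };
const problems = missingFields(sampleEntry, ["name", "address", "url"]);
console.log(problems);
```

An empty result would mean the selectors still match that entry's details page; any listed field points at a matcher that needs updating.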
