This repository has been archived by the owner on Sep 21, 2020. It is now read-only.

contributing.md

How to contribute

Outside help is very welcome on this project.

Important areas where help could be used include:

  • adding a new spider for a data source (from the list of potential sources)
  • adding tests to ensure spiders function correctly
  • improving the efficiency of existing spiders

Adding a new spider

Potential data sources are listed in the project's GitHub issues.

To scrape a data source, first create a new spider in the findthatcharity_import/spiders folder (you may want to copy an existing spider). Inheriting from the BaseScraper class (in findthatcharity_import/spiders/base_scraper.py) provides some useful utilities for your spider.

Each spider needs the following:

  • a name attribute (this should be the same as the file name, and ideally be one word)
  • an org_id_prefix attribute. This will be used to construct the Org ID.

Spiders should return an Organisation item for each organisation in the register. Each spider should also include at least one Source item, which describes where the data was obtained. See an existing spider for the format of the Source item.
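As a rough illustration of the shape described above, here is a minimal sketch in plain Python. It does not use the real BaseScraper class or the project's item classes, whose interfaces are not shown in this guide. The `Source` and `Organisation` dataclasses, the `get_org_id` and `parse_rows` methods, the `XX-EXAMPLE` prefix, the field names and the example URL are all hypothetical. It also assumes the Org ID is the prefix joined to the register's own identifier with a hyphen; check an existing spider for the actual convention.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Union


@dataclass
class Source:
    # Hypothetical stand-in for the project's Source item.
    title: str
    url: str


@dataclass
class Organisation:
    # Hypothetical stand-in for the project's Organisation item.
    id: str
    name: str


class ExampleRegisterSpider:
    # Illustrative only: a real spider would inherit from BaseScraper.
    name = "exampleregister"      # same as the file name, one word
    org_id_prefix = "XX-EXAMPLE"  # made-up prefix for illustration

    def get_org_id(self, identifier: str) -> str:
        # Assumption: prefix and register identifier are joined by a hyphen.
        return f"{self.org_id_prefix}-{identifier.strip()}"

    def parse_rows(
        self, rows: Iterable[Dict[str, str]]
    ) -> Iterator[Union[Source, Organisation]]:
        # Emit one Source item describing where the data came from...
        yield Source(title="Example register", url="https://example.org/register.csv")
        # ...then one Organisation item per record in the register.
        for row in rows:
            yield Organisation(id=self.get_org_id(row["reg_number"]), name=row["name"])


items = list(ExampleRegisterSpider().parse_rows([
    {"reg_number": "12345", "name": "Example Trust"},
    {"reg_number": " 67890 ", "name": "Sample Foundation"},
]))
```

The real Source and Organisation items carry more fields than this; the sketch only shows how the name, the org_id_prefix and the two item types relate to each other.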

We want findthatcharity to be a good internet citizen. Be respectful of the scraped sites' capacity and bandwidth, and use the most bandwidth-efficient way of scraping the data. This may mean using a CSV download rather than scraping individual web pages.
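To illustrate the bulk-download approach, the sketch below parses a whole register from a single CSV response using only the standard library, so one request replaces one request per organisation. The column names (`reg_number`, `name`), the sample data and the `parse_register_csv` helper are assumptions made for this example, not part of the project.

```python
import csv
import io
from typing import Dict, Iterator

# Stand-in for the body of a single CSV download (hypothetical columns).
SAMPLE_CSV = (
    "reg_number,name\n"
    "12345,Example Trust\n"
    "67890, Sample Foundation \n"
    ",Row without a number\n"
)


def parse_register_csv(text: str) -> Iterator[Dict[str, str]]:
    """Yield one cleaned record per organisation in the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        reg_number = (row.get("reg_number") or "").strip()
        if not reg_number:
            # Skip rows that cannot be given an identifier.
            continue
        yield {"reg_number": reg_number, "name": (row.get("name") or "").strip()}


records = list(parse_register_csv(SAMPLE_CSV))
```

In a real spider the text would come from the response to one download request, and each record would then be turned into an Organisation item.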

Submitting changes

Send a new pull request with your changes.

Get in touch

Report any problems or bugs by adding an issue. To get in touch, send a tweet to @kanedr or visit https://drkane.co.uk/contact.