In this repo you'll find a smattering of different Python things that I want to share. Yes, I could have created a separate repo for each of these, but they are small and keeping them together helps me stay organized. Enjoy!
A group of Scrapy spiders used for scraping data from the benefits.gov website.
- Python >= 2.7.11
- Scrapy >= 1.0.5
- fake-useragent >= 0.0.8
- service_identity >= 14.0.0
- benefitstofile: scraper to save the entire HTML response to a file
- benefitlist: scraper to grab only the programs from the list page
- benefitprogramspider: full looping spider; follows each program found on the list pages and scrapes its details
Install Scrapy and fake-useragent:
pip install scrapy
pip install fake-useragent
- Change into the govbenefitsspider/govbenefitsspider directory
- Run the following command, replacing [NAME_OF_SPIDER] with the name of one of the spiders above:
scrapy crawl [NAME_OF_SPIDER]
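Curious what these spiders look like inside? Here's a minimal sketch in the same spirit; the spider name, URL, and CSS selector below are hypothetical, not the actual benefitlist code.

```python
# Minimal sketch of a list-page spider (names and selectors are hypothetical)
import scrapy
from fake_useragent import UserAgent


class ProgramListSketchSpider(scrapy.Spider):
    name = 'programlistsketch'
    start_urls = ['https://www.benefits.gov/browse-by-category']  # hypothetical

    def start_requests(self):
        # fake-useragent supplies a realistic User-Agent header per request
        ua = UserAgent()
        for url in self.start_urls:
            yield scrapy.Request(url, headers={'User-Agent': ua.random})

    def parse(self, response):
        # The selector is illustrative; the real spiders define their own
        for title in response.css('h3.program-title::text').extract():
            yield {'program': title.strip()}
```

You can also append `-o programs.json` to the crawl command to have Scrapy write the scraped items to a JSON file.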
A Jupyter notebook showing three very basic but useful ways to use Pandas for data engineering and analysis.
- Python >= 3.5.1
- Pandas >= 0.17.1
- Jupyter >= 4.0.0
- Ensuring changes you make to DataFrames stick
- Applying a function with no arguments to a DataFrame
- Applying a function with arguments to a DataFrame
- Run the Three Pandas Tips for Pandas Noobs notebook and enjoy the awesome.
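If you want a preview before opening the notebook, here's a quick sketch of the three tips against a toy DataFrame (the column names and data here are made up):

```python
import pandas as pd

df = pd.DataFrame({'name': [' Alice ', ' Bob '], 'score': [88, 92]})

# Tip 1: make changes stick -- many DataFrame methods return a new copy,
# so assign the result back (or pass inplace=True where supported)
df = df.rename(columns={'score': 'exam_score'})

# Tip 2: apply a function that takes no extra arguments
df['name'] = df['name'].apply(str.strip)

# Tip 3: apply a function with extra arguments via the args parameter
def add_curve(score, points):
    return score + points

df['exam_score'] = df['exam_score'].apply(add_curve, args=(5,))
```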
A simple scraper used to extract the price of books from the Packt website.
- Python >= 3.5.1
- BeautifulSoup4 >= 4.4.1
- Change the file path on line 80 of pbic_pricing_scraper.py or in the Jupyter notebook file
- Run the script (or notebook)
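For a sense of the approach, here's a minimal BeautifulSoup sketch; the URL and the tag/class it looks for are hypothetical stand-ins for what the real script targets:

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

url = 'https://www.packtpub.com/some-book-page'  # hypothetical URL
soup = BeautifulSoup(urlopen(url).read(), 'html.parser')

# Grab the element holding the price; the selector is illustrative only
price_tag = soup.find('span', class_='book-price')
if price_tag is not None:
    print(price_tag.get_text(strip=True))
```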
Code and fake dataset used to show how to create and train a predictive model.
- Python >= 3.5.1
- Pandas >= 0.17.1
- scikit-learn >= 0.17
- Jupyter >= 4.0.0
- Run the TLO Validation With Logistic Regression V3 notebook to see an example of creating and training a LogisticRegression model.
- Run the Apply The Logistic Model To New TLO Data notebook to see how to apply the model to new observations (data).
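The two notebooks boil down to the train-then-apply workflow sketched below; the feature names and data here are made up, not the actual TLO columns:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy training data with a binary label (names and values are made up)
train = pd.DataFrame({
    'feature_a': [0.2, 1.4, 3.1, 0.5, 2.8, 1.9],
    'feature_b': [10, 25, 40, 12, 38, 30],
    'valid':     [0, 0, 1, 0, 1, 1],
})

# Create and train the model
model = LogisticRegression()
model.fit(train[['feature_a', 'feature_b']], train['valid'])

# Apply the trained model to new observations
new_data = pd.DataFrame({'feature_a': [0.4, 2.9], 'feature_b': [15, 35]})
print(model.predict(new_data))        # predicted classes
print(model.predict_proba(new_data))  # class probabilities
```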