Job Offers Scraper is a web application. It scrapes internship offers from three pages (pracuj.pl, LinkedIn, NoFluffJobs) and stores them in a database, then sends an email with how many new offers appeared that day. It also checks whether each stored offer still appears on its page and, if not, deletes it from the database. The application is deployed on Heroku at https://job-offers-scraper.herokuapp.com. To guarantee an update every day, a cron job triggers the scraping.
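As a minimal illustration of that daily reconciliation, here is a sketch assuming offers are identified by their URL (an assumption for the example, not necessarily the project's actual key):

```python
# Illustrative only: how the "new offers" count and the stale-offer cleanup
# described above can be derived from plain set arithmetic.
def diff_offers(scraped_urls: set[str], stored_urls: set[str]) -> tuple[set[str], set[str]]:
    new = scraped_urls - stored_urls    # appeared today -> counted in the mail
    stale = stored_urls - scraped_urls  # no longer online -> deleted from the DB
    return new, stale

new, stale = diff_offers({"a", "b", "c"}, {"b", "c", "d"})
assert new == {"a"} and stale == {"d"}
```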
- Python
- FastAPI (REST API server)
- Selenium (library for scraping data)
- PostgreSQL (database)
- SQLAlchemy (database mapper / ORM)
- Adding more pages to scrape
- Adding a new table for companies
- Extending the API with query parameters such as company_name, job_name, etc.
- offers
- last_scraped
- Offers:
  - id (int)
  - job_name (string) - job title, e.g. Software Engineer
  - company_name (string) - company offering the job
  - website_name (string) - page on which the offer was found
  - place (string) - location of the job
  - logo_url (string) - URL of the company logo
  - url (string) - URL of the offer on the source website
  - found_date (Date) - date when the offer was found
- last_scraped:
  - id (int)
  - last_scraped (Date) - date when the last scraping run was made; acts as a system variable that prevents more than one scraping per day
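Below is a minimal sketch of how these two tables might look as SQLAlchemy models. The column names follow the lists above, while the class names and the `declarative_base` wiring are assumptions, not taken from the project's `models` file.

```python
# A sketch only: column names match the tables above, everything else is assumed.
from sqlalchemy import Column, Date, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Offer(Base):
    __tablename__ = "offers"

    id = Column(Integer, primary_key=True, index=True)
    job_name = Column(String)      # job title, e.g. Software Engineer
    company_name = Column(String)  # company offering the job
    website_name = Column(String)  # page on which the offer was found
    place = Column(String)         # location of the job
    logo_url = Column(String)      # URL of the company logo
    url = Column(String)           # URL of the offer on the source website
    found_date = Column(Date)      # date when the offer was found

class LastScraped(Base):
    __tablename__ = "last_scraped"

    id = Column(Integer, primary_key=True)
    last_scraped = Column(Date)    # date of the most recent scraping run
```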
- main - file where the FastAPI server is set up
- offer_ordering - connector between the database and the scraped data
- mail - class responsible for sending emails
- selenium_driver - file with the Selenium WebDriver setup and the code that calls the specified scraper
- scrapers - package with all classes related to scraping data from a specific page
  - abstract_scraper - class with code common to all scraper classes (a sketch of this pattern follows the list)
  - linkedin_scraper - class for scraping data from LinkedIn
  - nofluffjobs_scraper - class for scraping data from NoFluffJobs
  - pracuj_scraper - class for scraping data from pracuj.pl
- sql_app - package with the files related to the SQLAlchemy library and to connecting the database to the FastAPI server
  - crud - file with CRUD operations on the database
  - database - file with the database connection setup
  - models - file with the classes that interact with the database
  - schemas - file with the Pydantic models that define the valid data shapes; these are what the user gets back from the database
- website_names - file with an enum class of all pages from which data is scraped
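The shared/specific split in the scrapers package might look roughly like this. It is a hypothetical sketch: the method names, the `WebDriver` injection, and the CSS selector are assumptions, not the project's real code.

```python
# Hypothetical sketch of the abstract_scraper pattern: shared flow in the
# base class, site-specific parsing in each subclass.
from abc import ABC, abstractmethod

from selenium.webdriver.common.by import By
from selenium.webdriver.remote.webdriver import WebDriver

class AbstractScraper(ABC):
    def __init__(self, driver: WebDriver):
        self.driver = driver

    def scrape(self, url: str) -> list[dict]:
        """Common code for every scraper: open the page, delegate the parsing."""
        self.driver.get(url)
        return self.parse_offers()

    @abstractmethod
    def parse_offers(self) -> list[dict]:
        """Site-specific extraction, implemented by each scraper class."""

class PracujScraper(AbstractScraper):
    def parse_offers(self) -> list[dict]:
        # The selector is a placeholder; the real class targets pracuj.pl markup.
        cards = self.driver.find_elements(By.CSS_SELECTOR, "a.offer")
        return [
            {"job_name": card.text, "url": card.get_attribute("href") or ""}
            for card in cards
        ]
```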
| Resource | Description |
|---|---|
| `/` | Always returns "Hello world" |
| `offers/{website}` | Returns offers for the specified website |
| `offers/offer/{offer_id}` | Returns the offer with the specified id, if it exists |
| `offers/` | Returns all offers |
All offers resources return the first 100 offers by default. To get the next offers, or fewer of them, use the `skip` and `limit` query parameters. For example:
https://job-offers-scraper.herokuapp.com/offers?skip=10&limit=10 returns the second batch of ten offers.
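A paginated endpoint along these lines would produce that behaviour. This sketch wires everything to an in-memory SQLite database so it runs on its own, whereas the real project uses PostgreSQL through the `sql_app` package and returns Pydantic schemas; the `Offer` model here is trimmed to two columns.

```python
# Sketch of offers/ with skip/limit pagination (default: first 100 offers).
from fastapi import Depends, FastAPI
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, sessionmaker
from sqlalchemy.pool import StaticPool

# Stand-in for sql_app/database: an in-memory SQLite DB instead of PostgreSQL.
engine = create_engine(
    "sqlite://", connect_args={"check_same_thread": False}, poolclass=StaticPool
)
SessionLocal = sessionmaker(bind=engine)
Base = declarative_base()

class Offer(Base):  # trimmed stand-in for the model in sql_app/models
    __tablename__ = "offers"
    id = Column(Integer, primary_key=True)
    job_name = Column(String)

Base.metadata.create_all(engine)
app = FastAPI()

def get_db():
    """Yield one database session per request, closing it afterwards."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

@app.get("/offers/")
def read_offers(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
    # e.g. /offers?skip=10&limit=10 -> the second batch of ten offers
    offers = db.query(Offer).offset(skip).limit(limit).all()
    return [{"id": o.id, "job_name": o.job_name} for o in offers]
```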