Jobs

This is a collection of utilities for scraping job descriptions from the web.

.
├── aws_lambda #lambda components of state machines
│   ├── crawl_companies #crawl-scrape-v2 
│   │   ├── crawl_companies.py
│   │   ├── dockerfile
│   │   ├── requirements.in
│   │   ├── requirements.txt
│   │   └── update_image.sh
│   ├── create_new_lambda.sh
│   ├── ddg_search #ddg-search 
│   │   ├── ddg_search.py
│   │   ├── dockerfile
│   │   ├── requirements.txt
│   │   └── update_image.sh
│   ├── put_objects_s3 #ddg-search
│   │   ├── dockerfile
│   │   ├── put_objects_s3.py
│   │   ├── requirements.txt
│   │   └── update_image.sh
│   ├── scrape_jds #crawl-scrape-v2
│   │   ├── dockerfile
│   │   ├── requirements.txt
│   │   ├── scrape_jds.py
│   │   └── update_image.sh
│   └── update_lambda.sh #template called by each update_image.sh above
├── make_dummy_company.sh #creates empty placeholder files on S3, used for filtering in crawl-scrape-v2
├── README.md
├── requirements.in
├── requirements.txt
├── scrape.py
├── state_machines
│   ├── crawl-scrape-v2.asl.yaml
│   └── ddgsearch.yaml
└── tests.py

There are two AWS Step Functions state machines. ddg-search performs DuckDuckGo searches (currently only for jobs posted on the Greenhouse ATS) and keeps track of the companies that post those jobs in S3.
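How ddg-search identifies a company from a search result isn't spelled out here. A minimal sketch, assuming the company is taken from the board token in Greenhouse URLs of the form https://boards.greenhouse.io/<board_token>/jobs/<job_id>, could look like this; the function names and the host list are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Optional
from urllib.parse import urlparse

# Hosts assumed to serve Greenhouse-hosted job boards (hypothetical list; extend as needed).
GREENHOUSE_HOSTS = {"boards.greenhouse.io", "job-boards.greenhouse.io"}


def board_token(url: str) -> Optional[str]:
    """Return the company's board token from a Greenhouse job URL, else None."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in GREENHOUSE_HOSTS:
        return None
    # Expected path shape: /<board_token>/jobs/<job_id>
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 2 and parts[1] == "jobs":
        return parts[0].lower()
    return None


def new_companies(result_urls: Iterable[str], known: set[str]) -> set[str]:
    """Board tokens seen in the search results that are not tracked yet."""
    found = {token for token in map(board_token, result_urls) if token}
    return found - known
```

Only the tokens returned by new_companies would then need to be written to S3, so companies already tracked are not duplicated.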

The crawl-scrape-v2 state machine lists the companies in the S3 bucket, fetches all job posting URLs from each company's job board, and then runs a Map step that scrapes the job metadata concurrently. The results are also stored in an S3 bucket for later analysis.
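For Greenhouse boards, one way to get the posting URLs is Greenhouse's public Job Board API, which returns each posting's absolute_url. The sketch below assumes that endpoint is used (the repo may scrape the HTML board instead); the helper names and the shape of the Map-state items are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any

# Public Greenhouse Job Board API; {token} is the company's board token.
JOBS_ENDPOINT = "https://boards-api.greenhouse.io/v1/boards/{token}/jobs"


def jobs_endpoint(token: str) -> str:
    return JOBS_ENDPOINT.format(token=token)


def fetch_board(token: str, timeout: float = 10.0) -> dict[str, Any]:
    """Download the list of open postings for one company (network call)."""
    with urllib.request.urlopen(jobs_endpoint(token), timeout=timeout) as resp:
        return json.load(resp)


def job_urls(payload: dict[str, Any]) -> list[str]:
    """Extract posting URLs from a /jobs response body."""
    return [job["absolute_url"] for job in payload.get("jobs", []) if job.get("absolute_url")]


def map_items(company: str, urls: list[str]) -> list[dict[str, str]]:
    """One item per posting, usable as input to a Step Functions Map state (hypothetical schema)."""
    return [{"company": company, "url": url} for url in urls]
```

A crawl Lambda could return `map_items(token, job_urls(fetch_board(token)))`, and the Map step would then invoke the scraping Lambda once per item.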

Updating lambda functions

The Lambda functions occasionally need updating. For instance, if a new version of a dependency is released, that Lambda's requirements.txt may need to be updated. After making changes, run update_image.sh in the corresponding Lambda directory of this repo. It calls aws_lambda/update_lambda.sh to update the specified Lambda.
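Each Lambda directory has a dockerfile, so these are container-image functions. The contents of update_lambda.sh aren't shown here, but its final step presumably points the function at a freshly pushed ECR image. A rough boto3 equivalent of just that step, assuming the image is already built and pushed, is sketched below; the helper names, account ID, region and repository names are placeholders.

```python
from __future__ import annotations


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the URI of an image in a private Amazon ECR repository."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def update_lambda_image(lambda_client, function_name: str, image_uri: str) -> str:
    """Point a container-image Lambda at a new image and return its update status.

    lambda_client is expected to be boto3.client("lambda") or any object with
    the same update_function_code(FunctionName=..., ImageUri=...) method.
    """
    response = lambda_client.update_function_code(
        FunctionName=function_name,
        ImageUri=image_uri,
        Publish=True,  # also publish a new version of the function
    )
    return response.get("LastUpdateStatus", "Unknown")
```

With boto3 installed it might be called as `update_lambda_image(boto3.client("lambda"), "scrape_jds", ecr_image_uri("123456789012", "us-east-1", "scrape_jds"))`, where the function and repository names are only examples.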

TODO: automate dependency updates for ddg-search.

I am currently working on utilities to filter the jobs on S3 by criteria such as title and location, and to send alerts via AWS SNS.
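Since these utilities are still in progress, here is only a minimal sketch of what they could look like, assuming each scraped job has a title, location and URL (the Job record is hypothetical). The SNS client is passed in, so boto3.client("sns") or any object with the same publish(TopicArn=..., Subject=..., Message=...) method works.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical record shape; the scraped metadata may use different fields.
    title: str
    location: str
    url: str


def matches(job: Job, title_keywords: list[str], locations: list[str]) -> bool:
    """Case-insensitive substring match; an empty criteria list matches everything."""
    title, location = job.title.lower(), job.location.lower()
    title_ok = not title_keywords or any(k.lower() in title for k in title_keywords)
    location_ok = not locations or any(loc.lower() in location for loc in locations)
    return title_ok and location_ok


def filter_jobs(jobs: list[Job], title_keywords: list[str], locations: list[str]) -> list[Job]:
    return [job for job in jobs if matches(job, title_keywords, locations)]


def send_alert(sns_client, topic_arn: str, jobs: list[Job]) -> None:
    """Publish one SNS message listing the matching jobs; do nothing if there are none."""
    if not jobs:
        return
    body = "\n".join(f"{j.title} ({j.location}): {j.url}" for j in jobs)
    sns_client.publish(TopicArn=topic_arn, Subject=f"{len(jobs)} new matching jobs", Message=body)
```

Sending one message per run rather than one per job keeps the number of notifications manageable.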

After that I'm planning on training some ML models on the job descriptions in order to rank them by relevance to a candidate's resume.
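Before training ML models, a simple non-ML baseline for this kind of ranking is TF-IDF cosine similarity between the resume and each job description. The sketch below is that baseline in plain Python, not the planned model; the tokenizer and the smoothed IDF formula are arbitrary choices.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase words; keeps '+' and '#' so terms like "c++" and "c#" survive.
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """TF-IDF weights per document, with smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank_jobs(resume: str, descriptions: list[str]) -> list[tuple[int, float]]:
    """Return (description index, score) pairs, most similar first."""
    vectors = tfidf_vectors([resume] + descriptions)
    resume_vec, job_vecs = vectors[0], vectors[1:]
    scores = [(i, cosine(resume_vec, v)) for i, v in enumerate(job_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

A learned model would have to beat this baseline to justify its added complexity.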

daavidstein/jobs

Folders and files

Latest commit

History

Repository files navigation

Jobs

Updating lambda functions

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages