Basketball-Reference Webcrawler

Scrapy-based webcrawler which collects all data for a specific NBA season from basketball-reference.com.

Prerequisites

Requires the scrapy and pandas Python packages.

The webcrawler can be started from the project directory using the command

scrapy crawl basketball-reference -a season=2020

where the season argument selects the season to collect (the default is the current season).

odds data : FanDuel odds collected with sbrscrape ➡️ test.py
```
python3 test.py
```
Set year = ["2023", "2024"] and season = ["2023-24"] to the season you want to collect.
The odds data also covers betting information for tomorrow's games ➡️ bet_api.py, accessible with your own API key (https://api.the-odds-api.com/v4/sports)
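
As a rough illustration of how a request to that API could be assembled, the sketch below builds an odds URL without sending it. Only the base URL comes from this README. The `odds_url` helper, the `basketball_nba` sport key, the `/odds` path, and the `apiKey`, `regions`, and `markets` parameters are assumptions based on the public v4 API and should be checked against its documentation; bet_api.py may do this differently.

```python
from urllib.parse import urlencode

# Base URL as given in this README.
BASE_URL = "https://api.the-odds-api.com/v4/sports"


def odds_url(api_key: str, sport: str = "basketball_nba",
             regions: str = "us", markets: str = "h2h") -> str:
    """Build (but do not send) an odds request URL.

    Hypothetical helper: the path and parameter names are assumptions
    about the v4 API, not taken from bet_api.py.
    """
    query = urlencode({"apiKey": api_key, "regions": regions, "markets": markets})
    return f"{BASE_URL}/{sport}/odds?{query}"


# Replace the placeholder with your own key before requesting the URL.
print(odds_url("YOUR_API_KEY"))
```

Keeping the key as a function argument, rather than hard-coding it, makes it easier to read it from an environment variable and keep it out of the repository.
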
season data : collected from https://www.basketball-reference.com/leagues/NBA_{self.season}_games.html ➡️ br_spider.py
```
scrapy crawl basketball-reference -a season=2020
```
Change the season argument to the season you want to crawl.
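
The relationship between the season argument and the crawled URL can be sketched as follows. The URL pattern is the one quoted above; the helper names and the October cut-off used to guess the current season are assumptions for illustration, not code taken from br_spider.py.

```python
from datetime import date
from typing import Optional, Union

# URL pattern quoted in this README; the season is labelled by the year it ends in.
GAMES_URL = "https://www.basketball-reference.com/leagues/NBA_{season}_games.html"


def current_season(today: Optional[date] = None) -> int:
    """Guess the current season label (hypothetical rule, not from br_spider.py)."""
    today = today or date.today()
    # Assumption: a new season starts in October and is labelled by the following year.
    return today.year + 1 if today.month >= 10 else today.year


def season_url(season: Optional[Union[int, str]] = None) -> str:
    """Build the schedule URL for a season, falling back to the current one."""
    if season is None:
        season = current_season()
    return GAMES_URL.format(season=season)


print(season_url(2020))
```

Scrapy passes `-a` values to the spider as strings, so the helper accepts either `2020` or `"2020"`.
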
merged data : odds data and season data merged on [date, home, away] ➡️ data_preprocess.py
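
A minimal sketch of such a merge with pandas is shown below. The rows are made up for illustration, and every column name other than date, home, and away is an assumption; data_preprocess.py may use different columns or a different join type.

```python
import pandas as pd

# Made-up rows for illustration only; not real results or odds.
season = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02"],
    "home": ["BOS", "LAL"],
    "away": ["NYK", "DEN"],
    "home_pts": [110, 98],    # hypothetical column
    "away_pts": [102, 105],   # hypothetical column
})
odds = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-03"],
    "home": ["BOS", "MIA"],
    "away": ["NYK", "CHI"],
    "home_ml": [-150, -120],  # hypothetical moneyline column
})

# Join keys named in this README.
KEYS = ["date", "home", "away"]


def merge_games(season_df: pd.DataFrame, odds_df: pd.DataFrame) -> pd.DataFrame:
    """Inner join: keep only games that appear in both sources."""
    return season_df.merge(odds_df, on=KEYS, how="inner")


merged = merge_games(season, odds)
print(merged)
```

An inner join silently drops games that are missing from either source, so team names and date formats need to match exactly in both files before merging.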

This program predicts the outcome of NBA games using Hadoop and Spark.

You can ignore the first commit.
