All code in this Markdown document was written in JupyterLab and executed cell by cell.
The scripts are written in Python, and the following libraries helped us get the job done:
- requests
- BeautifulSoup
- pandas
For each competition (Adidas Next Generation, Euroleague and Eurocup), data are scraped from different pages of the respective website, depending on the year of the competition and the information we want. For every year of each competition we have to visit two different pages, because the 'minutes_played' and 'index' statistics live on separate pages (a scraping sketch follows this paragraph). Before gathering all the data, we wrote a third script to remove duplicate rows (a sketch of that step appears after the section outlines below).
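As an illustration of one scraping step, here is a minimal sketch using requests, BeautifulSoup and pandas. The URL pattern, the assumption that the statistics sit in the first HTML table of the page, and the example file names are all hypothetical; the real pages and selectors differ per competition and per year.

```python
# Minimal sketch of one scraping step (hypothetical URL and table layout).
import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrape_stats(url: str) -> pd.DataFrame:
    """Download one stats page and return its main table as a DataFrame."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    table = soup.find("table")            # assumes the stats live in the first <table>
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr")[1:]:   # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return pd.DataFrame(rows, columns=headers)

# Hypothetical usage: one page per season for the 'index' statistic.
# df_2020 = scrape_stats("https://example.com/competition/2020/index")
```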
All data came from: Adidas Next Gen Competition.
1.1. Index
1.2. Minutes played
1.3. Delete duplicates
All data came from: Euroleague website.
2.1. Index
2.2. Minutes Played
2.3. Delete duplicates
All data came from: Eurocup website.
3.1. Index
3.2. Minutes Played
3.3. Delete duplicates
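The "Delete duplicates" step listed for each competition can be done with pandas. Below is a minimal sketch under assumed column and file names ('Player', 'Season' and the CSV paths are hypothetical); the actual key columns depend on what each page exposes.

```python
# Minimal sketch of the duplicate-removal step (hypothetical column and file names).
import pandas as pd

df = pd.read_csv("adidas_next_gen_index.csv")          # hypothetical input file
df = df.drop_duplicates(subset=["Player", "Season"])   # keep one row per player per season
df.to_csv("adidas_next_gen_index_clean.csv", index=False)
```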
To gather all the data in a single CSV file, we needed to make a fourth script. Unlike the others, in this case we had to explore the data in order to fill all the gaps. For a better understanding of the process, you can access the Jupyter Notebook available in the link below.
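A minimal sketch of this fourth step is shown below. The file-name prefixes, the 'Player'/'Season' join keys and the choice of filling missing values with 0 are illustrative assumptions, not the exact logic of the notebook, which explores the data before deciding how to fill each gap.

```python
# Minimal sketch of combining everything into one CSV (hypothetical names and keys).
import pandas as pd

competitions = ["adidas_next_gen", "euroleague", "eurocup"]  # hypothetical file prefixes
frames = []
for comp in competitions:
    index_df = pd.read_csv(f"{comp}_index_clean.csv")
    minutes_df = pd.read_csv(f"{comp}_minutes_played_clean.csv")
    # Join the two statistics scraped from separate pages on shared keys.
    merged = index_df.merge(minutes_df, on=["Player", "Season"], how="outer")
    merged["Competition"] = comp
    frames.append(merged)

all_data = pd.concat(frames, ignore_index=True)
# Illustrative gap-filling choice; the notebook inspects the data before filling.
all_data = all_data.fillna(0)
all_data.to_csv("all_competitions.csv", index=False)
```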
There are probably faster and more efficient ways to carry out the process described above. Even so, we believe that sharing the work we did can serve as a starting point for anyone who wants to do something similar.