This repository contains data on matches in the English Premier League between the 1992/93 and 2019/20 seasons.
The data has been scraped from the Premier League's official website and manipulated into a tabular format.
Raw data in for every game in JSON format can be found in data/games/. The processed data includes:
matches.csv
- Metadata about each match. Columns are as follows:
Column | Description |
---|---|
match_id | Premier league match id e.g. https://www.premierleague.com/match/{match_id} |
season | Premier league season |
game_week | Gameweek, an integer between 1 and 41 |
kickoff | Date and time of match kickoff |
home | Home team name |
home_id | Premier league home team id e.g. https://www.premierleague.com/clubs/{home_id} |
home_score | Home team score |
home_ht_score | Home team half-time score |
away | Away team name |
away_id | Premier league away team id, as above |
away_score | Away team score |
away_ht_score | Away team half-time score |
ground_id | Premier league ground id |
ground_name | Ground name e.g. Old Trafford |
ground_city | Ground city e.g. Manchester |
referee_id | Premier league referee id |
referee_name | Referee name |
behind_closed_doors | Boolean. Was game played behind closed doors? |
teams.csv
- Matchday squads for games. Joins to matches.csv
on match_id
. Note that this data looks to be slightly spotty in terms of reliability.
Column | Description |
---|---|
match_id | Premier league match id, as above |
team_id | Premier league team id, as above |
field | Specifies whether this was the home or the away team |
type | Either starting11 or substitute |
player_id | Premier league player id e.g. https://www.premierleague.com/players/{player_id} |
player_name | Player name |
position | Match position |
shirt_no | Shirt number |
captain | Boolean, was player captain for team |
nationality | Nationality |
birth_date | Birth date in YYYY-MM-DD format |
events.csv
- Events in game. These include goals, cards, substitutes etc. Again, Joins to matches.csv
on match_id
and to teams.csv
by player_id
and/or assist_id
value | description |
---|---|
match_id | Premier league match id, as above |
half | Integer, indicating first or second half |
event_type | see below |
event_desc | see below |
team_id | Premier league match id, as above |
player_id | Premier league player id, as above |
player_name | Player name |
assist_id | Where goal was assisted, Premier league player id |
assist_name | Where goal was assisted, player name |
minute | Time of event in minuteds, in mm'ss format (although seconds are redundant) |
seconds | Tim of event in seconds |
event_type
- Can be one of Play Start
or Play End
, at the start and end of each half, Booking
for yellow and red cards, Substitution
for subs in an out Goal
, Own-Goal
and Penalty Scored
for goals, and Penalty Saved
or Penalty Missed
.
event_description
is very similar, but carries an extra level of detail for subs in and out and card types: yellow, red or second yellow.