This is a simple scraper for pulling in match data from Voobly.
This repo also contains data obtained from running this scraper. Releases of data will be versioned to some extent. At the current time the following datasets are available.
Version | File | Content | Ladders | Size (zipped/unzipped) |
---|---|---|---|---|
20180504 | matchDump.zip | 648641 matches | RM 1v1, RM Team | 57MB / 404MB |
20180507 | matchDumpDMOnly.zip | 27290 matches | DM 1v1, DM Team | 3MB / 48MB |
20181027 | matchDump.csv.zip | 512486 matches | RM 1v1, RM Team, DM 1v1, DM Team | 78MB / 402MB |
20190208 | matchDump.csv.zip | 345891 matches | RM 1v1, RM Team, DM 1v1, DM Team | 49MB / 261MB |
For the sake of simplicity the data are provided in csv format, with one row per player per match, and the following columns. For some matches Voobly does not have the data for one or more players, and in these cases all of the MatchPlayer values are filled in with "VooblyErrorPlayerNotFound".
The id of the match on Voobly
The url of the match on Voobly
When the match was played
How long the match was, in seconds
Which ladder the match was played on
Which map the match was played on
Any match mods used, separated by commas
The id of the player on Voobly
The (most recently scraped) name of the player - may not be the same as the player name on the match url
Which team the player was on
The civ id of the match - a full list is available below
The civ name
Whether or not this player was a match winner (coded as 1 = Winner, 0 = Not winner)
The rating of the player on the relevant ladder before the match
The rating of the player on the relevant ladder after the match
URL to recording for this match. Voobly does not keep recordings for long, so these will only work for a few months after the match has been played.
These are the same as the ids Voobly uses to display civ images. Voobly has some errors here, and uses civ ids for which it doesn't have a picture or any information. This seems to be due to a use of incompatible mods (WK with UP 1.4 for example). These civs are labelled as "VooblyCivError".
Id | Name |
---|---|
1 | Britons |
2 | Franks |
3 | Goths |
4 | Teutons |
5 | Japanese |
6 | Chinese |
7 | Byzantines |
8 | Persians |
9 | Saracens |
10 | Turks |
11 | Vikings |
12 | Mongols |
13 | Celts |
14 | Spanish |
15 | Aztecs |
16 | Mayans |
17 | Huns |
18 | Koreans |
19 | Italians |
20 | Indians |
21 | Incas |
22 | Magyars |
23 | Slavs |
24 | Portuguese |
25 | Ethiopian |
26 | Malian |
27 | Berbers |
28 | Khmer |
29 | Malay |
30 | Burmese |
31 | Vietnamese |
32 | VooblyCivError |
33 | VooblyCivError |
34 | VooblyCivError |
36 | VooblyCivError |
35 | VooblyCivError |
37 | VooblyCivError |
38 | VooblyCivError |
39 | VooblyCivError |
40 | VooblyCivError |
41 | VooblyCivError |
46 | VooblyCivError |
47 | VooblyCivError |
Install stack
Install tidy (sudo apt-get install tidy)
The scraper can take a long time to run the first time (multiple days). After the first run, subsequent runs should be a lot faster (a few hours) because it will have cached a lot of the data required.