Imports IMDb datasets into a relational database for easy querying.
Don't host this somewhere publicly accessible, otherwise you run afoul of IMDb's licensing terms for their datasets.
- Create MySQL database, import
config/schema.sql
- Enter DB details in
config/config.json
php rmdb/public/index.php [command]
help - Print this help
download - Download zipped TSV files from datasets.imdbws.com. Needs 1.5GB+ of storage
extract - Extract zipped TSV files. Needs 7GB+ of storage
import - Import everything, replace existing data. Runs all the below commands:
import-names - Import cast & crew
import-titles - Import movies, TV episodes, short films, ...
import-titles-akas - Import foreign names for titles
import-episodes - Import episode <> show/series relation
import-ratings - Import average ratings for titles
import-principals - Import directors, writers. Should be redundant if you import names
See config/schema.sql
which pretty much maps to the IMDb dataset schema. Overview:
Table | Description |
---|---|
titles | All the movies, short films, episodes |
genres | All genres, like Action, Sci-Fi, Comedy |
titles_genres | Maps titles to genres |
names | All people with birth and death year |
categories | Main types of jobs like Actor, Actress, Composer |
jobs | Freeform job titles like Act Three written By, Live Show Editor |
principals | Maps names to titles, category and job, ordered by ordering |
principals_characters | Maps principals to a character name |
professions | Main professions like Producer, Stunts, casting Director |
names_primaryprofessions | Maps names to professions |
names_knownfortitles | Maps people to titles they are most known for |
titleakas | Foreign language / additional titles |
titleakaattributes | Type of additional titles like Fake Working Title, Berlin Film Festival Title |
titleakatypes | Smaller list of title types, similar to titleakaattributes, like Alternative, Working, IMDb Display |
titleakas_titleakaattributes | Maps titleakaattributes to titleakas |
titleakas_titleakatypes | Maps titleakas to titleakatypes |
episodes | Connects episode titles to their shows |
Get more data from other databases like TMDB. Country of origin, language and box-office would be interesting. But this probably needs to be imported one-by-one via their API.
SELECT start_year,
COUNT(*) AS count
FROM titles
LEFT JOIN titles_genres AS tg1
ON tg1.title_id = titles.id
WHERE tg1.genre_id = "drama" -- repeat for a few genres
-- Exclude tv episodes etc.
AND title_type = "movie"
AND start_year < YEAR(CURRENT_DATE())
GROUP BY start_year
ORDER BY start_year ASC;
Although after 2000 the amount of movies made, or listed in IMDb, rises rapidly, so the data might need some normalization. Or limit to before 2000 like here:
SELECT n.primary_name AS name,
COUNT(*) AS movies
FROM principals
LEFT JOIN names AS n
ON n.id = name_id
LEFT JOIN titles AS t
ON t.id = title_id
WHERE category_id = "director"
AND t.title_type = "movie"
AND t.start_year >= 1960
AND t.start_year <= 1969
AND t.runtime_minutes >= 90
AND t.average_rating > 6
AND t.num_votes > 10000
GROUP BY name_id
ORDER BY movies DESC
LIMIT 10;
name | movies |
---|---|
Jean-Luc Godard | 7 |
Blake Edwards | 6 |
Akira Kurosawa | 5 |
Michelangelo Antonioni | 5 |
Alfred Hitchcock | 5 |
John Frankenheimer | 5 |
Stanley Kramer | 4 |
Robert Aldrich | 4 |
Sergio Leone | 4 |
Roman Polanski | 4 |
SELECT CONCAT("[", primary_title, "](https://www.imdb.com/title/", id, ")") AS primary_title,
start_year,
average_rating,
num_votes
FROM titles
LEFT JOIN titles_genres AS tg1
ON tg1.title_id = titles.id
LEFT JOIN titles_genres AS tg2
ON tg2.title_id = titles.id
WHERE tg1.genre_id = "horror"
AND tg2.genre_id = "comedy"
-- Exclude TV episodes etc.
AND title_type = "movie"
-- Exclude little voted on movies where average_rating is often too high.
-- Higher num_votes = more popular
AND num_votes > 20000
ORDER BY average_rating DESC,
num_votes DESC
LIMIT 50;
primary_title | start_year | average_rating | num_votes |
---|---|---|---|
Shaun of the Dead | 2004 | 7.9 | 547079 |
Evil Dead II | 1987 | 7.7 | 161069 |
Zombieland | 2009 | 7.6 | 562503 |
What We Do in the Shadows | 2014 | 7.6 | 178609 |
One Cut of the Dead | 2017 | 7.6 | 23294 |
Tucker and Dale vs Evil | 2010 | 7.5 | 176660 |
An American Werewolf in London | 1981 | 7.5 | 104122 |
Dead Alive | 1992 | 7.5 | 96468 |
Stree | 2018 | 7.5 | 33219 |
Army of Darkness | 1992 | 7.4 | 174895 |
The Rocky Horror Picture Show | 1975 | 7.4 | 148043 |
Bhool Bhulaiyaa | 2007 | 7.4 | 26087 |
Gremlins | 1984 | 7.3 | 219049 |
The Return of the Living Dead | 1985 | 7.3 | 60200 |
House | 1977 | 7.3 | 26640 |
The Lost Boys | 1987 | 7.2 | 138911 |
Re-Animator | 1985 | 7.2 | 63177 |
Bhoot Police | 2021 | 7.2 | 26129 |
Tremors | 1990 | 7.1 | 135475 |
The Frighteners | 1996 | 7.1 | 87546 |
Little Shop of Horrors | 1986 | 7.1 | 74525 |
The Fearless Vampire Killers | 1967 | 7.1 | 31521 |
Dellamorte Dellamore | 1994 | 7.1 | 21375 |
Bubba Ho-Tep | 2002 | 6.9 | 48378 |
Warm Bodies | 2013 | 6.8 | 229755 |
Ready or Not | 2019 | 6.8 | 141645 |
Odd Thomas | 2013 | 6.8 | 52720 |
Creepshow | 1982 | 6.8 | 46916 |
Zombieland: Double Tap | 2019 | 6.7 | 173135 |
Trick 'r Treat | 2007 | 6.7 | 90509 |
Fresh | 2022 | 6.7 | 38519 |
Housebound | 2014 | 6.7 | 34328 |
Fido | 2006 | 6.7 | 29187 |
Behind the Mask: The Rise of Leslie Vernon | 2006 | 6.7 | 23800 |
Night of the Creeps | 1986 | 6.7 | 23355 |
Happy Death Day | 2017 | 6.6 | 136327 |
Death Becomes Her | 1992 | 6.6 | 115999 |
May | 2002 | 6.6 | 36838 |
Slither | 2006 | 6.5 | 81269 |
Arachnophobia | 1990 | 6.5 | 69144 |
The Witches of Eastwick | 1987 | 6.5 | 69079 |
Bad Taste | 1987 | 6.5 | 46735 |
Better Watch Out | 2016 | 6.5 | 37711 |
Fright Night | 2011 | 6.4 | 105215 |
Gremlins 2: The New Batch | 1990 | 6.4 | 102085 |
Severance | 2006 | 6.4 | 38791 |
The People Under the Stairs | 1991 | 6.4 | 34750 |
Mayhem | 2017 | 6.4 | 20928 |
The Babysitter | 2017 | 6.3 | 89374 |
Dead Snow | 2009 | 6.3 | 66839 |
Note: No Star Wars, it only has Action, Adventure and Fantasy genres.
SELECT CONCAT("[", primary_title, "](https://www.imdb.com/title/", id, ")") AS
primary_title,
start_year,
average_rating,
num_votes
FROM titles
LEFT JOIN titles_genres AS tg1
ON tg1.title_id = titles.id
WHERE tg1.genre_id = "sci-fi"
-- Exclude TV episodes etc.
AND title_type = "movie"
-- Exclude little voted on movies where average_rating is often too high.
-- Higher num_votes = more popular
AND num_votes > 10000
AND start_year >= 1970
AND start_year <= 1979
ORDER BY average_rating DESC,
num_votes DESC;
primary_title | start_year | average_rating | num_votes |
---|---|---|---|
Alien | 1979 | 8.5 | 855270 |
A Clockwork Orange | 1971 | 8.3 | 811076 |
Ivan Vasilyevich Changes His Profession | 1973 | 8.2 | 16458 |
Stalker | 1979 | 8.1 | 131654 |
Solaris | 1972 | 8 | 89827 |
Fantastic Planet | 1973 | 7.7 | 30480 |
Close Encounters of the Third Kind | 1977 | 7.6 | 198472 |
Superman | 1978 | 7.4 | 172371 |
Invasion of the Body Snatchers | 1978 | 7.4 | 59190 |
The Andromeda Strain | 1971 | 7.2 | 36803 |
Sleeper | 1973 | 7.1 | 42854 |
Time After Time | 1979 | 7.1 | 18387 |
Soylent Green | 1973 | 7 | 64000 |
The Boys from Brazil | 1978 | 7 | 27798 |
Westworld | 1973 | 6.9 | 57182 |
The Stepford Wives | 1975 | 6.9 | 17631 |
Mad Max | 1979 | 6.8 | 204229 |
Logan's Run | 1976 | 6.8 | 56057 |
The Brood | 1979 | 6.8 | 29614 |
Slaughterhouse-Five | 1972 | 6.8 | 13000 |
THX 1138 | 1971 | 6.7 | 51240 |
Phantasm | 1979 | 6.6 | 36464 |
Silent Running | 1972 | 6.6 | 28965 |
The Man Who Fell to Earth | 1976 | 6.6 | 26180 |
Rollerball | 1975 | 6.6 | 24752 |
Horror Express | 1972 | 6.5 | 10758 |
Star Trek: The Motion Picture | 1979 | 6.4 | 89032 |
The Omega Man | 1971 | 6.4 | 31325 |
Shivers | 1975 | 6.4 | 20837 |
A Boy and His Dog | 1975 | 6.4 | 17756 |
Escape from the Planet of the Apes | 1971 | 6.3 | 35051 |
Rabid | 1977 | 6.3 | 18535 |
The Fury | 1978 | 6.3 | 15211 |
Moonraker | 1979 | 6.2 | 99575 |
Death Race 2000 | 1975 | 6.2 | 27669 |
Dark Star | 1974 | 6.2 | 24315 |
Conquest of the Planet of the Apes | 1972 | 6.1 | 32523 |
The Crazies | 1973 | 6.1 | 13134 |
Beneath the Planet of the Apes | 1970 | 6 | 46579 |
The Black Hole | 1979 | 5.9 | 25455 |
Piranha | 1978 | 5.9 | 21507 |
Zardoz | 1974 | 5.8 | 22467 |
Futureworld | 1976 | 5.7 | 10713 |
Battle for the Planet of the Apes | 1973 | 5.4 | 30854 |
SELECT CONCAT("[", d_name, "](https://www.imdb.com/name/", name_id, ")") AS name,
ROUND(avg, 2) AS average_rating,
movies
FROM (SELECT name_id,
n.primary_name AS d_name,
SUM(t.average_rating) / COUNT(*) AS avg,
COUNT(*) AS movies
FROM principals
LEFT JOIN names AS n
ON n.id = name_id
LEFT JOIN titles AS t
ON t.id = title_id
WHERE category_id = "director"
AND t.title_type = "movie"
AND t.num_votes > 1000
AND n.death_year IS NULL
GROUP BY name_id
ORDER BY avg ASC) AS t
WHERE movies > 10
ORDER BY avg ASC
LIMIT 10;
name | average_rating | movies |
---|---|---|
Uwe Boll | 3.603 | 29 |
Fred Olen Ray | 3.7 | 11 |
Albert Pyun | 4.146 | 28 |
Bert I. Gordon | 4.158 | 12 |
Timothy Woodward Jr. | 4.191 | 11 |
Jim Wynorski | 4.25 | 14 |
Nico Mastorakis | 4.307 | 15 |
Charles Band | 4.515 | 13 |
Steven C. Miller | 4.636 | 11 |
Gregory Hatanaka | 4.727 | 11 |