Video Demo: Showcase
The data was scraped from Another bleeding Monty Python website for this open-source, non-commercial project (started as a final project for CS50’s Introduction to Programming with Python). All data is the property of the Monty Python group and will be removed upon their request.
Disclaimer: This code is not affiliated with, associated with, authorized by, endorsed by, or officially connected in any way with Python (Monty) Pictures Limited, nor does it claim to be. All material is respectfully copyright to them.
With this API you can get phrases from 3 Monty Python movies: Life of Brian, The Meaning of Life and Monty Python and The Holy Grail.
The functionality allows you to:
- Create a user to save that user's favourite quotes
- Get a list of movies from which quotes are available
- Get a random quote
- Search for quote(s) by quote's text
- Put a quote in your top quotes list by quote's id
- Get your top quotes list
- Get a random scene (all quotes from this scene) [^1]
- Get a whole scene text (all quotes) from a specific movie: [^2]
  - By scene number
  - By scene name
The API can be used by running project.py from the command line, and it also runs on Amazon AWS as a standalone API. The program is divided into two separate parts:
- Swagger server (see below)
- CLI run through project.py
If you'd like to have fun with Monty Python's Flying Circus quotes, I can refer you to Vaiterius Gerard Gandionco's API.
This server was generated by the swagger-codegen project and enhanced to support the holy_grail_api.json swagger specification. This example uses the Connexion library on top of Flask.
Python 3.5.2+
To run the server, please execute the following from the root directory [^3]:
```bash
pip3 install -r requirements.txt
python3 -m swagger_server -db holy_scripts.db
```
and open your browser here:
###### Running with Docker
To run the server on a Docker container, please execute the following from the root directory:
```bash
# building the image
docker build -t swagger_server .
# starting up a container
docker run -p 8080:8080 swagger_server
```
datascrapper.py contains a web scraper built with the Beautiful Soup library.
The scraper can parse either one or more URLs or an HTML file.
usage: datascrapper.py [-h] [-m] [-p P] [-t [T]] [-f [F]]
Parse html file into csv
options:
-h, --help show this help message and exit
-m Multi-link mode - links for scrapping are in the code, -p, -h are not needed
-p P File name of html file to parse
-t [T] 1 - parse html file by the given in -p link, 0 - parse html file by given in -p path
-f [F] File name of csv file to write parsed results to
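The option list above can be read as an argparse declaration. The following is a minimal sketch of how such an interface might be declared, not the actual code of datascrapper.py: the function name `build_parser`, the defaults, and the shortened help strings are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser whose options mirror the usage text (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="datascrapper.py",
        description="Parse html file into csv",
    )
    # -m is a plain on/off switch for the multi-link mode.
    parser.add_argument("-m", action="store_true", help="Multi-link mode")
    # -p takes exactly one value: a path or a link, depending on -t.
    parser.add_argument("-p", help="File name of html file to parse")
    # The usage shows [T] and [F] in brackets, which argparse prints for nargs="?",
    # meaning the value may be omitted. The default/const values are assumptions.
    parser.add_argument("-t", nargs="?", type=int, default=0, const=1,
                        help="1 to treat -p as a link, 0 to treat -p as a path")
    parser.add_argument("-f", nargs="?", default="parsed.csv", const="parsed.csv",
                        help="File name of csv file to write parsed results to")
    return parser


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["-p", "scene3.html", "-t", "0"])
print(args.p, args.t, args.m, args.f)
```

With `nargs="?"`, passing `-t` without a value stores the `const` value, and leaving the option out entirely stores the `default`.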
Because datascrapper is tailored to a specific sequence of HTML tags, it will not work on arbitrary HTML. The expected format is the following:
- For the dialogues
<p><span class="name">DENNIS:</span>
Ah, now we see the violence inherent in the system.
</p>
- For the directions (as in scripts)
<p>
<i>[King Arthur music stops]</i>
</p>
- The movie name is extracted from the first h2 tag
<h2>Monty Python and The Holy Grail</h2>
- The scene name and number (if available) are extracted from the first h1 tag
<h1>Scene 3: Repression is Nine Tenths of the Law?</h1>
Any deviations found were cleaned up, but some anomalies may remain in the data.
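To make the expected tag sequence concrete, here is a minimal sketch of a parser for it. It is not the code from datascrapper.py: it uses the standard library's `html.parser` instead of Beautiful Soup so that it runs without extra dependencies, and the class name `SceneParser` is hypothetical. It also assumes that a paragraph without a `name` span is a direction and that the surrounding square brackets are dropped.

```python
import re
from html.parser import HTMLParser


class SceneParser(HTMLParser):
    """Collect movie, scene and paragraph rows from the tag layout described above."""

    def __init__(self):
        super().__init__()
        self.movie = None
        self.scene_number = None
        self.scene_name = None
        self.rows = []
        self._open = None        # which of h1 / h2 / p we are currently inside
        self._in_name = False    # inside <span class="name">
        self._character = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "p"):
            self._open, self._buffer, self._character = tag, [], None
        elif self._open == "p" and tag == "span" and ("class", "name") in attrs:
            self._in_name = True

    def handle_data(self, data):
        if self._in_name:
            self._character = data.strip().rstrip(":")
        elif self._open is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_name = False
            return
        if tag != self._open:
            return  # e.g. </i> inside a paragraph
        text = " ".join("".join(self._buffer).split())
        if tag == "h2" and self.movie is None:
            self.movie = text
        elif tag == "h1" and self.scene_name is None:
            match = re.match(r"Scene (\d+):\s*(.*)", text)
            if match:
                self.scene_number, self.scene_name = int(match.group(1)), match.group(2)
            else:
                self.scene_name = text
        elif tag == "p" and text:
            kind = "dialogue" if self._character else "direction"
            if kind == "direction":
                text = text.strip("[]")
            self.rows.append({
                "movie": self.movie, "scene_number": self.scene_number,
                "scene_name": self.scene_name, "type": kind,
                "character": self._character, "text": text,
            })
        self._open = None


SAMPLE = """<h2>Monty Python and The Holy Grail</h2>
<h1>Scene 3: Repression is Nine Tenths of the Law?</h1>
<p><span class="name">DENNIS:</span>
Ah, now we see the violence inherent in the system.
</p>
<p>
<i>[King Arthur music stops]</i>
</p>"""

scene = SceneParser()
scene.feed(SAMPLE)
for row in scene.rows:
    print(row["type"], row["character"], row["text"])
```

Each collected row carries the same six fields as the CSV output (movie, scene_number, scene_name, type, character, text).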
Scraped data is saved to a CSV file in the following format:
movie | scene_number | scene_name | type | character | text |
---|---|---|---|---|---|
Life of Brian | 1 | The Relationship of Men and Sheep | direction | NULL | holy music |
Monty Python and The Holy Grail | 8 | Why No One Likes The French | dialogue | FRENCH GUARD | Of course not! You are English types-a! |
You can create the SQLite tables using the statements from dbcreator.sql. Afterwards, you can use dataloader.py to load your scraped data from the CSV file into the database. Each new load deletes all previous entries, so be mindful of the autoincremented indexes if you plan to use them.
A separate database copy, ./server/swagger_server/persistance/holy_scripts.db, is used for tests.
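The note about autoincremented indexes can be illustrated with a small sketch. It is not the code of dataloader.py: the single `quotes` table, the comma delimiter, the header row, and the function name `load_csv` are assumptions (the real schema lives in dbcreator.sql). It shows a delete-then-insert load and what happens to ids when a column is declared with SQLite's `AUTOINCREMENT`.

```python
import csv
import io
import sqlite3

# Hypothetical single-table schema; the real statements are in dbcreator.sql.
SCHEMA = """
CREATE TABLE IF NOT EXISTS quotes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    movie TEXT, scene_number INTEGER, scene_name TEXT,
    type TEXT, character TEXT, text TEXT
)
"""
COLUMNS = ("movie", "scene_number", "scene_name", "type", "character", "text")


def load_csv(conn, csv_file):
    """Replace the table contents with the rows read from csv_file."""
    rows = [
        # The literal string NULL in the CSV becomes a real SQL NULL.
        tuple(None if row[col] == "NULL" else row[col] for col in COLUMNS)
        for row in csv.DictReader(csv_file)
    ]
    with conn:  # one transaction: the delete and the inserts commit together
        conn.execute("DELETE FROM quotes")
        conn.executemany(
            "INSERT INTO quotes (movie, scene_number, scene_name, type, character, text) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)


CSV_TEXT = (
    "movie,scene_number,scene_name,type,character,text\n"
    "Life of Brian,1,The Relationship of Men and Sheep,direction,NULL,holy music\n"
    "Monty Python and The Holy Grail,8,Why No One Likes The French,dialogue,"
    "FRENCH GUARD,Of course not! You are English types-a!\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

load_csv(conn, io.StringIO(CSV_TEXT))
first_ids = [r[0] for r in conn.execute("SELECT id FROM quotes ORDER BY id")]

load_csv(conn, io.StringIO(CSV_TEXT))  # second load wipes the first one
second_ids = [r[0] for r in conn.execute("SELECT id FROM quotes ORDER BY id")]

print(first_ids, second_ids)  # [1, 2] [3, 4]
```

Under this assumed schema the ids keep growing across loads, so an id saved before a reload (for example in a top quotes list) no longer points at a row afterwards. With a plain `INTEGER PRIMARY KEY` and no `AUTOINCREMENT`, SQLite would start again from 1 once the table is emptied.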
The pytest tests do not provide 100% coverage and are intended as an example of both unit and integration tests. They are located at ./cli/test_project.py (they test project.py only).
The BDD tests (behave) are located at ./test/behave. They can be run with:
behave feature/<name_of_feature_file>.feature
- Feature files are in ./test/behave/feature
- Step implementations are located at ./test/steps
- ./test/enviroment.py implements the actions run in before_all and sets up the testing environment
[^1] Functionality postponed to version 2
[^2] Functionality postponed to version 2
[^3] holy_scripts_test.db is used for tests (pytest and BDD with behave)