With this library you can extract metadata from web pages and build site previews. The library uses four sources of information:
- oEmbed
- Open Graph
- Twitter Cards
- HTML meta tags
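To show roughly what the Open Graph, Twitter Cards and plain meta-tag sources look like in a page, here is a sketch that groups `<meta>` tags by source using Python's standard-library `html.parser`. It is not aiounfurl's implementation, which relies on beautifulsoup4 and html5lib. The class `MetaTagCollector`, the function `collect_meta` and the grouping keys are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Hypothetical helper (not aiounfurl's API): groups <meta> tags by source."""

    def __init__(self):
        super().__init__()
        self.data = {"open_graph": {}, "twitter": {}, "meta": {}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Open Graph uses property="og:..."; Twitter Cards and plain meta tags usually use name="..."
        key = attrs.get("property") or attrs.get("name")
        content = attrs.get("content")
        if not key or content is None:
            return  # e.g. <meta charset="utf-8"> carries no name/content pair
        if key.startswith("og:"):
            self.data["open_graph"][key[len("og:"):]] = content
        elif key.startswith("twitter:"):
            self.data["twitter"][key[len("twitter:"):]] = content
        else:
            self.data["meta"][key] = content


def collect_meta(html):
    """Parse an HTML string and return its meta tags grouped by source."""
    parser = MetaTagCollector()
    parser.feed(html)
    parser.close()
    return parser.data


page = """<html><head>
<meta charset="utf-8">
<meta property="og:title" content="Hello">
<meta name="twitter:card" content="summary">
<meta name="description" content="A test page">
</head></html>"""
print(collect_meta(page))
```

oEmbed, the fourth source, works differently: instead of reading tags from the page, the client queries a provider endpoint, which is why the Docker section below can take a providers file.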
Requirements:
- Python 3.5
- aiohttp
- beautifulsoup4
- html5lib

Installation:
pip install aiounfurl
To extract all site data:
```python
import asyncio
import aiohttp
from pprint import pprint
from aiounfurl.views import get_preview_data, fetch_all


async def get_links_data(links, loop):
    async with aiohttp.ClientSession() as session:
        # One fetch_all() coroutine per link, run concurrently
        tasks = [fetch_all(session, link, loop) for link in links]
        # return_exceptions=True: a failed link yields its exception
        # instead of aborting the whole batch
        results = await asyncio.gather(*tasks, return_exceptions=True)
    return [{'link': link, 'data': data} for link, data in zip(links, results)]


links = [
    'https://habrahabr.ru/post/314606/',
    'https://www.youtube.com/watch?v=9EftQMnuhvU',
    'https://medium.freecodecamp.com/million-requests-per-second-with-python-95c137af319'
]
loop = asyncio.get_event_loop()
result = loop.run_until_complete(get_links_data(links, loop))
loop.close()
pprint(result)
```
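Because the example calls `asyncio.gather` with `return_exceptions=True`, a link that fails to load appears in the output with an exception object as its `data` instead of extracted metadata. The sketch below separates the two cases; `split_results` is a hypothetical helper, not part of aiounfurl, and the sample input only mimics the `{'link': ..., 'data': ...}` shape built by `get_links_data` in the example above.

```python
def split_results(results):
    """Hypothetical helper: separate successful fetches from failed ones.

    `results` is a list of {'link': ..., 'data': ...} dicts. With
    return_exceptions=True a failed link's 'data' is the exception object.
    """
    ok, failed = [], []
    for item in results:
        # BaseException also covers asyncio.CancelledError on Python 3.8+
        if isinstance(item["data"], BaseException):
            failed.append((item["link"], item["data"]))
        else:
            ok.append(item)
    return ok, failed


sample = [
    {"link": "https://example.com/a", "data": {"title": "A"}},
    {"link": "https://example.com/b", "data": TimeoutError("timed out")},
]
ok, failed = split_results(sample)
print(len(ok), len(failed))  # 1 1
```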
A full example can be found here.
To run it, install the required packages:
pip install -r example/requirements.txt
Start the example server:
python srv.py runserver
Then open http://127.0.0.1:8080/ in a browser.
I have published a Docker image with the example on Docker Hub (http://hub.docker.com/), so the sample can also run as a standalone service.
Running in the background:
docker run --name aiounfurl -p 8080:8080 -d tigorc/aiounfurl
Then open the example at http://127.0.0.1:8080/.
To use a list of oEmbed providers, first create a JSON file with the providers (here /path_to_file/providers.json), then mount it into the container and point OEMBED_PROVIDERS_FILE at it:
docker run --name aiounfurl -p 8080:8080 -e "OEMBED_PROVIDERS_FILE=/srv/app/providers.json" -v /path_to_file/providers.json:/srv/app/providers.json -d tigorc/aiounfurl
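This README does not spell out the schema of providers.json. Assuming it follows the providers.json layout published at oembed.com (a list of providers, each with `endpoints` that carry `schemes` using `*` wildcards and an endpoint `url`), the sketch below writes such a file and resolves a page URL to the oEmbed request URL for its provider. The helper names and the single YouTube entry are illustrative assumptions; check the aiounfurl source for the format it actually reads.

```python
import json
from fnmatch import fnmatchcase
from urllib.parse import urlencode

# Hypothetical one-provider list in the oembed.com providers.json layout (assumed format).
PROVIDERS = [
    {
        "provider_name": "YouTube",
        "provider_url": "https://www.youtube.com/",
        "endpoints": [
            {
                "schemes": ["https://*.youtube.com/watch*", "https://youtu.be/*"],
                "url": "https://www.youtube.com/oembed",
            }
        ],
    }
]


def write_providers(path, providers=PROVIDERS):
    """Write the provider list to disk, e.g. as /path_to_file/providers.json."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(providers, f, indent=2)


def oembed_request_url(page_url, providers):
    """Return the oEmbed request URL for page_url, or None if no scheme matches."""
    for provider in providers:
        for endpoint in provider.get("endpoints", []):
            for scheme in endpoint.get("schemes", []):
                # oEmbed URL schemes use * as a wildcard; fnmatchcase is case-sensitive on every OS
                if fnmatchcase(page_url, scheme):
                    query = urlencode({"url": page_url, "format": "json"})
                    return endpoint["url"] + "?" + query
    return None


print(oembed_request_url("https://www.youtube.com/watch?v=9EftQMnuhvU", PROVIDERS))
```

Matching with shell-style wildcards is a simplification: `*` here also matches `/`, so a real matcher may want stricter rules.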
To run the tests, install the tox package and run:
tox