Configuration options explained
Here you'll find detailed explanations of each configuration option available in LinkThumbnailer.
Maximum number of HTTP redirections allowed. If LinkThumbnailer cannot resolve the given URL before redirect_limit is reached, it raises a LinkThumbnailer::RedirectLimit exception.
Default is 3.
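The behaviour described above can be sketched in plain Ruby. This is only an illustration of the idea, not the gem's implementation: `RedirectLimitReached`, `resolve`, and the `lookup` callable are hypothetical names, and the exact point at which the gem counts a redirect is an assumption.

```ruby
# Hypothetical error class standing in for LinkThumbnailer::RedirectLimit.
class RedirectLimitReached < StandardError; end

# Follows redirects until a URL no longer redirects.
# `lookup` is a hypothetical callable: given a URL it returns the redirect
# target, or nil when the URL does not redirect any further.
def resolve(url, lookup:, redirect_limit: 3)
  redirects = 0
  while (target = lookup.call(url))
    redirects += 1
    # Assumption: following more than `redirect_limit` redirects is an error.
    raise RedirectLimitReached, "too many redirects for #{url}" if redirects > redirect_limit
    url = target
  end
  url
end

# Usage with an in-memory redirect table instead of real HTTP requests.
chain = { 'http://a.test' => 'http://b.test', 'http://b.test' => 'http://c.test' }
final_url = resolve('http://a.test', lookup: ->(u) { chain[u] })
```

Injecting the lookup keeps the sketch free of network calls; a real client would perform the HTTP request at that point.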
You can set the HTTP user agent used to resolve the given URL.
Default is link_thumbnailer.
You can activate or deactivate SSL verification for every LinkThumbnailer request.
Default is true.
The amount of time in seconds to wait for a connection to be opened. If the HTTP object cannot open a connection in this many seconds, it raises a Net::OpenTimeout exception.
See here for more details.
Default is 5.
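For orientation, here is how a user agent, SSL verification, and an open timeout look on Ruby's standard Net::HTTP. This is a sketch using only the standard library with the default values listed on this page; how LinkThumbnailer wires its options into its HTTP client is not shown here, and `example.com` is a placeholder.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

uri = URI('https://example.com/')

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl      = uri.scheme == 'https'
http.verify_mode  = OpenSSL::SSL::VERIFY_PEER # VERIFY_NONE would skip certificate verification
http.open_timeout = 5                         # seconds to wait before Net::OpenTimeout is raised

request = Net::HTTP::Get.new(uri)
request['User-Agent'] = 'link_thumbnailer'

# Nothing is sent yet: the connection is only opened by http.request(request).
```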
This is a list of blacklisted URL patterns (regular expressions) to skip when LinkThumbnailer fetches the website's images. Use this option to filter out advertising images.
Defaults are well-known URLs:
^http://ad\.doubleclick\.net/
^http://b\.scorecardresearch\.com/
^http://pixel\.quantserve\.com/
^http://s7\.addthis\.com/
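A minimal sketch of how such patterns can filter image URLs. The `reject_blacklisted` helper and the sample URLs are hypothetical and only illustrate regex-based filtering with the default patterns above.

```ruby
# The default patterns listed above, written as Ruby regular expressions.
BLACKLIST = [
  %r{^http://ad\.doubleclick\.net/},
  %r{^http://b\.scorecardresearch\.com/},
  %r{^http://pixel\.quantserve\.com/},
  %r{^http://s7\.addthis\.com/}
].freeze

# Hypothetical helper: drops every URL that matches at least one pattern.
def reject_blacklisted(image_urls, blacklist = BLACKLIST)
  image_urls.reject { |url| blacklist.any? { |pattern| pattern.match?(url) } }
end

urls = [
  'http://ad.doubleclick.net/banner.gif',
  'http://example.com/photo.jpg'
]
kept = reject_blacklisted(urls)
# kept => ["http://example.com/photo.jpg"]
```

Note that the default patterns are anchored to `http://`, so an `https://` URL on the same host would not match them.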
This is a new option introduced in v2 of LinkThumbnailer that lets you explicitly state which HTML attributes you expect to see.
LinkThumbnailer will do its best to find all given attributes in the provided website using the following scrapers (order matters):
- OpenGraph protocol scraper
- Homemade custom scraper
Currently only the following attributes are available:
title
description
images
videos
favicon
See here for more information about each attribute and what it does.
This is a new option introduced in v2 of LinkThumbnailer that lets you customize how LinkThumbnailer selects the best description for a given website.
When fetching all possible description candidates for a given website, LinkThumbnailer computes the likelihood of each description being the best one by running every grader against each description.
See here for more information about graders and how to build your own.
Defaults are:
- Length grader will score description length
- HtmlAttribute grader will score class's html node
- HtmlAttribute grader will score id's html node
- Position grader will score descriptions based on the order they appeared on the page. The first ones are more likely to be reliable descriptions.
- LinkDensity grader will score description link density
Every grader can specify a probability weight, 1 by default. For example, the Position grader has a built-in weight of 3, since we consider the position of the text to be 3 times more important than its length.
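To make the effect of weights concrete, here is a small sketch. Everything in it is an assumption for illustration: the `Grader` and `Candidate` structs, the two scoring lambdas, and the combination rule (multiplying each score raised to its weight) are not taken from the gem's source.

```ruby
# Hypothetical grader: a name, a weight, and a callable returning a score in 0..1.
Grader = Struct.new(:name, :weight, :scorer) do
  def score(candidate)
    scorer.call(candidate)
  end
end

# Hypothetical description candidate: its text and its index on the page.
Candidate = Struct.new(:text, :position)

# Assumed combination rule: multiply every score raised to the grader's weight,
# so a weight of 3 makes a low score count three times as heavily.
def combined_score(candidate, graders)
  graders.reduce(1.0) { |total, grader| total * (grader.score(candidate)**grader.weight) }
end

graders = [
  Grader.new(:length,   1, ->(c) { [c.text.length / 100.0, 1.0].min }),
  Grader.new(:position, 3, ->(c) { 1.0 / (c.position + 1) })
]

early_short = Candidate.new('x' * 50, 0)  # shorter, but first on the page
late_long   = Candidate.new('x' * 100, 1) # longer, but second on the page

best = [early_short, late_long].max_by { |c| combined_score(c, graders) }
```

With these made-up scorers the earlier candidate wins even though it is shorter, because the position score carries the larger weight.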
This is a new option introduced in v2 of LinkThumbnailer that lets you set the minimum length a description must have to be considered a candidate.
Default is 25 characters.
This is a new option introduced in v2 of LinkThumbnailer that lets you customize the words used to score class's html node and id's html node when using the HtmlAttribute grader. These are positive keywords.
Default is /article|body|content|entry|hentry|main|page|pagination|post|text|blog|story/i.
This is a new option introduced in v2 of LinkThumbnailer that lets you customize the words used to score class's html node and id's html node when using the HtmlAttribute grader. These are negative keywords.
Default is /combx|comment|com-|contact|foot|footer|footnote|masthead|media|meta|outbrain|promo|related|scroll|shoutbox|sidebar|sponsor|shopping|tags|tool|widget|modal/i.
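The two default expressions can be tried directly against class or id values. The scoring below (+1 for a positive keyword, -1 for a negative one) is a hypothetical scheme chosen for illustration; the actual values computed by the HtmlAttribute grader may differ.

```ruby
POSITIVE = /article|body|content|entry|hentry|main|page|pagination|post|text|blog|story/i
NEGATIVE = /combx|comment|com-|contact|foot|footer|footnote|masthead|media|meta|outbrain|promo|related|scroll|shoutbox|sidebar|sponsor|shopping|tags|tool|widget|modal/i

# Hypothetical scoring of a class or id attribute value.
def attribute_score(value)
  score = 0
  score += 1 if value.match?(POSITIVE)
  score -= 1 if value.match?(NEGATIVE)
  score
end

attribute_score('article-body')   # => 1
attribute_score('sidebar-widget') # => -1
attribute_score('hero')           # => 0
```

Both expressions match substrings case-insensitively, so a value such as `PostFooter` hits both lists.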
This is a new option introduced in v2 of LinkThumbnailer that lets you set the maximum number of images to fetch for a given website. Since fetching image information has a cost (an HTTP request is performed for each image), you should consider setting a limit here.
Please note that when setting an image_limit, the gem can't guarantee that it returns the "best" image describing the page. If you requested only 5 images and the 6th was the "best" image, it will not be returned. Only fetched images are compared to each other.
Default is 5 images.
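The caveat above can be reproduced with a few lines of Ruby. The `Image` struct, the file names, and the idea that the "best" image is the one with the largest area are assumptions made for this sketch only.

```ruby
# Hypothetical image candidates in page order; area stands in for "quality".
Image = Struct.new(:src, :area)

candidates = [
  Image.new('img1.png', 100),
  Image.new('img2.png', 200),
  Image.new('img3.png', 300),
  Image.new('img4.png', 400),
  Image.new('img5.png', 500),
  Image.new('img6.png', 9000) # the largest image, but sixth on the page
]

image_limit = 5

# Only the first `image_limit` images are fetched and compared to each other.
fetched = candidates.first(image_limit)
best_fetched = fetched.max_by(&:area)
# best_fetched.src => "img5.png", even though img6.png is larger
```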
This is a new option introduced in v2 of LinkThumbnailer that lets you disable image size and type parsing. To retrieve an image's size and type, LinkThumbnailer performs an HTTP request using the image's URL. This can hurt performance when parsing many images.
Set the value to false to improve performance by deactivating image stats retrieval.
Whether LinkThumbnailer should raise an exception when the Content-Type of the HTTP response is not supported by the gem. Since LinkThumbnailer was built to work on HTML pages, passing a URL that points to a PDF file, for example, might return unexpected results.