ReadSharp was previously PocketSharp.Reader and is now hosted without the PocketSharp dependency.
Install ReadSharp using NuGet:

    Install-Package ReadSharp
The library extracts the main content of a website and returns the article as HTML with its associated title, description, favicon and all included images.
The content can be wrapped in a `<body>` tag and displayed as a readable website with custom CSS (it's up to you!).
ReadSharp is based on a custom PCL port of NReadability and SgmlReader, which are included in the solution.
This library is a replacement for the Article View API by Pocket, which is restricted by usage limits and privacy concerns.
With ReadSharp you won't hit any usage limits, as you are extracting the content directly. And it's open source.
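
    using System;
    using ReadSharp;

    Reader reader = new Reader();
    Article article = null; // initialized so it can be used safely after the try block

    try
    {
        // Read is asynchronous, so this must run inside an async method
        article = await reader.Read(new Uri("http://frontendplay.com/story/4/http-caching-demystified-part-2-implementation"));
    }
    catch (ReadException exc)
    {
        // handle exception
    }
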
You can pass `HttpOptions` to the `Reader` constructor, which apply to all requests:
- `HttpMessageHandler CustomHttpHandler`: Use your own HTTP handler
- `int? RequestTimeout`: Define a custom timeout in seconds, after which requests are cancelled
- `string UserAgent`: Override the user agent, which is passed to the destination server
- `string UserAgentMobile`: Override the mobile user agent, which is passed to the destination server
- `bool UseMobileUserAgent`: There are desktop and mobile default user agents. Enabling this property selects the mobile user agent. If you pass a custom user agent, this property is ignored!
- `int MultipageLimit`: Gets or sets the download limit for articles with multiple pages (default: 10)
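
As an illustration of the options above, the following is a minimal sketch of configuring a `Reader`. It assumes that `HttpOptions` can be created with a parameterless constructor and that its properties are settable; the class and method names (`HttpOptionsExample`, `ReadWithHttpOptionsAsync`) and the chosen values are hypothetical and only serve the example.

```csharp
using System;
using System.Threading.Tasks;
using ReadSharp;

public static class HttpOptionsExample
{
    // Hypothetical helper: reads one article with customized HTTP settings.
    public static async Task<Article> ReadWithHttpOptionsAsync(Uri uri)
    {
        // Assumption: HttpOptions exposes a public parameterless constructor
        // and settable properties with the names listed above.
        var httpOptions = new HttpOptions
        {
            RequestTimeout = 30,        // cancel requests after 30 seconds
            UseMobileUserAgent = true,  // use the default mobile user agent
            MultipageLimit = 5          // download at most 5 pages per article
        };

        // The options are passed to the constructor and apply to every request
        // made through this Reader instance.
        var reader = new Reader(httpOptions);
        return await reader.Read(uri);
    }
}
```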
There are also `ReadOptions` available, which are passed with each request:
- `bool HasHeaderTags`: Return complete HTML document or just the body part
- `bool HasNoHeadline`: Removes the `<h1>` title from the article
- `bool UseDeepLinks`: If you check this option, deep-links (containing hashes, e.g. `href="#article"`) are not transformed into absolute URIs
- `bool PrettyPrint`: Determines whether the HTML output will be formatted
- `bool PreferHTMLEncoding`: Determines whether to prefer the encoding found in the HTML or the one found in the HTTP Header (default: true)
- `bool MultipageDownload`: Download all pages for articles with multiple pages (default: false)
- `bool ReplaceImagesWithPlaceholders`: If true, replace all img-tags with placeholders
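
A per-request configuration might look like the sketch below. It assumes that `Read` accepts a `ReadOptions` instance as an optional second argument and that `ReadOptions` has a parameterless constructor with settable properties; `ReadOptionsExample` and `ReadFullArticleAsync` are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;
using ReadSharp;

public static class ReadOptionsExample
{
    // Hypothetical helper: reads a multi-page article as one formatted HTML document.
    public static async Task<Article> ReadFullArticleAsync(Reader reader, Uri uri)
    {
        // Assumption: ReadOptions can be constructed directly and its
        // properties carry the names listed above.
        var readOptions = new ReadOptions
        {
            HasHeaderTags = true,      // return a complete HTML document
            PrettyPrint = true,        // format the HTML output
            MultipageDownload = true   // follow and download subsequent pages
        };

        // Assumption: the options are passed alongside the URI on each call.
        return await reader.Read(uri, readOptions);
    }
}
```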
The `Article` contains the following fields:
- `string Title`: the title of the page
- `string Description`: description of the page, extracted from meta information
- `string Content`: contains the article
- `Uri FrontImage`: main page image extracted from meta tags like apple-touch-icon and others
- `Uri Favicon`: the favicon of the page
- `List<ArticleImage> Images`: contains all images found in the text
- `string NextPage`: contains the next page URI, if available

Each `ArticleImage` contains the following fields:
- `Uri Uri`
- `string Title`: extracted from the title attribute
- `string AlternativeText`: extracted from the alt attribute
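
To show how the `Title` and `Content` fields can be turned into a readable page with custom CSS, here is a small sketch. `ArticlePage.Build` is a hypothetical helper and not part of ReadSharp; it assumes that `Content` holds only the body part of the article (that is, `HasHeaderTags` is not enabled) and that the CSS string is trusted.

```csharp
using System.Net;

public static class ArticlePage
{
    // Hypothetical helper (not part of ReadSharp): wraps extracted article
    // HTML in a minimal document with an inline stylesheet.
    public static string Build(string title, string contentHtml, string css)
    {
        // The title is plain text, so it is HTML-encoded; the content is
        // already HTML and is inserted unchanged.
        return "<!DOCTYPE html>\n" +
               "<html>\n<head>\n<meta charset=\"utf-8\">\n" +
               "<title>" + WebUtility.HtmlEncode(title ?? string.Empty) + "</title>\n" +
               "<style>" + (css ?? string.Empty) + "</style>\n" +
               "</head>\n<body>\n" + (contentHtml ?? string.Empty) + "\n</body>\n</html>";
    }
}
```

With an `Article` at hand, a call such as `ArticlePage.Build(article.Title, article.Content, "body { max-width: 40em; margin: auto; }")` would produce a page that can be shown in a web view.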
ReadSharp is a Portable Class Library, so it is compatible with multiple platforms and Universal Apps:
- .NET >= 4.5 (including WPF)
- Windows Phone (Silverlight + WinPRT) >= 8
- Windows Store >= 8
- Xamarin iOS + Android
- WP7 and Silverlight support was dropped in 6.0; use ReadSharp < 6.0 if you need to support them
forks are included in the primary source code