paperwizard

paperwizard is an R package designed to extract readable content (such as news articles) from webpages using Readability.js. This package leverages Node.js to parse webpages and identify the main content of an article, allowing you to work with cleaner, structured content.

The package is intended as an add-on for paperboy.

Installation

You can install the development version of paperwizard like so:

remotes::install_github("schochastics/paperwizard")

Setup

To use paperwizard, you need to have Node.js installed. Download and install Node.js from the official website, which offers instructions for all major operating systems. After installing Node.js, you can confirm the installation by running the following command in your terminal:

node -v

This should return the version of Node.js installed.
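The same check can be run from within R. The following is a minimal sketch using only base R; it is not part of paperwizard, and the variable name node_path is just illustrative. It looks for node on the PATH and, if found, prints the version it reports.

```r
# Look for the node executable on the PATH.
# Sys.which() returns an empty string when nothing is found.
node_path <- unname(Sys.which("node"))

if (nzchar(node_path)) {
  # Equivalent to running `node -v` in a terminal.
  node_version <- system2(node_path, "--version", stdout = TRUE)
  message("Found Node.js ", node_version, " at ", node_path)
} else {
  # In this case the location has to be supplied by hand,
  # e.g. via the paperwizard.node_path option.
  message("node was not found on the PATH")
}
```

If the lookup fails even though Node.js is installed, R is probably running with a different PATH than your terminal, and the full path to node has to be supplied by hand.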

If Node.js is not installed in a standard location, tell the package where to find the node command by setting

options(paperwizard.node_path = "/path/to/node")

Once Node.js is installed, install the required Node.js libraries (linkedom, Readability.js, puppeteer and axios) by running

pw_npm_install()

Use

You can call pw_deliver() either with a URL

pw_deliver(url)

or a data.frame that was created by paperboy::pb_collect()

x <- paperboy::pb_collect(list_of_urls)
pw_deliver(x)
