schochastics/paperwizard

paperwizard

paperwizard is an R package designed to extract readable content (such as news articles) from webpages using Readability.js. This package leverages Node.js to parse webpages and identify the main content of an article, allowing you to work with cleaner, structured content.

The package is intended as an add-on for paperboy.

Installation

You can install the development version of paperwizard like so:

remotes::install_github("schochastics/paperwizard")

Setup

To use paperwizard, you need to have Node.js installed. Download and install Node.js from the official website, which offers instructions for all major operating systems. After installing Node.js, you can confirm the installation by running the following command in your terminal:

node -v

This should return the version of Node.js installed.

If Node.js is not installed in a standard location, tell the package where to find the node command by setting:

options(paperwizard.node_path = "/path/to/node")
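
The check above can also be done from within R. The following is a minimal sketch, not part of paperwizard: `find_node()` is a hypothetical helper, and the lookup order (configured option first, then the PATH) is an assumption about what is convenient, not a description of how the package resolves the path internally.

```r
# Hypothetical helper (not part of paperwizard): locate the node binary.
find_node <- function() {
  # Prefer an explicitly configured path, if one has been set.
  configured <- getOption("paperwizard.node_path", default = "")
  if (nzchar(configured)) {
    return(configured)
  }
  # Otherwise fall back to whatever is on the PATH ("" if nothing is found).
  unname(Sys.which("node"))
}

node <- find_node()
if (nzchar(node) && file.exists(node)) {
  # Equivalent to running `node -v` in a terminal.
  version <- system2(node, "--version", stdout = TRUE)
  message("Found Node.js ", version, " at ", node)
} else {
  message("Node.js not found; install it or set options(paperwizard.node_path = ...)")
}
```

If the second message appears even though Node.js is installed, the binary is probably outside the PATH that R sees, and setting the option explicitly should help.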

Once Node.js is installed, install the required Node.js libraries (linkedom, Readability.js, puppeteer and axios) by running:

pw_npm_install()

Use

You can call pw_deliver() with either a URL

pw_deliver(url)

or a data.frame that was created by paperboy::pb_collect()

x <- paperboy::pb_collect(list_of_urls)
pw_deliver(x)
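
Because parsing depends on Node.js, the installed libraries and the target site, a call may fail at run time. The sketch below wraps `pw_deliver()` so that a failure yields `NULL` instead of stopping a script. `safe_deliver()` is a hypothetical wrapper and the URL is a placeholder; neither comes from the package.

```r
# Placeholder URL, chosen only for illustration.
url <- "https://example.com/some-article"

# Hypothetical wrapper: returns NULL instead of stopping when something fails.
# `x` can be anything pw_deliver() accepts (a URL or a pb_collect() data.frame).
safe_deliver <- function(x) {
  if (!requireNamespace("paperwizard", quietly = TRUE)) {
    message("paperwizard is not installed")
    return(NULL)
  }
  tryCatch(
    paperwizard::pw_deliver(x),
    error = function(e) {
      message("pw_deliver() failed: ", conditionMessage(e))
      NULL
    }
  )
}

article <- safe_deliver(url)

# The structure of the result is not described here, so inspect it first.
if (!is.null(article)) str(article)
```

Returning `NULL` on failure keeps the wrapper usable in loops over many URLs, at the cost of hiding the original error object; the message is printed so the cause is still visible.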

Known sites with issues

License

MIT (see LICENSE.md). The repository also contains a LICENSE file whose license type is not recognized.