web-scraping-with-R

Examples of web scraping and text analysis with R

Installation

Installing all required packages

First, run the following lines in the R console to install the required packages:

for (pkg in c("rvest", "httr", "dplyr", "stringr", "XML", "RCurl", "ggplot2", "reshape", "tm", "ggmap")) {
  # Install each package only if it is not already installed
  if (!pkg %in% rownames(installed.packages())) {
    install.packages(pkg)
  }
}

Checking out this repository

Then, in RStudio:

Select File -> New Project... -> Version Control -> Git.
Paste https://github.com/ubcecon/web-scraping-with-R (the URL for this repo) into the Repository URL field and click Create Project.

Alternatively, clone this repo using your favourite Git client.

Once the repo is checked out, try replicating the following examples yourself by opening the corresponding .Rmd files:

Tutorials

EPS trend difference by industry from Yahoo Finance (`yahoo-finance.rmd`) by @jasminehao

Basic principles of web scraping using URL patterns and HTML parsers.
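As a rough illustration of the two ideas named here, the sketch below builds page URLs from a pattern and parses a table out of HTML with rvest. It is not taken from `yahoo-finance.rmd`: the `example.com` URL pattern, the `#eps` table id, and the numbers are made-up placeholders, and the HTML is inlined so the code runs without a network connection.

```r
library(rvest)

# Hypothetical URL pattern: one page per ticker symbol (example.com is a placeholder)
tickers <- c("AAPL", "MSFT")
urls <- sprintf("https://example.com/quote/%s/analysis", tickers)

# Inline HTML standing in for a downloaded page (assumed structure, made-up values).
# With a real site you would call read_html(urls[1]) instead.
page <- read_html('<html><body><table id="eps">
  <tr><th>Period</th><th>EPS</th></tr>
  <tr><td>Current Qtr</td><td>1.25</td></tr>
  <tr><td>Next Qtr</td><td>1.40</td></tr>
</table></body></html>')

# Select the table by its CSS id and convert it to a data frame
eps <- page %>%
  html_element("#eps") %>%
  html_table()

print(urls)
print(eps)
```

On a real site, the selector would come from inspecting the page source, and the URL pattern from comparing the addresses of a few pages.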

Real-time data mining from Yahoo Finance (`yahoo-realtime.rmd`) by @jasminehao

Web scraping for data that change in real time.
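One common way to collect data that changes over time is to poll the source at a fixed interval and timestamp each observation. The sketch below shows that pattern only; it is not the method used in `yahoo-realtime.rmd`. The `poll()` and `fake_price()` functions are hypothetical helpers, and `fake_price()` returns random numbers in place of a real scraper.

```r
# Call fetch() n times, pausing `interval` seconds between calls,
# and record each result together with the time it was observed.
poll <- function(fetch, n = 3, interval = 1) {
  rows <- vector("list", n)
  for (i in seq_len(n)) {
    rows[[i]] <- data.frame(time = Sys.time(), value = fetch())
    if (i < n) Sys.sleep(interval)  # pause so the server is not hit continuously
  }
  do.call(rbind, rows)
}

# Stand-in for a real scraper (hypothetical): in practice this function would
# download the page and extract the current value, for example with rvest.
fake_price <- function() round(runif(1, 100, 101), 2)

quotes <- poll(fake_price, n = 3, interval = 0.1)
print(quotes)
```

Passing the scraper in as a function keeps the polling logic separate from the page-specific parsing, so the same loop can be reused for different sources.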

Economic literature analysis from AER abstracts (`aer-articles.rmd`) by @chiyahn

HTML/CSS analysis using SelectorGadget and developer tools for rvest, plus basic principles of text analysis with word clouds.
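To make the workflow concrete, here is a minimal sketch of extracting text with a CSS selector (the kind of selector SelectorGadget helps you find) and counting word frequencies, which is the input a word cloud needs. The HTML, the `p.abstract` selector, and the tiny stop-word list are assumptions for illustration; the counting uses base R rather than tm to keep the example short.

```r
library(rvest)

# Inline HTML standing in for an article listing; the class names are hypothetical
page <- read_html('<html><body>
  <div class="article"><p class="abstract">Trade costs shape trade flows.</p></div>
  <div class="article"><p class="abstract">Trade policy and wages.</p></div>
</body></html>')

# Extract the text of every element matching the CSS selector
abstracts <- page %>%
  html_elements("p.abstract") %>%
  html_text2()

# Basic text cleaning: lower-case, strip punctuation, split on whitespace
words <- unlist(strsplit(tolower(gsub("[[:punct:]]", "", abstracts)), "\\s+"))

# Drop a few common words (illustrative list, far from complete)
stop_words <- c("and", "the", "of")
words <- words[!words %in% stop_words]

# Word frequencies, most frequent first
freq <- sort(table(words), decreasing = TRUE)
print(freq)
```

A frequency table like `freq` can then be handed to a plotting package to draw the word cloud.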

Cross-industry firm location differences from SEC website (`sec-location.rmd`) by @chiyahn and @jasminehao

Web scraping from query-based webpages and geocoding.
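Query-based pages take their inputs as parameters in the URL, so scraping them usually starts with building the right query string. The sketch below does this with `httr::modify_url()`. The host `example.com` and the parameter names `action`, `sic`, and `count` are placeholders, not the actual SEC interface used in `sec-location.rmd`, and geocoding is left out because it needs an external service.

```r
library(httr)

# Build a search URL for one industry code (hypothetical endpoint and parameters)
build_search_url <- function(sic, count = "40") {
  modify_url(
    "https://example.com/search",
    query = list(action = "getcompany", sic = sic, count = count)
  )
}

# One URL per industry code; each could then be passed to rvest::read_html()
sic_codes <- c("7372", "2834")
search_urls <- vapply(sic_codes, build_search_url, character(1))
print(unname(search_urls))

# parse_url() recovers the query parameters, which is handy for checking the result
parsed <- parse_url(search_urls[1])
print(parsed$query)
```

Letting `modify_url()` assemble the query string avoids manual pasting and takes care of URL-encoding the values.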

Resources

Relevant R packages and developer tools:

rvest (HTML parsing): https://github.com/hadley/rvest
tm (text mining and analysis): http://tm.r-forge.r-project.org/
SelectorGadget (HTML/CSS analysis): https://selectorgadget.com/

Useful R packages for data cleaning:

PSID: https://github.com/floswald/psidR
The World Wealth and Income Database: https://github.com/WIDworld/wid-r-tool
The Survey of Professional Forecasters: https://github.com/joergrieger/Survey

