Updated README
boxuancui committed Oct 13, 2018
1 parent 4d1dc93 commit 9c8e2a4
Showing 3 changed files with 79 additions and 61 deletions.
3 changes: 2 additions & 1 deletion .travis.yml
@@ -1,8 +1,9 @@
# R for travis: see documentation at https://docs.travis-ci.com/user/languages/r

language: R
sudo: false
cache: packages
sudo: false
warnings_are_errors: true

r:
- oldrel
130 changes: 73 additions & 57 deletions README.md
@@ -1,15 +1,17 @@
# DataExplorer [![CRAN Version](http://www.r-pkg.org/badges/version/DataExplorer)](https://cran.r-project.org/package=DataExplorer)
<!--
[![CRAN Downloads](http://cranlogs.r-pkg.org/badges/DataExplorer)](https://cran.r-project.org/package=DataExplorer)
[![CRAN Total Downloads](http://cranlogs.r-pkg.org/badges/grand-total/DataExplorer)](https://cran.r-project.org/package=DataExplorer)
-->
# DataExplorer

[![CRAN Version](http://www.r-pkg.org/badges/version/DataExplorer)](https://cran.r-project.org/package=DataExplorer)
[![Downloads](http://cranlogs.r-pkg.org/badges/DataExplorer)](https://cran.r-project.org/package=DataExplorer)
[![Total Downloads](http://cranlogs.r-pkg.org/badges/grand-total/DataExplorer)](https://cran.r-project.org/package=DataExplorer)

###### master v0.6.1

[![Travis Build Status](https://travis-ci.org/boxuancui/DataExplorer.svg?branch=master)](https://travis-ci.org/boxuancui/DataExplorer/branches)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/boxuancui/DataExplorer?branch=master&svg=true)](https://ci.appveyor.com/project/boxuancui/DataExplorer)
[![codecov](https://codecov.io/gh/boxuancui/DataExplorer/branch/master/graph/badge.svg)](https://codecov.io/gh/boxuancui/DataExplorer/branch/master)

###### develop v0.6.1.9000

[![Travis Build Status](https://travis-ci.org/boxuancui/DataExplorer.svg?branch=develop)](https://travis-ci.org/boxuancui/DataExplorer/branches)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/boxuancui/DataExplorer?branch=develop&svg=true)](https://ci.appveyor.com/project/boxuancui/DataExplorer)
[![codecov](https://codecov.io/gh/boxuancui/DataExplorer/branch/develop/graph/badge.svg)](https://codecov.io/gh/boxuancui/DataExplorer/branch/develop)
@@ -22,83 +24,97 @@
## Installation
The package can be installed directly from CRAN.

install.packages("DataExplorer")
```R
install.packages("DataExplorer")
```

However, the latest stable version (if any) can be found on [GitHub](https://github.com/boxuancui/DataExplorer) and installed using the `remotes` package.

if (!require(remotes)) install.packages("remotes")
remotes::install_github("boxuancui/DataExplorer")
```R
if (!require(remotes)) install.packages("remotes")
remotes::install_github("boxuancui/DataExplorer")
```

If you would like to install the latest [development version](https://github.com/boxuancui/DataExplorer/tree/develop), you may install the `develop` branch.

if (!require(remotes)) install.packages("remotes")
remotes::install_github("boxuancui/DataExplorer", ref = "develop")
```R
if (!require(remotes)) install.packages("remotes")
remotes::install_github("boxuancui/DataExplorer", ref = "develop")
```
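
After either installation route, it can help to confirm which build is actually installed. The following is a minimal sketch using only base R; the helper `is_dev_version` is hypothetical (not part of DataExplorer) and assumes the convention, seen in the badges above, that development builds carry a fourth version component such as `.9000`.

```R
## Hypothetical helper: TRUE when a version has more than three components,
## e.g. 0.6.1.9000 (assumed development-build convention)
is_dev_version <- function(v) {
  length(unclass(v)[[1]]) > 3
}

## Inspect the installed version without attaching the package
if (requireNamespace("DataExplorer", quietly = TRUE)) {
  installed_version <- utils::packageVersion("DataExplorer")
  print(installed_version)
  print(is_dev_version(installed_version))
} else {
  message("DataExplorer is not installed")
}
```
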

## Examples
The package is extremely easy to use. Almost everything can be done in one line of code. Please refer to the package manuals for more information. You may also find the package vignettes [here](https://CRAN.R-project.org/package=DataExplorer/vignettes/dataexplorer-intro.html).

#### Report
To get a report for the [airquality](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/airquality.html) dataset:

library(DataExplorer)
create_report(airquality)
```R
library(DataExplorer)
create_report(airquality)
```

To get a report for the [diamonds](http://docs.ggplot2.org/0.9.3.1/diamonds.html) dataset with response variable **price**:

library(DataExplorer)
library(ggplot2)
create_report(diamonds, y = "price")
```R
library(DataExplorer)
library(ggplot2)
create_report(diamonds, y = "price")
```

#### Visualization
You may also run all the plotting functions individually for your analysis, e.g.,

library(DataExplorer)
library(ggplot2)
## View missing value distribution for airquality data
plot_missing(airquality)

## View distribution of all discrete variables
plot_bar(diamonds)

## View `price` distribution of all discrete variables
plot_bar(diamonds, with = "price")

## View distribution of all continuous variables
plot_histogram(diamonds)

## View overall correlation heatmap
plot_correlation(diamonds)

## View bivariate continuous distribution based on `price`
plot_boxplot(diamonds, by = "price")
## Scatterplot `price` with all other features
plot_scatterplot(diamonds, by = "price")
```R
library(DataExplorer)
library(ggplot2)

## View missing value distribution for airquality data
plot_missing(airquality)

## View distribution of all discrete variables
plot_bar(diamonds)

## View `price` distribution of all discrete variables
plot_bar(diamonds, with = "price")

## View distribution of all continuous variables
plot_histogram(diamonds)

## View overall correlation heatmap
plot_correlation(diamonds)

## View bivariate continuous distribution based on `price`
plot_boxplot(diamonds, by = "price")

## Visualize principal component analysis
plot_prcomp(iris)
## Scatterplot `price` with all other features
plot_scatterplot(diamonds, by = "price")

## Visualize principal component analysis
plot_prcomp(iris)
```

#### Feature Engineering
To make quick updates to your data:

library(DataExplorer)
library(ggplot2)

## Group bottom 20% `clarity` by frequency
group_category(diamonds, feature = "clarity", threshold = 0.2, update = TRUE)

## Group bottom 20% `clarity` by `price`
group_category(diamonds, feature = "clarity", threshold = 0.2, measure = "price", update = TRUE)

## Set values for missing observations
df <- data.frame("a" = rnorm(260), "b" = rep(letters, 10))
df[sample.int(260, 50), ] <- NA
set_missing(df, list(0L, "unknown"))

## Drop columns
drop_columns(diamonds, 8:10)
drop_columns(diamonds, "clarity")
```R
library(DataExplorer)
library(ggplot2)

## Group bottom 20% `clarity` by frequency
group_category(diamonds, feature = "clarity", threshold = 0.2, update = TRUE)

## Group bottom 20% `clarity` by `price`
group_category(diamonds, feature = "clarity", threshold = 0.2, measure = "price", update = TRUE)

## Set values for missing observations
df <- data.frame("a" = rnorm(260), "b" = rep(letters, 10))
df[sample.int(260, 50), ] <- NA
set_missing(df, list(0L, "unknown"))

## Drop columns
drop_columns(diamonds, 8:10)
drop_columns(diamonds, "clarity")
```
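
To check the effect of `set_missing`, one option is to compare missing-value counts before and after the call. This is a sketch under two assumptions that are worth confirming against the package manual for your version: that `set_missing` returns the updated object when given a plain `data.frame`, and that the discrete column is kept as character (`stringsAsFactors = FALSE` is added here for that reason and is not in the example above). The names `na_before`, `na_after` and `df_filled` are illustrative only.

```R
## Rebuild the example data with a fixed seed so the result is reproducible
set.seed(1)
df <- data.frame(a = rnorm(260), b = rep(letters, 10), stringsAsFactors = FALSE)
df[sample.int(260, 50), ] <- NA

## Count missing values per column before the update (base R only)
na_before <- colSums(is.na(df))

if (requireNamespace("DataExplorer", quietly = TRUE)) {
  ## Assumption: for a data.frame input the updated data is returned, so assign it
  df_filled <- DataExplorer::set_missing(df, list(0L, "unknown"))
  na_after <- colSums(is.na(df_filled))
  print(rbind(before = na_before, after = na_after))
}
```
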

## Articles

7 changes: 4 additions & 3 deletions appveyor.yml
@@ -10,11 +10,12 @@ init:
install:
ps: Bootstrap

cache:
- C:\RLibrary

# Adapt as necessary starting from here

environment:
global:
WARNINGS_ARE_ERRORS: 1

build_script:
- travis-tool.sh install_deps
