This repository has been archived by the owner on Oct 17, 2020. It is now read-only.
enigma
=======
```{r, eval=TRUE, echo=FALSE}
knitr::opts_chunk$set(fig.width=8, fig.pos="h", fig.path="inst/assets/figure/")
```
[![Build Status](https://api.travis-ci.org/rOpenGov/enigma.png)](https://travis-ci.org/rOpenGov/enigma)
**An R client for [Enigma.io](https://app.enigma.io/)**
Enigma holds government data and provides a set of APIs for data, metadata, and statistics on each dataset. You can request the dataset itself, metadata describing the dataset, and summary statistics on each of its columns.
## enigma info
+ [enigma home page](https://app.enigma.io/)
+ [API docs](https://app.enigma.io/api)
## LICENSE
MIT, see [LICENSE file](https://github.com/rOpenGov/enigma/blob/master/LICENSE) and [MIT text](http://opensource.org/licenses/MIT)
## Quick start
### Install
```{r eval=FALSE}
install.packages("devtools")
library("devtools")
install_github("ropengov/enigma")
```
```{r}
library("enigma")
```
### Get data
```{r}
out <- enigma_data(dataset='us.gov.whitehouse.visitor-list', select=c('namelast','visitee_namelast','last_updatedby'))
```
Some metadata on the results
```{r}
out$info
```
Look at the data (first 6 rows only, for brevity)
```{r}
head(out$result)
```
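Once the rows are in R, ordinary base R tools apply. The sketch below assumes `out$result` is a data frame with the selected columns, as the `head()` call above suggests. It uses a small hypothetical stand-in (`visits`, filled with made-up placeholder values) so it runs without an API call, and it counts visits per visitee.

```r
# `visits` is a hypothetical stand-in for out$result; the values are placeholders,
# not real records from the White House visitor list
visits <- data.frame(
  namelast = c("name_1", "name_2", "name_3", "name_4"),
  visitee_namelast = c("visitee_a", "visitee_b", "visitee_a", "visitee_a"),
  last_updatedby = c("user_x", "user_y", "user_x", "user_z"),
  stringsAsFactors = FALSE
)

# Count rows per visitee, most-visited first
visits_per_visitee <- sort(table(visits$visitee_namelast), decreasing = TRUE)
visits_per_visitee
```

With real results, the same call would be `sort(table(out$result$visitee_namelast), decreasing = TRUE)`.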
### Statistics on dataset columns
```{r}
out <- enigma_stats(dataset='us.gov.whitehouse.visitor-list', select='total_people')
```
Some summary stats
```{r}
out$result[c('sum','avg','stddev','variance','min','max')]
```
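The following is a hypothetical local equivalent, computed in base R for a numeric vector, which shows concretely what each of these fields means. One assumption should be flagged: R's `sd()` and `var()` use the sample (n - 1) denominator, and it is not stated here whether Enigma's `stddev` and `variance` do the same. Small differences from the API's numbers are therefore possible.

```r
# Hypothetical helper mirroring the summary fields returned by enigma_stats()
column_stats <- function(x) {
  x <- x[!is.na(x)]      # drop missing values (assumption: the API ignores nulls)
  list(
    sum      = sum(x),
    avg      = mean(x),
    stddev   = sd(x),    # sample standard deviation (n - 1 denominator)
    variance = var(x),   # sample variance, equal to stddev^2
    min      = min(x),
    max      = max(x)
  )
}

# Toy data: sum 40, avg 5, sample variance 32/7
column_stats(c(2, 4, 4, 4, 5, 5, 7, 9))
```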
Frequency details
```{r}
head(out$result$frequency)
```
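`frequency` appears to pair each distinct value of the column with a count. A rough local equivalent for a vector you already have in R is sketched below. `frequency_table()` is a hypothetical helper, and sorting most-frequent-first is a choice made here rather than something confirmed about the API. The values come out as character because `table()` names are always character.

```r
# Hypothetical helper: build a (value, count) frequency table from a vector
frequency_table <- function(x, column = "value") {
  counts <- table(x, useNA = "no")
  out <- data.frame(value = names(counts), count = as.vector(counts),
                    stringsAsFactors = FALSE)
  names(out)[1] <- column                         # name the value column after the source column
  out <- out[order(-out$count), , drop = FALSE]   # most frequent values first
  rownames(out) <- NULL
  out
}

# Toy data, not real visitor counts
frequency_table(c(1, 2, 2, 3, 2, 1), column = "total_people")
```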
### Metadata on datasets
```{r}
out <- enigma_metadata(dataset='us.gov.whitehouse')
```
Paths
```{r}
out$info$paths
```
Immediate nodes
```{r}
out$info$immediate_nodes
```
Child tables
```{r}
out$info$children_tables[[1]]
```
### Use case: Plot frequency of flight distances
First, get columns for the air carrier dataset
```{r}
dset <- 'us.gov.dot.rita.trans-stats.air-carrier-statistics.t100d-market-all-carrier'
head(enigma_metadata(dset)$columns$table[,c(1:4)])
```
Looks like there's a column called _distance_ that we can get statistics on. By default, only `frequency` is returned for `varchar` columns, so that is what we get back for this column.
```{r}
out <- enigma_stats(dset, select='distance')
head(out$result$frequency)
```
Then we can do a bit of tidying and make a plot
```{r warning=FALSE, message=FALSE, tidy=FALSE}
library("ggplot2")
df <- out$result$frequency
# Convert via character so the numbers parse correctly even if the columns are factors
df <- data.frame(distance = as.numeric(as.character(df$distance)),
                 count = as.numeric(as.character(df$count)))
ggplot(df, aes(distance, count)) +
  geom_bar(stat = "identity") +
  geom_point() +
  theme_grey(base_size = 18) +
  labs(y = "flights", x = "distance (miles)")
```
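Only `frequency` comes back for this `varchar` column, so summaries such as the mean distance have to be computed locally. The sketch below does this from value/count pairs with a hypothetical helper, `freq_summary()`, using count-weighted sums. The result is exact only if the frequency table covers every distinct value, and that assumption is not verified here. With the tidied `df` from the chunk above, you would call `freq_summary(df$distance, df$count)`.

```r
# Hypothetical helper: summarise a numeric column from (value, count) pairs
freq_summary <- function(value, count) {
  stopifnot(length(value) == length(count), all(count >= 0))
  keep <- count > 0               # values with zero count should not affect min/max
  value <- value[keep]
  count <- count[keep]
  total <- sum(value * count)     # each value contributes value * count
  n <- sum(count)                 # total number of rows represented
  list(n = n, sum = total, avg = total / n, min = min(value), max = max(value))
}

# Toy (value, count) pairs, not real flight data
toy <- data.frame(distance = c(100, 250, 500), count = c(4, 2, 1))
freq_summary(toy$distance, toy$count)
```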
### Direct dataset download
Enigma provides an endpoint `.../export/<datasetid>` to download a zipped csv file of the entire dataset.
`enigma_fetch()` gives you an easy way to download this file to a specific place on your machine, and it prints a message once the file has been written to disk.
```r
enigma_fetch(dataset='com.crunchbase.info.companies.acquisition')
```
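Once the file is on disk, you can read it back into R. The helper below, `read_enigma_export()`, is hypothetical and assumes you know the path of the downloaded file. "Zipped" could mean either a `.zip` archive or a gzip-compressed `.csv.gz`, and the exact export format is not confirmed here, so the sketch handles both.

```r
# Hypothetical helper: read a downloaded export into a data frame
read_enigma_export <- function(path) {
  if (grepl("\\.zip$", path, ignore.case = TRUE)) {
    # List the archive contents and extract the first CSV into a temp directory
    files <- utils::unzip(path, list = TRUE)$Name
    csv <- files[grepl("\\.csv$", files, ignore.case = TRUE)]
    if (length(csv) == 0) stop("no .csv file found in ", path)
    exdir <- tempfile("enigma_export_")
    utils::unzip(path, files = csv[1], exdir = exdir)
    path <- file.path(exdir, csv[1])
  }
  # read.csv() opens files via file(), which transparently decompresses gzip,
  # so plain .csv and .csv.gz paths need no special handling
  utils::read.csv(path, stringsAsFactors = FALSE)
}

# Usage (path is a placeholder for wherever the export was written):
# acquisitions <- read_enigma_export("path/to/downloaded/export")
```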