Merge pull request #133 from xKDR/design_update
Design update
smishr authored Dec 9, 2022
2 parents 7b6bdb3 + 02d749b commit de80168
Showing 45 changed files with 1,615 additions and 3,327 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -5,3 +5,7 @@
/docs/Manifest.toml
/docs/build/
/test/Manifest.toml
/dev/*
.gitignore
.DS_Store
*.json
3 changes: 1 addition & 2 deletions Project.toml
@@ -7,8 +7,8 @@ version = "0.11.1"
AlgebraOfGraphics = "cbdf2221-f076-402e-a563-3d30da359d67"
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
@@ -18,6 +18,5 @@ AlgebraOfGraphics = "0.6"
CSV = "0.10"
CairoMakie = "0.8, 0.9, 0.10"
DataFrames = "1"
GLM = "1"
StatsBase = "0.33"
julia = "1"
179 changes: 81 additions & 98 deletions README.md
@@ -7,121 +7,104 @@
[![Milestones](https://img.shields.io/badge/-milestones-brightgreen)](https://github.com/xKDR/Survey.jl/milestones)


This package is used to study complex survey data. It is the Julia implementation of the [Survey package in R](https://cran.r-project.org/web/packages/survey/index.html) developed by [Professor Thomas Lumley](https://www.stat.auckland.ac.nz/people/tlum005).
This package is used to study complex survey data. It aims to be a fast alternative to the [Survey package in R](https://cran.r-project.org/web/packages/survey/index.html) developed by [Professor Thomas Lumley](https://www.stat.auckland.ac.nz/people/tlum005).

As survey datasets have grown larger, processing the records can take hours or days in R. We endeavour to solve this problem by implementing the Survey package in Julia.
This package currently supports simple random samples and stratified samples. Future releases will support multistage sampling as well.

## How to install

add "https://github.com/xKDR/Survey.jl.git"

## Basic usage

In the following example, we will load the Academic Performance Index dataset for Californian schools and produce the weighted mean for each county.
```julia
using Survey

apiclus1 = load_data("apiclus1")
## This function loads a commonly used dataset, Academic Performance Index (API), as an example.
## Any DataFrame object can be used with this package.

dclus1 = svydesign(id = :1, weights = :pw, data = apiclus1)

svyby(:api00, :cname, dclus1, svymean)
11×3 DataFrame
Row │ cname mean SE
│ String15 Float64 Float64
─────┼────────────────────────────────
1 │ Alameda 669.0 16.2135
2 │ Fresno 472.0 9.85278
3 │ Kern 452.5 29.5049
4 │ Los Angeles 647.267 23.5116
5 │ Mendocino 623.25 24.216
6 │ Merced 519.25 10.4925
7 │ Orange 710.562 28.9123
8 │ Plumas 709.556 13.2174
9 │ San Diego 659.436 12.2082
10 │ San Joaquin 551.189 11.578
11 │ Santa Clara 732.077 12.2291
```
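
To make "weighted mean for each county" concrete, here is a minimal sketch in plain Julia. It does not use Survey.jl or show its internals; the toy data and the `weighted_mean` helper are made up for illustration.

```julia
# Toy data, invented for illustration (not taken from the API dataset).
county = ["A", "A", "B", "B", "B"]
score  = [600.0, 700.0, 500.0, 550.0, 450.0]
weight = [1.0, 3.0, 2.0, 1.0, 1.0]

# Weighted mean: sum of weight * value divided by the sum of weights.
# `weighted_mean` is a hypothetical helper, not a Survey.jl function.
weighted_mean(x, w) = sum(w .* x) / sum(w)

# Collect the row indices belonging to each county.
groups = Dict{String, Vector{Int}}()
for (i, c) in enumerate(county)
    push!(get!(groups, c, Int[]), i)
end

# Apply the weighted mean within each group.
result = Dict(c => weighted_mean(score[idx], weight[idx]) for (c, idx) in groups)

for c in sort(collect(keys(result)))
    println(c, " => ", result[c])
end
```

The standard errors reported by the package depend on the sampling design and are not covered by this sketch.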

This example is from the Survey package in R. The [examples section of the documentation](https://xkdr.github.io/Survey.jl/dev/examples/) shows the R and the Julia code side by side for this and a few other examples.

## Performance
We measure the performance of R and Julia on the example shown above.

**R**

```R
library(survey)
library(microbenchmark)
data(api)
dclus1 <- svydesign(id = ~1, weights = ~pw, data = apiclus1)
microbenchmark(svyby(~api00, by = ~cname, design = dclus1, svymean), units = "us")
] add "https://github.com/xKDR/Survey.jl.git"
```
## Basic usage

```R
expr min lq
svyby(~api00, by = ~cname, design = dclus1, svymean) 10180.47 12102.61
mean median uq max neval
12734.43 12421.93 12788.55 17242.35 100
```
### Simple Random Sample

**Julia**
In the following example, we will load a simple random sample of the Academic Performance Index dataset for Californian schools and do basic analysis.
```julia
using Survey, BenchmarkTools
apiclus1 = load_data("apiclus1")
dclus1 = svydesign(id=:1, weights=:pw, data = apiclus1)
@benchmark svyby(:api00, :cname, dclus1, svymean)
```
using Survey

```julia
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 54.464 μs … 6.070 ms ┊ GC (min … max): 0.00% … 94.01%
Time (median): 72.468 μs ┊ GC (median): 0.00%
Time (mean ± σ): 81.833 μs ± 190.657 μs ┊ GC (mean ± σ): 7.62% ± 3.23%
```

The Julia code is about 171 times faster than the R code.
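
As a quick check on that figure, the sketch below divides the median R timing by the median Julia timing from the two benchmark outputs. The variable names are illustrative, and reading the figure as a ratio of medians is an assumption, although the reported numbers are consistent with it.

```julia
# Median timings in microseconds, copied from the benchmark outputs above.
r_median_us = 12421.93      # R, microbenchmark median
julia_median_us = 72.468    # Julia, BenchmarkTools median

# Speedup as the ratio of the two medians (assumed basis for the claim).
speedup = r_median_us / julia_median_us

println("Speedup: about ", round(Int, speedup), "x")
```

Medians are less sensitive than means to occasional slow runs, such as the garbage-collection pauses visible in the Julia maximum.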

We increase the complexity by grouping the data by two variables and then performing the same operations.
**R**

```R
library(survey)
library(microbenchmark)
data(api)
dclus1 <- svydesign(id = ~1, weights = ~pw, data = apiclus1)
microbenchmark(svyby(~api00, by = ~cname+meals, design = dclus1, svymean, keep.var = FALSE), units = "us")
srs = load_data("apisrs")

dsrs = SimpleRandomSample(srs; weights = :pw)

mean(:api00, dsrs)
1×2 DataFrame
Row │ mean SE
│ Float64 Float64
─────┼──────────────────
1 │ 656.585 9.24972

total(:enroll, dsrs)
1×2 DataFrame
Row │ total SE
│ Float64 Float64
─────┼─────────────────────
1 │ 3.62107e6 1.6952e5

mean(:api00, :cname, dsrs)
38×3 DataFrame
Row │ cname mean SE
│ String15 Float64 Float64
─────┼────────────────────────────────────
1 │ Kern 573.6 42.8026
2 │ Los Angeles 658.156 21.0728
3 │ Orange 749.333 27.0613
36 │ Napa 727.0 46.722
37 │ Lake 804.0 NaN
38 │ Merced 595.0 NaN

quantile(:enroll,dsrs,[0.1,0.2,0.5,0.75,0.95])
5×2 DataFrame
Row │ probability quantile
│ Float64 Float64
─────┼───────────────────────
1 │ 0.1 245.5
2 │ 0.2 317.6
3 │ 0.5 453.0
4 │ 0.75 668.5
5 │ 0.95 1473.1
```

```R
Unit: microseconds
expr min lq
svyby(~api00, by = ~cname + meals, design = dclus1, svymean) 132468.1 149914
mean median uq max neval
166121.9 160571.3 172301.6 304979.2 100
```
### Stratified Sample

**Julia**
```julia
using Survey, BenchmarkTools
apiclus1 = load_data("apiclus1")
dclus1 = svydesign(id=:1, weights=:pw, data = apiclus1)
@benchmark svyby(:api00, [:cname, :meals], dclus1, svymean)
```
In the following example, we will load a stratified sample of the Academic Performance Index dataset for Californian schools and do basic analysis.

```julia
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 219.387 μs … 8.284 ms ┊ GC (min … max): 0.00% … 90.94%
Time (median): 265.214 μs ┊ GC (median): 0.00%
Time (mean ± σ): 325.100 μs ± 513.020 μs ┊ GC (mean ± σ): 14.23% ± 8.58%
```
using Survey

The Julia code is about 605 times faster than the R code.
strat = load_data("apistrat")

dstrat = StratifiedSample(strat, :stype; weights = :pw, popsize = :fpc)

mean(:api00, dstrat)
1×2 DataFrame
Row │ mean SE
│ Float64 Float64
─────┼──────────────────
1 │ 662.287 9.40894

total(:api00, dstrat)
1×2 DataFrame
Row │ total SE
│ Float64 Float64
─────┼────────────────────
1 │ 4.10221e6 58279.0

mean(:api00, :cname, dstrat)
40×3 DataFrame
Row │ cname mean SE
│ String15 Float64 Float64
─────┼───────────────────────────────────────
1 │ Los Angeles 633.511 21.3912
2 │ Ventura 707.172 31.6856
3 │ Kern 678.235 53.1337
39 │ Mendocino 632.018 1.04942
40 │ Butte 627.0 0.0
```

## Strategic goals

We want to implement all the features provided by the [Survey package in R](https://cran.r-project.org/web/packages/survey/index.html).

The [milestones](https://github.com/xKDR/Survey.jl/milestones) section of the repository contains a list of features that contributors can implement in the short term.
6 changes: 3 additions & 3 deletions docs/make.jl
@@ -15,10 +15,10 @@ makedocs(;
),
pages=[
"Home" => "index.md",
"Examples" => "examples.md",
"Comparison with R" => "R_comparison.md",
"Performance" => "performance.md",
"Moving from R" => "R_comparison.md",
"API reference" => "api.md"
],
checkdocs=:exports,
)

deploydocs(;