Survey

This package is used to study complex survey data. It is the Julia implementation of the Survey package in R developed by Professor Thomas Lumley.

As survey datasets have grown larger, processing the records can take hours or days in R. We endeavour to solve this problem by implementing the Survey package in Julia.

How to install

In the Julia REPL, press ] to enter Pkg mode, then run:

add "https://github.com/xKDR/Survey.jl.git"
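For scripts or other non-interactive setups, the same installation can probably be done through the functional Pkg API. This is a minimal sketch that assumes network access and reuses the repository URL given above:

```julia
using Pkg

# Install Survey.jl directly from its Git repository (same URL as in the Pkg-mode command).
Pkg.add(url = "https://github.com/xKDR/Survey.jl.git")
```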

Basic usage

In the following example, we will load the Academic Performance Index dataset for Californian schools and produce the weighted mean for each county.

using Survey

data(api)
## This function loads a commonly used dataset, Academic Performance Index (API), as an example.
## Any DataFrame object can be used with this package.

dclus1 = svydesign(id = :1, weights = :pw, data = apiclus1)

svyby(:api00, :cname, dclus1, svymean)
11×3 DataFrame
 Row │ cname        mean     SE
     │ String15     Float64  Float64
─────┼────────────────────────────────
   1 │ Alameda      669.0    16.2135
   2 │ Fresno       472.0     9.85278
   3 │ Kern         452.5    29.5049
   4 │ Los Angeles  647.267  23.5116
   5 │ Mendocino    623.25   24.216
   6 │ Merced       519.25   10.4925
   7 │ Orange       710.562  28.9123
   8 │ Plumas       709.556  13.2174
   9 │ San Diego    659.436  12.2082
  10 │ San Joaquin  551.189  11.578
  11 │ Santa Clara  732.077  12.2291

This example is from the Survey package in R. The examples section of the documentation shows the R and the Julia code side by side for this and a few other examples.
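To make the idea of a weighted mean per group concrete, here is a minimal sketch in plain Julia. It is not the package's implementation and it does not compute standard errors; the toy records and the helper name `weighted_mean_by` are hypothetical and chosen only for illustration.

```julia
# Toy records: (group, value, weight). Hypothetical data, not the API dataset.
records = [
    ("A", 600.0, 2.0),
    ("A", 700.0, 1.0),
    ("B", 500.0, 1.0),
    ("B", 550.0, 3.0),
]

# Weighted mean per group: sum(w * y) / sum(w), accumulated in one pass.
function weighted_mean_by(records)
    sums = Dict{String, Tuple{Float64, Float64}}()  # group => (sum of w*y, sum of w)
    for (g, y, w) in records
        wy, wsum = get(sums, g, (0.0, 0.0))
        sums[g] = (wy + w * y, wsum + w)
    end
    return Dict(g => wy / wsum for (g, (wy, wsum)) in sums)
end

means = weighted_mean_by(records)
```

For these toy records, group "B" gets (500 × 1 + 550 × 3) / 4 = 537.5. In the survey example above, the grouping variable is the county name and the weights are the sampling weights pw.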

Performance

We measure the performance of the R and Julia implementations on the example shown above.

R

library(survey)
library(microbenchmark)
data(api)
dclus1 <- svydesign(id = ~1, weights = ~pw, data = apiclus1)
microbenchmark(svyby(~api00, by = ~cname, design = dclus1, svymean), unit = "us")

                                                 expr      min       lq
 svyby(~api00, by = ~cname, design = dclus1, svymean) 10180.47 12102.61
     mean   median       uq      max neval
 12734.43 12421.93 12788.55 17242.35   100

Julia

using Survey, BenchmarkTools
data(api)
dclus1 = svydesign(id=:1, weights=:pw, data = apiclus1)
@benchmark svyby(:api00, :cname, dclus1, svymean)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  54.464 μs …   6.070 ms  ┊ GC (min … max): 0.00% … 94.01%
 Time  (median):     72.468 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   81.833 μs ± 190.657 μs  ┊ GC (mean ± σ):  7.62% ±  3.23%

Comparing median times, the Julia code is about 171 times faster than the R code.

We increase the complexity by grouping the data by two variables and then performing the same operations.

R

library(survey)
library(microbenchmark)
data(api)
dclus1 <- svydesign(id = ~1, weights = ~pw, data = apiclus1)
microbenchmark(svyby(~api00, by = ~cname+meals, design = dclus1, svymean, keep.var = FALSE), unit = "us")

Unit: microseconds
                                                         expr      min     lq
 svyby(~api00, by = ~cname + meals, design = dclus1, svymean) 132468.1 149914
     mean   median       uq      max neval
 166121.9 160571.3 172301.6 304979.2   100

Julia

using Survey, BenchmarkTools
data(api)
dclus1 = svydesign(id=:1, weights=:pw, data = apiclus1)
@benchmark svyby(:api00, [:cname, :meals], dclus1, svymean)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  219.387 μs …   8.284 ms  ┊ GC (min … max):  0.00% … 90.94%
 Time  (median):     265.214 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   325.100 μs ± 513.020 μs  ┊ GC (mean ± σ):  14.23% ±  8.58%

Comparing median times, the Julia code is about 605 times faster than the R code.
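The speedup figures appear to be ratios of the median timings reported in the benchmark outputs. A short sketch of that arithmetic follows; the variable and function names are illustrative, and the numbers are copied from the outputs above.

```julia
# Median timings in microseconds, copied from the benchmark outputs above.
r_median_one_var, julia_median_one_var = 12421.93, 72.468
r_median_two_vars, julia_median_two_vars = 160571.3, 265.214

# Speedup is the R time divided by the Julia time.
speedup(r_time, julia_time) = r_time / julia_time

println(round(Int, speedup(r_median_one_var, julia_median_one_var)))    # prints 171
println(round(Int, speedup(r_median_two_vars, julia_median_two_vars)))  # prints 605
```

Medians are used here rather than means because the Julia timings include occasional garbage-collection pauses that inflate the mean.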

Strategic goals

We want to implement all the features provided by the Survey package in R.

The milestones section of the repository contains a list of features that contributors can implement in the short term.

Support

We gratefully acknowledge the JuliaLab at MIT for financial support for this project.
