Merge pull request #15 from iuliadmtru/iuliadmtru/docimprove
Add minor rephrasing to README
ayushpatnaikgit authored Jul 18, 2022
2 parents 7eed86f + d7682fe commit 8adaf6c
Showing 1 changed file with 15 additions and 15 deletions.
30 changes: 15 additions & 15 deletions README.md
@@ -7,30 +7,30 @@
[![Milestones](https://img.shields.io/badge/-milestones-brightgreen)](https://github.com/xKDR/Survey.jl/milestones)


This package is used to study complex survey data. It is the Julia implementation of the [Survey package in R](https://cran.r-project.org/web/packages/survey/index.html) developed by [Professor Thomas Lumley](https://www.stat.auckland.ac.nz/people/tlum005).

As survey datasets have become larger, processing the records can take hours or days in R. We endeavour to solve this problem by implementing the Survey package in Julia.

## How to install

In the Julia REPL, press `]` to enter Pkg mode, then run:

    add "https://github.com/xKDR/Survey.jl.git"

## Basic usage

In the following example, we will load the Academic Performance Index dataset for Californian schools and produce the weighted mean for each county.
```julia
using Survey

data(api)
## This function loads a commonly used dataset, Academic Performance Index (API), as an example.
## Any DataFrame object can be used with this package.

dclus1 = svydesign(id = :1, weights = :pw, data = apiclus1)

svyby(:api00, :cname, dclus1, svymean)
11×3 DataFrame
 Row │ cname        mean     SE
     │ String15     Float64  Float64
─────┼────────────────────────────────
   1 │ Alameda      669.0    16.2135
   2 │ Fresno       472.0     9.85278
@@ -45,10 +45,10 @@ svyby(:api00, :cname, dclus1, svymean)
  11 │ Santa Clara  732.077  12.2291
```
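
To make the computation concrete, here is a minimal sketch of a grouped weighted mean in plain Julia. It is not how Survey.jl implements `svyby` or `svymean`, and it does not compute the standard error. The toy data and the helper names `weighted_mean` and `weighted_mean_by` are hypothetical and are not part of the package.

```julia
# Hypothetical toy data (not taken from the API dataset).
county = ["A", "A", "B", "B", "B"]
score  = [600.0, 700.0, 500.0, 550.0, 450.0]
weight = [1.0, 3.0, 2.0, 1.0, 1.0]

# Weighted mean of x with weights w: sum(w .* x) / sum(w).
weighted_mean(x, w) = sum(w .* x) / sum(w)

# Weighted mean of x within each distinct value of `groups`.
function weighted_mean_by(groups, x, w)
    result = Dict{eltype(groups), Float64}()
    for g in unique(groups)
        idx = groups .== g              # rows that belong to group g
        result[g] = weighted_mean(x[idx], w[idx])
    end
    return result
end

means = weighted_mean_by(county, score, weight)
println(means)
```

With these inputs, group "A" gives (600·1 + 700·3) / 4 = 675.0 and group "B" gives (500·2 + 550 + 450) / 4 = 500.0. The sketch covers only the point estimate. A survey design object additionally carries the clustering and weighting information needed for the standard errors.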

This example is from the Survey package in R. The [examples section of the documentation](https://xkdr.github.io/Survey.jl/dev/examples/) shows the R and the Julia code side by side for this and a few other examples.

## Performance
We will measure the performance of the R and Julia code for the example shown above.

**R**

@@ -69,7 +69,7 @@ microbenchmark(svyby(~api00, by = ~cname, design = dclus1, svymean), units = "us

**Julia**
```julia
using Survey, BenchmarkTools
data(api)
dclus1 = svydesign(id=:1, weights=:pw, data = apiclus1)
@benchmark svyby(:api00, :cname, dclus1, svymean)
@@ -82,9 +82,9 @@ BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Time (mean ± σ): 81.833 μs ± 190.657 μs ┊ GC (mean ± σ): 7.62% ± 3.23%
```

The Julia code is about 171 times faster than the R code.

We increase the complexity by grouping the data by two variables and then performing the same operations.
**R**

```R
@@ -105,7 +105,7 @@ Unit: microseconds
```

**Julia**
```julia
using Survey, BenchmarkTools
data(api)
dclus1 = svydesign(id=:1, weights=:pw, data = apiclus1)
@benchmark svyby(:api00, [:cname, :meals], dclus1, svymean)
@@ -118,14 +118,14 @@ BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Time (mean ± σ): 325.100 μs ± 513.020 μs ┊ GC (mean ± σ): 14.23% ± 8.58%
```

The Julia code is about 605 times faster than the R code.

## Strategic goals

We want to implement all the features provided by the [Survey package in R](https://cran.r-project.org/web/packages/survey/index.html).

The [milestones](https://github.com/xKDR/Survey.jl/milestones) section of the repository contains a list of features that contributors can implement in the short term.

## Support

We gratefully acknowledge the JuliaLab at MIT for financial support for this project.
