Merge branch 'design_update' into design_update
smishr authored Nov 5, 2022
2 parents dfc26ed + 5b072f2 commit 555a5b3
Showing 20 changed files with 305 additions and 317 deletions.
Binary file modified docs/src/assets/hist.png
Binary file modified docs/src/assets/scatter.png
61 changes: 58 additions & 3 deletions docs/src/examples.md
@@ -1,7 +1,62 @@
# Examples

The following examples use the Academic Performance Index (API) dataset for Californian schools.
The following examples use the
[Academic Performance Index](https://r-survey.r-forge.r-project.org/survey/html/api.html)
(API) dataset for Californian schools. The data sets contain information for all schools
with at least 100 students and for various probability samples of the data.

```@docs
svyby(formula::Symbol, by, design::svydesign, func::Function, params = [])
The API program was discontinued at the end of 2018. Information is archived at
[https://www.cde.ca.gov/re/pr/api.asp](https://www.cde.ca.gov/re/pr/api.asp).

## Simple Random Sample

Firstly, a survey design needs a dataset from which to gather information. A dataset
can be loaded as a `DataFrame` using the `load_data` function:

```julia
julia> apisrs = load_data("apisrs");
```

Next, we can build a design. The most basic survey design is a simple random sample design.
A [`SimpleRandomSample`](@ref) can be instantiated by calling the constructor:

```julia
julia> srs = SimpleRandomSample(apisrs; weights = :pw)
SimpleRandomSample:
data: 200x42 DataFrame
weights: 31.0, 31.0, 31.0, ..., 31.0
probs: 0.0323, 0.0323, 0.0323, ..., 0.0323
fpc: 6194, 6194, 6194, ..., 6194
popsize: 6194
sampsize: 200
sampfraction: 0.0323
ignorefpc: false
```

With a `SimpleRandomSample` (as well as with any subtype of [`AbstractSurveyDesign`](@ref))
it is possible to calculate estimates of the mean or population total for a given variable,
along with the corresponding standard errors.

```julia
julia> svymean(:api00, srs)
1×2 DataFrame
 Row │ mean     sem
     │ Float64  Float64
─────┼──────────────────
   1 │ 656.585  9.24972

julia> svytotal(:api00, srs)
1×2 DataFrame
 Row │ total      se_total
     │ Float64    Float64
─────┼─────────────────────
   1 │ 4.06689e6   57292.8
```
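
For intuition, estimates of this kind can be reproduced with the textbook formulas for a simple random sample. The sketch below is not taken from the package source: `srs_mean` and `srs_total` are hypothetical helpers, and the finite population correction is assumed to be the usual factor `1 - n/N`.

```julia
using Statistics

# Hypothetical helper, not part of Survey.jl: textbook estimate of the mean
# under simple random sampling. `y` is the sampled variable and `N` is the
# population size, which is assumed to be known.
function srs_mean(y::AbstractVector{<:Real}, N::Integer; ignorefpc::Bool = false)
    n = length(y)
    # Assumed form of the finite population correction.
    fpc = ignorefpc ? 1.0 : 1 - n / N
    ybar = mean(y)
    se = sqrt(fpc * var(y) / n)
    return (mean = ybar, sem = se)
end

# Hypothetical helper: the total is the mean scaled by the population size.
function srs_total(y::AbstractVector{<:Real}, N::Integer; ignorefpc::Bool = false)
    est = srs_mean(y, N; ignorefpc = ignorefpc)
    return (total = N * est.mean, se_total = N * est.sem)
end

# Toy data standing in for a sampled column.
y = [2.0, 4.0, 6.0, 8.0]
println(srs_mean(y, 100))
println(srs_total(y, 100))
```

Under these assumptions the standard error of the total is simply `N` times the standard error of the mean, which matches the ratio between the two outputs shown above.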

The design can be tweaked by specifying the population or sample size or whether
or not to account for finite population correction (fpc). By default the weights
are equal to one, the sample size is equal to the number of rows in `data` and the
fpc is not ignored. The population size is calculated from the weights.

When `ignorefpc` is set to `false`, the `fpc` is calculated from the sample and population
sizes. When `ignorefpc` is set to `true`, the `fpc` is set to 1.
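
The relationship between the weights and the derived quantities can be sketched with plain arrays. This is an illustration only: the variable names mirror the fields printed for the design, but the code is not the package's implementation, and taking the probabilities as the inverse of the weights is an assumption based on the printed values.

```julia
# Hypothetical illustration; these are not the package's internals.
sampsize = 200
weights = fill(6194 / 200, sampsize)   # equal weights, as in a simple random sample

# The population size implied by the weights is their sum.
popsize = round(Int, sum(weights))

# Assumed: each inclusion probability is the inverse of the corresponding weight.
probs = 1 ./ weights

# The sampling fraction compares the sample size with the population size.
sampfraction = sampsize / popsize

println((popsize = popsize, sampfraction = round(sampfraction; digits = 4)))
```

With equal weights the inclusion probabilities and the sampling fraction coincide, which is why both show the same value in the printed design.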
30 changes: 27 additions & 3 deletions docs/src/index.md
@@ -10,8 +10,32 @@ This package is the Julia implementation of the [Survey package in R](https://cr

At [xKDR](https://xkdr.org/) we processed millions of records from household surveys using the survey package in R. This process took hours of computing time. By implementing the code in Julia, we are able to do the processing in seconds. In this package we have implemented the functions `svymean`, `svyquantile` and `svysum`. We have kept the syntax between the two packages similar so that we can easily move our existing code to the new language.

Documentation for [Survey](https://github.com/Survey.jl).
## Index

```@autodocs
Modules = [Survey]
```@index
Module = [Survey]
Private = false
```

## API
```@docs
load_data
AbstractSurveyDesign
SimpleRandomSample
StratifiedSample
ClusterSample
dim(design::AbstractSurveyDesign)
colnames(design::AbstractSurveyDesign)
dimnames(design::AbstractSurveyDesign)
svymean(x::Symbol, design::SimpleRandomSample)
svytotal(x::Symbol, design::SimpleRandomSample)
svyby
svyglm
svyplot(design::AbstractSurveyDesign, x::Symbol, y::Symbol; kwargs...)
svyhist(design::AbstractSurveyDesign, var::Symbol,
bins::Union{Integer, AbstractVector} = freedman_diaconis(design, var);
normalization = :density,
kwargs...
)
svyboxplot(design::AbstractSurveyDesign, x::Symbol, y::Symbol; kwargs...)
```
101 changes: 0 additions & 101 deletions shikharTests.jl

This file was deleted.

4 changes: 2 additions & 2 deletions src/Survey.jl
@@ -11,19 +11,19 @@ using AlgebraOfGraphics
using CategoricalArrays

include("SurveyDesign.jl")
include("show.jl")
include("svydesign.jl")
include("svymean.jl")
include("svyquantile.jl")
include("svytotal.jl")
include("example.jl")
include("load_data.jl")
include("svyglm.jl")
include("svyhist.jl")
include("svyplot.jl")
include("dimnames.jl")
include("svyboxplot.jl")
include("svyby.jl")
include("ht.jl")
include("show.jl")

export load_data
export AbstractSurveyDesign, SimpleRandomSample, StratifiedSample
12 changes: 8 additions & 4 deletions src/SurveyDesign.jl
@@ -1,9 +1,13 @@
"""
Supertype for every survey design type: `SimpleRandomSample`, `ClusterSample`
and `StratifiedSample`.
AbstractSurveyDesign
The data to a survey constructor is modified. To avoid this pass a copy of the data
instead of the original.
Supertype for every survey design type: [`SimpleRandomSample`](@ref), [`StratifiedSample`](@ref)
and [`ClusterSample`](@ref).
!!! note
The data passed to a survey constructor is modified. To avoid this, pass a copy of the data
instead of the original.
"""
abstract type AbstractSurveyDesign end

15 changes: 9 additions & 6 deletions src/dimnames.jl
@@ -1,11 +1,12 @@
"""
dim(design)
Get the dimensions of a `SurveyDesign`.
```jldoctest
julia> apisrs = load_data("apisrs");
julia> srs = SimpleRandomSample(apisrs);
julia> srs = SimpleRandomSample(apisrs; weights = :pw);
julia> dim(srs)
(200, 42)
@@ -14,7 +15,7 @@ julia> dim(srs)
dim(design::AbstractSurveyDesign) = size(design.data)

"""
Method for `svydesign` object.
Method for `svydesign`.
```jldoctest
julia> apistrat = load_data("apistrat");
@@ -29,12 +30,13 @@ dim(design::svydesign) = size(design.variables)

"""
colnames(design)
Get the column names of a `SurveyDesign`.
```jldoctest
julia> apisrs = load_data("apisrs");
julia> srs = SimpleRandomSample(apisrs);
julia> srs = SimpleRandomSample(apisrs; weights = :pw);
julia> colnames(srs)
42-element Vector{String}:
@@ -63,7 +65,7 @@ julia> colnames(srs)
colnames(design::AbstractSurveyDesign) = names(design.data)

"""
Method for `svydesign` objects.
Method for `svydesign`.
```jldoctest
julia> apistrat = load_data("apistrat");
@@ -98,12 +100,13 @@ colnames(design::svydesign) = names(design.variables)

"""
dimnames(design)
Get the names of the rows and columns of a `SurveyDesign`.
```jldoctest
julia> apisrs = load_data("apisrs");
julia> srs = SimpleRandomSample(apisrs);
julia> srs = SimpleRandomSample(apisrs; weights = :pw);
julia> dimnames(srs)
2-element Vector{Vector{String}}:
@@ -114,7 +117,7 @@ julia> dimnames(srs)
dimnames(design::AbstractSurveyDesign) = [string.(1:size(design.data, 1)), names(design.data)]

"""
Method for `svydesign` objects.
Method for `svydesign`.
```jldoctest
julia> apistrat = load_data("apistrat");
26 changes: 0 additions & 26 deletions src/example.jl

This file was deleted.

42 changes: 42 additions & 0 deletions src/load_data.jl
@@ -0,0 +1,42 @@
const PKG_DIR = joinpath(pathof(Survey), "..", "..") |> normpath
asset_path(args...) = joinpath(PKG_DIR, "assets", args...)

"""
load_data(name)
Load a dataset as a `DataFrame`.
All available datasets can be found in the [`assets/`](https://github.com/xKDR/Survey.jl/tree/main/assets)
directory.
```jldoctest
julia> apisrs = load_data("apisrs")
200×40 DataFrame
Row │ Column1 cds stype name sname ⋯
│ Int64 Int64 String1 String15 String ⋯
─────┼──────────────────────────────────────────────────────────────────────────
1 │ 1039 15739081534155 H McFarland High McFarland High ⋯
2 │ 1124 19642126066716 E Stowers (Cecil Stowers (Cecil B.) E
3 │ 2868 30664493030640 H Brea-Olinda Hig Brea-Olinda High
4 │ 1273 19644516012744 E Alameda Element Alameda Elementary
5 │ 4926 40688096043293 E Sunnyside Eleme Sunnyside Elementary ⋯
6 │ 2463 19734456014278 E Los Molinos Ele Los Molinos Elementa
7 │ 2031 19647336058200 M Northridge Midd Northridge Middle
8 │ 1736 19647336017271 E Glassell Park E Glassell Park Elemen
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
194 │ 4880 39686766042782 E Tyler Skills El Tyler Skills Element ⋯
195 │ 993 15636851531987 H Desert Junior/S Desert Junior/Senior
196 │ 969 15635291534775 H North High North High
197 │ 1752 19647336017446 E Hammel Street E Hammel Street Elemen
198 │ 4480 37683386039143 E Audubon Element Audubon Elementary ⋯
199 │ 4062 36678196036222 E Edison Elementa Edison Elementary
200 │ 2683 24657716025621 E Franklin Elemen Franklin Elementary
36 columns and 185 rows omitted
```
"""
function load_data(name)
name = name * ".csv"
@assert name ∈ readdir(asset_path())

CSV.read(asset_path(name), DataFrame, missingstring="NA")
end
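
The lookup step in `load_data` relies on `@assert`, which Julia's documentation notes may be disabled at some optimization levels and which reports only the failed expression. A minimal sketch of an explicit check follows; `find_dataset` is a hypothetical helper, not a function in Survey.jl, and the temporary directory stands in for the package's assets folder.

```julia
# Hypothetical variant of the lookup step; not part of Survey.jl.
function find_dataset(name::AbstractString, dir::AbstractString)
    file = name * ".csv"
    available = readdir(dir)
    # Fail with a descriptive error that lists the files that do exist.
    if !(file in available)
        throw(ArgumentError("dataset \"$name\" not found; available files: " *
                            join(available, ", ")))
    end
    return joinpath(dir, file)
end

# Usage with a temporary directory in place of the real assets folder.
dir = mktempdir()
touch(joinpath(dir, "apisrs.csv"))
path = find_dataset("apisrs", dir)
println(basename(path))
```

The returned path could then be handed to `CSV.read` exactly as in the function above.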
2 changes: 1 addition & 1 deletion src/show.jl
@@ -64,4 +64,4 @@ function Base.show(io::IO, ::MIME"text/plain", design::svydesign)
print(design.nest)
printstyled("\ncheck_strat: "; bold=true)
print(design.check_strat)
end
end