New constructor for ReplicateDesign. #251

Merged: 16 commits merged on Mar 1, 2023.
Changes from 9 commits
4 changes: 3 additions & 1 deletion docs/Project.toml
@@ -1,5 +1,7 @@
[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
Survey = "c1a98b4d-6cd2-47ec-b9e9-69b59c35373c"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
Survey = "c1a98b4d-6cd2-47ec-b9e9-69b59c35373c"
113 changes: 110 additions & 3 deletions src/SurveyDesign.jl
@@ -127,14 +127,38 @@ end
"""
ReplicateDesign <: AbstractSurveyDesign

Survey design obtained by replicating an original design using [`bootweights`](@ref).
Survey design obtained by replicating an original design using [`bootweights`](@ref). If
replicate weights are available, then they can be used to directly create a `ReplicateDesign`.

```jldoctest
# Constructors

```julia
ReplicateDesign(
data::AbstractDataFrame,
replicate_weights::Vector{Symbol};
clusters::Union{Nothing,Symbol,Vector{Symbol}} = nothing,
strata::Union{Nothing,Symbol} = nothing,
popsize::Union{Nothing,Symbol} = nothing,
weights::Union{Nothing,Symbol} = nothing
)
```

# Arguments

The constructor has the same arguments as [`SurveyDesign`](@ref). The only additional argument is `replicate_weights`, which
must be a `Vector` of `Symbol`s. Each `Symbol` names a column of `data` that contains one set of replicate weights. The
names of these columns must be of the form `replicate_i`, where `i` ranges from 1 to the number of replicate weights.

# Examples

Here is an example where the [`bootweights`](@ref) function is used to create a `ReplicateDesign`.

```jldoctest replicate-design; setup = :(using Survey, CSV, DataFrames)
julia> apistrat = load_data("apistrat");

julia> dstrat = SurveyDesign(apistrat; strata=:stype, weights=:pw);

julia> bootstrat = bootweights(dstrat; replicates=1000)
julia> bootstrat = bootweights(dstrat; replicates=1000) # creating a ReplicateDesign using bootweights
ReplicateDesign:
data: 200×1044 DataFrame
strata: stype
@@ -145,7 +169,38 @@ sampsize: [100, 100, 100 … 50]
weights: [44.21, 44.21, 44.21 … 15.1]
allprobs: [0.0226, 0.0226, 0.0226 … 0.0662]
replicates: 1000

```

If the replicate weights are already available, they can be passed directly to the `ReplicateDesign` constructor. For instance,
suppose the `bootstrat` data from the example above were stored in a CSV file.

```jldoctest replicate-design
julia> using CSV;

julia> CSV.write("apistrat_withreplicates.csv", bootstrat.data);

```

We can now pass the replicate weights directly to the `ReplicateDesign` constructor.

```jldoctest replicate-design
julia> apistrat_fromcsv = CSV.read("apistrat_withreplicates.csv", DataFrame);

julia> bootstrat_direct = ReplicateDesign(apistrat_fromcsv, [Symbol("replicate_"*string(replicate)) for replicate in 1:1000]; strata=:stype, weights=:pw)
ReplicateDesign:
data: 200×1044 DataFrame
strata: stype
[E, E, E … H]
cluster: none
popsize: [4420.9999, 4420.9999, 4420.9999 … 755.0]
sampsize: [100, 100, 100 … 50]
weights: [44.21, 44.21, 44.21 … 15.1]
allprobs: [0.0226, 0.0226, 0.0226 … 0.0662]
replicates: 1000

```

"""
struct ReplicateDesign <: AbstractSurveyDesign
data::AbstractDataFrame
@@ -156,5 +211,57 @@ struct ReplicateDesign <: AbstractSurveyDesign
weights::Symbol # Effective weights in case of singlestage approx supported
allprobs::Symbol # Right now only singlestage approx supported
pps::Bool
type::String
replicates::UInt
replicate_weights::Vector{Symbol}

# default constructor
function ReplicateDesign(
data::DataFrame,
cluster::Symbol,
popsize::Symbol,
sampsize::Symbol,
strata::Symbol,
weights::Symbol,
allprobs::Symbol,
pps::Bool,
type::String,
replicates::UInt,
replicate_weights::Vector{Symbol}
)
new(data, cluster, popsize, sampsize, strata, weights, allprobs,
pps, type, replicates, replicate_weights)
end

# constructor with given replicate_weights
function ReplicateDesign(
data::AbstractDataFrame,
replicate_weights::Vector{Symbol};
clusters::Union{Nothing,Symbol,Vector{Symbol}} = nothing,
strata::Union{Nothing,Symbol} = nothing,
popsize::Union{Nothing,Symbol} = nothing,
weights::Union{Nothing,Symbol} = nothing
)
# call the SurveyDesign constructor
base_design = SurveyDesign(
data;
clusters=clusters,
strata=strata,
popsize=popsize,
weights=weights
)
new(
base_design.data,
base_design.cluster,
base_design.popsize,
base_design.sampsize,
base_design.strata,
base_design.weights,
base_design.allprobs,
base_design.pps,
"bootstrap",
length(replicate_weights),
replicate_weights
)
end
end
4 changes: 3 additions & 1 deletion src/bootstrap.jl
@@ -54,6 +54,8 @@ function bootweights(design::SurveyDesign; replicates = 4000, rng = MersenneTwis
design.weights,
design.allprobs,
design.pps,
replicates,
"bootstrap",
UInt(replicates),
[Symbol("replicate_"*string(replicate)) for replicate in 1:replicates]
)
end
1 change: 1 addition & 0 deletions src/show.jl
@@ -37,6 +37,7 @@ Base.show(io::IO, ::MIME"text/plain", design::SurveyDesign) = surveyshow(io, des
function Base.show(io::IO, ::MIME"text/plain", design::ReplicateDesign)
# new_io = IOContext(io, :compact=>true, :limit=>true, :displaysize=>(50, 50))
surveyshow(io, design)
printinfo(io, "\ntype", design.type; newline = false)
printinfo(io, "\nreplicates", design.replicates; newline = false)
end

14 changes: 14 additions & 0 deletions test/SurveyDesign.jl
@@ -259,3 +259,17 @@ end
yrbs = copy(yrbs_original)
dyrbs = SurveyDesign(yrbs; clusters = :psu, strata = :stratum, weights = :weight)
end

@testset "ReplicateDesign_constructor" begin
for (sample, sample_direct) in [(bsrs, bsrs_direct), (bstrat, bstrat_direct), (dclus1_boot, dclus1_boot_direct)]
@test isequal(sample.data, sample_direct.data)
@test isequal(sample.popsize, sample_direct.popsize)
@test isequal(sample.sampsize, sample_direct.sampsize)
@test isequal(sample.strata, sample_direct.strata)
@test isequal(sample.weights, sample_direct.weights)
@test isequal(sample.allprobs, sample_direct.allprobs)
@test isequal(sample.pps, sample_direct.pps)
@test isequal(sample.replicates, sample_direct.replicates)
@test isequal(sample.replicate_weights, sample_direct.replicate_weights)
end
end
5 changes: 5 additions & 0 deletions test/runtests.jl
@@ -4,21 +4,26 @@ using CategoricalArrays

const STAT_TOL = 1e-5
const SE_TOL = 1e-1
const REPLICATE_WEIGHTS = [Symbol("replicate_"*string(i)) for i in 1:4000]

# Simple random sample
apisrs = load_data("apisrs") # Load API dataset
srs = SurveyDesign(apisrs, weights = :pw)
bsrs = srs |> bootweights # Create replicate design
bsrs_direct = ReplicateDesign(bsrs.data, REPLICATE_WEIGHTS, weights = :pw) # using ReplicateDesign constructor

# Stratified sample
apistrat = load_data("apistrat") # Load API dataset
dstrat = SurveyDesign(apistrat, strata = :stype, weights = :pw) # Create SurveyDesign
bstrat = dstrat |> bootweights # Create replicate design
bstrat_direct = ReplicateDesign(bstrat.data, REPLICATE_WEIGHTS, strata=:stype, weights=:pw) # using ReplicateDesign constructor

# One-stage cluster sample
apiclus1 = load_data("apiclus1") # Load API dataset
apiclus1[!, :pw] = fill(757 / 15, (size(apiclus1, 1),)) # Correct api mistake for pw column
dclus1 = SurveyDesign(apiclus1; clusters = :dnum, weights = :pw) # Create SurveyDesign
dclus1_boot = dclus1 |> bootweights # Create replicate design
dclus1_boot_direct = ReplicateDesign(dclus1_boot.data, REPLICATE_WEIGHTS, clusters=:dnum, weights=:pw) # using ReplicateDesign constructor

@testset "Survey.jl" begin
@test size(load_data("apiclus1")) == (183, 40)
3 changes: 3 additions & 0 deletions test/show.jl
@@ -23,6 +23,7 @@
sampsize: [200, 200, 200 … 200]
weights: [30.97, 30.97, 30.97 … 30.97]
allprobs: [0.0323, 0.0323, 0.0323 … 0.0323]
type: bootstrap
replicates: 4000"""

show(io, MIME("text/plain"), bsrs)
@@ -58,6 +59,7 @@ end
sampsize: [100, 100, 100 … 50]
weights: [44.21, 44.21, 44.21 … 15.1]
allprobs: [0.0226, 0.0226, 0.0226 … 0.0662]
type: bootstrap
replicates: 4000"""

show(io, MIME("text/plain"), bstrat)
@@ -93,6 +95,7 @@ end
sampsize: [15, 15, 15 … 15]
weights: [50.4667, 50.4667, 50.4667 … 50.4667]
allprobs: [0.0198, 0.0198, 0.0198 … 0.0198]
type: bootstrap
replicates: 4000"""

show(io, MIME("text/plain"), dclus1_boot)