From d91538a3c693e939e01f9ee5dc4457262426fb4d Mon Sep 17 00:00:00 2001 From: Iulia Dumitru Date: Wed, 11 Jan 2023 14:11:14 +0530 Subject: [PATCH 01/19] Replace old design examples with updated design, add more examples --- README.md | 290 ++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 218 insertions(+), 72 deletions(-) diff --git a/README.md b/README.md index 65f70cd2..7cdf8e72 100644 --- a/README.md +++ b/README.md @@ -6,111 +6,257 @@ [![codecov](https://codecov.io/gh/xKDR/Survey.jl/branch/main/graph/badge.svg?token=4PFSF47BT2)](https://codecov.io/gh/xKDR/Survey.jl) [![Milestones](https://img.shields.io/badge/-milestones-brightgreen)](https://github.com/xKDR/Survey.jl/milestones) +This package is used to study complex survey data. It aims to be a fast alternative +to the [Survey package in R](https://cran.r-project.org/web/packages/survey/index.html) +developed by [Professor Thomas Lumley](https://www.stat.auckland.ac.nz/people/tlum005). -This package is used to study complex survey data. It aims to be a fast alternative to the [Survey package in R](https://cran.r-project.org/web/packages/survey/index.html) developed by [Professor Thomas Lumley](https://www.stat.auckland.ac.nz/people/tlum005). +This package currently supports simple random sample, stratified sample, one- and +two-stage cluster sample. In future releases, it will support multistage sampling as well. -This package currently supports simple random sample and stratified sample. In future releases, it will support multistage sampling as well. - -## Documentation -See [Documentation](https://xkdr.github.io/Survey.jl/dev/) to learn how to use the package - -## How to install +## Installation ```julia ] add "https://github.com/xKDR/Survey.jl.git" ``` + ## Basic usage -### Simple Random Sample +The `SurveyDesign` constructor can take data corresponding to any type of design. 
+Depending on the keyword arguments passed, the data is processed in order to obtain +correct results for the given design. + +The following examples show how to create and manipulate different survey designs +using the [Academic Performance Index dataset for Californian schools](https://r-survey.r-forge.r-project.org/survey/html/api.html). + +### Simple random sample + +A simple random sample can be created without specifying any special keywords. Here +we will create a weighted simple random sample design. -In the following example, we will load a simple random sample of the Academic Performance Index dataset for Californian schools and do basic analysis. ```julia -using Survey +julia> apisrs = load_data("apisrs"); -srs = load_data("apisrs") +julia> srs = SurveyDesign(apisrs; weights=:pw) +SurveyDesign: +data: 200x47 DataFrame +cluster: false_cluster +design.data[!,design.cluster]: 1, 2, 3, ..., 200 +popsize: popsize +design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0 +sampsize: sampsize +design.data[!,design.sampsize]: 200, 200, 200, ..., 200 +design.data[!,:probs]: 0.0323, 0.0323, 0.0323, ..., 0.0323 +design.data[!,:allprobs]: 0.0323, 0.0323, 0.0323, ..., 0.0323 +``` -dsrs = SimpleRandomSample(srs; weights = :pw) +Using the `srs` design we can compute estimates of statistics such as mean and +population total. The design must first be resampled using +[bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) in order +to compute the standard errors. 
-mean(:api00, dsrs) +```julia +julia> bootsrs = bootweights(srs; replicates=1000) +ReplicateDesign: +data: 200x1047 DataFrame +cluster: false_cluster +design.data[!,design.cluster]: 1, 2, 3, ..., 200 +popsize: popsize +design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0 +sampsize: sampsize +design.data[!,design.sampsize]: 200, 200, 200, ..., 200 +design.data[!,:probs]: 0.0323, 0.0323, 0.0323, ..., 0.0323 +design.data[!,:allprobs]: 0.0323, 0.0323, 0.0323, ..., 0.0323 +replicates: 1000 + +julia> mean(:api00, bootsrs) 1×2 DataFrame - Row │ mean SE - │ Float64 Float64 + Row │ mean SE + │ Float64 Float64 ─────┼────────────────── - 1 │ 656.585 9.24972 + 1 │ 656.585 9.5409 -total(:enroll, dsrs) +julia> total(:enroll, bootsrs) 1×2 DataFrame - Row │ total SE - │ Float64 Float64 -─────┼───────────────────── - 1 │ 3.62107e6 1.6952e5 + Row │ total SE + │ Float64 Float64 +─────┼────────────────────── + 1 │ 3.62107e6 1.72846e5 +``` -mean(:api00, :cname, dsrs) +Now we know the mean academic performance index from the year 2000 and the total +number of students enrolled in the sampled Californian schools. We can also +calculate the statistic of two variables in one go... + +```julia +julia> mean([:api99, :api00], bootsrs) +2×3 DataFrame + Row │ names mean SE + │ String Float64 Float64 +─────┼────────────────────────── + 1 │ api99 624.685 9.84669 + 2 │ api00 656.585 9.5409 +``` + +... 
or we can calculate domain estimates: + +```julia +julia> total(:enroll, :cname, bootsrs) 38×3 DataFrame - Row │ cname mean SE - │ String15 Float64 Float64 -─────┼──────────────────────────────────── - 1 │ Kern 573.6 42.8026 - 2 │ Los Angeles 658.156 21.0728 - 3 │ Orange 749.333 27.0613 - ⋮ │ ⋮ ⋮ ⋮ - 36 │ Napa 727.0 46.722 - 37 │ Lake 804.0 NaN - 38 │ Merced 595.0 NaN - -quantile(:enroll,dsrs,[0.1,0.2,0.5,0.75,0.95]) -5×2 DataFrame - Row │ probability quantile - │ Float64 Float64 -─────┼─────────────────────── - 1 │ 0.1 245.5 - 2 │ 0.2 317.6 - 3 │ 0.5 453.0 - 4 │ 0.75 668.5 - 5 │ 0.95 1473.1 + Row │ cname total SE + │ String15 Float64 Any +─────┼──────────────────────────────────────────── + 1 │ Kern 1.95823e5 74731.2 + 2 │ Los Angeles 867129.0 1.36622e5 + 3 │ Orange 1.68786e5 63858.0 + 4 │ San Luis Obispo 6720.49 6790.49 + ⋮ │ ⋮ ⋮ ⋮ + 35 │ Calaveras 12976.4 13241.6 + 36 │ Napa 39239.0 30181.9 + 37 │ Lake 6410.79 6986.29 + 38 │ Merced 15392.1 15202.2 + 30 rows omitted ``` -### Stratified Sample +This gives us the total number of enrolled students in each county. + +### Stratified sample -In the following example, we will load a stratified sample of the Academic Performance Index dataset for Californian schools and do basic analysis. +All functionalities described above are also supported for stratified sample +designs. To create a stratified sample, the `strata` keyword must be passed to +`SurveyDesign`. 
```julia -using Survey +julia> apistrat = load_data("apistrat"); -strat = load_data("apistrat") +julia> strat = SurveyDesign(apistrat; strata=:stype, weights=:pw) +SurveyDesign: +data: 200x46 DataFrame +cluster: false_cluster +design.data[!,design.cluster]: 1, 2, 3, ..., 200 +popsize: popsize +design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0 +sampsize: sampsize +design.data[!,design.sampsize]: 200, 200, 200, ..., 200 +design.data[!,:probs]: 0.0226, 0.0226, 0.0226, ..., 0.0662 +design.data[!,:allprobs]: 0.0226, 0.0226, 0.0226, ..., 0.0662 -dstrat = StratifiedSample(strat, :stype; weights = :pw, popsize = :fpc) -mean(:api00, dstrat) -1×2 DataFrame - Row │ mean SE - │ Float64 Float64 -─────┼────────────────── - 1 │ 662.287 9.40894 +julia> bootstrat = bootweights(strat; replicates=1000) +ReplicateDesign: +data: 200x1046 DataFrame +cluster: false_cluster +design.data[!,design.cluster]: 1, 2, 3, ..., 200 +popsize: popsize +design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0 +sampsize: sampsize +design.data[!,design.sampsize]: 200, 200, 200, ..., 200 +design.data[!,:probs]: 0.0226, 0.0226, 0.0226, ..., 0.0662 +design.data[!,:allprobs]: 0.0226, 0.0226, 0.0226, ..., 0.0662 +replicates: 1000 -total(:api00, dstrat) -1×2 DataFrame - Row │ total SE - │ Float64 Float64 -─────┼──────────────────── - 1 │ 4.10221e6 58279.0 -mean(:api00, :cname, dstrat) +julia> mean([:api99, :api00], bootstrat) +2×3 DataFrame + Row │ names mean SE + │ String Float64 Float64 +─────┼─────────────────────────── + 1 │ api99 629.395 10.08 + 2 │ api00 662.287 9.56931 + +julia> mean(:api00, :cname, bootstrat) 40×3 DataFrame - Row │ cname mean SE - │ String15 Float64 Float64 + Row │ cname mean SE + │ String15 Float64 Any +─────┼────────────────────────────────── + 1 │ Los Angeles 633.511 21.6242 + 2 │ Ventura 707.172 34.2091 + 3 │ Kern 678.235 57.651 + 4 │ San Diego 704.121 33.0882 + ⋮ │ ⋮ ⋮ ⋮ + 37 │ Napa 660.0 0.0 + 38 │ Mariposa 706.0 0.0 + 39 │ Mendocino 632.018 1.70573 + 
40 │ Butte 627.0 0.0 + 32 rows omitted +``` + +### Cluster sample + +For now, the package supports one- and two-stage cluster sampling. These are +created by passing the `clusters` keyword argument to `SurveyDesign`. + +```julia +julia> apiclus1 = load_data("apiclus1"); + +julia> clus_one_stage = SurveyDesign(apiclus1; clusters=:dnum, weights=:pw) +SurveyDesign: +data: 183x46 DataFrame +cluster: dnum +design.data[!,design.cluster]: 637, 637, 637, ..., 448 +popsize: popsize +design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0 +sampsize: sampsize +design.data[!,design.sampsize]: 15, 15, 15, ..., 15 +design.data[!,:probs]: 0.0295, 0.0295, 0.0295, ..., 0.0295 +design.data[!,:allprobs]: 0.0295, 0.0295, 0.0295, ..., 0.0295 + + +julia> apiclus2 = load_data("apiclus2"); + +julia> clus_two_stage = SurveyDesign(apiclus2; clusters=[:dnum, :snum], weights=:pw) +SurveyDesign: +data: 126x47 DataFrame +cluster: dnum +design.data[!,design.cluster]: 15, 63, 83, ..., 795 +popsize: popsize +design.data[!,design.popsize]: 5130.0, 5130.0, 5130.0, ..., 5130.0 +sampsize: sampsize +design.data[!,design.sampsize]: 40, 40, 40, ..., 40 +design.data[!,:probs]: 0.0528, 0.0528, 0.0528, ..., 0.0528 +design.data[!,:allprobs]: 0.0528, 0.0528, 0.0528, ..., 0.0528 +``` + +Again, all above functionalities are supported for cluster sample designs as well. 
+ +```julia +julia> bootclus_one_stage = bootweights(clus_one_stage; replicates=1000); + +julia> total([:enroll, Symbol("api.stu")], bootclus_one_stage) +2×3 DataFrame + Row │ names total SE + │ String Float64 Float64 +─────┼─────────────────────────────── + 1 │ enroll 3.40494e6 9.4505e5 + 2 │ api.stu 2.89321e6 8.10919e5 + +julia> bootclus_two_stage = bootweights(clus_two_stage; replicates=1000); + +julia> mean(:api00, :cname, bootclus_two_stage) +26×3 DataFrame + Row │ cname mean SE + │ String15 Float64 Any ─────┼─────────────────────────────────────── - 1 │ Los Angeles 633.511 21.3912 - 2 │ Ventura 707.172 31.6856 - 3 │ Kern 678.235 53.1337 - ⋮ │ ⋮ ⋮ ⋮ - 39 │ Mendocino 632.018 1.04942 - 40 │ Butte 627.0 0.0 + 1 │ Placer 821.0 0.0 + 2 │ Tuolumne 773.0 0.0 + 3 │ San Mateo 743.091 92.7257 + 4 │ San Luis Obispo 811.0 0.0 + ⋮ │ ⋮ ⋮ ⋮ + 23 │ Monterey 720.5 6.50969e-15 + 24 │ Tulare 607.5 106.359 + 25 │ Stanislaus 730.4 3.32051e-14 + 26 │ Contra Costa 864.0 0.0 + 18 rows omitted ``` -## Strategic goals -We want to implement all the features provided by the [Survey package in R](https://cran.r-project.org/web/packages/survey/index.html) +## Future goals + +We want to implement all the features provided by the +[Survey package in R](https://cran.r-project.org/web/packages/survey/index.html) +in a Julia-native way. The main goal is to have a complete package that provides +a large range of functionality and takes efficiency into consideration, such that +large surveys can be analysed fast. -The [milestones](https://github.com/xKDR/Survey.jl/milestones) sections of the repository contains a list of features that contributors can implement in the short-term. +The [milestones](https://github.com/xKDR/Survey.jl/milestones) section of the repository +contains a list of features that contributors can implement in the short-term. 
 ## Support

From 0755d094df6840a5ed81df4c8f4da80cf35c4937 Mon Sep 17 00:00:00 2001
From: Iulia Dumitru
Date: Wed, 11 Jan 2023 14:16:40 +0530
Subject: [PATCH 02/19] Add reference to the documentation tutorial

---
 README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.md b/README.md
index 7cdf8e72..bfdc1ee3 100644
--- a/README.md
+++ b/README.md
@@ -247,6 +247,9 @@ julia> mean(:api00, :cname, bootclus_two_stage)
  18 rows omitted
 ```
 
+For a more complete guide, see the [Tutorial](https://xkdr.github.io/Survey.jl/dev/#Basic-demo)
+section in the documentation.
+
 ## Future goals
 
 We want to implement all the features provided by the

From 3ee9a77bcce0679808ee6c39676ae72fe3036b40 Mon Sep 17 00:00:00 2001
From: Iulia Dumitru
Date: Wed, 11 Jan 2023 14:31:41 +0530
Subject: [PATCH 03/19] Add details about multistage sampling

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index bfdc1ee3..6f698260 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,9 @@ to the [Survey package in R](https://cran.r-project.org/web/packages/survey/inde
 developed by [Professor Thomas Lumley](https://www.stat.auckland.ac.nz/people/tlum005).
 
 This package currently supports simple random sample, stratified sample, one- and
-two-stage cluster sample. In future releases, it will support multistage sampling as well.
+two-stage cluster sample, the latter using single stage approximation. For more
+details see the [TODO](https://xkdr.github.io/Survey.jl/dev/) section of the
+documentation.
 
 ## Installation
 ```julia

From 815391bd50b2c68eca59d73487129a41d03bb0f9 Mon Sep 17 00:00:00 2001
From: Iulia Dumitru
Date: Wed, 11 Jan 2023 16:23:00 +0530
Subject: [PATCH 04/19] Rephrase two-stage to multistage

---
 README.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 6f698260..a4fdb766 100644
--- a/README.md
+++ b/README.md
@@ -10,8 +10,8 @@ This package is used to study complex survey data. It aims to be a fast alternat
 to the [Survey package in R](https://cran.r-project.org/web/packages/survey/index.html)
 developed by [Professor Thomas Lumley](https://www.stat.auckland.ac.nz/people/tlum005).
 
-This package currently supports simple random sample, stratified sample, one- and
-two-stage cluster sample, the latter using single stage approximation. For more
+This package currently supports simple random sample, stratified sample, single and
+multistage cluster sample, the latter using single stage approximation. For more
 details see the [TODO](https://xkdr.github.io/Survey.jl/dev/) section of the
 documentation.
 
@@ -183,8 +183,9 @@ julia> mean(:api00, :cname, bootstrat)
 
 ### Cluster sample
 
-For now, the package supports one- and two-stage cluster sampling. These are
-created by passing the `clusters` keyword argument to `SurveyDesign`.
+For now, single and multistage cluster sampling is supported by using single stage
+approximation. Cluster sample designs are created by passing the `clusters` keyword
+argument to `SurveyDesign`.
 
 ```julia
 julia> apiclus1 = load_data("apiclus1");

From 7bfd63668b8c9f939a566d63fa460a17d2341a82 Mon Sep 17 00:00:00 2001
From: Iulia Dumitru <84318573+iuliadmtru@users.noreply.github.com>
Date: Wed, 11 Jan 2023 21:11:13 +0530
Subject: [PATCH 05/19] Rephrase

Co-authored-by: Ayush Patnaik
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index a4fdb766..a6d667f2 100644
--- a/README.md
+++ b/README.md
@@ -86,7 +86,7 @@ julia> total(:enroll, bootsrs)
 
 Now we know the mean academic performance index from the year 2000 and the total
 number of students enrolled in the sampled Californian schools. We can also
-calculate the statistic of two variables in one go...
+calculate the statistic of multiple variables in one go...
 
 ```julia
 julia> mean([:api99, :api00], bootsrs)

From 20cdeeba170c60e9e6fed37b63507eed6cd2bf82 Mon Sep 17 00:00:00 2001
From: Iulia Dumitru
Date: Wed, 11 Jan 2023 21:23:56 +0530
Subject: [PATCH 06/19] Reduce redundancy

---
 README.md | 188 +++++++++++++++---------------------------------------
 1 file changed, 51 insertions(+), 137 deletions(-)

diff --git a/README.md b/README.md
index a6d667f2..42fce5ee 100644
--- a/README.md
+++ b/README.md
@@ -29,10 +29,11 @@ correct results for the given design.
 The following examples show how to create and manipulate different survey designs
 using the [Academic Performance Index dataset for Californian schools](https://r-survey.r-forge.r-project.org/survey/html/api.html).
 
-### Simple random sample
+### Constructing a survey design
 
-A simple random sample can be created without specifying any special keywords. Here
-we will create a weighted simple random sample design.
+A survey design can be created by calling the constructor with some keywords,
+depending on the survey type. Let's create a simple random sample, a stratified
+sample, a single-stage and a two-stage cluster sample.
 
```julia julia> apisrs = load_data("apisrs"); @@ -48,10 +49,52 @@ sampsize: sampsize design.data[!,design.sampsize]: 200, 200, 200, ..., 200 design.data[!,:probs]: 0.0323, 0.0323, 0.0323, ..., 0.0323 design.data[!,:allprobs]: 0.0323, 0.0323, 0.0323, ..., 0.0323 + +julia> apistrat = load_data("apistrat"); + +julia> strat = SurveyDesign(apistrat; strata=:stype, weights=:pw) +SurveyDesign: +data: 200x46 DataFrame +cluster: false_cluster +design.data[!,design.cluster]: 1, 2, 3, ..., 200 +popsize: popsize +design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0 +sampsize: sampsize +design.data[!,design.sampsize]: 200, 200, 200, ..., 200 +design.data[!,:probs]: 0.0226, 0.0226, 0.0226, ..., 0.0662 +design.data[!,:allprobs]: 0.0226, 0.0226, 0.0226, ..., 0.0662 + +julia> apiclus1 = load_data("apiclus1"); + +julia> clus_one_stage = SurveyDesign(apiclus1; clusters=:dnum, weights=:pw) +SurveyDesign: +data: 183x46 DataFrame +cluster: dnum +design.data[!,design.cluster]: 637, 637, 637, ..., 448 +popsize: popsize +design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0 +sampsize: sampsize +design.data[!,design.sampsize]: 15, 15, 15, ..., 15 +design.data[!,:probs]: 0.0295, 0.0295, 0.0295, ..., 0.0295 +design.data[!,:allprobs]: 0.0295, 0.0295, 0.0295, ..., 0.0295 + +julia> apiclus2 = load_data("apiclus2"); + +julia> clus_two_stage = SurveyDesign(apiclus2; clusters=[:dnum, :snum], weights=:pw) +SurveyDesign: +data: 126x47 DataFrame +cluster: dnum +design.data[!,design.cluster]: 15, 63, 83, ..., 795 +popsize: popsize +design.data[!,design.popsize]: 5130.0, 5130.0, 5130.0, ..., 5130.0 +sampsize: sampsize +design.data[!,design.sampsize]: 40, 40, 40, ..., 40 +design.data[!,:probs]: 0.0528, 0.0528, 0.0528, ..., 0.0528 +design.data[!,:allprobs]: 0.0528, 0.0528, 0.0528, ..., 0.0528 ``` -Using the `srs` design we can compute estimates of statistics such as mean and -population total. 
The design must first be resampled using +Using these designs we can compute estimates of statistics such as mean and +population total. The designs must first be resampled using [bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) in order to compute the standard errors. @@ -120,138 +163,9 @@ julia> total(:enroll, :cname, bootsrs) This gives us the total number of enrolled students in each county. -### Stratified sample - -All functionalities described above are also supported for stratified sample -designs. To create a stratified sample, the `strata` keyword must be passed to -`SurveyDesign`. - -```julia -julia> apistrat = load_data("apistrat"); - -julia> strat = SurveyDesign(apistrat; strata=:stype, weights=:pw) -SurveyDesign: -data: 200x46 DataFrame -cluster: false_cluster -design.data[!,design.cluster]: 1, 2, 3, ..., 200 -popsize: popsize -design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0 -sampsize: sampsize -design.data[!,design.sampsize]: 200, 200, 200, ..., 200 -design.data[!,:probs]: 0.0226, 0.0226, 0.0226, ..., 0.0662 -design.data[!,:allprobs]: 0.0226, 0.0226, 0.0226, ..., 0.0662 - - -julia> bootstrat = bootweights(strat; replicates=1000) -ReplicateDesign: -data: 200x1046 DataFrame -cluster: false_cluster -design.data[!,design.cluster]: 1, 2, 3, ..., 200 -popsize: popsize -design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0 -sampsize: sampsize -design.data[!,design.sampsize]: 200, 200, 200, ..., 200 -design.data[!,:probs]: 0.0226, 0.0226, 0.0226, ..., 0.0662 -design.data[!,:allprobs]: 0.0226, 0.0226, 0.0226, ..., 0.0662 -replicates: 1000 - - -julia> mean([:api99, :api00], bootstrat) -2×3 DataFrame - Row │ names mean SE - │ String Float64 Float64 -─────┼─────────────────────────── - 1 │ api99 629.395 10.08 - 2 │ api00 662.287 9.56931 - -julia> mean(:api00, :cname, bootstrat) -40×3 DataFrame - Row │ cname mean SE - │ String15 Float64 Any -─────┼────────────────────────────────── - 1 │ Los Angeles 633.511 
21.6242 - 2 │ Ventura 707.172 34.2091 - 3 │ Kern 678.235 57.651 - 4 │ San Diego 704.121 33.0882 - ⋮ │ ⋮ ⋮ ⋮ - 37 │ Napa 660.0 0.0 - 38 │ Mariposa 706.0 0.0 - 39 │ Mendocino 632.018 1.70573 - 40 │ Butte 627.0 0.0 - 32 rows omitted -``` - -### Cluster sample - -For now, single and multistage cluster sampling is supported by using single stage -approximation. Cluster sample designs are created by passing the `clusters` keyword -argument to `SurveyDesign`. - -```julia -julia> apiclus1 = load_data("apiclus1"); - -julia> clus_one_stage = SurveyDesign(apiclus1; clusters=:dnum, weights=:pw) -SurveyDesign: -data: 183x46 DataFrame -cluster: dnum -design.data[!,design.cluster]: 637, 637, 637, ..., 448 -popsize: popsize -design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0 -sampsize: sampsize -design.data[!,design.sampsize]: 15, 15, 15, ..., 15 -design.data[!,:probs]: 0.0295, 0.0295, 0.0295, ..., 0.0295 -design.data[!,:allprobs]: 0.0295, 0.0295, 0.0295, ..., 0.0295 - - -julia> apiclus2 = load_data("apiclus2"); - -julia> clus_two_stage = SurveyDesign(apiclus2; clusters=[:dnum, :snum], weights=:pw) -SurveyDesign: -data: 126x47 DataFrame -cluster: dnum -design.data[!,design.cluster]: 15, 63, 83, ..., 795 -popsize: popsize -design.data[!,design.popsize]: 5130.0, 5130.0, 5130.0, ..., 5130.0 -sampsize: sampsize -design.data[!,design.sampsize]: 40, 40, 40, ..., 40 -design.data[!,:probs]: 0.0528, 0.0528, 0.0528, ..., 0.0528 -design.data[!,:allprobs]: 0.0528, 0.0528, 0.0528, ..., 0.0528 -``` - -Again, all above functionalities are supported for cluster sample designs as well. 
- -```julia -julia> bootclus_one_stage = bootweights(clus_one_stage; replicates=1000); - -julia> total([:enroll, Symbol("api.stu")], bootclus_one_stage) -2×3 DataFrame - Row │ names total SE - │ String Float64 Float64 -─────┼─────────────────────────────── - 1 │ enroll 3.40494e6 9.4505e5 - 2 │ api.stu 2.89321e6 8.10919e5 - -julia> bootclus_two_stage = bootweights(clus_two_stage; replicates=1000); - -julia> mean(:api00, :cname, bootclus_two_stage) -26×3 DataFrame - Row │ cname mean SE - │ String15 Float64 Any -─────┼─────────────────────────────────────── - 1 │ Placer 821.0 0.0 - 2 │ Tuolumne 773.0 0.0 - 3 │ San Mateo 743.091 92.7257 - 4 │ San Luis Obispo 811.0 0.0 - ⋮ │ ⋮ ⋮ ⋮ - 23 │ Monterey 720.5 6.50969e-15 - 24 │ Tulare 607.5 106.359 - 25 │ Stanislaus 730.4 3.32051e-14 - 26 │ Contra Costa 864.0 0.0 - 18 rows omitted -``` - -For a more complete guide, see the [Tutorial](https://xkdr.github.io/Survey.jl/dev/#Basic-demo) -section in the documentation. +All functionalities are supported by each design type. For a more complete guide, +see the [Tutorial](https://xkdr.github.io/Survey.jl/dev/#Basic-demo) section in +the documentation. ## Future goals From 8bbeeb74fd1fdfcf2773c58b820b6d0ef00771fb Mon Sep 17 00:00:00 2001 From: Iulia Dumitru Date: Thu, 12 Jan 2023 14:04:15 +0530 Subject: [PATCH 07/19] Correct phrasing regarding supported designs --- README.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 42fce5ee..a25bfe3a 100644 --- a/README.md +++ b/README.md @@ -10,10 +10,11 @@ This package is used to study complex survey data. It aims to be a fast alternat to the [Survey package in R](https://cran.r-project.org/web/packages/survey/index.html) developed by [Professor Thomas Lumley](https://www.stat.auckland.ac.nz/people/tlum005). -This package currently supports simple random sample, stratified sample, single and -multistage cluster sample, the latter using single stage approximation. 
For more
-details see the [TODO](https://xkdr.github.io/Survey.jl/dev/) section of the
-documentation.
+All types of survey design are supported by this package.
+
+> **_NOTE:_** For multistage sampling a single stage approximation is used. For
+more information see the [TODO](https://xkdr.github.io/Survey.jl/dev/) section of
+the documentation.
 
 ## Installation
 ```julia

From 7250e7d71fb67b791e82c162d460e8fcd4c7a02c Mon Sep 17 00:00:00 2001
From: Iulia Dumitru
Date: Wed, 11 Jan 2023 19:02:01 +0530
Subject: [PATCH 08/19] Change show for `SurveyDesign`

---
 src/show.jl | 49 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/src/show.jl b/src/show.jl
index 3319e653..5dd5b21a 100644
--- a/src/show.jl
+++ b/src/show.jl
@@ -6,16 +6,20 @@ function makeshort(x)
         x = round.(x, sigdigits=3)
     end
     # print short vectors or single values as they are, compress otherwise
-    x = length(x) < 3 ? join(x, ", ") : join(x[1:3], ", ") * ", ..., " * string(last(x))
+    if length(x) > 1
+        return "[" * (length(x) < 3 ? join(x, ", ") : join(x[1:3], ", ") * " … " * string(last(x))) * "]"
+    end
+
+    return x
 end
 
 """
 Print information in the form:
 **name:** content[\n]
 """
-function printinfo(io::IO, name::String, content::String; newline::Bool=true)
+function printinfo(io::IO, name::String, content, args...; newline::Bool=true)
     printstyled(io, name, ": "; bold=true)
-    newline ? println(io, content) : print(io, content)
+    newline ? println(io, content, args...) : print(io, content, args...)
 end
 
 "Print information about a survey design."
@@ -33,24 +37,37 @@ function Base.show(io::IO, ::MIME"text/plain", design::AbstractSurveyDesign)
     printinfo(io, "ignorefpc", string(design.ignorefpc); newline=false)
 end
 
-
 "Print information about a survey design."
-function Base.show(io::IO, ::MIME"text/plain", design::SurveyDesign) +Base.show(io::IO, ::MIME"text/plain", design::SurveyDesign) = + surveyshow(IOContext(io, :compact=>true, :limit=>true, :displaysize=>(50, 50)), design) + +function surveyshow(io::IO, design::SurveyDesign) + # structure name type = typeof(design) printstyled(io, "$type:\n"; bold=true) - printstyled(io, "data: "; bold=true) - println(io, size(design.data, 1), "x", size(design.data, 2), " DataFrame") - printinfo(io, "cluster", string(design.cluster); newline=true) - printinfo(io, "design.data[!,design.cluster]", makeshort(design.data[!,design.cluster])) - printinfo(io, "popsize", string(design.popsize); newline=true) - printinfo(io, "design.data[!,design.popsize]", makeshort(design.data[!,design.popsize])) - printinfo(io, "sampsize", string(design.sampsize); newline=true) - printinfo(io, "design.data[!,design.sampsize]", makeshort(design.data[!,design.sampsize])) - printinfo(io, "design.data[!,:probs]", makeshort(design.data.probs)) - printinfo(io, "design.data[!,:allprobs]", makeshort(design.data.allprobs)) + # data info + printinfo(io, "data", summary(design.data)) + # strata info + strata_content = + design.strata == :false_strata ? + "none" : + (string(design.strata), "\n ", makeshort(design.data[!, design.strata])) + printinfo(io, "strata", strata_content...) + # cluster(s) info + cluster_content = + design.cluster == :false_cluster ? + "none" : + (string(design.cluster), "\n ", makeshort(design.data[!, design.cluster])) + printinfo(io, "cluster", cluster_content...) + # popsize and sampsize info + printinfo(io, "popsize", "\n ", makeshort(design.data[!, design.popsize])) + printinfo(io, "sampsize", "\n ", makeshort(design.data[!, design.sampsize])) + # weights and probs info + printinfo(io, "weights", "\n ", makeshort(design.data[!, :weights])) + printinfo(io, "probs", "\n ", makeshort(design.data[!, :probs]); newline=false) end -"Print information about a repliocate design." 
+"Print information about a replicate design." function Base.show(io::IO, ::MIME"text/plain", design::ReplicateDesign) type = typeof(design) printstyled(io, "$type:\n"; bold=true) From 8e8580b836a86e0dc5d8e8222f31cdd2275a1413 Mon Sep 17 00:00:00 2001 From: Iulia Dumitru Date: Wed, 11 Jan 2023 19:30:34 +0530 Subject: [PATCH 09/19] Change show for `AbstractSurveyDesign` and `ReplicateDesign`, restructure code --- src/show.jl | 60 ++++++++++++++++------------------------------------- 1 file changed, 18 insertions(+), 42 deletions(-) diff --git a/src/show.jl b/src/show.jl index 5dd5b21a..2af3d6d1 100644 --- a/src/show.jl +++ b/src/show.jl @@ -1,3 +1,5 @@ +surveyio(io) = IOContext(io, :compact=>true, :limit=>true, :displaysize=>(50, 50)) + """ Helper function that transforms a given `Number` or `Vector` into a short-form string. """ @@ -23,41 +25,33 @@ function printinfo(io::IO, name::String, content, args...; newline::Bool=true) end "Print information about a survey design." -function Base.show(io::IO, ::MIME"text/plain", design::AbstractSurveyDesign) - type = typeof(design) - printstyled(io, "$type:\n"; bold=true) - printstyled(io, "data: "; bold=true) - println(io, size(design.data, 1), "x", size(design.data, 2), " DataFrame") - printinfo(io, "weights", makeshort(design.data.weights)) - printinfo(io, "probs", makeshort(design.data.probs)) - printinfo(io, "fpc", makeshort(design.data.fpc)) - printinfo(io, "popsize", makeshort(design.popsize)) - printinfo(io, "sampsize", makeshort(design.sampsize)) - printinfo(io, "sampfraction", makeshort(design.sampfraction)) - printinfo(io, "ignorefpc", string(design.ignorefpc); newline=false) -end +Base.show(io::IO, ::MIME"text/plain", design::AbstractSurveyDesign) = + surveyshow(surveyio(io), design) -"Print information about a survey design." 
Base.show(io::IO, ::MIME"text/plain", design::SurveyDesign) = - surveyshow(IOContext(io, :compact=>true, :limit=>true, :displaysize=>(50, 50)), design) + surveyshow(surveyio(io), design) + +function Base.show(io::IO, ::MIME"text/plain", design::ReplicateDesign) + # new_io = IOContext(io, :compact=>true, :limit=>true, :displaysize=>(50, 50)) + surveyshow(surveyio(io), design) + printinfo(surveyio(io), "\nreplicates", design.replicates; newline=false) +end -function surveyshow(io::IO, design::SurveyDesign) +function surveyshow(io::IO, design::AbstractSurveyDesign) # structure name type = typeof(design) printstyled(io, "$type:\n"; bold=true) # data info printinfo(io, "data", summary(design.data)) # strata info - strata_content = - design.strata == :false_strata ? - "none" : - (string(design.strata), "\n ", makeshort(design.data[!, design.strata])) + strata_content = design.strata == :false_strata ? + "none" : + (string(design.strata), "\n ", makeshort(design.data[!, design.strata])) printinfo(io, "strata", strata_content...) # cluster(s) info - cluster_content = - design.cluster == :false_cluster ? - "none" : - (string(design.cluster), "\n ", makeshort(design.data[!, design.cluster])) + cluster_content = design.cluster == :false_cluster ? + "none" : + (string(design.cluster), "\n ", makeshort(design.data[!, design.cluster])) printinfo(io, "cluster", cluster_content...) # popsize and sampsize info printinfo(io, "popsize", "\n ", makeshort(design.data[!, design.popsize])) @@ -66,21 +60,3 @@ function surveyshow(io::IO, design::SurveyDesign) printinfo(io, "weights", "\n ", makeshort(design.data[!, :weights])) printinfo(io, "probs", "\n ", makeshort(design.data[!, :probs]); newline=false) end - -"Print information about a replicate design." 
-function Base.show(io::IO, ::MIME"text/plain", design::ReplicateDesign) - type = typeof(design) - printstyled(io, "$type:\n"; bold=true) - printstyled(io, "data: "; bold=true) - println(io, size(design.data, 1), "x", size(design.data, 2), " DataFrame") - printinfo(io, "cluster", string(design.cluster); newline=true) - printinfo(io, "design.data[!,design.cluster]", makeshort(design.data[!,design.cluster])) - printinfo(io, "popsize", string(design.popsize); newline=true) - printinfo(io, "design.data[!,design.popsize]", makeshort(design.data[!,design.popsize])) - printinfo(io, "sampsize", string(design.sampsize); newline=true) - printinfo(io, "design.data[!,design.sampsize]", makeshort(design.data[!,design.sampsize])) - printinfo(io, "design.data[!,:probs]", makeshort(design.data.probs)) - printinfo(io, "design.data[!,:allprobs]", makeshort(design.data.allprobs)) - printstyled(io, "replicates: "; bold=true) - println(io, design.replicates) -end \ No newline at end of file From 19261c09721514848621a7b43b71e9316f9cf07a Mon Sep 17 00:00:00 2001 From: Iulia Dumitru Date: Wed, 11 Jan 2023 19:37:14 +0530 Subject: [PATCH 10/19] Revert "Change show for `AbstractSurveyDesign` and `ReplicateDesign`, restructure code" This reverts commit b92a8d8ed026131248e586595e06341d50af7ff7. --- src/show.jl | 60 +++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 42 insertions(+), 18 deletions(-) diff --git a/src/show.jl b/src/show.jl index 2af3d6d1..5dd5b21a 100644 --- a/src/show.jl +++ b/src/show.jl @@ -1,5 +1,3 @@ -surveyio(io) = IOContext(io, :compact=>true, :limit=>true, :displaysize=>(50, 50)) - """ Helper function that transforms a given `Number` or `Vector` into a short-form string. """ @@ -25,33 +23,41 @@ function printinfo(io::IO, name::String, content, args...; newline::Bool=true) end "Print information about a survey design." 
-Base.show(io::IO, ::MIME"text/plain", design::AbstractSurveyDesign) = - surveyshow(surveyio(io), design) +function Base.show(io::IO, ::MIME"text/plain", design::AbstractSurveyDesign) + type = typeof(design) + printstyled(io, "$type:\n"; bold=true) + printstyled(io, "data: "; bold=true) + println(io, size(design.data, 1), "x", size(design.data, 2), " DataFrame") + printinfo(io, "weights", makeshort(design.data.weights)) + printinfo(io, "probs", makeshort(design.data.probs)) + printinfo(io, "fpc", makeshort(design.data.fpc)) + printinfo(io, "popsize", makeshort(design.popsize)) + printinfo(io, "sampsize", makeshort(design.sampsize)) + printinfo(io, "sampfraction", makeshort(design.sampfraction)) + printinfo(io, "ignorefpc", string(design.ignorefpc); newline=false) +end +"Print information about a survey design." Base.show(io::IO, ::MIME"text/plain", design::SurveyDesign) = - surveyshow(surveyio(io), design) - -function Base.show(io::IO, ::MIME"text/plain", design::ReplicateDesign) - # new_io = IOContext(io, :compact=>true, :limit=>true, :displaysize=>(50, 50)) - surveyshow(surveyio(io), design) - printinfo(surveyio(io), "\nreplicates", design.replicates; newline=false) -end + surveyshow(IOContext(io, :compact=>true, :limit=>true, :displaysize=>(50, 50)), design) -function surveyshow(io::IO, design::AbstractSurveyDesign) +function surveyshow(io::IO, design::SurveyDesign) # structure name type = typeof(design) printstyled(io, "$type:\n"; bold=true) # data info printinfo(io, "data", summary(design.data)) # strata info - strata_content = design.strata == :false_strata ? - "none" : - (string(design.strata), "\n ", makeshort(design.data[!, design.strata])) + strata_content = + design.strata == :false_strata ? + "none" : + (string(design.strata), "\n ", makeshort(design.data[!, design.strata])) printinfo(io, "strata", strata_content...) # cluster(s) info - cluster_content = design.cluster == :false_cluster ? 
- "none" : - (string(design.cluster), "\n ", makeshort(design.data[!, design.cluster])) + cluster_content = + design.cluster == :false_cluster ? + "none" : + (string(design.cluster), "\n ", makeshort(design.data[!, design.cluster])) printinfo(io, "cluster", cluster_content...) # popsize and sampsize info printinfo(io, "popsize", "\n ", makeshort(design.data[!, design.popsize])) @@ -60,3 +66,21 @@ function surveyshow(io::IO, design::AbstractSurveyDesign) printinfo(io, "weights", "\n ", makeshort(design.data[!, :weights])) printinfo(io, "probs", "\n ", makeshort(design.data[!, :probs]); newline=false) end + +"Print information about a replicate design." +function Base.show(io::IO, ::MIME"text/plain", design::ReplicateDesign) + type = typeof(design) + printstyled(io, "$type:\n"; bold=true) + printstyled(io, "data: "; bold=true) + println(io, size(design.data, 1), "x", size(design.data, 2), " DataFrame") + printinfo(io, "cluster", string(design.cluster); newline=true) + printinfo(io, "design.data[!,design.cluster]", makeshort(design.data[!,design.cluster])) + printinfo(io, "popsize", string(design.popsize); newline=true) + printinfo(io, "design.data[!,design.popsize]", makeshort(design.data[!,design.popsize])) + printinfo(io, "sampsize", string(design.sampsize); newline=true) + printinfo(io, "design.data[!,design.sampsize]", makeshort(design.data[!,design.sampsize])) + printinfo(io, "design.data[!,:probs]", makeshort(design.data.probs)) + printinfo(io, "design.data[!,:allprobs]", makeshort(design.data.allprobs)) + printstyled(io, "replicates: "; bold=true) + println(io, design.replicates) +end \ No newline at end of file From b9eefbf13e74d300363872314d54e86757ba0db2 Mon Sep 17 00:00:00 2001 From: Iulia Dumitru Date: Wed, 11 Jan 2023 19:39:24 +0530 Subject: [PATCH 11/19] Revert "Change show for `SurveyDesign`" This reverts commit ae0cb2aa8bc87b7d318d97046f5f8de8d58a1be9. 
--- src/show.jl | 49 ++++++++++++++++--------------------------------- 1 file changed, 16 insertions(+), 33 deletions(-) diff --git a/src/show.jl b/src/show.jl index 5dd5b21a..3319e653 100644 --- a/src/show.jl +++ b/src/show.jl @@ -6,20 +6,16 @@ function makeshort(x) x = round.(x, sigdigits=3) end # print short vectors or single values as they are, compress otherwise - if length(x) > 1 - return "[" * (length(x) < 3 ? join(x, ", ") : join(x[1:3], ", ") * " … " * string(last(x))) * "]" - end - - return x + x = length(x) < 3 ? join(x, ", ") : join(x[1:3], ", ") * ", ..., " * string(last(x)) end """ Print information in the form: **name:** content[\n] """ -function printinfo(io::IO, name::String, content, args...; newline::Bool=true) +function printinfo(io::IO, name::String, content::String; newline::Bool=true) printstyled(io, name, ": "; bold=true) - newline ? println(io, content, args...) : print(io, content, args...) + newline ? println(io, content) : print(io, content) end "Print information about a survey design." @@ -37,37 +33,24 @@ function Base.show(io::IO, ::MIME"text/plain", design::AbstractSurveyDesign) printinfo(io, "ignorefpc", string(design.ignorefpc); newline=false) end -"Print information about a survey design." -Base.show(io::IO, ::MIME"text/plain", design::SurveyDesign) = - surveyshow(IOContext(io, :compact=>true, :limit=>true, :displaysize=>(50, 50)), design) -function surveyshow(io::IO, design::SurveyDesign) - # structure name +"Print information about a survey design." +function Base.show(io::IO, ::MIME"text/plain", design::SurveyDesign) type = typeof(design) printstyled(io, "$type:\n"; bold=true) - # data info - printinfo(io, "data", summary(design.data)) - # strata info - strata_content = - design.strata == :false_strata ? - "none" : - (string(design.strata), "\n ", makeshort(design.data[!, design.strata])) - printinfo(io, "strata", strata_content...) - # cluster(s) info - cluster_content = - design.cluster == :false_cluster ? 
- "none" : - (string(design.cluster), "\n ", makeshort(design.data[!, design.cluster])) - printinfo(io, "cluster", cluster_content...) - # popsize and sampsize info - printinfo(io, "popsize", "\n ", makeshort(design.data[!, design.popsize])) - printinfo(io, "sampsize", "\n ", makeshort(design.data[!, design.sampsize])) - # weights and probs info - printinfo(io, "weights", "\n ", makeshort(design.data[!, :weights])) - printinfo(io, "probs", "\n ", makeshort(design.data[!, :probs]); newline=false) + printstyled(io, "data: "; bold=true) + println(io, size(design.data, 1), "x", size(design.data, 2), " DataFrame") + printinfo(io, "cluster", string(design.cluster); newline=true) + printinfo(io, "design.data[!,design.cluster]", makeshort(design.data[!,design.cluster])) + printinfo(io, "popsize", string(design.popsize); newline=true) + printinfo(io, "design.data[!,design.popsize]", makeshort(design.data[!,design.popsize])) + printinfo(io, "sampsize", string(design.sampsize); newline=true) + printinfo(io, "design.data[!,design.sampsize]", makeshort(design.data[!,design.sampsize])) + printinfo(io, "design.data[!,:probs]", makeshort(design.data.probs)) + printinfo(io, "design.data[!,:allprobs]", makeshort(design.data.allprobs)) end -"Print information about a replicate design." +"Print information about a repliocate design." function Base.show(io::IO, ::MIME"text/plain", design::ReplicateDesign) type = typeof(design) printstyled(io, "$type:\n"; bold=true) From 559cc67905d425f2751d7b5bc9d916fdd9a19e6d Mon Sep 17 00:00:00 2001 From: Iulia Dumitru Date: Wed, 11 Jan 2023 19:42:41 +0530 Subject: [PATCH 12/19] Revert "Revert "Change show for `SurveyDesign`"" This reverts commit 15e34ece548cc990bfeca4683150ee2350032705. 
--- src/show.jl | 49 +++++++++++++++++++++++++++++++++---------------- 1 file changed, 33 insertions(+), 16 deletions(-) diff --git a/src/show.jl b/src/show.jl index 3319e653..5dd5b21a 100644 --- a/src/show.jl +++ b/src/show.jl @@ -6,16 +6,20 @@ function makeshort(x) x = round.(x, sigdigits=3) end # print short vectors or single values as they are, compress otherwise - x = length(x) < 3 ? join(x, ", ") : join(x[1:3], ", ") * ", ..., " * string(last(x)) + if length(x) > 1 + return "[" * (length(x) < 3 ? join(x, ", ") : join(x[1:3], ", ") * " … " * string(last(x))) * "]" + end + + return x end """ Print information in the form: **name:** content[\n] """ -function printinfo(io::IO, name::String, content::String; newline::Bool=true) +function printinfo(io::IO, name::String, content, args...; newline::Bool=true) printstyled(io, name, ": "; bold=true) - newline ? println(io, content) : print(io, content) + newline ? println(io, content, args...) : print(io, content, args...) end "Print information about a survey design." @@ -33,24 +37,37 @@ function Base.show(io::IO, ::MIME"text/plain", design::AbstractSurveyDesign) printinfo(io, "ignorefpc", string(design.ignorefpc); newline=false) end - "Print information about a survey design." 
-function Base.show(io::IO, ::MIME"text/plain", design::SurveyDesign) +Base.show(io::IO, ::MIME"text/plain", design::SurveyDesign) = + surveyshow(IOContext(io, :compact=>true, :limit=>true, :displaysize=>(50, 50)), design) + +function surveyshow(io::IO, design::SurveyDesign) + # structure name type = typeof(design) printstyled(io, "$type:\n"; bold=true) - printstyled(io, "data: "; bold=true) - println(io, size(design.data, 1), "x", size(design.data, 2), " DataFrame") - printinfo(io, "cluster", string(design.cluster); newline=true) - printinfo(io, "design.data[!,design.cluster]", makeshort(design.data[!,design.cluster])) - printinfo(io, "popsize", string(design.popsize); newline=true) - printinfo(io, "design.data[!,design.popsize]", makeshort(design.data[!,design.popsize])) - printinfo(io, "sampsize", string(design.sampsize); newline=true) - printinfo(io, "design.data[!,design.sampsize]", makeshort(design.data[!,design.sampsize])) - printinfo(io, "design.data[!,:probs]", makeshort(design.data.probs)) - printinfo(io, "design.data[!,:allprobs]", makeshort(design.data.allprobs)) + # data info + printinfo(io, "data", summary(design.data)) + # strata info + strata_content = + design.strata == :false_strata ? + "none" : + (string(design.strata), "\n ", makeshort(design.data[!, design.strata])) + printinfo(io, "strata", strata_content...) + # cluster(s) info + cluster_content = + design.cluster == :false_cluster ? + "none" : + (string(design.cluster), "\n ", makeshort(design.data[!, design.cluster])) + printinfo(io, "cluster", cluster_content...) + # popsize and sampsize info + printinfo(io, "popsize", "\n ", makeshort(design.data[!, design.popsize])) + printinfo(io, "sampsize", "\n ", makeshort(design.data[!, design.sampsize])) + # weights and probs info + printinfo(io, "weights", "\n ", makeshort(design.data[!, :weights])) + printinfo(io, "probs", "\n ", makeshort(design.data[!, :probs]); newline=false) end -"Print information about a repliocate design." 
+"Print information about a replicate design." function Base.show(io::IO, ::MIME"text/plain", design::ReplicateDesign) type = typeof(design) printstyled(io, "$type:\n"; bold=true) From cdeb607b055ec91c276d6f422eab1c82e16c39a6 Mon Sep 17 00:00:00 2001 From: Iulia Dumitru Date: Wed, 11 Jan 2023 19:42:54 +0530 Subject: [PATCH 13/19] Revert "Revert "Change show for `AbstractSurveyDesign` and `ReplicateDesign`, restructure code"" This reverts commit aaeebb1c35685f64548caaae9349bb1f0b10299e. --- src/show.jl | 60 ++++++++++++++++------------------------------------- 1 file changed, 18 insertions(+), 42 deletions(-) diff --git a/src/show.jl b/src/show.jl index 5dd5b21a..2af3d6d1 100644 --- a/src/show.jl +++ b/src/show.jl @@ -1,3 +1,5 @@ +surveyio(io) = IOContext(io, :compact=>true, :limit=>true, :displaysize=>(50, 50)) + """ Helper function that transforms a given `Number` or `Vector` into a short-form string. """ @@ -23,41 +25,33 @@ function printinfo(io::IO, name::String, content, args...; newline::Bool=true) end "Print information about a survey design." -function Base.show(io::IO, ::MIME"text/plain", design::AbstractSurveyDesign) - type = typeof(design) - printstyled(io, "$type:\n"; bold=true) - printstyled(io, "data: "; bold=true) - println(io, size(design.data, 1), "x", size(design.data, 2), " DataFrame") - printinfo(io, "weights", makeshort(design.data.weights)) - printinfo(io, "probs", makeshort(design.data.probs)) - printinfo(io, "fpc", makeshort(design.data.fpc)) - printinfo(io, "popsize", makeshort(design.popsize)) - printinfo(io, "sampsize", makeshort(design.sampsize)) - printinfo(io, "sampfraction", makeshort(design.sampfraction)) - printinfo(io, "ignorefpc", string(design.ignorefpc); newline=false) -end +Base.show(io::IO, ::MIME"text/plain", design::AbstractSurveyDesign) = + surveyshow(surveyio(io), design) -"Print information about a survey design." 
Base.show(io::IO, ::MIME"text/plain", design::SurveyDesign) = - surveyshow(IOContext(io, :compact=>true, :limit=>true, :displaysize=>(50, 50)), design) + surveyshow(surveyio(io), design) + +function Base.show(io::IO, ::MIME"text/plain", design::ReplicateDesign) + # new_io = IOContext(io, :compact=>true, :limit=>true, :displaysize=>(50, 50)) + surveyshow(surveyio(io), design) + printinfo(surveyio(io), "\nreplicates", design.replicates; newline=false) +end -function surveyshow(io::IO, design::SurveyDesign) +function surveyshow(io::IO, design::AbstractSurveyDesign) # structure name type = typeof(design) printstyled(io, "$type:\n"; bold=true) # data info printinfo(io, "data", summary(design.data)) # strata info - strata_content = - design.strata == :false_strata ? - "none" : - (string(design.strata), "\n ", makeshort(design.data[!, design.strata])) + strata_content = design.strata == :false_strata ? + "none" : + (string(design.strata), "\n ", makeshort(design.data[!, design.strata])) printinfo(io, "strata", strata_content...) # cluster(s) info - cluster_content = - design.cluster == :false_cluster ? - "none" : - (string(design.cluster), "\n ", makeshort(design.data[!, design.cluster])) + cluster_content = design.cluster == :false_cluster ? + "none" : + (string(design.cluster), "\n ", makeshort(design.data[!, design.cluster])) printinfo(io, "cluster", cluster_content...) # popsize and sampsize info printinfo(io, "popsize", "\n ", makeshort(design.data[!, design.popsize])) @@ -66,21 +60,3 @@ function surveyshow(io::IO, design::SurveyDesign) printinfo(io, "weights", "\n ", makeshort(design.data[!, :weights])) printinfo(io, "probs", "\n ", makeshort(design.data[!, :probs]); newline=false) end - -"Print information about a replicate design." 
-function Base.show(io::IO, ::MIME"text/plain", design::ReplicateDesign) - type = typeof(design) - printstyled(io, "$type:\n"; bold=true) - printstyled(io, "data: "; bold=true) - println(io, size(design.data, 1), "x", size(design.data, 2), " DataFrame") - printinfo(io, "cluster", string(design.cluster); newline=true) - printinfo(io, "design.data[!,design.cluster]", makeshort(design.data[!,design.cluster])) - printinfo(io, "popsize", string(design.popsize); newline=true) - printinfo(io, "design.data[!,design.popsize]", makeshort(design.data[!,design.popsize])) - printinfo(io, "sampsize", string(design.sampsize); newline=true) - printinfo(io, "design.data[!,design.sampsize]", makeshort(design.data[!,design.sampsize])) - printinfo(io, "design.data[!,:probs]", makeshort(design.data.probs)) - printinfo(io, "design.data[!,:allprobs]", makeshort(design.data.allprobs)) - printstyled(io, "replicates: "; bold=true) - println(io, design.replicates) -end \ No newline at end of file From dfa55a329124031e3c7208cd996c370e679d84b9 Mon Sep 17 00:00:00 2001 From: Iulia Dumitru Date: Wed, 11 Jan 2023 21:09:15 +0530 Subject: [PATCH 14/19] Change `surveyio(io)` to `io` --- src/show.jl | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/show.jl b/src/show.jl index 2af3d6d1..90782c5b 100644 --- a/src/show.jl +++ b/src/show.jl @@ -26,15 +26,15 @@ end "Print information about a survey design." 
Base.show(io::IO, ::MIME"text/plain", design::AbstractSurveyDesign) = - surveyshow(surveyio(io), design) + surveyshow(io, design) Base.show(io::IO, ::MIME"text/plain", design::SurveyDesign) = - surveyshow(surveyio(io), design) + surveyshow(io, design) function Base.show(io::IO, ::MIME"text/plain", design::ReplicateDesign) # new_io = IOContext(io, :compact=>true, :limit=>true, :displaysize=>(50, 50)) - surveyshow(surveyio(io), design) - printinfo(surveyio(io), "\nreplicates", design.replicates; newline=false) + surveyshow(io, design) + printinfo(io, "\nreplicates", design.replicates; newline=false) end function surveyshow(io::IO, design::AbstractSurveyDesign) From 90b941d29140ef02bffba2851b9c5e3697a31f53 Mon Sep 17 00:00:00 2001 From: Iulia Dumitru Date: Thu, 12 Jan 2023 10:44:52 +0530 Subject: [PATCH 15/19] Remove new line after `popsize`, `sampsize` and weights --- src/show.jl | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/src/show.jl b/src/show.jl index 90782c5b..a72c3250 100644 --- a/src/show.jl +++ b/src/show.jl @@ -11,7 +11,6 @@ function makeshort(x) if length(x) > 1 return "[" * (length(x) < 3 ? join(x, ", ") : join(x[1:3], ", ") * " … " * string(last(x))) * "]" end - return x end @@ -54,9 +53,9 @@ function surveyshow(io::IO, design::AbstractSurveyDesign) (string(design.cluster), "\n ", makeshort(design.data[!, design.cluster])) printinfo(io, "cluster", cluster_content...) 
# popsize and sampsize info - printinfo(io, "popsize", "\n ", makeshort(design.data[!, design.popsize])) - printinfo(io, "sampsize", "\n ", makeshort(design.data[!, design.sampsize])) + printinfo(io, "popsize", makeshort(design.data[!, design.popsize])) + printinfo(io, "sampsize", makeshort(design.data[!, design.sampsize])) # weights and probs info - printinfo(io, "weights", "\n ", makeshort(design.data[!, :weights])) - printinfo(io, "probs", "\n ", makeshort(design.data[!, :probs]); newline=false) + printinfo(io, "weights", makeshort(design.data[!, :weights])) + printinfo(io, "probs", makeshort(design.data[!, :probs]); newline=false) end From 9fe4bf7828d5fae19695edfc43832d986090cd1f Mon Sep 17 00:00:00 2001 From: Iulia Dumitru Date: Thu, 12 Jan 2023 13:56:54 +0530 Subject: [PATCH 16/19] Change docstrings to account for new `show` --- src/Survey.jl | 2 +- src/SurveyDesign.jl | 38 ++++++++++++++++++-------------------- src/bootstrap.jl | 23 ++++++++++------------- src/boxplot.jl | 2 +- src/by.jl | 2 +- src/hist.jl | 2 +- src/jackknife.jl | 2 +- src/mean.jl | 12 ++++++------ src/ratio.jl | 11 ++++------- src/total.jl | 12 ++++++------ 10 files changed, 49 insertions(+), 57 deletions(-) diff --git a/src/Survey.jl b/src/Survey.jl index dd71a092..f25e33a7 100644 --- a/src/Survey.jl +++ b/src/Survey.jl @@ -38,4 +38,4 @@ export bootweights export jkknife export ratio -end \ No newline at end of file +end diff --git a/src/SurveyDesign.jl b/src/SurveyDesign.jl index 69657cd0..0cc1e6b0 100644 --- a/src/SurveyDesign.jl +++ b/src/SurveyDesign.jl @@ -28,19 +28,18 @@ individuals in one cluster are sampled. The clusters are considered disjoint and - `popsize::Union{Nothing, Int, Symbol}=nothing`: the (expected) survey population size. 
```jldoctest -julia> apiclus1 = load_data("apiclus1"); +julia> apistrat = load_data("apistrat"); -julia> dclus1 = SurveyDesign(apiclus1; clusters=:dnum, weights=:pw) +julia> strat = SurveyDesign(apistrat; strata=:stype, weights=:pw) SurveyDesign: -data: 183x46 DataFrame -cluster: dnum -design.data[!,design.cluster]: 637, 637, 637, ..., 448 -popsize: popsize -design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0 -sampsize: sampsize -design.data[!,design.sampsize]: 15, 15, 15, ..., 15 -design.data[!,:probs]: 0.0295, 0.0295, 0.0295, ..., 0.0295 -design.data[!,:allprobs]: 0.0295, 0.0295, 0.0295, ..., 0.0295 +data: 200×46 DataFrame +strata: stype + [E, E, E … H] +cluster: none +popsize: [6190.0, 6190.0, 6190.0 … 6190.0] +sampsize: [200, 200, 200 … 200] +weights: [44.2, 44.2, 44.2 … 15.1] +probs: [0.0226, 0.0226, 0.0226 … 0.0662] ``` """ struct SurveyDesign <: AbstractSurveyDesign @@ -107,15 +106,14 @@ julia> strat = SurveyDesign(apistrat; strata=:stype, weights=:pw); julia> bootstrat = bootweights(strat; replicates=1000) ReplicateDesign: -data: 200x1046 DataFrame -cluster: false_cluster -design.data[!,design.cluster]: 1, 2, 3, ..., 200 -popsize: popsize -design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0 -sampsize: sampsize -design.data[!,design.sampsize]: 200, 200, 200, ..., 200 -design.data[!,:probs]: 0.0226, 0.0226, 0.0226, ..., 0.0662 -design.data[!,:allprobs]: 0.0226, 0.0226, 0.0226, ..., 0.0662 +data: 200×1046 DataFrame +strata: stype + [E, E, E … H] +cluster: none +popsize: [6190.0, 6190.0, 6190.0 … 6190.0] +sampsize: [200, 200, 200 … 200] +weights: [44.2, 44.2, 44.2 … 15.1] +probs: [0.0226, 0.0226, 0.0226 … 0.0662] replicates: 1000 ``` """ diff --git a/src/bootstrap.jl b/src/bootstrap.jl index b4e226a8..83defc97 100644 --- a/src/bootstrap.jl +++ b/src/bootstrap.jl @@ -4,21 +4,18 @@ julia> using Random julia> apiclus1 = load_data("apiclus1"); -julia> dclus1 = SurveyDesign(apiclus1; clusters = :dnum); +julia> clus_one_stage = 
SurveyDesign(apiclus1; clusters = :dnum); -julia> rng = MersenneTwister(111); - -julia> bootweights(dclus1; replicates=1000, rng) +julia> bootweights(clus_one_stage; replicates=1000, rng=MersenneTwister(111)) # choose a seed for deterministic results ReplicateDesign: -data: 183x1046 DataFrame +data: 183×1046 DataFrame +strata: none cluster: dnum -design.data[!,design.cluster]: 637, 637, 637, ..., 448 -popsize: popsize -design.data[!,design.popsize]: 183, 183, 183, ..., 183 -sampsize: sampsize -design.data[!,design.sampsize]: 15, 15, 15, ..., 15 -design.data[!,:probs]: 1.0, 1.0, 1.0, ..., 1.0 -design.data[!,:allprobs]: 1.0, 1.0, 1.0, ..., 1.0 + [637, 637, 637 … 448] +popsize: [183, 183, 183 … 183] +sampsize: [15, 15, 15 … 15] +weights: [1, 1, 1 … 1] +probs: [1.0, 1.0, 1.0 … 1.0] replicates: 1000 ``` """ @@ -51,4 +48,4 @@ function bootweights(design::SurveyDesign; replicates=4000, rng=MersenneTwister( df[!, "replicate_" * string(i)] = disallowmissing(replicate(stratified, H).whij) end return ReplicateDesign(df, design.cluster, design.popsize, design.sampsize, design.strata, design.pps, replicates) -end \ No newline at end of file +end diff --git a/src/boxplot.jl b/src/boxplot.jl index 8790f116..8ee3dcc4 100644 --- a/src/boxplot.jl +++ b/src/boxplot.jl @@ -10,7 +10,7 @@ The keyword arguments are all the arguments that can be passed to `mapping` in ```@example boxplot apisrs = load_data("apisrs"); -srs = srs = SurveyDesign(apisrs; weights=:pw); +srs = SurveyDesign(apisrs; weights=:pw); bp = boxplot(srs, :stype, :enroll; weights = :pw) save("boxplot.png", bp); nothing # hide ``` diff --git a/src/by.jl b/src/by.jl index be26d5a3..a4de2f55 100644 --- a/src/by.jl +++ b/src/by.jl @@ -14,4 +14,4 @@ function bydomain(x::Symbol, domain::Symbol, design::ReplicateDesign, func::Func replace!(ses, NaN => 0) X.SE = ses return X -end \ No newline at end of file +end diff --git a/src/hist.jl b/src/hist.jl index 90d42d1b..17b54098 100644 --- a/src/hist.jl +++ b/src/hist.jl @@ -61,7 
+61,7 @@ For the complete argument list see [Makie.hist](https://makie.juliaplots.org/sta ```@example histogram apisrs = load_data("apisrs"); -srs = SimpleRandomSample(apisrs;popsize=:fpc); +srs = SurveyDesign(apisrs; weights=:pw); h = hist(srs, :enroll) save("hist.png", h); nothing # hide ``` diff --git a/src/jackknife.jl b/src/jackknife.jl index 794ef10b..55880df9 100644 --- a/src/jackknife.jl +++ b/src/jackknife.jl @@ -13,4 +13,4 @@ function jkknife(variable:: Symbol, design::SurveyDesign ,func:: Function; para end var = c*(nh-1)/nh return DataFrame(Statistic = statistic, SE = sqrt(var)) -end \ No newline at end of file +end diff --git a/src/mean.jl b/src/mean.jl index 0ef5bb37..c1d80259 100644 --- a/src/mean.jl +++ b/src/mean.jl @@ -6,16 +6,16 @@ Compute the estimated mean of one or more variables within a survey design. ```jldoctest julia> apiclus1 = load_data("apiclus1"); -julia> clus1 = SurveyDesign(apiclus1; clusters = :dnum, weights = :pw) |> bootweights; +julia> clus_one_stage = SurveyDesign(apiclus1; clusters = :dnum, weights = :pw) |> bootweights; -julia> mean(:api00, clus1) +julia> mean(:api00, clus_one_stage) 1×2 DataFrame Row │ mean SE │ Float64 Float64 ─────┼────────────────── 1 │ 644.169 23.2919 -julia> mean([:api00, :enroll], clus1) +julia> mean([:api00, :enroll], clus_one_stage) 2×3 DataFrame Row │ names mean SE │ String Float64 Float64 @@ -45,9 +45,9 @@ Compute the estimated mean within a domain. 
```jldoctest julia> apiclus1 = load_data("apiclus1"); -julia> clus1 = SurveyDesign(apiclus1; clusters = :dnum, weights = :pw) |> bootweights; +julia> clus_one_stage = SurveyDesign(apiclus1; clusters = :dnum, weights = :pw) |> bootweights; -julia> mean(:api00, :cname, clus1) +julia> mean(:api00, :cname, clus_one_stage) 11×3 DataFrame Row │ cname mean SE │ String15 Float64 Any @@ -70,4 +70,4 @@ function mean(x::Symbol, domain::Symbol, design::ReplicateDesign) df = bydomain(x, domain, design, weighted_mean) rename!(df, :statistic => :mean) return df -end \ No newline at end of file +end diff --git a/src/ratio.jl b/src/ratio.jl index 67e51668..1623eb3a 100644 --- a/src/ratio.jl +++ b/src/ratio.jl @@ -1,17 +1,14 @@ """ ratio(numerator, denominator, design) + Estimate the ratio of the columns specified in numerator and denominator ```jldoctest -julia> using Survey; - julia> apiclus1 = load_data("apiclus1"); -julia> apiclus1[!, :pw] = fill(757/15,(size(apiclus1,1),)); # Correct api mistake for pw column - -julia> dclus1 = SurveyDesign(apiclus1; clusters = :dnum, weights = :pw); +julia> clus_one_stage = SurveyDesign(apiclus1; clusters = :dnum, weights = :pw); -julia> ratio(:api00, :enroll, dclus1) +julia> ratio(:api00, :enroll, clus_one_stage) 1×2 DataFrame Row │ Statistic SE │ Float64 Float64 @@ -35,4 +32,4 @@ function ratio(variable_num:: Symbol, variable_den:: Symbol, design::SurveyDesig end var = c*(nh-1)/nh return DataFrame(Statistic = statistic, SE = sqrt(var)) -end \ No newline at end of file +end diff --git a/src/total.jl b/src/total.jl index e5fbbdcb..0c5001e5 100644 --- a/src/total.jl +++ b/src/total.jl @@ -6,16 +6,16 @@ Compute the estimated population total for one or more variables within a survey ```jldoctest julia> apiclus1 = load_data("apiclus1"); -julia> clus1 = SurveyDesign(apiclus1; clusters = :dnum, weights = :pw) |> bootweights; +julia> clus_one_stage = SurveyDesign(apiclus1; clusters = :dnum, weights = :pw) |> bootweights; -julia> total(:api00, clus1) 
+julia> total(:api00, clus_one_stage) 1×2 DataFrame Row │ total SE │ Float64 Float64 ─────┼────────────────────── 1 │ 3.98999e6 9.22175e5 -julia> total([:api00, :enroll], clus1) +julia> total([:api00, :enroll], clus_one_stage) 2×3 DataFrame Row │ names total SE │ String Float64 Float64 @@ -45,9 +45,9 @@ Compute the estimated population total within a domain. ```jldoctest julia> apiclus1 = load_data("apiclus1"); -julia> clus1 = SurveyDesign(apiclus1; clusters = :dnum, weights = :pw) |> bootweights; +julia> clus_one_stage = SurveyDesign(apiclus1; clusters = :dnum, weights = :pw) |> bootweights; -julia> total(:api00, :cname, clus1) +julia> total(:api00, :cname, clus_one_stage) 11×3 DataFrame Row │ cname total SE │ String15 Float64 Any @@ -68,4 +68,4 @@ julia> total(:api00, :cname, clus1) function total(x::Symbol, domain::Symbol, design::ReplicateDesign) df = bydomain(x, domain, design, wsum) rename!(df, :statistic => :total) -end \ No newline at end of file +end From ca92b7223bc53ebf8042892e67faf1779f852e34 Mon Sep 17 00:00:00 2001 From: Iulia Dumitru Date: Thu, 12 Jan 2023 14:50:42 +0530 Subject: [PATCH 17/19] Add tests for `show` --- test/runtests.jl | 3 +- test/show.jl | 111 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 113 insertions(+), 1 deletion(-) create mode 100644 test/show.jl diff --git a/test/runtests.jl b/test/runtests.jl index e8f18a3a..6bb9c738 100644 --- a/test/runtests.jl +++ b/test/runtests.jl @@ -16,4 +16,5 @@ include("mean.jl") include("plot.jl") include("hist.jl") include("boxplot.jl") -include("ratio.jl") \ No newline at end of file +include("ratio.jl") +include("show.jl") diff --git a/test/show.jl b/test/show.jl new file mode 100644 index 00000000..f81ed2a3 --- /dev/null +++ b/test/show.jl @@ -0,0 +1,111 @@ +@testset "No strata, no clusters" begin + io = IOBuffer() + + apisrs = load_data("apisrs") + srs = SurveyDesign(apisrs; weights=:pw) + refstr = """ + SurveyDesign: + data: 200×47 DataFrame + strata: none + cluster: none + 
popsize: [6190.0, 6190.0, 6190.0 … 6190.0]
+    sampsize: [200, 200, 200 … 200]
+    weights: [31.0, 31.0, 31.0 … 31.0]
+    probs: [0.0323, 0.0323, 0.0323 … 0.0323]"""
+
+    show(io, MIME("text/plain"), srs)
+    str = String(take!(io))
+    @test str == refstr
+
+    bsrs = srs |> bootweights
+    refstrb = """
+    ReplicateDesign:
+    data: 200×4047 DataFrame
+    strata: none
+    cluster: none
+    popsize: [6190.0, 6190.0, 6190.0 … 6190.0]
+    sampsize: [200, 200, 200 … 200]
+    weights: [31.0, 31.0, 31.0 … 31.0]
+    probs: [0.0323, 0.0323, 0.0323 … 0.0323]
+    replicates: 4000"""
+
+    show(io, MIME("text/plain"), bsrs)
+    strb = String(take!(io))
+    @test strb == refstrb
+end
+
+@testset "With strata, no clusters" begin
+    io = IOBuffer()
+
+    apistrat = load_data("apistrat")
+    strat = SurveyDesign(apistrat; strata=:stype, weights=:pw)
+    refstr = """
+    SurveyDesign:
+    data: 200×46 DataFrame
+    strata: stype
+        [E, E, E … H]
+    cluster: none
+    popsize: [6190.0, 6190.0, 6190.0 … 6190.0]
+    sampsize: [200, 200, 200 … 200]
+    weights: [44.2, 44.2, 44.2 … 15.1]
+    probs: [0.0226, 0.0226, 0.0226 … 0.0662]"""
+
+    show(io, MIME("text/plain"), strat)
+    str = String(take!(io))
+    @test str == refstr
+
+    stratb = strat |> bootweights
+    refstrb = """
+    ReplicateDesign:
+    data: 200×4046 DataFrame
+    strata: stype
+        [E, E, E … H]
+    cluster: none
+    popsize: [6190.0, 6190.0, 6190.0 … 6190.0]
+    sampsize: [200, 200, 200 … 200]
+    weights: [44.2, 44.2, 44.2 … 15.1]
+    probs: [0.0226, 0.0226, 0.0226 … 0.0662]
+    replicates: 4000"""
+
+    show(io, MIME("text/plain"), stratb)
+    strb = String(take!(io))
+    @test strb == refstrb
+end
+
+@testset "No strata, with clusters" begin
+    io = IOBuffer()
+
+    apiclus1 = load_data("apiclus1")
+    clus_one_stage = SurveyDesign(apiclus1; clusters = :dnum, weights = :pw)
+    refstr = """
+    SurveyDesign:
+    data: 183×46 DataFrame
+    strata: none
+    cluster: dnum
+        [637, 637, 637 … 448]
+    popsize: [6190.0, 6190.0, 6190.0 … 6190.0]
+    sampsize: [15, 15, 15 … 15]
+    weights: [33.8, 33.8, 33.8 … 33.8]
+    probs: [0.0295, 0.0295, 0.0295 … 0.0295]"""
+
+    show(io, MIME("text/plain"), clus_one_stage)
+    str = String(take!(io))
+    @test str == refstr
+
+    clus_one_stageb = clus_one_stage |> bootweights
+    refstrb = """
+    ReplicateDesign:
+    data: 183×4046 DataFrame
+    strata: none
+    cluster: dnum
+        [637, 637, 637 … 448]
+    popsize: [6190.0, 6190.0, 6190.0 … 6190.0]
+    sampsize: [15, 15, 15 … 15]
+    weights: [33.8, 33.8, 33.8 … 33.8]
+    probs: [0.0295, 0.0295, 0.0295 … 0.0295]
+    replicates: 4000"""
+
+    show(io, MIME("text/plain"), clus_one_stageb)
+    strb = String(take!(io))
+    @test strb == refstrb
+end

From c42acc7f55ba2431763ba1cb19e526c3deb7af0b Mon Sep 17 00:00:00 2001
From: Iulia Dumitru
Date: Fri, 13 Jan 2023 12:17:26 +0530
Subject: [PATCH 18/19] Integrate changes to `show`

---
 README.md | 79 +++++++++++++++++++++++++------------------------------
 1 file changed, 36 insertions(+), 43 deletions(-)

diff --git a/README.md b/README.md
index a25bfe3a..3f1542a3 100644
--- a/README.md
+++ b/README.md
@@ -41,57 +41,52 @@ julia> apisrs = load_data("apisrs");
 
 julia> srs = SurveyDesign(apisrs; weights=:pw)
 SurveyDesign:
-data: 200x47 DataFrame
-cluster: false_cluster
-design.data[!,design.cluster]: 1, 2, 3, ..., 200
-popsize: popsize
-design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0
-sampsize: sampsize
-design.data[!,design.sampsize]: 200, 200, 200, ..., 200
-design.data[!,:probs]: 0.0323, 0.0323, 0.0323, ..., 0.0323
-design.data[!,:allprobs]: 0.0323, 0.0323, 0.0323, ..., 0.0323
+data: 200×47 DataFrame
+strata: none
+cluster: none
+popsize: [6190.0, 6190.0, 6190.0 … 6190.0]
+sampsize: [200, 200, 200 … 200]
+weights: [31.0, 31.0, 31.0 … 31.0]
+probs: [0.0323, 0.0323, 0.0323 … 0.0323]
 
 julia> apistrat = load_data("apistrat");
 
 julia> strat = SurveyDesign(apistrat; strata=:stype, weights=:pw)
 SurveyDesign:
-data: 200x46 DataFrame
-cluster: false_cluster
-design.data[!,design.cluster]: 1, 2, 3, ..., 200
-popsize: popsize
-design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0
-sampsize: sampsize
-design.data[!,design.sampsize]: 200, 200, 200, ..., 200
-design.data[!,:probs]: 0.0226, 0.0226, 0.0226, ..., 0.0662
-design.data[!,:allprobs]: 0.0226, 0.0226, 0.0226, ..., 0.0662
+data: 200×46 DataFrame
+strata: stype
+    [E, E, E … H]
+cluster: none
+popsize: [6190.0, 6190.0, 6190.0 … 6190.0]
+sampsize: [200, 200, 200 … 200]
+weights: [44.2, 44.2, 44.2 … 15.1]
+probs: [0.0226, 0.0226, 0.0226 … 0.0662]
 
 julia> apiclus1 = load_data("apiclus1");
 
 julia> clus_one_stage = SurveyDesign(apiclus1; clusters=:dnum, weights=:pw)
 SurveyDesign:
-data: 183x46 DataFrame
+data: 183×46 DataFrame
+strata: none
 cluster: dnum
-design.data[!,design.cluster]: 637, 637, 637, ..., 448
-popsize: popsize
-design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0
-sampsize: sampsize
-design.data[!,design.sampsize]: 15, 15, 15, ..., 15
-design.data[!,:probs]: 0.0295, 0.0295, 0.0295, ..., 0.0295
-design.data[!,:allprobs]: 0.0295, 0.0295, 0.0295, ..., 0.0295
+    [637, 637, 637 … 448]
+popsize: [6190.0, 6190.0, 6190.0 … 6190.0]
+sampsize: [15, 15, 15 … 15]
+weights: [33.8, 33.8, 33.8 … 33.8]
+probs: [0.0295, 0.0295, 0.0295 … 0.0295]
 
 julia> apiclus2 = load_data("apiclus2");
 
 julia> clus_two_stage = SurveyDesign(apiclus2; clusters=[:dnum, :snum], weights=:pw)
 SurveyDesign:
-data: 126x47 DataFrame
+data: 126×47 DataFrame
+strata: none
 cluster: dnum
-design.data[!,design.cluster]: 15, 63, 83, ..., 795
-popsize: popsize
-design.data[!,design.popsize]: 5130.0, 5130.0, 5130.0, ..., 5130.0
-sampsize: sampsize
-design.data[!,design.sampsize]: 40, 40, 40, ..., 40
-design.data[!,:probs]: 0.0528, 0.0528, 0.0528, ..., 0.0528
-design.data[!,:allprobs]: 0.0528, 0.0528, 0.0528, ..., 0.0528
+    [15, 63, 83 … 795]
+popsize: [5130.0, 5130.0, 5130.0 … 5130.0]
+sampsize: [40, 40, 40 … 40]
+weights: [18.9, 18.9, 18.9 … 18.9]
+probs: [0.0528, 0.0528, 0.0528 … 0.0528]
 ```
 
 Using these designs we can compute estimates of statistics such as mean and
@@ -102,15 +97,13 @@ to compute the standard errors.
 ```julia
 julia> bootsrs = bootweights(srs; replicates=1000)
 ReplicateDesign:
-data: 200x1047 DataFrame
-cluster: false_cluster
-design.data[!,design.cluster]: 1, 2, 3, ..., 200
-popsize: popsize
-design.data[!,design.popsize]: 6190.0, 6190.0, 6190.0, ..., 6190.0
-sampsize: sampsize
-design.data[!,design.sampsize]: 200, 200, 200, ..., 200
-design.data[!,:probs]: 0.0323, 0.0323, 0.0323, ..., 0.0323
-design.data[!,:allprobs]: 0.0323, 0.0323, 0.0323, ..., 0.0323
+data: 200×1047 DataFrame
+strata: none
+cluster: none
+popsize: [6190.0, 6190.0, 6190.0 … 6190.0]
+sampsize: [200, 200, 200 … 200]
+weights: [31.0, 31.0, 31.0 … 31.0]
+probs: [0.0323, 0.0323, 0.0323 … 0.0323]
 replicates: 1000
 
 julia> mean(:api00, bootsrs)

From ebe857b798b299ff333218e766b7e3e9af29fa10 Mon Sep 17 00:00:00 2001
From: Ayush Patnaik
Date: Fri, 13 Jan 2023 14:02:28 +0530
Subject: [PATCH 19/19] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3f1542a3..df4f5696 100644
--- a/README.md
+++ b/README.md
@@ -161,7 +161,7 @@ All functionalities are supported by each design type. For a more complete guide
 see the [Tutorial](https://xkdr.github.io/Survey.jl/dev/#Basic-demo) section in the
 documentation.
 
-## Future goals
+## Goals
 
 We want to implement all the features provided by the
 [Survey package in R](https://cran.r-project.org/web/packages/survey/index.html)