Directly store replicate weights #168

Merged: 11 commits into xKDR:singledesign on Jan 10, 2023

Conversation

ayushpatnaikgit
Member

  1. Replicate weights are now stored directly, instead of storing the scaling factor and having functions do the multiplication later.

Tests and doctests are failing and need to be updated.
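
For illustration, a minimal sketch of the change (toy data; the column names and scaling factors here are hypothetical, not the package's actual internals):

using DataFrames

# Toy design: base weights plus two bootstrap replicates.
df = DataFrame(x = [1.0, 2.0, 3.0], weights = [10.0, 10.0, 10.0])
scale = [1.5, 0.5]  # per-replicate scaling factors

# Before: only `scale` was stored, so every estimator had to re-scale
# on the fly, e.g. sum(df.x .* df.weights .* scale[r]) per replicate r.

# After: the scaled replicate weights are stored as columns up front,
# and estimators read them directly.
for (r, s) in enumerate(scale)
    df[!, "replicate_$r"] = df.weights .* s
end
totals = [sum(df.x .* df[!, "replicate_$r"]) for r in eachindex(scale)]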

@smishr
Contributor

smishr commented Jan 4, 2023

Do you want to fix them before merging?

Contributor

@smishr smishr left a comment


Looks good. The small mistake in total.jl is also resolved.

Contributor

@smishr smishr left a comment


Should we add some missing-value checks with disallowmissing?
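
For instance, a minimal sketch of that kind of check (using DataFrames.jl's disallowmissing!, which errors if a column contains missing values):

using DataFrames

# Fail fast if the weights column contains missings, instead of
# letting `missing` propagate silently through the estimates.
disallowmissing!(apisrs, :pw)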

@iuliadmtru
Contributor

I am working on fixing the tests and adding more tests for this functionality. I will convert this to a draft PR until that is done.

@iuliadmtru iuliadmtru marked this pull request as draft January 9, 2023 07:33
@iuliadmtru
Contributor

iuliadmtru commented Jan 9, 2023

Does bootstrapping take into account fpc? In the code fpc is not used, but I understand that bootstrapping might account for fpc implicitly. Is that true?

This is relevant for testing. I am comparing our results to the ones in R. What is the correct way to compare?

Let's take total and mean for a simple random sample using the apisrs dataset for example. The Julia code for generating the design and the estimates is:

julia> apisrs = load_data("apisrs");

julia> srs = SurveyDesign(apisrs; weights = :pw) |> bootweights;

julia> tot = total(:api00, srs)
1×2 DataFrame
 Row │ total      SE
     │ Float64    Float64
─────┼────────────────────
   1 │ 4.06689e6  57644.8

julia> mn = mean(:api00, srs)
1×2 DataFrame
 Row │ mean     SE
     │ Float64  Float64
─────┼──────────────────
   1 │ 656.585  9.30656

For R, I believe the corresponding code is:

> srs <- svydesign(data=apisrs, id=~1, weights=~pw)
> srsrep <- as.svrepdesign(srs, type="bootstrap", replicates=4000)
> svytotal(~api00, srsrep)
        total    SE
api00 4066888 56295
> svymean(~api00, srsrep)
        mean     SE
api00 656.58 9.0886

However, specifying fpc gives results closer to ours:

> srs <- svydesign(data=apisrs, id=~1, weights=~pw, fpc=~fpc)
> srsrep <- as.svrepdesign(srs, type="bootstrap", replicates=4000)
> svytotal(~api00, srsrep)
        total    SE
api00 4066888 57549
> svymean(~api00, srsrep)
        mean     SE
api00 656.58 9.2912

While this is not proof that the latter is the actual corresponding code, it does make me wonder. Which approach should I use for testing?

@iuliadmtru
Contributor

iuliadmtru commented Jan 9, 2023

Also, regarding testing: so far we've been using absolute tolerance to check correctness against the equivalent R results. However, we should consider using relative tolerance instead.

In theory, we should use rtol for estimated standard errors and atol for estimated statistics, since we expect the point estimates to match the R results exactly; hence, say, atol = 1e-3 should be a good tolerance for testing. However, in some cases R and Julia don't agree on rounding, and the results only agree within an absolute tolerance of 1. An example:

Julia:

julia> apisrs = load_data("apisrs");

julia> srs = SurveyDesign(apisrs; weights = :pw) |> bootweights;

julia> tot = total(:api00, srs)
1×2 DataFrame
 Row │ total      SE
     │ Float64    Float64
─────┼────────────────────
   1 │ 4.06689e6  57644.8

R:

> srs <- svydesign(data=apisrs, id=~1, weights=~pw)
> srsrep <- as.svrepdesign(srs, type="bootstrap", replicates=4000)
> svytotal(~api00, srsrep)
        total    SE
api00 4066888 56295

An absolute tolerance of 1 is too large for most cases. Hence, I suggest we use relative tolerance for estimated statistics, as well as for estimated standard errors. This has the advantage of consistency.

There is also the case where the result is close to 0, which we might get especially for standard errors. In that case, the relative tolerance would be too strict and we should use atol.
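
A sketch of what this could look like in the test suite (the reference values and the se_hat variable are illustrative, not actual test data):

using Test

# Relative tolerance scales with the magnitude of the estimate, so the
# same rtol style works for totals, means, and their standard errors:
@test isapprox(tot.total[1], 4.066888e6; rtol = 1e-4)
@test isapprox(tot.SE[1], 57549.0; rtol = 1e-1)  # bootstrap SEs are noisy

# Near zero, rtol becomes too strict; fall back to an absolute tolerance:
@test isapprox(se_hat, 0.0; atol = 1e-6)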

@iuliadmtru
Contributor

ReplicateDesign should be supported by ratio and quantile as well. @ayushpatnaikgit should we do this in a separate PR after this one, and add tests for ratio and quantile in that PR?

@iuliadmtru
Contributor

We should also add back the CategoricalArray support.

@ayushpatnaikgit
Member Author

Quoting @iuliadmtru:

"Does bootstrapping take into account fpc? [...] Which one is the correct approach that I should use for testing?"

I tried it with 100,000 replicates. The answers are very close to the analytical solutions, both with and without fpc. So fpc matters.
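
For reference, the textbook SRS standard errors being compared against can be written down directly (a sketch, not the package's internal code):

using Statistics

# Analytical SE of the estimated total under simple random sampling:
#   SE(total) = N * s / sqrt(n), times sqrt(1 - n/N) when fpc is applied.
function srs_total_se(x, N; fpc = true)
    n = length(x)
    se = N * std(x) / sqrt(n)
    return fpc ? se * sqrt(1 - n / N) : se
end

# The SE of the mean is the same expression divided by N.
srs_mean_se(x, N; fpc = true) = srs_total_se(x, N; fpc = fpc) / N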

@ayushpatnaikgit
Member Author

We can merge the PR once the tests and doctests are complete.

@smishr
Contributor

smishr commented Jan 9, 2023

We should also add back the CategoricalArray support.

In a separate PR, not this one.

@smishr
Contributor

smishr commented Jan 9, 2023

ReplicateDesign should be supported by ratio and quantile as well. @ayushpatnaikgit should we do this in a separate PR after this? And add tests for ratio and quantile in that PR.

Yes, this is higher priority than CategoricalArray.

@codecov-commenter

codecov-commenter commented Jan 9, 2023

Codecov Report

Merging #168 (76275cd) into singledesign (eb290eb) will increase coverage by 0.18%.
The diff coverage is 96.00%.

@@               Coverage Diff                @@
##           singledesign     #168      +/-   ##
================================================
+ Coverage         64.61%   64.79%   +0.18%     
================================================
  Files                13       13              
  Lines               195      196       +1     
================================================
+ Hits                126      127       +1     
  Misses               69       69              
Impacted Files        Coverage Δ
src/bootstrap.jl      95.65% <90.90%> (+0.19%) ⬆️
src/SurveyDesign.jl   92.30% <100.00%> (ø)
src/by.jl             100.00% <100.00%> (ø)
src/mean.jl           100.00% <100.00%> (ø)
src/total.jl          100.00% <100.00%> (ø)


@iuliadmtru
Contributor

Tests now pass. I will soon get into documentation and doctests.

We should consider designing more R-independent tests.

@smishr
Contributor

smishr commented Jan 10, 2023

Documentation is still failing, as not all references to analytical solutions have been removed in this branch. That, plus better docs, can come with the current branch.

Merge this now; a separate docs PR can follow.

@ayushpatnaikgit
Member Author

@iuliadmtru can you update README.md?

@iuliadmtru
Contributor

iuliadmtru commented Jan 10, 2023

All tests and doctests should pass now. I made some modifications:

  • All text from api.md is gone. I believe that text is a better fit for index.md in the tutorial/demo section. We should also have more complete docstrings where we write all the relevant information. Right now our docstrings are lacking substance.
  • There are no more references to the old designs (SimpleRandomSample, StratifiedSample) in the documentation.
  • The sturges and freedman_diaconis functions are exported again; they are necessary for hist (the standard rules they implement are sketched after this list). However, I don't think they are relevant enough to our package to include them in the API reference, so I didn't include them. Documenter now gives a warning because of this, and I don't think we can avoid it.
  • The files that I changed now have a coherent style.
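
For reference, the standard rules these functions implement (a sketch; the package's own definitions may differ in details):

using Statistics

# Sturges' rule: the bin count grows with log2 of the sample size.
sturges(n) = ceil(Int, log2(n)) + 1

# Freedman–Diaconis: bin width h = 2 * IQR / n^(1/3); the bin count
# follows from the data range.
function freedman_diaconis(x)
    h = 2 * (quantile(x, 0.75) - quantile(x, 0.25)) / length(x)^(1 / 3)
    return ceil(Int, (maximum(x) - minimum(x)) / h)
end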

There are still a few problems left.

I think we can now close this PR and have clean source code once again, so that the tests for our following PRs don't fail because of this one.

@iuliadmtru iuliadmtru marked this pull request as ready for review January 10, 2023 12:08
@iuliadmtru iuliadmtru requested a review from smishr January 10, 2023 12:08
Contributor

@smishr smishr left a comment


looks good to merge

@smishr smishr merged commit 7e946cc into xKDR:singledesign Jan 10, 2023