Directly store replicate weights #168
Conversation
…you want to fix before merging?
Looks good. The small mistake in total.jl is also resolved.
How about adding some checks for missing values, with `disallowmissing`?
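A minimal sketch of what such a check could look like; this is an illustration, not code from the PR, and it assumes the estimation variable is a plain DataFrame column such as :api00:

```julia
using DataFrames

# Throws an error if :api00 contains missing values; otherwise it narrows the
# column's element type so downstream code can assume there are no missings.
disallowmissing!(apisrs, :api00)
```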
I am working on fixing the tests and adding more tests for this functionality. I will convert this to a draft PR until that is done.
Does bootstrapping take fpc into account? In the code, fpc is not used, but I understand that bootstrapping might account for it implicitly. Is that true? This is relevant for testing: I am comparing our results to the ones in R, and I want to know the correct way to compare. Let's take

julia> apisrs = load_data("apisrs");
julia> srs = SurveyDesign(apisrs; weights = :pw) |> bootweights;
julia> tot = total(:api00, srs)
1×2 DataFrame
Row │ total SE
│ Float64 Float64
─────┼────────────────────
1 │ 4.06689e6 57644.8
julia> mn = mean(:api00, srs)
1×2 DataFrame
Row │ mean SE
│ Float64 Float64
─────┼──────────────────
1 │ 656.585 9.30656

For R, I believe the corresponding code is:

> srs <- svydesign(data=apisrs, id=~1, weights=~pw)
> srsrep <- as.svrepdesign(srs, type="bootstrap", replicates=4000)
> svytotal(~api00, srsrep)
total SE
api00 4066888 56295
> svymean(~api00, srsrep)
mean SE
api00 656.58 9.0886

However, specifying fpc gives results closer to ours:

> srs <- svydesign(data=apisrs, id=~1, weights=~pw, fpc=~fpc)
> srsrep <- as.svrepdesign(srs, type="bootstrap", replicates=4000)
> svytotal(~api00, srsrep)
total SE
api00 4066888 57549
> svymean(~api00, srsrep)
mean SE
api00 656.58 9.2912

While this does not prove that the latter is the actual corresponding code, it does make me wonder. Which of the two approaches should I use for testing?
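As a side note, here is a rough sketch (my own, not from this thread) of how the analytical SRS standard error of the total compares with and without the finite population correction; it assumes the population size is stored in the fpc column, as in the api dataset:

```julia
using DataFrames, Statistics

n = nrow(apisrs)          # sample size
N = first(apisrs.fpc)     # population size, assumed to be stored in the fpc column

fpc_factor = sqrt(1 - n / N)                           # shrinks the SRS standard error
se_total_srs      = N * std(apisrs.api00) / sqrt(n)    # analytical SE of the total, no fpc
se_total_with_fpc = fpc_factor * se_total_srs          # analytical SE of the total, with fpc
```

Comparing the bootstrap SEs against these two values might help decide which R call is the right reference.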
Also, regarding testing: so far we've been using absolute tolerances to check correctness against the equivalent R results. However, we should consider using relative tolerances instead. For estimated standard errors we should use …

Julia:

julia> apisrs = load_data("apisrs");
julia> srs = SurveyDesign(apisrs; weights = :pw) |> bootweights
julia> tot = total(:api00, srs)
1×2 DataFrame
Row │ total SE
│ Float64 Float64
─────┼────────────────────
1 │ 4.06689e6 57644.8

R:

> srs <- svydesign(data=apisrs, id=~1, weights=~pw)
> srsrep <- as.svrepdesign(srs, type="bootstrap", replicates=4000)
> svytotal(~api00, srsrep)
total SE
api00 4066888 56295

An absolute tolerance of 1 is too large for most cases. Hence, I suggest we use relative tolerances for estimated statistics as well as for estimated standard errors. This has the advantage of consistency. There is also the case when the result is close to 0, which we might get especially for standard errors. In this case the relative tolerance would be too high and we should use …
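A minimal sketch of what such tests could look like; the tolerance values 0.05 and 1e-8 are placeholders, not values agreed on in this thread:

```julia
using Test

# Relative tolerance: scale-free comparison against the R reference value.
@test tot.SE[1] ≈ 56295 rtol = 0.05

# Near zero a relative tolerance is effectively useless, so fall back to an
# absolute tolerance; `x` here is a stand-in for whatever statistic is checked.
x = 0.0
@test isapprox(x, 0.0; atol = 1e-8)
```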
We should also add back the …
I tried it with 100,000 replicates. The answers are very close to the corresponding analytical solutions, both with and without fpc. So fpc matters.
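For what it's worth, a sketch of that check; it assumes `bootweights` accepts a `replicates` keyword (suggested by the default of 4000 replicates used elsewhere in the thread):

```julia
# Rerun the bootstrap with many more replicates; with enough replicates the
# estimated SEs should settle near the analytical values discussed above.
srs_big = bootweights(SurveyDesign(apisrs; weights = :pw); replicates = 100_000)
total(:api00, srs_big)
mean(:api00, srs_big)
```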
We can merge the PR once the tests and doctests are complete.
In a separate PR, not this one.
Yes, this is higher priority than …
Codecov Report
@@ Coverage Diff @@
## singledesign #168 +/- ##
================================================
+ Coverage 64.61% 64.79% +0.18%
================================================
Files 13 13
Lines 195 196 +1
================================================
+ Hits 126 127 +1
Misses 69 69
Tests now pass. I will soon move on to documentation and doctests. We should consider designing more R-independent tests.
Documentation is still failing, since not all references to analytical solutions have been removed in this branch. That, together with better docs for the current branch, can go in a separate doc PR; let's merge this one now.
@iuliadmtru, can you update README.md?
All tests and doctests should pass now. I made some modifications:
There are still a few problems left:
I think we can now close this PR and have clean source code once again, so that the tests for our following PRs don't fail because of this one.
Looks good to merge.
Tests and doctests are failing. They need to be changed.