Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes in data structures to support more replicate methods #261

Closed
smishr opened this issue Feb 25, 2023 · 9 comments · Fixed by #297
Closed

Changes in data structures to support more replicate methods #261

smishr opened this issue Feb 25, 2023 · 9 comments · Fixed by #297
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed high priority High priority tasks, eg. relating to release softeng Software engineering and integration related

Comments

@smishr
Copy link
Contributor

smishr commented Feb 25, 2023

variance estimation of a function using jackknife (say mean) is inherently different to that with Rao-Wu bootstrap

function mean(x::Symbol, design::ReplicateDesign)
    X = mean(design.data[!, x], weights(design.data[!, design.weights]))
    Xt = [
        mean(design.data[!, x], weights(design.data[!, "replicate_"*string(i)])) for
        i = 1:design.replicates
    ]
    variance = sum((Xt .- X) .^ 2) / design.replicates
    DataFrame(mean = X, SE = sqrt(variance))
end

Jackknife variance calculate multiplies the Sum of Square Errors with a different factor (summing sum((Xt .- X) .^ 2) by (nh-1) / nh over each strata, where nh is number of psus in strata).

So not ideal to keep signature design::ReplicateDesign. Instead I think variance estimation needs system which allows for differing variance calculation for the diff methods (eg Jackknife, BRR, Bootstrap)

@smishr smishr added bug Something isn't working help wanted Extra attention is needed high priority High priority tasks, eg. relating to release softeng Software engineering and integration related labels Feb 25, 2023
@smishr
Copy link
Contributor Author

smishr commented Feb 25, 2023

Perhaps adding types for different type of variance estimation method, and catching that type using multiple dispatch?

@smishr smishr self-assigned this Feb 25, 2023
@smishr smishr changed the title Changes in structures for more replicate methods Changes in data structures to support more replicate methods Feb 25, 2023
@codetalker7
Copy link
Member

How about using value types: so instead of having type as a field in the ReplicateDesign constructor, we have it as a parameter:

# ReplicateType will be a symbol
struct ReplicateDesign{ReplicateType}
    # struct fields
    # constructor will look like the following
    ReplicateDesign{ReplicateType}(args...) = ...
end

So now, we can dispatch on the ReplicateDesign types. A very simple example:

myfunction(design::ReplicateDesign{:bootstrap}) = "This has type bootstrap"
myfunction(design::ReplicateDesign{:jackknife}) = "This has type jackknife"

This is just an idea; if this is the way to go, we'll have to implement it keeping in mind performance: https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-value-type.

@smishr
Copy link
Contributor Author

smishr commented Mar 7, 2023

How about using value types: so instead of having type as a field in the ReplicateDesign constructor, we have it as a parameter:

# ReplicateType will be a symbol
struct ReplicateDesign{ReplicateType}
    # struct fields
    # constructor will look like the following
    ReplicateDesign{ReplicateType}(args...) = ...
end

So now, we can dispatch on the ReplicateDesign types. A very simple example:

myfunction(design::ReplicateDesign{:bootstrap}) = "This has type bootstrap"
myfunction(design::ReplicateDesign{:jackknife}) = "This has type jackknife"

This is just an idea; if this is the way to go, we'll have to implement it keeping in mind performance: https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-value-type.

I think could be the going forward with multiple types of ReplicateDesign. @ayushpatnaikgit you mentioned this is how CRRao.jl does it. You think its the best way to go?

@asinghvi17
Copy link
Member

asinghvi17 commented Mar 25, 2023

If only one line fundamentally differs, then it would make sense to do something like the following, for maximum configurability:

# ReplicateType will be some replicate type
struct ReplicateDesign{ReplicateType}
    # struct fields
    inference_method::ReplicateType
    # constructor will look like the following
    ReplicateDesign{ReplicateType}(args...) = ...
    ReplicateDesign(args..., inf::RepType) = ReplicateDesign{RepType}(args..., inf)
end

Base.@kwdef struct BootstrapReplicates
    iterations = 1000
end

Base.@kwdef struct JackknifeReplicates
...
end

then do some fancy dispatch on an outer constructor which creates the appropriate type. One of these replicate types would then go into replicate_design.inference_method, and presumably one could define some inference/confidence interval methods using these types, which e.g bydomain might call.

I personally wouldn't use symbols for this, because you can't pass any other metadata along. Types tend to be a little easier for extensibility as well.

@smishr
Copy link
Contributor Author

smishr commented Mar 27, 2023

If only one line fundamentally differs, then it would make sense to do something like the following, for maximum configurability:

# ReplicateType will be some replicate type
struct ReplicateDesign{ReplicateType}
    # struct fields
    inference_method::ReplicateType
    # constructor will look like the following
    ReplicateDesign{ReplicateType}(args...) = ...
    ReplicateDesign(args..., inf::RepType) = ReplicateDesign{RepType}(args..., inf)
end

Base.@kwdef struct BootstrapReplicates
    iterations = 1000
end

Base.@kwdef struct JackknifeReplicates
...
end

then do some fancy dispatch on an outer constructor which creates the appropriate type. One of these replicate types would then go into replicate_design.inference_method, and presumably one could define some inference/confidence interval methods using these types, which e.g bydomain might call.

I personally wouldn't use symbols for this, because you can't pass any other metadata along. Types tend to be a little easier for extensibility as well.

lets discuss this suggestion in our next meeting. @ayushpatnaikgit @codetalker7 fyi

@ayushpatnaikgit
Copy link
Member

Guys, this is great. I did some experiments and there are no flaws with this. I don't think there is a need for a meeting. We should implement it.

We should implement Anshul's suggestion for the reason he mentioned.

We don't need an inner constructor right now. bootweights and jknweights can return ReplicateDesign objects.

@smishr
Copy link
Contributor Author

smishr commented Mar 28, 2023

okay. sounds good to me!

@codetalker7
Copy link
Member

Hi everyone, small question: now, since we're having inference_method as a property in every ReplicateDesign object, the following situation might arise: what if someone already has replicate weights (and we support creation of a ReplicateDesign object directly using these replicate weights), but they don't know what inference method was used to create these replicate weights. What should be done in that situation? Should we define an unknown inference type or something?

@ayushpatnaikgit
Copy link
Member

Hi everyone, small question: now, since we're having inference_method as a property in every ReplicateDesign object, the following situation might arise: what if someone already has replicate weights (and we support creation of a ReplicateDesign object directly using these replicate weights), but they don't know what inference method was used to create these replicate weights. What should be done in that situation? Should we define an unknown inference type or something?

How about a default? BootstrapReplicates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed high priority High priority tasks, eg. relating to release softeng Software engineering and integration related
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants