-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Identification algorithms #82
Conversation
Thanks, much appreciated :) |
src/gensearch.jl
Outdated
end | ||
end | ||
foreach(s -> genvisit(g, s, INIT), S) | ||
return Set([x for x in 1:nv(g) if visited[3*x-2] || visited[3*x-1] || visited[3*x]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this line called often?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is called once per call of gensearch.
gensearch returns the set of all visited vertices. Imho this is the most general/flexible way to do it.
In the context of d-separation, one call of gensearch with S = X (and the right pass function) will return the set of all vertices d-connected with X. To test whether Y is d-separated from X one could then check whether the intersection of the returned set and Y is empty.
Alternatives would be:
(i) to return a boolean vector of visited vertices (less overhead, but also not so convenient to work with)
(ii) to write a second gensearch function which tests reachability (e.g., in the d-separation example it would directly return a boolean). This could be used in cases where the whole set of visited vertices is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I think it’s fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return Set([x for x in 1:nv(g) if visited[3*x-2] || visited[3*x-1] || visited[3*x]]) | |
return Set(x for x in 1:nv(g) if visited[3*x-2] || visited[3*x-1] || visited[3*x]) |
would make sense.
What is the status? |
We might even wait for JuliaGraphs/Graphs.jl#266 |
We now have the functions find, find_min and list for the adjustment criterion and for the back-door criterion. The generalization of back-door to sets Generally, I would be in favor of only having the adjustment criterion functions. Dagitty also does it like this. The criterion is sound and complete for adjustment, is well-studied from the algorithmic side and in any case the differences are minor. E.g. if Finally, we might also want to add stuff on efficient adjustment sets (those with smallest asymptotic variance) and as far as I know those base on the adjustment criterion as well. Having both functions could also be confusing. |
We you can either kick them out or add a sentence to the doc string “Provided for comparison only” |
For the docs, we might change https://mschauer.github.io/CausalInference.jl/latest/examples/backdoor_example/ to the new tools. By the way, is it correct that there |
Sounds good. If we explain the difference between back-door criterion and adjustment criterion in the docs I'm also fine with having implementations for both. Might actually be a good idea to explicitly discuss the differences instead of just give a function for one of them.
No, that does not seem correct :/ Which function outputs this?
And that appears to be fine... |
Sorry for the noise, I looked at nodes 4 and 6 in error. |
This is really cool and quite useable
PS: As test:
|
Next TODOs are:
Hope I find some time tomorrow and next week for it |
I agree with the TODOs. The "Reasoning about experiments" can be done in a separate PR if we want to be a bit more incremental. |
Agreed, that's better |
List all d-separators `Z` with `I subseteq Z subseteq R` for sets of vertices `X` and `Y` in `g`. | ||
""" | ||
function list_dseps(g, X, Y, I = Set{eltype(g)}(), R = setdiff(Set(vertices(g)), X, Y)) | ||
X, Y, I, R = toset.((X, Y, I, R)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. the cast on X and Y is not necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean for list functions specifically or also in other places?
Here find_dsep
would cast X at every call (which happens during list_dseps``) if we don't do the cast here. It's true that
find_dsepalso doesn't really need X to be a Set, however, e.g.
find_covariate_adjustment``` and other functions do (because membership v in X is tested repeatedly), so at that point I thought it was cleanest to cast everything.
Have you seen https://docs.julialang.org/en/v1/manual/style-guide/#Handle-excess-argument-diversity-in-the-caller ? I think it doesn't really apply here but in principle widening
and trusting the iteration to generate errors for unsupported types should work. I think here it doesn't matter at all that X and Y are of a certain type as long as the methods are defined we need in the iterator. |
Agreed, that's better and more general. I would still prefer to also cast X and Y in list_dseps etc, because it is more efficient for our use case (if we wouldn't cast it there it would get cast for every call of find = find_dsep). |
Other solution (instead of having a cast at the beginning of every function) would be to add X::Set{eltype(g)} etc to every function. Then, have a wrapper function which accepts other types, does a cast to Set and calls the other function. Generally I think casting isn't so bad here, because the Sets are basically our internal representation for efficiency. We could have something else than Sets, like a special kind of bitvector representation, which supports fast membership queries and union/intersect etc. Then the first line of every function would also be to convert the arguments to this representation. Anyway, I'm very open for suggestions because I'm not sure what's the cleanest way... |
I am fine with that |
I think I'm reasonably happy with the code for now :) Having problems right now to build the docs locally (haven't done that before), so I wasn't able to check how the formatting looks yet. But if I fix that, I think we could merge soon |
Can I just merge? Then we can see the docs. |
Go for it! |
Awesome :)
I didn't realize Currently not on the PC, could add this later (and try locally first) |
I figured out how to work on the docs locally now. Opened a new PR for updating the docs: #85 We can also work on a new Example section there (or in a separate PR if we want a quick merge)... |
Out of curiosity: is all functionality canonical (e.g. available in DAGitty) or did you sneak in your own discoveries and innovations in here? |
In principle all the functionality is in DAGitty as well. The front-door adjustment algorithm is very new, but it has also been added to the experimental version of Dagitty (but I think no other package has it yet). Still, I wrote everything from scratch. The main thing about my implementation is that it uses gensearch throughout (I also modified and simplified this a bit to my liking). In DAGitty there is a similar function, but for historical reasons I think it's not used everywhere. Credit goes to my colleague Benito van der Zander, who (I think) was the one that came up with the idea and convinced me that it's the cleanest way to proceed. I think it really makes the code much more maintainable and I was able to code all the functions in a few afternoons. Because I have a slightly different version of gensearch, I had to modify some minor things. Then, there is this thing with the back-door adjustment for sets (in Dagitty there is no function for the classic back-door criterion afaik, only the adjustment criterion), where the "standard" generalization to sets (e.g. given by Pearl) is imo overly restrictive (tbh I don't really understand why it's stated there this way). I think the way I implemented the listing algorithms with iterators is pretty neat. Every other package generates a vector of all (adjustment) sets, which might use a lot of memory or not terminate. |
@mwien I made a PR to be able to comment and assist.
Closes #61