Call for notebooks demonstrating how to handle missing data #461

Closed
drbenvincent opened this issue Nov 12, 2022 · 10 comments
Labels
help wanted Extra attention is needed proposal New notebook proposal still up for discussion

Comments

@drbenvincent
Contributor

drbenvincent commented Nov 12, 2022

While we do have a lot of example notebooks, we have a distinct lack of examples covering how to deal with missing data. There is also no missing data tag.

The only ones I can think of are the notebooks on censored and truncated data, which are a form of missing data.

So this is a kind of meta-issue. We would be very grateful if people contributed notebooks demonstrating how to handle missing data. Feel free to create specific notebook proposal issues that reference this issue.

@drbenvincent drbenvincent added help wanted Extra attention is needed proposal New notebook proposal still up for discussion labels Nov 12, 2022
@drbenvincent drbenvincent changed the title Call notebooks demonstrating how to handle missing data Call for notebooks demonstrating how to handle missing data Nov 12, 2022
@NathanielF
Contributor

NathanielF commented Nov 21, 2022

I think this is an interesting issue, but not one I know a tonne about. I've heard good things about "Applied Missing Data Analysis" by Craig Enders, though. I might be able to look into this a bit more after I finish the Bayesian VAR model work.

@NathanielF
Contributor

OK, I've ordered the Enders book; it arrives on Friday. I will look into this topic in more detail over Christmas and report back in January if I think I can add anything of interest.

@NathanielF
Contributor

OK, I think this is definitely something I want to pursue. There is a really nice example of workplace empowerment estimation I want to work through. I will outline a full proposal after I've finished the reliability and prediction pull request, if that's alright?

NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Jan 16, 2023
@NathanielF
Contributor

I've started some work on this and was able to get FIML (full information maximum likelihood) and Bayesian imputation working for the multivariate normal. For the Bayesian multivariate imputation, though, I had to use a Potential rather than a likelihood, as per the discussion here: https://discourse.pymc.io/t/automatic-imputation-of-multivariate-models/11029/3
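For readers unfamiliar with FIML, here is a minimal sketch of the idea, assuming a bivariate normal and using only the standard library. It is not the notebook's code: the function name `fiml_loglik` and the toy data are made up for illustration. Each row contributes the log-density of whichever variables it actually observed, so incomplete rows are neither dropped nor imputed.

```python
import math


def fiml_loglik(rows, mu, sigma, rho):
    """Observed-data log-likelihood of a bivariate normal.

    rows  : list of (x, y) pairs, with None marking a missing entry
    mu    : (mean_x, mean_y)
    sigma : (sd_x, sd_y)
    rho   : correlation between x and y, strictly between -1 and 1
    """

    def norm_logpdf(v, m, s):
        # Univariate normal log-density, i.e. the marginal of the bivariate normal.
        z = (v - m) / s
        return -0.5 * z * z - math.log(s) - 0.5 * math.log(2 * math.pi)

    total = 0.0
    for x, y in rows:
        if x is not None and y is not None:
            # Complete row: full bivariate normal log-density.
            zx = (x - mu[0]) / sigma[0]
            zy = (y - mu[1]) / sigma[1]
            one_minus_r2 = 1.0 - rho * rho
            quad = (zx * zx - 2 * rho * zx * zy + zy * zy) / one_minus_r2
            total += (
                -0.5 * quad
                - math.log(2 * math.pi)
                - math.log(sigma[0])
                - math.log(sigma[1])
                - 0.5 * math.log(one_minus_r2)
            )
        elif x is not None:
            # Only x observed: use the marginal of x.
            total += norm_logpdf(x, mu[0], sigma[0])
        elif y is not None:
            # Only y observed: use the marginal of y.
            total += norm_logpdf(y, mu[1], sigma[1])
        # Rows with nothing observed contribute nothing.
    return total


# Hypothetical toy data: two complete rows and two partially observed rows.
toy = [(0.2, 0.1), (1.3, None), (None, -0.4), (0.9, 1.1)]
print(fiml_loglik(toy, mu=(0.5, 0.3), sigma=(1.0, 1.0), rho=0.6))
```

Maximising this function over `mu`, `sigma` and `rho` gives the FIML estimates. In a Bayesian model the same quantity could be added to the joint log-probability, which is roughly the role a Potential plays.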

I'm also going to try the chained equation imputation approach, which shouldn't need this workaround.
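As a rough illustration of what chained equation imputation means, here is a minimal two-variable sketch using only the standard library. The helper names `ols` and `chained_impute` and the toy data are hypothetical, and this is a simplification rather than the notebook's method: each variable is regressed on the other in turn, and missing entries are redrawn from the fitted line plus residual noise. A full implementation would also draw the regression parameters from their posterior and produce several completed datasets.

```python
import random


def ols(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b, residual_sd)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / max(n - 2, 1)) ** 0.5
    return a, b, sd


def chained_impute(rows, n_iter=10, seed=0):
    """Return one completed copy of two-column data; None marks missing."""
    rng = random.Random(seed)
    data = [list(r) for r in rows]
    miss = [[v is None for v in r] for r in rows]

    # Step 1: initialise every missing cell with its column's observed mean.
    for j in range(2):
        obs = [r[j] for r in rows if r[j] is not None]
        mean = sum(obs) / len(obs)
        for i, r in enumerate(data):
            if miss[i][j]:
                r[j] = mean

    # Step 2: cycle through the columns, re-imputing each from the other.
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j  # index of the predictor column
            # Fit only on rows where the target column was truly observed.
            fit_x = [r[k] for i, r in enumerate(data) if not miss[i][j]]
            fit_y = [r[j] for i, r in enumerate(data) if not miss[i][j]]
            a, b, sd = ols(fit_x, fit_y)
            for i, r in enumerate(data):
                if miss[i][j]:
                    # Prediction plus residual noise, not just the fitted value.
                    r[j] = a + b * r[k] + rng.gauss(0.0, sd)
    return data


# Hypothetical toy data with one missing value in each column.
toy = [(1.0, 2.1), (2.0, None), (3.0, 6.2), (None, 8.0), (5.0, 9.9), (6.0, 12.1)]
for row in chained_impute(toy):
    print(row)
```

Because each step is an ordinary univariate regression, no multivariate likelihood with missing entries is needed, which is why this route avoids the Potential workaround.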

@juanitorduz
Collaborator

Cool! By the way, have you seen this video? https://www.youtube.com/watch?v=nJ3XefApED0

@NathanielF
Contributor

I'm about a third of the way through that video.

@reshamas
Contributor

@NathanielF

That video (https://www.youtube.com/watch?v=nJ3XefApED0) needs timestamps, in case you are interested. More info here:
pymc-devs/video-timestamps#11

@NathanielF
Contributor

Thanks @reshamas, will have a look tomorrow.

NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Jan 20, 2023
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Jan 22, 2023
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Jan 23, 2023
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Jan 24, 2023
@NathanielF
Contributor

I think this is close to done. Really impressed by those JAX samplers! The speed is so much better!

NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Feb 1, 2023
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Feb 2, 2023
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Feb 3, 2023
drbenvincent pushed a commit that referenced this issue Feb 3, 2023
* [Missing Data #461] First commit for missing data working out FIML
* [Missing Data #461] Added Sensitivity Plots
* [Missing Data #461] Added Bayesian model fit
* [Missing Data #461] more testing
* [Missing Data #461] added chained equation example
* [Missing Data #461] added myst and updated write up
* [Missing Data #461] fixed some typos
* [Missing Data #461] added hierarchical imputation
* [Missing Data #461] used blackjax sampler and converged
* [Missing Data #461] updated with feedback
* [Missing Data #461] nicer team impact plot with title
* [Missing Data #461] updated with Ben's comments
* [Missing Data #461] trying to fix sphinx cross ref
* [Missing Data #461] updated to link to truncated and censored regression notebook
* [Missing Data #461] fixed minor typo
* [Missing Data #461] changed authored by date
* [Missing Data #461] added more explanatory text on why we plot by team
* [Missing Data #461] updated data load method and added some text
* [Missing Data #461] removed extra # comments
* [Missing Data #461] removed another ## comment

Signed-off-by: Nathaniel <[email protected]>
@NathanielF
Contributor

Woop!! Thanks so much @drbenvincent. This one was a real fun one!!


4 participants