-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create data validation report for animal parentage #534
base: release22.11-SNAPSHOT
Are you sure you want to change the base?
Conversation
identify animals with parents of a different species
identify animals with a parent whose gender does not match their sire/dam designation
identifies animals whose parents that are younger than their offspring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey -
picky comment on the queries.
beyond that, i would suggest wrapping these data integrity-level queries into an automatic email notification. See something like ColonyManagementNotification (i could have my name wrong). The general idea is that this can run automatically on a schedule. In most cases, we expect queries to return zero rows (i.e. if there are no conflicting parents, no rows). If the notification runs and not validations fail, no notification. If there are issues, subscribed users are alerted. These reports can also be run on demand, so you get a built-in HTML report out of it.
THEN TRUE | ||
ELSE FALSE | ||
END as parentSexMismatch | ||
FROM study.Animal d1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fine, but wouldnt it be a little better to just base this on study.demographicsParents rather than animal? you're filtering anything not present in animal, arent you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, yes it would be better, but not all centers have study.demographicsParents working. I'm honestly just trying to keep things simple. we can always improve on it later if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it was my suggestion to use Animal. Only one center doesn't have demographicsParents, but this won't work for them anyway. We should probably look into bringing demographicsParents into core EHR and setting up the Parents column in DefaultEHRCustomizer. @jryurkanin we can sync on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it might be useful to create a demographicsPedigree.sql in core EHR, and have this SQL select from study.demographics, and then merge in ehr.supplemental_pedigree. That would make columns consistent across EHR instances, and by default support supplemental_pedigree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have something along those lines for the base demographicsParents in this PR. Most centers already have it overwritten, but we can use whatever version of demographicsParents is active for these queries.
THEN TRUE | ||
ELSE FALSE | ||
END as parentSpeciesMismatch | ||
FROM study.Animal d1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment below about basing on another query besides animal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment replying to your comment
I see what you mean @bbimber, that is something we considered and may implement later. For now, we decided on doing a simple report, mostly because this is a bit of a training exercise for me to become more familiar with the dev work behind EHR. |
identify animals which are listed as both a dam and a sire
identify animals with parents of a different species
Rationale
There is not any easy way for clients to validate their data aside from trying to run a pedigree plot and getting errors. There needs to be an easily accessibly and interpretable report that validation the animal parentage data without having to check error messages.
Related Pull Requests
Changes