Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create data validation report for animal parentage #534

Open
wants to merge 7 commits into
base: release22.11-SNAPSHOT
Choose a base branch
from

Conversation

jryurkanin
Copy link

identify animals with parents of a different species

Rationale

There is not any easy way for clients to validate their data aside from trying to run a pedigree plot and getting errors. There needs to be an easily accessibly and interpretable report that validation the animal parentage data without having to check error messages.

Related Pull Requests

Changes

  • adding multiple .SQL files and an HTML report.

identify animals with parents of a different species
identify animals with a parent whose gender does not match their sire/dam designation
identifies animals whose parents that are younger than their offspring
@jryurkanin jryurkanin changed the title Create parentDifferentSpecies.sql Create data validation report for animal parentage Feb 14, 2023
Copy link
Collaborator

@bbimber bbimber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey -

picky comment on the queries.

beyond that, i would suggest wrapping these data integrity-level queries into an automatic email notification. See something like ColonyManagementNotification (i could have my name wrong). The general idea is that this can run automatically on a schedule. In most cases, we expect queries to return zero rows (i.e. if there are no conflicting parents, no rows). If the notification runs and not validations fail, no notification. If there are issues, subscribed users are alerted. These reports can also be run on demand, so you get a built-in HTML report out of it.

THEN TRUE
ELSE FALSE
END as parentSexMismatch
FROM study.Animal d1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine, but wouldnt it be a little better to just base this on study.demographicsParents rather than animal? you're filtering anything not present in animal, arent you?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, yes it would be better, but not all centers have study.demographicsParents working. I'm honestly just trying to keep things simple. we can always improve on it later if necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it was my suggestion to use Animal. Only one center doesn't have demographicsParents, but this won't work for them anyway. We should probably look into bringing demographicsParents into core EHR and setting up the Parents column in DefaultEHRCustomizer. @jryurkanin we can sync on this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it might be useful to create a demographicsPedigree.sql in core EHR, and have this SQL select from study.demographics, and then merge in ehr.supplemental_pedigree. That would make columns consistent across EHR instances, and by default support supplemental_pedigree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have something along those lines for the base demographicsParents in this PR. Most centers already have it overwritten, but we can use whatever version of demographicsParents is active for these queries.

#541

THEN TRUE
ELSE FALSE
END as parentSpeciesMismatch
FROM study.Animal d1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment below about basing on another query besides animal

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment replying to your comment

@jryurkanin
Copy link
Author

I see what you mean @bbimber, that is something we considered and may implement later. For now, we decided on doing a simple report, mostly because this is a bit of a training exercise for me to become more familiar with the dev work behind EHR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants