Framework for dataset validation #648

Open
mih opened this issue Mar 20, 2024 · 0 comments

mih commented Mar 20, 2024

A dataset validation framework has general utility, but also specific applications:

  • verifying the completeness of an individual "sample" record in a study (all desired data types, all required annotations)
  • checking whether the desired number of items of some entity is present
  • ...

One way to implement this would be in terms of metadata validation. A comprehensive report on a dataset is generated, and this report is then validated for compliance with some schema. Once schema compliance is established, relatively simple declarative conditions can be evaluated, benefiting from a known metadata structure and vocabulary.
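To make the two stages concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the flat report layout, the key names (`sample`, `type`, `annotations`), and the functions `check_schema`, `check_conditions`, and `validate` are all hypothetical and not part of any existing tool. A real implementation would likely use an established schema language rather than the hand-rolled check shown here.

```python
# Hypothetical report layout: one flat record per dataset item.
# Key names and types are assumptions made for this sketch.
REQUIRED_KEYS = {"sample": str, "type": str, "annotations": dict}


def check_schema(report):
    """Stage 1: return a list of schema violations (empty when compliant)."""
    errors = []
    for i, rec in enumerate(report):
        for key, typ in REQUIRED_KEYS.items():
            if key not in rec:
                errors.append(f"record {i}: missing '{key}'")
            elif not isinstance(rec[key], typ):
                errors.append(f"record {i}: '{key}' is not {typ.__name__}")
    return errors


def check_conditions(report, required_types, required_annotations, min_samples):
    """Stage 2: evaluate declarative conditions on a schema-compliant report."""
    problems = []
    types_by_sample = {}
    for rec in report:
        # Safe to index directly: stage 1 guarantees these keys exist.
        types_by_sample.setdefault(rec["sample"], set()).add(rec["type"])
        missing = required_annotations - set(rec["annotations"])
        if missing:
            problems.append(
                f"{rec['sample']}/{rec['type']}: "
                f"missing annotations {sorted(missing)}"
            )
    # Completeness of each individual sample record.
    for sample, types in sorted(types_by_sample.items()):
        missing = required_types - types
        if missing:
            problems.append(f"{sample}: missing data types {sorted(missing)}")
    # Desired number of items of some entity (here: samples).
    if len(types_by_sample) < min_samples:
        problems.append(
            f"found {len(types_by_sample)} samples, "
            f"expected at least {min_samples}"
        )
    return problems


def validate(report, **conditions):
    """Run stage 2 only once stage 1 has passed."""
    errors = check_schema(report)
    if errors:
        return errors
    return check_conditions(report, **conditions)
```

Calling `validate(report, required_types={"T1w", "bold"}, required_annotations={"age"}, min_samples=3)` on a list of such records returns an empty list when the dataset is valid, and a list of human-readable findings otherwise. The conditions stay simple because they can rely on the structure that the schema check has already established.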

The metadata approach would also tie in nicely with existing facilities (metadata extraction), and would communicate additional details on the expected content of a dataset in a structured, machine-actionable form.
