Update descriptions and provide more explicit values to improve AI-assisted workflow (MIT collab) #560

anngvu · 2024-12-10T17:54:15Z

Since the data model is used in AI workflows (as discussed with the MIT group and as seen in other development/testing), one easy way to improve AI performance is to provide better information through the data model itself.

For the MIT examples, the aim is to improve sample_output. Since the description is provided as context, these are the hints that could be improved:

assay: This is inconsistently filled out, so the assay description could state that the same term should be used throughout a dataset.
dataSubtype: This is also inconsistently filled out, whereas the AI does a good job on fileFormat for obvious reasons. The description could include an example, such as that fastq usually means "raw".
dataType: GenomicVariants vs AlignedReads vs GenomicFeatures can be confusing, since the AI currently doesn't look up these terms in the EDAM ontology. The description might have to say that AlignedReads refers to bam files; that GenomicVariants is a broader term covering simple or structural variants, so the file can be either maf or vcf ...; and that for GenomicFeatures we expect something like bed or some other formats.
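The hints above could also serve as a quick sanity check on AI-generated annotations. The following is only a minimal sketch: the format-to-dataType mapping restates the examples in this issue, while the `dataset` column name and both helper functions are hypothetical and may not match the actual columns in sample_output.csv.

```python
from collections import defaultdict

# Assumption: mapping restated from the hints in this issue, not read from the data model.
FORMAT_TO_DATATYPE = {
    "bam": "AlignedReads",
    "maf": "GenomicVariants",
    "vcf": "GenomicVariants",
    "bed": "GenomicFeatures",
}


def inconsistent_assays(rows):
    """Return {dataset: sorted assay terms} for datasets that use more than one assay term."""
    seen = defaultdict(set)
    for row in rows:
        # "dataset" is a hypothetical grouping column.
        seen[row["dataset"]].add(row["assay"])
    return {ds: sorted(terms) for ds, terms in seen.items() if len(terms) > 1}


def datatype_mismatches(rows):
    """Return rows whose dataType disagrees with the format-based expectation."""
    bad = []
    for row in rows:
        expected = FORMAT_TO_DATATYPE.get(row["fileFormat"])
        # Formats missing from the mapping are skipped rather than flagged.
        if expected is not None and row["dataType"] != expected:
            bad.append(row)
    return bad
```

Rows would be dictionaries, for example as produced by `csv.DictReader`. Unknown formats are deliberately ignored so that the check only flags cases the hints actually cover.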

In an ideal workflow, instead of getting hints through descriptions, the generative AI could use relations encoded more formally*:

as ontology axioms or a set of rules (again, if fileFormat is fastq, dataSubtype will always be "raw")
or as probabilities (for tumor type A, the most probable tissue sample types are {T1, T2}, though if it's a metastatic tumor, the tissue sample could come from a wider variety of tissues {T1,T2,T3,T4,T5,...}, and there is also known tropism for some cancers)

*This might be a future iteration.
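To make the two formal encodings above more concrete, here is one possible sketch. It is only an illustration under stated assumptions: the rule table, the function names, and especially the probability values are hypothetical placeholders; "tumor A" and T1..T5 are the symbolic labels used above, not real terms.

```python
# Hard rule from the text: if fileFormat is fastq, dataSubtype is "raw".
# Each rule is (condition on a record, (field to set, value)).
RULES = [
    (lambda r: r.get("fileFormat") == "fastq", ("dataSubtype", "raw")),
]

# Placeholder priors keyed by (tumor type, is_metastatic).
# The numbers are invented for illustration only and carry no biological meaning.
TISSUE_PRIORS = {
    ("tumor A", False): {"T1": 0.6, "T2": 0.4},
    ("tumor A", True): {"T1": 0.3, "T2": 0.25, "T3": 0.2, "T4": 0.15, "T5": 0.1},
}


def apply_rules(record):
    """Return a copy of record with every matching rule's value filled in."""
    out = dict(record)
    for condition, (field, value) in RULES:
        if condition(out):
            out[field] = value
    return out


def most_probable_tissues(tumor_type, metastatic, top=2):
    """Return the top tissue labels by prior probability, or [] if none are known."""
    priors = TISSUE_PRIORS.get((tumor_type, metastatic), {})
    return sorted(priors, key=priors.get, reverse=True)[:top]
```

Rules are applied deterministically, while the priors only rank candidates. Keeping the two separate would let the AI treat a rule as a constraint and a probability as a suggestion.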

Also attached: sample_output.csv

anngvu mentioned this issue Dec 10, 2024

Update descriptions and provide more explicit values to improve AI-assisted workflow (accent) #561

Open
