GAF/Source validation processing should occur with only one pass #1384
Comments
It occurs to me also that doing only one pass would completely eliminate any mid-validate GAF downloads. Recall that we have had an issue where mix-in GAFs are still downloaded within ontobio.
@dougli1sqrd Good point--we've been burned by that before as well.
This will also add clarity as we add more internal and upstream sources.
Talking to @dougli1sqrd earlier, one idea would be to process all incoming files without any thought of mixins, get the files, then perform the reassembly as a discrete step afterwards. This would make it easier to trace issues, view intermediate products, and add new sources/products/mixins in the future.
Add second pass at copying files to ensure that the "good" PAINT report files are the last ones over. For issue geneontology/go-site#1253. Can go away with flow change in geneontology/go-site#1384.
Note that we need this to get to geneontology/pipeline#27
In ontobio, the order of operations will make this difficult:
Step 3 is what this issue addresses. But if step 4 is dependent on step 3, we will need to resolve this difficulty in order to complete this issue.
Currently the Makefile and ontobio validate are structured in a very group-centric way. When processing, say, `fb`, we do all the processing that it requires to completion. This includes merging any "mix-in" datasets, for example PAINT. So in the course of validating `fb` we also download and validate `paint_fb`.

In normal pipeline mode, though, we also process paint separately, including `paint_fb`. Since we already validate `paint_fb` in the course of validating `fb`, we are processing `paint_fb`, and potentially any other mix-in source, twice.

On its own, processing twice is a little lame, but it has been okay for quite some time. But as we have expanded the features of ontobio validate and the pipeline, there have been difficulties. In particular, #1253 is ultimately caused by the "double processing" issue outlined above.
The pipeline, Makefile, and ontobio should be structured so that main validation of sources only occurs once per dataset. Merging of mix-ins into main sources as output products can come as a separate step. @kltm and I will expand on solutions here to do this.
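As a rough illustration of the proposed flow, here is a minimal Python sketch. It is not ontobio or pipeline code: `Dataset`, `validate`, and `run_pipeline` are hypothetical names, and the "validation" is a stand-in that only drops blank lines and `!` header lines. The point is the shape: every source, including mix-ins, is validated exactly once, and merging works only from the already-validated intermediates.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical description of one source, e.g. "fb" or "paint_fb"."""
    name: str
    mixins: list[str] = field(default_factory=list)  # names of mix-in datasets


def validate(lines: list[str]) -> list[str]:
    """Stand-in for real validation: drop blank lines and '!' header lines."""
    return [ln for ln in lines if ln.strip() and not ln.startswith("!")]


def run_pipeline(datasets: list[Dataset], raw: dict[str, list[str]]):
    """Validate each dataset once, then merge mix-ins as a separate step."""
    validated: dict[str, list[str]] = {}
    validation_counts: dict[str, int] = {}

    # Step 1: validate every source once, with no knowledge of mix-ins.
    for ds in datasets:
        validated[ds.name] = validate(raw[ds.name])
        validation_counts[ds.name] = validation_counts.get(ds.name, 0) + 1

    # Step 2: build output products from the validated intermediates only.
    products: dict[str, list[str]] = {}
    for ds in datasets:
        merged = list(validated[ds.name])
        for mixin_name in ds.mixins:
            merged.extend(validated[mixin_name])
        products[ds.name] = merged

    return products, validation_counts


# Example with made-up data: "fb" mixes in "paint_fb".
datasets = [Dataset("fb", mixins=["paint_fb"]), Dataset("paint_fb")]
raw = {
    "fb": ["!gaf-version: 2.1", "FB\tannotation1"],
    "paint_fb": ["!header", "FB\tpaint1"],
}
products, counts = run_pipeline(datasets, raw)
```

With this split, `paint_fb` is validated once even though it appears both as its own product and as a mix-in for `fb`, and the `validated` dictionary plays the role of the intermediate products that can be inspected before merging.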