JupyterHub Developing a UseCase: Variant Calling Workflow for Large-Scale Genomic Datasets #5

viktoriaas · 2024-11-03T21:35:18Z

Why?

Jupyter Notebook is an application for creating and sharing computational documents. JupyterHub serves Notebooks to multiple users. The benefit is that users gain easy interactive access to computational resources without needing to install anything.

GA4GH TES (Task Execution Service) API is a standardized schema and API for describing and executing batch tasks on any underlying computational backend. The full TES specification defines what a TES instance can do.
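To make the idea of a "task" concrete, here is a minimal sketch of a TES task document built as a Python dict. The field names (`name`, `resources`, `executors`, `image`, `command`, `cpu_cores`, `ram_gb`) follow the TES task schema; the helper `make_tes_task`, the container image, and the command are illustrative assumptions, not part of this issue.

```python
import json


def make_tes_task(name, image, command, cpu_cores=1, ram_gb=2.0):
    """Build a minimal TES task document as a plain dict (hypothetical helper)."""
    return {
        "name": name,
        # Resources the backend is asked to provide for this task.
        "resources": {"cpu_cores": cpu_cores, "ram_gb": ram_gb},
        # Each executor is one container invocation; they run in order.
        "executors": [{"image": image, "command": command}],
    }


# Illustrative values only: any image and command could be used here.
task = make_tes_task("hello-tes", "alpine:3.20", ["echo", "hello from TES"])
print(json.dumps(task, indent=2))
```

A real task would usually also declare `inputs` and `outputs` so that the TES instance can stage data in and out of the container.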

The goal of this issue is to develop a use case for a JupyterHub instance. A sample use case could be variant calling for large-scale genomic datasets.

Objective: Develop a workflow in JupyterHub to perform variant calling on genomic data from multiple cohorts, utilizing federated computing through GA4GH TES.

Scope: The workflow could include data pre-processing, alignment, and variant calling, leveraging TES to offload compute-intensive tasks to appropriate resources. Visualizations could show variant distributions, and results could be exported for further analysis.
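One possible way to keep track of the workflow sections and what each of them needs is a small record per stage. The sketch below is only an illustration: the `Stage` class, the `offload_candidates` helper, the `min_cores` threshold, and all resource numbers are assumptions, not measurements or requirements from this issue.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One section of the workflow and its requirements (hypothetical structure)."""
    name: str
    needs_staged_input: bool      # must data be available before the stage starts?
    writes_output: bool           # does the stage save results somewhere?
    cpu_cores: int                # assumed resource need, illustrative only
    ram_gb: float                 # assumed resource need, illustrative only
    touches_sensitive_data: bool  # could the stage handle sensitive data?


def offload_candidates(stages, min_cores=4):
    """Return names of stages heavy enough to be worth offloading."""
    return [s.name for s in stages if s.cpu_cores >= min_cores]


# Illustrative numbers only; real values depend on the data and tools used.
STAGES = [
    Stage("pre-processing", True, True, 2, 4.0, True),
    Stage("alignment", True, True, 16, 32.0, True),
    Stage("variant calling", True, True, 8, 16.0, True),
    Stage("visualization", False, False, 1, 2.0, False),
]

print(offload_candidates(STAGES))  # ['alignment', 'variant calling']
```

Keeping the sensitive-data flag next to the resource needs makes it easier to decide later which TES instance a stage is allowed to run on.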

More useful information and link: document online

How?

The full functionality of this issue (distributing parts of the workflow) depends on the functionality of other issues. However, it is still crucial to create a sample workflow that includes all steps of a data analysis pipeline, logically divided into sections that could in theory be offloaded to appropriate resources. You can offload some parts (or at least one part) to any existing TES instance.

Create a Jupyter Notebook with a sample workflow (any bioinformatics workflow) that includes all steps of a data analysis pipeline.
Identify the parts that could be offloaded and define their requirements: Do they need data in advance? Do they need to save output somewhere? Does the computation require any special resources? Could the computation handle sensitive data?
Try to offload at least one part of the computation to any TES instance. Remember that TES instances might require an authentication token, so don't forget to add it!
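The last step can be sketched with the Python standard library only. This is a minimal sketch under several assumptions: the base URL `https://tes.example.org` is a placeholder, the token is read from a hypothetical `TES_TOKEN` environment variable, and the instance is assumed to accept a Bearer token (authentication is deployment-specific, so check your instance). The path `/ga4gh/tes/v1/tasks` is the task-creation endpoint in TES 1.1; older deployments may expose `/v1/tasks` instead.

```python
import json
import os
import urllib.request


def build_submit_request(base_url, task, token=None):
    """Prepare (but do not send) a POST request that creates a TES task."""
    url = base_url.rstrip("/") + "/ga4gh/tes/v1/tasks"
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the instance expects a Bearer token.
        headers["Authorization"] = "Bearer " + token
    body = json.dumps(task).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def submit(request, timeout=30):
    """Send the request and return the task id from the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["id"]


# Illustrative task; image and command are placeholders.
task = {
    "name": "offloaded-step",
    "executors": [{"image": "alpine:3.20", "command": ["echo", "offloaded"]}],
}
request = build_submit_request(
    "https://tes.example.org", task, os.environ.get("TES_TOKEN")
)
print(request.get_method(), request.full_url)
# task_id = submit(request)  # uncomment once base URL and token are real
```

Separating request construction from sending lets you inspect the URL and headers inside the Notebook before anything leaves it, which helps when the data may be sensitive.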

If you want to work on this issue:

Assign yourself to the issue (if someone else is already assigned, first ask them whether they would like help with the issue, or pick another one)
Once assigned, move your issue to the "In progress" column on the project board
Start working 🚀

viktoriaas added this to BioHackathon Europe '24 Nov 3, 2024

viktoriaas added type: code type: deployment type: docs type: research type: testing topic: jupyterhub labels Nov 3, 2024


viktoriaas commented Nov 3, 2024 • edited by ahembal
