Home
This workshop provides graduate students in public universities with the necessary skills and tools to analyze biological data using high-performance computing resources.
Participants will gain hands-on experience with industry-standard command-line interface (CLI) tools for DNA and RNA sequencing analysis, sequence manipulation and alignment, and pipeline management for automating complex workflows. They will also learn about differential expression analysis for identifying genes with altered expression levels, data visualization techniques for presenting results effectively, and the basics of artificial intelligence (AI) and machine learning (ML) in bioinformatics.
Upon completion of this workshop, participants will be able to apply these tools and methods to real-world biological problems and contribute to bioinformatics research.
Required Skills
Skill | Description |
---|---|
Basic understanding of biology | This workshop assumes a basic understanding of biological concepts, such as DNA, RNA, genes, and genomes. |
Familiarity with the command line (optional, but helpful) | While not required, familiarity with the command line will help navigate the tools covered in the workshop. |
Enthusiasm for learning new computational skills | A strong interest in learning new computational skills is essential for success in this workshop. |
Time: Tuesdays @3PM (for Zoom link/in person information, please sign up at the U of A Data Science Institute DataLab website)
All sessions are recorded and uploaded to the University of Arizona's DataLab YouTube channel, where you can also find the other DataLab series: Natural Language Processing (NLP), Generative AI, NextGen Geospatial.
Date | Title | Topic | Description | Instructors | Material Link/Recording |
---|---|---|---|---|---|
(09/03) | Exploring Analysis Platforms and High-Performance Computing Resources for Bioinformatics | Platforms and high-performance computing resources for bioinformatics (HPC/CyVerse) | Learn about the infrastructure behind bioinformatics analysis - high-performance computing (HPC) clusters; Explore CyVerse, a user-friendly platform for accessing HPC resources and running bioinformatics analyses. | Michele Cosi / Carlos Lizárraga | Material, Recording |
(09/10) | Intro to Essential Tools for Every Bioinformatician | CLI Tools for DNA and RNA-seq (BLAST, SAMtools, FastQC/MultiQC, STAR/TopHat/Bowtie2, BWA, HISAT2, GATK, BUSCO) | Master the CLI, a bioinformatics fundamental; Learn essential tools such as BLAST, SAMtools, FastQC/MultiQC, and aligners like STAR, TopHat, Bowtie2, BWA, HISAT2; Become proficient in GATK for variant calling and BUSCO for genome completeness assessment. | Michele Cosi | Material, Recording |
(09/17) | An Introduction to Sequence Manipulation, Alignment, and Assessment for Bioscience Analyses | Sequence manipulation, alignment, and assessment (BLAST, BWA, SAMtools, STAR/TopHat/Bowtie2, FastQC/MultiQC) | Enhance your knowledge of sequence manipulation for sequencing data preparation; Improve your alignment skills using various tools, and understand the assessment of alignment quality. | Michele Cosi | Material, Recording |
(09/24) | Using Nextflow for Streamlining Bioscience Analysis Pipelines | Pipeline management: Nextflow | Learn automation with Nextflow, a workflow manager that simplifies complex bioinformatics analyses; Acquire the skills to design and run efficient pipelines, saving time and reducing errors. | Michele Cosi | Material, Recording |
(10/01) | Using Snakemake for Streamlining Data Analysis Pipelines | Pipeline management: Snakemake | Repeat the pipeline tasks from the previous session using Snakemake as an alternative to Nextflow. | Michele Cosi | Material, Recording |
(10/08) | A Beginner's Guide to Gene Expression: Diving into DESeq Analysis | Differential Expression Analysis (DESeq) | Master DESeq to identify genes with significant expression changes between biological conditions; Learn to interpret results for biological insights. | Michele Cosi, Carlos Lizárraga | Material, Recording |
(10/15) | Fundamentals of Data Visualization | Presenting and reading data (PCA plot, Volcano plot, heatmap, ggplot2) | Create compelling data visualizations using powerful tools like PCA plots, volcano plots, heatmaps, and ggplot; Effectively present your bioinformatics findings. | Carlos Lizárraga | Material, Recording |
(10/22) | Explore Current AI/ML Trends and Tools in Bioinformatics | AI/ML in Bioinformatics | Explore AI and ML applications in bioinformatics; Understand their role in revolutionizing biological research and solving complex challenges. | Carlos Lizárraga | Link |
(10/29) | Using MLflow to Track Machine Learning Projects in Bioinformatics | MLflow | Explore MLflow, a platform for managing the machine learning lifecycle; Learn practical skills for experiment tracking, model deployment, and collaboration in bioinformatics research. | Artin Majdi / Carlos Lizárraga | Link |
- A Bioinformatics Wiki. C. Lizárraga. Data Science Institute. UArizona.
- Artificial Intelligence and Machine Learning in Bioinformatics.
- A survey of best practices for RNA-seq data analysis. Conesa, A., Madrigal, P., Tarazona, S. et al. Genome Biol 17, 13 (2016). https://doi.org/10.1186/s13059-016-0881-8.
- awesome-bioinformatics
- awesome-biological-visualizations
- From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, DePristo MA. Curr Protoc Bioinformatics. 2013;43(1110):11.10.1-11.10.33. doi: 10.1002/0471250953.bi1110s43. PMID: 25431634; PMCID: PMC4243306.
- Genome Browser
- RNA-seq and Differential Expression. High Performance Research Computing. Texas A&M University.
- TeSS (Training eSupport System).
Updated: 08/22/2024 (M. Cosi)
UArizona DataLab, Data Science Institute, University of Arizona, 2024.