-
Notifications
You must be signed in to change notification settings - Fork 5
/
pre_course_questionnaire_summary.qmd
294 lines (213 loc) · 14.1 KB
/
pre_course_questionnaire_summary.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
---
title: "Pre Course Questionnaire"
author: "Leon Eyrich Jessen"
format:
revealjs:
embed-resources: true
theme: moon
slide-number: c/t
width: 1600
height: 900
mainfont: avenir
logo: images/r4bds_logo_small.png
footer: "R for Bio Data Science"
---
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Programming Experience
```{r}
library("tidyverse")
library("broom")
d_raw <- read_csv(file = "data/2024 R4BDS Pre Course Questionnaire (Responses) - Form Responses 1.csv") |>
select(
x1 = `What is your current activity?`,
x2 = `Prerequisites - Have you completed mentioned courses or equivalent prior to this course?`,
x3 = `How many years of programming (language irrelevant) experience do you have?`,
x4 = `How many years of base R-programming experience do you have?`,
x5 = `How many years of tidyverse R-programming experience do you have?`,
x6 = `Have you previously worked with any kind of data analysis?`)
d <- d_raw |>
filter(x1 %in% c("Bachelor student","Master Student","PhD Student","Postdoc","Industry"),
str_starts(string = x2, pattern = "\\d{5}"),
str_starts(string = x6, pattern = "Yes|No")) |>
mutate(id = row_number()) |>
relocate(id) |>
separate_rows(x2, sep = ", ") |>
filter(str_starts(x2, "\\d{5}")) |>
mutate(dummy = 1) |>
pivot_wider(id_cols = everything(),
names_from = x2,
values_from = dummy,
values_fill = 0) |>
separate_rows(x6, sep = ", ") |>
filter(str_starts(string = x6, pattern = "in|No")) |>
mutate(dummy = 1) |>
pivot_wider(id_cols = everything(),
names_from = x6,
values_from = dummy,
values_fill = 0)
d_raw |>
select(Any = x3, BaseR = x4, TidyverseR = x5) |>
pivot_longer(cols = everything()) |>
count(name, value) |>
complete(name, value, fill = list(value = 0)) |>
ggplot(aes(x = value,
y = n,
fill = name,
label = n)) +
geom_col(position = position_dodge(preserve = "single"), alpha = 0.5, colour = "black") +
geom_text(position = position_dodge(width = 0.9), vjust = -0.7) +
geom_hline(yintercept = 0) +
scale_x_continuous(breaks = 0:10) +
theme_minimal(base_family = "Arial", base_size = 20) +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank())
```
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Student Composition
```{r}
d_pca_obj <- d |>
select(-id, -x1) |>
prcomp(center = TRUE, scale. = TRUE)
d_pca_obj_aug <- d_pca_obj |>
augment(d)
lbl <- d_pca_obj |>
tidy("eigenvalues") |>
mutate(lbl = str_c("PC", PC, ", Variance Explained = ",
round(percent*100,1), "%"))
d_pca_obj_aug |>
ggplot(aes(x = .fittedPC1,
y = .fittedPC2,
fill = factor(x4))) +
geom_vline(xintercept = 0) +
geom_hline(yintercept = 0) +
stat_ellipse(aes(colour = factor(x5))) +
geom_point(pch = 21, colour = "black", size = 4, alpha = 0.5) +
labs(title = "R for Bio Data Science, Student Composition",
x = lbl$lbl[1],
y = lbl$lbl[2],
fill = "Base R\nExperience",
colour = "Base R\nExperience") +
theme_minimal(base_family = "Arial", base_size = 20)
```
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
# Is there any specific area of bio-research you would like to see covered?
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Data Analysis & Machine Learning
_"I'm interested in more advanced statistics, mapping, and machine learning techniques applied to biological data, especially handling large datasets like RNA-seq and proteomics."_
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Genetics and Genomics
_"I would like to see topics related to gene-based disease discovery, genome sequencing, and CRISPR applications in bio-research."_
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Immunology
_"Immunology research, particularly related to autoimmune diseases and tumor immunology, would be valuable to explore."_
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Multi-omics & Omics Data
_"Multi-omics approaches, including proteomics, genomics, and the analysis of RNA-seq data, are areas I’m very interested in."_
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Epidemiology & Clinical Data
_"It would be helpful to cover epidemiology topics, including predictive modeling of disease outbreaks and the analysis of clinical datasets."_
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
# Briefly, what are your general expectations to this course?
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## R Programming Proficiency
"I expect to become proficient in R programming, gaining confidence in writing, interpreting, and using R code in a variety of bio-research contexts. I aim to improve my R skills for future work in biological research and industry."
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Data Handling and Analysis
"My goal is to learn how to effectively handle and analyze large datasets, including cleaning, organizing, and applying statistical methods to biological data. I hope to gain the ability to manage big data and automate data analysis tasks in R."
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Practical Applications in Biology
"I want to apply the R programming skills learned in this course to real-world biological problems, such as genomic data analysis, bioinformatics, and pipeline development. Understanding how to use R in practical bio-research scenarios is a key expectation."
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Visualization & Data Presentation
"I hope to develop skills in visualizing and presenting biological data in a clear and effective way. Learning how to create plots and presentations that make large datasets more accessible is a critical aspect I expect to master."
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Confidence and Efficiency in Using R
"I aim to feel more confident and efficient in using R for bio-data analysis by the end of this course. I hope to reduce the time spent on coding, improve the readability of my code, and tackle intermediate challenges independently."
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
# Is there anything you would like to add? Comments, suggestions, anything?
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Course Pace & Structure
"No rushing through the learning material would greatly benefit my understanding. A slower, more deliberate pace will help those of us who are new to programming."
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Learning Resources
"It would be really helpful to have additional learning resources, such as recommended books or websites, to support learning outside of class. Pointers for getting extra help would be appreciated."
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Industry Applications
"Including company talks with content related to protein engineering and data handling would be valuable. It would be great to see how skills from the course can be applied to real industry problems, particularly in the context of protein data."
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Community & Atmosphere
"Looking forward to the course and excited about the learning experience! There's a general sense of anticipation and eagerness to start the class."
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Individual Programming Projects
"I'd love to focus more on coding custom functions and understanding how they can be applied in different contexts, beyond just the basics."
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
# Summary
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## Alignment of Expectations
- Key criteria for succes is that we agree on the content of this course
- Luckily, much of what is mentioned is part of this course
- However, if we were to cover all of the mentioned aspects, that would basically be an entire bioinformatics education
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## This course
- Practical Application: Equip students with hands-on bio data science skills, emphasizing on transforming messy datasets to clean, structured ones, and performing insightful data analyses.
- Tools & Platforms: Utilize Tidyverse R, RStudio IDE, Quarto reporting, git/GitHub, and other specific tools such as dplyr, ggplot, and shinyapps.io for comprehensive bio data science tasks.
- Reproducibility & Collaboration: Stress on the importance of reproducible data analysis and effective collaboration in bio data science projects using Tidyverse R and git/GitHub.
- Broad Skill Set: Train students to perform basic statistical tests, create R packages, design Shiny apps, employ Large-Language-Model (LLM) technology like chatGPT, and more.
- Project Analysis & Presentation: Guide students to assess and critique bio data science projects on method choices and data communication, and organize & present their own results in dynamic Quarto reports.
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
## This course - In other words
- Creates the foundation for you to explore the multitude of bioinformatics subjects
- Gives you concrete tool skills to handle (almost) any kind of bio data and to do collaborative coding projects
- Trains your general collaborative and communicative meta skills
<!--# ---------------------------------------------------------------------- -->
<!--# SLIDE ---------------------------------------------------------------- -->
<!--# ---------------------------------------------------------------------- -->
# Tag along and you'll have fun and learn a ton!