In #7966 I mention possible content of 2-3 initial modules for the Data Science for development course. Compare analysing the tweets from Donald Trump - only one person - with examining and checking all the tweets from all the users! The objectives are sometimes different. Data science (more than statistics) may need to contact individual customers with appropriately customised messages. Automation will always have problems, and we often like the personal touch. But with huge volumes of data and personalised objectives, the analyses must be automated - machine learning is obviously what needs to be used! An important point David raised is that our course - for development - will include spreadsheets! Not only are we proposing to use a GUI for R; we will also be using Excel and/or Open Office. There are multiple reasons:
-
At the last meeting I was given the task of finding books for the course. I have mentioned the Introduction to Data Science book elsewhere. It is only introductory, but I like it a lot and propose that graduates should (at least) know all of it. This site discusses 5 free books on statistics for data science. That area is not so strong in the introductory book. Among those 5 free books, one seems outstanding: it is excellent and written by current giants in the subject. It also has a sort of history of the subject that puts the down-from-the-mountains ideas in a much more modern perspective. The second book of interest is Think Stats. It is much simpler, but it could be useful for us because it is based on python. I would like to get a copy of the data used throughout the book, which needs python. It could then be interesting to give that course material the R-Instat treatment, i.e. can you start with R-Instat and then learn python for statistics later?
-
Following our meeting I was asked to suggest the books to recommend.
I am starting with one book, namely Introduction to Data Science by Rafael A. Irizarry.
This is just an elementary book, but I still suggest it can be the main book for the course. Given the proposed audience, and the starting skill-sets of some students, we need to include an elementary book somewhere. I suggest that basing the whole course largely on this book, and then introducing other books that cover particular areas in more depth, is a reasonable way to go.
It has six sections and I further propose that each section could lead to one-or-more course modules. The sections are as follows:
So here is my proposal for the course modules:
R: I propose 2 or 3 optional modules.
The students should choose one to be assessed, so perhaps one of them is effectively compulsory? For those who would like to use R, there is a Programming in R course. Maybe a whole course is not needed, in which case it could include some of the productivity tools from part 6 of the book.
This should be at the start of the programme, so students can, if they wish, use R (with RStudio) for the course.
They may instead use R-Instat for many of the course modules, but we would expect them to use R or python (instead of, or in addition to, R-Instat) on their project.
So, later (in semester 2 or 3), there is an R through R-Instat course for those who would like that, and there is also a python for data science course for those who prefer to use python.
Data Visualization:
That's descriptive statistics for us. The book only covers ggplot2, and we also need to include tabulation. This is where we remind people that (at least for development) large-scale surveys are still routinely collected and analysed. Analysing the MICS surveys, etc. would be covered here. As it is data science, students don't need to design a survey, but they do need to be able to process the data. We might still call the module data visualisation, as it sounds better than descriptive statistics, but that's what it would be. We could include climatic data and the production of PICSA-type graphs, etc. there too. I assume this would be a compulsory module.
Data Wrangling:
That's a nice title for a compulsory module.
Statistics with R:
We might have up to 3 modules here. Statistical Methods, and perhaps Statistical Models 1, may be compulsory. I suspect that we may want Statistical Models 2 as well, but let's see. This area is covered only superficially in the book. I suggest Statistical Methods could introduce the different methodologies (frequentist, Bayesian, randomisation) and might even be limited to relatively small problems. Statistical Models 1 might even be devoted largely to generalised linear models? We need more thought here. Maybe Models 1 is regression and Models 2 is classification. I am assuming we can't leave classification to the machine learning module.
I suggest others, particularly perhaps @volloholic, @jkmusyoka and @lilyclements, might comment next. If this stands up, then one aspect I am unclear about is the sort of projects students might do, and an initial list could be useful. Perhaps @volloholic could easily start on this?
Then there will be further books, etc. on particular sections, and I will look into these further. In addition, I think we will need to have our own materials, at least drafted for some parts, by then?