Library for processing MOOC data dumps. Currently limited to Coursera data.
Papers published using this code on our MOOC corpus are available for download in this repository: https://github.com/WING-NUS/lib4moocdata/tree/master/coursera/docs
If you use this code for your own research, please let us know by email or through GitHub issues, and cite us:
- Chandrasekaran, M. K., Epp, C. D., Kan, M.-Y., Litman, D. 2017. “Using Discourse Signals for Robust Instructor Intervention Prediction”. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), San Francisco, USA. pp. 3415-3421. AAAI. https://ojs.aaai.org/index.php/AAAI/article/view/11015
- Chandrasekaran, M. K., Kan, M.-Y., Ragupathi, K., Tan, B. C. Y. 2015. “Learning instructor intervention from MOOC forums: Early Results and Issues”. In Proceedings of the 8th International Conference on Educational Data Mining, Madrid, Spain. pp. 218-225. International Educational Data Mining Society. https://www.educationaldatamining.org/EDM2015/proceedings/full218-225.pdf
Each Coursera course dump contains the following SQL exports:
1. <Full_Coursename>(<coursecode>)_SQL_anonymized_forum.sql
2. <Full_Coursename>(<coursecode>)_SQL_hash_mapping.sql
3. <Full_Coursename>(<coursecode>)_SQL_anonymized_general.sql
4. <Full_Coursename>(<coursecode>)_SQL_unanonymizable.sql
A compressed (.gz) file with clickstream data is also provided. This library does not yet process it.
5. <coursecode>_clickstream_export.gz
To replicate the results in our published papers, it is sufficient to import files (1), (2) and (3).
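As a quick sanity check before importing, the file names listed above can be generated for a given course and compared with what is on disk. The Python sketch below is not part of lib4moocdata. The helper names `dump_filenames` and `missing_required` are hypothetical, and the sketch assumes that `<Full_Coursename>` and `<coursecode>` are substituted literally into the name patterns above.

```python
from pathlib import Path

# Hypothetical helper: builds the dump file names from the patterns above.
# Keys follow the README numbering (1)-(5).
def dump_filenames(full_coursename: str, coursecode: str) -> dict[int, str]:
    prefix = f"{full_coursename}({coursecode})_SQL_"
    return {
        1: prefix + "anonymized_forum.sql",
        2: prefix + "hash_mapping.sql",
        3: prefix + "anonymized_general.sql",
        4: prefix + "unanonymizable.sql",
        5: f"{coursecode}_clickstream_export.gz",
    }

# Files (1)-(3) are the ones needed to replicate the published results.
REQUIRED = (1, 2, 3)

def missing_required(dump_dir: str, full_coursename: str, coursecode: str) -> list[str]:
    """Return the names of required files (1)-(3) that are not present in dump_dir."""
    names = dump_filenames(full_coursename, coursecode)
    return [names[i] for i in REQUIRED if not (Path(dump_dir) / names[i]).is_file()]
```

For example, `missing_required("/data/dumps", "Some Course", "abc-001")` returns an empty list once files (1)-(3) are in place. The directory, course name and course code here are placeholders.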
Step-by-step instructions on running the experiments that replicate our EDM 2015 and AAAI 2017 papers are accessible here. To use the library to process and analyse your own data, you first need to install the MySQL database and ingest the .sql files into it. To ingest a .sql file using the MySQL command-line interface (CLI), run:
mysql> source <path to .sql file>/<name of the .sql file>
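When several files have to be ingested, the `source` commands can be generated rather than typed by hand. A minimal Python sketch follows. `source_commands` is a hypothetical helper, and the sketch assumes file paths without spaces. Its output is meant to be pasted at the `mysql>` prompt after selecting the target database with `USE <database>;`.

```python
import os
from pathlib import Path

def source_commands(sql_files: list[str]) -> str:
    """Return one mysql `source` command per file, in the given order."""
    # os.path.abspath makes each path absolute without resolving symlinks;
    # as_posix() writes it with forward slashes.
    return "\n".join(
        f"source {Path(os.path.abspath(f)).as_posix()}" for f in sql_files
    )

if __name__ == "__main__":
    # Placeholder paths standing in for files (1)-(3) of one course dump.
    print(source_commands([
        "dumps/anonymized_forum.sql",
        "dumps/hash_mapping.sql",
        "dumps/anonymized_general.sql",
    ]))
```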
Note that Coursera supplies a separate SQL export for every course, so the DDL statements in the files from different courses are redundant. More importantly, none of the tables has a coursecode field. You therefore have to either:
i) create a separate MySQL database for each course dump (one per course iteration), or
ii) add a 'coursecode' field to every table and issue UPDATE statements to populate it after running the *.sql import.
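Both options can be scripted. The Python sketch below only generates SQL text and does not connect to MySQL. The database prefix `coursera_`, the column type `VARCHAR(64)`, the table name `forum_posts` and the helper names are all hypothetical choices. You need to supply the real table names yourself, for example from `SHOW TABLES;` after an import.

```python
import re

# Only simple identifiers are accepted, so the values below can be spliced
# into SQL text without quoting problems (and without SQL injection).
_SAFE = re.compile(r"[A-Za-z0-9_-]+")

def _check(name: str) -> str:
    if not _SAFE.fullmatch(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return name

def per_course_database(coursecode: str, sql_files: list[str], prefix: str = "coursera_") -> str:
    """Option (i): create and select one database per course iteration, then import into it."""
    db = _check(prefix + coursecode).replace("-", "_")
    lines = [f"CREATE DATABASE IF NOT EXISTS `{db}`;", f"USE `{db}`;"]
    lines += [f"source {path}" for path in sql_files]
    return "\n".join(lines)

def tag_with_coursecode(coursecode: str, tables: list[str], add_column: bool = True) -> str:
    """Option (ii): add a coursecode column (first import only) and fill it for untagged rows."""
    code = _check(coursecode)
    stmts = []
    for table in tables:
        table = _check(table)
        if add_column:
            stmts.append(f"ALTER TABLE `{table}` ADD COLUMN coursecode VARCHAR(64) NULL;")
        # Assumes rows from earlier imports were already tagged, so only new rows are NULL.
        stmts.append(f"UPDATE `{table}` SET coursecode = '{code}' WHERE coursecode IS NULL;")
    return "\n".join(stmts)

if __name__ == "__main__":
    print(per_course_database("abc-001", ["/data/dumps/anonymized_forum.sql"]))
    print(tag_with_coursecode("abc-001", ["forum_posts"]))  # hypothetical table name
```

Which option suits you depends on the dumps themselves. If a dump's DDL recreates tables that already exist, importing a second course into the same database could overwrite the first, in which case option (i) is the safer choice.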
The scripts require Perl 5 and some dependent Perl packages.
For Windows users
Install Strawberry Perl from http://strawberryperl.com/ or ActivePerl from http://www.activestate.com/activeperl
For Linux and Mac users
Linux and Mac users should already have Perl installed as part of the OS. You can check this by running perl -v in your terminal.
The packages to install are:
- DBI
- FindBin
- Getopt::Long
- Encode
- HTML::Entities
- Lingua::EN::Sentence
- Lingua::EN::Tokenizer::Offsets
- Lingua::StopWords
- Lingua::EN::StopWordList
- Lingua::Stem::Snowball
- Lingua::EN::Ngram
- Lingua::EN::Bigram (fails on Linux CentOS 6)
- Lingua::EN::Tagger
- Lingua::EN::PluralToSingular
- Config::Simple
- File::Remove