In this workshop, we will learn about tools that help researchers work jointly or sequentially on a project and that support reproducibility and open science. The audience will be introduced to three sets of tools: version control (GitHub), dynamic documents (notebooks) and quality control (unit testing) in Stata, R and Python. The program is meant to last 3 days, held either on consecutive days or weekly. Each session will last 3 hours and consist of 4 modules, each pairing a 15-minute demonstration with 30 minutes of hands-on practice. All sessions will be accompanied by thorough documentation that can be accessed and edited after the workshop and used as a common best-practice reference. See below for more details.
Version control is a structured way of creating, storing and editing code while making sure that there is no confusion between new and old scripts.
- Basics of GitHub
- Pulling, committing and pushing code
- Branches
- Seamless integration with Python, R and Stata
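The commit, push, branch and pull steps listed above can be sketched from Python by calling the `git` command line through `subprocess`. This is only an illustration under stated assumptions: `git` must be installed, a local bare repository stands in for a GitHub remote, and the file name, branch name, commit messages and demo identity are all hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical identity, used only so commits work in this throwaway demo.
GIT_IDENTITY = ["-c", "user.name=Workshop Demo", "-c", "user.email=demo@example.com"]


def git(*args, cwd):
    """Run a git command inside `cwd` and return its standard output."""
    result = subprocess.run(
        ["git", *GIT_IDENTITY, *args],
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def demo_workflow(base_dir):
    """Commit on the default branch and on a new branch, pushing both to a local 'remote'."""
    base_dir = Path(base_dir)
    remote = base_dir / "remote.git"  # stands in for a repository hosted on GitHub
    work = base_dir / "work"          # your local working copy

    git("init", "--bare", str(remote), cwd=base_dir)
    git("clone", str(remote), str(work), cwd=base_dir)

    # Commit a first version of a script and push it.
    (work / "analysis.py").write_text("print('first version')\n", encoding="utf-8")
    git("add", "analysis.py", cwd=work)
    git("commit", "-m", "Add analysis script", cwd=work)
    git("push", "origin", "HEAD", cwd=work)

    # Make a change on a separate branch, then commit and push that branch.
    git("checkout", "-b", "robustness-checks", cwd=work)
    (work / "analysis.py").write_text("print('second version')\n", encoding="utf-8")
    git("commit", "-am", "Add robustness checks", cwd=work)
    git("push", "origin", "HEAD", cwd=work)

    # Pull to pick up anything collaborators pushed (nothing new in this demo).
    git("pull", "origin", "robustness-checks", cwd=work)

    # Return the commit messages that the remote now knows about.
    return git("log", "--all", "--format=%s", cwd=remote).splitlines()


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("git is not installed; skipping the demo")
    else:
        with tempfile.TemporaryDirectory() as tmp:
            for subject in demo_workflow(tmp):
                print(subject)
```

In day-to-day work the same commands are usually typed in a terminal or issued through an editor's Git panel; wrapping them in Python here simply keeps the example self-contained.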
Dynamic documents, usually called notebooks, are powerful and versatile tools. They allow users to mix code with text (including LaTeX and HTML). They can be used with any statistical language (Python, R, Stata and more) to document code and its outputs in a single document. They can also be used to produce presentations, academic papers or even webpages.
- Use any notebook with your favorite language
- Dynamic documents using Jupyter and/or Google Colab or using the follow along version:
- Dynamic documents using R Notebook
- Documenting data work using notebooks
- Presentations using notebooks
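To make the idea of mixing text and code concrete, here is a minimal sketch that writes a two-cell Jupyter notebook file using only the Python standard library. It relies on the fact that `.ipynb` files are JSON documents in the nbformat 4 layout; the function names, title, file name and cell contents are hypothetical, and no kernel metadata is set, so Jupyter may ask which kernel to use when the file is opened.

```python
import json
from pathlib import Path


def make_notebook(title, code):
    """Build a minimal nbformat 4 notebook: one markdown cell and one code cell."""
    return {
        "cells": [
            {
                "cell_type": "markdown",
                "metadata": {},
                "source": [
                    f"# {title}\n",
                    "Text can include LaTeX, for example $\\bar{x} = \\frac{1}{n}\\sum_i x_i$.",
                ],
            },
            {
                "cell_type": "code",
                "execution_count": None,  # the cell has not been run yet
                "metadata": {},
                "outputs": [],
                "source": code.splitlines(keepends=True),
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def save_notebook(notebook, path):
    """Write the notebook dictionary to disk as JSON and return the path."""
    path = Path(path)
    path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    return path


if __name__ == "__main__":
    nb = make_notebook("Summary statistics", "values = [1, 2, 3]\nsum(values) / len(values)\n")
    print(save_notebook(nb, "example_notebook.ipynb"))
```

In practice notebooks are created and edited interactively in Jupyter, Google Colab or RStudio rather than written by hand; the sketch only shows that text cells and code cells live side by side in one file.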
How should you structure your workflow? How can you make sure that your code not only runs but also keeps producing the same outputs down the line? Unit testing is a way of implementing safeguards against, for example, dropping observations or variables when you shouldn't.
- Navigating your project workflow
- Unit testing in Stata
- Unit testing in R
- Unit testing in Python
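As an illustration of the kind of safeguard described above, the following sketch uses Python's built-in `unittest` module to check that a data-cleaning step neither drops observations nor loses variables. The cleaning function `add_adult_flag`, the variable names and the toy data are hypothetical; they stand in for whatever step your own project applies.

```python
import unittest


def add_adult_flag(rows):
    """Hypothetical cleaning step: add a derived variable without dropping any rows."""
    return [{**row, "is_adult": row["age"] >= 18} for row in rows]


class TestCleaningStep(unittest.TestCase):
    def setUp(self):
        # Toy data set: each dictionary is one observation.
        self.raw = [
            {"id": 1, "age": 34},
            {"id": 2, "age": 15},
            {"id": 3, "age": 52},
        ]
        self.clean = add_adult_flag(self.raw)

    def test_no_observations_dropped(self):
        self.assertEqual(len(self.clean), len(self.raw))

    def test_no_variables_dropped(self):
        # Every original variable must still be present after cleaning.
        for before, after in zip(self.raw, self.clean):
            self.assertTrue(set(before) <= set(after))

    def test_ids_remain_unique(self):
        ids = [row["id"] for row in self.clean]
        self.assertEqual(len(ids), len(set(ids)))


def run_tests():
    """Run the test case and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCleaningStep)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_tests()
```

If a later change to the cleaning step accidentally filtered out rows or overwrote a column, one of these tests would fail right away instead of the problem surfacing in the final outputs.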