forgef/Data-Science-Tools-Workshop

Data Science Tools - Workshop

MONT2 Econ Lab

Fabien Forge, Ph.D.

Objective

In this workshop, we will learn about tools that help researchers work jointly and/or sequentially on a project and that support reproducibility and open science. The audience will be introduced to three sets of tools: version control (GitHub), dynamic documents (notebooks) and quality control (unit testing) in Stata, R and Python. The program is meant to last 3 days, held either on consecutive days or weekly. Each session will last 3 hours, split into 4 modules of a 15-minute demonstration followed by 30 minutes of hands-on practice. All sessions will be accompanied by thorough documentation that can be accessed and edited after the workshop and used as a common best-practice reference. See below for more details.

Day 1: Version control

Version control is a structured way of creating, storing and editing code while making sure that there is no confusion between new and old scripts.

  1. Basics of GitHub
  2. Pulling, committing and pushing code
  3. Branches
  4. Seamless integration with Python, R and Stata.

Day 2: Dynamic documents

Dynamic documents, usually called notebooks, are powerful and versatile tools. They allow users to mix code with text (including LaTeX and HTML). They can be used with any statistical language (Python, R, Stata and more) to document the code and its outputs in a single document. They can also be used to produce presentations, academic papers or even webpages.

  1. Use any notebook with your favorite language

  2. Dynamic documents using Jupyter and/or Google Colab (also available as a follow-along version)

  3. Dynamic documents using R Notebook

  4. Documenting data work using notebooks

  5. Presentations using notebooks
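
To make the idea of mixing code with text concrete, here is a minimal sketch of what a Jupyter notebook looks like on disk: a JSON document holding a list of cells, some markdown and some code. The helper name `build_notebook` and the cell contents are hypothetical and chosen only for illustration; the field names follow the version 4 notebook format as commonly documented, and this is not taken from the workshop material itself.

```python
import json


def build_notebook(markdown_text, code_text):
    """Return a minimal notebook (format version 4) as a plain dictionary.

    Hypothetical helper for illustration: one markdown cell followed by
    one code cell that has not been executed yet.
    """
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            # Text cell: rendered as formatted text (markdown, LaTeX, HTML).
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            # Code cell: run by the kernel; outputs are stored alongside it.
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": code_text,
            },
        ],
    }


notebook = build_notebook(
    "# Summary statistics\nMean of a small sample.",
    "sum([1, 2, 3]) / 3",
)

# Serialize to the JSON text that would be saved in an .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
```

Because text and code live in the same file, the documentation and the analysis travel together and can be put under version control like any other script.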

Day 3: Quality control

How should you structure your workflow? How can you make sure that your code not only runs but also does not change outputs down the line? Unit testing is a way of implementing safeguards against mistakes such as dropping observations or variables when you shouldn't.

  1. Navigating your project workflow
  2. Unit testing in Stata
  3. Unit testing in R
  4. Unit testing in Python
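
As an illustration of the safeguard described above, the following is a minimal sketch using Python's standard `unittest` module. The function `drop_missing_income`, the variable names and the sample records are all hypothetical; the point is only to show how a test can assert that a cleaning step drops the observations it is supposed to drop and leaves every variable in place.

```python
import unittest


def drop_missing_income(records):
    """Return only the records whose 'income' value is not None.

    Hypothetical cleaning step used for illustration.
    """
    return [row for row in records if row.get("income") is not None]


class TestDropMissingIncome(unittest.TestCase):
    def setUp(self):
        # Small made-up dataset: one observation has a missing income.
        self.records = [
            {"id": 1, "income": 52000, "region": "north"},
            {"id": 2, "income": None, "region": "south"},
            {"id": 3, "income": 47000, "region": "south"},
        ]

    def test_only_expected_rows_are_dropped(self):
        cleaned = drop_missing_income(self.records)
        # Exactly one observation should be removed, and it should be id 2.
        self.assertEqual(len(cleaned), 2)
        self.assertEqual([row["id"] for row in cleaned], [1, 3])

    def test_no_variables_are_lost(self):
        cleaned = drop_missing_income(self.records)
        # Every remaining observation keeps all three variables.
        for row in cleaned:
            self.assertEqual(set(row), {"id", "income", "region"})
```

Saved in a file such as `test_cleaning.py` (a hypothetical name), the tests can be run with `python -m unittest test_cleaning.py`. If a later change to the cleaning step starts dropping extra observations or variables, the tests fail instead of the change passing silently.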
