A repository of code and data for the UCSB sklearn workshop on February 7th & 8th, 2019

Kaaiian/UCSB_sklearn_workshop

UCSB_sklearn_workshop

General outline

Day 1

Kaai can arrive early (8:00 AM) to talk informally about using pandas and numpy for reading and processing data, if there is interest.

Morning 10:00 AM - ~12:00 PM

  • Lecture - Basics of machine learning (1-2 hrs)

    • Training and test set
    • Model types & model complexity
    • Basic data types
    • Data imbalance
    • Performance metrics
    • Advanced data types (NLP, computer-vision, time-series, etc.)
  • Code along - First pass: band gap regression (1 hr)

    • download data
    • perform a train-test split
    • convert formulae to features
    • perform linear regression (linear learner)
    • perform random-forest regression (non-linear)
    • generate error metrics & figures
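The "convert formulae to features" step can be illustrated with a minimal sketch. The workshop's actual featurization is not shown in this outline, so the approach below (atomic fractions over a fixed element list) is an assumption, and the names `parse_formula` and `fraction_features` are hypothetical helpers introduced only for illustration. It handles flat formulae such as `Fe2O3` and does not support parentheses.

```python
import re
from collections import defaultdict

# An element symbol (e.g. "Ga", "O") followed by an optional numeric amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula):
    """Return {element: amount} for a flat formula such as 'Fe2O3'."""
    counts = defaultdict(float)
    for symbol, amount in _TOKEN.findall(formula):
        # A missing amount means one atom of that element.
        counts[symbol] += float(amount) if amount else 1.0
    return dict(counts)


def fraction_features(formula, elements):
    """Return atomic fractions in the fixed order given by `elements`."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    return [counts.get(el, 0.0) / total for el in elements]


# Hypothetical element list; a real feature set would cover every element
# that appears in the data set.
elements = ["Ga", "As", "Fe", "O"]
gaas = fraction_features("GaAs", elements)
fe2o3 = fraction_features("Fe2O3", elements)
print(gaas)
print(fe2o3)
```

Each formula becomes a fixed-length numeric vector, which is the shape of input that linear and random-forest regressors expect.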

Afternoon 1:00 PM - 5:00 PM

  • Code along - Second pass: band gap regression (1.5 hrs)

    • remove duplicates and make sure data looks okay
    • perform a train-test split
    • convert formula to features
    • discuss model parameter selection
    • implement cross-validation
    • implement grid-search
    • generate error metrics & figures
  • Individual work - AFLOW regression: predicting bulk modulus (until the end of the day)

    • Implement code (Taylor/Kaai will be available for questions)
      • Fill out individual code sections. Answers are revealed before moving on (4 or 5 parts).
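The cross-validation and grid-search steps from the second pass might look roughly like the sketch below. It uses synthetic data rather than the workshop's band gap or AFLOW data, and the parameter grid, the 80/20 split, and the MAE scoring choice are assumptions made for illustration, not the workshop's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a featurized data set (assumption: 4 features).
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))
y = 3.0 * X[:, 0] + np.sin(6.0 * X[:, 1]) + rng.normal(scale=0.1, size=200)

# Hold out a test set that is never seen during parameter selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Every parameter combination is scored with 5-fold cross-validation
# on the training data only.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

# By default GridSearchCV refits the best model on the full training set,
# so it can be evaluated directly on the held-out test set.
y_pred = search.predict(X_test)
test_mae = mean_absolute_error(y_test, y_pred)
test_r2 = r2_score(y_test, y_pred)
print(search.best_params_, round(test_mae, 3), round(test_r2, 3))
```

Keeping the test set out of the grid search is the point of the design: the cross-validation score selects the parameters, and the test score estimates how the chosen model performs on unseen data.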

Homework: Think about data that might be interesting to learn on. We will talk about your ideas in the morning.

Day 2

  • Code along (do you want this many coding examples?) - A quick classification problem: predicting crystal structure (1 hr)

    • Adapt the same code structure for classification
    • generate error metrics & review recall vs. precision
  • Individual work - metal/non-metal band gap classification.

    • Work through the full ML workflow independently
    • Coaching - Meanwhile, discuss research ideas and data with Taylor/Kaai
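To make the recall vs. precision review concrete, here is a small sketch that computes both from first principles. The labels are made up, and treating "metal" as the positive class is an assumption; `precision_recall` is a hypothetical helper. In practice, `sklearn.metrics.precision_score` and `sklearn.metrics.recall_score` compute the same quantities.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: of everything predicted positive, how much was correct?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly positive, how much was found?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = metal, 0 = non-metal (an imbalanced 4:6 split).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, accuracy)
```

Here the classifier reaches 70% accuracy while recovering only half of the metals, which is why accuracy alone can mislead on imbalanced data.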

Bonus material (things we can talk about if there is extra time):

  • Using matplotlib to make publication-quality figures
