Applied Text Mining in Python by University of Michigan on Coursera

Instructor(s) : V. G. Vinod Vydiswaran, Assistant Professor

About this Course

This course introduces the learner to the basics of text mining and text manipulation. It begins with how text is handled by Python, the structure of text both to the machine and to humans, and an overview of the NLTK framework for manipulating text. The second week focuses on common manipulation needs, including regular expressions (searching for text), cleaning text, and preparing text for use by machine learning processes. The third week applies basic natural language processing methods to text and demonstrates how text classification is accomplished. The final week explores more advanced methods for detecting the topics in documents and grouping them by similarity (topic modeling).

This course should be taken after: Introduction to Data Science in Python, Applied Plotting, Charting & Data Representation in Python, and Applied Machine Learning in Python.

Syllabus

WEEK 1

  • Key Concepts
    • Interpret text in terms of its basic building blocks: sentences and words.
    • Identify common problems with raw text and perform text cleaning tasks in Python.
    • Write regular expressions to find textual patterns.

Module 1: Working with Text in Python

You will be introduced to basic text mining tasks and learn to interpret text in terms of its building blocks, i.e. words and sentences. You will read in text files, process text, and address common issues with unstructured text. You will also learn how to write regular expressions to find and extract words and concepts that follow specific textual patterns. You will be introduced to UTF-8 encoding and how multi-byte characters are handled in Python. This week’s assignment focuses on identifying dates using regular expressions and normalizing them.
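As a rough illustration of the date task described above, the sketch below finds numeric dates with a regular expression and rewrites them in `YYYY-MM-DD` form. It is not the assignment solution: the pattern, the month/day/year order, the rule for two-digit years, the `normalize_dates` helper, and the sample sentence are all assumptions made for this example.

```python
import re

# Matches numeric dates such as 04/20/2009 or 4-3-85.
# Assumption: dates are written in month/day/year order.
DATE_PATTERN = re.compile(r"\b(\d{1,2})[/-](\d{1,2})[/-](\d{4}|\d{2})\b")


def normalize_dates(text):
    """Return every numeric date found in text as a YYYY-MM-DD string."""
    normalized = []
    for month, day, year in DATE_PATTERN.findall(text):
        if len(year) == 2:
            # Assumption for this sketch: two-digit years belong to the 1900s.
            year = "19" + year
        normalized.append(f"{year}-{int(month):02d}-{int(day):02d}")
    return normalized


# Sample sentence invented for this example.
sample = "Seen on 04/20/2009, follow-up 4-3-85, discharged 12/1/1999."
print(normalize_dates(sample))
```

The pattern does not check that the month and day are valid calendar values; real notes usually need several patterns (for example, month names) combined.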

Graded: Module 1 Quiz
Graded: Assignment 1 Submission

WEEK 2

  • Key Concepts
    • Describe different natural language tasks.
    • Process free text through the NLTK toolkit to tag language constructs onto text.
    • Derive meaningful features from text.

Module 2: Basic Natural Language Processing

You will delve into NLTK, a very popular toolkit for processing text in Python. Through NLTK, you will be introduced to common natural language processing tasks and how to extract semantic meaning from text. For this week’s assignment, you’ll get hands-on experience using NLTK to process text and derive meaningful features and statistics from it.
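To give a feel for the kind of features and statistics mentioned above, here is a small sketch that uses only the Python standard library. It is a stand-in, not NLTK itself: the regex tokenizer and `collections.Counter` play roughly the role that NLTK's tokenizers and `FreqDist` would, and the helper names and sample text are invented for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def text_statistics(text):
    """Compute a few simple statistics from raw text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    return {
        "n_tokens": len(tokens),
        "n_unique": len(counts),
        # Lexical diversity: unique tokens divided by total tokens.
        "lexical_diversity": len(counts) / len(tokens) if tokens else 0.0,
        # The single most frequent token with its count.
        "most_common": counts.most_common(1),
    }


# Sample text invented for this example.
sample = "The cat sat on the mat. The dog sat on the log."
stats = text_statistics(sample)
print(stats)
```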

Graded: Module 2 Quiz
Graded: Assignment 2 Submission

WEEK 3

  • Key Concepts
    • Compare text classification to other classification approaches (covered in Applied Machine Learning in Python as well)
    • Describe the Naive Bayes and Support Vector Machine algorithms
    • Classify text into two classes using one of these approaches in Python
    • Identify and extract features from text and transform them into feature vectors for the machine learning models

Module 3: Classification of Text

You will engage with two of the most standard text classification approaches, viz. naïve Bayes and support vector machine classification. Building on some of the topics you might have encountered in Course 3 of this specialization, you will learn about deriving features from text and using the NLTK and scikit-learn toolkits for supervised text classification. You will also be introduced to another natural language challenge: analyzing sentiment in text reviews. For this week’s assignment, you will train a classifier to distinguish spam messages from non-spam (“ham”) messages. Through this assignment, you will also get hands-on experience with cross-validation and with the training and testing phases of supervised classification tasks.
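The spam/ham idea above can be sketched with a tiny multinomial naive Bayes classifier written from scratch. This is only an illustration of the algorithm, not the course's solution and not scikit-learn's implementation: the `TinyNaiveBayes` class, the whitespace tokenizer, and the four training messages are all made up for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(message):
    """Very rough tokenizer: lowercase and split on whitespace."""
    return message.lower().split()


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for message, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(message))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, message):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(message):
                if word not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data invented for this example.
train_messages = [
    "win a free prize now",
    "free cash offer click now",
    "are we meeting for lunch today",
    "see you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = TinyNaiveBayes().fit(train_messages, train_labels)
print(model.predict("claim your free prize"))
print(model.predict("lunch meeting tomorrow"))
```

In practice one would hold out part of the data for testing, or use cross-validation, rather than judge the model on a couple of hand-picked messages.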

Graded: Module 3 Quiz
Graded: Assignment 3 Submission

WEEK 4

  • Key Concepts
    • Apply WordNet-based similarity measures on text
    • Derive semantic topics from a large text collection using LDA
    • List and describe techniques for named entity recognition and other information extraction tasks.

Module 4: Topic Modeling

You will be introduced to two more advanced text mining approaches: topic modeling and semantic text similarity. You will also explore advanced information extraction topics, such as named entity recognition, building on concepts you have seen in Module One and Module Three of this course. The final assignment lets you explore the semantic similarity of text snippets and build topic models using the gensim package. You will also experience the practical challenge of making sense of topic models in real life.
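One common WordNet-based measure is path similarity, defined as 1 / (1 + length of the shortest path between two concepts in the hypernym hierarchy). The sketch below applies that formula to a small hand-built taxonomy so that it runs without any downloads. The `PARENT` table and the helper functions are hypothetical stand-ins for WordNet, not part of NLTK or the course material.

```python
# Hypothetical hand-built taxonomy (child -> parent); a stand-in for WordNet's hypernym graph.
PARENT = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}


def ancestors(node):
    """Return the chain from node up to the root, node included."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_similarity(a, b):
    """Return 1 / (1 + number of edges on the shortest path between a and b)."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    steps_from_b = {node: i for i, node in enumerate(chain_b)}
    for steps_from_a, node in enumerate(chain_a):
        if node in steps_from_b:
            # In a tree, the first shared ancestor found going up from a
            # is the lowest common one, so this path is the shortest.
            return 1 / (1 + steps_from_a + steps_from_b[node])
    return 0.0  # no common ancestor


print(path_similarity("dog", "wolf"))
print(path_similarity("dog", "cat"))
print(path_similarity("dog", "sparrow"))
```

Closer concepts score higher: in this toy hierarchy, "dog" is more similar to "wolf" than to "cat", and least similar to "sparrow".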

Graded: Module 4 Quiz
Graded: Assignment 4 Submission

Grading

| Course Item | Percentage of Final Grade | Passing Threshold |
|---|---|---|
| Week 1 Quiz | 5% | 80% |
| Week 1 Jupyter Notebook Assignment | 20% | |
| Week 2 Quiz | 5% | 80% |
| Week 2 Jupyter Notebook Assignment | 20% | |
| Week 3 Quiz | 5% | 80% |
| Week 3 Jupyter Notebook Assignment | 20% | |
| Week 4 Quiz | 5% | 80% |
| Week 4 Jupyter Notebook Assignment | 20% | |

Kudos!!!

Warm Regards,
Piyush Sambhi
Email: [email protected]
Git URL: https://github.com/sambhipiyush
