This course focuses on programming strategies and techniques behind procedural analysis and generation of text-based data. We'll explore topics ranging from evaluating text according to its statistical properties, to the automated production of text with probabilistic methods, to text visualization. Students will learn server-side and client-side JavaScript programming and develop projects that can be shared and interacted with online. There will be weekly homework assignments as well as a final project.
- Daniel Shiffman, Fridays, 12:10pm-2:40pm
- Course Notes
- All example code in this repo.
- Where to find course materials
- Overview / syllabus
- Homework / final project
- Beyond Processing and into JavaScript and p5.js
- Installing Node
- JavaScript 101
- Strings in JavaScript
- File I/O with Node
- Simple Text Analysis
- Back to p5.js, processing text from a user
- Processing to p5
- OOP in JS
- DOM manipulation in p5
- Strings in JS
- File I/O in Node
- File I/O in p5
- Process user text in p5
- Sign up for the class google group
- Watch or read The Secret Life of Pronouns.
- Develop a program that "writes" or "reads" (or both) text, i.e. generate your own text from a source text (or via some other generative method) or create your own method for analyzing the statistical properties (or, dare I say, meaning) of an input text. You can use node to process a text file or you can get user input in a browser. Feel free to play around with visual ideas for displaying text with p5.js.
- Wiki page for submitting homework
- Intro to Regular Expressions
- meta-characters
- position
- single character
- quantifiers
- character classes
- meta-characters
- Testing regex with egrep
- Regex in JavaScript
- Splitting with Regex
- Search and Replace
- Plain JS
- P5 examples
- Chapter 1, Mastering Regular Expressions
- Guide to regex in JavaScript
- Eloquent JavaScript Regular Expressions
- Play the regex crossword!
- Another, older Java-based Regex game
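The regex operations listed above (matching, splitting, search and replace) all map to standard JavaScript string and RegExp methods. A minimal sketch, with sample text of my own choosing:

```javascript
const text = 'Call me Ishmael. Some years ago, never mind how long.';

// Splitting: break on any run of non-word characters,
// filtering out the empty string left by the trailing period.
const tokens = text.split(/\W+/).filter(t => t.length > 0);

// Matching: find all capitalized words.
const caps = text.match(/\b[A-Z][a-z]+\b/g); // ['Call', 'Ishmael', 'Some']

// Search and replace: redact every four-letter word.
const redacted = text.replace(/\b\w{4}\b/g, '****');
console.log(tokens.length, caps, redacted);
```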
- If you are looking for some inspiration about computational methodologies as they relate to writing, read about Jackson Mac Low. Mac Low was an American poet well-known for his use of chance operations and other algorithmic processes in his writing. Two articles I would suggest: Science, Technology, and Poetry: Some Thoughts on Jackson Mac Low by Mordecai-Mark Mac Low and Listen and Relate: Notes Towards a Reading of Jackson Mac Low by George Hartley.
- Practice Regular Expressions! If you are stuck for an idea, here are some suggestions:
- Take your code from week 1, expand and rework it using Regular Expressions.
- Taking inspiration from the Pirate Translator, re-imagine a text using regex search and replace.
- Create a program that performs Mac Low's Diastic reading of a text. Diastic Explanation, eDiastic demo
- Write a regular expression that matches any e-mail address.
- Take that regular expression and do a search and replace so that any e-mail address is made into a “mailto:” link.
- Create an example that reads an HTML page and removes any markup and leaves only the raw content.
- Adapt the regex tester to be a search/replace tester.
- Create a regex that matches only code comments in code.
- Don't forget to document your work online, upload to dropbox, and post to the homework wiki.
- Associative Arrays in JavaScript?
- Text Concordance
- Keyword finding: TF-IDF
- Text Classification: Naive Bayes
- Text Concordance, Source code
- Keyword finding: TF-IDF, Source code
- Text Classification: Naive Bayes, Source Code
- Rita Library Basics
- Parts of Speech Concordance
- Sample Datasets
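The associative-array and concordance topics above boil down to using a plain JavaScript object as a word-to-count lookup table. A minimal sketch (function name and tokenizing are my own, not the course's API):

```javascript
// Build a concordance: keys are words, values are counts.
function concordance(text) {
  const counts = {};
  const tokens = text.toLowerCase().split(/\W+/).filter(t => t);
  for (const t of tokens) {
    counts[t] = (counts[t] || 0) + 1; // default to 0 on first sight
  }
  return counts;
}

const c = concordance('To be or not to be');
console.log(c); // { to: 2, be: 2, or: 1, not: 1 }
```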
- What our words say about us.
- TF-IDF Single Page Tutorial
- Paul Graham's A Plan for Spam and Better Bayesian Filtering
- Introduction to Bayesian Filtering
- Monty Hall and Bayes
- An Intuitive Explanation of Bayes' Theorem by Eliezer S. Yudkowsky
- The RiTa Library
- Luke Dubois' Missed Connections
- Nicholas Felton's 2013 Annual Report, NY Times Article
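The TF-IDF idea covered above can be sketched in a few lines: a word scores highly in a document when it is frequent there (term frequency) but rare across the corpus (inverse document frequency). This is a simplified version with my own function names; note it assumes the word appears in at least one document, otherwise the idf division blows up.

```javascript
// tf: occurrences of the word in one document.
// idf: log of (total docs / docs containing the word).
function tfidf(word, doc, docs) {
  const tokenize = s => s.toLowerCase().split(/\W+/).filter(t => t);
  const tf = tokenize(doc).filter(t => t === word).length;
  const containing = docs.filter(d => tokenize(d).includes(word)).length;
  const idf = Math.log(docs.length / containing);
  return tf * idf;
}

const docs = ['the cat sat', 'the dog ran', 'the cat and the dog'];
// "the" appears in every document, so its idf is log(1) = 0.
console.log(tfidf('the', docs[2], docs)); // 0
console.log(tfidf('cat', docs[0], docs) > 0); // true: "cat" is distinctive
```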
- Experiment with text analysis. Here are some ideas if you are feeling stuck.
- Visualize the results of a concordance using canvas (or some other means).
- Expand the information the concordance holds so that it keeps track of word positions (i.e. not only how many times the words appear in the source text, but where they appear each time).
- Apply some of the ideas specific to spam filtering to the Bayesian classification example.
- In James W. Pennebaker's book The Secret Life of Pronouns, Pennebaker describes his research into how the frequency of words that have little to no meaning on their own (I, you, they, a, an, the, etc.) are a window into the emotional state or personality of an author or speaker. For example, heavy use of the pronoun “I” is an indicator of “depression, stress or insecurity”. Create a page or sketch that analyzes the use of pronouns. For more, visit analyzewords.com.
- Use these analysis techniques to find similarities between people. For example, if you look at all the e-mails on the ITP student list, can you determine who is similar? Consider using properties in addition to word count, such as time of e-mails, length of e-mails, etc.
- N-Grams and Markov Chains
- Grammars
- Wordnik
- TwitterBot
- Animated Markov Chain explanation
- N-Grams and Markov Chains by Allison Parrish
- Context-Free Grammars by Allison Parrish
- N-Grams and Markov Chains by Daniel Howe
- Context-Free Grammars by Daniel Howe
- Google N-Gram Viewer, google blog post about n-grams
- Markov Models of Natural Language
- Three Models for the Description of Language (Chomsky)
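The Markov chain idea above can be sketched as a character-level n-gram model: record which character follows each n-character "gram", then walk the table picking random continuations. This is a minimal version in the spirit of the MarkovGenerator discussed in the notes; the function names here are my own, not the course's API.

```javascript
// Build a map from each n-character gram to the characters that follow it.
function buildModel(text, n) {
  const model = {};
  for (let i = 0; i <= text.length - n; i++) {
    const gram = text.substring(i, i + n);
    const next = text.charAt(i + n);
    if (!model[gram]) model[gram] = [];
    if (next) model[gram].push(next); // skip the empty end-of-text char
  }
  return model;
}

// Generate by repeatedly sampling a continuation of the current gram.
function generate(model, seed, length) {
  let out = seed;
  let gram = seed;
  for (let i = 0; i < length; i++) {
    const options = model[gram];
    if (!options || options.length === 0) break; // dead end
    out += options[Math.floor(Math.random() * options.length)];
    gram = out.substring(out.length - gram.length);
  }
  return out;
}

const model = buildModel('the theremin thesis', 3);
console.log(generate(model, 'the', 10));
```

Raising n makes the output hew closer to the source text; lowering it makes the output stranger, which is exactly the "order of the n-gram" question posed in the homework below.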
- Generate text procedurally.
- Post a link about your work to: week 4 homework wiki
- Some ideas:
- Create a page that generates its content by feeding an existing text into the Markov chain algorithm. What effect does the value of n (the “order” of the n-gram) have on the result? Allison Parrish's ITP Course generator is an excellent example.
- Visualize N-gram frequencies. See WebTrigrams by Chris Harrison for an example.
- What happens if you mash up two texts? For example, feed Shakespeare plays and ITP physical computing blog post content into the generator. Can you modify the MarkovGenerator object to weight the input text (i.e. make Shakespeare N-grams have higher probabilities)? The Gnoetry Project is a useful reference.
- Rework any of the example programs to use something other than text (or, at least, text that represents language) as its basic unit. For example: musical notes, songs in playlists, pixels in an image, etc.
- Invent your own grammar. Consider using one that generates something other than English sentences: music, images, code, etc.
- Build a grammar that pulls its terminal words from Wordnik.
- Build a grammar based on a source text as demonstrated here.
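A context-free grammar like those suggested above can be expanded recursively: each non-terminal symbol picks one of its rules at random and expands its parts in turn. A tiny sketch, with a rule format and vocabulary of my own invention:

```javascript
// Each non-terminal maps to a list of possible expansions;
// anything not in the table is treated as a terminal word.
const grammar = {
  S:  [['NP', 'VP']],
  NP: [['the', 'N']],
  VP: [['V', 'NP']],
  N:  [['cat'], ['poem']],
  V:  [['reads'], ['chases']]
};

function expand(symbol) {
  const rules = grammar[symbol];
  if (!rules) return symbol; // terminal: emit the word itself
  const rule = rules[Math.floor(Math.random() * rules.length)];
  return rule.map(expand).join(' ');
}

console.log(expand('S')); // e.g. "the cat reads the poem"
```

Swapping the hard-coded N and V lists for words fetched from Wordnik would give you the "terminal words from Wordnik" assignment above.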
- Drawing Text with Canvas
- Drawing Text with DOM Elements
- Classic Text Visualization Techniques
- word clouds
- treemaps
- network diagrams
- Pulling text and metadata from APIs
- Stefanie Posavec
- Textarc by Bradford Paley
- Ariel Malka
- OpenBible
- State of the Union, NY Times
- On the Origin of Species: The Preservation of Favoured Traces, Ben Fry
- Office of Creative Research
- Jeff Clark
- Jonathan Corum, word incidence
- Lynn Cherny, Pinterest talk
- Matthew Jockers
- wordle algorithm
- SoSoLimited Reconstitution 2008
- Rob Seward Word Association
- Create a final project proposal. Add a link to the wiki.
- Pulling data from APIs
- Making your own API
- Combining the two!
- Prepare final project and documentation as described here. Plan on 5 minutes (with 1-2 minutes of questions) for presenting.
- NOTE THE NEW DATE FOR THIS FINAL CLASS: Oct 30, 12:30-3:00 pm, room TBA
- Codecademy: JavaScript
- How to learn JavaScript properly
- JavaScript the right way
- Code School
- JavaScript garden
- A re-introduction to JS by Mozilla
- JavaScript 101 from jQuery
- JavaScript: The Definitive Guide
- Eloquent JavaScript, Marijn Haverbeke
- Beginning JavaScript, Paul Wilton and Jeremy McPeak
- Checking code: JSLint / JSHint
- Browser debugging: Chrome Developer Tools (tutorial) / Firebug (tutorial)
- Mobile debugging jsconsole.com
- Sharing code snippets (useful for asking questions): gist.github.com
- Office of Creative Research
- ITP Course Generator by Allison Parrish
- Darius Kazemi
- Wordnik, Wordnik API
- WTFEngine
- programming language design prototyping tool by Ramsey Nasser
- Drawing with text
- Ariel Malka
- Werdmerge
- Visualizing Fiction's Structure, Lynn Cherny
- The Secret Life of Pronouns
- Gnoetry
- Jackson Mac Low
- Grand Text Auto
- Nick Montfort
- Wordnet
- Wordcount
- Rita
- Allison Parrish
- You are required to attend all class meetings and submit all weekly assignments and a final project.
- Grading (pass/fail) will be based on a combination of factors:
- Attendance, participation in class discussion, and engagement in other students' projects (25%)
- Quality of weekly assignments (50%)
- Final Project (25%)