A curated list of useful resources for computer language engineering and theory
Whether you want to create a text processor, a parser, a language application, a DSL (domain-specific language), or a full-fledged programming language with a compiler and tooling, this page serves as a map to point you in the right direction.
Better yet, help others find their way by contributing resources that you find useful.
As in other domains, knowing the tried-and-true tools will save you a lot of time and effort. You will also learn the emerging techniques that different tools adopt, which makes your skills more transferable.
ANTLR (ANother Tool for Language Recognition)
A powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.
Describe the language's lexical and grammar specification in a declarative .g4 file (similar to the Lex/Yacc format), and the generator can create a parser for the following target languages: Java, C#, Python, JavaScript, Go, C++, Swift (see update)
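To make "build and walk parse trees" concrete, here is a minimal hand-written sketch in Python. It is not ANTLR output and does not use the ANTLR runtime; it only illustrates the kind of work a generated parser does for a toy grammar `expr : INT ('+' INT)* ;`. The names `Node`, `tokenize`, `parse_expr`, `walk`, and `evaluate` are hypothetical and chosen for this illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    rule: str                                   # grammar rule or token type
    text: str = ""                              # matched text for leaf nodes
    children: list = field(default_factory=list)


def tokenize(source):
    # Integers become one token; any other non-space character becomes its
    # own token, so unexpected characters reach the parser and are rejected.
    return re.findall(r"\d+|\S", source)


def parse_expr(tokens):
    """Parse the toy grammar  expr : INT ('+' INT)* ;  into a parse tree."""
    pos = 0

    def expect_int():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected INT at token {pos}")
        node = Node("INT", tokens[pos])
        pos += 1
        return node

    root = Node("expr", children=[expect_int()])
    while pos < len(tokens) and tokens[pos] == "+":
        root.children.append(Node("PLUS", "+"))
        pos += 1
        root.children.append(expect_int())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return root


def walk(node, depth=0):
    # Depth-first traversal yielding (depth, rule, text) for every node.
    yield depth, node.rule, node.text
    for child in node.children:
        yield from walk(child, depth + 1)


def evaluate(tree):
    # One possible "language application": sum every INT leaf in the tree.
    return sum(int(text) for _, rule, text in walk(tree) if rule == "INT")
```

Calling `evaluate(parse_expr(tokenize("1 + 20 + 3")))` walks a tree with one `expr` root and five leaf children. With a parser generator, the parsing half of this sketch is derived from the grammar file instead of being written by hand.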
Learning materials:
- Book: The Definitive ANTLR 4 Reference: Build your own languages with ANTLR v4
- Doc: Supported Documentation for the above books
- Web: ANTLR Mega Tutorial: A comprehensive tutorial that explains all you need to know to understand language design fundamentals and use ANTLR effectively. A PDF version is also available. All free.
- Video: Terence Parr - ANTLR4: Dr. Terence Parr introduces the latest (and last) revision of his parser generator, ANTLR.
MPS (Meta Programming System)
With JetBrains MPS, you can define custom editors for any new language and make using these DSLs simpler. Even domain experts, who are not familiar with traditional programming, can easily work in MPS with domain-specific languages designed around their domain-specific terminology.
Learning materials:
- Book: The MPS Language Workbench Volume I: a simple introduction to the JetBrains MPS language workbench and a complete reference manual
- Book: The MPS Language Workbench Volume II: how to customize the MPS platform to better integrate it with the needs of your languages
Xtext is a framework by Eclipse for development of programming languages and domain-specific languages. With Xtext you define your language using a powerful grammar language. As a result you get a full infrastructure, including parser, linker, typechecker, compiler as well as editing support for Eclipse, IntelliJ IDEA and your favorite web browser.
Learning materials:
- Book: Implementing Domain-Specific Languages with Xtext and Xtend: learn how to implement a DSL with Xtext and Xtend using easy-to-understand examples and best practices
Sirius is an Eclipse project which allows you to easily create your own graphical modeling workbench by leveraging the Eclipse Modeling technologies, including EMF and GMF.
Learning materials:
- Web: Official Guide: provides an introduction to Sirius and a series of tutorials to get started building your own graphical modeling tool
Flex and Bison are aging Unix utilities that help you write very fast parsers for almost arbitrary file formats. Lex and Yacc are the original tools; Flex and Bison are their newer, almost completely compatible versions.
Learning materials:
- Web: Flex & Bison Tutorial: a tutorial for complete novices who need to use Flex and Bison in a real project.
- Book: Flex & Bison: explains how to use flex and bison to solve your problems quickly. This is an update of the original Lex & Yacc book described below.
- Book: Lex & Yacc: shows you how to use two Unix utilities, lex and yacc, in program development. These tools help programmers build compilers and interpreters, but they also have a wider range of applications.
A parser generator for reading binary data. It provides a declarative language for specifying the structure of binary data, from which it generates parsers (in multiple target languages) that read binary file formats, network stream packet formats, etc. It comes with a compiler, an IDE, a visualizer, and a library of format specs.
Describe the binary structure specification in a declarative .ksy file (YAML-like), and the generator can create a parser for the following target languages: C++/STL, C#, Java, JavaScript, Perl, PHP, Python, Ruby (see update)
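The idea of describing a binary layout as data rather than as parsing code can be sketched with Python's standard `struct` module. This is not the .ksy format and not the generated-parser API; it is a hand-rolled illustration. The header layout (`magic`, `version`, `payload_len`) and the names `HEADER_SPEC` and `parse_record` are assumptions made up for this example.

```python
import struct

# Hypothetical header layout, declared as data: field name plus struct code.
#   4s = 4 raw bytes, H = unsigned 16-bit int, I = unsigned 32-bit int
HEADER_SPEC = [
    ("magic", "4s"),
    ("version", "H"),
    ("payload_len", "I"),
]


def parse_record(spec, data, byte_order="<"):
    """Decode `data` according to `spec`; return (fields, bytes_consumed)."""
    # "<" selects little-endian with no padding between fields.
    fmt = byte_order + "".join(code for _, code in spec)
    size = struct.calcsize(fmt)
    if len(data) < size:
        raise ValueError(f"need {size} bytes, got {len(data)}")
    values = struct.unpack(fmt, data[:size])
    names = [name for name, _ in spec]
    return dict(zip(names, values)), size


# Usage: build a sample buffer, then read the header and the payload after it.
sample = b"DEMO" + struct.pack("<HI", 2, 5) + b"hello"
header, offset = parse_record(HEADER_SPEC, sample)
payload = sample[offset:offset + header["payload_len"]]
```

Because the layout lives in `HEADER_SPEC`, supporting another fixed-size record means writing a new table rather than new parsing logic. A dedicated binary parser generator takes the same approach much further, for example with nested and variable-length structures.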
Sed and Awk are two text processing programs that are mainstays of the UNIX programmer's toolbox.
- Sed is a stream editor (non-interactive) to do common text editing jobs like search/extract/replace/insert.
- Awk is a whole programming language ideal for handling data extraction, reporting, and data-reformatting jobs.
Both are command-line programs that work well independently or together for many text processing purposes. They are great for recognizing and extracting information from text input. For simple language recognition tasks, they are perhaps the best tools for the job, requiring the least effort thanks to their simplicity and targeted use cases. Sed and Awk are part of most, if not all, Linux/Unix/macOS distributions. They are available to download for Windows as well.
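For readers who have not used either tool, the two typical jobs described above can be approximated in Python. This is only a rough sketch of the behavior of `sed 's/pattern/replacement/'` and `awk '{ total += $N } END { print total }'`; the function names `substitute` and `sum_field` are hypothetical. Python's `re` syntax also differs in places from the regular expressions Sed and Awk accept.

```python
import re


def substitute(lines, pattern, replacement):
    # Roughly like: sed 's/pattern/replacement/'
    # Replaces only the first match on each line (no trailing "g" flag).
    for line in lines:
        yield re.sub(pattern, replacement, line, count=1)


def sum_field(lines, index, sep=None):
    # Roughly like: awk '{ total += $index } END { print total }'
    # Fields are 1-based, as in Awk; sep=None splits on runs of whitespace.
    # Unlike Awk, a non-numeric field raises ValueError instead of counting as 0.
    total = 0.0
    for line in lines:
        fields = line.split(sep)
        if len(fields) >= index:
            total += float(fields[index - 1])
    return total


log = ["alice 1.5", "bob 2.5", "carol"]
renamed = list(substitute(log, r"^alice", "ALICE"))
total = sum_field(log, 2)
```

The Sed and Awk one-liners are far shorter than this, which is the point of the paragraph above: for line-oriented recognition and extraction, they need very little code.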
Learning materials:
- Doc:
- Web:
- Course:
- Book:
Designing, Implementing and Using Domain-Specific Languages
The definitive resource on domain-specific languages: based on years of real-world experience, relying on modern language workbenches and full of examples. Domain-Specific Languages are programming languages specialized for a particular application domain.
Create Your Own Domain-Specific and General Programming Languages
Written by the author of ANTLR, which is also the tool used in the book; the general concepts apply regardless of the tool you use.
A classic compiler book that is known to professors, students, and developers worldwide as the "Dragon Book"
Learn how to use a C-like language such as Go to create a complete programming language by applying the fundamental concepts of a lexer, a parser, an AST (Abstract Syntax Tree), the Pratt technique, and recursive descent parsing. It also shows you how to implement a REPL (interactive language shell).
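As a taste of the Pratt technique mentioned above, here is a minimal sketch in Python rather than Go. It is not code from the book. The binding-power table and the names `PrattParser`, `parse_expression`, and `evaluate` are assumptions for illustration. Each operator gets a binding power, and the parser keeps consuming infix operators only while they bind tighter than the current context.

```python
import re

# Higher binding power means tighter binding; values here are arbitrary.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}
PREFIX_POWER = 30  # unary minus binds tighter than any infix operator


def tokenize(source):
    # Integers become one token; every other non-space character is a token.
    return re.findall(r"\d+|\S", source)


class PrattParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self, min_power=0):
        # Prefix position: a number, a parenthesized group, or unary minus.
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            left = ("num", int(token))
        elif token == "(":
            left = self.parse(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        elif token == "-":
            left = ("neg", self.parse(PREFIX_POWER))
        else:
            raise SyntaxError(f"unexpected token {token!r}")
        # Infix position: extend `left` while the next operator binds tighter.
        while True:
            op = self.peek()
            power = BINDING_POWER.get(op, 0)
            if power <= min_power:
                break
            self.advance()
            # Passing `power` makes equal-precedence operators left-associative.
            left = (op, left, self.parse(power))
        return left


def parse_expression(source):
    parser = PrattParser(tokenize(source))
    ast = parser.parse()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return ast


def evaluate(node):
    # Tree-walking evaluator over the tuple-based AST.
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "neg":
        return -evaluate(node[1])
    left, right = evaluate(node[1]), evaluate(node[2])
    if kind == "+":
        return left + right
    if kind == "-":
        return left - right
    if kind == "*":
        return left * right
    return left / right
```

Wrapping `evaluate(parse_expression(line))` in a loop that reads a line and prints the result would give a bare-bones REPL for this expression language.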
General:
- Five Questions About Language Design
- Designing the next programming language? Understand how people learn!
- So you want to write your own language?
- Generations of programming languages
- Turing Completeness
Paradigms:
- Programming Paradigms for Dummies: What Every Programmer Should Know
- Paradigms of programming languages
Type Systems
To the extent possible under law, Nikyle Nguyen has waived all copyright and related or neighboring rights to this work.