
Clojure Kickstart

Most people with a background in static imperative OO programming face initial difficulties when getting started with a dynamic Lisp-style functional language like Clojure. This is sad because programming in Clojure is a great experience!

The first hurdle is setting up a decent development environment that lets you enjoy the interactive nature of Clojure.

The next step would be to have a toy project. It should do significantly more than print Hello World, and it must have a direct connection to everyday programming tasks.

And finally, novices should get a well-chosen list of hints and links to continue learning on their own.

Scope of the workshop

The three things listed above are exactly what we try to accomplish in a 3-hour workshop. Don't expect to have mastered the language afterwards, but you can expect to be well prepared for learning Clojure and diving deep into its ecosystem.

And this is how we get you started:

And if you want to continue learning Clojure afterwards, you can also join the local user group for the Cologne/Bonn area.

Prerequisites for participation

Each participant should bring their own notebook with at least 8 GB RAM, ready to run Java (version >= 8).

Prior amateur knowledge of at least one programming language (for example C++, Python, Ruby, Java, any Lisp, Scala) is required.

We provide bundles with Java and a customized Visual Studio Code (VSC) environment for Linux, Windows and OSX. There is a brief keyboard shortcut overview for Clojure programming.

Organizational requirements

  • Max. number of participants: 20
  • A room with decent power supply
  • Internet access
  • Projector (we can bring our own)

Curriculum

Exercise: On the command line create a new project called "practising" using lein new practising. Open the folder practising in VSC and then open the file src/practising/core.clj. Wait a few seconds until the REPL is started. Connect to the REPL in the embedded Terminal via lein repl :connect.

The S-expression

Clojure is a Lisp. Code is organised in possibly nested expressions of the form:

(operator arg1 arg2 arg3 ...)

The operator is something we can invoke, usually a function. There are also special forms and macros.

Every arg is itself either an expression or a symbol. Before it is passed to a function invocation it is evaluated, unless it is quoted.
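As a small illustration of this evaluation rule, here is a sketch with nested expressions. The name nested-result is only chosen for this example.

```clojure
;; The inner expressions are evaluated before the outer + is invoked:
;; (* 2 3) yields 6 and (- 10 4) yields 6, then + adds 1, 6 and 6.
(def nested-result
  (+ 1 (* 2 3) (- 10 4)))

nested-result
;; => 13
```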

Exercise: In the file src/practising/core.clj enter your first hello world expression: (println "Hello World"). Evaluate it; you should see the text "Hello World" printed in the REPL.

Exercise: Quoting prevents the evaluation. In the REPL try to evaluate an expression like (+ x y z) and then try '(+ x y z).

There are some notable facts about this way of using brackets:

  • Code is not structured as a sequence of text lines, but as a tree of expressions. This changes the way we can navigate and manipulate our code. Lisp leverages Paredit, which is elsewhere known as structural editing. Paredit manages the balancing of brackets for you. This gives you more power once you have learned to use these new tools.

  • Code organization is very uniform: It's always prefix notation. There is no need for operator precedence rules. Arithmetic operators are functions and can be used anywhere functions are applicable. On the other hand, arithmetic expressions in prefix notation look unusual and this takes some practice.

  • The syntactic basis of expressions is in effect the list, in other words: the code of Clojure is expressed in terms of Clojure's data representation. This idea is called "code-is-data", or "homoiconicity" for those who want to sound very smart. Since a Lisp is very good at manipulating data it can easily be used to create code. Macros look just like functions but are in effect embedded code generators, written in plain Clojure, executed at compile time. All of this means: You can morph your language in almost any direction, helping you to better describe solutions for your problem domain.

  • Excessive nesting of expressions and overloading in meaning of parentheses are typical drawbacks of Lisps, but Clojure mitigates them with threading macros and the use of [] and {} brackets. You'll not have more brackets in your Clojure code than in code written in your favorite C-style language.

Using comments

  • The ;; form creates line comments, either whole line or rest of line.

  • The #_ reader macro lets the reader skip the immediately following expression, which is very useful inside expressions.

  • A (comment ...) must still be a well-formed S-expression and is used to encapsulate blocks of code that are used only for experimentation at development time.
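The three comment styles can be seen side by side in the following sketch. The name total and the forms inside the comment block are made up for illustration.

```clojure
;; A whole-line comment.
(def total
  (+ 1 2 #_(* 1000 1000) 3)) ; #_ makes the reader skip the (* 1000 1000) form

(comment
  ;; Forms in here are read but never evaluated when the file is loaded.
  ;; Send them to the REPL by hand while experimenting.
  (println "only evaluated on request")
  (/ total 0))

total
;; => 6
```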

Working with data

Scalar types

  • Numbers map to Java and JS number types

    • Automatic coercion

    • Rational numbers

  • Strings are Java or JS strings

  • Boolean values are true and false. Every value other than false and nil is considered 'truthy'.

  • nil is null, it's the only 'falsey' value beside false.

  • Symbols are used for identifiers in code

  • Keywords are similar to Strings or Symbols, but can be used with namespace scoping.
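A few of these properties can be checked directly in the REPL. The following is a minimal sketch that assumes Clojure on the JVM (rational numbers are not available on JS hosts). All names are chosen for this example only.

```clojure
;; Dividing integers yields an exact rational number, not a truncated integer.
(def one-third (/ 1 3))          ; => 1/3

;; Mixing an integer with a double coerces the result to a double.
(def mixed-sum (+ 1 2.5))        ; => 3.5

;; Only nil and false are falsey; 0, "" and [] are truthy.
(def truthiness
  (mapv #(if % :truthy :falsey) [0 "" [] nil false]))
;; => [:truthy :truthy :truthy :falsey :falsey]

;; Keywords can carry a namespace part.
(def qualified-kw :user/id)
(namespace qualified-kw)         ; => "user"
(name qualified-kw)              ; => "id"
```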

Exercise: Calculate the average of the numbers 32, 23 and 1 with the functions + and /.

Exercise: Concatenate 2 strings using function str

Exercise: Convert a string to a keyword and vice versa, using functions keyword and name.

Collection types

As part of the core, Clojure offers a small variety of immutable datastructure types:

  • Vector: [1 "foo" :bar]

  • Map: {:one 1, "Two" 3.0}

  • Set: #{1 'Two 3.0}

  • List: '(1 "Two" :three)

There are quite a few common functions that work on all datastructures in a sensible way.

Exercise: Define an example datastructure in namespace practising.core for each of the types shown above using an expression like (def myvector ...). Evaluate the whole namespace, inspect the contents of your definitions in the REPL, change one of the definitions and re-evaluate it.

Exercise: Value lookup in maps can be done in three ways. You can use get, or a map as a function on keywords, or a keyword as a function on maps. Define a map whose keys are keywords. Try out each of these ways to look up a value.

Exercise: Try to apply the following functions to each of your data structures:

  • first

  • rest

  • last

  • conj

  • count

  • get

  • seq

  • empty

Exercise: Visit the official cheatsheet, read in the section for "Collections" about the datastructure type-specific functions for maps, vectors and sets. Try out functions like conj, assoc, dissoc, disj etc. on your example data. To use typical functions for sets (like union or difference) you'll need to learn a bit about namespaces, see the upcoming section.

Exercise: Be aware that vectors are associative, with an index being the lookup key. This allows us to apply certain map-like operations to them. Use this idea to replace an existing value in a vector.

Namespaces

Clojure data and function definitions are organized in namespaces. Imagine a namespace as a dynamic map of symbols to Vars, and think of a Var as a box holding a piece of data or a function. (It is tempting to think of a Var as the same as a variable in imperative languages, and there are indeed similarities. However, the concept "variable" has no real meaning in functional languages. Be patient.)

Your file src/practising/core.clj has a namespace declaration at the top.

Each def or defn inside it is effectively a mutation to this map, executed when the Clojure runtime loads and compiles your namespace.

You can inspect a namespace at runtime, and the symbol *ns* always refers to your current namespace. The expression (ns-interns *ns*) results in a map of all these definitions. In our example, this would return the same as (ns-interns 'practising.core).
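A minimal sketch of such an inspection, meant to be evaluated form by form in the REPL so that *ns* is the namespace the definitions live in. The names answer and interned are made up for this example.

```clojure
(def answer 42)

;; ns-interns returns a map from symbols to the Vars defined in a namespace.
(def interned (ns-interns *ns*))

(contains? interned 'answer)
;; => true

;; A Var is a box; deref (or the @ shorthand) reads its current content.
(deref (get interned 'answer))
;; => 42
```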

To use public definitions located in other namespaces a namespace must require them first. The typical way is like this:

(ns my.beautiful.ns
  "Contains my best code ever."
  (:require [clojure.string :as str]))

(defn first-funny-function
  [s]
  (str/split s #","))

The str here is used as an alias for anything reachable in the clojure.string namespace. Please note that this alias does not clash with the clojure.core function str.

Exercise: Require namespace clojure.set with alias set and try out functions like set/difference or set/intersection.

Functions

Clojure is a functional programming (FP) language. While object-oriented programming uses the object (together with its blueprint class) as the smallest building block, FP languages are based on functions operating on a small spectrum of datastructures.

Functions are values. This means

  • we can create them anywhere with an expression (fn [x] ...).

  • we can pass them to other functions (promoting these other functions to higher-order).

  • we can return them as values.
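The three points above can be sketched as follows. The function names double-it, apply-twice and pick-op are hypothetical and serve only this illustration.

```clojure
;; A function created with fn and stored like any other value.
(def double-it (fn [x] (* 2 x)))

;; A higher-order function: it takes a function f as an argument.
(defn apply-twice
  [f x]
  (f (f x)))

;; A function that returns another function as its result.
(defn pick-op
  [kind]
  (if (= kind :add) + *))

(apply-twice double-it 5)
;; => 20

((pick-op :add) 2 3)
;; => 5

((pick-op :mul) 2 3)
;; => 6
```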

Forms to create functions

There are two ways to define a function:

  • The first is a combination of def and fn and results in a top-level function definition in a namespace, making it available for any other function:
(defn average-age
  [persons]
  ...)

This is the equivalent of writing:

(def average-age
  (fn [persons]
    ...))
  • The other is the anonymous form (a.k.a lambda expression), occurring often within a surrounding function:
(defn wrap-logging
  [handler]
  (fn [request]                        ;; <-- creates an anonymous function
    (log/debug "REQUEST:" request)
    (let [response (handler request)]
      (log/debug "RESPONSE:" response)
      response)))

For the anonymous form there is an even more compact notation. For example, instead of

(map (fn [x] (/ x 2)) numbers)

you're allowed to write

(map #(/ % 2) numbers)

Please note that the latter form can make your code much harder to understand if the anonymous function becomes more complex.

Closures

Anonymous functions can close over symbols visible in their surrounding scope, making them closures that carry values:

(defn make-adder
  "Returns a 1-arg function adding x to its argument."
  [x]
  (fn [y]
    (+ x y)))

=> (def add-3 (make-adder 3))
#'add-3

=> (add-3 2)
5

Function arguments

Formal arguments are defined in a vector of symbols after the docstring of a function:

(defn round
  "Round down a double to the given precision (number of significant digits)"
  [d precision]
  ...)

A single function can support multiple arities. In addition, you can define a variadic function that accepts any number of arguments. For our workshop goals, we don't need to go into the details here. If you are curious there is guidance in the Clojure docs on functions.
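For the curious, here is a brief sketch of both ideas. The functions greet and sum-all are invented for this example.

```clojure
(defn greet
  "Greets the given name, or the world if no name is passed."
  ([] (greet "World"))                              ; 0-arity delegates to 1-arity
  ([person-name] (str "Hello, " person-name "!")))

(defn sum-all
  "Variadic: collects any number of arguments into the sequence numbers."
  [& numbers]
  (reduce + 0 numbers))

(greet)            ; => "Hello, World!"
(greet "Clojure")  ; => "Hello, Clojure!"
(sum-all)          ; => 0
(sum-all 1 2 3 4)  ; => 10
```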

Clojure functions support a nifty way to bind data pieces in complex datastructures to local symbols, widely known as destructuring. The exact same tool is also available in let and for expressions.

Just as a glance, suppose you need to process a map entry, represented as a pair [key value] in one of your functions. Instead of writing

(defn uppercase-value
  [map-entry]
  [(first map-entry) (str/upper-case (second map-entry))])

you can write

(defn uppercase-value
  [[key value]]
  [key (str/upper-case value)])

This is called positional destructuring.

There is also support for map destructuring, useful for the very common case of processing a map like this:

(def track {:title "Be True"
            :artist "Commix"
            :genre "Drum & Bass"})

(defn track->str
  [{:keys [artist title]}]
  (str artist " - " title))

These examples provide only a first idea. Destructuring in Clojure is much more powerful, and can be extended further by libraries like plumbing. It leads to considerably more readable code and is therefore used quite often. For more detail, you should visit the Clojure docs on destructuring.
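To give a slightly broader impression, the following sketch combines nested map destructuring, defaults via :or, the :as handle and positional destructuring with a rest part. The order data and the function summarize are made up for this illustration.

```clojure
(def order {:id 17
            :customer {:name "Ann" :city "Bonn"}
            :items ["tea" "milk" "bread"]})

(defn summarize
  [{:keys [id items discount]
    {:keys [city]} :customer   ; nested map destructuring
    :or {discount 0}           ; default when :discount is missing
    :as whole-order}]          ; keep a handle on the complete map
  (let [[first-item & more-items] items] ; positional, rest is collected
    {:id id
     :city city
     :discount discount
     :first-item first-item
     :remaining (count more-items)
     :keys-seen (count whole-order)}))

(summarize order)
;; => {:id 17, :city "Bonn", :discount 0, :first-item "tea", :remaining 2, :keys-seen 3}
```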

Visibility

Functions defined with defn are public, which means any code outside the namespace can depend on them. It is good style to have, per namespace, a sharp distinction between the set of functions comprising the API and internal implementation details.

In order to limit what a namespace offers to the rest of the world Clojure allows us to attach metadata to any Var in a namespace:

(def ^:private a-constant 42)

(defn ^:private some-intermediate-calculations
  [...]
  ...)

Since private functions are very common there is a macro defn- to reduce visual clutter:

(defn- some-intermediate-calculations
  [...]
  ...)
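The effect of defn- can be observed in the Var's metadata. This is a small sketch with hypothetical function names.

```clojure
(defn- double-internal
  "Implementation detail, not part of the namespace's API."
  [x]
  (* 2 x))

(defn quadruple
  "Public API, built on the private helper."
  [x]
  (double-internal (double-internal x)))

;; defn- merely sets the :private flag in the Var's metadata.
(:private (meta #'double-internal))
;; => true

(quadruple 3)
;; => 12
```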

Functions on functions

This section is not really necessary to follow the workshop. It shows to the curious a little of the power that Clojure offers when working with functions.

  • (partial f a b ...) allows you to apply a function f to a subset of the required arguments resulting in a new function that has those arguments fixed:
(defn add
  [x y z]
  (+ x y z))

(def add-12 (partial add 5 7))

=> (add-12 3)
15
  • (apply f coll) helps when we have an n-arity function f and an n-element collection coll, and want to invoke f with the elements of coll as arguments:
(defn add
  [x y z]
  (+ x y z))

(def numbers [1 2 3])

=> (add numbers)
;; will throw an ArityException

=> (apply add numbers)
6
  • (comp f g) composes two functions f and g (or more) so that the resulting function behaves on x like (f (g x)):
(def str->id
  (comp str/trim str/lower-case))

=> (str->id "  ABC ")
"abc"
  • (memoize f) produces a function that caches results of a function f:
(defn- my-really-costly-calculation-impl
  [a b]
  ...)

(def my-really-costly-calculation
  (memoize my-really-costly-calculation-impl))
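  • (juxt f1 f2 ...) returns a function that applies each of the functions f1, f2 etc. to its arguments and delivers the results in one vector. Since keywords act as lookup functions on maps, (juxt k1 k2 ...) looks up the values for the keys k1, k2 etc.:
(def persons [{:firstname "Peter" :lastname "Pan"}
              {:firstname "Daisy" :lastname "Duck"}])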

=> (map (juxt :firstname :lastname) persons)
(["Peter" "Pan"]
 ["Daisy" "Duck"])
  • (fnil f initial-value) returns a function that replaces its first argument with initial-value in case it is nil. The benefit becomes clearer when recognizing that most "modification" functions like conj or assoc expect a collection as their first argument. When building up new datastructures fnil is an elegant tool to handle initialization cases.
(def db {})

=> (update db :persons (fnil conj []) {:firstname "Donald" :lastname "Duck"})
{:persons [{:firstname "Donald" :lastname "Duck"}]}
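To make the caching done by memoize visible, the following sketch counts how often the underlying function really runs. It uses an atom, which is explained in the section on mutable state. All names are invented for this example.

```clojure
;; Counts the real invocations of the underlying function.
(def !calls (atom 0))

(defn- slow-square-impl
  [x]
  (swap! !calls inc)
  (* x x))

(def slow-square
  (memoize slow-square-impl))

(slow-square 4) ; => 16, computed
(slow-square 4) ; => 16, served from the cache
(slow-square 5) ; => 25, computed

@!calls
;; => 2
```

Because the cache is keyed by the arguments, this only makes sense for pure functions.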

System design in functional programming

An important property of a function is purity. A function is called pure if its result depends only on its arguments and if it does not change anything in its environment (in other words: it has no side-effects). Pure functions are pleasant because they are

  • easy to reuse,

  • easy to test,

  • thread-safe,

  • candidates for memoization.

Not surprisingly we want to have as many of them around us as possible. However, a system created of 100% pure functions is useless: no access to any input, no place to write any output to. We need to have some of our code do the "dirty job".

So the fundamental principle of program design in FP is:

  1. Build as much of the system as possible as a pure transformation of data into other data.

  2. Allow only very few pieces of code to interact with the world outside (that is: read and write data).
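In miniature, this separation might look like the following sketch: one pure function that computes, and one thin impure function that talks to the outside world. The names mean-age and print-mean-age! are hypothetical.

```clojure
;; Pure: the result depends only on the argument, nothing else is touched.
(defn mean-age
  [persons]
  (/ (reduce + (map :age persons))
     (count persons)))

;; Impure: the only place that produces output.
(defn print-mean-age!
  [persons]
  (println "Mean age:" (mean-age persons)))

(mean-age [{:name "Ann" :age 30} {:name "Fred" :age 40}])
;; => 35
```

Only mean-age needs thorough testing, and it can be tested without capturing any output.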

Conditional evaluation

Clojure offers many ways to express conditional evaluation: if, if-let, when, when-let, cond, case, condp, and on top of these there are conditional threading operators (introduced in a section below). But don't be daunted, most of the time if or cond will do, and all others offer more or less syntactic sugar to those.

Here's the grammar of if, which does not offer any surprises:

(if <test-expr>
  <then-expr>
  <else-expr>?)

Since the else-expr is optional, an if expression without it returns nil when the test-expr fails to return a truthy value.
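A short sketch of both cases, with a made-up function describe-stock:

```clojure
(defn describe-stock
  [quantity]
  (if (pos? quantity)
    "in stock"
    "sold out"))

(describe-stock 3)   ; => "in stock"
(describe-stock 0)   ; => "sold out"

;; Without an else branch a falsey test yields nil.
(def no-else-result
  (if (pos? 0) "in stock"))
;; => nil
```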

If you need more than two branches then cond will help:

(cond
  <test-expr1>  <then-expr1>
  <test-expr2>  <then-expr2>
  ...
  :else <else-expr>)

The first then-expr whose preceding test-expr returns a truthy value will be the evaluation result of the cond, otherwise the else-expr when present, otherwise nil.

The case expression is more akin to the switch in C-style languages:

(case <expr>
  <value1>  <then-expr1>
  <value2>  <then-expr2>
  ...
  <else-expr>)

The values can be any literals, even vectors or maps. If there is no else-expr and none of the values matches the result of expr then an exception is thrown.
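The following sketch shows case with and without a default expression. Both functions are invented for this illustration.

```clojure
(defn http-status->text
  [status]
  (case status
    200       "OK"
    404       "Not Found"
    (301 302) "Redirect"   ; a list groups several values for one branch
    "Unknown"))            ; trailing default expression

(defn weekend-day->text
  [day]
  (case day
    :sat "Saturday"
    :sun "Sunday"))        ; no default: any other value throws

(http-status->text 302)    ; => "Redirect"
(http-status->text 500)    ; => "Unknown"
(weekend-day->text :sat)   ; => "Saturday"
;; (weekend-day->text :mon) throws an IllegalArgumentException
```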

To learn about condp, a nifty macro that reduces clutter in a special case of branching, you should try the next exercise.

Exercise: Look at the following function and consult the docs on condp. Replace cond with condp.

(defn score->grade
  [score]
  (cond
    (<= 90 score) "A"
    (<= 80 score) "B"
    (<= 70 score) "C"
    (<= 60 score) "D"
    :else "F"))

Local symbols with let

Surprisingly many functions work well as just one pipeline of function invocations. (The section about threading explains how we can limit the nesting of expressions.)

Of course, there are still numerous situations where we wish to bind an intermediate result within a function to a local symbol. In imperative languages we use local variables for this job, and it may look and feel as if we did the same in Clojure, but conceptually symbols just refer to evaluation results, let does not give us "boxes with varying content".

To introduce local symbols we use let, as in this example:

(defn path->filename
  [path]
  (let [parts (remove str/blank? (str/split path #"\/"))]
    (if (not (empty? parts))
      (last parts))))

In one let you can have as many symbol-expression pairs as you like, and you can use destructuring where you would normally place the symbols.
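For example, a let with several pairs and a destructuring form could look like this sketch. The function rectangle-stats is hypothetical.

```clojure
(defn rectangle-stats
  [dimensions]
  (let [[width height] dimensions               ; destructuring in place of a symbol
        area           (* width height)
        perimeter      (* 2 (+ width height))]  ; later pairs may use earlier symbols
    {:area area :perimeter perimeter}))

(rectangle-stats [4 6])
;; => {:area 24, :perimeter 20}
```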

There is also if-let, which is helpful when your let body should be evaluated only if a test yields a truthy (non-nil, non-false) value:

(defn path->filename
  [path]
  (if-let [parts (seq (remove str/blank? (str/split path #"\/")))]
    (last parts)))

The seq here is like a test, because it returns either a sequence (truthy) or nil if the resulting sequence would be empty. It's the idiomatic way of writing (not (empty? ...)).

Threading macros

Excessive nesting of expressions makes it much harder to read and understand what is going on within a function. One mitigation is the use of let to introduce descriptive symbols, the other is threading.

Compare these examples, whose result is the same:

(assoc-in
  (assoc-in person [:employer :name] "doctronic")
  [:address :street]
  "Frankenstrasse 6")

(let [person
      (assoc-in person [:employer :name] "doctronic")

      person
      (assoc-in person [:address :street] "Frankenstrasse 6")]
  person)

(-> person
    (assoc-in [:employer :name] "doctronic")
    (assoc-in [:address :street] "Frankenstrasse 6"))

Threading macros reorganize your code at compile time. The operator -> (called thread-first) takes the initial expression and inserts it as the first argument into the second expression, continuing until everything is nested.

It has a sibling ->> (called thread-last) that does the same with the last argument, which is often needed for sequence processing chains.

And both have cousins like cond->, cond->>, some-> and some->> that help with conditional processing steps.
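To give a first impression of these siblings and cousins, here is a sketch using ->>, cond-> and some->. The data and the names sum-of-even-squares, build-query and next-visit-count are made up for this example.

```clojure
;; Thread-last: each result becomes the LAST argument of the next step.
(def sum-of-even-squares
  (->> [1 2 3 4 5 6]
       (filter even?)
       (map #(* % %))
       (reduce +)))
;; => 56

;; cond->: a step is only applied if its test is truthy.
(defn build-query
  [{:keys [limit descending?]}]
  (cond-> {:table "persons"}
    limit       (assoc :limit limit)
    descending? (assoc :order :desc)))

(build-query {:limit 10})
;; => {:table "persons", :limit 10}

;; some->: stops and returns nil as soon as a step yields nil,
;; where plain -> would call inc on nil and throw.
(defn next-visit-count
  [user]
  (some-> user :stats :visits inc))

(next-visit-count {:stats {:visits 41}}) ; => 42
(next-visit-count {:name "Ann"})         ; => nil
```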

Exercise: Use macroexpand and apply it to the thread-first example shown above. (Don't forget to quote the expression to prevent the evaluation.)

Exercise: Use some->> to rewrite the path->filename function

Sequence processing

Clojure excels in data processing tasks. One of the reasons is the sequence abstraction that can be applied to all datastructures and a well-designed set of core functions that transform seqs into other seqs.

A consequence is that idiomatic Clojure code contains almost no looping. Another consequence is that programmers accustomed to imperative for and while loops need to re-learn how to process data on a significantly higher level. On this level, the brain is no longer bothered with irrelevant details; however, it is challenged with unfamiliar tools and solution strategies.

Exercise: Most sequence processing functions like map, filter etc. expect a sequence and ensure this by using seq on their argument. Apply seq to datastructures like a map, a set or a vector and see what is returned.

Exercise: Define a vector of persons, each with a name and an age. Write a filter expression that selects all persons aged between 20 and 29.

Clojure offers the following different approaches for processing sequences:

  • map, mapcat, filter, reverse, sort and friends are usually good when you target a sequence as result. Building up a chain of these operations in combination with the thread-last macro ->> often yields the most elegant and maintainable solution. By far the biggest portion of sequence processing in idiomatic Clojure code is done on this basis.

  • for is Clojure's list comprehension operator. Be careful to not confuse it with the C-style for loop. It is great for traversing nested datastructures, when your goal is a one-dimensional result sequence. It is well suited for templating code, for example when rendering HTML or XML elements. It offers destructuring, local symbols and conditions.

  • reduce is a swiss army knife that can produce almost anything, sometimes leading to convoluted solutions. A reduction is often the terminal step of a sequence processing chain (for example into is only a special purpose reduction).

  • doseq is the imperative variant of a list comprehension. It provides the same traversal power as for and should be used exclusively for side-effects, for example writing out a bunch of files to disk.

  • Old school function recursion is still a valid approach, and may yield the cleanest code in some situations, but be aware that your call-stack may limit your problem size. The raw mechanics of full tree traversal are already provided in clojure.walk.

  • loop-recur is a manual tail call optimization for recursive operations, and effectively the most low-level construct. It's sometimes unavoidable, for example if you build a parser, or need to consume or produce several distinct pieces of data. While learning Clojure you might sometimes find yourself longing for a quick loop-recur sin. Resist. Step back. Ask yourself twice if there is no better tool for the job at hand.

Coming to a decision on how to approach a collection transformation problem boils down to looking at the list above from top to bottom and picking the tool that yields the simplest code. "Simple" in the Rich Hickey sense.
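To make the first four approaches from the list tangible, the following sketch applies them to one small, made-up dataset. The orders data and all names are assumptions for this illustration only.

```clojure
(def orders
  [{:customer "Ann"  :items [{:sku "tea" :price 4} {:sku "milk" :price 1}]}
   {:customer "Fred" :items [{:sku "bread" :price 3}]}])

;; 1. A chain of sequence functions with ->>
(def expensive-skus
  (->> orders
       (mapcat :items)
       (filter #(>= (:price %) 3))
       (map :sku)))
;; => ("tea" "bread")

;; 2. for: nested traversal into a flat result sequence
(def order-lines
  (for [{:keys [customer items]} orders
        {:keys [sku price]}      items
        :when (>= price 3)]
    (str customer " bought " sku)))
;; => ("Ann bought tea" "Fred bought bread")

;; 3. reduce: aggregate into a non-sequential value (here a map)
(def total-per-customer
  (reduce (fn [totals {:keys [customer items]}]
            (assoc totals customer (reduce + (map :price items))))
          {}
          orders))
;; => {"Ann" 5, "Fred" 3}

;; 4. doseq: same traversal power as for, but only for side-effects
(doseq [{:keys [customer]} orders]
  (println "Processing order of" customer))
```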

Rules of thumb:

  • If you need to traverse a more-dimensional data structure then give for a try, otherwise see if a combination of map, filter and others, perhaps terminated with a reduce, does the job.

  • If you need side-effects then doseq is probably the best bet.

  • To aggregate data into a non-sequential value (which can also be a map) a single reduce is usually all you need.

Exercise: Return a sequence of e-mail addresses that end with ".de" for the following data structure:

(def friends
  [{:name   "Fred"
    :emails ["[email protected]" "[email protected]" "[email protected]"]}
   {:name   "Ann"
    :emails ["[email protected]" "[email protected]"]}])

Hint: You can approach this both with mapcat as well as for. Solve the problem with both and compare the solutions.

Exercise: Take the friends data structure from above and produce a map {email -> name}.

Hint: Again, mapcat and for are both sensible choices but if you compare the solutions you'll see where for starts to shine.

Lazy sequences

Clojure features so-called lazy sequences. Laziness is an optimization strategy. Here's a snippet to illustrate the effect:

(->> (range 1e6)
     (take-while #(< % 100))
     (filter odd?))

range returns a sequence of potentially 1 million integers, but it is not realised, so you pay almost nothing for this huge number of numbers. take-while cuts this sequence off after 99, so filter actually processes only 100 values.
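Laziness even allows working with infinite sequences, as long as only a finite part is ever requested. A minimal sketch, where the name first-odd-squares is chosen for this example:

```clojure
;; (range) without arguments is an infinite lazy sequence 0, 1, 2, ...
;; Only as many elements are computed as take asks for.
(def first-odd-squares
  (->> (range)
       (filter odd?)
       (map #(* % %))
       (take 5)))

first-odd-squares
;; => (1 9 25 49 81)
```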

Most sequence processing functions as well as for return unrealised lazy sequences, where actual processing is done only when someone explicitly accesses the values. Most of the time this is exactly what we prefer. But there are a few exceptions to that rule, for example:

  • Processing a sequence of records in a database transaction must be finished before the transaction is committed.

  • When you work on closable resources (like streams or files) you'll wrap the processing in a with-open expression. You certainly need a guarantee that any execution is finished before the resource is closed.

  • If side-effects are involved they could be delayed or not executed at all if no one asks for a result value.

You have basically two explicit ways to control laziness:

  • If you're interested in side-effects use doseq.

  • If you work with a transaction or a resource you can append a (doall) to your processing chain or wrap your for list comprehension in a (doall ...) expression.
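As an illustration of the resource case, the following sketch reads lines from a reader inside with-open. To stay self-contained it reads from a string instead of a file; the function read-lines is hypothetical.

```clojure
(defn read-lines
  "Reads all lines eagerly while the reader is still open."
  [text]
  (with-open [reader (java.io.BufferedReader. (java.io.StringReader. text))]
    ;; Without doall the lazy seq would leave with-open unrealised,
    ;; and reading the remaining lines later would fail on a closed reader.
    (doall (line-seq reader))))

(read-lines "alpha\nbeta\ngamma")
;; => ("alpha" "beta" "gamma")
```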

Forgetting to turn off laziness is a very common cause of bugs, so watch out for this.

Managing mutable state

The data that Clojure functions process is almost exclusively immutable. "Mutations" like assoc or conj efficiently create new versions of existing data. However, in almost every meaningful program there has to be a small amount of mutable state, which more often than not must be managed in a thread-safe manner.

For this, Clojure offers four types of "boxes", each with its own rules regarding concurrency.

  • The Var is the thing that keeps functions and values in namespaces. A Var is usually initialised when a namespace is loaded, and the values usually don't change. Clojure offers an advanced feature called dynamic scoping where we can rebind Vars "down the call-stack" to different values.

  • The Atom is the box that is most often used in practical tasks. It holds a single value, and changes to it happen atomically, synchronously and uncoordinated.

  • The Ref is part of the in-memory transaction system that Clojure offers. Changes happen atomically, synchronously and coordinated together with all other Refs touched in the same transaction.

  • Finally there is an Agent where changes happen atomically, asynchronously, generally uncoordinated, but can be triggered by a commit of a related transaction.

Let's define an Atom:

(def !counter (atom 0))

The leading "!" in !counter is just a convention to give readers of the code a clear sign that the code deals with something mutable.

The expression @!counter yields the currently set value.

To update an atom we need swap! and a side-effect free function, because in multithreaded environments updates could be retried.

(swap! !counter inc)

or

(swap! !counter + 1)

The function we provide to swap! (inc or +) receives the current value of the atom as first argument and any additional arguments that we pass to swap!. If everything is fine, the result of our function is set as the new value.
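Passing additional arguments becomes more interesting when the atom holds a map. A small sketch with a made-up atom !inventory:

```clojure
(def !inventory (atom {:apples 3}))

;; swap! effectively calls (update current-value :apples + 2)
(swap! !inventory update :apples + 2)
;; => {:apples 5}

;; swap! effectively calls (assoc current-value :pears 1)
(swap! !inventory assoc :pears 1)
;; => {:apples 5, :pears 1}

@!inventory
;; => {:apples 5, :pears 1}
```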

You can create atoms also in functions, for example like this:

(defn make-id-gen
  [initial-value]
  (let [!count (atom initial-value)]
    (fn []
      (swap! !count inc))))

(def gen-id! (make-id-gen 0))

=> (gen-id!)
1

=> (gen-id!)
2

Again the trailing "!" in gen-id! is a mere convention to remind us that gen-id! has a side-effect.
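For comparison with the Atom examples above, here is a minimal sketch of a coordinated change using two Refs. Both Refs are updated inside one dosync transaction, so no other thread can observe a state where the amount has left one Ref but not arrived in the other. The account names and the function transfer! are invented for this example.

```clojure
(def !checking (ref 100))
(def !savings  (ref 0))

(defn transfer!
  "Moves amount from one Ref to another inside a single transaction."
  [!from !to amount]
  (dosync
    (alter !from - amount)
    (alter !to + amount)))

(transfer! !checking !savings 30)

[@!checking @!savings]
;; => [70 30]
```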