Functional programming paradigm and languages are usually associated with core features like pure functions and immutable data (structures)1, and other properties like referential transparency, higher-order (first order) functions, lazy evaluation, and recursion2. They arguably provide programmers with better tools and abstractions to deal with difficult actual problems that involve concurrency, parallelism, big data, etc.
Despite all those important characteristics, the most commom remark I hear about functional languages when talking about them with friends and co-workers is: "What a strange syntax! And look at all those parentheses, they must make the code very hard to read and write!" So let's try to demystify this "strange" syntax and see what benefits it can give to developers.
The kind of "strange" syntax we are talking about is commonly associated with LISP3 and derived languages. It is easily recognized by the usage of parenthesis to denote function call/application (together with prefix notation, in which the operator is placed before its operands) and as scope delimiters (instead of {
and }
, for example).
LISP uses a notation called S-expression (for Symbolic expression). It is a notation for nested list data and is used for source code as well as data representation4. The authors of the excellent book SICP (Structure and Interpretation of Computer Programs)5 introduce s-expressions in a simple way in section 1.1.1 Expressions.
Expressions representing numbers may be combined with an expression representing a primitive procedure (such as + or *) to form a compound expression that represents the application of the procedure to those numbers. For example:
(+ 137 349)
486
(- 1000 334)
666
(* 5 99)
495
(/ 10 5)
2
(+ 2.7 10)
12.7
The authors call those expressions combinations. Combinations can accomodate procedures that may take an arbritary number of arguments, can be nested, and do not suffer from ambiguity:
(+ 21 35 12 7)
75
(+ (* 3 5) (- 10 6))
19
Those characteristics are already an improvement over languages that use infix notation: it is not necessary to repeat the operator between operands (e.g. 21 + 35 + 12 + 7
) and to remember operator precedence (e.g. multiplication is evaluated before addition).
Wikipedia defines homoiconicity (also know as code-as-data) as "a property of some programming languages in which the program structure is similar to its syntax, and therefore the program's internal representation can be inferred by reading the text's layout."6 Put another way, a homoiconic language uses the same (homo) representations (icons) for source code and data structures.
Most languages need to read a program's textual representation and parse it into something that can be compiled and/or evaluated. This "something" is known as AST (Abstract Syntax Tree), a data structure that represents formally what is expressed in the source code. The following image is a sample syntax tree for the expression requiresRole && userHasRole("MODIFY", assetId)
7.
Most languages do not provide a straightforward way to manipulate the syntax tree - it is an internal representation only accessible to the compiler during compilation, for example. Homoiconic languages, on the other hand, have programs written using data structures that represent the syntax tree directly. The same requiresRole
... code can be written as (and requiresRole (userHasRole "MODIFY" assetId))
. Note how the code structure resembles the syntax tree.
Homoiconicity enables higher level metaprogramming facilities that cannot be attained by languages that do not exhibit this property. Higher level metaprogramming means higher level abstractions, which frequently translate to less code, more work done, and a lower margin for mistakes. Let's take a look at some interesting examples of what can be accomplished through this powerful feature. (The Clojure8 language will be used in the examples, but the ideas and capabilities are shared among other LISP-like languages.)
for
loopJava 5 added a new syntax, know as enhanced for
loop, to the language. It enabled developers to rewritte an "old style" for
loop
for (int i = 0; i < collection.size(); i++) {
SomeType element = (SomeType) collection.get(i);
doSomething(element);
}
as
for (SomeType element : collection) {
doSomething(element);
}
The new syntax is easier to read and write, and hides complexities (such as index manipulation). Java programmers had to wait until Java 5 to use the enhanced for
loop because this addition required a change at the compiler level, and most programmers don't have the knowledge or ability to make it.
Clojure's metaprogramming workhorse is the macro9. A macro is a special function that is executed during compilation and whose arguments are unevaluated code. The arguments are data structures that represent code! Given that m
is a macro with a single parameter, the call (m (+ 1 2 3))
will result in the parameter's value being (+ 1 2 3)
10, not the value 6
.
With macros a programmer could define in a few lines a foreach
11 construct, similar to Java 5's enhanced for
loop, without touching the compiler or waiting years for it12:
(defmacro foreach
[[sym coll] & body]
`(loop [coll# ~coll]
(when-let [[~sym & xs#] (seq coll)]
~@body
(recur xs#))))
(foreach [x [1 2 3]]
(println x))
=> 1
=> 2
=> 3
S-expressions sometimes make writing and reading chained function calls a bit cumbersome because we need to write and read the expression from the inside out:
(prn (conj (reverse [1 2 3]) 4))
; (4 3 2 1)
=> nil
In natural language we would say: "Reverse the vector [1 2 3]
, append 4
to it and print the result. It would be nice if we could write code similar to what we just wrote in natural language:
(thread [1 2 3] reverse (conj 4) prn)
; (4 3 2 1)
=> nil
thread
would be a form that passes the first value ([1 2 3]
) through a series of method calls, passing the result of a previous call (e.g. (reverse [1 2 3])
) as the first argument of the next call (e.g. (conj [3 2 1] 4)
).
Clojure indeed has a macro that performs what was described: ->
(thread first). But the point here is that ->
is not a privileged language construct, it is a macro that we could have written ourselves:
(defmacro thread
[x & forms]
(loop [x x, forms forms]
(if forms
(let [form (first forms)
threaded (if (seq? form)
(list* (first forms) x (next forms))
(list form x))]
(recur threaded (next forms)))
x)))
S-expressions, homoiconicity, and prefix notation are not weird characteristics of esoteric programming languages. They are well-thought out features that give rise to powerful tools at a programmer's disposal. With proper care, we are able to write code at a higher abstraction level, leading to less verbosity, more meaning (think DLSs13), more maintainability, and better code structure in general.
If you are interested in learning the functional programming paradigm, my humble recommendation is the SICP book. The authors use the Scheme programming language14 and present the most important functional paradigm features and characteristics through several examples and exercises of increasing difficulty15. Those who have an object-oriented background, particularly with Java, may find Scala16 and Clojure very appealing.
https://clojurefun.wordpress.com/2012/08/27/what-defines-a-functional-programming-language/ ↩
https://en.wikipedia.org/wiki/Functional_programming#Concepts ↩
Example extracted from the book Clojure Programming, by Chas Emerick, Brian Carper, and Christophe Grand. ↩
In the macro context, (+ 1 2 3) is a list with four elements: the symbol + and three numbers. At runtime, (+ 1 2 3) is a function call. It is easy to see, then, the expression's "dual" nature: data structure (a list) and (executable) source code. ↩
During compilation, macros are recursively called to generate code that, in the end, does not have macro calls (only function or special form calls). In this sense, we could say that the compiler performs a macro expansion, in which one code (the macro call) is expanded into something else (a combination of function and special form calls). For example, the macro call (foreach [x [1 2 3]] (println x)) would be expanded to something similar to: ↩
(loop [coll3531 [1 2 3]] (when-let [[x & xs3534] (seq coll3531)] (println x) (recur xs3534)))
12. Clojure programmers do not need to define the foreach macro, since the language already has iteration facilities such as doseq. ↩
13. https://www.gnu.org/software/mit-scheme/ ↩
14. There are a lot of references and resources about SICP online. Particularly useful are sites where it is possible to find solutions for most of the exercises, sometimes with different implementations. But don't cheat! Try hard to solve the exercises by yourself before searching for someone's else solution. It will pay off! ↩