Short introduction to Racket: Syntax and a simple Evaluation Model
Table of Contents
1 A brief history of Racket
Racket is a descendant of Scheme. When it comes to syntax, Racket and Scheme are members of the Lisp family of languages, which includes Common Lisp and also Emacs Lisp. The salient feature of the syntax is that it uses white space and parentheses. Read these collection of mostly laudatory quotes about Lisp.)
Lisp was invented to study recursive function theory (McCarthy, 1958). It was successfully and extensively employed in building early Artificial Intelligence systems. Scheme (Sussman and Steele, 1975), a dialect of Lisp, grew out of an attempt to understand Actors (Hewitt, 1972) in Lisp. The result was a language that, while sticking to Lisp's parentheses laden syntax, was a significant departure from Lisp in its semantics: it embraced lexical scope pioneered by Algol. Lexical scope was crucial to correctly modelling the Lambda Calculus (Church, 1932). Scheme made it possible to write executable interpreters for the Lambda Calculus (Sussman and Steele, 1975) and also many features of programming languages. (See the Lambda Papers.) (For an account of the History of Lambda Calculus, see (Cardone and Hindley, 2006).
Scheme's evolution proceeded via a series of Revised Reports, the latest of which is Revised7 Report on the Algorithmic Language Scheme. Scheme was designed to be small. The introductory paragraph of the Scheme Report captures this philosophy:
Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. Scheme demonstrates that a very small number of rules for forming expressions, with no restrictions on how they are composed, suffice to form a practical and efficient programming language that is flexible enough to support most of the major programming paradigms in use today.
Racket's immediate parent was PLT Scheme (Felleisen et al.) which was another dialect of Scheme with its own interactive development environment or IDE. Racket evolved as Scheme adding more data types, more keywords and many more built in libraries to Scheme. The presumptive goal was to make it suitable for more real-world applications (`batteries included'), with a module system, class system, many other libraries, immutable lists, etc.
2 Learning Racket's syntax
If you're in a hurry, here is the Racket Syntax Cheat Sheet. The best way to learn the syntax of a programming language is to start small, with a minimal set of constructs to get things done. To learn Racket, we will start with a very small subset that drawn from the Syntax of Scheme. Racket's syntax includes that of Scheme, but has other elements to it.
In the grammar below, for non-terminals whose definitions are not listed here, please consult the Scheme Syntax Reference or Racket Reference.
Racket's syntax is quite permissive. Identifiers can all many special characters, including spaces. The language is insensitive to the amount of white space, Any amount may be used to separate syntactic elements. Unlike Python, indentation is not enforced, rather it is encouraged as good programming practice.
3 Datum
The key to the syntax of Racket is the notion of a datum.
A datum is either list expression or a non-list expression.
In the latter case, it is either an atom or a vector of
datum elements. In what follows, syntactic categories are
denoted by identifiers wrapped in <
and >
.
<datum> ::= <list-expr> | <non-list-expr> <non-list-expr> ::= <atom> | <vector>
3.1 Atom
<atom> ::= <boolean> | <number> | <character> | <string> | <symbol> | <procedure>
The syntax of atomic data is described in the Racket Guide under Built-in Datatypes.
3.2 Vector
<vector> ::= #(<datum> ...)
The meta notation <thing> ...
denotes a sequence of zero
or more (but finite) occurrences of the category <thing>
.
Note that the syntax of a vector consists of a hash #
followed by a pair of round brackets enclosing a sequence of
zero or more <datum>
items.
Some examples of vectors:
#(3 4 5) #("one" "two") #()
3.3 List Expression
<list-expr> ::= (<datum> ...) | (<datum> <datum> ... . <datum>) ^ note the dot
In the first variant, list expression is a sequence of zero
or more <datum>
elements enclosed in round brackets.
(Square brackets are also permissible.) Notice the absence
of commas between the elements. As a special case list can
be empty, i.e., ()
.
In the second variant, a list expression consists of a
sequence one or more <datum>
elements followed by a dot
.
, followed by another <datum>
. Notice that there can
be exactly one <datum>
after the dot.
3.4 Pair
A pair is a list expression form (<datum> . <datum>)
.
<pair> ::= (<datum> . <datum>)
In a pair (<d1> . <d2>)
, <d1>
is called the head of
the pair, and <d2>
the tail of the pair. Note that a
pair may be represented as a labeled directed tree with
three nodes, the pair and <d1>
and <d2>
and edges
labeled `head' and `tail' from the pair to <d1>
and <d2>
respectively.
In the Lisp family parlance, a pair is also called a cons cell, the two components of the pair are accessed via the functions car and cdr operating on the pair.
(<d1> . <d2>) | \ | \---tail---> <d2> | +---head---> <d1>
3.5 Nested Pairs
Pairs can be composed, or nested. For example, consider the
nested pair (3 . (a . (b . 5)))
. The gut of the nested
pair is the right-most element in its tree representation.
In the example, the gut of the nested pair is 5
.
3.6 Conversion
Different list expressions may denote the same object. For
example, the nested pair (3 . (2 . ()))
is equivalent to (3
2)
. We seek a way to transform list expressions into other
list expressions. The transformation process involves either
introducing or eliminating a `dot'. Transforming to a nested
pair introduces a `dot'. Transforming away from a nested pair
involves eliminiating a `dot'.
The operation of elimination and introduction is captured by the following ELIM/INTRO rewrite rules. Transformation is the process of applying these rewrite rules.
(<d1> <d2> ... . (<d3> . <d4>)) -> (<d1> <d2> ... <d3> . d4) ; nest ELIM (<d1> <d2> ... . ()) -> (<d1> <d2> ...) ; () ELIM (<d1> <d2> ... <d3> . d4) -> (<d1> <d2> ... . (<d3> . <d4>)) ; nest INTRO (<d1> <d2> ...) -> (<d1> <d2> ... . ()) ; () INTRO
Each rule has a left hand side, a right hand side, and a name.
Here are some examples of transformations. The rewrite rules
are specified with pattern variables (like <d1>
, <d2>
,
etc.).
(3 . (2 . (1 . ()))) =nest-ELIM=> -------------------- (3 2 . (1 . ())) =nest-ELIM=> ---------------- (3 2 1 . ()) =()-ELIM=> ------------ (3 2 1)
The underlines indicate a redex, short for reducible expression. A redex for a rule is a subexpression that matches the left hand side of the rule. It indicates the subexpression to which the rewrite rule is being applied. Note that in the above example transformation, the redex is always the entire expression. We say a redex rule is applied at the `top level' when the redex is the entire expression.
(3 . ((2 . ()) . 4)) =nest-ELIM=> -------------------- (3 (2 . ()) . 4) =()-ELIM=> -------- (3 (2) . 4)
Note that in the above example, the ()-ELIM
rule is not
applied at top level.
3.7 Lists
A list (or list form) is a list expression satisfying the following grammar. A list
<list> ::= (<datum> ...) | (<datum> <datum> ... . <non-list-expr>)
A list is proper if it is of the form (<datum> ...)
. It
is improper if it is of the form (<datum> <datum>
... . <non-list-expr>)
. (Note: in racket, the list?
is
used for cheking for proper lists, not all lists.)
Another way of describing a list is that it is a list expression that is not a redex for any of the ELIM rules. In other words, in a list, the ELIM rules can not be applied at the top level.
Every list expression may be converted to a list form by repeatedly applying the elimination rules at the `top level'.
3.8 Normal forms
A list expression is in list normal form if there are no
ELIM redexes in it. Notice that a list expression may be
a list but not a list normal form. For example, the list
expression ((3 . ()) 2)
is a list but is not in list normal
form. However, it may be transformed to ((3) 2)
, which is
in list normal form.
((3 . ()) 2) ; in list form ; but not in list normal form ((3) 2) ; the result of transforming to list normal form
A list expression is in pair normal form if there are no redexes for any INTRO rewrite rule.
Every list expression may be converted to a pair
normal form by applying the INTRO rules repeatedly until
saturation (i.e., until they are no longer any INTRO
redexes). For example, the list expression ((3 . ()) 2)
when converted to pair normal form is ((3 . ()) . (2
. ()))
.
3.9 Questions for future
We have claimed that every list expression may be transformed into a list, into a list normal form, and a pair normal form. We haven't, however, proved this property. At least, not yet.
Also, what can we say about the uniqueness of the list and pair normal forms? Can a give list expression be transformed to two distinct pair normal forms, or two distinct list normal forms? The answer, no: each list expression transforms to, or has, exactly one pair normal form and one list normal form. Again, the proof of why this is so is for later.
3.10 Uses of normal forms
All list expressions printed by Racket are in list normal
form and are prefixed with a quote ('
) mark. (We will
explain the role this quote mark shortly. ) It is convenient
to assume that internally, Racket stores list expressions in
pair normal form.
All datum types discussed so far may be read by the Racket interpreter, except a procedure. It can not be read in, nor its internals printed out. It can, however, be part of a list expression.
4 Program syntax
In what follows, we will stick to a very small subset of Racket. For the constructs whose definitions are not listed here or for the full syntax, please consult the Racket Reference Manual.
<pgm> ::= #lang racket <form> ... <form> ::= <defn> | <exp> | <require> | <provide> <require> ::= See Racket Reference <provide> ::= See Racket Reference <defn> ::= (define <id> <exp>) <exp> ::= <literal> <id> ; <identifier> (if <exp> <exp> <exp>) | ; (lambda <formals> <exp>) | ; lambda expression (letrec (<bind> ...) <exp>) ; recursive binding (<exp> <exp> ...) | ; application (<quote <exp>) ; quote <literal> ::= <number> ; number | <boolean> ; boolean | <character> ; char | <string> ; string <id> ::= <symbol> <bind> ::= (<id> <exp>) <formals> ::= (<id> ...) | (<id> ... . <id>) | <id>
5 Derived Forms
So far, we have introduced only five keywords! Racket has many other keywords, but many of them are derived: they are defined in terms of the basic form. It is best to consider their definitions via pattern matching:
5.1 Lexical (non-recursive) definitions
let
is derived syntax built as an application of a lambda
expression to arguments.
(let ([<id> <e>] ...) <body>) => ((lambda (<id> ...) <body>) <e> ...)
Here <body>
is any expression. Note again the use of
...
to denote zero or more of the previous syntactic
datum.
5.2 Sequencing
begin
is a way of evaluating a sequence of expressions and
returning the last one.
5.2.1 Syntax and meaning
The syntax of a begin expression is
(begin <exp1> <exp2> ...)
The expressions <exp1> <exp2> ...
are evaluated in sequence
and the value of the last expression is returned.
(begin e) => e (begin e1 e2 ...) => (let ([x e1]) (begin e2 ...))
Here x
is assumed to be a `fresh' identifier name not
occur anywhere in the begin
expression.
5.3 Conditional
Conditional expressions (starting with the cond
keyword)
are derived using if
. Note the recursive nature of the
transformation.
5.3.1 Syntax
The (slightly simplified) syntax for conditional is
<cond> ::= (cond <clause> ... [else <exp>]) <clause> ::= [<exp> <exp> ...]
Note that the a <cond>
consists of zero or more clauses
followed by an `else' clause.
Each <clause>
has at least two expressions. The first
expression in the clause is evaluated If it is true, the
remaining expressions in that clauses and the value of the last
one is returned. Otherwise, the next clause is considered. If
all the clauses `fail', then the expression in the else part is
evaluated as the value of the entire else
expression.
5.3.2 Transformation to <if>
expressions
(cond [else e]) => e (cond [e e1 e2 ...] clause ...) => (if e (begin e1 e2 ...) (cond clause ...))
5.4 Function definition
When using Racket, you might have used define
in
for defining functions. In that syntax, one often combines the
formal parameters of the function along with the identifier for
the function. For example, consider the definition
(define (f x) (+ x 2))
In the first line, we know we are defining a function f
because of the formal parameters following f
, as in (f x)
.
The same may be written in the canonical way:
(define f (lambda (x) (+ x 2)))
5.4.1 Transformation
The transformation is simple. The identifer f
is now bound to
a lambda
expression and the formals move next to the lambda
.
(define (f formal ...) exp) => (define f (lambda (formal ...) exp))
6 Basic Evaluation mechanism of Racket
We describe a rather high level view of how Racket evaluates expression. This notion will be made much more precise later in the course. At this point, our goal is to build an intuition of what Racket programs mean.
6.1 Value
Racket expressions evaluate to values. A value is a datum. Evaluation means taking an expression and returning a value or raising an error.
6.2 Evaluation rules
Each keyword is associated with an evaluation rule.
(define x <exp>)
: evaluate<exp>
and bind the result tox
. Return nothing.<literal>
: return the value of the literal.(lambda (x ...) <exp>)
:: create and return a procedure whose formals are(x ...)
and body is<exp>
. The internal structure of a procedure is unavailable.(if <test-exp> <then-exp> <else-exp>)
:: Evaluated<test-exp>
. If the result is not#f
, then evaluated<then-exp>
and return the result of that evaluation. Else, evaluated<else-exp>
and return that as the value of the entire expression.(<rator-exp> <operand-exp> ...)
Evaluate<rator-exp>
. If it is not a procedure, then raise an error. Otherwise, evaluate each of the<operand-exp>
to create a list of values. Pass these values as arguments to the procedure resulting from evaluating<rator-exp>
. If the number of arguments do not match, raise an error. Otherwise, bind the formals of the procedure to the corresponding arguments, and under these bindings, evaluate the body of the procedure.<id>
: Return the value to which the<id>
is bound.(quote <exp>)
: Return<exp>
.
6.3 Keywords control evaluation, application forces evaluation
Notice that
- Keywords occur in the head position of a list, and
- Keyword control the evaluation of the subsequent elements of the list.
whereas in an application expression consisting of a function expression and zero or more operand expressions,
- the function expression is evaluated, and
- all operand expressions are evaluated.
6.4 Initial environment
The `initial environment' or library comes with many
identifier bindings like cons
, car
, list
, null?
,
arithmetic identifiers like +
, *
, etc. Note, none of
these are keywords, they are identifies bound to built-in
values (mostly functions).
7 Quotation
Typing the following
#lang racket a
Racket tries to evaluate the symbol a
. According to the
evaluator rules described above, evaluating an identifer
means looking at the value to which it is bound. Here a
is not bound to anything, so Racket raises the error
a: unbound identifier
On the other hand, the quote
keyword prevents the
evaluation of
#lang racket (quote a)
(quote a)
is abbreviated 'a
. Note that there is only a
single quote, there is no closing quote.
quote
has three other cousins quasiquote
, unquote
and
unquote-splicing
. They turn out to be extremely handy in
controlling the evaluation of expressions within a list
expression. You may read about them in the Racket Manual.
7.1 Syntax
The quote
keyword and its three other cousins are
described next:
<exp> ::= | <quoted-form> <quoted-form> := (quote <datum>) | (quasiquote <datum>) | (unquote <datum>) | (unquote-splice <datum>) |
The quoted forms may be abbreviated as follows:
`<datum> -> (quote <datum>) `<datum> -> (quasiquote <datum>) ,<datum> -> (unquote <datum>) ,@<datum> -> (unquote-splicing <datum>)
7.2 Semantics
In the following we have used Haskell style notation f x
to
denote f(x)
or Racket's (f x)
notation to reduce parentheses
clutter.
Evaluating a quoted datum returns datum
(eval (quote datum)) = datum
Evaluating an unquote or unquote-splicing raises an error
eval (unquote <datum>) = error eval (unquote-splicing <datum>) = error
Evaluating a quasiquote of an atom returns the atom.
eval (quasiquote atom) = atom
Evaluating a quasiquote of a unquote of an expression results in the evaluation of the expression.
eval (quasiquote (unquote exp)) = eval exp
Evaluating a quasiquote of a pair whose head is a
(unquote-splicing <exp>)
and tail is =(<tail>)
eval (quasiquote (cons (unquote-splicing exp) tail)) = append (eval exp) (eval (quasiquote tail))
Otherwise, evaluating a quasiquote enclosing a pair is the pair of the results of evaluating the quasiquote of each of the two components.
eval (quasiquote (cons exp tail)) = cons (eval (quasiquote exp)) (eval (quasiquote tail))
Evaluating the quasiquote of a vector is the vector obtained by evaluating the quasiquote of each component of the vector:
eval (quasiquote (vector e1 ...)) = vector (eval (quasiquote e1) ...)
8 Style
8.1 Racket Style guide
Although Racket is quite agnostic to white space and other indentation, remember that programs are written to be read. Following, standard style rules leaves your reader with fewer surprises.
8.2 Conventions for comments
Racket, Scheme, and Emacs Lisp all share Lisp's syntax. The convention for commenting Lisp like programs is given here, in the Tips on writing comments in the Emacs-Lisp manual.