Short introduction to Racket: Syntax and a simple Evaluation Model

1. A brief history of Racket
2. Learning Racket's syntax
3. Datum
4. Program syntax
5. Derived Forms
6. Basic Evaluation mechanism of Racket
7. Quotation
- 7.1. Syntax
- 7.2. Semantics
8. Style
- 8.1. Racket Style guide
- 8.2. Conventions for comments

1 A brief history of Racket

Racket is a descendant of Scheme. When it comes to syntax, Racket and Scheme are members of the Lisp family of languages, which includes Common Lisp and also Emacs Lisp. The salient feature of the syntax is that it uses white space and parentheses. Read these collection of mostly laudatory quotes about Lisp.)

Lisp was invented to study recursive function theory (McCarthy, 1958). It was successfully and extensively employed in building early Artificial Intelligence systems. Scheme (Sussman and Steele, 1975), a dialect of Lisp, grew out of an attempt to understand Actors (Hewitt, 1972) in Lisp. The result was a language that, while sticking to Lisp's parentheses laden syntax, was a significant departure from Lisp in its semantics: it embraced lexical scope pioneered by Algol. Lexical scope was crucial to correctly modelling the Lambda Calculus (Church, 1932). Scheme made it possible to write executable interpreters for the Lambda Calculus (Sussman and Steele, 1975) and also many features of programming languages. (See the Lambda Papers.) (For an account of the History of Lambda Calculus, see (Cardone and Hindley, 2006).

Scheme's evolution proceeded via a series of Revised Reports, the latest of which is Revised⁷ Report on the Algorithmic Language Scheme. Scheme was designed to be small. The introductory paragraph of the Scheme Report captures this philosophy:

Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. Scheme demonstrates that a very small number of rules for forming expressions, with no restrictions on how they are composed, suffice to form a practical and efficient programming language that is flexible enough to support most of the major programming paradigms in use today.

Racket's immediate parent was PLT Scheme (Felleisen et al.) which was another dialect of Scheme with its own interactive development environment or IDE. Racket evolved as Scheme adding more data types, more keywords and many more built in libraries to Scheme. The presumptive goal was to make it suitable for more real-world applications (`batteries included'), with a module system, class system, many other libraries, immutable lists, etc.

2 Learning Racket's syntax

If you're in a hurry, here is the Racket Syntax Cheat Sheet. The best way to learn the syntax of a programming language is to start small, with a minimal set of constructs to get things done. To learn Racket, we will start with a very small subset that drawn from the Syntax of Scheme. Racket's syntax includes that of Scheme, but has other elements to it.

In the grammar below, for non-terminals whose definitions are not listed here, please consult the Scheme Syntax Reference or Racket Reference.

Racket's syntax is quite permissive. Identifiers can all many special characters, including spaces. The language is insensitive to the amount of white space, Any amount may be used to separate syntactic elements. Unlike Python, indentation is not enforced, rather it is encouraged as good programming practice.

3 Datum

The key to the syntax of Racket is the notion of a datum. A datum is either list expression or a non-list expression. In the latter case, it is either an atom or a vector of datum elements. In what follows, syntactic categories are denoted by identifiers wrapped in < and >.

<datum>         ::=  <list-expr> | <non-list-expr>
<non-list-expr> ::=  <atom> | <vector>

3.1 Atom

<atom> ::=    <boolean>
            | <number>
            | <character>
            | <string>
            | <symbol>
            | <procedure>

The syntax of atomic data is described in the Racket Guide under Built-in Datatypes.

3.2 Vector

<vector> ::= #(<datum> ...)

The meta notation <thing> ... denotes a sequence of zero or more (but finite) occurrences of the category <thing>. Note that the syntax of a vector consists of a hash # followed by a pair of round brackets enclosing a sequence of zero or more <datum> items.

Some examples of vectors:

#(3 4 5)
#("one" "two")
#()

3.3 List Expression

<list-expr>     ::=  (<datum> ...) | (<datum> <datum> ... . <datum>)
                                                          ^
                                                    note the dot

In the first variant, list expression is a sequence of zero or more <datum> elements enclosed in round brackets. (Square brackets are also permissible.) Notice the absence of commas between the elements. As a special case list can be empty, i.e., ().

In the second variant, a list expression consists of a sequence one or more <datum> elements followed by a dot ., followed by another <datum>. Notice that there can be exactly one <datum> after the dot.

3.4 Pair

A pair is a list expression form (<datum> . <datum>).

<pair> ::=  (<datum> . <datum>)

In a pair (<d1> . <d2>), <d1> is called the head of the pair, and <d2> the tail of the pair. Note that a pair may be represented as a labeled directed tree with three nodes, the pair and <d1> and <d2> and edges labeled `head' and `tail' from the pair to <d1> and <d2> respectively.

In the Lisp family parlance, a pair is also called a cons cell, the two components of the pair are accessed via the functions car and cdr operating on the pair.

(<d1> . <d2>)
  |    \
  |     \---tail---> <d2>
  |
  +---head---> <d1>

3.5 Nested Pairs

Pairs can be composed, or nested. For example, consider the nested pair (3 . (a . (b . 5))). The gut of the nested pair is the right-most element in its tree representation. In the example, the gut of the nested pair is 5.

3.6 Conversion

Different list expressions may denote the same object. For example, the nested pair (3 . (2 . ())) is equivalent to (3 2). We seek a way to transform list expressions into other list expressions. The transformation process involves either introducing or eliminating a `dot'. Transforming to a nested pair introduces a `dot'. Transforming away from a nested pair involves eliminiating a `dot'.

The operation of elimination and introduction is captured by the following ELIM/INTRO rewrite rules. Transformation is the process of applying these rewrite rules.

(<d1> <d2> ... . (<d3> . <d4>)) ->  (<d1> <d2> ... <d3> . d4)    ; nest ELIM
(<d1> <d2> ... . ())            -> (<d1> <d2> ...)               ; () ELIM


(<d1> <d2> ... <d3> . d4)  -> (<d1> <d2> ... . (<d3> . <d4>))   ; nest INTRO
(<d1> <d2> ...)            -> (<d1> <d2> ... . ())              ; () INTRO

Each rule has a left hand side, a right hand side, and a name. Here are some examples of transformations. The rewrite rules are specified with pattern variables (like <d1>, <d2>, etc.).

(3 . (2 . (1 . ()))) =nest-ELIM=> 
--------------------

(3 2 . (1 . ())) =nest-ELIM=>
----------------

(3 2 1 . ()) =()-ELIM=>
------------

(3 2 1)

The underlines indicate a redex, short for reducible expression. A redex for a rule is a subexpression that matches the left hand side of the rule. It indicates the subexpression to which the rewrite rule is being applied. Note that in the above example transformation, the redex is always the entire expression. We say a redex rule is applied at the `top level' when the redex is the entire expression.

(3 . ((2 . ()) . 4)) =nest-ELIM=>
--------------------

(3 (2 . ()) . 4)    =()-ELIM=>
   --------

(3 (2) . 4)

Note that in the above example, the ()-ELIM rule is not applied at top level.

3.7 Lists

A list (or list form) is a list expression satisfying the following grammar. A list

<list> ::= (<datum> ...) | (<datum> <datum> ... . <non-list-expr>)

A list is proper if it is of the form (<datum> ...). It is improper if it is of the form (<datum> <datum> ... . <non-list-expr>). (Note: in racket, the list? is used for cheking for proper lists, not all lists.)

Another way of describing a list is that it is a list expression that is not a redex for any of the ELIM rules. In other words, in a list, the ELIM rules can not be applied at the top level.

Every list expression may be converted to a list form by repeatedly applying the elimination rules at the `top level'.

3.8 Normal forms

A list expression is in list normal form if there are no ELIM redexes in it. Notice that a list expression may be a list but not a list normal form. For example, the list expression ((3 . ()) 2) is a list but is not in list normal form. However, it may be transformed to ((3) 2), which is in list normal form.

((3 . ()) 2)     ; in list form
                 ; but not in list normal form

((3) 2)          ; the result of transforming to list normal form

A list expression is in pair normal form if there are no redexes for any INTRO rewrite rule.

Every list expression may be converted to a pair normal form by applying the INTRO rules repeatedly until saturation (i.e., until they are no longer any INTRO redexes). For example, the list expression ((3 . ()) 2) when converted to pair normal form is ((3 . ()) . (2 . ())).

3.9 Questions for future

We have claimed that every list expression may be transformed into a list, into a list normal form, and a pair normal form. We haven't, however, proved this property. At least, not yet.

Also, what can we say about the uniqueness of the list and pair normal forms? Can a give list expression be transformed to two distinct pair normal forms, or two distinct list normal forms? The answer, no: each list expression transforms to, or has, exactly one pair normal form and one list normal form. Again, the proof of why this is so is for later.

3.10 Uses of normal forms

All list expressions printed by Racket are in list normal form and are prefixed with a quote (') mark. (We will explain the role this quote mark shortly. ) It is convenient to assume that internally, Racket stores list expressions in pair normal form.

All datum types discussed so far may be read by the Racket interpreter, except a procedure. It can not be read in, nor its internals printed out. It can, however, be part of a list expression.

4 Program syntax

In what follows, we will stick to a very small subset of Racket. For the constructs whose definitions are not listed here or for the full syntax, please consult the Racket Reference Manual.

<pgm> ::= #lang racket <form> ...
<form> ::= <defn> | <exp> | <require> | <provide>

<require> ::= See Racket Reference
<provide> ::= See Racket Reference

<defn> ::= (define <id> <exp>)
<exp> ::= <literal>
          <id>                                    ; <identifier>
          (if <exp> <exp> <exp>) |                ; 
          (lambda <formals> <exp>) |              ; lambda expression
          (letrec (<bind> ...) <exp>)             ; recursive binding
          (<exp> <exp> ...)  |                    ; application
          (<quote <exp>)                          ; quote

<literal> ::=  <number>                                ; number
             | <boolean>                               ; boolean
             | <character>                             ; char
             | <string>                                ; string    

<id> ::=   <symbol>
<bind> ::= (<id> <exp>) 
<formals> ::= (<id> ...) | (<id> ... . <id>) | <id>

5 Derived Forms

So far, we have introduced only five keywords! Racket has many other keywords, but many of them are derived: they are defined in terms of the basic form. It is best to consider their definitions via pattern matching:

5.1 Lexical (non-recursive) definitions

let is derived syntax built as an application of a lambda expression to arguments.

(let ([<id> <e>] ...) <body>) => ((lambda (<id> ...) <body>) <e> ...)

Here <body> is any expression. Note again the use of ... to denote zero or more of the previous syntactic datum.

5.2 Sequencing

begin is a way of evaluating a sequence of expressions and returning the last one.

5.2.1 Syntax and meaning

The syntax of a begin expression is

(begin <exp1> <exp2> ...)

The expressions <exp1> <exp2> ... are evaluated in sequence and the value of the last expression is returned.

(begin e) => e
(begin e1 e2 ...) => (let ([x e1]) (begin e2 ...))

Here x is assumed to be a `fresh' identifier name not occur anywhere in the begin expression.

5.3 Conditional

Conditional expressions (starting with the cond keyword) are derived using if. Note the recursive nature of the transformation.

5.3.1 Syntax

The (slightly simplified) syntax for conditional is

<cond> ::= (cond <clause> ... [else <exp>])
<clause> ::= [<exp> <exp> ...]

Note that the a <cond> consists of zero or more clauses followed by an `else' clause.

Each <clause> has at least two expressions. The first expression in the clause is evaluated If it is true, the remaining expressions in that clauses and the value of the last one is returned. Otherwise, the next clause is considered. If all the clauses `fail', then the expression in the else part is evaluated as the value of the entire else expression.

5.3.2 Transformation to `<if>` expressions

(cond [else e]) => e
(cond [e e1 e2 ...] clause ...)  => 
(if e 
   (begin e1 e2 ...)
   (cond clause ...))

5.4 Function definition

When using Racket, you might have used define in for defining functions. In that syntax, one often combines the formal parameters of the function along with the identifier for the function. For example, consider the definition

(define (f x) 
   (+ x 2))

In the first line, we know we are defining a function f because of the formal parameters following f, as in (f x).

The same may be written in the canonical way:

(define f 
   (lambda (x) (+ x 2)))

5.4.1 Transformation

The transformation is simple. The identifer f is now bound to a lambda expression and the formals move next to the lambda.

(define (f formal ...) exp) => (define f (lambda (formal ...) exp))

6 Basic Evaluation mechanism of Racket

We describe a rather high level view of how Racket evaluates expression. This notion will be made much more precise later in the course. At this point, our goal is to build an intuition of what Racket programs mean.

6.1 Value

Racket expressions evaluate to values. A value is a datum. Evaluation means taking an expression and returning a value or raising an error.

6.2 Evaluation rules

Each keyword is associated with an evaluation rule.

(define x <exp>): evaluate <exp> and bind the result to x. Return nothing.
<literal>: return the value of the literal.
(lambda (x ...) <exp>) :: create and return a procedure whose formals are (x ...) and body is <exp>. The internal structure of a procedure is unavailable.
(if <test-exp> <then-exp> <else-exp>) :: Evaluated <test-exp>. If the result is not #f, then evaluated <then-exp> and return the result of that evaluation. Else, evaluated <else-exp> and return that as the value of the entire expression.
(<rator-exp> <operand-exp> ...) Evaluate <rator-exp>. If it is not a procedure, then raise an error. Otherwise, evaluate each of the <operand-exp> to create a list of values. Pass these values as arguments to the procedure resulting from evaluating <rator-exp>. If the number of arguments do not match, raise an error. Otherwise, bind the formals of the procedure to the corresponding arguments, and under these bindings, evaluate the body of the procedure.
<id> : Return the value to which the <id> is bound.
(quote <exp>) : Return <exp>.

6.3 Keywords control evaluation, application forces evaluation

Notice that

Keywords occur in the head position of a list, and
Keyword control the evaluation of the subsequent elements of the list.

whereas in an application expression consisting of a function expression and zero or more operand expressions,

the function expression is evaluated, and
all operand expressions are evaluated.

6.4 Initial environment

The `initial environment' or library comes with many identifier bindings like cons, car, list, null?, arithmetic identifiers like +, *, etc. Note, none of these are keywords, they are identifies bound to built-in values (mostly functions).

7 Quotation

Typing the following

#lang racket
a

Racket tries to evaluate the symbol a. According to the evaluator rules described above, evaluating an identifer means looking at the value to which it is bound. Here a is not bound to anything, so Racket raises the error

a: unbound identifier

On the other hand, the quote keyword prevents the evaluation of

#lang racket
(quote a)

(quote a) is abbreviated 'a. Note that there is only a single quote, there is no closing quote.

quote has three other cousins quasiquote, unquote and unquote-splicing. They turn out to be extremely handy in controlling the evaluation of expressions within a list expression. You may read about them in the Racket Manual.

7.1 Syntax

The quote keyword and its three other cousins are described next:

<exp> ::=  
         | <quoted-form>

<quoted-form> :=   (quote <datum>)  |
                   (quasiquote <datum>) | 
                   (unquote <datum>)    | 
                   (unquote-splice <datum>) |

The quoted forms may be abbreviated as follows:

`<datum>       ->   (quote <datum>)  
`<datum>       ->   (quasiquote <datum>) 
,<datum>       ->   (unquote <datum>) 	  
,@<datum>      ->   (unquote-splicing <datum>)

7.2 Semantics

In the following we have used Haskell style notation f x to denote f(x) or Racket's (f x) notation to reduce parentheses clutter. Evaluating a quoted datum returns datum

(eval (quote datum)) = datum

Evaluating an unquote or unquote-splicing raises an error

eval (unquote <datum>) = error
eval (unquote-splicing <datum>) = error

Evaluating a quasiquote of an atom returns the atom.

eval (quasiquote atom) = atom

Evaluating a quasiquote of a unquote of an expression results in the evaluation of the expression.

eval (quasiquote (unquote exp)) = eval exp

Evaluating a quasiquote of a pair whose head is a (unquote-splicing <exp>) and tail is =(<tail>)

eval (quasiquote (cons (unquote-splicing exp) tail)) = append (eval exp) (eval (quasiquote tail))

Otherwise, evaluating a quasiquote enclosing a pair is the pair of the results of evaluating the quasiquote of each of the two components.

eval (quasiquote (cons exp tail)) = cons (eval (quasiquote exp)) (eval (quasiquote tail))

Evaluating the quasiquote of a vector is the vector obtained by evaluating the quasiquote of each component of the vector:

eval (quasiquote (vector e1 ...)) = vector (eval (quasiquote e1) ...)

8 Style

8.1 Racket Style guide

Although Racket is quite agnostic to white space and other indentation, remember that programs are written to be read. Following, standard style rules leaves your reader with fewer surprises.

Here is Racket's own style guide and conventions.

8.2 Conventions for comments

Racket, Scheme, and Emacs Lisp all share Lisp's syntax. The convention for commenting Lisp like programs is given here, in the Tips on writing comments in the Emacs-Lisp manual.