Lecture notes on Lambda Calculus – Part 1: Syntax and Operational Semantics

1. Introduction
2. Abstract Syntax
3. Alpha Equivalence
4. Application of a substitution to an expression
- - 4.0.1. Exercise
- 4.1. Discussion
  - 4.1.1. Accidental elimination of a binding
  - 4.1.2. Accidental capture of a free identifier
5. β and η rewrite rules
- 5.1. β rewrite rule
- 5.2. η rewrite rule
6. βη reduction
7. Normal forms for (full) $\beta \eta$ reduction
8. Confluence (Church-Rosser Property) of the λ-calculus: Proof Sketch
9. Significance of the Confluence (Church-Rosser) property of λ calculus
10. Reduction Strategies
11. References

1 Introduction

The λ-calculus, introduced by Alonzo Church in the late 1920's and the 1930's, is a reduction system designed to model functions, their construction and application. It was invented by Church to study the logical foundations of mathematics and computability theory. Later, people discovered its utility as a foundation for the design of programming languages.

At its core, the (pure) λ-calculus is a startlingly simple language of expressions: only three kinds of syntactic expressions and two rules govern the reduction of its expressions. However, these rules capture the essence of how programs in any sequential programming language behave. λ-calculus is Turing complete.

These lecture notes are an introduction to λ-calculus, its syntax, operational semantics and a theorem about its confluence property, and reduction strategies that relate λ-calculus to modern programming languages.

The material presented here is based on some of the textbooks and online resources listed in the References Section (Sec. 11) , particularly the Chapter on Untyped Lambda calculus from Types and Programming Languages by Pierce and Selinger's lecture notes on Lambda calculus.

2 Abstract Syntax

The (abstract) syntax of λ-calculus expressions, or expressions for short is given below:

\begin{align*} e ::= & & \scriptstyle{EXP} & \qquad \mbox{Expression}\\ & x & \scriptstyle{IDENT} & \qquad \mbox{Identifier}\\ & \lambda\ x.\ e & \scriptstyle{ABS} & \qquad \mbox{Abstraction}\\ & e\ e & \scriptstyle{APP} & \qquad \mbox{Application} \end{align*}

Informally, an abstraction expression denotes the construction of a function of one argument. An application $e\ e'$ denotes the application of the function denoted by $e$ to the argument denoted by $e'$.

2.1 Syntactic extension

Abstraction expressions take a single formal parameter and, correspondingly applications involve a single argument applied to an expression denoting a function. To avoid clutter, we rely on the following syntactic extension: an expression $e_0\ e_1\ e_2$ is shorthand for $((e_0 e_1) e_2)$. The shorthand $\lambda\ x\ y.\ e$ abbreviates $\lambda\ x.\ (\lambda\ y.\ e)$.

2.2 Size

The size of an expression $e$, denoted $|e|$ is recursively defined as follows:

\begin{align*} |x| &= 1\\ |\lambda\ x.\ e| &= |e| + 2 \\ |e\ e'| &= |e|+|e'| \end{align*}

2.3 Free and bound identifiers

In earlier classes, we learned how to connect identifier references to their binding occurrences (if they exist). A identifier x in an expression e that has a reference occurrence without a corresponding binding occurrence in e is said to occur free in e. Otherwise, it is said to be bound.

The set of free identifiers contained in an expression may also be inductively defined as follows:

\begin{align*} \id{FV}(x) &= \set{x}\\ \id{FV}(e\ e') &= \id{FV}(e) \union \id{FV}(e')\\ \id{FV}(\lambda\ x.\ e) &= \id{FV}(e)\ \backslash\ \set{x} \end{align*}

Note that it is possible for a identifier to occur free as well as bound in an expression. $\id{FV}(e)$ denotes the set of all identifiers occurring in an expression, whether free or bound.

2.4 Open and closed expressions

An expression is open if it contains free identifiers; it is closed otherwise.

2.5 Combinators

A combinator is an expression that is closed. Examples of some well-known combinators and closed expressions and their names are given below. We use the lambda: notation for curried form of lambda and the : for curried application.

(lambda (x) x)                  ; I combinator, the identity function
(lambda: (x y) x)               ; K combinator.  Note that this short for  
                                ; (lambda (x) (lambda (y) x))

(lambda: (x y z) (: x z (y z))   ; S combinator.  Note that this short for  
                                 ;  (lambda (x) 
                                   (lambda (y) 
                                     (lambda (z) 
                                       ((x z) (y z))))).

(lambda (x) (x x))             ; omega combinator.  
(omega omega)                  ; Omega.  An example of self application.

(lambda (g)                     ; Y combinator.
  ((lambda (x) (g (x x)))       ; Also called the (normal order)
   (lambda (x) (g (x x)))))     ; fixed point combinator


(lambda (g)                                  ; Z combinator
  ((lambda (x) (g (lambda (y) (: x x y))))   ; Also called the (applicative order)
   (lambda (x) (g (lambda (y) (: x x y)))))) ;  fixed point combinator

3 Alpha Equivalence

Consider the expressions $\lambda\ x.\ x$ and $\lambda\ y.\ y$. Intuitively, they both denote the same entity, namely, the identity function. They are equivalent because it should not matter what names are used as bound identifiers. On the other hand $\lambda\ x.\ x$ is clearly distinct from $\lambda\ x.\ y$. The former is the identity function, while the latter denotes a function that returns the (value of) $y$, independent of the parameter $x$.

We wish to formalize this notion of equivalence `upto renaming of bound identifiers'. We call this α equivalence, an equivalence relation between expressions that we define presently. Defining α-equivalence requires the notion of renaming, which we introduce next.

3.1 Renaming

A renaming is a syntactic operation on an expression in which an identifier is replaced with another in an expression. $e\set{x/y}$ is read as ``rename in $e$ the identifier $x$ with the identifier $y$.'' It is defined inductively as follows.

\begin{align*} x\set{x/y} &= y\\ z\set{x/y} &= z \qquad \mbox{if $x\neq z$}\\ e\ e'\set{x/y} &= e\set{x/y}\ e'\set{x/y}\\ (\lambda\ x.\ e)\set{x/y} &= \lambda\ y.\ e\set{x/y}\\ (\lambda\ z.\ e)\set{x/y} &= \lambda\ z.\ e\set{x/y} \end{align*}

Used without restriction, a renaming applied on an expression could transform not just the expression but also its meaning. (Exercise: give an example.) However, we will employ the renaming application sparingly and in the right context so that it actually serves our purpose of establishing α equivalance of two expressions.

3.2 Defining α-equivalence

The relation $\alphaequiv$, read α-equivalence, is defined as the smallest binary relation on $\lambda$-expressions that is closed under the following rules:

\begin{align*} \frac{y\not\in \id{FV}(e)}{\lambda\ x.\ e \alphaequiv \lambda\ y.\ e\set{x/y}} \qquad \scriptstyle{ALPHA}\\ \\ \frac{e \alphaequiv e'}{\lambda\ x.\ e \alphaequiv \lambda\ x.\ e'}\qquad \scriptstyle{ABS}\\ \\ \frac{{e_1 \alphaequiv e_1'}\quad{e_2\alphaequiv e_2'}}{e_1\ e_2 \alphaequiv e_1\ e_2'}\qquad \scriptstyle{APP}\\ \\ \\ \\ \frac{}{e\alphaequiv e}\qquad\scriptstyle{REF}\\ \\ \frac{e'\alphaequiv e}{e\alphaequiv e'}\qquad\scriptstyle{SYM}\\ \\ \frac{{e\alphaequiv e'}\quad{e'\alphaequiv e''}}{e\alphaequiv e''}\qquad\scriptstyle{TRANS} \end{align*}

Observe the judicious use of renaming in the definition of α-equivalence.

3.2.1 Exercise

Show that if $e\alphaequiv e'$ then $\id{FV}(e) = \id{FV}(e')$. Is the converse true?

3.3 Examples

3.3.1 $\lambda x.\ x \alphaequiv \lambda y.\ y$

CUSTOM_ID: alpha1

\begin{align*} 1\qquad \lambda x.\ x &\alphaequiv\ \lambda x.\ x & \text{using REF}\\ 2\qquad \lambda x. \ x &\alphaequiv\ \lambda y.\ y & \text{from 1, and}\ y\not\in \id{FV}(x), y=\set{x/y}, \text{using ALPHA}\ \\ \end{align*}

3.3.2 $\lambda x.\lambda x.\ x \alphaequiv \lambda y.\ \lambda y.\ y$

CUSTOM_ID: alpha2

The proof follows by applying the REF rule to $\lambda\ x.\lambda\ x.\ x$ with the replacement $\set{x/y}$ after noting that $y$ does not occur in the body of the $\lambda$ expression in LHS.

3.3.3 $\lambda x.\lambda y.\ y \alphaequiv \lambda y.\ \lambda x.\ x$

1. $\lambda y.\ y \alphaequiv \lambda x.\ x\qquad$ From Example 3.3.1.
2. $\lambda y.\ \lambda y.\ y \alphaequiv \lambda y.\ \lambda x.\ x\qquad$ 1, ABS($y$}.
3. $\lambda x.\ \lambda y.\ y \alphaequiv \lambda x.\ \lambda x.\ x\qquad$ 1, ABS($x$}.
4. $\lambda x.\ \lambda x.\ x \alphaequiv \lambda y.\ \lambda y.\ y\qquad$ From Example 3.3.2.
5. $\lambda x.\ \lambda y.\ y \alphaequiv \lambda y.\ \lambda x.\ x\qquad$ 2, 3, 4, EQUIV

3.3.4 Exercise

Prove \[((\lambda\ x.\ (x\ x)) (\lambda\ x.\ (x\ x))) \alphaequiv ((\lambda\ x.\ (x\ x)) (\lambda\ y.\ (y\ y)))\]

3.4 Barendregt identifier convention

By α equivalence, we may ensure that all bound identifiers (i.e., formals) in an expression are distinct from each other and any free identifiers.

4 Application of a substitution to an expression

Application of a function abstraction to an argument is the primary way in which expressions in λ-calculus are reduced.

Intuitively, applying a function $\lambda\ x.\ e\ $ to an argument $e'$ involves substituting occurrences of $x$ with $e'$ in the body $e$. The process of applying a substitution to an expression needs to be carefully defined. For example, in the body $e$, only free occurrences of $x$, if any, should be substituted with $e$.

Because of bound identifier names that may clash with identifier names in a substitution, a casual replacement of $x$ in $e$ with $e'$ could quickly get us into trouble. The application of a substitution on an expression works its way into sub-expressions. The result of the application depends on how the substitution ultimately gets applied to the atomic elements (indentifiers), but also how it commutes with compound elements: applications, and abstractions, which is where the subtleties surface.

\begin{align} x[x/e] &= e\\ y [x/e]&= y \quad \mbox{if $y\neq x$}\\ (e_1\ e_2)[x/e] &= e_1[x/e]\ e_2[x/e]\\ (\lambda\ x.\ e')[x/e] &= \lambda\ x.\ e' \label{eqn:subst-abs-bound}\\ (\lambda\ y.\ e')[x/e] &= \lambda\ y.\ e'[x/e] \quad \mbox{if}\ y\neq x\ \mbox{and}\ y\not\in \id{FV}(e) \label{eqn:subst-abs}\\ (\lambda\ y.\ e')[x/e] &= \lambda\ z.\ \big(e'\set{y/z}\big)[x/e] \quad \mbox{if}\ y\neq x,\ y\in \id{FV}(e)\ \mbox{and} \label{eqn:subst-fresh}\\ & \qquad \qquad \qquad z\ \text{is fresh, i.e.,}\ z\not\in \id{FV}(e)\union\id{FV}(e') \union\set{x,y}\nonumber \end{align}

Note that because of the side condition in the last clause, any number of different fresh variables may be chosen, rendering substitution operation not a function, but a relation. If, however, we assume that substitution operation to be defined on expressions modulo α-equivalence, then the substitution operation is indeed a function.

4.0.1 Exercise

Show the steps involved in applying the substitution $[x/(y\ z)]$ to $\lambda\ y.\ \ x\ y\ (\lambda\ y.\ \ y)$. Also refer to the clause in the definition employed at each step, along with evidence that the side conditions are satisfied.

4.1 Discussion

Here are some cases that would arise if we were less than careful with the definition:

4.1.1 Accidental elimination of a binding

If we eliminate equation \eqref{eqn:subst-abs-bound} and the side condition $y\neq x$ from \eqref{eqn:subst-abs}, we could end up with the following incorrect situation,

\[\lambda\ \texttt{x}.\ \texttt{x}[\texttt{x}/\texttt{w}] = \lambda\ \texttt{x}.\ \texttt{w}\]

which does not match our intuition about functions. Intuition dictates that an expression with no free identifiers should remain invariant under the application of a substitution. But here, the application of a substitution to the identity function has turned it into something else.

4.1.2 Accidental capture of a free identifier

Without the side condition $y\not\in \id{FV}(e)$ in \eqref{eqn:subst-abs}, we could have

\[\lambda\ \texttt{w}.\ \texttt{x}[\texttt{x}/\texttt{w}] = \lambda\ \texttt{w}.\ \texttt{w}\]

which suffers from the opposite problem: the incorrect application of the substitution has turned a constant function returning $\texttt{x}$ into the identity function.

5 β and η rewrite rules

5.1 β rewrite rule

The fundamental rewrite rule in the λ-calculus is the β rewrite rule:

\begin{align*} {(\lambda\ x.\ e)\ \ e'\rewritesto e[x/e']}\qquad \scriptstyle{BETA}\\ \end{align*}

The β rule captures the fundamental operation involved in applying a function to its argument: the formal parameter is substituted with the argument in the body of a function.

5.2 η rewrite rule

η reduction captures an optimization in the way functions operate. If a function $f$ merely passes its formal parameter $x$ to another function $g$ we may simply eliminate $f$ and use $g$ instead. This idea translates to the following $\eta$ rewrite rule:

\begin{align*} {\lambda\ x.\ e\ x\rewritesto_{\eta} \ e}\qquad \mbox{provided $x\not\in\id{FV}(e)$}\quad\scriptstyle{ETA}\\ \end{align*}

6 βη reduction

$\beta\eta$ reduction, denoted $\betaetareducesto$ is the compatible closure of the rewrite system consisting of $\rewritesto_{\beta}$ and $\rewritesto_{\eta}$.

\begin{align*} \frac{e\rewritesto_{\beta} e'}{e\betaetareducesto e'}\quad \scriptstyle{BETA}\\ \\ \\ \frac{e\rewritesto_{\eta} e'}{e\betaetareducesto e'}\quad \scriptstyle{ETA}\\ \\ \\ \frac{{e_1\betaetareducesto e_1'}}{e_1\ e_2\betaetareducesto e_1'\ e_2}\quad \scriptstyle{APP1}\\ \\ \\ \frac{{e_2\betaetareducesto e_2'}}{e_1\ e_2\betaetareducesto e_1\ e_2'}\quad \scriptstyle{APP2}\\ \\ \\ \frac{e\betaetareducesto e'}{\lambda x.\ e\betaetareducesto \lambda\ x.\ e'}\quad \scriptstyle{ABS}\\ \\ \\ \end{align*}

It is possible for a term to have zero or more β redexes. A β reduction $e\betareducesto e'$ may either reduce β redexes or increase them. It is also possible that the β reducing an expression results in an expression with a larger size. On the other hand, an η reduction always reduces the size of the expression.

7 Normal forms for (full) $\beta \eta$ reduction

$e$ is in full $\beta\eta$ normal form if it contains no β or η-redexes. A reduction system is has the strong normalization property if for all terms, every sequence of rewriting terminates with an normal form. λ calculus does not enjoy this property. A reduction system has the weak normalization property if for every term, there is at least one sequence of reductions that terminates in a normal form. λ calculus does not enjoy even the weak normalization property. Let $\omega = \lambda\ x.\ \ x\ x$. The expression $\Omega = \omega\ \omega$ is has no normal form.

What the λ-calculus does, however, enjoy is the confluence property, also called the Church Rosser property.

CONFLUENCE of λ-calculus: Let $\reducesto$ denote either $\betareducesto$ or $\betaetareducesto$. If $e\simplifiesto e_1$ and $e\simplifiesto e_2$, then there exists an $e_3$ such that $e_1\simplifiesto e_3$ and $e_2\simplifiesto e_3$.

One consequence of the confluence property of a reduction $\reducesto$ is that if an expression reduces to a normal form, then that normal form is unique. (Convince yourself why.)

8 Confluence (Church-Rosser Property) of the λ-calculus: Proof Sketch

We present a proof sketch based on the proof given in Selinger's notes. You are invited to read the entire proof in his notes for a much more thorough and complete understanding.

8.1 Confluence and the Diamond Property

Recall the the definitions of the following properties of an arbitrary reduction system $\reducesto$:

(a) Confluent (Church-Rosser): $\reducesto$ is confluent or has the Church-Rosser property if $a\simplifiesto b$ and $a\simplifiesto c$ implies that there exists $d$ such that $b\simplifiesto d$ and $c\simplifiesto d$.

(c) Diamond property: $\reducesto$ has the diamond property if $a\reducesto b$ and $a\reducesto c$ implies there exists $d$ such that $b\reducesto d$ and $c\reducesto d$.

The Diamond property and confluence are related in the following way:

1. Diamond implies Confluence: the proof is by induction on $\simplifiesto$ deductions. Geometrically, a grid of diamonds may be arranged to fill the area between the rectangle between $a$, $b$, $c$ and $d$ (See diagram on page 31 of Selinger's notes.)
2 $\reducesto$ is confluent iff $\simplifiesto$ satisfies Diamond: this is evident from the definitions of transitive closure and the definitions of the respective properties.

Note that as a consequence of 1., if βη-reduction satisfied the Diamond property, we would have a proof of the confluence of βη-reduction. Unfortunately, however, βη doesn't satisfy the Diamond property.

8.2 Strategy to prove Confluence of λ-calculus

If we find a relation, say $\preducesto$, such that:

A. $\psimplifiesto$ coincides with $\simplifiesto$,
B. $\preducesto$ satisfies the Diamond property

then, from A., B., and 2., we have a proof of the confluence of $\reducesto$.

In the rest of sketch, we

define a relation $\preducesto$ on λ expressions, called parallel reduction that has the above two properties,
show Property A,
define the notion of a maximally parallel reduct, and
use this to prove Property B.

8.3 Definition of Parallel reduction

Intuitively, parallel reduction allows you to look at all redexes in an expression and then apply a subset of them.

\begin{align*} \frac{}{x\preducesto x}\quad \scriptstyle{IDENT}\\ \\ \\ \frac{{e_1\preducesto e'_1}\quad{e_2\preducesto e_2'}}{e_1\ e_2\ \preducesto\ e_1'\ e_2'}\quad \scriptstyle{APP}\\ \\ \\ \frac{e\preducesto e'}{\lambda\ x.\ e\ \preducesto\ \lambda\ x.\ e'}\quad \scriptstyle{ABS}\\ \\ \\ \frac{{e_1\preducesto e'_1}\quad{e_2\preducesto e_2'}}{(\lambda\ x.\ e_1)e_2\ \preducesto e_1'[x/e_2']}\quad \scriptstyle{BETA}\\ \\ \\ \frac{{e\preducesto e'}\quad x\not\in\id{FV}(e)}{\lambda\ x.\ e\ x\ \preducesto e'}\quad\scriptstyle{ETA} \end{align*}

8.3.1 Example 1

Let $\id{Id}$ denote $\lambda\ x.\ x$. Consider the term $e$

\[(\lambda\ z. (\id{Id}\ z)^1)^2\]

where we have used superscripts to identify locations of redexes:

We first consider all the possible reductions from $e$:

1. $e\reducesto_{\beta:1} \lambda\ z.\ z$
2. $e\reducesto_{\eta:2} \id{Id}$

Notice that in this example applying the β-reduction makes the η-redex disappear and also the other way round. In general, in the application of a single β or η redex

1. An existing redex may disappear.
2. An existing redex may get replicated.
3. New redexes may get created.

Let us examine the definition carefully:

The IDENT rule says that identifiers parallel-reduce to themselves.
The APP rule says that rator $e_1$ and rand $e_2$ may parallel-reduce to say, $e_1'$ and $e_2'$. This could result in the creation of a β-redex, say, if $e_1'$ is an abstraction expression. However, this new redex created by the juxtaposition of $e_1'$ to $e_2'$ is not to be reduced in $e_1\ e_2$'s parallel reduction.
The BETA rule says that if $e_1$ and $e_2$ parallel-reduce to $e_1'$ and $e_2'$, then the existing β-redex $(\lambda\ x.\ e_1)\ e_2$ reduces to $e_1'[x/e_2']$.
The ABS rule says that the parallel-reduction of a λ-abstraction is dictated by the parallel-reduction of its body. Note that if $e$ reduces to $e''\ x$, then there is a new η-redex $\lambda\ x.\ e''\ x$. However, this redex is not reduced in a parallel-reduction of $\lambda\ x. e$.
The ETA rule is used to handle the case when there is an η-redex.

Intuitively, in parallel reduction, we do the following:

Step 1: Identify all the redexes in the expression,
Step 2: Select a subset of them.
step 3: Choose an any order to apply them.
Step 4: In the process some original redexes may get replicated due to substitution.
Step 5: If some original redexes disappear, so be it.
Step 6: If a new redex gets created, do not reduce it.

Now we consider the set of reductions in parallel for the above example:

1. $e\preducesto^{\pair{}} e$ (no reductions)
2. $e\preducesto^{\pair{\beta:1}} \lambda\ z.\ z$
3. $e\preducesto^{\pair{\eta:2}} \id{Id} = \lambda\ x.\ x$

8.3.2 Lemma: Prove that for each $e$, $e\preducesto e$

custom_id: e-preducesto-e

Proof: Exercise. [Hint: use induction on $e$]

Intuitively, the lemma claims that when doing parallel reduction, we always have the option of picking none of the possible redexes in $e$.

8.3.3 Example 2

Note that if $e\preducesto e'$, then $e'$ may still have redexes that were originally in $e'$ or new ones created in the process of doing the parallel reductions. This is demonstrated via parallel reduction in the following expression:

\[e=(((\lambda\ z. (\omega\ z)^1)^2) (\id{Id}\ \id{Id})^3)^4\]

$e$ contains four redexes identified as $\beta:1$, $\eta:2$, $\beta:3$ and $\beta:4$.

Here are some examples of parallel reductions of $e$

8.3.3.1 $S1$

\[e\preducesto (\id{Id}\ \id{Id})\]

The reduct $(\id{Id}\ \id{Id})$ is obtained by choosing and applying in sequence the redexes $\beta:1$, $\beta:3$ and $\beta:4$.

\begin{align*} e &=\\ (((\lambda\ z. (\omega\ z)^1)^2) (\id{Id}\ \id{Id})^3)^4 & \reducesto_{\beta:1} \\ ((\lambda\ z.\ (z\ z))(\id{Id}\ \id{Id})^3)^4 & \reducesto_{\beta:3} \\ ((\lambda\ z.\ (z\ z))\ \id{Id})^4 & \reducesto_{\beta:4}\\ (\id{Id}\ \id{Id})^5 \end{align*}

Notice at this point, we have exhausted all the original redexes in $e$. The redex $\beta:5$ in $(\id{Id}\ \id{Id})$ is new and is left untouched.

8.3.3.1.1 Exercise

Using the rules of $\preducesto$, derive the result $e\preducesto \id{Id}\ \id{Id}$.

8.3.3.2 $S2$

\[e \preducesto ((\id{Id}\ \id{Id}) (\id{Id}\ \id{Id}))\]

\begin{align*} e &=\\ (((\lambda\ z. (\omega\ z)^1)^2) (\id{Id}\ \id{Id})^3)^4 & \reducesto_{\beta:1} \\ ((\lambda\ z.\ (z\ z))(\id{Id}\ \id{Id})^3)^4 &\reducesto_{\beta:4} \\ ((\id{Id}\ \id{Id})^{6} (\id{Id}\ \id{Id})^{7})^{5} \end{align*}

8.3.3.2.1 Exercise

Derive, using the rules of $\preducesto$, the result $e\preducesto \id{Id}\ \id{Id}$.

8.3.3.2.2 Exercise

Show that \[(\id{Id}\ \id{Id}) (\id{Id}\ \id{Id}) \preducesto \id{Id}\ \id{Id}\]

8.4 Relating $\psimplifiesto$ with $\simplifiesto$

The examples above hint that parallel reduction can be simulated with a sequence of reductions. In fact, something stronger is true:

\[\psimplifiesto = \simplifiesto\]

The proof is in three parts:

(a) $e\reducesto e' \implies e\preducesto e'$: Each $\beta\eta$ reduction is a parallel reduction. The proof of this is by induction on $\reducesto$. The details are in Lemma 4.6(a) of Selinger.

(b) $e\preducesto e' \implies e\simplifiesto e'$: Each parallel reduction $\preducesto$ may be simulated by a sequence of $\reducesto$ reductions, i.e., by a single $\simplifiesto$ relation. The proof is by induction on $\preducesto$. Details are Lemma 4.6(b) of Selinger's notes.
(c) $\psimplifiesto$ and $\simplifiesto$ coincide: This is a consequence of (a) and (b) and the definition of reflexive transitive closure. Details are in Lemma 4.6(c) of Selinger's notes.

8.5 Definition of Maximally Parallel Reduct

When all original redexes are applied exhaustively in parallel on an expression $e$, we get its maximally parallel reduct $e^*$. Assume that we have an expression $e$. We define $e^*$, the unique maximally parallel one-step reduct of $e$ as follows:

\begin{align*} x^* &= x\\ (e_1\ e_2)^* &= (e_1^*\ e_2^*) \qquad \mbox{if $e_1\ e_2$ is not a $\beta$-redex}\\ ((\lambda\ x.\ e_1)e_2)^* &= e_1^*[x/e_2^*]\\ (\lambda\ x.\ e)^* &= \lambda\ x.\ e^* \qquad \mbox{if $(\lambda\ x.\ e)$ is not an $\eta$-redex}\\ (\lambda\ x.\ e\ x)^* &= e^* \qquad \mbox{if $x\not\in \id{FV}(e)$} \end{align*}

8.5.1 Exercise. Computing maximally parallel reduct

Show that \[(((\lambda\ z. (\omega\ z))) (\id{Id}\ \id{Id}))^*=\id{Id}\ \id{Id}\]

8.6 Proof of Property B: $\preducesto$ satisfies the Diamond property

The proof is in two parts:

(a) $e\preducesto e' \implies e'\preducesto e^*$: the intuition that once a subset of redexes are applied on an expression $e$ to yield a residual expression $e'$, i.e., $e\preducesto e'$, the residual $e'$ still has some of the original redexes of $e$. By applying these remaining original redexes $e$ via parallel reduction, we can arrive at $e^*$. Recall that its possible to steer $e'$ into a reduction sequence that takes us into an expression quite different from $e^*$. But all we are saying here is that there is a way to get from $e'$ to $e^*$. The formal proof is by induction on $e$ and given as Lemma 4.8 in Selinger's notes. The second case in the proof of Lemma 4.8 depends on a Substitution Lemma (Lemma 4.7) in Selinger's notes. Please read both proofs.
(b) $\preducesto$ satisfies the Diamond Property: Pick any two expressions $e_1$ and $e_2$ such that $e\preducesto e_1$ an $e\preducesto e_2$. From (a), it follows that $e_1\preducesto e^{*}$ and $e_2\preducesto e^{*}$. Thus, $\preducesto$ satisfies the Diamond Property.

9 Significance of the Confluence (Church-Rosser) property of λ calculus

The confluence of λ-calculus is one of the most important results in the theory of programming languages.

The confluence of λ-calculus means that:

1. λ-calculus reduction is non-deterministic: As long as the simplification does not diverge, (i.e., continue forever), it does not matter in what order we perform the reductions.
2. λ-calculus models evaluation: Every expression reduces to at most one normal form.
3. λ-calculus is parallel: It is possible to apply reductions in parallel, opening up many opportunities for parallelism for implementors of λ-calculus based languages.

10 Reduction Strategies

In any given expression, there may be zero or more redexes present. A reduction strategy determines which redex gets picked next for reduction. We have the following two main strategies:

10.1 Full Reduction

In full reduction, any redex in the expression may be picked as the next one for reduction. In the example below, adapted from Pierce's book (TAPL), the following is one possible full reduction sequence:

\begin{align*} (\id{Id}\ (\id{Id}\ (\lambda\ z.\ (\id{Id}\ z)^4)^3)^2)^1 &\reducesto_{\beta:4} \\ (\id{Id}\ (\id{Id}\ (\lambda\ z.\ z))^2)^1 &\reducesto_{\beta:2} \\ (\id{Id}\ (\lambda\ z.\ z))^1 &\reducesto_{\beta:1} \\ (\lambda\ z.\ z) &\ \not\reducesto \end{align*}

Note that the disadvantage of complete choice in full reduction is that one may veer off into a non-terminating reduction of an expression sequence even though the expression has a normal form. Just consider the expression $(\lambda\ x.\ \id{Id})\ \Omega$ with the hopeless strategy of always ignoring the top level redex and choosing instead to reduce Ω.

If an expression has a normal form, is there a strategy that guarantees arriving at it, without getting sucked into an infinite reduction sequence? The next strategy answers this in the affirmative.

10.2 Normal order reduction

The normal order strategy is based on the observation that given any two redexes at positions $i$ and $j$ in an expression $e$:

1. Seen from the root, $i$ `above' $j$ (or vice versa), or
2. Either $i$ is to the left of $j$ (or vice versa).
It is not possible for two redexes in a $\lambda$ expression to overlap.

Normal order strategy involves locating the next redex in the expression $e$ as follows:

1. If $e$ is a redex, use it
2. If $e$ is not a redex but a λ abstraction $\lambda\ x.\ e'$, then locate the redex inside the body $e'$.
3. If $e$ is not a redex but an application, $(e_1\ e_2)$, then locate the redex in the $e_1$.
4. If there's no redex in $e_1$, then locate a redex in $e_2$.

\begin{align*} (\id{Id}\ (\id{Id}\ (\lambda\ z.\ (\id{Id}\ z)^4)^3)^2)^1 &\reducesto_{\beta:1} \\ (\id{Id}\ (\lambda\ z.\ (\id{Id}\ z)^4)^3)^2 &\reducesto_{\beta:2} \\ (\lambda\ z.\ (\id{Id}\ z)^4)^3 &\reducesto_{\eta:3} \\ \id{Id} &\ \not\reducesto \end{align*}

10.3 Proof that Normal Order reduction guarantees a normal form if one exists

This result is called the `Standardization' theorem in λ calculus. A proof of this result using parallel reductions (see above) is given in Takahashi's Parallel Reductions in λ-calculus 1995 paper (Information and Computation No. 118, 120-127). Another approach for the standardization result λ-calculus is in Ryo Kashima's 2000 paper, using a inductively defined notion of β-reducibility with a standard sequence.

10.4 Applicative Order

The applicative order strategy requires that in any β redex, the argument position is in normal form before the β redex is chosen for reduction.

10.5 Call by name reduction

Apply normal order strategy but stop when you hit an abstraction: no reductions inside abstractions.

This was the strategy adopted in the Algol programming language. Although now dead, Algol pioneered many concepts in the design of programming. It was amongst the first high level programming languages based directly on λ-calculus. The official report defining the Scheme/Racket language is dedicated to the memory of Algol.

The call by name strategy can cause duplicate evaluations.

The other language that is based on the call by name strategy is the programming language Haskell. Actually Haskell uses a strategy called lazy evaluation. Lazy evaluation may be seen as an optimization over the call by name strategy in which values once computed are stored, to avoid recomputation.

10.6 Call by value

The call by value strategy uses the following rules:

1. Choose the outermost redex
2. Do not choose any redex inside a λ abstraction.
3. When applying a β-redex, ensure that the argument expression is in normal form.

Here is an example of simplification using a call-by-value strategy:

\begin{align*} (\id{Id}\ (\id{Id}\ (\lambda\ z.\ (\id{Id}\ z)^4)^3)^2)^1 &\reducesto_{\beta:2} \\ (\id{Id}\ (\lambda\ z.\ (\id{Id}\ z)^4)^3)^1 &\reducesto_{\beta:1} \\ (\lambda\ z.\ (\id{Id}\ z)^4)^3 &\not\reducesto \\ \end{align*}

The call-by-value strategy is the one most commonly by programming languages: C, Java, Scheme, Python, included.

We say that in call-by-value, function application is strict on the arguments: the application doesn't trigger unless the arguments are in normal forms.

10.7 Comparison

Here is an example of the expression

\[(\lambda\ x.\ (\lambda\ z.\ \id{Id}\ \ \id{Id})) (\lambda\ y. \Omega)\]

simplified under various reduction strategies

Strategy	Result
Full reduction	non-termination or $\lambda\ z.\ \id{Id}$
Normal Order	$\lambda\ z.\ \id{Id}$
Applicative Order	non-termination
Call by name	$\lambda\ z. \id{Id}\ \id{Id}$
Call by value	$\lambda\ z.\ \id{Id}\ \id{Id}$

11 References

11.1 Online notes/papers

NPTEL course on POPL: Church-Rosser lecture notes
TAPL chapter on The Untyped Lambda Calculus: Photocopy of chapter from Types and Programming Languages by Benjamin Pierce. Available on reserve at the library.
Peter Selinger's notes on the Lambda Calculus: The definitions of renaming, α equivalence, substitution application and the proof sketch for the Church-Rosser property of λ-calculus is based on these notes.

Susan Horwtiz's notes on the lambda calulus: Has lot's of good examples.
A mechanical proof of the Church Rosser Theorem: by N. Shankar. JACM 1988. An account of perhaps the earliest mechanised proof of the Church-Rosser theorem for Lambda Calculus. Uses the main idea of a walk (parallel reduction). Shankar quotes Rosser about the long and somewhat tortuous history of the proof of the theorem. Note that "reduction" in the paper refers to the notion of simplification as done in class.
Kashima's proof of the Standardization theorem
Dave Schmidt's (Kansas State Univ) chapter on Lambda Calculus

11.2 Other resources

Lambda Calculus Types and Models by Jean-Louis Krivine

Univ Chicago Class notes on λ-calculus: by Acar and Ahmed.
OpenDSA Programming Languages

Introduction to Lambda calculus by Barendregt and Barendsen

History of the Lambda-Calculus and Combinatory Logic: by Felice Cardone and J Roger Hindley
Smolka's notes on Confluence and Normalization in Reduction Systems

Logical Verification slides on Confluence
Normal forms: discusses diff between NF, HNF and WHNF.
Harvard CS152 lec on Reductions and λ calculus
Vriej University book on Term Rewriting

Short, neat intro to the Lambda calculus by Paul Hudak
Columbia Univ Fall 2010 slides on the lambda calculus

Head Normal Form
WHNF
Wikipedia page on Beta normal form: gives formal defn of head normal form.

Normark's online book on Programming Languages: Rewrite rules, reduction and normal forms.

11.3 Video Resources

Church-Turing Thesis: Story and Recent Progress: Lecture by Yuri Gurevich

Strategy	Result
Full reduction	non-termination or \(\lambda\ z.\ \id{Id}\)
Normal Order	\(\lambda\ z.\ \id{Id}\)
Applicative Order	non-termination
Call by name	\(\lambda\ z. \id{Id}\ \id{Id}\)
Call by value	\(\lambda\ z.\ \id{Id}\ \id{Id}\)