Lecture notes on the ADDITION
Language: Operational Semantics
Table of Contents
- 1. Introduction
- 2. Semantic Engineering
- 3.
ADDITION
Syntax - 4.
ADDITION
Rewrite (redex) rules - 5. Reduction (small step semantics)
- 6. Simplification (reflexive-transitive closure of reduction)
- 7. Theorem (Simplification compatible closure)
- 8. Values
- 9. Semantic Properties of the ADDITION language
- 10. Valuation (Big step) Semantics
- 11. Theorem (Consistency between small-step and valuation semantics)
- 12. Implementations (Reduction Strategies)
- 13. Theorem (Relating reduction, evaluation and implementation)
1 Introduction
We define a small language called ADDITION
, the language of
arithmetic expressions over numbers and the addition operator.
While the language itself is perhaps trivial, it is still a
useful test bed to introduce the two essential elements of the
design and analysis of programming languages: syntax and
semantics. Syntax defines what constitute expressions in a
language. Semantics is concerned with their meaning. Meaning
is arrived as the culmination of the execution of abstract
machine(s) that manipulate expressions. This process, called
evaluation, is the process by which meaning is invested into
expressions of a language. The semantic approach we employ is
called operational semantics, to distinguish from other types
of semantic approaches (denotational, axiomatic). It consists
of essentially six key components, each of which is defined as a
relation involving syntax:
- 1. Equational identities
- 2. Rewriting (reading the equations left to write)
- 3. Reduction (applying a rewrite rule to a subexpression)
- 4. Simplification (a sequence of reductions)
- 5. Evaluation (simplification to a value)
- 6. Implementation (reduction strategies)
1.1 Motivating example
Intuitively, the approach of operational semantics is not very different from how we do algebra in high school. We evaluate algebraic expressions by the process of simplification. Each simplification step consists of a reduction that transforms an expression to another expression. Each reduction is driven by the instantiation of a rewrite rule, an equation driven in a particular direction. For example the equation \(x/x = 1\) (provided \(x\neq 0\)) could be used to drive the rewrite rule \(x/x \rewritesto 1\) (read `rewrites to').
- Algebraic identities
- These come from our knowledge of the underlying domain. E.g., variables in the identities represent real numbers, and the operations * and / represent division and multiplication in a field.
- Rewrite Rules
- These are obtained by orienting the algebraic identities in a particular direction (from left to right):
Consider the expression \((3/3)*(y/y)\), where \(y\neq 0\). This expression reduces to 1*y/y by the application of a rewrite rule. The subexpression that matches the left side of the rewrite rule is called a redex. A sequence of reductions constitute a simplification. Answers are numbers; numbers are already in the simplest possible form: they can not be reduced further. A simplification that results in an answer is called an evaluation.
- Reduction, Simplification to an answer
- Here is an example of an evaluation.
We might say that the meaning of \((3/3)*(y/y)\) when \(y\neq 0\) is \(1\) because this expression evaluates to 1.
2 Semantic Engineering
Numerous questions arise: Does every expression simplify to an answer? Could the simplification yield multiple answers? What if the simplification process never finishes? What if multiple rewrite rules are applicable for reduction? Which one do we choose? Does it matter which one we choose first? In terms of efficiency? In terms of correctness? What does `correctness' mean anyway? What other machinery is needed to carry out simplification besides the expressions themselves, et cetera.
A semantic engineer tasked with designing a programming language is concerned with such questions, many of which have subtleties that miss the casual eye. The answer involves comparing multiple choices in the design space, with competing requirements inevitably yielding to some compromises, not uncommon in engineering design.
3 ADDITION
Syntax
4 ADDITION
Rewrite (redex) rules
\[\rewritesto \subseteq \id{EXP}\times \id{EXP}\]
\begin{align*} \overline{n_1}+\overline{n_2} \rewritesto \overline{n_1+n_2}\quad \scriptstyle{PLUS} \\ \\ \\ \end{align*}The understanding is that the above rule defines an (infinite) family of rules, one for each pair of numbers.
5 Reduction (small step semantics)
\[\reducesto \subseteq \id{EXP}\times \id{EXP}\]
\begin{align*} \frac{e \hookrightarrow e'}{e\reducesto e'}\quad \scriptstyle{REWRITE} \\ \\ \\ \frac{e_1\rightarrow e_1'}{e_1+e_2 \reducesto e_1'+e_2}\quad \scriptstyle{LEFT} \\ \\ \\ \frac{e_2\reducesto e_2'}{e_1+e_2 \reducesto e_1+e'_2}\quad \scriptstyle{RIGHT} \\ \end{align*}\(e\) is reducible, written \(e\reducesto\), if there exists an \(e'\) such that \(e\reducesto e'\). \(e\) is irreducible if \(e\) is not reducible.
6 Simplification (reflexive-transitive closure of reduction)
\[\simplifiesto \subseteq \id{EXP}\times \id{EXP}\]
\begin{align*} \frac{}{e \simplifiesto e}\quad \scriptstyle{REFLEX} \\ \\ \\ \frac{e \reducesto e'}{e\simplifiesto e'}\quad \scriptstyle{REDUCE} \\ \\ \\ \frac{e_1\reducesto e_2\quad e_2\simplifiesto e_3}{e_1\simplifiesto e_3}\quad \scriptstyle{TRANS} \\ \\ \\ \end{align*}7 Theorem (Simplification compatible closure)
[L1]
- 1. If \(p \simplifiesto r\) then \(p+q\simplifiesto r+q\).
- 2. If \(q \simplifiesto s\) then \(p+r\simplifiesto p+s\).
7.1 Proof
7.1.1 Part 1
By induction on the derivation of \(p\simplifiesto r\).
- 1. \(\cfrac{}{p\simplifiesto r}\quad\scriptstyle{REFLEX}\): In this case \(p=r\). Therefore, we have \[\frac{}{p+q\simplifiesto r+q}\quad{\scriptstyle{REFLEX}}\]
- 2. \(\cfrac{p\reducesto r}{p\simplifiesto r}\quad{REDUCE}\): The desired derivation of the simplification is obtained as
\[\cfrac{\cfrac{p\reducesto r}{p+q\reducesto r+q}\quad{LEFT}}{p+q\simplifiesto r+q}\quad{\scriptstyle{REDUCE}}\]
3. \(\cfrac{{p\reducesto s}\quad {s\simplifiesto r}}{p\simplifiesto r}\quad{TRANS}\):
Since \(s\simplifiesto r\), by the induction hypothesis \(s+q\simplifiesto r+q\).
Thus we have the following derivation of simplification:
\[ \cfrac{\cfrac{\cfrac{p\reducesto s} {p+q\reducesto s+q}\quad LEFT} {p+q\simplifiesto s+q}\quad{REDUCE} \qquad{s+q\simplifiesto r+q \quad (IH)}} {p+q\simplifiesto r+q}\quad{TRANS} \]
7.1.2 Part 2
The proof is very similar to that of Part1 and is omitted.
7.2 Derived Rules
From the above, we have
\begin{align*} \frac{p\simplifiesto r}{p+q\simplifiesto r+q}\quad {LEFT*}\\ \\ \frac{q\simplifiesto s}{p+q\simplifiesto p+s}\quad {RIGHT*} \end{align*}8 Values
We identify a set of values and a subset of expressions that represent those values. In the ADDITION language, values are numbers.
\begin{align*} v ::= & & \scriptstyle{VAL} & \qquad \mbox{Value}\\ & n & \scriptstyle{NUM} & \qquad \mbox{Number}\\ \end{align*}8.1 Values represented by literals
Certain expressions, the literals of the form \(\overline{n}\) represent values. More precisely, a literal \(\overline{n}\) represents the value \(n\).
Note that literals are irreducible.
9 Semantic Properties of the ADDITION language
9.1 Theorem (Normal forms under \(\reducesto\) are literals)
9.1.1 Statement
\(e\) is irreducible iff \(e\) is a literal \(\overline{n}\) for some number \(n\).
9.1.2 Proof
The 'if' part follows immediately since there is no reduction for a number literal.
For the `only if' we show that \(e\) is not a literal implies \(e\) is reducible. The proof is by induction on the structure of \(e\).
If \(e\) is not a literal, then it is an addition expression, say \(e_1+e_2\). Either both \(e_1\) and \(e_2\) are literals \(\overline{n_1}\) and \(\overline{n_2}\) respectively, or one or more of \(e_1\) and \(e_2\) are not. In the first case, \(e\) reduces to \(\overline{n_1+n_2}\). In the second case, assume \(e_1\) is not a literal. By the induction hypothesis, \(e_1\) is reducible. By the LEFT rule \(e\) is reducible. The case assuming \(e_2\) is not a literal is similar and omitted.
The only if condition, viz, ``if \(e\) is not a literal implies \(e\) is reducible'' is called the progress condition. It's possible in other languages, and we will see several later, that the progress condition may be violated.
9.2 Theorem (Termination)
9.2.1 Statement
For every \(e\), there is no infinite chain of reductions \[e\reducesto e_1 \reducesto e_2 \ldots\]
9.2.2 Proof
The proof is by structural induction on \(e\). Note that the size of the expression reduces in each reduction. If \(e\reducesto e'\), then \(|e'| < |e|\).
9.3 Theorem (Normalizing)
9.3.1 Statement
\(\reducesto\) is normalizing. Every expression \(e\) has a normal form.
9.3.2 Proof
Follows from Termination.
9.4 Theorem (Local Confluence)
9.4.1 Statement
If an expression reduces to two different expressions, then those two expressions simplify to a common third expression. If \(e\reducesto e_1\) and \(e\reducesto e_2\), there there exists \(e_3\) such that \(e_1\simplifiesto e_3\) and \(e_2 \simplifiesto e_3\).
9.4.2 Proof
The proof is based on well-founded induction on the \(\reducesto\) relation over expressions. Notice that our induction scheme is based on the inductive structure of the derivations of the reduction relation \(\reducesto\), not the structure of expressions (syntax).
Assume \(e\reducesto e_1\) and \(e\reducesto e_2\). Then we have the following cases:
- 1. \(e \reducesto e_1\) using REWRITE: Then \(e=\overline{n_1}+\overline{n_2}\). It follows that \(e_1\) and \(e_2\) are each \(\overline{n_1+n_2}\) and then \(e_1 = e_2 = e_3\), and hence \(e_1\simplifiesto e_3\) and \(e_2\simplifiesto e_3\).
- 2. \(e \reducesto e_1\) using LEFT: Then we have \(e = p+q\)
and \(e_1=p'+q\) such that \(p\reducesto p'\). Then there
are two possibilities for \(e\reducesto e_2\):
2.1. \(e\reducesto e_2\) using LEFT: We have \(e=p+q\) (from the parent assumption) and \(e_2=p''+ q\) such that \(p\reducesto p''\).
From the induction hypothesis, since \(p\reducesto p'\) and \(p\reducesto p''\), both \(p'\) and \(p''\) reduce to a common expression \(r\). Using LEFT*, \(p'+q\simplifiesto r+q\) and \(p''+q\simplifiesto r+q\). Thus both \(e_1\) and \(e_2\) simplify to a common expression.
- 2.2. \(e\reducesto e_2\) using RIGHT: We have \(e=p+q\) (from the parent assumption) and \(e_2=p+q'\) such that \(q\reducesto q'\). Then \(p'+q\) and \(p+q'\) simplify to the common expression \(p'+q'\) with the application of a LEFT and RIGHT rule, respectively.
The other cases are similar and left as an exercise.
9.5 Theorem (Confluence)
9.5.1 Statement
\(\reducesto\) is confluent.
9.5.2 Proof
Follows from termination and local confluence (see notes on Abstract Reduction systems).
9.6 Theorem (Convergence to a value)
9.6.1 Statement
Every expression simplifies to a unique literal.
9.6.2 Proof
Confluence and termination means convergence. Termination implies at least every expression simplifies to at least one normal form. Confluence implies at most one. From this it follows that every ADDITION expression reduces to a unique normal form. Since all normal forms in the ADDITION are literals, the result follows.
10 Valuation (Big step) Semantics
We define the value of an expression to be the number of the number literal to which the expression simplifies.
We write this as \(e\evaluatesto v\). The EOPL text also writes this as \(v = \id{valueof}(e)\). Notice that for ADDITION, it makes sense to say ``the'' value because the number literal to which an expression simplifies to is unique (from the above theorem).
10.1 Evaluation Rules
\[\evaluatesto \subseteq \id{EXP}\times \id{NUM}\]
\begin{align*} \frac{}{\overline{n}\evaluatesto n}\quad \scriptstyle{NUM} \\ \\ \\ \frac{{e_1\evaluatesto n_1}\qquad {e_2\evaluatesto n_2}}{e_1+e_2 \evaluatesto n_1+n_2}\quad \scriptstyle{PLUS} \\ \\ \\ \end{align*}11 Theorem (Consistency between small-step and valuation semantics)
11.0.1 Statement
\[e\simplifiesto \overline{n}\] iff \[e\evaluatesto n\]
11.0.2 Proof
Left as an exercise.
12 Implementations (Reduction Strategies)
Notice the non-determinism inherent in the reduction relation \(\reducesto\). Either the left or the right side of a compound expression may be reduced first. The deduction system doesn't specify which one. Implementations sometimes stipulate a specific order for the evaluation of the subexpressions. This may be necessary in the presence of side effects, for example.
The strategy we employ reduces the left subexpression first to a value. Only then does it reduce the right subexpression. (using the same strategy). We call this reduction strategy left-order.
12.1 Implementation via left-order reduction
\[\xto{}{L} \subseteq \id{EXP}\times \id{EXP}\]
\begin{align*} \frac{}{\overline{n}+\overline{n_2}\xto{}{L}\ \overline{n_1+n_2}}\quad \scriptstyle{REWRITE} \\ \\ \\ \frac{e_1\ \xto{}{L}\ {e'_1}}{e_1+e_2\ \xto{}{L}\ {e'_1}+e_2}\quad \scriptstyle{LEFT} \\ \\ \\ \frac{e_2\ \xto{}{L}\ {e'_2}}{\overline{n_1}+e_2\ \xto{}{L}\ \overline{n_1}+{e'_2}}\quad \scriptstyle{RIGHT} \\ \\ \\ \end{align*}13 Theorem (Relating reduction, evaluation and implementation)
13.0.1 Statement
\(e\simplifiesto \overline{n}\) \(\qquad\) iff \(\qquad\) \(e\xto{*}{L} \overline{n}\) \(\qquad\) iff \(\qquad\) \(e\evaluatesto n\).
13.0.2 Proof
Exercise.