CSE 130 Lecture Notes

CSE130 LECTURE NOTES

February 7th, 2001

MISCELLANEOUS

Use of Web board (do/don't), email (don't), office/lab hours (do) ...
use of "built-in" or "external" functions: well... ok, if you have to, but make your code and documentation self-contained! (and spend the time on understanding the problem instead of "localizing nearby solutions" ...)

SYNTAX IN GENERAL

Syntax as a field of study in (programming) linguistics is about the appearance and structure of expressions. (We'll use the word expression to include any self-contained fragment of a program, for example a subroutine definition.)

A grammar specifies (1) conceptual structure and (2) lexical detail. We are interested in understanding and representing (1), which is called abstract syntax because it leaves out the irrelevant details (2).

There can be many paths to meaning (semantics), and there can be many opinions on how expressions are structured, even if there is agreement on which expressions are legal.

There is less disagreement for programming languages than for natural languages, because PLs are designed, and they have unique meanings.

The structure of an expression is (almost always) a tree (e.g., an operator tree). The key structural relationship is what is a subexpression (i.e. child) of what.

Usually the number of children is fixed and their order is significant, but there are exceptions (e.g., some binary operators like "+" can be considered polyadic operators that work on lists of arguments).
Q: Can you think of other examples?

More Examples/Questions:

Is the order always important?
What is the structure of the arithmetic expression:
```
	x + y * z
```
How about this?
```
	x op1 y op2 z
```
What is the structure of the following programming construct?
```
	IF c1 THEN IF c2 THEN x ELSE y
```

And this:

	member x (y:ys)	= if x==y then True else member x ys

Parsing follows a grammar to check and discard the lexical detail of an expression, while discovering its abstract syntactic structure.

A Haskell DATATYPE IS AN ABSTRACT SYNTAX NOTATION

There are many ways to notate abstract syntax. The most important are tree diagrams, and fully parenthesized expressions.

Consider, for example, typical syntax for simple arithmetic expressions like (1*2)+(3+4). We can represent the set of all such expressions as a recursive type.

In Haskell we just use data constructors like Plus, Times, etc. for constructing the arithmetic expression, which then become instances of a type Expr a of arithmetic expressions over some type a (recall that Expr is a type constructor):

data Expr a = 
  	Constant a  
	| Plus (Expr a) (Expr a) 
        | Minus (Expr a) (Expr a) 
	| Times (Expr a) (Expr a) 
	deriving Show                   -- so we can show them easily

Remember that you can declare a pointer-based type, say in C, with exactly the same conceptual organization! With the data structure defined above, it is easy to write a recursive function that evaluates expressions:

eval                :: Num a => Expr a -> a   -- a should be a "Num" type
                                              -- ... for this to work  
eval (Constant x)   = x
eval (Plus x y)	    = (eval x) + (eval y)
eval (Minus x y)    = (eval x) - (eval y)
eval (Times x y)    = (eval x) * (eval y)

Notice how this function follows the structure of the Expr datatype exactly!

Example: let's give it a try ...

myexpr0 :: Expr Integer                -- over Integer values
myexpr0 = 
	Plus 
		(Times 
			(Constant 1)
			(Constant 2))
		(Plus
			(Constant 3)
			(Constant 4))

Now we can have the system show the value of myexpr0 and evaluate it using our function eval:

Main> myexpr0
Plus (Times (Constant 1) (Constant 2)) (Plus (Constant 3) (Constant 4)) :: Expr Integer
(223 reductions, 514 cells)

Let's have the system infer the type for this expression:

myexpr1 =
	Plus
		(Times 
			(Constant 1.5)
			(Constant 2))
		(Plus
			(Constant 3)
			(Constant 4))

Note that after the Haskell system has parsed the expression and done the type inference (i.e., at compile-time), the type of myexpr1 is known (you have probably already seen evidence that the system does type checking at compile-time: when you try to load a Haskell file and the system reports type errors!!)

Since the above loads/compiles without errors, we can ask the system for the type of myexpr1 or just have it "print" the value of myexpr1 (here, "hugs +t +s" was used):

Main> :info myexpr1
myexpr1 :: Expr Double

Main> myexpr1
Plus (Times (Constant 1.5) (Constant 2.0)) (Plus (Constant 3.0) (Constant 4.0)) :: Expr Double
(223 reductions, 522 cells)

Using our eval function above, we can evaluate myexpr0 and myexpr1:

Main> eval myexpr0
9 :: Integer
(21 reductions, 27 cells)

Main> eval myexpr1
10.0 :: Double
(21 reductions, 27 cells)

USES OF ABSTRACT SYNTAX

One use is to have a notation with details removed for use in compilation, interpretation, etc. Remember the "dragon book" on compiler design. The dragon is the "complexity of compiler design". The knight who vanquishes the dragon carries the emblem "syntax directed translation". This means first recovering abstract syntax, then doing code generation from the abstract syntax tree.
Another use is as a common "interlingua" for multiple concrete syntaxes.
Consider a web page creation program. The same functionality (i.e. semantics) can be controlled by many interfaces:
- a compiled language (e.g. TeX)
- an interpreted language (e.g. HTML)
- an interactive command line (e.g. vi)
- a graphical user interfaces (GUI) with
  - menu choices
  - direct manipulation
  - a subroutine library, i.e. an API (application programming interface)
Abstract syntax makes it possible to systematically consider all possible expressions in a given sublanguage, and therefore to prove that a property is true of all these expressions. This prinicple is called structural induction. For example one can use the recursive definition above to prove a property of all simple arithmetic expressions.

Structural Induction is a mathematical proof concept which is a generalization of the Principle of Mathematical Induction over the integers:

Assume that p(n) is a statement (property) over integer numbers n and assume further that we want to show that p(n) is true for all integers n. To prove this, we "only" need to show that
- (Induction base: n=0): p(0) is true
- (Induction step: n->n+1): if p(n) is true, then p(n+1) is also true
If both steps can be proven, then we have proven that indeed p(n) is true for all integers n.