CSE 130 Lecture Notes

CSE130 LECTURE NOTES

February 28th, 2001

MISCELLANEOUS

Class schedule update:
- Mon, Mar 5: 4:40-6pm discussion section (swapped with Feb. 28 lecture)
- Wed, Mar 7: 2:30-3:20pm additional (short) lecture
- Mon, Mar 12: ...
- Wed, Mar 14: ...
- Fri, Mar 23, 3-6pm: FINAL, SOLIS 107

NAMES (User-defined Identifiers)

Names in PLs for variables (but also functions, procedures, classes, and types) are more important than you may think (think of program development, debugging, maintenance).
Some design issues:

are "connector characters" ("_") allowed (Pascal, FORTRAN: no, others: yes)?
are names case sensitive?
are there conventions ("x" vs "xs" in Haskell, "X" vs "Xs") or implicit declarations ("I".."N" in FORTRAN, "@myarray" in Perl)

=> implications for readability

In order to capture the meaning of a name (identifier), declarations are used in most languages (variable declaration, function declaration, ...).

In compiled languages, names often exist only at compile time (unless compilation includes "debugging options"); at runtime, only values and their memory addresses exist.
In interpreted languages names and values coexist.

In the sequel we focus on variable names.

Properties that come with a (variable) name and its declaration and use:

the name itself
the location (beginning/end address) where the value will be stored
the type
the actual value/representation
the scope
the lifetime

BINDING

A binding is an association, e.g., between a (variable) name and the storage object (memory area) where the value of the variable is stored.

The binding time says when the binding happens:

when the language is designed ("+", ..) => "language design issue"
when it is implemented (e.g., concrete representation for floats) => "implementation issue"
when a program is compiled (e.g., binding of variable names to types in C, Haskell)
when a program is loaded (e.g., a static variable to a memory location in C)
when a program is executed (e.g., dynamic variables to memory locations)

Def. A static binding is one that happens before runtime and remains unchanged until program terminantion. A dynamic binding is one that occurs during runtime or may be changed during runtime.

Q: Think about how binding time may effect efficiency, safety, and flexibility of programs.

TYPE BINDINGS

Q: When are types bound to variable names:

in Haskell (monomorphic types, polymorphic types)?
in Java?

STORAGE BINDINGS

We speak of memory allocation if memory is taken from a pool of available memory. By deallocating the storage, it is put back into the pool of free memory.

LIFETIMES

Def.: The lifetime of a variable is the time during which its name is bound to a storage object (memory).

We speak of static lifetime if the variable name is bound to memory cells before program execution and if this binding (not the value!) remains unchanged throughout program execution.
Example: static variables in C, C++

One use for such a variable is to keep track of the seed used by a pseudo-random number generator:

function prng (): int; 
    static var seed : int = initialval; 
begin 
    seed := update(seed); 
    return transform(seed); 
end;

Here, we do not want the variable seed to have a new lifetime each time prng is called, but we do want this variable to be visible only inside prng.

Characteristics:

efficiency (no dynamic alloc/dealloc, direct addressed instead of offset-based addresses on the stack)
"stateful" procedures/functions (the "clean" version of this => objects in OOP have state; but allow access only via methods)
no support for recursion

We speak of stack-dynamic lifetime when storage bindings are created when a procedure or function is called ("the variable declaration is elaborated").
Example: local variables in C, C++, Pascal

Characteristics:

works for recursion
storage is conserved (automatically deallocated when the current block terminates or "returns"), but overhead of allocation/deallocation
"stateless" procedures/functions
indirect (stackpointer+offset) addresses

Heap-dynamic lifetime refers to the situation that a storage binding (allocation and deallocation) happens via explicit statements/commands at runtime
Examples: dynamic datastructures via malloc in C, Java objects

Characteristics

dynamic storage management
often explicit control of lifetime
lifetime not tied to the lifetime of the enclosing block
all values are addressed indirectly (via references or pointers)
garbage collection may be automatic (Java) or the responsibility of the programmer (C)

SCOPE

Recall the definition:

A binding occurrence of an identifier is an appearance of the identifier where its binding gets established.
An applied occurrence of an identifier is an appearance of the identifier where its binding is used.
The scope of an identifier is the region of the program where applied occurrences refer back to this binding occurrence, i.e., the lexical range in which that identifier is visible.
The free (or non-local) variables of a block are those that are visible but not declared in the block

NOTE: scope and lifetime can be closely related (e.g., for local variables of a Pascal procedure or a C function), but are different concepts!!

For example, scope corresponds to "visibility", is textual (and applies to all identifiers). In contrast, lifetime is temporal (and applies only to variables) Example for different variable occurences:

begin 
    const pi; real pi := 3.14159; real x := 0.0; 
    procedure foobar(real z) is ...; 

    function tan(real x):real; 
    begin 
        real arg := x mod (2.0*pi); 
        return sin(arg)/cos(arg) 
    end; 

    ... foobar(x); ... 
end;

(1) "pi" is bound to a constant real value. "x" is bound twice, once to a real-valued variable, and once to a real-valued parameter.

(2) There are three environments involved here: the global environment which has bindings for "pi", "x", "foobar", and "tan", and two "extended environments" which additionally include bindings for (function/procedure) parameters (more precisely: parameter identifiers!).

(3) There are two related binding occurrences of "pi" and two unrelated binding occurrences of "x".

(4) There are two unrelated applied occurrences of "x".

(5) The scope of the binding occurrences of "pi" is the whole program. The scope of the first binding occurrence of "x" is the main program body. The scope of the second binding occurrence of "x" is the body of "tan".

STATIC SCOPING

The scoping rules of a PL determine how occurrences of variable names are associated with variables.

Under static (lexical) scoping the binding occurrence can be found soley by looking at the program text (hence at compile time):

find the innermost declaration that encloses the applied occurrence
the chain of enclosing blocks constitute the static ancestors

Example:

proc P0

var x: integer

proc P1(x: integer)

begin

x := x+1;

P2;

end

proc P2

begin

x := x * 2

end

begin

x := 1

P1(x);

print x;

end

Variable declarations of enclosed blocks hide variable declarations of the same name in the enclosing block.
(Some languages including C++ and Ada allow access to these "hidden variables" using fully qualified names)

ALPHA-CONVERSION, INVARIANCE UNDER RENAMING

Alpha-conversion is the rule that the meaning of a block is unchanged if the name of a bound variable (recall: binding occurrences include local variable declarations and formal parameters!) is changed everywhere inside the block. Clearly alpha-conversion is a valid rule under static scoping.

DYNAMIC SCOPING

Under dynamic scoping, the (runtime) calling sequence of procedures/functions/blocks governs visibility, not the textual layout.

Hence, instead of looking at the first static ancestor of a block (at compile time), now the chain of procedure calls has to be considered at runtime to find the first matching variable name.

Example: compare the output of static and dynamic scoping for the code above.

Another example: suppose the function scale is used inside another procedure:

procedure update (v: vector); 
begin 
    var s := norm(v); 
    for j := 1 to length(v) do scale(v[i]) 
end;

Static scoping says that the "s" local to "update" is irrelevant to the function "scale", so the vector v will be magnified tenfold, not normalized.

"Dynamic scoping" is a different scoping discipline. It says that an applied occurrence refers to the binding occurrence in the most recently entered scope.

Under dynamic scoping, the function "update" would normalize a vector v.

Dynamic scoping violates alpha-conversion: if the name "s" in the "update" procedure body was changed to "t", under dynamic scoping a different effect is obtained.

Under static scoping, if all function values are created at compile-time, then identifiers can be discarded at run-time, because each use of a name can be compiled into a stack offset.

Dynamic scoping requires keeping knowledge of names at runtime. Names refer to values in the most recently entered scope, which can only be known at runtime.

Sometimes dynamic scoping is convenient. The behavior of a function can be changed without recompiling, just by calling it from inside an environment where global identifiers have different bindings.

However the same behavior can be obtained with static scoping and assignment:

var s = 10.0; 
function scale (x: real): real; 
begin 
    return x*s; 
end; 
procedure update (v: vector); 
begin 
    var l: integer := length(v); 
    s := norm(v); 
    for j := 1 to l do scale(v[j]) 
end;

Dynamic scoping makes programs harder to understand. The example above shows that assignment commands also make programs harder to understand.

The fundamental advantage of functional programming languages is that they are easier to understand, because they do not have the complexity of assignment commands.

If a program is easier to understand, it is also easier for a compiler to optimize.

STATIC VS DYNAMIC SCOPING

Static Scoping

Advantages:
- Readability
- Locality of reasoning
- Less run-time overhead
Disadvantages:
- Some loss of flexibility

Dynamic Scoping

Advantages
- Some extra convenience
Disadvantages:
- Loss of readability
- Unpredictable behavior (no locality of reasoning)
- More run-time overhead

PERSISTENT VARIABLES

Definition: A persistent variable is one that keeps its value after program execution, and that has an initial value before program execution.

Files are persistent variables.

Most languages have implementation-dependent restrictions on persistent variables. For example in Pascal a persistent variable must be of the type file of X for some component type X.

These limitations were to take account of implementation constraints, but they are excessive. For example Pascal has no random-access files.

FUNCTIONS AND PROCEDURES AS ABSTRACTIONS

Functions are abstractions of expressions:

A pure function is like an expression. When it is called, it yields a result value.

More precisely, a function is an abstraction of an expression. All the user of the function needs to know is what value will be returned. Exactly how this value is computed is a detail that is private to the function.

Consider for example two alternative cosine functions:

function cosine (x: real): real; 
begin 
    return sine(x + pi/2.0); 
end; 
function cosine (x: real): real; 
begin 
    return integrate(sine,0,x,epsilon); 
end;

Each function "encapsulates" a very different expression, but for the same argument, both always yield the same result.

Procedures are abstractions also:

A procedure is an abstraction of a command. The user of a procedure only needs to know how it updates its parameters and global variables. It doesn't matter what the procedure does with local variables.

Just like for functions, procedures with different bodies and different algorithms can have exactly the same effect, i.e. they are the same from the user's abstract point of view.

ACTUAL AND FORMAL PARAMETERS

In all programming languages, functions and procedures can have one or more arguments. First we need some terminology:

"Parameter" and "argument" are synonymous
When a function or procedure is called, the argument variables or values given to it are called actual arguments
The parameters specified in the function or procedure signature are called formal parameters.

The binding occurrence of a formal parameter name is in the signature. Occurrences in the body of the function or procedure are applied.

Remember the rule of alpha-conversion: the meaning of a block is unchanged if a name that is bound locally in the block is changed everywhere inside the block.

In most languages alpha-conversion is a valid rule for formal parameters. This implies that the details which are hidden from the user of a function or procedure include the names of formal parameters.

So which actual arguments match which formals should not be specified by name. It is usually specified by position: the parameter type of a function is a tuple type.

In Ada arguments can be specified by name using the same notation as for record aggregates. For example a function call might be:

integrate(epsilon => 0.001, f => sine, a => 0.0, b => pi);

This allows actual arguments to be given out of order, and default values can be used for unspecified arguments: e.g. epsilon above could have been left out if the default value was satisfactory. Common Lisp also has named parameters.

SEMANTICS OF CALLING FUNCTIONS AND PROCEDURES

Informally, the semantics of calling a function or procedure is as follows:

evaluate the actual arguments
pass the actual arguments as values for the formal parameters
execute the block that is the function or procedure body
return any results back.

We need to clarify each of these four stages. The point to notice here is that expressions are never passed as arguments in mainstream languages. They are always evaluated first, and the result values are passed.
(this is also called strict evaluation; recall that some functional languages like Haskell have non-strict, aka "lazy", "on demand", or "call by need" evaluation)

Remember the discussion of evaluation order for expressions. An operation like + applied to two arguments, e.g. 2*x + 3*y is really a function-call plus(times(2,x),times(3,y))!

The most important rule of expression evaluation is that arguments are evaluated before the function applied to them is called. Most of the time, that is: if-then-else is special many languages! Languages differ in whether or not it is specified in what order multiple arguments are evaluated.

CALL BY VALUE

In the following procedure declaration f1 and f2 are formal parameter names and x is the name of a local variable.

procedure p(f1,f2:t); 
    var x: t; 
begin 
    ... 
end;

Given the call "p(e1,e2)" where e1 evaluates to the value a1 and e2 evaluates to the value a2, the simplest semantics of parameter-passing says that the call is equivalent to executing the block

begin 
    const f1 := a1; 
    const f2 := a2; 
    var x: t; 
    ... 
end;

This method of parameter-passing is called call-by-value. A variant allows f1 and f2 to be local variables as opposed to local constants.

CALL BY VALUE-RESULT

When an actual argument is an expression, it makes no sense to use it for returning a result value. However when an actual argument is a variable name, the final value of the formal parameter can be copied back into the actual argument variable. More formally, given

procedure p(inout f:t); 
begin 
    ... 
end;

the call "p(x)" where x is the name of a variable is equivalent to

begin 
    var f:t := a; // "a" denotes the current value of x 
    ... 
    x := f; 
end;

Call by value-result is used in Ada.

CALL BY REFERENCE

When an actual argument is a variable name, it makes sense to pass the location denoted by the name to the called procedure. Then the formal parameter name acts as a local name for the global location.

This is the semantics of Pascal "var" parameter-passing. We can explain it more formally as follows. Given

procedure p(var f:t); 
begin 
    ... 
end;

the call "p(x)" where x is the name of a variable is equivalent to

begin 
    const f: pointer to t := location(x); 
    ... 
end;

In some languages, for example C, call-by-reference is not available but in can be achieved by using call-by-value and passing the location of a variable instead of its value (since we can easily get the value of the memory location of a variable using the "&" operator... ).

CALL BY MACRO

The idea in call-by-macro is that the actual argument is textually inserted into the called block. This contradicts the statement earlier that the first step in the semantics of function-calling is to evaluate all the argument expressions!

Consider the pseudo-Pascal summation function

function sum (lo,hi:int; macro k:int; macro f:real): real; 
    var s: real := 0; 
begin 
    for k := lo to hi do s := s + f; 
    return s; 
end;

With call-by-macro the actual argument expressions for k and f will be evaluated again each time the formal parameter is used, so we can write for example sum(1,n,j,a[j]*b[j]) to compute the dot-product of two vectors a and b of length n.

THE IMPORTANCE OF CALL-BY-MACRO

All the other parameter-passing methods are basically similar: they work by initializing the formal parameters as local names in an environment used for executing the called block. They are all variants of "call-by-environment." Most practical programming languages use call-by-environment, but call-by-macro is theoretically important because

it is a common informal way to reason about invocation: simulating mentally step-by-step what the function body does with the actual arguments in place of the formal parameters
it is used in the lambda calculus, an abstract mathematical programming language
it is formally simpler: it doesn't introduce these complex objects called "environments"
it does appear in practical languages under the guise of macros, and in pure functional languages

In Algol 60, call-by-macro is called call-by-name, but implemented using higher-order functions (i.e. using function arguments).

Consider the pseudo-Pascal summation function using call-by-macro from before. This can be implemented using call-by-value with function arguments:

function sum (lo,hi:int; ref k:int; f:void->real); 
    var s: real := 0; 
begin 
    for k := lo to hi do s := s + f(); 
    return s 
end; 
function f(): real; 
begin 
    return a[j]*b[j]; 
end;

The lesson to learn here is that higher-order functions give the functionality of call-by-macro.