CSE130 LECTURE NOTES

February 26th, 2001

MISCELLANEOUS





NAMES, IDENTIFIER

Names (of variables but also of functions, procedures, classes, and types) are more important than you may think (think of program development, debugging, maintenance):

The meaning of a name (identifier) has to be declared in most languages (variable declaration, function declaration, ...).

In some languages names have an implicit meaning (for convenience): e.g., in FORTRAN names starting with I to N are implicitly treated as integers (may be overridden), in Perl names starting with @ are considered arrays.

In compiled languages, names only exists at compile time (unless compilation includes "debugging options"); at runtime, only values and their memory addresses exist.
In interpreted languages names and values coexist.

In the following, we focus on variable names (aka identifiers denoting variables).

The meaning of a name is its binding. A binding is an association between a name (in the "symbol table") and a storage object (an area of memory).

A variable is an identifier together with the value it is bound to. A variable declaration introduces the name and (usually) the type of a variable. If memory is allocated (necessary e.g., when the variable is also initialized), we also speak of a definition.

In most (non-functional, non declarative) programming languages, variables are updatable. This means that the actual value of a variable stored at a memory address can be changed through assignment commands.

Definition: In a pure functional language, variables are not updatable. An imperative language does have updatable variables.

Since they do not have updatable variables, pure functional languages do not have commands with side effects. Haskell is a pure functional language.

In an imperative language like Pascal, a const declaration is the same thing as a value declaration (mytree0 = ...) in a functional language like Haskell.

BINDINGS AND ENVIRONMENTS

An environment is a mapping from names to values (e.g., a compiler may use a "symbol table" to keep track of names at compile time. However, the term "environment" is usually associated with runtime).

In other words, an environment is a set of bindings where each binding is an ordered pair (name, value). The simplest way to implement an environment is to use a list of pairs. This is adequate theoretically but for practical use a more efficient data structure must be used.

Environments are ubiquitous in PLs. To understand the semantics of a PL we need to understand how environments are treated at compile time and at runtime.

Exercise: how would you implement environments in Haskell? How in Java or C?

NESTED ENVIRONMENTS AND SCOPES

Let's start with three definitions, and then see how the definitions work in an example.

Definitions:

  1. A binding occurrence of an identifier is an appearance of the identifier where its binding gets established.
  2. An applied occurrence of an identifier is an appearance of the identifier where its binding gets used.
  3. The scope of (a binding occurrence of) an identifier is the region of the program where applied occurrences refer back to this binding occurrence
Here's the example (in some Pascal-style syntax):
begin 
    const pi; real pi := 3.14159; real x := 0.0; 
    procedure foobar(real z) is ...; 

    function tan(real x):real; 
    begin 
        real arg := x mod (2.0*pi); 
        return sin(arg)/cos(arg) 
    end; 

    ... foobar(x); ... 
end;

(1) "pi" is bound to a constant real value. "x" is bound twice, once to a real-valued variable, and once to a real-valued parameter.

(2) There are three environments involved here: the global environment which has bindings for "pi", "x", "foobar", and "tan", and two "extended environments" which additionally include bindings for (function/procedure) parameters (more precisely: parameter identifiers!).

(3) There are two related binding occurrences of "pi" and two unrelated binding occurrences of "x".

(4) There are two unrelated applied occurrences of "x".

(5) The scope of the binding occurrences of "pi" is the whole program. The scope of the first binding occurrence of "x" is the main program body. The scope of the second binding occurrence of "x" is the body of "tan".

NESTED BLOCK-STRUCTURED ("STATIC") SCOPING

All the important modern languages share the same basic scoping discipline.

Remember the definition: The "scope" of (a binding occurrence of) an identifier is the region of the program where applied occurrences refer back to this binding occurrence.

Under nested block-structured scoping, the essential rules are that:

  1. the scope of a binding is the entire block in which the binding happens,
  2. except a binding in an inner block hides a binding of the same identifier in an enclosing block.
Nested block-structured scoping says that free variables are understood to refer to a syntactically-enclosing binding. Thus it is usually called static scoping.

RECURSIVE DEFINITIONS

The scope of a binding extends to the end of the enclosing block. But usually it starts where the declaration appears, not at the start of the enclosing block.

Recursion among definitions can cause difficulties. Consider two types to keep track of the white and black squares on a chessboard:

type white = record 
                n1, n2, n3, n4: ^black; 
             end; 
type black = record 
                n1, n2, n3, n4: ^white; 
             end;
Most languages have ad hoc syntax for coping with mutual recursion. For example in Pascal, procedures can have "forward" bodies, and in the type "pointer to X" the type X does not have to be already defined.

ALPHA-CONVERSION, INVARIANCE UNDER RENAMING

An applied occurrence of an identifier in a block is called a "free variable" or "global variable" if there is no matching binding occurrence in the block.

Note the phrase "free variable" should really be "free identifier": a free variable may in fact be a constant or any other entity denoted by an identifier.

For example this function "scale" has one free variable, namely "s", and one non-free (bound) variable, namely the local variable "x":

const s = 10.0; 
function scale (x: real): real; 
begin 
    return x*s; 
end;
Alpha-conversion is the rule that the meaning of a block is unchanged if the name of a bound variable is changed everywhere inside the block. Clearly alpha-conversion is a valid rule under static scoping.

DYNAMIC SCOPING

Suppose the function scale is used inside another procedure:
procedure update (v: vector); 
begin 
    var s := norm(v); 
    for j := 1 to length(v) do scale(v[i]) 
end;
Static scoping says that the "s" local to "update" is irrelevant to the function "scale", so the vector v will be magnified tenfold, not normalized.

"Dynamic scoping" is a different scoping discipline. It says that an applied occurrence refers to the binding occurrence in the most recently entered scope.

Under dynamic scoping, the function "update" would normalize a vector v.

Dynamic scoping violates alpha-conversion: if the name "s" in the "update" procedure body was changed to "t", under dynamic scoping a different effect is obtained.

(DIS)ADVANTAGES OF DYNAMIC SCOPING

Definition: Dynamic scoping is the policy that an applied occurrence refers to the binding occurrence in the most recently entered scope.

Under static scoping, if all function values are created at compile-time, then identifiers can be discarded at run-time, because each use of a name can be compiled into a stack offset.

Dynamic scoping requires keeping knowledge of names at runtime. Names refer to values in the most recently entered scope, which can only be known at runtime.

Sometimes dynamic scoping is convenient. The behavior of a function can be changed without recompiling, just by calling it from inside an environment where global identifiers have different bindings.

However the same behavior can be obtained with static scoping and assignment:

var s = 10.0; 
function scale (x: real): real; 
begin 
    return x*s; 
end; 
procedure update (v: vector); 
begin 
    var l: integer := length(v); 
    s := norm(v); 
    for j := 1 to l do scale(v[i]) 
end;

Dynamic scoping makes programs harder to understand. The example above shows that assignment commands also make programs harder to understand.

The fundamental advantage of functional programming languages is that they are easier to understand, because they do not have the complexity of assignment commands.

If a program is easier to understand, it is also easier for a compiler to optimize.

LIFETIMES

Variables have scopes, and they also have lifetimes. The two concepts are distinct:

Every variable starts existing at a certain time, then stops existing at some time later. The general rule for determining a variable's lifetime is that it is the timespan of execution of the block in which the variable is declared.

For example the lifetime of s above is the execution of the whole program, while the local variable l has many lifetimes. Each lifetime of l is one execution of the body of update. Conceptually, update has new local variables at each call, which just happen to have the same name as previously existing variables.

Note that the executions of blocks are always strictly nested. They can't overlap, so lifetimes can't overlap also.

Where do variables live? There are several possibilities:

STATIC/OWN VARIABLES

A variable that keeps its value across different calls to a procedure (or function), but can only be updated inside the procedure, is global in lifetime, but local in scope.

A variable with this behaviour is called a static variable in C, or called an own variable in Algol 60.

One use for such a variable is to keep track of the seed used by a pseudo-random number generator:

function prng (): int; 
    static var seed : int = initialval; 
begin 
    seed := update(seed); 
    return transform(seed); 
end;

Here, we do not want the variable seed to have a new lifetime each time prng is called, but we do want this variable to be visible only inside prng.

PERSISTENT VARIABLES

Definition: A persistent variable is one that keeps its value after program execution, and that has an initial value before program execution.

The most common persistent variables are files.

Most languages have implementation-dependent restrictions on persistent variables. For example in Pascal a persistent variable must be of the type file of X for some component type X.

These limitations were to take account of implementation constraints, but they are excessive. For example Pascal has no random-access files.