The meaning of a name (identifier) has to be declared in most languages (variable declaration, function declaration, ...).
In some languages names have an implicit meaning (for convenience): e.g., in FORTRAN names starting with I to N are implicitly treated as integers (may be overridden), in Perl names starting with @ are considered arrays.
In compiled languages, names only exists at compile time
(unless compilation includes "debugging options"); at runtime, only
values and their memory addresses exist.
In interpreted languages names and values coexist.
In the following, we focus on
The meaning of a name is its binding. A binding is an association between a name (in the "symbol table") and a storage object (an area of memory).
A variable is an identifier together with the value it is bound to. A variable declaration introduces the name and (usually) the type of a variable. If memory is allocated (necessary e.g., when the variable is also initialized), we also speak of a definition.
In most (non-functional, non declarative) programming languages, variables are updatable. This means that the actual value of a variable stored at a memory address can be changed through assignment commands.
Definition: In a pure functional language, variables are not updatable. An imperative language does have updatable variables.
Since they do not have updatable variables, pure functional languages do not have commands with side effects. Haskell is a pure functional language.
In an imperative language like Pascal, a const declaration is the same thing as a value declaration (mytree0 = ...) in a functional language like Haskell.
In other words, an environment is a set of bindings where each binding is an ordered pair (name, value). The simplest way to implement an environment is to use a list of pairs. This is adequate theoretically but for practical use a more efficient data structure must be used.
Environments are ubiquitous in PLs. To understand the semantics of a PL we need to understand how environments are treated at compile time and at runtime.
Exercise: how would you implement environments in Haskell? How in Java or C?
Definitions:
begin const pi; real pi := 3.14159; real x := 0.0; procedure foobar(real z) is ...; function tan(real x):real; begin real arg := x mod (2.0*pi); return sin(arg)/cos(arg) end; ... foobar(x); ... end;
(1) "pi" is bound to a constant real value. "x" is bound twice, once to a real-valued variable, and once to a real-valued parameter.
(2) There are three environments involved here: the global environment which has bindings for "pi", "x", "foobar", and "tan", and two "extended environments" which additionally include bindings for (function/procedure) parameters (more precisely: parameter identifiers!).
(3) There are two related binding occurrences of "pi" and two unrelated binding occurrences of "x".
(4) There are two unrelated applied occurrences of "x".
(5) The scope of the binding occurrences of "pi" is the whole program. The scope of the first binding occurrence of "x" is the main program body. The scope of the second binding occurrence of "x" is the body of "tan".
Remember the definition: The "scope" of (a binding occurrence of) an identifier is the region of the program where applied occurrences refer back to this binding occurrence.
Under nested block-structured scoping, the essential rules are that:
Recursion among definitions can cause difficulties. Consider two types to keep track of the white and black squares on a chessboard:
type white = record n1, n2, n3, n4: ^black; end; type black = record n1, n2, n3, n4: ^white; end;Most languages have ad hoc syntax for coping with mutual recursion. For example in Pascal, procedures can have "forward" bodies, and in the type "pointer to X" the type X does not have to be already defined.
Note the phrase "free variable" should really be "free identifier": a free variable may in fact be a constant or any other entity denoted by an identifier.
For example this function "scale" has one free variable, namely "s", and one non-free (bound) variable, namely the local variable "x":
const s = 10.0; function scale (x: real): real; begin return x*s; end;Alpha-conversion is the rule that the meaning of a block is unchanged if the name of a bound variable is changed everywhere inside the block. Clearly alpha-conversion is a valid rule under static scoping.
procedure update (v: vector); begin var s := norm(v); for j := 1 to length(v) do scale(v[i]) end;Static scoping says that the "s" local to "update" is irrelevant to the function "scale", so the vector v will be magnified tenfold, not normalized.
"Dynamic scoping" is a different scoping discipline. It says that an applied occurrence refers to the binding occurrence in the most recently entered scope.
Under dynamic scoping, the function "update" would normalize a vector v.
Dynamic scoping violates alpha-conversion: if the name "s" in the "update" procedure body was changed to "t", under dynamic scoping a different effect is obtained.
Under static scoping, if all function values are created at compile-time, then identifiers can be discarded at run-time, because each use of a name can be compiled into a stack offset.
Dynamic scoping requires keeping knowledge of names at runtime. Names refer to values in the most recently entered scope, which can only be known at runtime.
Sometimes dynamic scoping is convenient. The behavior of a function can be changed without recompiling, just by calling it from inside an environment where global identifiers have different bindings.
However the same behavior can be obtained with static scoping and assignment:
var s = 10.0; function scale (x: real): real; begin return x*s; end; procedure update (v: vector); begin var l: integer := length(v); s := norm(v); for j := 1 to l do scale(v[i]) end;
Dynamic scoping makes programs harder to understand. The example above shows that assignment commands also make programs harder to understand.
The fundamental advantage of functional programming languages is that they are easier to understand, because they do not have the complexity of assignment commands.
If a program is easier to understand, it is also easier for a compiler to optimize.
Every variable starts existing at a certain time, then stops existing at some time later. The general rule for determining a variable's lifetime is that it is the timespan of execution of the block in which the variable is declared.
For example the lifetime of s above is the execution of the whole program, while the local variable l has many lifetimes. Each lifetime of l is one execution of the body of update. Conceptually, update has new local variables at each call, which just happen to have the same name as previously existing variables.
Note that the executions of blocks are always strictly nested. They can't overlap, so lifetimes can't overlap also.
Where do variables live? There are several possibilities:
A variable with this behaviour is called a static variable in C, or called an own variable in Algol 60.
One use for such a variable is to keep track of the seed used by a pseudo-random number generator:
function prng (): int; static var seed : int = initialval; begin seed := update(seed); return transform(seed); end;
Here, we do not want the variable seed to have a new lifetime each time prng is called, but we do want this variable to be visible only inside prng.
The most common persistent variables are files.
Most languages have implementation-dependent restrictions on persistent variables. For example in Pascal a persistent variable must be of the type file of X for some component type X.
These limitations were to take account of implementation constraints, but they are excessive. For example Pascal has no random-access files.