CSE 130 Lecture Notes

CSE130 LECTURE NOTES

March 7th, 2001 (first lecture)

MISCELLANEOUS

lecture notes vs. paying attention in class (the former is no substitute for the later)

(... brief review ...)

TYPE BINDINGS

static type binding:
- type specified by a declaration statement
- explicit declaration: statement in program that lists variable names and specifies that are a particular type
- implicit declaration: means of associating variables with types through defaults instead of declaration statements.
  (Ex: in FORTRAN, the first letter of an identifier determines its type. 'I'.. 'N' then INTEGER, otherwise REAL)
- Recall: declarations specify type and other attributes but do NOT cause allocation of storage; definitions specify attributes and cause storage allocation (cf. lecture from Feb 26)
dynamic type binding:
- variable bound to type when it is assigned a value in an assignment statement
- advantage: programming flexibility;
  disadvantage: higher (runtime!) cost, type errors are caught later
type inference:
- used in functional languages (ML, Haskell)
- types of the expressions are determined without necessarily specifying all types of variables and subexpressions;
  instead the system infers and checks types at compile time (static type checking);
  for polymorphic types, the concrete type is determined at runtime! (cf. question on previous lecture notes...)

STORAGE BINDINGS

We speak of memory allocation if memory is taken from a pool of available memory. By deallocating the storage, it is put back into the pool of free memory.

LIFETIMES

... static lifetime ... stack-dynamic lifetime ... heap-dynamic lifetime ...

SCOPE

...

STATIC SCOPING

...

ALPHA-CONVERSION, INVARIANCE UNDER RENAMING

...

DYNAMIC SCOPING

...

STATIC VS DYNAMIC SCOPING

Static Scoping

Advantages:
- Readability
- Locality of reasoning
- Less runtime overhead
Disadvantages:
- Some loss of flexibility

Dynamic Scoping

Advantages
- Some extra convenience
Disadvantages:
- Loss of readability
- Unpredictable behavior (no locality of reasoning)
- More runtime overhead

SCOPING Rules for Java

(adapted from here.)

Local vs. instance variables

Java allows local variables and parameters to have the same name as instance variables. Local variables then hide instance variables. Instance variable can still be reached using the this reference:

public class Example {
    int i;              // Instance variable

    public void exampleMethod() {
        int i;          // Local variable

        i = 0;          // uses the local variable
        this.i = 10;    // uses the instance variable
    }
}

Here is another example:

public class Example {
    int value;

    public Example(int value) {
        this.value = value;    // assign the value of the local
                               // parameter to the instance variable
    }
}

Nested blocks

Java (unlike C++) does not allow variables in nested blocks to have the same name. For example...

int i = 0;
int x = 0;
while (x < 10) {
    int i;        // ERROR: i already exists
    .
    .
    x++;
}

However, Java variables (like C++) only exist in the block in which they are declared. For example...

int x = 0;
while (x < 10) {
    int i = 5;
    .
    .
    x++;
}
x = i;        // ERROR: i does not exist here

The same is true of for loops. Variables declared in the for statement cannot have the same name as already existing variables and they only exist for the lifetime of the loop.

int i = 5;
for (int i = 0; i < 10; i++) {    //  ERROR: i already exists
    .
    .
}
    
for (int i = 0; i < 10; i++) {
    .
    .
}
i = 5;        // ERROR: i does not exist here

And since the for index is considered to only exist inside the loop, the following is allowed in Java...

for (int i = 0; i < 10; i++) {
    .
    .
}

for (int i = 0; i < 10; i++) {    // OK: the previous i is gone
    .
    .
}

Similar for the if statement.

PERSISTENT VARIABLES

...

FUNCTIONS AND PROCEDURES AS ABSTRACTIONS

...

ACTUAL AND FORMAL PARAMETERS

...

SEMANTICS OF CALLING FUNCTIONS AND PROCEDURES

Informally, the semantics of calling a function or procedure is as follows:

evaluate the actual arguments
pass the actual arguments as values for the formal parameters
execute the block that is the function or procedure body
return any results back.

We need to clarify each of these four stages. The point to notice here is that expressions are never passed as arguments in mainstream languages. They are always evaluated first, and the result values are passed.
(this is also called strict evaluation; recall that some functional languages like Haskell have non-strict, aka "lazy", "on demand", or "call by need" evaluation)

Remember the discussion of evaluation order for expressions. An operation like + applied to two arguments, e.g. 2*x + 3*y is really a function-call plus(times(2,x),times(3,y))!

The most important rule of expression evaluation is that arguments are evaluated before the function applied to them is called. Most of the time, that is: if-then-else is special many languages! Languages differ in whether or not it is specified in what order multiple arguments are evaluated.

CALL BY VALUE

In the following procedure declaration f1 and f2 are formal parameter names and x is the name of a local variable.

procedure p(f1,f2:t); 
    var x: t; 
begin 
    ... 
end;

Given the call "p(e1,e2)" where e1 evaluates to the value a1 and e2 evaluates to the value a2, the simplest semantics of parameter-passing says that the call is equivalent to executing the block

begin 
    const f1 := a1; 
    const f2 := a2; 
    var x: t; 
    ... 
end;

This method of parameter-passing is called call-by-value. A variant allows f1 and f2 to be local variables as opposed to local constants.

CALL BY VALUE-RESULT

When an actual argument is an expression, it makes no sense to use it for returning a result value. However when an actual argument is a variable name, the final value of the formal parameter can be copied back into the actual argument variable. More formally, given

procedure p(inout f:t); 
begin 
    ... 
end;

the call "p(x)" where x is the name of a variable is equivalent to

begin 
    var f:t := a; // "a" denotes the current value of x 
    ... 
    x := f; 
end;

Call by value-result is used in Ada.

CALL BY REFERENCE

When an actual argument is a variable name, it makes sense to pass the location denoted by the name to the called procedure. Then the formal parameter name acts as a local name for the global location.

This is the semantics of Pascal "var" parameter-passing. We can explain it more formally as follows. Given

procedure p(var f:t); 
begin 
    ... 
end;

the call "p(x)" where x is the name of a variable is equivalent to

begin 
    const f: pointer to t := location(x); 
    ... 
end;

In some languages, for example C, call-by-reference is not available but in can be achieved by using call-by-value and passing the location of a variable instead of its value (since we can easily get the value of the memory location of a variable using the "&" operator... ).

DIFFERENCE CALL BY VALUE-RESULT and CALL BY REFERENCE

Although these parameter passing modes are similar, they may produce different results since the former works on local copies and only at the end copies back (call by value-result is also called copy-in/copy-out) the result, whereas the latter works immediately on the object passed by reference to the procedure.

Consider the following example:


==================================================================
program main ...

var i, j: integer;

procedure foo(inout x,y: integer)  (* compare "inout" vs. "var" *)
begin
	i := y
end;

begin (* main *)

i := 1; j := 2;

foo(i, j);

(* value of i?? *)
end.

==================================================================

Here is another example:


==================================================================
program main ...

var i: integer;

procedure bar(inout x,y: integer)   (* compare "inout" vs. "var" *)
begin
	x := x + 1;
	y := y * 2;
end;

begin (* main *)

i := 2; 

bar(i, i);

(* value of i,j?? *)
end.

==================================================================

CALL BY MACRO/CALL BY NAME

The idea in call-by-macro (aka call by name) is that the actual argument is textually inserted into the called block. This is differetn from the earlier statement, i.e., that a function evaluates all argument (expressions) first!

Consider the pseudo-Pascal summation function

function sum (lo,hi:int; macro k:int; macro f:real): real; 
    var s: real := 0; 
begin 
    for k := lo to hi do s := s + f; 
    return s; 
end;

With call-by-macro the actual argument expressions for k and f will be evaluated again each time the formal parameter is used, so we can write for example sum(1,n,j,a[j]*b[j]) to compute the dot-product of two vectors a and b of length n.

THE IMPORTANCE OF CALL-BY-MACRO

All the other parameter-passing methods are basically similar: they work by initializing the formal parameters as local names in an environment used for executing the called block. They are all variants of "call-by-environment." Most practical programming languages use call-by-environment, but call-by-macro is theoretically important because

it is a common informal way to reason about invocation: simulating mentally step-by-step what the function body does with the actual arguments in place of the formal parameters
it is used in the lambda calculus, an abstract mathematical programming language
it is formally simpler: it doesn't introduce these complex objects called "environments"
it does appear in practical languages under the guise of macros, and in pure functional languages

In Algol 60, call-by-macro is called call-by-name, but implemented using higher-order functions (i.e. using function arguments).

Consider the pseudo-Pascal summation function using call-by-macro from before. This can be implemented using call-by-value with function arguments:

function sum (lo,hi:int; ref k:int; f:void->real); 
    var s: real := 0; 
begin 
    for k := lo to hi do s := s + f(); 
    return s 
end; 
function f(): real; 
begin 
    return a[j]*b[j]; 
end;

The lesson to learn here is that higher-order functions give the functionality of call-by-macro.

OBJECT-ORIENTED PROGRAMMING

(cf. [Sebesta, Ch.11])

Let us look at object-oriented programming from the point of view of trying to understand it in terms of programming language concepts that we know already. We will compare OOP to abstract data types. However first we will look again at imperative versus functional programming:

Updatable variables and assignment commands are the essence of imperative programming. There are two basic reasons for using variables: (1) to remember intermediate results (2) to keep track of things changing in the outside world.

The second use of updatable variables is much more fundamental. If you think of a software module as simulating something in the real world, e.g. a database system simulates the population of UCSD students, then it is very natural to update variables as events happen in the real world, e.g. a student enrolls in a class.

In object-oriented programming, an object is a generalization of a variable as used to model something in the outside world that can change.

USING VARIABLES TO REMEMBER TEMPORARY RESULTS CAN BE AVOIDED

Consider this version of the factorial function:

function fact (n: integer): integer; 
    var temp: integer; 
begin 
    if n = 0 then temp := 1 
    else 
        begin 
            temp := n-1; 
            temp := fact(temp); 
            temp := n*temp 
        end; 
    return temp 
end;

Is using updatable variables to remember intermediate results really necessary? The answer is no, because using nested expressions, the programmer can let the compiler take care of computing intermediate results in the right order:

function fact (n: integer): integer; 
begin 
    return ifthenelse( n=0, 1, n*fact(n-1) ) 
end;

Temporary variables are inessential.

WHAT IS AN OBJECT?

Remember the second, most important, use for updatable variables: to keep track of things changing in the outside world.

If you think of a software module as simulating something in the real world, e.g. a database system simulates the population of UCSD students, then it is very natural to update variables as events happen in the real world, e.g. a student adds a class.

In object-oriented programming, an object is a generalization of a variable as used to model an entity in the outside world.

We can think of an object as a (complex) variable that has "implementation-independence". The idea of "representation-independence" is a special case of "implementation-independence", which is an idea that can apply to types, to variables, constants, functions, procedures, and more:

In modeling a real-world entity, what is important is that the model should behave like the entity being modelled, not the details of how the model is implemented.

An object contains data, called its state or private variables, and it has operations that can update its private variables called methods. Methods are sometimes called member functions.

In the broadest sense, a class is a set of similar objects. Often a type is associated with a class (or a class is considered to be a type itself).

We say that an object is an instance of a class. The operations of an object typically have side-effects, so they are not pure functions.

Objects are thought of as "active individuals" rather than passive values. Invoking an operation of an object is called sending a message to the object.

Therefore, instead of writing x := plus(x,y) we write send x "add y" for example. This calls the add method of the object named x. Every method has an unwritten (implicit) first argument, which is the object itself.

CLASSES COMPARED TO ABSTRACT DATA TYPES

The definition of a class looks similar to the implementation of an ADT, but no separate signature or specification is given:

class financial-history = 
begin 
    state 
	cash, receipts, expenses 
    method init() = 
	cash := 0; receipts := []; expenses := []; 
    method receive(amount) = 
	receipts := append(amount,receipts); 
	cash := cash + amount; 
    method spend(amount) = 
	expenses := append(amount, expenses); 
	cash := cash - amount; 
    method cash() = return cash; 
end;

Individual objects can be declared in the same way as variables:

var myaccount: financial-history;

A class introduces a collection of operations which are imperative. Methods modify objects which have an internal, updatable state.

In contrast, an ADT introduces a new type. Any operation on a value of the new type must take the value as an input parameter. With a pure ADT, there are no operations that modify values of the type. Instead, some operations can generate "fresh", new values of the type.

RESTRICTED IMPERATIVE UPDATING

A type is pure if variables having the type cannot be updated in any way, i.e. no assignment commands are available for this type. In functional programming, all types are pure.

Errors are very likely with unlimited direct updating of composite variables. Consider the following example:

const length = 5; 
type address = array [1..length] of string; 

procedure update (a: address); 
    var j: integer := 1; s: string; 
begin 
    while (j <= length) and not eof(input) do 
    begin 
        readln(s); 
        a[j] := s; 
        j := j+1 
    end 
end;

Q: what happens if the user gives an address with less than 5 lines?

Classes are typically not pure, because objects can be updated, i.e. modified. However programming with objects is still safer than programming with regular composite types, because updating can only be done through predefined methods.

INHERITANCE

From a programming language (as opposed to programming methodology) point of view, the major innovation in OOP is inheritance. Here is an example of what is called a derived class definition:

class tax-history parent financial-history = 
begin 
    state deductions; 
    method init() = 
	deductions := []; 
    method deductible-spend(amount) = 
        deductions := append(amount,deductions); 
        self.spend(amount);	// = send self spend(amount)
end;

Every object of type tax-history has three financial-history state variables, plus one new state variable. It can respond to all financial-history messages, and some additional messages.

DYNAMIC BINDING

With inheritance there is a problem concerning how methods are bound to method-names.

One method of an object can call another of the same object if necessary. The name self (or this)refers to the current object, which is the implicit first argument of each method.

Calls to a self-method should refer to the method with the given name in the child class. This allows child class methods to override parent methods.

For example, sending the message id to an object of class D with the following class definitions should give I'm a type D object:

class C = 
begin 
    method whoami = 
	return "I'm a type C object"; 
    method id = 
	print (self.whoami) 
end; 

class D parent C = 
begin 
    method whoami = 
	return "I'm a type D object"; 
end;

Under standard static scoping, the name whoami in the body of the method C.id would refer to the method C.whoami. This is not what we want.

Method names should follow dynamic binding: the method corresponding to a name should be found in the runtime environment, not in the compile-time environment.

SOME MORE OOP KEYWORDS

Object identity. Even if two objects have the same internal value, they are still different objects. Similarly, two variables are different even if temporarily they have the same value, because they have different locations, and hence in the future they can be updated differently.
Object identity is connected to an "object-centric" view of programming: every operation is about some primary object. Syntactically, every method has an unwritten (implicit) first argument, which is the object itself.
Encapsulation. The state of an object is private to the object: the only way the state can be changed or observed is by calling one of the object's methods. This is what we called "implementation-hiding" (or independence) before.
Overloading. Two objects of different classes can have a method with the same name. If these methods perform conceptually similar operations, the caller of the method does not need to know the exact class of the called object. Overloading is particularly important with inheritance.

OWN (STATIC) VARIABLES REVISITED

Another point of view on objects is that an object is a single procedure with some variables that are global and private, i.e. they keep their values between calls to the procedure but they can only be updated inside the procedure. The methods of the object are alternative instructions that the object can execute.

According to this view, the object itself is really a dispatcher: it waits for a message and then selects one of its methods for execution based on which message was received. A class is a function that can generate new objects. For example:

type messages = (next, restart); 
function newgenerator (initial:real) = 
begin 
    var seed: real := initial; 
    return 
        function (m: messages) = 
        begin 
            case m of 
                next: seed := transform(seed); 
                restart: seed := initial; 
            end; 
            return seed; 
        end; 
end; 

function myrandom = newgenerator(1.2345); 
function hisrandom = newgenerator(0.1428);

Here newgenerator is a higher-order function that returns another function, which is anonymous. Each returned function is a pseudo-random number generator (PRNG) that can respond to two alternative messages.

The updated PRNG seed is kept in the variable seed which is local to newgenerator so seed is fresh each time that newgenerator is called. However seed is global to the anonymous function so each version of seed keeps its value between calls to the corresponding individual PRNG.