CSE 130 Lecture Notes

CSE130 LECTURE NOTES

ADMINISTRATION

Today we're distributing the first individual assignment.

VALUES AND TYPES

These are two of the most basic concepts to understand in understanding programming languages. They are the source of some of the most important similarities and differences between languages.

A value is a piece of data that is involved in computing. A type is a collection of values on which the same operations make sense.

For example:

37 is an integer
-5.0E2 is a floating-point number
"New York" is a string
'N' is a character
true is a Boolean

What we call a type depends on the level of abstraction that we adopt. At a high level, e.g. in mathematics, "number" is a type. Common PLs are lower-level and distinguish integers from reals, because computer hardware provides very different representations and operations for integers and reals. In other words, this low-level feature is propagated to high-level programming languages.

Example of different operations on integers and reals:

integers have div and also mod
reals have /

At a very low level, all values are strings of bits, i.e. the types available are nibble, bytes, words, longwords, etc. PLs provide abstractions on top of this level.

PRIMITIVE VALUES AND THEIR TYPES

A value is one that (conceptually) is not made up from other values. The opposite is a composite value. For example, characters are primitive values, whereas strings are composite in most languages. But in Cobol, strings (of fixed length) are considered primitive, and in Snobol, strings of arbitrary length.

A built-in type (or operation) is one that is supplied by the language. The opposite of built-in is programmer-defined. Typically, integers, reals, and characters are built-in, and maybe booleans and strings. For types that are not built-in, a PL often has a standard way to make them programmer-defined.

Different languages often use different names for the same primitive type. This is just a "syntactic" difference.

Pascal: boolean integer real char
C: int float

Syntactic differences can show up in the names of built-in operations on primitive types, e.g. mod versus rem.

Sometimes there are minor semantic differences, e.g. maybe -7 mod 3 = 2 but -7 rem 3 = -1.

An implementational issue is a semantic question that is left unspecified by the definition of a PL. Implementation issues are choices that different compilers or interpreters can make differently. The most troublesome PL differences are often implementational. For example, are floating point numbers 32 bit or 64 bit?

Implementational differences can be very troublesome because they are not well-documented (like syntactic differences) and they do not have work-arounds (like documented semantic differences).

Ideally, nothing in the specification of a language should be implementation-dependent.

ENUMERATION TYPES

Are there any primitive types that are not built-in? Yes, user-defined primitive types. These are called enumeration types.

For example in Pascal you can write type direction = {n, e, s, w}

Note that n, e, s, and w are not characters. They are identifiers (i.e. names) which introduce brand-new values that previously did not exist in the language.

Remember, a type is a set of values on which the same operations make sense. Which operations are provided for values of type direction? Pascal provides the built-in functions succ, pred, and ord. For example:

        succ(s) = w
        pred(s) = e
         ord(s) = 2

Using enumeration types makes programs clearer, and helps to catch errors at compile-time. For example writing s*w by mistake gives a compiler error message.

In earlier versions of C, enumeration types are not available. Instead, it's good style to write

        #define n 0
        #define e 1
        #define s 2
        #define w 3

However this doesn't allow catching errors at compile-time. Conclusion: it's good for a language to be designed so that the compiler can enforce type correctness rules.

Enumeration types don't substitute for the programmer's good sense. He or she must still choose to only use the values of the type in sensible ways, for example to enumerate directions in a sensible order such as clockwise. With an "abstract data type" (we'll study these later), the programmer defining the type can specify what the type-specific sensible operations are, and can disallow all other operations.

COMPOSITE TYPES

A composite (also called compound) value is one that is made up from other values. For example a record in Pascal is a composite value:

type point = record x, y: real end;

There are big differences between programming languages in how a user can define his own types of compound values. Many differences are only syntactic (Pascal records are called structures in C) but still, we need some concepts that let us understand the commonalities.

Our aim is to classify the wide variety of compound types. We will do this by looking at the mathematical ways of constructing sets of compound values. In ordinary mathematics we know about intersections, unions, and Cartesian products. For programming we need some slightly more complex concepts.

CARTESIAN PRODUCTS

A product type is one whose set of values is a mathematical Cartesian product, i.e. a set of tuples. A tuple is a vector of components, where the order of the components is significant, and the components may have the same or different types.

Mathematically, the Cartesian product of sets A and B is the set of all possible ordered pairs <x,y> where the first component x of a pair is from A and the second component y is from B. Formally, the Cartesian product is written A x B and defined to be the set { <x,y> : x in A, y in B }.

If A has n elements, i.e. its cardinality is n, and the cardinality of B is n, then the cardinality of A x B is mn. Formally, |A x B| = |A| |B|.

Records in Pascal or structures in C are Cartesian product types. In records and structures components are given names. The benefit of these names is mostly documentation, since the components could be referred to by their numerical position: first, second, third, etc.

In ML product types are defined without giving names to components, in a more mathematical way. For example:

type person = string * string * int * real;

In Haskell the same product type is defined as follows:

type person = (String, String, Integer, Real)

NOTE. Be careful not to confuse this Haskell notation that corresponds to a Cartesian product with a similar notation in Pascal for an enumerated type!! For example, in Pascal, the type declaration

	type color = (red,green,blue);

means that a variable of type color has one of the three values. The enumerated type is declared in Haskell as follows:

	data Color = Red | Green | Blue

Note that a tuple is the same as a row in a relational database, where components are often called "fields" or "attributes".

MAPPINGS

This is the mathematical idea that corresponds to arrays. Consider the Pascal types

type direction = (n, e, s, w);
field = array [direction] of real;

What is the set of values of the type "field"? Given a value of type direction, the array can return a value of type real. The array is a mapping from type "direction" to type "real".

Mathematically, a mapping from set A to set B is a set of ordered pairs where

(1) the first item in each pair is from A and the second item is from B
(2) for each item x in A, there is a unique y in B such that the pair <x,y> is in the mapping.

The mathematical notation for an ordered pair is <x,y> using "angle brackets" or (x,y) using parentheses. The mathematical notation for the set of all mappings from one set to another is ->. For example, cosine is a member of R -> R where R is the set of real numbers.

The cardinality of A -> B is |B| to the power |A|. Intuitively, to specify a single mapping we must make |B| choices, |A| times.

Example. How many mappings are there from three colors to an intensity level say 0...255?

 type colormapping =     {red, green, blue} -> {0,1,...,255}

In other words, how many values can a variable of type colormapping assume? Red can be mapped to one of 256 values, and so can (independently) green and blue. Hence there are 256 * 256 * 256 mappings, each representing a value that a variable of type colormapping can have.

Mappings are also called functions. The set on the left of the arrow (the "from" set) is called the domain and the set on the right is called the range. Note that the domain of a mapping need not be a primitive type. If we think of an array as a mapping, then the domain type is called the index type and is often a tuple.

Be sure to understand the difference between a single mapping, such as cosine, and the set of all mappings of the same type, such as R -> R.

Based on lecture notes by Charles Elkan.