CSE 130 Lecture Notes

CSE130 LECTURE NOTES

February 5th, 2001

MISCELLANEOUS

you can subscribe to get summaries from the Web board by email, e.g., a daily summary (you just missed out on some good ones... ;-)
apart from the (very good) online material available on Haskell, I recommend Introduction to Functional Programming using Haskell by Richard Bird to those who want to learn more about the art of functional programming! (not to be confused with hacking (Emacs) Lisp...)

CONCRETE vs. ABSTRACT DATA TYPES

The basic difference between abstract data types (ADTs) and concrete data types is that the latter allow us to look at the concrete representation, whereas the former hide the representation from us.

Before we look closer at ADTs, let us revisit type declarations in Haskell:

-- Enumeration type:
data Color		= Red | Green | Blue	

-- Union ("sum") type:
data MyAddress		= ShortAddr City | LongAddr Street City

type City	= String
type Street	= String

-- Record/Tuple ("product") type (polymorphic): 
data MyTriple a		= Triple a a [a]	

-- Algebraic ("sum of products") type:
data BinTree		= EmptyBT | LeafBT Int | NodeBT BinTree BinTree 

-- Polymorphic Algebraic type:
data XTree a		= LeafXT a | NodeXT [ XTree a]

Some observations:

Case matters (in case you haven't found out yet ;-)
The data declarations define new (i.e., user-defined) data types Color, myTriple a, BinTree, and XTree a
myTriple a and XTree a are polymorphic types, since the system can substitute any type for the generic type variable "a"
type constructors like Color, MyTriple, BinTree, ... are used to define new types (recall that a type stands for a set of values)
data constructor like Red, Triple, EmptyBT, NodeBT, .. are used to define concrete values

For example, the following expressions define a concrete trees:

myBinTree0 =
	NodeBT 
		(NodeBT
			(LeafBT 2)
			(LeafBT 3))
		(LeafBT 4)

myXTree0 =
	NodeXT [
		NodeXT [LeafXT 1],
		LeafXT 2]

How can we have the system "print" values for these data types? For example, we can explicitly tell the system to show us the values using the (predefined) function show :: Show a => a -> String:

Main> show myBinTree0
"NodeBT (NodeBT (LeafBT 2) (LeafBT 3)) (LeafBT 4)" :: [Char]

Main> show myXTree0
"NodeXT [NodeXT [LeafXT 1],LeafXT 2]" :: [Char]

Yet another way is to define your own show function, or to declare:

data XTree a		= LeafXT a | NodeXT [ XTree a]
			deriving Show

which then will allow us to just enter an XTree expression directly (the "deriving Show" declaration makes XTree a an instance of the type class Show):

Main> myXTree0
NodeXT [NodeXT [LeafXT 1],LeafXT 2] :: XTree Integer

FROM CONCRETE TO ABSTRACT DATATYPES (ADTs)

Types stand for sets of values. Operations (functions) take values over given types and produce values over the same or other types.

The above data types, e.g., myTriple a or myBinTree a are not ADTs, because we can see (and access!) their internal representation! For example, we may just enter:

Main> Triple 1 2 [3,4]
Triple 1 2 [3,4] :: MyTriple Integer
(71 reductions, 134 cells)

(here we assumed a deriving Show declaration for MyTriple a was given)

Note how we used the data constructor Triple directly to define a concrete value. The system evaluates the expression by reducing it to its canonical (i.e., normal) form, which in this case is the expression itself, and then shows its value.

Data Abstraction, Information Hiding, Encapsulation.
The notion of data abstraction (see also Chapter 10, [Sebesta,4th edition]) means that we want to abstract from the concrete representation of a type. Instead, we hide the details of the concrete data representation, together with the implementation of the operations and encapsulate them in a module (e.g., Haskell, Pascal) or class (e.g. Java).

In this way, an abstract data type can only be accessed and manipulated via the operations that are declared for this ADT; these operations act as an interface to the ADT.

More formally: An ADT has two parts: a specification part and an implementation part.

the specification part comprises a signature (for accessing/manipulating the ADT), and constraints (logical, algebraic):
- the signature states the types of the basic operations available on values of the new type
- the constraints state properties that these operations must satisfy
the implementation part comprises the concrete data types and a signature implementation:
- the concrete data types give the actual representation (encoding) of values
- the signature implementation contains the actual code for the operations

PUBLIC vs. PRIVATE

The specification part of an ADT is public: it is what a user of the ADT needs to know.

The implementation part of an ADT is private: only the implementer needs to know the details of exactly how values are represented, and how the basic operations on values are accomplished.

Note: the notions of user-defined/system-defined vs. abstract/concrete data types are orthogonal (independent).

Q: What kind of data type is Float?

AN EXAMPLE ADT: STACK

-- SPECIFICATION of STACK (signatures)

create  :: Stack a			-- create an empty stack 
push    :: a -> Stack a -> Stack a	-- push an element on the stack
pop     :: Stack a -> Stack a		-- pop an element off the stack
top     :: Stack a -> a			-- what's on top? 
empty   :: Stack a -> Bool		-- is the stack empty?

What are the constraints that apply to our stack?
The crucial constraints that capture the semantics of the stack operations are these:

top ( push x s) = x
pop ( push x s) = s

top (create) ==>  error
pop (create) ==>  error

Note that we have specified a stack without giving its concrete implementation. In this way, if we access an ADT only through its "interface" operations push, pop, ..., the code becomes independent of the chosen implementation. Advantages of this approach include:

better modularization
modules can be compiled (and optimized) separately

Now that we have specified what a stack is, let's look at a concrete implementation:

-- IMPLEMENTATION: CONCRETE DATA TYPE
data Stack a      = EmptyStack | Push a (Stack a)	
                  deriving (Eq, Show)	-- for displaying stacks



-- IMPLEMENTATION: "code" 
create			= EmptyStack

-- note the difference between the ADT operation 
-- ... "push" (a Haskell function) and the 
-- ... data constructor "Push" : 
push x s                = Push x s

pop (Push _ s)          = s
pop EmptyStack          = error "*** ERROR: pop from empty stack!"

top (Push x _)          = x
top EmptyStack          = error "*** ERROR: top of empty stack!"


empty EmptyStack        = True
empty (Push _ _)        = False

Q: Can you give another implementation of Stack a? How about using lists?

PURE VERSUS UPDATABLE ADTS

A pure ADT is one where all operations are pure functions. This means that operations have no side-effects. In particular, they do not modify or update their input arguments. They just use these arguments to generate outputs, which are fresh values of the ADT (or of other types).

Most concrete types are pure. For example, no operation on integers actually modifies an integer. Instead, all the operations like "+" produce fresh outputs.

An updatable ADT is one where some operations actually change values of the ADT. For example, suppose we had an operation called "pop" that took a stack as argument and modified it ("in place", "destructively") by removing the highest-priority item. This operation would be impure and the whole ADT would then be impure also.

PACKAGES

Abstract data types are generalized in modern languages (e.g. Ada) to so-called "packages".

A package has a public signature and a private implementation, like an ADT. The difference is that a package can simultaneously provide many types, many specific values, and many operations. This increases flexibility and allows mutually dependent types but loses the analogy with concrete data types (recall that an ADT is really a (albeit abstract) data type while a package can be a collection of ADTs together with other non-ADT "features".)

Packages preserve the crucial ADT idea of information-hiding. Details that a user does not need to know are not visible to him.

Packages are best-known in Ada (but also provided e.g. in ML). Haskell provides a mechanism call module for defining ADTs. One important use for packages is as units of separate compilation. All the types in a package typically provide related functionality and are implemented by the same person or team, so it is reasonable for the classes in a package to have access to each other's implementation, i.e. to all the state variables and methods of each other.