Data Structures and Algorithms
CSCI 311

Introduction to Data Types

Resources

For the first third of the course, we will focus on Part 1: Java syntax (Chapter 1) and program performance (Chapters 2-4).

I assume working knowledge of lists, arrays, matrices, stacks, and queues(Chapters 5-10). It is expected that you have knowledge on these basic data structures already (pre-requisite of CSCI 112 includes this material) thus this material is left for you to review on your own. However, you should look at these chapters to become familiar with issues you may not have seen. I suggest browsing through the chapters and the exercises, perhaps looking at the solutions the author provides for some to refresh your memory. I will at least look at the API to see the classes that Java provides for these data structures, but their properties and attributes of capabilites, use, and usefulness are assumed as understood.

For now (immediate future): Read Chapter 1 : Basic Java Concepts to see what you know and don't. Those of you who feel quite confident with Java can jump into reviewing your data structures of Chapters 5-10.

Exercises for Chapter 1: All of the exercises for Chapter 1 would be a good idea to do. Some are a quick and easy review. Good. Others may cause you to think. Beware. :-)
Hopefully # 1-22 are good reviews to bring you back into programming form. (Answers to odd numbers are at the text pages Exercise Solutions)
I would assume that the exercises on page 54 - # 23-29 may be a bit more difficult. We will probably not get to these for a couple weeks, but if the programming is a breeze for you, then go ahead and try these too. I am more concerned with your recursive algorithms since they are probably not as common to you.

(Problems are for practice - not to turn in. However, DO ask questions if you do not know how to do them!)

For Java: Read Chapter 1 thoroughly.

(1)

Preview - What are we doing here anyway!

Computer Science (has been defined) as the study of algorithms:

Whether one works with procedural languages or object-oriented languages, the concept of algorithms is fundamental.

Note: both techniques believe in the philosophy of "divide-and-conquer" (breaking up complex projects into smaller subprojects); the approaches simpy vary in how to decompose the project. Process vs. objects

Nonetheless, objects still have methods which are algorithms. AND, objects work together to produce a higher level algorithm.

(2)

Def: An algorithm is a finite set of instructions that, if followed, accomplish a particular task. In addition, every algorithm must satisy the following criteria:

  1. input (0 or more)

  2. output (at least 1)

  3. definiteness (each instruction clear and unambiguous)

  4. finiteness (trace and algorithm will end in finite steps)

  5. effectiveness (feasible)

(notice a program may not satisfy (4)... example?)

We can also test someone's algorithm to make sure that it is correct - using formal semantics.

Some of the questions we should ask when examining a computer program


And, there are, of course, things an algorithm simply cannot do!

Church-Turing Thesis:
Turing machines are formal versions of algorithms and no computational procedure will be considered an algorithm unless it can be presented as a Turing machine.

Godel's Incompleteness Theorem: A formal system cannot be both complete and consistent.
(meaning: there are truths that exist that cannot be proven using the axioms of the system)

Thanks to Turing and Godel, we know there are problems that are "undecidable" - no algorithm exists that can solve them.

(More on the topics of intractability in CSCI 550 and 650)

(3)

Additional (very important) introductory material

Data Types:

  1. a set (or domain...or collection) of allowed values

    (kinds of data that variables may assume)

    some books say only (1), some add:

  2. a set of operations on those values

Alternatively (more OO focus):
A data type is a collection of objects and a set of operations that act on those objects.

We look at three classes of data types:
simple, structured, and pointers

Simple are atomic (atomic data type)

The Data Type of a variable is the set of values that the variable may assume. Variables change only their value, not their structure (length, etc). You usually hear of a languages "primitive data types". Note again, that in the language, these have only values.

(4)

Structured Data Types: (or Data Structure)
A data type whose values

  1. Can be composed into a set of component elements each of which is either atomic or another data structure

  2. Include a set of associations or relationships (structure) involving the component elements

(includes the set of functions or operations which can be performed not only on the values of the data type, but also on the component elements of the data structure)

E.g., arrays, records, sets, files (text /user-defined) can all be "broken down" to atomic data types

Data Structures are created by giving names to aggregates of cells (atomic types) and (optionally) interpreting the values of some cells as representing connections (e.g. pointers) among cells.

Data structures (and programming) have two stages:
First Stage - Specification: What it does
Second Stage - Implementation: How it does it
Specification (design) should be as independent as possible from implementation

One should be able to write programs using data structures and their operations without knowledge of the specific implementation of the data structure.

Abstractions :

suppress irrelevant details and
seek to isolate the essence of the thing being abstracted

(5)

Abstract Data Types: ADT's

A mathematical model, together with various operations defined on the model. (e.g., lists, queues,...)

Abstract because the axioms do not imply a form of representation

To implement an algorithm in a given programming language - need to represent the ADT in terms of data types (and structures) supported by the programming language.

The implementation of a Data Structure is a representation of an ADT

The ADT is what (the potential; the function)
The data structure is how (the implementation)

Such abstraction allows for data encapsulation and information hiding

Definition: Data Encapsulation or Information Hiding is the concealing of the implementation details of a data object from the outside world.

Data Abstraction is the separation between the specification (what) of the data object and its implementation (how).

Typically a module ( an ADT) doesn't have a main program ... programs call modules

(very much like object-oriented programming: a "main" method should do no more than instantiate a class and start it)

Data Abstraction as a design:

Design:

David Marr - 3 stages

  1. what is the problem and goals? (computational theory)

  2. algorithm - how (abstract) (representation and algorithm (I/O, transformation))

  3. program - how (implementation)

(6)

Before Object-orientation: example - Pascal

Modules:

  1. set of data (data objects)
    shared among sets of procedures and functions
  2. operations on data
    may be procedures (don't return values) or functions (return values)
- no interaction except calls from one module to another

- linker puts all together

Typically a module doesn't have a main program

have library of modules
imports - things used by some module (provided by another)
exports - things the module makes available to other modules (provides)

Define at abstract level (Conceptual modules)

ADT Specifications: (the same conceptual module can be used for varying implementations)

Conceptual into actual implementations
(Abstract Types into Virtual Types)

Five Phases of "Creating" a Program

A side: Useful principles

(7)

Example:
Construct a conceptual module for operations on strings

  1. What operations to expect
  2. Space and time complexity
  3. What representation for the structure (elements mean the components of the values of the structured data type)

Notes from an MIT course about ADTs (note 7.3 for interesting use of throws to deal with what is given as preconditions)

Preconditions and Postconditions are needed for each procedure and function.

Conceptual module StringKeeper

	imports 	type	character

exports type String

proc MakeNullString

proc AppendString

proc ConcatString

proc ReverseString

How represent info of pre & post for StringKeeper?
e.g., Use |x| for the length of string x

For NullString |x| = 0, others |x| > 0

(8)

Precondition: an assertion that must be true in order for the operation to execute correctly

precondition

* the implementor assumes it to be true...and codes accordingly

* the user must be sure in his code it is met in order to use

(example for AppendString: Length(S) < 80) (or |S| < 80 )

Postcondition: an assertion that prevails upon completion of the operation. A description of the results of the actions performed by the operation. It must be accurately and precisely stated

postcondition

* the implementor must make it (through his code) true

* the user assumes it will be true

(e.g., c is appended to the right of S, which increases in length by 1) [ or S = #S o c and | S | = | #S | + 1 ]

where # means old

Elements, Structure, and Domain

Representation (choice of the particular combination of native data types)

and ... finally, Implementation:

const MaxSize = 80;		//in Java, make a class variable

type

Index = 1..MaxSize; String = record size : integer; str : array[1..MaxSize] of char end;

The previous edition of this book talked about Strings in the List chapter, yet now I have described them as an array (explain..)

See notes on ADTs for conceptual modules and implementations for Complex Numbers, Lists, and Stacking Queue

(9)

Readings:

Of course, Chapters 1-4

Additional Readings:
Chapters 5, 6, 7 (possibly new), 8, 9, 10 should be skimmed over to see what you remember (or don't). You can look at the author's page with solutions to problems. (Answers to odd numbers are at the text pages Exercise Solutions)

Hopefully in your previous classes you actually had to implement lists, stacks and queues. Once you have done the work and know their potentials, then you can appreciate Java's classes:


In the Iterable, java.util.Collections in the Collections Frameworks:

I have a page on the Collections and Generic Frameworks here

In this book, focus (obviously) is on Java. However, the author works a lot with ADT's. Remember that ADTs are language (implementation) independent, and thus, the book is useful for any language. In your readings, you should read carefully the abstract concepts (which remain valid) but skim over the specifics of implementation. My notes (and/or the API) will provide specifics on Java's implementation of arrays, vectors, stacks, queues, and linked lists. See also the classes vector and stack in java.util