Data Structures



  1. What is a data structure?

  2. The general problem of data structure design

  3. When do we need a data structure?

  4. Review of Stacks and Queues

  5. A stack is a data structure S that supports three operations:

  6. The pushing (insertion) and popping (detetion) are done at one and the same end of the stack, namely, its top.

  7. One implementation is an array S[1:N] and a global index I pointing to the top of the stack, assuming that S[i] is empty but S[i-1] is full
    Procedure Push (S,a)
    begin
    S[I] := a;
    I := I+1;
    end

    function Pop(S)
    begin
    I := I-1;
    return(S[I]);
    end

    function Top(S)
    begin
    return(S[I-1]);
    end

  8. A queue is a data structure Q that supports two operations:

  9. One implementation is a circular array Q[0:N-1] and two global indexes H and T, pointing to the head and tail elements of Q.

    Q[H] is considered full, Q[T] is considered empty.

    By circular we mean: the element before Q[0] is Q[N-1],
    and the element after Q[N-1] is Q[0].
    Procedure enqueue(Q,a)
    begin
    Q[T] := a;
    T = T - 1 modulo N;
    end

    function dequeue(Q)
    begin
    H := H-1 modulo N;
    return (Q[H+1 modulo N]);
    end

  10. Records: A record is a built-in structure data type. It is an aggregate of several elements called fields, where each field is a variable of a standard type or of a record type.

  11. Syntax of a record:


    record name
    begin
    field declarations;
    field declarations;
    ...
    field declarations;
    end


    Example:
    record employee
    begin
    char name[1:30]
    integer SSN;
    char address[1:100]
    real salaray;
    end


    Once a record has been defined, it becomes a new data type that can be used in declaring record variables:

    employee X;
    employee Y[1:100];
    record company
    begin
    char name[1:50]
    char address[1:100]
    employee Z[1:1000]
    end

    to access an individual field F of a record R, use
    R.F

    Examples: X.SSN := 123456789;
    X.salary := X.salary + 1000;

  12. Pointers data types
    A pointer is a data type whose contents are addresses of other variables of a specified type.


    Examples: charpointer p; intpointer q[1:n]; employeepointer r;
    charptr p; intptr q[1:n]; employeeptr r;

    to access the field F of a record pointed to by a record pointer r, use
    r ---> F

  13. A self-referential record is a record that has a pointer field that points to a record of the same type.

  14. Example:
    record employee
    begin
    char name[1:30]
    integer SSN;
    employeeptr next;
    end

  15. Singly Linked Lists:

  16. Doubly linked lists: these are like single linked lists except that each record has a field that points to the previous record, and another field that points to the next record.

  17. Introduction to graphs

  18. Trees

    In the three algorithms below, T is a pointer to the tree root, and a a is a key.


    The search function returns a pointer to the record containing a, if any. Otherwise, it returns nil.

    functionsearch(T,a)
    begin
    nodeptr p;
    p=T;
    while (p != nil and p --> key != a) do
    if a < p --> key then
    p := p --> left;
    else
    p := p --> right;
    endif
    endwhile
    return (p);
    end search

    The insert procedure inserts the key a into T.
    procedure insert(T,a)
    begin
         nodeptr p;
         p=T;
         Bool done = false;
         while not done do
            if key then
                 if p --> left != nil then
                     p = p --> left;
                 else
                     p --> left = new(node);
                     p --> left --> key := a;
                     done =true;
                 endif
             else
                 if p ---> right != nil then
                     p = p --> right;
                 else
                     p --> right = new(node);
                     p --> right -- > := a;
                     done = true;
                 endif
             endif
         endwhile
    end insert

    • The procedure delete deletes the node containing key a in the tree
    • p points to the node that will be found to contain a
    • q points to the parent of p
    • r points to a child of p
    • direction = 0 if p is a left child of q, 1 if p is the right child of q
    procedure  delete(T,a)	 
    begin 
    	nodeptr  p,q,r,s; 
    	integer  direction; 
    	p = T; 
    	while  (p != nil and p-->key != a) do 
    		if a < p-->key then 
    			q := p; 
                           	p := p-->left; 
    		 	direction := 0; 
                 	else 
    		 	q := p; 
                    	p := p-->right; 
    			direction := 1; 
                 	endif 
            endwhile 
    	if p == nil then 	 
    	 	return; 
    	elseif p-->left == nil and p-->right == nil then 
    		/* p has no children */ 
    		/* delete node pointed to by p and exit */ 
    		if direction == 0 
    			q-->left = nil; 
    		else 
    			q-->right = nil 
    		endif 
    		free (p); 
    	elseif p-->left == nil then 	
    		/* p has only one child, the right one */ 
    		if direction == 0 then 
    			q-->left := p-->right; 	
    			/* shortcut from parent to grandchild */ 
    		else 
    			q-->right := p-->right; 
    		endif 
    	elseif p-->right == nil then 
    		/* p has only one child, the left one */ 
    		if direction == 0 then 
    			q-->left := p-->left; 
    		else 
    			q-->right := p-->left; 
    		endif
    	else	/* p has two children*/ 
    		/* find the maximum node in the right */
    		/* subtree of p */ 
    		s := p-->left; 
    		q := p;	
    		/* now q will be the parent of s , and direction*/
    		/* will indicate the type of child s is to q*/ 
    		direction = 0; 
    		while s-->right != nil do 
    			q := s; 
    			s := s-->right; 
    			direction := 1; 
    		endwhile 
    		/*Now s points to the maximum node if the */
    		/*left subtree of p*/ 
    		p-->key := s-->key; 
    		/* now node s must be deleted. But since s has */
    		/* no right child, the deletion is done by */ 
    		/* deletion or shortcutting */ 
    		if s-->left == nil then 	/* s is a leaf*/ 
    			if direction == 0 then 
    				q-->left := nil; 
    			else 
    				q-->right := nil; 
    			endif 
    			free (s); 
    			return; 
    		else		/* s has a left child */ 
    			if direction == 0 then 
    				q-->left := s-->left; 
                           	else 
                                   	q-->right := s-->left; 
    			endif 
    			free(s) 
    			return; 
    		endif 
    	endif 
    end delete 
    


  19. the time complexity of insert, delete, and search is O(h) each, where h is the height of the tree.

  20. Full trees and almost complete trees:

    • A full binary tree is a binary tree where every non-leaf has two children and all the leaves are at the same level.

    • Exercises: show that the number of nodes in level i of a full binary tree is 2i. Show also that a full binary tree of height h has 2h+1 - 1 nodes.

    • the canonical labeling of a full binary tree is a labeling of the nodes from top to bottom, left to right, starting with the root being labeled 1.

    • observe that the labels of the children of node i are 2i and 2i+1. The label of the parent of i is [i/2] (integer division).

    • an almost complete binary tree of n nodes is the binary tree consisting of the first n nodes of a full binary tree.

    • observe that if the bottom level of an almost complete binary tree is removed, we are left with a full tree. Also, the nodes in the bottom level are packed to the left end of the tree without any "holes"

    • an almost complete binary tree of n nodes can be implemented with an array A[1:n] of n elements, where A[i] contains the contents of node of label i.

  21. Heaps (min-heaps and max-heaps)

    • Definition: a min-heap is an almost complete binary tree where every node has a value smaller than the values of its children

    • Operations on a min-heap: delete-min and insert
      function delete-min(H) /* H is the heap*/
      begin
          x= root of H;
      a=key of x; /* to be returned at the end*/
      remove a from node x;
      take the last node (node n), remove its key (call it b), and store b in the root;
      remove node n;
      /* now we have to restor the heap*/
          while (x has a key bigger than one of its children) do
      swap x with the smaller child;
      make x point to that child;
      endwhile
          /* note the while loop will stop when x becomes smaller */
          /* than both its children, or when x becomes a leaf*/
      end delete-min

      Time of delete-min: O(h)=O(log n)


      procedure insert(H,a)         /* inserts the key value a into the heap*/ begin
          create a node of label n+1, and insert a into that node;
      let x point to that node;
      while (x is smaller than its parent) do
                 swap the key of x with key of the parent of x;
      let x point to its parent;
          endwhile
          /* this will loop until either x becomes larger than its parent, or it becomes the root*/
      end insert
      Time of insert: O(h)=O(log n)

    • Note that with insert(), we can create a heap from scratch by n insertions into an intially empty heap. This takes time =
      O(log 1 + log 2 + ... + log n) = O(log(n!))= O(nlog n).
      There is a procedure for constructing a heap in O(n) time.

    • By starting with a heap, and calling delete-min n times, we sort the heap in time =
      O(logn + log(n-1) + ... + log 1) = O(nlog n).

  22. Sets and the Union-Find Data Structure

    • General implementations of sets
    • Union-Find problem: Given n sets, where set i has a single element {i}, we need to perform O(n) operations of union (U) and find (F):

      • U(A,B) removes the sets A and B, and replaces them with a new set equal to
        A U B

      • F(x) is a function that returns the name of the set containing x at the time F is called.

    • note that the sets are mutually disjoint at all times.

    • the problem is to design a data structure so that O(n) calls to U and F take as little time as possible.

    • we will carry out the design by having two different representations of the sets: one conceptual and one physical.

    • the conceptual representation of each set is a rooted tree (not necessarily binary). The nodes are labeld with the elements of the corresponding set. The label of the root serves the extra purpose of being the name of the set.

    • the physical representation of all the trees (at once) is a single array called PARENT[1:n].

    • if i is not a root, PARENT[i] is the parent of node i in whatever tree i happens to be.

    • if i is a root, PARENT[i]= -(number of nodes in the tree rooted at i).
    • three different implementations of the Union-Find

    • first implementation
      U(i,j) is carried out by simply making i the parent of j, thus combining the two trees into a single tree, and at once eliminating the old trees as distinct sets.

      F(x) returns the root of the tree containing x by walking up that tree from x, using the PARENT array, until a node is reached which has no parent. That is the root.

      procedure U(i,j)
      begin
           PARENT[i]=PARENTiI]+PARENT[j]; /* this computes the total number of nodes in the new tree */
      PARENT[j]=i;
      end

      Time: O(1), that is, constant.




      function F(x)
      begin
           integer r=x;
      while (PARENT[r] > 0) do
      r = PARENT[r];
      endwhile
           /*now r is a root*/
      return(r);
      end

      Time: O(h) where h is the height of the tree containing x.
      time for O(n) Us and Fs: up to O(n2). The proof is an exercise.

    • Second Implementation
      in U(i,j), make the root of the heavier tree become the parent of the other root. By heavier we mean "has more nodes".

      F(x) is not modified here.

      procedure U(i,j)
      begin
           if |PARENT[i]| >= |PARENT[j]| then
                    PARENT[i]=PARENTiI]+PARENT[j];
      PARENT[j]=i;
           else
                    PARENT[j]=PARENTiI]+PARENT[j];
      PARENT[i]=j;
           endif
      end
      time: O(1)
      Time of an O(n) sequence of Us and Fs: O(nlog n).
      This is because every tree is of height <= log(its number of nodes).
      The proof for that is by induction on the number of Us performed.
      Therefore, every tree is of height <= O(log n).
      Hence, every F(x) takes O(log n) time.
      As a result, O(n) Us takes O(n), and O(n) Fs take O(nlog n).
      The total time for O(n) Us and Fs is then O(nlog n).

    • Third implementation
      U will remain the same as in the second implementation.
      F will be changed using a technique called path compression. In particular, F(x) first finds the root. Afterwards, in a second walk from x to the root, every node on that path is made a direct child of the root.
      function F(x)
      begin
           integer r,s,t;
      r=x;
      while PARENT[r] > 0
                   r = PARENT[r];
           endwhile
      /*now r is a root*/
      s=x;
      while s != r do
                   t := s;
      s := PARENT[s];
      PARENT[t] := r;
           endwhile
      return (r);
      end

      Time of F: O(h) where h is the height of the tree containing x.
      Time of O(n) sequence of Us and Fs: O(nG(n)), where
      G(n) <= 5 for all n <= 265000. That is, for all practical values of n, the time is O(n).
    Abdou Youssef 2001-05-28