Sets

U = {1,2,. . ., n}

Operations: union, intersection, member

Implementation possibilities

Arrays

  ...  ...  

S1 = {1,5,7,9}

S2 = {2,5,7,10}

S1 U S2           S1 OR S2

S1 S2           S1 AND S2

Member(i,S1)   True       S1(i) = 1

Union, Intersection

Member - O(1)

Initialization - creating set
- O(n) time

not good if | S1 | << n      i.e., the size of the set is way smaller than the size of the array representation of it.

see problems of old text on how to improve this

Linked lists

S1

S2

Union, intersection performed by merging two lists.

- O(| S1 | + | S2 |)

Member in time O(| S1 |).

Trees

We will only consider disjoint sets.

Si   Sj =

Operations: Union (Si,Sj)

Find(x) - find the set to whichb element x belongs.

Initially each element of U belongs in a set all by itself i.e. n singleton sets

We create sets by performing a sequence of unions.

Union(1,7)                    Union(1,8)                        Union(1,9)

Given two sets:

Union(1,2)

OR

Another example

The representation using a Parent array (whose your daddy?) :-)

Find the Union


Algorithm  union(i,j)
{	   parent(i) = j
}

Do a Find

Algorithm  find(i)
{
	// find root of tree containing i 

	j = i
	while parent(j) != 0  do
	{  j = parent(j)
        }
	return(j)

}
Remember that same parent array is used to represent all of the subsets of U.

Time complexity of union/find

union : O(1)

find : O(longest path in tree)

Consider the following sequence:

U(1,2)      F(1)      U(2,3)      F(1)     U(3,4)      F(1)      . . .     U(n-1,n)     F(1)

we create a degenerate tree

in the worst case a find could take O(n) time

in above example n finds take O(n2) time.

Note: we cannot perform more than n-1 unions

Avoid creating degenerate trees by unioning carefully

li > lj

    longest path = li + 1                  longest path = li
     max (li+1,lj)                               max (li,lj+1)

DEFN: level of node in a tree

Count (i) = no. of nodes in tree rooted at i

weighting rule

Algorithm union(i,j)
{
	if count (i) < count (j)
    		then  parent (i) = j
    		else  parent (j) = i
}

union(1,2)

union(2,3)

union(2,4)

union(2,5)

Lemma: Let T be a tree with m nodes created by a sequence of weighted unions. No node in T has level greater than |log m | +1. (should be floor)

Proof: by induction

Base:

m=1: T has 1 level. | log 1 | +1 = 1

Assume:

Suppose thm. (No node in T has level greater than |log m | +1) is true for all trees with less than m nodes.

Proof

Let T be a tree with m nodes ( m>1 ).

Let union (i,j) be the last of the sequence of the unions which led to T.

Let p = count(i)
     // the number of nodes in Ti

then count(j) = m-p

without loss of generality, let 1 < p < m/2

then count (i) < count (j)

No. of levels in T
= max (no. of levels in Ti +1, no. of levels in Tj )

By induction hypothesis, no. of levels in Ti < | log p | +1
no. of levels in Tj < | log (m-p) | +1

Consider adding either way

(1) | log p | +1 +1 < | log m/2 | +2 < | log m | +1

(2) log | (m-p) | +1 < | log m | +1

No. of levels in T < | log m | +1

DONE

Need to store count only for roots.

Use parent field to store count.

Algorithm union (i,j)
{
	// union sets with roots i and j
	//	parent(i) = -count(i)
	//	parent(j) = -count(j)  
       //       negative to know not parent but count

	temp  = parent(i) + parent(j)

	if parent(i) > parent(j)         // remember that these are negative so smaller is bigger "real" count
		then 	{parent(i) = j
			 parent (j) = temp}
		else	{parent (j) =  i
			parent (i) =  temp}
}

Union: O(1)
find: O(log n)

Further improvement is possible on this find

Collapsing rule

If j is a node on the path from i to its root and parent(i) != root(i), then set parent(j) = root (i)

Remember that find just finds the root of the tree with i in it, so all the nodes above the i can direct to the root. If multiple finds will be called, then this is helpful.

Algorithm find (i)
{
	// find root of tree containing i and collapse 
	// find root (i) 

	j = i
	while parent(j) > 0 do 
		{ j = parent(j)
		}

	// collapse 
	
	k =  i
	while k !=  j do
		{
		t = parent(k)
		parent(k) =  j
		k = t
		}
	
	return (j)
}

Running time for m > n finds is slightly > O(m)

For CSCI 311, see Ch. 12 [pg. 486-495]; for CSCI 650, see Ch. 2 [pg. 102- 109] of text.