U = {1,2,. . ., n}
Operations: union, intersection, member
Implementation possibilities
| ... | ... |
S1 = {1,5,7,9} S2 = {2,5,7,10}
S1 U S2 S1 OR S2
S1
Member(i,S1) True
Union, Intersection
Initialization - creating set not good if | S1 | << n i.e., the size of the set is way smaller than the size of the array representation of it.
see problems of old text on how to improve this
S1
S2
Union, intersection performed by merging two lists.
- O(| S1 | + | S2 |)
Member in time O(| S1 |).
Si
Operations: Union (Si,Sj)
Find(x) - find the set to whichb element x belongs.
Initially each element of U belongs in a set all by itself i.e. n singleton sets
We create sets by performing a sequence of unions.
Union(1,7)
Union(1,8)
Union(1,9)
Given two sets:
Union(1,2)
OR Another example
The representation using a Parent array (whose your daddy?) :-)
Find the Union
Do a Find
union : O(1)
find : O(longest path in tree)
Consider the following sequence:
U(1,2) F(1) U(2,3) F(1) U(3,4) F(1) . . . U(n-1,n) F(1)
we create a degenerate tree
in the worst case a find could take O(n) time
in above example n finds take O(n2) time.
Note: we cannot perform more than n-1 unions
Avoid creating degenerate trees by unioning carefully
li > lj
longest path = li + 1 longest path = li
DEFN: level of node in a tree
Count (i) = no. of nodes in tree rooted at i
union(1,2) union(2,3) union(2,4) union(2,5)
Lemma: Let T be a tree with m nodes created by a sequence of weighted unions. No node in T has level greater than |log m | +1. (should be floor)
Proof: by induction
Base:
m=1: T has 1 level.
| log 1 | +1 = 1
Assume: Suppose thm. (No node in T has level greater than |log m | +1) is true for all trees with less than m nodes.
Proof
Let T be a tree with m nodes ( m>1 ).
Let union (i,j) be the last of the sequence of the unions which led to T.
Let p = count(i)
then count(j) = m-p
without loss of generality, let 1 < p < m/2
then count (i) < count (j)
No. of levels in T
By induction hypothesis, no. of levels in Ti < | log p | +1 Consider adding either way
(1) | log p | +1 +1 < | log m/2 | +2 < | log m | +1
(2) log | (m-p) | +1 < | log m | +1
No. of levels in T < | log m | +1
DONE
Need to store count only for roots. Use parent field to store count.
indices 1 2 3 4 5 6 7 8 9 10
value 1 0 0 0 1 0 1 0 1 0
0 1 0 0 1 0 1 0 0 1
- O(1) time if n - bit boolean operations can be done simultaneously
Member - O(1)
otherwise , O(n)
- O(n) time
Linked lists
Trees
We will only consider disjoint sets.
Algorithm union(i,j)
{ parent(i) = j
}
Algorithm find(i)
{
// find root of tree containing i
j = i
while parent(j) != 0 do
{ j = parent(j)
}
return(j)
}
Remember that same parent array is used to represent all of the subsets of U.
Time complexity of union/find

max (li+1,lj) max (li,lj+1)
weighting rule
Algorithm union(i,j)
{
if count (i) < count (j)
then parent (i) = j
else parent (j) = i
}
// the number of nodes in Ti

= max (no. of levels in Ti +1, no. of levels in Tj )
no. of levels in Tj < | log (m-p) | +1
Algorithm union (i,j)
{
// union sets with roots i and j
// parent(i) = -count(i)
// parent(j) = -count(j)
// negative to know not parent but count
temp = parent(i) + parent(j)
if parent(i) > parent(j) // remember that these are negative so smaller is bigger "real" count
then {parent(i) = j
parent (j) = temp}
else {parent (j) = i
parent (i) = temp}
}