Performance Analysis, continued

The model of computation affects complexity

'size' of problem

'basic' instruction = 1 unit

Previous analysis valid of each data item is an atomic entity ie., it can be loaded into one memory location or register and is never broken up.

Algorithm MAXIM
	max   'AA. . .A'
	while (more data items) do 
	        read WORD
		if WORD > MAX then MAX =  WORD
	write MAX
In the worst case n reads, n comparisons, n assignments

T (n) = O(n)

What if WORD contains a large number of characters?
The compiler provides machinery to break up WORD into characters on which basic instructions can execute.

Size of problem depends on number of words and length of word.

| WORD | < m characters
assume character comparison takes 1 unit of time

time for comparisons = n x m
time for assignment = n x m
time for reads = n x m

T (n,m) = O(n*m)

Moral of story: sometimes more than one thing can effect timing, hence more than one variable.

Uniform and logarithmic cost (complexity measures)

Things to consider:

Uniform: Basic operations take unit time regardless of size of operands

Logarithmic: Adding two n digit numbers takes O(n) single digit additions

Multiplying two n digit numbers takes O(n2) single digit multiplications

If the value of a number is N it can be stored in n=logb N digits in base b.

Recursion, recurrence relations, proofs by induction

Example will be with a Search on a sorted array. Note the handout is a mergesort so in that example, both sides of the array must be called recursively
Algorithm SEARCH (l, u)

//global  A [ : ], x  //
// search for x in A [l:u]

 if l > u then return 0
 mid <--   (l+u) /2  //want floor 
 case
	: x = A (mid):
	     return (mid)

	: x < A (mid):
	     return (SEARCH (l,mid-1))

        : x > A (mid):
	     return (SEARCH (mid+1,u))
// endcase
// end SEARCH

From case statement, divide and conquer,

like a binary search tree

n = u - l + 1

Since we are breaking in two, for ease of calculations, suppose that n is a power of 2.

T (n)    <    c2                 for n < 1
                    T(n/2)+c1   for n > 1

Expand up (without loss of generality, let n = 2k):

T(1) < c2
T(2) < T(1) + c1
T(4) < T(2) + c1 = T(1) + c1 + c1 = T(1) + 2c1
T(8) < T(4) + c1 = T(1) + 2c1 + c1 = T(1) + 3c1
T(16) < T(8) + c1 = T(1) + 3c1 + c1 = T(1) + 4c1

Do we see a pattern? This is important.

T(2k) < T(1) + kc1 = c2 + kc1

Remember n = 2k so

log2 n = log2 2k
Remember what x = log2y means? Also
reminder

log n = k

T(n) < c2 + c1log n

Now let's look at it the opposite way (breaking down) (some people see it this way, others see it the other...)

T (n)    <    c2                 for n < 1
                    T(n/2)+c1   for n > 1

This means that

T(n/2) < T(n/4) + c1

But then
T(n) < T(n/4) + c1 +c1

Can we get this k variable out?
If k is large enough such that n/2k < 1, we can replace T (n/2k) by c2 i.e. T (n) < c2 + k c1

So, want the smallest k such that : n/2k < 1

implies 2k > n

implies k > log2 n

How did she get that!

logs
If 2k > n, then log2 2k > log2 n
but log2 2k is k. Hence k > log2 n

i.e. k = | log2n |            //should be ceiling log2n but I cannot make those on browsers (note that I will also not always write log2 and use log instead)

T(n) < c2 + c1log n

= 0 (log n)

Verify using proof by induction

THEOREM:
T(n)    <    c2                 for n < 1
                    T(n/2)+c1   for n > 1 Implies that T(n) = O(log n)

PROOF: By induction ( surprise!)

Base: n = 1
then, T(1) = c2 = O(1) ; log 1 = 0 also constant O(0)=O(1)

More explicitly, T(n) < c2 + c1log n so T(1) < c2 + 0 < c2
The recurrance relation states T(n)  < c2 for n < 1; so T(1) < c2
BASE CASE OK .

Induction hypothesis: Assume THM is true: T(n) = O(log n) for all n < n'.

Induction step:

Suppose n = n' + 1 (how does this differ from us saying n+1 in our earlier induction proofs?)

then, T (n) < T(n/2) + c1 (by definition of T(n))

since n/2 < n', by induction hypothesis, T (n/2) = 0(log n/2)

which means that T(n/2) < c3 log n/2 for some constant c3 and n > some n0 (by definition of O(g(n)))

Ok, let's look at T(n) again, Remember, T(n) < T (n/2) + c1

T (n) < c3 log n/2 + c1

< c3 (log n - 1) + c1

< c3 log n - c3 + c1

= O(log n)

By induction, theorem is true for all n.

DEFN.: f(n) = 0(g(n)) iff there exist two positive constants c and n0 such that f(n) < c* g(n) for all n > n0.

OK, OK, I will remind you: c3log n/2 = c3(log n - log 2) = c3(log n - 1)= c3log n - c3


Notice that on these recurrances, sometimes it is easier to think in terms of k rather than n. Specifically, we will be using powers of 2 so n=2k.

Instead of
T(1) = c2
T(2) = T(1) + c1
T(4) = T(2) + c1 = T(1) + c1 + c1 = T(1) + 2c1
T(8) = T(4) + c1 = T(1) + 2c1 + c1 = T(1) + 3c1
T(16) = T(8) + c1 = T(1) + 3c1 + c1 = T(1) + 4c1

We think of n = 2k and (T(n) = T(2k)) and then the pattern in terms of the number of Recursive steps.
I.E. ,T(1)=T(20)=R(0), T(2)=T(21)=R(1)
R(0)= c2
R(1)= c2 + c1
R(2)= c2 + 2*c1
R(3)= c2 + 3*c1
R(4)= c2 + 4*c1
...
R(k)= c2 + k*c1
Since n = 2k, k will be log n and hence it is easy to spot log n recurrances. (sometimes)

See handout. Note a sort is 2T(n/2).

For the example that I show from the exercises, I will make my guess using R(k).
If I haven't already done this, see the
handout (a)