Introduction to Algorithms (Design and Analysis of Algorithms)
DAA
Algorithms
An algorithm is any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values, as output. An algorithm is thus a sequence of computational steps that transform the input into the output.
We can also view an algorithm as a tool for solving a well-specified computational problem. The statement of the problem specifies in general terms the desired input/output relationship. The algorithm describes a specific computational procedure for achieving that input/output relationship.
For example, we might need to sort a sequence of numbers into nondecreasing order. This problem arises frequently in practice and provides fertile ground for introducing many standard design techniques and analysis tools. Here is how we formally define the sorting problem:
Input: A sequence of n numbers (a1, a2, …, an).
Output: A permutation (reordering) (a1′, a2′, …, an′) of the input sequence such that a1′ ≤ a2′ ≤ … ≤ an′.
For example, given the input sequence (31, 41, 59, 26, 41, 58), a sorting algorithm returns as output the sequence (26, 31, 41, 41, 58, 59). Such an input sequence is called an instance of the sorting problem. In general, an instance of a problem consists of the input (satisfying whatever constraints are imposed in the problem statement) needed to compute a solution to the problem.
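For illustration, here is a minimal runnable Python sketch of one algorithm that solves the sorting problem, insertion sort, which is analyzed later in this section (the function name insertion_sort is our own choice):

# Minimal Python sketch of insertion sort (analyzed later in this section).
def insertion_sort(a):
    # Sorts the list a into nondecreasing order, in place.
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        # Shift elements of the sorted prefix a[0..j-1] that exceed key
        # one position to the right.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a

print(insertion_sort([31, 41, 59, 26, 41, 58]))  # [26, 31, 41, 41, 58, 59]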
An algorithm is said to be correct if, for every input instance, it halts with the correct output. We say that a correct algorithm solves the given computational problem. An incorrect algorithm might not halt at all on some input instances, or it might halt with an incorrect answer. Contrary to what you might expect, incorrect algorithms can sometimes be useful, if we can control their error rate. We shall see an example of an algorithm with a controllable error rate in Chapter 31 when we study algorithms for finding large prime numbers. Ordinarily, however, we shall be concerned only with correct algorithms.
An algorithm can be specified in English, as a computer program, or even as a hardware design. The only requirement is that the specification must provide a precise description of the computational procedure to be followed.
Asymptotic Notations and the Analysis of Algorithms
For efficiency with respect to large problems, the analysis is made for large values of the input size, where the dominant terms of the running time play an important role. Analysis conducted for large values of the input size is called asymptotic analysis.
In asymptotic analysis, an algorithm ‘A1’ is considered better than an algorithm ‘A2’ if the order of growth of the running time of ‘A1’ is lower than that of ‘A2’. Asymptotic efficiency is therefore concerned with how the running time of an algorithm increases with the size of the input in the limit, as the size of the input increases without bound.
Asymptotic Notations: For the efficiency analysis of every algorithm, we have three basic cases:
- Best Case
- Average Case
- Worst Case
And there are five notations (O, o, Ω, ω, Θ) to represent them, of which three are used most often.
Types of Asymptotic Notations
There are mainly three asymptotic notations:
- Big-O Notation (O-notation)
- Omega Notation (Ω-notation)
- Theta Notation (Θ-notation)
1. Big-O Notation (O-Notation):
Big-O notation represents the upper bound of the running time of an algorithm. Therefore, it gives the worst-case complexity of an algorithm.
If f(n) describes the running time of an algorithm, then f(n) is O(g(n)) if there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0.
Informally, big-O gives the highest possible growth of the running time for a given input size: g(n) serves as an upper bound on the algorithm’s time complexity.
Mathematical Representation of Big-O Notation:
O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0 }
For example, consider the case of insertion sort. It takes linear time in the best case and quadratic time in the worst case. We can safely say that the time complexity of insertion sort is O(n^2).
Note: O(n^2) also covers linear time.
If we use Θ notation to represent the time complexity of insertion sort, we have to use two statements for the best and worst cases:
- The worst-case time complexity of insertion sort is Θ(n^2).
- The best-case time complexity of insertion sort is Θ(n).
The Big-O notation is useful when we only have an upper bound on the time complexity of an algorithm. Many times we easily find an upper bound by simply looking at the algorithm.
Examples:
{ 100, log(2000), 10^4 } belongs to O(1)
U { n/4, 2n+3, n/100 + log(n) } belongs to O(n)
U { n^2+n, 2n^2, n^2+log(n) } belongs to O(n^2)
Note: Here, U represents union; we can chain the sets in this manner because O provides upper bounds, so every function in O(1) also belongs to O(n) and to O(n^2).
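As a sanity check of the definition, the following Python sketch verifies numerically that f(n) = 2n + 3 is O(n); the witnesses c = 3 and n0 = 3 are our own choices for illustration:

# Numeric check of the Big-O definition for f(n) = 2n + 3, g(n) = n.
# The witnesses c = 3 and n0 = 3 are illustrative choices.
def f(n):
    return 2 * n + 3

def g(n):
    return n

c, n0 = 3, 3
# 0 <= f(n) <= c*g(n) must hold for all n >= n0; we spot-check a range.
assert all(0 <= f(n) <= c * g(n) for n in range(n0, 100000))
print("f(n) = 2n + 3 is O(n) with c = 3, n0 = 3")

A finite spot check does not prove the bound, of course; here 2n + 3 ≤ 3n is easy to verify algebraically for all n ≥ 3.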
2. Omega Notation (Ω-Notation):
Omega notation represents the lower bound of the running time of an algorithm. Thus, it provides the best case complexity of an algorithm.
The execution time serves as a lower bound on the algorithm’s time complexity.
It describes the condition under which an algorithm completes execution in the shortest amount of time.
Let g and f be functions from the set of natural numbers to itself. The function f is said to be Ω(g) if there exist a constant c > 0 and a natural number n0 such that c·g(n) ≤ f(n) for all n ≥ n0.
Mathematical Representation of Omega Notation:
Ω(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0 }
Let us consider the same Insertion sort example here. The time complexity of Insertion Sort can be written as Ω(n), but it is not very useful information about insertion sort, as we are generally interested in worst-case and sometimes in the average case.
Examples:
{ n^2+n, 2n^2, n^2+log(n) } belongs to Ω(n^2)
U { n/4, 2n+3, n/100 + log(n) } belongs to Ω(n)
U { 100, log(2000), 10^4 } belongs to Ω(1)
Note: Here, U represents union; we can chain the sets in this manner because Ω provides lower bounds, so every function in Ω(n^2) also belongs to Ω(n) and to Ω(1).
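For instance, a short worked check (with constants chosen by us for illustration) that n^2 + n belongs to Ω(n^2):

f(n) = n^2 + n \ge 1 \cdot n^2 \quad \text{for all } n \ge 1

so the definition is satisfied with c = 1 and n0 = 1, and hence n^2 + n = Ω(n^2).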
3. Theta Notation (Θ-Notation):
Theta notation encloses the function from above and below. Since it represents the upper and the lower bound of the running time of an algorithm, it is used for analyzing the average-case complexity of an algorithm.
Let g and f be functions from the set of natural numbers to itself. The function f is said to be Θ(g) if there exist constants c1, c2 > 0 and a natural number n0 such that c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0.
Mathematical Representation of Theta Notation:
Θ (g(n)) = {f(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1 * g(n) ≤ f(n) ≤ c2 * g(n) for all n ≥ n0}
Note: Θ(g) is a set.
The above expression can be read as follows: if f(n) is Θ(g(n)), then the value of f(n) always lies between c1·g(n) and c2·g(n) for large values of n (n ≥ n0). The definition of Θ also requires that f(n) be non-negative for all n ≥ n0.
The execution time serves as both a lower and an upper bound on the algorithm’s time complexity: it bounds both the greatest and the least growth of the running time for a given input size.
A simple way to get the Θ notation of an expression is to drop the low-order terms and ignore the leading constants. For example, consider the expression 3n^3 + 6n^2 + 6000 = Θ(n^3); dropping the lower-order terms is always fine because there will always be a number n0 after which n^3 dominates n^2, irrespective of the constants involved. For a given function g(n), Θ(g(n)) denotes the set of functions defined above.
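A worked check of this example with explicit constants (the witnesses are our own choices):

3n^3 \le 3n^3 + 6n^2 + 6000 \le 9n^3 \quad \text{for all } n \ge 11

The upper bound holds because 6n^2 + 6000 ≤ 6n^3 once n^3 ≥ n^2 + 1000, which is true for all n ≥ 11. Thus c1 = 3, c2 = 9 and n0 = 11 witness 3n^3 + 6n^2 + 6000 = Θ(n^3).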
Examples:
{ 100, log(2000), 10^4 } belongs to Θ(1)
{ n/4, 2n+3, n/100 + log(n) } belongs to Θ(n)
{ n^2+n, 2n^2, n^2+log(n) } belongs to Θ(n^2)
Note: Θ provides exact bounds.
Performance Analysis (Time and Space complexity)
Performance analysis of an algorithm depends upon two factors, i.e., the amount of memory used and the amount of compute time consumed on any CPU. Formally, these are expressed as complexities in terms of:
- Space Complexity.
- Time Complexity.
Space Complexity
Space complexity of an algorithm is the amount of memory it needs to run to completion, i.e., from the start of execution to its termination. The space needed by any algorithm is the sum of the following components:
- Fixed Component: This is independent of the characteristics of the inputs and outputs. This part includes the instruction space, space for simple variables, fixed-size component variables, and constants.
- Variable Component: This consists of the space needed by component variables whose size depends on the particular problem instance (inputs/outputs) being solved, together with the space needed by referenced variables; the recursion stack space is one of the most prominent components. It also includes data-structure components like linked lists, heaps, trees, graphs, etc.
Therefore, the total space requirement of any algorithm ‘A’ can be expressed as
Space(A) = Fixed Components(A) + Variable Components(A)
Of the fixed and variable components, the variable part must be determined accurately so that the actual space requirement of an algorithm ‘A’ can be identified. To identify the space complexity of any algorithm, the following steps can be followed:
- Determine the variables which are instantiated with some default values.
- Determine which instance characteristics should be used to measure the space requirement; this will be problem specific.
- Generally the choices are limited to quantities related to the number and magnitudes of the inputs to and outputs from the algorithm.
- Sometimes more complex measures of the interrelationships among the data items can be used.
Example: Space Complexity
Algorithm Sum(number, size)   // returns the sum of all numbers in the ‘number’ list
{
    result = 0.0;
    for count = 1 to size do   // repeats 1, 2, 3, …, size times
        result = result + number[count];
    return result;
}
In the above example, when calculating the space complexity we look at both the fixed and the variable components. Here we have:
Fixed component: the ‘result’, ‘count’ and ‘size’ variables, for a total space requirement of three (3) words.
Variable component: characterized by the value stored in the ‘size’ variable (suppose the value stored in ‘size’ is ‘n’), because this value decides the size of the ‘number’ list and also drives the for loop. Therefore, if the space used by ‘size’ is one word, the total space required by the ‘number’ variable will be ‘n’ (the value stored in ‘size’).
Therefore the space complexity can be written as Space(Sum) = 3 + n.
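For reference, a minimal runnable Python translation of the pseudocode above (the function name sum_list is our own):

# Python translation of algorithm Sum; 'size' is implicit in len(number).
def sum_list(number):
    result = 0.0
    for count in range(len(number)):  # repeats len(number) times
        result = result + number[count]
    return result

print(sum_list([1, 2, 3, 4]))  # 10.0

The space profile mirrors the analysis above: fixed space for result and the loop counter, plus space proportional to n for the list itself.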
Time Complexity
Time complexity of an algorithm (basically, when converted to a program) is the amount of computer time it needs to run to completion. The time taken by a program is the sum of the compile time and the run/execution time. The compile time is independent of the instance (problem-specific) characteristics. The following factors affect the time complexity:
- Characteristics of the compiler used to compile the program.
- The computer machine on which the program is executed and physically clocked.
- Multiuser execution system.
- Number of program steps.
Therefore, the time complexity again consists of two components, fixed (factor 1 only) and variable/instance (factors 2, 3 & 4), so for any algorithm ‘A’ it is given as:
Time(A) = Fixed Time(A) + Instance Time(A)
Here the number of steps is the most prominent instance characteristic, and the number of steps assigned to any program statement depends on the kind of statement, for example:
- comments count as zero steps,
- an assignment statement which does not involve any calls to other algorithms is counted as one step,
- for iterative statements we consider the step count only for the control part of the statement, and so on.
Therefore, to calculate the total number of program steps we use the following procedure: we build a table in which we list the total number of steps contributed by each statement. This is usually arrived at by first determining the number of steps per execution of the statement and the frequency with which each statement is executed. This procedure is explained using the example below.
Example: Time Complexity
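Reusing the Sum algorithm from the space-complexity example, the step table below is reconstructed from the analysis in the next paragraph (s/e = steps per execution):

Statement                            s/e   Frequency   Total steps
Algorithm Sum(number, size)          0     --          0
{                                    0     --          0
result = 0.0;                        1     1           1
for count = 1 to size do             1     size + 1    size + 1
result = result + number[count];     1     size        size
return result;                       1     1           1
}                                    0     --          0
Total                                                  2*size + 3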
In the above example, if you analyze carefully, the frequency of “for count = 1 to size do” is ‘size + 1’. This is because the statement executes one extra time, due to the condition check that evaluates to false and terminates the loop. Once the total steps are calculated, they represent the instance characteristics in the time complexity of the algorithm. Also, the compile time will be the same every time we compile the same set of instructions, so we can treat it as a constant ‘C’. Therefore the time complexity can be expressed as: Time(Sum) = C + (2·size + 3)
In this way both the space complexity and the time complexity can be calculated. Together they comprise the performance analysis of an algorithm, and neither should be used independently. Both complexities also help in defining the parameters on whose basis we optimize algorithms.
Worst, Average and Best Case Analysis of Algorithms
Popular Notations in Complexity Analysis of Algorithms
1. Big-O Notation
We define an algorithm’s worst-case time complexity by using the Big-O notation, which describes the set of functions that grow slower than or at the same rate as the given expression. It bounds the maximum amount of time an algorithm requires, considering all input values.
2. Omega Notation
It defines the best case of an algorithm’s time complexity; the Omega notation describes the set of functions that grow faster than or at the same rate as the given expression. It bounds the minimum amount of time an algorithm requires, considering all input values.
3. Theta Notation
It defines the average case of an algorithm’s time complexity; the Theta notation is used when a function lies in both O(expression) and Ω(expression). This is how we define the average-case time complexity of an algorithm.
Measurement of Complexity of an Algorithm
Based on the above three notations of Time Complexity there are three cases to analyze an algorithm:
1. Worst Case Analysis (Mostly used)
In the worst-case analysis, we calculate the upper bound on the running time of an algorithm. We must know the case that causes the maximum number of operations to be executed. For linear search, the worst case happens when the element to be searched (x) is not present in the array. When x is not present, the search() function compares it with all the elements of arr[] one by one (a sketch of such a function is given below). Therefore, the worst-case time complexity of linear search would be O(n).
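For concreteness, here is a minimal Python sketch of such a search() function (our own illustration of the function the text refers to):

# Minimal Python sketch of linear search.
def search(arr, x):
    # Compare x with each element of arr one by one.
    for i in range(len(arr)):
        if arr[i] == x:
            return i   # found: index of the first occurrence
    return -1          # not found: all n elements were compared

print(search([10, 20, 30], 30))  # 2 (last element: n comparisons)
print(search([10, 20, 30], 99))  # -1 (absent: worst case, n comparisons)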
2. Best Case Analysis (Very Rarely used)
In the best-case analysis, we calculate the lower bound on the running time of an algorithm. We must know the case that causes the minimum number of operations to be executed. In the linear search problem, the best case occurs when x is present at the first location. The number of operations in the best case is constant (not dependent on n), so the time complexity in the best case would be Ω(1).
3. Average Case Analysis (Rarely used)
In average-case analysis, we take all possible inputs, calculate the computing time for each, sum all the calculated values, and divide the sum by the total number of inputs. We must know (or predict) the distribution of cases. For the linear search problem, let us assume that all n+1 cases are uniformly distributed (including the case of x not being present in the array). So we sum the costs of all the cases and divide by (n+1). The following is the value of the average-case time complexity:
Average Case Time = \sum_{i=1}^{n+1}\frac{\Theta(i)}{n+1} = \frac{\Theta\left(\frac{(n+1)(n+2)}{2}\right)}{n+1} = \Theta(n)
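As a quick empirical check (our own illustration), the following Python sketch averages the comparison count of the search() function above over all n+1 uniformly weighted cases and shows the result grows linearly with n:

# Empirical average-case comparison count for linear search.
def comparisons(arr, x):
    # Count how many elements are compared before search terminates.
    count = 0
    for value in arr:
        count += 1
        if value == x:
            break
    return count

n = 1000
arr = list(range(n))
cases = arr + [-1]  # n hit positions plus one miss (x absent)
avg = sum(comparisons(arr, x) for x in cases) / (n + 1)
print(avg)  # approximately (n + 2) / 2, i.e., Theta(n)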