Algorithms are ubiquitous in Computer Science and Software Engineering. Selecting appropriate algorithms and data structures improves our programs' efficiency in cost and time.
What is an algorithm? Informally, an algorithm is a procedure to accomplish a specific task. [Skiena:2008:ADM:1410219] More specifically, an algorithm is a well-defined computational procedure that takes some value (or set of values) as input and produces some value (or set of values) as output. An algorithm is thus a sequence of computational steps that transform the input into the output. Cormen et al. do not explicitly remark that an algorithm does not necessarily require an input. [Cormen:2001:IA:580470]
Formally, an algorithm must satisfy five features: [Knuth:1997:ACP:260999]
Finiteness: it terminates after a finite number of steps.
Definiteness: each step is precisely and unambiguously specified.
Input: it has zero or more inputs.
Output: it has one or more outputs.
Effectiveness: each step is basic enough to be carried out exactly and in a finite amount of time.
A procedure that lacks finiteness but satisfies all other characteristics of an algorithm may be called a computational method. [Knuth:1997:ACP:260999]
Every algorithm is a list of steps to solve a problem. Each step depends on some set of previous steps, or on the start of the algorithm. A small problem might look like the following:
This structure is called a directed acyclic graph, or DAG for short. The links between each node in the graph represent dependencies in the order of operations, and there are no cycles in the graph.
How do dependencies happen? Take for example the following code:
```
total = 0
for (i = 1; i < 10; i++)
    total = total + i
```
In this pseudocode, each iteration of the for loop is dependent on the result from the previous iteration, because we are using the value calculated in the previous iteration in the next iteration. The DAG for this code might look like this:
If you understand this representation of algorithms, you can use it to understand algorithm complexity in terms of work and span.
Work is the actual number of operations that need to be executed in order to achieve the goal of the algorithm for a given input size n.
Span is sometimes referred to as the critical path; it is the fewest number of steps an algorithm must execute in order to achieve its goal.
The following image highlights the graph to show the differences between work and span on our sample DAG.
The work is the number of nodes in the graph as a whole. This is represented by the graph on the left above. The span is the critical path, or longest path from the start to the end. When work can be done in parallel, the yellow highlighted nodes on the right represent span, the fewest number of steps required. When work must be done serially, the span is the same as the work.
Both work and span can be evaluated independently in terms of analysis. The speed of an algorithm is determined by the span. The amount of computational power required is determined by the work.
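As a rough illustration, the following Python sketch (the DAG encoding and function names here are illustrative assumptions, not from the text) computes work as the number of nodes and span as the length of the longest dependency chain:

```python
from functools import lru_cache

def work_and_span(deps):
    """Work/span of a DAG given as {node: [nodes it depends on]}."""
    work = len(deps)  # total number of operations (nodes)

    @lru_cache(maxsize=None)
    def depth(node):
        # Longest chain of dependencies ending at this node.
        return 1 + max((depth(d) for d in deps[node]), default=0)

    span = max(depth(node) for node in deps)  # critical path length
    return work, span

# A chain like the loop example above: each step depends on the previous
# one, so the computation is fully serial and span equals work.
serial = {0: []}
serial.update({i: [i - 1] for i in range(1, 10)})
print(work_and_span(serial))  # (10, 10)
```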
A graph is a mathematical structure that models a set of objects, pairs of which may or may not be connected by members of a set of edges (or links).
A graph can be described through two different sets of mathematical objects:
A set of vertices (also called nodes)
A set of edges that connect pairs of vertices
Graphs can be either directed or undirected.
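For example, a graph is commonly stored as an adjacency list; the sketch below (vertex names are illustrative) shows an undirected and a directed version of a small graph:

```python
# Adjacency-list representation of a small graph (illustrative sketch).

# Undirected: every edge appears in both directions.
undirected = {
    'A': ['B', 'C'],
    'B': ['A', 'C'],
    'C': ['A', 'B', 'D'],
    'D': ['C'],
}

# Directed: each edge has an orientation and appears only once.
directed = {
    'A': ['B', 'C'],   # A -> B, A -> C
    'B': ['C'],        # B -> C
    'C': ['D'],        # C -> D
    'D': [],
}

# The vertex set and edge set can be recovered from either dictionary.
vertices = set(directed)
edges = {(u, v) for u, nbrs in directed.items() for v in nbrs}
print(vertices, edges)
```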
Dynamic Programming is an improvement on Brute Force; see this example to understand how one can obtain a Dynamic Programming solution from Brute Force.
A Dynamic Programming Solution has 2 main requirements:
Overlapping Subproblems
Optimal Substructure
Overlapping Subproblems means that results of smaller versions of the problem are reused multiple times in order to arrive at the solution to the original problem.
Optimal Substructure means that there is a method of calculating a problem from its subproblems.
A Dynamic Programming solution has 2 main components: the State and the Transition.
The State refers to a subproblem of the original problem.
The Transition is the method to solve a problem based on its subproblems.
The time taken by a Dynamic Programming solution can be calculated as (number of states) * (transition time). Thus, if a solution has N^2 states and the transition is O(N), then the solution would take roughly O(N^3) time.
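As a small illustration of state and transition, consider the longest increasing subsequence problem (a standard example, not taken from the text above): the state is the length of the best subsequence ending at index i, and the transition scans all earlier indices, so there are N states with an O(N) transition, giving O(N^2) overall.

```python
def longest_increasing_subsequence(a):
    """Classic O(N^2) dynamic programming sketch.

    State:      dp[i] = length of the longest increasing subsequence
                that ends exactly at index i.
    Transition: dp[i] = 1 + max(dp[j]) over all j < i with a[j] < a[i].
    """
    n = len(a)
    dp = [1] * n
    for i in range(n):                 # one pass per state
        for j in range(i):             # O(N) transition per state
            if a[j] < a[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp, default=0)

print(longest_increasing_subsequence([3, 1, 4, 1, 5, 9, 2, 6]))  # 4
```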
Kruskal's Algorithm is a greedy algorithm used to find the Minimum Spanning Tree (MST) of a graph. A minimum spanning tree is a tree which connects all the vertices of the graph and has the minimum total edge weight.
Kruskal's algorithm does so by repeatedly picking the minimum-weight edge that is not already in the MST and adding it to the final result if the two vertices it connects are not yet connected in the MST; otherwise it skips that edge. A Union-Find data structure can be used to check whether two vertices are already connected in the MST or not; a sketch is shown below, after the note on greedy algorithms. A few properties of an MST are as follows:
An MST of a graph with n vertices will have exactly n-1 edges.

A greedy algorithm is an algorithm in which, at each step, we choose the most beneficial option without looking into the future. The choice depends only on the current profit.
The greedy approach is usually a good one when a profit can be picked up at every step, so that no choice blocks another one.
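Kruskal's algorithm described above is a classic example of this greedy approach. Here is a minimal Python sketch, assuming edges are given as (weight, u, v) tuples with vertices numbered 0..n-1; the Union-Find helper is a deliberately simple illustrative version:

```python
def kruskal_mst(num_vertices, edges):
    """Minimal Kruskal sketch: edges is a list of (weight, u, v) tuples."""
    parent = list(range(num_vertices))

    def find(x):
        # Find the representative of x's component (with path compression).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for weight, u, v in sorted(edges):      # greedy: cheapest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                        # u and v not yet connected
            parent[ru] = rv                 # union the two components
            mst.append((u, v, weight))
            total += weight
    return mst, total

edges = [(1, 0, 1), (3, 0, 2), (2, 1, 2), (4, 2, 3)]
print(kruskal_mst(4, edges))   # 3 edges, total weight 1 + 2 + 4 = 7
```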
Definition
The Big-O notation is at its heart a mathematical notation, used to compare the rate of convergence of functions. Let n -> f(n) and n -> g(n) be functions defined over the natural numbers. Then we say that f = O(g) if and only if f(n)/g(n) is bounded when n approaches infinity. In other words, f = O(g) if and only if there exists a constant A such that for all n, f(n)/g(n) <= A.
Actually, the scope of the Big-O notation is a bit wider in mathematics, but for simplicity I have narrowed it to what is used in algorithm complexity analysis: functions defined on the naturals that have non-zero values, with n growing to infinity.
What does it mean?
Let's take the case of f(n) = 100n^2 + 10n + 1 and g(n) = n^2. It is quite clear that both of these functions tend to infinity as n tends to infinity. But sometimes knowing the limit is not enough, and we also want to know the speed at which the functions approach their limit. Notions like Big-O help compare and classify functions by their speed of convergence.

Let's find out if f = O(g) by applying the definition. We have f(n)/g(n) = 100 + 10/n + 1/n^2. Since 10/n is 10 when n is 1 and is decreasing, and since 1/n^2 is 1 when n is 1 and is also decreasing, we have f(n)/g(n) <= 100 + 10 + 1 = 111. The definition is satisfied because we have found a bound of f(n)/g(n) (111), and so f = O(g) (we say that f is a Big-O of n^2).
This means that f tends to infinity at approximately the same speed as g. Now this may seem like a strange thing to say, because what we have found is that f is at most 111 times bigger than g, or in other words when g grows by 1, f grows by at most 111. It may seem that growing 111 times faster is not "approximately the same speed". And indeed the Big-O notation is not a very precise way to classify function convergence speed, which is why in mathematics we use the equivalence relationship when we want a precise estimation of speed. But for the purposes of separating algorithms in large speed classes, Big-O is enough. We don't need to separate functions that grow a fixed number of times faster than each other, but only functions that grow infinitely faster than each other. For instance if we take h(n) = n^2*log(n), we see that h(n)/g(n) = log(n), which tends to infinity with n, so h is not O(n^2), because h grows infinitely faster than n^2.
Now I need to make a side note: you might have noticed that if f = O(g) and g = O(h), then f = O(h). For instance in our case, we have f = O(n^3), and f = O(n^4)... In algorithm complexity analysis, we frequently say f = O(g) to mean that f = O(g) and g = O(f), which can be understood as "g is the smallest Big-O for f". In mathematics we say that such functions are Big-Thetas of each other.
How is it used?
When comparing algorithm performance, we are interested in the number of operations that an algorithm performs. This is called time complexity. In this model, we consider that each basic operation (addition, multiplication, comparison, assignment, etc.) takes a fixed amount of time, and we count the number of such operations. We can usually express this number as a function of the size of the input, which we call n. And sadly, this number usually grows to infinity with n (if it doesn't, we say that the algorithm is O(1)). We separate our algorithms into big speed classes defined by Big-O: when we speak about an "O(n^2) algorithm", we mean that the number of operations it performs, expressed as a function of n, is an O(n^2). This says that our algorithm is approximately as fast as an algorithm that would do a number of operations equal to the square of the size of its input, or faster. The "or faster" part is there because I used Big-O instead of Big-Theta, but usually people will say Big-O to mean Big-Theta.
When counting operations, we usually consider the worst case: for instance if we have a loop that can run at most n times and that contains 5 operations, the number of operations we count is 5n. It is also possible to consider the average case complexity.
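For instance, here is a small Python sketch of that kind of counting (the functions and constants are illustrative, not from the text):

```python
def sum_of_list(values):
    # Roughly a constant number of operations per iteration of the loop,
    # so about c * n operations in total: this is O(n).
    total = 0
    for v in values:        # runs n times
        total += v          # constant work per iteration
    return total

def count_values_in_common(a, b):
    # Two nested loops over lists of length n: about c * n * n
    # operations in total, i.e. O(n^2).
    count = 0
    for x in a:
        for y in b:
            if x == y:
                count += 1
    return count

print(sum_of_list([1, 2, 3]))                     # 6, counted in O(n)
print(count_values_in_common([1, 2, 3], [2, 3]))  # 2, counted in O(n^2)
```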
Quick note: a fast algorithm is one that performs few operations, so if the number of operations grows to infinity faster, then the algorithm is slower: O(n) is better than O(n^2).
We are also sometimes interested in the space complexity of our algorithm. For this we consider the number of bytes in memory occupied by the algorithm as a function of the size of the input, and use Big-O the same way.
Given a directed graph G, we often want to find the shortest distance from a given node A to the rest of the nodes in the graph. Dijkstra's algorithm is the most famous algorithm for finding the shortest path; however, it works only if the edge weights of the given graph are non-negative. Bellman-Ford, however, aims to find the shortest path from a given node (if one exists) even if some of the weights are negative. Note that the shortest distance may not exist if a negative cycle is present in the graph (in which case we can go around the cycle, resulting in an infinitely small total distance). Bellman-Ford additionally allows us to determine the presence of such a cycle.
The total complexity of the algorithm is O(V*E), where V is the number of vertices and E is the number of edges.
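A minimal Python sketch of Bellman-Ford under these assumptions (the graph is given as a list of (u, v, weight) edges with vertices numbered 0..V-1; names are illustrative):

```python
def bellman_ford(num_vertices, edges, source):
    """Shortest distances from source; edges is a list of (u, v, weight).
    Returns (distances, has_negative_cycle)."""
    INF = float('inf')
    dist = [INF] * num_vertices
    dist[source] = 0

    # Relax every edge V-1 times: O(V * E) overall.
    for _ in range(num_vertices - 1):
        for u, v, w in edges:
            if dist[u] != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w

    # One extra pass: if any edge can still be relaxed,
    # a negative cycle is reachable from the source.
    has_negative_cycle = any(
        dist[u] != INF and dist[u] + w < dist[v] for u, v, w in edges
    )
    return dist, has_negative_cycle

edges = [(0, 1, 4), (0, 2, 5), (1, 2, -3), (2, 3, 4)]
print(bellman_ford(4, edges, 0))   # ([0, 4, 1, 5], False)
```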
Trees are a sub-category or sub-type of node-edge graphs. They are ubiquitous within computer science because of their prevalence as a model for many different algorithmic structures that are, in turn, applied in many different algorithms.
When we analyze the performance of a sorting algorithm, we are primarily interested in the number of comparisons and exchanges.
Let E_N be the total average number of exchanges needed to sort an array of N elements. E_1 = 0 (we do not need any exchanges for an array with one element). The average number of exchanges needed to sort an N-element array is the sum of the average number of exchanges needed to sort an (N-1)-element array and the average number of exchanges needed to insert the last element into an (N-1)-element array.
Simplify the summation (arithmetic series)
Expand the term
Simplify the summation again (arithmetic series)
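Under the usual assumption that the newly inserted element is equally likely to land in any of the N positions, the recurrence described above and its solution can be sketched as:

```latex
E_1 = 0, \qquad
E_N = E_{N-1} + \frac{1}{N}\sum_{j=0}^{N-1} j = E_{N-1} + \frac{N-1}{2},
\qquad\text{so}\qquad
E_N = \sum_{i=2}^{N} \frac{i-1}{2} = \frac{1}{2}\sum_{i=1}^{N-1} i
    = \frac{N(N-1)}{4}.
```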
Let C_N be the total average number of comparisons needed to sort an array of N elements. C_1 = 0 (we do not need any comparisons for a one-element array). The average number of comparisons needed to sort an N-element array is the sum of the average number of comparisons needed to sort an (N-1)-element array and the average number of comparisons needed to insert the last element into an (N-1)-element array. If the last element is the largest element, we need only one comparison; if the last element is the second smallest element, we need N-1 comparisons. However, if the last element is the smallest element, we do not need N comparisons: we still need only N-1 comparisons. That is why we subtract 1/N in the equation below.
Simplify the summation (arithmetic series)
Expand the term
Simplify the summation again (arithmetic series and harmonic number)
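Under the same assumption, a sketch of the comparison recurrence (the -1/N term accounts for the smallest-element case described above) and its behaviour for large N:

```latex
C_1 = 0, \qquad
C_N = C_{N-1} + \frac{1}{N}\sum_{j=1}^{N} j - \frac{1}{N}
    = C_{N-1} + \frac{N+1}{2} - \frac{1}{N},
\qquad\text{so}\qquad
C_N = \sum_{i=2}^{N}\left(\frac{i+1}{2} - \frac{1}{i}\right)
    = \frac{(N-1)(N+4)}{4} - (H_N - 1) \approx \frac{N^2}{4}.
```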
The Travelling Salesman Problem is the problem of finding the minimum cost of travelling through N vertices, visiting each vertex exactly once. There is a cost cost[i][j] to travel from vertex i to vertex j.
There are 2 types of algorithms to solve this problem: Exact Algorithms and Approximation Algorithms.
Exact Algorithms
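A minimal brute-force sketch of an exact solution in Python, assuming cost is given as an N x N matrix and that the tour starts and ends at vertex 0 (that starting point is an assumption; the text above does not fix one):

```python
from itertools import permutations

def tsp_brute_force(cost):
    """Exact but exponential: tries every ordering of the other vertices.
    cost[i][j] is the cost to travel from vertex i to vertex j."""
    n = len(cost)
    best = float('inf')
    for perm in permutations(range(1, n)):
        route = (0,) + perm + (0,)
        total = sum(cost[a][b] for a, b in zip(route, route[1:]))
        best = min(best, total)
    return best

cost = [
    [0, 1, 4, 6],
    [1, 0, 2, 5],
    [4, 2, 0, 3],
    [6, 5, 3, 0],
]
print(tsp_brute_force(cost))   # 12 for this small example
```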
Approximation Algorithms
To be added
Sometimes Quicksort is also known as Partition-Exchange sort.
Auxiliary Space: O(n)
Time complexity: worst O(n²), best O(n log n)
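A minimal Python sketch of quicksort (the Lomuto partition scheme and last-element pivot used here are one common choice among several):

```python
def quicksort(a, lo=0, hi=None):
    """In-place quicksort: partition around a pivot, then recurse."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    # Lomuto partition with the last element as pivot.
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]   # exchange step
            i += 1
    a[i], a[hi] = a[hi], a[i]         # put the pivot in its final place
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)
    return a

print(quicksort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```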
The Knapsack problem mostly arises in resource allocation mechanisms. The name "Knapsack" was first introduced by Tobias Dantzig.
Auxiliary Space: O(nw)
Time Complexity: O(nw)
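A minimal Python sketch of the classic 0/1 knapsack table, where n is the number of items and w the capacity (variable names are illustrative):

```python
def knapsack(weights, values, capacity):
    """0/1 knapsack with the classic O(n*w) table.
    dp[i][c] = best value using the first i items with capacity c."""
    n = len(weights)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            dp[i][c] = dp[i - 1][c]                  # skip item i-1
            if weights[i - 1] <= c:                  # or take it
                dp[i][c] = max(dp[i][c],
                               dp[i - 1][c - weights[i - 1]] + values[i - 1])
    return dp[n][capacity]

print(knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7))  # 9
```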
Pseudocode is by definition informal. This topic is meant to describe ways to translate language-specific code into something everyone with a programming background can understand.
Pseudocode is an important way to describe an algorithm and is more neutral than giving a language-specific implementation. Wikipedia often uses some form of pseudocode when describing an algorithm.
Some things, like if-else type conditions, are quite easy to write down informally. But other things, js-style callbacks for instance, may be hard to turn into pseudocode for some people.
This is why these examples may prove useful.
Definition 1: An optimization problem Π consists of a set of instances ΣΠ. For every instance σ∈ΣΠ there is a set Ζσ of solutions and an objective function fσ : Ζσ → ℜ≥0 which assigns a positive real value to every solution.
We say OPT(σ) is the value of an optimal solution, A(σ) is the solution of an algorithm A for the problem Π, and wA(σ)=fσ(A(σ)) is its value.
Definition 2: An online algorithm A for a minimization problem Π has a competitive ratio of r ≥ 1 if there is a constant τ∈ℜ with
wA(σ) = fσ(A(σ)) ≤ r ⋅ OPT(σ) + τ
for all instances σ∈ΣΠ. A is then called an r-competitive online algorithm. If even
wA(σ) ≤ r ⋅ OPT(σ)
holds for all instances σ∈ΣΠ, then A is called a strictly r-competitive online algorithm.
Proposition 1.3: LRU and FWF are marking algorithms.
Proof: At the beginning of each phase (except for the first one) FWF has a cache miss and clears the cache. That means we have k empty pages. In every phase at most k different pages are requested, so there will be no eviction during the phase. So FWF is a marking algorithm.
Let's assume LRU is not a marking algorithm. Then there is an instance σ where LRU evicts a marked page x in phase i. Let σt be the request in phase i at which x is evicted. Since x is marked, there has to be an earlier request σt* for x in the same phase, so t* < t. After t*, x is the newest page in the cache, so for x to be evicted at t, the sequence σt*+1,...,σt has to request at least k pages different from x. That implies that phase i requests at least k+1 different pages, which is a contradiction to the phase definition. So LRU has to be a marking algorithm.
Proposition 1.4: Every marking algorithm is strictly k-competitive.
Proof: Let σ be an instance of the paging problem and l the number of phases of σ. If l = 1 then every marking algorithm is optimal, and the optimal offline algorithm cannot be better.
We assume l ≥ 2. The cost of every marking algorithm for instance σ is bounded from above by l ⋅ k, because in every phase a marking algorithm cannot evict more than k pages without evicting one marked page.
Now we show that the optimal offline algorithm evicts at least k+l-2 pages for σ: k in the first phase and at least one in every following phase except for the last one. For the proof, let's define l-2 disjoint subsequences of σ. Subsequence i ∈ {1,...,l-2} starts at the second position of phase i+1 and ends with the first position of phase i+2.
Let x be the first page of phase i+1. At the beginning of subsequence i, page x and at most k-1 other different pages are in the optimal offline algorithm's cache. In subsequence i there are k page requests different from x, so the optimal offline algorithm has to evict at least one page for every subsequence. Since at the beginning of phase 1 the cache is still empty, the optimal offline algorithm causes k evictions during the first phase. That shows that
wA(σ) ≤ l⋅k ≤ (k+l-2)k ≤ OPT(σ) ⋅ k
Corollary 1.5: LRU and FWF are strictly k-competitive.
If there is no constant r for which an online algorithm A is r-competitive, we call A not competitive.
Proposition 1.6: LFU and LIFO are not competitive.
Proof: Let l ≥ 2 be a constant and k ≥ 2 the cache size. The different cache pages are numbered 1,...,k+1. We look at the following sequence:
First page 1 is requested l times, then page 2, and so on. At the end there are (l-1) alternating requests for pages k and k+1.
LFU and LIFO fill their cache with pages 1-k. When page k+1 is requested, page k is evicted and vice versa. That means every request of the subsequence (k,k+1)^(l-1) evicts one page. In addition, there are k-1 cache misses for the first use of pages 1-(k-1). So LFU and LIFO evict exactly k-1+2(l-1) pages.
Now we must show that for every constant τ∈ℜ and every constant r ≥ 1 there exists an l so that
which is equal to
To satisfy this inequality you just have to choose l sufficiently big. So LFU and LIFO are not competitive.
Proposition 1.7: There is no r-competitive deterministic online algorithm for paging with r < k.
The first thing you have to do in this algorithm is to sort all of the vertices of the graph in descending order according to their degree.
After that, iterate over them and add each vertex that doesn't have any adjacent vertex already in the final vertex set to that set.
In the final stage, iterate over the final vertex set and remove every vertex that has one of its adjacent vertices in this set. A sketch of these steps is shown below.
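A literal Python sketch of these three steps, assuming the graph is given as a dictionary mapping each vertex to the set of its neighbours (the representation and names are illustrative):

```python
def greedy_vertex_selection(adj):
    """Follows the three steps above literally.
    adj maps each vertex to the set of its neighbours."""
    # Step 1: sort vertices by degree, highest degree first.
    ordered = sorted(adj, key=lambda v: len(adj[v]), reverse=True)

    # Step 2: add a vertex only if none of its neighbours is already chosen.
    chosen = set()
    for v in ordered:
        if not (adj[v] & chosen):
            chosen.add(v)

    # Step 3: drop any chosen vertex that still has a neighbour in the set.
    return {v for v in chosen if not (adj[v] & chosen)}

adj = {
    'A': {'B', 'C'},
    'B': {'A'},
    'C': {'A'},
    'D': set(),
}
print(greedy_vertex_selection(adj))  # e.g. {'A', 'D'}
```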
Memoization - an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
Dynamic Programming - a method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions.
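As a small illustration of memoization (using the standard recursive Fibonacci example, which is not taken from the text above):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Naive recursive Fibonacci, but each result is cached, so every
    value of n is computed only once: O(n) instead of exponential time."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025, computed instantly thanks to the cache
```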