This guide acts as a follow-up to the talk Binary vs Interpolation Search.
Time complexity and search algorithms. A walk through their definition, purpose, use cases and constraints. A search algorithm is a technique used to locate an item in a certain data structure.
Introduction
A search algorithm is a technique used to locate an item in a certain data structure.
Before we begin, you should have an understanding of the below:
- Arrays
- Iteration and recursion
Let's give an overview of time complexity and what it entails.
Time Complexity
A way to show how the runtime of a function increases.
Broadly speaking, the time complexity of a program is grouped into three:
- Linear time
- Constant time
- Quadratic time
Linear Time O(n)
The processing time of a program will increase as the size of the input increases.
Example We can create a program that processes a list of items stored in an array. A case example can be a program to process locally stored files.
Constant Time O(1)
Here, no matter what input we parse to our program, we get the results in the same time frame. Hence, a process taking 10_000_000 items returns in the same time as the one taking in 7 items.
Quadratic time O(n2)
Where n2 is n squared. As the name suggests, a program will return results in a time period that resembles a quadratic curve. This time is common with multi-dimensional arrays/lists of lists, since:
- We loop over the list(say A) to get individual lists
- We loop over a specific list(say B) in this list to get the item and perform an operation on it.
For reference, the equation:
a + bx + c
Example: A program to calculate the product of integers in an array.
int array_products(int[] a){
int product = 0; // O(1)
for(int number: a){
product = product * number; // O(1)
}
return product; // O(1)
}
We can get an overview of how long a function will take, based on how large of an array we have in this particular function/ method.
In our case, since we can get the time complexity of this function as below:
O(1) + n * O(1) + O(1)
Since we are adding a value to the summation x number of times, we have O(1) happening for each of those occurrences. Thus n O(1)*.
Since a constant + another constant is still a constant.
O(1) + n * O(1)
Get the fastest-growing term from the equation, i.e:
n * O(1)
Then remove the coefficient to get O(n)
(linear time i.e, the function takes longer to complete the larger the array)
Binary Search
Given a sorted array, binary search locates an item in question, using the 'divide and conquor'
Time complexity: O(log n)
Binary Search pseudocode
- Get the first and last item of the sorted array.
- Add the first and last item indexes and divide by 2
- Get the element in the middle from the step above i.e middle
- If the value at the middle index is equal to the searched item, return the index
- If the value at the index is greater than the searched item, set the middle item to be the new high and discard the right
- If the value at the index is less than searched item, make the item the new low and discard the left side
- Repeat the process until the index is obtained
- Return -1 if the item is absent
Visual representation
Implementation of binary search
int search( int[] a, target){
int arrayLength = a.length;
int left = 0;
int right = a.length;
while(left <= right){
int middle = (left + right) / 2;
if(a[middle] == target){
return middle;
}
if(target < a[middle]){
right = middle - 1;
}else{
left = middle + 1;
}
return -1;
}
Applications of Binary Search
- Database indexing
Within your database of choice, a binary tree will be used where it will precompute the middle point on each step, until reaching the single item. More details of database indexing can be found here
- 3D games and applications.
Binary space partitioning occurs as space is divided into a tree structure and a binary search is used to retrieve which subdivisions to display according to a 3D position and camera.
Git debugging with git bisect
Autocomplete search
Consider a user typing in a URL on a browser. Given a history of searches, instead of going through each one and comparing, a binary search is used to perform a faster lookup across the history.
- Searching for prefixes
Instead of an array of integers, you can have an array of strings where you intend to search for an item that starts with a certain string or character sequence.
Going through a dictionary to find a word
Git cherry picking
A clever example of where this is used is by Sarv Shakti, who used it to identify what commit broke the application
Example: Consider a scenario where you have a list of maps storing student records in your classroom. Given a name, or phone number, which algorithm would you use to search through this? What if the records got to 10_000 or more? How would this grow?
Interpolation Search
This is an improvement over binary search. Gets an element from where it is more likely to exist.
Time complexity: O(log log N)
Constraints of interpolation search
- The array should be sorted
- The array should be uniformly distributed.
Note: Uniform distribution in an array implies that the difference between subsequent elements should be relatively equal. In this case, an array [10, 30, 300000, 23_000_000] would fail to meet this condition.
Example: Searching for the word 'zelda' in a dictionary of words from letters 'a' to 'z'.
Instead of starting from the middle item in the array, we go closer to the end of the array.
Time complexity and its relation to interpolation search
A walk through the equations
y = m*x + c
arr[index] = m*index + c
where: index: the position of the element in the array arr[index]: the value at this index in the array
arr[low] = m*low + c —– (1)
arr[high] = m*high + c —– (2)
Subtract equation (1) from (2), and we get
arr[high] – arr[low] = m * (high – low)
m = (arr[high] – arr[low]) / (high – low) —– (3)
Say we are searching for an element, K in this array.
arr[index] = m*index + c
Then replacing target in the above equation, we get
target = m*index + c —– (4)
target – arr[low] = m * (index – low)
index – low = (target – arr[low]) / m
index = low + (target – arr[low]) / m
index = low + (target - arr[low])*(high - low) / (arr[high] - arr[low])
Example
Array to search through:
[32, 35, 37, 39, 42, 44, 46, 48]
We will search for target = 42
in the above array.
The size of the array is 8
.
low = 0
,
high = 7
,
arr[low] = 32
,
arr[high] = 48
index = 0 + (42 - 32) * (7 - 0) / (48 - 22) = 2.6923 ~ 2
arr[2] = 37 < 42
Remember to floor the value of the index
Interpolation search pseudocode
- Calculate the Index using the interpolation index formula.
- If the target is smaller than the item at that index, search in the lower sub-list. Calculate the index for the left sub-array. Let low = index - 1
- If the target is greater than the item at that index, search in higher sub-list. Calculate the position of the right sub-array by assigning low = index + 1.
- If the target is a match, return the index of the item from the array and exit.
- If it is not a match, probe position.
- Repeat until the match or array is eventually 0
Implementation of Interpolation search
int search(int[] array, int target) {
int n = array.length;
int low = 0;
int high = n - 1;
while (low <= high) {
int index = low + ((target - array[low]) * (high - low)) / (array[high] - array[low]);
if (array[index] < target) {
low = index + 1;
} else if (array[index] > target) {
high = index - 1;
} else {
return index;
}
}
return -1;
}
Applications of interpolation search
Interpolation search can be used in the same areas where binary search is used. The only caveat is that this data must be uniformly distributed. A failure to meet this condition will give you a time complexity of O(n).
Benchmark
Comparatively, given uniformly distributed data, this difference might occur as below:
Resources and questions
These are locations to practice your data structures along with links to various challenges that may hold the above mentioned concepts or pivot them accordingly.
- The two sum problem
- How many numbers are smaller than the current number
- Middle of a linked list
- Container with most water
- OpenSource collection - TheAlgorithmCore
Other types of search algorithms
- Linear search / sequential search
- Breadth-first search
- Depth-first search
- Jump search
- Exponential search
- Fibonacci Search
Conclusion
There are more search algorithms out there. This is by no means a comprehensive list, and by no means are you expected to know all of them. A single algorithm can be used as the basis for other searches. With that said, there is no silver bullet. You choose what works based on the constraints at hand.
References:
Mohammed, A. S., Amrahov, Ş. E., & Çelebi, F. V. (2021). Interpolated binary search: An efficient hybrid search algorithm on ordered datasets. Engineering Science and Technology, an International Journal, 24(5), 1072–1079. doi.org/10.1016/j.jestch.2021.02.009
Manolopoulos, Y., Theodoridis, Y., & Tsotras, V. (2012). Advanced Database Indexing. In Google Books. Springer Science & Business Media. books.google.com/books?hl=en&lr=&id..
Originally published at marvinkweyu.net/talks/binary_vs_interpolati..