Notes by Brandon Yu (brandon.yu@digipen.edu)
DISCLAIMER: It should be obvious, but Prahbu’s notes should always have higher priority than these notes. These should purely be supplementary to the slides. When in doubt, assume that I’m wrong.
Teacher: Prabhu (prabhu.n@digipen.edu)
Data structures are a particular way of organizing data in a computer so that it can be used efficiently (Wikipedia).
What we need to know about structures and their use in algorithms:
One of the simplest data structures, the queue has a beginning and end, called the front and back of the queue.
Data enters in one end, goes out the other, like a FIFO structure, or like a checkout line in a supermarket.
Contiguous.
Bad for sorting - think about how you have to move through each element in an array.
A Binary Tree is organized like an upside down tree. Topmost (first node) is the root node. Pointer to the root node represents the tree.
Each “node” on the tree has three things:
Non-contiguous.
Very good for sorting (“Left” can be “smaller” and “Right” can be “larger”).
As a result, a binary tree has no duplicates, which is fine for its usage as an indexing tool. The exception is for when you have a “counter” to count duplicate values, but that’s one extra data type in each node.
You know what a linked list is.
Each node has data and a pointer to the next node. Last one has a pointer to null. Super simple.
Non-contiguous. (Obviously)
Food for thought - what happens when you delete a node in a linked list (have the previous one point to the next next one?) In an array, deleting something would result in an empty space (if you want to do it “efficiently”), but for a linked list it is easy to “reduce” the total amount of memory used.
Should Allocate(size) use best fit, first fit, next fit or worst fit?
All this so far has been external fragmentation. (Maybe? Unclear, have to do research.)
TODO: RESEARCH INTO INTERNAL AND EXTERNAL FRAGMENTATION - STILL UNCLEAR.
Assuming we have no alignment:
struct Student1{ int Age; //4 bytes char I1; //1 byte char I2; // 1 byte}
Student1 * pS = new Student1[2];
//Let's say each word is 4 bytes.
//If I ask for pS[0].Age, 1 operation is done.ps[0].I1, 1 operation is also done.pS[1].Age, first two bytes of age is in second word, last two bytes of age are in third word. So 2 operations are required - second AND third words have to be accessed.
struct Student1{ char I1; //1 byte int Age; //4 bytes char I2; // 1 byte}
pS[0].Age -> 2 operations
pS[0].I1 -> 1 operation
pS[0].I2 -> 1 operation
pS[1].Age -> 2 operations
Solution:
Variables important to analyzing algorithms:
Big-O Notation assumes the worst (longest) asymptotic computational time. So, in a conditional, it will go with the worst case every time.
For Big-O Notation, all operations take a constant time. Big-O Notation is a theoretical analysis, and does not actually glean the actual computing time required for the operation.
The dominant term is the term the one that gets biggest (i.e. dominates) as N gets bigger. So in O(n^2+2n+100), n^2 is the dominant term. An easy way to think about it is that if you make N go towards infinity, what encapsulates the vast majority of the computation time? That is the dominant term.
If you get confused, just assume n is some constant like 10 and try to calculate the value…
Useful Links:
https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
https://stackoverflow.com/questions/2307283/what-does-olog-n-mean-exactly
O(1) describes an algorithm that will always execute in the same time (or space) regardless of the size of the input data set.
O(N) describes an algorithm whose performance will grow linearly and in direct proportion to the size of the input data set. The example below also demonstrates how Big O favours the worst-case performance scenario; a matching string could be found during any iteration of the for loop and the function would return early, but Big O notation will always assume the upper limit where the algorithm will perform the maximum number of iterations.
O(N^2) describes an algorithm whose performance will grow linearly and in direct proportion to the size of the input data set. The example below also demonstrates how Big O favours the worst-case performance scenario; a matching string could be found during any iteration of the for loop and the function would return early, but Big O notation will always assume the upper limit where the algorithm will perform the maximum number of iterations.
bool ContainsDuplicates(IList<string> elements)
{
for (var outer = 0; outer < elements.Count; outer++)
{
for (var inner = 0; inner < elements.Count; inner++)
{
// Don't compare with self
if (outer == inner) continue;
if (elements[outer] == elements[inner]) return true;
}
}
return false;
}
O(2^N) denotes an algorithm whose growth doubles with each additon to the input data set. The growth curve of an O(2^N) function is exponential - starting off very shallow, then rising meteorically. An example of an O(2N) function is the recursive calculation of Fibonacci numbers.
int Fibonacci(int number)
{
if (number <= 1) return number;
return Fibonacci(number - 2) + Fibonacci(number - 1);
}
FASTER (less dominant)
O(1)
O(log n)
O(n)
O(n log n)
O(n^2)
O(n^3)
O(n^k) - Polynomial
O(2^n) - Exponential
SLOWER (more dominant)
g() <- O(N)
for (j=0;j<N;++j) g(j) <- O(N^2)
Theres Big-Oh, Big-Theta, Big-Omega, but the latter two are described in CS330, not here.
Note that for this list, it’s specifically talking about when you have a subscript.
Great website suggested by Louis:
http://bigocheatsheet.com/
From http://bigocheatsheet.com/:
Note: These are just methods I use to memorize. Feel free to ignore this section.
Bubble Sort: Elements “bubble” to the top of the list, by continuous comparisons and swaps. “Bubbles” on the top are regarded as sorted.
Selection Sort: Two groups, sorted and unsorted. The smallest from the
unsorted (which is still bigger than the largest in the sorted) is selected and put in the sorted at the end of the sorted group.
Insertion Sort: Two groups, sorted and unsorted. The first in the unsorted is inserted into the sorted group.
Merge Sort: Every element is an individual group at the start. Sorted elements are merged then sorted.
Quick Sort: I dunno how the fuck to memorize this one easily. gg.com.sg
Example of the usage of a sorting algorithm:
a1 -> a2 -> a5 -> a3 -> a4
Goes through algorithm (Sort)
a1-> a2 -> a3 -> a4 -> a5
i+j==9
. A weaker invariant that is also true is that i >= 0 && i < 10
(because this is the continuation condition!) or that j <= 9
&& j >= 0
.int j = 9;
for(int i=0; i<10; i++)
j--;
https://visualgo.net/bn/sorting is extremely useful for visualization of the different sorts.
Assume an input array of
5 3 8 4 6
^ ^
First, compare the ith element and the (i+1)th element.
5 is more than 3, so let’s do a swap.
3 5 8 4 6
^ ^
Keep increasing i, iterating through…
3 5 8 4 6
^ ^
3 5 4 8 6
^ ^
3 5 4 8 6
^ ^
3 5 4 6 8 <- Note the highest number is in the last place.
That’s one pass.
Now the highest element (8) is in the correct position, at the end. So we ignore it on the next pass. Similarly, on the next pass we ignore the second last, and so on.
Keep doing passes until the array is sorted.
void bubbleSort(int arr[], int n)
{
int i, j;
for (i = 0; i < n-1; i++)
// Last i elements are already in place
for (j = 0; j < n-i-1; j++)
if (arr[j] > arr[j+1])
swap(&arr[j], &arr[j+1]);
}
void bubbleSort(int arr[], int n) {
bool swapped = true;
int j = 0;
int tmp;
while (swapped) {
//Note the boolean - it's important in this adaptive implementation
swapped = false;
j++;
for (int i = 0; i < n - j; i++) {
if (arr[i] > arr[i + 1]) {
tmp = arr[i];
arr[i] = arr[i + 1];
arr[i + 1] = tmp;
swapped = true;
}
}
}
}
(From Wikipedia)
The algorithm divides the input list into two parts: the
sublist of items already sorted, which is built up from left to right at
the front (left) of the list, and the sublist of items remaining to be
sorted that occupy the rest of the list. Initially, the sorted sublist
is empty and the unsorted sublist is the entire input list. The
algorithm proceeds by finding the smallest ((or largest, depending on
sorting order) element in the unsorted sublist, exchanging (swapping) it
with the leftmost unsorted element (putting it in sorted order), and
moving the sublist boundaries one element to the right.
/* a[0] to a[n-1] is the array to sort */
int i,j;
int n;
/* advance the position through the entire array */
/* (could do j < n-1 because single element is also min element) */
for (j = 0; j < n-1; j++)
{
/* find the min element in the unsorted a[j .. n-1] */
/* assume the min is the first element */
int iMin = j;
/* test against elements after j to find the smallest */
for (i = j+1; i < n; i++) {
/* if this element is less, then it is the new minimum */
if (a[i] < a[iMin]) {
/* found new minimum; remember its index */
iMin = i;
}
}
if(iMin != j)
{
swap(a[j], a[iMin]);
}
}
From Wikipedia:
Insertion sort iterates, consuming one input element each
repetition, and growing a sorted output list. At each iteration,
insertion sort removes one element from the input data, finds the
location it belongs within the sorted list, and inserts it there. It
repeats until no input elements remain.
Sorting is typically done in-place, by iterating up the array,
growing the sorted list behind it. At each array-position, it checks the
value there against the largest value in the sorted list (which happens
to be next to it, in the previous array-position checked). If larger,
it leaves the element in place and moves to the next. If smaller, it
finds the correct position within the sorted list, shifts all the larger
values up to make a space, and inserts into that correct position.
Main Idea: Keep inserting next element into the sorted sequence.
void InsertionSort (int a[], int N) //a is array, N is size of said array.
{
for(int i=1;i<N;++i)
{
int j=i; //Get first element from the unsorted side.
int current = a[i]; //Set current variable to the first element value.
//Find the correct position to insert into.
while((j>0) //While j is still within bounds and
&& (a[j-1]>current)) //The current value is more than the current
{
a[j]=a[j-1];
--j;
}
//Insert.
a[j] = current;
}
}
}
So a quick primer on recursive functions. When doing recursive functions, the parameter and signature of the function call will be added to said stack (called activation records, which are then executed in reverse order).
In other words, you’re fucking up your call stack and by extension, your memory usage. You should know this. For recursive functions, the Big-O Complexity required can be calculated by splitting the recursive function into separate recursive functions.
Reason this is relevant is that Merge Sort primarily uses recursion to sort.
Merge Sort is basically just the two steps repeated continuously.
To be more precise, we first divide the array into its elements. Then we merge them together into pairs in the correct order.
We start with
38 2 43 3 < Input
We divide them into 2 sections...
38 27 43 3 < Input
Sort each section using the below method (It's clearer later)
27 38 3 43 < Input
Okay, now choose smallest from the 2 lowest, 27 and 3.
3 is the smallest, so we strike that off the list.
27 38 43 < Input
3 < Output
Now we pick between 27 and 43.
38 43 < Input
3 27 < Output
Do it again.
43 < Input
3 27 38 < Output
Finally...
3 27 38 43 < Output
split each element into partitions of size 1
recursively merge adjancent partitions
for i = leftPartStartIndex to rightPartLastIndex inclusive
if leftPartHeadValue <= rightPartHeadValue
copy leftPartHeadValue
else: copy rightPartHeadValue
copy elements back to original array
void DoMergeSort ( int a[], int left, in right) {
if(left<right) {
unsigned const middle = (left+right)/2;
DoMergeSort(a, left, middle)
DoMergeSort(a, middle+1, right);
Merge(a, left, middle, right);
}
}
void Merge(int array[], int left, int middle, int right){ //right is the location of the end element of the right array, and left is the location of the end location of the left array, NOT one past the last.
unsigned i = left; // counter for the temp array
unsigned j = left; // counter for left array
unsigned k = middle + 1; // counter for right array
int* temp = new int [right+1];
while (j<=middle && k <=right)
if (array[j] <= array[k])
temp[i++] = array[j++];
else
temp[i++] = array[k++];
//Handle the leftovers, one of these two will be excuted.
while (j <= middle)
temp[i++] = array[j++];
while (k <= right)
temp[i++] = array[k++];
//Copy from temp array to input array.
for (i=left; i <= right; ++i)
array[i] = temp[i];
delete [] temp;
}
Complexity calculation:
Best Case:
Worst Case:
(From https://en.wikipedia.org/wiki/Quicksort)
Quicksort is a divide and conquer algorithm. Quicksort first divides a large array into two smaller sub-arrays: the low elements and the high elements. Quicksort can then recursively sort the sub-arrays.
The steps are:
Pick an element, called a pivot, from the array.
Partitioning: reorder the array so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
Recursively apply the above steps to the sub-array of elements with smaller values and separately to the sub-array of elements with greater values.
The base case of the recursion is arrays of size zero or one, which never need to be sorted.
The pivot selection and partitioning steps can be done in several different ways; the choice of specific implementation schemes greatly affects the algorithm’s performance.
(From VisualAlgo)
for each (unsorted) partition
set first element as pivot
storeIndex = pivotIndex + 1
for i = pivotIndex + 1 to rightmostIndex
if element[i] < element[pivot]
swap(i, storeIndex); storeIndex++
swap(pivot, storeIndex - 1)
An Abstract Data Type is a data type (a set of values and a collection of operations on those values) that is accessed only through an interface (Hence, abstract, since functionality is hidden).
We refer to a program that uses an ADT as a client and a program that specifies the data type as an implementation.
If you want to grow your stack size and it’s not a linked-list stack, then it’s
TODO
TODO
TODO
Fast insertion/deletion/search.
Nodes are also known as vertexes.
An Edge is the connection (line) between one node to another.
Root node is the first node.
Leaf nodes are nodes without any children.
Parent node and Child node are also common terminology, but should be obvious.
Sibling nodes are nodes that share a common parent.
There’s also uncle nodes and grandfather nodes and grandgrandfather nodes but those aren’t gone into much detail.
Height of a tree is the “levels”, but not including a root node. (The level for just a root node is considered to be 0)
Sub-tree is a part of a tree including that node and its children and its childrens’ children, and so on.
Binary tree is at most 2 child nodes. (One is called left child node, one is called right child node) (Can have ONE child node)
M-ary Tree can be any number of child nodes.
Only 1 parent per node.
No cycles in the tree (no closed loops in the tree.) (Usually this breaks the above rule anyway)
No multiple paths from one node to another node.
Root node to leaf node there should be only one path
Level = 0, if it is root node. Level = Level (parent)+1, if it is a child node of some parent node.
Balanced in terms of height.
Balance factor
The constraint is generally applied recursively to every subtree. That is, the tree is only balanced if:
According to this, the next tree is balanced:
A
/ \
B C
/ / \
D E F
/
G
The next one is not balanced because the subtrees of C differ by 2 in their height:
A
/ \
B C <-- difference = 2
/ /
D E
/
G
That said, the specific constraint of the first point depends on the type of tree. The one listed above is the typical for AVL trees.
Red-black trees, for instance, impose a softer constraint.
A Binary Tree is full if every node has either 0 or 2 child nodes.
Trees have “left and right”, linked list is “previous and next”.
A is parent, B is left child, C is right child.
Different orders of traversal are useful for different sorting methods.
B,A,C
Algorithm In-order(tree)
A,B,C
Algorithm Pre-order(tree)
B,C,A
Algorithm Postorder(tree)
A Binary Search Tree (BST) is a tree in which all the nodes follow the below-mentioned properties −
The left sub-tree of a node has a key less than or equal to its parent node's key.
The right sub-tree of a node has a key greater than to its parent node's key.
Thus, BST divides all its sub-trees into two segments; the left sub-tree and the right sub-tree and can be defined as −
left_subtree (keys) ≤ node (key) ≤ right_subtree (keys)
External Nodes or Terminal Nodes are Leaf Nodes
Everything else is an internal node.
A binary tree with N internal nodes has N+1 external nodes (some may be empty/NULL).
A binary tree with N internal nodes has 2N links.
Does not have duplicate values.
A binary tree does not have to be balanced.
When the data is skewed all the way to left or right.
4 Cases:
Set parents pointer to this node to be null
Release the memory of the node.
Replace the node's data (including pointers) with the left/right child. Then delete the child's memory.
4 is trickiest. To solve it, we need the predecessor of the node, which is just going to be the largest node in its left sub-tree
Then, replace node X’s data with the predecessor’s data (NOT the pointers)
Then delete that node. Try to use the previous cases from there.
Promotion and rotation are the same thing.
https://en.wikipedia.org/wiki/Tree_rotation
In the above example, right rotation is P being promoted (going “up”)
AVL Tree is a balanced BST. For all nodes in the BST:
| Height of left sub tree - Height of right subtree | = 1 or 0.
http://www.geeksforgeeks.org/avl-tree-set-1-insertion/
Note that this doesn’t have to be remembered conventionally - it should make sense. You’re essentially taking the largest tree and reducing its size
Deletion is the same thing as insertion, with going in reverse order of the nodes, except instead of initial insertion, you delete.
Splay Trees are BSTs that are not balanced at all. They are great for operations that require a lot of cache look-ups.
Advantages:
Disadvantage:
Newly inserted elements into a splay tree are promoted to the root (hence why the locality of reference).
This is done similar to the AVL tree, with a list of nodes in the history.
We accomplish this 2 nodes at a time. If there is only 1 node ( the root ) left, just promote as per normal.
Know the applications of all of the data structures
Check whether something is BST: In-order traversal.