Binary Indexed Tree (Fenwick Tree)

Binary Indexed Trees (also known as Fenwick Trees) are useful for finding sum ranges between two elements in a set of numbers, particularly when we need to update the numbers often. If we have a static set of values then we can just pre-compute cumulative sums and then to get a range between i and j we would do sums[j]-sums[i-1]. However, if one of the numbers changes then we have to update O(n) values in the sums array. The Binary Indexed Tree (BIT) is useful as we can quickly update values in O(log n) time and quickly query sum ranges in O(log n) time. It is quite compact as it just uses an array with an element for each number in our input set. The logic for updating and reading is also concise with just a few lines for each.

I have created a slideshow with the steps to create a BIT and the steps to query the BIT to calculate sum ranges. Click through to see the actual slideshow:
https://adamk.org/BinaryIndexedTreeAnimation/BinaryIndexedTree.html

Additional BIT information:
https://www.topcoder.com/community/competitive-programming/tutorials/binary-indexed-trees/
https://www.youtube.com/watch?v=CWDQJGaN1gY (Tushar Roy)

Posted in programming | Tagged , | Leave a comment

Boyer-Moore Majority Vote Algorithm

If we need to find a number that appears in more than half of the elements in an array, we can do so in O(n) time and O(1) space using the Boyer-Moore majority vote algorithm. We use a variable to keep track of a nominated number (nomnum) and another variable to keep track of a count for that number. We iterate through the array and for each number we do:

  • If count is zero then set nomnum to the current array number and count to 1.
  • Otherwise, if current array number == nomnum then we increment count.
  • Otherwise decrement count.

Here’s a chart for an example run from wikipedia:

Here’s the c++ source code:

If we are not guaranteed to have a majority element then we can do a further pass of the array to determine if the nominated number is a majority element.

It is also possible to find up to two majority elements that appear more than n/3 times in an array. In the algorithm we use two nominated number variables and two counter variables. The key difference is that if the counts are not zero and the current array number is different from the current numbers then we need to decrement both counts. Here is the source code:

Posted in programming | Tagged , | Leave a comment

Smart pointers

Disclaimer: This is post is just showing example smart pointer implementations for illustrative purposes. In production code I would recommend using the standard library unique_ptr and shared_ptr instead.

A basic smart pointer will wrap a pointer, so the object gets destroyed that when the pointer goes out of scope. This is particularly useful if there are returns in nested if statements or there are exceptions. We don’t need to have separate delete’s in every possible exit path. Here is a basic implementation:

I’ve left out a few things to keep it simple, such as move semantics (which we probably want to disable move constructor and move assignment).

If we want to use multiple pointers to refer to the same object, we can use a shared pointer. In the standard library this is called shared_ptr. It is implemented by reference counting. So each pointer that points to the same object also has a shared reference count, which is incremented when a new copy of the pointer is created and decremented when a copy of the pointer is deleted. When the count goes to zero, we delete the object and reference count. Here is the general idea of how this is implemented:

Again, there are some things left out to keep it simple, e.g. move semantics and using the copy & swap idiom.

Posted in programming | Tagged , | Leave a comment

Binary Search

Whenever I revisit binary search problems, I keep having to work out the edge cases whenever coming up with the algorithm. So I thought I’d make a note of the pseudo code template I always end up with:

l=0, r=a.size()-1
while (l <= r) {
   m = l+(r-l)/2
   if (a[m] == target) return m
   if (a[m] < target) l = m+1
   else r = m-1
}
return -1  //not found 

Note that we use m = l+(r-l)/2, rather than m=(l+r)/2, to ensure that we don’t get integer overflow.

We don’t have to have an array for a binary search. If we want to find sqrt(x), then we can set left=0 and right=x. We then test if mid is our sqrt number by squaring it: m*m.

We can also use floating point numbers. But then we need a different end case based on precision. So, to find sqrt(x) for a floating point number we would do something like:

l=0, r=x
while (r-l > 0.001) {
   m = l+(r-l)/2
   if (m*m == x) return m
   else if (m*m > x) r = m
   else l = m
}
return m

There’s a chance we will guess m exactly, in which case we return m in the loop. But most likely we will get to within our precision limit and exit the loop and then we will return m.

We can use comparison constraint rather than an exact target match by adapting our binary search algorithm. So, let’s say we want to find the smallest value bigger than or equal to x, i.e. find first value >= x. We will use an answer variable (ans) to keep track of the latest mid value we’ve found that satisfies our criteria. If mid doesn’t satisfy our criteria then we look in the half that will get us closer, so in this case if our target is 6 and we find 2 then we need to search the right half. We can let the while loop run its course rather than terminating it early.

l=0, r=a.size()-1, ans=-1
while (l <= r) {
   m = l+(r-l)/2
   if (a[m] >= x) {
      ans = a[m]
      r = m-1
   }
   else 
      l = m+1
}
return ans

This is a nice video lecture about binary search:
https://www.youtube.com/watch?v=GU7DpgHINWQ

Posted in programming | Tagged , | Leave a comment

Leetcode 45: Jump Game II

Only a linear solution will pass the leetcode tests. Use a three counters as we iterate: jumps, current steps left, next steps left. We only use next jump when we run out of current steps left. Continually update next steps left if the current position has > next steps left. I tried various other ways but they all timed out. Recursive is most obvious. Then recursive with DP memo. Then iterative DP O(n^2). Finally linear.

Source code for recursive+memo solution (too slow for leetcode test cases):
https://gist.github.com/adamkorg/4cd63f9cf0045ed58c9d9fd20712ef46
Iterative DP solution (also too slow for leetcode test cases):
https://gist.github.com/adamkorg/6bbd0487c0e29ec1d824d156e400cfc7
And here’s the linear solution that passes the leetcode test cases:

Posted in programming | Tagged , , | Leave a comment

Leetcode 30: Substring with Concatenation of All Words

Put words into hash map, step through each char in s checking for words in the hashmap. If we find a word then step through to the character after that match and attempt to match remaining words with the hash map. We need two separate hash maps, one will hold the counts of the words (as they can appear more than once in the words array) and the other will hold the counts for the current matching sequence. I call then counts and countsMatch respectively. It doesn’t make much difference how we reset countsMatch, I initially looped through resetting each element but then used countsMatch.clear(). One optimisation that makes a big difference is if we stop our outer loop of s when it is impossible to make a match, i.e. words.size() * words[0].size() characters before the end of s.

Posted in programming | Tagged , , | Leave a comment

Leetcode 23: Merge k Sorted Lists

There are a few different ways to solve this. I found the easiest way was to do a merge k-1 times. So you merge the first two lists. Then you merge that merged list with list 3. And repeat that until all lists are merged. That results in O(kN) time and O(1) space.

We improve the speed by either using a heap/priorityqueue or a map to allow us to quickly select the next smallest node out of the current nodes in all the lists. So we add the heads of all the lists to a map with the value as the key. We then remove the smallest (begin of map) and add it to our merged list. Then we add a new map entry of the next node of the node we just processed (if there is one). Repeat that until the map is empty. This results in O(Nlogk) time and O(k) space.

Here is the O(kN) time solution code: https://gist.github.com/adamkorg/665adeec343a9a114e1088833f96dbb8
And here is the O(Nlogk) solution code:

Posted in programming | Tagged , , | Leave a comment

Leetcode 10: Regular Expression Matching

The Kleene stars (the * wildcard character for regular expressions) make this difficult. We can’t just gobble up all the wildcard matching characters, otherwise subsequent matches might not work. One way to solve this is by recursing every possibility, so if we have p=“.*abc” and s=“xyzabcabc” then we call our match function with offsets match(s,p,0,2) then match(s,p,1,2) then match(s,p,2,2), etc.. until we have a complete match for the rest of the string. Note that once we’ve read all of string s then we need to step over any trailing wildcards, which don’t need any characters to match.

We can speed this up by using Dynamic Programming memoization to cache previous results. I have included the code for this below the non-memoized embedded code below.

DP solution: https://gist.github.com/adamkorg/67f7f5bd9eb1c8ae73f1e55c1d54d1b9

Posted in programming | Tagged , , | Leave a comment

Leetcode 4: Median of Two Sorted Arrays

Let’s call our two arrays A and B, where they have sizes a and b. A key insight is that we need to partition both arrays such that the left of both will be the same size as the right of both. The only exception is if we an odd total number of elements, in which case we will put the extra element in the left partition. Our partition index will be the start of the right hand partition. The partition can be before the first element 0, and up to after the last element arr.size(). Start with the smallest array as A. Start with our binary search constraints (hi and lo) as 0 to arr.size(). Calculate partitionA as (lo+hi)/2. Calculate partitionB as (a+b+1)/2-partA, where a and b are lengths of the arrays. Calculate the elements next to the partition. One thing to note here is that we use INF and -INF if an array’s partition is empty, which enables our comparisons to work. Check if we have reached our end condition, which is maxALeft <= minBRight and maxBRight <= minALeft, and if we have then return the value closest to the partition from left if odd or average of two closest values to partition from both left and right if even. Otherwise if not end condition the adjust the binary search bounds and loop again.

Key points:

  • Understand that we will partition both arrays and they need to be partitioned such that both left sides are the same size (or one more) than both right sides.
  • Understand we can use binary search to try to find the correct partition position in log(len) time.
  • The correct partition position will be when maxALeft <= minBRight and maxBLeft <= minARight. Where maxALeft is the biggest value in the left partition of array A.
  • The partition can be between any of the numbers in the array and can be before the first element and after the last element.
  • Start with binary search of first array (smallest array) to be lo=0 and hi=a.
  • Calculate partition A = (lo + hi) / 2
  • Calculate partition B = (a+b+1) / 2 – partA
  • Check if we are at the correct end partition position using calculated boundary values and if so then calculate median. If odd total number of elements then median is biggest value in left partition, which is: max(maxALeft, maxBLeft). If even then we need to get average of biggest in left and smallest in right, which is: max(maxALeft, maxBLeft)+min(minARight, minBRight) / 2.
  • If not at end partition position then step forward our binary search. If partition A is too far to right (maxAleft > minBRight) then we need to move our search to left half of search bounds, so: hi = partitionA – 1. The -1 is important as it allows us to get in front of the array. Otherwise move search to right half of search bounds, so: lo = partitionA + 1
Posted in programming | Tagged , , | Leave a comment

Trie Data Structure (Prefix Tree)

Use a TrieNode class that has a next member which is an array of 26 TrieNode pointers that represents a pointer to TreeNode object for each of 26 letters in the alphabet. We initialise these TrieNode pointers to NULL. We also have a bool member wordEnd initialised to false. An alternative, solution would be to use a hash map rather than vector to contain the next TrieNode pointers, which could contain a wider range of character sets.

class TrieNode {
public:
    vector next;
    bool wordEnd;
    TrieNode() : next(26, NULL), wordEnd(false) {};
};

We can then create a Trie class which will wrap the root TrieNode and a set of methods.
The Trie constructor creates a root TrieNode pointer member.
The Insert method steps through letters of the word, creating TrieNode objects and pointers in next as we go. Finally it sets wordEnd on the last letter.
The Search method goes through each letter in the word traversing through TrieNodes and returns wordEnd state on the final letter.
StartsWith is similar to search but it does not need the wordEnd check. I also created destructor that deleted all the TrieNode objects recursively.

Posted in programming | Tagged , | Leave a comment