6 minute read

Longest increasing subsequence.

Problem description

Taken directly from LeetCode.

Given an integer array nums, return the length of the longest strictly increasing subsequence.

Example 1:

Input: nums = [10,9,2,5,3,7,101,18]
Output: 4
Explanation: The longest increasing subsequence is [2,3,7,101], therefore the length is 4.

Example 2:

Input: nums = [0,1,0,3,2,3]
Output: 4

Example 3:

Input: nums = [7,7,7,7,7,7,7]
Output: 1

Constraints:

  • 1 <= nums.length <= 2500
  • -104 <= nums[i] <= 104

Follow up: Can you come up with an algorithm that runs in O(n log(n)) time complexity?

Solution type

Caching via DP.

Solution

The \(O(2^n)\) brute force solution is to consider every single subsequence. You either include or exclude each element, hence the exponential time.

What I’d like to do instead is to somehow build the solution off of previous subsequences. In other words, looking at the first 5 elements and deriving the minimal subsequence there, figure out what the minimal subsequence would be after including the 6th element. However, this doesn’t work very nicely because they don’t fit very nicely into subproblems; in the first example, the longest subsequence within the first four elements is [2,5], which is not a subsequence of the global optimal solution [2,3,7,101]^0. Additionally, the hint is somehow suggesting doing something \(O(\log n)\) for every element, which implies binary search somehow?

I think you can derive an \(O(n^2)\) solution like this. The memo stores the length of the longest increasing subsequence that terminates at a given element. In a nested loop, do a pass over the memo for each element that comes before the one you are currently examining, and for each prior element that is less than the current one, do something like currentBest := max(currentBest, priorBest + 1), where currentBest := 1 on initialization. Basically, compare the longest sequence you could’ve made so far to the best possible with a given prior. Do this for every item and return the maximum element in the memo.

It feeeeeeels like there’s an optimization to make here in regards to repeating indices. If you’re adding to a chain that goes [2,3,5,7], then if you add 9 to that chain you shouldn’t bother checking 2,3,5. But that gets hairy, so let’s make sure our initial idea works.

int lengthOfLIS(vector<int>& nums) {
    vector<int> memo(nums.size(), 0);
    memo[0] = 1;
    for (int i = 1; i < nums.size(); ++i) {
        int currentBest = 1;
        for (int j = 0; j < i; ++j) {
            if (nums[j] < nums[i]) {
                currentBest = max(currentBest, memo[j] + 1);
            }
        }
        memo[i] = currentBest;
    }
    return *max_element(memo.begin(), memo.end());
}

It does! So how about this \(O(n \log n)\) thing? I feel like the answer might involve expanding our memo, but I’m not sure how.

Okay, so right now we are doing \(O(n)\) operations per element, for \(n\) elements. We can’t skip any elements, so somehow we need to turn that \(O(n)\) into \(O(\log n)\). We don’t need to check all \(O(n)\) elements that come before a given element; we just care about finding the length of the longest possible subsequence formable out of the previous elements that are less than it.

The solution that comes to mind is to use a different data structure for our memo. The total cost of finding out which elements are smaller + finding the maximum memoized subsequence length out of those smaller elements should be \(O(\log n)\). A binary search tree of some sort feels natural; cut the tree so that you only examine the smaller than elements, then find the item with the highest value out of all key-value pairs. But we’re searching on two different things and that causes some weirdness; we can’t…use the same tree?

I am thinking of heaps too, but heapify is \(O(n)\), and because you’re changing which elements you’re looking at you can’t maintain a single heap throughout (at least not in a nice way).

Hmm, here’s a different thought. We only care about relative position, not exact position, so what if we kept using a memo array but we sorted it after each iteration? Binary search on the key (the actual value of the number in nums) to figure out the largest previous value less than the current value, then just write a key-value pair of current number : largest previous key-value pair’s value plus one. It should be correct to just look at the largest previous value rather than all of them. I’m stupid, sorting is \(O(n \log n)\). BUT I think this makes the tree work?

int lengthOfLIS(vector<int>& nums) {
    map<int, int> memo;
    using pair_type = decltype(memo)::value_type;
    memo[nums[0]] = 1;
    for (int i = 1; i < nums.size(); ++i) {
        if (memo.lower_bound(nums[i] - 1) != memo.end()) {
            memo[nums[i]] = 1;
        }
        else {
            auto highestPrevious = memo.lower_bound( nums[i] - 1);
            memo[nums[i]] = highestPrevious->second + 1;
        }
    }
    return max_element(memo.begin(), memo.end(), 
    [] (const pair_type & p1, const pair_type & p2) {
        return p1.second < p2.second;
    })->second;
}

This isn’t quite it, but I swear it’s close. Out of time though so I’ll have to pick this up later…

Okay, coming back to this. So the \(O(n \log n)\) approach is pretty subtle: maintain a list of tails - a tail at a given index i represents the last element of a possible longest increasing subsequence of length i - and for every element of nums, use binary search to find the minimum length subsequence that it can be a tail for (or if it is greater than all tails currently stored, append it as a new element to your tails vector). This means that you are comparing each element to the tails stored in tails and overwriting the tails memo. Essentially, at every iteration of the algorithm, you insert the current element into the longest subsequence you’ve made so far that can contain that element.

You don’t even need the logic with longest in the solution I submitted; you can determine the based on the highest index of (or if you intialize tails to be an empty vector and push back elements instead of setting at indices, just return the size of the vector - but this requires you to wrap the tails[low] = num in a conditional if you’re over the size of tails). For \(n\) elements in nums, you do \(O(\log n)\) work for each one, which results in \(O(n \log (n))\) total time complexity.

Note that if this were nondecreasing instead of increasing, you’d have to use if (num < tails[mid]) instead of if (num <= tails[mid]).

int lengthOfLIS(vector<int>& nums) {
    vector<int> tails(nums.size(), -1);
    int longest = 0;
    for (int num : nums) {
        int low = 0, high = longest;
        while (low < high) {
            int mid = (low + high) / 2;
            if (num <= tails[mid]) {
                high = mid;
            }
            else {
                low = mid + 1;
            }
        }
        tails[low] = num;
        longest = max(low + 1, longest);
    }
    return longest;
}

Wikipedia (contributor Shiyu Ji) has this nice GIF to showcase the algorithm a little more clearly.

GIF of efficient longest increasing subsequence algorithm.

I think I was starting to converge on the tails idea, but it wasn’t that close at the end. It is caching and I guess it is DP (where the subproblem you’re using is “what is the ?”), but you have to think of the problem in such a different way that I don’t feel too bad for missing this one first try. Hey, I got the \(O(n^2)\) solution!

Leave a comment

Your email address will not be published. Required fields are marked. *

Loading...