2 minute read

Word break.

Problem description

Taken directly from LeetCode.

Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words.

Note that the same word in the dictionary may be reused multiple times in the segmentation.

Example 1:

Input: s = “leetcode”, wordDict = [“leet”,”code”]
Output: true
Explanation: Return true because “leetcode” can be segmented as “leet code”.

Example 2:

Input: s = “applepenapple”, wordDict = [“apple”,”pen”]
Output: true
Explanation: Return true because “applepenapple” can be segmented as “apple pen apple”.
Note that you are allowed to reuse a dictionary word.

Example 3:

Input: s = “catsandog”, wordDict = [“cats”,”dog”,”sand”,”and”,”cat”]
Output: false

Constraints:

  • 1 <= s.length <= 300
  • 1 <= wordDict.length <= 1000
  • 1 <= wordDict[i].length <= 20
  • s and wordDict[i] consist of only lowercase English letters.
  • All the strings of wordDict are unique.

Solution type

Caching via DP.

Solution

What comes to mind first of all is that if I know the first four letters of "leetcode" are the word leet in the wordDict, then I should be able to use that information to save time when figuring out whether the rest of the word consists of dictionary terms. In other words, the subproblem relationship should be something like “given what I know about whether the previous \(n - 1\) characters can be formed by words in wordDict, are all \(n\) characters formable by words in wordDict?”. But it can’t be something exactly like that because leetcod is not formable by wordDict words, but leetcode is. I think this is a sign that we will have a higher time complexity because we need to take \(O(n)\) time to figure out how far back we can go before getting to a “good” spot.

The way I worked through this ended up being \(O(n^3d)\), where \(n\) is the length of s and \(d\) is the size of the dictionary wordDict. Starting from the beginning of s, build a substring, and mark in a memo at every index where the current substring is in wordDict. Repeat this process from every index that occurs immediately after an index in memo that holds a true value; this is to catch “code” in “leetcode”, “apple” and the second “pen” in “penapplepen”, etcetera. For \(O(n)\) characters in s, we perform a lookup in wordDict that takes \(O(nd)\) time; this repeats \(O(n)\) times (because we can start our substring at any place in the string / any position in the memo can be set to true). Multiply this all together and you get an \(O(n^3d)\) algorithm, which seems a little intimidating but turns out (I think) to be as fast as you can make this problem asymptotically.

bool wordBreak(string s, vector<string>& wordDict) {
    vector<bool> memo(s.size(), false);
    for (size_t start = 0; start < s.size(); ++start) {
        string substringSoFar;
        for (size_t current = start; current < s.size(); ++current) {
            substringSoFar.push_back(s[current]);
            if (find(wordDict.begin(), wordDict.end(), substringSoFar) != wordDict.end()) {
                memo[current] = true;
            }
        } 

        while (start < s.size() && !memo[start]) {
            ++start;
        }
    }
    return memo.back();
}

Leave a comment

Your email address will not be published. Required fields are marked. *

Loading...