Blind 75.20 - Word Break
Word break.
Problem description
Given a string s
and a dictionary of strings wordDict
, return true
if s
can be segmented into a space-separated sequence of one or more dictionary words.
Note that the same word in the dictionary may be reused multiple times in the segmentation.
Example 1:
Input: s = “leetcode”, wordDict = [“leet”,”code”]
Output: true
Explanation: Return true because “leetcode” can be segmented as “leet code”.
Example 2:
Input: s = “applepenapple”, wordDict = [“apple”,”pen”]
Output: true
Explanation: Return true because “applepenapple” can be segmented as “apple pen apple”.
Note that you are allowed to reuse a dictionary word.
Example 3:
Input: s = “catsandog”, wordDict = [“cats”,”dog”,”sand”,”and”,”cat”]
Output: false
Constraints:
1 <= s.length <= 300
1 <= wordDict.length <= 1000
1 <= wordDict[i].length <= 20
s
andwordDict[i]
consist of only lowercase English letters.- All the strings of
wordDict
are unique.
Solution type
Caching via DP.
Solution
What comes to mind first of all is that if I know the first four letters of "leetcode"
are the word leet
in the wordDict
, then I should be able to use that information to save time when figuring out whether the rest of the word consists of dictionary terms. In other words, the subproblem relationship should be something like “given what I know about whether the previous \(n - 1\) characters can be formed by words in wordDict
, are all \(n\) characters formable by words in wordDict
?”. But it can’t be something exactly like that because leetcod
is not formable by wordDict
words, but leetcode
is. I think this is a sign that we will have a higher time complexity because we need to take \(O(n)\) time to figure out how far back we can go before getting to a “good” spot.
The way I worked through this ended up being \(O(n^3d)\), where \(n\) is the length of s
and \(d\) is the size of the dictionary wordDict
. Starting from the beginning of s
, build a substring, and mark in a memo at every index where the current substring is in wordDict
. Repeat this process from every index that occurs immediately after an index in memo
that holds a true
value; this is to catch “code” in “leetcode”, “apple” and the second “pen” in “penapplepen”, etcetera. For \(O(n)\) characters in s
, we perform a lookup in wordDict
that takes \(O(nd)\) time; this repeats \(O(n)\) times (because we can start our substring at any place in the string / any position in the memo can be set to true). Multiply this all together and you get an \(O(n^3d)\) algorithm, which seems a little intimidating but turns out (I think) to be as fast as you can make this problem asymptotically.
bool wordBreak(string s, vector<string>& wordDict) {
vector<bool> memo(s.size(), false);
for (size_t start = 0; start < s.size(); ++start) {
string substringSoFar;
for (size_t current = start; current < s.size(); ++current) {
substringSoFar.push_back(s[current]);
if (find(wordDict.begin(), wordDict.end(), substringSoFar) != wordDict.end()) {
memo[current] = true;
}
}
while (start < s.size() && !memo[start]) {
++start;
}
}
return memo.back();
}
Leave a comment
Your email address will not be published. Required fields are marked. *