LeetCode 139. Word Break #69

Woodyiiiiiii · 2020-06-29T18:24:14Z

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

Note:

The same word in the dictionary may be reused multiple times in the segmentation.
You may assume the dictionary does not contain duplicate words.

Example 1:

Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".

Example 2:

Input: s = "applepenapple", wordDict = ["apple", "pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple".
             Note that you are allowed to reuse a dictionary word.

Example 3:

Input: s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"]
Output: false

一开始我是用bruce force暴力递归的方法来做的，因为一开始想不到递归，所以从暴力递归开始，如果成功则尝试优化。Bruce force算法思路很暴力直接，就是对当前字符串遍历所有可能形成的子字符串，用集合matchList中保存，这是matches方法的实现逻辑(其中用了数组的contains方法，但不同于HashSet的常数级复杂度，仍是线性复杂度)。得到可匹配的字符串后，需要检查后序的字符串是否也匹配，所以截去匹配的字符串，继续递归调用当前函数，直至得到的matches方法返回的是字符串长度，这是dfs函数的逻辑。

总的来说，这种问题的暴力算法都是使用递归来探测所有可能性，返回一个结果。优点是简单缺点，缺点很明显，耗时费力，很多情况会被重复探测，而且往往得不到OJ的accepted。以下代码就是我用Bruce force的结果，OJ的检测结果是Time Exceed。

class Solution {
    public boolean wordBreak(String s, List<String> wordDict) {
        return dfs(s, wordDict, 0);
    }
    public boolean dfs(String s, List<String> wordDict, int i) {
        List<Integer> matchList = matches(s, wordDict, i);
        if (!matchList.isEmpty()) {
            for (int j : matchList) {
                if (j == s.length()) return true;
                else if (dfs(s, wordDict, j)) return true;
            }
        }
        return false;
    }
    public List<Integer> matches(String s, List<String> wordDict, int i) {
        List<Integer> matchList = new ArrayList<>();
        for (int j = i + 1; j <= s.length(); ++j) {
            String subStr = s.substring(i, j);
            if (wordDict.contains(subStr))
                matchList.add(j);
        }
        return matchList;
    }
}

接下来就是尝试优化暴力算法了。我参考了大佬的博客，总结下来有两种方法：第一种是用记忆化搜索，用记忆数组memo来保存所有已经计算过的结果，再下次遇到的时候，直接从 cache 中取，而不是再次计算一遍，为什么用数组结构呢？因为数组结构是固定的，如果用HashSet等其他存储结构是不固定的，而且意义不是很明确；第二种是DP迭代算法。这两者都可以解决此类问题，或者说如果一种可以解一个问题，另一种也一般可以。

初始化：首先把字典中所有字符串存入HashSet中，这样检测是否存在就实现了常数级时间复杂度。然后规定记忆数组 memo[i] 定义为范围为 [i, n] 的子字符串是否可以拆分，初始化为 -1，表示没有计算过，如果可以拆分，则赋值为1，反之为0。接着调用递归算法返回真或假，不同的是记忆数组要作为参数，实现记忆化搜索。

start作为子字符串的起点下标，如果start大于等于字符串长度，当然返回真；然后是记忆化数组memo的作用，如果当前memo[i]不为-1(已被记忆)，为1则可以返回true(因为以start起始的子字符串被记忆过有包含于字典中)，为0则返回false(无论如何start开头的子字符串都不会被包含)；如果没有被记忆，则作记忆过程，迭代以start开头的子字符串的在字典中的可能，注意先判断是否在字典中contains再递归判断，然后设置记忆为1，返回true，否则设置为0，返回false。

class Solution {
    public boolean wordBreak(String s, List<String> wordDict) {
        HashSet<String> wordSet = new HashSet<>(wordDict);
        int[] memo = new int[s.length()];
        for (int i = 0; i < memo.length; ++i)
            memo[i] = -1;
        return check(s, wordSet, 0, memo);
    }
    boolean check(String s, HashSet<String> wordSet, int start, int[] memo) {
        if (start >= s.length()) return true;
        if (memo[start] != -1) return memo[start] == 1;
        for (int i = start + 1; i <= s.length(); ++i) {
            if (wordSet.contains(s.substring(start, i)) &&
                    check(s, wordSet, i, memo)) {
                memo[start] = 1;
                return true;
            }
        }
        memo[start] = 0;
        return false;
    }
}

接下来是DP迭代算法的实现。大佬：“玩子数组或者子字符串且求极值的题，基本就是 DP 没差了”，虽然这道题没有求极值，但是玩子字符串也符合 DP 的状态转移的特点。DP算法的难点在于定义 dp 数组跟找出状态转移方程。我们创建一个一维数组DP，dp[i]的含义肯定跟下标i有关，我们不妨设dp[i]为字符串[0, i)的子字符串是否包含进字典中，所以dp数组长度比字符串长度大1。接下来是状态转移方程，对于某个元素dp[i]，我们需要确定它和上一个(或几个)元素的状态，设j < i，dp[j]和dp[i]怎么联系呢，显然dp[i]由dp[j]和contains(substring(j, i))共同判断，如果都为true，那么设dp[i]为true，然后不需要继续循环了，直接break。最后结果返回dp数组最后一个元素的布尔结果。

因为不需要递归，总体时间复杂度显然是平方级别O(n^2)。拿题目的Example 1举例：

l 

le e 

lee ee e 

_leet_  

leetc eetc etc tc c 

leetco eetco etco tco co o 

leetcod eetcod etcod tcod cod od d 

leetcode eetcode etcode tcode _code_

T F F F T F F F T

然后看一下Example 3的s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"]例子，cat和cats有两个分支，但到dp最后一个元素时没有影响。

class Solution {
    public boolean wordBreak(String s, List<String> wordDict) {
        HashSet<String> wordSet = new HashSet<>(wordDict);        
        boolean[] dp = new boolean[s.length() + 1];
        dp[0] = true;
        for (int i = 0; i < dp.length; ++i) {
            for (int j = 0; j <= i; ++j) {
                if (dp[j] && wordSet.contains(s.substring(j, i))) {
                    dp[i] = true;
                    break;
                }
            }
        }
        return dp[dp.length - 1];
    }
}

参考资料：

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LeetCode 139. Word Break #69

LeetCode 139. Word Break #69

Woodyiiiiiii commented Jun 29, 2020 •

edited

Loading

LeetCode 139. Word Break #69

LeetCode 139. Word Break #69

Comments

Woodyiiiiiii commented Jun 29, 2020 • edited Loading

Woodyiiiiiii commented Jun 29, 2020 •

edited

Loading