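File preparation
I copied ACFilter, DFAFilter and their related class files from the project, removed the interface implementations, and left everything else unchanged.
The filter I wrote myself is ACProFilter.
The verification code is in the MainTest class.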
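Test approach
At first I ran the three filters one after another in a single run, printing each result and its time. However, the execution order strongly affected the results, because whichever filter ran first was slower. I then added a warm-up loop of 10 iterations before running them in turn, but the order-dependent differences were still large. I suspect this is caused by hot-spot code (JIT compilation). In the end I decided to run one filter at a time: a for loop of 100,000 iterations that prints the total elapsed time. The sensitive words are identical on every run and all three filters are always initialized. Between runs, only the filter called inside the for loop changes, and everything else stays the same.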
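A minimal sketch of the timing loop described above, assuming each filter call can be wrapped as a `Function<String, ?>`. The class `FilterBenchmark`, the method `timeFilter` and the stand-in lambda are hypothetical, since the real filters' method names are not shown here. The warm-up count of 10,000 in `main` is also an arbitrary assumption, larger than the 10 iterations mentioned above.

```java
import java.util.function.Function;

public class FilterBenchmark {

    // Each result is written here; a volatile write helps keep the call from being removed as dead code.
    private static volatile Object blackhole;

    /**
     * Calls `filter` on the same `text` `warmup` times (untimed), then `iterations`
     * times (timed), and returns the total elapsed nanoseconds of the timed loop.
     */
    public static long timeFilter(Function<String, ?> filter, String text,
                                  int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            blackhole = filter.apply(text); // untimed warm-up so the hot path can be compiled first
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            blackhole = filter.apply(text);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in; in MainTest this would call exactly one of the three filters.
        Function<String, String> stub = s -> s.replace("babac", "*****");
        long totalNs = timeFilter(stub, "bababac", 10_000, 100_000);
        System.out.println("total ns for 100000 calls: " + totalNs);
    }
}
```

Launching a separate JVM for each filter, as described above, keeps one filter's JIT state from influencing another's timing.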
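Execution results and screenshots
1. Preset sensitive word: babac
Input string: bababac
Time ratio (AC : DFA : ACPro): 6 : 10 : 3
2. Preset sensitive words: abcde || dem
Input string: abcdem
Time ratio (AC : DFA : ACPro): 5 : 9 : 3
3. Preset sensitive words: abcde || abc
Input string: abcde
Time ratio (AC : DFA : ACPro): 4 : 8 : 3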
Summary
Comparing the three experiments pairwise, the following conclusions can be drawn:
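For a word set like abcde || dem, where the end of one word overlaps the start of the other, DFA only matches abcde (the leading word), while AC and ACPro both detect all of them. Both take less time than DFA, but ACPro is clearly the faster of the two;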
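To illustrate how an automaton with failure links can report both words in abcdem, here is a minimal textbook Aho-Corasick sketch. It is not the project's ACFilter or the author's ACProFilter, whose code is not shown. The class name `AhoCorasickSketch` and the `word@start` output format are hypothetical. Each node inherits the outputs of its failure target, so every word ending at a given position is reported, including overlapping ones.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal Aho-Corasick automaton that reports every match, including overlapping ones. */
public class AhoCorasickSketch {

    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        final List<String> outputs = new ArrayList<>(); // words ending here, incl. those inherited via the fail link
        Node fail;
    }

    private final Node root = new Node();

    public AhoCorasickSketch(Collection<String> words) {
        // 1. Insert every word into a trie.
        for (String w : words) {
            Node cur = root;
            for (char c : w.toCharArray()) {
                cur = cur.next.computeIfAbsent(c, k -> new Node());
            }
            cur.outputs.add(w);
        }
        // 2. Breadth-first pass: point each node at the longest proper suffix that is also a trie path.
        Deque<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.poll();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = node.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                child.fail = f.next.getOrDefault(c, root);
                child.outputs.addAll(child.fail.outputs); // shorter words ending at the same place
                queue.add(child);
            }
        }
    }

    /** Scans `text` once and returns each match as "word@startIndex", in order of end position. */
    public List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Node cur = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // On a mismatch, fall back along failure links instead of restarting from scratch.
            while (cur != root && !cur.next.containsKey(c)) {
                cur = cur.fail;
            }
            cur = cur.next.getOrDefault(c, root);
            for (String w : cur.outputs) {
                found.add(w + "@" + (i - w.length() + 1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(new AhoCorasickSketch(List.of("abcde", "dem")).findAll("abcdem"));
    }
}
```

For `abcdem` with {abcde, dem}, `findAll` returns `[abcde@0, dem@3]`: after abcde is matched, the mismatch on `m` follows the failure link to the `de` node, which then continues into `dem`.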
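For a word set like abcde || abc, where one word contains the other (abc is a prefix of abcde), AC only identifies the shortest one, abc, so de is left unfiltered. ACPro and DFA both detect everything, but ACPro is clearly faster.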
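One way to avoid leaving de unfiltered is to mask the union of all matched spans rather than only the first or shortest match. The following is a minimal sketch, assuming the matcher already returns every match as a `{start, endExclusive}` pair. The `SpanMasker` class and its `mask` method are hypothetical and not part of the project.

```java
import java.util.List;

public class SpanMasker {

    /**
     * Replaces every character covered by at least one span with '*'.
     * Spans are {start, endExclusive}; nested or overlapping spans simply
     * mark the same characters again, so the result is their union.
     */
    public static String mask(String text, List<int[]> spans) {
        char[] out = text.toCharArray();
        for (int[] span : spans) {
            int from = Math.max(0, span[0]);
            int to = Math.min(text.length(), span[1]);
            for (int i = from; i < to; i++) {
                out[i] = '*';
            }
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // abc = [0, 3) and abcde = [0, 5) both match in "abcde".
        System.out.println(mask("abcde", List.of(new int[]{0, 3}, new int[]{0, 5}))); // *****
        // Masking only the shortest match reproduces the failure described above.
        System.out.println(mask("abcde", List.of(new int[]{0, 3})));                   // ***de
    }
}
```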
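ACPro matches with 100% accuracy (this can be verified from the algorithm analysis), whereas AC and DFA both partially fail in certain scenarios. ACPro is also more efficient, roughly 2 to 3 times faster overall (by the ratios above, about 2.7 to 3.3 times faster than DFA and 1.3 to 2 times faster than AC).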
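Random inputs can support a 100% accuracy claim, although they cannot prove it. The following is a minimal differential-testing sketch. It generates random strings over a small alphabet and counts the inputs on which a candidate matcher disagrees with a brute-force matcher. The class `DifferentialCheck`, its methods and the `word@start` result format are hypothetical. Comparing ACProFilter this way would need an adapter that converts its output to that format, since its API is not shown here.

```java
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

public class DifferentialCheck {

    /** Brute-force ground truth: every occurrence as "word@start", including overlaps. */
    public static Set<String> bruteForce(String text, List<String> words) {
        Set<String> found = new TreeSet<>();
        for (int start = 0; start < text.length(); start++) {
            for (String w : words) {
                if (!w.isEmpty() && text.startsWith(w, start)) {
                    found.add(w + "@" + start);
                }
            }
        }
        return found;
    }

    /**
     * Feeds `trials` random strings (length 0..maxLen, characters drawn from `alphabet`)
     * to `candidate` and counts inputs where its match set differs from the brute-force set.
     */
    public static int countMismatches(Function<String, Set<String>> candidate,
                                      List<String> words, String alphabet,
                                      int maxLen, int trials, long seed) {
        Random rnd = new Random(seed);
        int mismatches = 0;
        for (int t = 0; t < trials; t++) {
            int len = rnd.nextInt(maxLen + 1);
            StringBuilder sb = new StringBuilder(len);
            for (int i = 0; i < len; i++) {
                sb.append(alphabet.charAt(rnd.nextInt(alphabet.length())));
            }
            String text = sb.toString();
            if (!bruteForce(text, words).equals(candidate.apply(text))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<String> words = List.of("abcde", "abc", "dem", "babac");
        // Hypothetical buggy candidate that forgets the word "abc".
        Function<String, Set<String>> buggy = t -> bruteForce(t, List.of("abcde", "dem", "babac"));
        System.out.println("mismatches: " + countMismatches(buggy, words, "abcdem", 10, 1000, 42L));
    }
}
```

A small alphabet built from the sensitive words' own characters makes overlapping and nested occurrences, such as those in the three test cases, show up frequently.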
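(Originally all the methods in my local copy were static, which was a little faster; a single run matching a two-character sensitive word could take as little as just over 1,000 ns.)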