
Tomcat shuts down when the segmentation model fails to load #116

Closed
KevinGF opened this issue Jan 11, 2016 · 5 comments

Comments

KevinGF commented Jan 11, 2016

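Steps to reproduce:
1. Use the hanlp.properties bundled in hanlp-1.2.8-release.zip, changing only the root property.
2. Use data-for-1.2.8-standard.zip (the error does not occur with the full data package).

Suggestions:
1. An error inside the application should not bring down Tomcat. Consider adding error prevention or a friendlier exception mechanism, for example checking whether the file exists before loading it.
2. Provide separate reference hanlp.properties files for the standard and full data packages.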
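
The first suggestion (check that the file exists before loading, and report failure through an exception) could look roughly like the sketch below. `ModelFileCheck`, `ModelLoadException`, and `loadLines` are hypothetical names invented for illustration and are not part of HanLP; the path in `main` is deliberately nonexistent.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of HanLP: validates a model path before reading it.
class ModelFileCheck {

    /** Thrown instead of calling System.exit when the model cannot be loaded. */
    static class ModelLoadException extends RuntimeException {
        ModelLoadException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Reads all lines of the model file, failing with a descriptive exception. */
    static List<String> loadLines(String modelPath) {
        Path path = Paths.get(modelPath);
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            throw new ModelLoadException(
                    "Model file not found or unreadable: " + path.toAbsolutePath(), null);
        }
        try {
            return Files.readAllLines(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new ModelLoadException(
                    "Failed to read model file: " + path.toAbsolutePath(), e);
        }
    }

    public static void main(String[] args) {
        try {
            loadLines("no/such/dir/CRFSegmentModel.txt");
        } catch (ModelLoadException e) {
            // The web application can log this and show an error page; the JVM keeps running.
            System.err.println(e.getMessage());
        }
    }
}
```

With this shape, a missing model file fails only the request that triggered the load, and the container keeps serving other requests.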

Log and stack trace for reference (Tomcat 8.0.18; I am not sure whether the Tomcat version is relevant):
12-Jan-2016 01:05:47.794 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 13663 ms
12-Jan-2016 01:08:31.672 SEVERE [http-apr-8080-exec-2] com.hankcs.hanlp.model.CRFSegmentModel.<clinit> CRF分词模型加载 C:/xxxx/demo_prj/WebContent/WEB-INF/hanlp_data/data/model/segment/CRFSegmentModel.txt 失败,耗时 9 ms
12-Jan-2016 01:08:31.679 INFO [Thread-3] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-apr-8080"]
12-Jan-2016 01:08:31.684 INFO [Thread-3] org.apache.catalina.core.StandardService.stopInternal Stopping service Catalina
12-Jan-2016 01:08:33.857 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [demo_prj] is still processing a request that has yet to finish. This is very likely to create a memory leak. You can control the time allowed for requests to finish by using the unloadDelay attribute of the standard Context implementation. Stack trace of request processing thread:
java.lang.Object.wait(Native Method)
java.lang.Thread.join(Thread.java:1245)
java.lang.Thread.join(Thread.java:1319)
java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
java.lang.Shutdown.runHooks(Shutdown.java:123)
java.lang.Shutdown.sequence(Shutdown.java:167)
java.lang.Shutdown.exit(Shutdown.java:212)
java.lang.Runtime.exit(Runtime.java:109)
java.lang.System.exit(System.java:968)
com.hankcs.hanlp.model.CRFSegmentModel.<clinit>(CRFSegmentModel.java:43)
com.hankcs.hanlp.seg.CRF.CRFSegment.segSentence(CRFSegment.java:49)
com.hankcs.hanlp.seg.Segment.seg(Segment.java:422)
org.apache.jsp.hanlp.index_jsp._jspService(index_jsp.java:315)
org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:431)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:396)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:340)
javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610)
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:516)
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1086)
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:659)
org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:285)
org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2431)
org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2420)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
java.lang.Thread.run(Thread.java:745)
12-Jan-2016 01:08:33.919 INFO [Thread-3] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["http-apr-8080"]

KevinGF commented Jan 11, 2016

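I looked at the code: it calls "System.exit(-1);". I suggest using exception handling or some other friendlier mechanism instead.
A search shows 16 places that call System.exit: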
https://github.com/hankcs/HanLP/search?utf8=✓&q=System.exit

static
{
    logger.info("CRF分词模型正在加载 " + HanLP.Config.CRFSegmentModelPath);
    long start = System.currentTimeMillis();
    crfModel = CRFModel.loadTxt(HanLP.Config.CRFSegmentModelPath, new CRFSegmentModel(new BinTrie<FeatureFunction>()));
    if (crfModel == null)
    {
        logger.severe("CRF分词模型加载 " + HanLP.Config.CRFSegmentModelPath + " 失败,耗时 " + (System.currentTimeMillis() - start) + " ms");
        System.exit(-1);
    }
    else
        logger.info("CRF分词模型加载 " + HanLP.Config.CRFSegmentModelPath + " 成功,耗时 " + (System.currentTimeMillis() - start) + " ms");
}
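
As one possible "friendlier" alternative to the static block above, the sketch below loads the model lazily and throws when the loader returns null, which is the failure signal the snippet above checks for. `LazyModel` is a hypothetical class written for this illustration and is not HanLP code; the `Supplier` stands in for a call such as `CRFModel.loadTxt(...)`.

```java
import java.util.function.Supplier;

// Hypothetical holder, not HanLP code: loads a model on first use and
// reports failure by exception instead of terminating the JVM.
final class LazyModel<T> {
    private final String description;
    private final Supplier<T> loader; // expected to return null on failure
    private T model;                  // guarded by "this"

    LazyModel(String description, Supplier<T> loader) {
        this.description = description;
        this.loader = loader;
    }

    /** Returns the model, loading it on the first call; throws if loading fails. */
    synchronized T get() {
        if (model == null) {
            T loaded = loader.get();
            if (loaded == null) {
                throw new IllegalStateException("Failed to load " + description);
            }
            model = loaded;
        }
        return model;
    }

    public static void main(String[] args) {
        LazyModel<String> broken = new LazyModel<>("CRF segmentation model", () -> null);
        try {
            broken.get();
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage()); // the process is still alive here
        }
    }
}
```

Because nothing is cached on failure, a later call to `get()` retries the load, for example after the configuration has been corrected.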

hankcs commented Jan 13, 2016

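Thanks for the suggestions. Let me summarize.

  1. Most open-source segmenters load their data in static initializers at class-loading time, and HanLP is no exception.
  2. HanLP was designed to fail fast: if the configuration is wrong, the user should be told as soon as possible.
  3. HanLP originally terminated the program by throwing an exception, but many novice users took an exception to mean a bug.
  4. In future releases, HanLP's loading failures will gradually revert to runtime exceptions, which can be caught as follows: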
        try
        {
            // Deliberately point at an invalid path to trigger a load failure
            HanLP.Config.CRFSegmentModelPath = "illegal path";
            Segment segment = new CRFSegment();
            System.out.println(segment.seg("有句谚语叫做一个萝卜一个坑儿"));
        }
        catch (Throwable throwable)
        {
            // "启动失败" means "startup failed"
            System.err.println("启动失败");
        }
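
One detail worth spelling out about the try/catch above: if the runtime exception is raised inside a static initializer, the JVM wraps it in `ExceptionInInitializerError` on first access and throws `NoClassDefFoundError` on later accesses. Both are `Error` subclasses, so `catch (Exception e)` would miss them, which is presumably why the snippet catches `Throwable`. The self-contained demo below uses a hypothetical `FailingModel` class as a stand-in for a model class whose static initialization fails.

```java
// Self-contained demo; FailingModel is a hypothetical stand-in for a class
// whose static initialization fails.
class StaticInitDemo {

    static class FailingModel {
        // The static field initializer throws, so class initialization fails.
        static final Object MODEL = load();

        private static Object load() {
            throw new IllegalStateException("model file missing");
        }

        static void touch() {
            // Calling any static method forces class initialization.
        }
    }

    /** Triggers class initialization and returns the simple name of what was thrown. */
    static String trigger() {
        try {
            FailingModel.touch();
            return "none";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(trigger()); // first access: ExceptionInInitializerError
        System.out.println(trigger()); // later accesses: NoClassDefFoundError
    }
}
```

The second result also shows that a class whose static initializer has failed stays unusable for the lifetime of its class loader, so fixing the path at runtime does not help without reloading the class.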

KevinGF commented Jan 14, 2016

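Static initialization at class-loading time means loading happens only when the class is first accessed, which may be when an end user makes a request. With a large model, loading can take a long time and hurt the end-user experience.
On top of the current mechanism, consider adding a force-load method that developers can call manually, so that dictionaries and model data are loaded when the application server starts. End users then do not incur the extra wait on their requests.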
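
A force-load hook of the kind proposed here can be approximated without library support by initializing the relevant classes explicitly at startup. The sketch below is a hypothetical helper, not a HanLP API: it relies on `Class.forName(name, true, loader)` running static initializers immediately. The class names in `main` are placeholders; in a web application the call might sit in a `ServletContextListener.contextInitialized` method. It only helps when a failed load is reported by exception rather than by `System.exit`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical startup helper, not a HanLP API: forces static initializers to run early.
class Preloader {

    /** Initializes each named class now; returns the names that could not be initialized. */
    static List<String> preload(ClassLoader loader, String... classNames) {
        List<String> failed = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=true runs the class's static initializer immediately
                Class.forName(name, true, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers ExceptionInInitializerError and NoClassDefFoundError
                failed.add(name);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // "com.example.DoesNotExist" is a made-up name used to show the failure path.
        List<String> failed = preload(Preloader.class.getClassLoader(),
                "java.util.ArrayList", "com.example.DoesNotExist");
        System.out.println("failed to preload: " + failed);
    }
}
```

Returning the failed names lets the caller decide whether to abort deployment, log a warning, or disable the affected feature.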

hankcs commented Jan 14, 2016

Actually, the program can also call segment.seg once at startup to trigger the loading logic proactively, which achieves the same effect.

KevinGF commented Jan 15, 2016

Yes, calling segment.seg once at startup works as a temporary way to preload the dictionaries and models.
