dictionary_decompounder loading fails : "IOException while reading word_list_path: Input length = 1" #17212

Closed
nellicus opened this issue Mar 21, 2016 · 4 comments

Not sure if I am missing something here (e.g. an incorrect dictionary file format? See below), but loading a dictionary file for dictionary_decompounder as documented in our docs fails.

ES version

{
  "name": "Jean Grey",
  "cluster_name": "elasticsearch",
  "version": {
    "number": "2.2.1",
    "build_hash": "d045fc29d1932bce18b2e65ab8b297fbf6cd41a1",
    "build_timestamp": "2016-03-09T09:38:54Z",
    "build_snapshot": false,
    "lucene_version": "5.4.1"
  },
  "tagline": "You Know, for Search"
}

Repro steps

  1. Place the dictionary file and the FOP XML hyphenation pattern file under $ES_HOME/config.
  2. Send the following request:
PUT my_index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "myAnalyzer2": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "myTokenFilter2"
            ]
          }
        },
        "filter": {
          "myTokenFilter2": {
            "type": "hyphenation_decompounder",
            "hyphenation_patterns_path": "de.xml",
            "word_list_path": "germany.txt",
            "max_subword_size": 22
          }
        }
      }
    }
  },
  "mappings": {
    "type1": {
      "properties": {
        "field1": {
          "type": "string",
          "analyzer": "myAnalyzer2"
        }
      }
    }
  }
} 

Response

{
   "error": {
      "root_cause": [
         {
            "type": "index_creation_exception",
            "reason": "failed to create index"
         }
      ],
      "type": "illegal_argument_exception",
      "reason": "IOException while reading word_list_path: Input length = 1"
   },
   "status": 400
}

Dictionary file at http://www.md5this.com/wordlists/dictionary_german.zip

Antonios-MacBook-Air:elasticsearch-2.2.1 abonuccelli$ ls -alrth config/germany.txt 
-rw-r--r--@ 1 abonuccelli  wheel    20M Mar 21 16:13 config/germany.txt
Antonios-MacBook-Air:elasticsearch-2.2.1 abonuccelli$ head config/germany.txt && tail config/germany.txt && wc -l config/germany.txt 
00brucellosis
00faa
00kiribati
00mag
00murree
00whitebait
01
013
016
019
?ppigeren
?ppigerer
?ppigeres
?ppiges
?ppigkeit
?ppigste
?ppigstem
?ppigsten
?ppigster
?ppigstes
 1744388 config/germany.txt

FOP XML hyphenation pattern file downloaded from the site referenced in the docs: https://sourceforge.net/projects/offo/files/offo-hyphenation/1.2/offo-hyphenation_v1.2.zip/download

Full exception trace (the trailing "... 9 more" frames are swallowed?)

elasticsearch.log-[2016-03-21 16:25:38,683][DEBUG][cluster.service          ] [Straw Man] cluster state update task [create-index [my_index], cause [api]] failed
elasticsearch.log:[my_index] IndexCreationException[failed to create index]; nested: IllegalArgumentException[IOException while reading word_list_path: Input length = 1];
elasticsearch.log-  at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:360)
elasticsearch.log-  at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$1.execute(MetaDataCreateIndexService.java:309)
elasticsearch.log-  at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45)
elasticsearch.log-  at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:458)
elasticsearch.log-  at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:762)
elasticsearch.log-  at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
elasticsearch.log-  at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
elasticsearch.log-  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
elasticsearch.log-  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
elasticsearch.log-  at java.lang.Thread.run(Thread.java:745)
elasticsearch.log:Caused by: java.lang.IllegalArgumentException: IOException while reading word_list_path: Input length = 1
elasticsearch.log-  at org.elasticsearch.index.analysis.Analysis.getWordList(Analysis.java:241)
elasticsearch.log-  at org.elasticsearch.index.analysis.Analysis.getWordSet(Analysis.java:209)
elasticsearch.log-  at org.elasticsearch.index.analysis.compound.AbstractCompoundWordTokenFilterFactory.<init>(AbstractCompoundWordTokenFilterFactory.java:49)
elasticsearch.log-  at org.elasticsearch.index.analysis.compound.HyphenationCompoundWordTokenFilterFactory.<init>(HyphenationCompoundWordTokenFilterFactory.java:52)
elasticsearch.log-  at sun.reflect.GeneratedConstructorAccessor29.newInstance(Unknown Source)
elasticsearch.log-  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
elasticsearch.log-  at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
elasticsearch.log-  at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:50)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
elasticsearch.log-  at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:54)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl$5$1.call(InjectorImpl.java:828)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:887)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl$5.get(InjectorImpl.java:823)
elasticsearch.log-  at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:236)
elasticsearch.log-  at com.sun.proxy.$Proxy19.create(Unknown Source)
elasticsearch.log-  at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:161)
elasticsearch.log-  at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:66)
elasticsearch.log-  at sun.reflect.GeneratedConstructorAccessor28.newInstance(Unknown Source)
elasticsearch.log-  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
elasticsearch.log-  at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
elasticsearch.log-  at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:50)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
elasticsearch.log-  at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:887)
elasticsearch.log-  at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
elasticsearch.log-  at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
elasticsearch.log-  at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
elasticsearch.log-  at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
elasticsearch.log-  at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
elasticsearch.log-  at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:887)
elasticsearch.log-  at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
elasticsearch.log-  at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
elasticsearch.log-  at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
elasticsearch.log-  at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
elasticsearch.log-  at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
elasticsearch.log-  at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:887)
elasticsearch.log-  at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
elasticsearch.log-  at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
elasticsearch.log-  at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:201)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:880)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:159)
elasticsearch.log-  at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55)
elasticsearch.log-  at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:358)
elasticsearch.log-  ... 9 more
elasticsearch.log-[2016-03-21 16:25:38,683][DEBUG][action.admin.indices.create] [Straw Man] [my_index] failed to create
elasticsearch.log:[my_index] IndexCreationException[failed to create index]; nested: IllegalArgumentException[IOException while reading word_list_path: Input length = 1];
elasticsearch.log-  at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:360)
elasticsearch.log-  at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$1.execute(MetaDataCreateIndexService.java:309)
elasticsearch.log-  at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45)
elasticsearch.log-  at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:458)
elasticsearch.log-  at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:762)
elasticsearch.log-  at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
elasticsearch.log-  at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
elasticsearch.log-  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
elasticsearch.log-  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
elasticsearch.log-  at java.lang.Thread.run(Thread.java:745)
elasticsearch.log:Caused by: java.lang.IllegalArgumentException: IOException while reading word_list_path: Input length = 1
elasticsearch.log-  at org.elasticsearch.index.analysis.Analysis.getWordList(Analysis.java:241)
elasticsearch.log-  at org.elasticsearch.index.analysis.Analysis.getWordSet(Analysis.java:209)
elasticsearch.log-  at org.elasticsearch.index.analysis.compound.AbstractCompoundWordTokenFilterFactory.<init>(AbstractCompoundWordTokenFilterFactory.java:49)
elasticsearch.log-  at org.elasticsearch.index.analysis.compound.HyphenationCompoundWordTokenFilterFactory.<init>(HyphenationCompoundWordTokenFilterFactory.java:52)
elasticsearch.log-  at sun.reflect.GeneratedConstructorAccessor29.newInstance(Unknown Source)
elasticsearch.log-  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
elasticsearch.log-  at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
elasticsearch.log-  at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:50)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
elasticsearch.log-  at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:54)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl$5$1.call(InjectorImpl.java:828)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:887)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl$5.get(InjectorImpl.java:823)
elasticsearch.log-  at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:236)
elasticsearch.log-  at com.sun.proxy.$Proxy19.create(Unknown Source)
elasticsearch.log-  at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:161)
elasticsearch.log-  at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:66)
elasticsearch.log-  at sun.reflect.GeneratedConstructorAccessor28.newInstance(Unknown Source)
elasticsearch.log-  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
elasticsearch.log-  at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
elasticsearch.log-  at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:50)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
elasticsearch.log-  at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:887)
elasticsearch.log-  at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
elasticsearch.log-  at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
elasticsearch.log-  at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
elasticsearch.log-  at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
elasticsearch.log-  at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
elasticsearch.log-  at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:887)
elasticsearch.log-  at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
elasticsearch.log-  at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
elasticsearch.log-  at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
elasticsearch.log-  at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
elasticsearch.log-  at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
elasticsearch.log-  at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
elasticsearch.log-  at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:887)
elasticsearch.log-  at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
elasticsearch.log-  at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
elasticsearch.log-  at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:201)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:880)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
elasticsearch.log-  at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:159)
elasticsearch.log-  at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55)
elasticsearch.log-  at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:358)
elasticsearch.log-  ... 9 more

I ran lsof on the germany.txt file (and on de.xml) to check whether another process was accessing it, but nothing showed up.

Antonios-MacBook-Air:elasticsearch-2.2.1 abonuccelli$ while true; do sudo lsof /opt/elk/PROD/elasticsearch-2.2.1/config/de.xml ; sudo lsof /opt/elk/PROD/elasticsearch-2.2.1/config/germany.txt;date;done
Mon Mar 21 16:25:22 CST 2016
Mon Mar 21 16:25:23 CST 2016
Mon Mar 21 16:25:25 CST 2016
Mon Mar 21 16:25:26 CST 2016
Mon Mar 21 16:25:27 CST 2016
Mon Mar 21 16:25:29 CST 2016
Mon Mar 21 16:25:31 CST 2016
Mon Mar 21 16:25:33 CST 2016
Mon Mar 21 16:25:34 CST 2016
Mon Mar 21 16:25:36 CST 2016
Mon Mar 21 16:25:37 CST 2016
Mon Mar 21 16:25:39 CST 2016
Mon Mar 21 16:25:40 CST 2016
Mon Mar 21 16:25:42 CST 2016
Mon Mar 21 16:25:43 CST 2016
Mon Mar 21 16:25:45 CST 2016

rmuir commented Mar 21, 2016

wrong encoding.

rmuir closed this as completed Mar 21, 2016

s1monw commented Mar 21, 2016

What @rmuir meant is that the file is not UTF-8 encoded. All files that ES accepts must be UTF-8.
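
For anyone hitting the same error, here is a minimal Python sketch of how one might confirm that a word list is not valid UTF-8 and then re-encode it. The function names are made up for illustration, and the default source encoding of ISO-8859-1 is purely an assumption: nothing in this thread states what encoding germany.txt actually uses, so check the decoded output before trusting it.

```python
from pathlib import Path
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first malformed UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises on malformed input
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def transcode_to_utf8(src: Path, dst: Path, source_encoding: str = "iso-8859-1") -> None:
    """Rewrite src as UTF-8, ASSUMING it is currently in source_encoding."""
    # Work on raw bytes so that line endings are preserved exactly.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Stand-in for one line of a word list saved in a single-byte encoding.
    sample = "üppigkeit\n".encode("iso-8859-1")
    print(first_invalid_utf8_offset(sample))
```

A typical use would be to call `first_invalid_utf8_offset(Path("config/germany.txt").read_bytes())` and, if it returns an offset, write a converted copy with `transcode_to_utf8` and point `word_list_path` at that copy.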

rmuir commented Mar 21, 2016

By the way, I have the feeling an exception may be discarded here.

I feel like it should not be java.lang.IllegalArgumentException: IOException while reading word_list_path: Input length = 1; that is not a good error. I just happen to know what it means.

Per the charset API (https://docs.oracle.com/javase/7/docs/api/java/nio/charset/CodingErrorAction.html#REPORT), the original exception should have been a subclass of CharacterCodingException, such as MalformedInputException. Surfacing it would make these errors easier to understand, if we can improve that.
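
To illustrate the point, here is a rough Python analogue of what a more helpful error could look like. It is not Elasticsearch code and not the eventual fix; `load_word_list` is a hypothetical helper. In Java, the text "Input length = 1" is the message of a MalformedInputException; the sketch below shows the same situation with Python's strict decoder, wrapping the low-level error in a descriptive message while keeping the original exception attached as the cause.

```python
from pathlib import Path
from typing import List


def load_word_list(path: Path) -> List[str]:
    """Read a word list strictly as UTF-8, one entry per line (hypothetical helper)."""
    raw = path.read_bytes()
    try:
        text = raw.decode("utf-8")  # strict: malformed bytes raise instead of being replaced
    except UnicodeDecodeError as exc:
        # Name the file, the expected encoding, and the offending byte, and
        # chain the original exception so that it is not discarded.
        raise ValueError(
            f"{path}: not valid UTF-8 (byte 0x{raw[exc.start]:02x} at offset "
            f"{exc.start}: {exc.reason})"
        ) from exc
    # Skip blank lines; keep every other entry as written.
    return [line for line in text.splitlines() if line.strip()]
```

With the cause chained, a caller who only logs the outer message still learns which file and byte are at fault, and the full traceback still shows the underlying decoding error.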

s1monw commented Mar 22, 2016

@rmuir it's fine in master, but in 2.x we shadow this exception. 5.x already puts the MalformedInputException as the cause. I will put up a PR.

s1monw added a commit to s1monw/elasticsearch that referenced this issue Mar 22, 2016
This commit fixes string formatting issues in the error handling and
provides a better error message if malformed input is detected.
This commit also adds tests for both situations.

Relates to elastic#17212