Term Vector With BiGrams in Nested Object #25070

Closed
vishva2005 opened this issue Jun 6, 2017 · 2 comments
Labels
discuss :Search/Search Search-related issues that do not fall into other categories

vishva2005 commented Jun 6, 2017

Describe the feature:

Elasticsearch version: 5.2.1

Plugins installed: []

JVM version (java -version): 1.8.0_101

OS version (uname -a if on a Unix-like system): Ubuntu 14.04.4 LTS

Description of the problem including expected versus actual behavior: When I request term vectors with bigrams for a field in a nested document, the result contains only unigrams if the term vectors are evaluated dynamically, and is an empty array if they are evaluated at index time and stored. The same setup works for fields in a non-nested document.

Steps to reproduce:

I have a document that contains a subdocument, so I have made the subdocument a nested property of the document. I now need to find term vectors for the subdocument. My terms can be unigrams or bigrams, so I created an analyzer with a shingle filter. The settings for the index are as follows:

{
  "settings": {
    "analysis": {
      "filter": {
        "light_english_stemmer": {
          "type": "stemmer",
          "language": "light_english"
        },
        "filter_shingle": {
          "type": "shingle",
          "max_shingle_size": 3,
          "min_shingle_size": 2,
          "output_unigrams": "true",
          "filler_token": ""
        }
      },
      "analyzer": {
        "keyword_discovery_analyzer": {
          "tokenizer": "standard",
          "char_filter": [ "html_strip" ],
          "filter": [
            "lowercase",
            "filter_shingle",
            "light_english_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "text"
        },
        "description": {
          "type": "text",
          "analyzer": "indexing_analyzer",
          "search_analyzer": "search_analyzer",
          "fields": {
            "termVec": {
              "type": "text",
              "term_vector": "yes",
              "store": true,
              "analyzer": "keyword_discovery_analyzer"
            }
          }
        },
        "subDoc": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "text",
              "fields": {
                "termVec": {
                  "type": "text",
                  "term_vector": "yes",
                  "store": true,
                  "analyzer": "keyword_discovery_analyzer"
                }
              }
            },
            "description": {
              "type": "text",
              "fields": {
                "termVec": {
                  "type": "text",
                  "term_vector": "yes",
                  "store": true,
                  "analyzer": "keyword_discovery_analyzer"
                }
              }
            }
          }
        }
      }
    }
  }
}
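
To make the expected terms concrete, here is a minimal Python sketch of what a shingle filter with min_shingle_size 2, max_shingle_size 3 and output_unigrams enabled should roughly emit. It is only an approximation for illustration, not the Lucene implementation: the function name `shingles` is hypothetical, the emission order is an assumption, and stemming, html_strip and filler tokens are ignored.

```python
def shingles(tokens, min_size=2, max_size=3, output_unigrams=True, sep=" "):
    """Approximate a shingle token filter over an already tokenized input.

    Assumption: for each start position the unigram is emitted first,
    followed by shingles of increasing size that begin at that position.
    """
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for n in range(min_size, max_size + 1):
            # Only emit shingles that fit entirely inside the token list.
            if i + n <= len(tokens):
                out.append(sep.join(tokens[i:i + n]))
    return out


# Lowercase and split on whitespace as a stand-in for the
# standard tokenizer plus the lowercase filter.
tokens = "Term Vector API".lower().split()
terms = shingles(tokens)
print(terms)
```

With this configuration, multi-word terms such as "term vector" are expected in the term vector alongside the unigrams, which is what the non-nested field returns and the nested field does not.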

When I execute the following request:

GET /_termvectors
{
  "fields": ["subDoc.name.termVec"],
  "offsets": false,
  "payloads": false,
  "positions": false,
  "term_statistics": true,
  "field_statistics": true,
  "filter": {
    "max_num_terms": 4
  }
}

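I get an empty result. However, if I instead run the following request, which evaluates the term vectors dynamically with a per-field analyzer, the result contains only unigrams: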

GET /12631946/_termvectors
{
  "fields": ["subDoc.name"],
  "offsets": false,
  "payloads": false,
  "positions": false,
  "term_statistics": true,
  "field_statistics": true,
  "per_field_analyzer": {
    "name": "keyword_discovery_analyzer"
  },
  "filter": {
    "max_num_terms": 15
  }
}

tlrx (Member) commented Jun 6, 2017

I don't think that nested documents are supported by the Term Vectors API and I don't think it is planned to be supported.

I'm going to label this issue as "Discuss" so that users and contributors can argue on the need and feasibility to add such a feature to the Term Vectors API.

See also #21625 (comment)

tlrx (Member) commented Jun 9, 2017

We discussed this in Fix-it Friday and it's indeed a duplicate of #21625.

@tlrx tlrx closed this as completed Jun 9, 2017
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Term Vectors labels Feb 14, 2018