If _id field is an object, no error is thrown but doc is "unsearchable" #3517

Closed
polyfractal opened this issue Aug 15, 2013 · 12 comments · Fixed by #14003
Labels: >bug, help wanted, adoptme, :Search Foundations/Mapping, Team:Search Foundations

Comments

@polyfractal
Contributor

Expected Behavior

Normally, if you try to index a document without an ID in the URI (e.g. a POST) but with an _id field in the document (and no explicit _id path mapping), it throws an error because the autogenerated ID does not match the provided _id field:

curl -XDELETE localhost:9200/testindex
curl -XPUT localhost:9200/testindex
curl -XPOST localhost:9200/testindex/testtype?pretty -d '{"_id":"polyfractal","key":"value"}}}'
{
  "error" : "MapperParsingException[failed to parse [_id]]; nested: MapperParsingException[Provided id [O-kIgieVTRG9DpxHML7LkA] does not match the content one [polyfractal]]; ",
  "status" : 400
}

Broken Behavior

However, if the _id field happens to be an object, Elasticsearch happily indexes the document:

curl -XDELETE localhost:9200/testindex
curl -XPUT localhost:9200/testindex
curl -XPOST "localhost:9200/testindex/testtype" -d '{"key":"value"}'
curl -XPOST "localhost:9200/testindex/testtype" -d '{"_id":{"name":"polyfractal"},"key":"value"}}}'
{"ok":true,"_index":"testindex","_type":"testtype","_id":"b2xEPk5tTfC-RLsCb1ZapA","_version":1}
{"ok":true,"_index":"testindex","_type":"testtype","_id":"BsTbRqaeTrKLIe0JoeHsWw","_version":1}

You can GET it:

curl -XGET localhost:9200/testindex/testtype/BsTbRqaeTrKLIe0JoeHsWw?pretty
{
  "_index" : "testindex",
  "_type" : "testtype",
  "_id" : "BsTbRqaeTrKLIe0JoeHsWw",
  "_version" : 1,
  "exists" : true, "_source" : {"_id":{"name":"polyfractal"},"key":"value"}}}
}

It shows up with a match_all query:

curl -XGET localhost:9200/testindex/testtype/_search?pretty -d '{"query":{"match_all":{}}}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "testindex",
      "_type" : "testtype",
      "_id" : "BsTbRqaeTrKLIe0JoeHsWw",
      "_score" : 1.0, "_source" : {"_id":{"name":"polyfractal"},"key":"value"}}}
    }, {
      "_index" : "testindex",
      "_type" : "testtype",
      "_id" : "b2xEPk5tTfC-RLsCb1ZapA",
      "_score" : 1.0, "_source" : {"key":"value"}
    } ]
  }
}

But it doesn't show up when you search for exact values (or with a match query, or any other search):

curl -XGET localhost:9200/testindex/testtype/_search?pretty -d '{"query":{"term":{"key":"value"}}}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "testindex",
      "_type" : "testtype",
      "_id" : "b2xEPk5tTfC-RLsCb1ZapA",
      "_score" : 0.30685282, "_source" : {"key":"value"}
    } ]
  }
}

If you ask ES why it doesn't show up, it says there are no matching terms:

curl -XGET localhost:9200/testindex/testtype/BsTbRqaeTrKLIe0JoeHsWw/_explain?pretty -d '{"query":{"term":{"key":"value"}}}'
{
  "ok" : true,
  "_index" : "testindex",
  "_type" : "testtype",
  "_id" : "BsTbRqaeTrKLIe0JoeHsWw",
  "matched" : false,
  "explanation" : {
    "value" : 0.0,
    "description" : "no matching term"
  }
}

And finally, as a fun twist, you can set an explicit mapping to look inside the _id object. This works with regard to the ID (it extracts the appropriate ID), and the document is GETable, appears in match_all, etc. Search is still broken.

curl -XDELETE localhost:9200/testindex
curl -XPUT localhost:9200/testindex -d '{
   "mappings":{
      "testtype":{
         "_id" : {
           "path" : "_id.name"
         },
         "properties":{
            "_id":{
               "type":"object",
               "properties":{
                  "name":{
                     "type":"string"
                  }
               }
            }
         }
      }
   }
}'

curl -XPOST "localhost:9200/testindex/testtype" -d '{"key":"value"}'
curl -XPOST "localhost:9200/testindex/testtype" -d '{"_id":{"name":"polyfractal"},"key":"value"}}}'
curl -XGET localhost:9200/testindex/testtype/polyfractal?pretty
{
  "_index" : "testindex",
  "_type" : "testtype",
  "_id" : "polyfractal",
  "_version" : 1,
  "exists" : true, "_source" : {"_id":{"name":"polyfractal"},"key":"value"}}}
}
curl -XGET localhost:9200/testindex/testtype/_search?pretty -d '{"query":{"match_all":{}}}'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "testindex",
      "_type" : "testtype",
      "_id" : "wsT9vaevTCW5EuKyr7nmUw",
      "_score" : 1.0, "_source" : {"key":"value"}
    }, {
      "_index" : "testindex",
      "_type" : "testtype",
      "_id" : "polyfractal",
      "_score" : 1.0, "_source" : {"_id":{"name":"polyfractal"},"key":"value"}}}
    } ]
  }
}
curl -XGET localhost:9200/testindex/testtype/_search?pretty -d '{"query":{"term":{"key":"value"}}}'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "testindex",
      "_type" : "testtype",
      "_id" : "wsT9vaevTCW5EuKyr7nmUw",
      "_score" : 0.30685282, "_source" : {"key":"value"}
    } ]
  }
}

Reference

This was surfaced by Scott on the mailing list.

@ghost ghost assigned polyfractal Aug 26, 2013
polyfractal added a commit to polyfractal/elasticsearch that referenced this issue Aug 27, 2013
    An exception is thrown if the provided id does not match the
    content id, but only if the content id is a string field.  If
    the content id is a complex object, no exception is thrown but
    the document is indexed anyway, leading to problems with search
    later.

    This fix adds an additional check for _id fields that are objects
    and throws an exception if one is encountered

    Fixes elastic#3517
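
The check described in this commit message can be modeled outside Elasticsearch. What follows is a minimal Python sketch, not the actual Java change: the function name `check_content_id` and the exception class `IdMismatchError` are hypothetical, and the code only mirrors the behavior the message describes (reject a mismatching string `_id`, and also reject an `_id` that is an object).

```python
from typing import Any, Mapping


class IdMismatchError(ValueError):
    """Hypothetical stand-in for the MapperParsingException seen in the issue."""


def check_content_id(provided_id: str, doc: Mapping[str, Any]) -> None:
    """Raise if the body's `_id` conflicts with the ID chosen for the document."""
    if "_id" not in doc:
        return  # nothing in the body to compare against
    content_id = doc["_id"]
    # The case this issue is about: an object-valued `_id` used to slip through.
    if isinstance(content_id, Mapping):
        raise IdMismatchError("_id in the document body must not be an object")
    # The case that already failed before the fix: a plain value that differs.
    if str(content_id) != provided_id:
        raise IdMismatchError(
            f"Provided id [{provided_id}] does not match the content one [{content_id}]"
        )
```

With this model, `{"_id":{"name":"polyfractal"},"key":"value"}` is rejected up front instead of being indexed in a half-searchable state.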
karmi added a commit to elastic/elasticsearch-rails that referenced this issue Dec 11, 2013
…xed_json` method

Default Mongoid serialization keeps the `_id` property in the JSON as "BSON Object",
which breaks Elasticsearch.

See:

* elastic/elasticsearch#3517
* rails-api/active_model_serializers#354 (comment)
@GlenRSmith
Contributor

It's a little bit more fun than that, even: you actually get partial indexing!

curl -XDELETE localhost:9200/testindex
curl -XPUT localhost:9200/testindex
curl -XPOST localhost:9200/testindex/testtype -d '{"leftkey":"value","_id":{"name":"polyfractal"},"rightkey":"value"}}}'
curl -XPOST localhost:9200/_flush

Now search on the field before the _id:

curl -XGET localhost:9200/testindex/testtype/_search?pretty -d '{"query":{"term":{"leftkey":"value"}}}'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "testindex",
      "_type" : "testtype",
      "_id" : "PalIN5CpSPKkGbhs4qNqaw",
      "_score" : 0.30685282, "_source" : {"leftkey":"value","_id":{"name":"polyfractal"},"rightkey":"value"}}}
    } ]
  }
}

There you go.
But search on the field after the _id:

curl -XGET localhost:9200/testindex/testtype/_search?pretty -d '{"query":{"term":{"rightkey":"value"}}}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

And you get nothing.

@andreaskern

I am affected by this behavior too. Mongo outputs the field like this:

{ "_id":{"$oid":"54d9e3bf30320c3335017e69"}, "@timestamp":"..."}

Actually, I don't care about the "_id" field, but I do care about the "@timestamp" field, which is silently not indexed. Here is an example that shows the behavior:
https://gist.github.com/andreaskern/01d1d292f7f146186ee5
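
One client-side workaround is to rewrite the document before sending it, so that the body never carries an object-valued `_id`. A minimal sketch, assuming Mongo-style extended JSON as in the sample above; the function name `flatten_object_id` and the target field name `mongo_id` are hypothetical choices, not something Elasticsearch or Mongo prescribes:

```python
from typing import Any, Dict


def flatten_object_id(doc: Dict[str, Any], target: str = "mongo_id") -> Dict[str, Any]:
    """Return a copy of `doc` whose object-valued `_id` is moved to `target`.

    A string `_id` is left untouched; only dict values are rewritten.
    """
    out = dict(doc)  # shallow copy so the caller's document is not mutated
    value = out.get("_id")
    if isinstance(value, dict):
        del out["_id"]
        # Mongo extended JSON wraps the ObjectId as {"$oid": "..."}; unwrap it
        # when present, otherwise keep the whole object under the new name.
        out[target] = value.get("$oid", value)
    return out


doc = {"_id": {"$oid": "54d9e3bf30320c3335017e69"}, "@timestamp": "..."}
print(flatten_object_id(doc))
```

The rewritten document keeps `@timestamp` as an ordinary field and carries the original ObjectId under a name that does not collide with the `_id` meta field.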

@s1monw s1monw added v1.6.0 and removed v1.5.0 labels Mar 17, 2015
@clintongormley
Contributor

In 2.0, the timestamp field would now be indexed correctly, as would _id.$oid. Wondering if we should allow users to index _id field inside the body at all? /cc @rjernst

@clintongormley clintongormley added :Search Foundations/Mapping Index mappings, including merging and defining field types and removed v1.6.0 labels May 29, 2015
@rjernst
Member

rjernst commented May 29, 2015

The ability to specify _id within a document has already been removed for 2.0+ indexes.

@clintongormley
Contributor

@rjernst you removed the ability to specify the main doc _id in the body, but if the body contains an _id field then it creates a field called _id in the mapping, which can't be queried.

What I'm asking is: should we just ignore the fact that this field is not accessible (as we do in master today) or should we actually throw an exception? I'm leaning towards ignoring, as users don't always have control over the docs they receive.

@rjernst
Member

rjernst commented May 31, 2015

I would be in favor of throwing an exception. This would only be for 2.0+ indexes, and it is really just field name validation (disallowing fields colliding with meta fields). The mechanism would be the same, a user would not be able to explicitly add a field _id in the properties for a document type.
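
The field name validation proposed here could look roughly like the following. This is an illustrative Python model, not Elasticsearch code: `META_FIELDS` is a hand-written, non-exhaustive set, and `validate_field_names` is a hypothetical helper that only inspects top-level field names.

```python
from typing import Any, Mapping

# Illustrative subset of meta field names; not an authoritative list.
META_FIELDS = frozenset(
    {"_id", "_type", "_index", "_source", "_routing", "_uid", "_all"}
)


def validate_field_names(doc: Mapping[str, Any]) -> None:
    """Reject a document whose top-level field names collide with meta fields."""
    collisions = sorted(name for name in doc if name in META_FIELDS)
    if collisions:
        raise ValueError(
            "field name(s) collide with meta fields: " + ", ".join(collisions)
        )
```

Under such a rule, a Mongo-style document carrying `{"_id": {"$oid": "..."}}` would be rejected as well, regardless of whether `_id` holds a string or an object.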

@clintongormley
Contributor

@rjernst it's a tricky one. eg mongo adds { "_id": { "$oid": "...." }}, so actually the _id.$oid field IS queryable... should this still throw an exception?

@rjernst
Member

rjernst commented May 31, 2015

IMO, yes.

@rjernst
Member

rjernst commented May 31, 2015

With #8871, I don't think that would work, because _id is both a field mapper (the real meta field), and an object mapper.

@clintongormley
Contributor

@rjernst yep, makes sense

@rjernst rjernst self-assigned this Jun 3, 2015
@clintongormley
Contributor

@rjernst this still works, even with #8871 merged in

@clintongormley
Contributor

Closed by #14003

@javanna javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024