copy_to of mapper attachments metadata field isn't working #14946
Comments
From @gpaul on November 23, 2015 13:37 Ping. Should I open this issue against the main elasticsearch repository now that this plugin is moving there? |
Did you try the same script with elasticsearch 1.7? I'd like to know whether it's a regression or whether it has always been there. I know that the copy_to feature is supposed to work for the extracted text, but I don't think it worked for metadata. If I'm right (so it's not a bug but rather a feature request), then you can open it in the elasticsearch repo. |
From @gpaul on November 23, 2015 14:58 It seems like a regression:
|
Thank you @gpaul |
From @hhoechtl on November 23, 2015 15:2 It's also not working with the .content field |
I created a test for elasticsearch 1.7 and it is working well in 1.x series:
@Test
public void testCopyToMetaData() throws Exception {
    String mapping = copyToStringFromClasspath("/org/elasticsearch/index/mapper/attachment/test/integration/simple/copy-to-metadata.json");
    byte[] txt = copyToBytesFromClasspath("/org/elasticsearch/index/mapper/attachment/test/sample-files/text-in-english.txt");
    client().admin().indices().putMapping(putMappingRequest("test").type("person").source(mapping)).actionGet();
    index("test", "person", jsonBuilder().startObject()
            .startObject("file")
                .field("_content", txt)
                .field("_name", "name")
            .endObject()
        .endObject());
    refresh();
    CountResponse countResponse = client().prepareCount("test").setQuery(queryStringQuery("name").defaultField("file.name")).execute().get();
    assertThatWithError(countResponse.getCount(), equalTo(1l));
    countResponse = client().prepareCount("test").setQuery(queryStringQuery("name").defaultField("copy")).execute().get();
    assertThatWithError(countResponse.getCount(), equalTo(1l));
}
I created a test for 2.* branches which demonstrates the regression from 2.0.
"Copy To Feature":
  - do:
      indices.create:
        index: test
        body:
          mappings:
            doc:
              properties:
                copy_dst:
                  type: string
                doc:
                  type: attachment
                  fields:
                    name:
                      copy_to: copy_dst
  - do:
      cluster.health:
        wait_for_status: yellow
  - do:
      index:
        index: test
        type: doc
        id: 1
        body:
          doc:
            _content: "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
            _name: "name"
  - do:
      indices.refresh: {}
  - do:
      search:
        index: test
        body:
          query:
            match:
              doc.content: "ipsum"
  - match: { hits.total: 1 }
  - do:
      search:
        index: test
        body:
          query:
            match:
              doc.name: "name"
  - match: { hits.total: 1 }
  - do:
      search:
        index: test
        body:
          query:
            match:
              copy_dst: "name"
  - match: { hits.total: 1 }
@rjernst Could you give a look please? |
@dadoonet One odd thing I see is using a copy_to from within a multi field. Seems like we should disallow that? We removed support here: The copies are now handled outside of the mappers, whereas before there was a lot of spaghetti sharing between object mappers and field mappers that made document parsing complex. If we want to add it back, we will probably need a good refactoring of the way multi fields and copy_tos are handled. The problem is |
@clintongormley WDYT? Let me sum up the discussion. Before 2.0, we were able to support:
PUT /test/person/_mapping
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment",
        "fields": {
          "content": {
            "type": "string",
            "copy_to": "copy"
          },
          "name": {
            "type": "string",
            "copy_to": "copy"
          },
          "author": {
            "type": "string",
            "copy_to": "copy"
          },
          "title": {
            "type": "string",
            "copy_to": "copy"
          }
        }
      },
      "copy": {
        "type": "string"
      }
    }
  }
}
It means that the extracted content, name, author, and title were all copied to the copy field. From 2.0, this is no longer supported, for the reasons @rjernst described. Should we document that mapper attachments no longer supports copy_to? Or should we try to implement such a thing only for the mapper attachments plugin? I can't see another use case today. Maybe some other community plugins would like to have it, but I'm really unsure here. IMO, users can always search on multiple fields at the same time instead of searching in a single copy field. Thoughts? |
My feeling is that, long term, we should remove the With this goal in mind, it doesn't make sense to add complicated (and likely buggy) hacks to fix this regression in 2.x. But, we should let the user know that copy-to on multi-fields is not supported: we should throw an exception at mapping time instead of silently ignoring the problem. |
So...I took a look at the copy_to and multi_fields and tried to throw an exception when we encounter copy_to in multi_fields - we could do that I guess. But I have the suspicion that re-adding the copy_to to multi_fields is just a matter of shifting three lines. I made a pr here so you can see what I mean: #15152 Tests pass but I might be missing something. |
That would be awesome news, @brwe. I was not expecting that. Do the tests I wrote for the mapper attachments plugin pass as well? Is that what you mean by |
@dadoonet I mean the tests that were there already and the tests I added in the PR. I suspect that your test would pass as well, but I did not check. However, we (@rjernst, @clintongormley and I) had a chat yesterday about this, and here is the outcome:
should not be possible because they add complexity in code and usage. This reduces flexibility of mappings and how they can transform data. However, the consensus is that elasticsearch is not the right place to perform this kind of transformation anyway, and these kinds of operations should move to external tools such as the planned node ingest plugin #14049. Applying the fix I proposed would just delay the removal of the feature and therefore we think we should not do it. |
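As an illustration of moving the copy out of elasticsearch, here is a minimal client-side sketch. It assumes the document is held as plain nested maps before indexing; the class `ClientSideCopy` and the method `withCopy` are hypothetical names, not part of any elasticsearch API. It can only copy values the client already knows (such as `_name`); content or metadata that the plugin extracts on the server is not available at this point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: duplicates selected attachment keys into a regular
// top-level field before the document is sent for indexing.
final class ClientSideCopy {

    // Returns a copy of doc with the values of sourceKeys (read from the
    // attachment object) collected into targetField. The input is not modified.
    static Map<String, Object> withCopy(Map<String, Object> doc, String attachmentField,
                                        List<String> sourceKeys, String targetField) {
        Map<String, Object> result = new LinkedHashMap<>(doc);
        Object attachment = doc.get(attachmentField);
        List<Object> copied = new ArrayList<>();
        if (attachment instanceof Map) {
            for (String key : sourceKeys) {
                Object value = ((Map<?, ?>) attachment).get(key);
                if (value != null) {
                    copied.add(value);
                }
            }
        }
        if (!copied.isEmpty()) {
            result.put(targetField, copied);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> file = new LinkedHashMap<>();
        file.put("_content", "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=");
        file.put("_name", "name");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("file", file);
        // "copy" is a plain string field in the mapping, filled by the client.
        System.out.println(withCopy(doc, "file", Arrays.asList("_name"), "copy"));
    }
}
```

The target field then needs no `copy_to` in the mapping at all, since the client writes it directly.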
It seems like you have removed a feature without providing a better alternative. The use of copy_to for custom '_all' fields is well-documented and very useful. |
@brwe My 2c on your discussion
|
@gpaul If our resources were unlimited, then I would agree with you. However, in an effort to clean up a massive code base and to remove complexity, we have to do it incrementally and sometimes we have to remove things that worked before. The mapping cleanup was 5 long months of work, and there is still a good deal more to be done. It brought some huge improvements (just see how many issues were linked to #8870) but meant that we couldn't support everything that we supported before. Every hack that we add into the code adds technical debt and increases the likelihood of introducing new bugs. We'd much rather focus our limited resources on making the system clean, stable, reliable, and maintainable. This is why I don't want to make this change. The workaround for your case is to search across multiple fields. |
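A minimal sketch of that workaround, assuming the `file.*` field names from the 1.x mapping earlier in the thread: a `multi_match` query lists the individual fields instead of querying a single `copy` field. The class `MultiFieldQuery` is a hypothetical helper that builds the request body as a string by hand, so that the sketch needs no elasticsearch client on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds the JSON body of a multi_match query.
final class MultiFieldQuery {

    // Escapes backslashes and double quotes and wraps the value in quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Returns {"query":{"multi_match":{"query":<text>,"fields":[...]}}}.
    static String build(String text, List<String> fields) {
        String fieldList = fields.stream()
                .map(MultiFieldQuery::quote)
                .collect(Collectors.joining(","));
        return "{\"query\":{\"multi_match\":{\"query\":" + quote(text)
                + ",\"fields\":[" + fieldList + "]}}}";
    }

    public static void main(String[] args) {
        // Field names are assumptions taken from the mapping shown in this thread.
        System.out.println(build("name",
                Arrays.asList("file.content", "file.name", "file.author", "file.title")));
    }
}
```

The resulting body can be sent to the `_search` endpoint of the index; the list of fields replaces what the `copy` field used to aggregate.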
[woops, I was logged in as a friend of mine when I posted this comment a minute ago. I've removed it and this is a repost as myself ><] That's fair. Thanks for all the hard work. As I'll have to redesign my mappings anyway, should I avoid copy_to in its entirety going forward or is it just the multi-field case that was causing pain? I'd like to avoid features that are on their way out. |
@gpaul It is just copy_to in a multi field. |
Got it, thanks. |
Copy to within multi field is ignored from 2.0 on, see elastic#10802. Instead of just ignoring it, we should throw an exception if this is found in the mapping when a mapping is added. For already existing indices we should at least log a warning. related to elastic#14946
throw exception if a copy_to is within a multi field Copy to within multi field is ignored from 2.0 on, see #10802. Instead of just ignoring it, we should throw an exception if this is found in the mapping when a mapping is added. For already existing indices we should at least log a warning. related to #14946
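The behaviour described in this commit message could look roughly like the following sketch. It is not the actual elasticsearch implementation: the mapping is modelled as plain nested maps, and the class `CopyToInMultiFieldCheck`, its methods, and the `existingIndex` flag are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical validator: finds copy_to declared inside multi fields ("fields").
final class CopyToInMultiFieldCheck {

    // Small helper to build ordered maps from alternating keys and values.
    static Map<String, Object> map(Object... keyValues) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            m.put((String) keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    // Returns the full paths of all multi fields that declare copy_to.
    static List<String> findOffenders(Map<String, Object> properties) {
        List<String> offenders = new ArrayList<>();
        collect("", properties, offenders);
        return offenders;
    }

    @SuppressWarnings("unchecked")
    private static void collect(String prefix, Map<String, Object> properties, List<String> out) {
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            if (!(e.getValue() instanceof Map)) {
                continue;
            }
            Map<String, Object> field = (Map<String, Object>) e.getValue();
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object multi = field.get("fields");
            if (multi instanceof Map) {
                for (Map.Entry<String, Object> sub : ((Map<String, Object>) multi).entrySet()) {
                    if (sub.getValue() instanceof Map
                            && ((Map<?, ?>) sub.getValue()).containsKey("copy_to")) {
                        out.add(path + "." + sub.getKey());
                    }
                }
            }
            // Recurse into object fields.
            Object nested = field.get("properties");
            if (nested instanceof Map) {
                collect(path, (Map<String, Object>) nested, out);
            }
        }
    }

    // New mappings are rejected; mappings of existing indices only log a warning.
    static void validate(Map<String, Object> properties, boolean existingIndex) {
        List<String> offenders = findOffenders(properties);
        if (offenders.isEmpty()) {
            return;
        }
        String message = "copy_to in multi fields is not allowed. Found in: " + offenders;
        if (existingIndex) {
            System.err.println("WARN " + message);
        } else {
            throw new IllegalArgumentException(message);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> properties = map(
                "file", map("type", "attachment",
                        "fields", map("name", map("type", "string", "copy_to", "copy"))),
                "copy", map("type", "string"));
        validate(properties, true); // existing index: warning only
        try {
            validate(properties, false); // new mapping: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```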
Hi, I think I've this problem as well. Following the documentation (?) here: https://github.com/elastic/elasticsearch-mapper-attachments#copy-to-feature the I want to make use of this feature to copy the extracted content into a custom Is there a content extraction service/endpoint I could make use of to index prepared content, so that I don't have to rely on copyTo? Edit: These docs mention the Thanks! |
From @gpaul on November 17, 2015 10:43
The following set of steps against a fresh elasticsearch 2.0.0 instance with v3.0.2 of this plugin installed shows that copy_to isn't working for the name field. I doubt it is working for the other metadata fields, either.
You can copy/paste this in your shell.
Copied from original issue: elastic/elasticsearch-mapper-attachments#190