Index name determined by parameters #14

Open
danizen opened this issue Jun 20, 2017 · 1 comment

Comments

danizen commented Jun 20, 2017

I've just returned from Elasticsearch Core Developers training, where I learned that the most common idiom is to control index settings and mappings with "templates" and to control what you search with "aliases". Tools like Logstash support an index name that depends on parameters; that is, the index name depends on properties of the data being indexed.
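To make the idea concrete, here is a minimal sketch of an index name resolved from document properties. The `{field}` placeholder syntax, the `resolve_index_name` function, and the `collection` field are all hypothetical; they are not taken from Logstash or from the committer, and the sanitizing step is deliberately conservative.

```python
import re

# Hypothetical placeholder syntax: "{field}" is replaced by a document field.
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def resolve_index_name(pattern, doc, default="unknown"):
    """Build an index name from a pattern and a document's fields."""

    def substitute(match):
        # Fall back to a default when the document lacks the field.
        value = str(doc.get(match.group(1), default))
        # Elasticsearch index names must be lowercase. As a conservative
        # assumption, keep only a-z, 0-9, "_", "." and "-".
        return re.sub(r"[^a-z0-9_.-]", "-", value.lower())

    return _PLACEHOLDER.sub(substitute, pattern)


print(resolve_index_name("crawl-{collection}", {"collection": "News Site"}))
# prints: crawl-news-site
```

A missing field falls back to `default`, so a document never ends up with an empty name segment.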

Issue Norconex/crawlers#359 would address this, but a more thorough solution would be to allow date/timestamp substitution in the indexName, or to use an incrementing counter that advances on a fixed schedule.
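Both naming schemes can be sketched in a few lines. This is only an illustration under assumptions: the function names, the `crawl` prefix, the date format, and the zero-padded counter width are all made up for the example and are not an existing committer option.

```python
from datetime import datetime, timezone


def dated_index_name(prefix, when, fmt="%Y.%m.%d"):
    """Append a formatted UTC timestamp to the prefix."""
    stamp = when.astimezone(timezone.utc).strftime(fmt)
    return f"{prefix}-{stamp}"


def next_counter_index_name(prefix, existing_names, width=6):
    """Return the prefix plus a counter one above the highest existing one."""
    highest = 0
    for name in existing_names:
        head, _, tail = name.rpartition("-")
        # Only consider names of the form "<prefix>-<digits>".
        if head == prefix and tail.isdigit():
            highest = max(highest, int(tail))
    return f"{prefix}-{highest + 1:0{width}d}"


when = datetime(2017, 6, 20, 12, 0, tzinfo=timezone.utc)
print(dated_index_name("crawl", when))
# prints: crawl-2017.06.20
print(next_counter_index_name("crawl", ["crawl-000001", "crawl-000002"]))
# prints: crawl-000003
```

The counter variant needs the list of existing index names from somewhere (for example, from the cluster), whereas the dated variant needs no state at all.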

These are just ideas. Both this issue and the earlier one are syntactic sugar, since anyone can already use the Elasticsearch "alias" feature for indexing as well as for search: an alias such as "allcrawls" is used for searching, and an alias such as "current-crawl" is used for indexing.
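As a rough sketch of the alias approach, the snippet below builds a request body for the Elasticsearch `_aliases` endpoint, which accepts a list of `add` and `remove` actions. The helper name and the dated index names are hypothetical; only the two alias names come from the comment above. The body is only constructed and printed here, not sent to a cluster.

```python
import json


def alias_swap_actions(new_index, old_index=None,
                       write_alias="current-crawl", search_alias="allcrawls"):
    """Build a request body for the Elasticsearch _aliases endpoint."""
    actions = []
    if old_index is not None:
        # Stop routing new documents to the previous index.
        actions.append({"remove": {"index": old_index, "alias": write_alias}})
    # Route new documents to the new index ...
    actions.append({"add": {"index": new_index, "alias": write_alias}})
    # ... and make it visible to searches alongside the older indices.
    actions.append({"add": {"index": new_index, "alias": search_alias}})
    return {"actions": actions}


body = alias_swap_actions("crawl-2017.06.20", old_index="crawl-2017.06.19")
print(json.dumps(body, indent=2))
```

The old index is removed from the write alias in the same request that adds the new one, so that "current-crawl" keeps pointing at a single index, while "allcrawls" keeps accumulating indices for search.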

@essiembre
Contributor

This is something that has been considered a few times. Adding this feature is not a problem for document additions and modifications, but it is a problem for deletions, where we may not have the piece of metadata used to determine the index name. Not all metadata fields are cached between recrawls, only the minimum required.

If you do not think it would be too much to have the Elasticsearch committer cache the field names/values used to figure out the index names, then it could be given more serious thought. If you have an idea for how to do it without adding more caching, that would be even better.
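One possible shape for such a cache, as a sketch only: instead of caching the field names and values, it stores the already resolved index name per document reference, which is a single short string per document. The `IndexNameCache` class and its methods are hypothetical, and the in-memory dict stands in for whatever persistent store the committer would actually need between recrawls.

```python
class IndexNameCache:
    """Remember which index each document reference was last committed to."""

    def __init__(self):
        # Assumption: a plain dict; a real committer would need persistence.
        self._index_by_reference = {}

    def record(self, reference, index_name):
        """Call on every addition or modification."""
        self._index_by_reference[reference] = index_name

    def index_for_deletion(self, reference):
        """Return the index to delete from, or None if it was never recorded.

        The entry is dropped because it is no longer needed once the
        document has been deleted.
        """
        return self._index_by_reference.pop(reference, None)


cache = IndexNameCache()
cache.record("http://example.com/a", "crawl-2017.06.20")
print(cache.index_for_deletion("http://example.com/a"))
# prints: crawl-2017.06.20
print(cache.index_for_deletion("http://example.com/a"))
# prints: None
```

When the lookup returns `None`, the caller would still need a fallback, such as deleting through an alias that spans all candidate indices.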
