I've just returned from Elasticsearch Core Developers training, where I learned that the most common idiom is to control index settings and mappings using "templates", and to control what you are searching using "aliases". Tools like Logstash support a parameterized index name, i.e. the index name can depend on properties of the data to be indexed.
Issue Norconex/crawlers#359 would address this, but a more thorough solution would be to allow date/timestamp substitution in the indexName, or to use an incrementing (1-up) counter that advances on some periodic basis.
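To make the date/timestamp substitution idea concrete, here is a minimal sketch. It is not the committer's actual configuration syntax: the function name `resolve_index_name`, the `{field}` placeholder style, and the use of `strftime` directives are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def resolve_index_name(pattern, fields, when):
    """Expand a hypothetical index name pattern.

    strftime directives (e.g. %Y.%m.%d) are filled from `when`, and
    {field} placeholders are filled from the document's metadata.
    """
    # Expand the date directives first, then the metadata placeholders.
    dated = when.strftime(pattern)
    # Elasticsearch index names must be lowercase.
    return dated.format_map(fields).lower()


crawl_time = datetime(2017, 3, 5, tzinfo=timezone.utc)
name = resolve_index_name("crawl-{site}-%Y.%m.%d", {"site": "Docs"}, crawl_time)
print(name)
```

A periodic counter could be handled the same way, by passing the counter value in as one more entry in `fields`.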
These are only ideas. Both this issue and the former one are syntactic sugar, since anyone can already use the Elasticsearch "alias" feature for indexing as well as for search: an alias such as "allcrawls" is used for searching, and an alias such as "current-crawl" is used for indexing.
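As a rough illustration of the alias approach, the sketch below builds the JSON body for the Elasticsearch `_aliases` endpoint that would point "current-crawl" at a new index and add that index to "allcrawls". The helper name `rollover_actions` and the index names are hypothetical, and the body is only constructed here, not sent to a cluster.

```python
import json


def rollover_actions(old_index, new_index,
                     write_alias="current-crawl", search_alias="allcrawls"):
    """Build an _aliases request body that moves the write alias to new_index."""
    actions = []
    if old_index is not None:
        # Detach the write alias from the previous index so that the alias
        # keeps resolving to a single index for indexing requests.
        actions.append({"remove": {"index": old_index, "alias": write_alias}})
    actions.append({"add": {"index": new_index, "alias": write_alias}})
    # The search alias accumulates every crawl index.
    actions.append({"add": {"index": new_index, "alias": search_alias}})
    return {"actions": actions}


body = rollover_actions("crawl-2017.03.04", "crawl-2017.03.05")
print(json.dumps(body, indent=2))
```

With this arrangement the committer would always index into "current-crawl", and the periodic switch to a new index would happen outside the crawler.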
This is something that has been considered a few times. Adding this feature for document additions/modifications is not a problem, but deletions are, because we may not have the piece of metadata used to determine the index name. Not all metadata fields are cached between recrawls, just the minimum required.
If you do not think it would be too much to have the Elasticsearch committer cache the field names/values used to figure out the index names, then it could be given more serious thought. If you have an idea for how to do it without adding more caching, that would be even better.
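For discussion, the caching option could look roughly like the sketch below. It is purely illustrative and not the committer's API: the class `IndexNameCache` and its methods are hypothetical, and a real version would need to persist the mapping between crawl sessions rather than keep it in memory.

```python
class IndexNameCache:
    """Remember which index each document reference was committed to."""

    def __init__(self):
        self._index_by_ref = {}

    def on_upsert(self, reference, index_name):
        # Record the resolved index name at addition/modification time,
        # when the metadata needed to compute it is still available.
        self._index_by_ref[reference] = index_name

    def on_delete(self, reference):
        # At deletion time only the reference is known, so look the index
        # up instead of recomputing it. Returns None if it was never seen.
        return self._index_by_ref.pop(reference, None)


cache = IndexNameCache()
cache.on_upsert("http://example.com/a.html", "crawl-2017.03.05")
print(cache.on_delete("http://example.com/a.html"))
```

Caching the resolved index name per reference, rather than the raw field values, would keep the extra stored data to one short string per document.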