Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support index cleaner for rollover indices and add integration tests #1689

Merged

Conversation

pavolloffay
Copy link
Member

Resolves #1681
Resolves #1682

@pavolloffay pavolloffay changed the title Index cleaner rollover itest Support index cleaner for rollover indices and add integration tests Jul 25, 2019
@pavolloffay pavolloffay force-pushed the index-cleaner-rollover-itest branch from 7f5b45b to 83e2621 Compare July 25, 2019 15:58
@codecov
Copy link

codecov bot commented Jul 25, 2019

Codecov Report

Merging #1689 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1689   +/-   ##
=======================================
  Coverage   98.49%   98.49%           
=======================================
  Files         193      193           
  Lines        9286     9286           
=======================================
  Hits         9146     9146           
  Misses        111      111           
  Partials       29       29

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f1daf2...5f28d63. Read the comment docs.

@pavolloffay
Copy link
Member Author

@objectiser could you please have a look?

Copy link
Contributor

@objectiser objectiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - just a couple of questions/comments.

@@ -15,7 +15,8 @@ def main():
print('HOSTNAME ... specifies which Elasticsearch hosts URL to search and delete indices from.')
print('TIMEOUT ... number of seconds to wait for master node response.'.format(TIMEOUT))
print('INDEX_PREFIX ... specifies index prefix.')
print('ARCHIVE ... specifies whether to remove archive indices (default false).')
print('ARCHIVE ... specifies whether to remove archive indices (only works for rollover) (default false).')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a change in behaviour, or did it never work for archive (with daily indices)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just a comment change.

It worked and works for archive - but it is only supported for archive managed by rollover. If there is one archive we do not touch it.

plugin/storage/es/esCleaner.py Outdated Show resolved Hide resolved
@@ -60,13 +64,36 @@ def main():
def filter_main_indices(ilo, prefix):
ilo.filter_by_regex(kind='prefix', value=prefix + "jaeger")
empty_list(ilo, "No indices to delete")
ilo.filter_by_alias(aliases=[prefix + 'jaeger-span-read'], exclude=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these included, aren't they related to rollover?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are excluded, see the last parameter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry poor use of words, I mean why are these lines included/added - as they seem to relate to aliases related to the rollover approach. From looking at the tests, it seems like it is so that existing aliases (from rollover) are not touched - but that would imply the user has gone from a rollover environment back to using daily indices?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have simplified this in the last commit. Now there is a regex which matches exactly the daily inidices so no need to remove rollover indices afterwards.

# This excludes archive index as we use source='name'
# source `creation_date` would include archive index
ilo.filter_by_age(source='name', direction='older', timestring='%Y-%m-%d', unit='days', unit_count=int(sys.argv[1]))


def filter_main_indices_rollover(ilo, prefix):
ilo.filter_by_alias(aliases=[prefix + 'jaeger-*-read'])
empty_list(ilo, "No indices to delete")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these required after each filter, or just at the end? Does it cause problems calling the filter_by_alias on an empty list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's required otherwise it fails on empty cluster

# This excludes archive index as we use source='name'
# source `creation_date` would include archive index
# TODO it might be useful to allow filter_by_space
ilo.filter_by_age(source='creation_date', direction='older', unit='days', unit_count=int(sys.argv[1]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering about a particular scenario - if rollover criteria is based on index size (for example) in low activity environment, it might be possible that the same index is in use for more days that the unit_count, in which case it would remove the index - even though it is in current use?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right removing by size does not really make sense.

plugin/storage/integration/es_index_cleaner_test.go Outdated Show resolved Hide resolved
Signed-off-by: Pavol Loffay <[email protected]>
@pavolloffay pavolloffay merged commit 5cd5752 into jaegertracing:master Jul 26, 2019
ilo.filter_by_regex(kind='regex', value=prefix + "jaeger-(span|service)-\d{6}")
empty_list(ilo, "No indices to delete")
# do not remove active write indices
ilo.filter_by_alias(aliases=[prefix + 'jaeger-span-write'], exclude=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If filtering by the regex now, are these two aliases filters required?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment above. We need to exclude current active write indices.



def filter_archive_indices_rollover(ilo, prefix):
# Remove only rollover archive indices
# Do not remove active write archive index
ilo.filter_by_alias(aliases=[prefix + 'jaeger-span-archive-write'], exclude=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to filter by regex here aswell to be consistent with the change in filter_main_indices_rollover?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunatelly this is not filter_by_regex method

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what happened - the comment should have been associated with two lines down - so changing the ilo.filter_by_alias(aliases=[prefix + 'jaeger-span-archive-read']) to be ilo.filter_by_regex(kind='regex', value=prefix + "jaeger-(span|service)-\d{6}") ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I do not understand what you mean.

  1. we use regex to find indices which should be removed
  2. but we also need to exclude indices which are associated with write alias - to do that we use filter for an alias.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just mean that filter_main_indices and filter_main_indices_rollover are both using filter_by_regex to identify the indices to be removed - whereas filter_archive_indices_rollover uses ilo.filter_by_alias(aliases=[prefix + 'jaeger-span-archive-read']) - just suggesting that it would be better if filter_archive_indices_rollover also uses filter_by_regex for consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that sounds better #1693

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add tests for es-index cleaner and es-rollover Make es-index-cleaner work with rollover
2 participants