Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement framework for migrating system indices #78951

Merged
merged 37 commits into from
Oct 20, 2021

Conversation

gwbrown
Copy link
Contributor

@gwbrown gwbrown commented Oct 12, 2021

This PR adds a framework for migrating system indices as necessary prior
to Elasticsearch upgrades. This framework uses REST APIs added in
another commit:

  • GET _migration/system_features

This API, which gets the status of "features" (plugins which own system
indices) with regards to whether they need to be upgraded or not. As of
this PR, this API also reports errors encountered while migrating system
indices alongside the index that was being processed when this occurred.

As an example of this error reporting:

{
    "feature_name": "logstash_management",
    "minimum_index_version": "8.0.0",
    "upgrade_status": "ERROR",
    "indices": [
        {
            "index": ".logstash",
            "version": "8.0.0",
            "failure_cause": {
                "error": {
                    "root_cause": [
                        {
                            "type": "runtime_exception",
                            "reason": "whoopsie",
                            "stack_trace": "<omitted for brevity>"
                        }
                    ],
                    "type": "runtime_exception",
                    "reason": "whoopsie",
                    "stack_trace": "<omitted for brevity>"
                }
            }
        }
    ]
}
  • POST _migration/system_features

This API starts the migration process. The API for this has no changes,
but when called, any system indices which need to be migrated will be
migrated, with status information stored in the cluster state for later
use by the GET _migration/system_features API.

This PR adds a framework for migrating system indices as necessary prior
to Elasticsearch upgrades. This framework uses REST APIs added in
another commit:
- GET _migration/system_features

This API, which gets the status of "features" (plugins which own system
indices) with regards to whether they need to be upgraded or not. As of
this PR, this API also reports errors encountered while migrating system
indices alongside the index that was being processed when this occurred.

As an example of this error reporting:

```json
{
    "feature_name": "logstash_management",
    "minimum_index_version": "8.0.0",
    "upgrade_status": "ERROR",
    "indices": [
        {
            "index": ".logstash",
            "version": "8.0.0",
            "failure_cause": {
                "error": {
                    "root_cause": [
                        {
                            "type": "runtime_exception",
                            "reason": "whoopsie",
                            "stack_trace": "<omitted for brevity>"
                        }
                    ],
                    "type": "runtime_exception",
                    "reason": "whoopsie",
                    "stack_trace": "<omitted for brevity>"
                }
            }
        }
    ]
}
```

- POST _migration/system_features

This API starts the migration process. The API for this has no changes,
but when called, any system indices which need to be migrated will be
migrated, with status information stored in the cluster state for later
use by the GET _migration/system_features API.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

Copy link
Contributor

@williamrandolph williamrandolph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read through all the code so far and I like it. I have a few comments and requests, and I will be happy to review again once we have some good tests. But overall: very impressive!

GetFeatureUpgradeStatusResponse.UpgradeStatus initialStatus;
if (featureName.equals(currentFeature)) {
initialStatus = IN_PROGRESS;
} else if (minimumVersion.before(Version.V_7_0_0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably pull this version number out into a constant to make it more visible and to use it in tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I half-heartedly introduced one while backporting to 7.x in order to get bwc tests working: 3b43555

@@ -593,6 +593,7 @@ public static void validateFeatureName(String name, String plugin) {
* Class holding a description of a stateful feature.
*/
public static class Feature {
private final String name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I see this field, it's obvious that it always should have been here.

private final String currentFeature;
private final Map<String, Object> featureCallbackMetadata;

public SystemIndexMigrationTaskState(String currentIndex, String currentFeature, Map<String, Object> featureCallbackMetadata) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: A little javadoc explaining what the feature callback metadata contains and where to find more information about it

Copy link
Contributor

@williamrandolph williamrandolph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really, really good. I have some nitpicks and testing requests, but nothing substantial. However, I'm going to mark it as "request changes" anyway so I can get a notification when commits from end-to-end testing make it here.

One thing I want to look at is making sure that, at the end of the process, all the original aliases are pointing at the migrated index.

Great work!

assertIndexHasCorrectProperties(finalMetadata, ".ext-unman-old-reindexed-for-8", EXTERNAL_UNMANAGED_FLAG_VALUE, false, false);
}

public void assertIndexHasCorrectProperties(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some assertions about aliases to this method? If I'm understanding things right, the original index name should now be an alias pointing to the migrated index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! And yes, that should be the case. I've also updated the code to copy any aliases the old index had to the new index, which was my original intent.

static final int EXTERNAL_UNMANAGED_FLAG_VALUE = 4;
static final SystemIndexDescriptor INTERNAL_MANAGED = SystemIndexDescriptor.builder()
.setIndexPattern(".int-man-*")
.setAliasName(".internal-managed-alias")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create the aliases for the managed indices in these tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aliases for the managed indices should be created automatically at index creation time. The tests now verify that the final, post-migration managed indices have the alias set in the descriptor.

SetOnce<Boolean> secondPluginPostMigrationHookCalled = new SetOnce<>();

TestPlugin.preMigrationHook.set(clusterState -> {
// None of the other hooks should have been called yet.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this way of verifying the order in which these hooks are called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I can't take credit for it, I picked this style up from uh... somewhere else in Elasticsearch, not sure where exactly.

@@ -199,34 +211,45 @@ public String toString() {
"featureName='" + featureName + '\'' +
", minimumIndexVersion='" + minimumIndexVersion + '\'' +
", upgradeStatus='" + upgradeStatus + '\'' +
", indexVersions=" + indexVersions +
", indexVersions=" + indexInfos +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very tiny nitpick: indexVersions= could be changed to indexInfos=.

}

// visible for testing
static List<GetFeatureUpgradeStatusResponse.IndexVersion> getIndexVersions(ClusterState state, SystemIndices.Feature feature) {
static List<GetFeatureUpgradeStatusResponse.IndexInfo> getIndexVersions(ClusterState state, SystemIndices.Feature feature) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We could rename this method as well


assertThat(status.getUpgradeStatus(), equalTo(NO_UPGRADE_NEEDED));
assertThat(status.getUpgradeStatus(), equalTo(UPGRADE_NEEDED));
assertThat(status.getFeatureName(), equalTo("test-feature"));
assertThat(status.getMinimumIndexVersion(), equalTo(Version.V_7_0_0));
assertThat(status.getIndexVersions().size(), equalTo(2)); // additional testing below
}

public void testGetIndexVersion() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny, tiny nit: rename to testGetIndexInfos if we rename the getIndexVersions method.

Copy link
Contributor

@williamrandolph williamrandolph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code since my last review is very straightforward, and testing a build of the branch worked like a charm.

This is looking really great. Thank you so much for all the good work here.

@gwbrown gwbrown merged commit 1a90689 into elastic:master Oct 20, 2021
gwbrown added a commit that referenced this pull request Oct 20, 2021
gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Oct 20, 2021
Also re-enables BWC following the backport of elastic#78951.
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Oct 20, 2021
* upstream/master: (24 commits)
  Implement framework for migrating system indices (elastic#78951)
  Improve transient settings deprecation message (elastic#79504)
  Remove getValue and getValues from Field (elastic#79516)
  Store Template's mappings as bytes for disk serialization (elastic#78746)
  [ML] Add queue_capacity setting to start deployment API (elastic#79433)
  [ML] muting rest compat test issue elastic#79518 (elastic#79519)
  Avoid redundant available indices check (elastic#76540)
  Re-enable BWC tests
  TEST Ensure password 14 chars length on Kerberos FIPS tests (elastic#79496)
  [DOCS] Temporarily remove APM links (elastic#79411)
  Fix CCSDuelIT for skipped shards (elastic#79490)
  Add other time accounting in HotThreads (elastic#79392)
  Add deprecation info API entries for deprecated monitoring settings (elastic#78799)
  Add note in breaking changes for nameid_format (elastic#77785)
  Use 'migration' instead of 'upgrade' in GET system feature migration status responses (elastic#79302)
  Upgrade lucene version 8b68bf60c98 (elastic#79461)
  Use Strings#EMPTY_ARRAY (elastic#79452)
  Quicker shared cache file preallocation (elastic#79447)
  [ML] Removing some code that's obsolete for 8.0 (elastic#79444)
  Ensure indexing_data CCR requests are compressed (elastic#79413)
  ...
gwbrown added a commit that referenced this pull request Oct 20, 2021
This PR adds a framework for migrating system indices as necessary prior
to Elasticsearch upgrades. This framework uses REST APIs added in
another commit:
- GET _migration/system_features

This API, which gets the status of "features" (plugins which own system
indices) with regards to whether they need to be upgraded or not. As of
this PR, this API also reports errors encountered while migrating system
indices alongside the index that was being processed when this occurred.

As an example of this error reporting:

```json
{
    "feature_name": "logstash_management",
    "minimum_index_version": "8.0.0",
    "upgrade_status": "ERROR",
    "indices": [
        {
            "index": ".logstash",
            "version": "8.0.0",
            "failure_cause": {
                "error": {
                    "root_cause": [
                        {
                            "type": "runtime_exception",
                            "reason": "whoopsie",
                            "stack_trace": "<omitted for brevity>"
                        }
                    ],
                    "type": "runtime_exception",
                    "reason": "whoopsie",
                    "stack_trace": "<omitted for brevity>"
                }
            }
        }
    ]
}
```

- POST _migration/system_features

This API starts the migration process. The API for this has no changes,
but when called, any system indices which need to be migrated will be
migrated, with status information stored in the cluster state for later
use by the GET _migration/system_features API.
elasticsearchmachine pushed a commit that referenced this pull request Oct 20, 2021
Also re-enables BWC following the backport of #78951.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label >enhancement Team:Core/Infra Meta label for core/infra team v7.16.0 v8.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants