Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ILM] Cloud-specific changes for 7.10 #79650

Merged
merged 15 commits into from
Oct 13, 2020

Conversation

jloleysens
Copy link
Contributor

@jloleysens jloleysens commented Oct 6, 2020

Summary

Fix #80023

This PR adds two changes to the ILM UI needed for Cloud.

1. Hide the data tiers option if there are no nodes with new node roles

See instructions below for testing this locally.

2. Add a callout on cloud when node roles are detected to the cold phase

On cloud users may be on a hot-warm deployment. The hot-warm deployment uses node attributes to allocate data to different "tiers", which does not use the new tiering system in ILM.

When turning on the cold phase, we want to notify users that they can activate their cold tier in cloud UI.

Notice states introduced here:

Node Roles Node Attributes
Warm Phase no changes no changes
Cold Phase NEW CTA! no changs

When a user on cloud activates cold phase with default (recommended) allocation and they do not have any cold tier nodes, we want to tell them they can turn on the cold tier in cloud.

How to test locally

Add the following to your kibana.dev.yml:

# Enable cloud plugin
xpack.cloud.id: 'eastus2.azure.elastic-cloud.com:9243$59ef636c6917463db140321484d63cfa$a8b109c08adc43279ef48f29af1a3911'
xpack.cloud.deploymentUrl: 'https://cloud.elastic.co/deployments/resolve/cluster/eastus2/9243$59ef636c6917463db140321484d63cfa/'

Starting a cluster by running yarn es snapshot should not create a cold tier so you should see the notice when you navigate to ILM and activate the cold phase. The notice should only be visible for the "recommended" cold setting in data allocation.

Screenshot

Requires copy This copy should be for a future scenario when cloud will introduce a cold tier slider. We do not have access to how exactly this will look/work yet. But know that users on Cloud will be able to go to their cloud deployment and provision cold tier nodes.

Screenshot 2020-10-12 at 11 49 14

Checklist

Delete any items that are not applicable to this PR.

@jloleysens jloleysens added Feature:ILM v8.0.0 Team:Kibana Management Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more release_note:skip Skip the PR/issue when compiling release notes v7.10.0 labels Oct 6, 2020
@jloleysens jloleysens requested a review from cjcenizal October 6, 2020 12:50
@cjcenizal
Copy link
Contributor

@jloleysens Would you mind updating the PR description with screenshots of the CTAs for both the warm and the cold phases, to give reviewers an idea of the Cloud UX when a user selects the "Use node roles" option for each?

Also, could we add a table in markdown, to clarify the states under which these CTAs manifest? Something like this...

Node roles Custom node attribues
Warm phase <describe state> <describe state>
Cold phase <describe state> <describe state>

And last thing, are we checking for node.data=true by hitting the GET _nodes/settings API anywhere? It's a little late here so my tired eyes might just be missing it. If not, why not? Thanks!

@jloleysens
Copy link
Contributor Author

@elasticmachine merge upstream

@jloleysens jloleysens marked this pull request as ready for review October 7, 2020 08:50
@jloleysens jloleysens requested a review from a team as a code owner October 7, 2020 08:50
@elasticmachine
Copy link
Contributor

Pinging @elastic/es-ui (Team:Elasticsearch UI)

@jloleysens
Copy link
Contributor Author

jloleysens commented Oct 7, 2020

Thanks for taking a look @cjcenizal !

And last thing, are we checking for node.data=true by hitting the GET _nodes/settings API anywhere? It's a little late here so my tired eyes might just be missing it. If not, why not? Thanks!

Great question! The way I understand it is that only for the cold phase are we able to tell the user about the ability to provision cold tier nodes in the cloud UI. The way we know they don't have cold tier nodes is by checking for any nodes with the data role data_cold. This way we know that they have or have not provisioned cold tier nodes in Cloud using the slider. This also means we can use the data on node roles that we are already pulling in.

I'm not sure what the added check at GET _nodes/settings would give us beyond an extra confirmation that nodes have been set up using legacy elasticsearch.yml roles.

Might be missing something here though?

@jloleysens jloleysens requested a review from esdocs October 7, 2020 09:23
@jloleysens
Copy link
Contributor Author

@jasontedor Could you confirm the above?

@cjcenizal cjcenizal requested a review from bleskes October 7, 2020 21:39
Copy link
Contributor

@cjcenizal cjcenizal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, code LGTM. Thanks for doing this, JL! I'd appreciate @bleskes taking a look at this to confirm it's what he expected.

}),
body: i18n.translate('xpack.indexLifecycleMgmt.editPolicy.cloudDataTierCallout.body', {
defaultMessage:
'Data in this phase will be allocated to hot and warm tier nodes. Go to the Cloud UI to create a cold tier.',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need some writer's eyes on this copy. For example, I'd expect to refer to "Elastic Cloud" instead of "Cloud UI". Can we make that a link to https://cloud.elastic.co/home?

@cjcenizal
Copy link
Contributor

I opened #79946 to discuss Cloud-appropriate prompts for other configuration states.

@jloleysens
Copy link
Contributor Author

@elasticmachine merge upstream

@jloleysens
Copy link
Contributor Author

After further clarification on cloud requirements, this PR will need to be re-reviewed once the first option supporting the cloud case has been added.

@jloleysens jloleysens requested a review from cjcenizal October 8, 2020 16:08
@jloleysens jloleysens changed the title [ILM] Cloud-specific cold tier callout [ILM] Cloud-specific changes for 7.10 Oct 8, 2020
@jloleysens
Copy link
Contributor Author

@elasticmachine merge upstream

- Added server-side test for detecting legacy config
- Added client-side test for asserting on cloud data tier option
  is hidden
defaultMessage: 'Create a cold tier',
}),
body: i18n.translate('xpack.indexLifecycleMgmt.editPolicy.cloudDataTierCallout.body', {
defaultMessage: 'Edit your deployment on Cloud to create cold tier nodes.',
Copy link
Contributor

@jrodewig jrodewig Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the user is creating a cold tier, they're not necessarily creating cold tier nodes.
I think it's more accurate to say they're adding or assigning nodes to a cold tier.

Here are my suggestions in order of preference:

  • Add nodes to the cold tier in Elastic Cloud. Edit your Elastic Cloud deployment to set up a cold tier.
  • Edit your Elastic Cloud deployment to add nodes to the cold tier.
  • Edit your deployment on Elastic Cloud to add cold tier nodes.

Copy link
Contributor

@cjcenizal cjcenizal Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there value in describing the mechanics of "adding nodes"? Would it be easier for the user to understand if we were to describe this in a simplified way like "Edit your deployment on Elastic Cloud to set up the cold tier."?

Copy link
Contributor

@jrodewig jrodewig Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjcenizal +1. I like the idea of abstracting away "adding nodes" if we're able.

Edit your Elastic Cloud deployment to set up a cold tier. works for me. I'll amend me original comment accordingly.

Copy link
Contributor

@jrodewig jrodewig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I added some suggestions, but the current copy is fine as-is.

Copy link
Contributor

@cjcenizal cjcenizal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, JL! Tested locally and verified that the "Default" option for selecting node roles is disabled when a node is configured with the data: true data role and that the cold phase directs the user to Cloud when the nodes are configured with data: false (screenshots below). Code LGTM, just had a few comments around readability and tests.

Node roles are disabled

image

Cold phase directs user to Cloud

image

* This is a slight hack because we only know we should disable the "default" option further
* down the component tree (i.e., after the policy has been deserialized).
*
* We reset the value to "custom" if we deserialized to "default".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to read this comment a few times to understand what the intention was. I interpret this as meaning, "If the initial data tier allocation value is default, we'll override this with custom if this option is disallowed." Is that right?

The "slight hack" part confuses me because I'm not sure what the "right" way to do it would be or why it isn't possible and therefore why we're forced to resort to a hack. Can we elaborate to explain what's hacky about this and why the hack is justified?

// If we detect a single node using legacy "data:true" setting we know we are not using data roles for
// data allocation.
if (!accum.isUsingLegacyDataRoleConfig) {
accum.isUsingLegacyDataRoleConfig = nodeSettings.settings?.node?.data === 'true';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think "deprecated" would be a bit clearer than "legacy". We use "legacy" to refer to index templates v1, and AFAIK they don't have have a deprecation schedule yet. In contrast, data roles have been deprecated and will be removed in 8.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description mentions the presence of data: true as the condition, but the actual condition in the code is the value of isUsingLegacyDataRoleConfig. Can we align the description with the code? Personally I'd expect the code to be something like this (it wouldn't matter to me if it attempts to set accum.isUsingLegacyDataRoleConfig repeatedly):

if (nodeSettings.settings?.node?.data === 'true') {
  accum.isUsingLegacyDataRoleConfig = true;
}

nodesByAttributes: {},
nodesByRoles: {},
// Start with assumption that we are not using legacy config
isUsingLegacyDataRoleConfig: false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update the API integration test to check for the presence of isUsingLegacyDataRoleConfig? While we're there we might as well add a test for nodesByRoles too, since it looks like that's not being tested.

'Qtpmy7aBSIaOZisv9Q92TA',
],
});
expect(result.nodesByAttributes).toMatchInlineSnapshot(`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to reduce some noise by comparing against a JS object, we can use this instead:

    expect(result.nodesByAttributes).toEqual(
      {
        'availability_zone:europe-west4-a': [
          'ZVndRfrfSl-kmEyZgJu0JQ',
          'Tx8Xig60SIuitXhY0srD6Q',
        ],
        'availability_zone:europe-west4-b': [
          'SgaCpsXAQu-oTsP4iLGZWw',
        ],
        'availability_zone:europe-west4-c': [
          't49k7mdeRIiELuOt_MOZ1g',
          'Qtpmy7aBSIaOZisv9Q92TA',
        ],
        'data:hot': [
          'SgaCpsXAQu-oTsP4iLGZWw',
          'ZVndRfrfSl-kmEyZgJu0JQ',
          'Qtpmy7aBSIaOZisv9Q92TA',
        ],
        'data:warm': [
          't49k7mdeRIiELuOt_MOZ1g',
          'Tx8Xig60SIuitXhY0srD6Q',
        ],
        'instance_configuration:gcp.data.highio.1': [
          'ZVndRfrfSl-kmEyZgJu0JQ',
          'Qtpmy7aBSIaOZisv9Q92TA',
        ],
        'instance_configuration:gcp.data.highstorage.1': [
          't49k7mdeRIiELuOt_MOZ1g',
          'Tx8Xig60SIuitXhY0srD6Q',
        ],
        'instance_configuration:gcp.master.1': [
          'SgaCpsXAQu-oTsP4iLGZWw',
        ],
        'logical_availability_zone:tiebreaker': [
          'SgaCpsXAQu-oTsP4iLGZWw',
        ],
        'logical_availability_zone:zone-0': [
          't49k7mdeRIiELuOt_MOZ1g',
          'Qtpmy7aBSIaOZisv9Q92TA',
        ],
        'logical_availability_zone:zone-1': [
          'ZVndRfrfSl-kmEyZgJu0JQ',
          'Tx8Xig60SIuitXhY0srD6Q',
        ],
        'region:unknown-region': [
          't49k7mdeRIiELuOt_MOZ1g',
          'SgaCpsXAQu-oTsP4iLGZWw',
          'ZVndRfrfSl-kmEyZgJu0JQ',
          'Tx8Xig60SIuitXhY0srD6Q',
          'Qtpmy7aBSIaOZisv9Q92TA',
        ],
        'server_name:instance-0000000000.6ee9547c30214d278d2a63c4de98dea5': [
          'Qtpmy7aBSIaOZisv9Q92TA',
        ],
        'server_name:instance-0000000001.6ee9547c30214d278d2a63c4de98dea5': [
          'ZVndRfrfSl-kmEyZgJu0JQ',
        ],
        'server_name:instance-0000000002.6ee9547c30214d278d2a63c4de98dea5': [
          't49k7mdeRIiELuOt_MOZ1g',
        ],
        'server_name:instance-0000000003.6ee9547c30214d278d2a63c4de98dea5': [
          'Tx8Xig60SIuitXhY0srD6Q',
        ],
        'server_name:tiebreaker-0000000004.6ee9547c30214d278d2a63c4de98dea5': [
          'SgaCpsXAQu-oTsP4iLGZWw',
        ],
        'transform.node:false': [
          'SgaCpsXAQu-oTsP4iLGZWw',
        ],
        'transform.node:true': [
          't49k7mdeRIiELuOt_MOZ1g',
          'ZVndRfrfSl-kmEyZgJu0JQ',
          'Tx8Xig60SIuitXhY0srD6Q',
          'Qtpmy7aBSIaOZisv9Q92TA',
        ],
        'xpack.installed:true': [
          't49k7mdeRIiELuOt_MOZ1g',
          'SgaCpsXAQu-oTsP4iLGZWw',
          'ZVndRfrfSl-kmEyZgJu0JQ',
          'Tx8Xig60SIuitXhY0srD6Q',
          'Qtpmy7aBSIaOZisv9Q92TA',
        ],
      }
    );

import { cloudNodeSettingsWithLegacy, cloudNodeSettingsWithoutLegacy } from './fixtures';

describe('convertSettingsIntoLists', () => {
it('does not flag legacy config incorrectly', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I find the double negative a bit harder to parse than a positive. How about "detects modern config"?

return null;
}
const renderNotice = () => {
if (phaseData.dataTierAllocationType === 'default') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not something we need to implement now, but I think we could have made the code a bit easier to read by using absolute values like node_roles and node_attributes as the values for dataTierAllocationType. custom and default are relative to whatever we've determined the default should be in the UI, and the logic would break if we decided to use node_attributes as the default for some crazy reason. Not that we'd do this, just trying to point out the tight coupling here. :)

Using values like node_roles and node_attributes would allow the reader to seamlessly associate these things with references in the code to node roles (nodesByRoles.data_cold and allocationNodeRole) and node attributes (hasNodeAttrs), rather than have to do some mental mapping to build that association.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a switch/case would make this a bit more scannable?

switch (phaseData.dataTierAllocationType) {
  case 'default':
    /* snip */

  case 'custom':
    /* snip */

  default:
    return null;
}

const isCloudEnabled = cloud?.isCloudEnabled ?? false;
if (
isCloudEnabled &&
!isUsingLegacyDataRoleConfig &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might sound silly, but can we extract this into a variable that offers some more meaning in the context of Cloud? For example, my understanding is that Cloud has two states: node roles are enabled or node roles are disabled. Reading a bit further, I think it'd be easier for me to wrap my head around this complex condition by doing the same thing with the other conditions. So maybe something like this would communicate the intention more clearly:

              const isCloudEnabled = cloud?.isCloudEnabled ?? false;
              const isColdPhase = phase === 'cold';
              const areNodeRolesEnabledOnCloud = !isUsingLegacyDataRoleConfig;
              const hasColdNodeRoles = !nodesByRoles.data_cold?.length;
              
              if (
                isCloudEnabled &&
                isColdPhase &&
                areNodeRolesEnabledOnCloud &&
                hasColdNodeRoles
              ) {
                // Tell cloud users they can deploy cold tier nodes.
                return (
                  <>
                    <EuiSpacer size="s" />
                    <CloudDataTierCallout />
                  </>
                );
              }

…a-tier-migration

* 'master' of github.com:elastic/kibana: (34 commits)
  Line Visualization improper scaling can result in gaps elastic#79663 (elastic#80135)
  [Security Solution][Timeline] Fix SelectableTimeline search (elastic#80128)
  move field_mapping util to saved_objects plugin (elastic#79918)
  [ML] Datagrid: Ensure column content with 'boolean' schema is not capitalized (elastic#80041)
  [CI] Correctly resolve repository root for JUnit reports (elastic#80226)
  [Ingest Manager] Fix package upgrade breaking after first rollover before new data has arrived (elastic#79887)
  [Security Solution] Fix positioning of Kibana banners list inside Sec… (elastic#80124)
  add missing await to fix test (elastic#80202)
  Revert test data changed in previous commit. (elastic#79479)
  [Security Solution] [Sourcerer] Jest beef up (elastic#79907)
  Re-enable transaction duration alert story (elastic#80187)
  [npm] remove canvas dep (elastic#80185)
  [DOCS] Redirects ILM docs to Elasticsearch reference (elastic#79798)
  [APM] Catch health status error from ML (elastic#80131)
  Fix layout and remove title for add alert popover. (elastic#77633)
  [Discover] Loading spinner cleanup (elastic#79819)
  [Security Solution] [Resolver] Remove related events api (elastic#79036)
  [Ingest Manager] Remove fields from index pattern during package uninstall (elastic#80082)
  do not refetch license if signature header absents from a response (elastic#79645)
  Only send agent data for non-opentelemetry agents (elastic#79587)
  ...

# Conflicts:
#	x-pack/plugins/index_lifecycle_management/server/routes/api/nodes/register_list_route.ts
@jloleysens jloleysens merged commit a0c649e into elastic:master Oct 13, 2020
@jloleysens jloleysens deleted the ilm/cloud-data-tier-migration branch October 13, 2020 11:41
jloleysens added a commit to jloleysens/kibana that referenced this pull request Oct 13, 2020
* added cloud cold tier specific call out, probably wip

* added jest test

* rephrase copy

* Added logic for showing/hiding data tier allocation option

- Added server-side test for detecting legacy config
- Added client-side test for asserting on cloud data tier option
  is hidden

* added test for not-on-cloud case

* Refactored logic for rendering data allocation notices

Also updated a comment.

* paraphrashing of Adams copy recommendation

* Fix comment

* address pr feedback

* clarify slight hack comment

* complete refactor of new flag for tests

Co-authored-by: Kibana Machine <[email protected]>
jloleysens added a commit to jloleysens/kibana that referenced this pull request Oct 13, 2020
* added cloud cold tier specific call out, probably wip

* added jest test

* rephrase copy

* Added logic for showing/hiding data tier allocation option

- Added server-side test for detecting legacy config
- Added client-side test for asserting on cloud data tier option
  is hidden

* added test for not-on-cloud case

* Refactored logic for rendering data allocation notices

Also updated a comment.

* paraphrashing of Adams copy recommendation

* Fix comment

* address pr feedback

* clarify slight hack comment

* complete refactor of new flag for tests

Co-authored-by: Kibana Machine <[email protected]>
# Conflicts:
#	x-pack/plugins/index_lifecycle_management/server/routes/api/nodes/register_list_route.ts
jloleysens added a commit that referenced this pull request Oct 13, 2020
* added cloud cold tier specific call out, probably wip

* added jest test

* rephrase copy

* Added logic for showing/hiding data tier allocation option

- Added server-side test for detecting legacy config
- Added client-side test for asserting on cloud data tier option
  is hidden

* added test for not-on-cloud case

* Refactored logic for rendering data allocation notices

Also updated a comment.

* paraphrashing of Adams copy recommendation

* Fix comment

* address pr feedback

* clarify slight hack comment

* complete refactor of new flag for tests

Co-authored-by: Kibana Machine <[email protected]>

Co-authored-by: Kibana Machine <[email protected]>
jloleysens added a commit that referenced this pull request Oct 13, 2020
* [ILM] Cloud-specific changes for 7.10 (#79650)

* added cloud cold tier specific call out, probably wip

* added jest test

* rephrase copy

* Added logic for showing/hiding data tier allocation option

- Added server-side test for detecting legacy config
- Added client-side test for asserting on cloud data tier option
  is hidden

* added test for not-on-cloud case

* Refactored logic for rendering data allocation notices

Also updated a comment.

* paraphrashing of Adams copy recommendation

* Fix comment

* address pr feedback

* clarify slight hack comment

* complete refactor of new flag for tests

Co-authored-by: Kibana Machine <[email protected]>
# Conflicts:
#	x-pack/plugins/index_lifecycle_management/server/routes/api/nodes/register_list_route.ts

* replace r with (r)
@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

@kbn/optimizer bundle module count

id before after diff
indexLifecycleManagement 206 208 +2

async chunks size

id before after diff
indexLifecycleManagement 233.1KB 235.2KB +2.1KB

distributable file count

id before after diff
default 48447 48448 +1

page load bundle size

id before after diff
indexLifecycleManagement 89.9KB 90.0KB +26.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:ILM NeededFor:Cloud release_note:skip Skip the PR/issue when compiling release notes Team:Kibana Management Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more v7.10.0 v7.11.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ILM] Support hybrid tier states on cloud
5 participants