
elasticsearch_nodes.py mixes old and new nodes #4778

Closed
14 tasks
hannes-ucsc opened this issue Dec 1, 2022 · 4 comments
Labels
- [priority] Medium bug [type] A defect preventing use of the system as specified debt [type] A defect incurring continued engineering cost demo [process] To be demonstrated at the end of the sprint demoed [process] Successfully demonstrated to team infra [subject] Project infrastructure like CI/CD, build and deployment scripts orange [process] Done by the Azul team spike:1 [process] Spike estimate of one point

Comments

@hannes-ucsc
Member

hannes-ucsc commented Dec 1, 2022

The dashboard created by this build resulted in the following set of nodes:

(Screenshot of the dashboard graph, one line per node)

Note that the green line is for an old node that stops producing metric data.

  • Security design review completed; the resolution of this issue does not
    • … affect authentication; for example:
      • OAuth 2.0 with the application (API or Swagger UI)
      • Authentication of developers with Google Cloud APIs
      • Authentication of developers with AWS APIs
      • Authentication with a GitLab instance in the system
      • Password and 2FA authentication with GitHub
      • API access token authentication with GitHub
      • Authentication with
    • … affect the permissions of internal users like access to
      • Cloud resources on AWS and GCP
      • GitLab repositories, projects and groups, administration
      • an EC2 instance via SSH
      • GitHub issues, pull requests, commits, commit statuses, wikis, repositories, organizations
    • … affect the permissions of external users like access to
      • TDR snapshots
    • … affect permissions of service or bot accounts
      • Cloud resources on AWS and GCP
    • … affect audit logging in the system, like
      • adding, removing or changing a log message that represents an auditable event
      • changing the routing of log messages through the system
    • … affect monitoring of the system
    • … introduce a new software dependency like
      • Python packages on PyPI
      • Command-line utilities
      • Docker images
      • Terraform providers
    • … add an interface that exposes sensitive or confidential data at the security boundary
    • … affect the encryption of data at rest
    • … require persistence of sensitive or confidential data that might require encryption at rest
    • … require unencrypted transmission of data within the security boundary
    • … affect the network security layer; for example by
      • modifying, adding or removing firewall rules
      • modifying, adding or removing security groups
      • changing or adding a port a service, proxy or load balancer listens on
  • Documentation on any unchecked boxes is provided in comments below
hannes-ucsc added the orange [process] Done by the Azul team label Dec 1, 2022
dsotirho-ucsc added the spike:1 [process] Spike estimate of one point label Dec 1, 2022
@dsotirho-ucsc
Contributor

Spike to diagnose and devise a solution.

dsotirho-ucsc self-assigned this Dec 1, 2022
@dsotirho-ucsc
Contributor

Currently, elasticsearch_nodes.py gets the list of node IDs from the AWS API, using CloudWatch list-metrics. Although the RecentlyActive='PT3H' option is specified to filter out metrics that have had no data points in the past three hours, a node that was deleted less than three hours earlier can still show up in the results.
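To illustrate the problem, here is a minimal sketch of a CloudWatch-based lookup like the one described above. It is not the actual code in elasticsearch_nodes.py; it assumes a boto3 CloudWatch client, the `AWS/ES` namespace, a per-node metric such as `CPUUtilization`, and the `DomainName`/`NodeId` dimensions, and the function name `node_ids_from_cloudwatch` is hypothetical. The `RecentlyActive='PT3H'` filter works per metric, not per cluster membership, so a node removed from the cluster keeps appearing until its metrics have been silent for three hours.

```python
from __future__ import annotations

from typing import Any


def node_ids_from_cloudwatch(cloudwatch: Any,
                             domain_name: str,
                             metric_name: str = 'CPUUtilization') -> set[str]:
    """
    Return the NodeId dimension values of a per-node metric of the given
    Elasticsearch domain, as reported by CloudWatch ListMetrics.

    ``cloudwatch`` is expected to behave like ``boto3.client('cloudwatch')``.
    Hypothetical helper for illustration only.
    """
    paginator = cloudwatch.get_paginator('list_metrics')
    # Assumption: per-node domain metrics are published in the AWS/ES
    # namespace with the dimensions ClientId, DomainName and NodeId.
    pages = paginator.paginate(Namespace='AWS/ES',
                               MetricName=metric_name,
                               Dimensions=[{'Name': 'DomainName',
                                            'Value': domain_name}],
                               # Only drops metrics without any data point in
                               # the past three hours, so a node deleted less
                               # than three hours ago is still included.
                               RecentlyActive='PT3H')
    node_ids = set()
    for page in pages:
        for metric in page['Metrics']:
            for dimension in metric['Dimensions']:
                if dimension['Name'] == 'NodeId':
                    node_ids.add(dimension['Value'])
    return node_ids
```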

Instead, a better way to get the node IDs would be to query Elasticsearch directly:

```python
>>> from azul.es import ESClientFactory
>>> es = ESClientFactory.get()
>>> nodes = es.nodes.info()
>>> nodes['nodes'].keys()
dict_keys(['wcqZxIpfSbq3_D0q8Ocq0w', 'Ihax3CAYSyeSGQgd9VuB4g', 'D6q4FuA1Tq6YcE9lbVOE6Q', 'iGgJ-0k-S72UHhmqRt0Jzg'])
```
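Building on that query, a possible sketch is a pair of hypothetical helpers, `live_node_ids` and `stale_node_ids` (not part of Azul), that treat the node IDs reported by Elasticsearch as the source of truth and flag IDs that another source, such as CloudWatch, still reports for nodes no longer in the cluster. They work on the plain response body of `es.nodes.info()`, so they can be exercised without a live cluster.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def live_node_ids(nodes_info: Mapping[str, Any]) -> set[str]:
    """
    Node IDs of the current cluster members, taken from the response body
    of the Elasticsearch nodes info API (``es.nodes.info()``), whose
    ``nodes`` key maps each node ID to that node's details.
    """
    return set(nodes_info['nodes'].keys())


def stale_node_ids(reported_ids: set[str],
                   nodes_info: Mapping[str, Any]) -> set[str]:
    """
    IDs from some other source, e.g. CloudWatch, that don't belong to any
    node currently in the cluster. Hypothetical helper for illustration.
    """
    return reported_ids - live_node_ids(nodes_info)


if __name__ == '__main__':
    # Hypothetical example data, not taken from a real deployment
    nodes_info = {'nodes': {'a': {}, 'b': {}}}
    print(stale_node_ids({'a', 'b', 'old'}, nodes_info))  # → {'old'}
```

Using the cluster's own view of its membership avoids guessing at a time window, which is what makes the CloudWatch approach unreliable right after nodes are replaced.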

@hannes-ucsc
Member Author

Great idea! This would also fix #4755, which is set as a blockee. One PR for both issues.

@hannes-ucsc hannes-ucsc added bug [type] A defect preventing use of the system as specified debt [type] A defect incurring continued engineering cost infra [subject] Project infrastructure like CI/CD, build and deployment scripts labels Dec 2, 2022
@dsotirho-ucsc dsotirho-ucsc self-assigned this May 3, 2023
@dsotirho-ucsc dsotirho-ucsc added the - [priority] Medium label May 3, 2023
@hannes-ucsc hannes-ucsc added the demo [process] To be demonstrated at the end of the sprint label May 10, 2023
@hannes-ucsc
Member Author

hannes-ucsc commented May 10, 2023

For the demo, destroy the sandbox domain and redeploy the sandbox from develop. Show that the dashboard reflects the complete set of new nodes. Reindex the sandbox.

Projects
None yet
Development

No branches or pull requests

2 participants