Get snapshot API returns duplicate information for .fleet-actions-results system data stream #111146

Open
romain-chanu opened this issue Jul 22, 2024 · 5 comments
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

romain-chanu commented Jul 22, 2024

Elasticsearch Version

8.14.3

Installed Plugins

No response

Java Version

bundled

OS Version

N.A

Problem Description

Get snapshot API returns duplicate information for .fleet-actions-results system data stream.

See the JSON response below:

{
  "total": 1,
  "remaining": 0,
  "snapshots": [
    {
      "include_global_state": true,
      "uuid": "PSIEWRmOR1m5EzTiCZJANg",
      "repository": "found-snapshots",
      "duration_in_millis": 8821,
      "start_time": "2024-07-22T02:29:59.815Z",
      "shards": {
        "successful": 71,
        "failed": 0,
        "total": 71
      },
      "version_id": 8505000,
      "end_time_in_millis": 1721615408636,
      "state": "SUCCESS",
      "version": "8.14.0-8.14.2",
      "snapshot": "cloud-snapshot-2024.07.22-wpnxv4hmqzqb3vewczcjvq",
      "end_time": "2024-07-22T02:30:08.636Z",
      "feature_states": [
        {
          "indices": [
            ".security-tokens-7",
            ".security-7",
            ".security-profile-8"
          ],
          "feature_name": "security"
        },
        {
          "indices": [
            ".kibana_8.14.3_001",
            ".kibana_security_solution_8.14.3_001",
            ".apm-custom-link",
            ".kibana_ingest_8.14.3_001",
            ".apm-agent-configuration",
            ".kibana_analytics_8.14.3_001",
            ".kibana_security_session_1",
            ".kibana_alerting_cases_8.14.3_001",
            ".kibana_task_manager_8.14.3_001"
          ],
          "feature_name": "kibana"
        },
        {
          "indices": [
            ".geoip_databases"
          ],
          "feature_name": "geoip"
        },
        {
          "indices": [
            ".transform-internal-007"
          ],
          "feature_name": "transform"
        },
        {
          "indices": [
            ".fleet-agents-7",
            ".fleet-enrollment-api-keys-7",
            ".fleet-actions-7",
            ".fleet-policies-7",
            ".fleet-servers-7",
            ".fleet-policies-leader-7"
          ],
          "feature_name": "fleet"
        }
      ],
      "indices": [
        ".ds-metrics-system.memory-default-2024.07.21-000001",
        ".apm-agent-configuration",
        ".ds-metrics-system.socket_summary-default-2024.07.21-000001",
        ".ds-metrics-elastic_agent.metricbeat-default-2024.07.21-000001",
        ".ds-logs-osquery_manager.result-default-2024.07.21-000001",
        ".kibana_task_manager_8.14.3_001",
        ".fleet-servers-7",
        ".kibana_ingest_8.14.3_001",
        ".ds-ilm-history-7-2024.07.21-000001",
        ".internal.alerts-ml.anomaly-detection-health.alerts-default-000001",
        ".slo-observability.summary-v3.2",
        ".ds-metrics-elastic_agent.elastic_agent-default-2024.07.21-000001",
        ".internal.alerts-observability.metrics.alerts-default-000001",
        ".internal.alerts-ml.anomaly-detection.alerts-default-000001",
        ".internal.alerts-security.alerts-default-000001",
        ".apm-source-map",
        ".logs-osquery_manager.action.responses-default",
        ".kibana_security_solution_8.14.3_001",
        ".ds-.kibana-event-log-ds-2024.07.21-000001",
        ".ds-metrics-fleet_server.agent_status-default-2024.07.21-000001",
        ".fleet-actions-7",
        ".fleet-enrollment-api-keys-7",
        ".internal.alerts-observability.apm.alerts-default-000001",
        ".ds-metrics-fleet_server.agent_versions-default-2024.07.21-000001",
        ".ds-logs-elastic_agent.metricbeat-default-2024.07.21-000001",
        ".kibana-observability-ai-assistant-conversations-000001",
        ".internal.alerts-observability.threshold.alerts-default-000001",
        ".kibana_alerting_cases_8.14.3_001",
        ".fleet-policies-leader-7",
        ".ds-metrics-system.network-default-2024.07.21-000001",
        ".ds-logs-system.syslog-default-2024.07.21-000001",
        ".ds-.slm-history-7-2024.07.21-000001",
        ".logs-osquery_manager.actions-default",
        ".geoip_databases",
        ".ds-metrics-system.load-default-2024.07.21-000001",
        ".ds-.logs-deprecation.elasticsearch-default-2024.07.21-000001",
        ".internal.alerts-observability.slo.alerts-default-000001",
        ".ds-metrics-system.cpu-default-2024.07.21-000001",
        ".internal.alerts-observability.uptime.alerts-default-000001",
        ".fleet-agents-7",
        ".security-7",
        ".ds-metrics-elastic_agent.osquerybeat-default-2024.07.21-000001",
        ".transform-notifications-000002",
        ".kibana_security_session_1",
        ".internal.alerts-stack.alerts-default-000001",
        ".internal.alerts-default.alerts-default-000001",
        ".ds-metrics-elastic_agent.filebeat-default-2024.07.21-000001",
        ".slo-observability.summary-v3.2.temp",
        ".ds-metrics-system.uptime-default-2024.07.21-000001",
        ".security-tokens-7",
        ".ds-metrics-elastic_agent.filebeat_input-default-2024.07.21-000001",
        ".slo-observability.sli-v3.2",
        ".ds-logs-elastic_agent.filebeat-default-2024.07.21-000001",
        ".fleet-policies-7",
        ".apm-custom-link",
        ".kibana_analytics_8.14.3_001",
        ".internal.alerts-transform.health.alerts-default-000001",
        ".ds-logs-elastic_agent.osquerybeat-default-2024.07.21-000001",
        ".ds-logs-system.auth-default-2024.07.21-000001",
        ".ds-metrics-system.process-default-2024.07.21-000001",
        ".ds-metrics-system.process.summary-default-2024.07.21-000001",
        ".ds-metrics-system.fsstat-default-2024.07.21-000001",
        ".security-profile-8",
        ".ds-logs-elastic_agent-default-2024.07.21-000001",
        ".internal.alerts-observability.logs.alerts-default-000001",
        ".ds-metrics-system.diskio-default-2024.07.21-000001",
        ".ds-metrics-system.filesystem-default-2024.07.21-000001",
        ".transform-internal-007",
        ".ds-.fleet-actions-results-2024.07.21-000001",
        ".kibana-observability-ai-assistant-kb-000001",
        ".kibana_8.14.3_001"
      ],
      "failures": [],
      "data_streams": [
        ".fleet-actions-results", <----- FIRST ENTRY
        ".logs-deprecation.elasticsearch-default",
        "ilm-history-7",
        "logs-elastic_agent.osquerybeat-default",
        "metrics-system.process-default",
        "metrics-elastic_agent.filebeat-default",
        "metrics-elastic_agent.metricbeat-default",
        "metrics-elastic_agent.filebeat_input-default",
        "metrics-system.process.summary-default",
        "metrics-system.network-default",
        "logs-system.auth-default",
        "metrics-system.load-default",
        "logs-elastic_agent.metricbeat-default",
        "logs-osquery_manager.result-default",
        "metrics-system.fsstat-default",
        "logs-system.syslog-default",
        "metrics-system.cpu-default",
        "logs-elastic_agent.filebeat-default",
        "metrics-system.memory-default",
        "metrics-elastic_agent.elastic_agent-default",
        "logs-elastic_agent-default",
        "metrics-elastic_agent.osquerybeat-default",
        ".kibana-event-log-ds",
        ".slm-history-7",
        "metrics-system.diskio-default",
        "metrics-system.filesystem-default",
        "metrics-fleet_server.agent_status-default",
        "metrics-system.uptime-default",
        "metrics-fleet_server.agent_versions-default",
        "metrics-system.socket_summary-default",
        ".fleet-actions-results" <----- SECOND ENTRY
      ],
      "start_time_in_millis": 1721615399815,
      "metadata": {
        "policy": "cloud-snapshot-policy"
      }
    }
  ]
}

Could be related to #89261 and #71667

Steps to Reproduce

  • Create a version 8.14.3 cluster in ESS and deploy an Elastic Agent with the Osquery Manager integration
  • Run a new live Osquery
  • Observe that the .fleet-actions-results system data stream is created with the respective backing indices
  • Take a snapshot and run the Get snapshot API on the same snapshot
  • Observe the duplicate .fleet-actions-results system data stream in Get snapshot API output

Logs (if relevant)

No response

@romain-chanu romain-chanu added >bug needs:triage Requires assignment of a team area label labels Jul 22, 2024
@DaveCTurner DaveCTurner added :Data Management/Data streams Data streams and their lifecycles and removed needs:triage Requires assignment of a team area label labels Jul 22, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Jul 22, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@PeteGillinElastic PeteGillinElastic self-assigned this Oct 18, 2024
@PeteGillinElastic
Member

It doesn't look like the duplicates are due to a bug in the get snapshot API. It looks like the duplicate info is persisted in the repository.

I started ES locally with

gradlew run -Dtests.es.path.repo=/Users/petegillin/my-snapshot-repository -Dtests.es.xpack.security.enabled=false

I triggered the creation of that system data stream with

curl  -H "Content-Type: application/json; charset=UTF-8" -H "X-elastic-product-origin: fleet" -XPUT "http://localhost:9200/_data_stream/.fleet-actions-results"

I created a local FS snapshot repository with

curl  -H "Content-Type: application/json; charset=UTF-8" -XPUT "http://localhost:9200/_snapshot/my_fs_backup?pretty=true" -d'{
  "type": "fs",
  "settings": {
    "location": "/Users/petegillin/my-snapshot-repository"
  }
}
'

I triggered a snapshot with

curl  -H "Content-Type: application/json; charset=UTF-8" -XPUT "http://localhost:9200/_snapshot/my_fs_backup/my_snapshot?wait_for_completion=true&pretty=true"

N.B. You can already see the duplicate data stream names in the response:

    "data_streams" : [
      ".fleet-actions-results",
      "ilm-history-7",
      ".fleet-actions-results"
    ],

I called the get snapshot API with

curl  -H "Content-Type: application/json; charset=UTF-8" -XGET "http://localhost:9200/_snapshot/_all/_all?pretty=true"

You can see the duplicate data stream names in the response, as above.

Now, if we debug this, we see that at this call stack:

at org.elasticsearch.snapshots.SnapshotInfo.fromXContentInternal(SnapshotInfo.java:767)
at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.deserialize(ChecksumBlobStoreFormat.java:183)
at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.read(ChecksumBlobStoreFormat.java:129)
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.lambda$getOneSnapshotInfo$27(BlobStoreRepository.java:2005)

we are doing

dataStreams = XContentParserUtils.parseList(parser, XContentParser::text);

and the value of dataStreams that we are deserializing from the blob has the duplicate:

[.fleet-actions-results, ilm-history-7, .fleet-actions-results]
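To check a deserialized data_streams list for repeated names, something like the following could be used. This is only a minimal sketch: the class DuplicateNames and the method findDuplicates are hypothetical names made up for illustration, and none of it is Elasticsearch code.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DuplicateNames {
    // Hypothetical helper (not part of Elasticsearch): returns every name that
    // occurs more than once in the list, in order of first repeated occurrence.
    static Set<String> findDuplicates(List<String> names) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String name : names) {
            // Set.add returns false when the element was already present.
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // The list deserialized from the blob in the local reproduction above.
        List<String> dataStreams = List.of(".fleet-actions-results", "ilm-history-7", ".fleet-actions-results");
        System.out.println(findDuplicates(dataStreams)); // prints [.fleet-actions-results]
    }
}
```

An empty result would mean the list has no repeated data stream names.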

@PeteGillinElastic
Member

Since it looks like this is down to what's stored in the repository rather than a hallucination of the GET API, I think that @elastic/es-distributed should take a look.

I'm going to change the labels accordingly — please change back if you disagree.

@PeteGillinElastic PeteGillinElastic added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Data Management/Data streams Data streams and their lifecycles labels Oct 23, 2024
@elasticsearchmachine elasticsearchmachine added Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. and removed Team:Data Management Meta label for data/management team labels Oct 23, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@PeteGillinElastic PeteGillinElastic removed their assignment Oct 23, 2024
@ywangd
Member

ywangd commented Oct 24, 2024

Yeah, I can confirm this is a bug in snapshot creation. In the following code, we concatenate the resolved data stream names with the system data stream names without deduplicating.

CollectionUtils.concatLists(
indexNameExpressionResolver.dataStreamNames(currentState, request.indicesOptions(), request.indices()),
systemDataStreamNames

Fortunately, this bug should not have significant consequences other than the duplicated output in the API response.
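One possible way to avoid the duplicate is to merge the two lists through an insertion-ordered set, so that a name present in both appears only once. The sketch below illustrates that idea in isolation. The class DataStreamNames and the method concatDistinct are hypothetical, and this is not necessarily how the fix would be made in Elasticsearch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DataStreamNames {
    // Hypothetical helper (not part of Elasticsearch): concatenates two name
    // lists, keeping only the first occurrence of each name and preserving
    // the order in which names are first encountered.
    static List<String> concatDistinct(List<String> resolved, List<String> system) {
        Set<String> merged = new LinkedHashSet<>(resolved);
        merged.addAll(system);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Assumed inputs, modelled on the local reproduction in this thread:
        // the system data stream is both resolved from the request and listed
        // again among the system data stream names.
        List<String> resolved = List.of(".fleet-actions-results", "ilm-history-7");
        List<String> system = List.of(".fleet-actions-results");
        System.out.println(concatDistinct(resolved, system)); // prints [.fleet-actions-results, ilm-history-7]
    }
}
```

A LinkedHashSet is used here rather than a HashSet so that the order of names in the output stays stable, which keeps the API response deterministic.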
