Investigate instance id collected from add_cloud_metadata processor #19758

Closed
3 tasks done
kaiyan-sheng opened this issue Jul 8, 2020 · 9 comments
Labels: meta, Team:Platforms

Comments

@kaiyan-sheng
Contributor

kaiyan-sheng commented Jul 8, 2020

In Metricbeat, we can use the add_cloud_metadata processor to enrich each event with instance metadata from the machine’s hosting provider. We also collect this metadata when running a specific public cloud provider's Metricbeat module on its own, without the add_cloud_metadata processor.

For example, the add_cloud_metadata processor should collect cloud.instance.id when we run Metricbeat on an AWS EC2 instance. The Metricbeat aws ec2 metricset also collects cloud.instance.id as part of the event.

Cloud metadata like cloud.instance.id will be the key field that connects events sent from inside the host with events collected from outside the host. This issue tracks the investigation into whether cloud metadata is collected properly by the add_cloud_metadata processor across different hosts/public cloud providers.

  • Running aws ec2 metricset vs running system module with add_cloud_metadata processor on EC2 instance
  • Running googlecloud compute metricset vs running system module with add_cloud_metadata processor on GCP VM
  • Running azure compute_vm metricset vs running system module with add_cloud_metadata processor on Azure VM
| Public Cloud Type | Assign to |
| --- | --- |
| AWS | @kaiyan-sheng |
| GCP | @kaiyan-sheng |
| Azure | @narph |

cc @exekias @sorantis

kaiyan-sheng added the Team:Platforms label Jul 8, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

@exekias
Contributor

exekias commented Jul 29, 2020

One of the main things we need to figure out from this research is how we are going to correlate data coming from Filebeat/Metricbeat inside a cloud VM with the data we get from outside (AWS, Azure, GCP modules). The answer should make it into the inventory definition.

Some possible options:

  • explore whether cloud.instance.id should take precedence over host.name when present
  • explore whether we can come up with the same host.name from both inside and outside the machine
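
As a rough illustration of the first option, here is a minimal sketch of a correlation key that prefers cloud.instance.id and falls back to host.name. The helper names `get_path` and `correlation_key` are hypothetical and not part of Beats or the inventory definition; events are assumed to be nested dicts shaped like Beats output.

```python
from typing import Any, Optional, Tuple


def get_path(event: dict, dotted: str) -> Optional[Any]:
    """Walk a nested event dict along a dotted field path; None if a key is missing."""
    node: Any = event
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def correlation_key(event: dict) -> Optional[Tuple[str, str]]:
    """Return (field, value) identifying the machine, preferring cloud.instance.id."""
    # Assumed precedence order: cloud.instance.id first, host.name as fallback.
    for field in ("cloud.instance.id", "host.name"):
        value = get_path(event, field)
        if value:
            return field, value
    return None
```

With this ordering, an event that carries both fields is keyed on cloud.instance.id, and an event from a non-cloud host still gets a key from host.name.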

@kaiyan-sheng
Contributor Author

kaiyan-sheng commented Aug 3, 2020

With #20171, host.name from the ec2 metricset will be the same as cloud.instance.name, and host.id will be the same as cloud.instance.id.
When running the system module inside an EC2 instance, host.name is the private DNS name of the instance.

For the AWS case, cloud.instance.id is a better way to correlate data from inside and outside the EC2 instance. We can't rely on host.name because a name is optional for an EC2 instance, so the ec2 metricset will report neither cloud.instance.name nor host.name for some instances.

Running the system module inside an EC2 instance with the add_host_metadata and add_cloud_metadata processors enabled gives us events with host and cloud related fields:

  "host": {
    "id": "ec2eb2a972ed3b680c44e709588d1e20",
    "containerized": false,
    "name": "ip-172-31-19-109.ec2.internal",
    "ip": [
      "172.31.19.109",
      "fe80::831:6ff:fede:4a7f"
    ],
    "mac": [
      "0a:31:06:de:4a:7f"
    ],
    "hostname": "ip-172-31-19-109.ec2.internal",
    "architecture": "x86_64",
    "os": {
      "family": "redhat",
      "name": "Amazon Linux",
      "kernel": "4.14.173-137.229.amzn2.x86_64",
      "codename": "Karoo",
      "platform": "amzn",
      "version": "2"
    }
  },
  "cloud": {
    "image": {
      "id": "ami-0323c3dd2da7fb37d"
    },
    "provider": "aws",
    "instance": {
      "id": "i-0f40ae7ad61b1da0a"
    },
    "machine": {
      "type": "t2.xlarge"
    },
    "region": "us-east-1",
    "availability_zone": "us-east-1c",
    "account": {
      "id": "428152502467"
    }
  }
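
To make the AWS correlation concrete, here is a small sketch that groups events from both sources by cloud.instance.id. The function names are hypothetical. The system event reuses values from the sample above, while the ec2 event is illustrative and only follows the host.id == cloud.instance.id rule described for #20171.

```python
from collections import defaultdict


def instance_id(event):
    """Return cloud.instance.id from a nested event dict, or None when absent."""
    return event.get("cloud", {}).get("instance", {}).get("id")


def group_by_instance(events):
    """Group events from any source by cloud.instance.id; events without one are skipped."""
    groups = defaultdict(list)
    for event in events:
        key = instance_id(event)
        if key is not None:
            groups[key].append(event)
    return dict(groups)


# Event collected inside the instance (system module + add_cloud_metadata).
system_event = {
    "host": {"id": "ec2eb2a972ed3b680c44e709588d1e20", "name": "ip-172-31-19-109.ec2.internal"},
    "cloud": {"provider": "aws", "instance": {"id": "i-0f40ae7ad61b1da0a"}},
}
# Illustrative event collected from outside (aws ec2 metricset).
ec2_event = {
    "event": {"dataset": "aws.ec2"},
    "host": {"id": "i-0f40ae7ad61b1da0a"},
    "cloud": {"provider": "aws", "instance": {"id": "i-0f40ae7ad61b1da0a"}},
}

grouped = group_by_instance([system_event, ec2_event])
```

Both events land in the same group even though their host.id values differ, which is why cloud.instance.id is the safer join field here.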

@narph
Contributor

narph commented Aug 17, 2020

some info on the Azure side:

  • as mentioned before, we are using resource.id to map cloud.instance.id, but for some resource types, such as VMs, we see a specific ID that can be considered the machine ID.

For example, when running the system module inside the VM we get the following data:

{
  "@timestamp": "2020-08-11T11:27:03.854Z",
  "@metadata": {
    "beat": "metricbeat",
    "type": "_doc",
    "version": "7.8.1"
  },
  "host": {
    "os": {
      "name": "Windows Server 2019 Datacenter",
      "kernel": "10.0.17763.1339 (WinBuild.160101.0800)",
      "build": "17763.1339",
      "platform": "windows",
      "version": "10.0",
      "family": "windows"
    },
    "id": "503623fb-f098-4a7c-92ce-3613e1797e7d",
    "ip": [
      "fe80::e8fe:605d:94c6:b23f",
      "10.0.1.6"
    ],
    "mac": [
      "00:22:48:7f:ba:37"
    ],
    "hostname": "perfmon-test",
    "name": "perfmon-test",
    "architecture": "x86_64"
  },
  "agent": {
    "type": "metricbeat",
    "version": "7.8.1",
    "hostname": "perfmon-test",
    "ephemeral_id": "c2679589-4222-46ca-a22e-5199f1d3330b",
    "id": "16e48b65-7a7c-4f32-b6dd-c6ae72c00517",
    "name": "perfmon-test"
  },
  "cloud": {
    "provider": "az",
    "instance": {
      "name": "perfmon-test",
      "id": "23d5541a-ad41-4bef-b217-1d992c51ae07"
    },
    "machine": {
      "type": "Standard_B1ms"
    },
    "region": "westeurope"
  },
  "event": {
    "dataset": "system.memory",
    "module": "system",
    "duration": 4802400
  },
  "metricset": {
    "name": "memory",
    "period": 10000
  },
  "service": {
    "type": "system"
  },
  "system": {
    "memory": {
      "used": {
        "bytes": 1636282368,
        "pct": 0.7621
      },
      "free": 510730240,
      "actual": {
        "free": 510730240,
        "used": {
          "pct": 0.7621,
          "bytes": 1636282368
        }
      },
      "swap": {
        "total": 2683740160,
        "used": {
          "pct": 0.8203,
          "bytes": 2201350144
        },
        "free": 482390016
      },
      "total": 2147012608
    }
  },
  "ecs": {
    "version": "1.5.0"
  }
}

so cloud.instance.id is mapped to "23d5541a-ad41-4bef-b217-1d992c51ae07".

When running the azure module we get the following output:

{
  "@timestamp": "2020-08-17T13:08:00.000Z",
  "@metadata": {
    "beat": "metricbeat",
    "type": "_doc",
    "version": "8.0.0"
  },
  "agent": {
    "name": "DESKTOP-RFOOE09",
    "type": "metricbeat",
    "version": "8.0.0",
    "ephemeral_id": "0dae7c05-03b2-40d5-852e-2dd92285fbc8",
    "id": "4a1576f3-cd1f-4b8b-90d6-5918f5cb5d71"
  },
  "ecs": {
    "version": "1.5.0"
  },
  "cloud": {
    "instance": {
      "name": "perfmon-test",
      "id": "/subscriptions/.../resourceGroups/obs-test/providers/Microsoft.Compute/virtualMachines/perfmon-test"
    },
    "machine": {
      "type": "Standard_B1ms"
    },
    "provider": "azure",
    "region": "westeurope"
  },
  "event": {
    "module": "azure",
    "duration": 7320562600,
    "dataset": "azure.compute_vm"
  },
  "metricset": {
    "name": "compute_vm",
    "period": 300000
  },
  "azure": {
    "namespace": "Microsoft.Compute/virtualMachines",
    "compute_vm": {
     ...
    },
    "timegrain": "PT5M",
    "resource": {
      "group": "obs-test",
      "type": "Microsoft.Compute/virtualMachines"
    },
    "subscription_id": "..."
  },
  "service": {
    "type": "azure"
  },
  "host": {
    "ip": [
      ...
    ],
    "mac": [
      ...
    ],
    "hostname": "DESKTOP-RFOOE09",
    "name": "DESKTOP-RFOOE09",
    "architecture": "x86_64",
    "os": {
      "version": "10.0",
      "family": "windows",
      "name": "Windows 10 Pro",
      "kernel": "10.0.18362.959 (WinBuild.160101.0800)",
      "build": "18363.959",
      "platform": "windows"
    },
    "id": "1e50b6e1-9710-4164-a8f0-032b3c721dc3"
  }
}

In this case cloud.instance.id is mapped inside the metricset to the actual resource ID.
We could retrieve the VM ID using the GET resources by ID API; we are already calling this API in order to map the machine type.
The response to that call seems to contain vmId=23d5541a-ad41-4bef-b217-1d992c51ae07, which matches the GUID from the system module.
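
A minimal sketch of that lookup, assuming the resource body exposes the GUID at properties.vmId (the exact response shape is an assumption here, and `vm_id_from_resource` is a hypothetical helper, not the module's code). The subscription in the resource ID stays elided as in the output above.

```python
from typing import Optional

VM_TYPE = "Microsoft.Compute/virtualMachines"


def vm_id_from_resource(resource: dict) -> Optional[str]:
    """Return properties.vmId for a virtual machine resource, else None."""
    if resource.get("type", "").lower() != VM_TYPE.lower():
        return None
    return resource.get("properties", {}).get("vmId")


# Hypothetical, heavily trimmed response body for the VM from this thread.
resource = {
    "id": "/subscriptions/.../resourceGroups/obs-test/providers/Microsoft.Compute/virtualMachines/perfmon-test",
    "name": "perfmon-test",
    "type": "Microsoft.Compute/virtualMachines",
    "properties": {"vmId": "23d5541a-ad41-4bef-b217-1d992c51ae07"},
}

# cloud.instance.id reported by the system module inside the VM.
system_module_instance_id = "23d5541a-ad41-4bef-b217-1d992c51ae07"
matches = vm_id_from_resource(resource) == system_module_instance_id
```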

A few issues here:

  • the rest of the metricsets, which don't involve VMs, do not seem to have/expose this type of ID, so what would we map the field to then? We will not have a consistent mapping of this field.
  • I see that cloud.provider is az in the system module, but we set it to azure in the azure module. Why did we shorten azure to az? Looking around, I do not see the az notation being popular or even used anywhere (maybe in the az client). Unlike aws or gcp, azure does not seem to be shortened to az; using azure instead seems to me to be the most reasonable solution.
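
Until the provider value is aligned at the source, a consumer could normalize it when reading events. The sketch below is only an illustration of that idea: `PROVIDER_ALIASES` and `normalize_provider` are hypothetical names, and the alias table contains just the one mismatch observed in this thread.

```python
# Assumption: "az" is the only non-standard provider value seen in this thread.
PROVIDER_ALIASES = {"az": "azure"}


def normalize_provider(event: dict) -> dict:
    """Return a copy of the event with cloud.provider mapped through the alias table."""
    cloud = event.get("cloud")
    if not isinstance(cloud, dict) or "provider" not in cloud:
        return event
    fixed = dict(cloud)  # shallow copy so the input event is not mutated
    fixed["provider"] = PROVIDER_ALIASES.get(cloud["provider"], cloud["provider"])
    return {**event, "cloud": fixed}
```

This is a read-side workaround only; it does not replace fixing the value the processor emits.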

@kaiyan-sheng
Contributor Author

> Few issues here:
>
>   • the rest of the metricsets, which don't involve VMs, do not seem to have/expose this type of ID, so what would we map the field to then? We will not have a consistent mapping of this field.

Thanks @narph! I think we should use vmId=23d5541a-ad41-4bef-b217-1d992c51ae07 as cloud.instance.id in the azure module to match the system module. The rest of the metricsets, which don't involve VMs, will not have the cloud.instance.id field, right?

>   • I see that cloud.provider is az in the system module, but we set it to azure in the azure module. Why did we shorten azure to az? Looking around, I do not see the az notation being popular or even used anywhere (maybe in the az client). Unlike aws or gcp, azure does not seem to be shortened to az; using azure instead seems to me to be the most reasonable solution.

The cloud.provider field in the ECS cloud field set has an example value of azure. We should create a separate PR to fix this. WDYT?

@narph
Contributor

narph commented Aug 19, 2020

> Thanks @narph! I think we should use vmId=23d5541a-ad41-4bef-b217-1d992c51ae07 as cloud.instance.id in the azure module to match the system module. The rest of the metricsets, which don't involve VMs, will not have the cloud.instance.id field, right?

All metricsets use the same mapping function, so the cloud.instance.id field is mapped to resource.id. If we choose to map this field only for VMs, then we have to reintroduce the azure.resource.id field.
Also, should this be the case for cloud.instance.name as well? Map it only for VMs and reintroduce azure.resource.name, as we had in the past?
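
One way to picture the trade-off is a shared mapping function that always emits the azure.resource.* fields and adds cloud.instance.* only for VMs. This is a sketch under assumptions, not the module's actual mapping code: `map_resource_fields` is a hypothetical name, the output uses flat dotted keys for brevity, and the VM GUID is assumed to be available at properties.vmId.

```python
VM_TYPE = "Microsoft.Compute/virtualMachines"


def map_resource_fields(resource: dict) -> dict:
    """Map an Azure resource to event fields; cloud.instance.* is set only for VMs."""
    fields = {
        "azure.resource.id": resource["id"],
        "azure.resource.name": resource["name"],
        "azure.resource.type": resource["type"],
    }
    vm_id = resource.get("properties", {}).get("vmId")
    if resource["type"] == VM_TYPE and vm_id:
        # Only VMs get cloud.instance.*, keyed on the GUID rather than the resource ID.
        fields["cloud.instance.id"] = vm_id
        fields["cloud.instance.name"] = resource["name"]
    return fields
```

Under this shape, non-VM metricsets keep a stable identifier in azure.resource.id, and cloud.instance.id stays comparable with what the system module reports from inside the VM.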

> The cloud.provider field in the ECS cloud field set has an example value of azure. We should create a separate PR to fix this. WDYT?

I have already started work on that one and will link the PR here.

@narph
Contributor

narph commented Aug 24, 2020

Opened issue #20754 for the mapping of cloud.instance.id in the compute_vm metricset; a PR is in progress.

@kaiyan-sheng
Contributor Author

During the investigation, I found that for GCP the add_cloud_metadata processor does not provide cloud.account.id. This will be fixed in a separate PR, #21776.
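
A gap like this can be spotted with a simple field check on sample events. The sketch below is hypothetical: the `EXPECTED_FIELDS` list is an assumption chosen for this investigation, not an official requirement, and the GCP event in the usage note is made up for illustration.

```python
# Assumption: fields this investigation expects from add_cloud_metadata.
EXPECTED_FIELDS = ("cloud.provider", "cloud.instance.id", "cloud.account.id")


def missing_cloud_fields(event: dict, expected=EXPECTED_FIELDS) -> list:
    """Return the dotted field paths from `expected` that are absent in the event."""
    missing = []
    for path in expected:
        node = event
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing
```

For a made-up GCP event such as `{"cloud": {"provider": "gcp", "instance": {"id": "1234567890"}}}`, the function reports cloud.account.id as missing, while the AWS sample earlier in this thread passes all three checks.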

@kaiyan-sheng
Contributor Author

Closing this issue because cloud.instance.id from the add_cloud_metadata processor matches cloud.instance.id from the aws, azure and googlecloud Metricbeat modules.
