
Kind returns same value for all Pods in a namespace #246

Closed
nikimanoledaki opened this issue Sep 26, 2022 · 13 comments
Labels: bug (Something isn't working)

@nikimanoledaki (Collaborator) commented Sep 26, 2022

Describe the bug
All Pods return the same value in a kind cluster.

To Reproduce
On macOS, I created a Vagrant VM with VirtualBox as the provider to run Ubuntu 22.04. I ensured that the VM met all the Kepler requirements (kernel headers). Then I created a kind cluster and bootstrapped Flux on it.

Below are the metrics for the Pods that I wanted to measure. They are the Pods containing the Flux controllers, all in the flux-system namespace.

The issue is that they all report the same value, which shouldn't be the case:

curl -G http://localhost:9090/api/v1/query --data-urlencode "query=pod_curr_energy_in_pkg_millijoule{pod_namespace='flux-system'}" | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "pod_curr_energy_in_pkg_millijoule",
          "command": "helm-contr",
          "container": "kepler-exporter",
          "endpoint": "http",
          "instance": "kind-control-plane",
          "job": "kepler-exporter",
          "namespace": "kepler",
          "pod": "kepler-exporter-7nbzs",
          "pod_name": "helm-controller-9b6bb4f68-k2vp7",
          "pod_namespace": "flux-system",
          "service": "kepler-exporter"
        },
        "value": [
          1664036406.998,
          "1"
        ]
      },
      {
        "metric": {
          "__name__": "pod_curr_energy_in_pkg_millijoule",
          "command": "kustomize-",
          "container": "kepler-exporter",
          "endpoint": "http",
          "instance": "kind-control-plane",
          "job": "kepler-exporter",
          "namespace": "kepler",
          "pod": "kepler-exporter-7nbzs",
          "pod_name": "kustomize-controller-7f4687b878-65q97",
          "pod_namespace": "flux-system",
          "service": "kepler-exporter"
        },
        "value": [
          1664036406.998,
          "1"
        ]
      },
      {
        "metric": {
          "__name__": "pod_curr_energy_in_pkg_millijoule",
          "command": "source-con",
          "container": "kepler-exporter",
          "endpoint": "http",
          "instance": "kind-control-plane",
          "job": "kepler-exporter",
          "namespace": "kepler",
          "pod": "kepler-exporter-7nbzs",
          "pod_name": "source-controller-f8d655bdc-4fw6n",
          "pod_namespace": "flux-system",
          "service": "kepler-exporter"
        },
        "value": [
          1664036406.998,
          "1"
        ]
      },
      {
        "metric": {
          "__name__": "pod_curr_energy_in_pkg_millijoule",
          "command": "tini",
          "container": "kepler-exporter",
          "endpoint": "http",
          "instance": "kind-control-plane",
          "job": "kepler-exporter",
          "namespace": "kepler",
          "pod": "kepler-exporter-7nbzs",
          "pod_name": "notification-controller-7f5dbddc94-rtdhq",
          "pod_namespace": "flux-system",
          "service": "kepler-exporter"
        },
        "value": [
          1664036406.998,
          "1"
        ]
      }
    ]
  }
}
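The duplication is easy to confirm programmatically. Below is a minimal sketch in Python that checks whether every series in a Prometheus instant-query result carries the same sample value; the response above is abbreviated to just the fields needed for the check.

```python
import json

# Abbreviated form of the Prometheus response shown above: only the
# pod_name label and the sample value are kept for each series.
response = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"pod_name": "helm-controller-9b6bb4f68-k2vp7"}, "value": [1664036406.998, "1"]},
      {"metric": {"pod_name": "kustomize-controller-7f4687b878-65q97"}, "value": [1664036406.998, "1"]},
      {"metric": {"pod_name": "source-controller-f8d655bdc-4fw6n"}, "value": [1664036406.998, "1"]},
      {"metric": {"pod_name": "notification-controller-7f5dbddc94-rtdhq"}, "value": [1664036406.998, "1"]}
    ]
  }
}
""")

# Map each pod to its sample value; a healthy result should show
# different readings for pods with different activity levels.
values = {r["metric"]["pod_name"]: r["value"][1] for r in response["data"]["result"]}
distinct = set(values.values())
print(f"{len(values)} pods, {len(distinct)} distinct value(s): {distinct}")
```

Running this against the paste above reports 4 pods but only 1 distinct value, which is the symptom being described.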

Expected behavior
I expected to have more granular data according to the activity of each Pod.


Desktop (please complete the following information):

  • OS: macOS with a Vagrant VirtualBox VM (Ubuntu 22.04)

Additional context
This may very well be an issue caused by running a kind cluster inside a VM!

@nikimanoledaki nikimanoledaki added the bug Something isn't working label Sep 26, 2022
@rootfs (Contributor) commented Sep 26, 2022

@nikimanoledaki thank you for reporting this issue! Are you running on a Mac with an ARM or an x86 CPU?

@rootfs (Contributor) commented Sep 26, 2022

Would you please share the head of the Kepler pod log?

kubectl logs -n kepler daemonset/kepler-exporter |head -100

@jichenjc (Collaborator) commented:

I am guessing you are seeing a similar issue to mine, #211 (comment); it looks like kind on a VM might need more work.

@rootfs (Contributor) commented Sep 27, 2022

Not sure how macOS handles kind. On my setup (a KVM guest on RHEL), only one pod is reported:

[root@kind-control-plane /]# curl http://10.96.206.34:9102/metrics |grep pod_total_energy_millijoule 
# HELP pod_total_energy_millijoule pod_ total energy consumption (millijoule)
# TYPE pod_total_energy_millijoule counter
pod_total_energy_millijoule{command="docker-pro",pod_name="system_processes",pod_namespace="system"} 964
pod_total_energy_millijoule{command="irqbalance",pod_name="kube-scheduler-kind-control-plane",pod_namespace="kube-system"} 4

@jichenjc (Collaborator) commented Sep 27, 2022

On Ubuntu it's the same (kind on a KVM VM), and the command="cinder-sch" label is really weird as well (my guess is that it's the first pod it encountered).

root@kind-control-plane:/# curl localhost:9102/metrics | grep pod_total_energy_millijoule
# HELP pod_total_energy_millijoule pod_ total energy consumption (millijoule)
# TYPE pod_total_energy_millijoule counter
pod_total_energy_millijoule{command="cinder-sch",pod_name="system_processes",pod_namespace="system"} 238

@jichenjc (Collaborator) commented:
I suspect BM => VM => kind is not a valid use case (at least for now), as neither Kubernetes nor OpenShift runs production environments in this model, per #211 (comment), I think.

If we agree it's not a valid use case (for production environments), maybe we need to document this somewhere.

@marceloamaral (Collaborator) commented:
So, just to double-check: you are running inside a VM, right?

@jichenjc (Collaborator) commented Sep 28, 2022

> So, just to double-check: you are running inside a VM, right?

Yes, BM => VM => kind. Not sure whether @nikimanoledaki's setup is the same.

@nikimanoledaki (Collaborator, Author) commented:
Hey all, sorry for the very late reply. Yes, I was running kind inside a VM:

macOS -> VirtualBox VM running Ubuntu 22.04 -> kind

I ensured that the VM had kernel headers.

Maybe it was not valid because macOS was used as the host. However, this was an attempt to find a working dev environment for people who would like to try Kepler on macOS and don't have access to any bare-metal machine.

This could potentially work if the CPU architecture is not auto-discovered but instead overridden with an env var, as introduced in this PR by @rootfs: #278

WDYT? I haven't tested it - does anyone else have macOS and would like to try? I could make time to try it in the next few weeks.

@marceloamaral (Collaborator) commented:

@nikimanoledaki this might be related to issue #388, enabling the model server.

But what is happening is that, since it's a VM, it normally doesn't have power metrics, so Kepler is using power estimation for the node energy consumption. Also, since the VM probably does not expose hardware counters, Kepler cannot calculate the resource usage rate needed to determine power consumption per container. So it is probably dividing the host power consumption evenly across all containers. This is why we are seeing the same power consumption for all containers.

To get better estimates on VMs, you need to use the estimator and/or model server. See #388
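The behavior described above can be sketched as a toy model. This is purely illustrative and not Kepler's actual implementation; the function name and signature are invented for this sketch. When per-container usage counters are unavailable (all readings zero), the only attribution left is an even split of the estimated node power, which produces identical values for every container:

```python
def attribute_node_power(node_power_mj: float, usage_by_container: dict) -> dict:
    """Split estimated node power across containers (illustrative sketch).

    With usage counters available, power is attributed proportionally to
    usage; with no counters (all zeros), fall back to an even split,
    which makes every container report the same value.
    """
    total_usage = sum(usage_by_container.values())
    n = len(usage_by_container)
    if total_usage == 0:  # no hardware counters available: even split
        return {c: node_power_mj / n for c in usage_by_container}
    # counters available: proportional attribution
    return {c: node_power_mj * u / total_usage for c, u in usage_by_container.items()}

# Inside a VM without hardware counters, all usage readings are zero,
# so every container receives the same share of node power:
no_counters = {"helm-controller": 0, "kustomize-controller": 0,
               "source-controller": 0, "notification-controller": 0}
print(attribute_node_power(4.0, no_counters))  # every container gets 1.0
```

With real counters the same function would spread power proportionally, which is why the estimator/model-server path produces per-pod granularity.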

stale bot commented May 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label May 17, 2023
@marceloamaral (Collaborator) commented:

@nikimanoledaki did you manage to collect metrics?

@stale stale bot removed the wontfix This will not be worked on label May 18, 2023
@nikimanoledaki (Collaborator, Author) commented May 18, 2023

I recently tried creating a similar setup on macOS + VirtualBox, but unfortunately VirtualBox is no longer supported on macOS Ventura, the latest version: kubernetes/minikube#15274
Since this environment is not supported at the moment or in the foreseeable future, it should be OK to close this issue.
I have not managed to make Kepler work in a VM on macOS in any other way, so I use a Linux machine instead when I need to run Kepler; I am not blocked either.

@nikimanoledaki nikimanoledaki closed this as not planned Won't fix, can't repro, duplicate, stale May 18, 2023