
backpack and OCS #6

Open
bengland2 opened this issue May 8, 2020 · 11 comments
@bengland2 (Contributor)
For some OCS tests, I want to run the workload pods on physical machines separate from the OCS (OpenShift Container Storage) cluster. I think other folks might have this problem too, and not just with OCS. At present, backpack is blissfully unaware that there are other nodes involved in the test and will not collect metadata about them. It would be great if there were a way to hand backpack a list of labels that mean "collect metadata about any node that has one of these labels". For example, OCS nodes are all labelled. Doing it this way would make the feature applicable to much more than just OCS.

@dry923 (Member) commented May 8, 2020

@bengland2 I have created PR cloud-bulldozer/benchmark-operator#335, which would allow you to run the metadata daemonset on only specifically labeled nodes. You would have to run it in addition to the normal fio/etc. CR. Something as simple as this would work for nodes labeled foo=bar:

apiVersion: ripsaw.cloudbulldozer.io/v1alpha1
kind: Benchmark
metadata:
  name: backpack
  namespace: my-ripsaw
spec:
  elasticsearch:
    server: es_server
    port: 9200
  metadata:
    collection: true
    targeted: false
    label_name: foo
    label_value: bar
  workload:
    name: metadata
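For context, a node gets such a label via `oc` (or `kubectl`); `<node-name>` below is a placeholder for one of your worker nodes, and `foo=bar` matches the `label_name`/`label_value` pair in the CR above:

```shell
# Label a node so the targeted metadata daemonset will schedule onto it.
# <node-name> is a placeholder; substitute a real node from `oc get nodes`.
oc label node <node-name> foo=bar

# Verify which nodes currently carry the label.
oc get nodes -l foo=bar
```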

dry923 self-assigned this May 8, 2020
@bengland2 (Contributor, Author)

Looks good to me; I just have to try it out. One label is probably enough.

@dry923 (Member) commented May 8, 2020

If needed, I could probably make it loop over a list, but I haven't tried that yet.

@dry923 (Member) commented May 8, 2020

@bengland2 I updated the PR to take a list of labels. Let me know your thoughts.

The cr would now look like:

apiVersion: ripsaw.cloudbulldozer.io/v1alpha1
kind: Benchmark
metadata:
  name: backpack
  namespace: my-ripsaw
spec:
  elasticsearch:
    server: "marquez.perf.lab.eng.rdu2.redhat.com"
    port: 9200
  metadata:
    collection: true
    targeted: false
    label:
      - [ 'my', 'bar' ]
      - [ 'foo', 'bar' ]
  workload:
    name: metadata

The node(s) would have to match all of the labels given, as it's an implied AND.
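The difference between the implied AND here and the OR proposed in the original request can be sketched in a few lines of bash. This is a hypothetical illustration, not the operator's actual matching code; `node_labels` stands in for one node's labels and `selectors` for the CR's label list:

```shell
# Sketch (not the operator's code): AND vs OR matching over a label list.
declare -A node_labels=( [foo]=bar [region]=east )  # labels on one node
selectors=( "foo=bar" "my=bar" )                    # label list from the CR

matches_all=true   # AND: node must carry every label
matches_any=false  # OR: node must carry at least one label
for sel in "${selectors[@]}"; do
  key=${sel%%=*}
  val=${sel#*=}
  if [ "${node_labels[$key]:-}" = "$val" ]; then
    matches_any=true
  else
    matches_all=false
  fi
done
echo "AND: $matches_all  OR: $matches_any"  # prints: AND: false  OR: true
```

Under AND semantics this node is skipped (it lacks `my=bar`); under OR semantics it would still be collected because it carries `foo=bar`, which is what a mixed CNV + OCS node set needs.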

@bengland2 (Contributor, Author)

My original post wasn't clear, but I was proposing an OR, not an AND. For a workload like CNV+OCS, there might be multiple sets of nodes to deal with (i.e. CNV compute nodes + OCS storage nodes).

@bengland2 (Contributor, Author)

I haven't tested the PR with 2 labels yet, but it certainly works with 1 label.

@bengland2 (Contributor, Author)

@dry923 With 2 labels it failed; backpack never runs. Here's the benchmark-operator log (around line 340 there is an error), pretty-printed using:

ocmr logs $(ocmr get pod | awk '/benchmark-operator/{print $1}') -c benchmark-operator \
   | ~/parse-kube-pod-log.py - > ~/benchmark-operator.log

The parse-kube-pod-log.py script is here; I just can't read a raw log from the benchmark operator, sorry.

The CR that I used is here. Again, with just 1 label it works and is usable for me, just not quite as general-purpose. This is low priority, but someday it would be nice to have. I'm using quay.io/benchmark-operator/benchmark-operator:master as of 20 minutes ago.

@dry923 (Member) commented May 15, 2020

@bengland2 Thanks for the heads up. I'll take a look at why it's failing and let you know.

@dry923 (Member) commented May 15, 2020

@bengland2 So my initial thought was that it might be failing on the empty value in the label:

 [ 'cluster.ocs.openshift.io/openshift-storage', '']

However, I did a similar test and it worked fine. I also re-checked it with the fio CR we use for CI, adding in the labels, and it also succeeded. The logs weren't too helpful, unfortunately; is there any other information you can give me? E.g., did backpack launch and then error, did fio error, did the operator throw an error and simply never start anything, etc.?

Thanks!

@jtaleric (Member)

Is this still an issue @bengland2 ?

@bengland2 (Contributor, Author)

The May 9th post is still an issue in my mind; I've just been preoccupied with other issues. The workaround is to use one label for backpack that explicitly identifies which nodes you want to collect on, and that should be good enough for now.
