Changes in Acto for running tests on kubeblocks postgresql operator #331

Open
wants to merge 7 commits into
base: main
28 changes: 27 additions & 1 deletion acto/deploy.py
@@ -45,6 +45,7 @@ class Deploy():

def __init__(self, deploy_config: DeployConfig) -> None:
self._deploy_config = deploy_config
print(deploy_config)

self._operator_yaml: str = None
for step in self._deploy_config.steps:
@@ -81,7 +82,9 @@ def deploy(self,
# Run the steps in the deploy config one by one
for step in self._deploy_config.steps:
if step.apply:
args = ["apply", "--server-side", "-f", step.apply.file,
# args = ["apply", "--server-side", "-f", step.apply.file,
# "--context", context_name]
args = ["apply", "-f", step.apply.file,
"--context", context_name]

# Use the namespace from the argument if the namespace is delegated
@@ -110,6 +113,29 @@ def deploy(self,
elif step.wait:
# Simply wait for the specified duration
time.sleep(step.wait.duration)
elif step.create:
args = ["create", "-f", step.create.file,
"--context", context_name]

# Use the namespace from the argument if the namespace is delegated
# If the namespace from the config is explicitly specified,
# use the specified namespace
# If the namespace from the config is set to None, do not apply
# with namespace
if step.create.namespace == DELEGATED_NAMESPACE:
args += ["-n", namespace]
elif step.create.namespace is not None:
args += ["-n", step.create.namespace]

# Apply the yaml file and then wait for the pod to be ready
p = kubectl_client.kubectl(args)
if p.returncode != 0:
logger.error(
"Failed to deploy operator due to error from kubectl" +
f" (returncode={p.returncode})" +
f" (stdout={p.stdout})" +
f" (stderr={p.stderr})")
return False

# Add acto label to the operator pod
add_acto_label(api_client, namespace)
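The new `create` branch resolves the namespace exactly as the `apply` branch does, so the two could share one helper. The following is a minimal sketch of that idea, not code from this PR: `build_kubectl_args` is a hypothetical name, and the `DELEGATED_NAMESPACE` value here is a placeholder, since the real sentinel is not shown in this diff.

```python
from typing import List, Optional

# Assumption: placeholder for Acto's DELEGATED_NAMESPACE sentinel.
DELEGATED_NAMESPACE = "__DELEGATED__"


def build_kubectl_args(verb: str, file: str, context_name: str,
                       step_namespace: Optional[str],
                       acto_namespace: str) -> List[str]:
    """Build kubectl arguments for an apply or create step (hypothetical helper)."""
    args = [verb, "-f", file, "--context", context_name]
    if step_namespace == DELEGATED_NAMESPACE:
        # Namespace is delegated: use the namespace passed to deploy()
        args += ["-n", acto_namespace]
    elif step_namespace is not None:
        # Namespace explicitly specified in the config
        args += ["-n", step_namespace]
    # step_namespace is None: rely on the namespace inside the file
    return args
```

With such a helper, each branch would reduce to one call with `"apply"` or `"create"` as the verb, followed by the shared return-code check.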
10 changes: 2 additions & 8 deletions acto/k8s_util/lib/k8sutil.h
@@ -5,7 +5,7 @@

#line 1 "cgo-builtin-export-prolog"

#include <stddef.h>
#include <stddef.h> /* for ptrdiff_t below */

#ifndef GO_CGO_EXPORT_PROLOGUE_H
#define GO_CGO_EXPORT_PROLOGUE_H
@@ -40,17 +40,11 @@ typedef long long GoInt64;
typedef unsigned long long GoUint64;
typedef GoInt64 GoInt;
typedef GoUint64 GoUint;
typedef size_t GoUintptr;
typedef __SIZE_TYPE__ GoUintptr;
typedef float GoFloat32;
typedef double GoFloat64;
#ifdef _MSC_VER
#include <complex.h>
typedef _Fcomplex GoComplex64;
typedef _Dcomplex GoComplex128;
#else
typedef float _Complex GoComplex64;
typedef double _Complex GoComplex128;
#endif

/*
static assertion to make sure the file is being used on architecture
24 changes: 24 additions & 0 deletions acto/lib/operator_config.py
@@ -33,6 +33,27 @@ class WaitStep(pydantic.BaseModel, extra="forbid"):
description="Wait for the specified seconds", default=10
)

class CreateStep(pydantic.BaseModel, extra="forbid"):
"""Configuration for each step of kubectl create"""

file: str = pydantic.Field(description="Path to the file for kubectl create")
operator: bool = pydantic.Field(
description="If the file contains the operator deployment",
default=False,
)
operator_container_name: Optional[str] = pydantic.Field(
description="The container name of the operator in the operator pod, "
"required if there are multiple containers in the operator pod",
default=None,
)
namespace: Optional[str] = pydantic.Field(
description="Namespace for applying the file. If not specified, "
+ "use the namespace in the file or Acto namespace. "
+ "If set to null, use the namespace in the file",
default=DELEGATED_NAMESPACE,
)



class DeployStep(pydantic.BaseModel, extra="forbid"):
"""A step of deploying a resource"""
@@ -44,6 +65,9 @@ class DeployStep(pydantic.BaseModel, extra="forbid"):
description="Configuration for each step of waiting for the operator",
default=None,
)
create: CreateStep = pydantic.Field(
description="Configuration for each step of kubectl create", default=None
)

# TODO: Add support for helm and kustomize
# helm: str = pydantic.Field(
134 changes: 134 additions & 0 deletions data/kubeblocks-postgresql-operator/alarms-report.md
@@ -0,0 +1,134 @@
### Alarms analysis


1, 2, 3. `testrun-2024-02-23-20-31/trial-02-0007/0001`, `testrun-2024-02-23-20-31/trial-02-0006/0004`, `testrun-2024-02-23-20-31/trial-02-0008/0001`

Testcase: `{"field": "[\"spec\", \"affinity\", \"nodeLabels\", \"ACTOKEY\"]", "testcase": "string-change"}`

Acto changes the nodeLabels field from "NotPresent" to "ACTOKEY".

`message='statefulset: test-cluster-postgresql replicas [1] ready_replicas [None]\npod: test-cluster-postgresql-0'`

The health oracle expects that there should be 1 ready replica in the system, but there are none, so it raises an alarm.

The following event occurs: `0/4 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/4 nodes are available: 1 Preemption is not helpful for scheduling, 3 No preemption victims found for incoming pod.`

"ACTOKEY" is not a valid node label, so no node satisfies the affinity rule. Instead of rejecting the invalid state, the operator updates the affinity rule with the incorrect value and shuts down the running replica. The alarm is therefore a **misoperation**.


4, 5. `testrun-2024-02-23-20-31/trial-02-0013/0005`, `testrun-2024-02-23-20-31/trial-03-0007/0003`

Testcase: `{"field": "[\"spec\", \"componentSpecs\", 0, \"services\", 0, \"annotations\", \"ACTOKEY\"]", "testcase": "string-deletion"}`

Acto removes the `ACTOKEY` entry under `spec.componentSpecs[0].services[0].annotations`; its previous value was "ACTOKEY".
The consistency oracle raises an alarm because it finds no matching change in the system state.

`message='Found no matching fields for input' input_diff=Diff(prev='ACTOKEY', curr='', path=["spec", "componentSpecs", 0, "services", 0, "annotations", "ACTOKEY"]) system_state_diff=None`

This occurs because the operator's `Reconcile()` function calls `ApplyParameters()`, which calls `DoMerge()` in `reconfigure_pipeline.go` to merge operator configurations.
After several steps, this calls `MergeMap()` on the annotations, which merges the backup annotations with the new ones. As a result, the old annotation value is never deleted, so this is a **true** alarm.
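The effect of an additive merge can be sketched as follows. This is a Python illustration of the behaviour described above, not the operator's Go code; `merge_map` and `replace_map` are hypothetical names, and the sketch assumes `MergeMap()` only adds or overwrites keys.

```python
from typing import Dict


def merge_map(backup: Dict[str, str], desired: Dict[str, str]) -> Dict[str, str]:
    """Additive merge: keys from `desired` are added or overwritten,
    but keys that exist only in `backup` are never removed."""
    merged = dict(backup)
    merged.update(desired)
    return merged


def replace_map(backup: Dict[str, str], desired: Dict[str, str]) -> Dict[str, str]:
    """Replacement: the result contains exactly the desired keys."""
    return dict(desired)


backup = {"ACTOKEY": "ACTOKEY"}   # annotation present before the change
desired: Dict[str, str] = {}      # annotation removed by Acto

after_merge = merge_map(backup, desired)      # the stale key survives
after_replace = replace_map(backup, desired)  # the key is gone
```

Under an additive merge the removed annotation survives, which matches the `system_state_diff=None` reported by the oracle.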

6-10. `testrun-2024-02-23-20-31/trial-00-0019/0001`, `testrun-2024-02-23-20-31/trial-00-0021/0001`, `testrun-2024-02-23-20-31/trial-03-0014/0001`, `testrun-2024-02-23-20-31/trial-04-0019/0001`

In these alarms, Acto adds some variation of:
```
userResourceRefs:
  configMapRefs:
  - asVolumeFrom:
    - ACTOKEY
    configMap: {}
    mountPoint: /jiyyay4e9mdlz06j2mn3n22xn2zcuuqautlbyltm6cyh67ynqwi03nwqmgi-18wpxd-cq7ixsypzbe3b-0blkusvc-dflm59kq0n50awotvkpxkddcs2f5bwjyskqqrm13taiestlhg4rkg1kh2pihr8a1f7yys3fauo4-m4ftdy6bmy6gg3ybr7us448uco7l50z-1m1q54wy2c9avdd-unnfqx12zrge
    name: e
```

A normal KubeBlocks deployment runs fine.
In each of these cases, however, `configMap.name` is missing, so the PostgreSQL pod reports an error such as
`spec.volumes[3].configMap.name: Required value`.

This is a **misconfiguration**, since the generated YAML is invalid.
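A check for this class of input could look like the sketch below. It is only an illustration: `missing_configmap_names` is a hypothetical helper, and it assumes it is handed the dictionary that directly contains `userResourceRefs`.

```python
from typing import Any, Dict, List


def missing_configmap_names(component: Dict[str, Any]) -> List[int]:
    """Return the indices of configMapRefs entries whose configMap has no name."""
    refs = (component.get("userResourceRefs") or {}).get("configMapRefs") or []
    return [
        i for i, ref in enumerate(refs)
        if not (ref.get("configMap") or {}).get("name")
    ]


# Shape of the input Acto generated in these trials (mountPoint shortened)
generated = {
    "userResourceRefs": {
        "configMapRefs": [
            {
                "asVolumeFrom": ["ACTOKEY"],
                "configMap": {},
                "mountPoint": "/mnt",
                "name": "e",
            }
        ]
    }
}
bad_indices = missing_configmap_names(generated)
```

Any non-empty result means the pod would be rejected with the `Required value` error quoted above.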

11. `testrun-2024-02-23-20-31/trial-03-0015/0001`

Similar to 6-10, but here the `spec.volumes[3].name` field contains dots, which are not allowed in a volume name.

**Misconfiguration**
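Kubernetes validates pod volume names as DNS-1123 labels, which is why a dot is rejected. A minimal sketch of that rule follows; `is_dns1123_label` is a hypothetical helper, not part of Acto or KubeBlocks.

```python
import re

# DNS-1123 label: lowercase alphanumerics and '-', starting and ending
# with an alphanumeric character, at most 63 characters long.
DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_dns1123_label(name: str) -> bool:
    """Return True if `name` is a valid DNS-1123 label."""
    return len(name) <= 63 and DNS1123_LABEL.fullmatch(name) is not None
```

For example, `is_dns1123_label("my-volume")` holds, while a name such as `"my.volume"` fails because of the dot.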

12. `testrun-2024-02-23-20-31/trial-07-0007/0001`, `testrun-2024-02-23-20-31/trial-01-0017/0001`, `testrun-2024-02-23-20-31/trial-01-0016/0001`, `testrun-2024-02-23-20-31/trial-01-0018/0001`

Same as 6-10. **Misconfiguration**

13. `testrun-2024-02-23-20-31/trial-07-0009/0003`

Similar to 11. The `configMapsRef.name` value in the YAML must not contain dots. **Misconfiguration**

14 - 22. `testrun-2024-02-23-20-31/trial-02-0021/0001`, `testrun-2024-02-23-20-31/trial-02-0020/0009`, `testrun-2024-02-23-20-31/trial-03-0017/0001`, `testrun-2024-02-23-20-31/trial-05-0002/0001`, `testrun-2024-02-23-20-31/trial-05-0001/0001`, `testrun-2024-02-23-20-31/trial-05-0000/0001`, `testrun-2024-02-23-20-31/trial-05-0000/0001`, `testrun-2024-02-23-20-31/trial-05-0000/0001`, `testrun-2024-02-23-20-31/trial-08-0000/0003`

Similar to 6-10: `secret.secretName` is not specified in the generated YAML, which makes it invalid.
`error: Pod "pg-cluster-postgresql-0" is invalid: spec.volumes[3].secret.secretName: Required value`

**Misconfiguration**

23 - 26. `testrun-2024-02-23-20-31/trial-00-0006/0006`, `testrun-2024-02-23-20-31/trial-05-0014/0001`, `testrun-2024-02-23-20-31/trial-05-0013/0002`, `testrun-2024-02-23-20-31/trial-05-0015/0001`

```
0/4 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/4 nodes are available: 1 Preemption is not helpful for scheduling, 3 No preemption victims found for incoming pod..
```

Acto adds a single node label to the affinity rules in the YAML, and that label does not match the actual label of any node.
All of these cases follow a similar pattern.
**Misoperation**

27 - 30. `testrun-2024-02-23-20-31/trial-06-0004/0001`, `testrun-2024-02-23-20-31/trial-07-0004/0002`, `testrun-2024-02-23-20-31/trial-04-0015/0002`, `testrun-2024-02-23-20-31/trial-04-0016/0003`

```
...Invalid value: "": name part must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName', or 'my.name', or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'), spec.topologySpreadConstraints[0].topologyKey: Required value: can not be empty
```
Invalid YAML is generated: Acto sets `topologyKeys` to `''`, which produces an empty `topologyKey` in the pod spec.

**Misoperation**
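The rule quoted in the error message can be checked before the pod is created. The sketch below reuses the regex from that message; `has_valid_name_part` is a hypothetical helper that validates only the name part of a key and does not check an optional `prefix/`.

```python
import re

# Regex for the name part, copied from the validation message above
NAME_PART = re.compile(r"([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]")


def has_valid_name_part(key: str) -> bool:
    """Check the text after an optional 'prefix/' against the name-part regex."""
    name = key.rsplit("/", 1)[-1]
    return NAME_PART.fullmatch(name) is not None
```

The empty string that Acto generated fails this check, whereas the examples listed in the message (`MyName`, `my.name`, `123-abc`) pass.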


31 - 38. `testrun-2024-02-23-20-31/trial-04-0014/0005`, `testrun-2024-02-23-20-31/trial-08-0006/0002`, `testrun-2024-02-23-20-31/trial-00-0009/0001`, `testrun-2024-02-23-20-31/trial-08-0007/0002`, `testrun-2024-02-23-20-31/trial-03-0013/0001`, `testrun-2024-02-23-20-31/trial-05-0021/0002`, `testrun-2024-02-23-20-31/trial-06-0011/0001`, `testrun-2024-02-23-20-31/trial-05-0022/0002`

```Pod "pg-cluster-postgresql-0" is invalid: [spec.tolerations[0].effect: Invalid value: "INVALID_EFFECT": effect must be 'NoExecute' when `tolerationSeconds` is set, spec.tolerations[0].effect: Unsupported value: "INVALID_EFFECT": supported values: "NoSchedule", "PreferNoSchedule", "NoExecute"```

Acto sets the toleration `effect` to the unsupported value `INVALID_EFFECT`. The errors are similar in each case.
**Misoperation**
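Both conditions in the error message can be expressed as a small pre-check. This is a sketch based only on the message text; `toleration_errors` is a hypothetical helper, and it assumes that an empty `effect` is acceptable when `tolerationSeconds` is unset.

```python
from typing import Any, Dict, List

# Supported values, as listed in the error message above
SUPPORTED_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def toleration_errors(toleration: Dict[str, Any]) -> List[str]:
    """Return the validation problems for a single toleration."""
    errors: List[str] = []
    effect = toleration.get("effect") or ""
    if effect and effect not in SUPPORTED_EFFECTS:
        errors.append(f"unsupported effect: {effect!r}")
    if toleration.get("tolerationSeconds") is not None and effect != "NoExecute":
        errors.append("effect must be 'NoExecute' when tolerationSeconds is set")
    return errors
```

For the generated input, which combines `INVALID_EFFECT` with `tolerationSeconds`, this returns two errors, mirroring the two entries in the pod validation message.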

39 - 40. `testrun-2024-02-23-20-31/trial-04-0005/0001`, `testrun-2024-02-23-20-31/trial-05-0020/0006`

Same as 23-26. **Misoperation**

41 - 44. `testrun-2024-02-23-20-31/trial-07-0001/0001`, `testrun-2024-02-23-20-31/trial-07-0000/0004`, `testrun-2024-02-23-20-31/trial-07-0014/0001`, `testrun-2024-02-23-20-31/trial-02-0009/0001`

```create Pod pg-cluster-postgresql-0 in StatefulSet pg-cluster-postgresql failed error: Pod "pg-cluster-postgresql-0" is invalid: [spec.containers[0].resources.requests[ACTOKEY]: Invalid value: "ACTOKEY": must be a standard resource type or fully qualified, spec.containers[0].resources.requests[ACTOKEY]: Invalid value: "ACTOKEY": must be a standard resource for containers]```

Acto uses `ACTOKEY` as a resource name under `resources.requests`, which is neither a standard resource type nor a fully qualified name. All cases produce similar errors.

**Misoperation**

45 - 47. `testrun-2024-02-23-20-31/trial-06-0006/0004`, `testrun-2024-02-23-20-31/trial-00-0000/0008`, `testrun-2024-02-23-20-31/trial-07-0010/0009`

Invalid YAML configuration, similar to the previous cases.

**Misoperation**

48 - 60. `testrun-2024-02-23-20-31/trial-00-0005/0002`, `testrun-2024-02-23-20-31/trial-00-0004/0002`, `testrun-2024-02-23-20-31/trial-06-0012/0006`, `testrun-2024-02-23-20-31/trial-06-0013/0003`, `testrun-2024-02-23-20-31/trial-05-0006/0006`, `testrun-2024-02-23-20-31/trial-05-0007/0003`, `testrun-2024-02-23-20-31/trial-00-0002/0002`, `testrun-2024-02-23-20-31/trial-00-0001/0002`, `testrun-2024-02-23-20-31/trial-02-0004/0007`, `testrun-2024-02-23-20-31/trial-04-0008/0002`, `testrun-2024-02-23-20-31/trial-04-0007/0002`, `testrun-2024-02-23-20-31/trial-03-0009/0002`, `testrun-2024-02-23-20-31/trial-03-0010/0003`

`No matching fields found for input`

Kubernetes ignores any modifications to the `status` field, so they cause no change in the system state. This is therefore a **false** alarm.
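One way to keep such inputs from raising alarms is to skip input diffs whose path starts with `status` before the consistency check runs. The sketch below illustrates the idea; `touches_status` and `drop_status_diffs` are hypothetical names and are not part of Acto.

```python
from typing import List, Sequence, Union

PathElem = Union[str, int]


def touches_status(path: Sequence[PathElem]) -> bool:
    """True if the changed field lives under the top-level `status` field."""
    return len(path) > 0 and path[0] == "status"


def drop_status_diffs(paths: List[Sequence[PathElem]]) -> List[Sequence[PathElem]]:
    """Keep only the diffs that can be expected to change the system state."""
    return [p for p in paths if not touches_status(p)]


input_paths = [
    ["status", "phase"],
    ["spec", "componentSpecs", 0, "replicas"],
]
checked_paths = drop_status_diffs(input_paths)
```

Only the `spec` path remains in `checked_paths`, so the oracle would no longer look for a system change that Kubernetes never applies.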

61 - 73. `testrun-2024-02-23-20-31/trial-00-0007/0003`, `testrun-2024-02-23-20-31/trial-01-0003/0003`, `testrun-2024-02-23-20-31/trial-08-0001/0007`, `testrun-2024-02-23-20-31/trial-08-0002/0002`, `testrun-2024-02-23-20-31/trial-02-0003/0002`, `testrun-2024-02-23-20-31/trial-02-0002/0004`, `testrun-2024-02-23-20-31/trial-04-0010/0004`, `testrun-2024-02-23-20-31/trial-08-0004/0008`, `testrun-2024-02-23-20-31/trial-08-0005/0002`, `testrun-2024-02-23-20-31/trial-04-0024/0002`, `testrun-2024-02-23-20-31/trial-04-0025/0003`

Same as 48-60. **False** alarm.

74 - 95. `testrun-2024-02-23-20-31/trial-08-0016/0001`, `testrun-2024-02-23-20-31/trial-07-0018/0001`, `testrun-2024-02-23-20-31/trial-01-0000/0002`, `testrun-2024-02-23-20-31/trial-03-0022/0002`, `testrun-2024-02-23-20-31/trial-03-0021/0002`, `testrun-2024-02-23-20-31/trial-03-0019/0002`, `testrun-2024-02-23-20-31/trial-03-0020/0002`, `testrun-2024-02-23-20-31/trial-03-0018/0002`, `testrun-2024-02-23-20-31/trial-08-0019/0002`, `testrun-2024-02-23-20-31/trial-08-0020/0002`, `testrun-2024-02-23-20-31/trial-08-0009/0005`, `testrun-2024-02-23-20-31/trial-08-0010/0003`, `testrun-2024-02-23-20-31/trial-02-0017/0002`, `testrun-2024-02-23-20-31/trial-02-0018/0003`, `testrun-2024-02-23-20-31/trial-01-0014/0003`, `testrun-2024-02-23-20-31/trial-01-0015/0003`, `testrun-2024-02-23-20-31/trial-05-0011/0003`, `testrun-2024-02-23-20-31/trial-05-0012/0003`, `testrun-2024-02-23-20-31/trial-00-0015/0005`, `testrun-2024-02-23-20-31/trial-00-0016/0001`, `testrun-2024-02-23-20-31/trial-01-0013/0003`

Same as 48-60. **False** alarm.

96. `testrun-2024-02-23-20-31/trial-00-0000/0008`

The number of replicas is set to 1000, which is not allowed. The operator does not validate this value, so the change leads to an error.

**Misoperation**
