# Troubleshooting
- Viya4 Monitoring and Logging
- SAS Viya Orchestration Tool
- SAS Viya Deployment Operator
- EKS - Cluster Autoscaler Installation
- kustomize - Generate deployment manifest
- Ingress-Nginx issue - Unable to access SAS Viya Platform web apps
- Ansible Variables with Special Jinja2 Characters
- Ingress-Nginx - use-forwarded-headers disabled
- Deploying with the SAS Orchestration Tool using a Provider Based Kubernetes Configuration File
- Applying a New License for your SAS Viya Platform Deployment
- Tagging the AWS EC2 Load Balancers
- Deploying with cadence versions > 2024.06 without creating the external PostgreSQL SharedServices database
Debug mode can be enabled by adding "-vvv" to the end of the docker run or ansible-playbook command.
Example:
docker run --rm \
--group-add root \
--user $(id -u):$(id -g) \
--volume $HOME:/data \
--volume $HOME/ansible-vars.yaml:/config/config \
--volume $HOME/viya4-iac-azure/terraform.tfstate:/config/tfstate \
viya4-deployment --tags "baseline,viya,cluster-logging,cluster-monitoring,viya-monitoring,install" -vvv
ansible-playbook \
-e CONFIG=$HOME/ansible-vars.yaml \
-e TFSTATE=$HOME/viya4-iac-aws/terraform.tfstate \
playbooks/playbook.yaml --tags "baseline,viya,cluster-logging,cluster-monitoring,viya-monitoring,install" -vvv
While deploying the SAS Viya platform to a cluster with the "cluster-logging" and "install" Ansible task tags specified, the following error message is encountered.
TASK [monitoring : cluster-logging - deploy] ********************************************************************************
fatal: [localhost]: FAILED! => changed=false
cmd: /home/user/.ansible/viya4-monitoring-kubernetes/logging/bin/deploy_logging.sh
msg: '[Errno 2] No such file or directory: b''/home/user/.ansible/viya4-monitoring-kubernetes/logging/bin/deploy_logging.sh'''
rc: 2
PLAY RECAP ******************************************************************************************************************
localhost : ok=52 changed=12 unreachable=0 failed=1 skipped=41 rescued=0 ignored=0
The cluster-logging task tried to deploy an older, incompatible release of sassoftware/viya4-monitoring-kubernetes (i.e. a release earlier than version 1.2.0) using a release of sassoftware/viya4-deployment at release 5.0.0 or later. Release 5.0.0 (and later) of sassoftware/viya4-deployment is only compatible with sassoftware/viya4-monitoring-kubernetes release 1.2.0 (and later).
When using sassoftware/viya4-deployment releases 5.0.0 or later, specify either the stable branch or a valid sassoftware/viya4-monitoring-kubernetes release tag of 1.2.0 or later as the value of the V4M_VERSION sassoftware/viya4-deployment variable. For more details on supported variables, refer to CONFIG-VARS.md.
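For example, the relevant setting in ansible-vars.yaml might look like the following (a minimal sketch; the release tag shown is illustrative):

```yaml
# ansible-vars.yaml
V4M_VERSION: stable   # or a viya4-monitoring-kubernetes release tag of 1.2.0 or later, e.g. "1.2.0"
```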
## SAS Viya Orchestration Tool

While deploying the SAS Viya platform to a cluster by running the viya4-deployment project directly on your host with the "viya" and "install" Ansible task tags specified (see AnsibleUsage.md), the following error message is encountered when the "vdm - orchestration" task executes:
TASK [vdm : orchestration - log into V4_CFG_CR_HOST] ******************************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))"}
The orchestration task attempted to log into the container registry defined by V4_CFG_CR_URL
using the Python Docker client and failed to do so since it could not communicate with the local Docker Engine API.
As of release 6.0.0, if you are running this project with Ansible directly on your workstation, Docker must be installed and the executing user must be able to access it. This is required so that the project can consume the sas-orchestration tool, which is delivered as a Docker image, to generate the SASDeployment custom resource file.
On your host:
- Ensure that Docker is installed on your machine; the Dependency Versions documentation states that you need at least version 20.10.10.
- If Docker is already installed on your machine, ensure that the daemon is running; see the Docker documentation and the quick checks below.
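A couple of illustrative checks on the Ansible host can confirm both requirements:

```bash
# Confirm the Docker Engine version (should be 20.10.10 or later)
docker version --format '{{.Server.Version}}'

# Confirm the daemon is running and reachable by the executing user
docker info > /dev/null && echo "Docker daemon reachable as $(id -un)"
```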
## SAS Viya Deployment Operator

When the SAS Viya Platform Deployment Operator is not working as expected, three different sources can be used to diagnose problems. Follow the commands from the SAS Viya Platform deployment guide to inspect the SAS Viya Platform Deployment Operator Pod, the SASDeployment custom resource, and the reconcile job. Remediation steps are also provided on that page.
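A few illustrative kubectl commands for gathering that information (the namespaces and resource names below are placeholders, not values defined by this project; refer to the deployment guide for the authoritative commands):

```bash
# SAS Viya Platform Deployment Operator pod and its logs
kubectl -n <operator-namespace> get pods
kubectl -n <operator-namespace> logs deployment/<operator-deployment-name>

# SASDeployment custom resource status
kubectl -n <viya-namespace> get sasdeployments
kubectl -n <viya-namespace> describe sasdeployment <name>

# Reconcile job and its logs
kubectl -n <viya-namespace> get jobs
kubectl -n <viya-namespace> logs job/<reconcile-job-name>
```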
## EKS - Cluster Autoscaler Installation

While baselining your 1.25+ EKS cluster using the viya4-deployment project, the "Deploy cluster-autoscaler" task failed with a timeout:
TASK [baseline : Deploy cluster-autoscaler] ************************************
task path: /viya4-deployment/roles/baseline/tasks/cluster-autoscaler.yaml:15
fatal: [localhost]: FAILED! => changed=false
command: /usr/local/bin/helm --version=9.25.0 --repo=https://kubernetes.github.io/autoscaler upgrade
-i --reset-values --wait -f=/tmp/tmpzoxsdrsu.yml cluster-autoscaler cluster-autoscaler
msg: |-
Failure when executing Helm command. Exited 1.
stdout: Release "cluster-autoscaler" does not exist. Installing it now.
Error: timed out waiting for the condition
stderr: |-
Error: timed out waiting for the condition
stderr_lines: <omitted>
stdout: |-
Release "cluster-autoscaler" does not exist. Installing it now.
stdout_lines: <omitted>
When checking the cluster-autoscaler-aws-cluster-autoscaler-xxx-x pod in your cluster, you see that it is stuck in a CrashLoopBackOff. Checking the pod's logs, you will see the following error (usually near the beginning of the logs) followed by a large stack trace:
$ kubectl get pods -n kube-system --selector app.kubernetes.io/instance=cluster-autoscaler
NAME READY STATUS RESTARTS AGE
cluster-autoscaler-aws-cluster-autoscaler-6c496cc6cc-zftxp 0/1 CrashLoopBackOff 7 (4m42s ago) 15m
$ kubectl logs -n kube-system cluster-autoscaler-aws-cluster-autoscaler-6c496cc6cc-zftxp
... truncated
F0227 16:39:34.624005 1 aws_cloud_provider.go:386] Failed to generate AWS EC2 Instance Types: UnauthorizedOperation: You are not authorized to perform this operation.
status code: 403, request id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
... stacktrace truncated
The "Deploy cluster-autoscaler" task attempted to deploy the 9.25.0 autoscaler helm chart (or newer if you chose to override CLUSTER_AUTOSCALER_CHART_VERSION
) into your cluster, however the autoscaler deployment failed to start up due to the cluster-autoscaler role having insufficient policies configured.
As of release viya4-deployment:6.3.0, when installing the cluster-autoscaler on EKS 1.25+ clusters, Helm chart version 9.25.0 is used for compatibility reasons. Kubernetes 1.25 removed the PodDisruptionBudget policy/v1beta1 API version in favor of policy/v1, and this updated cluster-autoscaler version supports that change. The updated cluster-autoscaler chart requires a modified policy for the cluster-autoscaler role to function properly.
Note: As documented in our CONFIG-VARS.md, EKS 1.24 and lower clusters will still default to version 9.9.2 of the cluster-autoscaler helm chart.
- Note: If you used viya4-iac-aws:5.6.0 or newer to create your infrastructure, these steps are not applicable to you; the role and policy should already be correct.
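For reference, the chart version can be overridden in ansible-vars.yaml via the variable mentioned above (a sketch; see CONFIG-VARS.md for supported values and defaults):

```yaml
# ansible-vars.yaml
CLUSTER_AUTOSCALER_CHART_VERSION: 9.25.0
```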
- Scale the cluster-autoscaler-aws-cluster-autoscaler deployment down to 0: `kubectl scale -n kube-system --replicas=0 deployment/cluster-autoscaler-aws-cluster-autoscaler`
- Use one of the two options below to correct the policy:
  - If you created your 1.25 EKS infrastructure prior to version 5.6.0 of the viya4-iac-aws project, after pulling the latest release you can run the following to update the cluster-autoscaler policy (see Docker & Terraform usage in the viya4-iac-aws documentation for additional usage information): `terraform apply -auto-approve -target=module.autoscaling["0"].aws_iam_policy.worker_autoscaling -var-file ${PATH_TO_TFVARS} -state ${PATH_TO_TFSTATE}`
  - Alternatively, if you have access to the AWS Console, go to the IAM Roles page and update the cluster-autoscaler role yourself:
    - On the Roles page, search for "cluster-autoscaler" and choose the one for your cluster.
    - Under the "Permissions" tab, expand the "eks-worker-autoscaling" policy.
    - Update the eksWorkerAutoscalingAll and eksWorkerAutoscalingOwn Sids so that they match the IAM policy recommended by the kubernetes/autoscaler documentation (switch the repository to the tag of the cluster-autoscaler version you are deploying so that you are viewing the correct documentation). Make sure to leave the Condition block as is.
- Scale the cluster-autoscaler-aws-cluster-autoscaler deployment back to 1 (see the verification sketch below): `kubectl scale -n kube-system --replicas=1 deployment/cluster-autoscaler-aws-cluster-autoscaler`
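After scaling the deployment back up, the pod should leave CrashLoopBackOff; the same commands used earlier to diagnose the issue can confirm this (illustrative):

```bash
kubectl -n kube-system get pods --selector app.kubernetes.io/instance=cluster-autoscaler
kubectl -n kube-system logs deployment/cluster-autoscaler-aws-cluster-autoscaler --tail=50
```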
While deploying the SAS Viya platform to a cluster with the "viya" and "install" Ansible task tags specified, the following error message is encountered when the "vdm : kustomize - Generate deployment manifest" task executes:
TASK [vdm : kustomize - Generate deployment manifest] ************************
fatal: [localhost]: FAILED! => changed=true
cmd:
- kustomize
- build
- <omitted>
- --load_restrictor=none
- -o
- <omitted>
delta: <omitted>
end: <omitted>
msg: non-zero return code
rc: 1
start: <omitted>
stderr: |-
Error: failed to apply json patch '- op: add
path: /spec/volumeClaimTemplates/0/spec/storageClassName
value: sas': add operation does not apply: doc is missing path: "/spec/volumeClaimTemplates/0/spec/storageClassName": missing value
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
The sas-data-agent-server-colocated component was added to the 2022.09 cadence of the SAS Viya Platform. That component contains a StatefulSet object which does not have a "/spec/volumeClaimTemplates/0/spec/storageClassName" path element. For viya4-deployment releases prior to v5.4.0, a PatchTransformer expects to find that path element in each StatefulSet.
As of release viya4-deployment:5.4.0, the StatefulSet PatchTransformer is intentionally skipped for the sas-data-agent-server-colocated component. Using release viya4-deployment:5.4.0 or later for your SAS Viya Platform deployment will eliminate this error.
## Ingress-Nginx issue - Unable to access SAS Viya Platform web apps

After upgrading your AKS cluster's Kubernetes version to 1.24 or later, you are unable to access the SAS Viya Platform web apps. All the pods are running, and errors are seen only in the ingress-nginx logs:
W0320 20:15:25.141987 7 controller.go:1354] Using default certificate
W0320 20:15:25.141997 7 controller.go:1347] Unexpected error validating SSL certificate "deploy/sas-ingress-certificate-5gc77h2dhg" for server "*.deploy.test.example.com": x509: certificate is valid for test-aks.example.com, not *.deploy.test.example.com
W0320 20:15:25.142005 7 controller.go:1348] Validating certificate against DNS names. This will be deprecated in a future version
W0320 20:15:25.142013 7 controller.go:1353] SSL certificate "deploy/sas-ingress-certificate-5gc77h2dhg" does not contain a Common Name or Subject Alternative Name for server "*.deploy.test.example.com": x509: certificate is valid for test-aks.example.com, not *.deploy.test.example.com
This issue is related to the Azure LoadBalancer's health probing. The appProtocol support inside the cloud provider has broken ingress-nginx for AKS clusters >= 1.22. The issue has two causes:
- the new version of the nginx ingress controller added appProtocol, and its probe path has to be /healthz;
- the new version of the cloud-controller-manager added HTTP probing with the default path / for appProtocol=http services.
The "Custom Load Balancer health probe" section in the Azure LoadBalancer documentation states that:
Tcp, Http and Https are three protocols supported by load balancer service. Currently, the default protocol of the health probe varies among services with different transport protocols, app protocols, annotations and external traffic policies.
- for local services, HTTP and /healthz would be used. The health probe will query NodeHealthPort rather than actual backend service
- for cluster TCP services, TCP would be used.
- for cluster UDP services, no health probes.
Since v1.20, the service annotation service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path has been introduced to determine the health probe behavior.
- For clusters <= 1.23, spec.ports.appProtocol would only be used as the probe protocol when service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path is also set.
- For clusters >= 1.24, spec.ports.appProtocol would be used as the probe protocol and / would be used as the default probe request path (service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path could be used to change to a different request path).
To resolve this issue, the ingress-nginx version should be 1.3.0 (or later) with the following annotation configured:
--set controller.service.annotations."service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path"=/healthz
If you upgraded your AKS cluster's Kubernetes version to 1.24 (or later) and used viya4-deployment v6.3.0 (or earlier) for the SAS Viya Platform deployment, you must use viya4-deployment v6.4.0 (or later) and re-run the baseline install task (see the example invocation after the YAML below).
If you prefer to continue using the existing viya4-deployment version, add the following to your ansible-vars.yaml and re-run the baseline install task:
INGRESS_NGINX_CHART_VERSION: 4.3.0
INGRESS_NGINX_CONFIG:
controller:
service:
annotations:
service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
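If you run the project as a Docker container, re-running the baseline install task mirrors the example in the Debug Mode section (a sketch; volume paths and the tfstate location are specific to your environment):

```bash
docker run --rm \
  --group-add root \
  --user $(id -u):$(id -g) \
  --volume $HOME:/data \
  --volume $HOME/ansible-vars.yaml:/config/config \
  --volume $HOME/viya4-iac-azure/terraform.tfstate:/config/tfstate \
  viya4-deployment --tags "baseline,install"
```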
## Ansible Variables with Special Jinja2 Characters

You execute the viya4-deployment project, and an Ansible task that uses a variable you defined in your ansible-vars.yaml fails due to a Jinja2 templating error.
Below is an example task that failed while consuming the V4_CFG_CR_PASSWORD variable from ansible-vars.yaml:
TASK [echo : orchestration tooling - my example task] ******************************************************************
fatal: [127.0.0.1]: FAILED! => {"msg": "An unhandled exception occurred while templating 'A1{%a%}{{b}}{#c#}#d##'. Error was a <class 'ansible.errors.AnsibleError'>, original message: template error while templating string: Encountered unknown tag 'a'.. String: A1{%a%}{{b}}{#c#}#d##"}
The variable that you defined in your ansible-vars.yaml has a string value that contains a special Jinja2 character sequence that Ansible is attempting to template. For a list of special Jinja2 characters, see the Jinja2 documentation.
Ansible provides the !unsafe keyword that you can place in front of your string values to block templating. String values marked with !unsafe are read as-is and do not require you to escape the string yourself. Note that using !unsafe does not introduce a security vulnerability as the name may imply; it is actually the opposite: marking data as unsafe prevents malicious users from abusing Jinja2 templates to execute arbitrary code on target machines.
Example:
# ansible-vars.yaml
V4_CFG_CR_PASSWORD: !unsafe "A1{%a%}{{b}}{#c#}#d##"
For additional information about the !unsafe keyword, see the Ansible Advanced playbook syntax documentation.
## Ingress-Nginx - use-forwarded-headers disabled

In viya4-deployment v6.4.0 and earlier, the default value for use-forwarded-headers was set to true. This raised a security concern and needed to be updated.
The NGINX Ingress Controller documentation describes the use of use-forwarded-headers as follows:
If true, NGINX passes the incoming X-Forwarded-* headers to upstream. Use this option when NGINX is behind another L7 proxy / load balancer that is setting these headers.
If false, NGINX ignores incoming X-Forwarded-* headers, filling them with the request information it sees. Use this option if NGINX is exposed directly to the internet, or it's behind a L3/packet-based load balancer that doesn't alter the source IP in the packets.
Since NGINX is not behind another L7 proxy / load balancer, use-forwarded-headers is set to false by default starting with viya4-deployment v6.5.0. If you wish to enable the incoming X-Forwarded-* headers, add the following to your ansible-vars.yaml file:
INGRESS_NGINX_CONFIG:
controller:
config:
use-forwarded-headers: "true"
## Deploying with the SAS Orchestration Tool using a Provider Based Kubernetes Configuration File

While deploying the SAS Viya platform into a Google Cloud or AWS cluster using a provider-based Kubernetes configuration file, with V4_DEPLOYMENT_OPERATOR_ENABLED: false set in your ansible-vars.yaml, the following error message is encountered:
In Google Cloud:
Error: Cannot create client for namespace 'deploy'
Caused by:
* Get " https://11.111.11.111/api?timeout=32s ": getting credentials: exec: executable gke-gcloud-auth-plugin not found
It looks like you are trying to use a client-go credential plugin that is not installed.
To learn more about this feature, consult the documentation available at:
https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins
Install gke-gcloud-auth-plugin for use with kubectl by following https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
In AWS:
Error: Cannot create client for namespace 'deploy'
Caused by:
* Get "https://12345678123456781234123456785678.abc.us-west-1.eks.amazonaws.com/api?timeout=32s": getting credentials: exec: executable aws not found
It looks like you are trying to use a client-go credential plugin that is not installed.
To learn more about this feature, consult the documentation available at:
https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins
If you are using a provider-based Kubernetes configuration file (one that relies on external binaries from the cloud provider to authenticate to the Kubernetes cluster, as with AWS and Google Cloud), there are deployment constraints to consider when planning your SAS Viya platform deployment with this project. If you are using a "kubernetes service account and cluster role binding" or "static" Kubernetes configuration file, it is compatible with all SAS Viya platform deployment methods and all ways of executing this project, and the statements below do not apply.
As background, the V4_DEPLOYMENT_OPERATOR_ENABLED flag in your ansible-vars.yaml controls the method that this project uses to deploy SAS Viya:
- With V4_DEPLOYMENT_OPERATOR_ENABLED: true, the SAS Viya Platform Deployment Operator is installed into the cluster and used to deploy the SAS Viya platform.
- With V4_DEPLOYMENT_OPERATOR_ENABLED: false, the sas-orchestration command, whose tooling is delivered as a Docker image, is used to deploy the SAS Viya platform.
Alongside the two SAS Viya deployment methods, you also need to consider the two different ways that this project, viya4-deployment, can be run. You can either:
- Clone this project and execute it using the ansible-playbook binary installed on your host, or
- Build a Docker image with the Dockerfile provided in this repository and run it using the docker run command.
The combination of setting V4_DEPLOYMENT_OPERATOR_ENABLED: false and running directly on your host with the ansible-playbook command is the one that is not compatible with a provider-based Kubernetes configuration file.
When the sas-orchestration
tooling is run (as a Docker container) to deploy SAS Viya into the cluster, the required binaries from the cloud provider for authentication are not present, meaning that the tooling will not be able to connect to the cluster to perform the deployment.
When running the viya4-deployment project as a Docker container, the sas-orchestration tooling is run in a slightly different manner to get around this limitation. We make use of skopeo to extract the contents of the sas-orchestration tooling image directly into the running viya4-deployment container. Since our Dockerfile includes the installation of the required authentication binaries for Google Cloud and AWS, the sas-orchestration tooling is able to make use of them and successfully connect to the Kubernetes cluster.
You have a couple of options:
- If you would still like to deploy the SAS Viya platform with the sas-orchestration command and your existing Kubernetes configuration file, it is recommended to build the Docker image for this project with the Dockerfile and run it using the docker run command.
- If you created your infrastructure with the sassoftware/viya4-iac-* projects, you can go back, set create_static_kubeconfig=true, and run terraform apply again to generate a "static" kubeconfig file that is compatible with sas-orchestration.
- Using your existing provider-based Kubernetes configuration and kubectl, you can alternatively create a new ServiceAccount, associate a service-account-token with it, and grant it admin permissions using RBAC. You should then be able to use the CA certificate and token from the service-account-token to create your own "static" Kubernetes configuration file (see the sketch after this list and the Kubernetes documentation).
  - Note: this is what setting create_static_kubeconfig=true and running terraform apply, as described in the option above, would do for you automatically.
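A minimal sketch of the ServiceAccount approach mentioned above (the names, the namespace, and the cluster-admin binding are illustrative choices, not values required by viya4-deployment):

```bash
# Create a ServiceAccount and grant it admin permissions via RBAC
kubectl -n kube-system create serviceaccount viya-deploy
kubectl create clusterrolebinding viya-deploy-admin \
  --clusterrole=cluster-admin --serviceaccount=kube-system:viya-deploy

# Create a long-lived token Secret for the ServiceAccount
kubectl -n kube-system apply -f - <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: viya-deploy-token
  annotations:
    kubernetes.io/service-account.name: viya-deploy
type: kubernetes.io/service-account-token
EOF

# Extract the CA certificate and token to build your own "static" kubeconfig
kubectl -n kube-system get secret viya-deploy-token -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt
kubectl -n kube-system get secret viya-deploy-token -o jsonpath='{.data.token}' | base64 -d > token.txt
```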
## Applying a New License for your SAS Viya Platform Deployment

You have an existing SAS Viya platform deployment that was created using the viya4-deployment project, and you have a new license that you would like to apply to it.
After downloading the license file, perform the following steps:
- Set V4_CFG_LICENSE to the path where your license file is located. Note that it is a .jwt file.
- Using viya4-deployment, rerun the viya,install tasks to regenerate your kustomization.yaml (which will now have an updated reference to the new license file), generate the SASDeployment custom resource file, and apply it into your cluster. A sketch follows this list.
- You will see your license file referenced in the kustomization.yaml as a generator; look for site-config/vdm/generators/sas-license.yaml.
  - Note: If you are no longer using viya4-deployment and are updating the license on your own by following the SAS Viya Platform Operations Guide, this would be the line to remove from your kustomization.yaml.
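A sketch of the steps above (the license path is illustrative; if you run the project as a Docker container, make sure the license file sits under a mounted volume):

```yaml
# ansible-vars.yaml
V4_CFG_LICENSE: /path/to/your-new-license.jwt
```

```bash
# Re-run the viya,install tasks; adjust the CONFIG and TFSTATE paths for your environment
ansible-playbook \
  -e CONFIG=$HOME/ansible-vars.yaml \
  -e TFSTATE=$HOME/viya4-iac-aws/terraform.tfstate \
  playbooks/playbook.yaml --tags "viya,install"
```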
Information about licenses from the SAS Viya Platform Operations Guide
Note: these steps are only applicable for updating your license file. If you are going to update the SAS deployment or include additional products in your order, we recommend that you perform the update manually; see this note in the README.
## Tagging the AWS EC2 Load Balancers

The EC2 Load Balancer that gets provisioned dynamically by AWS during the baseline install phase of viya4-deployment, when ingress-nginx is installed, does not have the desired tags associated with it.
Based on this Network Load Balancing documentation from AWS, you can set the service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags annotation in your ingress-nginx configuration to customize the tags for your load balancer. To do this in the context of viya4-deployment, perform the following steps.
- In your ansible-vars.yaml file, define INGRESS_NGINX_CONFIG and provide it with your own configuration values.
  - If you want to use the defaults that viya4-deployment uses, you can copy the INGRESS_NGINX_CONFIG variable and its default configuration from https://github.com/sassoftware/viya4-deployment/blob/main/roles/baseline/defaults/main.yml. If you copy it from that file, you will need to update the loadBalancerSourceRanges value within the configuration yourself.
- Underneath the controller.service.annotations stanza in the configuration, add the key service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags and give it a value in the form "tagname1=tagvalue1,tagname2=tagvalue2". See the example below, which adds the tags mytag=foo and resourceowner=johnsmith:

      # defined in ansible-vars.yaml
      INGRESS_NGINX_CONFIG:
        controller:
          service:
            externalTrafficPolicy: Local
            sessionAffinity: None
            loadBalancerSourceRanges: ["0.0.0.0/0"] # you will need to update this for your own environment
            annotations:
              service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: "mytag=foo,resourceowner=johnsmith"
          config:
            use-forwarded-headers: "false"
            hsts-max-age: "63072000"
            hide-headers: Server,X-Powered-By
          tcp: {}
          udp: {}
          lifecycle:
            preStop:
              exec:
                command: [/bin/sh, -c, sleep 5; /usr/local/nginx/sbin/nginx -c /etc/nginx/nginx.conf -s quit; while pgrep -x nginx; do sleep 1; done]
          terminationGracePeriodSeconds: 600
- When the baseline,install Ansible tasks are run and ingress-nginx is installed, the EC2 Load Balancer that gets provisioned by AWS will have the tags you specified (an optional verification sketch follows this list).
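To verify the tags afterwards, the AWS CLI can be used (a sketch; whether you query a Network/Application Load Balancer or a Classic Load Balancer depends on how the ingress-nginx service was provisioned in your account, and the name/ARN placeholders below are illustrative):

```bash
# Network/Application Load Balancers
aws elbv2 describe-load-balancers --query 'LoadBalancers[].LoadBalancerArn' --output text
aws elbv2 describe-tags --resource-arns <load-balancer-arn>

# Classic Load Balancers
aws elb describe-tags --load-balancer-names <load-balancer-name>
```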
## Deploying with cadence versions > 2024.06 without creating the external PostgreSQL SharedServices database
While deploying with a cadence version >= 2024.06 AND:
- you are targeting an IaC-provisioned cluster with an External PostgreSQL Database Server
- you didn't create the SharedServices database prior to running viya4-deployment
most pods will fail to initialize. The following error message can be found in the sas-data-server-operator pod:
$ kubectl logs deployment/sas-data-server-operator
{
"level":"error",
"source":"sas-data-server-operator-65c874585-xwzgr",
"messageParameters":{
"p1":"failed to connect to `host=example-default-flexpsql.postgres.database.azure.com user=pgadmin database=SharedServices`: server error (FATAL: database \"SharedServices\" does not exist (SQLSTATE 3D000))"
},
"messageKey":"failed to initialize database, got error %v",
"message":"failed to initialize database, got error %v"
}
{
"level":"error",
"source":"sas-data-server-operator-65c874585-xwzgr",
"messageKey":"Reconciler error",
"properties":{
"error":"database server is external and cannot connect to the SAS database",
"caller":"logr/logr.go:49"
},
"attributes":{
"DataServer":{
"name":"sas-platform-postgres",
"namespace":"deploy"
},
},
"message":"Reconciler error"
}
Due to changes in the sas-data-server-operator, the SharedServices database is not created automatically during the initial deployment of the SAS Viya platform. Instead, you must manually create it before you start the SAS Viya platform deployment.
For more information, please refer to the External Postgres Requirements documentation.
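A minimal sketch of pre-creating the database with psql, reusing the host and user shown in the error output above (connection details, password handling, and SSL options will differ for your environment; see the External Postgres Requirements documentation for the authoritative steps):

```bash
psql "host=example-default-flexpsql.postgres.database.azure.com port=5432 user=pgadmin dbname=postgres sslmode=require" \
  -c 'CREATE DATABASE "SharedServices";'
```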