[BUG] [AWS] cluster networking issues using calico plugin - NodePort service not always responding #1129
Comments
@przemyslavic can you add steps to reproduce here?
So far I am not able to reproduce this on an AWS-Ubuntu-Calico or AWS-Ubuntu-Canal combo, using the following config:

```yaml
kind: epiphany-cluster
name: default
provider: aws
specification:
  admin_user:
    name: ubuntu
    key_path: /home/vscode/ssh/id_rsa_epi
  cloud:
    region: eu-west-3
    credentials: # todo change it to get credentials from vault
      key: blablabla
      secret: blablabla
    use_public_ips: true
  components:
    kubernetes_master:
      count: 1
      machine: kubernetes-master-machine
      configuration: default
      subnets:
      - availability_zone: eu-west-3a
        address_pool: 10.1.1.0/24
      - availability_zone: eu-west-3b
        address_pool: 10.1.2.0/24
    kubernetes_node:
      count: 2
      machine: kubernetes-node-machine
      configuration: default
      subnets:
      - availability_zone: eu-west-3a
        address_pool: 10.1.1.0/24
      - availability_zone: eu-west-3b
        address_pool: 10.1.2.0/24
    logging:
      count: 1
      machine: logging-machine
      configuration: default
      subnets:
      - availability_zone: eu-west-3a
        address_pool: 10.1.3.0/24
    monitoring:
      count: 1
      machine: monitoring-machine
      configuration: default
      subnets:
      - availability_zone: eu-west-3a
        address_pool: 10.1.4.0/24
    kafka:
      count: 0
      machine: kafka-machine
      configuration: default
      subnets:
      - availability_zone: eu-west-3a
        address_pool: 10.1.5.0/24
    postgresql:
      count: 1
      machine: postgresql-machine
      configuration: default
      subnets:
      - availability_zone: eu-west-3a
        address_pool: 10.1.6.0/24
    load_balancer:
      count: 0
      machine: load-balancer-machine
      configuration: default
      subnets:
      - availability_zone: eu-west-3a
        address_pool: 10.1.7.0/24
    rabbitmq:
      count: 0
      machine: rabbitmq-machine
      configuration: default
      subnets:
      - availability_zone: eu-west-3a
        address_pool: 10.1.8.0/24
    ignite:
      count: 0
      machine: ignite-machine
      configuration: default
      subnets:
      - availability_zone: eu-west-3a
        address_pool: 10.1.9.0/24
    opendistro_for_elasticsearch:
      count: 0
      machine: logging-machine
      configuration: default
      subnets:
      - availability_zone: eu-west-3a
        address_pool: 10.1.10.0/24
    single_machine:
      count: 0
      machine: single-machine
      configuration: default
      subnets:
      - availability_zone: eu-west-3a
        address_pool: 10.1.1.0/24
      - availability_zone: eu-west-3b
        address_pool: 10.1.2.0/24
  name: awsu
  prefix: 'test'
title: Epiphany cluster Config
---
kind: configuration/applications
title: Kubernetes Applications Config
name: default
specification:
  applications:
  - name: ignite-stateless
    enabled: no
    image_path: apacheignite/ignite:2.5.0
    namespace: ignite
    service:
      rest_nodeport: 32300
      sql_nodeport: 32301
      thinclients_nodeport: 32302
    replicas: 1
    enabled_plugins:
    - ignite-kubernetes
    - ignite-rest-http
  - name: rabbitmq
    enabled: no
    image_path: rabbitmq:3.7.10
    use_local_image_registry: true
    service:
      name: rabbitmq-cluster
      port: 30672
      management_port: 31672
      replicas: 2
      namespace: queue
    rabbitmq:
      plugins:
      - rabbitmq_management
      - rabbitmq_management_agent
      policies:
      - name: ha-policy2
        pattern: .*
        definitions:
          ha-mode: all
      custom_configurations:
      - name: vm_memory_high_watermark.relative
        value: 0.5
      cluster:
  - name: auth-service
    enabled: yes
    image_path: jboss/keycloak:9.0.0
    use_local_image_registry: true
    service:
      name: as-testauthdb
      port: 30104
      replicas: 2
      namespace: namespace-for-auth
      admin_user: auth-service-username
      admin_password: PASSWORD_TO_CHANGE
    database:
      name: auth-database-name
      user: auth-db-user
      password: PASSWORD_TO_CHANGE
  - name: pgpool
    enabled: no
    image:
      path: bitnami/pgpool:4.1.1-debian-10-r29
      debug: no
    namespace: postgres-pool
    service:
      name: pgpool
      port: 5432
    replicas: 3
    pod_spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - pgpool
              topologyKey: kubernetes.io/hostname
      nodeSelector: {}
      tolerations: {}
    resources:
      limits:
        memory: 176Mi
      requests:
        cpu: 250m
        memory: 176Mi
    pgpool:
      env:
        PGPOOL_POSTGRES_USERNAME: epi_pgpool_postgres_admin
        PGPOOL_SR_CHECK_USER: epi_pgpool_sr_check
        PGPOOL_ADMIN_USERNAME: epi_pgpool_admin
        PGPOOL_ENABLE_LOAD_BALANCING: yes
        PGPOOL_MAX_POOL: 4
        PGPOOL_POSTGRES_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_postgres_password
        PGPOOL_SR_CHECK_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_sr_check_password
        PGPOOL_ADMIN_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_admin_password
      secrets:
        pgpool_postgres_password: PASSWORD_TO_CHANGE
        pgpool_sr_check_password: PASSWORD_TO_CHANGE
        pgpool_admin_password: PASSWORD_TO_CHANGE
      pgpool_conf_content_to_append: |
        #------------------------------------------------------------------------------
        # CUSTOM SETTINGS (appended by Epiphany to override defaults)
        #------------------------------------------------------------------------------
        # num_init_children = 32
        connection_life_time = 900
        reserved_connections = 1
  - name: pgbouncer
    enabled: no
    image_path: brainsam/pgbouncer:1.12
    init_image_path: bitnami/pgpool:4.1.1-debian-10-r29
    namespace: postgres-pool
    service:
      name: pgbouncer
      port: 5432
    replicas: 2
    resources:
      requests:
        cpu: 250m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 128Mi
    pgbouncer:
      env:
        DB_HOST: pgpool.postgres-pool.svc.cluster.local
        DB_LISTEN_PORT: 5432
        LISTEN_ADDR: '*'
        LISTEN_PORT: 5432
        AUTH_FILE: /etc/pgbouncer/auth/users.txt
        AUTH_TYPE: md5
        MAX_CLIENT_CONN: 150
        DEFAULT_POOL_SIZE: 25
        RESERVE_POOL_SIZE: 25
        POOL_MODE: transaction
version: 0.7.0
provider: aws
---
kind: configuration/kubernetes-master
title: "Kubernetes Master Config"
name: default
provider: aws
specification:
  version: 1.17.4
  cluster_name: "kubernetes-epiphany"
  allow_pods_on_master: False
  storage:
    name: epiphany-cluster-volume # name of the Kubernetes resource
    path: / # directory path in mounted storage
    enable: True
    capacity: 50 # GB
    data: {} #AUTOMATED - data specific to cloud provider
  advanced: # modify only if you are sure what value means
    api_server_args: # https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
      profiling: false
      enable-admission-plugins: "AlwaysPullImages,DenyEscalatingExec,NamespaceLifecycle,ServiceAccount,NodeRestriction"
      audit-log-path: "/var/log/apiserver/audit.log"
      audit-log-maxbackup: 10
      audit-log-maxsize: 200
    controller_manager_args: # https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/
      profiling: false
      terminated-pod-gc-threshold: 200
    scheduler_args: # https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/
      profiling: false
    networking:
      dnsDomain: cluster.local
      serviceSubnet: 10.96.0.0/12
      plugin: calico # valid options: calico, flannel, canal (due to lack of support for calico on Azure - use canal)
    imageRepository: k8s.gcr.io
    certificatesDir: /etc/kubernetes/pki
    etcd_args:
      encrypted: yes
```

I am trying to reproduce it with the following script, both from master -> node and from node -> master:

```bash
while :
do
  START_TIME=$(date +%s%3N)
  OUTPUT=$(curl -o /dev/null -s -w '%{http_code}' -k https://node:30104/auth/)
  ELAPSED_TIME=$(expr $(date +%s%3N) - $START_TIME)
  echo "Request httpcode: $OUTPUT, Time: $ELAPSED_TIME milliseconds"
  sleep 2
done
```

After running it for a few hours I see request times between 30 and 100 ms, but not the 2 to 3 minutes reported by @przemyslavic. The issue seems similar to #1072.
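Worth noting for anyone else trying to reproduce this: with the default `externalTrafficPolicy: Cluster`, a request to any node's NodePort may be forwarded to a pod on a different node, and that inter-node hop is exactly the path Calico handles. A minimal sketch that probes every node in turn (the hostnames `node1`/`node2` are placeholders; port 30104 is the auth-service NodePort from the config above):

```bash
#!/usr/bin/env bash
# Probe the auth-service NodePort on each node; slow or hanging responses on
# only some nodes would point at the inter-node (Calico) forwarding path.
NODES="node1 node2"   # placeholders - substitute your worker node hostnames/IPs
PORT=30104            # auth-service NodePort from the config above

for NODE in $NODES; do
  START_TIME=$(date +%s%3N)
  # -m 10 caps each request at 10 seconds so a hang is visible but not blocking
  CODE=$(curl -o /dev/null -s -w '%{http_code}' -m 10 -k "https://${NODE}:${PORT}/auth/")
  ELAPSED=$(( $(date +%s%3N) - START_TIME ))
  echo "${NODE}: http ${CODE}, ${ELAPSED} ms"
done
```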
Shall we add the fix to the 0.6.x branch?
Yes, please add the fix to 0.6.x.
So this should not be part of 0.7.1 but of a 0.6.1 epic. Also, I think it's better to wait for the 0.7.1 release before we backmerge, since there are some fixes being made to K8s there.
Sure, for now I created a 0.6.1 milestone (without a due date) and assigned this issue to it.
I put this in the blocked column for now, since we need to await the 0.7.1 release.
I moved it to the correct 0.6.1 release and removed it from the 0.6.1 milestone.
Add information to the changelog's known issues section.
Handled in this PR |
Describe the bug
Cluster networking issues when using the calico plugin - the NodePort service does not always respond.
A Keycloak deployment is enabled on the cluster. Response timeouts occur at random.
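Not part of the original report, but two quick checks that may help narrow this down (the service name and namespace are taken from the config above; the label selector assumes the stock Calico manifests):

```bash
# Confirm the NodePort service actually has endpoints behind it
kubectl -n namespace-for-auth get svc,endpoints as-testauthdb

# Confirm every calico-node pod is Ready; a NotReady calico-node on one worker
# can stall NodePort traffic that lands on, or is forwarded to, that node
kubectl -n kube-system get pods -l k8s-app=calico-node -o wide
```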
Logs
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Config files
OS (please complete the following information):
Cloud Environment (please complete the following information):
Additional context