add a recovery test on node deletion for eks/gke #2119
Force-pushed from 13016c0 to e5dc491.
@@ -119,8 +127,6 @@ var _ = ginkgo.Describe("[tidb-operator][Stability]", func() {
    ginkgo.AfterEach(func() {
        ginkgo.By("Uninstall tidb-operator")
        oa.CleanOperatorOrDie(ocfg)
        ginkgo.By("Uninstalling CRDs")
        oa.CleanCRDOrDie()
Don't delete the CRDs; otherwise we can't keep the test pods with --delete-namespace=false.
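To illustrate the concern in the comment above, here is a minimal sketch in plain Go with no Kubernetes client. It assumes the usual Kubernetes behaviour that deleting a CRD also deletes its custom resources and that pods owned by those resources are then garbage-collected. The `fakeCluster` type and its methods are hypothetical stand-ins for this illustration only and are not part of tidb-operator or the test framework.

```go
package main

import "fmt"

// fakeCluster is a hypothetical in-memory model of a test cluster.
// It only tracks what matters for this illustration.
type fakeCluster struct {
	operatorInstalled bool
	customResources   map[string]bool   // custom resource name -> exists
	podOwners         map[string]string // pod name -> owning custom resource
}

// uninstallOperator removes the operator but leaves resources and pods alone.
func (c *fakeCluster) uninstallOperator() {
	c.operatorInstalled = false
}

// deleteCRD models the cascade: every custom resource of the kind goes away,
// and pods owned by those resources are garbage-collected with them.
func (c *fakeCluster) deleteCRD() {
	for pod, owner := range c.podOwners {
		if c.customResources[owner] {
			delete(c.podOwners, pod)
		}
	}
	c.customResources = map[string]bool{}
}

// newFakeCluster returns a cluster with one custom resource owning two pods.
func newFakeCluster() *fakeCluster {
	return &fakeCluster{
		operatorInstalled: true,
		customResources:   map[string]bool{"cluster1": true},
		podOwners: map[string]string{
			"cluster1-pd-0":   "cluster1",
			"cluster1-tikv-0": "cluster1",
		},
	}
}

func main() {
	kept := newFakeCluster()
	kept.uninstallOperator()
	fmt.Println("pods after uninstalling the operator only:", len(kept.podOwners))

	wiped := newFakeCluster()
	wiped.uninstallOperator()
	wiped.deleteCRD()
	fmt.Println("pods after also deleting the CRD:", len(wiped.podOwners))
}
```

In this model, skipping the CRD cleanup step is what leaves the pods available for inspection after a failed run.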
Force-pushed from 02ba934 to 5f8bcac.
Rest LGTM
pkg/pdapi/pdapi.go (outdated)
    StoreStateUp = "Up"
    StoreStateOffline = "Offline"
    StoreStateTombstone = "Tombstone"
Can we use TiKVStateTombstone in pkg/apis/pingcap/v1alpha1/types.go?
sure, updated
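A rough sketch of what the suggestion amounts to: refer to one shared set of store-state constants instead of declaring a second copy in pkg/pdapi. The constants below are local stand-ins so that the example compiles on its own; their names follow the reviewer's `TiKVStateTombstone` and their values are taken from the diff above. The `storeInfo` type and `tombstoneStoreIDs` helper are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// Local stand-ins for the shared constants the review points to in
// pkg/apis/pingcap/v1alpha1/types.go. Values are copied from the diff.
const (
	TiKVStateUp        = "Up"
	TiKVStateOffline   = "Offline"
	TiKVStateTombstone = "Tombstone"
)

// storeInfo is a hypothetical, trimmed-down view of a store as reported by PD.
type storeInfo struct {
	ID    uint64
	State string
}

// tombstoneStoreIDs returns the IDs of stores whose state is Tombstone,
// comparing against the shared constant rather than a string literal.
func tombstoneStoreIDs(stores []storeInfo) []uint64 {
	var ids []uint64
	for _, s := range stores {
		if s.State == TiKVStateTombstone {
			ids = append(ids, s.ID)
		}
	}
	return ids
}

func main() {
	stores := []storeInfo{
		{ID: 1, State: TiKVStateUp},
		{ID: 4, State: TiKVStateTombstone},
		{ID: 5, State: TiKVStateOffline},
	}
	fmt.Println("tombstone stores:", tombstoneStoreIDs(stores))
}
```

Keeping a single definition means a test that waits for a deleted node's store to reach Tombstone cannot drift from the state names the controller itself uses.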
LGTM
/merge
/run-all-tests
@cofyc merge failed.
/merge
/run-all-tests
@cofyc merge failed.
/merge
/run-all-tests
cherry pick to release-1.1 in PR #2128
* add a recovery test on node deletion for aws/eks
* support gke
* use constants defined in pkg/apis/pingcap.com
* more information for GKE (managed instance group)

Co-authored-by: Yecheng Fu
updated by comment Update main.go enable github actions (pingcap#1690) deploy: close connection after set privilege (pingcap#1692) fix tls client cert bug (pingcap#1693) Co-authored-by: Song Gao <[email protected]> change check podList update blockwrite imagePullPolicy update DefaultPollTimeout fix error msg Add secret get/list for webhook rbac (pingcap#1704) add more SANs to tidb server certificate (pingcap#1702) * add more SANs to tidb server certificate * address comment * address comment * codegen * fix CI tidb-backup: add restoreUsingExistingVolume option (pingcap#1708) Add consecutive count check for Auto-scaling (pingcap#1703) * Add consecutive count check for Auto-scaling tidb-backup: restore respects resources, imagePullPolicy, nodeSelector (pingcap#1705) tidb-initializer:close connection after set privilege (pingcap#1710) fix autoscaler api (pingcap#1718) use kubetest2 to run our e2e and support GKE provider (pingcap#1716) fix tidb-lightning errors (pingcap#1723) fixes for pingcap@9a3b1e2 Add auto-scaling calculation based by CPU load (pingcap#1722) * add cpu metrics func Remove consecutive check (pingcap#1732) * remove consecutive check temporarily Support user-defined tidb server/client certificate (pingcap#1714) * support user custom certificate * refine API * fix typo * fix some bugs * create service before certificate * tiny fix * address comments * address comment * address comment Co-authored-by: Song Gao <[email protected]> Finish auto-scaler controller (pingcap#1731) * finish auto-scaler controller * revise compare tc add basic yaml deployment example (pingcap#1573) * add basic yaml deployment example Signed-off-by: Aylei <[email protected]> * Address review comments Signed-off-by: Aylei <[email protected]> * fix typo Signed-off-by: Aylei <[email protected]> Co-authored-by: Yecheng Fu <[email protected]> Co-authored-by: Song Gao <[email protected]> Update deploy_tidb_operator_staging.groovy (pingcap#1740) Co-authored-by: pingcap-github-bot 
<[email protected]> add stale github action to close stale issues/prs (pingcap#1743) addd logs in stability test add debug log Update _start_pd.sh.tpl Update main.go add logs Fix TidbMonitor template error (pingcap#1745) * Fix TidbMonitor template error support eks provider in e2e (pingcap#1728) * support eks provider in e2e * upgrade to use kubetest v0.0.3 * prefix image tag with CLUSTER, then multiple clusters can be started in same project/account * - upgrade kubetest2-eks to v0.0.4 - use unique node group name * $RANDOM should be enough * support KUBE_WORKERS * fix mngName * fix e2e bug * specify runner suite name * increase open files for containers automatically * use kubetest2 v0.0.6 * --up-retries * decrease concurrency because in each node we will start a lots of pod Co-authored-by: Song Gao <[email protected]> Support AdvancedStatefulSet in admission webhook (pingcap#1640) * Support AdvancedStatefulSet in admission webhook Make TidbMonitor intergrated in AutoScaler (pingcap#1747) add deploy yamls for dm with new ha architecture (pingcap#1738) * add deploy yamls for dm with new ha architecture * fix format * address comments * add configmap for dm-master Allow to configure Affinity or Tolerations for Backups and Restores (pingcap#1737) * Allow to configure Affinity or Tolerations for Backups and Restores * Add affinity and tolerations options to Helm charts for backup and restore * Update CRD's, affinity and tolerations for backup and restore Co-authored-by: Song Gao <[email protected]> update actions/checkout to v2 (pingcap#1758) Co-authored-by: Song Gao <[email protected]> Revise controller log and fix deployment template error (pingcap#1735) * revise log and fix deployment template error use /hack/e2e.sh to run a single node kind cluster for develop (pingcap#1749) upgrade local volume provisioner to 2.3.4 (pingcap#1778) https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/releases/tag/v2.3.4 make the drainer name configurable 
(pingcap#1604) * make the drainer name configurable This is used for the statefulset/pod names. The release name is already unique, so I would actually suggest just using that without the cluster name. However, that is a backwards incompatible change that I hesitate to make. * add a warning about updating the drainer name Co-authored-by: Yecheng Fu <[email protected]> Co-authored-by: Song Gao <[email protected]> Added marketplace product code filter for bastion to avoid selecting AMI from wrong vendor (pingcap#1775) tls: fix cluster TLS while using CR to create cluster (pingcap#1773) binlog: add tls in pump and drainer (pingcap#1739) use TidbCluster CRD to simplify the test and increase wait timeout (pingcap#1786) release v1.1.0-beta.2 (pingcap#1768) * release v1.1.0-beta.1 * update * Update CHANGELOG-1.1.md Co-Authored-By: weekface <[email protected]> * Apply suggestions from code review Co-Authored-By: Keke Yi <[email protected]> * enable tidbBackupManagerImage and use tagged version Co-authored-by: weekface <[email protected]> Co-authored-by: Keke Yi <[email protected]> Co-authored-by: Song Gao <[email protected]> terraform fmt (pingcap#1792) Update tidb-backup-manager image name (pingcap#1791) binglog: fix tls error when create pump with TLS when use CRD (pingcap#1799) fix master ci (pingcap#1802) add prefix for remote storage (pingcap#1790) fix tikv cluster tls bug (pingcap#1808) Manage hot region label for the tikv created by auto-scaler (pingcap#1801) * Manage hot region label for the tikv created by auto-scaler Co-authored-by: DanielZhangQD <[email protected]> Replace glog with klog (pingcap#1805) (pingcap#1813) enable defaulting (pingcap#1816) show cli flags in logs (pingcap#1807) use k8s standard tls secret format (pingcap#1824) * use standard tls secret format * fix tls config in prometheus scrape config Refactor Admission Webhook templates and values (pingcap#1832) * Refactor Admission Webhook templates and values Co-Authored-By: DanielZhangQD <[email 
protected]> Make evict leader scheduler compatitable (pingcap#1831) * fix evict leader Co-authored-by: pingcap-github-bot <[email protected]> upgrade kubetest2 to v0.0.7 (pingcap#1839) fix defaulting (pingcap#1845) Default TidbMonitor targetRef Namespace (pingcap#1834) current delete slot annotations check in Advanced Statefulset upgrader is not right (pingcap#1851) add hack/local-up-operator.sh to run tidb-operator locally and test examples (pingcap#1854) Support no secret for s3/ceph (pingcap#1817) * Support no secret for s3/ceph This is required if you use EKS ServiceAccount -> IAM role authentication via OIDC. * Use the environment directly for AWS credentials for rclone * Fixes to backup scripts * Update backup image to `pingcap/tidb-cloud-backup:20200229` Co-authored-by: DanielZhangQD <[email protected]> Co-authored-by: Tennix <[email protected]> Backup/Restore: support configuring TiKV GC life time (pingcap#1835) fix tidb defaulting (pingcap#1860) Backup: support TLS for br component (pingcap#1836) * backup: add TLS to backup br support starting tidb-server with `-advertise-address` parameter (pingcap#1859) * start tidb-server with * add EnableAdvertiseAddress switch * fix indent * address comments Fix hot region label setting for tikv auto-scaling (pingcap#1833) * mutate * fix log * add admission configuration * remove useless log * format by comment * use tikv cli * remove useless code * remove cmlister * fix lint * fix tpl Change the lightning restore image (pingcap#1869) Fix wrong method to get tikv configmap in mutation webhook (pingcap#1871) tls: Enable TLS For MySQL Clients (pingcap#1867) * Enable TLS For MySQL Clients * address comments Add timestamp annotation in tidbcluster statefulset (pingcap#1875) * Add timestamp annotation in tidbcluster statefulset fix drainer chart: unexpected define in command (pingcap#1873) fix kubetest2 version check (pingcap#1881) remove unnecessary setup (pingcap#1880) Fix defaulting webhook error (pingcap#1876) * fix 
defaulting * remove config validation * fix e2e test * fix e2e test feature:make Service port name configurable for tidb and pd service (pingcap#1823) * feature:make Service port name configurable for tidb and pd service * reset default port name * reset default port name * reset default port name * recover pd service clusterIP * comment pd and tidb service portName * set pd and tidb port name value in yaml * set pd and tidb port name value in yaml Fix tikv configuration key in toml and add an ut case (pingcap#1887) Fix nil error for update statefulset util (pingcap#1896) guide on manual tests in development (pingcap#1882) Co-authored-by: pingcap-github-bot <[email protected]> run e2e tests in gke (pingcap#1889) fix operator failover config invalid (pingcap#1877) use cert-manager to create and renew tidb-server certificates (self-signed example) (pingcap#1844) * selfsigned tls cert created by cert-manager * add tests improve note and revise idc config (pingcap#1904) Co-authored-by: pingcap-github-bot <[email protected]> Support IAM role for backup CRD (pingcap#1861) tls: TLS between TiDB components (pingcap#1870) add aws ami version link (pingcap#1903) add tikv-importer chart (pingcap#1910) * add tikv-importer chart * resolved some suggestions Co-authored-by: DanielZhangQD <[email protected]> fix prometheus scrape config issue while TLS is enabled (pingcap#1919) * fix prometheus scrape config while tls is enabled * fix chart problem * fix chart problem update eks e2e script and jenkins file (pingcap#1915) backup: support kms decryption secret (pingcap#1908) Support sync bucket in lightning (pingcap#1629) * Support sync bucket in lightning Signed-off-by: Aylei <[email protected]> * fix nodeSelector is not respected in tidb-lightning chart Signed-off-by: Aylei <[email protected]> * Fix nodeSelector indention Signed-off-by: Aylei <[email protected]> Add API document and its generating util (pingcap#1929) defaulting tikv container privileged field (pingcap#1933) 
Backup: make tikv support add serviceaccount and switch rclone env_auth to true (pingcap#1930) re comment pd service yaml value (pingcap#1850) * re comment pd service yaml value * fix check tidb clusters which own builtin StatefulSets only in upgrading (pingcap#1934) * Revert "current delete slot annotations check in Advanced Statefulset upgrader is not right (pingcap#1851)" This reverts commit 596d10d. * only check relevant tidb clusters fix drainer installation error (pingcap#1961) fix bug in e2e-examples script (pingcap#1957) Adding tolerations and affinity to the discovery chart template (pingcap#1959) update permission for tidb-controller-manager and add example for tidb-monitor (pingcap#1954) * update permission for tidb-controller-manager and add example for tidb-monitor * address comments Fix some webhook error (pingcap#1963) * fix webhook error backup: fix kms bug (pingcap#1955) backup: mask visual tables when dumper (pingcap#1970) add a serial test for stable scheduling (pingcap#1972) make tidb-initializer support TLS (pingcap#1931) some cleanups in e2e (pingcap#1974) TLS support for Pump and Drainer (pingcap#1979) Fix TidbMonitor several error (pingcap#1962) * Fix TidbMonitor several error lightning: support lightning use IAM (pingcap#1975) configure default parameters via envs in Jenkins job (pingcap#1989) fix clean bug (pingcap#1991) Add doc and examples for auto-scaler and intializer (pingcap#1772) * add doc and examples * fix by lint * revise the example * revise init * revise examples * Update tidb-cluster.yaml * revise by comment * revise examples * fix by lint * address the comment * Update examples/initialize/README.md Co-Authored-By: DanielZhangQD <[email protected]> * Update examples/auto-scale/README.md Co-Authored-By: DanielZhangQD <[email protected]> Co-authored-by: DanielZhangQD <[email protected]> backup: support br compatible with new TLS interface (pingcap#1988) * backup: support br compatiable with new TLS interface Mount 
google-cloud-sdk into e2e image (pingcap#1997) add stability e2e group and a basic case (pingcap#1986) Co-authored-by: Song Gao <[email protected]> Allocate tidb.initializer.resources to initcontainer in tidb initializer job (pingcap#1938) Co-authored-by: DanielZhangQD <[email protected]> Co-authored-by: pingcap-github-bot <[email protected]> pin alicloud version to fix ci errors (pingcap#2006) improve orphan pods clean logic (pingcap#2007) - check pod has been scheduled or not - use ResourceVersion precondition fix args passing (pingcap#2010) Update TiDB Config to v3.1.0 (pingcap#1906) * update tidb config Update PD Config to v3.1.0 (pingcap#1928) * update pd config create tidb cluster with cr on aws (pingcap#2004) Backup: open mysql client TLS in backup (pingcap#2003) create tidb cluster on ack with cr (pingcap#2012) * create tidb cluster on ack with cr * update variable name * update variable name * update default replicas Add tikv store limit pattern (pingcap#1965) * add tikv limit pattern fix default value of separateSlowLog (pingcap#2023) UCP: additionalPrinterColumns tidbautoscaler (pingcap#1943) Limit Autofailure condition (pingcap#2015) don't run on k8s-node if branch is a commit (pingcap#2032) fix crd util (pingcap#2031) remove dependencies on k8s-node (pingcap#2036) * remove dependencies on k8s-node * fix examples for advanced statefulset (pingcap#2039) Deploy TiDB Cluster with CR via TiDB Operator v1.1 on GKE (pingcap#2027) install tidb-operator in test namespace in non-parallel test specs (pingcap#2029) * install tidb-operator in test namespace in non-parallel test specs * check tidb pods only Able to configure custom env for components (pingcap#2052) * Able to configure custom env for components * codegen Co-authored-by: DanielZhangQD <[email protected]> fix error fix error Update failover.go fix failover pd format add spec.paused field to pause the syncing of tidb cluster (pingcap#2013) Default Tidb Log File Configuration (pingcap#2045) * default 
tidb file log config Fix TidbMonitor Service Label (pingcap#2051) * Fix TidbMonitor Service Label Fix location label (pingcap#1941) * Fix location label Signed-off-by: Aylei <[email protected]> * Fix api doc Signed-off-by: Aylei <[email protected]> * Separate struct used for crd and pd client Signed-off-by: Aylei <[email protected]> * Fix boilerplate Signed-off-by: Aylei <[email protected]> Co-authored-by: Song Gao <[email protected]> Co-authored-by: Yecheng Fu <[email protected]> Co-authored-by: DanielZhangQD <[email protected]> backup: fix issue pingcap#2028 (pingcap#2062) cert-allowed-cn support (pingcap#2061) * cert-allowed-cn support * cert-allowed-cn for drainer * tiny fix * fix ci Co-authored-by: DanielZhangQD <[email protected]> Co-authored-by: pingcap-github-bot <[email protected]> remove enableAdvertiseAddress field, --advertise-address should be (pingcap#2076) always configured backup: fix issue 1657 (pingcap#2071) use tidb-lightning in restore instead of loader (pingcap#2068) * use tidb-lightning * Update restore.go * Update cmd/backup-manager/app/import/restore.go Co-Authored-By: DanielZhangQD <[email protected]> * Update images/tidb-backup-manager/Dockerfile Co-Authored-By: DanielZhangQD <[email protected]> Co-authored-by: DanielZhangQD <[email protected]> Co-authored-by: pingcap-github-bot <[email protected]> should not change relative order of envs (pingcap#2086) release v1.1.0-rc.1 (pingcap#2072) * release v1.1.0-rc.1 * address comments * address comments * address comments BR e2e test in AWS (pingcap#2038) readme, static: update doc links and image in readme (pingcap#2094) * readme, static: update doc links and image in readme * add description to documentation Co-authored-by: DanielZhangQD <[email protected]> docs: remove unnecessary duplicated docs (pingcap#2098) UCP pingcap#1753: add timeout config for query metrics from Prometheus (pingcap#2093) Signed-off-by: Qiannan Lyu <[email protected]> Co-authored-by: DanielZhangQD <[email protected]> 
fix pd failover reocver Update failover.go fix begin insert Pod Update failover.go remove unnecessary change remove code add cluster2 fix check update failover fix stopNode add cluster3 fix alertmanager (pingcap#2108) check tidb cluster in owner references (pingcap#2112) Update Doc util (pingcap#2115) upgrade cert-manager to v0.14.1 in example tests (pingcap#2118) Add unit test for Auto-scaling Util (pingcap#2111) add a recovery test on node deletion for eks/gke (pingcap#2119) Set PD Dashboard Config when TLS Client enabled (pingcap#2085) support v<major>.<minor> format in KUBE_VERSION and add v1.18 support (pingcap#2126) * support v<major>.<minor> format in KUBE_VERSION and add v1.18 support * move hack::ensure_xxx after envs are printed Co-authored-by: DanielZhangQD <[email protected]> kill tidb-operator pods randomly in e2e (pingcap#2125) * kill tidb-operator pods randomly in e2e * don't use channel in configuration struct * add successful log crd for tiflash (pingcap#2122) * crd for tiflash * generated files * update defaulting for tiflash config * address comments * fix ci * update crd * update storage type definition Make webhook tls configuration easy to use (pingcap#2135) * Make webhook tls configuration easy to use change tidb readness probe to TCPSocket 4000 port (pingcap#2139) tls for tikv metircs api (pingcap#2137) Remove unnecessary informer caches (pingcap#1504) Add auto-scaling e2e test (pingcap#2123) * add auto-scaling e2e * fix interval error * fix e2e process * remove useless code * Update tests/e2e/tidbcluster/serial.go Co-Authored-By: Yecheng Fu <[email protected]> * address the comment * Update tests/e2e/tidbcluster/serial.go Co-Authored-By: DanielZhangQD <[email protected]> * address by comment * add log Co-authored-by: Yecheng Fu <[email protected]> Co-authored-by: DanielZhangQD <[email protected]> Add e2e test for upgrading from 1.0.x (pingcap#2145) Add e2e test for upgrading from 1.0.x fix terraform destroy failure on aws (pingcap#2148) 
scripts to run e2e against OpenShift 4 (pingcap#2141) fix e2e error fix tag recover remove unnecessary change add log for node stop fix error fix lint security: tikv encryption kms config (pingcap#2151) Skip TLS when connecting to TiDB Server (pingcap#2143) deploy controller for tiflash (pingcap#2157) add AGE column (pingcap#2168) Add unit test for restore controller (pingcap#2166) * add unit tests for restore controller * tiny fix * address comments * fix CI fix a typo (pingcap#2167) Add more events for tidbcluster and autoscaler (pingcap#2150) * add event for tidbcluster and auto-scaler * fix unit test * Update pkg/manager/member/upgrader.go Co-Authored-By: DanielZhangQD <[email protected]> * revise scaling logic * revise logic * fix failover event * remove upgrading event * remove scaling event * remove unnecessary event * remove useless code * revert changes Co-authored-by: DanielZhangQD <[email protected]> Remove unused certificate control and related code. (pingcap#2176) fix debug docker (pingcap#2187) use fixed job names (pingcap#2188) Add spec.pd.maxFailoverCount to limit max failover replicas for PD (pingcap#2184) * Add spec.pd.maxFailoverCount to limit max failover replicas for PD * update api generated files remove error add more log fix openshift job (pingcap#2192) Support Auto-scaling status (pingcap#2182) * add tac status * update * fix status * update rc * revert replicas * update notes * add last ts * address the comment * use metav1.Time Wait for the VM to be ready in CI (pingcap#2194) * fix a typo * wait for the vm to be ready release v1.1.0-rc.2 (pingcap#2197) * release v1.1.0-rc.2 * Update CHANGELOG-1.1.md Co-Authored-By: Ran <[email protected]> * Apply suggestions from code review Co-Authored-By: Ran <[email protected]> * update Co-authored-by: Ran <[email protected]> update-version (pingcap#2204) tmp skip revert skip fix br log issue and include both 3.1 and 4.0 br in the tidb-backup-manager image (pingcap#2213) easier to build and push to 
other docker image repo (pingcap#2207) install python for gcloud (pingcap#2206) fix tidb-debug docker build (pingcap#2215) delete pd data
What problem does this PR solve?
fixes #1546
What is changed and how does it work?
Check List
Tests
Code changes
Side effects
Related changes
Does this PR introduce a user-facing change?: