Refine aliyun and aws cloud tidb configurations #492

Merged · 6 commits · May 21, 2019
2 changes: 1 addition & 1 deletion deploy/aliyun/README-CN.md
@@ -18,7 +18,7 @@
- A new VPC;
- An ECS instance as a bastion host;
- A managed ACK (Alibaba Cloud Kubernetes) cluster with a set of worker nodes:
- An auto-scaling group of 2 ECS instances (1c1g); the default auto-scaling group of managed Kubernetes must contain at least two instances, which host cluster-wide system services such as CoreDNS
- An auto-scaling group of 2 ECS instances (2c2g); the default auto-scaling group of managed Kubernetes must contain at least two instances, which host cluster-wide system services such as CoreDNS
- An auto-scaling group of 3 `ecs.i2.xlarge` instances for deploying PD
- An auto-scaling group of 3 `ecs.i2.2xlarge` instances for deploying TiKV
- An auto-scaling group of 2 ECS instances (16c32g) for deploying TiDB
2 changes: 1 addition & 1 deletion deploy/aliyun/README.md
@@ -20,7 +20,7 @@ The default setup will create:
- A new VPC
- An ECS instance as bastion machine
- A managed ACK(Alibaba Cloud Kubernetes) cluster with the following ECS instance worker nodes:
- An auto-scaling group of 2 * instances(1c1g) as ACK mandatory workers for system service like CoreDNS
- An auto-scaling group of 2 * instances(2c2g) as ACK mandatory workers for system service like CoreDNS
- An auto-scaling group of 3 * `ecs.i2.xlarge` instances for PD
- An auto-scaling group of 3 * `ecs.i2.2xlarge` instances for TiKV
- An auto-scaling group of 2 * instances(16c32g) for TiDB
1 change: 1 addition & 0 deletions deploy/aliyun/data.tf
@@ -15,6 +15,7 @@ data "template_file" "tidb-cluster-values" {
tikv_writecf_block_cache_size = "${var.tikv_memory_size * 0.2}GB"
monitor_reserve_days = "${var.monitor_reserve_days}"
monitor_slb_network_type = "${var.monitor_slb_network_type}"
monitor_enable_anonymous_user = "${var.monitor_enable_anonymous_user}"
}
}

2 changes: 2 additions & 0 deletions deploy/aliyun/main.tf
@@ -46,6 +46,8 @@ module "ack" {
vpc_id = "${var.vpc_id}"
group_id = "${var.group_id}"

default_worker_cpu_core_count = "${var.default_worker_core_count}"

worker_groups = [
{
name = "pd_worker_group"
4 changes: 4 additions & 0 deletions deploy/aliyun/outputs.tf
@@ -6,6 +6,10 @@ output "cluster_id" {
value = "${module.ack.cluster_id}"
}

output "cluster_name" {
value = "${var.cluster_name}"
}

output "kubeconfig_file" {
value = "${module.ack.kubeconfig_filename}"
}
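The new `cluster_name` output simply echoes the `cluster_name` input variable, which makes the value easy to consume from a wrapper configuration or from the CLI. A minimal sketch, assuming this directory is used as a module named `tidb_aliyun` (the wrapper itself is hypothetical and not part of this PR):

```hcl
# Hypothetical wrapper configuration (not part of this PR).
module "tidb_aliyun" {
  source       = "./deploy/aliyun"
  cluster_name = "my-tidb"            # echoed back by the new output
}

# Re-export the value so that `terraform output tidb_cluster_name` prints it.
output "tidb_cluster_name" {
  value = "${module.tidb_aliyun.cluster_name}"
}
```

Run directly inside deploy/aliyun, `terraform output cluster_name` prints the same value after `terraform apply`.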
2 changes: 1 addition & 1 deletion deploy/aliyun/templates/tidb-cluster-values.yaml.tpl
@@ -255,7 +255,7 @@ monitor:
config:
# Configure Grafana using environment variables except GF_PATHS_DATA, GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD
# Ref https://grafana.com/docs/installation/configuration/#using-environment-variables
GF_AUTH_ANONYMOUS_ENABLED: "true"
GF_AUTH_ANONYMOUS_ENABLED: %{ if monitor_enable_anonymous_user }"true"%{ else }"false"%{ endif }
GF_AUTH_ANONYMOUS_ORG_NAME: "Main Org."
GF_AUTH_ANONYMOUS_ORG_ROLE: "Viewer"
# if grafana is running behind a reverse proxy with subpath http://foo.bar/grafana
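The `%{ if } ... %{ else } ... %{ endif }` construct is Terraform's string-template directive syntax, so the rendered values.yaml always contains an explicit quoted "true" or "false". A minimal standalone sketch of the same pattern, using the built-in `templatefile()` function instead of the `template_file` data source used in this repo (the file name and variable name below are illustrative):

```hcl
# grafana-env.tpl (illustrative file) would contain the single line:
#   GF_AUTH_ANONYMOUS_ENABLED: %{ if enable_anonymous }"true"%{ else }"false"%{ endif }

variable "enable_anonymous" {
  default = false
}

output "rendered_grafana_env" {
  # templatefile() renders the directive at plan time; with the default
  # above this evaluates to: GF_AUTH_ANONYMOUS_ENABLED: "false"
  value = templatefile("${path.module}/grafana-env.tpl", {
    enable_anonymous = var.enable_anonymous
  })
}
```

The AWS template later in this diff receives the identical change.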
20 changes: 15 additions & 5 deletions deploy/aliyun/variables.tf
@@ -5,7 +5,7 @@ variable "cluster_name" {

variable "tidb_version" {
description = "TiDB cluster version"
default = "v2.1.0"
default = "v2.1.8"
}

variable "pd_count" {
@@ -25,7 +25,7 @@ variable "pd_instance_memory_size" {

variable "tikv_count" {
description = "TiKV instance count, ranges: [3, 100]"
default = 4
default = 3
}

variable "tikv_instance_type_family" {
@@ -40,7 +40,7 @@ variable "tikv_memory_size" {

variable "tidb_count" {
description = "TiDB instance count, ranges: [1, 100]"
default = 3
default = 2
}

variable "tidb_instance_type" {
@@ -86,6 +86,11 @@ variable "monitor_reserve_days" {
default = 14
}

variable "default_worker_core_count" {
description = "CPU core count of default kubernetes workers"
default = 2
}

variable "create_bastion" {
description = "Whether create bastion server"
default = true
@@ -115,6 +120,11 @@ variable "monitor_slb_network_type" {
default = "internet"
}

variable "monitor_enable_anonymous_user" {
description = "Whether enabling anonymous user visiting for monitoring"
default = false
}

variable "vpc_id" {
description = "VPC id, specify this variable to use an exsiting VPC and the vswitches in the VPC. Note that when using existing vpc, it is recommended to use a existing security group too. Otherwise you have to set vpc_cidr according to the existing VPC settings to get correct in-cluster security rule."
default = ""
@@ -142,5 +152,5 @@ variable "k8s_service_cidr" {

variable "vpc_cidr" {
description = "VPC cidr_block, options: [192.168.0.0.0/16, 172.16.0.0/16, 10.0.0.0/8], cannot collidate with kubernetes service cidr and pod cidr. Cannot change once the vpc created."
default = "192.168.0.0/16"
}
default = "192.168.0.0/16"
}
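None of these defaults has to be edited in variables.tf itself; they can be overridden per deployment with a tfvars file. A minimal sketch (values illustrative, variable names taken from this diff):

```hcl
# terraform.tfvars — illustrative overrides for deploy/aliyun
tidb_version                  = "v2.1.8"
default_worker_core_count     = 2     # CPU cores of the mandatory ACK workers
tikv_count                    = 3
tidb_count                    = 2
monitor_enable_anonymous_user = true  # renders GF_AUTH_ANONYMOUS_ENABLED: "true"
```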
1 change: 1 addition & 0 deletions deploy/aws/data.tf
@@ -18,6 +18,7 @@ data "template_file" "tidb_cluster_values" {
pd_replicas = "${var.pd_count}"
tikv_replicas = "${var.tikv_count}"
tidb_replicas = "${var.tidb_count}"
monitor_enable_anonymous_user = "${var.monitor_enable_anonymous_user}"
}
}

2 changes: 1 addition & 1 deletion deploy/aws/templates/tidb-cluster-values.yaml.tpl
@@ -258,7 +258,7 @@ monitor:
config:
# Configure Grafana using environment variables except GF_PATHS_DATA, GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD
# Ref https://grafana.com/docs/installation/configuration/#using-environment-variables
GF_AUTH_ANONYMOUS_ENABLED: "true"
GF_AUTH_ANONYMOUS_ENABLED: %{ if monitor_enable_anonymous_user }"true"%{ else }"false"%{ endif }
GF_AUTH_ANONYMOUS_ORG_NAME: "Main Org."
GF_AUTH_ANONYMOUS_ORG_ROLE: "Viewer"
# if grafana is running behind a reverse proxy with subpath http://foo.bar/grafana
4 changes: 4 additions & 0 deletions deploy/aws/variables.tf
@@ -105,3 +105,7 @@ variable "tikv_root_volume_size" {
default = "100"
}

variable "monitor_enable_anonymous_user" {
Contributor:

Why create this variable in terraform? Can't the user change it directly in the helm values? If we start doing this, we'll eventually have to create a terraform variable for every helm configuration.

Contributor Author:

I don't like this direction either, but I can't figure out a better way.

The problem is that if we guide the user to customize the values.yaml, we also have to document that their modifications will be overridden once they run terraform apply again, which renders a new values.yaml from the tf vars. This can be troublesome and error-prone for users.

I do like the direction where the terraform scripts only bootstrap the cluster and users eventually maintain the tidb cluster via application-level means like values.yaml; sadly, we haven't yet solved the cluster scaling problem outside terraform, as mentioned in #436.

What do you think?

Contributor:

Can they change tidb-cluster-values.yaml.tpl directly and use that as their values file?

Contributor:

For the long term, the current flow of information is backwards. Terraform should just provision the K8s clusters and node pools.

  1. terraform installs the K8s cluster (we should also support attaching our node pools to an existing cluster)
  2. helm install
  3. refresh terraform with the helm values (number of pd, tikv, tidb)

Then maintenance is steps 2 & 3:

  • helm upgrade
  • refresh-script
  • terraform plan
  • terraform apply

Step 3 shouldn't actually be needed; instead, just an API call or button click to scale out the node group should suffice. It's a problem that terraform wants to be in charge of setting the number of nodes in an autoscaling group, and I know on AWS this is supposed to get fixed in a newer version of terraform.

Contributor Author (@aylei, May 16, 2019):

I agree. As an experienced Kubernetes user, I would prefer to manage my tidb cluster this way, which is more flexible and predictable.

At the same time, encapsulating everything in a terraform script to provide "one-click deploy" is valuable for users who are not familiar with Kubernetes and Helm, and is an especially good fit for presentations.

So I suggest adding an option to skip the helm install of tidb-cluster in terraform to achieve this terraform + helm workflow, while keeping the "one-click deploy" ability as the default.

Contributor Author:

Yes, of course. I'd consider this a workaround, because it exposes the implementation details to the user. Whether or not we document this method, we should provide a more formal way to leverage the full customization of helm values, for example, by separating the terraform script into two modules: infrastructure & helm, as you mentioned in #494.

Contributor:

I don't think we should have a goal of hiding helm values as an implementation detail: that makes it impossible to customize the existing deployment. What is the downside to having users modify the helm values config file?

Contributor Author:

The helm values are not the implementation detail; the template file (tidb-cluster-values.yaml.tpl) is. If users edit the template file, they are maintaining the terraform script on their own: they have to track and merge upstream changes manually, and there is always an indirection even when the related change doesn't require an infrastructure change (e.g. changing the config).
Having said that, it is probably not a serious problem in real-world usage. So the dominant reason I think we should eventually avoid documenting "modify the .tpl file on your own" is that #494 will expose the values.yaml to users in a more formal and flexible way (e.g. if a future deployment requires multiple tidb clusters in one kubernetes cluster).

Contributor:

How does the end user make the changes they need to the helm values in the existing setup? We do need a way to handle this workflow now.

Member:

I think we should limit this setup to deploying and managing only one tidb cluster. For managing multiple tidb clusters in a single k8s cluster, we may provide another terraform script to only set up the k8s cluster and tidb-operator itself.

In this case, it makes more sense to configure the tidb clusters using variables.tf.

description = "Whether enabling anonymous user visiting for monitoring"
default = false
}