
Add terraform scripts to support Alibaba Cloud ACK deployment #436

Merged: 10 commits into pingcap:master on May 6, 2019

Conversation

@aylei (Contributor) commented Apr 29, 2019

What problem does this PR solve?

One-click deployment of TiDB Operator and a TiDB cluster to Alibaba Cloud Kubernetes (ACK).

What is changed and how it works?

Add Terraform scripts; see the README for how they work.
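
For a rough sense of the inputs involved, here is a hypothetical terraform.tfvars sketch; tikv_count, tikv_storage_size and pd_storage_size are variable names that come up later in this conversation, while the region variable name and all values are placeholders rather than the PR's actual interface:

```hcl
# Hypothetical terraform.tfvars sketch; names and values are placeholders.
region            = "cn-hangzhou"
tikv_count        = 3
tikv_storage_size = "500Gi"
pd_storage_size   = "50Gi"
```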

Check List

Manually tested in:

  • Local machine (macOS, zsh, Terraform 0.11.*)
  • Alibaba Cloud Shell

Successfully created, tested, and scaled a TiDB cluster in the Alibaba Cloud cn-hangzhou, cn-shanghai, and cn-beijing regions.

Please refer to the README or README-CN for all the changes in this PR.

@onlymellb @tennix @weekface @gregwebs PTAL

@gregwebs (Contributor) commented

This looks great! I am just concerned about templating the helm values file. It looks like this takes the same approach as the AWS PR #401. Currently we offer a few variables that are templated, but don't we eventually need to template every single variable? A solution that scales better is to template nothing and advise the user to modify the helm values file itself. I can see the case for templating a few specific variables: those tied to the terraform deployment (it seems like there aren't any?) or something like the cluster version, to keep it uniform across components. However, for cluster version that may indicate that our helm values are not structured in a user-friendly way.
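
For reference, a minimal sketch of the "template a few helm values" approach being discussed, in Terraform 0.11 syntax; the template path and variable names here are illustrative, not the PR's actual layout:

```hcl
# Hypothetical sketch of templating a handful of helm values from Terraform.
# The template path and variable names are illustrative only.
data "template_file" "tidb_cluster_values" {
  template = "${file("${path.module}/templates/tidb-cluster-values.yaml.tpl")}"

  vars {
    tikv_replicas   = "${var.tikv_count}"
    tidb_replicas   = "${var.tidb_count}"
    cluster_version = "${var.tidb_version}"
  }
}

# The rendered file is what helm actually consumes.
resource "local_file" "tidb_cluster_values" {
  content  = "${data.template_file.tidb_cluster_values.rendered}"
  filename = "${path.module}/rendered/tidb-cluster-values.yaml"
}
```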

@aylei (Contributor, Author) commented Apr 30, 2019

> This looks great! I am just concerned about templating the helm values file. It looks like this takes the same approach as the AWS PR #401. Currently we offer a few variables that are templated, but don't we eventually need to template every single variable? A solution that scales better is to template nothing and advise the user to modify the helm values file itself. I can see the case for templating a few specific variables: those tied to the terraform deployment (it seems like there aren't any?) or something like the cluster version, to keep it uniform across components. However, for cluster version that may indicate that our helm values are not structured in a user-friendly way.

Yes, the vision is to spin up an auto-scaling Kubernetes cluster where the user only has to care about the TiDB cluster values.

However, there are still obstacles that force our users to care about Kubernetes arguments when operating the TiDB cluster, e.g. for scale-in and scale-out:

  • The Cluster Autoscaler does not work well for StatefulSets with local PVs, because CA has no idea which new node will bring available local PVs;
  • The instance capacity is also bound to pre-provisioned auto-scaling groups, because we cannot do auto-provisioning.

So currently, or at least for ACK, vital helm values like instance capacity and replica counts cannot be effectively updated without changing the Kubernetes cluster.

Another option you mentioned is templating only some values, but for consistency the replicas and instance resource capacities should not be changed outside Terraform. I cannot figure out a way to keep that consistency while exposing the other values to the user for editing.
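
As an editorial illustration of that coupling (resource and variable names here are hypothetical, with an Alicloud ESS scaling group standing in for the pre-provisioned node pool):

```hcl
# Hypothetical sketch: the same tikv_count has to size both the node pool
# and the templated helm replica value, so editing either one alone drifts.
variable "tikv_count" {
  default = 3
}

resource "alicloud_ess_scaling_group" "tikv" {
  scaling_group_name = "tikv"
  min_size           = "${var.tikv_count}"
  max_size           = "${var.tikv_count}"
}

data "template_file" "tikv_values" {
  template = "${file("${path.module}/templates/tikv-values.yaml.tpl")}"

  vars {
    tikv_replicas = "${var.tikv_count}"
  }
}
```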

Any ideas?

@gregwebs (Contributor) commented

tikv_storage_size and pd_storage_size do not seem to be used in terraform. I think it is better to have the user modify the values file themselves.

I think the consistent approach is to template only the variables that are actually used in terraform.

However, I think it is still problematic to drive the K8s setup from terraform. It seems that we can query K8s for the replica count, etc.; I think #401 is starting to take the approach of querying K8s as the source of truth.

@aylei (Contributor, Author) commented May 1, 2019

The storage_size is bound to the local NVMe volume size, which is determined by the instance type.

Then I suppose there should be a merge strategy for manual edits and terraform templating? (e.g. when tikv_count is edited in terraform while other fields are edited in values.yaml)

I noticed that #401 queries the endpoints from Kubernetes, and I have planned to query this information via the kubernetes provider once upstream gets the kubeconfig issue fixed. But reflecting cluster settings from the TidbCluster resource instead of driving them from terraform would require an extra component like the Cluster Autoscaler (which, of course, does not work well in our case), and that needs a more focused discussion and future effort, I suppose.
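
A rough sketch of that "query it from Kubernetes" direction, assuming the ACK module can hand over a kubeconfig path; the service name, namespace, and attribute path are illustrative and depend on the provider version:

```hcl
# Hypothetical sketch: read cluster information back through the kubernetes
# provider instead of templating it. Names here are illustrative only.
variable "kubeconfig_path" {}

provider "kubernetes" {
  config_path = "${var.kubeconfig_path}"
}

data "kubernetes_service" "tidb" {
  metadata {
    name      = "tidb-cluster-tidb"
    namespace = "tidb"
  }
}

output "tidb_service_ip" {
  # Attribute paths vary across kubernetes provider versions.
  value = "${data.kubernetes_service.tidb.spec.0.cluster_ip}"
}
```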

@gregwebs (Contributor) commented May 1, 2019

> The storage_size is bound to the local NVMe volume size, which is determined by the instance type.

This is happening in K8s? Where is storage_size used to provision infrastructure?

@aylei (Contributor, Author) commented May 2, 2019

Emmm, it seems I missed this part. I will amend a fix to use the capacity derived from the node instance type, then.
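
A hypothetical sketch of what deriving the capacity from the node instance type could look like on the Terraform side; the instance types and sizes in the map are placeholders, not real Alibaba Cloud specifications:

```hcl
# Hypothetical sketch: derive the TiKV storage size from the chosen instance
# type instead of a free-standing tikv_storage_size variable.
variable "tikv_instance_type" {
  default = "ecs.i2.2xlarge"
}

locals {
  # instance type -> local NVMe capacity handed to the helm values (placeholder sizes)
  local_nvme_size = {
    "ecs.i2.xlarge"  = "500Gi"
    "ecs.i2.2xlarge" = "1000Gi"
  }

  tikv_storage_size = "${lookup(local.local_nvme_size, var.tikv_instance_type)}"
}
```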

@tennix (Member) previously approved these changes May 5, 2019

LGTM

@aylei (Contributor, Author) commented May 6, 2019

@tennix Thanks for your review. I resolved a conflict in the README; could you please take a look again?

@tennix (Member) left a comment

LGTM

@tennix merged commit eebd686 into pingcap:master on May 6, 2019