diff --git a/content/zh/_redirects b/content/zh/_redirects
deleted file mode 100644
index a38fbcc91be78..0000000000000
--- a/content/zh/_redirects
+++ /dev/null
@@ -1 +0,0 @@
/zh/docs/ /zh/docs/home/ 301
diff --git a/content/zh/blog/_posts/2016-04-Kubernetes-Network-Policy-APIs.md b/content/zh/blog/_posts/2016-04-Kubernetes-Network-Policy-APIs.md
deleted file mode 100644
index e9b6e2f0848be..0000000000000
--- a/content/zh/blog/_posts/2016-04-Kubernetes-Network-Policy-APIs.md
+++ /dev/null
@@ -1,182 +0,0 @@
---
title: "SIG-Networking: Kubernetes Network Policy APIs Coming in 1.3"
date: 2016-04-18
slug: kubernetes-network-policy-apis
url: /blog/2016/04/Kubernetes-Network-Policy-APIs
---

Editor's note: This week our theme is [Kubernetes Special Interest Groups](https://github.com/kubernetes/kubernetes/wiki/Special-Interest-Groups-(SIGs)); today's post comes from the Network SIG, on the network policy APIs arriving in the 1.3 release -- policies for security, isolation and multi-tenancy.

The [Kubernetes Network SIG](https://kubernetes.slack.com/messages/sig-network/) has been meeting regularly since the second half of last year to discuss how to bring network policy into Kubernetes, and we are now starting to see the results of that work.

A problem many users run into is that Kubernetes' open-access networking model does not serve scenarios that need more precise control over access to pods or services. Today, that scenario might be a multi-tier application that only allows access between adjacent tiers. But as applications are increasingly composed from microservices in the cloud native style, controlling how traffic flows between services becomes ever more important.

In most (public or private) IaaS environments, this kind of network control is achieved by combining VMs with "security groups", where communication among the members of a security group is defined by a network policy or Access Control List (ACL) and enforced by network packet filters.

The Network SIG started by identifying [specific use cases](https://docs.google.com/document/d/1blfqiH4L_fpn33ZrnQ11v7LcYP0lmpiJ_RaapAPBbNU/edit?pref=2&pli=1#) that require basic network isolation for improved security.
Getting the APIs right for these simple, common use cases is especially important, because they lay the foundation for the more sophisticated network policies needed to serve multi-tenancy within Kubernetes.

Based on these use cases, we considered several different approaches and then defined a minimal [policy specification](https://docs.google.com/document/d/1qAm-_oSap-f1d6a-xRTj6xaH1sYQBfK36VyjB5XOZug/edit).
The basic idea is that if isolation is enabled per namespace, then specific pods are selected according to the kinds of traffic that are allowed to reach them.

The quickest way to support this experimental API is to add a `ThirdPartyResource` extension to the API server, which is possible as of Kubernetes 1.2.

If you are not familiar with the details: the Kubernetes API can be extended by defining `ThirdPartyResources`, which create new API endpoints at a specified URL.

#### third-party-res-def.yaml

```
kind: ThirdPartyResource
apiVersion: extensions/v1beta1
metadata:
  - name: network-policy.net.alpha.kubernetes.io
description: "Network policy specification"
versions:
  - name: v1alpha1
```

```
$ kubectl create -f third-party-res-def.yaml
```

This command creates an API endpoint (one per namespace):

```
/net.alpha.kubernetes.io/v1alpha1/namespace/default/networkpolicys/
```

Third-party network controllers can watch these endpoints and react as needed when resources are created, modified, or deleted.
_Note: with the upcoming Kubernetes 1.3 release, the Network Policy API will be available as a beta API, which removes the need to create a `ThirdPartyResource` API endpoint as shown above._

Network isolation is off by default, so all pods can communicate freely.
It is important to know, however, that once network isolation is enabled, all pod-to-pod communication in the namespace is blocked; in other words, enabling isolation changes the behavior of the pods.

Network isolation is turned on or off with the `network-isolation` annotation on the namespace, under `net.alpha.kubernetes.io`:

```
net.alpha.kubernetes.io/network-isolation: [on | off]
```

Once network isolation is enabled, **explicit network policies are required** to allow pod-to-pod communication.

A policy specification can be applied to a namespace to define the details of the policy, as shown below:

```
POST /apis/net.alpha.kubernetes.io/v1alpha1/namespaces/tenant-a/networkpolicys/
{
  "kind": "NetworkPolicy",
  "metadata": {
    "name": "pol1"
  },
  "spec": {
    "allowIncoming": {
      "from": [
        {
          "pods": {
            "segment": "frontend"
          }
        }
      ],
      "toPorts": [
        {
          "port": 80,
          "protocol": "TCP"
        }
      ]
    },
    "podSelector": {
      "segment": "backend"
    }
  }
}
```

In this example, the **tenant-a** namespace gets the **pol1** policy. Specifically, pods carrying the **segment: backend** label will accept traffic on port 80 from pods carrying the **segment: frontend** label.
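The note above says this API will graduate to beta in 1.3. For comparison, here is a sketch of how the same **pol1** policy might look as a declarative beta resource; it assumes the `extensions/v1beta1` `NetworkPolicy` schema, so exact field names may differ in the release you are running:

```
apiVersion: extensions/v1beta1
kind: NetworkPolicy
metadata:
  name: pol1
  namespace: tenant-a
spec:
  # select the pods this policy protects
  podSelector:
    matchLabels:
      segment: backend
  # allow ingress on TCP/80 only from "frontend" pods
  ingress:
  - from:
    - podSelector:
        matchLabels:
          segment: frontend
    ports:
    - protocol: TCP
      port: 80
```

Either way, the selector-based model is the same: the policy names the protected pods first, then whitelists the traffic allowed to reach them.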
Today, [Romana](http://romana.io/), [OpenShift](https://www.openshift.com/), [OpenContrail](http://www.opencontrail.org/) and [Calico](http://projectcalico.org/) all support network policies across namespaces and pods, and Cisco and VMware are working on implementations as well.
Both Romana and Calico recently demonstrated these capabilities with Kubernetes 1.2 at KubeCon.
You can watch their presentations here:
[Romana](https://www.youtube.com/watch?v=f-dLKtK6qCs) ([slides](http://www.slideshare.net/RomanaProject/kubecon-london-2016-ronana-cloud-native-sdn)),
[Calico](https://www.youtube.com/watch?v=p1zfh4N4SX0) ([slides](http://www.slideshare.net/kubecon/kubecon-eu-2016-secure-cloudnative-networking-with-project-calico)).

**How does it work?**

Each solution has its own specific implementation. Although they all rely on on-host enforcement mechanisms today, future implementations could achieve the same goal by applying policies on the hypervisor, or even directly in the network itself.

External policy control software (the specifics vary by implementation) watches the API endpoints for pod creation and for newly loaded policies.
When an event that requires policy configuration occurs, the listener picks up the change, and in response the controller configures the interface and applies the policy.
The diagram below shows an API watcher and a policy controller applying network policies locally through a host agent.
The network interfaces of these pods are configured by a CNI plugin on the host (not shown in the diagram).

![controller.jpg](https://lh5.googleusercontent.com/zMEpLMYmask-B-rYWnbMyGb0M7YusPQFPS6EfpNOSLbkf-cM49V7rTDBpA6k9-Zdh2soMul39rz9rHFJfL-jnEn_mHbpg0E1WlM-wjU-qvQu9KDTQqQ9uBmdaeWynDDNhcT3UjX5)

If network isolation or security concerns have kept you from building applications with Kubernetes, these new network policies go a long way toward providing the control you need. No need to wait for Kubernetes 1.3: you can use this experimental API today via the `ThirdPartyResource` mechanism described above.

If you are interested in Kubernetes and networking, there are several ways to get involved:

- Our [networking Slack channel](https://kubernetes.slack.com/messages/sig-network/)
- Our [Kubernetes Special Interest Group: Networking](https://groups.google.com/forum/#!forum/kubernetes-sig-network) mailing list

The Network SIG meets bi-weekly at 3pm Pacific Time on the [SIG-Networking hangout](https://zoom.us/j/5806599998).

_--Chris Marino, Co-Founder, Pani Networks_
diff --git a/content/zh/blog/_posts/2016-04-Kubernetes-On-Aws_15.md b/content/zh/blog/_posts/2016-04-Kubernetes-On-Aws_15.md
deleted file mode 100644
index a12060b915706..0000000000000
--- a/content/zh/blog/_posts/2016-04-Kubernetes-On-Aws_15.md
+++ /dev/null
@@ -1,129 +0,0 @@
---
title: "How to deploy secure, auditable, and reproducible Kubernetes clusters on AWS"
date: 2016-04-15
slug: kubernetes-on-aws_15
url: /blog/2016/04/Kubernetes-On-Aws_15
---

_Today's guest post is by Colin Hom, infrastructure engineer at [CoreOS](https://coreos.com/). CoreOS is dedicated to promoting Google's Infrastructure for Everyone Else (#GIFEE), so that containers everywhere can run securely on CoreOS Linux, Tectonic and Quay._

_Join us at [CoreOS Fest Berlin](https://coreos.com/fest/), a conference on open source distributed systems, to learn more about CoreOS and Kubernetes._

At CoreOS, we run Kubernetes at scale in production. Today we are excited to share a tool that makes large-scale production deployments of Kubernetes much easier: kube-aws, a tool for deploying auditable, reproducible Kubernetes clusters on AWS -- the same tool CoreOS itself uses in production.

Today you may be piecing together Kubernetes components by hand. With kube-aws, Kubernetes is instead packaged and delivered as a pipeline, saving time, reducing cross-dependencies, and making production deployments much faster.

Cluster configuration is generated with a simple template system, because a declarative set of configuration templates can be version-controlled, audited, and deployed repeatedly. And since the whole creation process relies only on [AWS CloudFormation](https://aws.amazon.com/cloudformation/) and cloud-init, no additional configuration management tools are needed. It works out of the box!

If you want to skip the pitch and dive straight into the project, check out the [latest kube-aws release](https://github.com/coreos/coreos-kubernetes/releases), which supports Kubernetes 1.2.x. To deploy a cluster, see the [documentation](https://coreos.com/kubernetes/docs/latest/kubernetes-on-aws.html).

**Why kube-aws? Secure, auditable, reproducible**

kube-aws was designed with three goals in mind.

**Secure**: TLS assets are encrypted with the [AWS Key Management Service](https://aws.amazon.com/kms/) before they are embedded in the CloudFormation JSON. By managing the [IAM policy](http://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html) for the KMS key separately, access to the CloudFormation stack can be separated from access to the TLS secrets.

**Auditable**: kube-aws is built around the concept of cluster assets. These configuration and credential assets are a complete description of the cluster. Since KMS is used to encrypt the TLS assets, the unencrypted CloudFormation stack JSON can be checked into version control without worry.

**Reproducible**: the _--export_ option packages the parameterized cluster definition into a single JSON file corresponding to one CloudFormation stack. This file can be version-controlled and then, if needed, submitted directly to the CloudFormation API with existing deployment tooling.
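Taken together, those goals boil down to a short workflow. Here is a minimal sketch of a session; the subcommands are assumptions pieced together from the options this post describes (`validate` appears again below, and `--export` is described above), and they may differ between kube-aws versions:

```
# check the rendered cloud-config and CloudFormation definitions
$ kube-aws validate

# create the cluster by submitting the stack to CloudFormation
$ kube-aws up

# or package the parameterized stack as a single JSON file, for
# version control and out-of-band submission to CloudFormation
$ kube-aws up --export
```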
**Getting started with kube-aws**

On top of this foundation, kube-aws also implements features that make deploying Kubernetes clusters on AWS easier and more flexible. Here are a few examples.

**Route53 integration**: kube-aws can manage your cluster's DNS records as part of the provisioning process.

cluster.yaml
```
externalDNSName: my-cluster.kubernetes.coreos.com

createRecordSet: true

hostedZone: kubernetes.coreos.com

recordSetTTL: 300
```

**Existing VPC support**: deploy the cluster into an existing VPC.

cluster.yaml
```
vpcId: vpc-xxxxx

routeTableId: rtb-xxxxx
```

**Validation**: kube-aws supports validation of the cloud-init and CloudFormation definitions, as well as of the external resources the cluster stack will integrate with. For example, here is a cloud-config with a misspelled parameter:

userdata/cloud-config-worker
```
#cloud-config

coreos:

  flannel:
    interrface: $private\_ipv4
    etcd\_endpoints: {{ .ETCDEndpoints }}
```

$ kube-aws validate

 \> Validating UserData...
    Error: cloud-config validation errors:
    UserDataWorker: line 4: warning: unrecognized key "interrface"

Wondering how to get started? Check out the [kube-aws documentation](https://coreos.com/kubernetes/docs/latest/kubernetes-on-aws.html)!

**Future work**

As always, the goal of kube-aws is to make production deployments simpler. While CoreOS already uses kube-aws for production deployments on AWS, the project is pre-1.0, and there are many areas where kube-aws still needs to grow.

**Fault tolerance**: CoreOS believes Kubernetes on AWS is a robust platform for fault-tolerant, self-healing deployments. In the coming weeks, kube-aws will face a new trial: [Chaos Monkey](https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey) testing -- control plane and all!

**Zero-downtime updates**: update CoreOS nodes and Kubernetes components without downtime and without worrying about the implications of the instance replacement strategy.

There is a [GitHub issue](https://github.com/coreos/coreos-kubernetes/issues/340) tracking this work. We look forward to your participation -- file an issue, or contribute directly.

_Want to learn more about Kubernetes? Come see us at [CoreOS Fest Berlin](https://coreos.com/fest/), May 9-10, 2016._

_-- Colin Hom, infrastructure engineer, CoreOS_
diff --git a/content/zh/blog/_posts/2018-03-Principles-Of-Container-App-Design.md b/content/zh/blog/_posts/2018-03-Principles-Of-Container-App-Design.md
deleted file mode 100644
index 4797bb7986ea4..0000000000000
--- a/content/zh/blog/_posts/2018-03-Principles-Of-Container-App-Design.md
+++ /dev/null
@@ -1,81 +0,0 @@
---
title: "Principles of Container-based Application Design"
date: 2018-03-15
slug: principles-of-container-app-design
url: /blog/2018/03/Principles-Of-Container-App-Design
---

These days, almost any application can run in a container. But creating cloud native applications -- containerized applications that are automated and orchestrated effectively by a cloud native platform such as Kubernetes -- requires additional effort.
Cloud native applications must anticipate failure; they have to run reliably even when the underlying infrastructure fails.
To offer such capabilities, cloud native platforms like Kubernetes impose a set of contracts and constraints on the applications they run.
These contracts ensure that applications conform to certain constraints, which in turn allows the platform to automate application management.

I have outlined [seven principles][1] that a containerized application must follow in order to be cloud native.

![Container Design Principles][2]

The seven principles described here cover two classes of concerns: build time and runtime.

#### Build time

* **Single concern:** Each container addresses a single concern, and does it well.
* **Self-containment:** A container relies only on the Linux kernel. Any additional libraries are added at build time.
* **Image immutability:** Containerized applications are meant to be immutable: once built, an image is not rebuilt for different environments.

#### Runtime

* **High observability:** Every container must implement all the necessary APIs to help the platform observe and manage the application in the best way possible.
* **Lifecycle conformance:** A container must be able to receive events from the platform and react to them.
* **Process disposability:** Containerized applications must be as ephemeral as possible, so that they can be replaced by another container instance at any time.
* **Runtime confinement:** Every container must declare its resource requirements and restrict its resource usage to what it needs.

The build-time principles ensure that containers have the right granularity, consistency, and structure. The runtime principles dictate what a containerized application must implement to be a good cloud native citizen. Following these principles helps make your applications ready for automation on Kubernetes.
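To make the runtime principles concrete, here is a minimal sketch of how two of them -- high observability and runtime confinement -- surface in a Kubernetes Pod spec. The application name, image, and probe endpoint are hypothetical:

```
apiVersion: v1
kind: Pod
metadata:
  name: random-generator              # hypothetical application
spec:
  containers:
  - name: main
    image: example/random-generator:1.0   # hypothetical image
    # High observability: expose health APIs the platform can probe
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
    readinessProbe:
      httpGet:
        path: /health
        port: 8080
    # Runtime confinement: declare and cap resource consumption
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 200m
        memory: 200Mi
```

The platform uses the probes to decide when to restart the container or route traffic to it, and uses the declared resources to place and police it -- exactly the contracts these principles describe.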
The white paper is freely available for [download][1].

To learn more about designing cloud native applications for Kubernetes, have a look at my [Kubernetes Patterns][3] book.

— [Bilgin Ibryam][4], Principal Architect, Red Hat

Twitter: [@bibryam][4]
Blog: [http://www.ofbizian.com][5]
Linkedin:

Bilgin Ibryam (@bibryam) is a Principal Architect at Red Hat, an open source contributor at ASF, blogger, author, and speaker.
He is the author of the Camel Design Patterns and Kubernetes Patterns books. In his day-to-day job, he enjoys mentoring, training, and helping teams become more successful with distributed systems, microservices, containers, and cloud native applications in general.

[1]: https://www.redhat.com/en/resources/cloud-native-container-design-whitepaper
[2]: https://lh5.googleusercontent.com/1XqojkVC0CET1yKCJqZ3-0VWxJ3W8Q74zPLlqnn6eHSJsjHOiBTB7EGUX5o_BOKumgfkxVdgBeLyoyMfMIXwVm9p2QXkq_RRy2mDJG1qEExJDculYL5PciYcWfPAKxF2-DGIdiLw
[3]: http://leanpub.com/k8spatterns/
[4]: http://twitter.com/bibryam
[5]: http://www.ofbizian.com
diff --git a/content/zh/case-studies/adform/adform_featured_logo.png b/content/zh/case-studies/adform/adform_featured_logo.png
new file mode 100644
index 0000000000000..cd0fa7b6c9c4e
Binary files /dev/null and b/content/zh/case-studies/adform/adform_featured_logo.png differ
diff --git a/content/zh/case-studies/adform/index.html b/content/zh/case-studies/adform/index.html
new file mode 100644
index 0000000000000..e9a8acc7a22f2
--- /dev/null
+++ b/content/zh/case-studies/adform/index.html
@@ -0,0 +1,118 @@
---
title: Adform Case Study
linkTitle: Adform
case_study_styles: true
cid: caseStudies
css: /css/style_case_studies.css
logo: adform_featured_logo.png
draft: false
featured: true
weight: 47
quote: >
  Kubernetes enabled the self-healing and immutable infrastructure. We can do faster releases, so our developers are really happy. They can ship our features faster than before, and that makes our clients happier.
---

CASE STUDY:
Improving Performance and Morale with Cloud Native + +

+ +
+ +
Company  Adform     Location  Copenhagen, Denmark     Industry  Adtech
+ +
+
+
+
+

Challenge

+ Adform’s mission is to provide a secure and transparent full stack of advertising technology to enable digital ads across devices. The company has a large infrastructure: OpenStack-based private clouds running on 1,100 physical servers in 7 data centers around the world, 3 of which were opened in the past year. With the company’s growth, the infrastructure team felt that "our private cloud was not really flexible enough," says IT System Engineer Edgaras Apšega. "The biggest pain point is that our developers need to maintain their virtual machines, so rolling out technology and new software takes time. We were really struggling with our releases, and we didn’t have self-healing infrastructure." + + +
+ +

Solution

+ The team, which had already been using Prometheus for monitoring, embraced Kubernetes and cloud native practices in 2017. "To start our Kubernetes journey, we had to adapt all our software, so we had to choose newer frameworks," says Apšega. "We also adopted the microservices way, so observability is much better because you can inspect the bug or the services separately." + + +
+ +
+ +

Impact

+ "Kubernetes helps our business a lot because our features are coming to market faster," says Apšega. The release process went from several hours to several minutes. Autoscaling has been at least 6 times faster than the semi-manual VM bootstrapping and application deployment required before. The team estimates that the company has experienced cost savings of 4-5x due to less hardware and fewer man hours needed to set up the hardware and virtual machines, metrics, and logging. Utilization of the hardware resources has been reduced as well, with containers notching 2-3 times more efficiency over virtual machines. "The deployments are very easy because developers just push the code and it automatically appears on Kubernetes," says Apšega. Prometheus has also had a positive impact: "It provides high availability for metrics and alerting. We monitor everything starting from hardware to applications. Having all the metrics in Grafana dashboards provides great insight on your systems." + + +
+ +
+
+
+
+"Kubernetes enabled the self-healing and immutable infrastructure. We can do faster releases, so our developers are really happy. They can ship our features faster than before, and that makes our clients happier."

— Edgaras Apšega, IT Systems Engineer, Adform
+ +
+
+ + +
+
+

Adform made headlines last year when it detected the HyphBot ad fraud network that was costing some businesses hundreds of thousands of dollars a day.

With its mission to provide a secure and transparent full stack of advertising technology to enable an open internet, Adform published a white paper revealing what it did—and others could too—to limit customers’ exposure to the scam.

+In that same spirit, Adform is sharing its cloud native journey. "When you see that everyone shares their best practices, it inspires you to contribute back to the project," says IT Systems Engineer Edgaras Apšega.

+The company has a large infrastructure: OpenStack-based private clouds running on 1,100 physical servers in their own seven data centers around the world, three of which were opened in the past year. With the company’s growth, the infrastructure team felt that "our private cloud was not really flexible enough," says Apšega. "The biggest pain point is that our developers need to maintain their virtual machines, so rolling out technology and new software really takes time. We were really struggling with our releases, and we didn’t have self-healing infrastructure." + + +
+
+
+
+ "The fact that Cloud Native Computing Foundation incubated Kubernetes was a really big point for us because it was vendor neutral. And we can see that a community really gathers around it. Everyone shares their experiences, their knowledge, and the fact that it’s open source, you can contribute."

— Edgaras Apšega, IT Systems Engineer, Adform
+
+
+
+
+ +The team, which had already been using Prometheus for monitoring, embraced Kubernetes, microservices, and cloud native practices. "The fact that Cloud Native Computing Foundation incubated Kubernetes was a really big point for us because it was vendor neutral," says Apšega. "And we can see that a community really gathers around it."

+A proof of concept project was started, with a Kubernetes cluster running on bare metal in the data center. When developers saw how quickly containers could be spun up compared to the virtual machine process, "they wanted to ship their containers in production right away, and we were still doing proof of concept," says IT Systems Engineer Andrius Cibulskis. +Of course, a lot of work still had to be done. "First of all, we had to learn Kubernetes, see all of the moving parts, how they glue together," says Apšega. "Second of all, the whole CI/CD part had to be redone, and our DevOps team had to invest more man hours to implement it. And third is that developers had to rewrite the code, and they’re still doing it." +

+The first production cluster was launched in the spring of 2018, and is now up to 20 physical machines dedicated for pods throughout three data centers, with plans for separate clusters in the other four data centers. The user-facing Adform application platform, data distribution platform, and back ends are now all running on Kubernetes. "Many APIs for critical applications are being developed for Kubernetes," says Apšega. "Teams are rewriting their applications to .NET core, because it supports containers, and preparing to move to Kubernetes. And new applications, by default, go in containers." + + +
+
+
+
+"Releases are really nice for them, because they just push their code to Git and that’s it. They don’t have to worry about their virtual machines anymore."

— Andrius Cibulskis, IT Systems Engineer, Adform
+
+
+ +
+
+This big push has been driven by the real impact that these new practices have had. "Kubernetes helps our business a lot because our features are coming to market faster," says Apšega. "The deployments are very easy because developers just push the code and it automatically appears on Kubernetes." The release process went from several hours to several minutes. Autoscaling is at least six times faster than the semi-manual VM bootstrapping and application deployment required before.

+The team estimates that the company has experienced cost savings of 4-5x due to less hardware and fewer man hours needed to set up the hardware and virtual machines, metrics, and logging. Utilization of the hardware resources has been reduced as well, with containers notching two to three times more efficiency over virtual machines.

+Prometheus has also had a positive impact: "It provides high availability for metrics and alerting," says Apšega. "We monitor everything starting from hardware to applications. Having all the metrics in Grafana dashboards provides great insight on our systems." + + + +
+ +
+
+ "I think that our company just started our cloud native journey. It seems like a huge road ahead, but we’re really happy that we joined it."

— Edgaras Apšega, IT Systems Engineer, Adform
+
+
+ +
+All of these benefits have trickled down to individual team members, whose working lives have been changed for the better. "They used to have to get up at night to re-start some services, and now Kubernetes handles all of that," says Apšega. Adds Cibulskis: "Releases are really nice for them, because they just push their code to Git and that’s it. They don’t have to worry about their virtual machines anymore." Even the security teams have been impacted. "Security teams are always not happy," says Apšega, "and now they’re happy because they can easily inspect the containers." +The company plans to remain in the data centers for now, "mostly because we want to keep all the data, to not share it in any way," says Cibulskis, "and it’s cheaper at our scale." But, Apšega says, the possibility of using a hybrid cloud for computing is intriguing: "One of the projects we’re interested in is the Virtual Kubelet that lets you spin up the working nodes on different clouds to do some computing." +

+Apšega, Cibulskis and their colleagues are keeping tabs on how the cloud native ecosystem develops, and are excited to contribute where they can. "I think that our company just started our cloud native journey," says Apšega. "It seems like a huge road ahead, but we’re really happy that we joined it." + + + +
+ +
diff --git a/content/zh/case-studies/amadeus/amadeus_featured.png b/content/zh/case-studies/amadeus/amadeus_featured.png new file mode 100644 index 0000000000000..d23d7b0163854 Binary files /dev/null and b/content/zh/case-studies/amadeus/amadeus_featured.png differ diff --git a/content/zh/case-studies/amadeus/amadeus_logo.png b/content/zh/case-studies/amadeus/amadeus_logo.png new file mode 100644 index 0000000000000..6191c7f6819f2 Binary files /dev/null and b/content/zh/case-studies/amadeus/amadeus_logo.png differ diff --git a/content/zh/case-studies/amadeus/index.html b/content/zh/case-studies/amadeus/index.html new file mode 100644 index 0000000000000..8b6294d4f9afa --- /dev/null +++ b/content/zh/case-studies/amadeus/index.html @@ -0,0 +1,105 @@ +--- +title: Amadeus Case Study + +case_study_styles: true +cid: caseStudies +css: /css/style_amadeus.css +--- + +
+

CASE STUDY:
Another Technical Evolution for a 30-Year-Old Company +

+
+
+ Company  Amadeus IT Group     Location  Madrid, Spain     Industry  Travel Technology +
+
+
+
+
+

Challenge

+ In the past few years, Amadeus, which provides IT solutions to the travel industry around the world, found itself in need of a new platform for the 5,000 services supported by its service-oriented architecture. The 30-year-old company operates its own data center in Germany, and there were growing demands internally and externally for solutions that needed to be geographically dispersed. And more generally, "we had objectives of being even more highly available," says Eric Mountain, Senior Expert, Distributed Systems at Amadeus. Among the company’s goals: to increase automation in managing its infrastructure, optimize the distribution of workloads, use data center resources more efficiently, and adopt new technologies more easily. +
+
+

Solution

+ Mountain has been overseeing the company’s migration to Kubernetes, using OpenShift Container Platform, Red Hat’s enterprise container platform. +

+

Impact

+ One of the first projects the team deployed in Kubernetes was the Amadeus Airline Cloud Availability solution, which helps manage ever-increasing flight-search volume. "It’s now handling in production several thousand transactions per second, and it’s deployed in multiple data centers throughout the world," says Mountain. "It’s not a migration of an existing workload; it’s a whole new workload that we couldn’t have done otherwise. [This platform] gives us access to market opportunities that we didn’t have before." +
+
+
+ +
+
+ "We want multi-data center capabilities, and we want them for our mainstream system as well. We didn’t think that we could achieve them with our existing system. We need new automation, things that Kubernetes and OpenShift bring."

- Eric Mountain, Senior Expert, Distributed Systems at Amadeus IT Group
+
+
+
+ +
+

In his two decades at Amadeus, Eric Mountain has been the migrations guy.

+ Back in the day, he worked on the company’s move from Unix to Linux, and now he’s overseeing the journey to cloud native. "Technology just keeps changing, and we embrace it," he says. "We are celebrating our 30 years this year, and we continue evolving and innovating to stay cost-efficient and enhance everyone’s travel experience, without interrupting workflows for the customers who depend on our technology."

+ That was the challenge that Amadeus—which provides IT solutions to the travel industry around the world, from flight searches to hotel bookings to customer feedback—faced in 2014. The technology team realized it was in need of a new platform for the 5,000 services supported by its service-oriented architecture.

+ The tipping point occurred when they began receiving many requests, internally and externally, for solutions that needed to be geographically outside the company’s main data center in Germany. "Some requests were for running our applications on customer premises," Mountain says. "There were also new services we were looking to offer that required response time to the order of a few hundred milliseconds, which we couldn’t achieve with transatlantic traffic. Or at least, not without eating into a considerable portion of the time available to our applications for them to process individual queries."

+ More generally, the company was interested in leveling up on high availability, increasing automation in managing infrastructure, optimizing the distribution of workloads and using data center resources more efficiently. "We have thousands and thousands of servers," says Mountain. "These servers are assigned roles, so even if the setup is highly automated, the machine still has a given role. It’s wasteful on many levels. For instance, an application doesn’t necessarily use the machine very optimally. Virtualization can help a bit, but it’s not a silver bullet. If that machine breaks, you still want to repair it because it has that role and you can’t simply say, ‘Well, I’ll bring in another machine and give it that role.’ It’s not fast. It’s not efficient. So we wanted the next level of automation."

+
+
+ +
+
+ "We hope that if we build on what others have built, what we do might actually be upstream-able. As Kubernetes and OpenShift progress, we see that we are indeed able to remove some of the additional layers we implemented to compensate for gaps we perceived earlier." +
+
+ +
+
+ While mainly a C++ and Java shop, Amadeus also wanted to be able to adopt new technologies more easily. Some of its developers had started using languages like Python and databases like Couchbase, but Mountain wanted still more options, he says, "in order to better adapt our technical solutions to the products we offer, and open up entirely new possibilities to our developers." Working with recent technologies and cool new things would also make it easier to attract new talent. +

+ All of those needs led Mountain and his team on a search for a new platform. "We did a set of studies and proofs of concept over a fairly short period, and we considered many technologies," he says. "In the end, we were left with three choices: build everything on premise, build on top of Kubernetes whatever happens to be missing from our point of view, or go with OpenShift and build whatever remains there." +

+ The team decided against building everything themselves—though they’d done that sort of thing in the past—because "people were already inventing things that looked good," says Mountain. +

+ Ultimately, they went with OpenShift Container Platform, Red Hat’s Kubernetes-based enterprise offering, instead of building on top of Kubernetes because "there was a lot of synergy between what we wanted and the way Red Hat was anticipating going with OpenShift," says Mountain. "They were clearly developing Kubernetes, and developing certain things ahead of time in OpenShift, which were important to us, such as more security." +

+ The hope was that those particular features would eventually be built into Kubernetes, and, in the case of security, Mountain feels that has happened. "We realize that there’s always a certain amount of automation that we will probably have to develop ourselves to compensate for certain gaps," says Mountain. "The less we do that, the better for us. We hope that if we build on what others have built, what we do might actually be upstream-able. As Kubernetes and OpenShift progress, we see that we are indeed able to remove some of the additional layers we implemented to compensate for gaps we perceived earlier." +
+
+ +
+
+ "It’s not a migration of an existing workload; it’s a whole new workload that we couldn’t have done otherwise. [This platform] gives us access to market opportunities that we didn’t have before." +
+
+ +
+
+ The first project the team tackled was one that they knew had to run outside the data center in Germany. Because of the project’s needs, "We couldn’t rely only on the built-in Kubernetes service discovery; we had to layer on top of that an extra service discovery level that allows us to load balance at the operation level within our system," says Mountain. They also built a stream dedicated to monitoring, which at the time wasn’t offered in the Kubernetes or OpenShift ecosystem. Now that Prometheus and other products are available, Mountain says the company will likely re-evaluate their monitoring system: "We obviously always like to leverage what Kubernetes and OpenShift can offer." +

+ The second project ended up going into production first: the Amadeus Airline Cloud Availability solution, which helps manage ever-increasing flight-search volume and was deployed in public cloud. Launched in early 2016, it is "now handling in production several thousand transactions per second, and it’s deployed in multiple data centers throughout the world," says Mountain. "It’s not a migration of an existing workload; it’s a whole new workload that we couldn’t have done otherwise. [This platform] gives us access to market opportunities that we didn’t have before." +

+ Having been through this kind of technical evolution more than once, Mountain has advice on how to handle the cultural changes. "That’s one aspect that we can tackle progressively," he says. "We have to go on supplying our customers with new features on our pre-existing products, and we have to keep existing products working. So we can’t simply do absolutely everything from one day to the next. And we mustn’t sell it that way." +

+ The first order of business, then, is to pick one or two applications to demonstrate that the technology works. Rather than choosing a high-impact, high-risk project, Mountain’s team selected a smaller application that was representative of all the company’s other applications in its complexity: "We just made sure we picked something that’s complex enough, and we showed that it can be done." +
+
+ +
+
+ "The bottom line is we want these multi-data center capabilities, and we want them as well for our mainstream system," he says. "And we don’t think that we can implement them with our previous system. We need the new automation, homogeneity, and scale that Kubernetes and OpenShift bring." +
+
+ +
+
+ Next comes convincing people. "On the operations side and on the R&D side, there will be people who say quite rightly, ‘There is a system, and it works, so why change?’" Mountain says. "The only thing that really convinces people is showing them the value." For Amadeus, people realized that the Airline Cloud Availability product could not have been made available on the public cloud with the company’s existing system. The question then became, he says, "Do we go into a full-blown migration? Is that something that is justified?" +

+ "The bottom line is we want these multi-data center capabilities, and we want them as well for our mainstream system," he says. "And we don’t think that we can implement them with our previous system. We need the new automation, homogeneity, and scale that Kubernetes and OpenShift bring." +

+ So how do you get everyone on board? "Make sure you have good links between your R&D and your operations," he says. "Also make sure you’re going to talk early on to the investors and stakeholders. Figure out what it is that they will be expecting from you, that will convince them or not, that this is the right way for your company." +

+ His other advice is simply to make the technology available for people to try it. "Kubernetes and OpenShift Origin are open source software, so there’s no complicated license key for the evaluation period and you’re not limited to 30 days," he points out. "Just go and get it running." Along with that, he adds, "You’ve got to be prepared to rethink how you do things. Of course making your applications as cloud native as possible is how you’ll reap the most benefits: 12 factors, CI/CD, which is continuous integration, continuous delivery, but also continuous deployment." +

+ And while they explore that aspect of the technology, Mountain and his team will likely be practicing what he preaches to others taking the cloud native journey. "See what happens when you break it, because it’s important to understand the limits of the system," he says. Or rather, he notes, the advantages of it. "Breaking things on Kube is actually one of the nice things about it—it recovers. It’s the only real way that you’ll see that you might be able to do things." +
+
diff --git a/content/zh/case-studies/ancestry/ancestry_featured.png b/content/zh/case-studies/ancestry/ancestry_featured.png new file mode 100644 index 0000000000000..6d63daae32139 Binary files /dev/null and b/content/zh/case-studies/ancestry/ancestry_featured.png differ diff --git a/content/zh/case-studies/ancestry/ancestry_logo.png b/content/zh/case-studies/ancestry/ancestry_logo.png new file mode 100644 index 0000000000000..5fbade8decbc1 Binary files /dev/null and b/content/zh/case-studies/ancestry/ancestry_logo.png differ diff --git a/content/zh/case-studies/ancestry/index.html b/content/zh/case-studies/ancestry/index.html new file mode 100644 index 0000000000000..a992a284ac86b --- /dev/null +++ b/content/zh/case-studies/ancestry/index.html @@ -0,0 +1,117 @@ +--- +title: Ancestry Case Study + +case_study_styles: true +cid: caseStudies +css: /css/style_ancestry.css +--- + +
+

CASE STUDY:
Digging Into the Past With New Technology

+ +
+ +
+ Company  Ancestry     Location  Lehi, Utah     Industry  Internet Company, Online Services +
+ +
+ +
+
+
+ +

Challenge

+Ancestry, the global leader in family history and consumer genomics, uses sophisticated engineering and technology to help everyone, everywhere discover the story of what led to them. The company has spent more than 30 years innovating and building products and technologies that at their core, result in real and emotional human responses. Ancestry currently serves more than 2.6 million paying subscribers, holds 20 billion historical records, 90 million family trees and more than four million people are in its AncestryDNA network, making it the largest consumer genomics DNA network in the world. The company's popular website, ancestry.com, has been working with big data long before the term was popularized. The site was built on hundreds of services, technologies and a traditional deployment methodology. "It's worked well for us in the past," says Paul MacKay, software engineer and architect at Ancestry, "but had become quite cumbersome in its processing and is time-consuming. As a primarily online service, we are constantly looking for ways to accelerate to be more agile in delivering our solutions and our products." + +
+ +
+ +
+

Solution

+ + The company is transitioning to cloud native infrastructure, using Docker containerization, Kubernetes orchestration and Prometheus for cluster monitoring.
+
+

Impact

+ "Every single product, every decision we make at Ancestry, focuses on delighting our customers with intimate, sometimes life-changing discoveries about themselves and their families," says MacKay. "As the company continues to grow, the increased productivity gains from using Kubernetes has helped Ancestry make customer discoveries faster. With the move to Dockerization for example, instead of taking between 20 to 50 minutes to deploy a new piece of code, we can now deploy in under a minute for much of our code. We’ve truly experienced significant time savings in addition to the various features and benefits from cloud native and Kubernetes-type technologies." +
+
+
+ +
+
+ "At a certain point, you have to step back if you're going to push a new technology and get key thought leaders with engineers within the organization to become your champions for new technology adoption. At training sessions, the development teams were always the ones that were saying, 'Kubernetes saved our time tremendously; it's an enabler. It really is incredible.'"

- PAUL MACKAY, SOFTWARE ENGINEER AND ARCHITECT AT ANCESTRY +
+
+ +
+
+

It started with a Shaky Leaf.

+ + Since its introduction a decade ago, the Shaky Leaf icon has become one of Ancestry's signature features, which signals to users that there's a helpful hint you can use to find out more about your family tree.

+ So when the company decided to begin moving its infrastructure to cloud native technology, the first service that was launched on Kubernetes, the open source platform for managing application containers across clusters of hosts, was this hint system. Think of it as Amazon's recommended products, but instead of recommending products the company recommends records, stories, or familial connections. "It was a very important part of the site," says Ancestry software engineer and architect Paul MacKay, "but also small enough for a pilot project that we knew we could handle in a very appropriate, secure way."

+ And when it went live smoothly in early 2016, "our deployment time for this service literally was cut down from 50 minutes to 2 or 5 minutes," MacKay adds. "The development team was just thrilled because we're focused on supplying a great experience for our customers. And that means features, it means stability, it means all those things that we need for a first-in-class type operation."

+ The stability of that Shaky Leaf was a signal for MacKay and his team that their decision to embrace cloud native technologies was the right one for the company. With a private data center, Ancestry built its website (which launched in 1996) on hundreds of services and technologies and a traditional deployment methodology. "It worked well for us in the past, but the sum of the legacy systems became quite cumbersome in its processing and was time-consuming," says MacKay. "We were looking for other ways to accelerate, to be more agile in delivering our solutions and our products." +
+
+ +
+
+"And when it [Kubernetes] went live smoothly in early 2016, 'our deployment time for this service literally was cut down from 50 minutes to 2 or 5 minutes,' MacKay adds. 'The development team was just thrilled because we're focused on supplying a great experience for our customers. And that means features, it means stability, it means all those things that we need for a first-in-class type operation.'" +
+
+ +
+
+ That need led them in 2015 to explore containerization. Ancestry engineers had already been using technology like Java and Python on Linux, so part of the decision was about making the infrastructure more Linux-friendly. They quickly decided that they wanted to go with Docker for containerization, "but it always comes down to the orchestration part of it to make it really work," says MacKay.

+ His team looked at orchestration platforms offered by Docker Compose, Mesos and OpenStack, and even started to prototype some homegrown solutions. And then they started hearing rumblings of the imminent release of Kubernetes v1.0. "At the forefront, we were looking at the secret store, so we didn't have to manage that all ourselves, the config maps, the methodology of seamless deployment strategy," he says. "We found that how Kubernetes had done their resources, their types, their labels and just their interface was so much further advanced than the other things we had seen. It was a feature fit."

+
+ Plus, MacKay says, "I just believed in the confidence that comes with the history that Google has with containerization. So we started out right on the leading edge of it. And we haven't looked back since."

+ Which is not to say that adopting a new technology hasn't come with some challenges. "Change is hard," says MacKay. "Not because the technology is hard or that the technology is not good. It's just that people like to do things like they had done [before]. You have the early adopters and you have those who are coming in later. It was a learning experience on both sides."

+ Figuring out the best deployment operations for Ancestry was a big part of the work it took to adopt cloud native infrastructure. "We want to make sure the process is easy and also controlled in the manner that allows us the highest degree of security that we demand and our customers demand," says MacKay. "With Kubernetes and other products, there are some good solutions, but a little bit of glue is needed to bring it into corporate processes and governances. It's like having a set of gloves that are generic, but when you really do want to grab something you have to make it so it's customized to you. That's what we had to do."

+ Their best practices include allowing their developers to deploy into development stage and production, but then controlling the aspects that need governance and auditing, such as secrets. They found that having one namespace per service is useful for achieving that containment of secrets and config maps. And for their needs, having one container per pod makes it easier to manage and to have a smaller unit of deployment. +

+
+
+ +
+
+ +"The success of Ancestry's first deployment of the hint system on Kubernetes helped create momentum for greater adoption of the technology." + +
+
+ +
+
+ With that process established, the time spent on deployment was cut down to under a minute for some services. "As programmers, we have what's called REPL: read, evaluate, print, and loop, but with Kubernetes, we have CDEL: compile, deploy, execute, and loop," says MacKay. "It's a very quick loop back and a great benefit to understand that when our services are deployed in production, they're the same as what we tested in the pre-production environments. The approach of cloud native for Ancestry provides us a better ability to scale and to accommodate the business needs as work loads occur."

+ The success of Ancestry's first deployment of the hint system on Kubernetes helped create momentum for greater adoption of the technology. "Engineers like to code, they like to do features, they don't like to sit around waiting for things to be deployed and worrying about scaling up and out and down," says MacKay. "After a while the engineers became our champions. At training sessions, the development teams were always the ones saying, 'Kubernetes saved our time tremendously; it's an enabler; it really is incredible.' Over time, we were able to convince our management that this was a transition that the industry is making and that we needed to be a part of it."

+ A year later, Ancestry has transitioned a good number of applications to Kubernetes. "We have many different services that make up the rich environment that [the website] has from both the DNA side and the family history side," says MacKay. "We have front-end stacks, back-end stacks and back-end processing type stacks that are in the cluster."

+ The company continues to weigh which services it will move forward to Kubernetes, which ones will be kept as is, and which will be replaced in the future and thus don't have to be moved over. MacKay estimates that the company is "approaching halfway on those features that are going forward. We don't have to do a lot of convincing anymore. It's more of an issue of timing with getting product management and engineering staff the knowledge and information that they need." +
+
+ +
+
+ "... 'I believe in Kubernetes. I believe in containerization. I think + if we can get there and establish ourselves in that world, we will be further along and far better off being agile and all the things we talk about, + and it'll go forward.'" +
+
+ +
+
+ + +Looking ahead, MacKay sees Ancestry maximizing the benefits of Kubernetes in 2017. "We're very close to having everything that should be or could be in a Linux-friendly world in Kubernetes by the end of the year," he says, adding that he's looking forward to features such as federation and horizontal pod autoscaling that are currently in the works. "Kubernetes has been very wonderful for us and we continue to ride the wave."

+That wave, he points out, has everything to do with the vibrant Kubernetes community, which has grown by leaps and bounds since Ancestry joined it as an early adopter. "This is just a very rough way of judging it, but on Slack in June 2015, there were maybe 500 on there," MacKay says. "The last time I looked there were maybe 8,500 just on the Slack channel. There are so many major companies and different kinds of companies involved now. It's the variety of contributors, the number of contributors, the incredibly competent and friendly community."

+As much as he and his team at Ancestry have benefited from what he calls "the goodness and the technical abilities of many" in the community, they've also contributed information about best practices, logged bug issues and participated in the open source conversation. And they've been active in attending meetups to help educate and give back to the local tech community in Utah. Says MacKay: "We're trying to give back as far as our experience goes, rather than just code." +

When he meets with companies considering adopting cloud native infrastructure, the best advice he has to give from Ancestry's Kubernetes journey is this: "Start small, but with hard problems," he says. And "you need a patron who understands the vision of containerization, to help you tackle the political as well as other technical roadblocks that can occur when change is needed."

+With the changes that MacKay's team has led over the past year and a half, cloud native will be part of Ancestry's technological genealogy for years to come. MacKay has been such a champion of the technology that he says people have jokingly accused him of having a Kubernetes tattoo.

+"I really don't," he says with a laugh. "But I'm passionate. I'm not exclusive to any technology; I use whatever I need that's out there that makes us great. If it's something else, I'll use it. But right now I believe in Kubernetes. I believe in containerization. I think if we can get there and establish ourselves in that world, we will be further along and far better off being agile and all the things we talk about, and it'll go forward."

+He pauses. "So, yeah, I guess you can say I'm an evangelist for Kubernetes," he says. "But I'm not getting a tattoo!" + + +
+
diff --git a/content/zh/case-studies/ant-financial/ant-financial_featured_logo.png b/content/zh/case-studies/ant-financial/ant-financial_featured_logo.png new file mode 100644 index 0000000000000..cb4034502734d Binary files /dev/null and b/content/zh/case-studies/ant-financial/ant-financial_featured_logo.png differ diff --git a/content/zh/case-studies/ant-financial/index.html b/content/zh/case-studies/ant-financial/index.html new file mode 100644 index 0000000000000..92b46526dee48 --- /dev/null +++ b/content/zh/case-studies/ant-financial/index.html @@ -0,0 +1,96 @@ +--- +title: Ant Financial Case Study +linkTitle: ant-financial +case_study_styles: true +cid: caseStudies +css: /css/style_case_studies.css +featured: false +--- + +
+

CASE STUDY:
Ant Financial’s Hypergrowth Strategy Using Kubernetes + +

+ +
+ +
+ Company  Ant Financial     Location  Hangzhou, China     Industry  Financial Services +
+ +
+
+
+
+

Challenge

+ Officially founded in October 2014, Ant Financial originated from Alipay, the world’s largest online payment platform that launched in 2004. The company also offers numerous other services leveraging technology innovation. With the volume of transactions Alipay handles for its 900+ million users worldwide (through its local and global partners)—256,000 transactions per second at the peak of Double 11 Singles Day 2017, and total gross merchandise value of $31 billion for Singles Day 2018—not to mention that of its other services, Ant Financial faces “data processing challenge in a whole new way,” says Haojie Hang, who is responsible for Product Management for the Storage and Compute Group. “We see three major problems of operating at that scale: how to provide real-time compute, storage, and processing capability, for instance to make real-time recommendations for fraud detection; how to provide intelligence on top of this data, because there’s too much data and then we’re not getting enough insight; and how to apply security in the application level, in the middleware level, the system level, even the chip level.” In order to provide reliable and consistent services to its customers, Ant Financial embraced containers in early 2014, and soon needed an orchestration solution for the tens-of-thousands-of-node clusters in its data centers. + +

Solution

+ After investigating several technologies, the team chose Kubernetes for orchestration, as well as a number of other CNCF projects, including Prometheus, OpenTracing, etcd and CoreDNS. “In late 2016, we decided that Kubernetes will be the de facto standard,” says Hang. “Looking back, we made the right bet on the right technology. But then we needed to move the production workload from the legacy infrastructure to the latest Kubernetes-enabled platform, and that took some time, because we are very careful in terms of reliability and consistency.” All core financial systems were containerized by November 2017, and the migration to Kubernetes is ongoing. +
+

Impact

+ “We’ve seen at least tenfold in improvement in terms of the operations with cloud native technology, which means you can have tenfold increase in terms of output,” says Hang. Ant also provides its fully integrated financial cloud platform to business partners around the world, and hopes to power the next generation of digital banking with deep experience in service innovation and technology expertise. Hang says the team hasn’t begun to focus on optimizing the Kubernetes platform, either: “Because we’re still in the hyper growth stage, we’re not in a mode where we do cost saving yet.” +
+ +
+
+
+
+ "In late 2016, we decided that Kubernetes will be the de facto standard. Looking back, we made the right bet on the right technology." +

- HAOJIE HANG, PRODUCT MANAGEMENT, ANT FINANCIAL
+
+
+
+
+

A spinoff of the multinational conglomerate Alibaba, Ant Financial boasts a $150+ billion valuation and the scale to match. The fintech startup, launched in 2014, is comprised of Alipay, the world’s largest online payment platform, and numerous other services leveraging technology innovation.

+ And the volume of transactions that Alipay handles for over 900 million users worldwide (through its local and global partners) is staggering: 256,000 per second at the peak of Double 11 Singles Day 2017, and total gross merchandise value of $31 billion for Singles Day 2018. With the mission of “bringing the world equal opportunities,” Ant Financial is dedicated to creating an open, shared credit system and financial services platform through technology innovations. +

+ Combine that with the operations of its other properties—such as the Huabei online credit system, Jiebei lending service, and the 350-million-user Ant Forest green energy mobile app—and Ant Financial faces “data processing challenge in a whole new way,” says Haojie Hang, who is responsible for Product Management for the Storage and Compute Group. “We see three major problems of operating at that scale: how to provide real-time compute, storage, and processing capability, for instance to make real-time recommendations for fraud detection; how to provide intelligence on top of this data, because there’s too much data and we’re not getting enough insight; and how to apply security in the application level, in the middleware level, the system level, even the chip level.” +

+ To address those challenges and provide reliable and consistent services to its customers, Ant Financial embraced Docker containerization in 2014. But they soon realized that they needed an orchestration solution for some tens-of-thousands-of-node clusters in the company’s data centers. +
+
+
+
+ "On Double 11 this year, we had plenty of nodes on Kubernetes, but compared to the whole scale of our infrastructure, this is still in progress."

- RANGER YU, GLOBAL TECHNOLOGY PARTNERSHIP & DEVELOPMENT, ANT FINANCIAL
+ +
+
+
+
+ The team investigated several technologies, including Docker Swarm and Mesos. “We did a lot of POCs, but we’re very careful in terms of production systems, because we want to make sure we don’t lose any data,” says Hang. “You cannot afford to have a service downtime for one minute; even one second has a very, very big impact. We operate every day under pressure to provide reliable and consistent services to consumers and businesses in China and globally.” +

+ Ultimately, Hang says Ant chose Kubernetes because it checked all the boxes: a strong community, technology that “will be relevant in the next three to five years,” and a good match for the company’s engineering talent. “In late 2016, we decided that Kubernetes will be the de facto standard,” says Hang. “Looking back, we made the right bet on the right technology. But then we needed to move the production workload from the legacy infrastructure to the latest Kubernetes-enabled platform. We spent a lot of time learning and then training our people to build applications on Kubernetes well.” +

+ All core financial systems were containerized by November 2017, and the migration to Kubernetes is ongoing. Ant’s platform also leverages a number of other CNCF projects, including Prometheus, OpenTracing, etcd and CoreDNS. “On Double 11 this year, we had plenty of nodes on Kubernetes, but compared to the whole scale of our infrastructure, this is still in progress,” says Ranger Yu, Global Technology Partnership & Development. +
+
+
+
+ "We’re very grateful for CNCF and this amazing technology, which we need as we continue to scale globally. We’re definitely embracing the community and open source more in the future."

- HAOJIE HANG, PRODUCT MANAGEMENT, ANT FINANCIAL
+
+
+ +
+
+ Still, there has already been an impact. “Cloud native technology has benefited us greatly in terms of efficiency,” says Hang. “In general, we want to make sure our infrastructure is nimble and flexible enough for the work that could happen tomorrow. That’s the goal. And with cloud native technology, we’ve seen at least tenfold improvement in operations, which means you can have tenfold increase in terms of output. Let’s say you are operating 10 nodes with one person. With cloud native, tomorrow you can have 100 nodes.” +

+ Ant also provides its financial cloud platform to partners around the world, and hopes to power the next generation of digital banking with deep experience in service innovation and technology expertise. Hang says the team hasn’t begun to focus on optimizing the Kubernetes platform, either: “Because we’re still in the hyper growth stage, we’re not in a mode where we do cost-saving yet.” +

+ The CNCF community has also been a valuable asset during Ant Financial’s move to cloud native. “If you are applying a new technology, it’s very good to have a community to discuss technical problems with other users,” says Hang. “We’re very grateful for CNCF and this amazing technology, which we need as we continue to scale globally. We’re definitely embracing the community and open sourcing more in the future.” +
+ +
+
+"In China, we are the North Star in terms of innovation in financial and other related services,” says Hang. “We definitely want to make sure we’re still leading in the next 5 to 10 years with our investment in technology."

- HAOJIE HANG, PRODUCT MANAGEMENT, ANT FINANCIAL
+
+ +
+ In fact, the company has already started to open source some of its cloud native middleware. “We are going to be very proactive about that,” says Yu. “CNCF provided a platform so everyone can plug in or contribute components. This is very good open source governance.” +

+ Looking ahead, the Ant team will continue to evaluate many other CNCF projects. Building a service mesh community in China, the team has brought together many China-based companies and developers to discuss the potential of that technology. “Service mesh is very attractive for Chinese developers and end users because we have a lot of legacy systems running now, and it’s an ideal mid-layer to glue everything together, both new and legacy,” says Hang. “For new technologies, we look very closely at whether they will last.” +

+ At Ant, Kubernetes passed that test with flying colors, and the team hopes other companies will follow suit. “In China, we are the North Star in terms of innovation in financial and other related services,” says Hang. “We definitely want to make sure we’re still leading in the next 5 to 10 years with our investment in technology.” + +
+
diff --git a/content/zh/case-studies/appdirect/appdirect_featured_logo.png b/content/zh/case-studies/appdirect/appdirect_featured_logo.png new file mode 100644 index 0000000000000..724a8a75684f0 Binary files /dev/null and b/content/zh/case-studies/appdirect/appdirect_featured_logo.png differ diff --git a/content/zh/case-studies/appdirect/index.html b/content/zh/case-studies/appdirect/index.html new file mode 100644 index 0000000000000..16d93cce5cb4e --- /dev/null +++ b/content/zh/case-studies/appdirect/index.html @@ -0,0 +1,99 @@ +--- +title: AppDirect Case Study + +linkTitle: AppDirect +case_study_styles: true +cid: caseStudies +css: /css/style_case_studies.css +logo: appdirect_featured_logo.png +featured: true +weight: 4 +quote: > + We made the right decisions at the right time. Kubernetes and the cloud native technologies are now seen as the de facto ecosystem. +--- + +
+

CASE STUDY:
AppDirect: How AppDirect Supported the 10x Growth of Its Engineering Staff with Kubernetes +

+ +
+ +
+ Company  AppDirect     Location  San Francisco, California +     Industry  Software +
+ +
+
+
+
+

Challenge

+ AppDirect provides an end-to-end commerce platform for cloud-based products and services. When Director of Software Development Pierre-Alexandre Lacerte began working there in 2014, the company had a monolith application deployed on a "tomcat infrastructure, and the whole release process was complex for what it should be," he says. "There were a lot of manual steps involved, with one engineer building a feature, then another team picking up the change. So you had bottlenecks in the pipeline to ship a feature to production." At the same time, the engineering team was growing, and the company realized it needed a better infrastructure to both support that growth and increase velocity. +

+

Solution

+ "My idea was: Let’s create an environment where teams can deploy their services faster, and they will say, ‘Okay, I don’t want to build in the monolith anymore. I want to build a service,’" says Lacerte. They considered and prototyped several different technologies before deciding to adopt Kubernetes in early 2016. Lacerte’s team has also integrated Prometheus monitoring into the platform; tracing is next. Today, AppDirect has more than 50 microservices in production and 15 Kubernetes clusters deployed on AWS and on premise around the world. +

+

Impact

+ The Kubernetes platform has helped support the engineering team’s 10x growth over the past few years. Coupled with the fact that they were continually adding new features, Lacerte says, "I think our velocity would have slowed down a lot if we didn’t have this new infrastructure." Moving to Kubernetes and services has meant that deployments have become much faster due to less dependency on custom-made, brittle shell scripts with SCP commands. Time to deploy a new version has shrunk from 4 hours to a few minutes. Additionally, the company invested a lot of effort to make things self-service for developers. "Onboarding a new service doesn’t require Jira tickets or meeting with three different teams," says Lacerte. Today, the company sees 1,600 deployments per week, compared to 1-30 before. The company also achieved cost savings by moving its marketplace and billing monoliths to Kubernetes from legacy EC2 hosts as well as by leveraging autoscaling, as traffic is higher during business hours. +
+
+
+
+
+ "It was an immense engineering culture shift, but the benefits are undeniable in terms of scale and speed." +

- Alexandre Gervais, Staff Software Developer, AppDirect
+
+
+
+
+

With its end-to-end commerce platform for cloud-based products and services, AppDirect has been helping organizations such as Comcast and GoDaddy simplify the digital supply chain since 2009.

+
+ When Director of Software Development Pierre-Alexandre Lacerte started working there in 2014, the company had a monolith application deployed on a "tomcat infrastructure, and the whole release process was complex for what it should be," he says. "There were a lot of manual steps involved, with one engineer building a feature then creating a pull request, and a QA or another engineer validating the feature. Then it gets merged and someone else will take care of the deployment. So we had bottlenecks in the pipeline to ship a feature to production."

+ At the same time, the engineering team of 40 was growing, and the company wanted to add an increasing number of features to its products. As a member of the platform team, Lacerte began hearing from multiple teams that wanted to deploy applications using different frameworks and languages, from Node.js to Spring Boot Java. He soon realized that in order to both support growth and increase velocity, the company needed a better infrastructure, and a system in which teams are autonomous, can do their own deploys, and are responsible for their services in production. +
+
+
+
+ "We made the right decisions at the right time. Kubernetes and the cloud native technologies are now seen as the de facto ecosystem. We know where to focus our efforts in order to tackle the new wave of challenges we face as we scale out. The community is so active and vibrant, which is a great complement to our awesome internal team."

- Alexandre Gervais, Staff Software Developer, AppDirect +
+ +
+
+
+
+ From the beginning, Lacerte says, "My idea was: Let’s create an environment where teams can deploy their services faster, and they will say, ‘Okay, I don’t want to build in the monolith anymore. I want to build a service.’" (Lacerte left the company in 2019.)

+ Working with the operations team, Lacerte’s group got more control and access to the company’s AWS infrastructure, and started prototyping several orchestration technologies. "Back then, Kubernetes was a little underground, unknown," he says. "But we looked at the community, the number of pull requests, the velocity on GitHub, and we saw it was getting traction. And we found that it was much easier for us to manage than the other technologies." + They spun up the first few services on Kubernetes using Chef and Terraform provisioning, and as more services were added, more automation was, too. "We have clusters around the world—in Korea, in Australia, in Germany, and in the U.S.," says Lacerte. "Automation is critical for us." They’re now largely using Kops, and are looking at managed Kubernetes offerings from several cloud providers.

+ Today, though the monolith still exists, there are fewer and fewer commits and features. All teams are deploying on the new infrastructure, and services are the norm. AppDirect now has more than 50 microservices in production and 15 Kubernetes clusters deployed on AWS and on premise around the world.

+ Lacerte’s strategy ultimately worked because of the very real impact the Kubernetes platform has had to deployment time. Due to less dependency on custom-made, brittle shell scripts with SCP commands, time to deploy a new version has shrunk from 4 hours to a few minutes. Additionally, the company invested a lot of effort to make things self-service for developers. "Onboarding a new service doesn’t require Jira tickets or meeting with three different teams," says Lacerte. Today, the company sees 1,600 deployments per week, compared to 1-30 before. +
+
+
+
+ "I think our velocity would have slowed down a lot if we didn’t have this new infrastructure."

- Pierre-Alexandre Lacerte, Director of Software Development, AppDirect
+
+
+ +
+ +
+ Additionally, the Kubernetes platform has helped support the engineering team’s 10x growth over the past few years. "Ownership, a core value of AppDirect, reflects in our ability to ship services independently of our monolith code base," says Staff Software Developer Alexandre Gervais, who worked with Lacerte on the initiative. "Small teams now own critical parts of our business domain model, and they operate in their decoupled domain of expertise, with limited knowledge of the entire codebase. This reduces and isolates some of the complexity." Coupled with the fact that they were continually adding new features, Lacerte says, "I think our velocity would have slowed down a lot if we didn’t have this new infrastructure." + The company also achieved cost savings by moving its marketplace and billing monoliths to Kubernetes from legacy EC2 hosts as well as by leveraging autoscaling, as traffic is higher during business hours.

+ AppDirect’s cloud native stack also includes gRPC and Fluentd, and the team is currently working on setting up OpenCensus. The platform already has Prometheus integrated, so "when teams deploy their service, they have their notifications, alerts and configurations," says Lacerte. "For example, in the test environment, I want to get a message on Slack, and in production, I want a Slack message and I also want to get paged. We have integration with PagerDuty. Teams have more ownership on their services." +
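+ The per-environment alert routing Lacerte describes maps naturally onto Prometheus Alertmanager configuration. The sketch below is an illustration only, not AppDirect’s actual setup: the channel names, webhook URL, PagerDuty key, and the "env" label are all assumed placeholders.
+ ```
+ route:
+   receiver: slack-only              # default route: test alerts go to Slack only
+   routes:
+     - match:
+         env: production             # production alerts also page the on-call engineer
+       receiver: slack-and-pager
+ receivers:
+   - name: slack-only
+     slack_configs:
+       - api_url: https://hooks.slack.com/services/EXAMPLE   # placeholder webhook
+         channel: '#deploys-test'
+   - name: slack-and-pager
+     slack_configs:
+       - api_url: https://hooks.slack.com/services/EXAMPLE
+         channel: '#deploys-prod'
+     pagerduty_configs:
+       - service_key: PAGERDUTY-INTEGRATION-KEY              # placeholder integration key
+ ```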
+ +
+
+"We moved from a culture limited to ‘pushing code in a branch’ to exciting new responsibilities outside of the code base: deployment of features and configurations; monitoring of application and business metrics; and on-call support in case of outages. It was an immense engineering culture shift, but the benefits are undeniable in terms of scale and speed."

- Pierre-Alexandre Lacerte, Director of Software Development, AppDirect
+
+ +
+ That of course also means more responsibility. "We asked engineers to expand their horizons," says Gervais. "We moved from a culture limited to ‘pushing code in a branch’ to exciting new responsibilities outside of the code base: deployment of features and configurations; monitoring of application and business metrics; and on-call support in case of outages. It was an immense engineering culture shift, but the benefits are undeniable in terms of scale and speed."

+ As the engineering ranks continue to grow, the platform team has a new challenge of making sure that the Kubernetes platform is accessible and easily utilized by everyone. "How can we make sure that when we add more people to our team that they are efficient, productive, and know how to ramp up on the platform?" Lacerte says. "So we have the evangelists, the documentation, some project examples. We do demos, we have AMA sessions. We’re trying different strategies to get everyone’s attention."

+ Three and a half years into their Kubernetes journey, Gervais feels AppDirect "made the right decisions at the right time." "Kubernetes and the cloud native technologies are now seen as the de facto ecosystem. We know where to focus our efforts in order to tackle the new wave of challenges we face as we scale out. The community is so active and vibrant, which is a great complement to our awesome internal team. Going forward, our focus will really be geared towards benefiting from the ecosystem by providing added business value in our day-to-day operations." + +
+
diff --git a/content/zh/case-studies/blablacar/blablacar_featured.png b/content/zh/case-studies/blablacar/blablacar_featured.png new file mode 100644 index 0000000000000..cfe37257b99e7 Binary files /dev/null and b/content/zh/case-studies/blablacar/blablacar_featured.png differ diff --git a/content/zh/case-studies/blablacar/blablacar_logo.png b/content/zh/case-studies/blablacar/blablacar_logo.png new file mode 100644 index 0000000000000..14606e036002e Binary files /dev/null and b/content/zh/case-studies/blablacar/blablacar_logo.png differ diff --git a/content/zh/case-studies/blablacar/index.html b/content/zh/case-studies/blablacar/index.html new file mode 100644 index 0000000000000..2d55ffb8d07fe --- /dev/null +++ b/content/zh/case-studies/blablacar/index.html @@ -0,0 +1,98 @@ +--- +title: BlaBlaCar Case Study + +case_study_styles: true +cid: caseStudies +css: /css/style_blablacar.css +--- + +
+

CASE STUDY:
Turning to Containerization to Support Millions of Rideshares

+ +
+ +
+ Company  BlaBlaCar     Location  Paris, France     Industry  Ridesharing Company +
+ +
+
+
+
+

Challenge

+ The world’s largest long-distance carpooling community, BlaBlaCar, connects 40 million members across 22 countries. The company has been experiencing exponential growth since 2012 and needed its infrastructure to keep up. "When you’re thinking about doubling the number of servers, you start thinking, ‘What should I do to be more efficient?’" says Simon Lallemand, Infrastructure Engineer at BlaBlaCar. "The answer is not to hire more and more people just to deal with the servers and installation." The team knew they had to scale the platform, but wanted to stay on their own bare metal servers. +
+
+

Solution

+ Opting not to shift to cloud virtualization or use a private cloud on their own servers, the BlaBlaCar team became early adopters of containerization, using the CoreOS runtime rkt, initially deployed using the fleet cluster manager. Last year, the company switched to Kubernetes orchestration, and now also uses Prometheus for monitoring. +
+ +
+

Impact

+ "Before using containers, it would take sometimes a day, sometimes two, just to create a new service," says Lallemand. "With all the tooling that we made around the containers, copying a new service now is a matter of minutes. It’s really a huge gain. We are better at capacity planning in our data center because we have fewer constraints due to this abstraction between the services and the hardware we run on. For the developers, it also means they can focus only on the features that they’re developing, and not on the infrastructure." +
+
+
+
+
+ "When you’re switching to this cloud-native model and running everything in containers, you have to make sure that at any moment you can reboot without any downtime and without losing traffic. [With Kubernetes] our infrastructure is much more resilient and we have better availability than before."

- Simon Lallemand, Infrastructure Engineer at BlaBlaCar
+
+
+ +
+
+

For the 40 million users of BlaBlaCar, it’s easy to find strangers headed in the same direction to share rides and costs. You can even choose how much "bla bla" chatter you want from a long-distance ride mate.

+ Behind the scenes, though, the infrastructure was falling woefully behind the rider community’s exponential growth. Founded in 2006, the company hit its current stride around 2012. "Our infrastructure was very traditional," says Infrastructure Engineer Simon Lallemand, who began working at the company in 2014. "In the beginning, it was a bit chaotic because we had to [grow] fast. But then comes the time when you have to design things to make it manageable."

+ By 2015, the company had about 50 bare metal servers. The team was using a MySQL database and PHP, but, Lallemand says, "it was a very static way." They also utilized the configuration management system, Chef, but had little automation in its process. "When you’re thinking about doubling the number of servers, you start thinking, ‘What should I do to be more efficient?’" says Lallemand. "The answer is not to hire more and more people just to deal with the servers and installation."

+ Instead, BlaBlaCar began its cloud-native journey but wasn’t sure which route to take. "We could either decide to go into cloud virtualization or even use a private cloud on our own servers," says Lallemand. "But going into the cloud meant we had to make a lot of changes in our application work, and we were just not ready to make the switch from on premise to the cloud." They wanted to keep the great performance they got on bare metal, so they didn’t want to go to virtualization on premise.

+ The solution: containerization. This was early 2015 and containers were still relatively new. "It was a bold move at the time," says Lallemand. "We decided that the next servers that we would buy in the new data center would all be the same model, so we could outsource the maintenance of the servers. And we decided to go with containers and with CoreOS Container Linux as an abstraction for this hardware. It seemed future-proof to go with containers because we could see what companies were already doing with containers." +
+
+ +
+
+ "With all the tooling that we made around the containers, copying a new service is a matter of minutes. It’s a huge gain. For the developers, it means they can focus only on the features that they’re developing and not on the infrastructure or the hour they would test their code, or the hour that it would get deployed." +
+
+ +
+
+ Next, they needed to choose a runtime for the containers, but "there were very few deployments in production at that time," says Lallemand. They experimented with Docker but decided to go with rkt. Lallemand explains that for BlaBlaCar, it was "much simpler to integrate things that are on rkt." At the time, the project was still pre-v1.0, so "we could speak with the developers of rkt and give them feedback. It was an advantage." Plus, he notes, rkt was very stable, even at this early stage.

+ Once those decisions were made that summer, the company came up with a plan for implementation. First, they formed a task force to create a workflow that would be tested by three of the 10 members on Lallemand’s team. But they took care to run regular workshops with all 10 members to make sure everyone was on board. "When you’re focused on your product sometimes you forget if it’s really user friendly, whether other people can manage to create containers too," Lallemand says. "So we did a lot of iterations to find a good workflow."

+ After establishing the workflow, Lallemand says with a smile that "we had this strange idea that we should try the most difficult thing first. Because if it works, it will work for everything." So the first project the team decided to containerize was the database. "Nobody did that at the time, and there were really no existing tools for what we wanted to do, including building container images," he says. So the team created their own tools, such as dgr, which builds container images so that the whole team has a common framework to build on the same images with the same standards. They also revamped the service-discovery tools Nerve and Synapse; their versions, Go-Nerve and Go-Synapse, were written in Go and built to be more efficient and include new features. All of these tools were open-sourced.

+ At the same time, the company was working to migrate its entire platform to containers with a deadline set for Christmas 2015. With all the work being done in parallel, BlaBlaCar was able to get about 80 percent of its production into containers by its deadline with live traffic running on containers during December. (It’s now at 100 percent.) "It’s a really busy time for traffic," says Lallemand. "We knew that by using those new servers with containers, it would help us handle the traffic."

+ In the middle of that peak season for carpooling, everything worked well. "The biggest impact that we had was for the deployment of new services," says Lallemand. "Before using containers, we had to first deploy a new server and create configurations with Chef. It would take sometimes a day, sometimes two, just to create a new service. And with all the tooling that we made around the containers, copying a new service is a matter of minutes. So it’s really a huge gain. For the developers, it means they can focus only on the features that they’re developing and not on the infrastructure or the hour they would test their code, or the hour that it would get deployed." +
+
+ +
+
+ "We realized that there was a really strong community around it [Kubernetes], which meant we would not have to maintain a lot of tools of our own," says Lallemand. "It was better if we could contribute to some bigger project like Kubernetes." +
+
+ +
+
+ In order to meet their self-imposed deadline, one of the decisions they made was to not do any "orchestration magic" for containers in the first production alignment. Instead, they used the basic fleet tool from CoreOS to deploy their containers. (They did build a tool called GGN, which they’ve open-sourced, to make it more manageable for their system engineers to use.)

+ Still, the team knew that they’d want more orchestration. "Our tool was doing a pretty good job, but at some point you want to give more autonomy to the developer team," Lallemand says. "We also realized that we don’t want to be the single point of contact for developers when they want to launch new services." By the summer of 2016, they found their answer in Kubernetes, which had just begun supporting rkt implementation.

+ After discussing their needs with their contacts at CoreOS and Google, they were convinced that Kubernetes would work for BlaBlaCar. "We realized that there was a really strong community around it, which meant we would not have to maintain a lot of tools of our own," says Lallemand. "It was better if we could contribute to some bigger project like Kubernetes." They also started using Prometheus, as they were looking for "service-oriented monitoring that could be updated nightly." Production on Kubernetes began in December 2016. "We like to do crazy stuff around Christmas," he adds with a laugh.

+ BlaBlaCar now has about 3,000 pods, with 1,200 of them running on Kubernetes. Lallemand leads a "foundations team" of 25 members who take care of the networks, databases and systems for about 100 developers. There have been some challenges getting to this point. "The rkt implementation is still not 100 percent finished," Lallemand points out. "It’s really good, but there are some features still missing. We have questions about how we do things with stateful services, like databases. We know how we will be migrating some of the services; some of the others are a bit more complicated to deal with. But the Kubernetes community is making a lot of progress on that part."

+ The team is particularly happy that they’re now able to plan capacity better in the company’s data center. "We have fewer constraints since we have this abstraction between the services and the hardware we run on," says Lallemand. "If we lose a server because there’s a hardware problem on it, we just move the containers onto another server. It’s much more efficient. We do that by just changing a line in the configuration file. And with Kubernetes, it should be automatic, so we would have nothing to do." +
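+ The failover Lallemand describes is Kubernetes’ declarative model at work: you state a desired replica count, and the controller keeps it true even when a node dies. A minimal sketch, assuming a hypothetical service name and image (BlaBlaCar’s real manifests aren’t public):
+ ```
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: trip-search                 # hypothetical service name
+ spec:
+   replicas: 3                       # desired state: three pods at all times
+   selector:
+     matchLabels:
+       app: trip-search
+   template:
+     metadata:
+       labels:
+         app: trip-search
+     spec:
+       containers:
+         - name: trip-search
+           image: registry.example.com/trip-search:1.0   # placeholder image
+ ```
+ If a server running one of these pods fails, the Deployment controller notices the shortfall and schedules a replacement pod on a healthy node; no configuration file needs editing.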
+
+ +
+
+ "If we lose a server because there’s a hardware problem on it, we just move the containers onto another server. It’s much more efficient. We do that by just changing a line in the configuration file. With Kubernetes, it should be automatic, so we would have nothing to do." +
+
+ +
+
+ And these advances ultimately trickle down to BlaBlaCar’s users. "We have improved availability overall on our website," says Lallemand. "When you’re switching to this cloud-native model and running everything in containers, you have to make sure that you can at any moment reboot a server or a data center without any downtime, without losing traffic. So now our infrastructure is much more resilient and we have better availability than before."

+ Within BlaBlaCar’s technology department, the cloud-native journey has created some profound changes. Lallemand thinks that the regular meetings during the conception stage and the training sessions during implementation helped. "After that everybody took part in the migration process," he says. "Then we split the organization into different ‘tribes’—teams that gather developers, product managers, data analysts, all the different jobs, to work on a specific part of the product. Before, they were organized by function. The idea is to give all these tribes access to the infrastructure directly in a self-service way without having to ask. These people are really autonomous. They have responsibility of that part of the product, and they can make decisions faster."

+ This DevOps transformation turned out to be a positive one for the company’s staffers. "The team was very excited about the DevOps transformation because it was new, and we were working to make things more reliable, more future-proof," says Lallemand. "We like doing things that very few people are doing, other than the internet giants."

+ With these changes already making an impact, BlaBlaCar is looking to split up more and more of its application into services. "I don’t say microservices because they’re not so micro," Lallemand says. "If we can split the responsibilities between the development teams, it would be easier to manage and more reliable, because we can easily add and remove services if one fails. You can handle it easily, instead of adding a big monolith that we still have."

+ When Lallemand speaks to other European companies curious about what BlaBlaCar has done with its infrastructure, he tells them to come along for the ride. "I tell them that it’s such a pleasure to deal with the infrastructure that we have today compared to what we had before," he says. "They just need to keep in mind their real motive, whether it’s flexibility in development or reliability or so on, and then go step by step towards reaching those objectives. That’s what we’ve done. It’s important not to do technology for the sake of technology. Do it for a purpose. Our focus was on helping the developers." +
+
diff --git a/content/zh/case-studies/blackrock/blackrock_featured.png b/content/zh/case-studies/blackrock/blackrock_featured.png new file mode 100644 index 0000000000000..3898b88c9fa43 Binary files /dev/null and b/content/zh/case-studies/blackrock/blackrock_featured.png differ diff --git a/content/zh/case-studies/blackrock/blackrock_logo.png b/content/zh/case-studies/blackrock/blackrock_logo.png new file mode 100644 index 0000000000000..51e914a63b259 Binary files /dev/null and b/content/zh/case-studies/blackrock/blackrock_logo.png differ diff --git a/content/zh/case-studies/blackrock/index.html b/content/zh/case-studies/blackrock/index.html new file mode 100644 index 0000000000000..6bef0ec7084d3 --- /dev/null +++ b/content/zh/case-studies/blackrock/index.html @@ -0,0 +1,112 @@ +--- +title: BlackRock Case Study + +case_study_styles: true +cid: caseStudies +css: /css/style_blackrock.css +--- + +
+

CASE STUDY:
+
Rolling Out Kubernetes in Production in 100 Days
+

+ +
+ +
+ Company  BlackRock     Location  New York, NY     Industry  Financial Services +
+ +
+ +
+ +
+
+

Challenge

+ The world’s largest asset manager, BlackRock operates a very controlled static deployment scheme, which has allowed for scalability over the years. But in their data science division, there was a need for more dynamic access to resources. "We want to be able to give every investor access to data science, meaning Python notebooks, or even something much more advanced, like a MapReduce engine based on Spark," says Michael Francis, a Managing Director in BlackRock’s Product Group, which runs the company’s investment management platform. "Managing complex Python installations on users’ desktops is really hard because everyone ends up with slightly different environments. We have existing environments that do these things, but we needed to make it real, expansive and scalable. Being able to spin that up on demand, tear it down, make that much more dynamic, became a critical thought process for us. It’s not so much that we had to solve our main core production problem, it’s how do we extend that? How do we evolve?" +
+ +
+

Solution

+ Drawing from what they learned during a pilot done last year using Docker environments, Francis put together a cross-sectional team of 20 to build an investor research web app using Kubernetes with the goal of getting it into production within one quarter. +

+

Impact

+ "Our goal was: How do you give people tools rapidly without having to install them on their desktop?" says Francis. And the team hit the goal within 100 days. Francis is pleased with the results and says, "We’re going to use this infrastructure for lots of other application workloads as time goes on. It’s not just data science; it’s this style of application that needs the dynamism. But I think we’re 6-12 months away from making a [large scale] decision. We need to gain experience of running the system in production, we need to understand failure modes and how best to manage operational issues. What’s interesting is that just having this technology there is changing the way our developers are starting to think about their future development." + +
+
+ +
+ +
+
+ "My message to other enterprises like us is you can actually integrate Kubernetes into an existing, well-orchestrated machinery. You don’t have to throw out everything you do. And using Kubernetes made a complex problem significantly easier."

- Michael Francis, Managing Director, BlackRock
+
+
+ +
+ +
+ One of the management objectives for BlackRock’s Product Group employees in 2017 was to "build cool stuff." Led by Managing Director Michael Francis, a cross-sectional group of 20 did just that: They rolled out a full production Kubernetes environment and released a new investor research web app on it. In 100 days.

+ For a company that’s the world’s largest asset manager, "just equipment procurement can take 100 days sometimes, let alone from inception to delivery," says Karl Wieman, a Senior System Administrator. "It was an aggressive schedule. But it moved the dial." + In fact, the project achieved two goals: It solved a business problem (creating the needed web app) as well as provided real-world, in-production experience with Kubernetes, a cloud-native technology that the company was eager to explore. "It’s not so much that we had to solve our main core production problem, it’s how do we extend that? How do we evolve?" says Francis. The ultimate success of this project, beyond delivering the app, lies in the fact that "we’ve managed to integrate a radically new thought process into a controlled infrastructure that we didn’t want to change."

+ After all, in its three decades of existence, BlackRock has "a very well-established environment for managing our compute resources," says Francis. "We manage large cluster processes on machines, so we do a lot of orchestration and management for our main production processes in a way that’s very cloudish in concept. We’re able to manage them in a very controlled, static deployment scheme, and that has given us a huge amount of scalability."

+ Though that works well for the core production, the company has found that some data science workloads require more dynamic access to resources. "It’s a very bursty process," says Francis, who is head of data for the company’s Aladdin investment management platform division.

+ Aladdin, which connects the people, information and technology needed for money management in real time, is used internally and is also sold as a platform to other asset managers and insurance companies. "We want to be able to give every investor access to data science, meaning Python notebooks, or even something much more advanced, like a MapReduce engine based on Spark," says Francis. But "managing complex Python installations on users’ desktops is really hard because everyone ends up with slightly different environments. Docker allows us to flatten that environment." +
+
+ +
+
+ "We manage large cluster processes on machines, so we do a lot of orchestration and management for our main production processes in a way that’s very cloudish in concept. We’re able to manage them in a very controlled, static deployment scheme, and that has given us a huge amount of scalability." +
+
+ +
+
+ Still, challenges remain. "If you have a shared cluster, you get this storming herd problem where everyone wants to do the same thing at the same time," says Francis. "You could put limits on it, but you’d have to build an infrastructure to define limits for our processes, and the Python notebooks weren’t really designed for that. We have existing environments that do these things, but we needed to make it real, expansive, and scalable. Being able to spin that up on demand, tear it down, and make that much more dynamic, became a critical thought process for us."
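+ Kubernetes addresses this "storming herd" concern with namespace-scoped quotas. As a hedged sketch, assuming a one-namespace-per-researcher layout (a hypothetical arrangement, not BlackRock’s published design), a ResourceQuota caps what any single user’s notebooks and jobs can consume:
+ ```
+ apiVersion: v1
+ kind: ResourceQuota
+ metadata:
+   name: per-user-quota
+   namespace: researcher-alice      # hypothetical per-user namespace
+ spec:
+   hard:
+     pods: "10"                     # caps concurrent notebooks and jobs
+     requests.cpu: "8"
+     requests.memory: 32Gi
+     limits.cpu: "16"
+     limits.memory: 64Gi
+ ```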

+ Made up of managers from technology, infrastructure, production operations, development and information security, Francis’s team was able to look at the problem holistically and come up with a solution that made sense for BlackRock. "Our initial straw man was that we were going to build everything using Ansible and run it all using some completely different distributed environment," says Francis. "That would have been absolutely the wrong thing to do. Had we gone off on our own as the dev team and developed this solution, it would have been a very different product. And it would have been very expensive. We would not have gone down the route of running under our existing orchestration system. Because we don’t understand it. These guys [in operations and infrastructure] understand it. Having the multidisciplinary team allowed us to get to the right solutions and that actually meant we didn’t build anywhere near the amount we thought we were going to end up building."

+ In search of a solution in which they could manage usage on a user-by-user level, Francis’s team gravitated to Red Hat’s OpenShift Kubernetes offering. The company had already experimented with other cloud-native environments, but the team liked that Kubernetes was open source, and "we felt the winds were blowing in the direction of Kubernetes long term," says Francis. "Typically we make technology choices that we believe are going to be here in 5-10 years’ time, in some form. And right now, in this space, Kubernetes feels like the one that’s going to be there." Adds Uri Morris, Vice President of Production Operations: "When you see that the non-Google committers to Kubernetes overtook the Google committers, that’s an indicator of the momentum."

+ Once that decision was made, the major challenge was figuring out how to make Kubernetes work within BlackRock’s existing framework. "It’s about understanding how we can operate, manage and support a platform like this, in addition to tacking it onto our existing technology platform," says Project Manager Michael Maskallis. "All the controls we have in place, the change management process, the software development lifecycle, onboarding processes we go through—how can we do all these things?"

+ The first (anticipated) speed bump was working around issues behind BlackRock’s corporate firewalls. "One of our challenges is there are no firewalls in most open source software," says Francis. "So almost all install scripts fail in some bizarre way, and pulling down packages doesn’t necessarily work." The team ran into these types of problems using Minikube and did a few small pushes back to the open source project. + + +
+
+ +
+
+ "Typically we make technology choices that we believe are going to be here in 5-10 years’ time, in some form. And right now, in this space, Kubernetes feels like the one that’s going to be there." +
+
+ +
+
+ There were also questions about service discovery. "You can think of Aladdin as a cloud of services with APIs between them that allows us to build applications rapidly," says Francis. "It’s all on a proprietary message bus, which gives us all sorts of advantages but at the same time, how does that play in a third party [platform]?"

+ Another issue they had to navigate was that in BlackRock’s existing system, the messaging protocol has different instances in the different development, test and production environments. While Kubernetes enables a more DevOps-style model, it didn’t make sense for BlackRock. "I think what we are very proud of is that the ability for us to push into production is still incredibly rapid in this [new] infrastructure, but we have the control points in place, and we didn’t have to disrupt everything," says Francis. "A lot of the cost of this development was thinking how best to leverage our internal tools. So it was less costly than we actually thought it was going to be."

+ The project leveraged tools associated with the messaging bus, for example. "The way that the Kubernetes cluster will talk to our internal messaging platform is through a gateway program, and this gateway program already has built-in checks and throttles," says Morris. "We can use them to control and potentially throttle the requests coming in from Kubernetes’s very elastic infrastructure to the production infrastructure. We’ll continue to go in that direction. It enables us to scale as we need to from the operational perspective."

+ The solution also had to be complementary with BlackRock’s centralized operational support team structure. "The core infrastructure components of Kubernetes are hooked into our existing orchestration framework, which means that anyone in our support team has both control and visibility to the cluster using the existing operational tools," Morris explains. "That means that I don’t need to hire more people."

+ With those points established, the team created a procedure for the project: "We rolled this out first to a development environment, then moved on to a testing environment and then eventually to two production environments, in that sequential order," says Maskallis. "That drove a lot of our learning curve. We have all these moving parts, the software components on the infrastructure side, the software components with Kubernetes directly, the interconnectivity with the rest of the environment that we operate here at BlackRock, and how we connect all these pieces. If we came across issues, we fixed them, and then moved on to the different environments to replicate that until we eventually ended up in our production environment where this particular cluster is supposed to live."

+ The team had weekly one-hour working sessions with all the members (who are located around the world) participating, and smaller breakout or deep-dive meetings focusing on specific technical details. Possible solutions would be reported back to the group and debated the following week. "I think what made it a successful experiment was people had to work to learn, and they shared their experiences with others," says Vice President and Software Developer Fouad Semaan. Then, Francis says, "We gave our engineers the space to do what they’re good at. This hasn’t been top-down." + + +
+
+ +
+
+ "The core infrastructure components of Kubernetes are hooked into our existing orchestration framework, which means that anyone in our support team has both control and visibility to the cluster using the existing operational tools. That means that I don’t need to hire more people." + +
+
+ +
+
+ They were led by one key axiom: To stay focused and avoid scope creep. This meant that they wouldn’t use features that weren’t in the core of Kubernetes and Docker. But if there was a real need, they’d build the features themselves. Luckily, Francis says, "Because of the rapidity of the development, a lot of things we thought we would have to build ourselves have been rolled into the core product. [The package manager Helm is one example]. People have similar problems."

+ By the end of the 100 days, the app was up and running for internal BlackRock users. The initial capacity of 30 users was hit within hours, and quickly increased to 150. "People were immediately all over it," says Francis. In the next phase of this project, they are planning to scale up the cluster to have more capacity.

+ Even more importantly, they now have in-production experience with Kubernetes that they can continue to build on—and a complete framework for rolling out new applications. "We’re going to use this infrastructure for lots of other application workloads as time goes on. It’s not just data science; it’s this style of application that needs the dynamism," says Francis. "Is it the right place to move our core production processes onto? It might be. We’re not at a point where we can say yes or no, but we felt that having real production experience with something like Kubernetes at some form and scale would allow us to understand that. I think we’re 6-12 months away from making a [large scale] decision. We need to gain experience of running the system in production, we need to understand failure modes and how best to manage operational issues."

+ For other big companies considering a project like this, Francis says commitment and dedication are key: "We got the signoff from [senior management] from day one, with the commitment that we were able to get the right people. If I had to isolate what makes something complex like this succeed, I would say senior hands-on people who can actually drive it make a huge difference." With that in place, he adds, "My message to other enterprises like us is you can actually integrate Kubernetes into an existing, well-orchestrated machinery. You don’t have to throw out everything you do. And using Kubernetes made a complex problem significantly easier." + +
+
diff --git a/content/zh/case-studies/bose/bose_featured_logo.png b/content/zh/case-studies/bose/bose_featured_logo.png new file mode 100644 index 0000000000000..d4af69ed7275b Binary files /dev/null and b/content/zh/case-studies/bose/bose_featured_logo.png differ diff --git a/content/zh/case-studies/bose/index.html b/content/zh/case-studies/bose/index.html new file mode 100644 index 0000000000000..d22de2187af9c --- /dev/null +++ b/content/zh/case-studies/bose/index.html @@ -0,0 +1,103 @@ +--- +title: Bose Case Study +linkTitle: Bose +case_study_styles: true +cid: caseStudies +css: /css/style_case_studies.css +logo: bose_featured_logo.png +featured: false +weight: 2 +quote: > + The CNCF Landscape quickly explains what’s going on in all the different areas from storage to cloud providers to automation and so forth. This is our shopping cart to build a cloud infrastructure. We can go choose from the different aisles. +--- + +
+

CASE STUDY:
Bose: Supporting Rapid Development for Millions of IoT Products With Kubernetes + +

+ +
+ +
+ Company  Bose Corporation     Location  Framingham, Massachusetts +     Industry  Consumer Electronics +
+ +
+
+
+
+

Challenge

+ A household name in high-quality audio equipment, Bose has offered connected products for more than five years, and as that demand grew, the infrastructure had to change to support it. "We needed to provide a mechanism for developers to rapidly prototype and deploy services all the way to production pretty fast,” says Lead Cloud Engineer Josh West. In 2016, the company decided to start building a platform from scratch. The primary goal: "To be one to two steps ahead of the different product groups so that we are never scrambling to catch up with their scale,” says Cloud Architecture Manager Dylan O’Mahony. +

+

Solution

+ From the beginning, the team knew it wanted a microservices architecture. After evaluating and prototyping a couple of orchestration solutions, the team decided to adopt Kubernetes for its scaled IoT Platform-as-a-Service running on AWS. The platform, which also incorporated Prometheus monitoring, launched in production in 2017, serving over 3 million connected products from the get-go. Bose has since adopted a number of other CNCF technologies, including Fluentd, CoreDNS, Jaeger, and OpenTracing. +

+

Impact

+ With about 100 engineers onboarded, the platform is now enabling 30,000 non-production deployments across dozens of microservices per year. In 2018, there were 1,250+ production deployments. Just one production cluster holds 1,800 namespaces and 340 worker nodes. "We had a brand new service taken from concept through coding and deployment all the way to production, including hardening, security testing and so forth, in less than two and a half weeks,” says O’Mahony. +
+ +
+
+
+
+ "At Bose we’re building an IoT platform that has enabled our physical products. If it weren’t for Kubernetes and the rest of the CNCF projects being free open source software with such a strong community, we would never have achieved scale, or even gotten to launch on schedule." +

- Josh West, Lead Cloud Engineer, Bose
+
+
+
+
+

A household name in high-quality audio equipment, Bose has offered connected products for more than five years, and as that demand grew, the infrastructure had to change to support it.

+ "We needed to provide a mechanism for developers to rapidly prototype and deploy services all the way to production pretty fast,” says Lead Cloud Engineer Josh West. "There were a lot of cloud capabilities we wanted to provide to support our audio equipment and experiences.”

+In 2016, the company decided to start building an IoT platform from scratch. The primary goal: "To be one to two steps ahead of the different product groups so that we are never scrambling to catch up with their scale,” says Cloud Architecture Manager Dylan O’Mahony. "If they release a new connected product, we want to be already well ahead of being able to handle whatever scale that they’re going to throw at us.”

+From the beginning, the team knew it wanted a microservices architecture and platform as a service. After evaluating and prototyping orchestration solutions, including Mesos and Docker Swarm, the team decided to adopt Kubernetes for its platform running on AWS. Kubernetes was still in 1.5, but already the technology could do much of what the team wanted and needed for the present and the future. For West, that meant having storage and network handled. O’Mahony points to Kubernetes’ portability in case Bose decides to go multi-cloud.

+"Bose is a company that looks out for the long term,” says West. "Going with a quick commercial off-the-shelf solution might’ve worked for that point in time, but it would not have carried us forward, which is what we needed from Kubernetes and the CNCF.” + + +
+
+
+
+ "Everybody on the team thinks in terms of automation, leaning out the processes, getting things done as quickly as possible. When you step back and look at what it means for a 50-plus-year-old speaker company to have that sort of culture, it really is quite incredible, and I think the tools that we use and the foundation that we’ve built with them is a huge piece of that."

- Dylan O’Mahony, Cloud Architecture Manager, Bose
+ +
+
+
+
+ The team spent time working on choosing tooling to make the experience easier for developers. "Our developers interact with tools provided by our Ops team, and the Ops team run all of their tooling on top of Kubernetes,” says O’Mahony. "We try not to make direct Kubernetes access the only way. In fact, ideally, our developers wouldn’t even need to know that they’re running on Kubernetes.”

+ The platform, which also incorporated Prometheus monitoring from the beginning, backdoored its way into production in 2017, serving over 3 million connected products from the get-go. "Even though the speakers and the products that we were designing this platform for were still quite a ways away from being launched, we did have some connected speakers on the market,” says O’Mahony. "We basically started to point certain features of those speakers and the apps that go with those speakers to this platform.”

+ Today, just one of Bose’s production clusters holds 1,800 namespaces/discrete services and 340 nodes. With about 100 engineers now onboarded, the platform infrastructure is enabling 30,000 non-production deployments across dozens of microservices per year. In 2018, there were 1,250+ production deployments. It’s a staggering improvement over some of Bose’s previous deployment processes, which supported far fewer deployments and services. +
+
+
+
+ "The CNCF Landscape quickly explains what’s going on in all the different areas from storage to cloud providers to automation and so forth. This is our shopping cart to build a cloud infrastructure. We can go choose from the different aisles."

- Josh West, Lead Cloud Engineer, Bose
+
+
+ +
+
+ "We had a brand new service deployed from concept through coding and deployment all the way to production, including hardening, security testing and so forth, in less than two and a half weeks,” says O’Mahony. "Everybody thinks in terms of automation, leaning out the processes, getting things done as quickly as possible. When you step back and look at what it means for a 50-plus-year-old speaker company to have that sort of culture, it really is quite incredible, and I think the tools that we use and the foundation that we’ve built is a huge piece of that.”

+ Many of those technologies—such as Fluentd, CoreDNS, Jaeger, and OpenTracing—come from the CNCF Landscape, which West and O’Mahony have relied upon throughout Bose’s cloud native journey. "The CNCF Landscape quickly explains what’s going on in all the different areas from storage to cloud providers to automation and so forth,” says West. "This is our shopping cart to build a cloud infrastructure. We can go choose from the different aisles.”

+ And, he adds, "If it weren’t for Kubernetes and the rest of the CNCF projects being free open source software with such a strong community, we would never have achieved scale, or even gotten to launch on schedule.”

+ Another benefit of going cloud native: "We are even attracting much more talent into Bose because we’re so involved with the CNCF Landscape,” says West. (Yes, they’re hiring.) "It’s just enabled so many people to do so many great things and really brought Bose into the future of cloud.” + + +
+ +
+
+"We have a lot going on to support many more of our business units at Bose in addition to the consumer electronics division, which we currently do. It’s only because of the cloud native landscape and the tools and the features that are available that we can provide such a fantastic cloud platform for all the developers and divisions that are trying to enable some pretty amazing experiences."

- Dylan O’Mahony, Cloud Architecture Manager, Bose
+
+ +
+ In the coming year, the team wants to work on service mesh and serverless, as well as expansion around the world. "Getting our latency down by going multi-region is going to be a big focus for us,” says O’Mahony. "In order to make sure that our customers in Japan, Australia, and everywhere else are having a good experience, we want to have points of presence closer to them. It’s never been done at Bose before.”

+ That won’t stop them, because the team is all about lofty goals. "We want to get to billions of connected products!” says West. "We have a lot going on to support many more of our business units at Bose in addition to the consumer electronics division, which we currently do. It’s only because of the cloud native landscape and the tools and the features that are available that we can provide such a fantastic cloud platform for all the developers and divisions that are trying to enable some pretty amazing experiences.”

+ In fact, given the scale the platform is already supporting, says O’Mahony, "doing anything other than Kubernetes, I think, would be folly at this point.” + + +
+
+ + diff --git a/content/zh/case-studies/box/box_featured.png b/content/zh/case-studies/box/box_featured.png new file mode 100644 index 0000000000000..fc6dec602af17 Binary files /dev/null and b/content/zh/case-studies/box/box_featured.png differ diff --git a/content/zh/case-studies/box/box_logo.png b/content/zh/case-studies/box/box_logo.png new file mode 100644 index 0000000000000..b401dec6248c6 Binary files /dev/null and b/content/zh/case-studies/box/box_logo.png differ diff --git a/content/zh/case-studies/box/box_small.png b/content/zh/case-studies/box/box_small.png new file mode 100644 index 0000000000000..105b66a5832bb Binary files /dev/null and b/content/zh/case-studies/box/box_small.png differ diff --git a/content/zh/case-studies/box/index.html b/content/zh/case-studies/box/index.html new file mode 100644 index 0000000000000..bead8eb01a5bf --- /dev/null +++ b/content/zh/case-studies/box/index.html @@ -0,0 +1,114 @@ +--- +title: Box Case Study +case_study_styles: true +cid: caseStudies +css: /css/style_box.css +video: https://www.youtube.com/embed/of45hYbkIZs?autoplay=1 +quote: > + Kubernetes has the opportunity to be the new cloud platform. The amount of innovation that's going to come from being able to standardize on Kubernetes as a platform is incredibly exciting - more exciting than anything I've seen in the last 10 years of working on the cloud. + +--- + +
+

CASE STUDY:
+
An Early Adopter Envisions + a New Cloud Platform
+

+
+ + +
+ Company  Box     Location  Redwood City, California     Industry  Technology +
+ +
+ +
+ +
+
+ +

Challenge

+ Founded in 2005, the enterprise content management company allows its more than 50 million users to manage content in the cloud. Box was built primarily with bare metal inside the company’s own data centers, with a monolithic PHP code base. As the company was expanding globally, it needed to focus on "how we run our workload across many different cloud infrastructures from bare metal to public cloud," says Sam Ghods, Cofounder and Services Architect of Box. "It’s been a huge challenge because different clouds, especially bare metal, have very different interfaces." +
+
+ +
+

Solution

+ Over the past couple of years, Box has been decomposing its infrastructure into microservices, and became an early adopter of, as well as contributor to, Kubernetes container orchestration. Kubernetes, Ghods says, has allowed Box’s developers to "target a universal set of concepts that are portable across all clouds."

+ +

Impact

+ "Before Kubernetes," Ghods says, "our infrastructure was so antiquated it was taking us more than six months to deploy a new microservice. Today, a new microservice takes less than five days to deploy. And we’re working on getting it to an hour." +
+
+ +
+ +
+
+ "We looked at a lot of different options, but Kubernetes really stood out....the fact that on day one it was designed to run on bare metal just as well as Google Cloud meant that we could actually migrate to it inside of our data centers, and then use those same tools and concepts to run across public cloud providers as well."

- SAM GHODS, COFOUNDER AND SERVICES ARCHITECT OF BOX +
+
+ +
+ +
+

In the summer of 2014, Box was feeling the pain of a decade’s worth of hardware and software infrastructure that wasn’t keeping up with the company’s needs.

+ + A platform that allows its more than 50 million users (including governments and big businesses like General Electric) to manage and share content in the cloud, Box was originally a PHP monolith of millions of lines of code built exclusively with bare metal inside of its own data centers. It had already begun to slowly chip away at the monolith, decomposing it into microservices. And "as we’ve been expanding into regions around the globe, and as the public cloud wars have been heating up, we’ve been focusing a lot more on figuring out how we run our workload across many different environments and many different cloud infrastructure providers," says Box Cofounder and Services Architect Sam Ghods. "It’s been a huge challenge thus far because all these different providers, especially bare metal, have very different interfaces and ways in which you work with them."

+ Box’s cloud native journey accelerated that June, when Ghods attended DockerCon. The company had come to the realization that it could no longer run its applications only off bare metal, and was researching containerizing with Docker, virtualizing with OpenStack, and supporting public cloud.

+ At that conference, Google announced the release of its Kubernetes container management system, and Ghods was won over. "We looked at a lot of different options, but Kubernetes really stood out, especially because of the incredibly strong team of Borg veterans and the vision of having a completely infrastructure-agnostic way of being able to run cloud software," he says, referencing Google’s internal container orchestrator Borg. "The fact that on day one it was designed to run on bare metal just as well as Google Cloud meant that we could actually migrate to it inside of our data centers, and then use those same tools and concepts to run across public cloud providers as well."

+ Another plus: Ghods liked that Kubernetes has a universal set of API objects like pod, service, replica set and deployment object, which created a consistent surface to build tooling against. "Even PaaS layers like OpenShift or Deis that build on top of Kubernetes still treat those objects as first-class principles," he says. "We were excited about having these abstractions shared across the entire ecosystem, which would result in a lot more momentum than we saw in other potential solutions."
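+ Those shared abstractions are concrete API objects. As a hedged illustration of the consistent surface Ghods describes (hypothetical names and image, written in today’s apps/v1 syntax rather than the pre-beta API Box started on), a Deployment manages replicated pods while a Service gives them one stable address for tooling to target:
+ ```
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: api-checker
+ spec:
+   replicas: 2
+   selector:
+     matchLabels:
+       app: api-checker
+   template:
+     metadata:
+       labels:
+         app: api-checker
+     spec:
+       containers:
+         - name: checker
+           image: registry.example.com/api-checker:1.0   # placeholder image
+ ---
+ apiVersion: v1
+ kind: Service
+ metadata:
+   name: api-checker
+ spec:
+   selector:
+     app: api-checker
+   ports:
+     - port: 80                     # stable port clients connect to
+       targetPort: 8080             # assumed container port
+ ```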

+ Box deployed Kubernetes in a cluster in a production data center just six months later. Kubernetes was then still pre-beta, on version 0.11. They started small: The very first thing Ghods’s team ran on Kubernetes was a Box API checker that confirms Box is up. "That was just to write and deploy some software to get the whole pipeline functioning," he says. Next came some daemons that process jobs, which was "nice and safe because if they experienced any interruptions, we wouldn’t fail synchronous incoming requests from customers." + +
+
+ +
+
+ "As we’ve been expanding into regions around the globe, and as the public cloud wars have been heating up, we’ve been focusing a lot more on figuring out how we [can have Kubernetes help] run our workload across many different environments and many different cloud infrastructure providers." +
+
+ +
+
+ The first live service, which the team could route to and ask for information, was launched a few months later. At that point, Ghods says, "We were comfortable with the stability of the Kubernetes cluster. We started to port some services over, then we would increase the cluster size and port a few more, and that’s ended up to about 100 servers in each data center that are dedicated purely to Kubernetes. And that’s going to be expanding a lot over the next 12 months, probably to many hundreds if not thousands."

+ While observing teams who began to use Kubernetes for their microservices, "we immediately saw an uptick in the number of microservices being released," Ghods notes. "There was clearly a pent-up demand for a better way of building software through microservices, and the increase in agility helped our developers be more productive and make better architectural choices."

"There was clearly a pent-up demand for a better way of building software through microservices, and the increase in agility helped our developers be more productive and make better architectural choices."

+ Ghods reflects that as early adopters, Box had a different journey from what companies experience now. "We were definitely in lock step, waiting for certain things to stabilize or features to get released," he says. "In the early days we were doing a lot of contributions [to components such as kubectl apply] and waiting for Kubernetes to release each of them, and then we’d upgrade, contribute more, and go back and forth several times. The entire project took about 18 months from our first real deployment on Kubernetes to having general availability. If we did that exact same thing today, it would probably be no more than six months."

+ In any case, Box didn’t have to make too many modifications to Kubernetes for it to work for the company. "The vast majority of the work our team has done to implement Kubernetes at Box has been making it work inside of our existing (and often legacy) infrastructure," says Ghods, "such as upgrading our base operating system from RHEL6 to RHEL7 or integrating it into Nagios, our monitoring infrastructure. But overall Kubernetes has been remarkably flexible with fitting into many of our constraints, and we’ve been running it very successfully on our bare metal infrastructure."

+ Perhaps the bigger challenge for Box was a cultural one. "Kubernetes, and cloud native in general, represents a pretty big paradigm shift, and it’s not very incremental," Ghods says. "We’re essentially making this pitch that Kubernetes is going to solve everything because it does things the right way and everything is just suddenly better. But it’s important to keep in mind that it’s not nearly as proven as many other solutions out there. You can’t say how long this or that company took to do it because there just aren’t that many yet. Our team had to really fight for resources because our project was a bit of a moonshot."
+ "The vast majority of the work our team has done to implement Kubernetes at Box has been making it work inside of our existing [and often legacy] infrastructure....overall Kubernetes has been remarkably flexible with fitting into many of our constraints, and we’ve been running it very successfully on our bare metal infrastructure." +
+
+ +
+
+ Having learned from experience, Ghods offers these two pieces of advice for companies going through similar challenges:

1. Deliver early and often.

Service discovery was a huge problem for Box, and the team had to decide whether to build an interim solution or wait for Kubernetes to natively satisfy Box’s unique requirements. After much debate, "we just started focusing on delivering something that works, and then dealing with potentially migrating to a more native solution later," Ghods says. "The above-all-else target for the team should always be to serve real production use cases on the infrastructure, no matter how trivial. This helps keep the momentum going both for the team itself and for the organizational perception of the project."


2. Keep an open mind about what your company has to abstract away from developers and what it doesn’t.

Early on, the team built an abstraction on top of Dockerfiles to help ensure that images had the right security updates. This turned out to be superfluous work, since container images are considered immutable and you can easily scan them post-build to ensure they do not contain vulnerabilities. Because managing infrastructure through containerization is such a discontinuous leap, it’s better to start by interacting directly with the native tools and learning their unique advantages and caveats. An abstraction should be built only after a practical need for it arises.

+ In the end, the impact has been powerful. "Before Kubernetes," Ghods says, "our infrastructure was so antiquated it was taking us more than six months to deploy a new microservice. Now a new microservice takes less than five days to deploy. And we’re working on getting it to an hour. Granted, much of that six months was due to how broken our systems were, but bare metal is intrinsically a difficult platform to support unless you have a system like Kubernetes to help manage it."

+ By Ghods’s estimate, Box is still several years away from his goal of being a 90-plus percent Kubernetes shop. "We’re very far along on having a mission-critical, stable Kubernetes deployment that provides a lot of value," he says. "Right now about five percent of all of our compute runs on Kubernetes, and I think in the next six months we’ll likely be between 20 to 50 percent. We’re working hard on enabling all stateless service use cases, and will shift our focus to stateful services after that."
+ "Ghods predicts that Kubernetes has the opportunity to be the new cloud platform. '...because it’s a never-before-seen level of automation and intelligence surrounding infrastructure that is portable and agnostic to every way you can run your infrastructure.'" +
+
+ +
+
+ In fact, that’s what he envisions across the industry: Ghods predicts that Kubernetes has the opportunity to be the new cloud platform. Kubernetes provides an API consistent across different cloud platforms including bare metal, and "I don’t think people have seen the full potential of what’s possible when you can program against one single interface," he says. "The same way AWS changed infrastructure so that you don’t have to think about servers or cabinets or networking equipment anymore, Kubernetes enables you to focus exclusively on the containers that you’re running, which is pretty exciting. That’s the vision."

+ Ghods points to projects that are already in development or recently released for Kubernetes as a cloud platform: cluster federation, the Dashboard UI, and CoreOS’s etcd operator. "I honestly believe it’s the most exciting thing I’ve seen in cloud infrastructure," he says, "because it’s a never-before-seen level of automation and intelligence surrounding infrastructure that is portable and agnostic to every way you can run your infrastructure."

+ Box, with its early decision to use bare metal, embarked on its Kubernetes journey out of necessity. But Ghods says that even if companies don’t have to be agnostic about cloud providers today, Kubernetes may soon become the industry standard, as more and more tooling and extensions are built around the API.

+ "The same way it doesn’t make sense to deviate from Linux because it’s such a standard," Ghods says, "I think Kubernetes is going down the same path. It is still early days—the documentation still needs work and the user experience for writing and publishing specs to the Kubernetes clusters is still rough. When you’re on the cutting edge you can expect to bleed a little. But the bottom line is, this is where the industry is going. Three to five years from now it’s really going to be shocking if you run your infrastructure any other way." +
+
diff --git a/content/zh/case-studies/box/video.png b/content/zh/case-studies/box/video.png new file mode 100644 index 0000000000000..4c61e7440fc48 Binary files /dev/null and b/content/zh/case-studies/box/video.png differ diff --git a/content/zh/case-studies/capital-one/capitalone_featured_logo.png b/content/zh/case-studies/capital-one/capitalone_featured_logo.png new file mode 100644 index 0000000000000..f57c7697e36fd Binary files /dev/null and b/content/zh/case-studies/capital-one/capitalone_featured_logo.png differ diff --git a/content/zh/case-studies/capital-one/index.html b/content/zh/case-studies/capital-one/index.html new file mode 100644 index 0000000000000..773db4869e4e3 --- /dev/null +++ b/content/zh/case-studies/capital-one/index.html @@ -0,0 +1,96 @@ +--- +title: Capital One Case Study +case_study_styles: true +cid: caseStudies +css: /css/style_case_studies.css +--- + +
+

CASE STUDY:
Supporting Fast Decisioning Applications with Kubernetes

+ Company  Capital One     Location  McLean, Virginia     Industry  Retail banking
+

Challenge

+ The team set out to build a provisioning platform for Capital One applications deployed on AWS that use streaming, big-data decisioning, and machine learning. One of these applications handles millions of transactions a day; some deal with critical functions like fraud detection and credit decisioning. The key considerations: resilience and speed—as well as full rehydration of the cluster from base AMIs.
+

Solution

+ The decision to run Kubernetes "is very strategic for us," says John Swift, Senior Director Software Engineering. "We use Kubernetes as a substrate or an operating system, if you will. There’s a degree of affinity in our product development."
+ +
+ +

Impact

+ "Kubernetes is a significant productivity multiplier," says Lead Software Engineer Keith Gasser, adding that to run the platform without Kubernetes would "easily see our costs triple, quadruple what they are now for the amount of pure AWS expense." Time to market has been improved as well: "Now, a team can come to us and we can have them up and running with a basic decisioning app in a fortnight, which before would have taken a whole quarter, if not longer." Deployments increased by several orders of magnitude. Plus, the rehydration/cluster-rebuild process, which took a significant part of a day to do manually, now takes a couple hours with Kubernetes automation and declarative configuration. + + +
+ +
+
+
+
+

+"With the scalability, the management, the coordination, Kubernetes really empowers us and gives us more time back than we had before." — Jamil Jadallah, Scrum Master +
+
+ +
+
+

+ As a top 10 U.S. retail bank, Capital One has applications that handle millions of transactions a day. Big-data decisioning—for fraud detection, credit approvals and beyond—is core to the business. To support the teams that build applications with those functions for the bank, the cloud team led by Senior Director Software Engineering John Swift embraced Kubernetes for its provisioning platform. "Kubernetes and its entire ecosystem are very strategic for us," says Swift. "We use Kubernetes as a substrate or an operating system, if you will. There’s a degree of affinity in our product development."

+ Almost two years ago, the team embarked on this journey by first working with Docker. Then came Kubernetes. "We wanted to put streaming services into Kubernetes as one feature of the workloads for fast decisioning, and to be able to do batch alongside it," says Lead Software Engineer Keith Gasser. "Once the data is streamed and batched, there are so many tool sets in Flink that we use for decisioning. We want to provide the tools in the same ecosystem, in a consistent way, rather than have a large custom snowflake ecosystem where every tool needs its own custom deployment. Kubernetes gives us the ability to bring all of these together, so the richness of the open source and even the license community dealing with big data can be corralled."
+
+
+
+ "We want to provide the tools in the same ecosystem, in a consistent way, rather than have a large custom snowflake ecosystem where every tool needs its own custom deployment. Kubernetes gives us the ability to bring all of these together, so the richness of the open source and even the license community dealing with big data can be corralled." + + +
+
+
+
+ In this first year, the impact has already been great. "Time to market is really huge for us," says Gasser. "Especially with fraud, you have to be very nimble in the way you respond to threats in the marketplace—being able to add and push new rules, detect new patterns of behavior, detect anomalies in account and transaction flows." With Kubernetes, "a team can come to us and we can have them up and running with a basic decisioning app in a fortnight, which before would have taken a whole quarter, if not longer. Kubernetes is a manifold productivity multiplier."

+ Teams now have the tools to be autonomous in their deployments, and as a result, deployments have increased by two orders of magnitude. "And that was with just seven dedicated resources, without needing a whole group sitting there watching everything," says Scrum Master Jamil Jadallah. "That’s a huge cost savings. With the scalability, the management, the coordination, Kubernetes really empowers us and gives us more time back than we had before."
+
+
+
+ With Kubernetes, "a team can come to us and we can have them up and running with a basic decisioning app in a fortnight, which before would have taken a whole quarter, if not longer. Kubernetes is a manifold productivity multiplier."
+
+ +
+
+ Kubernetes has also been a great time-saver for Capital One’s required periodic "rehydration" of clusters from base AMIs. To minimize the attack vulnerability profile for applications in the cloud, "Our entire clusters get rebuilt from scratch periodically, with new fresh instances and virtual server images that are patched with the latest and greatest security patches," says Gasser. This process used to take the better part of a day, and personnel, to do manually. It’s now a quick Kubernetes job.
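+ Declarative configuration is what turns that rebuild into a quick job: the cluster’s desired state lives in version-controlled manifests, so repopulating fresh instances is just a matter of reapplying them. A hypothetical sketch (the layout and names below are assumptions, not Capital One’s actual setup):

```yaml
# After fresh nodes come up from newly patched base AMIs, the same
# version-controlled manifests are reapplied in one step, e.g.:
#
#   kubectl apply -R -f manifests/
#
# manifests/decisioning.yaml (illustrative):
apiVersion: v1
kind: Namespace
metadata:
  name: decisioning
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stream-scorer              # hypothetical workload name
  namespace: decisioning
spec:
  replicas: 3
  selector:
    matchLabels:
      app: stream-scorer
  template:
    metadata:
      labels:
        app: stream-scorer
    spec:
      containers:
      - name: scorer
        image: registry.example.com/stream-scorer:2.4   # hypothetical image
```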

+ Savings extend to both capital and operating expenses. "It takes very little to get into Kubernetes because it’s all open source," Gasser points out. "We went the DIY route for building our cluster, and we definitely like the flexibility of being able to embrace the latest from the community immediately without waiting for a downstream company to do it. There’s capex related to those licenses that we don’t have to pay for. Moreover, there’s capex savings for us from some of the proprietary software that we get to sunset in our particular domain. So that goes onto our ledger in a positive way as well." (Some of those open source technologies include Prometheus, Fluentd, gRPC, Istio, CNI, and Envoy.)
+ +
+
+ "If we had to do all of this without Kubernetes, on underlying cloud services, I could easily see our costs triple, quadruple what they are now for the amount of pure AWS expense. That doesn’t account for personnel to deploy and maintain all the additional infrastructure." +
+
+ +
+ And on the opex side, Gasser says, the savings are high. "We run dozens of services, we have scores of pods, many daemon sets, and since we’re data-driven, we take advantage of EBS-backed volume claims for all of our stateful services. If we had to do all of this without Kubernetes, on underlying cloud services, I could easily see our costs triple, quadruple what they are now for the amount of pure AWS expense. That doesn’t account for personnel to deploy and maintain all the additional infrastructure."
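+ An "EBS-backed volume claim" of the sort Gasser mentions is a PersistentVolumeClaim against an EBS-backed storage class; a minimal sketch, assuming the common gp2 class on AWS clusters (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scorer-state               # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce                  # an EBS volume attaches to one node at a time
  storageClassName: gp2            # assumed EBS-backed class on AWS clusters
  resources:
    requests:
      storage: 100Gi
```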

+ The team is confident that the benefits will continue to multiply—without a steep learning curve for the engineers being exposed to the new technology. "As we onboard additional tenants in this ecosystem, I think the need for folks to understand Kubernetes may not necessarily go up. In fact, I think it goes down, and that’s good," says Gasser. "Because that really demonstrates the scalability of the technology. You start to reap the benefits, and they can concentrate on all the features they need to build for great decisioning in the business—fraud decisions, credit decisions—and not have to worry about, ‘Is my AWS server broken? Is my pod not running?’"
+ +
diff --git a/content/zh/case-studies/cern/cern_featured_logo.png b/content/zh/case-studies/cern/cern_featured_logo.png new file mode 100644 index 0000000000000..b873b828b1d41 Binary files /dev/null and b/content/zh/case-studies/cern/cern_featured_logo.png differ diff --git a/content/zh/case-studies/cern/index.html b/content/zh/case-studies/cern/index.html new file mode 100644 index 0000000000000..9bd797024595e --- /dev/null +++ b/content/zh/case-studies/cern/index.html @@ -0,0 +1,93 @@ +--- +title: CERN Case Study +linkTitle: cern +case_study_styles: true +cid: caseStudies +css: /css/style_case_studies.css +logo: cern_featured_logo.png +--- + +
+

CASE STUDY: CERN
CERN: Processing Petabytes of Data More Efficiently with Kubernetes

+ Company  CERN     Location  Geneva, Switzerland     Industry  Particle physics research
+

Challenge

+ At CERN, the European Organization for Nuclear Research, physicists conduct experiments to learn about fundamental science. In its particle accelerators, "we accelerate protons to very high energy, close to the speed of light, and we make the two beams of protons collide," says CERN Software Engineer Ricardo Rocha. "The end result is a lot of data that we have to process." CERN currently stores 330 petabytes of data in its data centers, and an upgrade of its accelerators expected in the next few years will drive that number up by 10x. Additionally, the organization experiences extreme peaks in its workloads during periods prior to big conferences, and needs its infrastructure to scale to those peaks. "We want to have a more hybrid infrastructure, where we have our on premise infrastructure but can make use of public clouds temporarily when these peaks come up," says Rocha. "We’ve been looking to new technologies that can help improve our efficiency in our infrastructure so that we can dedicate more of our resources to the actual processing of the data."

+

Solution

+ CERN’s technology team embraced containerization and cloud native practices, choosing Kubernetes for orchestration, Helm for deployment, Prometheus for monitoring, and CoreDNS for DNS resolution inside the clusters. Kubernetes federation has allowed the organization to run some production workloads both on premise and in public clouds.

+

Impact

+ "Kubernetes gives us the full automation of the application," says Rocha. "It comes with built-in monitoring and logging for all the applications and the workloads that deploy in Kubernetes. This is a massive simplification of our current deployments." The time to deploy a new cluster for a complex distributed storage system has gone from more than 3 hours to less than 15 minutes. Adding new nodes to a cluster used to take more than an hour; now it takes less than 2 minutes. The time it takes to autoscale replicas for system components has decreased from more than an hour to less than 2 minutes. Initially, virtualization gave 20% overhead, but with tuning this was reduced to ~5%. Moving to Kubernetes on bare metal would get this to 0%. Not having to host virtual machines is expected to also get 10% of memory capacity back. + +
+ +
+
+
+
+ "Kubernetes is something we can relate to very much because it’s naturally distributed. What it gives us is a uniform API across heterogeneous resources to define our workloads. This is something we struggled with a lot in the past when we want to expand our resources outside our infrastructure." +

- Ricardo Rocha, Software Engineer, CERN
+
+
+
+
+

With a mission of researching fundamental science, and a stable of extremely large machines, the European Organization for Nuclear Research (CERN) operates at what can only be described as hyperscale.

+ Experiments are conducted in particle accelerators, the biggest of which is 27 kilometers in circumference. "We accelerate protons to very high energy, to close to the speed of light, and we make the two beams of protons collide in well-defined places," says CERN Software Engineer Ricardo Rocha. "We build experiments around these places where we do the collisions. The end result is a lot of data that we have to process."

+ And he does mean a lot: CERN currently stores and processes 330 petabytes of data—gathered from 4,300 projects and 3,300 users—using 10,000 hypervisors and 320,000 cores in its data centers.

+ Over the years, the CERN technology department has built a large computing infrastructure, based on OpenStack private clouds, to help the organization’s physicists analyze and treat all this data. The organization experiences extreme peaks in its workloads. "Very often, just before conferences, physicists want to do an enormous amount of extra analysis to publish their papers, and we have to scale to these peaks, which means overcommitting resources in some cases," says Rocha. "We want to have a more hybrid infrastructure, where we have our on premise infrastructure but can make use of public clouds temporarily when these peaks come up."

+ Additionally, a few years ago, CERN announced that it would be doing a big upgrade of its accelerators, which will mean a ten-fold increase in the amount of data that can be collected. "So we’ve been looking to new technologies that can help improve our efficiency in our infrastructure, so that we can dedicate more of our resources to the actual processing of the data," says Rocha.
+
+
+
+ "Before, the tendency was always: ‘I need this, I get a couple of developers, and I implement it.’ Right now it’s ‘I need this, I’m sure other people also need this, so I’ll go and ask around.’ The CNCF is a good source because there’s a very large catalog of applications available. It’s very hard right now to justify developing a new product in-house. There is really no real reason to keep doing that. It’s much easier for us to try it out, and if we see it’s a good solution, we try to reach out to the community and start working with that community."

- Ricardo Rocha, Software Engineer, CERN
+ +
+
+
+
+ Rocha’s team started looking at Kubernetes and containerization in the second half of 2015. "We’ve been using distributed infrastructures for decades now," says Rocha. "Kubernetes is something we can relate to very much because it’s naturally distributed. What it gives us is a uniform API across heterogeneous resources to define our workloads. This is something we struggled with a lot in the past when we want to expand our resources outside our infrastructure."

+ The team created a prototype system for users to deploy their own Kubernetes cluster in CERN’s infrastructure, and spent six months validating the use cases and making sure that Kubernetes integrated with CERN’s internal systems. The main use case is batch workloads, which represent more than 80% of resource usage at CERN. (One single project that does most of the physics data processing and analysis alone consumes 250,000 cores.) "This is something where the investment in simplification of the deployment, logging, and monitoring pays off very quickly," says Rocha. Other use cases include Spark-based data analysis and machine learning to improve physics analysis. "The fact that most of these technologies integrate very well with Kubernetes makes our lives easier," he adds.
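+ Batch work of that kind maps naturally onto the Kubernetes Job object, which is much of why the "simplification of the deployment" pays off. A hedged sketch, with a name, image, and sizes that are illustrative rather than CERN’s actual jobs:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: reco-batch                 # hypothetical job name
spec:
  parallelism: 50                  # pods running at once
  completions: 1000                # total work items to finish
  template:
    spec:
      restartPolicy: OnFailure     # retry a failed work item in place
      containers:
      - name: worker
        image: registry.example.com/reco-worker:1.0   # hypothetical image
        resources:
          requests:
            cpu: "1"
            memory: 2Gi
```

+ The parallelism and completions fields let one spec fan a queue of work items across the cluster and retry failures, without any custom scheduler code.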

+ The system went into production in October 2016, also using Helm for deployment, Prometheus for monitoring, and CoreDNS for DNS resolution within the cluster. "One thing that Kubernetes gives us is the full automation of the application," says Rocha. "So it comes with built-in monitoring and logging for all the applications and the workloads that deploy in Kubernetes. This is a massive simplification of our current deployments." The time to deploy a new cluster for a complex distributed storage system has gone from more than 3 hours to less than 15 minutes.

Adding new nodes to a cluster used to take more than an hour; now it takes less than 2 minutes. The time it takes to autoscale replicas for system components has decreased from more than an hour to less than 2 minutes.
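+ That autoscaling is also expressed declaratively; a minimal HorizontalPodAutoscaler sketch using today’s autoscaling/v2 API (the target Deployment and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-frontend               # hypothetical target
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # add replicas above ~70% average CPU
```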
+
+
+
+ "With Kubernetes, there’s a well-established technology and a big community that we can contribute to. It allows us to do our physics analysis without having to focus so much on the lower level software. This is just exciting. We are looking forward to keep contributing to the community and collaborating with everyone."

- Ricardo Rocha, Software Engineer, CERN
+
+
+ +
+
+ Rocha points out that the metric used in the particle accelerators may be events per second, but in reality "it’s how fast and how much of the data we can process that actually counts." And efficiency has certainly been improved with Kubernetes. Initially, virtualization gave 20% overhead, but with tuning this was reduced to ~5%. Moving to Kubernetes on bare metal would get this to 0%. Not having to host virtual machines is expected to also get 10% of memory capacity back.

+ Kubernetes federation, which CERN has been using for a portion of its production workloads since February 2018, has allowed the organization to adopt a hybrid cloud strategy. And it was remarkably simple to do. "We had a summer intern working on federation," says Rocha. "For many years, I’ve been developing distributed computing software, which took like a decade and a lot of effort from a lot of people to stabilize and make sure it works. And for our intern, in a couple of days he was able to demo to me and my team that we had a cluster at CERN and a few clusters outside in public clouds that were federated together and that we could submit workloads to. This was shocking for us. It really shows the power of using this kind of well-established technologies."

+ With such results, adoption of Kubernetes has made rapid gains at CERN, and the team is eager to give back to the community. "If we look back into the ’90s and early 2000s, there were not a lot of companies focusing on systems that have to scale to this kind of size, storing petabytes of data, analyzing petabytes of data," says Rocha. "The fact that Kubernetes is supported by such a wide community and different backgrounds, it motivates us to contribute back." + +
+ +
+
"This means that the physicist can build his or her analysis and publish it in a repository, share it with colleagues, and in 10 years redo the same analysis with new data. If we looked back even 10 years, this was just a dream."

- Ricardo Rocha, Software Engineer, CERN
+
+ +
+ These new technologies aren’t just enabling infrastructure improvements. CERN also uses the Kubernetes-based Reana/Recast platform for reusable analysis, which is "the ability to define physics analysis as a set of workflows that are fully containerized in one single entry point," says Rocha. "This means that the physicist can build his or her analysis and publish it in a repository, share it with colleagues, and in 10 years redo the same analysis with new data. If we looked back even 10 years, this was just a dream."

+ All of these things have changed the culture at CERN considerably. A decade ago, "The tendency was always: ‘I need this, I get a couple of developers, and I implement it,’" says Rocha. "Right now it’s ‘I need this, I’m sure other people also need this, so I’ll go and ask around.’ The CNCF is a good source because there’s a very large catalog of applications available. It’s very hard right now to justify developing a new product in-house. There is really no real reason to keep doing that. It’s much easier for us to try it out, and if we see it’s a good solution, we try to reach out to the community and start working with that community."
+
diff --git a/content/zh/case-studies/chinaunicom/chinaunicom_featured_logo.png b/content/zh/case-studies/chinaunicom/chinaunicom_featured_logo.png new file mode 100644 index 0000000000000..f90ff1e509c85 Binary files /dev/null and b/content/zh/case-studies/chinaunicom/chinaunicom_featured_logo.png differ diff --git a/content/zh/case-studies/chinaunicom/index.html b/content/zh/case-studies/chinaunicom/index.html new file mode 100644 index 0000000000000..675273c47fa5d --- /dev/null +++ b/content/zh/case-studies/chinaunicom/index.html @@ -0,0 +1,99 @@ +--- +title: China Unicom Case Study + +linkTitle: chinaunicom +case_study_styles: true +cid: caseStudies +css: /css/style_case_studies.css +logo: chinaunicom_featured_logo.png +featured: true +weight: 1 +quote: > + Kubernetes has improved our experience using cloud infrastructure. There is currently no alternative technology that can replace it. +--- + +
+

CASE STUDY:
China Unicom: How China Unicom Leveraged Kubernetes to Boost Efficiency
and Lower IT Costs

+ +
+ Company  China Unicom     Location  Beijing, China     Industry  Telecom
+

Challenge

+ China Unicom is one of the top three telecom operators in China, and to serve its 300 million users, the company runs several data centers with thousands of servers in each, using Docker containerization and VMware and OpenStack infrastructure since 2016. Unfortunately, "the resource utilization rate was relatively low," says Chengyu Zhang, Group Leader of Platform Technology R&D, "and we didn’t have a cloud platform to accommodate our hundreds of applications." Formerly an entirely state-owned company, China Unicom has in recent years taken private investment from BAT (Baidu, Alibaba, Tencent) and JD.com, and is now focusing on internal development using open source technology, rather than commercial products. As such, Zhang’s China Unicom Lab team began looking for open source orchestration for its cloud infrastructure.

+

Solution

+ Because of its rapid growth and mature open source community, Kubernetes was a natural choice for China Unicom. The company’s Kubernetes-enabled cloud platform now hosts 50 microservices and all new development going forward. "Kubernetes has improved our experience using cloud infrastructure," says Zhang. "There is currently no alternative technology that can replace it." China Unicom also uses Istio for its microservice framework, as well as Envoy, CoreDNS, and Fluentd.

+

Impact

+ At China Unicom, Kubernetes has improved both operational and development efficiency. Resource utilization has increased by 20-50%, lowering IT infrastructure costs, and deployment time has gone from a couple of hours to 5-10 minutes. "This is mainly because of the self-healing and scalability, so we can increase our efficiency in operation and maintenance," Zhang says. "For example, we currently have only five people maintaining our multiple systems. We could never imagine we can achieve this scalability in such a short time."
+ +
+
+
+
+ "Kubernetes has improved our experience using cloud infrastructure. There is currently no alternative technology that can replace it." +

- Chengyu Zhang, Group Leader of Platform Technology R&D, China Unicom
+
+
+
+
+

With more than 300 million users, China Unicom is one of the country’s top three telecom operators.

+ Behind the scenes, the company runs multiple data centers with thousands of servers in each, using Docker containerization and VMware and OpenStack infrastructure since 2016. Unfortunately, "the resource utilization rate was relatively low," says Chengyu Zhang, Group Leader of Platform Technology R&D, "and we didn’t have a cloud platform to accommodate our hundreds of applications."

+ Zhang’s team, which is responsible for new technology, R&D and platforms, set out to find an IT management solution. Formerly an entirely state-owned company, China Unicom has in recent years taken private investment from BAT (Baidu, Alibaba, Tencent) and JD.com, and is now focusing on homegrown development using open source technology, rather than commercial products. For that reason, the team began looking for open source orchestration for its cloud infrastructure.
+
+
+
+ "We could never imagine we can achieve this scalability in such a short time."

- Chengyu Zhang, Group Leader of Platform Technology R&D, China Unicom
+ +
+
+
+
+ Though China Unicom was already using Mesos for a core telecom operator system, the team felt that Kubernetes was a natural choice for the new cloud platform. "The main reason was that it has a mature community," says Zhang. "It grows very rapidly, and so we can learn a lot from others’ best practices." China Unicom also uses Istio for its microservice framework, as well as Envoy, CoreDNS, and Fluentd.

+ The company’s Kubernetes-enabled cloud platform now hosts 50 microservices and all new development going forward. China Unicom developers can easily leverage the technology through APIs, without doing the development work themselves. The cloud platform provides 20-30 services connected to the company’s data center PaaS platform, and also supports big data analysis for internal users in branch offices across China’s 31 provinces.

+ "Kubernetes has improved our experience using cloud infrastructure," says Zhang. "There is currently no alternative technology that can replace it." + +
+
+
+
+ "This technology is relatively complicated, but as long as developers get used to it, they can enjoy all the benefits."

- Jie Jia, Member of Platform Technology R&D, China Unicom
+
+
+ +
+
+ In fact, Kubernetes has boosted both operational and development efficiency at China Unicom. Resource utilization has increased by 20-50%, lowering IT infrastructure costs, and deployment time has gone from a couple of hours to 5-10 minutes. "This is mainly because of the self-healing and scalability of Kubernetes, so we can increase our efficiency in operation and maintenance," Zhang says. "For example, we currently have only five people maintaining our multiple systems."

+ With the wins China Unicom has experienced with Kubernetes, Zhang and his team are eager to give back to the community. That starts with participating in meetups and conferences, and offering advice to other companies that are considering a similar path. "Especially for those companies who have had traditional cloud computing systems, I really recommend them to join the cloud native computing community," says Zhang.
+ +
+
+"Companies can use the managed services offered by companies like Rancher, because they have already customized this technology, you can easily leverage this technology."

- Jie Jia, Member of Platform Technology R&D, China Unicom
+
+ +
+ Platform Technology R&D team member Jie Jia adds that though "this technology is relatively complicated, as long as developers get used to it, they can enjoy all the benefits." And Zhang points out that in his own experience with virtual machine cloud, "Kubernetes and these cloud native technologies are relatively simpler."

+ Plus, "companies can use the managed services offered by companies like Rancher, because they have already customized this technology," says Jia. "You can easily leverage this technology."

+ Looking ahead, China Unicom plans to develop more applications on Kubernetes, focusing on big data and machine learning. The team is continuing to optimize the cloud platform that it built, and hopes to pass the conformance test to join CNCF’s Certified Kubernetes Conformance Program. They’re also hoping to someday contribute code back to the community.

+ If that sounds ambitious, it’s because the results they’ve gotten from adopting Kubernetes have been beyond even their greatest expectations. Says Zhang: "We could never imagine we can achieve this scalability in such a short time."
+
+ + diff --git a/content/zh/case-studies/city-of-montreal/city-of-montreal_featured_logo.png b/content/zh/case-studies/city-of-montreal/city-of-montreal_featured_logo.png new file mode 100644 index 0000000000000..be2af029f0bdc Binary files /dev/null and b/content/zh/case-studies/city-of-montreal/city-of-montreal_featured_logo.png differ diff --git a/content/zh/case-studies/city-of-montreal/index.html b/content/zh/case-studies/city-of-montreal/index.html new file mode 100644 index 0000000000000..151ce44b21691 --- /dev/null +++ b/content/zh/case-studies/city-of-montreal/index.html @@ -0,0 +1,99 @@ +--- +title: City of Montreal Case Study +linkTitle: city-of-montreal +case_study_styles: true +cid: caseStudies +css: /css/style_case_studies.css +featured: false +--- + +
+

CASE STUDY:
City of Montréal - How the City of Montréal Is Modernizing Its 30-Year-Old, Siloed Architecture with Kubernetes

+ +
+ Company  City of Montréal     Location  Montréal, Québec, Canada     Industry  Government
+

Challenge

+ Like many governments, Montréal has a number of legacy systems, and “we have systems that are older than some developers working here,” says the city’s CTO, Jean-Martin Thibault. “We have mainframes, all flavors of Windows, various flavors of Linux, old and new Oracle systems, Sun servers, all kinds of databases. Like all big corporations, some of the most important systems, like Budget and Human Resources, were developed on mainframes in-house over the past 30 years.” There are over 1,000 applications in all, and most of them were running on different ecosystems. In 2015, a new management team decided to break down those silos, and invest in IT in order to move toward a more integrated governance for the city. They needed to figure out how to modernize the architecture.

Solution

+ The first step was containerization. The team started with a small Docker farm with four or five servers, with Rancher for providing access to the Docker containers and their logs and Jenkins to deploy. “We based our effort on the new trends; we understood the benefits of immutability and deployments without downtime and such things,” says Solutions Architect Marc Khouzam. They soon realized they needed orchestration as well, and opted for Kubernetes. Says Enterprise Architect Morgan Martinet: “Kubernetes offered concepts on how you would describe an architecture for any kind of application, and based on those concepts, deploy what’s required to run the infrastructure. It was becoming a de facto standard.”
+

Impact

+ The time to market has improved drastically, from many months to a few weeks. Deployments went from months to hours. “In the past, you would have to ask for virtual machines, and that alone could take weeks, easily,” says Thibault. “Now you don’t even have to ask for anything. You just create your project and it gets deployed.” Kubernetes has also improved the efficiency of how the city uses its compute resources: “Before, the 200 application components we currently run on Kubernetes would have required hundreds of virtual machines, and now, if we’re talking about a single environment of production, we are able to run them on 8 machines, counting the masters of Kubernetes,” says Martinet. And it’s all done with a small team of just 5 people operating the Kubernetes clusters.
+ +
+
+
+
+ "We realized the limitations of having a non-orchestrated Docker environment. Kubernetes came to the rescue, bringing in all these features that make it a lot easier to manage and give a lot more benefits to the users." +

- JEAN-MARTIN THIBAULT, CTO, CITY OF MONTRÉAL
+
+
+
+
+

The second biggest municipality in Canada, Montréal has a large number of legacy systems keeping the government running. And while they don’t quite date back to the city’s founding in 1642, “we have systems that are older than some developers working here,” jokes the city’s CTO, Jean-Martin Thibault.

+ “We have mainframes, all flavors of Windows, various flavors of Linux, old and new Oracle systems, Sun servers, all kinds of databases. Some of the most important systems, like Budget and Human Resources, were developed on mainframes in-house over the past 30 years.”

+ In recent years, that fact became a big pain point. There are over 1,000 applications in all, running on almost as many different ecosystems. In 2015, a new city management team decided to break down those silos, and invest in IT in order to move toward a more integrated governance. “The organization was siloed, so as a result the architecture was siloed,” says Thibault. “Once we got integrated into one IT team, we decided to redo an overall enterprise architecture.”

+ The first step to modernize the architecture was containerization. “We based our effort on the new trends; we understood the benefits of immutability and deployments without downtime and such things,” says Solutions Architect Marc Khouzam. The team started with a small Docker farm with four or five servers, with Rancher for providing access to the Docker containers and their logs and Jenkins for deployment.
+
+
+
+ "Getting a project running in Kubernetes is entirely dependent on how long you need to program the actual software. It’s no longer dependent on deployment. Deployment is so fast that it’s negligible."

- MARC KHOUZAM, SOLUTIONS ARCHITECT, CITY OF MONTRÉAL
+ +
+
+
+
+ But this Docker farm setup had some limitations, including the lack of self-healing and dynamic scaling based on traffic, and the effort required to optimize server resources and scale to multiple instances of the same container. The team soon realized they needed orchestration as well. “Kubernetes came to the rescue,” says Thibault, “bringing in all these features that make it a lot easier to manage and give a lot more benefits to the users.”

+ The team had evaluated several orchestration solutions, but Kubernetes stood out because it addressed all of the pain points. (They were also inspired by Yahoo! Japan’s use case, which the team members felt came close to their vision.) “Kubernetes offered concepts on how you would describe an architecture for any kind of application, and based on those concepts, deploy what’s required to run the infrastructure,” says Enterprise Architect Morgan Martinet. “It was becoming a de facto standard. It also promised portability across cloud providers. The choice of Kubernetes now gives us many options such as running clusters in-house or in any IaaS provider, or even using Kubernetes-as-a-service in any of the major cloud providers.”

+ Another important factor in the decision was vendor neutrality. “As a government entity, it is essential for us to be neutral in our selection of products and providers,” says Thibault. “The independence of the Cloud Native Computing Foundation from any company provides this.”
+
+
+
+ "Kubernetes has been great. It’s been stable, and it provides us with elasticity, resilience, and robustness. While re-architecting for Kubernetes, we also benefited from the monitoring and logging aspects, with centralized logging, Prometheus logging, and Grafana dashboards. We have enhanced visibility of what’s being deployed."

- MORGAN MARTINET, ENTERPRISE ARCHITECT, CITY OF MONTRÉAL
+
+
+ +
+
+ The Kubernetes implementation began with the deployment of a small cluster using an internal Ansible playbook, which was soon replaced by the Kismatic distribution. Given the complexity they saw in operating a Kubernetes platform, they decided to provide development groups with an automated CI/CD solution based on Helm. “An integrated CI/CD solution on Kubernetes standardized how the various development teams designed and deployed their solutions, but allowed them to remain independent,” says Khouzam.
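+ In a setup like that, each team typically owns only a small values file for a shared chart, and the pipeline performs the deployment. A hypothetical sketch (the chart name, keys, and hostname are assumptions, not the city’s actual configuration):

```yaml
# values.yaml for a shared web-service chart (hypothetical)
image:
  repository: registry.example.com/permits-api
  tag: "1.4.2"
replicaCount: 3
ingress:
  enabled: true
  host: permits.example.org        # hypothetical hostname
# The CI/CD pipeline then deploys idempotently, e.g.:
#   helm upgrade --install permits-api charts/web-service -f values.yaml
```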

+ During the re-architecting process, the team also added Prometheus for monitoring and alerting, Fluentd for logging, and Grafana for visualization. “We have enhanced visibility of what’s being deployed,” says Martinet. Adds Khouzam: “The big benefit is we can track anything, even things that don’t run inside the Kubernetes cluster. It’s our way to unify our monitoring effort.”

+ Altogether, the cloud native solution has had a positive impact on velocity as well as administrative overhead. With standardization, code generation, automatic deployments into Kubernetes, and standardized monitoring through Prometheus, the time to market has improved drastically, from many months to a few weeks. Deployments went from months and weeks of planning down to hours. “In the past, you would have to ask for virtual machines, and that alone could take weeks to properly provision,” says Thibault. Plus, for dedicated systems, experts often had to be brought in to install them with their own recipes, which could take weeks and months.

+ Now, says Khouzam, “we can deploy pretty much any application that’s been Dockerized without any help from anybody. Getting a project running in Kubernetes is entirely dependent on how long you need to program the actual software. It’s no longer dependent on deployment. Deployment is so fast that it’s negligible.”
+ +
+
+"We’re working with the market when possible, to put pressure on our vendors to support Kubernetes, because it’s a much easier solution to manage"

- MORGAN MARTINET, ENTERPRISE ARCHITECT, CITY OF MONTRÉAL
+
+ +
+ Kubernetes has also improved the efficiency of how the city uses its compute resources: “Before, the 200 application components we currently run in Kubernetes would have required hundreds of virtual machines, and now, if we’re talking about a single environment of production, we are able to run them on 8 machines, counting the masters of Kubernetes,” says Martinet. And it’s all done with a small team of just five people operating the Kubernetes clusters. Adds Martinet: “It’s a dramatic improvement no matter what you measure.” +

+ So it should come as no surprise that the team’s strategy going forward is to target Kubernetes as much as they can. “If something can’t run inside Kubernetes, we’ll wait for it,” says Thibault. That means they haven’t moved any of the city’s Windows systems onto Kubernetes, though it’s something they would like to do. “We’re working with the market when possible, to put pressure on our vendors to support Kubernetes, because it’s a much easier solution to manage,” says Martinet.

+ Thibault sees a near future where 60% of the city’s workloads are running on a Kubernetes platform—basically any and all of the use cases that they can get to work there. “It’s so much more efficient than the way we used to do things,” he says. “There’s no looking back.”
+
diff --git a/content/zh/case-studies/crowdfire/crowdfire_featured_logo.png b/content/zh/case-studies/crowdfire/crowdfire_featured_logo.png new file mode 100644 index 0000000000000..ef84b16ea06c7 Binary files /dev/null and b/content/zh/case-studies/crowdfire/crowdfire_featured_logo.png differ diff --git a/content/zh/case-studies/crowdfire/index.html b/content/zh/case-studies/crowdfire/index.html new file mode 100644 index 0000000000000..227a5c08394bd --- /dev/null +++ b/content/zh/case-studies/crowdfire/index.html @@ -0,0 +1,101 @@ +--- +title: Crowdfire Case Study + +case_study_styles: true +cid: caseStudies +css: /css/style_crowdfire.css +--- + +
+

CASE STUDY:
How to Keep Iterating a Fast-Growing App With a Cloud-Native Approach

+ Company  Crowdfire     Location  Mumbai, India     Industry  Social Media Software
+

Challenge

+ Crowdfire helps content creators create their content anywhere on the Internet and publish it everywhere else in the right format. Since its launch in 2010, it has grown to 16 million users. The product began as a monolith app running on Google App Engine, and in 2015, the company began a transformation to microservices running on Amazon Web Services Elastic Beanstalk. "It was okay for our use cases initially, but as the number of services, development teams and scale increased, the deploy times, self-healing capabilities and resource utilization started to become problems for us," says Software Engineer Amanpreet Singh, who leads the infrastructure team for Crowdfire.
+

Solution

+ "We realized that we needed a more cloud-native approach to deal with these issues," says Singh. The team decided to implement a custom setup of Kubernetes based on Terraform and Ansible. +
+
+ +
+ +

Impact

+ "Kubernetes has helped us reduce the deployment time from 15 minutes to less than a minute," says Singh. "Due to Kubernetes’s self-healing nature, the operations team doesn’t need to do any manual intervention in case of a node or pod failure." Plus, he says, "Dev-Prod parity has improved since developers can experiment with options in dev/staging clusters, and when it’s finalized, they just commit the config changes in the respective code repositories. These changes automatically get replicated on the production cluster via CI/CD pipelines." +
+ +
+
+
+
+ "In the 15 months that we’ve been using Kubernetes, it has been amazing for us. It enabled us to iterate quickly, increase development speed, and continuously deliver new features and bug fixes to our users, while keeping our operational costs and infrastructure management overhead under control."

- Amanpreet Singh, Software Engineer at Crowdfire
+
+
+
+
+

"If you build it, they will come."

+ For most content creators, only half of that movie quote may ring true. Sure, platforms like Wordpress, YouTube and Shopify have made it simple for almost anyone to start publishing new content online, but attracting an audience isn’t as easy. Crowdfire "helps users publish their content to all possible places where their audience exists," says Amanpreet Singh, a Software Engineer at the company based in Mumbai, India. Crowdfire has gained more than 16 million users—from bloggers and artists to makers and small businesses—since its launch in 2010.

+ With that kind of growth—and a high demand from users for new features and continuous improvements—the Crowdfire team struggled to keep up behind the scenes. In 2015, they moved their monolith Java application to Amazon Web Services Elastic Beanstalk and started breaking it down into microservices.

+ It was a good first step, but the team soon realized they needed to go further down the cloud-native path, which would lead them to Kubernetes. "It was okay for our use cases initially, but as the number of services and development teams increased and we scaled further, deploy times, self-healing capabilities and resource utilization started to become problematic," says Singh, who leads the infrastructure team at Crowdfire. "We realized that we needed a more cloud-native approach to deal with these issues."

+ As he looked around for solutions, Singh had a checklist of what Crowdfire needed. "We wanted to keep some things separate so they could be shipped independent of other things; this would help remove blockers and let different teams work at their own pace," he says. "We also make a lot of data-driven decisions, so shipping a feature and its iterations quickly was a must."

+ Kubernetes checked all the boxes and then some. "One of the best things was the built-in service discovery," he says. "When you have a bunch of microservices that need to call each other, having internal DNS readily available and service IPs and ports automatically set as environment variables help a lot." Plus, he adds, "Kubernetes’s opinionated approach made it easier to get started."
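+ Concretely, creating a Service is all it takes: the cluster DNS gives it a stable name, and pods started afterwards also receive its address as environment variables. A small sketch (the service name and namespace are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: user-service               # hypothetical microservice
spec:
  selector:
    app: user-service
  ports:
  - port: 80
# Other pods in the same namespace can then reach it at:
#   http://user-service                          (cluster DNS short name)
#   http://user-service.default.svc.cluster.local  (assuming "default" namespace)
# and pods started after the Service see env vars like:
#   USER_SERVICE_SERVICE_HOST=10.0.12.34
#   USER_SERVICE_SERVICE_PORT=80
```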
+
+
+
+ "We realized that we needed a more cloud-native approach to deal with these issues," says Singh. The team decided to implement a custom setup of Kubernetes based on Terraform and Ansible." +
+
+
+
+ There was another compelling business reason for the cloud-native approach. "In today’s world of ever-changing business requirements, using cloud native technology provides a variety of options to choose from—even the ability to run services in a hybrid cloud environment," says Singh. "Businesses can keep services in a region closest to the users, and thus benefit from high-availability and resiliency."

+ So in February 2016, Singh set up a test Kubernetes cluster using the kube-up scripts provided. "I explored the features and was able to deploy an application pretty easily," he says. "However, it seemed like a black box since I didn’t understand the components completely, and had no idea what the kube-up script did under the hood. So when it broke, it was hard to find the issue and fix it."

+ To get a better understanding, Singh dove into the internals of Kubernetes, reading the docs and even some of the code. And he looked to the Kubernetes community for more insight. "I used to stay up a little late every night (a lot of users were active only when it’s night here in India) and would try to answer questions on the Kubernetes community Slack from users who were getting started," he says. "I would also follow other conversations closely. I must admit I was able to avoid a lot of issues in our setup because I knew others had faced the same issues."

+ Based on the knowledge he gained, Singh decided to implement a custom setup of Kubernetes based on Terraform and Ansible. "I wrote Terraform to launch Kubernetes master and nodes (Auto Scaling Groups) and an Ansible playbook to install the required components," he says. (The company recently switched to using prebaked AMIs to make the node bringup faster, and is planning to change its networking layer.)

+ +
+
+
+
+ "Kubernetes helped us reduce the deployment time from 15 minutes to less than a minute. Due to Kubernetes’s self-healing nature, the operations team doesn’t need to do any manual intervention in case of a node or pod failure." +
+
+ +
+
+ First, the team migrated a few staging services from Elastic Beanstalk to the new Kubernetes staging cluster, and then set up a production cluster a month later to deploy some services. The results were convincing. "By the end of March 2016, we established that all the new services must be deployed on Kubernetes," says Singh. "Kubernetes helped us reduce the deployment time from 15 minutes to less than a minute. Due to Kubernetes’s self-healing nature, the operations team doesn’t need to do any manual intervention in case of a node or pod failure." On top of that, he says, "Dev-Prod parity has improved since developers can experiment with options in dev/staging clusters, and when it’s finalized, they just commit the config changes in the respective code repositories. These changes automatically get replicated on the production cluster via CI/CD pipelines. This brings more visibility into the changes being made, and keeps an audit trail."

+ Over the next six months, the team worked on migrating all the services from Elastic Beanstalk to Kubernetes, except for the few that were deprecated and would soon be terminated anyway. The services were moved one at a time, and their performance was monitored for two to three days each. Today, "We’re completely migrated and we run all new services on Kubernetes," says Singh.

+ The impact has been considerable: With Kubernetes, the company has experienced a 90% cost savings on Elastic Load Balancer, which is now only used for their public, user-facing services. Their EC2 operating expenses have decreased by as much as 50%.

+ All 30 engineers at Crowdfire were onboarded at once. "I gave an internal talk where I shared the basic components and demoed the usage of kubectl," says Singh. "Everyone was excited and happy about using Kubernetes. Developers have more control and visibility into their applications running in production now. Most of all, they’re happy with the low deploy times and self-healing services."

+ And they’re much more productive, too. "Where we used to do about 5 deployments per day," says Singh, "now we’re doing 30+ production and 50+ staging deployments almost every day."
+
+
+
+ The impact has been considerable: With Kubernetes, the company has experienced a 90% cost savings on Elastic Load Balancer, which is now only used for their public, user-facing services. Their EC2 operating expenses have decreased by as much as 50%.
+
+
+
+ + Singh notes that almost all of the engineers interact with the staging cluster on a daily basis, and that has created a cultural change at Crowdfire. "Developers are more aware of the cloud infrastructure now," he says. "They’ve started following cloud best practices like better health checks, structured logs to stdout [standard output], and config via files or environment variables."
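+ A sketch of what "config via files or environment variables" looks like in practice, with one ConfigMap consumed both ways (all names here are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config                 # hypothetical name
data:
  LOG_FORMAT: json                 # supports structured logs to stdout
  FEATURE_NEW_ONBOARDING: "true"
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
    envFrom:
    - configMapRef:
        name: app-config           # each key becomes an environment variable
    volumeMounts:
    - name: config
      mountPath: /etc/app          # each key also appears as a file here
  volumes:
  - name: config
    configMap:
      name: app-config
```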

+ With Crowdfire’s commitment to Kubernetes, Singh is looking to expand the company’s cloud-native stack. The team already uses Prometheus for monitoring, and he says he is evaluating Linkerd and Envoy Proxy as a way to "get more metrics about request latencies and failures, and handle them better." Other CNCF projects, including OpenTracing and gRPC are also on his radar.

+ Singh has found that the cloud-native community is growing in India, too, particularly in Bangalore. "A lot of startups and new companies are starting to run their infrastructure on Kubernetes," he says.

+ And when people ask him about Crowdfire’s experience, he has this advice to offer: "Kubernetes is a great piece of technology, but it might not be right for you, especially if you have just one or two services or your app isn’t easy to run in a containerized environment," he says. "Assess your situation and the value that Kubernetes provides before going all in. If you do decide to use Kubernetes, make sure you understand the components that run under the hood and what role they play in smoothly running the cluster. Another thing to consider is if your apps are ‘Kubernetes-ready,’ meaning if they have proper health checks and handle termination signals to shut down gracefully."
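+ In manifest terms, that "Kubernetes-ready" checklist comes down to probes and shutdown handling; a hedged sketch with illustrative paths and timings:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  terminationGracePeriodSeconds: 30      # time allowed after SIGTERM
  containers:
  - name: api
    image: registry.example.com/api:1.0  # hypothetical image
    readinessProbe:                      # no traffic until this passes
      httpGet:
        path: /healthz/ready
        port: 8080
    livenessProbe:                       # restart the container if this fails
      httpGet:
        path: /healthz/live
        port: 8080
      initialDelaySeconds: 10
    lifecycle:
      preStop:                           # brief drain before SIGTERM (assumes `sleep` exists in the image)
        exec:
          command: ["sleep", "5"]
```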

+ And if your company fits that profile, go for it. Crowdfire clearly did—and is now reaping the benefits. "In the 15 months that we’ve been using Kubernetes, it has been amazing for us," says Singh. "It enabled us to iterate quickly, increase development speed and continuously deliver new features and bug fixes to our users, while keeping our operational costs and infrastructure management overhead under control." + + +
+
diff --git a/content/zh/case-studies/golfnow/golfnow_featured.png b/content/zh/case-studies/golfnow/golfnow_featured.png new file mode 100644 index 0000000000000..0b99ac3b8f8f8 Binary files /dev/null and b/content/zh/case-studies/golfnow/golfnow_featured.png differ diff --git a/content/zh/case-studies/golfnow/golfnow_logo.png b/content/zh/case-studies/golfnow/golfnow_logo.png new file mode 100644 index 0000000000000..dbeb127b02a27 Binary files /dev/null and b/content/zh/case-studies/golfnow/golfnow_logo.png differ diff --git a/content/zh/case-studies/golfnow/index.html b/content/zh/case-studies/golfnow/index.html new file mode 100644 index 0000000000000..f4bf4d4f278c2 --- /dev/null +++ b/content/zh/case-studies/golfnow/index.html @@ -0,0 +1,125 @@ +--- +title: GolfNow Case Study + +case_study_styles: true +cid: caseStudies +css: /css/style_golfnow.css +--- + +
+

CASE STUDY:
+
Saving Time and Money with Cloud Native Infrastructure
+

+
+ +
+ Company GolfNow     Location Orlando, Florida     Industry Golf Industry Technology and Services Provider +
+ +
+ +
+
+
+ +

Challenge

+ A member of the NBC Sports Group, GolfNow is the golf industry’s technology and services leader, managing 10 different products, as well as the largest e-commerce tee time marketplace in the world. As its business began expanding rapidly and globally, GolfNow’s monolithic application became a problem. "We kept growing our infrastructure vertically rather than horizontally, and the cost of doing business became problematic," says Sheriff Mohamed, GolfNow’s Director, Architecture. "We wanted the ability to more easily expand globally." +
+
+ +
+

Solution

+ Turning to microservices and containerization, GolfNow began moving its applications and databases from third-party services to its own clusters running on Docker and Kubernetes.

+ +

Impact

+ The results were immediate. While maintaining the same capacity—and beyond, during peak periods—GolfNow saw its infrastructure costs for the first application virtually cut in half. +
+
+
+ + +
+
+ "With our growth we obviously needed to expand our infrastructure, and we kept growing vertically rather than horizontally. We were basically wasting money and doubling the cost of our infrastructure."

- SHERIFF MOHAMED, DIRECTOR, ARCHITECTURE AT GOLFNOW +
+
+ +
+
+

It’s not every day that you can say you’ve slashed an operating expense by half.

+ + But Sheriff Mohamed and Josh Chandler did just that when they helped lead their company, GolfNow, on a journey from a monolithic to a containerized, cloud native infrastructure managed by Kubernetes. +

+ A top-performing business within the NBC Sports Group, GolfNow is a technology and services company with the largest tee time marketplace in the world. GolfNow serves 5 million active golfers across 10 different products. In recent years, the business had grown so fast that the infrastructure supporting their giant monolithic application (written in C#.NET and backed by the SQL Server database management system) could not keep up. "With our growth we obviously needed to expand our infrastructure, and we kept growing vertically rather than horizontally," says Sheriff, GolfNow’s Director, Architecture. "Our costs were growing exponentially. And on top of that, we had to build a Disaster Recovery (DR) environment, which then meant we’d have to copy exactly what we had in our original data center to another data center that was just the standby. We were basically wasting money and doubling the cost of our infrastructure." +

+ In moving just the first of GolfNow’s important applications—a booking engine for golf courses and B2B marketing platform—from third-party services to their own Kubernetes environment, "our bill went down drastically," says Sheriff. +

+ The path to those stellar results began in late 2014. In order to support GolfNow’s global growth, the team decided that the company needed to have multiple data centers and the ability to quickly and easily re-route traffic as needed. "From there we knew that we needed to go in a direction of breaking things apart, microservices, and containerization," says Sheriff. "At the time we were trying to get away from C#.NET and SQL Server since it didn’t run very well on Linux, where everything on the container side was running smoothly." +

+ To that end, the team shifted to working with Node.js, the open-source, cross-platform JavaScript runtime environment for developing tools and applications, and MongoDB, the open-source database program. At the time, Docker, the platform for deploying applications in containers, was still new. But once the team began experimenting with it, Sheriff says, "we realized that was the way we wanted to go, especially since that’s the way the industry is heading." +
+
+ +
+
+ "The team migrated the rest of the application into their Kubernetes cluster. And the impact was immediate: On top of cutting monthly costs by a large percentage, says Sheriff, 'Running at the same capacity and during our peak time, we were able to horizontally grow. Since we were using our VMs more efficiently with containers, we didn’t have to pay extra money at all.'" +
+
+ +
+
+ GolfNow’s dev team ran an "internal, low-key" proof of concept and were won over. "We really liked how easy it was to be able to pass containers around to each other and have them up and running in no time, exactly the way it was running on my machine," says Sheriff. "Because that is always the biggest gripe that Ops has with developers, right? ‘It worked on my machine!’ But then we started getting to the point of, ‘How do we make sure that these things stay up and running?’"

+ That led the team on a quest to find the right orchestration system for the company’s needs. Sheriff says the first few options they tried were either too heavy or "didn’t feel quite right." In late summer 2015, they discovered the just-released Kubernetes, which Sheriff immediately liked for its ease of use. "We did another proof of concept," he says, "and Kubernetes won because of the fact that the community backing was there, built on top of what Google had already done." +

+ But before they could go with Kubernetes, NBC, GolfNow’s parent company, also asked them to comparison shop with another company. Sheriff and his team liked the competing company’s platform user interface, but didn’t like that its platform would not allow containers to run natively on Docker. With no clear decision in sight, Sheriff’s VP at GolfNow, Steve McElwee, set up a three-month trial during which a GolfNow team (consisting of Sheriff and Josh, who’s now Lead Architect, Open Platforms) would build out a Kubernetes environment, and a large NBC team would build out one with the other company’s platform. +

+ "We spun up the cluster and we tried to get everything to run the way we wanted it to run," Sheriff says. "The biggest thing that we took away from it is that not only did we want our applications to run within Kubernetes and Docker, we also wanted our databases to run there. We literally wanted our entire infrastructure to run within Kubernetes." +

+ At the time there was nothing in the community to help them get Kafka and MongoDB clusters running within a Kubernetes and Docker environment, so Sheriff and Josh figured it out on their own, taking a full month to get it right. "Everything started rolling from there," Sheriff says. "We were able to get all our applications connected, and we finished our side of the proof of concept a month in advance. My VP was like, ‘Alright, it’s over. Kubernetes wins.’" +

+ The next step, beginning in January 2016, was getting everything working in production. The team focused first on one application that was already written in Node.js and MongoDB. A booking engine for golf courses and B2B marketing platform, the application was already going in the microservice direction but wasn’t quite finished yet. At the time, it was running on Heroku, Compose, and other third-party services—resulting in a large monthly bill. + +
+
+ +
+
+ "'The time I spent actually moving the applications was under 30 seconds! We can move data centers in just incredible amounts of time. If you haven’t come from the Kubernetes world you wouldn’t believe me.' Sheriff puts it in these terms: 'Before Kubernetes I wasn’t sleeping at night, literally. I was woken up all the time, because things were down. After Kubernetes, I’ve been sleeping at night.'" +
+
+ +
+
+ "The goal was to take all of that out and put it within this new platform we’ve created with Kubernetes on Google Compute Engine (GCE)," says Sheriff. "So we ended up building piece by piece, in parallel, what was out in Heroku and Compose, in our Kubernetes cluster. Then, literally, just switched configs in the background. So in Heroku we had the app running hitting a Compose database. We’d take the config, change it and make it hit the database that was running in our cluster." +

+ Using this procedure, they were able to migrate piecemeal, without any downtime. The first migration was done during off hours, but to test the limits, the team migrated the second database in the middle of the day, when lots of users were running the application. "We did it," Sheriff says, "and again it was successful. Nobody noticed." +

+ After three weeks of monitoring to make sure everything was running stable, the team migrated the rest of the application into their Kubernetes cluster. And the impact was immediate: On top of cutting monthly costs by a large percentage, says Sheriff, "Running at the same capacity and during our peak time, we were able to horizontally grow. Since we were using our VMs more efficiently with containers, we didn’t have to pay extra money at all." +

+ Not only were they saving money, but they were also saving time. "I had a meeting this morning about migrating some applications from one cluster to another," says Josh. "I spent about 2 hours explaining the process. The time I spent actually moving the applications was under 30 seconds! We can move data centers in just incredible amounts of time. If you haven’t come from the Kubernetes world you wouldn’t believe me." Sheriff puts it in these terms: "Before Kubernetes I wasn’t sleeping at night, literally. I was woken up all the time, because things were down. After Kubernetes, I’ve been sleeping at night." +

+ A small percentage of GolfNow's applications have been migrated over to the Kubernetes environment so far. "Our Core Team is rewriting a lot of the .NET applications into .NET Core [which is compatible with Linux and Docker] so that we can run them within containers," says Sheriff. +

+ Looking ahead, Sheriff and his team want to spend 2017 continuing to build a whole platform around Kubernetes with Drone, an open-source continuous delivery platform, to make it more developer-centric. "Now they’re able to manage configuration, they’re able to manage their deployments and things like that, making all these subteams that are now creating all these microservices be self-sufficient," he says. "So it can pull us away from applications and allow us to just make sure the cluster is running and healthy, and then actually migrate that over to our Ops team." +
+
+ +
+
+ "Having gone from complete newbies to production-ready in three months, the GolfNow team is eager to encourage other companies to follow their lead. 'This is The Six Million Dollar Man of the cloud right now,' adds Josh. 'Just try it out, watch it happen. I feel like the proof is in the pudding when you look at these kinds of application stacks. They’re faster, they’re more resilient.'" +
+
+ +
+
+ And long-term, Sheriff has an even bigger goal for getting more people into the Kubernetes fold. "We’re actually trying to make this platform generic enough so that any of our sister companies can use it if they wish," he says. "Most definitely I think it can be used as a model. I think the way we migrated into it, the way we built it out, are all ways that I think other companies can learn from, and should not be afraid of." +

+ The GolfNow team is also giving back to the Kubernetes community by open-sourcing a bot framework that Josh built. "We noticed that the dashboard user interface is actually moving a lot faster than when we started," says Sheriff. "However we realized what we needed was something that’s more of a bot that really helps us administer Kubernetes as a whole through Slack." Josh explains: "With the Kubernetes-Slack integration, you can essentially hook into a cluster and then issue commands and edit configurations. We’ve tried to simplify the security configuration as much as possible. We hope this will be our major thank you to Kubernetes, for everything you’ve given us." +

+ Having gone from complete newbies to production-ready in three months, the GolfNow team is eager to encourage other companies to follow their lead. The lessons they’ve learned: "You’ve got to have buy-in from your boss," says Sheriff. "Another big deal is having two to three people dedicated to this type of endeavor. You can’t have people who are half in, half out." And if you don’t have buy-in from the get-go, proving it out will get you there. +

+ "This is The Six Million Dollar Man of the cloud right now," adds Josh. "Just try it out, watch it happen. I feel like the proof is in the pudding when you look at these kinds of application stacks. They’re faster, they’re more resilient." + +
+
diff --git a/content/zh/case-studies/haufegroup/haufegroup_featured.png b/content/zh/case-studies/haufegroup/haufegroup_featured.png new file mode 100644 index 0000000000000..08b09ec9db8b7 Binary files /dev/null and b/content/zh/case-studies/haufegroup/haufegroup_featured.png differ diff --git a/content/zh/case-studies/haufegroup/haufegroup_logo.png b/content/zh/case-studies/haufegroup/haufegroup_logo.png new file mode 100644 index 0000000000000..5d8245b0f6d18 Binary files /dev/null and b/content/zh/case-studies/haufegroup/haufegroup_logo.png differ diff --git a/content/zh/case-studies/haufegroup/index.html b/content/zh/case-studies/haufegroup/index.html new file mode 100644 index 0000000000000..f4256ff569b4a --- /dev/null +++ b/content/zh/case-studies/haufegroup/index.html @@ -0,0 +1,112 @@ +--- +title: Haufe Group Case Study + +case_study_styles: true +cid: caseStudies +css: /css/style_haufegroup.css +--- + + +
+

CASE STUDY:
Paving the Way for Cloud Native for Midsize Companies

+ +
+ +
+ Company  Haufe Group     Location  Freiburg, Germany     Industry  Media and Software +
+ +
+ +
+ +
+
+

Challenge

+ Founded in 1930 as a traditional publisher, Haufe Group has grown into a media and software company with 95 percent of its sales from digital products. Over the years, the company has gone from having "hardware in the basement" to outsourcing its infrastructure operations and IT. More recently, the development of new products, from Internet portals for tax experts to personnel training software, has created demands for increased speed, reliability and scalability. "We need to be able to move faster," says Solution Architect Martin Danielsson. "Adapting workloads is something that we really want to be able to do." +
+
+

Solution

+ Haufe Group began its cloud-native journey when Microsoft Azure became available in Europe; the company needed cloud deployments for its desktop apps with bandwidth-heavy download services. "After that, it has been different projects trying out different things," says Danielsson. Two years ago, Holger Reinhardt joined Haufe Group as CTO and rapidly re-oriented the traditional host provider-based approach toward a cloud and API-first strategy. +
+
+ A core part of this strategy was a strong mandate to embrace infrastructure-as-code across the entire software deployment lifecycle via Docker. The company is now getting ready to go live with two services in production using Kubernetes orchestration on Microsoft Azure and Amazon Web Services. The team is also working on breaking up one of their core Java Enterprise desktop products into microservices to allow for better evolvability and dynamic scaling in the cloud. +
+
+

Impact

+ With the ability to adapt workloads, Danielsson says, teams "will be able to scale down to around half the capacity at night, saving 30 percent of the hardware cost." Plus, shorter release times have had a major impact. "Before, we had to announce at least a week in advance when we wanted to do a release because there was a huge checklist of things that you had to do," he says. "By going cloud native, we have the infrastructure in place to be able to automate all of these things. Now we can get a new release done in half an hour instead of days." + +
+
+ +
+ +
+
+ "Over the next couple of years, people won’t even think that much about it when they want to run containers. Kubernetes is going to be the go-to solution."

- Martin Danielsson, Solution Architect, Haufe Group
+
+
+ +
+ +
+

More than 80 years ago, Haufe Group was founded as a traditional publishing company, printing books and commentary on paper.

By the 1990s, though, the company’s leaders recognized that the future was digital, and to their credit, were able to transform Haufe Group into a media and software business that now gets 95 percent of its sales from digital products. "Among the German companies doing this, we were one of the early adopters," says Martin Danielsson, Solution Architect for Haufe Group.

+ And now they’re leading the way for midsize companies embracing cloud-native technology like Kubernetes. "The really big companies like Ticketmaster and Google get it right, and the startups get it right because they’re faster," says Danielsson. "We’re in this big lump of companies in the middle with a lot of legacy, a lot of structure, a lot of culture that does not easily fit the cloud technologies. We’re just 1,500 people, but we have hundreds of customer-facing applications. So we’re doing things that will be relevant for many companies of our size or even smaller."

+ Many of those legacy challenges stemmed from simply following the technology trends of the times. "We used to do full DevOps," he says. In the 1990s and 2000s, "that meant that you had your hardware in the basement. And then 10 years ago, the hype of the moment was to outsource application operations, outsource everything, and strip down your IT department to take away the distraction of all these hardware things. That’s not our area of expertise. We didn’t want to be an infrastructure provider. And now comes the backlash of that."

+ Haufe Group began feeling the pain as they were developing more new products, from Internet portals for tax experts to personnel training software, which created demands for increased speed, reliability and scalability. "Right now, we have this break in workflows, where we go from writing concepts to developing, handing it over to production and then handing that over to your host provider," he says. "And then when things go bad we have no clue what went wrong. We definitely want to take back control, and we want to move a lot faster. Adapting workloads is something that we really want to be able to do."

+ Those needs led them to explore cloud-native technology. Their first foray into the cloud was doing deployments in Microsoft Azure, once it became available in Europe, for desktop products that had built-in download services. Hosting expenses for such bandwidth-heavy services were too high, so the company turned to the cloud. "After that, it has been different projects trying out different things," says Danielsson. +
+
+ +
+
+ "We have been doing containers for the last two years, and we really got the hang of how they work," says Danielsson. "But it was always for development and test, never in production, because we didn’t fully understand how that would work. And to me, Kubernetes was definitely the technology that solved that." +
+
+ +
+
+ + Two years ago, Holger Reinhardt joined Haufe Group as CTO and rapidly re-oriented the traditional host provider-based approach toward a cloud and API-first strategy. A core part of this strategy was a strong mandate to embrace infrastructure-as-code across the entire software deployment lifecycle via Docker. + Some experiments went further than others; German regulations about sensitive data proved to be a roadblock in moving some workloads to Azure and Amazon Web Services. "Due to our history, Germany is really strict with things like personally identifiable data," Danielsson says.

+ These experiments took on new life with the arrival of the Azure Sovereign Cloud for Germany (an Azure clone run by the German T-Systems provider). With the availability of Azure.de—which conforms to Germany’s privacy regulations—teams started to seriously consider deploying production loads in Docker into the cloud. "We have been doing containers for the last two years, and we really got the hang of how they work," says Danielsson. "But it was always for development and test, never in production, because we didn’t fully understand how that would work. And to me, Kubernetes was definitely the technology that solved that."

+ In parallel, Danielsson had built an API management system with the aim of supporting CI/CD scenarios, aspects of which were missing in off-the-shelf API management products. With a foundation based on Mashape’s Kong gateway, it is open-sourced as wicked.haufe.io. He put wicked.haufe.io to use with his product team.

Otherwise, Danielsson says his philosophy was "don’t try to reinvent the wheel all the time. Go for what’s there and 99 percent of the time it will be enough. And if you think you really need something custom or additional, think perhaps once or twice again. One of the things that I find so amazing with this cloud-native framework is that everything ties in."

+ Currently, Haufe Group is working on two projects using Kubernetes in production. One is a new mobile application for researching legislation and tax laws. "We needed a way to take out functionality from a legacy core and put an application on top of that with an API gateway—a lot of moving parts that screams containers," says Danielsson. So the team moved the build pipeline away from "deploying to some old, huge machine that you could deploy anything to" and onto a Kubernetes cluster where there would be automatic CI/CD "with feature branches and all these things that were a bit tedious in the past." +
+
+ +
+
+ "Before, we had to announce at least a week in advance when we wanted to do a release because there was a huge checklist of things that you had to do," says Danielsson. "By going cloud native, we have the infrastructure in place to be able to automate all of these things. Now we can get a new release done in half an hour instead of days." +
+
+ +
+
+ It was a proof of concept effort, and the proof was in the pudding. "Everyone was really impressed at what we accomplished in a week," says Danielsson. "We did these kinds of integrations just to make sure that we got a handle on how Kubernetes works. If you can create optimism and buzz around something, it’s half won. And if the developers and project managers know this is working, you’re more or less done." Adds Reinhardt: "You need to create some very visible, quick wins in order to overcome the status quo."

+ The impact on the speed of deployment was clear: "Before, we had to announce at least a week in advance when we wanted to do a release because there was a huge checklist of things that you had to do," says Danielsson. "By going cloud native, we have the infrastructure in place to be able to automate all of these things. Now we can get a new release done in half an hour instead of days."

+ The potential impact on cost was another bonus. "Hosting applications is quite expensive, so moving to the cloud is something that we really want to be able to do," says Danielsson. With the ability to adapt workloads, teams "will be able to scale down to around half the capacity at night, saving 30 percent of the hardware cost."

+ Just as importantly, Danielsson says, there’s added flexibility: "When we try to move or rework applications that are really crucial, it’s often tricky to validate whether the path we want to take is going to work out well. In order to validate that, we would need to reproduce the environment and really do testing, and that’s prohibitively expensive and simply not doable with traditional host providers. Cloud native gives us the ability to do risky changes and validate them in a cost-effective way."

+ As word of the two successful test projects spread throughout the company, interest in Kubernetes has grown. "We want to be able to support our developers in running Kubernetes clusters but we’re not there yet, so we allow them to do it as long as they’re aware that they are on their own," says Danielsson. "So that’s why we are also looking at things like [the managed Kubernetes platform] CoreOS Tectonic, Azure Container Service, ECS, etc. These kinds of services will be a lot more relevant to midsize companies that want to leverage cloud native but don’t have the IT departments or the structure around that."

+ In the next year and a half, Danielsson says the company will be working on moving one of their legacy desktop products, a web app for researching legislation and tax laws originally built in Java Enterprise, onto cloud-native technology. "We’re doing a microservice split out right now so that we can independently deploy the different parts," he says. The main website, which provides free content for customers, is also moving to cloud native. + +
+
+ +
+
+ "the execution of a strategy requires alignment of culture, structure and technology. Only if those three dimensions are aligned can you successfully execute a transformation into microservices and cloud-native architectures. And it is only then that the Cloud will pay the dividends in much faster speeds in product innovation and much lower operational costs." + +
+
+ +
+
+ But with these goals, Danielsson believes there are bigger cultural challenges that need to be constantly addressed. The move to new technology, not to mention a shift toward DevOps, means a lot of change for employees. "The roles were rather fixed in the past," he says. "You had developers, you had project leads, you had testers. And now you get into these really, really important things like test automation. Testers aren’t actually doing click testing anymore, and they have to write automated testing. And if you really want to go full-blown CI/CD, all these little pieces have to work together so that you get the confidence to do a check in, and know this check in is going to land in production, because if I messed up, some test is going to break. This is a really powerful thing because whatever you do, whenever you merge something into the trunk or to the master, this is going live. And that’s where you either get the people or they run away screaming." + Danielsson understands that it may take some people much longer to get used to the new ways.

+ "Culture is nothing that you can force on people," he says. "You have to live it for yourself. You have to evangelize. You have to show the advantages time and time again: This is how you can do it, this is what you get from it." To that end, his team has scheduled daylong workshops for the staff, bringing in outside experts to talk about everything from API to Devops to cloud.

+ For every person who runs away screaming, many others get drawn in. "Get that foot in the door and make them really interested in this stuff," says Danielsson. "Usually it catches on. We have people you never would have expected chanting, ‘Docker Docker Docker’ now. It’s cool to see them realize that there is a world outside of their Python libraries. It’s awesome to see them really work with Kubernetes."

+ Ultimately, Reinhardt says, "the execution of a strategy requires alignment of culture, structure and technology. Only if those three dimensions are aligned can you successfully execute a transformation into microservices and cloud-native architectures. And it is only then that the Cloud will pay the dividends in much faster speeds in product innovation and much lower operational costs." + +
+
diff --git a/content/zh/case-studies/ing/ing_featured_logo.png b/content/zh/case-studies/ing/ing_featured_logo.png new file mode 100644 index 0000000000000..f6d4489715aa4 Binary files /dev/null and b/content/zh/case-studies/ing/ing_featured_logo.png differ diff --git a/content/zh/case-studies/jd-com/jd-com_featured_logo.png b/content/zh/case-studies/jd-com/jd-com_featured_logo.png new file mode 100644 index 0000000000000..e897998429386 Binary files /dev/null and b/content/zh/case-studies/jd-com/jd-com_featured_logo.png differ diff --git a/content/zh/case-studies/naic/naic_featured_logo.png b/content/zh/case-studies/naic/naic_featured_logo.png new file mode 100644 index 0000000000000..f2497114bf40a Binary files /dev/null and b/content/zh/case-studies/naic/naic_featured_logo.png differ diff --git a/content/zh/case-studies/nav/nav_featured_logo.png b/content/zh/case-studies/nav/nav_featured_logo.png new file mode 100644 index 0000000000000..22d96017c432a Binary files /dev/null and b/content/zh/case-studies/nav/nav_featured_logo.png differ diff --git a/content/zh/case-studies/nerdalize/nerdalize_featured_logo.png b/content/zh/case-studies/nerdalize/nerdalize_featured_logo.png new file mode 100644 index 0000000000000..eb959b8ecfa1f Binary files /dev/null and b/content/zh/case-studies/nerdalize/nerdalize_featured_logo.png differ diff --git a/content/zh/case-studies/newyorktimes/newyorktimes_featured.png b/content/zh/case-studies/newyorktimes/newyorktimes_featured.png new file mode 100644 index 0000000000000..fad0927883a93 Binary files /dev/null and b/content/zh/case-studies/newyorktimes/newyorktimes_featured.png differ diff --git a/content/zh/case-studies/newyorktimes/newyorktimes_logo.png b/content/zh/case-studies/newyorktimes/newyorktimes_logo.png new file mode 100644 index 0000000000000..693a742c3ebcb Binary files /dev/null and b/content/zh/case-studies/newyorktimes/newyorktimes_logo.png differ diff --git a/content/zh/case-studies/nokia/nokia_featured_logo.png b/content/zh/case-studies/nokia/nokia_featured_logo.png new file mode 100644 index 0000000000000..8e046f021f447 Binary files /dev/null and b/content/zh/case-studies/nokia/nokia_featured_logo.png differ diff --git a/content/zh/case-studies/northwestern-mutual/northwestern_featured_logo.png b/content/zh/case-studies/northwestern-mutual/northwestern_featured_logo.png new file mode 100644 index 0000000000000..7c1422f32b86d Binary files /dev/null and b/content/zh/case-studies/northwestern-mutual/northwestern_featured_logo.png differ diff --git a/content/zh/case-studies/ocado/ocado_featured_logo.png b/content/zh/case-studies/ocado/ocado_featured_logo.png new file mode 100644 index 0000000000000..0c2ef19ec3b03 Binary files /dev/null and b/content/zh/case-studies/ocado/ocado_featured_logo.png differ diff --git a/content/zh/case-studies/openAI/openai_featured.png b/content/zh/case-studies/openAI/openai_featured.png new file mode 100644 index 0000000000000..b2b667c0bb13d Binary files /dev/null and b/content/zh/case-studies/openAI/openai_featured.png differ diff --git a/content/zh/case-studies/openAI/openai_logo.png b/content/zh/case-studies/openAI/openai_logo.png new file mode 100644 index 0000000000000..a85a81ea063d0 Binary files /dev/null and b/content/zh/case-studies/openAI/openai_logo.png differ diff --git a/content/zh/case-studies/peardeck/peardeck_featured.png b/content/zh/case-studies/peardeck/peardeck_featured.png new file mode 100644 index 0000000000000..ce87ee2d47f7b Binary files /dev/null and 
b/content/zh/case-studies/peardeck/peardeck_featured.png differ diff --git a/content/zh/case-studies/peardeck/peardeck_logo.png b/content/zh/case-studies/peardeck/peardeck_logo.png new file mode 100644 index 0000000000000..c1b9772ec45a0 Binary files /dev/null and b/content/zh/case-studies/peardeck/peardeck_logo.png differ diff --git a/content/zh/case-studies/pearson/pearson_featured.png b/content/zh/case-studies/pearson/pearson_featured.png new file mode 100644 index 0000000000000..6f8ffec49e6ef Binary files /dev/null and b/content/zh/case-studies/pearson/pearson_featured.png differ diff --git a/content/zh/case-studies/pearson/pearson_logo.png b/content/zh/case-studies/pearson/pearson_logo.png new file mode 100644 index 0000000000000..57e586f3ebd32 Binary files /dev/null and b/content/zh/case-studies/pearson/pearson_logo.png differ diff --git a/content/zh/case-studies/pingcap/pingcap_featured_logo.png b/content/zh/case-studies/pingcap/pingcap_featured_logo.png new file mode 100644 index 0000000000000..8b57f417ae813 Binary files /dev/null and b/content/zh/case-studies/pingcap/pingcap_featured_logo.png differ diff --git a/content/zh/case-studies/pinterest/pinterest_feature.png b/content/zh/case-studies/pinterest/pinterest_feature.png new file mode 100644 index 0000000000000..ea5d625789468 Binary files /dev/null and b/content/zh/case-studies/pinterest/pinterest_feature.png differ diff --git a/content/zh/case-studies/pinterest/pinterest_logo.png b/content/zh/case-studies/pinterest/pinterest_logo.png new file mode 100644 index 0000000000000..0f744e7828cc2 Binary files /dev/null and b/content/zh/case-studies/pinterest/pinterest_logo.png differ diff --git a/content/zh/case-studies/prowise/prowise_featured_logo.png b/content/zh/case-studies/prowise/prowise_featured_logo.png new file mode 100644 index 0000000000000..e6dc1a35ec238 Binary files /dev/null and b/content/zh/case-studies/prowise/prowise_featured_logo.png differ diff --git a/content/zh/case-studies/ricardo-ch/ricardo-ch_featured_logo.png b/content/zh/case-studies/ricardo-ch/ricardo-ch_featured_logo.png new file mode 100644 index 0000000000000..c462c7ba565c2 Binary files /dev/null and b/content/zh/case-studies/ricardo-ch/ricardo-ch_featured_logo.png differ diff --git a/content/zh/case-studies/slamtec/slamtec_featured_logo.png b/content/zh/case-studies/slamtec/slamtec_featured_logo.png new file mode 100644 index 0000000000000..598db9fe43a94 Binary files /dev/null and b/content/zh/case-studies/slamtec/slamtec_featured_logo.png differ diff --git a/content/zh/case-studies/slingtv/slingtv_featured_logo.png b/content/zh/case-studies/slingtv/slingtv_featured_logo.png new file mode 100644 index 0000000000000..b52143ee8b6c6 Binary files /dev/null and b/content/zh/case-studies/slingtv/slingtv_featured_logo.png differ diff --git a/content/zh/case-studies/sos/sos_featured_logo.png b/content/zh/case-studies/sos/sos_featured_logo.png new file mode 100644 index 0000000000000..a97671af6d8f5 Binary files /dev/null and b/content/zh/case-studies/sos/sos_featured_logo.png differ diff --git a/content/zh/case-studies/spotify/spotify_featured_logo.png b/content/zh/case-studies/spotify/spotify_featured_logo.png new file mode 100644 index 0000000000000..def15c51bfc14 Binary files /dev/null and b/content/zh/case-studies/spotify/spotify_featured_logo.png differ diff --git a/content/zh/case-studies/thredup/thredup_featured_logo.png b/content/zh/case-studies/thredup/thredup_featured_logo.png new file mode 100644 index 0000000000000..3961f761b1f4c Binary files 
/dev/null and b/content/zh/case-studies/thredup/thredup_featured_logo.png differ
diff --git a/content/zh/case-studies/vsco/vsco_featured_logo.png b/content/zh/case-studies/vsco/vsco_featured_logo.png
new file mode 100644
index 0000000000000..e01e2e4e8f0ed
Binary files /dev/null and b/content/zh/case-studies/vsco/vsco_featured_logo.png differ
diff --git a/content/zh/case-studies/woorank/woorank_featured_logo.png b/content/zh/case-studies/woorank/woorank_featured_logo.png
new file mode 100644
index 0000000000000..f7d6ed300f186
Binary files /dev/null and b/content/zh/case-studies/woorank/woorank_featured_logo.png differ
diff --git a/content/zh/docs/.gitkeep b/content/zh/docs/.gitkeep
deleted file mode 100644
index e69de29bb2d1d..0000000000000
diff --git a/content/zh/docs/admin/authorization/_index.md b/content/zh/docs/admin/authorization/_index.md
deleted file mode 100644
index 7941c2705f393..0000000000000
--- a/content/zh/docs/admin/authorization/_index.md
+++ /dev/null
@@ -1,155 +0,0 @@
----
-approvers:
-- erictune
-- lavalamp
-- deads2k
-- liggitt
-title: Overview
-content_template: templates/concept
----
-
-{{% capture overview %}}
-
-Learn more about Kubernetes authorization, including details about creating policies using the supported authorization modules.
-
-{{% /capture %}}
-
-{{% capture body %}}
-
-In Kubernetes, you must be authenticated (logged in) before your request can be authorized (granted permission to access). For information about authentication, see the [access control overview](/docs/admin/access-the-api/).
-
-Kubernetes authorization works with generic REST API requests. This means that Kubernetes authorization can be used together with existing organization-wide or cloud-provider access control systems, which may handle other APIs besides the Kubernetes API.
-
-## Determining whether a request is allowed or denied
-Kubernetes authorizes API requests using the API server. It evaluates all of the request attributes against all policies and allows or denies the request. Some part of every API request must be allowed by some policy in order to proceed, which means that permissions are denied by default.
-
-(Although Kubernetes uses the API server, access controls and policies that depend on specific fields of specific kinds of objects are handled by admission controllers.)
-
-When multiple authorization modules are configured, each is checked in sequence, and if any module authorizes the request, the request can proceed. If all modules deny the request, the request is denied (HTTP status code 403).
-
-## Reviewing your request attributes
-
-Kubernetes reviews only the following API request attributes:
-
-* **user** - the `user` string provided during authentication
-* **group** - the list of group names the authenticated user belongs to
-* **extra** - a map of arbitrary string keys to string values, provided by the authentication layer
-* **API** - indicates whether the request is for an API resource
-* **Request path** - the path to miscellaneous non-resource endpoints like `/api` or `/healthz` (see [kubectl](#kubectl)).
-* **API request verb** - the API verbs `get`, `list`, `create`, `update`, `patch`, `watch`, `proxy`, `redirect`, `delete`, and `deletecollection` are used for resource requests. To determine the request verb for a resource API endpoint, see **Determining the request verb** below.
-* **HTTP request verb** - the HTTP verbs `get`, `post`, `put`, and `delete` are used for non-resource requests
-* **Resource** - the ID or name of the resource that is being accessed (for resource requests only). For resource requests using the `get`, `update`, `patch`, and `delete` verbs, you must provide the resource name.
-* **Subresource** - the subresource that is being accessed (for resource requests only)
-* **Namespace** - the namespace of the object that is being accessed (for namespaced resource requests only)
-* **API group** - the API group being accessed (for resource requests only). An empty string designates the [core API group](/docs/api/).
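As a quick orientation (the request and identity below are hypothetical, reusing the names from the `SubjectAccessReview` example further down), a `GET` on `/api/v1/namespaces/kittensandponies/pods` would decompose into attributes roughly like this:

```yaml
# Hypothetical attribute breakdown for: GET /api/v1/namespaces/kittensandponies/pods
user: "jane"                  # supplied by the authentication layer
group: ["group1", "group2"]
apiGroup: ""                  # empty string = core API group
namespace: "kittensandponies"
resource: "pods"
verb: "list"                  # GET on a collection maps to the list verb
```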
-
-## Determining the request verb
-
-To determine the request verb for a resource API endpoint, review the HTTP verb used and whether or not the request acts on an individual resource or a collection of resources:
-
-HTTP verb | request verb
----------- | ---------------
-POST | create
-GET, HEAD | get (for individual resources), list (for collections)
-PUT | update
-PATCH | patch
-DELETE | delete (for individual resources), deletecollection (for collections)
-
-Kubernetes sometimes checks authorization for additional permissions using specialized verbs. For example:
-
-* [PodSecurityPolicy](/docs/concepts/policy/pod-security-policy/) checks for authorization of the `use` verb on `podsecuritypolicies` resources in the `extensions` API group.
-* [RBAC](/docs/admin/authorization/rbac/#privilege-escalation-prevention-and-bootstrapping) checks for authorization of the `bind` verb on `roles` and `clusterroles` resources in the `rbac.authorization.k8s.io` API group.
-* [Authentication](/docs/admin/authentication/) checks for authorization of the `impersonate` verb on `users`, `groups`, and `serviceaccounts` in the core API group, and on `userextras` in the `authentication.k8s.io` API group.
-
-## Authorization modules
-* **ABAC Mode** - Attribute-based access control (ABAC) defines an access control paradigm whereby access rights are granted to users through the use of policies which combine attributes together. The policies can use any type of attributes (user attributes, resource attributes, object and environment attributes, etc.). To learn more about using the ABAC mode, see [ABAC Mode](/docs/admin/authorization/abac/).
-* **RBAC Mode** - Role-based access control (RBAC) is a method of regulating access to computer or network resources based on the roles of individual users within an enterprise. In this context, access is the ability of an individual user to perform a specific task, such as viewing, creating, or modifying a file. To learn more about using the RBAC mode, see [RBAC Mode](/docs/admin/authorization/rbac/).
-  * When specified, RBAC (Role-Based Access Control) uses the `rbac.authorization.k8s.io` API group to drive authorization decisions, allowing admins to dynamically configure permission policies through the Kubernetes API.
-  * As of 1.6, RBAC mode is in beta.
-  * To enable RBAC, start the apiserver with `--authorization-mode=RBAC`.
-* **Webhook Mode** - A WebHook is an HTTP callback: an HTTP POST that occurs when something happens; a simple event notification via HTTP POST. A web application implementing WebHooks will POST a message to a URL when certain things happen. To learn more about using the Webhook mode, see [Webhook Mode](/docs/admin/authorization/webhook/).
-* **Custom Modules** - You can create custom modules for use with Kubernetes. To learn more, see **Custom modules** below.
-
-### Custom modules
-Other implementations can be developed fairly easily; the apiserver calls the Authorizer interface:
-
-```go
-type Authorizer interface {
-  Authorize(a Attributes) error
-}
-```
-
-to determine whether or not to allow each API action.
-
-An authorization plugin is a module that implements this interface. Authorization plugin code goes in `pkg/auth/authorizer/$MODULENAME`.
-
-An authorization module can be implemented completely in-process, or it can call out to a remote authorization service. Authorization modules can implement their own caching to reduce the cost of repeated authorization calls with the same or similar arguments. Developers should consider the interaction between caching and the revocation of permissions.
-
-#### Checking API access
-
-Kubernetes exposes the `subjectaccessreviews.v1.authorization.k8s.io` resource as a normal resource that allows external access to the API authorizer's decisions. No matter which authorizer you choose to use, you can issue a `POST` with a `SubjectAccessReview`, just like the webhook authorizer, to the `apis/authorization.k8s.io/v1/subjectaccessreviews` endpoint, and get back a response. For example:
-
-```bash
-kubectl create --v=8 -f - << __EOF__
-{
-  "apiVersion": "authorization.k8s.io/v1",
-  "kind": "SubjectAccessReview",
-  "spec": {
-    "resourceAttributes": {
-      "namespace": "kittensandponies",
-      "verb": "get",
-      "group": "unicorn.example.org",
-      "resource": "pods"
-    },
-    "user": "jane",
-    "group": [
-      "group1",
-      "group2"
-    ],
-    "extra": {
-      "scopes": [
-        "openid",
-        "profile"
-      ]
-    }
-  }
-}
-__EOF__
-
---- snip lots of output ---
-
-I0913 08:12:31.362873   27425 request.go:908] Response Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1","metadata":{"creationTimestamp":null},"spec":{"resourceAttributes":{"namespace":"kittensandponies","verb":"GET","group":"unicorn.example.org","resource":"pods"},"user":"jane","group":["group1","group2"],"extra":{"scopes":["openid","profile"]}},"status":{"allowed":true}}
-subjectaccessreview "" created
-```
-
-This is useful for debugging access problems, in that you can use this resource to determine what access an authorizer is granting.
-
-## Using flags for your authorization module
-
-You must include a flag to indicate which authorization module your policies include:
-
-The following flags can be used:
-  - `--authorization-mode=ABAC` Attribute-Based Access Control (ABAC) mode allows you to configure policies using local files.
-  - `--authorization-mode=RBAC` Role-Based Access Control (RBAC) mode allows you to create and store policies using the Kubernetes API.
-  - `--authorization-mode=Webhook` WebHook is an HTTP callback mode that allows you to manage authorization using a remote REST endpoint.
-  - `--authorization-mode=AlwaysDeny` This flag blocks all requests. Use this flag only for testing.
-  - `--authorization-mode=AlwaysAllow` This flag allows all requests. Use this flag only if you do not require authorization for your API requests.
-
-You can choose more than one authorization module. If one of the modes is `AlwaysAllow`, it overrides the other modes and all API requests are allowed.
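As a minimal, hypothetical sketch of how a cluster administrator might set this flag when running the API server as a static pod (the file path, image tag, and surrounding fields are assumptions, not part of this page):

```yaml
# Hypothetical excerpt from a kube-apiserver static pod manifest,
# e.g. /etc/kubernetes/manifests/kube-apiserver.yaml (path assumed).
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    image: k8s.gcr.io/kube-apiserver:v1.9.0  # example image tag
    command:
    - kube-apiserver
    - --authorization-mode=RBAC,Webhook      # modules are checked in this order
    # ...other flags omitted...
```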
-
-## Versioning
-
-For version 1.2, clusters created by kube-up.sh are configured so that no authorization is required for any request.
-
-As of version 1.3, clusters created by kube-up.sh are configured so that the ABAC authorization module is enabled. However, its input file is initially set to allow all users to do all operations, and the cluster administrator needs to edit that file, or configure a different authorizer, to restrict what users can do.
-
-{{% /capture %}}
-{{% capture whatsnext %}}
-
-* To learn more about authentication, see **Authentication** in [Controlling Access to the Kubernetes API](docs/admin/access-the-api/).
-* To learn more about admission control, see [Using Admission Controllers](docs/admin/admission-controllers/).
-*
-{{% /capture %}}
-
-
diff --git a/content/zh/docs/admin/high-availability/_index.md b/content/zh/docs/admin/high-availability/_index.md
deleted file mode 100644
index 4d854b8daffba..0000000000000
--- a/content/zh/docs/admin/high-availability/_index.md
+++ /dev/null
@@ -1,214 +0,0 @@
----
-title: Building High-Availability Clusters
----
-
-## Introduction
-
-This document describes how to build a high-availability (HA) Kubernetes cluster. This is a fairly advanced topic.
-
-Users who merely want to experiment with Kubernetes are encouraged to use configurations that are simpler to set up, such as [Minikube](/docs/getting-started-guides/minikube/), or to try [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/) for hosted Kubernetes.
-
-Also, at this time high availability support for Kubernetes is not continuously tested in our end-to-end (e2e) testing. We will be adding this continuous testing, but for now the single-node master installations are more heavily tested.
-
-{{< toc >}}
-
-## Overview
-
-Setting up a truly reliable, highly available distributed system requires a number of steps. It is akin to wearing underwear, pants, a belt, suspenders, another pair of underwear, and another pair of pants. We go into each of these steps in detail, but a summary is given here to help guide the user.
-
-The steps involved are as follows:
-
-   * [Creating the reliable constituent nodes that collectively form our HA master implementation.](#reliable-nodes)
-   * [Setting up a redundant, reliable storage layer with clustered etcd.](#establishing-a-redundant-reliable-storage-layer)
-   * [Starting replicated, load balanced Kubernetes API servers.](#replicated-api-servers)
-   * [Setting up master-elected Kubernetes scheduler and controller-manager daemons.](#master-elected-components)
-
-Here's what the system should look like when it's finished:
-
-![High availability Kubernetes diagram](/images/docs/ha.svg)
-
-## Initial set-up
-
-This guide assumes that you are setting up a 3-node master cluster, with each node running some flavor of Linux.
-Examples in the guide are given for Debian distributions, but they should be easily adaptable to other distributions.
-Likewise, this set-up should work whether you are running in a public or private cloud provider, or on bare metal.
-
-The easiest way to implement an HA Kubernetes cluster is to start with an existing single-master cluster. The instructions at [https://get.k8s.io](https://get.k8s.io) describe easy installation for single-master clusters on a variety of platforms.
-
-## Reliable nodes
-
-On each master node, we are going to run a number of processes that implement the Kubernetes API. The first step in making these reliable is to make sure that each automatically restarts when it fails. To achieve this, we need to install a process watcher. We choose to use the `kubelet` process, which also runs on every worker node. This is convenient, since we use containers to distribute our binaries, and it lets us establish resource limits for each daemon and introspect its resource usage. Of course, we also need something to monitor the kubelet itself (insert who-watches-the-watcher jokes here). For Debian systems, we choose monit, but there are a number of alternate choices. For example, on systemd-based systems (e.g. RHEL, CentOS), you can run `systemctl enable kubelet`.
-
-If you are extending from a standard Kubernetes installation, the `kubelet` binary should already be present on your system. You can run `which kubelet` to determine if the binary is in fact installed. If it is not installed, you should manually install the [kubelet binary](https://storage.googleapis.com/kubernetes-release/release/v0.19.3/bin/linux/amd64/kubelet), the [kubelet init file](http://releases.k8s.io/{{< param "githubbranch" >}}/cluster/saltbase/salt/kubelet/initd) and the [default-kubelet](/docs/admin/high-availability/default-kubelet) script.
-
-If you are using monit, you also need to install the monit daemon (`apt-get install monit`) and the [monit-kubelet](/docs/admin/high-availability/monit-kubelet) and [monit-docker](/docs/admin/high-availability/monit-docker) configs.
-
-On systemd systems you can run `systemctl enable kubelet` and `systemctl enable docker`.
-
-## Establishing a redundant, reliable storage layer
-
-The central foundation of a highly available solution is a redundant, reliable storage layer. The number one rule of high availability is to protect the data. Whatever else happens, whatever catches on fire, if you have the data, you can rebuild. If you lose the data, you're done.
-
-Clustered etcd already replicates your stored data to all of the master instances in your cluster. This means that to lose data, all three nodes would need to have their physical (or virtual) disks fail at the same time. The probability that this occurs is relatively low, so for many people running a replicated etcd cluster is likely reliable enough. You can add additional reliability by increasing the size of the cluster from three to five nodes. If that is still insufficient, you can add [even more redundancy to your storage layer](#even-more-reliable-storage).
-
-### Clustering etcd
-
-The full details of clustering etcd are beyond the scope of this document; lots of details are given on the [etcd clustering page](https://github.com/coreos/etcd/blob/master/Documentation/op-guide/clustering.md). This example walks through a simple cluster set-up, using etcd's built-in discovery to build our cluster.
-
-First, hit the etcd discovery service to create a new token:
-
-```shell
-curl https://discovery.etcd.io/new?size=3
-```
-
-On each node, copy the [etcd.yaml](/docs/admin/high-availability/etcd.yaml) file into `/etc/kubernetes/manifests/etcd.yaml`.
-
-The kubelet on each node actively monitors the contents of that directory, and will create an instance of the `etcd` server from the pod definition in `etcd.yaml`.
-
-Note that you should substitute the token URL you got above for `${DISCOVERY_TOKEN}` in the `etcd.yaml` on all three nodes, and you should substitute a different name (e.g. `node-1`) for `${NODE_NAME}` and the correct IP address for `${NODE_IP}` on each node.
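The linked etcd.yaml is the authoritative manifest; purely as an orientation aid, here is a heavily abridged, hypothetical sketch of such a manifest showing where the placeholders go (the image version, ports, and flag selection are assumptions):

```yaml
# Hypothetical, abridged etcd static pod sketch; the real etcd.yaml is authoritative.
apiVersion: v1
kind: Pod
metadata:
  name: etcd-server
spec:
  hostNetwork: true
  containers:
  - name: etcd
    image: quay.io/coreos/etcd:v2.2.1  # example image; use the version your cluster expects
    command:
    - /usr/local/bin/etcd
    - --name=${NODE_NAME}              # e.g. node-1
    - --discovery=${DISCOVERY_TOKEN}   # the token URL from the curl above
    - --initial-advertise-peer-urls=http://${NODE_IP}:2380
    - --listen-peer-urls=http://${NODE_IP}:2380
    - --advertise-client-urls=http://${NODE_IP}:4001
    - --listen-client-urls=http://${NODE_IP}:4001,http://127.0.0.1:4001
    - --data-dir=/var/etcd/data
    volumeMounts:
    - mountPath: /var/etcd/data
      name: varetcd
  volumes:
  - name: varetcd
    hostPath:
      path: /var/etcd/data
```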
-
-#### Validating your cluster
-
-Once you copy this file into all three nodes, you should have a clustered etcd set up. You can validate it on the master with:
-
-```shell
-kubectl exec <pod_name> etcdctl member list
-```
-
-and
-
-```shell
-kubectl exec <pod_name> etcdctl cluster-health
-```
-
-You can also validate that the cluster is working by running `etcdctl set foo bar` on one node, and running `etcdctl get foo` on a different node.
-
-### Even more reliable storage
-
-Of course, if you are interested in increased data reliability, there are further options which make the place where etcd stores its data even more reliable than regular disks (belts and suspenders, ftw!).
-
-If you use a cloud provider, then it usually provides this for you, for example [Persistent Disk](https://cloud.google.com/compute/docs/disks/persistent-disks) on the Google Cloud Platform. These are block-device persistent storage that can be mounted into your virtual machine. Other cloud providers offer similar solutions.
-
-If you are running on physical machines, you can also use network-attached redundant storage via an iSCSI or NFS interface.
-Alternatively, you can run a clustered file system like Gluster or Ceph. Finally, you can also run a RAID array on each physical machine.
-
-Regardless of how you choose to implement it, if you have chosen to use one of these options, you should make sure that your storage is mounted on each machine. If your storage is shared between the three masters in your cluster, you should create a different directory on the storage for each node. Throughout these instructions, we assume that this storage is mounted in `/var/etcd/data` on your machine.
-
-## Replicated API servers
-
-Once you have replicated etcd set up correctly, we will also install the apiserver using the kubelet.
-
-First, you need to create the initial log file, so that Docker mounts a file instead of a directory:
-
-```shell
-touch /var/log/kube-apiserver.log
-```
-
-Next, you need to create a `/srv/kubernetes/` directory on each node. This directory includes:
-
-   * basic_auth.csv - user name and password for basic auth
-   * ca.crt - the CA certificate
-   * known_tokens.csv - tokens that entities (e.g. the kubelet) can use to talk to the apiserver
-   * kubecfg.crt - client certificate, public key
-   * kubecfg.key - client certificate, private key
-   * server.cert - server certificate, public key
-   * server.key - server certificate, private key
-
-The easiest way to create this directory may be to copy it from the master node of a working cluster, or you can manually generate these files yourself.
-
-### Starting the API server
-
-Once these files exist, copy [kube-apiserver.yaml](/docs/admin/high-availability/kube-apiserver.yaml) into `/etc/kubernetes/manifests/` on each master node.
-
-The kubelet monitors this directory, and will create an instance of the `kube-apiserver` container using the pod definition specified in the file.
-
-### Load balancing
-
-At this point, you should have 3 apiservers all working correctly. If you set up a network load balancer, you should be able to access your cluster via that load balancer, and see traffic balancing between the apiserver instances. Setting up a load balancer will depend on the specifics of your platform; for example, instructions for the Google Cloud Platform can be found [here](https://cloud.google.com/compute/docs/load-balancing/).
-
-Note that if you are using authentication, you may need to regenerate your certificates to include the IP address of the load balancer in addition to the IP addresses of the individual nodes.
-
-For pods that you deploy into the cluster, the `kubernetes` service/DNS name should provide a load-balanced endpoint for the master automatically.
-
-For external users of the API (e.g. the `kubectl` command line interface, continuous build pipelines, or other clients) you will want to configure them to talk to the external load balancer's IP address.
-
-## Master-elected components
-
-So far we have set up state storage, and we have set up the API servers, but we haven't run anything that actually modifies cluster state, such as the controller manager and scheduler. To achieve this reliably, we only want to have one actor modifying state at a time, but we want replicated instances of these actors, in case a machine dies. To achieve this, we are going to use a lease-lock in the API to perform master election. We will use the `--leader-elect` flag for each scheduler and controller-manager, using a lease in the API to ensure that only one instance of the scheduler and controller-manager is running at once.
-
-The scheduler and controller-manager can be configured to talk to the API server that is on the same node (i.e. 127.0.0.1), or they can be configured to communicate using the load-balanced IP address of the API servers. Regardless of how they are configured, when using `--leader-elect` the scheduler and controller-manager will complete the leader election process mentioned above.
-
-In case of a failure accessing the API server, the elected leader will not be able to renew its lease, and a new leader will be elected. This is especially relevant when the scheduler and controller-manager access the API server via 127.0.0.1 and the API server on the same node is unavailable.
-
-### Installing configuration files
-
-First, create empty log files on each node, so that Docker will mount the files and not make new directories:
-
-```shell
-touch /var/log/kube-scheduler.log
-touch /var/log/kube-controller-manager.log
-```
-
-Next, set up the descriptions of the scheduler and controller-manager pods on each node, by copying [kube-scheduler.yaml](/docs/admin/high-availability/kube-scheduler.yaml) and [kube-controller-manager.yaml](/docs/admin/high-availability/kube-controller-manager.yaml) into the `/etc/kubernetes/manifests/` directory.
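To make the flag concrete, here is a minimal, hypothetical excerpt of the relevant part of such a scheduler manifest (the real kube-scheduler.yaml linked above is authoritative, and the image tag is an assumption):

```yaml
# Hypothetical, abridged kube-scheduler static pod excerpt.
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: k8s.gcr.io/kube-scheduler:v1.9.0  # example image tag
    command:
    - kube-scheduler
    - --master=127.0.0.1:8080  # or the load-balanced API server address
    - --leader-elect=true      # only the current lease holder actively schedules
```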
-
-## Conclusion
-
-At this point, you are done (yeah!) with the master components, but you still need to add the worker nodes (boo!).
-
-If you have an existing cluster, this is as simple as reconfiguring your kubelets to talk to the load-balanced endpoint and restarting them.
-
-If you are turning up a fresh cluster, you will need to install the kubelet and kube-proxy on each worker node, and set the `--apiserver` flag to your replicated endpoint.
\ No newline at end of file
diff --git a/content/zh/docs/admin/ovs-networking.md b/content/zh/docs/admin/ovs-networking.md
deleted file mode 100644
index 856148c0d84b2..0000000000000
--- a/content/zh/docs/admin/ovs-networking.md
+++ /dev/null
@@ -1,22 +0,0 @@
----
-approvers:
-- thockin
-title: Kubernetes OpenVSwitch GRE/VxLAN networking
----
-
-This document describes how to set up networking between pods across nodes using OpenVSwitch. The tunnel type can be GRE or VxLAN; VxLAN is preferable when large-scale isolation needs to be performed within the network.
-
-![OVS Networking](/images/docs/ovs-networking.png)
-
-The Vagrant set-up in Kubernetes is as follows:
-
-The docker bridge is replaced with a brctl-generated Linux bridge (kbr0) with a 256-address subnet. In summary, each node gets a 10.244.x.0/24 subnet, and docker is configured to use that bridge instead of the default docker0 bridge.
-
-Also, an OVS bridge is created (obr0) and added as a port to the kbr0 bridge. All OVS bridges across all nodes are linked with GRE tunnels, so each node has an outgoing GRE tunnel to all other nodes. It does not need to be a complete mesh, but the more mesh-like the better. STP (spanning tree) mode is enabled on the bridges to prevent loops.
-
-Routing rules enable any 10.244.0.0/16 target to become reachable via the OVS bridge connected to the tunnels.
-
-
-
-
diff --git a/content/zh/docs/concepts/cluster-administration/device-plugins.md b/content/zh/docs/concepts/cluster-administration/device-plugins.md
deleted file mode 100644
index 1792960105ae1..0000000000000
--- a/content/zh/docs/concepts/cluster-administration/device-plugins.md
+++ /dev/null
@@ -1,97 +0,0 @@
----
-approvers:
-title: Device Plugins
-description: Use the Kubernetes device plugin framework to implement plugins for GPUs, NICs, FPGAs, InfiniBand, and similar resources that require vendor-specific setup.
-content_template: templates/concept
----
-
-{{< feature-state state="alpha" >}}
-
-{{% capture overview %}}
-Starting in version 1.8, Kubernetes provides a
-[device plugin framework](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md)
-for vendors to advertise their resources to the kubelet without changing Kubernetes core code.
-Instead of writing custom Kubernetes code, vendors can implement a device plugin that can be deployed manually or as a DaemonSet.
-The targeted devices include GPUs, high-performance NICs, FPGAs, InfiniBand,
-and other similar computing resources that may require vendor-specific initialization and setup.
-{{% /capture %}}
-
-{{% capture body %}}
-
-## Device plugin registration
-
-The device plugins feature is gated by the `DevicePlugins` feature gate, which is disabled by default.
-When the device plugins feature is enabled, the kubelet exposes a `Registration` gRPC service:
-
-```gRPC
-service Registration {
-	rpc Register(RegisterRequest) returns (Empty) {}
-}
-```
-A device plugin registers itself with the kubelet through this gRPC service.
-During the registration, the device plugin needs to send:
-
-  * The name of its Unix socket.
-  * The device plugin API version against which it was built.
-  * The `ResourceName` it wants to advertise. Here `ResourceName` needs to follow the
-    [extended resource naming scheme](https://github.com/kubernetes/kubernetes/pull/48922)
-    as `vendor-domain/resource`.
-    For example, Nvidia GPU resources are advertised as `nvidia.com/gpu`.
-
-Following a successful registration, the device plugin sends the kubelet the list of devices it manages, and the kubelet is then in charge of advertising those resources to the apiserver as part of the kubelet node status update.
-For example, after a device plugin registers `vendor-domain/foo` with the kubelet
-and reports two healthy devices on a node, the node status is updated to advertise 2 `vendor-domain/foo`.
-
-Then, developers can request devices in a
-[Container](/docs/api-reference/{{< param "version" >}}/#container-v1-core)
-specification by using the same process that is used for
-[opaque integer resources](/docs/tasks/configure-pod-container/opaque-integer-resource/).
-In version 1.8, extended resources are supported only as integer resources, and a Container specification must declare `limit` equal to `request`.
-
-## Device plugin implementation
-
-The general workflow of a device plugin includes the following steps:
-
-* Initialization. During this phase, the device plugin performs vendor-specific initialization and setup to make sure the devices are in a ready state.
-
-* The plugin starts a gRPC service, with a Unix socket under the host path `/var/lib/kubelet/device-plugins/`, that implements the following interface:
-
-  ```gRPC
-  service DevicePlugin {
-        // ListAndWatch returns a stream of List of Devices
-        // Whenever a Device state change or a Device disappears, ListAndWatch
-        // returns the new list
-        rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}

-        // Allocate is called during container creation so that the Device
-        // Plugin can run device specific operations and instruct Kubelet
-        // of the steps to make the Device available in the container
-        rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
-  }
-  ```
-
-* The plugin registers itself with the kubelet through the Unix socket at the host path `/var/lib/kubelet/device-plugins/kubelet.sock`.
-
-* After successfully registering itself, the device plugin runs in serving mode, during which it keeps monitoring device health and reports back to the kubelet upon any device state change.
-It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may do device-specific preparation; for example, GPU cleanup or QRNG initialization.
-If the operations succeed, the device plugin returns an `AllocateResponse` that contains container runtime configurations for accessing the allocated devices. The kubelet passes this information to the container runtime.
-
-A device plugin is expected to detect kubelet restarts and re-register itself with the new kubelet instance. In version 1.8, a new kubelet instance cleans up the existing Unix sockets under `/var/lib/kubelet/device-plugins` when it starts. A device plugin can detect, through this event, that its Unix socket was deleted, and re-register itself.
-
-## Device plugin deployment
-
-A device plugin can be deployed manually or as a DaemonSet. Being deployed as a DaemonSet has the benefit that Kubernetes can restart the plugin's Pods if they fail;
-otherwise, an extra mechanism is needed to recover from device plugin failures.
-The directory `/var/lib/kubelet/device-plugins` requires privileged access,
-so a device plugin must run in a privileged security context.
-If a device plugin is running as a DaemonSet, the `/var/lib/kubelet/device-plugins`
-directory must be mounted as a [Volume](/docs/api-reference/{{< param "version" >}}/#volume-v1-core) in the plugin's [PodSpec](/docs/api-reference/{{< param "version" >}}/#podspec-v1-core).
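As a minimal, hypothetical sketch of such a deployment (the plugin image and all names are placeholders, not a real vendor plugin):

```yaml
# Hypothetical device plugin DaemonSet; image and names are illustrative only.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vendor-device-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: vendor-device-plugin
  template:
    metadata:
      labels:
        name: vendor-device-plugin
    spec:
      containers:
      - name: device-plugin
        image: example.com/vendor/device-plugin:1.0  # assumed image
        securityContext:
          privileged: true  # required for the kubelet plugin directory
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
```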
-
-## Examples
-
-For an example device plugin implementation, see the
-[nvidia GPU device plugin for COS-based operating systems](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu).
-
-{{% /capture %}}
-
-
diff --git a/content/zh/docs/concepts/cluster-administration/sysctl-cluster.md b/content/zh/docs/concepts/cluster-administration/sysctl-cluster.md
deleted file mode 100644
index 2bcd589ac3a52..0000000000000
--- a/content/zh/docs/concepts/cluster-administration/sysctl-cluster.md
+++ /dev/null
@@ -1,103 +0,0 @@
----
-approvers:
-- sttts
-title: Using Sysctls in a Kubernetes Cluster
----
-
-{{< toc >}}
-
-This document describes how sysctls are used within a Kubernetes cluster.
-
-## What is a Sysctl?
-
-In Linux, the sysctl interface allows an administrator to modify kernel parameters at runtime. The parameters are available via the `/proc/sys/` virtual process file system. The parameters cover various subsystems, such as:
-
-- kernel (common prefix: `kernel.`)
-- networking (common prefix: `net.`)
-- virtual memory (common prefix: `vm.`)
-- device-specific (common prefix: `dev.`)
-- more subsystems are described in the [Kernel docs](https://www.kernel.org/doc/Documentation/sysctl/README).
-
-To get a list of all parameters, you can run
-
-```
-$ sudo sysctl -a
-```
-
-## Namespaced vs. node-level Sysctls
-
-A number of sysctls are _namespaced_ in today's Linux kernels. This means that they can be set independently for each pod on the same node. Being namespaced is a requirement for sysctls to be accessible in a pod context within Kubernetes.
-
-The following sysctls are known to be _namespaced_:
-
-- `kernel.shm*` (kernel shared-memory parameters),
-- `kernel.msg*` (kernel SystemV message-queue parameters),
-- `kernel.sem` (kernel semaphore parameters),
-- `fs.mqueue.*` (kernel POSIX message-queue parameters),
-- `net.*` (kernel networking parameters), if they can be changed in a container network namespace. However, there are exceptions (e.g., `net.netfilter.nf_conntrack_max` and `net.netfilter.nf_conntrack_expect_max` can be changed in a container network namespace but are unnamespaced).
-
-Sysctls that are not namespaced are called _node-level_ and must be set manually by the cluster administrator, either by means of the underlying Linux distribution of the node (e.g., via `/etc/sysctls.conf`) or by using a DaemonSet with privileged containers.
-
-**Note**: it is good practice to consider nodes with special sysctl settings as _tainted_ within a cluster, and to schedule onto them only pods which need those sysctl settings. It is suggested to use the Kubernetes [_taints and tolerations_ feature](/docs/user-guide/kubectl/{{< param "version" >}}/#taint) to implement this.
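For illustration, a hypothetical taint/toleration pairing could look like the following; the taint key and value are invented for this example (a node would be tainted with, e.g., `kubectl taint nodes node-1 sysctl-tuned=required:NoSchedule`):

```yaml
# Hypothetical pod that tolerates an invented "sysctl-tuned" node taint.
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-consumer
spec:
  tolerations:
  - key: "sysctl-tuned"    # assumed taint key
    operator: "Equal"
    value: "required"
    effect: "NoSchedule"
  containers:
  - name: app
    image: nginx           # placeholder image
```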
-
-## Safe vs. unsafe Sysctls
-
-Sysctls are grouped into _safe_ and _unsafe_ sysctls. In addition to proper namespacing, a _safe_ sysctl must be properly _isolated_ between pods on the same node. This means that setting a _safe_ sysctl for one pod
-
-- must not have any influence on any other pod on the node,
-- must not allow harm to the node's health,
-- must not allow gaining CPU or memory resources outside of the pod's resource limits.
-
-By far, most of the _namespaced_ sysctls are not necessarily considered _safe_.
-
-As of Kubernetes 1.4, the following sysctls are supported in the _safe_ set:
-
-- `kernel.shm_rmid_forced`,
-- `net.ipv4.ip_local_port_range`,
-- `net.ipv4.tcp_syncookies`.
-
-This list will be extended in future Kubernetes versions, when the kubelet supports better isolation mechanisms.
-
-All _safe_ sysctls are enabled by default.
-
-All _unsafe_ sysctls are disabled by default and must be allowed manually by the cluster admin on a per-node basis. Pods with disabled unsafe sysctls will be scheduled, but will fail to launch.
-
-**Warning**: Due to their nature of being _unsafe_, the use of _unsafe_ sysctls is at your own risk and can lead to severe problems, such as wrong behavior of containers, resource shortage, or complete breakage of a node.
-
-## Enabling unsafe Sysctls
-
-With the warning above in mind, the cluster admin can allow certain _unsafe_ sysctls for very special situations, such as high-performance or real-time application tuning. _Unsafe_ sysctls are enabled on a node-by-node basis with a kubelet flag, e.g.:
-
-```shell
-$ kubelet --experimental-allowed-unsafe-sysctls 'kernel.msg*,net.core.somaxconn' ...
-```
-
-Only _namespaced_ sysctls can be enabled this way.
-
-## Setting Sysctls for a Pod
-
-In Kubernetes 1.4, the sysctl feature is an alpha API. Therefore, sysctls are set using annotations on pods. They apply to all containers in the same pod.
-
-Here is an example, with different annotations for _safe_ and _unsafe_ sysctls:
-
-```yaml
-apiVersion: v1
-kind: Pod
-metadata:
-  name: sysctl-example
-  annotations:
-    security.alpha.kubernetes.io/sysctls: kernel.shm_rmid_forced=1
-    security.alpha.kubernetes.io/unsafe-sysctls: net.core.somaxconn=1024,kernel.msgmax=1 2 3
-spec:
-  ...
-```
-
-**Note**: a pod with the _unsafe_ sysctls specified above will fail to launch on any node which has not enabled those two _unsafe_ sysctls explicitly. As with _node-level_ sysctls, it is recommended to use the [_taints and tolerations_ feature](/docs/user-guide/kubectl/v1.6/#taint) or [taints on nodes](/docs/concepts/configuration/taint-and-toleration/) to schedule those pods onto the right nodes.
diff --git a/content/zh/docs/concepts/configuration/commands.yaml b/content/zh/docs/concepts/configuration/commands.yaml
deleted file mode 100644
index 2327d2582745f..0000000000000
--- a/content/zh/docs/concepts/configuration/commands.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-apiVersion: v1
-kind: Pod
-metadata:
-  name: command-demo
-  labels:
-    purpose: demonstrate-command
-spec:
-  containers:
-  - name: command-demo-container
-    image: debian
-    command: ["printenv"]
-    args: ["HOSTNAME", "KUBERNETES_PORT"]
-  restartPolicy: OnFailure
diff --git a/content/zh/docs/concepts/configuration/pod-with-node-affinity.yaml b/content/zh/docs/concepts/configuration/pod-with-node-affinity.yaml
deleted file mode 100644
index 253d2b21ea917..0000000000000
--- a/content/zh/docs/concepts/configuration/pod-with-node-affinity.yaml
+++ /dev/null
@@ -1,26 +0,0 @@
-apiVersion: v1
-kind: Pod
-metadata:
-  name: with-node-affinity
-spec:
-  affinity:
-    nodeAffinity:
-      requiredDuringSchedulingIgnoredDuringExecution:
-        nodeSelectorTerms:
-        - matchExpressions:
-          - key: kubernetes.io/e2e-az-name
-            operator: In
-            values:
-            - e2e-az1
-            - e2e-az2
-      preferredDuringSchedulingIgnoredDuringExecution:
-      - weight: 1
-        preference:
-          matchExpressions:
-          - key: another-node-label-key
-            operator: In
-            values:
-            - another-node-label-value
-  containers:
-  - name: with-node-affinity
-    image: k8s.gcr.io/pause:2.0
\ No newline at end of file
diff --git a/content/zh/docs/concepts/configuration/pod-with-pod-affinity.yaml b/content/zh/docs/concepts/configuration/pod-with-pod-affinity.yaml
deleted file mode 100644
index 1897af901ff69..0000000000000
--- a/content/zh/docs/concepts/configuration/pod-with-pod-affinity.yaml
+++ /dev/null
@@ -1,29 +0,0 @@
-apiVersion: v1
-kind: Pod
-metadata:
-  name: with-pod-affinity
-spec:
-  affinity:
-    podAffinity:
-      requiredDuringSchedulingIgnoredDuringExecution:
-      - labelSelector:
-          matchExpressions:
-          - key: security
-            operator: In
-            values:
-            - S1
-        topologyKey: failure-domain.beta.kubernetes.io/zone
-    podAntiAffinity:
-      preferredDuringSchedulingIgnoredDuringExecution:
-      - weight: 100
-        podAffinityTerm:
-          labelSelector:
-            matchExpressions:
-            - key: security
-              operator: In
-              values:
-              - S2
-          topologyKey: kubernetes.io/hostname
-  containers:
-  - name: with-pod-affinity
-    image: k8s.gcr.io/pause:2.0
diff --git a/content/zh/docs/concepts/configuration/pod.yaml b/content/zh/docs/concepts/configuration/pod.yaml
deleted file mode 100644
index 134ddae2aa1c8..0000000000000
--- a/content/zh/docs/concepts/configuration/pod.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-apiVersion: v1
-kind: Pod
-metadata:
-  name: nginx
-  labels:
-    env: test
-spec:
-  containers:
-  - name: nginx
-    image: nginx
-    imagePullPolicy: IfNotPresent
-  nodeSelector:
-    disktype: ssd
diff --git a/content/zh/docs/concepts/configuration/scheduler-perf-tuning.md
b/content/zh/docs/concepts/configuration/scheduler-perf-tuning.md deleted file mode 100644 index 3ca2c81c8d46e..0000000000000 --- a/content/zh/docs/concepts/configuration/scheduler-perf-tuning.md +++ /dev/null @@ -1,175 +0,0 @@ ---- -reviewers: -- bsalamat -title: 调度器性能调优 -content_template: templates/concept -weight: 70 ---- - - - -{{% capture overview %}} - -{{< feature-state for_k8s_version="1.12" >}} - - - -Kube-scheduler 是 Kubernetes 的默认调度器。负责将 Pods 安排到集群中的节点上。 -集群中达到 Pod 调度要求的节点也被称为这个 Pod 的“可行性”节点。 -调度器首先找出 Pod 的可行性节点,然后运行一套打分函数来为可行性节点打分, -最后挑选出分数最高的可行性节点来运行 Pod。随后,调度器将这个决定通知给 API 服务器, -这个通知流程叫做“绑定”("Binding")。 - -{{% /capture %}} - -{{% capture body %}} - - -## 可行性打分的节点比例 - - - -在 Kubernetes 1.12 之前, Kube-scheduler 曾经是检查集群中所有节点的可行性,并为它们依次打分。 -而在 Kubernetes 1.12 中加入了一项新的功能,允许调度器在找到足够合适的可行性节点之后,停止搜索。 -这项功能将提高调度器在大型集群应用中的性能。这是一个比例参数,通过一个名为 `percentageOfNodesToScore` 的配置选项, -指明了集群大小中的比例。参数值范围在 1 到 100 之间。其它的数值将被认为是 100%。 -选项的默认值为 50%。集群管理员也可以在调度器的配置文件中,提供不同数值来做修改。 -但可能也并不需要修改这个值。 - -```yaml -apiVersion: componentconfig/v1alpha1 -kind: KubeSchedulerConfiguration -algorithmSource: - provider: DefaultProvider - -... - -percentageOfNodesToScore: 50 -``` - -{{< note >}} - - - -**注意**:如果集群中可行性节点的数量为 0 或者小于 50 个,调度器还是会检查所有的节点, -仅仅是因为没有足够多的可行性节点让调度器终止搜索。 - -{{< /note >}} - - - - -可以将 `percentageOfNodesToScore` 设置为 100 来**禁止这项功能**。 - - -### 调优 `percentageOfNodesToScore` - - - -`percentageOfNodesToScore` 的数值必须在 1 到 100 之间,默认值为 50。 -在内部设计里面,还存在有硬编码的至少 50 个节点的要求。无论 `percentageOfNodesToScore` 设置如何, -调度器都至少会搜索 50 个节点。换言之,在只有数百个节点的集群中调低这个参数,并不会对调度器搜索可行性节点有影响。 -这样设计是经过考虑的,因为在较小的集群中,这个参数并不会显著提升性能。 -而在超过 1000 个节点的大型集群中调低这个参数,将会有显著的性能提升。 - - - -在设置这个数值时,需要注意一点,如果集群中只有较少的一部分节点进行了可行性检查, -有些节点将不会被作可行性打分。 -因而,有运行 Pod 可行性分数更高的节点甚至可能不会到达打分阶段。这会造成一个并不十分理想的安排结果。 -正因为如此,这个数值不应该设置得非常低。 -一个重要原则就是这个数值不应该小于 30。 -更小的数值只应该在应用对调度器的吞吐量十分敏感,而节点的可行性打分相对不重要的前提下使用。 -换言之,只要节点适合运行 Pod 就可以安排到该节点上运行。 - - - -如果集群只有数百个节点,我们不建议将参数值调到比默认值低。因为这并不能显著提升调度器的性能。 - - -### 调度器是如何遍历节点的 - - - -这一节是为那些希望了解这项功能内部细节的人准备的。 - - - -为了让集群中所有的节点都有平等的机会被考虑运行 Pods,调度器需要以轮转的方式遍历所有的节点。 -你可以这样理解:所有节点都在记录在某数组之中,调度器从数组的一端开始检查节点的可行性,直到找到 `percentageOfNodesToScore` -指明的、足够多的节点。对于下一个 pod,调度器将从前一个 Pod 的结束节点开始,继续开始搜索。 - - - -如果节点在不同的区域,调度器也将遍历不同区域的所有节点,保证不同区域的节点都会被考虑在列。 -例如,如果六个节点分布在两个区域: - -``` -Zone 1: Node 1, Node 2, Node 3, Node 4 -Zone 2: Node 5, Node 6 -``` - - - -调度器将会按照下面的顺序来对所有的节点进行可行性检查: - -``` -Node 1, Node 5, Node 2, Node 6, Node 3, Node 4 -``` - - - -当遍历完所有的节点后,会重新从 Node 1 开始搜索。 - -{{% /capture %}} diff --git a/content/zh/docs/concepts/extend-kubernetes/api-extension/custom_resources.md b/content/zh/docs/concepts/extend-kubernetes/api-extension/custom_resources.md deleted file mode 100644 index bee484cb16cde..0000000000000 --- a/content/zh/docs/concepts/extend-kubernetes/api-extension/custom_resources.md +++ /dev/null @@ -1,535 +0,0 @@ ---- -title: 自定义资源 -reviewers: -- enisoc -- deads2k -content_template: templates/concept -weight: 20 ---- - - -{{% capture overview %}} - - -*自定义资源* 是 Kubernetes API 的扩展。本页讨论何时向 Kubernetes 群集添加自定义资源以及何时使用独立服务。它描述了添加自定义资源的两种方法以及如何在它们之间进行选择。 - -{{% /capture %}} - -{{% capture body %}} - -## 自定义资源 - - -*资源* 是[Kubernetes API](/docs/reference/using-api/api-overview/)中的端点,用于存储 -某种[API 对象](/docs/concepts/overview/working-with-objects/kubernetes-objects/)的集合。例如,内置 *Pod* 资源包含 Pod 对象的集合。 - - -*自定义资源* 是 Kubernetes API 的扩展,在 Kubernetes 的默认安装中不一定可用。它使 Kubernetes 具备可定制化安装的能力。但是,很多核心 Kubernetes 功能现在都使用自定义资源构建,使 Kubernetes 模块更加合理。 - - -自定义资源可以通过动态注册在运行的集群中显示或者消失,并且集群管理员可以独立于集群本身更新自定义资源。安装自定义资源后,用户可以使用 
[kubectl](/docs/user-guide/kubectl-overview/) 创建和访问其中的对象,就像对 *Pod* 等内置资源一样。
-
-
-## 自定义控制器
-
-
-自定义资源可以存储和检索结构化数据。只有将自定义资源与自定义控制器结合,自定义资源才能提供真正的 _声明式 API_。
-
-
-[声明式 API](/docs/concepts/overview/working-with-objects/kubernetes-objects/#understanding-kubernetes-objects) 允许使用者 _声明_ 或者指定所需的资源状态,并使 Kubernetes 对象的当前状态与所期望的状态保持同步。控制器将结构化数据解释为用户所期望状态的记录,并持续维护此状态。
-
-
-你可以在运行中的集群上部署或更新自定义控制器,这一操作与集群自身的生命周期无关。自定义控制器可以与任何资源一起工作,但与自定义资源相结合时它们特别有效。[Operator 模式](https://coreos.com/blog/introducing-operators.html) 就结合了自定义资源和自定义控制器。您可以使用自定义控制器将特定应用程序的领域知识编码到 Kubernetes API 的扩展中。
-
-
-## 是否需要向 Kubernetes 集群添加自定义资源?
-
-
-创建新 API 的时候,请考虑是[将 API 与 Kubernetes 集群 API 聚合](/docs/concepts/api-extension/apiserver-aggregation/),还是让你的 API 独立运行。
-
-
-| 如果属于下面情况之一,可以考虑采用 API 聚合: | 如果属于下面情况之一,首选独立 API: |
-| ------------------------------------------------------------ | ------------------------------------------------------------ |
-| API 具有[声明性](#declarative-apis)。 | API 不适合[声明性](#declarative-apis)模型。 |
-| 你希望可以使用 `kubectl` 来读写新类型。 | 不需要 `kubectl` 支持。 |
-| 你希望在 Kubernetes 用户界面(如 dashboard)中和内置类型一起看到你的新类型。 | 不需要 Kubernetes 用户界面支持。 |
-| 你正在开发新的 API。 | 你已经有一个运行良好的程序为你提供 API 服务。 |
-| 你愿意接受 Kubernetes 对 REST 资源路径的格式限制,例如 API 组和命名空间。(参照 [API 概述](/docs/concepts/overview/kubernetes-api/)。) | 你需要有一个特定的 REST 路径来兼容已经定义的 REST API。 |
-| 你的资源可以很自然地归入集群作用域或者集群中某个命名空间的作用域。 | 集群或者命名空间作用域均不合适;你需要控制资源路径的细节。 |
-| 你希望重用 [Kubernetes API 的支持功能](#common-features)。 | 你不需要这些功能。 |
-
-
-### 声明式 API {#declarative-apis}
-
-
-在声明式 API 中,通常:
-
-
- - API 由相对较少的对象(资源)组成。
- - 对象定义了应用或基础设施的配置项。
- - 对象更新相对较少。
- - 用户经常需要读取和写入对象。
- - 对象上的主要操作是 CRUD(创建、读取、更新和删除)。
- - 不需要跨对象的事务:API 表示的是期望的状态,而不是确切的状态。
-
-
-命令式 API 则不是声明式的。表明 API 可能不是声明式的信号包括:
-
-
- - 客户端说"执行此操作",然后在完成时获取同步响应。
- - 客户端说"执行此操作",然后获取一个操作 ID,必须通过检查单独的操作对象来确认请求是否完成。
- - 您是在按远程过程调用(RPC)的方式思考这个 API。
- - 直接存储大量数据(例如,每个对象超过几 kB,或者对象数量超过数千个)。
- - 需要高带宽访问(每秒持续数十个请求)。
- - 存储业务用户数据(如图像、PII 等)或者其他由应用程序处理的大规模数据。
- - 对象的自然操作不是 CRUD。
- - API 不容易建模成对象。
- - 你选择使用操作 ID 或者操作对象来表示待处理的操作。
-
-
-## 我应该使用 configMap 还是自定义资源?
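在对比这两种机制之前,先看一个最小的 ConfigMap 示例(下面的名称与键值仅为示意,并非来自某个真实应用),它把一个完整的配置文件放进 ConfigMap 的一个键中:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # 名称仅为示意
  name: game-config
data:
  # 整个配置文件的内容保存在一个键中
  game.properties: |
    enemies=aliens
    lives=3
```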
-
-
如果以下任何一项适用,请使用 ConfigMap:
-
-
-* 存在一种记录良好的配置文件格式,例如 `mysql.cnf` 或者 `pom.xml`。
-* 你想把所有的配置文件都放进 configMap 的一个键中。
-* 配置文件的作用是让运行在集群中 Pod 里的程序配置自身。
-* 文件的使用者更倾向于通过 Pod 中的文件或环境变量来读取配置,而不是使用 Kubernetes API。
-* 你希望在更新文件时通过 Deployment 等执行滚动更新。
-
-
-{{< note >}}
-对敏感数据请使用 [secret](/docs/concepts/configuration/secret/),它类似于 configMap,但更安全。
-{{< /note >}}
-
-
-如果以下大多数情况适用,请使用自定义资源(CustomResourceDefinition 或者聚合 API):
-
-
-* 您希望使用 Kubernetes 客户端库和 CLI 来创建和更新新资源。
-* 你希望从 kubectl 得到顶级支持(比如:`kubectl get my-object object-name`)。
-* 你希望构建新的自动化机制,监视新对象上的更新事件,并对其他对象执行 CRUD 操作,或者反之。
-* 你希望编写处理对象更新的自动化组件。
-* 你想要使用 Kubernetes API 约定,比如 `.spec`、`.status` 和 `.metadata`。
-* 你希望对象是对一组受控资源的抽象,或者是对其他资源的汇总。
-
-
-## 添加自定义资源
-
-
-Kubernetes 提供了两种将自定义资源添加到集群的方法:
-
-
-- CustomResourceDefinition 非常简单,无需任何编程即可创建。
-- [API 聚合](/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/)需要编程,但允许对 API 行为进行更多控制,比如数据的存储方式和 API 版本之间的转换。
-
-
-Kubernetes 提供这两个选项来满足不同使用者的需要,因此无论是易用性还是灵活性都不受影响。
-
-
-聚合 API 是位于主 API 服务器后面的从属 API 服务器,主 API 服务器充当其代理。这种安排被称作 [API 聚合](/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/)(AA)。对用户来说,看起来就像 Kubernetes API 被扩展了一样。
-
-
-CustomResourceDefinition 允许用户在不添加其他 API 服务器的情况下创建新类型的资源。您无需理解 API 聚合就可以使用 CustomResourceDefinition。
-
-
-无论以哪种方式安装,新资源都被称为自定义资源,以将其与内置的 Kubernetes 资源(如 Pod)区分开。
-
-
-## 自定义资源定义(CustomResourceDefinition) {#CustomResourceDefinition}
-
-
-[CustomResourceDefinition](/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/) API 资源允许你定义自定义资源。定义 CustomResourceDefinition 对象会创建一个新的自定义资源,该资源具有您指定的名称和模式(schema)。Kubernetes API 负责提供并处理自定义资源的存储。
-
-
-这样,你就不用为了处理自定义资源而编写自己的 API 服务器了,但其实现的通用性也意味着它比 [API 服务器聚合](#api-server-aggregation)的灵活性要低。
-
-
-有关如何注册新自定义资源、使用新资源类型的实例以及使用控制器处理事件的示例,请参阅[自定义控制器示例](https://github.com/kubernetes/sample-controller)。
-
-
-## API 服务器聚合
-
-
-通常,Kubernetes API 中的每个资源都需要代码来处理 REST 请求和管理对象的持久存储。主 Kubernetes API 服务器处理 *Pod* 和 *Service* 等内置资源,也可以通过 [CustomResourceDefinition](#customresourcedefinitions) 以通用的方式处理自定义资源。
-
-
-[聚合层](/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/)允许您通过编写和部署自己独立的 API 服务器来为自定义资源提供专用实现。
-
-
-主 API 服务器会把你所处理的自定义资源的请求委托给你的 API 服务器,并使这些资源可供其所有客户端使用。
-
-
-## 选择添加自定义资源的方法
-
-
-CustomResourceDefinition 更易于使用,API 聚合则更加灵活。请选择更能满足您需求的方法。
-
-
-通常,在以下情况中 CustomResourceDefinition 更为合适:
-
-
-* 你只有少量字段。
-* 该资源是在公司内部使用的,或者是一个小型开源项目的一部分(区别于商业产品)。
-
-
-### 比较易用性
-
-
-CustomResourceDefinition 比聚合 API 更容易创建。
-
-
-| CustomResourceDefinition(CRD) | API 聚合 |
-| ------------------------------------------------------------ | ------------------------------------------------------------ |
-| 不需要编程。用户可以为 CRD 控制器选择任何语言。 | 需要使用 Go 编程,并构建二进制文件和镜像。 |
-| 无需运行额外的服务;CR 由 API 服务器处理。 | 需要创建并运行一个额外的服务,该服务可能成为新的故障点。 |
-| 一旦 CRD 创建完成,无需持续的支持投入。任何错误修复都会作为正常的 Kubernetes Master 升级的一部分到位。 | 可能需要定期从上游获取错误修复,并重建和更新聚合 API 服务器。 |
-| 无需处理 API 的多个版本。例如:当您控制此资源的客户端时,可以将其与 API 同步升级。 | 你需要处理 API 的多个版本,例如:当开发一个要与外界共享的扩展时。 |
-
-
-### 高级功能和灵活性
-
-
-API 聚合提供更高级的 API 特性,并允许对其他功能进行定制,例如存储层。
-
-
-| 特性 | 描述 | CustomResourceDefinition | API 聚合 |
-| ------- | ----------- | ---- | -------------- |
-| 验证 | 帮助用户避免错误,并允许您独立于客户端演进 API。当存在许多无法同时更新的客户端时,这些功能非常有用。 | 是的。大多数验证可以在 CustomResourceDefinition 中使用 [OpenAPI v3.0 validation](/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/#validation) 来实现。其他验证可以通过添加[验证性 Webhook](/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook-alpha-in-1-8-beta-in-1-9)来支持。 | 是的,支持任意验证检查。 |
-| 设置默认值(Defaulting) | 见上文 | 是的,通过[变更性(Mutating)Webhook](/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook-beta-in-1-9);通过 CustomResourceDefinition 的 OpenAPI 模式提供支持已列入计划。 | 是的 |
-| 多版本 | 允许通过两个 API 版本为同一对象提供服务。可帮助简化 API 更改,如重命名字段。如果您控制客户端版本,则不太重要。 | [是的](/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definition-versioning) | 是的 |
-| 自定义存储 | 如果需要不同性能模式的存储(例如,用时间序列数据库代替键值存储),或出于安全考虑进行隔离(例如,对机密信息加密等)。 | 不是 | 是的 |
-| 定制业务逻辑 | 在创建、读取、更新或删除对象时执行任意检查或操作。 | 是的,使用 [Webhooks](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)。 | 是的 |
-| Scale 子资源 | 允许 HorizontalPodAutoscaler 和 PodDisruptionBudget 等系统与您的新资源进行交互。 | [是的](/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/#scale-subresource) | 是的 |
-| Status 子资源 | | [是的](/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/#status-subresource) | 是的 |
-| 其他子资源 | 添加 CRUD 以外的操作,如"日志"或"执行"。 | 不是 | 是的 |
-| 策略性合并补丁(strategic-merge-patch) | 新的端点支持带有 `Content-Type: application/strategic-merge-patch+json` 的 PATCH 请求,可用于更新那些既可以在本地、也可以由服务器修改的对象。更多信息,请参阅["使用 kubectl 修补程序更新 API 对象"](/docs/tasks/run-application/update-api-object-kubectl-patch/)。 | 不是 | 是的 |
-| Protocol Buffers | 新资源支持希望使用 Protocol Buffers 的客户端。 | 不是 | 是的 |
-| OpenAPI 模式 | 是否有可以从服务器动态获取的、描述类型的 OpenAPI(swagger)模式?是否通过只允许设置许可的字段来防止用户写错字段名?是否强制字段类型(换句话说,不允许在 `string` 字段中放入 `int` 值)? | 不,但已列入计划 | 是的 |
-
-
-### 公共特性
-
-
-与在 Kubernetes 平台之外实现相比,通过 CustomResourceDefinition 或 AA 创建自定义资源时,您可以获得 API 的许多功能:
-
-
-| 功能 | 作用 |
-| ------- | ------------ |
-| CRUD | 新的端点通过 HTTP 和 `kubectl` 支持基本的 CRUD 操作 |
-| Watch | 新的端点通过 HTTP 支持 Kubernetes 的监视(Watch)功能 |
-| 发现 | kubectl 和 dashboard 之类的客户端会自动提供针对资源的列表、显示和字段编辑操作 |
-| json-patch | 新的端点支持带有 `Content-Type: application/json-patch+json` 的 PATCH |
-| merge-patch | 新的端点支持带有 `Content-Type: application/merge-patch+json` 的 PATCH |
-| HTTPS | 新的端点使用 HTTPS |
-| 内置身份验证 | 对扩展的访问使用核心 API 服务器(聚合层)进行身份认证 |
-| 内置授权 | 对扩展的访问可以重用核心 API 服务器所使用的授权机制(例如,基于角色的访问控制,RBAC) |
-| 终结器 | 在外部清理完成之前,阻止扩展资源被删除 |
-| 准入 Webhooks | 在任何创建/更新/删除操作期间,为扩展资源设置默认值并执行验证 |
-| UI/CLI 显示 | kubectl 和 dashboard 可以显示扩展资源 |
-| 未设置与空 | 客户端可以区分未设置的字段和取零值的字段 |
-| 客户端库生成 | Kubernetes 提供通用客户端库,以及用于生成特定类型客户端库的工具 |
-| 标签和注解 | 核心资源和自定义资源共有的通用元数据,工具知道如何对其进行编辑 |
-
-
-## 准备安装自定义资源
-
-
-在将自定义资源添加到集群之前,有几个要点需要注意。
-
-
-### 第三方代码和新的故障点
-
-
-虽然创建 CRD 不会自动增加新的故障点(例如,它不会导致第三方代码在 API 服务器上运行),但软件包(例如 Chart)或其他安装捆绑包通常在包含 CustomResourceDefinition 的同时,还会附带包含第三方代码的 Deployment,用来实现新自定义资源的业务逻辑。
-
-
-安装新的聚合 API 服务器总是涉及运行新的 Deployment。
-
-
-### 存储
-
-自定义资源与 ConfigMap 以相同的方式消耗存储空间。创建过多的自定义资源会给 API 服务器的存储空间造成过大压力。
-
-
-聚合 API 服务器可以使用与主 API 服务器相同的存储,在这种情况下,同样的警告也适用。
-
-
-### 身份验证、授权和审计
-
-
-CustomResourceDefinition 始终使用与 API 服务器内置资源相同的身份验证、授权和审计日志机制。
-
-
-如果使用 RBAC 进行授权,大多数 RBAC(基于角色的访问控制)角色不会授予对新资源的访问权限(集群管理员角色或使用通配符规则创建的角色除外)。您需要显式地授予对新资源的访问权限。CustomResourceDefinition 和聚合 API 通常会附带针对其新增类型的新角色定义。
-
-
-聚合 API 服务器所使用的身份验证、授权和审计机制可能与主 API 服务器相同,也可能不同。
-
-
-## 访问自定义资源
-
-
-Kubernetes [客户端库](/docs/reference/using-api/client-libraries/)可用于访问自定义资源。并非所有客户端库都支持自定义资源;Go 和 Python 的客户端库是支持的。
-
-
-添加自定义资源之后,可以通过以下方式访问它:
-
-
-- kubectl
-- kubernetes 动态客户端。
-- 你编写的 REST 客户端。
-- 使用 Kubernetes [客户端生成工具](https://github.com/kubernetes/code-generator)生成的客户端(生成客户端是一项高级任务,但有些项目可能会随 CustomResourceDefinition 或聚合 API 一起提供客户端)。
-
-{{% /capture %}}
-
-{{% capture whatsnext %}}
-
-
-* 了解如何使用[聚合层扩展 Kubernetes API](/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/)。
-
-
-* 了解如何使用 [CustomResourceDefinition 扩展 Kubernetes API](/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/)。
-
-{{% /capture
%}} - diff --git a/content/zh/docs/concepts/overview/object-management-kubectl/_index.md b/content/zh/docs/concepts/overview/object-management-kubectl/_index.md deleted file mode 100644 index 3ee775f5a6d78..0000000000000 --- a/content/zh/docs/concepts/overview/object-management-kubectl/_index.md +++ /dev/null @@ -1,11 +0,0 @@ ---- -title: "用 kubectl 管理对象" -weight: 50 ---- - - diff --git a/content/zh/docs/concepts/overview/object-management-kubectl/imperative-config.md b/content/zh/docs/concepts/overview/object-management-kubectl/imperative-config.md deleted file mode 100644 index ac923edef9917..0000000000000 --- a/content/zh/docs/concepts/overview/object-management-kubectl/imperative-config.md +++ /dev/null @@ -1,262 +0,0 @@ ---- -title: 使用配置文件指令式管理 Kubernetes 对象 -content_template: templates/concept -weight: 30 ---- - - - -{{% capture overview %}} - -通过使用 `kubectl` 命令行工具和 yaml 或 json 格式编写的对象配置文件,用户可以创建、更新和删除 Kubernetes 对象。 -本文档介绍如何使用配置文件定义和管理对象。 -{{% /capture %}} - -{{% capture body %}} - - -## 取舍权衡 - - - -`kubectl` 工具支持三种对象管理: - - -* 指令性命令 -* 指令性对象配置 -* 声明式对象配置 - - -请参阅[Kubernetes 对象管理](/docs/concepts/overview/object-management-kubectl/overview/) 以了解每种对象管理的优缺点。 - - -## 怎样创建对象 - - - -可以使用 `kubectl create -f` 从配置文件创建对象。 -有关详细信息,请参阅[Kubernetes API 参考](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/)。 - -- `kubectl create -f ` - - -## 怎样更新对象 - -{{< warning >}} - -使用 `replace` 命令更新对象时,系统将会删除配置文件的 spec 中未指定的所有内容。 -对于部分上由集群来管理的对象而言,不要使用这种对象管理方式。 - -例如,对于 `LoadBalancer` 类型的服务而言,其 `externalIPs` 字段值是独立于配置文件进行管理的。 -独立管理的字段必须复制到配置文件中,以防止被 `replace` 操作删除。 -{{< /warning >}} - - -您可以使用 `kubectl replace -f` 命令基于配置文件来更新活跃状态的对象。 - -- `kubectl replace -f ` - - -## 怎样删除对象 - - - -您可以使用 `kubectl delete -f` 命令来删除配置文件中描述的对象。 - -- `kubectl delete -f ` - - -## 怎样查看对象 - - - -您可以使用 `kubectl get -f` 命令来查看配置文件中描述的对象的信息。 - -- `kubectl get -f -o yaml` - - - -指定了 `-o yaml` 参数将会打印完整的对象配置。 -使用 `kubectl get -h` 命令查看选项列表。 - - -## 限制 - - -当每个对象的配置都完整的定义和记录在它的配置文件中时,`create`、`replace` 和 `delete` 命令就能正常使用。 -然而,当一个存活态的对象被更新、并且更新的内容没有被合入该对象的配置文件时,下次再执行 `replace` 命令将导致更新的内容丢失。 -如果控制器(如 HorizontalPodAutoscaler)直接更新存活态的对象,就会发生上面的情况。 -下面是个例子: - - - -1. 从配置文件创建对象。 -1. 另外一个资源更新这个对象的一些字段。 -1. 从配置文件中替换(replace)该对象。第二步中另外的资源对该对象所做的更新将丢失。 - - -如果您需要对同一对象支持多个写者,那么可以使用 `kubectl apply` 命令管理该对象。 - - -## 通过 URL 创建和编辑对象而不保存配置 - - - -假设您知道一个对象配置文件的 URL。 -您可以在对象被创建之前使用 `kubectl create --edit` 命令来更改它的配置。 -这对于指向那些读者可修改配置文件的教程和任务特别有用。 - -```sh -kubectl create -f --edit -``` - - -## 从指令性命令迁移到指令性对象配置 - - -从指令性命令迁移到指令性对象配置包括几个手动步骤。 - - -1. 将存活态的对象导出为本地对象配置文件: -```sh -kubectl get / -o yaml --export > _.yaml -``` -1. 手动从对象配置文件中移除状态信息。 -1. 
对于后续的对象管理,只使用 `replace`。 -```sh -kubectl replace -f _.yaml -``` - - -## 定义控制器选择器和 PodTemplate 标签 - -{{< warning >}} - -强烈不建议更新控制器的选择器。 -{{< /warning >}} - - -建议的方法是定义一个单一的、不变的 PodTemplate 标签,该标签仅由控制器选择器使用,没有其他语义意义。 - - -标签示例: - -```yaml -selector: - matchLabels: - controller-selector: "extensions/v1beta1/deployment/nginx" -template: - metadata: - labels: - controller-selector: "extensions/v1beta1/deployment/nginx" -``` - -{{% /capture %}} - -{{% capture whatsnext %}} - - -- [使用指令性命令管理 Kubernetes 对象](/docs/concepts/overview/object-management-kubectl/imperative-command/) -- [使用对象配置文件(声明式)管理 Kubernetes 对象](/docs/concepts/overview/object-management-kubectl/declarative-config/) -- [Kubectl 命令参考](/docs/reference/generated/kubectl/kubectl/) -- [Kubernetes API 参考](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/) -{{% /capture %}} - - diff --git a/content/zh/docs/concepts/overview/working-with-objects/kubernetes-objects.md b/content/zh/docs/concepts/overview/working-with-objects/kubernetes-objects.md index 14c85b08629f1..df3f85658eef6 100644 --- a/content/zh/docs/concepts/overview/working-with-objects/kubernetes-objects.md +++ b/content/zh/docs/concepts/overview/working-with-objects/kubernetes-objects.md @@ -61,7 +61,7 @@ Kubernetes 系统读取 Deployment 规约,并启动我们所期望的该应用 这里有一个 `.yaml` 示例文件,展示了 Kubernetes Deployment 的必需字段和对象规约: -{{< code file="nginx-deployment.yaml" >}} +{{< codenew file="application/deployment.yaml" >}} 使用类似于上面的 `.yaml` 文件来创建 Deployment,一种方式是使用 `kubectl` 命令行接口(CLI)中的 [`kubectl create`](/docs/user-guide/kubectl/v1.7/#create) 命令,将 `.yaml` 文件作为参数。下面是一个示例: diff --git a/content/zh/docs/concepts/policy/pod-security-policy.md b/content/zh/docs/concepts/policy/pod-security-policy.md index 48c8de8fa837c..bfb25c25f0269 100644 --- a/content/zh/docs/concepts/policy/pod-security-policy.md +++ b/content/zh/docs/concepts/policy/pod-security-policy.md @@ -148,7 +148,7 @@ Pod 必须基于 PSP 验证每个字段。 下面是一个 Pod 安全策略的例子,所有字段的设置都被允许: -{{< code file="psp.yaml" >}} +{{< codenew file="policy/example-psp.yaml" >}} diff --git a/content/zh/docs/concepts/policy/psp.yaml b/content/zh/docs/concepts/policy/psp.yaml deleted file mode 100644 index 9f037f67d0d4d..0000000000000 --- a/content/zh/docs/concepts/policy/psp.yaml +++ /dev/null @@ -1,18 +0,0 @@ -apiVersion: extensions/v1beta1 -kind: PodSecurityPolicy -metadata: - name: permissive -spec: - seLinux: - rule: RunAsAny - supplementalGroups: - rule: RunAsAny - runAsUser: - rule: RunAsAny - fsGroup: - rule: RunAsAny - hostPorts: - - min: 8000 - max: 8080 - volumes: - - '*' diff --git a/content/zh/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases.md b/content/zh/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases.md index b6862ef5c690b..f53501a2c26fd 100644 --- a/content/zh/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases.md +++ b/content/zh/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases.md @@ -38,7 +38,7 @@ nginx 1/1 Running 0 13s 10.200.0.4 worker0 除了默认的样板内容,我们可以向 hosts 文件添加额外的条目,将 `foo.local`、 `bar.local` 解析为`127.0.0.1`,将 `foo.remote`、 `bar.remote` 解析为 `10.1.2.3`,我们可以在 `.spec.hostAliases` 下为 Pod 添加 HostAliases。 -{{< code file="hostaliases-pod.yaml" >}} +{{< codenew file="service/networking/hostaliases-pod.yaml" >}} hosts 文件的内容看起来类似如下这样: diff --git a/content/zh/docs/concepts/services-networking/connect-applications-service.md b/content/zh/docs/concepts/services-networking/connect-applications-service.md index 
ae8609a469c96..dc866caee99a6 100644 --- a/content/zh/docs/concepts/services-networking/connect-applications-service.md +++ b/content/zh/docs/concepts/services-networking/connect-applications-service.md @@ -35,7 +35,7 @@ Kubernetes 假设 Pod 可与其它 Pod 通信,不管它们在哪个主机上 我们在之前的示例中已经做过,然而再让我重试一次,这次聚焦在网络连接的视角。 创建一个 Nginx Pod,指示它具有一个容器端口的说明: -{{< code file="run-my-nginx.yaml" >}} +{{< codenew file="service/networking/run-my-nginx.yaml" >}} @@ -91,7 +91,7 @@ service "my-nginx" exposed 这等价于使用 `kubectl create -f` 命令创建,对应如下的 yaml 文件: -{{< code file="nginx-svc.yaml" >}} +{{< codenew file="service/networking/nginx-svc.yaml" >}} @@ -247,7 +247,7 @@ nginxsecret Opaque 2 1m 现在修改 Nginx 副本,启动一个使用在秘钥中的证书的 https 服务器和 Servcie,都暴露端口(80 和 443): -{{< code file="nginx-secure-app.yaml" >}} +{{< codenew file="service/networking/nginx-secure-app.yaml" >}} @@ -281,7 +281,7 @@ node $ curl -k https://10.244.3.5 通过创建 Service,我们连接了在证书中的 CName 与在 Service 查询时被 Pod使用的实际 DNS 名字。 让我们从一个 Pod 来测试(为了简化使用同一个秘钥,Pod 仅需要使用 nginx.crt 去访问 Service): -{{< code file="curlpod.yaml" >}} +{{< codenew file="service/networking/curlpod.yaml" >}} ```shell $ kubectl create -f ./curlpod.yaml diff --git a/content/zh/docs/concepts/services-networking/curlpod.yaml b/content/zh/docs/concepts/services-networking/curlpod.yaml deleted file mode 100644 index 0741a58e7f563..0000000000000 --- a/content/zh/docs/concepts/services-networking/curlpod.yaml +++ /dev/null @@ -1,25 +0,0 @@ -apiVersion: apps/v1beta1 -kind: Deployment -metadata: - name: curl-deployment -spec: - replicas: 1 - template: - metadata: - labels: - app: curlpod - spec: - volumes: - - name: secret-volume - secret: - secretName: nginxsecret - containers: - - name: curlpod - command: - - sh - - -c - - while true; do sleep 1; done - image: radial/busyboxplus:curl - volumeMounts: - - mountPath: /etc/nginx/ssl - name: secret-volume diff --git a/content/zh/docs/concepts/services-networking/hostaliases-pod.yaml b/content/zh/docs/concepts/services-networking/hostaliases-pod.yaml deleted file mode 100644 index aa57b9a9e5562..0000000000000 --- a/content/zh/docs/concepts/services-networking/hostaliases-pod.yaml +++ /dev/null @@ -1,21 +0,0 @@ -apiVersion: v1 -kind: Pod -metadata: - name: hostaliases-pod -spec: - hostAliases: - - ip: "127.0.0.1" - hostnames: - - "foo.local" - - "bar.local" - - ip: "10.1.2.3" - hostnames: - - "foo.remote" - - "bar.remote" - containers: - - name: cat-hosts - image: busybox - command: - - cat - args: - - "/etc/hosts" diff --git a/content/zh/docs/concepts/services-networking/ingress.yaml b/content/zh/docs/concepts/services-networking/ingress.yaml deleted file mode 100644 index 56a0d5138f4e4..0000000000000 --- a/content/zh/docs/concepts/services-networking/ingress.yaml +++ /dev/null @@ -1,9 +0,0 @@ -apiVersion: networking.k8s.io/v1beta1 -kind: Ingress -metadata: - name: test-ingress -spec: - backend: - serviceName: testsvc - servicePort: 80 - diff --git a/content/zh/docs/concepts/services-networking/nginx-secure-app.yaml b/content/zh/docs/concepts/services-networking/nginx-secure-app.yaml deleted file mode 100644 index ec180a18df3d3..0000000000000 --- a/content/zh/docs/concepts/services-networking/nginx-secure-app.yaml +++ /dev/null @@ -1,46 +0,0 @@ -apiVersion: v1 -kind: Service -metadata: - name: my-nginx - labels: - run: my-nginx -spec: - type: NodePort - ports: - - port: 8080 - targetPort: 80 - protocol: TCP - name: http - - port: 443 - protocol: TCP - name: https - selector: - run: my-nginx ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - name: my-nginx -spec: - 
selector: - matchLabels: - run: my-nginx - replicas: 1 - template: - metadata: - labels: - run: my-nginx - spec: - volumes: - - name: secret-volume - secret: - secretName: nginxsecret - containers: - - name: nginxhttps - image: bprashanth/nginxhttps:1.0 - ports: - - containerPort: 443 - - containerPort: 80 - volumeMounts: - - mountPath: /etc/nginx/ssl - name: secret-volume diff --git a/content/zh/docs/concepts/services-networking/nginx-svc.yaml b/content/zh/docs/concepts/services-networking/nginx-svc.yaml deleted file mode 100644 index 12fcd5d0bfe2b..0000000000000 --- a/content/zh/docs/concepts/services-networking/nginx-svc.yaml +++ /dev/null @@ -1,12 +0,0 @@ -apiVersion: v1 -kind: Service -metadata: - name: my-nginx - labels: - run: my-nginx -spec: - ports: - - port: 80 - protocol: TCP - selector: - run: my-nginx diff --git a/content/zh/docs/concepts/services-networking/run-my-nginx.yaml b/content/zh/docs/concepts/services-networking/run-my-nginx.yaml deleted file mode 100644 index 76a879f5c4c24..0000000000000 --- a/content/zh/docs/concepts/services-networking/run-my-nginx.yaml +++ /dev/null @@ -1,20 +0,0 @@ -apiVersion: apps/v1 -kind: Deployment -metadata: - name: my-nginx -spec: - selector: - matchLabels: - run: my-nginx - replicas: 2 - template: - metadata: - labels: - run: my-nginx - spec: - containers: - - name: my-nginx - image: nginx - ports: - - containerPort: 80 - diff --git a/content/zh/docs/concepts/workloads/controllers/daemonset.yaml b/content/zh/docs/concepts/workloads/controllers/daemonset.yaml deleted file mode 100644 index 529463721e8ea..0000000000000 --- a/content/zh/docs/concepts/workloads/controllers/daemonset.yaml +++ /dev/null @@ -1,36 +0,0 @@ -apiVersion: apps/v1 -kind: DaemonSet -metadata: - name: fluentd-elasticsearch - namespace: kube-system - labels: - k8s-app: fluentd-logging -spec: - template: - metadata: - labels: - name: fluentd-elasticsearch - spec: - containers: - - name: fluentd-elasticsearch - image: gcr.io/fluentd-elasticsearch/fluentd:v2.5.1 - resources: - limits: - memory: 200Mi - requests: - cpu: 100m - memory: 200Mi - volumeMounts: - - name: varlog - mountPath: /var/log - - name: varlibdockercontainers - mountPath: /var/lib/docker/containers - readOnly: true - terminationGracePeriodSeconds: 30 - volumes: - - name: varlog - hostPath: - path: /var/log - - name: varlibdockercontainers - hostPath: - path: /var/lib/docker/containers diff --git a/content/zh/docs/concepts/workloads/controllers/deployment.md b/content/zh/docs/concepts/workloads/controllers/deployment.md index f9ba89cd27a6b..05989741d9112 100644 --- a/content/zh/docs/concepts/workloads/controllers/deployment.md +++ b/content/zh/docs/concepts/workloads/controllers/deployment.md @@ -25,20 +25,50 @@ You should not manage ReplicaSets owned by a Deployment. All the use cases shoul The following are typical use cases for Deployments: -* [Create a Deployment to rollout a ReplicaSet](#creating-a-deployment). The ReplicaSet creates Pods in the background. Check the status of the rollout to see if it succeeds or not. -* [Declare the new state of the Pods](#updating-a-deployment) by updating the PodTemplateSpec of the Deployment. A new ReplicaSet is created and the Deployment manages moving the Pods from the old ReplicaSet to the new one at a controlled rate. Each new ReplicaSet updates the revision of the Deployment. -* [Rollback to an earlier Deployment revision](#rolling-back-a-deployment) if the current state of the Deployment is not stable. Each rollback updates the revision of the Deployment. 
-* [Scale up the Deployment to facilitate more load.](#scaling-a-deployment) -* [Pause the Deployment](#pausing-and-resuming-a-deployment) to apply multiple fixes to its PodTemplateSpec and then resume it to start a new rollout. -* [Use the status of the Deployment](#deployment-status) as an indicator that a rollout has stuck -* [Clean up older ReplicaSets](#clean-up-policy) that you don't need anymore +- [Use Case](#use-case) +- [Creating a Deployment](#creating-a-deployment) + - [Pod-template-hash label](#pod-template-hash-label) +- [Updating a Deployment](#updating-a-deployment) + - [Rollover (aka multiple updates in-flight)](#rollover-aka-multiple-updates-in-flight) + - [Label selector updates](#label-selector-updates) +- [Rolling Back a Deployment](#rolling-back-a-deployment) + - [Checking Rollout History of a Deployment](#checking-rollout-history-of-a-deployment) + - [Rolling Back to a Previous Revision](#rolling-back-to-a-previous-revision) +- [Scaling a Deployment](#scaling-a-deployment) + - [Proportional scaling](#proportional-scaling) +- [Pausing and Resuming a Deployment](#pausing-and-resuming-a-deployment) +- [Deployment status](#deployment-status) + - [Progressing Deployment](#progressing-deployment) + - [Complete Deployment](#complete-deployment) + - [Failed Deployment](#failed-deployment) + - [Operating on a failed deployment](#operating-on-a-failed-deployment) +- [Clean up Policy](#clean-up-policy) +- [Use Cases](#use-cases) + - [Canary Deployment](#canary-deployment) +- [Writing a Deployment Spec](#writing-a-deployment-spec) + - [Pod Template](#pod-template) + - [Replicas](#replicas) + - [Selector](#selector) + - [Strategy](#strategy) + - [Recreate Deployment](#recreate-deployment) + - [Rolling Update Deployment](#rolling-update-deployment) + - [Max Unavailable](#max-unavailable) + - [Max Surge](#max-surge) + - [Progress Deadline Seconds](#progress-deadline-seconds) + - [Min Ready Seconds](#min-ready-seconds) + - [Rollback To](#rollback-to) + - [Revision](#revision) + - [Revision History Limit](#revision-history-limit) + - [Paused](#paused) +- [Alternative to Deployments](#alternative-to-deployments) + - [kubectl rolling update](#kubectl-rolling-update) ## Creating a Deployment Here is an example Deployment. It creates a ReplicaSet to bring up three nginx Pods. -{{< code file="nginx-deployment.yaml" >}} +{{< codenew file="controllers/nginx-deployment.yaml" >}} Run the example by downloading the example file and then running this command: diff --git a/content/zh/docs/concepts/workloads/controllers/frontend.yaml b/content/zh/docs/concepts/workloads/controllers/frontend.yaml deleted file mode 100644 index 2a2b4d13b9e2b..0000000000000 --- a/content/zh/docs/concepts/workloads/controllers/frontend.yaml +++ /dev/null @@ -1,45 +0,0 @@ -apiVersion: extensions/v1beta1 -kind: ReplicaSet -metadata: - name: frontend - # these labels can be applied automatically - # from the labels in the pod template if not set - # labels: - # app: guestbook - # tier: frontend -spec: - # this replicas value is default - # modify it according to your case - replicas: 3 - # selector can be applied automatically - # from the labels in the pod template if not set, - # but we are specifying the selector here to - # demonstrate its usage. 
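-# (说明)下面的 selector 同时使用了 matchLabels 和 matchExpressions,
-# 两者给出的条件会以逻辑与(AND)的方式同时生效。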
- selector: - matchLabels: - tier: frontend - matchExpressions: - - {key: tier, operator: In, values: [frontend]} - template: - metadata: - labels: - app: guestbook - tier: frontend - spec: - containers: - - name: php-redis - image: gcr.io/google_samples/gb-frontend:v3 - resources: - requests: - cpu: 100m - memory: 100Mi - env: - - name: GET_HOSTS_FROM - value: dns - # If your cluster config does not include a dns service, then to - # instead access environment variables to find service host - # info, comment out the 'value: dns' line above, and uncomment the - # line below. - # value: env - ports: - - containerPort: 80 diff --git a/content/zh/docs/concepts/workloads/controllers/garbage-collection.md b/content/zh/docs/concepts/workloads/controllers/garbage-collection.md index 24ee9bf60aa98..5bd78ed6bfb62 100644 --- a/content/zh/docs/concepts/workloads/controllers/garbage-collection.md +++ b/content/zh/docs/concepts/workloads/controllers/garbage-collection.md @@ -39,14 +39,14 @@ Kubernetes 垃圾收集器的角色是删除指定的对象,这些对象曾经 这里有一个配置文件,表示一个具有 3 个 Pod 的 ReplicaSet: -{{< code file="my-repset.yaml" >}} +{{< codenew file="controllers/replicaset.yaml" >}} 如果创建该 ReplicaSet,然后查看 Pod 的 metadata 字段,能够看到 OwnerReferences 字段: ```shell -kubectl create -f https://k8s.io/docs/concepts/controllers/my-repset.yaml +kubectl apply -f https://k8s.io/examples/controllers/replicaset.yaml kubectl get pods --output=yaml ``` diff --git a/content/zh/docs/concepts/workloads/controllers/hpa-rs.yaml b/content/zh/docs/concepts/workloads/controllers/hpa-rs.yaml deleted file mode 100644 index a8388530dcba1..0000000000000 --- a/content/zh/docs/concepts/workloads/controllers/hpa-rs.yaml +++ /dev/null @@ -1,11 +0,0 @@ -apiVersion: autoscaling/v1 -kind: HorizontalPodAutoscaler -metadata: - name: frontend-scaler -spec: - scaleTargetRef: - kind: ReplicaSet - name: frontend - minReplicas: 3 - maxReplicas: 10 - targetCPUUtilizationPercentage: 50 diff --git a/content/zh/docs/concepts/workloads/controllers/job.yaml b/content/zh/docs/concepts/workloads/controllers/job.yaml deleted file mode 100644 index ece4512a8acfc..0000000000000 --- a/content/zh/docs/concepts/workloads/controllers/job.yaml +++ /dev/null @@ -1,15 +0,0 @@ -apiVersion: batch/v1 -kind: Job -metadata: - name: pi -spec: - template: - metadata: - name: pi - spec: - containers: - - name: pi - image: perl - command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"] - restartPolicy: Never - diff --git a/content/zh/docs/concepts/workloads/controllers/my-repset.yaml b/content/zh/docs/concepts/workloads/controllers/my-repset.yaml deleted file mode 100644 index 54befd8f9d261..0000000000000 --- a/content/zh/docs/concepts/workloads/controllers/my-repset.yaml +++ /dev/null @@ -1,17 +0,0 @@ -apiVersion: extensions/v1beta1 -kind: ReplicaSet -metadata: - name: my-repset -spec: - replicas: 3 - selector: - matchLabels: - pod-is-for: garbage-collection-example - template: - metadata: - labels: - pod-is-for: garbage-collection-example - spec: - containers: - - name: nginx - image: nginx diff --git a/content/zh/docs/concepts/workloads/controllers/nginx-deployment.yaml b/content/zh/docs/concepts/workloads/controllers/nginx-deployment.yaml deleted file mode 100644 index 4ce71688f713b..0000000000000 --- a/content/zh/docs/concepts/workloads/controllers/nginx-deployment.yaml +++ /dev/null @@ -1,16 +0,0 @@ -apiVersion: apps/v1beta1 # for versions before 1.6.0 use extensions/v1beta1 -kind: Deployment -metadata: - name: nginx-deployment -spec: - replicas: 3 - template: - metadata: - labels: - app: nginx - 
spec: - containers: - - name: nginx - image: nginx:1.7.9 - ports: - - containerPort: 80 diff --git a/content/zh/docs/contribute/generate-ref-docs/federation-api.md b/content/zh/docs/contribute/generate-ref-docs/federation-api.md deleted file mode 100644 index c4023ab8a4848..0000000000000 --- a/content/zh/docs/contribute/generate-ref-docs/federation-api.md +++ /dev/null @@ -1,152 +0,0 @@ ---- -title: 为 Kubernetes 联邦 API 生成参考文档 -content_template: templates/task ---- - - - -{{% capture overview %}} - - - -本节介绍如何为 Kubernetes 联邦 API 自动生成参考文档。 - -{{% /capture %}} - - -{{% capture prerequisites %}} - - - -* 你需要安装 [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)。 - - - -* 你需要安装 1.9.1 或更高版本的 [Golang](https://golang.org/doc/install),并在环境变量中设置你的 `$GOPATH`。 - - - -* 你需要安装 [Docker](https://docs.docker.com/engine/installation/)。 - - - -* 你需要知道如何在一个 GitHub 项目仓库中创建一个 PR。一般来说,这涉及到创建仓库的一个分支。想了解更多信息,请参见[创建一个文档 PR](/docs/home/contribute/create-pull-request/)。 - -{{% /capture %}} - - -{{% capture steps %}} - - - -## 运行 update-federation-api-docs.sh 脚本 - - - -如果你还没有 Kubernetes 联邦的源码,现在下载: - -```shell -mkdir $GOPATH/src -cd $GOPATH/src -go get github.com/kubernetes/federation -``` - - - -确定本地 [kubernetes/federation](https://github.com/kubernetes/federation) 仓库的主目录。 -例如,如果按照前面的步骤获取联邦的源码,则主目录是 `$GOPATH/src/github.com/kubernetes/federation`。 -下文将该目录称为 ``。 - - - -运行文档生成脚本: - -```shell -cd -hack/update-federation-api-reference-docs.sh -``` - - - -脚本运行 [k8s.gcr.io/gen-swagger-docs](https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/gen-swagger-docs?gcrImageListquery=%255B%255D&gcrImageListpage=%257B%2522t%2522%253A%2522%2522%252C%2522i%2522%253A0%257D&gcrImageListsize=50&gcrImageListsort=%255B%257B%2522p%2522%253A%2522uploaded%2522%252C%2522s%2522%253Afalse%257D%255D) 镜像来生成以下参考文档: - -* /docs/api-reference/extensions/v1beta1/operations.html -* /docs/api-reference/extensions/v1beta1/definitions.html -* /docs/api-reference/v1/operations.html -* /docs/api-reference/v1/definitions.html - - - -生成的文件不会被自动发布。你必须手工将它们复制到 [kubernetes/website](https://github.com/kubernetes/website/tree/master/content/en/docs/reference/generated) 仓库。 - - - -以下文件发布在 [kubernetes.io/docs/reference](/docs/reference/): - -* [Federation API v1 Operations](/docs/reference/federation/v1/operations/) -* [Federation API v1 Definitions](/docs/reference/federation/v1/definitions/) -* [Federation API extensions/v1beta1 Operations](/docs/reference/federation/extensions/v1beta1/operations/) -* [Federation API extensions/v1beta1 Definitions](/docs/reference/federation/extensions/v1beta1/definitions/) - -{{% /capture %}} - -{{% capture whatsnext %}} - - - -* [为 Kubernetes API 生成参考文档](/docs/home/contribute/generated-reference/kubernetes-api/) -* [为 kubectl 命令集生成参考文档](/docs/home/contribute/generated-reference/kubectl/) -* [为 Kubernetes 组件和工具生成参考页](/docs/home/contribute/generated-reference/kubernetes-components/) - -{{% /capture %}} diff --git a/content/zh/docs/contribute/style/hugo-shortcodes/podtemplate.json b/content/zh/docs/contribute/style/hugo-shortcodes/podtemplate.json new file mode 100644 index 0000000000000..bd4327414a10a --- /dev/null +++ b/content/zh/docs/contribute/style/hugo-shortcodes/podtemplate.json @@ -0,0 +1,22 @@ + { + "apiVersion": "v1", + "kind": "PodTemplate", + "metadata": { + "name": "nginx" + }, + "template": { + "metadata": { + "labels": { + "name": "nginx" + }, + "generateName": "nginx-" + }, + "spec": { + "containers": [{ + "name": "nginx", + "image": "dockerfile/nginx", + "ports": [{"containerPort": 
80}] + }] + } + } + } diff --git a/content/zh/docs/doc-contributor-tools/snippets/atom-snippets.cson b/content/zh/docs/doc-contributor-tools/snippets/atom-snippets.cson new file mode 100644 index 0000000000000..878ccc4ed73fc --- /dev/null +++ b/content/zh/docs/doc-contributor-tools/snippets/atom-snippets.cson @@ -0,0 +1,226 @@ +# Your snippets +# +# Atom snippets allow you to enter a simple prefix in the editor and hit tab to +# expand the prefix into a larger code block with templated values. +# +# You can create a new snippet in this file by typing "snip" and then hitting +# tab. +# +# An example CoffeeScript snippet to expand log to console.log: +# +# '.source.coffee': +# 'Console log': +# 'prefix': 'log' +# 'body': 'console.log $1' +# +# Each scope (e.g. '.source.coffee' above) can only be declared once. +# +# This file uses CoffeeScript Object Notation (CSON). +# If you are unfamiliar with CSON, you can read more about it in the +# Atom Flight Manual: +# http://flight-manual.atom.io/using-atom/sections/basic-customization/#_cson + +'.source.gfm': + + # Capture variables for concept template + # For full concept template see 'newconcept' below + 'Insert concept template': + 'prefix': 'ctemplate' + 'body': 'content_template: templates/concept' + 'Insert concept overview': + 'prefix': 'coverview' + 'body': '{{% capture overview %}}' + 'Insert concept body': + 'prefix': 'cbody' + 'body': '{{% capture body %}}' + 'Insert concept whatsnext': + 'prefix': 'cnext' + 'body': '{{% capture whatsnext %}}' + + + # Capture variables for task template + # For full task template see 'newtask' below + 'Insert task template': + 'prefix': 'ttemplate' + 'body': 'content_template: templates/task' + 'Insert task overview': + 'prefix': 'toverview' + 'body': '{{% capture overview %}}' + 'Insert task prerequisites': + 'prefix': 'tprereq' + 'body': """ + {{% capture prerequisites %}} + + {{< include "task-tutorial-prereqs.md" >}} {{< version-check >}} + + {{% /capture %}} + """ + 'Insert task steps': + 'prefix': 'tsteps' + 'body': '{{% capture steps %}}' + 'Insert task discussion': + 'prefix': 'tdiscuss' + 'body': '{{% capture discussion %}}' + + + # Capture variables for tutorial template + # For full tutorial template see 'newtutorial' below + 'Insert tutorial template': + 'prefix': 'tutemplate' + 'body': 'content_template: templates/tutorial' + 'Insert tutorial overview': + 'prefix': 'tuoverview' + 'body': '{{% capture overview %}}' + 'Insert tutorial prerequisites': + 'prefix': 'tuprereq' + 'body': '{{% capture prerequisites %}}' + 'Insert tutorial objectives': + 'prefix': 'tuobjectives' + 'body': '{{% capture objectives %}}' + 'Insert tutorial lesson content': + 'prefix': 'tulesson' + 'body': '{{% capture lessoncontent %}}' + 'Insert tutorial whatsnext': + 'prefix': 'tunext' + 'body': '{{% capture whatsnext %}}' + 'Close capture': + 'prefix': 'ccapture' + 'body': '{{% /capture %}}' + 'Insert note': + 'prefix': 'anote' + 'body': """ + {{< note >}} + $1 + {{< /note >}} + """ + + # Admonitions + 'Insert caution': + 'prefix': 'acaution' + 'body': """ + {{< caution >}} + $1 + {{< /caution >}} + """ + 'Insert warning': + 'prefix': 'awarning' + 'body': """ + {{< warning >}} + $1 + {{< /warning >}} + """ + + # Misc one-liners + 'Insert TOC': + 'prefix': 'toc' + 'body': '{{< toc >}}' + 'Insert code from file': + 'prefix': 'codefile' + 'body': '{{< codenew file="$1" >}}' + 'Insert feature state': + 'prefix': 'fstate' + 'body': '{{< feature-state for_k8s_version="$1" state="$2" >}}' + 'Insert figure': + 'prefix': 
'fig' + 'body': '{{< figure src="$1" title="$2" alt="$3" caption="$4" >}}' + 'Insert Youtube link': + 'prefix': 'yt' + 'body': '{{< youtube $1 >}}' + + + # Full concept template + 'Create new concept': + 'prefix': 'newconcept' + 'body': """ + --- + reviewers: + - ${1:"github-id-or-group"} + title: ${2:"topic-title"} + content_template: templates/concept + --- + {{% capture overview %}} + ${3:"overview-content"} + {{% /capture %}} + + {{< toc >}} + + {{% capture body %}} + ${4:"h2-heading-per-subtopic"} + {{% /capture %}} + + {{% capture whatsnext %}} + ${5:"next-steps-or-delete"} + {{% /capture %}} + """ + + + # Full task template + 'Create new task': + 'prefix': 'newtask' + 'body': """ + --- + reviewers: + - ${1:"github-id-or-group"} + title: ${2:"topic-title"} + content_template: templates/task + --- + {{% capture overview %}} + ${3:"overview-content"} + {{% /capture %}} + + {{< toc >}} + + {{% capture prerequisites %}} + + {{< include "task-tutorial-prereqs.md" >}} {{< version-check >}} + + ${4:"additional-prereqs-or-delete"} + + {{% /capture %}} + + {{% capture steps %}} + ${5:"h2-heading-per-step"} + {{% /capture %}} + + {{% capture discussion %}} + ${6:"task-discussion-or-delete"} + {{% /capture %}} + """ + + # Full tutorial template + 'Create new tutorial': + 'prefix': 'newtutorial' + 'body': """ + --- + reviewers: + - ${1:"github-id-or-group"} + title: ${2:"topic-title"} + content_template: templates/tutorial + --- + {{% capture overview %}} + ${3:"overview-content"} + {{% /capture %}} + + {{< toc >}} + + {{% capture prerequisites %}} + + {{< include "task-tutorial-prereqs.md" >}} {{< version-check >}} + + ${4:"additional-prereqs-or-delete"} + + {{% /capture %}} + + {{% capture objectives %}} + ${5:"tutorial-objectives"} + {{% /capture %}} + + {{% capture lessoncontent %}} + ${6:"lesson-content"} + {{% /capture %}} + + {{% capture whatsnext %}} + ${7:"next-steps-or-delete"} + {{% /capture %}} + """ + diff --git a/content/zh/docs/getting-started-guides/_index.md b/content/zh/docs/getting-started-guides/_index.md deleted file mode 100755 index e2e9a092eaab9..0000000000000 --- a/content/zh/docs/getting-started-guides/_index.md +++ /dev/null @@ -1,11 +0,0 @@ ---- -title: "独立解决方案" -weight: 50 ---- - - diff --git a/content/zh/docs/getting-started-guides/alternatives.md b/content/zh/docs/getting-started-guides/alternatives.md deleted file mode 100644 index 36b4755d67f9e..0000000000000 --- a/content/zh/docs/getting-started-guides/alternatives.md +++ /dev/null @@ -1,22 +0,0 @@ ---- -title: 弃用的替代品 ---- - - -# *停止。这些指南已被[Minikube](../minikube/)所取代,这里列出它们只是为了保持完整性。* - - -* [使用 Vagrant](https://git.k8s.io/community/contributors/devel/vagrant.md) -* *更高级的:* [直接使用 Kubernetes 原始二进制程序(仅限 Linux 系统)](https://git.k8s.io/community/contributors/devel/running-locally.md) - \ No newline at end of file diff --git a/content/zh/docs/getting-started-guides/fedora/_index.md b/content/zh/docs/getting-started-guides/fedora/_index.md deleted file mode 100644 index b7c00297f12ab..0000000000000 --- a/content/zh/docs/getting-started-guides/fedora/_index.md +++ /dev/null @@ -1,11 +0,0 @@ ---- -title: "裸金属" -weight: 60 ---- - - \ No newline at end of file diff --git a/content/zh/docs/getting-started-guides/fedora/fedora_manual_config.md b/content/zh/docs/getting-started-guides/fedora/fedora_manual_config.md deleted file mode 100644 index 596c87f6838ae..0000000000000 --- a/content/zh/docs/getting-started-guides/fedora/fedora_manual_config.md +++ /dev/null @@ -1,345 +0,0 @@ ---- -reviewers: -- aveshagarwal -- eparis -- 
thockin -title: Fedora (单节点) ---- - - - -{{< toc >}} - - - -## 前提条件 - - - -1. 您需要两台或更多机器安装 Fedora。这些机器可以是裸机,也可以是虚拟机。 - -## 说明 - -这是 Fedora 的入门指南。配置手工打造,因而需要了解所有底层软件包/服务/端口等等。 - -本指南只能使一个节点(以前的 minion)工作。多个节点需要在 Kubernetes 之外完成功能性网络配置。尽管额外的 Kubernetes 配置需求是显而易见的。 - -Kubernetes 包提供了一些服务:kube-apiserver、kube-scheduler、kube-control -manager、kubelet、kube-proxy。这些服务由 systemd 管理,配置位于中心位置:`/etc/kubernetes`。 -我们将打破主机之间的服务。第一个主机,fed-master,将是 Kubernetes 主节点。该主节点将运行 kube-apiserver、kube-control-manager 和 kube-scheduler。 -此外,主服务器还将运行 *etcd* (如果 *etcd* 运行在不同的主机上就不需要了,但是本指南假设 *etcd* 和 Kubernetes 主服务器在同一主机上运行)。剩下的主机,fed-node 将是节点并运行 kubelet、proxy 和 docker。 - - - - -**系统信息:** - -主机: - -```conf -fed-master = 192.168.121.9 -fed-node = 192.168.121.65 -``` - - - -**准备主机:** - -* 在所有主机(fed-{master,node})上安装 Kubernetes 。这同时也会安装 docker。接着在 fed-master 上安装 etcd。本指南已经通过 Kubernetes-0.18 及更高版本的测试。 -* 在使用 RHEL 7.2 的 AWS EC2 上运行时,您需要通过编辑 `/etc/yum.repos.d/redhat-rhui.repo` 和更改 `enable=0to` 为 `enable=1` 来为 yum 启用 “extras” 仓库。 - -```shell -dnf -y install kubernetes -``` - - - -* 安装 etcd - -```shell -dnf -y install etcd -``` - - -* 将主机和节点添加到所有机器上的 `/etc/hosts` (如果主机名已经在 DNS 中,则不需要)。通过使用 ping 等实用程序,确保 fed-master 和 fed-node 之间的通信工作正常。 - - -```shell -echo "192.168.121.9 fed-master -192.168.121.65 fed-node" >> /etc/hosts -``` - - - -* 编辑 `/etc/kubernetes/config` (在所有主机上应该是相同的)来设置主服务器的名称: - -```shell -# 逗号分隔的 etcd 群集中的节点列表 -KUBE_MASTER="--master=http://fed-master:8080" -``` - - - -* 禁用主节点和子节点上的防火墙,因为 Docker 与其他防火墙规则管理器不兼容。请注意,默认的 Fedora Server 安装中不存在 iptables.service。 - -```shell -systemctl mask firewalld.service -systemctl stop firewalld.service - -systemctl disable iptables.service -systemctl stop iptables.service -``` - - - -**在主服务器上配置 Kubernetes 服务。** - -* 编辑 `/etc/kubernetes/apiserver`,包含以下内容。`service-cluster-ip-range` 的 IP 地址必须是未使用的地址块,同时也不能在其他任何地方使用。它们不需要路由或分配给任何东西。 - - - -```shell -# 本地服务器上所要监听的地址。 -KUBE_API_ADDRESS="--address=0.0.0.0" - -# 逗号在 ETCD 集群分离节点列表 -KUBE_ETCD_SERVERS="--etcd-servers=http://127.0.0.1:2379" - -# 地址范围内使用的服务 -KUBE_SERVICE_ADDRESSES="--service-cluster-ip-range=10.254.0.0/16" - -# 添加你自己的! 
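-# 例如(仅为示意,并非本指南的必需参数):
-# KUBE_API_ARGS="--service-node-port-range=30000-32767"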
-KUBE_API_ARGS="" -``` - - - -* 编辑 `/etc/etcd/etcd.conf` 让 etcd 监听所有可用的 IP 地址,而不仅仅是 127.0.0.1。如果没有这样做,您可能会看到一个错误,例如 "connection refused"。 - -```shell -ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379" -``` - - - -* 在主节点上启动适当的服务: - -```shell -for SERVICES in etcd kube-apiserver kube-controller-manager kube-scheduler; do - systemctl restart $SERVICES - systemctl enable $SERVICES - systemctl status $SERVICES -done -``` - - - -**在节点上配置 Kubernetes 服务** - -***我们需要在节点上配置 kubelet。*** - -* 编辑 `/etc/kubernetes/kubelet`,加入以下内容: - - - - - -```shell -### -# Kubernetes kubelet(节点)的配置 - -# info 服务器要服务的地址(设置为 0.0.0.0 或 "" 用于所有接口) -KUBELET_ADDRESS="--address=0.0.0.0" - -# 可以留空,使用实际主机名 -KUBELET_HOSTNAME="--hostname-override=fed-node" - -# api-server 的位置 -KUBELET_ARGS="--cgroup-driver=systemd --kubeconfig=/etc/kubernetes/master-kubeconfig.yaml" - -``` - - - -* 编辑 `/etc/kubernetes/master-kubeconfig.yaml` 文件,添加以下信息: - -```yaml -kind: Config -clusters: -- name: local - cluster: - server: http://fed-master:8080 -users: -- name: kubelet -contexts: -- context: - cluster: local - user: kubelet - name: kubelet-context -current-context: kubelet-context -``` - - - -* 在节点(fed-node)上启动适当的服务。 - -```shell -for SERVICES in kube-proxy kubelet docker; do - systemctl restart $SERVICES - systemctl enable $SERVICES - systemctl status $SERVICES -done -``` - - -* 检查以确保集群在 fed-master 上可以看到 fed-node,并且它的状态更改为 _Ready_。 - -```shell -kubectl get nodes -NAME STATUS AGE VERSION -fed-node Ready 4h -``` - - - -* 删除节点: - -要从 Kubernetes 集群中删除 _fed-node_,应该在 fed-master 上运行以下命令(这只是演示用): - - -```shell -kubectl delete -f ./node.json -``` - - - -*到此为止!* - -**集群应该正在运行!创建测试 pod。** - -## 支持级别 - - - -IaaS 供应商 | 配置管理 | 操作系统| 网络 | 文档 | 合规 | 支持级别 - --------------------- | ------------ | ------ | ---------- | --------------------------------------------- | ---------| ---------------------------- -Bare-metal | custom | Fedora | _none_ | [文档](/docs/getting-started-guides/fedora/fedora_manual_config) | | 项目 - -有关所有解决方案的支持级别信息,请参见[解决方案表](/docs/getting-started-guides/#table-of-solutions)。 - - diff --git a/content/zh/docs/getting-started-guides/fedora/flannel_multi_node_cluster.md b/content/zh/docs/getting-started-guides/fedora/flannel_multi_node_cluster.md deleted file mode 100644 index 634167ccc4c21..0000000000000 --- a/content/zh/docs/getting-started-guides/fedora/flannel_multi_node_cluster.md +++ /dev/null @@ -1,332 +0,0 @@ ---- -reviewers: -- dchen1107 -- erictune -- thockin -title: Fedora (多节点) ---- - - - -{{< toc >}} - - -本文档描述了如何在多个主机上部署 Kubernetes 来建立一个多节点集群和 flannel 网络。遵循 fedora 入门指南设置 1 个主节点 (fed-master) - 和 2 个或更多节点。确保所有节点具有不同的名称(fed-node1、fed-node2 等等)和标签(fed-node1-label、fed-node2-label 等等),以避免 -任何冲突。还要确保 Kubernetes 主节点主机正在运行 etcd、kube-controller-manager、kube-scheduler 和 kube-apiserver 服务,节点正在 - 运行 docker、kube-proxy 和 kubelet 服务。现在在 Kubernetes 节点上安装 flannel。每个节点上的 flannel 配置 docker 使用的 overlay 网络。 - Flannel 在每个节点上运行,以设置一个唯一的 class-C 容器网络。 - - - -## 前提条件 - - -安装 Fedora 您需要两台或更多机器。 - - - -## 主节点设置 - - - -**在 Kubernetes 主节点上执行以下命令** - -* 在您当前的目录上的 fed-master 中通过创建一个 `flannel-config.json` 来配置 flannel。Flannel 在其他 overlay 网络后端选项中提供 udp 和 vxlan 。在本指南中,我们选择基于内核的 vxlan 后端。json 的内容为: - -```json -{ - "Network": "18.16.0.0/16", - "SubnetLen": 24, - "Backend": { - "Type": "vxlan", - "VNI": 1 - } -} -``` - -{{< note >}} - - -选择一个不在公共 IP 地址范围内的 IP 范围。 - -{{< /note >}} - - -将配置添加到 fed-master 上的 etcd 服务器。 - -```shell -etcdctl set /coreos.com/network/config < flannel-config.json -``` - - - -* 验证 fed-master 上的 etcd 服务器中是否存在该密钥。 - -```shell -etcdctl get 
/coreos.com/network/config -``` - - - -## 节点设置 - - - -**在所有 Kubernetes 节点上执行以下命令** - - - -安装 flannel 包 - -```shell -# dnf -y install flannel -``` - - - -编辑 flannel 配置文件 /etc/sysconfig/flanneld,如下所示: - - - -```shell -# Flanneld 配置选项 - -# etcd url 位置,将此指向 etcd 运行的服务器 -FLANNEL_ETCD="http://fed-master:2379" - -# etcd 配置的键。这是 flannel 查询的配置键 -# 用于地址范围分配 -FLANNEL_ETCD_KEY="/coreos.com/network" - -# 您想要传递的任何附加选项 -FLANNEL_OPTIONS="" -``` - -{{< note >}} - - -默认情况下,flannel 使用默认路由的接口。如果您有多个接口并且想要使用默认路由以外的接口,则可以将 "-iface=" 添加到 FLANNEL_OPTIONS。有关其他选项,请在命令行上运行 `flanneld --help`。 - -{{< /note >}} - - -启用 flannel 服务。 - -```shell -systemctl enable flanneld -``` - - -如果 docker 没有运行,那么启动 flannel 服务就足够了,跳过下一步。 - -```shell -systemctl start flanneld -``` - - -如果 docker 已经运行,则停止 docker,删除 docker bridge(docker0),启动 flanneld 并重新启动 docker,如下所示。另一种方法是重启系统(systemctl reboot)。 - -```shell -systemctl stop docker -ip link delete docker0 -systemctl start flanneld -systemctl start docker -``` - - - -## 测试集群和 flannel 配置 - - -现在检查节点上的接口。请注意,现在有一个 flannel.1 接口,docker0 和 flannel.1 接口的 ip 地址在同一个网络中。您会注意到 docker0 在上面配置的 IP 范围之外的每个 Kubernetes 节点上分配了一个子网(18.16.29.0/24,如下所示)。 正常运行的输出应如下所示: - - -```shell -# ip -4 a|grep inet - inet 127.0.0.1/8 scope host lo - inet 192.168.122.77/24 brd 192.168.122.255 scope global dynamic eth0 - inet 18.16.29.0/16 scope global flannel.1 - inet 18.16.29.1/24 scope global docker0 -``` - - -从集群中的任何节点,通过 curl 向 etcd 服务器发出查询来检查集群成员(仅显示部分输出 `grep -E "\{|\}|key|value`)。如果您设置了 1 个主节点和 3 个节点集群,您应该会看到每个节点都有一个块,显示分配给它们的子网。您可以通过输出中列出的 MAC 地址(VtepMAC)和 IP 地址(公共 IP) 将这些子网关联到每个节点。 - -```shell -curl -s http://fed-master:2379/v2/keys/coreos.com/network/subnets | python -mjson.tool -``` - -```json -{ - "node": { - "key": "/coreos.com/network/subnets", - { - "key": "/coreos.com/network/subnets/18.16.29.0-24", - "value": "{\"PublicIP\":\"192.168.122.77\",\"BackendType\":\"vxlan\",\"BackendData\":{\"VtepMAC\":\"46:f1:d0:18:d0:65\"}}" - }, - { - "key": "/coreos.com/network/subnets/18.16.83.0-24", - "value": "{\"PublicIP\":\"192.168.122.36\",\"BackendType\":\"vxlan\",\"BackendData\":{\"VtepMAC\":\"ca:38:78:fc:72:29\"}}" - }, - { - "key": "/coreos.com/network/subnets/18.16.90.0-24", - "value": "{\"PublicIP\":\"192.168.122.127\",\"BackendType\":\"vxlan\",\"BackendData\":{\"VtepMAC\":\"92:e2:80:ba:2d:4d\"}}" - } - } -} -``` - - -从所有节点,查看 `/run/flannel/subnet.env` 文件。这个文件是由 flannel 自动生成的。 - -```shell -# cat /run/flannel/subnet.env -FLANNEL_SUBNET=18.16.29.1/24 -FLANNEL_MTU=1450 -FLANNEL_IPMASQ=false -``` - - -此时,我们在 Kubernetes 主节点上运行了 etcd,在 Kubernetes 节点上运行了 flannel / docker。接下来的步骤是测试跨主机容器通信,这将确认 docker 和 flannel 配置正确。 - - -在任意两个节点上发出以下命令: - -```shell -# docker run -it fedora:latest bash -bash-4.3# -``` - - -您将会进入容器中。安装 iproute 和 iputils 包来安装 ip 和 ping 实用程序。由于一个[错误](https://bugzilla.redhat.com/show_bug.cgi?id=1142311),需要修改 ping 二进制文件的功能来处理"操作不允许"错误。 - -```shell -bash-4.3# dnf -y install iproute iputils -bash-4.3# setcap cap_net_raw-ep /usr/bin/ping -``` - - -现在记下第一个节点上的 IP 地址: - -```shell -bash-4.3# ip -4 a l eth0 | grep inet - inet 18.16.29.4/24 scope global eth0 -``` - - -还要注意另一个节点上的 IP 地址: - -```shell -bash-4.3# ip a l eth0 | grep inet - inet 18.16.90.4/24 scope global eth0 -``` - - -现在从第一个节点 ping 到另一个节点: - -```shell -bash-4.3# ping 18.16.90.4 -PING 18.16.90.4 (18.16.90.4) 56(84) bytes of data. 
-64 bytes from 18.16.90.4: icmp_seq=1 ttl=62 time=0.275 ms -64 bytes from 18.16.90.4: icmp_seq=2 ttl=62 time=0.372 ms -``` - - -现在,Kubernetes 多节点集群通过 flannel 设置 overlay 网络。 - - - -## 支持级别 - - - -IaaS 供应商 | 配置 管理 | 系统 | 网络 | 文档 | 标准 | 支持级别 --------------------- | ------------ | ------ | ---------- | --------------------------------------------- | ---------| ---------------------------- -Bare-metal | custom | Fedora | flannel | [docs](/docs/getting-started-guides/fedora/flannel_multi_node_cluster/) | | Community ([@aveshagarwal](https://github.com/aveshagarwal)) -libvirt | custom | Fedora | flannel | [docs](/docs/getting-started-guides/fedora/flannel_multi_node_cluster/) | | Community ([@aveshagarwal](https://github.com/aveshagarwal)) -KVM | custom | Fedora | flannel | [docs](/docs/getting-started-guides/fedora/flannel_multi_node_cluster/) | | Community ([@aveshagarwal](https://github.com/aveshagarwal)) - - - diff --git a/content/zh/docs/getting-started-guides/ubuntu.md b/content/zh/docs/getting-started-guides/ubuntu.md deleted file mode 100644 index ae93616dfc61f..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu.md +++ /dev/null @@ -1,63 +0,0 @@ ---- -title: 在 Ubuntu 上运行 Kubernetes -content_template: templates/concept ---- - - -{{% capture overview %}} - -有多种方法可以在公有云、私有云以及裸金属上运行基于 Ubuntu 的 Kubernetes 集群。 -{{% /capture %}} - -{{% capture body %}} - -## Kubernetes Charmed 发行版(CDK) - - -[CDK](https://www.ubuntu.com/cloud/kubernetes) 是 Kubernetes 的一个发行版,作为开源应用程序建模器 Juju 的一组 *charms*。 - - -CDK 是附带了上游二进制文件的 Kubernetes 最新发行版,采用了一种支持快速简单部署的打包格式。它支持各种公有云和私有云,包括 AWS,GCE,Azure,Joyent,OpenStack,VMware 以及 Bare Metaland 本地部署。 - - -请参阅[官方文档](https://www.ubuntu.com/kubernetes/docs)获取详细信息。 - - -## MicroK8s - - -[MicroK8s](https://microk8s.io) 是 Kubernetes 的最小安装,旨在在本地运行。 - -它可以使用以下命令安装在 Ubuntu(或任何支持 snap 的操作系统)上: - -```shell -snap install microk8s --classic -``` - - -[MicroK8s 网站](https://microk8s.io/docs)上提供了完整的文档。 - -{{% /capture %}} diff --git a/content/zh/docs/getting-started-guides/ubuntu/_index.md b/content/zh/docs/getting-started-guides/ubuntu/_index.md deleted file mode 100644 index e3c8a54e9fdd8..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/_index.md +++ /dev/null @@ -1,140 +0,0 @@ ---- -title: Ubuntu 上运行 Kubernetes -content_template: templates/concept ---- - - -{{% capture overview %}} - - -使用 Ubuntu 运行 Kubernetes 集群有多种方法。 这些页面阐释了如何在多种公共云、私有云和裸机的 Ubuntu 上部署 Kubernetes。 -{{% /capture %}} - -{{% capture body %}} - -## 官方 Ubuntu 指南 - -- [Kubernetes 的 Canonical 发行版](https://www.ubuntu.com/cloud/kubernetes) - -最新版 Kubernetes 上游二进制文件。 支持 AWS、GCE、Azure、Joyent、OpenStack、VMware、裸机和 localhost 部署。 - - -### 快速入门 - -[conjure-up](http://conjure-up.io/) 提供了在多种云和裸机的 Ubuntu 上部署 Kubernetes 的最快方法。它提供了用户友好的界面,提示您提供云凭据和配置选项 - -适用于 Ubuntu 16.04 及更高版本: - -``` -sudo snap install conjure-up --classic -# 如果您刚刚安装了 snap 工具,可能需要重新登录。 -conjure-up kubernetes -``` - - - -以及用于 macOS 的 Homebrew: - -``` -brew install conjure-up -conjure-up kubernetes -``` - - - -### 操作指南 - -这些是用户在生产中运行 Kubernetes 的更深入的指南: - - - [安装](/docs/getting-started-guides/ubuntu/installation/) - - [验证](/docs/getting-started-guides/ubuntu/validation/) - - [备份](/docs/getting-started-guides/ubuntu/backups/) - - [升级](/docs/getting-started-guides/ubuntu/upgrades/) - - [缩放](/docs/getting-started-guides/ubuntu/scaling/) - - [日志](/docs/getting-started-guides/ubuntu/logging/) - - [监控](/docs/getting-started-guides/ubuntu/monitoring/) - - [网络](/docs/getting-started-guides/ubuntu/networking/) - - 
[安全](/docs/getting-started-guides/ubuntu/security/) - - [存储](/docs/getting-started-guides/ubuntu/storage/) - - [故障排除](/docs/getting-started-guides/ubuntu/troubleshooting/) - - [退役](/docs/getting-started-guides/ubuntu/decommissioning/) - - [操作因素](/docs/getting-started-guides/ubuntu/operational-considerations/) - - [词汇表](/docs/getting-started-guides/ubuntu/glossary/) - - - - -## 第三方产品集成 - - - [Rancher](/docs/getting-started-guides/ubuntu/rancher/) - -## 开发者指南 - - - [Localhost 使用 LXD](/docs/getting-started-guides/ubuntu/local/) - - - -## 如何找到我们 - -我们通常关注以下 Slack 频道: - -- [kubernetes-users](https://kubernetes.slack.com/messages/kubernetes-users/) -- [kubernetes-novice](https://kubernetes.slack.com/messages/kubernetes-novice/) -- [sig-cluster-lifecycle](https://kubernetes.slack.com/messages/sig-cluster-lifecycle/) -- [sig-cluster-ops](https://kubernetes.slack.com/messages/sig-cluster-ops/) -- [sig-onprem](https://kubernetes.slack.com/messages/sig-onprem/) - -而且我们会查看 Kubernetes 邮件列表。 -{{% /capture %}} - diff --git a/content/zh/docs/getting-started-guides/ubuntu/backups.md b/content/zh/docs/getting-started-guides/ubuntu/backups.md deleted file mode 100644 index 7fa1213a5da5e..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/backups.md +++ /dev/null @@ -1,189 +0,0 @@ ---- -title: 备份 -content_template: templates/task ---- - -{{% capture overview %}} - - -Kubernetes 集群的状态信息保存在 etcd 数据库中。 -本文将要展示如何对 Canonical 发行版的 Kubernetes 中所带有的 etcd 进行备份和恢复。 -至于如何对通常保存在持久卷上的应用数据进行备份,超出了本文的讨论范围。 - -{{% /capture %}} - -{{% capture prerequisites %}} - -本文假设您有一个 Juju 部署的集群。 -{{% /capture %}} - -{{% capture steps %}} - -## 快照 etcd 中的数据 - - - -etcd charm 的 `snapshot` 操作能够让操作员给正在运行的集群数据建立快照,快照数据可用于复制、备份或者迁移到一个新的集群中。 - - juju run-action etcd/0 snapshot - -这条命令会在 `/home/ubuntu/etcd-snapshots` 默认路径下建立一个快照。 - - -## 恢复 etcd 数据 - - - -etcd charm 能够通过 `restore` 操作从一个集群数据快照中恢复集群数据。 -这里有些注意事项,而且是恢复集群的唯一办法:集群当前只能有一个成员。 -所以最好是使用 etcd charm 来部署一个新的集群,而不必添加任何新的单元。 - -``` -juju deploy etcd new-etcd -``` - - -上面的命令将会部署一个单独的 etcd 单元,'new-etcd'。 - -``` -juju run-action etcd/0 restore target=/mnt/etcd-backups -``` - - - -当恢复操作完成后,评估一下集群的健康状态。如果集群运行良好,就可以按照您的需求来扩展应用程序规模。 - -- **参数** target: 保存现有数据的目的路径。 -- **参数** skip-backup: 不要备份任何现有的数据。 - - - - -## 迁移 etcd 集群 -通过使用上述的 `snapshot` 和 `restore` 操作,就能很容易地迁移 etcd 集群。 - - -**第一步:** 给现有的集群建立快照。这个已经封装在 `snapshot` 操作中。 - -``` -juju run-action etcd/0 snapshot -``` - - -结果: - -``` -Action queued with id: b46d5d6f-5625-4320-8cda-b611c6ae580c -``` - - -**第二步:** 检查操作状态,以便您能抓取快照并且验证校验和。 -您可以直接使用 `copy.cmd` 中的结果来下载您刚刚创建的快照数据,`copy.cmd` 中的结果可以直接复制/粘贴使用。 - - -从节点上下载刚刚创建的快照数据并且验证 sha256sum 校验和 - -``` -juju show-action-output b46d5d6f-5625-4320-8cda-b611c6ae580c -``` - - -结果: - -``` -results: - copy: - cmd: juju scp etcd/0:/home/ubuntu/etcd-snapshots/etcd-snapshot-2016-11-09-02.41.47.tar.gz - . - snapshot: - path: /home/ubuntu/etcd-snapshots/etcd-snapshot-2016-11-09-02.41.47.tar.gz - sha256: 1dea04627812397c51ee87e313433f3102f617a9cab1d1b79698323f6459953d - size: 68K -status: completed -``` - - -将数据快照拷到本地,然后检查 sha256sum。 - -``` -juju scp etcd/0:/home/ubuntu/etcd-snapshots/etcd-snapshot-2016-11-09-02.41.47.tar.gz . 
-sha256sum etcd-snapshot-2016-11-09-02.41.47.tar.gz -``` - - -**第三步:** 部署新的集群 leader 节点,并加载快照数据: - -``` -juju deploy etcd new-etcd --resource snapshot=./etcd-snapshot-2016-11-09-02.41.47.tar.gz -``` - - -**第四步:** 使用在第三步中的快照数据来重新初始化 master: - -``` -juju run-action new-etcd/0 restore -``` - -{{% /capture %}} - -{{% capture discussion %}} - -## 已知的局限 - - -#### 丢失 PKI 警告 - - - -如果销毁了 leader - 在状态栏通过 `*` 来标识,那么所有的 TLS pki 警告都将会丢失。 -在请求和注册证书的单元之外,将不会有 PKI 迁移发生。 - -{{< caution >}} - - -**警告:** 如果误管理这项配置,将会导致您无法从外部访问集群, -并且很可能会破坏现有的部署,出现 x509 证书验证相关的异常问题,这些都会对服务器和客户端造成影响。 - -{{< /caution >}} - - -#### 在一个已经扩展的集群上进行快照数据恢复 - - - -在一个已经扩展的集群上进行快照数据恢复,将会导致集群损坏。 -etcd 在节点启动时开始集群管理,并且将状态保存在 etcd 中。 -在快照数据的恢复阶段,会初始化一个新的集群 ID,并且丢弃其它 peer 节点以保证快照数据的恢复。 -请严格遵照上述集群迁移中的恢复操作来进行操作。 - -{{% /capture %}} diff --git a/content/zh/docs/getting-started-guides/ubuntu/decommissioning.md b/content/zh/docs/getting-started-guides/ubuntu/decommissioning.md deleted file mode 100644 index 9956c0f84c107..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/decommissioning.md +++ /dev/null @@ -1,103 +0,0 @@ ---- -title: 销毁 -content_template: templates/task ---- - - - - -{{% capture overview %}} - -本页将展示如何销毁一个集群。 -{{% /capture %}} - - -{{% capture prerequisites %}} - -本页假设有一个使用 Juju 部署的、正在运行的集群。 - -{{< warning >}} - -当您到达这一步时,您应该已经对集群的相关内容进行了备份;这部分将彻底销毁一个集群。 -{{< /warning >}} - -{{% /capture %}} - -{{% capture steps %}} - -## 破坏 Juju 模型 - - - -建议使用各自的模型来相应地部署 Kubernetes 集群, -以便各个环境之间能够界限分明。 -如果想要删除一个集群,首先需要通过 `juju list-models` 命令找到其对应的模型。 -控制器为其自身预留了 `admin` 这个模型。 -如果没有命名模型,则模型名可能会显示为 `default`。 - -``` -$ juju list-models -Controller: aws-us-east-2 - -Model Cloud/Region Status Machines Cores Access Last connection -controller aws/us-east-2 available 1 2 admin just now -my-kubernetes-cluster* aws/us-east-2 available 12 22 admin 2 minutes ago -``` - - -销毁模型,模型内的集群也随之被销毁: - - juju destroy-model my-kubernetes-cluster - -``` -$ juju destroy-model my-kubernetes-cluster -WARNING! This command will destroy the "my-kubernetes-cluster" model. -This includes all machines, applications, data and other resources. - -Continue [y/N]? y -Destroying model -Waiting on model to be removed, 12 machine(s), 10 application(s)... -Waiting on model to be removed, 12 machine(s), 9 application(s)... -Waiting on model to be removed, 12 machine(s), 8 application(s)... -Waiting on model to be removed, 12 machine(s), 7 application(s)... -Waiting on model to be removed, 12 machine(s)... -Waiting on model to be removed... -$ -``` - - -这将会彻底破坏并销毁所有节点。 -运行 `juju status` 命令可以确认所有节点是否已经被销毁。 - - -如果使用的是公有云,命令将会终止所有的实例。 -如果使用的是 MAAS 裸机,命令将会释放所有的节点,(可能)清空磁盘,关闭机器, -然后将节点资源返回到可用的机器池中。 - - -## 清理控制器 - - -如果控制器没有其它的用途,还需要删除控制器实例: - -``` -$ juju list-controllers -Use --refresh flag with this command to see the latest information. - -Controller Model User Access Cloud/Region Models Machines HA Version -aws-us-east-2* - admin superuser aws/us-east-2 2 1 none 2.0.1 - -$ juju destroy-controller aws-us-east-2 -WARNING! This command will destroy the "aws-us-east-2" controller. -This includes all machines, applications, data and other resources. - -Continue? 
(y/N):y -Destroying controller -Waiting for hosted model resources to be reclaimed -All hosted models reclaimed, cleaning up controller machines -$ -``` -{{% /capture %}} diff --git a/content/zh/docs/getting-started-guides/ubuntu/glossary.md b/content/zh/docs/getting-started-guides/ubuntu/glossary.md deleted file mode 100644 index 43528245ab251..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/glossary.md +++ /dev/null @@ -1,50 +0,0 @@ ---- -title: 词汇与术语 -content_template: templates/concept ---- - - - - - -{{%capture overview%}} -本页介绍了用 Juju 部署 Kubernetes 时使用的一些术语。 -{{%/ capture%}} - -{{%capture body%}} - - - -**controller** - 云环境的管理节点。通常,每个域(Region)都有一个 controller,在高可用环境中有更多 controller。每个 controller 负责管理给定环境中的所有后续 model。Controller 中包含 Juju API 服务器及其底层数据库。 - -**model** - 定义 Deployments 的一系列 charms 及其关系的集合。model 之中包括 machine 和更小的 unit。每个 controller 可以托管多个 model。出于管理和隔离的原因,建议将 Kubernetes 集群分成独立的 model。 - -**charm** - 每个 charm 对应一个 Service 的定义,包括其元数据、与其他服务间的依赖关系、所需的包和应用管理逻辑。 -其中包含部署 Kubernetes 集群的所有操作知识。内置的 charms 例子有 `kubernetes-core`、`easyrsa`、`flannel` 和 `etcd` 等。 - -**unit** - 对应某 Service 的给定实例。每个 unit 可能会也可能不会耗尽某指定机器上的所有资源。多个 unit 可能部署在同一台机器上。例如,您可能在一台机器上运行 `kubernetes-worker` 和 `etcd` 以及 `easyrsa` unit,但它们是基于不同服务的三个独立的 unit。 - -**machine** - 物理节点,可以是裸机节点,也可以是云提供商提供的虚拟机。 -{{%/ capture%}} diff --git a/content/zh/docs/getting-started-guides/ubuntu/installation.md b/content/zh/docs/getting-started-guides/ubuntu/installation.md deleted file mode 100644 index 93d61d742412e..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/installation.md +++ /dev/null @@ -1,542 +0,0 @@ ---- -reviewers: -- caesarxuchao -- erictune -title: 用 Juju 搭建 Kubernetes -content_template: templates/task ---- - - - - - -{% capture overview %} -Ubuntu 16.04 已公开 [Kubernetes 的 Canonical 发行版 ](https://www.ubuntu.com/cloud/kubernetes), 一套为生产环境设计的 Kubernetes 上游版本。本文将为您演示如何部署集群。 -{% endcapture %} - - - -{{% capture prerequisites %}} -- 一个可用的 [Juju 客户端](https://jujucharms.com/docs/2.3/reference-install);不一定要是 Linux 机器,也可以是 Windows 或 OSX。 -- 一个[受支持的云](#cloud-compatibility)。 - - 裸机部署可以通过 [MAAS](http://maas.io) 实现。 配置指南参见 [MAAS 文档](http://maas.io/docs/)。 - - OpenStack 部署目前只在 Icehouse 及更新版本上测试通过。 -- 下面任一一种选项: - - 可以网络访问以下站点 - - *.jujucharms.com - - gcr.io - - github.com - - 访问 Ubuntu 镜像源(公共的或私有的) - - 通过[这些](https://github.com/juju-solutions/bundle-canonical-kubernetes/wiki/Running-CDK-in-a-restricted-environment)步骤准备好离线部署。 -{{% /capture %}} - - - -{{% capture steps %}} -## 部署概述 - -开箱即用的部署由以下组件构成,部署在 9 台机器上: - -- Kubernetes (自动化部署,运营及伸缩) - - 具有一个主节点和三个工作节点的四节点 Kubernetes 集群。 - - 使用 TLS 实现组件间的安全通信。 - - Flannel 软件定义网络 (SDN) 插件 - - 一个负载均衡器以实现 kubernetes-master 的高可用 (实验阶段) - - 可选的 Ingress 控制器(在工作节点上) - - 可选的 Dashboard 插件(在主节点上),包含实现集群监控的 Heapster 插件 -- EasyRSA - - 扮演证书授权机构的角色,向集群中的组件提供自签名证书 -- ETCD (分布式键值存储) - - 三节点的集群达到高可靠性。 - - - -Juju Kubernetes 工作由 Canonical Ltd(https://www.canonical.com/) 的 Big Software 团队整理,欢迎对我们的工作给出反馈意见。 -如果发现任何问题,请提交相应的 [Issue 到跟踪系统](https://github.com/juju-solutions/bundle-canonical-kubernetes),以便我们解决。 - - - -## 支持级别 - -IaaS 提供商 | 配置管理 | 系统 | 网络 | 文档 | 符合 | 支持级别 --------------------- | ------------ | ------ | ---------- | --------------------------------------------- | ---------| ---------------------------- -Amazon Web Services (AWS) | Juju | Ubuntu | flannel, calico* | [docs](/docs/getting-started-guides/ubuntu) | | [Commercial](https://ubuntu.com/cloud/kubernetes), [Community](https://github.com/juju-solutions/bundle-kubernetes-core) -OpenStack | Juju | Ubuntu | flannel, 
calico | [docs](/docs/getting-started-guides/ubuntu) | | [Commercial](https://ubuntu.com/cloud/kubernetes), [Community](https://github.com/juju-solutions/bundle-kubernetes-core) -Microsoft Azure | Juju | Ubuntu | flannel | [docs](/docs/getting-started-guides/ubuntu) | | [Commercial](https://ubuntu.com/cloud/kubernetes), [Community](https://github.com/juju-solutions/bundle-kubernetes-core) -Google Compute Engine (GCE) | Juju | Ubuntu | flannel, calico | [docs](/docs/getting-started-guides/ubuntu) | | [Commercial](https://ubuntu.com/cloud/kubernetes), [Community](https://github.com/juju-solutions/bundle-kubernetes-core) -Joyent | Juju | Ubuntu | flannel | [docs](/docs/getting-started-guides/ubuntu) | | [Commercial](https://ubuntu.com/cloud/kubernetes), [Community](https://github.com/juju-solutions/bundle-kubernetes-core) -Rackspace | Juju | Ubuntu | flannel | [docs](/docs/getting-started-guides/ubuntu) | | [Commercial](https://ubuntu.com/cloud/kubernetes), [Community](https://github.com/juju-solutions/bundle-kubernetes-core) -VMWare vSphere | Juju | Ubuntu | flannel, calico | [docs](/docs/getting-started-guides/ubuntu) | | [Commercial](https://ubuntu.com/cloud/kubernetes), [Community](https://github.com/juju-solutions/bundle-kubernetes-core) -Bare Metal (MAAS) | Juju | Ubuntu | flannel, calico | [docs](/docs/getting-started-guides/ubuntu) | | [Commercial](https://ubuntu.com/cloud/kubernetes), [Community](https://github.com/juju-solutions/bundle-kubernetes-core) - - - - -有关所有解决方案的支持级别信息,请参见[解决方案表](/docs/getting-started-guides/#table-of-solutions)。 - -## 安装选项 - -可以通过下面任一一种方式启动集群:[conjure-up](#conjure-up) or [juju 部署](#juju-deploy)。Conjure-up 只是一个对 juju 的简易封装,简化了安装的过程。正因为如此,这也是推荐的安装方法。 - -可以在 [众多不同的公有云](#cloud-compatibility),私有 OpenStack 云,或者是原始的裸机集群上部署集群软件。通过 [MAAS](http://maas.io) 实现裸机部署。 - -## Conjure-up - -通过 conjure-up 来安装 Kubernetes, 只需要运行下面的命令,然后根据提示做选择: - -``` -sudo snap install conjure-up --classic -conjure-up kubernetes -``` - -## Juju 部署 - -### 配置 Juju 使用您的云提供商 - -确定所要部署的云之后,按照[云安装界面](https://jujucharms.com/docs/devel/getting-started)来配置、部署到该云。 - -加载[云凭证](https://jujucharms.com/docs/2.3/credentials)来选择、使用相应的云。 - - - -在本例中 - -``` -juju add-credential aws -credential name: my_credentials -select auth-type [userpass, oauth, etc]: userpass -enter username: jorge -enter password: ******* -``` - -也可以通过 `juju autoload-credentials` 命令自动加载常用的云凭证,该命令将自动从每个云的默认文件和环境变量中导入凭据信息。 - - - -接下来,我们需要启动一个控制器来管理集群。您需要确定所要启动的云,地区以及控制器节点的名字: - -``` -juju update-clouds # 这个命令可以确保客户端上所有最新的区域是最新的 -juju bootstrap aws/us-east-2 -``` -或者,另外一个例子,这次是在 Azure 上: - -``` -juju bootstrap azure/westus2 -``` - - - -如果您看到下面的错误信息,很可能默认的 Azure VM (Standard D1 v2 [1 vcpu, 3.5 GB memory]) 并不在当前的 Azure 地区。 -``` -ERROR failed to bootstrap model: instance provisioning failed (Failed) -``` - - - - -您需要为部署到的每个云或区域分配一个控制器节点。更多信息参见[控制器文档](https://jujucharms.com/docs/2.3/controllers)。 - -请注意,每个控制器可以在给定的云或区域中管理多个 Kubernetes 集群。 - - - -## 启动 Kubernetes 集群 - -以下命令将部署 9-节点的初始集群。执行速度取决于您所要部署到的云的性能: - -``` -juju deploy canonical-kubernetes -``` - -执行完此命令后,云将启动实例并开始部署过程。 - - - -## 监控部署 - -`juju status` 命令提供集群中每个单元的信息。`watch -c juju status --color` 命令可以获取集群部署的实时状态。 -当所有的状态是绿色并且“空闲”时,表示集群处于待用状态: - - juju status - -输出结果: - -``` -Model Controller Cloud/Region Version SLA -conjure-canonical-kubern-f48 conjure-up-aws-650 aws/us-east-2 2.3.2 unsupported - -App Version Status Scale Charm Store Rev OS Notes -easyrsa 3.0.1 active 1 easyrsa jujucharms 27 ubuntu -etcd 2.3.8 active 3 etcd jujucharms 63 ubuntu -flannel 0.9.1 active 4 flannel jujucharms 40 
ubuntu -kubeapi-load-balancer 1.10.3 active 1 kubeapi-load-balancer jujucharms 43 ubuntu exposed -kubernetes-master 1.9.3 active 1 kubernetes-master jujucharms 13 ubuntu -kubernetes-worker 1.9.3 active 3 kubernetes-worker jujucharms 81 ubuntu exposed - -Unit Workload Agent Machine Public address Ports Message -easyrsa/0* active idle 3 18.219.190.99 Certificate Authority connected. -etcd/0 active idle 5 18.219.56.23 2379/tcp Healthy with 3 known peers -etcd/1* active idle 0 18.219.212.151 2379/tcp Healthy with 3 known peers -etcd/2 active idle 6 13.59.240.210 2379/tcp Healthy with 3 known peers -kubeapi-load-balancer/0* active idle 1 18.222.61.65 443/tcp Loadbalancer ready. -kubernetes-master/0* active idle 4 18.219.105.220 6443/tcp Kubernetes master running. - flannel/3 active idle 18.219.105.220 Flannel subnet 10.1.78.1/24 -kubernetes-worker/0 active idle 2 18.219.221.98 80/tcp,443/tcp Kubernetes worker running. - flannel/1 active idle 18.219.221.98 Flannel subnet 10.1.38.1/24 -kubernetes-worker/1* active idle 7 18.219.249.103 80/tcp,443/tcp Kubernetes worker running. - flannel/2 active idle 18.219.249.103 Flannel subnet 10.1.68.1/24 -kubernetes-worker/2 active idle 8 52.15.89.16 80/tcp,443/tcp Kubernetes worker running. - flannel/0* active idle 52.15.89.16 Flannel subnet 10.1.73.1/24 - -Machine State DNS Inst id Series AZ Message -0 started 18.219.212.151 i-065eab4eabc691b25 xenial us-east-2a running -1 started 18.222.61.65 i-0b332955f028d6281 xenial us-east-2b running -2 started 18.219.221.98 i-0879ef1ed95b569bc xenial us-east-2a running -3 started 18.219.190.99 i-08a7b364fc008fc85 xenial us-east-2c running -4 started 18.219.105.220 i-0f92d3420b01085af xenial us-east-2a running -5 started 18.219.56.23 i-0271f6448cebae352 xenial us-east-2c running -6 started 13.59.240.210 i-0789ef5837e0669b3 xenial us-east-2b running -7 started 18.219.249.103 i-02f110b0ab042f7ac xenial us-east-2b running -8 started 52.15.89.16 i-086852bf1bee63d4e xenial us-east-2c running - -Relation provider Requirer Interface Type Message -easyrsa:client etcd:certificates tls-certificates regular -easyrsa:client kubeapi-load-balancer:certificates tls-certificates regular -easyrsa:client kubernetes-master:certificates tls-certificates regular -easyrsa:client kubernetes-worker:certificates tls-certificates regular -etcd:cluster etcd:cluster etcd peer -etcd:db flannel:etcd etcd regular -etcd:db kubernetes-master:etcd etcd regular -kubeapi-load-balancer:loadbalancer kubernetes-master:loadbalancer public-address regular -kubeapi-load-balancer:website kubernetes-worker:kube-api-endpoint http regular -kubernetes-master:cni flannel:cni kubernetes-cni subordinate -kubernetes-master:kube-api-endpoint kubeapi-load-balancer:apiserver http regular -kubernetes-master:kube-control kubernetes-worker:kube-control kube-control regular -kubernetes-worker:cni flannel:cni kubernetes-cni subordinate -``` - - - -## 与集群的交互 - -部署完集群后,您可以在任意一个 kubernetes-master 或 kubernetes-worker 节点取得集群的控制权。 - -如果您没有使用 conjure-up,那么您需要先将凭据和客户端程序下载到本地工作站上: - -创建 kubectl 配置信息目录。 - -``` -mkdir -p ~/.kube -``` - -将 kubeconfig 文件复制到默认位置。 - -``` -juju scp kubernetes-master/0:config ~/.kube/config -``` - - - -下一步是在本地机器上安装 kubectl 客户端。在 Ubuntu 上推荐的安装方式是使用 kubectl snap ([/docs/tasks/tools/install-kubectl/#install-with-snap-on-ubuntu](/docs/tasks/tools/install-kubectl/#install-with-snap-on-ubuntu))。 - -可以运行下面的命令便可以控制 kubernetes 集群了: - -``` -sudo snap install kubectl --classic -``` - -这条命令会安装和部署 kubectl 程序。安装完成后,您可能需要重启命令窗口(因为 $PATH 已经被更新)。 - - - -查询集群: - kubectl 
cluster-info - -输出结果: - -``` -Kubernetes master is running at https://52.15.104.227:443 -Heapster is running at https://52.15.104.227:443/api/v1/namespaces/kube-system/services/heapster/proxy -KubeDNS is running at https://52.15.104.227:443/api/v1/namespaces/kube-system/services/kube-dns/proxy -Grafana is running at https://52.15.104.227:443/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy -InfluxDB is running at https://52.15.104.227:443/api/v1/namespaces/kube-system/services/monitoring-influxdb/proxy -``` - - - -## 为集群垂直扩容 - -需要更大的 Kubernetes 节点?通过使用 Juju 的**约束**,您可以轻松地请求到不同大小的云资源。 -通过 Juju 请求创建的任意系统,您都可以为它们增加 CPU 和内存(RAM)。 -这使您可以对 Kubernetes 集群进行调优以适应工作负载。 -藉由 bootstrap 命令的参数或使用独立的 `juju constraints` 命令都可以做到这点。详情参见[和机器相关的 Juju 文档](https://jujucharms.com/docs/2.3/charms-constraints) - - - -## 为集群集群水平扩容 - -需要更多的工作节点?只需添加一些 unit: - -```shell -juju add-unit kubernetes-worker -``` - -或者一次添加多个: - -```shell -juju add-unit -n3 kubernetes-worker -``` -您也可以为特定实例类型或者特定机器的设置约束。更多信息请参见[约束文档](https://jujucharms.com/docs/stable/reference-constraints)。 -接下来举一些例子。请注意,诸如 `cores` 和 `mem` 这样的通用约束在各云之间的可移植性是比较高的。 -在本例中,我们从 AWS 申请一个特定的实例类型: - -```shell -juju set-constraints kubernetes-worker instance-type=c4.large -juju add-unit kubernetes-worker -``` - -为提升键值存储的容错能力,您也可以扩展 etcd charm: - -```shell -juju add-unit -n3 etcd -``` - -强烈建议运行奇数个 unit 以支持法定人数票选。 - - - -## 销毁集群 - -如果您是使用 conjure-up 创建的集群,通过 `conjure-down` 便可以完成销毁过程。 -如果是直接使用的 juju,你可以通过销毁 juju 模型或控制器来销毁集群。 -使用 `juju switch` 命令获取当前控制器的名字: - -```shell -juju switch -juju destroy-controller $controllername --destroy-all-models -``` - -这将关闭并终止该云上所有正在运行的实例。 -{{% /capture %}} - -{{% capture discussion %}} - - - -{{% capture discussion %}} -## 更多信息 - -Ubuntu Kubernetes 的部署通过名为 charms 的开源运维工具实现,这类工具也称作运维即代码(Operations as Code)。 -这些 charms 以层的方式组装,从而使代码更小,更专注于 Kubernetes 及其组件的操作。 - -Kubernetes 的层和 Bundle 可以在 github.com 的 `kubernetes` 项目中找到: - -- [Bundle 的地址](https://git.k8s.io/kubernetes/cluster/juju/bundles) -- [Kubernetes charm 层的地址](https://git.k8s.io/kubernetes/cluster/juju/layers) -- [Canonical Kubernetes 主页](https://jujucharms.com/kubernetes) -- [主要的 issue tracker](https://github.com/juju-solutions/bundle-canonical-kubernetes) - -欢迎提供功能需求,错误报告,pull request和反馈意见。 -{{% /capture %}} diff --git a/content/zh/docs/getting-started-guides/ubuntu/local.md b/content/zh/docs/getting-started-guides/ubuntu/local.md deleted file mode 100644 index 1744114ae19c8..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/local.md +++ /dev/null @@ -1,118 +0,0 @@ ---- -title: 通过 LXD 实现 Kubernetes 本地开发 -content_template: templates/task ---- - - - -{{% capture overview %}} - - - -在本地运行 Kubernetes 比在公有云上部署和移除集群具有明显的开发优势,如更低的成本和更快的迭代。 -理想情况下,Kubernetes 开发人员可以在本地容器内产生所有必需的节点,并在提交新配置时测试它们。 -本文将展示如何将集群部署到本地机器的 LXD 容器上。 - -{{% /capture %}} - - - -在本地机器上使用 [LXD](https://linuxcontainers.org/lxd/) 的目的是为了模拟用户在云或裸机中部署的环境。每个节点都被视为一台机器,具有与生产环境相同的特性。 每个节点都是一个单独的容器,它在里面运行 Docker 容器和 `kubectl`(更多信息请参阅 [集群简介](/docs/tutorials/kubernetes-basics/cluster-intro/))。 - -{{% capture prerequisites %}} - - - -安装 [conjure-up](http://conjure-up.io/),这是一个用来部署大型软件的工具。 -将当前用户添加到 `lxd` 用户组中。 - -``` -sudo snap install conjure-up --classic -sudo usermod -a -G lxd $(whoami) -``` - - - -注意:如果 conjure-up 要求您在 LXD 上 "配置一个 ipv6 子网",请选择 NO。目前还不支持在 Juju/LXD 上使用 ipv6。 -{% endcapture %} - -{{% capture steps %}} - - - -## 部署 Kubernetes - - -通过以下命令启动部署: - - conjure-up kubernetes - - - -对于本教程,我们将会创建一个新的控制器 - 选择 `localhost` 云类型: - -![选择云类型](/images/docs/ubuntu/00-select-cloud.png) - - - -部署应用: - 
-![部署应用](/images/docs/ubuntu/01-deploy.png) - - - -等待 Juju 引导结束: - -![引导](/images/docs/ubuntu/02-bootstrap.png) - - - -等待应用被完全部署: - -![等待](/images/docs/ubuntu/03-waiting.png) - - - -执行最终的后处理步骤,来自动配置 Kubernetes 环境: - -![后处理](/images/docs/ubuntu/04-postprocessing.png) - - - -查看最终的摘要信息: - -![最终的摘要](/images/docs/ubuntu/05-final-summary.png) - - - -## 访问集群 - -您可以通过运行以下命令来访问 Kubernetes 集群: - - kubectl --kubeconfig=~/.kube/config - - -或者如果您已经运行过一次,它将创建一个新的配置文件,如摘要信息所示。 - - kubectl --kubeconfig=~/.kube/config.conjure-up - -{{% /capture %}} diff --git a/content/zh/docs/getting-started-guides/ubuntu/logging.md b/content/zh/docs/getting-started-guides/ubuntu/logging.md deleted file mode 100644 index 341882002129c..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/logging.md +++ /dev/null @@ -1,79 +0,0 @@ ---- -title: 日志 -content_template: templates/task ---- - - - -{{% capture overview %}} - -本文将说明日志在 Juju 部署的集群中是如何工作的。 -{{% /capture %}} - -{{% capture prerequisites %}} - -本文假设你已经有一个可用的 Juju 部署的集群。 -{{% /capture %}} - -{{% capture steps %}} - -## 代理日志 - - -`juju debug-log` 命令可以显示集群中每一个节点上运行的 Juju 代理所汇总的日志结果。 -它可以帮助确定为何某个节点没有被部署或者是处于错误的状态。这些代理日志被存放在每个节点的 `/var/lib/juju/agents` 路径下。 - - -更多信息参见[Juju 文档](https://jujucharms.com/docs/stable/troubleshooting-logs) - - - -## 管理日志级别 - - -Juju 中默认的日志级别是 model 级别。不过,你可以随时调整它: - -``` -juju add-model k8s-development --config logging-config='=DEBUG;unit=DEBUG' -``` - - -然后在你的生态环境下的 k8s 模型进行配置 - -``` -juju model-config -m k8s-production logging-config='=ERROR;unit=ERROR' -``` - - -另外,所有控制器上的 jujud 守护进程默认使用 debug 级别。如果想要移除这种行为,编辑控制器节点上的 ```/var/lib/juju/init/jujud-machine-0/exec-start.sh``` 文件并注释掉 ```--debug``` 选项。 - - -修改之后,如下所示: - -``` -#!/usr/bin/env bash - -# Set up logging. -touch '/var/log/juju/machine-0.log' -chown syslog:syslog '/var/log/juju/machine-0.log' -chmod 0600 '/var/log/juju/machine-0.log' -exec >> '/var/log/juju/machine-0.log' -exec 2>&1 - -# Run the script. 
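-# 注意:下一行行尾的 --debug 选项已按上文说明移入注释;
-# 路径中的 machine-0 对应控制器的机器编号,编号不同时请相应调整。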
-'/var/lib/juju/tools/machine-0/jujud' machine --data-dir '/var/lib/juju' --machine-id 0 # --debug -``` - - -然后运行下面的命令,重启服务: - -``` -sudo systemctl restart jujud-machine-0.service -``` - - -Juju 中更多和日志与其它模型设置相关的信息请参考[官方文档](https://jujucharms.com/docs/stable/models-config)。 -{{% /capture %}} diff --git a/content/zh/docs/getting-started-guides/ubuntu/monitoring.md b/content/zh/docs/getting-started-guides/ubuntu/monitoring.md deleted file mode 100644 index 8f437d55dd821..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/monitoring.md +++ /dev/null @@ -1,234 +0,0 @@ ---- -title: 监控 -content_template: templates/task ---- - - - -{{% capture overview %}} - - - -本文将介绍如何将不同的日志解决方案连到已经用 Juju 部署好的 Kubernetes 集群上。 - -{{% /capture %}} - -{{% capture prerequisites %}} - - -本文假设你有一个用 Juju 部署好了的 Kubernetes 集群。 - -{{% /capture %}} - -{{% capture steps %}} - - - -## 连接 Datadog - - - -Datadog 是一个 SaaS 方案,包含了对很多不同类型的应用集成的支持,例如,Kubernetes 和 etcd。 -在提供商业版本的同时,也支持通过如下方式免费使用。 -部署一个带有现成的 Databox 的 Kubernetes 集群: - -``` -juju deploy canonical-kubernetes-datadog -``` - - -### 安装 Datadog - - -首先, 从 Juju 的 Charm Store 下载部署最新版本的 Datadog : - -``` -juju deploy datadog -``` - - - -使用在 [Datadog dashboard]() 上的 api-key 来配置 Datadog。 -将 `XXXX` 配置为你的 API 密钥。 - -``` -juju configure datadog api-key=XXXX -``` - - - -最后, 将 `datadog` 绑定到需要监控的所有应用上。例如:kubernetes-master, kubernetes-worker, and etcd: - -``` -juju add-relation datadog kubernetes-worker -juju add-relation datadog kubernetes-master -juju add-relation datadog etcd -``` - - -## 连接 Elastic 栈 - - - -Elastic 栈,正规地说是 "ELK" 栈, 指的是 ElasticSearch 和日志收集,监控,dashboard 的套件. -部署带有现成的 elastic 栈的 Kubernetes 集群命令如下: - -``` -juju deploy canonical-kubernetes-elastic -``` - - -### 初装 ElasticSearch - - - -首先, 从 Juju 的 Charm store 下载、部署最新版本的 ElasticSearch, Kibana, Filebeat 和 Topbeat: - - - -命令行如下: - -``` -juju deploy beats-core -``` - - - -此外,如果你要定制部署,或手工安装,可使用以下命令: - -``` -juju deploy elasticsearch -juju deploy kibana -juju deploy filebeat -juju deploy topbeat - -juju add-relation elasticsearch kibana -juju add-relation elasticsearch topbeat -juju add-relation elasticsearch filebeat -``` - - - -最后将 filebeat 和 topbeat 连接到所要监控的应用上。 -例如:kubernetes-master 和 kubernetes-worker: - -``` -juju add-relation kubernetes-master topbeat -juju add-relation kubernetes-master filebeat -juju add-relation kubernetes-worker topbeat -juju add-relation kubernetes-worker filebeat -``` - - -### 已装 ElasticSearch 集群 - - - -如果已有一个 ElasticSearch 集群已经存在的情况下, -你可以使用下面的方式来连接和使用它而不是重新创建一个单独的新集群。 -首先部署 filebeat 和 topbeat 两个组件: - -``` -juju deploy filebeat -juju deploy topbeat -``` - - - -按照如下方式可配置 filebeat 和 topbeat 对接 ElasticSearch 集群, -将 `255.255.255.255` 替换成自己配置的IP。 - -``` -juju configure filebeat elasticsearch=255.255.255.255 -juju configure topbeat elasticsearch=255.255.255.255 -``` - - - -使用上面的命令,将 topbeat 和 filebeat 连接到需要监控的应用上。 - - - - -## 连接 Nagios - - - -Nagios 在每个节点上,使用 Nagions 远程执行插件协议 (NRPE 协议)作为代理 -来收集节点里和健康、应用相关的详细信息。 - - - -### 初装 Nagios - - - -首先, 从 Juju 的 Charm store 部署最新版本的 Nagois 和 NRPE: - -``` -juju deploy nagios -juju deploy nrpe -``` - - -将 Nagois 连接到 NRPE 上 - -``` -juju add-relation nagios nrpe -``` - - - -最后,将 NRPE 添加到所有需要部署的应用, -例如,`kubernetes-master`, `kubernetes-worker`, `etcd`, `easyrsa`, 和 `kubeapi-load-balancer`。 - -``` -juju add-relation nrpe kubernetes-master -juju add-relation nrpe kubernetes-worker -juju add-relation nrpe etcd -juju add-relation nrpe easyrsa -juju add-relation nrpe kubeapi-load-balancer -``` - - -### 已装 Nagios - - - -如果已经装有 Nagios,可以换用 
`nrpe-external-master` charm 。 -这样可以提供配置选项将现有的、外部 Nagios 安装映射到 NRPE上。 -将 `255.255.255.255` 替换为 nagois 实例的 IP 地址。 - -``` -juju deploy nrpe-external-master -juju configure nrpe-external-master nagios_master=255.255.255.255 -``` - - - -配置完后,如上所示,连到 nrpe-external-master。 - -{{% /capture %}} diff --git a/content/zh/docs/getting-started-guides/ubuntu/networking.md b/content/zh/docs/getting-started-guides/ubuntu/networking.md deleted file mode 100644 index e713c53cd7b36..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/networking.md +++ /dev/null @@ -1,92 +0,0 @@ ---- -title: 网络 -content_template: templates/task ---- - - - -{{% capture overview %}} - - - -Kubernetes 支持[容器网络接口](https://github.com/containernetworking/cni)。 -这个网络插件架构允许你使用任何你喜欢的、对 Kubernetes 友好的 SDN。 -目前支持的插件是 Flannel 和 Canal。 - - -本页将展示集群中各个网络部分是如何工作,并且对它们进行相应的配置。 - -{{% /capture %}} -{{% capture prerequisites %}} - - -本页假设你有一个已经通过 Juju 部署、正在运行的集群。 - -{{< note >}} - -注意,如果你是通过 `conjure-up` 或者 CDK 软件包部署的集群,将不需要再手动部署 CNI 插件。 - -{{< /note >}} -{{% /capture %}} - - -{{% capture steps %}} - - - -CNI charms 在[子路径](https://jujucharms.com/docs/stable/authors-subordinate-applications)下。 -这些 charms 需要主 charm 实现 `kubernetes-cni` 接口,才能正常部署。 - -## Flannel - -``` -juju deploy flannel -juju add-relation flannel kubernetes-master -juju add-relation flannel kubernetes-worker -juju add-relation flannel etcd -``` - -## Canal - -``` -juju deploy canal -juju add-relation canal kubernetes-master -juju add-relation canal kubernetes-worker -juju add-relation canal etcd -``` - - -### 配置 - - - -**iface** 接口是用来配置 flannel 或 canal 的 SDN 绑定。 -如果属性为空字符串或未定义,程序将通过下面的命令行试图找出默认的网络适配器: - -```bash -$ route | grep default | head -n 1 | awk {'print $8'} -``` - - - -**cidr** 在用 etcd 进行网络设置时,用于配置 flannel 或 canal SDN 所要使用的网络地址范围。 -请确保这个网络地址范围在所要部署的 L2/L3 上不是在用状态, -因为如果没有选择一个好的 CIDR 范围来分配给 flannel,就会出现冲突或异常行为。 -同时也要保证 IP 地址范围足够大以支持未来可能会发生的集群扩容。 -A 类 IP 地址 `/24` 是一个不错的选择。 - -{{% /capture %}} diff --git a/content/zh/docs/getting-started-guides/ubuntu/operational-considerations.md b/content/zh/docs/getting-started-guides/ubuntu/operational-considerations.md deleted file mode 100644 index d738f09288eec..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/operational-considerations.md +++ /dev/null @@ -1,262 +0,0 @@ ---- -title: 运维注意事项 -content_template: templates/task ---- - - - -{{% capture overview %}} - - -本文为管理维护长期运行的集群的工程师提供一些建议和提示。 - -{{% /capture %}} -{{% capture prerequisites %}} - - -本文假定您对 Juju 和 Kubernetes 已经有了基本的了解。 - -{{% /capture %}} - -{{% capture steps %}} - - - -## 管理 Juju - - - -### 确定控制节点规模 - - - -Juju 控制器: - - - -* 运行需要大概 2 到 2.5 GB 的 RAM。 -* 用 MongoDB 数据库作为集群配置和状态的存储后端。这个数据库可能增长很快,也可能是实例中 CPU 周期的最大消费者。 -* 汇总和存储所有服务和单位的日志数据。因此,长期运行的模型需要大量的存储。如果您的目的是保持集群运行,请确保为日志配置至少 64 GB 的存储空间。 - - - -指定参数创建一个控制器(命令行如下): - -``` -juju bootstrap --constraints "mem=8GB cpu-cores=4 root-disk=128G" -``` - - - -Juju 将会选择与目标云上的约束匹配的最便宜的实例类型。 -还可以通过将 ```instance-type``` 与 ```root-disk``` 两个约束结合使用来进行严格控制。 -对于可用的约束信息,请参阅 [官方文档](https://jujucharms.com/docs/stable/reference-constraints) - - - - -关于日志记录的更多信息,请参阅 [日志章节](/docs/getting-started-guides/ubuntu/logging) - - - -### SSH 到控制节点上 - - - -默认情况下,Juju 将创建一对 SSH 密钥,用于自动化单元之间的连接。 -这对密钥保存在客户端节点的 ```~/.local/share/juju/ssh/``` 路径下。 - - - -部署完后,Juju 控制器是一个 "无声单元", -其充当客户端和已部署应用程序之间的代理。 -尽管如此,SSH 到控制器上还是很有用的。 - - - -首先,你需要了解你的运行环境,特别是如果你运行了几个 Juju 模型和控制器。 - -运行下面的命令行: - -``` -juju list-models --all -$ juju models --all -Controller: k8s - -Model Cloud/Region Status Machines Cores Access Last connection 
-admin/controller lxd/localhost available 1 - admin just now -admin/default lxd/localhost available 0 - admin 2017-01-23 -admin/whale* lxd/localhost available 6 - admin 3 minutes ago -``` - - - -第一行的 ```Controller: k8s``` 表明是如何引导创建的控制器。 - - - -接着可以看见下面列了 2 个,3 个或更多的类型。 - - - -* admin/controller 是托管 juju 所有控制器单元的默认模型 -* admin/default 默认情况下,作为托管用户应用程序的主要模型,例如 Kubernetes 集群 -* admin/whale 是一个额外的模型,如在 Juju 之上,叠加使用 conjure-up 的话 - - - -现在开始 ssh 到控制节点上,首先是让 Juju 切换上下文,然后是像一般单元那样 ssh 到控制节点上: - -``` -juju switch controller -``` - - - -在这个阶段,也可以查询控制器模型: - -``` -juju status -Model Controller Cloud/Region Version -controller k8s lxd/localhost 2.0.2 - -App Version Status Scale Charm Store Rev OS Notes - -Unit Workload Agent Machine Public address Ports Message - -Machine State DNS Inst id Series AZ -0 started 10.191.22.15 juju-2a5ed8-0 xenial -``` - - - -请注意,如果是在 HA 模式下进行的引导, -会在列表中看到几台机器。 - - - -现在 ssh 到控制器节点上,遵循和经典 Juju 命令相同的语义: - -``` -$ juju ssh 0 -Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.8.0-34-generic x86_64) - - * Documentation: https://help.ubuntu.com - * Management: https://landscape.canonical.com - * Support: https://ubuntu.com/advantage - - Get cloud support with Ubuntu Advantage Cloud Guest: - http://www.ubuntu.com/business/services/cloud - -0 packages can be updated. -0 updates are security updates. - - -Last login: Tue Jan 24 16:38:13 2017 from 10.191.22.1 -ubuntu@juju-2a5ed8-0:~$ -``` - - - -在结束完操作,想要返回到最初的模型,退出控制器即可。 - - - -如果,还想要切换回集群,ssh 到其他单元上,运行下面的命令行进行切换: - -``` -juju switch default -``` - - - -## 管理 Kubernetes 集群 - - - -### 运行特权容器 - - - -默认情况下,juju 部署的集群不支持在带有 GPU 的节点上运行特权容器。 -如果需要在其它节点上运行特权容器,只能是在 kubernetes-master 和 kubernetes-worker 节点上 -使能 ```allow-privileged``` 参数: - -``` -juju config kubernetes-master allow-privileged=true -juju config kubernetes-worker allow-privileged=true -``` - - - -### 私有仓库 - - - -通过 registry 操作,您可以很容易地创建一个使用 TLS 身份验证的私有 docker 仓库。 -但是请注意,通过这些功能部署的仓库不是高可用性的; -它使用的存储绑定到运行 pod 的 kubernetes 节点上。 -因此,如果仓库所在的 pod 从一个节点迁移到另一个节点上, -那么你需要重新发布镜像。 - - - -#### 使用示例 - - - -创建相关的身份验证文件。 -例如用户为 ```userA``` 密码为 ```passwordA``` 用来进行身份验证, -命令行如下: - -``` -echo "userA:passwordA" > htpasswd-plain -htpasswd -c -b -B htpasswd userA passwordA -``` - - - -(`htpasswd` 程序通过 ```apache2-utils``` 包获得) - - - -假设您的仓库可以通过 ```myregistry.company.com``` 访问, -您已经在 ```registry.key``` 文件中拥有了您的 TLS 密钥, -并且您的 TLS 身份验证(以 ```myregistry.company.com``` 作为 Common Name)在 -```registry.crt``` 文件中,那么您可以运行: - -``` -juju run-action kubernetes-worker/0 registry domain=myregistry.company.com htpasswd="$(base64 -w0 htpasswd)" htpasswd-plain="$(base64 -w0 htpasswd-plain)" tlscert="$(base64 -w0 registry.crt)" tlskey="$(base64 -w0 registry.key)" ingress=true -``` - - - -如果决定删除镜像仓库,命令行如下: - -``` -juju run-action kubernetes-worker/0 registry delete=true ingress=true -``` - -{{% /capture %}} diff --git a/content/zh/docs/getting-started-guides/ubuntu/rancher.md b/content/zh/docs/getting-started-guides/ubuntu/rancher.md deleted file mode 100644 index 095b4c92feabb..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/rancher.md +++ /dev/null @@ -1,543 +0,0 @@ ---- -title: Rancher 与 Ubuntu Kubernetes 集成 -cn-approvers: -- chentao1596 ---- - - - -{{% capture overview %}} - - - -本文将介绍如何在 Canonical Kubernetes 集群上部署 Rancher 2.0 alpha。 - - -这些步骤目前处于 alpha/testing 阶段,未来很可能会发生变化。 - - - -有关此集成的原始文档可以在 [https://github.com/CalvinHartwell/canonical-kubernetes-rancher/](https://github.com/CalvinHartwell/canonical-kubernetes-rancher/) 上找到。 - - -{{% /capture %}} -{{% capture prerequisites %}} - - -本文假设你有一个已经通过 
Juju 部署、正在运行的集群。 - - - -有关使用 juju 部署 Kubernetes 集群的完整指导,请参考 [/docs/getting-started-guides/ubuntu/installation/](/docs/getting-started-guides/ubuntu/installation/)。 - -{{% /capture %}} - - -{{% capture steps %}} - - -## 部署 Rancher - - - -想要部署 Rancher,我们只需要在 Kubernetes 集群上运行 Rancher 容器工作负载即可。 -Rancher 通过 dockerhub([https://hub.docker.com/r/rancher/server/tags/](https://hub.docker.com/r/rancher/server/tags/)) -提供他们的容器镜像的免费下载。 - - - -如果您正在使用自己的镜像仓库,或进行离线部署, -那么,在开始部署之前,请先下载好这些容器镜像,将其推入私有镜像仓库中。 - - -### 使用 nodeport 部署 Rancher - - -首先创建一个 yaml 文件,该文件定义了如何在 kubernetes 上部署 Rancher。 -将该文件保存为 cdk-rancher-nodeport.yaml: - -``` ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRoleBinding -metadata: - name: cluster-admin -subjects: - - kind: ServiceAccount - name: default - namespace: default -roleRef: - kind: ClusterRole - name: cluster-admin - apiGroup: rbac.authorization.k8s.io ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - name: cluster-admin -rules: -- apiGroups: - - '*' - resources: - - '*' - verbs: - - '*' -- nonResourceURLs: - - '*' - verbs: - - '*' ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - creationTimestamp: null - labels: - app: rancher - name: rancher -spec: - replicas: 1 - selector: - matchLabels: - app: rancher - ima: pod - strategy: {} - template: - metadata: - creationTimestamp: null - labels: - app: rancher - ima: pod - spec: - containers: - - image: rancher/server:preview - imagePullPolicy: Always - name: rancher - ports: - - containerPort: 80 - - containerPort: 443 - livenessProbe: - httpGet: - path: / - port: 80 - initialDelaySeconds: 5 - timeoutSeconds: 30 - resources: {} - restartPolicy: Always - serviceAccountName: "" -status: {} ---- -apiVersion: v1 -kind: Service -metadata: - name: rancher - labels: - app: rancher -spec: - ports: - - port: 443 - protocol: TCP - targetPort: 443 - selector: - app: rancher ---- -apiVersion: v1 -kind: Service -metadata: - name: rancher-nodeport -spec: - type: NodePort - selector: - app: rancher - ports: - - name: rancher-api - protocol: TCP - nodePort: 30443 - port: 443 - targetPort: 443 -``` - - - -kubectl 开始正常运行后,执行下面的命令开始部署 Rancher: - -``` - kubectl apply -f cdk-rancher-nodeport.yaml -``` - - - -现在我们需要打开这个 nodeport,以供访问。 -为此,我们可以使用 juju。我们需要在集群中的每个工作节点上运行 open-port 命令。 -在 cdk-rancher-nodeport.yaml 文件中,nodeport 已设置为 30443。 -下面的命令行展示如何在每个工作节点上打开端口: - - - -``` - # 在集群的每个工作节点上运行下面的命令行 - juju run --unit kubernetes-worker/0 "open-port 30443" - juju run --unit kubernetes-worker/1 "open-port 30443" - juju run --unit kubernetes-worker/2 "open-port 30443" -``` - - - -现在便可以通过工作节点的 IP 或 DNS 记录(如果已经创建)在此端口上访问 Rancher。 -通常建议您为集群中的每个工作节点创建一条 DNS 记录。 -例如,如果有三个工作节点并且域名是 example.com,则可以创建三条 A 记录,集群中的每个工作节点各一条。 - - - -由于创建 DNS 记录超出了本文关注的范围, -我们将使用免费服务 xip.io 来获得和 IP 地址相对应的 A 记录,IP 地址将是域名的一部分。 -例如,如果有域名 rancher.35.178.130.245.xip.io, -则 xip.io 服务会自动将 IP 地址 35.178.130.245 作为 A 记录返回,这对测试相当有用。 -至于您的部署,IP 地址 35.178.130.245 应该替换为集群工作节点的 IP 地址,这个 IP 地址可以通过 Juju 或 AWS 得到: - - - -``` - calvinh@ubuntu-ws:~/Source/cdk-rancher$ juju status - -# ... 输出省略。 - -Unit Workload Agent Machine Public address Ports Message -easyrsa/0* active idle 0 35.178.118.232 Certificate Authority connected. -etcd/0* active idle 1 35.178.49.31 2379/tcp Healthy with 3 known peers -etcd/1 active idle 2 35.177.99.171 2379/tcp Healthy with 3 known peers -etcd/2 active idle 3 35.178.125.161 2379/tcp Healthy with 3 known peers -kubeapi-load-balancer/0* active idle 4 35.178.37.87 443/tcp Loadbalancer ready. 
-kubernetes-master/0* active idle 5 35.177.239.237 6443/tcp Kubernetes master running. - flannel/0* active idle 35.177.239.237 Flannel subnet 10.1.27.1/24 -kubernetes-worker/0* active idle 6 35.178.130.245 80/tcp,443/tcp,30443/tcp Kubernetes worker running. - flannel/2 active idle 35.178.130.245 Flannel subnet 10.1.82.1/24 -kubernetes-worker/1 active idle 7 35.178.121.29 80/tcp,443/tcp,30443/tcp Kubernetes worker running. - flannel/3 active idle 35.178.121.29 Flannel subnet 10.1.66.1/24 -kubernetes-worker/2 active idle 8 35.177.144.76 80/tcp,443/tcp,30443/tcp Kubernetes worker running. - flannel/1 active idle 35.177.144.76 - -# 注意上面输出中 kubernetes-worker 的 IP 地址,可以选一个用作设置。 -``` - - - -尝试使用 nodeport 搭配域名或 IP 地址在浏览器中打开 Rancher: - - - -``` - # 将 IP 地址替换为某个 Kubernetes 工作节点的公共地址,通过 juju status 命令进行查找。 - wget https://35.178.130.245.xip.io:30443 --no-check-certificate - - # 这条命令也应该能工作 - wget https://35.178.130.245:30443 --no-check-certificate -``` - - - -如果需要对 kubernetes 配置文件进行任何更改,编辑 yaml 文件,再重新 apply 即可: - -``` - kubectl apply -f cdk-rancher-nodeport.yaml -``` - - -### 使用 ingress 规则部署 Rancher - - - -也可以使用 ingress 规则来部署 Rancher。 -这还有另外一个好处,就是不需要在 Kubernetes 集群上打开额外的端口。 -首先创建一个名为 cdk-rancher-ingress.yaml 的文件,内容如下: - -``` ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRoleBinding -metadata: - name: cluster-admin -subjects: - - kind: ServiceAccount - name: default - namespace: default -roleRef: - kind: ClusterRole - name: cluster-admin - apiGroup: rbac.authorization.k8s.io ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - name: cluster-admin -rules: -- apiGroups: - - '*' - resources: - - '*' - verbs: - - '*' -- nonResourceURLs: - - '*' - verbs: - - '*' ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - creationTimestamp: null - labels: - app: rancher - name: rancher -spec: - replicas: 1 - selector: - matchLabels: - app: rancher - strategy: {} - template: - metadata: - creationTimestamp: null - labels: - app: rancher - spec: - containers: - - image: rancher/server:preview - imagePullPolicy: Always - name: rancher - ports: - - containerPort: 443 - livenessProbe: - httpGet: - path: / - port: 80 - initialDelaySeconds: 5 - timeoutSeconds: 30 - resources: {} - restartPolicy: Always - serviceAccountName: "" -status: {} ---- -apiVersion: v1 -kind: Service -metadata: - name: rancher - labels: - app: rancher -spec: - ports: - - port: 443 - targetPort: 443 - protocol: TCP - selector: - app: rancher ---- -apiVersion: extensions/v1beta1 -kind: Ingress -metadata: - name: rancher - annotations: - kubernetes.io/tls-acme: "true" - ingress.kubernetes.io/secure-backends: "true" -spec: - tls: - - hosts: - - rancher.34.244.118.135.xip.io - rules: - - host: rancher.34.244.118.135.xip.io - http: - paths: - - path: / - backend: - serviceName: rancher - servicePort: 443 -``` - - - -通常建议您为集群中的每个工作节点创建一条 DNS 记录。 -例如,如果有三个工作节点并且域名是 example.com,则可以创建三条 A 记录,集群中的每个工作节点各一条。 - - - -由于创建 DNS 记录超出了本文关注的范围, -我们将使用免费服务 xip.io 来获得和 IP 地址相对应的 A 记录,IP 地址将是域名的一部分。 -例如,如果有域名 rancher.35.178.130.245.xip.io, -则 xip.io 服务会自动将 IP 地址 35.178.130.245 作为 A 记录返回,这对测试相当有用。 - - - -至于您的部署,IP 地址 35.178.130.245 应该替换为集群工作节点的 IP 地址,这个 IP 地址可以通过 Juju 或 AWS 得到: - - - -``` - calvinh@ubuntu-ws:~/Source/cdk-rancher$ juju status - -# ... 输出省略。 - -Unit Workload Agent Machine Public address Ports Message -easyrsa/0* active idle 0 35.178.118.232 Certificate Authority connected. 
-etcd/0* active idle 1 35.178.49.31 2379/tcp Healthy with 3 known peers -etcd/1 active idle 2 35.177.99.171 2379/tcp Healthy with 3 known peers -etcd/2 active idle 3 35.178.125.161 2379/tcp Healthy with 3 known peers -kubeapi-load-balancer/0* active idle 4 35.178.37.87 443/tcp Loadbalancer ready. -kubernetes-master/0* active idle 5 35.177.239.237 6443/tcp Kubernetes master running. - flannel/0* active idle 35.177.239.237 Flannel subnet 10.1.27.1/24 -kubernetes-worker/0* active idle 6 35.178.130.245 80/tcp,443/tcp,30443/tcp Kubernetes worker running. - flannel/2 active idle 35.178.130.245 Flannel subnet 10.1.82.1/24 -kubernetes-worker/1 active idle 7 35.178.121.29 80/tcp,443/tcp,30443/tcp Kubernetes worker running. - flannel/3 active idle 35.178.121.29 Flannel subnet 10.1.66.1/24 -kubernetes-worker/2 active idle 8 35.177.144.76 80/tcp,443/tcp,30443/tcp Kubernetes worker running. - flannel/1 active idle 35.177.144.76 - -# 注意上面输出中 kubernetes-worker 的 IP 地址,可以选一个用作设置。 -``` - - - -查看上面 juju status 的命令输出,可以拿公共地址(35.178.130.245)来创建 xip.io DNS记录(rancher.35.178.130.245.xip.io),记录可以加到 cdk-rancher-ingress.yaml 文件中。 -你也同样可以创建自己的 DNS 记录,只要能解析到集群上的工作节点即可: - - - -``` - # xip.io 在文件中会出现两次,请都替换修改。 - cat cdk-rancher-ingress.yaml | grep xip.io - - host: rancher.35.178.130.245.xip.io -``` - - - -修改完 ingress 规则之后,可以运行 `kubectl apply -f cdk-rancher-ingress.yaml` 命令来更新 Kubernetes 集群: - -``` - kubectl apply -f cdk-rancher-ingress.yaml -``` - - - -现在可以通过工作节点 IP 或者 DNS 记录(如果已创建)在常规的 443 上访问 Rancher。 -尝试在浏览器中打开它: - - - -``` - # 将 IP 地址替换为某个 Kubernetes 工作节点的公共地址,通过 juju status 命令进行查找。 - wget https://35.178.130.245.xip.io:443 --no-check-certificate -``` - - - -如果需要对 kubernetes 配置文件进行任何更改,请编辑 yaml 文件,再 apply: - -``` - kubectl apply -f cdk-rancher-ingress.yaml -``` - - -### 删除 Rancher - - - -您可以使用 kubectl 从集群中删除 Rancher。 -在 Kubernetes 中删除对象与创建它们的过程一样简单: - - - -``` - # 使用 nodeport 示例(如果使用 ingress 示例,请修改文件名) - kubectl delete -f cdk-rancher-nodeport.yaml -``` - -{{% /capture %}} diff --git a/content/zh/docs/getting-started-guides/ubuntu/scaling.md b/content/zh/docs/getting-started-guides/ubuntu/scaling.md deleted file mode 100644 index a24c136eeb430..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/scaling.md +++ /dev/null @@ -1,147 +0,0 @@ ---- -title: 扩缩 -content_template: templates/task ---- - - - -{{% capture overview %}} - - -本文将讨论如何在集群中扩缩主节点和工作节点。 - -{{% /capture %}} - -{{% capture prerequisites %}} - - - -本文假设您已经有一个用 Juju 部署、正在运行的集群。 - - - -任何应用都可以在部署之后进行横向扩容。 -charms 将会不停地更新进度状态信息,建议运行如下命令。 - -``` -watch -c juju status --color -``` -{{% /capture %}} - -{{% capture steps %}} - - - -## Kubernetes 主节点 - - - -Kubernetes 主节点充当了集群中控制平面的角色。 -在设计上,这些主节点可以独立于工作节点进行扩缩容,从而带来运维上的灵活性。 -想要添加一个主节点,只需要执行以下命令: - - juju add-unit kubernetes-master - - - -这将会在控制平面中添加一个新的主节点。 -参见[构建高可用集群](/docs/admin/high-availability)文档,获取更多信息。 - - - -## Kubernetes 工作节点 - - - -kubernetes-worker 节点是 Kubernetes 集群中承担负载的部分。 - - - -默认情况下,pod 会自动均匀部署在 kubernetes-worker 节点上。 - - - -如果想要在集群中添加更多的 kubernetes-worker 节点,运行如下命令: - -``` -juju add-unit kubernetes-worker -``` - - - -或者修改机器限制,来创建更大的节点: - -``` -juju set-constraints kubernetes-worker "cpu-cores=8 mem=32G" -juju add-unit kubernetes-worker -``` - - - -参见[机器限制文档](https://jujucharms.com/docs/stable/charms-constraints), -了解其它机器约束,这些约束可能对 kubernetes-worker unit 有帮助。 - -## etcd - - - -Etcd 在 Kubernetes 集群中用作键值存储。 -集群默认使用一个存储实例。 - - - -由于仲裁机制的关系,推荐保有奇数个 etcd 节点。 -根据集群的大小,推荐使用3、5、7 或 9 个节点。 -CoreOS etcd 
文档有一个关于[最佳集群大小](https://coreos.com/etcd/docs/latest/admin_guide.html#optimal-cluster-size)的图表, -可以参考确定最佳的容错设计。 - - -添加 etcd 单元: - -``` -juju add-unit etcd -``` - - -不建议在扩容 etcd 集群之后对其缩容。 - - -## Juju 控制器 - - - -一个负责协调每台机器上 Juju 代理(这些代理管理 Kubernetes 集群)的节点被称为控制器节点。 -对于生产环境下的部署,建议启用控制器节点的高可用性: - - juju enable-ha - - - -启用 HA 将会创建 3 个控制器节点,对于大多数情况而言应该是足够的。 -而对于超大型的部署,也同时支持 5 或 7 个控制器节点。 - - - -参见 [Juju HA 控制器文档](https://jujucharms.com/docs/2.2/controllers-ha) 获取更多信息. - -{{% /capture %}} diff --git a/content/zh/docs/getting-started-guides/ubuntu/security.md b/content/zh/docs/getting-started-guides/ubuntu/security.md deleted file mode 100644 index 49839858b20d4..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/security.md +++ /dev/null @@ -1,68 +0,0 @@ ---- -title: 安全考虑 -content_template: templates/task ---- - - -{{% capture overview %}} - -默认情况下,所有提供的节点之间的所有连接(包括 etcd 集群)都通过 easyrsa 的 TLS 进行保护。 - -本文介绍已部署集群的安全注意事项和生产环境建议。 -{{% /capture %}} -{{% capture prerequisites %}} - -本文假定您拥有一个使用 Juju 部署的正在运行的集群。 -{{% /capture %}} - - -{{% capture steps %}} - -## 实现 - -TLS 和 easyrsa 的实现使用以下 [layers](https://jujucharms.com/docs/2.2/developer-layers)。 - -[layer-tls-client](https://github.com/juju-solutions/layer-tls-client) -[layer-easyrsa](https://github.com/juju-solutions/layer-easyrsa) - - -## 限制 ssh 访问 - -默认情况下,管理员可以 ssh 到集群中的任意已部署节点。您可以通过以下命令来批量禁用集群节点的 ssh 访问权限。 - - juju model-config proxy-ssh=true - -注意:Juju 控制器节点在您的云中仍然有开放的 ssh 访问权限,并且在这种情况下将被用作跳板机。 - -有关如何管理 ssh 密钥的说明,请参阅 Juju 文档中的 [模型管理](https://jujucharms.com/docs/2.2/models) 页面。 -{{% /capture %}} - - diff --git a/content/zh/docs/getting-started-guides/ubuntu/storage.md b/content/zh/docs/getting-started-guides/ubuntu/storage.md deleted file mode 100644 index 04aa262d9b3e1..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/storage.md +++ /dev/null @@ -1,138 +0,0 @@ ---- -title: 存储 -content_template: templates/task ---- - - - -{{% capture overview %}} - - -本文解释了如何在集群中安装和配置持久化存储。 - -{{% /capture %}} - -{{% capture prerequisites %}} - - -本文假设您已经有一个用 Juju 部署、正在运行的集群。 - -{{% /capture %}} - -{{% capture steps %}} - - -## Ceph 持久卷 - - - -Canonical 的 Kubernetes 发行版允许添加持久化存储设备,例如 [Ceph](http://ceph.com)。 -配合 [Juju Storage](https://jujucharms.com/docs/2.0/charms-storage)功能, -可以跨云平台,添加持久化存储。 - - - -部署一个至少有三个 ceph-mon 和三个 ceph-osd 单元的存储池。 - -``` -juju deploy cs:ceph-mon -n 3 -juju deploy cs:ceph-osd -n 3 -``` - - -关联这些单元: - -``` -juju add-relation ceph-mon ceph-osd -``` - - -列出云上 Juju 可用的存储池: - - juju storage-pools - - -输出: - -``` -Name Provider Attrs -ebs ebs -ebs-ssd ebs volume-type=ssd -loop loop -rootfs rootfs -tmpfs tmpfs -``` - -{{< note >}} - - - -注意列表使用的是 AWS,不同的云有不同的存储池名称。 - -{{< /note >}} - - - -以 “名字,大小,数量”的格式往 ceph-osd charm 中添加存储池: - -``` -juju add-storage ceph-osd/0 osd-devices=ebs,10G,1 -juju add-storage ceph-osd/1 osd-devices=ebs,10G,1 -juju add-storage ceph-osd/2 osd-devices=ebs,10G,1 -``` - - -接下来将 Kubernetes 和存储集群相关联: - -``` -juju add-relation kubernetes-master ceph-mon -``` - - - - -现在我们可以在 Kubernetes 中列举可用的[持久卷](/docs/concepts/storage/persistent-volumes/), -集群中的负载可以通过 PVC 申领来使用这些持久卷。 - -``` -juju run-action kubernetes-master/0 create-rbd-pv name=test size=50 -``` - - - -本例中创建了 50 MB 大小的 “test” Rados 块设备 (rbd)。 - -在 Kubernetes 集群上使用如下所示的 watch 命令,可以看到 PV 加入列表,并被标记可用的过程: - - watch kubectl get pv - - -输出: - -``` -NAME CAPACITY ACCESSMODES STATUS CLAIM REASON AGE - -test 50M RWO Available 10s -``` - - - -要使用这些持久卷,pods 需要关联一个持久卷申领,这超出了本文档的讨论范围。 -参见[持久卷](/docs/concepts/storage/persistent-volumes/)获取更多信息。 - -{{% 
/capture %}} diff --git a/content/zh/docs/getting-started-guides/ubuntu/troubleshooting.md b/content/zh/docs/getting-started-guides/ubuntu/troubleshooting.md deleted file mode 100644 index be76e9eb9182e..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/troubleshooting.md +++ /dev/null @@ -1,272 +0,0 @@ ---- -title: 故障排除 ---- - - - -{{% capture overview %}} - - - -本文重点讨论如何解决 Kubernetes 集群部署过程中的问题, -而不会关心如何调试 Kubernetes 集群内的工作负载。 - -{{% /capture %}} - -{{% capture prerequisites %}} - - - -本文假设您已经有一个用 Juju 部署、正在工作的集群。 - -{{% /capture %}} - -{{% capture steps %}} - - -## 了解集群状态 - - -使用 `juju status` 命令可以了解一些集群内的情况: - -``` -Model Controller Cloud/Region Version -kubes work-multi aws/us-east-2 2.0.2.1 - -App Version Status Scale Charm Store Rev OS Notes -easyrsa 3.0.1 active 1 easyrsa jujucharms 3 ubuntu -etcd 2.2.5 active 1 etcd jujucharms 17 ubuntu -flannel 0.6.1 active 2 flannel jujucharms 6 ubuntu -kubernetes-master 1.4.5 active 1 kubernetes-master jujucharms 8 ubuntu exposed -kubernetes-worker 1.4.5 active 1 kubernetes-worker jujucharms 11 ubuntu exposed - -Unit Workload Agent Machine Public address Ports Message -easyrsa/0* active idle 0/lxd/0 10.0.0.55 Certificate Authority connected. -etcd/0* active idle 0 52.15.47.228 2379/tcp Healthy with 1 known peers. -kubernetes-master/0* active idle 0 52.15.47.228 6443/tcp Kubernetes master services ready. - flannel/1 active idle 52.15.47.228 Flannel subnet 10.1.75.1/24 -kubernetes-worker/0* active idle 1 52.15.177.233 80/tcp,443/tcp Kubernetes worker running. - flannel/0* active idle 52.15.177.233 Flannel subnet 10.1.63.1/24 - -Machine State DNS Inst id Series AZ -0 started 52.15.47.228 i-0bb211a18be691473 xenial us-east-2a -0/lxd/0 started 10.0.0.55 juju-153b74-0-lxd-0 xenial -1 started 52.15.177.233 i-0502d7de733be31bb xenial us-east-2b -``` - - - -在这个例子中,我们可以获取一些信息。 `Workload` 列将显示给定服务的状态。 -`Message` 部分将显示集群中给定服务的健康状况。 在部署和维护期间, -这些工作负载状态将进行更新以反映给定节点正在执行的操作。例如, -Workload 可能显示为 `maintenance`,而 Message 则会相应显示为 `Installing docker`。 - - - -正常情况下,Workload 列应该为 `active`,Agent 列(用于反映 Juju 代理正在做什么)应该为 `idle`, -而 Message 要么是 `Ready` 或者其它描述性的术语。 -如果集群运行健康,`juju status --color` 返回的结果输出都将是绿色的。 - - - -对于大型集群而言,状态信息可能会太多,因此建议检查各个服务的状态,例如仅检查工作节点的状态: - - juju status kubernetes-worker - - -或者只检查 etcd 集群的状态: - - juju status etcd - -Errors will have an obvious message, and will return a red result when used with -`juju status --color`. Nodes that come up in this manner should be investigated. - -错误都会有明显的错误信息,使用 `juju status --color` 的返回结果也将是红色的。 -如果节点状态出现这种情况,需要相应地检查了解。 - - -## SSH 到各个单元上 - - - -按照 `juju ssh <服务名>/<单元#>` 的命令格式可以轻松地连接到各个单元上: - - juju ssh kubernetes-worker/3 - - -将会 ssh 到第 3 个工作单元上。 - - juju ssh easyrsa/0 - - -将会 ssh 到第 0 个 easyrsa 单元上。 - - -## 收集调试信息 - - - -有时候,从集群上收集所有的信息,并与开发人员共享,将有助于发现问题。 -这最好是通过 [CDK Field Agent](https://github.com/juju-solutions/cdk-field-agent) 来完成。 - - - -在带有 Juju 客户端,而客户端配有指向相应的 CDK 部署的控制器的节点上, -下载并执行[CDK Field Agent](https://github.com/juju-solutions/cdk-field-agent)中的 collect.py 文件。 - - - -运行该脚本会生成一个 tar 包,包含系统信息以及诸如 systemctl 状态,Juju 日志,charm 单元数据等基本信息。 -额外和应用相关的信息可能也会包含其中。 - - - -## 常见问题 - - - -### 负载均衡器对 Helm 的影响 - - - -本节假定有一个用 Juju 部署的正在运行的 Kubernetes 集群,使用负载均衡器来代理 API,同时也用 Helm 来进行 chart 部署。 - - -Helm 初始化: - -``` -helm init -$HELM_HOME has been configured at /home/ubuntu/.helm -Tiller (the helm server side component) has been installed into your Kubernetes Cluster. -Happy Helming! 
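-# 注:以上为 Helm v2 客户端的典型输出;Tiller 是 Helm v2 的服务端组件,
-# 部署在 Kubernetes 集群内部,具体输出内容会随 Helm 版本略有差异。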
-``` - - -随后使用 helm 时,可能会出现以下错误: - - -* Helm 不能从 Tiller 服务器获取版本号 - -``` -helm version -Client: &version.Version{SemVer:"v2.1.3", GitCommit:"5cbc48fb305ca4bf68c26eb8d2a7eb363227e973", GitTreeState:"clean"} -Error: cannot connect to Tiller -``` - - -* Helm 不能安装 chart - -``` -helm install --debug -Error: forwarding ports: error upgrading connection: Upgrade request required -``` - - - -这是因为 API 负载均衡器在 helm 客户端-服务端关系的上下文中不进行端口转发造成的。 -要使用 helm 进行部署,需要执行以下步骤: - - -1. 暴露 Kubernetes Master 服务 - - ``` - juju expose kubernetes-master - ``` - - -1. 确定其中一个主节点的公开 IP 地址 - - ``` - juju status kubernetes-master - Model Controller Cloud/Region Version - production k8s-admin aws/us-east-1 2.0.0 - - App Version Status Scale Charm Store Rev OS Notes - flannel 0.6.1 active 1 flannel jujucharms 7 ubuntu - kubernetes-master 1.5.1 active 1 kubernetes-master jujucharms 10 ubuntu exposed - - Unit Workload Agent Machine Public address Ports Message - kubernetes-master/0* active idle 5 54.210.100.102 6443/tcp Kubernetes master running. - flannel/0 active idle 54.210.100.102 Flannel subnet 10.1.50.1/24 - - Machine State DNS Inst id Series AZ - 5 started 54.210.100.102 i-002b7150639eb183b xenial us-east-1a - - Relation Provides Consumes Type - certificates easyrsa kubernetes-master regular - etcd etcd flannel regular - etcd etcd kubernetes-master regular - cni flannel kubernetes-master regular - loadbalancer kubeapi-load-balancer kubernetes-master regular - cni kubernetes-master flannel subordinate - cluster-dns kubernetes-master kubernetes-worker regular - cni kubernetes-worker flannel subordinate - ``` - - - - 本例中,公开 IP 地址为 54.210.100.102。 - 如果想编程访问得到这个值,可以使用 JSON 输出: - - ``` - juju show-status kubernetes-master --format json | jq --raw-output '.applications."kubernetes-master".units | keys[]' - 54.210.100.102 - ``` - - -1. 更新 kubeconfig 文件 - - - - 确定集群所使用的 kubeconfig 文件或配置部分,然后修改服务器配置。 - - 默认情况下,这个配置类似于 ```https://54.213.123.123:443```。将其替换为 Kubernetes Master 端点地址 - ```https://54.210.100.102:6443``` 并保存。 - - 注意,Kubernetes Master API 的 CDK 默认使用的端口为 6443,而负载均衡器暴露的端口是 443。 - - -1. 继续使用 helm! - - ``` - helm install --debug - Created tunnel using local port: '36749' - SERVER: "localhost:36749" - CHART PATH: /home/ubuntu/.helm/ - NAME: - ... - ... - ``` - - -## 日志和监控 - - - -默认情况下, Kubernetes 没有节点的日志聚合,每个节点都是本地保存日志。 -请参阅[日志](/docs/getting-started-guides/ubuntu/logging/)文档,获取更多信息。 - -{{% /capture %}} diff --git a/content/zh/docs/getting-started-guides/ubuntu/upgrades.md b/content/zh/docs/getting-started-guides/ubuntu/upgrades.md deleted file mode 100644 index 9df7fb310a639..0000000000000 --- a/content/zh/docs/getting-started-guides/ubuntu/upgrades.md +++ /dev/null @@ -1,273 +0,0 @@ ---- -title: 升级 -content_template: templates/task ---- - - - -{{% capture overview %}} - -本页将展示如何进行 Kubernetes 集群升级。 -{{% /capture %}} - -{{% capture prerequisites %}} - -本页假定你有一个 juju 部署的集群。 - -{{< warning >}} - - - -在进行升级之前,你应当备份所有的数据。 -不要忘记对集群内的工作负载进行数据备份! 
-参见[备份文档](/docs/getting-started-guides/ubuntu/backups)。 - -{{< /warning >}} - -{{% /capture %}} - -{{% capture steps %}} - - -## 对集群进行补丁版本升级,例如,1.9.0 -> 1.9.1 - - - -集群透明地升级到最新的 Kubernetes 补丁版本。 -需要澄清的是,用 1.9/stable 通道部署的集群将会透明地、自动更新到 Kubernetes 1.9.X 最新版。 -升级的过程对集群的运行没有影响,也不需要集群维护人员的干预。 -每一个补丁版本都由 Canonical Kubernetes 发布小组审核评估。 -一旦补丁版本通过了内部测试,认为可以安全用于集群升级, -将会被打包成 snap 格式,发布到稳定版通道上。 - - -## 对集群进行次版本升级,例如,1.8.1 -> 1.9.0 - - - -Kubernetes charms 遵循的是 Kubernetes 发行版本。 -请咨询了解 support 计划在升级频率方面的相关信息。 -重要的运维考虑以及行为上的改变都会记录在发布通知里。 - - -### 升级 etcd - - - -备份 etcd 需要导出和快照操作,参见[备份文档](/docs/getting-started-guides/ubuntu/backups)了解如何创建快照。 -在做完快照后,用下面的命令升级 etcd 服务: - - juju upgrade-charm etcd - - - -命令将会负责 etcd 的次版本升级。 -在 [juju 解决方案的 wiki](https://github.com/juju-solutions/bundle-canonical-kubernetes/wiki/Etcd-2.3-to-3.x-upgrade) 里 -可以了解如何将 etcd 从 2.x 升级到 3.x。 - - -### 升级 kubeapi-load-balancer - - - -Kubernetes Charms 通常是同时更新、发布。 -Ubuntu 集群的核心部分是 kubeapi-load-balancer 组件。 -错误或遗失修改可能会导致 API 可用性和访问控制方面的问题。 -为了保证 API 服务在集群升级期间还能为主节点和工作节点服务,也需要对它们进行升级。 - - - -升级命令: - - juju upgrade-charm kubeapi-load-balancer - - -### 升级 Kubernetes - - - -Kubernetes Charms 使用 snap 通道来驱动负荷。 -这些通道定义的格式为 `X.Y/channel`,其中,`X.Y` 是 Kubernetes `主.次` 发行版(例如,1.9) -而 `channel` 的取值范围如下: - - - -| 通道名 | 描述 | -| ------------------- | ------------ | -| stable | Kubernetes 的最新稳定发行版 | -| candidate | Kubernetes 的发行候选版 | -| beta | Kubernetes 次发行版的最新 alpha 或 beta 版 | -| edge | Kubernetes 次发行版的每日构建版 | - - - - -如果发行版还不可用,就会使用下一个最高通道的版本。 -例如,1.9/beta 会根据发行版的可用性加载 `/candidate` 或 `/stable` 版本。 -Kubernetes 的开发版本会根据每个次版本,发布到 edge 通道上。 -但不会保证 edge snap 能够和当前的 charms 一起工作。 - - -### 主节点升级 - - - -首先需要对主节点进行升级: - - juju upgrade-charm kubernetes-master - -{{< note >}} - - - -永远在工作节点升级之前,升级主节点。 - -{{< /note >}} - - - -在部署完最新的 charm 之后,可以通过下面的命令行来选择通道: - - juju config kubernetes-master channel=1.x/stable - - - -其中,`x` 是 Kubernetes 的次版本号。例如,`1.9/stable`。 -参阅前面对通道的定义。 -将 kubernetes-master 配置到合适的通道上后, -再在每个主节点上运行下面的升级命令: - - juju run-action kubernetes-master/0 upgrade - juju run-action kubernetes-master/1 upgrade - ... - - -### 工作节点升级 - - - -现有所支持的升级工作节点的方法有两种,[蓝/绿部署](http://martinfowler.com/bliki/BlueGreenDeployment.html) -和就地升级。提供两种方法可以带来运维上的灵活性,而这两种方法也都被支持和测试。 -相比于就地升级,蓝/绿部署需要更多的硬件资源,但也更为安全可靠。 - - -#### 蓝/绿工作节点升级 - - - -假定一个部署里面所有的工作节点都叫 kubernetes-alpha。 - - - -部署新的工作节点: - - juju deploy kubernetes-alpha - - - -暂停旧的工作节点,然后迁移工作负载: - - juju run-action kubernetes-alpha/# pause - - - -验证所迁移的工作负载: - - kubectl get pod -o wide - - - -销毁就有的工作节点: - - juju remove-application kubernetes-alpha - - -#### 就地工作节点升级 - - juju upgrade-charm kubernetes-worker - juju config kubernetes-worker channel=1.x/stable - - - -其中,`x` 是 Kubernetes 的次版本号。例如,`1.9/stable`。 -参阅前面对通道的定义。将 kubernetes-worker 配置到合适的通道上后, -再在每个工作节点上运行下面的升级命令: - - juju run-action kubernetes-worker/0 upgrade - juju run-action kubernetes-worker/1 upgrade - ... 
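-
-
-如果工作节点较多,也可以用一个简单的 shell 循环依次触发各单元的升级
-(仅为示意写法,这里假设有 3 个 kubernetes-worker 单元,
-编号请按 `juju status` 的实际输出调整):
-
-    for i in 0 1 2; do
-        juju run-action kubernetes-worker/$i upgrade
-    done
-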
- - -### 验证升级 - - - -`kubectl version` 将会返回新的版本号。 - - - -建议重新运行[集群验证](/docs/getting-started-guides/ubuntu/validation)确认集群升级成功完成。 - - -### 升级 Flannel - - - -可以在任何时候升级 flannel,它的升级可以和 Kubernetes 升级分开进行。 -需要注意的是,在升级过程中,网络会受到影响。 -可以通过下面的命令行发起升级: - - juju upgrade-charm flannel - - -### 升级 easyrsa - - - -可以在任何时候升级 easyrsa,它的升级可以和 Kubernetes 升级分开进行。 -升级 easyrsa 会有停机时间,因为不是运行服务: - - juju upgrade-charm easyrsa - -{{% /capture %}} diff --git a/content/zh/docs/getting-started-guides/windows/OVN_OVS_Windows_Installer.png b/content/zh/docs/getting-started-guides/windows/OVN_OVS_Windows_Installer.png deleted file mode 100644 index 520f6ae9e6c54..0000000000000 Binary files a/content/zh/docs/getting-started-guides/windows/OVN_OVS_Windows_Installer.png and /dev/null differ diff --git a/content/zh/docs/getting-started-guides/windows/UpstreamRouting.png b/content/zh/docs/getting-started-guides/windows/UpstreamRouting.png deleted file mode 100644 index 91189c36af3ba..0000000000000 Binary files a/content/zh/docs/getting-started-guides/windows/UpstreamRouting.png and /dev/null differ diff --git a/content/zh/docs/getting-started-guides/windows/_index.md b/content/zh/docs/getting-started-guides/windows/_index.md deleted file mode 100644 index 343feaafcb3e2..0000000000000 --- a/content/zh/docs/getting-started-guides/windows/_index.md +++ /dev/null @@ -1,892 +0,0 @@ ---- -title: 在 Kubernetes 中使用 Windows Server 容器 -toc_hide: true ---- - - - -{{< note >}} -**Note:** 这些说明最近基于 Windows Server 平台增强和 Kubernetes v1.9 版本进行了更新 -{{< /note >}} - - - -Kubernetes 1.5 版本基于 Windows Server 2016 操作系统引入了对 Windows Server 容器 -的 Alpha 支持。随着 Windows Server 版本 1709 的发布和使用 Kubernetes v1.9,用户可以使用许多不同的 -网络拓扑和 CNI 插件在本地或私有/公共云中部署 Kubernetes 集群。Kubernetes 上的 Windows Server 容器的一些 -关键功能改进包括: - - - -- 改进了对 pod 的支持!具有多个 Windows Server 容器(共享内核)的共享网络命名空间(隔离专区) - - - -- 通过每个 pod 使用单个网络端点降低网络复杂性 - - - -- 使用虚拟过滤平台 (VFP)Hyper-v 交换机扩展(类似于 Linux iptables) 的基于内核的负载均衡 - - - -- 容器运行时接口(CRI) pod 和 节点级统计 - - - -- 支持 kubeadm 命令将 Windows Server 节点添加到 Kubernetes 环境中 - - - -Kubernetes 控制平面(API服务器,调度程序,控制器管理器等)继续在 Linux 上运行,而 kubelet 和 kube-proxy 可以在 Windows Server 2016 或更高版本上运行 - - - -{{< note >}} -**Note:** Kubernetes 上的 Windows Server 容器是 Kubernetes v1.9 中的一个 Beta 特性 -{{< /note >}} - - - -## 获取 Windows 二进制文件 - - - -我们建议使用可以在 [https://github.com/kubernetes/kubernetes/releases/latest](https://github.com/kubernetes/kubernetes/releases/latest) 上找到的发布的二进制文件。在更新日志下您可以找到 Windows-amd64 的节点二进制文件链接,其中包括 kubeadm,kubectl,kubelet 和 kube-proxy。 - - - -如果您希望自己构建代码,请参阅[此处](https://docs.microsoft.com/en-us/virtualization/windowscontainers/kubernetes/compiling-kubernetes-binaries)的详细构建说明。 - - - -## 环境准备 - -在 Kubernetes 1.9 或更高版本中,使用以下内容支持 Kubernetes 的 Windows Server 容器: - - -1. Kubernetes 控制平面在现有的 Linux 基础架构(1.9版本或更高版本)上运行。 - -2. Linux 的节点上的 Kubenet 网络插件设置。 - -3. Windows Server 2016 RTM 或更高版本,Windows Server 版本 1709 或更高版本是首选; 它解锁了共享网络命名空间等关键功能。 - -4. 适用于 Windows Server 节点的 Docker 版本 17.06.1-ee-2 或更高版本(Linux 节点和 Kubernetes 控制平面可以运行任何 Kubernetes 支持的 Docker 版本)。 - - -## 网络 - - - -Windows 上有几种支持 Kubernetes v1.9 的网络配置,包括使用第三方网络插件的第三层路由和覆盖拓扑。 - - -1. [上游 L3 路由](#upstream-l3-routing-topology) - 在上游 ToR 中配置的 IP 路由 - -2. [主机网关](#host-gateway-topology) - 在每台主机上配置的 IP 路由 - -3. [使用覆盖式 Open vSwitch(OVS) 和开放虚拟网络(OVN)](#using-ovn-with-ovs) - 覆盖网络(支持STT和Geneve隧道类型) - -4. [未来 - 评审中] 覆盖 - 使用 Flannel 的 VXLAN 或者 IP-in-IP 封装 - -5. 
[未来] 使用 BGP(Calico) 的第三层路由 - - -选择要部署的网络配置和拓扑取决于物理网络拓扑和用户配置路由的能力,封装的性能问题以及与第三方网络插件集成的要求。 - - -### 未来的 CNI 插件 -另外两个 CNI 插件 [win-l2bridge(主机网关)和 win-overlay(vxlan)] 正在进行 PR 审核。这两个 CNI 插件准备好后,既可以直接使用,也可以与 Flannel 一起使用。 - - - -### Linux -Linux 上已经使用桥接接口支持上述网络方法,桥接接口基本上创建了节点本地的专用网络。与 Windows 端类似,必须创建到所有其他 pod CIDR 的路由才能通过"公共" NIC 发送数据包。 - - - -### Windows -Windows 支持 CNI 网络模型,并使用插件与 Windows 主机网络服务(HNS)连接以配置主机网络和策略。在撰写本文时,Microsoft 唯一公开提供的 CNI 插件是从私人存储库构建的,可在此处获得[wincni.exe](https://github.com/Microsoft/SDN/blob/master/Kubernetes/windows/cni/wincni.exe)。它使用由管理员在每个节点上使用 HNS PowerShell 命令通过 Windows 主机网络服务(HNS)创建的 l2bridge 网络,如下面的 [Windows 主机设置](#windows-host-setup)部分所述。未来CNI插件的源代码将公开发布 - - - -#### 上游 L3 路由拓扑 -在这种拓扑结构中,通过在机架 (ToR)交换机/路由器的上游顶部配置静态 IP 路由,使用L3路由实现网络连接。每个群集节点都通过主机 IP 连接到管理网络。此外,每个节点使用本地'l2bridge'网络,并分配了一个 pod CIDR。给定工作节点上的所有 pod 将连接到 pod CIDR 子网('l2bridge'网络)。为了在不同节点上运行的 pod 之间实现网络通信,上游路由器配置了静态路由 pod CIDR 前缀 => 主机 IP。 - - - -以下示例图说明了使用上游 L3 路由设置的 Kubernetes 的 Windows Server 网络设置: - - -![K8s 集群使用 ToR 的 L3 路由](UpstreamRouting.png) - - - -#### 主机网关拓扑 -这种拓扑与上游 L3 路由拓扑相似,惟一的区别是静态 IP 路由是直接在每个集群节点上配置的,而不是在上游 ToR 中配置的。每个节点使用本地的 'l2bridge' 网络,并像以前一样分配 pod CIDR,并为分配给远程集群节点的所有其他 pod CIDR 子网提供路由表条目。 - - - -#### OVN 和 OVS 一起使用 -下图概述了组件之间的体系结构和交互: - - - -![覆盖式使用 OVN 控制器和 OVS 开关扩展](ovn_kubernetes.png) - - - -(上图来自 [https://github.com/openvswitch/ovn-kubernetes#overlay-mode-architecture-diagram](https://github.com/openvswitch/ovn-kubernetes#overlay-mode-architecture-diagram)) - - - -由于它的体系结构,OVN 有一个中央组件,它将您的网络意图存储在数据库中。其他组件如 kube-apiserver、kube-controller-manager、kube-scheduler 等也可以部署在该中心节点上。 - - -## 在 Kubernetes 上设置 Windows Server 容器 -要在 Kubernetes 上运行 Windows Server 容器,您需要为 Windows 设置主机和 Kubernetes 节点组件。根据您的网络拓扑,可能需要为不同节点上的 pod 通信设置路由。 - - - -### 主机设置 - - - -#### 1. 上游 L3 路由拓扑和 2. 主机网关拓扑 - - - -##### Linux 主机设置 - - - -1. Linux 主机应该根据它们各自的发行版文档和您将使用的 Kubernetes 版本的要求进行设置。 - -2. 使用步骤[此处](https://github.com/MicrosoftDocs/Virtualization-Documentation/blob/live/virtualization/windowscontainers/kubernetes/creating-a-linux-master.md)配置Linux主节点 - -3. [可选]安装CNI网络插件。 - - -##### Windows 主机设置 - - - -1. 运行所需 Windows Server 和 Docker 版本的 Windows Server 容器主机。请按照此帮助主题概述的安装说明进行操作:https://docs.microsoft.com/en-us/virtualization/windowscontainers/quick-start/quick-start-windows-server。 - -2. 2. [获取 Windows 二进制文件](#get-windows-binaries) kubelet.exe, kube-proxy.exe, and kubectl.exe 使用说明 - -3. 使用 X.509 密钥从 Linux 主节点复制节点规范文件(kube config) - -4. 创建 HNS 网络,确保正确的 CNI 网络配置,并使用此脚本 [start-kubelet.ps1](https://github.com/Microsoft/SDN/blob/master/Kubernetes/windows/start-kubelet.ps1) 启动 kubelet.exe - -5. 使用此脚本启动 [start-kubeproxy.ps1](https://github.com/Microsoft/SDN/blob/master/Kubernetes/windows/start-kubeproxy.ps1) 启动 kube-proxy - -6. 
-
-More detailed instructions can be found [here](https://github.com/MicrosoftDocs/Virtualization-Documentation/blob/live/virtualization/windowscontainers/kubernetes/getting-started-kubernetes-windows.md).
-
-**Windows CNI config example**
-
-This example config file for the wincni.exe-based Windows CNI plugin is based on the ToR example diagram shown above, and specifies the configuration applied to Windows node-1. Of special interest are the Windows node-1 pod CIDR (10.10.187.64/26) and the associated gateway of cbr0 (10.10.187.66). The exception list specifies the service CIDR (11.0.0.0/8), the cluster CIDR (10.10.0.0/16), and the management (or host) CIDR (10.127.132.128/25).
-
-Note: this file assumes that a user previously created the 'l2bridge' host network on each Windows node using the `New-HNSNetwork` cmdlet, as shown in the start-kubelet.ps1 and start-kubeproxy.ps1 scripts linked above.
-
-```json
-{
-    "cniVersion": "0.2.0",
-    "name": "l2bridge",
-    "type": "wincni.exe",
-    "master": "Ethernet",
-    "ipam": {
-        "environment": "azure",
-        "subnet": "10.10.187.64/26",
-        "routes": [{
-            "GW": "10.10.187.66"
-        }]
-    },
-    "dns": {
-        "Nameservers": [
-            "11.0.0.10"
-        ]
-    },
-    "AdditionalArgs": [{
-            "Name": "EndpointPolicy",
-            "Value": {
-                "Type": "OutBoundNAT",
-                "ExceptionList": [
-                    "11.0.0.0/8",
-                    "10.10.0.0/16",
-                    "10.127.132.128/25"
-                ]
-            }
-        },
-        {
-            "Name": "EndpointPolicy",
-            "Value": {
-                "Type": "ROUTE",
-                "DestinationPrefix": "11.0.0.0/8",
-                "NeedEncap": true
-            }
-        },
-        {
-            "Name": "EndpointPolicy",
-            "Value": {
-                "Type": "ROUTE",
-                "DestinationPrefix": "10.127.132.213/32",
-                "NeedEncap": true
-            }
-        }
-    ]
-}
-```
-
-#### For 3. Open vSwitch (OVS) and Open Virtual Network (OVN) with overlay
-
-{{< note >}}
-**Note:** A fully automated setup via Ansible playbooks is [available](https://github.com/openvswitch/ovn-kubernetes/tree/master/contrib).
-{{< /note >}}
-
-For a manual setup, continue with the following steps.
-
-##### Linux host setup
-
-Setting up the central node and the components needed is out of scope of this document. You can read [these instructions](https://github.com/openvswitch/ovn-kubernetes#k8s-master-node-initialization).
-
-Adding Linux minions is also out of scope; you can read about it here: [Linux minion](https://github.com/openvswitch/ovn-kubernetes#k8s-minion-node-initializations).
-
-##### Windows host setup
-
-Adding a Windows minion requires you to install the OVS and OVN binaries. Set up a Windows Server container host running the required Windows Server and Docker versions. Follow the setup instructions outlined in [this help topic](https://docs.microsoft.com/en-us/virtualization/windowscontainers/quick-start/quick-start-windows-server). This type of deployment is supported starting with Windows Server 2016 RTM.
-
-Compiling OVS and generating the installer is not covered in this document; please visit [this link](http://docs.openvswitch.org/en/latest/intro/install/windows/#open-vswitch-on-windows). For a pre-built certified installer, please visit [this link](https://cloudbase.it/openvswitch/#download) and download the latest version.
-
-The following guide uses the pre-built certified installer.
-
-Installing OVS can be done either via the GUI dialog or unattended. Adding a Windows host to your setup requires you to have the `OVN Host` feature together with the default installation features. Below is the dialog image showing what needs to be installed:
-
-![Windows OVN OVS installer](OVN_OVS_Windows_Installer.png)
-
-For an unattended installation, please use the following command:
-
-```
-cmd /c 'msiexec /i openvswitch.msi ADDLOCAL="OpenvSwitchCLI,OpenvSwitchDriver,OVNHost" /qn'
-```
-
-The installer sets up new environment variables. Please open a new shell, or log off and log on again, to ensure that the environment variables are refreshed.
-
-For the overlay, OVS on Windows requires a transparent Docker network to function properly. Please use the following command to create the transparent Docker network which will be used by OVS. From PowerShell:
-
-```
-docker network create -d transparent --gateway $GATEWAY_IP --subnet $SUBNET `
-    -o com.docker.network.windowsshim.interface="$INTERFACE_ALIAS" external
-```
-
-Where $SUBNET is the minion subnet used to spawn pods (the subnet which will be used by Kubernetes), $GATEWAY_IP is the first IP of $SUBNET, and $INTERFACE_ALIAS is the interface used for creating the overlay tunnels (it must have connectivity with the rest of the OVN hosts).
-Example:
-
-```
-docker network create -d transparent --gateway 10.0.1.1 --subnet 10.0.1.0/24 `
-    -o com.docker.network.windowsshim.interface="Ethernet0" external
-```
-
-After creating the Docker network, please run the commands below from PowerShell. (They create the OVS bridge, add the interface under the bridge, and enable the OVS forwarding switch extension.)
-
-```
-$a = Get-NetAdapter | where Name -Match HNSTransparent
-Rename-NetAdapter $a[0].Name -NewName HNSTransparent
-Stop-Service ovs-vswitchd -force; Disable-VMSwitchExtension "Cloudbase Open vSwitch Extension";
-ovs-vsctl --no-wait del-br br-ex
-ovs-vsctl --no-wait --may-exist add-br br-ex
-ovs-vsctl --no-wait add-port br-ex HNSTransparent -- set interface HNSTransparent type=internal
-ovs-vsctl --no-wait add-port br-ex $INTERFACE_ALIAS
-Enable-VMSwitchExtension "Cloudbase Open vSwitch Extension"; sleep 2; Restart-Service ovs-vswitchd
-```
-
-Apart from this, the Windows host setup is the same as the Linux host setup. Follow the steps from [here](https://github.com/openvswitch/ovn-kubernetes#k8s-minion-node-initializations).
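-
-Before continuing, it can be worth confirming that the bridge came up as intended. A quick check, assuming the `ovs-vsctl` CLI is on the PATH after installation:
-
-```
-# Should list bridge br-ex with ports HNSTransparent and your tunnel interface.
-ovs-vsctl show
-```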
-
-**Windows CNI setup**
-
-The Windows OVN and OVS CNI plugin is now based on ovn_cni.exe, which can be downloaded from [here](https://cloudbase.it/downloads/ovn_cni.exe). A sample CNI config file is the following:
-
-```
-{
-    "name": "net",
-    "type": "ovn_cni.exe",
-    "bridge": "br-int",
-    "isGateway": "true",
-    "ipMasq": "false",
-    "ipam": {
-         "type": "host-local",
-         "subnet": "$SUBNET"
-    }
-}
-```
-
-Where $SUBNET is the subnet that was used in the previous ```docker network create``` command.
-
-For a complete guide on Google Cloud Platform (GCP), namely Google Compute Engine (GCE), visit [this](https://github.com/apprenda/kubernetes-ovn-heterogeneous-cluster#heterogeneous-kubernetes-cluster-on-top-of-ovn).
-
-For a complete guide on Amazon Web Services (AWS), visit [this](https://github.com/justeat/kubernetes-windows-aws-ovs#kubernetes-on-windows-in-aws-using-ovn).
-
-## Starting the cluster
-To start your cluster, you need to start both the Linux-based Kubernetes control plane and the Windows Server-based Kubernetes node components (kubelet and kube-proxy). For OVS and OVN, only the kubelet is required.
-
-## Starting the Linux-based control plane
-Use your preferred method to start the Kubernetes cluster on Linux. Please note that the cluster CIDR might need to be updated.
-
-## Support for kubeadm join
-
-If your cluster has been created by [kubeadm](/docs/setup/independent/create-cluster-kubeadm/), and your networking is set up correctly using one of the methods listed above (networking is set up outside of kubeadm), you can use kubeadm to add Windows nodes to your cluster. At a high level, you first have to initialize the master with kubeadm (Linux), then set up the CNI-based networking (outside of kubeadm), and finally start joining Windows or Linux worker nodes to the cluster. For additional documentation and reference material, visit the kubeadm link above.
-
-The kubeadm binary can be found in the node binaries archive of a [Kubernetes release](https://github.com/kubernetes/kubernetes/releases). Adding a Windows node is no different from adding a Linux node:
-
-`kubeadm.exe join --token <token> <master-ip>:<master-port> --discovery-token-ca-cert-hash sha256:<hash>`
-
-See [joining your nodes](/docs/setup/independent/create-cluster-kubeadm/#joining-your-nodes) for more details.
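-
-For illustration, a filled-in join command might look like the following; the token, endpoint, and hash here are hypothetical placeholders, and on newer kubeadm releases `kubeadm token create --print-join-command` on the master prints the exact command for your cluster:
-
-```
-kubeadm.exe join --token abcdef.0123456789abcdef 10.127.132.2:6443 `
-    --discovery-token-ca-cert-hash sha256:<hash-from-your-master>
-```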
-
-## Supported features
-
-The examples listed below assume running Windows nodes on Windows Server 1709. If you are running Windows Server 2016, the examples need the image updated to specify `image: microsoft/windowsservercore:ltsc2016`. This is due to the requirement that container images match the host operating system version when using process isolation. Not specifying a tag implicitly uses the `:latest` tag, which can lead to surprising behaviors. Please consult [https://hub.docker.com/r/microsoft/windowsservercore/](https://hub.docker.com/r/microsoft/windowsservercore/) for additional information on Windows Server Core image tagging.
-
-### Scheduling pods on Windows
-Because your cluster has both Linux and Windows nodes, you must explicitly set the nodeSelector constraint to be able to schedule pods to Windows nodes. You must set nodeSelector with the label beta.kubernetes.io/os to the value windows; see the following example:
-
-{{< codenew file="windows/simple-pod.yaml" >}}
-
-{{< note >}}
-**Note:** This example assumes you are running on Windows Server 1709, and so uses the image tag to support that. If you are on a different version, you need to update the tag. For example, if on Windows Server 2016, update to use `"image": "microsoft/iis"`, which defaults to that OS version.
-{{< /note >}}
-
-### Secrets and ConfigMaps
-Secrets and ConfigMaps can be used in Windows Server Containers, but must be consumed as environment variables. See the limitations section below for more details.
-
-**Examples:**
-
-Windows pod with secrets mapped to environment variables
-
-{{< codenew file="windows/secret-pod.yaml" >}}
-
-Windows pod with configMap values mapped to environment variables
-
-{{< codenew file="windows/configmap-pod.yaml" >}}
-
-### Volumes
-Some supported volume mounts are local, emptyDir, and hostPath volumes. One thing to remember is that paths must either be escaped or use forward slashes, for example `mountPath: "C:\\etc\\foo"` or `mountPath: "C:/etc/foo"`.
-
-Persistent Volume Claims are supported for the supported volume types.
-
-**Examples:**
-
-Windows pod with a hostPath volume
-
-{{< codenew file="windows/hostpath-volume-pod.yaml" >}}
-
-Windows pod with multiple emptyDir volumes
-
-{{< codenew file="windows/emptydir-pod.yaml" >}}
-
-### DaemonSets
-
-DaemonSets are supported:
-
-{{< codenew file="windows/daemonset.yaml" >}}
-
-### Metrics
-
-Windows Stats use a hybrid model: pod and container level stats come from CRI (via dockershim), while node level stats come from the "winstats" package that exports cadvisor-like data structures using Windows-specific perf counters.
-
-### Container resources
-
-Container resources (CPU and memory) can be set for Windows containers as of v1.10.
-
-{{< codenew file="windows/deploy-resource.yaml" >}}
-
-### Hyper-V containers
-
-Hyper-V containers are supported as an experimental feature in v1.10. To create a Hyper-V container, the kubelet should be started with the feature gate `HyperVContainer=true`, and the pod should include the annotation `experimental.windows.kubernetes.io/isolation-type=hyperv`.
-
-{{< codenew file="windows/deploy-hyperv.yaml" >}}
-
-### Kubelet and kube-proxy can now run as Windows services
-
-Starting with Kubernetes v1.11, the kubelet and kube-proxy can run as Windows services.
-
-This means that you can now register them as Windows services via the `sc` command. More details about how to create Windows services with `sc` can be found [here](https://support.microsoft.com/en-us/help/251192/how-to-create-a-windows-service-by-using-sc-exe).
-
-**Examples:**
-
-Create the services (the angle-bracket placeholders stand for your own service name, binary path, and arguments):
-
-```
-PS > sc.exe create <service_name> binPath= "<path_to_binary> --service <other_args>"
-CMD > sc create <service_name> binPath= "<path_to_binary> --service <other_args>"
-```
-
-Please note that if the arguments contain spaces, they must be escaped. Example:
-
-```
-PS > sc.exe create kubelet binPath= "C:\kubelet.exe --service --hostname-override 'minion' <other_args>"
-CMD > sc create kubelet binPath= "C:\kubelet.exe --service --hostname-override 'minion' <other_args>"
-```
-
-Start the services:
-
-```
-PS > Start-Service kubelet; Start-Service kube-proxy
-CMD > net start kubelet && net start kube-proxy
-```
-
-Stop the services:
-
-```
-PS > Stop-Service kubelet (-Force); Stop-Service kube-proxy (-Force)
-CMD > net stop kubelet && net stop kube-proxy
-```
-
-Query the services:
-
-```
-PS > Get-Service kubelet; Get-Service kube-proxy;
-CMD > sc.exe queryex kubelet && sc qc kubelet && sc.exe queryex kube-proxy && sc.exe qc kube-proxy
-```
-
-## Known limitations for Windows Server Containers with v1.9
-
-Some of these limitations will be addressed by the community in future releases of Kubernetes:
-
-- Shared network namespaces (compartments) with multiple Windows Server containers (sharing the kernel) per pod are only supported on Windows Server 1709 or later
-
-- Using secrets and ConfigMaps as volume mounts is not supported
-
-- Mount propagation is not supported on Windows
-
-- The StatefulSet functionality for stateful applications is not supported
-
-- Horizontal Pod Autoscaling for Windows Server Container pods has not been verified to work end-to-end
-
-- Hyper-V isolated containers are not supported.
-
-- The Windows container OS must match the host OS. If it does not, the pod gets stuck in a crash loop.
-
-- Under the L3 or Host-GW networking models, Kubernetes Services are inaccessible to Windows nodes due to a Windows issue
-
-- Windows kubelet.exe may fail to start when running on Windows Server under VMware Fusion [issue 57110](https://github.com/kubernetes/kubernetes/pull/57124)
-
-- Flannel and Weave Net are not yet supported
-
-- Some .NET Core applications expect environment variables with a colon (`:`) in the name. Kubernetes currently does not allow this. Replace the colon (`:`) with a double underscore (`__`) as documented [here](https://docs.microsoft.com/en-us/aspnet/core/fundamentals/configuration/?tabs=basicconfiguration#configuration-by-environment); a sketch of this workaround follows this list
-
-- As cgroups are not supported on Windows, kubelet.exe should be started with the additional flags `--cgroups-per-qos=false --enforce-node-allocatable=""` [issue 61716](https://github.com/kubernetes/kubernetes/issues/61716)
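-
-To make the environment-variable workaround above concrete, a .NET Core setting such as `ConnectionStrings:DefaultConnection` would be exposed in the pod spec as follows; the name and value here are hypothetical:
-
-```yaml
-env:
-- name: ConnectionStrings__DefaultConnection   # read back as ConnectionStrings:DefaultConnection by .NET Core
-  value: "Server=db;Database=app;"
-```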
"cniVersion": "0.2.0", - "name": "l2bridge", - "type": "wincni.exe", - "master": "Ethernet", - "ipam": { - "environment": "azure", - "subnet": "10.10.187.64/26", - "routes": [ - { - "GW": "10.10.187.66" - } - ] - }, - "dns": { - "Nameservers": [ - "11.0.0.10" - ] - }, - "AdditionalArgs": [ - { - "Name": "EndpointPolicy", - "Value": { - "Type": "OutBoundNAT", - "ExceptionList": [ - "11.0.0.0/8", - "10.10.0.0/16", - "10.127.132.128/25" - ] - } - }, - { - "Name": "EndpointPolicy", - "Value": { - "Type": "ROUTE", - "DestinationPrefix": "11.0.0.0/8", - "NeedEncap": true - } - }, - { - "Name": "EndpointPolicy", - "Value": { - "Type": "ROUTE", - "DestinationPrefix": "10.127.132.213/32", - "NeedEncap": true - } - } - ] -} diff --git a/content/zh/docs/getting-started-guides/windows/windows-setup.png b/content/zh/docs/getting-started-guides/windows/windows-setup.png deleted file mode 100644 index e11c58d596e35..0000000000000 Binary files a/content/zh/docs/getting-started-guides/windows/windows-setup.png and /dev/null differ diff --git a/content/zh/docs/reference/federation/_index.html b/content/zh/docs/reference/federation/_index.html deleted file mode 100644 index 73b1875340882..0000000000000 --- a/content/zh/docs/reference/federation/_index.html +++ /dev/null @@ -1,11 +0,0 @@ ---- -title: "联邦 API" -weight: 40 ---- - - diff --git a/content/zh/docs/reference/glossary/rkt.md b/content/zh/docs/reference/glossary/rkt.md deleted file mode 100644 index b3423f32ff90b..0000000000000 --- a/content/zh/docs/reference/glossary/rkt.md +++ /dev/null @@ -1,26 +0,0 @@ ---- -title: rkt -id: rkt -date: 2019-01-24 -full_link: https://coreos.com/rkt/ -short_description: > - A security-minded, standards-based container engine. - -aka: -tags: -- security -- tool ---- - - - -一个安全的,基于标准的容器引擎。 - - - - -rkt 是一个应用程序 {{}} 引擎,其中包含{{}} -原生方法,可插拔式执行环境, 和定义良好的展示模块。 rkt 允许用户在 Pod 和应用程序级别应用不同的配置。每个 Pod 都直接在经典的 Unix 进程模型中,在一个独立的隔离环境中执行。 diff --git a/content/zh/docs/reference/kubectl/kubectl-overview.md b/content/zh/docs/reference/kubectl/kubectl-overview.md deleted file mode 100644 index d6fd615353e55..0000000000000 --- a/content/zh/docs/reference/kubectl/kubectl-overview.md +++ /dev/null @@ -1,288 +0,0 @@ ---- -approvers: -- hw-qiaolei -title: kubectl概述 ---- -kubectl是用于针对Kubernetes集群运行命令的命令行接口。本概述涵盖`kubectl`语法,描述命令操作,并提供常见的示例。有关每个命令的详细信息,包括所有支持的flags和子命令,请参考[kubectl](/docs/user-guide/kubectl)相关文档。有关安装说明,请参阅[安装kubectl](/docs/tasks/kubectl/install/)。 - -## 语法 -从您的终端窗口使用以下语法运行`kubectl`命令: - -```shell -kubectl [command] [TYPE] [NAME] [flags] -``` - -其中command,TYPE,NAME,和flags分别是: - -* `command`: 指定要在一个或多个资源进行操作,例如`create`,`get`,`describe`,`delete`。 - -* `TYPE`:指定[资源类型](#资源类型)。资源类型区分大小写,您可以指定单数,复数或缩写形式。例如,以下命令产生相同的输出: - - $ kubectl get pod pod1 - $ kubectl get pods pod1 - $ kubectl get po pod1 - -`NAME`:指定资源的名称。名称区分大小写。如果省略名称,则会显示所有资源的详细信息,比如`$ kubectl get pods`。 - - 在多个资源上执行操作时,可以按类型和名称指定每个资源,或指定一个或多个文件: - - * 按类型和名称指定资源: - - * 要分组资源,如果它们都是相同的类型:`TYPE1 name1 name2 name<#>`.
-
-   When performing an operation on multiple resources, you can specify each resource by type and name, or specify one or more files:
-
-   * To specify resources by type and name:
-
-     * To group resources if they are all the same type: `TYPE1 name1 name2 name<#>`.
-       Example: `$ kubectl get pod example-pod1 example-pod2`
-
-     * To specify multiple resource types individually: `TYPE1/name1 TYPE1/name2 TYPE2/name3 TYPE<#>/name<#>`.
-       Example: `$ kubectl get pod/example-pod1 replicationcontroller/example-rc1`
-
-   * To specify resources with one or more files: `-f file1 -f file2 -f file<#>`. [Use YAML rather than JSON](/docs/concepts/configuration/overview/#general-configuration-tips), since YAML tends to be more user-friendly, especially for configuration files.
-     Example: `$ kubectl get pod -f ./pod.yaml`
-
-* `flags`: Specifies optional flags. For example, you can use the `-s` or `--server` flags to specify the address and port of the Kubernetes API server.
-**Important:** Flags that you specify from the command line override default values and any corresponding environment variables.
-
-If you need help, just run `kubectl help` from the terminal window.
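-
-Putting the syntax together, a minimal sketch (resource names hypothetical):
-
-```shell
-# TYPE=pods, NAME omitted, plus a flag: list all pods in one namespace.
-$ kubectl get pods --namespace=kube-system
-
-# The TYPE/NAME form combined with an output flag.
-$ kubectl get pod/example-pod1 -o wide
-```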
-
-## Operations
-
-The following table includes short descriptions and the general syntax for all of the kubectl operations:
-
-Operation | Syntax | Description
--------------------- | -------------------- | --------------------
-`annotate` | `kubectl annotate (-f FILENAME | TYPE NAME | TYPE/NAME) KEY_1=VAL_1 ... KEY_N=VAL_N [--overwrite] [--all] [--resource-version=version] [flags]` | Add or update the annotations of one or more resources.
-`api-versions` | `kubectl api-versions [flags]` | List the API versions that are available.
-`apply` | `kubectl apply -f FILENAME [flags]`| Apply a configuration change to a resource from a file or stdin.
-`attach` | `kubectl attach POD -c CONTAINER [-i] [-t] [flags]` | Attach to a running container to view the output stream or interact with the container (stdin).
-`autoscale` | `kubectl autoscale (-f FILENAME | TYPE NAME | TYPE/NAME) [--min=MINPODS] --max=MAXPODS [--cpu-percent=CPU] [flags]` | Automatically scale the set of pods that are managed by a replication controller.
-`cluster-info` | `kubectl cluster-info [flags]` | Display endpoint information about the master node and services in the cluster.
-`config` | `kubectl config SUBCOMMAND [flags]` | Modify kubeconfig files. See the individual subcommands for details.
-`create` | `kubectl create -f FILENAME [flags]` | Create one or more resources from a file or stdin.
-`delete` | `kubectl delete (-f FILENAME | TYPE [NAME | /NAME | -l label | --all]) [flags]` | Delete resources from a file, from stdin, or by specifying label selectors, names, or resource selectors.
-`describe` | `kubectl describe (-f FILENAME | TYPE [NAME_PREFIX | /NAME | -l label]) [flags]` | Display the detailed state of one or more resources.
-`edit` | `kubectl edit (-f FILENAME | TYPE NAME | TYPE/NAME) [flags]` | Edit and update the definition of one or more resources on the server by using the default editor.
-`exec` | `kubectl exec POD [-c CONTAINER] [-i] [-t] [flags] [-- COMMAND [args...]]` | Execute a command against a container in a pod.
-`explain` | `kubectl explain [--include-extended-apis=true] [--recursive=false] [flags]` | Get documentation of various resources, for example pods, nodes, services.
-`expose` | `kubectl expose (-f FILENAME | TYPE NAME | TYPE/NAME) [--port=port] [--protocol=TCP|UDP] [--target-port=number-or-name] [--name=name] [--external-ip=external-ip-of-service] [--type=type] [flags]` | Expose a replication controller, service, or pod as a new Kubernetes service.
-`get` | `kubectl get (-f FILENAME | TYPE [NAME | /NAME | -l label]) [--watch] [--sort-by=FIELD] [[-o | --output]=OUTPUT_FORMAT] [flags]` | List one or more resources.
-`label` | `kubectl label (-f FILENAME | TYPE NAME | TYPE/NAME) KEY_1=VAL_1 ... KEY_N=VAL_N [--overwrite] [--all] [--resource-version=version] [flags]` | Add or update the labels of one or more resources.
-`logs` | `kubectl logs POD [-c CONTAINER] [--follow] [flags]` | Print the logs for a container in a pod.
-`patch` | `kubectl patch (-f FILENAME | TYPE NAME | TYPE/NAME) --patch PATCH [flags]` | Update one or more fields of a resource by using the strategic merge patch process.
-`port-forward` | `kubectl port-forward POD [LOCAL_PORT:]REMOTE_PORT [...[LOCAL_PORT_N:]REMOTE_PORT_N] [flags]` | Forward one or more local ports to a pod.
-`proxy` | `kubectl proxy [--port=PORT] [--www=static-dir] [--www-prefix=prefix] [--api-prefix=prefix] [flags]` | Run a proxy to the Kubernetes API server.
-`replace` | `kubectl replace -f FILENAME` | Replace a resource from a file or stdin.
-`rolling-update` | `kubectl rolling-update OLD_CONTROLLER_NAME ([NEW_CONTROLLER_NAME] --image=NEW_CONTAINER_IMAGE | -f NEW_CONTROLLER_SPEC) [flags]` | Perform a rolling update by gradually replacing the specified replication controller and its pods.
-`run` | `kubectl run NAME --image=image [--env="key=value"] [--port=port] [--replicas=replicas] [--dry-run=bool] [--overrides=inline-json] [flags]` | Run a specified image on the cluster.
-`scale` | `kubectl scale (-f FILENAME | TYPE NAME | TYPE/NAME) --replicas=COUNT [--resource-version=version] [--current-replicas=count] [flags]` | Update the number of replicas of the specified replication controller.
-`stop` | `kubectl stop` | Deprecated: see `kubectl delete` instead.
-`version` | `kubectl version [--client] [flags]` | Display the Kubernetes version running on the client and server.
-
-Remember: For more about command operations, see the [kubectl](/docs/user-guide/kubectl) reference documentation.
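-
-As a quick worked example tying several operations together (the manifest filename and resource names are hypothetical):
-
-```shell
-# Create the resources defined in a manifest, inspect them, then clean up.
-$ kubectl create -f ./my-manifest.yaml
-$ kubectl get pods
-$ kubectl describe pod example-pod1
-$ kubectl delete -f ./my-manifest.yaml
-```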
-
-## Resource types
-
-The following table includes a list of all the supported resource types and their abbreviated aliases:
-
-Resource type | Abbreviated alias
--------------------- | --------------------
-`apiservices` |
-`certificatesigningrequests` |`csr`
-`clusters` |
-`clusterrolebindings` |
-`clusterroles` |
-`componentstatuses` |`cs`
-`configmaps` |`cm`
-`controllerrevisions` |
-`cronjobs` |
-`customresourcedefinition` |`crd`, `crds`
-`daemonsets` |`ds`
-`deployments` |`deploy`
-`endpoints` |`ep`
-`events` |`ev`
-`horizontalpodautoscalers` |`hpa`
-`ingresses` |`ing`
-`jobs` |
-`limitranges` |`limits`
-`namespaces` |`ns`
-`networkpolicies` |`netpol`
-`nodes` |`no`
-`persistentvolumeclaims` |`pvc`
-`persistentvolumes` |`pv`
-`poddisruptionbudget` |`pdb`
-`podpreset` |
-`pods` |`po`
-`podsecuritypolicies` |`psp`
-`podtemplates` |
-`replicasets` |`rs`
-`replicationcontrollers` |`rc`
-`resourcequotas` |`quota`
-`rolebindings` |
-`roles` |
-`secrets` |
-`serviceaccounts` |`sa`
-`services` |`svc`
-`statefulsets` |
-`storageclasses` |
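-
-The aliases above can also be combined in a single request as a comma-separated list of types, for example (a minimal sketch):
-
-```shell
-# po, svc, and deploy expand to pods, services, and deployments.
-$ kubectl get po,svc,deploy
-```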
-
-## Output options
-Use the following sections for information about how you can format or sort the output of certain commands. For details about which commands support the various output options, see the [kubectl](/docs/user-guide/kubectl) reference documentation.
-
-### Formatting output
-
-The default output format for all kubectl commands is the human-readable plain-text format. To output details to your terminal window in a specific format, you can add either the `-o` or `--output` flag to a supported `kubectl` command.
-
-#### Syntax
-
-```shell
-kubectl [command] [TYPE] [NAME] -o=<output_format>
-```
-
-Depending on the kubectl operation, the following output formats are supported:
-
-Output format | Description
---------------| -----------
-`-o=custom-columns=<spec>` | Print a table using a comma-separated list of [custom columns](#custom-columns).
-`-o=custom-columns-file=<filename>` | Print a table using the [custom columns template](#custom-columns) in the `<filename>` file.
-`-o=json` | Output a JSON formatted API object.
-`-o=jsonpath=