diff --git a/website/config.toml b/website/config.toml index 96f1a9d9..d3779ff3 100644 --- a/website/config.toml +++ b/website/config.toml @@ -67,6 +67,13 @@ languageName = "한국어(Korean)" contentDir = "content/ko" weight = 3 +[languages.ja] +title = "CNCF TAG Appデリバリー" +description = "TAG Appデリバリーは、構築、実行、管理、運用を含む、クラウドネイティブアプリケーションのデリバリーに関連するプロジェクトや取り組みの支援を行います。" +languageName = "日本語(Japanese)" +contentDir = "content/ja" +weight = 4 + [markup] [markup.goldmark] [markup.goldmark.renderer] diff --git a/website/content/ja/_index.md b/website/content/ja/_index.md new file mode 100644 index 00000000..01ec23a7 --- /dev/null +++ b/website/content/ja/_index.md @@ -0,0 +1,49 @@ +--- +title: "CNCF TAG Appデリバリー" +list_pages: true +--- + +
+Tag App Delivery logo
+TAG Appデリバリーは、構築、パッケージング、実行、管理、運用を含む、クラウドネイティブアプリケーションのデリバリーに関連するプロジェクトや取り組みの支援を行います。
+ +TAG Appデリバリーは、クラウドネイティブ・アプリケーションの開発者やプラットフォームエンジニア、エンドユーザーからのフィードバックを収集し、それらをTAGの担当分野のプロジェクトに共有します。また、エンドユーザー向けのガイダンスと事例を提供します。 + +このTAGは、チャーターに関連するプロジェクトを支援しています。これには、[アプリケーションの定義とイメージのビルド](https://landscape.cncf.io/card-mode?category=application-definition-image-build&project=hosted)、[継続的インテグレーションとデリバリー](https://landscape.cncf.io/card-mode?category=continuous-integration-delivery&project=hosted)、[コンテナレジストリ](https://landscape.cncf.io/card-mode?category=container-registry&project=hosted)の領域における[CNCFランドスケープ](https://landscape.cncf.io/card-mode)内のプロジェクトが含まれます。 + +現在2つのワーキンググループ(WG)が活動しています。[プラットフォームWG](./wgs/platforms/)と[アーティファクトWG](./wgs/artifacts/)です。 + +## ミーティング + +毎月第1、第3水曜日の16:00 UTC ([居住地の時刻に変換](https://dateful.com/convert/utc?t=16)). + +ミーティングは[CNCFのカレンダー](https://www.cncf.io/calendar/)と[CNCFコミュニティカレンダー](https://community.cncf.io/tag-app-delivery/)に掲載されます。 + +* [アジェンダと議事録](https://docs.google.com/document/d/1OykvqvhSG4AxEdmDMXilrupsX2n1qCSJUWwTc3I7AOs/edit#) +* [Zoomミーティング](https://zoom.us/j/7276783015) (パスコード: 77777) +* [過去のミーティングの録画](https://www.youtube.com/playlist?list=PLj6h78yzYM2OHd1Ht3jiZuucWzvouAAci) + +## リーダー + +- [Alois Reitbauer](https://github.com/AloisReitbauer) (チェア) +- [Josh Gavant](https://github.com/joshgav) (チェア) +- [Thomas Schuetz](https://github.com/thschue) (チェア) +- [Alex Jones](https://github.com/alexsjones) (テックリード) +- [Lian Li](https://github.com/lianmakesthings) (テックリード) +- [Karena Angell](https://github.com/angellk) (テックリード) + +### その他の資料 + +- [TAGのチャーター](https://github.com/cncf/toc/blob/main/tags/app-delivery.md) +- Slackチャンネル: [#tag-app-delivery](https://cloud-native.slack.com/messages/CL3SL0CP5) + - [CNCF Slackにご自身を招待する](https://slack.cncf.io/) +- [メーリングリスト](https://lists.cncf.io/g/cncf-tag-app-delivery/topics) + +

+Man working on computer
diff --git a/website/content/ja/about/_index.md b/website/content/ja/about/_index.md new file mode 100644 index 00000000..4512aefa --- /dev/null +++ b/website/content/ja/about/_index.md @@ -0,0 +1,19 @@ +--- +title: About TAG App Delivery +linkTitle: About +toc_hide: true +description: Projects and initatives maintained by TAG App Delivery +--- + +## Working Groups + +The TAG establishes working groups (WGs) to accomplish specific projects and initiatives. + +| Working Group | Chairs | Meeting Time | +|---------------|-------------------|---------------------------------------| +| [Platforms](wg-platforms.md) | [Platforms chairs](wg-platforms/#chairs) | [Platforms meetings](wg-platforms/#meetings) | +| [GitOps](https://github.com/cncf/tag-app-delivery/tree/main/gitops-wg) | [gitops-wg/CHAIRS.md](./gitops-wg/CHAIRS.md) | [gitops-wg/README.md#meetings](./gitops-wg/README.md#meetings) | +| [Air Gapped](https://github.com/cncf/tag-app-delivery/tree/main/air-gapped-wg) | | Inactive | +| [Operator](https://github.com/cncf/tag-app-delivery/tree/main/operator-wg) | | Inactive | +|[Artifacts](https://github.com/cncf-tags/wg-artifacts#readme) | [Chairs](https://github.com/cncf-tags/wg-artifacts#chairs) | [Meetings](https://github.com/cncf-tags/wg-artifacts#communications) | + diff --git a/website/content/ja/about/wg-platforms.md b/website/content/ja/about/wg-platforms.md new file mode 100644 index 00000000..df3a0563 --- /dev/null +++ b/website/content/ja/about/wg-platforms.md @@ -0,0 +1,22 @@ +--- +title: WG Platforms +linkTitle: wg-platforms +description: Work to enable adoption of platforms for cloud-native computing. +--- + +* [Charter](https://github.com/cncf/tag-app-delivery/tree/main/platforms-wg/charter) +* Slack channel: [#wg-platforms](https://cloud-native.slack.com/archives/C020RHD43BP) + +## Chairs + +* [Josh Gavant](https://github.com/joshgav) +* [Roberth Strand](https://github.com/roberthstrand) + +## Meetings + +* Meeting schedule: 2nd and 4th Tuesday of each month at [1600 UTC](https://www.timeanddate.com/worldclock/converter.html?iso=20221213T160000&p1=1440) + * [2nd Tuesday event](https://calendar.google.com/calendar/u/0/r/week/2022/12/13?eid=MDAxZmVpMGE5aDc3a283dGd2Y2YwcnZuYTFfMjAyMjEyMTNUMTYwMDAwWiBsaW51eGZvdW5kYXRpb24ub3JnX281YXZqbHZ0MmNhZTlicTdhOTVlbWM0NzQwQGc) + * [4th Tuesday event](https://calendar.google.com/calendar/u/0/r/week/2022/12/27?eid=NGhyOHY1ZWVrbDliODY3bXU5ZnRtYWo0ZGdfMjAyMjEyMjdUMTYwMDAwWiBsaW51eGZvdW5kYXRpb24ub3JnX281YXZqbHZ0MmNhZTlicTdhOTVlbWM0NzQwQGc) + * [Full CNCF calendar](https://calendar.google.com/calendar/u/0/embed?src=linuxfoundation.org_o5avjlvt2cae9bq7a95emc4740@group.calendar.google.com) +* [Zoom](https://zoom.us/j/7276783015?pwd=R0RJMkRzQ1ZjcmE0WERGcTJTOEVyUT09) (Passcode: 77777) +* [Agendas and notes](https://docs.google.com/document/d/1_smeS9-j-SuHJi0VXjx4g9xiD2-tgqhnlwf5oSMDQgg) diff --git a/website/content/ja/blog/2023-10-kubecon-chicago.md b/website/content/ja/blog/2023-10-kubecon-chicago.md new file mode 100644 index 00000000..a3eb458d --- /dev/null +++ b/website/content/ja/blog/2023-10-kubecon-chicago.md @@ -0,0 +1,72 @@ +--- +title: TAG App Delivery at Kubecon Chicago +slug: tag-app-delivery-at-kubecon-chicago +date: 2023-10-09 12:00:00 +0000 +author: Josh Gavant +categories: +- Announcement +tags: +- Event +--- + +![Kubecon Chicago 2023](/images/kubecon-chicago-2023.jpg) + +At Kubecon Chicago TAG App Delivery will bring together maintainers and users of +projects that enable cloud-native application delivery to meet and learn from +each 
other. The TAG's goals as always are to a) enable application delivery projects +to learn from each other and from cloud application developers and b) make +application delivery faster and more efficient for end users. + +To this end the TAG will host +[a project meeting](https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/program/project-engagement/#in-person-project-working-session) +on Monday morning at [the Marriott Marquis](https://maps.app.goo.gl/6gczBxScup8Cn6tBA) on Level 4, room name "Probability"; and booth 41 in +[the project pavilion](https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/program/project-engagement/#project-pavilion) +at the conference center on Tuesday, Wednesday and Thursday. At these venues we'll be hosting talks and discussions about app +delivery topics; if you'd like to share an open source project, +a new idea, or just lead an open discussion please let us know by filling out +[this form](https://forms.gle/ZbNxrK5f72otckvj7). +The project meeting will also be [livestreamed to YouTube](https://www.youtube.com/watch?v=NZCmYRVziGY). + +Here's that info again: + +- Project meeting on Monday 11/6 from 8am-12pm at the [Marriot Marquis](https://maps.app.goo.gl/6gczBxScup8Cn6tBA), Level 4, room name Probability and [YouTube](https://www.youtube.com/watch?v=NZCmYRVziGY). +- Booth and meetup spot at project pavilion booth #P10 on the show floor each afternoon and evening +- Form for presentation and discussion proposals: + +The TAG will also host +[a panel discussion](https://kccncna2023.sched.com/event/eb75a050355eccf96c4f1d77a831f7d4) +on Tuesday at 3:25pm on the relevance of platforms and platform engineering for +efficient cloud-native computing. Please join us! + +Stop by our booth to chat about platforms, GitOps, artifacts and other app +delivery topics and learn more about the TAG. + +## Pre-day meetup - Monday morning + +The schedule for the project meeting on Monday morning will be as follows. The +meeting will be live streamed and recorded too. + +Time | Topic | Presenter +-------|-------|------ +08:00 | TAG General Review | [TAG Leads](https://tag-app-delivery.cncf.io/#leads) +09:00 | Sandbox submission review for [Radius](https://radapp.io/) | Jonathan Smith, Microsoft +10:00 | Learn about and discuss [CNOE](https://cnoe.io/) | Nima Kaviani, AWS +11:00 | Workload Specifications: Your Best Friends In Platform Engineering | Atulpriya Sharma, InfraCloud +\- | Automating the Deployment of Data Workloads to Kubernetes with ArgoCD, Argo Workflows, and Hera | Matt Menzenski, Payit +\- | Beyond the Bundle: Porter and CNAB | Sarah Christoff, Microsoft +\- | From Apps to Stacks: Delivering Reusable Analytic Stacks on Kubernetes | Robert Hodges, Altinity + +## Booth meetups - Tuesday, Wednesday, Thursday + +The schedule for talks at the booth follows: + +Date/Time | Topic +--------------|------- +Nov 7 @ 17:00 | Chat with participants from [Platforms Panel](https://kccncna2023.sched.com/event/eb75a050355eccf96c4f1d77a831f7d4) +Nov 8 @ 15:00 | Talks and discussions +\- | Speed up your API delivery with Microcks | Yacine Kheddache, Microcks +\- | How to Build a GitOps Internal Developer Platform on Kubernetes | Christina Andonov, AWS +\- | KubeConstellations: Platform Engineering Patterns | Ram Iyengar, Cloud Foundry +\- | AAA framework & DCD capability map for Platform Engineering | Vishal Biyani, InfraCloud + +See you in Chicago! 
diff --git a/website/content/ja/blog/_index.md b/website/content/ja/blog/_index.md new file mode 100644 index 00000000..5906f036 --- /dev/null +++ b/website/content/ja/blog/_index.md @@ -0,0 +1,6 @@ +--- +title: ブログ +menu: + main: + weight: 40 +--- \ No newline at end of file diff --git a/website/content/ja/blog/announce-platforms-paper.md b/website/content/ja/blog/announce-platforms-paper.md new file mode 100644 index 00000000..bb3c9d73 --- /dev/null +++ b/website/content/ja/blog/announce-platforms-paper.md @@ -0,0 +1,87 @@ +--- +title: "Announcing a Whitepaper on Platforms for Cloud-native Computing" +date: 2023-04-10 01:00:00 +0000 +author: Josh Gavant, Abby Bangser +categories: +- Announcement +tags: +- WG Platforms +--- + + + +CNCF’s Platforms working group (WG) is pleased to announce the first release of +a whitepaper to provide guidance and clarity on the nature and benefits of +platforms for cloud-native computing. Download it now as a +[PDF](https://github.com/cncf/tag-app-delivery/raw/main/platforms-whitepaper/v1/assets/platforms-def-v1.0.pdf) +or view it on [our website](https://tag-app-delivery.cncf.io/whitepapers/platforms). + +Thank you to our many contributors listed below for their ongoing input and +insights! + +We prepared this paper because we've learned that platforms enable organizations +to fully realize the promises of cloud computing. Platforms accelerate +application and service delivery by enabling rapid integration of infrastructure +and application components. They are a step in the ongoing evolution of +enterprise IT, providing core capabilities consistently to enable DevOps-style +efficiency and autonomy across an organization. + +The objective of this paper is to educate and advise organizational leaders and +would-be platform builders by describing the values internal platforms offer, +the problems they solve, methods to track their success and attributes and +capabilities they require. It presents how today’s CNCF projects fit together as +the foundation of complete platform initiatives. Finally, it provides guidance +on how to enable platform teams to succeed, how to measure their progress, and +some challenges to prepare them for. + +WG Platforms and TAG App Delivery are building on this foundation to provide +more guidance and reduce complexity for cloud application builders and CNCF +project maintainers. Join us via the links below as we expand guidance on +practices like integrating a product mindset in platform teams and applying +standard governance policies; and as we pursue conventions for capabilities like +secrets management, artifact storage, web portals and API frameworks, all +potential parts of a complete platform: + + + +Last but not least, this version of this paper will not be the last! Please +inform future iterations by responding to our survey (to be shared soon!) and +sharing your platform stories with us in CNCF groups and meetups. Hope to talk +to you soon! + +## Thank you to our contributors! 
+ +As we reach this milestone we want to thank members of CNCF's WG Platforms for all +[their contributions](https://github.com/cncf/tag-app-delivery/commits/main/platforms-whitepaper) +and feedback, particularly the following: + +- Abby Bangser +- Abhinav Mishra +- Abi Noda +- Alex Chesser +- Brad Bazemore +- Chris Aniszczyk +- Colin Griffin +- Dash Copeland +- Gopal Ramachandran +- Henrik Blixt +- Johannes Kleinlercher +- Josh Gavant +- Justin Abrahms +- Lian Li +- Mark Fussell +- Mauricio Salatino +- Pascal Fenkam +- Raffaele Spazzoli +- Roberth Strand +- Saim Safdar +- Scott Nasello +- Taras Mankovski +- Thomas Vitale +- Viktor Nagy + +## Resources + +- Slack: +- Work item tracker: +- Mailing list: diff --git a/website/content/ja/blog/announcing-the-platform-engineering-maturity-model.md b/website/content/ja/blog/announcing-the-platform-engineering-maturity-model.md new file mode 100644 index 00000000..8408243b --- /dev/null +++ b/website/content/ja/blog/announcing-the-platform-engineering-maturity-model.md @@ -0,0 +1,88 @@ +--- +title: 'Announcing the Platform Engineering Maturity Model' +date: 2023-11-01 00:00:00 +0000 +author: Abby Bangser, Josh Gavant +categories: +- Announcement +tags: +- WG Platforms +--- + +The CNCF Platforms Working Group (WG) is excited to present the first release of a platform engineering maturity model which provides a more concrete application of the extremely well received white paper from this past April. + +Download now as a [PDF](https://github.com/cncf/tag-app-delivery/raw/main/platforms-maturity-model/v1/assets/platform-eng-maturity-model-v1.0.pdf) or view it on our [website](https://tag-app-delivery.cncf.io/whitepapers/platform-eng-maturity-model/). + +We want to thank the almost 50 people who have contributed their time and ideas to make this model reflect the state of companies both small and large and across both fast moving and highly regulated industries; all of them are listed below. + +The [platforms white paper](https://tag-app-delivery.cncf.io/whitepapers/platforms/) released in April 2023 was always aimed at providing an executive summary on the what and why behind platforms. This paper successfully defined a north star for many organizations but left questions about how they can progress. This new model presents platform engineering as the practice used to offer an internal platform as a product through investment in all parts of building platforms and their capabilities - the people, processes, policy, and technology which in turn drive business outcomes. + +
+ +While presented as a single model with 5 aspects and 4 levels of maturity, the accompanying paper extends well beyond the black and white tick box exercise sometimes associated with maturity models. + +Clear explanations of each aspect and each level within an aspect are provided and characteristics and scenarios are provided for each model item. These details have been collected from diverse working group members' experiences to provide readers a chance to see real world applications hand in hand with the theory behind each progression. In addition, we want _your_ stories to help decorate this model by including examples that you are happy to share. To introduce a new example, please [open an issue](https://github.com/cncf/tag-app-delivery/issues/new?assignees=&labels=&projects=&template=platform-maturity-model-example.md) and share your story. + +As readers, we encourage you to remember that it is not only costly, but sometimes actively detrimental to blindly follow any model to the highest level of maturity. Instead, we hope you will use this model to identify both your current and desired characteristics, enabling you to target your investment in the areas you will most benefit from. + +Finally, we want to take this opportunity to reintroduce the working group as a welcoming community of companies building platforms, consulting on platforms, and creating tools to support platform builders. We have a number of exciting initiatives in flight and would love to see you get involved including fortnightly meetings to share platform building stories and a deep dive paper on Platform as a Product. For more information please see our [website](https://appdelivery.cncf.io) or join the #wg-platforms channel in the [CNCF slack](https://slack.cncf.io/). + +## Thank you to our contributors! 
+ +As we reach this milestone we want to thank all the reviewers for all their contributions and feedback: + +* Abby Bangser (Project lead) +* Abby Kearns +* Abdur Rahman Mungul +* Adam Gardner +* Adrian Cockroft +* Antoine Bermon +* Areti Panou +* Asare Nikansah +* Asare Nkansah +* Atulpriya Sharma +* Blake Romano +* Bob Hong +* Bruno Dias +* Colin Griffin +* Colin Humphreys +* Daniel Bryant +* David Sandilands +* Edward (Ted) Newman +* Jennifer Riggins +* John Dietz +* John Gardner +* Josh Gavant +* Karena Angell +* Kief Morris +* Kirstin Slevin +* Luca Acquaviva +* Manuel Pais +* Marsh Gardiner +* Matt Menzenski +* Michael Coté +* Michael Kestigian +* Nadav Cohen +* Nicki Watt +* Niklas Beinghaus +* Paula Kennedy +* Puja Abbassi +* Puneet Kandhari +* Ram Iyengar +* Ramanujan Iyengar +* Rick Osowski +* Roberth Strand +* Rogerio Angeliski +* Saim Safdar +* Sam Newman +* Simon Forster +* Tsahi Duek +* Victor Lu +* Vijay Chintha +* Viktor “Bika” Nagy +* Vishal Biyani + +## Resources + +Slack: https://cloud-native.slack.com/archives/C020RHD43BP + +Mailing list: https://lists.cncf.io/g/cncf-tag-app-delivery diff --git a/website/content/ja/blog/assets/platform_components.png b/website/content/ja/blog/assets/platform_components.png new file mode 100644 index 00000000..e69de29b diff --git a/website/content/ja/blog/assets/platforms-contribution-stages.jpg b/website/content/ja/blog/assets/platforms-contribution-stages.jpg new file mode 100644 index 00000000..a01c68cd Binary files /dev/null and b/website/content/ja/blog/assets/platforms-contribution-stages.jpg differ diff --git a/website/content/ja/blog/assets/platforms-mm-v1-table.png b/website/content/ja/blog/assets/platforms-mm-v1-table.png new file mode 100644 index 00000000..e69de29b diff --git a/website/content/ja/blog/assets/platforms-pyramid.png b/website/content/ja/blog/assets/platforms-pyramid.png new file mode 100644 index 00000000..e69de29b diff --git a/website/content/ja/blog/clusters-for-all-cloud-tenants.md b/website/content/ja/blog/clusters-for-all-cloud-tenants.md new file mode 100644 index 00000000..3f21ed11 --- /dev/null +++ b/website/content/ja/blog/clusters-for-all-cloud-tenants.md @@ -0,0 +1,62 @@ +--- +title: "Clusters for all cloud tenants" +date: 2022-06-02 13:04:00 +0200 +author: Josh Gavant +categories: +- Article +tags: +- WG Multi-tenancy +- Community Contribution +--- + +A decision which faces many large organizations as they adopt cloud architecture is how to provide isolated spaces within the same environments and clusters for various teams and purposes. For example, marketing and sales applications may need to be isolated from an organization's customer-facing applications; and development teams building any app usually require extra spaces for tests and verification. + +## Namespace as unit of tenancy + +To address this need, many organizations have started to use namespaces as units of isolation and tenancy, a pattern previously described by [Google](https://cloud.google.com/kubernetes-engine/docs/concepts/multitenancy-overview) and [Kubernetes contributors](https://kubernetes.io/blog/2021/04/15/three-tenancy-models-for-kubernetes/). But namespace-scoped isolation is often insufficient because some concerns are managed at cluster scope. In particular, installing new resource types (CRDs) is a cluster-scoped activity; and today independent teams often want to install custom resource types and operators. 
Also, more developers are themselves writing software operators and custom resource types and find themselves requiring cluster-scoped access for research and tests. + +## Cluster as unit of tenancy + +For these reasons and others, tenants often require their own isolated clusters with unconstrained access rights. In an isolated cluster, a tenant gets its own Kubernetes API server and persistence store and fully manages all namespaces and custom resource types in its cluster. + +But deploying physical or even virtual machines for many clusters is inefficient and difficult to manage, so organizations have struggled to provide clusters to tenant teams. Happily :smile:, to meet these organizations' and users' needs, leading Kubernetes vendors have been researching and developing lighter weight mechanisms to provide isolated clusters for an organization's tenants. In this post we'll compare and contrast several of these emergent efforts. + +Do you have other projects and ideas to enhance multitenancy for cloud architecture? Then please join CNCF's App Delivery advisory group in discussing these [here](https://github.com/cncf/tag-app-delivery/issues/193); thank you! + +### vcluster + +[vcluster](https://www.vcluster.com/) is [a prominent project](https://www.google.com/search?q=vcluster&tbm=nws) and CLI tool maintained by [loft.sh](https://loft.sh/) that provisions a virtual cluster as a StatefulSet within a tenant namespace. Access rights from the hosting namespace are propogated to the hosted virtual cluster such that the namespace tenant becomes the cluster's only tenant. As cluster admins, tenant members can create cluster-scoped resources like CRDs and ClusterRoles. + +The virtual cluster runs its own Kubernetes API service and persistence store independent of those of the hosting cluster. It can be published by the hosting cluster as a LoadBalancer-type service and accessed directly with kubectl and other Kubernetes API-compliant tools. This enables users of the tenant cluster to work with it directly with little or no knowledge of its host. + +In vcluster and the following solutions, the virtual cluster is a "metadata-only" cluster, in that resources in it are persisted to a backing store like etcd, but no schedulers act to reify the persisted resources - ultimately as pods. Instead, a "syncer" synchronization service copies and transforms reifiable resources - podspecs - from the virtual cluster to the hosting namespace of the hosting cluster. Schedulers in the hosting cluster then detect and reify these resources in the same underlying tenant namespace where the virtual cluster's control plane runs. + +An advantage of vcluster's approach of scheduling pods in the hosting namespace is that the hosting cluster ultimately handles all workloads and applies namespace quotas - all work happens within the namespace allocated to the tenant by the hosting cluster administrator. A disadvantage is that schedulers cannot be configured in the virtual cluster since pods aren't actually run there. + +- [vcluster on GitHub](https://github.com/loft-sh/vcluster) + +### Cluster API Provider Nested (CAPN) + +In vcluster, bespoke support for control plane implementations is required; as of this writing, vcluster supports k3s, k0s and vanilla k8s distributions. 
+ +To support _any_ control plane implementation, the [Cluster API Provider Nested](https://github.com/kubernetes-sigs/cluster-api-provider-nested) project implements an architecture similar to that of vcluster, including a metadata-only cluster and a syncer, but provisions the control plane using a Cluster API provider rather than a bespoke distribution. + +CAPN promises to enable control planes implementable via Cluster API to serve virtual clusters. + +### HyperShift + +Similar to the previous two, [Red Hat](https://www.redhat.com/)'s [HyperShift](https://github.com/openshift/hypershift) project provisions an OpenShift (Red Hat's Kubernetes distro) control plane as a collection of pods in a host namespace. But rather than running workloads within the hosting cluster and namespace like vcluster, HyperShift control planes are connected to a pool of dedicated worker nodes where pods are synced and scheduled. + +HyperShift's model may be most appropriate for a hosting provider like Red Hat which desires to abstract control plane management from their customers and allow them to just manage worker nodes. + +### kcp + +Finally, [kcp](https://github.com/kcp-dev/kcp) is another proposal and project from [Red Hat](https://www.redhat.com/) inspired by and reimagined from all of the previous ideas. Whereas the above virtual clusters run _within_ a host cluster and turn to the host cluster to run pods, manage networks and provision volumes, kcp reverses this paradigm and makes the _hosting_ cluster a metadata-only cluster. _Child_ clusters - _workspaces_ in the kcp project - are registered with the hub metadata-only cluster and work is delegated to these children based on labels on resources in the hub. + +As opposed to hosted virtual clusters, child clusters in kcp _could_ manage their own schedulers. Another advantage of kcp's paradigm inversion is centralized awareness and management of child clusters. In particular, this enables simpler centralized policies and standards for custom resource types to be propogated to all children. + +## Conclusion + +vcluster, CAPN, HyperShift, and kcp are emerging projects and ideas to meet cloud users' needs for multitenancy with _clusters_ as the unit of tenancy. Early adopters are already providing feedback on good and better parts of these approaches and new ideas emerge daily. + +Want to help drive new ideas for cloud multitenancy? Want to help cloud users understand and give feedback on emerging paradigms in this domain? Then join [the discussion](https://github.com/cncf/tag-app-delivery/issues/193) in CNCF's TAG App Delivery. Thank you! 
diff --git a/website/content/ja/blog/contributing-to-wg-platforms.md b/website/content/ja/blog/contributing-to-wg-platforms.md new file mode 100644 index 00000000..0eaf218e --- /dev/null +++ b/website/content/ja/blog/contributing-to-wg-platforms.md @@ -0,0 +1,82 @@ +--- +title: Getting started with contributing in WG Platforms +slug: contributing-to-wg-platforms +date: 2023-12-20 12:00:00 +0000 +author: Abby Bangser & Atulpriya Sharma +categories: +- Article +tags: +- WG Platforms +--- + +Similar to the advice on the [TAG App Delivery contributions page](https://tag-app-delivery.cncf.io/contribute/), we highly encourage new faces and new voices in existing forums, including asynchronous chats on [Slack](https://cloud-native.slack.com/archives/C020RHD43BP), [GitHub issues](https://github.com/cncf/tag-app-delivery/issues), and the fortnightly [working group Zoom calls](https://zoom.us/j/7276783015?pwd=R0RJMkRzQ1ZjcmE0WERGcTJTOEVyUT09). + +In addition, the WG Platforms has noticed a number of exciting new ideas generated by new joiners and wants to create an avenue for those ideas to be supported and successful, even coming from the newest voices. With that in mind, we have created a path that will help these ideas get the support they need! + +## When you have a new idea + +If you are passionate about the platform engineering space and have an idea for how to share that passion with the CNCF community, that is exciting and we want to help! + +Even with this excitement, we understand that contributing your own content for the first time can be confusing or intimidating. Don’t worry, we are a welcoming community and always open to new ideas and thoughts. If you’ve been wanting to be a part of the Platform WG, you’ve come to the right place. + +The following process builds on the wider TAG contribution guidelines to provide a lightweight way to ensure that all of these great ideas get the support they deserve within the scope of the WG. + +We have had ideas raised for new WG roles (e.g. a proposal for a community outreach role), a new white paper (e.g. the platform as a product white paper), a blog post (e.g. two sided market theory), and more. Some of these have garnered more traction than others, but the overriding criterion that we see for success is building enough momentum within the WG to get reviews for publication. This process is built to support new voices with an advocate who has the skills and experience in this process to make sure new joiner friction doesn’t cause a great idea to be silenced. + +If anything about this process is stopping you from contributing, the most important thing is to raise the idea. You can reach out to the WG Platform leads at any time to bounce an idea around and learn more about how the WG can help. + +With that in mind, the three steps are: + + + +## Step 0 - Idea generation + +Before you can publish you will need to have an idea to share! Therefore, you may start by asking “Is my idea suitable for this working group?” While you should always feel empowered to ask, you can first evaluate if your idea relates to platforms and platform engineering.
Some examples of relevant topics include: +* Technical overviews or experiences working with platform tools and related technologies +* Experience reports or interviews about platform building or using +* Thought leadership in regards to supporting developer experience and productivity +* Hands-on DIY posts helping readers learn a tool + +Please keep in mind this is not an exhaustive list and we are very open to new ideas. It may be easier to enumerate what is not fit for the working group: +* Vendor or other promotional pitches +* Topics not related to application delivery or cloud native technologies +* Discriminatory or abusive content + + +## Step 1 - Submission + +We would encourage you to generate a GitHub issue and open a Slack thread with your idea. You can use [this link](https://github.com/cncf/tag-app-delivery/issues/new?template=community-contribution.md) to create an issue and also start a thread on the [CNCF Slack](https://communityinviter.com/apps/cloud-native/cncf) in the [#wg-platforms channel](https://cloud-native.slack.com/archives/C020RHD43BP). + +The GitHub issue will present you with a template where you can follow the prompts. First, you must write a descriptive title, then answer each question in the description field. If you find one is not relevant, that is OK; write that, and there is always a chance to chat more on these things after your initial submission. + +For Slack, you can start a new thread in this format: [Proposal] + +This submission will act as two things: +1. A call for support from others in the community. You may naturally pick up a project advocate or set of collaborators who are as passionate as you are on the topic. + +1. A home for all the work done on this piece of work. You will be updating this frequently to indicate goals, progress, and asks for help. + +## Step 2 - Initial acceptance + +Once submitted, you can expect a WG lead to respond within a week. They will help clarify any open questions, confirm that the idea is within the scope of this WG, and guide you towards any existing work that you may be able to benefit from or where your idea may fit better if it doesn’t fit best within the WG. + +They will also recommend next steps for finding a project advocate so that you have someone to work with to find WG support during the project lifecycle including publishing and publicising the work. + +## Step 3 - Drafting & Reviews + +Now comes the fun work! You can work with your advocate and the entire WG community to refine your idea and produce the best possible content. Depending on what you are working on this could take days, weeks, or even months. Even if your idea has a scope of work that could last months, we highly suggest you find ways to release smaller content pieces first to generate more interest and also more confidence in the alignment of your work. For example, most blogs are written, reviewed, and published within 1-2 months. These are the types of things your advocate will help support you with. + +Once you’ve finished drafting your blog post, you can update the GitHub issue saying it’s ready for review as well as update the Slack with the draft as a Google Doc and tag anyone who you feel can review it. (PS: It’s an open forum so anyone is free to review, but if you feel there’s someone who must have a look at it, tag them) + +During the review process, we’ll do the following checks: +* _Basic grammar, syntax, and language check_ - we suggest using a tool like Grammarly before submitting for review.
+* _Technical correctness_ - we’ll validate the technical accuracy of what you’ve written. +* _Vendor Pitches/Promotional links_ - we’ll carefully go through the content to ensure there’s no promotional material. + +## Step 4 - Final Approval and Publishing + +After the review is complete, and there’s a consensus from everyone that this is good to go, the WG lead will initiate the process of raising a PR and merging it. + +## What next? + +Congratulations, your blog post will be live by now and a handful of people will have already read it. So what next? Well, feel free to share the blog posts on social media, and tag us (TAG App delivery). Also, don’t forget to thank everyone who helped to make the blog post live. And lastly, you should start thinking about your next blog post. So shall start again from the beginning? diff --git a/website/content/ja/blog/cooperative-delivery-platforms.md b/website/content/ja/blog/cooperative-delivery-platforms.md new file mode 100644 index 00000000..8441d4f5 --- /dev/null +++ b/website/content/ja/blog/cooperative-delivery-platforms.md @@ -0,0 +1,52 @@ +--- +title: "Infrastructure for Apps: Platforms for Cooperative Delivery" +date: 2022-09-22 00:00:00 +0000 +author: Josh Gavant +categories: +- Announcement +tags: +- WG Platforms +- Event +--- + +![infrastructure integration](/images/infrastructure-integration.png) + +TAG App Delivery formed the Cooperative Delivery working group in late 2021 to gather and report on emerging trends around coordinated delivery of infrastructure capabilities and applications. The TAG noted that while infrastructure teams are successfully adopting software development practices and deploying features and fixes continuously via the likes of GitOps and IaC (Infrastructure as Code), delivery of infrastructure capabilities is often not coordinated well with delivery of applications using that infrastructure. That is, there's a *gap* in delivery between application and infrastructure and coordination/cooperation is needed to bridge that gap. + +The primary goals of the group have been to a) confirm the hypothesis that there is a gap, b) clarify how it manifests for end users and c) identify and encourage emerging trends to facilitate cooperation. For example, the group's [first hypotheses](https://github.com/cncf/tag-app-delivery/blob/main/cooperative-delivery-wg/charter/README.md#examples-of-known-patterns-aimed-to-deploy-applications) mentioned the following existing trends: + +- GitOps: continuous idempotent reconciliation of configuration from declarative descriptors +- Operators: reconciliation-oriented services +- Pipelines: imperative orchestration of services and applications + +In this article we'll review new trends we've learned about from end users and from emerging [CNCF projects](https://landscape.cncf.io/card-mode?category=application-definition-image-build,continuous-integration-delivery&grouping=no) like Backstage, Crossplane, Dapr, KubeVela and more. + +We've also learned over the past year that while "cooperation" between infrastructure and application teams is what we seek to achieve, "cooperative delivery" is not a familiar term to most of our contributors. Recognizing that this cooperation is also the goal of "internal developer platforms" (IDPs) and the emerging platform engineering movement, we're preparing to rename the working group Platforms. + +We're always seeking more input from users and contributors to guide us. 
Please consider sharing how your organization coordinates application and infrastructure delivery via [this GitHub form](https://github.com/cncf/tag-app-delivery/issues/new/choose) and share your thoughts in [GitHub](https://github.com/cncf/tag-app-delivery/discussions) or [Slack](https://cloud-native.slack.com/archives/CL3SL0CP5). + +## Platform Engineering + +Beyond our original hypotheses, an emerging trend we've noted for coordinating infrastructure and applications is platform engineering (PE) and particularly its principle of **self-serviceable capabilities**. [Backstage](https://www.cncf.io/projects/backstage/), for example, is a popular portal framework for these emerging platforms. According to Humanitec lead [Luca Galante](https://platformengineering.org/authors/luca-galante), platform engineering is "the discipline of designing and building toolchains and workflows that enable **self-service** capabilities for software engineering organizations in the cloud-native era ([link](https://platformengineering.org/blog/what-is-platform-engineering))." *Self-service* describes the mechanism of cooperative delivery: a developer provisions and uses capabilities in their app on-demand by following documented steps. + +In addition to its self-service paradigm, platform engineering also **focuses on the needs of application developers** and operators, the users of the platform. This increases PEs' empathy for developers and other platform users and helps them gather feedback and iteratively improve to meet their needs, as product developers do for end customers. The shift in focus also better aligns platform development with an enterprise's true value streams, rather than infrastructure teams being an out-of-band cost center. It's not technical exactly, but **empathetic relationships between platform engineering and application teams** lead to better coordination of infrastructure capabilities and app requirements. + +These platforms are typically built using foundational CNCF projects like Kubernetes, Helm, Prometheus, Backstage, Istio, Knative, Keptn and more. + +## Kubernetes for Everything + +Another trend we've noted in projects like [Crossplane](https://www.cncf.io/projects/crossplane/) is the adoption of the Kubernetes resource model for configuring and managing all types of infrastructure capabilities and application components. Users no longer provision only deployments, volumes and ingresses via the Kubernetes API; custom resource definitions (CRDs) now enable provisioning of databases, identities, message brokers, observability systems, and much more. + +The [GitOps](https://www.cncf.io/projects/opengitops/) movement demonstrated the value of continuous reconciliation for applications, and with so many resource types available developers can now reconcile infrastructure in the same way as applications. For those providing their own infrastructure capabilities, the [Operator Framework](https://www.cncf.io/projects/operator-framework/) is a popular foundation for custom Kubernetes-based reconciler implementations. + +## Capability Injection + +Finally, we've noted projects like [Dapr](https://www.cncf.io/projects/dapr/) and [KubeVela](https://www.cncf.io/projects/kubevela/) which seek to coordinate infrastructure capabilities for apps through inference and late resolution and injection of those capabilities. 
These projects often ask app developers to declare the capabilities they require, like databases and message brokers, and then resolve actual implementations at runtime, perhaps using sidecar containers or eBPF programs. Some projects like [Istio](https://www.redhat.com/en/blog/istio-service-mesh-applies-become-cncf-project) can even inject capabilities transparently to the app developer. + +Late resolution and injection loosens coupling of apps and infrastructure and is another form of "cooperative" delivery. Imagine getting a database from a different provider depending on the application's context - an RDS instance in AWS, a CloudSQL instance in GCP, or a [CloudNativePG](https://cloudnative-pg.io/) instance on premises. + +## Conclusion + +The mission of the Cooperative Delivery WG - soon to be the Platforms WG - is to gather feedback and highlight emerging trends that address gaps in coordination of infrastructure capabilities and applications. [Join us](https://github.com/cncf/tag-app-delivery) in TAG App Delivery to advance this topic and others relevant to application and platform developers and operators. + +Image credit https://www.cleo.com/blog/knowledge-base-cloud-integration-platform diff --git a/website/content/ja/blog/kubecon-eu-2023.md b/website/content/ja/blog/kubecon-eu-2023.md new file mode 100644 index 00000000..565aec76 --- /dev/null +++ b/website/content/ja/blog/kubecon-eu-2023.md @@ -0,0 +1,89 @@ +--- +title: TAG App Delivery at Kubecon EU 2023 +date: 2023-04-14 12:00:00 +0000 +author: Josh Gavant +categories: +- Announcement +tags: +- Event +--- + +![Kubecon EU 2023](/images/kubecon-eu-2023.jpg) + +At Kubecon EU next week TAG App Delivery will bring together cloud-native +application developers and framework builders to meet each other and share +insights and knowledge. Our goal is to make cloud development better for all by +gathering feedback, finding synergies and guiding both users and projects. + +To this end the TAG will host the following lightning talk meetups. The list of +talks and presenters follows below. + +- a [pre-day meetup](https://kccnceu2023.sched.com/event/1JWPr/tag-app-delivery-project-meeting) in RAI Room D301, Congress Center +- a booth meetup on Wednesday 4/19 at 3:00pm in booth K1 in the CNCF Project Pavilion in the Solutions Hall +- a booth meetup on Thursday 4/20 at 1:30pm in booth K1 in the CNCF Project Pavilion in the Solutions Hall + +In addition to these meetings and talks, the TAG will host booth K1 in the +project pavilion in the first half of each conference day. Stop by any time to +chat with us and learn more about the TAG; initiatives related to operators, +GitOps, Platforms and more; and App Delivery-related CNCF projects. + +Also, if you work on an open source project related to app delivery and don't +have your own booth, you're welcome to reserve the TAG booth for a session +describing your project! Please DM [Josh Gavant](https://cloud-native.slack.com/archives/DRTPUJL5V) +on CNCF Slack to coordinate. 
+ +## Pre-day meetup - Tuesday + +Time | Topic | Presenter +-------|--------|------------ +13:00 | TAG General Review | [TAG Leads](https://tag-app-delivery.cncf.io/#leads) +13:30 | Lightning Talks on Application Delivery +-- | _Porting CloudFoundry Abstractions on Kubernetes_ | [Ram Iyengar](https://twitter.com/ramiyengar) +-- | _Platform Maturity Model_ | [Abby Bangser](https://twitter.com/a_bangser) +-- | _Project Unox - A platform showcase experiment_ | [Gopal Ramachandran](https://twitter.com/goposky) +-- | _Burden of Responsibility in cloud-native App Development_ | [Colin Griffin](https://www.linkedin.com/in/colin-e-griffin/) +-- | _Tailored Platforms on top of Kubernetes_ | [Mauricio Salatino](https://twitter.com/salaboy) +14:30 | TAG Work in Progress Review | [WG Leads](https://tag-app-delivery.cncf.io/about/#working-groups) +-- | _Platforms_ | Josh Gavant +-- | _Operators_ | Jennifer Streyevitch +-- | _Artifacts_ | Andrew Block +-- | _GitOps_ | Scott Rigby +-- | _Other_ +15:30 | Lightning Talks on Application Delivery +-- | _Capabilities of Portals_ | [Josh Gavant](https://www.linkedin.com/in/joshgav/) +-- | _K8sGPT brings superpowers to everyone_ | [Alex Jones](https://twitter.com/alexjonesax) +-- | _Enable secure self-service access to Kubernetes clusters with Paralus_ | [Abhinav Mishra](https://www.linkedin.com/in/abhinav-mishra-1b0093126/) +-- | _Microcks intro: The Open-source Kubernetes Native tool for API Mocking and Testing_ | [Yacine Kheddache](https://www.linkedin.com/in/yacinekheddache/) +-- | _Identity-defined Microservice Networks_ | [Karthik Prabhakar](https://twitter.com/worldhopper) + +## Booth meetups - Wednesday and Thursday + +As mentioned above, we'll hold meetups with lightning talks on App Delivery topics at booth K1 on Wednesday at 3:00pm and Thursday at 1:30pm. + +The schedule for the booth follows: + +Date/Time | Topic | Presenter +----------------|-------|----------- +Apr 19 @ 10:30 - 16:00 | Booth open +Apr 19 @ 15:00 - 16:00 | Meetup and Lightning Talks +-- | _ClickOps over GitOps_ | [Laszlo Fogas](https://twitter.com/laszlocph) +-- | _Implementing the pattern of "as-a-Service" (using Kratix)_ | [Abby Bangser](https://twitter.com/a_bangser) +-- | _Monitoring-As-Code with Crossplane_ | [Matthias Luebken](https://twitter.com/luebken/) +-- | _Using GitOps for AWS Serverless Infrastructure_ | [Carlos Santana](https://www.linkedin.com/in/csantanapr) +-- | _Far Beyond Virtual Clusters_ | [Dario Tranchitella](https://www.linkedin.com/in/dariotranchitella/) +-- | _GitOps made easy for any applications with PipeCD_ | [Khanh Tran](https://twitter.com/khanhtc1202) +Apr 20 @ 10:30 - 14:30 | Booth open +Apr 20 @ 13:30 - 14:30 | Meetup and presentations +-- | _ReleaseOps: GitOps for the People_ | [Lian Li](https://twitter.com/lianmakesthings) +-- | _Composable Platforms with Carvel_ | [Thomas Vitale](https://twitter.com/vitalethomas) +-- | _Making "Tenants" first-class citizens in Kubernetes with Capsule_ | [Dario Tranchitella](https://www.linkedin.com/in/dariotranchitella/) +Apr 21 @ 10:30 - 12:30 | Booth open + +## TAG presentation + +Last but not least, join TAG leads on Thursday for +[a session](https://kccnceu2023.sched.com/event/e52f9dc38bcbb6504e65d0e6c66170b3/) +describing how applications, operators, GitOps and platforms come together to +enable efficient and delightful cloud-native application delivery. + +Hope to see you in Amsterdam! 
diff --git a/website/content/ja/blog/kubeconna-project-meeting.md b/website/content/ja/blog/kubeconna-project-meeting.md new file mode 100644 index 00000000..a8845d12 --- /dev/null +++ b/website/content/ja/blog/kubeconna-project-meeting.md @@ -0,0 +1,21 @@ +--- +title: TAG App Delivery at Kubecon NA 2022 +date: 2022-09-12 12:00:00 +0000 +author: Jennifer Strejevitch and Josh Gavant +categories: +- Announcement +tags: +- Event +--- + +Join CNCF TAG App Delivery and our working groups at Kubecon in Detroit October 24-28. + +Many app delivery-related projects will be holding open office hours and maintaining booths in the project pavilion as detailed [here](https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/program/project-engagement/). + +A general meeting on the state of the TAG and some emerging app delivery patterns will be held Tuesday, October 25 1-5pm at Huntington Place. Find it on sched.com [here](https://kccncna2022.sched.com/event/1BaU0/cncf-tag-app-delivery-project-meeting). + +The agenda for the TAG Meeting on Tuesday is [here](https://docs.google.com/document/d/1aBLVTg2Ev27fIhFpXvsuL8WwqtK9AuMXgL6RpeozOTc/). + +Finally, the Platforms WG will be gathering platform component providers in an "unmeetup" on Thursday 10/27 at 1:30pm, details in [this doc](https://docs.google.com/document/d/1YNA1rYlZRZCGIj1VW6mL6a8HUXPHgQ2HaxunMOaoIVI/). + +See you in Detroit! diff --git a/website/content/ja/contribute/_index.md b/website/content/ja/contribute/_index.md new file mode 100644 index 00000000..0a67edcc --- /dev/null +++ b/website/content/ja/contribute/_index.md @@ -0,0 +1,30 @@ +--- +title: コントリビューション +list_pages: true +menu: + main: + weight: 20 +description: TAGに貢献する方法 +--- + +Here are a few suggestions on how to get started with the CNCF Technical Advisory Group (TAG) App Delivery. While you may find it useful to approach each in order, there is no requirement to do so. Find what makes sense to you and jump in! + +**Research a bit on what topics interest you the most.** + +The TAG has a lot of topics to cover under App Delivery. Some of the larger topics have become Working Groups (WGs) but there is still a lot more to discuss at the top TAG level. Titles of WGs and projects may be self-explanatory, but sometimes not. Continue on to learn more about how to find the right project for you. + +**Join Slack** + +Luckily, the CNCF slack isn’t a quiet place, it’s very active and provides a lot of great value. A great thing to start with is by simply introducing yourself in a WG channel and saying what you’re interested in. The leaders of the WG will point you in the right direction along with others that have contributed. + +**See what issues are available on GitHub** + +There’s everything from code-specific issues to documentation issues and everything in-between. Feel free to leave a comment or a suggestion. You can find them under the CNCF GitHub org for each TAG. For example, here’s the link to the App Delivery Tag: https://github.com/cncf/tag-app-delivery + +**Join a meeting** + +This may seem scary at first, especially if you’re new, but it’s a great way to dive in and see friendly faces after chatting with folks on Slack when you’re more comfortable. The meetings are structured with talking points in a shared google doc. You may be given time to introduce yourself (if you want!) and you are welcome to bring a topic to the meeting. + +**In conclusion, get started your way** + +TAG and WG members come in all styles. 
Some have never been to a call, others rarely engage on GitHub. The real power in the community is the coming together of different experiences and ideas to generate useful content for the wider cloud native community. You and your experiences are a big part of that, so come join us today! diff --git a/website/content/ja/contribute/community-post-guidelines/_index.md b/website/content/ja/contribute/community-post-guidelines/_index.md new file mode 100644 index 00000000..162d32b9 --- /dev/null +++ b/website/content/ja/contribute/community-post-guidelines/_index.md @@ -0,0 +1,71 @@ +--- +title: コミュニティへの投稿のガイドライン +list_pages: false +--- + +## Introduction + +This policy outlines the guidelines for accepting, reviewing, and publishing blog posts on our platform: https://tag-app-delivery.cncf.io/blog. + +Blog posts are an opportunity to foster a diverse and vibrant discussion environment, showcasing a variety of viewpoints from members and the wider open source community. + +It is important to note that this TAG is not itself a publishing house, and therefore the main focus of the group is to generate conversation and collaboration which will be true for blog posts as well, though to a lesser degree than for writing which is deemed a TAG official publication. + +## Representation of Diverse Viewpoints + +### Community Voices +Our blog serves as a collective of diverse viewpoints from our community members. We believe in representing a wide array of thoughts, experiences, and perspectives. + +### Inclusion of Disclaimer +Each post will include a short disclaimer, This blog post represents the viewpoint of its author(s) and does not necessarily reflect an official position or perspective of the TAG or any subsidiary working group. See this blog and contributions guidelines for more information on how you too can contribute." If the blog includes information about the authors current workplace or otherwise affiliated products, this must also be disclosed in the disclaimer. + +## Submission, Review and Acceptance Criteria + +### Submission Process +Contributors are required to submit their blog posts as pull requests (PRs). + +### Review Process +The TAG content is open for review from anyone with good intentions across the open source community. These reviews can be used to refine the content and help TAG leaders determine if the blog will be posted. + +### Acceptance Criteria +At least one TAG leader must approve the blog before it can be published. A non-exhaustive list of things the TAG leadership will evaluate is: +* Alignment with our community’s ethos and values. +* Relevance and usefulness to our community members. +* Independence from any one vendor or product. +* Originality and clarity of content. + +## Criteria for Rejection + +### Misalignment with Group Positions +Submissions fundamentally misaligned with the positions or values upheld by our group will be rejected. With this said, blogs are encouraged to have independent ideas which may at times challenge previous TAG publications, blog posts, or other content. This is not inherently an issue if the challenge is done using respect and detailed explanation of the data and experiences that led to this change. + +### Non-Adherence to Community Standards +Submissions failing to meet our community standards in terms of content quality, relevance, or ethical considerations will be subject to rejection. Content must also adhere to the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/main/code-of-conduct.md). 
+ +### Feedback and Appeals +In cases of rejection, contributors will be provided with feedback. An appeal process is available for reconsideration, subject to the discretion of our TAG leadership team. + +## Publication Process + +### Editing and Finalization +Blog posts may undergo minor editing for clarity, grammar, and format consistency before publication. This will be done through PR reviews, suggestions, or adding a commit to the PR. + +### Notification of Acceptance +Contributors will be notified of the acceptance of their submission through a GitHub PR approval. + +### Publication and Promotion +Published posts will be shared on our platform and may be promoted through our social media channels and newsletters. Publication is expected to occur at the date of approval unless a specific date has been previously requested by the author and agreed with the TAG leadership team. + +## Cross-Posting Policy + +We actively encourage and support cross-posting content from personal or external blogs. This initiative helps in amplifying diverse voices and perspectives within our community. + +Cross-posts must either start on this platform or be published to this platform within 7 calendar days. If your intent is to publish to the community platform blog, we recommend having a publishing window approved before that post is published elsewhere, so that a link to the cross-post may be added. If 7 days are exceeded, the community post may be rejected as unoriginal content. + +## Amendments to the Policy + +This policy is subject to change at the discretion of the group's leadership, with the aim of continually adapting to the evolving needs of our community and maintaining the highest standards of content quality and diversity. + +## Acknowledgement + +By submitting content to our platform, contributors agree to abide by the terms outlined in this policy. 
diff --git a/website/content/ja/search.md b/website/content/ja/search.md new file mode 100644 index 00000000..4e0d6c3c --- /dev/null +++ b/website/content/ja/search.md @@ -0,0 +1,6 @@ +--- +title: 検索結果 +layout: search +toc_hide: true +--- + diff --git a/website/content/ja/wgs/_index.md b/website/content/ja/wgs/_index.md new file mode 100644 index 00000000..18c7df31 --- /dev/null +++ b/website/content/ja/wgs/_index.md @@ -0,0 +1,8 @@ +--- +title: ワーキンググループ(WG) +list_pages: true +menu: + main: + weight: 30 +description: "TAG Appデリバリーは、特定のプロジェクトや取り組みを推進するためのワーキンググループ(WG)を組織しています。" +--- diff --git a/website/content/ja/wgs/artifacts/_index.md b/website/content/ja/wgs/artifacts/_index.md new file mode 100644 index 00000000..069cf571 --- /dev/null +++ b/website/content/ja/wgs/artifacts/_index.md @@ -0,0 +1,5 @@ +--- +title: アーティファクトWG +list_pages: false +--- +{{< include path="assets/content/artifacts-wg/README.md" >}} diff --git a/website/content/ja/wgs/artifacts/charter/_index.md b/website/content/ja/wgs/artifacts/charter/_index.md new file mode 100644 index 00000000..9e998bee --- /dev/null +++ b/website/content/ja/wgs/artifacts/charter/_index.md @@ -0,0 +1,5 @@ +--- +title: アーティファクトWG憲章 +url: 'wgs/artifacts/charter/charter.md/' +--- +{{< include path="assets/content/artifacts-wg/charter.md" >}} \ No newline at end of file diff --git a/website/content/ja/wgs/operator/_index.md b/website/content/ja/wgs/operator/_index.md new file mode 100644 index 00000000..d9fdcb0d --- /dev/null +++ b/website/content/ja/wgs/operator/_index.md @@ -0,0 +1,35 @@ +--- +title: オペレーターWG +list_pages: true +inactive: true +--- +## This working group is currently inactive + +### Current Goal +Create A Whitepaper (Initially Definition of an Operator) + +### How to contribute +1. Join us in Slack [#sig-app-delivery-operator-wg](https://cloud-native.slack.com/archives/C01GTMYJLKS) +2. Read the [Operator Whitepaper](./whitepaper/README.md) +3. Grab a ticket and write content - we have a [project with all the tickets](https://github.com/cncf/sig-app-delivery/projects/1) +4. Open a new ticket for things that are missing + +For more details, see our [CONTRIBUTING.md](CONTRIBUTING.md) + +### Meetings +Every other week on Wednesday ([Zoom Meeting](https://zoom.us/my/cncfsigappdelivery?pwd=R0RJMkRzQ1ZjcmE0WERGcTJTOEVyUT09)) +* 9 AM US Pacific Standard Time +* 12 AM US Eastern Standard Time +* 6 PM Central European Time + +#### Links +* [Charter](./charter.md) +* [Agenda and Meeting Notes](https://docs.google.com/document/d/17pjT2g35yUMaby0cPJnFfHRlhlFzFruDxmJKRmj_BLU) + +#### Documents +* [Operator Whitepaper - Working Document](./whitepaper/README.md) + * Original Issue: (https://github.com/cncf/sig-app-delivery/issues/15) + + +#### Slack +* [#sig-app-delivery-operator-wg](https://cloud-native.slack.com/archives/C01GTMYJLKS) \ No newline at end of file diff --git a/website/content/ja/wgs/operator/charter/_index.md b/website/content/ja/wgs/operator/charter/_index.md new file mode 100644 index 00000000..de1f5757 --- /dev/null +++ b/website/content/ja/wgs/operator/charter/_index.md @@ -0,0 +1,49 @@ +--- +title: オペレーターWG憲章 +description: この憲章はオペレーターWGのミッションと戦術について述べます。 +--- + +# Operator Working Group Charter + +## Chairs/Sponsors +* Omer Kahani (@OmerKahani) +* Jennifer Strejevich (@Jenniferstrej) +* Thomas Schuetz (@thschue) + +## Goals +* Assemble and consolidate available best practices how to write operators e.g. CRDs, APIs, …. 
+* Patterns and use cases for how to use operators
+* Enable a forum for projects developing operators
+* Coordinate work amongst participants/projects in the operator ecosystem
+* Guidance around interoperability, configuration, management, and consumption of operators
+* Operator definition and maturity/capability model
+* Operator Security
+* Custom Resource discovery, use, auditability
+* Upgrade/Rollback best practices for controllers and CRDs
+* Data sharing between operands/operators, such as ConfigMaps, Secrets, etc.
+* Identify requirements with other projects and engage with the right projects. Discover operator needs to present to other SIGs
+
+## First Goal:
+* Create A Whitepaper (Initially Definition of an Operator)
+
+## Non-Goals
+* Writing SDKs for building operators
+* Recommendation of individual operator projects or tools
+* Non-Kubernetes Operators (at this time)
+* Creation of any new software projects
+
+## Potential Future Scope
+* Airgapped Operators
+* Meta Operator/Umbrella Operator
+* Operand Lifecycle Management
+
+## Working Group Meetings
+Every other week on Wednesday (Zoom Meeting)
+
+* 9 AM US Pacific Standard Time
+* 12 PM US Eastern Standard Time
+* 6 PM Central European Time
+
+
+
diff --git a/website/content/ja/wgs/operator/whitepaper/README.md b/website/content/ja/wgs/operator/whitepaper/README.md new file mode 100644 index 00000000..e90d25ff --- /dev/null +++ b/website/content/ja/wgs/operator/whitepaper/README.md @@ -0,0 +1,37 @@
+# Operator White Paper
+
+## Current Version
+- [Latest Version (Version 1.0)](./index.md)
+
+## Proposed Schedule
+*inspired by https://github.com/cncf/tag-security/issues/138*
+- [x] Due February 10th, 2021
+* Tasking Assignment - people interested in content generation for a particular topic area comment on the corresponding GitHub issue and the topic gets assigned to them
+
+* Update - we will keep assignments open throughout the "Content-rough-in" weeks and [Jen](https://github.com/jenniferstrej), [Thomas](https://github.com/thschue) and [Omer](https://github.com/OmerKahani) will pick them up on a FIFO basis.
+
+- [x] February 10th - March 15th, 2021 - *Content-rough-in*
+* Members generate content for their respective area of assignment. Cohesive sentences, concepts, phrasing, etc. should be placed in quotations ("") for later review as whole content.
+
+* Content rough-in will be pulled into a clean working document (single markdown) for review
+
+- [x] March 15th - April 1st, 2021 - *Collaborative review*
+* Members will comment on and review the content of the draft
+
+- [x] April 1st - April 7th, 2021 - *Executive summary and content wrap up*
+
+- [x] April 7th - April 20th, 2021 - *Narrative voice*
+
+
+* The narrative voice is a semi-final pass of the paper to ensure it reads
+  as a single, unified voice. 
It should ensure: + * the language origin is consistent throughout the document (lang_en or lang_us), + * phrasing is similar (caddy corner not mixed in with kitty corner), + * acronyms are spelled out at their first use and then abbrieviated later, + * footnotes and citations are consistent and not direct hyperlinks in the text + * vague terms are defined in a glossary or otherwise cited to the cloud native security lexicon in the repo + +- [X] April 20th - April 30th, 2021 - *Final group review* +* Final Review - final review by group, with selected "intended audience" + +- [ ] From April 30th, 2021 - *Publishing Process* diff --git a/website/content/ja/wgs/operator/whitepaper/img/02_1_operator_pattern.png b/website/content/ja/wgs/operator/whitepaper/img/02_1_operator_pattern.png new file mode 100644 index 00000000..a1f87cb2 Binary files /dev/null and b/website/content/ja/wgs/operator/whitepaper/img/02_1_operator_pattern.png differ diff --git a/website/content/ja/wgs/operator/whitepaper/img/02_2_operator.png b/website/content/ja/wgs/operator/whitepaper/img/02_2_operator.png new file mode 100644 index 00000000..d3909d43 Binary files /dev/null and b/website/content/ja/wgs/operator/whitepaper/img/02_2_operator.png differ diff --git a/website/content/ja/wgs/operator/whitepaper/img/04_1_operator_model.png b/website/content/ja/wgs/operator/whitepaper/img/04_1_operator_model.png new file mode 100644 index 00000000..f3879b01 Binary files /dev/null and b/website/content/ja/wgs/operator/whitepaper/img/04_1_operator_model.png differ diff --git a/website/content/ja/wgs/operator/whitepaper/img/071_GitOps_UseCase.png b/website/content/ja/wgs/operator/whitepaper/img/071_GitOps_UseCase.png new file mode 100644 index 00000000..5eb3a451 Binary files /dev/null and b/website/content/ja/wgs/operator/whitepaper/img/071_GitOps_UseCase.png differ diff --git a/website/content/ja/wgs/operator/whitepaper/img/08_1_sample.png b/website/content/ja/wgs/operator/whitepaper/img/08_1_sample.png new file mode 100644 index 00000000..7f790d65 Binary files /dev/null and b/website/content/ja/wgs/operator/whitepaper/img/08_1_sample.png differ diff --git a/website/content/ja/wgs/operator/whitepaper/img/08_2_umbrella.png b/website/content/ja/wgs/operator/whitepaper/img/08_2_umbrella.png new file mode 100644 index 00000000..e759493e Binary files /dev/null and b/website/content/ja/wgs/operator/whitepaper/img/08_2_umbrella.png differ diff --git a/website/content/ja/wgs/operator/whitepaper/img/09_1_distributedops.png b/website/content/ja/wgs/operator/whitepaper/img/09_1_distributedops.png new file mode 100644 index 00000000..42487081 Binary files /dev/null and b/website/content/ja/wgs/operator/whitepaper/img/09_1_distributedops.png differ diff --git a/website/content/ja/wgs/operator/whitepaper/index.md b/website/content/ja/wgs/operator/whitepaper/index.md new file mode 100644 index 00000000..56703487 --- /dev/null +++ b/website/content/ja/wgs/operator/whitepaper/index.md @@ -0,0 +1,1160 @@ +--- +title: "CNCFオペレーターホワイトペーパー" +pdf: https://github.com/cncf/tag-app-delivery/blob/main/operator-whitepaper/v1/CNCF_Operator_WhitePaper_v1-0_20210715.pdf +version_info: https://github.com/cncf/tag-app-delivery/blob/main/operator-whitepaper/latest/README.md +description: "この文書では、オペレーターの分類だけでなく、オペレーターを用いたアプリケーション管理システムの推奨される設定、実装、およびユースケースについても概説します。" +type: whitepapers +--- + +## Table of Contents + +- [Table of Contents](#table-of-contents) +- [Definition](#definition) +- [Executive Summary](#executive-summary) +- [Introduction](#introduction) 
+ - [The Goal of this Document](#the-goal-of-this-document) + - [Target Audience / Minimum Level of Experience](#target-audience--minimum-level-of-experience) +- [Foundation](#foundation) + - [Operator Design Pattern](#operator-design-pattern) + - [Operator Characteristics](#operator-characteristics) + - [Dynamic Configuration](#dynamic-configuration) + - [Operational Automation](#operational-automation) + - [Domain Knowledge](#domain-knowledge) + - [Operator Components in Kubernetes](#operator-components-in-kubernetes) + - [Kubernetes Controllers](#kubernetes-controllers) + - [Custom Resources and Custom Resource Definitions](#custom-resources-and-custom-resource-definitions) + - [Control Loop](#control-loop) + - [Operator Capabilities](#operator-capabilities) + - [Install an Application / Take Ownership of an Application](#install-an-application--take-ownership-of-an-application) + - [Upgrade an Application](#upgrade-an-application) + - [Backup](#backup) + - [Recovery from backup](#recovery-from-backup) + - [Auto-Remediation](#auto-remediation) + - [Monitoring/Metrics - Observability](#monitoringmetrics---observability) + - [Scaling](#scaling) + - [Auto-Scaling](#auto-scaling) + - [Auto-Configuration tuning](#auto-configuration-tuning) + - [Uninstalling / Disconnect](#uninstalling--disconnect) +- [Security](#security) + - [Operator Developer](#operator-developer) + - [Transparency and Documentation](#transparency-and-documentation) + - [Operator Scope](#operator-scope) + - [Vulnerability Analysis](#vulnerability-analysis) + - [Application Developer (Operator-"Users")](#application-developer-operator-users) +- [Operator Frameworks for Kubernetes](#operator-frameworks-for-kubernetes) + - [CNCF Operator Framework](#cncf-operator-framework) + - [Kopf](#kopf) + - [kubebuilder](#kubebuilder) + - [Metacontroller - Lightweight Kubernetes Controllers as a Service](#metacontroller---lightweight-kubernetes-controllers-as-a-service) + - [Juju - Model-driven Operator Framework](#juju---model-driven-operator-framework) +- [Operator Lifecycle Management](#operator-lifecycle-management) + - [Upgrading the Operator](#upgrading-the-operator) + - [Upgrading the Declarative State](#upgrading-the-declarative-state) + - [Managing Relations of CRDs](#managing-relations-of-crds) +- [Use Cases for an Operator](#use-cases-for-an-operator) +- [Well known Operators \& patterns](#well-known-operators--patterns) + - [Prometheus Operator](#prometheus-operator) + - [Operator for GitOps](#operator-for-gitops) + - [Successful Patterns](#successful-patterns) + - [Management of a single type of application](#management-of-a-single-type-of-application) + - [Operator of Operators](#operator-of-operators) + - [One CRD per Controller](#one-crd-per-controller) + - [Where to publish and find Operators](#where-to-publish-and-find-operators) + - [Further reading](#further-reading) +- [Designing Operators](#designing-operators) + - [Requirement Analysis](#requirement-analysis) + - [Custom or third-party Operator](#custom-or-third-party-operator) + - [Use the right Tool](#use-the-right-tool) + - [Use the right programming language](#use-the-right-programming-language) + - [Design your Operator the according to your needs](#design-your-operator-the-according-to-your-needs) + - [References](#references) +- [Emerging Patterns of the Future](#emerging-patterns-of-the-future) + - [Operator Lifecycle Management](#operator-lifecycle-management-1) + - [Policy-Aware Operators](#policy-aware-operators) + - [Operator data 
modelling](#operator-data-modelling) + - [References](#references-1) +- [Conclusion](#conclusion) +- [Related Work](#related-work) + - [References](#references-2) +- [Acknowledgements](#acknowledgements) + - [V1.1](#v11) + - [Contributors](#contributors) + - [Reviewers](#reviewers) + - [V1.0](#v10) + - [Contributors](#contributors-1) + - [Reviewers](#reviewers-1) + +## Definition + +An operator is a synthesis of human behaviour, codified into software to facilitate the full lifecycle management of an application. + +## Executive Summary + +Maintaining application infrastructure requires many repetitive human activities that are devoid of lasting value. +Computers are the preferred method of performing precise tasks, verifying the state of an object and therefore enabling the infrastructure requirements to be codified. An operator provides a way to encapsulate the required activities, checks and state management of an application. + +In Kubernetes, an operator provides intelligent, dynamic management capabilities by extending the functionality of the API. + +These operator components allow for the automation of common processes as well as reactive applications that can continually adapt to their environment. This in turn, allows for more rapid development with fewer errors, lower mean-time-to-recovery, and increased engineering autonomy. + +Given the rising popularity of the operator pattern, it has become incumbent for there to be a reference paper that helps both novice and expert alike to learn from the community endorsed best practices for achieving their goals. +In this document, we outline not only the taxonomy of an operator but the recommended configuration, implementation and use cases for an operator application management system. + +## Introduction + +This whitepaper defines operators in a wider context than Kubernetes. It describes their characteristics and components, gives an overview of common patterns currently in use and explains how they differ from Kubernetes controllers. + +Additionally, it provides a deep dive into the capabilities of Kubernetes controllers, including backup, recovery and automatic configuration tuning. Further insights into frameworks currently in use, lifecycle management, security risks and use cases are provided. + +This paper includes best practices including observability, security and technical implementation. + +It closes with related work, highlights the additional value they can bring beyond this whitepaper and the next steps for operators. + +### The Goal of this Document +The goal of this document is to provide a definition of operators for cloud native applications in the context of Kubernetes and other container orchestrators. + + +### Target Audience / Minimum Level of Experience +This document is intended for application developers, Kubernetes cluster operators and service providers (internal or external) - who want to learn about operators and the problems they can solve. It can also help teams already looking at operators to learn when and where to use them to best effect. It presumes basic Kubernetes knowledge such as familiarity with Pods and Deployments. + +## Foundation +Kubernetes and the success of other orchestrators has been due to their focus on the main capabilities of containers. +While companies began their journey to cloud native, working with more specific use cases (microservices, stateless applications) made more sense. 
+As Kubernetes and other container orchestrators grew their reputation and extensibility, requirements became more ambitious. +The desire to use the full lifecycle capabilities of an orchestrator was also transferred to highly distributed data stores. + +Kubernetes primitives were not built to manage state by default. +Relying on Kubernetes primitives alone brings difficulty managing stateful application requirements such as replication, failover automation, backup/restore and upgrades (_which can occur based on events that are too specific_). + +The Operator Pattern can be used to solve the problem of managing state. +By leveraging Kubernetes built-in capabilities such as self-healing, reconciliation and extending those along with application-specific complexities; it is possible to automate any application lifecycle, operations and turn it into a highly capable offering. + +Operators are thought of as synonymous with Kubernetes. +However, the idea of an application whose management is entirely automated can be exported to other platforms. +The aim of this paper is to bring this concept to a higher level than Kubernetes itself. + +### Operator Design Pattern +This section describes the pattern with high-level concepts. +The next section _Kubernetes Operator Definition_ will describe the implementations of the pattern in terms of Kubernetes objects and concepts. + +The operator design pattern defines how to manage application and infrastructure resources using domain-specific knowledge and declarative state. The goal of the pattern is to reduce the amount of manual imperative work (how to backup, scale, upgrade...) which is required to keep an application in a healthy and well-maintained state, by capturing that domain specific knowledge in code and exposing it using a declarative API. + +By using the operator pattern, the knowledge on how to adjust and maintain a resource is captured in code and often within a single service (also called a controller). + +When using an operator design pattern the user should only be required to describe the desired state of the application and resources. The operator implementation should make the necessary changes in the world so it will be in the desired state. The operator will also monitor the real state continuously and take actions to keep it healthy and in the same state (preventing drifts). + +A general diagram of an operator will have software that can read the desired spec and can create and manage the resources that were described. + +![Operator Design Pattern](img/02_1_operator_pattern.png) + +The Operator pattern consists of three components: + +* The application or infrastructure that we want to manage. +* A domain specific language that enables the user to specify the desired state of the application in a declarative way. +* A controller that runs continuously: + * Reads and is aware of the state. + * Runs actions when operations state changes in an automated way. + * Report the state of the application in a declarative way. + +This design pattern will be applied on Kubernetes and its operators in the next sections. + +### Operator Characteristics +The core purpose of any operator is to extend its orchestrator's underlying API with new domain knowledge. As an example, an orchestration platform within Kubernetes natively understands things like containers and layer 4 load balancers via the Pod and Service objects. An operator adds new capabilities for more complex systems and applications. 
For instance, a prometheus-operator introduces new object types _Prometheus_, extending Kubernetes with high-level support for deploying and running Prometheus servers. + +The capabilities provided by an operator can be sorted into three overarching categories: dynamic configuration, operational automation and domain knowledge. + +#### Dynamic Configuration +Since the early stages of software development, there have been two main ways to configure software: configuration files and environment variables. The cloud native world created newer processes, which are based on querying a well-known API at startup. Most existing software relies on a combination of both of these options. Kubernetes naturally provides many tools to enable custom configuration (such as ConfigMaps and Secrets). Since most Kubernetes resources are generic, they don’t understand any specifics for modifying a given application. In comparison, an operator can define new custom object types (custom resources) to better express the configuration of a particular application in a Kubernetes context. + +Allowing for better validation and data structuring reduces the likelihood of small configuration errors and improves the ability of teams to self-serve. This removes the requirement for every team to house the understanding of either the underlying orchestrator or the target application as would be traditionally required. This can include things like progressive defaults, where a few high-level settings are used to populate a best-practices-driven configuration file or adaptive configuration such as adjusting resource usage to match available hardware or expected load based on cluster size. + +#### Operational Automation +Along with custom resources, most operators include at least one custom controller. These controllers are daemons that run inside the orchestrator like any other but connect to the underlying API and provide automation of common or repetitive tasks. This is the same way that orchestrators (like Kubernetes) are implemented. You may have seen kube-controller-manager or cloud-controller-manager mentioned in your journey so far. However, as was demonstrated with configuration, operators can extend and enhance orchestrators with higher-level automation such as deploying clustered software, providing automated backups and restores, or dynamic scaling based on load. + +By putting these common operational tasks into code, it can be ensured they will be repeatable, testable and upgradable in a standardized fashion. Keeping humans out of the loop on frequent tasks also ensures that steps won’t be missed or excluded and that different pieces of the task can’t drift out of sync with each other. As before, this allows for improved team autonomy by reducing the hours spent on boring-but-important upkeep tasks like application backups. + +#### Domain Knowledge +Similar to operational automation, it can be written into an operator to encode specialized domain knowledge about particular software or processes. A common example of this is application upgrades. While a simple stateless application might need nothing more than a Deployment’s rolling upgrade; databases and other stateful applications often require very specific steps in sequence to safely perform upgrades. The operator can handle this autonomously as it knows your current and requested versions and can run specialized upgrade code when needed. 
More generally, this can apply to anything a pre-cloud-native environment would use manual checklists for (effectively using the operator as an executable runbook). +Another common way to take advantage of automated domain knowledge is error remediation. For example, the Kubernetes built-in remediation behaviors mostly start and end with “restart container until it works” which is a powerful solution but often not the best or fastest solution. +An operator can monitor its application and react to errors with specific behavior to resolve the error or escalate the issue if it can’t be automatically resolved. This can reduce MTTR (mean time to recovery) and also reduce operator fatigue from recurring issues. + +### Operator Components in Kubernetes + +*“An operator is a Kubernetes controller that understands 2 domains: Kubernetes and something else. By combining knowledge of both domains, it can automate tasks that usually require a human operator that understands both domains”* +(Jimmy Zelinskie, https://github.com/kubeflow/tf-operator/issues/300#issuecomment-357527937) + +![Operator Big Picture](img/02_2_operator.png) +Operators enable the extension of the Kubernetes API with operational knowledge. +This is achieved by combining Kubernetes controllers and watched objects that describe the desired state. The controller can watch one or more objects and the objects can be either Kubernetes primitives such as Deployments, Services or things that reside outside of the cluster such as Virtual Machines or Databases. + +The desired state refers hereby to any resource that is defined in code and which the operator is configured to manage. Subsequently, the current state references the deployed instance of those resources. + +The controller will constantly compare the desired state with the current state using the reconciliation loop which ensures that the watched objects get transitioned to the desired state in a defined way. + +The desired state is encapsulated in one or more Kubernetes custom resources and the controller contains the operational knowledge which is needed to get the objects (such as deployments, services) to their target state. + +#### Kubernetes Controllers +A Kubernetes Controller takes care of routine tasks to ensure the desired state expressed by a particular resource type matches the current state (current state,https://engineering.bitnami.com/articles/a-deep-dive-into-kubernetes-controllers.html, https://fntlnz.wtf/post/what-i-learnt-about-kubernetes-controller/). For instance, the Deployment controller takes care that the desired amount of pod replicas is running and a new pod spins up, when one pod is deleted or fails. + +Technically, there is no difference between a typical controller and an operator. Often the difference referred to is the operational knowledge that is included in the operator. Therefore, a controller is the implementation, and the operator is the pattern of using custom controllers with CRDs and automation is what is looking to be achieved with this. As a result, a controller which spins up a pod when a custom resource is created, and the pod gets destroyed afterwards can be described as a simple controller. If the controller has operational knowledge like how to upgrade or remediate from errors, it is an operator. 
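
For comparison with the application-specific custom resource shown in the next section, the following minimal Deployment manifest (illustrative names and image only) shows the kind of declarative desired state a built-in controller reconciles: the Deployment controller continuously works to keep three replicas of the pod template running, without any application-specific knowledge.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3                  # desired state: the Deployment controller keeps 3 pods running
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: example.com/example-app:0.0.1   # illustrative image reference
```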
+ +#### Custom Resources and Custom Resource Definitions +Custom resources are used to store and retrieve structured data in Kubernetes as an extension of the default Kubernetes API (https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/). +In the case of an operator, a custom resource contains the desired state of the resource (e.g. application) but does not contain the implementation logic. Such information could be the version information of application components, but also enabled features of an application or information where backups of the application could be part of this. A custom resource definition (CRD) defines how such an object looks like, for example, which fields exist and how the CRD is named. Such a CRD can be scaffolded using tools (as the operator SDK) or be written by hand. + + +The following example illustrates, how such an custom resource instance definition could look like: + +```yaml +apiVersion: example-app.appdelivery.cncf.io/v1alpha1 +kind: ExampleApp +metadata: + name: appdelivery-example-app +spec: + appVersion: 0.0.1 + features: + exampleFeature1: true + exampleFeature2: false + backup: + enabled: true + storageType: “s3” + host: “my-backup.example.com” + bucketName: “example-backup” + status: + currentVersion: 0.0.1 + url: https://myloadbalancer/exampleapp/ + authSecretName: appdelivery-example-app-auth + backup: + lastBackupTime: 12:00 +``` + +This example represents a custom resource with the name “appdelivery-example-app” of the kind “ExampleApp”. + +The “spec” section is where the user can declare the desired state. This example declares that appVersion 0.0.1 should be deployed with one feature enabled and another disabled. Furthermore, backups of this application should be made, and a s3 bucket should be used. + +The “status” section is where the operator can communicate useful information back to the user. In this example, the status shows the current deployed version. If it is different from the “appVersion” in the spec, then the user can expect that the operator is working to deploy the version requested in the spec. Other common information in the status section includes how to connect to an application and the health of the application. + +#### Control Loop +The control (reconciliation) loop in a Kubernetes controller ensures that the state that the user declares using a CRD matches the state of the application, but also that the transition between the states works as intended. One common use-case could be the migration of database schemes when upgrading an application. The control loop can be triggered on specific events, as a change on the crd, but also time-based, like for backing up data at a defined time. + +### Operator Capabilities +An operator is able to assist with operating an application or other managed components by solving many different tasks. When talking about operators, the first and most well known capability is the ability of installing and upgrading stateful applications. However, an operator could manage the full lifecycle of an application without requiring manual input on installation/upgrades. + +The following sections should give an overview about capabilities an operator could have and what a user can expect if an operator implements these capabilities. + +#### Install an Application / Take Ownership of an Application +An operator should be able to provision and set up all the required resources, so no manual work would be required during the installation. 
An operator must check and verify that the resources that were provisioned are working as expected and ready to be used.
+
+An operator should also be able to recognize resources that were provisioned before the installation process, and only take ownership of them for later use. In this case, the ownership process should be seamless and not cause downtime. The purpose of the ownership process is to enable easy migration of resources to the operator.
+
+An operator should report the version of the resources and their health status during the process.
+
+#### Upgrade an Application
+An operator should be able to upgrade the version of the application/resources. The operator should know how to update the required dependencies and execute custom commands, such as running a database migration.
+
+An operator should monitor the update and roll back if there is a problem during the process.
+
+An operator should report the version of the resources and their health status during the process. If there was an error, the version reported should be the version that is currently in use.
+
+#### Backup
+
+This capability applies to operators that manage data; it ensures that the operator is able to create consistent backups. Backups should be done in a way that lets the user of the operator be certain that the previous state can be restored if data is lost or compromised. Furthermore, the status information provided should give insight into when the backup last ran and where it is located.
+
+![Example Backup Process](plantuml/backup-sequence.png)
+
+The above illustration shows what such a process could look like. At first, the backup gets triggered, either by a human or by another trigger (e.g. a time-based trigger). The operator instructs its watched resource (the application) to set up a consistent state (such as a consistent snapshot). Afterwards, the data of the application gets backed up to external storage using appropriate tools. This could either be a one-step process (backup directly to external storage) or happen in multiple steps, such as writing to a persistent volume first and to the external storage afterwards. The external storage might be an NFS/CIFS share (or any other network file system) on premises, or an object store/bucket on a cloud provider's infrastructure. Whether the backup failed or succeeded, the state of the backup, including the backed-up application version and the location of the backup, might be written to the status section of the custom resource.
+
+#### Recovery from backup
+
+The recovery capability of an operator might assist a user in restoring the application state from a successful backup. Therefore, the application state (application version and data) should be restored.
+
+There might be many ways to achieve this. One possible way could be that the current application state (including configuration) also gets backed up, so the user only has to create a custom resource for the application and point to the backup. The operator would read the configuration, restore the application version and restore the data. Another possible solution might be that the user only backed up the data and has to specify the application version to use. Either way, the operator ensures that the application is up and running afterwards, using the data from the specified backup. 
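
As a sketch only, building on the `ExampleApp` custom resource shown earlier and assuming a hypothetical `ExampleAppRestore` kind and field names (none of which are defined by this paper), a restore request handled by such an operator might look like this:

```yaml
apiVersion: example-app.appdelivery.cncf.io/v1alpha1
kind: ExampleAppRestore            # hypothetical kind, for illustration only
metadata:
  name: appdelivery-example-app-restore
spec:
  applicationRef:
    name: appdelivery-example-app  # the application custom resource to restore into
  backup:
    storageType: "s3"
    host: "my-backup.example.com"
    bucketName: "example-backup"
    snapshotId: "2021-04-20T12-00" # hypothetical identifier of the backup to restore from
  appVersion: 0.0.1                # optional if the application version was stored with the backup
```

The operator would read such a resource, restore the application at the requested version, load the data from the referenced backup, and report progress in the resource's status section.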
+
+#### Auto-Remediation
+The auto-remediation capability of an operator should ensure that it is able to restore the application from a more complex failed state, one which might not be handled or detected by mechanisms such as health checks (liveness and readiness probes). To do this, the operator needs a deep understanding of the application. This can be achieved through metrics that indicate application failures or errors, as well as through Kubernetes mechanisms such as health checks.
+
+Some examples might be:
+* Rolling back to the last known configuration if a defined number of pod starts is unsuccessful after a version change.
+  In some cases a restart of the application might be a sufficient short-term remedy, which could also be performed by the operator.
+* An operator could also inform another operator of a dependent service that a backend system is currently not reachable (so that it can take remediation actions).
+
+In any situation, this capability enables the operator to take actions to keep the system up and running.
+
+
+#### Monitoring/Metrics - Observability
+While the managed application should provide telemetry data for itself, the operator could provide metrics about its own behavior and a high-level overview of the application's state (as would be needed for auto-remediation). Typical telemetry data provided by the operator could include the count of remediation actions, the duration of backups, and information about recent errors or operational tasks that were handled.
+
+
+#### Scaling
+Scaling is part of the day-2 operations that an operator can manage in order to keep the application/resources functional. The scaling capability doesn't require the scaling to be automated; it only requires that the operator knows how to change the resources in terms of horizontal and vertical scaling.
+
+An operator should be able to increase or decrease any resource that it owns, such as CPU, memory, disk size and number of instances.
+
+Ideally, scaling actions happen without downtime. A scaling action ends when all the resources are in a consistent state and ready to be used, so an operator should verify the state of all the resources and report it.
+
+#### Auto-Scaling
+An operator should be able to perform scaling automatically, based on metrics that it collects continuously and on configured thresholds. An operator should be able to automatically increase and decrease every resource that it owns.
+
+An operator should respect a basic scaling configuration of minimum and maximum bounds.
+
+
+#### Auto-Configuration tuning
+This capability should empower the operator to manage the configuration of the managed application. As an example, the operator could adapt the memory settings of an application to the operating environment (e.g. Kubernetes) or to a change of DNS names. Furthermore, the operator should be able to handle configuration changes in a seamless way; for example, if a configuration change requires a restart, the restart should be triggered.
+
+These capabilities should be transparent to the users. The user should be able to override such auto-configuration mechanisms if they want to do so. Furthermore, automatic reconfigurations should be well documented so that the user can comprehend what is happening on the infrastructure. 
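
As an illustration only, and using invented field names on the `ExampleApp` resource from the earlier example, the scaling bounds and an explicit override of an auto-tuned setting could be expressed declaratively like this:

```yaml
spec:
  scaling:
    minReplicas: 2          # lower bound the operator must respect
    maxReplicas: 10         # upper bound the operator must respect
    metric: cpu             # hypothetical: metric watched to trigger auto-scaling
    targetUtilization: 70
  configuration:
    autoTune: true          # let the operator derive settings from the environment
    overrides:
      memoryLimit: "2Gi"    # user-supplied value that takes precedence over auto-tuning
```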
+ +#### Uninstalling / Disconnect +When deleting the declarative requested state (in most cases a custom resource), an operator should allow two behaviors: +- Uninstalling: An operator should be able to completely remove or delete every managed resource. +- Disconnecting: An operator should stop managing the provisioned resources. + +Both processes should be applied to every resource that the operator directly provisioned. +An operator should report any failure in the process in a declarative way (using the [status field](https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects/#object-spec-and-status) for example). + +## Security +![operator model](img/04_1_operator_model.png) + +Operators are intended to manage their state and configuration via the Kubernetes API server using the Custom Resource Definition. The subordinate API resources they manage (often pods running stateful applications) also have their lifecycle and supporting RBAC, services, etc. managed via the Kubernetes API. In some cases, the operator will also interact with the application’s API across the network. All of these routes offer the potential to compromise the operator and its resources and should be protected in line with best practices laid out below. + +### Operator Developer +Operator developers should be aware of the security risks an operator +introduces and document its secure use. While developing an operator +it's important to focus on key areas such as transparency and +documentation, operator scope, and vulnerability analysis. + +#### Transparency and Documentation +During the development of an operator, a developer should have a clear understanding of how it will work and interface within Kubernetes. As developers shift from development to publishing the operator, users should be provided with a clear understanding of what the operator does, and how. +You've written something you're proud of, but think of this from the end user's point of view: Should they trust source code from the internet, an operator to run with administrative access on their cluster which may be large and costly, or maybe handling sensitive information? Anything the developer can do to help a user get up to speed with their software, how it works, how it's secured, and what effects it might have on their cluster will make it easier for them to adopt the software. + +Here are some items that can help users make informed decisions +about if they should use an operator: + +* Descriptive diagram (threat model) of how the operator is + communicating and with what is a good start to helping a user + understand how they must secure it and apply policy for the operator. +* Use case of how the software is intended to be used in order to + stay in scope for compliance or you risk vulnerability outside that + scope. +* Documented RBAC scopes, threat model, communication ports, API + calls available, pod security policy requisites (or other policy engine + requisites), or any other policy engine requisites developed for + Kubernetes such as OPA. +* Security reporting, disclosure, and incident response processes: + If someone finds a potential security issue, who should they contact + and what type of response should they expect? +* Logging and monitoring attachment through exposed endpoints, log + levels, or log aggregation. +* Operator issue, feature, version tracking. 
+* If the project has had security disclosures in the past, listing + these disclosures (and their CVE IDs) on a web page is a strong step + in building trust with users. Everyone will have security + issues at some point - how they are handled displays the maturity + of a project. + +For further ideas around the security of the development process, +the reader may wish to review the CNCF Security TAG's [self-assessment +questionnaire](https://github.com/cncf/sig-security/blob/master/assessments/guide/self-assessment.md). + +#### Operator Scope + +There are many use cases for operators and there is virtually no limit +in the scope of what you can design it for. In order to be clear about +the secure nature of an operator there should be clear communication +involved with each scope. The general scope’s which could be used are +cluster-wide operators, namespace operators, and external operators. In +order to best secure them, there needs to be an understanding of the +communication, any API’s created, controllers and their responsibility, +and any application metric endpoints. If this information is provided +with the operator it can be used to further secure the operator +application within the scope of implementation. If the information is +not provided you can be left vulnerable to a myriad of attacks. + +**Cluster-wide Operators** exist to execute custom resources across a +cluster no matter if those resources are living in another namespace +or not. +**Namespace Operators** exist to execute custom resources within a +namespace. Usually there are policy engine policies applied to jail the +scope within the namespace and only communicate with pods within the +namespace. This is considered more secure by nature, but the same rules +apply. +**External Operators** exist to execute custom resources that are +external to the cluster. The same rules apply, in addition to secure this +scope we must know the nature of the communication from the cluster to +the external component. + +While this paper also discusses scoping from a user point-of-view, +how an operator is designed will weigh heavily on the type of +security controls which can be applied against it in production. +It is common to start with lax permissions, and intentions to apply +security concepts before release; Spending some time thinking about +the security design of the operator as developers begin work on it +will make this process much easier for developers and their users. + +#### Vulnerability Analysis + +Being focused on the development and security of the operator, +there are steps that must be taken as an operator developer to ensure +validation and proper security analysis has been done. Following the +guidelines in the CNCF Cloud Native Security Whitepaper there is a +clear lifecycle process which defines the [layers of concern](https://github.com/cncf/sig-security/blob/master/security-whitepaper/cloud-native-security-whitepaper.md#cloud-native-layers) for the operator developer. All three layers +should be adhered to with a strict focus on the develop and distribute +layers in the scope of the operator developer. There are many detailed +guidelines in the development and distribution layers that will help +to apply sound vulnerability analysis to supply chain to ensure +that the operator being developed is signed and trusted for the best +integrity. 
The CNCF [Cloud Native Security Whitepaper](https://github.com/cncf/sig-security/blob/master/security-whitepaper/cloud-native-security-whitepaper.md) +is available at this link. + +In addition to the supply chain there needs to be a focus on +performing a threat model of the operator to keep the developer +in check and also make sure that there was nothing incidentally missed +that could leave the door open for attack. The foundational model for +checking for threats can be observed in the CNCF Cloud Native Security +Whitepaper on [Threat Modeling](https://github.com/cncf/sig-security/blob/master/security-whitepaper/cloud-native-security-whitepaper.md#threat-modeling). + +### Application Developer (Operator-"Users") + +Operators perform administrative tasks on the user’s behalf such +as volume creation/attachment, application deployment, and +certificate management. As the user is delegating control to the +operator, it is essential to provide machine authorization to perform +the actions needed, but one must also be careful to not grant more +privileges than necessary for the operator to perform its role. + +Deployment of an operator grants third-party software some level +of access to a Kubernetes namespace or cluster. While security +expertise is not required to use operators, the following Kubernetes +concepts highlight security preparation when using an operator: + +**Namespaces** are one of the primary ways of grouping and cordoning a +group of resources. In regards to an operator, the user should +consider what namespaces the operator needs to work with. While +there may be some use cases where a single operator needs access +to the whole cluster, it seems the common use case in 2021 is for +an operator to work with a specific application within Kubernetes, +so it usually makes sense to provide a namespace for that application +and related resources/operators. To further reduce the operator’s +separation from any loose or stolen RBAC in the subordinate resource’s +namespace, a dedicated namespace for the operator provides more +separation. + +**Role-Based Access Controls** are available in modern releases of +Kubernetes. When granting an operator access to resources, the focus +should be on granting the most limited set of permissions needed +for the operator to perform its task. This means only grant +ClusterRoles if absolutely necessary, but granting specific permissions +for specific resources/namespaces. The +[Using RBAC Authorization](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) +chapter of the user guide covers this topic in detail. Operator +build kits such as the Operator SDK use general RBAC defaults that +developers may have not refined for their specific operator. +Permissions afforded by the service account identity outside the +cluster include federated and cross-cluster operators that have +permissions in other Kubernetes clusters. As operators are increasingly +used to manage off-cluster and cloud resources, cloud IAM integration +permissions should be configured to prevent cloud account takeover +from a compromised operator. + +_One thing to note_: A “land grab” of privileges - e.g requesting +significant/administrative access - is not always malicious in +intent. The developer might not know better or have had the time +to tune the required permissions to the concept of least privilege. 
+Even in the most innocent case, though, it is still a red flag: +Perhaps the operator is reached enough adoption for others to find and +raise concerns about the overuse of privileges, and perhaps it is a +sign of other security weaknesses within the operator. It is advisable +to proceed with caution if such a “land grab” is found. + +**Software provenance**: The “software supply chain” is starting to get +more attention at the time of writing this whitepaper. Consider the +source for an operator, how it is being installed, and how or why +a malicious user may want access to a kubernetes cluster. Spend a few minutes +reviewing an installation script before running it. While the kubectl +command supports the ability to apply a yaml script directly from +the public Internet (e.g `kubectl create -f +https://publicwebsite.com/install/operator.yaml`) it is strongly +recommended that one first downloads that file locally, review it, +and then run `kubectl create -f operator.yaml`. + +To review the script ask the following questions: + +* What is the purpose of this script? +* What resources are being created by the script? Is this script creating Roles and RoleBindings? +* What 3rd party sources will the script attempt to use? (e.g. + container images, other yaml files) How popular and well-maintained + are the git and docker image repositories? These might be signs of + a new project, abandoned software which is no longer receiving + security updates, or indicators of an unofficial repository with + malicious intent. +* What privileges does the script attempt to gain? Does the script + attempt to run container securityContexts with host sharing or + “privileged mode”? + +More information about software supply chain security is available in the [CNCF Supply Chain Security White Paper](https://github.com/cncf/tag-security/tree/main/supply-chain-security/supply-chain-security-paper). + +**Advanced security controls**, such as SELinux, AppArmor, or seccomp +may be mandated by cluster policy. Open source operators are unlikely +to have configurations for these Linux security modules, but if +an organization is familiar with one of these control systems, +writing the appropriate security configuration for the operator +should not require significant overhead. + +**Operator configuration**: Ideally a project will be “secure by default” to increase the likelihood of a secure operator or application deployment. Insecure defaults require manual configuration to secure the environment. While it may seem like unnecessary work to learn the configuration parameters of a new operator, it is usually preferable to manually adjusting the configuration and/or source code of an operator itself to reach the needed level of security. + +## Operator Frameworks for Kubernetes +Currently, many frameworks exist to simplify the process of bootstrapping an operator/controller project and to write operators. This chapter describes some of them without any claim to comprehensiveness. + +### CNCF Operator Framework + +The *[Operator Framework](https://github.com/operator-framework)* is an open source toolkit to manage Kubernetes native applications, called Operators, in an effective, automated, and scalable way. 
+ +It aims at Operator Developers with an SDK to streamline Operator development with scaffolding tools (based on [kubebuilder](https://github.com/kubernetes-sigs/kubebuilder)), a test harness for unit tests and integration as well as functional tests and packaging / distribution mechanisms to publish version histories of Operators in conjunction with a user-configurable update graph. Supported project types are Golang, Helm and Ansible. Python and Java are currently in development. + +It also caters for Kubernetes administrators that require a central point to install, configure and update Operators in a multi-tenant environment with potentially dozens of Operators installed. It covers the following aspects of Operator lifecycle: + +- Continuous over-the-Air Updates and Catalogs of Operators a publishing mechanism and source of updates +- Dependency Model so Operator can have dependencies on cluster features or on each other +- Discoverability for less privileged tenants that usually cannot list CRDs or see Operators installed in separate namespaces +- Cluster Stability that avoid runtime conflicts of Operators on multi-tenant clusters while honoring the global nature of CRDs, and the subtleties of CRD versioning and CRD conversion +- Declarative UI controls that allows consoles to generate rich UI experiences for end users interacting with Operator services + +Main advantages of the Operator Framework are: + +- Simplified development: The Operator Framework simplifies the development of Kubernetes operators by providing a framework, tooling, and best practices for building operators. + +- Reusability: The Operator Framework promotes the creation of reusable operators, which can be used across different applications and projects. + +- Kubernetes-native: The Operator Framework is built on top of Kubernetes APIs and conventions, making it easier to develop operators that integrate well with the Kubernetes ecosystem. + +- Robustness: The Operator Framework generates code that adheres to best practices for building Kubernetes operators, making it easier to build robust, production-grade applications. + +- Community-driven: The Operator Framework has a large and active community that provides support, resources, and examples to help developers get started and solve problems. + +Main limitations: +- Learning curve: Building Kubernetes operators with the Operator Framework can still be a complex task, especially for developers who are new to Kubernetes or the concept of operators. + +- Limited flexibility: The Operator Framework is opinionated about how Kubernetes operators should be built, which can limit flexibility and customization options in certain cases. + +- Performance overhead: Kubernetes operators built with the Operator Framework can add a performance overhead to the Kubernetes cluster, especially for large-scale or distributed applications. + +- Maintenance: The Operator Framework requires ongoing maintenance and updates to ensure compatibility with new Kubernetes releases and changes to the operator's dependencies. + +### Kopf + +**[Kopf](https://github.com/nolar/kopf)** —**K**ubernetes **O**perator **P**ythonic **F**ramework— is a framework +to create Kubernetes operators faster and easier, just in a few lines of Python. 
+It takes away most of the low-level Kubernetes API communication hassle and +marshalls the Kubernetes resource changes to Python functions and back: + +```python +import kopf + +@kopf.on.create(kind='KopfExample') +def created(patch, spec, **_): + patch.status['name'] = spec.get('name', 'world') + +@kopf.on.event(kind='KopfExample', field='status.name', value=kopf.PRESENT) +def touched(memo, status, **_): + memo.last_name = status['name'] + +@kopf.timer('KopfExample', interval=5, when=lambda memo, **_: 'last_name' in memo) +def greet_regularly(memo, **_): + print(f"Hello, {memo['last_name']}!") +``` + +You should consider using this framework if you want or need to make ad-hoc +(here-and-now one-time non-generalizable) operators in Python 3.7+; especially if you want to bring your application domain directly to Kubernetes as custom +resources. +For more features, see the [documentation](https://kopf.readthedocs.io/en/stable/). + +Main advantages of using kopf: +- Easy to use: Kopf is designed to be easy to use and understand, making it a great choice for developers who are new to Kubernetes or building operators. + +- Python-based: As a Python-based framework, Kopf allows developers to leverage the vast Python ecosystem and libraries, making it easier to integrate with other tools and systems. + +- Declarative approach: Kopf provides a declarative approach to building operators, which makes it easier to define the desired state of the system and handle updates and changes automatically. + +- Lightweight and fast: Kopf is lightweight and has a low overhead, making it a good choice for building operators that need to be deployed in resource-constrained environments. + +Main limitations: + +- Python-specific: While Python is a popular language, some developers may prefer to use other languages to build Kubernetes operators. + +- Limited adoption: Compared to other frameworks and tools, Kopf has a relatively small community and limited adoption, which can limit the availability of resources and support. + +- Limited flexibility: Kopf is designed to be simple and easy to use, which can limit its flexibility and customization options for more complex or specialized use cases. + +- Learning curve: While Kopf is designed to be easy to use, building Kubernetes operators still requires knowledge of Kubernetes concepts and best practices, which can be a challenge for new users. + +### kubebuilder + +The kubebuilder framework provides developers the possibilities to extend the Kubernetes API by using Custom Resource Definitions, and to create controllers that handle these custom resources. + +The main entry point provided by the kubebuilder framework is a *Manager*. In the same way the native Kubernetes controllers are grouped into a single Kubernetes Controller Manager (`kube-controller-manager`), you will be able to create several controllers and make them managed by a single manager. + +As Kubernetes API resources are attached to domains and arranged in Groups, Versions and Kinds, the Kubernetes custom resources you will define will be attached to your own domain, and arranged in your own groups, versions and kinds. + +The first step when using kubebuilder is to create a project attached to your domain, that will create the source code for building a single Manager. + +After initiating your project with a specific domain, you can add APIs to your domain and make these APIs managed by the manager. 
+
+Adding a resource to the project will generate some sample code for you: a sample *Custom Resource Definition* that you will adapt to build your own custom resource, and a sample *Reconciler* that will implement the reconcile loop for your operator handling this resource.
+
+The kubebuilder framework leverages the `controller-runtime` library, which provides the Manager and Reconciler concepts, among others.
+
+The kubebuilder framework provides all the requisites for building the manager binary, the image of a container starting the manager, and the Kubernetes resources necessary for deploying this manager, including the `CustomResourceDefinition` resource defining your custom resource, a `Deployment` to deploy the manager, and RBAC rules for your operator to be able to access the Kubernetes API.
+
+Main advantages of using kubebuilder are:
+
+- Simplified development: Kubebuilder provides a framework and tooling to scaffold and automate much of the boilerplate code required for building Kubernetes controllers and API servers, allowing developers to focus on business logic.
+
+- Kubernetes-native: Kubebuilder is built on top of Kubernetes APIs and conventions, making it easier to develop controllers and APIs that integrate well with the Kubernetes ecosystem.
+
+- Reusability: Kubebuilder encourages the creation of reusable, composable controllers and APIs, which can be shared across different applications and projects.
+
+- Robustness: Kubebuilder generates code that adheres to best practices for building Kubernetes controllers and APIs, making it easier to build robust, production-grade applications.
+
+Main limitations:
+
+- Learning curve: Kubebuilder has a significant learning curve, especially for developers who are new to Kubernetes or the Go programming language.
+
+- Complexity: While Kubebuilder simplifies much of the development process, building Kubernetes controllers and APIs can still be a complex task, especially for large-scale or distributed applications.
+
+- Limited flexibility: Kubebuilder is opinionated about how Kubernetes controllers and APIs should be built, which can limit flexibility in certain cases. For example, it may not be suitable for building highly customized or specialized controllers.
+
+### Metacontroller - Lightweight Kubernetes Controllers as a Service
+
+[Metacontroller](https://metacontroller.github.io/metacontroller/) is an operator that makes it easy to write and deploy custom operators.
+
+It itself introduces two CRDs (as of 2021):
+* [Composite Controller](https://metacontroller.github.io/metacontroller/api/compositecontroller.html) - allowing you to write an operator triggered by a CRD
+* [Decorator Controller](https://metacontroller.github.io/metacontroller/api/decoratorcontroller.html) - allowing you to write an operator triggered by any Kubernetes object (including objects managed by other operators)
+
+Metacontroller itself, configured by one of its CRDs, takes care of observing cluster state and calls a controller provided by the user (the user controller) to take actions.
+
+Given the watched resources as input, the user controller should compute the desired state of the dependent objects.
+
+This could also be called the `lambda controller` pattern (more on this [here](https://metacontroller.github.io/metacontroller/concepts.html#lambda-controller)), as the output is calculated from the input alone, and the logic called by Metacontroller could even reside at a Function-as-a-Service provider. 
+
+Main advantages of Metacontroller:
+* Only a function (called via webhook) needs to be provided, without any boilerplate related to watching Kubernetes resources
+* Such a function can be written in any language and exposed via HTTP
+
+Main limitations:
+* Only the patterns mentioned above can be implemented
+* The current architecture relies on a single metacontroller in a cluster
+* Metacontroller is not aware of any external state; it relies entirely on cluster state
+
+The example Metacontroller configuration shown below adds network exposure for a `StatefulSet` without explicitly defining a `Service` manifest.
+```yaml
+apiVersion: metacontroller.k8s.io/v1alpha1
+kind: DecoratorController
+metadata:
+  name: service-per-pod
+spec:
+  resources:
+  - apiVersion: apps/v1
+    resource: statefulsets
+    annotationSelector:
+      matchExpressions:
+      - {key: service, operator: Exists}
+      - {key: ports, operator: Exists}
+  attachments:
+  - apiVersion: v1
+    resource: services
+  hooks:
+    sync:
+      webhook:
+        url: http://service-per-pod.metacontroller/sync-service-per-pod
+        timeout: 10s
+```
+With the above configuration:
+* for every object matching the `spec.resources` description (in this case, `apps/v1/statefulsets` with `service` and `ports` annotations), `metacontroller` will watch for any change (create/update/delete) and invoke `hooks.sync` on each of them
+* the `hooks.sync` hook can return objects described in `spec.attachments` (in this case, `v1/services`), which will be created/updated/deleted by `metacontroller` according to the hook response
+
+For example, if the `StatefulSet` below is deployed:
+```yaml
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  annotations:
+    service: "statefulset.kubernetes.io/pod-name"
+    ports: "80:8080"
+...
+```
+the following `Service` object will be created by Metacontroller:
+```yaml
+apiVersion: "v1"
+kind: "Service"
+spec:
+  selector: "statefulset.kubernetes.io/pod-name"
+  ports:
+  - port: 80
+    targetPort: 8080
+```
+
+The user-defined endpoint (in this example, `http://service-per-pod.metacontroller/sync-service-per-pod`) only needs to compute what the `Service` should look like for a given `StatefulSet`.
+
+Additional examples and ideas for what could be implemented using Metacontroller can be found on the [metacontroller-examples](https://metacontroller.github.io/metacontroller/examples.html) page.
+
+For questions, visit the Metacontroller Slack channel ([#metacontroller](https://kubernetes.slack.com/archives/CA0SUPUDP)) or ask on [GitHub discussions](https://github.com/metacontroller/metacontroller/discussions/).
+
+### Juju - Model-driven Operator Framework
+
+The Juju Operator Framework is an open-source tool that simplifies the deployment, management, and scaling of complex applications in cloud and container environments. Juju provides a powerful model-driven approach that allows developers to create reusable and composable "charms" to encapsulate application knowledge, configuration, and logic. These charms can be easily deployed and orchestrated by Juju "operators," which are automated agents that handle the lifecycle of an application. One of the significant advantages of Juju is its ability to abstract away the underlying infrastructure, making it easier to deploy and manage applications across multiple clouds and container environments.
+
+Below is an example of an integration between a web app charm and a database charm.
+```
+# Database charm
+name: charm-db
+# ...
+provides:
+  database:
+    interface: charm-db
+```
+
+```
+# A web app charm connecting to the database
+name: my-web-app
+# ...
+requires:
+  database:
+    interface: charm-db
+    limit: 1
+provides:
+  website:
+    interface: http
+    optional: true
+```
+
+Main advantages of Juju:
+- Abstraction: Juju provides a layer of abstraction that can help simplify the deployment and management of complex applications on Kubernetes. Juju can abstract away the complexity of Kubernetes APIs and allow developers to focus on the application logic.
+
+- Integration: In Juju, an integration is a connection between applications, or between different units of the same application (the latter are also known as ‘peer relations’). These relations between applications are defined in the charm, and Juju handles the integration between the applications. They allow you to pass data between applications and trigger actions through events.
+
+- Cloud-agnostic: Juju is cloud-agnostic and can deploy applications to various cloud providers and container platforms. This allows developers to deploy and manage their applications on any cloud or container platform with ease.
+
+- Model-driven approach: Juju is based on a model-driven approach that makes it easy to automate and manage the lifecycle of an application, including scaling, upgrading, and monitoring.
+
+Main limitations:
+- The framework has a learning curve, and creating effective charms can require significant development effort, making it more suitable for enterprise use cases than smaller projects.
+
+## Operator Lifecycle Management
+An operator is an application in its own right; this section describes considerations regarding the lifecycle of the operator itself.
+
+### Upgrading the Operator
+While upgrading the operator, special care should be taken with regard to the managed resources. During an operator upgrade, the managed resources should be kept in the same state and healthy.
+
+### Upgrading the Declarative State
+The declarative state is the API of the operator, and it may need to be upgraded. The use of CRD versions indicates the stability of the CRD and the operator - [read more about versioning a CRD](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/)
+
+### Managing Relations of CRDs
+
+As the number of Operators & CRDs grows, the complexity of managing them also increases. For example, how should conflicts between Operators, such as two ingress-related functions, be managed? How should dependencies and/or the correlation of data flow between CRDs, such as DB cluster and DB backup CRDs, be managed?
+
+To resolve this problem, we would need a concrete model to manage Operators & CRDs and
+a new mechanism to oversee them with a policy-based engine.
+Community efforts like [KubeVela](https://kubevela.io/) and [Crossplane](https://crossplane.io/)
+have been trying to solve this problem by providing solutions to compose CRDs.
+KubeVela also provides management of data dependencies between custom resources.
+
+## Use Cases for an Operator
+
+- Database management: Operators can be used to automate the deployment, scaling, and management of databases running on Kubernetes. For example, an operator could manage the deployment of a MySQL or PostgreSQL database cluster and perform tasks like scaling, backups, and upgrades.
+
+- Application deployment: Operators can be used to automate the deployment and management of complex applications running on Kubernetes.
 For example, an operator could manage the deployment of a containerized web application, handle rolling upgrades, and perform auto-scaling based on metrics like CPU usage.
+
+- Monitoring and logging: Operators can be used to manage the deployment and configuration of monitoring and logging tools like Prometheus or Elasticsearch. Operators can automate tasks like configuring monitoring alerts, collecting and aggregating logs, and performing backups.
+
+- Machine learning workflows: Operators can be used to manage the deployment and scaling of machine learning workflows and models. For example, an operator could manage the deployment of a TensorFlow or PyTorch cluster and handle tasks like scaling, model training, and deployment.
+
+- Infrastructure management: Operators can be used to manage the deployment and configuration of infrastructure resources like load balancers, storage volumes, and network policies. For example, an operator could manage the deployment of a load balancer and perform tasks like scaling and routing traffic.
+
+## Well-known Operators & patterns
+
+### Prometheus Operator
+
+The Prometheus Operator, along with the etcd Operator, was one of the first Operators ever written and proved the use case for this problem space.
+
+_"The Prometheus Operator serves to make running Prometheus on top of Kubernetes as easy as possible, while preserving Kubernetes-native configuration options."_
+
+When the [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md) is installed, besides the operator controller pod/deployment, a wide range of APIs becomes available to configure a Prometheus stack. The APIs are represented as Custom Resource Definitions (CRDs) which allow us to configure objects responsible, among other tasks, for:
+
+- Describing a set of targets to be monitored by Prometheus (ServiceMonitor).
+- Declaratively describing the desired state of a Prometheus deployment.
+- Describing an [AlertManager](https://github.com/prometheus/alertmanager) cluster to handle alerts sent by client applications.
+
+The benefit is that Kubernetes-native configuration is used to configure your whole operations stack, taking advantage of Kubernetes resource validation and self-healing capabilities.
+
+The Operator controller will then communicate with the Kubernetes API server to add Service metrics endpoints and automatically generate the required Prometheus scrape configurations for the configured Services.
+
+### Operator for GitOps
+Often, operators are associated with installing, upgrading and operating applications. One example of an operator that "operates" things without managing an application can be found in the GitOps world. GitOps is the practice of using Git as the single source of truth for all resources.
+
+There might be cases where a mainly imperatively managed application should be orchestrated in a more declarative and Git-driven way. In such cases, an operator can assist by fetching the configuration from a Git repository, analyzing it to find out whether something has to be changed and which actions should be taken, and then taking those actions.
+
+![GitOps Example](img/071_GitOps_UseCase.png)
+
+The above example illustrates such a case:
+
+1. A piece of configuration is checked into a Git repository.
+2. The operator is made aware of the Git repository through a custom resource definition (where the repository path and the information about the access secret are stored).
+3.
 The operator fetches the config and analyses it.
+4. It applies its operational knowledge to get from the current to the desired state (by querying the application about its current state and sending instructions to get to the desired state).
+
+This enables the user to have reproducible configurations, versioned in a git repository.
+
+### Successful Patterns
+
+Over time, many best practices for writing operators have been published by various sources. In the following, some of these sources are mentioned and parts of them are described based on a scenario.
+
+Scenario: A microservice application ("The PodTato Head", https://github.com/cncf/podtato-head) should be entirely managed via operators (even if another deployment mechanism would make more sense). This application consists of 4 services and 1 database which can be illustrated as follows:
+
+![Sample Application](./img/08_1_sample.png)
+
+Best practices should be applied to this application deployment.
+
+### Management of a single type of application
+
+The features an operator provides should be specific to a single application. Applied to our example, this means that there should be 5 operators, each managing one component (podtato-server, arm-service, foot-service, hat-service and the database). This provides a good separation of concerns for all of them (based on https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-building-kubernetes-operators-and-stateful-apps).
+
+### Operator of Operators
+
+With a growing count of Operators typically used within the lifecycle of application workload deployment and management, there are opportunities for new interplay of resources and meta behaviors across a group of Operators. Whether the goal is to reduce the cognitive burden of managing multiple asynchronous Operators performing resource changes, or to ensure a level of continuity between release versions, the *Operator of Operators* architecture is being applied in some use cases within the industry. This paradigm typically utilizes a *Meta* Operator to create multiple resources that are in turn asynchronously created and then updated in the meta resource. It enables a single custom resource definition to express a desired state outcome and for the requirements to be partitioned and asynchronously acted upon.
+
+![distributed](./img/09_1_distributedops.png)
+
+Coordinating the setup and lifecycle of the whole stack can remain complex. An Operator controlling a metadata resource can help shield the user from this complexity by coordinating the various parts of the stack and exposing a CRD representing the whole stack. If this is the case, the *Meta* operator should delegate the work to the other Operators for the more specific parts.
+
+The controllers that own these sub-components of stacks can appear in two ways:
+
+- An operator distribution package could consist of multiple separate controllers, each handling a sub-component of the stack, plus a main controller (responsible for the end-user-facing CRD representing the stack as a whole). Deploying such a multi-controller operator as a single package would result in all controllers running at once (one `Pod` each), but only the end-user-facing API/CRD is actually exposed and documented for public consumption. When that happens, the controller responsible for this API delegates several duties to the other controllers that are part of its package, using "internal" CRDs.
 This is useful when the whole "stack" is owned and developed by the same group of operator authors and the "subordinate" controllers don't make sense as a standalone project. To an end-user this set of controllers still appears as a single Operator. The main benefit here is separation of concerns within an operator project.
+
+![Stack-Operator](./img/08_2_umbrella.png)
+
+Technically, there would be a custom resource definition for the whole stack managed by an operator. This operator creates a custom resource for each of the components of the stack, which are in turn managed by operators that manage the underlying resources.
+
+- The second pattern depicted above describes higher-level workload Operators. These depend on other general-purpose operator projects to deploy sub-components of a stack. An example would be an Operator which depends on `cert-manager`, the `prometheus operator` and a `postgresql` operator to deploy its workload with rotating certificates, monitoring and a SQL database. In this case the higher-level workload operator should not try to ship and install `cert-manager` etc. at runtime. This is because the operator author then signs up for shipping and maintaining the particular versions of these dependencies as well as dealing with the general problem area of CRD lifecycle management.
+
+  *Instead, a package management solution should be employed that supports dependency resolution at install time, so that installing the other required operators is delegated to a package manager in the background and not done as part of the higher-level operator startup code.*
+
+  This is beneficial for operators that depend on other Operators, which are useful on their own and might even be shared with multiple other operators on the cluster. [OLM](https://github.com/operator-framework/operator-lifecycle-manager), part of the Operator Framework Project, is such a package manager.
+
+### One CRD per Controller
+Every CRD managed by an operator should be implemented in a single controller. This makes the code more readable and helps with separation of concerns.
+
+### Where to publish and find Operators
+There are services like operatorhub.io and artifacthub.io which help end-users find operators, including instructions on how they can be installed. These services often include information about current security issues and the sources of operators. Additionally, information about the capabilities of operators is given.
+
+### Further reading
+There are lots of other best practices, for example:
+* Operators shouldn't make assumptions about the namespaces they are deployed in
+* Use an SDK for writing operators
+
+Additional information can be found at the following sources:
+* https://github.com/operator-framework/community-operators/blob/master/docs/best-practices.md
+* https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-building-kubernetes-operators-and-stateful-apps
+
+## Designing Operators
+
+The previous chapter describes a use case for an operator that was one of the first operators ever. With no claim
+of completeness, this chapter deals with some best practices for writing
+your own Operators, based on our own experience or described by the
+community. However, without clear knowledge of the current state and
+without clear ideas of what we want to achieve, we also need some
+methods and techniques for specifying what our Operator should do.
+Therefore, we will also have to deal with some aspects of requirement
+engineering.
+ +### Requirement Analysis + +A key promise of Kubernetes is that it enables the automation of operational +tasks to deploy, scale, and manage containerized applications across +multiple environments with no (or minimal) human intervention. In +Kubernetes, stateless cloud native applications are well suited for +horizontal scaling, automated self-healing restarts, or progressive +rollout of new containers. However, stateful applications with complex +components running in clustered or distributed environments are not +always well suited for this type of container-based infrastructure. They +still require human interaction when it comes to persistence, upgrades, +or high availability to remain in a stable state. + +True, Kubernetes solves these issues in a novel way by creating and +managing custom applications using Operators. However, and here is the +first question: as a developer, do you really know how this type of +application works and interacts both internally and externally? How do +the day-to-day IT operations work? How is the application backed up +(including recovery)? What steps are necessary in case of failovers or +outages, are there any dependencies between the software components? + +It is therefore strongly recommended that a comprehensive requirement +analysis is needed to determine the requirements or conditions of an +Operator. Requirement analysis is critical to the success or failure of +Operators. All requirements should be documented, measurable, testable, +traceable, related to identified requirements, and defined at a level of +detail sufficient for system design. + +Steps to build the right operator: + +1. If unsure whether to use an operator or not, try to run a feasibility assessment instead. Find plausible and understandable reasons for using an Operator. Contrast the benefits of Operators with the effort required to implement and operate them. + +2. Study existing documentation of your application, interview + responsible system administrators and other stakeholders (if + necessary), get a list of possible system check activities, Business + and SLA-relevant KPI and compare them with existing incident + reports or bug tracking lists. + +3. Describe a concrete scenario (e.g., application failover) in detail + along the lines of "who does what, when, how, and why". + +4. Describe what an Operator needs to know to run the previous scenario + independently, keeping the application in a stable and productive + state. + +### Custom or third-party Operator + +Now that the situations where using an Operator have been made clear, the next part of the paper will focus on where Operator implementations are available and which best meets requirements. + +Finding the right Kubernetes Operator can be a challenge. On the one +hand, you need to find something that fits with the requirements you +have collected. On the other hand, the Operator needs to be regularly +updated and actively supported by the vendor. + +In short, to get an Operator, you have three choices: + +(1) You have a database and need an Operator? Consult the website of the +vendor. + +(2) You can search for a public (or private) registry that offer +available Kubernetes Operators. For example, \[1\] provides a +platform for publishing and sharing Operators in a way that +simplifies distribution. The platform makes it easier to find +supported services and basic documentation. It also identifies +active Operator communities and vendor-supported initiatives. 
+ +(3) Write your own Operator, either from scratch or using a suitable +framework. + +Operators are application specific and their functionality ranges from a +simple installation script to sophisticated logic that handles upgrades, +backups and failures. It takes time and effort to find the right +Operator in a public registry, at the cost of oversized or missing +functionality. In contrast, when writing a custom Operator, there are no +limits to the functionality developers want or need to implement, at the +cost of development and maintenance. + +### Use the right Tool + +After completing and having a complete requirements analysis and +deciding to write a custom Kubernetes Operator, the next question is +which tools developers should use. The article by \[2\] discusses +different approaches to writing Operators and lists the pros and cons of +each solution. The article focuses on one Operator as an example and +uses various techniques and tools. In detail, the author describes the +following tools: + +\(a\) Operator SDK (Helm, Go, Ansible). + +\(b\) Operator framework KOPF (Python) + +\(c\) Bare programming language (Java) + +As mentioned earlier, this article not only describes the individual +tools, but also compares their approaches. The author demonstrates that +the imperative programming approaches require more time, work and +caution during development. In return, they give developers the +flexibility to program any kind of logic that is needed. In contrast, +the declarative approaches (Helm Chart, Ansible) allow the +implementation of Operators in a very simple form, which is precise and +human-readable. + +Best practices of \[2\] are: + +1. If you **already have a Helm chart** for your software and you do + not need any complex capability levels =\> Operator SDK: Helm + +2. If you want **to create your Operator quickly** and you do not need + any complex capability levels =\> Operator SDK: Helm + +3. If you **want complex features** or/and be flexible about any future + implementations =\> Operator SDK: Go + +4. If you want to keep a **single programming language in your + organization** + + a. If a popular Operator Framework exists for your language or/and + you want to contribute to it =\> Operator Framework + + b. If no popular Operator Framework exists for your programming + language =\> Bare Programming Language + +5. If **none of the above** =\> Operator SDK: Go + +### Use the right programming language + +Operators are programs that can be written in any language of choice. +This works because Kubernetes provides a REST API that allows +communication with clients using lightweight protocols such as HTTP. +Consequently, software developers can write Operators in their preferred +programming language as long as long as the REST API specifications are +followed. + +However, if developers are free to choose their programming language, +sooner or later a patchwork of different technologies and languages will +emerge. This will end up increasing costs for maintenance, +troubleshooting, bug fixing and support requests. A better strategy is +to focus on a single programming language and to use it for development +as a team. This greatly supports the collaboration and mutual support in +a team. + +However, according to \[1\], **Operators written in Go Language** are by +far the most popular. The reason for this is two-fold: first, the +Kubernetes environment itself is written in Go, so the client library is +perfectly optimized. 
Second, the Operator SDK (with embedded +Kubebuilder) supports the implementation of Operators in Go out-of-the-box. +This saves developers a lot of code scaffolding and gives them code generation for +free. + +### Design your Operator the according to your needs + +The last paragraph summarizes an unsorted list of best practices which +were found and published by various sources. + +- Writing an Operator involves using the Kubernetes API. Use a + framework like Operator-SDK to save yourself time with this and get + a suite of tooling to ease development and testing. \[3\] + +- Design an Operator in such a way that application instance continues + to run unaffected and effectively even if the Operator is stopped or + removed. + +- Develop one Operator per application \[4\] + +- Operators should be backward compatible and always understand + previous versions of resources that have already been created. + +- Use asynchronous sync loops \[4\] + +- Operators should leverage built-in Kubernetes primitives such as + replica sets and services. Whenever possible, use well-understood + and well-tested code. + +- When possible, test Operators against a test suite that simulates + potential failures of Pods, configuration, storage, and networking. + +### References + +\[1\] https://operatorhub.io + +\[2\] +https://hazelcast.org/blog/build-your-kubernetes-operator-with-the-right-tool/ + +\[3\] +https://github.com/operator-framework/community-operators/blob/master/docs/best-practices.md + +\[4\] +https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-building-kubernetes-operators-and-stateful-apps + + +## Emerging Patterns of the Future + +As the popularity of Operators increases, there are new usages and patterns that are challenging the status-quo of best practices and design principles. + +### Operator Lifecycle Management + +With increasing Operator complexity and versioned, distributed controllers; there has been a need for the management and transparency of Operators and their resources. This pattern aids in the reuse of Operators through discoverability, minimal dependencies and declarative UI controls[1]. + +In addition to this, as Operators become increasingly designed to reconcile with certain characteristics toward an anticipated end-state, maintaining the life cycle within the cluster through proper management enables iterations, experimentation and testing of new behaviors. + +### Policy-Aware Operators + +Many Operators have a static set of role based authorizations within a cluster to reconcile resources. +There is ongoing activity to provide operators more dynamic access, based on the behavior they are required to exhibit for reconciling a resource. This might mean a temporary elevation to create a resource directly, or to request that a custom resource definition is loaded into the Kubernetes API server. + +There is precedent for Operators[2] to allow for privileged creation of resources on the behalf of the Operators; extending to new patterns and operating models[3]. Future potential of this pattern would also allow for a policy-engine to control Operator authorization. + +### Operator data modelling + +As Kubernetes adoption continues to grow, the role of Kubernetes operators is also likely to evolve. In the future, we can expect operators to become more intelligent, automated, and integrated with other Kubernetes tools and services. 
One area where we may see significant developments is in the use of data modelling to support more complex and dynamic applications running on Kubernetes. By using data modelling to define and manage the state of the system, operators can more easily manage and automate complex workflows, applications, and services. This approach can also improve scalability, flexibility, and maintainability, as well as enable more advanced features like auto-scaling and self-healing. In the future, we can expect Kubernetes operators to play a critical role in enabling advanced data modelling and management capabilities for Kubernetes-based applications and services. + +### References + +\[1\] https://olm.operatorframework.io/ + +\[2\] https://github.com/cloud-ark/kubeplus + +\[3\] https://oam.dev/ + +## Conclusion + +Kubernetes operators are a crucial tool for managing complex applications running on Kubernetes. As we have explored in this whitepaper, there are several popular operator frameworks available, each with its own set of strengths and weaknesses. To determine which operator framework is right for your organization, it's important to evaluate your specific needs and consider factors such as development complexity, deployment scalability, and maintenance requirements. + +Looking to the future, we can expect Kubernetes operators to continue to evolve and become even more sophisticated, automated, and integrated with other Kubernetes tools and services. The use of data modelling may play a key role in this evolution, enabling more advanced features such as auto-scaling and self-healing. As Kubernetes adoption continues to grow, it is clear that operators will become an increasingly critical component of Kubernetes-based application development and deployment. By understanding the pros and cons of different operator frameworks and evaluating the specific needs of your organization, you can effectively leverage Kubernetes operators to automate and manage your applications with ease and efficiency. + +## Related Work +Initially, Operators were introduced by a blog post on the CoreOS Blog. This article provides a rough overview of what operators are, why the concept has been developed and how they are built. The insights of this article are mainly used for the definition of operators in this document. As the blog post only provided a concise overview, additional terms as capabilities, security and additional concepts are described more in-depth in this document. + +The Operator Pattern as a concept is described in the Kubernetes documentation and therefore provides an overview of how an example operator could do and provides starting points for writing an operator [1]. + +The Book “Kubernetes Operators” [2] provides a comprehensive overview about operators, which problems they solve and the different methods to develop them. Definitions made in this book flowed into this document. The same applies to the Book “Kubernetes Patterns” (Ibryam, 2019), which provides more technical and conceptual insights to operators. Definitions made in these books were summarized in this document (to provide a common declaration of operators). + +Michael Hausenblas and Stefan Schimanski [3] wrote a book about Programming Kubernetes, which provides deeper insights into client-go, custom resources, but also about writing operators. + +Google provided a blog post about best practices for building Kubernetes Operators and stateful apps. 
Some of the advisories of this post are reflected in the best practices section of the whitepaper [4].
+
+Many documents describe capability levels (also known as maturity levels) of operators. Since there could be cases where an operator supports all features of the highest capability level but does not support some lower-level features, this document chooses to cover “capabilities” rather than “capability levels”. The capabilities required for each capability level, however, are taken into consideration [5].
+
+The CNCF TAG Security spent a lot of effort to add security-related topics to this whitepaper. As the content of this whitepaper should mostly cover operator-related security measures, they wrote a cloud native security whitepaper which is a very useful source when dealing with cloud native security [6].
+
+### References
+
+\[1\] https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
+\[2\] Dobies, J., & Wood, J. (2020). Kubernetes Operators. O'Reilly.
+\[3\] Michael Hausenblas and Stefan Schimanski, Programming Kubernetes: Developing Cloud-Native Applications, First edition. (Sebastopol, CA: O’Reilly Media, 2019).
+\[4\] https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-building-kubernetes-operators-and-stateful-apps
+\[5\] Operator Framework. Retrieved November 24, 2020, from https://operatorframework.io/operator-capabilities/,
+https://github.com/cloud-ark/kubeplus/blob/master/Guidelines.md
+\[6\] https://github.com/cncf/sig-security/blob/master/security-whitepaper/cloud-native-security-whitepaper.md
+
+
+## Acknowledgements
+This document is a community-driven effort of the CNCF TAG App-Delivery Operator Working Group. Thanks to everyone who contributed, joined the discussions and reviewed this document.
+ +### V1.1 +#### Contributors + +- Alex Jones (github.com/AlexsJones) + +#### Reviewers + +### V1.0 + +#### Contributors + +- Omer Kahani (github.com/OmerKahani) +- Jennifer Strejevitch (github.com/Jenniferstrej) +- Thomas Schuetz (github.com/thschue) +- Alex Jones (github.com/AlexsJones) +- Hongchao Deng (github.com/hongchaodeng) +- Grzegorz Głąb (github.com/grzesuav) +- Noah Kantrowitz (github.com/coderanger) +- John Kinsella (github.com/jlk) +- Philippe Martin (github.com/feloy) +- Daniel Messer (github.com/dmesser) +- Roland Pellegrini (github.com/friendlydevops) +- Cameron Seader (github.com/cseader) + +#### Reviewers + +- Umanga Chapagain (github.com/umangachapagain) +- Michael Hrivnak (github.com/mhrivnak) +- Andy Jeffries (github.com/andyjeffries) +- Daniel Pacak (github.com/danielpacak) +- Bartlomiej Plotka (github.com/bwplotka) +- Phil Sautter (github.com/redeux) +- Roberth Strand (github.com/roberthstrand) +- Anais Urlichs (github.com/AnaisUrlichs) diff --git a/website/content/ja/wgs/operator/whitepaper/plantuml/backup-sequence.plantuml b/website/content/ja/wgs/operator/whitepaper/plantuml/backup-sequence.plantuml new file mode 100644 index 00000000..82087345 --- /dev/null +++ b/website/content/ja/wgs/operator/whitepaper/plantuml/backup-sequence.plantuml @@ -0,0 +1,15 @@ +@startuml + +skinparam monochrome true + +participant Initiator +participant Operator +participant Application +participant "Custom Resource" + +Initiator -> Operator: trigger backup process +Operator -> Application: ensure consistent state +Operator -> Application: backup data and save it to an external storage +Operator -> "Custom Resource": write backup state and location + +@enduml diff --git a/website/content/ja/wgs/operator/whitepaper/plantuml/backup-sequence.png b/website/content/ja/wgs/operator/whitepaper/plantuml/backup-sequence.png new file mode 100644 index 00000000..e3564ed1 Binary files /dev/null and b/website/content/ja/wgs/operator/whitepaper/plantuml/backup-sequence.png differ diff --git a/website/content/ja/wgs/platforms/_index.md b/website/content/ja/wgs/platforms/_index.md new file mode 100644 index 00000000..f126a345 --- /dev/null +++ b/website/content/ja/wgs/platforms/_index.md @@ -0,0 +1,26 @@ +--- +title: プラットフォームWG +list_pages: true +--- +# Platforms WG + +The [charter](./charter) describes the mission and tactics of the Platforms working group (WG). +To participate join us on Slack at +[#wg-platforms](https://cloud-native.slack.com/archives/C020RHD43BP) +or the meetings described below. 
+ +## Chairs + +* Josh Gavant (@joshgav) +* Roberth Strand (@roberthstrand) +* Abby Bangser (@abangser) + +## Meetings + +* Meeting schedule: 2nd and 4th Tuesday of each month at [1600 UTC](https://www.timeanddate.com/worldclock/converter.html?iso=20221213T160000&p1=1440) + * [2nd Tuesday event](https://calendar.google.com/calendar/u/0/r/week/2022/12/13?eid=MDAxZmVpMGE5aDc3a283dGd2Y2YwcnZuYTFfMjAyMjEyMTNUMTYwMDAwWiBsaW51eGZvdW5kYXRpb24ub3JnX281YXZqbHZ0MmNhZTlicTdhOTVlbWM0NzQwQGc) + * [4th Tuesday event](https://calendar.google.com/calendar/u/0/r/week/2022/12/27?eid=NGhyOHY1ZWVrbDliODY3bXU5ZnRtYWo0ZGdfMjAyMjEyMjdUMTYwMDAwWiBsaW51eGZvdW5kYXRpb24ub3JnX281YXZqbHZ0MmNhZTlicTdhOTVlbWM0NzQwQGc) + * [Full CNCF calendar](https://calendar.google.com/calendar/u/0/embed?src=linuxfoundation.org_o5avjlvt2cae9bq7a95emc4740@group.calendar.google.com) +* Zoom: https://zoom.us/j/7276783015?pwd=R0RJMkRzQ1ZjcmE0WERGcTJTOEVyUT09 + * Passcode: 77777 +* Agendas and notes: diff --git a/website/content/ja/wgs/platforms/charter/_index.md b/website/content/ja/wgs/platforms/charter/_index.md new file mode 100644 index 00000000..0fd0b5cf --- /dev/null +++ b/website/content/ja/wgs/platforms/charter/_index.md @@ -0,0 +1,76 @@ +--- +title: プラットフォームWG憲章 +description: この憲章はプラットフォームWGのミッションと戦術について述べます。 +--- + +## Problem Statement +In most app-delivery scenarios, the packaging format and delivery mechanism of the application artifacts are targeted, but not necessarily the app's infrastructure dependencies such as data stores and message queues. That is, application and infrastructure delivery are not coordinated. Often, applications are heavily dependent on infrastructure resources that are not directly linked to a specific deployment, and therefore problems with non-existing infrastructure resources might cause deployments to fail. In addition to this, the application and infrastructure lifecycles are not synchronized, creating additional complexity and challenges when delivering workloads. + +Example: +* Developers using a storage class for testing an application locally that is not available in higher environments. +* Deployment of an application workload is a separate approval process and journey as to deploying the necessary ingress to route traffic to that service. +* Delivering complex micro-service architecture and a sustainable GitOps deployment pattern are treated as separate but related workflows. + +Setup of a CI/CD pipeline and its constituent parts to enable the delivery of applications in coordination with infrastructure is part of institutional knowledge and not easily transferable across organization, system nor domain boundaries. + +Additionally, it is often assumed that infrastructure is always available when dealing with application delivery. At some point, it might be useful that infrastructure could also be deployed by GitOps mechanisms and/or using the deployment pipeline. Therefore, there is always a cut between the infrastructure deployment and the application delivery/deployment tooling, which might lead to deployment problems and misconfigurations. + +Currently, it seems that there are no ubiquitous best practices or recommendations for these use cases and no standard for declaring an entire application in a platform-agnostic way. + +There are projects like Crossplane and Terraform primarily used for provisioning infrastructure and other tools like Argo, Flux and Keptn for provisioning applications on that infrastructure . 
 There are also emerging projects like OAM and Dapr to abstract infrastructure and "inject" it automatically on behalf of apps, but these have not (yet) been broadly adopted by implementation providers, i.e. public cloud platforms. This working group should clarify what application and infrastructure coordination could look like and develop best practices for such cases.
+
+### What combinations of tooling are available out there that aim to resolve this problem, and what are the gaps encountered?
+
+This section aims to propose a few combinations of currently existing solutions, evaluate their pros and cons, and identify what bits are missing (taking the example above as a first use case).
+
+![App Enablement Tooling](img/charter_app_enablement.png)
+
+### Examples of known patterns aimed at deploying infrastructure:
+
+Without any claim to comprehensiveness:
+* Terraform
+* Crossplane
+* ACK
+* Pulumi
+* CDK
+* Open Service Broker (https://www.openservicebrokerapi.org/)
+
+### Examples of known patterns aimed at deploying applications:
+
+#### GitOps
+This pattern is built on top of existing tools like Helm, Kustomize or raw YAML and automates applying changes from the source of truth in a Git repo to the cluster. ArgoCD and Flux are some examples of projects that implement this pattern.
+
+#### Application Operator
+This pattern is intended to enhance native Kubernetes resources like Deployment to deploy applications and manage their lifecycle. One of the critical features is canary deployment, which requires coordination with an Ingress or Service Mesh and domain knowledge about the application. ArgoRollout and Flagger are examples of projects that implement this pattern with these advanced features.
+
+#### Declarative Pipelines
+This pattern allows you to build a declarative pipeline that encapsulates some workflow or process. Such a pipeline could be part of the CI process to build an artifact, test it or ensure it meets security controls. Pipelines can also be used in the CD lifecycle to deploy an artifact and run custom tests or implement custom canary logic. A declarative pipeline can be thought of as a [directed graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) with decision points to move to the next point in the graph. The decision could be automated or require manual approval tied to a notification system. Projects like Keptn and ArgoWorkflow are examples of solutions that enable building declarative pipelines.
+
+#### Composition Operator
+The Kubernetes CRD representation of some resources is complex and at times has dependencies on other CRDs. There is also a need for separation of concerns between application and operations teams, especially when it applies to infrastructure resources. This pattern allows you to compose resources using a number of other CRDs as building blocks and to simplify or hide the number of parameters that must be provided to the top-level CRD. The CrossPlane XRD/XRC and KubeVela are examples of projects that provide this pattern.
+
+#### Infrastructure Operator
+There is a need to declare cloud provider infrastructure resources as CRDs so that they can be integrated as part of the application deployment lifecycle on Kubernetes. The Infrastructure Operator pattern allows CRDs to be applied to the cluster, with the Operator implementing the necessary logic to converge requests with the desired state.
 The Operator integration with the cloud provider can take many forms, among them native cloud provider APIs, CloudFormation templates or Terraform. Crossplane Providers and AWS Controllers for Kubernetes (ACK) are examples of projects that provide this pattern.
+
+## Alignment with TAG App Delivery Charter
+As application delivery is often coupled to the underlying infrastructure (think of services like external databases, message queues, and so on), this often impacts package formats and the application delivery workflow. This topic covers the “Application definition, including description, parameter and configuration”, “Application bundling and deployment”, and “Application delivery workflow and strategy” topics of the TAG charter. As these should be done using configuration-driven tools (“GitOps”), this also touches the “Configuration source driven workflow” area.
+
+## Working mode / expected outcome
+The group discusses concepts, plans and develops a demo infrastructure (as code) to handle these use cases (e.g. App-Ready-Platform as code). This might be implemented using different tools (link to landscape) and could be a blueprint for end users. Furthermore, the validated best practices might then be documented in a white paper.
+
+## Goals
+_Focusing on the key stakeholder, who in this scenario is an engineer potentially with a CKA/CKAD looking to enable the delivery of an application workload on cloud infrastructure._
+
+* Vendor and end-user interviews.
+* Focused questions that help identify successes and frustrations with products.
+* Capturing the current practices and the landscape.
+* Landscape radar
+* Provide interoperability examples between IaC and CD tools in the Podtato-head project.
+* Give end-users ideas and examples of how they could integrate application and infrastructure deployment.
+* Provide patterns in a white paper based on practical work and how end-users are implementing them. Present practices and trends seen occurring within the industry that would be valuable to highlight to end-users.
+
+## Non-Goals
+* Creating a new type of standard
+* An opinion on how to build microservice applications or cloud-native architecture.
+* Defining how deployments should be done.
+* Creation of a new CNCF open source project.
\ No newline at end of file
diff --git a/website/content/ja/wgs/platforms/glossary/_index.md b/website/content/ja/wgs/platforms/glossary/_index.md
new file mode 100644
index 00000000..2d3cd5af
--- /dev/null
+++ b/website/content/ja/wgs/platforms/glossary/_index.md
@@ -0,0 +1,53 @@
+---
+title: 用語集
+description: "プラットフォームWGが発表した論文等で使用される用語の一覧です。"
+---
+
+See also .
+
+## Platform
+A platform aggregates capabilities to serve developers and operators in
+development and delivery of products, services and apps. In reference to the
+scenarios it aims to support, a platform may be named a "Developer Platform", a
+"Delivery Platform", an "App Platform" or even a "Cloud Platform." The
+connotations of the older term "Platform-as-a-Service", or PaaS, are also
+influential.
+
+## Platform capability providers
+The projects and systems that provide the core capabilities offered by the platform.
+Providers can be maintained by either external organizations or internal teams,
+and capabilities can be infrastructure, runtime, or other supporting services.
+
+## Platform engineering
+The practice of building and maintaining shared platforms and capabilities and presenting them to end users.
Emerging from the DevOps movement's goal of cooperation between application developers and operators, platform engineering proposes using common platforms as the foundation of that cooperation. The decision to use shared platforms impacts not only the technology but also the people, processes, policies and expected business outcomes at an organization. + +## Platform engineers +The role focused on developing and +maintaining interfaces and tools to enable provisioning and integration of platform +capabilities in applications, according to the requirements and instructions provided +by platform product managers. Platform developers are usually grouped in platform teams. + +## Platform product managers +The role chiefly responsible for understanding the experience of +platform users, building a roadmap that addresses platform product gaps, requirements, +and opportunities, and managing platform teams as a part of daily work. + +## Platform teams +A cross-functional team that develops and maintains interfaces to and experiences with +platform capabilities - like Web portals, custom APIs, and golden path templates. + +Platform teams are managed by platform product managers and involve +platform developers. As the platform evolves and become more advanced, other roles +can become part of a platform team, including, but not limited to, operators, +QA analysts, UI/UX designers, technical writers, developer advocates. + +## Platform users +The target audience for a platform which includes but is not limited to app developers and operators, data +scientists, COTS software operators, and information workers - whoever runs +software on the platform or uses platform provided capabilities. + +## Thinnest Viable Platform (TVP) {#tvp} +A concept originally defined in the book *Team Topologies* +by Matthew Skelton and Manuel Pais. The definition says: "A TVP is a careful balance between +keeping the platform small and ensuring that the platform is helping to accelerate and simplify +software delivery for teams building on the platform." diff --git a/website/content/ja/wgs/platforms/platforms-maturity-model/README.md b/website/content/ja/wgs/platforms/platforms-maturity-model/README.md new file mode 100644 index 00000000..90fd8500 --- /dev/null +++ b/website/content/ja/wgs/platforms/platforms-maturity-model/README.md @@ -0,0 +1,10 @@ +# Platform Engineering Maturity Model + +CNCF TAG App Delivery maintains and publishes this paper describing how +organizations can review and grow their platform engineering maturity. + +v1 was completed in October 2023; improvements and iterations continue in the +[`latest`](./latest/) directory. + +Currently, the edition in the `v1` directory is published to TAG App Delivery's website at +. 
diff --git a/website/content/ja/wgs/platforms/platforms-maturity-model/v1/assets/adoption-curve.jpg b/website/content/ja/wgs/platforms/platforms-maturity-model/v1/assets/adoption-curve.jpg new file mode 100644 index 00000000..d5e938a7 Binary files /dev/null and b/website/content/ja/wgs/platforms/platforms-maturity-model/v1/assets/adoption-curve.jpg differ diff --git a/website/content/ja/wgs/platforms/platforms-maturity-model/v1/assets/platform-eng-maturity-model-v1.0.pdf b/website/content/ja/wgs/platforms/platforms-maturity-model/v1/assets/platform-eng-maturity-model-v1.0.pdf new file mode 100644 index 00000000..2a9ab452 Binary files /dev/null and b/website/content/ja/wgs/platforms/platforms-maturity-model/v1/assets/platform-eng-maturity-model-v1.0.pdf differ diff --git a/website/content/ja/wgs/platforms/platforms-maturity-model/v1/contributions.md b/website/content/ja/wgs/platforms/platforms-maturity-model/v1/contributions.md new file mode 100644 index 00000000..ba9c053f --- /dev/null +++ b/website/content/ja/wgs/platforms/platforms-maturity-model/v1/contributions.md @@ -0,0 +1,15 @@ +We are grateful for the many people who spent their time and energy reviewing or contributing to each version of this document. + +This list is meant to thank these individuals but is in no way indicative of their endorsement or complete agreement with the final contents. + +Many apologies if this list is inaccurate in any way. If anyone feels they have been left off or would prefer to be removed please reach out. + +[Version 0.0.1](https://docs.google.com/document/d/1dXx5wJm_vfq3hXRr1kPOEp3g3W0UgBkynjyK-OBuJyM/edit) reviewers: Abby Bangser, Abby Kearns, Abdur Rahman Mungul, Adrian Cockroft, Colin Humphreys, Daniel Bryant, Edward (Ted) Newman, Kief Morris, Paula Kennedy, Manuel Pais, Michael Coté, Nicki Watt, Sam Newman + +[Version 0.1.0](https://docs.google.com/document/d/1bP8-LQ-d41eIdQB3IC2YsncDhawpFLggql2JxwtE0XI/edit) reviewers: Abby Bangser, Abdur Rahman Mungul, Adam Gardner, Areti Panou, Asare Nkansah, Atulpriya Sharma, Colin Griffin, John Dietz, John Gardner, Josh Gavant, Kirstin Slevin, Luca Acquaviva, Marsh Gardiner, Michael Kestigian, Nadav Cohen, Nicki Watt, Niklas Beinghaus, Ram Iyengar, Rick Osowski, Rogerio Angeliski, Saim Safdar, Simon Forster, Tsahi Duek, Victor Lu, Viktor “Bika” Nagy, Vishal Biyani + +[Version 0.2.0](https://docs.google.com/document/d/11J_RpaUwydNNBg5aVjH5Uzn8i-b4urEtJzwR86XezFQ/edit) reviewers: Abby Bangser, Asare Nikansah, Atulpriya Sharma, Colin Griffin, John Gardner, Josh Gavant, Kirstin Slevin, Marsh Gardiner, Matt Menzenski, Puja Abbassi, Puneet Kandhari, Roberth Strand, Saim Safdar, Tsahi Duek, Victor Lu, Vijay Chintha, Vishal Biyani + +[Version 0.3.0](https://docs.google.com/document/d/1yhvT1dZ78JQyKs3Kb64V098XgIFG0N3IXAAX6O67Ju0/edit) reviewers: Abby Bangser, Asare Nkansah, Bob Hong, Bruno Dias, Colin Griffin, Daniel Bryant, Josh Gavant, Marsh Gardiner, Matt Menzenski, Nicki Watt, Ramanujan Iyengar, Roberth Strand, Saim Safdar, Tsahi Duek + +Version 1.0.0 reviewers: Abby Bangser, Antoine Bermon, Atulpriya Sharma, Blake Romano, Bruno Dias, David Sandilands, Jennifer Riggins, Josh Gavant, Karena Angell, Kirstin Slevin, Marsh Gardiner, Matt Menzenski, Puja Abbassi, Roberth Strand diff --git a/website/content/ja/wgs/platforms/platforms-maturity-model/v1/index.md b/website/content/ja/wgs/platforms/platforms-maturity-model/v1/index.md new file mode 100644 index 00000000..04320a01 --- /dev/null +++ b/website/content/ja/wgs/platforms/platforms-maturity-model/v1/index.md @@ 
-0,0 +1,501 @@ +--- +title: "プラットフォームエンジニアリングの成熟度モデル" +pdf: https://github.com/cncf/tag-app-delivery/raw/main/platforms-maturity-model/v1/assets/platform-eng-maturity-model-v1.0.pdf +version_info: https://github.com/cncf/tag-app-delivery/tree/main/platforms-maturity-model/README.md +description: "この成熟度モデルは、プラットフォームホワイトペーパーにおいて議論されているパターンを採用しようとするユーザーに、戦略的なガイドを提供することを目的としています。ホワイトペーパーは、何を、なぜ構築するべきかを提案しますが、このドキュメントはそれを構築するためにどのように計画するべきかを解説します。対象読者は、現在の状態や環境を評価し、改善の機会を見つけたいと考えているCTO、エンジニアリングディレクター、リードエンジニア、アーキテクトです。

+この文書は、以下の関連文書を参照し、発展させ、同様の基準に従います。
+[クラウド成熟度モデル](https://maturitymodel.cncf.io/)
+[プラットフォームホワイトペーパー](https://tag-app-delivery.cncf.io/whitepapers/platforms/)" +type: whitepapers +--- + + + + +## Introduction + +CNCF's initial [Platforms Definition white paper](https://tag-app-delivery.cncf.io/whitepapers/platforms/) describes what internal platforms for cloud computing are and the values they promise to deliver to enterprises. But to achieve those values an organization must reflect and deliberately pursue outcomes and practices that are impactful for them, keeping in mind that every organization relies on an internal platform crafted for its own organization - even if that platform is just documentation on how to use third party services. This maturity model provides a framework for that reflection and for identifying opportunities for improvement in any organization. + +## What is platform engineering? + +Inspired by the cross-functional cooperation promised by DevOps, platforms and platform engineering have emerged in enterprises as an explicit form of that cooperation. Platforms curate and present common capabilities, frameworks and experiences. In the context of this working group and related publications, the focus is on platforms that facilitate and accelerate the work of [internal users]({{< ref "/wgs/platforms/glossary#platform-users" >}}) such as product and application teams. + +[**Platform engineering**]({{< ref "/wgs/platforms/glossary#platform-engineering" >}}) is the practice of planning and providing such computing platforms to developers and users and encompasses all parts of platforms and their capabilities — their people, processes, policies and technologies; as well as the desired business outcomes that drive them. + +Please read the [CNCF Platforms Definition white paper](https://tag-app-delivery.cncf.io/whitepapers/platforms/) first for complete context. + +## How to use this model + +As platform engineering has risen in prominence over the last few years, some patterns have become apparent. By organizing those patterns and observations into a progressive maturity model, we aim to orient [platform teams]({{< ref "/wgs/platforms/glossary#platform-teams" >}}) to the challenges they may face and opportunities to aim for. Each aspect is described by a continuum of characteristics of different teams and organizations at each level within the aspect. We expect readers to find themselves in the model and identify opportunities in adjacent levels. + +Of note, each additional level of maturity is accompanied by greater requirements for funding and people's time. Therefore, reaching the highest level should not be a goal in itself. Each level describes qualities that should appear at that stage. Readers must consider if their organization and their current context would benefit from these qualities given the required investment. + +Keep in mind that each aspect is meant to be evaluated and evolved independently. However, as in any socio-technical system these aspects are complex and interrelated. Thus you may find that to improve in one aspect you must reach a minimum level in another aspect too. + +It's also important to recognize that implementations of platforms vary from organization to organization. Make sure to evaluate the current state of _your_ group’s overall cloud native transformation. A phenomenal resource to leverage for this evaluation is the [Cloud Native Maturity Model](https://maturitymodel.cncf.io/). 
+ +Finally, this model encourages organizations to mature their platform engineering discipline and their resulting platforms through intentional planning. Such planning and discipline themselves are a requirement for mature platform development and ongoing evolution. + +In general, keep in mind that mapping your organization into a model captures current state _to enable_ progressive iteration and improvement. [Martin Fowler](https://martinfowler.com/bliki/MaturityModel.html) says it well: "The true outcome of a maturity model assessment isn't what level you are at but the list of things you need to work on to improve. Your current level is merely a piece of intermediate work in order to determine that list of skills to acquire next." In that vein, seek to find yourself in the model then identify opportunities in adjacent levels. + +## Context behind this work + +It's valuable to understand the context a document has been written in. The following sections lay out some context behind the model as well as some expectations for you, the reader. + +### Intended audiences + +Each reader brings a unique context and will take unique learnings from this model. Following are some personas we have in mind, along with their possible motivations for engaging with this model: + +* **CTOs, VPs, and directors of technology**: Leaders looking to map a path to digital transformation and greater developer productivity +* **Engineering managers**: Groups and individuals seeking to empower engineers to provide value with less overhead and higher efficiency +* **Enterprise architects**: Individuals navigating the modern technology landscape who seek a value- and solution-oriented perspective on technology problems +* **Platform engineers and platform product managers**: Teams and people seeking to build the best possible experience for platform builders and platform users +* **Product vendors and project maintainers**: Organizations and engineers wishing to design tools and deliver messages to enable users to succeed with platforms and capabilities +* **Application and product developers**: Platform users seeking to understand in more detail what they might expect of an internal platform + +### Understanding the levels + +This model is not meant to classify an organization or platform team as wholly “Level 1” or “Level 4.” Each aspect should be considered independently of the others; the characteristics of each level represent a continuum within that aspect but are not necessarily coupled to other aspects at the same level. Even more so, many organizations will see characteristics of more than one level being applicable across their teams and work. This is because no level is inherently good or bad, only contextual to the team’s goals. + +The labels for each level are intended to reflect the impact of platform engineering at your organization. As you recognize your organization at a given level you will gain insight into opportunities which follow at the next ones. Lower-numbered levels comprise more tactical solutions while higher-numbered ones are more strategic. + +This yields a potential process for platform development and maturity similar to other digital product development: first recognize a problem and need for a new solution, next develop minimally-viable products as hypothesized solutions, third iterate to better solve the problem and ensure fit for your customers and finally scale and optimize the product to solve the problem for many teams and users. 

Similar to the [CNCF Cloud Native Maturity Model](https://maturitymodel.cncf.io/), this model highlights that successful business outcomes can only be achieved through balancing people, process, and policy alongside technology. Notably, this model introduces aspects which are often not fully in the remit of a single internal team, but rather require cooperation across the engineering department and quite often the wider organization.

### But it doesn't seem to fit

That’s perfectly fine! All organizations and groups have dynamics and parameters that are specific to them.

Keep in mind that the goal of this paper isn’t to prescribe a rigid formula, but rather a framework that you can apply to your circumstances. Every single word may not be relevant to you, but we hope the content will inspire you to introspect on your own platform journey, taking what makes sense and leaving the rest.

The objective of this model is to provide a tool to help guide platform engineering practitioners, stakeholders, and other interested parties on their journeys. Platform design and implementation is not an exact science, but rather depends on the needs of an individual project, an organization and a particular time and place.

## Model table

| Aspect | | Provisional | Operational | Scalable | Optimizing |
|:-------|:--|:------------|:------------|:---------|:-----------|
| [Investment](#Investment) | _How are staff and funds allocated to platform capabilities?_ | Voluntary or temporary | Dedicated team | As product | Enabled ecosystem |
| [Adoption](#Adoption) | _Why and how do users discover and use internal platforms and platform capabilities?_ | Erratic | Extrinsic push | Intrinsic pull | Participatory |
| [Interfaces](#Interfaces) | _How do users interact with and consume platform capabilities?_ | Custom processes | Standard tooling | Self-service solutions | Integrated services |
| [Operations](#Operations) | _How are platforms and their capabilities planned, prioritized, developed and maintained?_ | By request | Centrally tracked | Centrally enabled | Managed services |
| [Measurement](#Measurement) | _What is the process for gathering and incorporating feedback and learning?_ | Ad hoc | Consistent collection | Insights | Quantitative and qualitative |

## Model Detail

+{{< tabs tabTotal="6">}} +{{< tab tabName="Investment" >}} + +

_How are staff and funds allocated to platform capabilities?_

+ +Investment in platforms and platform engineering is the process of allocating budget and people to build and maintain common capabilities. It is common for initiatives to be described as organically built from the bottom up, or driven through top down initiatives. In either case, it is the ability to invest sustained effort that drives high-impact work. This aspect captures how the scale and breadth of investment can impact platform success. + +### Level 1, Provisional — Voluntary or temporary + +Individual capabilities may exist to provide common foundations for common or critical functionality. These capabilities are built and maintained out of necessity rather than planned and intentionally funded. + +These capabilities are built and maintained by people assigned temporarily or voluntarily; no central funding or staffing are intentionally allocated to them. They depend on the current tactical requirements of their users. + +#### Characteristics: + +* "Hit" or "tiger" teams are built to tackle urgent requirements. These teams are short lived and not assigned nor granted the time to provide long term planning and support. +* Migrations, improvements, or enhancements are often considered "nice to have" work items and rely on "research" or "hack day" efforts. +* Process improvements or automation may be introduced while tackling a new requirement such as an urgent security patch, however there is not support to build the solutions in a reusable or sustainable way. +* Employees complain of burn out and frustration with the amount of work they are doing outside their core role. + +#### Example Scenarios: + +* There is a specific employee who is viewed as the test environment expert. While this employee means well, their attempt to enable better test environments despite limited investment has led to increased risk since there is no maintenance of their solution and no shared understanding of how to triage a broken test environment. +* Engineers are encouraged to invest in capability improvements when there is no pressure from management for revenue generating features. This translates to the last few days of some sprints where they prioritize automating and improving parts of their CI/CD pipeline. It is not uncommon for these improvements to come in bursts as there can be months of overly full sprints not allowing for time on these side endeavors. + +### Level 2, Operationalized — Dedicated team + +Budget and people are allocated for persistent people and resource support. The assigned people are tasked with providing a set of commonly-required capabilities to speed up software delivery. Often these teams focus on meeting reactive technical requirements. They may be called DevOps, Engineering Enablement, Developer Experience (DevEx or DevX), Shared Tools, a Centre-Of-Excellence, or even Platform. They're funded centrally and treated as cost centers; their impact on direct value streams and application teams is not measured. It can be hard to map the impact of platform teams at this level on the organization and its value streams, which can make it hard to sustain and continue funding such teams. + +#### Characteristics: + +* The team is made up of nearly all technical generalists. +* Team budget may include the infrastructure costs associated with their work leading to often being a key point in budget conversations. +* Backlog items range a number of technologies, leading to frequent and large context switches. 
+* This team is often the first to fill a gap that is not yet being addressed, even if not in the declared scope for the team. This team takes ownership of resources that don't have an owner. +* Assigned people rarely have the time or experience with customer research to validate their designs or implementations. + +#### Example Scenarios: + +* Application developers raise an issue with the long build time for their applications. A centralized team is tasked with reducing the build time by 50%. They solve this by doubling the size and quantity of the CI runners given they are not close enough to the software to individually improve the application builds. This creates a budget concern for their centralized team as the productivity gain is not directly measurable against this increased infrastructure cost. + +### Level 3, Scalable — As product + +Investment in internal platforms and their capabilities is similar to investment in an enterprise's outbound products and value streams: based on the value they are expected to provide to their customers. Product management and user experience are explicitly considered and invested in. A chargeback system may be used to reflect platforms' impact on their customers' own direct value streams and products. The enterprise allocates funds and staff to the appropriate initiatives by using data-driven performance indicators and feedback loops. Platform teams can ultimately optimize the business itself and contribute to increased profitability. + +#### Characteristics: + +* Platform teams staff roles not traditionally found in internal serving or technical teams, for example, product management and user experience. +* The team publicizes a roadmap internally to the organization, which indicates the value delivered and high level feature targets. +* Features are tested for both implementation quality and user experience during design, delivery, and post deployment. +* Feature removal is a key part of the conversation, the goal is to have a well supported, well used suite of capabilities instead of a sprawling estate that may not be maintained. + +#### Example Scenarios: + +* Data derived from platform usage metrics inform decisions to allocate funds and staff to the most impactful initiatives. + +### Level 4, Optimizing — Enabled ecosystem + +Platform teams find ways to increase organization-wide efficiency and effectiveness beyond basic capabilities. Core platform maintainers intentionally strive to optimize time-to-market for new products, reduce costs across the enterprise, enable efficient governance and compliance for new services, scale workloads quickly and easily, and other cross-cutting requirements. These core maintainers are focused on enabling capability specialists to seamlessly integrate their requirements and offerings into existing and new parts of platforms. Further, the organization focuses people and resources from specialist domains like security, performance, quality on engaging with provided platform frameworks to introduce advanced features that can enable product teams to accelerate their adherence to company goals without depending on a centralized team backlog. + +#### Characteristics: + +* It becomes a priority to enable specialists to extend platform capabilities and introduce new ones. +* The organization can centralize specialists allowing their knowledge and support to be spread through platform capabilities. 
+ +#### Example Scenarios: + +* Marketing works with platform builders to introduce consistent user tracking in order to attribute marketing efforts to product outcomes. +* Automation initiative reduces human time to provision databases by 30 minutes per instance, saving $10m/year. + +{{< /tab >}} +{{< tab tabName="Adoption" >}} + +

Why and how do users discover and use internal platforms and platform capabilities?

+ +Adoption describes not only how and how much an organization uses platform capabilities, but also what motivates them to do so. In the early stages, many target users may not realize they are using a platform at all, rather they see their tools as an ad hoc collection of capabilities from various internal sources. This may mature into a group of capabilities that is consistently managed and presented to users — that is, one or more platforms. As the capabilities become more refined and discoverable, it is common that the drive for platform use moves away from more external motivations like mandates or incentives. This leads to users self-selecting into platform capabilities and ideally even investing their own efforts into the wider platform ecosystem. + +
_Figure: a common growth pattern for platform adoption. Adoption often starts slowly, driven mainly by platform builders; once the platform provides enough value, users begin to pull adoption themselves, producing a steeper curve._
+ +### Level 1, Provisional — Erratic + +Adoption of shared platforms and capabilities is sporadic and inconsistent. No organization-wide strategy or guidance exists for choosing and integrating required backing services and technologies. Individual teams might leverage platform practices to improve their own processes, but there is no coordinated effort or standardization across the organization. This level of adoption is characterized by the absence of a coherent approach and the idea that external tools are more effective than those provided internally. + +#### Characteristics: + +* One-off tools, services, and capabilities are managed by and consumed by various teams and departments in the organization. +* Provider-managed (aka "cloud") services are adopted and used inconsistently and without standard practices and policies, as internal configurations are hard to find or use. +* App and service teams discover tools and capabilities haphazardly, via rumors and chance conversations rather than through a more centralized process. +* Coordination and reuse of components and capabilities is driven only by end users (application teams), if at all. +* Product teams each maintain their own set of scripts or tools to deploy their applications. + +#### Example Scenarios: + +* A banking service requires a database. A developer finds out from a friend on another team that they can request an AWS account and set up an RDS database. From another team they find a Terraform script to provision that database. For monitoring they use CloudWatch on an ad hoc basis; they copy secrets from the AWS console to an instance of Hashicorp Vault manually before running the Terraform script. + +### Level 2, Operationalized — Extrinsic push + +The organization recognizes the value of shared platforms and capabilities and strives to encourage and nurture them. Internal directives incentivize or even require use of shared platform services for some use cases. Some product teams use platform capabilities more than others; capabilities cover typical use cases in the organization but not unusual ones; and it is difficult to add those outliers to the common platform. + +User discovery of capabilities and how to use them is inconsistent; it is possible a user on a product team won't discover a supported capability unless directed there by a platform team. + +#### Characteristics: + +* Some degree of external impetus leads to use of platform capabilities, for example: + * Incentives such as personal reviews + * Mandates such as requiring use for production releases or receiving funding +* The utilization of platform capabilities is fragmented — users may take advantage of one capability but might not be aware of, or interested in adopting, others that are available. +* Users have low motivation to learn how to use platform capabilities and rely heavily on collaboration with the providers through forums like office hours or help desk. +* Platform users are encouraged to join informal communities of practice to share problems and solutions but attendance may be limited. + +#### Example Scenarios: + +* An engineering organization decides on a standard deployment tool and instructs all teams to use it. New processes (communication of release notes, etc) are built around that standard. Teams are instructed to stop using other sorts of deployment scripts and use the common tool instead. This is difficult for some teams whose needs are not met by the new process but do not understand or are not allowed to extend it. 
+ +### Level 3, Scalable — Intrinsic pull + +Users on product and service teams choose to use platforms and their capabilities because of the clear value they provide in reducing cognitive load on product teams while providing higher quality supporting services. Documentation and ergonomic interfaces enable product team users to quickly provision and use platform capabilities. Users choose internal platform implementations over alternatives such as developing the capability themselves or hiring a provider. + +#### Characteristics: + +* Platform adoption is self-sustaining –The primary driver for core adoption is not an external impetus or incentive which mandates users use platform offerings – rather it is the values of these platform offerings themselves which draws users to them. +* After using and appreciating one or some platform capabilities, users seek out others and find the experience is similar across capabilities. There is an expectation that an individual capability is not isolated, rather it is one feature among a larger platform feature set. +* Platform teams encourage the natural adoption of platforms by gathering user feedback, sharing roadmaps and maintaining open forums for conversation with users. +* Application and product teams value platform capabilities enough to pay for them, e.g., via a chargeback system. +* Users can share feedback and learn about upcoming features through open forums and shared roadmaps. +* Self-serve portals, golden-path templates, and other documents enable rapid use. + +#### Example Scenarios: + +* An application team previously had success requesting a new database. Their process was easy to understand and required almost no waiting time. In addition, key capabilities like backups and monitoring that allowed the team to progress their use all the way to production without issue were included. This experience meant that when the team later needed a queue, their first instinct was to check for an internal platform option. While they originally intended to use a specific queue technology, in the end, they chose to use the one offered internally since they knew how well integrated the solution would be for their organization. + +### Level 4, Optimizing — Participatory + +Users from product teams further invest in platform capabilities by joining the ecosystem and contributing back to it. Some contributions improve and fix existing capabilities; others introduce new capabilities and features to address new use cases. Processes and services are defined and enable users to identify requirements and coordinate contributions amongst several product and platform teams. New capabilities are published via consistent interfaces and portals and with complete documentation and standard versioning. + +#### Characteristics: + +* Users in app/service teams are empowered to contribute fixes, features, and feedback for platform capabilities. +* External projects and standards are strategically leveraged to reduce maintenance costs, accelerate new feature delivery, and use organization headcount most effectively. +* New capabilities and enhancements are coordinated asynchronously through issue boards and pull requests. Documents and checklists enable self-driven development by contributors. +* Developer advocates and internal ambassadors build and support an internal user community that extends platform ownership to app and service team contributors, too. 
+* Use of platform capabilities is viewed as the best way of working at the organization by both leadership and individual contributors. +* Platform engineers participate in product team planning to learn of requirements and suggest relevant existing capabilities. + +#### Example Scenarios: + +* A team wants an alternative backup plan. After proposing this as a general offering, it is deemed low priority due to minimal reuse. The proposing team chooses to integrate their solution into the platform framework and make it available to the organization. It is originally an alpha offering but once it meets all of the operational requirements can be promoted to a core platform capability. + +{{< /tab >}} +{{< tab tabName="Interfaces">}} + +

_How do users interact with and consume platform capabilities?_

+ +The interfaces provided by platforms affect how users interact with these platform offerings to provision, manage, and observe capabilities. Interfaces can include ticketing systems, project templates, and graphical portals as well as automatable APIs and command-line (CLI) tools. + +Key characteristics of an interface include how discoverable and user-friendly it is during key user journeys like initial request, maintenance, or incident triage. Higher levels of maturity here reflect more integrated, consistent, automated, and supported interfaces. + +### Level 1, Provisional — Custom processes + +A collection of varying processes exists to provision different capabilities and services, but the consistency of the interface is not considered. Custom tailor-made processes address the immediate needs of individuals or teams and are reliant on manual intervention even if the provider uses some automated implementation scripts. + +Knowledge of how to request these solutions is shared from person to person. The process for requesting a service lacks standardization and consistency. Provisioning and using a platform service likely requires deep support from the capability provider. + +Lack of central requirements and standards makes this level appropriate when the company has not yet identified and documented expectations. It can be particularly effective for teams at early stage companies or platform efforts. In these environments teams are provided the freedom to evolve processes and capabilities to their needs, allowing them to deliver more quickly and pay the price of standardization only when necessary later on. + +#### Characteristics: + +* User interaction is not a key topic of discussion and rarely (if ever) are interactions tested during design and delivery of new capabilities. +* Capabilities are mainly provided through manual requests, though providers may choose to automate some or all of the activities necessary to provision a user request. +* Requests that are on the face “simple” become complex due to finding out the right process to follow +* Sometimes a process appears to be sanctioned, but users run into issues when a different department or team gets involved + +#### Example Scenarios: + +* An application team wants to performance test their new change. To do this, they want an isolated environment that contains enough test data to get an accurate performance read. The last time they had this request a former teammate was able to get access to an environment, but they have since moved on and no one knows how to recreate it. In the end, they are connected to an engineer on the infrastructure team who is able to provision them an environment in a few days. +* A team in the exploratory phases of product development uses a bespoke process to provision a new cloud service without needing to validate their solution warrants further investment. + +### Level 2, Operationalized — Standard tooling + +Consistent, standard interfaces for provisioning and observing platforms and capabilities exist and meet broad needs. Users are able to identify what capabilities are available and are enabled to request capabilities that they require. + +"Paved roads" or "golden paths", in the form of documentation and templates, are provided. These resources define how to provision and manage typical capabilities using compliant and tested patterns. 
While some users are able to use these solutions on their own, the solutions often still require deep domain expertise and therefore support from maintainers is still vital. + +#### Characteristics: + +* Technical solutions are built-in tools specific to their problem domain, not always tools familiar to the users. +* There is investment in a common path; however, deviating from that path quickly uncovers few customization options as the focus was on building a single option. +* Given standardization, informal internal groups are able to form and gather to share good practices and overcome shared problems. +* There may be drift on capability implementation as teams take templates, customize them, and then cannot merge in changes from the centralized team. + +#### Example Scenarios: + +* A centralized team curates a library of Terraform modules, Kubernetes controllers, and CRDs for provisioning different types of infrastructure. +* A shared location includes comprehensive documents about solutions across the organization. + +### Level 3, Scalable — Self-service solutions + +Solutions are offered in a way that provides autonomy to users and requires little support from maintainers. The organization encourages and enables solutions to provide consistent interfaces that enable discoverability and portability of user experience from one capability to another. While self-service, the solutions do require team awareness and implementation. In order to improve this experience there may be a guided and simplified internal language which enables users to adopt and integrate platform capabilities more quickly. This generates a user-centric, self-serviceable, and consistent collection of capabilities. + +#### Characteristics: + +* Solutions are provided as “one-click” implementations, enabling teams to benefit from a capability without needing to understand how they are provisioned. +* While the solutions are easy to create, there may not be as much usability built into the day 2 and beyond management of the solution. +* There continues to be a narrow path of available solutions, leaving users with unique requirements unsure how to proceed. + +#### Example Scenarios: + +* An API is provided which abstracts the creation and maintenance of databases and provides users with any information they require to leverage that platform capability such as a connection string, location for secret data, and dashboard with observability data. + +### Level 4, Optimizing — Managed services + +Platform capabilities are transparently integrated into the tools and processes that teams already use to do their work. Some capabilities are provisioned automatically, such as observability or identity management for a deployed service. When users hit the edges of the provided services, there is an opportunity to move past automated solutions and customize for their needs without leaving the internal offerings because platform capabilities are considered building blocks. These building blocks are used to build transparent and automatic compositions to meet the higher-level use cases while enabling deeper customization where necessary. + +#### Characteristics: + +* It is clear what capabilities are differentiating for the organization and which are not, allowing the internal teams to invest in custom solutions only where they can not leverage industry standards. +* While capabilities are surfaced in a consistent way, there is no one way to use a capability. 
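
To make the example above concrete, here is a minimal sketch of what such a self-service request could look like from the user's side, assuming a hypothetical internal platform REST endpoint (`/v1/databases`); the URL, fields, and response keys are illustrative assumptions rather than a prescribed interface:

```python
# Minimal, illustrative sketch of a self-service database request against a
# hypothetical internal platform API; all URLs, fields, and keys are assumptions.
import requests

PLATFORM_API = "https://platform.internal.example.com/v1"

def request_database(team: str, engine: str = "postgres", size: str = "small") -> dict:
    """Ask the platform to provision a database and return its access details."""
    response = requests.post(
        f"{PLATFORM_API}/databases",
        json={"team": team, "engine": engine, "size": size},
        timeout=30,
    )
    response.raise_for_status()
    # At this level the response contains everything the user needs to start
    # working: a connection string, a secret reference, and a dashboard link.
    return response.json()

if __name__ == "__main__":
    db = request_database(team="payments")
    print("Connection string:", db["connection_string"])
    print("Credentials secret:", db["secret_ref"])   # e.g. a path in a secret store
    print("Dashboard:", db["dashboard_url"])
```

The same contract could equally be exposed as a CLI command, a portal form, or a Kubernetes custom resource; the defining property of this level is that the user receives a complete, usable answer without filing a ticket or understanding how the database is provisioned.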
Some are best suited as CLI tools for use in scripts whereas others benefit from integration into where the user is writing code in their editors and IDEs. +* The value of individual capabilities is extended with a focus on the flow of both software development and release, leading to a focus on how to combine capabilities into higher level offerings. +* While capabilities are often provided in packages, super users are enabled to decompose these higher level offerings in order to optimize when and where they need to. + +#### Example Scenarios: + +* Observability agents are injected into every workload and an OIDC proxy is placed in front of all applications. +* By default every new project receives a space in a task runner (pipelines) and a runtime environment (K8s namespace), however a project can opt into other options such as serverless runtime. +* From a catalog in a Service Now portal a user selects "Provision a Database." Automation provisions an RDS database and sends a URL and location to get credentials to the user. + +{{< /tab >}} +{{< tab tabName="Operations">}} + +

_How are platforms and their capabilities planned, prioritized, developed and maintained?_


Operation of platforms means running and supporting their capabilities and features over their whole lifetime, including acceptance of new requests, initial releases, upgrades and extensions, ongoing maintenance and operations, user support, and even deprecation and termination. Organizations and their platform teams choose platforms and capabilities to create and maintain and can prioritize the most valuable and impactful initiatives.

Notably, most of the work to provide a capability is expended after its initial release — in providing seamless upgrades, new and improved features, operational support, and end-user enablement and education. Therefore, an impactful, valuable platform team will plan in advance and manage their platform for long-term sustainable operations and reliability.

### Level 1, Provisional — By request

Platforms and capabilities are developed, published, and updated reactively, based on ad hoc product team requests and requirements. Product teams themselves may even need to plan and build the capabilities they require.

Teams who build a new capability, whether dedicated centralized teams or application teams meeting their own needs, take only informal responsibility for supporting others using it. They are not expected to actively maintain it and few processes exist to vet the quality of the offering. At this level, implementations are often ignored until a security vulnerability is discovered, a bug prevents use, or a new requirement arrives, at which point another reactive plan may be quickly implemented.

#### Characteristics:

* Capabilities are created to meet the pressing needs of individual application teams.
* Focus is on initial delivery of core capabilities; plans are not made for ongoing maintenance and sustainability.
* Capability implementations are generally out of date and awaiting updates.
* Sudden spikes of work are introduced for late-breaking high-impact changes to capabilities, such as discovery of a vulnerability.
* Changes can result in both planned and unplanned downtime.
* Each upgrade is done in a bespoke way, requiring time and research to devise a process on each upgrade.

#### Example Scenarios:

* The Log4Shell security vulnerability is announced and the organization spins up a specialty team to investigate where the organization may be vulnerable and instigate patches. Once the team identifies the impact, they must work hand in hand with a number of different teams since each one manages their servers and upgrade processes differently. Even when this work is deemed complete, the confidence level is fairly low that there won’t be more instances uncovered.

### Level 2, Operationalized — Centrally tracked

Platforms and capabilities are centrally documented and discoverable, and processes for planning and managing the lifecycle of capabilities are at least lightly defined. Responsibility and ownership are documented for each service and function. Lifecycle management processes vary for different capabilities depending on their owners and their priorities. A centralized team maintains, or is able to generate on demand, an inventory of capabilities that provides the state of maintenance for current capabilities. This allows the organization to track progress towards capability offering and compliance with upgrade requirements.

#### Characteristics:

* Application teams create new capabilities as needed to meet pressing needs.
* A central team provides a register of available shared services across the organization.
* Loose standards, such as requiring an automatable API and usage docs, are applied to capabilities.
* Infrastructure as Code is used to allow easier traceability of deployed services.
* Audits for compliance regulations such as PCI DSS or HIPAA are enabled through the service inventories.
* Migration and upgrade work is tracked against a burndown chart, enabling the organization to track rate of compliance and time until completion.
* Tracking does not indicate level of support; often upgrades at this stage are still manual and bespoke.

#### Example Scenarios:

* PostgreSQL 11 is going EOL by the end of the year. The organization is aware of which databases require an upgrade and is scheduling the work on each team’s backlog to complete.

### Level 3, Scalable — Centrally enabled

Platforms and capabilities are not only centrally registered but also centrally orchestrated. Platform teams take responsibility for understanding the broad needs of the organization and prioritize work across platform and infrastructure teams accordingly. Those responsible for a capability are expected to not only maintain it technically, but also provide standard user experiences for integrating the capability with other related services around the organization, ensure secure and reliable use, and even provide observability.

Standard processes for creating and evolving new capabilities exist, enabling anyone in the organization to contribute a solution that meets expectations. Continuous delivery processes for platform capabilities and features enable regular rollout and rollback. Large changes are planned and coordinated as they would be for customer-facing product changes.

#### Characteristics:

* Application teams request services from platform teams first before creating them.
* New services must adhere to standard practices such as standard interfaces, documentation, and governance.
* Upgrade processes are documented and consistent across versions and services.
* Where the capability provider does not manage an upgrade, they provide tooling and support to the users for minimal impact.

#### Example Scenarios:

* The organization is going to upgrade to RHEL 9. In doing so, each application team needs to validate that their software continues to work. In order to enable this testing phase the centralized compute team is setting up test environments for each team with the correct software and OS versions.

### Level 4, Optimizing — Managed services

The lifecycle of each capability is managed in a standardized, automated way. Capabilities, features and updates are delivered continuously with no impact on users. Any large changes instigated by platform providers include migration plans for existing users with defined responsibilities and timelines.

Platform capability providers take on the brunt of responsibility for maintenance, but there is a clear contract — a "shared responsibility model" — describing the responsibilities of users, enabling both sides to operate mostly autonomously.

#### Characteristics:

* A shared ownership model clearly defines who is responsible for platforms and their capabilities and what is expected of users.
* Teams script both the execution of the upgrade and any rollback strategies to keep risk and impact low.

#### Example Scenarios:

* The users of virtual machines are not required to manage anything to do with version upgrades.
Their only requirement is to have a stage in their delivery pipeline that contains a representative smoke test. They are then asked to declare their application as having lower risk tolerance so as to wait for a fully hardened upgrade or higher tolerance to become an early adopter. The virtual machine capability then manages the automated release of upgrades including rollbacks after either smoke test or canary release failures. + +{{< /tab >}} +{{< tab tabName="Measurement">}} + +

_What is the process for gathering and incorporating feedback and learning?_

+ +By reacting to explicit and implicit feedback from users, organizations can increase user satisfaction and ensure long-term platform sustainability. Organizations must balance innovation and meeting user demands to keep platform relevance. As technology and user preferences change, platforms that are agile and responsive to these changes will stand out. Regularly revisiting and refining the feedback mechanism can further optimize platform development and improve user engagement. + +### Level 1, Provisional — Ad hoc + +Usage and satisfaction metrics are gathered in custom ways, if at all, for each platform and capability. Outcomes and measures of success are not consistently aligned across capabilities, and therefore corresponding insights are not gathered. User feedback and instrumentation of platform use may not be gathered, or if it is, it will be informal. Decisions are made based on anecdotal requirements and incomplete data. + +#### Characteristics: + +* No experience or opinions about how to measure success of platforms +* Use familiar tools to gather common metrics with limited intent and forethought +* Reliance on small amounts of data +* Difficult to secure user participation — users believe their feedback isn't considered +* If surveys are used, the questions change between runs, negating the ability to track progress + +#### Example Scenarios: + +* A platform tech lead wants to improve the collaboration with users by adding key topics to their next quarterly planning. They decide to run a survey on what users would like to see. The response is overwhelming, which is exciting, but also results in a difficulty organizing and responding to all of the ideas. While some ideas influence the quarterly planning, the users do not see their ideas as being accepted and are less inclined to reply to the next survey. +* The team wants to capture more data automatically, so they look for opportunities for easy collection such as test failures in CI. However, not every team uses the same CI automation so the data is only available for Java applications even though some teams have moved on to writing their services in Scala. + +### Level 2, Operationalized — Consistent collection + +Organizations at this level have an intentional goal to verify platform products meet the needs of their market of internal users. Actionable, structured collection of user feedback is valued. Dedicated teams or individuals might be assigned to gather feedback, ensuring a more consistent approach. Feedback channels, such as surveys or user forums, are standardized, and feedback is categorized and prioritized. Beyond user feedback, there is also an expectation that user experiences are instrumented to generate usage data over time. + +Challenges remain in translating feedback into actionable tasks. While there is a growing repository of user data, the organization might need help effectively understanding and integrating this feedback into a platform roadmap. It may be hard to ensure that users see tangible changes driven by their feedback. + +#### Characteristics: + +* Data collection is discussed as part of most major planning sessions or capability implementations. +* There may not be alignment on exactly what to measure to verify success. +* Platform features can be measured for success, such as by measuring user adoption or user time saved. + +#### Example Scenarios: + +* A platform team allocates 20% of their time to user defined features, which they identify based on surveys and other interview techniques. 
Their findings are collected into a tool that enables additional voting and commenting to further refine priorities. During implementation the requesting users are approached for collaboration on early designs and implementations. Once implemented, there are announcements which make sure requesting users are aware of new features and supported in adopting them. +* The team focused on software delivery capabilities wants to capture more data automatically including cycle time which they automate through the build tool from commit to production. There is an understanding that cycle time can include other activities like PR review, but that isn’t included at this time. + +### Level 3, Scalable — Insights + +While robust, standard feedback mechanisms already exist, at this stage data is collected in crafted ways to yield specific strategic insights and actions. Desired results and outcomes are identified followed by standard metrics chosen to indicate progress towards those outcomes. Industry frameworks and standards may be used to benefit from industry research on the impact of certain behaviors. + +Dedicated teams or tools are employed to gather and review feedback and summarize actionable insights. A symbiotic relationship between platform products and their users is established. Feedback is considered a strategic asset that guides platform operations and roadmap. Regular feedback review sessions might be instituted, where cross-functional teams come together to discuss and strategize based on user insights. + +#### Characteristics: + +* Before delivering any new platform feature, the team discusses how to evaluate the outcome from their work. +* The organization has broad alignment on measures that indicate success of platform initiatives. +* A [product manager]({{< ref "/wgs/platforms/glossary#platform-product-managers" >}}) or dedicated team member drives an ongoing and consistent feedback collection and analysis process. +* The organization has established metrics and goals to observe and target to indicate success. + +#### Example Scenarios: + +* The organization has consistently tracked build times and lead time. However, now they realize that while easy to collect, these alone do not give a complete picture of software delivery. With this in mind, the team implements measurement for service reliability and stability. + +### Level 4, Optimizing — Quantitative and qualitative + +Feedback and measurements are deeply integrated into the organization's culture. The entire organization, from top-level executives to engineers organization-wide, recognizes the value of data collection and feedback on product evolution. There is a democratization of data, where various stakeholders, including platform users and business leaders, are actively involved in identifying hypotheses for platform improvements, providing feedback during the design process, and then measuring the impact post delivery. All of these measurements are considered when planning platform initiatives. + +Not only are standard frameworks leveraged, but there is an understanding that measuring from multiple angles creates a more holistic picture. There is an investment in understanding how qualitative measures change as quantitative ones are improved. There is a focus on identifying leading measures which can allow anticipation of features that would support user needs, alleviate their challenges, and stay ahead of industry trends and business requirements. 
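
As a purely illustrative sketch of the kind of automated collection described in the scenarios above, the snippet below derives two commonly used delivery measures, lead time for changes and change failure rate, from a list of deployment records; the record format and field names are assumptions made for the example, not a required schema:

```python
# Illustrative sketch: derive delivery metrics from deployment records.
# The record shape (commit_time, deploy_time, failed) is an assumed example format.
from datetime import datetime
from statistics import median

FMT = "%Y-%m-%dT%H:%M:%S"

deployments = [
    {"commit_time": "2024-05-01T09:00:00", "deploy_time": "2024-05-01T15:30:00", "failed": False},
    {"commit_time": "2024-05-02T11:00:00", "deploy_time": "2024-05-03T10:00:00", "failed": True},
    {"commit_time": "2024-05-04T08:15:00", "deploy_time": "2024-05-04T12:45:00", "failed": False},
]

def lead_time_hours(record: dict) -> float:
    """Hours from commit to production deployment for one record."""
    committed = datetime.strptime(record["commit_time"], FMT)
    deployed = datetime.strptime(record["deploy_time"], FMT)
    return (deployed - committed).total_seconds() / 3600

median_lead_time = median(lead_time_hours(r) for r in deployments)
change_failure_rate = sum(r["failed"] for r in deployments) / len(deployments)

print(f"Median lead time: {median_lead_time:.1f} hours")
print(f"Change failure rate: {change_failure_rate:.0%}")
```

Numbers like these only become insights when paired with the qualitative signals discussed at this level, for example developer sentiment about build times, so that improving one measure is not mistaken for improving the overall experience.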
+ +#### Characteristics: + +* Platform teams continuously seek ways to improve the metrics they watch and the way they gather data. +* The organization is familiar with and sensitive to [Goodhart's Law](https://en.wikipedia.org/wiki/Goodhart%27s_law): "When a measure becomes a target, it ceases to be a good measure." +* Metrics and telemetry gathered is continuously evaluated for true insight and value. +* Metric data management is well supported, such as standard platform capabilities to manage data lakes and derive insights. +* Cross-departmental collaboration is encouraged to avoid data silos and enable effective feedback cycles. + +#### Example Scenarios: + +* Over time the organization has collected data indicating a rise in build time of over 15%. This triggers negative developer experiences and once triggered, even if the build time is reduced below the original time, developers stay frustrated for longer. This insight drives the build team to set and adhere to a Service Level Objective (SLO), which enables early identification and improvement before instigating the negative cycle with their users. + +{{< /tab >}} +{{< /tabs >}} +
+ +
+ +--- +## Conclusion + +Platforms and their maintainers provide a foundation for agile digital product development. They provide a consistent collection of capabilities that enable efficient software development and delivery. This maturity model provides a map for your platform engineering journey. diff --git a/website/content/ja/wgs/platforms/whitepaper/assets/platforms-def-latest.pdf b/website/content/ja/wgs/platforms/whitepaper/assets/platforms-def-latest.pdf new file mode 100644 index 00000000..8a123a4c Binary files /dev/null and b/website/content/ja/wgs/platforms/whitepaper/assets/platforms-def-latest.pdf differ diff --git a/website/content/ja/wgs/platforms/whitepaper/assets/platforms-def.drawio.png b/website/content/ja/wgs/platforms/whitepaper/assets/platforms-def.drawio.png new file mode 100644 index 00000000..09461d37 Binary files /dev/null and b/website/content/ja/wgs/platforms/whitepaper/assets/platforms-def.drawio.png differ diff --git a/website/content/ja/wgs/platforms/whitepaper/assets/platforms-def.drawio.xml b/website/content/ja/wgs/platforms/whitepaper/assets/platforms-def.drawio.xml new file mode 100644 index 00000000..bbbc6da7 --- /dev/null +++ b/website/content/ja/wgs/platforms/whitepaper/assets/platforms-def.drawio.xml @@ -0,0 +1,116 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/website/content/ja/wgs/platforms/whitepaper/assets/platforms-paper-cover.jpeg b/website/content/ja/wgs/platforms/whitepaper/assets/platforms-paper-cover.jpeg new file mode 100644 index 00000000..c28bae71 Binary files /dev/null and b/website/content/ja/wgs/platforms/whitepaper/assets/platforms-paper-cover.jpeg differ diff --git a/website/content/ja/wgs/platforms/whitepaper/index.md b/website/content/ja/wgs/platforms/whitepaper/index.md new file mode 100644 index 00000000..80d52aac --- /dev/null +++ b/website/content/ja/wgs/platforms/whitepaper/index.md @@ -0,0 +1,505 @@ +--- +title: "CNCFプラットフォームホワイトペーパー" +pdf: https://github.com/cncf/tag-app-delivery/raw/main/platforms-whitepaper/v1/assets/platforms-def-v1.0.pdf +version_info: https://github.com/cncf/tag-app-delivery/tree/main/platforms-whitepaper/README.md +description: "この論文は、エンタープライズのリーダー、エンタープライズアーキテクト、およびプラットフォームチームのリーダーが、クラウドコンピューティングのための内部プラットフォームの提唱、調査、計画するのをサポートすることを目的としています。私たちは、プラットフォームが企業の実際のバリューストリームに大きな影響を与えると信じていますが、それは間接的なものなので、プラットフォームチームの長期的な持続と成功にはリーダーシップの合意と支援が不可欠です。この論文では、プラットフォームの価値が何であるか、それをどのように測定するか、そしてそれを最大化するプラットフォームチームをどのように実現するかについて議論することで、その支援を促進します。" +type: whitepapers +--- + +## Introduction + +Inspired by the cross-functional cooperation promised by DevOps, platform +engineering has begun to emerge in enterprises as an explicit form of that +cooperation. Platforms curate and present foundational capabilities, frameworks +and experiences to facilitate and accelerate the work of internal customers such +as application developers, data scientists and information workers. Particularly +in cloud computing, platforms have helped enterprises realize values long +promised by the cloud like fast product releases, portability across +infrastructures, more secure and resilient products, and greater developer +productivity. + +This paper intends to support enterprise leaders, enterprise architects and +platform team leaders to advocate for, investigate and plan internal platforms +for cloud computing. 
We believe platforms significantly impact enterprises' +actual value streams, but only indirectly, so leadership consensus and support +is vital to the long-term sustainability and success of platform teams. In this +paper we'll enable that support by discussing what the value of platforms is, how +to measure that value, and how to implement platform teams that maximize it. + +## Table of Contents + +1. Why platforms? +1. What is a platform +1. Attributes of successful platforms +1. Attributes of successful platform teams +1. Challenges when implementing platforms +1. How to measure the success of platforms +1. Capabilities of platforms + +## Why platforms? + +Platforms and platform engineering are a popular topic in today's cloud computing world. +Before diving into definitions, techniques, and measurements for platform building, it +is important to first explore the value platforms provide that's driving this +well-deserved attention. + +Process improvements over the past 2-3 decades have significantly increased the +agility of software application and product teams, offering them flexible services +for both infrastructure like compute, network and storage as well as developer +services like builds, tests, delivery and observability. This autonomy and process +improvement has also had the effect of gradually shifting more and more responsibility +for supporting services to product teams, forcing them to spend more and more time +and cognitive energy on infrastructure concerns and reducing their time to produce +value relevant to their organization. + +The desire to refocus delivery teams on their core focus and reduce duplication of +effort across the organisation has motivated enterprises to implement platforms for +cloud-native computing. By investing in platforms, enterprises can: + +1. Reduce the cognitive load on product teams and thereby accelerate product + development and delivery +1. Improve reliability and resiliency of products relying on platform + capabilities by dedicating experts to configure and manage them +1. Accelerate product development and delivery by reusing and sharing platform + tools and knowledge across many teams in an enterprise +1. Reduce risk of security, regulatory and functional issues in products and + services by governing platform capabilities and the users, tools and processes + surrounding them +1. Enable cost-effective and productive use of services from public clouds + and other managed offerings by enabling delegation of implementations to those + providers while maintaining control over user experience + +These benefits accrue in part because just a few platform teams serve many +product teams, multiplying their impact; in part because platform teams +consolidate management of common functionality, facilitating governance; and in +part because platform teams emphasize user interfaces and experiences above all +else. + +A team of platform experts not only reduces common work demanded of product +teams but also optimizes platform capabilities used in those products. A +platform team also maintains a set of conventional patterns, knowledge and tools +used broadly across the enterprise; enabling developers to quickly contribute to +other teams and products built on the same foundations. The shared platform +patterns also allow embedding governance and controls in templates, patterns and +capabilities. 
Finally, because platform teams corral providers and provide +consistent experiences over their offerings, they enable efficient use of public +clouds and service providers for foundational but undifferentiated capabilities +such as databases, identity access, infrastructure operations, and app +lifecycle. + +## What is a platform + +A platform for cloud-native computing is an integrated collection of +capabilities defined and presented according to the needs of the platform's +users. It is a cross-cutting layer that ensures a consistent experience for +acquiring and integrating typical capabilities and services for a broad set of +applications and use cases. A good platform provides consistent user experiences +for using and managing its capabilities and services, such as Web portals, +project templates, and self-service APIs. + +According to Atlassian [[1]], "platform teams create capabilities that can be +used by numerous stream-aligned [product] teams with little overhead.... +platform teams minimize resources and cognitive load of the stream-aligned +[product] team... platform teams can create a cohesive experience that spans +across different user experiences or products." + +According to Martin Fowler and Evan Bottcher [[2]], "a digital platform is a +foundation of self-service APIs, tools, services, knowledge and support which +are arranged as a compelling internal product. Autonomous delivery teams can +make use of the platform to deliver product features at a higher pace, with +reduced coordination." + +The specific set of capabilities and scenarios supported by a platform should be +determined by the needs of stakeholders and users. And while platforms _provide_ +these required capabilities, it's critical to note that platform teams should +not always _implement_ them themselves. Managed service providers or dedicated +internal teams can maintain backing implementations while platforms are the +thinnest reasonable layer that provides consistency across provided implementations +and meets an organization's requirements. For example, a very simple +"platform" could be a wiki page with links to standard operating procedures to +provision capabilities from providers, as described in [[3]]. + +Because these platforms target no more and no less than an enterprise's internal +users we often refer to them as _internal_ platforms. + +Platforms are particularly relevant for cloud-native architectures because they +separate supporting capabilities from application-specific logic more than +previous paradigms. In cloud-like environments resources and capabilities are +often managed independently and integrated with custom business components; such +resources may include databases and object stores, message queues and brokers, +observability collectors and dashboards, user directories and authentication +systems, task runners and reconcilers and more. An internal platform provides +these to enterprise teams in ways that make them easy to integrate in their +applications and systems. + +### Platform maturity + +At their most basic, internal platforms provide consistent experiences for +acquiring and using individual services such as a pipeline runner, a database +system or a secret store. As they mature internal platforms also offer +_compositions_ of such services as self-serviceable templates for key scenarios +like web application development or data analysis, aka MLOps. + +Use cases an enterprise could meet with platforms might progress through the +following: + +1. 
Product developers can provision capabilities on demand and immediately use + them to run systems, such as compute, storage, databases or identities. +1. Product developers can provision service spaces on demand and use them to run + pipelines and tasks, to store artifacts and configuration, and/or to collect + telemetry. +1. Administrators of third-party software can provision required dependencies + like databases on demand and easily install and run that software. +1. Product developers can provision complete environments from templates + combining run-time and development-time services required for specific + scenarios, such as web development or MLOps. +1. Product developers and managers can observe functionality, performance, and + cost of deployed services through automatic instrumentation and standard + dashboards. + +By offering consistent, compliant experiences for individual capabilities or +sets of them, internal platforms ultimately make it easier and more efficient +for their users to deliver valuable products. + +## Attributes of platforms + +After defining what a platform is and why an organization might want to build one, +let's identify some key attributes that affect the success of a platform. + +1. **Platform as a product**. A platform exists to serve the requirements of its users + and it should be designed and evolved based on those requirements, similar to any + other software products. Platforms should provide the necessary capabilities to + support the most common use cases across product teams, and prioritize those + over more specific capabilities that are only used by a single team to maximize + the value delivered. +1. **User experience**. A platform should offer its capabilities through consistent + interfaces and focus on the user experience. Platforms should endeavor to meet their + users where they are, which may mean a combination of GUIs, APIs, command-line tools, + IDEs, and portals. For example, a platform typically offers the capability of deploying + an application. Developers might consume such a capability via the IDE, testers might + use a command-line tool, whereas a product owner might use a GUI-based web portal. +1. **Documentation and onboarding**. Documentation is a key aspect of a successful software + product. To be able to use a platform's offerings, users require documentation and + examples. A platform should be delivered with proper documentation addressing the + needs of its users. It should also provide tools to accelerate the onboarding of new projects + that can help users consume the necessary platform services in a quick and simple way. + For example, the platform could offer a reusable supply chain workflow for building, scanning, + testing, deploying, and observing a web application on Kubernetes. Such a workflow could be + offered with an initial project template and documentation, a bundle often described + as a _golden path_. +1. **Self-service**. A platform should be self-serviceable. Users must be able to request and + receive capabilities autonomously and automatically. This property is key to allowing a platform + team to enable multiple product teams and scale as needed. The platform capabilities should be + available on demand and with minimal manual intervention via the interfaces described above. + For example, it should be possible for a user to request a database and receive its locator + and credentials by running a command-line tool or filling out a form on a web portal. +1. **Reduced cognitive load for users**. 
An essential goal of a platform is to reduce the cognitive + load on product teams. A platform should encapsulate implementation details and hide + any complexity that might arise from its architecture. For example, a platform might delegate + certain services to a cloud provider, but users should not be exposed to such details. + At the same time, the platform should allow users to configure and observe certain services + as needed. Users must not be responsible for operating the services offered by the platform. + For example, users may often require a database, but they shouldn't have to manage the database + server. +1. **Optional and composable**. Platforms are intended to make product development more efficient, so they + must not be an impediment. A platform should be composable and enable product teams to use only + parts of its offerings. It should also enable product teams to provide and manage their own + capabilities outside of the platform's offerings when necessary. For example, if a platform doesn't + provide a graph database and it's required for a product, it should be possible for the product + team to provision and operate a graph database themselves. +1. **Secure by default**. A platform should be secure by default and offer capabilities + to ensure compliance and validation based on rules and standards defined by the organization. + +## Attributes of platform teams + +Platform teams are responsible for the interfaces to and experiences with +platform capabilities - like Web portals, custom APIs, and golden path +templates. On one hand, platform teams work with those teams implementing +infrastructure and supporting services to define consistent experiences; on +the other, they work with product and user teams to gather feedback and ensure +those experiences meet requirements. + +Following are jobs a platform team should be responsible for: + +1. Research platform user requirements and plan feature roadmap +1. Market, evangelize and advocate for the platform's proposed values +1. Manage and develop interfaces for using and observing capabilities and + services, including portals, APIs, documentation and templates, and CLI tools + +Most importantly, platform teams must learn about the requirements of platform +users to inform and continuously improve capabilities and interfaces offered by +their platform. Ways to learn about user requirements include user interviews, +interactive hackathons, issue trackers and surveys, and direct observation of +usage through observability tools. For example, a platform team could publish a +form for users to submit feature requests, lead roadmap meetings +to share upcoming features and review users' usage patterns to set priorities. + +Inbound feedback and thoughtful design is one side of product delivery; the +other side is outbound marketing and advocacy. If the platform is truly built to +user requirements those users will be excited to use the provided capabilities. +Some ways a platform team can enable user adoption is through internal marketing +activities including broad announcements, engaging demos, and regular feedback +and communication sessions. The key here is to meet users where they are and +bring them on a journey to engage with and benefit from the platform. + +A platform team doesn't necessarily run compute, network, storage or other +services. 
+
+## Attributes of platform teams
+
+Platform teams are responsible for the interfaces to and experiences with
+platform capabilities, like web portals, custom APIs, and golden path
+templates. On one hand, platform teams work with the teams that implement
+infrastructure and supporting services to define consistent experiences; on
+the other, they work with product and user teams to gather feedback and ensure
+those experiences meet requirements.
+
+The following are jobs a platform team should be responsible for:
+
+1. Research platform user requirements and plan a feature roadmap
+1. Market, evangelize and advocate for the platform's value proposition
+1. Manage and develop interfaces for using and observing capabilities and
+   services, including portals, APIs, documentation and templates, and CLI tools
+
+Most importantly, platform teams must learn about the requirements of platform
+users to inform and continuously improve the capabilities and interfaces offered
+by their platform. Ways to learn about user requirements include user
+interviews, interactive hackathons, issue trackers and surveys, and direct
+observation of usage through observability tools. For example, a platform team
+could publish a form for users to submit feature requests, lead roadmap meetings
+to share upcoming features, and review users' usage patterns to set priorities.
+
+Inbound feedback and thoughtful design are one side of product delivery; the
+other side is outbound marketing and advocacy. If the platform is truly built to
+user requirements, those users will be excited to use the provided capabilities.
+A platform team can encourage adoption through internal marketing activities,
+including broad announcements, engaging demos, and regular feedback and
+communication sessions. The key here is to meet users where they are and bring
+them on a journey to engage with and benefit from the platform.
+
+A platform team doesn't necessarily run compute, network, storage or other
+services. In fact, an internal platform should rely on _externally_-provided
+services and capabilities as much as possible; platform teams should build and
+maintain their own capabilities only when they're not available elsewhere from
+managed providers or internal infrastructure teams. Instead, platform teams are
+most responsible for the _interfaces_ (i.e., GUI, CLI, and API) and user
+experiences for the services and capabilities their platform makes available.
+
+For example, a web page in a platform might describe and even offer a button to
+provision an identity for an app, while the implementation of that capability
+might be a cloud-hosted identity service. An internal platform team may manage
+the web page and an API, but not the actual service implementation. As noted
+above, platform teams should usually create and maintain their own capability
+implementations only when a required capability is not available elsewhere.
+
+## Challenges with platforms
+
+While platforms promise lots of value, they also bring challenges like the
+following, which implementers should keep in mind:
+
+1. Platform teams must treat their platforms like products and develop them
+   together with users
+1. Platform teams must carefully choose their priorities and initial partner
+   application teams
+1. Platform teams must seek the support of enterprise leadership and show impact
+   on value streams
+
+Perhaps most important is to treat the platform as a customer-facing product and
+recognize that its success is directly dependent on the success of its users and
+their products. As such, it's vital that platform teams partner with app teams
+and other users to prioritize, plan, implement and iterate on the platform's
+capabilities and user experiences. Platform teams that release features and
+experiences without feedback, or that rely on top-down mandates to achieve
+adoption, are almost certain to meet resistance and resentment from their users
+and to miss much of the promised value. To counter this, platform teams should
+include product managers from the start to share roadmaps, gather feedback and
+generally understand and represent the needs of platform users.
+
+When adopting platforms, choosing the right capabilities and experiences to
+enable first can be crucial. Capabilities that are frequently required and
+undifferentiated, like pipelines, databases and observability, may be a good
+place to start. Platform teams may also choose to focus first on a limited
+number of engaged and skillful app teams. Detailed feedback from such teams
+improves the first platform experiences, and people from those teams can help
+champion and evangelize the platform to later adopters.
+
+Finally, it's vital in large enterprises to quickly gain leadership support for
+platform teams. Many enterprise leaders perceive IT infrastructure as an expense
+quite disconnected from their primary value streams and may try to constrain the
+costs and resources allocated to IT platforms, leading to a poor implementation,
+unrealized promises and frustration. To mitigate this, platform teams need to
+demonstrate their direct impact on and relationships with product and value
+stream teams (see the previous two paragraphs), presenting themselves as
+strategic partners of product teams in delivering value to customers.
+
+### Enabling platform teams
+
+It is clear from these challenges that platform teams are faced with a wide
+range of responsibilities, which creates cognitive load of its own.
+Just as with their application team counterparts, this challenge grows with the
+number and diversity of users and teams they need to support.
+
+It is important to focus the platform team's energy on the experiences and
+capabilities that are unique to their specific business. Ways to reduce load on
+the platform team include the following:
+
+1. Seek to build the thinnest viable platform layer over implementations from
+   managed providers
+1. Leverage open source frameworks and toolkits for creating docs, templates and
+   compositions for application team use
+1. Ensure platform teams are staffed appropriately for their domain and number
+   of customers
+
+## How to measure the success of platforms
+
+Enterprises will want to measure whether their platform initiatives are
+delivering the values and attributes discussed above. Also, throughout this
+paper we've emphasized the importance of treating internal platforms as
+products, and good product management depends on quantitative and qualitative
+measurement of a product's performance. To meet these requirements, internal
+platform teams should continuously gather user feedback and measure user
+activities.
+
+As with other aspects of internal platforms, though, platform teams should use
+the smallest viable effort to gather the feedback they need. We'll suggest
+metrics here, but simple surveys and analysis of user behavior may be most
+valuable initially.
+
+Categories of metrics that will help enterprises and platform teams understand
+the impact of their platforms include the following:
+
+### User satisfaction and productivity
+
+The first quality sought by many platforms is to improve user experience in
+order to increase productivity. Metrics that reflect user satisfaction and
+productivity include the following:
+
+- Active users and retention: includes number of capabilities provisioned and
+  user growth/churn
+- "Net Promoter Score" (NPS) or other surveys that measure user satisfaction
+  with a product
+- Metrics for developer productivity such as those discussed in the SPACE
+  framework [[4]]
+
+### Organizational efficiency
+
+Another benefit sought from many platforms is to meet common needs efficiently
+across a large user base. This is often achieved by enabling user self-service
+and reducing manual steps and required human intervention, while implementing
+policies to guarantee safety and compliance. To measure the efficiency of a
+platform in reducing common work, consider measures such as these:
+
+- Latency from request to fulfillment of a service or capability, such as a
+  database or test environment
+- Latency to build and deploy a brand new service into production
+- Time for a new user to submit their first code changes to their product
+
+### Product and feature delivery
+
+The ultimate objective of internal platforms is to deliver business value to
+customers faster, so measuring impact on a business's own product and feature
+releases demonstrates that the objectives of the platform are being met. The
+DevOps Research and Assessment (DORA) team at Google suggests [[5]] tracking
+the following metrics (a minimal sketch at the end of this section shows how a
+few of them might be computed from deployment records):
+
+- Deployment frequency
+- Lead time for changes
+- Time to restore services after failure
+- Change failure rate
+
+Generally, a key objective of platform teams is to align infrastructure and
+other IT capabilities with an enterprise's value streams, that is, with its
+products. Ultimately, then, the success of an organization's products and
+applications is the true measure of the success of its platform.
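+
+As an illustration of the delivery metrics above, the following minimal sketch
+shows how a platform team might derive deployment frequency, lead time for
+changes, and change failure rate from its own delivery records. The record
+format and field names are assumptions made for the example; real data would
+come from the platform's pipelines or deployment tooling.
+
+```python
+# Minimal sketch: derive DORA-style metrics from hypothetical deployment records.
+# Each record notes when a change was committed, when it was deployed, and whether
+# the deployment caused a failure in production. Field names are illustrative.
+from datetime import datetime, timedelta
+from statistics import median
+
+deployments = [
+    {"committed": datetime(2023, 9, 4, 10, 0), "deployed": datetime(2023, 9, 4, 15, 30), "failed": False},
+    {"committed": datetime(2023, 9, 6, 9, 15), "deployed": datetime(2023, 9, 7, 11, 0), "failed": True},
+    {"committed": datetime(2023, 9, 11, 14, 0), "deployed": datetime(2023, 9, 11, 16, 45), "failed": False},
+]
+
+
+def deployment_frequency(records: list[dict], window: timedelta) -> float:
+    """Average number of deployments per day over the observed window."""
+    return len(records) / (window / timedelta(days=1))
+
+
+def lead_time_for_changes(records: list[dict]) -> timedelta:
+    """Median time from commit to running in production."""
+    return median(r["deployed"] - r["committed"] for r in records)
+
+
+def change_failure_rate(records: list[dict]) -> float:
+    """Share of deployments that caused a failure in production."""
+    return sum(r["failed"] for r in records) / len(records)
+
+
+if __name__ == "__main__":
+    print("deploys per day:", deployment_frequency(deployments, timedelta(days=7)))
+    print("median lead time:", lead_time_for_changes(deployments))
+    print("change failure rate:", change_failure_rate(deployments))
+```
+
+Even a rough computation like this, run regularly against real delivery data,
+lets a platform team show trends over time rather than anecdotes.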
+
+## Capabilities of platforms
+
+As we've described, a platform for cloud-native computing offers and composes
+capabilities and services from many supporting providers. These providers may be
+other teams within the same enterprise or third parties like cloud service
+providers. In a nutshell, platforms bridge from underlying _capability
+providers_ to platform users like application developers, and in the process
+they implement and enforce desired practices for security, performance, cost
+governance and consistent experience. The following graphic illustrates the
+relationships between products, platforms, and capability providers.
+
+We've focused in this paper on how to construct a good platform and platform
+team; now in this last section we'll describe the capabilities a platform may
+actually offer. This list is intended to guide platform builders and includes
+capabilities typically required by cloud-native applications. As we've noted
+throughout, though, a good platform reflects its users' needs, so ultimately
+platform teams should choose and prioritize the capabilities their platform
+offers together with its users.
+
+Capabilities may comprise several _features_, meaning aspects or attributes of
+the parent capability's domain. For example, observability may include features
+for gathering and publishing metrics, traces and logs as well as for observing
+costs and energy consumption. Consider the need and priority for each feature or
+aspect in your organization. Later CNCF publications may expand on each domain
+further.
+
+Here are capability domains to consider when building platforms for cloud-native
+computing:
+
+1. **Web portals** for observing and provisioning products and capabilities
+1. **APIs** (and CLIs) for automatically provisioning products and capabilities
+   (sketched below)
+1. **"Golden path" templates and docs** enabling optimal use of capabilities in
+   products
+1. **Automation for building and testing** services and products
+1. **Automation for delivering and verifying** services and products
+1. **Development environments** such as hosted IDEs and remote connection tools
+1. **Observability** for services and products using instrumentation and
+   dashboards, including observation of functionality, performance and costs
+1. **Infrastructure** services including compute runtimes, programmable
+   networks, and block and volume storage
+1. **Data** services including databases, caches, and object stores
+1. **Messaging** and event services including brokers, queues, and event fabrics
+1. **Identity and secret** management services such as service and user identity
+   and authorization, certificate and key issuance, and static secret storage
+1. **Security** services including static analysis of code and artifacts,
+   runtime analysis, and policy enforcement
+1. **Artifact storage** including storage of container images and
+   language-specific packages, custom binaries and libraries, and source code
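+
+To illustrate the API capability above, and as the platform-side counterpart to
+the earlier self-service sketch, here is a minimal, purely hypothetical
+provisioning endpoint. It accepts capability requests at one stable route and
+hands them to whichever backing provider the platform team has chosen, so users
+see a consistent interface while the implementation behind it can change. The
+route, request fields, and provider function are assumptions for illustration,
+not a reference design.
+
+```python
+# Hypothetical platform-side provisioning API: one stable endpoint owned by the
+# platform team, delegating the actual work to a pluggable backing provider.
+# The route, request fields and provider function are illustrative assumptions.
+import json
+from http.server import BaseHTTPRequestHandler, HTTPServer
+
+
+def provision_with_cloud_provider(request: dict) -> dict:
+    """Stand-in for a call to a managed service or an infrastructure team's API."""
+    name = request["name"]
+    return {
+        "capability": request["capability"],
+        "locator": f"{name}.db.platform.internal:5432",
+        "credentials_secret_ref": f"secrets/{request['team']}/{name}",
+        "status": "provisioning",
+    }
+
+
+class CapabilityHandler(BaseHTTPRequestHandler):
+    def do_POST(self):
+        if self.path != "/api/v1/capabilities":
+            self.send_error(404)
+            return
+        length = int(self.headers.get("Content-Length", 0))
+        request = json.loads(self.rfile.read(length))
+        # The platform team owns this interface; the provider behind it is swappable.
+        body = json.dumps(provision_with_cloud_provider(request)).encode("utf-8")
+        self.send_response(201)
+        self.send_header("Content-Type", "application/json")
+        self.end_headers()
+        self.wfile.write(body)
+
+
+if __name__ == "__main__":
+    HTTPServer(("localhost", 8080), CapabilityHandler).serve_forever()
+```
+
+The separation of concerns is the point: the endpoint and request format are the
+platform team's product, while `provision_with_cloud_provider` could be replaced
+by a managed database service, an operator, or an internal infrastructure team
+without users noticing.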
+
+The following table is intended to help readers grasp each capability by loosely
+relating it to existing CNCF or CDF projects.
+
+| Capability | Description | Example CNCF/CDF Projects |
+|------------|-------------|---------------------------|
+| Web portals for provisioning and observing capabilities | Publish documentation, service catalogs, and project templates. Publish telemetry about systems and capabilities. | Backstage, Skooner, Ortelius |
+| APIs for automatically provisioning capabilities | Structured formats for automatically creating, updating, deleting and observing capabilities. | Kubernetes, Crossplane, Operator Framework, Helm, KubeVela |
+| Golden path templates and docs | Templated compositions of well-integrated code and capabilities for rapid project development. | ArtifactHub |
+| Automation for building and testing products | Automate build and test of digital products and services. | Tekton, Jenkins, Buildpacks, ko, Carvel |
+| Automation for delivering and verifying services | Automate and observe delivery of services. | Argo, Flux, Keptn, Flagger, OpenFeature |
+| Development environments | Enable research and development of applications and systems. | Devfile, Nocalhost, Telepresence, DevSpace |
+| Application observability | Instrument applications, gather and analyze telemetry and publish info to stakeholders. | OpenTelemetry, Jaeger, Prometheus, Thanos, Fluentd, Grafana, OpenCost |
+| Infrastructure services | Run application code, connect application components and persist data for applications. | Kubernetes, Kubevirt, Knative, WasmEdge, KEDA; CNI, Istio, Cilium, Envoy, Linkerd, CoreDNS; Rook, Longhorn, Etcd |
+| Data services | Persist structured data for applications. | TiKV, Vitess, SchemaHero |
+| Messaging and event services | Enable applications to communicate with each other asynchronously. | Strimzi, NATS, gRPC, Knative, Dapr |
+| Identity and secret services | Ensure workloads have locators and secrets to use resources and capabilities. Enable services to identify themselves to other services. | Dex, External Secrets, SPIFFE/SPIRE, Teller, cert-manager |
+| Security services | Observe runtime behavior and report/remediate anomalies. Verify builds and artifacts don't contain vulnerabilities. Constrain activities on the platform per enterprise requirements; notify and/or remediate aberrations. | Falco, in-toto, KubeArmor, OPA, Kyverno, Cloud Custodian |
+| Artifact storage | Store, publish and secure built artifacts for use in production. Cache and analyze third-party artifacts. Store source code. | ArtifactHub, Harbor, Distribution, Porter |
+
+[1]: https://www.atlassian.com/devops/frameworks/team-topologies
+[2]: https://martinfowler.com/articles/talk-about-platforms.html
+[3]: https://teamtopologies.com/key-concepts-content/what-is-a-thinnest-viable-platform-tvp
+[4]: https://queue.acm.org/detail.cfm?id=3454124
+[5]: https://cloud.google.com/blog/products/devops-sre/the-2019-accelerate-state-of-devops-elite-performance-productivity-and-scaling
diff --git a/website/i18n/ja.toml b/website/i18n/ja.toml
new file mode 100644
index 00000000..6c03b8d7
--- /dev/null
+++ b/website/i18n/ja.toml
@@ -0,0 +1,12 @@
+# UI strings. Buttons and similar.
+
+# Footer text
+[footer_all_rights_reserved]
+other = " | このドキュメントはCC-BY-4.0の下で配布されています"
+
+[post_create_issue]
+other = "issueの作成"
+
+# TOC
+[page_contents]
+other = "目次"