Update top-level and section READMEs (#50)
* top-level README WIP

* rewrite top-level README

* change top-level README title

* remove initial quote in top-level README

* Update README.md

* foundations README

* Update README.md
ludoo authored Mar 17, 2020
1 parent b2a7b4f commit 9f70c96
Showing 3 changed files with 97 additions and 35 deletions.
42 changes: 33 additions & 9 deletions README.md
@@ -1,17 +1,41 @@
# Cloud Foundation Toolkit - Fabric
# Terraform Examples and Modules for Google Cloud

Cloud Foundation Fabric provides end-to-end Terraform code examples on GCP, which are meant for prototyping and as minimal samples to aid in designing real-world infrastructures. As such, these samples are meant to be adapted and updated for your different use cases, and often do not implement GCP security best practices for production use.
This repository provides **end-to-end examples** and a **suite of Terraform modules** for Google Cloud, which support different use cases:

All the examples leverage composition, combining different Cloud Foundation Toolkit modules to realize an integrated design. Additional modules can be combined in to tailor the examples to specific needs and to implement additional best practices. You can check the [full list of Cloud Foundation Toolkit modules here](https://github.com/terraform-google-modules).
- starter kits used to bootstrap real-world cloud foundations and infrastructure
- reference examples used to deep dive on network patterns or product features
- composable modules that support quick prototyping and testing
- a comprehensive source of lean modules that lend themselves well to changes

The examples are organized into two main sections: GCP foundational design and infrastructure design.
The whole repository is meant to be cloned as a single unit, and then forked into separately owned repositories to seed production usage, or used as-is and periodically updated as a complete toolkit for prototyping.

## Foundational examples
Both the examples and modules require some measure of Terraform skills to be used effectively. If you are looking for a feature-rich black box to manage project or product creation with minimal specific skills, you might be better served by the [Cloud Foundation Toolkit](https://registry.terraform.io/modules/terraform-google-modules) suite of modules.

Foundational examples deal with organization-level management of GCP resources, and take care of folder hierarchy, initial automation requirements (service accounts, GCS buckets), and high level best practices like audit log exports and organization policies.
## End-to-end examples

They are simplified versions of real-life use cases, and put a particular emphasis on separation of duties at the environment or tenant level, and decoupling high level permissions from the day to day running of infrastructure automation. More details and the actual examples are available in the [foundations folder](foundations).
The examples in this repository are split into two main sections: **foundational examples** that bootstrap the organizational hierarchy and automation prerequisites, and **infrastructure scenarios** that implement core networking patterns or features.

## Infrastructure examples
Currently available examples:

Infrastructure examples showcase typical networking configurations on GCP, and are meant to illustrate how to automate them with Terraform, and to offer an easy way of testing different scenarios. Like the foundational examples, they are simplified versions of real-life use cases. More details and the actual examples are available in the [infrastructure folder](infrastructure).
- **foundations** - [single level hierarchy](./foundations/environments/) (environments), [multiple level hierarchy](./foundations/business-units/) (business units + environments)
- **infrastructure** - [hub and spoke via peering](./infrastructure/hub-and-spoke-peering/), [hub and spoke via VPN](./infrastructure/hub-and-spoke-vpn/), [DNS and Google Private Access for on-premises](./infrastructure/onprem-google-access-dns/), [Shared VPC with GKE support](./infrastructure/shared-vpc-gke/)

For more information see the README files in the [foundations](./foundations/) and [infrastructure](./infrastructure/) folders.

## Modules

The modules in this repository are designed for rapid composition and reuse, and to be reasonably simple and readable, so that they can be forked and changed where use of third-party code and sources is not allowed.

All modules share a similar interface: each tries to stay close to the underlying provider resources, supports IAM together with resource creation and modification, offers the option of creating multiple resources where it makes sense (e.g. not for projects), and is completely free of side effects (e.g. no external commands).
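
A minimal sketch of what this interface looks like in practice, assuming a project module; the variable names used here (`parent`, `billing_account`, `services`, `iam_members`) are illustrative assumptions, so check each module's README for the actual interface:

```hcl
# Hypothetical call to the project module; variable names are
# illustrative assumptions, not the module's verified interface.
module "project" {
  source          = "./modules/project"
  name            = "my-prototype-project"
  parent          = "folders/1234567890"
  billing_account = "ABCDEF-123456-ABCDEF"
  services        = ["compute.googleapis.com"]
  iam_members = {
    "roles/viewer" = ["group:devops@example.com"]
  }
}
```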

The current list of modules supports most of the core foundational and networking components used to design end-to-end infrastructure, with more modules in active development for specialized compute, security, and data scenarios.

Currently available modules:

- **foundational** - [folders](./modules/folders), [log sinks](./modules/logging-sinks), [project](./modules/project), [service accounts](./modules/iam-service-accounts)
- **networking** - [VPC](./modules/net-vpc), [VPC firewall](./modules/net-vpc-firewall), [VPC peering](./modules/net-vpc-peering), VPN ([static](./modules/net-vpn-static), [dynamic](./modules/net-vpn-dynamic), [HA](./modules/net-vpn-ha)), [NAT](./modules/net-cloudnat), [address reservation](./modules/net-address), [DNS](./modules/dns)
- **compute** - [VM/VM group](./modules/compute-vm), [GKE cluster](./modules/gke-cluster), [GKE nodepool](./modules/gke-nodepool), [COS container](./modules/compute-vm-cos-coredns)
- **data** - [GCS](./modules/gcs), [BigQuery dataset](./modules/bigquery)
- **other** - [KMS](./modules/kms), [on-premises in Docker](./modules/on-prem-in-a-box)

For more information and usage examples see each module's README file.
56 changes: 32 additions & 24 deletions foundations/README.md
@@ -1,40 +1,48 @@
# Organization-level bootstrap samples
# Cloud foundation examples

This set of Terraform root modules is designed with two main purposes in mind: automating the organizational layout, and bootstrapping the initial resources and the corresponding IAM roles, which will then be used to automate the actual infrastructure.
The examples in this folder deal with cloud foundations: the set of resources used to **create the organizational hierarchy** (folders and specific IAM roles), **implement top-level initial best practices** (audit log exports, policies) and **bootstrap infrastructure automation** (GCS buckets, service accounts and IAM roles).

Despite being fairly generic, these modules closely match some of the initial automation stages we have implemented in actual customer engagements, and are purposely kept simple to offer a good starting point for further customizations.
The examples are derived from actual production use cases, and are meant to be used as-is, or extended to create more complex hierarchies. The guiding principles they implement are:

There are several advantages in using initial stages like the ones provided here:
- divide the hierarchy into separate partitions along environment/organization boundaries, to enforce separation of duties and decouple organization admin permissions from the day-to-day running of infrastructure
- keep top-level Terraform code minimal and encapsulate complexity in modules, to ensure readability and allow using code as high level documentation

- automate and parameterize creation of the organizational layout, documenting it through code
- use a single declarative tool to create and manage both prerequisites and the actual infrastructure, eliminating the need for manual commands, scripts or external tools
- enforce separation of duties at the environment (or tenant in multi-tenant architectures) level, by automating creation of per-environment Terraform service accounts, their IAM roles, and GCS buckets to hold state
- decouple and document the use of organization-level permissions from the day-to-day management of the actual infrastructure, by assigning a minimum but sufficient set of high-level IAM roles to Terraform service accounts in an initial stage
- provide a sane place for the creation and management of shared resources that are not tied to a specific environment
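
As a concrete instance of the bootstrap resources described above, here is a hedged sketch of an organization-level audit log export to GCS; bucket name, organization id and filter are placeholders:

```hcl
# Destination bucket for audit log exports; name and location are
# illustrative placeholders.
resource "google_storage_bucket" "audit_logs" {
  name     = "example-org-audit-logs"
  location = "EU"
}

# Organization-level sink; org id and filter are placeholders.
resource "google_logging_organization_sink" "audit" {
  name             = "audit-logs"
  org_id           = "1234567890"
  destination      = "storage.googleapis.com/${google_storage_bucket.audit_logs.name}"
  include_children = true
  filter           = "logName:\"cloudaudit.googleapis.com\""
}

# The sink's writer identity needs write access on the destination.
resource "google_storage_bucket_iam_member" "sink_writer" {
  bucket = google_storage_bucket.audit_logs.name
  role   = "roles/storage.objectCreator"
  member = google_logging_organization_sink.audit.writer_identity
}
```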
## Examples

## Operational considerations
### Environment Hierarchy

<a href="./environments/" title="Environments example"><img src="./environments/diagram.png" align="left" width="280px"></a> This [example](./environments/) implements a simple one-level oganizational layout, which is commonly used to bootstrap small infrastructures, or in situations where lower level folders are managed with separate, more granular Terraform setups.

This specific type of preliminary automation stage is usually fairly static, only changing when a new environment or shared resource is added or modified, and lends itself well to being applied manually by organization administrators. One secondary advantage of running this initial stage manually is eliminating the need to create and manage automation credentials that embed sensitive permissions scoped at the organization or root folder level.
One authoritative service account, one bucket and one folder are created for each environment, together with top-level shared resources. This example's simplicity makes it a good starting point to understand and prototype foundational design.

### IAM roles
<br clear="left">

This type of automation stage needs very specific IAM roles on the root node (organization or folder), and additional roles at the organization level if the generated service accounts for automation need to be able to create and manage Shared VPC. The needed roles are:
### Business Unit / Environment Hierarchy

- on the root node: Project Creator, Folder Administrator, Logging Administrator
- on the billing account or organization: Billing Account Administrator
- on the organization: Organization Administrator, if Shared VPC needs to be managed by the automation service accounts
<a href="./business-units/" title="Business Units example"><img src="./business-units/diagram.png" align="left" width="280px"></a> This [example](./business-units/) implements a two-level oganizational layout, with a first level usually mapped to business units, and a second level implementing identical environments (prod, test, etc.) under each first-level folder.

### State
This approach maps well to medium-sized infrastructures, and can be used as a starting point for more complex scenarios. Separate Terraform stages are then usually created for each business unit, implementing fine-grained project and service account creation for individual application teams.
<br clear="left">

This type of stage creates the prerequisites for Terraform automation, including the GCS bucket used for its own remote state, so some care is needed the first time it is run, when its GCS bucket does not yet exist.
## Operational considerations

After the first successful `terraform apply`, copy the `backend.tf.sample` file
to `backend.tf`, then set the bucket name in the new file to the one shown in the `bootstrap_tf_gcs_bucket` output. Once that is done, run `terraform init` again to transfer local state to the remote GCS bucket. From then on, state will be remote.
These examples are always used manually, as they require very high-level permissions and are updated infrequently.

### Things to be aware of
The IAM roles needed are:

Using `count` in Terraform resources has the [well-known limitation](https://github.com/hashicorp/terraform/issues/18767) that changing the variable controlling `count` results in the potential unwanted deletion of resources.
- Project Creator, Folder Administrator, Logging Administrator on the root node (org or folder)
- Billing Account Administrator on the billing account or org
- Organization Administrator if Shared VPC roles have to be granted to the automation service accounts created for each scope

These samples use `count` on the `environments` list variable to manage multiple instances of key resources like service accounts, GCS buckets, and IAM roles. Environment names are usually stable, but care must still be taken in defining the initial list, so that names are final, and names for temporary environments (if any are needed) come last, where removing them does not trigger recreation of resources based on the following elements in the list.
State is local on the first run, then it should be moved to the GCS bucket created by the examples for this specific purpose:

This issue will be addressed in a future release of these examples, by replacing `count` with the new `for_each` construct [introduced in Terraform 0.12.6](https://twitter.com/mitchellh/status/1156661893789966336?lang=en) that uses key-based indexing (see the sketch after the state-migration commands below).
```bash
# first apply
terraform apply
# create backend file
cp backend.tf.sample backend.tf
# edit backend.tf and use bootstrap_tf_gcs_bucket output for GCS bucket name
vi backend.tf
# once done, move local state to GCS bucket
terraform init
```
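
The `count` limitation discussed above can be sketched as follows; the resource type and variable are illustrative:

```hcl
variable "environments" {
  type    = list(string)
  default = ["dev", "test", "prod"]
}

# With count, state tracks resources by list position: removing "test"
# shifts "prod" to index 1 and forces recreation of its service account.
resource "google_service_account" "count_based" {
  count      = length(var.environments)
  account_id = "tf-${var.environments[count.index]}"
}

# With for_each (Terraform >= 0.12.6), state tracks resources by key,
# so removing one environment only destroys that environment's resources.
resource "google_service_account" "key_based" {
  for_each   = toset(var.environments)
  account_id = "tf-${each.key}"
}
```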
34 changes: 32 additions & 2 deletions infrastructure/README.md
@@ -1,3 +1,33 @@
# Infrastructure samples
# Networking and infrastructure examples

These examples showcase typical networking configurations on GCP derived from real-world use cases, and are meant to illustrate how to automate them with Terraform, and to offer an easy way of testing different scenarios. We have a long list of examples we plan on adding, so check back here often.
The examples in this folder implement **typical network topologies** like hub and spoke, or **end-to-end scenarios** that allow testing specific features like on-premises DNS policies and Private Google Access.

They are meant to be used as minimal but complete starting points to create actual infrastructure, and as playgrounds to experiment with specific Google Cloud features.

## Examples

### Hub and Spoke via Peering

<a href="./hub-and-spoke-peering/" title="Hub and spoke via peering example"><img src="./hub-and-spoke-peering/diagram.png" align="left" width="280px"></a> This [example](./hub-and-spoke-peering/) implements a hub and spoke topology via VPC peering, a common design where a landing zone VPC (hub) is conncted to on-premises, and then peered with satellite VPCs (spokes) to further partition the infrastructure.

The sample highlights the lack of transitivity in peering: the absence of connectivity between spokes, and the need to create workarounds for private service access to managed services. One such workaround is shown for private GKE, allowing access from the hub and all spokes to GKE masters via a dedicated VPN.
<br clear="left">
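
A hedged sketch of the peering layout described above, using plain provider resources; network names are placeholders, and the actual example wires this up across separate projects:

```hcl
resource "google_compute_network" "hub" {
  name                    = "hub"
  auto_create_subnetworks = false
}

resource "google_compute_network" "spoke_1" {
  name                    = "spoke-1"
  auto_create_subnetworks = false
}

# Peering must be established from both sides. Repeating this pair for
# spoke-2 connects hub<->spoke-2, but never spoke-1<->spoke-2: peering
# is not transitive, hence the dedicated VPN for GKE master access.
resource "google_compute_network_peering" "hub_spoke_1" {
  name         = "hub-spoke-1"
  network      = google_compute_network.hub.self_link
  peer_network = google_compute_network.spoke_1.self_link
}

resource "google_compute_network_peering" "spoke_1_hub" {
  name         = "spoke-1-hub"
  network      = google_compute_network.spoke_1.self_link
  peer_network = google_compute_network.hub.self_link
}
```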

### Hub and Spoke via Dynamic VPN

<a href="./hub-and-spoke-vpn/" title="Hub and spoke via dynamic VPN"><img src="./hub-and-spoke-vpn/diagram.png" align="left" width="280px"></a> This [example](./hub-and-spoke-vpn/) implements a hub and spoke topology via dynamic VPN tunnels, a common design where peering cannot be used due to limitations on the number of spokes or connectivity to managed services.

The example shows how to implement spoke transitivity via BGP advertisements, how to expose hub DNS zones to spokes via DNS peering, and allows easy testing of different VPN and BGP configurations.
<br clear="left">
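
Spoke transitivity hinges on the hub's Cloud Router advertising each spoke's ranges to the others; a hedged sketch, where ASN, region, network and ranges are placeholders:

```hcl
# Cloud Router on the hub, advertising both spokes' ranges over the
# VPN tunnels so each spoke learns a route to the other via the hub.
# All values below are illustrative placeholders.
resource "google_compute_router" "hub" {
  name    = "hub-router"
  region  = "europe-west1"
  network = "hub"
  bgp {
    asn               = 64514
    advertise_mode    = "CUSTOM"
    advertised_groups = ["ALL_SUBNETS"]
    advertised_ip_ranges {
      range = "10.0.16.0/24" # spoke-1 primary range
    }
    advertised_ip_ranges {
      range = "10.0.32.0/24" # spoke-2 primary range
    }
  }
}
```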

### DNS and Private Access for On-premises

<a href="./onprem-google-access-dns/" title="DNS and Private Access for On-premises"><img src="./onprem-google-access-dns/diagram.png" align="left" width="280px"></a> This [example](./onprem-google-access-dns/) uses an emulated on-premises environment running in Docker containers inside a GCE instance, to allow testing specific features like DNS policies, DNS forwarding zones across VPN, and Private Access for On-premises hosts.

The emulated on-premises environment can be used to test access to different services from outside Google Cloud, by implementing a VPN connection and BGP to Google Cloud via Strongswan and Bird.
<br clear="left">
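
One of the features under test, a DNS forwarding zone sending on-premises domain queries across the VPN, might be sketched like this; zone name, domain, network and resolver address are placeholders:

```hcl
resource "google_compute_network" "vpc" {
  name                    = "to-onprem"
  auto_create_subnetworks = false
}

# Private forwarding zone: queries for the on-premises domain are
# forwarded to the emulated on-prem resolver across the VPN. Domain
# and resolver IP are illustrative placeholders.
resource "google_dns_managed_zone" "onprem" {
  name       = "onprem-example-org"
  dns_name   = "onprem.example.org."
  visibility = "private"

  private_visibility_config {
    networks {
      network_url = google_compute_network.vpc.self_link
    }
  }

  forwarding_config {
    target_name_servers {
      ipv4_address = "10.0.0.2"
    }
  }
}
```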

### Shared VPC with GKE and per-subnet support

<a href="./shared-vpc-gke/" title="Shared VPC with GKE"><img src="./shared-vpc-gke/diagram.png" align="left" width="280px"></a> This [example](./shared-vpc-gke/) shows how to configure a Shared VPC, including the specific IAM configurations needed for GKE, and how to give different levels of access to the VPC subnets to different identities.

It is meant to be used as a starting point for most Shared VPC configurations, and to be integrated with the above examples where Shared VPC is needed in more complex network topologies.
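
A hedged sketch of the core Shared VPC wiring and one per-subnet grant; project ids, region, subnet and identity are placeholders:

```hcl
# Enable the host project and attach a service project. Project ids
# below are illustrative placeholders.
resource "google_compute_shared_vpc_host_project" "host" {
  project = "net-host-project"
}

resource "google_compute_shared_vpc_service_project" "gke" {
  host_project    = google_compute_shared_vpc_host_project.host.project
  service_project = "gke-service-project"
}

# Per-subnet access: only identities granted networkUser on a given
# subnet can use it. For GKE, the service project's GKE and Google APIs
# service agents need similar grants on the cluster subnet.
resource "google_compute_subnetwork_iam_member" "gke_subnet_user" {
  project    = "net-host-project"
  region     = "europe-west1"
  subnetwork = "gke-subnet"
  role       = "roles/compute.networkUser"
  member     = "serviceAccount:service-1234567890@container-engine-robot.iam.gserviceaccount.com"
}
```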
