This repository has been archived by the owner on Sep 30, 2024. It is now read-only.
Commit
feat/msp: allow enablement of logical replication features for Datastream (#63092)

Adds a new `postgreSQL.logicalReplication` configuration to allow MSP to generate prerequisite setup for integration with Datastream: https://cloud.google.com/datastream/docs/sources-postgresql. Integration with Datastream allows the Data Analytics team to self-serve data enrichment needs for the Telemetry V2 pipeline.

Enabling this feature entails downtime (Cloud SQL instance restart), so enabling the logical replication feature at the Cloud SQL level (`cloudsql.logical_decoding`) is gated behind `postgreSQL.logicalReplication: {}`.

Setting up the required stuff in Postgres is a bit complicated, requiring 3 Postgres provider instances:

1. The default admin one, authenticated with our admin user
2. New: a workload identity provider, using cyrilgdn/terraform-provider-postgresql#448 / sourcegraph/managed-services-platform-cdktf#11. This is required for creating a publication on selected tables, which requires being the owner of said tables. Because tables are created by the application using e.g. auto-migrate, the workload identity is always the table owner, so we need to impersonate the IAM user
3. New: a "replication user" which is created with the replication permission. Replication seems to not be a propagated permission, so we need a role/user that has replication enabled.

A bit more context is scattered here and there in the docstrings.

Beyond the Postgres configuration we also introduce some additional resources to enable easy Datastream configuration:

1. Datastream Private Connection, which peers to the service private network
2. Cloud SQL Proxy VM, which only allows connections to `:5432` from the range specified in 1, allowing a connection to the Cloud SQL instance
3. Datastream Connection Profile attached to 1

From there, the data team can click-ops or manage the Datastream Stream and BigQuery destination on their own.

Closes CORE-165
Closes CORE-212

Sample config:

```yaml
resources:
  postgreSQL:
    databases:
      - "primary"
    logicalReplication:
      publications:
        - name: testing
          database: primary
          tables:
            - users
```

## Test plan

sourcegraph/managed-services#1569

## Changelog

- MSP services can now configure `postgreSQL.logicalReplication` to enable the Data Analytics team to replicate selected database tables into BigQuery.
1 parent eca5706, commit 84014e3
Showing 16 changed files with 667 additions and 25 deletions.
22 changes: 22 additions & 0 deletions
dev/managedservicesplatform/internal/resource/datastreamconnection/BUILD.bazel
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
load("@io_bazel_rules_go//go:def.bzl", "go_library") | ||
|
||
go_library( | ||
name = "datastreamconnection", | ||
srcs = ["datastreamconnection.go"], | ||
importpath = "github.com/sourcegraph/sourcegraph/dev/managedservicesplatform/internal/resource/datastreamconnection", | ||
visibility = ["//dev/managedservicesplatform:__subpackages__"], | ||
deps = [ | ||
"//dev/managedservicesplatform/internal/resource/cloudsql", | ||
"//dev/managedservicesplatform/internal/resource/postgresqllogicalreplication", | ||
"//dev/managedservicesplatform/internal/resource/privatenetwork", | ||
"//dev/managedservicesplatform/internal/resource/serviceaccount", | ||
"//dev/managedservicesplatform/internal/resourceid", | ||
"//lib/pointers", | ||
"@com_github_aws_constructs_go_constructs_v10//:constructs", | ||
"@com_github_hashicorp_terraform_cdk_go_cdktf//:cdktf", | ||
"@com_github_sourcegraph_managed_services_platform_cdktf_gen_google//computefirewall", | ||
"@com_github_sourcegraph_managed_services_platform_cdktf_gen_google//computeinstance", | ||
"@com_github_sourcegraph_managed_services_platform_cdktf_gen_google//datastreamconnectionprofile", | ||
"@com_github_sourcegraph_managed_services_platform_cdktf_gen_google//datastreamprivateconnection", | ||
], | ||
) |
193 changes: 193 additions & 0 deletions
dev/managedservicesplatform/internal/resource/datastreamconnection/datastreamconnection.go
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,193 @@ | ||
package datastreamconnection | ||
|
||
import ( | ||
"fmt" | ||
|
||
"github.com/aws/constructs-go/constructs/v10" | ||
"github.com/hashicorp/terraform-cdk-go/cdktf" | ||
|
||
"github.com/sourcegraph/managed-services-platform-cdktf/gen/google/computefirewall" | ||
"github.com/sourcegraph/managed-services-platform-cdktf/gen/google/computeinstance" | ||
"github.com/sourcegraph/managed-services-platform-cdktf/gen/google/datastreamconnectionprofile" | ||
"github.com/sourcegraph/managed-services-platform-cdktf/gen/google/datastreamprivateconnection" | ||
|
||
"github.com/sourcegraph/sourcegraph/dev/managedservicesplatform/internal/resource/cloudsql" | ||
"github.com/sourcegraph/sourcegraph/dev/managedservicesplatform/internal/resource/postgresqllogicalreplication" | ||
"github.com/sourcegraph/sourcegraph/dev/managedservicesplatform/internal/resource/privatenetwork" | ||
"github.com/sourcegraph/sourcegraph/dev/managedservicesplatform/internal/resource/serviceaccount" | ||
"github.com/sourcegraph/sourcegraph/dev/managedservicesplatform/internal/resourceid" | ||
"github.com/sourcegraph/sourcegraph/lib/pointers" | ||
) | ||
|
||
type Config struct { | ||
VPC *privatenetwork.Output | ||
CloudSQL *cloudsql.Output | ||
// CloudSQLClientServiceAccount is used for establishing a proxy that can | ||
// connect to the Cloud SQL instance. | ||
CloudSQLClientServiceAccount serviceaccount.Output | ||
|
||
Publications []postgresqllogicalreplication.PublicationOutput | ||
PublicationUserGrants []cdktf.ITerraformDependable | ||
} | ||
|
||
type Output struct { | ||
} | ||
|
||
// New provisions everything needed for Datastream to connect to Cloud SQL proxy: | ||
// | ||
// Datastream --peering-> Private Network -> Cloud SQL Proxy VM -> Cloud SQL | ||
// | ||
// We need an additional VM proxying connections to Cloud SQL because Datastream | ||
// and Cloud SQL both have their own internal VPCs, and we cannot transitively | ||
// peer them over the private network we manage. | ||
func New(scope constructs.Construct, id resourceid.ID, config Config) (*Output, error) { | ||
const proxyInstanceName = "msp-datastream-cloudsqlproxy" | ||
|
||
cloudsqlproxyInstance := computeinstance.NewComputeInstance(scope, id.TerraformID("cloudsqlproxy"), &computeinstance.ComputeInstanceConfig{ | ||
Name: pointers.Ptr(proxyInstanceName), | ||
Description: pointers.Ptr("Cloud SQL proxy to allow Datastream to connect to Cloud SQL over private network"), | ||
|
||
// Use an arbitrary zone (always "-a") in the same region as the Cloud SQL instance | ||
Zone: pointers.Stringf("%s-a", *config.CloudSQL.Instance.Region()), | ||
|
||
MachineType: pointers.Ptr("e2-micro"), | ||
NetworkInterface: []computeinstance.ComputeInstanceNetworkInterface{{ | ||
Network: config.VPC.Network.Name(), | ||
Subnetwork: config.VPC.Subnetwork.Name(), | ||
}}, | ||
ServiceAccount: &computeinstance.ComputeInstanceServiceAccount{ | ||
Email: &config.CloudSQLClientServiceAccount.Email, | ||
Scopes: &[]*string{pointers.Ptr("https://www.googleapis.com/auth/cloud-platform")}, | ||
}, | ||
BootDisk: &computeinstance.ComputeInstanceBootDisk{ | ||
InitializeParams: &computeinstance.ComputeInstanceBootDiskInitializeParams{ | ||
Image: pointers.Ptr("cos-cloud/cos-stable"), | ||
Size: pointers.Float64(10), // GB | ||
}, | ||
}, | ||
Tags: &[]*string{pointers.Ptr(proxyInstanceName)}, | ||
|
||
// See docstring of newMetadataGCEContainerDeclaration for details about | ||
// the label and metadata. | ||
Labels: &map[string]*string{ | ||
"container-vm": pointers.Ptr(proxyInstanceName), | ||
"msp": pointers.Ptr("true"), | ||
}, | ||
Metadata: &map[string]*string{ | ||
"gce-container-declaration": pointers.Ptr( | ||
newMetadataGCEContainerDeclaration(proxyInstanceName, *config.CloudSQL.Instance.ConnectionName())), | ||
}, | ||
}) | ||
|
||
const dsPrivateConnectionSubnet = "10.126.0.0/29" // any '/29' range | ||
datastreamConnection := datastreamprivateconnection.NewDatastreamPrivateConnection(scope, id.TerraformID("cloudsqlproxy-privateconnection"), &datastreamprivateconnection.DatastreamPrivateConnectionConfig{ | ||
DisplayName: pointers.Ptr(proxyInstanceName), | ||
PrivateConnectionId: pointers.Ptr(proxyInstanceName), | ||
Location: config.CloudSQL.Instance.Region(), | ||
VpcPeeringConfig: &datastreamprivateconnection.DatastreamPrivateConnectionVpcPeeringConfig{ | ||
Vpc: config.VPC.Network.Id(), | ||
Subnet: pointers.Ptr(dsPrivateConnectionSubnet), | ||
}, | ||
Labels: &map[string]*string{"msp": pointers.Ptr("true")}, | ||
}) | ||
|
||
// Allow ingress from Datastream | ||
datastreamIngressFirewall := computefirewall.NewComputeFirewall(scope, id.TerraformID("cloudsqlproxy-firewall-datastream-ingress"), &computefirewall.ComputeFirewallConfig{ | ||
Name: pointers.Stringf("%s-datastream-ingress", proxyInstanceName), | ||
Description: pointers.Ptr("Allow incoming connections from a Datastream private connection to the Cloud SQL Proxy VM"), | ||
Network: config.VPC.Network.Name(), | ||
Priority: pointers.Float64(1000), | ||
|
||
Direction: pointers.Ptr("INGRESS"), | ||
SourceRanges: &[]*string{ | ||
pointers.Ptr(dsPrivateConnectionSubnet), | ||
}, | ||
Allow: []computefirewall.ComputeFirewallAllow{{ | ||
Protocol: pointers.Ptr("tcp"), | ||
Ports: &[]*string{pointers.Ptr("5432")}, | ||
}}, | ||
TargetTags: cloudsqlproxyInstance.Tags(), | ||
}) | ||
|
||
// Allow IAP ingress for debug https://cloud.google.com/iap/docs/using-tcp-forwarding | ||
_ = computefirewall.NewComputeFirewall(scope, id.TerraformID("cloudsqlproxy-firewall-iap-ingress"), &computefirewall.ComputeFirewallConfig{ | ||
Name: pointers.Stringf("%s-iap-ingress", proxyInstanceName), | ||
Description: pointers.Ptr("Allow incoming connections from GCP IAP to the Cloud SQL Proxy VM"), | ||
Network: config.VPC.Network.Name(), | ||
Priority: pointers.Float64(1000), | ||
|
||
Direction: pointers.Ptr("INGRESS"), | ||
SourceRanges: &[]*string{ | ||
pointers.Ptr("35.235.240.0/20"), | ||
}, | ||
Allow: []computefirewall.ComputeFirewallAllow{{ | ||
Protocol: pointers.Ptr("tcp"), | ||
Ports: &[]*string{pointers.Ptr("22")}, | ||
}}, | ||
TargetTags: cloudsqlproxyInstance.Tags(), | ||
}) | ||
|
||
for _, pub := range config.Publications { | ||
id := id.Group(pub.Name) | ||
|
||
// The Datastream Connection Profile is what the data team will click-ops | ||
// during their creation of the actual Datastream "Stream". | ||
// https://cloud.google.com/datastream/docs/create-a-stream | ||
// | ||
// This is where we stop managing things directly in MSP. | ||
_ = datastreamconnectionprofile.NewDatastreamConnectionProfile(scope, id.TerraformID("cloudsqlproxy-connectionprofile"), &datastreamconnectionprofile.DatastreamConnectionProfileConfig{ | ||
DisplayName: pointers.Stringf("MSP Publication - %s", pub.Name), | ||
ConnectionProfileId: pointers.Stringf("msp-publication-%s", pub.Name), | ||
Labels: &map[string]*string{ | ||
"msp": pointers.Ptr("true"), | ||
"database": pointers.Ptr(pub.Database), | ||
"pg_replication_slot": pub.ReplicationSlotName, | ||
"pg_publication": pub.PublicationName, | ||
}, | ||
Location: config.CloudSQL.Instance.Region(), | ||
PostgresqlProfile: &datastreamconnectionprofile.DatastreamConnectionProfilePostgresqlProfile{ | ||
Hostname: cloudsqlproxyInstance.NetworkInterface(). | ||
Get(pointers.Float64(0)). | ||
NetworkIp(), // internal IP of the instance | ||
Port: pointers.Float64(5432), | ||
|
||
Database: pointers.Ptr(pub.Database), | ||
Username: pub.User.Name(), | ||
Password: pub.User.Password(), | ||
}, | ||
PrivateConnectivity: &datastreamconnectionprofile.DatastreamConnectionProfilePrivateConnectivity{ | ||
PrivateConnection: datastreamConnection.Name(), | ||
}, | ||
DependsOn: pointers.Ptr(append(config.PublicationUserGrants, | ||
datastreamIngressFirewall)), | ||
}) | ||
} | ||
|
||
return &Output{}, nil | ||
} | ||
|
||
// newMetadataGCEContainerDeclaration recreates the metadata value that GCP | ||
// provides when you click-ops a Compute Engine VM that runs a container. GCP | ||
// manages the container lifecycle which is quite nice. Sadly this isn't | ||
// available via an official Terraform API, but we can replicate what GCP does | ||
// and hope they don't change anything. | ||
func newMetadataGCEContainerDeclaration(containerName, cloudSQLConnectionString string) string { | ||
// As the docstring notes, this format is not a public API. The trailing | ||
// comment in the template is generated by GCP, and we include it as well. | ||
return fmt.Sprintf(` | ||
spec: | ||
restartPolicy: Always | ||
containers: | ||
- name: %s | ||
image: gcr.io/cloud-sql-connectors/cloud-sql-proxy | ||
args: | ||
- '--auto-iam-authn' | ||
- '--private-ip' | ||
- '--address=0.0.0.0' | ||
- '%s' | ||
# This container declaration format is not public API and may change without notice. Please | ||
# use gcloud command-line tool or Google Cloud Console to run Containers on Google Compute Engine.`, | ||
containerName, cloudSQLConnectionString) | ||
} |
20 changes: 20 additions & 0 deletions
dev/managedservicesplatform/internal/resource/postgresqllogicalreplication/BUILD.bazel
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
load("@io_bazel_rules_go//go:def.bzl", "go_library") | ||
|
||
go_library( | ||
name = "postgresqllogicalreplication", | ||
srcs = ["postgresqllogicalreplication.go"], | ||
importpath = "github.com/sourcegraph/sourcegraph/dev/managedservicesplatform/internal/resource/postgresqllogicalreplication", | ||
visibility = ["//dev/managedservicesplatform:__subpackages__"], | ||
deps = [ | ||
"//dev/managedservicesplatform/internal/resource/cloudsql", | ||
"//dev/managedservicesplatform/internal/resourceid", | ||
"//dev/managedservicesplatform/spec", | ||
"//lib/pointers", | ||
"@com_github_aws_constructs_go_constructs_v10//:constructs", | ||
"@com_github_hashicorp_terraform_cdk_go_cdktf//:cdktf", | ||
"@com_github_sourcegraph_managed_services_platform_cdktf_gen_postgresql//publication", | ||
"@com_github_sourcegraph_managed_services_platform_cdktf_gen_postgresql//replicationslot", | ||
"@com_github_sourcegraph_managed_services_platform_cdktf_gen_postgresql//role", | ||
"@com_github_sourcegraph_managed_services_platform_cdktf_gen_random//password", | ||
], | ||
) |