diff --git a/blueprints/cloud-operations/network-dashboard/README.md b/blueprints/cloud-operations/network-dashboard/README.md index 768e0f12d1..1fe0960f7a 100644 --- a/blueprints/cloud-operations/network-dashboard/README.md +++ b/blueprints/cloud-operations/network-dashboard/README.md @@ -1,103 +1,89 @@ -# Networking Dashboard +# Network Dashboard and Discovery Tool -This repository provides an end-to-end solution to gather some GCP Networking quotas and limits (that cannot be seen in the GCP console today) and display them in a dashboard. -The goal is to allow for better visibility of these limits, facilitating capacity planning and avoiding hitting these limits. +This repository provides an end-to-end solution to gather some GCP networking quotas, limits, and their corresponding usage, and store them in Cloud Operations timeseries which can be displayed in one or more dashboards or wired to alerts. -Here is an example of dashboard you can get with this solution: +The goal is to allow for better visibility of these limits, some of which cannot be seen in the GCP console today, facilitating capacity planning and allowing you to be notified when actual usage approaches them. + +The tool tracks several distinct usage types across a variety of resources: projects, policies, networks, subnetworks, peering groups, etc. For each usage type three distinct metrics are created tracking usage count, limit and utilization ratio. + +The screenshot below is an example of a simple dashboard provided with this blueprint, showing utilization for a specific metric (number of instances per VPC) for multiple VPCs and projects: -Here you see utilization (usage compared to the limit) for a specific metric (number of instances per VPC) for multiple VPCs and projects. - -Three metric descriptors are created for each monitored resource: usage, limit and utilization. You can follow each of these and create alerting policies if a threshold is reached. 
- -## Usage - -Clone this repository, then go through the following steps to create resources: -- Create a terraform.tfvars file with the following content: - ```tfvars - organization_id = "" - billing_account = "" - monitoring_project_id = "" - # Monitoring project where the dashboard will be created and the solution deployed, a project named "mon-network-dahshboard" will be created if left blank - monitored_projects_list = ["project-1", "project2"] - # Projects to be monitored by the solution - monitored_folders_list = ["folder_id"] - # Folders to be monitored by the solution - prefix = "" - # Monitoring project name prefix, monitoring project name is -network-dashboard, ignored if monitoring_project_id variable is provided - cf_version = V1|V2 - # Set to V2 to use V2 Cloud Functions environment - ``` -- `terraform init` -- `terraform apply` - -Note: Org level viewing permission is required for some metrics such as firewall policies. - -Once the resources are deployed, go to the following page to see the dashboard: https://console.cloud.google.com/monitoring/dashboards?project= a dashboard called "quotas-utilization" should be created. - -The Cloud Function runs every 10 minutes by default so you should start getting some data points after a few minutes. -You can use the metric explorer to view the data points for the different custom metrics created: https://console.cloud.google.com/monitoring/metrics-explorer?project=. -You can change this frequency by modifying the "schedule_cron" variable in variables.tf. - -Note that some charts in the dashboard align values over 1h so you might need to wait 1h to see charts on the dashboard views. - -Once done testing, you can clean up resources by running `terraform destroy`. 
- -## Supported limits and quotas -The Cloud Function currently tracks usage, limit and utilization of: -- active VPC peerings per VPC -- VPC peerings per VPC -- instances per VPC -- instances per VPC peering group -- Subnet IP ranges per VPC peering group -- internal forwarding rules for internal L4 load balancers per VPC -- internal forwarding rules for internal L7 load balancers per VPC -- internal forwarding rules for internal L4 load balancers per VPC peering group -- internal forwarding rules for internal L7 load balancers per VPC peering group -- Dynamic routes per VPC -- Dynamic routes per VPC peering group -- Static routes per project (VPC drill down is available for usage) -- Static routes per VPC peering group -- IP utilization per subnet (% of IP addresses used in a subnet) -- VPC firewall rules per project (VPC drill down is available for usage) -- Tuples per Firewall Policy - -It writes this values to custom metrics in Cloud Monitoring and creates a dashboard to visualize the current utilization of these metrics in Cloud Monitoring. - -Note that metrics are created in the cloud-function/metrics.yaml file. You can also edit default limits for a specific network in that file. See the example for `vpc_peering_per_network`. +One other example is the IP utilization information per subnet, allowing you to monitor the percentage of used IP addresses in your GCP subnets. + +More complex scenarios are possible by leveraging and combining the 50 different timeseries created by this tool, and connecting them to Cloud Operations dashboards and alerts. + +Refer to the [Cloud Function deployment instructions](./deploy-cloud-function/) for a high-level overview and an end-to-end deployment example, and to the [discovery tool documentation](./src/) to try it as a standalone program or to package it in alternative ways. 
+ +## Metrics created + +- `firewall_policy/tuples_available` +- `firewall_policy/tuples_used` +- `firewall_policy/tuples_used_ratio` +- `network/firewall_rules_used` +- `network/forwarding_rules_l4_available` +- `network/forwarding_rules_l4_used` +- `network/forwarding_rules_l4_used_ratio` +- `network/forwarding_rules_l7_available` +- `network/forwarding_rules_l7_used` +- `network/forwarding_rules_l7_used_ratio` +- `network/instances_available` +- `network/instances_used` +- `network/instances_used_ratio` +- `network/peerings_active_available` +- `network/peerings_active_used` +- `network/peerings_active_used_ratio` +- `network/peerings_total_available` +- `network/peerings_total_used` +- `network/peerings_total_used_ratio` +- `network/routes_dynamic_available` +- `network/routes_dynamic_used` +- `network/routes_dynamic_used_ratio` +- `network/routes_static_used` +- `network/subnets_available` +- `network/subnets_used` +- `network/subnets_used_ratio` +- `peering_group/forwarding_rules_l4_available` +- `peering_group/forwarding_rules_l4_used` +- `peering_group/forwarding_rules_l4_used_ratio` +- `peering_group/forwarding_rules_l7_available` +- `peering_group/forwarding_rules_l7_used` +- `peering_group/forwarding_rules_l7_used_ratio` +- `peering_group/instances_available` +- `peering_group/instances_used` +- `peering_group/instances_used_ratio` +- `peering_group/routes_dynamic_available` +- `peering_group/routes_dynamic_used` +- `peering_group/routes_dynamic_used_ratio` +- `peering_group/routes_static_available` +- `peering_group/routes_static_used` +- `peering_group/routes_static_used_ratio` +- `project/firewall_rules_available` +- `project/firewall_rules_used` +- `project/firewall_rules_used_ratio` +- `project/routes_static_available` +- `project/routes_static_used` +- `project/routes_static_used_ratio` +- `subnetwork/addresses_available` +- `subnetwork/addresses_used` +- `subnetwork/addresses_used_ratio` ## Assumptions and limitations -- The CF assumes that all 
VPCs in peering groups are within the same organization, except for PSA peerings -- The CF will only fetch subnet utilization data from the PSA peerings (not the VMs, ILB or routes usage) -- The CF assumes global routing is ON, this impacts dynamic routes usage calculation -- The CF assumes custom routes importing/exporting is ON, this impacts static and dynamic routes usage calculation -- The CF assumes all networks in peering groups have the same global routing and custom routes sharing configuration - -## Next steps and ideas -In a future release, we could support: -- Google managed VPCs that are peered with PSA (such as Cloud SQL or Memorystore) -- Dynamic routes calculation for VPCs/PPGs with "global routing" set to OFF -- Static routes calculation for projects/PPGs with "custom routes importing/exporting" set to OFF -- Calculations for cross Organization peering groups -- Support different scopes (reduced and fine-grained) - -If you are interested in this and/or would like to contribute, please contact legranda@google.com. - - -## Variables - -| name | description | type | required | default | -|---|---|:---:|:---:|:---:| -| [billing_account](variables.tf#L17) | The ID of the billing account to associate this project with. | | ✓ | | -| [monitored_projects_list](variables.tf#L36) | ID of the projects to be monitored (where limits and quotas data will be pulled). | list(string) | ✓ | | -| [organization_id](variables.tf#L46) | The organization id for the associated services. | | ✓ | | -| [prefix](variables.tf#L50) | Prefix used for resource names. | string | ✓ | | -| [cf_version](variables.tf#L21) | Cloud Function version 2nd Gen or 1st Gen. Possible options: 'V1' or 'V2'.Use CFv2 if your Cloud Function timeouts after 9 minutes. By default it is using CFv1. | | | V1 | -| [monitored_folders_list](variables.tf#L30) | ID of the projects to be monitored (where limits and quotas data will be pulled). 
| list(string) | | [] | -| [monitoring_project_id](variables.tf#L41) | Monitoring project where the dashboard will be created and the solution deployed; a project will be created if set to empty string. | | | | -| [project_monitoring_services](variables.tf#L59) | Service APIs enabled in the monitoring project if it will be created. | | | […] | -| [region](variables.tf#L81) | Region used to deploy the cloud functions and scheduler. | | | europe-west1 | -| [schedule_cron](variables.tf#L86) | Cron format schedule to run the Cloud Function. Default is every 10 minutes. | | | */10 * * * * | - - + +- The tool assumes all VPCs in peering groups are within the same organization, except for PSA peerings. +- The tool will only fetch subnet utilization data from the PSA peerings (not the VMs, ILB or routes usage). +- The tool assumes global routing is ON; this impacts dynamic routes usage calculation. +- The tool assumes custom routes importing/exporting is ON; this impacts static and dynamic routes usage calculation. +- The tool assumes all networks in peering groups have the same global routing and custom routes sharing configuration. + +## TODO + +These are some of our ideas for additional features: + +- support PSA-peered Google VPCs (Cloud SQL, Memorystore, etc.) +- dynamic routes calculation for VPCs/peering groups with "global routing" turned off +- static routes calculation for projects/peering groups with custom routes import/export turned off +- cross-organization peering groups + +If you are interested in this and/or would like to contribute, please open an issue in this repository or send us a PR. 
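The new README states that each usage type yields three metrics (usage count, limit and utilization ratio), and the metric list uses the `_used`, `_available` and `_used_ratio` suffixes. The sketch below illustrates that naming scheme only. `metric_triplet` is a hypothetical helper, not part of the tool, and it assumes that `_available` carries the limit and that the ratio is usage divided by limit, as in the deleted `firewall_policies.py`.

```python
from typing import Dict, Optional


def metric_triplet(usage_type: str, used: int,
                   limit: Optional[int] = None) -> Dict[str, float]:
  """Hypothetical helper: maps one usage type to its metric values."""
  values = {f"{usage_type}_used": used}
  # Some usage types in the list (e.g. network/firewall_rules) only expose
  # a "_used" metric, so the other two are skipped without a positive limit.
  if limit is not None and limit > 0:
    # Assumption: "_available" carries the limit, "_used_ratio" is used/limit.
    values[f"{usage_type}_available"] = limit
    values[f"{usage_type}_used_ratio"] = used / limit
  return values


if __name__ == "__main__":
  print(metric_triplet("network/peerings_total", 10, 40))
  print(metric_triplet("network/firewall_rules", 12))
```

An alert on a `_used_ratio` timeseries can then use a single threshold, whatever the absolute limit of each resource is.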
diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/main.py b/blueprints/cloud-operations/network-dashboard/cloud-function/main.py deleted file mode 100644 index 8e7640dd44..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/main.py +++ /dev/null @@ -1,242 +0,0 @@ -# -# Copyright 2022 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# CFv2 define whether to use Cloud function 2nd generation or 1st generation - -import re -from distutils.command.config import config -import os -import time -from google.cloud import monitoring_v3, asset_v1 -from google.protobuf import field_mask_pb2 -from googleapiclient import discovery -from metrics import ilb_fwrules, firewall_policies, instances, networks, metrics, limits, peerings, routes, subnets, vpc_firewalls, secondarys - -CF_VERSION = os.environ.get("CF_VERSION") - - -def get_monitored_projects_list(config): - ''' - Gets the projects to be monitored from the MONITORED_FOLDERS_LIST environment variable. 
- - Parameters: - config (dict): The dict containing config like clients and limits - Returns: - monitored_projects (List of strings): Full list of projects to be monitored - ''' - monitored_projects = config["monitored_projects"] - monitored_folders = os.environ.get("MONITORED_FOLDERS_LIST").split(",") - - # Handling empty monitored folders list - if monitored_folders == ['']: - monitored_folders = [] - - # Gets all projects under each monitored folder (and even in sub folders) - for folder in monitored_folders: - read_mask = field_mask_pb2.FieldMask() - read_mask.FromJsonString('name,versionedResources') - - response = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"folders/{folder}", - "asset_types": ["cloudresourcemanager.googleapis.com/Project"], - "read_mask": read_mask - }) - - for resource in response: - for versioned in resource.versioned_resources: - for field_name, field_value in versioned.resource.items(): - if field_name == "projectId": - project_id = field_value - # Avoid duplicate - if project_id not in monitored_projects: - monitored_projects.append(project_id) - - print("List of projects to be monitored:") - print(monitored_projects) - - return monitored_projects - - -def monitoring_interval(): - ''' - Creates the monitoring interval of 24 hours - Returns: - monitoring_v3.TimeInterval: Monitoring time interval of 24h - ''' - now = time.time() - seconds = int(now) - nanos = int((now - seconds) * 10**9) - return monitoring_v3.TimeInterval({ - "end_time": { - "seconds": seconds, - "nanos": nanos - }, - "start_time": { - "seconds": (seconds - 24 * 60 * 60), - "nanos": nanos - }, - }) - - -config = { - # Organization ID containing the projects to be monitored - "organization": - os.environ.get("ORGANIZATION_ID"), - # list of projects from which function will get quotas information - "monitored_projects": - os.environ.get("MONITORED_PROJECTS_LIST").split(","), - "monitoring_project": - 
os.environ.get('MONITORING_PROJECT_ID'), - "monitoring_project_link": - f"projects/{os.environ.get('MONITORING_PROJECT_ID')}", - "monitoring_interval": - monitoring_interval(), - "limit_names": { - "GCE_INSTANCES": - "compute.googleapis.com/quota/instances_per_vpc_network/limit", - "L4": - "compute.googleapis.com/quota/internal_lb_forwarding_rules_per_vpc_network/limit", - "L7": - "compute.googleapis.com/quota/internal_managed_forwarding_rules_per_vpc_network/limit", - "SUBNET_RANGES": - "compute.googleapis.com/quota/subnet_ranges_per_vpc_network/limit" - }, - "lb_scheme": { - "L7": "INTERNAL_MANAGED", - "L4": "INTERNAL" - }, - "clients": { - "discovery_client": discovery.build('compute', 'v1'), - "asset_client": asset_v1.AssetServiceClient(), - "monitoring_client": monitoring_v3.MetricServiceClient() - }, - # Improve performance for Asset Inventory queries on large environments - "page_size": - 500, - "series_buffer": [], -} - - -def main(event, context=None): - ''' - Cloud Function Entry point, called by the scheduler. 
- Parameters: - event: Not used for now (Pubsub trigger) - context: Not used for now (Pubsub trigger) - Returns: - 'Function executed successfully' - ''' - # Handling empty monitored projects list - if config["monitored_projects"] == ['']: - config["monitored_projects"] = [] - - # Gets projects and folders to be monitored - config["monitored_projects"] = get_monitored_projects_list(config) - - # Keep the monitoring interval up2date during each run - config["monitoring_interval"] = monitoring_interval() - - metrics_dict, limits_dict = metrics.create_metrics( - config["monitoring_project_link"], config) - project_quotas_dict = limits.get_quota_project_limit(config) - - firewalls_dict = vpc_firewalls.get_firewalls_dict(config) - firewall_policies_dict = firewall_policies.get_firewall_policies_dict(config) - - # IP utilization subnet level metrics - subnets.get_subnets(config, metrics_dict) - - # IP utilization secondary range metrics - secondarys.get_secondaries(config, metrics_dict) - - # Asset inventory queries - gce_instance_dict = instances.get_gce_instance_dict(config) - l4_forwarding_rules_dict = ilb_fwrules.get_forwarding_rules_dict(config, "L4") - l7_forwarding_rules_dict = ilb_fwrules.get_forwarding_rules_dict(config, "L7") - subnet_range_dict = networks.get_subnet_ranges_dict(config) - static_routes_dict = routes.get_static_routes_dict(config) - dynamic_routes_dict = routes.get_dynamic_routes( - config, metrics_dict, limits_dict['dynamic_routes_per_network_limit']) - - try: - - # Per Project metrics - vpc_firewalls.get_firewalls_data(config, metrics_dict, project_quotas_dict, - firewalls_dict) - # Per Firewall Policy metrics - firewall_policies.get_firewal_policies_data(config, metrics_dict, - firewall_policies_dict) - # Per Network metrics - instances.get_gce_instances_data(config, metrics_dict, gce_instance_dict, - limits_dict['number_of_instances_limit']) - ilb_fwrules.get_forwarding_rules_data( - config, metrics_dict, l4_forwarding_rules_dict, - 
limits_dict['internal_forwarding_rules_l4_limit'], "L4") - ilb_fwrules.get_forwarding_rules_data( - config, metrics_dict, l7_forwarding_rules_dict, - limits_dict['internal_forwarding_rules_l7_limit'], "L7") - - routes.get_static_routes_data(config, metrics_dict, static_routes_dict, - project_quotas_dict) - - peerings.get_vpc_peering_data(config, metrics_dict, - limits_dict['number_of_vpc_peerings_limit']) - - # Per VPC peering group metrics - metrics.get_pgg_data( - config, - metrics_dict["metrics_per_peering_group"]["instance_per_peering_group"], - gce_instance_dict, config["limit_names"]["GCE_INSTANCES"], - limits_dict['number_of_instances_ppg_limit']) - metrics.get_pgg_data( - config, metrics_dict["metrics_per_peering_group"] - ["l4_forwarding_rules_per_peering_group"], l4_forwarding_rules_dict, - config["limit_names"]["L4"], - limits_dict['internal_forwarding_rules_l4_ppg_limit']) - metrics.get_pgg_data( - config, metrics_dict["metrics_per_peering_group"] - ["l7_forwarding_rules_per_peering_group"], l7_forwarding_rules_dict, - config["limit_names"]["L7"], - limits_dict['internal_forwarding_rules_l7_ppg_limit']) - metrics.get_pgg_data( - config, metrics_dict["metrics_per_peering_group"] - ["subnet_ranges_per_peering_group"], subnet_range_dict, - config["limit_names"]["SUBNET_RANGES"], - limits_dict['number_of_subnet_IP_ranges_ppg_limit']) - #static - routes.get_routes_ppg( - config, metrics_dict["metrics_per_peering_group"] - ["static_routes_per_peering_group"], static_routes_dict, - limits_dict['static_routes_per_peering_group_limit']) - #dynamic - routes.get_routes_ppg( - config, metrics_dict["metrics_per_peering_group"] - ["dynamic_routes_per_peering_group"], dynamic_routes_dict, - limits_dict['dynamic_routes_per_peering_group_limit']) - except Exception as e: - print("Error writing metrics") - print(e) - finally: - metrics.flush_series_buffer(config) - - return 'Function execution completed' - - -if CF_VERSION == "V2": - import functions_framework - 
main_http = functions_framework.http(main) - -if __name__ == "__main__": - main(None, None) \ No newline at end of file diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics.yaml b/blueprints/cloud-operations/network-dashboard/cloud-function/metrics.yaml deleted file mode 100644 index 217599634a..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics.yaml +++ /dev/null @@ -1,223 +0,0 @@ -# -# Copyright 2022 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ---- -metrics_per_subnet: - ip_usage_per_subnet: - usage: - name: number_of_ip_used - description: Number of used IP addresses in the subnet. - utilization: - name: ip_addresses_per_subnet_utilization - description: Percentage of IP used in the subnet. - limit: - name: number_of_max_ip - description: Number of available IP addresses in the subnet. - ip_usage_per_secondaryRange: - usage: - name: number_of_sr_ip_used - description: Number of used IP addresses in the secondary range. - utilization: - name: ip_addresses_per_sr_utilization - description: Percentage of IP used in the secondary range. - limit: - name: number_of_max_sr_ip - description: Number of available IP addresses in the secondary range. -metrics_per_network: - instance_per_network: - usage: - name: number_of_instances_usage - description: Number of instances per VPC network - usage. 
- limit: - name: number_of_instances_limit - description: Number of instances per VPC network - limit. - values: - default_value: 15000 - utilization: - name: number_of_instances_utilization - description: Number of instances per VPC network - utilization. - vpc_peering_active_per_network: - usage: - name: number_of_active_vpc_peerings_usage - description: Number of active VPC Peerings per VPC - usage. - limit: - name: number_of_active_vpc_peerings_limit - description: Number of active VPC Peerings per VPC - limit. - values: - default_value: 25 - utilization: - name: number_of_active_vpc_peerings_utilization - description: Number of active VPC Peerings per VPC - utilization. - vpc_peering_per_network: - usage: - name: number_of_vpc_peerings_usage - description: Number of VPC Peerings per VPC - usage. - limit: - name: number_of_vpc_peerings_limit - description: Number of VPC Peerings per VPC - limit. - values: - default_value: 25 - https://www.googleapis.com/compute/v1/projects/net-dash-test-host-prod/global/networks/vpc-prod: 40 - utilization: - name: number_of_vpc_peerings_utilization - description: Number of VPC Peerings per VPC - utilization. - l4_forwarding_rules_per_network: - usage: - name: internal_forwarding_rules_l4_usage - description: Number of Internal Forwarding Rules for Internal L4 Load Balancers - usage. - limit: - name: internal_forwarding_rules_l4_limit - description: Number of Internal Forwarding Rules for Internal L4 Load Balancers - limit. - values: - default_value: 500 - utilization: - name: internal_forwarding_rules_l4_utilization - description: Number of Internal Forwarding Rules for Internal L4 Load Balancers - utilization. - l7_forwarding_rules_per_network: - usage: - name: internal_forwarding_rules_l7_usage - description: Number of Internal Forwarding Rules for Internal L7 Load Balancers per network - usage. 
- limit: - name: internal_forwarding_rules_l7_limit - description: Number of Internal Forwarding Rules for Internal L7 Load Balancers per network - effective limit. - values: - default_value: 75 - utilization: - name: internal_forwarding_rules_l7_utilization - description: Number of Internal Forwarding Rules for Internal L7 Load Balancers per Vnetwork - utilization. - dynamic_routes_per_network: - usage: - name: dynamic_routes_per_network_usage - description: Number of Dynamic routes per network - usage. - limit: - name: dynamic_routes_per_network_limit - description: Number of Dynamic routes per network - limit. - values: - default_value: 100 - utilization: - name: dynamic_routes_per_network_utilization - description: Number of Dynamic routes per network - utilization. - #static routes limit is per project, but usage is per network - static_routes_per_project: - usage: - name: static_routes_per_project_vpc_usage - description: Number of Static routes per project and network - usage. - limit: - name: static_routes_per_project_limit - description: Number of Static routes per project - limit. - values: - default_value: 250 - utilization: - name: static_routes_per_project_utilization - description: Number of Static routes per project - utilization. -metrics_per_peering_group: - l4_forwarding_rules_per_peering_group: - usage: - name: internal_forwarding_rules_l4_ppg_usage - description: Number of Internal Forwarding Rules for Internal L4 Load Balancers per VPC peering group - usage. - limit: - name: internal_forwarding_rules_l4_ppg_limit - description: Number of Internal Forwarding Rules for Internal L4 Load Balancers per VPC peering group - effective limit. - values: - default_value: 500 - utilization: - name: internal_forwarding_rules_l4_ppg_utilization - description: Number of Internal Forwarding Rules for Internal L4 Load Balancers per VPC peering group - utilization. 
- l7_forwarding_rules_per_peering_group: - usage: - name: internal_forwarding_rules_l7_ppg_usage - description: Number of Internal Forwarding Rules for Internal L7 Load Balancers per VPC peering group - usage. - limit: - name: internal_forwarding_rules_l7_ppg_limit - description: Number of Internal Forwarding Rules for Internal L7 Load Balancers per VPC peering group - effective limit. - values: - default_value: 175 - utilization: - name: internal_forwarding_rules_l7_ppg_utilization - description: Number of Internal Forwarding Rules for Internal L7 Load Balancers per VPC peering group - utilization. - subnet_ranges_per_peering_group: - usage: - name: number_of_subnet_IP_ranges_ppg_usage - description: Number of Subnet Ranges per peering group - usage. - limit: - name: number_of_subnet_IP_ranges_ppg_limit - description: Number of Subnet Ranges per peering group - effective limit. - values: - default_value: 400 - utilization: - name: number_of_subnet_IP_ranges_ppg_utilization - description: Number of Subnet Ranges per peering group - utilization. - instance_per_peering_group: - usage: - name: number_of_instances_ppg_usage - description: Number of instances per peering group - usage. - limit: - name: number_of_instances_ppg_limit - description: Number of instances per peering group - limit. - values: - default_value: 15500 - utilization: - name: number_of_instances_ppg_utilization - description: Number of instances per peering group - utilization. - dynamic_routes_per_peering_group: - usage: - name: dynamic_routes_per_peering_group_usage - description: Number of Dynamic routes per peering group - usage. - limit: - name: dynamic_routes_per_peering_group_limit - description: Number of Dynamic routes per peering group - limit. - values: - default_value: 300 - utilization: - name: dynamic_routes_per_peering_group_utilization - description: Number of Dynamic routes per peering group - utilization. 
- static_routes_per_peering_group: - usage: - name: static_routes_per_peering_group_usage - description: Number of Static routes per peering group - usage. - limit: - name: static_routes_per_peering_group_limit - description: Number of Static routes per peering group - limit. - values: - default_value: 300 - utilization: - name: static_routes_per_peering_group_utilization - description: Number of Static routes per peering group - utilization. -metrics_per_project: - firewalls: - usage: - name: firewalls_per_project_vpc_usage - description: Number of VPC firewall rules in a project - usage. - limit: - # Firewalls limit is per project and we get the limit for the GCP quota API in vpc_firewalls.py - name: firewalls_per_project_limit - description: Number of VPC firewall rules in a project - limit. - utilization: - name: firewalls_per_project_utilization - description: Number of VPC firewall rules in a project - utilization. -metrics_per_firewall_policy: - firewall_policy_tuples: - usage: - name: firewall_policy_tuples_per_policy_usage - description: Number of tuples in a firewall policy - usage. - limit: - # This limit is not visibile through Google APIs, set default_value - name: firewall_policy_tuples_per_policy_limit - description: Number of tuples in a firewall policy - limit. - values: - default_value: 2000 - utilization: - name: firewall_policy_tuples_per_policy_utilization - description: Number of tuples in a firewall policy - utilization. 
diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/firewall_policies.py b/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/firewall_policies.py deleted file mode 100644 index 95a26db383..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/firewall_policies.py +++ /dev/null @@ -1,118 +0,0 @@ -# -# Copyright 2022 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -import re -import time - -from collections import defaultdict -from pydoc import doc -from collections import defaultdict -from google.protobuf import field_mask_pb2 -from . 
import metrics, networks, limits - - -def get_firewall_policies_dict(config: dict): - ''' - Calls the Asset Inventory API to get all Firewall Policies under the GCP organization, including children - Ignores monitored projects list: returns all policies regardless of their parent resource - Parameters: - config (dict): The dict containing config like clients and limits - Returns: - firewal_policies_dict (dictionary of dictionary): Keys are policy ids, subkeys are policy field values - ''' - - firewall_policies_dict = defaultdict(int) - read_mask = field_mask_pb2.FieldMask() - read_mask.FromJsonString('name,versionedResources') - - response = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ["compute.googleapis.com/FirewallPolicy"], - "read_mask": read_mask, - }) - for resource in response: - for versioned in resource.versioned_resources: - firewall_policy = dict() - for field_name, field_value in versioned.resource.items(): - firewall_policy[field_name] = field_value - firewall_policies_dict[firewall_policy['id']] = firewall_policy - return firewall_policies_dict - - -def get_firewal_policies_data(config, metrics_dict, firewall_policies_dict): - ''' - Gets the data for VPC Firewall Policies in an organization, including children. All folders are considered, - only projects in the monitored projects list are considered. - Parameters: - config (dict): The dict containing config like clients and limits - metrics_dict (dictionary of dictionary of string: string): metrics names and descriptions. 
- firewall_policies_dict (dictionary of of dictionary of string: string): Keys are policies ids, subkeys are policies values - Returns: - None - ''' - - current_tuples_limit = None - try: - current_tuples_limit = metrics_dict["metrics_per_firewall_policy"][ - "firewall_policy_tuples"]["limit"]["values"]["default_value"] - except Exception: - print( - f"Could not determine number of tuples metric limit due to missing default value" - ) - if current_tuples_limit < 0: - print( - f"Could not determine number of tuples metric limit as default value is <= 0" - ) - - timestamp = time.time() - for firewall_policy_key in firewall_policies_dict: - firewall_policy = firewall_policies_dict[firewall_policy_key] - - # may either be a org, a folder, or a project - # folder and org require to split {folder,organization}\/\w+ - parent = re.search("(\w+$)", firewall_policy["parent"]).group( - 1) if "parent" in firewall_policy else re.search( - "([\d,a-z,-]+)(\/[\d,a-z,-]+\/firewallPolicies/[\d,a-z,-]*$)", - firewall_policy["selfLink"]).group(1) - parent_type = re.search("(^\w+)", firewall_policy["parent"]).group( - 1) if "parent" in firewall_policy else "projects" - - if parent_type == "projects" and parent not in config["monitored_projects"]: - continue - - metric_labels = {'parent': parent, 'parent_type': parent_type} - - metric_labels["name"] = firewall_policy[ - "displayName"] if "displayName" in firewall_policy else firewall_policy[ - "name"] - - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_firewall_policy"] - [f"firewall_policy_tuples"]["usage"]["name"], - firewall_policy['ruleTupleCount'], metric_labels, timestamp=timestamp) - if not current_tuples_limit == None and current_tuples_limit > 0: - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_firewall_policy"] - [f"firewall_policy_tuples"]["limit"]["name"], current_tuples_limit, - metric_labels, timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, 
metrics_dict["metrics_per_firewall_policy"] - [f"firewall_policy_tuples"]["utilization"]["name"], - firewall_policy['ruleTupleCount'] / current_tuples_limit, - metric_labels, timestamp=timestamp) - - print(f"Buffered number tuples per Firewall Policy") diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/ilb_fwrules.py b/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/ilb_fwrules.py deleted file mode 100644 index de8274d973..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/ilb_fwrules.py +++ /dev/null @@ -1,122 +0,0 @@ -# -# Copyright 2022 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -import time - -from collections import defaultdict -from google.protobuf import field_mask_pb2 -from . import metrics, networks, limits - - -def get_forwarding_rules_dict(config, layer: str): - ''' - Calls the Asset Inventory API to get all L4 Forwarding Rules under the GCP organization. - - Parameters: - config (dict): The dict containing config like clients and limits - layer (string): the Layer to get Forwarding rules (L4/L7) - Returns: - forwarding_rules_dict (dictionary of string: int): Keys are the network links and values are the number of Forwarding Rules per network. 
- ''' - - read_mask = field_mask_pb2.FieldMask() - read_mask.FromJsonString('name,versionedResources') - - forwarding_rules_dict = defaultdict(int) - - response = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ["compute.googleapis.com/ForwardingRule"], - "read_mask": read_mask, - "page_size": config["page_size"], - }) - - for resource in response: - internal = False - network_link = "" - for versioned in resource.versioned_resources: - for field_name, field_value in versioned.resource.items(): - if field_name == "loadBalancingScheme": - internal = (field_value == config["lb_scheme"][layer]) - if field_name == "network": - network_link = field_value - if internal: - if network_link in forwarding_rules_dict: - forwarding_rules_dict[network_link] += 1 - else: - forwarding_rules_dict[network_link] = 1 - - return forwarding_rules_dict - - -def get_forwarding_rules_data(config, metrics_dict, forwarding_rules_dict, - limit_dict, layer): - ''' - Gets the data for L4 Internal Forwarding Rules per VPC Network and writes it to the metric defined in forwarding_rules_metric. - - Parameters: - config (dict): The dict containing config like clients and limits - metrics_dict (dictionary of dictionary of string: string): metrics names and descriptions. - forwarding_rules_dict (dictionary of string: int): Keys are the network links and values are the number of Forwarding Rules per network. - limit_dict (dictionary of string:int): Dictionary with the network link as key and the limit as value. 
- layer (string): the Layer to get Forwarding rules (L4/L7) - Returns: - None - ''' - - timestamp = time.time() - for project_id in config["monitored_projects"]: - network_dict = networks.get_networks(config, project_id) - - current_quota_limit = limits.get_quota_current_limit( - config, f"projects/{project_id}", config["limit_names"][layer]) - - if current_quota_limit is None: - print( - f"Could not determine {layer} forwarding rules to metric for projects/{project_id} due to missing quotas" - ) - continue - - current_quota_limit_view = metrics.customize_quota_view(current_quota_limit) - - for net in network_dict: - limits.set_limits(net, current_quota_limit_view, limit_dict) - - usage = 0 - if net['self_link'] in forwarding_rules_dict: - usage = forwarding_rules_dict[net['self_link']] - - metric_labels = { - 'project': project_id, - 'network_name': net['network_name'] - } - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"] - [f"{layer.lower()}_forwarding_rules_per_network"]["usage"]["name"], - usage, metric_labels, timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"] - [f"{layer.lower()}_forwarding_rules_per_network"]["limit"]["name"], - net['limit'], metric_labels, timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"] - [f"{layer.lower()}_forwarding_rules_per_network"]["utilization"] - ["name"], usage / net['limit'], metric_labels, timestamp=timestamp) - - print( - f"Buffered number of {layer} forwarding rules to metric for projects/{project_id}" - ) \ No newline at end of file diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/instances.py b/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/instances.py deleted file mode 100644 index d3b72e678e..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/instances.py +++ /dev/null @@ -1,103 +0,0 @@ 
-# -# Copyright 2022 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -import time - -from code import interact -from collections import defaultdict -from . import metrics, networks, limits - - -def get_gce_instance_dict(config: dict): - ''' - Calls the Asset Inventory API to get all GCE instances under the GCP organization. - - Parameters: - config (dict): The dict containing config like clients and limits - - Returns: - gce_instance_dict (dictionary of string: int): Keys are the network links and values are the number of GCE Instances per network. - ''' - - gce_instance_dict = defaultdict(int) - - response = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ["compute.googleapis.com/Instance"], - "page_size": config["page_size"], - }) - for resource in response: - for field_name, field_value in resource.additional_attributes.items(): - if field_name == "networkInterfaceNetworks": - for network in field_value: - if network in gce_instance_dict: - gce_instance_dict[network] += 1 - else: - gce_instance_dict[network] = 1 - - return gce_instance_dict - - -def get_gce_instances_data(config, metrics_dict, gce_instance_dict, limit_dict): - ''' - Gets the data for GCE instances per VPC Network and writes it to the metric defined in instance_metric. 
- - Parameters: - config (dict): The dict containing config like clients and limits - metrics_dict (dictionary of dictionary of string: string): metrics names and descriptions - gce_instance_dict (dictionary of string: int): Keys are the network links and values are the number of GCE Instances per network. - limit_dict (dictionary of string:int): Dictionary with the network link as key and the limit as value - Returns: - gce_instance_dict - ''' - timestamp = time.time() - for project_id in config["monitored_projects"]: - network_dict = networks.get_networks(config, project_id) - - current_quota_limit = limits.get_quota_current_limit( - config, f"projects/{project_id}", - config["limit_names"]["GCE_INSTANCES"]) - if current_quota_limit is None: - print( - f"Could not determine number of instances for projects/{project_id} due to missing quotas" - ) - - current_quota_limit_view = metrics.customize_quota_view(current_quota_limit) - - for net in network_dict: - limits.set_limits(net, current_quota_limit_view, limit_dict) - - usage = 0 - if net['self_link'] in gce_instance_dict: - usage = gce_instance_dict[net['self_link']] - - metric_labels = { - 'project': project_id, - 'network_name': net['network_name'] - } - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"]["instance_per_network"] - ["usage"]["name"], usage, metric_labels, timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"]["instance_per_network"] - ["limit"]["name"], net['limit'], metric_labels, timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"]["instance_per_network"] - ["utilization"]["name"], usage / net['limit'], metric_labels, - timestamp=timestamp) - - print(f"Buffered number of instances to metric for projects/{project_id}") diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/limits.py 
b/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/limits.py deleted file mode 100644 index edd4a50b3d..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/limits.py +++ /dev/null @@ -1,236 +0,0 @@ -# -# Copyright 2022 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -import time - -from google.api_core import exceptions -from google.cloud import monitoring_v3 -from . import metrics - - -def get_quotas_dict(quotas_list): - ''' - Creates a dictionary of quotas from a list, with lower case quota name as keys - Parameters: - quotas_array (array): array of quotas - Returns: - quotas_dict (dict): dictionary of quotas - ''' - quota_keys = [q['metric'] for q in quotas_list] - quotas_dict = dict() - i = 0 - for key in quota_keys: - if ("metric" in quotas_list[i]): - del (quotas_list[i]["metric"]) - quotas_dict[key.lower()] = quotas_list[i] - i += 1 - return quotas_dict - - -def get_quota_project_limit(config, regions=["global"]): - ''' - Retrieves quotas for all monitored project in selected regions, default 'global' - Parameters: - project_link (string): Project link. 
- Returns: - quotas (dict): quotas for all selected regions, default 'global' - ''' - try: - request = {} - quotas = dict() - for project in config["monitored_projects"]: - quotas[project] = dict() - if regions != ["global"]: - for region in regions: - request = config["clients"]["discovery_client"].compute.regions().get( - region=region, project=project) - response = request.execute() - quotas[project][region] = get_quotas_dict(response['quotas']) - else: - region = "global" - request = config["clients"]["discovery_client"].projects().get( - project=project, fields="quotas") - response = request.execute() - quotas[project][region] = get_quotas_dict(response['quotas']) - - return quotas - except exceptions.PermissionDenied as err: - print( - f"Warning: error reading quotas for {project}. " + - f"This can happen if you don't have permissions on the project, for example if the project is in another organization or a Google managed project" - ) - return None - - -def get_ppg(network_link, limit_dict): - ''' - Checks if this network has a specific limit for a metric, if so, returns that limit, if not, returns the default limit. - - Parameters: - network_link (string): VPC network link. - limit_list (list of string): Used to get the limit per VPC or the default limit. - Returns: - limit_dict (dictionary of string:int): Dictionary with the network link as key and the limit as value - ''' - if network_link in limit_dict: - return limit_dict[network_link] - else: - if 'default_value' in limit_dict: - return limit_dict['default_value'] - else: - print(f"Error: limit not found for {network_link}") - return 0 - - -def set_limits(network_dict, quota_limit, limit_dict): - ''' - Updates the network dictionary with quota limit values. - - Parameters: - network_dict (dictionary of string: string): Contains network information. - quota_limit (list of dictionaries of string: string): Current quota limit. 
- limit_dict (dictionary of string:int): Dictionary with the network link as key and the limit as value - Returns: - None - ''' - - network_dict['limit'] = None - - if quota_limit: - for net in quota_limit: - if net['network_id'] == network_dict['network_id']: - network_dict['limit'] = net['value'] - return - - network_link = f"https://www.googleapis.com/compute/v1/projects/{network_dict['project_id']}/global/networks/{network_dict['network_name']}" - - if network_link in limit_dict: - network_dict['limit'] = limit_dict[network_link] - else: - if 'default_value' in limit_dict: - network_dict['limit'] = limit_dict['default_value'] - else: - print(f"Error: Couldn't find limit for {network_link}") - network_dict['limit'] = 0 - - -def get_quota_current_limit(config, project_link, metric_name): - ''' - Retrieves limit for a specific metric. - - Parameters: - project_link (string): Project link. - metric_name (string): Name of the metric. - Returns: - results_list (list of string): Current limit. - ''' - - try: - results = config["clients"]["monitoring_client"].list_time_series( - request={ - "name": project_link, - "filter": f'metric.type = "{metric_name}"', - "interval": config["monitoring_interval"], - "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL - }) - results_list = list(results) - return results_list - except exceptions.PermissionDenied as err: - print( - f"Warning: error reading quotas for {project_link}. " + - f"This can happen if you don't have permissions on the project, for example if the project is in another organization or a Google managed project" - ) - return None - - -def count_effective_limit(config, project_id, network_dict, usage_metric_name, - limit_metric_name, utilization_metric_name, - limit_dict, timestamp=None): - ''' - Calculates the effective limits (using algorithm in the link below) for peering groups and writes data (usage, limit, utilization) to the custom metrics. 
- Source: https://cloud.google.com/vpc/docs/quota#vpc-peering-effective-limit - - Parameters: - config (dict): The dict containing config like clients and limits - project_id (string): Project ID for the project to be analyzed. - network_dict (dictionary of string: string): Contains all required information about the network to get the usage, limit and utilization. - usage_metric_name (string): Name of the custom metric to be populated for usage per VPC peering group. - limit_metric_name (string): Name of the custom metric to be populated for limit per VPC peering group. - utilization_metric_name (string): Name of the custom metric to be populated for utilization per VPC peering group. - limit_dict (dictionary of string:int): Dictionary containing the limit per peering group (either VPC specific or default limit). - timestamp (time): timestamp to be recorded for all points - Returns: - None - ''' - - if timestamp == None: - timestamp = time.time() - - if network_dict['peerings'] == []: - return - - # Get usage: Sums usage for current network + all peered networks - peering_group_usage = network_dict['usage'] - for peered_network in network_dict['peerings']: - if 'usage' not in peered_network: - print( - f"Cannot add metrics for peered network in projects/{project_id} as no usage metrics exist due to missing permissions" - ) - continue - peering_group_usage += peered_network['usage'] - - network_link = f"https://www.googleapis.com/compute/v1/projects/{project_id}/global/networks/{network_dict['network_name']}" - - # Calculates effective limit: Step 1: max(per network limit, per network_peering_group limit) - limit_step1 = max(network_dict['limit'], get_ppg(network_link, limit_dict)) - - # Calculates effective limit: Step 2: List of max(per network limit, per network_peering_group limit) for each peered network - limit_step2 = [] - for peered_network in network_dict['peerings']: - peered_network_link = 
f"https://www.googleapis.com/compute/v1/projects/{peered_network['project_id']}/global/networks/{peered_network['network_name']}" - - if 'limit' in peered_network: - limit_step2.append( - max(peered_network['limit'], get_ppg(peered_network_link, - limit_dict))) - else: - print( - f"Ignoring projects/{peered_network['project_id']} for limits in peering group of project {project_id} as no limits are available." - + - "This can happen if you don't have permissions on the project, for example if the project is in another organization or a Google managed project" - ) - - # Calculates effective limit: Step 3: Find minimum from the list created by Step 2 - limit_step3 = 0 - if len(limit_step2) > 0: - limit_step3 = min(limit_step2) - - # Calculates effective limit: Step 4: Find maximum from step 1 and step 3 - effective_limit = max(limit_step1, limit_step3) - utilization = peering_group_usage / effective_limit - metric_labels = { - 'project': project_id, - 'network_name': network_dict['network_name'] - } - metrics.append_data_to_series_buffer(config, usage_metric_name, - peering_group_usage, metric_labels, - timestamp=timestamp) - metrics.append_data_to_series_buffer(config, limit_metric_name, - effective_limit, metric_labels, - timestamp=timestamp) - metrics.append_data_to_series_buffer(config, utilization_metric_name, - utilization, metric_labels, - timestamp=timestamp) diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/metrics.py b/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/metrics.py deleted file mode 100644 index 8e0c4082ba..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/metrics.py +++ /dev/null @@ -1,267 +0,0 @@ -# -# Copyright 2022 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -from curses import KEY_MARK -import re -import time -import yaml -from google.api import metric_pb2 as ga_metric -from google.cloud import monitoring_v3 -from . import peerings, limits, networks - - -def create_metrics(monitoring_project, config): - ''' - Creates all Cloud Monitoring custom metrics based on the metric.yaml file - Parameters: - monitoring_project (string): the project where the metrics are written to - config (dict): The dict containing config like clients and limits - Returns: - metrics_dict (dictionary of dictionary of string: string): metrics names and descriptions - limits_dict (dictionary of dictionary of string: int): limits_dict[metric_name]: dict[network_name] = limit_value - ''' - client = config["clients"]["monitoring_client"] - existing_metrics = [] - for desc in client.list_metric_descriptors(name=monitoring_project): - existing_metrics.append(desc.type) - limits_dict = {} - - with open("./metrics.yaml", 'r') as stream: - try: - metrics_dict = yaml.safe_load(stream) - - for metric_list in metrics_dict.values(): - for metric_name, metric in metric_list.items(): - for sub_metric_key, sub_metric in metric.items(): - metric_link = f"custom.googleapis.com/{sub_metric['name']}" - # If the metric doesn't exist yet, then we create it - if metric_link not in existing_metrics: - create_metric(sub_metric["name"], sub_metric["description"], - monitoring_project, config) - # Parse limits for network and peering group metrics - # Subnet level metrics have a different limit: the subnet IP range size - if sub_metric_key == "limit" and ( - 
metric_name != "ip_usage_per_subnet" and - metric_name != "ip_usage_per_secondaryRange"): - limits_dict_for_metric = {} - if "values" in sub_metric: - for network_link, limit_value in sub_metric["values"].items(): - limits_dict_for_metric[network_link] = limit_value - limits_dict[sub_metric["name"]] = limits_dict_for_metric - - return metrics_dict, limits_dict - except yaml.YAMLError as exc: - print(exc) - - -def create_metric(metric_name, description, monitoring_project, config): - ''' - Creates a Cloud Monitoring metric based on the parameter given if the metric is not already existing - Parameters: - metric_name (string): Name of the metric to be created - description (string): Description of the metric to be created - monitoring_project (string): the project where the metrics are written to - config (dict): The dict containing config like clients and limits - Returns: - None - ''' - client = config["clients"]["monitoring_client"] - - descriptor = ga_metric.MetricDescriptor() - descriptor.type = f"custom.googleapis.com/{metric_name}" - descriptor.metric_kind = ga_metric.MetricDescriptor.MetricKind.GAUGE - descriptor.value_type = ga_metric.MetricDescriptor.ValueType.DOUBLE - descriptor.description = description - descriptor = client.create_metric_descriptor(name=monitoring_project, - metric_descriptor=descriptor) - print("Created {}.".format(descriptor.name)) - - -def append_data_to_series_buffer(config, metric_name, metric_value, - metric_labels, timestamp=None): - ''' - Appends data to Cloud Monitoring custom metrics, using a buffer. buffer is flushed every BUFFER_LEN elements, - any unflushed series is discarded upon function closure - Parameters: - config (dict): The dict containing config like clients and limits - metric_name (string): Name of the metric - metric_value (int): Value for the data point of the metric. 
- matric_labels (dictionary of dictionary of string: string): metric labels names and values - timestamp (float): seconds since the epoch, in UTC - Returns: - usage (int): Current usage for that network. - limit (int): Current usage for that network. - ''' - - # Configurable buffer size to improve performance when writing datapoints to metrics - buffer_len = 10 - - series = monitoring_v3.TimeSeries() - series.metric.type = f"custom.googleapis.com/{metric_name}" - series.resource.type = "global" - - for label_name in metric_labels: - if (metric_labels[label_name] != None): - series.metric.labels[label_name] = metric_labels[label_name] - - timestamp = timestamp if timestamp != None else time.time() - seconds = int(timestamp) - nanos = int((timestamp - seconds) * 10**9) - interval = monitoring_v3.TimeInterval( - {"end_time": { - "seconds": seconds, - "nanos": nanos - }}) - point = monitoring_v3.Point({ - "interval": interval, - "value": { - "double_value": metric_value - } - }) - series.points = [point] - - # TODO: sometimes this cashes with 'DeadlineExceeded: 504 Deadline expired before operation could complete' error - # Implement exponential backoff retries? 
- config["series_buffer"].append(series) - if len(config["series_buffer"]) >= buffer_len: - flush_series_buffer(config) - - -def flush_series_buffer(config): - ''' - writes buffered metrics to Google Cloud Monitoring, empties buffer upon both failure/success - config (dict): The dict containing config like clients and limits - ''' - try: - if config["series_buffer"] and len(config["series_buffer"]) > 0: - client = config["clients"]["monitoring_client"] - client.create_time_series(name=config["monitoring_project_link"], - time_series=config["series_buffer"]) - series_names = [ - re.search("\/(.+$)", series.metric.type).group(1) - for series in config["series_buffer"] - ] - print("Wrote time series: ", series_names) - except Exception as e: - print("Error while flushing series buffer") - print(e) - - config["series_buffer"] = [] - - -def get_pgg_data(config, metric_dict, usage_dict, limit_metric, limit_dict): - ''' - This function gets the usage, limit and utilization per VPC peering group for a specific metric for all projects to be monitored. 
- Parameters: - config (dict): The dict containing config like clients and limits - metric_dict (dictionary of string: string): Dictionary with the metric names and description, that will be used to populate the metrics - usage_dict (dictionnary of string:int): Dictionary with the network link as key and the number of resources as value - limit_metric (string): Name of the existing GCP metric for limit per VPC network - limit_dict (dictionary of string:int): Dictionary with the network link as key and the limit as value - Returns: - None - ''' - for project_id in config["monitored_projects"]: - network_dict_list = peerings.gather_peering_data(config, project_id) - # Network dict list is a list of dictionary (one for each network) - # For each network, this dictionary contains: - # project_id, network_name, network_id, usage, limit, peerings (list of peered networks) - # peerings is a list of dictionary (one for each peered network) and contains: - # project_id, network_name, network_id - current_quota_limit = limits.get_quota_current_limit( - config, f"projects/{project_id}", limit_metric) - if current_quota_limit is None: - print( - f"Could not determine number of L7 forwarding rules to metric for projects/{project_id} due to missing quotas" - ) - continue - - current_quota_limit_view = customize_quota_view(current_quota_limit) - - timestamp = time.time() - # For each network in this GCP project - for network_dict in network_dict_list: - if network_dict['network_id'] == 0: - print( - f"Could not determine {metric_dict['usage']['name']} for peering group {network_dict['network_name']} in {project_id} due to missing permissions." 
- ) - continue - network_link = f"https://www.googleapis.com/compute/v1/projects/{project_id}/global/networks/{network_dict['network_name']}" - - limit = networks.get_limit_network(network_dict, network_link, - current_quota_limit_view, limit_dict) - - usage = 0 - if network_link in usage_dict: - usage = usage_dict[network_link] - - # Here we add usage and limit to the network dictionary - network_dict["usage"] = usage - network_dict["limit"] = limit - - # For every peered network, get usage and limits - for peered_network_dict in network_dict['peerings']: - peered_network_link = f"https://www.googleapis.com/compute/v1/projects/{peered_network_dict['project_id']}/global/networks/{peered_network_dict['network_name']}" - peered_usage = 0 - if peered_network_link in usage_dict: - peered_usage = usage_dict[peered_network_link] - - current_peered_quota_limit = limits.get_quota_current_limit( - config, f"projects/{peered_network_dict['project_id']}", - limit_metric) - if current_peered_quota_limit is None: - print( - f"Could not determine metrics for peering to projects/{peered_network_dict['project_id']} due to missing quotas" - ) - continue - - peering_project_limit = customize_quota_view(current_peered_quota_limit) - - peered_limit = networks.get_limit_network(peered_network_dict, - peered_network_link, - peering_project_limit, - limit_dict) - # Here we add usage and limit to the peered network dictionary - peered_network_dict["usage"] = peered_usage - peered_network_dict["limit"] = peered_limit - - limits.count_effective_limit(config, project_id, network_dict, - metric_dict["usage"]["name"], - metric_dict["limit"]["name"], - metric_dict["utilization"]["name"], - limit_dict, timestamp) - print( - f"Buffered {metric_dict['usage']['name']} for peering group {network_dict['network_name']} in {project_id}" - ) - - -def customize_quota_view(quota_results): - ''' - Customize the quota output for an easier parsable output. 
- Parameters: - quota_results (string): Input from get_quota_current_usage or get_quota_current_limit. Contains the Current usage or limit for all networks in that project. - Returns: - quotaViewList (list of dictionaries of string: string): Current quota usage or limit. - ''' - quotaViewList = [] - for result in quota_results: - quotaViewJson = {} - quotaViewJson.update(dict(result.resource.labels)) - quotaViewJson.update(dict(result.metric.labels)) - for val in result.points: - quotaViewJson.update({'value': val.value.int64_value}) - quotaViewList.append(quotaViewJson) - return quotaViewList diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/networks.py b/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/networks.py deleted file mode 100644 index 094f374ed6..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/networks.py +++ /dev/null @@ -1,160 +0,0 @@ -# -# Copyright 2022 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -from code import interact -from collections import defaultdict -from google.protobuf import field_mask_pb2 -from googleapiclient import errors -import http - - -def get_subnet_ranges_dict(config: dict): - ''' - Calls the Asset Inventory API to get all Subnet ranges under the GCP organization. 
- - Parameters: - config (dict): The dict containing config like clients and limits - Returns: - subnet_range_dict (dictionary of string: int): Keys are the network links and values are the number of subnet ranges per network. - ''' - - subnet_range_dict = defaultdict(int) - read_mask = field_mask_pb2.FieldMask() - read_mask.FromJsonString('name,versionedResources') - - response = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ["compute.googleapis.com/Subnetwork"], - "read_mask": read_mask, - "page_size": config["page_size"], - }) - for resource in response: - ranges = 0 - network_link = None - - for versioned in resource.versioned_resources: - for field_name, field_value in versioned.resource.items(): - if field_name == "network": - network_link = field_value - ranges += 1 - if field_name == "secondaryIpRanges": - for range in field_value: - ranges += 1 - - if network_link in subnet_range_dict: - subnet_range_dict[network_link] += ranges - else: - subnet_range_dict[network_link] = ranges - - return subnet_range_dict - - -def get_networks(config, project_id): - ''' - Returns a dictionary of all networks in a project. - - Parameters: - config (dict): The dict containing config like clients and limits - project_id (string): Project ID for the project containing the networks. 
- Returns: - network_dict (dictionary of string: string): Contains the project_id, network_name(s) and network_id(s) - ''' - request = config["clients"]["discovery_client"].networks().list( - project=project_id) - response = request.execute() - network_dict = [] - if 'items' in response: - for network in response['items']: - network_name = network['name'] - network_id = network['id'] - self_link = network['selfLink'] - d = { - 'project_id': project_id, - 'network_name': network_name, - 'network_id': network_id, - 'self_link': self_link - } - network_dict.append(d) - return network_dict - - -def get_network_id(config, project_id, network_name): - ''' - Returns the network_id for a specific project / network name. - - Parameters: - config (dict): The dict containing config like clients and limits - project_id (string): Project ID for the project containing the networks. - network_name (string): Name of the network - Returns: - network_id (int): Network ID. - ''' - request = config["clients"]["discovery_client"].networks().list( - project=project_id) - try: - response = request.execute() - except errors.HttpError as err: - # TODO: log proper warning - if err.resp.status == http.HTTPStatus.FORBIDDEN: - print( - f"Warning: error reading networks for {project_id}. 
" + - f"This can happen if you don't have permissions on the project, for example if the project is in another organization or a Google managed project" - ) - else: - print(f"Warning: error reading networks for {project_id}: {err}") - return 0 - - network_id = 0 - - if 'items' in response: - for network in response['items']: - if network['name'] == network_name: - network_id = network['id'] - break - - if network_id == 0: - print(f"Error: network_id not found for {network_name} in {project_id}") - - return network_id - - -def get_limit_network(network_dict, network_link, quota_limit, limit_dict): - ''' - Returns limit for a specific network and metric, using the GCP quota metrics or the values in the yaml file if not found. - - Parameters: - network_dict (dictionary of string: string): Contains network information. - network_link (string): Contains network link - quota_limit (list of dictionaries of string: string): Current quota limit for all networks in that project. - limit_dict (dictionary of string:int): Dictionary with the network link as key and the limit as value - Returns: - limit (int): Current limit for that network. 
- ''' - if quota_limit: - for net in quota_limit: - if net['network_id'] == network_dict['network_id']: - return net['value'] - - if network_link in limit_dict: - return limit_dict[network_link] - else: - if 'default_value' in limit_dict: - return limit_dict['default_value'] - else: - print(f"Error: Couldn't find limit for {network_link}") - - return 0 diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/peerings.py b/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/peerings.py deleted file mode 100644 index 616c7f6630..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/peerings.py +++ /dev/null @@ -1,179 +0,0 @@ -# -# Copyright 2022 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -import time - -from . import metrics, networks, limits - - -def get_vpc_peering_data(config, metrics_dict, limit_dict): - ''' - Gets the data for VPC peerings (active or not) and writes it to the metric defined (vpc_peering_active_metric and vpc_peering_metric). 
- - Parameters: - config (dict): The dict containing config like clients and limits - metrics_dict (dictionary of dictionary of string: string): metrics names and descriptions - limit_dict (dictionary of string:int): Dictionary with the network link as key and the limit as value - Returns: - None - ''' - timestamp = time.time() - for project in config["monitored_projects"]: - active_vpc_peerings, vpc_peerings = gather_vpc_peerings_data( - config, project, limit_dict) - - for peering in active_vpc_peerings: - metric_labels = { - 'project': project, - 'network_name': peering['network_name'] - } - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"] - ["vpc_peering_active_per_network"]["usage"]["name"], - peering['active_peerings'], metric_labels, timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"] - ["vpc_peering_active_per_network"]["limit"]["name"], - peering['network_limit'], metric_labels, timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"] - ["vpc_peering_active_per_network"]["utilization"]["name"], - peering['active_peerings'] / peering['network_limit'], metric_labels, - timestamp=timestamp) - print( - "Buffered number of active VPC peerings to custom metric for project:", - project) - - for peering in vpc_peerings: - metric_labels = { - 'project': project, - 'network_name': peering['network_name'] - } - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"]["vpc_peering_per_network"] - ["usage"]["name"], peering['peerings'], metric_labels, - timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"]["vpc_peering_per_network"] - ["limit"]["name"], peering['network_limit'], metric_labels, - timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"]["vpc_peering_per_network"] - ["utilization"]["name"], 
- peering['peerings'] / peering['network_limit'], metric_labels, - timestamp=timestamp) - print("Buffered number of VPC peerings to custom metric for project:", - project) - - -def gather_peering_data(config, project_id): - ''' - Returns a dictionary of all peerings for all networks in a project. - - Parameters: - config (dict): The dict containing config like clients and limits - project_id (string): Project ID for the project containing the networks. - Returns: - network_list (dictionary of string: string): Contains the project_id, network_name(s) and network_id(s) of peered networks. - ''' - request = config["clients"]["discovery_client"].networks().list( - project=project_id) - response = request.execute() - - network_list = [] - if 'items' in response: - for network in response['items']: - net = { - 'project_id': project_id, - 'network_name': network['name'], - 'network_id': network['id'], - 'peerings': [] - } - if 'peerings' in network: - STATE = network['peerings'][0]['state'] - if STATE == "ACTIVE": - for peered_network in network[ - 'peerings']: # "projects/{project_name}/global/networks/{network_name}" - start = peered_network['network'].find("projects/") + len( - 'projects/') - end = peered_network['network'].find("/global") - peered_project = peered_network['network'][start:end] - peered_network_name = peered_network['network'].split( - "networks/")[1] - peered_net = { - 'project_id': - peered_project, - 'network_name': - peered_network_name, - 'network_id': - networks.get_network_id(config, peered_project, - peered_network_name) - } - net["peerings"].append(peered_net) - network_list.append(net) - return network_list - - -def gather_vpc_peerings_data(config, project_id, limit_dict): - ''' - Gets the data for all VPC peerings (active or not) in project_id and writes it to the metric defined in vpc_peering_active_metric and vpc_peering_metric. 
- - Parameters: - config (dict): The dict containing config like clients and limits - project_id (string): We will take all VPCs in that project_id and look for all peerings to these VPCs. - limit_dict (dictionary of string:int): Dictionary with the network link as key and the limit as value - Returns: - active_peerings_dict (dictionary of string: string): Contains project_id, network_name, network_limit for each active VPC peering. - peerings_dict (dictionary of string: string): Contains project_id, network_name, network_limit for each VPC peering. - ''' - active_peerings_dict = [] - peerings_dict = [] - request = config["clients"]["discovery_client"].networks().list( - project=project_id) - response = request.execute() - if 'items' in response: - for network in response['items']: - if 'peerings' in network: - STATE = network['peerings'][0]['state'] - if STATE == "ACTIVE": - active_peerings_count = len(network['peerings']) - else: - active_peerings_count = 0 - - peerings_count = len(network['peerings']) - else: - peerings_count = 0 - active_peerings_count = 0 - - network_link = f"https://www.googleapis.com/compute/v1/projects/{project_id}/global/networks/{network['name']}" - network_limit = limits.get_ppg(network_link, limit_dict) - - active_d = { - 'project_id': project_id, - 'network_name': network['name'], - 'active_peerings': active_peerings_count, - 'network_limit': network_limit - } - active_peerings_dict.append(active_d) - d = { - 'project_id': project_id, - 'network_name': network['name'], - 'peerings': peerings_count, - 'network_limit': network_limit - } - peerings_dict.append(d) - - return active_peerings_dict, peerings_dict diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/routers.py b/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/routers.py deleted file mode 100644 index 064354e7f6..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/routers.py +++ /dev/null @@ 
-1,57 +0,0 @@ -# -# Copyright 2022 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -from google.protobuf import field_mask_pb2 - - -def get_routers(config): - ''' - Returns a dictionary of all Cloud Routers in the GCP organization. - - Parameters: - config (dict): The dict containing config like clients and limits - Returns: - routers_dict (dictionary of string: list of string): Key is the network link and value is a list of router links. - ''' - - read_mask = field_mask_pb2.FieldMask() - read_mask.FromJsonString('name,versionedResources') - - routers_dict = {} - - response = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ["compute.googleapis.com/Router"], - "read_mask": read_mask, - "page_size": config["page_size"], - }) - for resource in response: - network_link = None - router_link = None - for versioned in resource.versioned_resources: - for field_name, field_value in versioned.resource.items(): - if field_name == "network": - network_link = field_value - if field_name == "selfLink": - router_link = field_value - - if network_link in routers_dict: - routers_dict[network_link].append(router_link) - else: - routers_dict[network_link] = [router_link] - - return routers_dict diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/routes.py b/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/routes.py deleted file mode 
100644 index a161454547..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/routes.py +++ /dev/null @@ -1,289 +0,0 @@ -# -# Copyright 2022 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -import time - -from collections import defaultdict -from google.protobuf import field_mask_pb2 -from . import metrics, networks, limits, peerings, routers - - -def get_routes_for_router(config, project_id, router_region, router_name): - ''' - Returns the same of dynamic routes learned by a specific Cloud Router instance - - Parameters: - config (dict): The dict containing config like clients and limits - project_id (string): Project ID for the project containing the Cloud Router. - router_region (string): GCP region for the Cloud Router. - router_name (string): Cloud Router name. - Returns: - sum_routes (int): Number of dynamic routes learned by the Cloud Router. 
- ''' - request = config["clients"]["discovery_client"].routers().getRouterStatus( - project=project_id, region=router_region, router=router_name) - response = request.execute() - - sum_routes = 0 - - if 'result' in response: - if 'bgpPeerStatus' in response['result']: - for peer in response['result']['bgpPeerStatus']: - sum_routes += peer['numLearnedRoutes'] - - return sum_routes - - -def get_routes_for_network(config, network_link, project_id, routers_dict): - ''' - Returns a the number of dynamic routes for a given network - - Parameters: - config (dict): The dict containing config like clients and limits - network_link (string): Network self link. - project_id (string): Project ID containing the network. - routers_dict (dictionary of string: list of string): Dictionary with key as network link and value as list of router links. - Returns: - sum_routes (int): Number of routes in that network. - ''' - sum_routes = 0 - - if network_link in routers_dict: - for router_link in routers_dict[network_link]: - # Router link is using the following format: - # 'https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/REGION/routers/ROUTER_NAME' - start = router_link.find("/regions/") + len("/regions/") - end = router_link.find("/routers/") - router_region = router_link[start:end] - router_name = router_link.split('/routers/')[1] - routes = get_routes_for_router(config, project_id, router_region, - router_name) - - sum_routes += routes - - return sum_routes - - -def get_dynamic_routes(config, metrics_dict, limits_dict): - ''' - This function gets the usage, limit and utilization for the dynamic routes per VPC - note: assumes global routing is ON for all VPCs - Parameters: - config (dict): The dict containing config like clients and limits - metrics_dict (dictionary of dictionary of string: string): metrics names and descriptions. 
- limits_dict (dictionary of string: int): key is network link (or 'default_value') and value is the limit for that network - Returns: - dynamic_routes_dict (dictionary of string: int): key is network link and value is the number of dynamic routes for that network - ''' - routers_dict = routers.get_routers(config) - dynamic_routes_dict = defaultdict(int) - - timestamp = time.time() - for project in config["monitored_projects"]: - network_dict = networks.get_networks(config, project) - - for net in network_dict: - sum_routes = get_routes_for_network(config, net['self_link'], project, - routers_dict) - dynamic_routes_dict[net['self_link']] = sum_routes - - if net['self_link'] in limits_dict: - limit = limits_dict[net['self_link']] - else: - if 'default_value' in limits_dict: - limit = limits_dict['default_value'] - else: - print("Error: couldn't find limit for dynamic routes.") - break - - utilization = sum_routes / limit - metric_labels = {'project': project, 'network_name': net['network_name']} - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"] - ["dynamic_routes_per_network"]["usage"]["name"], sum_routes, - metric_labels, timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"] - ["dynamic_routes_per_network"]["limit"]["name"], limit, metric_labels, - timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"] - ["dynamic_routes_per_network"]["utilization"]["name"], utilization, - metric_labels, timestamp=timestamp) - - print("Buffered metrics for dynamic routes for VPCs in project", project) - - return dynamic_routes_dict - - -def get_routes_ppg(config, metric_dict, usage_dict, limit_dict): - ''' - This function gets the usage, limit and utilization for the static or dynamic routes per VPC peering group. 
- note: assumes global routing is ON for all VPCs for dynamic routes, assumes share custom routes is on for all peered networks - Parameters: - config (dict): The dict containing config like clients and limits - metric_dict (dictionary of string: string): Dictionary with the metric names and description, that will be used to populate the metrics - usage_dict (dictionnary of string:int): Dictionary with the network link as key and the number of resources as value - limit_dict (dictionary of string:int): Dictionary with the network link as key and the limit as value - Returns: - None - ''' - timestamp = time.time() - for project_id in config["monitored_projects"]: - network_dict_list = peerings.gather_peering_data(config, project_id) - - for network_dict in network_dict_list: - network_link = f"https://www.googleapis.com/compute/v1/projects/{project_id}/global/networks/{network_dict['network_name']}" - - limit = limits.get_ppg(network_link, limit_dict) - - usage = 0 - if network_link in usage_dict: - usage = usage_dict[network_link] - - # Here we add usage and limit to the network dictionary - network_dict["usage"] = usage - network_dict["limit"] = limit - - # For every peered network, get usage and limits - for peered_network_dict in network_dict['peerings']: - peered_network_link = f"https://www.googleapis.com/compute/v1/projects/{peered_network_dict['project_id']}/global/networks/{peered_network_dict['network_name']}" - peered_usage = 0 - if peered_network_link in usage_dict: - peered_usage = usage_dict[peered_network_link] - - peered_limit = limits.get_ppg(peered_network_link, limit_dict) - - # Here we add usage and limit to the peered network dictionary - peered_network_dict["usage"] = peered_usage - peered_network_dict["limit"] = peered_limit - - limits.count_effective_limit(config, project_id, network_dict, - metric_dict["usage"]["name"], - metric_dict["limit"]["name"], - metric_dict["utilization"]["name"], - limit_dict, timestamp) - print( - f"Buffered 
{metric_dict['usage']['name']} for peering group {network_dict['network_name']} in {project_id}" - ) - - -def get_static_routes_dict(config): - ''' - Calls the Asset Inventory API to get all static custom routes under the GCP organization. - Parameters: - config (dict): The dict containing config like clients and limits - Returns: - routes_per_vpc_dict (dictionary of string: int): Keys are the network links and values are the number of custom static routes per network. - ''' - routes_per_vpc_dict = defaultdict() - usage_dict = defaultdict() - - read_mask = field_mask_pb2.FieldMask() - read_mask.FromJsonString('name,versionedResources') - - response = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ["compute.googleapis.com/Route"], - "read_mask": read_mask - }) - - for resource in response: - for versioned in resource.versioned_resources: - static_route = dict() - for field_name, field_value in versioned.resource.items(): - static_route[field_name] = field_value - static_route["project_id"] = static_route["network"].split('/')[6] - static_route["network_name"] = static_route["network"].split('/')[-1] - network_link = f"https://www.googleapis.com/compute/v1/projects/{static_route['project_id']}/global/networks/{static_route['network_name']}" - #exclude default vpc and peering routes, dynamic routes are not in Cloud Asset Inventory - if "nextHopPeering" not in static_route and "nextHopNetwork" not in static_route: - if network_link not in routes_per_vpc_dict: - routes_per_vpc_dict[network_link] = dict() - routes_per_vpc_dict[network_link]["project_id"] = static_route[ - "project_id"] - routes_per_vpc_dict[network_link]["network_name"] = static_route[ - "network_name"] - if static_route["destRange"] not in routes_per_vpc_dict[network_link]: - routes_per_vpc_dict[network_link][static_route["destRange"]] = {} - if "usage" not in routes_per_vpc_dict[network_link]: - 
routes_per_vpc_dict[network_link]["usage"] = 0 - routes_per_vpc_dict[network_link][ - "usage"] = routes_per_vpc_dict[network_link]["usage"] + 1 - - #output a dict with network links and usage only - return { - network_link_out: routes_per_vpc_dict[network_link_out]["usage"] - for network_link_out in routes_per_vpc_dict - } - - -def get_static_routes_data(config, metrics_dict, static_routes_dict, - project_quotas_dict): - ''' - Determines and writes the number of static routes for each VPC in monitored projects, the per project limit and the per project utilization - note: assumes custom routes sharing is ON for all VPCs - Parameters: - config (dict): The dict containing config like clients and limits - metric_dict (dictionary of string: string): Dictionary with the metric names and description, that will be used to populate the metrics - static_routes_dict (dictionary of dictionary: int): Keys are the network links and values are the number of custom static routes per network. - project_quotas_dict (dictionary of string:int): Dictionary with the network link as key and the limit as value. 
- Returns: - None - ''' - timestamp = time.time() - project_usage = {project: 0 for project in config["monitored_projects"]} - - #usage is drilled down by network - for network_link in static_routes_dict: - - project_id = network_link.split('/')[6] - if (project_id not in config["monitored_projects"]): - continue - network_name = network_link.split('/')[-1] - - project_usage[project_id] = project_usage[project_id] + static_routes_dict[ - network_link] - - metric_labels = {"project": project_id, "network_name": network_name} - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"]["static_routes_per_project"] - ["usage"]["name"], static_routes_dict[network_link], metric_labels, - timestamp=timestamp) - - #limit and utilization are calculated by project - for project_id in project_usage: - current_quota_limit = project_quotas_dict[project_id]['global']["routes"][ - "limit"] - if current_quota_limit is None: - print( - f"Could not determine static routes metric for projects/{project_id} due to missing quotas" - ) - continue - # limit and utilization are calculted by project - metric_labels = {"project": project_id} - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"]["static_routes_per_project"] - ["limit"]["name"], current_quota_limit, metric_labels, - timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_network"]["static_routes_per_project"] - ["utilization"]["name"], - project_usage[project_id] / current_quota_limit, metric_labels, - timestamp=timestamp) - - return diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/secondarys.py b/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/secondarys.py deleted file mode 100644 index 6030ddafdb..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/secondarys.py +++ /dev/null @@ -1,266 +0,0 @@ -# -# Copyright 2022 Google LLC -# -# Licensed 
under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -import time - -from . import metrics -from google.protobuf import field_mask_pb2 -from google.protobuf.json_format import MessageToDict -import ipaddress - - -def get_all_secondaryRange(config): - ''' - Returns a dictionary with secondary range informations - Parameters: - config (dict): The dict containing config like clients and limits - Returns: - secondary_dict (dictionary of String: dictionary): Key is the project_id, - value is a nested dictionary with subnet_name/secondary_range_name as the key. 
- ''' - secondary_dict = {} - read_mask = field_mask_pb2.FieldMask() - read_mask.FromJsonString('name,versionedResources') - - response = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ['compute.googleapis.com/Subnetwork'], - "read_mask": read_mask, - "page_size": config["page_size"], - }) - - for asset in response: - for versioned in asset.versioned_resources: - subnet_name = versioned.resource.get('name') - # Network self link format: - # "https://www.googleapis.com/compute/v1/projects//global/networks/" - project_id = versioned.resource.get('network').split('/')[6] - network_name = versioned.resource.get('network').split('/')[-1] - subnet_region = versioned.resource.get('region').split('/')[-1] - - # Check first if the subnet has any secondary ranges to begin with - if versioned.resource.get('secondaryIpRanges'): - for items in versioned.resource.get('secondaryIpRanges'): - # Each subnet can have multiple secondary ranges - secondaryRange_name = items.get('rangeName') - secondaryCidrBlock = items.get('ipCidrRange') - - net = ipaddress.ip_network(secondaryCidrBlock) - total_ip_addresses = int(net.num_addresses) - - if project_id not in secondary_dict: - secondary_dict[project_id] = {} - secondary_dict[project_id][f"{subnet_name}/{secondaryRange_name}"] = { - 'name': secondaryRange_name, - 'region': subnet_region, - 'subnetName': subnet_name, - 'ip_cidr_range': secondaryCidrBlock, - 'total_ip_addresses': total_ip_addresses, - 'used_ip_addresses': 0, - 'network_name': network_name - } - return secondary_dict - - -def compute_GKE_secondaryIP_utilization(config, read_mask, all_secondary_dict): - ''' - Counts the IP Addresses used by GKE (Pods and Services) - Parameters: - config (dict): The dict containing config like clients and limits - read_mask (FieldMask): read_mask to get additional metadata from Cloud Asset Inventory - all_secondary_dict (dict): Dict containing the 
secondary IP Range information for each subnets in the GCP organization - Returns: - all_secondary_dict (dict): Same dict but populated with GKE IP utilization information - ''' - cluster_secondary_dict = {} - node_secondary_dict = {} - - # Creating cluster dict - # Cluster dict has subnet information - response_cluster = config["clients"]["asset_client"].list_assets( - request={ - "parent": f"organizations/{config['organization']}", - "asset_types": ['container.googleapis.com/Cluster'], - "content_type": 'RESOURCE', - "page_size": config["page_size"], - }) - - for asset in response_cluster: - cluster_project = asset.resource.data['selfLink'].split('/')[5] - cluster_parent = "/".join(asset.resource.data['selfLink'].split('/')[5:10]) - cluster_subnetwork = asset.resource.data['subnetwork'] - cluster_service_rangeName = asset.resource.data['ipAllocationPolicy'][ - 'servicesSecondaryRangeName'] - - cluster_secondary_dict[f"{cluster_parent}/Service"] = { - "project": cluster_project, - "subnet": cluster_subnetwork, - "secondaryRange_name": cluster_service_rangeName, - 'used_ip_addresses': 0, - } - - for node_pool in asset.resource.data['nodePools']: - nodepool_name = node_pool['name'] - node_IPrange = node_pool['networkConfig']['podRange'] - cluster_secondary_dict[f"{cluster_parent}/{nodepool_name}"] = { - "project": cluster_project, - "subnet": cluster_subnetwork, - "secondaryRange_name": node_IPrange, - 'used_ip_addresses': 0, - } - - # Creating node dict - # Node dict allows 1:1 mapping of pod IP utilization, and which secondary Range it is using - response_node = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ['k8s.io/Node'], - "read_mask": read_mask, - "page_size": config["page_size"], - }) - - for asset in response_node: - # Node name link format: - # "//container.googleapis.com/projects////clusters//k8s/nodes/" - node_parent = "/".join(asset.name.split('/')[4:9]) - 
node_name = asset.name.split('/')[-1] - node_full_name = f"{node_parent}/{node_name}" - - for versioned in asset.versioned_resources: - node_secondary_dict[node_full_name] = { - 'node_parent': - node_parent, - 'this_node_pool': - versioned.resource['metadata']['labels'] - ['cloud.google.com/gke-nodepool'], - 'used_ip_addresses': - 0 - } - - # Counting IP addresses used by pods in GKE - response_pods = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ['k8s.io/Pod'], - "read_mask": read_mask, - "page_size": config["page_size"], - }) - - for asset in response_pods: - # Pod name link format: - # "//container.googleapis.com/projects////clusters//k8s/namespaces//pods/" - pod_parent = "/".join(asset.name.split('/')[4:9]) - - for versioned in asset.versioned_resources: - cur_PodIP = versioned.resource['status']['podIP'] - cur_HostIP = versioned.resource['status']['hostIP'] - host_node_name = versioned.resource['spec']['nodeName'] - pod_full_path = f"{pod_parent}/{host_node_name}" - - # A check to make sure pod is not using node IP - if cur_PodIP != cur_HostIP: - node_secondary_dict[pod_full_path]['used_ip_addresses'] += 1 - - # Counting IP addresses used by Service in GKE - response_service = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ['k8s.io/Service'], - "read_mask": read_mask, - "page_size": config["page_size"], - }) - - for asset in response_service: - service_parent = "/".join(asset.name.split('/')[4:9]) - service_fullpath = f"{service_parent}/Service" - cluster_secondary_dict[service_fullpath]['used_ip_addresses'] += 1 - - for item in node_secondary_dict.values(): - itemKey = f"{item['node_parent']}/{item['this_node_pool']}" - cluster_secondary_dict[itemKey]['used_ip_addresses'] += item['used_ip_addresses'] - - for item in cluster_secondary_dict.values(): - itemKey = 
f"{item['subnet']}/{item['secondaryRange_name']}" - all_secondary_dict[item['project']][itemKey]['used_ip_addresses'] += item[ - 'used_ip_addresses'] - - -def compute_secondary_utilization(config, all_secondary_dict): - ''' - Counts resources (GKE, GCE) using IPs in secondary ranges. - Parameters: - config (dict): Dict containing config like clients and limits - all_secondary_dict (dict): Dict containing the secondary IP Range information for each subnets in the GCP organization - Returns: - None - ''' - read_mask = field_mask_pb2.FieldMask() - read_mask.FromJsonString('name,versionedResources') - - compute_GKE_secondaryIP_utilization(config, read_mask, all_secondary_dict) - # TODO: Other Secondary IP like GCE VM using alias IPs - - -def get_secondaries(config, metrics_dict): - ''' - Writes all secondary rang IP address usage metrics to custom metrics. - Parameters: - config (dict): The dict containing config like clients and limits - Returns: - None - ''' - - secondaryRange_dict = get_all_secondaryRange(config) - # Updates all_subnets_dict with the IP utilization info - compute_secondary_utilization(config, secondaryRange_dict) - - timestamp = time.time() - for project_id in config["monitored_projects"]: - if project_id not in secondaryRange_dict: - continue - for secondary_dict in secondaryRange_dict[project_id].values(): - ip_utilization = 0 - if secondary_dict['used_ip_addresses'] > 0: - ip_utilization = secondary_dict['used_ip_addresses'] / secondary_dict[ - 'total_ip_addresses'] - - # Building unique identifier with subnet region/name - subnet_id = f"{secondary_dict['region']}/{secondary_dict['name']}" - metric_labels = { - 'project': project_id, - 'network_name': secondary_dict['network_name'], - 'region' : secondary_dict['region'], - 'subnet' : secondary_dict['subnetName'], - 'secondary_range' : secondary_dict['name'] - } - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_subnet"] - ["ip_usage_per_secondaryRange"]["usage"]["name"], 
- secondary_dict['used_ip_addresses'], metric_labels, - timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_subnet"] - ["ip_usage_per_secondaryRange"]["limit"]["name"], - secondary_dict['total_ip_addresses'], metric_labels, - timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_subnet"] - ["ip_usage_per_secondaryRange"]["utilization"]["name"], - ip_utilization, metric_labels, timestamp=timestamp) - - print("Buffered metrics for secondary ip utilization for VPCs in project", - project_id) diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/subnets.py b/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/subnets.py deleted file mode 100644 index 46fbc7564a..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/subnets.py +++ /dev/null @@ -1,373 +0,0 @@ -# -# Copyright 2022 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -import time - -from . 
import metrics -from google.protobuf import field_mask_pb2 -from google.protobuf.json_format import MessageToDict -import ipaddress - - -def get_all_subnets(config): - ''' - Returns a dictionary with subnet level informations (such as IP utilization) - Parameters: - config (dict): The dict containing config like clients and limits - Returns: - subnet_dict (dictionary of String: dictionary): Key is the project_id, value is a nested dictionary with subnet_region/subnet_name as the key. - ''' - subnet_dict = {} - read_mask = field_mask_pb2.FieldMask() - read_mask.FromJsonString('name,versionedResources') - - response = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ['compute.googleapis.com/Subnetwork'], - "read_mask": read_mask, - "page_size": config["page_size"], - }) - - for asset in response: - for versioned in asset.versioned_resources: - subnet_name = "" - network_name = "" - project_id = "" - ip_cidr_range = "" - subnet_region = "" - - for field_name, field_value in versioned.resource.items(): - if field_name == 'name': - subnet_name = field_value - elif field_name == 'network': - # Network self link format: - # "https://www.googleapis.com/compute/v1/projects//global/networks/" - project_id = field_value.split('/')[6] - network_name = field_value.split('/')[-1] - elif field_name == 'ipCidrRange': - ip_cidr_range = field_value - elif field_name == 'region': - subnet_region = field_value.split('/')[-1] - - net = ipaddress.ip_network(ip_cidr_range) - # Note that 4 IP addresses are reserved by GCP in all subnets - # Source: https://cloud.google.com/vpc/docs/subnets#reserved_ip_addresses_in_every_subnet - total_ip_addresses = int(net.num_addresses) - 4 - - if project_id not in subnet_dict: - subnet_dict[project_id] = {} - subnet_dict[project_id][f"{subnet_region}/{subnet_name}"] = { - 'name': subnet_name, - 'region': subnet_region, - 'ip_cidr_range': ip_cidr_range, - 
'total_ip_addresses': total_ip_addresses, - 'used_ip_addresses': 0, - 'network_name': network_name - } - - return subnet_dict - - -def compute_subnet_utilization_vms(config, read_mask, all_subnets_dict): - ''' - Counts VMs using private IPs in the different subnets. - Parameters: - config (dict): Dict containing config like clients and limits - read_mask (FieldMask): read_mask to get additional metadata from Cloud Asset Inventory - all_subnets_dict (dict): Dict containing the information for each subnets in the GCP organization - Returns: - None - ''' - response_vm = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ["compute.googleapis.com/Instance"], - "read_mask": read_mask, - "page_size": config["page_size"], - }) - - # Counting IP addresses for GCE instances (VMs) - for asset in response_vm: - for versioned in asset.versioned_resources: - for field_name, field_value in versioned.resource.items(): - # TODO: Handle multi-NIC - if field_name == 'networkInterfaces': - response_dict = MessageToDict(list(field_value._pb)[0]) - # Subnet self link: - # https://www.googleapis.com/compute/v1/projects//regions//subnetworks/ - subnet_region = response_dict['subnetwork'].split('/')[-3] - subnet_name = response_dict['subnetwork'].split('/')[-1] - # Network self link: - # https://www.googleapis.com/compute/v1/projects//global/networks/ - project_id = response_dict['network'].split('/')[6] - network_name = response_dict['network'].split('/')[-1] - - all_subnets_dict[project_id][f"{subnet_region}/{subnet_name}"][ - 'used_ip_addresses'] += 1 - - -def compute_subnet_utilization_ilbs(config, read_mask, all_subnets_dict): - ''' - Counts ILBs using private IPs in the different subnets. 
- Parameters: - config (dict): Dict containing config like clients and limits - read_mask (FieldMask): read_mask to get additional metadata from Cloud Asset Inventory - all_subnets_dict (dict): Dict containing the information for each subnets in the GCP organization - Returns: - None - ''' - response_ilb = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ["compute.googleapis.com/ForwardingRule"], - "read_mask": read_mask, - "page_size": config["page_size"], - }) - - for asset in response_ilb: - internal = False - psc = False - project_id = '' - subnet_name = '' - subnet_region = '' - address = '' - network = '' - for versioned in asset.versioned_resources: - for field_name, field_value in versioned.resource.items(): - if 'loadBalancingScheme' in field_name and field_value in [ - 'INTERNAL', 'INTERNAL_MANAGED' - ]: - internal = True - # We want to count only accepted PSC endpoint Forwarding Rule - # If the PSC endpoint Forwarding Rule is pending, we will count it in the reserved IP addresses - elif field_name == 'pscConnectionStatus' and field_value == 'ACCEPTED': - psc = True - elif field_name == 'IPAddress': - address = field_value - elif field_name == 'network': - project_id = field_value.split('/')[6] - network = field_value.split('/')[-1] - elif 'subnetwork' in field_name: - subnet_name = field_value.split('/')[-1] - subnet_region = field_value.split('/')[-3] - - if internal: - all_subnets_dict[project_id][f"{subnet_region}/{subnet_name}"][ - 'used_ip_addresses'] += 1 - elif psc: - # PSC endpoint asset doesn't contain the subnet information in Asset Inventory - # We need to find the correct subnet with IP address matching - ip_address = ipaddress.ip_address(address) - for subnet_key, subnet_dict in all_subnets_dict[project_id].items(): - if subnet_dict["network_name"] == network: - if ip_address in ipaddress.ip_network(subnet_dict['ip_cidr_range']): - 
all_subnets_dict[project_id][subnet_key]['used_ip_addresses'] += 1 - - -def compute_subnet_utilization_addresses(config, read_mask, all_subnets_dict): - ''' - Counts reserved IP addresses in the different subnets. - Parameters: - config (dict): Dict containing config like clients and limits - read_mask (FieldMask): read_mask to get additional metadata from Cloud Asset Inventory - all_subnets_dict (dict): Dict containing the information for each subnets in the GCP organization - Returns: - None - ''' - response_reserved_ips = config["clients"][ - "asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ["compute.googleapis.com/Address"], - "read_mask": read_mask, - "page_size": config["page_size"], - }) - - # Counting IP addresses for GCE Reserved IPs (ex: PSC, Cloud DNS Inbound policies, reserved GCE IPs) - for asset in response_reserved_ips: - purpose = "" - status = "" - project_id = "" - network_name = "" - subnet_name = "" - subnet_region = "" - address = "" - prefixLength = "" - address_name = "" - for versioned in asset.versioned_resources: - for field_name, field_value in versioned.resource.items(): - if field_name == 'name': - address_name = field_value - if field_name == 'purpose': - purpose = field_value - elif field_name == 'region': - subnet_region = field_value.split('/')[-1] - elif field_name == 'status': - status = field_value - elif field_name == 'address': - address = field_value - elif field_name == 'network': - network_name = field_value.split('/')[-1] - project_id = field_value.split('/')[6] - elif field_name == 'subnetwork': - subnet_name = field_value.split('/')[-1] - project_id = field_value.split('/')[6] - elif field_name == 'prefixLength': - prefixLength = field_value - - # Rserved IP addresses for GCE instances or PSC Forwarding Rule PENDING state - if purpose == "GCE_ENDPOINT" and status == "RESERVED": - all_subnets_dict[project_id][f"{subnet_region}/{subnet_name}"][ - 
'used_ip_addresses'] += 1 - # Cloud DNS inbound policy - elif purpose == "DNS_RESOLVER": - all_subnets_dict[project_id][f"{subnet_region}/{subnet_name}"][ - 'used_ip_addresses'] += 1 - # PSA Range for Cloud SQL, MemoryStore, etc. - elif purpose == "VPC_PEERING": - ip_range = f"{address}/{int(prefixLength)}" - net = ipaddress.ip_network(ip_range) - # Note that 4 IP addresses are reserved by GCP in all subnets - # Source: https://cloud.google.com/vpc/docs/subnets#reserved_ip_addresses_in_every_subnet - total_ip_addresses = int(net.num_addresses) - 4 - all_subnets_dict[project_id][f"psa/{address_name}"] = { - 'name': f"psa/{address_name}", - 'region': subnet_region, - 'ip_cidr_range': ip_range, - 'total_ip_addresses': total_ip_addresses, - 'used_ip_addresses': 0, - 'network_name': network_name - } - - -def compute_subnet_utilization_redis(config, read_mask, all_subnets_dict): - ''' - Counts Redis (Memorystore) instances using private IPs in the different subnets. - Parameters: - config (dict): Dict containing config like clients and limits - read_mask (FieldMask): read_mask to get additional metadata from Cloud Asset Inventory - all_subnets_dict (dict): Dict containing the information for each subnets in the GCP organization - Returns: - None - ''' - response_redis = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ["redis.googleapis.com/Instance"], - "read_mask": read_mask, - "page_size": config["page_size"], - }) - - for asset in response_redis: - ip_range = "" - connect_mode = "" - network_name = "" - project_id = "" - region = "" - for versioned in asset.versioned_resources: - for field_name, field_value in versioned.resource.items(): - if field_name == 'locationId': - region = field_value[0:-2] - if field_name == 'authorizedNetwork': - network_name = field_value.split('/')[-1] - project_id = field_value.split('/')[1] - if field_name == 'reservedIpRange': - ip_range = 
field_value - if field_name == 'connectMode': - connect_mode = field_value - - # Only handling PSA for Redis for now - if connect_mode == "PRIVATE_SERVICE_ACCESS": - redis_ip_range = ipaddress.ip_network(ip_range) - for subnet_key, subnet_dict in all_subnets_dict[project_id].items(): - if subnet_dict["network_name"] == network_name: - # Reddis instance asset doesn't contain the subnet information in Asset Inventory - # We need to find the correct subnet range with IP address matching to compute the utilization - if redis_ip_range.overlaps( - ipaddress.ip_network(subnet_dict['ip_cidr_range'])): - all_subnets_dict[project_id][subnet_key][ - 'used_ip_addresses'] += redis_ip_range.num_addresses - all_subnets_dict[project_id][subnet_key]['region'] = region - - -def compute_subnet_utilization(config, all_subnets_dict): - ''' - Counts resources (VMs, ILBs, reserved IPs) using private IPs in the different subnets. - Parameters: - config (dict): Dict containing config like clients and limits - all_subnets_dict (dict): Dict containing the information for each subnets in the GCP organization - Returns: - None - ''' - read_mask = field_mask_pb2.FieldMask() - read_mask.FromJsonString('name,versionedResources') - - compute_subnet_utilization_vms(config, read_mask, all_subnets_dict) - compute_subnet_utilization_ilbs(config, read_mask, all_subnets_dict) - compute_subnet_utilization_addresses(config, read_mask, all_subnets_dict) - # TODO: Other PSA services such as FileStore, Cloud SQL - compute_subnet_utilization_redis(config, read_mask, all_subnets_dict) - - # TODO: Handle secondary ranges and count GKE pods - - -def get_subnets(config, metrics_dict): - ''' - Writes all subnet metrics to custom metrics. 
- - Parameters: - config (dict): The dict containing config like clients and limits - Returns: - None - ''' - - all_subnets_dict = get_all_subnets(config) - # Updates all_subnets_dict with the IP utilization info - compute_subnet_utilization(config, all_subnets_dict) - - timestamp = time.time() - for project_id in config["monitored_projects"]: - if project_id not in all_subnets_dict: - continue - for subnet_dict in all_subnets_dict[project_id].values(): - ip_utilization = 0 - if subnet_dict['used_ip_addresses'] > 0: - ip_utilization = subnet_dict['used_ip_addresses'] / subnet_dict[ - 'total_ip_addresses'] - - # Building unique identifier with subnet region/name - subnet_id = f"{subnet_dict['region']}/{subnet_dict['name']}" - metric_labels = { - 'project': project_id, - 'network_name': subnet_dict['network_name'], - 'subnet_id': subnet_id - } - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_subnet"]["ip_usage_per_subnet"] - ["usage"]["name"], subnet_dict['used_ip_addresses'], metric_labels, - timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_subnet"]["ip_usage_per_subnet"] - ["limit"]["name"], subnet_dict['total_ip_addresses'], metric_labels, - timestamp=timestamp) - metrics.append_data_to_series_buffer( - config, metrics_dict["metrics_per_subnet"]["ip_usage_per_subnet"] - ["utilization"]["name"], ip_utilization, metric_labels, - timestamp=timestamp) - - print("Buffered metrics for subnet ip utilization for VPCs in project", - project_id) diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/vpc_firewalls.py b/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/vpc_firewalls.py deleted file mode 100644 index f9fec79a72..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/metrics/vpc_firewalls.py +++ /dev/null @@ -1,122 +0,0 @@ -# -# Copyright 2022 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the 
"License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -import re -import time - -from collections import defaultdict -from pydoc import doc -from collections import defaultdict -from google.protobuf import field_mask_pb2 -from . import metrics, networks, limits - - -def get_firewalls_dict(config: dict): - ''' - Calls the Asset Inventory API to get all VPC Firewall Rules under the GCP organization. - - Parameters: - config (dict): The dict containing config like clients and limits - Returns: - firewalls_dict (dictionary of dictionary: int): Keys are projects, subkeys are networks, values count #of VPC Firewall Rules - ''' - - firewalls_dict = defaultdict(int) - read_mask = field_mask_pb2.FieldMask() - read_mask.FromJsonString('name,versionedResources') - - response = config["clients"]["asset_client"].search_all_resources( - request={ - "scope": f"organizations/{config['organization']}", - "asset_types": ["compute.googleapis.com/Firewall"], - "read_mask": read_mask, - "page_size": config["page_size"], - }) - for resource in response: - project_id = re.search("(compute.googleapis.com/projects/)([\w\-\d]+)", - resource.name).group(2) - network_name = "" - for versioned in resource.versioned_resources: - for field_name, field_value in versioned.resource.items(): - if field_name == "network": - network_name = re.search("[a-z0-9\-]*$", field_value).group(0) - firewalls_dict[project_id] = defaultdict( - int - ) if not project_id in firewalls_dict else firewalls_dict[project_id] - firewalls_dict[project_id][ - network_name] = 1 if not 
network_name in firewalls_dict[ - project_id] else firewalls_dict[project_id][network_name] + 1 - break - break - return firewalls_dict - - -def get_firewalls_data(config, metrics_dict, project_quotas_dict, - firewalls_dict): - ''' - Gets the data for VPC Firewall Rules per VPC Network and writes it to the metric defined in vpc_firewalls_metric. - - Parameters: - config (dict): The dict containing config like clients and limits - metrics_dict (dictionary of dictionary of string: string): metrics names and descriptions. - project_quotas_dict (dictionary of string:int): Dictionary with the network link as key and the limit as value. - firewalls_dict (dictionary of of dictionary of string: string): Keys are projects, subkeys are networks, values count #of VPC Firewall Rules - Returns: - None - ''' - - timestamp = time.time() - for project_id in config["monitored_projects"]: - - current_quota_limit = project_quotas_dict[project_id]['global']["firewalls"] - if current_quota_limit is None: - print( - f"Could not determine VPC firewal rules metric for projects/{project_id} due to missing quotas" - ) - continue - - network_dict = networks.get_networks(config, project_id) - - project_usage = 0 - for net in network_dict: - usage = 0 - if project_id in firewalls_dict and net['network_name'] in firewalls_dict[ - project_id]: - usage = firewalls_dict[project_id][net['network_name']] - project_usage += usage - metric_labels = { - 'project': project_id, - 'network_name': net['network_name'] - } - metrics.append_data_to_series_buffer( - config, - metrics_dict["metrics_per_project"][f"firewalls"]["usage"]["name"], - usage, metric_labels, timestamp=timestamp) - - metric_labels = {'project': project_id} - # firewall quotas are per project, not per single VPC - metrics.append_data_to_series_buffer( - config, - metrics_dict["metrics_per_project"][f"firewalls"]["limit"]["name"], - current_quota_limit['limit'], metric_labels, timestamp=timestamp) - metrics.append_data_to_series_buffer( - 
config, metrics_dict["metrics_per_project"][f"firewalls"]["utilization"] - ["name"], project_usage / current_quota_limit['limit'] - if current_quota_limit['limit'] != 0 else 0, metric_labels, - timestamp=timestamp) - print( - f"Buffered number of VPC Firewall Rules to metric for projects/{project_id}" - ) diff --git a/blueprints/cloud-operations/network-dashboard/cloud-function/requirements.txt b/blueprints/cloud-operations/network-dashboard/cloud-function/requirements.txt deleted file mode 100644 index d561348229..0000000000 --- a/blueprints/cloud-operations/network-dashboard/cloud-function/requirements.txt +++ /dev/null @@ -1,11 +0,0 @@ -regex==2022.3.2 -google-api-python-client==2.39.0 -google-auth==2.6.0 -google-auth-httplib2==0.1.0 -google-cloud-logging==3.0.0 -google-cloud-monitoring==2.9.1 -oauth2client==4.1.3 -google-api-core==2.7.0 -PyYAML==6.0 -google-cloud-asset==3.8.1 -functions-framework==3.* \ No newline at end of file diff --git a/blueprints/cloud-operations/network-dashboard/dashboards/quotas-utilization.json b/blueprints/cloud-operations/network-dashboard/dashboards/quotas-utilization.json index e26d692645..1c11bdb7af 100644 --- a/blueprints/cloud-operations/network-dashboard/dashboards/quotas-utilization.json +++ b/blueprints/cloud-operations/network-dashboard/dashboards/quotas-utilization.json @@ -7,7 +7,7 @@ { "height": 4, "widget": { - "title": "internal_forwarding_rules_l4_utilization", + "title": "Internal L4 forwarding rules utilization", "xyChart": { "chartOptions": { "mode": "COLOR" @@ -24,7 +24,7 @@ "alignmentPeriod": "3600s", "perSeriesAligner": "ALIGN_NEXT_OLDER" }, - "filter": "metric.type=\"custom.googleapis.com/internal_forwarding_rules_l4_utilization\" resource.type=\"global\"", + "filter": "metric.type=\"custom.googleapis.com/netmon/network/forwarding_rules_l4_used_ratio\" resource.type=\"global\"", "secondaryAggregation": { "alignmentPeriod": "1800s", "perSeriesAligner": "ALIGN_MEAN" @@ -47,7 +47,7 @@ { "height": 4, "widget": { - 
"title": "internal_forwarding_rules_l7_utilization", + "title": "Internal L7 forwarding rules utilization", "xyChart": { "chartOptions": { "mode": "COLOR" @@ -64,7 +64,7 @@ "alignmentPeriod": "3600s", "perSeriesAligner": "ALIGN_NEXT_OLDER" }, - "filter": "metric.type=\"custom.googleapis.com/internal_forwarding_rules_l7_utilization\" resource.type=\"global\"", + "filter": "metric.type=\"custom.googleapis.com/netmon/network/forwarding_rules_l4_used_ratio\" resource.type=\"global\"", "secondaryAggregation": { "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN" @@ -87,7 +87,7 @@ { "height": 4, "widget": { - "title": "number_of_instances_utilization", + "title": "Instance utilization", "xyChart": { "chartOptions": { "mode": "COLOR" @@ -104,7 +104,7 @@ "alignmentPeriod": "3600s", "perSeriesAligner": "ALIGN_NEXT_OLDER" }, - "filter": "metric.type=\"custom.googleapis.com/number_of_instances_utilization\" resource.type=\"global\"", + "filter": "metric.type=\"custom.googleapis.com/netmon/network/instances_used_ratio\" resource.type=\"global\"", "secondaryAggregation": { "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN" @@ -127,7 +127,7 @@ { "height": 4, "widget": { - "title": "number_of_vpc_peerings_utilization", + "title": "Peering utilization", "xyChart": { "chartOptions": { "mode": "COLOR" @@ -144,7 +144,7 @@ "alignmentPeriod": "3600s", "perSeriesAligner": "ALIGN_NEXT_OLDER" }, - "filter": "metric.type=\"custom.googleapis.com/number_of_vpc_peerings_utilization\" resource.type=\"global\"", + "filter": "metric.type=\"custom.googleapis.com/netmon/network/peerings_total_used_ratio\" resource.type=\"global\"", "secondaryAggregation": { "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN" @@ -167,7 +167,7 @@ { "height": 4, "widget": { - "title": "number_of_active_vpc_peerings_utilization", + "title": "Active peering utilization", "xyChart": { "chartOptions": { "mode": "COLOR" @@ -184,7 +184,7 @@ "alignmentPeriod": "3600s", "perSeriesAligner": 
"ALIGN_NEXT_OLDER" }, - "filter": "metric.type=\"custom.googleapis.com/number_of_active_vpc_peerings_utilization\" resource.type=\"global\"", + "filter": "metric.type=\"custom.googleapis.com/netmon/network/peerings_active_used_ratio\" resource.type=\"global\"", "secondaryAggregation": { "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_INTERPOLATE" @@ -207,7 +207,7 @@ { "height": 4, "widget": { - "title": "subnet_IP_ranges_ppg_utilization", + "title": "Peering group internal L4 forwarding rules utilization", "xyChart": { "chartOptions": { "mode": "COLOR" @@ -224,47 +224,7 @@ "alignmentPeriod": "3600s", "perSeriesAligner": "ALIGN_NEXT_OLDER" }, - "filter": "metric.type=\"custom.googleapis.com/number_of_subnet_IP_ranges_ppg_utilization\" resource.type=\"global\"", - "secondaryAggregation": { - "alignmentPeriod": "3600s", - "perSeriesAligner": "ALIGN_MEAN" - } - } - } - } - ], - "timeshiftDuration": "0s", - "yAxis": { - "label": "y1Axis", - "scale": "LINEAR" - } - } - }, - "width": 6, - "xPos": 0, - "yPos": 16 - }, - { - "height": 4, - "widget": { - "title": "internal_forwarding_rules_l4_ppg_utilization", - "xyChart": { - "chartOptions": { - "mode": "COLOR" - }, - "dataSets": [ - { - "minAlignmentPeriod": "3600s", - "plotType": "LINE", - "targetAxis": "Y1", - "timeSeriesQuery": { - "apiSource": "DEFAULT_CLOUD", - "timeSeriesFilter": { - "aggregation": { - "alignmentPeriod": "3600s", - "perSeriesAligner": "ALIGN_NEXT_OLDER" - }, - "filter": "metric.type=\"custom.googleapis.com/internal_forwarding_rules_l4_ppg_utilization\" resource.type=\"global\"", + "filter": "metric.type=\"custom.googleapis.com/netmon/peering_group/forwarding_rules_l4_used_ratio\" resource.type=\"global\"", "secondaryAggregation": { "alignmentPeriod": "3600s", "perSeriesAligner": "ALIGN_MEAN" @@ -287,7 +247,7 @@ { "height": 4, "widget": { - "title": "internal_forwarding_rules_l7_ppg_utilization", + "title": "Peering group internal L7 forwarding rules utilization", "xyChart": { "chartOptions": { 
"mode": "COLOR" @@ -304,7 +264,7 @@ "alignmentPeriod": "3600s", "perSeriesAligner": "ALIGN_NEXT_OLDER" }, - "filter": "metric.type=\"custom.googleapis.com/internal_forwarding_rules_l7_ppg_utilization\" resource.type=\"global\"", + "filter": "metric.type=\"custom.googleapis.com/netmon/peering_group/forwarding_rules_l7_used_ratio\" resource.type=\"global\"", "secondaryAggregation": { "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN" @@ -327,7 +287,7 @@ { "height": 4, "widget": { - "title": "number_of_instances_ppg_utilization", + "title": "Peering group instance utilization", "xyChart": { "chartOptions": { "mode": "COLOR" @@ -344,7 +304,7 @@ "alignmentPeriod": "3600s", "perSeriesAligner": "ALIGN_NEXT_OLDER" }, - "filter": "metric.type=\"custom.googleapis.com/number_of_instances_ppg_utilization\" resource.type=\"global\"" + "filter": "metric.type=\"custom.googleapis.com/netmon/peering_group/instances_used_ratio\" resource.type=\"global\"" } } } @@ -363,7 +323,7 @@ { "height": 4, "widget": { - "title": "dynamic_routes_per_network_utilization", + "title": "Peering group dynamic route utilization", "xyChart": { "chartOptions": { "mode": "COLOR" @@ -380,7 +340,7 @@ "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN" }, - "filter": "metric.type=\"custom.googleapis.com/dynamic_routes_per_network_utilization\" resource.type=\"global\"" + "filter": "metric.type=\"custom.googleapis.com/netmon/peering_group/routes_dynamic_used_ratio\" resource.type=\"global\"" } } } @@ -399,7 +359,7 @@ { "height": 4, "widget": { - "title": "firewalls_per_project_vpc_usage", + "title": "Project firewall rules used ratio", "xyChart": { "chartOptions": { "mode": "COLOR" @@ -420,7 +380,7 @@ ], "perSeriesAligner": "ALIGN_MEAN" }, - "filter": "metric.type=\"custom.googleapis.com/firewalls_per_project_vpc_usage\" resource.type=\"global\"" + "filter": "metric.type=\"custom.googleapis.com/netmon/project/firewall_rules_used_ratio\" resource.type=\"global\"" } } } @@ -439,47 +399,7 @@ { 
"height": 4, "widget": { - "title": "firewalls_per_project_utilization", - "xyChart": { - "chartOptions": { - "mode": "COLOR" - }, - "dataSets": [ - { - "minAlignmentPeriod": "60s", - "plotType": "LINE", - "targetAxis": "Y1", - "timeSeriesQuery": { - "apiSource": "DEFAULT_CLOUD", - "timeSeriesFilter": { - "aggregation": { - "alignmentPeriod": "60s", - "crossSeriesReducer": "REDUCE_MAX", - "groupByFields": [ - "metric.label.\"project\"" - ], - "perSeriesAligner": "ALIGN_MAX" - }, - "filter": "metric.type=\"custom.googleapis.com/firewalls_per_project_utilization\" resource.type=\"global\"" - } - } - } - ], - "timeshiftDuration": "0s", - "yAxis": { - "label": "y1Axis", - "scale": "LINEAR" - } - } - }, - "width": 6, - "xPos": 6, - "yPos": 32 - }, - { - "height": 4, - "widget": { - "title": "tuples_per_firewall_policy_utilization", + "title": "Firewall policy tuples used ratio", "xyChart": { "chartOptions": { "mode": "COLOR" @@ -496,7 +416,7 @@ "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN" }, - "filter": "metric.type=\"custom.googleapis.com/firewall_policy_tuples_per_policy_utilization\" resource.type=\"global\"" + "filter": "metric.type=\"custom.googleapis.com/netmon/firewall_policy/tuples_used_ratio\" resource.type=\"global\"" } } } @@ -515,7 +435,7 @@ { "height": 4, "widget": { - "title": "ip_addresses_per_subnet_utilization", + "title": "IP addressed per subnetwork used ratio", "xyChart": { "chartOptions": { "mode": "COLOR" @@ -532,7 +452,7 @@ "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN" }, - "filter": "metric.type=\"custom.googleapis.com/ip_addresses_per_subnet_utilization\" resource.type=\"global\"" + "filter": "metric.type=\"custom.googleapis.com/netmon/subnetwork/addresses_used_ratio\" resource.type=\"global\"" } } } @@ -551,43 +471,7 @@ { "height": 4, "widget": { - "title": "dynamic_routes_ppg_utilization", - "xyChart": { - "chartOptions": { - "mode": "COLOR" - }, - "dataSets": [ - { - "minAlignmentPeriod": "60s", - "plotType": 
"LINE", - "targetAxis": "Y1", - "timeSeriesQuery": { - "apiSource": "DEFAULT_CLOUD", - "timeSeriesFilter": { - "aggregation": { - "alignmentPeriod": "60s", - "perSeriesAligner": "ALIGN_MEAN" - }, - "filter": "metric.type=\"custom.googleapis.com/dynamic_routes_per_peering_group_utilization\" resource.type=\"global\"" - } - } - } - ], - "timeshiftDuration": "0s", - "yAxis": { - "label": "y1Axis", - "scale": "LINEAR" - } - } - }, - "width": 6, - "xPos": 6, - "yPos": 20 - }, - { - "height": 4, - "widget": { - "title": "static_routes_per_project_vpc_usage", + "title": "Project static routes used", "xyChart": { "chartOptions": { "mode": "COLOR" @@ -608,7 +492,7 @@ ], "perSeriesAligner": "ALIGN_MEAN" }, - "filter": "metric.type=\"custom.googleapis.com/static_routes_per_project_vpc_usage\" resource.type=\"global\"", + "filter": "metric.type=\"custom.googleapis.com/netmon/project/routes_static_used_ratio\" resource.type=\"global\"", "secondaryAggregation": { "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_NONE" @@ -632,7 +516,7 @@ { "height": 4, "widget": { - "title": "static_routes_per_ppg_utilization", + "title": "Peering group static routes used", "xyChart": { "chartOptions": { "mode": "COLOR" @@ -649,7 +533,7 @@ "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN" }, - "filter": "metric.type=\"custom.googleapis.com/static_routes_per_peering_group_utilization\" resource.type=\"global\"" + "filter": "metric.type=\"custom.googleapis.com/netmon/peering_group/routes_static_used_ratio\" resource.type=\"global\"" } } } @@ -665,85 +549,6 @@ "width": 6, "xPos": 0, "yPos": 28 - }, - { - "height": 4, - "widget": { - "title": "static_routes_per_project_utilization", - "xyChart": { - "chartOptions": { - "mode": "COLOR" - }, - "dataSets": [ - { - "minAlignmentPeriod": "60s", - "plotType": "LINE", - "targetAxis": "Y1", - "timeSeriesQuery": { - "apiSource": "DEFAULT_CLOUD", - "timeSeriesFilter": { - "aggregation": { - "alignmentPeriod": "60s", - "perSeriesAligner": 
"ALIGN_MEAN" - }, - "filter": "metric.type=\"custom.googleapis.com/static_routes_per_project_utilization\" resource.type=\"global\"" - } - } - } - ], - "timeshiftDuration": "0s", - "yAxis": { - "label": "y1Axis", - "scale": "LINEAR" - } - } - }, - "width": 6, - "xPos": 6, - "yPos": 24 - }, - { - "height": 4, - "widget": { - "title": "secondary_ip_address_utilization", - "xyChart": { - "chartOptions": { - "mode": "COLOR" - }, - "dataSets": [ - { - "minAlignmentPeriod": "60s", - "plotType": "LINE", - "targetAxis": "Y1", - "timeSeriesQuery": { - "apiSource": "DEFAULT_CLOUD", - "timeSeriesFilter": { - "aggregation": { - "alignmentPeriod": "60s", - "crossSeriesReducer": "REDUCE_NONE", - "perSeriesAligner": "ALIGN_MEAN" - }, - "filter": "metric.type=\"custom.googleapis.com/ip_addresses_per_sr_utilization\" resource.type=\"global\"", - "secondaryAggregation": { - "alignmentPeriod": "60s", - "crossSeriesReducer": "REDUCE_NONE", - "perSeriesAligner": "ALIGN_NONE" - } - } - } - } - ], - "thresholds": [], - "timeshiftDuration": "0s", - "yAxis": { - "label": "y1Axis", - "scale": "LINEAR" - } - } - }, - "width": 6, - "xPos": 0, - "yPos": 36 } ] } diff --git a/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/README.md b/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/README.md new file mode 100644 index 0000000000..15288b557d --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/README.md @@ -0,0 +1,89 @@ +# Network Dashboard Discovery via Cloud Function + +This simple Terraform setup allows deploying the [discovery tool for the Network Dashboard](../src/) to a Cloud Function, triggered by a schedule via PubSub. 
+ +GCP resource diagram + +## Project and function-level configuration + +A single project is used both for deploying the function and to collect generated timeseries: writing timeseries to a separate project is not supported here for brevity, but is very simple to implement (basically change the value for `op_project` in the schedule payload queued in PubSub). The project is configured with the required APIs, and it can also optionally be created via the `project_create_config` variable. + +The function uses a dedicated service account which is created for this purpose. Roles to allow discovery can optionally be set at the top-level discovery scope (organization or folder) via the `grant_discovery_iam_roles` variable, those of course require the right set of permissions on the part of the identity running `terraform apply`. The alternative when IAM bindings cannot be managed on the top-level scope, is to assign `roles/compute.viewer` and `roles/cloudasset.viewer` to the function service account from a separate process, or manually in the console. + +A few configuration values for the function which are relevant to this example can also be configured in the `cloud_function_config` variable, particularly the `debug` attribute which turns on verbose logging to help in troubleshooting. + +## Discovery configuration + +Discovery configuration is done via the `discovery_config` variable, which mimicks the set of options available when running the discovery tool in cli mode. Pay particular care in defining the right top-level scope via the `discovery_root` attribute, as this is the root of the hierarchy used to discover Compute resources and it needs to include the individual folders and projects that needs to be monitored, which are defined via the `monitored_folders` and `monitored_projects` attributes. + +The following schematic diagram of a resource hierarchy illustrates the interplay between root scope and monitored resources. 
The root scope is set to the top-level red folder and completely encloses every resource that needs to be monitored. The blue folder and project are set as monitored, defining the actual perimeter used to discover resources. Note that setting the root scope to the blue folder would have resulted in the rightmost project being excluded. + +GCP resource diagram + +This is an example of a working configuration, where the discovery root is set at the org level, but resources used to compute timeseries need to be part of the hierarchy of two specific folders: + +```tfvars +# cloud_function_config = { +# debug = true +# } +discovery_config = { + discovery_root = "organizations/1234567890" + monitored_folders = ["3456789012", "7890123456"] + monitored_projects = [] + # if you have custom quota not returned by the API, compile a file and set + # its path here; format is described in ../src/custom-quotas.sample + # custom_quota_file = "../src/custom-quotas.yaml" +} +grant_discovery_iam_roles = true +project_create_config = { + billing_account_id = "12345-ABCDEF-12345" + parent_id = "folders/2345678901" +} +project_id = "my-project" +``` + +## Manual triggering for troubleshooting + +If the function crashes or its behaviour is not as expected, you can turn on debugging via the `cloud_function_config.debug` variable attribute, then manually trigger the function from the console by specifying a payload with a single `data` attribute containing the base64-encoded arguments passed to the function by Cloud Scheduler. You can get the pre-computed payload from the `troubleshooting_payload` output: + +```bash +# copy and paste to the function's "Testing" tab in the console +terraform output -raw troubleshooting_payload +``` + +## Monitoring dashboard + +A monitoring dashboard can optionally be deployed in the same project by setting the `dashboard_json_path` variable to the path of a dashboard JSON file.
A sample dashboard is included, and can be deployed with this variable configuration: + +```hcl +dashboard_json_path = "../dashboards/quotas-utilization.json" +``` + + +## Variables + +| name | description | type | required | default | +|---|---|:---:|:---:|:---:| +| [discovery_config](variables.tf#L44) | Discovery configuration. Discovery root is the organization or a folder. If monitored folders and projects are empty, every project under the discovery root node will be monitored. | object({…}) | ✓ | | +| [project_id](variables.tf#L90) | Project id where the Cloud Function will be deployed. | string | ✓ | | +| [bundle_path](variables.tf#L17) | Path used to write the intermediate Cloud Function code bundle. | string | | "./bundle.zip" | +| [cloud_function_config](variables.tf#L23) | Optional Cloud Function configuration. | object({…}) | | {} | +| [dashboard_json_path](variables.tf#L38) | Optional monitoring dashboard to deploy. | string | | null | +| [grant_discovery_iam_roles](variables.tf#L62) | Optionally grant required IAM roles to Cloud Function service account. | bool | | false | +| [labels](variables.tf#L69) | Billing labels used for the Cloud Function, and the project if project_create is true. | map(string) | | {} | +| [name](variables.tf#L75) | Name used to create Cloud Function related resources. | string | | "net-dash" | +| [project_create_config](variables.tf#L81) | Optional configuration if project creation is required. | object({…}) | | null | +| [region](variables.tf#L95) | Compute region where the Cloud Function will be deployed. | string | | "europe-west1" | +| [schedule_config](variables.tf#L101) | Schedule timer configuration in crontab format. | string | | "0/30 * * * *" | + +## Outputs + +| name | description | sensitive | +|---|---|:---:| +| [bucket](outputs.tf#L17) | Cloud Function deployment bucket resource. | | +| [cloud-function](outputs.tf#L22) | Cloud Function resource. | | +| [project_id](outputs.tf#L27) | Project id.
| | +| [service_account](outputs.tf#L32) | Cloud Function service account. | | +| [troubleshooting_payload](outputs.tf#L40) | Cloud Function payload used for manual triggering. | ✓ | + + diff --git a/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/diagram-scope.png b/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/diagram-scope.png new file mode 100644 index 0000000000..6247c1c90d Binary files /dev/null and b/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/diagram-scope.png differ diff --git a/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/diagram.png b/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/diagram.png new file mode 100644 index 0000000000..d715406722 Binary files /dev/null and b/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/diagram.png differ diff --git a/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/main.tf b/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/main.tf new file mode 100644 index 0000000000..abbea80e29 --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/main.tf @@ -0,0 +1,144 @@ +/** + * Copyright 2022 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +locals { + discovery_roles = ["roles/compute.viewer", "roles/cloudasset.viewer"] +} + +resource "random_string" "default" { + count = var.cloud_function_config.bucket_name == null ? 1 : 0 + length = 8 + special = false + upper = false +} + +module "project" { + source = "../../../../modules/project" + name = var.project_id + billing_account = try(var.project_create_config.billing_account_id, null) + labels = var.project_create_config != null ? var.labels : null + parent = try(var.project_create_config.parent_id, null) + project_create = var.project_create_config != null + services = [ + "cloudasset.googleapis.com", + "cloudbuild.googleapis.com", + "cloudfunctions.googleapis.com", + "cloudscheduler.googleapis.com", + "compute.googleapis.com", + "monitoring.googleapis.com" + ] +} + +module "pubsub" { + source = "../../../../modules/pubsub" + project_id = module.project.project_id + name = var.name + regions = [var.region] + subscriptions = { "${var.name}-default" = null } +} + +module "cloud-function" { + source = "../../../../modules/cloud-function" + project_id = module.project.project_id + name = var.name + bucket_name = coalesce( + var.cloud_function_config.bucket_name, + "${var.name}-${random_string.default.0.id}" + ) + bucket_config = { + location = var.region + } + build_worker_pool = var.cloud_function_config.build_worker_pool_id + bundle_config = { + source_dir = var.cloud_function_config.source_dir + output_path = var.cloud_function_config.bundle_path + } + environment_variables = ( + var.cloud_function_config.debug != true ? 
{} : { DEBUG = "1" } + ) + function_config = { + entry_point = "main_cf_pubsub" + memory_mb = var.cloud_function_config.memory_mb + timeout_seconds = var.cloud_function_config.timeout_seconds + } + service_account_create = true + trigger_config = { + v1 = { + event = "google.pubsub.topic.publish" + resource = module.pubsub.topic.id + } + } +} + +resource "google_cloud_scheduler_job" "default" { + project = var.project_id + region = var.region + name = var.name + schedule = var.schedule_config + time_zone = "UTC" + + pubsub_target { + attributes = {} + topic_name = module.pubsub.topic.id + data = base64encode(jsonencode({ + discovery_root = var.discovery_config.discovery_root + folders = var.discovery_config.monitored_folders + projects = var.discovery_config.monitored_projects + monitoring_project = module.project.project_id + custom_quota = ( + var.discovery_config.custom_quota_file == null + ? { networks = {}, projects = {} } + : yamldecode(file(var.discovery_config.custom_quota_file)) + ) + })) + } +} + +resource "google_organization_iam_member" "discovery" { + for_each = toset( + var.grant_discovery_iam_roles && + startswith(var.discovery_config.discovery_root, "organizations/") + ? local.discovery_roles + : [] + ) + org_id = split("/", var.discovery_config.discovery_root)[1] + role = each.key + member = module.cloud-function.service_account_iam_email +} + +resource "google_folder_iam_member" "discovery" { + for_each = toset( + var.grant_discovery_iam_roles && + startswith(var.discovery_config.discovery_root, "folders/") + ? 
local.discovery_roles + : [] + ) + folder = var.discovery_config.discovery_root + role = each.key + member = module.cloud-function.service_account_iam_email +} + +resource "google_project_iam_member" "monitoring" { + project = module.project.project_id + role = "roles/monitoring.metricWriter" + member = module.cloud-function.service_account_iam_email +} + +resource "google_monitoring_dashboard" "dashboard" { + count = var.dashboard_json_path == null ? 0 : 1 + project = var.project_id + dashboard_json = file(var.dashboard_json_path) +} diff --git a/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/outputs.tf b/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/outputs.tf new file mode 100644 index 0000000000..0c2c50abed --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/outputs.tf @@ -0,0 +1,46 @@ +/** + * Copyright 2022 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +output "bucket" { + description = "Cloud Function deployment bucket resource." + value = module.cloud-function.bucket +} + +output "cloud-function" { + description = "Cloud Function resource." + value = module.cloud-function.function +} + +output "project_id" { + description = "Project id." + value = module.project.project_id +} + +output "service_account" { + description = "Cloud Function service account." 
+ value = { + email = module.cloud-function.service_account_email + iam_email = module.cloud-function.service_account_iam_email + } +} + +output "troubleshooting_payload" { + description = "Cloud Function payload used for manual triggering." + sensitive = true + value = jsonencode({ + data = google_cloud_scheduler_job.default.pubsub_target.0.data + }) +} diff --git a/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/variables.tf b/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/variables.tf new file mode 100644 index 0000000000..ab59f91f52 --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/deploy-cloud-function/variables.tf @@ -0,0 +1,105 @@ +/** + * Copyright 2022 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +variable "bundle_path" { + description = "Path used to write the intermediate Cloud Function code bundle." + type = string + default = "./bundle.zip" +} + +variable "cloud_function_config" { + description = "Optional Cloud Function configuration." + type = object({ + bucket_name = optional(string) + build_worker_pool_id = optional(string) + bundle_path = optional(string, "./bundle.zip") + debug = optional(bool, false) + memory_mb = optional(number, 256) + source_dir = optional(string, "../src") + timeout_seconds = optional(number, 540) + }) + default = {} + nullable = false +} + +variable "dashboard_json_path" { + description = "Optional monitoring dashboard to deploy." 
+ type = string + default = null +} + +variable "discovery_config" { + description = "Discovery configuration. Discovery root is the organization or a folder. If monitored folders and projects are empty, every project under the discovery root node will be monitored." + type = object({ + discovery_root = string + monitored_folders = list(string) + monitored_projects = list(string) + custom_quota_file = optional(string) + }) + nullable = false + validation { + condition = ( + var.discovery_config.monitored_folders != null && + var.discovery_config.monitored_projects != null + ) + error_message = "Monitored folders and projects can be empty lists, but they cannot be null." + } +} + +variable "grant_discovery_iam_roles" { + description = "Optionally grant required IAM roles to Cloud Function service account." + type = bool + default = false + nullable = false +} + +variable "labels" { + description = "Billing labels used for the Cloud Function, and the project if project_create is true." + type = map(string) + default = {} +} + +variable "name" { + description = "Name used to create Cloud Function related resources." + type = string + default = "net-dash" +} + +variable "project_create_config" { + description = "Optional configuration if project creation is required." + type = object({ + billing_account_id = string + parent_id = optional(string) + }) + default = null +} + +variable "project_id" { + description = "Project id where the Cloud Function will be deployed." + type = string +} + +variable "region" { + description = "Compute region where the Cloud Function will be deployed." + type = string + default = "europe-west1" +} + +variable "schedule_config" { + description = "Schedule timer configuration in crontab format."
+ type = string + default = "0/30 * * * *" +} diff --git a/blueprints/cloud-operations/network-dashboard/main.tf b/blueprints/cloud-operations/network-dashboard/main.tf deleted file mode 100644 index e74cabd6e2..0000000000 --- a/blueprints/cloud-operations/network-dashboard/main.tf +++ /dev/null @@ -1,191 +0,0 @@ -/** - * Copyright 2022 Google LLC - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -locals { - project_ids = toset(var.monitored_projects_list) - projects = join(",", local.project_ids) - - folder_ids = toset(var.monitored_folders_list) - folders = join(",", local.folder_ids) - monitoring_project = var.monitoring_project_id == "" ? module.project-monitoring[0].project_id : var.monitoring_project_id -} - -################################################ -# Monitoring project creation # -################################################ - -module "project-monitoring" { - count = var.monitoring_project_id == "" ? 
1 : 0 - source = "../../../modules/project" - name = "network-dashboards" - parent = "organizations/${var.organization_id}" - prefix = var.prefix - billing_account = var.billing_account - services = var.project_monitoring_services -} - -################################################ -# Service account creation and IAM permissions # -################################################ - -module "service-account-function" { - source = "../../../modules/iam-service-account" - project_id = local.monitoring_project - name = "sa-dash" - generate_key = false - - # Required IAM permissions for this service account are: - # 1) compute.networkViewer on projects to be monitored (I gave it at organization level for now for simplicity) - # 2) monitoring viewer on the projects to be monitored (I gave it at organization level for now for simplicity) - - iam_organization_roles = { - "${var.organization_id}" = [ - "roles/compute.networkViewer", - "roles/monitoring.viewer", - "roles/cloudasset.viewer" - ] - } - - iam_project_roles = { - "${local.monitoring_project}" = [ - "roles/monitoring.metricWriter", - ] - } -} - -module "service-account-scheduler" { - source = "../../../modules/iam-service-account" - project_id = local.monitoring_project - name = "sa-scheduler" - generate_key = false - - iam_project_roles = { - "${local.monitoring_project}" = [ - "roles/run.invoker", - "roles/cloudfunctions.invoker" - ] - } -} - -################################################ -# Cloud Function configuration (& Scheduler) # -# you can comment out the pub/sub call in case of 2nd generation function -################################################ - -module "pubsub" { - - source = "../../../modules/pubsub" - project_id = local.monitoring_project - name = "network-dashboard-pubsub" - subscriptions = { - "network-dashboard-pubsub-default" = null - } - # the Cloud Scheduler robot service account already has pubsub.topics.publish - # at the project level via roles/cloudscheduler.serviceAgent -} - 
-resource "google_cloud_scheduler_job" "job" { - count = var.cf_version == "V2" ? 0 : 1 - project = local.monitoring_project - region = var.region - name = "network-dashboard-scheduler" - schedule = var.schedule_cron - time_zone = "UTC" - - pubsub_target { - topic_name = module.pubsub.topic.id - data = base64encode("test") - } -} -#http trigger for 2nd generation function - -resource "google_cloud_scheduler_job" "job_httptrigger" { - count = var.cf_version == "V2" ? 1 : 0 - project = local.monitoring_project - region = var.region - name = "network-dashboard-scheduler" - schedule = var.schedule_cron - time_zone = "UTC" - - http_target { - http_method = "POST" - uri = module.cloud-function.uri - - oidc_token { - service_account_email = module.service-account-scheduler.email - } - } -} - -module "cloud-function" { - v2 = var.cf_version == "V2" - source = "../../../modules/cloud-function" - project_id = local.monitoring_project - name = "network-dashboard-cloud-function" - bucket_name = "${local.monitoring_project}-network-dashboard-bucket" - bucket_config = { - location = var.region - } - region = var.region - - bundle_config = { - source_dir = "cloud-function" - output_path = "cloud-function.zip" - } - - function_config = { - timeout = 480 # Timeout in seconds, increase it if your CF timeouts and use v2 if > 9 minutes. - entry_point = "main" - runtime = "python39" - instances = 1 - memory_mb = 256 - - } - - environment_variables = { - MONITORED_PROJECTS_LIST = local.projects - MONITORED_FOLDERS_LIST = local.folders - MONITORING_PROJECT_ID = local.monitoring_project - ORGANIZATION_ID = var.organization_id - CF_VERSION = var.cf_version - } - - service_account = module.service-account-function.email - # Internal only doesn't seem to work with CFv2: - ingress_settings = var.cf_version == "V2" ? "ALLOW_ALL" : "ALLOW_INTERNAL_ONLY" - - trigger_config = var.cf_version == "V2" ? 
{ - v2 = { - event_type = "google.cloud.pubsub.topic.v1.messagePublished" - pubsub_topic = module.pubsub.topic.id - service_account_create = true - } - } : { - v1 = { - event = "google.pubsub.topic.publish" - resource = module.pubsub.topic.id - } - } -} - -################################################ -# Cloud Monitoring Dashboard creation # -################################################ - -resource "google_monitoring_dashboard" "dashboard" { - dashboard_json = file("${path.module}/dashboards/quotas-utilization.json") - project = local.monitoring_project -} diff --git a/blueprints/cloud-operations/network-dashboard/src/README.md b/blueprints/cloud-operations/network-dashboard/src/README.md new file mode 100644 index 0000000000..27dd159c33 --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/README.md @@ -0,0 +1,106 @@ +# Network Dashboard Discovery Tool + +This tool constitutes the discovery and data gathering side of the Network Dashboard, and can be used in combination with the related [Terraform deployment examples](../), or packaged in different ways including standalone manual use. + +- [Quick Usage Example](#quick-usage-example) +- [High Level Architecture and Plugin Design](#high-level-architecture-and-plugin-design) +- [Debugging and Troubleshooting](#debugging-and-troubleshooting) + +## Quick Usage Example + +The tool behaves like a regular CLI app, with several options documented via the usual short help: + +```text +./main.py --help + +Usage: main.py [OPTIONS] + + CLI entry point. + +Options: + -dr, --discovery-root TEXT Root node for asset discovery, + organizations/nnn or folders/nnn. [required] + -op, --monitoring-project TEXT GCP monitoring project where metrics will be + stored. [required] + -p, --project TEXT GCP project id, can be specified multiple + times. + -f, --folder INTEGER GCP folder id, can be specified multiple + times. + --custom-quota-file FILENAME Custom quota file in yaml format. 
+ --dump-file FILENAME Export JSON representation of resources to + file. + --load-file FILENAME Load JSON resources from file, skips init and + discovery. + --debug-plugin TEXT Run only core and specified timeseries plugin. + --help Show this message and exit. +``` + +In normal use three pieces of information need to be passed in: + +- the monitoring project where metric descriptors and timeseries will be stored +- the discovery root scope (organization or top-level folder, [see here for examples](../deploy-cloud-function/README.md#discovery-configuration)) +- the list of folders and/or projects that contain the resources to be monitored (all projects contained in each folder are discovered) + +To account for custom quotas that are not yet exposed via API or that are applied to individual networks, a YAML file with quota overrides can be specified via the `--custom-quota-file` option. Refer to the [included sample](./custom-quotas.sample) for details on its format. + +A typical invocation might look like this: + +```bash +./main.py \ + -dr organizations/1234567890 \ + --monitoring-project my-monitoring-project \ + --folder 1234567890 --folder 987654321 \ + --project my-net-project \ + --custom-quota-file custom-quotas.yaml +``` + +## High Level Architecture and Plugin Design + +The tool is composed of two main processing phases: + +- the discovery of resources within a predefined scope using Cloud Asset Inventory and Compute APIs +- the computation of metric timeseries derived from discovered resources + +Once both phases are complete, the tool sends generated timeseries to Cloud Operations together with any missing metric descriptors. + +Every action during those phases is delegated to a series of plugins, which conform to simple interfaces and exchange predefined basic types with the main module.
Plugins are registered at runtime, and are split into broad categories depending on the stage where they execute: + +- init plugin functions have the task of preparing the required keys in the shared resource data structure. Init functions are usually small, and there's one for each discovery plugin +- discovery plugin functions do the bulk of the work of discovering resources; they return HTTP Requests (e.g. calls to GCP APIs) or Resource objects (extracted from the API responses) to the main module, and receive HTTP Responses +- timeseries plugin functions read from the shared resource data structure, and return computed Metric Descriptors and Timeseries objects + +Plugins are registered via simple functions defined in the [plugin package initialization file](./plugins/__init__.py), and leverage [utility functions](./plugins/utils.py) for batching API requests and parsing results. + +The main module cycles through stages, calling stage plugins in succession and iterating over their results. + +## Debugging and Troubleshooting + +A few convenience options are provided to simplify development, debugging and troubleshooting: + +- the discovery phase results can be dumped to a JSON file, which can then be used to check the actual resource representation, or to skip the discovery phase entirely to speed up development of timeseries-related functions +- a single timeseries plugin can be optionally run alone, to focus debugging and decrease the amount of noise from logs and outputs + +This is an example call that stores discovery results to a file: + +```bash +./main.py \ + -dr organizations/1234567890 \ + --monitoring-project my-monitoring-project \ + --folder 1234567890 --folder 987654321 \ + --project my-net-project \ + --custom-quota-file custom-quotas.yaml \ + --dump-file out.json +``` + +And this is the corresponding call that skips the discovery phase and also runs a single timeseries plugin: + +```bash +./main.py \ + -dr organizations/1234567890 \ + --monitoring-project my-monitoring-project \ + --folder
1234567890 --folder 987654321 \ + --project my-net-project \ + --custom-quota-file custom-quotas.yaml \ + --load-file out.json \ + --debug-plugin plugins.series-firewall-rules.timeseries +``` diff --git a/blueprints/cloud-operations/network-dashboard/src/custom-quotas.sample b/blueprints/cloud-operations/network-dashboard/src/custom-quotas.sample new file mode 100644 index 0000000000..9f090b3c50 --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/custom-quotas.sample @@ -0,0 +1,8 @@ +projects: + tf-playground-svpc-net: + global: + INTERNAL_FORWARDING_RULES_PER_NETWORK: 750 +networks: + # TODO: what are the quotas that can be overridden at the network level? + projects/tf-playground-svpc-net/global/networks/shared-vpc: + PEERINGS_PER_NETWORK: 40 diff --git a/blueprints/cloud-operations/network-dashboard/src/main.py b/blueprints/cloud-operations/network-dashboard/src/main.py new file mode 100755 index 0000000000..6db262a669 --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/main.py @@ -0,0 +1,300 @@ +#!/usr/bin/env python3 +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'Network dashboard: create network-related metric timeseries for GCP resources.' 
+ +import base64 +import binascii +import collections +import json +import logging +import os + +import click +import google.auth +import plugins +import plugins.monitoring +import yaml + +from google.auth.transport.requests import AuthorizedSession + +HTTP = AuthorizedSession(google.auth.default()[0]) +LOGGER = logging.getLogger('net-dash') +MONITORING_ROOT = 'netmon/' + +Result = collections.namedtuple('Result', 'phase resource data') + + +def do_discovery(resources): + '''Calls discovery plugin functions and collect discovered resources. + + The communication with discovery plugins uses double dispatch, where plugins + accept either no args and return 1-n HTTP request instances, or a single HTTP + response and return 1-n resource instances. A queue is set up for each plugin + results since each call can return multiple requests or resources. + + Args: + resources: pre-initialized map where discovered resources will be stored. + ''' + LOGGER.info(f'discovery start') + for plugin in plugins.get_discovery_plugins(): + # set up the queue with the initial list of HTTP requests from this plugin + q = collections.deque(plugin.func(resources)) + while q: + result = q.popleft() + if isinstance(result, plugins.HTTPRequest): + # fetch a single HTTP request + response = fetch(result) + if not response: + continue + if result.json: + try: + # decode the JSON HTTP response and pass it to the plugin + LOGGER.debug(f'passing JSON result to {plugin.name}') + results = plugin.func(resources, response, response.json()) + except json.decoder.JSONDecodeError as e: + LOGGER.critical( + f'error decoding JSON for {result.url}: {e.args[0]}') + continue + else: + # pass the raw HTTP response to the plugin + LOGGER.debug(f'passing raw result to {plugin.name}') + results = plugin.func(resources, response) + q += collections.deque(results) + elif isinstance(result, plugins.Resource): + # store a resource the plugin derived from a previous HTTP response + LOGGER.debug(f'got resource {result} 
from {plugin.name}') + if result.key: + # this specific resource is indexed by an additional key + resources[result.type][result.id][result.key] = result.data + else: + resources[result.type][result.id] = result.data + LOGGER.info('discovery end {}'.format( + {k: len(v) for k, v in resources.items() if not isinstance(v, str)})) + + +def do_init(resources, discovery_root, monitoring_project, folders=None, projects=None, + custom_quota=None): + '''Calls init plugins to configure keys in the shared resource map. + + Args: + discovery_root: root node for discovery from configuration. + monitoring_project: monitoring project id from configuration. + folders: list of folder ids for resource discovery from configuration. + projects: list of project ids for resource discovery from configuration. + ''' + LOGGER.info(f'init start') + folders = [str(f) for f in folders or []] + resources['config:discovery_root'] = discovery_root + resources['config:monitoring_project'] = monitoring_project + resources['config:folders'] = folders + resources['config:projects'] = projects or [] + resources['config:custom_quota'] = custom_quota or {} + resources['config:monitoring_root'] = MONITORING_ROOT + if discovery_root.startswith('organization'): + resources['organization'] = discovery_root.split('/')[-1] + if folders: + resources['folders'] = {f: {} for f in folders} + for plugin in plugins.get_init_plugins(): + plugin.func(resources) + LOGGER.info(f'init completed, resources {resources}') + + +def do_timeseries_calc(resources, descriptors, timeseries, debug_plugin=None): + '''Calls timeseries plugins and collects the resulting descriptors and timeseries. + + Timeseries plugins return a list of MetricDescriptors and Timeseries instances, + one for each metric. + + Args: + resources: shared map of configuration and discovered resources. + descriptors: list where collected descriptors will be stored. + timeseries: list where collected timeseries will be stored.
+ debug_plugin: optional name of a single plugin to call + ''' + LOGGER.info(f'timeseries calc start (debug plugin: {debug_plugin})') + for plugin in plugins.get_timeseries_plugins(): + if debug_plugin and plugin.name != debug_plugin: + LOGGER.info(f'skipping {plugin.name}') + continue + num_desc, num_ts = 0, 0 + for result in plugin.func(resources): + if not result: + continue + # append result to the relevant collection (descriptors or timeseries) + if isinstance(result, plugins.MetricDescriptor): + descriptors.append(result) + num_desc += 1 + elif isinstance(result, plugins.TimeSeries): + timeseries.append(result) + num_ts += 1 + LOGGER.info(f'{plugin.name}: {num_desc} descriptors {num_ts} timeseries') + LOGGER.info('timeseries calc end (descriptors: {} timeseries: {})'.format( + len(descriptors), len(timeseries))) + + +def do_timeseries_descriptors(project_id, existing, computed): + '''Executes API calls for each previously computed metric descriptor. + + Args: + project_id: monitoring project id where to write descriptors. + existing: map of existing descriptor types. + computed: list of plugins.MetricDescriptor instances previously computed. + ''' + LOGGER.info('timeseries descriptors start') + requests = plugins.monitoring.descriptor_requests(project_id, MONITORING_ROOT, + existing, computed) + num = 0 + for request in requests: + fetch(request) + num += 1 + LOGGER.info('timeseries descriptors end (computed: {} created: {})'.format( + len(computed), num)) + + +def do_timeseries(project_id, timeseries, descriptors): + '''Executes API calls for each previously computed timeseries. + + Args: + project_id: monitoring project id where to write timeseries. + timeseries: list of plugins.Timeseries instances. + descriptors: list of plugins.MetricDescriptor instances matching timeseries. 
+ ''' + LOGGER.info('timeseries start') + requests = plugins.monitoring.timeseries_requests(project_id, MONITORING_ROOT, + timeseries, descriptors) + num = 0 + for request in requests: + fetch(request) + num += 1 + LOGGER.info('timeseries end (number: {} requests: {})'.format( + len(timeseries), num)) + + +def fetch(request): + '''Minimal HTTP client interface for API calls. + + Executes the HTTP request passed as argument using the google.auth + authenticated session. + + Args: + request: an instance of plugins.HTTPRequest. + Returns: + JSON-decoded or raw response depending on the 'json' request attribute. + ''' + # try + LOGGER.debug(f'fetch {"POST" if request.data else "GET"} {request.url}') + try: + if not request.data: + response = HTTP.get(request.url, headers=request.headers) + else: + response = HTTP.post(request.url, headers=request.headers, + data=request.data) + except google.auth.exceptions.RefreshError as e: + raise SystemExit(e.args[0]) + if response.status_code != 200: + LOGGER.critical( + f'response code {response.status_code} for URL {request.url}') + LOGGER.critical(response.content) + print(request.data) + raise SystemExit(1) + return response + + +def main_cf_pubsub(event, context): + 'Entry point for Cloud Function triggered by a PubSub message.' 
+ debug = os.environ.get('DEBUG') + logging.basicConfig(level=logging.DEBUG if debug else logging.INFO) + LOGGER.info('processing pubsub payload') + try: + payload = json.loads(base64.b64decode(event['data']).decode('utf-8')) + except (binascii.Error, json.JSONDecodeError) as e: + raise SystemExit(f'Invalid payload: {e.args[0]}.') + discovery_root = payload.get('discovery_root') + monitoring_project = payload.get('monitoring_project') + if not discovery_root: + LOGGER.critical('no discovery root specified') + LOGGER.info(payload) + raise SystemExit('Invalid options') + if not monitoring_project: + LOGGER.critical('no monitoring project specified') + LOGGER.info(payload) + raise SystemExit('Invalid options') + if discovery_root.partition('/')[0] not in ('folders', 'organizations'): + raise SystemExit(f'Invalid discovery root {discovery_root}.') + custom_quota = payload.get('custom_quota', {}) + descriptors = [] + folders = payload.get('folders', []) + projects = payload.get('projects', []) + resources = {} + timeseries = [] + do_init(resources, discovery_root, monitoring_project, folders, projects, + custom_quota) + do_discovery(resources) + do_timeseries_calc(resources, descriptors, timeseries) + do_timeseries_descriptors(monitoring_project, resources['metric-descriptors'], + descriptors) + do_timeseries(monitoring_project, timeseries, descriptors) + + +@click.command() +@click.option( + '--discovery-root', '-dr', required=True, + help='Root node for asset discovery, organizations/nnn or folders/nnn.') +@click.option('--monitoring-project', '-mon', required=True, type=str, + help='GCP monitoring project where metrics will be stored.') +@click.option('--project', '-p', type=str, multiple=True, + help='GCP project id, can be specified multiple times.') +@click.option('--folder', '-f', type=int, multiple=True, + help='GCP folder id, can be specified multiple times.') +@click.option('--custom-quota-file', type=click.File('r'), + help='Custom quota file in yaml 
format.') +@click.option('--dump-file', type=click.File('w'), + help='Export JSON representation of resources to file.') +@click.option('--load-file', type=click.File('r'), + help='Load JSON resources from file, skips init and discovery.') +@click.option('--debug-plugin', + help='Run only core and specified timeseries plugin.') +def main(discovery_root, monitoring_project, project=None, folder=None, + custom_quota_file=None, dump_file=None, load_file=None, + debug_plugin=None): + 'CLI entry point.' + logging.basicConfig(level=logging.INFO) + if discovery_root.partition('/')[0] not in ('folders', 'organizations'): + raise SystemExit('Invalid discovery root.') + descriptors = [] + timeseries = [] + if load_file: + resources = json.load(load_file) + else: + custom_quota = {} + resources = {} + if custom_quota_file: + try: + custom_quota = yaml.load(custom_quota_file, Loader=yaml.Loader) + except yaml.YAMLError as e: + raise SystemExit(f'Error decoding custom quota file: {e.args[0]}') + do_init(resources, discovery_root, monitoring_project, folder, project, + custom_quota) + do_discovery(resources) + if dump_file: + json.dump(resources, dump_file, indent=2) + do_timeseries_calc(resources, descriptors, timeseries, debug_plugin) + do_timeseries_descriptors(monitoring_project, resources['metric-descriptors'], + descriptors) + do_timeseries(monitoring_project, timeseries, descriptors) + + +if __name__ == '__main__': + main(auto_envvar_prefix='NETMON') diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/__init__.py b/blueprints/cloud-operations/network-dashboard/src/plugins/__init__.py new file mode 100644 index 0000000000..1bdc4cb20b --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/__init__.py @@ -0,0 +1,81 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'''Plugin interface objects and registration functions. + +This module export the objects passed to and returned from plugin functions, +and the function used to register plugins for each stage, and get all plugins +for individual stages. +''' + +import collections +import enum +import functools +import importlib +import pathlib +import pkgutil +import types + +__all__ = [ + 'HTTPRequest', 'Level', 'PluginError', 'Resource', 'get_discovery_plugins', + 'get_init_plugins', 'register_discovery', 'register_init' +] + +_PLUGINS_DISCOVERY = [] +_PLUGINS_INIT = [] +_PLUGINS_TIMESERIES = [] + +HTTPRequest = collections.namedtuple('HTTPRequest', 'url headers data json', + defaults=[True]) +Level = enum.IntEnum('Level', 'CORE PRIMARY DERIVED') +MetricDescriptor = collections.namedtuple('MetricDescriptor', + 'type name labels is_ratio', + defaults=[False]) +Plugin = collections.namedtuple('Plugin', 'func name level priority', + defaults=[Level.PRIMARY, 99]) +Resource = collections.namedtuple('Resource', 'type id data key', + defaults=[None]) +TimeSeries = collections.namedtuple('TimeSeries', 'metric value labels') + + +class PluginError(Exception): + pass + + +def _register_plugin(collection, *args): + 'Derive plugin name from function and add to its collection.' 
+ if args and type(args[0]) == types.FunctionType: + collection.append( + Plugin(args[0], f'{args[0].__module__}.{args[0].__name__}')) + return + + def outer(func): + collection.append(Plugin(func, f'{func.__module__}.{func.__name__}', *args)) + return func + + return outer + + +get_discovery_plugins = functools.partial(iter, _PLUGINS_DISCOVERY) +get_init_plugins = functools.partial(iter, _PLUGINS_INIT) +get_timeseries_plugins = functools.partial(iter, _PLUGINS_TIMESERIES) +register_discovery = functools.partial(_register_plugin, _PLUGINS_DISCOVERY) +register_init = functools.partial(_register_plugin, _PLUGINS_INIT) +register_timeseries = functools.partial(_register_plugin, _PLUGINS_TIMESERIES) + +_plugins_path = str(pathlib.Path(__file__).parent) + +for mod_info in pkgutil.iter_modules([_plugins_path], 'plugins.'): + importlib.import_module(mod_info.name) + +_PLUGINS_DISCOVERY.sort(key=lambda i: i.level) diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/core-discover-cai-nodes.py b/blueprints/cloud-operations/network-dashboard/src/plugins/core-discover-cai-nodes.py new file mode 100644 index 0000000000..dc5c53247e --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/core-discover-cai-nodes.py @@ -0,0 +1,80 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'''Project and folder discovery from configuration options. 
+ +This plugin needs to run first, as it's responsible for discovering nodes that +contain resources: folders and projects contained in the hierarchy passed in +via configuration options. Node resources are fetched from Cloud Asset +Inventory based on explicit id or being part of a folder hierarchy. +''' + +import logging + +from . import HTTPRequest, Level, Resource, register_init, register_discovery +from .utils import parse_page_token, parse_cai_results + +LOGGER = logging.getLogger('net-dash.discovery.cai-nodes') + +CAI_URL = ('https://content-cloudasset.googleapis.com/v1p1beta1' + '/{}/resources:searchAll' + '?assetTypes=cloudresourcemanager.googleapis.com/Folder' + '&assetTypes=cloudresourcemanager.googleapis.com/Project' + '&pageSize=500') + + +def _handle_discovery(resources, response, data): + 'Processes asset response and returns project resources or next URLs.' + LOGGER.info('discovery handle request') + for result in parse_cai_results(data, 'nodes'): + asset_type = result['assetType'].split('/')[-1] + name = result['name'].split('/')[-1] + if asset_type == 'Folder': + yield Resource('folders', name, {'name': result['displayName']}) + elif asset_type == 'Project': + number = result['project'].split('/')[1] + # use a separate name so the API response in 'data' is not overwritten + project_data = {'number': number, 'project_id': name} + yield Resource('projects', name, project_data) + yield Resource('projects:number', number, project_data) + else: + LOGGER.info(f'unknown resource {name}') + next_url = parse_page_token(data, response.request.url) + if next_url: + LOGGER.info('discovery next url') + yield HTTPRequest(next_url, {}, None) + + +@register_init +def init(resources): + 'Prepares project datastructures in the shared resource map.' 
+ LOGGER.info('init') + resources.setdefault('folders', {}) + resources.setdefault('projects', {}) + resources.setdefault('projects:number', {}) + + +@register_discovery(Level.CORE, 0) +def start_discovery(resources, response=None, data=None): + 'Plugin entry point, triggers discovery and handles requests and responses.' + LOGGER.info(f'discovery (has response: {response is not None})') + if response is None: + # return initial discovery URLs + for v in resources['config:folders']: + yield HTTPRequest(CAI_URL.format(f'folders/{v}'), {}, None) + for v in resources['config:projects']: + if v not in resources['projects']: + yield HTTPRequest(CAI_URL.format(f'projects/{v}'), {}, None) + else: + # pass the API response to the plugin data handler and return results + for result in _handle_discovery(resources, response, data): + yield result diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/discover-cai-compute.py b/blueprints/cloud-operations/network-dashboard/src/plugins/discover-cai-compute.py new file mode 100644 index 0000000000..86379fb1c7 --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/discover-cai-compute.py @@ -0,0 +1,231 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'''Compute resources discovery from Cloud Asset Inventory. + +This plugin handles discovery for Compute resources via a broad org-level +scoped CAI search. 
Common resource attributes are parsed by a generic handler +function, which then delegates parsing of resource-level attributes to smaller +specialized functions, one per resource type. +''' + +import logging + +from . import HTTPRequest, Level, Resource, register_init, register_discovery +from .utils import parse_cai_results + + +CAI_URL = ('https://content-cloudasset.googleapis.com/v1' + '/{root}/assets' + '?contentType=RESOURCE&{asset_types}&pageSize=500') +LOGGER = logging.getLogger('net-dash.discovery.cai-compute') +TYPES = { + 'addresses': 'Address', + 'firewall_policies': 'FirewallPolicy', + 'firewall_rules': 'Firewall', + 'forwarding_rules': 'ForwardingRule', + 'instances': 'Instance', + 'networks': 'Network', + 'subnetworks': 'Subnetwork', + 'routers': 'Router', + 'routes': 'Route', +} +NAMES = {v: k for k, v in TYPES.items()} + + +def _get_parent(parent, resources): + 'Extracts and returns resource parent and type.' + parent_type, parent_id = parent.split('/')[-2:] + if parent_type == 'projects': + project = resources['projects:number'].get(parent_id) + if project: + return {'project_id': project['project_id'], 'project_number': parent_id} + if parent_type == 'folders': + if parent_id in resources['folders']: + return {'parent': f'{parent_type}/{parent_id}'} + if resources.get('organization') == parent_id: + return {'parent': f'{parent_type}/{parent_id}'} + + +def _handle_discovery(resources, response, data): + 'Processes the asset API response and returns parsed resources or next URL.' 
+ LOGGER.info('discovery handle request') + for result in parse_cai_results(data, 'cai-compute', method='list'): + resource = _handle_resource(resources, result['resource']) + if not resource: + continue + yield resource + page_token = data.get('nextPageToken') + if page_token: + LOGGER.info('requesting next page') + url = _url(resources) + yield HTTPRequest(f'{url}&pageToken={page_token}', {}, None) + + +def _handle_resource(resources, data): + 'Parses and returns a single resource. Calls resource-level handler.' + attrs = data['data'] + # general attributes shared by all resource types + resource_name = NAMES[data['discoveryName']] + resource = { + 'id': attrs['id'], + 'name': attrs['name'], + 'self_link': _self_link(attrs['selfLink']) + } + # derive parent type and id and skip if parent is not within scope + parent_data = _get_parent(data['parent'], resources) + if not parent_data: + LOGGER.info(f'{resource["self_link"]} outside perimeter') + # 'organization' is only set when the discovery root is an organization + LOGGER.debug([ + resources.get('organization'), resources['folders'], + resources['projects:number'] + ]) + return + resource.update(parent_data) + # gets and calls the resource-level handler for type specific attributes + func = globals().get(f'_handle_{resource_name}') + if not callable(func): + raise SystemExit(f'specialized function missing for {resource_name}') + extra_attrs = func(resource, attrs) + if not extra_attrs: + return + resource.update(extra_attrs) + return Resource(resource_name, resource['self_link'], resource) + + +def _handle_addresses(resource, data): + 'Handles address type resource data.' 
+ network = data.get('network') + subnet = data.get('subnetwork') + return { + 'address': data['address'], + 'internal': data.get('addressType') == 'INTERNAL', + 'purpose': data.get('purpose', ''), + 'status': data.get('status', ''), + 'network': None if not network else _self_link(network), + 'subnetwork': None if not subnet else _self_link(subnet) + } + + +def _handle_firewall_policies(resource, data): + 'Handles firewall policy type resource data.' + return { + 'num_rules': len(data.get('rules', [])), + 'num_tuples': data.get('ruleTupleCount', 0) + } + + +def _handle_firewall_rules(resource, data): + 'Handles firewall type resource data.' + return {'network': _self_link(data['network'])} + + +def _handle_forwarding_rules(resource, data): + 'Handles forwarding_rules type resource data.' + network = data.get('network') + region = data.get('region') + subnet = data.get('subnetwork') + return { + 'address': data.get('IPAddress'), + 'load_balancing_scheme': data.get('loadBalancingScheme', ''), + 'network': None if not network else _self_link(network), + 'psc_accepted': data.get('pscConnectionStatus') == 'ACCEPTED', + 'region': None if not region else region.split('/')[-1], + 'subnetwork': None if not subnet else _self_link(subnet) + } + + +def _handle_instances(resource, data): + 'Handles instance type resource data.' + if data['status'] != 'RUNNING': + return + networks = [{ + 'network': _self_link(i['network']), + 'subnetwork': _self_link(i['subnetwork']) + } for i in data.get('networkInterfaces', [])] + return {'zone': data['zone'], 'networks': networks} + + +def _handle_networks(resource, data): + 'Handles network type resource data.' 
+ peerings = [{ + 'active': p['state'] == 'ACTIVE', + 'name': p['name'], + 'network': _self_link(p['network']), + 'project_id': _self_link(p['network']).split('/')[1] + } for p in data.get('peerings', [])] + subnets = [_self_link(s) for s in data.get('subnetworks', [])] + return {'peerings': peerings, 'subnetworks': subnets} + + +def _handle_routers(resource, data): + 'Handles router type resource data.' + return { + 'network': _self_link(data['network']), + 'region': data['region'].split('/')[-1] + } + + +def _handle_routes(resource, data): + 'Handles route type resource data.' + hop = [ + a.removeprefix('nextHop').lower() for a in data if a.startswith('nextHop') + ] + return {'next_hop_type': hop[0], 'network': _self_link(data['network'])} + + +def _handle_subnetworks(resource, data): + 'Handles subnetwork type resource data.' + secondary_ranges = [{ + 'name': s['rangeName'], + 'cidr_range': s['ipCidrRange'] + } for s in data.get('secondaryIpRanges', [])] + return { + 'cidr_range': data['ipCidrRange'], + 'network': _self_link(data['network']), + 'purpose': data.get('purpose'), + 'region': data['region'].split('/')[-1], + 'secondary_ranges': secondary_ranges + } + + +def _self_link(s): + 'Removes initial part from self links.' + return s.removeprefix('https://www.googleapis.com/compute/v1/') + + +def _url(resources): + 'Returns discovery URL' + discovery_root = resources['config:discovery_root'] + asset_types = '&'.join( + f'assetTypes=compute.googleapis.com/{t}' for t in TYPES.values()) + return CAI_URL.format(root=discovery_root, asset_types=asset_types) + + +@register_init +def init(resources): + 'Prepares the datastructures for types managed here in the resource map.' + LOGGER.info('init') + for name in TYPES: + resources.setdefault(name, {}) + + +@register_discovery(Level.PRIMARY, 10) +def start_discovery(resources, response=None, data=None): + 'Plugin entry point, triggers discovery and handles requests and responses.' 
+ LOGGER.info(f'discovery (has response: {response is not None})') + if response is None: + yield HTTPRequest(_url(resources), {}, None) + else: + for result in _handle_discovery(resources, response, data): + yield result diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/discover-compute-quota.py b/blueprints/cloud-operations/network-dashboard/src/plugins/discover-compute-quota.py new file mode 100644 index 0000000000..9c9e8f9486 --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/discover-compute-quota.py @@ -0,0 +1,88 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'''Discovers project quota via Compute API and overlay user overrides. + +This plugin discovers project quota via batch Compute API requests. Project and +network quotas are then optionally overlaid with custom quota modifiers passed +in as options. Region quota discovery is partially implemented but not active. +''' + +import logging + +from . import Level, Resource, register_init, register_discovery +from .utils import batched, poor_man_mp_request, poor_man_mp_response + +LOGGER = logging.getLogger('net-dash.discovery.compute-quota') +NAME = 'quota' + +API_GLOBAL_URL = '/compute/v1/projects/{}' +API_REGION_URL = '/compute/v1/projects/{}/regions/{}' + + +def _handle_discovery(resources, response): + 'Processes asset batch response and overlays custom quota.' 
+ LOGGER.info('discovery handle request') + content_type = response.headers['content-type'] + per_project_quota = resources['config:custom_quota'].get('projects', {}) + # process batch response + for part in poor_man_mp_response(content_type, response.content): + kind = part.get('kind') + quota = { + q['metric']: int(q['limit']) + for q in sorted(part.get('quotas', []), key=lambda v: v['metric']) + } + self_link = part.get('selfLink') + if not self_link: + # skip parts without a self link, there is nothing to key quota on + LOGGER.warning('invalid quota response') + continue + self_link = self_link.split('/') + if kind == 'compute#project': + project_id = self_link[-1] + region = 'global' + elif kind == 'compute#region': + project_id = self_link[-3] + region = self_link[-1] + else: + # skip unexpected kinds so project_id and region are never unbound + LOGGER.warning(f'unknown quota response kind {kind}') + continue + # custom quota overrides + for k, v in per_project_quota.get(project_id, {}).get(region, {}).items(): + quota[k] = int(v) + if project_id not in resources[NAME]: + resources[NAME][project_id] = {} + yield Resource(NAME, project_id, quota, region) + + +@register_init +def init(resources): + 'Prepares quota datastructures in the shared resource map.' + LOGGER.info('init') + resources.setdefault(NAME, {}) + + +@register_discovery(Level.DERIVED, 0) +def start_discovery(resources, response=None): + 'Plugin entry point, triggers discovery and handles requests and responses.' 
+ LOGGER.info(f'discovery (has response: {response is not None})') + if response is None: + # TODO: regions + urls = [API_GLOBAL_URL.format(p) for p in resources['projects']] + if not urls: + return + for batch in batched(urls, 10): + yield poor_man_mp_request(batch) + else: + for result in _handle_discovery(resources, response): + yield result + # store custom network-level quota + per_network_quota = resources['config:custom_quota'].get('networks', {}) + for network_id, overrides in per_network_quota.items(): + quota = {k: int(v) for k, v in overrides.items()} + yield Resource(NAME, network_id, quota) diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/discover-compute-routerstatus.py b/blueprints/cloud-operations/network-dashboard/src/plugins/discover-compute-routerstatus.py new file mode 100644 index 0000000000..cd2840b771 --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/discover-compute-routerstatus.py @@ -0,0 +1,89 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'''Discovers dynamic route counts via router status. + +This plugin depends on the CAI Compute one as it discovers dynamic route +data by parsing router status, and it needs routers to have already been +discovered. It uses batch Compute API requests via the utils functions. +''' + +import logging + +from . 
import Level, Resource, register_init, register_discovery +from .utils import batched, poor_man_mp_request, poor_man_mp_response + +LOGGER = logging.getLogger('net-dash.discovery.compute-routes-dynamic') +NAME = 'routes_dynamic' + +API_URL = '/compute/v1/projects/{}/regions/{}/routers/{}/getRouterStatus' + + +def _handle_discovery(resources, response): + 'Processes asset batch response and parses router status data.' + LOGGER.info('discovery handle request') + content_type = response.headers['content-type'] + routers = [r for r in resources['routers'].values()] + # process batch response + for i, part in enumerate(poor_man_mp_response(content_type, + response.content)): + router = routers[i] + result = part.get('result') + if not result: + LOGGER.info(f'skipping router {router["self_link"]}, no result') + continue + bgp_peer_status = result.get('bgpPeerStatus') + if not bgp_peer_status: + LOGGER.info(f'skipping router {router["self_link"]}, no bgp peer status') + continue + network = result.get('network') + if not network: + LOGGER.info(f'skipping router {router["self_link"]}, no network') + continue + if not network.endswith(router['network']): + LOGGER.warning( + f'router network mismatch: got {network} expected {router["network"]}' + ) + continue + num_learned_routes = sum( + int(p.get('numLearnedRoutes', 0)) for p in bgp_peer_status) + if router['network'] not in resources[NAME]: + resources[NAME][router['network']] = {} + yield Resource(NAME, router['network'], num_learned_routes, + router['self_link']) + yield + + +@register_init +def init(resources): + 'Prepares dynamic routes datastructure in the shared resource map.' + LOGGER.info('init') + resources.setdefault(NAME, {}) + + +@register_discovery(Level.DERIVED) +def start_discovery(resources, response=None): + 'Plugin entry point, triggers discovery and handles requests and responses.' 
+ LOGGER.info(f'discovery (has response: {response is not None})') + if not response: + urls = [ + API_URL.format(r['project_id'], r['region'], r['name']) + for r in resources['routers'].values() + ] + if not urls: + return + for batch in batched(urls, 10): + yield poor_man_mp_request(batch) + else: + for result in _handle_discovery(resources, response): + yield result diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/discover-group-networks.py b/blueprints/cloud-operations/network-dashboard/src/plugins/discover-group-networks.py new file mode 100644 index 0000000000..350c288b4c --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/discover-group-networks.py @@ -0,0 +1,39 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'Group discovered networks by project.' + +import itertools +import logging + +from . import Level, Resource, register_init, register_discovery + +LOGGER = logging.getLogger('net-dash.discovery.group-networks') +NAME = 'networks:project' + + +@register_init +def init(resources): + 'Prepares datastructure in the shared resource map.' + LOGGER.info('init') + resources.setdefault(NAME, {}) + + +@register_discovery(Level.DERIVED) +def start_discovery(resources, response=None): + 'Plugin entry point, group and return discovered networks.' 
+ LOGGER.info(f'discovery (has response: {response is not None})') + # groupby only merges consecutive items, so sort by project id first + networks = sorted(resources['networks'].values(), + key=lambda v: v['project_id']) + grouped = itertools.groupby(networks, lambda v: v['project_id']) + for project_id, vpcs in grouped: + yield Resource(NAME, project_id, [v['self_link'] for v in vpcs]) diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/discover-metric-descriptors.py b/blueprints/cloud-operations/network-dashboard/src/plugins/discover-metric-descriptors.py new file mode 100644 index 0000000000..a9e4090def --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/discover-metric-descriptors.py @@ -0,0 +1,69 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'''Discover existing network dashboard metric descriptors. + +Populating this data allows the tool to later compute which metric descriptors +need to be created. +''' + +import logging +import urllib.parse + +from . import HTTPRequest, Level, Resource, register_init, register_discovery +from .utils import parse_page_token + +LOGGER = logging.getLogger('net-dash.discovery.metrics') +NAME = 'metric-descriptors' + +URL = ('https://content-monitoring.googleapis.com/v3/projects' + '/{}/metricDescriptors' + '?filter=metric.type%3Dstarts_with(%22custom.googleapis.com%2F{}%22)' + '&pageSize=500') + + +def _handle_discovery(resources, response, data): + 'Processes monitoring API response and parses descriptor data.' 
+ LOGGER.info('discovery handle request') + descriptors = data.get('metricDescriptors') + if not descriptors: + LOGGER.info('no descriptors found') + return + for d in descriptors: + yield Resource(NAME, d['type'], {}) + next_url = parse_page_token(data, response.request.url) + if next_url: + LOGGER.info('discovery next url') + yield HTTPRequest(next_url, {}, None) + + +@register_init +def init(resources): + 'Prepares datastructure in the shared resource map.' + LOGGER.info('init') + resources.setdefault(NAME, {}) + + +@register_discovery(Level.CORE, 99) +def start_discovery(resources, response=None, data=None): + 'Plugin entry point, triggers discovery and handles requests and responses.' + LOGGER.info(f'discovery (has response: {response is not None})') + project_id = resources['config:monitoring_project'] + type_root = resources['config:monitoring_root'] + url = URL.format(urllib.parse.quote_plus(project_id), + urllib.parse.quote_plus(type_root)) + if response is None: + yield HTTPRequest(url, {}, None) + else: + for result in _handle_discovery(resources, response, data): + yield result diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/monitoring.py b/blueprints/cloud-operations/network-dashboard/src/plugins/monitoring.py new file mode 100644 index 0000000000..de4eae8971 --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/monitoring.py @@ -0,0 +1,106 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +'Utility functions to create monitoring API requests.' + +import collections +import datetime +import json +import logging + +from . import HTTPRequest +from .utils import batched + +DESCRIPTOR_TYPE_BASE = 'custom.googleapis.com/{}' +DESCRIPTOR_URL = ('https://content-monitoring.googleapis.com/v3' + '/projects/{}/metricDescriptors?alt=json') +HEADERS = {'content-type': 'application/json'} +LOGGER = logging.getLogger('net-dash.plugins.monitoring') +TIMESERIES_URL = ('https://content-monitoring.googleapis.com/v3' + '/projects/{}/timeSeries?alt=json') + + +def descriptor_requests(project_id, root, existing, computed): + 'Returns create requests for missing descriptors.' + type_base = DESCRIPTOR_TYPE_BASE.format(root) + url = DESCRIPTOR_URL.format(project_id) + for descriptor in computed: + d_type = f'{type_base}{descriptor.type}' + if d_type in existing: + continue + LOGGER.info(f'creating descriptor {d_type}') + if descriptor.is_ratio: + unit = '10^2.%' + value_type = 'DOUBLE' + else: + unit = '1' + value_type = 'INT64' + data = json.dumps({ + 'type': d_type, + 'displayName': descriptor.name, + 'metricKind': 'GAUGE', + 'valueType': value_type, + 'unit': unit, + 'monitoredResourceTypes': ['global'], + 'labels': [{ + 'key': l, + 'valueType': 'STRING' + } for l in descriptor.labels] + }) + yield HTTPRequest(url, HEADERS, data) + + +def timeseries_requests(project_id, root, timeseries, descriptors): + 'Returns create requests for timeseries.' 
+ descriptor_valuetypes = {d.type: d.is_ratio for d in descriptors} + end_time = ''.join((datetime.datetime.utcnow().isoformat('T'), 'Z')) + type_base = DESCRIPTOR_TYPE_BASE.format(root) + url = TIMESERIES_URL.format(project_id) + # group timeseries in buckets by their type so that multiple timeseries + # can be grouped in a single API request without duplicating types + ts_buckets = {} + for ts in timeseries: + bucket = ts_buckets.setdefault(ts.metric, collections.deque()) + bucket.append(ts) + LOGGER.info(f'metric types {list(ts_buckets.keys())}') + ts_buckets = list(ts_buckets.values()) + while ts_buckets: + data = {'timeSeries': []} + for bucket in ts_buckets: + ts = bucket.popleft() + if descriptor_valuetypes[ts.metric]: + pv = 'doubleValue' + else: + pv = 'int64Value' + data['timeSeries'].append({ + 'metric': { + 'type': f'{type_base}{ts.metric}', + 'labels': ts.labels + }, + 'resource': { + 'type': 'global' + }, + 'points': [{ + 'interval': { + 'endTime': end_time + }, + 'value': { + pv: ts.value + } + }] + }) + req_num = len(data['timeSeries']) + tot_num = sum(len(b) for b in ts_buckets) + LOGGER.info(f'sending {req_num} remaining: {tot_num}') + yield HTTPRequest(url, HEADERS, json.dumps(data)) + ts_buckets = [b for b in ts_buckets if b] diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/series-firewall-policies.py b/blueprints/cloud-operations/network-dashboard/src/plugins/series-firewall-policies.py new file mode 100644 index 0000000000..defd697532 --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/series-firewall-policies.py @@ -0,0 +1,43 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'Prepares descriptors and timeseries for firewall policy resources.' + +import logging + +from . import MetricDescriptor, TimeSeries, register_timeseries + +DESCRIPTOR_ATTRS = { + 'tuples_used': 'Firewall tuples used per policy', + 'tuples_available': 'Firewall tuples limit per policy', + 'tuples_used_ratio': 'Firewall tuples used ratio per policy' +} +DESCRIPTOR_LABELS = ('parent', 'name') +LOGGER = logging.getLogger('net-dash.timeseries.firewall-policies') +TUPLE_LIMIT = 2000 + + +@register_timeseries +def timeseries(resources): + 'Returns used/available/ratio firewall tuples timeseries by policy.' 
+ LOGGER.info('timeseries') + for dtype, name in DESCRIPTOR_ATTRS.items(): + yield MetricDescriptor(f'firewall_policy/{dtype}', name, DESCRIPTOR_LABELS, + dtype.endswith('ratio')) + for v in resources['firewall_policies'].values(): + tuples = int(v['num_tuples']) + labels = {'parent': v['parent'], 'name': v['name']} + yield TimeSeries('firewall_policy/tuples_used', tuples, labels) + yield TimeSeries('firewall_policy/tuples_available', TUPLE_LIMIT, labels) + yield TimeSeries('firewall_policy/tuples_used_ratio', tuples / TUPLE_LIMIT, + labels) diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/series-firewall-rules.py b/blueprints/cloud-operations/network-dashboard/src/plugins/series-firewall-rules.py new file mode 100644 index 0000000000..5490e6d3bd --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/series-firewall-rules.py @@ -0,0 +1,59 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'Prepares descriptors and timeseries for firewall rules by project and network.' + +import itertools +import logging + +from . 
import MetricDescriptor, TimeSeries, register_timeseries + +DESCRIPTOR_ATTRS = { + 'firewall_rules_used': 'Firewall rules used per project', + 'firewall_rules_available': 'Firewall rules limit per project', + 'firewall_rules_used_ratio': 'Firewall rules used ratio per project', +} +LOGGER = logging.getLogger('net-dash.timeseries.firewall-rules') + + +@register_timeseries +def timeseries(resources): + 'Returns used/available/ratio firewall timeseries by project and network.' + LOGGER.info('timeseries') + # return a single descriptor for network as we don't have limits + yield MetricDescriptor('network/firewall_rules_used', + 'Firewall rules used per network', ('project', 'name')) + # return used/available/ratio descriptors for project + for dtype, name in DESCRIPTOR_ATTRS.items(): + yield MetricDescriptor(f'project/{dtype}', name, ('project',), + dtype.endswith('ratio')) + # group firewall rules by network then prepare and return timeseries + grouped = itertools.groupby(resources['firewall_rules'].values(), + lambda v: v['network']) + for network_id, rules in grouped: + count = len(list(rules)) + labels = { + 'name': resources['networks'][network_id]['name'], + 'project': resources['networks'][network_id]['project_id'] + } + yield TimeSeries('network/firewall_rules_used', count, labels) + # group firewall rules by project then prepare and return timeseries + grouped = itertools.groupby(resources['firewall_rules'].values(), + lambda v: v['project_id']) + for project_id, rules in grouped: + count = len(list(rules)) + limit = int(resources['quota'][project_id]['global']['FIREWALLS']) + labels = {'project': project_id} + yield TimeSeries('project/firewall_rules_used', count, labels) + yield TimeSeries('project/firewall_rules_available', limit, labels) + yield TimeSeries('project/firewall_rules_used_ratio', count / limit, labels) diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/series-networks.py
b/blueprints/cloud-operations/network-dashboard/src/plugins/series-networks.py new file mode 100644 index 0000000000..0ce7a4b304 --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/series-networks.py @@ -0,0 +1,142 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'''Prepares descriptors and timeseries for network-level metrics. + +This plugin computes metrics for a variety of network resource types like +subnets, instances, peerings, etc. It mostly does so by first grouping +resources for a type, and then using a generalized function to derive counts +and ratios and compute the actual timeseries. +''' + +import functools +import itertools +import logging + +from . 
import MetricDescriptor, TimeSeries, register_timeseries + +DESCRIPTOR_ATTRS = { + 'forwarding_rules_l4_available': 'L4 fwd rules limit per network', + 'forwarding_rules_l4_used': 'L4 fwd rules used per network', + 'forwarding_rules_l4_used_ratio': 'L4 fwd rules used ratio per network', + 'forwarding_rules_l7_available': 'L7 fwd rules limit per network', + 'forwarding_rules_l7_used': 'L7 fwd rules used per network', + 'forwarding_rules_l7_used_ratio': 'L7 fwd rules used ratio per network', + 'instances_available': 'Instance limit per network', + 'instances_used': 'Instance used per network', + 'instances_used_ratio': 'Instance used ratio per network', + 'peerings_active_available': 'Active peering limit per network', + 'peerings_active_used': 'Active peering used per network', + 'peerings_active_used_ratio': 'Active peering used ratio per network', + 'peerings_total_available': 'Total peering limit per network', + 'peerings_total_used': 'Total peering used per network', + 'peerings_total_used_ratio': 'Total peering used ratio per network', + 'subnets_available': 'Subnet limit per network', + 'subnets_used': 'Subnet used per network', + 'subnets_used_ratio': 'Subnet used ratio per network' +} +LIMITS = { + 'INSTANCES_PER_NETWORK_GLOBAL': 15000, + 'INTERNAL_FORWARDING_RULES_PER_NETWORK': 500, + 'INTERNAL_MANAGED_FORWARDING_RULES_PER_NETWORK': 75, + 'ROUTES': 250, + 'SUBNET_RANGES_PER_NETWORK': 300 +} +LOGGER = logging.getLogger('net-dash.timeseries.networks') + + +def _group_timeseries(name, resources, grouped, limit_name): + 'Generalized function that returns timeseries from data grouped by network.' 
+ for network_id, elements in grouped: + network = resources['networks'].get(network_id) + if not network: + LOGGER.info(f'out of scope {name} network {network_id}') + continue + count = len(list(elements)) + labels = {'project': network['project_id'], 'network': network['name']} + quota = resources['quota'][network['project_id']]['global'] + limit = quota.get(limit_name, LIMITS[limit_name]) + yield TimeSeries(f'network/{name}_used', count, labels) + yield TimeSeries(f'network/{name}_available', limit, labels) + yield TimeSeries(f'network/{name}_used_ratio', count / limit, labels) + + +def _forwarding_rules(resources): + 'Groups forwarding rules by network/type and returns relevant timeseries.' + # create two separate iterators filtered by L4 and L7 balancing schemes + filter = lambda n, v: v['load_balancing_scheme'] != n + forwarding_rules = resources['forwarding_rules'].values() + forwarding_rules_l4 = itertools.filterfalse( + functools.partial(filter, 'INTERNAL'), forwarding_rules) + forwarding_rules_l7 = itertools.filterfalse( + functools.partial(filter, 'INTERNAL_MANAGED'), forwarding_rules) + # group each iterator by network and return timeseries + grouped_l4 = itertools.groupby(forwarding_rules_l4, lambda i: i['network']) + grouped_l7 = itertools.groupby(forwarding_rules_l7, lambda i: i['network']) + return itertools.chain( + _group_timeseries('forwarding_rules_l4', resources, grouped_l4, + 'INTERNAL_FORWARDING_RULES_PER_NETWORK'), + _group_timeseries('forwarding_rules_l7', resources, grouped_l7, + 'INTERNAL_MANAGED_FORWARDING_RULES_PER_NETWORK'), + ) + + +def _instances(resources): + 'Groups instances by network and returns relevant timeseries.' 
+ instance_networks = itertools.chain.from_iterable( + i['networks'] for i in resources['instances'].values()) + grouped = itertools.groupby(instance_networks, lambda i: i['network']) + return _group_timeseries('instances', resources, grouped, + 'INSTANCES_PER_NETWORK_GLOBAL') + + +def _peerings(resources): + 'Counts peerings by network and returns relevant timeseries.' + quota = resources['quota'] + for network_id, network in resources['networks'].items(): + labels = {'project': network['project_id'], 'network': network['name']} + limit = quota.get(network_id, {}).get('PEERINGS_PER_NETWORK', 250) + p_active = len([p for p in network['peerings'] if p['active']]) + p_total = len(network['peerings']) + yield TimeSeries('network/peerings_active_used', p_active, labels) + yield TimeSeries('network/peerings_active_available', limit, labels) + yield TimeSeries('network/peerings_active_used_ratio', p_active / limit, + labels) + yield TimeSeries('network/peerings_total_used', p_total, labels) + yield TimeSeries('network/peerings_total_available', limit, labels) + yield TimeSeries('network/peerings_total_used_ratio', p_total / limit, + labels) + + +def _subnet_ranges(resources): + 'Groups subnetworks by network and returns relevant timeseries.' + grouped = itertools.groupby(resources['subnetworks'].values(), + lambda v: v['network']) + return _group_timeseries('subnets', resources, grouped, + 'SUBNET_RANGES_PER_NETWORK') + + +@register_timeseries +def timeseries(resources): + 'Returns used/available/ratio timeseries by network for different resources.' 
+ LOGGER.info('timeseries') + # return descriptors + for dtype, name in DESCRIPTOR_ATTRS.items(): + yield MetricDescriptor(f'network/{dtype}', name, ('project', 'network'), + dtype.endswith('ratio')) + + # chain iterators from specialized functions and yield combined timeseries + results = itertools.chain(_forwarding_rules(resources), _instances(resources), + _peerings(resources), _subnet_ranges(resources)) + for result in results: + yield result diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/series-peering-groups.py b/blueprints/cloud-operations/network-dashboard/src/plugins/series-peering-groups.py new file mode 100644 index 0000000000..9f79268500 --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/series-peering-groups.py @@ -0,0 +1,180 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'Prepares descriptors and timeseries for peering group metrics.' + +import itertools +import logging + +from . 
import MetricDescriptor, TimeSeries, register_timeseries + +DESCRIPTOR_ATTRS = { + 'forwarding_rules_l4_available': + 'L4 fwd rules limit per peering group', + 'forwarding_rules_l4_used': + 'L4 fwd rules used per peering group', + 'forwarding_rules_l4_used_ratio': + 'L4 fwd rules used ratio per peering group', + 'forwarding_rules_l7_available': + 'L7 fwd rules limit per peering group', + 'forwarding_rules_l7_used': + 'L7 fwd rules used per peering group', + 'forwarding_rules_l7_used_ratio': + 'L7 fwd rules used ratio per peering group', + 'instances_available': + 'Instance limit per peering group', + 'instances_used': + 'Instance used per peering group', + 'instances_used_ratio': + 'Instance used ratio per peering group', + 'routes_dynamic_available': + 'Dynamic route limit per peering group', + 'routes_dynamic_used': + 'Dynamic route used per peering group', + 'routes_dynamic_used_ratio': + 'Dynamic route used ratio per peering group', + 'routes_static_available': + 'Static route limit per peering group', + 'routes_static_used': + 'Static route used per peering group', + 'routes_static_used_ratio': + 'Static route used ratio per peering group', +} +LIMITS = { + 'forwarding_rules_l4': { + 'pg': ('INTERNAL_FORWARDING_RULES_PER_PEERING_GROUP', 500), + 'prj': ('INTERNAL_FORWARDING_RULES_PER_NETWORK', 500) + }, + 'forwarding_rules_l7': { + 'pg': ('INTERNAL_MANAGED_FORWARDING_RULES_PER_PEERING_GROUP', 175), + 'prj': ('INTERNAL_MANAGED_FORWARDING_RULES_PER_NETWORK', 75) + }, + 'instances': { + 'pg': ('INSTANCES_PER_PEERING_GROUP', 15500), + 'prj': ('INSTANCES_PER_NETWORK_GLOBAL', 15000) + }, + 'routes_static': { + 'pg': ('STATIC_ROUTES_PER_PEERING_GROUP', 300), + 'prj': ('ROUTES', 250) + }, + 'routes_dynamic': { + 'pg': ('DYNAMIC_ROUTES_PER_PEERING_GROUP', 300), + 'prj': ('', 100) + } +} +LOGGER = logging.getLogger('net-dash.timeseries.peerings') + + +def _count_forwarding_rules_l4(resources, network_ids): + 'Returns count of L4 forwarding rules for specified network 
ids.' + return len([ + r for r in resources['forwarding_rules'].values() if + r['network'] in network_ids and r['load_balancing_scheme'] == 'INTERNAL' + ]) + + +def _count_forwarding_rules_l7(resources, network_ids): + 'Returns count of L7 forwarding rules for specified network ids.' + return len([ + r for r in resources['forwarding_rules'].values() + if r['network'] in network_ids and + r['load_balancing_scheme'] == 'INTERNAL_MANAGED' + ]) + + +def _count_instances(resources, network_ids): + 'Returns count of instances for specified network ids.' + count = 0 + for i in resources['instances'].values(): + if any(n['network'] in network_ids for n in i['networks']): + count += 1 + return count + + +def _count_routes_static(resources, network_ids): + 'Returns count of static routes for specified network ids.' + return len( + [r for r in resources['routes'].values() if r['network'] in network_ids]) + + +def _count_routes_dynamic(resources, network_ids): + 'Returns count of dynamic routes for specified network ids.' + return sum([ + sum(v.values()) + for k, v in resources['routes_dynamic'].items() + if k in network_ids + ]) + + +def _get_limit_max(quota, network_id, project_id, resource_name): + 'Returns maximum limit value in project / peering group / network limits.' + pg_name, pg_default = LIMITS[resource_name]['pg'] + prj_name, prj_default = LIMITS[resource_name]['prj'] + network_quota = quota.get(network_id, {}) + project_quota = quota.get(project_id, {}).get('global', {}) + return max([ + network_quota.get(pg_name, 0), + project_quota.get(prj_name, prj_default), + project_quota.get(pg_name, pg_default) + ]) + + +def _get_limit(quota, network, resource_name): + 'Computes and returns peering group limit.' 
+ # reference https://cloud.google.com/vpc/docs/quota#vpc-peering-ilb-example + # step 1 - vpc_max = max(vpc limit, pg limit) + vpc_max = _get_limit_max(quota, network['self_link'], network['project_id'], + resource_name) + # step 2 - peers_max = [max(vpc limit, pg limit) for v in peered vpcs] + # step 3 - peers_min = min(peers_max) + peers_min = min([ + _get_limit_max(quota, p['network'], p['project_id'], resource_name) + for p in network['peerings'] + ]) + # step 4 - max(vpc_max, peers_min) + return max([vpc_max, peers_min]) + + +def _peering_group_timeseries(resources, network): + 'Computes and returns peering group timeseries for network.' + if len(network['peerings']) == 0: + return + network_ids = [network['self_link'] + ] + [p['network'] for p in network['peerings']] + for resource_name in LIMITS: + limit = _get_limit(resources['quota'], network, resource_name) + func = globals().get(f'_count_{resource_name}') + if not func or not callable(func): + LOGGER.critical(f'no handler for {resource_name} or handler not callable') + continue + count = func(resources, network_ids) + labels = {'project': network['project_id'], 'network': network['name']} + yield TimeSeries(f'peering_group/{resource_name}_used', count, labels) + yield TimeSeries(f'peering_group/{resource_name}_available', limit, labels) + yield TimeSeries(f'peering_group/{resource_name}_used_ratio', count / limit, + labels) + + +@register_timeseries +def timeseries(resources): + 'Returns peering group timeseries for all networks.' 
+ LOGGER.info('timeseries') + # returns metric descriptors + for dtype, name in DESCRIPTOR_ATTRS.items(): + yield MetricDescriptor(f'peering_group/{dtype}', name, + ('project', 'network'), dtype.endswith('ratio')) + # chain timeseries for each network and return each one individually + results = itertools.chain(*(_peering_group_timeseries(resources, n) + for n in resources['networks'].values())) + for result in results: + yield result diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/series-routes.py b/blueprints/cloud-operations/network-dashboard/src/plugins/series-routes.py new file mode 100644 index 0000000000..89011215ca --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/series-routes.py @@ -0,0 +1,93 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'Prepares descriptors and timeseries for network-level route metrics.' + +import itertools +import logging + +from . 
import MetricDescriptor, TimeSeries, register_timeseries + +DESCRIPTOR_ATTRS = { + 'network/routes_dynamic_used': + 'Dynamic routes used per network', + 'network/routes_dynamic_available': + 'Dynamic routes limit per network', + 'network/routes_dynamic_used_ratio': + 'Dynamic routes used ratio per network', + 'network/routes_static_used': + 'Static routes used per network', + 'project/routes_dynamic_used': + 'Dynamic routes used per project', + 'project/routes_dynamic_available': + 'Dynamic routes limit per project', + 'project/routes_dynamic_used_ratio': + 'Dynamic routes used ratio per project', + 'project/routes_static_used': + 'Static routes used per project', + 'project/routes_static_available': + 'Static routes limit per project', + 'project/routes_static_used_ratio': + 'Static routes used ratio per project' +} +LIMITS = {'ROUTES': 250, 'ROUTES_DYNAMIC': 100} +LOGGER = logging.getLogger('net-dash.timeseries.routes') + + +def _dynamic(resources): + 'Computes network-level timeseries for dynamic routes.' + for network_id, router_counts in resources['routes_dynamic'].items(): + network = resources['networks'][network_id] + count = sum(router_counts.values()) + labels = {'project': network['project_id'], 'network': network['name']} + limit = LIMITS['ROUTES_DYNAMIC'] + yield TimeSeries('network/routes_dynamic_used', count, labels) + yield TimeSeries('network/routes_dynamic_available', limit, labels) + yield TimeSeries('network/routes_dynamic_used_ratio', count / limit, labels) + + +def _static(resources): + 'Computes network and project-level timeseries for static routes.'
+ filter = lambda v: v['next_hop_type'] in ('peering', 'network') + routes = itertools.filterfalse(filter, resources['routes'].values()) + grouped = itertools.groupby(routes, lambda v: v['network']) + project_counts = {} + for network_id, elements in grouped: + network = resources['networks'].get(network_id) + count = len(list(elements)) + labels = {'project': network['project_id'], 'network': network['name']} + yield TimeSeries('network/routes_static_used', count, labels) + project_counts[network['project_id']] = project_counts.get( + network['project_id'], 0) + count + for project_id, count in project_counts.items(): + labels = {'project': project_id} + quota = resources['quota'][project_id]['global'] + limit = quota.get('ROUTES', LIMITS['ROUTES']) + yield TimeSeries('project/routes_static_used', count, labels) + yield TimeSeries('project/routes_static_available', limit, labels) + yield TimeSeries('project/routes_static_used_ratio', count / limit, labels) + + +@register_timeseries +def timeseries(resources): + 'Returns used/available/ratio timeseries by network and project.' 
+ LOGGER.info('timeseries') + # return descriptors + for dtype, name in DESCRIPTOR_ATTRS.items(): + labels = ('project',) if dtype.startswith('project') else ('project', + 'network') + yield MetricDescriptor(dtype, name, labels, dtype.endswith('ratio')) + # chain static and dynamic route timeseries then return each one individually + results = itertools.chain(_static(resources), _dynamic(resources)) + for result in results: + yield result diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/series-subnets.py b/blueprints/cloud-operations/network-dashboard/src/plugins/series-subnets.py new file mode 100644 index 0000000000..a9f0a5f302 --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/series-subnets.py @@ -0,0 +1,100 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'Prepares descriptors and timeseries for subnetwork-level metrics.' + +import collections +import ipaddress +import itertools +import logging + +from . import MetricDescriptor, TimeSeries, register_timeseries + +DESCRIPTOR_ATTRS = { + 'addresses_available': 'Address limit per subnet', + 'addresses_used': 'Addresses used per subnet', + 'addresses_used_ratio': 'Addresses used ratio per subnet' +} +LOGGER = logging.getLogger('net-dash.timeseries.subnets') + + +def _subnet_addresses(resources): + 'Returns count of addresses per subnetwork.'
+ for v in resources['addresses'].values(): + if v['status'] != 'RESERVED': + continue + if v['purpose'] in ('GCE_ENDPOINT', 'DNS_RESOLVER'): + yield v['subnetwork'], 1 + + +def _subnet_forwarding_rules(resources, subnet_nets): + 'Returns counts of forwarding rules per subnetwork.' + for v in resources['forwarding_rules'].values(): + if v['load_balancing_scheme'].startswith('INTERNAL'): + yield v['subnetwork'], 1 + continue + if v['psc_accepted']: + network = resources['networks'].get(v['network']) + if not network: + LOGGER.warning(f'PSC address for missing network {v["network"]}') + continue + address = ipaddress.ip_address(v['address']) + for subnet_self_link in network['subnetworks']: + if address in subnet_nets[subnet_self_link]: + yield subnet_self_link, 1 + break + continue + + +def _subnet_instances(resources): + 'Returns counts of instances per subnetwork.' + vm_networks = itertools.chain.from_iterable( + i['networks'] for i in resources['instances'].values()) + return collections.Counter(v['subnetwork'] for v in vm_networks).items() + + +@register_timeseries +def timeseries(resources): + 'Returns used/available/ratio timeseries for addresses by subnetwork.'
+ LOGGER.info('timeseries') + # return descriptors + for dtype, name in DESCRIPTOR_ATTRS.items(): + yield MetricDescriptor(f'subnetwork/{dtype}', name, + ('project', 'network', 'subnetwork', 'region'), + dtype.endswith('ratio')) + # aggregate per-resource counts in total per-subnet counts + subnet_nets = { + k: ipaddress.ip_network(v['cidr_range']) + for k, v in resources['subnetworks'].items() + } + # TODO: add counter functions for PSA + subnet_counts = {k: 0 for k in resources['subnetworks']} + counters = itertools.chain(_subnet_addresses(resources), + _subnet_forwarding_rules(resources, subnet_nets), + _subnet_instances(resources)) + for subnet_self_link, count in counters: + subnet_counts[subnet_self_link] += count + # compute and return metrics + for subnet_self_link, count in subnet_counts.items(): + max_ips = subnet_nets[subnet_self_link].num_addresses - 4 + subnet = resources['subnetworks'][subnet_self_link] + labels = { + 'network': resources['networks'][subnet['network']]['name'], + 'project': subnet['project_id'], + 'region': subnet['region'], + 'subnetwork': subnet['name'] + } + yield TimeSeries('subnetwork/addresses_available', max_ips, labels) + yield TimeSeries('subnetwork/addresses_used', count, labels) + yield TimeSeries('subnetwork/addresses_used_ratio', + 0 if count == 0 else count / max_ips, labels) diff --git a/blueprints/cloud-operations/network-dashboard/src/plugins/utils.py b/blueprints/cloud-operations/network-dashboard/src/plugins/utils.py new file mode 100644 index 0000000000..5be6599889 --- /dev/null +++ b/blueprints/cloud-operations/network-dashboard/src/plugins/utils.py @@ -0,0 +1,101 @@ +# Copyright 2022 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +'Utility functions for API requests and responses.' + +import itertools +import json +import logging +import re + +from . import HTTPRequest, PluginError + +MP_PART = '''\ +Content-Type: application/http +MIME-Version: 1.0 +Content-Transfer-Encoding: binary + +GET {}?alt=json HTTP/1.1 +Content-Type: application/json +MIME-Version: 1.0 +Content-Length: 0 +Accept: application/json +Accept-Encoding: gzip, deflate +Host: compute.googleapis.com + +''' +RE_URL = re.compile(r'nextPageToken=[^&]+&?') + + +def batched(iterable, n): + 'Batches data into lists of length n. The last batch may be shorter.' + # batched('ABCDEFG', 3) --> ABC DEF G + if n < 1: + raise ValueError('n must be at least one') + it = iter(iterable) + while (batch := list(itertools.islice(it, n))): + yield batch + + +def parse_cai_results(data, name, resource_type=None, method='search'): + 'Parses an asset API response and returns individual results.' + results = data.get('results' if method == 'search' else 'assets') + if not results: + logging.info(f'no results for {name}') + return + for result in results: + if resource_type and result['assetType'] != resource_type: + logging.warn(f'result for wrong type {result["assetType"]}') + continue + yield result + + +def parse_page_token(data, url): + 'Detect next page token in result and return next page URL.' 
+  page_token = data.get('nextPageToken')
+  if page_token:
+    logging.info(f'page token {page_token}')
+  if page_token:
+    return RE_URL.sub(f'pageToken={page_token}&', url)
+
+
+def poor_man_mp_request(urls, boundary='1234567890'):
+  'Bundles URLs into a single multipart mixed batched request.'
+  boundary = f'--{boundary}'
+  data = [boundary]
+  for url in urls:
+    data += ['\n', MP_PART.format(url), boundary]
+  data.append('--\n')
+  headers = {'content-type': f'multipart/mixed; boundary={boundary[2:]}'}
+  return HTTPRequest('https://compute.googleapis.com/batch/compute/v1', headers,
+                     ''.join(data), False)
+
+
+def poor_man_mp_response(content_type, content):
+  'Parses a multipart mixed response and returns individual parts.'
+  try:
+    _, boundary = content_type.split('=')
+  except ValueError:
+    raise PluginError('no boundary found in content type')
+  content = content.decode('utf-8').strip()[:-2]
+  if boundary not in content:
+    raise PluginError('MIME boundary not found')
+  for part in content.split(f'--{boundary}'):
+    part = part.strip()
+    if not part:
+      continue
+    try:
+      mime_header, header, body = part.split('\r\n\r\n', 2)
+    except ValueError:
+      raise PluginError('cannot parse MIME part')
+    yield json.loads(body)
diff --git a/blueprints/cloud-operations/network-dashboard/src/requirements.txt b/blueprints/cloud-operations/network-dashboard/src/requirements.txt
new file mode 100644
index 0000000000..3ca529bc35
--- /dev/null
+++ b/blueprints/cloud-operations/network-dashboard/src/requirements.txt
@@ -0,0 +1,4 @@
+click==8.1.3
+google-auth==2.14.1
+PyYAML==6.0
+requests==2.28.1
diff --git a/blueprints/cloud-operations/network-dashboard/src/tools/remove-descriptors.py b/blueprints/cloud-operations/network-dashboard/src/tools/remove-descriptors.py
new file mode 100755
index 0000000000..93b1110e46
--- /dev/null
+++ b/blueprints/cloud-operations/network-dashboard/src/tools/remove-descriptors.py
@@ -0,0 +1,72 @@
+#!/usr/bin/env python3
+# Copyright 2022 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+'Delete metric descriptors matching filter.'
+
+import json
+import logging
+
+import click
+import google.auth
+
+from google.auth.transport.requests import AuthorizedSession
+
+HEADERS = {'content-type': 'application/json'}
+HTTP = AuthorizedSession(google.auth.default()[0])
+URL_DELETE = 'https://monitoring.googleapis.com/v3/{}'
+URL_LIST = (
+    'https://monitoring.googleapis.com/v3/projects/{}'
+    '/metricDescriptors?filter=metric.type=starts_with("custom.googleapis.com/netmon/")'
+    '&alt=json')
+
+
+def fetch(url, delete=False):
+  'Minimal HTTP client interface for API calls.'
+  # try
+  try:
+    if not delete:
+      response = HTTP.get(url, headers=HEADERS)
+    else:
+      response = HTTP.delete(url)
+  except google.auth.exceptions.RefreshError as e:
+    raise SystemExit(e.args[0])
+  if response.status_code != 200:
+    logging.critical(f'response code {response.status_code} for URL {url}')
+    logging.critical(response.content)
+    return
+  return response.json()
+
+
+@click.command()
+@click.option('--monitoring-project', '-op', required=True, type=str,
+              help='GCP monitoring project where metrics will be stored.')
+def main(monitoring_project):
+  'Module entry point.'
+  # if not click.confirm('Do you want to continue?'):
+  #   raise SystemExit(0)
+  logging.info('fetching descriptors')
+  result = fetch(URL_LIST.format(monitoring_project))
+  descriptors = (result or {}).get('metricDescriptors')
+  if not descriptors:
+    raise SystemExit(0)
+  logging.info(f'{len(descriptors)} descriptors')
+  for d in descriptors:
+    name = d['name']
+    logging.info(f'delete {name}')
+    result = fetch(URL_DELETE.format(name), True)
+
+
+if __name__ == '__main__':
+  logging.basicConfig(level=logging.INFO)
+  main()
diff --git a/blueprints/cloud-operations/network-dashboard/tests/README.md b/blueprints/cloud-operations/network-dashboard/tests/README.md
deleted file mode 100644
index 6e4779d459..0000000000
--- a/blueprints/cloud-operations/network-dashboard/tests/README.md
+++ /dev/null
@@ -1 +0,0 @@
-Creating here resources to test the Cloud Function and ensuring metrics are correctly populated
\ No newline at end of file
diff --git a/blueprints/cloud-operations/network-dashboard/tests/test.tf b/blueprints/cloud-operations/network-dashboard/tests/test.tf
deleted file mode 100644
index bb9d6d317c..0000000000
--- a/blueprints/cloud-operations/network-dashboard/tests/test.tf
+++ /dev/null
@@ -1,287 +0,0 @@
-/**
- * Copyright 2022 Google LLC
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-resource "google_folder" "test-net-dash" {
-  display_name = "test-net-dash"
-  parent       = "organizations/${var.organization_id}"
-}
-
-##### Creating host projects, VPCs, service projects #####
-
-module "project-hub" {
-  source          = "../../../../modules/project"
-  name            = "test-host-hub"
-  parent          = google_folder.test-net-dash.name
-  prefix          = var.prefix
-  billing_account = var.billing_account
-  services        = var.project_vm_services
-
-  shared_vpc_host_config = {
-    enabled = true
-  }
-}
-
-module "vpc-hub" {
-  source     = "../../../../modules/net-vpc"
-  project_id = module.project-hub.project_id
-  name       = "vpc-hub"
-  subnets = [
-    {
-      ip_cidr_range = "10.0.10.0/24"
-      name          = "subnet-hub-1"
-      region        = var.region
-    }
-  ]
-}
-
-module "project-svc-hub" {
-  source          = "../../../../modules/project"
-  parent          = google_folder.test-net-dash.name
-  billing_account = var.billing_account
-  prefix          = var.prefix
-  name            = "test-svc-hub"
-  services        = var.project_vm_services
-
-  shared_vpc_service_config = {
-    attach       = true
-    host_project = module.project-hub.project_id
-  }
-}
-
-module "project-prod" {
-  source          = "../../../../modules/project"
-  name            = "test-host-prod"
-  parent          = google_folder.test-net-dash.name
-  prefix          = var.prefix
-  billing_account = var.billing_account
-  services        = var.project_vm_services
-
-  shared_vpc_host_config = {
-    enabled = true
-  }
-}
-
-module "vpc-prod" {
-  source     = "../../../../modules/net-vpc"
-  project_id = module.project-prod.project_id
-  name       = "vpc-prod"
-  subnets = [
-    {
-      ip_cidr_range = "10.0.20.0/24"
-      name          = "subnet-prod-1"
-      region        = var.region
-    }
-  ]
-}
-
-module "project-svc-prod" {
-  source          = "../../../../modules/project"
-  parent          = google_folder.test-net-dash.name
-  billing_account = var.billing_account
-  prefix          = var.prefix
-  name            = "test-svc-prod"
-  services        = var.project_vm_services
-
-  shared_vpc_service_config = {
-    attach       = true
-    host_project = module.project-prod.project_id
-  }
-}
-
-module "project-dev" {
-  source          = "../../../../modules/project"
-  name            = "test-host-dev"
-  parent          = google_folder.test-net-dash.name
-  prefix          = var.prefix
-  billing_account = var.billing_account
-  services        = var.project_vm_services
-
-  shared_vpc_host_config = {
-    enabled = true
-  }
-}
-
-module "vpc-dev" {
-  source     = "../../../../modules/net-vpc"
-  project_id = module.project-dev.project_id
-  name       = "vpc-dev"
-  subnets = [
-    {
-      ip_cidr_range = "10.0.30.0/24"
-      name          = "subnet-dev-1"
-      region        = var.region
-    }
-  ]
-}
-
-module "project-svc-dev" {
-  source          = "../../../../modules/project"
-  parent          = google_folder.test-net-dash.name
-  billing_account = var.billing_account
-  prefix          = var.prefix
-  name            = "test-svc-dev"
-  services        = var.project_vm_services
-
-  shared_vpc_service_config = {
-    attach       = true
-    host_project = module.project-dev.project_id
-  }
-}
-
-##### Creating VPC peerings #####
-
-module "hub-to-prod-peering" {
-  source        = "../../../../modules/net-vpc-peering"
-  local_network = module.vpc-hub.self_link
-  peer_network  = module.vpc-prod.self_link
-}
-
-module "prod-to-hub-peering" {
-  source        = "../../../../modules/net-vpc-peering"
-  local_network = module.vpc-prod.self_link
-  peer_network  = module.vpc-hub.self_link
-  depends_on    = [module.hub-to-prod-peering]
-}
-
-module "hub-to-dev-peering" {
-  source        = "../../../../modules/net-vpc-peering"
-  local_network = module.vpc-hub.self_link
-  peer_network  = module.vpc-dev.self_link
-}
-
-module "dev-to-hub-peering" {
-  source        = "../../../../modules/net-vpc-peering"
-  local_network = module.vpc-dev.self_link
-  peer_network  = module.vpc-hub.self_link
-  depends_on    = [module.hub-to-dev-peering]
-}
-
-##### Creating VMs #####
-
-resource "google_compute_instance" "test-vm-prod1" {
-  project      = module.project-svc-prod.project_id
-  name         = "test-vm-prod1"
-  machine_type = "f1-micro"
-  zone         = var.zone
-
-  tags = ["${var.region}"]
-
-  boot_disk {
-    initialize_params {
-      image = "debian-cloud/debian-9"
-    }
-  }
-
-  network_interface {
-    subnetwork         = module.vpc-prod.subnet_self_links["${var.region}/subnet-prod-1"]
-    subnetwork_project = module.project-prod.project_id
-  }
-
-  allow_stopping_for_update = true
-}
-
-resource "google_compute_instance" "test-vm-prod2" {
-  project      = module.project-prod.project_id
-  name         = "test-vm-prod2"
-  machine_type = "f1-micro"
-  zone         = var.zone
-
-  tags = [var.region]
-
-  boot_disk {
-    initialize_params {
-      image = "debian-cloud/debian-9"
-    }
-  }
-
-  network_interface {
-    subnetwork         = module.vpc-prod.subnet_self_links["${var.region}/subnet-prod-1"]
-    subnetwork_project = module.project-prod.project_id
-  }
-
-  allow_stopping_for_update = true
-}
-
-resource "google_compute_instance" "test-vm-dev1" {
-  count        = 10
-  project      = module.project-svc-dev.project_id
-  name         = "test-vm-dev${count.index}"
-  machine_type = "f1-micro"
-  zone         = var.zone
-
-  tags = ["${var.region}"]
-
-  boot_disk {
-    initialize_params {
-      image = "debian-cloud/debian-9"
-    }
-  }
-
-  network_interface {
-    subnetwork         = module.vpc-dev.subnet_self_links["${var.region}/subnet-dev-1"]
-    subnetwork_project = module.project-dev.project_id
-  }
-
-  allow_stopping_for_update = true
-}
-
-resource "google_compute_instance" "test-vm-hub1" {
-  project      = module.project-svc-hub.project_id
-  name         = "test-vm-hub1"
-  machine_type = "f1-micro"
-  zone         = var.zone
-
-  tags = ["${var.region}"]
-
-  boot_disk {
-    initialize_params {
-      image = "debian-cloud/debian-9"
-    }
-  }
-
-  network_interface {
-    subnetwork         = module.vpc-hub.subnet_self_links["${var.region}/subnet-hub-1"]
-    subnetwork_project = module.project-hub.project_id
-  }
-
-  allow_stopping_for_update = true
-}
-
-# Forwarding Rules
-resource "google_compute_forwarding_rule" "forwarding-rule-dev" {
-  count      = 10
-  name       = "forwarding-rule-dev${count.index}"
-  project    = module.project-svc-dev.project_id
-  network    = module.vpc-dev.self_link
-  subnetwork = module.vpc-dev.subnet_self_links["${var.region}/subnet-dev-1"]
-
-  region                = var.region
-  backend_service       = google_compute_region_backend_service.test-backend.id
-  ip_protocol           = "TCP"
-  load_balancing_scheme = "INTERNAL"
-  all_ports             = true
-  allow_global_access   = true
-
-}
-
-# backend service
-resource "google_compute_region_backend_service" "test-backend" {
-  name                  = "test-backend"
-  region                = var.region
-  project               = module.project-svc-dev.project_id
-  protocol              = "TCP"
-  load_balancing_scheme = "INTERNAL"
-}
diff --git a/blueprints/cloud-operations/network-dashboard/tests/variables.tf b/blueprints/cloud-operations/network-dashboard/tests/variables.tf
deleted file mode 100644
index dd01b29fdf..0000000000
--- a/blueprints/cloud-operations/network-dashboard/tests/variables.tf
+++ /dev/null
@@ -1,52 +0,0 @@
-/**
- * Copyright 2022 Google LLC
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-variable "organization_id" {
-  description = "The organization id for the associated services"
-}
-
-variable "billing_account" {
-  description = "The ID of the billing account to associate this project with"
-}
-
-variable "prefix" {
-  description = "Prefix used for resource names."
-  type        = string
-  validation {
-    condition     = var.prefix != ""
-    error_message = "Prefix cannot be empty."
-  }
-}
-
-variable "project_vm_services" {
-  description = "Service APIs enabled by default in new projects."
-  default = [
-    "cloudbilling.googleapis.com",
-    "compute.googleapis.com",
-    "logging.googleapis.com",
-    "monitoring.googleapis.com",
-    "servicenetworking.googleapis.com",
-  ]
-}
-variable "region" {
-  description = "Region used to deploy subnets"
-  default     = "europe-west1"
-}
-
-variable "zone" {
-  description = "Zone used to deploy vms"
-  default     = "europe-west1-b"
-}
diff --git a/blueprints/cloud-operations/network-dashboard/variables.tf b/blueprints/cloud-operations/network-dashboard/variables.tf
deleted file mode 100644
index 2744eed629..0000000000
--- a/blueprints/cloud-operations/network-dashboard/variables.tf
+++ /dev/null
@@ -1,89 +0,0 @@
-/**
- * Copyright 2022 Google LLC
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-variable "billing_account" {
-  description = "The ID of the billing account to associate this project with."
-}
-
-variable "cf_version" {
-  description = "Cloud Function version 2nd Gen or 1st Gen. Possible options: 'V1' or 'V2'.Use CFv2 if your Cloud Function timeouts after 9 minutes. By default it is using CFv1."
-  default     = "V1"
-  validation {
-    condition     = var.cf_version == "V1" || var.cf_version == "V2"
-    error_message = "The value of cf_version must be either V1 or V2."
-  }
-}
-
-variable "monitored_folders_list" {
-  type        = list(string)
-  description = "ID of the projects to be monitored (where limits and quotas data will be pulled)."
-  default     = []
-}
-
-variable "monitored_projects_list" {
-  type        = list(string)
-  description = "ID of the projects to be monitored (where limits and quotas data will be pulled)."
-}
-
-variable "monitoring_project_id" {
-  description = "Monitoring project where the dashboard will be created and the solution deployed; a project will be created if set to empty string."
-  default     = ""
-}
-
-variable "organization_id" {
-  description = "The organization id for the associated services."
-}
-
-variable "prefix" {
-  description = "Prefix used for resource names."
-  type        = string
-  validation {
-    condition     = var.prefix != ""
-    error_message = "Prefix cannot be empty."
-  }
-}
-
-variable "project_monitoring_services" {
-  description = "Service APIs enabled in the monitoring project if it will be created."
-  default = [
-    "artifactregistry.googleapis.com",
-    "cloudasset.googleapis.com",
-    "cloudbilling.googleapis.com",
-    "cloudbuild.googleapis.com",
-    "cloudfunctions.googleapis.com",
-    "cloudresourcemanager.googleapis.com",
-    "cloudscheduler.googleapis.com",
-    "compute.googleapis.com",
-    "iam.googleapis.com",
-    "iamcredentials.googleapis.com",
-    "logging.googleapis.com",
-    "monitoring.googleapis.com",
-    "pubsub.googleapis.com",
-    "run.googleapis.com",
-    "servicenetworking.googleapis.com",
-    "serviceusage.googleapis.com",
-    "storage-component.googleapis.com"
-  ]
-}
-variable "region" {
-  description = "Region used to deploy the cloud functions and scheduler."
-  default     = "europe-west1"
-}
-
-variable "schedule_cron" {
-  description = "Cron format schedule to run the Cloud Function. Default is every 10 minutes."
-  default     = "*/10 * * * *"
-}
diff --git a/blueprints/cloud-operations/network-dashboard/versions.tf b/blueprints/cloud-operations/network-dashboard/versions.tf
deleted file mode 100644
index 3bdf23370a..0000000000
--- a/blueprints/cloud-operations/network-dashboard/versions.tf
+++ /dev/null
@@ -1,27 +0,0 @@
-# Copyright 2022 Google LLC
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-terraform {
-  required_version = ">= 1.3.1"
-  required_providers {
-    google = {
-      source  = "hashicorp/google"
-      version = ">= 4.40.0" # tftest
-    }
-    google-beta = {
-      source  = "hashicorp/google-beta"
-      version = ">= 4.40.0" # tftest
-    }
-  }
-}