CDK (Cloud Development Kit) to create infra for project website search #15

Closed · wants to merge 4 commits
128 changes: 128 additions & 0 deletions cdk/opensearch-website-search/README.md
## CDK for deploying website search clusters

This project deploys the following stacks (wired together as sketched below):
1. Network stack: Sets up networking resources such as the VPC, subnets, AZs, and security group.
2. Infrastructure stack: Sets up EC2 instances (installing ODFE 1.13.2 by default via user data), CloudWatch logging, and a network load balancer. Check your cluster logs in the log group this stack creates in CloudWatch.
3. APIGatewayLambda stack: Sets up API Gateway with various endpoints and a search Lambda function that points to the network load balancer created in the Infrastructure stack.
4. Monitoring stack: Creates an AWS Lambda function that periodically monitors the backend OpenSearch cluster and sends metrics to CloudWatch.
5. Bastion stack: Creates a group of EC2 instances in an Auto Scaling group spread across AZs that act as SSH bastion hosts and can be accessed only from the restricted IP ranges defined in the stack.
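
The stacks are wired together in `app.py` (included in full later in this change); a simplified sketch of that wiring, with the stack-name prefix and the `env` argument omitted:

```
#!/usr/bin/env python3
from aws_cdk import core

from website_search_cdk.network import Network
from website_search_cdk.infra import ClusterStack
from website_search_cdk.api_lambda import ApiLambdaStack
from website_search_cdk.monitoring import MonitoringStack
from website_search_cdk.bastions import Bastions

app = core.App()

# The network stack comes first; every other stack reuses its VPC and security group.
network = Network(app, "network")
cluster = ClusterStack(app, "opensearch-cluster", vpc=network.vpc, sg=network.security_group,
                       architecture="x64", security="enable")
ApiLambdaStack(app, "gateway-lambda", network.vpc, cluster.nlb, cluster.opensearch_listener)
MonitoringStack(app, "monitoring", network.vpc, cluster.nlb)
Bastions(app, "bastion-hosts", network.vpc)

app.synth()
```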

### CDK Installation

[Install CDK](https://docs.aws.amazon.com/cdk/latest/guide/cli.html) using `npm install -g aws-cdk`


The `cdk.json` file tells the CDK Toolkit how to execute your app.

This project is set up like a standard Python project. The initialization
process also creates a virtualenv within this project, stored under the `.venv`
directory. To create the virtualenv it assumes that there is a `python3`
(or `python` for Windows) executable in your path with access to the `venv`
package. If for any reason the automatic creation of the virtualenv fails,
you can create the virtualenv manually.

To manually create a virtualenv on macOS and Linux:

```
$ python3 -m venv .venv
```

After the init process completes and the virtualenv is created, you can use the following
step to activate your virtualenv.

```
$ source .venv/bin/activate
```

If you are on a Windows platform, you would activate the virtualenv like this:

```
% .venv\Scripts\activate.bat
```

Once the virtualenv is activated, you can install the required dependencies.

```
$ pip install -r requirements.txt
```

At this point you can now synthesize the CloudFormation template for this code.

```
$ cdk synth
```

To add additional dependencies, for example other CDK libraries, just add
them to your `setup.py` file and rerun the `pip install -r requirements.txt`
command.
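
For instance, a `setup.py` for this layout might declare CDK construct libraries roughly as follows (an illustrative sketch; the package name and exact dependency list are assumptions, the real list lives in this project's `setup.py`):

```
from setuptools import setup, find_packages

setup(
    name="website_search_cdk",  # assumed to match the package imported by app.py
    version="0.0.1",
    packages=find_packages(),
    install_requires=[
        "aws-cdk.core",
        "aws-cdk.aws-ec2",
        # add further CDK construct libraries here, then rerun `pip install -r requirements.txt`
    ],
)
```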

## Prerequisites
1. Python 3 is required to run the CDK.
2. AWS credentials configured locally, or pass them during deployment in the `app.py` file. [More information](https://docs.aws.amazon.com/cdk/latest/guide/environments.html)
3. An EC2 key pair in the deployment region, passed as a context variable. Make sure to store the private key safely, for example in AWS Secrets Manager.
4. Usernames and passwords for the search and monitoring Lambda functions. These should be created in AWS Secrets Manager and passed as environment variables or context variables on the command line. Avoid persisting them in `cdk.context.json`.
```
SEARCH_USER
SEARCH_PASS
MONITORING_USER
MONITORING_PASS
```
While OpenSearch is bootstrapped on the EC2 nodes, these usernames and passwords are fetched from AWS Secrets Manager and the respective users are created along with roles and role mappings.
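
As a rough illustration, a bootstrap script could fetch such a credential pair with boto3 like this (the secret name `website-search/search-user` and the JSON field names are hypothetical; use whatever names you created in Secrets Manager):

```
import json

import boto3


def get_credentials(secret_id, region="us-east-1"):
    # Fetch a JSON secret of the form {"username": "...", "password": "..."}
    client = boto3.client("secretsmanager", region_name=region)
    secret = client.get_secret_value(SecretId=secret_id)
    data = json.loads(secret["SecretString"])
    return data["username"], data["password"]


# Hypothetical secret id; replace with the secret you created for the cluster.
search_user, search_pass = get_credentials("website-search/search-user")
```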


## Cluster deploy
The CDK currently supports only the TAR distribution, so passing any other value as the distribution results in an error.
Check the `cdk.context.json` file for the default context variables and treat them as parameters. Enter the appropriate values for `keypair`, `url`, and `dashboards_url`.
Any of the context variables can be overridden using the `-c` or `--context` flag in the deploy command.
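
Under the hood the app reads these values with `try_get_context`, so a value passed with `-c` takes precedence over the one in `cdk.context.json`; for example:

```
from aws_cdk import core

app = core.App()

# Returns the value given with `-c keypair=...` if present,
# otherwise the value from cdk.context.json, otherwise None.
keypair = app.node.try_get_context("keypair")
if not keypair:
    raise ValueError("keypair context variable is required")
```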

To deploy the stacks, follow these steps:
1. Activate the Python virtual environment.
2. Enter the required values in the `cdk.context.json` file:
    - cidr: CIDR block used to create the VPC (defaults to 10.9.0.0/21).
    - distribution: currently only the `tar` distribution is supported.
    - keypair: your EC2 key pair in the deployment region. Please check that the key exists and that you are deploying to the same region.
    - url: OpenSearch download URL, e.g. https://artifacts.opensearch.org/snapshots/bundle/opensearch/1.0.0-rc1/opensearch-1.0.0-rc1-linux-x64.tar.gz
    - dashboards_url: OpenSearch Dashboards download URL, e.g. https://artifacts.opensearch.org/snapshots/bundle/opensearch-dashboards/1.0.0-rc1/opensearch-dashboards-1.0.0-rc1-linux-x64.tar.gz

    Please check that the URLs are valid, as an invalid URL will not raise an explicit error; the links are only used in the user data of the EC2 instances. A quick pre-deploy check is sketched at the end of this section.

3. If you have entered all the values in `cdk.context.json`:
```
cdk deploy --all
```
If you want to pass the parameters on the command line:
```
cdk deploy --all -c keypair=your_ec2_keyPair -c url=<opensearch_download_link> -c dashboards_url=<dashboards_download_link>
```
For a non-interactive shell:
```
cdk deploy --all --require-approval=never
```
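
Because a bad artifact URL only fails later inside the EC2 user data, it can be worth checking both URLs before deploying. A minimal sketch using the `requests` library (the URLs shown are the defaults from `cdk.context.json`):

```
import sys

import requests


def check_url(url):
    # A URL is considered valid if a HEAD request succeeds.
    try:
        return requests.head(url, allow_redirects=True, timeout=10).status_code == 200
    except requests.RequestException:
        return False


urls = [
    "https://artifacts.opensearch.org/releases/bundle/opensearch/1.0.0/opensearch-1.0.0-linux-x64.tar.gz",
    "https://artifacts.opensearch.org/releases/bundle/opensearch-dashboards/1.0.0/opensearch-dashboards-1.0.0-linux-x64.tar.gz",
]
if not all(check_url(u) for u in urls):
    sys.exit("One or more artifact URLs are unreachable; fix them before running `cdk deploy`")
```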

### SSH
Both data nodes and master nodes can be reached over SSH only via the bastion hosts. Use the [ssh-bastion-ec2](tools/ssh-bastion-ec2/ssh-bastion-ec2) tool.


## Teardown
To delete a particular stack use the command:
```
cdk destroy <stackName>
```

To delete all the created stacks together, use the command:
```
cdk destroy --all
```
_Note: If you deployed the stacks using command-line parameters (i.e. `cdk.context.json` has empty values), you need to pass the parameters during `cdk destroy` as well:_
```
cdk destroy --all -c keypair=your_ec2_keyPair -c url=<opensearch_download_link> -c dashboards_url=<dashboards_download_link>
```
## Useful commands

* `cdk ls` list all stacks in the app
* `cdk synth` emits the synthesized CloudFormation template
* `cdk deploy` deploy this stack to your default AWS account/region
* `cdk diff` compare deployed stack with current state
* `cdk docs` open CDK documentation


76 changes: 76 additions & 0 deletions cdk/opensearch-website-search/app.py
#!/usr/bin/env python3
import os

# For consistency with TypeScript code, `cdk` is the preferred import name for
# the CDK's core module. The following line also imports it as `core` for use
# with examples from the CDK Developer's Guide, which are in the process of
# being updated to use `cdk`. You may delete this import if you don't need it.
from aws_cdk import core, aws_ec2 as ec2

from website_search_cdk.network import Network
from website_search_cdk.infra import ClusterStack
from website_search_cdk.infra import Architecture, Security
from website_search_cdk.api_lambda import ApiLambdaStack
from website_search_cdk.monitoring import MonitoringStack
from website_search_cdk.bastions import Bastions

env = core.Environment(account=os.environ.get("CDK_DEPLOY_ACCOUNT", os.environ["CDK_DEFAULT_ACCOUNT"]),
                       region=os.environ.get("CDK_DEPLOY_REGION", os.environ["CDK_DEFAULT_REGION"]))
app = core.App()

stack_prefix = app.node.try_get_context("stack_prefix")
if not stack_prefix:
    raise ValueError(stack_prefix, "is either null or empty. Please use a prefix to differentiate"
                                   " between stacks and prevent overriding other stacks")

architecture = app.node.try_get_context("architecture")
if not Architecture.has_value(architecture):
    raise ValueError(architecture, "is either null or not supported yet! Please use either x64 or arm64")

security = app.node.try_get_context("security")
if not Security.has_security_value(security):
    raise ValueError(security, "The keyword has to be either of these two: enable or disable.")

cluster_stack_name = app.node.try_get_context("cluster_stack_name")
network_stack_name = app.node.try_get_context("network_stack_name")
search_access_stack_name = app.node.try_get_context("search_access_stack_name")
monitoring_stack_name = app.node.try_get_context("monitoring_stack_name")

if not cluster_stack_name:
    raise ValueError("Cluster stack name cannot be None. Please provide the right stack name")
if not network_stack_name:
    raise ValueError("Network stack name cannot be None. Please provide the right stack name")

# Default AMI points to latest AL2
al2_ami = ec2.MachineImage.latest_amazon_linux(generation=ec2.AmazonLinuxGeneration.AMAZON_LINUX_2,
                                               cpu_type=ec2.AmazonLinuxCpuType.X86_64)

network = Network(app, stack_prefix + network_stack_name,
                  # If you don't specify 'env', this stack will be environment-agnostic.
                  # Account/Region-dependent features and context lookups will not work,
                  # but a single synthesized template can be deployed anywhere.

                  # Uncomment the next line to specialize this stack for the AWS Account
                  # and Region that are implied by the current CLI configuration.

                  # env=core.Environment(account=os.getenv('CDK_DEFAULT_ACCOUNT'),
                  #                      region=os.getenv('CDK_DEFAULT_REGION')),

                  # Uncomment the next line if you know exactly what Account and Region you
                  # want to deploy the stack to.

                  # env=core.Environment(account='123456789012', region='us-east-1'),

                  # For more information, see https://docs.aws.amazon.com/cdk/latest/guide/environments.html
                  env=env,
                  )
opensearch_infra = ClusterStack(app, stack_prefix + cluster_stack_name, vpc=network.vpc, sg=network.security_group,
                                architecture=architecture, security=security, env=env)

api_lambda = ApiLambdaStack(app, stack_prefix + search_access_stack_name, network.vpc, opensearch_infra.nlb,
                            opensearch_infra.opensearch_listener, env=env)
monitoring = MonitoringStack(app, stack_prefix + monitoring_stack_name, network.vpc, opensearch_infra.nlb, env=env)
bastion_host_infra = Bastions(app, stack_prefix + 'bastion-hosts', network.vpc, env=env)

app.synth()
27 changes: 27 additions & 0 deletions cdk/opensearch-website-search/cdk.context.json
{
  "stack_prefix": "test-one-",
  "cluster_stack_name": "opensearch-cluster",
  "network_stack_name": "network",
  "search_access_stack_name": "gateway-lambda",
  "monitoring_stack_name": "monitoring",
  "cidr": "10.9.0.0/21",
  "distribution": "tar",
  "keypair": "",
  "architecture": "x64",
  "ami_id": "",
  "url": "https://artifacts.opensearch.org/releases/bundle/opensearch/1.0.0/opensearch-1.0.0-linux-x64.tar.gz",
  "dashboards_url": "https://artifacts.opensearch.org/releases/bundle/opensearch-dashboards/1.0.0/opensearch-dashboards-1.0.0-linux-x64.tar.gz",
  "master_node_count": "2",
  "data_node_count": "3",
  "client_node_count": "3",
  "nlb_opensearch_port": "80",
  "nlb_dashboards_port": "5601",
  "security": "enable",
  "search_user": "",
  "search_pass": "",
  "monitoring_user": "",
  "monitoring_pass": "",
  "allowed_origins": [
    "*"
  ]
}
13 changes: 13 additions & 0 deletions cdk/opensearch-website-search/cdk.json
{
  "app": "python3 app.py",
  "context": {
    "@aws-cdk/core:enableStackNameDuplicates": "true",
    "aws-cdk:enableDiffNoFail": "true",
    "@aws-cdk/core:stackRelativeExports": "true",
    "@aws-cdk/aws-ecr-assets:dockerIgnoreSupport": true,
    "@aws-cdk/aws-secretsmanager:parseOwnedSecretName": true,
    "@aws-cdk/aws-kms:defaultKeyPolicies": true,
    "@aws-cdk/aws-s3:grantWriteWithoutAcl": true,
    "@aws-cdk/aws-ecs-patterns:removeDefaultDesiredCount": true
  }
}
Empty file.
import json
import boto3
import requests
import os
from requests.auth import HTTPBasicAuth

username = os.getenv('MONITORING_USER')
password = os.getenv('MONITORING_PASS')
nlb_endpoint = os.getenv('NLB_ENDPOINT')
nlb_opensearch_port = os.getenv('NLB_OPENSEARCH_PORT', "80")
nlb_dashboards_port = os.getenv('NLB_DASHBOARDS_PORT', "5601")
opensearch_base_url = 'http://' + nlb_endpoint + ':' + nlb_opensearch_port
dashboards_base_url = 'http://' + nlb_endpoint + ':' + nlb_dashboards_port

http_basic_auth = HTTPBasicAuth(username, password)

"""
MetricData=[
{
'MetricName': 'string',
'Dimensions': [
{
'Name': 'string',
'Value': 'string'
},
],
'Timestamp': datetime(2015, 1, 1),
'Value': 123.0,
'StatisticValues': {
'SampleCount': 123.0,
'Sum': 123.0,
'Minimum': 123.0,
'Maximum': 123.0
},
'Values': [
123.0,
],
'Counts': [
123.0,
],
'Unit': 'Seconds'|'Microseconds'|'Milliseconds'|'Bytes'|'Kilobytes'|'Megabytes'|'Gigabytes'|'Terabytes'|'Bits'|'Kilobits'|'Megabits'|'Gigabits'|'Terabits'|'Percent'|'Count'|'Bytes/Second'|'Kilobytes/Second'|'Megabytes/Second'|'Gigabytes/Second'|'Terabytes/Second'|'Bits/Second'|'Kilobits/Second'|'Megabits/Second'|'Gigabits/Second'|'Terabits/Second'|'Count/Second'|'None',
'StorageResolution': 123
},
]
"""


class CWMetricData():

    def __init__(self, namespace):
        self.metric_data = []
        self.namespace = namespace

    def add(self, metric):
        self.metric_data.append(metric)

    def get_all_metrics(self):
        return self.metric_data

    def get_namespace(self):
        return self.namespace


def check_cluster_health(metric_data):
    # TODO: whether cluster is RED, YELLOW or GREEN, this is not bullet proof code
    cluster_health_url = opensearch_base_url + '/_cluster/health?pretty'
    # ES 6.x requires an explicit Content-Type header
    headers = {"Content-Type": "application/json"}

    val = 0
    try:
        res = requests.get(cluster_health_url, auth=http_basic_auth, headers=headers, verify=False)
        health = res.json()
        if health['status'] == 'green':
            val = 1
    except Exception as e:
        print(e)

    metric_data.add({
        'MetricName': 'ClusterHealth',
        'Value': val,
        'Unit': 'Count',
        'StorageResolution': 60
    })


def handler(event, context):
    cw = boto3.client("cloudwatch")
    metrics = CWMetricData("opensearch-website-search")
    check_cluster_health(metrics)
    cw.put_metric_data(Namespace=metrics.get_namespace(),
                       MetricData=metrics.get_all_metrics())


requests==2.26.0

Empty file.