Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Welcome to Leverage's documentation! Here you will find the concepts you need to understand to work with our stack, the steps to try Leverage by yourself, and the extensive documentation about every aspect of our solution.
Now that you know the basic concepts about Leverage, feel free to give it a try or check out the User Guide section to go deeper into the implementation details. Links down below:
Leverage was built around the AWS Well Architected Framework and it uses a stack that includes Terraform, Ansible, Helm and other tools.
We are also adopters and supporters of Kubernetes and the Cloud Native movement, which should become self-evident as you keep exploring our technology stack.
"},{"location":"concepts/our-tech-stack/#why-did-we-choose-our-tech-stack","title":"Why did we choose our tech stack?","text":"Why AWS\u2753
Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.
Build, Deploy, and Manage Websites, Apps or Processes On AWS' Secure, Reliable Network. AWS is Secure, Reliable, Scalable Services. HIPAA Compliant. Easily Manage Clusters. Global Infrastructure. Highly Scalable.
Read More: What is AWS
Why WAF (Well Architected Framework)❓
AWS Well-Architected helps cloud architects to build secure, high-performing, resilient, and efficient infrastructure for their applications and workloads. Based on five pillars — operational excellence, security, reliability, performance efficiency, and cost optimization — AWS Well-Architected provides a consistent approach for customers and partners to evaluate architectures, and implement designs that can scale over time.
Read More: AWS Well-architected
Why Infra as Code (IaC) & Terraform❓
Confidence: A change breaks the env? Just roll it back. Still not working? Build a whole new env with a few keystrokes. IaC enables this.
Repeatability: Allows your infra to be automatically instantiated, making it easy to build multiple identical envs.
Troubleshooting: Check source control and see exactly what changed in the env. As long as you are diligent and don't make manual env changes, IaC can be a game changer.
DR: Disaster recovery requires the ability to set up an alternate env in a different DC or Region. IaC makes this a much more manageable prospect.
Auditability: You will need to be able to audit both changes and access to an env; IaC gives you this right out of the box.
Visibility: As an env expands over time, it is challenging to tell what has been provisioned. In the cloud this can become a huge cost issue. IaC allows you to track your resources.
Portability: Some IaC technologies are multi-cloud. Also, translating Terraform from one cloud provider to another is considerably simpler than recreating your entire envs in a cloud-specific tool.
Security: Seeing the history of changes to your SG rules, along with commit messages, can do wonders for your confidence in the security configs of your envs.
Terraform allows you to codify your application infrastructure, reduce human error, and increase automation by provisioning infrastructure as code. With Terraform we can manage infrastructure across 300+ public clouds and services using a single workflow. Moreover, it helps to create reproducible infrastructure and to provision consistent testing, staging, and production environments with the same configuration.
Terraform has everything we expect from an IaC framework: an open source, cloud-agnostic provisioning tool that supports immutable infrastructure, a declarative language, and a client-only architecture.
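As a minimal illustration of that declarative approach (a generic sketch, not taken from the Leverage modules; the bucket name and tags are placeholders):

provider "aws" {
  region = "us-east-1"
}

# The desired state is declared once; Terraform computes the plan needed to reach it,
# which is what makes environments repeatable, auditable and easy to roll back.
resource "aws_s3_bucket" "example" {
  bucket = "myexample-demo-bucket" # placeholder name, must be globally unique

  tags = {
    Terraform   = "true"
    Environment = "dev"
  }
}

Running terraform plan shows the diff against the current state, and terraform apply converges the environment to it.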
Read More
Why Infrastructure as Code
Why Terraform by Gruntwork
Why Organizations❓
AWS Organizations helps you centrally manage and govern your environment as you grow and scale your AWS resources. Using AWS Organizations, you can programmatically create new AWS accounts and allocate resources, group accounts to organize your workflows, apply policies to accounts or groups for governance, and simplify billing by using a single payment method for all of your accounts.
Read More
How it works: AWS Organizations
AWS Organizations
Why IAM and roles❓
AWS Identity and Access Management (IAM) enables you to manage access to AWS services and resources securely. Using IAM, you can create and manage AWS users and groups, and use permissions to allow and deny their access to AWS resources.
Integration and Fine-grained access control with almost every AWS service and its resources.
Multi-factor authentication for highly privileged users.
Raise your security posture with AWS infrastructure and services. Using AWS, you will gain the control and confidence you need to securely run your business with the most flexible and secure cloud computing environment available today. As an AWS customer, you will benefit from AWS data centers and a network architected to protect your information, identities, applications, and devices. With AWS, you can improve your ability to meet core security and compliance requirements, such as data locality, protection, and confidentiality with our comprehensive services and features.
Read More
How it works: AWS Security
AWS Cloud Security
Why VPC❓
Amazon Virtual Private Cloud (Amazon VPC) is a service that lets you launch AWS resources in a logically isolated virtual network that you define. You have complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways. You can use both IPv4 and IPv6 for most resources in your virtual private cloud, helping to ensure secure and easy access to resources and applications.
Read More
How it works: AWS Networking
AWS Virtual Private Cloud
Why Kubernetes (K8s) & AWS EKS❓
Kubernetes, also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery. Kubernetes builds upon 15 years of experience of running production workloads at Google, combined with best-of-breed ideas and practices from the community.
Amazon Elastic Kubernetes Service (Amazon EKS) gives you the flexibility to start, run, and scale Kubernetes applications in the AWS cloud or on-premises. Amazon EKS helps you provide highly-available and secure clusters and automates key tasks such as patching, node provisioning, and updates. Customers such as Intel, Snap, Intuit, GoDaddy, and Autodesk trust EKS to run their most sensitive and mission critical applications.
EKS runs upstream Kubernetes and is certified Kubernetes conformant for a predictable experience. You can easily migrate any standard Kubernetes application to EKS without needing to refactor your code.
Read More
How it works: AWS EKS
AWS EKS
Kubernetes
Why S3❓
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as data lakes, websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. Amazon S3 provides easy-to-use management features so you can organize your data and configure finely-tuned access controls to meet your specific business, organizational, and compliance requirements. Amazon S3 is designed for 99.999999999% (11 9's) of durability, and stores data for millions of applications for companies all around the world.
Read More
How it works: AWS Storage
AWS S3
Why RDS❓
Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups. It frees you to focus on your applications so you can give them the fast performance, high availability, security and compatibility they need.
Amazon RDS is available on several database instance types - optimized for memory, performance or I/O - and provides you with six familiar database engines to choose from, including Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server. You can use the AWS Database Migration Service to easily migrate or replicate your existing databases to Amazon RDS.
Read More
How it works: AWS Databases
AWS RDS
Why Hashicorp Vault❓
As many organizations migrate to the public cloud, a major concern has been how to best secure data, preventing it from unauthorized access or exfiltration.
Deploying a product like HashiCorp Vault gives you better control of your sensitive credentials and helps you meet cloud security standards.
HashiCorp Vault is designed to help organizations manage access to secrets and transmit them safely within an organization. Secrets are defined as any form of sensitive credentials that need to be tightly controlled and monitored and can be used to unlock sensitive information. Secrets could be in the form of passwords, API keys, SSH keys, RSA tokens, or OTP.
HashiCorp Vault makes it very easy to control and manage access by providing you with a unified interface to manage every secret in your infrastructure. Not only that, you can also create detailed audit logs and keep track of who accessed what.
Manage Secrets and Protect Sensitive Data. Secure, store and tightly control access to tokens, passwords, certificates, encryption keys for protecting secrets and other sensitive data using a UI, CLI, or HTTP API.
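For instance, once a Vault server is running and you are authenticated, storing and reading a secret from the CLI looks roughly like this (a generic sketch using the KV secrets engine; the secret/ mount path and the values are placeholder assumptions):

vault kv put secret/myapp/db username='app' password='s3cr3t'   # write a secret
vault kv get secret/myapp/db                                    # read it back
vault kv get -field=password secret/myapp/db                    # fetch a single field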
Read More
How it works: Secrets
Hashicorp Vault Project
"},{"location":"concepts/what-is-leverage/","title":"What is Leverage?","text":"
Leverage was made out of a significant amount of knowledge, acquired through several years of experience, turned into an ecosystem of code, tools, and workflows that enables you to build the AWS infrastructure for your applications and services quickly and securely.
Since all the code and modules are already built, we can get you up and running up to 10x faster than a consulting company -- typically in just a few weeks! -- and on top of code that is thoroughly documented, tested, and has been proven in production at dozens of other project deployments.
Our focus is on creating reusable, high quality Cloud Infrastructure code, through our core components:
Reference Architecture: Designed with optimal configs for the needs of the most popular modern web and mobile applications. Its design is fully based on the AWS Well Architected Framework.
Infrastructure as Code (IaC) Library: A collection of reusable, tested, production-ready E2E AWS Cloud infrastructure as code solutions, leveraged by modules written in: Terraform, Ansible, Helm charts, Dockerfiles and Makefiles.
Leverage CLI: the project's command line tool. It provides the means to interact with and deploy the Leverage Reference Architecture on AWS and, if needed, allows you to define custom tasks to run.
Check out this intro video that explains what Leverage is in less than 5 minutes:
"},{"location":"concepts/what-leverage-can-do-for-you/","title":"What can Leverage do for you?","text":"
Still not convinced? Check out the following sections, which describe what Leverage can bring to the table depending on your role in a company.
"},{"location":"concepts/what-leverage-can-do-for-you/#leverage-for-cios-ctos-and-vps-of-engineering","title":"Leverage for CIOs, CTOs and VPs of Engineering","text":"Accelerate development and optimize costs
Annual cost savings are a new standard and best practice. Profits are being targeted at business development, regulatory and compliance needs, resulting in reduced pressure on IT and development budgets and granting the opportunity to focus on new features and boost innovation.
Modernize applications architecture (loosely coupled and modular)
Strategically decompose the monolith into a fine-grained, loosely coupled modular architecture to increase both development and business agility. When the system architecture is designed to allow teams to test, deploy and change systems without relying on other teams, they require little communication to get the job done. In other words, both the architecture and the teams are loosely coupled.
Innovation - Rapidly adopt new technologies and reduce development time
Use the Leverage Reference Architecture for AWS plus our libraries to get a collection of cloud application architecture components to build and deploy faster in the cloud. Building a cloud Landing Zone is complex, especially since most companies have little or no expertise in this area, and it can take a significant amount of time to get it right. Leverage gives you an AWS Landing Zone that provides consistent and solid "foundations" to bootstrap your project in the cloud. The code solution implements the best AWS Well-Architected Framework practices as well as the battle-tested tech experience and years of knowledge of our contributors.
Hours or days, not weeks or months
Leverage implements infrastructure as code at all times. We have rolled this out using Terraform, and it has been fully proven in AWS and with the other Terraform providers that are part of our reference architecture, like Kubernetes, Helm and HashiCorp Vault. By using the Leverage CLI, our binary will help you quickly bootstrap your AWS Landing Zone in a matter of hours (or at most a few days).
It's not just a pile of scripts
It's not just another layer of untested, one-time, stand-alone scripts. The code is modularized and well designed under best practices; our Leverage CLI has both unit and integration tests, while our Terraform code has been extensively E2E tested. Moreover, 100% of the code is yours (to modify, extend, reuse, etc.), with no vendor lock-in and no vendor licensing fees. We use the MIT license, so you can take the code, modify it and use it as your private code. All we ask in return is a friendly greeting and that you (if possible) consider contributing to the binbash Leverage project. Implement Leverage yourself or we can deploy it for you!
DevOps culture and methodologies
Team agility and continuous improvement based on feedback loops are some of the main drivers of cloud adoption, and IaC's goal of making the deployment of both infrastructure and applications frequent and low-friction is one of the most important aspects of DevOps practices. We continue to apply these methodologies to achieve a DevOps-first culture. We have experienced and demonstrated their potential and have practiced them in dozens of projects over the past 5 years. The Leverage Reference Architecture for AWS combines a set of application best practices, technology patterns and a common CI/CD deployment approach through the Leverage CLI for all your application environments. As a result, we are pursuing world-class software delivery performance through optimized collaboration, communication, reliability, stability, scalability and security at ever-decreasing cost and effort.
Repeatable, composable and extensible immutable infrastructure
The best high-performance development teams create and recreate their development and production environments using infrastructure as code (IaC) as part of their daily development processes. The Leverage CLI allows you to build repeatable and immutable infrastructure, so your cloud development, staging and production environments will consistently be the same.
"},{"location":"concepts/what-leverage-can-do-for-you/#leverage-for-devops-engineers-cloud-architects-and-software-engineers","title":"Leverage for DevOps Engineers, Cloud Architects and Software Engineers","text":"Provisioning infrastructure as code (Iac)
Instead of manually provisioning infrastructure, the real benefits of cloud adoption come from orchestrating infrastructure through code. However, this is really challenging to achieve: there are literally thousands of tiny things and configs to consider, and they all seem to take forever. Our experience is that it can take teams up to 24 months to achieve a desired infra state in AWS. By using Leverage you could get your AWS Landing Zone in a few weeks, or your entire AWS Well-Architected based cloud solution within 1 to 3 months (depending on your project's complexity and needs).
We've done it before (don't reinvent the wheel)
Often, development teams have similar and recurring requests such as: IAM, networking, security, storage, databases, compute and secret management, etc. binbash Leverage has been proven in dozens of projects to create software-defined (IaC) AWS environments.
Best practices baked in the code
Leverage provides an IaC reference architecture for AWS hosted application infrastructure. This is baked into the code as a combination of the best AWS Well-Architected Framework practices and the experience of having successfully onboarded many customers to the AWS cloud.
On-demand infra deployment
Leverage provides your DevOps, Cloud, SRE and Development teams with the ability to provision on-demand infrastructure, while ensuring that it will meet the rigorous security requirements of modern cloud native best practices. It fully implements the AWS Well-Architected Framework (WAF) and DevOps best practices, including collaboration, version control, CI/CD, continuous testing, cloud infrastructure and loosely coupled architectures.
Easier to support and maintain
Leverage's IaC approach significantly reduces your AWS infra deployment, config and support burden and reduces risk. Our code-backed provisioning has been rigorously tested many times, eliminating the possibility of manual errors. Because the entire infrastructure is deployed from the same proven code, the consistency of your cloud environments will simplify your setup and maintenance. Use the versioned code to iterate and improve, extend or compose your internal processes as your cloud operating model evolves.
There is no vendor lock-in. You own the solution
With Leverage you own 100% of the code, with no lock-in clauses. If you choose to drop Leverage, you will still have your entire AWS cloud infrastructure, and all of its cloud native infrastructure code (Terraform, Helm, Ansible, Python), that you can access and manage. It's 100% Open Source on GitHub and free to use with no strings attached under the MIT license (no licensing fees), and you are free to commercially and privately use, distribute and modify it.
Consistent environments (Dev/prod parity)
Keep your development, staging, and production cloud envs in parity. Infrastructure as code allows us to define and provision all infrastructure components (think networks, load balancers, databases, security, compute and storage, etc.) using code. Leverage uses Terraform as the IaC language to deploy and set up all the AWS, Kubernetes and HashiCorp Vault resources (it has support for multiple cloud and technology providers). Backed by code, your cloud environments are built exactly the same way every time. Finally, this results in no differences between development, staging and production.
Development in production like envs
IaC allows your development team to deploy and test the AWS infrastructure as if it were application code. Your development is always done in production-like environments. Provision your cloud test and sandbox environments on demand and tear them down when all your testing is complete. Leverage takes all the pain out of maintaining production-like environments, with stable infra releases. It eliminates the unpredictability of wondering if what actually worked in your development envs will work in production.
By implementing our Reference Architecture for AWS and the Infrastructure as Code (IaC) Library via Leverage CLI, you will get your entire Cloud Native Application Infrastructure deployed in only a few weeks.
Did you know?
You can roll out Leverage by yourself or we can implement it for you!
"},{"location":"concepts/why-leverage/#the-problem-and-our-solution","title":"The problem and our solution","text":""},{"location":"concepts/why-leverage/#what-are-the-problems-you-might-be-facing","title":"What are the problems you might be facing?","text":"Figure: Why Leverage? The problem. (Source: binbash, \"Leverage Presentation: Why you should use Leverage?\", accessed June 15th 2021)."},{"location":"concepts/why-leverage/#what-is-our-solution","title":"What is our solution?","text":"Figure: Why Leverage? The solution. (Source: binbash, \"Leverage Presentation: Why you should use Leverage?\", accessed June 15th 2021)."},{"location":"es/bienvenido/","title":"Bienvenido","text":""},{"location":"es/bienvenido/#proximamente","title":"Pr\u00f3ximamente","text":""},{"location":"how-it-works/ref-architecture/","title":"How it works","text":""},{"location":"how-it-works/ref-architecture/#how-it-works","title":"How it works","text":"
The objective of this document is to explain how the binbash Leverage Reference Architecture for AWS works, in particular how the Reference Architecture model is built and why we need it.
This documentation contains all the guidelines to create binbash Leverage Reference Architecture for AWS that will be implemented on the Projects' AWS infrastructure.
We're assuming you already have your AWS Landing Zone in place, based on the First Steps guide.
Our Purpose
Democratize advanced technologies: As complex as it may sound, the basic idea behind this design principle is simple. It is not always possible for a business to maintain a capable in-house IT department while staying up to date. It is entirely feasible to set up your own cloud computing ecosystem from scratch without experience, but that would take a considerable amount of resources; it is definitely not the most efficient way to go.
An efficient, business-minded way to go is to employ AWS as a service, which allows organizations to benefit from the advanced technologies integrated into AWS without learning, researching, or creating teams specifically for those technologies.
Info
This documentation will provide a detailed reference of the tools and techs used, the needs they address and how they fit with the multiple practices we will be implementing.
AWS Regions: Multi Region setup → 1ry: us-east-1 (N. Virginia) & 2ry: us-west-2 (Oregon).
Repositories & Branching Strategy
The necessary DevOps repositories will be created. The Consultant will use a trunk-based branching strategy with short-lived feature branches (feature/ID-XXX -> master), and members from either the Consultant or the Client will be reviewers of every code delivery to said repositories (at least 1 approver per Pull Request).
Infra as code deployments should run from the new feature/ID-XXX or master branch. The feature/ID-XXX branch must be merged as soon as possible via PR to the master branch.
Consideration: validating that the changes within the code will only affect the desired target resources is the responsibility of the executor (to ensure everything is OK, please consider executing only after the PR has been reviewed and approved).
Infra as Code + GitOps
After deployment via IaC (Terraform, Ansible & Helm) all subsequent changes will be performed via versioned controlled code, by modifying the corresponding repository and running the proper IaC Automation execution.
All AWS resources will be deployed via Terraform, and only occasionally via CloudFormation, the Python SDK or the AWS CLI when the resource is not supported by Terraform (a rare scenario). All code and scripts will be included in the repository. We'll start the process via local workstations. Afterwards, full execution automation will be considered via GitHub Actions, GitLab Pipelines or an equivalent preferred service.
Consideration: Note that any change performed manually will generate inconsistencies in the deployed resources (which leaves them out of governance and support scope).
Server OS provisioning: Provisioning via Ansible for resources that need to be provisioned on an OS.
Containers Orchestration: Orchestration via Terraform + Helm Charts for resources that need to be provisioned in Kubernetes (with Docker as preferred container engine).
Pre-existing AWS Accounts: All resources will be deployed in several new AWS accounts created inside the Client AWS Organization. Except for the AWS Legacy Account invitation to the AWS Org and OrganizationAccountAccessRole creation in it, there will be no intervention whatsoever in Client Pre-existing accounts, unless required by Client authority and given a specific requirement.
Info
We will explore the details of all the relevant Client application stacks, CI/CD processes, monitoring, security, target service level objective (SLO) and others in a separate document.
"},{"location":"try-leverage/","title":"Index","text":""},{"location":"try-leverage/#try-leverage","title":"Try Leverage","text":""},{"location":"try-leverage/#before-you-begin","title":"Before you begin","text":"
The objective of this guide is to introduce you to our binbash Leverage Reference Architecture for AWS workflow through the complete deployment of a basic landing zone configuration.
The Leverage Landing Zone is the smallest possible fully functional configuration. It lays out the base infrastructure required to manage the environment: billing and financial management, user management, security enforcement, and shared services and resources, always following the best practices laid out by the AWS Well-Architected Framework to ensure quality and to provide a solid base to build upon. This is the starting point from which any Leverage user can and will develop all the features and capabilities they may require to satisfy their specific needs.
Figure: Leverage Landing Zone architecture components diagram."},{"location":"try-leverage/#about-this-guide","title":"About this guide","text":"
In this guide you will learn how to:
Create and configure your AWS account.
Work with the Leverage CLI to manage your credentials, infrastructure and the whole Leverage stack.
Prepare your local environment to manage a Leverage project.
Orchestrate the project's infrastructure.
Configure your users' credentials to interact with the project.
Upon completion of this guide you will gain an understanding of the structure of a project as well as familiarity with the tooling used to manage it.
To begin your journey into creating your first Leverage project, continue to the next section of the guide where you will start by setting up your AWS account.
"},{"location":"try-leverage/add-aws-accounts/","title":"Add more AWS Accounts","text":""},{"location":"try-leverage/add-aws-accounts/#brief","title":"Brief","text":"
You can add new AWS accounts to your Leverage project by following the steps in this page.
Important
In the examples below, we will be using apps-prd as the account we will be adding and it will be created in the us-east-1 region.
"},{"location":"try-leverage/add-aws-accounts/#create-the-new-account-in-your-aws-organization","title":"Create the new account in your AWS Organization","text":"
Go to management/global/organizations.
Edit the locals.tf file to add the account to the local accounts variable.
Note that the apps organizational unit (OU) is being used as the parent OU of the new account. If you need to use a new OU, you can add it to the organizational_units variable in the same file.
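As a rough sketch of the new entry (the attribute names and email address below are illustrative assumptions; follow the structure your locals.tf already uses for the existing accounts):

accounts = {
  # ... existing accounts ...
  apps-prd = {
    email     = "aws+apps-prd@yourcompany.com" # placeholder email alias
    parent_ou = "apps"                         # assumes the apps OU already exists
  }
}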
Run the Terraform workflow to apply the new changes. Typically that would be this:
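Presumably the standard layer sequence used throughout this guide, run from within management/global/organizations (a sketch; your workflow may add a plan step):

leverage terraform init
leverage terraform apply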
Note this layer was first applied earlier, using the bootstrap user. Now that we are working with SSO, credentials have changed. So, if this is the first account you add, you'll probably get this error when applying: "Error: error configuring S3 Backend: no valid credential sources for S3 Backend found." In this case running leverage tf init -reconfigure will fix the issue.
Add the new account to the <project>/config/common.tfvars file. The new account ID should have been displayed in the output of the previous step, e.g.:
aws_organizations_account.accounts[\"apps-prd\"]: Creation complete after 14s [id=999999999999]\n
Note the id, 999999999999.
...so please grab it from there and use it to update the file as shown below:
accounts = {

  [...]

  apps-prd = {
    email = "<aws+apps-prd@yourcompany.com>",
    id    = "<add-the-account-id-here>"
  }
}
Since you are using SSO in this project, permissions on the new account must be granted before we can move forward. Add the right permissions to the management/global/sso/account_assignments.tf file. For the example:
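A hypothetical sketch of such an assignment (the surrounding structure, module outputs and group names in your account_assignments.tf will likely differ; treat every identifier below as a placeholder):

  {
    account             = var.accounts["apps-prd"].id
    permission_set_name = "DevOps"
    permission_set_arn  = module.permission_sets.permission_sets["DevOps"].arn
    principal_type      = "GROUP"
    principal_name      = "DevOps"
  },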
Note that your needs can vary; these permissions are just an example, so please be careful with what you are granting here.
Apply these changes:
leverage terraform apply\n
And you must update your AWS config file accordingly by running this:
leverage aws configure sso\n
Good! Now you are ready to create the initial directory structure for the new account. The next section will guide through those steps.
"},{"location":"try-leverage/add-aws-accounts/#create-and-deploy-the-layers-for-the-new-account","title":"Create and deploy the layers for the new account","text":"
In this example we will create the apps-prd account structure by using the shared account as a template.
"},{"location":"try-leverage/add-aws-accounts/#create-the-initial-directory-structure-for-the-new-account","title":"Create the initial directory structure for the new account","text":"
Ensure you are at the root of this repository
Now create the directory structure for the new account:
mkdir -p apps-prd/{global,us-east-1}\n
Set up the config files:
Create the config files for this account:
cp -r shared/config apps-prd/config\n
Open apps-prd/config/backend.tfvars and replace any occurrences of shared with apps-prd.
Do the same with apps-prd/config/account.tfvars
"},{"location":"try-leverage/add-aws-accounts/#create-the-terraform-backend-layer","title":"Create the Terraform Backend layer","text":"
If the source layer was already initialized you should delete the previous Terraform setup using sudo rm -rf .terraform* in the target layer's directory, e.g. rm -rf apps-prd/us-east-1/base-tf-backend/.terraform*
Go to the apps-prd/us-east-1/base-tf-backend directory, open the config.tf file and comment the S3 backend block. E.g.:
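A sketch of how the commented block could look (the exact key value comes from your generated config.tf; treat this as illustrative only):

terraform {
  # backend "s3" {
  #   key = "apps-prd/base-tf-backend/terraform.tfstate"
  # }
}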
To finish with the backend layer, re-init to move the tfstate to the new location. Run:
leverage terraform init\n
Terraform will detect that you are trying to move from a local to a remote state and will ask for confirmation.
Initializing the backend...
Acquiring state lock. This may take a few moments...
Do you want to copy existing state to the new backend?
  Pre-existing state was found while migrating the previous "local" backend to the
  newly configured "s3" backend. No existing state was found in the newly
  configured "s3" backend. Do you want to copy this state to the new "s3"
backend? Enter "yes" to copy and "no" to start with an empty state.

  Enter a value:
Enter yes and hit enter.
"},{"location":"try-leverage/add-aws-accounts/#create-the-security-base-layer","title":"Create the security-base layer","text":"
Copy the layer from an existing one: From the repository root run:
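Presumably something along these lines (paths inferred from the directories referenced in this section):

cp -r shared/us-east-1/security-base apps-prd/us-east-1/security-base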
If the source layer was already initialized you should delete the previous Terraform setup using sudo rm -rf .terraform* in the target layer's directory, e.g. rm -rf apps-prd/us-east-1/security-base/.terraform*
Go to the apps-prd/us-east-1/security-base directory and open the config.tf file, replacing any occurrences of shared with apps-prd. E.g. this line should be:
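An illustrative guess at the kind of line that changes (the exact profile/key naming comes from your generated config.tf; the pattern below mirrors the provider profile shown later in this page):

  profile = "${var.project}-apps-prd-devops"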
If the source layer was already initialized you should delete the previous Terraform setup using sudo rm -rf .terraform* in the target layer's directory, e.g. rm -rf apps-prd/us-east-1/base-network/.terraform*
Go to the apps-prd/us-east-1/base-network directory and open the config.tf file replacing any occurrences of shared with apps-prd. E.g. this line should be:
Note that here only two AZs are enabled; if needed, uncomment the other ones in the three structures.
Do not overlap CIDRs!
Be careful when choosing CIDRs. Avoid overlapping CIDRs between accounts. If you need a reference on how to choose the right CIDRs, please see here.
Calculate CIDRs
To calculate CIDRs you can check this playbook.
Init and apply the layer
leverage tf init\nleverage tf apply\n
Create the VPC Peering between the new account and the VPC of the Shared account. Edit the file shared/us-east-1/base-network/config.tf and add a provider and a remote state data source for the created account.
provider \"aws\" {\nalias = \"apps-prd\"\nregion = var.region\nprofile = \"${var.project}-apps-prd-devops\"\n}\n\ndata \"terraform_remote_state\" \"apps-prd-vpcs\" {\nfor_each = {\nfor k, v in local.apps-prd-vpcs :\nk => v if !v[\"tgw\"]\n}\n\nbackend = \"s3\"\n\nconfig = {\nregion = lookup(each.value, \"region\")\nprofile = lookup(each.value, \"profile\")\nbucket = lookup(each.value, \"bucket\")\nkey = lookup(each.value, \"key\")\n}\n}\n
Edit the file shared/us-east-1/base-network/locals.tf and add the new account's VPC information under the corresponding locals definition (e.g. an apps-prd-vpcs entry mirroring the existing ones).
Edit the file shared/us-east-1/base-network/vpc_peerings.tf (if this is your first added account the file won't exist, please create it) and add the peering definition:
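A generic sketch of what such a definition could look like (the actual Leverage layer most likely uses its own module and variable names; every identifier below is illustrative, and the provider alias comes from the config.tf shown above):

resource "aws_vpc_peering_connection" "shared_to_apps_prd" {
  vpc_id        = module.vpc.vpc_id                                                    # assumed output of the shared VPC module
  peer_vpc_id   = data.terraform_remote_state.apps-prd-vpcs["apps-prd"].outputs.vpc_id # assumed key and output name
  peer_owner_id = var.accounts["apps-prd"].id                                          # assumed variable holding the new account id
  auto_accept   = false

  tags = { Name = "shared-to-apps-prd" }
}

resource "aws_vpc_peering_connection_accepter" "apps_prd" {
  provider                  = aws.apps-prd # alias declared in config.tf above
  vpc_peering_connection_id = aws_vpc_peering_connection.shared_to_apps_prd.id
  auto_accept               = true
}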
To keep creating infra on top of this binbash Leverage Landing Zone with this new account added, please check:
Check common use cases in Playbooks
Review the binbash Leverage architecture
Go for EKS!
"},{"location":"try-leverage/aws-account-setup/","title":"Creating your AWS Management account","text":""},{"location":"try-leverage/aws-account-setup/#create-the-first-aws-account","title":"Create the first AWS account","text":"
First and foremost you'll need to create an AWS account for your project.
Attention
Note this will be your management account and has to be called <project-name>-management.
E.g. if your project is called binbash then your account should be binbash-management.
Follow the instructions here.
This will be the management account for your AWS Organization and the email address you use for signing up will be the root user of this account -- you can see this user represented in the architecture diagram.
Since the root user is the main access point to your account it is strongly recommended that you keep its credentials (email, password) safe by following AWS best practices.
Tip
To protect your management account, enabling Multi Factor Authentication is highly encouraged. Also, reviewing the account's billing setup is always a good idea before proceeding.
For more details on setting up your AWS account: Organization account setup guide
"},{"location":"try-leverage/aws-account-setup/#create-a-bootstrap-user-with-temporary-administrator-permissions","title":"Create a bootstrap user with temporary administrator permissions","text":"
Leverage needs a user with temporary administrator permissions in order to deploy the initial resources that will form the foundations you will then use to keep building on. That initial deployment is called the bootstrap process and thus the user required for that is called \"the bootstrap user\".
To create that user, navigate to the IAM page and create a user named mgmt-org-admin following steps 2 and 3 of this leverage doc.
Info
Bear in mind that the page for creating users may change from time to time but the key settings for configuring the bootstrap user are the following:
It must be an IAM user (we won't be using IAM Identity Center for this)
Password can be auto-generated
It requires admin privileges which you can achieve by directly attaching the AdministratorAccess policy to it
There's no need to add the user to any group as it is only a temporary user
Usually the last step of the user creation should present you the following information:
Console sign-in URL
User name
Console password
Make a note of all of these and keep them in a safe place as you will need them in the following steps.
Info
If you are only getting the bootstrap user credentials for someone else in your team or in Binbash's team, then please share that using a secure way (e.g. password management service, GPG keys, etc).
Info
If the user was set up with the option "Force to change password on first login", you should log into the console to do so.
You have successfully created and configured the AWS account for your Leverage project. From now on, almost all interactions with the AWS environment (with few notable exceptions) will be performed via Leverage.
Next, you will setup all required dependencies to work on a Leverage project in your local machine.
Change sso_enabled to true as follows to enable SSO support:
sso_enabled = true\n
Now you need to set the sso_start_url with the right URL. To find that, navigate here: https://us-east-1.console.aws.amazon.com/singlesignon/home -- you should be already logged in to the Management account for this to work. You should see a \"Settings summary\" panel on the right of the screen that shows the \"AWS access portal URL\". Copy that and use it to replace the value in the sso_start_url entry. Below is an example just for reference:
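For instance (the subdomain below is a placeholder; yours will be shown in the Settings summary panel):

sso_start_url = "https://d-xxxxxxxxxx.awsapps.com/start"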
The 'AWS access portal URL' can be customized to use a more friendly name. Check the official documentation for that.
Further info on configuring SSO
There is more information on how to configure SSO here.
"},{"location":"try-leverage/enabling-sso/#update-backend-profiles-in-the-management-account","title":"Update backend profiles in the management account","text":"
It's time to set the right profile names in the backend configuration files. Open this file: management/config/backend.tfvars and change the profile value from this:
profile = \"me-bootstrap\"\n
To this:
profile = \"me-management-oaar\"\n
Please note that in the examples above my short project name is me which is used as a prefix and it's the part that doesn't get replaced."},{"location":"try-leverage/enabling-sso/#activate-your-sso-user-and-set-up-your-password","title":"Activate your SSO user and set up your password","text":"
The SSO users you created when you provisioned the SSO layer need to go through an email activation procedure.
The user is the one you set in the project.yaml file at the beginning, in this snippet:
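Something along these lines (field names are illustrative and may not match your generated template exactly; check the project.yaml created by leverage project init):

users:
- first_name: The
  last_name: Admin
  email: admin@example.com
  groups:
  - administrators
  - devops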
Once SSO users have been activated, they will need to get their initial password so they are able to log in. Check out the steps for that here.
Basically:
Log into your sso_start_url address
Enter your username (the user email)
Under Password, choose Forgot password.
Type in the code shown on the screen
A reset password email will be sent
Follow the link and reset your password
Now, in the same URL as before, log in with the new credentials
You will be prompted to set up an MFA device; just do it.
"},{"location":"try-leverage/enabling-sso/#configure-the-cli-for-sso","title":"Configure the CLI for SSO","text":"
Almost there. Let's try the SSO integration now.
"},{"location":"try-leverage/enabling-sso/#configure-your-sso-profiles","title":"Configure your SSO profiles","text":"
Since this is your first time using it, you will need to configure it by running this:
leverage aws configure sso\n
Follow the wizard to get your AWS config file created for you. There is more info about that here.
"},{"location":"try-leverage/enabling-sso/#verify-on-a-layer-in-the-management-account","title":"Verify on a layer in the management account","text":"
To ensure that worked, let's run a few commands to verify:
We'll use sso for the purpose of this example
Move to the management/global/sso layer
Run: leverage tf plan
You should get this error: "Error: error configuring S3 Backend: no valid credential sources for S3 Backend found."
This happens because so far you have been running Terraform with a different AWS profile (the bootstrap one). Luckily the fix is simple, just run this: leverage tf init -reconfigure. Terraform should reconfigure the AWS profile in the .terraform/terraform.tfstate file.
Now try running that leverage tf plan command again
This time it should succeed, you should see the message: No changes. Your infrastructure matches the configuration.
Note if you still have the same error, try clearing credentials with:
Next, you will orchestrate the remaining accounts, security and shared.
"},{"location":"try-leverage/leverage-project-setup/","title":"Create a Leverage project","text":"
A Leverage project starts with a simple project definition file that you modify to suit your needs. That file is then used to render the initial directory layout which, at the end of this guide, will be your reference architecture. Follow the sections below to begin with that.
The account's name will be given by your project's name followed by -management, since Leverage uses a suffix naming system to differentiate between the multiple accounts of a project. For this guide we'll stick to calling the project MyExample and so, the account name will be myexample-management.
Along the same line, we'll use the example.com domain for the email address used to register the account. Adding a -aws suffix to the project's name to indicate that this email address is related to the project's AWS account, we end up with a registration email that looks like myexample-aws@example.com.
Email addresses for AWS accounts.
Each AWS account requires having a unique email address associated to it. The Leverage Reference Architecture for AWS makes use of multiple accounts to better manage the infrastructure; as such, you will need different addresses for each one. Creating a new email account for each AWS account is not a really viable solution to this problem; a better approach is to take advantage of mail services that support aliases. For information regarding how this works: Email setup for your AWS account.
"},{"location":"try-leverage/leverage-project-setup/#create-the-project-directory","title":"Create the project directory","text":"
Each Leverage project lives in its own working directory. Create a directory for your project as follows:
mkdir myexample\ncd myexample\n
"},{"location":"try-leverage/leverage-project-setup/#initialize-the-project","title":"Initialize the project","text":"
Create the project definition file by running the following command:
$ leverage project init
[18:53:24.407] INFO Project template found. Updating.
[18:53:25.105] INFO Finished updating template.
[18:53:25.107] INFO Initializing git repository in project directory.
[18:53:25.139] INFO No project configuration file found. Dropping configuration template project.yaml.
[18:53:25.143] INFO Project initialization finished.
The command above should create the project definition file (project.yaml) and should initialize a git repository in the current working directory. This is important because Leverage projects by-design rely on specific git conventions and also because it is assumed that you will want to keep your infrastructure code versioned.
"},{"location":"try-leverage/leverage-project-setup/#modify-the-project-definition-file","title":"Modify the project definition file","text":"
Open the project.yaml file and fill in the required information.
Typically the placeholder values between < and > symbols are the ones you would want to edit; however, you are welcome to adjust any other values to suit your needs.
For instance, the following is a snippet of the project.yaml file in which the values for project_name and short_name have been set to example and ex respectively:
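For reference, that part of the file would simply read (all other fields omitted):

project_name: example
short_name: ex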
The project_name field only accepts lowercase alphanumeric characters and allows hyphens('-'). For instance, valid names could be 'example' or 'leveragedemo' or 'example-demo'
The short_name field only accepts 2 to 4 lowercase alpha characters. For instance, valid names could be 'exam', 'leve' or 'ex'.
We typically use us-east-1 as the primary (1ry) and us-west-2 as the secondary (2ry) default regions for the majority of our projects. However, please note that these regions may not be the most fitting choice for your specific use case. For detailed guidance, we recommend following these provided guidelines.
Another example is below. Note that the management, security, and shared accounts have been updated with slightly different email addresses (actually aws+security@example.com and aws+shared@example.com are email aliases of aws@example.com which is a convenient trick in some cases):
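An illustrative sketch of how those entries could look (the exact schema of the accounts section depends on your generated project.yaml; the emails follow the aliasing trick described above):

organization:
  accounts:
  - name: management
    email: aws@example.com
  - name: security
    email: aws+security@example.com
  - name: shared
    email: aws+shared@example.com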
To be able to interact with your AWS environment you first need to configure the credentials to enable AWS CLI to do so. Provide the keys obtained in the previous account creation step to the command by any of the available means.
Manually / File selection / Provide file in command
leverage credentials configure --type BOOTSTRAP\n
[09:37:17.530] INFO Loading configuration file.
[09:37:18.477] INFO Loading project environment configuration file.
[09:37:20.426] INFO Configuring bootstrap credentials.
> Select the means by which you'll provide the programmatic keys: Manually
> Key: AKIAU1OF18IXH2EXAMPLE
> Secret: ****************************************
[09:37:51.638] INFO Bootstrap credentials configured in: /home/user/.aws/me/credentials
[09:37:53.497] INFO Fetching management account id.
[09:37:53.792] INFO Updating project configuration file.
[09:37:55.344] INFO Skipping assumable roles configuration.
leverage credentials configure --type BOOTSTRAP\n
[09:37:17.530] INFO Loading configuration file.
[09:37:18.477] INFO Loading project environment configuration file.
[09:37:20.426] INFO Configuring bootstrap credentials.
> Select the means by which you'll provide the programmatic keys: Path to an access keys file obtained from AWS
> Path to access keys file: ../bootstrap_accessKeys.csv
[09:37:51.638] INFO Bootstrap credentials configured in: /home/user/.aws/me/credentials
[09:37:53.497] INFO Fetching management account id.
[09:37:53.792] INFO Updating project configuration file.
[09:37:55.344] INFO Skipping assumable roles configuration.
"},{"location":"try-leverage/leverage-project-setup/#create-the-configured-project","title":"Create the configured project","text":"
Now you will finally create all the infrastructure definition in the project.
leverage project create\n
[09:40:54.934] INFO Loading configuration file.
[09:40:54.950] INFO Creating project directory structure.
[09:40:54.957] INFO Finished creating directory structure.
[09:40:54.958] INFO Setting up common base files.
[09:40:54.964] INFO Account: Setting up management.
[09:40:54.965] INFO Layer: Setting up config.
[09:40:54.968] INFO Layer: Setting up base-tf-backend.
[09:40:54.969] INFO Layer: Setting up base-identities.
[09:40:54.984] INFO Layer: Setting up organizations.
[09:40:54.989] INFO Layer: Setting up security-base.
[09:40:54.990] INFO Account: Setting up security.
[09:40:54.991] INFO Layer: Setting up config.
[09:40:54.994] INFO Layer: Setting up base-tf-backend.
[09:40:54.995] INFO Layer: Setting up base-identities.
[09:40:55.001] INFO Layer: Setting up security-base.
[09:40:55.002] INFO Account: Setting up shared.
[09:40:55.003] INFO Layer: Setting up config.
[09:40:55.006] INFO Layer: Setting up base-tf-backend.
[09:40:55.007] INFO Layer: Setting up base-identities.
[09:40:55.008] INFO Layer: Setting up security-base.
[09:40:55.009] INFO Layer: Setting up base-network.
[09:40:55.013] INFO Project configuration finished.
               INFO Reformatting terraform configuration to the standard style.
[09:40:55.743] INFO Finished setting up project.
More information on project create
In this step, the directory structure for the project and all definition files are created using the information from the project.yaml file and checked for correct formatting.
You will end up with something that looks like this:
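Roughly like the following (a simplified sketch based on the layers listed in the output above; your actual tree will contain more files):

myexample/
├── config/
├── management/
│   ├── config/
│   ├── global/
│   │   ├── base-identities/
│   │   └── organizations/
│   └── us-east-1/
│       ├── base-tf-backend/
│       └── security-base/
├── security/
│   ├── config/
│   ├── global/
│   │   └── base-identities/
│   └── us-east-1/
│       ├── base-tf-backend/
│       └── security-base/
└── shared/
    ├── config/
    ├── global/
    │   └── base-identities/
    └── us-east-1/
        ├── base-network/
        ├── base-tf-backend/
        └── security-base/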
As you can see, it is a structure comprised of directories for each account containing all the definitions for each of the accounts respective layers.
The layers themselves are also grouped based on the region in which they are deployed. The regions are configured through the project.yaml file. In the case of the Leverage landing zone, most layers are deployed in the primary region, so you can see the definition of these layers in a us-east-1 directory, as per the example configuration.
Some layers are not bound to a region because their definition is mainly comprised of resources for services that are global in nature, like IAM or Organizations. These kinds of layers are kept in a global directory.
You have now created the definition of all the infrastructure for your project and configured the credentials needed to deploy such infrastructure in the AWS environment.
Next, you will orchestrate the first and main account of the project, the management account.
Leverage-based projects are better managed via the Leverage CLI which is a companion tool that simplifies your daily interactions with Leverage. This page will guide you through the installation steps.
Now you have your system completely configured to work on a Leverage project.
Next, you will setup and create your Leverage project.
"},{"location":"try-leverage/management-account/","title":"Configure the Management account","text":"
Finally we reach the point at which you'll actually get to create the infrastructure in your AWS environment.
Some accounts and layers rely on other accounts or layers to be deployed first, which creates dependencies between them and establishes an order in which all layers should be deployed. We will go through these dependencies in order.
The management account is used to configure and access all the accounts in the AWS Organization. Consolidated Billing and Cost Management are also enforced through this account.
Costs associated with this solution
By default this AWS Reference Architecture configuration should not incur any costs.
"},{"location":"try-leverage/management-account/#deploy-the-management-accounts-layers","title":"Deploy the Management account's layers","text":"
To begin, place yourself in the management account directory.
All apply commands will prompt for confirmation, answer yes when this happens.
More information on terraform init and terraform apply
Now, the infrastructure for the Terraform state management is created. The next step is to push the local .tfstate to the bucket. To do this, uncomment the backend section for the terraform configuration in management/base-tf-backend/config.tf
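After uncommenting, the backend definition would look roughly like this (the key value is an illustrative assumption; keep whatever your generated config.tf contains, since the bucket, region and profile are typically supplied via backend.tfvars):

terraform {
  backend "s3" {
    key = "management/base-tf-backend/terraform.tfstate"
  }
}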
The AWS account that you created manually is the management account itself, so to prevent Terraform from trying to create it and error out, this account definition is commented by default in the code. Now you need to make the Terraform state aware of the link between the two. To do that, uncomment the management organizations account resource in accounts.tf
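The import command itself was likely along these lines (the resource address mirrors the one used for other accounts in this layer; replace the placeholder with your management account id):

leverage terraform import 'aws_organizations_account.accounts["management"]' <management-account-id>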
Zsh users may need to prepend noglob to the import command for it to be recognized correctly; as an alternative, square brackets can be escaped as \[\]
"},{"location":"try-leverage/management-account/#update-the-bootstrap-credentials","title":"Update the bootstrap credentials","text":"
Now that the management account has been deployed, and more specifically, all Organizations accounts have been created (in the organizations layer), you need to update the credentials for the bootstrap process before proceeding to deploy any of the remaining accounts.
This will fetch the organizations structure from the AWS environment and create individual profiles associated with each account for the AWS CLI to use. So, run:
Before working on the SSO layer you have to navigate to the AWS IAM Identity Center page, set the region to the primary region you've chosen and enable Single Sign-On (SSO) by clicking on the Enable button.
Now back to the terminal. The SSO layer is deployed in two steps. First, switch to the global/sso directory and run the following:
Now you not only have a fully functional landing zone configuration deployed, but also are able to interact with it using your own AWS SSO credentials.
For more detailed information on the binbash Leverage Landing Zone, visit the links below.
How it works
User guide
"},{"location":"try-leverage/security-and-shared-accounts/","title":"Configure the Security and Shared accounts","text":"
You should by now be more familiar with the steps required to create and configure the Management account. Now you need to do pretty much the same with two more accounts: Security and Shared. Follow the sections in this page to get started!
What are these accounts used for?
The Security account is intended for operating security services (e.g. GuardDuty, AWS Security Hub, AWS Audit Manager, Amazon Detective, Amazon Inspector, and AWS Config), monitoring AWS accounts, and automating security alerting and response.
The Shared Services account supports the services that multiple applications and teams use to deliver their outcomes. Some examples include VPN servers, monitoring systems, and centralized logs management services.
"},{"location":"try-leverage/security-and-shared-accounts/#deploy-the-security-accounts-layers","title":"Deploy the Security account's layers","text":"
The next account to orchestrate is the security account.
This account is intended for centralized user management via an IAM role-based, cross-organization authentication approach. This means that most of the users for your organization will be defined in this account and those users will access the different accounts through this one.
"},{"location":"try-leverage/security-and-shared-accounts/#deploy-the-shared-accounts-layers","title":"Deploy the Shared account's layers","text":"
The last account in this deployment is the shared account.
Again, this account is intended for managing the infrastructure of shared services and resources such as directory services, DNS, VPN, monitoring tools or centralized logging solutions.
You now have a fully deployed landing zone configuration for the Leverage Reference Architecture for AWS, with its three accounts, management, security and shared, ready to be used.
Start/Stop EC2/RDS instances using schedule or manual endpoint
Calculate VPC subnet CIDRs
Kubernetes in different stages
Encrypting/decrypting files with SOPS+KMS
Enable/Disable nat gateway
ArgoCD add external cluster
"},{"location":"user-guide/cookbooks/VPC-subnet-calculator/","title":"How to calculate the VPC subnet CIDRs?","text":"
To calculate subnets, this calculator can be used.
Note in this link a few params were added: the base network and mask, and the division number. In this case the example is for the shared account networking.
Note the main CIDR is being used for the VPC. See on the left how the /20 encompasses all the rows.
Then two divisions for /21. Note the first subnet address of the first row for each one is being used for private_subnets_cidr and public_subnets_cidr.
Finally the /23 are being used for each subnet.
Note we are using the first two subnet addresses for each /21. This is because we are reserving the other two to allow adding more AZs in the future (up to two in this case).
If you want, you can take this page as a reference to select CIDRs for each account.
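If you prefer to compute the same split in code instead of using the online calculator, Terraform's built-in cidrsubnet() function can express it. A small sketch (the 172.18.0.0/20 base is only an example value, following the /20 → /21 → /23 split described above):

locals {
  vpc_cidr = "172.18.0.0/20" # example VPC CIDR

  # Split the /20 into two /21 blocks: private and public.
  private_subnets_cidr = cidrsubnet(local.vpc_cidr, 1, 0) # 172.18.0.0/21
  public_subnets_cidr  = cidrsubnet(local.vpc_cidr, 1, 1) # 172.18.8.0/21

  # Take the first two /23 subnets of each /21; the remaining two are reserved for future AZs.
  private_subnets = [for i in range(2) : cidrsubnet(local.private_subnets_cidr, 2, i)]
  public_subnets  = [for i in range(2) : cidrsubnet(local.public_subnets_cidr, 2, i)]
}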
"},{"location":"user-guide/cookbooks/VPC-with-no-LandingZone/","title":"VPC with no Landing Zone","text":""},{"location":"user-guide/cookbooks/VPC-with-no-LandingZone/#what","title":"What","text":"
Do you want to try binbash Leverage but you are not yet willing to transform your existing infra into the binbash Leverage Landing Zone (honoring the AWS Well Architected Framework)?
With this cookbook you will create a VPC with all the benefits binbash Leverage network layer provides.
If you want to use the Full binbash Leverage Landing Zone please visit the Try Leverage section
This will give you the full power of binbash Leverage and the AWS Well Architected Framework.
Since we are testing, we won't use the S3 backend (we didn't create the bucket, but you can easily do it with the base-tf-backend layer), so comment out this line in the config.tf file:
"},{"location":"user-guide/cookbooks/VPC-with-no-LandingZone/#get-the-layer","title":"Get the layer","text":"
For this step we'll go for a layer that can be found in the binbash Leverage RefArch under this directory.
You can download a directory from a git repository using this Firefox addon or any method you want.
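If you prefer the command line over a browser addon, a sparse checkout is an alternative sketch (the repository URL and the layer path are assumptions based on this cookbook; adjust them to the layer you actually need):
git clone --filter=blob:none --sparse https://github.com/binbashar/le-tf-infra-aws.git\ncd le-tf-infra-aws\ngit sparse-checkout set 'apps-devstg/us-east-1/ec2-fleet-ansible --'\n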
Note that when you copy the layer (e.g. with gitzip), the file common-variables.tf, which is a soft link, was probably copied as a regular file. If this happens, delete it:
cd ec2-fleet-ansible\\ --\nrm common-variables.tf\n
"},{"location":"user-guide/cookbooks/VPC-with-no-LandingZone/#prepare-the-layer","title":"Prepare the layer","text":"
Again, since we are not running the whole binbash Leverage Landing Zone, we need to comment out these lines in config.tf:
...again, due to the lack of the whole binbash Leverage Landing Zone...
If you plan to access the instance from the Internet (EC2 in a public subnet, e.g. to use Ansible), change the first line to \"0.0.0.0/0\" (or better, to a specific public IP).
If you want to add an SSH key (e.g. to use Ansible), you can generate a new SSH key and add a resource like this:
And replace the line in ec2_fleet.tf with this one:
key_name = aws_key_pair.devops.key_name\n
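If you have not generated the key yet, here is a minimal sketch (the file name and comment are arbitrary assumptions); the public key it prints is what the aws_key_pair resource mentioned above expects:
ssh-keygen -t ed25519 -f ~/.ssh/devops -C \"devops\"\n# the public key to paste into the aws_key_pair resource\ncat ~/.ssh/devops.pub\n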
In the same file, change instance_type as per your needs.
You can also add this to the ec2_ansible_fleet resource:
create_spot_instance = true\n
to create spot instances, and this:
create_iam_instance_profile = true\niam_role_description = \"IAM role for EC2 instance\"\niam_role_policies = {\nAmazonSSMManagedInstanceCore = \"arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore\"\n}\n
to add SSM access.
In the locals.tf file, check the multiple_instances variable. The EC2 instances are defined there; by default there are four. Remember to set the subnets in which the instances will be created.
Finally, apply the layer:
leverage tf apply\n
Check your public IP and try to SSH into your new instance!
Have fun!
"},{"location":"user-guide/cookbooks/argocd-external-cluster/","title":"How to add an external cluster to ArgoCD to manage it","text":""},{"location":"user-guide/cookbooks/argocd-external-cluster/#goal","title":"Goal","text":"
Given an ArgoCD installation created with binbash Leverage Landing Zone using the EKS layer, add and manage an external Cluster.
There can be a single ArgoCD instance for all clusters, or multiple instances installed:
We are assuming the binbash Leverage Landing Zone is deployed, two accounts called shared and apps-devstg were created and a region us-east-1 is being used. In any case you can adapt these examples to other scenarios.
Note that all the argocd namespace's ServiceAccounts were added to oidc_fully_qualified_subjects (because different ArgoCD components use different SAs), and they will be able to assume the role ${local.environment}-argocd-devstg (since we are working in shared, the role will be shared-argocd-devstg).
This role lives in the shared account.
Apply the layer:
leverage tf apply\n
Info
Note that this step creates a role and binds it to the in-cluster ServiceAccounts.
"},{"location":"user-guide/cookbooks/argocd-external-cluster/#create-the-target-role-and-change-the-aws_auth-config-map","title":"Create the target role and change the aws_auth config map","text":"
Info
This has to be done in apps-devstg account.
"},{"location":"user-guide/cookbooks/argocd-external-cluster/#create-the-role","title":"Create the role","text":"
Go into the apps-devstg/global/base-identities layer.
In file roles.tf add this resource:
module \"iam_assumable_role_argocd\" {\nsource = \"github.com/binbashar/terraform-aws-iam.git//modules/iam-assumable-role?ref=v4.1.0\"\n\ntrusted_role_arns = [\n\"arn:aws:iam::${var.accounts.shared.id}:root\"\n]\n\ncreate_role = true\nrole_name = \"ArgoCD\"\nrole_path = \"/\"\n\n #\n # MFA setup\n #\nrole_requires_mfa = false\nmfa_age = 43200 # Maximum CLI/API session duration in seconds between 3600 and 43200\nmax_session_duration = 3600 # Max age of valid MFA (in seconds) for roles which require MFA\ncustom_role_policy_arns = [\n]\n\ntags = local.tags\n}\n
Note that MFA is deactivated since this is a programmatic access role. Also, no policies are added since we only need to assume it to access the cluster.
Apply the layer:
leverage tf apply\n
Info
This step will add a role that can be assumed from the shared account.
"},{"location":"user-guide/cookbooks/argocd-external-cluster/#update-the-aws_auth-config-map","title":"Update the aws_auth config map","text":"
cd into layer apps-devstg/us-east-1/k8s-eks/cluster.
Edit the locals.tf file and, under the map_roles list, add this:
"},{"location":"user-guide/cookbooks/argocd-external-cluster/#create-the-external-cluster-in-argocd","title":"Create the external cluster in ArgoCD","text":"
Info
This has to be done in shared account.
In the shared/us-east-1/k8s-eks/k8s-components layer, modify the files cicd-argocd.tf and chart-values/argocd.yaml, adding this to the first one:
##------------------------------------------------------------------------------\n## ArgoCD DEVSTG: GitOps + CD\n##------------------------------------------------------------------------------\nresource \"helm_release\" \"argocd_devstg\" {\ncount = var.enable_argocd_devstg ? 1 : 0\nname = \"argocd-devstg\"\nnamespace = kubernetes_namespace.argocd_devstg[0].id\nrepository = \"https://argoproj.github.io/argo-helm\"\nchart = \"argo-cd\"\nversion = \"6.7.3\"\nvalues = [\ntemplatefile(\"chart-values/argocd.yaml\", {\nargoHost = \"argocd-devstg.${local.environment}.${local.private_base_domain}\"\ningressClass = local.private_ingress_class\nclusterIssuer = local.clusterissuer_vistapath\nroleArn = data.terraform_remote_state.eks-identities.outputs.argocd_devstg_role_arn\nremoteRoleARN = \"role\"\nremoteClusterName = \"clustername\"\nremoteServer = \"remoteServer\"\nremoteName = \"remoteName\"\nremoteClusterCertificate = \"remoteClusterCertificate\"\n}),\n # We are using a different approach here because it is very tricky to render\n # properly the multi-line sshPrivateKey using 'templatefile' function\nyamlencode({\nconfigs = {\nsecret = {\nargocd_devstgServerAdminPassword = data.sops_file.secrets.data[\"argocd_devstg.serverAdminPassword\"]\n}\n # Grant Argocd_Devstg access to the infrastructure repo via private SSH key\nrepositories = {\nwebapp = {\nname = \"webapp\"\nproject = \"default\"\nsshPrivateKey = data.sops_file.secrets.data[\"argocd_devstg.webappRepoDeployKey\"]\ntype = \"git\"\nurl = \"git@github.com:VistaPath/webapp.git\"\n}\n}\n}\n # Enable SSO via Github\nserver = {\nconfig = {\nurl = \"https://argocd_devstg.${local.environment}.${local.private_base_domain}\"\n\"dex.config\" = data.sops_file.secrets.data[\"argocd_devstg.dexConfig\"]\n}\n}\n})\n]\n}\n
This method is simpler than the previous one, but it is also less secure.
It uses a bearer token, which should be rotated periodically (either manually or with a custom process).
Given this diagram:
ArgoCD will call the target cluster directly using the bearer token as authentication.
So, these are the steps:
create a ServiceAccount and its token in the target cluster
create the external cluster in the source cluster's ArgoCD
"},{"location":"user-guide/cookbooks/argocd-external-cluster/#create-the-serviceaccount","title":"Create the ServiceAccount","text":"
Info
This has to be done in apps-devstg account.
There are two ways to grant access: cluster level or namespace scoped.
If namespace scoped, a ServiceAccount, a Role and a RoleBinding are needed to grant ArgoCD access to the target cluster. If cluster level, then a ServiceAccount, a ClusterRole and a ClusterRoleBinding are needed. The former requires the namespaces to be created beforehand; the latter allows ArgoCD to create the namespaces.
In the target cluster identities layer at apps-devstg/us-east-1/k8s-eks/identities, create a tf file and add this:
The following example is for the namespace scoped way.
This step will create a ServiceAccount, a Role with the needed permissions, the RoleBinding and the Secret with the token (or a ClusterRole and ClusterRoleBinding for the cluster level way). Also, multiple namespaces can be specified for the namespace scoped way.
To recover the token and the API Server run this:
NAMESPACE=test\nSECRET=$(leverage kubectl get secret -n ${NAMESPACE} -o jsonpath='{.items[?(@.metadata.generateName==\\\"argocd-managed-\\\")].metadata.name}' | sed -E '/^\\[/d')\nTOKEN=$(leverage kubectl get secret ${SECRET} -n ${NAMESPACE} -o jsonpath='{.data.token}' | sed -E '/^\\[/d' | base64 --decode)\nAPISERVER=$(leverage kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}' | sed -E '/^\\[/d')\n
"},{"location":"user-guide/cookbooks/argocd-external-cluster/#create-the-external-cluster-in-argocd_1","title":"Create the external cluster in ArgoCD","text":"
Info
This has to be done in shared account.
In the shared/us-east-1/k8s-eks/k8s-components layer, modify the files cicd-argocd.tf and chart-values/argocd.yaml, adding this to the first one:
##------------------------------------------------------------------------------\n## ArgoCD DEVSTG: GitOps + CD\n##------------------------------------------------------------------------------\nresource \"helm_release\" \"argocd_devstg\" {\ncount = var.enable_argocd_devstg ? 1 : 0\nname = \"argocd-devstg\"\nnamespace = kubernetes_namespace.argocd_devstg[0].id\nrepository = \"https://argoproj.github.io/argo-helm\"\nchart = \"argo-cd\"\nversion = \"6.7.3\"\nvalues = [\ntemplatefile(\"chart-values/argocd.yaml\", {\nargoHost = \"argocd-devstg.${local.environment}.${local.private_base_domain}\"\ningressClass = local.private_ingress_class\nclusterIssuer = local.clusterissuer_vistapath\nroleArn = data.terraform_remote_state.eks-identities.outputs.argocd_devstg_role_arn\nremoteServer = \"remoteServer\"\nremoteName = \"remoteName\"\nremoteClusterCertificate = \"remoteClusterCertificate\"\nbearerToken = \"bearerToken\"\n}),\n # We are using a different approach here because it is very tricky to render\n # properly the multi-line sshPrivateKey using 'templatefile' function\nyamlencode({\nconfigs = {\nsecret = {\nargocd_devstgServerAdminPassword = data.sops_file.secrets.data[\"argocd_devstg.serverAdminPassword\"]\n}\n # Grant Argocd_Devstg access to the infrastructure repo via private SSH key\nrepositories = {\nwebapp = {\nname = \"webapp\"\nproject = \"default\"\nsshPrivateKey = data.sops_file.secrets.data[\"argocd_devstg.webappRepoDeployKey\"]\ntype = \"git\"\nurl = \"git@github.com:VistaPath/webapp.git\"\n}\n}\n}\n # Enable SSO via Github\nserver = {\nconfig = {\nurl = \"https://argocd_devstg.${local.environment}.${local.private_base_domain}\"\n\"dex.config\" = data.sops_file.secrets.data[\"argocd_devstg.dexConfig\"]\n}\n}\n})\n]\n}\n
Setting clusterResources to false prevents ArgoCD from managing cluster level resources.
The namespaces value scopes the namespaces into which ArgoCD can deploy resources.
Apply the layer:
leverage tf apply\n
Info
This step will create the external-cluster configuration for ArgoCD. Now you can see the cluster in the ArgoCD web UI.
"},{"location":"user-guide/cookbooks/argocd-external-cluster/#deploying-stuff-to-the-target-cluster","title":"Deploying stuff to the target cluster","text":"
To deploy an App to a given cluster, these lines have to be added to the manifest:
"},{"location":"user-guide/cookbooks/enable-nat-gateway/","title":"Enable nat-gateway using binbash Leverage","text":""},{"location":"user-guide/cookbooks/enable-nat-gateway/#goal","title":"Goal","text":"
To activate the NAT Gateway in a VPC created using binbash Leverage Landing Zone.
We are assuming the binbash Leverage Landing Zone is deployed, an account called apps-devstg was created and a region us-east-1 is being used. In any case you can adapt these examples to other scenarios.
If you named the layer differently, set the right directory here.
Check that a file called terraform.auto.tfvars exists. If it does not, create it.
Edit the file and set this content:
vpc_enable_nat_gateway = true\n
Apply the layer as usual:
leverage tf apply\n
"},{"location":"user-guide/cookbooks/enable-nat-gateway/#how-to-disable-the-nat-gateway","title":"How to disable the nat gateway","text":"
Do the same as before but setting this in the tfvars file:
vpc_enable_nat_gateway = false\n
"},{"location":"user-guide/cookbooks/k8s/","title":"Kubernetes for different stages of your projects","text":""},{"location":"user-guide/cookbooks/k8s/#goal","title":"Goal","text":"
When starting a project using Kubernetes, usually a lot of testing is done.
Also, as a startup, the project is trying to save costs (since probably no clients, or just a few, are using the product yet).
To achieve this, we suggest the following path:
Step 0 - develop in a K3s running on an EC2
Step 1 - when starting stress testing or onboarding the first clients, go for KOPS
Step 2 - when HA, scaling and ease of management are needed, consider moving to EKS
We are assuming the binbash Leverage Landing Zone is deployed, an account called apps-devstg was created and a region us-east-1 is being used. In any case you can adapt these examples to other scenarios.
A gossip-cluster (i.e. a cluster not exposed to the Internet; an Internet-exposed cluster can be created using Route53) with a master node and a worker node (with node autoscaling capabilities) will be deployed here.
More master nodes can be deployed (e.g. one per AZ; three are actually recommended for production grade clusters).
It will be something similar to what is stated here, but with one master, one worker, and the LB for the API in the private network.
We are assuming here the worker Instance Group is called nodes. If you change the name or have more than one Instance Group you need to adapt the first tag.
Info
Note that a DNS zone is not needed since this will be a gossip cluster.
Info
A new bucket is created so KOPS can store the state there.
By default, the account base network is used. If you want to change this, check/modify this resource in the config.tf file:
data \"terraform_remote_state\" \"vpc\" {\n
Also, the shared VPC will be used to allow incoming traffic from there. This is because, with the binbash Leverage Landing Zone defaults, the VPN server is created there.
cd into the 1-prerequisites directory.
Open the locals.tf file.
Here these items can be updated:
versions
machine types (and max, min qty for masters and workers autoscaling groups)
the number of AZs that will be used for master nodes.
Remember that binbash Leverage has its own rules for this: the key name should match <account-name>/[<region>/]<layer-name>/<sublayer-name>/terraform.tfstate.
Init and apply as usual:
leverage tf init\nleverage tf apply\n
Warning
You will be prompted to enter the ssh_pub_key_path. Enter the full path to your public SSH key (e.g. /home/user/.ssh/thekey.pub) and hit enter. A key managed by KMS can be used here; a regular key-in-a-file is used for this example, but you can change it as per your needs.
Info
Note that if for some reason the NAT gateway changes, this layer has to be applied again.
Info
Note that the role AWSReservedSSO_DevOps (the one created in the SSO for DevOps) is added as system:masters. If you want to change the role, check the devopsrole in the data.tf file.
"},{"location":"user-guide/cookbooks/k8s/#2-apply-the-cluster-with-kops","title":"2 - Apply the cluster with KOPS","text":"
cd into the 2-kops directory.
Open the config.tf file and edit the backend key if needed:
Remember that binbash Leverage has its own rules for this: the key name should match <account-name>/[<region>/]<layer-name>/<sublayer-name>/terraform.tfstate.
Info
If you want to check the configuration:
make cluster-template\n
The final template is rendered in the cluster.yaml file.
If you are happy with the config (or you are not happy but you think the file is OK), let's create the Terraform files!
make cluster-update\n
Finally, apply the layer:
leverage tf init\nleverage tf apply\n
Cluster can be checked with this command:
make kops-cmd KOPS_CMD=\"validate cluster\"\n
"},{"location":"user-guide/cookbooks/k8s/#accessing-the-cluster","title":"Accessing the cluster","text":"
There are two questions here.
One is how to expose the cluster so Apps running in it can be reached.
The other one is how to access the cluster's API.
For the first one:
since this is a `gossip-cluster` and as per the KOPS docs: When using gossip mode, you have to expose the kubernetes API using a loadbalancer. Since there is no hosted zone for gossip-based clusters, you simply use the load balancer address directly. The user experience is identical to standard clusters. kOps will add the ELB DNS name to the kops-generated kubernetes configuration.\n
So, we need to create a LB with public access.
For the second one, we need to connect to the VPN (we previously set up access to the network being used) and hit the LB. A Load Balancer was deployed along with the cluster so you can reach the K8s API.
"},{"location":"user-guide/cookbooks/k8s/#access-the-api","title":"Access the API","text":"
Run:
make kops-kubeconfig\n
A file named after the cluster is created with the kubeconfig content (it is for the admin user, so keep it safe). Export it and use it!
export KUBECONFIG=$(pwd)/clustername.k8s.local\nkubectl get ns\n
Warning
You have to be connected to the VPN to reach your cluster!
"},{"location":"user-guide/cookbooks/schedule-start-stop-ec2/","title":"Start/Stop EC2/RDS instances using schedule or manual endpoint","text":""},{"location":"user-guide/cookbooks/schedule-start-stop-ec2/#what","title":"What?","text":"
You have EC2 (or RDS) instances that are not being used all the time... so why keep them up, running and billing? Here we'll create a simple schedule to turn them off/on (also with an HTTP endpoint to do it manually).
In your binbash Leverage infra repository, under your desired account and region, copy this layer.
You can download a directory from a git repository using this Firefox addon or any method you want.
Remember that if the common-variables.tf file was copied as a regular file, you should delete it and soft-link it to the homonymous file in the root config dir, e.g. common-variables.tf -> ../../../config/common-variables.tf
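A minimal sketch of that fix, assuming the layer sits three levels below the repository root (adjust the relative path to your layout):
rm common-variables.tf\nln -s ../../../config/common-variables.tf common-variables.tf\n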
"},{"location":"user-guide/cookbooks/schedule-start-stop-ec2/#set-the-tags","title":"Set the tags","text":"
In the tools-cloud-scheduler-stop-start layer edit the main.tf file. There are two resources:
schedule_ec2_stop_daily_midnight to stop the instances
schedule_ec2_start_daily_morning to start the instances
You can change these names. If you do so remember to change all the references to them.
In the resource_tags element set the right tags. E.g. this:
in the schedule_ec2_stop_daily_midnight resource means this resource will stop instances with tag: ScheduleStopDaily=true."},{"location":"user-guide/cookbooks/schedule-start-stop-ec2/#set-the-schedule","title":"Set the schedule","text":"
Here you can set the schedule in a cron-like fashion.
If it is set to none, no schedule will be created (e.g. if you only need the HTTP endpoint):
cloudwatch_schedule_expression = \"none\"\n
Then if you set this:
http_trigger = true\n
An HTTP endpoint will be created to trigger the corresponding action.
If an endpoint was created, its URL will be shown in the outputs.
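To see that URL after applying, you can list the layer outputs with the Terraform wrapper, for example:
leverage terraform output\n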
"},{"location":"user-guide/cookbooks/sops-kms/","title":"Encrypt and decrypt SOPS files with AWS KMS","text":""},{"location":"user-guide/cookbooks/sops-kms/#goal","title":"Goal","text":"
Using a SOPS file to store secrets in the git repository.
We are assuming the binbash Leverage Landing Zone is deployed, an account called apps-devstg was created and a region us-east-1 is being used. In any case you can adapt these examples to other scenarios.
"},{"location":"user-guide/cookbooks/sops-kms/#encrypt-the-file","title":"Encrypt the file","text":"
Note that for encrypting you need to specify an AWS profile. In the binbash Leverage context, profiles look like this: {short-project-name}-{account}-{role}. For example, for my apps-devstg account, using the role devops, in my project bb, the profile is: bb-apps-devstg-devops.
Since the binbash Leverage Landing Zone is being used, the default key for the account+region has an alias: ${var.project}_${var.environment}_${var.kms_key_name}_key, which in this case is vp_apps-devstg_default_key, so arn:aws:kms:<region>:<account>:alias/vp_apps-devstg_default_key should be used.
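A minimal sketch of the encryption command, assuming the profile and key alias from the example above (replace region, account id and file names with your own):
sops --encrypt \\\n  --aws-profile bb-apps-devstg-devops \\\n  --kms arn:aws:kms:<region>:<account>:alias/vp_apps-devstg_default_key \\\n  secrets.yaml > secrets.enc.yaml\n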
Info
To use this file with Terraform, edit secrets.enc.yaml and, at the bottom, set the aws_profile line to the AWS profile you used to encrypt the file.
"},{"location":"user-guide/cookbooks/sops-kms/#decrypt-the-file","title":"Decrypt the file","text":"
"},{"location":"user-guide/infra-as-code-library/infra-as-code-library-forks/","title":"Leverage Open Source Modules management.","text":"
We'll fork every Infrastructure as Code (IaC) Library dependency repo. Why?
Grant full governance over the lib repositories
Availability: because our project's resilience and continuity (including the clients') depend on these repositories (via requirement files or imports), we want and need total control over the repository used as a dependency. NOTE: there could be a few exceptions where using official open source modules makes sense, e.g. the ones shared and maintained by Nginx, Weave, HashiCorp, etc.
Reliability (avoid unforeseen events): in the event that the original project becomes discontinued while we are still working on it or depending on it (the owners, generally individual maintainers of the original repository, might decide to move away from GitHub, Ansible Galaxy, etc. or even close their repo for personal reasons).
Stability: our forks of modules (Ansible roles / Terraform / Dockerfiles, etc.) are always locked to fixed versions for every client, so no unexpected behavior will occur.
Projects that don't tag versions: having the fork protects us against breaking changes.
Write access: to every Leverage library component repository, ensuring at all times that we can support, update, maintain, test, customize and release a new version of each component.
Centralized Org source of truth: for improved customer experience and keeping dependencies consistently imported from binbash repos at Leverage Github
Scope: binbash grants and responds for all these dependencies.
Metrics: Dashboards w/ internal measurements.
Automation: We\u2019ll maintain all this workflow cross-tech as standardized and automated as possible, adding any extra validation like testing, security check, etc., if needed -> Leverage dev-tools
Licence & Ownership: since we fork open-source and commercially reusable components under the MIT and Apache 2.0 licenses, we keep full rights to all commercial use, modification, distribution, and private use of the code (no lock-in with owners) through forks inside our own Leverage Project repos. As a result, when the time comes, we can make our libs private at any moment if necessary (for the time being, Open Source looks like the best option).
Collaborators considerations
We look forward to have every binbash Leverage repo open sourced favoring the collaboration of the open source community.
Repos that are still private must not be forked by our internal collaborators till we've done a detailed and rigorous review in order to open source them.
As a result, anyone looking to use, extend or update Leverage public repos can also fork them into their personal or company GitHub account and create an upstream PR to contribute.
"},{"location":"user-guide/infra-as-code-library/infra-as-code-library-specs/","title":"Tech Specifications","text":"As Code: Hundred of thousands lines of code
Written in:
Terraform
Groovy (Jenkinsfiles)
Ansible
Makefiles + Bash
Dockerfiles
Helm Charts
Stop reinventing the wheel, automated and fully as code
automated (executable from a single source).
as code.
parameterized
variables
input parameters
return / output parameters
\"Stop reinventing the wheel\"
avoid re-building the same things more than X times.
avoid wasting time.
not healthy, not secure and slows us down.
DoD of highly reusable, configurable, and composable sub-modules
Which will be 100%
modular
equivalent to other programming languages functions - Example for terraform - https://www.terraform.io/docs/modules/usage.html (but can be propagated for other languages and tools):
inputs, outputs parameters.
code reuse (reusable): consider tf modules and sub-modules approach.
testable by module / function.
Since TF is oriented to work through 3rd party API calls, tests are more likely to be integration tests than unit tests. If we don't allow integration testing for Terraform, then we can't work at all.
This has to be analyzed for every language we'll be using and how we implement it (Terraform, CloudFormation, Ansible, Python, Bash, Docker, KOPS and k8s kubectl commands).
composition (composable): have multiple functions and use them together
abstraction (abstract away complexity): we have a very complex function but we only expose its definition to the API, e.g.: def ai_processing(data_set){very complex algorithm here}; ai_processing([our_data_set_here])
avoid inline blocks: the configuration for some Terraform resources can be defined either as inline blocks or as separate resources. For example, the aws_route_table resource allows you to define routes via inline blocks. But by doing so, your module becomes less flexible and configurable. Also, if a mix of both inline blocks and separate resources is used, errors may arise in which they conflict and overwrite each other. Therefore, you must use one or the other (ref: https://blog.gruntwork.io/how-to-create-reusable-infrastructure-with-terraform-modules-25526d65f73d). As a rule of thumb, when creating a module, separate resources should always be used.
use module-relative paths: the catch is that the file path used has to be relative (since you could run Terraform on many different computers) - but relative to what? By default, Terraform interprets the path as relative to the working directory. That's a good default for normal Terraform templates, but it won't work if the file is part of a module. To solve this issue, always use a path variable in file paths, e.g.:
This allows managing them as a software product with releases and a changelog. This way we'll be able to know which version is currently deployed for a given client and consider upgrading it.
Env Parity
Promote immutable, versioned infra modules based across envs.
Updated
Continuously perform updates, additions, and fixes to libraries and modules.
Orchestrated in automation
We use the leverage-cli for this purpose
Proven & Tested
Every commit goes through a suite of automated tests to enforce code styling and perform functional testing.
Develop a wrapper/jobs together with specific testing tools in order to ensure the modules are working as expected.
Ansible:
Testing your ansible roles w/ molecule
How to test ansible roles with molecule on ubuntu
Terraform:
gruntwork-io/terratest
Cost savings by design
The architecture for our Library / Code Modules helps an organization analyze its current IT and DevSecOps Cloud strategy and identify areas where changes could lead to cost savings. For instance, the architecture may show that multiple database systems could be consolidated so that only one product is used, reducing software and support costs. It also provides a basis for reuse: the process of architecting can support both the use and creation of reusable assets. Reusable assets are beneficial for an organization, since they can reduce the overall cost of a system and also improve its quality, given that a reusable asset has already been proven.
Full Code Access & No Lock-In
You get access to 100% of the code under an Open Source license; if you choose to discontinue the direct support of the binbash Leverage team, you keep the rights to all the code.
Documented
Includes code examples, use cases and thorough documentation, such as README.md files, a --help command, doc-strings and inline comments.
Supported & Customizable
Commercially maintained and supported by binbash.
"},{"location":"user-guide/infra-as-code-library/modules-library-by-technology/","title":"Modules by Technology","text":""},{"location":"user-guide/infra-as-code-library/modules-library-by-technology/#open-source-modules-repos","title":"Open Source Modules Repos","text":"Category URLs Ansible Galaxy Roles bb-leverage-ansible-roles-list Dockerfiles bb-leverage-dockerfiles-list Helm Charts bb-leverage-helm-charts-list Terraform Modules bb-leverage-terraform-modules-list"},{"location":"user-guide/infra-as-code-library/modules-library-by-technology/#open-source-private-modules-repos-via-github-teams","title":"Open Source + Private Modules Repos (via GitHub Teams)","text":"Repositories Details Reference Architecture Most of the AWS resources are here, divided by account. Dockerfiles These are Terraform module we created/imported to build reusable resources / stacks. Ansible Playbooks & Roles Playbooks we use for provisioning servers such as Jenkins, Spinnaker, Vault, and so on. Helm Charts Complementary Jenkins pipelines to clean docker images, unseal Vault, and more. Also SecOps jobs can be found here. Terraform Modules Jenkins pipelines, docker images, and other resources used for load testing."},{"location":"user-guide/infra-as-code-library/overview/","title":"Infrastructure as Code (IaC) Library","text":""},{"location":"user-guide/infra-as-code-library/overview/#overview","title":"Overview","text":"
A collection of reusable, tested, production-ready E2E infrastructure as code solutions, leveraged by modules written in Terraform, Ansible, Dockerfiles, Helm charts and Makefiles.
To view a list of all the available commands and options in your current Leverage version simply run leverage or leverage --help. You should get an output similar to this:
$ leverage\nUsage: leverage [OPTIONS] COMMAND [ARGS]...\n\n Leverage Reference Architecture projects command-line tool.\n\nOptions:\n -f, --filename TEXT Name of the build file containing the tasks\n definitions. [default: build.py]\n-l, --list-tasks List available tasks to run.\n -v, --verbose Increase output verbosity.\n --version Show the version and exit.\n --help Show this message and exit.\n\nCommands:\n aws Run AWS CLI commands in a custom containerized environment.\n credentials Manage AWS cli credentials.\n kc Run Kubectl commands in a custom containerized environment.\n kubectl Run Kubectl commands in a custom containerized environment.\n project Manage a Leverage project.\n run Perform specified task(s) and all of its dependencies.\n shell Run a shell in a generic container.\n terraform Run Terraform commands in a custom containerized...\n tf Run Terraform commands in a custom containerized...\n tfautomv Run TFAutomv commands in a custom containerized...\n
Similarly, subcommands provide further information by means of the --help flag. For example leverage tf --help.
-f | --filename: Name of the file containing the tasks' definition. Defaults to build.py
-l | --list-tasks: List all the tasks defined for the project along a description of their purpose (when available).
Tasks in build file `build.py`:\n\n clean Clean build directory.\n copy_file \n echo \n html Generate HTML.\n images [Ignored] Prepare images.\n start_server [Default] Start the server\n stop_server \n\nPowered by Leverage 1.9.0\n
-v | --verbose: Increases output verbosity. When running a command in a container, the tool provides a description of the container's configuration before the execution. This is especially useful if the user needs to recreate Leverage's behavior by themselves.
Mapping of the host (Source) directories and files into the container (Target)
Command being executed (useful when trying to replicate Leverage's behavior by yourself)
"},{"location":"user-guide/leverage-cli/history/","title":"A bit of history","text":""},{"location":"user-guide/leverage-cli/history/#how-leverage-cli-came-about","title":"How Leverage CLI came about","text":"
The multiple tools and technologies required to work with a Leverage project were initially handled through a Makefiles system. Not only to automate and simplify the different tasks, but also to provide a uniform user experience during the management of a project.
As a result of more and more features being added and the Leverage Reference Architecture becoming broader and broader, our Makefiles were growing large and becoming too repetitive, and thus, harder to maintain. Also, some limitations and the desire for a more friendly and flexible language than that of Makefiles made evident the need for a new tool to take their place.
Python, a language broadly adopted for automation due to its flexibility and a very gentle learning curve seemed ideal. Even more so, Pynt, a package that provides the ability to define and manage tasks as simple Python functions satisfied most of our requirements, and thus, was selected for the job. Some gaps still remained but with minor modifications these were bridged.
Gradually, all capabilities originally implemented through Makefiles were migrated to Python as libraries of tasks that still resided within the Leverage Reference Architecture. But soon the need to deliver these capabilities pre-packaged in a tool, instead of embedded in the infrastructure definition, became apparent, and they were re-implemented in the shape of built-in commands of Leverage CLI.
Currently, the core functionality needed to interact with a Leverage project is native to Leverage CLI but a system for custom tasks definition and execution heavily inspired in that of Pynt is retained.
"},{"location":"user-guide/leverage-cli/installation/#update-leverage-cli-from-previous-versions","title":"Update Leverage CLI from previous versions","text":"
Upgrade to a specific version.
$ pip3 install -Iv leverage==1.9.1\n
Upgrade to the latest stable version
$ pip3 install --upgrade leverage\n
"},{"location":"user-guide/leverage-cli/installation/#verify-your-leverage-installation","title":"Verify your Leverage installation","text":"
Verify that your Leverage installation was successful by running
$ leverage --help\nUsage: leverage [OPTIONS] COMMAND [ARGS]...\n\n Leverage Reference Architecture projects command-line tool.\n\nOptions:\n -f, --filename TEXT Name of the build file containing the tasks\n definitions. [default: build.py]\n-l, --list-tasks List available tasks to run.\n -v, --verbose Increase output verbosity.\n --version Show the version and exit.\n --help Show this message and exit.\n\nCommands:\n aws Run AWS CLI commands in a custom containerized environment.\n credentials Manage AWS cli credentials.\n kubectl Run Kubectl commands in a custom containerized environment.\n project Manage a Leverage project.\n run Perform specified task(s) and all of its dependencies.\n terraform Run Terraform commands in a custom containerized...\n tf Run Terraform commands in a custom containerized...\n tfautomv Run TFAutomv commands in a custom containerized...\n
"},{"location":"user-guide/leverage-cli/installation/#installation-in-an-isolated-environment","title":"Installation in an isolated environment","text":"
If you prefer not to install the Leverage package globally and would like to limit its influence to only the directory of your project, we recommend using tools like Pipenv or Poetry. These tools are commonly used when working with python applications and help manage common issues that may result from installing and using such applications globally.
Leverage CLI is the tool used to manage and interact with any Leverage project.
It transparently handles the most complex and error prone tasks that arise from working with a state-of-the-art infrastructure definition like our Leverage Reference Architecture. Leverage CLI uses a dockerized approach to encapsulate the tools needed to perform such tasks and to free the user from having to deal with the configuration and management of said tools.
"},{"location":"user-guide/leverage-cli/private-repositories/","title":"Private Repositories","text":""},{"location":"user-guide/leverage-cli/private-repositories/#working-with-terraform-modules-in-private-repos","title":"Working with Terraform modules in private repos","text":"
If the layer is using a module from a private repository, read the following. E.g.:
where gitlab.com:some-org/some-project/the-private-repo.git is a private repo."},{"location":"user-guide/leverage-cli/private-repositories/#ssh-accessed-repository","title":"SSH accessed repository","text":"
To source a Terraform module from a private repository in a layer via an SSH connection these considerations have to be kept in mind.
Leverage CLI will mount the host's SSH-Agent socket into the Leverage Toolbox container, this way your keys are accessed in a secure way.
So, if an SSH private repository has to be accessed, the corresponding keys need to be loaded into the SSH-Agent.
If the agent is automatically started and the needed keys are added on the host system, it should work as is.
These steps should be followed otherwise:
start the SSH-Agent:
$ eval \"$(ssh-agent -s)\"\n
add the keys to it
$ ssh-add ~/.ssh/<private_ssh_key_file>\n
(replace private_ssh_key_file with the desired file; the process may ask for the passphrase if one was set when the key was created)
"},{"location":"user-guide/leverage-cli/private-repositories/#using-the-ssh-config-file-to-specify-the-key-that-must-be-used-for-a-given-host","title":"Using the SSH config file to specify the key that must be used for a given host","text":"
The ssh-agent socket is not always available on every OS, e.g. on Mac. So now our leverage terraform init command copies the SSH config file (and the whole .ssh directory) into the container volume, which means any custom configuration you have there will be used. You can read more in the official SSH documentation.
If, for example, you need to use a custom key for your private repositories on GitLab, you could add a block to your SSH config file, specifying:
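For example, a block like the following could be appended to ~/.ssh/config (the host and key file name are assumptions; adjust them to your setup):
cat >> ~/.ssh/config <<'EOF'\nHost gitlab.com\n  User git\n  IdentityFile ~/.ssh/gitlab_custom_key\n  IdentitiesOnly yes\nEOF\n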
When launching a Terraform shell, Leverage provides the user with a completely isolated environment tailored to operate in the current project via a Docker container.
The whole project is mounted on a directory named after the value for project_long in the global configuration file, or simply named \"project\" if this value is not defined. A project named myexample would be mounted in /myexample.
The .gitconfig user's file is also mounted on /etc/gitconfig for convenience, while (if ssh-agent is running), the socket stated in SSH_AUTH_SOCK is mounted on /ssh-agent. Also, the credentials files (credentials and config) found in the project AWS credentials directory (~/.aws/myexample), are mapped to the locations given by the environment variables AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE respectively within the container.
Determining which credentials are needed to operate on a layer, and retrieving those credentials, may prove cumbersome for many complex layer definitions. In addition to that, correctly configuring them can also become a tedious and error prone process. For that reason, Leverage automates this process upon launching the shell, if requested by the user via the shell command options.
Bear in mind that an authenticated shell session's credentials are obtained for the layer in which the session was launched. These credentials may not be valid for other layers in which different roles need to be assumed or more permissions are required.
If authentication via SSO is required, the user will need to configure or login into SSO before launching the shell via
leverage terraform shell --sso\n
"},{"location":"user-guide/leverage-cli/shell/#operations-on-the-projects-layer","title":"Operations on the project's layer","text":"
In order to operate in a project's layer, Terraform commands such as plan or apply will need to receive extra parameters providing the location of the files that contain the definition of the variables required by the layer. Usually, these files are:
the project global configuration file common.tfvars
the account configuration file account.tfvars
the terraform backend configuration file backend.tfvars
In this case these parameters should take the form:
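For instance, inside a shell for a layer of an apps-devstg account in the myexample project mentioned above, a plan could look roughly like this (paths are illustrative and depend on your project layout):
terraform plan \\\n  -var-file=/myexample/config/common.tfvars \\\n  -var-file=/myexample/apps-devstg/config/account.tfvars \\\n  -var-file=/myexample/apps-devstg/config/backend.tfvars\n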
"},{"location":"user-guide/leverage-cli/extending-leverage/build.env/","title":"The build.env file","text":""},{"location":"user-guide/leverage-cli/extending-leverage/build.env/#override-defaults-via-buildenv-file","title":"Override defaults via build.env file","text":"
By utilizing the build.env capability, you can easily change some default behaviors of the CLI. In the binbash Leverage\u2122 Ref Architecture you will find the following build.env file as an example. This allows you to specify several configurations for the CLI, such as the Leverage-Toolbox-Image you want to use, ensuring that you are using the latest version or a specific version that you prefer based on your compatibility requirements. This helps you avoid compatibility issues and ensures that your infrastructure deployments go smoothly.
Customizing or extending the leverage-toolbox docker image
You can locally copy and edit the Dockerfile in order to rebuild it based on your needs, e.g. for a Dockerfile placed in the current working directory: $ docker build -t binbash/leverage-toolbox:1.2.7-0.1.4 --build-arg TERRAFORM_VERSION='1.2.7' . In case you would like these changes to be permanent, please consider creating and submitting a PR.
The leverage CLI has an environment variable loading utility that will load all .env files with the given name in the current directory and all of its parents, up to the repository root directory, and store them in a dictionary. Files are traversed from parent to child so as to allow values in deeper directories to override possibly pre-existing values. Note that all files must bear the same name, which in our case defaults to \"build.env\". So you can have multiple build.env files that will be processed by the leverage CLI in the context of a specific layer of a Reference Architecture project. For example, the /le-tf-infra-aws/apps-devstg/us-east-1/k8s-kind/k8s-resources/build.env file.
"},{"location":"user-guide/leverage-cli/extending-leverage/how-to-extend/","title":"Extending & Configuring leverage CLI","text":""},{"location":"user-guide/leverage-cli/extending-leverage/how-to-extend/#override-defaults-via-buildenv-file","title":"Override defaults via build.env file","text":"
By utilizing the build.env capability, you can easily change some default behaviors of the CLI. This allows you to specify several configurations for the CLI, such as the Leverage-Toolbox-Image that you want to use, ensuring that you are using the latest version or a specific version that you prefer based on your compatibility requirements. This helps you avoid compatibility issues and ensures that your infrastructure deployments go smoothly.
Read More about build.env
In order to further understand this mechanism and how to use it please visit the dedicated build.env entry.
Using additional .tfvars configuration files at the account level or at the global level will allow you to extend your terraform configuration entries. Consider that using multiple .tfvars configuration files allows you to keep your configuration entries well-organized. You can have separate files for different accounts or environments, making it easy to manage and maintain your infrastructure. This also makes it easier for other team members to understand and work with your configuration, reducing the risk of misconfigurations or errors.
Read More about .tfvars config files
In order to further understand this mechanism and how to use it please visit the dedicated .tfvars configs entry.
"},{"location":"user-guide/leverage-cli/extending-leverage/how-to-extend/#custom-tasks-with-buildpy","title":"Custom tasks with build.py","text":"
Leverage CLI has a native mechanism to allow customizing your workflow. With the custom tasks feature using build.py, you can write your own tasks using Python, tailoring the CLI to fit your specific workflow. This allows you to automate and streamline your infrastructure deployments, reducing the time and effort required to manage your infrastructure. You can also easily integrate other tools and services into your workflow to further improve your productivity.
Read More about build.py custom tasks
In order to further understand this mechanism and how to use it please visit the dedicated build.py custom tasks entry.
"},{"location":"user-guide/leverage-cli/extending-leverage/how-to-extend/#fork-collaborate-and-improve","title":"Fork, collaborate and improve","text":"
By forking the leverage repository on GitHub and contributing to the project, you have the opportunity to make a positive impact on the product and the community. You can fix bugs, implement new features, and contribute your ideas and feedback. This helps to ensure that the product continues to evolve and improve, serving the needs of the community and making infrastructure deployments easier for everyone.
Read More about contributing with the project
In order to further understand this mechanism and how to use it please visit the dedicated CONTRIBUTING.md entry.
The same way we needed to automate or simplify certain tasks or jobs for the user, you may need to do the same in your project.
Leverage CLI does not limit itself to provide only the core functionality required to create and manage your Leverage project, but also allows for the definition of custom tasks, at the build.py root context file, that can be used to add capabilities that are outside of Leverage CLI's scope.
By implementing new auxiliary Leverage tasks you can achieve consistency and homogeneity in the experience of the user when interacting with your Leverage project and simplify the usage of any other tool that you may require.
To check some common included tasks please see here
Tasks are simple python functions that are marked as such with the use of the @task() decorator. We call the file where all tasks are defined a 'build script', and by default it is assumed to be named build.py. If you use any other name for your build script, you can let Leverage know through the global option --filename.
from leverage import task\n\n@task()\ndef copy_file(src, dst):\n\"\"\"Copy src file to dst\"\"\"\n print(f\"Copying {src} to {dst}\")\n
The contents in the task's docstring are used to provide a short description of what's the task's purpose when listing all available tasks to run.
$ leverage --list-tasks\nTasks in build file `build.py`:\n\n copy_file Copy src file to dst\n\nPowered by Leverage 1.0.10\n
Any arguments that the task may receive are to be given when running the task. The syntax for passing arguments is similar to that of Rake.
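For example, arguments for the copy_file task defined above could be passed like this, escaping the brackets so the shell does not interpret them (the file names are just placeholders):
leverage run copy_file\\['src.txt','dst.txt'\\]\n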
The task decorator allows for the definition of dependencies. These are defined as positional arguments in the decorator itself. Multiple dependencies can be defined for each task.
from leverage import task\n@task()\ndef html(target=\".\"):\n\"\"\"Generate HTML.\"\"\"\n print(f\"Generating HTML in directory \\\"{target}\\\"\")\n\n@task()\ndef images():\n\"\"\"Prepare images.\"\"\"\n print(\"Preparing images...\")\n\n@task(html, images)\ndef start_server(host=\"localhost\", port=\"80\"):\n\"\"\"Start the server\"\"\"\n print(f\"Starting server at {host}:{port}\")\n
We can see how the task start_server depends on both html and images. This means that both html and images will be executed before start_server and in that same order.
$ leverage run start_server\n[09:34:54.848] [ build.py - \u279c Starting task html ]\nGenerating HTML in directory \".\"\n[09:34:54.851] [ build.py - \u2714 Completed task html ]\n[09:34:54.852] [ build.py - \u279c Starting task images ]\nPreparing images...\n[09:34:54.854] [ build.py - \u2714 Completed task images ]\n[09:34:54.855] [ build.py - \u279c Starting task start_server ]\nStarting server at localhost:80\n[09:34:54.856] [ build.py - \u2714 Completed task start_server ]\n
"},{"location":"user-guide/leverage-cli/extending-leverage/tasks/#ignoring-a-task","title":"Ignoring a task","text":"
If you find yourself in a situation where there's a task that many other tasks depend on, and you need to quickly remove it from the dependency chains of all those tasks, ignoring its execution is a very simple way to achieve that without having to remove all definitions and references across the code.
To ignore or disable a task, simply set ignore to True in the task's decorator.
from leverage import task\n\n@task()\ndef html(target=\".\"):\n\"\"\"Generate HTML.\"\"\"\n print(f\"Generating HTML in directory \\\"{target}\\\"\")\n\n@task(ignore=True)\ndef images():\n\"\"\"Prepare images.\"\"\"\n print(\"Preparing images...\")\n\n@task(html, images)\ndef start_server(server=\"localhost\", port=\"80\"):\n\"\"\"Start the server\"\"\"\n print(f\"Starting server at {server}:{port}\")\n
$ leverage run start_server\n[09:38:32.819] [ build.py - \u279c Starting task html ]\nGenerating HTML in directory \".\"\n[09:38:32.822] [ build.py - \u2714 Completed task html ]\n[09:38:32.823] [ build.py - \u2933 Ignoring task images ]\n[09:38:32.824] [ build.py - \u279c Starting task start_server ]\nStarting server at localhost:80\n[09:38:32.825] [ build.py - \u2714 Completed task start_server ]\n
When listing the available tasks any ignored task will be marked as such.
$ leverage --list-tasks\nTasks in build file `build.py`:\n\n html Generate HTML.\n images [Ignored] Prepare images.\n start_server Start the server\n\nPowered by Leverage 1.0.10\n
Sometimes you may want to define auxiliary tasks that don't need to be shown as available to run by the user. For this scenario, you can make any task a private one. There are two ways to accomplish this: either by naming the task with an initial underscore (_) or by setting private to True in the task's decorator.
from leverage import task\n\n@task(private=True)\ndef clean():\n\"\"\"Clean build directory.\"\"\"\n print(\"Cleaning build directory...\")\n\n@task()\ndef _copy_resources():\n\"\"\"Copy resource files. This is a private task and will not be listed.\"\"\"\n print(\"Copying resource files\")\n\n@task(clean, _copy_resources)\ndef html(target=\".\"):\n\"\"\"Generate HTML.\"\"\"\n print(f\"Generating HTML in directory \\\"{target}\\\"\")\n\n@task(clean, _copy_resources, ignore=True)\ndef images():\n\"\"\"Prepare images.\"\"\"\n print(\"Preparing images...\")\n\n@task(html, images)\ndef start_server(host=\"localhost\", port=\"80\"):\n\"\"\"Start the server\"\"\"\n print(f\"Starting server at {host}:{port}\")\n
Private tasks will be executed, but not shown when tasks are listed.
$ leverage --list-tasks\nTasks in build file `build.py`:\n\n html Generate HTML.\n images Prepare images.\n start_server Start the server\n\nPowered by Leverage 1.0.10\n
If you have a task that is run much more often than the rest, it can get tedious to always pass the name of that task to the run command. Leverage allows for the definition of a default task to address this situation. This task is executed when no task name is given.
To define a default task, simply assign the already defined task to the special variable __DEFAULT__.
from leverage import task\n\n@task()\ndef html(target=\".\"):\n\"\"\"Generate HTML.\"\"\"\n print(f\"Generating HTML in directory \\\"{target}\\\"\")\n\n@task(ignore=True)\ndef images():\n\"\"\"Prepare images.\"\"\"\n print(\"Preparing images...\")\n\n@task(html, images)\ndef start_server(server=\"localhost\", port=\"80\"):\n\"\"\"Start the server\"\"\"\n print(f\"Starting server at {server}:{port}\")\n\n__DEFAULT__ = start_server\n
The default task is marked as such when listing all available tasks.
$ leverage --list-tasks\nTasks in build file `build.py`:\n\n html Generate HTML.\n images [Ignored] Prepare images.\n start_server [Default] Start the server\n\nPowered by Leverage 1.0.10\n
Build scripts are not only looked up in the current directory but also in all parent directories, up to the root of the Leverage project. This makes it possible to launch tasks from any directory of the project, as long as any parent of the current directory holds a build script.
Leverage CLI treats the directory in which the build script is found as a python package. This means that you can break up your build files into modules and simply import them into your main build script, encouraging modularity and code reuse.
Leverage CLI empowers you to create whole libraries of functionalities for your project. You can use it to better organize your tasks or implement simple auxiliary python functions.
As mentioned in the Organizing build scripts section, Leverage CLI treats the directory in which the main build script is located as a python package in order to allow importing of user defined python modules. If this directory contains a period (.) in its name, this will create issues for the importing process. This is because the period is used by python to separate subpackages from their parents.
For example, if the directory where the build script build.py is stored is named local.assets, at the time of loading the build script, python will try to locate local.build instead of locating local.assets.build and fail.
The same situation will arise with any other subdirectory in the project. When importing modules from those directories, they won't be found.
The simple solution to this is to avoid using periods when naming directories. If the build script is located in the project's root folder, this would also apply to that directory.
This task is aimed at helping to determine the current layer's dependencies.
If the current layer is getting information from remote states in different layers, then those layers have to be applied before the current layer; this is called a dependency.
To run this task, cd into the desired layer and run:
leverage run layer_dependency\n
This is a sample output:
\u276f leverage run layer_dependency\n[10:37:41.817] [ build.py - \u279c Starting task _checkdir ] [10:37:41.824] [ build.py - \u2714 Completed task _checkdir ] [10:37:41.825] [ build.py - \u279c Starting task layer_dependency ] \nNote layer dependency is calculated using remote states.\nNevertheless, other sort of dependencies could exist without this kind of resources,\ne.g. if you rely on some resource created in a different layer and not referenced here.\n{\n\"security\": {\n\"remote_state_name\": \"security\",\n \"account\": \"apps-devstg\",\n \"layer\": \"security-keys\",\n \"key\": \"apps-devstg/security-keys/terraform.tfstate\",\n \"key_raw\": \"${var.environment}/security-keys/terraform.tfstate\",\n \"usage\": {\n\"used\": true,\n \"files\": [\n\"/home/jdelacamara/Dev/work/BinBash/code/le-tf-infra-aws/apps-devstg/us-east-1/ec2-fleet-ansible --/ec2_fleet.tf\"\n]\n}\n},\n \"vpc\": {\n\"remote_state_name\": \"vpc\",\n \"account\": \"apps-devstg\",\n \"layer\": \"network\",\n \"key\": \"apps-devstg/network/terraform.tfstate\",\n \"key_raw\": \"${var.environment}/network/terraform.tfstate\",\n \"usage\": {\n\"used\": true,\n \"files\": [\n\"/home/jdelacamara/Dev/work/BinBash/code/le-tf-infra-aws/apps-devstg/us-east-1/ec2-fleet-ansible --/locals.tf\",\n \"/home/jdelacamara/Dev/work/BinBash/code/le-tf-infra-aws/apps-devstg/us-east-1/ec2-fleet-ansible --/ec2_fleet.tf\"\n]\n}\n},\n \"vpc-shared\": {\n\"remote_state_name\": \"vpc-shared\",\n \"account\": \"shared\",\n \"layer\": \"network\",\n \"key\": \"shared/network/terraform.tfstate\",\n \"key_raw\": \"shared/network/terraform.tfstate\",\n \"usage\": {\n\"used\": true,\n \"files\": [\n\"/home/jdelacamara/Dev/work/BinBash/code/le-tf-infra-aws/apps-devstg/us-east-1/ec2-fleet-ansible --/ec2_fleet.tf\"\n]\n}\n}\n}\n[10:37:41.943] [ build.py - \u2714 Completed task layer_dependency ]\n
Data:
\"remote_state_name\": the remote state name
\"account\": the account the remote state belongs to
\"layer\": the referenced layer
\"key\": the key name (i.e. the tfstate file name for the remote state)
\"key_raw\": the same as key but with variables not resolved
\"usage\": if this remote state is used and in what files
For a shorter version:
\u276f leverage run layer_dependency\\['summary=True'\\]\n[10:47:00.461] [ build.py - \u279c Starting task _checkdir ] [10:47:00.467] [ build.py - \u2714 Completed task _checkdir ] [ build.py - \u279c Starting task layer_dependency ] \nNote layer dependency is calculated using remote states.\nNevertheless, other sort of dependencies could exist without this kind of resources,\ne.g. if you rely on some resource created in a different layer and not referenced here.\n{\n\"this\": [\n\"apps-devstg/security-keys/terraform.tfstate\",\n \"apps-devstg/network/terraform.tfstate\",\n \"shared/network/terraform.tfstate\"\n]\n}\n[10:47:00.489] [ build.py - \u2714 Completed task layer_dependency ]
If you already have a binbash Leverage project created, you can download this file into your project root dir and add this import to your build.py:
The aws command is a wrapper for a containerized installation of AWS CLI 2.0. All commands are passed directly to the AWS CLI and you should expect the same behavior from all of them, except for the few exceptions listed below.
Extracts information from the project's Terraform configuration to generate the required profiles for AWS CLI to handle SSO.
In the process, you will need to log in via your identity provider. To allow you to do this, Leverage will attempt to open the login page in the system's default browser.
It wraps aws sso logout, taking extra steps to make sure that all tokens and temporary credentials are wiped from the system. It also reminds the user to log out from the AWS SSO login page and the identity provider portal. This last action is left to the user to perform.
Important
Please keep in mind that this command will not only remove temporary credentials but also the AWS config file. If you use that file to store your own configuration, please create a backup before running the sso logout command.
The credentials command is used to set up and manage the AWS CLI credentials required to interact with the AWS environment.
All credentials subcommands feed off of the project.yaml, build.env, and Terraform configuration files to obtain the information they need. If the basic required information is not found, the subcommands will prompt the user for it.
The credentials configure command sets up the credentials needed to interact with the AWS environment, from the initial deployment process (BOOTSTRAP) to everyday management (MANAGEMENT) and development or use (SECURITY) of it.
It attempts to retrieve the structure of the organization in order to generate all the AWS CLI profiles required to interact with the environment, and updates the Terraform configuration with the IDs of all relevant accounts.
Backups of the previously configured credentials files are always created when overwriting or updating the current ones.
--type: Type of the credentials to set. Can be any of BOOTSTRAP, MANAGEMENT or SECURITY. This option is case insensitive. This option is required.
--credentials-file: Path to a .csv credentials file, as produced by the AWS Console, containing the user's programmatic access keys. If not given, the user will be prompted for the credentials.
--fetch-mfa-device: Retrieve an MFA device serial from AWS for the current user.
--overwrite-existing-credentials: If the type of credentials being configured is already configured, overwrite current configuration. Mutually exclusive option with --skip-access-keys-setup.
--skip-access-keys-setup: Skip the access keys configuration step. Continue on to setting up the accounts profiles. Mutually exclusive option with --overwrite-existing-credentials.
--skip-assumable-roles-setup: Don't configure each account profile to assume their specific role.
If neither --overwrite-existing-credentials nor --skip-access-keys-setup is given, the user will be prompted to choose between both actions when appropriate.
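For illustration only, a couple of hedged invocations combining the options above (the CSV path is just a placeholder):
leverage credentials configure --type BOOTSTRAP --credentials-file ~/Downloads/accessKeys.csv
leverage credentials configure --type SECURITY --skip-access-keys-setup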
To have this feature available, Leverage Toolbox versions 1.2.7-0.1.7 and up, or 1.3.5-0.1.7 and up must be used.
The kubectl command is a wrapper for a containerized installation of kubectl. It provides the kubectl executable with specific configuration values required by Leverage.
It transparently handles authentication, whether it is Multi-Factor or via Single Sign-On, on behalf of the user in the commands that require it. SSO Authentication takes precedence over MFA when both are active.
The sub-commands can only be run at layer level and will not run anywhere else in the project. The configure sub-command can only be run at an EKS cluster layer level, usually called cluster.
The command can also be invoked via its shortened version kc.
Configuring on first use
To start using this command, you must first run leverage kubectl configure on a cluster layer,
to set up the credentials on the proper config file.
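As a sketch, a minimal first-use sequence might look like this (the cluster layer path is only illustrative, not a fixed convention):
cd apps-devstg/us-east-1/k8s-eks/cluster
leverage kubectl configure
leverage kc get nodes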
The project init subcommand initializes a Leverage project in the current directory. If not found, it also initializes the global config directory for Leverage CLI ~/.leverage/, and fetches the template for the projects' creation.
It then proceeds to drop a template file for the project configuration called project.yaml and initializes a git repository in the directory.
The project create subcommand creates the files structure for the architecture in the current directory and configures it based on the values set in the project.yaml file.
It will then proceed to make sure all files follow the standard Terraform code style.
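As a sketch, the typical sequence from an empty directory to a configured project structure would be (the directory name is arbitrary):
mkdir myproject && cd myproject
leverage project init        # drops project.yaml and initializes a git repository
# edit project.yaml to suit your organization
leverage project create      # creates and formats the architecture file structure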
An arbitrary number of tasks can be given to the command. All tasks given must be in the form of the task name optionally followed by arguments that the task may require enclosed in square brackets, i.e. TASK_NAME[TASK_ARGUMENTS]. The execution respects the order in which they were provided.
If no tasks are given, the default task will be executed. In case no default task is defined, the command will list all available tasks to run.
Example:
leverage run task1 task2[arg1,arg2] task3[arg1,kwarg1=val1,kwarg2=val2]
task1 is invoked with no arguments, which is equivalent to task1[]
task2 receives two positional arguments arg1 and arg2
task3 receives one positional argument arg1 and two keyworded arguments kwarg1 with value val1 and kwarg2 with value val2
Run a shell in a generic container. It supports mounting local paths and injecting arbitrary environment variables. It also supports AWS credentials injection via mfa/sso.
>> leverage shell --help

Usage: leverage shell [OPTIONS]

  Run a shell in a generic container. It supports mounting local paths and
  injecting arbitrary environment variables. It also supports AWS credentials
  injection via mfa/sso.

  Syntax: leverage shell --mount <local-path> <container-path> --env-var <name> <value>
  Example: leverage shell --mount /home/user/bin/ /usr/bin/ --env-var env dev

  Both mount and env-var parameters can be provided multiple times.
  Example: leverage shell --mount /home/user/bin/ /usr/bin/ --mount /etc/config.ini /etc/config.ini --env-var init 5 --env-var env dev

Options:
  --mount <TEXT TEXT>...
  --env-var <TEXT TEXT>...
  --mfa       Enable Multi Factor Authentication upon launching shell.
  --sso       Enable SSO Authentication upon launching shell.
  --help      Show this message and exit.
The terraform command is a wrapper for a containerized installation of Terraform. It provides the Terraform executable with specific configuration values required by Leverage.
It transparently manages authentication, either Multi-Factor or Single Sign-On, on behalf of the user on commands that require it. SSO authentication takes precedence over MFA when both are active.
Some commands can only be run at layer level and will not run anywhere else in the project.
The command can also be invoked via its shortened version tf.
Since version 1.12, all subcommands support the --mount and --env-var parameters in the form of tuples:
leverage terraform --mount /home/user/bin/ /usr/bin/ --env-var FOO BAR apply
You can also provide them multiple times:
leverage terraform --mount /usr/bin/ /usr/bin/ --mount /etc/config /config --env-var FOO BAR --env-var TEST OK init
--layers: Applies command to layers listed in this option. (see more info here)
Regarding S3 backend keys
If the S3 backend block is set and no key was defined, Leverage CLI will try to create a new one automatically and store it in the config.tf file. The key will be based on the layer path relative to the account.
Check the Terraform backend configuration in the code definition.
When you are setting up the backend layer for the very first time, the S3 bucket does not yet exist. When running validations, Leverage CLI will detect that the S3 Key does not exist or cannot be generated. Therefore, it is necessary to first create the S3 bucket by using the init --skip-validation flag in the initialization process, and then move the \"tfstate\" file to it.
Import the resource with the given ID into the Terraform state at the given ADDRESS.
Can only be run at layer level.
zsh globbing
Zsh users may need to prepend noglob to the import command for it to be recognized correctly; as an alternative, square brackets can be escaped as \[\].
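As an illustration only (the resource address and ID below are placeholders, not part of any real layer):
leverage terraform import aws_vpc.main vpc-0123456789abcdef0
# zsh alternatives for indexed addresses:
noglob leverage terraform import aws_s3_bucket.logs[0] my-logs-bucket
leverage terraform import aws_s3_bucket.logs\[0\] my-logs-bucket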
To use this feature, Leverage Toolbox versions 1.2.7-0.0.5 and up, or 1.3.5-0.0.1 and up, must be used.
The tfautomv command is a wrapper for a containerized installation of tfautomv. It provides the tfautomv executable with specific configuration values required by Leverage.
It transparently handles authentication, whether it is Multi-Factor or via Single Sign-On, on behalf of the user in the commands that require it. SSO Authentication takes precedence over MFA when both are active.
This command can only be run at layer level and will not run anywhere else in the project.
This parameter can be used with the following Leverage CLI Terraform commands:
init
plan
apply
output
destroy
Value:
Parameter | Type | Description
--- | --- | ---
--layers | string | A comma separated list of layers' relative paths
"},{"location":"user-guide/leverage-cli/reference/terraform/layers/#common-workflow","title":"Common workflow","text":"
When using the --layers parameter, these commands should be run from account or layers-container directories.
...any of the aforementioned commands, combined with --layers, can be called from /home/user/project/management/, /home/user/project/management/global/ or /home/user/project/management/us-east-1/.
The value for this parameter is a comma separated list of layers' relative paths.
Leverage CLI will iterate through these relative paths, going into each layer, executing the command and going back to the original directory.
Example:
For this command, from /home/user/project/management/:
leverage tf plan --layers us-east-1/terraform-backend,global/security-base
...the Leverage CLI will:
check that each one of the layers' relative paths exists
go into us-east-1/terraform-backend directory
run the validate-layout command
go back to /home/user/project/management/
go into global/security-base directory
run the validate-layout command
go back to /home/user/project/management/
go into us-east-1/terraform-backend directory
run the init command
go back to /home/user/project/management/
go into global/security-base directory
run the init command
go back to /home/user/project/management/
This is done this way to prevent truncated executions. Meaning, if any of the validations fail, the user will be able to fix whatever has to be fixed and run the command again as it is.
Skipping the validation
The --skip-validation flag can still be used here with --layers.
"},{"location":"user-guide/leverage-cli/reference/terraform/layers/#terraform-parameters-and-flags","title":"Terraform parameters and flags","text":"
Terraform parameters and flags can still be passed when using the --layers parameter.
Config files can be found under each config folder.
The global config file /config/common.tfvars contains global context Terraform variables that we inject into the Terraform commands used by all sub-directories, such as leverage terraform plan or leverage terraform apply, and which cannot be stored in backend.tfvars due to Terraform limitations.
Account config files
backend.tfvars contains TF variables that are mainly used to configure TF backend but since profile and region are defined there, we also use them to inject those values into other TF commands.
account.tfvars contains TF variables that are specific to an AWS account.
The global common-variables.tf file (/config/common-variables.tfvars) contains global context Terraform variables that we symlink into every Terraform layer's code, e.g. shared/us-east-1/tools-vpn-server/common-variables.tf.
build.env file
By utilizing the build.env capability, you can easily change some default behaviors of the CLI. Read more in its dedicated \"Override defaults via build.env file\" section.
"},{"location":"user-guide/ref-architecture-aws/configuration/#setting-credentials-for-terraform-via-aws-profiles","title":"Setting credentials for Terraform via AWS profiles","text":"
The backend.tfvars file injects the profile name that Terraform will use to make changes on AWS.
Such a profile usually relies on another profile to assume a role that grants access to the corresponding account.
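As a minimal, hypothetical sketch of such a file, written here as a shell snippet for illustration only (the profile name and region are placeholders; your project will define its own):
cat <<'EOF' > config/backend.tfvars
profile = "myproject-apps-devstg-devops"
region  = "us-east-1"
EOF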
Please read the credentials section to understand the alternatives supported by Leverage to authenticate with AWS.
Read the following Leverage doc page to understand how to set up a profile to assume a role.
Currently the following two methods are supported:
AWS IAM: this is essentially using on-disk, permanent programmatic credentials that are tied to a given IAM User. This method can optionally support MFA which is highly recommended since using permanent credentials is discouraged, so at least with MFA you can counter-balance that. Keep reading...
AWS IAM Identity Center (formerly known as AWS SSO): this one is more recent and it's the method recommended by AWS since it uses roles (managed by AWS) which in turn enforce the usage of temporary credentials. Keep reading...
The following block provides a brief explanation of the chosen files/folders layout. Under every account folder (management, shared, security, etc.) you will see a service layer structure similar to the following:
Configuration files are organized by environment (e.g. dev, stg, prd) and by service type, which we call layers (identities, organizations, storage, etc.), to keep any changes made to them separate. Within each of those layer folders you should find the Terraform files that define all the resources belonging to that account environment and specific layer.
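Purely as an illustrative sketch (the account, region and layer names below are examples, not an exhaustive or authoritative listing), such a structure looks roughly like this:
apps-devstg/
├── config/
│   ├── account.tfvars
│   └── backend.tfvars
├── global/
│   └── base-identities/
└── us-east-1/
    ├── network/
    └── security-keys/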
Project file structure
An extended project file structure can be found here, while some other basic concepts and naming conventions in the context of Leverage, like "project" and "layer", are covered here.
NOTE: As a convention, folders with the -- suffix indicate that their resources are not currently created in AWS; basically, they have been destroyed or do not yet exist.
Such layer separation is meant to avoid situations in which a single folder contains a lot of resources. That is important to avoid because at some point, running leverage terraform plan / apply starts taking too long and that becomes a problem.
This organization also provides a layout that is easier to navigate and discover. You simply start with the accounts at the top level and then you get to explore the resource categories within each account.
The AWS Reference Architecture was built on a set of opinionated definitions and conventions on:
how to organize files/folders,
where to store configuration files,
how to handle credentials,
how to set up and manage state,
which commands and workflows to run in order to perform different tasks,
and more.
Key Concept
Although the Reference Architecture for AWS was initially designed to be compatible with web, mobile and microservices application stacks, it can also accommodate other types of workloads such as machine learning, blockchain, media, and more.
It was designed with modularity in mind. A multi-account approach is leveraged in order to improve security isolation and resource separation. Furthermore, each account's infrastructure is divided into smaller units that we call layers. Each layer contains all the required resources and definitions for a specific service or feature to function.
Key Concept
The design is strongly based on the AWS Well Architected Framework.
Each individual configuration of the Reference Architecture is referred to as a project. A Leverage project is comprised of all the relevant accounts and layers.
Better code quality and modules maturity (proven and tested).
Supported by binbash, with public modules also backed by thousands of top-talented open source community contributors.
Increased development cost savings.
Clients keep full rights to all commercial, modification, distribution, and private use of the code (no lock-in) through forks inside their own projects' repositories (open source and commercially reusable via the MIT and Apache 2.0 licenses).
"},{"location":"user-guide/ref-architecture-aws/overview/#a-more-visual-example","title":"A More Visual Example","text":"
The following diagram shows the type of AWS multi-account setup you can achieve by using this Reference Architecture:
The following are official AWS documentations, blog posts and whitepapers we have considered while building our Reference Solutions Architecture:
CloudTrail for AWS Organizations
Reserved Instances - Multi Account
AWS Multiple Account Security Strategy
AWS Multiple Account Billing Strategy
AWS Secure Account Setup
Authentication and Access Control for AWS Organizations
AWS Regions
VPC Peering
Route53 DNS VPC Associations
AWS Well Architected Framework
AWS Tagging strategies
Inviting an AWS Account to Join Your Organization
"},{"location":"user-guide/ref-architecture-aws/tf-state/","title":"Terraform - S3 & DynamoDB for Remote State Storage & Locking","text":""},{"location":"user-guide/ref-architecture-aws/tf-state/#overview","title":"Overview","text":"
Use these Terraform configuration files to create the S3 bucket & DynamoDB table needed for Terraform Remote State Storage & Locking.
What is the Terraform Remote State?
Read the official definition by Hashicorp.
Figure: Terraform remote state store & locking necessary AWS S3 bucket and DynamoDB table components. (Source: binbash Leverage, \"Terraform Module: Terraform Backend\", Terraform modules registry, accessed December 3rd 2020)."},{"location":"user-guide/ref-architecture-aws/tf-state/#prerequisites","title":"Prerequisites","text":"
Terraform repo structure + state backend initialization
Ensure you have Leverage CLI installed in your system
Refer to Configuration Pre-requisites to understand how to set up the configuration files required for this layer, where you must build your Terraform Reference Architecture account structure.
Leveraged by the Infrastructure as Code (IaC) Library through the terraform-aws-tfstate-backend module
Go to the corresponding account directory, e.g. /shared/base-tf-backend, then:
Run leverage terraform init --skip-validation
Run leverage terraform plan, review the output to understand the expected changes
Run leverage terraform apply, review the output once more and type yes if you are okay with that
This should create a terraform.tfstate file in this directory. We don't want to push that to the repository, so let's push the state to the backend we just created.
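As a sketch of the sequence just described (the account path follows the example above; adapt it to your project):
cd shared/base-tf-backend
leverage terraform init --skip-validation
leverage terraform plan
leverage terraform apply
# one way to move the local terraform.tfstate to the new backend is to enable
# the S3 backend configuration for this layer and re-run init so the state migrates
leverage terraform init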
In the base-tf-backend folder you should find the definition of the infrastructure that needs to be deployed before you can get to work with anything else.
IMPORTANT: THIS IS ONLY NEEDED IF THE BACKEND WAS NOT CREATED YET. IF THE BACKEND ALREADY EXISTS YOU JUST USE IT.
The sequence of commands that you run to operate on each layer is called the Terraform workflow. In other words, it's what you would typically run in order to create, update, or delete the resources defined in a given layer.
Now, the extended workflow is annotated with more explanations and it is intended for users who haven't yet worked with Leverage on a daily basis:
Terraform Workflow
Make sure you understood the basic concepts:
Overview
Configuration
Directory Structure
Remote State
Make sure you installed the Leverage CLI.
Go to the layer (directory) you need to work with, e.g. shared/global/base-identities/.
Run leverage terraform init -- only the first time you work on this layer, or if you upgraded module or provider versions, or if you made changes to the Terraform remote backend configuration.
Make any changes you need to make. For instance: modify a resource definition, add an output, add a new resource, etc.
Run leverage terraform plan to preview any changes.
Run leverage terraform apply to give it a final review and to apply any changes.
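The same basic workflow, expressed as commands (the layer path is just the example used above):
cd shared/global/base-identities
leverage terraform init    # first time on this layer, or after module/provider/backend changes
# ...edit the layer's Terraform files as needed...
leverage terraform plan    # preview the changes
leverage terraform apply   # final review, then apply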
Tip
You can use the --layers argument to run Terraform commands on more than one layer. For more information see here
Note
If desired, at step #5 you could submit a PR, allowing you and the rest of the team to understand and review what changes would be made to your AWS Cloud Architecture components before executing leverage terraform apply (terraform apply). This brings the huge benefit of treating changes with a GitOps oriented approach, basically as we should treat any other code & infrastructure change, and integrate it with the rest of our tools and practices like CI/CD, integration testing, and so on.
"},{"location":"user-guide/ref-architecture-aws/workflow/#running-in-automation","title":"Running in Automation","text":"Figure: Running terraform with AWS in automation (just as reference).
Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to customers globally with low latency, high transfer speeds, all within a developer-friendly environment. CloudFront is integrated with AWS \u2013 both physical locations that are directly connected to the AWS global infrastructure, as well as other AWS services. CloudFront works seamlessly with services including AWS Shield for DDoS mitigation, Amazon S3, Elastic Load Balancing, API Gateway or Amazon EC2 as origins for your applications, and Lambda@Edge to run custom code closer to customers\u2019 users and to customize the user experience. Lastly, if you use AWS origins such as Amazon S3, Amazon EC2 or Elastic Load Balancing, you don\u2019t pay for any data transferred between these services and CloudFront.
"},{"location":"user-guide/ref-architecture-aws/features/cdn/cdn/#load-balancer-alb-nlb-s3-cloudfront-origins","title":"Load Balancer (ALB | NLB) & S3 Cloudfront Origins","text":"Figure: AWS CloudFront with ELB and S3 as origin diagram. (Source: Lee Atkinson, \"How to Help Achieve Mobile App Transport Security (ATS) Compliance by Using Amazon CloudFront and AWS Certificate Manager\", AWS Security Blog, accessed November 17th 2020)."},{"location":"user-guide/ref-architecture-aws/features/cdn/cdn/#api-gateway-cloudfront-origins","title":"API Gateway Cloudfront Origins","text":"Figure: AWS CloudFront with API Gateway as origin diagram. (Source: AWS, \"AWS Solutions Library, AWS Solutions Implementations Serverless Image Handler\", AWS Solutions Library Solutions Implementations, accessed November 17th 2020)."},{"location":"user-guide/ref-architecture-aws/features/ci-cd/argocd/","title":"ArgoCD","text":""},{"location":"user-guide/ref-architecture-aws/features/ci-cd/argocd/#argocd","title":"ArgoCD","text":""},{"location":"user-guide/ref-architecture-aws/features/ci-cd/argocd/#aws-apps-services-k8s-eks-accounts-diagram","title":"AWS Apps & Services K8s EKS accounts diagram","text":"
The diagram below is based on our binbash Leverage Reference Architecture CI/CD official documentation.
Figure: K8S reference architecture CI/CD with ArgoCD diagram. (Source: binbash Leverage Confluence Doc, \"Implementation Diagrams\", binbash Leverage Doc, accessed August 4th 2021)."},{"location":"user-guide/ref-architecture-aws/features/ci-cd/jenkins-argocd/","title":"CI/CD","text":""},{"location":"user-guide/ref-architecture-aws/features/ci-cd/jenkins-argocd/#jenkins-argocd","title":"Jenkins + ArgoCD","text":"Figure: ACI/CD with Jenkins + ArgoCD architecture diagram. (Source: ArgoCD, \"Overview - What Is Argo CD\", ArgoCD documentation, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/ci-cd/jenkins-spinnaker/","title":"CI/CD","text":""},{"location":"user-guide/ref-architecture-aws/features/ci-cd/jenkins-spinnaker/#jenkins-spinnaker","title":"Jenkins + Spinnaker","text":"Figure: CI/CD with Jenkins + Spinnaker diagram. (Source: Irshad Buchh, \"Continuous Delivery using Spinnaker on Amazon EKS\", AWS Open Source Blog, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-eks/","title":"AWS Elastic Kubernetes Service (EKS)","text":"
Important
Please check the Reference Architecture for EKS to learn more details about this.
Kops is an official Kubernetes project for managing production-grade Kubernetes clusters. Kops is currently the best tool to deploy Kubernetes clusters to Amazon Web Services. The project describes itself as kubectl for clusters.
Core Features
Open-source & supports AWS and GCE
Deploy clusters to existing virtual private clouds (VPC) or create a new VPC from scratch
Supports public & private topologies
Provisions single or multiple master clusters
Configurable bastion machines for SSH access to individual cluster nodes
Built on a state-sync model for dry-runs and automatic idempotency
Direct infrastructure manipulation, or works with CloudFormation and Terraform
Rolling cluster updates
Supports heterogeneous clusters by creating multiple instance groups
Figure: AWS K8s Kops architecture diagram (just as reference). (Source: Carlos Rodriguez, \"How to deploy a Kubernetes cluster on AWS with Terraform & kops\", Nclouds.com Blog post, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-kops/#kops-pre-requisites","title":"Kops Pre-requisites","text":"
Important consideration
K8s clusters provisioned by Kops have a number of resources that need to be available before the cluster is created. These are Kops pre-requisites and they are defined in the 1-prerequisites directory which includes all Terraform files used to create/modify these resources.
The current code has been fully tested with the AWS VPC Network Module
NOTE1: Regarding Terraform versions please also consider https://github.com/binbashar/bb-devops-tf-aws-kops#todo
NOTE2: These dependencies will be mostly covered via the Makefile with dockerized Terraform commands (https://hub.docker.com/repository/docker/binbash/terraform-awscli)
"},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-kops/#resulting-solutions-architecture","title":"Resulting Solutions Architecture","text":"Figure: AWS K8s Kops architecture diagram (just as reference)."},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-kops/#why-this-workflow","title":"Why this workflow","text":"
The workflow follows the same approach that is used to manage other terraform resources in your AWS accounts. E.g. network, identities, and so on.
So we'll use existing AWS resources to create a cluster-template.yaml containing all the resource IDs that Kops needs to create a Kubernetes cluster.
Why not directly use Kops CLI to create the K8s cluster as well as the VPC and its other dependencies?
While this is a valid approach, we want to manage all these building blocks independently and be able to fully customize any AWS component without having to alter our Kubernetes cluster definitions and vice-versa.
This is a fully declarative approach to managing your infrastructure, so being able to declare the state of our cluster in YAML files fits 100% with the infrastructure-as-code & GitOps based approach.
The 2-kops directory includes helper scripts and Terraform files in order to template our Kubernetes cluster definition. The idea is to use our Terraform outputs from 1-prerequisites to construct a cluster definition.
Cluster Management via Kops is typically carried out through the kops CLI. In this case, we use a 2-kops directory that contains a Makefile, Terraform files and other helper scripts that reinforce the workflow we use to create/update/delete the cluster.
This workflow is a little different from the typical Terraform workflows we use. The full workflow goes as follows:
Cluster: Creation & Update
Modify files under 1-prerequisites
Main files to update probably are locals.tf and outputs.tf
Mostly before the cluster is created but could be needed afterward
Modify cluster-template.yml under 2-kops folder
E.g. to add or remove instance groups, upgrade k8s version, etc
From the 2-kops/ directory, running make cluster-update will follow the steps below:
Get Terraform outputs from 1-prerequisites
Generate a Kops cluster manifest -- it uses cluster-template.yml as a template and the outputs from the point above as replacement values
Update Kops state -- it uses the generated Kops cluster manifest in previous point (cluster.yml)
Generate Kops Terraform file (kubernetes.tf) -- this file represents the changes that Kops needs to apply on the cloud provider.
Run make plan
To preview any infrastructure changes that Terraform will make.
If desired we could submit a PR, allowing you and the rest of the team to understand and review what changes would be made to the Kubernetes cluster before executing make apply (terraform apply). This brings the huge benefit of treating changes to our Kubernetes clusters with a GitOps oriented approach, basically like we treat any other code & infrastructure change, and integrate it with the rest of our tools and practices like CI/CD, integration testing, replicate environments and so on.
Run make apply
To apply those infrastructure changes on AWS.
Run make cluster-rolling-update
To determine if Kops needs to trigger some changes to happen right now (dry run)
These are usually changes to the EC2 instances that won't get reflected as they depend on the autoscaling
Run make cluster-rolling-update-yes
To actually make any changes to the cluster masters/nodes happen
Cluster: Deletion
To clean up any resources created for your K8s cluster, you should run:
From the 2-kops folder, run make destroy
This will execute a terraform destroy of all the AWS resources declared in kubernetes.tf.
From the 2-kops folder, run make cluster-destroy
This will run Kops destroy cluster -- only a dry run, no changes will be applied
Run make cluster-destroy-yes
Kops will effectively destroy all the remaining cluster resources.
Finally, from the 1-prerequisites folder, run make destroy
This will remove Kops state S3 bucket + any other extra resources you've provisioned for your cluster.
The workflow may look complicated at first but generally it boils down to these simplified steps:
1. Modify cluster-template.yml
2. Run make cluster-update
3. Run make apply
4. Run make cluster-rolling-update-yes
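Those simplified steps, as commands run from the 2-kops/ directory (after editing cluster-template.yml):
make cluster-update
make apply
make cluster-rolling-update-yes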
What about persistent and stateful K8s resources?
This approach will work better the more stateless your Kubernetes workloads are. Treating Kubernetes clusters as ephemeral and replaceable infrastructure requires either avoiding persistent volumes or accepting the difficulties of running workloads such as databases on K8s. We feel pretty confident that we can recreate our workloads by applying each of our service definitions, charts and manifests to a given Kubernetes cluster, as long as we keep the persistent storage separately on AWS RDS, DynamoDB, EFS and so on. In terms of etcd state persistency, Kops already provisions the etcd volumes (AWS EBS) independently of the master instances they get attached to. This helps to persist the etcd state after rolling updates of your master nodes without any user intervention, and also simplifies volume backups via EBS snapshots (consider https://github.com/binbashar/terraform-aws-backup-by-tags). We also use a very valuable backup tool named Velero (formerly Heptio Ark - https://github.com/vmware-tanzu/velero) to back up and restore our Kubernetes cluster resources and persistent volumes.
TODO
IMPORTANT: Kops terraform output (kops update cluster --target terraform) is still generated for Terraform 0.11.x (https://github.com/kubernetes/kops/issues/7052) we'll take care of the migration when tf-0.12 gets fully supported.
Create a binbash Leverage public Confluence Wiki entry detailing some more info about etcd, calico and k8s versions compatibilities
Ultra light, ultra simple, ultra powerful. Linkerd adds security, observability, and reliability to Kubernetes, without the complexity. CNCF-hosted and 100% open source.
"},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-service-mesh/#how-it-works","title":"How it works","text":"
How Linkerd works
Linkerd works by installing a set of ultralight, transparent proxies next to each service instance. These proxies automatically handle all traffic to and from the service. Because they\u2019re transparent, these proxies act as highly instrumented out-of-process network stacks, sending telemetry to, and receiving control signals from, the control plane. This design allows Linkerd to measure and manipulate traffic to and from your service without introducing excessive latency.
"},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-service-mesh/#architecture","title":"Architecture","text":"Figure: Figure: Linkerd v2.10 architecture diagram. (Source: Linkerd official documentation, \"High level Linkerd control plane and a data plane.\", Linkerd Doc, accessed June 14th 2021)."},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-service-mesh/#dashboard","title":"Dashboard","text":"Figure: Figure: Linkerd v2.10 dashboard. (Source: Linkerd official documentation, \"Linkerd dashboard\", Linkerd Doc, accessed June 14th 2021)."},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-service-mesh/#read-more","title":"Read more","text":"
Related resources
Linkerd vs Istio benchmarks
"},{"location":"user-guide/ref-architecture-aws/features/compute/overview/","title":"Compute","text":""},{"location":"user-guide/ref-architecture-aws/features/compute/overview/#containers-and-serverless","title":"Containers and Serverless","text":"
Overview
In order to serve Client application workloads we propose to implement Kubernetes, and proceed to containerize all application stacks whenever it\u2019s the best solution (we\u2019ll also consider AWS Lambda for a Serverless approach when it fits better). Kubernetes is an open source container orchestration platform that eases the process of running containers across many different machines, scaling up or down by adding or removing containers when demand changes and provides high availability features. Also, it serves as an abstraction layer that will give Client the possibility, with minimal effort, to move the apps to other Kubernetes clusters running elsewhere, or a managed Kubernetes service such as AWS EKS, GCP GKE or others.
Clusters will be provisioned with Kops and/or AWS EKS, which are solutions meant to orchestrate this compute engine in AWS. Whenever possible the initial version deployed will be the latest stable release.
Figure: Kubernetes high level components architecture. (Source: Andrew Martin, \"11 Ways (Not) to Get Hacked\", Kubernetes.io Blog post, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/compute/overview/#kubernetes-addons","title":"Kubernetes addons","text":"
Serverless is the native architecture of the cloud that enables you to shift more of your operational responsibilities to AWS, increasing your agility and innovation. Serverless allows you to build and run applications and services without thinking about servers. It eliminates infrastructure management tasks such as server or cluster provisioning, patching, operating system maintenance, and capacity provisioning. You can build them for nearly any type of application or backend service, and everything required to run and scale your application with high availability is handled for you.
Why use serverless?
Serverless enables you to build modern applications with increased agility and lower total cost of ownership. Building serverless applications means that your developers can focus on their core product instead of worrying about managing and operating servers or runtimes, either in the cloud or on-premises. This reduced overhead lets developers reclaim time and energy that can be spent on developing great products which scale and that are reliable.
Figure: AWS serverless architecture diagram (just as reference). (Source: Nathan Peck, \"Designing a modern serverless application with AWS Lambda and AWS Fargate\", Containers-on-AWS Medium Blog post, accessed November 18th 2020).
Serverless Compute Services
AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume - there is no charge when your code is not running.
Lambda@Edge allows you to run Lambda functions at AWS Edge locations in response to Amazon CloudFront events.
AWS Fargate is a purpose-built serverless compute engine for containers. Fargate scales and manages the infrastructure required to run your containers.
Apart from the EC2 instances that are part of Kubernetes, there are going to be other instances running tools for monitoring, logging centralization, builds/tests, deployment, among others, which are to be defined at this point. Some of them can be replaced by managed services, like CircleCI, Snyk, etc., and this can have pros and cons that will need to be considered at the time of implementation. Any OS that is provisioned will be completely reproducible as code, in the event of migration to another vendor.
"},{"location":"user-guide/ref-architecture-aws/features/costs/costs/","title":"Cost Estimation & Optimization","text":""},{"location":"user-guide/ref-architecture-aws/features/costs/costs/#opportunity-to-optimize-resources","title":"Opportunity to optimize resources","text":"
Compute
Usage of reserved EC2 instances for stable workloads (AWS Cost Explorer Reserved Optimization | Compute Optimizer - savings of up to 42% vs On-Demand).
Usage of Spot EC2 instances for fault-tolerant workloads (savings of up to 90%).
Use Auto Scaling Groups to allow your EC2 fleet to scale in and out based on demand.
Identify EC2 instances with low utilization and reduce costs by stopping or rightsizing them.
Use Compute Savings Plans to reduce EC2, Fargate and Lambda costs (they apply regardless of EC2 family, size, AZ, region, OS or tenancy, and also cover Fargate and Lambda).
Databases
Usage of reserved RDS instances for stable workload databases.
Monitoring & Automation
Set up AWS billing alarms and AWS Budgets (forecasted account cost / RI coverage) with notifications to Slack.
Activate AWS Trusted Advisor cost-related checks.
Identify EBS volumes with low utilization and reduce costs by snapshotting and then removing them.
Check underutilized EBS volumes that could possibly be shrunk or removed.
Networking: delete idle load balancers (use the load balancer RequestCount metric over the past 7 days to identify them).
Set up Lambda nuke to automatically clean up AWS account resources.
Set up a Lambda scheduler to stop and start resources on AWS (EC2, ASG & RDS).
Storage & Network Traffic
Check S3 usage and reduce costs by leveraging lower-cost storage tiers.
Use S3 Analytics, or automate moving these objects into a lower-cost storage tier with Lifecycle Policies or S3 Intelligent-Tiering.
If DataTransferOut from EC2 to the public internet represents a significant cost, consider implementing CloudFront.
Stable workloads will always run on reserved instances. The following calculation only considers the 1-year, No Upfront mode, in which the Client does not pay in advance but commits to this monthly usage and will be billed accordingly, even if the instance type is not used. More aggressive reservation strategies can be implemented to further reduce costs; these will have to be analyzed by the business in conjunction with operations.
We will implement AWS RDS databases matching the requirements of the current application stacks. If the selected region is the same one you're currently using for your legacy AWS RDS instances, we will be able to create a peering connection to the existing databases in order to migrate the application stacks first, and then the databases.
"},{"location":"user-guide/ref-architecture-aws/features/identities/credentials-vault/","title":"Hashicorp Vault credentials","text":""},{"location":"user-guide/ref-architecture-aws/features/identities/credentials-vault/#hashicorp-vault-private-api-endpoint","title":"Hashicorp Vault private API endpoint","text":"
If you are on HCP, you can get this from the Admin UI. Otherwise, it will depend on how you set up DNS, TLS and port settings for your self-hosted installation. We always favour a private endpoint deployment that is only accessible from the VPN.
"},{"location":"user-guide/ref-architecture-aws/features/identities/credentials-vault/#hcp-vault-private-api-endpoint","title":"HCP Vault private API endpoint","text":"
We'll need to set up this Vault auth token in our [/config/common.config] file whenever we run the Terraform Leverage Reference Architecture for:
le-tf-infra-aws
le-tf-vault
Vault token generation and authentication
This is the Vault token that will be used by Terraform, or the Vault CLI, to perform calls to the Vault API. During the initial setup, you will have to use a root token. If you are using a self-hosted installation you will get such a token after you initialize Vault; if you are using Hashicorp Cloud Platform you can get the token from the HCP Admin UI.
After the initial setup, and since we recommend integrating Vault to Github for authentication, you will have to follow these steps:
Generate a GitHub Personal Access Token: https://github.com/settings/tokens
Click \u201cGenerate new token\u201c
Under scopes, only select \"read:org\", under \"admin:org\"
"},{"location":"user-guide/ref-architecture-aws/features/identities/credentials-vault/#get-vault-token-from-your-gh-auth-token","title":"Get vault token from your GH auth token","text":"
Run the Vault CLI via docker: docker run -it vault:1.7.2 sh
Set up the Vault environment variables (NOTE: these will differ slightly between an AWS self-hosted and an HCP Vault deployment)
$ docker run -it vault:1.7.2 sh
/ # export VAULT_ADDR="https://bb-le-shared-vault-cluster.private.vault.xxxxxxx.aws.hashicorp.cloud:8200"; export VAULT_NAMESPACE="admin"

/ # vault login -method=github
GitHub Personal Access Token (will be hidden):
Success! You are now authenticated. The token information displayed below
is already stored in the token helper. You do NOT need to run "vault login"
again. Future Vault requests will automatically use this token.

Key                  Value
---                  -----
token                s.PNAXXXXXXXXXXXXXXXXXXX.hbtct
token_accessor       KTqKKXXXXXXXXXXXXXXXXXXX.hbtct
token_duration       1h
...
input your GH personal access token
Set the returned token in step 4) into /config/common.config -> vault_token=\"s.PNAXXXXXXXXXXXXXXXXXXX.hbtct\"
NOTE: the admin token from https://portal.cloud.hashicorp.com/ will always work, but its use is discouraged in favour of the nominated GH personal access token, for security audit trail reasons.
You can also manage your Vault instance via its UI. The screenshot below shows an example using the GitHub personal access token, one of our supported auth methods.
Generate a GitHub Personal Access Token: https://github.com/settings/tokens
Click \u201cGenerate new token\u201c
Under scopes, only select \"read:org\", under \"admin:org\"
Open your preferred web browser choose Github auth method and paste your GH token and you'll be able to login to your instance.
These are temporary credentials used for the initial deployment of the architecture, and they should only be used for this purpose. Once this process is finished, management and security users should be the ones managing the environment.
management credentials are meant to carry the role of making all important administrative tasks in the environment (e.g. billing adjustments). They should be tied to a physical user in your organization.
A user with these credentials will assume the role OrganizationAccountAccessRole when interacting with the environment.
These credentials are the ones to be used for everyday maintenance and interaction with the environment. Users in the role of DevOps | SecOps | Cloud Engineer in your organization should use these credentials.
A user with these credentials will assume the role DevOps when interacting with the environment.
"},{"location":"user-guide/ref-architecture-aws/features/identities/gpg/","title":"GPG Keys","text":""},{"location":"user-guide/ref-architecture-aws/features/identities/gpg/#why-do-we-use-gpg-keys","title":"Why do we use GPG keys?","text":"
By default, our Leverage Reference Architecture base-identities layer uses the IAM module to manage AWS IAM user credentials with encryption to grant strong security.
This module outputs commands and GPG messages which can be decrypted via the command line to obtain the AWS Web Console user's password and the user's secret key.
Notes for keybase users
If possible, always use GPG encryption to prevent Terraform from keeping unencrypted password and access secret key in state file.
Keybase pre-requisites
When gpg_key is specified as keybase:username, make sure that the user public key has already been uploaded to the Reference Architecture base-identities layer keys folder
"},{"location":"user-guide/ref-architecture-aws/features/identities/gpg/#managing-your-gpg-keys","title":"Managing your GPG keys","text":"
Create a key pair
NOTE: the user for whom this account is being created needs to do this
Install gpg
Run gpg --version to confirm
Run gpg --gen-key and provide \"Your Name\" and \"Your Email\" as instructed -- you must also provide a passphrase
Run gpg --list-keys to check that your key was generated
Delete a key pair
Run gpg --list-keys to check your key id
Run gpg --delete-secret-keys \"Your Name\" to delete your private gpg key
Run gpg --delete-key \"Your Name\" to delete your public gpg key
Export your public key
NOTE: the user must have created a key pair before doing this
Run gpg --export \"Your Name\" | base64
Now the user can share her/his public key for creating her/his account
Decrypt your encrypted password
The user should copy the encrypted password from whatever media it was provided to her/him
Run gpg --decrypt a_file_with_your_pass (from the path where you saved that file) to decrypt your password using your gpg key and its passphrase
$ gpg --decrypt encrypted_pass

You need a passphrase to unlock the secret key for
user: "Demo User (AWS org project-user acct gpg key w/ passphrase) <username.lastname@domain.com>"
2048-bit RSA key, ID 05ED43DC, created 2019-03-15 (main key ID D64DD59F)

gpg: encrypted with 2048-bit RSA key, ID 05ED43DC, created 2019-03-15
      "Demo User (AWS org project-user acct gpg key w/ passphrase) <username.lastname@domain.com>"
Vi0JA|c%fP*FhL}CE-D7ssp_TVGlf#%
Depending on your shell version, an extra % character could appear as shown below. You must disregard this character since it's not part of the initial (one-time) AWS Web Console password.
If all went well, the decrypted password should be there
"},{"location":"user-guide/ref-architecture-aws/features/identities/gpg/#workaround-for-mac-users","title":"Workaround for Mac users","text":"
There are some situations where gpg keys generated on Mac don't work properly, generating errors like the following:
╷
│ Error: error encrypting password during IAM User Login Profile (user.lastname) creation: Error encrypting Password: error parsing given PGP key: openpgp: unsupported feature: unsupported oid: 2b060104019755010501
│
│   with module.user["user.lastname"].aws_iam_user_login_profile.this[0],
│   on .terraform/modules/user/modules/iam-user/main.tf line 12, in resource "aws_iam_user_login_profile" "this":
│   12: resource "aws_iam_user_login_profile" "this" {
│
Docker is required for this workaround.
If you don't have docker on your PC, don't worry. You can easily install it following the steps on the official page.
In these cases, execute the following steps:
Run an interactive console into an ubuntu container mounting your gpg directory.
docker run --rm -it --mount type=bind,src=/Users/username/.gnupg,dst=/root/.gnupg ubuntu:latest
Inside the container, install required packages.
apt update
apt install gnupg
Generate the key as described in previous sections, running gpg --gen-key at the interactive console in the ubuntu container.
To fix permissions in your gpg directory, run these commands at the interactive console in the ubuntu container.
find ~/.gnupg -type f -exec chmod 600 {} \;
find ~/.gnupg -type d -exec chmod 700 {} \;
Now you should be able to export the gpg key and decode the password from your mac, running gpg --export \"Your Name\" | base64.
Finally, decrypt the password on your Mac by executing:
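A sketch of that final step, assuming the encrypted password was saved to a file named encrypted_pass as in the earlier example:
gpg --decrypt encrypted_pass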
"},{"location":"user-guide/ref-architecture-aws/features/identities/identities/","title":"Identity and Access Management (IAM) Layer","text":""},{"location":"user-guide/ref-architecture-aws/features/identities/identities/#setting-up-user-credentials","title":"Setting up user credentials","text":"
Please follow the steps below to orchestrate your base-identities layer, first in your project-root AWS account and afterwards in your project-security account.
IAM user standard creation workflow
Pre-requisite: add your public PGP key following the documentation.
For steps 3 and 4, consider following Leverage's Terraform workflow.
Update (add | remove) your IAM Users associated code and deploy security/global/base-identities/users.tf
Consider customizing your account Alias and Password Policy
Update (add | remove | edit) your IAM Groups associated code and deploy security/global/base-identities/groups.tf
Get and share each IAM user's AWS Console user ID and its associated OTP password from the make apply outputs
Temporarily set sensitive = false to get the encrypted outputs in your prompt output.
Each user will need to decrypt their AWS Console password; you could share the associated documentation with them.
Users must log in to the AWS Web Console (https://project-security.signin.aws.amazon.com/console) with their decrypted password and create a new password
Activate MFA for Web Console (Optional but strongly recommended)
User should create his AWS ACCESS KEYS if needed
Users could optionally set up ~/.aws/project/credentials + ~/.aws/project/config following the AWS Credentials Setup sub-section immediately below
To allow users to access AWS Organization member accounts, consider repeating step 3 for the corresponding member accounts:
When you activate STS endpoints for a Region, AWS STS can issue temporary credentials to users and roles in your account that make an AWS STS request. Those credentials can then be used in any Region that is enabled by default or is manually enabled. You must activate the Region in the account where the temporary credentials are generated. It does not matter whether a user is signed into the same account or a different account when they make the request.
To activate or deactivate AWS STS in a Region that is enabled by default (console)
Sign in as a root user or an IAM user with permissions to perform IAM administration tasks.
Open the IAM console and in the navigation pane choose Account settings.
If necessary, expand Security Token Service (STS), find the Region that you want to activate, and then choose Activate or Deactivate. For Regions that must be enabled, we activate STS automatically when you enable the Region. After you enable a Region, AWS STS is always active for the Region and you cannot deactivate it. To learn how to enable a Region, see Managing AWS Regions in the AWS General Reference.
Source | AWS Documentation IAM User Guide | Activating and deactivating AWS STS in an AWS Region
Figure: Deactivating AWS STS in AWS Regions that are not in use. Only Regions in use should have STS activated.
"},{"location":"user-guide/ref-architecture-aws/features/identities/overview/","title":"Identity and Access Management (IAM)","text":""},{"location":"user-guide/ref-architecture-aws/features/identities/overview/#overview","title":"Overview","text":"
Taking this official AWS resource as a reference, we've defined a security account structure for managing multiple accounts.
User Management Definitions
IAM users will strictly be created and centralized in the Security account (member accounts IAM Users could be exceptionally created for very specific tools that still don\u2019t support IAM roles for cross-account auth).
All access to resources within the Client organization will be assigned via policy documents attached to IAM roles or groups.
All IAM roles and groups will have the least privileges required to properly work.
IAM AWS and Customer managed policies will be defined, inline policies will be avoided whenever possible.
All user management will be maintained as code and will reside in the DevOps repository.
All users will have MFA enabled whenever possible (VPN and AWS Web Console).
Root user credentials will be rotated and secured. MFA for root will be enabled.
IAM Access Keys for root will be disabled.
IAM root access will be monitored via CloudWatch Alerts.
Why multi account IAM strategy?
Creating a security relationship between accounts makes it even easier for companies to assess the security of AWS-based deployments, centralize security monitoring and management, manage identity and access, and provide audit and compliance monitoring services
Figure: AWS Organization Security account structure for managing multiple accounts (just as reference). (Source: Yoriyasu Yano, \"How to Build an End to End Production-Grade Architecture on AWS Part 2\", Gruntwork.io Blog, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/identities/overview/#iam-groups-roles-definition","title":"IAM Groups & Roles definition","text":"
AWS Org member accounts IAM groups :
Account Name AWS Org Member Accounts IAM Groups Admin Auditor DevOps DeployMaster project-management x project-security x x x x
AWS Org member accounts IAM roles :
Account Name AWS Org Member Accounts IAM Roles Admin Auditor DevOps DeployMaster OrganizationAccountAccessRole project-management x project-security x x x x project-shared x x x x x project-legacy x x x project-apps-devstg x x x x x project-apps-prd x x x x x"},{"location":"user-guide/ref-architecture-aws/features/identities/roles/","title":"IAM Roles","text":"
What are AWS IAM Roles?
For the Leverage AWS Reference Architecture we heavily depend on AWS IAM roles, which is a standalone IAM entity that:
Allows you to attach IAM policies to it,
Specify which other IAM entities to trust, and then
Those other IAM entities can assume the IAM role to be temporarily get access to the permissions in those IAM policies.
The two most common use cases for IAM roles are
Service roles: Whereas an IAM user allows a human being to access AWS resources, one of the most common use cases for an IAM role is to allow a service (e.g., one of your applications, a CI server, or an AWS service) to access specific resources in your AWS account. For example, you could create an IAM role that gives access to a specific S3 bucket and allow that role to be assumed by one of your EC2 instances or Lambda functions. The code running on that AWS compute service will then be able to access that S3 bucket (or any other service you granted access to through this IAM role) without you having to manually copy AWS credentials (i.e., access keys) onto that instance.
Cross account access: Allow to grant an IAM entity in one AWS account access to specific resources in another AWS account. For example, if you have an IAM user in AWS account A, then by default, that IAM user cannot access anything in AWS account B. However, you could create an IAM role in account B that gives access to a specific S3 bucket (or any necessary AWS services) in AWS account B and allow that role to be assumed by an IAM user in account A. That IAM user will then be able to access the contents of the S3 bucket by assuming the IAM role in account B. This ability to assume IAM roles across different AWS accounts is the critical glue that truly makes a multi AWS account structure possible.
"},{"location":"user-guide/ref-architecture-aws/features/identities/roles/#how-iam-roles-work","title":"How IAM roles work?","text":"Figure: Example of AWS cross-account AWS access. (Source: Kai Zhao, \"AWS CloudTrail Now Tracks Cross-Account Activity to Its Origin\", AWS Security Blog, accessed November 17th 2020).
You must define a trust policy for each IAM role, which is a JSON document (very similar to an IAM policy) that specifies who can assume this IAM role. For example, we present below a trust policy that allows this IAM role to be assumed by an IAM user named John in AWS account 111111111111:
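A minimal sketch of such a trust policy, reusing the account ID and user from this example (the local file name is arbitrary):
cat > devops-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111111111111:user/John" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF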
Note that a trust policy alone does NOT automatically give John permission to assume this IAM role. Cross-account access always requires permissions in both accounts (two-way authorization). So, if John is in AWS account 111111111111 and you want him to have access to an IAM role called DevOps in account B (ID 222222222222), then you need to configure permissions in both accounts:
1. In account 222222222222, the DevOps IAM role must have a trust policy that gives sts:AssumeRole permissions to AWS account A (ID 111111111111), as shown above.
2. In account A (111111111111), you also need to attach an IAM policy to John's IAM user that allows him to assume the DevOps IAM role, which might look like this:
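A minimal sketch of that IAM policy for John's user in account 111111111111 (the local file name is arbitrary):
cat > allow-assume-devops.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::222222222222:role/DevOps"
    }
  ]
}
EOF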
"},{"location":"user-guide/ref-architecture-aws/features/identities/roles/#assuming-an-aws-iam-role","title":"Assuming an AWS IAM role","text":"
How does it work?
IAM roles do not have a user name, password, or permanent access keys. To use an IAM role, you must assume it by making an AssumeRole API call (via the SDKs, CLI or Web Console), which will return temporary access keys you can use in follow-up API calls to authenticate as the IAM role. The temporary access keys will be valid for 1 to 12 hours (depending on your session duration configuration), after which you must call AssumeRole again to fetch new temporary keys. Note that to make the AssumeRole API call, you must first authenticate to AWS using some other mechanism.
For example, for an IAM user to assume an IAM role, the workflow looks like this:
Figure: Assuming an AWS IAM role. (Source: Gruntwork.io, \"How to configure a production-grade AWS account structure using Gruntwork AWS Landing Zone\", Gruntwork.io Production deployment guides, accessed November 17th 2020).
Basic AssumeRole workflow
Authenticate using the IAM user\u2019s permanent AWS access keys
Make the AssumeRole API call
AWS sends back temporary access keys
You authenticate using those temporary access keys
Now all of your subsequent API calls will be on behalf of the assumed IAM role, with access to whatever permissions are attached to that role
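For reference, this is how the same workflow looks when you let the Terraform AWS provider perform the AssumeRole call for you (a sketch only; the profile name, role ARN and session name are hypothetical):
provider "aws" {
  region  = "us-east-1"
  # Steps 1-2: authenticate with the IAM user's permanent credentials (a named
  # CLI profile here) and let the provider make the AssumeRole API call.
  profile = "project-security"   # hypothetical profile holding permanent keys

  assume_role {
    # Steps 3-5: AWS returns temporary access keys for this role; every
    # subsequent API call Terraform makes is performed as the assumed role.
    role_arn     = "arn:aws:iam::222222222222:role/DevOps"
    session_name = "devops-session"
  }
}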
IAM roles and AWS services
Most AWS services have native support built-in for assuming IAM roles.
For example:
You can associate an IAM role directly with an EC2 instance (instance profile), and that instance will automatically assume the IAM role every few hours, making the temporary credentials available in EC2 instance metadata.
Just about every AWS CLI and SDK tool knows how to read and periodically update temporary credentials from EC2 instance metadata, so in practice, as soon as you attach an IAM role to an EC2 instance, any code running on that EC2 instance can automatically make API calls on behalf of that IAM role, with whatever permissions are attached to that role. This allows you to give code on your EC2 instances IAM permissions without having to manually figure out how to copy credentials (access keys) onto that instance.
The same strategy works with many other AWS services: e.g., you use IAM roles as a secure way to give your Lambda functions, ECS services, Step Functions, and many other AWS services permissions to access specific resources in your AWS account.
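As an illustration of the EC2 case described above, a minimal Terraform sketch (bucket name, AMI and resource names are hypothetical) that gives code running on an instance read-only access to a single S3 bucket through a role and instance profile, with no access keys involved:
# Role that EC2 instances are allowed to assume.
resource "aws_iam_role" "app_instance" {
  name = "app-instance-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# Read-only access to one specific bucket.
resource "aws_iam_role_policy" "s3_read" {
  name = "s3-read-only"
  role = aws_iam_role.app_instance.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject", "s3:ListBucket"]
      Resource = [
        "arn:aws:s3:::my-app-bucket",   # hypothetical bucket
        "arn:aws:s3:::my-app-bucket/*"
      ]
    }]
  })
}

# Instance profile wrapping the role, attached to the instance.
resource "aws_iam_instance_profile" "app" {
  name = "app-instance-profile"
  role = aws_iam_role.app_instance.name
}

resource "aws_instance" "app" {
  ami                  = "ami-12345678"   # hypothetical AMI
  instance_type        = "t3.micro"
  iam_instance_profile = aws_iam_instance_profile.app.name
}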
Consider the following AWS official links as reference:
AWS Identities | Roles terms and concepts
AWS Identities | Common scenarios
"},{"location":"user-guide/ref-architecture-aws/features/monitoring/apm/","title":"Application Performance Monitoring (APM) and Business Performance","text":"
Custom Prometheus BlackBox Exporter + Grafana & Elastic: Application performance monitoring (APM) delivers real-time and trending data about your web application's performance and the level of satisfaction that your end users experience. With end-to-end transaction tracing and a variety of color-coded charts and reports, APM visualizes your data, down to the deepest code levels. Your DevOps teams don't need to guess whether a performance blocker comes from the app itself, CPU availability, database loads, or something else entirely unexpected. With APM, you can quickly identify potential problems before they affect your end users.
APM's user interface provides both current and historical information about memory usage, CPU utilization, database query performance, web browser rendering performance, app availability and error analysis, external services, and other useful metrics.
For this purpose we propose the usage of Elasticsearch + Kibana as the database and visualization layer respectively. By deploying the Fluentd daemonset on the Kubernetes clusters we can send all logs from running pods to Elasticsearch, and with Beats we can ship specific logs for resources outside of Kubernetes. There will be many components across the environment generating different types of logs: ALB access logs, S3 access logs, CloudFront access logs, application request logs, application error logs. Access logs for AWS-based resources can be stored in a centralized bucket for that purpose in the security account, and these can be streamed to Elasticsearch as well if needed.
Figure: Monitoring metrics and log architecture diagram (just as reference). (Source: binbash Leverage, \"AWS Well Architected Reliability Report example\", binbash Leverage Doc, accessed November 18th 2020).
Alerting based on Logs
Certain features that were previously only available under licence were recently made available by Elastic and included in the Elasticsearch open source project. ElastAlert allows us to generate alerts based on certain log entries, or even after counting a certain amount of entries of a given type, providing great flexibility.
There are metrics that are going to be of interest both in the infrastructure itself (CPU, memory, disk) and at the application level (amount of non-200 responses, latency, % of errors), and we will have two key sources for them: Prometheus and AWS CloudWatch metrics.
Metric collectors
CloudWatch metrics: This is where Amazon stores a great number of default metrics for each of its services. Useful data here can be interpreted and alerts can be generated with CloudWatch alarms, and it can also be used as a data source for Grafana. Although this is a very good offering, we have found it to be incomplete and highly bound to AWS services, and not integrated enough with the rest of the ecosystem.
Prometheus: This is an open source tool (originally by SoundCloud) that is essentially a time-series database. It stores metrics, and has the advantage of being highly integrated with everything Kubernetes. In fact, Kubernetes already publishes various metrics in Prometheus format out of the box. Its alerting capabilities are also remarkable, and it can all be kept as code in a repository. It has a big community behind it, and it is straightforward at this point to include a library in your own application that exposes an endpoint publishing certain metrics about your application, which we can then graph or alert on.
Figure: Monitoring metrics and log architecture diagram (just as reference). (Source: binbash Leverage, \"AWS Well Architected Reliability Report example\", binbash Leverage Doc, accessed November 18th 2020).
Graphing metrics
Grafana is the standard open source visualization tool which can be used on top of a variety of different data stores. It can use Prometheus as a source, and there are many open source dashboards and plugins available that provide great visualization of how things are running; we can also build our own if necessary. If something is left out of Prometheus but already available in CloudWatch metrics, we can easily integrate it as a source for Grafana as well, and build dashboards that combine these metrics, and even apply some intelligence to them, despite coming from multiple origins.
Although Grafana already has alerting capabilities built in, we generally prefer to configure Prometheus' alerting engine, because it allows for highly customized and specific alerts, kept as code in its extremely readable syntax. Example:
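Since the original example is not reproduced here, the following is only a minimal sketch of a Prometheus alerting rule (the metric names, threshold and labels are illustrative assumptions, not taken from an actual Leverage configuration):
groups:
  - name: app-availability
    rules:
      - alert: HighErrorRate
        # Fire when more than 5% of requests returned 5xx over the last 5 minutes
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: critical   # routed to PagerDuty by AlertManager
        annotations:
          summary: "High 5xx error rate detected"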
"},{"location":"user-guide/ref-architecture-aws/features/monitoring/notification_escalation/","title":"Notification & Escalation Procedure","text":""},{"location":"user-guide/ref-architecture-aws/features/monitoring/notification_escalation/#overview","title":"Overview","text":"Urgency Service Notification Setting Use When Response High 24/7 High-priority PagerDuty Alert 24/7/365
Issue is in Production
Or affects the applications/services and in turn affects the normal operation of the clinics
Or prevents clinic patients to interact with the applications/services
Requires immediate human action
Escalate as needed
The engineer should be woken up
Urgency: High during support hours | Notification Setting: High-priority Slack notifications during support hours | Use when:
Issue impacts development team productivity
Issue impacts the normal business operation
Requires immediate human action ONLY during business hours
Urgency: Low | Notification Setting: Low-priority Slack notification | Use when:
Any issue, on any environment, that occurs during working hours
All alerts are sent to the #engineering-urgent-alerts channel. Members that are online can get visibility from there. AlertManager takes care of sending such alerts according to the rules defined here: TODO
Note: there is a channel named engineering-alerts but it is used for Github notifications. It didn't make sense to mix real alerts with those, which is why a new engineering-urgent-alerts channel was created. As a recommendation, Github notifications should be sent to a channel named something like #engineering-notifications, leaving engineering-alerts for real alerts.
PagerDuty
AlertManager only sends to PagerDuty alerts that are labeled as severity: critical. PagerDuty is configured to turn these into incidents according to the settings defined here for the Prometheus Critical Alerts service. The aforementioned service uses HiPriorityAllYearRound escalation policy to define who gets notified and how.
Note: currently only the TechOwnership role gets notified as we don\u2019t have agreements or rules about on-call support but this can be easily changed in the future to accommodate business decisions.
UpTimeRobot
We are doing basic http monitoring on the following sites: * www.domain_1.com * www.domain_2.com * www.domain_3.com
Note: a personal account has been set up for this. As a recommendation, a new account should be created using an email account that belongs to your project.
Distributed tracing, also called distributed request tracing, is a method used to profile and monitor applications, especially those built using a microservices architecture. Distributed tracing helps pinpoint where failures occur and what causes poor performance.
Figure: Distributed tracing architecture diagram (just as reference). (Source: binbash Leverage, "AWS Well Architected Reliability Report example", binbash Leverage Doc, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/monitoring/tracing/#read-more","title":"Read more","text":"
Related resources
Jaeger
OpenCensus
"},{"location":"user-guide/ref-architecture-aws/features/network/dns/","title":"Route53 DNS hosted zones","text":""},{"location":"user-guide/ref-architecture-aws/features/network/dns/#how-it-works","title":"How it works","text":"
Route53 Considerations
Route53 private hosted zone will have associations with VPCs on different AWS organization accounts
Route53 should ideally be hosted in the Shared account, although sometimes Route53 is already deployed in a Legacy account where it can be imported and fully supported as code.
Route53 zero downtime migration (active-active hosted zones) is completely possible and achievable with Leverage terraform code
Figure: AWS Organization shared account Route53 DNS diagram. (Source: Cristian Southall, \"Using CloudFormation Custom Resources to Configure Route53 Aliases\", Abstractable.io Blog post, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/network/dns/#user-guide","title":"User guide","text":"
pre-requisites
Review & update configs
Review & understand the workflow
Steps
DNS service has to be orchestrated from /shared/global/base-dns layer following the standard workflow
"},{"location":"user-guide/ref-architecture-aws/features/network/dns/#migrated-aws-route53-hosted-zones-between-aws-accounts","title":"Migrated AWS Route53 Hosted Zones between AWS Accounts","text":"
We'll need to set up the Route53 DNS service with an active-active configuration to avoid any type of service disruption and downtime. This would then allow the name servers of both AWS accounts to be added to your domain provider (eg: namecheap.com) and have, for example:
4 x ns (project-legacy Route53 Account)
4 x ns (project-shared Route53 Account)
After the records have propagated and everything looks OK, we can remove the project-legacy Route53 NS from your domain provider (eg: namecheap.com) and leave only the project-shared ones.
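As a reference, a minimal Terraform sketch (the domain is illustrative) of the new hosted zone in the project-shared account, exposing its name servers so they can be added at your domain provider alongside the legacy ones during the active-active period:
resource "aws_route53_zone" "shared" {
  name = "aws.domain.com"   # hypothetical domain
}

# Name servers to register at the domain provider (eg: namecheap.com),
# together with the project-legacy NS while both zones are active.
output "shared_zone_name_servers" {
  value = aws_route53_zone.shared.name_servers
}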
This official Migrating a hosted zone to a different AWS account - Amazon Route 53 article explains this procedure step by step:
AWS Route53 hosted zone migration steps
Create records in the new hosted zone (bb-shared)
Compare records in the old and new hosted zones (bb-legacy)
Update the domain registration to use name servers for the new hosted zone (NIC updated to use both bb-legacy + bb-shared)
Wait for DNS resolvers to start using the new hosted zone
(Optional) Delete the old hosted zone (bb-legacy); remember you'll need to delete the NS delegation records from your domain registration (NIC) too.
"},{"location":"user-guide/ref-architecture-aws/features/network/network-nacl/","title":"Security in AWS with Leverage Reference Architecture and NACLs","text":"
When deploying AWS Landing Zone resources, security is of fundamental importance. Network Access Control Lists (NACLs) play a crucial role in controlling traffic at the subnet level. In this section we'll describe the use of NACLs implemented with Terraform on top of the Leverage AWS Reference Architecture.
"},{"location":"user-guide/ref-architecture-aws/features/network/network-nacl/#understanding-network-access-control-lists-nacls","title":"Understanding Network Access Control Lists (NACLs)","text":"
Network Access Control Lists (NACLs) act as a virtual firewall for your AWS VPC (Virtual Private Cloud), controlling inbound and outbound traffic at the subnet level. They operate on a rule-based system, allowing or denying traffic based on defined rules.
"},{"location":"user-guide/ref-architecture-aws/features/network/network-nacl/#leverage-ref-arch-default-configuration-and-variables-setup-for-nacls","title":"Leverage Ref Arch: Default Configuration and Variables Setup for NACLs","text":"
In the Leverage Reference Architecture, we adopt the default NACLs approach. This foundational setup not only ensures a controlled security environment but also offers the flexibility for customization.
This setup ensures that default NACLs are used, providing a baseline level of security:
manage_default_network_acl    = true
public_dedicated_network_acl  = false // use dedicated network ACL for the public subnets.
private_dedicated_network_acl = false // use dedicated network ACL for the private subnets.
To verify that default NACLs are enabled in your Leverage project, follow these steps:
Move into the /shared/us-east-1/base-network/ directory.
Open the network.tf file: the network.tf file defines the configuration for the VPC (Virtual Private Cloud) and the NACL settings using a Terraform module.
module \"vpc\" {\nsource = \"github.com/binbashar/terraform-aws-vpc.git?ref=v3.18.1\"\n.\n.\n.\nmanage_default_network_acl = var.manage_default_network_acl\npublic_dedicated_network_acl = var.public_dedicated_network_acl // use dedicated network ACL for the public subnets.\nprivate_dedicated_network_acl = var.private_dedicated_network_acl // use dedicated network ACL for the private subnets.\n.\n.\n.\n
Open the variable.tf file: the module allows customization of Network Access Control Lists (NACLs) through the variables shown below.
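For reference, a hedged sketch of how those variables could be declared (defaults match the configuration shown above; the actual variable.tf in your layer may differ):
variable "manage_default_network_acl" {
  description = "Adopt and manage the VPC's default network ACL"
  type        = bool
  default     = true
}

variable "public_dedicated_network_acl" {
  description = "Use a dedicated network ACL for the public subnets"
  type        = bool
  default     = false
}

variable "private_dedicated_network_acl" {
  description = "Use a dedicated network ACL for the private subnets"
  type        = bool
  default     = false
}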
"},{"location":"user-guide/ref-architecture-aws/features/network/network-nacl/#key-points-to-kae-into-account-for-a-robust-and-secure-setup","title":"Key Points to kae into account for a robust and secure setup:","text":"
Explicit Approval Process for NACL Enablement: Enabling NACLs should not be taken lightly. Users or tech leads wishing to enable NACLs must undergo an explicit approval process. This additional step ensures that the introduction of NACLs aligns with the overall security policies and requirements of the organization.
Feedback Mechanisms for NACL Status and Permissions: Communication is key when it comes to security configurations. Feedback mechanisms should be in place to inform users of the status of NACLs and any associated permissions. This ensures transparency and allows for prompt resolution of any issues that may arise.
Comprehensive Testing for Non-disruptive Integration: Before enabling dedicated NACLs, comprehensive testing should be conducted to ensure that the change does not introduce new issues. This includes testing in different environments and scenarios to guarantee a non-disruptive integration. Automated testing and continuous monitoring can be valuable tools in this phase.
We prioritize operational simplicity to provide an efficient deployment process; however, it's essential for users to conduct a review process aligned with their specific security and compliance requirements.
This approach allows users to benefit from initial ease of use while maintaining the flexibility to customize and enhance security measures according to their unique needs and compliance standards.
In this code, we ensure that default NACLs are enabled. Users can later seek approval and modify these variables if enabling dedicated NACLs becomes necessary.
In this section we detail all the network design related specifications
VPCs CIDR blocks
VPC Gateways: Internet, NAT, VPN.
VPC Peerings
VPC DNS Private Hosted Zones Associations.
Network ACLS (NACLs)
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-addressing/#vpcs-ip-addressing-plan-cidr-blocks-sizing","title":"VPCs IP Addressing Plan (CIDR blocks sizing)","text":"
Introduction
VPCs can vary in size from 16 addresses (/28 netmask) to 65,536 addresses (/16 netmask). In order to size a VPC correctly, it is important to understand the number, types, and sizes of workloads expected to run in it, as well as workload elasticity and load balancing requirements.
Keep in mind that there is no charge for using Amazon VPC (aside from EC2 charges), therefore cost should not be a factor when determining the appropriate size for your VPC, so make sure you size your VPC for growth.
Moving workloads or AWS resources between networks is not a trivial task, so be generous in your IP address estimates to give yourself plenty of room to grow, deploy new workloads, or change your VPC design configuration from one to another. The majority of AWS customers use VPCs with a /16 netmask and subnets with /24 netmasks. The primary reason AWS customers select smaller VPC and subnet sizes is to avoid overlapping network addresses with existing networks.
So, having an AWS single VPC design, we've chosen a Medium/Small VPC/Subnet addressing plan which would probably fit a broad variety of use cases.
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-addressing/#networking-ip-addressing","title":"Networking - IP Addressing","text":"
Starting CIDR Segment (AWS Org)
AWS Org IP Addressing calculation is presented below based on segment 172.16.0.0/12
We started from 172.16.0.0/12 and subnetted to /20
Resulting in Total Subnets: 256
2 x AWS Account with Hosts/SubNet: 4094
1ry VPC + 2ry VPC
1ry VPC DR + 2ry VPC DR
Individual CIDR Segments (VPCs)
Then each of these /20 blocks is subnetted down to /24 subnets
Considering the whole Starting CIDR Segment (AWS Org) declared above, we'll start at 172.18.0.0/20
shared
1ry VPC CIDR: 172.18.0.0/20
2ry VPC CIDR: 172.18.16.0/20
1ry VPC DR CIDR: 172.18.32.0/20
2ry VPC DR CIDR: 172.18.48.0/20
apps-devstg
1ry VPC CIDR: 172.18.64.0/20
2ry VPC CIDR: 172.18.80.0/20
1ry VPC DR CIDR: 172.18.96.0/20
2ry VPC DR CIDR: 172.18.112.0/20
apps-prd
1ry VPC CIDR: 172.18.128.0/20
2ry VPC CIDR: 172.18.144.0/20
1ry VPC DR CIDR: 172.18.160.0/20
2ry VPC DR CIDR: 172.18.176.0/20
Resulting in Subnets: 16 x VPC
VPC Subnets with Hosts/Net: 256.
Eg: apps-devstg account \u2192 us-east-1 w/ 3 AZs \u2192 3 x Private Subnets /az + 3 x Public Subnets /az
1ry VPC CIDR: 172.18.64.0/20. Subnets:
Private 172.18.64.0/24, 172.18.66.0/24 and 172.18.68.0/24
Public 172.18.65.0/24, 172.18.67.0/24 and 172.18.69.0/24
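Putting the addressing plan above together, a simplified sketch of the apps-devstg base-network module call (name and AZs are illustrative assumptions, most other arguments omitted):
module "vpc" {
  source = "github.com/binbashar/terraform-aws-vpc.git?ref=v3.18.1"

  name = "apps-devstg-vpc"    # hypothetical name
  cidr = "172.18.64.0/20"     # 1ry VPC CIDR for apps-devstg

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["172.18.64.0/24", "172.18.66.0/24", "172.18.68.0/24"]
  public_subnets  = ["172.18.65.0/24", "172.18.67.0/24", "172.18.69.0/24"]
}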
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-addressing/#planned-subnets-per-vpc","title":"Planned Subnets per VPC","text":"
Having defined the initial VPC that will be created in the different accounts that were defined, we are going to create subnets in each of these VPCs defining Private and Public subnets split among different availability zones:
Please follow the steps below to orchestrate your base-network layer, first in your project-shared AWS account and afterwards in the necessary member accounts which will host network-connected resources (EC2, Lambda, EKS, RDS, ALB, NLB, etc):
project-apps-devstg account.
project-apps-prd account.
Network layer standard creation workflow
Please follow Leverage's Terraform workflow for each of your accounts.
We'll start with the project-shared AWS account. Update (add | remove | customize) your VPC-associated code before deploying this layer (shared/base-network). Main files:
network.tf
locals.tf
Repeat for every AWS member account that needs its own VPC: access the AWS Organization member account and consider repeating step 3, but for the corresponding member account.
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-peering/","title":"Diagram: Network Service (cross-account VPC peering)","text":""},{"location":"user-guide/ref-architecture-aws/features/network/vpc-peering/#how-it-works","title":"How it works","text":"
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-peering/#diagram-network-service-cross-account-vpc-peering_1","title":"Diagram: Network Service (cross-account VPC peering)","text":"Figure: AWS multi account Organization VPC peering diagram. (Source: AWS, \"Amazon Virtual Private Cloud VPC Peering\", AWS Documentation Amazon VPC User Guide, accessed November 18th 2020). Figure: AWS multi account Organization peering detailed diagram. (Source: AWS, \"Amazon Virtual Private Cloud VPC Peering\", AWS Documentation Amazon VPC User Guide, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/network/vpc-topology/","title":"Network Layer","text":""},{"location":"user-guide/ref-architecture-aws/features/network/vpc-topology/#network-topology","title":"Network Topology","text":"
VPC with public and private subnets (NAT)
The configuration for this scenario includes a virtual private cloud (VPC) with public subnets and private subnets (their number will change depending on our specific needs). We recommend this scenario if you want to run a public-facing web application, while maintaining back-end servers that aren't publicly accessible. A common example is a multi-tier website, with a load balancer (ALB | NLB) in a public subnet, or another public-facing routing service like AWS CloudFront or API Gateway, and our web servers (Lambda, EKS, ECS, EC2) and database servers (RDS, DynamoDB, etc) in private subnets. You can set up security (SGs, ACLs, WAF) and routing so that the web servers can communicate internally (even between VPC accounts or VPN endpoints) with all necessary services and components such as databases, cache and queues, among others.
The services running in the public subnet, like an ALB or NLB can send outbound traffic directly to the Internet, whereas the instances in the private subnet can't. Instead, the instances in the private subnet can access the Internet by using a network address translation (NAT) gateway that resides in the public subnet. The database servers can connect to the Internet for software updates using the NAT gateway (if using RDS this is transparently provided by AWS), but the Internet cannot establish connections to the database servers.
So, whenever possible all our AWS resources like EC2, EKS, RDS, Lambda, SQS will be deployed in VPC private subnets and we'll use a NAT device (Nat Gateway) to enable instances in a private subnet to connect to the internet (for example, for software updates) or other AWS services, but prevent the internet from initiating connections with the instances.
A NAT device forwards traffic from the instances in the private subnet to the internet (via the VPC Internet Gateway) or other AWS services, and then sends the response back to the instances. When traffic goes to the internet, the source IPv4 address is replaced with the NAT device\u2019s address and similarly, when the response traffic goes to those instances, the NAT device translates the address back to those instances\u2019 private IPv4 addresses.
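The NAT behaviour described above maps to a few inputs of the same terraform-aws-vpc module used in the base-network layer (a fragment, not a complete module call; values are illustrative):
enable_nat_gateway     = true   # private subnets reach the internet through NAT
single_nat_gateway     = false  # one NAT gateway per AZ instead of a single shared one
one_nat_gateway_per_az = true   # recommended for high availability (higher cost)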
Figure: VPC topology diagram. (Source: AWS, \"VPC with public and private subnets (NAT)\", AWS Documentation Amazon VPC User Guide, accessed November 18th 2020). Figure: VPC topology diagram with multiple Nat Gateways for HA. (Source: Andreas Wittig, \"Advanced AWS Networking: Pitfalls That You Should Avoid\", Cloudonaut.io Blog, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/network/vpc-topology/#read-more","title":"Read more","text":"
AWS reference links
Consider the following AWS official links as reference:
VPC with public and private subnets (NAT)
AWS Elastic Load Balancing
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-traffic-out/","title":"Network Security","text":""},{"location":"user-guide/ref-architecture-aws/features/network/vpc-traffic-out/#control-internet-access-outbound-traffic","title":"Control Internet access outbound traffic","text":""},{"location":"user-guide/ref-architecture-aws/features/network/vpc-traffic-out/#goals","title":"Goals","text":"
Review and analyse available alternatives for controlling outbound traffic in VPCs.
All possible candidates need to offer a reasonable balance between features and pricing.
Solutions
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-traffic-out/#leverage-currently-supports","title":"Leverage currently supports","text":"
Network ACL (Subnet firewall)
Security Groups (Instance firewall)
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-traffic-out/#what-alternatives-do-we-have","title":"What alternatives do we have?","text":""},{"location":"user-guide/ref-architecture-aws/features/network/vpc-traffic-out/#pre-considerations","title":"Pre-considerations","text":"
First of all, keep in mind the following points before and while you go through the data in the table:
1 EBS pricing at the moment of this writing:
GP2: $0.10 per GB-month
GP3: $0.08 per GB-month
2 DataTransfer costs will be incurred in all options
Our default AWS Organizations terraform layout solution includes 5 accounts + 1 or N accounts (if you invite pre-existing AWS Accounts).
Account | Description
Management (Root) | Used to manage configuration and access to AWS Org managed accounts. The AWS Organizations account provides the ability to create and financially manage member accounts; it contains AWS Organizations Service Control Policies (SCPs).
Shared Services / Resources | Reference for creating infrastructure shared services such as directory services, DNS, VPN solution, monitoring tools like Prometheus and Grafana, CI/CD server (Jenkins, Drone, Spinnaker, etc), centralized logging solution like ELK and Vault Server (Hashicorp Vault).
Security | Intended for centralized user management via an IAM roles based cross-org auth approach (IAM roles per account to be assumed are still needed). Also used to centralize AWS CloudTrail and AWS Config logs, and as the master AWS GuardDuty account.
Network | Intended for centralized networking management via Transit Gateway (TGW); supports a centralized outbound traffic setup and the integration of AWS Network Firewall (NFW).
Legacy | Your pre-existing AWS accounts to be invited as members of the new AWS Organization; several services and workloads will probably be progressively migrated to your new accounts.
Apps DevStg | Hosts your DEV, QA and STG environment workloads: Compute / Web App Servers (K8s Clusters and Lambda Functions), Load Balancers, DB Servers, Caching Services, Job queues & Servers, Data, Storage, CDN.
Apps Prod | Hosts your PROD environment workloads: Compute / Web App Servers (K8s Clusters and Lambda Functions), Load Balancers, DB Servers, Caching Services, Job queues & Servers, Data, Storage, CDN."},{"location":"user-guide/ref-architecture-aws/features/organization/billing/","title":"Billing","text":""},{"location":"user-guide/ref-architecture-aws/features/organization/billing/#overview","title":"Overview","text":"
Each month AWS charges your payer Root Account for all the linked accounts in a consolidated bill. The following illustration shows an example of a consolidated bill.
Figure: AWS Organization Multi-Account structure (just as reference). (Source: Andreas Wittig, \"AWS Account Structure: Think twice before using AWS Organizations\", Cloudonaut.io Blog, accessed November 18th 2020). Figure: AWS Organization Multi-Account billing structure (just as reference). (Source: AWS, \"Consolidated billing process\", AWS Documentation AWS Billing and Cost Management User Guide, accessed November 18th 2020).
Reference Architecture AWS Organizations features
AWS Multiple Account Billing Strategy: consolidated billing for all your accounts within organization, enhanced per account cost filtering and RI usage
A single monthly bill accumulates the spending among many AWS accounts.
Benefit from volume pricing across more than one AWS account.
AWS Organizations Billing FAQs
What does AWS Organizations cost?
AWS Organizations is offered at no additional charge.
Who pays for usage incurred by users under an AWS member account in my organization?
The owner of the master account is responsible for paying for all usage, data, and resources used by the accounts in the organization.
Will my bill reflect the organizational unit structure that I created in my organization?
No. For now, your bill will not reflect the structure that you have defined in your organization. You can use cost allocation tags in individual AWS accounts to categorize and track your AWS costs, and this allocation will be visible in the consolidated bill for your organization.
You'll need an email to create and register your AWS Organization Management Account. For this purpose we recommend avoiding a personal email account. Instead, whenever possible, it should ideally be associated with a distribution list email such as a GSuite Group, to ensure the proper admin team members (DevOps | SecOps | Cloud Engineering Team) can manage its notifications, avoiding a single point of contact (constraint).
GSuite Group email address: aws@domain.com (to which the admins / owners belong); then, using the + suffix, the aliases below are generated automatically when running Leverage's Terraform code.
aws+security@binbash.com.ar
aws+shared@binbash.com.ar
aws+network@binbash.com.ar
aws+apps-devstg@binbash.com.ar
aws+apps-prd@binbash.com.ar
Reference Code as example
#
# Project Prd: services and resources related to production are placed and
# maintained here.
#
resource "aws_organizations_account" "apps_prd" {
  name      = "apps-prd"
  email     = "aws+apps-prd@domain.ar"
  parent_id = aws_organizations_organizational_unit.apps_prd.id
}
Billing: review the billing setup as a pre-requisite to deploy the AWS Org. At your Management account billing setup, check:
Activate IAM User and Role Access to Billing Information
If needed Update Alternate Contacts
Via AWS Web Console: in the previously created project_name-management account (eg, name: leverage-management, email: aws@binbash.com.ar) create the mgmt-org-admin IAM user with Admin privileges (attach the AdministratorAccess IAM managed policy and enable Web Console and programmatic access), which will be used for the initial AWS Org bootstrapping.
NOTE: After its first execution only nominated Org admin users will persist in the project-management account.
Via AWS Web Console: in project-management account create mgmt-org-admin IAM user AWS ACCESS KEYS
NOTE: This could all be created in one go in the previous step (Nº 2).
Figure: AWS Web Console screenshot. (Source: binbash, \"AWs Organization management account init IAM admin user\", accessed June 16th 2021).
Figure: AWS Web Console screenshot. (Source: binbash, \"AWs Organization management account init IAM admin user\", accessed June 16th 2021).
Set your IAM credentials in the machine you're going to execute the Leverage CLI on (remember these are the mgmt-org-admin temporary user credentials shown in the screenshot immediately above).
Set up your Leverage reference architecture configs in order to work with your new account and the mgmt-org-admin IAM user:
common config
account configs
Setup and create the terraform remote state for the new AWS Org Management account
terraform remote state config
terraform remote state workflow
terraform remote state ref code
You'll first get a local state, then you'll need to move your tf state to S3, validate it, and finally delete the local state files.
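As a reference, a hedged sketch of the backend declaration used for that move (bucket, key and lock table names are illustrative):
terraform {
  backend "s3" {
    bucket         = "myproject-terraform-backend"                  # hypothetical bucket
    key            = "management/organizations/terraform.tfstate"   # hypothetical key
    region         = "us-east-1"
    dynamodb_table = "myproject-terraform-backend"                  # state locking
    encrypt        = true
  }
}
Once this block is added, re-running init (eg: leverage terraform init) should offer to copy the existing local state into S3; after validating the remote state, the local terraform.tfstate files can be deleted.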
The AWS Organization from the Reference Architecture /le-tf-infra-aws/root/global/organizations will be orchestrated using the Leverage CLI following the standard workflow.
The Management account has to be imported into the code.
Verify your Management account email address in order to invite existing (legacy) AWS accounts to join your organization.
Following the doc, orchestrate via the Leverage CLI workflow the Mgmt Account IAM layer (base-identities) with the admin IAM users (consider that these users will have admin privileges over the entire AWS Org by assuming the OrganizationAccountAccessRole) -> le-tf-infra-aws/root/global/base-identities
The IAM role OrganizationAccountAccessRole does not exist in the initial Management (root) account; it will be created by the code in this layer.
Mgmt account admin user permanent credentials setup => set up in your workstation the AWS credentials for the OrganizationAccountAccessRole IAM role (project_short-root-oaar, eg: bb-root-oaar). Then validate, within each initial mgmt account layer, that the bb-root-oaar profile is correctly configured in the config files presented below, as well as any other necessary setup.
/config/common.config
/root/config/account.config
/root/config/backend.config
Setup (code and config files) and orchestrate the /security/global/base-identities layer via the Leverage CLI on your security account, for consolidated and centralized user management and access to the AWS Org.
You must have your AWS Organization deployed and access to your Management account as described in the /user-guide/user-guide/organization/organization-init section.
"},{"location":"user-guide/ref-architecture-aws/features/organization/legacy-accounts/#invite-aws-pre-existing-legacy-accounts-to-your-aws-organization","title":"Invite AWS pre-existing (legacy) accounts to your AWS Organization","text":"
AWS Org pre-existing accounts invitation
Via AWS Web Console: from your project-root account, invite the pre-existing project-legacy accounts (1 to N).
Via AWS Web Console: in project-legacy create the OrganizationAccountAccessRole IAM Role with Admin permissions.
Should follow Creating the OrganizationAccountAccessRole in an invited member account section.
Import your project-legacy account as code.
Update the following variables in ./@bin/makefiles/terraform12/Makefile.terraform12-import-rm
This repository contains all Terraform configuration files used to create binbash Leverage Reference AWS Organizations Multi-Account baseline layout.
Why AWS Organizations?
This approach allows having a hierarchical structure of AWS accounts, providing additional security isolation and the ability to separate resources into Organizational Units with their associated Service Control Policies (SCPs).
Considering that one or more AWS accounts were already active (Client AWS Legacy Account), these will be invited to become member accounts of the AWS Organization architecture. In the future, once all of the Client's Legacy dev, stage, prod and other resources for the Project applications are running in the new accounts architecture (meaning a full AWS Organizations approach), all the assets already migrated from the Legacy account should be decommissioned. This account will remain with only the necessary services, such as DNS, among others.
The following block provides a brief explanation of the chosen AWS Organization Accounts layout:
MyExample project file structure
+📂 management/ (resources for the management account)
 ...
+📂 security/ (resources for the security + users account)
 ...
+📂 shared/ (resources for the shared account)
 ...
+📂 network/ (resources for the centralized network account)
 ...
+📂 apps-devstg/ (resources for apps dev & stg account)
 ...
+📂 apps-prd/ (resources for apps prod account)
 ...
Billing: Consolidated billing for all your accounts within organization, enhanced per account cost filtering and RI usage
Security I: Extra security layer: You get fully isolated infrastructure for different organizations units in your projects, eg: Dev, Prod, Shared Resources, Security, Users, BI, etc.
Security II: Using AWS Organization you may use Service Control Policies (SCPs) to control which AWS services are available within different accounts.
Networking: Connectivity and access will be securely set up via VPC peering + NACLs + Security Groups, everything with private endpoints only accessible via Pritunl VPN, significantly reducing the attack surface.
User Mgmt: You can manage all your IAM resources (users/groups/roles) and policies in one place (usually the security/users account) and use AssumeRole to work with org accounts.
Operations: Will reduce the blast radius to the maximum possible.
Compatibility: Legacy accounts can (and probably should) be invited as members of the new Organization and afterwards even imported into your Terraform code.
Migration: After having your baseline AWS Org reference cloud solutions architecture deployed (IAM, VPC, NACLS, VPC-Peering, DNS Cross-Org, CloudTrail, etc) you're ready to start progressively orchestrating new resources in order to segregate different Environment and Services per account. This approach will allow you to start a 1 by 1 Blue/Green (Red/Black) migration without affecting any of your services at all. You would like to take advantage of an Active-Active DNS switchover approach (nice as DR exercise too).
EXAMPLE: Jenkins CI Server Migration steps:
Let's say you have your EC2_A (jenkins.aws.domain.com) in Account_A (Legacy), so you could deploy a brand new EC2_B Jenkins Instance in Account_B (Shared Resources).
Temporarily associated with jenkins2.aws.domain.com
Sync its current data (/var/lib/jenkins)
Test and fully validate every job and pipeline works as expected.
In case you haven't finished your validations, we highly recommend declaring everything as code and fully automating it, so you can destroy and re-create your under-development env on demand to save costs.
Finally switch jenkins2.aws.domain.com -> to -> jenkins.aws.domain.com
Stop your old EC2_A.
If everything looks fine after 2/4 weeks you could terminate your EC2_A (hopefully everything is as code and it's just a terraform destroy)
Considering the previously detailed steps plan your roadmap to move forward with every other component to be migrated.
Consider the following AWS official links as reference:
Why should I set up a multi-account AWS environment?
AWS Multiple Account User Management Strategy
AWS Multiple Account Security Strategy
AWS Multiple Account Billing Strategy
AWS Secure Account Setup
Authentication and Access Control for AWS Organizations (keep in mind EC2 and other services can also use AWS IAM Roles to get secure cross-account access)
AWS Backup is a fully managed backup service that makes it easy to centralize and automate the backup of data across AWS services. Using AWS Backup, you can centrally configure backup policies and monitor backup activity for AWS resources, such as:
Amazon EBS volumes,
Amazon EC2 instances,
Amazon RDS databases,
Amazon DynamoDB tables,
Amazon EFS file systems,
and AWS Storage Gateway volumes.
AWS Backup automates and consolidates backup tasks previously performed service-by-service, removing the need to create custom scripts and manual processes. With just a few clicks in the AWS Backup console, you can create backup policies that automate backup schedules and retention management. AWS Backup provides a fully managed, policy-based backup solution, simplifying your backup management, enabling you to meet your business and regulatory backup compliance requirements.
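As a reference, a minimal Terraform sketch of such a policy (vault name, schedule, retention and tag are illustrative assumptions): a daily plan that backs up every resource tagged Backup=true and keeps recovery points for 30 days.
resource "aws_backup_vault" "main" {
  name = "main-backup-vault"   # hypothetical vault name
}

resource "aws_backup_plan" "daily" {
  name = "daily-backup-plan"

  rule {
    rule_name         = "daily"
    target_vault_name = aws_backup_vault.main.name
    schedule          = "cron(0 5 * * ? *)"   # every day at 05:00 UTC

    lifecycle {
      delete_after = 30   # retain recovery points for 30 days
    }
  }
}

resource "aws_backup_selection" "by_tag" {
  # Assumes the default AWS Backup service role already exists in the account.
  iam_role_arn = "arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole"
  name         = "backup-by-tag"
  plan_id      = aws_backup_plan.daily.id

  selection_tag {
    type  = "STRINGEQUALS"
    key   = "Backup"
    value = "true"
  }
}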
Figure: AWS Backup service diagram (just as reference). (Source: AWS, \"AWS Backup - Centrally manage and automate backups across AWS services\", AWS Documentation, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/reliability/backups/#s3-bucket-region-replication","title":"S3 bucket region replication","text":"
Buckets that hold data critical to business or to application operation can be replicated to another region almost synchronously.
This can be set up on request to increase durability, and along with database backups it can constitute the base for a Business Continuity strategy.
"},{"location":"user-guide/ref-architecture-aws/features/reliability/backups/#comparison-of-the-backup-and-retention-policies-strategies","title":"Comparison of the backup and retention policies strategies","text":"
In this sub-section you'll find the resources to review and adjust your backup retention policies to adhere to compliance rules that govern your specific institutions regulations. This post is a summarised write-up of how we approached this sensitive task, the alternatives we analysed and the recommended solutions we provided in order to meet the requirements. We hope it can be useful for others as well.
Leverage Confluence Documentation
You'll find here a detailed comparison including the alternative product and solution types, pricing model, features, pros & cons.
"},{"location":"user-guide/ref-architecture-aws/features/reliability/dr/","title":"Disaster Recovery & Business Continuity Plan","text":""},{"location":"user-guide/ref-architecture-aws/features/reliability/dr/#overview","title":"Overview","text":"
Applications that are business critical should always have a plan in place to recover in case of a catastrophic failure or disaster. There are many strategies that can be implemented to achieve this, and deciding between them is a matter of analyzing how much is worth investing, based on a calculation of the damages suffered if the application is not available for a given period of time. It is this factor (time) that disaster recovery plans are based on. Factors that need to be determined per application are:
RTO and RPO
Recovery time objective (RTO): This represents the time it takes after a disruption to restore a business process to its service level. For example, if a disaster occurs at 12:00 PM (noon) and the RTO is eight hours, the DR process should restore the business process to the acceptable service level by 8:00 PM.
Recovery point objective (RPO): This is the acceptable amount of data loss measured in time. For example, if a disaster occurs at 12:00 PM (noon) and the RPO is one hour, the system should recover all data that was in the system before that hour.
After deciding RTO and RPO we have options available to achieve the time objectives:
HA Strategies
Backup and restore: In most traditional environments, data is backed up to tape and sent off-site regularly. The equivalent in AWS would be to take backups in the form of snapshots and copy them to another region for RDS instances, EBS volumes, EFS and S3 buckets. The plan details the step-by-step procedure to recover a fully working production environment based on these backups being restored on freshly provisioned infrastructure, and how to rollback to a regular production site once the emergency is over.
Pilot Light Method: The term pilot light is often used to describe a DR scenario in which a minimal version of an environment is always running in AWS. Very similar to \u201cBackup and restore\u201d except a minimal version of key infrastructure components is provisioned in a separate region and then scaled up in case of disaster declaration.
Warm standby active-passive method: The term warm-standby is used to describe a DR scenario in which a scaled-down version of a fully-functional environment is always running in the cloud. Enhancement of Pilot Light in which a minimal version is created of all components, not just critical ones.
Multi-Region active-active method: By architecting multi region applications and using DNS to balance between them in normal production status, you can adjust the DNS weighting and send all traffic to the AWS region that is available, this can even be performed automatically with Route53 or other DNS services that provide health check mechanisms as well as load balancing.
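As a reference, a hedged Terraform sketch of a Route53 failover setup supporting the DNS-based strategies above (domain, endpoints and health check path are illustrative):
# Assumes an existing public hosted zone for the domain.
data "aws_route53_zone" "main" {
  name = "domain.com."
}

resource "aws_route53_health_check" "primary" {
  fqdn              = "app.us-east-1.domain.com"   # hypothetical primary endpoint
  type              = "HTTPS"
  port              = 443
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "primary" {
  zone_id         = data.aws_route53_zone.main.zone_id
  name            = "app.domain.com"
  type            = "CNAME"
  ttl             = 60
  set_identifier  = "primary"
  records         = ["app.us-east-1.domain.com"]
  health_check_id = aws_route53_health_check.primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

resource "aws_route53_record" "secondary" {
  zone_id        = data.aws_route53_zone.main.zone_id
  name           = "app.domain.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "secondary"
  records        = ["app.us-west-2.domain.com"]   # hypothetical DR endpoint

  failover_routing_policy {
    type = "SECONDARY"
  }
}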
Figure: 2 sets of app instances, each behind an elastic load balancer in two separate regions (just as reference). (Source: Randika Rathugamage, \"High Availability with Route53 DNS Failover\", Medium blogpost, accessed December 1st 2020). Figure: AWS calculated \u2014 or parent \u2014 health check, we can fail on any number of child health checks (just as reference). (Source: Simon Tabor, \"How to implement the perfect failover strategy using Amazon Route53\", Medium blogpost, accessed December 1st 2020)."},{"location":"user-guide/ref-architecture-aws/features/reliability/dr/#read-more","title":"Read more","text":"
AWS reference links
Consider the following AWS official links as reference:
"},{"location":"user-guide/ref-architecture-aws/features/reliability/high-availability/","title":"High Availability & Helthchecks","text":""},{"location":"user-guide/ref-architecture-aws/features/reliability/high-availability/#recovery-from-failures","title":"Recovery from Failures","text":"
Automatic recovery from failure
It keeps an AWS environment reliable. Using logs and metrics from CloudWatch, designing a system where the failures themselves trigger recovery is the way to move forward.
Figure: AWS HA architecture diagrams (just as reference)."},{"location":"user-guide/ref-architecture-aws/features/reliability/high-availability/#recovery-procedures","title":"Recovery Procedures","text":"
Test recovery procedures
The risks faced by cloud environments and systems, the points of failure for systems and ecosystems, as well as details about the most probable attacks, are known and can be simulated. Testing recovery procedures is something that can be done using these insights. Real points of failure are exploited, and the way the environment reacts to the emergency shows just how reliable the system is.
Figure: AWS HA architecture diagrams (just as reference)."},{"location":"user-guide/ref-architecture-aws/features/reliability/high-availability/#scalability-and-availability","title":"Scalability and Availability","text":"
Scale horizontally to increase aggregate system availability
The cloud environment needs to have multiple redundancies and additional modules as added security measures. Of course, multiple redundancies require good management and maintenance for them to remain active throughout the environment's lifecycle.
Figure: AWS HA scalable architecture diagrams (just as reference)."},{"location":"user-guide/ref-architecture-aws/features/reliability/high-availability/#healthchecks-self-healing","title":"Healthchecks & Self-healing","text":""},{"location":"user-guide/ref-architecture-aws/features/reliability/high-availability/#k8s-and-containers","title":"K8s and containers","text":"
K8s readiness and liveness probes
Distributed systems can be hard to manage. A big reason is that there are many moving parts that all need to work for the system to function. If a small part breaks, the system has to detect it, route around it, and fix it. And this all needs to be done automatically! Health checks are a simple way to let the system know if an instance of your app is working or not working.
If an instance of your app is not working, then other services should not access it or send a request to it. Instead, requests should be sent to another instance of the app that is ready, or re-tried at a later time. The system should also bring your app back to a healthy state.
By default, Kubernetes starts to send traffic to a pod when all the containers inside the pod start, and restarts containers when they crash. While this can be "good enough" when you are starting out, you can make your deployments more robust by creating custom health checks. Fortunately, Kubernetes makes this relatively straightforward, so there is no excuse not to!
So, aside from the monitoring and alerting that the underlying infrastructure will have, application containers will have their own mechanisms to determine readiness and liveness. These are features that our scheduler of choice, Kubernetes, natively provides; to read more click here.
"},{"location":"user-guide/ref-architecture-aws/features/secrets/secrets/","title":"Secrets and Passwords Management","text":""},{"location":"user-guide/ref-architecture-aws/features/secrets/secrets/#overview","title":"Overview","text":"
Ensure scalability, availability and persistence, as well as secure, hierarchical storage to manage configuration and secret data for:
Secret Managers
AWS KMS
AWS SSM Parameter Store
Ansible Vault
Hashicorp Vault
Strengths
Improve the level of security by validating separation of environment variables and code secrets.
Control and audit granular access in detail
Store secure chain and configuration data in hierarchies and track versions.
Configure integration with AWS KMS, Amazon SNS, Amazon CloudWatch, and AWS CloudTrail to notify, monitor, and audit functionality.
AWS CloudTrail monitors and records account activity across your AWS infrastructure, giving you control over storage, analysis, and remediation actions.
AWS CloudTrail overview
This service will be configured to enable auditing of all AWS services in all accounts. Once enabled, as shown in the below presented figure, CloudTrail will deliver all events from all accounts to the Security account in order to have a centralized way to audit operations on AWS resources. Audit events will be available from CloudTrail for 90 days but a longer retention time will be available through a centralized S3 bucket.
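A hedged Terraform sketch of an organization-wide trail delivering events to a centralized bucket in the Security account (trail and bucket names are illustrative):
resource "aws_cloudtrail" "org" {
  name                          = "organization-trail"
  s3_bucket_name                = "myproject-security-cloudtrail-org"   # hypothetical centralized bucket
  is_organization_trail         = true   # capture events from all member accounts
  is_multi_region_trail         = true
  include_global_service_events = true
  enable_log_file_validation    = true
}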
Figure: AWS CloudTrail components architecture diagram (just as reference). (Source: binbash Leverage diagrams, accessed July 6th 2022).
\"AWS Certificate Manager is a service that lets you easily provision, manage, and deploy public and private Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates for use with AWS services and your internal connected resources. SSL/TLS certificates are used to secure network communications and establish the identity of websites over the Internet as well as resources on private networks. AWS Certificate Manager removes the time-consuming manual process of purchasing, uploading, and renewing SSL/TLS certificates.\"
\"With AWS Certificate Manager, you can quickly request a certificate, deploy it on ACM-integrated AWS resources, such as:
Elastic Load Balancers,
Amazon CloudFront distributions,
and APIs on API Gateway,
and let AWS Certificate Manager handle certificate renewals. It also enables you to create private certificates for your internal resources and manage the certificate lifecycle centrally. Public and private certificates provisioned through AWS Certificate Manager for use with ACM-integrated services are free. You pay only for the AWS resources you create to run your application. With AWS Certificate Manager Private Certificate Authority, you pay monthly for the operation of the private CA and for the private certificates you issue.\"
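As a reference, a minimal Terraform sketch of requesting a DNS-validated public certificate with ACM (the domain is illustrative); once the validation CNAME records exist in Route53, ACM renews the certificate automatically:
resource "aws_acm_certificate" "site" {
  domain_name               = "domain.com"       # hypothetical domain
  subject_alternative_names = ["*.domain.com"]
  validation_method         = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}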
Figure: AWS certificate manager (ACM) service integration diagram. (Source: AWS, \"Amazon Certificate Manager intro diagram\", AWS Documentation Amazon ACM User Guide, accessed August 4th 2021)."},{"location":"user-guide/ref-architecture-aws/features/security/certificates/#cert-manager-lets-encrypt","title":"Cert-manager + Let's Encrypt","text":"
Why Cert-manager + Let's Encrypt\u2753
cert-manager adds certificates and certificate issuers as resource types in Kubernetes clusters, and simplifies the process of obtaining, renewing and using those certificates.
It can issue certificates from a variety of supported sources, including Let\u2019s Encrypt, HashiCorp Vault, and Venafi as well as private PKI.
It will ensure certificates are valid and up to date, and attempt to renew certificates at a configured time before expiry.
It is loosely based upon the work of kube-lego and has borrowed some wisdom from other similar projects such as kube-cert-manager.
Figure: Certificate manager high level components architecture diagram. (Source: Cert-manager official documentation, \"Cert-manager manager intro overview\", Cert-manager Documentation main intro section, accessed August 4th 2021)."},{"location":"user-guide/ref-architecture-aws/features/security/firewall-manager/","title":"Firewall Manager","text":""},{"location":"user-guide/ref-architecture-aws/features/security/firewall-manager/#use-cases","title":"Use Cases","text":"
Network Firewall rules: Security administrators will be able to deploy firewall rules for AWS Network Firewall to control traffic leaving and entering your network across accounts and Amazon VPCs, from the Security account.
WAF & WAF v2: Your security administrators will be able to deploy WAF and WAF v2 rules, and Managed rules for WAF, to be used on Application Load Balancers, API Gateways and Amazon CloudFront distributions.
Route 53 Resolver DNS Firewall rules: Deploy Route 53 Resolver DNS Firewall rules from the Security account to enforce firewall rules across your organization.
Audit Security Groups: You can create policies to set guardrails that define what security groups are allowed/disallowed across your VPCs. AWS Firewall Manager continuously monitors security groups to detect overly permissive rules, and helps improve firewall posture. You can get notifications of accounts and resources that are non-compliant or allow AWS Firewall Manager to take action directly through auto-remediation.
Security Groups: Use AWS Firewall Manager to create a common primary security group across your EC2 instances in your VPCs.
Access Analyzer analyzes the resource-based policies that are applied to AWS resources in the Region where you enabled Access Analyzer. Only resource-based policies are analyzed.
Supported resource types:
Amazon Simple Storage Service buckets
AWS Identity and Access Management roles
AWS Key Management Service keys
AWS Lambda functions and layers
Amazon Simple Queue Service queues
AWS Secrets Manager secrets
Figure: AWS IAM access analysis features. (Source: AWS, \"How it works - monitoring external access to resources\", AWS Documentation, accessed June 11th 2021)."},{"location":"user-guide/ref-architecture-aws/features/security/iam-access-analyzer/#aws-organizations","title":"AWS Organizations","text":"
CONSIDERATION: AWS Organization integration
In order to enable Access Analyzer with the Organization as the zone of trust in the Security account, this account needs to be set as a delegated administrator.
Such a step cannot be performed by Terraform yet, so it was set up manually as described below: https://docs.aws.amazon.com/IAM/latest/UserGuide/access-analyzer-settings.html
If you're configuring AWS IAM Access Analyzer in your AWS Organizations management account, you can add a member account in the organization as the delegated administrator to manage Access Analyzer for your organization. The delegated administrator has permissions to create and manage analyzers with the organization as the zone of trust. Only the management account can add a delegated administrator.
"},{"location":"user-guide/ref-architecture-aws/features/security/iam-access-analyzer/#aws-web-console","title":"AWS Web Console","text":"Figure: AWS Web Console screenshot. (Source: binbash, \"IAM access analyzer service\", accessed June 11th 2021)."},{"location":"user-guide/ref-architecture-aws/features/security/overview/","title":"Security","text":""},{"location":"user-guide/ref-architecture-aws/features/security/overview/#supported-aws-security-services","title":"Supported AWS Security Services","text":"
AWS IAM Access Analyzer: Generates comprehensive findings that identify resources policies for public or cross-account accessibility, monitors and helps you refine permissions. Provides the highest levels of security assurance.
AWS Config: Tracks changes made to AWS resources over time, making possible to return to a previous state. Monitors and records your AWS resource configurations and allows you to automate the evaluation of recorded configurations against desired compliance rule set. Adds accountability factor.
AWS CloudTrail: Stores logs of all calls made to AWS APIs, whether they come from the web console, the command line or any other client, allowing us to monitor them via CloudWatch dashboards and notifications.
AWS VPC Flow Logs: Enables us to examine individual network interface logs, to address network issues and also monitor suspicious behavior.
AWS Web Application Firewall: Optional; if not used, it is recommended to use a similar service such as Cloudflare. When paired with an Application Load Balancer or CloudFront distribution, it inspects incoming requests to detect and block OWASP Top 10 attacks, such as SQL injection, XSS and others.
AWS Inspector: Is an automated security assessment service that helps improve the security and compliance of infrastructure and applications deployed on AWS.
AWS GuardDuty: Is a managed threat detection service that continuously monitors for malicious or unauthorized behavior to help you protect your AWS accounts and workloads. Detects unusual API calls or potentially unauthorized deployments (possible account compromise) and potentially compromised instances or reconnaissance by attackers.
AWS Security Logs: Other access logs from client-facing resources will be stored in the Security account.
AWS Firewall Manager Is a security management service which allows you to centrally configure and manage firewall rules across your accounts and applications in AWS Organizations. This service lets you build firewall rules, create security policies, and enforce them in a consistent, hierarchical manner across your entire infrastructure, from a central administrator account.
K8s API accessed via kubectl through a private endpoint, e.g. avoiding emergency K8s API vulnerability patching.
Limit exposure: Limit the exposure of the workload to the internet and internal networks by only allowing the minimum required access, e.g. avoiding exposure of Dev/QA/Stg HTTP endpoints.
The Pritunl OpenVPN Linux instance is hardened and only runs this VPN solution. All other ports and access are restricted.
Each VPN user can be required to use MFA to connect via VPN (as well as strong passwords). This combination makes it almost impossible for an outsider to gain access via VPN.
Centralized access and audit logs.
Figure: Securing access to a private network with Pritunl diagram. (Source: Pritunl, \"Accessing a Private Network\", Pritunl documentation v1 Guides, accessed November 17th 2020)."},{"location":"user-guide/ref-architecture-aws/features/security/vpn/#read-more","title":"Read More","text":"
Pritunl - Open Source Enterprise Distributed OpenVPN, IPsec and WireGuard Server Specifications
Welcome to the comprehensive guide for using the AWS Systems Manager (SSM) through the Leverage framework integrated with AWS Single Sign-On (SSO). This documentation is designed to facilitate a smooth and secure setup for managing EC2 instances, leveraging advanced SSO capabilities for enhanced security and efficiency.
The AWS Systems Manager (SSM) provides a powerful interface for managing cloud resources. By configuring single sign-on with the leverage aws sso configure command and then starting SSM sessions, you can securely configure and manage your instances using single sign-on credentials. This integration simplifies the authentication process and enhances security, making it an essential tool for administrators and operations teams.
SSO Integration: Utilize the Leverage framework to integrate AWS SSO, simplifying the login process and reducing the need for multiple credentials.
Interactive Command Sessions: The start-session command requires the Session Manager plugin and is interactive, ensuring secure and direct command execution.
This command configures your AWS CLI to use SSO for authentication, streamlining access management across your AWS resources.
leverage aws sso configure\n
"},{"location":"user-guide/ref-architecture-aws/features/ssm/ssm/#advantages-of-terminal-access","title":"Advantages of Terminal Access","text":"
While it is possible to connect to SSM through a web browser, using the terminal offers several benefits:
Direct Shell Access: Provides real-time, interactive management capabilities.
Operational Efficiency: Enhances workflows by allowing quick and direct command executions.
"},{"location":"user-guide/ref-architecture-aws/features/ssm/ssm/#security-and-management-benefits","title":"Security and Management Benefits","text":"
Adopting this integrated approach offers significant advantages:
Increased Security: By using SSO, the system minimizes risks associated with multiple credential sets and potential unauthorized access.
Efficient Management: Centralizes control over AWS resources, reducing complexity and improving oversight.
This guide is structured into detailed sections that cover:
Pre-requisites: Requirements needed before you begin.
Variable Initialization: Setup and explanation of the necessary variables.
Authentication via SSO: How to authenticate using the leverage aws sso configure command.
Exporting AWS Credentials: Guidelines for correctly exporting AWS credentials for session management.
Session Handling: Detailed instructions for starting, managing, and terminating SSM sessions.
Each section aims to provide step-by-step instructions to ensure you are well-prepared to use the AWS SSM configuration tool effectively.
Navigate through the subsections for detailed information relevant to each stage of the setup process and refer back to this guide as needed to enhance your experience and utilization of AWS SSM capabilities.
Before you begin, ensure that you have the necessary tools and permissions set up:
SSM Plugin for AWS CLI: Crucial for starting SSM sessions from the command line. Install it by following the steps on the AWS Documentation site.
"},{"location":"user-guide/ref-architecture-aws/features/ssm/ssm/#getting-started-guide","title":"Getting Started Guide","text":""},{"location":"user-guide/ref-architecture-aws/features/ssm/ssm/#step-1-initialize-environment-variables","title":"Step 1: Initialize Environment Variables","text":"
Set up all necessary variables used throughout the session. These include directories, profiles, and configuration settings essential for the script\u2019s functionality.
"},{"location":"user-guide/ref-architecture-aws/features/ssm/ssm/#step-2-authenticate-via-sso","title":"Step 2: Authenticate via SSO","text":"
Navigate to the required layer directory and perform authentication using AWS SSO. This step verifies your credentials and ensures that subsequent operations are secure.
cd $FOLDER/shared/us-east-1/tools-vpn-server\nleverage aws sso configure\n
This command initiates a secure session to the specified EC2 instance using SSM. It's a crucial tool for managing your servers securely without the need for direct SSH access. Ensure that your permissions and profiles are correctly configured to use this feature effectively.
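For reference, starting the session from the terminal might look like the sketch below. This is not taken verbatim from the Gist: the instance ID, profile and region are placeholders you would replace with your own values.

```
# Start an interactive SSM session against the target EC2 instance
# (requires the Session Manager plugin to be installed)
aws ssm start-session \
  --target i-0123456789abcdef0 \
  --profile <project>-shared-devops \
  --region us-east-1
```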
By following these steps, you can efficiently set up and use the AWS SSM configuration tool for enhanced security and management of your cloud resources.
For a complete view of the script and additional configurations, please refer to the full Gist.
Before deploying your AWS SSO definition in the project, it will first have to be manually enabled in the AWS Management Console.
Prerequisites
Enable AWS SSO
After that, choosing and configuring an Identity Provider (IdP) is the next step. For this, we will make use of JumpCloud, as described in the how it works section. These resources point to all requirements and procedures to have your JumpCloud account set up and synced with AWS SSO:
AWS JumpCloud support guide
JumpCloud guide on how to configure as IdP for AWS SSO
Once this is set up, the SSO layer can be safely deployed.
"},{"location":"user-guide/ref-architecture-aws/features/sso/configuration/#preparing-the-project-to-use-aws-sso","title":"Preparing the project to use AWS SSO","text":"
To implement SSO authentication in your IaC definition, some configuration values need to be present in your project.
sso_enabled determines whether leverage will attempt to use credentials obtained via SSO to authenticate against AWS
sso_start_url and sso_region are necessary to configure AWS CLI correctly in order to be able to get the credentials
When configuring the AWS CLI, a default profile is created containing region and output default settings. The region value is obtained from the previously mentioned sso_region; however, you can override this behavior by configuring a region_primary value in the same global configuration file, like so:
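The original snippet is not reproduced here, but as a hedged illustration the relevant entries in the global configuration file (e.g. config/common.tfvars) might look like the following; all values are placeholders for your project's own settings:

```
# Illustrative values only: adjust to your project
sso_enabled    = true
sso_start_url  = "https://yourproject.awsapps.com/start"
sso_region     = "us-east-1"

# Optional: overrides the region written to the default AWS CLI profile
region_primary = "us-west-2"
```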
This is the role for which credentials will be obtained via SSO when operating in the current layer.
"},{"location":"user-guide/ref-architecture-aws/features/sso/configuration/#authentication-via-sso","title":"Authentication via SSO","text":""},{"location":"user-guide/ref-architecture-aws/features/sso/configuration/#1-configuring-aws-sso","title":"1. Configuring AWS SSO","text":"
Once the project has been set up to use SSO, the profiles that AWS CLI will use to authenticate against the service need to be created.
To do this, simply run leverage aws configure sso.
Attention
This step simply overwrites the AWS CLI credentials files without asking for confirmation, so it's recommended to back up or wipe old credentials before executing it, in order to avoid losing credentials or creating conflicts with profiles named similarly to the ones generated by Leverage.
This step is executed as part of the previous one. So if the user has just configured SSO, this step is not required.
Having SSO configured, the user will proceed to log in.
This is achieved by running leverage aws sso login.
In this step, the user is prompted to manually authorize the login process via a web console.
When logging in, Leverage obtains a token from SSO. This token is later used to obtain the credentials needed for the layer the user is working on. This token has a relatively short life span to strike a balance between security and convenience for the user.
"},{"location":"user-guide/ref-architecture-aws/features/sso/configuration/#3-working-on-a-layer","title":"3. Working on a layer","text":"
When SSO is enabled in the project, Leverage will automatically figure out the required credentials for the current layer, and attempt to get them from AWS every time the user executes a command on it.
These credentials are short lived (30 minutes) for security reasons, and will be refreshed automatically whenever they expire.
When the user has finished working, running leverage sso logout wipes out all remaining valid credentials and voids the token obtained from logging in.
Enabling and requiring MFA is highly recommended. We typically choose these following guidelines:
Prompt users for MFA: Only when their sign-in context changes (context-aware).
Users can authenticate with these MFA types: we allow security keys, built-in authenticators (such as fingerprint or retina/face scans), and authenticator apps.
If a user does not yet have a registered MFA device: require them to register an MFA device at sign in.
Who can manage MFA devices: users and administrators can add and manage MFA devices.
Refer to the official documentation for more details.
By default, the SSO session is set to last 12 hours. This is a good default but we still prefer to share this decision making with the Client -- e.g. focal point, dev/qa team, data science teams. They might factor in considerations such as security/compliance, UX/DevEx, operational needs, technical constraints, administration overheads, cost considerations, and more.
"},{"location":"user-guide/ref-architecture-aws/features/sso/managing-users/","title":"Managing users","text":""},{"location":"user-guide/ref-architecture-aws/features/sso/managing-users/#onboarding-users-and-groups","title":"Onboarding Users and Groups","text":""},{"location":"user-guide/ref-architecture-aws/features/sso/managing-users/#addremove-users","title":"Add/remove users","text":"
Open this file: management/global/sso/locals.tf
Locate the users map within the local variables definition
Add an entry to the users map with all the required data, including the groups the user should belong to (see the illustrative sketch after these steps)
Apply your changes
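As a purely hypothetical sketch of what such an entry could look like (the actual attribute names and structure are defined by your project's locals.tf, so use that file as the source of truth rather than this example):

```
# Hypothetical structure: attribute names depend on your project's locals.tf
users = {
  "jane.doe" = {
    first_name = "Jane"
    last_name  = "Doe"
    email      = "jane.doe@example.com"
    groups     = ["devops", "administrators"]
  }
}
```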
Additional steps are required when creating a new user:
The user's email needs to be verified. Find the steps for that in this section.
After the user has verified their email, they should be able to use the Forgot Password flow to generate their password. The steps for that can be found in this section.
Open this file: devops-tf-infra/management/global/sso/locals.tf
Find the users map within the local variables definition
Update the groups attribute to add/remove groups that user belongs to
Apply your changes
"},{"location":"user-guide/ref-architecture-aws/features/sso/managing-users/#trigger-user-email-activation","title":"Trigger user email activation","text":"
Log in to management account through the AWS console
Go to AWS IAM Identity Center
Go to the users section
Locate the user whose email you want to activate
Click on the user to view the user details
There should be a \"Send verification email\" or \"Send email verification link\" button at the top. Click on it.
Notify the user and confirm that they got the email and clicked on the activation link.
"},{"location":"user-guide/ref-architecture-aws/features/sso/managing-users/#reset-a-user-password","title":"Reset a user password","text":"
JumpCloud will be configured as the Identity Provider (IdP) that we will integrate with AWS SSO in order to grant users access to AWS resources from a centralized service. Users will be able to log in to JumpCloud in order to access AWS accounts, using specific permission sets that will in turn determine what kind of actions they are allowed on AWS resources.
Users will be defined in JumpCloud and used for deploying AWS resources with scoped permissions.
"},{"location":"user-guide/ref-architecture-aws/features/sso/overview/#sso-groups","title":"SSO Groups","text":"Account / Groups Administrators DevOps FinOps SecurityAuditors Management x x x x
Consideration
This definition could be fully customized based on the project specific needs
"},{"location":"user-guide/ref-architecture-aws/features/sso/overview/#sso-permission-sets-w-account-associations","title":"SSO Permission Sets (w/ Account Associations)","text":"Account / Permission Sets Administrator DevOps FinOps SecurityAuditors Management x x Security x x x Shared x x x Network x x x Apps-DevStg x x x Apps-Prd x x x
Considerations
Devops Engineers will assume this permission set through JumpCloud + AWS SSO.
Developers could have their specific SSO Group + Permission Set policy association.
This definition could be fully customized based on the project specific needs
We will review all S3 buckets in the existing account to determine if it's necessary to copy them over to the new account, evaluate the existing bucket policies and tighten permissions to the absolute minimum required for users and applications. As for EBS volumes, our recommendation is to create them all encrypted by default. The overhead created by this process is negligible.
| Storage class | Designed for | Durability (designed for) | Availability (designed for) | Availability Zones | Min storage duration | Min billable object size | Other considerations |
| --- | --- | --- | --- | --- | --- | --- | --- |
| S3 Standard | Frequently accessed data | 99.999999999% | 99.99% | >= 3 | None | None | None |
| S3 Standard-IA | Long-lived, infrequently accessed data | 99.999999999% | 99.9% | >= 3 | 30 days | 128 KB | Per GB retrieval fees apply. |
| S3 Intelligent-Tiering | Long-lived data with changing or unknown access patterns | 99.999999999% | 99.9% | >= 3 | 30 days | None | Monitoring and automation fees per object apply. No retrieval fees. |
| S3 One Zone-IA | Long-lived, infrequently accessed, non-critical data | 99.999999999% | 99.5% | 1 | 30 days | 128 KB | Per GB retrieval fees apply. Not resilient to the loss of the Availability Zone. |
| S3 Glacier | Long-term data archiving with retrieval times ranging from minutes to hours | 99.999999999% | 99.99% (after you restore objects) | >= 3 | 90 days | 40 KB | Per GB retrieval fees apply. You must first restore archived objects before you can access them. For more information, see Restoring archived objects. |
| S3 Glacier Deep Archive | Archiving rarely accessed data with a default retrieval time of 12 hours | 99.999999999% | 99.99% (after you restore objects) | >= 3 | 180 days | 40 KB | Per GB retrieval fees apply. You must first restore archived objects before you can access them. For more information, see Restoring archived objects. |
| RRS (Not recommended) | Frequently accessed, non-critical data | 99.99% | 99.99% | >= 3 | None | None | None |
"},{"location":"user-guide/ref-architecture-aws/features/storage/storage/#ebs-volumes","title":"EBS Volumes","text":"
Tech specs
Backups: Periodic EBS snapshots with retention policy
Encryption: Yes (by default)
Type: SSD (gp2) by default, Throughput Optimized HDD (st1) for some database workloads, if needed.
This guideline includes considerations and steps that should be performed when upgrading a cluster to a newer version.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#upgrade-plan-overview","title":"Upgrade Plan Overview","text":"
General considerations
Preparation Steps
Understand what changed
Plan a maintenance window for the upgrade
Rehearse on a non-Production cluster first
Ensure you have proper visibility on the cluster
Upgrade Steps
Upgrade Control Plane
Upgrade Managed Node Groups
Upgrade Cluster AutoScaler version
Upgrade EKS Add-ons
Closing Steps
Migration Notes
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#detailed-upgrade-plan","title":"Detailed Upgrade Plan","text":""},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#1-general-considerations","title":"1) General considerations","text":"
Ensure your sensitive workloads are deployed in a highly available manner to reduce downtime as much as possible
Ensure Pod Disruption Budgets are set in your deployments to ensure your application pods are evicted in a controlled way (e.g. leave at least one pod active at all times)
Ensure Liveness and Readiness probes are set so that Kubernetes can tell whether your application is healthy to start receiving traffic or needs a restart
Plan the upgrade during off hours so that unexpected disruptions have even less impact on end-users
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#2-preparation-steps","title":"2) Preparation Steps","text":""},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#understand-what-changed","title":"Understand what changed","text":"
Here you need to get a good understanding of the things that changed between the current version and the version you want to upgrade to. For that, it is highly recommended to go to the AWS EKS official documentation as it is frequently being updated.
Other documentation you should refer to is the official Kubernetes documentation, especially the Kubernetes API Migration Guide, which explains in great detail what has been changed.
For instance, typical changes include:
Removed/deprecated Kubernetes APIs: this one may require that you also upgrade the resources used by your applications or even base components your applications rely on. E.g. cert-manager, external-dns, etc.
You can use tools such as kubent to find deprecated API versions (see the short example after this list). That should list the resources that need to be upgraded; however, you may still need to figure out whether each one is an EKS base component or a cluster component installed via Terraform & Helm.
Base component updates: this is about changes to control plane components and to components that run on the nodes. An example of that would be the deprecation and removal of Docker as a container runtime.
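As an example of the kubent check mentioned above, a typical run looks roughly like this (assuming kubent is installed locally and your kubeconfig points at the cluster to be upgraded; flag support may vary between kubent versions):

```
# Scan the cluster for objects using deprecated or removed APIs
kubent

# Optionally limit the report to the Kubernetes version you plan to upgrade to
kubent --target-version 1.27
```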
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#plan-a-maintenance-window-for-the-upgrade","title":"Plan a maintenance window for the upgrade","text":"
Keep in mind that, at the very least, you will be upgrading the control plane and the data plane, and in some cases you would also need to upgrade components and workloads. So, although Kubernetes has a great development team and automation, and even though we rely on EKS, for which AWS performs additional checks and validations, we are still dealing with a complex, evolving piece of software, so planning for the upgrade is still a reasonable move.
Upgrading the control plane should not affect the workloads but you should still bear in mind that the Kubernetes API may become unresponsive during the upgrade, so anything that talks to the Kubernetes API might experience delays or even timeouts.
Now, upgrading the nodes is the more sensitive task and, while you can use a rolling-update strategy, that still doesn't provide any guarantees on achieving a zero down-time upgrade so, again, planning for some maintenance time is recommended.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#rehearse-on-a-non-production-cluster-first","title":"Rehearse on a non-Production cluster first","text":"
Perform the upgrade on a non-Production cluster first to catch and anticipate any issues before you upgrade the Production cluster. Also take notes and reflect any important updates in this document.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#ensure-you-have-proper-visibility-on-the-cluster","title":"Ensure you have proper visibility on the cluster","text":"
Monitoring the upgrade is important so make sure you have monitoring tools in-place before attempting the upgrade. Such tools include the AWS console (via AWS EKS Monitoring section) and also tools like Prometheus/Grafana and ElasticSearch/Kibana. Make sure you are familiar with those before the upgrade.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#3-upgrade-steps","title":"3) Upgrade Steps","text":""},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#1-upgrade-control-plane","title":"1) Upgrade Control Plane","text":"
This is simply about updating the cluster_version variable in the variables.tf file within the cluster layer of the cluster you want to upgrade and then applying that change. However, with the current version of the Terraform EKS module, modifying the cluster version input will show that both the control plane and the nodes need to be upgraded, which may not follow the expected order (first the control plane, then the nodes). Another thing that could go wrong is Terraform ending up in an unfinished state because the upgrade takes too long to complete (or, as happened to us once, the cluster got upgraded but somehow the launch template used for the nodes was deleted and thus the upgraded nodes could not be spun up).
The alternative to all of that is to perform the upgrade outside Terraform and, after it is complete, to update the cluster_version variable in the variables.tf file. Then you can run a Terraform plan to verify the output shows no changes. This method provides a good degree of control over the upgrade.
Having said that, go ahead and proceed with the upgrade, either via the AWS console, the AWS CLI or the EKS CLI, and watch the upgrade as it happens. As stated in a previous step, the Kubernetes API may experience some downtime during this operation, so make sure you prepare accordingly.
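If you go with the AWS CLI, the control plane upgrade and its follow-up might look roughly like the sketch below; the cluster name, target version and profile are placeholders:

```
# Trigger the control plane upgrade
aws eks update-cluster-version \
  --name <cluster-name> \
  --kubernetes-version 1.27 \
  --profile <aws-profile>

# Check the update status until it becomes Successful
aws eks describe-update \
  --name <cluster-name> \
  --update-id <update-id-from-previous-output> \
  --profile <aws-profile>
```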
Once the control plane is upgraded you should be ready to upgrade the nodes. There are 2 strategies you could use here: rolling-upgrade or recreate. The former is recommended for causing the minimal disruption. Recreate could be used in an environment where down-time won't be an issue.
As it was mentioned in the previous step, the recommendation is to trigger the upgrade outside Terraform so please proceed with that and monitor the operation as it happens (via AWS EKS console, via Kubectl, via Prometheus/Grafana).
If you go with the AWS CLI, you can use the following command to get a list of the clusters available to your current AWS credentials:
aws eks list-clusters --profile [AWS_PROFILE]\n
Make a note of the cluster name as you will be using that in subsequent commands.
Now use the following command to get a list of the node groups:
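The exact snippet is not shown here, but it is presumably along these lines:

```
aws eks list-nodegroups \
  --cluster-name <cluster-name> \
  --profile <aws-profile>
```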
After that, you need to identify the appropriate release version for the upgrade. This is determined by the Amazon EKS optimized Amazon Linux AMI version; the precise value can be found in the AMI Details section of the GitHub repository CHANGELOG. Look for the column named Release version in the first table under the corresponding Kubernetes version collapsed section (indicated by a ▸ symbol).
With that information you should be ready to trigger the update with the command below:
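Again, the original snippet is not shown here, but with the AWS CLI the node group update would presumably look something like this (all values are placeholders; the release version is the one taken from the CHANGELOG as described above):

```
aws eks update-nodegroup-version \
  --cluster-name <cluster-name> \
  --nodegroup-name <nodegroup-name> \
  --release-version <ami-release-version> \
  --profile <aws-profile>
```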
Modify scaling.tf per the official Kubernetes autoscaler chart and apply with Terraform. The version of the Cluster Autoscaler should at least match the cluster version you are moving to. A greater version of the autoscaler might work with an earlier version of Kubernetes, but the opposite most likely won't be the case.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#4-upgrade-eks-base-components","title":"4) Upgrade EKS base components","text":"
Namely these components are:
Kube-Proxy
CoreDNS
VPC CNI
EBS-CSI driver
In recent versions EKS is able to manage these components as add-ons, which makes their upgrades less involved and can even be performed through a recent version of the Terraform EKS module. However, we are not currently using EKS Add-ons to manage the installation of these components; we are using the so-called self-managed approach, so the upgrade needs to be applied manually.
Generally speaking, the upgrade procedure could be summed up as follows:
Determine current version (see the check commands after this list)
Determine the appropriate version you need to upgrade to
Upgrade each component and verify
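For example, the currently deployed versions can usually be checked with kubectl as in the sketch below (resource names assume a standard EKS setup and may differ depending on how each component was installed):

```
# Image tags currently deployed for each base component
kubectl -n kube-system describe daemonset kube-proxy | grep Image
kubectl -n kube-system describe deployment coredns | grep Image
kubectl -n kube-system describe daemonset aws-node | grep Image            # VPC CNI
kubectl -n kube-system describe deployment ebs-csi-controller | grep Image # EBS CSI driver
```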
Now, the recommendation is to refer to the following guides which carefully describe the steps that need to be performed:
Kube-proxy: check here
CoreDNS: check here
VPC CNI: check here
EBS-CSI driver: check here
IMPORTANT: be extremely careful when applying these updates, especially with the VPC CNI, as the instructions are not easy to follow.
Make sure you notify the team about the upgrade result. Also, do not forget about committing/pushing all code changes to the repository and creating a PR for them.
If you find any information that you consider should be added to this document, you are welcome to reflect it here.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#upgrade-to-v121","title":"Upgrade to v1.21","text":"
VPC CNI: The latest available version was v1.11.4 but it was only possible to upgrade to v1.9.3. It couldn't be moved further because v1.10.3 wasn't able to run, as it kept throwing the following errors:
{\"level\":\"info\",\"ts\":\"2022-10-07T15:42:01.802Z\",\"caller\":\"entrypoint.sh\",\"msg\":\"Retrying waiting for IPAM-D\"}\npanic: runtime error: invalid memory address or nil pointer dereference\n[signal SIGSEGV: segmentation violation code=0x1 addr=0x39 pc=0x560d2186d418]\n
Cluster Autoscaler: it is already at v1.23.0. The idea is that this should match with the Kubernetes version but since the version we have has been working well so far, we can keep it and it should cover us until we upgrade Kubernetes to a matching version.
Managed Nodes failures due to PodEvictionFailure: this one happened twice during a Production cluster upgrade. It seemed to be related to Calico pods using tolerations that are not compatible with Kubernetes typical node upgrade procedure. In short, the pods tolerate the NoSchedule taint and thus refuse to be evicted from the nodes during a drain procedure. The workaround that worked was using a forced upgrade. That is essentially a flag that can be passed via Terraform (or via AWS CLI). A more permanent solution would involve figuring out a proper way to configure Calico pods without the problematic toleration; we just need to keep in mind that we are deploying Calico via the Tigera Operator.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#upgrade-to-v122","title":"Upgrade to v1.22","text":"
Control plane and managed nodes: no issues. Cluster Autoscaler: already at v1.23.0. Kube-proxy: no issues, upgraded to v1.22.16-minimal-eksbuild.3. CoreDNS: no issues, upgraded to v1.8.7-eksbuild.1. VPC CNI: no issues, upgraded to the latest version available, v1.12.1.
Outstanding issue: Prometheus/Grafana instance became unresponsive right during the upgrade of the control plane. It was fully inaccessible. A stop and start was needed to bring it back up.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#upgrade-to-v125","title":"Upgrade to v1.25","text":"
Before upgrading to v1.25 there are two main issues to tackle. The main one is the removal of the PodSecurityPolicy resource from the policy/v1beta1 API. You should migrate all your PSPs to Pod Security Standards or any other Policy-as-code solution for Kubernetes. If the only PSP found in the cluster is named eks.privileged, you can skip this step: this is a PSP handled by EKS and will be migrated for you by the platform. For more information about this, the official EKS PSP removal FAQ can be referenced. The second issue to tackle is to upgrade aws-load-balancer-controller to v2.4.7 or later to address the removal of EndpointSlice from the discovery.k8s.io/v1beta1 API. This should be done via the corresponding helm-chart.
After the control plane and managed nodes are upgraded, which should present no issues, the cluster autoscaler needs to be upgraded. Usually we would achieve this by changing the helm-chart version to one that deploys the version matching the cluster, that is, Cluster Autoscaler v1.25. However, there's no release that covers this scenario, so we need to provide the cluster autoscaler image version to the current helm-chart via the image.tag values file variable.
Addons should present no problems being upgraded to the latest available version.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#upgrade-to-v126","title":"Upgrade to v1.26","text":"
No extra considerations are needed to upgrade from v1.25 to v1.26. The standard procedure listed above should work with no issues.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#upgrade-to-v127","title":"Upgrade to v1.27","text":"
In a similar fashion to the previous upgrade notes, the standard procedure listed above should work with no issues.
Official AWS procedure
Step-by-step instructions for upgrading an EKS cluster can be found at the following link: https://repost.aws/knowledge-center/eks-plan-upgrade-cluster
Official AWS Release Notes
To be aware of important changes in each new Kubernetes version in standard support, it is important to check the AWS release notes for standard support versions.
Most of these components and services are installed via Helm charts. Usually tweaking these components configuration is done via the input values for their corresponding chart. For detailed information on the different parameters please head to each component public documentation (Links in each section).
It automatically provisions AWS Application Load Balancers (ALB) or AWS Network Load Balancers (NLB) in response to the creation of Kubernetes Ingress or LoadBalancer resources respectively. Automates the routing of traffic to the cluster.
Implementation in ref architecture: /apps-devstg/us-east-1/k8s-eks-demoapps
It is used to allow for the configuration of NGINX via a system of annotations in Kubernetes resources.
A configuration can be enforced globally, via the controller.config variable in the helm-chart, or individually for each application, via annotations in the Ingress resource of the application.
Implementation in ref architecture: /apps-devstg/us-east-1/k8s-eks-demoapps
Automatically creates the required DNS records based on the definition of Ingress resources in the cluster.
The annotation kubernetes.io/ingress.class: <class> defines whether the records are created in the public hosted zone or the private hosted zone for the environment. It accepts one of two values: public-apps or private-apps.
Implementation in ref architecture: /apps-devstg/us-east-1/k8s-eks-demoapps
Automatically fetches secrets and parameters from Parameter Store, AWS Secrets Manager and other sources, and makes them available in the cluster as Kubernetes Secrets.
Implementation in ref architecture: /apps-devstg/us-east-1/k8s-eks-demoapps
Stack of Kubernetes manifests, monitoring, alerting and visualization applications, rules and dashboards implementing an end-to-end Kubernetes monitoring solution.
Implementation in ref architecture: /apps-devstg/us-east-1/k8s-eks-demoapps
Provides the capability of using more complex deployment and promotion schemes to eliminate downtime and allow for greater control of the process, like Blue-Green or Canary deployments.
"},{"location":"user-guide/ref-architecture-eks/components/#argo-cd-image-updater","title":"Argo CD Image Updater","text":"
Tracks for new images in ECR and updates the applications definition so that Argo CD automatically proceeds with the deployment of such images.
Access to EKS is usually achieved via IAM roles. These could be either custom IAM roles that you define, or SSO roles that AWS takes care of creating and managing.
Granting different kinds of access to IAM roles can be done as shown here where you can define classic IAM roles or SSO roles. Note however that, since the latter are managed by AWS SSO, they could change if they are recreated or reassigned.
Now, even though granting access to roles is the preferred way, keep in mind that it is not the only one: you can also grant access to specific users or to specific accounts.
Amazon Elastic Kubernetes Services (EKS) is a managed service that makes it easy for you to run Kubernetes on AWS without needing to install and operate your own Kubernetes control plane or worker nodes.
Core Features
Highly Secure: EKS automatically applies the latest security patches to your cluster control plane.
Multiple Availability Zones: EKS auto-detects and replaces unhealthy control plane nodes and provides on-demand, zero downtime upgrades and patching.
Serverless Compute: EKS supports AWS Fargate to remove the need to provision and manage servers, improving security through application isolation by design.
Built with the Community: AWS actively works with the Kubernetes community, including making contributions to the Kubernetes code base helping you take advantage of AWS services.
Figure: AWS K8s EKS architecture diagram (just as reference). (Source: Jay McConnell, \"A tale from the trenches: The CloudBees Core on AWS Quick Start\", AWS Infrastructure & Automation Blog post, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-eks/overview/#version-support","title":"Version Support","text":"
At Leverage we support the 3 latest stable Kubernetes releases (at best effort) within our Reference Architecture EKS layer and IaC Library EKS module.
We think this is a good balance between management overhead and an acceptable level of supported versions (at best effort). If your project has an older legacy version we could work along with your CloudOps team to safely migrate it to a Leverage supported EKS version.
This is the primary resource which defines the cluster. We will create one cluster on each account:
apps-devstg/us-east-1/k8s-eks
apps-devstg/us-east-1/k8s-eks-demoapps
Important
In case of multiple environments hosted in the same cluster as for the one with Apps Dev and Stage, the workload isolation will be achieved through Kubernetes features such as namespaces, network policies, RBAC, and others.
Each option has its pros and cons with regard to cost, operation complexity, extensibility, customization capabilities, features, and management.
In general we implement Managed Nodes. The main reasons being:
They allow a high degree of control in terms of the components we can deploy and the features those components can provide to us. For instance we can run ingress controllers and service mesh, among other very customizable resources.
AWS takes care of provisioning and lifecycle management of nodes which is one less task to worry about.
Upgrading Kubernetes versions becomes much simpler and quicker to perform.
We still can, at any time, start using Fargate and Fargate Spot by simply creating a profile for one or both of them, then we only need to move the workloads that we want to run on Fargate profiles of our choice.
AWS EKS: Docker runs in the 172.17.0.0/16 CIDR range in Amazon EKS clusters. We recommend that your cluster's VPC subnets do not overlap this range. Otherwise, you will receive the following error:
Error: : error upgrading connection: error dialing backend: dial tcp 172.17.nn.nn:10250:\ngetsockopt: no route to host\n
Read more: AWS EKS network requirements
Reserved IP Addresses: The first four IP addresses and the last IP address in each subnet CIDR block are not available for you to use, and cannot be assigned to an instance. For example, in a subnet with CIDR block 10.0.0.0/24, five IP addresses are reserved: 10.0.0.0 (network address), 10.0.0.1 (VPC router), 10.0.0.2 (DNS), 10.0.0.3 (reserved for future use) and 10.0.0.255 (network broadcast). For more information, see AWS VPC Subnets IP addressing.
"},{"location":"user-guide/ref-architecture-eks/vpc/#vpcs-ip-addressing-plan-cidr-blocks-sizing","title":"VPCs IP Addressing Plan (CIDR blocks sizing)","text":"
Introduction
VPCs can vary in size from 16 addresses (/28 netmask) to 65,536 addresses (/16 netmask). In order to size a VPC correctly, it is important to understand the number, types, and sizes of workloads expected to run in it, as well as workload elasticity and load balancing requirements.
Keep in mind that there is no charge for using Amazon VPC (aside from EC2 charges), therefore cost should not be a factor when determining the appropriate size for your VPC, so make sure you size your VPC for growth.
Moving workloads or AWS resources between networks is not a trivial task, so be generous in your IP address estimates to give yourself plenty of room to grow, deploy new workloads, or change your VPC design configuration from one to another. The majority of AWS customers use VPCs with a /16 netmask and subnets with /24 netmasks. The primary reason AWS customers select smaller VPC and subnet sizes is to avoid overlapping network addresses with existing networks.
So, having an AWS single VPC design, we've chosen a Medium/Small VPC/Subnet addressing plan which would probably fit a broad variety of use cases.
"},{"location":"user-guide/ref-architecture-eks/vpc/#networking-ip-addressing","title":"Networking - IP Addressing","text":"
Starting CIDR Segment (AWS EKS clusters)
AWS EKS clusters IP Addressing calculation is presented below based on segment 10.0.0.0/16 (starts at /16 due to AWS VPC limits)
We started from 10.0.0.0/16 and subnetted to /19
Resulting in Total Subnets: 8
Number of available hosts for each subnet: 8190
Number of available IPs (AWS) for each subnet: 8187
Individual CIDR Segments (VPCs)
Then each of these /16 VPC ranges is subnetted into /19 blocks
Considering the whole Starting CIDR Segment (AWS EKS clusters) before declared, we'll start at 10.0.0.0/16
apps-devstg
1ry VPC CIDR: 10.0.0.0/16
1ry VPC DR CIDR: 10.20.0.0/16
apps-prd
1ry VPC CIDR: 10.10.0.0/16
1ry VPC DR CIDR: 10.30.0.0/16
Resulting in Subnets: 4 x VPC
VPC Subnets with Hosts/Net: 16.
Eg: apps-devstg account → us-east-1 w/ 3 AZs → 3 x Private Subnets /az + 3 x Public Subnets /az
1ry VPC CIDR: 10.0.0.0/16. Subnets:
Private 10.0.0.0/19, 10.0.32.0/19 and 10.0.64.0/19
Public 10.0.96.0/19, 10.0.128.0/19 and 10.0.160.0/19
"},{"location":"user-guide/ref-architecture-eks/vpc/#planned-subnets-per-vpc","title":"Planned Subnets per VPC","text":"
Having defined the initial VPC that will be created in the different accounts that were defined, we are going to create subnets in each of these VPCs defining Private and Public subnets split among different availability zones:
In case you would like to further understand the different tech specs and configs for this Ref Arch, you can find more details at user-guide/Compute/K8s EKS
Config files can be found under each config folders
Global config file /config/common.tfvars contains global context TF variables that we inject into TF commands used by all sub-directories, such as leverage terraform plan or leverage terraform apply, and which cannot be stored in backend.tfvars due to Terraform limitations.
Account config files
backend.tfvars contains TF variables that are mainly used to configure TF backend but since profile and region are defined there, we also use them to inject those values into other TF commands.
account.tfvars contains TF variables that are specific to an AWS account.
At the corresponding account dir, e.g. /hcp/base-tf-backend, then:
Run leverage terraform init
Run leverage terraform plan, review the output to understand the expected changes
Run leverage terraform apply, review the output once more and type yes if you are okay with that
This should create a terraform.tfstate file in this directory. We don't want to push that to the repository, so let's push the state to the backend we just created.
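A hedged sketch of that last step (the exact backend settings live in your layer's configuration, so treat this as orientation rather than a literal recipe):

```
# After enabling the S3 backend configuration for this layer,
# re-initialize so Terraform offers to migrate the local state
leverage terraform init

# Terraform should detect the backend change and ask:
#   "Do you want to copy existing state to the new backend?" -> answer: yes
```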
Make sure you've read and prepared your local development environment following the Overview base-configurations section.
Depending on which Terraform Ref Architecture repo you are working in, please review and make sure you meet all the terraform aws pre-requisites or terraform vault pre-requisites
Remote State
Configuration files
AWS Profile and credentials
Vault token secret
Get into the folder that you need to work with (e.g. 2_identities)
Run leverage terraform init
Make whatever changes you need to make
Run leverage terraform plan if you only mean to preview those changes
Run leverage terraform apply if you want to review and likely apply those changes
Note
If desired, at step #5 you could submit a PR, allowing you and the rest of the team to understand and review what changes would be made to your AWS Cloud Architecture components before executing leverage terraform apply (terraform apply). This brings the huge benefit of treating changes with a GitOps oriented approach, basically as we should treat any other code & infrastructure change, and integrating it with the rest of our tools and practices like CI/CD.
"},{"location":"user-guide/ref-architecture-vault/workflow/#running-in-automation","title":"Running in Automation","text":"Figure: Running terraform with AWS in automation (just as reference)."},{"location":"user-guide/ref-architecture-vault/workflow/#read-more","title":"Read More","text":"
Make sure you read general troubleshooting page before trying out anything else.
"},{"location":"user-guide/troubleshooting/credentials/#are-you-using-iam-or-sso","title":"Are you using IAM or SSO?","text":"
Leverage supports two methods for getting AWS credentials: IAM and SSO. We are progressively favoring SSO over IAM, only using the latter as a fallback option.
SSO is enabled through the common.tfvars file on this line:
sso_enabled = true\n
If that is set to true, then you are using SSO, otherwise it's IAM."},{"location":"user-guide/troubleshooting/credentials/#why-should-i-care-whether-i-am-using-iam-or-sso","title":"Why should I care whether I am using IAM or SSO?","text":"
Well, because even though both methods will try to get temporary AWS credentials, each method will use a different way to do that. In fact, Leverage relies on the AWS CLI to get the credentials and each method requires completely different commands to achieve that.
"},{"location":"user-guide/troubleshooting/credentials/#do-you-have-mfa-enabled","title":"Do you have MFA enabled?","text":"
MFA is optionally used via the IAM method. It can be enabled/disabled in the build.env file.
Keep in mind that MFA should only be used with the IAM method, not with SSO.
"},{"location":"user-guide/troubleshooting/credentials/#identify-which-credentials-are-failing","title":"Identify which credentials are failing","text":"
Since Leverage actually relies on Terraform and, since most of the definitions are AWS resources, it is likely that you are having issues with the Terraform AWS provider, in other words, you might be struggling with AWS credentials. Now, bear in mind that Leverage can also be used with other providers such as Gitlab, Github, Hashicorp Cloud Platform, or even SSH via Ansible; so the point here is to understand what credentials are not working for you in order to focus the troubleshooting on the right suspect.
"},{"location":"user-guide/troubleshooting/credentials/#determine-the-aws-profile-you-are-using","title":"Determine the AWS profile you are using","text":"
When you are facing AWS credentials issues, it's important to understand which AWS profile might be causing them. Enabling verbose mode should help with that. The suspect profile is likely to show up right above the error line and, once you have identified it, you can skip to the next section.
If the above doesn't make the error evident yet, perhaps you can explore the following questions:
Is it a problem with the Terraform remote state backend? The profile used for that is typically defined in the backend.tfvars file, e.g. this one, or this other one.
Is it a problem with another profile used by the layer? Keep in mind that layers can have multiple profile definitions in order to be able to access resources in different accounts. For instance, this is a simple provider definition that uses a single profile, but here's a more complex definition with multiple provider blocks.
Can the problematic profile be found in the AWS config file? Or is the profile entry in the AWS config file properly defined? Read the next sections for more details on that.
"},{"location":"user-guide/troubleshooting/credentials/#check-the-profiles-in-your-aws-config-file","title":"Check the profiles in your AWS config file","text":"
Once you know what AWS profile is giving you headaches, you can open the AWS config file, typically under ~/.aws/[project_name_here]/config, to look for and inspect that profile definition.
Things to look out for:
Is there a profile entry in that file that matches the suspect profile?
Are there repeated profile entries?
Does the profile entry include all necessary fields (e.g. region, role_arn, source_profile; mfa_serial if MFA is enabled)?
Keep in mind that profiles change depending on whether you are using SSO or IAM for getting credentials, so please refer to the corresponding section below in this page for specific details about your case.
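To give a rough idea of what to look for, an IAM-style profile and an SSO-style profile typically have different shapes, as in the illustrative entries below (profile names, account IDs and role names are placeholders; a given profile will follow one shape or the other, not both):

```
# ~/.aws/<project>/config -- illustrative entries only

# IAM-based profile: assumes a role using a source profile (mfa_serial only if MFA is enabled)
[profile myproject-shared-devops]
region         = us-east-1
role_arn       = arn:aws:iam::111111111111:role/DevOps
source_profile = myproject-management
mfa_serial     = arn:aws:iam::222222222222:mfa/john.doe

# SSO-based profile: points at the AWS SSO portal instead of a source profile
[profile myproject-shared-devops]
region         = us-east-1
sso_start_url  = https://myproject.awsapps.com/start
sso_region     = us-east-1
sso_account_id = 111111111111
sso_role_name  = DevOps
```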
"},{"location":"user-guide/troubleshooting/credentials/#configure-the-aws-cli-for-leverage","title":"Configure the AWS CLI for Leverage","text":"
These instructions can be used when you need to test your profiles with the AWS CLI, either to verify the profiles are properly set up or to validate the right permissions were granted.
Since Leverage stores the AWS config and credentials file under a non-default path, when using the AWS CLI you'll need to point it to the right locations:
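The pointers are the standard AWS CLI environment variables; assuming the project name is acme, that would look like:

```
export AWS_CONFIG_FILE="$HOME/.aws/acme/config"
export AWS_SHARED_CREDENTIALS_FILE="$HOME/.aws/acme/credentials"
```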
Get shell access to the Leverage Toolbox Docker Image
Another alternative, if you can't or don't want to install the AWS CLI on your machine, is to use the one included in the Leverage Toolbox Docker image. You can access it by running leverage tf shell
"},{"location":"user-guide/troubleshooting/credentials/#test-the-failing-profile-with-the-aws-cli","title":"Test the failing profile with the AWS CLI","text":"
Once you have narrowed down your investigation to a profile, what you can do is test it. For instance, let's assume that the suspect profile is le-shared-devops. You can run aws sts get-caller-identity --profile le-shared-devops to mimic the way AWS credentials are generated for Terraform; if that command succeeds, that's a good sign.
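If the profile is healthy, the output should identify the assumed role, roughly like the following (all values are illustrative):

```
$ aws sts get-caller-identity --profile le-shared-devops
{
    "UserId": "AROAXXXXXXXXXXXXXXXXX:botocore-session-1700000000",
    "Account": "111111111111",
    "Arn": "arn:aws:sts::111111111111:assumed-role/DevOps/botocore-session-1700000000"
}
```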
Note: if you use the AWS CLI installed in your host machine, you will need to configure the environment variables in the section \"Configure the AWS CLI for Leverage\" below.
AWS CLI Error Messages
The AWS CLI has been making great improvements to its error messages over time so it is important to pay attention to its output as it can reveal profiles that have been misconfigured with the wrong roles or missing entries.
"},{"location":"user-guide/troubleshooting/credentials/#regenerating-the-aws-config-or-credentials-files","title":"Regenerating the AWS config or credentials files","text":"
If you think your AWS config file has misconfigured or missing profile entries (which could happen due to manual editing of that file, or when AWS accounts have been added or removed), you can try regenerating it via the Leverage CLI. But before you do that, make sure you know which authentication method you are using: SSO or IAM.
When using IAM, regenerating your AWS config file can be achieved through the leverage credentials command. Check the command documentation here.
When using SSO, the command you need to run is leverage aws configure sso. Refer to that command's documentation for more details.
"},{"location":"user-guide/troubleshooting/credentials/#logging-out-of-your-sso-session","title":"Logging out of your SSO session","text":"
On rare occasions, when using SSO, we have received reports of strange behaviors while trying to run Terraform commands via the Leverage CLI. For instance, users would try to run a leverage tf init command but would get an error saying that their session is expired; so they would try to log in via leverage aws sso login as expected, which would proceed normally, and then they would try the init command again only to get the same error as before. In these cases, which we are still investigating as they are very hard to reproduce, what has worked for most users is to log out from the SSO session via leverage aws sso logout, also log out from the SSO session through the AWS console in your browser, then log back in via leverage aws sso login, and then try the init command again.
"},{"location":"user-guide/troubleshooting/general/","title":"Troubleshooting general issues","text":""},{"location":"user-guide/troubleshooting/general/#gathering-more-information","title":"Gathering more information","text":"
Trying to get as much information about the issue as possible is key when troubleshooting.
If the issue happens while you are working on a layer of the reference architecture and you are using Terraform, you can use the --verbose flag to try to get more information about the underlying issue. For instance, if the error shows up while running a Terraform plan command, you can enable a more verbose output like follows:
leverage --verbose tf plan\n
The --verbose flag can also be used when you are working with the Ansible Reference Architecture:
leverage --verbose run init\n
"},{"location":"user-guide/troubleshooting/general/#understanding-how-leverage-gets-the-aws-credentials-for-terraform-and-other-tools","title":"Understanding how Leverage gets the AWS credentials for Terraform and other tools","text":"
Firstly, you need to know that Terraform doesn't support AWS authentication methods that require user interaction. For instance, logging in via SSO or assuming roles that require MFA. That is why Leverage made the following two design decisions in that regard:
Configure Terraform to use AWS profiles via Terraform AWS provider and local AWS configuration files.
Leverage handles the user interactivity during the authentication phase in order to get the credentials that Terraform needs through AWS profiles.
So, Leverage runs simple bash scripts to deal with item 2 above, and then passes the execution flow to Terraform, which by then should have the AWS profiles ready to use and in the expected path.
"},{"location":"user-guide/troubleshooting/general/#where-are-those-aws-profiles-stored-again","title":"Where are those AWS profiles stored again?","text":"
They are stored in 2 files: config and credentials. By default, the AWS CLI will create those files under this path: ~/.aws/ but Leverage uses a slightly different convention, so they should actually be located in this path: ~/.aws/[project_name_here]/.
So, for instance, if your project name is acme, then said files should be found under: ~/.aws/acme/config and ~/.aws/acme/credentials.
If you get a reiterative dialog for confirmation while running a leverage terraform init :
Warning: the ECDSA host key for 'YYY' differs from the key for the IP address 'ZZZ.ZZZ.ZZZ.ZZZ'\nOffending key for IP in /root/.ssh/known_hosts:xyz\nMatching host key in /root/.ssh/known_hosts:xyw\nAre you sure you want to continue connecting (yes/no)?\n
You may have more than one key associated to the YYY host. Remove the old or incorrect one, and the dialog should stop."},{"location":"user-guide/troubleshooting/general/#leverage-cli-cant-find-the-docker-daemon","title":"Leverage CLI can't find the Docker daemon","text":"
The Leverage CLI talks to the Docker API which usually runs as a daemon on your machine. Here's an example of the error:
$ leverage tf shell\n[17:06:13.754] ERROR Docker daemon doesn't seem to be responding. Please check it is up and running correctly before re-running the command.\n
"},{"location":"user-guide/troubleshooting/general/#macos-after-docker-desktop-upgrade","title":"MacOS after Docker Desktop upgrade","text":"
We've seen this happen after a Docker Desktop upgrade. Defaults are changed and the Docker daemon no longer uses Unix sockets but TCP, or perhaps it does use Unix sockets but under a different path or user.
What has worked for us in order to fix the issue is to make sure the following setting is enabled:
Note: that setting can be accessed by clicking on the Docker Desktop icon tray, and then clicking on \"Settings...\". Then click on the \"Advanced\" tab to find the checkbox.
"},{"location":"user-guide/troubleshooting/general/#linux-and-docker-in-rootless-mode","title":"Linux and Docker in Rootless mode","text":"
The same problem might come from a missing DOCKER_HOST environment variable. leverage looks for the Docker socket at unix:///var/run/docker.sock unless DOCKER_HOST is provided in the environment. If you installed Docker in Rootless mode, you need to remember to add DOCKER_HOST to your rc files:
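The line that usually goes in the rc file (e.g. ~/.bashrc or ~/.zshrc) looks roughly like this; the exact socket path depends on your rootless Docker installation:

```
# Rootless Docker exposes its socket under the user's runtime dir,
# typically /run/user/<uid>/docker.sock
export DOCKER_HOST="unix://${XDG_RUNTIME_DIR}/docker.sock"
```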
"},{"location":"user-guide/troubleshooting/general/#leverage-cli-fails-to-mount-the-ssh-directory","title":"Leverage CLI fails to mount the SSH directory","text":"
The Leverage CLI mounts the ~/.ssh directory in order to make the pulling of private Terraform modules work. The error should look similar to the following:
[18:26:44.416] ERROR Error creating container:\n APIError: 400 Client Error for http+docker://localhost/v1.43/containers/create: Bad Request (\"invalid mount config for type \"bind\": stat /host_mnt/private/tmp/com.apple.launchd.CWrsoki5yP/Listeners: operation not supported\")\n
The problem happens because of the file system virtualization that is used by default, and can be fixed by choosing the "osxfs (Legacy)" option as shown below:
Note: that setting can be accessed by clicking on the Docker Desktop icon tray, and then clicking on \"Settings...\". The setting should be in the \"General\" tab.
"},{"location":"work-with-us/","title":"Work with us","text":""},{"location":"work-with-us/#customers-collaboration-methodology","title":"Customers collaboration methodology","text":"
What are all the steps of an engagement
1st Stage: Leverage Customer Tech Intro Interview
Complete our binbash Leverage project evaluation form so we can get to know your project, find out if you're a good fit and get in contact with you.
Schedule a tech intro interview meeting to understand which are your exact challenges and do a Leverage Reference Architecture feasibility assessment.
2nd Stage: Leverage Reference Architecture Review
If we can contribute, we'll execute a Mutual NDA (ours or yours), then walk you through completing our binbash Leverage due diligence for Reference Architecture form.
Once we completely understand your requirements we'll prepare a comprehensive proposal including the complete \"Leverage Implementation Action Plan Roadmap\" (also known as Statement of Work - SOW) detailing every task for the entire project.
After you review it and we agree on the general scope, a Services Agreement (SA) is signed.
The Roadmap (SOW) is executed, we'll send an invoice for the deposit, and the first Sprint starts.
4th Stage: binbash Leverage Support
During and after finishing the complete Roadmap we'll provide commercial support, maintenance and upgrades for our work over the long term.
"},{"location":"work-with-us/#work-methodology-intro-video","title":"Work methodology intro video","text":""},{"location":"work-with-us/#customer-support-workflow","title":"Customer Support workflow","text":""},{"location":"work-with-us/#read-more","title":"Read More","text":"
Related articles
FAQs | Agreement and statement of work
"},{"location":"work-with-us/careers/","title":"Careers","text":""},{"location":"work-with-us/careers/#how-we-work","title":"How we work","text":"
binbash work culture
Fully Remote
binbash was founded as a remote-first company. That means you can always work from home, a co-working place, a nice cafe, or wherever else you feel comfortable, and you'll have almost complete control over your working hours. Why "almost"? Because depending on the current projects we'll require a few hours of overlap between all Leverage collaborators for some specific meetings or shared sessions (pair-programming).
Distributed Team
Although our collaborators are currently located in \ud83c\udde6\ud83c\uddf7 Argentina, \ud83c\udde7\ud83c\uddf7 Brazil and \ud83c\uddfa\ud83c\uddfe Uruguay, we are hiring from most countries in the time zones between GMT-7 (e.g., California, USA) and GMT+2 (e.g., Berlin, Germany).
We promote life-work balance
Job burnout is an epidemic \ud83d\ude46, and we tech workers are especially at risk. So we'll do our best to de-stress our workforce at binbash. In order to achieve this we offer:
Remote work that lets you control your hours and physical location.
Normal working hours (prime time 9am-5pm GMT-3), on average no more than ~30-40 hours per week, and we don't work during weekends or your country of residence's national holidays.
Project management and planning that will take into consideration the time zone of all our team members.
A flexible vacation policy where you can take 4 weeks per year away from the keyboard. If more time is needed we can always try to arrange it for you.
No ON-CALL rotation. We only offer support contracts with response SLAs covering prime-time business hours on working days exclusively.
You will take on big challenges, but the hours are reasonable.
Everyone is treated fairly and with respect, and disagreement and feedback are always welcome.
A workplace that is welcoming, safe, and inclusive for people of all cultures, genders, and races.
Create a collection of reusable, tested, production-ready E2E AWS-oriented infrastructure modules (e.g., VPC, IAM, Kubernetes, Prometheus, Grafana, EFK, Consul, Vault, Jenkins, etc.) using several tools and languages: Terraform, Ansible, Helm, Dockerfiles, Python, Bash and Makefiles.
Reference Architecture
Improve, maintain, extend and update our reference architecture, which has been designed under optimal configs for the needs of the most popular modern web and mobile applications. Its design is fully based on the AWS Well Architected Framework.
Open Source & Leverage DevOps Tools
Contribute to our open source projects to continue building a fundamentally better DevOps experience, including our open source modules, the Leverage Python CLI and our Makefiles library, among others.
Document team knowledge
Capture siloed, not-yet-documented knowledge and extend the Leverage documentation, for example by creating knowledge base articles, runbooks, and other documentation for the internal team as well as binbash Leverage customers.
Customer engineering support
While participating in business-hours-only support rotations, collaborate on customer requests, teach binbash Leverage and DevOps best practices, help resolve problems, escalate to internal SMEs, and automate and document the solutions so that problems are mitigated for future scenarios and users.
Role scope and extra points!
Responsible for the development, maintenance, support and delivery of binbash Leverage Products.
Client side Leverage Reference Architecture solutions implementation, maintenance and support.
Client side cloud solutions & tech management (service delivery and project task management).
Bring Leverage recommendations for re-engineering, report bug fixes (issues) and propose improvements based on real-scenario implementations.
Mentoring, knowledge transfer (KT), PRs and team tech follow-up, both internally and customer-facing.
binbash is a small, distributed startup, so things are changing all the time, and from time to time we all wear many hats. You should expect to write a lot of code, but, depending on your interests, there will also be a lot of opportunities to write blog posts, give talks, contribute to open source, go to conferences, talk with customers, do sales calls, think through financial questions, interview candidates, mentor new hires, design products, come up with marketing ideas, discuss strategy, consider legal questions, and all the other tasks that are part of working at a small company.
Nice to have background
You hate repeating and doing the same thing twice and would rather spend the time to automate a problem away than do the same task again.
You have strong English communication skills and are comfortable engaging with external customers.
You know how to write code across the stack (\u201cDev\u201d) and feel very comfortable with Infra as Code (\"IaC\").
You have experience running production software environments (\"Ops\").
You have a strong background in software engineering and understanding of CI/CD (or you are working hard on it!).
You have a passion for learning new technologies, tools and programming languages.
Bonus points for a sense of humor, empathy, autonomy and curiosity.
Note that although we care about prior experience with AWS, Linux and Terraform, we're more concerned with curiosity about all areas of the Leverage stack and a demonstrated ability to learn quickly and go deep when necessary.
"},{"location":"work-with-us/contribute/","title":"Contribute and Developing binbash Leverage","text":"
This document explains how to get started with developing for the Leverage Reference Architecture. It includes how to build, test, and release new versions.
"},{"location":"work-with-us/contribute/#quick-start","title":"Quick Start","text":""},{"location":"work-with-us/contribute/#getting-the-code","title":"Getting the code","text":"
The code must be checked out from this same github.com repo inside the binbash Leverage Github Organization.
Leverage is mainly oriented to Latam, North American and European startups' CTOs, VPEs, Engineering Managers and/or team leads (Software Architects / DevOps Engineers / Cloud Solutions Architects) looking to rapidly set up and host their modern web and mobile applications and systems in Amazon Web Services (\u2705 typically in just a few weeks!).
Oriented to Development leads or teams looking to solve their current AWS infrastructure and software delivery business needs in a secure and reliable manner, under the most modern best practices.
Your entire AWS Cloud solution based on DevOps practices will be achieved:
Moreover, if you are looking to have complete control of the source code, and of course be able to run it without us, such as building new Development environments and supporting your Production Cloud environments, you're a great fit for the Leverage AWS Cloud Solutions Reference Architecture model.
And remember you could implement yourself or we could implement it for you! \ud83d\udcaa
"},{"location":"work-with-us/faqs/#agreement-and-statement-of-work","title":"Agreement and statement of work","text":""},{"location":"work-with-us/faqs/#project-kick-off","title":"Project Kick-Off","text":"
Project Kick-Off
Once the agreement contract and NDA are signed we estimate 15 days to have the team ready to start the project following the proposed Roadmap (\u201cStatement of work\u201d) that describes at length exactly what you'll receive.
"},{"location":"work-with-us/faqs/#assignments-and-delivery","title":"Assignments and Delivery","text":"
Assignments and Delivery
After gathering all the customer project requirements and specifications we'll adjust the Reference Architecture based on your needs. As a result we'll develop and present the Leverage Reference Architecture for AWS implementation Roadmap.
A typical Roadmap (\u201cStatement of Work\u201d) includes a set number of Iterations (sprints). We try to keep a narrow scope for each Iteration so that we can tightly control how hours get spent and avoid overruns. We typically avoid adding tasks to a running Iteration so that the scope does not grow. That's also why we have an allocation for two specific long-lived tasks:
General-Task-1: DevOps and Solutions Architecture challenge, definitions, tasks (PM), reviews, issues and audit.
General-Task-2: WEEKLY FOLLOW-UP Meeting.
These cover work that falls outside of the current Iteration's specific tasks: special requests, meetings, pair-programming sessions, extra documentation, etc.
binbash will participate in and review the planned tasks along with the customer:
planned roadmap features
bug fixes
Implementation support
Using the relevant ticketing system (Jira) to prioritize and plan the corresponding work.
"},{"location":"work-with-us/faqs/#reports-and-invoicing","title":"Reports and Invoicing","text":"
Reports and Invoicing
Weekly task reports and task management agile metrics. We use Toggl to track all our time by client, project, sprint, and developer. We then import these hours into QuickBooks for invoicing.
"},{"location":"work-with-us/faqs/#rates-and-billing","title":"Rates and Billing","text":"
Rates and pricing plans
Pre-paid package subscriptions: A number of prepaid hours is agreed according to the needs of the project. It could be a \"Basic Plan\" of 40 hours per month, or a \"Premium Plan\" of 80 hours per month (if more hours are needed, this can be reviewed). When buying in bulk there is a discount on the hourly rate. When you pay for the package, the hours are discounted from the total as they are used, and if there are unused hours left, a maximum of 20% can be carried over to the next month.
On-demand Business Subscription: Hours are tracked each month as planned tasks are requested, and the total hours spent are reported monthly. There is a minimum of 40 hours per month, and the maximum estimated effort for support tasks should be between 80 and 120 hours per month.
Billing
The Customer will be billed every month. Invoices are due within 15 days of issue. We accept payments via US Bank ACH, Bill.com, and Payoneer. Rates include all applicable taxes and duties as required by law.
Please create a Github Issue to get immediate support from the binbash Leverage Team
"},{"location":"work-with-us/support/#our-engineering-support-team","title":"Our Engineering & Support Team","text":""},{"location":"work-with-us/support/#aws-well-architected-review","title":"AWS Well Architected Review","text":"
Feel free to contact us for an AWS Well Architected Framework Review
Well Architected Framework Review Reference Study Case
binbash Leverage\u2122 and its components intend to be backward compatible, but due to the complex ecosystem of tools we manage this is not always possible.
We always recommend using the latest version of the Leverage CLI with the latest versions of the Reference Architecture for AWS. In case that's not possible, we recommend pinning versions to favor stability and doing controlled updates component by component, based on the compatibility matrix table presented below.
If you need to know which Leverage CLI versions are compatible with which Leverage Toolbox Docker Images please refer to the Release Notes. Just look for the section called \"Version Compatibility\". Bear in mind though that, at the moment, we do not include a full compatibility table there, but at least you should be able to find out which Toolbox Image was used for a given release.
If you are looking for the versions of the software included in the Toolbox Docker Image then go to the release notes of that repo instead.
This project does not follow Terraform's or any other tool's release schedule. Leverage aims to provide a reliable deployment and operations experience for the binbash Leverage\u2122 Reference Architecture for AWS, and typically releases about a quarter after the corresponding Terraform release. This time allows the Terraform project to resolve any issues introduced by the new version and ensures that we can support the latest features.
"},{"location":"work-with-us/roadmap/ref-arch/cost-optimization/","title":"Cost Optimization Roadmap","text":""},{"location":"work-with-us/roadmap/ref-arch/cost-optimization/#features-functionalities","title":"Features / Functionalities \ud83d\udcb0\ud83d\udcca\ud83d\udcc9","text":"Category Tags / Labels Feature / Functionality Status Doc CostOptimization(FinOps) leveragecloud-solutions-architecturedocumentation Calculate Cloud provider costs (Cost optimization focus!) \u2705 \u274c CostOptimization(FinOps) leveragecost-optimizationbilling AWS billing alarms + AWS Budget (forecasted account cost / RI Coverage) Notifications to Slack \u2705 \u274c CostOptimization(FinOps) leveragecost-optimizationcost Activate AWS Trusted Advisor cost related results \u2705 \u274c CostOptimization(FinOps) leveragecost-optimizationlambda-nuke Setup Lambda nuke to automatically clean up AWS account resources \u2705 \u274c CostOptimization(FinOps) leveragecost-optimizationlambda-scheduler Setup lambda scheduler for stop and start resources on AWS (EC2, ASG & RDS) \u2705 \u274c"},{"location":"work-with-us/roadmap/ref-arch/demo-apps/","title":"Demo Applications Roadmap","text":""},{"location":"work-with-us/roadmap/ref-arch/demo-apps/#features-functionalities","title":"Features / Functionalities \ud83d\udc68\u200d\ud83d\udcbb\ud83c\udccf\ud83d\udd79\ud83c\udfaf","text":"Category Tags / Labels Feature / Functionality Status Doc CI/CD Pipelineautomation& imple leverageci-cd-pipelinedockerbuild FrontEnd Build (Demo App): set up ECR, create IAM permissions, create pipelines (Jenkins / DroneCI), set up GitHub triggers 2021 Q2 \u274c CI/CD Pipelineautomation& imple leverageci-cd-pipelinedeploy FrontEnd Deploy (Demo App): create pipelines (Jenkins / Spinnaker), set up ECR/Github triggers 2021 Q2 \u274c CI/CD Pipelineautomation& imple leverageci-cd-pipelinedockerbuild BackEnd Build (Demo App): set up ECR, create IAM permissions, create pipelines (Jenkins / DroneCI), set up GitHub triggers 2021 Q2 \u274c CI/CD Pipelineautomation& imple leverageci-cd-pipelinedeploy BackEnd Deploy (Demo App): create pipelines (Jenkins / Spinnaker), set up ECR triggers 2021 Q2 \u274c Testing (QA) leveragetestingci-cd-pipeline Unit Testing (Demo App): Dev team needs this to run on a Jenkins/CircleCI/DroneCI/Spinnaker pipeline 2021 Q2 \u274c Testing (QA) leveragetestingci-cd-pipeline Integration Testing (Demo App): QA team needs automation to run on a Jenkins/Spinnaker pipeline for AWS Cloud QA / Stage envs. 2021 Q2 \u274c Testing (QA) leveragetestingci-cd-pipeline E2E Functional / Aceptannce (Demo App): QA team needs Smoke tests automation to run on a Jenkins/Spinnaker pipeline for AWS Cloud Stage / Prod envs. 2021 Q2 \u274c Testing (QA) leveragetestingci-cd-pipeline Static Analysis (Demo App): code complexity, dependency graph, code frequency, contributors, code activity, and so on. 2021 Q2 \u274c CI/CD Pipelineautomation& imple leverageci-cd-pipelinekubernetespbe Push Button Environments (Demo App): implement ephemeral environments. 
2021 Q2 \u274c"},{"location":"work-with-us/roadmap/ref-arch/operational-excellence/","title":"Operational Excellence Roadmap","text":""},{"location":"work-with-us/roadmap/ref-arch/operational-excellence/#features-functionalities","title":"Features / Functionalities \ud83d\udc68\u200d\ud83d\udcbb \ud83d\udcaf\ud83e\udd47","text":"Category Tags / Labels Feature / Functionality Status Doc CloudSolutionsArchitecture leveragecloud-solutions-architecturedocumentation DevSecOps & AWS Cloud Solutions Architecture Doc \u2705 \u2705 CloudSolutionsArchitecture leveragecloud-solutions-architecturedocumentation Demo Applications architecture / Services Specifications Doc 2021 Q1 \u274c BaseInfrastructure leveragebase-infrastructuregithub Open Source Ref Architecture (le-tf-aws / le-ansible / le-tf-vault / le-tf-github) 2021 Q2 \u274c BaseInfrastructure leveragebase-infrastructurecli Leverage CLI (https://github.com/binbashar/leverage) for every Reference Architecture Repo (le-tf-aws / le-ansible / le-tf-vault / le-tf-github) 2021 Q2 \u274c BaseInfrastructure leveragebase-infrastructureorganizations Account Settings: Account Aliases and Password Policies, MFA, and enable IAM Access Analyzer across accounts. \u2705 \u274c BaseInfrastructure leveragebase-infrastructurestorage Storage: Account Enable encrypted EBS by default on all accounts; disable S3 public ACLs and policies \u2705 \u274c BaseInfrastructure leveragebase-infrastructureregion Define AWS Region / Multi-Region: keep in mind customers proximity, number of subnets, and other region limitations (https://infrastructure.aws) \u2705 \u274c BaseInfrastructure leveragebase-infrastructurevcs Terraform Github Ref Architecture / Pre-requisites: permissions to set up webhooks, create/configure repositories, create groups (Preferred SSO tool) 2021 Q2 \u274c BaseInfrastructure leveragebase-infrastructureorganizations AWS Organizations: development/stage, production, shared, security, legacy \u2705 \u2705 BaseInfrastructure leveragebase-infrastructureiam IAM: initial accounts (security users, groups, policies, roles; shared/appdevtsg/appprd DevOps role) \u2705 \u2705 BaseInfrastructure leveragebase-infrastructurevpc Networking 1: DNS, VPC, Subnets, Route Tables, NACLs, NATGW, VPC Peering or TGW \u2705 \u274c BaseInfrastructure leveragebase-infrastructurevpn Networking 2: VPN (install Pritunl, create organization, servers and users) \u2705 \u274c Kubernetes leveragekuberneteseks Production Grade Cluster: deploy EKS cluster as code \u2705 \u274c Kubernetes leveragekubernetesk8s K8s Helm + Terraform binbash Leverage repository backing all the K8s components deployment and configuration \u2705 \u274c Kubernetes leveragekubernetesmetrics Monitoring: metrics-server (metrcis for K8s HPA + Cluster AutoScaler + Prom node Exporter) + kube-state-metrics (for Grafana Dasboards) 2021 Q2 \u274c Kubernetes leveragekubernetesiamsecurity Security: Iam-authenticator, K8s RBAC (user, group and roles) \u2705 \u274c Kubernetes leveragekubernetesiam Implement AWS service accounts (IRSA for EKS) to provide IAM credentials to containers running inside a kubernetes cluster based on annotations. 
\u2705 \u274c Kubernetes leveragekubernetesdashboard Monitoring: K8s dashboard & Weave Scope \u2705 \u274c Kubernetes leveragekubernetesingress Ingress: review, analyze and implement (alb skipper, k8s nginx, alb sigs, etc) \u2705 \u274c Kubernetes leveragekubernetesingress Load Balancing: review, analyze and implement Ingress w/ LB (AWS ALB or NLB + access logs) \u2705 \u274c Kubernetes leveragekubernetesdns Implement external-dns w/ annotations for K8s deployed Apps (https://github.com/kubernetes-sigs/external-dns) \u2705 \u274c Kubernetes leveragekubernetesservices-discovery Service Discovery: review, analyze and implement k8s native [env vars & core-dns] or Consul 2021 Q3 \u274c Kubernetes leveragekubernetesservice-meshlinkerd Service Mesh: review, analyze and implement consul or linkerd2. 2021 Q3 \u274c CI/CDInfrastructure leverageci-cd-infrastructurejenkins Jenkins: installation, configuration, GitHub/GSuite/Bitbucket SSO-Auth integration \u2705 \u274c CI/CDInfrastructure leverageci-cd-infrastructurespinnaker Deployments / Jenkins or Tekton Pipelines + Argo-CD: installation, configuration, Github integration 2021 Q3 \u274c CI/CDInfrastructure leverageci-cd-infrastructuredroneci DroneCI: installation, configuration, Github integration 2021 Q4 \u274c CI/CDInfrastructure leverageci-cd-infrastructurewebhook Proxy Instance (webhooks) : installation, configuration, GitHub integration 2021 Q4 \u274c CI/CDInfrastructure leverageci-cd-infrastructureqa SonarQube: installation, configuration, GitHub/GSuite/Bitbucket SSO-Auth integration 2021 Q4 \u274c ApplicationsInfrastructure leverageapps-infrastructuredockercontainers Automate and containerized app environments by using docker images, enabling consistent experience in local environment and dev/stage/prod Cloud environments. \u2705 \u274c ApplicationsInfrastructure leverageapps-infrastructuredockercontainers Automate and containerized app environments by using docker images, enabling consistent experience in local environment and dev/stage/prod Cloud environments. \u2705 \u274c ApplicationsInfrastructure leverageapps-infrastructuredatabaserds Databases: RDS (most likely AWS Aurora MySql, single db for all microservices at first - Prod dedicated instance considering new auto-scaling feature and read-replicas) + RDS Proxy (if needed for high Cx N\u00b0) - Compliance: Consider using SSL/TLS to Encrypt a Connection to a DB Instance \u2705 \u274c ApplicationsInfrastructure leverageapps-infrastructurequeuesqs Queues: SQS (recommended for background workers and some microservices). Redis (AWS ElasticCache) / RabbitMQ (K8s Containerzied). \u2705 \u274c ApplicationsInfrastructure leverageapps-infrastructurestorages3 Storage: S3 (for the FrontEnd statics) \u2705 \u274c ApplicationsInfrastructure leverageapps-infrastructurecloudfrontcdn Caching: CloudFront (for the FrontEnd) w/ access logs \u2705 \u274c ApplicationsInfrastructure leverageapps-infrastructurecacheredis CacheLayer: AWS Elasticache (Memcache or Redis) \u2705 \u274c"},{"location":"work-with-us/roadmap/ref-arch/overview/","title":"Roadmap","text":"
Leverage AWS Cloud Solutions Reference Architecture Features / Functionalities per category
Operational Excellence
Reliability & Performance
Security
Cost Optimization
Demo Applications
"},{"location":"work-with-us/roadmap/ref-arch/reliability-performance/","title":"Reliability Performance Roadmap","text":""},{"location":"work-with-us/roadmap/ref-arch/reliability-performance/#features-functionalities","title":"Features / Functionalities \ud83d\ude80\u23f2\ud83d\udcca","text":"Category Tags / Labels Feature / Functionality Status Doc MonitoringMetrics& Alerting leveragemonitoring-metrics-alertingprometheusgrafana Metrics: install and configure Prometheus (NodeExporter for EC2 / BlackBox exporter / Alert Monitroing), install and configure Grafana (K8s Plugin + Prometheus int + CloudWatch int) \u2705 \u274c MonitoringMetrics& Alerting leveragemonitoring-metrics-alertinggrafanacloudwatch Metrics: Grafana + AWS Cloudwatch integrations config (https://github.com/monitoringartist/grafana-aws-cloudwatch-dashboards) 2021 Q2 \u274c MonitoringMetrics& Alerting leveragemonitoring-metrics-alertingapm APM: review, analyze and implement (New Relic, DataDog, ElasticAPM Agent/Server) 2021 Q2 \u274c MonitoringMetrics& Alerting leveragemonitoring-metrics-alertingdocumentation Define and document reference notification/escalation procedure \u2705 \u274c MonitoringMetrics& Alerting leveragemonitoring-metrics-alerting Alerting: configure AlertsManager, Elastalert (optimized logs rotation when using it from docker image), PagerDuty, Slack according to the procedure above 2021 Q2 \u274c MonitoringMetrics& Alerting leveragemonitoring-metrics-alertingprometheus Monitor Infra Tool Instances (WebHook Proxy, Jenkins, Vault, Pritunl, Prometheus, Grafana, etc) / implement monitoring via Prometheus + Grafana or Another Solution \u2705 \u274c MonitoringDistributedTracing leveragemonitoring-tracingjaeger Distributed Tracing Instrumentation: review, analyze and implement to detect and improve transactions performance and svs dep analysis (jaeger, instana, lightstep, AWS X-Ray, etc) 2021 Q3 \u274c MonitoringLogging leveragemonitoring-logsefk Logging / EFK - use separate indexes per K8s components & apps/svc for each custer/env (segregating dev/stg from prd) + enable ES monitoring w/ X-Pack + configure curator to rotate indices + tool to improve index mgmt 2021 Q2 \u274c Performance& Optimization leverageperformance-optimizationci-cd-pipeline Load Testing: set up and run continuous load tests pipelines (Jenkins) to determine and improve apps/services capacity through time (apapche ab, gatling, iperf, locust, taurus, BlazeMeter and https://github.com/loadimpact/k6) 2021 Q3 \u274c Performance& Optimization leverageperformance-optimizationci-cd-pipeline Performance Testing (stress, soak, spike, etc): set up and run continuous performance tests pipelines (Jenkins) to measure performance through time (apapche ab, gatling, iperf, locust, taurus and BlazeMeter) 2021 Q3 \u274c Performance& Optimization leverageperformance-optimizationkubernetes Tune K8S nodes (EC2 family type, size and AWS ASG -> K8s HPA + Cluster AutoScaler ) 2021 Q3 \u274c Performance& Optimization leverageperformance-optimizationkubernetes Tune K8S requests and limits per namespace (CPU and RAM) / https://github.com/FairwindsOps/goldilocks 2021 Q2 \u274c Performance& Optimization leverageperformance-optimizations3 S3: ensure each bucket is using the proper storage types and persistence (automate mv these objs into lower $ storage tier w/ Life Cycle Policies or w/ S3 Intelligent-Tiering) \u2705 \u274c DisasterRecovery leveragedisaster-recoverybackup AWS Backup Service: RDS, EC2 (AMI), EBS, Dynamo, EFS, SFx, Storage Gw \u2705 \u274c 
DisasterRecovery leveragedisaster-recoverybackup Replication: S3 (CRR cross-region replication or SRR same-region replication) \u2705 \u274c DisasterRecovery leveragedisaster-recoverybackup Replication: VPC / Compute / Database (CRR cross-region replication) \u2705 \u274c DisasterRecovery leveragedisaster-recoverybackupkubernetes Backup and migrate Kubernetes applications and their persistent volumes w/ https://velero.io/ 2021 Q3 \u274c DisasterRecovery leveragedocumentationdisaster-recovery Review: Disaster recovery plan, missing resources, RTO / RPO, level of automation 2021 Q4 \u274c DisasterRecovery leveragedocumentationdisaster-recovery Improve Plan: create a plan to improve the existing recovery plan and determine implementation phases 2021 Q4 \u274c DisasterRecovery leveragedocumentationdisaster-recovery Execute Plan: implement according to the plan, review/measure and iterate 2021 Q4 \u274c"},{"location":"work-with-us/roadmap/ref-arch/security/","title":"Security Roadmap","text":""},{"location":"work-with-us/roadmap/ref-arch/security/#features-functionalities","title":"Features / Functionalities \ud83d\udd10\u2705","text":"Category Tags / Labels Feature / Functionality Status Doc Security &Audit (SecOps) leveragesecurity-auditpasswords Team Password Management: review, analyze and implement (passbolt, bitwarden, 1password, etc) \u2705 \u274c Security &Audit (SecOps) leverageci-cd-infrastructuresecrets Secrets Management: review, analyze and implement Hashicorp vault 2021 Q1 \u274c Compliance(SecOps) leveragesecrets aws-vault implementation 2021 Q1 \u274c Security &Audit (SecOps) leveragesecurity-auditguardduty AWS Guarduty (Cross-Org with Master and member accounts setup + Trusted IP Lists and Threat IP Lists / Creation + Deletion of Filters for your GuardDuty findings to avoid false possitives + CloudWatch Rule to Lambda/ Cw-Metrics w/ CloudWatch Dashboard) \u2705 \u274c Security &Audit (SecOps) leveragesecurity-auditinspector AWS Inspector (w/ Ansible aws-inpector galaxy role per EC2) 2021 Q3 \u274c Security &Audit (SecOps) leveragesecurity-auditcloudtrail AWS CloudTrail w/ CloudWatch Dashboard + Alarms (include RootLogin) to Slack \u2705 \u274c Security &Audit (SecOps) leveragesecurity-auditfirewall AWS Firewall Manager (cross-org WAF + Shield integrated with ALBs, CloudFront and/or API-GW + Cross-org Sec group audit) \u2705 \u274c Security &Audit (SecOps) leveragesecurity-auditvpc AWS VPC Flow Logs \u2705 \u274c Security &Audit (SecOps) leveragesecurity-audit ScoutSuite / Prowler: set up continuous, automated reports for each account (Evaluate the use of CloudMapper) 2021 Q2 \u274c Security &Audit (SecOps) leveragesecurity-auditusers Infra DevOps Tools OS Layer ( OS security updates and patches, root user config, ssh port, fail2ban ) \u2705 \u274c Compliance(SecOps) leveragesecurity-auditcompliance AWS Config: implement audit controls (evaluate automatic remediation if applicable) \u2705 \u274c Compliance(SecOps) leveragesecurity-auditcompliance AWS Security Hub: implement audit controls 2021 Q3 \u274c Compliance(SecOps) leveragesecurity-auditcompliance AWS Trusted Advisor: Review automated Costs Optimization, Performance, Security, Fault Tolerance and Service Limits audit results. \u2705 \u274c Compliance(SecOps) leveragesecurity-auditcompliancekubernetes Kubernetes Audit: implement on the clusters: KubeAudit, Kube-Bench, Kube-Hunter and Starboard. 
2021 Q2 \u274c Security &Audit (SecOps) leveragesecurity-auditci-cd-pipeline Security and Vulnerability static code analysis (code dependencies): implement tools to continuously analyze and report vulnerabilities, automated reports (OWASP, bandit, snyk, HawkEye scanner, yarn audit, etc) 2021 Q2 \u274c Security &Audit (SecOps) leveragesecurity-auditdocker Containers: implement tools to continuously analyze and report on vulnerabilities (docker-bench-security, snyk, aquasecurity microscanner, docker-bench, aws ecr scan) \u2705 \u274c Security &Audit (SecOps) leveragesecurity-audit Review and Fix all snyk high sev findings 2021 Q2 \u274c Security &Audit (SecOps) leveragesecurity-audit Security and cost analysis in the CI PR automated process (le-tf-aws / le-ansible / le-tf-vault / le-tf-github) 2021 Q1 \u274c Security &Audit (SecOps) leveragesecurity-audit Comply with AWS Security Maturity Roadmap 2021 2021 Q2 \u274c Compliance(SecOps) leveragesecurity-auditcompliance Certified compliant by the Center for Internet Security (CIS)end-to-end CIS-compliant Reference Architecture (get compliance out of the box) 2021 Q2 \u274c Security &Audit (SecOps) leveragesecurity-auditdashboard Centralized DevSecOps Tools and Audit Report Dashboard 2021 Q3 \u274c"}]}
{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"license/","title":"License","text":"
MIT License
Copyright \u00a9 2017 - 2020 binbash
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Welcome to Leverage's documentation! Here you will find the concepts you need to understand to work with our stack, the steps to try Leverage by yourself, and the extensive documentation about every aspect of our solution.
Now that you know the basic concepts about Leverage feel free to give it a try or check out the User Guide section to go deeper into the implementation details. Links down below:
Leverage was built around the AWS Well Architected Framework and it uses a stack that includes Terraform, Ansible, Helm and other tools.
We are also adopters and supporters of Kubernetes and the Cloud Native movement, which should become self-evident as you keep exploring our technology stack.
"},{"location":"concepts/our-tech-stack/#why-did-we-choose-our-tech-stack","title":"Why did we choose our tech stack?","text":"Why AWS\u2753
Amazon Web Services (AWS) is the world\u2019s most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers\u2014including the fastest-growing startups, largest enterprises, and leading government agencies\u2014are using AWS to lower costs, become more agile, and innovate faster.
Build, Deploy, and Manage Websites, Apps or Processes On AWS' Secure, Reliable Network. AWS is Secure, Reliable, Scalable Services. HIPAA Compliant. Easily Manage Clusters. Global Infrastructure. Highly Scalable.
Read More: What is AWS
Why WAF (Well Architected Framework)\u2753
AWS Well-Architected helps cloud architects to build secure, high-performing, resilient, and efficient infrastructure for their applications and workloads. Based on five pillars \u2014 operational excellence, security, reliability, performance efficiency, and cost optimization \u2014 AWS Well-Architected provides a consistent approach for customers and partners to evaluate architectures, and implement designs that can scale over time.
Read More: AWS Well-architected
Why Infra as Code (IaC) & Terraform\u2753
Confidence: A change breaks the env? Just roll it back. Still not working? Build a whole new env with a few keystrokes. IaC enables this.
Repeatability: Allows your infra to be automatically instantiated, making it easy to build multiple identical envs.
Troubleshooting: Check source control and see exactly what changed in the env. As long as you are diligent and don\u2019t make manual envs changes, then IaC can be a game changer.
DR: Require the ability to set up an alternate env in a different DC or Region. IaC makes this a much more manageable prospect.
Auditability: You will need to be able to audit both changes and access to an env; IaC gives you this right out of the box.
Visibility: As an env expands over time, it is challenging to tell what has been provisioned. In the #cloud this can be a huge #cost issue. IaC allows tracking your resources.
Portability: Some IaC techs are #multicloud. Also, translating #Terraform from one cloud provider to another is considerably simpler than recreating your entire envs in a cloud-specific tool.
Security: Seeing the history of changes to your SG rules along with commit messages can do wonders for being confident about the security configs of your envs.
Terraform allows you to codify your application infrastructure, reduce human error and increase automation by provisioning infrastructure as code. With TF we can manage infrastructure across clouds and provision infrastructure across 300+ public clouds and services using a single workflow. Moreover, it helps create reproducible infrastructure and provision consistent testing, staging, and production environments with the same configuration.
Terraform has everything we expect from an IaC framework: it is an open source, cloud-agnostic provisioning tool that supports immutable infrastructure, a declarative language, and a client-only architecture.
Read More
Why Infrastructure as Code
Why Terraform by Gruntwork
Why Organizations\u2753
AWS Organizations helps you centrally manage and govern your environment as you grow and scale your AWS resources. Using AWS Organizations, you can programmatically create new AWS accounts and allocate resources, group accounts to organize your workflows, apply policies to accounts or groups for governance, and simplify billing by using a single payment method for all of your accounts.
Read More
How it works: AWS Organizations
AWS Organizations
Why IAM and roles\u2753
AWS Identity and Access Management (IAM) enables you to manage access to AWS services and resources securely. Using IAM, you can create and manage AWS users and groups, and use permissions to allow and deny their access to AWS resources.
Integration and Fine-grained access control with almost every AWS service and its resources.
Multi-factor authentication for highly privileged users.
Raise your security posture with AWS infrastructure and services. Using AWS, you will gain the control and confidence you need to securely run your business with the most flexible and secure cloud computing environment available today. As an AWS customer, you will benefit from AWS data centers and a network architected to protect your information, identities, applications, and devices. With AWS, you can improve your ability to meet core security and compliance requirements, such as data locality, protection, and confidentiality with our comprehensive services and features.
Read More
How it works: AWS Security
AWS Cloud Security
Why VPC\u2753
Amazon Virtual Private Cloud (Amazon VPC) is a service that lets you launch AWS resources in a logically isolated virtual network that you define. You have complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways. You can use both IPv4 and IPv6 for most resources in your virtual private cloud, helping to ensure secure and easy access to resources and applications.
Read More
How it works: AWS Networking
AWS Virtual Private Cloud
Why Kubernetes (K8s) & AWS EKS\u2753
Kubernetes, also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery. Kubernetes builds upon 15 years of experience of running production workloads at Google, combined with best-of-breed ideas and practices from the community.
Amazon Elastic Kubernetes Service (Amazon EKS) gives you the flexibility to start, run, and scale Kubernetes applications in the AWS cloud or on-premises. Amazon EKS helps you provide highly-available and secure clusters and automates key tasks such as patching, node provisioning, and updates. Customers such as Intel, Snap, Intuit, GoDaddy, and Autodesk trust EKS to run their most sensitive and mission critical applications.
EKS runs upstream Kubernetes and is certified Kubernetes conformant for a predictable experience. You can easily migrate any standard Kubernetes application to EKS without needing to refactor your code.
Read More
How it works: AWS EKS
AWS EKS
Kubernetes
Why S3\u2753
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as data lakes, websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. Amazon S3 provides easy-to-use management features so you can organize your data and configure finely-tuned access controls to meet your specific business, organizational, and compliance requirements. Amazon S3 is designed for 99.999999999% (11 9's) of durability, and stores data for millions of applications for companies all around the world.
Read More
How it works: AWS Storage
AWS S3
Why RDS\u2753
Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups. It frees you to focus on your applications so you can give them the fast performance, high availability, security and compatibility they need.
Amazon RDS is available on several database instance types - optimized for memory, performance or I/O - and provides you with six familiar database engines to choose from, including Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server. You can use the AWS Database Migration Service to easily migrate or replicate your existing databases to Amazon RDS.
Read More
How it works: AWS Databases
AWS RDS
Why Hashicorp Vault\u2753
As many organizations migrate to the public cloud, a major concern has been how to best secure data, preventing it from unauthorized access or exfiltration.
Deploying a product like HashiCorp Vault gives you better control of your sensitive credentials and helps you meet cloud security standards.
HashiCorp Vault is designed to help organizations manage access to secrets and transmit them safely within an organization. Secrets are defined as any form of sensitive credentials that need to be tightly controlled and monitored and can be used to unlock sensitive information. Secrets could be in the form of passwords, API keys, SSH keys, RSA tokens, or OTP.
HashiCorp Vault makes it very easy to control and manage access by providing you with a unified interface to manage every secret in your infrastructure. Not only that, you can also create detailed audit logs and keep track of who accessed what.
Manage Secrets and Protect Sensitive Data. Secure, store and tightly control access to tokens, passwords, certificates, encryption keys for protecting secrets and other sensitive data using a UI, CLI, or HTTP API.
Read More
How it works: Secrets
Hashicorp Vault Project
"},{"location":"concepts/what-is-leverage/","title":"What is Leverage?","text":"
Leverage was made out of a significant amount of knowledge, acquired through several years of experience, turned into an ecosystem of code, tools, and workflows that enables you to build the AWS infrastructure for your applications and services quickly and securely.
Since all the code and modules are already built, we can get you up and running up to 10x faster than a consulting company -- typically in just a few weeks! -- and on top of code that is thoroughly documented, tested, and has been proven in production at dozens of other project deployments.
Our focus is on creating reusable, high quality Cloud Infrastructure code, through our core components:
Reference Architecture: Designed under optimal configs for the most popular modern web and mobile applications needs. Its design is fully based on the AWS Well Architected Framework.
Infrastructure as Code (IaC) Library: A collection of reusable, tested, production-ready E2E AWS Cloud infrastructure as code solutions, leveraged by modules written in: Terraform, Ansible, Helm charts, Dockerfiles and Makefiles.
Leverage CLI: the project's command line tool. It provides the means to interact with and deploy the Leverage Reference Architecture on AWS and, if needed, allows you to define custom tasks to run.
Check out this intro video that explains what Leverage is in less than 5 minutes:
"},{"location":"concepts/what-leverage-can-do-for-you/","title":"What can Leverage do for you?","text":"
Still not convinced? Check out the following sections, which describe what Leverage can bring to the table depending on your role in a company.
"},{"location":"concepts/what-leverage-can-do-for-you/#leverage-for-cios-ctos-and-vps-of-engineering","title":"Leverage for CIOs, CTOs and VPs of Engineering","text":"Accelerate development and optimize costs
Annual cost savings are a new standard and best practice. Profits are being targeted to business development, regulatory and compliance needs. This results in reduced pressure on IT and development budgets, granting the opportunity to focus on new features and boost innovation.
Modernize applications architecture (loosely coupled and modular)
Strategically decompose the monolith into a fine-grained, loosely coupled modular architecture to increase both development and business agility. When the system architecture is designed to allow teams to test, deploy and change systems without relying on other teams, they require little communication to get the job done. In other words, both the architecture and the teams are loosely coupled.
Innovation - Rapidly adopt new technologies and reduce development time
Use the Leverage Reference Architecture for AWS + our libraries to provide a collection of cloud application architecture components to build and deploy faster in the cloud. Building a cloud Landing Zone is complex, especially since most companies have little or no expertise in this area, and it can take a significant amount of time to get it right. Leverage a reference architecture to give you an AWS Landing Zone that provides a consistent and solid \"foundation\" to bootstrap your project in the cloud. The code solution implements the best AWS Well-Architected Framework practices as well as the battle-tested tech experience and years of knowledge of our contributors.
Hours or days, not weeks or months
Leverage implements infrastructure as code at all times. We have rolled this out using Terraform, and it has been fully proven in AWS and other Terraform providers that are part of our reference architecture, like Kubernetes, Helm and Hashicorp Vault. The Leverage CLI binary will help you quickly bootstrap your AWS Landing Zone in a matter of hours (or at most a few days).
It's not just a pile of scripts
It's not just another layer of untested, one-time, stand-alone scripts. The code is modularized and well designed under best practices; our Leverage CLI has both unit and integration tests, while our Terraform code has been extensively E2E tested. Moreover, 100% of the code is yours (to modify, extend, reuse, etc), with no vendor lock-in and no vendor licensing fees. We use the MIT license, so you can take the code, modify it and use it as your private code. All we ask in return is a friendly greeting and that (if possible) you consider contributing to the binbash Leverage project. Implement Leverage yourself or we can deploy it for you!
DevOps culture and methodologies
Team agility and continuous improvement based on feedback loops are some of the main drivers of cloud adoption, and increasing the frequency of deployment of both infrastructure and applications is one of the most important aspects of DevOps practices. We continue to apply these methodologies to achieve a DevOps-first culture. We have experienced and demonstrated their potential and have practiced them in dozens of projects over the past 5 years. The Leverage reference architecture for AWS combines a set of application best practices, technology patterns and a common CI/CD deployment approach through the Leverage CLI for all your application environments. As a result, we are pursuing world-class software delivery performance through optimized collaboration, communication, reliability, stability, scalability and security at ever-decreasing cost and effort.
Repeatable, composable and extensible immutable infrastructure
The best high-performance development teams create and recreate their development and production environments using infrastructure as code (IaC) as part of their daily development processes. The Leverage CLI allows you to build repeatable and immutable infrastructure, so your cloud development, staging and production environments will consistently be the same.
"},{"location":"concepts/what-leverage-can-do-for-you/#leverage-for-devops-engineers-cloud-architects-and-software-engineers","title":"Leverage for DevOps Engineers, Cloud Architects and Software Engineers","text":"Provisioning infrastructure as code (Iac)
Instead of manually provisioning infrastructure, the real benefits of cloud adoption come from orchestrating infrastructure through code. However, this is really challenging to achieve: there are literally thousands of tiny things and configs to consider and they all seem to take forever. Our experience is that it can take teams up to 24 months to achieve a desired infra state in AWS. By using Leverage you could get your AWS Landing Zone in a few weeks, or your entire AWS Well-Architected based cloud solution within 1 to 3 months (depending on your project complexity needs).
We've done it before (don't reinvent the wheel)
Often, development teams have similar and recurring requests such as: IAM, networking, security, storage, databases, compute and secret management, etc. binbash Leverage has been proven in dozens of projects to create software-defined (IaC) AWS environments.
Best practices baked in the code
Leverage provides an IaC reference architecture for AWS-hosted application infrastructure. This is baked into the code as a combination of the best AWS Well-Architected Framework practices and the experience of having successfully orchestrated many customer migrations to the AWS cloud.
On-demand infra deployment
Leverage provides your DevOps, Cloud, SRE and Development teams with the ability to provision on-demand infrastructure, granting that it will meet the rigorous security requirements of modern cloud native best practices. It fully implements the AWS Well-Architected Framework (WAF) and the best DevOps practices, including collaboration, version control, CI/CD, continuous testing, cloud infrastructure and loosely coupled architectures.
Easier to support and maintain
Leverage's IaC approach significantly reduces your AWS infra deployment, config and support burden and reduces risk. Our code-backed provisioning has been rigorously tested many times, eliminating the possibility of manual errors. Because the entire infrastructure is deployed from the same proven code, the consistency of your cloud environments will simplify your setup and maintenance. Use the versioned code to iterate and improve, extend or compose your internal processes as your cloud operating model evolves.
There is no vendor lock-in. You own the solution
With Leverage you own 100% of the code with no lock-in clauses. If you choose to leave Leverage, you will still have your entire AWS cloud infrastructure that you can access and manage. If you drop Leverage, you will still have your entire cloud native infrastructure code (Terraform, Helm, Ansible, Python). It\u2019s 100% Open Source on GitHub and is free to use with no strings attached under MIT license (no licensing fees), and you are free to commercially and privately use, distribute and modify.
Consistent environments (Dev/prod parity)
Keep development, staging, and production cloud envs in parity. Infrastructure as code allows us to define and provision all infrastructure components (think networks, load balancers, databases, security, compute and storage, etc.) using code. Leverage uses Terraform as the IaC language to deploy and set up all the AWS, Kubernetes and Hashicorp Vault resources (it has support for multiple cloud and technology providers). Backed by code, your cloud environments are built exactly the same way every time. Finally, this will result in no differences between development, staging and production.
Development in production like envs
IaC allows your development team to deploy and test the AWS infrastructure as if it were application code. Your development is always done in production-like environments. Provision your cloud test and sandbox environments on demand and tear them down when all your testing is complete. Leverage takes all the pain out of maintaining production-like environments, with stable infra releases. It eliminates the unpredictability of wondering if what actually worked in your development envs will work in production.
By implementing our Reference Architecture for AWS and the Infrastructure as Code (IaC) Library via Leverage CLI, you will get your entire Cloud Native Application Infrastructure deployed in only a few weeks.
Did you know?
You can roll out Leverage by yourself or we can implement it for you!
"},{"location":"concepts/why-leverage/#the-problem-and-our-solution","title":"The problem and our solution","text":""},{"location":"concepts/why-leverage/#what-are-the-problems-you-might-be-facing","title":"What are the problems you might be facing?","text":"Figure: Why Leverage? The problem. (Source: binbash, \"Leverage Presentation: Why you should use Leverage?\", accessed June 15th 2021)."},{"location":"concepts/why-leverage/#what-is-our-solution","title":"What is our solution?","text":"Figure: Why Leverage? The solution. (Source: binbash, \"Leverage Presentation: Why you should use Leverage?\", accessed June 15th 2021)."},{"location":"es/bienvenido/","title":"Bienvenido","text":""},{"location":"es/bienvenido/#proximamente","title":"Pr\u00f3ximamente","text":""},{"location":"how-it-works/ref-architecture/","title":"How it works","text":""},{"location":"how-it-works/ref-architecture/#how-it-works","title":"How it works","text":"
The objective of this document is to explain how the binbash Leverage Reference Architecture for AWS works, in particular how the Reference Architecture model is built and why we need it.
This documentation contains all the guidelines to create binbash Leverage Reference Architecture for AWS that will be implemented on the Projects\u2019 AWS infrastructure.
We're assuming you already have your AWS Landing Zone in place, based on the First Steps guide.
Our Purpose
Democratize advanced technologies: As complex as it may sound, the basic idea behind this design principle is simple. It is not always possible for a business to maintain a capable in-house IT department while staying up to date. It is entirely feasible to set up your own cloud computing ecosystem from scratch without experience, but that would take a considerable amount of resources; it is definitely not the most efficient way to go.
An efficient, business-minded way to go is to employ AWS as a service, which allows organizations to benefit from the advanced technologies integrated into AWS without learning, researching, or creating teams specifically for those technologies.
Info
This documentation will provide a detailed reference of the tools and techs used, the needs they address and how they fit with the multiple practices we will be implementing.
AWS Regions: Multi Region setup \u2192 1ry: us-east-1 (N. Virginia) & 2ry: us-west-2 (Oregon).
Repositories & Branching Strategy
The necessary DevOps repositories will be created. The Consultant will use a trunk-based branching strategy with short-lived feature branches (feature/ID-XXX -> master), and members from either the Consultant or the Client will be reviewers of every code delivery to said repositories (at least 1 approver per Pull Request).
Infra as code deployments should run from the new feature/ID-XXX or master branch. The feature/ID-XXX branch must be merged immediately (ASAP) via PR to the master branch, as sketched below.
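A minimal sketch of that flow, assuming plain git and a hypothetical ID-123 ticket number (the Pull Request itself is opened and approved in your VCS provider):
git checkout -b feature/ID-123\ngit commit -am \"Describe the infra change\"\ngit push origin feature/ID-123\n# open a PR from feature/ID-123 to master and merge it as soon as it is approved\n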
Consideration: validating that the changes within the code will only affect the desired target resources is the responsibility of the executor (to ensure everything is OK, please only execute after the PR has been reviewed and approved).
Infra as Code + GitOps
After deployment via IaC (Terraform, Ansible & Helm) all subsequent changes will be performed via version-controlled code, by modifying the corresponding repository and running the proper IaC Automation execution.
All AWS resources will be deployed via Terraform, and only occasionally via CloudFormation, the Python SDK or the AWS CLI when the resource is not covered by Terraform (a rare scenario). All code and scripts will be included in the repository. We'll start the process via local workstations. Afterwards, full execution automation will be considered via GitHub Actions, GitLab Pipelines or an equivalent preferred service.
Consideration: Note that any change performed manually will generate inconsistencies on the deployed resources (which leaves them out of governance and support scope).
Server OS provisioning: Provisioning via Ansible for resources that need to be provisioned on an OS.
Containers Orchestration: Orchestration via Terraform + Helm Charts for resources that need to be provisioned in Kubernetes (with Docker as preferred container engine).
Pre-existing AWS Accounts: All resources will be deployed in several new AWS accounts created inside the Client AWS Organization. Except for the AWS Legacy Account invitation to the AWS Org and OrganizationAccountAccessRole creation in it, there will be no intervention whatsoever in Client Pre-existing accounts, unless required by Client authority and given a specific requirement.
Info
We will explore the details of all the relevant Client application stacks, CI/CD processes, monitoring, security, target service level objective (SLO) and others in a separate document.
"},{"location":"try-leverage/","title":"Index","text":""},{"location":"try-leverage/#try-leverage","title":"Try Leverage","text":""},{"location":"try-leverage/#before-you-begin","title":"Before you begin","text":"
The objective of this guide is to introduce you to our binbash Leverage Reference Architecture for AWS workflow through the complete deployment of a basic landing zone configuration.
The Leverage Landing Zone is the smallest possible fully functional configuration. It lays out the base infrastructure required to manage the environment: billing and financial management, user management, security enforcement, and shared services and resources. It always follows the best practices laid out by the AWS Well-Architected Framework to ensure quality and to provide a solid base to build upon. This is the starting point from which any Leverage user can and will develop all the features and capabilities they may require to satisfy their specific needs.
Figure: Leverage Landing Zone architecture components diagram."},{"location":"try-leverage/#about-this-guide","title":"About this guide","text":"
In this guide you will learn how to:
Create and configure your AWS account.
Work with the Leverage CLI to manage your credentials, infrastructure and the whole Leverage stack.
Prepare your local environment to manage a Leverage project.
Orchestrate the project's infrastructure.
Configure your users' credentials to interact with the project.
Upon completion of this guide you will gain an understanding of the structure of a project as well as familiarity with the tooling used to manage it.
To begin your journey into creating your first Leverage project, continue to the next section of the guide where you will start by setting up your AWS account.
"},{"location":"try-leverage/add-aws-accounts/","title":"Add more AWS Accounts","text":""},{"location":"try-leverage/add-aws-accounts/#brief","title":"Brief","text":"
You can add new AWS accounts to your Leverage project by following the steps in this page.
Important
In the examples below, we will be using apps-prd as the account we will be adding and it will be created in the us-east-1 region.
"},{"location":"try-leverage/add-aws-accounts/#create-the-new-account-in-your-aws-organization","title":"Create the new account in your AWS Organization","text":"
Go to management/global/organizations.
Edit the locals.tf file to add the account to the local accounts variable.
Note that the apps organizational unit (OU) is being used as the parent OU of the new account. If you need to use a new OU you can add it to the organizational_units variable in the same file.
Run the Terraform workflow to apply the new changes. Typically that would be this:
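A minimal sketch of that workflow, assuming you are inside the management/global/organizations layer directory and the Leverage CLI is already configured:
leverage tf init\nleverage tf plan\nleverage tf apply\n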
Note this layer was previously applied using the bootstrap user. Now that we are working with SSO, credentials have changed, so if this is the first account you add you'll probably get this error when applying: \"Error: error configuring S3 Backend: no valid credential sources for S3 Backend found.\" In this case, running leverage tf init -reconfigure will fix the issue.
Add the new account to the <project>/config/common.tfvars file. The new account ID should have been displayed in the output of the previous step, e.g.:
aws_organizations_account.accounts[\"apps-prd\"]: Creation complete after 14s [id=999999999999]\n
Note the id, 999999999999.
...so please grab it from there and use it to update the file as shown below:
accounts = {\n\n[...]\n\napps-prd = {\nemail = \"<aws+apps-prd@yourcompany.com>\",\n id = \"<add-the-account-id-here>\"\n}\n}\n
Since you are using SSO in this project, permissions on the new account must be granted before we can move forward. Add the right permissions to the management/global/sso/account_assignments.tf file. For example:
Note that your needs may vary; these permissions are just an example, so please be careful with what you grant here.
Apply these changes:
leverage terraform apply\n
And you must update your AWS config file accordingly by running this:
leverage aws configure sso\n
Good! Now you are ready to create the initial directory structure for the new account. The next section will guide you through those steps.
"},{"location":"try-leverage/add-aws-accounts/#create-and-deploy-the-layers-for-the-new-account","title":"Create and deploy the layers for the new account","text":"
In this example we will create the apps-prd account structure by using the shared account as a template.
"},{"location":"try-leverage/add-aws-accounts/#create-the-initial-directory-structure-for-the-new-account","title":"Create the initial directory structure for the new account","text":"
Ensure you are at the root of this repository
Now create the directory structure for the new account:
mkdir -p apps-prd/{global,us-east-1}\n
Set up the config files:
Create the config files for this account:
cp -r shared/config apps-prd/config\n
Open apps-prd/config/backend.tfvars and replace any occurrences of shared with apps-prd.
Do the same with apps-prd/config/account.tfvars
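If you prefer doing the replacement from the command line, a hedged one-liner (GNU sed syntax; on macOS use sed -i '' instead) could be:
sed -i 's/shared/apps-prd/g' apps-prd/config/backend.tfvars apps-prd/config/account.tfvars\n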
"},{"location":"try-leverage/add-aws-accounts/#create-the-terraform-backend-layer","title":"Create the Terraform Backend layer","text":"
If the source layer was already initialized you should delete the previous Terraform setup from the target layer's directory using sudo rm -rf .terraform*, e.g. sudo rm -rf apps-prd/us-east-1/base-tf-backend/.terraform*
Go to the apps-prd/us-east-1/base-tf-backend directory, open the config.tf file and comment out the S3 backend block. E.g.:
To finish with the backend layer, re-init to move the tfstate to the new location. Run:
leverage terraform init\n
Terraform will detect that you are trying to move from a local to a remote state and will ask for confirmation.
Initializing the backend...\nAcquiring state lock. This may take a few moments...\nDo you want to copy existing state to the new backend?\n Pre-existing state was found while migrating the previous \"local\" backend to the\n newly configured \"s3\" backend. No existing state was found in the newly\n configured \"s3\" backend. Do you want to copy this state to the new \"s3\"\nbackend? Enter \"yes\" to copy and \"no\" to start with an empty state.\n\n Enter a value:
Enter yes and hit enter.
"},{"location":"try-leverage/add-aws-accounts/#create-the-security-base-layer","title":"Create the security-base layer","text":"
Copy the layer from an existing one: From the repository root run:
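A sketch of that copy, assuming the shared account's layer is used as the template (as with the config files earlier):
cp -r shared/us-east-1/security-base apps-prd/us-east-1/security-base\n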
If the source layer was already initialized you should delete the previous Terraform setup from the target layer's directory using sudo rm -rf .terraform*, e.g. sudo rm -rf apps-prd/us-east-1/security-base/.terraform*
Go to the apps-prd/us-east-1/security-base directory and open the config.tf file, replacing any occurrences of shared with apps-prd. E.g. this line should be:
If the source layer was already initialized you should delete the previous Terraform setup from the target layer's directory using sudo rm -rf .terraform*, e.g. sudo rm -rf apps-prd/us-east-1/base-network/.terraform*
Go to the apps-prd/us-east-1/base-network directory and open the config.tf file replacing any occurrences of shared with apps-prd. E.g. this line should be:
Note that only two AZs are enabled here; if needed, uncomment the other ones in the three structures.
Do not overlap CIDRs!
Be careful when choosing CIDRs. Avoid overlapping CIDRs between accounts. If you need a reference on how to choose the right CIDRs, please see here.
Calculate CIDRs
To calculate CIDRs you can check this playbook.
Init and apply the layer
leverage tf init\nleverage tf apply\n
Create the VPC Peering between the new account and the VPC of the Shared account. Edit the file shared/us-east-1/base-network/config.tf and add a provider and a remote state for the created account.
provider \"aws\" {\nalias = \"apps-prd\"\nregion = var.region\nprofile = \"${var.project}-apps-prd-devops\"\n}\n\ndata \"terraform_remote_state\" \"apps-prd-vpcs\" {\nfor_each = {\nfor k, v in local.apps-prd-vpcs :\nk => v if !v[\"tgw\"]\n}\n\nbackend = \"s3\"\n\nconfig = {\nregion = lookup(each.value, \"region\")\nprofile = lookup(each.value, \"profile\")\nbucket = lookup(each.value, \"bucket\")\nkey = lookup(each.value, \"key\")\n}\n}\n
Edit the file shared/us-east-1/base-network/locals.tf and, under it, add an apps-prd-vpcs entry (the local referenced by the remote state data source above) describing the new account's VPC.
Edit the file shared/us-east-1/base-network/vpc_peerings.tf (if this is your first added account the file won't exist, please create it) and add the peering definition:
To keep creating infra on top of this binbash Leverage Landing Zone with this new account added, please check:
Check common use cases in Playbooks
Review the binbash Leverage architecture
Go for EKS!
"},{"location":"try-leverage/aws-account-setup/","title":"Creating your AWS Management account","text":""},{"location":"try-leverage/aws-account-setup/#create-the-first-aws-account","title":"Create the first AWS account","text":"
First and foremost you'll need to create an AWS account for your project.
Attention
Note this will be your management account and has to be called <project-name>-management.
E.g. if your project is called binbash then your account should be binbash-management.
Follow the instructions here.
This will be the management account for your AWS Organization and the email address you use for signing up will be the root user of this account -- you can see this user represented in the architecture diagram.
Since the root user is the main access point to your account it is strongly recommended that you keep its credentials (email, password) safe by following AWS best practices.
Tip
To protect your management account, enabling Multi Factor Authentication is highly encouraged. Also, reviewing the account's billing setup is always a good idea before proceeding.
For more details on setting up your AWS account: Organization account setup guide
"},{"location":"try-leverage/aws-account-setup/#create-a-bootstrap-user-with-temporary-administrator-permissions","title":"Create a bootstrap user with temporary administrator permissions","text":"
Leverage needs a user with temporary administrator permissions in order to deploy the initial resources that will form the foundations you will then use to keep building on. That initial deployment is called the bootstrap process and thus the user required for that is called \"the bootstrap user\".
To create that user, navigate to the IAM page and create a user named mgmt-org-admin following steps 2 and 3 of this leverage doc.
Info
Bear in mind that the page for creating users may change from time to time but the key settings for configuring the bootstrap user are the following:
It must be an IAM user (we won't be using IAM Identity Center for this)
Password can be auto-generated
It requires admin privileges which you can achieve by directly attaching the AdministratorAccess policy to it
There's no need to add the user to any group as it is only a temporary user
Usually the last step of the user creation should present you with the following information:
Console sign-in URL
User name
Console password
Make a note of all of these and keep them in a safe place as you will need them in the following steps.
Info
If you are only getting the bootstrap user credentials for someone else in your team or in Binbash's team, then please share that using a secure way (e.g. password management service, GPG keys, etc).
Info
If the user was set up with the option \"Force to change password on first login\", you should log into the console to do so.
You have successfully created and configured the AWS account for your Leverage project. From now on, almost all interactions with the AWS environment (with few notable exceptions) will be performed via Leverage.
Next, you will set up all required dependencies to work on a Leverage project on your local machine.
Change sso_enabled to true as follows to enable SSO support:
sso_enabled = true\n
Now you need to set the sso_start_url with the right URL. To find that, navigate here: https://us-east-1.console.aws.amazon.com/singlesignon/home -- you should be already logged in to the Management account for this to work. You should see a \"Settings summary\" panel on the right of the screen that shows the \"AWS access portal URL\". Copy that and use it to replace the value in the sso_start_url entry. Below is an example just for reference:
The 'AWS access portal URL' can be customized to use a more friendly name. Check the official documentation for that.
Further info on configuring SSO
There is more information on how to configure SSO here.
"},{"location":"try-leverage/enabling-sso/#update-backend-profiles-in-the-management-account","title":"Update backend profiles in the management account","text":"
It's time to set the right profile names in the backend configuration files. Open this file: management/config/backend.tfvars and change the profile value from this:
profile = \"me-bootstrap\"\n
To this:
profile = \"me-management-oaar\"\n
Please note that in the examples above my short project name is me which is used as a prefix and it's the part that doesn't get replaced."},{"location":"try-leverage/enabling-sso/#activate-your-sso-user-and-set-up-your-password","title":"Activate your SSO user and set up your password","text":"
The SSO users you created when you provisioned the SSO layer need to go through an email activation procedure.
The user is the one you set in the project.yaml file at the beginning, in this snippet:
Once SSO users have been activated, they will need to get their initial password so they are able to log in. Check out the steps for that here.
Basically:
Log into your sso_start_url address
Enter your username (the user email)
Under Password, choose Forgot password.
Type in the code shown on the screen
A password reset email will be sent
Follow the link and reset your password
Now, in the same URL as before, log in with the new credentials
You will be prompted to set up MFA; just do it.
"},{"location":"try-leverage/enabling-sso/#configure-the-cli-for-sso","title":"Configure the CLI for SSO","text":"
Almost there. Let's try the SSO integration now.
"},{"location":"try-leverage/enabling-sso/#configure-your-sso-profiles","title":"Configure your SSO profiles","text":"
Since this is your first time using the SSO integration, you will need to configure it by running this:
leverage aws configure sso\n
Follow the wizard to get your AWS config file created for you. There is more info about that here.
"},{"location":"try-leverage/enabling-sso/#verify-on-a-layer-in-the-management-account","title":"Verify on a layer in the management account","text":"
To ensure that worked, let's run a few commands to verify:
We'll use sso for the purpose of this example
Move to the management/global/sso layer
Run: leverage tf plan
You should get this error: \"Error: error configuring S3 Backend: no valid credential sources for S3 Backend found.\"
This happens because so far you have been running Terraform with a different AWS profile (the bootstrap one). Luckily the fix is simple, just run this: leverage tf init -reconfigure. Terraform should reconfigure the AWS profile in the .terraform/terraform.tfstate file.
Now try running that leverage tf plan command again
This time it should succeed, you should see the message: No changes. Your infrastructure matches the configuration.
Note if you still have the same error, try clearing credentials with:
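The exact command is not captured here; as an assumption, removing the AWS CLI's cached SSO credentials and re-running the SSO configuration usually clears this kind of stale-credentials state:
# Assumption: clear cached SSO/CLI credentials, then reconfigure SSO\nrm -rf ~/.aws/sso/cache ~/.aws/cli/cache\nleverage aws configure sso\n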
Next, you will orchestrate the remaining accounts, security and shared.
"},{"location":"try-leverage/leverage-project-setup/","title":"Create a Leverage project","text":"
A Leverage project starts with a simple project definition file that you modify to suit your needs. That file is then used to render the initial directory layout which, at the end of this guide, will be your reference architecture. Follow the sections below to begin with that.
The account's name will be given by your project's name followed by -management, since Leverage uses a suffix naming system to differentiate between the multiple accounts of a project. For this guide we'll stick to calling the project MyExample and so, the account name will be myexample-management.
Along the same line, we'll use the example.com domain for the email address used to register the account. Adding a -aws suffix to the project's name to indicate that this email address is related to the project's AWS account, we end up with a registration email that looks like myexample-aws@example.com.
Email addresses for AWS accounts.
Each AWS account requires having a unique email address associated with it. The Leverage Reference Architecture for AWS makes use of multiple accounts to better manage the infrastructure; as such, you will need a different address for each one. Creating a new email account for each AWS account is not really a viable solution to this problem; a better approach is to take advantage of mail services that support aliases. For information regarding how this works: Email setup for your AWS account.
"},{"location":"try-leverage/leverage-project-setup/#create-the-project-directory","title":"Create the project directory","text":"
Each Leverage project lives in its own working directory. Create a directory for your project as follows:
mkdir myexample\ncd myexample\n
"},{"location":"try-leverage/leverage-project-setup/#initialize-the-project","title":"Initialize the project","text":"
Create the project definition file by running the following command:
$ leverage project init\n[18:53:24.407] INFO Project template found. Updating. [18:53:25.105] INFO Finished updating template. [18:53:25.107] INFO Initializing git repository in project directory. [18:53:25.139] INFO No project configuration file found. Dropping configuration template project.yaml. [18:53:25.143] INFO Project initialization finished.\n
The command above should create the project definition file (project.yaml) and should initialize a git repository in the current working directory. This is important because Leverage projects by-design rely on specific git conventions and also because it is assumed that you will want to keep your infrastructure code versioned.
"},{"location":"try-leverage/leverage-project-setup/#modify-the-project-definition-file","title":"Modify the project definition file","text":"
Open the project.yaml file and fill in the required information.
Typically the placeholder values between < and > symbols are the ones you would want to edit; however, you are welcome to adjust any other values to suit your needs.
For instance, the following is a snippet of the project.yaml file in which the values for project_name and short_name have been set to example and ex respectively:
The project_name field only accepts lowercase alphanumeric characters and allows hyphens ('-'). For instance, valid names could be 'example', 'leveragedemo' or 'example-demo'.
The short_name field only accepts 2 to 4 lowercase alpha characters. For instance, valid names could be 'exam', 'leve' or 'ex'.
We typically use us-east-1 as the primary region and us-west-2 as the secondary region for the majority of our projects. However, please note that these regions may not be the most fitting choice for your specific use case. For detailed guidance, we recommend following these provided guidelines.
Another example is below. Note that the management, security, and shared accounts have been updated with slightly different email addresses (actually aws+security@example.com and aws+shared@example.com are email aliases of aws@example.com which is a convenient trick in some cases):
To be able to interact with your AWS environment you first need to configure the credentials to enable AWS CLI to do so. Provide the keys obtained in the previous account creation step to the command by any of the available means.
ManuallyFile selectionProvide file in command
leverage credentials configure --type BOOTSTRAP\n
[09:37:17.530] INFO Loading configuration file.\n[09:37:18.477] INFO Loading project environment configuration file.\n[09:37:20.426] INFO Configuring bootstrap credentials.\n> Select the means by which you'll provide the programmatic keys: Manually\n> Key: AKIAU1OF18IXH2EXAMPLE\n> Secret: ****************************************\n[09:37:51.638] INFO Bootstrap credentials configured in: /home/user/.aws/me/credentials\n[09:37:53.497] INFO Fetching management account id.\n[09:37:53.792] INFO Updating project configuration file.\n[09:37:55.344] INFO Skipping assumable roles configuration.\n
leverage credentials configure --type BOOTSTRAP\n
[09:37:17.530] INFO Loading configuration file.\n[09:37:18.477] INFO Loading project environment configuration file.\n[09:37:20.426] INFO Configuring bootstrap credentials.\n> Select the means by which you'll provide the programmatic keys: Path to an access keys file obtained from AWS\n> Path to access keys file: ../bootstrap_accessKeys.csv\n[09:37:51.638] INFO Bootstrap credentials configured in: /home/user/.aws/me/credentials\n[09:37:53.497] INFO Fetching management account id.\n[09:37:53.792] INFO Updating project configuration file.\n[09:37:55.344] INFO Skipping assumable roles configuration.\n
"},{"location":"try-leverage/leverage-project-setup/#create-the-configured-project","title":"Create the configured project","text":"
Now you will finally create all the infrastructure definitions in the project.
leverage project create\n
[09:40:54.934] INFO Loading configuration file.\n[09:40:54.950] INFO Creating project directory structure.\n[09:40:54.957] INFO Finished creating directory structure.\n[09:40:54.958] INFO Setting up common base files.\n[09:40:54.964] INFO Account: Setting up management.\n[09:40:54.965] INFO Layer: Setting up config.\n[09:40:54.968] INFO Layer: Setting up base-tf-backend.\n[09:40:54.969] INFO Layer: Setting up base-identities.\n[09:40:54.984] INFO Layer: Setting up organizations.\n[09:40:54.989] INFO Layer: Setting up security-base.\n[09:40:54.990] INFO Account: Setting up security.\n[09:40:54.991] INFO Layer: Setting up config.\n[09:40:54.994] INFO Layer: Setting up base-tf-backend.\n[09:40:54.995] INFO Layer: Setting up base-identities.\n[09:40:55.001] INFO Layer: Setting up security-base.\n[09:40:55.002] INFO Account: Setting up shared.\n[09:40:55.003] INFO Layer: Setting up config.\n[09:40:55.006] INFO Layer: Setting up base-tf-backend.\n[09:40:55.007] INFO Layer: Setting up base-identities.\n[09:40:55.008] INFO Layer: Setting up security-base.\n[09:40:55.009] INFO Layer: Setting up base-network.\n[09:40:55.013] INFO Project configuration finished.\n INFO Reformatting terraform configuration to the standard style.\n[09:40:55.743] INFO Finished setting up project.\n
More information on project create
In this step, the directory structure for the project and all definition files are created using the information from the project.yaml file and checked for correct formatting.
You will end up with something that looks like this:
As you can see, it is a structure comprised of directories for each account containing all the definitions for each of the accounts respective layers.
The layers themselves are also grouped based on the region in which they are deployed. The regions are configured through the project.yaml file. In the case of the Leverage landing zone, most layers are deployed in the primary region, so you can see the definition of these layers in a us-east-1 directory, as per the example configuration.
Some layers are not bound to a region because their definition is mainly comprised of resources for services that are global in nature, like IAM or Organizations. These kind of layers are kept in a global directory.
You have now created the definition of all the infrastructure for your project and configured the credentials needed to deploy such infrastructure in the AWS environment.
Next, you will orchestrate the first and main account of the project, the management account.
Leverage-based projects are better managed via the Leverage CLI which is a companion tool that simplifies your daily interactions with Leverage. This page will guide you through the installation steps.
Now you have your system completely configured to work on a Leverage project.
Next, you will setup and create your Leverage project.
"},{"location":"try-leverage/management-account/","title":"Configure the Management account","text":"
Finally we reach the point in which you'll get to actually create the infrastructure in our AWS environment.
Some accounts and layers rely on other accounts or layers to be deployed first, which creates dependencies between them and establishes an order in which all layers should be deployed. We will go through these dependencies in order.
The management account is used to configure and access all the accounts in the AWS Organization. Consolidated Billing and Cost Management are also enforced through this account.
Costs associated with this solution
By default this AWS Reference Architecture configuration should not incur any costs.
"},{"location":"try-leverage/management-account/#deploy-the-management-accounts-layers","title":"Deploy the Management account's layers","text":"
To begin, place yourself in the management account directory.
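For example, from your project's root directory:
cd management\n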
All apply commands will prompt for confirmation, answer yes when this happens.
More information on terraform init and terraform apply
Now, the infrastructure for the Terraform state management is created. The next step is to push the local .tfstate to the bucket. To do this, uncomment the backend section for the terraform configuration in management/base-tf-backend/config.tf
The AWS account that you created manually is the management account itself, so to prevent Terraform from trying to create it and error out, this account definition is commented by default in the code. Now you need to make the Terraform state aware of the link between the two. To do that, uncomment the management organizations account resource in accounts.tf
Zsh users may need to prepend noglob to the import command for it to be recognized correctly, as an alternative, square brackets can be escaped as \\[\\]
"},{"location":"try-leverage/management-account/#update-the-bootstrap-credentials","title":"Update the bootstrap credentials","text":"
Now that the management account has been deployed, and more specifically, all Organizations accounts have been created (in the organizations layer) you need to update the credentials for the bootstrap process before proceeding to deploy any of the remaining accounts.
This will fetch the organizations structure from the AWS environment and create individual profiles associated with each account for the AWS CLI to use. So, run:
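The exact command is not shown in this extract; as an assumption based on the credentials command used earlier in this guide, re-running the bootstrap credentials configuration (which may prompt for the access keys again) should now detect the organization accounts and generate the per-account profiles:
leverage credentials configure --type BOOTSTRAP\n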
Before working on the SSO layer you have to navigate to the AWS IAM Identity Center page, set the region to the primary region you've chosen and enable Single Sign-On (SSO) by clicking on the Enable button.
Now back to the terminal. The SSO layer is deployed in two steps. First, switch to the global/sso directory and run the following:
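The exact two-step sequence is not captured here; as a minimal sketch of that first step, using the standard layer workflow and assuming you are still inside the management account directory:
cd global/sso\nleverage tf init\nleverage tf apply\n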
Now you not only have a fully functional landing zone configuration deployed, but also are able to interact with it using your own AWS SSO credentials.
For more detailed information on the binbash Leverage Landing Zone, visit the links below.
How it works
User guide
"},{"location":"try-leverage/security-and-shared-accounts/","title":"Configure the Security and Shared accounts","text":"
You should by now be more familiar with the steps required to create and configure the Management account. Now you need to do pretty much the same with two more accounts: Security and Shared. Follow the sections in this page to get started!
What are these accounts used for?
The Security account is intended for operating security services (e.g. GuardDuty, AWS Security Hub, AWS Audit Manager, Amazon Detective, Amazon Inspector, and AWS Config), monitoring AWS accounts, and automating security alerting and response.
The Shared Services account supports the services that multiple applications and teams use to deliver their outcomes. Some examples include VPN servers, monitoring systems, and centralized logs management services.
"},{"location":"try-leverage/security-and-shared-accounts/#deploy-the-security-accounts-layers","title":"Deploy the Security account's layers","text":"
The next account to orchestrate is the security account.
This account is intended for centralized user management via an IAM role-based cross-organization authentication approach. This means that most of the users for your organization will be defined in this account and those users will access the different accounts through this one.
"},{"location":"try-leverage/security-and-shared-accounts/#deploy-the-shared-accounts-layers","title":"Deploy the Shared account's layers","text":"
The last account in this deployment is the shared account.
Again, this account is intended for managing the infrastructure of shared services and resources such as directory services, DNS, VPN, monitoring tools or centralized logging solutions.
You have now a fully deployed landing zone configuration for the Leverage Reference Architecture for AWS, with its three accounts management, security and shared ready to be used.
Start/Stop EC2/RDS instances using schedule or manual endpoint
Calculate VPC subnet CIDRs
Kubernetes in different stages
Encrypting/decrypting files with SOPS+KMS
Enable/Disable nat gateway
ArgoCD add external cluster
VPN Server
"},{"location":"user-guide/cookbooks/VPC-subnet-calculator/","title":"How to calculate the VPC subnet CIDRs?","text":"
To calculate subnets this calculator can be used
Note in this link a few params were added: the base network and mask, and the division number. In this case the example is for the shared account networking.
Note the main CIDR is being used for the VPC. See on the left how the /20 encompasses all the rows.
Then two divisions for /21. Note the first subnet address of the first row for each one is being used for private_subnets_cidr and public_subnets_cidr.
Finally the /23 are being used for each subnet.
Note we are using the first two subnet addresses for each /21. This is because we are reserving the other two to allow adding more AZs in the future (up to two more in this case).
If you want you can take as a reference this page to select CIDRs for each account.
"},{"location":"user-guide/cookbooks/VPC-with-no-LandingZone/","title":"VPC with no Landing Zone","text":""},{"location":"user-guide/cookbooks/VPC-with-no-LandingZone/#what","title":"What","text":"
Do you want to try binbash Leverage but are not yet willing to transform your already existing infra into the binbash Leverage Landing Zone (honoring the AWS Well-Architected Framework)?
With this cookbook you will create a VPC with all the benefits binbash Leverage network layer provides.
If you want to use the Full binbash Leverage Landing Zone please visit the Try Leverage section
This will give you the full power of binbash Leverage and the AWS Well Architected Framework.
Since we are testing we won't use the S3 backend (we didn't create the bucket, but you can do it easily with the base-tf-backend layer), so comment out this line in the config.tf file:
"},{"location":"user-guide/cookbooks/VPC-with-no-LandingZone/#get-the-layer","title":"Get the layer","text":"
For this step we'll go for a layer that can be found in the binbash Leverage RefArch under this directory.
You can download a directory from a git repository using this Firefox addon or any method you want.
Note when you copy the layer (e.g. with gitzip), the file common-variables.tf , which is a soft link, was probably copied as a regular file. If this happens, delete it:
cd ec2-fleet-ansible\\ --\nrm common-variables.tf\n
"},{"location":"user-guide/cookbooks/VPC-with-no-LandingZone/#prepare-the-layer","title":"Prepare the layer","text":"
Again, since we are not running the whole binbash Leverage Landing Zone we need to comment out these lines in config.tf:
...again, due to the lack of the whole binbash Leverage Landing Zone...
If you plan to access the instance from the Internet (EC2 in a public subnet), e.g. to use Ansible, you can change the first line to \"0.0.0.0/0\" (or better, to a specific public IP).
If you want to add an SSH key (e.g. to use Ansible), you can generate a new SSH key and add a resource like this:
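A sketch of what that could look like; the file name keys.tf and the key path are assumptions, while the resource name devops matches the reference used below:
# Generate a key pair (path is just an example)\nssh-keygen -t rsa -b 4096 -f ~/.ssh/ansible-key\n# Append a hypothetical key pair resource to the layer (e.g. in a new keys.tf file)\ncat >> keys.tf <<'EOF'\nresource \"aws_key_pair\" \"devops\" {\n  key_name   = \"devops\"\n  public_key = file(pathexpand(\"~/.ssh/ansible-key.pub\"))\n}\nEOF\n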
And replace the line in ec2_fleet.tf with this one:
key_name = aws_key_pair.devops.key_name\n
In the same file, change instance_type as per your needs.
Also, you can add this to the ec2_ansible_fleet resource:
create_spot_instance = true\n
to create spot instances... and this:
create_iam_instance_profile = true\niam_role_description = \"IAM role for EC2 instance\"\niam_role_policies = {\nAmazonSSMManagedInstanceCore = \"arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore\"\n}\n
to add SSM access.
In the locals.tf file check the variable multiple_instances. There the EC2 instances are defined; by default there are four. Remember to set the subnets in which the instances will be created.
Finally, apply the layer:
leverage tf apply\n
Check your public IP and try to SSH into your new instance!
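For instance (assuming an Ubuntu-based AMI, the key generated above, and that <instance-public-ip> comes from the Terraform outputs or the AWS console):
ssh -i ~/.ssh/ansible-key ubuntu@<instance-public-ip>\n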
Have fun!
"},{"location":"user-guide/cookbooks/VPN-server/","title":"How to create a VPN Server","text":""},{"location":"user-guide/cookbooks/VPN-server/#goal","title":"Goal","text":"
To create a VPN server to access all the private networks (or at least, those ones \"peered\" to the VPN one) in the Organization.
We are assuming the binbash Leverage Landing Zone is deployed, apps-devstg and shared were created and region us-east-1 is being used. In any case you can adapt these examples to other scenarios.
As per binbash Leverage Landing Zone defaults, the VPN server will be created in a public network of the shared base-network VPC.
It is a \"Pritunl\" server.
All the networks that should be accessible from the VPN must:
be \"peered\" to the shared base-network VPC
have their CIDR added to the \"Pritunl\" server
This Pritunl server will be deployed in an EC2 instance.
Note this instance can be started/stopped in a scheduled fashion, see here for more info. (Note also, if no EIP is being used, when the instance is stopped and then started again the IP will change.)
These are the steps:
create the EC2 instance
deploy Pritunl
configure Pritunl
"},{"location":"user-guide/cookbooks/VPN-server/#create-the-ec2","title":"Create the EC2","text":""},{"location":"user-guide/cookbooks/VPN-server/#copy-the-layer","title":"Copy the layer","text":"
A few methods can be used to download the VPN Server layer directory into the binbash Leverage project.
E.g. this addon is a nice way to do it.
Paste this layer into the account/region chosen to host this, e.g. shared/us-east-1/, so the final layer is shared/us-east-1/tools-vpn-server/.
Info
As usual when copying a layer this way, remove the file common-variables.tf and soft-link it to your project level one. E.g. rm common-variables.tf && ln -s ../../../config/common-variables.tf common-variables.tf.
"},{"location":"user-guide/cookbooks/VPN-server/#update-the-layer","title":"Update the layer","text":"
Change as per your needs. At a minimum, change the S3 backend key in config.tf file and in file ec2.tf update the objects dns_records_public_hosted_zone and dns_records_internal_hosted_zone with your own domain.
Also, temporarily, allow access to port 22 (SSH) from the Internet, so we can access the instance with Ansible.
It seems to be obvious but... you need Ansible installed.
This Ansible repo will be used here: Pritunl VPN Server Playbook
Note
This is a private repository, please get in touch with us to get access to it!
https://www.binbash.co/es/contact
leverage@binbash.co
Copy the playbooks into your project repository. (e.g. you can create an ansible directory inside your binbash Leverage project repository, so all your infrastructure code is in the same place)
cd into the ansible-pritunl-vpn-server (or the name you've chosen) directory.
Follow the steps in the repository README file to install the server.
"},{"location":"user-guide/cookbooks/VPN-server/#connect-and-configure-the-server","title":"Connect and configure the server","text":"
ssh into the server and run this command:
sudo pritunl default-password\n
Grab the user and password and use them as credentials in the web page at your public domain!
In the initial setup page, change the password and enter the domain in \"Lets Encrypt Domain\".
Hit Save.
"},{"location":"user-guide/cookbooks/VPN-server/#a-user-and-an-organization","title":"A user and an organization","text":"
First things first, add a user.
Go to Users.
Hit Add Organization.
Enter a name and hit Add.
Now Add User.
Enter a name, select the organization, enter an email and leave the PIN empty.
Hit Add.
"},{"location":"user-guide/cookbooks/VPN-server/#a-new-server","title":"A new server","text":"
Now add a server to log into.
Go to Servers and hit \"Add Server\".
Enter the name, check \"Enable Google Authenticator\" and add it.
Info
Note the Port and Protocol have to be in the range stated in the VPN Server layer, in the ec2.tf file under this block:
{\nfrom_port = 15255, # Pritunl VPN Server public UDP service ports -> pritunl.server.admin org\nto_port = 15257, # Pritunl VPN Server public UDP service ports -> pritunl.server.devops org\nprotocol = \"udp\",\ncidr_blocks = [\"0.0.0.0/0\"],\ndescription = \"Allow Pritunl Service\"\n}\n
Hit Attach Organization and attach the organization you've created.
Hit Attach.
Now hit Start Server.
"},{"location":"user-guide/cookbooks/VPN-server/#a-note-on-aws-private-dns","title":"A note on AWS private DNS","text":"
To use a Route53 private zone (where your private addresses are set), these steps have to be followed:
Edit the server
In the \"DNS Server\" box (where 8.8.8.8 is set) add the internal DNS for the VPC
the internal DNS is x.x.x.2, e.g. if the VPC your VPN Server lives in is 172.18.0.0/16, then your DNS is 172.18.0.2
for the example, the final text is 172.18.0.2, 8.8.8.8 (note we are adding the 8.8.8.8 as a secondary DNS)
Add a specific route for the DNS server, for the example 172.18.0.2/32
Then add all the other routes you need to access your resources, e.g. to access the VPN Server's VPC this route must be added: 172.18.0.0/16
"},{"location":"user-guide/cookbooks/VPN-server/#use-the-user-to-log-into-the-vpn","title":"Use the user to log into the VPN","text":"
Go to Users.
Click the chain icon (Temporary Profile Link) next to the user.
Copy the \"Temporary url to view profile links, expires after 24 hours\" link and send it to the user.
The user should open the link.
The user has to create an OTP with an app such as Authy, enter a PIN, copy the \"Profile URI Link\" and enter it in the \"import > profile URI\" in the Pritunl Client.
Start the VPN and enjoy being secure!
"},{"location":"user-guide/cookbooks/VPN-server/#set-back-security","title":"Set back security","text":"
Set back all the configurations to access the server and apply the layer:
must temporarily open port 80 to the world (line 52)
must temporarily open port 443 to the world (line 59)
must uncomment the public DNS record block (lines 105-112)
make apply
connect to the VPN and ssh into the Pritunl EC2
run sudo pritunl reset-ssl-cert
force an SSL cert update (manually via the UI or via an API call); if using the UI, set the \"Lets Encrypt Domain\" field with the VPN domain and click on Save
roll back steps a, b & c and make apply again
"},{"location":"user-guide/cookbooks/argocd-external-cluster/","title":"How to add an external cluster to ArgoCD to manage it","text":""},{"location":"user-guide/cookbooks/argocd-external-cluster/#goal","title":"Goal","text":"
Given an ArgoCD installation created with binbash Leverage Landing Zone using the EKS layer, add and manage an external Cluster.
There can be a single ArgoCD instance for all clusters or multiple instances installed:
We are assuming the binbash Leverage Landing Zone is deployed, two accounts called shared and apps-devstg were created and a region us-east-1 is being used. In any case you can adapt these examples to other scenarios.
Note all the argocd namespace's ServiceAccounts were added to oidc_fully_qualified_subjects (because different ArgoCD components use different SAs), and they will be capable of assuming the role ${local.environment}-argocd-devstg (since we are working in shared, the role will be shared-argocd-devstg).
This role lives in shared account.
Apply the layer:
leverage tf apply\n
Info
Note this step creates a role and binds it to the in-cluster serviceaccounts.
"},{"location":"user-guide/cookbooks/argocd-external-cluster/#create-the-target-role-and-change-the-aws_auth-config-map","title":"Create the target role and change the aws_auth config map","text":"
Info
This has to be done in apps-devstg account.
"},{"location":"user-guide/cookbooks/argocd-external-cluster/#create-the-role","title":"Create the role","text":"
Go into the apps-devstg/global/base-identities layer.
In file roles.tf add this resource:
module \"iam_assumable_role_argocd\" {\nsource = \"github.com/binbashar/terraform-aws-iam.git//modules/iam-assumable-role?ref=v4.1.0\"\n\ntrusted_role_arns = [\n\"arn:aws:iam::${var.accounts.shared.id}:root\"\n]\n\ncreate_role = true\nrole_name = \"ArgoCD\"\nrole_path = \"/\"\n\n #\n # MFA setup\n #\nrole_requires_mfa = false\nmfa_age = 43200 # Maximum CLI/API session duration in seconds between 3600 and 43200\nmax_session_duration = 3600 # Max age of valid MFA (in seconds) for roles which require MFA\ncustom_role_policy_arns = [\n]\n\ntags = local.tags\n}\n
Note MFA is deactivated since this is a programmatic access role. Also, no policies are added since we only need to assume it to access the cluster.
Apply the layer:
leverage tf apply\n
Info
This step will add a role that can be assumed from the shared account.
"},{"location":"user-guide/cookbooks/argocd-external-cluster/#update-the-aws_auth-config-map","title":"Update the aws_auth config map","text":"
cd into layer apps-devstg/us-east-1/k8s-eks/cluster.
Edit file locals.tf, under map_roles list add this:
"},{"location":"user-guide/cookbooks/argocd-external-cluster/#create-the-external-cluster-in-argocd","title":"Create the external cluster in ArgoCD","text":"
Info
This has to be done in shared account.
In shared/us-east-1/k8s-eks/k8s-components layer modify files cicd-argocd.tf and chart-values/argocd.yaml and add this to the first one:
##------------------------------------------------------------------------------\n## ArgoCD DEVSTG: GitOps + CD\n##------------------------------------------------------------------------------\nresource \"helm_release\" \"argocd_devstg\" {\ncount = var.enable_argocd_devstg ? 1 : 0\nname = \"argocd-devstg\"\nnamespace = kubernetes_namespace.argocd_devstg[0].id\nrepository = \"https://argoproj.github.io/argo-helm\"\nchart = \"argo-cd\"\nversion = \"6.7.3\"\nvalues = [\ntemplatefile(\"chart-values/argocd.yaml\", {\nargoHost = \"argocd-devstg.${local.environment}.${local.private_base_domain}\"\ningressClass = local.private_ingress_class\nclusterIssuer = local.clusterissuer_vistapath\nroleArn = data.terraform_remote_state.eks-identities.outputs.argocd_devstg_role_arn\nremoteRoleARN = \"role\"\nremoteClusterName = \"clustername\"\nremoteServer = \"remoteServer\"\nremoteName = \"remoteName\"\nremoteClusterCertificate = \"remoteClusterCertificate\"\n}),\n # We are using a different approach here because it is very tricky to render\n # properly the multi-line sshPrivateKey using 'templatefile' function\nyamlencode({\nconfigs = {\nsecret = {\nargocd_devstgServerAdminPassword = data.sops_file.secrets.data[\"argocd_devstg.serverAdminPassword\"]\n}\n # Grant Argocd_Devstg access to the infrastructure repo via private SSH key\nrepositories = {\nwebapp = {\nname = \"webapp\"\nproject = \"default\"\nsshPrivateKey = data.sops_file.secrets.data[\"argocd_devstg.webappRepoDeployKey\"]\ntype = \"git\"\nurl = \"git@github.com:VistaPath/webapp.git\"\n}\n}\n}\n # Enable SSO via Github\nserver = {\nconfig = {\nurl = \"https://argocd_devstg.${local.environment}.${local.private_base_domain}\"\n\"dex.config\" = data.sops_file.secrets.data[\"argocd_devstg.dexConfig\"]\n}\n}\n})\n]\n}\n
This is a simpler method than the previous one, but it is also less secure.
It uses a bearer token, which should be rotated periodically (either manually or with a custom process).
Given this diagram:
ArgoCD will call the target cluster directly using the bearer token as authentication.
So, these are the steps:
create a ServiceAccount and its token in the target cluster
create the external cluster in the source cluster's ArgoCD
"},{"location":"user-guide/cookbooks/argocd-external-cluster/#create-the-serviceaccount","title":"Create the ServiceAccount","text":"
Info
This has to be done in apps-devstg account.
There are two ways to grant access: cluster level or namespace scoped.
If namespace scoped, a ServiceAccount, a Role and a RoleBinding are needed to grant ArgoCD access to the target cluster. If cluster level, a ServiceAccount, a ClusterRole and a ClusterRoleBinding are needed instead. The former requires the namespaces to be created beforehand; the latter allows ArgoCD to create the namespaces.
In the target cluster identities layer at apps-devstg/us-east-1/k8s-eks/identities create a tf file and add this:
The following example uses the namespace-scoped approach.
This step will create a ServiceAccount, a Role with the needed permissions, the RoleBinding and the secret with the token (or a ClusterRole and ClusterRoleBinding for the cluster-level approach). Also, multiple namespaces can be specified when using the namespace-scoped approach.
To recover the token and the API Server run this:
NAMESPACE=test\nSECRET=$(leverage kubectl get secret -n ${NAMESPACE} -o jsonpath='{.items[?(@.metadata.generateName==\\\"argocd-managed-\\\")].metadata.name}' | sed -E '/^\\[/d')\nTOKEN=$(leverage kubectl get secret ${SECRET} -n ${NAMESPACE} -o jsonpath='{.data.token}' | sed -E '/^\\[/d' | base64 --decode)\nAPISERVER=$(leverage kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}' | sed -E '/^\\[/d')\n
"},{"location":"user-guide/cookbooks/argocd-external-cluster/#create-the-external-cluster-in-argocd_1","title":"Create the external cluster in ArgoCD","text":"
Info
This has to be done in shared account.
In shared/us-east-1/k8s-eks/k8s-components layer modify files cicd-argocd.tf and chart-values/argocd.yaml and add this to the first one:
##------------------------------------------------------------------------------\n## ArgoCD DEVSTG: GitOps + CD\n##------------------------------------------------------------------------------\nresource \"helm_release\" \"argocd_devstg\" {\ncount = var.enable_argocd_devstg ? 1 : 0\nname = \"argocd-devstg\"\nnamespace = kubernetes_namespace.argocd_devstg[0].id\nrepository = \"https://argoproj.github.io/argo-helm\"\nchart = \"argo-cd\"\nversion = \"6.7.3\"\nvalues = [\ntemplatefile(\"chart-values/argocd.yaml\", {\nargoHost = \"argocd-devstg.${local.environment}.${local.private_base_domain}\"\ningressClass = local.private_ingress_class\nclusterIssuer = local.clusterissuer_vistapath\nroleArn = data.terraform_remote_state.eks-identities.outputs.argocd_devstg_role_arn\nremoteServer = \"remoteServer\"\nremoteName = \"remoteName\"\nremoteClusterCertificate = \"remoteClusterCertificate\"\nbearerToken = \"bearerToken\"\n}),\n # We are using a different approach here because it is very tricky to render\n # properly the multi-line sshPrivateKey using 'templatefile' function\nyamlencode({\nconfigs = {\nsecret = {\nargocd_devstgServerAdminPassword = data.sops_file.secrets.data[\"argocd_devstg.serverAdminPassword\"]\n}\n # Grant Argocd_Devstg access to the infrastructure repo via private SSH key\nrepositories = {\nwebapp = {\nname = \"webapp\"\nproject = \"default\"\nsshPrivateKey = data.sops_file.secrets.data[\"argocd_devstg.webappRepoDeployKey\"]\ntype = \"git\"\nurl = \"git@github.com:VistaPath/webapp.git\"\n}\n}\n}\n # Enable SSO via Github\nserver = {\nconfig = {\nurl = \"https://argocd_devstg.${local.environment}.${local.private_base_domain}\"\n\"dex.config\" = data.sops_file.secrets.data[\"argocd_devstg.dexConfig\"]\n}\n}\n})\n]\n}\n
clusterResources false is set so that ArgoCD is prevented from managing cluster-level resources.
namespaces limits the namespaces in which ArgoCD can deploy resources.
Apply the layer:
leverage tf apply\n
Info
This step will create the external-cluster configuration for ArgoCD. Now you can see the cluster in the ArgoCD web UI.
"},{"location":"user-guide/cookbooks/argocd-external-cluster/#deploying-stuff-to-the-target-cluster","title":"Deploying stuff to the target cluster","text":"
To deploy an App to a given cluster, these lines have to be added to the manifest:
"},{"location":"user-guide/cookbooks/enable-nat-gateway/","title":"Enable nat-gateway using binbash Leverage","text":""},{"location":"user-guide/cookbooks/enable-nat-gateway/#goal","title":"Goal","text":"
To activate the NAT Gateway in a VPC created using binbash Leverage Landing Zone.
We are assuming the binbash Leverage Landing Zone is deployed, an account called apps-devstg was created and a region us-east-1 is being used. In any case you can adapt these examples to other scenarios.
If you called the layer something other than this, please set the right dir here.
Check a file called terraform.auto.tfvars exists. If it does not, create it.
Edit the file and set this content:
vpc_enable_nat_gateway = true\n
Apply the layer as usual:
leverage tf apply\n
"},{"location":"user-guide/cookbooks/enable-nat-gateway/#how-to-disable-the-nat-gateway","title":"How to disable the nat gateway","text":"
Do the same as before but setting this in the tfvars file:
vpc_enable_nat_gateway = false\n
"},{"location":"user-guide/cookbooks/k8s/","title":"Kubernetes for different stages of your projects","text":""},{"location":"user-guide/cookbooks/k8s/#goal","title":"Goal","text":"
When starting a project using Kubernetes, usually a lot of testing is done.
Also, as a startup, the project is trying to save costs (since probably no clients, or just a few, are using the product yet).
To achieve this, we suggest the following path:
Step 0 - develop in a K3s running on an EC2
Step 1 - starting stress testing or having the first clients, go for KOPS
Step 2 - when HA, scaling and ease of management are needed, consider going to EKS
We are assuming the binbash Leverage Landing Zone is deployed, an account called apps-devstg was created and a region us-east-1 is being used. In any case you can adapt these examples to other scenarios.
A gossip cluster (i.e. not exposed to the Internet; an Internet-exposed cluster can be created using Route53) with a master node and a worker node (with node autoscaling capabilities) will be deployed here.
More master nodes can be deployed (e.g. one per AZ; three are recommended for production-grade clusters).
It will be something similar to what is stated here, but with one master, one worker, and the LB for the API in the private network.
We are assuming here the worker Instance Group is called nodes. If you change the name or have more than one Instance Group you need to adapt the first tag.
Info
Note a DNS is not needed since this will be a gossip cluster.
Info
A new bucket is created so KOPS can store the state there
By default, the account base network is used. If you want to change this check/modify this resource in config.tf file:
data \"terraform_remote_state\" \"vpc\" {\n
Also, shared VPC will be used to allow income traffic from there. This is because in the binbash Leverage Landing Zone defaults, the VPN server will be created there.
cd into the 1-prerequisites directory.
Open the locals.tf file.
Here these items can be updated:
versions
machine types (and max, min qty for masters and workers autoscaling groups)
the number of AZs that will be used for master nodes.
Remember binbash Leverage has its rules for this, the key name should match <account-name>/[<region>/]<layer-name>/<sublayer-name>/terraform.tfstate.
Init and apply as usual:
leverage tf init\nleverage tf apply\n
Warning
You will be prompted to enter the ssh_pub_key_path. Here enter the full path (e.g. /home/user/.ssh/thekey.pub) for your public SSH key and hit enter. A key managed by KMS can be used here. A regular key-in-a-file is used for this example, but you can change it as per your needs.
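If you don't have a key pair yet, a minimal sketch for generating one (the key type and path are just examples, matching the path mentioned above):
ssh-keygen -t rsa -b 4096 -f ~/.ssh/thekey\n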
Info
Note if for some reason the nat-gateway changes, this layer has to be applied again.
Info
Note the role AWSReservedSSO_DevOps (the one created in the SSO for Devops) is added as system:masters. If you want to change the role, check the devopsrole in data.tf file.
"},{"location":"user-guide/cookbooks/k8s/#2-apply-the-cluster-with-kops","title":"2 - Apply the cluster with KOPS","text":"
cd into the 2-kops directory.
Open the config.tf file and edit the backend key if needed:
Remember binbash Leverage has its rules for this, the key name should match <account-name>/[<region>/]<layer-name>/<sublayer-name>/terraform.tfstate.
Info
If you want to check the configuration:
make cluster-template\n
The final template in file cluster.yaml.
If you are happy with the config (or you are not happy but you think the file is ok), let's create the Terraform files!
make cluster-update\n
Finally, apply the layer:
leverage tf init\nleverage tf apply\n
Cluster can be checked with this command:
make kops-cmd KOPS_CMD=\"validate cluster\"\n
"},{"location":"user-guide/cookbooks/k8s/#accessing-the-cluster","title":"Accessing the cluster","text":"
Here there are two questions.
One is how to expose the cluster so Apps running in it can be reached.
The other one is how to access the cluster's API.
For the first one:
since this is a `gossip-cluster` and as per the KOPS docs: When using gossip mode, you have to expose the kubernetes API using a loadbalancer. Since there is no hosted zone for gossip-based clusters, you simply use the load balancer address directly. The user experience is identical to standard clusters. kOps will add the ELB DNS name to the kops-generated kubernetes configuration.\n
So, we need to create a LB with public access.
For the second one, we need to access the VPN (we have set the access to the used network previously), and hit the LB. With the cluster, a Load Balancer was deployed so you can reach the K8s API.
"},{"location":"user-guide/cookbooks/k8s/#access-the-api","title":"Access the API","text":"
Run:
make kops-kubeconfig\n
A file named as the cluster is created with the kubeconfig content (admin user, so keep it safe). So export it and use it!
export KUBECONFIG=$(pwd)/clustername.k8s.local\nkubectl get ns\n
Warning
You have to be connected to the VPN to reach your cluster!
"},{"location":"user-guide/cookbooks/schedule-start-stop-ec2/","title":"Start/Stop EC2/RDS instances using schedule or manual endpoint","text":""},{"location":"user-guide/cookbooks/schedule-start-stop-ec2/#what","title":"What?","text":"
You have EC2 instances (or RDS) that are not being used all the time... so why keep them up, running, and billing? Here we'll create a simple schedule to turn them off/on (plus an HTTP endpoint to do it manually).
In your binbash Leverage infra repository, under your desired account and region, copy this layer.
You can download a directory from a git repository using this Firefox addon or any method you want.
Remember, if the common-variables.tf file was copied as a regular file, delete it and soft-link it to the homonymous file in the root config dir: e.g. common-variables.tf -> ../../../config/common-variables.tf
"},{"location":"user-guide/cookbooks/schedule-start-stop-ec2/#set-the-tags","title":"Set the tags","text":"
In the tools-cloud-scheduler-stop-start layer edit the main.tf file. There are two resources:
schedule_ec2_stop_daily_midnight to stop the instances
schedule_ec2_start_daily_morning to start the instances
You can change these names. If you do so remember to change all the references to them.
In the resource_tags element set the right tags. E.g. this:
in the schedule_ec2_stop_daily_midnight resource means this resource will stop instances with tag: ScheduleStopDaily=true."},{"location":"user-guide/cookbooks/schedule-start-stop-ec2/#set-the-schedule","title":"Set the schedule","text":"
Here you can set the schedule in a cron-like fashion.
If it is none it won't create a schedule (e.g. if you only need http endpoint):
cloudwatch_schedule_expression = \"none\"\n
Then if you set this:
http_trigger = true\n
A HTTP endpoint will be created to trigger the corresponding action.
If an endpoint was created then in the outputs the URL will be shown.
"},{"location":"user-guide/cookbooks/sops-kms/","title":"Encrypt and decrypt SOPS files with AWS KMS","text":""},{"location":"user-guide/cookbooks/sops-kms/#goal","title":"Goal","text":"
Using a SOPS file to store secrets in the git repository.
We are assuming the binbash Leverage Landing Zone is deployed, an account called apps-devstg was created and a region us-east-1 is being used. In any case you can adapt these examples to other scenarios.
"},{"location":"user-guide/cookbooks/sops-kms/#encrypt-the-file","title":"Encrypt the file","text":"
Note for encrypting you need to specify an AWS Profile. In the binbash Leverage context profiles are like this: {short-project-name}-{account}-{role}. For example, for my apps-devstg account, using the role devops, in my project bb, the profile is: bb-apps-devstg-devops.
Since the binbash Leverage Landing Zone is being used, the default key for the account+region has an alias: ${var.project}_${var.environment}_${var.kms_key_name}_key, in this case bb_apps-devstg_default_key, so arn:aws:kms:<region>:<account>:alias/bb_apps-devstg_default_key should be used.
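Putting both together, a hedged example of encrypting and later decrypting a secrets file (the profile, account id and file names are placeholders; sops records the KMS key in the file's metadata, so decryption only needs the same profile to be available):
# Encrypt (placeholders: profile, account id, file names)\nAWS_PROFILE=bb-apps-devstg-devops sops --encrypt --kms arn:aws:kms:us-east-1:<account-id>:alias/bb_apps-devstg_default_key secrets.yaml > secrets.enc.yaml\n# Decrypt to stdout\nAWS_PROFILE=bb-apps-devstg-devops sops --decrypt secrets.enc.yaml\n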
Info
To use this file with Terraform, edit the secrets.enc.yaml and at the bottom, edit the line with aws_profile and set there the AWS Profile you've used to encrypt the file.
"},{"location":"user-guide/cookbooks/sops-kms/#decrypt-the-file","title":"Decrypt the file","text":"
"},{"location":"user-guide/infra-as-code-library/infra-as-code-library-forks/","title":"Leverage Open Source Modules management.","text":"
We\u2019ll fork every Infrastructure as Code (IaC) Library dependency repo. Why?
Grant full governance over the lib repositories
Availability: Because our project's resilience and continuity (including the clients') depend on these repositories (via requirement files or imports) and we want and need total control over the repository used as a dependency. NOTE: There could be a few exceptions when using official open source modules makes sense, e.g. the ones shared and maintained by Nginx, Weave, HashiCorp, etc.
Reliability (avoid unforeseen events): in the event that the original project becomes discontinued while we are still working with or depending on it (the owners, generally individual maintainers of the original repository, might decide to move away from GitHub, Ansible Galaxy, etc., or even close their repo for personal reasons).
Stability: Our forked modules (Ansible roles / Terraform / Dockerfiles, etc.) are always locked to fixed versions for every client, so no unexpected behavior will occur.
Projects that don't tag versions: having the fork protects us against breaking changes.
Write access: to every Leverage library component repository ensuring at all times that we can support, update, maintain, test, customize and release a new version of this component.
Centralized Org source of truth: for an improved customer experience and for keeping dependencies consistently imported from binbash repos at Leverage GitHub.
Scope: binbash grants and responds for all these dependencies.
Metrics: Dashboards w/ internal measurements.
Automation: We'll keep this cross-tech workflow as standardized and automated as possible, adding any extra validation such as testing, security checks, etc., if needed -> Leverage dev-tools
License & Ownership: Since we fork open-source and commercially reusable components with MIT and Apache 2.0 licenses, we keep full rights to all commercial, modification, distribution, and private use of the code (no lock-in with owners) through forks inside our own Leverage Project repos. As a result, when the time comes, we can make our libs private at any moment if necessary (for the time being, Open Source looks like the best option).
Collaborators considerations
We look forward to having every binbash Leverage repo open sourced, favoring collaboration with the open source community.
Repos that are still private must not be forked by our internal collaborators until we've done a detailed and rigorous review in order to open source them.
As a result, anyone looking to use, extend or update Leverage public repos can also fork them into their personal or company GitHub account and create an upstream PR to contribute.
"},{"location":"user-guide/infra-as-code-library/infra-as-code-library-specs/","title":"Tech Specifications","text":"As Code: Hundred of thousands lines of code
Written in:
Terraform
Groovy (Jenkinsfiles)
Ansible
Makefiles + Bash
Dockerfiles
Helm Charts
Stop reinventing the wheel, automated and fully as code
automated (executable from a single source).
as code.
parameterized
variables
input parameters
return / output parameters
\"Stop reinventing the wheel\"
avoid re-building the same things more than X times.
avoid wasting time.
not healthy, not secure and slows us down.
DoD of highly reusable, configurable, and composable sub-modules
Which will be 100%
modular
equivalent to functions in other programming languages - example for Terraform: https://www.terraform.io/docs/modules/usage.html (but applicable to other languages and tools):
inputs, outputs parameters.
code reuse (reusable): consider tf modules and sub-modules approach.
testable by module / function.
Since TF is oriented to work through 3rd-party API calls, tests are more likely to be integration tests rather than unit tests. If we don't allow integration tests for Terraform, then we can't test at all.
This has to be analyzed for every language we'll be using and how we implement it (Terraform, CloudFormation, Ansible, Python, Bash, Docker, kops and k8s kubectl cmds).
composition (composable): have multiple functions and use them together
abstraction (abstract away complexity): we have a very complex function but we only expose its definition to the API, e.g.: def_ai_processing(data_set){very complex algorithm here}; ai_processing([our_data_set_here])
avoid inline blocks: The configuration for some Terraform resources can be defined either as inline blocks or as separate resources. For example, the aws_route_table resource allows you to define routes via inline blocks. But by doing so, your module becomes less flexible and configurable. Also, if a mix of both inline blocks and separate resources is used, errors may arise in which they conflict and overwrite each other. Therefore, you must use one or the other (ref: https://blog.gruntwork.io/how-to-create-reusable-infrastructure-with-terraform-modules-25526d65f73d). As a rule of thumb, when creating a module, separate resources should always be used.
use module-relative paths: The catch is that the file path used has to be relative (since you could run Terraform on many different computers), but relative to what? By default, Terraform interprets the path as relative to the working directory. That's a good default for normal Terraform templates, but it won't work if the file is part of a module. To solve this issue, always use a path variable (e.g. path.module) in file paths.
This allows us to manage them as a software product with releases and a changelog. This way we'll know which version is currently deployed for a given client and can consider upgrading it.
Env Parity
Promote immutable, versioned infra modules used across envs.
Updated
Continuously perform updates, additions, and fixes to libraries and modules.
Orchestrated in automation
We use the leverage-cli for this purpose
Proven & Tested
Every commit goes through a suite of automated tests to enforce code styling and functional testing.
Develop wrappers/jobs together with specific testing tools in order to ensure the modules work as expected.
Ansible:
Testing your ansible roles w/ molecule
How to test ansible roles with molecule on ubuntu
Terraform:
gruntwork-io/terratest
Cost savings by design
The architecture of our Library / Code Modules helps an organization analyze its current IT and DevSecOps Cloud strategy and identify areas where changes could lead to cost savings. For instance, the architecture may show that multiple database systems could be consolidated so only one product is used, reducing software and support costs. It also provides a basis for reuse: the process of architecting supports both the use and creation of reusable assets, which are beneficial for an organization since they reduce the overall cost of a system and improve its quality, having already been proven.
Full Code Access & No Lock-In
You get access to 100% of the code under an Open Source license; if you choose to discontinue direct support from the binbash Leverage team, you keep the rights to all the code.
Documented
Includes code examples, use cases and thorough documentation, such as README.md, --help command, doc-strings and inline comments.
Supported & Customizable
Commercially maintained and supported by binbash.
"},{"location":"user-guide/infra-as-code-library/modules-library-by-technology/","title":"Modules by Technology","text":""},{"location":"user-guide/infra-as-code-library/modules-library-by-technology/#open-source-modules-repos","title":"Open Source Modules Repos","text":"Category URLs Ansible Galaxy Roles bb-leverage-ansible-roles-list Dockerfiles bb-leverage-dockerfiles-list Helm Charts bb-leverage-helm-charts-list Terraform Modules bb-leverage-terraform-modules-list"},{"location":"user-guide/infra-as-code-library/modules-library-by-technology/#open-source-private-modules-repos-via-github-teams","title":"Open Source + Private Modules Repos (via GitHub Teams)","text":"Repositories Details Reference Architecture Most of the AWS resources are here, divided by account. Dockerfiles These are Terraform module we created/imported to build reusable resources / stacks. Ansible Playbooks & Roles Playbooks we use for provisioning servers such as Jenkins, Spinnaker, Vault, and so on. Helm Charts Complementary Jenkins pipelines to clean docker images, unseal Vault, and more. Also SecOps jobs can be found here. Terraform Modules Jenkins pipelines, docker images, and other resources used for load testing."},{"location":"user-guide/infra-as-code-library/overview/","title":"Infrastructure as Code (IaC) Library","text":""},{"location":"user-guide/infra-as-code-library/overview/#overview","title":"Overview","text":"
A collection of reusable, tested, production-ready E2E infrastructure as code solutions, leveraged by modules written in Terraform, Ansible, Dockerfiles, Helm charts and Makefiles.
To view a list of all the available commands and options in your current Leverage version simply run leverage or leverage --help. You should get an output similar to this:
$ leverage\nUsage: leverage [OPTIONS] COMMAND [ARGS]...\n\n Leverage Reference Architecture projects command-line tool.\n\nOptions:\n -f, --filename TEXT Name of the build file containing the tasks\n definitions. [default: build.py]\n-l, --list-tasks List available tasks to run.\n -v, --verbose Increase output verbosity.\n --version Show the version and exit.\n --help Show this message and exit.\n\nCommands:\n aws Run AWS CLI commands in a custom containerized environment.\n credentials Manage AWS cli credentials.\n kc Run Kubectl commands in a custom containerized environment.\n kubectl Run Kubectl commands in a custom containerized environment.\n project Manage a Leverage project.\n run Perform specified task(s) and all of its dependencies.\n shell Run a shell in a generic container.\n terraform Run Terraform commands in a custom containerized...\n tf Run Terraform commands in a custom containerized...\n tfautomv Run TFAutomv commands in a custom containerized...\n
Similarly, subcommands provide further information by means of the --help flag. For example leverage tf --help.
-f | --filename: Name of the file containing the tasks' definition. Defaults to build.py
-l | --list-tasks: List all the tasks defined for the project along a description of their purpose (when available).
Tasks in build file `build.py`:\n\n clean Clean build directory.\n copy_file \n echo \n html Generate HTML.\n images [Ignored] Prepare images.\n start_server [Default] Start the server\n stop_server \n\nPowered by Leverage 1.9.0\n
-v | --verbose: Increases output verbosity. When running a command in a container, the tool provides a description of the container's configuration before the execution. This is especially useful if the user needs to recreate Leverage's behavior by themselves.
Mapping of the host (Source) directories and files into the container (Target)
Command being executed (useful when trying to replicate Leverage's behavior by yourself)
"},{"location":"user-guide/leverage-cli/history/","title":"A bit of history","text":""},{"location":"user-guide/leverage-cli/history/#how-leverage-cli-came-about","title":"How Leverage CLI came about","text":"
The multiple tools and technologies required to work with a Leverage project were initially handled through a Makefiles system. Not only to automate and simplify the different tasks, but also to provide a uniform user experience during the management of a project.
As a result of more and more features being added and the Leverage Reference Architecture becoming broader and broader, our Makefiles were growing large and becoming too repetitive, and thus, harder to maintain. Also, some limitations and the desire for a more friendly and flexible language than that of Makefiles made evident the need for a new tool to take their place.
Python, a language broadly adopted for automation due to its flexibility and a very gentle learning curve, seemed ideal. Even more so, Pynt, a package that provides the ability to define and manage tasks as simple Python functions, satisfied most of our requirements and thus was selected for the job. Some gaps still remained, but with minor modifications these were bridged.
Gradually, all capabilities originally implemented through Makefiles were migrated to Python as libraries of tasks that still resided within the Leverage Reference Architecture. But soon, the need to deliver these capabilities pre-packaged in a tool instead of embedded in the infrastructure definition became apparent, and they were re-implemented as built-in commands of Leverage CLI.
Currently, the core functionality needed to interact with a Leverage project is native to Leverage CLI, but a system for custom task definition and execution heavily inspired by that of Pynt is retained.
"},{"location":"user-guide/leverage-cli/installation/#update-leverage-cli-from-previous-versions","title":"Update Leverage CLI from previous versions","text":"
Upgrade to a specific version.
$ pip3 install -Iv leverage==1.9.1\n
Upgrade to the latest stable version
$ pip3 install --upgrade leverage\n
"},{"location":"user-guide/leverage-cli/installation/#verify-your-leverage-installation","title":"Verify your Leverage installation","text":"
Verify that your Leverage installation was successful by running
$ leverage --help\nUsage: leverage [OPTIONS] COMMAND [ARGS]...\n\n Leverage Reference Architecture projects command-line tool.\n\nOptions:\n -f, --filename TEXT Name of the build file containing the tasks\n definitions. [default: build.py]\n-l, --list-tasks List available tasks to run.\n -v, --verbose Increase output verbosity.\n --version Show the version and exit.\n --help Show this message and exit.\n\nCommands:\n aws Run AWS CLI commands in a custom containerized environment.\n credentials Manage AWS cli credentials.\n kubectl Run Kubectl commands in a custom containerized environment.\n project Manage a Leverage project.\n run Perform specified task(s) and all of its dependencies.\n terraform Run Terraform commands in a custom containerized...\n tf Run Terraform commands in a custom containerized...\n tfautomv Run TFAutomv commands in a custom containerized...\n
"},{"location":"user-guide/leverage-cli/installation/#installation-in-an-isolated-environment","title":"Installation in an isolated environment","text":"
If you prefer not to install the Leverage package globally and would like to limit its influence to only the directory of your project, we recommend using tools like Pipenv or Poetry. These tools are commonly used when working with python applications and help manage common issues that may result from installing and using such applications globally.
Leverage CLI is the tool used to manage and interact with any Leverage project.
It transparently handles the most complex and error prone tasks that arise from working with a state-of-the-art infrastructure definition like our Leverage Reference Architecture. Leverage CLI uses a dockerized approach to encapsulate the tools needed to perform such tasks and to free the user from having to deal with the configuration and management of said tools.
"},{"location":"user-guide/leverage-cli/private-repositories/","title":"Private Repositories","text":""},{"location":"user-guide/leverage-cli/private-repositories/#working-with-terraform-modules-in-private-repos","title":"Working with Terraform modules in private repos","text":"
If the layer is using a module from a private repository, read the following. E.g.:
where gitlab.com:some-org/some-project/the-private-repo.git is a private repo."},{"location":"user-guide/leverage-cli/private-repositories/#ssh-accessed-repository","title":"SSH accessed repository","text":"
To source a Terraform module from a private repository in a layer via an SSH connection these considerations have to be kept in mind.
Leverage CLI will mount the host's SSH-Agent socket into the Leverage Toolbox container; this way your keys are accessed in a secure way.
So, if a private SSH repository has to be accessed, the corresponding keys need to be loaded into the SSH-Agent.
If the agent is started automatically and the needed keys are added in the host system, it should work as-is.
These steps should be followed otherwise:
start the SSH-Agent:
$ eval \"$(ssh-agent -s)\"\n
add the keys to it
$ ssh-add ~/.ssh/<private_ssh_key_file>\n
(replace private_ssh_key_file with the desired file; the process may ask for the passphrase if one was set when the key was created)
"},{"location":"user-guide/leverage-cli/private-repositories/#using-the-ssh-config-file-to-specify-the-key-that-must-be-used-for-a-given-host","title":"Using the SSH config file to specify the key that must be used for a given host","text":"
The ssh-agent socket is not always available on every OS (e.g. on Mac). So now our leverage terraform init command copies the SSH config file (and the whole .ssh directory) into the container volume, which means any custom configuration you have there will be used. You can read more in the official SSH documentation.
If, for example, you need to use a custom key for your private repositories on gitlab, you could add a block to your ssh config file, specifying:
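A minimal sketch of such a block, added from the shell (the key file name gitlab_custom_key is an assumption; adjust it to your setup):
# tell SSH which key to use for gitlab.com
cat >> ~/.ssh/config <<'EOF'
Host gitlab.com
  User git
  IdentityFile ~/.ssh/gitlab_custom_key
  IdentitiesOnly yes
EOF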
When launching a Terraform shell, Leverage provides the user with a completely isolated environment tailored to operate in the current project via a Docker container.
The whole project is mounted on a directory named after the value for project_long in the global configuration file, or simply named \"project\" if this value is not defined. A project named myexample would be mounted in /myexample.
The user's .gitconfig file is also mounted on /etc/gitconfig for convenience, while (if ssh-agent is running) the socket stated in SSH_AUTH_SOCK is mounted on /ssh-agent. Also, the credentials files (credentials and config) found in the project's AWS credentials directory (~/.aws/myexample) are mapped to the locations given by the environment variables AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE, respectively, within the container.
Determining which credentials are needed to operate on a layer, and retrieving those credentials, may prove cumbersome for many complex layer definitions. In addition to that, correctly configuring them can also become a tedious and error-prone process. For that reason, Leverage automates this process upon launching the shell if requested by the user via the shell command options.
Bear in mind, that an authenticated shell session's credentials are obtained for the layer in which the session was launched. These credentials may not be valid for other layers in which different roles need to be assumed or require more permissions.
If authentication via SSO is required, the user will need to configure or login into SSO before launching the shell via
leverage terraform shell --sso\n
"},{"location":"user-guide/leverage-cli/shell/#operations-on-the-projects-layer","title":"Operations on the project's layer","text":"
In order to operate in a project's layer, Terraform commands such as plan or apply will need to receive extra parameters providing the location of the files that contain the definition of the variables required by the layer. Usually, these files are:
the project global configuration file common.tfvars
the account configuration file account.tfvars
the terraform backend configuration file backend.tfvars
In this case these parameters should take the form:
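A minimal sketch of what this looks like inside the shell, assuming a project mounted at /myexample and the apps-devstg account (paths are illustrative):
terraform plan \
  -var-file=/myexample/config/common.tfvars \
  -var-file=/myexample/apps-devstg/config/account.tfvars \
  -var-file=/myexample/apps-devstg/config/backend.tfvars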
"},{"location":"user-guide/leverage-cli/extending-leverage/build.env/","title":"The build.env file","text":""},{"location":"user-guide/leverage-cli/extending-leverage/build.env/#override-defaults-via-buildenv-file","title":"Override defaults via build.env file","text":"
By utilizing the build.env capability, you can easily change some default behaviors of the CLI. In the binbash Leverage™ Ref Architecture you will find a build.env example (a sketch is shown below). This allows you to specify several configurations for the CLI, such as the Leverage-Toolbox-Image you want to use, ensuring that you are using the latest version or a specific version that you prefer based on your compatibility requirements. This helps you avoid compatibility issues and ensures that your infrastructure deployments go smoothly.
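A minimal sketch of such a file (the variable names and image tag below are assumptions; check the build.env shipped with your Ref Architecture version for the exact keys):
# build.env
PROJECT=bb
TERRAFORM_IMAGE_TAG=1.3.5-0.2.0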
Customizing or extending the leverage-toolbox docker image
You can locally copy and edit the Dockerfile in order to rebuild it based on your needs, e.g. for a Dockerfile placed in the current working directory: $ docker build -t binbash/leverage-toolbox:1.2.7-0.1.4 --build-arg TERRAFORM_VERSION='1.2.7' . In case you'd like these changes to be permanent, please consider creating and submitting a PR.
The leverage CLI has an environment variable loading utility that will load all .env files with the given name in the current directory and all of its parents up to the repository root directory, and store them in a dictionary. Files are traversed from parent to child so as to allow values in deeper directories to override possible previously existing values. Note that all files must bear the same name, which in our case defaults to \"build.env\". So you can have multiple build.env files that will be processed by the leverage CLI in the context of a specific layer of a Reference Architecture project, for example the /le-tf-infra-aws/apps-devstg/us-east-1/k8s-kind/k8s-resources/build.env file.
"},{"location":"user-guide/leverage-cli/extending-leverage/how-to-extend/","title":"Extending & Configuring leverage CLI","text":""},{"location":"user-guide/leverage-cli/extending-leverage/how-to-extend/#override-defaults-via-buildenv-file","title":"Override defaults via build.env file","text":"
By utilizing the build.env capability, you can easily change some default behaviors of the CLI. This allows you to specify several configurations for the CLI, such as the Leverage-Toolbox-Image that you want to use, ensuring that you are using the latest version or a specific version that you prefer based on your compatibility requirements. This helps you avoid compatibility issues and ensures that your infrastructure deployments go smoothly.
Read More about build.env
In order to further understand this mechanism and how to use it please visit the dedicated build.env entry.
Using additional .tfvars configuration files at the account level or at the global level will allow you to extend your terraform configuration entries. Consider that using multiple .tfvars configuration files allows you to keep your configuration entries well-organized. You can have separate files for different accounts or environments, making it easy to manage and maintain your infrastructure. This also makes it easier for other team members to understand and work with your configuration, reducing the risk of misconfigurations or errors.
Read More about .tfvars config files
In order to further understand this mechanism and how to use it please visit the dedicated .tfvars configs entry.
"},{"location":"user-guide/leverage-cli/extending-leverage/how-to-extend/#custom-tasks-with-buildpy","title":"Custom tasks with build.py","text":"
Leverage CLI has a native mechanism to allow customizing your workflow. With the custom tasks feature using build.py, you can write your own tasks using Python, tailoring the CLI to fit your specific workflow. This allows you to automate and streamline your infrastructure deployments, reducing the time and effort required to manage your infrastructure. You can also easily integrate other tools and services into your workflow to further improve your productivity.
Read More about build.py custom tasks
In order to further understand this mechanism and how to use it please visit the dedicated build.py custom tasks entry.
"},{"location":"user-guide/leverage-cli/extending-leverage/how-to-extend/#fork-collaborate-and-improve","title":"Fork, collaborate and improve","text":"
By forking the leverage repository on GitHub and contributing to the project, you have the opportunity to make a positive impact on the product and the community. You can fix bugs, implement new features, and contribute your ideas and feedback. This helps to ensure that the product continues to evolve and improve, serving the needs of the community and making infrastructure deployments easier for everyone.
Read More about contributing with the project
In order to further understand this mechanism and how to use it please visit the dedicated CONTRIBUTING.md entry.
The same way we needed to automate or simplify certain tasks or jobs for the user, you may need to do the same in your project.
Leverage CLI does not limit itself to provide only the core functionality required to create and manage your Leverage project, but also allows for the definition of custom tasks, at the build.py root context file, that can be used to add capabilities that are outside of Leverage CLI's scope.
By implementing new auxiliary Leverage tasks you can achieve consistency and homogeneity in the experience of the user when interacting with your Leverage project and simplify the usage of any other tool that you may require.
To check some common included tasks please see here
Tasks are simple python functions that are marked as such with the use of the @task() decorator. We call the file where all tasks are defined a 'build script', and by default it is assumed to be named build.py. If you use any other name for your build script, you can let Leverage know through the global option --filename.
from leverage import task\n\n@task()\ndef copy_file(src, dst):\n\"\"\"Copy src file to dst\"\"\"\n print(f\"Copying {src} to {dst}\")\n
The contents of the task's docstring are used to provide a short description of the task's purpose when listing all available tasks to run.
$ leverage --list-tasks\nTasks in build file `build.py`:\n\n copy_file Copy src file to dst\n\nPowered by Leverage 1.0.10\n
Any arguments that the task may receive are given when running the task. The syntax for passing arguments is similar to that of Rake; see the example below.
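For example, the copy_file task defined above could be invoked with its two arguments as follows (file names are illustrative):
leverage run copy_file[src.txt,dst.txt]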
The task decorator allows for the definition of dependencies. These are defined as positional arguments in the decorator itself. Multiple dependencies can be defined for each task.
from leverage import task\n@task()\ndef html(target=\".\"):\n\"\"\"Generate HTML.\"\"\"\n print(f\"Generating HTML in directory \\\"{target}\\\"\")\n\n@task()\ndef images():\n\"\"\"Prepare images.\"\"\"\n print(\"Preparing images...\")\n\n@task(html, images)\ndef start_server(host=\"localhost\", port=\"80\"):\n\"\"\"Start the server\"\"\"\n print(f\"Starting server at {host}:{port}\")\n
We can see how the task start_server depends on both html and images. This means that both html and images will be executed before start_server and in that same order.
$ leverage run start_server\n[09:34:54.848] [ build.py - \u279c Starting task html ]\nGenerating HTML in directory \".\"\n[09:34:54.851] [ build.py - \u2714 Completed task html ]\n[09:34:54.852] [ build.py - \u279c Starting task images ]\nPreparing images...\n[09:34:54.854] [ build.py - \u2714 Completed task images ]\n[09:34:54.855] [ build.py - \u279c Starting task start_server ]\nStarting server at localhost:80\n[09:34:54.856] [ build.py - \u2714 Completed task start_server ]\n
"},{"location":"user-guide/leverage-cli/extending-leverage/tasks/#ignoring-a-task","title":"Ignoring a task","text":"
If you find yourself in the situation where there's a task that many other tasks depend on, and you need to quickly remove it from the dependency chains of all those tasks, ignoring its execution is a very simple way to achieve that end without having to remove all definitions and references across the code.
To ignore or disable a task, simply set ignore to True in the task's decorator.
from leverage import task\n\n@task()\ndef html(target=\".\"):\n\"\"\"Generate HTML.\"\"\"\n print(f\"Generating HTML in directory \\\"{target}\\\"\")\n\n@task(ignore=True)\ndef images():\n\"\"\"Prepare images.\"\"\"\n print(\"Preparing images...\")\n\n@task(html, images)\ndef start_server(server=\"localhost\", port=\"80\"):\n\"\"\"Start the server\"\"\"\n print(f\"Starting server at {server}:{port}\")\n
$ leverage run start_server\n[09:38:32.819] [ build.py - \u279c Starting task html ]\nGenerating HTML in directory \".\"\n[09:38:32.822] [ build.py - \u2714 Completed task html ]\n[09:38:32.823] [ build.py - \u2933 Ignoring task images ]\n[09:38:32.824] [ build.py - \u279c Starting task start_server ]\nStarting server at localhost:80\n[09:38:32.825] [ build.py - \u2714 Completed task start_server ]\n
When listing the available tasks any ignored task will be marked as such.
$ leverage --list-tasks\nTasks in build file `build.py`:\n\n html Generate HTML.\n images [Ignored] Prepare images.\n start_server Start the server\n\nPowered by Leverage 1.0.10\n
Sometimes you may want to define auxiliary tasks that don't need to be shown as available to run by the user. For this scenario, you can make any task a private one. There are two ways to accomplish this: either by naming the task with an initial underscore (_) or by setting private to True in the task's decorator.
from leverage import task\n\n@task(private=True)\ndef clean():\n\"\"\"Clean build directory.\"\"\"\n print(\"Cleaning build directory...\")\n\n@task()\ndef _copy_resources():\n\"\"\"Copy resource files. This is a private task and will not be listed.\"\"\"\n print(\"Copying resource files\")\n\n@task(clean, _copy_resources)\ndef html(target=\".\"):\n\"\"\"Generate HTML.\"\"\"\n print(f\"Generating HTML in directory \\\"{target}\\\"\")\n\n@task(clean, _copy_resources, ignore=True)\ndef images():\n\"\"\"Prepare images.\"\"\"\n print(\"Preparing images...\")\n\n@task(html, images)\ndef start_server(host=\"localhost\", port=\"80\"):\n\"\"\"Start the server\"\"\"\n print(f\"Starting server at {host}:{port}\")\n
Private tasks will be executed, but not shown when tasks are listed.
$ leverage --list-tasks\nTasks in build file `build.py`:\n\n html Generate HTML.\n images Prepare images.\n start_server Start the server\n\nPowered by Leverage 1.0.10\n
If you have a task that is run much more often than the rest, it can get tedious to always pass the name of that task to the run command. Leverage allows for the definition of a default task to address this situation. This task is executed when no task name is given.
To define a default task, simply assign the already defined task to the special variable __DEFAULT__.
from leverage import task\n\n@task()\ndef html(target=\".\"):\n\"\"\"Generate HTML.\"\"\"\n print(f\"Generating HTML in directory \\\"{target}\\\"\")\n\n@task(ignore=True)\ndef images():\n\"\"\"Prepare images.\"\"\"\n print(\"Preparing images...\")\n\n@task(html, images)\ndef start_server(server=\"localhost\", port=\"80\"):\n\"\"\"Start the server\"\"\"\n print(f\"Starting server at {server}:{port}\")\n\n__DEFAULT__ = start_server\n
The default task is marked as such when listing all available tasks.
$ leverage --list-tasks\nTasks in build file `build.py`:\n\n html Generate HTML.\n images [Ignored] Prepare images.\n start_server [Default] Start the server\n\nPowered by Leverage 1.0.10\n
Build scripts are not only looked up in the current directory but also in all parent directories up to the root of the Leverage project. This makes it possible to launch tasks from any directory of the project as long as any parent of the current directory holds a build script.
Leverage CLI treats the directory in which the build script is found as a python package. This means that you can break up your build files into modules and simply import them into your main build script, encouraging modularity and code reuse.
Leverage CLI empowers you to create whole libraries of functionalities for your project. You can use it to better organize your tasks or implement simple auxiliary python functions.
As mentioned in the Organizing build scripts section, Leverage CLI treats the directory in which the main build script is located as a python package in order to allow importing of user defined python modules. If this directory contains a period (.) in its name, this will create issues for the importing process. This is because the period is used by python to separate subpackages from their parents.
For example, if the directory where the build script build.py is stored is named local.assets, at the time of loading the build script, python will try to locate local.build instead of locating local.assets.build and fail.
The same situation will arise from any other subdirectory in the project. When importing modules from those directories, they won't be found.
The simple solution to this is to avoid using periods when naming directories. If the build script is located in the project's root folder, this would also apply to that directory.
This task is aimed at helping determine the current layer's dependencies.
If the current layer is getting information from remote states in different layers, then these layers have to be run before the current layer; this is called a dependency.
To run this task, cd into the desired layer and run:
leverage run layer_dependency\n
This is a sample output:
\u276f leverage run layer_dependency\n[10:37:41.817] [ build.py - \u279c Starting task _checkdir ] [10:37:41.824] [ build.py - \u2714 Completed task _checkdir ] [10:37:41.825] [ build.py - \u279c Starting task layer_dependency ] \nNote layer dependency is calculated using remote states.\nNevertheless, other sort of dependencies could exist without this kind of resources,\ne.g. if you rely on some resource created in a different layer and not referenced here.\n{\n\"security\": {\n\"remote_state_name\": \"security\",\n \"account\": \"apps-devstg\",\n \"layer\": \"security-keys\",\n \"key\": \"apps-devstg/security-keys/terraform.tfstate\",\n \"key_raw\": \"${var.environment}/security-keys/terraform.tfstate\",\n \"usage\": {\n\"used\": true,\n \"files\": [\n\"/home/jdelacamara/Dev/work/BinBash/code/le-tf-infra-aws/apps-devstg/us-east-1/ec2-fleet-ansible --/ec2_fleet.tf\"\n]\n}\n},\n \"vpc\": {\n\"remote_state_name\": \"vpc\",\n \"account\": \"apps-devstg\",\n \"layer\": \"network\",\n \"key\": \"apps-devstg/network/terraform.tfstate\",\n \"key_raw\": \"${var.environment}/network/terraform.tfstate\",\n \"usage\": {\n\"used\": true,\n \"files\": [\n\"/home/jdelacamara/Dev/work/BinBash/code/le-tf-infra-aws/apps-devstg/us-east-1/ec2-fleet-ansible --/locals.tf\",\n \"/home/jdelacamara/Dev/work/BinBash/code/le-tf-infra-aws/apps-devstg/us-east-1/ec2-fleet-ansible --/ec2_fleet.tf\"\n]\n}\n},\n \"vpc-shared\": {\n\"remote_state_name\": \"vpc-shared\",\n \"account\": \"shared\",\n \"layer\": \"network\",\n \"key\": \"shared/network/terraform.tfstate\",\n \"key_raw\": \"shared/network/terraform.tfstate\",\n \"usage\": {\n\"used\": true,\n \"files\": [\n\"/home/jdelacamara/Dev/work/BinBash/code/le-tf-infra-aws/apps-devstg/us-east-1/ec2-fleet-ansible --/ec2_fleet.tf\"\n]\n}\n}\n}\n[10:37:41.943] [ build.py - \u2714 Completed task layer_dependency ]\n
Data:
\"remote_state_name\": the remote state name
\"account\": the account the remote state belongs to
\"layer\": the referenced layer
\"key\": the key name (i.e. the tfstate file name for the remote state)
\"key_raw\": the same as key but with variables not resolved
\"usage\": if this remote state is used and in what files
For a shorter version:
\u276f leverage run layer_dependency\\['summary=True'\\]\n[10:47:00.461] [ build.py - \u279c Starting task _checkdir ] [10:47:00.467] [ build.py - \u2714 Completed task _checkdir ] [ build.py - \u279c Starting task layer_dependency ] \nNote layer dependency is calculated using remote states.\nNevertheless, other sort of dependencies could exist without this kind of resources,\ne.g. if you rely on some resource created in a different layer and not referenced here.\n{\n\"this\": [\n\"apps-devstg/security-keys/terraform.tfstate\",\n \"apps-devstg/network/terraform.tfstate\",\n \"shared/network/terraform.tfstate\"\n]\n}\n[10:47:00.489] [ build.py - \u2714 Completed task layer_dependency ]
If you already have a binbash Leverage project created, you can download this file into your project root dir and add this import to your build.py:
The aws command is a wrapper for a containerized installation of AWS CLI 2.0. All commands are passed directly to the AWS CLI and you should expect the same behavior from all of them, except for the few exceptions listed below.
Extracts information from the project's Terraform configuration to generate the required profiles for AWS CLI to handle SSO.
In the process, you will need to log in via your identity provider. To allow you to do this, Leverage will attempt to open the login page in the system's default browser.
It wraps aws sso logout, taking extra steps to make sure that all tokens and temporary credentials are wiped from the system. It also reminds the user to log out from the AWS SSO login page and identity provider portal. This last action is left to the user to perform.
Important
Please keep in mind that this command will not only remove temporary credentials but also the AWS config file. If you use that file to store your own configuration, please create a backup before running the sso logout command.
The credentials command is used to set up and manage the AWS CLI credentials required to interact with the AWS environment.
All credentials subcommands feed off the project.yaml, build.env, and Terraform configuration files to obtain the information they need. In case the basic required information is not found, the subcommands will prompt the user for it.
The credentials configure command sets up the credentials needed to interact with the AWS environment, from the initial deployment process (BOOTSTRAP) to everyday management (MANAGEMENT) and development or use (SECURITY) of it.
It attempts to retrieve the structure of the organization in order to generate all the AWS CLI profiles required to interact with the environment and update the terraform configuration with the id of all relevant accounts.
Backups of the previous configured credentials files are always created when overwriting or updating the current ones.
--type: Type of the credentials to set. Can be any of BOOTSTRAP, MANAGEMENT or SECURITY. This option is case insensitive. This option is required.
--credentials-file: Path to a .csv credentials file, as produced by the AWS Console, containing the user's programmatic access keys. If not given, the user will be prompted for the credentials.
--fetch-mfa-device: Retrieve an MFA device serial from AWS for the current user.
--overwrite-existing-credentials: If the type of credentials being configured is already configured, overwrite current configuration. Mutually exclusive option with --skip-access-keys-setup.
--skip-access-keys-setup: Skip the access keys configuration step. Continue on to setting up the accounts profiles. Mutually exclusive option with --overwrite-existing-credentials.
--skip-assumable-roles-setup: Don't configure each account profile to assume their specific role.
If neither of --overwrite-existing-credentials or --skip-access-keys-setup is given, the user will be prompted to choose between both actions when appropriate.
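A minimal sketch of a first-time setup combining these options (the CSV path is illustrative):
# configure the BOOTSTRAP credentials from a CSV file downloaded from the AWS Console
leverage credentials configure --type BOOTSTRAP --credentials-file ~/Downloads/accessKeys.csv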
To have this feature available, Leverage Toolbox versions 1.2.7-0.1.7 and up, or 1.3.5-0.1.7 and up must be used.
The kubectl command is a wrapper for a containerized installation of kubectl. It provides the kubectl executable with specific configuration values required by Leverage.
It transparently handles authentication, whether it is Multi-Factor or via Single Sign-On, on behalf of the user in the commands that require it. SSO Authentication takes precedence over MFA when both are active.
The sub-commands can only be run at layer level and will not run anywhere else in the project. The configure sub-command can only be run at an EKS cluster layer (usually called cluster).
The command can also be invoked via its shortened version kc.
Configuring on first use
To start using this command, you must first run leverage kubectl configure on a cluster layer to set up the credentials in the proper config file.
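A minimal sketch of that first-use flow (the layer path is illustrative):
cd apps-devstg/us-east-1/k8s-eks/cluster   # an EKS cluster layer
leverage kubectl configure                 # sets up credentials in the proper config file
leverage kc get nodes                      # then use kubectl (or its kc alias) as usual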
The project init subcommand initializes a Leverage project in the current directory. If not found, it also initializes the global config directory for Leverage CLI ~/.leverage/, and fetches the template for the projects' creation.
It then proceeds to drop a template file for the project configuration called project.yaml and initializes a git repository in the directory.
The project create subcommand creates the files structure for the architecture in the current directory and configures it based on the values set in the project.yaml file.
It will then proceed to make sure all files follow the standard Terraform code style.
An arbitrary number of tasks can be given to the command. All tasks given must be in the form of the task name optionally followed by arguments that the task may require enclosed in square brackets, i.e. TASK_NAME[TASK_ARGUMENTS]. The execution respects the order in which they were provided.
If no tasks are given, the default task will be executed. In case no default task is defined, the command will list all available tasks to run.
Example:
leverage run task1 task2[arg1,arg2] task3[arg1,kwarg1=val1,kwarg2=val2]\n
task1 is invoked with no arguments, which is equivalent to task1[]
task2 receives two positional arguments arg1 and arg2
task3 receives one positional argument arg1 and two keyworded arguments kwarg1 with value val1 and kwarg2 with value val2
Run a shell in a generic container. It supports mounting local paths and injecting arbitrary environment variables. It also supports AWS credentials injection via mfa/sso.
>> leverage shell --help\n\nUsage: leverage shell [OPTIONS]\n\nRun a shell in a generic container. It supports mounting local paths and\n injecting arbitrary environment variables. It also supports AWS credentials\n injection via mfa/sso.\n\n Syntax: leverage shell --mount <local-path> <container-path> --env-var <name> <value>\n Example: leverage shell --mount /home/user/bin/ /usr/bin/ --env-var env dev\n\n Both mount and env-var parameters can be provided multiple times.\n Example: leverage shell --mount /home/user/bin/ /usr/bin/ --mount /etc/config.ini /etc/config.ini --env-var init 5 --env-var env dev\n\nOptions:\n --mount <TEXT TEXT>...\n --env-var <TEXT TEXT>...\n --mfa Enable Multi Factor Authentication upon launching shell.\n --sso Enable SSO Authentication upon launching shell.\n --help Show this message and exit.\n
The terraform command is a wrapper for a containerized installation of Terraform. It provides the Terraform executable with specific configuration values required by Leverage.
It transparently manages authentication, either Multi-Factor or Single Sign-On, on behalf of the user on commands that require it. SSO authentication takes precedence over MFA when both are active.
Some commands can only be run at layer level and will not run anywhere else in the project.
The command can also be invoked via its shortened version tf.
Since version 1.12, all the subcommands support --mount and --env-var parameters in the form of tuples:
leverage terraform --mount /home/user/bin/ /usr/bin/ --env-var FOO BAR apply\n
You can also provide them multiple times:
leverage terraform --mount /usr/bin/ /usr/bin/ --mount /etc/config /config --env-var FOO BAR --env-var TEST OK init\n
--layers: Applies command to layers listed in this option. (see more info here)
Regarding S3 backend keys
If the S3 backend block is set and no key was defined, Leverage CLI will try to create a new one automatically and store it in the config.tf file. It will be based on the layer path relative to the account.
Check the Terraform backend configuration in the code definition.
When you are setting up the backend layer for the very first time, the S3 bucket does not yet exist. When running validations, Leverage CLI will detect that the S3 Key does not exist or cannot be generated. Therefore, it is necessary to first create the S3 bucket by using the init --skip-validation flag in the initialization process, and then move the \"tfstate\" file to it.
Import the resource with the given ID into the Terraform state at the given ADDRESS.
Can only be run at layer level.
zsh globbing
Zsh users may need to prepend noglob to the import command for it to be recognized correctly; as an alternative, square brackets can be escaped as \[\].
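A minimal sketch for zsh users (the resource address and ID are illustrative):
# option 1: disable globbing for this command
noglob leverage terraform import aws_instance.web[0] i-0123456789abcdef0
# option 2: escape the square brackets instead
leverage terraform import aws_instance.web\[0\] i-0123456789abcdef0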
For using this feature Leverage Toolbox versions 1.2.7-0.0.5 and up, or 1.3.5-0.0.1 and up must be used.
The tfautomv command is a wrapper for a containerized installation of tfautomv. It provides the tfautomv executable with specific configuration values required by Leverage.
It transparently handles authentication, whether it is Multi-Factor or via Single Sign-On, on behalf of the user in the commands that require it. SSO Authentication takes precedence over MFA when both are active.
This command can only be run at layer level and will not run anywhere else in the project.
This parameter can be used with the following Leverage CLI Terraform commands:
init
plan
apply
output
destroy
Value:
Parameter Type Description --layers string A comma separated list of layers' relative paths"},{"location":"user-guide/leverage-cli/reference/terraform/layers/#common-workflow","title":"Common workflow","text":"
When using the --layers parameter, these commands should be run from an account directory or a layers-container directory.
...any of the aforementioned commands, combined with --layers, can be called from /home/user/project/management/, /home/user/project/management/global/ or /home/user/project/management/us-east-1/.
The value for this parameter is a comma-separated list of layers' relative paths.
Leverage CLI will iterate through these relative paths, going into each one, executing the command, and going back to the original directory.
Example:
For this command, from /home/user/project/management/:
leverage tf plan --layers us-east-1/terraform-backend,global/security-base\n
...the Leverage CLI will:
check that each one of the layers' relative paths exists
go into us-east-1/terraform-backend directory
run the validate-layout command
go back to /home/user/project/management/
go into global/security-base directory
run the validate-layout command
go back to /home/user/project/management/
go into us-east-1/terraform-backend directory
run the init command
go back to /home/user/project/management/
go into global/security-base directory
run the init command
go back to /home/user/project/management/
This is done this way to prevent truncated executions. Meaning, if any of the validations fails, the user will be able to fix whatever has to be fixed and run the command again as it is.
Skipping the validation
The --skip-validation flag still can be used here with --layers.
"},{"location":"user-guide/leverage-cli/reference/terraform/layers/#terraform-parameters-and-flags","title":"Terraform parameters and flags","text":"
Terraform parameters and flags can still be passed when using the --layers parameter.
Config files can be found under each config folders
Global config file: /config/common.tfvars contains global context TF variables, used by all sub-directories, that we inject into TF commands such as leverage terraform plan or leverage terraform apply and which cannot be stored in backend.tfvars due to Terraform limitations.
Account config files
backend.tfvars contains TF variables that are mainly used to configure TF backend but since profile and region are defined there, we also use them to inject those values into other TF commands.
account.tfvars contains TF variables that are specific to an AWS account.
Global common-variables.tf file: /config/common-variables.tfvars contains global context TF variables that we symlink into every Terraform layer's code, e.g. shared/us-east-1/tools-vpn-server/common-variables.tf.
build.env file
By utilizing the build.env capability, you can easily change some default behaviors of the CLI. Read more in its dedicated \"Override defaults via build.env file\" section.
"},{"location":"user-guide/ref-architecture-aws/configuration/#setting-credentials-for-terraform-via-aws-profiles","title":"Setting credentials for Terraform via AWS profiles","text":"
File backend.tfvars will inject the profile name that TF will use to make changes on AWS.
Such profile is usually one that relies on another profile to assume a role to get access to each corresponding account.
Please read the credentials section to understand the alternatives supported by Leverage to authenticate with AWS.
Read the following page leverage doc to understand how to set up a profile to assume a role
Currently the following two methods are supported:
AWS IAM: this is essentially using on-disk, permanent programmatic credentials that are tied to a given IAM User. This method can optionally support MFA which is highly recommended since using permanent credentials is discouraged, so at least with MFA you can counter-balance that. Keep reading...
AWS IAM Identity Center (formerly known as AWS SSO): this one is more recent and it's the method recommended by AWS since it uses roles (managed by AWS) which in turn enforce the usage of temporary credentials. Keep reading...
The following block provides a brief explanation of the chosen files/folders layout, under every account (management, shared, security, etc) folder you will see a service layer structure similar to the following:
Configuration files are organized by environments (e.g. dev, stg, prd), and service type, which we call layers (identities, organizations, storage, etc) to keep any changes made to them separate. Within each of those layers folders you should find the Terraform files that are used to define all the resources that belong to such account environment and specific layer.
Project file structure
An extended project file structure could be found here While some other basic concepts and naming conventions in the context of Leverage like \"project\" and \"layer\" here
NOTE: As a convention, folders with the -- suffix reflect that the resources are not currently created in AWS; basically they've been destroyed or do not yet exist.
Such layer separation is meant to avoid situations in which a single folder contains a lot of resources. That is important to avoid because at some point, running leverage terraform plan / apply starts taking too long and that becomes a problem.
This organization also provides a layout that is easier to navigate and discover. You simply start with the accounts at the top level and then you get to explore the resource categories within each account.
The AWS Reference Architecture was created on a set of opinionated definitions and conventions on:
how to organize files/folders,
where to store configuration files,
how to handle credentials,
how to set up and manage state,
which commands and workflows to run in order to perform different tasks,
and more.
Key Concept
Although the Reference Architecture for AWS was initially designed to be compatible with web, mobile and microservices application stacks, it can also accommodate other types of workloads such as machine learning, blockchain, media, and more.
It was designed with modularity in mind. A multi-accounts approach is leveraged in order to improve security isolation and resources separation. Furthermore each account infrastructure is divided in smaller units that we call layers. Each layer contains all the required resources and definitions for a specific service or feature to function.
Key Concept
The design is strongly based on the AWS Well Architected Framework.
Each individual configuration of the Reference Architecture is referred to as a project. A Leverage project is comprised of all the relevant accounts and layers.
Better code quality and modules maturity (proven and tested).
Supported by binbash, and public modules even by 1000's of top talented Open Source community contributors.
Increase development cost savings.
Clients keep full rights to all commercial, modification, distribution, and private use of the code (No Lock-In) through forks inside their own projects' repositories (open-source and commercially reusable via MIT and Apache 2.0 licenses).
"},{"location":"user-guide/ref-architecture-aws/overview/#a-more-visual-example","title":"A More Visual Example","text":"
The following diagram shows the type of AWS multi-account setup you can achieve by using this Reference Architecture:
The following are official AWS documentations, blog posts and whitepapers we have considered while building our Reference Solutions Architecture:
CloudTrail for AWS Organizations
Reserved Instances - Multi Account
AWS Multiple Account Security Strategy
AWS Multiple Account Billing Strategy
AWS Secure Account Setup
Authentication and Access Control for AWS Organizations
AWS Regions
VPC Peering
Route53 DNS VPC Associations
AWS Well Architected Framework
AWS Tagging strategies
Inviting an AWS Account to Join Your Organization
"},{"location":"user-guide/ref-architecture-aws/tf-state/","title":"Terraform - S3 & DynamoDB for Remote State Storage & Locking","text":""},{"location":"user-guide/ref-architecture-aws/tf-state/#overview","title":"Overview","text":"
Use this terraform configuration files to create the S3 bucket & DynamoDB table needed to use Terraform Remote State Storage & Locking.
What is the Terraform Remote State?
Read the official definition by Hashicorp.
Figure: Terraform remote state store & locking necessary AWS S3 bucket and DynamoDB table components. (Source: binbash Leverage, \"Terraform Module: Terraform Backend\", Terraform modules registry, accessed December 3rd 2020)."},{"location":"user-guide/ref-architecture-aws/tf-state/#prerequisites","title":"Prerequisites","text":"
Terraform repo structure + state backend initialization
Ensure you have Leverage CLI installed in your system
Refer to Configuration Pre-requisites to understand how to set up the configuration files required for this layer. Where you must build your Terraform Reference Architecture account structure
Leveraged by the Infrastructure as Code (IaC) Library through the terraform-aws-tfstate-backend module
At the corresponding account dir, e.g. /shared/base-tf-backend, then:
Run leverage terraform init --skip-validation
Run leverage terraform plan, review the output to understand the expected changes
Run leverage terraform apply, review the output once more and type yes if you are okay with that
This should create a terraform.tfstate file in this directory, but we don't want to push that to the repository, so let's push the state to the backend we just created.
In the base-tf-backend folder you should find the definition of the infrastructure that needs to be deployed before you can get to work with anything else.
IMPORTANT: THIS IS ONLY NEEDED IF THE BACKEND WAS NOT CREATED YET. IF THE BACKEND ALREADY EXISTS YOU JUST USE IT.
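A minimal sketch of that first-time bootstrap, assuming the shared account (the exact path and the final state migration step depend on your setup):
cd shared/base-tf-backend
leverage terraform init --skip-validation   # the S3 bucket does not exist yet
leverage terraform plan                     # review the expected changes
leverage terraform apply                    # creates the S3 bucket and DynamoDB table
# once the backend exists and its key is configured, re-run init and accept
# copying the local terraform.tfstate into the newly created bucket
leverage terraform init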
The sequence of commands that you run to operate on each layer is called the Terraform workflow. In other words, it's what you would typically run in order to create, update, or delete the resources defined in a given layer.
Now, the extended workflow is annotated with more explanations and it is intended for users who haven't yet worked with Leverage on a daily basis:
Terraform Workflow
Make sure you understood the basic concepts:
Overview
Configuration
Directory Structure
Remote State
Make sure you installed the Leverage CLI.
Go to the layer (directory) you need to work with, e.g. shared/global/base-identities/.
Run leverage terraform init -- only the first time you work on this layer, or if you upgraded modules or providers versions, or if you made changes to the Terraform remote backend configuration.
Make any changes you need to make. For instance: modify a resource definition, add an output, add a new resource, etc.
Run leverage terraform plan to preview any changes.
Run leverage terraform apply to give it a final review and to apply any changes.
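A minimal sketch of that cycle on a single layer (the layer path is just an example):
cd shared/global/base-identities
leverage terraform init    # first time on the layer, or after module/provider/backend changes
leverage terraform plan    # preview the changes
leverage terraform apply   # review once more and apply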
Tip
You can use the --layers argument to run Terraform commands on more than one layer. For more information see here
Note
If desired, at step #5 you could submit a PR, allowing you and the rest of the team to understand and review what changes would be made to your AWS Cloud Architecture components before executing leverage terraform apply (terraform apply). This brings the huge benefit of treating changes with a GitOps-oriented approach, basically as we should treat any other code & infrastructure change, and integrating it with the rest of our tools and practices like CI/CD.
"},{"location":"user-guide/ref-architecture-aws/workflow/#running-in-automation","title":"Running in Automation","text":"Figure: Running terraform with AWS in automation (just as reference).
Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to customers globally with low latency, high transfer speeds, all within a developer-friendly environment. CloudFront is integrated with AWS \u2013 both physical locations that are directly connected to the AWS global infrastructure, as well as other AWS services. CloudFront works seamlessly with services including AWS Shield for DDoS mitigation, Amazon S3, Elastic Load Balancing, API Gateway or Amazon EC2 as origins for your applications, and Lambda@Edge to run custom code closer to customers\u2019 users and to customize the user experience. Lastly, if you use AWS origins such as Amazon S3, Amazon EC2 or Elastic Load Balancing, you don\u2019t pay for any data transferred between these services and CloudFront.
"},{"location":"user-guide/ref-architecture-aws/features/cdn/cdn/#load-balancer-alb-nlb-s3-cloudfront-origins","title":"Load Balancer (ALB | NLB) & S3 Cloudfront Origins","text":"Figure: AWS CloudFront with ELB and S3 as origin diagram. (Source: Lee Atkinson, \"How to Help Achieve Mobile App Transport Security (ATS) Compliance by Using Amazon CloudFront and AWS Certificate Manager\", AWS Security Blog, accessed November 17th 2020)."},{"location":"user-guide/ref-architecture-aws/features/cdn/cdn/#api-gateway-cloudfront-origins","title":"API Gateway Cloudfront Origins","text":"Figure: AWS CloudFront with API Gateway as origin diagram. (Source: AWS, \"AWS Solutions Library, AWS Solutions Implementations Serverless Image Handler\", AWS Solutions Library Solutions Implementations, accessed November 17th 2020)."},{"location":"user-guide/ref-architecture-aws/features/ci-cd/argocd/","title":"ArgoCD","text":""},{"location":"user-guide/ref-architecture-aws/features/ci-cd/argocd/#argocd","title":"ArgoCD","text":""},{"location":"user-guide/ref-architecture-aws/features/ci-cd/argocd/#aws-apps-services-k8s-eks-accounts-diagram","title":"AWS Apps & Services K8s EKS accounts diagram","text":"
The below diagram is based on our binbash Leverage Reference Architecture CI-CD official documentation
Figure: K8S reference architecture CI/CD with ArgoCD diagram. (Source: binbash Leverage Confluence Doc, \"Implementation Diagrams\", binbash Leverage Doc, accessed August 4th 2021)."},{"location":"user-guide/ref-architecture-aws/features/ci-cd/jenkins-argocd/","title":"CI/CD","text":""},{"location":"user-guide/ref-architecture-aws/features/ci-cd/jenkins-argocd/#jenkins-argocd","title":"Jenkins + ArgoCD","text":"Figure: ACI/CD with Jenkins + ArgoCD architecture diagram. (Source: ArgoCD, \"Overview - What Is Argo CD\", ArgoCD documentation, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/ci-cd/jenkins-spinnaker/","title":"CI/CD","text":""},{"location":"user-guide/ref-architecture-aws/features/ci-cd/jenkins-spinnaker/#jenkins-spinnaker","title":"Jenkins + Spinnaker","text":"Figure: CI/CD with Jenkins + Spinnaker diagram. (Source: Irshad Buchh, \"Continuous Delivery using Spinnaker on Amazon EKS\", AWS Open Source Blog, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-eks/","title":"AWS Elastic Kubernetes Service (EKS)","text":"
Important
Please check the Reference Architecture for EKS to learn more details about this.
Kops is an official Kubernetes project for managing production-grade Kubernetes clusters. Kops is currently the best tool to deploy Kubernetes clusters to Amazon Web Services. The project describes itself as kubectl for clusters.
Core Features
Open-source & supports AWS and GCE
Deploy clusters to existing virtual private clouds (VPC) or create a new VPC from scratch
Supports public & private topologies
Provisions single or multiple master clusters
Configurable bastion machines for SSH access to individual cluster nodes
Built on a state-sync model for dry-runs and automatic idempotency
Direct infrastructure manipulation, or works with CloudFormation and Terraform
Rolling cluster updates
Supports heterogeneous clusters by creating multiple instance groups
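For reference only, a plain kops bootstrap looks roughly like this (placeholder names and zones; the Leverage workflow described in the following sections templates the cluster manifest via Terraform instead of calling kops create cluster directly):

export KOPS_STATE_STORE=s3://my-kops-state-bucket   # placeholder state bucket

kops create cluster \
  --name cluster.k8s.example.com \
  --zones us-east-1a,us-east-1b,us-east-1c \
  --topology private --networking calico \
  --dry-run -o yaml > cluster.yml                   # review the generated manifest first

kops create -f cluster.yml                          # register the cluster spec in the state store
kops update cluster cluster.k8s.example.com --yes   # create the cloud resources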
Figure: AWS K8s Kops architecture diagram (just as reference). (Source: Carlos Rodriguez, \"How to deploy a Kubernetes cluster on AWS with Terraform & kops\", Nclouds.com Blog post, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-kops/#kops-pre-requisites","title":"Kops Pre-requisites","text":"
Important consideration
K8s clusters provisioned by Kops have a number of resources that need to be available before the cluster is created. These are Kops pre-requisites and they are defined in the 1-prerequisites directory which includes all Terraform files used to create/modify these resources.
The current code has been fully tested with the AWS VPC Network Module
NOTE1: Regarding Terraform versions please also consider https://github.com/binbashar/bb-devops-tf-aws-kops#todo
NOTE2: These dependencies will be mostly covered via the Makefile with dockerized Terraform commands (https://hub.docker.com/repository/docker/binbash/terraform-awscli)
"},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-kops/#resulting-solutions-architecture","title":"Resulting Solutions Architecture","text":"Figure: AWS K8s Kops architecture diagram (just as reference)."},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-kops/#why-this-workflow","title":"Why this workflow","text":"
The workflow follows the same approach that is used to manage other terraform resources in your AWS accounts. E.g. network, identities, and so on.
So we'll use existing AWS resources to create a cluster-template.yaml containing all the resource IDs that Kops needs to create a Kubernetes cluster.
Why not directly use Kops CLI to create the K8s cluster as well as the VPC and its other dependencies?
While this is a valid approach, we want to manage all these building blocks independently and be able to fully customize any AWS component without having to alter our Kubernetes cluster definitions and vice-versa.
This is a fully declarative coding style approach to managing your infrastructure, so being able to declare the state of our cluster in YAML files fits 100% with an as-code & GitOps based approach.
The 2-kops directory includes helper scripts and Terraform files in order to template our Kubernetes cluster definition. The idea is to use our Terraform outputs from 1-prerequisites to construct a cluster definition.
Cluster Management via Kops is typically carried out through the kops CLI. In this case, we use a 2-kops directory that contains a Makefile, Terraform files and other helper scripts that reinforce the workflow we use to create/update/delete the cluster.
This workflow is a little different from the typical Terraform workflows we use. The full workflow goes as follows:
Cluster: Creation & Update
Modify files under 1-prerequisites
Main files to update probably are locals.tf and outputs.tf
Mostly before the cluster is created but could be needed afterward
Modify cluster-template.yml under 2-kops folder
E.g. to add or remove instance groups, upgrade k8s version, etc
From the 2-kops/ directory, run make cluster-update, which will follow the steps below:
Get Terraform outputs from 1-prerequisites
Generate a Kops cluster manifest -- it uses cluster-template.yml as a template and the outputs from the point above as replacement values
Update the Kops state -- it uses the Kops cluster manifest generated in the previous point (cluster.yml)
Generate Kops Terraform file (kubernetes.tf) -- this file represents the changes that Kops needs to apply on the cloud provider.
Run make plan
To preview any infrastructure changes that Terraform will make.
If desired we could submit a PR, allowing you and the rest of the team to understand and review what changes would be made to the Kubernetes cluster before executing make apply (terraform apply). This brings the huge benefit of treating changes to our Kubernetes clusters with a GitOps oriented approach, basically like we treat any other code & infrastructure change, and integrate it with the rest of our tools and practices like CI/CD, integration testing, replicate environments and so on.
Run make apply
To apply those infrastructure changes on AWS.
Run make cluster-rolling-update
To determine if Kops needs to trigger some changes to happen right now (dry run)
These are usually changes to the EC2 instances that won't get reflected on their own, as those instances are managed by the Auto Scaling Groups
Run make cluster-rolling-update-yes
To actually make any changes to the cluster masters/nodes happen
Cluster: Deletion
To clean-up any resources created for your K8s cluster, you should run:
From the 2-kops folder, run make destroy
This will execute a terraform destroy of all the AWS resources declared in kubernetes.tf.
From the 2-kops folder, run make cluster-destroy
This will run the Kops cluster destroy as a dry run only -- no changes will be applied
Run make cluster-destroy-yes
Kops will effectively destroy all the remaining cluster resources.
Finally, from the 1-prerequisites folder, run make destroy
This will remove Kops state S3 bucket + any other extra resources you've provisioned for your cluster.
The workflow may look complicated at first but generally it boils down to these simplified steps (shown as commands right below):
1. Modify cluster-template.yml
2. Run make cluster-update
3. Run make apply
4. Run make cluster-rolling-update-yes
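As commands, the simplified flow above looks like this (run from the 2-kops/ directory; the make targets are the ones referenced throughout this section):

make cluster-update              # render cluster.yml, update the kops state, generate kubernetes.tf
make plan                        # preview the infrastructure changes Terraform would make
make apply                       # apply those changes on AWS
make cluster-rolling-update      # dry run: check whether masters/nodes need to be rolled
make cluster-rolling-update-yes  # actually roll the cluster masters/nodes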
What about persistent and stateful K8s resources?
This approach will work better the more stateless your Kubernetes workloads are. Treating Kubernetes clusters as ephemeral and replaceable infrastructure requires either avoiding persistent volumes or accepting the difficulties of running stateful workloads such as databases on K8s. We feel pretty confident that we can recreate our workloads by applying each of our service definitions, charts and manifests to a given Kubernetes cluster, as long as we keep the persistent storage separately on AWS RDS, DynamoDB, EFS and so on. In terms of etcd state persistency, Kops already provisions the etcd volumes (AWS EBS) independently of the master instances they get attached to. This helps to persist the etcd state after rolling updates of your master nodes without any user intervention, and it also simplifies volume backups via EBS Snapshots (consider https://github.com/binbashar/terraform-aws-backup-by-tags). We also use a very valuable backup tool named Velero (formerly Heptio Ark - https://github.com/vmware-tanzu/velero) to back up and restore our Kubernetes cluster resources and persistent volumes.
TODO
IMPORTANT: Kops terraform output (kops update cluster --target terraform) is still generated for Terraform 0.11.x (https://github.com/kubernetes/kops/issues/7052) we'll take care of the migration when tf-0.12 gets fully supported.
Create a binbash Leverage public Confluence Wiki entry detailing some more info about etcd, calico and k8s versions compatibilities
Ultra light, ultra simple, ultra powerful. Linkerd adds security, observability, and reliability to Kubernetes, without the complexity. CNCF-hosted and 100% open source.
"},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-service-mesh/#how-it-works","title":"How it works","text":"
How Linkerd works
Linkerd works by installing a set of ultralight, transparent proxies next to each service instance. These proxies automatically handle all traffic to and from the service. Because they're transparent, these proxies act as highly instrumented out-of-process network stacks, sending telemetry to, and receiving control signals from, the control plane. This design allows Linkerd to measure and manipulate traffic to and from your service without introducing excessive latency.
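As a quick reference (not a Leverage-specific procedure, just the standard Linkerd CLI flow for v2.10+; cluster access via kubectl is assumed):

linkerd check --pre                        # validate the cluster before installing
linkerd install | kubectl apply -f -       # install the control plane
linkerd check                              # verify the installation
linkerd viz install | kubectl apply -f -   # optional: metrics stack and dashboard extension
linkerd viz dashboard                      # open the dashboard locally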
"},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-service-mesh/#architecture","title":"Architecture","text":"Figure: Figure: Linkerd v2.10 architecture diagram. (Source: Linkerd official documentation, \"High level Linkerd control plane and a data plane.\", Linkerd Doc, accessed June 14th 2021)."},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-service-mesh/#dashboard","title":"Dashboard","text":"Figure: Figure: Linkerd v2.10 dashboard. (Source: Linkerd official documentation, \"Linkerd dashboard\", Linkerd Doc, accessed June 14th 2021)."},{"location":"user-guide/ref-architecture-aws/features/compute/k8s-service-mesh/#read-more","title":"Read more","text":"
Related resources
Linkerd vs Istio benchmarks
"},{"location":"user-guide/ref-architecture-aws/features/compute/overview/","title":"Compute","text":""},{"location":"user-guide/ref-architecture-aws/features/compute/overview/#containers-and-serverless","title":"Containers and Serverless","text":"
Overview
In order to serve Client application workloads we propose to implement Kubernetes, and proceed to containerize all application stacks whenever it's the best solution (we'll also consider AWS Lambda for a Serverless approach when it fits better). Kubernetes is an open source container orchestration platform that eases the process of running containers across many different machines, scaling up or down by adding or removing containers when demand changes and provides high availability features. Also, it serves as an abstraction layer that will give Client the possibility, with minimal effort, to move the apps to other Kubernetes clusters running elsewhere, or a managed Kubernetes service such as AWS EKS, GCP GKE or others.
Clusters will be provisioned with Kops and/or AWS EKS, which are solutions meant to orchestrate this compute engine in AWS. Whenever possible the initial version deployed will be the latest stable release.
Figure: Kubernetes high level components architecture. (Source: Andrew Martin, \"11 Ways (Not) to Get Hacked\", Kubernetes.io Blog post, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/compute/overview/#kubernetes-addons","title":"Kubernetes addons","text":"
Serverless is the native architecture of the cloud that enables you to shift more of your operational responsibilities to AWS, increasing your agility and innovation. Serverless allows you to build and run applications and services without thinking about servers. It eliminates infrastructure management tasks such as server or cluster provisioning, patching, operating system maintenance, and capacity provisioning. You can build them for nearly any type of application or backend service, and everything required to run and scale your application with high availability is handled for you.
Why use serverless?
Serverless enables you to build modern applications with increased agility and lower total cost of ownership. Building serverless applications means that your developers can focus on their core product instead of worrying about managing and operating servers or runtimes, either in the cloud or on-premises. This reduced overhead lets developers reclaim time and energy that can be spent on developing great products that scale and are reliable.
Figure: AWS serverless architecture diagram (just as reference). (Source: Nathan Peck, \"Designing a modern serverless application with AWS Lambda and AWS Fargate\", Containers-on-AWS Medium Blog post, accessed November 18th 2020).
Serverless Compute Services
AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume - there is no charge when your code is not running.
Lambda@Edge allows you to run Lambda functions at AWS Edge locations in response to Amazon CloudFront events.
AWS Fargate is a purpose-built serverless compute engine for containers. Fargate scales and manages the infrastructure required to run your containers.
Apart from the EC2 instances that are part of Kubernetes, there are going to be other instances running tools for monitoring, log centralization, builds/tests, deployment, among others, which are to be defined at this point. Some of them can be replaced by managed services, such as CircleCI, Snyk, etc., and this can have pros and cons that will need to be considered at the time of implementation. Any OS that is provisioned will be completely reproducible as code, in the event of migration to another vendor.
"},{"location":"user-guide/ref-architecture-aws/features/costs/costs/","title":"Cost Estimation & Optimization","text":""},{"location":"user-guide/ref-architecture-aws/features/costs/costs/#opportunity-to-optimize-resources","title":"Opportunity to optimize resources","text":"
Compute
Usage of reserved EC2 instances for stable workloads (see AWS Cost Explorer Reserved Optimization | Compute Optimizer -- savings of up to 42% vs On-Demand).
Usage of Spot EC2 instances for fault-tolerant workloads (savings of up to 90%).
Use Auto Scaling Groups to allow your EC2 fleet to scale up and down based on demand.
Identify EC2 instances with low utilization and reduce costs by stopping or rightsizing them.
Use Compute Savings Plans to reduce EC2, Fargate and Lambda costs (they apply regardless of EC2 family, size, AZ, region, OS or tenancy, and also cover Fargate and Lambda).
Databases
Usage of reserved RDS instances for stable workload databases.
Monitoring & Automation
AWS billing alarms + AWS Budgets (forecasted account cost / RI coverage) with notifications to Slack.
Activate AWS Trusted Advisor cost-related checks.
Identify EBS volumes with low utilization and reduce costs by snapshotting and then removing them.
Check underutilized EBS volumes to be possibly shrunk or removed.
Networking: delete idle load balancers (use the load balancer RequestCount check over the past 7 days; e.g. fewer than 100 requests suggests an idle load balancer).
Set up Lambda nuke to automatically clean up AWS account resources.
Set up a Lambda scheduler to stop and start resources on AWS (EC2, ASG & RDS).
Storage & Network Traffic
Check S3 usage and reduce costs by leveraging lower-cost storage tiers.
Use S3 Analytics, or automatically move these objects into a lower-cost storage tier with Lifecycle Policies or S3 Intelligent-Tiering (see the example right after this list).
If DataTransferOut from EC2 to the public internet represents a significant cost, consider implementing CloudFront.
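A minimal sketch of such a lifecycle rule via the AWS CLI (bucket name and day thresholds are placeholders): transition objects to cheaper tiers and expire old ones, as suggested above.

cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "tier-down-and-expire",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --bucket my-example-bucket \
  --lifecycle-configuration file://lifecycle.json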
Stable workloads will always run on reserved instances; the following calculation only considers the 1-year No Upfront mode, in which Client will not have to pay in advance but commits to this monthly usage and will be billed accordingly, even if the instance type is not used. More aggressive reservation strategies can be implemented to further reduce costs; these will have to be analyzed by business in conjunction with operations.
We will implement AWS RDS databases matching the requirements of the current application stacks. If the selected region is the same one you're currently using for your legacy AWS RDS instances, we will be able to create a peering connection to the existing databases in order to migrate the application stacks first, then the databases.
"},{"location":"user-guide/ref-architecture-aws/features/identities/credentials-vault/","title":"Hashicorp Vault credentials","text":""},{"location":"user-guide/ref-architecture-aws/features/identities/credentials-vault/#hashicorp-vault-private-api-endpoint","title":"Hashicorp Vault private API endpoint","text":"
If you are on HCP, you can get this from the Admin UI. Otherwise, it will depend on how you set up DNS, TLS and port settings for your self-hosted installation. We always favour a private endpoint deployment that is only accessible from the VPN.
"},{"location":"user-guide/ref-architecture-aws/features/identities/credentials-vault/#hcp-vault-private-api-endpoint","title":"HCP Vault private API endpoint","text":"
We'll need to set up this Vault auth token in our /config/common.config file whenever we run the Terraform Leverage Reference Architecture for:
le-tf-infra-aws
le-tf-vault
Vault token generation and authentication
This is the Vault token that will be used by Terraform, or the Vault CLI, to perform calls to the Vault API. During the initial setup, you will have to use a root token. If you are using a self-hosted installation you will get such a token after you initialize Vault; if you are using Hashicorp Cloud Platform you can get the token from the HCP Admin UI.
After the initial setup, and since we recommend integrating Vault to Github for authentication, you will have to follow these steps:
Generate a GitHub Personal Access Token: https://github.com/settings/tokens
Click "Generate new token"
Under scopes, only select \"read:org\", under \"admin:org\"
"},{"location":"user-guide/ref-architecture-aws/features/identities/credentials-vault/#get-vault-token-from-your-gh-auth-token","title":"Get vault token from your GH auth token","text":"
Run the Vault CLI via Docker: docker run -it vault:1.7.2 sh
Vault ENV vars setup (NOTE: this will change a little bit between an AWS self-hosted vs an HCP Vault deployment)
$ docker run -it vault:1.7.2 sh
/ # export VAULT_ADDR="https://bb-le-shared-vault-cluster.private.vault.xxxxxxx.aws.hashicorp.cloud:8200"; export VAULT_NAMESPACE="admin"

/ # vault login -method=github
GitHub Personal Access Token (will be hidden):
Success! You are now authenticated. The token information displayed below
is already stored in the token helper. You do NOT need to run "vault login"
again. Future Vault requests will automatically use this token.

Key                  Value
---                  -----
token                s.PNAXXXXXXXXXXXXXXXXXXX.hbtct
token_accessor       KTqKKXXXXXXXXXXXXXXXXXXX.hbtct
token_duration       1h
...
input your GH personal access token
Set the token returned in step 4 into /config/common.config -> vault_token="s.PNAXXXXXXXXXXXXXXXXXXX.hbtct"
NOTE: the admin token from https://portal.cloud.hashicorp.com/ will always work, but its use is discouraged in favour of the nominated GH personal access token, for security audit trail reasons.
You can also manage your Vault instance via its UI. Below we present a screenshot showing an example using the Github personal access token, one of our supported auth methods.
Generate a GitHub Personal Access Token: https://github.com/settings/tokens
Click "Generate new token"
Under scopes, only select \"read:org\", under \"admin:org\"
Open your preferred web browser, choose the Github auth method, paste your GH token, and you'll be able to log in to your instance.
These are temporary credentials used for the initial deployment of the architecture, and they should only be used for this purpose. Once this process is finished, management and security users should be the ones managing the environment.
management credentials are meant to carry the role of performing all important administrative tasks in the environment (e.g. billing adjustments). They should be tied to a physical user in your organization.
A user with these credentials will assume the role OrganizationAccountAccessRole when interacting with the environment.
These credentials are the ones to be used for everyday maintenance and interaction with the environment. Users in the role of DevOps | SecOps | Cloud Engineer in your organization should use these credentials.
A user with these credentials will assume the role DevOps when interacting with the environment.
"},{"location":"user-guide/ref-architecture-aws/features/identities/gpg/","title":"GPG Keys","text":""},{"location":"user-guide/ref-architecture-aws/features/identities/gpg/#why-do-we-use-gpg-keys","title":"Why do we use GPG keys?","text":"
By default, our Leverage Reference Architecture base-identities layer approach is to use the IAM module to manage AWS IAM Users' credentials with encryption, in order to ensure strong security.
This module outputs commands and GPG messages which can be decrypted from the command line to get the AWS Web Console user's password and the user's secret key.
Notes for keybase users
If possible, always use GPG encryption to prevent Terraform from keeping unencrypted password and access secret key in state file.
Keybase pre-requisites
When gpg_key is specified as keybase:username, make sure that the user's public key has already been uploaded to the Reference Architecture base-identities layer keys folder.
"},{"location":"user-guide/ref-architecture-aws/features/identities/gpg/#managing-your-gpg-keys","title":"Managing your GPG keys","text":"
Create a key pair
NOTE: the user for whom this account is being created needs to do this
Install gpg
Run gpg --version to confirm
Run gpg --gen-key and provide \"Your Name\" and \"Your Email\" as instructed -- you must also provide a passphrase
Run gpg --list-keys to check that your key was generated
Delete a key pair
Run gpg --list-keys to check your key id
Run gpg --delete-secret-keys \"Your Name\" to delete your private gpg key
Run gpg --delete-key \"Your Name\" to delete your public gpg key
Export your public key
NOTE: the user must have created a key pair before doing this
Run gpg --export \"Your Name\" | base64
Now the user can share her/his public key for creating her/his account
Decrypt your encrypted password
The user should copy the encrypted password from whatever media it was provided to her/him
Run gpg --decrypt a_file_with_your_pass (in the path where you executed step 2) to effectively decrypt your password using your gpg key and its passphrase
$ gpg --decrypt encrypted_pass

You need a passphrase to unlock the secret key for
user: "Demo User (AWS org project-user acct gpg key w/ passphrase) <username.lastname@domain.com>"
2048-bit RSA key, ID 05ED43DC, created 2019-03-15 (main key ID D64DD59F)

gpg: encrypted with 2048-bit RSA key, ID 05ED43DC, created 2019-03-15
      "Demo User (AWS org project-user acct gpg key w/ passphrase) <username.lastname@domain.com>"
Vi0JA|c%fP*FhL}CE-D7ssp_TVGlf#%
Depending on your shell version, an extra % character could appear as shown above; you must disregard this character since it's not part of the initial (one-time) AWS Web Console password.
If all went well, the decrypted password should be there
"},{"location":"user-guide/ref-architecture-aws/features/identities/gpg/#workaround-for-mac-users","title":"Workaround for Mac users","text":"
There are some situations where gpg keys generated on Mac don't work properly, generating errors like the following:
╷
│ Error: error encrypting password during IAM User Login Profile (user.lastname) creation: Error encrypting Password: error parsing given PGP key: openpgp: unsupported feature: unsupported oid: 2b060104019755010501
│
│   with module.user["user.lastname"].aws_iam_user_login_profile.this[0],
│   on .terraform/modules/user/modules/iam-user/main.tf line 12, in resource "aws_iam_user_login_profile" "this":
│   12: resource "aws_iam_user_login_profile" "this" {
│
Docker is required for this workaround.
If you don't have docker on your PC, don't worry. You can easily install it following the steps on the official page.
In these cases, execute the following steps:
Run an interactive console into an ubuntu container mounting your gpg directory.
docker run --rm -it --mount type=bind,src=/Users/username/.gnupg,dst=/root/.gnupg ubuntu:latest
Inside the container, install required packages.
apt update
apt install gnupg
Generate the key as described in previous sections, running gpg --gen-key at the interactive console in the ubuntu container.
To fix permissions in your gpg directory, run these commands at the interactive console in the ubuntu container.
find ~/.gnupg -type f -exec chmod 600 {} \;
find ~/.gnupg -type d -exec chmod 700 {} \;
Now you should be able to export the gpg key and decode the password from your mac, running gpg --export \"Your Name\" | base64.
Finally, decrypt the password in your mac, executing:
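The exact command isn't included in this extract; a minimal sketch, assuming the encrypted password was copied into a local file as in the earlier decryption section:

gpg --decrypt encrypted_pass   # prompts for the passphrase of the key generated above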
"},{"location":"user-guide/ref-architecture-aws/features/identities/identities/","title":"Identity and Access Management (IAM) Layer","text":""},{"location":"user-guide/ref-architecture-aws/features/identities/identities/#setting-up-user-credentials","title":"Setting up user credentials","text":"
Please follow the steps below to orchestrate your base-identities layer, first in your project-root AWS account and afterwards in your project-security account.
IAM user standard creation workflow
Pre-requisite: add your public PGP key following the documentation.
For steps 3. and 4. consider following Leverage's Terraform workflow
Update (add | remove) your IAM Users associated code and deploy security/global/base-identities/users.tf
Consider customizing your account Alias and Password Policy
Update (add | remove | edit) your IAM Groups associated code and deploy security/global/base-identities/groups.tf
Get and share each IAM User's AWS Console user id and its associated one-time password, taken from the apply outputs.
Temporarily set sensitive = false to get the encrypted outputs in your terminal output.
Each user will need to decrypt its AWS Console Password, you could share the associated documentation with them.
Users must log in to the AWS Web Console (https://project-security.signin.aws.amazon.com/console) with their decrypted password and create a new password.
Activate MFA for Web Console (Optional but strongly recommended)
Users should create their AWS access keys, if needed.
Users could optionally set up ~/.aws/project/credentials + ~/.aws/project/config following the AWS Credentials Setup sub-section immediately below.
To allow users to access AWS Organization member accounts, consider repeating step 3 but for the corresponding member accounts:
When you activate STS endpoints for a Region, AWS STS can issue temporary credentials to users and roles in your account that make an AWS STS request. Those credentials can then be used in any Region that is enabled by default or is manually enabled. You must activate the Region in the account where the temporary credentials are generated. It does not matter whether a user is signed into the same account or a different account when they make the request.
To activate or deactivate AWS STS in a Region that is enabled by default (console)
Sign in as a root user or an IAM user with permissions to perform IAM administration tasks.
Open the IAM console and in the navigation pane choose Account settings.
If necessary, expand Security Token Service (STS), find the Region that you want to activate, and then choose Activate or Deactivate. For Regions that must be enabled, we activate STS automatically when you enable the Region. After you enable a Region, AWS STS is always active for the Region and you cannot deactivate it. To learn how to enable a Region, see Managing AWS Regions in the AWS General Reference.
Source | AWS Documentation IAM User Guide | Activating and deactivating AWS STS in an AWS Region
Figure: Deactivating AWS STS in not in use AWS Region. Only in used Regions must have STS activated.
"},{"location":"user-guide/ref-architecture-aws/features/identities/overview/","title":"Identity and Access Management (IAM)","text":""},{"location":"user-guide/ref-architecture-aws/features/identities/overview/#overview","title":"Overview","text":"
Taking this official AWS resource as a reference, we've defined a security account structure for managing multiple accounts.
User Management Definitions
IAM users will strictly be created and centralized in the Security account (member accounts IAM Users could be exceptionally created for very specific tools that still don't support IAM roles for cross-account auth).
All access to resources within the Client organization will be assigned via policy documents attached to IAM roles or groups.
All IAM roles and groups will have the least privileges required to properly work.
IAM AWS and Customer managed policies will be defined, inline policies will be avoided whenever possible.
All user management will be maintained as code and will reside in the DevOps repository.
All users will have MFA enabled whenever possible (VPN and AWS Web Console).
Root user credentials will be rotated and secured. MFA for root will be enabled.
IAM Access Keys for root will be disabled.
IAM root access will be monitored via CloudWatch Alerts.
Why multi account IAM strategy?
Creating a security relationship between accounts makes it even easier for companies to assess the security of AWS-based deployments, centralize security monitoring and management, manage identity and access, and provide audit and compliance monitoring services
Figure: AWS Organization Security account structure for managing multiple accounts (just as reference). (Source: Yoriyasu Yano, \"How to Build an End to End Production-Grade Architecture on AWS Part 2\", Gruntwork.io Blog, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/identities/overview/#iam-groups-roles-definition","title":"IAM Groups & Roles definition","text":"
AWS Org member accounts IAM groups :
Account Name AWS Org Member Accounts IAM Groups Admin Auditor DevOps DeployMaster project-management x project-security x x x x
AWS Org member accounts IAM roles :
Account Name AWS Org Member Accounts IAM Roles Admin Auditor DevOps DeployMaster OrganizationAccountAccessRole project-management x project-security x x x x project-shared x x x x x project-legacy x x x project-apps-devstg x x x x x project-apps-prd x x x x x"},{"location":"user-guide/ref-architecture-aws/features/identities/roles/","title":"IAM Roles","text":"
What are AWS IAM Roles?
For the Leverage AWS Reference Architecture we heavily depend on AWS IAM roles, which is a standalone IAM entity that:
Allows you to attach IAM policies to it,
Specify which other IAM entities to trust, and then
Those other IAM entities can then assume the IAM role to temporarily get access to the permissions in those IAM policies.
The two most common use cases for IAM roles are
Service roles: Whereas an IAM user allows a human being to access AWS resources, one of the most common use cases for an IAM role is to allow a service (e.g., one of your applications, a CI server, or an AWS service) to access specific resources in your AWS account. For example, you could create an IAM role that gives access to a specific S3 bucket and allow that role to be assumed by one of your EC2 instances or Lambda functions. The code running on that AWS compute service will then be able to access that S3 bucket (or any other service you granted through this IAM role) without you having to manually copy AWS credentials (i.e., access keys) onto that instance.
Cross account access: Allow to grant an IAM entity in one AWS account access to specific resources in another AWS account. For example, if you have an IAM user in AWS account A, then by default, that IAM user cannot access anything in AWS account B. However, you could create an IAM role in account B that gives access to a specific S3 bucket (or any necessary AWS services) in AWS account B and allow that role to be assumed by an IAM user in account A. That IAM user will then be able to access the contents of the S3 bucket by assuming the IAM role in account B. This ability to assume IAM roles across different AWS accounts is the critical glue that truly makes a multi AWS account structure possible.
"},{"location":"user-guide/ref-architecture-aws/features/identities/roles/#how-iam-roles-work","title":"How IAM roles work?","text":"Figure: Example of AWS cross-account AWS access. (Source: Kai Zhao, \"AWS CloudTrail Now Tracks Cross-Account Activity to Its Origin\", AWS Security Blog, accessed November 17th 2020).
You must define a trust policy for each IAM role, which is a JSON document (very similar to an IAM policy) that specifies who can assume this IAM role. For example, we present below a trust policy that allows this IAM role to be assumed by an IAM user named John in AWS account 111111111111:
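The trust policy itself is not included in this extract; a sketch along the lines described (account ID and user name taken from the example in the text), saved to a file and attached when creating the role in the target account:

cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111111111111:user/John" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create the role in account 222222222222 with that trust policy attached
aws iam create-role --role-name DevOps --assume-role-policy-document file://trust-policy.json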
Note that a trust policy alone does NOT automatically give John permissions to assume this IAM role. Cross-account access always requires permissions in both accounts (2-way authorization). So, if John is in AWS account 111111111111 and you want him to have access to an IAM role called DevOps in account B (ID 222222222222), then you need to configure permissions in both accounts:
1. In account 222222222222, the DevOps IAM role must have a trust policy that gives sts:AssumeRole permissions to AWS account A (ID 111111111111), as shown above.
2. In account A (111111111111), you also need to attach an IAM policy to John's IAM user that allows him to assume the DevOps IAM role, which might look like this:
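The policy referenced above is not included in this extract; a sketch of what it could look like (shown as an inline policy for brevity; account IDs and names follow the example in the text):

cat > assume-devops-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::222222222222:role/DevOps"
    }
  ]
}
EOF

# Attach it to John's IAM user in account 111111111111
aws iam put-user-policy \
  --user-name John \
  --policy-name AllowAssumeDevOpsRole \
  --policy-document file://assume-devops-policy.json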
"},{"location":"user-guide/ref-architecture-aws/features/identities/roles/#assuming-an-aws-iam-role","title":"Assuming an AWS IAM role","text":"
How does it work?
IAM roles do not have a user name, password, or permanent access keys. To use an IAM role, you must assume it by making an AssumeRole API call (via the SDKs, CLI or Web Console), which will return temporary access keys you can use in follow-up API calls to authenticate as the IAM role. The temporary access keys will be valid for 1-12 hours (depending on your current validity expiration config), after which you must call AssumeRole again to fetch new temporary keys. Note that to make the AssumeRole API call, you must first authenticate to AWS using some other mechanism.
For example, for an IAM user to assume an IAM role, the workflow looks like this:
Figure: Assuming an AWS IAM role. (Source: Gruntwork.io, \"How to configure a production-grade AWS account structure using Gruntwork AWS Landing Zone\", Gruntwork.io Production deployment guides, accessed November 17th 2020).
Basic AssumeRole workflow
Authenticate using the IAM user's permanent AWS access keys
Make the AssumeRole API call
AWS sends back temporary access keys
You authenticate using those temporary access keys
Now all of your subsequent API calls will be on behalf of the assumed IAM role, with access to whatever permissions are attached to that role
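The same flow via the AWS CLI, as a quick reference (role ARN and session name are placeholders):

aws sts assume-role \
  --role-arn arn:aws:iam::222222222222:role/DevOps \
  --role-session-name john-devops-session
# The response contains temporary AccessKeyId, SecretAccessKey and SessionToken values,
# which are then exported as AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and
# AWS_SESSION_TOKEN for the follow-up API calls made as the assumed role.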
IAM roles and AWS services
Most AWS services have native support built-in for assuming IAM roles.
For example:
You can associate an IAM role directly with an EC2 instance (instance profile), and that instance will automatically assume the IAM role every few hours, making the temporary credentials available in EC2 instance metadata.
Just about every AWS CLI and SDK tool knows how to read and periodically update temporary credentials from EC2 instance metadata, so in practice, as soon as you attach an IAM role to an EC2 instance, any code running on that EC2 instance can automatically make API calls on behalf of that IAM role, with whatever permissions are attached to that role. This allows you to give code on your EC2 instances IAM permissions without having to manually figure out how to copy credentials (access keys) onto that instance.
The same strategy works with many other AWS services: e.g., you use IAM roles as a secure way to give your Lambda functions, ECS services, Step Functions, and many other AWS services permissions to access specific resources in your AWS account.
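For illustration, from inside an EC2 instance with an attached instance profile the temporary credentials can be inspected through the instance metadata service (IMDSv1 form shown for brevity; the role name is a placeholder):

curl http://169.254.169.254/latest/meta-data/iam/security-credentials/               # list attached role name
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/my-ec2-role    # temporary keys for that role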
Consider the following AWS official links as reference:
AWS Identities | Roles terms and concepts
AWS Identities | Common scenarios
"},{"location":"user-guide/ref-architecture-aws/features/monitoring/apm/","title":"Application Performance Monitoring (APM) and Business Performance","text":"
Custom Prometheus BlackBox Exporter + Grafana & Elastic. Application performance monitoring (APM) delivers real-time and trending data about your web application's performance and the level of satisfaction that your end users experience. With end-to-end transaction tracing and a variety of color-coded charts and reports, APM visualizes your data, down to the deepest code levels. Your DevOps teams don't need to guess whether a performance blocker comes from the app itself, CPU availability, database loads, or something else entirely unexpected. With APM, you can quickly identify potential problems before they affect your end users.
APM's user interface provides both current and historical information about memory usage, CPU utilization, database query performance, web browser rendering performance, app availability and error analysis, external services, and other useful metrics.
For this purpose we propose the usage of Elasticsearch + Kibana for database and visualization respectively. By deploying the Fluentd daemonset on the Kubernetes clusters we can send all logs from running pods to Elasticsearch, and with 'beat' we can send specific logs for resources outside of Kubernetes. There will be many components across the environment generating different types of logs: ALB access logs, S3 access logs, CloudFront access logs, application request logs, application error logs. Access logs on AWS based resources can be stored in a centralized bucket for that purpose, on the security account, and these can be streamed to Elasticsearch as well if needed.
Figure: Monitoring metrics and log architecture diagram (just as reference). (Source: binbash Leverage, \"AWS Well Architected Reliability Report example\", binbash Leverage Doc, accessed November 18th 2020).
Alerting based on Logs
Certain features that were only available under licence were recently made available by Elastic and included in the Elasticsearch open source project. Elastalert allows us to generate alerts based on certain log entries, or even after counting a certain amount of a type of entry, providing great flexibility.
There are metrics that are going to be of interest both in the infrastructure itself (CPU, Memory, disk) and also on application level (amount of non 200 responses, latency, % of errors) and we will have two key sources for this: Prometheus and AWS CloudWatch metrics.
Metric collectors
CloudWatch metrics: This is where Amazon stores a great number of default metrics for each of its services. Useful data here can be interpreted and alerts can be generated with CloudWatch alarms, and it can also be used as a source for Grafana. Although this is a very good offering, we have found it to be incomplete and highly bound to AWS services, but not integrated enough with the rest of the ecosystem.
Prometheus: This is an open source tool (originally by SoundCloud) that is essentially a time-series database. It stores metrics, and it has the advantage of being highly integrated with everything Kubernetes. In fact, Kubernetes already publishes various metrics in Prometheus format "out of the box". Its alerting capabilities are also remarkable, and it can all be kept as code in a repository. It has a big community behind it, and it's not far-fetched at this point to include a library in your own application that lets you expose an endpoint publishing certain metrics about your own application, which we can graph or alert on.
Figure: Monitoring metrics and log architecture diagram (just as reference). (Source: binbash Leverage, \"AWS Well Architected Reliability Report example\", binbash Leverage Doc, accessed November 18th 2020).
Graphing metrics
Grafana is the standard open source visualization tool which can be used on top of a variety of different data stores. It can use Prometheus as a source, and there are many open source dashboards and plugins available that provide great visualization of how things are running, and we can also build our own if necessary. If something is left out of Prometheus and already available in CloudWatch metrics, we can easily integrate it as a source for Grafana as well, and build dashboards that integrate these metrics, even applying some intelligence on top of data coming from multiple origins.
Although Grafana already has alerting capabilities built in, we rather (most of the time) have the Prometheus alerting engine configured, because we can have really customized and specific alerts. We can have them as code, in its extremely readable syntax. Example:
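The original example is not included in this extract; a typical Prometheus alerting rule in its YAML syntax would look roughly like this (metric names, thresholds and labels are illustrative only):

cat > alert-rules.yml <<'EOF'
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "More than 5% of requests are failing"
EOF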
"},{"location":"user-guide/ref-architecture-aws/features/monitoring/notification_escalation/","title":"Notification & Escalation Procedure","text":""},{"location":"user-guide/ref-architecture-aws/features/monitoring/notification_escalation/#overview","title":"Overview","text":"Urgency Service Notification Setting Use When Response High 24/7 High-priority PagerDuty Alert 24/7/365
Issue is in Production
Or affects the applications/services and in turn affects the normal operation of the clinics
Or prevents clinic patients from interacting with the applications/services
Requires immediate human action
Escalate as needed
The engineer should be woken up
High during support hours High-priority Slack Notifications during support hours
Issue impacts development team productivity
Issue impacts the normal business operation
Requires immediate human action ONLY during business hours
Low Low Priority Slack Notification
Any issue, on any environment, that occurs during working hours
All alerts are sent to #engineering-urgent-alerts channel. Members that are online can have visibility from there. AlertManager takes care of sending such alerts according to the rules defined here: TODO
Note: there is a channel named engineering-alerts, but it is used for Github notifications. It didn't make sense to mix real alerts with those, which is why a new engineering-urgent-alerts channel was created. As a recommendation, Github notifications should be sent to a channel named something like #engineering-notifications, leaving engineering-alerts for real alerts.
PagerDuty
AlertManager only sends to PagerDuty alerts that are labeled as severity: critical. PagerDuty is configured to turn these into incidents according to the settings defined here for the Prometheus Critical Alerts service. The aforementioned service uses HiPriorityAllYearRound escalation policy to define who gets notified and how.
Note: currently only the TechOwnership role gets notified, as we don't have agreements or rules about on-call support, but this can be easily changed in the future to accommodate business decisions.
UpTimeRobot
We are doing basic HTTP monitoring on the following sites:
* www.domain_1.com
* www.domain_2.com
* www.domain_3.com
Note: a personal account has been set up for this. As a recommendation, a new account should be created using an email address that belongs to your project.
Distributed tracing, also called distributed request tracing, is a method used to profile and monitor applications, especially those built using a microservices architecture. Distributed tracing helps pinpoint where failures occur and what causes poor performance.
Figure: Figure: Distributed tracing architecture diagram (just as reference). (Source: binbash Leverage, \"AWS Well Architected Reliability Report example\", binbash Leverage Doc, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/monitoring/tracing/#read-more","title":"Read more","text":"
Related resources
Jaeger
Opensensus
"},{"location":"user-guide/ref-architecture-aws/features/network/dns/","title":"Route53 DNS hosted zones","text":""},{"location":"user-guide/ref-architecture-aws/features/network/dns/#how-it-works","title":"How it works","text":"
Route53 Considerations
Route53 private hosted zone will have associations with VPCs on different AWS organization accounts
Route53 should ideally be hosted in the Shared account, although sometimes Route53 is already deployed in a Legacy account where it can be imported and fully supported as code.
Route53 zero downtime migration (active-active hosted zones) is completely possible and achievable with Leverage terraform code
Figure: AWS Organization shared account Route53 DNS diagram. (Source: Cristian Southall, \"Using CloudFormation Custom Resources to Configure Route53 Aliases\", Abstractable.io Blog post, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/network/dns/#user-guide","title":"User guide","text":"
pre-requisites
Review & update configs
Review & understand the workflow
Steps
DNS service has to be orchestrated from /shared/global/base-dns layer following the standard workflow
"},{"location":"user-guide/ref-architecture-aws/features/network/dns/#migrated-aws-route53-hosted-zones-between-aws-accounts","title":"Migrated AWS Route53 Hosted Zones between AWS Accounts","text":"
We'll need to setup the Route53 DNS service with an active-active config to avoid any type of service disruption and downtime. This would then allow the Name Servers of both AWS Accounts to be added to your domain provider (eg: namecheap.com) and have for example:
4 x ns (project-legacy Route53 Account)
4 x ns (project-shared Route53 Account)
After the records have propagated and everything looks OK, we could remove the project-legacy Route53 NS records from your domain provider (eg: namecheap.com) and leave only the project-shared ones.
This official Migrating a hosted zone to a different AWS account - Amazon Route 53 article explains this procedure step by step:
AWS Route53 hosted zone migration steps
Create records in the new hosted zone (bb-shared)
Compare records in the old and new hosted zones (bb-legacy)
Update the domain registration to use name servers for the new hosted zone (NIC updated to use both bb-legacy + bb-shared)
Wait for DNS resolvers to start using the new hosted zone
(Optional) delete the old hosted zone (bb-legacy), remember you'll need to delete the ns delegation records from your domain registration (NIC) too.
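A couple of AWS CLI commands that can help while comparing records between the old and new hosted zones (zone IDs are placeholders; the full procedure is in the AWS article linked above):

aws route53 list-hosted-zones                                         # find both zone IDs
aws route53 list-resource-record-sets --hosted-zone-id Z1LEGACY1111   # records in the old zone (bb-legacy)
aws route53 list-resource-record-sets --hosted-zone-id Z2SHARED2222   # records in the new zone (bb-shared)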
"},{"location":"user-guide/ref-architecture-aws/features/network/network-nacl/","title":"Security in AWS with Leverage Reference Architecture and NACLs","text":"
When deploying AWS Landing Zone resources, security is of fundamental importance. Network Access Control Lists (NACLs) play a crucial role in controlling traffic at the subnet level. In this section we'll describe the use of NACLs implemented with Terraform on top of the Leverage AWS Reference Architecture.
"},{"location":"user-guide/ref-architecture-aws/features/network/network-nacl/#understanding-network-access-control-lists-nacls","title":"Understanding Network Access Control Lists (NACLs)","text":"
Network Access Control Lists (NACLs) act as a virtual firewall for your AWS VPC (Virtual Private Cloud), controlling inbound and outbound traffic at the subnet level. They operate on a rule-based system, allowing or denying traffic based on defined rules.
"},{"location":"user-guide/ref-architecture-aws/features/network/network-nacl/#leverage-ref-arch-default-configuration-and-variables-setup-for-nacls","title":"Leverage Ref Arch: Default Configuration and Variables Setup for NACLs","text":"
In the Leverage Reference Architecture, we adopt the default NACLs approach. This foundational setup not only ensures a controlled security environment but also offers the flexibility for customization.
This setup ensures that default NACLs are used, providing a baseline level of security:
manage_default_network_acl    = true
public_dedicated_network_acl  = false // whether to use a dedicated network ACL for the public subnets
private_dedicated_network_acl = false // whether to use a dedicated network ACL for the private subnets
To verify that default NACLs are enabled in your Leverage project, follow these steps:
Move into the /shared/us-east-1/base-network/ directory.
Open the network.tf file: it defines the configuration for the VPC (Virtual Private Cloud) and the NACLs using a Terraform module.
module "vpc" {
  source = "github.com/binbashar/terraform-aws-vpc.git?ref=v3.18.1"
  ...
  manage_default_network_acl    = var.manage_default_network_acl
  public_dedicated_network_acl  = var.public_dedicated_network_acl  // whether to use a dedicated network ACL for the public subnets
  private_dedicated_network_acl = var.private_dedicated_network_acl // whether to use a dedicated network ACL for the private subnets
  ...
}
Open the variable.tf file: the module allows customization of Network Access Control Lists (NACLs) through the specified variables.
"},{"location":"user-guide/ref-architecture-aws/features/network/network-nacl/#key-points-to-kae-into-account-for-a-robust-and-secure-setup","title":"Key Points to kae into account for a robust and secure setup:","text":"
Explicit Approval Process for NACL Enablement: Enabling NACLs should not be taken lightly. Users or tech leads wishing to enable NACLs must undergo an explicit approval process. This additional step ensures that the introduction of NACLs aligns with the overall security policies and requirements of the organization.
Feedback Mechanisms for NACL Status and Permissions: Communication is key when it comes to security configurations. Feedback mechanisms should be in place to inform users of the status of NACLs and any associated permissions. This ensures transparency and allows for prompt resolution of any issues that may arise.
Comprehensive Testing for Non-disruptive Integration: Before enabling NACLs, comprehensive testing should be conducted to ensure that the default disabling of NACLs does not introduce new issues. This includes testing in different environments and scenarios to guarantee a non-disruptive integration. Automated testing and continuous monitoring can be valuable tools in this phase.
We prioritize operational simplicity to provide an efficient deployment process; however, it's essential for users to conduct a review process aligned with their specific security and compliance requirements.
This approach allows users to benefit from initial ease of use while maintaining the flexibility to customize and enhance security measures according to their unique needs and compliance standards.
In this code, we ensure that default NACLs are enabled. Users can later seek approval and modify these variables if enabling dedicated NACLs becomes necessary.
In this section we detail all the network design related specifications
VPCs CIDR blocks
VPC Gateways: Internet, NAT, VPN.
VPC Peerings
VPC DNS Private Hosted Zones Associations.
Network ACLS (NACLs)
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-addressing/#vpcs-ip-addressing-plan-cidr-blocks-sizing","title":"VPCs IP Addressing Plan (CIDR blocks sizing)","text":"
Introduction
VPCs can vary in size from 16 addresses (/28 netmask) to 65,536 addresses (/16 netmask). In order to size a VPC correctly, it is important to understand the number, types, and sizes of workloads expected to run in it, as well as workload elasticity and load balancing requirements.
Keep in mind that there is no charge for using Amazon VPC (aside from EC2 charges), therefore cost should not be a factor when determining the appropriate size for your VPC, so make sure you size your VPC for growth.
Moving workloads or AWS resources between networks is not a trivial task, so be generous in your IP address estimates to give yourself plenty of room to grow, deploy new workloads, or change your VPC design configuration from one to another. The majority of AWS customers use VPCs with a /16 netmask and subnets with /24 netmasks. The primary reason AWS customers select smaller VPC and subnet sizes is to avoid overlapping network addresses with existing networks.
So, having an AWS single VPC design, we've chosen a Medium/Small VPC/Subnet addressing plan which would probably fit a broad variety of use cases.
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-addressing/#networking-ip-addressing","title":"Networking - IP Addressing","text":"
Starting CIDR Segment (AWS Org)
AWS Org IP Addressing calculation is presented below based on segment 172.16.0.0/12
We started from 172.16.0.0/12 and subnetted to /20
Resulting in Total Subnets: 256
2 x AWS Account with Hosts/SubNet: 4094
1ry VPC + 2ry VPC
1ry VPC DR + 2ry VPC DR
Individual CIDR Segments (VPCs)
Then each of these /20 blocks is subnetted down into /24 subnets
Considering the whole Starting CIDR Segment (AWS Org) before declared, we'll start at 172.18.0.0/20
shared
1ry VPC CIDR: 172.18.0.0/20
2ry VPC CIDR: 172.18.16.0/20
1ry VPC DR CIDR: 172.18.32.0/20
2ry VPC DR CIDR: 172.18.48.0/20
apps-devstg
1ry VPC CIDR: 172.18.64.0/20
2ry VPC CIDR: 172.18.80.0/20
1ry VPC DR CIDR: 172.18.96.0/20
2ry VPC DR CIDR: 172.18.112.0/20
apps-prd
1ry VPC CIDR: 172.18.128.0/20
2ry VPC CIDR: 172.18.144.0/20
1ry VPC DR CIDR: 172.18.160.0/20
2ry VPC DR CIDR: 172.18.176.0/20
Resulting in 16 x /24 subnets per VPC
VPC subnets with 256 addresses each (/24).
Eg: apps-devstg account → us-east-1 w/ 3 AZs → 3 x Private Subnets /az + 3 x Public Subnets /az
1ry VPC CIDR: 172.18.64.0/20. Subnets:
Private 172.18.64.0/24, 172.18.66.0/24 and 172.18.68.0/24
Public 172.18.65.0/24, 172.18.67.0/24 and 172.18.69.0/24
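A quick way to enumerate the /24 subnets available inside one of the /20 VPC blocks above (using the apps-devstg primary VPC as the example):

python3 -c "import ipaddress; [print(n) for n in ipaddress.ip_network('172.18.64.0/20').subnets(new_prefix=24)]"
# prints 172.18.64.0/24 through 172.18.79.0/24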
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-addressing/#planned-subnets-per-vpc","title":"Planned Subnets per VPC","text":"
Having defined the initial VPC that will be created in the different accounts that were defined, we are going to create subnets in each of these VPCs defining Private and Public subnets split among different availability zones:
Please follow the steps below to orchestrate your base-network layer, 1st in your project-shared AWS account and afterwards in the necessary member accounts which will host network connected resources (EC2, Lambda, EKS, RDS, ALB, NLB, etc):
project-apps-devstg account.
project-apps-prd account.
Network layer standard creation workflow
Please follow Leverage's Terraform workflow for each of your accounts.
We'll start with the project-shared AWS account. Update (add | remove | customize) your VPC associated code before deploying this layer (shared/base-network). Main files:
network.tf
locals.tf
Repeat for every AWS member account that needs its own VPC. To do so in the AWS Organization member accounts, consider repeating step 3 but for the corresponding member accounts.
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-peering/","title":"Diagram: Network Service (cross-account VPC peering)","text":""},{"location":"user-guide/ref-architecture-aws/features/network/vpc-peering/#how-it-works","title":"How it works","text":"
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-peering/#diagram-network-service-cross-account-vpc-peering_1","title":"Diagram: Network Service (cross-account VPC peering)","text":"Figure: AWS multi account Organization VPC peering diagram. (Source: AWS, \"Amazon Virtual Private Cloud VPC Peering\", AWS Documentation Amazon VPC User Guide, accessed November 18th 2020). Figure: AWS multi account Organization peering detailed diagram. (Source: AWS, \"Amazon Virtual Private Cloud VPC Peering\", AWS Documentation Amazon VPC User Guide, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/network/vpc-topology/","title":"Network Layer","text":""},{"location":"user-guide/ref-architecture-aws/features/network/vpc-topology/#network-topology","title":"Network Topology","text":"
VPC with public and private subnets (NAT)
The configuration for this scenario includes a virtual private cloud (VPC) with public and private subnets (their number will change depending on our specific needs). We recommend this scenario if you want to run a public-facing web application, while maintaining back-end servers that aren't publicly accessible. A common example is a multi-tier website, with a Load Balancer (ALB | NLB) in a public subnet, or another public-facing routing service like AWS CloudFront or API Gateway, and our application back ends (Lambda, EKS, ECS, EC2) and database servers (RDS, DynamoDB, etc) in private subnets. You can set up security (SGs, ACLs, WAF) and routing so that the web servers can communicate internally (even between VPC accounts or VPN endpoints) with all necessary services and components such as databases, caches and queues, among others.
The services running in the public subnet, like an ALB or NLB can send outbound traffic directly to the Internet, whereas the instances in the private subnet can't. Instead, the instances in the private subnet can access the Internet by using a network address translation (NAT) gateway that resides in the public subnet. The database servers can connect to the Internet for software updates using the NAT gateway (if using RDS this is transparently provided by AWS), but the Internet cannot establish connections to the database servers.
So, whenever possible all our AWS resources like EC2, EKS, RDS, Lambda, SQS will be deployed in VPC private subnets and we'll use a NAT device (Nat Gateway) to enable instances in a private subnet to connect to the internet (for example, for software updates) or other AWS services, but prevent the internet from initiating connections with the instances.
A NAT device forwards traffic from the instances in the private subnet to the internet (via the VPC Internet Gateway) or other AWS services, and then sends the response back to the instances. When traffic goes to the internet, the source IPv4 address is replaced with the NAT device\u2019s address and similarly, when the response traffic goes to those instances, the NAT device translates the address back to those instances\u2019 private IPv4 addresses.
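As an illustrative check (the tag filter and profile name below are placeholders, not project defaults), you can confirm that the private route tables send Internet-bound traffic through the NAT gateway:

# Expect a 0.0.0.0/0 route pointing to a nat-xxxxxxxx gateway in the private route tables
aws ec2 describe-route-tables \
  --filters "Name=tag:Name,Values=*private*" \
  --query 'RouteTables[].Routes[].[DestinationCidrBlock,NatGatewayId,GatewayId]' \
  --output table \
  --profile project-apps-devstg-devops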
Figure: VPC topology diagram. (Source: AWS, \"VPC with public and private subnets (NAT)\", AWS Documentation Amazon VPC User Guide, accessed November 18th 2020). Figure: VPC topology diagram with multiple Nat Gateways for HA. (Source: Andreas Wittig, \"Advanced AWS Networking: Pitfalls That You Should Avoid\", Cloudonaut.io Blog, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/network/vpc-topology/#read-more","title":"Read more","text":"
AWS reference links
Consider the following AWS official links as reference:
VPC with public and private subnets (NAT)
AWS Elastic Load Balancing
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-traffic-out/","title":"Network Security","text":""},{"location":"user-guide/ref-architecture-aws/features/network/vpc-traffic-out/#control-internet-access-outbound-traffic","title":"Control Internet access outbound traffic","text":""},{"location":"user-guide/ref-architecture-aws/features/network/vpc-traffic-out/#goals","title":"Goals","text":"
Review and analyse available alternatives for controlling outbound traffic in VPCs.
All possible candidates need to offer a reasonable balance between features and pricing.
Solutions
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-traffic-out/#leverage-currently-supports","title":"Leverage currently supports","text":"
Network ACL (Subnet firewall)
Security Groups (Instance firewall)
"},{"location":"user-guide/ref-architecture-aws/features/network/vpc-traffic-out/#what-alternatives-do-we-have","title":"What alternatives do we have?","text":""},{"location":"user-guide/ref-architecture-aws/features/network/vpc-traffic-out/#pre-considerations","title":"Pre-considerations","text":"
First of all, keep in mind the following points before and while you go through the data in the table:
1 EBS pricing at the moment of this writing:
GP2: $0.10 per GB-month
GP3: $0.08 per GB-month
2 DataTransfer costs will be incurred in all options
Our default AWS Organizations Terraform layout solution includes the accounts described below, plus 1 or N pre-existing AWS accounts that you may invite as members (legacy).
| Account | Description |
| --- | --- |
| Management (Root) | Used to manage configuration and access to AWS Org managed accounts. The AWS Organizations account provides the ability to create and financially manage member accounts; it contains the AWS Organizations Service Control Policies (SCPs). |
| Shared Services / Resources | Reference for creating infrastructure shared services such as directory services, DNS, VPN solution, monitoring tools like Prometheus and Grafana, CI/CD server (Jenkins, Drone, Spinnaker, etc), centralized logging solution like ELK, and Vault Server (Hashicorp Vault). |
| Security | Intended for centralized user management via an IAM roles based cross-org auth approach (IAM roles per account to be assumed are still needed). Also to centralize AWS CloudTrail and AWS Config logs, and used as the master AWS GuardDuty account. |
| Network | Intended for centralized networking management via Transit Gateway (TGW); supports a centralized outbound traffic setup and the integration of AWS Network Firewall (NFW). |
| Legacy | Your pre-existing AWS accounts to be invited as members of the new AWS Organization; probably several services and workloads are going to be progressively migrated to your new accounts. |
| Apps DevStg | Host your DEV, QA and STG environment workloads: Compute / Web App Servers (K8s clusters and Lambda functions), Load Balancers, DB Servers, Caching Services, Job queues & Servers, Data, Storage, CDN. |
| Apps Prd | Host your PROD environment workloads: Compute / Web App Servers (K8s clusters and Lambda functions), Load Balancers, DB Servers, Caching Services, Job queues & Servers, Data, Storage, CDN. |
"},{"location":"user-guide/ref-architecture-aws/features/organization/billing/","title":"Billing","text":""},{"location":"user-guide/ref-architecture-aws/features/organization/billing/#overview","title":"Overview","text":"
Each month AWS charges your payer Root Account for all the linked accounts in a consolidated bill. The following illustration shows an example of a consolidated bill.
Figure: AWS Organization Multi-Account structure (just as reference). (Source: Andreas Wittig, \"AWS Account Structure: Think twice before using AWS Organizations\", Cloudonaut.io Blog, accessed November 18th 2020). Figure: AWS Organization Multi-Account billing structure (just as reference). (Source: AWS, \"Consolidated billing process\", AWS Documentation AWS Billing and Cost Management User Guide, accessed November 18th 2020).
Reference Architecture AWS Organizations features
AWS Multiple Account Billing Strategy: consolidated billing for all your accounts within organization, enhanced per account cost filtering and RI usage
A single monthly bill accumulates the spending among many AWS accounts.
Benefit from volume pricing across more than one AWS account.
AWS Organizations Billing FAQs
What does AWS Organizations cost?
AWS Organizations is offered at no additional charge.
Who pays for usage incurred by users under an AWS member account in my organization?
The owner of the master account is responsible for paying for all usage, data, and resources used by the accounts in the organization.
Will my bill reflect the organizational unit structure that I created in my organization?
No. For now, your bill will not reflect the structure that you have defined in your organization. You can use cost allocation tags in individual AWS accounts to categorize and track your AWS costs, and this allocation will be visible in the consolidated bill for your organization.
You'll need an email address to create and register your AWS Organization Management Account. For this purpose we recommend avoiding the use of a personal email account. Instead, whenever possible, it should ideally be associated with a distribution list email such as a GSuite Group, so that the proper admin team (DevOps | SecOps | Cloud Engineering Team) can manage its notifications, avoiding a single point of contact (constraint).
GSuite Group email address: aws@domain.com (to which admins / owners belong); then, using the + suffix, the aliases are generated automatically when running Leverage's Terraform code.
aws+security@binbash.com.ar
aws+shared@binbash.com.ar
aws+network@binbash.com.ar
aws+apps-devstg@binbash.com.ar
aws+apps-prd@binbash.com.ar
Reference Code as example
#
# Project Prd: services and resources related to production are placed and
# maintained here.
#
resource "aws_organizations_account" "apps_prd" {
  name      = "apps-prd"
  email     = "aws+apps-prd@domain.ar"
  parent_id = aws_organizations_organizational_unit.apps_prd.id
}
Billing: review the billing setup as a pre-requisite to deploy the AWS Org. In your Management account billing setup, check:
Activate IAM User and Role Access to Billing Information
If needed, update Alternate Contacts
Via AWS Web Console: in the previously created project_name-management account (eg, name: leverage-management, email: aws@binbash.com.ar) create the mgmt-org-admin IAM user with Admin privileges (attach the AdministratorAccess IAM managed policy and enable Web Console and programmatic access), which will be used for the initial AWS Org bootstrapping.
NOTE: After its first execution, only nominated Org admin users will persist in the project-management account.
Via AWS Web Console: in the project-management account create the mgmt-org-admin IAM user's AWS access keys
NOTE: This could all be created in the previous step (Nº 2).
Figure: AWS Web Console screenshot. (Source: binbash, \"AWS Organization management account init IAM admin user\", accessed June 16th 2021).
Figure: AWS Web Console screenshot. (Source: binbash, \"AWS Organization management account init IAM admin user\", accessed June 16th 2021).
Set your IAM credentials on the machine where you're going to run the Leverage CLI (remember these are the mgmt-org-admin temporary user credentials shown in the screenshot immediately above).
Set up your Leverage reference architecture configs in order to work with your new account and the mgmt-org-admin IAM user
common config
account configs
Setup and create the terraform remote state for the new AWS Org Management account
terraform remote state config
terraform remote state workflow
terraform remote state ref code
You'll first get a local state, and then you'll need to move your Terraform state to S3; validate it and finally delete the local state files.
The AWS Organization from the Reference Architecture /le-tf-infra-aws/root/global/organizations will be orchestrated using the Leverage CLI following the standard workflow (a consolidated sketch of this workflow is included after these steps).
The Management account has to be imported into the code.
Verify your Management account email address in order to invite existing (legacy) AWS accounts to join your organization.
Following the doc, orchestrate via the Leverage CLI workflow the Mgmt account IAM layer (base-identities) with the admin IAM users (consider these users will have admin privileges over the entire AWS Org by assuming the OrganizationAccountAccessRole) -> le-tf-infra-aws/root/global/base-identities
The IAM role OrganizationAccountAccessRole does not exist in the initial Management (root) account; it will be created by the code in this layer.
Mgmt account admin user permanent credentials set up => set up in your workstation the AWS credentials for the OrganizationAccountAccessRole IAM role (project_short-root-oaar, eg: bb-root-oaar). Then validate within each initial mgmt account layer that the bb-root-oaar profile is correctly configured in the config files presented below, as well as any other necessary setup.
/config/common.config
/root/config/account.config
/root/config/backend.config
Setup (code and config files) and Orchestrate the /security/global/base-identities layer via Leverage CLI on your security account for consolidated and centralized User Mgmt and access to the AWS Org.
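A consolidated sketch of this bootstrapping workflow with the Leverage CLI. The exact commands depend on your project layout; the paths and profile name below are illustrative and follow the reference structure:

# 1. Configure the bootstrap credentials created for the mgmt-org-admin IAM user
aws configure --profile project-management-org-admin

# 2. Orchestrate the AWS Organization layer (the state starts local)
cd root/global/organizations
leverage terraform init
leverage terraform plan
leverage terraform apply

# 3. Once the S3 backend exists, re-initialize; Terraform will offer to copy the
#    local state to the new backend, after which the local state files can be removed
leverage terraform init

# 4. Orchestrate the Mgmt account IAM layer (base-identities)
cd ../base-identities
leverage terraform init && leverage terraform apply

# 5. Repeat the workflow for security/global/base-identities from the Security account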
You must have your AWS Organization deployed and access to your Management account as described in the /user-guide/user-guide/organization/organization-init section.
"},{"location":"user-guide/ref-architecture-aws/features/organization/legacy-accounts/#invite-aws-pre-existing-legacy-accounts-to-your-aws-organization","title":"Invite AWS pre-existing (legacy) accounts to your AWS Organization","text":"
AWS Org pre-existing accounts invitation
Via AWS Web Console: from your project-root account invite the pre-existing project-legacy (1 to n accounts).
Via AWS Web Console: in project-legacy create the OrganizationAccountAccessRole IAM Role with Admin permissions.
Should follow Creating the OrganizationAccountAccessRole in an invited member account section.
Import your project-legacy account as code.
Update the following variables in ./@bin/makefiles/terraform12/Makefile.terraform12-import-rm
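The import itself boils down to a plain Terraform import of the invited account into the Organization layer's state; an illustrative form (the resource address and account ID below are placeholders):

# Run from the organizations layer (e.g. root/global/organizations)
terraform import aws_organizations_account.legacy 123456789012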
This repository contains all Terraform configuration files used to create binbash Leverage Reference AWS Organizations Multi-Account baseline layout.
Why AWS Organizations?
This approach allows it to have a hierarchical structure of AWS accounts, providing additional security isolation and the ability to separate resources into Organizational Units with their associated Service Control Policies (SCPs).
Considering that one or more AWS accounts were already active (Client AWS Legacy Account), these will be invited to become "member accounts" of the AWS Organization architecture. In the future, once all of the Client's Legacy dev, stage, prod and other resources for the Project applications are running in the new accounts architecture, meaning a full AWS Organizations approach, all the already migrated assets from the 'Legacy' account should be decommissioned. This account will remain with the necessary services, such as DNS, among others.
The following block provides a brief explanation of the chosen AWS Organization Accounts layout:
MyExample project file structure
+📂 management/ (resources for the management account)
  ...
+📂 security/ (resources for the security + users account)
  ...
+📂 shared/ (resources for the shared account)
  ...
+📂 network/ (resources for the centralized network account)
  ...
+📂 apps-devstg/ (resources for apps dev & stg account)
  ...
+📂 apps-prd/ (resources for apps prod account)
  ...
Billing: Consolidated billing for all your accounts within organization, enhanced per account cost filtering and RI usage
Security I: Extra security layer: You get fully isolated infrastructure for different organizations units in your projects, eg: Dev, Prod, Shared Resources, Security, Users, BI, etc.
Security II: Using AWS Organization you may use Service Control Policies (SCPs) to control which AWS services are available within different accounts.
Networking: Connectivity and access will be securely set up via VPC peering + NACLs + Security Groups, everything with private endpoints only accessible via Pritunl VPN, significantly reducing the attack surface.
User Mgmt: You can manage all your IAM resources (users/groups/roles) and policies in one place (usually the security/users account) and use AssumeRole to work with org accounts.
Operations: Will reduce the blast radius to the maximum possible.
Compatibility: Legacy accounts can (probably should) be invited as a member of the new Organization and afterwards even imported into your terraform code.
Migration: After having your baseline AWS Org reference cloud solutions architecture deployed (IAM, VPC, NACLS, VPC-Peering, DNS Cross-Org, CloudTrail, etc) you're ready to start progressively orchestrating new resources in order to segregate different Environment and Services per account. This approach will allow you to start a 1 by 1 Blue/Green (Red/Black) migration without affecting any of your services at all. You would like to take advantage of an Active-Active DNS switchover approach (nice as DR exercise too).
EXAMPLE: Jenkins CI Server Migration steps:
Let's say you have your EC2_A (jenkins.aws.domain.com) in Account_A (Legacy), so you could deploy a brand new EC2_B Jenkins Instance in Account_B (Shared Resources).
Temporarily associated with jenkins2.aws.domain.com
Sync its current data (/var/lib/jenkins)
Test and fully validate every job and pipeline works as expected.
In case you haven't finished your validations, we highly recommend declaring everything as code and fully automating it, so you can destroy and re-create your under-development env on demand to save costs.
Finally switch jenkins2.aws.domain.com -> to -> jenkins.aws.domain.com (a sketch of this DNS switchover follows these steps).
Stop your old EC2_A.
If everything looks fine after 2/4 weeks you could terminate your EC2_A (hopefully everything is as code and just terraform destroy)
Considering the previously detailed steps plan your roadmap to move forward with every other component to be migrated.
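For the DNS switchover step, a hedged sketch using the AWS CLI (the hosted zone ID, record names and target below are placeholders for illustration only):

# Point jenkins.aws.domain.com at the new instance (previously answering as jenkins2.aws.domain.com)
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0123456789ABCDEFGHIJ \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "jenkins.aws.domain.com",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{ "Value": "jenkins2.aws.domain.com" }]
      }
    }]
  }'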
Consider the following AWS official links as reference:
Why should I set up a multi-account AWS environment?
AWS Multiple Account User Management Strategy
AWS Multiple Account Security Strategy
AWS Multiple Account Billing Strategy
AWS Secure Account Setup
Authentication and Access Control for AWS Organizations (keep in mind EC2 and other services can also use AWS IAM Roles to get secure cross-account access)
AWS Backup is a fully managed backup service that makes it easy to centralize and automate the backup of data across AWS services. Using AWS Backup, you can centrally configure backup policies and monitor backup activity for AWS resources, such as:
Amazon EBS volumes,
Amazon EC2 instances,
Amazon RDS databases,
Amazon DynamoDB tables,
Amazon EFS file systems,
and AWS Storage Gateway volumes.
AWS Backup automates and consolidates backup tasks previously performed service-by-service, removing the need to create custom scripts and manual processes. With just a few clicks in the AWS Backup console, you can create backup policies that automate backup schedules and retention management. AWS Backup provides a fully managed, policy-based backup solution, simplifying your backup management, enabling you to meet your business and regulatory backup compliance requirements.
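As an illustration of such a policy (the plan name, schedule and retention below are assumptions, not project defaults), a backup plan can also be created with the AWS CLI:

# Daily backups at 5 AM UTC, kept for 30 days, stored in an existing backup vault
aws backup create-backup-plan --backup-plan '{
  "BackupPlanName": "daily-30d-retention",
  "Rules": [{
    "RuleName": "daily",
    "TargetBackupVaultName": "Default",
    "ScheduleExpression": "cron(0 5 * * ? *)",
    "Lifecycle": { "DeleteAfterDays": 30 }
  }]
}'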
Figure: AWS Backup service diagram (just as reference). (Source: AWS, \"AWS Backup - Centrally manage and automate backups across AWS services\", AWS Documentation, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-aws/features/reliability/backups/#s3-bucket-region-replication","title":"S3 bucket region replication","text":"
Buckets that hold data critical to business or to application operation can be replicated to another region almost synchronously.
This can be set up on request to increase durability and, along with database backups, can constitute the base for a Business Continuity strategy.
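A minimal sketch of enabling such replication with the AWS CLI (bucket names and the IAM role ARN below are placeholders; versioning is required on both buckets):

# Versioning is a pre-requisite for replication on both source and destination buckets
aws s3api put-bucket-versioning --bucket my-critical-bucket \
  --versioning-configuration Status=Enabled

# Replicate every object to a bucket in another region
aws s3api put-bucket-replication --bucket my-critical-bucket \
  --replication-configuration '{
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
    "Rules": [{
      "Prefix": "",
      "Status": "Enabled",
      "Destination": { "Bucket": "arn:aws:s3:::my-critical-bucket-replica" }
    }]
  }'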
"},{"location":"user-guide/ref-architecture-aws/features/reliability/backups/#comparison-of-the-backup-and-retention-policies-strategies","title":"Comparison of the backup and retention policies strategies","text":"
In this sub-section you'll find the resources to review and adjust your backup retention policies to adhere to the compliance rules that govern your specific institution's regulations. This post is a summarised write-up of how we approached this sensitive task, the alternatives we analysed and the recommended solutions we provided in order to meet the requirements. We hope it can be useful for others as well.
Leverage Confluence Documentation
You'll find here a detailed comparison including the alternative product and solution types, pricing model, features, pros & cons.
"},{"location":"user-guide/ref-architecture-aws/features/reliability/dr/","title":"Disaster Recovery & Business Continuity Plan","text":""},{"location":"user-guide/ref-architecture-aws/features/reliability/dr/#overview","title":"Overview","text":"
Applications that are business critical should always have a plan in place to recover in case of a catastrophic failure or disaster. There are many strategies that can be implemented to achieve this, and deciding between them is a matter of analyzing how much it is worth investing, based on a calculation of the damages suffered if the application is not available for a given period of time. It is on this factor (time) that disaster recovery plans are based. Factors that need to be determined per application are:
RTO and RPO
Recovery time objective (RTO): This represents the time it takes after a disruption to restore a business process to its service level. For example, if a disaster occurs at 12:00 PM (noon) and the RTO is eight hours, the DR process should restore the business process to the acceptable service level by 8:00 PM.
Recovery point objective (RPO): This is the acceptable amount of data loss measured in time. For example, if a disaster occurs at 12:00 PM (noon) and the RPO is one hour, the system should recover all data that was in the system before that hour.
After deciding RTO and RPO we have options available to achieve the time objectives:
HA Strategies
Backup and restore: In most traditional environments, data is backed up to tape and sent off-site regularly. The equivalent in AWS would be to take backups in the form of snapshots and copy them to another region for RDS instances, EBS volumes, EFS and S3 buckets. The plan details the step-by-step procedure to recover a fully working production environment based on these backups being restored on freshly provisioned infrastructure, and how to rollback to a regular production site once the emergency is over.
Pilot Light Method: The term pilot light is often used to describe a DR scenario in which a minimal version of an environment is always running in AWS. Very similar to \u201cBackup and restore\u201d except a minimal version of key infrastructure components is provisioned in a separate region and then scaled up in case of disaster declaration.
Warm standby active-passive method: The term warm-standby is used to describe a DR scenario in which a scaled-down version of a fully-functional environment is always running in the cloud. Enhancement of Pilot Light in which a minimal version is created of all components, not just critical ones.
Multi-Region active-active method: By architecting multi region applications and using DNS to balance between them in normal production status, you can adjust the DNS weighting and send all traffic to the AWS region that is available, this can even be performed automatically with Route53 or other DNS services that provide health check mechanisms as well as load balancing.
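As a hedged sketch of the health-check building block behind such a failover (the domain, path and thresholds below are placeholders):

# Create an HTTPS health check that Route53 failover or weighted records can reference
aws route53 create-health-check \
  --caller-reference "app-primary-$(date +%s)" \
  --health-check-config '{
    "Type": "HTTPS",
    "FullyQualifiedDomainName": "app.example.com",
    "Port": 443,
    "ResourcePath": "/health",
    "RequestInterval": 30,
    "FailureThreshold": 3
  }'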
Figure: 2 sets of app instances, each behind an elastic load balancer in two separate regions (just as reference). (Source: Randika Rathugamage, \"High Availability with Route53 DNS Failover\", Medium blogpost, accessed December 1st 2020). Figure: AWS calculated \u2014 or parent \u2014 health check, we can fail on any number of child health checks (just as reference). (Source: Simon Tabor, \"How to implement the perfect failover strategy using Amazon Route53\", Medium blogpost, accessed December 1st 2020)."},{"location":"user-guide/ref-architecture-aws/features/reliability/dr/#read-more","title":"Read more","text":"
AWS reference links
Consider the following AWS official links as reference:
"},{"location":"user-guide/ref-architecture-aws/features/reliability/high-availability/","title":"High Availability & Helthchecks","text":""},{"location":"user-guide/ref-architecture-aws/features/reliability/high-availability/#recovery-from-failures","title":"Recovery from Failures","text":"
Automatic recovery from failure
It keeps an AWS environment reliable. Using logs and metrics from CloudWatch, designing a system where the failures themselves trigger recovery is the way to move forward.
Figure: AWS HA architecture diagrams (just as reference)."},{"location":"user-guide/ref-architecture-aws/features/reliability/high-availability/#recovery-procedures","title":"Recovery Procedures","text":"
Test recovery procedures
The risks faced by cloud environments and systems, the points of failure for systems and ecosystems, as well as details about the most probable attacks are known and can be simulated. Testing recovery procedures is something that can be done using these insights. Real points of failure are exploited and the way the environment reacts to the emergency shows just how reliable the system is.
Figure: AWS HA architecture diagrams (just as reference)."},{"location":"user-guide/ref-architecture-aws/features/reliability/high-availability/#scalability-and-availability","title":"Scalability and Availability","text":"
Scale horizontally to increase aggregate system availability
The cloud environment needs to have multiple redundancies and additional modules as added security measures. Of course, multiple redundancies require good management and maintenance for them to remain active through the environment\u2019s lifecycle.
Figure: AWS HA scalable architecture diagrams (just as reference)."},{"location":"user-guide/ref-architecture-aws/features/reliability/high-availability/#healthchecks-self-healing","title":"Healthchecks & Self-healing","text":""},{"location":"user-guide/ref-architecture-aws/features/reliability/high-availability/#k8s-and-containers","title":"K8s and containers","text":"
K8s readiness and liveness probes
Distributed systems can be hard to manage. A big reason is that there are many moving parts that all need to work for the system to function. If a small part breaks, the system has to detect it, route around it, and fix it. And this all needs to be done automatically! Health checks are a simple way to let the system know if an instance of your app is working or not working.
If an instance of your app is not working, then other services should not access it or send a request to it. Instead, requests should be sent to another instance of the app that is ready, or re-tried at a later time. The system should also bring your app back to a healthy state.
By default, Kubernetes starts to send traffic to a pod when all the containers inside the pod start, and restarts containers when they crash. While this can be \u201cgood enough\u201d when you are starting out, you can make your deployments more robust by creating custom health checks. Fortunately, Kubernetes make this relatively straightforward, so there is no excuse not to!\u201d
So aside from the monitoring and alerting that the underlying infrastructure will have, application containers will have their own mechanisms to determine readiness and liveness. These are features that our scheduler of choice, Kubernetes, natively provides; to read more click here.
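A minimal sketch of such probes (the deployment name, image, port and paths below are illustrative only):

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 2
  selector:
    matchLabels: { app: demo-app }
  template:
    metadata:
      labels: { app: demo-app }
    spec:
      containers:
      - name: demo-app
        image: nginx:1.25
        ports:
        - containerPort: 80
        readinessProbe:            # only send traffic once the app answers on /
          httpGet: { path: /, port: 80 }
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:             # restart the container if it stops answering
          httpGet: { path: /, port: 80 }
          initialDelaySeconds: 15
          periodSeconds: 20
EOF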
"},{"location":"user-guide/ref-architecture-aws/features/secrets/secrets/","title":"Secrets and Passwords Management","text":""},{"location":"user-guide/ref-architecture-aws/features/secrets/secrets/#overview","title":"Overview","text":"
Ensure scalability, availability and persistence, as well as secure, hierarchical storage to manage configuration and secret data for:
Secret Managers
AWS KMS
AWS SSM Parameter Store
Ansible Vault
Hashicorp Vault
Strengths
Improve the level of security by enforcing the separation of secrets from code and plain environment variables.
Control and audit granular access in detail
Store secure string and configuration data in hierarchies and track versions.
Configure integration with AWS KMS, Amazon SNS, Amazon CloudWatch, and AWS CloudTrail to notify, monitor, and audit functionality.
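For instance, a secret can be stored hierarchically in SSM Parameter Store and encrypted with KMS (the parameter name, value and key alias below are placeholders):

# SecureString parameters are encrypted at rest with the given KMS key
aws ssm put-parameter \
  --name "/apps-devstg/backend/db_password" \
  --type SecureString \
  --key-id alias/aws/ssm \
  --value "CHANGE_ME"

# Read it back, decrypted, at deploy time
aws ssm get-parameter --name "/apps-devstg/backend/db_password" --with-decryption \
  --query 'Parameter.Value' --output text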
AWS CloudTrail monitors and records account activity across your AWS infrastructure, giving you control over storage, analysis, and remediation actions.
AWS CloudTrail overview
This service will be configured to enable auditing of all AWS services in all accounts. Once enabled, as shown in the below presented figure, CloudTrail will deliver all events from all accounts to the Security account in order to have a centralized way to audit operations on AWS resources. Audit events will be available from CloudTrail for 90 days but a longer retention time will be available through a centralized S3 bucket.
Figure: AWS CloudTrail components architecture diagram (just as reference). (Source: binbash Leverage diagrams, accessed July 6th 2022).
\"AWS Certificate Manager is a service that lets you easily provision, manage, and deploy public and private Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates for use with AWS services and your internal connected resources. SSL/TLS certificates are used to secure network communications and establish the identity of websites over the Internet as well as resources on private networks. AWS Certificate Manager removes the time-consuming manual process of purchasing, uploading, and renewing SSL/TLS certificates.\"
\"With AWS Certificate Manager, you can quickly request a certificate, deploy it on ACM-integrated AWS resources, such as:
Elastic Load Balancers,
Amazon CloudFront distributions,
and APIs on API Gateway,
and let AWS Certificate Manager handle certificate renewals. It also enables you to create private certificates for your internal resources and manage the certificate lifecycle centrally. Public and private certificates provisioned through AWS Certificate Manager for use with ACM-integrated services are free. You pay only for the AWS resources you create to run your application. With AWS Certificate Manager Private Certificate Authority, you pay monthly for the operation of the private CA and for the private certificates you issue.\"
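A hedged example of requesting such a certificate with DNS validation (the domain names below are placeholders):

# Request a public certificate for the apex and wildcard, validated via DNS records
aws acm request-certificate \
  --domain-name example.com \
  --subject-alternative-names "*.example.com" \
  --validation-method DNS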
Figure: AWS certificate manager (ACM) service integration diagram. (Source: AWS, \"Amazon Certificate Manager intro diagram\", AWS Documentation Amazon ACM User Guide, accessed August 4th 2021)."},{"location":"user-guide/ref-architecture-aws/features/security/certificates/#cert-manager-lets-encrypt","title":"Cert-manager + Let's Encrypt","text":"
Why Cert-manager + Let's Encrypt\u2753
cert-manager adds certificates and certificate issuers as resource types in Kubernetes clusters, and simplifies the process of obtaining, renewing and using those certificates.
It can issue certificates from a variety of supported sources, including Let\u2019s Encrypt, HashiCorp Vault, and Venafi as well as private PKI.
It will ensure certificates are valid and up to date, and attempt to renew certificates at a configured time before expiry.
It is loosely based upon the work of kube-lego and has borrowed some wisdom from other similar projects such as kube-cert-manager.
Figure: Certificate manager high level components architecture diagram. (Source: Cert-manager official documentation, \"Cert-manager manager intro overview\", Cert-manager Documentation main intro section, accessed August 4th 2021)."},{"location":"user-guide/ref-architecture-aws/features/security/firewall-manager/","title":"Firewall Manager","text":""},{"location":"user-guide/ref-architecture-aws/features/security/firewall-manager/#use-cases","title":"Use Cases","text":"
Network Firewall rules: Security administrators will be able to deploy firewall rules for AWS Network Firewall to control traffic leaving and entering your network across accounts and Amazon VPCs, from the Security account.
WAF & WAF v2: Your security administrators will be able to deploy WAF and WAF v2 rules, and Managed rules for WAF, to be used on Application Load Balancers, API Gateways and Amazon CloudFront distributions.
Route 53 Resolver DNS Firewall rules: Deploy Route 53 Resolver DNS Firewall rules from the Security account to enforce firewall rules across your organization.
Audit Security Groups: You can create policies to set guardrails that define what security groups are allowed/disallowed across your VPCs. AWS Firewall Manager continuously monitors security groups to detect overly permissive rules, and helps improve firewall posture. You can get notifications of accounts and resources that are non-compliant or allow AWS Firewall Manager to take action directly through auto-remediation.
Security Groups: Use AWS Firewall Manager to create a common primary security group across your EC2 instances in your VPCs.
Access Analyzer analyzes the resource-based policies that are applied to AWS resources in the Region where you enabled Access Analyzer. Only resource-based policies are analyzed.
Supported resource types:
Amazon Simple Storage Service buckets
AWS Identity and Access Management roles
AWS Key Management Service keys
AWS Lambda functions and layers
Amazon Simple Queue Service queues
AWS Secrets Manager secrets
Figure: AWS IAM access analysis features. (Source: AWS, \"How it works - monitoring external access to resources\", AWS Documentation, accessed June 11th 2021)."},{"location":"user-guide/ref-architecture-aws/features/security/iam-access-analyzer/#aws-organizations","title":"AWS Organizations","text":"
CONSIDERATION: AWS Organization integration
In order to enable Access Analyzer with the Organization as the zone of trust in the Security account, this account needs to be set as a delegated administrator.
Such step cannot be performed by Terraform yet so it was set up manually as described below: https://docs.aws.amazon.com/IAM/latest/UserGuide/access-analyzer-settings.html
If you're configuring AWS IAM Access Analyzer in your AWS Organizations management account, you can add a member account in the organization as the delegated administrator to manage Access Analyzer for your organization. The delegated administrator has permissions to create and manage analyzers with the organization as the zone of trust. Only the management account can add a delegated administrator.
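The manual step is essentially a one-off delegation performed from the management account; a sketch with a placeholder for the Security account ID:

# Run with management (root) account credentials:
# register the Security account as delegated administrator for IAM Access Analyzer
aws organizations register-delegated-administrator \
  --account-id 123456789012 \
  --service-principal access-analyzer.amazonaws.com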
"},{"location":"user-guide/ref-architecture-aws/features/security/iam-access-analyzer/#aws-web-console","title":"AWS Web Console","text":"Figure: AWS Web Console screenshot. (Source: binbash, \"IAM access analyzer service\", accessed June 11th 2021)."},{"location":"user-guide/ref-architecture-aws/features/security/overview/","title":"Security","text":""},{"location":"user-guide/ref-architecture-aws/features/security/overview/#supported-aws-security-services","title":"Supported AWS Security Services","text":"
AWS IAM Access Analyzer: Generates comprehensive findings that identify resources policies for public or cross-account accessibility, monitors and helps you refine permissions. Provides the highest levels of security assurance.
AWS Config: Tracks changes made to AWS resources over time, making possible to return to a previous state. Monitors and records your AWS resource configurations and allows you to automate the evaluation of recorded configurations against desired compliance rule set. Adds accountability factor.
AWS Cloudtrail: Stores logs over all calls made to AWS APIs, coming from web console, command line or any other. Allowing us to monitor it via CW Dashboards and notifications.
AWS VPC Flow Logs: Enables us to examine individual Network Interfaces logs, to address network issues and also monitor suspicious behavior.
AWS Web Application Firewall: Optional but if not used, it is recommended that a similar service is used, such as Cloudflare. When paired to an Application Load Balancer or Cloudfront distribution, it checks incoming requests to detect and block OWASP Top10 attacks, such as SQL injection, XSS and others.
AWS Inspector: Is an automated security assessment service that helps improve the security and compliance of infrastructure and applications deployed on AWS.
AWS GuardDuty: Is a managed threat detection service that continuously monitors for malicious or unauthorized behavior to help you protect your AWS accounts and workloads. Detects unusual API calls or potentially unauthorized deployments (possible account compromise) and potentially compromised instances or reconnaissance by attackers.
AWS Security Logs Other access logs from client-facing resources will be stored in the Security account.
AWS Firewall Manager Is a security management service which allows you to centrally configure and manage firewall rules across your accounts and applications in AWS Organizations. This service lets you build firewall rules, create security policies, and enforce them in a consistent, hierarchical manner across your entire infrastructure, from a central administrator account.
K8s API via kubectl private endpoint eg: avoiding emergency K8s API vulnerability patching.
Limit exposure: Limit the exposure of the workload to the internet and internal networks by only allowing minimum required access -> Avoiding exposure for Dev/QA/Stg http endpoints
The Pritunl OpenVPN Linux instance is hardened and only runs this VPN solution. All other ports/access is restricted.
Each VPN user can be required to use MFA to connect via VPN (as well as strong passwords). This combination makes it almost impossible for an outsider to gain access via VPN.
Centralized access and audit logs.
Figure: Securing access to a private network with Pritunl diagram. (Source: Pritunl, \"Accessing a Private Network\", Pritunl documentation v1 Guides, accessed November 17th 2020)."},{"location":"user-guide/ref-architecture-aws/features/security/vpn/#read-more","title":"Read More","text":"
Pritunl - Open Source Enterprise Distributed OpenVPN, IPsec and WireGuard Server Specifications
Welcome to the comprehensive guide for using the AWS Systems Manager (SSM) through the Leverage framework integrated with AWS Single Sign-On (SSO). This documentation is designed to facilitate a smooth and secure setup for managing EC2 instances, leveraging advanced SSO capabilities for enhanced security and efficiency.
The AWS Systems Manager (SSM) provides a powerful interface for managing cloud resources. By initiating an SSM session using the leverage aws sso configure command, you can securely configure and manage your instances using single sign-on credentials. This integration simplifies the authentication process and enhances security, making it an essential tool for administrators and operations teams.
SSO Integration: Utilize the Leverage framework to integrate AWS SSO, simplifying the login process and reducing the need for multiple credentials.
Interactive Command Sessions: The start-session command requires the Session Manager plugin and is interactive, ensuring secure and direct command execution.
This command configures your AWS CLI to use SSO for authentication, streamlining access management across your AWS resources.
leverage aws sso configure
"},{"location":"user-guide/ref-architecture-aws/features/ssm/ssm/#advantages-of-terminal-access","title":"Advantages of Terminal Access","text":"
While it is possible to connect to SSM through a web browser, using the terminal offers several benefits:
Direct Shell Access: Provides real-time, interactive management capabilities.
Operational Efficiency: Enhances workflows by allowing quick and direct command executions.
"},{"location":"user-guide/ref-architecture-aws/features/ssm/ssm/#security-and-management-benefits","title":"Security and Management Benefits","text":"
Adopting this integrated approach offers significant advantages:
Increased Security: By using SSO, the system minimizes risks associated with multiple credential sets and potential unauthorized access.
Efficient Management: Centralizes control over AWS resources, reducing complexity and improving oversight.
This guide is structured into detailed sections that cover:
Pre-requisites: Requirements needed before you begin.
Variable Initialization: Setup and explanation of the necessary variables.
Authentication via SSO: How to authenticate using the leverage aws sso configure command.
Exporting AWS Credentials: Guidelines for correctly exporting AWS credentials for session management.
Session Handling: Detailed instructions for starting, managing, and terminating SSM sessions.
Each section aims to provide step-by-step instructions to ensure you are well-prepared to use the AWS SSM configuration tool effectively.
Navigate through the subsections for detailed information relevant to each stage of the setup process and refer back to this guide as needed to enhance your experience and utilization of AWS SSM capabilities.
Before you begin, ensure that you have the necessary tools and permissions set up:
SSM Plugin for AWS CLI: Crucial for starting SSM sessions from the command line. Install it by following the steps on the AWS Documentation site.
"},{"location":"user-guide/ref-architecture-aws/features/ssm/ssm/#getting-started-guide","title":"Getting Started Guide","text":""},{"location":"user-guide/ref-architecture-aws/features/ssm/ssm/#step-1-initialize-environment-variables","title":"Step 1: Initialize Environment Variables","text":"
Set up all necessary variables used throughout the session. These include directories, profiles, and configuration settings essential for the script\u2019s functionality.
"},{"location":"user-guide/ref-architecture-aws/features/ssm/ssm/#step-2-authenticate-via-sso","title":"Step 2: Authenticate via SSO","text":"
Navigate to the required layer directory and perform authentication using AWS SSO. This step verifies your credentials and ensures that subsequent operations are secure.
cd $FOLDER/shared/us-east-1/tools-vpn-server
leverage aws sso configure
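With the SSO credentials exported, the session itself is started through the Session Manager plugin; a minimal sketch (the instance ID and profile variable below are placeholders):

aws ssm start-session \
  --target i-0123456789abcdef0 \
  --profile ${AWS_PROFILE}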
This command initiates a secure session to the specified EC2 instance using SSM. It's a crucial tool for managing your servers securely without the need for direct SSH access. Ensure that your permissions and profiles are correctly configured to use this feature effectively.
By following these steps, you can efficiently set up and use the AWS SSM configuration tool for enhanced security and management of your cloud resources.
For a complete view of the script and additional configurations, please refer to the full Gist.
Before deploying your AWS SSO definition in the project, it will first have to be manually enabled in the AWS Management Console.
Prerequisites
Enable AWS SSO
After that, choosing and configuring an Identity Provider (IdP) is the next step. For this, we will make use of JumpCloud, as described in the how it works section. These resources point to all requirements and procedures to have your JumpCloud account setup and synched with AWS SSO:
AWS JumpCloud support guide
JumpCloud guide on how to configure as IdP for AWS SSO
Once this is set up, the SSO layer can be safely deployed.
"},{"location":"user-guide/ref-architecture-aws/features/sso/configuration/#preparing-the-project-to-use-aws-sso","title":"Preparing the project to use AWS SSO","text":"
To implement SSO authentication in your IaC definition, some configuration values need to be present in your project.
sso_enabled determines whether leverage will attempt to use credentials obtained via SSO to authenticate against AWS
sso_start_url and sso_region are necessary to configure AWS CLI correctly in order to be able to get the credentials
When configuring AWS CLI, a default profile is created containing region and output default settings. The region value is obtained from the previously mentioned sso_region; however, you can override this behavior by configuring a region_primary value in the same global configuration file.
This is the role for which credentials will be obtained via SSO when operating in the current layer.
"},{"location":"user-guide/ref-architecture-aws/features/sso/configuration/#authentication-via-sso","title":"Authentication via SSO","text":""},{"location":"user-guide/ref-architecture-aws/features/sso/configuration/#1-configuring-aws-sso","title":"1. Configuring AWS SSO","text":"
Once the project has been set up to use SSO, the profiles that AWS CLI will use to authenticate against the service need to be created.
To do this, simply run leverage aws configure sso.
Attention
This step simply writes over the credentials files for AWS CLI without asking for confirmation from the user. So it's recommended to backup/wipe old credentials before executing this step in order to avoid loss of credentials or conflicts with profiles having similar names to the ones generated by Leverage.
This step is executed as part of the previous one. So if the user has just configured SSO, this step is not required.
Having SSO configured, the user will proceed to log in.
This is achieved by running leverage aws sso login.
In this step, the user is prompted to manually authorize the log in process via a web console.
When logging in, Leverage obtains a token from SSO. This token is later used to obtain the credentials needed for the layer the user is working on. This token has a relatively short life span to strike a balance between security and convenience for the user.
"},{"location":"user-guide/ref-architecture-aws/features/sso/configuration/#3-working-on-a-layer","title":"3. Working on a layer","text":"
When SSO is enabled in the project, Leverage will automatically figure out the required credentials for the current layer, and attempt to get them from AWS every time the user executes a command on it.
These credentials are short lived (30 minutes) for security reasons, and will be refreshed automatically whenever they expire.
When the user has finished working, running leverage sso logout wipes out all remaining valid credentials and voids the token obtained from logging in.
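A typical working session then looks roughly like this (the layer path is illustrative; command names are as referenced above):

# Log in once; Leverage caches the SSO token obtained in this step
leverage aws sso login

# Work on any layer; short-lived credentials are obtained and refreshed automatically
cd shared/us-east-1/base-network
leverage terraform plan

# When finished, run the logout command described above to void the token and wipe credentials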
Enabling and requiring MFA is highly recommended. We typically choose these following guidelines:
Prompt users for MFA: Only when their sign-in context changes (context-aware).
Users can authenticate with these MFA types: we allow security keys, built-in authenticators (such as fingerprint or retina/face scans), and authenticator apps.
If a user does not yet have a registered MFA device: require them to register an MFA device at sign in.
Who can manage MFA devices: users and administrators can add and manage MFA devices.
Refer to the official documentation for more details.
By default, the SSO session is set to last 12 hours. This is a good default but we still prefer to share this decision making with the Client -- e.g. focal point, dev/qa team, data science teams. They might factor in considerations such as security/compliance, UX/DevEx, operational needs, technical constraints, administration overheads, cost considerations, and more.
"},{"location":"user-guide/ref-architecture-aws/features/sso/managing-users/","title":"Managing users","text":""},{"location":"user-guide/ref-architecture-aws/features/sso/managing-users/#onboarding-users-and-groups","title":"Onboarding Users and Groups","text":""},{"location":"user-guide/ref-architecture-aws/features/sso/managing-users/#addremove-users","title":"Add/remove users","text":"
Open this file: management/global/sso/locals.tf
Locate the users map within the local variables definition
Add an entry to the users map with all the required data, including the groups the user should belong to
Apply your changes
Additional steps are required when creating a new user:
The user's email needs to be verified. Find the steps for that in this section.
After the user has verified his/her email he/she should be able to use the Forgot Password flow to generate its password. The steps for that can be found in this section.
Open this file: devops-tf-infra/management/global/sso/locals.tf
Find the users map within the local variables definition
Update the groups attribute to add/remove groups that user belongs to
Apply your changes
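Applying these changes follows the same Leverage workflow as any other layer; for example:

cd management/global/sso
leverage terraform plan    # review the users/groups that will be added or removed
leverage terraform apply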
"},{"location":"user-guide/ref-architecture-aws/features/sso/managing-users/#trigger-user-email-activation","title":"Trigger user email activation","text":"
Log in to management account through the AWS console
Go to AWS IAM Identity Center
Go to the users section
Locate the user whose email you want to activate
Click on the user to view the user details
There should be a \"Send verification email\" or \"Send email verification link\" button at the top. Click on it.
Notify the user, confirm that he/she got the email and that he/she clicked on the activation link.
"},{"location":"user-guide/ref-architecture-aws/features/sso/managing-users/#reset-a-user-password","title":"Reset a user password","text":"
JumpCloud will be configured as the Identity Provider (IdP) that we will integrate with AWS SSO in order to grant users access to AWS resources from a centralized service. Users will be able to log in to JumpCloud in order to access AWS accounts, using specific permission sets that will in turn determine what kind of actions they are allowed on AWS resources.
Users will be defined in JumpCloud and used for deploying AWS resources with scoped permissions.
"},{"location":"user-guide/ref-architecture-aws/features/sso/overview/#sso-groups","title":"SSO Groups","text":"Account / Groups Administrators DevOps FinOps SecurityAuditors Management x x x x
Consideration
This definition could be fully customized based on the project specific needs
"},{"location":"user-guide/ref-architecture-aws/features/sso/overview/#sso-permission-sets-w-account-associations","title":"SSO Permission Sets (w/ Account Associations)","text":"Account / Permission Sets Administrator DevOps FinOps SecurityAuditors Management x x Security x x x Shared x x x Network x x x Apps-DevStg x x x Apps-Prd x x x
Considerations
Devops Engineers will assume this permission set through JumpCloud + AWS SSO.
Developers could have their specific SSO Group + Permission Set policy association.
This definition could be fully customized based on the project specific needs
We will review all S3 buckets in the existing account to determine if it's necessary to copy them over to the new account, evaluate existing bucket policies, and tighten permissions to the absolute minimum required for users and applications. As for EBS volumes, our recommendation is to create them all encrypted by default. The overhead created by this process is negligible.
| Storage class | Designed for | Durability (designed for) | Availability (designed for) | Availability Zones | Min storage duration | Min billable object size | Other considerations |
| --- | --- | --- | --- | --- | --- | --- | --- |
| S3 Standard | Frequently accessed data | 99.999999999% | 99.99% | >= 3 | None | None | None |
| S3 Standard-IA | Long-lived, infrequently accessed data | 99.999999999% | 99.9% | >= 3 | 30 days | 128 KB | Per GB retrieval fees apply. |
| S3 Intelligent-Tiering | Long-lived data with changing or unknown access patterns | 99.999999999% | 99.9% | >= 3 | 30 days | None | Monitoring and automation fees per object apply. No retrieval fees. |
| S3 One Zone-IA | Long-lived, infrequently accessed, non-critical data | 99.999999999% | 99.5% | 1 | 30 days | 128 KB | Per GB retrieval fees apply. Not resilient to the loss of the Availability Zone. |
| S3 Glacier | Long-term data archiving with retrieval times ranging from minutes to hours | 99.999999999% | 99.99% (after you restore objects) | >= 3 | 90 days | 40 KB | Per GB retrieval fees apply. You must first restore archived objects before you can access them. For more information, see Restoring archived objects. |
| S3 Glacier Deep Archive | Archiving rarely accessed data with a default retrieval time of 12 hours | 99.999999999% | 99.99% (after you restore objects) | >= 3 | 180 days | 40 KB | Per GB retrieval fees apply. You must first restore archived objects before you can access them. For more information, see Restoring archived objects. |
| RRS (Not recommended) | Frequently accessed, non-critical data | 99.99% | 99.99% | >= 3 | None | None | None |
"},{"location":"user-guide/ref-architecture-aws/features/storage/storage/#ebs-volumes","title":"EBS Volumes","text":"
Tech specs
Backups: Periodic EBS snapshots with retention policy
Encryption: Yes (by default)
Type: SSD (gp2) by default, Throughput Optimized HDD (st1) for some database workloads, if needed.
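Account-level default EBS encryption can be verified (or turned on) per region with the AWS CLI; for example:

# Check whether new EBS volumes are encrypted by default in this region/account
aws ec2 get-ebs-encryption-by-default --region us-east-1

# Enable it if it is not
aws ec2 enable-ebs-encryption-by-default --region us-east-1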
This guideline includes considerations and steps that should be performed when upgrading a cluster to a newer version.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#upgrade-plan-overview","title":"Upgrade Plan Overview","text":"
General considerations
Preparation Steps
Understand what changed
Plan a maintenance window for the upgrade
Rehearse on a non-Production cluster first
Ensure you have proper visibility on the cluster
Upgrade Steps
Upgrade Control Plane
Upgrade Managed Node Groups
Upgrade Cluster AutoScaler version
Upgrade EKS Add-ons
Closing Steps
Migration Notes
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#detailed-upgrade-plan","title":"Detailed Upgrade Plan","text":""},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#1-general-considerations","title":"1) General considerations","text":"
Ensure your sensitive workloads are deployed in a highly available manner to reduce downtime as much as possible
Ensure Pod Disruption Budgets are set in your deployments to ensure your application pods are evicted in a controlled way (e.g. leave at least one pod active at all times)
Ensure Liveness and Readiness probes are set so that Kubernetes can tell whether your application is healthy to start receiving traffic or needs a restart
Plan the upgrade during off hours so that unexpected disruptions have even less impact on end-users
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#2-preparation-steps","title":"2) Preparation Steps","text":""},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#understand-what-changed","title":"Understand what changed","text":"
Here you need to get a good understanding of the things that changed between the current version and the version you want to upgrade to. For that, it is highly recommended to go to the AWS EKS official documentation as it is frequently being updated.
Another documentation you should refer to is the Kubernetes official documentation, specially the Kubernetes API Migration Guide which explains in great detail what has been changed.
For instance, typical changes include:
Removed/deprecated Kubernetes APIs: this one may require that you also upgrade the resources used by your applications or even base components your applications rely on. E.g. cert-manager, external-dns, etc.
You can use tools such as kubent to find deprecated API versions (see the example after this list). That should list the resources that need to be upgraded; however, you may still need to figure out whether each one is an EKS base component or a cluster component installed via Terraform & Helm.
Base component updates: this is about changes to control plane components and to components that run on the nodes. An example of that would be the deprecation and removal of Docker as a container runtime.
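A quick way to run that check from a workstation with cluster access (the kubeconfig context name below is a placeholder, and kubent must be installed locally):

# Point kubectl at the cluster to be upgraded, then scan for deprecated/removed API usage
kubectl config use-context apps-devstg-eks
kubent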
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#plan-a-maintenance-window-for-the-upgrade","title":"Plan a maintenance window for the upgrade","text":"
Keep in mind that, at the very least, you will be upgrading the control plane and the data plane; and in some cases you would also need to upgrade components and workloads. So, although Kubernetes has a great development team and automation; and even though we rely on EKS for which AWS performs additional checks and validations, we are still dealing with a complex, evolving piece of software, so planning for the upgrade is still a reasonable move.
Upgrading the control plane should not affect the workloads but you should still bear in mind that the Kubernetes API may become unresponsive during the upgrade, so anything that talks to the Kubernetes API might experience delays or even timeouts.
Now, upgrading the nodes is the more sensitive task and, while you can use a rolling-update strategy, that still doesn't provide any guarantees on achieving a zero down-time upgrade so, again, planning for some maintenance time is recommended.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#rehearse-on-a-non-production-cluster-first","title":"Rehearse on a non-Production cluster first","text":"
Perform the upgrade on a non-Production cluster first to catch and anticipate any issues before you upgrade the Production cluster. Also take notes and reflect any important updates on this document.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#ensure-you-have-proper-visibility-on-the-cluster","title":"Ensure you have proper visibility on the cluster","text":"
Monitoring the upgrade is important, so make sure you have monitoring tools in place before attempting it. Such tools include the AWS console (via the AWS EKS Monitoring section) as well as Prometheus/Grafana and ElasticSearch/Kibana. Make sure you are familiar with them before the upgrade.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#3-upgrade-steps","title":"3) Upgrade Steps","text":""},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#1-upgrade-control-plane","title":"1) Upgrade Control Plane","text":"
This is simply about updating the cluster_version variable in the variables.tf file within the cluster layer of the cluster you want to upgrade, and then applying that change. However, with the current version of the Terraform EKS module, modifying the cluster version input will show that it needs to upgrade both the control plane and the nodes, which may not follow the expected order (first the cluster, then the nodes). Another thing that could go wrong is Terraform ending up in an unfinished state because the upgrade takes too long to complete (or, as happened to us once, the cluster gets upgraded but the launch template used for the nodes gets deleted, so the upgraded nodes cannot be spun up).
The alternative to all of that is to perform the upgrade outside Terraform and, after it is complete, to update the cluster_version variable in the variables.tf file. Then you can run a Terraform plan to verify the output shows no changes. This method provides a good degree of control over the upgrade.
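For example, assuming the cluster layer lives under apps-devstg/us-east-1/k8s-eks/cluster (the path is illustrative, adjust it to your project), the verification could look like this:
cd apps-devstg/us-east-1/k8s-eks/cluster\n# cluster_version in variables.tf should already reflect the new version\nleverage terraform plan    # expect: No changes. Your infrastructure matches the configuration.\n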
Having said that, go ahead and proceed with the upgrade, either via the AWS console, the AWS CLI or the EKS CLI, and watch the upgrade as it happens. As stated in a previous step, the Kubernetes API may evidence some downtime during this operation, so make sure you prepare accordingly.
Once the control plane is upgraded you should be ready to upgrade the nodes. There are 2 strategies you could use here: rolling-upgrade or recreate. The former is recommended as it causes minimal disruption; recreate could be used in an environment where downtime won't be an issue.
As mentioned in the previous step, the recommendation is to trigger the upgrade outside Terraform, so please proceed with that and monitor the operation as it happens (via the AWS EKS console, kubectl, or Prometheus/Grafana).
If you go with the AWS CLI, you can use the following command to get a list of the clusters available to your current AWS credentials:
aws eks list-clusters --profile [AWS_PROFILE]\n
Make a note of the cluster name as you will be using that in subsequent commands.
Now use the following command to get a list of the node groups:
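For instance, using the same placeholders as above:
aws eks list-nodegroups --cluster-name [CLUSTER_NAME] --profile [AWS_PROFILE]\n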
After that, you need to identify the appropriate release version for the upgrade. This is determined by the Amazon EKS optimized Amazon Linux AMI version; the precise value can be found in the AMI Details section of the GitHub repository CHANGELOG. Look for the column named Release version in the first table under the corresponding Kubernetes version collapsed section (indicated by a \u25b8 symbol).
With that information you should be ready to trigger the update with the command below:
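For instance, for a managed node group (placeholders as above; the release version is the one you identified in the CHANGELOG):
aws eks update-nodegroup-version --cluster-name [CLUSTER_NAME] --nodegroup-name [NODEGROUP_NAME] --release-version [RELEASE_VERSION] --profile [AWS_PROFILE]\n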
Modify scaling.tf per the official Kubernetes cluster autoscaler chart and apply with Terraform. The version of the cluster autoscaler should at least match the cluster version you are moving to. A greater version of the autoscaler might work with an earlier version of Kubernetes, but the opposite most likely won't be the case.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#4-upgrade-eks-base-components","title":"4) Upgrade EKS base components","text":"
Namely these components are:
Kube-Proxy
CoreDNS
VPC CNI
EBS-CSI driver
In recent versions EKS is able to manage these components as add-ons, which makes their upgrades less involved and which can even be performed through a recent version of the Terraform EKS module. However, we are not currently using EKS add-ons to manage the installation of these components; we are using the so-called self-managed approach, so the upgrade needs to be applied manually.
Generally speaking, the upgrade procedure could be summed up as follows:
Determine current version
Determine the appropriate version you need to upgrade to
Upgrade each component and verify
Now, the recommendation is to refer to the following guides which carefully describe the steps that need to be performed:
Kube-proxy: check here
CoreDNS: check here
VPC CNI: check here
EBS-CSI driver: check here
IMPORTANT: be extremely careful when applying these updates, especially with the VPC CNI, as the instructions are not easy to follow.
Make sure you notify the team about the upgrade result. Also, do not forget about committing/pushing all code changes to the repository and creating a PR for them.
If you find any information you consider should be added to this document, you are welcome to reflect it here.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#upgrade-to-v121","title":"Upgrade to v1.21","text":"
VPC CNI: the latest available version was v1.11.4 but it could only be upgraded to v1.9.3. It couldn't be moved further because v1.10.3 wasn't able to run, as it kept throwing the following errors:
{\"level\":\"info\",\"ts\":\"2022-10-07T15:42:01.802Z\",\"caller\":\"entrypoint.sh\",\"msg\":\"Retrying waiting for IPAM-D\"}\npanic: runtime error: invalid memory address or nil pointer dereference\n[signal SIGSEGV: segmentation violation code=0x1 addr=0x39 pc=0x560d2186d418]\n
Cluster Autoscaler: it is already at v1.23.0. Ideally this should match the Kubernetes version, but since the version we have has been working well so far, we can keep it and it should cover us until we upgrade Kubernetes to a matching version.
Managed Nodes failures due to PodEvictionFailure: this one happened twice during a Production cluster upgrade. It seemed to be related to Calico pods using tolerations that are not compatible with Kubernetes' typical node upgrade procedure. In short, the pods tolerate the NoSchedule taint and thus refuse to be evicted from the nodes during a drain procedure. The workaround that worked was using a forced upgrade, which is essentially a flag that can be passed via Terraform (or via the AWS CLI). A more permanent solution would involve figuring out a proper way to configure the Calico pods without the problematic toleration; we just need to keep in mind that we are deploying Calico via the Tigera Operator.
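For reference, a forced node group upgrade can be triggered from the AWS CLI as sketched below (names are placeholders); the Terraform EKS module exposes an equivalent input, typically named force_update_version.
aws eks update-nodegroup-version --cluster-name [CLUSTER_NAME] --nodegroup-name [NODEGROUP_NAME] --force --profile [AWS_PROFILE]\n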
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#upgrade-to-v122","title":"Upgrade to v1.22","text":"
Control plane and managed nodes: no issues. Cluster Autoscaler: already at v1.23.0. Kube-proxy: no issues, upgraded to v1.22.16-minimal-eksbuild.3. CoreDNS: no issues, upgraded to v1.8.7-eksbuild.1. VPC CNI: no issues, upgraded to the latest version available, v1.12.1.
Outstanding issue: Prometheus/Grafana instance became unresponsive right during the upgrade of the control plane. It was fully inaccessible. A stop and start was needed to bring it back up.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#upgrade-to-v125","title":"Upgrade to v1.25","text":"
Before upgrading to v1.25 there are two main issues to tackle. The main one is the removal of the PodSecurityPolicy resource from the policy/v1beta1 API. You should migrate all your PSPs to Pod Security Standards or any other policy-as-code solution for Kubernetes. If the only PSP found in the cluster is named eks.privileged, you can skip this step: it is a PSP handled by EKS and will be migrated for you by the platform. For more information about this, the official EKS PSP removal FAQ can be referenced. The second issue to tackle is to upgrade aws-load-balancer-controller to v2.4.7 or later to address the removal of EndpointSlice from the discovery.k8s.io/v1beta1 API. This should be done via the corresponding helm-chart.
After the control plane and managed nodes are upgraded, which should present no issues, the cluster autoscaler needs to be upgraded. Usually we would achieve this by changing the helm-chart version to one that deploys the version matching the cluster, that is, cluster autoscaler v1.25. However, there's no chart release that covers this scenario, so we need to provide the cluster autoscaler image version to the current helm-chart via the image.tag values file variable.
Addons should present no problems being upgraded to the latest available version.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#upgrade-to-v126","title":"Upgrade to v1.26","text":"
No extra considerations are needed to upgrade from v1.25 to v1.26. The standard procedure listed above should work with no issues.
"},{"location":"user-guide/ref-architecture-eks/cluster-upgrade/#upgrade-to-v127","title":"Upgrade to v1.27","text":"
In a similar fashion to the previous upgrade notes, the standard procedure listed above should work with no issues.
Official AWS procedure
A step-by-step guide for upgrading an EKS cluster can be found at the following link: https://repost.aws/knowledge-center/eks-plan-upgrade-cluster
Official AWS Release Notes
To be aware of important changes in each new Kubernetes version in standard support, it is important to check the AWS release notes for standard support versions.
Most of these components and services are installed via Helm charts. Usually, tweaking these components' configuration is done via the input values for their corresponding chart. For detailed information on the different parameters, please head to each component's public documentation (links in each section).
It automatically provisions AWS Application Load Balancers (ALB) or AWS Network Load Balancers (NLB) in response to the creation of Kubernetes Ingress or LoadBalancer resources respectively. Automates the routing of traffic to the cluster.
Implementation in ref architecture: /apps-devstg/us-east-1/k8s-eks-demoapps
It is used to allow for the configuration of NGINX via a system of annotations in Kubernetes resources.
A configuration can be enforced globally, via the controller.config variable in the helm-chart, or individually for each application, via annotations in the Ingress resource of the application.
Implementation in ref architecture: /apps-devstg/us-east-1/k8s-eks-demoapps
Automatically creates the required DNS records based on the definition of Ingress resources in the cluster.
The annotation kubernetes.io/ingress.class: <class> defines whether the records are created in the public hosted zone or the private hosted zone for the environment. It accepts one of two values: public-apps or private-apps.
Implementation in ref architecture: /apps-devstg/us-east-1/k8s-eks-demoapps
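For instance, a hypothetical way of marking an existing Ingress so its records are created in the public hosted zone (namespace and Ingress name are placeholders; normally the annotation is set directly in the Ingress manifest):
kubectl -n [NAMESPACE] annotate ingress [INGRESS_NAME] kubernetes.io/ingress.class=public-apps\n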
Automatically fetches secrets and parameters from Parameter Store, AWS Secrets Manager and other sources, and makes them available in the cluster as Kubernetes Secrets.
Implementation in ref architecture: /apps-devstg/us-east-1/k8s-eks-demoapps
Stack of Kubernetes manifests, monitoring, alerting and visualization applications, rules and dashboards implementing an end-to-end Kubernetes monitoring solution.
Implementation in ref architecture: /apps-devstg/us-east-1/k8s-eks-demoapps
Provides the capability of using more complex deployment and promotion schemes, like Blue-Green or Canary deployments, to eliminate downtime and allow for greater control of the process.
"},{"location":"user-guide/ref-architecture-eks/components/#argo-cd-image-updater","title":"Argo CD Image Updater","text":"
Tracks for new images in ECR and updates the applications definition so that Argo CD automatically proceeds with the deployment of such images.
Access to EKS is usually achieved via IAM roles. These could be either custom IAM roles that you define, or SSO roles that AWS takes care of creating and managing.
Granting different kinds of access to IAM roles can be done as shown here where you can define classic IAM roles or SSO roles. Note however that, since the latter are managed by AWS SSO, they could change if they are recreated or reassigned.
Now, even though granting access to roles is the preferred way, keep in mind that it is not the only one: you can also grant access to specific users or to specific accounts.
Amazon Elastic Kubernetes Services (EKS) is a managed service that makes it easy for you to run Kubernetes on AWS without needing to install and operate your own Kubernetes control plane or worker nodes.
Core Features
Highly Secure: EKS automatically applies the latest security patches to your cluster control plane.
Multiple Availability Zones: EKS auto-detects and replaces unhealthy control plane nodes and provides on-demand, zero downtime upgrades and patching.
Serverless Compute: EKS supports AWS Fargate to remove the need to provision and manage servers, improving security through application isolation by design.
Built with the Community: AWS actively works with the Kubernetes community, including making contributions to the Kubernetes code base helping you take advantage of AWS services.
Figure: AWS K8s EKS architecture diagram (just as reference). (Source: Jay McConnell, \"A tale from the trenches: The CloudBees Core on AWS Quick Start\", AWS Infrastructure & Automation Blog post, accessed November 18th 2020)."},{"location":"user-guide/ref-architecture-eks/overview/#version-support","title":"Version Support","text":"
At Leverage we support the 3 latest stable Kubernetes releases (at best effort) within our Reference Architecture EKS layer and IaC Library EKS module.
We think this is a good balance between management overhead and an acceptable range of supported versions (at best effort). If your project has an older legacy version, we can work along with your CloudOps team to safely migrate it to a Leverage-supported EKS version.
This is the primary resource which defines the cluster. We will create one cluster on each account:
apps-devstg/us-east-1/k8s-eks
apps-devstg/us-east-1/k8s-eks-demoapps
Important
In case of multiple environments hosted in the same cluster as for the one with Apps Dev and Stage, the workload isolation will be achieved through Kubernetes features such as namespaces, network policies, RBAC, and others.
Each option has its pros and cons with regard to cost, operation complexity, extensibility, customization capabilities, features, and management.
In general we implement Managed Nodes. The main reasons being:
They allow a high degree of control in terms of the components we can deploy and the features those components can provide to us. For instance we can run ingress controllers and service mesh, among other very customizable resources.
AWS takes care of provisioning and lifecycle management of nodes which is one less task to worry about.
Upgrading Kubernetes versions becomes much simpler and quicker to perform.
We still can, at any time, start using Fargate and Fargate Spot by simply creating a profile for one or both of them, then we only need to move the workloads that we want to run on Fargate profiles of our choice.
AWS EKS: Docker runs in the 172.17.0.0/16 CIDR range in Amazon EKS clusters. We recommend that your cluster's VPC subnets do not overlap this range. Otherwise, you will receive the following error:
Error: : error upgrading connection: error dialing backend: dial tcp 172.17.nn.nn:10250:\ngetsockopt: no route to host\n
Read more: AWS EKS network requirements
Reserved IP Addresses: the first four IP addresses and the last IP address in each subnet CIDR block are not available for you to use and cannot be assigned to an instance. For example, in a subnet with CIDR block 10.0.0.0/24, the following five IP addresses are reserved: 10.0.0.0 (network address), 10.0.0.1 (reserved for the VPC router), 10.0.0.2 (reserved for the DNS server), 10.0.0.3 (reserved for future use) and 10.0.0.255 (network broadcast address). For more, see AWS VPC Subnets IP addressing.
"},{"location":"user-guide/ref-architecture-eks/vpc/#vpcs-ip-addressing-plan-cidr-blocks-sizing","title":"VPCs IP Addressing Plan (CIDR blocks sizing)","text":"
Introduction
VPCs can vary in size from 16 addresses (/28 netmask) to 65,536 addresses (/16 netmask). In order to size a VPC correctly, it is important to understand the number, types, and sizes of workloads expected to run in it, as well as workload elasticity and load balancing requirements.
Keep in mind that there is no charge for using Amazon VPC (aside from EC2 charges), therefore cost should not be a factor when determining the appropriate size for your VPC, so make sure you size your VPC for growth.
Moving workloads or AWS resources between networks is not a trivial task, so be generous in your IP address estimates to give yourself plenty of room to grow, deploy new workloads, or change your VPC design configuration from one to another. The majority of AWS customers use VPCs with a /16 netmask and subnets with /24 netmasks. The primary reason AWS customers select smaller VPC and subnet sizes is to avoid overlapping network addresses with existing networks.
So, having an AWS single-VPC design, we've chosen a Medium/Small VPC/Subnet addressing plan which should fit a broad variety of use cases.
"},{"location":"user-guide/ref-architecture-eks/vpc/#networking-ip-addressing","title":"Networking - IP Addressing","text":"
Starting CIDR Segment (AWS EKS clusters)
AWS EKS clusters IP Addressing calculation is presented below based on segment 10.0.0.0/16 (starts at /16 due to AWS VPC limits)
We started from 10.0.0.0/16 and subnetted to /19
Resulting in Total Subnets: 8
Number of available hosts for each subnet: 8190
Number of available IPs (AWS) for each subnet: 8187
Individual CIDR Segments (VPCs)
Then each of these VPC CIDRs (/16) is subnetted into /19 blocks
Considering the whole Starting CIDR Segment (AWS EKS clusters) before declared, we'll start at 10.0.0.0/16
apps-devstg
1ry VPC CIDR: 10.0.0.0/16
1ry VPC DR CIDR: 10.20.0.0/16
apps-prd
1ry VPC CIDR: 10.10.0.0/16
1ry VPC DR CIDR: 10.30.0.0/16
Resulting in Subnets: 4 x VPC
VPC Subnets with Hosts/Net: 16.
Eg: apps-devstg account \u2192 us-east-1 w/ 3 AZs \u2192 3 x Private Subnets /az + 3 x Public Subnets /az
1ry VPC CIDR: 10.0.0.0/16. Subnets:
Private 10.0.0.0/19, 10.0.32.0/19 and 10.0.64.0/19
Public 10.0.96.0/19, 10.0.128.0/19 and 10.0.160.0/19
"},{"location":"user-guide/ref-architecture-eks/vpc/#planned-subnets-per-vpc","title":"Planned Subnets per VPC","text":"
Having defined the initial VPC that will be created in the different accounts that were defined, we are going to create subnets in each of these VPCs defining Private and Public subnets split among different availability zones:
In case you would like to further understand the different tech specs and configs for this Ref Arch, you can find more details at the user-guide/Compute/K8s EKS section.
Config files can be found under each config folder
Global config file: /config/common.tfvars contains global context TF variables that we inject into TF commands (such as leverage terraform plan or leverage terraform apply), that are used by all sub-directories, and that cannot be stored in backend.tfvars due to Terraform limitations.
Account config files
backend.tfvars contains TF variables that are mainly used to configure TF backend but since profile and region are defined there, we also use them to inject those values into other TF commands.
account.tfvars contains TF variables that are specific to an AWS account.
Go to the corresponding account directory, e.g. /hcp/base-tf-backend, then:
Run leverage terraform init
Run leverage terraform plan, review the output to understand the expected changes
Run leverage terraform apply, review the output once more and type yes if you are okay with that
This should create a terraform.tfstate file in this directory but we don't want to push that to the repository so let's push the state to the backend we just created
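A sketch of that flow, assuming the S3 backend block in config.tf has been enabled to point to the newly created bucket:
leverage terraform init\n# when prompted, answer \"yes\" to copy the existing local state to the new S3 backend\n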
Make sure you've read and prepared your local development environment following the Overview base-configurations section.
Depending on which Terraform Ref Architecture repo you are working with, please review and make sure you meet all the terraform aws pre-requisites or terraform vault pre-requisites
Remote State
Configuration files
AWS Profile and credentials
Vault token secret
Get into the folder that you need to work with (e.g. 2_identities)
Run leverage terraform init
Make whatever changes you need to make
Run leverage terraform plan if you only mean to preview those changes
Run leverage terraform apply if you want to review and likely apply those changes
Note
If desired, at step #5 you could submit a PR, allowing you and the rest of the team to understand and review what changes would be made to your AWS Cloud Architecture components before executing leverage terraform apply (terraform apply). This brings the huge benefit of treating changes with a GitOps-oriented approach, basically as we should treat any other code & infrastructure change, and integrating it with the rest of our tools and practices like CI/CD.
"},{"location":"user-guide/ref-architecture-vault/workflow/#running-in-automation","title":"Running in Automation","text":"Figure: Running terraform with AWS in automation (just as reference)."},{"location":"user-guide/ref-architecture-vault/workflow/#read-more","title":"Read More","text":"
Make sure you read general troubleshooting page before trying out anything else.
"},{"location":"user-guide/troubleshooting/credentials/#are-you-using-iam-or-sso","title":"Are you using IAM or SSO?","text":"
Leverage supports two methods for getting AWS credentials: IAM and SSO. We are progressively favoring SSO over IAM, only using the latter as a fallback option.
SSO is enabled through the common.tfvars file on this line:
sso_enabled = true\n
If that is set to true, then you are using SSO, otherwise it's IAM."},{"location":"user-guide/troubleshooting/credentials/#why-should-i-care-whether-i-am-using-iam-or-sso","title":"Why should I care whether I am using IAM or SSO?","text":"
Well, because even though both methods will try to get temporary AWS credentials, each method will use a different way to do that. In fact, Leverage relies on the AWS CLI to get the credentials and each method requires completely different commands to achieve that.
"},{"location":"user-guide/troubleshooting/credentials/#do-you-have-mfa-enabled","title":"Do you have MFA enabled?","text":"
MFA is optionally used via the IAM method. It can be enabled/disabled in the build.env file.
Keep in mind that MFA should only be used with the IAM method, not with SSO.
"},{"location":"user-guide/troubleshooting/credentials/#identify-which-credentials-are-failing","title":"Identify which credentials are failing","text":"
Since Leverage actually relies on Terraform and, since most of the definitions are AWS resources, it is likely that you are having issues with the Terraform AWS provider, in other words, you might be struggling with AWS credentials. Now, bear in mind that Leverage can also be used with other providers such as Gitlab, Github, Hashicorp Cloud Platform, or even SSH via Ansible; so the point here is to understand what credentials are not working for you in order to focus the troubleshooting on the right suspect.
"},{"location":"user-guide/troubleshooting/credentials/#determine-the-aws-profile-you-are-using","title":"Determine the AWS profile you are using","text":"
When you are facing AWS credentials issues it's important to understand what is the AWS profile that might be causing the issue. Enabling verbose mode should help with that. The suspect profile is likely to show right above the error line and, once you have identified that, you can skip to the next section.
If the above doesn't make the error evident yet, perhaps you can explore the following questions:
Is it a problem with the Terraform remote state backend? The profile used for that is typically defined in the backend.tfvars file, e.g. this one, or this other one.
Is it a problem with another profile used by the layer? Keep in mind that layers can have multiple profile definitions in order to be able to access resources in different accounts. For instance, this is a simple provider definition that uses a single profile, but here's a more complex definition with multiple provider blocks.
Can the problematic profile be found in the AWS config file? Or is the profile entry in the AWS config file properly defined? Read the next sections for more details on that.
"},{"location":"user-guide/troubleshooting/credentials/#check-the-profiles-in-your-aws-config-file","title":"Check the profiles in your AWS config file","text":"
Once you know what AWS profile is giving you headaches, you can open the AWS config file, typically under ~/.aws/[project_name_here]/config, to look for and inspect that profile definition.
Things to look out for:
Is there a profile entry in that file that matches the suspect profile?
Are there repeated profile entries?
Does the profile entry include all necessary fields (e.g. region, role_arn, source_profile; mfa_serial if MFA is enabled)?
Keep in mind that profiles change depending on whether you are using SSO or IAM for getting credentials, so please refer to the corresponding section below in this page to find specific details about your case.
"},{"location":"user-guide/troubleshooting/credentials/#configure-the-aws-cli-for-leverage","title":"Configure the AWS CLI for Leverage","text":"
These instructions can be used when you need to test your profiles with the AWS CLI, either to verify the profiles are properly set up or to validate the right permissions were granted.
Since Leverage stores the AWS config and credentials file under a non-default path, when using the AWS CLI you'll need to point it to the right locations:
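For example (a sketch; replace acme with your project name and use a profile that actually exists in your config file):
export AWS_CONFIG_FILE=~/.aws/acme/config\nexport AWS_SHARED_CREDENTIALS_FILE=~/.aws/acme/credentials\naws sts get-caller-identity --profile [PROFILE_NAME]\n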
Get shell access to the Leverage Toolbox Docker Image
Another alternative, if you can't or don't want to install the AWS CLI on your machine, is to use the one included in the Leverage Toolbox Docker image. You can access it by running leverage tf shell
"},{"location":"user-guide/troubleshooting/credentials/#test-the-failing-profile-with-the-aws-cli","title":"Test the failing profile with the AWS CLI","text":"
Once you have narrowed down your investigation to a profile, what you can do is test it. For instance, let's assume that the suspect profile is le-shared-devops. You can run the command aws sts get-caller-identity --profile le-shared-devops to mimic the way AWS credentials are generated for Terraform; if that command succeeds, that's a good sign.
Note: if you use the AWS CLI installed in your host machine, you will need to configure the environment variables in the section \"Configure the AWS CLI for Leverage\" below.
AWS CLI Error Messages
The AWS CLI has been making great improvements to its error messages over time so it is important to pay attention to its output as it can reveal profiles that have been misconfigured with the wrong roles or missing entries.
"},{"location":"user-guide/troubleshooting/credentials/#regenerating-the-aws-config-or-credentials-files","title":"Regenerating the AWS config or credentials files","text":"
If you think your AWS config file has misconfigured or missing profile entries (which could happen due to manual editing of that file, or when AWS accounts have been added or removed), you can try regenerating it via the Leverage CLI. But before you do that, make sure you know which authentication method you are using: SSO or IAM.
When using IAM, regenerating your AWS config file can be achieved through the leverage credentials command. Check the command documentation here.
When using SSO, the command you need to run is leverage aws configure sso. Refer to that command's documentation for more details.
"},{"location":"user-guide/troubleshooting/credentials/#logging-out-of-your-sso-session","title":"Logging out of your SSO session","text":"
On rare occasions, when using SSO, we have received reports of strange behavior while trying to run Terraform commands via the Leverage CLI. For instance, users would try to run a leverage tf init command but would get an error saying that their session is expired; they would then log in via leverage aws sso login as expected, which would proceed normally, only to get the same error when retrying the init command. In these cases, which we are still investigating as they are very hard to reproduce, what has worked for most users is to log out from the SSO session via leverage aws sso logout, also log out from the SSO session through the AWS console in your browser, then log back in via leverage aws sso login, and then try the init command again.
"},{"location":"user-guide/troubleshooting/general/","title":"Troubleshooting general issues","text":""},{"location":"user-guide/troubleshooting/general/#gathering-more-information","title":"Gathering more information","text":"
Trying to get as much information about the issue as possible is key when troubleshooting.
If the issue happens while you are working on a layer of the reference architecture and you are using Terraform, you can use the --verbose flag to try to get more information about the underlying issue. For instance, if the error shows up while running a Terraform plan command, you can enable a more verbose output like follows:
leverage --verbose tf plan\n
The --verbose flag can also be used when you are working with the Ansible Reference Architecture:
leverage --verbose run init\n
"},{"location":"user-guide/troubleshooting/general/#understanding-how-leverage-gets-the-aws-credentials-for-terraform-and-other-tools","title":"Understanding how Leverage gets the AWS credentials for Terraform and other tools","text":"
Firstly, you need to know that Terraform doesn't support AWS authentication methods that require user interaction. For instance, logging in via SSO or assuming roles that require MFA. That is why Leverage made the following two design decisions in that regard:
Configure Terraform to use AWS profiles via Terraform AWS provider and local AWS configuration files.
Leverage handles the user interactivity during the authentication phase in order to get the credentials that Terraform needs through AWS profiles.
So, Leverage runs simple bash scripts to handle the second point and then passes the execution flow to Terraform, which by then should have the AWS profiles ready to use and in the expected path.
"},{"location":"user-guide/troubleshooting/general/#where-are-those-aws-profiles-stored-again","title":"Where are those AWS profiles stored again?","text":"
They are stored in 2 files: config and credentials. By default, the AWS CLI will create those files under this path: ~/.aws/ but Leverage uses a slightly different convention, so they should actually be located in this path: ~/.aws/[project_name_here]/.
So, for instance, if your project name is acme, then said files should be found under: ~/.aws/acme/config and ~/.aws/acme/credentials.
If you get a repetitive confirmation dialog while running leverage terraform init:
Warning: the ECDSA host key for 'YYY' differs from the key for the IP address 'ZZZ.ZZZ.ZZZ.ZZZ'\nOffending key for IP in /root/.ssh/known_hosts:xyz\nMatching host key in /root/.ssh/known_hosts:xyw\nAre you sure you want to continue connecting (yes/no)?\n
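A quick way to clear the conflicting entries (using the host and IP shown in the warning):
ssh-keygen -R YYY\nssh-keygen -R ZZZ.ZZZ.ZZZ.ZZZ\n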
You may have more than one key associated with the YYY host; removing the old or incorrect one should make the dialog stop."},{"location":"user-guide/troubleshooting/general/#leverage-cli-cant-find-the-docker-daemon","title":"Leverage CLI can't find the Docker daemon","text":"
The Leverage CLI talks to the Docker API which usually runs as a daemon on your machine. Here's an example of the error:
$ leverage tf shell\n[17:06:13.754] ERROR Docker daemon doesn't seem to be responding. Please check it is up and running correctly before re-running the command.\n
"},{"location":"user-guide/troubleshooting/general/#macos-after-docker-desktop-upgrade","title":"MacOS after Docker Desktop upgrade","text":"
We've seen this happen after a Docker Desktop upgrade. Defaults are changed and the Docker daemon no longer uses Unix sockets but TCP, or perhaps it does use Unix sockets but under a different path or user.
What has worked for us in order to fix the issue is to make sure the following setting is enabled:
Note: that setting can be accessed by clicking on the Docker Desktop icon tray, and then clicking on \"Settings...\". Then click on the \"Advanced\" tab to find the checkbox.
"},{"location":"user-guide/troubleshooting/general/#linux-and-docker-in-rootless-mode","title":"Linux and Docker in Rootless mode","text":"
The same problem might come from a missing DOCKER_HOST environment variable. The Leverage CLI looks for the Docker socket at unix:///var/run/docker.sock unless DOCKER_HOST is provided in the environment. If you installed Docker in rootless mode, you need to remember to export DOCKER_HOST in your rc files:
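For example, for a typical rootless setup (adjust the UID in the socket path to your system):
export DOCKER_HOST=unix:///run/user/1000/docker.sock\n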
"},{"location":"user-guide/troubleshooting/general/#leverage-cli-fails-to-mount-the-ssh-directory","title":"Leverage CLI fails to mount the SSH directory","text":"
The Leverage CLI mounts the ~/.ssh directory in order to make the pulling of private Terraform modules work. The error should look similar to the following:
[18:26:44.416] ERROR Error creating container:\n APIError: 400 Client Error for http+docker://localhost/v1.43/containers/create: Bad Request (\"invalid mount config for type \"bind\": stat /host_mnt/private/tmp/com.apple.launchd.CWrsoki5yP/Listeners: operation not supported\")\n
The problem happens because of the file system virtualization that is used by default, and can be fixed by choosing the \"osxfs (Legacy)\" option as shown below:
Note: that setting can be accessed by clicking on the Docker Desktop icon tray, and then clicking on \"Settings...\". The setting should be in the \"General\" tab.
"},{"location":"work-with-us/","title":"Work with us","text":""},{"location":"work-with-us/#customers-collaboration-methodology","title":"Customers collaboration methodology","text":"
What are all the steps of an engagement
1st Stage: Leverage Customer Tech Intro Interview
Complete our binbash Leverage project evaluation form so we can get to know your project, find out if you're a good fit and get in contact with you.
Schedule a tech intro interview meeting to understand which are your exact challenges and do a Leverage Reference Architecture feasibility assessment.
2nd Stage: Leverage Reference Architecture Review
If we can contribute, we'll execute a Mutual NDA (ours or yours), then walk you through completing our binbash Leverage due diligence for Reference Architecture form.
Once we completely understand your requirements we'll prepare a comprehensive proposal including the complete \"Leverage Implementation Action Plan Roadmap\" (also known as Statement of Work - SOW) detailing every task for the entire project.
After you review it and we agree on the general scope, a Services Agreement (SA) is signed.
The Roadmap (SOW) is executed: we'll send an invoice for the deposit and the first Sprint starts.
4th Stage: binbash Leverage Support
During and after finishing the complete Roadmap we'll provide commercial support, maintenance and upgrades for our work over the long term.
"},{"location":"work-with-us/#work-methodology-intro-video","title":"Work methodology intro video","text":""},{"location":"work-with-us/#customer-support-workflow","title":"Customer Support workflow","text":""},{"location":"work-with-us/#read-more","title":"Read More","text":"
Related articles
FAQs | Agreement and statement of work
"},{"location":"work-with-us/careers/","title":"Careers","text":""},{"location":"work-with-us/careers/#how-we-work","title":"How we work","text":"
binbash work culture
Fully Remote
binbash was founded as a remote-first company. That means you can always work from home, a co-working place, a nice cafe, or wherever else you feel comfortable, and you'll have almost complete control over your working hours. Why \"almost\"? Because depending on the current projects we'll require a few hours of overlap between all Leverage collaborators for some specific meetings or shared sessions (pair programming).
Distributed Team
Despite the fact that our collaborators are currently located in \ud83c\udde6\ud83c\uddf7 Argentina, \ud83c\udde7\ud83c\uddf7 Brazil and \ud83c\uddfa\ud83c\uddfe Uruguay, consider we are currently hiring from most countries in the time zones between GMT-7 (e.g. California, USA) to GMT+2 (e.g., Berlin, Germany).
We promote life-work balance
Job burnout is an epidemic \ud83d\ude46, and we tech workers are especially at risk. So we'll do our best to de-stress our workforce at binbash. In order to achieve this we offer:
Remote work that lets you control your hours and physical location.
Normal working hours (prime time 9am-5pm GMT-3), on average no more than ~30-40 hours per week, and we don't work during weekends or your country of residence's national holidays.
Project management and planning that will take into consideration the time zone of all our team members.
A flexible vacation policy where you could take 4 weeks per year away from the keyboard. If more time is needed we could always try to arrange it for you.
No ON-CALL rotation. We only offer support contracts with SLAs of responses on prime time business days hours exclusively.
You will take on big challenges, but the hours are reasonable.
Everyone is treated fairly and with respect, and disagreement and feedback are always welcome.
We aim to be welcoming, safe, and inclusive for people of all cultures, genders, and races.
Create a collection of reusable, tested, production-ready E2E AWS oriented infrastructure modules (e.g., VPC, IAM, Kubernetes, Prometheus, Grafana, EFK, Consul, Vault, Jenkins, etc.) using several tools and languages: Terraform, Ansible, Helm, Dockerfiles, Python, Bash and Makefiles.
Reference Architecture
Improve, maintain, extend and update our reference architecture, which has been designed under optimal configs for the most popular modern web and mobile applications needs. Its design is fully based on the AWS Well Architected Framework.
Open Source & Leverage DevOps Tools
Contribute to our open source projects to continue building a fundamentally better DevOps experience, including our open source modules, leverage python CLI, Makefiles Lib among others.
Document team knowledge
Get siloed and not yet documented knowledge and extend the Leverage documentation, such as creating knowledgebase articles, runbooks, and other documentation for the internal team as well as binbash Leverage customers.
Customer engineering support
While participating in business-hours only support rotations, collaborate with customer requests, teach binbash Leverage and DevOps best-practices, help resolve problems, escalate to internal SMEs, and automate and document the solutions so that problems are mitigated for future scenarios and users.
Role scope and extra points!
Responsible for the development, maintenance, support and delivery of binbash Leverage Products.
Client side Leverage Reference Architecture solutions implementation, maintenance and support.
Client side cloud solutions & tech management (service delivery and project task management).
Bring Leverage recommendations for re-engineering, bug fix (issue) reports and improvements based on real scenario implementations.
Mentoring, KT, PRs and team tech follow up both internally and customer facing.
binbash is a small, distributed startup, so things are changing all the time, and from time to time we all wear many hats. You should expect to write a lot of code, but, depending on your interests, there will also be lots of opportunities to write blog posts, give talks, contribute to open source, go to conferences, talk with customers, do sales calls, think through financial questions, interview candidates, mentor new hires, design products, come up with marketing ideas, discuss strategy, consider legal questions, and all the other tasks that are part of working at a small company.
Nice to have background
You hate repeating and doing the same thing twice and would rather spend the time to automate a problem away than do the same task again.
You have strong English communication skills and are comfortable engaging with external customers.
You know how to write code across the stack (\u201cDev\u201d) and feel very comfortable with Infra as Code (\"IaC\").
You have experience running production software environments (\"Ops\").
You have a strong background in software engineering and understanding of CI/CD (or you are working hard on it!).
You have a passion for learning new technologies, tools and programming languages.
Bonus points for a sense of humor, empathy, autonomy and curiosity.
Note that even though we value prior experience with AWS, Linux and Terraform, we're more concerned with curiosity about all areas of the Leverage stack and a demonstrated ability to learn quickly and go deep when necessary.
"},{"location":"work-with-us/contribute/","title":"Contribute and Developing binbash Leverage","text":"
This document explains how to get started with developing for Leverage Reference Architecture. It includes how to build, test, and release new versions.
"},{"location":"work-with-us/contribute/#quick-start","title":"Quick Start","text":""},{"location":"work-with-us/contribute/#getting-the-code","title":"Getting the code","text":"
The code must be checked out from this same github.com repo inside the binbash Leverage Github Organization.
Leverage is mainly oriented to Latam, North American and European startup CTOs, VPEs, Engineering Managers and/or team leads (Software Architects / DevOps Engineers / Cloud Solutions Architects) looking to rapidly set up and host their modern web and mobile applications and systems in Amazon Web Services (\u2705 typically in just a few weeks!).
It is oriented to development leads or teams looking to solve their current AWS infrastructure and software delivery business needs in a secure and reliable manner, under the most modern best practices.
Your Entire AWS Cloud solutions based on DevOps practices will be achieved:
Moreover, if you are looking to have complete control of the source code, and of course to be able to run it without us (such as building new Development environments and supporting your Production Cloud environments), you're a great fit for the Leverage AWS Cloud Solutions Reference Architecture model.
And remember, you could implement it yourself or we could implement it for you! \ud83d\udcaa
"},{"location":"work-with-us/faqs/#agreement-and-statement-of-work","title":"Agreement and statement of work","text":""},{"location":"work-with-us/faqs/#project-kick-off","title":"Project Kick-Off","text":"
Project Kick-Off
Once the agreement contract and NDA are signed we estimate 15 days to have the team ready to start the project following the proposed Roadmap (\u201cStatement of work\u201d) that describes at length exactly what you'll receive.
"},{"location":"work-with-us/faqs/#assignments-and-delivery","title":"Assignments and Delivery","text":"
Assignments and Delivery
After gathering all the customer project requirements and specifications we'll adjust the Reference Architecture based on your needs. As a result we'll develop and present the Leverage Reference Architecture for AWS implementation Roadmap.
A typical Roadmap (\u201cStatement of Work\u201d) includes a set number of Iterations (sprints). We try to keep a narrow scope for each Iteration so that we can tightly control how hours get spent and avoid overruns. We typically avoid adding tasks to a running Iteration so that the scope does not grow. That's also why we have an allocation for two specific long-lived tasks:
General-Task-1: DevOps and Solutions Architecture challenge, definitions, tasks (PM), reviews, issues and audit.
General-Task-2: WEEKLY FOLLOW-UP Meeting.
Which is work that falls outside of the current Iteration specific tasks. This is for special requests, meetings, pair programming sessions, extra documentation, etc.
binbash will participate and review the planned tasks along the customer:
planned roadmap features
bug fixes
Implementation support
Using the relevant ticketing system (Jira) to prioritize and plan the corresponding work plan.
"},{"location":"work-with-us/faqs/#reports-and-invoicing","title":"Reports and Invoicing","text":"
Reports and Invoicing
Weekly task reports and tasks management agile metrics. We use Toggl to track all our time by client, project, sprint, and developer. We then import these hours into Quickbooks for invoicing.
"},{"location":"work-with-us/faqs/#rates-and-billing","title":"Rates and Billing","text":"
Rates and pricing plans
Pre-paid package subscriptions: a number of prepaid hours is agreed according to the needs of the project. It could be a \"Basic Plan\" of 40 hours per month, or a \"Premium Plan\" of 80 hours per month (if more hours are needed it can be reviewed). When buying in bulk there is a discount on the value of the hour. When you pay for the package you start discounting the hours from the total as they are used, and if there are unused hours left, a maximum of 20% can be transferred to the next month.
On-demand Business Subscription: hours are tracked each month as planned tasks are demanded, and the total spent hours are reported monthly. There is a minimum of 40 hours per month. The maximum estimated effort for support tasks should be between 80 and 120 hours per month.
Billing
The Customer will be billed every month. Invoices are due within 15 days of issue. We accept payments via US Bank ACH, Bill.com, and Payoneer. Rates include all applicable taxes and duties as required by law.
Please create a Github Issue to get immediate support from the binbash Leverage Team
"},{"location":"work-with-us/support/#our-engineering-support-team","title":"Our Engineering & Support Team","text":""},{"location":"work-with-us/support/#aws-well-architected-review","title":"AWS Well Architected Review","text":"
Feel free to contact us for an AWS Well Architected Framework Review
Well Architected Framework Review Reference Study Case
binbash Leverage\u2122 and its components intend to be backward compatible, but due to the complex ecosystem of tools we manage this is not always possible.
It is always recommended to use the latest version of the Leverage CLI with the latest versions of the Reference Architecture for AWS. In case that's not possible, we always recommend pinning versions to favor stability and doing controlled updates component by component based on the compatibility matrix table presented below.
If you need to know which Leverage CLI versions are compatible with which Leverage Toolbox Docker Images please refer to the Release Notes. Just look for the section called \"Version Compatibility\". Bear in mind though that, at the moment, we do not include a full compatibility table there but at least you should be able to find out what's the Toolbox Image that was used for a given release.
If you are looking for the versions of the software included in the Toolbox Docker Image, then go to the release notes of that repo instead.
This project does not follow the Terraform or any other tool's release schedule. Leverage aims to provide a reliable deployment and operations experience for the binbash Leverage\u2122 Reference Architecture for AWS, and typically releases about a quarter after the corresponding Terraform release. This time allows the Terraform project to resolve any issues introduced by the new version and ensures that we can support the latest features.
We are assuming the binbash Leverage Landing Zone is deployed, apps-devstg and shared were created, and region us-east-1 is being used. In any case you can adapt these examples to other scenarios.
As per binbash Leverage Landing Zone defaults, the VPN server will be created in a public network of the shared base-network VPC.
It is a "Pritunl" server.
All the networks that should be accessible from the VPN must:
be "peered" to the shared base-network VPC
their CIDRs have to be added to the "Pritunl" server
This Pritunl server will be deployed in an EC2 instance.
Note this instance can be started/stopped in a scheduled fashion, see here for more info. (Note also, if no EIP is being used, when the instance is stopped and then started again the IP will change.)
Paste this layer into the account/region chosen to host this, e.g. shared/us-east-1/, so the final layer is shared/us-east-1/tools-vpn-server/.
Info
As usual when copying a layer this way, remove the file common-variables.tf and soft-link it to your project level one. E.g. rm common-variables.tf && ln -s ../../../config/common-variables.tf common-variables.tf.
Change as per your needs. At a minimum, change the S3 backend key in config.tf file and in file ec2.tf update the objects dns_records_public_hosted_zone and dns_records_internal_hosted_zone with your own domain.
Also, temporarily, allow access to port 22 (SSH) from the Internet, so we can access the instance with Ansible.
Copy the playbooks into your project repository. (E.g. you can create an ansible directory inside your binbash Leverage project repository, so all your infrastructure code is in the same place.)
cd into the ansible-pritunl-vpn-server (or the name you've chosen) directory.
Follow the steps in the repository README file to install the server.
Click the chain icon (Temporary Profile Link) next to the user.
Copy the "Temporary url to view profile links, expires after 24 hours" link and send it to the user.
The user should open the link.
The user has to create an OTP with an app such as Authy, enter a PIN, copy the "Profile URI Link" and enter it in the "import > profile URI" in the Pritunl Client.
must temporarily open port 80 to the world (line 52)
must temporarily open port 443 to the world (line 59)
must uncomment the public DNS record block (lines 105-112)
make apply
connect to the VPN and ssh to the Pritunl EC2
run sudo pritunl reset-ssl-cert
force the SSL cert update (manually via the UI or via an API call)
in the case of using the UI, set the "Lets Encrypt Domain" field with the vpn domain and click on save
rollback the first three steps (close ports 80 and 443 again and re-comment the public DNS record block) and run make apply again