Add acmpa ✨ #201

Merged
merged 12 commits into from
Aug 24, 2021
12 changes: 12 additions & 0 deletions README.md
@@ -8,6 +8,7 @@ The currently supported functionality includes:

## AWS

- Deleting all ACM Private CAs in an AWS account
- Deleting all Auto scaling groups in an AWS account
- Deleting all Elastic Load Balancers (Classic and V2) in an AWS account
- Deleting all Transit Gateways in an AWS account
@@ -156,6 +157,9 @@ The following resources support the Config file:
- IAM Access Analyzers
- Resource type: `accessanalyzer`
- Config key: `AccessAnalyzer`
- ACM Private CAs
- Resource type: `acmpca`
- Config key: `ACMPCA`


#### Example
@@ -248,6 +252,7 @@ To find out what we options are supported in the config file today, consult this
| secretsmanager | none | ✅ | none | none |
| nat-gateway | none | ✅ | none | none |
| accessanalyzer | none | ✅ | none | none |
| acmpca | none | ✅ | none | none |
| ec2 instance | none | none | none | none |
| iam role | none | none | none | none |
| ... (more to come) | none | none | none | none |
@@ -335,6 +340,13 @@ cd aws
go test -v -run TestListAMIs
```

Use environment variables to opt in to special tests that are expensive to run:

```bash
# Run acmpca tests
TEST_ACMPCA_EXPENSIVE_ENABLE=1 go test -v ./...
```

### Formatting

Every source file in this project should be formatted with `go fmt`.
120 changes: 120 additions & 0 deletions aws/acmpca.go
@@ -0,0 +1,120 @@
package aws

import (
"sync"
"time"

"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/service/acmpca"
"github.com/gruntwork-io/cloud-nuke/config"
"github.com/gruntwork-io/cloud-nuke/logging"
"github.com/gruntwork-io/go-commons/errors"
"github.com/hashicorp/go-multierror"
)

// getAllACMPCA returns the ARNs of all ACM Private CAs that are eligible for deletion.
func getAllACMPCA(session *session.Session, region string, excludeAfter time.Time, configObj config.Config) ([]*string, error) {
svc := acmpca.New(session)
var arns []*string
if paginationErr := svc.ListCertificateAuthoritiesPages(&acmpca.ListCertificateAuthoritiesInput{}, func(p *acmpca.ListCertificateAuthoritiesOutput, lastPage bool) bool {
for _, ca := range p.CertificateAuthorities {
if shouldIncludeACMPCA(ca, excludeAfter, configObj) {
arns = append(arns, ca.Arn)
}
}
return !lastPage
}); paginationErr != nil {
return nil, errors.WithStackTrace(paginationErr)
}
return arns, nil
}

func shouldIncludeACMPCA(ca *acmpca.CertificateAuthority, excludeAfter time.Time, configObj config.Config) bool {
if ca == nil {
return false
}

// Only CAs in status 'ACTIVE' or 'DISABLED' are treated as candidates, because
// the disable step in deleteACMPCAASync only succeeds in those two states.
statusSafe := aws.StringValue(ca.Status)
isCandidateForDeletion := statusSafe == acmpca.CertificateAuthorityStatusActive || statusSafe == acmpca.CertificateAuthorityStatusDisabled
if !isCandidateForDeletion {
return false
}

// The reference time for excludeAfter is LastStateChangeAt,
// unless the state never changed, in which case CreatedAt is used.
var referenceTime time.Time
if ca.LastStateChangeAt == nil {
referenceTime = aws.TimeValue(ca.CreatedAt)
} else {
referenceTime = aws.TimeValue(ca.LastStateChangeAt)
}
if excludeAfter.Before(referenceTime) {
return false
}

return config.ShouldInclude(
aws.StringValue(ca.Arn),
configObj.ACMPCA.IncludeRule.NamesRegExp,
configObj.ACMPCA.ExcludeRule.NamesRegExp,
Contributor

This is checking the names regex against the ARN, which is unintuitive. Is there a way to just get the name out?

Contributor

Yeah. I will think about it. You are correct that this does not feel good.
It should be something unique. And the name is not unique.
What is your plan about the regex behaviour for other resources? Is this "the final form"? Otherwise I would get rid of the regex in total.

Contributor
@weitzjdevk, Aug 4, 2021

I would need your guidance on this. Indeed this would be the ARN and there is nothing better than using the ARN for it. Name does not make much sense for this service.

I could introduce new types?

type Config struct {
....
	ACMPCA                    ResourceTypeARN `yaml:"acmpca"`
}

type ResourceTypeARN struct {
	IncludeRule FilterRuleARN `yaml:"include"`
	ExcludeRule FilterRuleARN `yaml:"exclude"`
}

type FilterRuleARN struct {
	ARNsRegExp []Expression `yaml:"arns_regex"`
}

Or I could do:

type FilterRule struct {
	NamesRegExp []Expression `yaml:"names_regex"`
	ARNsRegExp []Expression `yaml:"arns_regex"`
}


Contributor

> What is your plan about the regex behaviour for other resources? Is this "the final form"? Otherwise I would get rid of the regex in total.

This is not the final form, as we plan on supporting tags (something users have requested for other resources). Adding in ARN support or tag support just for ACM PCA makes sense to me, although that would be a lot more work to add considering the potential regressions with other resources.

If it isn't critical for your environment, I might suggest removing the regex filters for now and add that in later as a future feature when we support something that makes more sense, like ARN or tags.

Contributor

Alright. I have removed the configObj support (see the latest commit)

)
}
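
Note on the filter above: the two checks that remain once the regex rules are dropped (status gate, then time cutoff) can be tried in isolation. The following is a minimal sketch using only the standard library. The `ca` struct and `shouldInclude` are hypothetical stand-ins for `acmpca.CertificateAuthority` and `shouldIncludeACMPCA`, not code from this PR.

```go
package main

import (
	"fmt"
	"time"
)

// ca is a hypothetical stand-in for the SDK's acmpca.CertificateAuthority,
// reduced to the fields the filter looks at.
type ca struct {
	Status            string
	CreatedAt         time.Time
	LastStateChangeAt *time.Time // nil if the state never changed
}

// shouldInclude applies the status gate first, then the time cutoff.
func shouldInclude(c ca, excludeAfter time.Time) bool {
	if c.Status != "ACTIVE" && c.Status != "DISABLED" {
		return false
	}
	// Prefer the last state change; fall back to the creation time.
	ref := c.CreatedAt
	if c.LastStateChangeAt != nil {
		ref = *c.LastStateChangeAt
	}
	// Skip anything whose reference time is newer than the cutoff.
	return !excludeAfter.Before(ref)
}

func main() {
	now := time.Now()
	old := now.Add(-48 * time.Hour)
	recent := now.Add(-10 * time.Minute)
	cutoff := now.Add(-1 * time.Hour)

	fmt.Println(shouldInclude(ca{Status: "ACTIVE", CreatedAt: old}, cutoff))
	fmt.Println(shouldInclude(ca{Status: "ACTIVE", CreatedAt: old, LastStateChangeAt: &recent}, cutoff))
	fmt.Println(shouldInclude(ca{Status: "PENDING_CERTIFICATE", CreatedAt: old}, cutoff))
}
```

A CA created long ago but disabled ten minutes ago is skipped by a one-hour cutoff, because the state change resets the reference time.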

// nukeAllACMPCA will delete all ACMPCA, which are given by a list of arns.
func nukeAllACMPCA(session *session.Session, arns []*string) error {
if len(arns) == 0 {
logging.Logger.Infof("No ACMPCA to nuke in region %s", aws.StringValue(session.Config.Region))
return nil
}
svc := acmpca.New(session)

logging.Logger.Infof("Deleting all ACMPCA in region %s", aws.StringValue(session.Config.Region))
// There is no bulk delete acmpca API, so we delete the batch of ARNs concurrently using goroutines.
wg := new(sync.WaitGroup)
wg.Add(len(arns))
errChans := make([]chan error, len(arns))
for i, arn := range arns {
errChans[i] = make(chan error, 1)
go deleteACMPCAASync(wg, errChans[i], svc, arn, aws.StringValue(session.Config.Region))
}
wg.Wait()

// Collect all the errors from the async delete calls into a single error struct.
var allErrs *multierror.Error
for _, errChan := range errChans {
if err := <-errChan; err != nil {
allErrs = multierror.Append(allErrs, err)
logging.Logger.Errorf("[Failed] %s", err)
}
}
return errors.WithStackTrace(allErrs.ErrorOrNil())
}
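
Note on the fan-out above: the receive loop reads exactly one value from every channel, so each goroutine has to send exactly one value, including `nil` on success. The following is a minimal standard-library sketch of that pattern. `deleteAll`, `deleteError` and the `del` callback are hypothetical names, and a plain `[]error` is used here instead of `multierror`.

```go
package main

import (
	"fmt"
	"sync"
)

// deleteError is a tiny error type so the sketch needs no extra imports.
type deleteError string

func (e deleteError) Error() string { return string(e) }

// deleteAll runs del for every id concurrently and returns all failures.
// del is a hypothetical stand-in for the real per-resource delete call.
func deleteAll(ids []string, del func(id string) error) []error {
	var wg sync.WaitGroup
	errChans := make([]chan error, len(ids))
	for i, id := range ids {
		errChans[i] = make(chan error, 1) // buffered: the send never blocks
		wg.Add(1)
		go func(ch chan<- error, id string) {
			defer wg.Done()
			// Always send exactly one value, nil on success, so the
			// receive loop below never waits on an empty channel.
			ch <- del(id)
		}(errChans[i], id)
	}
	wg.Wait()

	var errs []error
	for _, ch := range errChans {
		if err := <-ch; err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

func main() {
	errs := deleteAll([]string{"ca-1", "ca-2", "ca-3"}, func(id string) error {
		if id == "ca-2" {
			return deleteError("delete failed for " + id)
		}
		return nil
	})
	fmt.Println(len(errs), "failure(s):", errs)
}
```

Sending the callback's result unconditionally keeps the success and failure paths symmetric, which is the easiest way to guarantee the collector never blocks.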

// deleteACMPCAASync deletes the provided ACMPCA arn. Intended to be run in a goroutine, using wait groups
// and a return channel for errors.
func deleteACMPCAASync(wg *sync.WaitGroup, errChan chan error, svc *acmpca.ACMPCA, arn *string, region string) {
defer wg.Done()

logging.Logger.Infof("Setting status to 'DISABLED' for ACMPCA %s in region %s", *arn, region)
if _, updateStatusErr := svc.UpdateCertificateAuthority(&acmpca.UpdateCertificateAuthorityInput{
CertificateAuthorityArn: arn,
Status: aws.String(acmpca.CertificateAuthorityStatusDisabled),
}); updateStatusErr != nil {
errChan <- updateStatusErr
return
}
logging.Logger.Infof("Did set status to 'DISABLED' for ACMPCA: %s in region %s", *arn, region)
Contributor

Is this step necessary? I just tried the test and it failed with:

=== RUN   TestListACMPCA
=== PAUSE TestListACMPCA
=== CONT  TestListACMPCA
[cloud-nuke] time="2021-08-04T10:47:49-05:00" level=info msg="Random region chosen: eu-north-1"
    acmpca_test.go:93:
                Error Trace:    acmpca_test.go:93
                Error:          []string{} does not contain "arn:aws:acm-pca:eu-north-1:738755648600:certificate-authority/3cb99176-55e7-427a-9fd7-2a0208d12c98"
                Test:           TestListACMPCA
[cloud-nuke] time="2021-08-04T10:47:50-05:00" level=info msg="Deleting all ACMPCA in region eu-north-1"
[cloud-nuke] time="2021-08-04T10:47:50-05:00" level=info msg="Setting status to 'DISABLED' for ACMPCA arn:aws:acm-pca:eu-north-1:738755648600:certificate-authority/3cb99176-55e7-427a-9fd7-2a0208d12c98 in region eu-north-1"
[cloud-nuke] time="2021-08-04T10:47:51-05:00" level=error msg="[Failed] InvalidStateException: The certificate authority must be in the ACTIVE or DISABLED state to be updated"
--- FAIL: TestListACMPCA (2.36s)

The PCA was in state Pending Validation. However, I was able to delete it just fine using the UI, which suggests that it may not be necessary to disable the PCA prior to deleting it.

Contributor

Oh. Sorry. I will make the listing smarter and test it.

Contributor

See the latest commits. Now the tests work as expected. I had to fix a race-condition, where the CA was in a "CREATING" state during test, which one could not see during the test.
Now the "createCA" command waits with retry until the status is the one we need to actually test.

if _, deleteErr := svc.DeleteCertificateAuthority(&acmpca.DeleteCertificateAuthorityInput{
CertificateAuthorityArn: arn,
// the range is 7 to 30 days.
// since cloud-nuke should not be used in production,
// we assume that the minimum (7 days) is fine.
PermanentDeletionTimeInDays: aws.Int64(7),
}); deleteErr != nil {
errChan <- deleteErr
return
}
logging.Logger.Infof("Deleted ACMPCA: %s", *arn)
// Signal success so the receive in nukeAllACMPCA does not block forever.
errChan <- nil
}
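
Note on the review thread above: the failing run shows that the disable call is rejected unless the CA is ACTIVE or DISABLED, while the reviewer could delete a pending CA directly. One way to express that is to choose the calls based on the current status. This is only a sketch under assumptions: `deletionSteps` is a hypothetical helper, and the rule "only ACTIVE needs disabling, DELETED needs nothing" is an assumption that should be checked against the AWS documentation before use.

```go
package main

import "fmt"

// deletionSteps lists the calls to make for a CA in the given status.
// Assumptions (not confirmed by the code in this PR): only an ACTIVE CA has
// to be disabled first, and a CA already in DELETED status needs no call.
func deletionSteps(status string) []string {
	switch status {
	case "ACTIVE":
		return []string{"disable", "delete"}
	case "DELETED":
		return nil
	default:
		return []string{"delete"}
	}
}

func main() {
	for _, s := range []string{"ACTIVE", "DISABLED", "PENDING_CERTIFICATE", "DELETED"} {
		fmt.Printf("%-20s -> %v\n", s, deletionSteps(s))
	}
}
```

With a rule like this, the listing would no longer have to hide CAs in other states just to keep the disable call from failing.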
126 changes: 126 additions & 0 deletions aws/acmpca_test.go
@@ -0,0 +1,126 @@
package aws

import (
"os"
"testing"
"time"

awsgo "github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/service/acmpca"
"github.com/gruntwork-io/cloud-nuke/config"
"github.com/gruntwork-io/cloud-nuke/util"
"github.com/gruntwork-io/go-commons/errors"
"github.com/stretchr/testify/assert"
)

// enableACMPCAExpensiveEnv controls whether the tests below run.
// They are disabled by default and must be enabled explicitly, because
// creating and destroying an ACM PCA is expensive.
// Upper bound, worst case: $400 / month per single CA create/delete.
const enableACMPCAExpensiveEnv = "TEST_ACMPCA_EXPENSIVE_ENABLE"

// runOrSkip runs or skips the test depending on whether
// the env-var `TEST_ACMPCA_EXPENSIVE_ENABLE` is set.
func runOrSkip(t *testing.T) {
if _, isSet := os.LookupEnv(enableACMPCAExpensiveEnv); !isSet {
t.Skipf("Skipping the integration test for acmpca. Set the env-var '%s' to enable this expensive test.", enableACMPCAExpensiveEnv)
}
}
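
Note on the gate above: `os.LookupEnv` only reports whether the variable exists, so `TEST_ACMPCA_EXPENSIVE_ENABLE=0` or an empty value also enables the tests. The sketch below contrasts that presence check with a stricter value check. `enabledByPresence`, `enabledByValue` and the injected `lookup` parameter are hypothetical and not part of this PR.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const enableEnv = "TEST_ACMPCA_EXPENSIVE_ENABLE"

// enabledByPresence matches the check above: any value, even "" or "0", opts in.
func enabledByPresence(lookup func(string) (string, bool)) bool {
	_, isSet := lookup(enableEnv)
	return isSet
}

// enabledByValue is a stricter, hypothetical variant: the value must parse
// as a true boolean ("1", "t", "true", ...).
func enabledByValue(lookup func(string) (string, bool)) bool {
	v, isSet := lookup(enableEnv)
	if !isSet {
		return false
	}
	b, err := strconv.ParseBool(v)
	return err == nil && b
}

func main() {
	fmt.Println(enabledByPresence(os.LookupEnv), enabledByValue(os.LookupEnv))
}
```

Passing the lookup function in makes both variants testable without touching the real process environment.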

// createTestACMPCA will create an ACMPCA and return its ARN.
func createTestACMPCA(t *testing.T, session *session.Session, name string) *string {
Contributor

Sanity check: Do you have a sense on how much it would cost to run this test? It appears that the cost of ACM Private CA is $400 / month, pro rated to the number of X (days or hours?) that it is active. Does that active period include the days it is waiting to be deleted (the 7 day minimum)? If so, that can rack up quickly for us if we are running cloud nuke tests for each PR.

Depending on how much it costs, can you add an environment variable flag so that we can control when we run the tests related to the ACM PCA functionality (and not on every PR)?

Contributor

I am currently talking to our AWS Enterprise contact, as their documentation and support are indeed not clear on this, and we racked up a significant amount of money just for starting/stopping a CA.

I will add an environment variable, and this test will be opt-in in all cases, so you have to make a conscious decision to run it.

Contributor

We heard back from our contact. The billing is pro-rated and does reconciliation after 24-48 hours and you only pay for those hours.

Hypothesis from Support-Agent

Hi Jan
It looks like the billing is fluctuating due to the 24-48 hour delay between the billing console and reality. When a certificate is created, the full amount for the month is added to the bill and this only changes to the pro-rated amount after the certificate has been deleted. I'll keep the internal ticket open with the service team just to confirm that this is exactly what is going on. In terms of your other questions, here are the answers:

  • How does AWS want to do pricing for short-lived CAs? (i.e. can we enable a testing pipeline or not)
    ->>>> I will forward this request to the service team and I'll update you as soon as I hear back from them.
  • Is there an absolute upper service quota for CAs in an account? (I see 200 quoted right now)
    ->>>> The quota of 200 is the default, and we can request an increase should you require it, but we would need a detailed use case for this request.
  • Do deleted CAs (with their 7 - 30 day pending period) count towards the service quota?
    ->>>> Yes, deleted Private CAs count towards the quota until the end of the restoration period.

Answer from AWS PCA Service Team confirming hypothesis from Support-Agent above

Quote

Thank you for your patience. I've just heard back from the team and my original answer about the delay between the billing console and reality is correct. Because the invoice for each month is issued a few days after the end of the month, by the time the bill is issued, everything will have refreshed and the bill will reflect correctly. For the current month, the "real" bill so far is around $7.

For some more context on how we charge for short term PCAs, the team provided the following:

AWS PCA charge customer based on the time period: from the time that CA was created to the time CA was deleted.

Facts:

Customer created 22 CAs in FRA region in July. 1 of them are in free trial.
Customer create CA, and delete them in a short period of time. --> this is fine, nothing wrong.

Customer check bill of July, which is not the final statement. There are delays between customer deleted the CA and the billing team calculate the bill based on latest data.

If customer check right now, the bill will become about $168.

When customer finally get bill in August, the total charge will be around $8. Unless they created new CAs

Considering this information, the best way to monitor the bill for PCA specifically would be to log the hours and calculate this manually on your side throughout the month.

Member

So how do we think about the pricing here?

If we have this test case and it runs, say, nightly for a month, what sort of fee are we talking? What if we get 20 PRs in this repo, and each one does 20 test runs in a month?

I just want to be extra sure we don't end up with a massive AWS bill as a surprise from adding this new functionality!

Contributor

@brikis98 I think we can go with this for now. @weitzjdevk implemented a feature flag to disable the tests by default. We can merge with that disabled.

Separately, I'll run an experiment to enable that test locally once. Then, we can check how much our bill was for August and make our decision on whether to continually test it or not based on that bill cost. Does that seem reasonable?

Contributor
@weitzjdevk, Aug 4, 2021

> If we have this test case and it runs, say, nightly for a month, what sort of fee are we talking? What if we get 20 PRs in this repo, and each one does 20 test runs in a month?

tl;dr: Price per run: approx. $0.60 for the CA alone. Issuing a certificate adds $0.75, so approx. $1.35 in total (call it $1.50 to be safe).
Here is the rough calculation.

Monthly charge (for CA only): $400
Month has hours: 720

Price (CA) per hour: $400 / 720 = approx. $0.56, rounded up to $0.60
Issue a certificate charge: $0.75

Therefore roughly:

Price per run with one certificate: $0.60 + $0.75 = $1.35, or approx. $1.50 when rounded up generously

Timeline (when starting the first day of the month)

  1. Day1 (first of the month): $0
  2. Run it
  3. Delete it
  4. Day1 bill: $0
  5. Day2 bill: $400 + 0.75
  6. Day3 bill: $400 + 0.75
  7. Day4 bill: $400/720 + 0.75

When you do the run in the middle of the month

  1. Day15 (middle of the month): $0
  2. Run it
  3. Delete it
  4. Day15 bill: $0
  5. Day16 bill (middle of month. Only 15 more days for pro-rating): $400 * 15 / 30 + 0.75 = $200 + 0.75
  6. Day17 bill: $200 + 0.75
  7. Day18 bill: $400/720 + 0.75
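
The arithmetic in this comment can be written out as a small Go sketch. The prices are the ones quoted in the thread. The assumptions are that a test run keeps the CA alive for at most one hour and that billing is pro-rated by the hour. `runCost` is a hypothetical helper.

```go
package main

import "fmt"

const (
	monthlyCAFee   = 400.0 // USD per CA per month, from the thread
	hoursPerMonth  = 720.0 // 30 days * 24 hours, as assumed in the thread
	certificateFee = 0.75  // USD per issued certificate, from the thread
)

// runCost estimates one test run in which the CA lives for caHours hours
// and certs certificates are issued.
func runCost(caHours float64, certs int) float64 {
	return monthlyCAFee/hoursPerMonth*caHours + certificateFee*float64(certs)
}

func main() {
	fmt.Printf("CA only, 1h:        $%.2f\n", runCost(1, 0))
	fmt.Printf("CA + 1 certificate: $%.2f\n", runCost(1, 1))
	// The scenario asked about earlier: 20 PRs with 20 runs each.
	fmt.Printf("20 PRs x 20 runs:   $%.2f\n", 400*runCost(1, 1))
}
```

Under these assumptions a run costs about $0.56 without a certificate and about $1.31 with one, so 400 runs in a month land at roughly $522 on the final invoice, even though the interim bill can show far more.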

Contributor

I have confirmed this cost in our account as well. I think $0.60 per run is a bit too expensive for running the tests continuously, but selectively running them every time this feature changes seems reasonable, so the env-var-based feature switch is satisfactory.

// As an additional safety guard, we are adding another check here
// to decide whether to run the test or not.
runOrSkip(t)

svc := acmpca.New(session)
ca, err := svc.CreateCertificateAuthority(&acmpca.CreateCertificateAuthorityInput{
CertificateAuthorityConfiguration: &acmpca.CertificateAuthorityConfiguration{
KeyAlgorithm: awsgo.String(acmpca.KeyAlgorithmRsa2048),
SigningAlgorithm: awsgo.String(acmpca.SigningAlgorithmSha256withrsa),
Subject: &acmpca.ASN1Subject{
CommonName: awsgo.String(name),
},
},
CertificateAuthorityType: awsgo.String("ROOT"),
Tags: []*acmpca.Tag{
{
Key: awsgo.String("Name"),
Value: awsgo.String(name),
},
},
})
if err != nil {
// Stop the test here: ca is nil when the create call fails, so continuing would panic.
t.Fatalf("Could not create ACMPCA: %s", errors.WithStackTrace(err).Error())
}
return ca.CertificateAuthorityArn
}
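
Note on the helper above: the review thread mentions a later commit in which the create helper waits with retries until the CA reaches the status the test needs. That commit is not shown here, so the following is only a minimal sketch of such polling. `waitForStatus` and the `getStatus` callback are hypothetical, and the target status in `main` is just an example.

```go
package main

import (
	"fmt"
	"time"
)

// waitForStatus polls getStatus until it returns want, or gives up after
// maxAttempts calls. getStatus is a hypothetical stand-in for a describe call.
func waitForStatus(getStatus func() (string, error), want string, maxAttempts int, delay time.Duration) error {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		status, err := getStatus()
		if err != nil {
			return err
		}
		if status == want {
			return nil
		}
		// Do not sleep after the final attempt.
		if attempt < maxAttempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("status %q not reached after %d attempts", want, maxAttempts)
}

func main() {
	// Fake status source: CREATING twice, then PENDING_CERTIFICATE.
	calls := 0
	fake := func() (string, error) {
		calls++
		if calls < 3 {
			return "CREATING", nil
		}
		return "PENDING_CERTIFICATE", nil
	}
	err := waitForStatus(fake, "PENDING_CERTIFICATE", 5, 10*time.Millisecond)
	fmt.Println(err, calls)
}
```

A bounded attempt count keeps a stuck CA from hanging the test run, which matters when every extra hour of CA lifetime is billed.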

func TestListACMPCA(t *testing.T) {
runOrSkip(t)
t.Parallel()

region, err := getRandomRegion()
if err != nil {
assert.Fail(t, errors.WithStackTrace(err).Error())
}
session, err := session.NewSession(&awsgo.Config{
Region: awsgo.String(region)},
)

if err != nil {
assert.Fail(t, errors.WithStackTrace(err).Error())
}

uniqueTestID := "cloud-nuke-test-" + util.UniqueID()
arn := createTestACMPCA(t, session, uniqueTestID)
// clean up after this test
defer nukeAllACMPCA(session, []*string{arn})

newARNs, err := getAllACMPCA(session, region, time.Now().Add(1*time.Hour*-1), config.Config{})
if err != nil {
assert.Fail(t, "Unable to fetch list of ACMPCA arns")
}
assert.NotContains(t, awsgo.StringValueSlice(newARNs), awsgo.StringValue(arn))

allARNs, err := getAllACMPCA(session, region, time.Now().Add(1*time.Hour), config.Config{})
if err != nil {
assert.Fail(t, "Unable to fetch list of ACMPCA arns")
}

assert.Contains(t, awsgo.StringValueSlice(allARNs), awsgo.StringValue(arn))
}

func TestNukeACMPCA(t *testing.T) {
runOrSkip(t)
t.Parallel()

region, err := getRandomRegion()
if err != nil {
assert.Fail(t, errors.WithStackTrace(err).Error())
}

session, err := session.NewSession(&awsgo.Config{
Region: awsgo.String(region)},
)

if err != nil {
assert.Fail(t, errors.WithStackTrace(err).Error())
}

uniqueTestID := "cloud-nuke-test-" + util.UniqueID()
arn := createTestACMPCA(t, session, uniqueTestID)

if err := nukeAllACMPCA(session, []*string{arn}); err != nil {
assert.Fail(t, errors.WithStackTrace(err).Error())
}

arns, err := getAllACMPCA(session, region, time.Now().Add(1*time.Hour), config.Config{})
if err != nil {
assert.Fail(t, "Unable to fetch list of ACMPCA arns")
}

assert.NotContains(t, awsgo.StringValueSlice(arns), awsgo.StringValue(arn))
}
36 changes: 36 additions & 0 deletions aws/acmpca_types.go
@@ -0,0 +1,36 @@
package aws

import (
awsgo "github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/gruntwork-io/go-commons/errors"
)

// ACMPCA - represents all ACM Private CAs
type ACMPCA struct {
ARNs []string
}

// ResourceName - the simple name of the aws resource
func (ca ACMPCA) ResourceName() string {
return "acmpca"
}

// ResourceIdentifiers - The ARNs of the ACM Private CAs
func (ca ACMPCA) ResourceIdentifiers() []string {
return ca.ARNs
}

// MaxBatchSize - the maximum number of ARNs to nuke in a single batch
func (ca ACMPCA) MaxBatchSize() int {
// Tentative batch size to ensure AWS doesn't throttle
return 10
}
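
Note on the batch size above: how cloud-nuke splits identifiers into batches is not part of this diff. As a rough illustration only, a batch size of 10 could be applied as in the sketch below. `splitIntoBatches` is a hypothetical helper, not the project's implementation.

```go
package main

import "fmt"

// splitIntoBatches cuts ids into consecutive slices of at most size elements.
// It returns nil for a non-positive size.
func splitIntoBatches(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

func main() {
	ids := make([]string, 23)
	for i := range ids {
		ids[i] = fmt.Sprintf("ca-%02d", i)
	}
	for i, b := range splitIntoBatches(ids, 10) {
		fmt.Printf("batch %d: %d ARNs\n", i, len(b))
	}
}
```

Since `nukeAllACMPCA` starts one goroutine per ARN, the batch size also caps how many delete calls run at the same time.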

// Nuke - nuke 'em all!!!
func (ca ACMPCA) Nuke(session *session.Session, arns []string) error {
if err := nukeAllACMPCA(session, awsgo.StringSlice(arns)); err != nil {
return errors.WithStackTrace(err)
}

return nil
}
15 changes: 15 additions & 0 deletions aws/aws.go
@@ -220,6 +220,20 @@ func GetAllResources(targetRegions []string, excludeAfter time.Time, resourceTyp
// The order in which resources are nuked is important
// because of dependencies between resources

// ACMPCA arns
acmpca := ACMPCA{}
if IsNukeable(acmpca.ResourceName(), resourceTypes) {
arns, err := getAllACMPCA(session, region, excludeAfter, configObj)
if err != nil {
return nil, errors.WithStackTrace(err)
}
if len(arns) > 0 {
acmpca.ARNs = awsgo.StringValueSlice(arns)
resourcesInRegion.Resources = append(resourcesInRegion.Resources, acmpca)
}
}
// End ACMPCA arns

// ASG Names
asGroups := ASGroups{}
if IsNukeable(asGroups.ResourceName(), resourceTypes) {
@@ -634,6 +648,7 @@ func GetAllResources(targetRegions []string, excludeAfter time.Time, resourceTyp
// ListResourceTypes - Returns list of resources which can be passed to --resource-type
func ListResourceTypes() []string {
resourceTypes := []string{
ACMPCA{}.ResourceName(),
ASGroups{}.ResourceName(),
LaunchConfigs{}.ResourceName(),
LoadBalancers{}.ResourceName(),
1 change: 1 addition & 0 deletions config/config.go
@@ -15,6 +15,7 @@ type Config struct {
SecretsManagerSecrets ResourceType `yaml:"SecretsManager"`
NatGateway ResourceType `yaml:"NatGateway"`
AccessAnalyzer ResourceType `yaml:"AccessAnalyzer"`
ACMPCA ResourceType `yaml:"ACMPCA"`
}

type ResourceType struct {