Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

infra/aws: Add reg.k8s.io S3 buckets replication rules #4118

Merged

Conversation

deobieta
Copy link
Contributor

@deobieta deobieta commented Aug 22, 2022

Ref: #4094

@ameukam @BobyMCbobs @puerco

Saw the latest k8s-infra meeting recording and seems you were having trouble setting up S3 bucket replication from Terraform.

I do not know how objects are copied from GCS to AWS main bucket (us-east-2) but this PR should take care of replication from there.

This PR do the following:

  1. Terraform's AWS provider upgrade to "~> 4.0"

aws_s3_bucket_replication_configuration resource depends on this upgrade to work properly with aws_s3_bucket resource deprecated replication settings. If this upgrade isn't done terraform's plan will always show changes between the two replication settings.

  1. Set S3 module required providers. https://www.terraform.io/language/providers/requirements#requiring-providers
  2. Enable bucket versioning (Required by S3 replication service) Ref: #3667
  3. Eliminates region variable from S3 module and use aws_region data source instead.
  4. Added two new variables to S3 module.

s3_replication_iam_role_arn: IAM role assumed by S3 replication service.

s3_replication_rules: List of configuration maps for S3 replication rules.

us-east-2 bucket (main bucket) is the only S3 bucket that has replication rules for multiple destination buckets ( 1:N replication).

Any written or deleted object in this bucket will be replicated to all other buckets by S3 replication service.

IAM role used for replication
(arn:aws:iam::513428760722:role/registry.k8s.io_s3admin) should have the following premissions.

data "aws_iam_policy_document" "s3_replication" {

  # allow Amazon S3 to retrieve the replication configuration and list the bucket content
  statement {
    effect = "Allow"
    actions = [
      "s3:GetReplicationConfiguration",
      "s3:ListBucket"
    ]

    resources = ["arn:aws:s3:::${var.prefix}registry-k8s-io-us-east-2"]
  }

  # Permissions for these actions are granted on all objects to allow Amazon S3 to get a 
  # specific object version and access control list (ACL) associated with the objects.
  statement {
    effect = "Allow"
    actions = [
      "s3:GetObjectVersionForReplication",
      "s3:GetObjectVersionAcl",
      "s3:GetObjectVersionTagging"
    ]

    resources = ["arn:aws:s3:::${var.prefix}registry-k8s-io-us-east-2/*"]
  }

  # Permissions for these actions on all objects in the destination bucket 
  # allow Amazon S3 to replicate objects or delete markers to the destination bucket
  statement {
    effect = "Allow"
    actions = [
      "s3:ReplicateObject",
      "s3:ReplicateDelete",
      "s3:ReplicateTags"
    ]

    resources = [
      "arn:aws:s3:::${var.prefix}registry-k8s-io-us-east-1/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-us-west-1/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-us-west-2/*",
      ...
    ]
  }
}

Replication rules only apply to objects created and deleted after the rules are created.
For existing object replication we shhould use S3 batch replication

I tested this changes in my AWS account, replication and deletion works as expected.
Please, if you have any questions or suggestions, let me know.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Aug 22, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 22, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @deobieta. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 22, 2022
@k8s-ci-robot k8s-ci-robot requested a review from thockin August 22, 2022 04:33
@k8s-ci-robot k8s-ci-robot added sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Aug 22, 2022
@ameukam
Copy link
Member

ameukam commented Aug 22, 2022

cc @BobyMCbobs
/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 22, 2022
@hh
Copy link
Member

hh commented Aug 22, 2022

/assign @BobyMCbobs

@BobyMCbobs
Copy link
Member

Thank you @deobieta for your changes.
I'm giving them a spin in a separate AWS account.

@BobyMCbobs
Copy link
Member

BobyMCbobs commented Aug 23, 2022

@deobieta, I was able to reproduce the Terraform in a separate account.

I'm unsure of the intention of the behaviour. After writing multiple objects to the us-east-2 bucket, the objects were not seen in the other buckets even after 15mins or 8 hours.
When would objects expect to be replicated given the configuration in the PR?

@deobieta
Copy link
Contributor Author

@BobyMCbobs I'll apply the code again and double check that replication is happening. In the meantime you can go through this troubleshooting guide.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication-troubleshoot.html

The PR does not create the IAM role or set the right permissions. Maybe something's in that matter is causing replication failure.

@deobieta
Copy link
Contributor Author

@BobyMCbobs I tested the code again creating buckets in all regions this time and it worked (creation and deletion of objects). I'm not sure how to proceed to validate the code. Any thoughts?

Here's the code I used to create the IAM role.

module "iam_s3_replication_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
  version = "~> 4.0"

  trusted_role_services = [
    "s3.amazonaws.com",
  ]

  create_role = true

  role_name         = "s3-replication"
  role_requires_mfa = false

  custom_role_policy_arns = [
    aws_iam_policy.s3_replication.arn,
  ]
}

resource "aws_iam_policy" "s3_replication" {
  name   = "s3-replication"
  policy = data.aws_iam_policy_document.s3_replication.json
}

data "aws_iam_policy_document" "s3_replication" {

  # allow Amazon S3 to retrieve the replication configuration and list the bucket content
  statement {
    effect = "Allow"
    actions = [
      "s3:GetReplicationConfiguration",
      "s3:ListBucket"
    ]

    resources = ["arn:aws:s3:::${var.prefix}registry-k8s-io-us-east-2"]
  }

  # Permissions for these actions are granted on all objects to allow Amazon S3 to get a 
  # specific object version and access control list (ACL) associated with the objects.
  statement {
    effect = "Allow"
    actions = [
      "s3:GetObjectVersionForReplication",
      "s3:GetObjectVersionAcl",
      "s3:GetObjectVersionTagging"
    ]

    resources = ["arn:aws:s3:::${var.prefix}registry-k8s-io-us-east-2/*"]
  }

  # Permissions for these actions on all objects in the destination bucket 
  # allow Amazon S3 to replicate objects or delete markers to the destination bucket
  statement {
    effect = "Allow"
    actions = [
      "s3:ReplicateObject",
      "s3:ReplicateDelete",
      "s3:ReplicateTags"
    ]

    resources = [
      "arn:aws:s3:::${var.prefix}registry-k8s-io-us-east-1/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-us-west-1/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-us-west-2/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-ap-northeast-1/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-ap-south-1/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-ap-southeast-1/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-eu-central-1/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-eu-west-1/*",

    ]
  }
}


@BobyMCbobs
Copy link
Member

@BobyMCbobs I tested the code again creating buckets in all regions this time and it worked (creation and deletion of objects). I'm not sure how to proceed to validate the code. Any thoughts?

Here's the code I used to create the IAM role.

module "iam_s3_replication_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
  version = "~> 4.0"

  trusted_role_services = [
    "s3.amazonaws.com",
  ]

  create_role = true

  role_name         = "s3-replication"
  role_requires_mfa = false

  custom_role_policy_arns = [
    aws_iam_policy.s3_replication.arn,
  ]
}

resource "aws_iam_policy" "s3_replication" {
  name   = "s3-replication"
  policy = data.aws_iam_policy_document.s3_replication.json
}

data "aws_iam_policy_document" "s3_replication" {

  # allow Amazon S3 to retrieve the replication configuration and list the bucket content
  statement {
    effect = "Allow"
    actions = [
      "s3:GetReplicationConfiguration",
      "s3:ListBucket"
    ]

    resources = ["arn:aws:s3:::${var.prefix}registry-k8s-io-us-east-2"]
  }

  # Permissions for these actions are granted on all objects to allow Amazon S3 to get a 
  # specific object version and access control list (ACL) associated with the objects.
  statement {
    effect = "Allow"
    actions = [
      "s3:GetObjectVersionForReplication",
      "s3:GetObjectVersionAcl",
      "s3:GetObjectVersionTagging"
    ]

    resources = ["arn:aws:s3:::${var.prefix}registry-k8s-io-us-east-2/*"]
  }

  # Permissions for these actions on all objects in the destination bucket 
  # allow Amazon S3 to replicate objects or delete markers to the destination bucket
  statement {
    effect = "Allow"
    actions = [
      "s3:ReplicateObject",
      "s3:ReplicateDelete",
      "s3:ReplicateTags"
    ]

    resources = [
      "arn:aws:s3:::${var.prefix}registry-k8s-io-us-east-1/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-us-west-1/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-us-west-2/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-ap-northeast-1/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-ap-south-1/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-ap-southeast-1/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-eu-central-1/*",
      "arn:aws:s3:::${var.prefix}registry-k8s-io-eu-west-1/*",

    ]
  }
}

@deobieta, I'm trying to see if there's any debug info anywhere. I'm giving it access to use this role arn:aws:iam::513428760722:role/registry.k8s.io_s3admin, configured here
https://github.com/cncf-infra/aws-infra/blob/3778fa6/terraform/iam/main.tf#L87-L106
which appears to be sufficient.

@deobieta
Copy link
Contributor Author

deobieta commented Sep 3, 2022

@BobyMCbobs Make sure the role "arn:aws:iam::513428760722:role/registry.k8s.io_s3admin" has a trusted relationship with S3 service. This is necessary because S3 service assumes the role to sync objects across regions and buckets.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "s3.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

I would suggest creating a role specifically for this purpose and stick to "Least privilege" practice.

@BobyMCbobs
Copy link
Member

@BobyMCbobs Make sure the role "arn:aws:iam::513428760722:role/registry.k8s.io_s3admin" has a trusted relationship with S3 service. This is necessary because S3 service assumes the role to sync objects across regions and buckets.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "s3.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

I would suggest creating a role specifically for this purpose and stick to "Least privilege" practice.

@deobieta, thank you. It appears as there's more success to the services s3.amazonaws.com and batchoperations.s3.amazonaws.com being allow to sts:AssumeRole.

@BobyMCbobs
Copy link
Member

I'm closing my attempt at the same thing
#3955 (comment)

My version was syncing between all the regions for an any bucket is the leader consistency, I may revisit this at a later date.

@deobieta deobieta force-pushed the registry-k8s-io-s3-replication branch from 64be7c1 to 988b871 Compare September 9, 2022 17:15
@deobieta deobieta force-pushed the registry-k8s-io-s3-replication branch from 988b871 to 5b90b65 Compare September 12, 2022 16:34
@BobyMCbobs
Copy link
Member

BobyMCbobs commented Sep 13, 2022

I think that the changes are almost ready for merging.

Running

terraform plan -var PREFIX=prod-

I see

...
Plan: 1 to add, 0 to change, 0 to destroy.
...

which is great.

Only nit is to remove
https://github.com/kubernetes/k8s.io/blob/5b90b655d2ad743242850fe27d895b38e524b50e/infra/aws/terraform/registry.k8s.io/s3/versions.tf
and move those required_providers values into the infra/aws/terraform/registry.k8s.io/providers.tf file.

Looking for separate review and approve

cc @deobieta @dims @ameukam @BenTheElder @Riaankl

@ameukam
Copy link
Member

ameukam commented Sep 13, 2022

/approve
/lgtm
/hold

Remove the hold when you're ready.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 13, 2022
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 13, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ameukam, deobieta

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 13, 2022
@deobieta deobieta force-pushed the registry-k8s-io-s3-replication branch from 5b90b65 to 0dc5cbd Compare September 13, 2022 16:04
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 13, 2022
@deobieta
Copy link
Contributor Author

@BobyMCbobs I've removed versions.tf file from s3 module.

@deobieta deobieta force-pushed the registry-k8s-io-s3-replication branch from 0dc5cbd to 5b5c529 Compare September 13, 2022 16:11
@BobyMCbobs
Copy link
Member

@ameukam, please /lgtm once more! We're ready!

@ameukam
Copy link
Member

ameukam commented Sep 14, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 14, 2022
@BobyMCbobs
Copy link
Member

Thank you @ameukam for the /lgtm and @deobieta for your contribution!

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 14, 2022
@k8s-ci-robot k8s-ci-robot merged commit daa661f into kubernetes:main Sep 14, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone Sep 14, 2022
@BobyMCbobs
Copy link
Member

It appears that s3:PutReplicationConfiguration is missing from the s3writer role.
cncf-infra/aws-infra#19

@BobyMCbobs
Copy link
Member

BobyMCbobs commented Sep 14, 2022

There appears to be something up with the IAM permissions for the s3admin role to put the ReplicationConfiguration.
I'm getting a 403 Forbidden when I run the terraform apply -var prefix=prod- displayed with the TF_LOG=debug variable set.

We're so close. I've spent several hours debugging this. I'm unsure what's different for this set up as to my testing one.
Will continue on it tomorrow.

To be clear, the problem is external to this PR.

@BobyMCbobs
Copy link
Member

Switching out arn:aws:iam::513428760722:role/registry.k8s.io_s3admin in
https://github.com/kubernetes/k8s.io/blob/5b5c529/infra/aws/terraform/registry.k8s.io/providers.tf#L78
for arn:aws:iam::513428760722:role/OrganizationAccountAccessRole seemed to push it along.

This determines that the issue is indeed permissions and not an AWS bug.
I will further investigate which permissions are missing.

@BobyMCbobs
Copy link
Member

I anticipate the buckets to now sync. Let's wait and see!

@BobyMCbobs
Copy link
Member

BobyMCbobs commented Sep 14, 2022

I switched role into arn:aws:iam::513428760722:role/registry.k8s.io_s3admin, which has s3:* permissions.
I tried to create a replication role inside of the registry.k8s.io_admin account between prod-registry-k8s-io-us-east-2 to prod-registry-k8s-io-ap-northeast-1 and received the following error:

Screen Shot 2022-09-15 at 10 33 43

Which is weird because, there has already permission granted for that API through the wildcard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants