-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS UPI draft #14241
AWS UPI draft #14241
Conversation
observedGeneration: 1 | ||
replicas: 1 | ||
---- | ||
<1> I think you change the RHCOS AMI here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cuppett, do you know if there's a good RHCOS AMI that we should use?
Do you have more details about the values that a user would need to provide to update these MachineSets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IAM instance profile name comes as an output from the 03_cluster_security.yaml file. The subnet is created as part of 01_vpc.yaml; however, it doesn't have a name tag to filter on. They can change the specification to ID and use the exact ID or add a name. We could also add a generated name tag to help minimize the edit needed. The same is true for the security group.
For the AMI, we build a different one each day. I usually just query the AWS console to grab a recent one. @wking may have a more prescriptive method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I'll watch the RFE because that will be useful.
The preview will be availble shortly at: |
<2> Specify the name of the worker security group that you created. | ||
<3> Specify the subnet for the machines. | ||
|
||
.. Replace the `securityGroups` stanza with one that specifies the name of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cuppett, do you have the format for the securityGroup and subnet stanzas that use names instead of tags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is one (lifted from a BZ):
securityGroups:
- filters:
- name: tag:Name
values:
- worker-sg-jialiuuuu1
subnet:
filters:
- name: tag:Name
values:
- jialiuuuu1-xbhm2-private-us-east-2b
There should also be a way to specify this using the always-there subnet or security group ID (instead of a tag of Name), but I'm not sure where these filters' formats are defined well. We should find them or a spot for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking, do you know who might know how these filters are defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking, will you PTAL at this AWS UPI draft? (I know that we're scrubbing UPI from the docs, but I'm waiting for updates from the open JIRA to try to fix this terminology.)\ @jianlinliu, will you PTAL? |
---- | ||
<1> `<name>` is the name of your bootstrap stack. | ||
|
||
. View the list of machines in the `openshift-machine-api` namespace: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seem like the following steps is doing https://github.com/openshift/installer/blob/master/docs/user/aws/install_upi.md#option-1-dynamic-compute-using-machine-api. These steps are not working well, refer to https://bugzilla.redhat.com/show_bug.cgi?id=1697968.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they're working on it, but I'll track the bug and follow up when it's resolved.
|
||
[WARNING] | ||
==== | ||
It is better to use a MachineSet to allow the cluster to create and manage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to have a link to that section how to use machineset to provision new worker node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to create a new module based on https://docs.openshift.com/container-platform/4.0/machine_management/creating-machineset.html to include or link to.
I've pushed updates for all of the resolved comments. Thank you all! I'm going to follow up on the remaining comments and will be happy to make more changes. |
That makes sense. Can make this a **** kind of field. Can also do it with
SSM parameter. However today it goes in UserData and does within the
template too, so I am not sure what it buys us...
Steve
Mobile
…On Thu, Apr 18, 2019, 3:05 PM Chris Callegari ***@***.***> wrote:
***@***.**** requested changes on this pull request.
------------------------------
In modules/installation-cloudformation-control-plane.adoc
<#14241 (comment)>
:
> + Description: The subnets (recommend private) to launch the master nodes into
+ Type: AWS::EC2::Subnet::Id
+ Master1Subnet:
+ Description: The subnets (recommend private) to launch the master nodes into
+ Type: AWS::EC2::Subnet::Id
+ Master2Subnet:
+ Description: The subnets (recommend private) to launch the master nodes into
+ Type: AWS::EC2::Subnet::Id
+ MasterSecurityGroupId:
+ Description: The master security group ID to associate with master nodes.
+ Type: AWS::EC2::SecurityGroup::Id
+ IgnitionLocation:
+ Default: https://api.$CLUSTER_NAME.$DOMAIN:22623/config/master
+ Description: Location to fetch bootstrap ignition from. (Recommend to use the autocreated ignition config location.)
+ Type: String
+ CertificateAuthorities:
Certificate Authority x.509 certificate is stored unencrypted and
available to all users including AWS support. This absolutely cannot happen.
We have to think of another way.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#14241 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAB2SIHYIITZKMOQ4X5ALHLPRDBBJANCNFSM4HBQFV4A>
.
|
An option would be to scp it manually to the instance. UserData can have bash |
Not unique to AWS UPI. All the installs will have this issue. See email.
Get a BZ going. Good catch, BTW.
Steve
Mobile
…On Thu, Apr 18, 2019, 4:23 PM Chris Callegari ***@***.***> wrote:
An option would be to scp it manually to the instance. UserData can have
bash ! while -f ca.pem; do sleep 5; done loop to pause until the file is
present.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#14241 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAB2SIGMKELHOMZOTXGBMCTPRDKFRANCNFSM4HBQFV4A>
.
|
Folks we are officially bugzilla'd - BZ 1701408 |
@kalexand-rh Should we have something like |
@jianlinliu, eventually. It's a different user story from installation, though. I have a JIRA to figure out all of the processes that will be different between UPI and IPI. |
MasterInstanceType: | ||
Default: m4.xlarge | ||
Type: String | ||
AllowedValues: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest generation of EC2 instances aren't allowed (m5, c5, r5). It might be interesting to allow AMD based instances as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. There are still outstanding issues with nitro-based instance classes.
https://bugzilla.redhat.com/show_bug.cgi?id=1658237
cc @gnufied
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The installer currently falls...forward to m5 for regions that lack m4. So if we don't support m5, we'll need to adjust that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, The most serious of NVMe issues are fixed in - https://bugzilla.redhat.com/show_bug.cgi?id=1632440 . Our SRE team has confirmed that after deploying fixed kernel they did not receive any "stuck in attach/detach" issue.
What @cuppett linked above though is a separate customer running their own Openshift version on AWS. That bug however is more weird. The steps to reproduce - https://bugzilla.redhat.com/show_bug.cgi?id=1658237#c46 . So basically to reproduce this bug, you have to reboot the node while detach is in progress. Our Kernel team says - this is a problem with how EBS sends ACPI events to kernel. In practice I think this is less of a problem. We also have some fixes in Openshift to ensure if a node does get stuck with volumes in "attaching/detaching" state, it is removed from scheduling and is tainted. It must be then drained, rebooted and taint manually removed.
I think we should make sure that - we are running fixed Kernel with proper tuning in RHCOS and start supporting M5/C5 instance types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is excellent news! We could enable the latest generation instances in the templates.
@abhinavdahiya @crawford Do we think m5 should be our default? The latest generation would be my recommendation in general and continues the trend of improving cost/performance. A downside is they support a total of 28 EBS and ENA attachments where the previous generations could support up to 40 EBS attachments 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gnufied It doesn't appear that https://bugzilla.redhat.com/show_bug.cgi?id=1632440 is publicly viewable; could you share some details or open up access to that bug?
RegisterNlbIpTargetsLambda: | ||
Description: Lambda ARN useful to help register/deregister IP targets for these load balancers | ||
Value: !GetAtt RegisterNlbIpTargets.Arn | ||
ExternalApiTargetGroupArn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove Arn from the resource name? It will help with searching when trying to find values for parameters for the next stack.
ExternalApiTargetGroupArn: | ||
Description: ARN of External API target group | ||
Value: !Ref ExternalApiTargetGroup | ||
InternalApiTargetGroupArn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove Arn from the resource name? It will help with searching when trying to find values for parameters for the next stack.
InternalApiTargetGroupArn: | ||
Description: ARN of Internal API target group | ||
Value: !Ref InternalApiTargetGroup | ||
InternalServiceTargetGroupArn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove Arn from the resource name? It will help with searching when trying to find values for parameters for the next stack.
a9cbab0
to
f602bd5
Compare
- filters: | ||
- name: tag:Name | ||
values: | ||
- test-tkh7l-worker-sg <2> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking, do lines 95 and 117 look right to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, {infraName}-{role}-{sg}
matches this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are your sure here? Per my test result, https://bugzilla.redhat.com/show_bug.cgi?id=1697968#c11, security group would be created with '{role}-{sg}-{infraName}' as 'GroupName' and empty 'Name'. While machine api is expecting '{infraName}-{role}-{sg}' as 'Name'. Do I miss something?
@@ -71,7 +71,7 @@ you can safely ignore this warning. | |||
.. Remove the files that define the control plane machines: | |||
+ | |||
---- | |||
$ rm -f openshift/99_openshift-cluster-api_master-machines-*.yaml | |||
$ rm -f openshift/99_openshift-cluster-api_master-*.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per my latest test result, https://bugzilla.redhat.com/show_bug.cgi?id=1698207, I think this change need to be reverted.
Per @abhinavdahiya, I've removed the steps for creating a cluster that uses cluster-managed workers. |
@skordas, you said that you got this process to work yesterday, and I made a few updates based on your feedback. Are there any other changes that I need to add now, or does this look ok to you? |
The bootstrap Ignition config file does contain secrets, like X.509 keys. If | ||
uploading the Ignition config file to an S3 bucket does not provide adequate | ||
security, you can enable an S3 bucket policy to allow only certain users, | ||
such as the OpenShift IAM user, to access objects that the bucket contains. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"If uploading to an S3 bucket is not secure enough, tighten down your S3 bucket" doesn't feel quite right (there's an implicit "if the S3 bucket we're walking your through below is not secure enough"). How about:
The bootstrap Ignition config file does contain secrets, like X.509 keys. The steps below provide basic security for the S3 bucket. To provide additional security, you can enable an S3 bucket policy to allow only certain users, such as the OpenShift IAM user, to access objects that the bucket contains. And you can avoid S3 entirely and serve your bootstrap Ignition config from any address reachable from the bootstrap machine.
I'm also fine not enumerating possible S3 tightenings, and saying something like:
The bootstrap Ignition config file does contain secrets, like X.509 keys. The steps below provide basic security for the S3 bucket. If you have security concerns with these steps, you can configure a more-secure S3 bucket or avoid S3 entirely and serve your bootstrap Ignition config from any address reachable from the bootstrap machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking, please check again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f674f0e20fd6 looks good to me.
QE just approved https://bugzilla.redhat.com/show_bug.cgi?id=1697968, which is the last bug that they considered a blocker for this PR. Per @vikram-redhat, that's enough to merge. Please file bugs for any outstanding issues. Merging. |
I was able to provide UPI infrastructure guided by this document |
From https://jira.coreos.com/browse/OSDOCS-323
http://file.rdu.redhat.com/kalexand/041719/osdocs323/installing/installing_aws_upi/installing-aws-upi.html