VMware disk won't properly come online in the OS #149

Closed
ArgonV opened this issue Jan 13, 2023 · 31 comments
Labels
platform/vmware VMware question Further information is requested version/v1.9.x wontfix This will not be worked on

Comments

@ArgonV

ArgonV commented Jan 13, 2023

BurmillaOS Version: (ros os version)

v1.9.6

Where are you running BurmillaOS? (docker-machine, AWS, GCE, baremetal, etc.)

VMware vSphere datacenter

Which processor architecture you are using?

Intel Xeon

Do you use some extra hardware? (GPU, etc)?

No

Which console you use (default, ubuntu, centos, etc..)

Default

Do you use some service(s) which are not enabled by default?

No

Have you installed some extra tools to console?

VMware Tools

Do you use some other customizations?

Network config DHCP on boot

I am using the VMware ISO and cannot get the disk drive to properly come up. The VM in vSphere shows that the data disk is attached to the VM, but when I go to the console for BurmillaOS and do a df -h, I don't see the 20GB disk. I am using a node template to boot-strap the node into VMware. The networking, CPU and memory config options are properly being set for the VM.

@olljanat
Member

olljanat commented Jan 13, 2023

So you have multiple disks connected to the VM? Please share your cloud-init as requested in the issue template, because data disks of course do not work without proper configuration: https://burmillaos.org/docs/storage/additional-mounts/

You also might want to check the real-world examples in #6.

@ArgonV
Author

ArgonV commented Jan 13, 2023

I just have the one hard disk, along with the ISO mounted as a CD/DVD drive.

Apologies, I missed the cloud-init part, there is no way to copy txt from the console so here is a screenshot:
[screenshot: cloud-init]

And as a reference, here is my cloud-init from a RancherOS host with the disk drive properly mounted:

[screenshot: cloud-init-ros]

Notice at the end there are the disks, however I am not adding those to my cloud-init. They are supposed to be getting that from the Rancher boot-strap process in the node template. So my guess is the STATE is not being picked up somehow?

@olljanat
Member

olljanat commented Jan 13, 2023

Did you try to run the installation? If I remember right, df -h only prints info about mounted volumes, not about empty disks.
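
To illustrate the point about df: it reports only mounted filesystems, so an attached but blank disk is easier to spot by comparing the kernel's device list with the mount table. A minimal sketch, assuming a Linux console where /proc/partitions and /proc/mounts are readable; the helper name is_mounted is hypothetical and not part of BurmillaOS:

```shell
#!/bin/sh
# Hypothetical helper: succeed if device $1 (e.g. /dev/sda) or one of its
# partitions appears in a mount table. The table path defaults to
# /proc/mounts and can be overridden as $2.
is_mounted() {
    dev="$1"
    table="${2:-/proc/mounts}"
    # The first field of each mount-table line is the source device.
    awk -v d="$dev" 'index($1, d) == 1 { found = 1 } END { exit !found }' "$table"
}

# Walk the devices the kernel knows about and flag the unmounted ones.
# /proc/partitions has two header lines, then: major minor #blocks name
if [ -r /proc/partitions ]; then
    awk 'NR > 2 { print $4 }' /proc/partitions | while read -r name; do
        if is_mounted "/dev/$name"; then
            echo "/dev/$name: mounted (visible in df -h)"
        else
            echo "/dev/$name: not mounted (df -h will not list it)"
        fi
    done
fi
```

A disk that shows up as "not mounted" here and in fdisk -l, but not in df -h, is attached and simply has no mounted filesystem yet.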

@ArgonV
Author

ArgonV commented Jan 13, 2023

I have been trying to get the disk just to show up, formatted and mounted at this point with no success, using cloud-init yaml at a url:

#cloud-config
mounts:
- ["/dev/sda", "/mnt/test", "ext4", ""]
rancher:
  sysctl:
    vm.max_map_count: 262144
  state:
    autoformat:
    - /dev/sda
    - /dev/vda

I can see that /mnt/test is there, but it is not persistent. fdisk -l shows me the device, but df -h doesn't show that it's formatted.

@ArgonV
Author

ArgonV commented Jan 13, 2023

If I run mkfs.ext4 /dev/sda manually and reboot, I see that it's there. So why isn't auto-formatting working on the initial boot with my cloud-config above?

@olljanat
Member

RancherOS shipped a huge number of ready-made installation media. It sounds like you have been using their rancheros-vmware-autoformat.iso version.

Here we purposely limit the number of media to a minimum, based on the feedback we got in #6, and the autoformat media are among those that were dropped. You can of course still describe your use case there, and if we find others who need it we can consider re-adding it.

How it works now is that if you want to automate installation on VMware, you can use the guestinfo field cloud-init.config.data for that. More about this at https://burmillaos.org/docs/installation/cloud/vmware-esxi/

and here is a real-world example of how to configure it with Terraform:

guestinfo.cloud-init.config.data = <<EOD
#!/bin/bash
(cat << EOF
#cloud-init
runcmd:
- ["mount", "-t", "ext4", "/dev/sdb", "/var/lib/docker"]
rancher:
  sysctl:
    vm.max_map_count: 262144
ssh_authorized_keys:
  - ${var.rancher_public_key}
EOF
)> cloud-init.yml
if ! blkid | grep -q "RANCHER_STATE"; then
 sudo ros install -d /dev/sda --no-reboot -c cloud-init.yml
 if ! blkid | grep -q "USER_DOCKER"; then
  sudo mkfs.ext4 /dev/sdb -L USER_DOCKER
 fi
 sudo reboot
else
 echo "already installed"
fi
EOD
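
The blkid checks in the script above are what make it safe to run on every boot: the install happens only while no filesystem labelled RANCHER_STATE exists yet. A minimal sketch of that guard in isolation; the helper name has_label and the sample blkid line are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read `blkid` output on stdin and succeed if some
# filesystem carries the label given as $1.
has_label() {
    grep -q "LABEL=\"$1\""
}

# On a real host this would be fed by blkid, e.g.:
#   if blkid | has_label RANCHER_STATE; then echo "already installed"; fi
# Here a made-up sample line stands in for blkid output.
sample='/dev/sda1: LABEL="RANCHER_STATE" UUID="00000000-0000-0000-0000-000000000000" TYPE="ext4"'

if printf '%s\n' "$sample" | has_label RANCHER_STATE; then
    echo "already installed"
else
    echo "would run: ros install -d /dev/sda ..."
fi
```

Matching on LABEL="..." is slightly stricter than the plain grep in the script above, which matches the string anywhere in the blkid output.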

Alternatively, you can create a VMware template by doing the installation like this:

#!/bin/bash
echo "Installing to disk" > /dev/tty1
ros install -f -d /dev/sda --no-reboot --debug --append "console=tty1 console=ttyS0,115200n8 printk.devkmsg=on rancher.autologin=ttyS0"
halt -P

then marking that first VM as a template and creating the other VMs from it.

@ArgonV
Author

ArgonV commented Jan 17, 2023

Thank you for the update.

I don't currently utilize Terraform, so I'm trying to pass that in via guestinfo.cloud-init.config.data and guestinfo.cloud-init.data.encoding with no luck so far:

[screenshot: Screen Shot 2023-01-17 at 11 05 07 AM]

@olljanat
Member

Cloud-init should write its log to /var/log.
The syntax you are using looks correct, provided that it is a valid base64 string and its content is valid (correctly formed script, using LF instead of CRLF, etc.).
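
Both conditions (valid base64, LF line endings) can be checked locally instead of relying on a web encoder. A minimal sketch, assuming a POSIX shell with tr and a base64 tool that accepts -d (GNU coreutils or BusyBox); the helper name encode_userdata is hypothetical, and the value base64 for the encoding key is an assumption to verify against the VMware installation docs linked earlier in the thread:

```shell
#!/bin/sh
# Hypothetical helper: print the contents of file $1 as a single-line base64
# string, with Windows line endings (CRLF) normalised to LF first.
encode_userdata() {
    tr -d '\r' < "$1" | base64 | tr -d '\n'
}

# Example with a throwaway script that deliberately uses CRLF line endings.
script=$(mktemp)
printf '#!/bin/bash\r\necho "hello"\r\n' > "$script"

encoded=$(encode_userdata "$script")
echo "guestinfo.cloud-init.config.data = $encoded"
# Assumed value, see the lead-in.
echo "guestinfo.cloud-init.data.encoding = base64"

# Round trip: decoding should give back the script with LF-only endings.
printf '%s' "$encoded" | base64 -d
rm -f "$script"
```

If the decoded text does not start with the expected first line of the script, the encoded value is not what the VM should receive.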

@ArgonV
Author

ArgonV commented Jan 17, 2023

I used your script above, everything after guestinfo.cloud-init.config.data = and pasted it into https://www.base64encode.org with the LF option set.

Not seeing a cloud init logfile, is it named something non-obvious?

@ArgonV
Author

ArgonV commented Jan 17, 2023

Ah, found it under the boot directory in /var/log named cloud-init-execute/save.log Checking those...

@ArgonV
Author

ArgonV commented Jan 17, 2023

Here we go:
[screenshot: Screen Shot 2023-01-17 at 12 17 52 PM]

"Unrecognized user-data" Trying to see what's up there.

@olljanat
Member

You need to skip those "EOD" lines. They are just Terraform syntax for defining a multi-line string.
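
When the snippet is copied out of a Terraform file, the wrapper lines can be removed mechanically before encoding. A minimal sketch using sed; the helper name strip_heredoc_wrapper is hypothetical, and it assumes the delimiter is literally EOD, as in the example earlier in this thread:

```shell
#!/bin/sh
# Hypothetical helper: read a Terraform-style snippet on stdin and print only
# the heredoc body, dropping the opening "... = <<EOD" line and the closing
# "EOD" line.
strip_heredoc_wrapper() {
    sed -e '/<<-\{0,1\}EOD[[:space:]]*$/d' -e '/^[[:space:]]*EOD[[:space:]]*$/d'
}

# Example: only the two script lines in the middle are printed.
strip_heredoc_wrapper << 'SNIPPET'
guestinfo.cloud-init.config.data = <<EOD
#!/bin/bash
echo "already installed"
EOD
SNIPPET
```

The inner `cat << EOF` heredoc of the real script is left untouched, because only lines that end in <<EOD or consist of EOD alone are deleted.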

@ArgonV
Author

ArgonV commented Jan 17, 2023

Ah I see, thanks. So my config is:

#!/bin/bash
(cat << EOF
#cloud-init
runcmd:
- ["mount", "-t", "ext4", "/dev/sda", "/var/lib/docker"]
rancher:
  sysctl:
    vm.max_map_count: 262144
EOF) > cloud-init.yml
if ! blkid | grep -q "RANCHER_STATE"; then
 sudo ros install -d /dev/sda --no-reboot -c cloud-init.yml
 if ! blkid | grep -q "USER_DOCKER"; then
  sudo mkfs.ext4 /dev/vda -L USER_DOCKER
 fi
 sudo reboot
else
 echo "already installed"
fi

That got me past the "Unrecognized user-data" error. Now at the end of the cloud-init-save.log file I see this error msg: "Failed to run command [wpa_cli term

@olljanat
Member

wpa_cli is related to WLAN configuration. Should be safe to ignore in VMware.

@ArgonV
Author

ArgonV commented Jan 17, 2023

Thanks, sadly I'm still not seeing the drive. Do I need to keep my initial cloud-init yaml at a URL also?

@olljanat
Member

Two things to check.

  1. Your cloud-init is invalid, as you cannot use the same disk as both mount and install target. For simplicity, use this instead:
#!/bin/bash
(cat << EOF
#cloud-init
rancher:
  sysctl:
    vm.max_map_count: 262144
EOF
) > cloud-init.yml
if ! blkid | grep -q "RANCHER_STATE"; then
 sudo ros install -d /dev/sda --no-reboot -c cloud-init.yml
 sudo reboot
else
 echo "already installed"
fi
  2. Make sure the boot order on the VM is set so that it boots from the hard disk first, because only the first boot should happen from the ISO file.
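
One detail in scripts like the one above is worth spelling out: in POSIX shells a here-document ends only at a line that consists of the delimiter alone, so the closing parenthesis of the subshell needs its own line after EOF, as in the earlier Terraform example. A minimal sketch that writes a similar YAML to a temporary file rather than cloud-init.yml:

```shell
#!/bin/sh
# Write a small cloud-config to a temporary file through a subshell + heredoc.
# The closing delimiter EOF sits alone on its line; ")" follows on the next.
out=$(mktemp)
(cat << EOF
#cloud-config
rancher:
  sysctl:
    vm.max_map_count: 262144
EOF
) > "$out"

# Show what was written. With "EOF)" on one line the shell would not treat it
# as the end of the heredoc and would keep reading the following lines as
# heredoc text.
cat "$out"
```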

@ArgonV
Author

ArgonV commented Jan 17, 2023

Thanks for all of your help, I've not had to config the VM boot order in the past. Usually with the VMware autoformat feature, it boots the OS from the ISO, does the install and from then on the ISO still needs to be attached to boot - but the overlay (/var/lib/docker) persists on the attached vmdk disk.

I tried the above, and still do not see it mounting. But fdisk -l still lists it at /dev/sda (it's a 20GB disk)

I'm wondering if I need to add in the format command (mkfs.ext4 /dev/sda) to the above code, before the ros install line?

@ArgonV
Author

ArgonV commented Jan 17, 2023

Using this cloud-config set to run on boot now:

#cloud-config
runcmd:
- ["sudo", "mkfs.ext4", "/dev/sda"]
- ["sudo", "mount", "-t", "ext4", "/dev/sda", "/var/lib/docker"]
- ["sudo", "ros", "install", "-d", "/dev/sda", "--no-reboot", "-c", "cloud-init.yml"]
- ["sudo", "reboot"]
rancher:
  sysctl:
    vm.max_map_count: 262144

The disk formats and it mounted at /var/lib/docker! However after some time when I try a docker stats or df -h I see this error:

cannot read table of mounted file systems: No such file or directory

Should I try a mount point somewhere else? Perhaps just the overlay2 subfolder? As that's what's filling up...

@ArgonV
Author

ArgonV commented Jan 17, 2023

Earlier you said: Your cloud-init is invalid as you cannot use same disk as mount and install target.

So I tried this cloud-config, with no luck:

#cloud-config
runcmd:
- ["sudo", "mkfs.ext4", "/dev/sda"]
- ["sudo", "ros", "install", "-d", "/dev/sda", "--no-reboot", "-c", "cloud-init.yml"]
- ["sudo", "reboot"]
rancher:
  sysctl:
    vm.max_map_count: 262144

It just boots as normal, without installing or restarting.

@ArgonV
Author

ArgonV commented Jan 17, 2023

Ah, I had the path wrong to the cloud file and so now it installs using this:

#cloud-config
runcmd:
- ["sudo", "mkfs.ext4", "/dev/sda"]
- ["sudo", "ros", "install", "-d", "/dev/sda", "--no-reboot", "-c", "/var/lib/rancher/conf/cloud-config.yml"]
- ["sudo", "reboot"]
rancher:
  sysctl:
    vm.max_map_count: 262144

in a cloud-config.yaml file at a URL that Rancher is pointed to via the cloud-init URL in the node template. Sadly, I've lost the ability to have automatic console login, and I'm using the SSH keys that Rancher dynamically generates. Is there any way to re-enable automatic console login?
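
On the wrong-path problem mentioned above: a wrong -c path apparently led to a boot with no install and no obvious error, so it can help to check the file before calling ros install. A minimal sketch; the helper name config_ready is hypothetical, and the install command is only printed, not executed:

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the cloud-config file at $1 exists and
# is non-empty, so a wrong path is reported on stderr instead of going
# unnoticed.
config_ready() {
    if [ -s "$1" ]; then
        return 0
    fi
    echo "cloud-config not found or empty: $1" >&2
    return 1
}

# Sketch of how it could gate the install step (printed, not executed here).
cfg="/var/lib/rancher/conf/cloud-config.yml"
if config_ready "$cfg"; then
    echo "would run: ros install -d /dev/sda --no-reboot -c $cfg"
fi
```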

@olljanat
Member

Did I understand correctly that you are still using Rancher Server 1.6? (That would have been useful info earlier, as it does things its own way.)

Then you might want to use something like this rancher#723 (comment)

@ArgonV
Author

ArgonV commented Jan 18, 2023

Oh no, I'm on Rancher Server 2.5.x, and 2.6.x. Does that make a difference here?

@olljanat
Member

> Oh no, I'm on Rancher Server 2.5.x, and 2.6.x.

I'm quite sure that only the 1.x versions were called Rancher Server. The 2.x versions are called just Rancher (or, according to the latest documentation, Rancher Manager).

> Does that make a difference here?

Yes. Rancher 2.4.18 was the last version that supported RancherOS (compare https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/rancher-v2-4-18/ and https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/rancher-v2-5-0/). Because BurmillaOS is based on RancherOS, that means they don't support us, and to honor that decision we do not support Rancher.

In addition, Rancher is a Kubernetes cluster management tool, and we do not support Kubernetes at all (see #47).

So if you want to use Rancher, it is highly recommended to use one of the Linux distributions that they support.

@olljanat olljanat added question Further information is requested wontfix This will not be worked on version/v1.9.x labels Jan 18, 2023
@ArgonV
Author

ArgonV commented Jan 18, 2023

Ah, I have been running RancherOS for Rancher downstream K8s cluster nodes in v2.5.x and 2.6.x for some time now. I do have paid support on our Prod environment and they (SUSE/Rancher) do honor that deployment. For the actual Rancher "Server" VM's OS I am using Oracle Linux 7 however (here I am talking about the pane of glass that is Rancher, and not the RancherOS or Rancher K8s cluster).

Re: K8s support for BurmillaOS - In our environment, Rancher Server Kubernetes Engine just deploys Docker containers on the downstream cluster nodes in the user docker (not system docker) space. So really there is nothing special with K8s going on here? The cluster provisioner/node provisioner uses boot-to-docker on the VM (ISO), and the Docker engine to bootstrap the VM, deploy the RKE Docker containers that run Kubernetes, and bring it into the downstream cluster with whatever role I assign it on the cluster template. Am I missing something here that makes this not a standard use-case for BurmillaOS? It's all just running Docker containers on the node VM.

@olljanat
Member

> Am I missing something here that makes this not a standard use-case for BurmillaOS? It's all just running Docker containers on the node VM.

What works and what is supported are two different things. We do not test new BurmillaOS versions with Rancher, which is why, for example, we don't release those autoformat ISO files.

> Ah, I have been running RancherOS for Rancher downstream K8s cluster nodes in v2.5.x and 2.6.x for some time now. I do have paid support on our Prod environment and they (SUSE/Rancher) do honor that deployment.

That is interesting. Perhaps you should then ask them how we can get BurmillaOS listed as a supported OS in the RKE1 list? (RKE2 does not use Docker at all, so we cannot support it without bigger changes.) If they are willing to do that, then I'm ready to add RKE1 to our testing set.

@ArgonV
Author

ArgonV commented Jan 18, 2023

I will indeed ask SUSE/Rancher that, thanks for all of your help and feedback. This has been most helpful in my quest to find a decent RancherOS replacement for RKE1 clusters, without me having to maintain my own VM templates and updates, or distro.

I am hoping the autoformat feature for VMware may be included back in BurmillaOS - I think the use-case is small, but might help along in bringing BurmillaOS to the Rancher sphere of consideration.

@olljanat
Member

Most likely Longhorn will be the hardest part to get working on BurmillaOS. I found two related issues: longhorn/longhorn#828 and longhorn/longhorn#3744

However, it should be a little bit easier than on RancherOS, because we switched to a Debian-based console and included open-iscsi by default (#9).

@olljanat olljanat added the platform/vmware VMware label Jan 22, 2023
@olljanat
Member

I assume there is no good news from SUSE, given all this silence, so closing.

@ArgonV
Author

ArgonV commented Aug 28, 2023

> I assume that there is no good news from Suse because of all this silence so closing.

Sadly nothing. Apologies!

@olljanat
Member

No worries. Are you planning to keep using BurmillaOS in the future? If so, it would be a good idea to test that your use case still works on v2.0.0-rc1.

@ArgonV
Author

ArgonV commented Aug 28, 2023

Yes I currently have 2 K8s clusters (one in testing and one in pre-production) that are provisioned via Rancher Server that I'm using to test various deployments.

Rancher/SUSE just announced Elemental - but they don't have any pre-built ISOs for vSphere yet.
