better handle non eks optimized amis #2073

Open: jaxesn wants to merge 1 commit into base: main

@jaxesn jaxesn commented Nov 22, 2024

Description of changes:

When the log collector script runs on OS images other than the standard AL2/AL2023 EKS-optimized AMIs, and without the vpc-cni, it prints several potentially confusing log messages caused by missing files and executables. This PR adds:

  • command -v checks around specific binaries (lvs, pvs, vgs, ipvsadm, ipset, conntrack, aws-eks-na-cli), similar to the existing checks elsewhere in this script.
  • A check that skips saving IPAM information if /var/run/aws-node/ipam.json does not exist.
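
The two kinds of guard listed above can be sketched roughly as follows. This is an illustration, not the code from the PR: the helper names run_if_present and copy_if_exists and the scratch directory are hypothetical, and only the command -v check and the /var/run/aws-node/ipam.json path come from the description.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the guards described above.
# They are NOT the functions used in the log collector script.

# Run a command only if its binary is on PATH; otherwise skip silently.
# $1 = output file, remaining arguments = command and its arguments.
run_if_present() {
  rip_out_file="$1"
  shift
  if command -v "$1" >/dev/null 2>&1; then
    # Capture stdout and stderr; never abort the collection on failure.
    "$@" >"$rip_out_file" 2>&1 || true
  fi
}

# Copy a file only if it exists, so that cp does not print
# "cannot stat" on nodes where the file is absent.
# $1 = source path, $2 = destination path.
copy_if_exists() {
  if [ -f "$1" ]; then
    cp "$1" "$2"
  fi
}

# Example usage with an illustrative scratch directory.
COLLECT_DIR="$(mktemp -d)"
run_if_present "${COLLECT_DIR}/ipset.txt" ipset list
copy_if_exists /var/run/aws-node/ipam.json "${COLLECT_DIR}/ipam.json"
```

On a node without ipset or without the vpc-cni, both calls are silent no-ops, which matches the quieter New Output shown further down.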

The one somewhat functional change handles Ubuntu nodes. Ubuntu maps to INIT_TYPE snap in the script, and when kubelet logs are retrieved the kubelet-eks snap is assumed to exist. The change checks whether the kubelet-eks snap exists and, if it does not, falls back to the normal kubelet log collection. Without this change, kubelet logs would not be collected on these kinds of nodes.
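
A minimal sketch of that fallback, under stated assumptions: the function names below are hypothetical, and treating journalctl -u kubelet as the "normal" collection path is an assumption rather than something taken from the script. Only INIT_TYPE snap and the kubelet-eks snap name come from the description.

```shell
#!/bin/sh
# Hypothetical sketch of the snap fallback; not the code from the PR.

# True if the named snap is installed on this node.
snap_installed() {
  command -v snap >/dev/null 2>&1 && snap list "$1" >/dev/null 2>&1
}

# Print which log source to use for kubelet.
# $1 = init type (for example "snap" or "systemd").
kubelet_log_source() {
  if [ "$1" = "snap" ] && snap_installed kubelet-eks; then
    echo "snap"
  else
    # Assumption: the non-snap path reads the kubelet unit from journald.
    echo "journald"
  fi
}

# Collect kubelet logs into a file.
# $1 = init type, $2 = output file.
collect_kubelet_logs() {
  case "$(kubelet_log_source "$1")" in
    snap) snap logs kubelet-eks -n all >"$2" 2>&1 || true ;;
    *)    journalctl -u kubelet --no-pager >"$2" 2>&1 || true ;;
  esac
}

# Example (not executed here):
#   collect_kubelet_logs "$INIT_TYPE" /tmp/kubelet.log
```

Separating the decision (kubelet_log_source) from the collection keeps the Ubuntu case easy to check: an Ubuntu node without the kubelet-eks snap resolves to the journald path instead of failing with "snap not found".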

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Testing Done

  • Successfully ran on the AL2023 EKS-optimized AMI
  • Successfully ran on the Ubuntu EKS-optimized AMI to ensure snap still functions as before
  • Successfully ran on Ubuntu and RHEL nodes with the standard set of binaries

Old Output:

        This is version 0.7.8. New versions can be found at https://github.com/awslabs/amazon-eks-ami/blob/main/log-collector-script/

Trying to collect common operating system logs... 
Trying to collect kernel logs... 
Trying to collect modinfo... modinfo: ERROR: Module lustre not found.
Trying to collect mount points and volume information... 
Trying to collect SELinux status... 
Trying to collect iptables information... Trying to collect ipvs information... script.sh: line 358: ipvsadm: command not found
sed: -e expression #1, char 18: unknown option to `s'
script.sh: line 359: ipvsadm: command not found

script.sh: line 361: ipvsadm: command not found
script.sh: line 362: ipset: command not found

script.sh: line 364: ipset: command not found

Trying to collect installed packages... 
Trying to collect active system services... 
Trying to Collect Containerd daemon information... 
Trying to Collect Containerd running information... 
Trying to Collect Docker daemon information... 

        Warning: The Docker daemon is not running. 

Trying to collect kubelet information... error: snap "kubelet-eks" not found

Trying to collect nodeadm information... 

        Warning: The current operating system is not supported. 

Trying to collect L-IPAMD introspection information... Trying to collect L-IPAMD prometheus metrics... Trying to collect L-IPAMD checkpoint... cp: cannot stat '/var/run/aws-node/ipam.json': No such file or directory

Trying to collect Multus logs if they exist... 
Trying to collect sysctls information... 
Trying to collect networking infomation... timeout: failed to run command 'conntrack': No such file or directory
timeout: failed to run command 'conntrack': No such file or directory
timeout: failed to run command 'conntrack': No such file or directory
timeout: failed to run command 'ifconfig': No such file or directory

Trying to collect CNI configuration information... 
Trying to collect CNI Configuration Variables from Docker... 

        Warning: The Docker daemon is not running. 
Trying to collect CNI Configuration Variables from Containerd...        Timed out, ignoring "cni configuration variables output " 

Trying to collect network policy ebpf loaded data... script.sh: line 567: /opt/cni/bin/aws-eks-na-cli: No such file or directory

Trying to collect Docker daemon logs... 
Trying to Collect sandbox-image daemon information... 
Trying to Collect CPU Throttled Process Information... 
Trying to Collect IO Throttled Process Information... 
Trying to Collect reboot history... 
Trying to Collect Nvidia Bug report... No Nvidia drivers found, nothing to do.

Trying to archive gathered information... 

        Done... your bundled logs are located in /var/log/eks_i-0be40b1aeed0c7986_2024-11-21_1755-UTC_0.7.8.tar.gz

New Output:

        This is version 0.7.8. New versions can be found at https://github.com/awslabs/amazon-eks-ami/blob/main/log-collector-script/

Trying to collect common operating system logs... 
Trying to collect kernel logs... 
Trying to collect modinfo... Trying to collect mount points and volume information... 
Trying to collect SELinux status... 
Trying to collect iptables information... 
Trying to collect installed packages... 
Trying to collect active system services... 
Trying to Collect Containerd daemon information... 
Trying to Collect Containerd running information... 
Trying to Collect Docker daemon information... 

        Warning: The Docker daemon is not running. 

Trying to collect kubelet information... 
Trying to collect nodeadm information... 

        Warning: The current operating system is not supported. 

Trying to collect L-IPAMD introspection information... Trying to collect L-IPAMD prometheus metrics... Trying to collect L-IPAMD checkpoint... 
Trying to collect Multus logs if they exist... 
Trying to collect sysctls information... 
Trying to collect networking infomation... 
Trying to collect CNI configuration information... 
Trying to collect CNI Configuration Variables from Docker... 

        Warning: The Docker daemon is not running. 
Trying to collect CNI Configuration Variables from Containerd...        Timed out, ignoring "cni configuration variables output " 

Trying to collect network policy ebpf loaded data... 
Trying to collect Docker daemon logs... 
Trying to Collect sandbox-image daemon information... 
Trying to Collect CPU Throttled Process Information... 
Trying to Collect IO Throttled Process Information... 
Trying to Collect reboot history... 
Trying to Collect Nvidia Bug report... No Nvidia drivers found, nothing to do.

Trying to archive gathered information... 

        Done... your bundled logs are located in /var/log/eks_i-0be40b1aeed0c7986_2024-11-21_1753-UTC_0.7.8.tar.gz

See this guide for recommended testing for PRs. Some tests may not apply. Completing tests and providing additional validation steps are not required, but it is recommended and may reduce review time and time to merge.
