From e971302aa5677eb26dee49d66a4f29269bf9b732 Mon Sep 17 00:00:00 2001 From: Yarden-Z <89452607+Yarden-Z@users.noreply.github.com> Date: Mon, 19 Jun 2023 12:56:54 +0300 Subject: [PATCH] Add files via upload --- doc/SONiC Container Hardening.md | 334 +++++++++++++++++++++++++++++++ 1 file changed, 334 insertions(+) create mode 100644 doc/SONiC Container Hardening.md diff --git a/doc/SONiC Container Hardening.md b/doc/SONiC Container Hardening.md new file mode 100644 index 0000000000..2aef451a5f --- /dev/null +++ b/doc/SONiC Container Hardening.md @@ -0,0 +1,334 @@ +# SONiC Container Hardening # + +## Table of Content +- [SONiC Container Hardening](#sonic-container-hardening) + - [Table of Content](#table-of-content) + - [Revision](#revision) + - [Scope](#scope) + - [Definitions/Abbreviations](#definitionsabbreviations) + - [Overview](#overview) + - [Requirements](#requirements) + - [Architecture Design](#architecture-design) + - [Root privileges](#root-privileges) + - [Net=Host](#nethost) + - [High-Level Design](#high-level-design) + - [Root privileges removal](#root-privileges-removal) + - [Docker privileges](#docker-privileges) + - [Net Host removal](#net-host-removal) + - [How to check?](#how-to-check) + - [SAI API](#sai-api) + - [Configuration and management](#configuration-and-management) + - [Manifest (if the feature is an Application Extension)](#manifest-if-the-feature-is-an-application-extension) + - [CLI/YANG model Enhancements](#cliyang-model-enhancements) + - [Config DB Enhancements](#config-db-enhancements) + - [Warmboot and Fastboot Design Impact](#warmboot-and-fastboot-design-impact) + - [Restrictions/Limitations](#restrictionslimitations) + - [Testing Requirements/Design](#testing-requirementsdesign) + - [Unit Test cases](#unit-test-cases) + - [System Test cases](#system-test-cases) + - [Open/Action items - if any](#openaction-items---if-any) + - [Appendix](#appendix) + + +### Revision + +### Scope + +This section describes the requirements, goals and recommendations of the container hardening item for SONiC + +## Definitions/Abbreviations + +TBD + +## Overview + +Containers is a method of creating virtualization and abstraction of an OS for a subset of processes/service on top of a single host with the purpose of giving it an environment to run and execute its tasks without effect of nearby containers/processes. + +In SONiC, we are deploying containers with full visibility and capabilities as the host Linux. + +This poses a security risk and vulnerability as a single breached container means that the whole system is breached. + +Addressing this issue – we have composed this doc for container hardening, describing the security hardening requirements and definitions for all containers on top of SONiC. + +## Requirements + +What are we trying to achieve here? + +We would like to increase the security in SONiC so that an attack on a specific container will not compromise the whole system. + +To do so, we’ll tackle the following areas: +1. Privileges +2. Network +3. Capabilities +4. Mount namespace +5. Cgroups +6. Etc’ + +For now, we will focus on #1 & #2 + +Further guidelines and requirements will be brought upon in the future on-demand. + +## Architecture Design + +### Root privileges + +When removing the root privileges from a specific container - we are required to remove the --privileged flag and add the required missing Linux capabilities to the docker, +or alternitavely adjust the container so that it does not require root privileges to perform any action. + +### Net=Host + +Removing the net=HOST is required to prevent the container from accessing the full network scope of the host and system. +When doing this removal - we will start getting failures from devices that require external access and packet transfers between the container and the host to the interfaces. +In order to overcome this obstacle - we have a few options here: +- Port forwarding +- + +## High-Level Design + +### Root privileges removal +Removing the --privileged flag is done by editing the docker_image_ctl.j2 file: + +docker_image_ctl.j2 file + + docker create {{docker_image_run_opt}} \ # *Need to modify this parameter "docker_image_run_opt" to not contain the --privileged flag* + {%- if docker_container_name != "database" %} + --net=$NET \ + --uts=host \{# W/A: this should be set per-docker, for those dockers which really need host's UTS namespace #} + {%- endif %} + {%- if docker_container_name == "database" %} + -p 6379:6379 \ + {%- endif %} + -e RUNTIME_OWNER=local \ + {%- if install_debug_image == "y" %} + -v /src:/src:ro -v /debug:/debug:rw \ + {%- endif %} + {%- if '--log-driver=json-file' in docker_image_run_opt or '--log-driver' not in docker_image_run_opt %} + --log-opt max-size=2M --log-opt max-file=5 \ + {%- endif %} + +This will cause the docker file to be altered in the following manner: + +**database.sh file** + + docker create --privileged -t -v /etc/sonic:/etc/sonic:ro \ # *Need to remove the --privileged flag* + -p 6379:6379 \ + -e RUNTIME_OWNER=local \ + --log-opt max-size=2M --log-opt max-file=5 \ + --tmpfs /tmp \ + $DB_OPT \ + $REDIS_MNT \ + -v /usr/share/sonic/device/$PLATFORM:/usr/share/sonic/platform:ro \ + --tmpfs /var/tmp \ + --env "NAMESPACE_ID"="$DEV" \ + --env "NAMESPACE_PREFIX"="$NAMESPACE_PREFIX" \ + --env "NAMESPACE_COUNT"=$NUM_ASIC \ + --name=$DOCKERNAME \ + docker-database:latest \ + || { + echo "Failed to docker run" >&1 + exit 4 + } + +#### Docker privileges +Removing the root privileges from the docker container - will remove some Linux capabilities that are inherited from the root level permissions. + +Runnign the capabilities list command on a privileged container: + + root@str-e1031-acs-1:/# capsh --print + Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read+eip + +Runnign the capabilities list command on an un-privileged container: + root@ce2c36a0b20c:/# capsh --print + + Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap=eip + +If, for some reason, a docker must retain a specific capablity functionality on top of the container (which is removed after removign the --privileged flag), we can do that with the following: + +In the docker-database.mk file adjust this line: + + $(DOCKER_DATABASE)_RUN_OPT += -t –-cap-add NET_ADMIN #Changed by removing the --privileged flag and adding --cap-add flag + + +### Net Host removal + +Here we will give an example of how to perform the `--net=host` removal (host network) from a specific container. +We are using the database container as an example for this item. + +The original docker creation should be like in the example below: +docker with host sharing: + + docker create --privileged -t -v /etc/sonic:/etc/sonic:ro \ + --net=$NET \ + -e RUNTIME_OWNER=local \ + --uts=host \ + --log-opt max-size=2M --log-opt max-file=5 \ + --tmpfs /tmp \ + $DB_OPT \ + $REDIS_MNT \ + -v /usr/share/sonic/device/$PLATFORM:/usr/share/sonic/platform:ro \ + --tmpfs /var/tmp \ + --env "NAMESPACE_ID"="$DEV" \ + --env "NAMESPACE_PREFIX"="$NAMESPACE_PREFIX" \ + --env "NAMESPACE_COUNT"=$NUM_ASIC \ + --name=database_no_net \ + --cap-drop=NET_ADMIN \ + docker-database:latest + +To disable the sharing of the networking stack between the host and a container we need to remove the flag: `--net=host` +To support port forwarding we are required to add the flag:  -p : + + +The "new" docker creation file database.sh can be seen in the code block below: +Docker with port forwarding + + docker create --privileged -t -v /etc/sonic:/etc/sonic:ro \ + **-p 6379:6379** \ + -e RUNTIME_OWNER=local \ + --uts=host \ + --log-opt max-size=2M --log-opt max-file=5 \ + --tmpfs /tmp \ + $DB_OPT \ + $REDIS_MNT \ + -v /usr/share/sonic/device/$PLATFORM:/usr/share/sonic/platform:ro \ + --tmpfs /var/tmp \ + --env "NAMESPACE_ID"="$DEV" \ + --env "NAMESPACE_PREFIX"="$NAMESPACE_PREFIX" \ + --env "NAMESPACE_COUNT"=$NUM_ASIC \ + --name=$DOCKERNAME \ + docker-database:latest \ + + +**How we did it?** + +To create a docker with the flags above it is required to set the "new" flag in the file docker_image_ctl.js. Follow the call "docker create {{docker_image_run_opt}} \":  +and replace the `–--net=$NET`. +docker flag generation + + {%- if docker_container_name != "database" %} + --net=$NET \ + {%- endif %} + {%- if docker_container_name == "database" %} + -p 6379:6379 \ + {%- endif %} + + +#### How to check? + +Go into the docker - docker exec -it docker bash +Run 'ifconfig'. + +On a docker with host network - you'll be able to view all physical interfaces. +On a docker without host network - we'll see only eth0 and lo. + +## SAI API + +N/A + +## Configuration and management + +N/A - no configuration management/changes are required. + +### Manifest (if the feature is an Application Extension) + +N/A + +### CLI/YANG model Enhancements + +N/A +We are not adding CLI commands or management capabilities to the system with this item. + +### Config DB Enhancements + +N/A - DB should remain the same + +## Warmboot and Fastboot Design Impact + +No impact on all boot sequences, as this item should be seemlessly integrated into the system and achieve the same functionality level as before. + +## Restrictions/Limitations + +## Testing Requirements/Design + +To define this item completed - we are required to run the full CI and check that nothing has been broken from the changes proposed in this HLD. +In addition - we should test that the mitigations are applicable for the relevant containers. + +### Unit Test cases + +N/A, this feature will be checked on a system level. + +### System Test cases + +For general fucntionality flows- running the same test cases that we currently have on top of our system and verifying that nothing broke. + +For adidtional security test cases, we should check that priviliges and network capabilities have been removed. +Net=$HOST removal test: +1. Login to container with removed network capabilities +2. Run ls /dev/ +3. Check that we do not have visibility to all network devices (no tty9/8 no sda, etc') + +Privilege removal test: +1. Login to container without --privileged flag +2. Check that you cannot access /etc/shadow +3. Check that you cannot perform vim for /boot folder or any file in it + + +## Open/Action items - if any + +Currently, Nvidia and MSFT have scoped commitment for specific containers. +Redis and SNMP already have these adjustments. +What remains is to perform this container hardening for all other containers in the system so that the whole scho-system will comply to these security hardening requirements. + + + +## Appendix +Further reading: + +[Linux Capabilities 101](https://linux-audit.com/linux-capabilities-101/) + +[Understanding Linux Capabilities](https://tbhaxor.com/understanding-linux-capabilities/) + +[Linux Namespaces Wiki](https://en.wikipedia.org/wiki/Linux_namespaces) + +| Capability Key | Capability Description | +| ----------- | ----------- | +| AUDIT_WRITE | Write records to kernel auditing log | +| CHOWN | Make arbitrary changes to file UIDs and GIDs (see chown(2)). | +| DAC_OVERRIDE | Bypass file read, write, and execute permission checks. | +| FOWNER | Bypass permission checks on operations that normally require the file system UID of the process to match the UID of the file. | +| FSETID | Don’t clear set-user-ID and set-group-ID permission bits when a file is modified. | +| KILL | Bypass permission checks for sending signals | +| MKNOD | Create special files using mknod(2). | +| NET_BIND_SERVICE | Bind a socket to internet domain privileged ports (port numbers less than 1024). | +| NET_RAW | Use RAW and PACKET sockets | +| SETFCAP | Set file capabilities | +| SETGID | Make arbitrary manipulations of process GIDs and supplementary GID list. | +| SETPCAP | Modify process capabilities | +| SETUID | Make arbitrary manipulations of process UIDs. | +| SYS_CHROOT | Use chroot(2), change root directory. | +| AUDIT_CONTROL | Enable and disable kernel auditing; change auditing filter rules; retrieve auditing status and filtering rules. | +| AUDIT_READ | Allow reading the audit log via multicast netlink socket | +| BLOCK_SUSPEND | Allow preventing system suspends. | +| BPF | Allow creating BPF maps, loading BPF Type Format (BTF) data, retrieve JITed code of BPF programs, and more. | +| CHECKPOINT_RESTORE | Allow checkpoint/restore related operations. Introduced in kernel 5.9. | +| DAC_READ_SEARCH | Bypass file read permission checks and directory read and execute permission checks. | +| IPC_LOCK | Lock memory (mlock(2), mlockall(2), mmap(2), shmctl(2)). | +| IPC_OWNER | Bypass permission checks for operations on System V IPC objects. | +| LEASE | Establish leases on arbitrary files (see fcntl(2)). | +| LINUX_IMMUTABLE | Set the FS_APPEND_FL and FS_IMMUTABLE_FL i-node flags. | +| MAC_ADMIN | Allow MAC configuration or state changes. Implemented for the Smack LSM. | +| MAC_OVERRIDE | Override Mandatory Access Control (MAC). Implemented for the Smack Linux Security Module (LSM). | +| NET_ADMIN | Perform various network-related operations. | +| NET_BROADCAST | Make socket broadcasts, and listen to multicasts. | +| PERFMON | Allow system performance and observability privileged operations using perf_events, i915_perf and other kernel subsystems | +| SYS_ADMIN | Perform a range of system administration operations. | +| SYS_BOOT | Use reboot(2) and kexec_load(2), reboot and load a new kernel for later execution. | +| SYS_MODULE | Load and unload kernel modules. | +| SYS_NICE | Raise process nice value (nice(2), setpriority(2)) and change the nice value for arbitrary processes. | +| SYS_PACCT | Use acct(2), switch process accounting on or off. | +| SYS_PTRACE | Trace arbitrary processes using ptrace(2). | +| SYS_RAWIO | Perform I/O port operations (iopl(2) and ioperm(2)). | +| SYS_RESOURCE | Override resource Limits | +| SYS_TIME | Set system clock (settimeofday(2), stime(2), adjtimex(2)); set real-time (hardware) clock. | +| SYS_TTY_CONFIG | Use vhangup(2); employ various privileged ioctl(2) operations on virtual terminals. | +| SYSLOG | Perform privileged syslog(2) operations. | +| WAKE_ALARM | Trigger something that will wake up the system |