Seccomp for Kubernetes workloads

Jun 9, 2024

Ben Hirschberg
CTO & Co-founder

Seccomp in a nutshell

Seccomp, short for Secure Computing Mode, is a security feature of the Linux kernel. It was first introduced in Linux 2.6.12 in 2005. Seccomp restricts the system calls a process can make. This reduces the kernel attack surface exposed to that process and limits the damage a compromised process can do. The feature grew out of the need to execute untrusted programs safely, a requirement that has only become more relevant in today’s diverse computing environments.

In essence, seccomp moves a process into a restricted state. In that state, the process can invoke only a limited set of system calls that the administrator or developer has deemed safe. In containerized environments, this set is defined in a seccomp profile, a customizable rule set that specifies which system calls are permitted and which are denied. If a process attempts a system call that its profile does not permit, the kernel intervenes. Depending on the configured action, it kills the process, returns an error code to the caller, or logs the event. This makes seccomp fundamental to reducing the risk of kernel-level exploits: even if an attacker compromises a process or a container, their ability to harm the host is significantly curtailed.
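The Python model below makes this decision logic concrete by mapping a system call name to an action. It assumes the JSON profile format accepted by common container runtimes: a `defaultAction` plus a list of `syscalls` rules, each with `names` and an `action`. The rule contents are hypothetical. The real kernel evaluates a compiled BPF program rather than a lookup like this.

```python
"""Simplified model of how a seccomp profile decides what happens to a syscall.

Assumption: the profile uses the JSON layout understood by common container
runtimes (defaultAction plus a list of syscall rules). The kernel really runs
a compiled BPF program; argument filters and architectures are omitted here.
"""

PROFILE = {
    # Deny by default: any syscall not listed below fails with an error code.
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {
            "names": ["read", "write", "openat", "close", "exit_group", "futex"],
            "action": "SCMP_ACT_ALLOW",
        },
        {
            # Hypothetical rule: kill the process if it ever tries to mount.
            "names": ["mount"],
            "action": "SCMP_ACT_KILL",
        },
    ],
}


def decide(profile: dict, syscall: str) -> str:
    """Return the action the profile assigns to a syscall name."""
    for rule in profile["syscalls"]:
        if syscall in rule["names"]:
            return rule["action"]
    # No rule matched, so the profile-wide default applies.
    return profile["defaultAction"]


if __name__ == "__main__":
    for name in ["read", "mount", "unshare"]:
        print(f"{name:8} -> {decide(PROFILE, name)}")
```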

The evolution of seccomp reflects the growing complexity and security requirements of modern computing. Initially, seccomp offered only strict mode, which limits a process to read(), write(), _exit(), and sigreturn(). That was too rigid for many use cases, particularly those where the decision depends on a system call's arguments or context rather than on its name alone. Linux 3.5 added "filter mode", in which the policy is expressed as a BPF (Berkeley Packet Filter) program. That program inspects each system call number and its arguments. This greater flexibility lets users build finely tuned security policies that balance security with functional requirements.
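The following Python sketch models argument-aware rules, the kind of decision that filter mode makes possible. It assumes the runtime JSON profile format, in which a rule may carry `args` conditions. Each condition has an `index`, a `value`, an optional `valueTwo`, and an `op`. The `clone` rule follows the approach of Docker's default profile, which refuses namespace-creating flags. The `personality` rule is hypothetical. Only two comparison operators are modeled.

```python
"""Sketch of argument-aware filtering, the capability added by filter mode.

Assumptions: rules follow the runtime JSON layout ("args" entries with
"index", "value", "valueTwo", "op"); only two operators are modeled, and the
real evaluation happens in a kernel BPF program, not in Python.
"""

# clone(2) namespace flags, values from <linux/sched.h>.
CLONE_NEWNS = 0x00020000
CLONE_NEWCGROUP = 0x02000000
CLONE_NEWUTS = 0x04000000
CLONE_NEWIPC = 0x08000000
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000
NAMESPACE_FLAGS = (CLONE_NEWNS | CLONE_NEWCGROUP | CLONE_NEWUTS | CLONE_NEWIPC
                   | CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNET)

PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {
            # Allow clone() only if (flags & NAMESPACE_FLAGS) == 0.
            "names": ["clone"],
            "action": "SCMP_ACT_ALLOW",
            "args": [{"index": 0, "value": NAMESPACE_FLAGS,
                      "valueTwo": 0, "op": "SCMP_CMP_MASKED_EQ"}],
        },
        {
            # Hypothetical rule: allow personality() only for PER_LINUX (0).
            "names": ["personality"],
            "action": "SCMP_ACT_ALLOW",
            "args": [{"index": 0, "value": 0, "op": "SCMP_CMP_EQ"}],
        },
    ],
}


def arg_matches(cond: dict, args: tuple) -> bool:
    """Check one argument condition against the syscall's arguments."""
    arg = args[cond["index"]]
    if cond["op"] == "SCMP_CMP_EQ":
        return arg == cond["value"]
    if cond["op"] == "SCMP_CMP_MASKED_EQ":
        # "value" is the mask, "valueTwo" the expected masked result.
        return (arg & cond["value"]) == cond.get("valueTwo", 0)
    raise ValueError(f"operator not modeled in this sketch: {cond['op']}")


def decide(profile: dict, syscall: str, args: tuple = ()) -> str:
    """Return the action for a syscall, taking its arguments into account."""
    for rule in profile["syscalls"]:
        if syscall in rule["names"] and all(
            arg_matches(c, args) for c in rule.get("args", [])
        ):
            return rule["action"]
    return profile["defaultAction"]


if __name__ == "__main__":
    SIGCHLD = 17  # signal number on x86-64 and arm64
    print(decide(PROFILE, "clone", (SIGCHLD,)))                  # fork-like clone
    print(decide(PROFILE, "clone", (SIGCHLD | CLONE_NEWUSER,)))  # new user namespace
```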

Today, seccomp is widely used beyond Kubernetes. Web browsers such as Google Chrome and Mozilla Firefox, for example, use it on Linux to sandbox their rendering processes. This highlights its importance in protecting a host system against malicious code.

Importance of Seccomp in Kubernetes

In Kubernetes, seccomp is an important defense for the host, which in this setting is the node, against malicious code running inside a workload. There are two primary ways an attacker might gain code execution within a Kubernetes workload.

Firstly, an attacker might exploit a vulnerability within the software running inside a container. This form of attack involves finding and leveraging flaws in the application or the underlying components to gain unauthorized control remotely. Once a vulnerability is exploited, the attacker can potentially execute arbitrary code, posing a threat to the host.

Secondly, the supply chain poses another critical risk. In this scenario, an attacker tricks the operator or the system into running a container image laced with malicious code. This kind of attack is particularly deceptive because it can bypass initial security checks and infiltrate the system disguised as a legitimate container image.

In both scenarios, an additional layer of protection at the node level is essential, and this is where seccomp profiles play a crucial role. Suppose malicious code introduced by either route attempts a system call that the container never uses during normal operation, or that it does not need. A profile that omits that call blocks the attempt.
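One way to build such a profile is to record which system calls a container makes during normal operation and deny everything else. The Python sketch below does this for a hypothetical list of observed calls. `OBSERVED` is invented for illustration and was not captured from a real application. The sketch then checks a few calls that an exploit might need.

```python
"""Sketch: build a deny-by-default profile from syscalls observed while the
workload runs normally, then check which calls would be refused.

Assumption: OBSERVED is a hypothetical list recorded during a learning period
(for example by a tracing tool); it is not real data from any application.
"""
import json

OBSERVED = [
    "read", "write", "openat", "close", "epoll_wait", "accept4",
    "read", "write", "futex", "exit_group",
]


def build_profile(observed: list[str]) -> dict:
    """Allow exactly the syscalls seen during normal operation; deny the rest."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": sorted(set(observed)), "action": "SCMP_ACT_ALLOW"}],
    }


def is_allowed(profile: dict, syscall: str) -> bool:
    """True if some rule explicitly allows the syscall."""
    return any(syscall in rule["names"] and rule["action"] == "SCMP_ACT_ALLOW"
               for rule in profile["syscalls"])


if __name__ == "__main__":
    profile = build_profile(OBSERVED)
    print(json.dumps(profile, indent=2))
    # Calls an exploit might need but the workload never made are refused.
    for name in ["accept4", "fsopen", "unshare"]:
        print(name, "allowed" if is_allowed(profile, name) else "blocked")
```

The weakness of this approach is coverage. Any legitimate call that did not occur while observations were being recorded, for instance one on a rarely used error path, is also denied in production. For that reason, the learning period needs to exercise the application thoroughly.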

Here is a short list of recent, known vulnerabilities that seccomp can mitigate:

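CVE-2022-0492 (Carpediem) – a flaw in the cgroups v1 release_agent mechanism that enables attackers to escape the container and gain full root privileges on the host; it is mitigated by seccomp denying the unshare system call, which the exploit uses to obtain CAP_SYS_ADMIN in a new namespace
CVE-2022-0185 – a heap overflow in the kernel's filesystem context API that lets an attacker holding CAP_SYS_ADMIN (a very high privilege for a container, obtainable inside an unprivileged user namespace) escalate privileges and escape the container; it is mitigated by seccomp denying the fsopen system call, which applications rarely use
CVE-2022-0847 (Dirty Pipe) – enables attackers to overwrite data in read-only files and thereby gain a root shell on the victim’s system; it would be mitigated by seccomp denying the system calls the exploit relies on, pipe and splice, for workloads that do not need them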

Blocking such calls is crucial. If an attacker successfully executes a system call that lets them escape the container’s sandbox, they can reach the host’s filesystem. That access compromises the node itself and exposes the node's secrets and elevated Kubernetes privileges. These can in turn be used for further exploitation, such as reading other sensitive secrets or manipulating Kubernetes resources. In severe cases, the attacker can gain control over the entire cluster.

Seccomp in the context of Kubernetes

Seccomp support in Kubernetes reached general availability in version 1.19, which introduced the `seccompProfile` field of the security context for filtering system calls. For Kubernetes workloads, seccomp can be enabled in two ways:

Using Pre-made Seccomp Profiles

Kubernetes lets you apply ready-made seccomp profiles through the `securityContext` of a pod or container. A common example is the `RuntimeDefault` profile, which uses the container runtime’s default seccomp profile. While convenient, such pre-made profiles are not tailored to the specific needs of an application. They may block system calls the application needs, or they may not be restrictive enough.

Here’s a simplified example of a Kubernetes deployment using the `RuntimeDefault` seccomp profile:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example-container
        image: nginx:1.21
        # Container-level setting; it can also be set for every container
        # at once under spec.template.spec.securityContext.
        securityContext:
          seccompProfile:
            type: RuntimeDefault
```

In this example, the `securityContext` for the container specifies the use of the `RuntimeDefault` seccomp profile, which applies the default seccomp profile provided by the container runtime, restricting certain system calls.
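`seccompProfile` can be set both for the whole pod (`spec.securityContext`) and for each container, so it can help to compute which profile a container actually ends up with. The Python helper below is an illustrative sketch, not part of any Kubernetes client library. The container-level value takes precedence over the pod-level one. When neither is set, the container runs `Unconfined`, unless seccomp defaulting is enabled on the kubelet, in which case `RuntimeDefault` is used. Init and ephemeral containers are ignored for brevity.

```python
"""Sketch: work out which seccomp profile type each container of a pod gets.

Rules modeled: container-level seccompProfile overrides the pod-level one;
with neither set, the result is Unconfined unless the kubelet's seccomp
defaulting is enabled (then RuntimeDefault). Only spec.containers is handled.
"""


def effective_seccomp(pod_spec: dict, kubelet_seccomp_default: bool = False) -> dict:
    """Map each container name to the seccomp profile type it will run with."""
    pod_profile = (pod_spec.get("securityContext") or {}).get("seccompProfile")
    result = {}
    for container in pod_spec.get("containers", []):
        profile = ((container.get("securityContext") or {}).get("seccompProfile")
                   or pod_profile)
        if profile is None:
            profile = {"type": "RuntimeDefault" if kubelet_seccomp_default
                       else "Unconfined"}
        result[container["name"]] = profile["type"]
    return result


if __name__ == "__main__":
    # Pod template spec mirroring the Deployment above, plus a hypothetical sidecar.
    spec = {
        "containers": [
            {"name": "example-container", "image": "nginx:1.21",
             "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}}},
            {"name": "sidecar", "image": "busybox"},
        ]
    }
    print(effective_seccomp(spec))
    print(effective_seccomp(spec, kubelet_seccomp_default=True))
```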

Using Custom Seccomp Profiles

For greater control and specificity, Kubernetes supports custom seccomp profiles. Users write these as JSON files and reference them from the `securityContext` with `type: Localhost` and a `localhostProfile` path. That path is resolved relative to the kubelet's seccomp profile directory, which is /var/lib/kubelet/seccomp by default. Custom profiles offer tailored security, allowing system calls to be included or excluded precisely according to the application’s requirements. However, creating and maintaining these profiles requires a deeper understanding of the application’s system call needs, and they can be more complex to manage.
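As a sketch of what using a custom profile involves, the Python function below writes a deny-by-default profile. It then returns the `securityContext` fragment, using the `Localhost` profile type, that points at the file. The relative path `profiles/example-app.json` and the syscall list are hypothetical. The function writes into a temporary directory standing in for the kubelet's seccomp directory, so it can run anywhere.

```python
"""Sketch: write a custom seccomp profile and the matching securityContext.

Assumptions: the path "profiles/example-app.json" and the syscall list are
hypothetical; on a real node the file belongs under the kubelet's seccomp
directory, but here a temporary directory stands in for it.
"""
import json
import tempfile
from pathlib import Path


def write_localhost_profile(seccomp_root: Path, relative_path: str,
                            allowed: list[str]) -> dict:
    """Write the profile JSON and return the securityContext that references it."""
    profile = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"}],
    }
    target = seccomp_root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(profile, indent=2))
    # localhostProfile is resolved relative to the kubelet's seccomp directory.
    return {"seccompProfile": {"type": "Localhost", "localhostProfile": relative_path}}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        ctx = write_localhost_profile(
            Path(root), "profiles/example-app.json",
            ["read", "write", "openat", "close", "exit_group"],
        )
        print(ctx)
```

The allowlist here is far too short for a real workload. A container process needs many more system calls, for example execve and mmap, just to start. This is why tooling that derives profiles from observed behavior is useful.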

Most application developers do not know which system calls their code makes, since they rarely work at that level. It is therefore advisable to use tools that discover which system calls an application actually uses. Kubescape and the ARMO Platform can automatically suggest seccomp profiles based on observed application behavior, which saves the research and the manual work of writing them.

Kubernetes does not treat seccomp profiles as first-class citizens. As a result, users have to distribute seccomp profiles as JSON files to every Kubernetes node. This means developing custom scripts along with distribution and update mechanisms. Kubescape turns seccomp profiles into Kubernetes API objects, making them first-class citizens and abstracting away their distribution. Here’s one way to go about it: https://github.com/slashben/cn-continuous-security-demo
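The following minimal, hypothetical Python sketch illustrates the kind of per-node distribution script that Kubernetes otherwise leaves to its users. It is not Kubescape's mechanism. It assumes profiles arrive as a mapping from relative path to profile content, for example from a ConfigMap mounted into a DaemonSet pod. It writes only the files whose content changed, and uses an atomic rename so that a reader never sees a half-written file.

```python
"""Sketch of a per-node profile sync step (hypothetical, not Kubescape's).

Assumptions: desired profiles arrive as {relative_path: profile_dict}; a
temporary directory stands in for the kubelet's seccomp directory.
"""
import json
import tempfile
from pathlib import Path


def sync_profiles(seccomp_root: Path, desired: dict[str, dict]) -> list[str]:
    """Write profiles whose content changed; return the paths that were updated."""
    updated = []
    for rel_path, profile in sorted(desired.items()):
        target = seccomp_root / rel_path
        content = json.dumps(profile, indent=2, sort_keys=True)
        if target.exists() and target.read_text() == content:
            continue  # already up to date on this node
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.with_suffix(".tmp")
        tmp.write_text(content)
        tmp.replace(target)  # atomic rename: readers never see a partial file
        updated.append(rel_path)
    return updated


if __name__ == "__main__":
    desired = {"profiles/example-app.json":
               {"defaultAction": "SCMP_ACT_ERRNO", "syscalls": []}}
    with tempfile.TemporaryDirectory() as root:
        print(sync_profiles(Path(root), desired))  # first run writes the file
        print(sync_profiles(Path(root), desired))  # second run changes nothing
```

Stale profiles are not removed in this sketch. Also, because a seccomp filter is installed when a container starts, an updated file only affects containers created after the update.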

Both approaches enhance the security of containerized applications in Kubernetes by controlling the system calls they can execute, thereby minimizing potential attack surfaces.

Conclusion

Security, particularly in the context of Kubernetes, is a journey, not a destination. In this ecosystem, safeguarding the nodes from workloads is of utmost importance. While it’s crucial to protect the outer layers, such as securing the supply chain and patching vulnerabilities, these measures often require time. During this period, systems may remain vulnerable to intrusion. This is where seccomp comes into play, serving as a critical line of defense. By restricting system calls that containers can execute, seccomp provides an additional security layer. This layer grants administrators and developers more time to identify and fix vulnerabilities, thereby enhancing the overall security posture of the Kubernetes environment.