Security | Nov 19, 2020

Kubernetes Security Starts With Policy as Code


Kubernetes security, much like the broader Kubernetes ecosystem, is constantly changing.  While working on policy as code for Helm and Kustomize in Terrascan, we were following an interesting thread on the kubernetes-security-discuss mailing list: a security advisory describing an unreleased bug that can lead to denial of service in Kubernetes snapshot controllers.

Clusters running a vulnerable version need careful configuration to prevent external requests from bringing down the snapshot controller.  While exposure is currently limited, an app that enables an insecure version or configuration of the controller could be open to a severe denial of service problem: the loss of its snapshot controller.  This is the type of scenario where policy as code can be a huge help!

Background

The Kubernetes snapshot controller is an optional component of Kubernetes apps that helps to manage volume snapshots.  These snapshots are used to back up, replicate, and restore data volumes programmatically.  When a snapshot controller is deployed, it enables users and administrators to manage the volume snapshots.  The vulnerability was assigned the CVE ID CVE-2020-8569.

Protecting your apps with policy as code

Accurics offers an open source tool, Terrascan, which enables policy as code for Kubernetes, Helm, and Kustomize among others.  Policy as code means that you codify policies into your development process, much like you already do for infrastructure as code, to allow consistent enforcement of those policies through automation.  Terrascan specifically uses the Open Policy Agent (OPA) and its Rego policy language to detect policy violations and vulnerabilities.

Kubernetes (K8s) apps are most commonly defined as YAML configuration files, in the form of standard K8s structures, Helm charts, or Kustomizations.  Terrascan can scan these files and evaluate more than 500 policies to determine whether security, compliance, and cloud provider best practices are being followed.

By implementing policy as code in development pipelines, developers are able to detect misconfigurations, inefficiencies and vulnerabilities early in the development process, before risky infrastructure is provisioned.  This automated, first line of defense helps streamline the development and release process while reducing the attack surface of the app.

Terrascan’s open architecture allows users to create custom policies which augment the built-in library.  The rest of this post goes into technical detail about this Kubernetes security vulnerability and how Terrascan’s policy as code can be used to identify and eliminate the risk before deploying vulnerable infrastructure.  For clarity and readability, I will gloss over certain details.  We are adding a production-ready version of this policy to Terrascan’s standard policy library so you don’t need to add it yourself.

CVE-2020-8569

This vulnerability affects the Kubernetes Container Storage Interface (CSI) snapshot controller, which is part of the Kubernetes Cluster Controller.  When a snapshot controller is deployed, it looks for Custom Resource Definitions (CRDs) such as VolumeSnapshot and VolumeSnapshotContent and provides them to the API group snapshot.storage.k8s.io so users and administrators can manage the volume snapshots.

Affected versions: v3.0.0 – v3.0.1

Potential impact: Denial of Service (Snapshot Controller becomes inaccessible)

The vulnerability is due to a null pointer dereference which causes the controller to crash in specific circumstances.  Once present, those circumstances will persist and the controller will continue to crash until manual action is taken.

The circumstances required for exploit include:

  1. Your cluster is running a vulnerable version of the snapshot controller.
  2. Untrusted users can create a VolumeSnapshot resource in the API group snapshot.storage.k8s.io that references no VolumeSnapshotClass and references a non-existent PersistentVolumeClaim.

Imagine the following scenario: your cluster has been deployed with a vulnerable version of the snapshot controller.  Let’s assume you are using v3.0.1, and you allow untrusted users to create VolumeSnapshot resources in the API group snapshot.storage.k8s.io.

If an untrusted user were to create a VolumeSnapshot resource like the following:

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: new-snapshot
spec:
  source:
    persistentVolumeClaimName: blabla
 

Then the snapshot controller will enter an endless crash loop.

The untrusted user did not reference any VolumeSnapshotClass and also referenced a nonexistent PersistentVolumeClaim.  The snapshot controller will crash due to the null pointer dereference, and upon restart it will process the same resource again, leading to an endless crash loop.  As a result, the app will be unable to create or delete snapshots.
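
The trigger pattern can also be checked mechanically before a manifest is admitted.  The following is a minimal Python sketch (not Rego, and not part of Terrascan) that flags a parsed VolumeSnapshot manifest matching the two conditions just described.  The function name is_crash_trigger and the existing_pvcs parameter are hypothetical; the field names come from the snapshot.storage.k8s.io/v1beta1 schema used in the example above.

```python
def is_crash_trigger(snapshot, existing_pvcs):
    """Return True if the manifest matches the CVE-2020-8569 trigger pattern.

    snapshot: a VolumeSnapshot manifest already parsed into a dict.
    existing_pvcs: names of PersistentVolumeClaims known to exist in the
    snapshot's namespace (a hypothetical input for this sketch).
    """
    if snapshot.get("kind") != "VolumeSnapshot":
        return False
    spec = snapshot.get("spec") or {}
    # Condition 1: no VolumeSnapshotClass is referenced.
    has_class = bool(spec.get("volumeSnapshotClassName"))
    # Condition 2: the referenced source PVC does not exist.
    pvc = (spec.get("source") or {}).get("persistentVolumeClaimName")
    pvc_missing = pvc is not None and pvc not in existing_pvcs
    return (not has_class) and pvc_missing


# The manifest from the example above, expressed as a dict.
example = {
    "apiVersion": "snapshot.storage.k8s.io/v1beta1",
    "kind": "VolumeSnapshot",
    "metadata": {"name": "new-snapshot"},
    "spec": {"source": {"persistentVolumeClaimName": "blabla"}},
}

# "data-pvc" is an invented claim name; "blabla" is not among the known PVCs.
print(is_crash_trigger(example, {"data-pvc"}))
```

A check like this only narrows the exposure; upgrading the controller remains the actual fix.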

To identify the vulnerability with Terrascan, we can focus on two things:

  1. Does the configuration include a vulnerable version of the snapshot controller?
  2. (Optional) Are untrusted users able to create VolumeSnapshot resources in API group snapshot.storage.k8s.io?

Does the configuration include a vulnerable version of the snapshot controller?

In most cases we recommend simply avoiding the vulnerable versions.  No releases included the vulnerable versions, and a fixed version has been released.  We can simply check the images used by all of your controllers.  If a vulnerable version is present, we’ll raise a violation so you know where to upgrade the configuration to use a safe version.

The Rego code might look something like this:

vulnerableImageInUse[controller.id] {
    # Each StatefulSet-based controller in the scanned configuration
    controller := input.kubernetes_stateful_set[_]
    # Each container image used by that controller
    image := controller.config.spec.template.spec.containers[_].image
    # Images known to be vulnerable to CVE-2020-8569
    vulnerableImages := [
      "k8s.gcr.io/sig-storage/snapshot-controller:v3.0.0",
      "k8s.gcr.io/sig-storage/snapshot-controller:v3.0.1",
      "quay.io/k8scsi/snapshot-controller:v3.0.0",
      "quay.io/k8scsi/snapshot-controller:v3.0.1"
    ]
    # The rule matches when the controller uses any vulnerable image
    image == vulnerableImages[_]
}

This essentially collects the image identifiers used by all controllers and returns the ids of the controllers that use vulnerable images.  Terrascan will then tell us which source files need to be changed:

--> terrascan scan -i k8s
results:
  violations:
    - rule_name: vulnerableImageInUse
      description: CVE-2020-8569 Image version is vulnerable to DoS
      rule_id: accurics.kubernetes.OPS.999
      severity: LOW
      category: Operational Efficiency
      resource_name: snapshot-controller
      resource_type: kubernetes_stateful_set
      file: /app/src/stateful-sets.yaml
      line: 1
  count:
    low: 1
    medium: 0
    high: 0
    total: 1

Note that the policy metadata, such as severity and description, is defined with the new policy but was not included in this post for readability.
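
For readers less familiar with Rego, the same matching logic can be sketched in Python.  This is an illustration only, not how Terrascan evaluates policies.  The input shape is an assumption inferred from the paths the rule reads (kubernetes_stateful_set, config.spec.template.spec.containers), and the function name and sample ids are hypothetical.

```python
# Image references copied from the Rego rule above.
VULNERABLE_IMAGES = frozenset({
    "k8s.gcr.io/sig-storage/snapshot-controller:v3.0.0",
    "k8s.gcr.io/sig-storage/snapshot-controller:v3.0.1",
    "quay.io/k8scsi/snapshot-controller:v3.0.0",
    "quay.io/k8scsi/snapshot-controller:v3.0.1",
})


def vulnerable_image_in_use(scan_input):
    """Return the sorted ids of StatefulSets that use a known-vulnerable image."""
    ids = set()
    for controller in scan_input.get("kubernetes_stateful_set", []):
        containers = controller["config"]["spec"]["template"]["spec"]["containers"]
        # One vulnerable container is enough to flag the whole controller.
        if any(c.get("image") in VULNERABLE_IMAGES for c in containers):
            ids.add(controller["id"])
    return sorted(ids)


# Hypothetical input; the "id" values are invented for illustration.
sample_input = {
    "kubernetes_stateful_set": [
        {
            "id": "kubernetes_stateful_set.snapshot-controller",
            "config": {"spec": {"template": {"spec": {"containers": [
                {"name": "snapshot-controller",
                 "image": "k8s.gcr.io/sig-storage/snapshot-controller:v3.0.1"},
            ]}}}},
        },
        {
            "id": "kubernetes_stateful_set.web",
            "config": {"spec": {"template": {"spec": {"containers": [
                {"name": "web", "image": "nginx:1.19"},
            ]}}}},
        },
    ]
}

print(vulnerable_image_in_use(sample_input))
```

Matching on exact image strings keeps the rule simple, but it will miss the same vulnerable versions pulled from a mirror registry or pinned by digest.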

Are untrusted users able to trigger the vulnerability?

If, for some reason, you need to use one of the vulnerable images, you probably don’t want to be alerted to every use of a vulnerable version because it’s only a problem if untrusted users can submit a VolumeSnapshot resource.  In that case, you can augment the policy with code that tests whether the ClusterRole bindings allow only authenticated users of the cluster to create VolumeSnapshot resources.  This post is getting long enough, so we’ll explore that further in our community’s Policy as Code category.
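
As a rough starting point, the shape of such a check can be sketched in Python (rather than Rego) over parsed ClusterRole and ClusterRoleBinding manifests.  This is a sketch under stated assumptions, not the policy Terrascan ships: the function names are hypothetical, the caller supplies the set of subject names it considers untrusted, and namespaced Roles and aggregated ClusterRoles are ignored.

```python
SNAPSHOT_GROUP = "snapshot.storage.k8s.io"


def rule_allows_snapshot_create(rule):
    """True if one RBAC rule grants 'create' on volumesnapshots."""
    groups = rule.get("apiGroups") or []
    resources = rule.get("resources") or []
    verbs = rule.get("verbs") or []
    return (
        (SNAPSHOT_GROUP in groups or "*" in groups)
        and ("volumesnapshots" in resources or "*" in resources)
        and ("create" in verbs or "*" in verbs)
    )


def untrusted_snapshot_creators(cluster_roles, bindings, untrusted_subjects):
    """Return sorted names of untrusted subjects that may create VolumeSnapshots."""
    permissive_roles = {
        role["metadata"]["name"]
        for role in cluster_roles
        if any(rule_allows_snapshot_create(r) for r in role.get("rules") or [])
    }
    found = set()
    for binding in bindings:
        role_ref = binding.get("roleRef") or {}
        if role_ref.get("kind") != "ClusterRole":
            continue
        if role_ref.get("name") not in permissive_roles:
            continue
        for subject in binding.get("subjects") or []:
            if subject.get("name") in untrusted_subjects:
                found.add(subject["name"])
    return sorted(found)


# Hypothetical manifests; the role and binding names are invented.
roles = [{
    "kind": "ClusterRole",
    "metadata": {"name": "snapshot-creator"},
    "rules": [{"apiGroups": [SNAPSHOT_GROUP],
               "resources": ["volumesnapshots"],
               "verbs": ["create", "get"]}],
}]
role_bindings = [{
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "anyone-can-snapshot"},
    "roleRef": {"kind": "ClusterRole", "name": "snapshot-creator"},
    "subjects": [{"kind": "Group", "name": "system:unauthenticated"}],
}]

print(untrusted_snapshot_creators(roles, role_bindings, {"system:unauthenticated"}))
```

A non-empty result would mean the vulnerable image is reachable by a subject you do not trust, which is the case worth alerting on.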
