DevOps | Jun 15, 2020

Remediation as Code: Self-Healing Cloud Infrastructure


As organizations rapidly adopt new technologies such as serverless, containers, and service mesh, cloud infrastructure is becoming increasingly “immutable”: infrastructure is never modified after it is deployed. If something needs to change, new infrastructure is provisioned through infrastructure as code (IaC). This programmable approach enables organizations to develop and deploy applications significantly faster and more reliably.

Hyper-agility comes with challenges: organizations are struggling to ensure security. The latest State of DevSecOps report revealed that only 4% of cloud misconfigurations reported in production are being addressed. This is unsurprising, because effective resolution requires tracing an issue back to the IaC that defined the infrastructure configuration, which tends to be difficult.

The Ideal Solution: Immutable Security

A better approach is to embed security earlier in the development lifecycle and maintain a secure posture throughout it, a new paradigm we refer to as “immutable security”. In practice, security must be codified into all layers of the cloud stack so that misconfigurations are identified and fixed before cloud infrastructure is provisioned, establishing a secure baseline. If a configuration needs to change, the change should be implemented in the code and assessed for risk, and the infrastructure should then be redeployed.

The Reality: “Break Glass” Situations

While immutable security is the desired state, Accurics research revealed that 90% of organizations allow privileged users to “break glass” and make configuration changes directly to cloud infrastructure at run-time. In some situations, the changes may be legitimate and necessary. In others, the changes may introduce risk, either inadvertently or intentionally if an attacker gains access to the cloud environment. Whatever the reason, a run-time configuration change to cloud infrastructure breaks immutability: the posture of the cloud drifts from the secure baseline defined through code.

Maintaining Immutability: IaC as the Single Source of Truth

The only way to maintain immutable security is to ensure that the infrastructure as code (IaC) remains the single source of truth. The cloud infrastructure must be continuously monitored at run-time for configuration changes, and each change must be assessed for risk. When a privileged user makes a legitimate configuration change, the IaC must be updated to reflect the good change, establishing a new baseline. Otherwise, the next time the cloud infrastructure is deployed from the IaC, the change will be overwritten.

In situations where the configuration change introduces a risk, the cloud infrastructure must be redeployed based on the secure baseline defined through IaC. This will ensure that any risky changes that are made accidentally or maliciously will be overwritten.
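The two cases above can be sketched as a small reconciliation routine. This is only an illustration under stated assumptions, not a description of how any particular product works: the `reconcile` function, the `Action` names, and the `is_risky` callback are hypothetical, and configurations are modeled as plain dictionaries.

```python
from enum import Enum
from typing import Any, Callable, Dict, Tuple


class Action(Enum):
    UPDATE_IAC = "update_iac"  # legitimate change: fold it into the IaC baseline
    REDEPLOY = "redeploy"      # risky change: overwrite it from the IaC baseline


def diff_config(baseline: Dict[str, Any], live: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {attribute: (baseline_value, live_value)} for every attribute that differs."""
    keys = baseline.keys() | live.keys()
    return {
        key: (baseline.get(key), live.get(key))
        for key in sorted(keys)
        if baseline.get(key) != live.get(key)
    }


def reconcile(
    baseline: Dict[str, Any],
    live: Dict[str, Any],
    is_risky: Callable[[str, Any, Any], bool],
) -> Dict[str, Action]:
    """Decide, per drifted attribute, whether to update the IaC or redeploy."""
    decisions = {}
    for key, (expected, actual) in diff_config(baseline, live).items():
        if is_risky(key, expected, actual):
            decisions[key] = Action.REDEPLOY
        else:
            decisions[key] = Action.UPDATE_IAC
    return decisions


# Hypothetical example: encryption was switched off and a timeout was tuned.
baseline = {"encrypted": True, "visibility_timeout": 30}
live = {"encrypted": False, "visibility_timeout": 60}


def encryption_removed(key: str, expected: Any, actual: Any) -> bool:
    # Assumed risk rule for illustration: losing encryption is always risky.
    return key == "encrypted" and actual is not True


decisions = reconcile(baseline, live, encryption_removed)
```

In this sketch the timeout change is treated as a good change to be merged back into the code, while the encryption change triggers a redeploy from the baseline. A real risk assessment would be far richer than a single callback.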

Operationalization: Remediation as Code

The key to operationalizing immutable security is that remediation must occur at the speed of DevOps, using an approach known as “Remediation as Code”. As infrastructure as code is developed, it is continuously assessed for misconfigurations, much like unit testing in traditional software development. As soon as a misconfiguration is identified, developers must be notified and provided with the code to quickly remediate the issue. This should be done by automatically creating a “pull request” or “merge request” that replaces the bad configurations with the recommended configurations, without disrupting production deployments. Developers can simply review the recommendation and merge the change into their code to resolve the issue, adhering to the standard development practice of “develop -> review -> merge”. Once the issues are resolved, the IaC is established as the secure baseline and the cloud infrastructure is deployed.
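To make the unit-testing analogy concrete, here is a minimal sketch of a policy check that reports a finding together with the recommended configuration. Everything in it is an assumption made for illustration: the `Finding` type, the `require_encryption` rule, the `encrypted` attribute, and the resource names are hypothetical, and parsing the IaC into dictionaries is out of scope.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Optional


class Finding(NamedTuple):
    resource: str
    rule: str
    recommended: Dict[str, Any]  # attributes to set in order to fix the issue


def require_encryption(name: str, attrs: Dict[str, Any]) -> Optional[Finding]:
    """Hypothetical rule: every resource must set encrypted = true."""
    if attrs.get("encrypted") is True:
        return None
    return Finding(name, "encryption-at-rest", {"encrypted": True})


RULES: List[Callable[[str, Dict[str, Any]], Optional[Finding]]] = [require_encryption]


def scan(resources: Dict[str, Dict[str, Any]]) -> List[Finding]:
    """Run every rule against every resource, like a test suite over the IaC."""
    findings = []
    for name, attrs in resources.items():
        for rule in RULES:
            finding = rule(name, attrs)
            if finding is not None:
                findings.append(finding)
    return findings


def apply_fixes(
    resources: Dict[str, Dict[str, Any]], findings: List[Finding]
) -> Dict[str, Dict[str, Any]]:
    """Return a copy of the resources with the recommended attributes applied."""
    fixed = {name: dict(attrs) for name, attrs in resources.items()}
    for finding in findings:
        fixed[finding.resource].update(finding.recommended)
    return fixed


# Hypothetical resources as they might look after parsing the IaC.
resources = {"orders_db": {"encrypted": False}, "audit_log": {"encrypted": True}}
findings = scan(resources)
proposed = apply_fixes(resources, findings)
```

The `proposed` configuration is what would go into the pull request. The original `resources` are left untouched, so nothing changes until a developer reviews and merges.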

The cloud infrastructure must then be continuously monitored in run-time for configuration changes, mapped against the IaC baseline, and assessed for risk. When a good change is made, developers must be notified and provided with the code to quickly update the IaC. When a risky change is made, developers must be notified and the cloud infrastructure must be redeployed using the secure baseline defined through IaC.

Figure: Remediation as Code immutability

Remediation as Code in Action

Accurics’ Remediation as Code capability integrates into each stage of the development lifecycle and enables DevOps teams to route issues through predictable, validated pipelines while managing break-glass situations. In the example below, an Amazon Simple Queue Service (SQS) queue is created via IaC without any violations. After it is provisioned, a user turns off server-side encryption in the AWS console, which the platform flags as a violation. The platform then generates the remediation code and alerts the developer, who can review and merge it.

Step 1: Accurics analyzes the IaC (Terraform) against established best practices such as the CIS Benchmarks and finds no violations.

resource "aws_sqs_queue" "sqsQueueExposed" {
    name                              = "terraform-example-queue"
    # Server-side encryption using the AWS managed key for SQS
    kms_master_key_id                 = "alias/aws/sqs"
    kms_data_key_reuse_period_seconds = 300
    # Access policy attached to the queue (indented heredoc)
    policy                            = <<-EOF
    {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "Queue1_AnonymousAccess_AllActions_WhitelistIP",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "sqs:*",
            "Resource": "arn:aws:sqs:*:111122223333:queue1"
        }]
    }
    EOF
}

Step 2: The cloud infrastructure is provisioned using the IaC and is monitored for changes. A user turns off server-side encryption for the Amazon Simple Queue Service (SQS) queue, which is flagged as a violation.
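As a rough sketch of how such a change could be detected, the snippet below compares the encryption arguments from the IaC with attributes read back from the live queue. The mapping table and both functions are hypothetical, and the dictionaries stand in for whatever a monitoring tool would actually fetch. The live attribute names follow SQS queue attribute naming (`KmsMasterKeyId`, `KmsDataKeyReusePeriodSeconds`), and live values are assumed to arrive as strings.

```python
from typing import Dict, List

# Hypothetical mapping from Terraform argument names to live queue attribute names.
IAC_TO_LIVE = {
    "kms_master_key_id": "KmsMasterKeyId",
    "kms_data_key_reuse_period_seconds": "KmsDataKeyReusePeriodSeconds",
}


def find_drift(iac_args: Dict[str, object], live_attrs: Dict[str, str]) -> List[str]:
    """Return the Terraform arguments whose live value no longer matches the IaC."""
    drifted = []
    for iac_name, live_name in IAC_TO_LIVE.items():
        expected = iac_args.get(iac_name)
        actual = live_attrs.get(live_name)
        # Live values are assumed to be strings, so compare string forms.
        if expected is not None and str(expected) != actual:
            drifted.append(iac_name)
    return drifted


def encryption_disabled(iac_args: Dict[str, object], live_attrs: Dict[str, str]) -> bool:
    """True when the IaC asks for a KMS key but the live queue has none."""
    return "kms_master_key_id" in iac_args and "KmsMasterKeyId" not in live_attrs


# Baseline taken from the Terraform resource in Step 1.
baseline = {
    "kms_master_key_id": "alias/aws/sqs",
    "kms_data_key_reuse_period_seconds": 300,
}

# Assumed live attributes after encryption is turned off in the console.
live_after_change = {"VisibilityTimeout": "30"}

drifted = find_drift(baseline, live_after_change)
violation = encryption_disabled(baseline, live_after_change)
```

Only attributes listed in the mapping are compared, so unrelated live attributes such as the visibility timeout do not raise false alarms in this sketch.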

Figure: Disabling AWS SQS encryption

Step 3: Accurics automatically generates the remediation code required to turn encryption back on and creates a pull/merge request in the code repository where the IaC is stored (Bitbucket in this example). The appropriate reviewers are notified of the merge request based on existing workflow definitions.
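The article does not show the generated request itself, so the following is only a guess at its general shape: a unified diff from the observed configuration to the recommended one, built with Python's standard `difflib`. The `render_args` and `remediation_diff` helpers and the `observed` dictionary are hypothetical, and the simple `name = value` rendering is not a full Terraform serializer.

```python
import difflib
from typing import Dict, List


def render_args(args: Dict[str, object]) -> List[str]:
    """Render arguments as simple `name = value` lines, quoting strings."""
    lines = []
    for name in sorted(args):
        value = args[name]
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"{name} = {rendered}")
    return lines


def remediation_diff(observed: Dict[str, object], recommended: Dict[str, object]) -> str:
    """Build a unified diff that a reviewer could read in a pull request."""
    diff = difflib.unified_diff(
        render_args(observed),
        render_args(recommended),
        fromfile="observed",
        tofile="recommended",
        lineterm="",
    )
    return "\n".join(diff)


# Assumed observed state after encryption was turned off in the console.
observed = {"name": "terraform-example-queue"}

# Recommended state, taken from the encryption arguments in Step 1.
recommended = {
    "name": "terraform-example-queue",
    "kms_master_key_id": "alias/aws/sqs",
    "kms_data_key_reuse_period_seconds": 300,
}

patch = remediation_diff(observed, recommended)
print(patch)
```

Lines prefixed with `+` in the resulting patch are the encryption arguments that need to be restored. When the two states already match, the function returns an empty string and there is nothing to review.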


Step 4: All auto-created pull requests are available in their respective source code repositories. Developers receive Jira tickets with the remediation pull requests attached, review the changes, and accept the merge into the release branch of the IaC. The IaC and the cloud are now back in sync, preserving immutability.


Figure: Developers receive Jira tickets with the remediation pull requests attached


Figure: Developers review the changes and accept the merge

Conclusion

Remediation as code isn’t just about fixing issues; it is a paradigm that enables organizations to build self-healing cloud infrastructure in which misconfigurations are automatically discovered and quickly resolved. Remediation as code is deterministic and has almost zero false positives, making it the only practical way for developers to address misconfigurations without hindering agility. Moreover, it removes the need for privileged users to change cloud infrastructure configurations at run-time, which preserves immutability. This approach codifies remediation as part of the development lifecycle and, as a result, works across both IaC and cloud environments.
