Evolving Risks, Insecure Defaults, Watering Hole Threats in the Cloud
I’m excited to announce our latest research, Accurics’ Cloud Cyber Resilience Report!
To better understand the challenges faced by modern cloud native development teams, Accurics researched security risks present in actual cloud native configurations. By analyzing hundreds of cloud native infrastructure deployments across Accurics customers and community users, we found a number of well-known historical problems that continue to plague software projects. We also discovered some emerging problems that need to be addressed quickly if we want to avoid another SolarWinds scenario.
As cloud native technologies experience explosive growth, it is clearer than ever that organizations see cloud native as the future of software.
Risk of advanced attacks in the cloud is increasing as organizations struggle to govern managed infrastructure offerings.
The SolarWinds Orion hack provides a hint of what’s possible when attackers are able to gain access to code or pipelines. Attackers accessed and modified Orion source code to insert malware as if it were code committed by an authorized developer. It was compiled into the application, delivered as an officially signed binary to SolarWinds users, and installed onto an unknown number of systems. Attackers went undetected for months while leveraging privileged access on the network to surveil their victims.
Our research suggests that cloud watering hole attacks could become more prevalent as organizations increase adoption of managed infrastructure offerings — think hosted CI/CD services, messaging services, and Functions as a Service (FaaS). Infrastructure as Code (IaC) is often leveraged to provision and run these pipeline resources.
This level of automation is a key benefit of IaC, but also creates fundamental risks to the integrity of the delivery process: if an attacker is able to compromise a pipeline, or something that feeds into or shapes the pipeline (such as IaC), then the pipeline will automate the process of delivering that change to the production environment. As demonstrated by the SolarWinds hack, this can be leveraged to deliver malware to end users, gain unauthorized access to the production environment or its data, or to completely compromise the environment.
Of all the violations identified in our research, 22.5% correspond to poorly configured managed service offerings. The vast majority of these violations are due to the use of default security profiles or configurations that provide excessive permissions. Default configurations for managed services are often designed to make it easier for developers to get started with a service — meaning that they favor more permissive, rather than more restrictive, access. By carrying these defaults into normal use, organizations make it easier for attackers to discover their services, read their data, and potentially modify things.
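As one illustration, the hedged CloudFormation sketch below (queue name, account ID, and role ARN are invented for the example) attaches an explicit access policy to a managed queue so that only one known principal can use it, rather than relying on the service's more permissive default posture:

```yaml
Resources:
  OrdersQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: orders-queue

  # Explicitly scope who may use the queue instead of relying on
  # default access behavior.
  OrdersQueuePolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      Queues:
        - !Ref OrdersQueue
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: AllowOnlyOrderService
            Effect: Allow
            Principal:
              # Illustrative ARN; in practice, the one workload that
              # legitimately needs the queue.
              AWS: arn:aws:iam::123456789012:role/order-service
            Action:
              - sqs:SendMessage
              - sqs:ReceiveMessage
            Resource: !GetAtt OrdersQueue.Arn
```

The point is not the specific service: any managed offering that ships with an open or account-wide default should get an explicit, narrow policy in the IaC that provisions it.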
Messaging services and FaaS are entering a perilous phase of adoption, similar to what storage buckets experienced a few years ago. If history is a guide, we expect to start seeing more breaches due to insecure configurations around these services.
A new threat vector is emerging as Identity and Access Management (IAM) is now being defined through Infrastructure as Code (IaC).
Identity and access management is emerging as a key bottleneck as applications become more complex and numerous. This is the first time that we have seen IAM defined through IaC in production environments. This approach may have seemed crazy to many organizations in the past, but the reality of today’s complex cloud apps and environments is that even medium-size organizations may have thousands, or tens of thousands, of roles. It’s simply no longer feasible to manage them all manually. More than a third (35.3%) of the IAM drifts detected in this report originated in IaC, indicating very rapid adoption of IAM as Code.
Defining IAM through Infrastructure as Code introduces security challenges — all it takes is one overly permissive role for attackers to be able to penetrate a cloud environment and move laterally to access critical resources.
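To make the contrast concrete, here is a hedged CloudFormation sketch of a least-privilege role (role, policy, and bucket names are invented). The policy grants one action on one resource; the overly permissive anti-pattern is the same structure with wildcard `Action` and `Resource` values:

```yaml
Resources:
  AppTaskRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: read-app-bucket
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              # One action on one resource. The dangerous variant is
              # Action: "*" with Resource: "*", which is exactly the
              # overly permissive role that enables lateral movement.
              - Effect: Allow
                Action: s3:GetObject
                Resource: arn:aws:s3:::example-app-bucket/*
```

Because roles like this live in code, they can also be reviewed and policy-checked in the pipeline before they ever reach the cloud account.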
Misconfigurations in Helm charts expose Kubernetes deployments to similar risks.
We analyzed some of the most popular Helm repositories and packages to understand whether these third-party components introduce new security risks into our applications. The most common misconfigurations in these packages are similar to those identified in other deployments:
- 47.9% of problems were due to insecure defaults. Improper use of the default namespace was the most common mistake. Workloads deployed there share a namespace with every other resource that was never explicitly assigned one, which weakens isolation and can expose shared secrets to a compromised workload.
- Insecure secrets management represented 26% of the violations identified in our repository scan. While the Helm charts did not include hard-coded credentials, they did pass secrets into containers via environment variables, where they can leak through logs, error reports, or child processes.
- 17.8% of the misconfigurations in the Helm repos related to resource management, or lack thereof. When no limit is specified, the container may consume everything available to the node where the container runs.
- Container security violations comprised 8.2% of the misconfigurations, which includes problems such as containers that use the host’s process ID namespace or enable extra capabilities. These violations make it much easier for container workloads to escape the container sandbox and gain access to system-level resources and other containers.
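A single pod spec can address all four categories above. The sketch below is illustrative (names and image are invented, not drawn from any scanned chart): it uses a dedicated namespace, mounts its secret as a read-only file rather than an environment variable, sets resource limits, and drops extra capabilities:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  namespace: example-app          # dedicated namespace, not "default"
spec:
  hostPID: false                  # do not share the host PID namespace
  containers:
    - name: app
      image: registry.example.com/app:1.0
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:                   # without limits, one container can
          cpu: 500m               # starve everything else on its node
          memory: 256Mi
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]           # no extra capabilities
      volumeMounts:
        - name: app-secrets       # secret arrives as a read-only file,
          mountPath: /etc/secrets # not an environment variable
          readOnly: true
  volumes:
    - name: app-secrets
      secret:
        secretName: app-secrets
```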
Teams are still tripping over well-known misconfigurations.
Beyond the emerging risks posed by new cloud native development practices, our research shows that organizations still struggle to avoid well-known problems.
- Misconfigurations of storage buckets still represent about 15.3% of the violations.
- Hardcoded secrets represented almost 10% of violations. In fact, every single organization tested included a hardcoded secret in at least one container configuration.
- Of the organizations we tested, 10.3% are paying for advanced security capabilities that are never enabled.
- Kubernetes users that try to implement role-based access controls (RBAC) often fail to define roles at the proper granularity, increasing credential reuse and the chance of misuse — 35% of the organizations in our research struggled with this problem.
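For the RBAC point in particular, proper granularity means namespace-scoped Roles bound to individual service accounts, rather than one broad ClusterRole shared across teams. A minimal sketch, with invented names:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role                            # namespace-scoped, unlike ClusterRole
metadata:
  name: orders-reader
  namespace: orders
rules:
  - apiGroups: [""]
    resources: ["pods", "configmaps"]
    verbs: ["get", "list", "watch"]   # read-only access
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: orders-reader-binding
  namespace: orders
subjects:
  - kind: ServiceAccount              # one identity per workload avoids
    name: orders-service              # credential reuse across teams
    namespace: orders
roleRef:
  kind: Role
  name: orders-reader
  apiGroup: rbac.authorization.k8s.io
```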
Misconfigurations in the architectural foundations of cloud native apps should be fixed the fastest but actually take the longest to fix.
The average time to fix infrastructure misconfigurations was about 25 days; misconfigurations in load balancer services required over 149 days — the longest of all misconfigurations analyzed. All user-facing data goes through these resources, so they should be fixed the fastest, not the slowest. Included in those misconfigurations were instances of applications accepting TLS 1.1 connections; TLS 1.1 is a deprecated protocol version with well-known cryptographic weaknesses, and these configurations should be fixed as quickly as possible.
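Rejecting old TLS versions at the load balancer is often a one-line change. In this hedged CloudFormation sketch (the load balancer, certificate, and target group references are placeholders), the predefined `ELBSecurityPolicy-TLS-1-2-2017-01` security policy refuses TLS 1.0 and 1.1 handshakes:

```yaml
Resources:
  HttpsListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref AppLoadBalancer       # placeholder reference
      Port: 443
      Protocol: HTTPS
      # Predefined AWS policy that negotiates TLS 1.2 only.
      SslPolicy: ELBSecurityPolicy-TLS-1-2-2017-01
      Certificates:
        - CertificateArn: !Ref AppCertificateArn  # placeholder reference
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref AppTargetGroup     # placeholder reference
```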
If we distinguish between production and pre-production environments, the average time to fix infrastructure misconfigurations is 21.8 days and 31.2 days, respectively. This hints that many organizations prioritize remediation in production. While that may seem reasonable at first glance, pre-production environments can also pose risks if customer data is accessible from them, as illustrated by the Imperva breach.
Our prior research revealed that 90% of organizations allow configuration changes to occur at runtime, which causes cloud risk posture to drift from established secure baselines. Without proper management and security protocols in place, this can come at a cost.
On average, it takes 7.7 days for organizations to reconcile runtime configuration changes with the IaC baseline, creating windows of opportunity for attackers. The longest time to fix drifts was 21 days, for a class of drifts involving resources associated with software-defined networking, messaging, and FaaS. Notably, foundational networking components appear among both the slowest-to-fix misconfigurations and the slowest-to-fix drifts.
Configuration drift was a factor in the Twilio breach: the AWS S3 bucket was configured correctly when it was added to their environment in 2015, but the configuration was changed 5 months later to fix a problem and was never properly reset. The drift went undetected and unaddressed until it was exploited nearly 5 years after the misconfiguration was introduced. This is what makes drift so dangerous: it creates a long-lived window of opportunity for attackers. The good news is that once Twilio was alerted to the malicious modification, they were able to replace the compromised TaskRouter JS SDK file in about an hour.