Kubernetes Multi-tenancy: Challenges, Benefits, and Best Practices for Enterprises
Multi-tenancy can bring many benefits to enterprises that want to optimize their resource utilization, reduce operational costs, and increase agility and collaboration. However, it also introduces some risks and complexities that need to be addressed carefully.
The notion of a tenant extends beyond just users of a cluster, encompassing the set of workloads that make up computing, networking, storage, and other resources. In a multi-tenant cluster, it’s crucial to segregate different tenants within a single cluster to the greatest extent possible (tenants may be distributed across multiple clusters in the future). This approach ensures that malicious tenants can’t harm others and that shared cluster resources are allocated fairly among tenants.
Based on the level of isolation required, multi-tenant clusters fall into two categories:
soft multi-tenancy
- more applicable to enterprise multi-tenancy, where there are no malicious tenants by default. In this context, isolation is intended to safeguard inter-team business operations and guard against potential security threats
- non-adversarial tenants
- different department/teams in the same company
- not trying to harm other tenants
- focus on preventing accidents
hard multi-tenancy
- tailored for service providers offering external services. Given the nature of their business, the security profiles of different tenants’ business users can’t be assured
- adversarial tenants
- different kinds of users who have no relation to each other
- trying to exploit the system
- focus on securing and isolating each tenant
In the hard multi-tenancy case, tenants and Kubernetes system components within the cluster may pose threats to each other. Therefore, stringent isolation is necessary for security assurance.
Benefits
- Optimizing resource utilization: make the most out of their cluster resources by sharing them among multiple users or teams. This can reduce the waste of idle or underutilized resources and lower the operational costs
- Reducing infrastructure complexity: simplify the infrastructure by consolidating multiple clusters into one. This can reduce the maintenance burden and the risk of configuration drift or inconsistency
- Increasing agility and collaboration: speed up the development and delivery cycles by enabling faster provisioning and deployment of workloads. It can also foster collaboration and innovation by allowing different users or teams to work on different projects or experiments within the same cluster
Challenges
Ensuring Isolation and Security Among Different Tenants
One of the main challenges of multi-tenancy in Kubernetes is ensuring the isolation and security of each tenant’s data and workloads.
Kubernetes doesn’t provide a native way to enforce strict boundaries between tenants, so administrators need to combine several tools and techniques to achieve this:
- Using namespaces to group and isolate resources within a cluster. Namespaces can help limit the scope of operations and access controls for each tenant, but they don’t prevent cross-namespace interactions or resource conflicts
- Using network policies to control the traffic flow between pods and services within and across namespaces. Network policies can help prevent unauthorized or malicious communication between tenants, but they do not protect against attacks from within the same namespace or from the cluster network
- Using resource quotas and limits to restrict the amount of CPU, memory, storage, and other resources that each tenant can consume. Resource quotas and limits can help prevent overcommitment and starvation of cluster resources, but they do not guarantee performance or availability for each tenant
- Using role-based access control (RBAC) to define the permissions and roles for each user and group within a cluster. RBAC can help enforce the principle of least privilege and prevent unauthorized actions or access to cluster resources, but it does not prevent human errors or misconfigurations
- Using service accounts and secrets to manage the credentials and tokens for each tenant’s workloads. Service accounts and secrets can help secure the authentication and authorization of pods and services within a cluster, but they do not protect against leaks or compromises of sensitive data
These methods can provide some level of isolation and security for multi-tenant clusters, but they are not foolproof or comprehensive.
Administrators need to carefully configure and monitor them to ensure that they are effective and consistent.
Moreover, these methods can also introduce some trade-offs and challenges in terms of complexity, performance, compatibility, and usability:
- Using namespaces can increase the management overhead and the risk of name collisions or conflicts
- Using network policies can affect the latency and throughput of network traffic and require additional tools or plugins to implement
- Using resource quotas and limits can impact the scalability and reliability of workloads and require careful tuning and balancing
- Using RBAC can complicate the user experience and the integration with external identity providers or systems
- Using service accounts and secrets can increase the storage requirements and the exposure of sensitive data
Some of the potential issues that can arise from insufficient isolation and security are:
- Cross-tenant interference: One tenant’s workload can affect the performance, availability, or functionality of another tenant’s workload due to resource contention, noisy neighbors, or configuration errors
- Cross-tenant attacks: One tenant can compromise the security or integrity of another tenant’s workload by exploiting vulnerabilities, misconfigurations, or malicious code
- Cross-tenant data leakage: One tenant can access or expose sensitive data belonging to another tenant due to insufficient encryption, authentication, or authorization
Managing the Complexity and Diversity of Tenant Requirements
Different tenants may have different needs and expectations regarding the quality of service (QoS), availability, scalability, reliability, and functionality of their workloads. They may also have different preferences and constraints regarding the configuration, customization, and governance of their workloads. Moreover, tenants may have different levels of expertise and experience with Kubernetes, which can affect their ability to use the platform effectively and efficiently.
Some of the potential issues that can arise from managing the complexity and diversity of tenant requirements are:
- Inconsistent service levels: One tenant’s workload may receive a higher or lower level of service than another tenant’s workload due to inconsistent resource allocation, scheduling, or monitoring
- Incompatible configurations: One tenant’s workload may conflict with another tenant’s workload due to incompatible settings, dependencies, or versions
- Inadequate governance: One tenant’s workload may violate the policies or regulations of another tenant or the cluster owner due to inadequate enforcement or auditing
Use Cases
Enterprise-Wide Cluster Sharing
All users of the cluster are part of the same organization. This is a common setup for many Kubernetes users. Because user identities are known and controlled, the security risks associated with this model are relatively low. After all, employees who misuse the service can simply be dismissed. Namespaces should be configured in line with the company’s internal organizational structure to logically separate resources belonging to different departments or teams.
The following roles should be assigned to business personnel:
- Cluster Administrator: Possesses capabilities for managing the cluster, such as scaling and adding nodes. Creates and assigns namespaces to tenant administrators. Handles various policies, including RAM (Resource Access Management, the cloud provider’s identity and access service), RBAC, network policies, and quotas
- Tenant Administrator: Has at least read-only RAM permissions on the cluster. Manages the RBAC settings of relevant personnel in the tenant
- Tenant User: Utilizes Kubernetes resources within the allowed scope in the tenant namespace
Apart from role-based access control, network isolation between namespaces should be ensured. As a result, only approved cross-tenant application requests are permitted between different namespaces.
Furthermore, for applications with stringent business security requirements, use policy tools such as Seccomp, AppArmor, and SELinux to restrict the system calls and kernel capabilities available to the application container and to limit what the container can do at runtime.
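As a minimal sketch of such restrictions, the following Python snippet builds a Pod manifest whose security context enables the runtime’s default Seccomp profile and drops all Linux capabilities. The pod name, namespace, and image are hypothetical, and AppArmor and SELinux settings are omitted because they depend on the node configuration.

```python
import json

def hardened_pod(name: str, namespace: str, image: str) -> dict:
    """Build a Pod manifest with a restrictive security context."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "securityContext": {
                "runAsNonRoot": True,
                # Use the container runtime's default Seccomp filter.
                "seccompProfile": {"type": "RuntimeDefault"},
            },
            "containers": [{
                "name": name,
                "image": image,
                "securityContext": {
                    "allowPrivilegeEscalation": False,
                    "readOnlyRootFilesystem": True,
                    "capabilities": {"drop": ["ALL"]},
                },
            }],
        },
    }

# Hypothetical names, used for illustration only.
pod = hardened_pod("billing-api", "team-billing",
                   "registry.example.com/billing-api:1.0")
print(json.dumps(pod, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl, which accepts JSON manifests as well as YAML.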
The existing single-tier logical isolation of namespaces in Kubernetes may not suffice for the isolation needs of complex business models of some large-scale enterprise applications. To resolve this, virtual clusters can be used. These abstract a higher-level tenant resource model to provide more detailed multi-tenancy management, thereby addressing the limitations of native namespaces.
Multi-Tenancy in Software as a Service (SaaS) Model
In the context of SaaS, the tenants within a Kubernetes cluster are the instances of the service application on the SaaS platform and the SaaS control plane itself. Here, the platform’s service application instances are segregated into distinct namespaces. The service’s end-users don’t interact directly with the Kubernetes control plane components. Instead, they use the SaaS console, deploying their applications or using services through the customized SaaS control plane.
Consider a blogging platform running on a multi-tenant cluster as an example. Here, tenants are individual customer blog instances and the platform’s control plane, each operating within separate namespaces. Customers can create, delete, and update blogs via the platform interfaces without needing to understand the cluster’s inner workings.
Multi-Tenancy in Kubernetes as a Service (KaaS) Model
The KaaS multi-tenancy model typically involves cloud service providers. In this model, business platform services are directly exposed to different tenant users via the Kubernetes control plane. End-users interact with Kubernetes native APIs or with extension APIs that service providers build on custom resource definitions (CRDs) and controllers. To ensure basic isolation, different tenants must use namespaces for logical access segregation, and network and resource quota isolation must be maintained among tenants.
Unlike shared clusters within an organization, all end-users in this scenario originate from untrusted domains. Consequently, it’s not practical to prevent malicious tenants from executing harmful code on the service platform. As such, enhanced security isolation is necessary for multi-tenant clusters in SaaS and KaaS service models. The existing native capabilities of Kubernetes are insufficient to meet these security needs. Therefore, tenant security in this business pattern can be improved by isolating containers at the kernel level during runtime, for example through the use of secure (sandboxed) containers.
Workflow for Establishing Multi-Tenant Architecture
When designing and setting up a multi-tenant cluster, start with Kubernetes’ resource isolation layers. This involves building a resource isolation model that spans the cluster, namespace, node, pod, and container levels. When the workloads of different tenants share the same resource domain, they can expose each other to security risks. Hence, it’s important to control the resource domains that each tenant can access when implementing multi-tenancy.
At the resource scheduling stage, make sure that containers handling sensitive data run on relatively isolated nodes. If workloads from different tenants share the same resource domain, minimize cross-tenant attack risks by applying runtime security and resource scheduling control policies.
Although Kubernetes’ existing security and scheduling capabilities are inadequate for achieving fully secure isolation between tenants, you can isolate the resource domains used by tenants via namespaces. Combine this with policy models like RBAC, PodSecurityPolicy, and NetworkPolicy to regulate the scope and capabilities of tenant resource access, similar to the intra-enterprise cluster sharing scenario. Coupled with existing resource scheduling capabilities, this method already offers substantial security isolation.
For service platform models such as SaaS and KaaS, employ kernel-level container isolation through a secure (sandboxed) container runtime, which significantly reduces the risk of cross-tenant attacks by malicious tenants using container escape techniques.
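One way to opt a tenant’s pods into a sandboxed runtime is the Kubernetes RuntimeClass resource. The sketch below assumes that the nodes already have a sandboxed runtime installed and registered under the handler name "sandbox". That handler name, the class name, and the pod details are hypothetical and depend on the runtime the cluster actually provides.

```python
import json

def runtime_class(name: str, handler: str) -> dict:
    """Cluster-scoped RuntimeClass that maps a name to a node-level handler."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,  # must match a handler configured on the nodes
    }

def sandboxed_pod(name: str, namespace: str, image: str, class_name: str) -> dict:
    """Pod that asks to run under the given RuntimeClass."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "runtimeClassName": class_name,
            "containers": [{"name": name, "image": image}],
        },
    }

# Hypothetical names, used for illustration only.
rc = runtime_class("tenant-sandbox", "sandbox")
pod = sandboxed_pod("blog-1234", "customer-1234",
                    "registry.example.com/blog:2.3", rc["metadata"]["name"])
print(json.dumps([rc, pod], indent=2))
```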
Access Control
AuthN, AuthZ, and Admission
Authorization in an ACK (Alibaba Cloud Container Service for Kubernetes) cluster is a two-step process: RAM authorization followed by RBAC authorization.
RAM authorization is responsible for controlling access to the cluster management API, encompassing Create, Read, Update, and Delete (CRUD) permissions for the cluster. This includes operations such as visibility of the cluster, scaling, and addition of nodes.
RBAC authorization controls access to the Kubernetes resource model within the cluster, providing precise authorization for specific resources at the namespace level.
ACK authorization management provides users within a tenant with preset role templates at various levels. It also supports binding multiple user-defined cluster roles and authorizing users in batches.
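The RBAC step can be illustrated with a namespace-scoped Role and a RoleBinding that grants it to several users at once. This is a sketch only: the role name, the list of resources and verbs, the namespace, and the user names are assumptions, and the cloud-side RAM step is not covered.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"

def tenant_role(namespace: str) -> dict:
    """Role that lets tenant users manage common workloads in one namespace."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": "tenant-user", "namespace": namespace},
        "rules": [{
            "apiGroups": ["", "apps"],
            "resources": ["pods", "services", "configmaps", "deployments"],
            "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"],
        }],
    }

def bind_users(namespace: str, users: list) -> dict:
    """RoleBinding that grants the tenant-user Role to a batch of users."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "tenant-user-binding", "namespace": namespace},
        "subjects": [
            {"kind": "User", "name": user, "apiGroup": RBAC_GROUP}
            for user in users
        ],
        "roleRef": {"apiGroup": RBAC_GROUP, "kind": "Role", "name": "tenant-user"},
    }

# Hypothetical namespace and user names.
role = tenant_role("team-billing")
binding = bind_users("team-billing", ["alice", "bob"])
print(json.dumps([role, binding], indent=2))
```

Because both objects are namespaced, the granted permissions stop at the tenant’s namespace boundary.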
NetworkPolicy
NetworkPolicy is the mechanism that manages the flow of network traffic between business pods belonging to different tenants.
It uses an allow-list to enforce cross-tenant access control.
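A minimal sketch of such an allow-list follows. It builds a NetworkPolicy that admits ingress only from pods in the tenant’s own namespace and from explicitly approved namespaces. The namespace names are hypothetical. The namespace selector relies on the kubernetes.io/metadata.name label that recent Kubernetes versions set on every namespace, and enforcement requires a network plugin that supports NetworkPolicy.

```python
import json

def tenant_allow_list(namespace: str, allowed_namespaces: list) -> dict:
    """Allow ingress only from the tenant's own pods and approved namespaces."""
    peers = [{"podSelector": {}}]  # any pod in the same namespace
    for ns in allowed_namespaces:
        peers.append({
            "namespaceSelector": {
                "matchLabels": {"kubernetes.io/metadata.name": ns}
            }
        })
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "tenant-allow-list", "namespace": namespace},
        "spec": {
            "podSelector": {},  # applies to every pod in the namespace
            "policyTypes": ["Ingress"],
            "ingress": [{"from": peers}],
        },
    }

# Hypothetical tenants: team-payments is approved to call team-billing.
policy = tenant_allow_list("team-billing", ["team-payments"])
print(json.dumps(policy, indent=2))
```

Once a pod is selected by an ingress policy, traffic that matches no rule is dropped, so any namespace not in the list is denied by default.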
PodSecurityPolicy (PSP)
PSPs are native cluster-level resources in Kubernetes. PodSecurityPolicy was deprecated in Kubernetes v1.21 and removed in v1.25 in favor of Pod Security Admission, so this mechanism applies to older clusters.
During the admission phase, when a pod creation request is submitted, PSPs validate whether the pod’s requested behavior complies with the applicable policy. For instance, they verify whether the pod uses the host’s network, file system, specified ports, or PID namespace.
They also prevent users within a tenant from running privileged containers, restrict the volume types that can be used, and can enforce read-only mounts.
Furthermore, based on the associated policies, PSPs add the relevant SecurityContext to pods. This includes the UID and GID used at container runtime and the kernel capabilities to add or drop, among other settings.
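The kind of validation performed at admission time can be sketched in plain Python. This is an illustration of the checks, not the PSP implementation: the function and the sample pod are hypothetical, and the policy dictionary only borrows a few field names (hostNetwork, hostPID, privileged, volumes) from the PSP spec.

```python
def pod_violations(pod_spec: dict, policy: dict) -> list:
    """Return human-readable reasons why a pod spec breaks the policy."""
    found = []
    # Host namespaces are rejected unless the policy explicitly allows them.
    for flag in ("hostNetwork", "hostPID"):
        if pod_spec.get(flag, False) and not policy.get(flag, False):
            found.append(f"{flag} is not allowed")
    # Privileged containers are rejected unless the policy allows them.
    for container in pod_spec.get("containers", []):
        context = container.get("securityContext", {})
        if context.get("privileged", False) and not policy.get("privileged", False):
            found.append(f"container {container['name']} must not be privileged")
    # Each volume must use one of the allowed volume types.
    allowed = set(policy.get("volumes", []))
    for volume in pod_spec.get("volumes", []):
        for volume_type in (key for key in volume if key != "name"):
            if volume_type not in allowed:
                found.append(
                    f"volume {volume['name']} uses disallowed type {volume_type}")
    return found

tenant_policy = {
    "hostNetwork": False,
    "hostPID": False,
    "privileged": False,
    "volumes": ["configMap", "secret", "emptyDir"],
}
risky_pod = {
    "hostNetwork": True,
    "containers": [{"name": "app", "securityContext": {"privileged": True}}],
    "volumes": [{"name": "host-root", "hostPath": {"path": "/"}}],
}
violations = pod_violations(risky_pod, tenant_policy)
for reason in violations:
    print(reason)
```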
Open Policy Agent (OPA)
OPA is a robust policy engine that provides policy decisions as a decoupled service.
When the existing namespace-level RBAC isolation capabilities fall short of the intricate security needs of enterprise applications, OPA steps in to offer granular access policy control at the object model level.
Moreover, OPA supports the definition of layer-7 NetworkPolicy and cross-namespace access control based on labels and annotations, thereby significantly augmenting the native NetworkPolicy of Kubernetes.
Resource Scheduling
Resource Quotas and Limit Range
In a scenario where multiple teams or departments are utilizing the same cluster resources, there can be a competition for resources.
This can be managed by setting a resource usage cap for each tenant. The ResourceQuota is employed to restrict the aggregate resource request and limit values for all pods within the namespace corresponding to each tenant. The LimitRange is utilized to establish default resource request and limit values for pods that are deployed within the namespace of the tenant.
Furthermore, restrictions are placed on the storage resource quota and the quantity of objects that tenants can have.
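A sketch of both objects for one tenant namespace is shown below. All numeric values and the namespace name are placeholder assumptions that would need tuning per tenant. The quota also caps storage requests and object counts.

```python
import json

def tenant_quota(namespace: str) -> dict:
    """Aggregate caps for everything the tenant runs in its namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": namespace},
        "spec": {"hard": {
            "requests.cpu": "10",
            "requests.memory": "20Gi",
            "limits.cpu": "20",
            "limits.memory": "40Gi",
            "requests.storage": "100Gi",     # total storage requested by PVCs
            "persistentvolumeclaims": "10",  # object-count limits
            "pods": "50",
        }},
    }

def tenant_defaults(namespace: str) -> dict:
    """Default requests and limits for containers that do not set their own."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "tenant-defaults", "namespace": namespace},
        "spec": {"limits": [{
            "type": "Container",
            "defaultRequest": {"cpu": "100m", "memory": "128Mi"},
            "default": {"cpu": "500m", "memory": "512Mi"},
        }]},
    }

quota = tenant_quota("team-billing")      # hypothetical namespace
defaults = tenant_defaults("team-billing")
print(json.dumps([quota, defaults], indent=2))
```

Pairing the two matters: once a quota constrains CPU or memory, pods without explicit requests and limits are rejected unless a LimitRange supplies defaults.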
Pod Priority and Preemption
Pod priority indicates the importance of a pending pod relative to other pods in the scheduling queue. If a high-priority pod cannot be scheduled due to a lack of node resources or other factors, the scheduler tries to evict (preempt) lower-priority pods so that the higher-priority pod can be scheduled and deployed first.
In a multi-tenant environment, the availability of critical business applications is safeguarded through priority and preemption settings. Furthermore, pod priority is used in conjunction with ResourceQuota to restrict tenant quotas at a given priority.
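The combination can be sketched as a PriorityClass plus a ResourceQuota scoped to that class, so that a tenant can run only a bounded number of high-priority pods. The class name, priority value, namespace, and pod count below are hypothetical.

```python
import json

def priority_class(name: str, value: int) -> dict:
    """Cluster-scoped PriorityClass; higher values are scheduled first."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
        "description": "Priority for critical tenant workloads",
    }

def quota_for_priority(namespace: str, class_name: str, max_pods: int) -> dict:
    """Quota that counts only pods using the given PriorityClass."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"pods-{class_name}", "namespace": namespace},
        "spec": {
            "hard": {"pods": str(max_pods)},
            "scopeSelector": {"matchExpressions": [{
                "scopeName": "PriorityClass",
                "operator": "In",
                "values": [class_name],
            }]},
        },
    }

critical = priority_class("tenant-critical", 100000)
scoped_quota = quota_for_priority("team-billing", "tenant-critical", 5)
print(json.dumps([critical, scoped_quota], indent=2))
```

Without such a scoped quota, any tenant could mark all of its pods as high priority and preempt other tenants’ workloads.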
Dedicated Nodes
By applying taints to certain nodes in a cluster, these nodes can be reserved for exclusive use by specific tenants. For example, in a multi-tenant cluster that includes GPU nodes, taints can reserve those nodes for the service teams whose applications require GPU resources. The cluster administrator adds a taint to a node with an effect such as NoSchedule. Then, only pods with a matching toleration can be scheduled on that node.
However, a malicious tenant can add the same toleration to their own pods to gain access to the node. Therefore, relying solely on taints and tolerations cannot guarantee the exclusivity of target nodes in an untrusted multi-tenant cluster.
This mechanism is suitable only for enterprise clusters with trusted tenants or clusters where tenants do not have direct access to the Kubernetes control plane.
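The weakness is easy to see in a simplified model of toleration matching. The sketch below approximates the scheduler’s matching rules in plain Python. It is not the scheduler’s code, and the taint key "dedicated" and value "gpu-team" are hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified check of whether one toleration matches one taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False  # an empty effect matches every effect
    key = toleration.get("key", "")
    if key and key != taint["key"]:
        return False  # an empty key (with Exists) matches every key
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False

def fits(pod_tolerations: list, node_taints: list) -> bool:
    """A pod fits only if every hard taint on the node is tolerated."""
    hard = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in pod_tolerations)
               for taint in hard)

gpu_taint = {"key": "dedicated", "value": "gpu-team", "effect": "NoSchedule"}
gpu_toleration = {"key": "dedicated", "operator": "Equal",
                  "value": "gpu-team", "effect": "NoSchedule"}

print(fits([gpu_toleration], [gpu_taint]))        # GPU team's pod
print(fits([], [gpu_taint]))                      # ordinary pod
print(fits([dict(gpu_toleration)], [gpu_taint]))  # tenant copying the toleration
```

Nothing in the matching logic checks who wrote the toleration, which is why an admission policy that restricts which tolerations each tenant may set is needed in untrusted clusters.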
Protection of Sensitive Information
Secrets Encryption at Rest
In a cluster environment where multiple tenants coexist, the etcd storage is shared among various tenant users. When these users interact with the Kubernetes control plane, it’s crucial to encrypt the data stored in Secrets. This measure helps avoid exposure of sensitive data caused by misconfigured access control policies.
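On a self-managed control plane, encryption at rest is configured through an EncryptionConfiguration file passed to the API server with the --encryption-provider-config flag. The sketch below generates such a configuration with a locally generated AES-CBC key. The key name is hypothetical, and managed services typically offer a KMS-based provider instead, which is usually preferable to a key kept on disk.

```python
import base64
import json
import os

def encryption_config(key_name: str) -> dict:
    """Build an EncryptionConfiguration that encrypts Secrets with AES-CBC."""
    # 32 random bytes, base64-encoded, as expected by the aescbc provider.
    key = base64.b64encode(os.urandom(32)).decode("ascii")
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [{
            "resources": ["secrets"],
            "providers": [
                # The first provider is used for writes.
                {"aescbc": {"keys": [{"name": key_name, "secret": key}]}},
                # Fallback so that existing unencrypted data stays readable.
                {"identity": {}},
            ],
        }],
    }

config = encryption_config("key1")
providers = config["resources"][0]["providers"]
print([next(iter(p)) for p in providers])

# Write the file with owner-only permissions, since it contains the key:
# with open(os.open("encryption-config.json", os.O_WRONLY | os.O_CREAT, 0o600), "w") as f:
#     json.dump(config, f, indent=2)
```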
Conclusion
Multi-tenancy in Kubernetes is a challenging but rewarding endeavor for enterprises that want to take advantage of the benefits of containerization.
When deploying a multi-tenant architecture, it’s important to consider the trustworthiness of users and applications under a tenant and the degree of security isolation. To meet basic security isolation requirements, you should enable the default security configuration for the Kubernetes cluster. This includes enabling RBAC to block access from anonymous users and secret encryption to enhance the protection of sensitive information.
Security configuration should be performed based on the CIS Kubernetes benchmarks. You should also enable admission controllers such as NodeRestriction, AlwaysPullImages, and PodSecurityPolicy (or Pod Security Admission on Kubernetes v1.25 and later, where PodSecurityPolicy has been removed). Pod Security Policies (PSPs) can be used to control the privileged mode in pod deployments and manage the security context of pods while they are running.
In addition, it’s necessary to configure network policies and enable Seccomp, AppArmor, and SELinux for the Docker runtime. Efforts should also be made to achieve multi-tenant isolation for services such as monitoring and logging.
When using service models such as SaaS and KaaS, or when the trustworthiness of users under a tenant cannot be guaranteed, more effective isolation measures should be taken. These include using dynamic policy engines like Open Policy Agent (OPA) for fine-grained access control at the network or object level, deploying a secure container for kernel-level isolation during container runtime, and implementing comprehensive multi-tenant isolation solutions for monitoring, logging, storage, and other services.