Kubernetes Reusable Elements for Designing Cloud Native Applications: Advanced Patterns

Roman Glushach
19 min read · Oct 20, 2023
Kubernetes Advanced Patterns

Kubernetes is a powerful platform for building and deploying cloud native applications. It provides a set of features and tools that enable developers to create, manage, and scale applications across different environments. However, Kubernetes is not a one-size-fits-all solution. Depending on the complexity, requirements, and challenges of each application, developers may need to use different design patterns and best practices to achieve their goals.

Harnessing the full power of Kubernetes requires a deep understanding of its advanced patterns, which help developers design more robust, resilient, and scalable cloud native applications. These patterns are based on the concept of reusable elements, which are common solutions to recurring problems in software engineering. By applying these reusable elements, developers can avoid reinventing the wheel and leverage the existing knowledge and experience of the Kubernetes community.

Controller

Observe-Analyze-Act Cycle

In Kubernetes, controllers are like diligent supervisors that constantly monitor and maintain resources to ensure they match the desired state. This process is known as state reconciliation.

Kubernetes uses a declarative resource-centric API. For example, when you want to scale up a Deployment, you don’t create new Pods directly. Instead, you change the Deployment resource’s replicas property via the Kubernetes API to the desired number. Controllers then create new Pods in response to these changes.

Kubernetes has built-in controllers that manage standard resources like ReplicaSets, DaemonSets, StatefulSets, Deployments, or Services. These controllers run on the control plane node and constantly monitor their resources to ensure the actual and desired states align.

You can also create custom controllers which add extra functionality by reacting to state-changing events. These controllers observe the actual state, analyze the differences from the desired state, and act to align them.
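
The observe-analyze-act cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not how the built-in controllers are implemented: the `Action` type and the `reconcile` function are hypothetical names, and the "cluster" is just a list of Pod names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    verb: str   # "create" or "delete"
    count: int  # how many Pods to create or delete


def reconcile(desired_replicas: int, running_pods: List[str]) -> Optional[Action]:
    """Run one observe-analyze-act pass for a replica count."""
    # Observe: the actual state is the set of Pods that currently exist.
    actual = len(running_pods)
    # Analyze: compare the actual state with the desired state.
    diff = desired_replicas - actual
    # Act: return the change that aligns both states (None if they match).
    if diff > 0:
        return Action("create", diff)
    if diff < 0:
        return Action("delete", -diff)
    return None
```

A real controller would run this pass every time it is notified of a change and would issue the resulting action as API requests.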

A new generation of more sophisticated controllers called Operators manage the full application lifecycle and interact with CustomResourceDefinitions (CRDs). They encapsulate complex application domain logic and use the Singleton Service pattern to prevent multiple controllers from acting on the same resources simultaneously.

Controllers can be written in any programming language that can send requests to the Kubernetes API Server. They extend Kubernetes’ resource management capabilities and are invisible to the cluster user.

Controllers evaluate resource definitions and perform actions based on conditions. Metadata and ConfigMaps are suitable for monitoring and acting upon any field in the resource definition. Labels, part of a resource’s metadata, can be watched by any controller. They are indexed in the backend database and can be efficiently searched for in queries.

CRDs are favored over ConfigMaps for custom target state specifications. As a simple example, consider a controller that observes ConfigMap changes and restarts Pods accordingly. It uses a hanging GET HTTP request to watch API Server events and checks whether a changed ConfigMap carries a specific annotation. If so, it deletes all Pods matching the annotation’s selector, and the workload resource that owns them (a Deployment, for example) recreates them with the new configuration.
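
The decision logic of such a ConfigMap-watching controller might look like the sketch below. It works on plain dictionaries shaped like Kubernetes watch events (`type` plus `object`) rather than on a live cluster. The annotation key `example.com/pod-delete-selector` and the function names are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical annotation key; a real controller would define its own.
ANNOTATION = "example.com/pod-delete-selector"


def parse_selector(selector: str) -> Dict[str, str]:
    """Parse an equality selector such as 'app=web,tier=front' into a dict."""
    pairs = (item.split("=", 1) for item in selector.split(",") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def pods_to_delete(event: dict, pods: List[dict]) -> List[str]:
    """Return the names of Pods to delete in response to a ConfigMap event."""
    # Only changes to an existing ConfigMap are of interest.
    if event.get("type") != "MODIFIED":
        return []
    annotations = event["object"].get("metadata", {}).get("annotations") or {}
    selector = annotations.get(ANNOTATION)
    if not selector:
        return []
    wanted = parse_selector(selector)
    # A Pod matches when it carries every label named in the selector.
    return [
        pod["metadata"]["name"]
        for pod in pods
        if all(pod["metadata"].get("labels", {}).get(k) == v for k, v in wanted.items())
    ]
```

In a running controller, the events would come from the hanging GET on the ConfigMap watch endpoint, and each returned name would be turned into a Pod delete request.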

Operator

Spectrum of Controllers and Operators

In Kubernetes, an operator is a type of controller that uses a CustomResourceDefinition (CRD) to automate tasks for a specific application, thereby enhancing the platform’s capabilities. When integrating new concepts like Prometheus for monitoring, we need additional domain objects. CRDs allow us to extend the Kubernetes API by adding custom resources to our cluster, which we can use as if they were native resources.

An operator is essentially a controller that understands both Kubernetes and another domain. By combining knowledge of both areas, it can automate tasks that usually require a human operator.

Kubernetes allows us to specify the following subresources for a CRD:

  • Scale: allows a CRD to manage its replica count
  • Status: allows updates only to the status field of a resource

The metadata section follows the same format and validation rules as any other Kubernetes resource. The spec contains the CRD-specific content, which Kubernetes validates against the given validation rule from the CRD.
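
To make the CRD structure concrete, here is a sketch of a CRD manifest written as a Python dictionary (it could be serialized to YAML or JSON before being applied). The group `example.com` and the kind `WebCache` are hypothetical; the layout of `schema` and `subresources` follows the `apiextensions.k8s.io/v1` API.

```python
def build_crd(group: str, kind: str, plural: str) -> dict:
    """Build a CRD manifest with a validation schema and both subresources."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # The CRD name must have the form <plural>.<group>.
        "metadata": {"name": f"{plural}.{group}"},
        "spec": {
            "group": group,
            "scope": "Namespaced",
            "names": {"kind": kind, "plural": plural, "singular": kind.lower()},
            "versions": [{
                "name": "v1",
                "served": True,
                "storage": True,
                # The spec of each custom resource is validated against this schema.
                "schema": {"openAPIV3Schema": {
                    "type": "object",
                    "properties": {
                        "spec": {
                            "type": "object",
                            "properties": {"replicas": {"type": "integer", "minimum": 0}},
                        },
                        "status": {
                            "type": "object",
                            "properties": {"replicas": {"type": "integer"}},
                        },
                    },
                }},
                "subresources": {
                    # Status: the status field gets its own update endpoint.
                    "status": {},
                    # Scale: tells Kubernetes where the replica counts live.
                    "scale": {
                        "specReplicasPath": ".spec.replicas",
                        "statusReplicasPath": ".status.replicas",
                    },
                },
            }],
        },
    }


crd = build_crd("example.com", "WebCache", "webcaches")
```

With the scale subresource declared, generic tooling that talks to the scale endpoint can change the replica count of the custom resource without knowing anything else about it.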

While operators can extend the Kubernetes platform, they are not always the best solution. A simple controller working with standard resources may be sufficient in many cases. Operators are suitable for modeling custom domain logic that aligns with the declarative Kubernetes way of handling resources with reactive controllers.

If your custom use case requires more flexibility in how custom resources can be implemented and persisted, a custom API Server may be a better choice. However, not all use cases are suitable for Kubernetes extension points. If your use case is not declarative, if the data to manage does not fit into the Kubernetes resource model, or you don’t need a tight integration into the platform, it might be better to write your standalone API and expose it with a classical Service or Ingress object.

Controller and Operator Classification

Types of CRDs:

  1. Installation CRDs: These are used for installing and operating applications on the Kubernetes platform. For instance, Prometheus CRDs are used for installing and managing Prometheus
  2. Application CRDs: These represent an application-specific domain concept, allowing applications to deeply integrate with Kubernetes. An example is the ServiceMonitor CRD used by the Prometheus operator to register specific Kubernetes Services to be scraped by a Prometheus server

If creating CRDs isn’t possible, consider using ConfigMaps. While this method doesn’t require cluster-admin rights, it lacks features like API Server level validation and support for API versioning.

From an implementation perspective, the choice between using vanilla Kubernetes objects or custom resources managed by a controller is significant. For CRDs, we can use a schemaless approach or define custom types ourselves.

Another option is to extend the Kubernetes API with its own aggregation layer for complex problem domains. This involves adding a custom-implemented APIService resource as a new URL path to the Kubernetes API, requiring additional security configuration. While this provides more flexibility than using plain CRDs, it also requires implementing more logic. For most use cases, an operator dealing with plain CRDs is sufficient.

Operator Development and Deployment

Tools for creating Operators:

  1. Kubebuilder: A Kubernetes SIG API Machinery project, Kubebuilder is a framework and library for creating Kubernetes APIs via CustomResourceDefinitions. It simplifies the creation of Golang-based operators by providing higher-level abstractions over the Kubernetes API. It also offers project scaffolding and supports multiple CRDs watched by a single operator
  2. Operator Framework: A CNCF project that provides comprehensive support for operator development. It includes the Operator SDK for accessing a Kubernetes cluster and starting an operator project, the Operator Lifecycle Manager for managing operator releases and updates, and Operator Hub, a public catalog for sharing community-built operators
  3. Metacontroller: Originally a Google Cloud Platform project, covered in more detail in the next section

The Operator Lifecycle Manager (OLM) enables nonprivileged users to register CRDs and install their operators. Operators can be published on the Operator Hub, which extracts metadata from the operator’s ClusterServiceVersion (CSV) and displays it in a user-friendly web interface.

Metacontroller

Metacontroller is an operator building framework that enhances Kubernetes with APIs, simplifying the process of writing custom controllers. It functions like the Kubernetes Controller Manager, running multiple controllers defined through Metacontroller-specific CRDs.

In essence, Metacontroller is a delegating controller that calls out to a service providing the actual controller logic. It simplifies the definition of behavior for standard or custom resources. When defining a controller through Metacontroller, you only need to provide a function containing your controller’s specific business logic.

Metacontroller handles all interactions with the Kubernetes APIs, runs a reconciliation loop for you, and calls your function through a webhook. This webhook gets called with a payload describing the CRD event. The function then returns a definition of the Kubernetes resources to be created (or deleted) on behalf of our controller function.
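
The function behind such a webhook can be very small. The sketch below assumes a payload with a `parent` object and a response with `status` and `children` fields, loosely modeled on Metacontroller's sync hook for composite controllers; the parent's `replicas` and `image` fields are hypothetical, and the HTTP server that would wrap the function is omitted.

```python
def sync(request: dict) -> dict:
    """Compute the desired child Pods for a hypothetical parent resource."""
    parent = request["parent"]
    name = parent["metadata"]["name"]
    spec = parent["spec"]
    replicas = spec.get("replicas", 1)

    # Describe the children that should exist; creating, updating, and
    # deleting them is left to the framework that calls this function.
    children = [
        {
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {"name": f"{name}-{index}", "labels": {"owner": name}},
            "spec": {"containers": [{"name": "main", "image": spec["image"]}]},
        }
        for index in range(replicas)
    ]
    return {"status": {"replicas": len(children)}, "children": children}
```

Because the function only maps one JSON document to another, it has no dependency on a Kubernetes client library and is easy to unit test.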

This delegation allows functions to be written in any language that understands HTTP and JSON, without any dependency on the Kubernetes API or its client libraries. These functions can be hosted on Kubernetes, externally on a Functions-as-a-Service (FaaS) provider, or elsewhere.

Metacontroller is particularly useful for extending and customizing Kubernetes with simple automation or orchestration when no extra functionality is needed. It’s especially handy when you want to implement your business logic in a language other than Go. Examples of controllers implemented using Metacontroller include StatefulSet, Blue-Green Deployment, Indexed Job, and Service per Pod.

Elastic Scale

Kubernetes’ Elastic Scale pattern enables multi-dimensional application scaling, including horizontal, vertical, and cluster scaling.

This is particularly useful for workloads that fluctuate seasonally or over time. Kubernetes simplifies the process of determining the necessary resources for a container, the number of replicas for a service, and the number of nodes in a cluster. It can monitor external load and capacity-related events, analyze the current state, and self-adjust for optimal performance.

Scaling in Kubernetes can be horizontal (creating more Pod replicas) or vertical (allocating more resources to containers managed by Pods). Configuring autoscaling on a shared cloud platform can be complex, but Kubernetes provides various features to optimize application setup.

The Vertical Pod Autoscaler (VPA) is still experimental. Meanwhile, with the growing popularity of the serverless programming model, scaling to zero and rapid scaling have become priorities. Kubernetes add-ons like Knative and KEDA are being developed to meet these needs.

Kubernetes maintains a distributed system based on a desired state specification, ensuring reliability and resilience through continuous monitoring, self-healing, and state matching. It also offers scalability under heavy load, expanding its Pods and nodes rather than weakening or becoming more brittle, demonstrating its antifragile capabilities.

Manual Horizontal Scaling

Manual scaling techniques:

  • Imperative Scaling: In this approach, a human operator directly instructs Kubernetes to adjust the number of Pod instances by altering the desired replica count. For instance, a Deployment named random-generator can be scaled to 4 instances with a single command: kubectl scale deployment random-generator --replicas=4. The ReplicaSet then adjusts the Pods to match the desired count
  • Declarative Scaling: This method is ideal for maintaining configurations outside the cluster. Adjustments are made declaratively in the ReplicaSet or other definitions and then applied to Kubernetes. For example, the replica count can be set using a Deployment with the command: kubectl apply -f random-generator-deployment.yaml

Both methods necessitate a human to monitor or predict changes in application load and make scaling decisions accordingly. They are not ideal for dynamic workload patterns that require constant adaptation.

Resources managing multiple Pods such as ReplicaSets, Deployments, and StatefulSets can be scaled. However, StatefulSets with persistent storage behave differently: they create PVCs when scaling up but do not delete them when scaling down, so that stored data is not lost.

Jobs can also be scaled by modifying the .spec.parallelism field instead of .spec.replicas, allowing multiple instances of the same Pod to run concurrently, thereby increasing capacity.

Horizontal Pod Autoscaling (HPA)

HPA in Kubernetes can scale the number of Pods horizontally. However, it can’t scale down to 0 (zero) Pods. Add-ons like Knative and KEDA can provide this capability, turning Kubernetes into a serverless platform.

For HPA to scale on CPU utilization, the containers in the Deployment’s Pod template must declare CPU resource requests (.spec.template.spec.containers[].resources.requests.cpu), and the metrics server must be enabled. The HPA controller then continuously retrieves metrics about the Pods and calculates the required number of replicas based on these metrics.

HPA should be applied to the higher-level Deployment abstraction rather than individual ReplicaSets to ensure its preservation and application to new ReplicaSet versions. The HPA calculates a single-integer number representing the desired number of replicas to keep the measured value below a certain threshold.
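
The replica calculation can be illustrated with the formula given in the Kubernetes documentation: the desired count is the current count multiplied by the ratio of the measured to the target metric value, rounded up. The sketch below adds the default tolerance of 0.1 and a min/max clamp; the function name and the default bounds are assumptions for illustration.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_value: float,
    target_value: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Compute the replica count that brings the metric back to its target."""
    ratio = current_value / target_value
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Never leave the configured bounds.
    return max(min_replicas, min(wanted, max_replicas))
```

For example, 4 replicas at 90% average CPU utilization with a 50% target yield `ceil(4 * 1.8) = 8` replicas.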

There are a few types of metrics used for autoscaling:

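  • Standard: resource metrics such as CPU and memory usage, provided by the metrics server
  • Custom: metrics associated with Kubernetes objects, such as a per-Pod request rate, served through a custom metrics API
  • External: metrics from sources outside the cluster, such as the length of a message queue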

Getting autoscaling right involves experimenting and tuning. One of the most critical decisions is which metrics to use.

The HPA uses various techniques to prevent rapid fluctuations in the number of replicas when the load is unstable. During scale-up, it disregards high CPU usage samples from Pods that are still initializing. During scale-down, it considers all scale recommendations within a configurable time window and chooses the highest one, so that a short dip in usage does not remove replicas.

Kubernetes provides the .spec.behavior field in the HPA specification to customize the behavior of the HPA when scaling the number of replicas in a Deployment. This field allows you to specify policies for the maximum number of replicas to scale in a given period and stabilizationWindowSeconds to prevent thrashing effects.
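
The effect of a scale-down stabilization window and of a Pods-per-period policy can be sketched as follows. This is a simplified model, not the HPA controller's code: the function names are hypothetical, and the 300-second default mirrors the default stabilizationWindowSeconds for scaling down.

```python
from typing import List, Tuple


def stabilized_recommendation(
    current: int,
    history: List[Tuple[float, int]],
    now: float,
    window_seconds: float = 300.0,
) -> int:
    """Pick the highest recommendation seen within the stabilization window.

    `history` holds (timestamp_in_seconds, recommended_replicas) pairs.
    """
    recent = [replicas for ts, replicas in history if now - ts <= window_seconds]
    return max(recent + [current])


def limit_scale_down(current_replicas: int, desired: int, max_pods_removed: int) -> int:
    """Apply a policy that removes at most `max_pods_removed` Pods per period."""
    return max(desired, current_replicas - max_pods_removed)
```

With these two steps, a single low recommendation neither takes effect immediately nor removes more Pods per period than the policy allows.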

While powerful, HPA lacks the feature of scale-to-zero, which stops all Pods of an application if it is not used.

Kubernetes-based autoscaling platforms:

Knative Pod Autoscaler
  • Knative: Includes Serving and Autoscaler components, is ideal for HTTP-based services. It uses metrics like concurrent requests per Pod and requests per second for scaling decisions, providing a better correlation to HTTP request latency than CPU or memory consumption. Knative initially used a custom metric adapter for the HPA in Kubernetes but later developed its own implementation, the Knative Pod Autoscaler (KPA), for more control over the scaling algorithm. The KPA algorithm can be configured in various ways to optimize autoscaling behavior for any workload and traffic shape
KEDA Autoscaling Components
  • KEDA: A project by Microsoft and Red Hat that scales based on external metrics from different systems. It consists of the KEDA Operator and a Metrics service. The KEDA Operator connects the scaled target with an autoscale trigger that connects to an external system via a scaler. It also configures the HPA with the external metrics service provided by KEDA. KEDA’s autoscaling algorithm distinguishes between two scenarios: activation, which scales between zero replicas and one, and scaling up and down while the workload is running. The central element for KEDA is the custom resource ScaledObject, which plays a similar role to the HorizontalPodAutoscaler resource. As soon as the KEDA operator detects a new instance of ScaledObject, it automatically creates a HorizontalPodAutoscaler resource that uses the KEDA metrics service as an external metrics provider together with the configured scaling parameters. KEDA is an excellent solution when you need to scale based on work items held in external systems, like message queues that your application consumes. Such workloads share some characteristics with batch jobs: they run only when there is work to do and do not consume any resources when idle, and both can be scaled up for parallel processing of the work items
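
The two KEDA scenarios, activation and regular scaling, can be illustrated with a queue-driven example. This is a simplified model under stated assumptions: the function and parameter names are hypothetical, and in KEDA itself the second phase is delegated to the generated HorizontalPodAutoscaler.

```python
import math


def queue_based_replicas(
    queue_length: int,
    target_per_replica: int,
    max_replicas: int,
    activation_threshold: int = 0,
) -> int:
    """Derive a replica count from the number of pending work items."""
    # Activation phase: stay at zero until there is work above the threshold.
    if queue_length <= activation_threshold:
        return 0
    # Scaling phase: one replica per `target_per_replica` pending items,
    # with at least one replica and at most `max_replicas`.
    wanted = math.ceil(queue_length / target_per_replica)
    return max(1, min(wanted, max_replicas))
```

An empty queue therefore costs nothing, while a growing backlog adds replicas up to the configured maximum.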

Despite their differences, Knative and KEDA can be used together as there is little overlap between them.

Autoscalers, which adjust the scale of applications based on metrics, can be categorized into types:

  • Push: Autoscalers, like the one used in Knative, actively receive metrics from closely integrated systems. For instance, the Activator in Knative sends concurrent request metrics to the Autoscaler component. This type is typically employed for applications that are data recipients, such as those with HTTP endpoints
  • Pull: Autoscalers actively fetch metrics either from the application itself or external sources. This is common when the metrics are stored externally or aren’t directly accessible to the autoscaler. KEDA is an example of this, scaling deployments based on factors like queue length in terms of events or messages. Pull autoscalers are generally suitable for applications that actively fetch their workload, such as pulling from a message queue

Vertical Pod Autoscaling (VPA)

Vertical Pod Autoscaling Mechanism

The VPA adjusts resource requests based on usage, which is particularly useful for stateful services and those with fluctuating load patterns. It operates in different update modes (Off, Initial, Recreate, and Auto), each applying recommendations to Pods in its own way. However, it can cause service disruptions, and it can conflict with the HPA when both react to the same CPU or memory metrics.

Cluster Autoscaler

Cluster Autoscaling Mechanism

The Cluster Autoscaler (CA) aligns with the pay-as-you-go principle of cloud computing, interacting with cloud providers to request additional nodes during peak times or shut down idle nodes. This reduces infrastructure costs but can lead to vendor lock-in because it relies on cloud-provider-specific plugins.

The Cluster API Kubernetes project aims to provide APIs for cluster creation, configuration, and management. It operates through a machine controller running in the background.

The CA performs a few operations:

  • Scale-up: adding new nodes to a cluster
  • Scale-down: removing nodes from a cluster, ensuring service isn’t disrupted

The CA manages the size of your cluster based on the workload. It scales down the cluster when a node is not needed, and scales it up when there’s an increase in load. To determine if a node can be scaled down, the CA checks if more than half of its capacity is unused and if all movable Pods on the node can be relocated to other nodes.

If these conditions are met for a certain period (default is 10 minutes), the node is marked for deletion. The CA then marks it as unschedulable and moves all Pods from it to other nodes.
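
The scale-down conditions above can be expressed as a small predicate. This is a sketch of the described rules only, not of the Cluster Autoscaler's implementation; the `Node` fields and the function name are hypothetical, utilization is reduced to CPU requests for brevity, and 600 seconds stands for the 10-minute default.

```python
from dataclasses import dataclass


@dataclass
class Node:
    cpu_capacity: float      # allocatable CPU cores
    cpu_requested: float     # sum of CPU requests of the Pods on the node
    pods_movable: bool       # True if all Pods can be rescheduled elsewhere
    unneeded_seconds: float  # how long the node has met the conditions


def can_scale_down(
    node: Node,
    utilization_threshold: float = 0.5,
    unneeded_time: float = 600.0,
) -> bool:
    """Decide whether a node qualifies for removal."""
    utilization = node.cpu_requested / node.cpu_capacity
    return (
        utilization < utilization_threshold       # more than half is unused
        and node.pods_movable                     # nothing is pinned to the node
        and node.unneeded_seconds >= unneeded_time
    )
```

A node that passes this check would then be marked unschedulable and drained before it is removed.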

While scaling Pods and nodes are separate processes, they complement each other. Tools like HPA or VPA can analyze usage metrics and events, and scale Pods. If the cluster capacity is insufficient, the CA steps in to increase the capacity.

Scaling Levels

Application-Scaling Levels

The first step in scaling is Application Tuning. This involves adjusting the application within the container to make the most of the resources available.

Techniques such as Netflix’s Adaptive Concurrency Limits library can be used for dynamic in-app autoscaling. Beyond application tuning, Kubernetes offers the following scaling levels:

  • VPA automates the process of determining and applying optimal resource requests and limits in containers based on actual usage. However, this may cause brief service disruptions as Pods need to be deleted and recreated
  • HPA adjusts the number of Pods without altering their specifications. This is useful when you want to increase capacity without changing the resources assigned to each Pod
  • Cluster Autoscaler (CA) provides flexibility at the cluster capacity level, expanding or reducing the cluster as needed. It operates independently of other scaling methods and doesn’t concern itself with why workload profiles are changing

These techniques can be automated, which aligns with the cloud-native mindset.

Image Builder

Container Image Builds within Kubernetes

Traditionally, container images are built outside the cluster, pushed to a registry, and then referenced in the Kubernetes Deployment descriptors. However, building within the cluster has several benefits such as lower maintenance costs, simplified capacity planning, and reduced platform resource overhead.

Kubernetes’ advanced scheduler is ideal for scheduling tasks like image building with Continuous Integration (CI) systems like Jenkins. This becomes particularly useful when transitioning to Continuous Delivery (CD), where the process moves from building images to running containers. Since both phases share the same infrastructure, having the build occur within the same cluster simplifies this transition.

For example, if a new security vulnerability is found in a base image used by all applications, all dependent application images would need to be rebuilt and updated with the new image. With the Image Builder pattern, the cluster is aware of both the image build and its deployment, allowing for automatic redeployment if a base image changes.
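
The base image scenario boils down to walking a dependency graph. The sketch below is a hypothetical illustration: the image names and the `base_of` mapping are made up, and a real platform would derive this information from its build configurations.

```python
from collections import deque
from typing import Dict, List


def images_to_rebuild(changed: str, base_of: Dict[str, str]) -> List[str]:
    """List every image that depends, directly or transitively, on `changed`.

    `base_of` maps an image name to the name of its base image.
    """
    # Invert the mapping: base image -> images built on top of it.
    children: Dict[str, List[str]] = {}
    for image, base in base_of.items():
        children.setdefault(base, []).append(image)

    order: List[str] = []
    seen = {changed}
    queue = deque([changed])
    # Breadth-first walk so that a base is always rebuilt before its dependents.
    while queue:
        current = queue.popleft()
        for child in sorted(children.get(current, [])):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

Each image in the returned list would trigger a rebuild followed by a redeployment of the workloads that use it.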

There are several techniques for creating images within a Kubernetes cluster, providing flexibility and efficiency in managing containerized applications.

Tools for building container images can be divided into categories:

  • Container Image Builders: Create container images within the cluster without needing privileged access. They can also run outside the cluster as CLI programs. Their main function is to create a container image, but they don’t handle application redeployments
  • Build Orchestration Tools: Operate at a higher level of abstraction. They trigger the container image builder for creating images and support build-related tasks like updating deployment descriptors after an image has been built. CI/CD systems are typical examples of orchestrators

Building and running an application in the same cluster offers several benefits. For instance, OpenShift’s ImageStream triggers allow for better integration between build and deployment, a key aspect of Continuous Deployment (CD).

Container Image Builder

Creating images within a cluster without privileged access is crucial for security.

This can be achieved through various tools and methods:

  • Rootless Builds: These builds run without root privileges to enhance security during the build process. They minimize the attack surface, making them beneficial for secure Kubernetes clusters
  • Dockerfile-Based Builders: These builders use Dockerfile format for defining build instructions. Examples include Buildah and Podman, Kaniko, and BuildKit. These tools build OCI-compliant images without a Docker daemon, creating images locally within the container before pushing them to an image registry
  • Multilanguage Builders: These are favored by developers who are more concerned about their application being packaged as container images rather than the process itself

These daemonless approaches matter because Docker’s client-server architecture poses potential security risks: the daemon running in the background needs root privileges, mainly for network and volume management. This could allow untrusted processes to escape their container, giving an intruder control of the entire host.

Cloud Native Buildpacks (CNB) is a unified platform supporting many programming languages. The buildpack concept was introduced by Heroku in 2012 and later adopted by Cloud Foundry, and CNB became part of the Cloud Native Computing Foundation (CNCF) in 2018.

CNB has a lifecycle for transforming source code into executable container images, which consists of a few main phases:

  • Detect
  • Build
  • Export

Cloud Native Buildpacks (CNB) primarily serves a few groups:

  • Developers: Deploying code onto container-based platforms like Kubernetes
  • Buildpack Authors: Who create and group buildpacks into builders

Developers can utilize these buildpacks by referencing them when running the CNB lifecycle on their source code.

There are several tools for executing this lifecycle, including the pack CLI command, CI steps like Tekton build tasks or GitHub actions, and kpack, an Operator for configuring and running buildpacks within a Kubernetes cluster.

CNB has been adopted by many platforms and projects as their preferred build platform, including Knative Functions and OpenShift. Specialized builders are also available for specific situations. They perform a rootless build, creating the container image without running arbitrary commands as with a Dockerfile RUN directive.

There are also various tools for creating application image layers and pushing them directly to a container image registry:

  • Jib: A pure Java library and build extension that integrates with Java build tools like Maven or Gradle. It optimizes image rebuild times by creating separate image layers for the Java build artifacts, dependencies, and other static resources. It communicates directly with a container image registry for the resulting images
  • ko: A tool for creating images from Golang sources. It can create images directly from remote Git repositories and update Pod specifications to point to the image after it has been built and pushed to a registry
  • Apko: A unique builder that uses Alpine’s Apk packages as building blocks instead of Dockerfile scripts. This strategy allows for easy reuse of building blocks when creating multiple similar images

These tools have a narrow scope of what they can build, but this opinionated approach allows them to optimize build time and image size because they can make strong assumptions about the domain in which they operate.

Build Orchestrators

In-cluster Container Image Build with a Build Pod

Build orchestrators are tools that automate the process of building, testing, and deploying your code to Kubernetes. They typically integrate with your version control system, such as Git, and trigger a pipeline of tasks whenever you push a new commit or create a pull request.

Some of the tasks that build orchestrators can perform are:

  • Building your code into a container image and pushing it to a registry
  • Running unit tests, integration tests, and code quality checks
  • Deploying your code to a staging or production environment on Kubernetes
  • Rolling back or rolling forward your deployments in case of errors or failures
  • Monitoring and logging your application performance and health

There are many build orchestrators available for Kubernetes, each with its own features, advantages, and drawbacks.

Some of the most popular ones are:

  • Jenkins: An open-source tool with a rich ecosystem of plugins. It supports scripting pipelines using Groovy or declarative syntax
  • Tekton: A cloud-native framework for building Kubernetes-native pipelines. It supports reusable tasks and pipelines, parameterized inputs and outputs, and conditional execution
  • GitHub Actions: A feature of GitHub that allows you to create workflows for your repositories. It integrates with other GitHub features, such as issues, pull requests, and secrets
  • GitLab CI/CD: A feature of GitLab that allows you to create pipelines for your projects. It supports features such as auto-scaling runners, multi-project pipelines, and environment-specific variables

These are just some examples of build orchestrators for Kubernetes. There are many other tools that you can use or combine to create your own custom solution.

The choice of the best build orchestrator for your project depends on your requirements, preferences, and budget. However, regardless of which tool you choose, the benefits of using a build orchestrator for Kubernetes are clear: faster feedback loops, improved quality assurance, and easier deployment management.

Red Hat OpenShift Build

Red Hat OpenShift is an enterprise-grade version of Kubernetes, enriched with features like an integrated container image registry, single sign-on support, a new user interface, and native image building capability. Its open-source community edition is known as OKD.

OpenShift was the pioneer in offering a cluster-integrated way of building images managed by Kubernetes. It supports various strategies for building images, including Source-to-Image (S2I), Docker builds, Pipeline builds, and Custom builds. The input for these builds can come from diverse sources like Git Repository, Dockerfile, Image, Secret Resource, and Binary Source.

OpenShift introduces some additional concepts:

  • ImageStream: Resource that references one or more container images. It allows OpenShift to emit events when an image is updated in the registry for an ImageStreamTag
  • Trigger: Acts as a listener to events

S2I is a robust mechanism for creating application images and is more secure than plain Docker builds because the build process is under full control of trusted builder images. However, S2I can be slow for complex applications, especially when many dependencies need to be loaded. To mitigate this, it’s recommended to set up a cluster-internal Maven repository that serves as a cache or use incremental builds with S2I.

A drawback of S2I is that the generated image also contains the whole build environment, increasing the size of the application image and potentially exposing vulnerabilities. To address this, OpenShift offers chained builds, which create a slim runtime image from the result of an S2I build.

OpenShift also supports Docker builds directly within the cluster. The source for a Docker build is a Dockerfile and a directory holding the context. A chained build involves an initial S2I build that creates the runtime artifact, such as a binary executable. This artifact is then used by a second build, typically a Docker build.

Conclusion

By leveraging these advanced patterns in Kubernetes, developers can architect cloud native applications that are highly scalable, resilient, and flexible. Kubernetes provides a wealth of reusable elements and patterns to tackle any challenge. By mastering these advanced patterns, developers can unlock the full potential of Kubernetes and design cloud native applications that meet the demands of modern distributed systems.

Roman Glushach

Senior Software Architect & Engineer Manager at Freelance