Extending Kubernetes Capabilities with The Operator SDK: Plugins, Add-Ons, and Extensions

Roman Glushach
8 min read · Sep 15, 2023

Kubernetes Operator SDK

An operator is a software extension to Kubernetes: a custom controller that uses custom resources to manage an application or a specific domain within a cluster. It encapsulates the operational logic for creating, updating, and deleting the resources in that domain. Operators extend the Kubernetes API, so administrators can manage complex systems and services with the same workflows and tools they use for other Kubernetes resources. The Operator SDK is a toolkit for building such operators.

For example, a database operator might be used to create, update, and delete databases within a Kubernetes cluster. The operator would handle the underlying details of provisioning and managing the database, such as creating storage volumes, configuring network settings, and running database software.
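
The database example can be sketched as a small reconcile function that compares the declared state with what exists and closes the gap. This is a minimal, in-memory illustration rather than Operator SDK code: the `DatabaseSpec` fields, the `ClusterState` store, and the `reconcile` and `finalize` names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseSpec:
    """Desired state a user declares (all field names are hypothetical)."""
    name: str
    storage_gb: int
    replicas: int


@dataclass
class ClusterState:
    """In-memory stand-in for what currently exists in the cluster."""
    volumes: dict = field(default_factory=dict)   # database name -> size in GB
    replicas: dict = field(default_factory=dict)  # database name -> running replicas


def reconcile(spec: DatabaseSpec, state: ClusterState) -> list:
    """Move the actual state toward the desired state; return the actions taken."""
    actions = []
    if state.volumes.get(spec.name) != spec.storage_gb:
        state.volumes[spec.name] = spec.storage_gb
        actions.append(f"set volume for {spec.name} to {spec.storage_gb}GB")
    if state.replicas.get(spec.name, 0) != spec.replicas:
        state.replicas[spec.name] = spec.replicas
        actions.append(f"scale {spec.name} to {spec.replicas} replicas")
    return actions


def finalize(name: str, state: ClusterState) -> None:
    """Clean up everything the operator created for a deleted database."""
    state.volumes.pop(name, None)
    state.replicas.pop(name, None)


if __name__ == "__main__":
    state = ClusterState()
    spec = DatabaseSpec(name="orders", storage_gb=20, replicas=3)
    print(reconcile(spec, state))  # first pass provisions storage and replicas
    print(reconcile(spec, state))  # second pass finds nothing left to do
```

Running `reconcile` twice with the same spec shows the key property of an operator: the function is idempotent, so it can safely be called again whenever something changes.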

Benefits

  • Extensibility: Allows developers to create operators for a variety of resources, including non-native Kubernetes ones, without altering the Kubernetes codebase
  • Customizability: Developers can modify their operators’ behavior to meet specific needs, such as defining custom validation rules and conversion functions
  • Modularity: The SDK’s design allows for easy component reuse across operators, reducing boilerplate code and simplifying operator development and maintenance
  • Reusability: The SDK’s modular design facilitates the use of pre-built components and libraries in new operators, and includes tools for generating boilerplate code
  • Scalability: The SDK supports scaling, enabling the creation of operators that can manage large numbers of resources and requests, ensuring application availability and responsiveness under heavy loads
  • Security: The SDK scaffolds role-based access control (RBAC) manifests so that an operator runs with only the permissions it declares; encryption and authentication rely on standard Kubernetes mechanisms
  • Support for multiple languages and tools: While primarily focused on Go, the SDK can also scaffold operators that are driven by Ansible playbooks or Helm charts
  • Integration with Kubernetes APIs: The SDK includes client libraries for simple and intuitive interaction with the Kubernetes API, abstracting away much of its complexity
  • Automatic generation of operator manifests: The SDK automates the creation of necessary YAML files for operator deployment, simplifying the development process
  • Various Deployment Strategies: Operators built with the SDK can drive Kubernetes rollout strategies such as rolling updates, and can implement patterns like blue-green deployments in their own logic, allowing developers to select the most suitable method for their applications
  • Rollback and Rollforward Features: Operators can implement rollback and rollforward logic for the applications they manage, enabling a return to a previous state when necessary and minimizing downtime and deployment errors
  • Support for Observability and Monitoring: The SDK offers observability and monitoring support, giving developers insights into their operators’ performance through metrics, logs, and other data
  • Testing and Validation Tools: The SDK includes tools for testing and validating operators before deployment, helping to identify bugs and edge cases early in the development process
  • Integration with CI/CD Pipelines: The SDK integrates with CI/CD pipelines, automating the software delivery process and streamlining the delivery of updates to customers

Architecture

Components

  • Operator Framework: the open source toolkit that the Operator SDK is part of. It provides a set of APIs and tools that enable developers to define their own custom resources, create instances of those resources, and manage their lifecycle. Operators built with it also rely on components that Kubernetes itself provides, such as the deployment controller, the service controller, and admission webhooks, rather than reimplementing them
  • Operator lifecycle manager (OLM): responsible for managing the installation, upgrade, and removal of operators from a cluster. It provides a simple way to install and manage the dependencies between operators, making it easier to develop and maintain complex workflows
  • Custom resource definitions (CRDs): used to extend the Kubernetes API with new types of objects that can be created and managed by the operator. CRDs provide a way to define the structure and behavior of these new objects, including their fields, validation rules, and relationships to other objects in the cluster
  • Controller-runtime library: shared library that provides a common runtime environment for all controllers in the Operator SDK. It includes a number of useful features, such as support for leader election, metric collection, and event handling, that can be used to simplify the development of controllers
  • Manifest generator: tooling (controller-gen and kustomize in Go-based projects) that generates Kubernetes manifests, such as CRDs and RBAC rules, from the operator's type definitions and configuration files. This allows developers to define their desired state declaratively, rather than having to write imperative code to create and update resources
  • Deployment controller: a built-in Kubernetes controller that reconciles deployments by creating and updating their replica sets. It uses a rolling update strategy by default to minimize downtime during updates, and also supports the Recreate strategy, which stops all old pods before starting new ones
  • Service controller: responsible for managing services in a cluster. Kubernetes supports the service types ClusterIP, NodePort, LoadBalancer, and ExternalName (Ingress is a separate resource, not a service type), and services load-balance traffic across the pods that match their selector
  • Validation webhook: an admission webhook that runs before a created or updated object is persisted, and checks whether the proposed object is valid. It can be used to enforce policies and constraints, such as ensuring that certain labels or annotations are present
  • Mutating webhook: an admission webhook that runs before validation, and before the object is persisted, and can modify the incoming object. For example, it might set default values, inject a sidecar container, or add labels
  • Config map controller: responsible for creating, updating, and deleting config maps in a cluster. It can be used to decouple non-sensitive configuration from the application code; sensitive values such as passwords belong in secrets instead
  • Secret controller: responsible for creating, updating, and deleting secrets in a cluster. It can be used to store sensitive data, such as passwords or API keys, and provides a secure way to inject that data into pods
  • Persistent volume claim (PVC) controller: responsible for creating, updating, and deleting persistent volume claims in a cluster. It can be used to request storage resources from a cluster, and provides a way to decouple storage provisioning from the application logic
  • Namespace controller: responsible for creating, updating, and deleting namespaces in a cluster. It can be used to group related resources together, providing a way to isolate resources and control access to them
  • Label selector controller: responsible for matching resources against label criteria, so that resources in a cluster can be filtered, grouped, or selected. It can be used to implement label-based routing, or to automate the creation of resources based on specific labels
  • Annotation controller: responsible for creating, updating, and deleting annotations on resources in a cluster. It can be used to attach metadata to resources, providing a way to decorate resources with additional information that can be used by the operator
  • Cron job controller: responsible for scheduling jobs to run at specified times or intervals. It can be used to implement time-based workflows, such as backups, reporting, or maintenance tasks
  • Daemon set controller: responsible for managing daemon sets in a cluster. It ensures that a copy of a pod runs on every node, or on a selected subset of nodes, which suits node-level agents such as log collectors and monitoring daemons
  • Replica set controller: responsible for creating, updating, and deleting replica sets in a cluster. It can be used to manage the scaling and availability of applications, and provides a way to ensure that a specified number of replicas are running at any given time
  • Pod controller: responsible for creating, updating, and deleting pods in a cluster. It can be used to manage the lifecycle of individual pods, and provides a way to inject configuration and secrets into pods
  • Node controller: responsible for managing the lifecycle of nodes registered in a cluster. It monitors the health of individual nodes and reacts when a node becomes unreachable, for example by evicting its pods; it does not provision the underlying hardware
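
Admission webhooks act on a request before the object is stored, with mutation running ahead of validation. The sketch below imitates that order in plain Python. It is only an illustration: the `mutate`, `validate`, and `admit` functions, the `managed-by` label, and the required `team` label are hypothetical and not part of any Kubernetes or Operator SDK API.

```python
import copy


def mutate(obj: dict) -> dict:
    """Mutating step: fill in defaults on a copy of the incoming object."""
    patched = copy.deepcopy(obj)
    patched.setdefault("metadata", {}).setdefault("labels", {})
    patched["metadata"]["labels"].setdefault("managed-by", "example-operator")
    patched.setdefault("spec", {}).setdefault("replicas", 1)
    return patched


def validate(obj: dict) -> list:
    """Validating step: return reasons to reject (an empty list means allowed)."""
    errors = []
    if "team" not in obj.get("metadata", {}).get("labels", {}):
        errors.append("metadata.labels.team is required")
    replicas = obj.get("spec", {}).get("replicas")
    if not isinstance(replicas, int) or replicas < 1:
        errors.append("spec.replicas must be a positive integer")
    return errors


def admit(obj: dict, store: dict) -> list:
    """Run mutation, then validation, and persist only if validation passes."""
    patched = mutate(obj)
    errors = validate(patched)
    if not errors:
        store[patched["metadata"]["name"]] = patched
    return errors


if __name__ == "__main__":
    store = {}
    request = {"metadata": {"name": "web", "labels": {"team": "payments"}}, "spec": {}}
    print(admit(request, store))  # empty list: the request was admitted
    print(store["web"]["spec"])   # the default replica count was filled in
```

Because validation sees the mutated object, defaults applied by the mutating step are checked against the same policy as user-supplied values.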

Workflow

The components of the Operator SDK work together to provide a complete platform for managing applications and services on Kubernetes.

Here’s a high-level overview of how the components interact:

  • A user creates a custom resource definition (CRD) that defines the structure and behavior of a new type of object they want to manage with the operator
  • A user creates an instance of the custom resource, and the operator's controller, which watches that resource type, is invoked to manage its lifecycle
  • The controller works out which Kubernetes objects are needed to create or update the resource; the CRD and RBAC manifests themselves are produced earlier, at build time, by the manifest generator
  • The controller creates or updates those objects through the API, and the deployment controller, service controller, or other relevant built-in controller acts on them in the cluster
  • Whenever an object is created or updated through the API, the mutating webhook and validation webhook, if configured, first modify and then validate the request before it is persisted
  • The config map controller, secret controller, PVC controller, and other specialized controllers are used to create, update, and delete supporting resources, such as config maps, secrets, and PVCs
  • The namespace controller, label selector controller, and annotation controller are used to organize and decorate resources in the cluster, as needed
  • The cron job controller, daemon set controller, replica set controller, and pod controller are used to schedule jobs, manage background processes, scale applications, and manage pods
  • The node controller is used to manage the hardware infrastructure of the cluster
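
The interaction above can be approximated with a work queue that feeds resource names to a reconcile function. This is a simplified sketch under stated assumptions: `FakeCluster` is an in-memory stand-in for the API server, and the `-deployment` naming scheme and `owner` field are invented for the example. In a real cluster, cleanup of owned objects is typically handled through owner references and garbage collection rather than by hand.

```python
from collections import deque


class FakeCluster:
    """In-memory stand-in for the API server's object store."""

    def __init__(self):
        self.custom_resources = {}  # name -> spec dict
        self.deployments = {}       # name -> {"owner": str, "replicas": int}


def reconcile(name: str, cluster: FakeCluster) -> str:
    """Bring the child deployment in line with the named custom resource."""
    spec = cluster.custom_resources.get(name)
    child = f"{name}-deployment"
    if spec is None:
        # The custom resource is gone: remove what it owned.
        cluster.deployments.pop(child, None)
        return "deleted"
    desired = {"owner": name, "replicas": spec["replicas"]}
    if cluster.deployments.get(child) == desired:
        return "unchanged"
    cluster.deployments[child] = desired
    return "applied"


def run(queue: deque, cluster: FakeCluster) -> list:
    """Drain the work queue, reconciling one resource name at a time."""
    results = []
    while queue:
        name = queue.popleft()
        results.append((name, reconcile(name, cluster)))
    return results


if __name__ == "__main__":
    cluster = FakeCluster()
    cluster.custom_resources["shop"] = {"replicas": 2}
    # Two events for the same resource: the second one is a no-op.
    print(run(deque(["shop", "shop"]), cluster))
```

The queue carries only names, not full objects, so each reconcile reads the latest state and duplicate events collapse into harmless no-ops.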

Design Patterns Used in the Architecture

  • Operator pattern: This is the core concept behind Operators. It involves creating a custom resource definition (CRD) that defines the schema of a new resource type, whose instances declare the desired state of an application, and a custom controller that watches for changes to those custom resources and reconciles the actual state with the desired state
  • Helm chart pattern: the SDK generates a skeleton Operator that wraps a Helm chart and manages its lifecycle
  • Ansible playbook pattern: the SDK generates a skeleton Operator that runs an Ansible playbook whenever a custom resource is created, updated, or deleted
  • Go-based pattern: the SDK generates a skeleton Operator that uses the controller-runtime library to interact with the Kubernetes API and implement the controller logic
  • Sidecar pattern: auxiliary containers run alongside the operator or its workloads to provide services such as protecting the metrics endpoint or taking backups
  • Adapter pattern: Operator SDK uses this pattern to expose custom metrics from Operators to the Prometheus monitoring system
  • Decorator pattern: Operator SDK uses this pattern to inject common annotations, labels, or environment variables into the pods created by Operators
  • Observer pattern: Operator SDK uses this pattern to trigger events or actions based on the status or conditions of custom resources
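
As one illustration of the decorator pattern from this list, the sketch below wraps a pod-template builder so that every template it returns carries a common label. It is a hedged example, not SDK code: `with_common_labels`, `build_pod_template`, and the label value `example-operator` are hypothetical, while `app.kubernetes.io/managed-by` is one of the labels Kubernetes recommends.

```python
import functools

# Hypothetical value; the key is a Kubernetes recommended label.
COMMON_LABELS = {"app.kubernetes.io/managed-by": "example-operator"}


def with_common_labels(build):
    """Wrap a pod-template builder so every template carries the common labels."""
    @functools.wraps(build)
    def wrapper(*args, **kwargs):
        template = build(*args, **kwargs)
        labels = template.setdefault("metadata", {}).setdefault("labels", {})
        for key, value in COMMON_LABELS.items():
            labels.setdefault(key, value)  # never overwrite labels set by the builder
        return template
    return wrapper


@with_common_labels
def build_pod_template(name: str, image: str) -> dict:
    """Return a minimal pod template for a single-container workload."""
    return {
        "metadata": {"labels": {"app": name}},
        "spec": {"containers": [{"name": name, "image": image}]},
    }


if __name__ == "__main__":
    print(build_pod_template("web", "nginx:1.25")["metadata"]["labels"])
```

Keeping the injection in a wrapper means individual builders stay focused on their workload, and the common metadata is defined in one place.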

Operator Capability Levels

Operator Capability Levels are a way of classifying operators based on the features and capabilities they offer for the applications they manage. The Operator Framework defines five capability levels: Level 1 (Basic Install), Level 2 (Seamless Upgrades), Level 3 (Full Lifecycle), Level 4 (Deep Insights), and Level 5 (Auto Pilot). The majority of Kubernetes operators settle in the midrange, at Level 3.

The capability levels provide shared terminology for describing what features users can expect from an operator. Each capability level is associated with a certain set of management features the operator offers around the managed workload. Operators that do not manage a workload, or that delegate to off-cluster orchestration services, remain at Level 1.
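
The five levels can be modeled as an ordered enumeration. In the sketch below the level names follow the Operator Framework (Basic Install, Seamless Upgrades, Full Lifecycle, Deep Insights, Auto Pilot), but the single feature flag per level and the rule that a level counts only if all lower levels are met are simplifying assumptions made for illustration.

```python
from enum import IntEnum


class CapabilityLevel(IntEnum):
    BASIC_INSTALL = 1
    SEAMLESS_UPGRADES = 2
    FULL_LIFECYCLE = 3
    DEEP_INSIGHTS = 4
    AUTO_PILOT = 5


# Hypothetical feature flags, one per level; real operators expose many more.
REQUIRED_FEATURE = {
    CapabilityLevel.BASIC_INSTALL: "install",
    CapabilityLevel.SEAMLESS_UPGRADES: "upgrade",
    CapabilityLevel.FULL_LIFECYCLE: "backup_restore",
    CapabilityLevel.DEEP_INSIGHTS: "metrics_alerts",
    CapabilityLevel.AUTO_PILOT: "auto_tuning",
}


def capability_level(features: set):
    """Return the highest level whose feature, and all lower ones, are present."""
    reached = None
    for level in CapabilityLevel:  # iterates from level 1 upward
        if REQUIRED_FEATURE[level] not in features:
            break
        reached = level
    return reached


if __name__ == "__main__":
    level = capability_level({"install", "upgrade", "backup_restore"})
    print(level.name, int(level))
```

Under this model, an operator that exposes metrics but cannot upgrade its workload still rates as Level 1, because the levels are evaluated in order.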

Conclusion

The Operator SDK empowers developers to build creative solutions that address specific use cases and fill gaps in Kubernetes’ feature set. By leveraging plugins, add-ons, and extensions, organizations can tailor Kubernetes to meet their unique requirements, from enhancing networking and monitoring to supporting serverless functions and machine learning workloads. As the Kubernetes ecosystem continues to grow, we can expect to see more innovative uses of the Operator SDK, further solidifying Kubernetes’ position as the industry-standard container orchestration platform.

Written by Roman Glushach

Senior Software Architect & Engineer Manager at Freelance
