Kubernetes Networking: Load Balancing Techniques and Algorithms

Aug 30, 2023

Load balancing is a technique used to distribute incoming network traffic across multiple servers to improve responsiveness, reliability, and scalability of applications. It involves directing requests from clients to one or more available servers based on various factors such as server load, response time, and availability.

On the Internet, load balancing is often employed to divide network traffic among several servers. This reduces the strain on each server and makes the servers more efficient, speeding up performance and reducing latency.

Load balancing is essential for most Internet applications to function properly. Imagine a grocery store with 8 checkout lanes, only one of which is open. All customers must queue in the same lane, so each of them waits a long time to pay. Now imagine that the store instead opens all 8 lanes. In this case, the wait time for customers is about 8 times shorter (depending on factors like how much food each customer is buying).

Load balancing accomplishes essentially the same thing. By dividing user requests among multiple servers, it greatly reduces user wait time.

How Does Load Balancing Work?

Load balancing works by directing incoming network traffic to the most appropriate server based on certain criteria.

The process typically involves 3 components:

Load Balancer: A load balancer sits between the user and the application servers. Its primary responsibility is to monitor the traffic coming from users and redirect it to the best suited server. There are two main types of load balancers: hardware-based and software-based. Hardware-based load balancers are dedicated appliances that perform all the necessary functions, whereas software-based load balancers run on top of existing infrastructure
Application Servers: These are the actual servers that host the applications. They receive traffic from the load balancer and serve the requested content to users
Users: End-users access the applications hosted on the application servers via the internet or intranet

When a user sends a request to access an application, the request reaches the load balancer first.

The load balancer then evaluates the request and determines which application server is best equipped to handle it. Based on factors like server capacity, usage rate, and response time, the load balancer selects the optimal server and forwards the request to it.

Once the server processes the request, it returns the response to the load balancer, which then passes it back to the user.

Benefits of Load Balancers

Improved performance: reduce the response time and latency of the system by distributing the workload evenly among multiple servers or resources, and avoiding overloading any single one
Increased reliability: enhance the availability and fault tolerance of the system by detecting and removing failed or unhealthy servers or resources from the pool, and redirecting the traffic to the remaining ones
Enhanced scalability: enable horizontal scaling of the system by adding more servers or resources to the pool as needed, without affecting the existing ones
Reduced costs: optimize the utilization and efficiency of servers or resources by avoiding underutilization or overprovisioning
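
The reliability benefit above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular load balancer is implemented: the `Server` and `HealthAwarePool` classes and the server names are hypothetical, and a real system would set the health flag from periodic probes rather than by hand.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True


class HealthAwarePool:
    """Hypothetical pool that only hands out servers marked healthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0  # rotation cursor over the healthy subset

    def mark(self, name, healthy):
        # In practice a periodic health probe would call this.
        for server in self.servers:
            if server.name == name:
                server.healthy = healthy

    def pick(self):
        healthy = [s for s in self.servers if s.healthy]
        if not healthy:
            raise RuntimeError("no healthy servers available")
        server = healthy[self._next % len(healthy)]
        self._next += 1
        return server.name


pool = HealthAwarePool([Server("a"), Server("b"), Server("c")])
first_round = [pool.pick() for _ in range(3)]   # all three servers take a turn
pool.mark("b", healthy=False)                   # simulate a failed health check
after_failure = [pool.pick() for _ in range(4)]  # only "a" and "c" are used now
```

Once "b" is marked unhealthy, traffic continues to flow to the remaining servers without any change on the client side.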

Challenges of Load Balancing

Types of Load Balancers

Service Load Balancing

Service load balancing is one of the basic load balancing tactics in Kubernetes. It fields all the requests sent to the service and routes them to the pods that match the service selector.

A service is an abstraction that defines a logical set of pods and a policy to access them. Services have a stable virtual IP address (also known as cluster IP) and a DNS name that can be used by clients to communicate with the pods.
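
To make the selector idea concrete, here is a rough Python sketch of equality-based label matching. It only models the concept: the pod names, labels, and IP addresses are made up, and the `matches` helper is hypothetical rather than part of any Kubernetes client library.

```python
def matches(selector, labels):
    """True if every key/value pair in the selector appears in the pod labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical pods; names, labels, and IPs are made up for illustration.
pods = [
    {"name": "web-1", "ip": "10.0.0.11", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "web-2", "ip": "10.0.0.12", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "db-1", "ip": "10.0.0.21", "labels": {"app": "db"}},
]

service_selector = {"app": "web"}

# The set of pod IPs that traffic sent to the service could be routed to.
endpoints = [pod["ip"] for pod in pods if matches(service_selector, pod["labels"])]
print(endpoints)
```

A pod may carry extra labels (such as `tier` here) and still match, because the selector only requires that its own pairs are present.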

The kube-proxy component implements service load balancing with the help of iptables rules, adding some complexity to the process.

iptables is a Linux utility for configuring the kernel's netfilter framework, which filters and manipulates network packets.
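
As an illustration of how a chain of packet rules can spread traffic: in iptables mode, kube-proxy is generally described as installing one rule per endpoint, each matching with a fixed probability, with the rules tried in order. The sketch below assumes that scheme (probability 1/n for the first rule, 1/(n-1) for the second, and so on, with the last rule always matching) and uses exact fractions to check that every endpoint ends up with an equal share. The function names are hypothetical.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probabilities for a chain of n endpoint rules."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_probabilities(rule_probs):
    """Chance that a packet ends up at each endpoint when rules are tried in order."""
    remaining = Fraction(1)  # probability that no earlier rule matched
    result = []
    for p in rule_probs:
        result.append(remaining * p)
        remaining *= 1 - p
    return result


rules = rule_probabilities(3)               # 1/3, then 1/2, then 1
effective = effective_probabilities(rules)  # 1/3 for every endpoint
print(rules, effective)
```

The per-rule probabilities look uneven, but because each rule only sees the packets that earlier rules let through, the overall distribution is uniform.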

kube-proxy can operate in 3 modes: userspace, iptables, and IPVS.

Service load balancing is suitable for cluster-internal traffic that uses TCP or UDP. Because it operates at layer 4, it can carry HTTP or HTTPS traffic but has no awareness of it, so it does not provide advanced features such as path-based routing, SSL termination, or authentication.

Ingress Load Balancing

Ingress load balancing is another common load balancing technique in Kubernetes. It handles all the requests that enter the cluster from external sources and routes them to the appropriate services or pods based on rules defined in an ingress resource. An ingress resource is an abstraction that defines how external traffic should be routed to services or pods within a cluster.

The ingress resource requires an ingress controller to implement its rules. An ingress controller is a pod that runs load balancer software (such as NGINX, HAProxy, or Traefik), typically listens on ports 80 and 443, and processes incoming HTTP or HTTPS requests according to the ingress resource configuration. The ingress controller can also perform SSL termination, authentication, rate limiting, caching, and other functions.
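
The routing step an ingress controller performs can be sketched roughly as follows. This is a simplified model, not the behaviour of any specific controller: the `RULES` table, host names, and service names are hypothetical, and the matching shown is a prefix match on whole path segments with the longest matching path winning.

```python
# Hypothetical rules, loosely modelled on host and path rules in an ingress resource.
RULES = [
    {"host": "shop.example.com", "path": "/api", "service": "api-svc"},
    {"host": "shop.example.com", "path": "/", "service": "web-svc"},
]


def path_matches(rule_path, request_path):
    """Prefix match on whole path segments, so /api matches /api/v1 but not /apiary."""
    if rule_path == "/":
        return True
    return request_path == rule_path or request_path.startswith(rule_path.rstrip("/") + "/")


def route(host, path):
    """Return the backend service for a request, preferring the longest matching path."""
    candidates = [r for r in RULES if r["host"] == host and path_matches(r["path"], path)]
    if not candidates:
        return None  # no rule applies; a real controller would use a default backend
    return max(candidates, key=lambda r: len(r["path"]))["service"]


print(route("shop.example.com", "/api/v1/cart"))
print(route("shop.example.com", "/index.html"))
```

This kind of host and path inspection is what distinguishes layer-7 routing from the layer-4 forwarding that a plain service performs.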

The ingress controller can be deployed as a pod within the cluster or as an external load balancer outside the cluster. The former option requires exposing the ingress controller pod using a service of type NodePort or LoadBalancer, which adds another layer of load balancing. The latter option requires configuring the external load balancer to forward traffic to the ingress controller pod using a service of type ClusterIP or ExternalName.

Ingress load balancing is suitable for cluster-external traffic that uses HTTP or HTTPS protocols. It provides more flexibility and functionality than service load balancing, but it also introduces more complexity and dependency on third-party software.

External Load Balancing

External load balancing is another option for distributing traffic to Kubernetes pods from outside sources. It involves using an external load balancer (such as AWS ELB, Google Cloud Load Balancer, or Azure Load Balancer) that is not part of the Kubernetes cluster, but is integrated with it through cloud provider-specific annotations or custom controllers.

External load balancers can provide layer-4 or layer-7 load balancing, depending on the type and configuration of the load balancer. They can also offer features such as health checks, SSL termination, session persistence, and cross-zone load balancing.

External load balancers can be used in conjunction with service load balancing or ingress load balancing, or as a standalone solution. The former option requires creating a service of type LoadBalancer, which automatically provisions an external load balancer and assigns it an external IP address that can be used by clients to access the service. The latter option requires creating a service of type ExternalName, which maps the service's DNS name to the external load balancer’s hostname through a CNAME record.
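
For reference, a service of type LoadBalancer can be described with a small manifest. The sketch below builds one as a Python dictionary and prints it as JSON, which Kubernetes accepts alongside YAML. The service name, label, and port numbers are placeholders, and any cloud-specific annotations are omitted because they vary by provider.

```python
import json

# Hypothetical Service of type LoadBalancer; the name, label, and ports are placeholders.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "web"},
    "spec": {
        "type": "LoadBalancer",
        "selector": {"app": "web"},
        # Clients connect to port 80; traffic is delivered to port 8080 on the pods.
        "ports": [{"protocol": "TCP", "port": 80, "targetPort": 8080}],
    },
}

manifest = json.dumps(service, indent=2)
print(manifest)
```

Saving the output to a file and applying it with `kubectl apply -f` would ask the cluster's cloud integration to provision the external load balancer, assuming the cluster runs on a provider that supports it.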

External load balancing is suitable for cluster-external traffic that uses any protocol. It provides high availability and scalability, but it also incurs additional costs and complexity.

Types of Load Balancers in Kubernetes

Load Balance Algorithms

Static Load Balancing Algorithms

Static load balancing algorithms distribute workloads without taking into account the current state of the system. A static load balancer will not be aware of which servers are performing slowly and which servers are not being used enough. Instead it assigns workloads based on a predetermined plan.

Static load balancing is quick to set up, but can result in inefficiencies. Because it ignores the state of each server, individual servers can still become overburdened.

Some examples of static load balancing algorithms are:

Round Robin: Requests are distributed across the group of servers sequentially
Hash: Requests are distributed based on a key you define, such as the client IP address or the request URL
Random: Requests are distributed randomly across the group of servers

Round Robin

Round-robin is a scheduling algorithm used by process and network schedulers in computing. It assigns time slices, also known as time quanta, to each process in equal portions and in circular order, handling all processes without priority. This algorithm is simple, easy to implement, and starvation-free. It can be applied to other scheduling problems, such as data packet scheduling in computer networks and, in load balancing, handing each new request to the next server in the group.

In the context of process scheduling, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time), and interrupting the job if it is not completed by then. The job is resumed next time a time slot is assigned to that process. If the process terminates or changes its state to waiting during its attributed time quantum, the scheduler selects the first process in the ready queue to execute.

In the context of network packet scheduling, round-robin scheduling can be used as an alternative to first-come first-served queuing. A multiplexer, switch, or router that provides round-robin scheduling has a separate queue for every data flow, where a data flow may be identified by its source and destination address. The algorithm allows every active data flow that has data packets in the queue to take turns in transferring packets on a shared channel in a periodically repeated order.
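
Applied to load balancing, the same turn-taking idea means each new request goes to the next server in a fixed circular order. A minimal sketch, assuming three hypothetical servers:

```python
from itertools import cycle

servers = ["server-a", "server-b", "server-c"]  # hypothetical backend names
rotation = cycle(servers)


def next_server():
    """Return the next server in the fixed rotation."""
    return next(rotation)


# Seven requests wrap around the three servers in order.
assignments = [next_server() for _ in range(7)]
print(assignments)
```

The selection never looks at how busy a server is, which is what makes this a static algorithm.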

Sticky Round-Robin

Sticky Round-Robin is a load balancing technique that combines the principles of Round-Robin and Sticky Sessions. In a Round-Robin load balancing algorithm, client requests are distributed across a group of servers in a cyclical manner, ensuring that the server load is evenly distributed. Sticky Sessions, on the other hand, is a technique where a load balancer creates a unique session object for each client and forwards all requests from the same client to the same server where the session data is stored and updated.

Sticky Round-Robin combines these two techniques by distributing client requests across servers in a cyclical manner while ensuring that all requests from the same client are forwarded to the same server. This can be more efficient as unique session-related data does not need to be migrated from server to server. However, it is important to note that this technique may not work well if the servers have different computational and storage capabilities, as it assumes that all servers can handle an equivalent load.
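
One way to picture the combination is a round-robin rotation that is consulted only the first time a client is seen. The sketch below assumes clients can be identified by some key (a session cookie or client ID); the `StickyRoundRobin` class and the client names are hypothetical.

```python
from itertools import cycle


class StickyRoundRobin:
    """Assign new clients in rotation, then keep each client on its server."""

    def __init__(self, servers):
        self._rotation = cycle(servers)
        self._assigned = {}  # client id -> server chosen on first contact

    def pick(self, client_id):
        if client_id not in self._assigned:
            self._assigned[client_id] = next(self._rotation)
        return self._assigned[client_id]


balancer = StickyRoundRobin(["server-a", "server-b"])
clients = ["alice", "bob", "alice", "carol", "bob"]
sequence = [balancer.pick(client) for client in clients]
print(sequence)
```

Repeat requests from "alice" and "bob" stay on their original servers, while the new client "carol" takes the next slot in the rotation. A real implementation would also need to expire old entries and reassign clients when a server leaves the pool.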

Weighted Round-Robin

Weighted Round-Robin (WRR) is a scheduling algorithm used in both computer networks and process scheduling. It is a generalization of the Round-Robin algorithm, which serves a set of queues or tasks in a cyclical manner, giving each of them a single service opportunity per cycle. In contrast, WRR offers each queue or task a fixed number of service opportunities per cycle, as specified by its configured weight, which determines the portion of capacity that each queue or task receives.

In the context of computer networks, WRR can be used as a network scheduler for data flows. A service opportunity is the emission of one packet if the selected queue is non-empty. If all packets have the same size, WRR is the simplest approximation of Generalized Processor Sharing (GPS). Several variations of WRR exist, including classical WRR and interleaved WRR.

In the context of process scheduling, WRR can be used to schedule processes in a similar way. Each process is associated with a weight, and the scheduler cycles over the processes in each cycle, giving each process a number of service opportunities equal to its weight.
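
The two variations mentioned above differ only in the order in which turns are handed out within a cycle. The following sketch shows one cycle of each for load balancing, assuming integer weights and at least one server; the server names and weights are hypothetical.

```python
def classical_wrr(weights):
    """One cycle of classical WRR: each server takes all of its turns in a row."""
    order = []
    for server, weight in weights.items():
        order.extend([server] * weight)
    return order


def interleaved_wrr(weights):
    """One cycle of interleaved WRR: in round r, every server with weight > r gets one turn."""
    order = []
    for round_index in range(max(weights.values())):
        for server, weight in weights.items():
            if weight > round_index:
                order.append(server)
    return order


weights = {"big": 3, "medium": 2, "small": 1}  # hypothetical servers and weights

classical_cycle = classical_wrr(weights)
interleaved_cycle = interleaved_wrr(weights)
print(classical_cycle)
print(interleaved_cycle)
```

Both cycles give "big" three turns out of six, but the interleaved order avoids sending a burst of consecutive requests to the same server.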

IP/URL Hash

IP Hash load balancing is a technique that uses the source and destination IP addresses of a data packet to determine which server in a pool of servers should handle the request. The load balancer performs a mathematical calculation on the IP addresses to generate a hash value, which is then used to select the server. Because the same pair of addresses always produces the same hash value, requests from a given client are consistently sent to the same server. The same idea is used in network adapter teaming, where hashing lets a host that communicates with multiple destinations spread its traffic across all of the network adapters in the team and make better use of the available bandwidth.

URL Hash load balancing is similar to IP Hash, except that the hash value is generated from the HTTP URL in the client request. Requests are forwarded to servers based on this hash value, so client requests for a particular URL always go to the same back-end server.

Both IP Hash and URL Hash load balancing algorithms can be effective in distributing client requests across a pool of servers, depending on the specific needs and requirements of the network.
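
Both variants reduce to the same step: hash a key, then map the hash onto the server list. The sketch below is one simple way to do that, assuming a fixed pool and a modulo mapping; the helper names, server names, and example addresses are hypothetical. It uses `hashlib` rather than Python's built-in `hash()`, because the built-in is randomized per process for strings and would not give stable results across restarts.

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical pool


def pick_by_hash(key, servers=SERVERS):
    """Map an arbitrary string key onto one of the servers, deterministically."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def ip_hash(source_ip, destination_ip):
    """IP Hash: the key is the source/destination address pair."""
    return pick_by_hash(f"{source_ip}->{destination_ip}")


def url_hash(url):
    """URL Hash: the key is the requested URL."""
    return pick_by_hash(url)


print(ip_hash("203.0.113.7", "198.51.100.1"))
print(url_hash("/images/logo.png"))
```

One design caveat of the modulo mapping is that adding or removing a server changes the result for most keys; consistent hashing is a common alternative when the pool changes often.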

Dynamic Load Balancing Algorithms

Dynamic load balancing algorithms distribute workloads by taking into account the current state of the system. A dynamic load balancer will monitor the performance and capacity of each server and assign workloads accordingly.

Dynamic load balancing is more complex to set up, but can result in better performance and efficiency because it balances the load among servers based on their actual capabilities and availability.

Some examples of dynamic load balancing algorithms are:

Least Connections: A new request is sent to the server with the fewest current connections to clients. Weighted variants also factor in the relative computing capacity of each server
Least Response Time: A new request is sent to the server selected by a formula that combines the fastest response time and fewest active connections
Least Bandwidth: A new request is sent to the server that has used the least amount of bandwidth in the last period of time

Least Connections

The Least Connections load balancing algorithm is a dynamic algorithm that forwards each new request to the server that has the fewest active connections at the time the request is received.

The Least Connections algorithm assumes that all connections require roughly equal processing power, and it can be effective when connections vary in how long they stay open, since a server tied up with long-lived connections receives fewer new ones. By directing new requests to the server with the fewest active connections, this algorithm can help to balance the load across all servers and improve overall system performance.

It is important to note that the basic algorithm may not work well if the servers have different computational and storage capabilities, as it assumes that all servers can handle an equivalent load. In such cases, an algorithm that accounts for capacity, such as Weighted Round-Robin or a weighted variant of Least Connections, may be more effective.
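
The bookkeeping behind Least Connections is small: a counter per server that goes up when a connection opens and down when it closes. A minimal sketch, with a hypothetical `LeastConnections` class and server names, and ties broken by list order:

```python
class LeastConnections:
    """Track active connections per server and pick the least loaded one."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server with the lowest count, so ties go by order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


lb = LeastConnections(["server-a", "server-b", "server-c"])
opened = [lb.acquire() for _ in range(3)]  # one connection lands on each server
lb.release("server-b")                     # server-b finishes its connection early
next_pick = lb.acquire()                   # so it receives the next one
print(opened, next_pick)
```

A weighted variant could divide each count by a per-server capacity weight before comparing, which addresses the unequal-capacity caveat described above.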

Least Response Time

The Least Response Time load balancing algorithm is a dynamic algorithm that takes into account both the current number of active connections on each server and its average response time. The load balancer forwards each new request to the server with the best combination of few active connections and short average response time. Because slow or busy servers receive fewer new requests, this technique helps prevent overloading.
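
The exact way connections and response time are combined differs between products. The sketch below assumes one plausible formula, (active connections + 1) multiplied by a moving average of response time, purely for illustration; the class, the `smoothing` and `initial_ms` parameters, and the server names are all hypothetical.

```python
class LeastResponseTime:
    """Pick the server with the lowest (active + 1) * average response time."""

    def __init__(self, servers, smoothing=0.5, initial_ms=1.0):
        self.smoothing = smoothing
        self.active = {s: 0 for s in servers}
        self.avg_ms = {s: initial_ms for s in servers}  # starting estimate

    def score(self, server):
        # Assumed formula: queued work (active + 1) times average latency.
        return (self.active[server] + 1) * self.avg_ms[server]

    def acquire(self):
        server = min(self.active, key=self.score)
        self.active[server] += 1
        return server

    def release(self, server, elapsed_ms):
        self.active[server] -= 1
        # Exponentially weighted moving average of observed response times.
        previous = self.avg_ms[server]
        self.avg_ms[server] = self.smoothing * elapsed_ms + (1 - self.smoothing) * previous


lb = LeastResponseTime(["fast", "slow"])
warmup = [lb.acquire(), lb.acquire()]  # one request to each server
lb.release("fast", 10.0)               # fast answered in 10 ms
lb.release("slow", 90.0)               # slow answered in 90 ms
picks = [lb.acquire() for _ in range(3)]
print(warmup, picks)
```

With these numbers the fast server keeps winning until its queue is long enough that its score exceeds the slow server's, at which point the slow server starts receiving requests again.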

Least Bandwidth

The Least Bandwidth load balancing algorithm is a dynamic algorithm that forwards each new request to the server that is currently serving the least amount of traffic, measured in megabits per second (Mbps).

By directing new requests to the server with the least bandwidth usage, this algorithm can help to balance the load across all servers and improve overall system performance. It is important to note that this algorithm may not work well if the servers have different computational and storage capabilities, as it assumes that all servers can handle an equivalent load. In such cases, an algorithm that accounts for capacity, such as Weighted Round-Robin, may be more effective.
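
Measuring "current" bandwidth requires choosing a time window. The sketch below assumes a sliding window of recent transfers per server and converts the bytes in that window to Mbps. The `LeastBandwidth` class, the 10-second window, and the traffic figures are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
from collections import deque


class LeastBandwidth:
    """Pick the server that sent the fewest bits during the recent window."""

    def __init__(self, servers, window_seconds=10.0):
        self.window = window_seconds
        self.samples = {s: deque() for s in servers}  # (timestamp, bytes) pairs

    def record(self, server, timestamp, num_bytes):
        self.samples[server].append((timestamp, num_bytes))

    def mbps(self, server, now):
        samples = self.samples[server]
        # Drop transfers that have fallen out of the window.
        while samples and samples[0][0] <= now - self.window:
            samples.popleft()
        total_bits = 8 * sum(size for _, size in samples)
        return total_bits / self.window / 1_000_000

    def pick(self, now):
        return min(self.samples, key=lambda s: self.mbps(s, now))


lb = LeastBandwidth(["server-a", "server-b"])
lb.record("server-a", timestamp=0.0, num_bytes=5_000_000)
lb.record("server-b", timestamp=1.0, num_bytes=1_000_000)

rate_a = lb.mbps("server-a", now=5.0)  # 40 megabits over 10 s
early_pick = lb.pick(now=5.0)          # server-b is carrying less traffic
late_pick = lb.pick(now=10.5)          # server-a's transfer has aged out
print(rate_a, early_pick, late_pick)
```

A shorter window reacts faster to traffic changes but makes the choice noisier, so the window length is a tuning decision.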

Overview Table of Load Balance Algorithms

Tips for Choosing the Right Load Balancer

Conclusion

Load balancing is a powerful technique that can improve the performance, reliability, and scalability of your applications. By distributing workloads across multiple servers, you can reduce latency, increase availability, and handle more traffic. Load balancing can be implemented using different algorithms, depending on your needs and preferences.

Whether you use hardware or software load balancers, or a combination of both, load balancing can help you deliver a better user experience and optimize your resources.
