At its core, a load balancer is a traffic cop for your infrastructure. When thousands of HTTP requests arrive every second, a single server can only do so much — it has finite CPU, memory, and network bandwidth. A load balancer sits in front of a pool of servers and distributes incoming requests across them, ensuring no single machine becomes a bottleneck.
Think of it like a grocery store opening multiple checkout lanes. One cashier for 500 customers is a disaster. Ten cashiers, each handling a fair share, keeps the line moving.
Modern applications serve millions of users. A single VM, no matter how powerful, has hard ceilings. Load balancing enables horizontal scaling — instead of buying a bigger server (vertical scaling), you add more smaller servers behind a load balancer.
- Performance: requests spread across servers reduce individual latency.
- Resilience: unhealthy servers are removed from rotation automatically.
- Scalability: add new servers behind the LB with zero downtime.
Companies like Netflix, Cloudflare, and Google handle billions of requests daily. Their reliability is inseparable from sophisticated load balancing at every layer of their stack.
The algorithm determines which server a request goes to. There is no universally best choice — each fits different workloads.
Round Robin: requests are handed out sequentially (Server A, Server B, Server C, then back to A). Simple and effective when all servers have identical specs and request costs.
Weighted Round Robin: like Round Robin, but each server gets a weight. A server with weight 3 receives three times as many requests as one with weight 1. Useful for heterogeneous fleets.
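Both can be sketched in a few lines of JavaScript; the pool, host names, and weights below are made up for illustration.

```js
// Hypothetical pool; hosts and weights are illustrative only
const servers = [
  { host: 'app-1', weight: 3 },
  { host: 'app-2', weight: 1 },
  { host: 'app-3', weight: 1 },
]

// Plain round robin: a counter that wraps around the pool
let rrIndex = 0
function roundRobin(pool) {
  return pool[rrIndex++ % pool.length]
}

// Weighted round robin (simplest form): repeat each server by its weight,
// then round-robin over the expanded pool
const weightedPool = servers.flatMap((s) => Array(s.weight).fill(s))
const pickWeighted = () => roundRobin(weightedPool)
```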
Least Connections: the server with the fewest active connections gets the next request. Because it adapts dynamically to actual load, it suits long-lived connections like WebSockets.
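A sketch of the selection step, assuming the balancer keeps an in-memory activeConnections counter per server and a hypothetical forward() function that actually proxies the request:

```js
// Pick the server currently handling the fewest in-flight requests
function leastConnections(servers) {
  return servers.reduce((best, s) =>
    s.activeConnections < best.activeConnections ? s : best
  )
}

// The counter only works if it is updated on both dispatch and completion
async function dispatch(servers, forward) {
  const server = leastConnections(servers)
  server.activeConnections++
  try {
    return await forward(server)
  } finally {
    server.activeConnections--
  }
}
```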
IP Hash: a hash of the client IP determines the server, so the same client always hits the same server. Session persistence without sticky cookies.
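A minimal sketch: hash the client IP and map the digest onto the pool. Note that if the pool size changes, most clients get remapped; consistent hashing is the usual remedy, which is beyond this sketch.

```js
const crypto = require('crypto')

// Hash the client IP and map the digest onto the pool: the same IP
// always lands on the same server as long as the pool size is stable
function ipHash(clientIp, servers) {
  const digest = crypto.createHash('md5').update(clientIp).digest()
  return servers[digest.readUInt32BE(0) % servers.length]
}
```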
Random: a random healthy server is chosen. Surprisingly effective at scale, since the law of large numbers ensures an even distribution with no coordination overhead.
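The whole strategy fits in a filter and an array index; the healthy flag is assumed to be maintained by the health checks described later.

```js
// Pick uniformly at random among servers currently marked healthy
function randomChoice(servers) {
  const healthy = servers.filter((s) => s.healthy)
  return healthy[Math.floor(Math.random() * healthy.length)]
}
```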
Least Response Time: routes to the server with the lowest combination of active connections and response time. Needs active latency monitoring, but it makes the most informed decision of the algorithms here.
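One plausible scoring function multiplies in-flight connections by a moving average of response time; the exact combination is an illustration, not a standard formula, and would need tuning against real traffic.

```js
// Lower score wins; avgResponseTimeMs would come from latency monitoring
const score = (s) => (s.activeConnections + 1) * s.avgResponseTimeMs

function leastResponseTime(servers) {
  return servers.reduce((best, s) => (score(s) < score(best) ? s : best))
}
```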
| Algorithm | Pros | Cons | Best for |
|---|---|---|---|
| Round Robin | Simple, fair distribution | Ignores server load | Stateless, uniform requests |
| Least Connections | Adapts to actual load | Slightly more overhead | Long-lived connections |
| Weighted | Respects server capacity | Requires manual tuning | Heterogeneous servers |
| Random | Zero state, very fast | Can be unbalanced | Large identical fleets |
| IP Hash | Session persistence | Uneven if few IPs | Stateful sessions / caching |
A load balancer is only as good as its awareness of server health. Without health checks, it would happily route traffic to a crashed node, causing errors for users.
Health checks come in two flavours:
Passive: monitor real traffic. If a server returns too many 5xx errors, mark it unhealthy. Low overhead, but it reacts only after users are already affected.
Active: the load balancer periodically sends a synthetic probe (e.g. HTTP GET /health) to each server, detecting failure before real users are impacted.
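On the probing side, a minimal active checker could look like the sketch below. The /health path matches the endpoint shown next, but the interval, timeout, failure threshold, and pool contents are all assumptions.

```js
const http = require('http')

// Hypothetical pool; in practice this is the same list the balancer routes to
const pool = [
  { host: '10.0.0.11', port: 3000 },
  { host: '10.0.0.12', port: 3000 },
]

const PROBE_INTERVAL_MS = 5000  // how often to probe each server
const FAILURE_THRESHOLD = 3     // consecutive failures before marking unhealthy

function probe(server) {
  return new Promise((resolve) => {
    const req = http.get(
      { host: server.host, port: server.port, path: '/health', timeout: 2000 },
      (res) => {
        res.resume()                    // drain the body so the socket is freed
        resolve(res.statusCode === 200)
      }
    )
    req.on('timeout', () => { req.destroy(); resolve(false) })
    req.on('error', () => resolve(false))
  })
}

setInterval(async () => {
  for (const server of pool) {
    const ok = await probe(server)
    server.failures = ok ? 0 : (server.failures || 0) + 1
    server.healthy = server.failures < FAILURE_THRESHOLD
  }
}, PROBE_INTERVAL_MS)
```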
The /health endpoint should verify that the application is actually functional, not just that the process is running: check DB connectivity, cache reachability, and any critical dependencies. Here is a minimal health endpoint in Node.js:
```js
const express = require('express')
const app = express()

// db and redis are assumed to be clients initialised elsewhere in the app
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1')  // check DB
    await redis.ping()          // check cache
    res.json({ status: 'ok' })
  } catch (err) {
    res.status(503).json({ status: 'degraded', error: err.message })
  }
})
```

Load balancers operate at different layers of the OSI model, each with different capabilities and tradeoffs.
L4 — Transport: forwards traffic based on IP address and TCP/UDP port alone, without inspecting request content. Fast and protocol-agnostic, but blind to what the request actually asks for.

L7 — Application: understands HTTP, so it can do path-based routing (/api/* → API servers, /static/* → CDN), sticky sessions, and canary deployments.

In a typical production deployment, load balancing happens at multiple layers simultaneously: for example, DNS-level steering at the edge, an L4 balancer spreading connections, and an L7 proxy routing requests to individual services.
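To make the L7 side concrete, here is a toy path-based reverse proxy in Node.js. The upstream addresses and listening port are invented, and a real deployment would reach for NGINX, HAProxy, Envoy, or a cloud load balancer rather than hand-rolling this.

```js
const http = require('http')

// Invented upstream pools, keyed by path prefix
const upstreams = {
  '/api/':    [{ host: '127.0.0.1', port: 3001 }, { host: '127.0.0.1', port: 3002 }],
  '/static/': [{ host: '127.0.0.1', port: 4000 }],
}

let counter = 0
function pickUpstream(url) {
  const prefix = Object.keys(upstreams).find((p) => url.startsWith(p)) || '/api/'
  const pool = upstreams[prefix]
  return pool[counter++ % pool.length]  // round robin within the matched pool
}

// An L7 proxy parses the HTTP request before choosing a backend,
// which is exactly what enables path-based routing
http.createServer((clientReq, clientRes) => {
  const target = pickUpstream(clientReq.url)
  const proxyReq = http.request(
    {
      host: target.host,
      port: target.port,
      path: clientReq.url,
      method: clientReq.method,
      headers: clientReq.headers,
    },
    (proxyRes) => {
      clientRes.writeHead(proxyRes.statusCode, proxyRes.headers)
      proxyRes.pipe(clientRes)
    }
  )
  proxyReq.on('error', () => {
    clientRes.writeHead(502)
    clientRes.end('Bad gateway')
  })
  clientReq.pipe(proxyReq)
}).listen(8080)
```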
Kubernetes users get load balancing built-in via kube-proxy and Service objects, with advanced L7 capabilities through an Ingress Controller (Nginx Ingress, Traefik, or Istio).
```yaml
# Kubernetes Service — round-robin across pods automatically
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
```

Load balancers solve many problems but introduce a few of their own.
Despite the gotchas, load balancing remains one of the most well-understood and reliable patterns in distributed systems. When in doubt, put a load balancer in front of it — your future self will thank you.