At its core, a load balancer is a traffic cop for your infrastructure. When thousands of HTTP requests arrive every second, a single server can only do so much — it has finite CPU, memory, and network bandwidth. A load balancer sits in front of a pool of servers and distributes incoming requests across them, ensuring no single machine becomes a bottleneck.
Think of it like a grocery store opening multiple checkout lanes. One cashier for 500 customers is a disaster. Ten cashiers, each handling a fair share, keeps the line moving.
Modern applications serve millions of users. A single VM, no matter how powerful, has hard ceilings. Load balancing enables horizontal scaling — instead of buying a bigger server (vertical scaling), you add more smaller servers behind a load balancer.
- Performance: requests spread across servers reduce individual latency.
- Resilience: unhealthy servers are removed from rotation automatically.
- Scalability: add new servers behind the LB with zero downtime.
Companies like Netflix, Cloudflare, and Google handle billions of requests daily. Their reliability is inseparable from sophisticated load balancing at every layer of their stack.
The algorithm determines which server a request goes to. There is no universally best choice — each fits different workloads.
Round Robin: requests are handed out sequentially (Server A, Server B, Server C, then back to A). Simple and effective when all servers have identical specs and request costs.
Weighted Round Robin: like Round Robin, but each server gets a weight. A server with weight 3 receives three times as many requests as one with weight 1. Useful for heterogeneous fleets.
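Both can be sketched in a few lines of JavaScript; the pool, host names, and weights below are made up for illustration.

```js
// Hypothetical pool; hosts and weights are illustrative only
const servers = [
  { host: 'app-1', weight: 3 },
  { host: 'app-2', weight: 1 },
  { host: 'app-3', weight: 1 },
]

// Plain round robin: a counter that wraps around the pool
let rrIndex = 0
function roundRobin(pool) {
  return pool[rrIndex++ % pool.length]
}

// Weighted round robin (simplest form): repeat each server by its weight,
// then round-robin over the expanded pool
const weightedPool = servers.flatMap((s) => Array(s.weight).fill(s))
const pickWeighted = () => roundRobin(weightedPool)
```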
Least Connections: the server with the fewest active connections gets the next request. Because it adapts dynamically to actual load, it suits long-lived connections like WebSockets.
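A sketch of the selection step, assuming the balancer keeps an in-memory activeConnections counter per server and a hypothetical forward() function that actually proxies the request:

```js
// Pick the server currently handling the fewest in-flight requests
function leastConnections(servers) {
  return servers.reduce((best, s) =>
    s.activeConnections < best.activeConnections ? s : best
  )
}

// The counter only works if it is updated on both dispatch and completion
async function dispatch(servers, forward) {
  const server = leastConnections(servers)
  server.activeConnections++
  try {
    return await forward(server)
  } finally {
    server.activeConnections--
  }
}
```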
IP Hash: a hash of the client IP determines the server, so the same client always hits the same server. Session persistence without sticky cookies.
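A minimal sketch: hash the client IP and map the digest onto the pool. Note that if the pool size changes, most clients get remapped; consistent hashing is the usual remedy, which is beyond this sketch.

```js
const crypto = require('crypto')

// Hash the client IP and map the digest onto the pool: the same IP
// always lands on the same server as long as the pool size is stable
function ipHash(clientIp, servers) {
  const digest = crypto.createHash('md5').update(clientIp).digest()
  return servers[digest.readUInt32BE(0) % servers.length]
}
```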
Random: a random healthy server is chosen. Surprisingly effective at scale, since the law of large numbers ensures an even distribution with no coordination overhead.
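The whole strategy fits in a filter and an array index; the healthy flag is assumed to be maintained by the health checks described later.

```js
// Pick uniformly at random among servers currently marked healthy
function randomChoice(servers) {
  const healthy = servers.filter((s) => s.healthy)
  return healthy[Math.floor(Math.random() * healthy.length)]
}
```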
Least Response Time: routes to the server with the lowest combination of active connections and response time. Needs active latency monitoring, but it makes the most informed decision of the algorithms here.
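One plausible scoring function multiplies in-flight connections by a moving average of response time; the exact combination is an illustration, not a standard formula, and would need tuning against real traffic.

```js
// Lower score wins; avgResponseTimeMs would come from latency monitoring
const score = (s) => (s.activeConnections + 1) * s.avgResponseTimeMs

function leastResponseTime(servers) {
  return servers.reduce((best, s) => (score(s) < score(best) ? s : best))
}
```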
| Algorithm | Pros | Cons | Best for |
|---|---|---|---|
| Round Robin | Simple, fair distribution | Ignores server load | Stateless, uniform requests |
| Least Connections | Adapts to actual load | Slightly more overhead | Long-lived connections |
| Weighted | Respects server capacity | Requires manual tuning | Heterogeneous servers |
| Random | Zero state, very fast | Can be unbalanced | Large identical fleets |
| IP Hash | Session persistence | Uneven if few IPs | Stateful sessions / caching |
A load balancer is only as good as its awareness of server health. Without health checks, it would happily route traffic to a crashed node, causing errors for users.
Health checks come in two flavours:
Passive: monitor real traffic. If a server returns too many 5xx errors, mark it unhealthy. Low overhead, but it reacts only after users are already affected.
Active: the load balancer periodically sends a synthetic probe (e.g. HTTP GET /health) to each server, detecting failure before real users are impacted.
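On the probing side, a minimal active checker could look like the sketch below. The /health path matches the endpoint shown next, but the interval, timeout, failure threshold, and pool contents are all assumptions.

```js
const http = require('http')

// Hypothetical pool; in practice this is the same list the balancer routes to
const pool = [
  { host: '10.0.0.11', port: 3000 },
  { host: '10.0.0.12', port: 3000 },
]

const PROBE_INTERVAL_MS = 5000  // how often to probe each server
const FAILURE_THRESHOLD = 3     // consecutive failures before marking unhealthy

function probe(server) {
  return new Promise((resolve) => {
    const req = http.get(
      { host: server.host, port: server.port, path: '/health', timeout: 2000 },
      (res) => {
        res.resume()                    // drain the body so the socket is freed
        resolve(res.statusCode === 200)
      }
    )
    req.on('timeout', () => { req.destroy(); resolve(false) })
    req.on('error', () => resolve(false))
  })
}

setInterval(async () => {
  for (const server of pool) {
    const ok = await probe(server)
    server.failures = ok ? 0 : (server.failures || 0) + 1
    server.healthy = server.failures < FAILURE_THRESHOLD
  }
}, PROBE_INTERVAL_MS)
```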
The /health endpoint should verify that the application is actually functional, not just that the process is running: check DB connectivity, cache reachability, and any critical dependencies. Here is a minimal health endpoint in Node.js:
```js
const express = require('express')
const app = express()

// db and redis are assumed to be clients initialised elsewhere in the app
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1')  // check DB
    await redis.ping()          // check cache
    res.json({ status: 'ok' })
  } catch (err) {
    res.status(503).json({ status: 'degraded', error: err.message })
  }
})
```

Load balancers operate at different layers of the OSI model, each with different capabilities and tradeoffs.
L4 — Transport: forwards traffic based on IP address and TCP/UDP port alone, without inspecting request content. Fast and protocol-agnostic, but blind to what the request actually asks for.

L7 — Application: understands HTTP, so it can do path-based routing (/api/* → API servers, /static/* → CDN), sticky sessions, and canary deployments.

In a typical production deployment, load balancing happens at multiple layers simultaneously: for example, DNS-level steering at the edge, an L4 balancer spreading connections, and an L7 proxy routing requests to individual services.
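To make the L7 side concrete, here is a toy path-based reverse proxy in Node.js. The upstream addresses and listening port are invented, and a real deployment would reach for NGINX, HAProxy, Envoy, or a cloud load balancer rather than hand-rolling this.

```js
const http = require('http')

// Invented upstream pools, keyed by path prefix
const upstreams = {
  '/api/':    [{ host: '127.0.0.1', port: 3001 }, { host: '127.0.0.1', port: 3002 }],
  '/static/': [{ host: '127.0.0.1', port: 4000 }],
}

let counter = 0
function pickUpstream(url) {
  const prefix = Object.keys(upstreams).find((p) => url.startsWith(p)) || '/api/'
  const pool = upstreams[prefix]
  return pool[counter++ % pool.length]  // round robin within the matched pool
}

// An L7 proxy parses the HTTP request before choosing a backend,
// which is exactly what enables path-based routing
http.createServer((clientReq, clientRes) => {
  const target = pickUpstream(clientReq.url)
  const proxyReq = http.request(
    {
      host: target.host,
      port: target.port,
      path: clientReq.url,
      method: clientReq.method,
      headers: clientReq.headers,
    },
    (proxyRes) => {
      clientRes.writeHead(proxyRes.statusCode, proxyRes.headers)
      proxyRes.pipe(clientRes)
    }
  )
  proxyReq.on('error', () => {
    clientRes.writeHead(502)
    clientRes.end('Bad gateway')
  })
  clientReq.pipe(proxyReq)
}).listen(8080)
```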
Kubernetes users get load balancing built-in via kube-proxy and Service objects, with advanced L7 capabilities through an Ingress Controller (Nginx Ingress, Traefik, or Istio).
```yaml
# Kubernetes Service — round-robin across pods automatically
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
```

Load balancers solve many problems but introduce a few of their own.
Despite the gotchas, load balancing remains one of the most well-understood and reliable patterns in distributed systems. When in doubt, put a load balancer in front of it — your future self will thank you.