
Kubernetes Networking: What Happens Between the Load Balancer and Your Pod?

A backend engineer's guide to K8s networking. Learn about Services, ClusterIP, NodePort, Ingress Controllers, and the Container Network Interface (CNI).


Kubernetes Networking for Backend Developers

Mental Model

Connecting isolated components into a resilient, scalable, and observable distributed system.

As a backend engineer, you usually stop thinking about a request once it hits the Load Balancer. In Kubernetes, that is just the beginning. Understanding the network hops between the Ingress and your code is critical for debugging latency and connection timeouts.

1. The Service Abstraction

graph LR
    Client[External Client] -->|HTTPS| LB[Cloud Load Balancer]
    LB --> Ingress[Ingress Controller]
    Ingress --> Service[ClusterIP Service]
    Service --> PodA[Pod A]
    Service --> PodB[Pod B]

Pods in K8s are ephemeral; they die and get new IP addresses, so you cannot point a client at a Pod IP. A Service gives clients a stable address in front of a changing set of pods:

  • ClusterIP: A stable internal IP that load balances traffic across a set of pods. It is only accessible within the cluster.
  • NodePort: Exposes the service on a specific port on every Node's IP.
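As a concrete sketch (the name orders-api and the port numbers are illustrative, not from the lesson), a minimal ClusterIP Service could look like this:

apiVersion: v1
kind: Service
metadata:
  name: orders-api          # hypothetical service name
spec:
  type: ClusterIP           # the default; internal-only virtual IP
  selector:
    app: orders-api         # selects the pods that receive traffic
  ports:
    - port: 80              # port clients inside the cluster call
      targetPort: 8080      # port the container actually listens on

kube-proxy (covered below) translates the Service's virtual IP into the current pod IPs behind it.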

2. The Ingress Controller (The Front Door)

The Ingress is an API object that manages external access, typically HTTP.

  • The Controller: A pod (like Nginx or Envoy) that actually implements the rules.
  • The Flow: External Client -> Cloud Load Balancer -> Ingress Controller -> Service -> Pod.
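A minimal Ingress resource for that flow might look like the sketch below (this assumes an NGINX ingress controller is installed; the hostname, path, and the orders-api Service are illustrative):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orders-ingress            # hypothetical name
spec:
  ingressClassName: nginx         # assumes an NGINX-based controller
  rules:
    - host: api.example.com       # illustrative hostname
      http:
        paths:
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: orders-api  # the ClusterIP Service sketched above
                port:
                  number: 80

The Ingress object only declares routing rules; the controller pod watches these objects and reconfigures its proxy accordingly.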

3. CNI: The Plumbing

The Container Network Interface (CNI) is the plugin that wires pods into the cluster network and allows them to talk to each other. Popular CNIs like Calico and Cilium use eBPF or iptables rules to route packets at near-native speed.

4. kube-proxy and traffic steering

Service routing is implemented by kube-proxy (or eBPF-based replacements), which programs node-level rules (iptables or IPVS) that translate the Service's virtual IP into real pod IPs.

Two important behaviors:

  • connection distribution is influenced by hashing and NAT rules
  • long-lived connections may stay pinned to specific backend pods

This is why scaling replicas does not always rebalance existing traffic immediately.

5. North-south vs east-west traffic

Kubernetes traffic has two broad classes:

  • North-south: external users entering the cluster (the LB/Ingress path)
  • East-west: service-to-service calls inside the cluster

Latency and policy controls differ between them. Most backend bottlenecks hide in east-west paths.

6. NetworkPolicy and zero-trust segmentation

By default, many clusters allow broad pod-to-pod communication.
Use NetworkPolicy to enforce least-privilege communication:

  • limit namespace/service access
  • block lateral movement risk
  • reduce blast radius during compromise

Security posture depends heavily on CNI support and policy enforcement mode.
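As a hedged sketch (the namespace, labels, and port are illustrative), a policy that only allows pods labeled app: frontend to reach the orders-api pods on port 8080 might look like this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-orders   # hypothetical name
  namespace: shop                  # illustrative namespace
spec:
  podSelector:
    matchLabels:
      app: orders-api              # the pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend        # only frontend pods may connect
      ports:
        - protocol: TCP
          port: 8080

Once a policy selects these pods, all other ingress traffic to them is dropped, which is why enforcement depends on the CNI actually supporting NetworkPolicy.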

7. Common latency and timeout causes

  • cross-zone traffic due to uneven pod scheduling
  • DNS resolution delays under CoreDNS load (see the dnsConfig sketch after this list)
  • connection tracking table pressure on busy nodes
  • ingress/controller misconfigured timeouts
  • sidecar + ingress + service hop amplification

Observability across each hop is required; without it you are tuning blindly.
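For the DNS item above, one common knob is lowering ndots in the pod spec so that short names trigger fewer search-domain lookups against CoreDNS (the value and image below are illustrative; whether this helps depends on your workload):

apiVersion: v1
kind: Pod
metadata:
  name: orders-api                 # hypothetical pod
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"                 # cluster default is 5; fewer search-domain expansions
  containers:
    - name: app
      image: registry.example.com/orders-api:1.0   # illustrative image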

8. Practical debugging workflow

  1. trace request path hop by hop
  2. compare ingress, service, and app latency metrics
  3. inspect retries and timeout mismatches between layers
  4. validate endpoint health and pod readiness states
  5. check CNI datapath drops and node-level saturation

This structured approach avoids "Kubernetes is slow" guesswork.

9. Design recommendations for backend teams

  • keep service dependencies explicit and shallow
  • align timeout budgets across ingress/service/client layers
  • prefer readiness probes that reflect actual app readiness
  • use topology-aware routing for zone-local traffic when possible (both are sketched below)

Networking reliability is part of application design, not only platform team ownership.
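A hedged sketch of the last two recommendations (names, image, probe path, and the annotation are illustrative; the topology annotation key varies by Kubernetes version):

# Readiness probe that reflects real application readiness, not just "process started".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
        - name: app
          image: registry.example.com/orders-api:1.0   # illustrative image
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /readyz        # hypothetical endpoint that checks real dependencies
              port: 8080
            periodSeconds: 5
---
# Topology-aware routing so kube-proxy prefers same-zone endpoints.
# Older releases use the annotation service.kubernetes.io/topology-aware-hints: "auto".
apiVersion: v1
kind: Service
metadata:
  name: orders-api
  annotations:
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: orders-api
  ports:
    - port: 80
      targetPort: 8080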

Summary

Understanding the hop count inside your K8s cluster is essential for P99 optimization. Every layer a request crosses (Ingress, Service, sidecar) adds its own slice of latency.


Engineering Standard: The "Staff" Perspective

In high-throughput distributed systems, the code we write is often the easiest part. The difficulty lies in how that code interacts with other components in the stack.

1. Data Integrity and The "P" in CAP

Whenever you are dealing with state (Databases, Caches, or In-memory stores), you must account for Network Partitions. In a standard Java microservice, we often choose Availability (AP) by using Eventual Consistency patterns. However, for financial ledgers, we must enforce Strong Consistency (CP), which usually involves distributed locks (Redis Redlock or Zookeeper) or a strictly linearizable sequence.

2. The Observability Pillar

Writing logic without observability is like flying a plane without a dashboard. Every production service must implement:

  • Tracing (OpenTelemetry): Track a single request across 50 microservices.
  • Metrics (Prometheus): Monitor Heap usage, Thread saturation, and P99 latencies.
  • Structured Logging (ELK/Splunk): Never log raw strings; use JSON so you can query logs like a database.
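A minimal sketch of the structured-logging point (hand-building JSON with Jackson here only to show the shape; in real services this is usually handled by the logging framework's JSON encoder, and the field names are illustrative):

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

public final class StructuredLog {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Emits one JSON object per log line so log systems can index and query fields.
    public static void main(String[] args) throws Exception {
        String line = MAPPER.writeValueAsString(Map.of(
                "level", "INFO",
                "service", "orders-api",      // hypothetical service name
                "event", "order_created",
                "orderId", 12345,
                "latencyMs", 42));
        System.out.println(line);             // e.g. {"level":"INFO","service":"orders-api",...}
    }
}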

3. Production Incident Prevention

To survive a 3:00 AM incident, we use:

  • Circuit Breakers: Stop the bleeding if a downstream service is down.
  • Bulkheads: Isolate thread pools so one failing endpoint doesn't crash the entire app.
  • Retries with Exponential Backoff: Avoid the "Thundering Herd" problem when a service comes back online.
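A minimal sketch of the retry-with-backoff-and-jitter idea in plain Java (the lesson mentions Resilience4j, which provides this out of the box; the delays and cap below are illustrative):

import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

public final class Backoff {

    // Retries a call with exponential backoff plus random jitter so that clients
    // do not stampede a recovering downstream service in lockstep.
    public static <T> T retryWithBackoff(Callable<T> call, int maxAttempts) throws Exception {
        long baseDelayMs = 100;                                            // illustrative base delay
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e;                                               // out of attempts: surface the error
                }
                long exponential = baseDelayMs * (1L << (attempt - 1));    // 100, 200, 400, ...
                long jitter = ThreadLocalRandom.current().nextLong(exponential + 1);
                Thread.sleep(Math.min(exponential + jitter, 5_000));       // cap the total sleep
            }
        }
    }
}

Circuit breakers and bulkheads wrap the same call sites; in practice all three usually come from one resilience library rather than hand-rolled loops.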

Critical Interview Nuance

When an interviewer asks you about this topic, don't just explain the code. Explain the Trade-offs. A Staff Engineer is someone who knows that every architectural decision is a choice between two "bad" outcomes. You are picking the one that aligns with the business goal.

Performance Checklist for High-Load Systems:

  1. Minimize Object Creation: Use primitive arrays and reusable buffers.
  2. Batching: Group 1,000 small writes into 1 large batch to save I/O cycles.
  3. Async Processing: If the user doesn't need the result immediately, move it to a Message Queue (Kafka/SQS).

Technical Trade-offs: Messaging Systems

Pattern                      | Ordering               | Durability | Throughput | Complexity
Log-based (Kafka)            | Strict (per partition) | High       | Very High  | High
Memory-based (Redis Pub/Sub) | None                   | Low        | High       | Very Low
Push-based (RabbitMQ)        | Fair                   | Medium     | Medium     | Medium

Key Takeaways

  • ClusterIP: A stable internal IP that load balances traffic across a set of pods. It is only accessible within the cluster.
  • NodePort: Exposes the service on a specific port on every Node's IP.
  • The Ingress Controller: A pod (like Nginx or Envoy) that actually implements the routing rules.

Verbal Interview Script

Interviewer: "How would you ensure high availability and fault tolerance for this specific architecture?"

Candidate: "To achieve 'Five Nines' (99.999%) availability, we must eliminate all Single Points of Failure (SPOF). I would deploy the API Gateway and stateless microservices across multiple Availability Zones (AZs) behind an active-active load balancer. For the data layer, I would use asynchronous replication to a read-replica in a different region for disaster recovery. Furthermore, it's not enough to just deploy redundantly; we must protect the system from cascading failures. I would implement strict timeouts, retry mechanisms with exponential backoff and jitter, and Circuit Breakers (using a library like Resilience4j) on all synchronous network calls between microservices."
