Best Practices for Kubernetes Namespace Management

To set up guardrails for a new team, you need a “pincer maneuver”: a LimitRange keeps each individual container within sane bounds, and a ResourceQuota ensures the team doesn’t accidentally run up a massive cloud bill or starve other departments of resources.

Here is the blueprint for a “standard” production-ready namespace.


1. Create the Namespace

Always isolate teams into their own namespaces. This is the foundation for all security and resource policies.

Bash

kubectl create namespace team-alpha

2. Apply the LimitRange (Individual Pod Rules)

This prevents “naked” Pods (Pods without resource definitions) and stops developers from requesting a 16-core CPU for a simple web server.

YAML

apiVersion: v1
kind: LimitRange
metadata:
  name: team-alpha-limits
  namespace: team-alpha
spec:
  limits:
  - type: Container
    default:           # Applied if the developer omits 'limits'
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:    # Applied if the developer omits 'requests'
      cpu: "100m"
      memory: "256Mi"
    max:               # Rejects containers larger than this
      cpu: "2"
      memory: "2Gi"
    min:               # Rejects containers smaller than this
      cpu: "10m"
      memory: "64Mi"

3. Apply the ResourceQuota (Total Team Budget)

This is your safety net for the cluster’s aggregate capacity.

YAML

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "4"              # Total CPU requests allowed in the namespace
    requests.memory: "8Gi"         # Total RAM requests allowed
    limits.cpu: "10"               # Max burst CPU across all Pods
    limits.memory: "16Gi"          # Max burst RAM across all Pods
    pods: "20"                     # Max number of Pods allowed
    services: "10"                 # Max number of Services
    persistentvolumeclaims: "5"    # Max number of storage claims

4. Best Practices for Onboarding

  • Enforce PriorityClasses: If this team’s work is non-critical (like a dev environment), you can add a scope to the ResourceQuota so it only counts against “low priority” workloads (see the example after this list).
  • Storage Quotas: Don’t forget to limit requests.storage. Without it, a team could provision dozens of expensive 1TB cloud disks.
  • The “Admission Refused” Talk: Warn your developers! If they try to deploy a Pod that exceeds these limits, Kubernetes will return an error immediately: Error creating: pods "my-pod" is forbidden: maximum cpu usage per Container is 2, but limit is 4.
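
As a sketch of the PriorityClass scoping mentioned above, a second quota can be restricted to workloads that reference a hypothetical low-priority class (the names and figures below are illustrative, not prescriptive):

YAML

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-low-priority-quota    # Illustrative name
  namespace: team-alpha
spec:
  hard:
    pods: "10"
    requests.cpu: "2"
    requests.memory: "4Gi"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["low-priority"]           # Hypothetical PriorityClass name

For the storage guardrail, a line such as requests.storage: "100Gi" can simply be added to the hard block of the main team-alpha-quota.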

The “Dry Run” Check

Before handing over the keys, you can test if your guardrails work by trying to run a “greedy” pod:

Bash

kubectl run greedy-pod --image=nginx --namespace=team-alpha --requests='cpu=5'

If set up correctly, Kubernetes should instantly block this command.
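
Note that the --requests flag has been removed from kubectl run in recent releases (v1.24+), so on newer clusters a more reliable check is to apply a throwaway manifest; greedy-pod and the figures below are just illustrative:

Bash

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: greedy-pod
  namespace: team-alpha
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "5"        # Well over the LimitRange max of 2 and the quota budget of 4
      limits:
        cpu: "5"
EOF

If the guardrails are working, the apply is rejected at admission time, and kubectl describe quota -n team-alpha shows how much of the team’s budget is already consumed.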

Java and Machine Learning (ML) workloads are notoriously “bursty” and memory-heavy. Java’s JVM needs specific heap settings to respect container limits, and ML models can easily OOM (Out of Memory) a node if they aren’t strictly caged.

For these workloads, your guardrails need to be more generous with memory but stricter on CPU to prevent “noisy neighbor” syndrome.


Guardrails for Java & ML

1. The Java-Optimized LimitRange

Java applications often require a larger gap between request and limit because the JVM performs intensive garbage collection and startup routines that spike CPU.

YAML

apiVersion: v1
kind: LimitRange
metadata:
  name: java-ml-limits
  namespace: team-alpha
spec:
  limits:
  - type: Container
    default:
      cpu: "1"            # Java needs more than 500m to start efficiently
      memory: "2Gi"       # Base for standard JVM/Spring Boot apps
    defaultRequest:
      cpu: "500m"
      memory: "1Gi"
    max:
      cpu: "4"            # Cap for ML training or high-perf Java
      memory: "16Gi"      # Heavy ML models/large heaps need room
    min:
      cpu: "100m"
      memory: "512Mi"     # Anything lower often causes JVM startup failures

2. The ResourceQuota (Aggregated)

Since ML and Java use significantly more RAM than Go or Node.js apps, your total namespace quota needs to reflect that.

YAML

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "16"
    requests.memory: "64Gi"
    limits.cpu: "32"
    limits.memory: "128Gi"
    # If the team uses GPUs for ML, you MUST include them here:
    requests.nvidia.com/gpu: "4"

Crucial Tips for Java & ML Teams

The JVM “Cgroup” Awareness

In older versions of Java, the JVM didn’t detect that it was running in a container and would size itself against the memory of the entire Node rather than the container’s limit, leading to instant OOM kills.

  • Ensure the team is using Java 11+ or Java 8u191+.
  • Advise them to use the flag -XX:+UseContainerSupport (usually on by default now).
  • They should set -XX:MaxRAMPercentage=75.0 rather than hardcoding heap sizes with -Xmx. This lets the Java heap scale automatically if you change the LimitRange later (see the Deployment sketch after this list).
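
As a minimal sketch, assuming a hypothetical Spring Boot service called orders-api, those flags can be injected through JAVA_TOOL_OPTIONS so the heap always tracks whatever limit the LimitRange assigns:

YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api                 # Hypothetical Java service
  namespace: team-alpha
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
      - name: app
        image: registry.example.com/orders-api:1.0    # Placeholder image
        env:
        - name: JAVA_TOOL_OPTIONS
          # Let the JVM size its heap from the container limit instead of a hardcoded -Xmx
          value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0"
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1"
            memory: "2Gi"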

ML & “Spot” Instances

If the ML team is doing training (not just inference), their Pods will be expensive.

  • Guardrail: Use PriorityClasses. Create a low-priority class for training jobs (see the sketch below).
  • If the cluster runs out of room for a production Java API, Kubernetes will preempt (evict) the low-priority training job first to keep the API running.
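
A minimal sketch of that pattern; the class name low-priority, its value, and the train-model Job are all illustrative:

YAML

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority               # Hypothetical name; PriorityClasses are cluster-scoped
value: -10                         # Below the default of 0, so these Pods are preempted first
globalDefault: false
description: "Best-effort ML training jobs; evict these before production APIs."
---
# Training Jobs then reference the class in their Pod template:
apiVersion: batch/v1
kind: Job
metadata:
  name: train-model                # Hypothetical training job
  namespace: team-alpha
spec:
  template:
    spec:
      priorityClassName: low-priority
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/trainer:latest    # Placeholder image
        resources:
          requests:
            cpu: "2"
            memory: "8Gi"
          limits:
            cpu: "4"
            memory: "16Gi"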

Local Storage for ML

ML workloads often download massive datasets (ImageNet, etc.).

  • A standard LimitRange doesn’t stop a Pod from filling up the Node’s local disk with emptyDir volumes.
  • Add to Quota: requests.ephemeral-storage: "50Gi" prevents a single ML container from crashing the Node by filling its disk (see the sketch below).
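
A minimal sketch of that quota, assuming 50Gi of aggregate scratch space is a reasonable budget (the name team-alpha-scratch-quota is illustrative; you could also fold these keys into the main quota):

YAML

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-scratch-quota
  namespace: team-alpha
spec:
  hard:
    requests.ephemeral-storage: "50Gi"     # Aggregate scratch-disk requests across the namespace
    limits.ephemeral-storage: "100Gi"      # Aggregate scratch-disk burst ceiling

Individual Pods can additionally set a sizeLimit on their emptyDir volumes so one dataset download can’t consume the entire budget.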
