Autoscaling in Kubernetes

Autoscaling in Kubernetes is the process of automatically adjusting your resources to match the current demand. Instead of guessing how many servers or how much memory you need, Kubernetes monitors your traffic and “flexes” the infrastructure in real-time.

There are three main “layers” of autoscaling. Think of them as a chain: if one layer can’t handle the load, the next one kicks in.


1. Horizontal Pod Autoscaler (HPA)

The Concept: Adding more “lanes” to the highway.

HPA is the most common form of scaling. It increases or decreases the number of pod replicas based on metrics like CPU usage, memory, or custom traffic data.

  • How it works: It checks your pods every 15 seconds. If the average CPU across all pods is above your target (e.g., 70%), it tells the Deployment to spin up more pods.
  • Best for: Stateless services like web APIs or microservices that can handle traffic by simply having more copies running.

2. Vertical Pod Autoscaler (VPA)

The Concept: Making the “cars” bigger.

VPA doesn’t add more pods; instead, it looks at a single pod and decides if it needs more CPU or Memory. It “right-sizes” your containers.

  • How it works: It observes your app’s actual usage over time. If a pod is constantly hitting its memory limit, VPA will recommend (or automatically apply) a higher limit.
  • The Catch: Changing a pod’s resource allocation has traditionally required restarting the pod, although newer Kubernetes releases are gradually adding in-place resize.
  • Best for: Stateful apps (like databases) that can’t easily be “split” into multiple copies, or apps where you aren’t sure what the resource limits should be.

3. Cluster Autoscaler (CA)

The Concept: Adding more “pavement” to the highway.

HPA and VPA scale Pods, but eventually, you will run out of physical space on your worker nodes (VMs). This is where the Cluster Autoscaler comes in.

  • How it works: It watches for “Pending” pods—pods that want to run but can’t because no node has enough free CPU/RAM. When it sees this, it calls your cloud provider (AWS, Azure, GCP) and asks for a new VM to be added to the cluster.
  • Downscaling: It also watches for underutilized nodes. If a node is mostly empty, it will move those pods elsewhere and delete the node to save money.

The “Scaling Chain” in Action

Imagine a sudden surge of users hits your website:

  1. HPA sees high CPU usage and creates 10 new Pods.
  2. The cluster is full, so those 10 Pods stay in Pending status.
  3. Cluster Autoscaler sees the Pending pods and provisions 2 new Worker Nodes.
  4. The Pods finally land on the new nodes, and your website stays online.

Comparison Summary

| Feature | HPA | VPA | Cluster Autoscaler |
|---|---|---|---|
| What it scales | Number of Pods | Size of Pods (CPU/RAM) | Number of Nodes (VMs) |
| Primary Goal | Handle traffic spikes | Optimize resource efficiency | Provide hardware capacity |
| Impact | Fast, no downtime | Usually requires pod restart | Slower (minutes to boot a VM) |

Pro-Tip: Never run HPA and VPA on the same metric (like CPU) for the same app. They will “fight” each other—HPA will try to add pods while VPA tries to make them bigger, leading to a “flapping” state where your app is constantly restarting.

To set up a Horizontal Pod Autoscaler (HPA), you need two things: a Deployment (your app) and an HPA resource that watches it.

Here is a breakdown of how to configure this in a way that actually works.

1. The Deployment

First, your pods must have resources.requests defined. If the HPA doesn’t know how much CPU a pod should use, it can’t calculate the percentage.

YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m  # HPA uses this as the baseline

2. The HPA Resource

This YAML tells Kubernetes: “Keep the average CPU usage of these pods at 50%. If it goes higher, spin up more pods (up to 10). If it goes lower, scale back down to 1.”

YAML

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

3. How to Apply and Test

You can apply these using oc apply -f <filename>.yaml (in OpenShift) or kubectl apply.

Once applied, you can watch the autoscaler in real-time:

  • View status: oc get hpa
  • Watch it live: oc get hpa php-apache-hpa --watch

The Calculation Logic:

The HPA uses a specific formula to decide how many replicas to run:

$$\text{Desired Replicas} = \lceil \text{Current Replicas} \times \frac{\text{Current Metric Value}}{\text{Desired Metric Value}} \rceil$$
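As a sketch, the same formula in Python (the real controller also applies a tolerance, roughly 10% by default, before acting on small deviations):

```python
import math

def desired_replicas(current: int, current_metric: float, target_metric: float) -> int:
    """HPA core formula: scale replicas in proportion to metric overshoot."""
    return math.ceil(current * (current_metric / target_metric))

# 5 pods averaging 75% CPU against a 50% target -> ceil(5 * 1.5) = 8 pods
print(desired_replicas(5, 75, 50))  # 8
```

The ceiling rounds up, so the autoscaler always errs on the side of slightly more capacity rather than slightly less.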

Quick Tip: If you are using OpenShift, you can also do this instantly via the CLI without a YAML file:

oc autoscale deployment/php-apache --cpu-percent=50 --min=1 --max=10

To make your autoscaling more robust, you can combine CPU and Memory metrics in a single HPA. Kubernetes will look at both and scale based on whichever one hits the limit first.

Here is the updated YAML including both resource types and a “Scale Down” stabilization period to prevent your cluster from “flapping” (rapidly adding and removing pods).

1. Advanced HPA YAML (CPU + Memory)

YAML

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: advanced-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: advanced-app
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 mins before scaling down to ensure traffic is actually gone
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60

2. Scaling on Custom Metrics (e.g., HTTP Requests)

Sometimes CPU doesn’t tell the whole story. If your app is waiting on a database, CPU might stay low while users experience lag. In these cases, you can scale based on Requests Per Second (RPS).

To use this, you must have the Prometheus Adapter installed (which comes standard in OpenShift’s monitoring stack).

YAML

  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 500 # Scale up if pods average more than 500 requests/sec


Pro-Tips for Memory Scaling

  1. Memory is “Sticky”: Unlike CPU, which drops the moment a process finishes, many runtimes (like Java/JVM or Node.js) do not immediately release memory back to the OS.
  2. The Danger: If your app doesn’t have a good Garbage Collector configuration, the HPA might see high memory usage, spin up 10 pods, and never scale back down because the memory stays “reserved” by the app.
  3. The Fix: Always ensure your memory.requests in the Deployment are set to what the app actually needs to start, not its peak limit.

Summary Table: Which metric to use?

| Scenario | Recommended Metric | Why? |
|---|---|---|
| Calculation heavy | CPU | Directly maps to processing power. |
| Caching/Large Data | Memory | Prevents OOM (Out of Memory) kills. |
| Web APIs | Requests Per Second | Scales based on actual user load. |
| Message Queue | Queue Depth | Scales based on “work to be done.” |

When an HPA isn’t behaving as expected—maybe it’s not scaling up during a spike, or it’s “stuck” at the minimum replicas—you need to look at the Controller Manager’s internal logic.

Here is how you can perform a “health check” on your HPA’s decision-making process.


1. The “Describe” Command (Most Useful)

The describe command provides a chronological log of every scaling action and, more importantly, why a request failed.

Bash

oc describe hpa advanced-app-hpa

What to look for in the “Events” section:

  • SuccessfulRescale: The HPA successfully changed the replica count.
  • FailedComputeMetricsReplicas: Usually means the HPA can’t talk to the Metrics Server (check if your pods have resources.requests defined!).
  • FailedGetResourceMetric: The pods might be crashing or “Unready,” so the HPA can’t pull their CPU/Memory usage.

2. Checking the “Conditions”

In the output of the describe command, look for the Conditions section. It tells you the current “brain state” of the autoscaler:

| Condition | Status | Meaning |
|---|---|---|
| AbleToScale | True | The HPA is healthy and can talk to the Deployment. |
| ScalingActive | True | Metrics are being received and scaling logic is running. |
| ScalingLimited | True | Warning: you’ve hit maxReplicas or minReplicas. It wants to scale further but you’ve capped it. |

3. Real-time Metric Monitoring

If you want to see exactly what numbers the HPA is seeing right now compared to your target, use:

Bash

oc get hpa advanced-app-hpa -w

Example Output:

Plaintext

NAME               REFERENCE                 TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
advanced-app-hpa   Deployment/advanced-app   75%/60%, 40%/80%   2         15        5          10m

In this example, CPU is at 75% (above the 60% target), so it has already scaled to 5 replicas.


4. Debugging Common “Stuck” Scenarios

Scenario A: Target shows “<unknown>”

If the TARGETS column shows <unknown>, it almost always means:

  1. Missing Requests: You forgot to set resources.requests in your Deployment YAML.
  2. Metrics Server Down: The cluster-wide metrics service is having issues.
  3. Labels Mismatch: The HPA selector doesn’t match the Deployment labels.

Scenario B: High CPU but No Scaling

Check if the pods are in a Ready state. HPA ignores “Unready” pods to prevent scaling up based on the high CPU usage often seen during a container’s startup/boot phase.


Pro-Tip: The “Cooldown” Period

If you just stopped a load test and the pods are still running, don’t panic! By default, Kubernetes has a 5-minute stabilization window for scaling down. This prevents the “Flapping” effect where pods are deleted and then immediately recreated because of a small traffic blip.

How to audit an MCP server

Auditing a Model Context Protocol (MCP) server in 2026 requires a shift from traditional web auditing to Agentic Security Auditing. Since an LLM acts as the user of your server, you must audit not just the code, but the “instructions” and “boundaries” presented to the AI.

Here is the professional workflow for conducting a comprehensive MCP server audit.


1. Discovery & Tool Inspection

The first step is to see exactly what the AI sees. A malicious agent or a prompt injection can only exploit what is exposed in the tool definitions.

  • Use the MCP Inspector: Run npx @modelcontextprotocol/inspector to launch a local GUI. Connect your server and inspect the Tools tab.
  • Audit Tool Descriptions: Check if the descriptions are too “helpful.”
    • Bad: “This tool runs any bash command.”
    • Good: “This tool lists files in the /public directory only.”
  • Schema Strictness: Ensure every tool uses strict JSON Schema. AI agents are prone to “hallucinating” extra arguments; your server should reject any input that doesn’t perfectly match the schema.
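A minimal sketch of that strictness in Python. The schema dict here is a hypothetical stand-in for a full JSON Schema validator with additionalProperties: false; the parameter names are illustrative:

```python
def validate_args(args: dict, schema: dict) -> dict:
    """Reject hallucinated extras and wrong types; only declared params pass.

    `schema` maps parameter name -> expected Python type (a simplified
    stand-in for strict JSON Schema validation)."""
    extra = set(args) - set(schema)
    if extra:
        # An agent invented an argument the tool never declared.
        raise ValueError(f"unknown arguments: {sorted(extra)}")
    for name, typ in schema.items():
        if name not in args:
            raise ValueError(f"missing argument: {name}")
        if not isinstance(args[name], typ):
            raise TypeError(f"{name} must be {typ.__name__}")
    return args
```

Rejecting rather than silently dropping unknown arguments matters: a dropped argument hides the fact that the model is confused about the tool’s contract.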

2. Static Analysis (The “Code” Audit)

Since most MCP servers are written in TypeScript or Python, use standard security scanners with MCP-specific rules.

  • Dependency Check: Use npm audit or pip-audit. MCP is a new ecosystem; many early community servers use outdated, vulnerable libraries.
  • Path Traversal Check: This is the #1 vulnerability in MCP (found in 80% of filesystem-based servers).
    • Audit Task: Search your code for fs.readFile or open(). Ensure user-provided paths are sanitized using path.resolve and checked against a “Root” directory.
  • Command Injection: If your tool executes shell commands (e.g., a Git or Docker tool), ensure inputs are passed as arrays, never as strings.
    • Vulnerable: exec("git log " + user_input)
    • Secure: spawn("git", ["log", user_input])
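The same pattern in Python uses `subprocess` with an argument vector; `echo` stands in here just to show the metacharacter staying inert:

```python
import subprocess

def run_tool(argv: list) -> str:
    """Run a CLI tool from an argument vector; never build a shell string."""
    return subprocess.run(argv, capture_output=True, text=True).stdout

# The ";" is plain data here: echo prints it, and nothing after it executes.
print(run_tool(["echo", "hello; rm -rf /"]))
```

With a string passed to a shell, the same input would have been split at the `;` and executed as a second command.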

3. Runtime & Behavioral Auditing

In 2026, we use eBPF-based monitoring or MCP Gateways to watch what the server actually does during a session.

  • Sandbox Verification: Run the server in a restricted Docker container. Audit the Dockerfile to ensure it runs as a non-root user (USER node or USER python).
  • Network Egress Audit: Does your server need to talk to the internet? If it’s a “Local File” tool, use firewall rules (or Docker network flags) to block all outgoing traffic. This prevents “Data Exfiltration” where an AI is tricked into sending your files to a remote server.
  • AIVSS Scoring: Use the AI Vulnerability Scoring System (AIVSS) to rank findings. A “Prompt Injection” that leads to a file read is a High; a “Prompt Injection” that leads to a shell execution is Critical.

4. The 2026 Audit Checklist

If you are performing a formal audit, ensure you can check “Yes” to all of the following:

| Category | Audit Check |
|---|---|
| Authentication | Does the server require a token for every request (especially for HTTP transports)? |
| Sanitization | Are all LLM-generated arguments validated against a regex or allowlist? |
| Least Privilege | Does the server only have access to the specific folders/APIs it needs? |
| Human-in-Loop | Are “Write” or “Delete” actions flagged to require manual user approval in the client? |
| Logging | Does the server log the User ID, Tool Name, and Arguments for every call? |

5. Automated Auditing Tools

To speed up the process, you can use these 2026-standard tools:

  1. mcpserver-audit: A GitHub-hosted tool that scans MCP source code for common dangerous patterns (like unparameterized SQL or open shell calls).
  2. Trivy / Docker Scout: For scanning the container image where your MCP server lives.
  3. Semgrep (MCP Ruleset): Use specialized Semgrep rules designed to find “AI Injection” points in Model Context Protocol implementations.

Multi-Layered Test Plan

To perform a professional audit of an MCP server in 2026, you should follow a Multi-Layered Test Plan. Since MCP servers act as “Resource Servers” in an agentic ecosystem, your audit must verify that a compromised or malicious AI cannot “break out” of its intended scope.

Here is a 5-step Security Test Plan for an MCP server.


1. Static Analysis: “The Code Review”

Before running the server, scan the source code for common “agent-trap” patterns.

  • Check for shell=True (Python) or exec() (Node.js): These are the most common entry points for Remote Code Execution (RCE).
    • Test: Ensure all CLI tools use argument arrays instead of string concatenation.
  • Path Traversal Audit: Look for any tool that takes a path or filename as an argument.
    • Test: Verify that the code uses path.resolve() and checks if the resulting path starts with an allowed root directory.
    • Common Fail: Using simple string .startsWith() without resolving symlinks first (CVE-2025-53109).
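A sketch of the resolve-then-check pattern in Python; the ROOT path is an assumption for illustration:

```python
import os

ROOT = "/srv/mcp-files"  # hypothetical allowed root for this server

def safe_resolve(user_path: str, root: str = ROOT) -> str:
    """Resolve '..' and symlinks FIRST, then verify containment."""
    real_root = os.path.realpath(root)
    resolved = os.path.realpath(os.path.join(root, user_path))
    # commonpath avoids the naive startswith() pitfall:
    # "/srv/mcp-files-evil" starts with "/srv/mcp-files" but is outside it.
    if os.path.commonpath([resolved, real_root]) != real_root:
        raise PermissionError(f"path escapes root: {user_path}")
    return resolved
```

Because `realpath` resolves symlinks before the containment check, a symlink inside the root pointing at `/etc` no longer defeats the guard.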

2. Manifest & Metadata Audit

The LLM “sees” your server through its JSON-RPC manifest. If your tool descriptions are vague, the LLM might misuse them.

  • Tool Naming: Ensure tool names use snake_case (e.g., get_user_data) for optimal tokenization and clarity.
  • Prompt Injection Resilience: Check if tool descriptions include “Safety instructions.”
    • Example: “This tool reads files. Safety: Never read files ending in .env or .pem.”
  • Annotations: Verify that “destructive” tools (delete, update, send) are marked with destructiveHint: true. This triggers a mandatory confirmation popup in modern MCP clients like Cursor or Claude Desktop.

3. Dynamic “Fuzzing” (The AI Stress Test)

In 2026, we use tools like mcp-sec-audit to “fuzz” the server. This involves sending nonsensical or malicious JSON-RPC payloads to see how the server reacts.

| Test Scenario | Payload Example | Expected Result |
|---|---|---|
| Path Traversal | {"path": "../../../etc/passwd"} | 403 Forbidden or Error: Invalid Path |
| Command Injection | {"cmd": "ls; rm -rf /"} | The server should treat ; as a literal string, not a command separator. |
| Resource Exhaustion | Calling read_file 100 times in 1 second | Server should trigger Rate Limiting. |

4. Sandbox & Infrastructure Audit

An MCP server should never “run naked” on your host machine.

  • Docker Isolation: Audit the Dockerfile. It should use a distroless or minimal image (like alpine) and a non-root user.
  • Network Egress: Use iptables or Docker network policies to block the MCP server from reaching the internet unless its specific function requires it (e.g., a “Web Search” tool).
  • Memory/CPU Limits: Ensure the container has cpus: 0.5 and memory: 512mb limits to prevent a “Looping AI” from crashing your host.

5. OAuth 2.1 & Identity Verification

If your MCP server is shared over a network (HTTP transport), it must follow the June 2025 MCP Auth Spec.

  • PKCE Implementation: Verify that the server requires Proof Key for Code Exchange (PKCE) for all client connections. This prevents “Authorization Code Interception.”
  • Scope Enforcement: If a user only authorized the read_only scope, ensure the server rejects calls to delete_record even if the token is valid.
  • Audit Logging: Every tool call must be logged with:
    1. The user_id who initiated it.
    2. The agent_id that generated the call.
    3. The exact arguments used.
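The scope check and the three-field audit log above can be sketched together in Python; the tool names, scope mapping, and log sink (stdout here) are illustrative assumptions:

```python
import json
import time

# Hypothetical tool -> required-scope mapping for this server
TOOL_SCOPES = {"read_record": "read_only", "delete_record": "read_write"}

def call_tool(tool: str, args: dict, token_scopes: set, user_id: str, agent_id: str) -> str:
    required = TOOL_SCOPES[tool]
    if required not in token_scopes:
        # Valid token, insufficient scope: reject before touching the resource.
        raise PermissionError(f"token lacks scope {required!r} for {tool}")
    # Audit entry: who initiated it, which agent generated it, exact arguments.
    print(json.dumps({"ts": time.time(), "user": user_id,
                      "agent": agent_id, "tool": tool, "args": args}))
    return "ok"
```

Checking scope per tool call (not just per token) is what prevents a read_only token from reaching delete_record at all.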

Pro-Tooling for 2026

  • MCP Inspector: Use npx @modelcontextprotocol/inspector for a manual “sanity check” of your tools.
  • Snyk / Trivy: Run these against your MCP server’s repository to catch vulnerable 3rd-party dependencies.

Would you like me to help you write a “Safety Wrapper” in Python or TypeScript that automatically validates all file paths before your MCP server processes them?

MCP security

The Model Context Protocol (MCP) is a powerful “USB-C for AI,” but because it allows LLMs to execute code and access private data, it introduces unique security risks.

In 2026, security for MCP has moved beyond simple API keys to a Zero Trust architecture. Here are the best practices for securing your MCP implementation.


1. The “Human-in-the-Loop” (HITL) Requirement

The most critical defense is ensuring an AI never executes “side-effect” actions (writing, deleting, or sending data) without manual approval.

  • Tiered Permissions: Classify tools into read-only (safe) and sensitive (requires approval).
  • Explicit Confirmation: The MCP client must display the full command and all arguments to the user before execution. Never allow the AI to “hide” parameters.
  • “Don’t Ask Again” Risks: Avoid persistent “allowlists” for bash commands or file writes; instead, scope approvals to a single session or specific directory.
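As a sketch, the tiered-permission gate might look like this in Python; the tool classification and the approval callback are hypothetical:

```python
# Hypothetical split between safe and side-effect tools
DESTRUCTIVE = {"write_file", "delete_file", "send_email"}

def execute(tool: str, args: dict, approve) -> str:
    """Run a tool, routing side-effect tools through an approval callback.

    The callback receives the full tool name and all arguments, so
    nothing is hidden from the user at confirmation time."""
    if tool in DESTRUCTIVE and not approve(tool, args):
        return "denied"
    return f"ran {tool}"
```

In a real client, `approve` would render a confirmation dialog; scoping it per call, rather than caching a blanket "yes", is the point of the third bullet above.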

2. Secure Architecture & Isolation

Running an MCP server directly on your host machine is a major risk. If the AI is tricked into running a malicious command, it has the same permissions as you.

  • Containerization: Always run MCP servers in a Docker container or a WebAssembly (Wasm) runtime. This prevents “Path Traversal” attacks where an AI might try to read your ~/.ssh/ folder.
  • Least Privilege: Use a dedicated, unprivileged service account to run the server. If the tool only needs to read one folder, do not give it access to the entire drive.
  • Network Egress: Block the MCP server from accessing the public internet unless it’s strictly necessary for that tool’s function.

3. Defense Against Injection Attacks

MCP is vulnerable to Indirect Prompt Injection, where a malicious instruction is hidden inside data the AI reads (like a poisoned webpage or email).

  • Tool Description Sanitization: Attackers can “poison” tool descriptions to trick the AI into exfiltrating data. Regularly audit the descriptions of third-party MCP servers.
  • Input Validation: Treat all inputs from the LLM as untrusted. Use strict typing (Pydantic/Zod) and regex patterns to ensure the AI isn’t passing malicious flags to a bash command.
  • Semantic Rate Limiting: Use an MCP Gateway to kill connections if an agent attempts to call a “Read File” tool hundreds of times in a few seconds—a classic sign of data exfiltration.
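A minimal sliding-window limiter sketch in Python; per-tool counting with illustrative thresholds, not a production gateway:

```python
import collections
import time
from typing import Optional

class ToolRateLimiter:
    """Deny a tool once it exceeds max_calls within the last window_s seconds."""

    def __init__(self, max_calls: int, window_s: float):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = collections.defaultdict(collections.deque)

    def allow(self, tool: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        window = self.calls[tool]
        while window and now - window[0] > self.window_s:
            window.popleft()  # drop timestamps older than the window
        if len(window) >= self.max_calls:
            return False  # e.g. a burst of read_file calls: the exfiltration pattern
        window.append(now)
        return True
```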

4. Identity & Authentication (2026 Standards)

For remote or enterprise MCP setups, static API keys are no longer sufficient.

  • OAuth 2.1 + PKCE: This is the mandated standard for HTTP-based MCP. It ensures that tokens are bound to specific users and cannot be easily intercepted.
  • Token Scoping: Never use a single “Master Key.” Issue short-lived tokens that are scoped only to the specific MCP tools the user needs.
  • Separation of Roles: Keep your Authorization Server (which identifies the user) separate from your Resource Server (the MCP server). This makes auditing easier and prevents a breach of one from compromising the other.

5. Supply Chain Security

The “Rug Pull” is a common 2026 threat where a popular open-source MCP server is updated with malicious code (e.g., a BCC field added to an email tool).

  • Pin Versions: Never pull the latest version of an MCP server in production. Pin to a specific, audited version or hash.
  • Vetted Registries: Only use servers from trusted sources like the Official MCP Catalog or internally vetted company registries.
  • Audit Logs: Log every tool invocation, including who requested it, what the arguments were, and what the output was.

Summary Checklist for Developers

| Risk | Mitigation |
|---|---|
| Data Exfiltration | Disable network access for local tools; use PII redaction. |
| Command Injection | Use argument arrays (parameterized) instead of shell strings. |
| Unauthorized Access | Implement OAuth 2.1 with scope-based tool control. |
| Lateral Movement | Sandbox servers in Docker/Wasm; limit filesystem access. |

MCP + Kubernetes Management

Here’s a breakdown of this topic across all three domains:


MCP + Kubernetes Management

What it looks like: An LLM agent connects to a Kubernetes MCP server that exposes kubectl operations as tools. The agent can then:

  • list_pods(namespace) → find failing pods
  • get_pod_logs(pod, namespace) → fetch logs
  • describe_deployment(name) → inspect rollout status
  • scale_deployment(name, replicas) → auto-scale
  • apply_manifest(yaml) → deploy changes

Real implementations:

  • kubectl-ai — natural language to kubectl commands
  • Robusta — AI-powered Kubernetes troubleshooting with MCP support
  • k8s-mcp-server — open-source MCP server wrapping the Kubernetes API
  • OpenShift + ACM — Red Hat is building AI-assisted cluster management leveraging MCP for tool standardization

Example agent workflow:

User: “Why is the payments service degraded?”

Agent → list_pods(namespace="payments")
      → get_pod_logs(pod="payments-7f9b", tail=100)
      → describe_deployment("payments")
      → LLM reasons: "OOMKilled — memory limit too low"
      → Proposes: patch_deployment(memory_limit="1Gi")
      → HITL: "Approve this change?" → Engineer approves
      → apply_patch() → monitors rollout → confirms healthy


MCP + Terraform Pipelines

What it looks like: A Terraform MCP server exposes infrastructure operations. The agent can plan, review, and apply infrastructure changes conversationally.

MCP tools exposed:

  • terraform_plan(module, vars) → generate and review a plan
  • terraform_apply(plan_id) → apply approved changes
  • terraform_state_show(resource) → inspect current state
  • terraform_output(name) → read output values
  • detect_drift() → compare actual vs declared state

Key use cases:

  • Drift detection agent: continuously checks for infrastructure drift and auto-raises PRs to correct it
  • Cost optimization agent: analyzes Terraform state, identifies oversized resources, proposes rightsizing
  • Compliance agent: scans Terraform plans against OPA/Sentinel policies before apply
  • PR review agent: reviews Terraform PRs, flags security misconfigs, suggests improvements

Example pipeline:

PR opened with Terraform changes
       │
       ▼
MCP Terraform Agent
  ├── terraform_plan() → generates plan
  ├── scan_security(plan) → checks for open security groups, no encryption
  ├── estimate_cost(plan) → computes monthly cost delta
  ├── LLM summarizes: “This adds an unencrypted S3 bucket costing ~$12/mo”
  └── Posts review comment to PR with findings + recommendations


📊 MCP + Infrastructure Observability

What it looks like: Observability tools (Prometheus, Grafana, Loki, Datadog) are wrapped as MCP servers. The agent queries them in natural language and correlates signals across tools autonomously.

MCP tools exposed:

  • query_prometheus(promql, time_range) → fetch metrics
  • search_logs(query, service, time_range) → Loki/Elasticsearch
  • get_traces(service, error_only) → Jaeger/Tempo
  • list_active_alerts() → current firing alerts
  • get_dashboard(name) → Grafana snapshot
  • create_annotation(text, time) → mark events on dashboards

Key use cases:

  • Natural language observability: “Show me error rate for the checkout service in the last 30 mins” — no PromQL needed
  • Automated RCA: agent correlates metrics + logs + traces to pinpoint root cause
  • Alert noise reduction: agent groups related alerts, suppresses duplicates, and writes a single incident summary
  • Capacity planning: agent queries historical metrics, detects trends, forecasts when resources will be exhausted

🔗 How MCP Ties It All Together

The power of MCP is that a single agent can hold tools from all three domains simultaneously:

┌─────────────────────────────────────────────────────┐
│                   LLM Agent                         │
│              (Claude / GPT-4o)                      │
└────────────────────┬────────────────────────────────┘
                     │ MCP
        ┌────────────┼────────────┐
        ▼            ▼            ▼
┌──────────────┐ ┌────────────┐ ┌──────────────────┐
│  Kubernetes  │ │ Terraform  │ │  Observability   │
│  MCP Server  │ │ MCP Server │ │   MCP Server     │
│  (kubectl,   │ │ (plan,     │ │(Prometheus, Loki,│
│   Helm, ACM) │ │  apply,    │ │ Grafana, Jaeger) │
└──────────────┘ │  drift)    │ └──────────────────┘
                 └────────────┘

End-to-end scenario:

  1. Observability MCP detects CPU spike on node pool
  2. Agent queries Terraform MCP → finds node group is at max capacity
  3. Agent queries Kubernetes MCP → confirms pods are pending due to insufficient nodes
  4. Agent generates Terraform plan to scale node group from 3→5 nodes
  5. HITL approval → Terraform apply → Kubernetes confirms new nodes joined
  6. Agent posts incident summary to Slack with full audit trail

API – response time

Here are fast, reliable ways to measure client-side API response time (and break it down) — from your laptop or from an EKS pod.

1) One-shot timing (curl)

This prints DNS, TCP, TLS, TTFB, and Total in one go:

curl -s -o /dev/null -w '
{ "http_code":%{http_code},
  "remote_ip":"%{remote_ip}",
  "dns":%{time_namelookup},
  "tcp":%{time_connect},
  "tls":%{time_appconnect},
  "ttfb":%{time_starttransfer},
  "total":%{time_total},
  "size":%{size_download},
  "speed":%{speed_download}
}
' https://api.example.com/path

Fields

  • dns: DNS lookup
  • tcp: TCP connect
  • tls: TLS handshake (0 if HTTP)
  • ttfb: time to first byte (request→first response byte)
  • total: full download time

2) From EKS (ephemeral pod)

Run N samples and capture a CSV:

kubectl run curl --rm -i --restart=Never --image=curlimages/curl:8.8.0 -- \
sh -c 'for i in $(seq 1 50); do \
  curl -s -o /dev/null -w "%{time_namelookup},%{time_connect},%{time_appconnect},%{time_starttransfer},%{time_total}\n" \
  https://api.example.com/health; \
done' > timings.csv

(Use -i rather than -it: allocating a TTY would inject carriage returns into the captured CSV.)

Open timings.csv and look at columns: dns,tcp,tls,ttfb,total. Large ttfb means slow upstream/app; big tls means handshake issues; big gap total - ttfb means payload/download time.
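To turn those samples into percentiles, a small Python sketch (assumes the five-column CSV written above, values in seconds):

```python
import csv
import statistics

def summarize(path: str = "timings.csv") -> dict:
    """Compute p50/p95 of total time and mean TTFB from the curl timings CSV."""
    with open(path) as f:
        rows = [[float(x) for x in row] for row in csv.reader(f) if row]
    totals = sorted(r[4] for r in rows)            # column 5: time_total
    p95 = totals[min(len(totals) - 1, int(0.95 * len(totals)))]  # nearest-rank
    return {"p50": statistics.median(totals),
            "p95": p95,
            "mean_ttfb": statistics.fmean(r[3] for r in rows)}
```

p95 is usually more informative than the mean here: a handful of slow DNS lookups or cold TLS handshakes will barely move the average but will show up clearly in the tail.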

3) Separate proxy vs upstream (Kong in the path)

Kong adds latency headers you can read on the client:

curl -i https://api.example.com/path | sed -n 's/^\(x-kong-.*latency\): \(.*\)$/\1: \2/p'
# x-kong-proxy-latency: <ms>   (Kong → upstream start)
# x-kong-upstream-latency: <ms> (Upstream processing)

These help you see if delay is at the gateway or in the service.

4) Quick load/percentiles (pick one)

  • hey: hey -z 30s -c 20 https://api.example.com/path
  • vegeta: echo "GET https://api.example.com/path" | vegeta attack -rate=20 -duration=30s | vegeta report
  • k6 (scriptable): save as test.js:

    import http from 'k6/http';
    import { check } from 'k6';
    export const options = { vus: 20, duration: '30s', thresholds: { http_req_duration: ['p(95)<300'] } };
    export default () => {
      const r = http.get('https://api.example.com/path');
      check(r, { '200': (res) => res.status === 200 });
    };

    Run: k6 run test.js

5) App-level timers (optional)

Add a Server-Timing header from the API to expose your own phase timings (DB, cache, etc.). Then the client can read those headers to correlate.

6) Common gotchas

  • Proxies can add latency; test both with and without proxy (NO_PROXY / --proxy).
  • Auth: measure with real headers/tokens; 401/403 will skew.
  • SNI/Host: if hitting by IP, use --resolve host:443:IP -H "Host: host" so cert/routing is correct.
  • Warmup: discard first few samples (JIT, caches, TLS session reuse).

If you want, share a few curl -w outputs from local vs EKS and I’ll pinpoint where the time is going (DNS/TLS/TTFB/payload).

Kong – proxy issue

Got it—your client runs in AWS and must use a corporate proxy. With mTLS, a client-side proxy can absolutely be the culprit. Here’s the playbook.

What’s happening

  • If the proxy does TLS inspection (MITM), it terminates TLS and re-signs with its own CA. Your server asks the proxy (not the real client) for a cert → it has none → server logs “trying to obtain a certificate from the client.”
  • The client may also see “unable to get local issuer certificate” because it’s validating the proxy’s substituted cert but doesn’t trust the proxy’s Root CA.

Decide the path (pick one)

A) Allow end-to-end mTLS (best):
Ask the proxy admins to bypass SSL inspection for your domain (add it to the proxy’s TLS bypass list) or ensure it does pure CONNECT tunneling. Then the real client cert reaches your server.

B) Two-hop mTLS (enterprise pattern):

  • Client ↔ Proxy: mTLS using Client-Cert-#1 (issued by proxy’s CA).
  • Proxy ↔ Your Server: mTLS using Client-Cert-#2 (issued by a CA your server trusts).
    Your server will authenticate the proxy’s identity, not the original client. If you need end-user identity, have the proxy forward vetted identity (headers/JWT) and verify it.

C) Temporary test bypass:
On that AWS instance, set NO_PROXY for your hostname if policy allows, and open egress 443 to your server to confirm the issue is the proxy.

Quick diagnostics (run both)

1) See if proxy is intercepting

# Behind the proxy
openssl s_client -proxy PROXY_HOST:PROXY_PORT \
  -connect YOUR_HOST:443 -servername YOUR_HOST -showcerts </dev/null | \
  openssl x509 -noout -issuer -subject -enddate

Compare issuer/subject to the same command from a machine without the proxy.
If issuers differ (e.g., “Zscaler/BlueCoat/Corp Root CA”), it’s inspecting.

2) Prove mTLS works through a tunnel

# If proxy supports pure CONNECT (no inspection)
curl -v https://YOUR_HOST/ \
  --proxy http://PROXY_HOST:PROXY_PORT \
  --cert client.crt --key client.key \
  --cacert server_ca_chain.pem

  • If your proxy itself is HTTPS, add: --proxy-cacert PROXY_ROOT_CA.pem.

Java client notes (common in AWS)

# Client cert for mTLS to your server
-Djavax.net.ssl.keyStore=/path/client.p12 -Djavax.net.ssl.keyStorePassword=***** -Djavax.net.ssl.keyStoreType=PKCS12
# Trust your server (or proxy if inspecting)
-Djavax.net.ssl.trustStore=/path/truststore.jks -Djavax.net.ssl.trustStorePassword=*****
# Proxy settings (CONNECT)
-Dhttps.proxyHost=PROXY_HOST -Dhttps.proxyPort=PROXY_PORT
# Bypass (if allowed)
-Dhttp.nonProxyHosts="YOUR_HOST|*.yourdomain.com"

If inspection can’t be bypassed, you must import the proxy’s Root CA into the truststore, and if mTLS is required by your server, implement the two-hop mTLS model above.

What to send to the proxy team

  • “Please bypass TLS inspection (or allow pure CONNECT) for YOUR_HOST:443.”
  • If not possible: “We’ll run two-hop mTLS. We’ll trust your proxy’s client CA for the proxy→server leg; you’ll trust ours for client→proxy.”

If you share the outputs of the two openssl/curl tests (redact hostnames as needed), I’ll tell you exactly whether it’s inspection vs tunneling and the minimal change to make it work.

Short answer: tcpdump can capture payload, but only if the traffic is plaintext.
If it’s HTTPS/TLS (which most APIs are), tcpdump will not show the body unless you decrypt it. It also won’t write into your “app logs”—it just captures packets.


ClientMTLSAuth status absent

It means no client certificate was presented/validated on the TLS handshake—i.e., client mTLS auth did not happen for that request.

Think of it like Nginx’s $ssl_client_verify = NONE: the server (Cloudflare/F5/Kong) didn’t see a client cert, so mTLS status is absent.

Why it happens

  • Client didn’t send a cert (missing --cert/--key in curl, or browser/keystore not configured).
  • mTLS not required on that hop (e.g., F5/Cloudflare set to “request/ignore” instead of “require”; Kong’s mtls-auth plugin not enabled).
  • TLS terminates before Kong (Cloudflare/F5 terminates TLS; the origin can’t see the client cert unless that proxy enforces mTLS itself).
  • Wrong host/port/SNI that doesn’t have the mTLS policy.

Quick checks

# PEM pair
curl -vk https://api.example.com/path \
  --cert client.crt --key client.key

# or PKCS#12
curl -vk https://api.example.com/path \
  --cert-type P12 --cert client.p12:password

Look for SSL client certificate lines in the verbose output.

  • On Cloudflare/F5:
    • Require client certs on the relevant hostname/path.
    • Ensure the CA used to validate client certs is uploaded/selected.
    • If using Cloudflare, remember: the client’s cert terminates at Cloudflare; the origin (Kong) won’t receive it. Enforce mTLS at Cloudflare edge and use origin controls (e.g., Authenticated Origin Pulls) separately.
  • On Kong:
    • If using Enterprise mtls-auth plugin, enable it on the route/service and upload your CA certificate to /ca-certificates; reference it in the plugin.
    • If relying on Nginx-level mTLS, ensure the proxy listener is configured with ssl_client_certificate and ssl_verify_client on; (or the equivalent in your deployment).
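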

Interpreting statuses

  • absent = no cert presented.
  • failed/invalid = cert presented but couldn’t be validated (untrusted/expired/CN mismatch).
  • success/verified = cert was validated.

If you share how traffic flows (Client → Cloudflare/F5 → Kong) and where you intend to enforce mTLS, I’ll give you the exact config and a working curl for that hop.

Kong latency 2

Short answer:
In Kong logs, proxy latency is the time spent waiting on your upstream service (the API/backend) — i.e., how long it took the upstream to respond to Kong.

Here’s the breakdown of the three latency fields you’ll see in Kong logs:

  • latencies.proxy → Upstream latency (a.k.a. “proxy latency”): time from when Kong sends the request to the upstream until it starts getting the response.
  • latencies.kong → Kong internal time: routing + plugin execution + overhead inside Kong.
  • latencies.request → Total request time as seen by the client.

Quick mental model:

Client ──> [ Kong (latencies.kong) ] ──> Upstream API (latencies.proxy) ──> [ Kong ] ──> Client
                           \________________ latencies.request ________________/

A common point of confusion: in response headers,

  • X-Kong-Upstream-Latency → latencies.proxy (upstream time)
  • X-Kong-Proxy-Latency → latencies.kong (Kong time)

So, if you see high proxy latency, the slowness is almost always in your backend (or the network to it), not Kong itself. Focus on the upstream’s performance (DB calls, external services), network/DNS, and connection reuse; use Kong’s service/route timeouts (connect_timeout, read_timeout, write_timeout) to guard against outliers.
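The latency fields above can be pulled straight out of a Kong http-log entry. A minimal sketch; the log line is a made-up example, but the field names (latencies.kong, latencies.proxy, latencies.request) match Kong’s logging plugins:

```python
import json

# Sketch: split a Kong http-log entry into "time in Kong" vs "time in upstream".
log_line = '{"latencies": {"kong": 12, "proxy": 850, "request": 870}}'

lat = json.loads(log_line)["latencies"]
kong_ms, upstream_ms, total_ms = lat["kong"], lat["proxy"], lat["request"]

# Whatever total time is not accounted for by Kong or the upstream is
# client-side transfer (reading the request body / writing the response).
transfer_ms = total_ms - kong_ms - upstream_ms

print(f"upstream: {upstream_ms} ms, kong: {kong_ms} ms, transfer: {transfer_ms} ms")
if upstream_ms > 0.8 * total_ms:
    print("slowness is in the backend, not Kong")
```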

Kong – no suitable authorization credentials were provided

The error “no suitable authorization credentials were provided” usually means Kong (or an upstream service) expected authentication information, but it either:

  • Didn’t receive it at all
  • Received it in the wrong format
  • Received it in the wrong location (header vs. query parameter)
  • Couldn’t match it to any configured credentials

Here’s how to troubleshoot it step-by-step:


1. Identify which plugin or service is throwing the error

  • Kong Plugin Layer – Often caused by key-auth, jwt, or openid-connect plugin when no valid credential is passed.
  • Upstream Service Layer – Could be from the API/backend if Kong passed the request without credentials.
  • Load Balancer / F5 – Sometimes strips or alters headers, so credentials never reach Kong.

2. Check the authentication type you’re using

  • API Key (key-auth plugin)
    • Expected: apikey as a query parameter (?apikey=<key>) or as a header (apikey: <key>).
    • Verify: curl -H "apikey: YOUR_API_KEY" https://your-kong-endpoint/your-api
  • JWT Plugin
    • Expected: Authorization: Bearer <jwt_token> header.
    • Token must be valid and match a configured consumer.
  • OpenID Connect (openid-connect plugin)
    • Expected: Valid ID/Access token, usually Authorization: Bearer <access_token>.
    • Token must be signed by a trusted issuer in plugin config.

3. Verify headers are not stripped

If you have F5 / Nginx / Apache in front of Kong:

  • Check that Authorization headers are forwarded.
  • In F5, review iRules/policies that modify headers (e.g., HTTP::header remove) and make sure Authorization is passed through.
  • In Nginx, ensure: proxy_set_header Authorization $http_authorization;
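As an illustration of how a hop can lose the header, here is a toy forwarding function (not any real proxy’s code): per RFC 7230, proxies drop hop-by-hop headers, and an over-aggressive filter that also drops Authorization produces exactly this symptom.

```python
# Hop-by-hop headers a proxy must not forward (RFC 7230 §6.1).
# Authorization is end-to-end and should survive the hop.
HOP_BY_HOP = {"connection", "keep-alive", "proxy-authorization", "te",
              "trailer", "transfer-encoding", "upgrade"}

def forward_headers(incoming: dict) -> dict:
    """Copy end-to-end headers only; a correct proxy keeps Authorization."""
    return {k: v for k, v in incoming.items() if k.lower() not in HOP_BY_HOP}

out = forward_headers({"Authorization": "Bearer abc", "Connection": "keep-alive"})
print("Authorization" in out)  # → True
```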

4. Check Kong logs for details

Run Kong in debug mode temporarily:

KONG_LOG_LEVEL=debug kong restart
tail -f /usr/local/kong/logs/error.log

Look for:

  • plugin=... lines showing missing credentials
  • no credentials found for given consumer/issuer

5. Confirm consumer mapping

For jwt or oidc:

  • The public key / JWKS URL must be correct.
  • The iss (issuer) in token must match plugin config.
  • The consumer must be linked to the credential.
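To check the iss claim quickly without a full OIDC library, you can decode the JWT payload locally (no signature verification — this is a debugging aid only; the token below is hand-built for illustration):

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Hand-built token for illustration; a real one comes from your IdP.
header = base64.urlsafe_b64encode(b'{"alg":"RS256"}').rstrip(b"=").decode()
payload = base64.urlsafe_b64encode(
    b'{"iss":"https://idp.example.com","sub":"app-1"}').rstrip(b"=").decode()
token = f"{header}.{payload}.signature"

claims = jwt_claims(token)
print(claims["iss"])  # compare against the issuer in the plugin config
```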

💡 Common causes I’ve seen in your past scenarios with F5 + Kong + Ping Identity:

  • F5 stripping Authorization header before reaching Kong.
  • Wrong token type (ID token instead of Access token) in OIDC flow.
  • Consumer not mapped in Kong for the provided JWT/OIDC token.
  • client_id and client_secret missing in a token request.

If you want, I can draw you a flow diagram for Kong + Ping Identity showing exactly where the credential might be getting lost in your setup with F5 in the middle. That would make pinpointing the issue much faster.

Kong / our app is built using Java Spring Boot

Yes — when integrating certificates into Spring Boot, especially for mTLS, HTTPS, or secure inter-service communication, there are several important certificate handling considerations you should account for:


✅ 1. Keystore / Truststore Formats

  • Keystore: holds your application’s private key + cert. Common formats: JKS, PKCS12 (.p12 / .pfx).
  • Truststore: holds trusted CA certs to verify clients or servers. Common formats: JKS, PKCS12.

🟡 JKS is Java-native, but PKCS12 is preferred for modern apps (more portable and works with OpenSSL tooling).

🔧 Example Spring Boot Config:

server.ssl.key-store=classpath:server.p12
server.ssl.key-store-password=changeit
server.ssl.key-store-type=PKCS12
server.ssl.trust-store=classpath:truststore.p12
server.ssl.trust-store-password=changeit
server.ssl.trust-store-type=PKCS12


✅ 2. Certificate Chain Requirements

Spring Boot expects:

  • Full certificate chain in the keystore if issued by an intermediate CA
  • Trusted root CA in the truststore (for mTLS)

🔸 Failure to include the full chain often results in handshake errors or “unable to find valid certification path” errors.


✅ 3. Protocol Configuration (TLS)

Spring Boot (via embedded Tomcat) uses TLS 1.2+ by default.

To restrict or specify:

server.ssl.enabled-protocols=TLSv1.2,TLSv1.3

🔐 Disable TLS 1.0/1.1 — they are deprecated and insecure.


✅ 4. Cipher Suites

Spring Boot (via Tomcat) negotiates secure ciphers by default.
You can explicitly define them:

server.ssl.ciphers=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,...

Use modern ciphers like ECDHE + AES_GCM.
Avoid RC4, 3DES, NULL, or EXPORT ciphers.


✅ 5. Mutual TLS (mTLS)

To require client certs:

server.ssl.client-auth=need

Modes:

  • none: default (no client cert)
  • want: optional client cert
  • need: mandatory client cert (for mTLS)

✅ 6. Generating Keystore & Truststore

Convert PEM to PKCS12:

openssl pkcs12 -export \
  -in client.crt \
  -inkey client.key \
  -certfile ca.crt \
  -out client-keystore.p12 \
  -name client

Then import trusted CA into truststore (if using JKS):

keytool -import -alias myca -file ca.crt -keystore truststore.jks


✅ 7. Spring Boot with Reverse Proxies (e.g., Kong, F5)

If TLS termination is done by Kong/F5 and Spring Boot sits behind it:

  • Use X-Forwarded headers: server.forward-headers-strategy=framework
  • Do not terminate TLS inside Spring unless you need end-to-end TLS.

✅ Summary: What to Account For

  • 🔐 Format: use PKCS12 for best compatibility.
  • 🧾 Trust: truststore must contain the root CA (or intermediate).
  • 🔄 Chain: include the full cert chain in the keystore.
  • 🌐 Protocol: enable TLSv1.2+ only.
  • 🔐 Ciphers: use strong, modern suites.
  • 🔁 mTLS: set client-auth=need and configure the truststore.
  • 🔀 Proxy: handle X-Forwarded-* if behind Kong/F5.

Would you like:

  • A working Spring Boot + mTLS project example?
  • An OpenSSL script to prepare the .p12 files?
  • A Kong + Spring Boot mTLS walkthrough?

Let me know your setup.