kong latency

Short answer: there isn’t a single built-in “average response time” value for Kong. You measure it from Kong’s latency telemetry and then compute the average. Kong exposes three latencies:

  • Request latency (a.k.a. total): time from first byte in to last byte out.
  • Upstream latency: time the upstream took to start responding.
  • Kong latency: time spent inside Kong (routing + plugins). (Kong Docs)

Below are the quickest ways to get the average.


Option 1 — Prometheus plugin (fastest for averages/percentiles)

Enable the Prometheus plugin with config.latency_metrics=true. It exposes histograms:

  • kong_request_latency_ms_* (total)
  • kong_upstream_latency_ms_*
  • kong_kong_latency_ms_* (Kong Docs)

PromQL examples (last 5 minutes):

# Average total response time
sum(rate(kong_request_latency_ms_sum[5m]))
/
sum(rate(kong_request_latency_ms_count[5m]))

# Average per service
sum by (service) (rate(kong_request_latency_ms_sum[5m]))
/
sum by (service) (rate(kong_request_latency_ms_count[5m]))

# p95 latency (total)
histogram_quantile(
  0.95,
  sum by (le) (rate(kong_request_latency_ms_bucket[5m]))
)

Swap request for upstream or kong in the metric names to break the average down by where time is spent. (Kong Docs)


Option 2 — ELK/Kibana (since you’re using ELK)

Use a log plugin (e.g., File Log or HTTP Log) and ship the JSON to Elasticsearch. Each log line contains:

"latencies": {
  "request": <total_ms>,
  "proxy":   <upstream_ms>,
  "kong":    <kong_ms>
}

In Kibana → Discover/Lens, set Average over latencies.request (or latencies.proxy / latencies.kong) to see the mean response time. (Kong Docs)


Option 3 — Quick curl check (headers)

Kong can add latency headers:

  • X-Kong-Proxy-Latency (≈ Kong time)
  • X-Kong-Upstream-Latency (≈ upstream time)
  • X-Kong-Total-Latency (total; with advanced tokens)

Enable with KONG_HEADERS=latency_tokens (or advanced_latency_tokens) and inspect with curl -i. (Kong Docs)


What’s a “good” number?

It’s workload-specific. Treat Kong latency as your gateway overhead (aim for low single-digit ms) and focus tuning on upstream latency (usually the dominant share). Use p95/p99 alongside the average to catch tail latency. (Kong Docs)

If you want, I can drop a Grafana panel (PromQL prewired) and a Kibana Lens example targeting latencies.request so you can paste them into your stack.

Leave a comment