Understanding PromQL: A Complete Guide

What is PromQL?

PromQL (Prometheus Query Language) is the query language used by Prometheus to retrieve, filter, aggregate, and analyze time-series metrics.

It is the primary language used in:

Prometheus UI
Grafana dashboards
Alerting rules
Recording rules

PromQL Data Model

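Each time series is identified by a metric name plus a set of labels, and stores a sequence of samples (a value at a timestamp):

metric_name{label1="value1",label2="value2"} value timestamp

Example (timestamp omitted):

node_cpu_seconds_total{instance="server1",mode="idle"} 12345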

Where:

Component	Meaning
node_cpu_seconds_total	Metric name
instance="server1"	Label
mode="idle"	Label
12345	Sample value

Basic PromQL Examples

1. Show a Metric

up

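Returns one series per scrape target, showing whether Prometheus's last scrape of that target succeeded.

Example:

up{instance="server1"} 1
up{instance="server2"} 1

1 = last scrape succeeded (target is up)
0 = last scrape failed (target is down or unreachable)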

2. Filter by Label

up{instance="server1:9100"}

Returns only the series whose instance label is exactly server1:9100.

3. Multiple Labels

node_cpu_seconds_total{
  instance="server1:9100",
  mode="idle"
}

A series is returned only if it matches every label matcher inside the braces.
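To make the matching rule concrete, here is a minimal Python sketch of equality label matchers. The series list is hypothetical sample data, and the sketch covers only the = matcher used in these examples, not regex or negative matchers.

```python
# Minimal sketch of PromQL equality matching, e.g.
# node_cpu_seconds_total{instance="server1:9100", mode="idle"}.
# The series below are hypothetical sample data.

def matches(labels: dict[str, str], matchers: dict[str, str]) -> bool:
    """Return True only if every equality matcher matches the series' labels."""
    # A label missing from the series compares as the empty string,
    # mirroring how PromQL treats absent labels.
    return all(labels.get(name, "") == value for name, value in matchers.items())


def select(series: list[dict[str, str]], matchers: dict[str, str]) -> list[dict[str, str]]:
    """Keep only the series whose labels satisfy all matchers."""
    return [s for s in series if matches(s, matchers)]


series = [
    {"__name__": "node_cpu_seconds_total", "instance": "server1:9100", "mode": "idle", "cpu": "0"},
    {"__name__": "node_cpu_seconds_total", "instance": "server1:9100", "mode": "user", "cpu": "0"},
    {"__name__": "node_cpu_seconds_total", "instance": "server2:9100", "mode": "idle", "cpu": "0"},
]

result = select(series, {"__name__": "node_cpu_seconds_total",
                         "instance": "server1:9100", "mode": "idle"})
print(len(result))  # 1
```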

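Range Vector Selectors

Appending a duration in square brackets selects all raw samples in a time window, not only the latest value.

Example:

node_cpu_seconds_total[5m]

Returns, for each matching series, every sample from the last 5 minutes. A range vector cannot be graphed directly. It is normally passed to a function such as rate().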
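A rough Python sketch of which samples a [5m] selector picks from one series. It assumes a 60 s scrape interval and a left-open window: recent Prometheus versions exclude a sample that sits exactly on the window's start, while Prometheus 2.x included it. The samples are made up for illustration.

```python
# Sketch: which raw samples a range selector such as [5m] returns for one series.
# Timestamps are in seconds. The data is hypothetical.
# Assumption: the window is (eval_time - range, eval_time], i.e. left-open.

def range_select(samples, eval_time, range_seconds):
    start = eval_time - range_seconds
    return [(t, v) for t, v in samples if start < t <= eval_time]

# One sample every 60 s (a 60 s scrape interval)
samples = [(t, 1000.0 + t) for t in range(0, 601, 60)]
window = range_select(samples, eval_time=600, range_seconds=300)
print([t for t, _ in window])  # [360, 420, 480, 540, 600]
```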

Rate Functions

One of the most common interview topics.

rate()

Calculates the per-second average rate of increase of a counter over the range window, automatically compensating for counter resets.

Example:

rate(http_requests_total[5m])

Meaning:

On average, how many requests per second were received over the last 5 minutes?
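The following Python sketch shows the core idea behind rate(). It adds up the counter's increases between consecutive samples, treats any drop as a counter reset, and divides by the time covered. This is a simplification: Prometheus's real rate() also extrapolates toward the edges of the window, so its output differs slightly. The sample data is hypothetical.

```python
# Simplified sketch of rate(): per-second average increase of a counter.
# Unlike real Prometheus rate(), this does NOT extrapolate to the window edges.

def simple_rate(samples):
    """samples: list of (timestamp_seconds, counter_value), oldest first."""
    if len(samples) < 2:
        return None  # rate() needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. process restart);
        # the new value itself is the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

# http_requests_total scraped every 60 s; the counter resets after t=180.
samples = [(0, 100), (60, 160), (120, 220), (180, 280), (240, 60), (300, 120)]
print(simple_rate(samples))  # 1.0
```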

irate()

Calculates the per-second rate using only the last two samples in the range window.

irate(http_requests_total[5m])

More responsive but noisier.
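For comparison, here is a simplified Python sketch of irate(), which looks only at the last two samples in the window. On the hypothetical burst below, irate() reports the burst rate, while a whole-window average smooths it out.

```python
# Simplified sketch of irate(): rate from the last two samples only.

def simple_irate(samples):
    """samples: list of (timestamp_seconds, counter_value), oldest first."""
    if len(samples) < 2:
        return None
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    # On a counter reset, the last value itself is the increase.
    delta = v2 - v1 if v2 >= v1 else v2
    return delta / (t2 - t1)

# Steady 1 req/s, then a burst of 600 requests in the final 60 s (hypothetical).
samples = [(0, 0), (60, 60), (120, 120), (180, 180), (240, 240), (300, 840)]

# Whole-window average (no reset handling needed here) for contrast
avg_rate = (samples[-1][1] - samples[0][1]) / (samples[-1][0] - samples[0][0])
print(simple_irate(samples), round(avg_rate, 2))  # 10.0 2.8
```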

CPU Usage Example

Node Exporter provides:

node_cpu_seconds_total

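Idle CPU (fraction of time idle, averaged across all CPUs):

avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))

CPU Usage Percentage:

100 - (
  avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100
)

Very common Grafana dashboard query. Without a by clause, avg() averages over every CPU of every scraped node. Use avg by (instance) (...) to get one value per node.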
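Here is a worked version of the CPU formula in Python for one hypothetical node with two CPUs. It uses made-up idle-counter readings taken 300 s apart and ignores rate() extrapolation, so it only illustrates the arithmetic.

```python
# Worked example of: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Hypothetical idle-counter readings (seconds) per CPU, taken 300 s apart.

idle_start = {"0": 1000.0, "1": 2000.0}   # idle seconds per CPU at t=0
idle_end = {"0": 1240.0, "1": 2210.0}     # idle seconds per CPU at t=300
window = 300.0

# rate(): idle seconds per second for each CPU, i.e. the idle fraction (0..1)
idle_rates = [(idle_end[c] - idle_start[c]) / window for c in idle_start]

# avg(...) across CPUs, then 100 - (avg * 100)
cpu_usage_pct = 100 - (sum(idle_rates) / len(idle_rates)) * 100
print(round(cpu_usage_pct, 1))  # 25.0
```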

Memory Usage Example

Used Memory:

node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes

Memory Percentage:

(
  (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
  / node_memory_MemTotal_bytes
) * 100
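The memory formulas are plain arithmetic. This Python snippet runs them on hypothetical values for a 16 GiB node with 4 GiB available.

```python
# Worked example of the memory formulas with hypothetical node_exporter values (bytes).
GIB = 1024 ** 3
mem_total = 16 * GIB       # node_memory_MemTotal_bytes
mem_available = 4 * GIB    # node_memory_MemAvailable_bytes

used_bytes = mem_total - mem_available
used_pct = (mem_total - mem_available) / mem_total * 100
print(used_bytes / GIB, used_pct)  # 12.0 75.0
```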

Aggregation Functions

sum()

sum(http_requests_total)

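Adds the values of all matching series into a single value. For counters such as this one, you usually sum the per-second rate instead: sum(rate(http_requests_total[5m])).

avg()

avg(node_load1)

Average 1-minute load average across all nodes.

max()

max(node_memory_MemAvailable_bytes)

Highest available memory among all nodes.

min()

min(node_memory_MemAvailable_bytes)

Lowest available memory among all nodes, a quick way to spot the node closest to running out.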

Group By

Example:

sum(rate(http_requests_total[5m])) by (instance)

Output (illustrative values, in requests per second, one series per instance):

{instance="server1"} 100
{instance="server2"} 150
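Here is a Python sketch of what sum(...) by (instance) does. It groups series by the listed labels, drops every other label, and adds the values in each group. The path label and the per-series rates are hypothetical, chosen to reproduce the output shown.

```python
from collections import defaultdict

# Sketch of sum(...) by (instance): group on the listed labels only,
# drop every other label (here: path), and add the values in each group.
# The values stand in for hypothetical rate(http_requests_total[5m]) results.
series = [
    ({"instance": "server1", "path": "/api"}, 60.0),
    ({"instance": "server1", "path": "/login"}, 40.0),
    ({"instance": "server2", "path": "/api"}, 150.0),
]

def sum_by(series, by):
    groups = defaultdict(float)
    for labels, value in series:
        # The group key keeps only the "by" labels (missing labels count as "")
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key] += value
    return dict(groups)

print(sum_by(series, ["instance"]))
# {(('instance', 'server1'),): 100.0, (('instance', 'server2'),): 150.0}
```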

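Top Consumers

Top 5 CPU-consuming pods:

topk(
  5,
  sum by (namespace, pod) (
    rate(container_cpu_usage_seconds_total{container!=""}[5m])
  )
)

The container!="" matcher drops the pod-level cgroup series that cAdvisor also exports, which would otherwise be counted twice. Grouping by namespace as well as pod keeps same-named pods from different namespaces apart.

Very common in Kubernetes/OpenShift interviews.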
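topk() keeps the k series with the largest values. This Python sketch mimics that with heapq.nlargest on hypothetical per-pod CPU rates (in cores).

```python
import heapq

# Sketch of topk(5, ...): keep the 5 series with the largest values.
# Pod names and CPU rates (cores) are hypothetical.
cpu_by_pod = {
    "api-7f9c": 1.8, "db-0": 2.4, "worker-1": 0.6, "worker-2": 0.9,
    "cache-5d2": 0.3, "ingress-x1": 1.1, "batch-job": 3.2,
}

top5 = heapq.nlargest(5, cpu_by_pod.items(), key=lambda kv: kv[1])
print([pod for pod, _ in top5])
# ['batch-job', 'db-0', 'api-7f9c', 'ingress-x1', 'worker-2']
```

In a Grafana graph (a range query), topk() is evaluated at every step, so more than five pods can appear across the whole time range.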

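Kubernetes Examples

These metrics come from kube-state-metrics.

Pod Count

count(kube_pod_info)

Running Pods

sum(kube_pod_status_phase{phase="Running"})

kube_pod_status_phase has one series per pod and phase, with value 1 for the pod's current phase and 0 otherwise. count() would therefore count every pod, while sum() counts only the pods that are actually Running.

Node Count

count(kube_node_info)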
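The Running Pods query depends on how kube-state-metrics exports kube_pod_status_phase: one series per pod and phase, valued 1 for the current phase and 0 otherwise. This Python sketch uses hypothetical pods to show why count() over the phase="Running" series returns every pod, while sum() or count(... == 1) returns only the running ones.

```python
# Why count(kube_pod_status_phase{phase="Running"}) overcounts.
# One series per (pod, phase): value 1 for the current phase, else 0.
# The pods below are hypothetical.
pods = {"web-1": "Running", "web-2": "Running", "job-1": "Succeeded", "db-0": "Pending"}
phases = ["Pending", "Running", "Succeeded", "Failed", "Unknown"]

series = [
    {"pod": pod, "phase": phase, "value": 1 if current == phase else 0}
    for pod, current in pods.items()
    for phase in phases
]
running = [s for s in series if s["phase"] == "Running"]  # the label matcher

count_all = len(running)                                   # count(...)
count_ones = len([s for s in running if s["value"] == 1])  # count(... == 1)
summed = sum(s["value"] for s in running)                  # sum(...)
print(count_all, count_ones, summed)  # 4 2 2
```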

OpenShift Examples

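API Server Latency

histogram_quantile(
  0.99,
  sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le)
)

Returns the 99th-percentile API request latency in seconds. Long-running requests such as WATCH can skew this value, so dashboards commonly exclude them with {verb!~"WATCH|CONNECT"}.

etcd Latency

histogram_quantile(
  0.99,
  rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])
)

Because the buckets are not summed, this returns a separate 99th-percentile WAL fsync latency for each etcd member.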
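histogram_quantile() estimates a quantile by finding the bucket that contains the target rank and interpolating linearly inside it. The Python sketch below is simplified. It assumes 0 <= q <= 1 and positive bucket bounds, and it skips several edge cases the real function handles. The bucket values stand in for hypothetical per-second rates after sum(...) by (le).

```python
import math

# Simplified sketch of histogram_quantile() for classic (le-bucketed) histograms.
# buckets: list of (upper_bound, cumulative_count) sorted by upper_bound, ending in +Inf.
# Assumes 0 <= q <= 1 and positive bucket bounds.
def histogram_quantile(q, buckets):
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank
    i = next(idx for idx, (_, count) in enumerate(buckets) if count >= rank)
    if math.isinf(buckets[i][0]):
        # Rank falls in the +Inf bucket: return the highest finite upper bound
        return buckets[i - 1][0]
    # The lowest bucket is assumed to start at 0
    lower, below = (0.0, 0.0) if i == 0 else buckets[i - 1]
    upper, count = buckets[i]
    # Linear interpolation inside the bucket
    return lower + (upper - lower) * (rank - below) / (count - below)

# Hypothetical per-second bucket rates for request latency (seconds)
buckets = [(0.1, 50.0), (0.25, 80.0), (0.5, 95.0), (1.0, 99.0), (math.inf, 100.0)]
print(histogram_quantile(0.5, buckets))            # 0.1
print(round(histogram_quantile(0.9, buckets), 4))  # 0.4167
```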

OVN Pod Status

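up{job="ovn-kubernetes-node"}

Shows whether Prometheus can scrape the OVN-Kubernetes node metrics endpoints (1 = up, 0 = down). The exact job label value depends on how monitoring is configured in your cluster.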

Alert Rule Example

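CPU > 80%

groups:
- name: cpu-alerts
  rules:
  - alert: HighCPU
    expr: |
      100 - (
        avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100
      ) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "CPU usage above 80% on {{ $labels.instance }}"

Meaning:

CPU usage on a node above 80%
Continuously for 5 minutes (the alert is "pending" until then)
Alert fires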
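To illustrate for: 5m, this Python sketch replays hypothetical per-minute CPU readings, assuming a 1-minute rule evaluation interval. The alert stays pending while the condition holds and fires only once the condition has held for 5 minutes. Any dip below the threshold resets the timer.

```python
# Sketch of "for: 5m": pending while the condition holds, firing after 5 minutes.
# Assumes a hypothetical 1-minute rule evaluation interval.
FOR_SECONDS = 300

def alert_states(cpu_by_minute, threshold=80.0):
    states, active_since = [], None
    for minute, cpu in enumerate(cpu_by_minute):
        now = minute * 60
        if cpu > threshold:
            if active_since is None:
                active_since = now  # condition first became true
            states.append("firing" if now - active_since >= FOR_SECONDS else "pending")
        else:
            active_since = None  # condition cleared: the timer resets
            states.append("inactive")
    return states

cpu = [70, 85, 90, 88, 92, 95, 91, 60]
print(alert_states(cpu))
# ['inactive', 'pending', 'pending', 'pending', 'pending', 'pending', 'firing', 'inactive']
```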

Recording Rule Example

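Instead of recomputing the CPU expression on every dashboard refresh, a recording rule evaluates it once per rule evaluation interval and stores the result as a new series:

groups:
- name: cpu-recording
  rules:
  - record: node:cpu_usage:avg
    expr: |
      100 - (
        avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100
      )

Then dashboards query:

node:cpu_usage:avg

This improves performance: dashboards read one precomputed series instead of re-evaluating rate() over the raw samples of every CPU on each refresh.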

Common Interview Questions

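What is the difference between rate() and irate()?

Aspect	rate()	irate()
Samples used	All samples in the window	Last two samples in the window
Behavior	Smoother, averaged over the window	More responsive, but noisier
Good for	Alerts and slow-moving counters	Graphs of fast-moving counters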

What is a counter?

A metric that only increases, except that it resets to zero when the process exposing it restarts.

Examples:

http_requests_total
container_cpu_usage_seconds_total

What is a gauge?

A metric that can increase or decrease.

Examples:

node_memory_MemAvailable_bytes
node_load1

What is a histogram?

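A metric type that counts observations (such as request latencies) into cumulative buckets. Each histogram exposes _bucket series with an le ("less than or equal") label, plus _sum and _count series. histogram_quantile() estimates percentiles from the buckets.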

Example:

http_request_duration_seconds_bucket

What is cardinality?

The number of unique time series, that is, unique combinations of metric name and label values.

Example:

http_requests_total{user="1"}
http_requests_total{user="2"}
http_requests_total{user="3"}
...

Labels with unbounded values, such as user IDs, create a new series for every value. Millions of series make Prometheus use far more memory and slow down queries.
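Here is a quick Python sketch of how label values multiply into series. The label values are hypothetical, and the count is a worst case that assumes every combination actually occurs.

```python
from itertools import product

# Worst-case series count for one metric: every combination of label values.
# Label values are hypothetical.
methods = ["GET", "POST"]
status_codes = ["200", "404", "500"]
instances = ["server1", "server2"]

bounded = set(product(methods, status_codes, instances))
print(len(bounded))  # 12

# Adding an unbounded label such as user multiplies that by the number of users
users = [str(u) for u in range(10_000)]
print(len(bounded) * len(users))  # 120000
```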

Interview answer

PromQL is Prometheus’s query language used to retrieve, filter, aggregate, and calculate metrics. It supports functions such as rate(), sum(), avg(), histogram_quantile(), and label filtering, making it the foundation for Grafana dashboards, alerting rules, and monitoring in Kubernetes and OpenShift environments.

Discover more from Infra Cloud Solutions