What is PromQL?
PromQL (Prometheus Query Language) is the query language used by Prometheus to retrieve, filter, aggregate, and analyze time-series metrics.
It is the primary language used in:
- Prometheus UI
- Grafana dashboards
- Alerting rules
- Recording rules
PromQL Data Model
Each time series is identified by a metric name and a set of labels; each sample adds a value and a timestamp:
metric_name{label1="value1",label2="value2"} value timestamp
Example:
node_cpu_seconds_total{instance="server1",mode="idle"} 12345
Where:
| Component | Meaning |
|---|---|
| node_cpu_seconds_total | Metric name |
| instance="server1" | Label |
| mode="idle" | Label |
| 12345 | Metric value |
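To make the data model concrete, here is a minimal Python sketch that splits one sample line into its metric name, labels, and value. It assumes the braced form shown above, no timestamp, and label values containing no `"` or `}` characters; it is an illustration, not a full parser for the Prometheus text format.

```python
import re

# Simplified patterns: a braced label set is required; label values must not contain '"' or '}'.
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line: str):
    """Split a sample line into (metric name, label dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample('node_cpu_seconds_total{instance="server1",mode="idle"} 12345')
print(name, labels, value)
# → node_cpu_seconds_total {'instance': 'server1', 'mode': 'idle'} 12345.0
```

Two samples belong to the same time series only if both the metric name and the full label set match, which is why the label dictionary matters as much as the name.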
Basic PromQL Examples
1. Show a Metric
up
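Returns one series per scrape target, with the target's health as its value.
Example output:
up{instance="server1"} 1
up{instance="server2"} 1
- 1 = the last scrape succeeded (target healthy)
- 0 = the last scrape failed (target down or unreachable)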
2. Filter by Label
up{instance="server1:9100"}
Returns metrics only for that server.
3. Multiple Labels
node_cpu_seconds_total{instance="server1:9100", mode="idle"}
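Label filtering with `=` keeps only the series whose labels equal every listed value. The Python sketch below models that behaviour over a small hypothetical in-memory vector. It stores the metric name under the `__name__` label, as Prometheus does internally, and implements only equality matching on non-empty values (not `!=`, `=~`, or `!~`).

```python
# Hypothetical instant vector: (labels, value) pairs; the metric name lives in "__name__".
series = [
    ({"__name__": "node_cpu_seconds_total", "instance": "server1:9100", "mode": "idle"}, 12345.0),
    ({"__name__": "node_cpu_seconds_total", "instance": "server1:9100", "mode": "user"}, 678.0),
    ({"__name__": "node_cpu_seconds_total", "instance": "server2:9100", "mode": "idle"}, 23456.0),
]


def select(vector, name, **matchers):
    """Keep series with the given name whose labels equal every matcher (the `=` operator only)."""
    return [
        (labels, value)
        for labels, value in vector
        if labels.get("__name__") == name
        and all(labels.get(key) == wanted for key, wanted in matchers.items())
    ]


# Equivalent in spirit to node_cpu_seconds_total{instance="server1:9100", mode="idle"}
result = select(series, "node_cpu_seconds_total", instance="server1:9100", mode="idle")
print(result)
```

Each extra matcher can only narrow the result, so `{mode="idle"}` alone would return both idle series.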
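Range Vector Selectors
Retrieve the samples of each series over a time window.
Example:
node_cpu_seconds_total[5m]
Returns every sample from the last 5 minutes for each matching series (a range vector). Range vectors cannot be graphed directly; they are usually passed to functions such as rate().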
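A range vector selector such as `node_cpu_seconds_total[5m]` collects, for each matching series, the samples whose timestamps fall inside the window ending at the evaluation time. The sketch below uses hypothetical samples scraped every 60 seconds and treats the window as left-open; exact boundary handling has differed between Prometheus versions, so treat that detail as an assumption.

```python
# Hypothetical samples for one series: (unix timestamp in seconds, value), scraped every 60 s.
samples = [(0, 100.0), (60, 160.0), (120, 220.0), (180, 280.0),
           (240, 340.0), (300, 400.0), (360, 460.0)]


def range_select(samples, eval_time, window_seconds):
    """Return the samples in the left-open window (eval_time - window_seconds, eval_time]."""
    start = eval_time - window_seconds
    return [(t, v) for t, v in samples if start < t <= eval_time]


# Like evaluating node_cpu_seconds_total[5m] at t = 360
window = range_select(samples, eval_time=360, window_seconds=300)
print(window)
```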
Rate Functions
One of the most common interview topics.
rate()
Calculates the per-second average rate of increase of a counter over the range window, automatically adjusting for counter resets.
Example:
rate(http_requests_total[5m])
Meaning:
On average, how many requests per second were received over the last 5 minutes?
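The core of rate() can be sketched in Python: add up the increases between consecutive samples, treating any drop as a counter reset, and divide by the elapsed time. This is a simplification: Prometheus's real rate() also extrapolates the result towards the window boundaries, so its output can differ slightly. The sample values are hypothetical.

```python
def simple_rate(samples):
    """Per-second increase across (timestamp_seconds, value) samples, handling counter resets.

    Simplification: no extrapolation to the window edges, unlike Prometheus's rate().
    """
    if len(samples) < 2:
        return None  # at least two samples are needed to compute a rate
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. process restart); the new value is the increase since then.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical http_requests_total samples 60 s apart, with a reset between t=120 and t=180.
samples = [(0, 1000.0), (60, 1600.0), (120, 2200.0), (180, 300.0), (240, 900.0)]
print(simple_rate(samples))  # → 8.75
```

The total increase is 600 + 600 + 300 + 600 = 2100 requests over 240 seconds; ignoring the reset would instead produce a negative result.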
irate()
Calculates the per-second rate using only the last two samples in the range window.
irate(http_requests_total[5m])
More responsive but noisier.
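For comparison, a sketch of the irate() idea: only the last two samples in the window are used, and if the last value is lower than the previous one, it is treated as a counter reset. The sample values are hypothetical.

```python
def simple_irate(samples):
    """Per-second rate from only the last two (timestamp_seconds, value) samples in the window."""
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    # On a counter reset, the last value itself is the increase since the reset.
    increase = v_last - v_prev if v_last >= v_prev else v_last
    return increase / (t_last - t_prev)


# Hypothetical samples: traffic spikes between the last two scrapes.
samples = [(0, 1000.0), (60, 1600.0), (120, 2200.0), (180, 2260.0), (240, 4060.0)]
print(simple_irate(samples))  # → 30.0
```

The spike dominates the result because the earlier samples are ignored, which is why irate() reacts faster but looks noisier on graphs.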
CPU Usage Example
Node Exporter provides:
node_cpu_seconds_total
Idle CPU (fraction of time idle, averaged over every CPU of every matched node):
avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))
CPU Usage Percentage:
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Very common Grafana dashboard query.
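The CPU formula is plain arithmetic on the per-CPU idle rates. The worked example below uses hypothetical idle fractions for four CPUs; the rate of the idle counter is the fraction of each second that CPU spent idle, between 0 and 1.

```python
from statistics import mean

# Hypothetical per-CPU results of rate(node_cpu_seconds_total{mode="idle"}[5m]).
idle_rate_per_cpu = {"cpu0": 0.90, "cpu1": 0.70, "cpu2": 0.85, "cpu3": 0.75}

avg_idle = mean(idle_rate_per_cpu.values())  # avg(...)          -> about 0.8
cpu_usage_percent = 100 - avg_idle * 100     # 100 - (avg(...) * 100)
print(round(cpu_usage_percent, 2))  # → 20.0
```

Because avg() without a `by` clause averages over every matched CPU on every node, a per-node view uses `100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)`.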
Memory Usage Example
Used Memory:
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
Memory Percentage:
((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes) * 100
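The memory queries work per node because PromQL binary operators match series with identical label sets, so each node's MemAvailable is subtracted from that same node's MemTotal. A worked example with hypothetical readings for a single 16 GiB node:

```python
GIB = 1024 ** 3

# Hypothetical node_exporter readings for one node.
mem_total_bytes = 16 * GIB      # node_memory_MemTotal_bytes
mem_available_bytes = 4 * GIB   # node_memory_MemAvailable_bytes

used_bytes = mem_total_bytes - mem_available_bytes
used_percent = used_bytes / mem_total_bytes * 100
print(used_bytes / GIB, used_percent)  # → 12.0 75.0
```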
Aggregation Functions
sum()
sum(http_requests_total)
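Adds the values of all matching series into a single result. For counters, sum their rates rather than the raw totals, e.g. sum(rate(http_requests_total[5m])).
avg()
avg(node_load1)
Average 1-minute load average across all nodes.
max()
max(node_memory_MemAvailable_bytes)
Highest available memory across all nodes.
min()
min(node_memory_MemAvailable_bytes)
Lowest available memory across all nodes.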
Group By
Example:
sum(rate(http_requests_total[5m])) by (instance)
Example output:
server1 = 100 req/s
server2 = 150 req/s
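Conceptually, `sum(...) by (instance)` groups series by the listed labels, drops every other label, and adds the values in each group; `sum(...)` with no `by` clause is the special case of a single group. A Python sketch over hypothetical per-series request rates, chosen to reproduce the example output above:

```python
from collections import defaultdict

# Hypothetical per-series results of rate(http_requests_total[5m]), in req/s.
rates = [
    ({"instance": "server1", "method": "GET"}, 70.0),
    ({"instance": "server1", "method": "POST"}, 30.0),
    ({"instance": "server2", "method": "GET"}, 120.0),
    ({"instance": "server2", "method": "POST"}, 30.0),
]


def sum_by(vector, *by_labels):
    """Sum values grouped by the given labels; every other label is dropped."""
    groups = defaultdict(float)
    for labels, value in vector:
        key = tuple((name, labels.get(name, "")) for name in by_labels)
        groups[key] += value
    return dict(groups)


print(sum_by(rates, "instance"))
# → {(('instance', 'server1'),): 100.0, (('instance', 'server2'),): 150.0}
print(sum_by(rates))  # no grouping labels, like plain sum(...) → {(): 250.0}
```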
Top Consumers
Top 5 CPU-consuming pods (CPU cores used, summed per pod):
topk(5, sum(rate(container_cpu_usage_seconds_total[5m])) by (pod))
Very common in Kubernetes/OpenShift interviews.
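topk() keeps the k series with the largest values. The sketch below applies the same idea to a hypothetical dictionary standing in for the per-pod result of `sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)`, using `heapq.nlargest` from the Python standard library; the pod names and values are made up, and returning the pods highest first is this sketch's choice.

```python
import heapq

# Hypothetical per-pod CPU usage in cores.
cpu_by_pod = {
    "api-7f9c": 1.8, "db-0": 2.4, "web-5d2a": 0.6, "cache-0": 0.9,
    "worker-1": 1.1, "worker-2": 1.3, "ingress-x1": 0.2,
}


def topk(k, vector):
    """Return the k (pod, value) pairs with the largest values, highest first."""
    return heapq.nlargest(k, vector.items(), key=lambda item: item[1])


for pod, cores in topk(5, cpu_by_pod):
    print(f"{pod}: {cores} cores")
```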
Kubernetes Examples
Pod Count
count(kube_pod_info)
Running Pods
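sum(kube_pod_status_phase{phase="Running"})
Each pod has a phase="Running" series whose value is 1 or 0, so sum() counts the Running pods, whereas count() would count every pod.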
Node Count
count(kube_node_info)
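kube-state-metrics exposes `kube_pod_status_phase` as one series per pod per phase, with value 1 for the pod's current phase and 0 for the others. The hypothetical Python sketch below shows why `count()` and `sum()` give different answers on the `phase="Running"` series: `count()` counts series regardless of their value, while `sum()` adds the 1s.

```python
# Hypothetical kube_pod_status_phase{phase="Running"} series: one per pod,
# value 1 if the pod is currently Running, 0 otherwise.
running_phase_series = {
    "web-1": 1, "web-2": 1, "job-abc": 0, "db-0": 1, "pending-pod": 0,
}

count_result = len(running_phase_series)                                  # count(...)
sum_result = sum(running_phase_series.values())                           # sum(...)
count_eq_1 = len([v for v in running_phase_series.values() if v == 1])   # count(... == 1)

print(count_result, sum_result, count_eq_1)  # → 5 3 3
```

In PromQL, `count(kube_pod_status_phase{phase="Running"} == 1)` gives the same result as the sum, because a comparison operator without `bool` drops the series for which the comparison is false.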
OpenShift Examples
API Server Latency
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le))
etcd Latency
histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))
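OVN-Kubernetes Node Target Health
up{job="ovn-kubernetes-node"}
Returns 1 for each OVN-Kubernetes node target that Prometheus scraped successfully and 0 for each failed scrape; the exact job label value depends on the cluster's scrape configuration.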
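The latency queries above use histogram_quantile(), which never sees raw observations, only cumulative bucket counts. It estimates a quantile by finding the bucket that contains the target rank and interpolating linearly inside it; `sum(...) by (le)` first merges the buckets of many series into one histogram. The Python sketch below follows that interpolation idea under simplifying assumptions (at least two buckets, sorted by upper bound, last bucket `+Inf`, q between 0 and 1, some edge cases Prometheus handles omitted); the bucket counts are hypothetical.

```python
import math


def bucket_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Assumes buckets are sorted by upper bound, the last one is +Inf,
    and each count includes all observations in smaller buckets.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    last = len(buckets) - 1
    # First finite bucket whose cumulative count reaches the rank; otherwise the +Inf bucket.
    b = next((i for i in range(last) if buckets[i][1] >= rank), last)
    if b == last:
        return buckets[last - 1][0]  # rank lies in the +Inf bucket: use the highest finite bound
    start, count_before = buckets[b - 1] if b > 0 else (0.0, 0.0)
    end, count = buckets[b]
    # Linear interpolation inside the chosen bucket.
    return start + (end - start) * (rank - count_before) / (count - count_before)


# Hypothetical summed bucket counts: (le, cumulative count)
buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
print(bucket_quantile(0.5, buckets))   # → 0.1
print(bucket_quantile(0.99, buckets))  # → 1.0
```

The estimate is only as precise as the bucket layout: any quantile that lands in the `+Inf` bucket is reported as the highest finite bound.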
Alert Rule Example
CPU > 80%
groups:
  - name: cpu-alerts
    rules:
      - alert: HighCPU
        expr: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
Meaning:
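- Condition: average CPU usage above 80%
- The condition must hold continuously for 5 minutes (the alert is pending meanwhile)
- Then the alert fires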
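The `for: 5m` clause can be modelled as a small state machine: the alert becomes pending when the expression first returns a result, fires once it has been continuously true for at least the `for` duration, and resets as soon as it stops being true. A simplified Python sketch with hypothetical 60-second evaluations (it ignores other rule options):

```python
def alert_states(evaluations, for_seconds):
    """Replay (timestamp_seconds, condition_true) rule evaluations; return the state after each.

    inactive -> pending when the condition first holds,
    pending -> firing once it has held for at least for_seconds,
    back to inactive as soon as the condition stops holding.
    """
    states, active_since = [], None
    for ts, condition in evaluations:
        if not condition:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        states.append("firing" if ts - active_since >= for_seconds else "pending")
    return states


# Hypothetical evaluations every 60 s of: CPU usage > 80
evals = [(0, False), (60, True), (120, True), (180, True),
         (240, True), (300, True), (360, True), (420, False)]
print(alert_states(evals, for_seconds=300))
```

A single evaluation below the threshold resets the timer, so a CPU graph that dips briefly every few minutes never fires this alert.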
Recording Rule Example
Instead of recalculating CPU usage on every dashboard refresh, precompute it with a recording rule (an entry in a rule group's rules: list):
- record: node:cpu_usage:avg
  expr: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Then dashboards query:
node:cpu_usage:avg
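Prometheus evaluates the expression once per rule evaluation interval and stores the result as a new time series, so dashboards read precomputed data instead of re-running the full query on every refresh.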
Common Interview Questions
What is the difference between rate() and irate()?
| rate() | irate() |
|---|---|
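| Averages the increase over the whole range window | Uses only the last two samples in the window |
| Smoother | More responsive |
| Preferred for alerts and slow-moving counters | Suited to graphing volatile, fast-moving counters |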
What is a counter?
A cumulative metric that only increases, or resets to zero when the process restarts.
Examples:
http_requests_total
container_cpu_usage_seconds_total
What is a gauge?
A metric that can increase or decrease.
Examples:
node_memory_MemAvailable_bytes
node_load1
What is a histogram?
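Samples observations such as request latencies into configurable buckets, exposed as cumulative _bucket series (labelled le) plus _sum and _count.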
Example:
http_request_duration_seconds_bucket
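Histogram buckets are cumulative: each `_bucket` series, labelled with `le` (less than or equal), counts every observation at or below its upper bound, while `_sum` and `_count` track the total of observed values and the number of observations. A Python sketch with hypothetical bucket bounds and latencies:

```python
import math

# Hypothetical bucket upper bounds (le) in seconds; +Inf is always present.
BOUNDS = [0.1, 0.25, 0.5, 1.0, math.inf]


def observe_all(latencies, bounds=BOUNDS):
    """Return cumulative bucket counts, _sum and _count as a histogram would expose them."""
    buckets = {le: 0 for le in bounds}
    for value in latencies:
        for le in bounds:
            if value <= le:  # cumulative: counted in every bucket whose bound is >= the value
                buckets[le] += 1
    return buckets, sum(latencies), len(latencies)


buckets, total_sum, count = observe_all([0.05, 0.2, 0.3, 0.7, 2.0])
print(buckets, total_sum, count)
```

The `+Inf` bucket always equals `_count`, which is what histogram_quantile() uses as the total number of observations.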
What is cardinality?
The number of unique time series, i.e. unique combinations of metric name and label values.
Example:
http_requests_total{user="1"}
http_requests_total{user="2"}
http_requests_total{user="3"}
...
Labels with unbounded values, such as user or request IDs, can create millions of series (high cardinality), which increases Prometheus memory usage and slows queries.
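Cardinality grows multiplicatively: in the worst case, where every combination of label values occurs, the number of series is the product of the number of values per label. The Python sketch below counts those combinations for hypothetical `http_requests_total` labels, then adds an unbounded user-ID label like the one in the example above.

```python
from itertools import product

# Hypothetical label values observed for http_requests_total.
labels = {
    "method": ["GET", "POST", "PUT", "DELETE"],
    "status": ["200", "404", "500"],
    "instance": ["server1", "server2"],
}


def series_keys(label_values):
    """Every distinct label combination, i.e. every series, assuming all combinations occur."""
    names = sorted(label_values)
    return {tuple(zip(names, combo)) for combo in product(*(label_values[n] for n in names))}


print(len(series_keys(labels)))  # → 24

# Adding an unbounded label such as a user ID multiplies the series count.
with_users = dict(labels, user=[str(i) for i in range(10_000)])
print(len(series_keys(with_users)))  # → 240000
```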
Interview answer
PromQL is Prometheus’s query language used to retrieve, filter, aggregate, and calculate metrics. It supports functions such as rate(), sum(), avg(), and histogram_quantile(), plus label filtering, making it the foundation for Grafana dashboards, alerting rules, and monitoring in Kubernetes and OpenShift environments.