Building a Grafana Dashboard for Multi-Host Metrics

This guide uses a centralized monitoring architecture: one central server running Grafana and Prometheus pulls metrics from 20 lightweight cAdvisor agents running across your network.

Here is the cleanest way to architect and deploy this.

Centralized Architecture Overview

[ Central Monitor Server ]               [ 20x Remote Docker Hosts ]
┌────────────────────────┐               ┌─────────────────────────┐
│        Grafana         │               │ Docker Host 01          │
│           ▲            │               │ └─ cAdvisor (Port 8080) │
│           │ (Queries)  │               └─────────────────────────┘
│       Prometheus       │◄─────────────┐             ▲
└────────────────────────┘  (Scrapes    │             │
                            over HTTP)  ├─────────────┤
                                        │             ▼
                                        │┌─────────────────────────┐
                                        ││ Docker Host 20          │
                                        └┤ └─ cAdvisor (Port 8080) │
                                         └─────────────────────────┘

  • The Agents (Remote Hosts): Every one of your 20 servers runs just a single, lightweight cAdvisor container. They expose their metrics on port 8080.
  • The Core (Central Host): One designated management server runs Prometheus and Grafana. Prometheus is configured to reach out over your network to all 20 servers to grab their metrics.

Step 1: Deploy cAdvisor on all 20 Remote Hosts

You need to run cAdvisor on every Docker server. If you use an automation tool like Ansible, this is a single playbook run. Otherwise, run the following docker run command on each machine:

Bash

docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  --privileged \
  --device=/dev/kmsg \
  --restart=always \
  gcr.io/cadvisor/cadvisor:latest
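
If you are not using Ansible, a small loop over SSH is one way to avoid logging in to each machine by hand. The sketch below is an illustration, not part of the original setup: it assumes key-based SSH access and a remote user allowed to run docker, and the `deploy_cadvisor` function and `DRY_RUN` switch are names made up for this example. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: roll the cAdvisor container out to a list of hosts over SSH.
# Assumptions (not from the article): key-based SSH access and a remote
# user that is allowed to run docker on each host.

CADVISOR_RUN='docker run --volume=/:/rootfs:ro --volume=/var/run:/var/run:ro --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro --volume=/dev/disk/:/dev/disk:ro --publish=8080:8080 --detach=true --name=cadvisor --privileged --device=/dev/kmsg --restart=always gcr.io/cadvisor/cadvisor:latest'

# deploy_cadvisor HOST...: run the command above on each host.
# With DRY_RUN=1 (the default) the ssh commands are only printed.
deploy_cadvisor() {
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf 'ssh %s %s\n' "$host" "$CADVISOR_RUN"
    else
      ssh "$host" "$CADVISOR_RUN" || echo "deploy failed on $host" >&2
    fi
  done
}

# Example addresses; replace them with your own hosts.
deploy_cadvisor 192.168.1.50 192.168.1.51 192.168.1.70
```

Once the printed commands look right, calling the function with `DRY_RUN=0` runs them for real and reports any host where the deploy failed.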

Security Note: Publishing 8080:8080 exposes cAdvisor's metrics, by default without authentication, to anyone who can reach that IP. Ensure your internal firewall or security groups only allow traffic to port 8080 from your central Prometheus server's IP.
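
On hosts that filter traffic with iptables, one possible way to apply this restriction is a rule in Docker's DOCKER-USER chain, since published container ports are forwarded and do not pass through the usual INPUT rules. The sketch below is an assumption-laden example: `192.168.1.10` is a hypothetical address for the central Prometheus server, and `cadvisor_firewall_rule` is a helper invented here. It only prints the rule so you can review it first.

```shell
#!/usr/bin/env bash
# Sketch: restrict cAdvisor's published port to the Prometheus server.
# Assumption: 192.168.1.10 is a hypothetical address for the central
# Prometheus server; substitute your own.
PROMETHEUS_IP="${PROMETHEUS_IP:-192.168.1.10}"

# Ports published by Docker are forwarded, so they are filtered in the
# DOCKER-USER chain rather than INPUT. The conntrack match targets the
# port the client originally connected to (8080).
cadvisor_firewall_rule() {
  printf 'iptables -I DOCKER-USER -p tcp -m conntrack --ctorigdstport 8080 --ctdir ORIGINAL ! -s %s -j DROP\n' "$1"
}

# Print the rule for review; run it as root on each Docker host once checked.
cadvisor_firewall_rule "$PROMETHEUS_IP"
```

If your hosts sit behind cloud security groups or a network firewall instead, the equivalent rule there is usually simpler to maintain than per-host iptables entries.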

Step 2: Configure the Central Prometheus Server

On your central monitoring server, create a prometheus.yml file. Instead of pointing to localhost, you will list your 20 servers using Prometheus targets and labels. Labels are crucial here because they allow you to filter your Grafana dashboard by specific servers.

YAML

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'docker-swarm-nodes'
    static_configs:
      - targets: ['192.168.1.50:8080']
        labels:
          instance: 'prod-web-01'
          env: 'production'
      - targets: ['192.168.1.51:8080']
        labels:
          instance: 'prod-db-01'
          env: 'production'
      - targets: ['192.168.1.70:8080']
        labels:
          instance: 'stage-app-01'
          env: 'staging'
      # ... add the remaining 17 servers here
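
Typing 20 near-identical target blocks by hand invites copy-paste mistakes, so it can help to generate them from a short inventory. The following sketch assumes a made-up inventory format of one `ip name env` line per server. The `render_targets` function is hypothetical, and its output is indented to sit under `static_configs:` in the file above.

```shell
#!/usr/bin/env bash
# Sketch: turn "ip name env" lines into static_configs entries.
# The inventory format is an assumption made for this example.

# render_targets reads whitespace-separated "ip name env" lines on stdin
# and prints one target block per line.
render_targets() {
  local ip name env
  while read -r ip name env; do
    # Skip blank lines and comments.
    case "$ip" in ''|'#'*) continue ;; esac
    printf "      - targets: ['%s:8080']\n" "$ip"
    printf "        labels:\n"
    printf "          instance: '%s'\n" "$name"
    printf "          env: '%s'\n" "$env"
  done
}

render_targets <<'EOF'
192.168.1.50 prod-web-01 production
192.168.1.51 prod-db-01 production
192.168.1.70 stage-app-01 staging
EOF
```

Paste the output under `static_configs:`, or redirect it into a file as part of whatever script builds your prometheus.yml.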

Start Prometheus and Grafana on this central machine using a simplified version of the Docker Compose template from earlier (removing the local cAdvisor block from it).

Step 3: Multi-Host Grafana Dashboard

Once Prometheus is scraping all 20 hosts, you need a Grafana dashboard that lets you switch between them smoothly.

  1. Go to Grafana -> Dashboards -> Import.
  2. Use Dashboard ID: 14282 or 10619 (both are heavily optimized for multi-host setups).
  3. These dashboards automatically generate a dropdown menu at the top left labeled “Instance” or “Host” based on the labels you defined in your prometheus.yml.

You can now view your entire fleet’s aggregated resource consumption, or drill down into a specific container running on prod-db-01.
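
To sanity-check the data behind the dashboard, you can ask Prometheus directly through its HTTP query API. This is a minimal sketch that assumes Prometheus is reachable on `localhost:9090` (its default port). The helper names are invented for the example, and the query is one plausible way to sum container CPU per host, not necessarily what the imported dashboards use.

```shell
#!/usr/bin/env bash
# Sketch: query Prometheus over its HTTP API to see per-host container CPU.
# Assumption: Prometheus listens on localhost:9090 (its default port).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# cpu_by_instance_query ENV: PromQL for CPU cores used by containers,
# summed per host and filtered by the env label from prometheus.yml.
# name!="" keeps only named containers, so parent cgroups are not
# counted a second time.
cpu_by_instance_query() {
  printf 'sum by (instance) (rate(container_cpu_usage_seconds_total{env="%s",name!=""}[5m]))' "$1"
}

# run_query PROMQL: send an instant query and print the JSON response.
run_query() {
  curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Show the query; uncomment the last line on the monitoring server to run it.
cpu_by_instance_query production; echo
# run_query "$(cpu_by_instance_query production)"
```

The same PromQL expression can be pasted into Grafana's Explore view if you prefer to stay in the browser.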

Pro-Tips for Managing 20+ Hosts

  • Node Exporter: cAdvisor only monitors container metrics. If you want to monitor the host Linux OS itself (host disk space, total RAM usage, bare-metal CPU temperature), deploy Prometheus Node Exporter alongside cAdvisor on all 20 nodes, exposing it on port 9100.
  • Scale Warning: Scraping 20 hosts every 15 seconds produces a steady stream of samples that adds up over time. Set an explicit storage retention policy on your central Prometheus server (e.g., --storage.tsdb.retention.time=15d, which keeps data for 15 days and is also Prometheus' default, or a size cap with --storage.tsdb.retention.size) so it doesn’t quietly fill up the server’s hard drive.
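
A back-of-the-envelope estimate helps when choosing a retention period. The sketch below multiplies retention time by ingested samples per second by bytes per sample. It assumes 2 bytes per sample (the upper end of the 1 to 2 bytes Prometheus typically needs) and a hypothetical 2000 time series per host, so measure your own series count before relying on the result.

```shell
#!/usr/bin/env bash
# Sketch: rough disk estimate for Prometheus retention, using
#   bytes = retention_seconds * samples_per_second * bytes_per_sample
# Assumptions: 2 bytes per sample and a hypothetical 2000 series per host.

# estimate_bytes HOSTS SERIES_PER_HOST SCRAPE_INTERVAL_S RETENTION_DAYS
estimate_bytes() {
  local hosts=$1 series=$2 interval=$3 days=$4
  local bytes_per_sample=2
  echo $(( hosts * series * days * 86400 * bytes_per_sample / interval ))
}

bytes=$(estimate_bytes 20 2000 15 15)
echo "approx. $(( bytes / 1024 / 1024 )) MiB for 15 days"
```

With these assumed numbers the estimate comes to roughly 6.9 GB for 15 days. Doubling the series count or the retention period doubles the figure.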
