Deploy a Monitoring Solution with Prometheus and Grafana On-Premises in HA Mode

Deploying Prometheus and Grafana in High Availability (HA) mode on-premises ensures monitoring continuity and data resilience. In this setup, redundant Prometheus instances scrape the same targets, and Grafana, itself deployed as multiple load-balanced instances, queries them.

Below are the steps to deploy Prometheus and Grafana on-premises with HA:

1. Plan the Architecture

Prometheus Instances:
- Set up at least two Prometheus instances in HA mode for redundancy.
- Each Prometheus instance will scrape the same set of targets independently and store its own local data.
Grafana Instances:
- Deploy at least two Grafana instances in HA mode, load balanced to ensure availability.
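- Grafana does not merge or deduplicate data from separate Prometheus data sources, so it should query a single endpoint that fronts both Prometheus instances (a load balancer or Thanos Querier).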
Storage:
- Use a distributed storage system like Thanos, VictoriaMetrics, or Prometheus remote storage (like Cortex or Mimir) for long-term data storage.
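- Use a shared SQL database (e.g., MySQL, PostgreSQL) for Grafana so that every instance sees the same dashboards, users, and configuration. The default embedded SQLite database is local to each instance and will not keep them in sync.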

2. Set Up Prometheus in HA Mode

Step 2.1: Install Prometheus

Download and extract Prometheus on each node:

wget https://github.com/prometheus/prometheus/releases/download/v2.37.0/prometheus-2.37.0.linux-amd64.tar.gz

tar -xvf prometheus-2.37.0.linux-amd64.tar.gz

cd prometheus-2.37.0.linux-amd64

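Copy the prometheus and promtool binaries to /usr/local/bin, create the configuration directory (/etc/prometheus) and the data directory (/var/lib/prometheus), and create the prometheus system user that the service will run as.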
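The copy step can be sketched as a small shell function. This is only an illustration: the function name stage_prometheus and its optional staging-root argument are assumptions made here so the layout can be tried in a scratch directory before touching the real system paths.

```shell
# Sketch only: stage the Prometheus binaries and directories.
# stage_prometheus and the optional staging root are illustrative names,
# not part of the Prometheus distribution.
stage_prometheus() (
  src_dir="$1"          # extracted release directory
  dest_root="${2:-}"    # optional staging root; empty installs into /
  install -d "$dest_root/usr/local/bin" \
             "$dest_root/etc/prometheus" \
             "$dest_root/var/lib/prometheus" || exit 1
  # The release tarball ships the server binary and the promtool checker.
  install -m 0755 "$src_dir/prometheus" "$src_dir/promtool" \
          "$dest_root/usr/local/bin/"
)

# Typical use on a node (run as root):
#   useradd --system --no-create-home --shell /usr/sbin/nologin prometheus
#   stage_prometheus ./prometheus-2.37.0.linux-amd64
#   chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
```

The chown matters because the service runs as the prometheus user and must be able to write to /var/lib/prometheus.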

Step 2.2: Configure Prometheus

Create a prometheus.yml configuration file in /etc/prometheus for each instance:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'your_targets'
    static_configs:
      - targets: ['<target_ip1>:<port>', '<target_ip2>:<port>']

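For HA, each Prometheus instance must be configured with the same scrape targets and rules; only the label that identifies the instance differs.
High Availability Labeling:
- To distinguish between HA Prometheus instances, give each one a unique external label (for example replica: <instance_name>) under global.external_labels in prometheus.yml.
- The instances do not cluster or synchronize with each other. Each is an independent replica, and a query layer such as Thanos Querier can use the replica label to deduplicate their data.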

Step 2.3: Start Prometheus

Create a systemd service file for each Prometheus instance at /etc/systemd/system/prometheus.service:

[Unit]

Description=Prometheus

After=network.target

[Service]

User=prometheus

ExecStart=/usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus --web.enable-lifecycle

[Install]

WantedBy=multi-user.target

Enable and start each Prometheus instance:

sudo systemctl enable prometheus

sudo systemctl start prometheus

3. Install and Configure Thanos (Optional for Long-Term Storage)

Deploy Thanos Sidecar alongside each Prometheus instance for storing data in a distributed store and enabling HA Prometheus queries.
Thanos Sidecar:
- Set up a sidecar container or service to work with each Prometheus instance.
- It uploads Prometheus TSDB blocks to object storage (e.g., S3, MinIO) and exposes each instance's data to Thanos Querier, which presents both Prometheus instances as a unified source.
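As a rough sketch of the sidecar setup, the functions below write an S3-compatible object storage config and print a matching sidecar command line. The bucket name, endpoint, credentials, file paths, and ports are all placeholders, and the two function names are hypothetical helpers introduced here.

```shell
# Sketch only: object storage config and sidecar command line for Thanos.
# Bucket, endpoint, credentials, paths and ports are placeholders.
write_objstore_config() (
  out_file="$1"    # e.g. /etc/thanos/bucket.yml
  cat > "$out_file" <<'EOF'
type: S3
config:
  bucket: "<bucket_name>"
  endpoint: "<minio_host>:9000"
  access_key: "<access_key>"
  secret_key: "<secret_key>"
  insecure: true   # plain HTTP to the endpoint; set false when TLS is used
EOF
)

# Print (do not run) the sidecar command for one Prometheus instance.
sidecar_command() {
  printf '%s\n' "thanos sidecar --tsdb.path=/var/lib/prometheus --prometheus.url=http://localhost:9090 --objstore.config-file=$1 --grpc-address=0.0.0.0:10901 --http-address=0.0.0.0:10902"
}

# Example:
#   write_objstore_config /etc/thanos/bucket.yml
#   sidecar_command /etc/thanos/bucket.yml
```

When the sidecar uploads blocks, the Thanos documentation recommends starting Prometheus with --storage.tsdb.min-block-duration=2h and --storage.tsdb.max-block-duration=2h so that local compaction is disabled. On the query side, Thanos Querier can deduplicate the HA pair when started with --query.replica-label set to the external label that distinguishes the instances.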

4. Deploy Grafana in HA Mode

Step 4.1: Install Grafana

Download and install Grafana on each node:

wget https://dl.grafana.com/oss/release/grafana-8.0.0.linux-amd64.tar.gz

tar -zxvf grafana-8.0.0.linux-amd64.tar.gz

Copy the Grafana binaries and set up the configuration directory (/etc/grafana).

Step 4.2: Configure Grafana

In the Grafana configuration file (/etc/grafana/grafana.ini), set up the database to store Grafana data centrally:

[database]

type = postgres

host = <database_host>:5432

name = grafana

user = grafana_user

password = grafana_password

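Add Prometheus as a data source in Grafana. Grafana does not fail over or load-balance between data sources on its own, so point the data source URL at a single endpoint that fronts both Prometheus instances, such as the Prometheus load balancer or Thanos Querier.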
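Because every Grafana instance should end up with the same data source, it can help to define it in a provisioning file rather than through the UI. The sketch below assumes Grafana's data source provisioning format; the function name, the file name, and the URL passed in are illustrative, and the provisioning directory depends on how Grafana was installed.

```shell
# Sketch only: write a Grafana data source provisioning file.
# write_datasource_provisioning and the example URL are illustrative.
write_datasource_provisioning() (
  prom_url="$1"    # single query endpoint, e.g. a load balancer or Thanos Querier
  out_file="$2"    # a file under <provisioning_dir>/datasources/
  cat > "$out_file" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${prom_url}
    isDefault: true
EOF
)

# Example:
#   write_datasource_provisioning http://<prometheus_lb>:9090 \
#     <provisioning_dir>/datasources/prometheus.yaml
```

Placing the same file on each Grafana node keeps the data source definition identical across instances, independent of what is stored in the shared database.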

Step 4.3: Start Grafana

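Set up a systemd service for Grafana at /etc/systemd/system/grafana-server.service. With a tarball install, --homepath must point at the extracted Grafana directory (the one containing conf/ and public/), shown here as <grafana_home>:

[Unit]

Description=Grafana

After=network.target

[Service]

User=grafana

ExecStart=/usr/local/bin/grafana-server --config=/etc/grafana/grafana.ini --homepath=<grafana_home>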

[Install]

WantedBy=multi-user.target

Enable and start Grafana:

sudo systemctl enable grafana-server

sudo systemctl start grafana-server

5. Set Up Load Balancers for HA

Prometheus Load Balancer:
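- Set up a load balancer in front of the Prometheus instances so that queries keep working if one instance fails. Because each instance scrapes at slightly different moments, prefer sticky sessions or an active/backup arrangement over strict round-robin; otherwise graphs can shift slightly between refreshes.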
Grafana Load Balancer:
- Set up another load balancer for the Grafana instances to distribute user access and enable failover.
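The article does not prescribe a particular load balancer. As one possible illustration, the sketch below generates a minimal HAProxy configuration for the Grafana instances. HAProxy itself, the function name, the listening port 80, and the timeout values are assumptions made here; /api/health is Grafana's health endpoint and 3000 is its default HTTP port.

```shell
# Sketch only: generate a minimal HAProxy config fronting Grafana.
# HAProxy, the bind port and the timeouts are assumptions for illustration.
write_grafana_haproxy_cfg() (
  out_file="$1"; shift    # remaining arguments: Grafana node IPs
  {
    cat <<'EOF'
defaults
    mode http
    timeout connect 5s
    timeout client  60s
    timeout server  60s

frontend grafana_front
    bind *:80
    default_backend grafana_back

backend grafana_back
    balance roundrobin
    option httpchk GET /api/health
EOF
    n=0
    for ip in "$@"; do
      n=$((n + 1))
      # "check" enables health checks, so a failed node is taken out of rotation.
      printf '    server grafana%d %s:3000 check\n' "$n" "$ip"
    done
  } > "$out_file"
)

# Example:
#   write_grafana_haproxy_cfg /etc/haproxy/haproxy.cfg <grafana_ip1> <grafana_ip2>
```

The generated file can be checked with haproxy -c -f /etc/haproxy/haproxy.cfg before reloading. Note that a single load balancer is itself a single point of failure unless it is also made redundant.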

6. Verify and Test the HA Setup

Prometheus:
- Test that both Prometheus instances are running independently by accessing them via <node_ip>:9090.
- Use Thanos Querier (if configured) to query both Prometheus instances as a single source.
Grafana:
- Log in to Grafana via the load balancer IP, add Prometheus as a data source, and create a sample dashboard.
- Simulate a failure on one Grafana instance and ensure that the other instance handles the load transparently.
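The checks above can be scripted with curl. The sketch below is a minimal example that probes a list of URLs and reports which ones respond; the function names are illustrative. It relies on Prometheus's /-/ready endpoint and Grafana's /api/health endpoint.

```shell
# Sketch only: probe health endpoints and report the result per URL.
probe() {
  # -f turns HTTP error statuses into a non-zero exit code.
  curl -fsS --max-time 5 -o /dev/null "$1"
}

check_all() {
  check_all_failed=0
  for check_all_url in "$@"; do
    if probe "$check_all_url"; then
      echo "OK   $check_all_url"
    else
      echo "FAIL $check_all_url"
      check_all_failed=1
    fi
  done
  return "$check_all_failed"
}

# Example:
#   check_all http://<node_ip1>:9090/-/ready http://<node_ip2>:9090/-/ready \
#             http://<grafana_lb_ip>/api/health
```

To exercise failover, stop Grafana on one node with sudo systemctl stop grafana-server and run the check against the load balancer address again; it should still report OK while the direct check against the stopped node fails.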

7. Enable Monitoring and Alerting

Configure Alertmanager for Prometheus:
- Set up Alertmanager to handle alerts in case of any issues.
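- For HA, deploy multiple Alertmanager instances clustered via the --cluster.peer flag, and configure every Prometheus instance to send alerts to all of them rather than through a load balancer.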
Set up alerts in Grafana for visualization and notifications based on key metrics and alert rules.
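For the Alertmanager clustering described above, the following sketch prints a start command for one Alertmanager node and the matching alerting section for prometheus.yml. The function names, the config file path, and the example addresses are placeholders; 9094 and 9093 are Alertmanager's default cluster and HTTP ports.

```shell
# Sketch only: Alertmanager cluster command and Prometheus alerting section.
# Paths and addresses are placeholders.

# Print the command for one node; arguments are the other nodes' cluster addresses.
alertmanager_command() (
  cmd="alertmanager --config.file=/etc/alertmanager/alertmanager.yml --cluster.listen-address=0.0.0.0:9094"
  for peer in "$@"; do
    cmd="$cmd --cluster.peer=$peer"
  done
  printf '%s\n' "$cmd"
)

# Print the alerting section for prometheus.yml listing every Alertmanager.
alerting_section() (
  printf 'alerting:\n  alertmanagers:\n    - static_configs:\n        - targets:\n'
  for target in "$@"; do
    printf "            - '%s'\n" "$target"
  done
)

# Example:
#   alertmanager_command <alertmanager_ip2>:9094
#   alerting_section <alertmanager_ip1>:9093 <alertmanager_ip2>:9093
```

Listing every Alertmanager in each Prometheus instance lets the cluster deduplicate notifications even though both Prometheus replicas fire the same alerts.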

Summary of Key Points

HA Prometheus: Multiple Prometheus instances scraping the same targets, optionally with Thanos for long-term storage and aggregation.
HA Grafana: Multiple Grafana instances with a centralized database for dashboards, load-balanced to ensure redundancy.
Alerting: Use Alertmanager in HA mode to handle alerts from Prometheus.

This HA setup for Prometheus and Grafana provides a robust monitoring solution that is resilient, scalable, and fault-tolerant.