Setting Up a Server Monitoring Stack with Prometheus + Grafana

Merve Arslan

WordPress & Hosting Expert

March 21, 2026 | 14 min read

Monitoring your servers' CPU, memory, disk, and network metrics in real time is the foundation of proactive issue detection. Prometheus handles metric collection and querying, while Grafana provides the visualization layer - together they form an excellent monitoring stack. This guide walks you through setup, dashboard creation, alerting rules, and PromQL basics step by step.

Prometheus Architecture

Prometheus is a pull-based monitoring system. It scrapes metrics from exporters on target servers at regular intervals and stores them in a local TSDB (Time Series Database). The core components are:

Component         | Role                                       | Port
Prometheus Server | Metric collection, storage, and querying   | 9090
Node Exporter     | Server metrics (CPU, RAM, disk, network)   | 9100
Alertmanager      | Alert management and notification delivery | 9093
Grafana           | Dashboards and visualization               | 3000
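
Once the components are running, a quick reachability check against these default ports can save debugging time. The snippet below is a minimal sketch, not part of any Prometheus tooling: the names STACK_PORTS and check_ports are made up for illustration, and it only tests whether a TCP connection succeeds, not whether the service behind the port is healthy.

```python
import socket

# Default ports from the component table; adjust if you changed them.
STACK_PORTS = {
    "prometheus": 9090,
    "node_exporter": 9100,
    "alertmanager": 9093,
    "grafana": 3000,
}

def check_ports(host, ports=STACK_PORTS, connect=socket.create_connection, timeout=2.0):
    """Return {component: True/False} depending on whether a TCP connect succeeds."""
    reachable = {}
    for name, port in ports.items():
        try:
            conn = connect((host, port), timeout=timeout)
            conn.close()
            reachable[name] = True
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            reachable[name] = False
    return reachable
```

Calling check_ports("localhost") on the monitoring host returns a dictionary such as {"prometheus": True, ...}. The connect parameter exists only so the function can be exercised without a network.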

Prometheus Configuration

The prometheus.yml file defines which targets Prometheus scrapes metrics from and how frequently it performs scraping.

prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "alertmanager:9093"

scrape_configs:
  # Prometheus monitors its own metrics
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Server metrics
  - job_name: "node"
    static_configs:
      - targets:
          - "server-1:9100"
          - "server-2:9100"
          - "server-3:9100"
        labels:
          env: production
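
After Prometheus loads this configuration, you can confirm that every target is actually being scraped by asking its HTTP API for the target list. The following is a rough sketch under a few assumptions: Prometheus is reachable at http://localhost:9090, the helper names fetch_targets and unhealthy_targets are invented for this example, and SAMPLE is a hand-written payload in the shape of the /api/v1/targets response rather than real output.

```python
import json
import urllib.request

def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the target list from a running Prometheus (not called in this sketch)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)

def unhealthy_targets(payload):
    """Return (job, instance, last_error) for every active target that is not up."""
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target["health"] != "up":
            labels = target["labels"]
            problems.append(
                (labels.get("job"), labels.get("instance"), target.get("lastError", ""))
            )
    return problems

# Made-up sample response: server-2 fails to scrape.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "node", "instance": "server-1:9100"},
             "health": "up", "lastError": ""},
            {"labels": {"job": "node", "instance": "server-2:9100"},
             "health": "down", "lastError": "connection refused"},
        ],
        "droppedTargets": [],
    },
}

print(unhealthy_targets(SAMPLE))
```

Against a live server you would call unhealthy_targets(fetch_targets()) instead of using the sample. The same information is visible in the Prometheus web UI under Status > Targets.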

Node Exporter Setup

Node Exporter is a lightweight exporter that collects CPU, memory, disk, and network metrics from Linux servers. It must be installed on every server you want to monitor.

terminal
# Download and install Node Exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/

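# Create a dedicated system user (the unit below runs as User=node_exporter)
sudo useradd --system --no-create-home --shell /bin/false node_exporter

# Create systemd service file (sudo tee is needed: with "sudo cat ... > file"
# the redirect is performed by your own shell, without root privileges)
sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<EOF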
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter
Restart=always

[Install]
WantedBy=multi-user.target
EOF

# Start the service
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter

# Verify metrics
curl http://localhost:9100/metrics | head -20
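
The output of that curl command is plain text, one sample per line, with comment lines starting with #. To show how simple the format is, here is a small parsing sketch. It is an illustration only: parse_metrics is a hypothetical helper, SAMPLE is made-up output, and the parser assumes lines carry no trailing timestamp (which is the case for Node Exporter output).

```python
def parse_metrics(text):
    """Parse exposition-format text into {series: value}.

    The series key keeps its label set, e.g. 'node_cpu_seconds_total{cpu="0",mode="idle"}'.
    Assumes "<series> <value>" lines without a trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples

# Made-up sample in the shape of Node Exporter output.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_memory_MemTotal_bytes 8.3e+09
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
"""

for series, value in parse_metrics(SAMPLE).items():
    print(series, value)
```

In practice Prometheus does this parsing for you; the sketch is only meant to make the /metrics output less opaque when you inspect it by hand.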

💡 Tip: If you use Docker, you can spin up the entire stack with docker-compose in a single command. Run Prometheus, Grafana, Node Exporter, and Alertmanager on the same network.

Grafana Dashboards and Alerting Rules

Grafana visualizes the metrics stored in Prometheus. You can import ready-made dashboards or build your own. The alerting rules below are evaluated by Prometheus itself, which hands firing alerts to Alertmanager so that you are notified when critical thresholds are exceeded.

alert_rules.yml
groups:
  - name: server_alerts
    rules:
      # CPU usage above 85% for 5 minutes
      - alert: HighCpuUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage ({{ $labels.instance }})"
          description: "CPU usage is {{ $value }}% - above 85% for 5 minutes."

      # Memory usage above 90%
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Critical memory usage ({{ $labels.instance }})"

      # Disk usage above 90%
      - alert: DiskSpaceLow
        expr: (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 > 90
        for: 10m
        labels:
          severity: critical

⚠️ Warning: Do not set the for duration too short in alerting rules. Temporary spikes can produce false positives. 5 minutes for CPU and 10 minutes for disk are reasonable starting values.
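
To see why the for duration filters out short spikes, it helps to walk through the state changes by hand. The sketch below is a simplified model, not Prometheus code: alert_states is a hypothetical function, and it assumes one rule evaluation every 15 seconds (the evaluation_interval from the configuration above) and for: 5m. An alert stays pending until its condition has held continuously for the full duration, and any false evaluation resets it.

```python
def alert_states(condition_results, interval_s=15, for_s=300):
    """Map a sequence of True/False rule evaluations to alert states.

    Simplified model: 'pending' while the condition has held for less than
    for_s seconds, 'firing' once it has held for at least for_s seconds,
    'inactive' whenever the condition is false.
    """
    states = []
    active_since = None
    for i, is_true in enumerate(condition_results):
        now = i * interval_s
        if not is_true:
            active_since = None  # a single false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states

# A 2-minute CPU spike (8 evaluations) never reaches 'firing'.
spike = alert_states([True] * 8 + [False] * 4)
print("firing" in spike)

# A sustained breach fires once 5 minutes have passed.
sustained = alert_states([True] * 21)
print(sustained[-1])
```

With a very short for value, the spike in the first example would have triggered a notification, which is exactly the false positive the warning describes.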

PromQL Basics

PromQL (Prometheus Query Language) is a powerful language for querying and analyzing metrics. PromQL expressions are used in Grafana dashboards and alerting rules.
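
The function you will meet most often is rate(), which turns an ever-increasing counter into a per-second value. The arithmetic behind the CPU usage expression can be sketched in a few lines. This is a deliberately simplified model with hypothetical helper names (simple_rate, cpu_usage_percent): the real rate() also handles counter resets and extrapolates to the edges of the time window, which this sketch ignores.

```python
def simple_rate(first, last):
    """Per-second increase between two (timestamp_s, counter_value) samples."""
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)

def cpu_usage_percent(idle_samples_by_cpu):
    """Mirror of: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100).

    idle_samples_by_cpu maps a CPU id to its (first, last) idle-counter samples.
    """
    idle_rates = [simple_rate(first, last) for first, last in idle_samples_by_cpu.values()]
    avg_idle = sum(idle_rates) / len(idle_rates)  # fraction of time spent idle
    return 100 - avg_idle * 100

# Made-up samples over a 300-second window on a 2-core server:
# cpu0 was idle for 225 s, cpu1 for 150 s.
samples = {
    "0": ((0, 1000.0), (300, 1225.0)),
    "1": ((0, 2000.0), (300, 2150.0)),
}
print(cpu_usage_percent(samples))  # 37.5
```

Because node_cpu_seconds_total counts seconds, its rate is a fraction between 0 and 1 per core, which is why multiplying by 100 yields a percentage.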

PromQL Queries
# CPU usage percentage (5-minute average)
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Available memory (GB)
node_memory_MemAvailable_bytes / 1024 / 1024 / 1024

# Disk I/O - bytes read/written per second
rate(node_disk_read_bytes_total[5m])
rate(node_disk_written_bytes_total[5m])

# Network receive traffic on eth0 (MiB/s) - change the device label to match your interface
rate(node_network_receive_bytes_total{device="eth0"}[5m]) / 1024 / 1024

# Uptime (days)
(time() - node_boot_time_seconds) / 86400
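
Queries like these are not limited to the Grafana and Prometheus UIs; they can also be sent to the Prometheus HTTP API at /api/v1/query. The sketch below shows one way to do that with the Python standard library. It assumes Prometheus listens on http://localhost:9090 and that the query returns an instant vector; the function names and SAMPLE_RESPONSE are illustrative and the sample values are made up.

```python
import json
import urllib.parse
import urllib.request

def instant_query_url(base_url, promql):
    """Build the URL for an instant query, URL-encoding the PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})

def run_query(base_url, promql):
    """Send the query to a running Prometheus (not called in this sketch)."""
    with urllib.request.urlopen(instant_query_url(base_url, promql), timeout=5) as resp:
        return json.load(resp)

def values_by_instance(payload):
    """Extract {instance: value} from an instant-vector response."""
    return {
        item["metric"].get("instance", ""): float(item["value"][1])
        for item in payload["data"]["result"]
    }

# Made-up response in the shape returned for an instant vector.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "server-1:9100"}, "value": [1700000000.0, "12.5"]},
            {"metric": {"instance": "server-2:9100"}, "value": [1700000000.0, "91.0"]},
        ],
    },
}

print(values_by_instance(SAMPLE_RESPONSE))
```

Note that sample values arrive as strings in the JSON response, so they need to be converted with float() before any comparison.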

For server security, check our Server Hardening Checklist guide. For log management, see our ELK Stack guide. For infrastructure management with IaC, explore our Terraform guide. Prometheus Official Documentation and Grafana Documentation are valuable additional resources.

Frequently Asked Questions

How much disk space does Prometheus use?

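Disk usage depends on the number of time series, the scrape interval, and the retention period. Prometheus needs roughly 1-2 bytes per stored sample, so for an average setup (100 targets, 15s scrape interval, 15-day retention) approximately 10-20 GB of disk space is sufficient, assuming on the order of 1,000 series per target. Adjust retention with the --storage.tsdb.retention.time flag (for example --storage.tsdb.retention.time=15d, which is also the default).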
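
The estimate follows from the sizing rule in the Prometheus storage documentation: needed disk space = retention time in seconds x ingested samples per second x bytes per sample, with roughly 1-2 bytes per sample. Here is that arithmetic as a small sketch. The function name is made up, and the figure of 1,000 series per target is an assumption for illustration; check your own numbers, since the series count varies with the hardware and enabled collectors.

```python
def estimate_tsdb_bytes(targets, series_per_target, scrape_interval_s,
                        retention_days, bytes_per_sample):
    """Rough TSDB size: retention_seconds * samples_per_second * bytes_per_sample."""
    samples_per_second = targets * series_per_target / scrape_interval_s
    retention_seconds = retention_days * 86400
    return retention_seconds * samples_per_second * bytes_per_sample

# 100 targets, ~1,000 series each (assumed), 15 s interval, 15-day retention.
low = estimate_tsdb_bytes(100, 1000, 15, 15, bytes_per_sample=1)
high = estimate_tsdb_bytes(100, 1000, 15, 15, bytes_per_sample=2)
print(f"{low / 1e9:.1f} GB to {high / 1e9:.1f} GB")
```

For these inputs the result is about 8.6 GB to 17.3 GB, which is where the 10-20 GB guideline comes from. Halving the scrape interval or doubling the retention doubles the estimate.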

How do I import ready-made dashboards in Grafana?

You can import by ID from the Grafana.com dashboard marketplace. Dashboard 1860 is popular for Node Exporter. In the Grafana UI, go to Dashboards > Import > enter the ID and select the Prometheus data source.

How is Prometheus high availability (HA) achieved?

Configure two Prometheus instances to scrape the same targets. For long-term storage, use Thanos or Cortex. Alertmanager cluster mode provides alert deduplication.

What is the difference from push-based monitoring?

Prometheus is pull-based: it scrapes metrics from its targets over HTTP. In push-based systems (such as Datadog or InfluxDB), applications or agents send metrics to the backend instead. With the pull model, a failed scrape immediately shows that a target is unreachable (Prometheus records this in the up metric), and the list of targets is configured centrally in one place.
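
The health-detection benefit of the pull model is easy to see in a toy scrape loop. The following is only a conceptual sketch, not how Prometheus is implemented: scrape_once and http_fetch are hypothetical names, and the loop records 1 or 0 per target in the same spirit as the up metric.

```python
import urllib.request

def http_fetch(url):
    """Fetch a /metrics page; raises an OSError subclass on network failure."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read()

def scrape_once(targets, fetch=http_fetch):
    """Pull metrics from every target once and record whether each scrape worked."""
    up = {}
    for target in targets:
        try:
            fetch(f"http://{target}/metrics")
            up[target] = 1
        except OSError:
            # Refused connections, timeouts, and HTTP errors all land here.
            up[target] = 0
    return up
```

Calling scrape_once(["server-1:9100", "server-2:9100"]) returns a dictionary with 1 for reachable targets and 0 for the rest. In a push-based system, a silent target is ambiguous: the host may be down, or it may simply have nothing to report.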

Conclusion

Proactively monitor your infrastructure by setting up a powerful server monitoring stack with Prometheus and Grafana. Collect system metrics with Node Exporter, write meaningful queries with PromQL, visualize with Grafana dashboards, and detect issues early with alerting rules.
