
Setting Up a Server Monitoring Stack with Prometheus + Grafana
Merve Arslan
WordPress & Hosting Expert
Monitoring your servers' CPU, memory, disk, and network metrics in real time is the foundation of proactive issue detection. Prometheus handles metric collection and querying, while Grafana provides the visualization layer; together they form a widely used open-source monitoring stack. This guide walks you through setup, dashboard creation, alerting rules, and PromQL basics step by step.
Prometheus Architecture
Prometheus is a pull-based monitoring system. It scrapes metrics from exporters on target servers at regular intervals and stores them in a local TSDB (Time Series Database). The core components are:
| Component | Role | Port |
|---|---|---|
| Prometheus Server | Metric collection, storage, and querying | 9090 |
| Node Exporter | Server metrics (CPU, RAM, disk, network) | 9100 |
| Alertmanager | Alert management and notification delivery | 9093 |
| Grafana | Dashboards and visualization | 3000 |
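To make the pull model more concrete: on each scrape, Prometheus issues an HTTP GET against the exporter's /metrics endpoint and parses the plain-text samples it gets back. The following is a minimal Python sketch of that parsing step, not Prometheus's real parser. The SAMPLE payload and the parse_samples helper are illustrative assumptions, and the regular expressions only handle simple lines (no escaped label values, no timestamps).

```python
import re

# Illustrative payload, roughly what an exporter on port 9100 might return.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 234.5
"""

# metric_name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_samples(SAMPLE):
    print(name, labels, value)
```

Each parsed sample becomes one point in a time series identified by the metric name plus its label set, which is what the TSDB stores.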
Prometheus Configuration
The prometheus.yml file defines which targets Prometheus scrapes metrics from and how frequently it performs scraping.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "alert_rules.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "alertmanager:9093"
scrape_configs:
  # Prometheus monitors its own metrics
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  # Server metrics
  - job_name: "node"
    static_configs:
      - targets:
          - "server-1:9100"
          - "server-2:9100"
          - "server-3:9100"
        labels:
          env: production
Node Exporter Setup
Node Exporter is a lightweight exporter that collects CPU, memory, disk, and network metrics from Linux servers. It must be installed on every server you want to monitor.
# Download and install Node Exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
# Create the unprivileged user that the service runs as
sudo useradd --system --no-create-home --shell /bin/false node_exporter
# Create systemd service file (tee runs as root; with "sudo cat > file" the redirect would not)
sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<EOF
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter
Restart=always
[Install]
WantedBy=multi-user.target
EOF
# Start the service
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
# Verify metrics
curl -s http://localhost:9100/metrics | head -20
💡 Tip: If you use Docker, you can spin up the entire stack with docker-compose in a single command. Run Prometheus, Grafana, Node Exporter, and Alertmanager on the same network.
Grafana Dashboards and Alerting Rules
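Grafana visualizes the metrics stored in Prometheus. You can import ready-made dashboards or create your own. The rules below are Prometheus alerting rules, saved as the alert_rules.yml file referenced in prometheus.yml: Prometheus evaluates them, and Alertmanager delivers the notifications when critical thresholds are exceeded.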
groups:
  - name: server_alerts
    rules:
      # CPU usage above 85% for 5 minutes
      - alert: HighCpuUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage ({{ $labels.instance }})"
          description: "CPU usage is {{ $value }}% - above 85% for 5 minutes."
      # Memory usage above 90%
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Critical memory usage ({{ $labels.instance }})"
      # Disk usage above 90%
      - alert: DiskSpaceLow
        expr: (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 > 90
        for: 10m
        labels:
          severity: critical
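⚠️ Warning: Do not set the "for" duration too short in alerting rules. An alert only fires once its expression has stayed true for the whole duration, so a short value lets temporary spikes produce false positives. 5 minutes for CPU and 10 minutes for disk are reasonable starting values.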
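To see why the "for" duration filters out short spikes, here is a minimal Python sketch of how a rule moves from pending to firing. It is a simplification for illustration, not Prometheus's actual implementation. The alert_states helper and the 60-second evaluation interval are assumptions chosen to keep the example short (the configuration in this guide evaluates every 15 seconds).

```python
def alert_states(condition_results, eval_interval_s, for_s):
    """Return the alert state after each rule evaluation.

    condition_results: one boolean per evaluation, True when the rule
    expression returned a result (i.e. the threshold was exceeded).
    """
    states = []
    active_since = None  # time at which the condition first became true
    for i, is_true in enumerate(condition_results):
        now = i * eval_interval_s
        if not is_true:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_s else "pending")
    return states


# A short spike (two evaluations) followed by a sustained breach.
results = [True, True, False] + [True] * 6
for step, state in enumerate(alert_states(results, eval_interval_s=60, for_s=300)):
    print(f"t={step * 60:>3}s  {state}")
```

In this run the two-evaluation spike only ever reaches pending, and the alert fires on the last evaluation, once the condition has held for the full 300 seconds.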
PromQL Basics
PromQL (Prometheus Query Language) is a powerful language for querying and analyzing metrics. PromQL expressions are used in Grafana dashboards and alerting rules.
# CPU usage percentage (5-minute average)
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Available memory (GiB)
node_memory_MemAvailable_bytes / 1024 / 1024 / 1024
# Disk I/O throughput - bytes read/written per second
rate(node_disk_read_bytes_total[5m])
rate(node_disk_written_bytes_total[5m])
# Inbound network traffic on eth0 (MiB/s); adjust the device label to your interface name
rate(node_network_receive_bytes_total{device="eth0"}[5m]) / 1024 / 1024
# Uptime (days)
(time() - node_boot_time_seconds) / 86400
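The CPU query above is easier to read once you see what rate() does to a counter. The Python sketch below approximates it under stated assumptions: simple_rate and cpu_usage_percent are hypothetical helpers, the sample values are made up, and, unlike the real rate(), no extrapolation to the edges of the time window is performed.

```python
def simple_rate(samples):
    """Approximate PromQL rate(): per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    A value that goes down is treated as a counter reset to zero.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed in the window
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    return increase / elapsed


def cpu_usage_percent(idle_rates):
    """Mirror 100 - avg(rate(idle)) * 100 across the cores of one instance."""
    return 100 - (sum(idle_rates) / len(idle_rates)) * 100


# Made-up idle-time counters (seconds) for two cores, sampled 60 s apart.
core0 = [(0, 1000.0), (60, 1054.0)]  # idle for 54 of 60 seconds
core1 = [(0, 2000.0), (60, 2042.0)]  # idle for 42 of 60 seconds
idle = [simple_rate(core0), simple_rate(core1)]
print(f"CPU usage: {cpu_usage_percent(idle):.1f}%")
```

Because node_cpu_seconds_total counts seconds spent in each mode, the rate of the idle mode is the fraction of time a core was idle, and subtracting the average from 100 gives the busy percentage.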
For server security, check our Server Hardening Checklist guide. For log management, see our ELK Stack guide. For infrastructure management with IaC, explore our Terraform guide. Prometheus Official Documentation and Grafana Documentation are valuable additional resources.
Frequently Asked Questions
How much disk space does Prometheus use?
Disk usage depends on the number of active time series, the scrape interval, and the retention period. For an average setup (100 targets, 15s scrape interval, 15-day retention), approximately 10-20 GB of disk space is sufficient. Adjust retention with the --storage.tsdb.retention.time flag, for example --storage.tsdb.retention.time=15d (15 days is also the default).
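As a rough check on that figure, the Prometheus documentation gives a sizing rule of retention time in seconds, times ingested samples per second, times bytes per sample, with samples averaging about 1-2 bytes on disk. The Python sketch below applies that rule. The estimate_tsdb_bytes helper is hypothetical, and the value of 1,000 series per target is an assumption; the real number depends on each server's hardware and the collectors you enable.

```python
def estimate_tsdb_bytes(targets, series_per_target, scrape_interval_s,
                        retention_days, bytes_per_sample):
    """Estimate TSDB size: retention seconds * samples/second * bytes/sample."""
    samples_per_second = targets * series_per_target / scrape_interval_s
    retention_seconds = retention_days * 86400
    return retention_seconds * samples_per_second * bytes_per_sample


# Assumption: about 1,000 series per Node Exporter target.
low = estimate_tsdb_bytes(100, 1000, 15, 15, bytes_per_sample=1)
high = estimate_tsdb_bytes(100, 1000, 15, 15, bytes_per_sample=2)
print(f"Estimated size: {low / 1e9:.1f} to {high / 1e9:.1f} GB")
```

Under these assumptions the estimate comes out at roughly 8.6 to 17.3 GB, which is in line with the 10-20 GB range quoted above. Measure your own series count before provisioning disks.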
How do I import ready-made dashboards in Grafana?
You can import by ID from the Grafana.com dashboard marketplace. Dashboard 1860 is popular for Node Exporter. In the Grafana UI, go to Dashboards > Import > enter the ID and select the Prometheus data source.
How is Prometheus high availability (HA) achieved?
Configure two Prometheus instances to scrape the same targets. For long-term storage, use Thanos or Cortex. Alertmanager cluster mode provides alert deduplication.
What is the difference from push-based monitoring?
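Prometheus is pull-based: the server initiates every scrape and fetches metrics from its targets. In push-based systems (like Datadog, InfluxDB), the applications or agents send metrics to the backend themselves. Because Prometheus initiates each scrape, a failed scrape immediately reveals an unhealthy target (its up metric becomes 0), and the list of targets is managed centrally in the Prometheus configuration.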
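The health-detection point can be sketched in a few lines of Python. This is only an illustration of the idea, not how Prometheus is implemented: scrape_once and fake_fetch are hypothetical, and the fake fetcher stands in for a real HTTP request so the example runs without any servers.

```python
def scrape_once(targets, fetch):
    """Pull from each target; record up=1 on success and up=0 on failure."""
    up = {}
    for target in targets:
        try:
            fetch(target)
            up[target] = 1
        except OSError:  # includes connection refused and socket timeouts
            up[target] = 0
    return up


def fake_fetch(target):
    """Stand-in for an HTTP GET of http://<target>/metrics."""
    if target == "server-2:9100":
        raise ConnectionRefusedError("exporter is not running")
    return "node_load1 0.42\n"


targets = ["server-1:9100", "server-2:9100", "server-3:9100"]
print(scrape_once(targets, fake_fetch))
```

Because the monitoring side makes the request, a target that stops responding is noticed on the next scrape. In a push-based system, silence from an agent has to be detected separately.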
Conclusion
Proactively monitor your infrastructure by setting up a powerful server monitoring stack with Prometheus and Grafana. Collect system metrics with Node Exporter, write meaningful queries with PromQL, visualize with Grafana dashboards, and detect issues early with alerting rules.