Monitoring and Alerting

Stack Overview

| Component | Version | Namespace |
| --- | --- | --- |
| Prometheus Operator | v80.4.2 | `monitoring` |
| Grafana | (bundled) | `monitoring` |
| Portworx metrics | Integrated | `portworx` |
| Autopilot | Integrated | `portworx` |

Grafana Access

Grafana is accessible via ingress. Credentials are managed through Palette variables in the cluster profile.

TLS

The prometheus-operator manifest includes an `issuer-selfsigned` resource that generates the TLS certificate. Because the certificate is self-signed, browser trust warnings are expected.

Portworx Metrics

Portworx exports metrics directly to Prometheus when `exportMetrics: true` is set in the Portworx pack configuration.

Autopilot uses these Prometheus metrics to make automated storage scaling decisions (e.g., expanding volumes when usage exceeds thresholds).
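
As an illustration of such a threshold rule, the sketch below writes an `AutopilotRule` manifest that would grow a volume once it passes 80% usage. The rule name, the `app: example-vm-disk` selector label, and the `maxsize` value are hypothetical placeholders, and `px_volume_capacity_bytes` is assumed to be exported alongside `px_volume_usage_bytes`. The field layout follows the Autopilot CRD as commonly documented, so check it against the Autopilot version in the cluster before applying.

```bash
# Write a sketch AutopilotRule to a local file for review (nothing is applied yet).
cat > autopilot-volume-resize.yaml <<'EOF'
apiVersion: autopilot.libopenstorage.org/v1alpha1
kind: AutopilotRule
metadata:
  name: volume-resize-example        # hypothetical name
spec:
  selector:
    matchLabels:
      app: example-vm-disk           # hypothetical PVC label
  conditions:
    expressions:
      # Percentage of the volume in use, computed from the Portworx metrics
      - key: "100 * (px_volume_usage_bytes / px_volume_capacity_bytes)"
        operator: Gt
        values:
          - "80"
  actions:
    - name: openstorage.io.action.volume/resize
      params:
        scalepercentage: "100"       # grow by 100% of the current size
        maxsize: "400Gi"             # hypothetical upper bound
EOF

# After review: kubectl apply -f autopilot-volume-resize.yaml
```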

Key Metrics

VM Status

Track the phase distribution of all VMIs:

kubevirt_vmi_phase_count

Filter by specific phase:

kubevirt_vmi_phase_count{phase="Running"}

Migration Performance

Measure time from migration creation to completion:

kubevirt_vmi_migration_phase_transition_time_from_creation_seconds

Track the number of in-flight migrations in each phase:

kubevirt_vmi_migrations_in_pending_phase
kubevirt_vmi_migrations_in_scheduling_phase
kubevirt_vmi_migrations_in_running_phase

Storage

Portworx cluster available disk space:

px_cluster_disk_available_bytes

Per-volume usage:

px_volume_usage_bytes

Storage Alerts

Set alerts for when `px_cluster_disk_available_bytes` drops below 20% of total capacity or when an individual volume exceeds 80% usage. Autopilot can expand volumes automatically, but its actions should still be monitored.
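
One way to express these two thresholds is a `PrometheusRule` resource, sketched below. The rule and alert names, the `for: 10m` delay, and the `severity` label are illustrative choices. The denominators `px_cluster_disk_total_bytes` and `px_volume_capacity_bytes` are assumed to be exported with the same labels as the metrics above; confirm this in Prometheus first. Many Prometheus Operator installs only load rules that carry a specific metadata label, so the manifest may need one added to match the local rule selector.

```bash
# Write a sketch PrometheusRule to a local file for review (nothing is applied yet).
cat > px-storage-alerts.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: px-storage-alerts            # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: portworx-storage
      rules:
        - alert: PxClusterDiskSpaceLow
          # Less than 20% of total capacity is still available
          expr: px_cluster_disk_available_bytes / px_cluster_disk_total_bytes < 0.20
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Portworx available capacity is below 20%"
        - alert: PxVolumeUsageHigh
          # More than 80% of a volume's capacity is in use
          expr: px_volume_usage_bytes / px_volume_capacity_bytes > 0.80
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Portworx volume usage is above 80%"
EOF

# After review: kubectl apply -f px-storage-alerts.yaml
```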

Node Health

Check which nodes report a Ready condition:

kube_node_status_condition{condition="Ready", status="true"}

Detect nodes with memory or disk pressure:

kube_node_status_condition{condition="MemoryPressure", status="true"}
kube_node_status_condition{condition="DiskPressure", status="true"}
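
The queries in this section can also be run from a shell through the Prometheus HTTP API. The helper below is a minimal sketch: `prom_query` and `PROM_URL` are hypothetical names, and it assumes Prometheus is reachable on `localhost:9090` through the port-forward described under Checking Prometheus Targets.

```bash
# Assumed address of the port-forwarded Prometheus service
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run one instant PromQL query and print the raw JSON response.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Example calls (these need the port-forward to be running):
#   prom_query 'sum by (phase) (kubevirt_vmi_phase_count)'
#   prom_query 'kube_node_status_condition{condition="Ready", status="true"} == 0'
```

The second example returns only nodes whose Ready condition is not currently true, which is usually the more useful form for a health check.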

Useful Grafana Dashboards

The Prometheus Operator deployment includes several pre-configured dashboards:

Kubernetes / Compute Resources / Cluster - Overall cluster CPU and memory usage.
Kubernetes / Compute Resources / Node - Per-node resource consumption.
Node Exporter / Nodes - Hardware-level metrics (disk I/O, network, CPU).

Tip

Import the KubeVirt Grafana dashboard for VM-specific metrics visualization. Search for "KubeVirt" in the Grafana dashboard marketplace.

Checking Prometheus Targets

Verify all expected targets are being scraped:

```bash copy
kubectl port-forward svc/prometheus-operated 9090:9090 -n monitoring
```

Then open `http://localhost:9090/targets` in a browser.
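
For a quick check without a browser, the same information is available from the Prometheus `/api/v1/targets` endpoint. The sketch below counts active targets by health; `target_health_summary` and `PROM_URL` are hypothetical names, and the `grep` pattern assumes the compact JSON that the API normally returns (a JSON-aware tool such as `jq` would be more robust).

```bash
# Assumed address of the port-forwarded Prometheus service
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Count active scrape targets by health value ("up", "down", "unknown").
target_health_summary() {
  curl -s "${PROM_URL}/api/v1/targets?state=active" \
    | grep -o '"health":"[a-z]*"' \
    | sort | uniq -c
}

# Example call (needs the port-forward to be running):
#   target_health_summary
```

Any count reported next to `"health":"down"` points to a target worth inspecting on the targets page.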

---

## Alertmanager

Check active alerts:

```bash copy
kubectl port-forward svc/alertmanager-operated 9093:9093 -n monitoring
```

Then open `http://localhost:9093/#/alerts` in a browser.

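List the configured alerting rules via CLI (these are rule definitions, not currently firing alerts):

```bash copy
kubectl get prometheusrules -n monitoring
```

Show the alert names and expressions defined in those rules:

```bash copy
kubectl get prometheusrules -n monitoring -o yaml | grep -A 5 "alert:"
```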
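
The `kubectl get prometheusrules` commands return rule definitions. To see alerts that are firing right now, the Alertmanager v2 HTTP API can be queried instead. This is a minimal sketch: `firing_alerts` and `AM_URL` are hypothetical names, Alertmanager is assumed to be port-forwarded to `localhost:9093` as above, and the `grep` pattern assumes compact JSON output.

```bash
# Assumed address of the port-forwarded Alertmanager service
AM_URL="${AM_URL:-http://localhost:9093}"

# Print each active alert name with the number of firing instances,
# excluding alerts that are silenced or inhibited.
firing_alerts() {
  curl -s "${AM_URL}/api/v2/alerts?active=true&silenced=false&inhibited=false" \
    | grep -o '"alertname":"[^"]*"' \
    | sort | uniq -c
}

# Example call (needs the port-forward to be running):
#   firing_alerts
```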