VMO Profile Upgrade Runbook

This runbook covers upgrading Spectro Cloud Palette VMO (KubeVirt) profiles on edge bare metal clusters. It is based on real experience upgrading from VMO 1.7.1 to 1.8.0.

Minor Version Skips Will Break Live Migrations

KubeVirt does not support skipping minor versions during live migration. If you jump from v1.5.0 to v1.7.0, all automatic VM migrations triggered by the workload-update-controller will fail due to gRPC incompatibility. Always upgrade one minor version at a time.
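The one-minor-at-a-time rule is easy to enforce with a pre-flight check. A sketch (the version strings below are examples, not read from the cluster):

```shell
# Pre-flight sketch: detect an upgrade that skips a KubeVirt minor version.
# The version strings are examples; read the real ones from the cluster and
# the target profile before upgrading.
current="1.5.0"
target="1.7.0"

cur_minor=$(echo "$current" | cut -d. -f2)
tgt_minor=$(echo "$target" | cut -d. -f2)

if [ $((tgt_minor - cur_minor)) -gt 1 ]; then
  result="skip-detected"
else
  result="ok"
fi
echo "$result: $current -> $target"
```

If the check prints skip-detected, plan intermediate upgrades instead of proceeding.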

Upgrade Flow

flowchart TD
    A[Download fresh kubeconfig] --> B[Identify current profile versions]
    B --> C[Diff deployed profiles against new RA defaults]
    C --> D[Document customizations to preserve]
    D --> E[Build API payload with new packs + customizations]
    E --> F[PUT profile draft via API]
    F --> G[PATCH to publish profile]
    G --> H[Monitor cluster reconciliation]
    H --> I{VMs migrating?}
    I -->|Success| J[Verify upgrade]
    I -->|Failure| K[Cancel stuck migrations]
    K --> L[Restart affected VMs]
    L --> J

Pre-Upgrade

1. Download a Fresh Kubeconfig

Kubeconfigs from Palette expire. Download a new one from the Palette UI before starting.

export KUBECONFIG=~/Downloads/admin.edge-vmo.kubeconfig
kubectl get nodes  # verify connectivity

2. Identify Current Deployed Profile Versions

Check the currently deployed profile versions via the Palette API or UI. You need to know what you are upgrading from to detect customizations.

# Palette API - list cluster profiles
PALETTE_API_KEY=$(op read "op://k8s/Mouser Palette API Key/password")

curl -s -H "ApiKey: ${PALETTE_API_KEY}" \
  -H "ProjectUid: 69791bbdb68bd0033ad275a5" \
  "https://api.spectrocloud.com/v1/clusterprofiles" | jq '.items[] | {name: .metadata.name, uid: .metadata.uid, version: .spec.version}'

3. Diff Deployed Profiles Against New RA Defaults

Compare the currently deployed (customized) profiles against the new Reference Architecture defaults.

What to Look For

  • Container image version diffs -- Ignore these. Image tags update automatically with the pack version.
  • Configuration value diffs -- These matter. These are your customizations that must be carried forward into the new profile version.
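A minimal way to do the comparison is a plain unified diff of the pack values. The filenames and YAML contents below are illustrative placeholders; export the real deployed values and the new RA defaults first (via the Palette UI or API):

```shell
# Illustrative example: the RA default vs. a deployed customization.
cat > ra-default-values.yaml <<'EOF'
charts:
  metallb:
    autoAssign: true
EOF
cat > deployed-values.yaml <<'EOF'
charts:
  metallb:
    autoAssign: false
EOF

# Configuration value diffs like this one are customizations to carry forward.
diff -u ra-default-values.yaml deployed-values.yaml || true
```

Lines that differ only in image tags can be ignored; everything else goes on the customization list.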

4. Document All Customizations

Before touching anything, write down every customization. See Critical Customizations below for the known list.


Upgrade Process (Palette API)

API Authentication

export PALETTE_API_KEY=$(op read "op://k8s/Mouser Palette API Key/password")
export PALETTE_PROJECT_UID="69791bbdb68bd0033ad275a5"
export PALETTE_BASE="https://api.spectrocloud.com"

Step 1: Update the Profile Draft

Use PUT /v1/clusterprofiles/{uid} to update the profile. This writes to the draft only -- it does not apply to clusters yet.

{
  "metadata": {
    "name": "edge-vmo-core",
    "annotations": {},
    "labels": {}
  },
  "spec": {
    "type": "add-on",
    "cloudType": "all",
    "version": "1.8.0-pwx",
    "template": {
      "cloudType": "all",
      "type": "add-on",
      "packs": [
        // ... pack definitions with your customizations applied
      ],
      "packServerRefs": [
        // ... registry references
      ]
    }
  }
}

PROFILE_UID="<your-profile-uid>"

curl -X PUT \
  -H "ApiKey: ${PALETTE_API_KEY}" \
  -H "ProjectUid: ${PALETTE_PROJECT_UID}" \
  -H "Content-Type: application/json" \
  -d @profile-payload.json \
  "${PALETTE_BASE}/v1/clusterprofiles/${PROFILE_UID}"

Step 2: Publish the Profile

The PUT only stages the draft. You must publish it for clusters to pick up the change.

curl -X PATCH \
  -H "ApiKey: ${PALETTE_API_KEY}" \
  -H "ProjectUid: ${PALETTE_PROJECT_UID}" \
  "${PALETTE_BASE}/v1/clusterprofiles/${PROFILE_UID}/publish"

Cluster Reconciliation

After publishing, Palette will begin reconciling the cluster against the new profile. Monitor progress in the Palette UI under the cluster's Profile tab or via the cluster events API.


Critical Customizations to Carry Forward

These customizations have been identified from the edge-vmo deployment. If they are lost during the upgrade, the dependent workloads, storage, and networking described below will break.

Core (Add-On) Profile

| Pack | Setting | Required Value | Default Value | Why It Matters |
|---|---|---|---|---|
| MetalLB | autoAssign | false | true | Prevents MetalLB from assigning IPs to services that should not get one |
| nginx-ingress | loadBalancerIP | 10.0.12.36 | "" | Pins the ingress to a known IP for DNS records |
| VMO | additionalFeatureGates | Must include SMM | Not included | Required for VMs migrated from vSphere with EFI SecureBoot (Forklift migrations) |
| prometheus-operator | manifest issuer-selfsigned | Present | Not present | Self-signed cert issuer for internal TLS |
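For example, the MetalLB autoAssign customization is carried on the address pool resource. A hedged sketch (the pool name and address range are illustrative, not the deployed values):

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool          # illustrative name
  namespace: metallb-system
spec:
  addresses:
    - 10.0.12.30-10.0.12.40   # illustrative range; use the site's real range
  autoAssign: false           # the customization: no automatic IP assignment
```

With autoAssign disabled, services that need a LoadBalancer IP must request one explicitly rather than being assigned one automatically.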

Infrastructure Profile

| Pack | Setting | Required Value | Why It Matters |
|---|---|---|---|
| Cilium | clusterPoolIPv4PodCIDRList | Palette variable ({{ ... }}) | Must use the Palette-managed CIDR variable, not a hardcoded value |
| Portworx | StorageClass name | portworx-block | VMs and PVCs reference this name |
| Portworx | StorageClass parameters | Custom parameters for block storage | Tuned for KubeVirt workloads |
| Portworx | KubeVirt annotation | Present on StorageClass | Enables CDI/KubeVirt integration |
| Portworx | Node affinity | Configured | Controls which nodes run Portworx |
| Portworx | run-on-master annotation | Present | Allows Portworx on control plane nodes |
| Portworx | PVC controller ports | Custom | Avoids port conflicts |
| Portworx | activateLicense | false | Prevents license activation on non-production clusters |

Double-Check VMO Feature Gates

If the SMM feature gate is missing after upgrade, any VM with EFI SecureBoot enabled will fail to start. This is especially critical for VMs migrated from vSphere via Forklift where SecureBoot was enabled on the source.
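In the raw KubeVirt CR, feature gates live under spec.configuration.developerConfiguration; in the VMO pack values the key is additionalFeatureGates, so the exact path in the profile YAML may differ. A sketch of the CR form:

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    developerConfiguration:
      featureGates:
        # keep whatever gates are already set, and make sure SMM is among them
        - SMM
```

After the upgrade, confirm the gate survived with kubectl get kubevirt -n kubevirt -o yaml and check the featureGates list.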


Post-Upgrade: VM Migration Issue

After upgrading the VMO pack, the workload-update-controller automatically attempts to live-migrate all running VMs to new virt-launcher pods that match the updated KubeVirt version.

When Migrations Fail

If the upgrade skips a KubeVirt minor version (e.g., v1.5.0 to v1.7.0), live migrations will fail due to gRPC API incompatibility between the old and new virt-launcher versions.

Symptoms

In VMI events / migration objects:

  • internal error: client socket is closed on MigrateToURI3
  • vif-cache-pod*.json: no such file or directory (secondary -- caused by stale network state)

In virt-handler logs:

  • GetDomainDirtyRateStats: Unimplemented -- the old virt-launcher does not implement a gRPC method introduced in a version that was skipped
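To pull the failed migrations out of the noise, a small jq filter over the migration objects helps (a sketch; pipe `kubectl get vmim -A -o json` into it on the cluster):

```shell
# Prints "namespace/name" for every failed VirtualMachineInstanceMigration.
failed_migrations() {
  jq -r '.items[]
    | select(.status.phase == "Failed")
    | .metadata.namespace + "/" + .metadata.name'
}

# Usage on a live cluster:
# kubectl get vmim -A -o json | failed_migrations
```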

Resolution

# 1. List VMs with outdated launcher images (these are the ones stuck)
kubectl get vmi -l kubevirt.io/outdatedLauncherImage --all-namespaces

# 2. Cancel any stuck migrations
virtctl migrate-cancel <vm-name> -n virtual-machines

# 3. Restart the VM (causes brief downtime -- the VM will boot on the new virt-launcher)
virtctl restart <vm-name> -n virtual-machines

# 4. Repeat for all affected VMs
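Step 4 can be scripted. A sketch that only wraps the two virtctl calls above in a loop; feed it the list from step 1:

```shell
# Cancels any in-flight migration, then restarts each VMI given as "namespace/name".
restart_outdated() {
  for ns_name in "$@"; do
    ns=${ns_name%%/*}
    name=${ns_name##*/}
    virtctl migrate-cancel "$name" -n "$ns" || true   # migration may already be gone
    virtctl restart "$name" -n "$ns"                  # brief downtime per VM
  done
}

# Usage against a live cluster:
# restart_outdated $(kubectl get vmi -l kubevirt.io/outdatedLauncherImage -A \
#   -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}')
```

Because each restart incurs downtime, consider running it one namespace at a time for production workloads.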

VM Restart = Brief Downtime

virtctl restart performs a graceful shutdown and reboot of the guest. The VM will be unavailable for the duration of the shutdown and boot cycle. Plan accordingly for production workloads.

Prevention

Always step through each KubeVirt minor version sequentially. For example, to go from 1.5.0 to 1.8.0:

1.5.0 -> 1.6.0 -> 1.7.0 -> 1.8.0

Each step allows live migration to succeed because adjacent minor versions keep a compatible gRPC interface between the old and new virt-launcher pods.

Configure the workloadUpdateStrategy in the KubeVirt CR to include Evict as a fallback. This allows the controller to shut down and restart VMs that cannot be live-migrated, rather than leaving them stuck.

spec:
  workloadUpdateStrategy:
    workloadUpdateMethods:
      - LiveMigrate
      - Evict

Verifying the Upgrade

Run through each of these checks after the upgrade is complete and cluster reconciliation has finished.

Profile Version

Confirm the cluster is running the expected profile version in the Palette UI, or via API:

curl -s -H "ApiKey: ${PALETTE_API_KEY}" \
  -H "ProjectUid: ${PALETTE_PROJECT_UID}" \
  "${PALETTE_BASE}/v1/spectroclusters/<cluster-uid>/profiles" | jq '.profiles[] | {name: .metadata.name, version: .spec.version}'

KubeVirt Components

# KubeVirt operator and components should be running the new version
kubectl get kubevirt -n kubevirt -o jsonpath='{.items[0].status.observedKubeVirtVersion}'

# All KubeVirt pods should be Running and Ready
kubectl get pods -n kubevirt

# CDI should also be healthy
kubectl get pods -n cdi

VM Health

# All VMIs should be in Running phase
kubectl get vmi --all-namespaces

# No VMIs should have outdated launcher images
kubectl get vmi -l kubevirt.io/outdatedLauncherImage --all-namespaces
# Expected: "No resources found"

# No stuck migrations
kubectl get vmim --all-namespaces
# Any completed migrations should show Succeeded, none should be in Running/Pending

Storage

# Portworx should be healthy
kubectl get storagecluster -n portworx -o jsonpath='{.items[0].status.phase}'

# Storage class should exist with correct name and annotations
kubectl get sc portworx-block -o yaml | grep -A5 annotations

Networking

# MetalLB should not have auto-assigned IPs to unintended services
kubectl get svc --all-namespaces -o wide | grep LoadBalancer

# Ingress should be on the expected IP
kubectl get svc -n ingress-nginx -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}'
# Expected: 10.0.12.36

# Multus and secondary networks should be intact
kubectl get net-attach-def --all-namespaces

Cluster Health

# All nodes Ready
kubectl get nodes

# No unexpected pod failures
kubectl get pods --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded

# Palette agent healthy
kubectl get pods -n cluster-{{ cluster-uid }} 2>/dev/null || kubectl get pods --all-namespaces | grep palette

Upgrade Complete

If all checks pass, the upgrade is complete. Update the profile version in any tracking documentation and note any new customizations that were added during the upgrade.