# VMO Profile Upgrade Runbook
This runbook covers upgrading Spectro Cloud Palette VMO (KubeVirt) profiles on edge bare metal clusters. It is based on real experience upgrading from VMO 1.7.1 to 1.8.0.
> **Minor Version Skips Will Break Live Migrations**
>
> KubeVirt does not support skipping minor versions during live migration. If you jump from v1.5.0 to v1.7.0, all automatic VM migrations triggered by the workload-update-controller will fail due to gRPC incompatibility. Always upgrade one minor version at a time.
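As a quick pre-flight guard, the minor-version gap between the deployed and target releases can be computed before planning the path. This is a minimal sketch; the `minor_gap` helper and the example version strings are illustrative, not part of any tooling:

```shell
# Sketch: compute how many minor versions apart two KubeVirt releases are.
# minor_gap is an illustrative helper; the versions below are examples.
minor_gap() {
  from_minor=$(echo "$1" | cut -d. -f2)
  to_minor=$(echo "$2" | cut -d. -f2)
  echo $(( to_minor - from_minor ))
}

gap=$(minor_gap "v1.5.0" "v1.7.0")
if [ "$gap" -gt 1 ]; then
  echo "unsafe: ${gap} minor versions apart -- step through each release"
else
  echo "ok: single minor step"
fi
```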
## Upgrade Flow

```mermaid
flowchart TD
    A[Download fresh kubeconfig] --> B[Identify current profile versions]
    B --> C[Diff deployed profiles against new RA defaults]
    C --> D[Document customizations to preserve]
    D --> E[Build API payload with new packs + customizations]
    E --> F[PUT profile draft via API]
    F --> G[PATCH to publish profile]
    G --> H[Monitor cluster reconciliation]
    H --> I{VMs migrating?}
    I -->|Success| J[Verify upgrade]
    I -->|Failure| K[Cancel stuck migrations]
    K --> L[Restart affected VMs]
    L --> J
```
## Pre-Upgrade

### 1. Download a Fresh Kubeconfig
Kubeconfigs from Palette expire. Download a new one from the Palette UI before starting.
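It is worth confirming that the new kubeconfig actually authenticates before any upgrade work starts. A minimal sketch; the `check_kubeconfig` helper and the download path are hypothetical:

```shell
# Sketch: sanity-check a freshly downloaded kubeconfig before starting.
# check_kubeconfig is an illustrative helper; the path is hypothetical --
# use wherever the Palette UI download landed.
check_kubeconfig() {
  KUBECONFIG="$1" kubectl get nodes >/dev/null 2>&1 \
    && echo "kubeconfig valid" \
    || echo "kubeconfig expired or unreachable -- download a fresh one"
}

check_kubeconfig "${HOME}/Downloads/edge-vmo.kubeconfig"
```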
### 2. Identify Current Deployed Profile Versions
Check the currently deployed profile versions via the Palette API or UI. You need to know what you are upgrading from to detect customizations.
```shell
# Palette API - list cluster profiles
PALETTE_API_KEY=$(op read "op://k8s/Mouser Palette API Key/password")
curl -s -H "ApiKey: ${PALETTE_API_KEY}" \
  -H "ProjectUid: 69791bbdb68bd0033ad275a5" \
  "https://api.spectrocloud.com/v1/clusterprofiles" | jq '.items[] | {name: .metadata.name, uid: .metadata.uid, version: .spec.version}'
```
### 3. Diff Deployed Profiles Against New RA Defaults
Compare the currently deployed (customized) profiles against the new Reference Architecture defaults.
#### What to Look For
- Container image version diffs -- Ignore these. Image tags update automatically with the pack version.
- Configuration value diffs -- These matter. These are your customizations that must be carried forward into the new profile version.
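One way to separate real customizations from image-tag churn is to filter the unified diff down to configuration lines. A minimal sketch; the file names (`ra-defaults.yaml`, `deployed-values.yaml`), the sample values, and the `profile_diff` helper are all hypothetical:

```shell
# Sketch: filter a profile diff down to configuration changes, dropping the
# image lines that churn with every pack version. Files and values are
# illustrative stand-ins for exported pack values.
cat > ra-defaults.yaml <<'EOF'
image: nginx:1.25
autoAssign: true
EOF
cat > deployed-values.yaml <<'EOF'
image: nginx:1.24
autoAssign: false
EOF

profile_diff() {
  # keep only +/- change lines, drop diff headers and image-tag noise
  diff -u "$1" "$2" | grep -E '^[+-]' | grep -vE '^[+-]{3}|image:'
}

profile_diff ra-defaults.yaml deployed-values.yaml
# only the configuration customization survives the filter:
#   -autoAssign: true
#   +autoAssign: false
```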
### 4. Document All Customizations
Before touching anything, write down every customization. See Critical Customizations below for the known list.
## Upgrade Process (Palette API)

### API Authentication

```shell
export PALETTE_API_KEY=$(op read "op://k8s/Mouser Palette API Key/password")
export PALETTE_PROJECT_UID="69791bbdb68bd0033ad275a5"
export PALETTE_BASE="https://api.spectrocloud.com"
```
### Step 1: Update the Profile Draft

Use `PUT /v1/clusterprofiles/{uid}` to update the profile. This writes to the draft only -- it does not apply to clusters yet.
```json
{
  "metadata": {
    "name": "edge-vmo-core",
    "annotations": {},
    "labels": {}
  },
  "spec": {
    "type": "add-on",
    "cloudType": "all",
    "version": "1.8.0-pwx",
    "template": {
      "cloudType": "all",
      "type": "add-on",
      "packs": [
        // ... pack definitions with your customizations applied
      ],
      "packServerRefs": [
        // ... registry references
      ]
    }
  }
}
```
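The draft update itself is a single `PUT`. A sketch, assuming the payload above is saved as `profile.json`; `PROFILE_UID` is a placeholder for the uid found during pre-upgrade step 2:

```shell
# Sketch: stage the profile draft. PROFILE_UID is a placeholder, and
# profile.json is assumed to hold the payload above with customizations merged.
PROFILE_UID="example-profile-uid"
url="${PALETTE_BASE}/v1/clusterprofiles/${PROFILE_UID}"

# PUT stages the draft only; nothing reaches clusters until the publish step.
if [ -f profile.json ]; then
  curl -s -X PUT \
    -H "Content-Type: application/json" \
    -H "ApiKey: ${PALETTE_API_KEY}" \
    -H "ProjectUid: ${PALETTE_PROJECT_UID}" \
    -d @profile.json \
    "$url"
fi
```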
### Step 2: Publish the Profile

The PUT only stages the draft. You must publish it for clusters to pick up the change.

```shell
curl -X PATCH \
  -H "ApiKey: ${PALETTE_API_KEY}" \
  -H "ProjectUid: ${PALETTE_PROJECT_UID}" \
  "${PALETTE_BASE}/v1/clusterprofiles/${PROFILE_UID}/publish"
```
## Cluster Reconciliation
After publishing, Palette will begin reconciling the cluster against the new profile. Monitor progress in the Palette UI under the cluster's Profile tab or via the cluster events API.
## Critical Customizations to Carry Forward
These customizations have been identified from the edge-vmo deployment. If they are lost during upgrade, things will break.
### Core (Add-On) Profile

| Pack | Setting | Required Value | Default Value | Why It Matters |
|---|---|---|---|---|
| MetalLB | `autoAssign` | `false` | `true` | Prevents MetalLB from assigning IPs to services that should not get one |
| nginx-ingress | `loadBalancerIP` | `10.0.12.36` | `""` | Pins the ingress to a known IP for DNS records |
| VMO | `additionalFeatureGates` | Must include `SMM` | Not included | Required for VMs migrated from vSphere with EFI SecureBoot (Forklift migrations) |
| prometheus-operator | manifest `issuer-selfsigned` | Present | Not present | Self-signed cert issuer for internal TLS |
### Infrastructure Profile

| Pack | Setting | Required Value | Why It Matters |
|---|---|---|---|
| Cilium | `clusterPoolIPv4PodCIDRList` | Palette variable (`{{ ... }}`) | Must use the Palette-managed CIDR variable, not a hardcoded value |
| Portworx | StorageClass name | `portworx-block` | VMs and PVCs reference this name |
| Portworx | StorageClass parameters | Custom parameters for block storage | Tuned for KubeVirt workloads |
| Portworx | KubeVirt annotation | Present on StorageClass | Enables CDI/KubeVirt integration |
| Portworx | Node affinity | Configured | Controls which nodes run Portworx |
| Portworx | `run-on-master` annotation | Present | Allows Portworx on control plane nodes |
| Portworx | PVC controller ports | Custom | Avoids port conflicts |
| Portworx | `activateLicense` | `false` | Prevents license activation on non-production clusters |
> **Double-Check VMO Feature Gates**
>
> If the `SMM` feature gate is missing after upgrade, any VM with EFI SecureBoot enabled will fail to start. This is especially critical for VMs migrated from vSphere via Forklift where SecureBoot was enabled on the source.
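After reconciliation, the feature gates can be read directly off the KubeVirt CR. A sketch: the jsonpath assumes the standard KubeVirt CR layout (`spec.configuration.developerConfiguration.featureGates`), and `check_smm` is an illustrative helper, not an existing tool:

```shell
# Sketch: confirm the SMM feature gate survived the upgrade.
# check_smm is an illustrative helper; the jsonpath assumes the standard
# KubeVirt CR layout.
check_smm() {
  case "$1" in
    *SMM*) echo "SMM enabled" ;;
    *)     echo "SMM MISSING -- SecureBoot VMs will fail to start" ;;
  esac
}

gates=$(kubectl get kubevirt -n kubevirt \
  -o jsonpath='{.items[0].spec.configuration.developerConfiguration.featureGates}' 2>/dev/null)
check_smm "$gates"
```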
## Post-Upgrade: VM Migration Issue
After upgrading the VMO pack, the workload-update-controller automatically attempts to live-migrate all running VMs to new virt-launcher pods that match the updated KubeVirt version.
### When Migrations Fail
If the upgrade skips a KubeVirt minor version (e.g., v1.5.0 to v1.7.0), live migrations will fail due to gRPC API incompatibility between the old and new virt-launcher versions.
### Symptoms

In VMI events / migration objects:

- `internal error: client socket is closed` on `MigrateToURI3`
- `vif-cache-pod*.json: no such file or directory` (secondary -- caused by stale network state)

In virt-handler logs:

- `GetDomainDirtyRateStats: Unimplemented` -- the old virt-launcher does not implement a gRPC method introduced in a version that was skipped
### Resolution

```shell
# 1. List VMs with outdated launcher images (these are the ones stuck)
kubectl get vmi -l kubevirt.io/outdatedLauncherImage --all-namespaces

# 2. Cancel any stuck migrations
virtctl migrate-cancel <vm-name> -n virtual-machines

# 3. Restart the VM (causes brief downtime -- the VM will boot on the new virt-launcher)
virtctl restart <vm-name> -n virtual-machines

# 4. Repeat for all affected VMs
```
> **VM Restart = Brief Downtime**
>
> `virtctl restart` performs a graceful shutdown and reboot of the guest. The VM will be unavailable for the duration of the shutdown and boot cycle. Plan accordingly for production workloads.
### Prevention

Always step through each KubeVirt minor version sequentially. For example, to go from 1.5.0 to 1.8.0, upgrade 1.5.0 → 1.6.0 → 1.7.0 → 1.8.0, letting each round of live migrations finish before starting the next. Each step allows live migration to succeed because the gRPC interface only changes by one version.
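A planned path can be sanity-checked before any profile is touched. This is a sketch; `path_ok` is an illustrative helper that only compares minor-version numbers between consecutive steps:

```shell
# Sketch: validate that a planned upgrade path never skips a minor version.
# path_ok is an illustrative helper, not part of any existing tooling.
path_ok() {
  prev=""
  for v in "$@"; do
    if [ -n "$prev" ]; then
      gap=$(( $(echo "$v" | cut -d. -f2) - $(echo "$prev" | cut -d. -f2) ))
      [ "$gap" -le 1 ] || { echo "bad step: ${prev} -> ${v}"; return 1; }
    fi
    prev="$v"
  done
  echo "path ok"
}

path_ok 1.5.0 1.6.0 1.7.0 1.8.0
```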
## Verifying the Upgrade

Run through each of these checks after the upgrade is complete and cluster reconciliation has finished.

### Profile Version

Confirm the cluster is running the expected profile version in the Palette UI, or via API:
```shell
curl -s -H "ApiKey: ${PALETTE_API_KEY}" \
  -H "ProjectUid: ${PALETTE_PROJECT_UID}" \
  "${PALETTE_BASE}/v1/spectroclusters/<cluster-uid>/profiles" | jq '.profiles[] | {name: .metadata.name, version: .spec.version}'
```
### KubeVirt Components

```shell
# KubeVirt operator and components should be running the new version
kubectl get kubevirt -n kubevirt -o jsonpath='{.items[0].status.observedKubeVirtVersion}'

# All KubeVirt pods should be Running and Ready
kubectl get pods -n kubevirt

# CDI should also be healthy
kubectl get pods -n cdi
```
### VM Health

```shell
# All VMIs should be in Running phase
kubectl get vmi --all-namespaces

# No VMIs should have outdated launcher images
kubectl get vmi -l kubevirt.io/outdatedLauncherImage --all-namespaces
# Expected: "No resources found"

# No stuck migrations
kubectl get vmim --all-namespaces
# Any completed migrations should show Succeeded, none should be in Running/Pending
```
### Storage

```shell
# Portworx should be healthy
kubectl get storagecluster -n portworx -o jsonpath='{.items[0].status.phase}'

# Storage class should exist with correct name and annotations
kubectl get sc portworx-block -o yaml | grep -A5 annotations
```
### Networking

```shell
# MetalLB should not have auto-assigned IPs to unintended services
kubectl get svc --all-namespaces -o wide | grep LoadBalancer

# Ingress should be on the expected IP
kubectl get svc -n ingress-nginx -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}'
# Expected: 10.0.12.36

# Multus and secondary networks should be intact
kubectl get net-attach-def --all-namespaces
```
### Cluster Health

```shell
# All nodes Ready
kubectl get nodes

# No unexpected pod failures
kubectl get pods --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded

# Palette agent healthy
kubectl get pods -n cluster-{{ cluster-uid }} 2>/dev/null || kubectl get pods --all-namespaces | grep palette
```
## Upgrade Complete
If all checks pass, the upgrade is complete. Update the profile version in any tracking documentation and note any new customizations that were added during the upgrade.