
NIP — why XCP-ng is our hypervisor, and why not VMware

Two years ago we picked XCP-ng over VMware. The result: roughly 7,000 EUR/year saved, a smaller attack surface, and in-house replacements for two of the three VMware features we lost.

Why XCP-ng, and not VMware

Nortinia's infrastructure runs roughly 100 VMs across several physical hosts, plus Kubernetes clusters and support services. The hypervisor choice was a deliberate call, not a default. Two years ago the question was: VMware vSphere, Proxmox VE, or XCP-ng? XCP-ng won. Here is why.

The cost math

VMware vSphere Standard in 2024 was roughly 400 EUR per socket per year. On a typical dual-socket Xeon host that's 800 EUR/year/host. Across six hosts: 4,800 EUR/year on hypervisor licensing alone, with vCenter on top. Since the Broadcom acquisition (closed in late 2023), per-core pricing and minimum commits have only made this worse.

XCP-ng: 0 EUR in licensing. Vates (the XCP-ng vendor) sells a Pro support tier (~1,250 EUR/year/host) which we don't need at our SLA — the Xen Project and Vates community are responsive enough that the one or two questions we have per month get answered there.

Net savings: ~4,800 EUR/year on licensing alone, plus the vCenter cost avoided (~2,000 EUR/year), for about 6,800 EUR/year in total.
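The arithmetic behind that figure can be written out as a short sketch. The numbers are the approximate ones quoted above; the constant and function names are ours, chosen for illustration only.

```python
# Approximate figures quoted in this article (EUR, per year).
VSPHERE_EUR_PER_SOCKET_YEAR = 400
SOCKETS_PER_HOST = 2
HOSTS = 6
VCENTER_EUR_YEAR = 2_000
XCPNG_LICENSE_EUR_YEAR = 0


def annual_savings(hosts: int = HOSTS, sockets_per_host: int = SOCKETS_PER_HOST) -> dict:
    """Return the yearly VMware cost avoided by running XCP-ng instead."""
    per_host = VSPHERE_EUR_PER_SOCKET_YEAR * sockets_per_host
    licensing = per_host * hosts
    total = licensing + VCENTER_EUR_YEAR - XCPNG_LICENSE_EUR_YEAR
    return {"per_host": per_host, "licensing": licensing, "total": total}


if __name__ == "__main__":
    print(annual_savings())
```

With the default values this gives 800 EUR per host, 4,800 EUR for licensing and 6,800 EUR in total, which is the figure we round to "~7,000" elsewhere in this post.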

License sovereignty

After the Broadcom acquisition we read multiple stories of perpetual licenses being converted to subscription-only, and of customers seeing 5x cost jumps overnight. That is a structural risk: if the vendor restructures tomorrow, our options are limited. With an open-source hypervisor that risk largely disappears. The Xen Project (and with it XCP-ng) lives under the Linux Foundation umbrella; Vates is the maintainer, but a community fork is always possible.

The Xen security model

Xen is a Type 1 (bare-metal) hypervisor: dom0 is the Linux management VM, the domUs are guests. Compared to KVM, Xen has a smaller attack surface: no in-kernel kvm module, hypervisor and host OS cleanly separated. AWS ran EC2 on Xen for years (now on Nitro) precisely for that isolation model.

In practice: over the last 5 years zero CVEs materialised that would have compromised a Nortinia production workload via Xen. (Two Xen security advisories did land — both patched in our weekly patch window.)

Snapshot performance

Xen xen-vbd snapshots are copy-on-write at the Storage Repository level. Snapshot of a 100 GB VM: ~2 seconds. Restore: ~5 minutes (SR-level copy-back). This made daily backup jobs (xe vm-export cron) trivial — under VMware in our environment a comparable snapshot was 8-15 seconds, restore 7-10 minutes.
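A daily backup job of this kind might look roughly like the sketch below. It is not our production script: the function names, the label format and the backup path layout are illustrative assumptions, and the exact xe parameters should be checked against the XCP-ng version in use. The command builders are kept separate from execution so they can be inspected without a host.

```python
import subprocess
from datetime import date
from typing import List


def snapshot_command(vm_name: str, day: date) -> List[str]:
    """Command that takes a copy-on-write snapshot and prints its UUID."""
    label = f"{vm_name}-backup-{day.isoformat()}"  # hypothetical naming scheme
    return ["xe", "vm-snapshot", f"vm={vm_name}", f"new-name-label={label}"]


def export_commands(snapshot_uuid: str, vm_name: str,
                    backup_dir: str, day: date) -> List[List[str]]:
    """Commands that export the snapshot to a file and then delete it."""
    target = f"{backup_dir}/{vm_name}-{day.isoformat()}.xva"
    return [
        # A snapshot is stored as a template; clear the flag before exporting.
        ["xe", "template-param-set", "is-a-template=false", f"uuid={snapshot_uuid}"],
        ["xe", "vm-export", f"vm={snapshot_uuid}", f"filename={target}"],
        # Remove the temporary snapshot once the export has finished.
        ["xe", "vm-uninstall", f"uuid={snapshot_uuid}", "force=true"],
    ]


def run_backup(vm_name: str, backup_dir: str) -> None:
    """Run the full sequence on a host where the xe CLI is available."""
    today = date.today()
    result = subprocess.run(snapshot_command(vm_name, today),
                            check=True, capture_output=True, text=True)
    snapshot_uuid = result.stdout.strip()
    for cmd in export_commands(snapshot_uuid, vm_name, backup_dir, today):
        subprocess.run(cmd, check=True)
```

A cron entry would then call `run_backup` once per VM. Exporting from a snapshot rather than from the running VM is what keeps the guest online during the backup.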

The 3 missing VMware features

What did we lose by leaving VMware? Three things:

  1. vMotion (live migration UI) — XCP-ng's xe vm-migrate works from the CLI, but vSphere's drag-and-drop UX is nicer. We built our own "Migrate VM" button in NIP that wraps xe vm-migrate. Solved.
  2. DRS (Distributed Resource Scheduler) — VMware auto-balances load across hosts. XCP-ng has nothing like it. We built nip-balancer, a 15-minute cron that watches host load and, when any host crosses 80%, proposes a migration plan to the on-call (not automatic — we don't want 3 a.m. surprises). Solved.
  3. Fault Tolerance (FT, lockstep) — sub-second mirror of a VM on another host, instant failover. We did not build this. Instead: critical workloads run on Kubernetes (replicas), where pod failover is built in. The one true single-instance VM (Postgres primary) gets HA via PG streaming replicas — not at the hypervisor layer.
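To make the first two items concrete, here is a minimal sketch of the balancing idea: find hosts above the 80% threshold and propose live migrations, each rendered as an `xe vm-migrate` command for a human to approve. This is not the actual nip-balancer code. The `Vm` and `Move` types, the greedy smallest-VM-first heuristic, and the assumption that all hosts have equal capacity are ours, made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

LOAD_THRESHOLD = 0.80  # from the text: react when a host crosses 80%


@dataclass(frozen=True)
class Vm:
    name: str
    load: float  # fraction of one host's capacity used by this VM (0.0 to 1.0)


@dataclass(frozen=True)
class Move:
    vm: str
    source: str
    target: str

    def xe_command(self) -> List[str]:
        """The live-migration command an operator would run after approval."""
        return ["xe", "vm-migrate", f"vm={self.vm}", f"host={self.target}", "live=true"]


def propose_plan(hosts: Dict[str, List[Vm]],
                 threshold: float = LOAD_THRESHOLD) -> List[Move]:
    """Greedy proposal: move the smallest VMs off overloaded hosts first."""
    # Track projected loads separately so the input mapping is not mutated.
    load = {h: sum(vm.load for vm in vms) for h, vms in hosts.items()}
    plan: List[Move] = []
    for source in sorted(hosts, key=lambda h: load[h], reverse=True):
        for vm in sorted(hosts[source], key=lambda v: v.load):
            if load[source] <= threshold:
                break  # this host is (now) below the threshold
            target = min((h for h in hosts if h != source),
                         key=lambda h: load[h], default=None)
            # Skip moves that would just push the target over the threshold.
            if target is None or load[target] + vm.load > threshold:
                continue
            plan.append(Move(vm.name, source, target))
            load[source] -= vm.load
            load[target] += vm.load
    return plan
```

The function only returns a plan and never runs anything, which mirrors the "propose, don't act" rule described above: the on-call sees the list of commands and decides whether to execute them.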

Why not Proxmox

Proxmox VE is also open source, KVM-based, with an excellent community. We did look. Two reasons we didn't pick it:

  • KVM vs Xen attack surface: a subjective call, but we consider Xen's attack surface the smaller of the two, and AWS's years of running EC2 on Xen are a precedent.
  • Storage integration story — XCP-ng pairs cleanly with XOSAN/XOSTOR (Vates' distributed storage), but we already had Ceph, and Xen's XAPI was easier to stitch into than Proxmox's pveproxy. Historical reason; today I'd probably re-evaluate.

What we would not change now

Two years in: the choice was right. The annual ~7,000 EUR saved (licensing + vCenter + support) is enough to fund one engineer-day per week if we ever needed deeper xe CLI expertise. We haven't.
