
EVPN in a Box Part 1: Running Nexus Dashboard and CML on a Single Proxmox Node


Part 1 of the EVPN in a Box series.

Note: This entire setup runs on Proxmox as a lab environment for testing and learning. Proxmox is not a supported platform for Nexus Dashboard or CML in production. Everything described here is strictly for lab use.

For years I ran all my Cisco lab appliances on VMware. Nexus Dashboard, CML, ISE, all on ESXi. It worked. Then one day the license expired and Broadcom made it clear that free ESXi was not coming back. Time to move.

Proxmox was the obvious choice. I had been running it in my home lab for other workloads already and was happy with it. So the plan was simple: install Proxmox on my ASUS NUC 14 Pro (NUC14RVS), download the images for Nexus Dashboard, CML, and Ubuntu, and build out a full VXLAN/EVPN lab from scratch. No manual clicking through NDFC wizards, no hand-configuring switches. I wanted to go from an empty Proxmox host to a complete VXLAN-as-code deployment, fully automated.

The box running all of this is an ASUS NUC 14 Pro (Revel Canyon) with an Intel Core Ultra 7 155H (16 cores, 22 threads, 4.8 GHz turbo), 96 GB DDR5-5600 RAM, 2x NVMe SSDs, 2.5G Ethernet, Thunderbolt 4, and Wi-Fi 6E. Tiny form factor, about the size of a paperback book. Runs the entire lab silently on my desk.

That was the plan. What actually happened involved two days of fighting with Nexus Dashboard's assumptions about its host environment, reverse-engineering an initrd, and writing an expect script to automate a wizard that refuses to remember anything.

Why Run ND and CML Together?

Nexus Dashboard is Cisco's management platform for data center fabrics. NDFC (Nexus Dashboard Fabric Controller) handles VXLAN/EVPN provisioning, monitoring, and lifecycle management. CML (Cisco Modeling Labs) runs virtual N9Kv switches that simulate a real fabric.

Putting both on the same Proxmox host gives you:

  • A completely self-contained EVPN fabric lab with centralized management
  • No dependency on physical Cisco hardware
  • Full API access for testing automation workflows (Ansible, Terraform, Python)
  • A portable environment you can snapshot, back up, and restore

The catch: Cisco never designed ND to run on Proxmox. Every assumption ND makes about its environment is based on ESXi or bare metal. Getting it to work required understanding ND internals at a level Cisco probably did not intend customers to reach.

The Architecture

+-------------------------------------------------------------------+
|                   PROXMOX HOST (192.168.1.158)                    |
|    ASUS NUC 14 Pro / Core Ultra 7 155H / 96 GB DDR5 / 2x NVMe     |
+-------------------------------------------------------------------+
|                                                                   |
|  +-----------------------------+  +-----------------------------+ |
|  |  VM 100: Nexus Dashboard    |  |  VM 101: CML 2.9.1          | |
|  |  ND 4.1.1g                  |  |                             | |
|  |  16 cores / 64 GB RAM       |  |  16 cores / 48 GB RAM       | |
|  |  2x 500 GB disks            |  |  200 GB disk                | |
|  |  4 NICs (mgmt/fabric)       |  |  Nested virtualization      | |
|  |  IP: 192.168.1.250          |  |  IP: 192.168.1.251          | |
|  |                             |  |                             | |
|  |  Hookscript:                |  |  Contains:                  | |
|  |  auto-bootstrap on boot     |  |  - N9Kv 9.2.3 image         | |
|  +-----------------------------+  |  - 5-node fabric topology   | |
|                                   +-----------------------------+ |
|                                                                   |
|  +-------------------------------------------------------------+  |
|  |                       vmbr0 (Bridge)                        |  |
|  |                     LAN: 192.168.1.0/24                     |  |
|  +-------------------------------------------------------------+  |
+-------------------------------------------------------------------+

VM 100 gets four virtio NICs on vmbr0. ND expects these to be named mgmt0, mgmt1, fabric0, and fabric1. VM 101 runs CML with nested virtualization enabled so it can run N9Kv instances inside KVM inside KVM.
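For reference, the relevant pieces of VM 100's Proxmox config look roughly like this. This is a sketch: the first MAC is the one used for mgmt0 later in this post; the other MACs, the storage name, and the disk layout are illustrative placeholders. The serial0 socket is what makes the console reachable via qm terminal later.

```
# /etc/pve/qemu-server/100.conf (excerpt, illustrative)
# net0..net3 become mgmt0, mgmt1, fabric0, fabric1 inside ND
cores: 16
memory: 65536
net0: virtio=BC:24:11:4A:1D:45,bridge=vmbr0
net1: virtio=BC:24:11:4A:1D:46,bridge=vmbr0
net2: virtio=BC:24:11:4A:1D:47,bridge=vmbr0
net3: virtio=BC:24:11:4A:1D:48,bridge=vmbr0
scsi0: local-lvm:vm-100-disk-0,size=500G
scsi1: local-lvm:vm-100-disk-1,size=500G
serial0: socket
hookscript: local:snippets/nd-hookscript.pl
```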

Problem 1: ND Interface Naming on Proxmox

What Went Wrong

On ESXi, ND's network interfaces show up as mgmt0, mgmt1, fabric0, fabric1. On Proxmox with virtio NICs, systemd's predictable naming kicks in and renames them to enp6s18, enp6s19, enp6s20, enp6s21.

ND's bootmgr binary hardcodes these interface names. When it looks for mgmt0, there is nothing. ND boots, the wizard runs, but networking never comes up. The web UI at 192.168.1.250 is completely unreachable.

Dead End: Rescue Mode

The obvious fix is systemd .link files that map MAC addresses to interface names. Boot into rescue mode (append rd.break to the kernel command line), create the files, reboot. Done, right?

Wrong. ND uses root=bootmgr in its GRUB config. The entire root filesystem is loaded from initrd.img into RAM. Rescue mode lets you edit the RAM-based filesystem. Reboot and everything you changed is gone.

The Fix: initrd Injection

The .link files need to live inside the initrd.img itself. Here is the process:

Stop the VM. Attach the boot disk via losetup. Mount the nd-boot partition (partition 2, ext4). The initrd.img is gzip-compressed and contains two concatenated cpio archives: an early cpio with CPU microcode (~19 MB) and the main rootfs.

bash
# Decompress; the stream contains two concatenated cpio archives
zcat /mnt/nd-boot/initrd.img > full.cpio
# Main rootfs starts at offset 19738112 in the decompressed stream
# (that offset is specific to this ND image build)
 
# Extract early cpio (CPU microcode -- keep as-is);
# head/tail are much faster here than dd with bs=1
head -c 19738112 full.cpio > early.cpio
 
# Extract the main rootfs into the current directory
tail -c +19738113 full.cpio | cpio -id

Create four .link files in etc/systemd/network/ that map each MAC address to the expected interface name:

ini
# etc/systemd/network/10-mgmt0.link
[Match]
MACAddress=bc:24:11:4a:1d:45
 
[Link]
Name=mgmt0
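If writing the four files by hand gets tedious, a loop can stamp them out. Only the mgmt0 MAC below comes from this lab; the other three are placeholders for whatever Proxmox assigned to your NICs (visible with qm config 100).

```shell
# Generate the four .link files; run from the root of the extracted rootfs
mkdir -p etc/systemd/network
i=0
for pair in "mgmt0 bc:24:11:4a:1d:45" "mgmt1 bc:24:11:4a:1d:46" \
            "fabric0 bc:24:11:4a:1d:47" "fabric1 bc:24:11:4a:1d:48"; do
    set -- $pair   # $1 = interface name, $2 = MAC address
    printf '[Match]\nMACAddress=%s\n\n[Link]\nName=%s\n' "$2" "$1" \
        > "etc/systemd/network/1${i}-${1}.link"
    i=$((i+1))
done
```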

Repeat for mgmt1, fabric0, fabric1 with their respective MACs. Then repack:

bash
# Repack; prune the working files so they don't end up inside the image
find . \( -name full.cpio -o -name early.cpio -o -name main.cpio \) -prune \
  -o -print0 | cpio --null -o -H newc > main.cpio
cat early.cpio main.cpio | gzip > initrd.img.new

Replace the original initrd.img on the nd-boot partition. Boot the VM. Now the interfaces are named correctly, bond1 forms from mgmt0+mgmt1, and ND gets its IP address.

Problem 2: The Wizard That Never Remembers

What Went Wrong

With networking fixed, ND boots, the wizard runs, you enter the admin password, IP, gateway, and cluster role. Five minutes later the web UI is live at 192.168.1.250. Success.

Then you reboot. The wizard runs again. Same questions. Same manual entry. Every time.

The root cause goes back to the RAM-based rootfs. Cloud-init seed data stored in /var/lib/cloud/seed/nd/ is lost on reboot. The bootmgr binary always checks /dev/sr0 and /dev/sr1 for a bootstrap disk. When it finds none, it falls back to the console wizard. Flags on the config LV that should mark the system as provisioned get cleared during the boot sequence.

Dead End: Bootstrap ISO

I created ISOs with cloud-init seed data containing the same configuration the wizard asks for. I tried volume labels cidata, config-2, and NDCONFIG, attached as an ide2 CD-ROM. Bootmgr still reported "Failed to find bootstrap disk" every time.

The bootstrap disk format is apparently Cisco-proprietary. No public documentation describes how to create one outside the official deployment tools.

The Fix: Auto-Bootstrap Hookscript

Since we cannot stop the wizard from running, we automate answering it. The solution has three components:

  • An expect script (nd-auto-bootstrap.exp) that connects to the VM serial console via qm terminal and answers each wizard prompt with the correct values.
  • A bash wrapper (nd-auto-bootstrap.sh) that waits for the VM to reach the running state, invokes the expect script, and logs everything to /var/log/nd-auto-bootstrap.log.
  • A Proxmox hookscript (nd-hookscript.pl), attached to VM 100 via qm set 100 --hookscript, that fires the bash wrapper in the background on every post-start event.

perl
#!/usr/bin/perl
use strict;
use warnings;

# Proxmox invokes hookscripts as: <script> <vmid> <phase>
my $vmid  = shift;
my $phase = shift;

if ($phase eq 'post-start') {
    # Fire and forget -- the wrapper handles waiting and logging
    system('nohup /usr/local/bin/nd-auto-bootstrap.sh > /dev/null 2>&1 &');
}
exit(0);

The result: start VM 100, walk away, come back five minutes later to a fully functional Nexus Dashboard. No manual interaction required.

Setting Up CML and the Fabric

CML Deployment

CML 2.9.1 runs as VM 101 with 16 cores, 48 GB RAM, and nested virtualization. The N9Kv 9.2.3 image was uploaded via the CML API:

bash
TOKEN=$(curl -sk -X POST "https://192.168.1.251/api/v0/authenticate" \
  -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"***"}' | tr -d '"')
 
curl -sk -X POST "https://192.168.1.251/api/v0/images/upload" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-original-file-name: nxosv.9.2.3.qcow2" \
  -H "X-File-Name: nxosv-9.2.3" \
  -F "file=@nxosv.9.2.3.qcow2"

Fabric Topology

The 5-node topology (compatible with the eval license; the unmanaged switch and external connector do not count against the node limit) consists of:

  • 2 spine switches (Spine-1, Spine-2)
  • 3 leaf switches (Leaf-1, Leaf-2, Leaf-3)
  • 1 management switch (unmanaged L2)
  • 1 external connector (bridged to 192.168.1.0/24)

Each spine connects to every leaf (full mesh). Each switch gets 2 vCPUs and 6 GB RAM. Total fabric footprint: 10 vCPUs, 30 GB RAM.

        Spine-1          Spine-2
       /   |   \        /   |   \
      /    |    \      /    |    \
  Leaf-1  Leaf-2  Leaf-3
     |       |       |
  [mgmt-sw connected to all via mgmt0]
     |
  [external connector -> bridge0 -> 192.168.1.0/24]

The initial NX-OS config on each switch enables BGP, NV overlay, and EVPN. Management interfaces are set to DHCP-capable with no shutdown. Once the fabric boots, NDFC on the Nexus Dashboard can discover all switches via their management IPs and provision the full VXLAN/EVPN underlay and overlay.
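As a sketch, the day-0 lines implied above look something like this on each N9Kv. This is my approximation, not the exact config from the lab; feature names and DHCP support on mgmt0 vary by NX-OS release, and NDFC pushes the real underlay/overlay configuration after discovery.

```
feature bgp
feature vn-segment-vlan-based
feature nv overlay
nv overlay evpn

interface mgmt0
  vrf member management
  ip address dhcp
  no shutdown
```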

Why This Matters

Running ND + CML on Proxmox means you can test NDFC fabric automation without a six-figure hardware budget. The entire lab fits on a single server. You can snapshot the VMs before destructive tests and roll back in seconds. The topology files are YAML that you can version-control and modify.

For anyone studying for DCCOR or DCACI, or just trying to learn VXLAN/EVPN with real tooling instead of Packet Tracer, this is a practical alternative.

Lessons Learned

ND on Proxmox is possible but unsupported. The two blockers are interface naming (fixable via initrd injection) and wizard persistence (fixable via hookscript automation). Neither is documented anywhere.

Understanding the initrd structure is critical. ND uses a concatenated cpio format with early microcode and a main rootfs. The second archive starts at a specific byte offset. Getting this wrong corrupts the boot.

Do not trust rescue mode for persistent changes on ND. The root=bootmgr design means the rootfs is always rebuilt from initrd on boot. Changes must go into the initrd itself.

The bootstrap ISO format is undocumented for external creation. If you find yourself trying to create one manually, save yourself the time and go with the hookscript approach instead.

Always check Proxmox loop devices before LVM operations. Proxmox thin-LVM auto-creates loop devices that can cause "duplicate PV" errors when you manually attach the same disk.
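One way to keep a manually attached copy of the ND disk from tripping duplicate-PV warnings is to filter loop devices out of LVM scans on the Proxmox host. This is my own workaround, not a Proxmox default, and it cuts both ways: skip it if you deliberately run LVM on loop devices.

```
# /etc/lvm/lvm.conf (excerpt)
devices {
    # Ignore loop devices during PV scans so a loop-attached copy of a
    # VM disk cannot surface as a duplicate PV
    global_filter = [ "r|/dev/loop.*|" ]
}
```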

Getting a full VXLAN/EVPN lab with Nexus Dashboard and CML running on a single Proxmox node took some reverse engineering that you will not find in any Cisco deployment guide. The interface naming fix required understanding how ND's initrd is structured and injecting systemd .link files at the right layer. The wizard persistence fix required accepting that you cannot prevent it and automating around it instead.

The end result is a fully self-contained fabric automation lab. ND boots, configures itself, and is ready for NDFC within five minutes. CML runs a 5-node N9Kv topology that NDFC can discover and manage. All on one box, all on Proxmox, all without a single piece of Cisco hardware.

If you are building a home lab for data center automation, this is one of the more ambitious setups you can attempt. It is not smooth. But it works.



About the Author

Chris Beye

Network automation enthusiast and technology explorer sharing practical insights on Cisco technologies, infrastructure automation, and home lab experiments. Passionate about making complex networking concepts accessible and helping others build better systems.
