VXLAN as Code: Why I Built a Two-Day Workshop Around It

I keep having the same conversation with customers. They bought NDFC. They have a spine-leaf fabric running VXLAN EVPN. The technology works. But when it comes to automating it, treating the fabric config as code, versioning it, deploying it through a pipeline, most teams hit a wall.
It's not because the tools are bad. It's because the gap between clicking through NDFC's GUI and writing a declarative data model in YAML is massive. And nobody talks about the stuff in between: Git, containers, CI/CD, merge requests, pipeline stages. The foundational DevOps skills that network engineers were never taught.
That's why I built this workshop.
The problem I kept seeing
Every customer conversation about NDFC automation eventually lands on the same pain points:
- Snowflake switches. Every device configured slightly differently because three people touched it over two years. Nobody knows what the "correct" state is anymore.
- No traceability. Something broke after a change. "But it was working last week!" Great. Which change? Who made it? When? No audit trail, no history, no way to compare.
- Copy-paste at scale. Need to roll out 20 VLANs across the fabric? Someone's sitting there duplicating configuration blocks and hoping they didn't typo a VNI somewhere.
- No rollback story. If a deployment goes wrong, the recovery plan is "log into each switch and figure it out." That's not a rollback, that's a prayer.
These aren't edge cases. This is the default state for most teams running VXLAN fabrics today.
What the workshop covers
I designed a two-day hands-on training that builds up the entire automation stack step by step. Not just the VXLAN piece, but everything a network team needs to understand before they can do VXLAN as code.
Day 1: DevOps Foundations
┌─────────────────────────────────────────────────┐
│ Module 1: Introduction to NetDevOps (90 min)    │
│ Module 2: Version Control with Git (90 min)     │
│ Module 3: GitLab for Network Teams (105 min)    │
│ Module 4: Containers and Docker (90 min)        │
└─────────────────────────────────────────────────┘

Day 2: Automation & VXLAN as Code

┌─────────────────────────────────────────────────┐
│ Module 5: CI/CD Pipelines for Network (180 min) │
│ Module 6: VXLAN as Code with NDFC (150 min)     │
└─────────────────────────────────────────────────┘
Day 1 is intentionally all foundational. Git, GitLab, Docker. The stuff that software teams take for granted but network engineers rarely get formal training on. Participants create repos, resolve merge conflicts, build Docker containers with the cisco.dcnm Ansible collection, and learn how protected branches and merge requests work.
Day 2 is where it gets real. Module 5 is the deepest one at three hours. Participants build a complete five-stage CI/CD pipeline in GitLab: validate, test, approve, deploy, verify. Manual approval gates, environment variables for credentials, artifacts passed between stages. By the end of module 5, they understand how a pipeline works before they ever touch VXLAN config.
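To make the shape of that pipeline concrete, here is a minimal sketch of a five-stage `.gitlab-ci.yml`. The stage names come from the workshop; the job bodies, file paths (`data/`, `playbooks/`), and playbook names are hypothetical, and credentials are assumed to live in a predefined CI/CD variable rather than in the repo:

```yaml
# .gitlab-ci.yml — illustrative sketch, not the workshop's exact pipeline
stages:
  - validate
  - test
  - approve
  - deploy
  - verify

validate:
  stage: validate
  script:
    - yamllint data/            # lint the data model files
    - ansible-lint playbooks/   # lint the playbooks

test:
  stage: test
  script:
    # dry run against NDFC; NDFC_PASSWORD comes from a CI/CD variable
    - ansible-playbook playbooks/deploy.yml --check

approve:
  stage: approve
  script:
    - echo "Approved for deployment"
  when: manual                  # manual gate before touching the fabric

deploy:
  stage: deploy
  script:
    - ansible-playbook playbooks/deploy.yml
  environment: production

verify:
  stage: verify
  script:
    - ansible-playbook playbooks/ping_test.yml   # end-to-end connectivity check
```

The manual `approve` job is what keeps a merged change from reaching the fabric until a human clicks the button, which is exactly the control most network teams are afraid of losing when they hear "automation".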
Module 6: The "aha" moment
This is where everything clicks. Module 6 builds the same VXLAN EVPN fabric twice. Same topology, same result. Two completely different approaches.
The imperative way
Seven Ansible playbooks, executed in sequence:
- Create the VXLAN EVPN fabric through NDFC's REST API
- Discover and add four switches by seed IP
- Configure access interfaces on the leafs
- Create VRF_PROD with VRF ID and VLAN assignments
- Create the overlay network with VNI and gateway
- Deploy the staged configuration to all switches
- Run ping tests to verify connectivity
Each playbook talks directly to NDFC's API. The fabric creation playbook alone has over 100 parameters in the Easy_Fabric template. You have to get the ordering right. You have to handle each resource type separately. It works, but it's brittle.
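For a flavor of the imperative style, here is roughly what the VRF playbook from that sequence could look like. The filename is hypothetical and the parameter names only approximate the `cisco.dcnm.dcnm_vrf` module; the inventory variables for the leaf management IPs are assumptions:

```yaml
# 04_create_vrf.yml — illustrative sketch of the imperative approach
- name: Create VRF_PROD on the fabric
  hosts: ndfc
  gather_facts: false
  tasks:
    - name: Create and attach the VRF via NDFC
      cisco.dcnm.dcnm_vrf:
        fabric: YOURNAME_Fabric
        state: merged
        config:
          - vrf_name: VRF_PROD
            vrf_id: 150001
            vlan_id: 2001
            attach:
              - ip_address: "{{ leaf01_mgmt_ip }}"   # hypothetical inventory variable
              - ip_address: "{{ leaf02_mgmt_ip }}"
            deploy: true
```

Multiply this by seven resource types, each with its own module or raw REST call, and the ordering problem becomes obvious: this playbook only works if the fabric and switch playbooks ran first.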
The declarative way
Four YAML files. That's it.
```yaml
# global.nac.yaml
vxlan:
  global:
    name: YOURNAME_Fabric
    bgp_asn: 65535
    route_reflectors: 2
    anycast_gateway_mac: 2020.0000.00aa
```

```yaml
# vrfs.nac.yaml
vxlan:
  vrfs:
    - name: VRF_PROD
      vrf_id: 150001
      vlan_id: 2001
      attach_groups:
        - name: all_leafs
          switches:
            - { hostname: leaf01 }
            - { hostname: leaf02 }
```

One playbook runs three roles: validate, create, deploy. The roles figure out the ordering, the dependencies, the API calls. You describe what you want, not how to get there.
Adding a new network? Eight lines of YAML, push to GitLab, the pipeline validates and deploys it. Removing a VRF? Delete the lines, push, done. Want to roll back? git revert and the pipeline puts the fabric back to the previous state.
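Those eight lines for a new network might look like this. The attribute names follow the pattern of the VRF example above but are illustrative; the authoritative schema lives at netascode.cisco.com, and the network name, VNI, and addressing here are invented:

```yaml
# networks.nac.yaml — hypothetical addition
vxlan:
  networks:
    - name: NET_APP01
      vrf_name: VRF_PROD
      net_id: 30001                      # the VNI
      vlan_id: 2301
      gateway_ip_address: 10.30.1.1/24   # anycast gateway for the segment
```

Commit, open a merge request, let the pipeline validate and deploy. The diff in the merge request *is* the change record.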
The comparison that sells it
| | Imperative (7 playbooks) | Declarative (4 YAML files) |
|---|---|---|
| Adding a network | Write a new playbook | Add 8 lines of YAML |
| Ordering | You figure it out | Automatic |
| Error handling | Per playbook | Built-in validate role |
| Rollback | Complex | git revert + push |
When participants see this side by side, the reaction is always the same. "Why would anyone do it the other way?"
The answer is: because nobody showed them this approach exists. And because the declarative way only makes sense if you understand Git, pipelines, and how data models work. Which is exactly why day 1 exists.
The lab environment
Everything runs on Cisco dCloud. Four Nexus 9300v switches in CML, NDFC as the fabric controller, a self-hosted GitLab instance, and VS Code on a Windows jump host. Participants get a full environment they can break without consequences.
┌──────────────┐
│ NDFC │
│ 198.18.133 │
└──────┬───────┘
│
┌────────────┼────────────┐
│ │
┌─────┴─────┐ ┌──────┴────┐
│ spine01 │ │ spine02 │
└─────┬─────┘ └──────┬────┘
│ │
┌─────┼─────────────────────────┤
│ │
┌─────┴─────┐ ┌──────┴────┐
│ leaf01 │ │ leaf02 │
│ Eth1/3: ──┼── Workshop ─────┼── Eth1/3 │
└───────────┘ Clients └───────────┘
The topology is simple on purpose. Two spines, two leafs, access ports on Ethernet1/3. Enough to demonstrate the full VXLAN EVPN stack without drowning in complexity.
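In data-model terms, that topology reduces to a short switch list. The hostnames match the diagram; the key names below are illustrative rather than the exact schema the workshop uses:

```yaml
# topology sketch — key names are illustrative, not the exact nac schema
vxlan:
  topology:
    switches:
      - name: spine01
        role: spine
      - name: spine02
        role: spine
      - name: leaf01
        role: leaf
      - name: leaf02
        role: leaf
```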
Why the foundations matter
I could have built a workshop that's just "here's the YAML, here's the pipeline, push the button." It would take four hours instead of two days. But I've seen what happens when teams skip the fundamentals.
They copy the example repo, run it once, it works. Then they need to customize something. They don't understand why the pipeline has stages. They don't know how to resolve a merge conflict when two engineers change the same network definition. They can't debug a failing Docker container that builds their Ansible environment.
The tools are not the hard part. Understanding why the tools exist and how they fit together, that's the hard part. A network engineer who understands Git branching, Docker images, and pipeline stages can figure out any automation framework. One who only knows the YAML syntax is stuck the moment something breaks.
Want to try it?
The full workshop guide is available at vxlanascode.crossdomain-automation.tech. All six modules, all exercises, all the theory.
If you want the hands-on experience with a live dCloud lab environment, reach out to me. I can set up sessions for teams who want to go through the material with actual switches, actual NDFC, and actual pipelines deploying actual VXLAN fabrics. It makes a difference when the ping test at the end actually works across your own lab topology.
The cisco.nac_dc_vxlan collection and example data models are documented at netascode.cisco.com, and the example repository is on GitHub.