
VXLAN as Code: Why I Built a Two-Day Workshop Around It


I keep having the same conversation with customers. They bought NDFC. They have a spine-leaf fabric running VXLAN EVPN. The technology works. But when it comes to automating it (treating the fabric config as code, versioning it, deploying it through a pipeline), most teams hit a wall.

It's not because the tools are bad. It's because the gap between clicking through NDFC's GUI and writing a declarative data model in YAML is massive. And nobody talks about the stuff in between: Git, containers, CI/CD, merge requests, pipeline stages. The foundational DevOps skills that network engineers were never taught.

That's why I built this workshop.

The problem I kept seeing

Every customer conversation about NDFC automation eventually lands on the same pain points:

  • Snowflake switches. Every device configured slightly differently because three people touched it over two years. Nobody knows what the "correct" state is anymore.
  • No traceability. Something broke after a change. "But it was working last week!" Great. Which change? Who made it? When? No audit trail, no history, no way to compare.
  • Copy-paste at scale. Need to roll out 20 VLANs across the fabric? Someone's sitting there duplicating configuration blocks and hoping they didn't typo a VNI somewhere.
  • No rollback story. If a deployment goes wrong, the recovery plan is "log into each switch and figure it out." That's not a rollback, that's a prayer.

These aren't edge cases. This is the default state for most teams running VXLAN fabrics today.

What the workshop covers

I designed a two-day hands-on training that builds up the entire automation stack step by step. Not just the VXLAN piece, but everything a network team needs to understand before they can do VXLAN as code.

Day 1: DevOps Foundations
┌─────────────────────────────────────────────────────────────┐
│ Day 1: DevOps Foundations                                   │
│   Module 1: Introduction to NetDevOps           (90 min)    │
│   Module 2: Version Control with Git            (90 min)    │
│   Module 3: GitLab for Network Teams           (105 min)    │
│   Module 4: Containers and Docker               (90 min)    │
├─────────────────────────────────────────────────────────────┤
│ Day 2: Automation & VXLAN as Code                           │
│   Module 5: CI/CD Pipelines for Network        (180 min)    │
│   Module 6: VXLAN as Code with NDFC            (150 min)    │
└─────────────────────────────────────────────────────────────┘

Day 1 is intentionally all foundational: Git, GitLab, Docker. The stuff that software teams take for granted but network engineers rarely get formal training on. Participants create repos, resolve merge conflicts, build Docker images that bundle the cisco.dcnm Ansible collection, and learn how protected branches and merge requests work.

Day 2 is where it gets real. Module 5 is the deepest one at three hours. Participants build a complete five-stage CI/CD pipeline in GitLab: validate, test, approve, deploy, verify. Manual approval gates, environment variables for credentials, artifacts passed between stages. By the end of Module 5, they understand how a pipeline works before they ever touch VXLAN config.
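The stage mechanics are easier to reason about with a toy model. The sketch below is not GitLab; it is a minimal Python model, assuming each stage is a function that receives the artifacts produced so far and returns new ones. It shows the three behaviors described above: stages run in order and a failure stops everything after it, a manual approval gate halts the run until someone approves (in GitLab this is a job with `when: manual`), and credentials come from environment variables (which is how GitLab exposes CI/CD variables to jobs). `NDFC_PASSWORD` is a hypothetical variable name.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Callable

# Illustrative model of a five-stage pipeline, NOT GitLab's implementation.
# A stage receives the artifacts produced so far and returns new ones.
StageFn = Callable[[dict], dict]


@dataclass
class Stage:
    name: str
    run: StageFn
    manual: bool = False  # plays the role of a GitLab job with "when: manual"


def run_pipeline(stages: list[Stage], approved: bool) -> tuple[list[str], dict]:
    """Run stages in order and pass artifacts forward.

    Stops at the first stage that raises, or at an unapproved manual gate.
    Returns the names of the stages that completed and the final artifacts.
    """
    artifacts: dict = {}
    completed: list[str] = []
    for stage in stages:
        if stage.manual and not approved:
            break  # the pipeline waits here until someone approves
        try:
            artifacts = {**artifacts, **stage.run(artifacts)}
        except Exception:
            break  # a failed stage blocks every later stage
        completed.append(stage.name)
    return completed, artifacts


def do_validate(artifacts: dict) -> dict:
    return {"validated": True}


def do_test(artifacts: dict) -> dict:
    # Consumes an artifact produced by the previous stage.
    if not artifacts.get("validated"):
        raise RuntimeError("test requires a validated data model")
    return {"tested": True}


def do_approve(artifacts: dict) -> dict:
    return {}  # no work; the gate itself is the point


def do_deploy(artifacts: dict) -> dict:
    # Credentials arrive as environment variables, never from the repo.
    # NDFC_PASSWORD is a hypothetical variable name.
    if not os.environ.get("NDFC_PASSWORD"):
        raise RuntimeError("NDFC_PASSWORD is not set")
    return {"deployed": True}


def do_verify(artifacts: dict) -> dict:
    return {"verified": artifacts.get("deployed", False)}


PIPELINE = [
    Stage("validate", do_validate),
    Stage("test", do_test),
    Stage("approve", do_approve, manual=True),
    Stage("deploy", do_deploy),
    Stage("verify", do_verify),
]
```

Without approval the run stops after validate and test; with approval but no credential it stops at deploy; with both it completes all five stages.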

Module 6: The "aha" moment

This is where everything clicks. Module 6 builds the same VXLAN EVPN fabric twice. Same topology, same result. Two completely different approaches.

The imperative way

Seven Ansible playbooks, executed in sequence:

  1. Create the VXLAN EVPN fabric through NDFC's REST API
  2. Discover and add four switches by seed IP
  3. Configure access interfaces on the leafs
  4. Create VRF_PROD with VRF ID and VLAN assignments
  5. Create the overlay network with VNI and gateway
  6. Deploy the staged configuration to all switches
  7. Run ping tests to verify connectivity

Each playbook talks directly to NDFC's API. The fabric creation playbook alone has over 100 parameters in the Easy_Fabric template. You have to get the ordering right. You have to handle each resource type separately. It works, but it's brittle.
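The brittleness is easier to see when the sequence is spelled out. Below is a minimal sketch, assuming hypothetical playbook file names and an inventory configured in ansible.cfg; only the `ansible-playbook` CLI itself is real. The order lives in a hard-coded list, and a failure halfway through leaves the earlier changes in place.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Optional

# Hypothetical file names for the seven imperative playbooks listed above.
PLAYBOOKS = [
    "01_create_fabric.yml",
    "02_add_switches.yml",
    "03_access_interfaces.yml",
    "04_create_vrf.yml",
    "05_create_network.yml",
    "06_deploy.yml",
    "07_ping_test.yml",
]


def ansible_playbook(playbook: str) -> int:
    """Run one playbook via the ansible-playbook CLI and return its exit code.

    Assumes the inventory is configured in ansible.cfg.
    """
    return subprocess.run(["ansible-playbook", playbook], check=False).returncode


def run_in_order(
    playbooks: list[str],
    execute: Callable[[str], int] = ansible_playbook,
) -> tuple[list[str], Optional[str]]:
    """Run playbooks strictly in the given order and stop at the first failure.

    Returns (playbooks that succeeded, playbook that failed or None).
    Everything that succeeded before the failure has already changed NDFC,
    and nothing here undoes it.
    """
    applied: list[str] = []
    for playbook in playbooks:
        if execute(playbook) != 0:
            return applied, playbook
        applied.append(playbook)
    return applied, None
```

Passing a fake `execute` function shows the failure mode without a lab: `run_in_order(PLAYBOOKS, lambda pb: 1 if pb.startswith("05") else 0)` reports the first four playbooks as applied and `05_create_network.yml` as the failure, leaving a half-built overlay behind.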

The declarative way

Four YAML files. That's it. Two of them look like this:

```yaml
# global.nac.yaml
vxlan:
  global:
    name: YOURNAME_Fabric
    bgp_asn: 65535
    route_reflectors: 2
    anycast_gateway_mac: 2020.0000.00aa

# vrfs.nac.yaml
vxlan:
  vrfs:
    - name: VRF_PROD
      vrf_id: 150001
      vlan_id: 2001
      attach_groups:
        - name: all_leafs
          switches:
            - { hostname: leaf01 }
            - { hostname: leaf02 }
```
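Because the intent is plain data, it can be checked before anything reaches NDFC. A minimal sketch of such a pre-flight check, written against the vrfs.nac.yaml structure shown above and assuming the YAML has already been loaded into Python lists and dicts (for example with PyYAML's `yaml.safe_load`). The switch inventory, the specific checks, and the reading of `vrf_id` as a 24-bit VNI are assumptions for illustration; the collection's own validation applies its own rules.

```python
from __future__ import annotations

# Hypothetical inventory matching the lab topology (two spines, two leafs).
KNOWN_SWITCHES = {"spine01", "spine02", "leaf01", "leaf02"}


def validate_vrfs(vrfs: list[dict], known_switches: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the data passed."""
    errors: list[str] = []
    seen_names: set[str] = set()
    seen_ids: set[int] = set()
    for vrf in vrfs:
        name = vrf.get("name", "<unnamed>")
        if name in seen_names:
            errors.append(f"{name}: duplicate VRF name")
        seen_names.add(name)

        # Assumption: vrf_id is the VRF's VNI, so it must fit in 24 bits.
        vrf_id = vrf.get("vrf_id")
        if not isinstance(vrf_id, int) or not 1 <= vrf_id < 2**24:
            errors.append(f"{name}: vrf_id must be an integer in 1..{2**24 - 1}")
        elif vrf_id in seen_ids:
            errors.append(f"{name}: vrf_id {vrf_id} is already in use")
        else:
            seen_ids.add(vrf_id)

        # 802.1Q VLAN IDs 0 and 4095 are reserved.
        vlan_id = vrf.get("vlan_id")
        if not isinstance(vlan_id, int) or not 1 <= vlan_id <= 4094:
            errors.append(f"{name}: vlan_id must be an integer in 1..4094")

        # Every attached switch must exist in the inventory.
        for group in vrf.get("attach_groups", []):
            for switch in group.get("switches", []):
                hostname = switch.get("hostname")
                if hostname not in known_switches:
                    errors.append(f"{name}: unknown switch {hostname!r}")
    return errors
```

Run against the VRF_PROD entry above, this returns an empty list. A copy-pasted second entry that reuses the name and VRF ID, uses VLAN 5000, and attaches a nonexistent leaf03 produces four errors, all before a single API call.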

One playbook runs three roles: validate, create, deploy. The roles figure out the ordering, the dependencies, the API calls. You describe what you want, not how to get there.
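How a declarative tool works out ordering is an implementation detail, but a common technique is a topological sort over resource dependencies. A sketch using Python's standard `graphlib` module (Python 3.9+); the dependency map is a hypothetical simplification of the seven imperative steps, not the actual logic inside the cisco.nac_dc_vxlan roles.

```python
from __future__ import annotations

from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each resource type lists what must exist first.
DEPENDS_ON: dict[str, set[str]] = {
    "fabric": set(),
    "switches": {"fabric"},
    "interfaces": {"switches"},
    "vrfs": {"switches"},
    "networks": {"vrfs"},
    "deploy": {"interfaces", "networks"},
}


def apply_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every resource comes after its dependencies."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # graphlib stores the nodes that form the cycle in exc.args[1].
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc
```

Adding a resource type means adding one entry to the map, and the order follows from it. A circular entry such as `{"vrfs": {"networks"}, "networks": {"vrfs"}}` is reported as a `ValueError` instead of silently producing a wrong sequence.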

Adding a new network? Eight lines of YAML, push to GitLab, the pipeline validates and deploys it. Removing a VRF? Delete the lines, push, done. Want to roll back? git revert and the pipeline puts the fabric back to the previous state.
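Both "delete the lines" and "git revert" work because a declarative tool compares the desired state in the repo with what the controller currently has and acts on the difference. A minimal sketch of that comparison, assuming resources are keyed by name; `VRF_DEV` is a hypothetical addition, and this sketch assumes removals are always applied.

```python
from __future__ import annotations


def diff_state(desired: dict[str, dict], current: dict[str, dict]) -> dict[str, list[str]]:
    """Work out what to change so that `current` ends up matching `desired`.

    Both arguments map a resource name (e.g. a VRF name) to its attributes.
    """
    create = sorted(desired.keys() - current.keys())
    delete = sorted(current.keys() - desired.keys())
    update = sorted(
        name for name in desired.keys() & current.keys()
        if desired[name] != current[name]
    )
    return {"create": create, "update": update, "delete": delete}


# Two commits of a VRF data model; VRF_DEV is a hypothetical addition.
COMMIT_1 = {"VRF_PROD": {"vrf_id": 150001, "vlan_id": 2001}}
COMMIT_2 = {**COMMIT_1, "VRF_DEV": {"vrf_id": 150002, "vlan_id": 2002}}
```

Moving from commit 1 to commit 2 yields `{"create": ["VRF_DEV"], "update": [], "delete": []}`. Reverting back to commit 1 yields the mirror image, `{"create": [], "update": [], "delete": ["VRF_DEV"]}`, which is why, in this model, a revert commit is enough to roll back.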

The comparison that sells it

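|                  | Imperative (7 playbooks) | Declarative (4 YAML files) |
|------------------|--------------------------|----------------------------|
| Adding a network | Write a new playbook     | Add 8 lines of YAML        |
| Ordering         | You figure it out        | Automatic                  |
| Error handling   | Per playbook             | Built-in validate role     |
| Rollback         | Complex                  | git revert + push          |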

When participants see this side by side, the reaction is always the same. "Why would anyone do it the other way?"

The answer is: because nobody showed them this approach exists. And because the declarative way only makes sense if you understand Git, pipelines, and how data models work. That is exactly why Day 1 exists.

The lab environment

Everything runs on Cisco dCloud. Four Nexus 9300v switches in CML, NDFC as the fabric controller, a self-hosted GitLab instance, and VS Code on a Windows jump host. Participants get a full environment they can break without consequences.

                    ┌──────────────┐
                    │     NDFC     │
                    │  198.18.133  │
                    └──────┬───────┘
                           │
              ┌────────────┼────────────┐
              │                         │
        ┌─────┴─────┐           ┌───────┴───┐
        │  spine01  │           │  spine02  │
        └─────┬─────┘           └───────┬───┘
              │                         │
        ┌─────┼─────────────────────────┤
        │                               │
  ┌─────┴─────┐                 ┌───────┴───┐
  │  leaf01   │                 │  leaf02   │
  │ Eth1/3: ──┼── Workshop ─────┼── Eth1/3  │
  └───────────┘   Clients       └───────────┘

The topology is simple on purpose. Two spines, two leafs, access ports on Ethernet1/3. Enough to demonstrate the full VXLAN EVPN stack without drowning in complexity.

Why the foundations matter

I could have built a workshop that's just "here's the YAML, here's the pipeline, push the button." It would take four hours instead of two days. But I've seen what happens when teams skip the fundamentals.

They copy the example repo, run it once, it works. Then they need to customize something. They don't understand why the pipeline has stages. They don't know how to resolve a merge conflict when two engineers change the same network definition. They can't debug a failing Docker container that builds their Ansible environment.

The tools are not the hard part. Understanding why the tools exist and how they fit together, that's the hard part. A network engineer who understands Git branching, Docker images, and pipeline stages can figure out any automation framework. One who only knows the YAML syntax is stuck the moment something breaks.

Want to try it?

The full workshop guide is available at vxlanascode.crossdomain-automation.tech. All six modules, all exercises, all the theory.

If you want the hands-on experience with a live dCloud lab environment, reach out to me. I can set up sessions for teams who want to go through the material with actual switches, actual NDFC, and actual pipelines deploying actual VXLAN fabrics. It makes a difference when the ping test at the end actually works across your own lab topology.

The cisco.nac_dc_vxlan collection and example data models are documented at netascode.cisco.com, and the example repository is on GitHub.



Chris Beye

About the Author

Chris Beye

Network automation enthusiast and technology explorer sharing practical insights on Cisco technologies, infrastructure automation, and home lab experiments. Passionate about making complex networking concepts accessible and helping others build better systems.
