Cisco NDFC ZTP (Zero-Touch-Provisioning) with Ansible for BGP EVPN fabrics

Chris Beye

Systems Architect @ Cisco Systems

Introduction

If you want to build BGP EVPN-based data center fabrics, Cisco's answer is NDFC (Nexus Dashboard Fabric Controller).

Cisco Nexus Dashboard Fabric Controller (NDFC) is a network automation and management solution offered by Cisco. NDFC provides a single dashboard for managing and automating network operations across multi-cloud, on-premises, and edge environments.

In this article, I describe how to build a BGP EVPN fabric and provision spine and leaf switches using POAP (PowerOn Auto Provisioning) with Ansible modules.

POAP configures and updates devices that boot up without any startup configuration. It is a very convenient workflow: define the fabric, add the devices, then simply power them on in the data center and everything gets configured automatically!

Lab setup

My entire environment is fully virtualized. I am running a virtual Nexus Dashboard and CML (Cisco Modeling Labs) to simulate my Nexus devices.

If you want to replicate this setup, make sure to allocate enough resources for the Nexus Dashboard (16 vCPUs and 64 GB of RAM).
https://www.cisco.com/c/en/us/td/docs/dcn/nd/2x/deployment/cisco-nexus-dashboard-deployment-guide-221/nd-deploy-esx-22x.html

As always, I run the code in a GitLab CI/CD pipeline, executed within a Docker container. The container image is stored in my own GitLab Docker registry. If you don't know how to set up a GitLab server with a Docker registry, check out my previous article:

Docker container

For this use case I am using a very basic Docker container with:

  • Ansible
  • Ansible lint
  • The Ansible Galaxy collection cisco.dcnm (DCNM being the previous name of NDFC)
    Cisco is a master at renaming products 😉
FROM ubuntu:22.04

RUN apt-get update && \
  apt-get install -y gcc python3.11 git python3-pip ssh && \
  pip3 install --upgrade pip && \
  pip3 install ansible requests && \
  pip3 install jmespath && \
  pip3 install ansible-lint && \
  ansible-galaxy collection install cisco.dcnm
Building the container in a pipeline

variables:
  IMAGE_NAME: $CI_REGISTRY_IMAGE/ndfc-automation  
  IMAGE_TAG: "1.0"

stages:
  - build

build_image:
  stage: build
  tags:
    - shell-runner
  script:
    - docker build -t $IMAGE_NAME:$IMAGE_TAG .

push_image:
  stage: build
  needs:
    - build_image
  tags:
    - shell-runner
  before_script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
  script:
    - docker push $IMAGE_NAME:$IMAGE_TAG
Building the data model

Now that the base is set, it is time to build the data model. When you create a BGP EVPN fabric, there are a lot of variables to deal with. That's why I created the following data model, which I will later load into my Ansible playbook:

fabric_settings.yaml:

To create the fabric, I am using the Easy_Fabric template and left pretty much everything at its default value. The only values I changed are:

  • Fabric name
  • BGP AS
  • Bootstrap settings (DHCP, gateway)

---
fabricName: BGP_EVPN_POAP
templateName: Easy_Fabric
nvPairs:
  FABRIC_NAME: BGP_EVPN_POAP
  BGP_AS: '65001'
  UNDERLAY_IS_V6: 'false'
  USE_LINK_LOCAL: false
  V6_SUBNET_TARGET_MASK: ''
  LINK_STATE_ROUTING: ospf
  RR_COUNT: '2'
  ANYCAST_GW_MAC: 2020.0000.00aa
  PM_ENABLE: 'false'
  BGP_AS_PREV: ''
  PM_ENABLE_PREV: 'false'
  ENABLE_FABRIC_VPC_DOMAIN_ID_PREV: ''
  FABRIC_VPC_DOMAIN_ID_PREV: ''
  LINK_STATE_ROUTING_TAG_PREV: ''
  OVERLAY_MODE_PREV: ''
  ENABLE_PVLAN_PREV: ''
  FABRIC_MTU_PREV: '9216'
  L2_HOST_INTF_MTU_PREV: '9216'
  DEPLOYMENT_FREEZE: 'false'
  INBAND_MGMT_PREV: 'false'
  BOOTSTRAP_ENABLE_PREV: 'false'
  MGMT_V6PREFIX: '64'
  ENABLE_NETFLOW_PREV: ''
  VPC_DELAY_RESTORE_TIME: '60'
  FABRIC_TYPE: Switch_Fabric
  ENABLE_AGENT: 'false'
  AGENT_INTF: eth0
  SSPINE_ADD_DEL_DEBUG_FLAG: Disable
  BRFIELD_DEBUG_FLAG: Disable
  ACTIVE_MIGRATION: 'false'
  FF: Easy_Fabric
  MSO_SITE_ID: ''
  MSO_CONTROLER_ID: ''
  MSO_SITE_GROUP_NAME: ''
  PREMSO_PARENT_FABRIC: ''
  MSO_CONNECTIVITY_DEPLOYED: ''
  ANYCAST_RP_IP_RANGE_INTERNAL: ''
  DHCP_START_INTERNAL: ''
  DHCP_END_INTERNAL: ''
  MGMT_GW_INTERNAL: ''
  MGMT_PREFIX_INTERNAL: ''
  BOOTSTRAP_MULTISUBNET_INTERNAL: ''
  MGMT_V6PREFIX_INTERNAL: ''
  DHCP_IPV6_ENABLE_INTERNAL: ''
  UNNUM_DHCP_START_INTERNAL: ''
  UNNUM_DHCP_END_INTERNAL: ''
  ENABLE_EVPN: 'true'
  FEATURE_PTP_INTERNAL: 'false'
  SSPINE_COUNT: '0'
  SPINE_COUNT: '0'
  abstract_feature_leaf: base_feature_leaf_upg
  abstract_feature_spine: base_feature_spine_upg
  abstract_dhcp: base_dhcp
  abstract_multicast: base_multicast_11_1
  abstract_anycast_rp: anycast_rp
  abstract_loopback_interface: int_fabric_loopback_11_1
  abstract_isis: base_isis_level2
  abstract_ospf: base_ospf
  abstract_vpc_domain: base_vpc_domain_11_1
  abstract_vlan_interface: int_fabric_vlan_11_1
  abstract_isis_interface: isis_interface
  abstract_ospf_interface: ospf_interface_11_1
  abstract_pim_interface: pim_interface
  abstract_route_map: route_map
  abstract_bgp: base_bgp
  abstract_bgp_rr: evpn_bgp_rr
  abstract_bgp_neighbor: evpn_bgp_rr_neighbor
  abstract_extra_config_leaf: extra_config_leaf
  abstract_extra_config_spine: extra_config_spine
  abstract_extra_config_tor: extra_config_tor
  abstract_extra_config_bootstrap: extra_config_bootstrap_11_1
  temp_anycast_gateway: anycast_gateway
  temp_vpc_domain_mgmt: vpc_domain_mgmt
  temp_vpc_peer_link: int_vpc_peer_link_po
  abstract_routed_host: int_routed_host
  abstract_trunk_host: int_trunk_host
  L3VNI_MCAST_GROUP: ''
  PHANTOM_RP_LB_ID1: ''
  PHANTOM_RP_LB_ID2: ''
  PHANTOM_RP_LB_ID3: ''
  PHANTOM_RP_LB_ID4: ''
  VPC_PEER_LINK_VLAN: '3600'
  ENABLE_VPC_PEER_LINK_NATIVE_VLAN: 'false'
  VPC_PEER_KEEP_ALIVE_OPTION: management
  VPC_AUTO_RECOVERY_TIME: '360'
  VPC_DELAY_RESTORE: '150'
  VPC_PEER_LINK_PO: '500'
  VPC_ENABLE_IPv6_ND_SYNC: 'true'
  ADVERTISE_PIP_BGP: 'false'
  ENABLE_FABRIC_VPC_DOMAIN_ID: 'false'
  FABRIC_VPC_DOMAIN_ID: ''
  FABRIC_VPC_QOS_POLICY_NAME: ''
  BGP_LB_ID: '0'
  NVE_LB_ID: '1'
  ANYCAST_LB_ID: ''
  LINK_STATE_ROUTING_TAG: UNDERLAY
  OSPF_AUTH_KEY_ID: ''
  OSPF_AUTH_KEY: ''
  ISIS_LEVEL: ''
  ISIS_P2P_ENABLE: false
  ISIS_AUTH_ENABLE: false
  ISIS_AUTH_KEYCHAIN_NAME: ''
  ISIS_AUTH_KEYCHAIN_KEY_ID: ''
  ISIS_AUTH_KEY: ''
  ISIS_OVERLOAD_ENABLE: false
  ISIS_OVERLOAD_ELAPSE_TIME: ''
  BGP_AUTH_KEY_TYPE: ''
  BGP_AUTH_KEY: ''
  PIM_HELLO_AUTH_KEY: ''
  BFD_IBGP_ENABLE: false
  BFD_OSPF_ENABLE: false
  BFD_ISIS_ENABLE: false
  BFD_PIM_ENABLE: false
  BFD_AUTH_ENABLE: false
  BFD_AUTH_KEY_ID: ''
  BFD_AUTH_KEY: ''
  IBGP_PEER_TEMPLATE: ''
  IBGP_PEER_TEMPLATE_LEAF: ''
  default_vrf: Default_VRF_Universal
  default_network: Default_Network_Universal
  vrf_extension_template: Default_VRF_Extension_Universal
  network_extension_template: Default_Network_Extension_Universal
  OVERLAY_MODE: config-profile
  ENABLE_PVLAN: 'false'
  default_pvlan_sec_network: ''
  FABRIC_MTU: '9216'
  L2_HOST_INTF_MTU: '9216'
  HOST_INTF_ADMIN_STATE: 'true'
  POWER_REDUNDANCY_MODE: ps-redundant
  COPP_POLICY: strict
  HD_TIME: '180'
  BROWNFIELD_NETWORK_NAME_FORMAT: Auto_Net_VNI$$VNI$$_VLAN$$VLAN_ID$$
  BROWNFIELD_SKIP_OVERLAY_NETWORK_ATTACHMENTS: 'false'
  CDP_ENABLE: 'false'
  ENABLE_NGOAM: 'true'
  ENABLE_TENANT_DHCP: 'true'
  ENABLE_NXAPI: 'true'
  ENABLE_PBR: 'false'
  STRICT_CC_MODE: 'false'
  AAA_REMOTE_IP_ENABLED: 'false'
  SNMP_SERVER_HOST_TRAP: 'true'
  ANYCAST_BGW_ADVERTISE_PIP: 'false'
  PTP_LB_ID: ''
  PTP_DOMAIN_ID: ''
  MPLS_LB_ID: ''
  TCAM_ALLOCATION: 'true'
  DEAFULT_QUEUING_POLICY_CLOUDSCALE: ''
  DEAFULT_QUEUING_POLICY_R_SERIES: ''
  DEAFULT_QUEUING_POLICY_OTHER: ''
  ENABLE_MACSEC: 'false'
  MACSEC_KEY_STRING: ''
  MACSEC_ALGORITHM: ''
  MACSEC_FALLBACK_KEY_STRING: ''
  MACSEC_FALLBACK_ALGORITHM: ''
  MACSEC_CIPHER_SUITE: ''
  MACSEC_REPORT_TIMER: ''
  STP_ROOT_OPTION: unmanaged
  STP_VLAN_RANGE: ''
  MST_INSTANCE_RANGE: ''
  STP_BRIDGE_PRIORITY: ''
  EXTRA_CONF_LEAF: ''
  EXTRA_CONF_SPINE: ''
  EXTRA_CONF_TOR: ''
  EXTRA_CONF_INTRA_LINKS: ''
  STATIC_UNDERLAY_IP_ALLOC: 'false'
  MPLS_LOOPBACK_IP_RANGE: ''
  LOOPBACK0_IPV6_RANGE: ''
  LOOPBACK1_IPV6_RANGE: ''
  V6_SUBNET_RANGE: ''
  ROUTER_ID_RANGE: ''
  L2_SEGMENT_ID_RANGE: 30000-49000
  L3_PARTITION_ID_RANGE: 50000-59000
  NETWORK_VLAN_RANGE: 2300-2999
  VRF_VLAN_RANGE: 2000-2299
  SUBINTERFACE_RANGE: 2-511
  VRF_LITE_AUTOCONFIG: Manual
  AUTO_SYMMETRIC_VRF_LITE: false
  AUTO_VRFLITE_IFC_DEFAULT_VRF: false
  AUTO_SYMMETRIC_DEFAULT_VRF: false
  DEFAULT_VRF_REDIS_BGP_RMAP: ''
  DCI_SUBNET_RANGE: 10.33.0.0/16
  DCI_SUBNET_TARGET_MASK: '30'
  SERVICE_NETWORK_VLAN_RANGE: 3000-3199
  ROUTE_MAP_SEQUENCE_NUMBER_RANGE: 1-65534
  DNS_SERVER_IP_LIST: ''
  DNS_SERVER_VRF: ''
  NTP_SERVER_IP_LIST: ''
  NTP_SERVER_VRF: ''
  SYSLOG_SERVER_IP_LIST: ''
  SYSLOG_SEV: ''
  SYSLOG_SERVER_VRF: ''
  AAA_SERVER_CONF: ''
  BOOTSTRAP_ENABLE: true
  DHCP_START: 198.18.1.200
  DHCP_END: 198.18.1.250
  MGMT_GW: 198.18.1.1
  SEED_SWITCH_CORE_INTERFACES: ''
  SPINE_SWITCH_CORE_INTERFACES: ''
  INBAND_DHCP_SERVERS: ''
  UNNUM_BOOTSTRAP_LB_ID: ''
  UNNUM_DHCP_START: ''
  UNNUM_DHCP_END: ''
  BOOTSTRAP_CONF: ''
  enableRealTimeBackup: ''
  enableScheduledBackup: ''
  scheduledTime: ''
  ENABLE_NETFLOW: 'false'
  NETFLOW_EXPORTER_LIST: ''
  NETFLOW_RECORD_LIST: ''
  NETFLOW_MONITOR_LIST: ''
  FABRIC_INTERFACE_TYPE: p2p
  SUBNET_TARGET_MASK: '30'
  REPLICATION_MODE: Multicast
  VPC_DOMAIN_ID_RANGE: 1-1000
  FABRIC_VPC_QOS: 'false'
  OSPF_AREA_ID: 0.0.0.0
  OSPF_AUTH_ENABLE: 'false'
  BGP_AUTH_ENABLE: 'false'
  BFD_ENABLE: 'false'
  ENABLE_NXAPI_HTTP: 'true'
  GRFIELD_DEBUG_FLAG: Disable
  FEATURE_PTP: 'false'
  MPLS_HANDOFF: 'false'
  ENABLE_DEFAULT_QUEUING_POLICY: 'false'
  LOOPBACK0_IP_RANGE: 10.2.0.0/22
  LOOPBACK1_IP_RANGE: 10.3.0.0/22
  SUBNET_RANGE: 10.4.0.0/16
  INBAND_MGMT: 'false'
  MULTICAST_GROUP_SUBNET: 239.1.1.0/25
  ENABLE_TRM: 'false'
  RP_COUNT: '2'
  RP_MODE: asm
  RP_LB_ID: '254'
  PIM_HELLO_AUTH_ENABLE: 'false'
  ANYCAST_RP_IP_RANGE: 10.254.254.0/24
  DHCP_ENABLE: true
  ENABLE_AAA: 'false'
  SITE_ID: '65001'
  DHCP_IPV6_ENABLE: DHCPv4
  BOOTSTRAP_MULTISUBNET: "#Scope_Start_IP, Scope_End_IP, Scope_Default_Gateway, Scope_Subnet_Prefix"
  MGMT_PREFIX: '24'
fabric_inventory.yaml:

It also makes sense to create a separate file for the switch inventory details. The most important piece of information is the serial number of each device, as we need to map the correct role (spine, leaf, border leaf, etc.) to each switch.

Make sure to have that information handy before starting with POAP!


---
inventory_data:
  switches:
    - seed_ip: 198.18.1.151
      user_name: admin
      password: C1sco12345
      role: spine
      poap:
        - serial_number: 9YW0T2HLH4A
          model: 'N9K-C9300v'
          version: '9.3(11)'
          hostname: 'POAP-SPINE01'
          config_data:
            modulesModel: [N9K-X9364v, N9K-vSUP]
            gateway: 198.18.1.1/24
    - seed_ip: 198.18.1.152
      user_name: admin
      password: C1sco12345
      role: leaf
      poap:
        - serial_number: 9BH06YFWE60
          model: 'N9K-C9300v'
          version: '9.3(11)'
          hostname: 'POAP-LEAF01'
          config_data:
            modulesModel: [N9K-X9364v, N9K-vSUP]
            gateway: 198.18.1.1/24
    - seed_ip: 198.18.1.153
      user_name: admin
      password: C1sco12345
      role: leaf
      poap:
        - serial_number: 9TPC3FV5ITL
          model: 'N9K-C9300v'
          version: '9.3(11)'
          hostname: 'POAP-LEAF02'
          config_data:
            modulesModel: [N9K-X9364v, N9K-vSUP]
            gateway: 198.18.1.1/24
Building the playbook

The Ansible playbook needs just four different modules:

  • ansible.builtin.include_vars:
    Both YAML files defined above need to be loaded: the fabric settings and the inventory data
  • cisco.dcnm.dcnm_rest:
    This module creates the fabric from the fabric_settings data by converting it into a JSON payload
  • ansible.builtin.pause:
    Once the fabric is created, it takes a while until the switches get an IP address from the DHCP server (in my case the NDFC controller) and show up in the POAP inventory; a fixed pause bridges that gap (see the polling sketch right after this list for an alternative)
  • cisco.dcnm.dcnm_inventory:
    The inventory module adds and provisions the devices in the newly created fabric
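
Instead of a fixed 300-second pause, you could poll NDFC until the switches actually show up. Below is a minimal sketch; the POAP inventory endpoint and the response shape (a DATA list) are assumptions, so verify both against your NDFC release:

# Hypothetical polling alternative to the fixed pause
- name: Wait until all switches appear in the POAP inventory
  cisco.dcnm.dcnm_rest:
    method: GET
    # ASSUMPTION: endpoint path and JSON shape may differ between NDFC releases
    path: /appcenter/cisco/ndfc/api/v1/lan-fabric/rest/control/fabrics/BGP_EVPN_POAP/inventory/poap
  register: poap_result
  retries: 30   # up to 30 attempts ...
  delay: 30     # ... spaced 30 seconds apart (about 15 minutes max)
  until: poap_result.response.DATA | default([]) | length >= 3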


Make sure that the following parameters are set in the ansible.cfg: 

[persistent_connection]
connect_timeout = 100
command_timeout = 1800
 
The provisioning process takes around 10-15 minutes, as the devices reboot and NDFC configures them. That's why the command_timeout parameter needs to be increased.
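
The playbook runs against a host called ndfc_controller, so the Ansible inventory needs a matching entry with the NDFC connection details. A minimal sketch (the address and credentials are placeholders for my lab values):

# hosts.yaml - inventory sketch; address and credentials are lab placeholders
all:
  children:
    ndfc:
      hosts:
        ndfc_controller:
          ansible_host: 198.18.1.10            # placeholder NDFC address
          ansible_user: admin
          ansible_password: C1sco12345
          ansible_connection: ansible.netcommon.httpapi
          ansible_network_os: cisco.dcnm.dcnm  # httpapi plugin from the cisco.dcnm collection
          ansible_httpapi_use_ssl: true
          ansible_httpapi_validate_certs: false

With that in place, here is the full playbook: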
					---

- name: Create fabric and pre-provision switches
  hosts: ndfc_controller
  gather_facts: false

  tasks:
    - name: Load fabric data
      ansible.builtin.include_vars:
        file: data/fabric_settings.yaml
        name: fabric_settings

    - name: Load inventory data
      ansible.builtin.include_vars:
        file: data/fabric_inventory.yaml
        name: fabric_inventory

    - name: Create the fabric via the NDFC REST API
      cisco.dcnm.dcnm_rest:
        method: POST
        path: /appcenter/cisco/ndfc/api/v1/lan-fabric/rest/control/fabrics
        json_data: '{{ fabric_settings | to_json }}'

    - name: Wait for the switches to appear in the POAP inventory
      ansible.builtin.pause:
        seconds: 300

    - name: Provision switch configuration
      cisco.dcnm.dcnm_inventory:
        fabric: '{{ fabric_settings.fabricName }}'
        state: merged
        config: '{{ fabric_inventory.inventory_data.switches }}'
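If you want the play to fail fast when the fabric was not created, a quick read-back check can be added after the POST. A minimal sketch, assuming a GET on the fabric path returns the fabric object in a DATA key (verify the exact path and response shape for your NDFC release):

- name: Verify the fabric exists
  cisco.dcnm.dcnm_rest:
    method: GET
    # ASSUMPTION: reading a single fabric mirrors the POST path above
    path: /appcenter/cisco/ndfc/api/v1/lan-fabric/rest/control/fabrics/BGP_EVPN_POAP
  register: fabric_check
  failed_when: fabric_check.response.DATA is not defined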
Building the pipeline

The pipeline itself is very simple and consists of three stages:

  • BUILD
    The Docker container will be created and uploaded to the GitLab Docker registry 
  • LINTING
    Ansible playbook syntax will be checked against linting rules
  • DEPLOY
    Ansible playbook will be executed

variables:
  IMAGE_NAME: $CI_REGISTRY_IMAGE/ndfc-automation  
  IMAGE_TAG: "1.0"

stages:
  - build
  - linting
  - deploy

build_image:
  stage: build
  tags:
    - shell-runner
  script:
    - docker build -t $IMAGE_NAME:$IMAGE_TAG .

push_image:
  stage: build
  needs:
    - build_image
  tags:
    - shell-runner
  before_script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
  script:
    - docker push $IMAGE_NAME:$IMAGE_TAG

ansible_linting:
  stage: linting
  needs:
    - push_image
  image: $IMAGE_NAME:$IMAGE_TAG
  tags:
    - docker-runner
  before_script:
    - cd ansible
  script:
    - ansible-lint deploy_provision_fabric.yaml

ansible_deploy:
  stage: deploy
  needs:
    - ansible_linting
  image: $IMAGE_NAME:$IMAGE_TAG
  tags:
    - docker-runner
  before_script:
    - cd ansible
  script:
    - ansible-playbook deploy_provision_fabric.yaml
Run the pipeline

Let’s run the pipeline and validate the process:

First, the Ansible play creates the fabric:

After some time, once the switches have booted, they start the POAP process and send out DHCP requests.

As the fabric has the DHCP server option enabled and a range is assigned, the switches will receive an IP address. 

Once the switches become available, after a few minutes, they will be added to the fabric.

The switches will download the POAP Python script from the NDFC server, configure basic connectivity (IP address and credentials), and reboot.

The Ansible task waits for the switches to come back online before applying the role-related (spine/leaf) configuration. This process can take 10-15 minutes.

Once the switches come back online, their config state is unknown; NDFC will sync it after a while.

Once the config state is synced, NDFC will push the configuration to the switches. 

You can see the process in more detail if you go back to the pipeline output. The Ansible module will:

  • Add the switches
  • Wait for the devices to be rediscovered
  • Assign the roles
  • Save the configuration
  • Deploy the configuration to the devices
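
The last two steps can also be triggered explicitly over the REST API, which is handy if you ever need to redeploy without re-running the inventory module. A minimal sketch; the config-save and config-deploy endpoint paths are assumptions to verify against your NDFC release:

# Hypothetical explicit save & deploy via dcnm_rest; verify both paths
- name: Save the fabric configuration
  cisco.dcnm.dcnm_rest:
    method: POST
    path: /appcenter/cisco/ndfc/api/v1/lan-fabric/rest/control/fabrics/BGP_EVPN_POAP/config-save

- name: Deploy the pending configuration to the switches
  cisco.dcnm.dcnm_rest:
    method: POST
    path: /appcenter/cisco/ndfc/api/v1/lan-fabric/rest/control/fabrics/BGP_EVPN_POAP/config-deploy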

After a while, you should see that the entire pipeline was executed successfully. 😃
