Cisco SDWAN Design Series, Part 3: Control and Data Plane Logic

Part 2 of this blog series discussed the architecture components of Cisco SDWAN. This blog dives into the routing/control plane logic, along with design considerations.

Before we jump into the policy structure and types (the topic of the next blog), let’s first see how Cisco SDWAN edge nodes interact with the central control plane controller, the vSmart. (This interaction technically comes after the bring-up/initialization process, which is not covered in this blog.) Part 2 of this series highlighted that, viewed from 10,000 feet, Cisco SDWAN uses the same architectural concept as the reliable, scalable, and multi-tenant MPLS L3VPN architecture. Think of the overlay SDWAN fabric as the LSP path, the central control plane as the BGP RR, and the control protocol used between the SDWAN edge node and the central controller as playing the role of BGP in MPLS L3VPN.

The control protocol used between the SDWAN edge nodes and the central controller (acting like BGP in MPLS L3VPN) is called the Overlay Management Protocol (OMP).

OMP is a TCP-based, highly extensible control plane protocol that combines all control plane functions under a single protocol umbrella. It operates inside bidirectionally certificate-authenticated TLS or DTLS connections established among the vSmart controllers, and between the vSmart controllers and the WAN Edge routers.

Like the flexible BGP, OMP leverages the concepts of address families and route attributes to advertise all relevant control plane information among the SDWAN edge nodes. This information is used to form the SDWAN fabric, that is, to establish direct IPSec sessions and communication among the SDWAN edge nodes without reliance on the IKE protocol. (The control plane information used to form the IPSec sessions is referred to as the TLOC, which is covered later in this blog.) In addition, reachability information can be manipulated, propagated, and controlled without relying on traditional routing protocols such as OSPF and BGP and their distributed routing policies.

With this approach, combining the central vSmart controller with OMP, Cisco SDWAN can offer a high degree of scalability by dramatically lowering control plane complexity and eliminating the n^2 problem associated with traditional IKE-based IPSec networks. As shown below, OMP and the vSmart controllers together create a linear-complexity control plane in which SDWAN edge nodes establish control plane connectivity only to the vSmart controller(s) (the same concept as using a BGP RR in a BGP environment). This reduces complexity for two main control plane functions: the routing peering sessions used for reachability information exchange, and the IPSec IKE sessions used to build the data plane. By comparison, in classical IKE-based IPSec networks each router must establish a control plane connection to every other router in the topology, resulting in quadratic (n^2) control plane complexity that does not scale.
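To put rough numbers on the quadratic versus linear comparison, here is a minimal Python sketch. The function names and the choice of two controllers are assumptions made for illustration only, and the count deliberately ignores the sessions the controllers keep among themselves.

```python
def full_mesh_sessions(n: int) -> int:
    """Pairwise control sessions when every router peers with every other router."""
    return n * (n - 1) // 2


def controller_sessions(n: int, controllers: int = 2) -> int:
    """Control sessions when each edge node peers only with the controllers.

    'controllers' is an illustrative parameter, not a product default.
    """
    return n * controllers


for n in (10, 100, 1000):
    # Columns: edge nodes, full-mesh sessions, controller-based sessions
    print(n, full_mesh_sessions(n), controller_sessions(n))
```

With 1000 edge nodes, the full-mesh model needs 499500 pairwise control sessions, while the controller-based model needs 2000 under these assumptions.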

Furthermore, OMP facilitates the propagation of centrally defined policies (configured through vManage) that require distributed, scale-out enforcement on the SDWAN edge nodes. These are typically data policies and application-aware routing policies.

As illustrated above, more than one vSmart controller should be deployed for redundancy. These controllers exchange OMP messages among themselves so that they have an identical view (routing information) of the SD-WAN fabric. From the c/vEdge nodes' point of view, each node can connect to more than one vSmart controller for redundancy. As long as an edge node can connect to at least one vSmart controller, there should be no impact on fabric connectivity or reachability information exchange.

If all vSmart controllers fail or become unreachable, the impacted SDWAN edge nodes continue to operate on the last known good state for a configurable amount of time.

Keep in mind that the main concern here is not how long the edge nodes can keep operating. Rather, such a failure scenario (all controllers failed or unreachable) indicates a bigger issue to worry about, such as the failure of an entire DC or of the provider WAN/Internet link(s).

From a routing and reachability point of view, there are three types of routes, or information updates, that OMP carries (learns and propagates):

Route > the routing information learned by each SDWAN edge node (LAN/service-side networks) and advertised to the vSmart.

TLOC > each SDWAN edge node advertises its TLOC information to the vSmart, and the vSmart propagates this information. The TLOC concept is described in more detail below.

Service > refers here to the VPN labels, which can relate to a VRF/virtual network, or to a service label used for traffic engineering or security service chaining.

The transport locator, or TLOC, is a key construct that makes Cisco SDWAN a transport-independent solution, because it creates an abstraction layer over the underlay transport link/path. As a result, Cisco SD-WAN can build a secure overlay fabric on top of any public or private transport, such as MPLS, Internet, 4G LTE, satellite, or point-to-point circuits, without depending on the actual link/transport IP address, which could be dynamic, behind a NAT device, and so on. The TLOCs therefore act as the abstraction layer that defines the IPSec tunnel endpoints when forming the secure overlay fabric. Specifically, the Cisco SD-WAN fabric uses the tuple [system IP, color, encapsulation] to define IPSec tunnel termination endpoints, which allows independence from individual transport IP addressing.

So what does a TLOC consist of, and how does it propagate across the SDWAN fabric?

A TLOC consists of:

  • System-IP: IPv4 Address (non-routed identifier)
  • Color: Interface identifier on local c/vEdge
  • Private TLOC: IP Address on interface sitting on inside of NAT
  • Public TLOC: IP Address on interface sitting on outside of NAT
  • Private/Public can be the same if connection is not subject to NAT
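The fields listed above can be modeled as a small record. The following Python sketch is only an illustration: the class name, field names, and sample addresses are hypothetical and are not an actual OMP message format. The key follows the [system IP, color, encapsulation] tuple mentioned earlier.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tloc:
    """Illustrative TLOC record; field names are assumptions, not an OMP encoding."""
    system_ip: str        # non-routed identifier of the edge node
    color: str            # identifies the transport interface on the local edge
    private_ip: str       # address on the inside of NAT
    public_ip: str        # address on the outside of NAT
    encap: str = "ipsec"  # encapsulation, per the [system IP, color, encapsulation] tuple

    @property
    def key(self) -> Tuple[str, str, str]:
        # The tunnel endpoint identity does not include any transport IP address.
        return (self.system_ip, self.color, self.encap)

    @property
    def behind_nat(self) -> bool:
        # Private and public addresses match when the link is not subject to NAT.
        return self.private_ip != self.public_ip


# Sample values are made up (documentation and private address ranges).
internet = Tloc("10.255.0.1", "biz-internet", "192.168.10.2", "203.0.113.10")
mpls = Tloc("10.255.0.1", "mpls", "172.16.1.2", "172.16.1.2")
print(internet.key, internet.behind_nat)
print(mpls.key, mpls.behind_nat)
```

Because the key omits the transport addresses, the public address of a link can change (for example with a dynamic IP) without changing the identity of the tunnel endpoint in this model.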

As mentioned earlier, TLOCs are advertised as TLOC routes in the OMP messages exchanged between the SDWAN edge nodes via the vSmart controllers. The vSmart controllers reflect TLOC reachability among the SDWAN edge nodes across the fabric. Because TLOCs are advertised as OMP TLOC routes, control policies can be applied centrally at the vSmart to block certain TLOC route advertisements or to modify their attributes before passing them along. Once the TLOCs are advertised, vEdge routers can construct direct IPSec tunnels between themselves. By default, vEdge routers construct a full-mesh topology.
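The effect of reflecting or blocking TLOC routes on the resulting topology can be sketched in a few lines of Python. This is a simplified model and not vSmart behavior in detail: the function names are hypothetical, each edge node has a single TLOC, a control policy is reduced to a yes/no filter per receiver, and a tunnel is assumed to form only when both edge nodes have learned each other's TLOC.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

Tloc = Tuple[str, str]                # (system_ip, color), simplified
Policy = Callable[[str, Tloc], bool]  # (receiver system_ip, tloc) -> advertise?


def reflect(tlocs: List[Tloc], policy: Optional[Policy] = None) -> Dict[str, List[Tloc]]:
    """Return, per receiving edge node, the TLOC routes the controller passes on."""
    allow = policy or (lambda receiver, tloc: True)
    receivers = sorted({system_ip for system_ip, _ in tlocs})
    return {
        rx: [t for t in tlocs if t[0] != rx and allow(rx, t)]
        for rx in receivers
    }


def tunnel_pairs(view: Dict[str, List[Tloc]]) -> Set[FrozenSet[str]]:
    """Edge pairs in which each side has learned a TLOC of the other side."""
    pairs: Set[FrozenSet[str]] = set()
    for rx, learned in view.items():
        for remote_ip, _color in learned:
            if any(ip == rx for ip, _ in view[remote_ip]):
                pairs.add(frozenset((rx, remote_ip)))
    return pairs


HUB = "10.255.0.1"  # made-up system IPs
tlocs = [(HUB, "mpls"), ("10.255.0.2", "mpls"), ("10.255.0.3", "mpls")]

# No policy: every TLOC is reflected to every other edge node (full mesh).
full_mesh = tunnel_pairs(reflect(tlocs))

# Filter: spokes only receive the hub's TLOC (hub-and-spoke).
hub_and_spoke = tunnel_pairs(reflect(tlocs, lambda rx, t: rx == HUB or t[0] == HUB))

print(len(full_mesh), len(hub_and_spoke))
```

With three edge nodes, the unfiltered case yields three tunnels and the filtered case yields two, which shows how a central decision about what to advertise shapes the data plane topology.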

The following figures illustrate a simplified Cisco SDWAN fabric build-up among the SDWAN edge nodes (this process happens after the bring-up process of each c/vEdge node).

First, the edge node needs to establish a control tunnel with the controller(s).

Next, the overlay build-up process takes place.

What about reachability/routing information?

What if there are multiple virtual networks that need to be carried and isolated over the WAN?

As illustrated in the figure above, VRF/VPN membership is advertised as part of the OMP updates, and the data plane traffic of each virtual network is carried over the SDWAN overlay with its own VPN label, so there is no need to create a separate transport tunnel per virtual network. This is the same concept as MPLS L3VPN with MP-BGP, where the VPN label is added as part of the label stack over the same transport network. The figure above also shows VPN 0. This is an isolated transport VPN that acts exactly like the front-door VRF (FVRF) concept used with GRE/DMVPN and previously with IWAN.
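The idea of one tunnel carrying several isolated virtual networks can be sketched as a label lookup on the receiving edge node. The Python below is a conceptual model only: the class, the label values, and the packet representation are hypothetical and do not reflect the actual encapsulation format.

```python
from typing import Dict, Tuple


class Edge:
    """Toy receiving edge node that maps VPN labels back to VPNs."""

    def __init__(self, first_label: int = 1001) -> None:
        # Label values are arbitrary and chosen for this sketch only.
        self._next_label = first_label
        self.label_to_vpn: Dict[int, int] = {}

    def advertise_vpn(self, vpn_id: int) -> int:
        """Allocate a label for a VPN; in the fabric this would travel in OMP updates."""
        label = self._next_label
        self._next_label += 1
        self.label_to_vpn[label] = vpn_id
        return label

    def receive(self, packet: Tuple[int, bytes]) -> Tuple[int, bytes]:
        """Demultiplex a (label, payload) packet into its VPN."""
        label, payload = packet
        return self.label_to_vpn[label], payload


remote = Edge()
labels = {vpn: remote.advertise_vpn(vpn) for vpn in (10, 20)}

# Both packets share the same tunnel; only the label differs.
tunnel = [(labels[10], b"vpn10-data"), (labels[20], b"vpn20-data")]
delivered = [remote.receive(p) for p in tunnel]
print(delivered)
```

The sender needs only the label learned for each VPN, so adding another virtual network adds a label and not another tunnel.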

Last but not least, what if the private WAN provider does not advertise the CE-PE link IP addresses? In that case, if you go to the CE/edge router and run ‘show ip route’, you will not see the physical IPs of the remote CEs/edge nodes. As described earlier, the data plane (IPsec mesh) is established using the TLOC construct, which contains the physical IP and color of each link. Here, the physical IPs are not reachable, so the IPsec or GRE tunnels among the edge nodes cannot be established.

To overcome this issue, you can create a loopback interface on the SDWAN edge nodes and use it for the data plane instead of the physical interface that connects to the WAN. The loopback needs to be advertised to the WAN provider using the same protocol that is used with the provider, such as BGP. With this approach, the loopback interface acts like a physical interface from the TLOC point of view, and it can terminate both the control plane DTLS tunnel and the data plane IPsec tunnel connections. Because this loopback interface acts as a transport interface, it has to be configured as part of VPN 0.

The next blog will discuss Cisco SDWAN policies and the various capabilities and design options.

Marwan Al-shawi – CCDE No. 20130066, Google Cloud Certified Architect, AWS Certified Solutions Architect, Cisco Press author (author of the Top Cisco Certifications’ Design Books “CCDE Study Guide and the upcoming CCDP Arch 4th Edition”). He is an experienced Technical Architect. Marwan has been in the networking industry for more than 12 years and has been involved in architecting, designing, and implementing various large-scale networks, some of which are global service provider-grade networks. Marwan holds a Master of Science degree in internetworking from the University of Technology, Sydney. Marwan enjoys helping and assessing others. Therefore, he was selected as a Cisco Designated VIP by the Cisco Support Community (CSC) (official Cisco Systems forums) in 2012, and by the Solutions and Architectures subcommunity in 2014. In addition, Marwan was selected as a member of the Cisco Champions program in 2015 and 2016.