Disclaimer: this blog is based on my own views and findings; it does not represent the view of any company or anyone else. It is also not intended to show or prove which cloud provider’s solution is better. Instead, it is intended to analyze and discuss the different external connectivity models each cloud provider offers, which enterprises can use today to address hybrid architecture requirements. Ideally, as a cloud or network architect you need to be aware of such capabilities so you can take them into consideration, especially when designing a multi-cloud hybrid model with an enterprise SD-WAN solution, such as Cisco SD-WAN.
At the time of writing, this is the only resource you may find with a comprehensive analysis and visualization of this topic.
The previous blog “WAN routing in the cloud era” focused on WAN routing to and from the cloud and how Cisco SD-WAN can help facilitate this routing, while “Networking in AWS vs. Google Cloud Platform – Design Considerations” focused on routing in the cloud and the design considerations of Amazon Web Services (AWS) and Google Cloud Platform (GCP).
Today, hybrid cloud computing (public cloud + On-Prem) is becoming the industry model in several regions, addressing different use case scenarios such as DR as a Service, stream/batch processing of big data and IoT-collected traffic, as well as AI and the platform- or software-as-a-service capabilities offered by the cloud.
With the scale of today’s services and applications and their data, along with the different connectivity models, thorough network planning, analysis and design is a must to build such a model.
That’s why it is key today to master the networking designs of the hybrid architecture in order to build successful connectivity, which ideally should be powered by a reliable and cloud-ready SD-WAN solution such as Cisco SD-WAN.
In addition, one of the main benefits of using the cloud is the ability to provision resources as you grow, so you don’t need to provision for the maximum capacity before you need it. However, for the hybrid architecture, in some scenarios you still need to plan, design and provision your network connectivity from On-Prem to the cloud to cater for the maximum required capacity. This is especially true for some DR-as-a-service scenarios where part of the application stack might fail over from On-Prem to the cloud, which requires a fair amount of bandwidth (“design for failure”).
Therefore, this blog focuses on describing and analyzing the hybrid cloud connectivity models and the possible design options. There is no single best answer; you, as the designer or architect of the solution, will be the one who makes the design decisions that fit the different requirements and constraints of your scenario. Let’s list, analyze and compare the available connectivity options that GCP, AWS and Azure offer.
A VPN is simply a secure way to extend an On-Prem site/DC to a cloud provider network through an IPsec VPN tunnel. The encrypted traffic traverses the public Internet between these two networks. In general, VPN is a viable connectivity option for low-volume data transfer or as a backup/redundant path to a higher-speed link. The following table summarizes the attributes of the VPN solution per cloud provider.
| | GCP | AWS | Azure |
|---|---|---|---|
| Solution name | Cloud VPN | AWS VPN/CloudHub | VPN Gateway |
| Max. throughput* | ~3 Gbps per tunnel over a Direct Peering link; 1.5 Gbps per tunnel over the public Internet** | 1.25 Gbps aggregate for all tunnels on the same gateway | 1.25 Gbps aggregate for all tunnels on the same gateway |
| VPN GW considerations | More than one GW can be added for optimized HA/throughput, but a separate GW per region is required | Only one VPN GW can be attached to a VPC at a time | Single VPN GW per VNet |
| Tunnel limits | 10 per project; can be increased by submitting a quota increase request | 10 per VPN GW and 50 per region | 30 per VPN GW |
*Practically, this should not be treated as an exact, guaranteed throughput: it depends on Internet traffic conditions at forwarding time, on application behavior, and on packet size (a significant percentage of smaller packets can reduce overall throughput). The throughput of the On-Prem VPN gateway also has an impact if it does not support the same or higher VPN throughput capacity.
**If a single tunnel does not provide the necessary throughput, GCP lets you distribute traffic across multiple tunnels to provide additional throughput.
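For illustration, adding a second Classic VPN tunnel on the same gateway could be sketched with gcloud as below; all names, the peer address and the shared secret are hypothetical placeholders:

```shell
# Hypothetical sketch: a second tunnel terminating on the same target VPN
# gateway; traffic can then be spread across the tunnels for extra throughput.
gcloud compute vpn-tunnels create tunnel-2 \
    --region=us-central1 \
    --target-vpn-gateway=my-vpn-gw \
    --peer-address=203.0.113.10 \
    --shared-secret=<secret> \
    --ike-version=2
```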
As shown in the figure below, all the cloud providers offer VPN connectivity over multiple tunnels for HA purposes, sourced from a single or dual On-Prem HW/SW VPN node, such as a Cisco router or firewall.
When it comes to VPN HA design options combined with adding more throughput, GCP has more flexibility as illustrated below.
In addition, as described in the previous blog “Networking in AWS vs. Google Cloud Platform – Design Considerations”, a VPC in GCP can span multiple regions, which means accessing a VPC over VPN can provide global access to the VPC’s resources. You can also take advantage of prefix tagging to control which resource/VM instance should receive a certain route (the virtual network router selects the next hop for a packet by consulting the routing table for that instance). As illustrated below, Compute Engine-3 has no “VPN” tag, which means it won’t see routes tagged with “VPN” coming from the VPN Gateway.
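As a rough sketch, this tag-based route scoping maps to the `--tags` flag of a static route pointing at the VPN tunnel (network, range and tunnel names are hypothetical):

```shell
# Only instances carrying the network tag "vpn" receive this route;
# an untagged instance (like Compute Engine-3 above) never sees it.
gcloud compute routes create onprem-via-vpn \
    --network=my-vpc \
    --destination-range=10.20.0.0/16 \
    --next-hop-vpn-tunnel=tunnel-1 \
    --next-hop-vpn-tunnel-region=us-central1 \
    --tags=vpn
```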
In addition, at Next ‘19, GCP announced its new advanced VPN connectivity option for customers with mission-critical connectivity needs, referred to as ‘High Availability (HA) VPN’; it is in beta release now. With HA VPN, enterprises can connect their on-premises deployment to a GCP Virtual Private Cloud (VPC) with an industry-leading SLA of 99.99% at general availability, plus a simplified setup. For more info, refer to the links below:
In contrast, to achieve the same with AWS you need to build a transit VPC with a software VPN on top of an EC2 instance, such as a Cisco CSR 1000v, and then connect to each VPC in a different region over point-to-point VPN using the VGW.
The VPN between the VPCs above can possibly be established over the Internet (over the VPC IGW); to ensure it is established over the AWS backbone, you could build the VPN over VPC peering.
On the other hand, the AWS CloudHub capability offers the flexibility to connect multiple On-Prem sites to a single AWS VPC. The AWS CloudHub connectivity model is suitable for scenarios where you have multiple branch offices with an existing Internet connection. The goal here is to deploy a convenient, and possibly low-cost, hub-and-spoke connectivity model to act as either primary or backup connectivity between these remote sites and the workload in the AWS VPC.
Note: VPN may not be a transitive connectivity model. For example, if communication from an On-Prem DC/site to multiple AWS VPCs is required over a transit VPC with VGW VPN, you may need to use a proxy function instance or consider a software-based VPN on EC2 instance(s) to allow such communication. Alternatively, a direct interconnect can be used.
Connecting an On-Prem DC to the cloud over VPN may not always provide the required performance or security for large-scale networks. Therefore, AWS, Azure and Google, along with their colocation exchange partners, offer the ability to establish connectivity over a dedicated link/network that can provide guaranteed performance and a more consistent network experience than Internet-based transport. This connectivity model helps enterprises establish a dedicated network connection from the On-Prem DC to its cloud provider. The table below summarizes the characteristics of this connectivity model per cloud provider.
| | GCP | AWS | Azure |
|---|---|---|---|
| Solution name | Cloud Interconnect | AWS Direct Connect | ExpressRoute |
| Link speed | 500 Mbps to 100 Gbps** | 1 Gbps or 10 Gbps* | 50 Mbps to 10 Gbps |
| Link provisioning | By customer or intermediate/partner provider as an L2 link | By customer or intermediate/partner provider as an L2 link | Intermediate/partner provider as an L2 link |
*Can be increased using link aggregation (LAG).
** Another announcement GCP made at Next ‘19 was 100 Gbps Dedicated Interconnect.
Because Google Cloud Platform supports a single VPC spanning multiple regions, by connecting an On-Prem DC over GCP Cloud Interconnect to a single region/POP and peering BGP with the cloud router, you gain reachability to all of that VPC’s networks across the globe.
With this capability, an On-Prem DC can have access to the different zones distributed globally within the VPC as illustrated in the figure below.
Also, you have the flexibility to configure the cloud router to either advertise only the local regional route or all the routes that are part of the VPC across different regions.
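A hedged sketch of the knob involved (the VPC name is hypothetical): the VPC’s dynamic routing mode determines whether the Cloud Router advertises only its local region’s subnets or all subnets across regions:

```shell
# Regional (default): the Cloud Router advertises only subnets in its own region
gcloud compute networks update my-vpc --bgp-routing-mode=regional

# Global: the same Cloud Router advertises subnets from every region of the VPC
gcloud compute networks update my-vpc --bgp-routing-mode=global
```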
In some scenarios where you need to isolate workloads for security, billing or compliance reasons, different applications can be distributed across different projects. Each project can have its own cloud router, and over the same Cloud Interconnect you can separate the traffic and BGP sessions by using 802.1Q tags and VRFs, as illustrated in the figure below.
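A minimal sketch, assuming a Dedicated Interconnect named `my-interconnect` (all other names hypothetical): each project’s Cloud Router gets its own VLAN attachment (802.1Q tag) over the same physical interconnect, which the on-prem router then terminates in a per-project VRF/subinterface:

```shell
# One VLAN attachment per project/Cloud Router over the shared interconnect
gcloud compute interconnects attachments dedicated create attach-proj-a \
    --project=project-a \
    --region=us-central1 \
    --router=cloud-router-a \
    --interconnect=my-interconnect

gcloud compute interconnects attachments dedicated create attach-proj-b \
    --project=project-b \
    --region=us-central1 \
    --router=cloud-router-b \
    --interconnect=my-interconnect
```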
With AWS Direct Connect, a virtual interface (VIF) is required for each VPC, which maps to a VLAN ID/tag and a VRF at the customer/partner router. This means the more VPCs you have, the more VIFs, interfaces, BGP sessions and VRFs you need to configure and manage.
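For example, a private VIF for one VPC could be provisioned with the AWS CLI roughly as follows (connection ID, VLAN, ASN and VGW ID are hypothetical); every additional VPC repeats this with a new VLAN and VGW:

```shell
# One private VIF = one VLAN tag + one BGP session, bound to one VGW/VPC
aws directconnect create-private-virtual-interface \
    --connection-id dxcon-xxxxxxxx \
    --new-private-virtual-interface \
      virtualInterfaceName=vif-vpc-a,vlan=101,asn=65000,virtualGatewayId=vgw-xxxxxxxx
```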
In addition, the scope of AWS Direct Connect is within a region, which means that if you have VPCs in different regions you may end up setting up a different Direct Connect per region.
This may not have a significant impact on many medium-size organizations; however, for enterprises with global scale it can lead to some limitations and require design workarounds. Therefore, AWS recently introduced the Direct Connect Gateway. As illustrated in the figure below, with the Direct Connect Gateway you can connect up to 10 VPCs per VIF across multiple regions. This gateway is not intended to, and does not, support passing traffic between VPCs; you can use VPN or VPC peering for such communication if required.
As a result, an On-Prem DC can have global access to the different VPCs located in different regions (within the same account).
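The steps above can be sketched with the AWS CLI (IDs and the ASN are hypothetical): create the Direct Connect Gateway once, then associate the VGW of each regional VPC with it:

```shell
# Create the global Direct Connect Gateway with a private Amazon-side ASN
aws directconnect create-direct-connect-gateway \
    --direct-connect-gateway-name my-dxgw \
    --amazon-side-asn 64512

# Associate a VGW from any region (repeat per VPC, up to 10 per gateway)
aws directconnect create-direct-connect-gateway-association \
    --direct-connect-gateway-id <dxgw-id> \
    --virtual-gateway-id vgw-xxxxxxxx
```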
Also, AWS introduced the Transit Gateway: “With AWS Transit Gateway, you only have to create and manage a single connection from the central gateway in to each Amazon VPC, on-premises data center, or remote office across your network. Transit Gateway acts as a hub that controls how traffic is routed among all the connected networks which act like spokes.” For more details, refer to this blog.
With GCP, you can share a VPC network from one project with instances in another project within the same organization using Shared VPC. This fits scenarios like multi-tenancy deployments, or separating the administration of some application stacks among different teams, while all these projects still use one shared VPC network (the host VPC).
In contrast, with Microsoft Azure, customers must connect using a partner-provided link or node. For a customer-managed router, the link is provided by a partner carrier as a point-to-point Ethernet connection or through a virtual cross-connection via an Ethernet exchange, and the customer establishes and manages the BGP session with Microsoft Azure. Over the same interconnect (ExpressRoute), MS Azure can establish multiple BGP sessions with its customers’ On-Prem network for different traffic profiles, as illustrated below.
An ExpressRoute circuit can have any one of the above peerings, two, or all three enabled per ExpressRoute circuit. In turn, each peering is established by a pair of independent BGP sessions configured redundantly for high availability. Each circuit has a fixed bandwidth (50 Mbps, 100 Mbps, 200 Mbps, 500 Mbps, 1 Gbps, 10 Gbps) and is mapped to a connectivity provider and a peering location. The bandwidth you select is shared across all the peerings for the circuit.
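As a rough sketch with the Azure CLI (resource group, circuit name, ASN and /30 peering subnets are hypothetical), each peering type is enabled separately on the same circuit:

```shell
# Enable private peering on an existing ExpressRoute circuit;
# the /30 subnets carry the redundant pair of BGP sessions.
az network express-route peering create \
    --resource-group my-rg \
    --circuit-name my-circuit \
    --peering-type AzurePrivatePeering \
    --peer-asn 65010 \
    --primary-peer-subnet 192.168.10.0/30 \
    --secondary-peer-subnet 192.168.10.4/30 \
    --vlan-id 100
```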
Although GCP, AWS and Azure offer a dedicated interconnect service that allows customers to establish a high-speed direct circuit between their On-Prem datacenter(s) and the cloud provider, this connectivity model requires proximity to one of the provider’s regions or points of presence. On the other hand, the Partner/Carrier Interconnect extends this service to a wider scale, covering enterprises that aren’t geographically close enough, or that may not even require the full power of a high-speed dedicated circuit. Depending on the scenario, this connectivity model could be provisioned as a dedicated L3 link from the partner carrier, or it can be integrated into an existing MPLS L3 VPN provided by the same carrier, in which case it is added as an additional site to the customer’s MPLS L3 VPN.
With this model, the supported bandwidth can be more flexible, because customers are not limited to 1G or 10G links and can obtain lower speeds.
However, the SLA between you and the cloud provider may vary in such scenario, as exemplified below.
GCP, AWS and Azure all offer the ability to establish public/global peering to access the provider specific public services using public IP addresses.
With Azure, as discussed before, this can be a separate BGP session over the same ExpressRoute circuit, while with AWS this can be established over a separate VIF for public services access over AWS Direct Connect.
On the other hand, one of Google’s key differentiators is that it has its own large global fiber network, in which GCP is capable of injecting users’ traffic into the Google backbone network as close to the user connection as possible. As a result, GCP has the ability and flexibility to offer two types of public peering: Direct Peering and Partner Peering.
With Direct Peering, “Google allows you to establish a direct peering connection between your business network and Google’s. With this connection you will be able to exchange Internet traffic between your network and Google’s at one of our broad-reaching Edge network locations”. Partner Peering offers the same, but through an intermediate SP/ISP.
Technically, this is achieved by exchanging BGP routes between Google and the peering entity, which can then be used to reach all of Google’s services, including the full suite of Google Cloud Platform products. This peering can be direct between the enterprise and Google, or over an intermediate/transit/partner carrier.
In fact, Direct Peering with Google has several advantages, such as:
However, there are a few prerequisites that must be met before being able to establish the direct peering, such as: having a publicly routable ASN, publicly routable address space (at least one /24 of IPv4 and/or one /48 of IPv6 space), and standard port types: 10G Duplex Ethernet LR (LAN-PHY), 100G Ethernet LR4, etc.
From a general design point of view, it is always recommended to have redundant links, which can run over two or more distinct transports, or be mixed with VPN tunnels. Traffic engineering can be achieved using BGP attributes such as AS-PATH, MED, etc., depending on the BGP attributes supported by the cloud provider as well as the intermediate/transit carrier.
However, by adding a secondary link/path you may encounter asymmetrical routing issues.
Technically, there are a couple of ways to overcome such issues, such as source NAT or route advertisement control. For example, if you want to use the Internet for authentication traffic and ExpressRoute for your mail traffic, you should not advertise your Active Directory Federation Services (AD FS) public IP addresses over ExpressRoute. Similarly, be sure not to expose an on-premises AD FS server to IP addresses that the router receives over ExpressRoute. Routes received over ExpressRoute are more specific, so they make ExpressRoute the preferred path for authentication traffic to Microsoft, which causes asymmetric routing.
Also, you may consider deploying a hub-and-spoke topology for the hybrid cloud model, where the hub is a virtual network (VNet) in Azure that acts as a central point of connectivity to your on-premises network. The spokes are VNets that peer with the hub and can be used to isolate workloads. Traffic flows between the on-premises datacenter and the hub through an ExpressRoute or VPN gateway connection.
As shown in the figure above, there is a VNet peering between the hub and each spoke VNet. Although peering provides a low-latency connection between VNets, peering connections are non-transitive, so they can’t extend connectivity to other VNets or to the On-Prem DC. One way to overcome this is by deploying a virtual network appliance, such as a Cisco firewall or SD-WAN virtual edge node, to do the routing at the hub, and using explicit user-defined routes (UDRs) in the spokes to point/forward traffic to the hub.
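A minimal sketch of such a UDR with the Azure CLI (resource group, names, prefix and NVA address are hypothetical): the spoke’s route table sends traffic for on-premises prefixes to the virtual appliance in the hub:

```shell
# Route table attached to the spoke subnet; next hop is the hub NVA's IP
az network route-table create --resource-group my-rg --name spoke-rt

az network route-table route create \
    --resource-group my-rg \
    --route-table-name spoke-rt \
    --name to-onprem-via-hub \
    --address-prefix 10.0.0.0/16 \
    --next-hop-type VirtualAppliance \
    --next-hop-ip-address 10.1.0.4
```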
Adding a cloud-ready SD-WAN such as Cisco SD-WAN will, without any doubt, make traffic engineering over different transports simpler, more efficient and application-aware, and capable of controlling more complex routing scenarios. For instance, integrating the Cisco Cloud Services Router (CSR) or Cisco vEdge Cloud (formerly Viptela vEdge Cloud, a software router that supports all of the capabilities available on Cisco’s industry-leading SD-WAN platform) on virtual cloud instances allows customers to seamlessly and securely connect all their branch and On-Prem data center sites into public cloud environments, along with:
Note: you should always refer to your cloud provider website to verify the exact details such as sizing, features etc. because they may change frequently.