Disclaimer: the content of this blog is based solely on my personal views and experience, and does not represent the views of any company or anyone else. The content is intended for educational purposes only; it is not an official whitepaper or best-practices document. Therefore, you must always refer to the latest official AWS documentation before applying anything discussed in this blog series in any AWS environment.
The previous blog covered the common drivers for considering a hybrid model. This blog focuses on describing and analyzing the connectivity options that are available at the time of writing.
That being said, this blog will not recommend a specific connectivity model, simply because there is no single best answer. You, as the designer or architect of the solution, are the one who makes the design decision that fits the requirements and constraints of your scenario/use case. Nonetheless, a very good understanding of the attributes and limitations of each connectivity model is key to making an architectural and design decision.
Designing a large-scale hybrid connectivity model is a complex topic. Therefore, this blog focuses on the connectivity options, while the following blog will discuss some of the key design considerations of such a connectivity model.
At a high level, there are two options for connecting an on-premises network to the AWS Cloud (from a networking perspective, excluding the other options for data/storage transfer): Site to Site VPN and AWS Direct Connect.
However, each of these options can be designed and deployed in different ways to achieve different goals in terms of scale, performance, and global vs. regional reach.
This blog (part 1 of the topic) discusses the “Site to Site VPN” connectivity option, while the following blog (part 2) will focus on the Direct Connect connectivity option.
Site to Site Virtual Private Network (VPN)
A site to site IPSec VPN enables two separate networks (sites) to communicate securely with each other over another network that might be an untrusted transport, such as the public internet.
The two interconnected sites can be VPCs within a region or across different regions, or a VPC and an on-premises data center site. This blog focuses on the latter option.
In AWS, Site to Site VPN can be achieved using one of two approaches:
AWS managed VPN service: with this approach, AWS provides the underlying infrastructure, software, etc. required to build the service, along with a pre-specified availability/SLA. This is accomplished by using one of the options below:
Site to Site VPN with Virtual Private Gateway (VGW) Headend
In the figure below, a VPC has an attached virtual private gateway that is connected over an IPSec tunnel to the on-premises remote network, which has a customer gateway (CGW). Typically, the customer gateway must be configured to enable the Site-to-Site VPN connection, as well as the required routing (static or dynamic/BGP) to maintain valid routing reachability information.
Note: as shown in the figure above, there are two IPSec tunnels at the AWS VGW side. These tunnels are created by default with each VPN connection at the VGW. The following figure provides more details, with a BGP session establishment packet flow.
From the on-premises CGW side, both tunnels can be used toward the VGW.
However, from the VGW side, one tunnel is selected to forward traffic outbound.
To avoid asymmetric routing, where traffic from the on-premises device to AWS rides over one tunnel while the return traffic rides over the other, you can use dynamic Site to Site VPN tunnels with BGP in active-standby mode, along with BGP attributes (AS-PATH prepending or MED), to get a more deterministic, symmetric path selection. You can also establish multiple VPN connections to the same VGW, for high-availability purposes or to connect multiple remote sites.
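To make the effect of AS-PATH prepending and MED more concrete, here is a minimal Python sketch of the path selection idea. It is a simplified model, not the actual BGP best-path algorithm (which has more steps, such as local preference and origin), and the ASN 65000 and the tunnel names are hypothetical values used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TunnelRoute:
    """One BGP advertisement for the same prefix, learned over one tunnel."""
    tunnel: str
    as_path: Tuple[int, ...]
    med: int = 0


def prepend(route: TunnelRoute, asn: int, times: int) -> TunnelRoute:
    """Return a copy of the route with `asn` prepended `times` times."""
    return TunnelRoute(route.tunnel, (asn,) * times + route.as_path, route.med)


def best_route(routes: List[TunnelRoute]) -> TunnelRoute:
    """Simplified tie-break: shortest AS path wins, then lowest MED."""
    return min(routes, key=lambda r: (len(r.as_path), r.med))


# Hypothetical on-premises ASN 65000 advertising the same prefix on both tunnels.
tunnel_1 = TunnelRoute("tunnel-1", (65000,))
# Prepending on tunnel-2 makes it the less attractive (standby) path.
tunnel_2 = prepend(TunnelRoute("tunnel-2", (65000,)), asn=65000, times=2)

preferred = best_route([tunnel_1, tunnel_2])
print(preferred.tunnel)  # tunnel-1
```

Because the advertisement over tunnel-2 carries a longer AS path, the receiving side prefers tunnel-1 for return traffic, which is what keeps the path symmetric as long as the on-premises device also sends over tunnel-1.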
Note: with AWS VPN, you can authenticate your Site-to-Site VPN tunnel endpoints either with pre-shared keys or with a private certificate from AWS Certificate Manager Private Certificate Authority. If you choose the certificate option, you do not have to specify the IP address of your customer gateway device, which in turn allows you to move the customer gateway device to a different IP address without having to reconfigure the VPN connection. This can be useful when the customer/on-premises device doesn’t have a static public IP address.
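The difference between the two authentication options can be sketched as a small helper that builds the request parameters for a customer gateway. This is only an illustration: the helper name is hypothetical, the keys (`BgpAsn`, `Type`, `IpAddress`, `CertificateArn`) reflect my reading of the EC2 CreateCustomerGateway API and should be verified against the current AWS documentation, and the ASN and ARN in the usage note are made-up placeholders.

```python
from typing import Any, Dict, Optional


def customer_gateway_params(
    bgp_asn: int,
    public_ip: Optional[str] = None,
    certificate_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a customer gateway request (hypothetical helper).

    With pre-shared keys, a static public IP is needed.
    With a private certificate, the IP address may be omitted.
    """
    if public_ip is None and certificate_arn is None:
        raise ValueError("without a certificate, a static public IP is required")
    params: Dict[str, Any] = {"BgpAsn": bgp_asn, "Type": "ipsec.1"}
    if public_ip is not None:
        params["IpAddress"] = public_ip
    if certificate_arn is not None:
        params["CertificateArn"] = certificate_arn
    return params
```

For example, `customer_gateway_params(65000, certificate_arn="arn:aws:acm:us-east-1:111122223333:certificate/example")` returns parameters with no IP address at all, which is the case where the on-premises device can change its address later.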
Up until now, the routing information is between the VPN headends, which is not enough. We need to extend it to the on-premises and VPC networks to achieve the desired end to end reachability.
From the on-premises side, the propagation of routing information can vary, depending on how the CPE/edge device is designed to propagate routes back to the DC network (static, IGP, BGP, etc.). The point here is to ensure that the VPC private network prefix(es) that need to be reached from the on-premises side are advertised back to the DC network.
From the AWS VPC side, we know from previous blogs that the VPC route table(s) steer network traffic. Therefore, in the VPC route table used by the instances that need to communicate with the remote on-premises network(s), you must add a route for the remote on-premises network with the VGW set as the target. This can be set statically/manually, or you can enable route propagation for the intended route table to automatically propagate the BGP-learned routes at the VGW to the VPC route table.
In other words, the VGW acts as a logical component that includes the tunnels and BGP objects, but it does not propagate routes back to the VPC automatically. Reachability has to be configured explicitly, either by using a static route in a VPC route table or by enabling route propagation to inject the routes learned over the BGP session into the VPC virtual router/route table.
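The two ways of making the on-premises prefixes reachable from the VPC can be sketched as follows. This is a minimal sketch, not a complete deployment script: the function name is hypothetical, and `ec2` is assumed to be an object exposing `create_route` and `enable_vgw_route_propagation` the way a boto3 EC2 client does (it is passed in rather than created here, so the sketch itself needs no AWS credentials).

```python
from typing import Any, Optional


def route_onprem_via_vgw(
    ec2: Any,
    route_table_id: str,
    vgw_id: str,
    static_cidr: Optional[str] = None,
) -> str:
    """Make on-premises prefixes reachable from one VPC route table.

    With `static_cidr`, add a static route whose target is the VGW.
    Without it, enable propagation of the routes the VGW has learned.
    """
    if static_cidr is not None:
        ec2.create_route(
            RouteTableId=route_table_id,
            DestinationCidrBlock=static_cidr,
            GatewayId=vgw_id,
        )
        return "static"
    ec2.enable_vgw_route_propagation(RouteTableId=route_table_id, GatewayId=vgw_id)
    return "propagated"
```

Assuming boto3 is installed and configured, a call could look like `route_onprem_via_vgw(boto3.client("ec2"), "rtb-0123456789abcdef0", "vgw-0123456789abcdef0")`, where both IDs are placeholders. Either way, the route table has to be touched explicitly; nothing reaches the VPC route table just because the BGP session is up.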
Note: when connecting over VPN to a VPC, you can access any workload inside that VPC that has a private IP, e.g. EC2, RDS, NLB, interface endpoints, PrivateLink endpoints, Route 53 Resolver, etc. However, at the time of writing, you cannot access gateway VPC endpoints or VPC public IPs.
Site to Site VPN with AWS Transit Gateway (TGW) Headend
First, let’s define the AWS Transit Gateway (TGW) and its components, with a focus on providing VPN connectivity from an on-premises site. It is very important to have a good understanding of the foundations covered here, because subsequent blogs will build on them to cover other connectivity options and design considerations with TGW.
AWS Transit Gateway (TGW) is a scalable service that enables you to connect multiple VPCs with each other, as well as with on-premises networks (over Site to Site VPN and/or Direct Connect), using a single centralized gateway. Conceptually, the TGW acts like a virtual router; however, it is more scalable than an actual virtual router, because it uses the AWS Hyperplane system in the backend.
Therefore, with the AWS TGW, you only need to create and manage connectivity rules and routing control from one central gateway. The TGW creates a hub & spoke connectivity model, where it acts as the hub that controls how traffic is routed among all the connected networks which act like spokes. This approach significantly simplifies management and reduces operational costs because each network only has to connect to the TGW and not to every other network. This ease of connectivity makes it easy to scale your network as you grow.
Note: this blog focuses on defining the components of a TGW and how it facilitates AWS Site to Site VPN connectivity between AWS and on-premises site(s). Subsequent blogs will discuss it from different points of view (with Direct Connect, and with multiple VPCs).
The following are the key components of a TGW:
Attachment: refers to an element connected to the TGW, which can be a VPC, an AWS Direct Connect (DX) gateway, a peering connection with another transit gateway, or a VPN connection. A TGW attachment is both a source and a destination of packets.
TGW Route Table: a TGW route table performs a typical routing table function, as in any router. It includes dynamic and/or static routes that decide the next hop based on the destination IP address of the packet. The target of these routes can be a VPC, a DX gateway, or a VPN connection. A TGW can have multiple route tables; by default, transit gateway attachments are associated with the default transit gateway route table. Because we can have multiple route tables, we can easily create route isolation and traffic engineering based on different requirements (these aspects will be covered in subsequent blogs).
Association: refers to the linkage between an attachment and a route table. Each attachment is associated with exactly one route table (1:1), while each route table can be associated with zero to many attachments (1:many). You can think of the TGW route table/domain as a VRF routing table in classical routing terms.
Route Propagation: refers to the way routing or IP reachability information is shared between a TGW and a VPC, DX gateway, or VPN connection.
So how can routes be propagated? With a VPC, you create static routes to send traffic to the TGW by setting the next hop/target in the VPC route table or TGW route table to the “attachment”; as highlighted above, a TGW attachment is both a source and a destination of packets. With a VPN connection or DX gateway, routes are propagated from the TGW to your on-premises router using Border Gateway Protocol (BGP). For VPC to TGW route propagation, Amazon VPC propagates its routes into the AWS Transit Gateway route table using internal APIs (not BGP). With a peering attachment (peering with another TGW in a different region), you need to create a static route in the TGW route table that points to the peering attachment.
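The relationship between attachments, route tables, associations and routes described above can be illustrated with a small toy model in Python. This is only a conceptual sketch for reasoning about the forwarding logic, not how the service is implemented; the class, the attachment names and the route table names are all hypothetical.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Optional


class TransitGatewayModel:
    """Toy model of a TGW: associations plus per-route-table routes."""

    def __init__(self) -> None:
        self.association: Dict[str, str] = {}  # attachment -> route table
        self.routes: Dict[str, dict] = defaultdict(dict)  # table -> {network: attachment}

    def associate(self, attachment: str, route_table: str) -> None:
        # Each attachment is associated with exactly one route table.
        self.association[attachment] = route_table

    def add_route(self, route_table: str, cidr: str, target_attachment: str) -> None:
        # Models both a static route and a route propagated from an attachment.
        self.routes[route_table][ipaddress.ip_network(cidr)] = target_attachment

    def next_hop(self, source_attachment: str, dst_ip: str) -> Optional[str]:
        """Longest-prefix match in the route table of the source attachment."""
        table = self.routes[self.association[source_attachment]]
        dst = ipaddress.ip_address(dst_ip)
        matches = [net for net in table if dst in net]
        if not matches:
            return None  # no route: the packet is dropped
        return table[max(matches, key=lambda net: net.prefixlen)]


tgw = TransitGatewayModel()
tgw.associate("vpc-a", "rt-vpcs")
tgw.associate("vpn-dc", "rt-onprem")
tgw.add_route("rt-vpcs", "192.168.0.0/16", "vpn-dc")  # on-premises prefix
tgw.add_route("rt-onprem", "10.1.0.0/16", "vpc-a")    # VPC prefix

print(tgw.next_hop("vpc-a", "192.168.10.5"))  # vpn-dc
print(tgw.next_hop("vpn-dc", "10.2.0.9"))     # None
```

The second lookup returns nothing because the route table associated with the VPN attachment has no route covering 10.2.0.0/16, which is the same mechanism that route isolation with multiple route tables relies on.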
Unlike the Site to Site VPN using a VGW, where the two redundant tunnels operate in active-standby mode, with a TGW these tunnels operate in active-active mode using a well-known multi-path routing concept, Equal Cost Multi-Pathing (ECMP). Although ECMP maximizes the overall throughput, each traffic flow is still limited to the maximum throughput of a single tunnel. This is not a limitation of the service; it is how traffic is distributed across multiple paths in networking (per flow or per packet, and per-packet distribution has a few drawbacks).
If you have a high volume of traffic where a single traffic flow may need more than what a single VPN tunnel can provide, then you should look into AWS Direct Connect, which will be covered in detail in the subsequent blog.
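The per-flow behaviour of ECMP can be illustrated with a short Python sketch. The hash used here (SHA-256 over the 5-tuple) is an arbitrary choice for illustration and says nothing about the hashing a TGW actually uses; the tunnel names and the 1.25 Gbps per-tunnel figure are assumptions that should be checked against the current Site-to-Site VPN quotas.

```python
import hashlib
from typing import Sequence, Tuple

# src IP, dst IP, src port, dst port, protocol
Flow = Tuple[str, str, int, int, str]


def pick_tunnel(flow: Flow, tunnels: Sequence[str]) -> str:
    """Map a flow to one tunnel by hashing its 5-tuple (per-flow ECMP)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return tunnels[int.from_bytes(digest[:4], "big") % len(tunnels)]


def throughput_limits(tunnel_gbps: float, tunnel_count: int) -> Tuple[float, float]:
    """Return (maximum for a single flow, maximum aggregate over all tunnels)."""
    return tunnel_gbps, tunnel_gbps * tunnel_count


tunnels = ["tunnel-1", "tunnel-2", "tunnel-3", "tunnel-4"]
flow = ("10.1.0.10", "192.168.10.5", 49152, 443, "tcp")

# The same flow always hashes to the same tunnel, so it never exceeds one tunnel.
assert pick_tunnel(flow, tunnels) == pick_tunnel(flow, tunnels)

print(throughput_limits(1.25, 4))  # (1.25, 5.0)
```

Adding tunnels raises the aggregate figure, but the first number stays the same: many flows benefit from ECMP, a single large flow does not.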
Further reading: Scaling VPN throughput using AWS Transit Gateway
In addition, as illustrated in the figure above, VPN termination at the TGW offers simple, centralized VPN access between one or more on-premises site(s) and one or more VPCs. The upcoming “multi-VPC design considerations” blog will discuss TGW design options and considerations with multiple VPCs in more detail.
Note: The VPN tunnel comes up when traffic is generated from your side of the Site-to-Site VPN connection. The virtual private gateway is not the initiator; your customer gateway must initiate the tunnels. If your Site-to-Site VPN connection experiences a period of idle time (usually 10 seconds, depending on your configuration), the tunnel may go down. To prevent this, you can use a network monitoring tool to generate keepalive pings; for example, by using IP SLA.
Customer Managed VPN: with this option, the customer is responsible for provisioning and managing the entire VPN solution, typically by running VPN software on an EC2 instance or by using a VPN virtual appliance from AWS Marketplace, including SD-WAN solutions. The design and deployment of the availability of this VPN setup also need to be taken into consideration by the customer, although some of the solutions available on AWS Marketplace from independent software vendors (ISVs) may offer automated provisioning of virtual instance/appliance redundancy. Still, with this option, you need to take care of any software or patch updates and of the additional instances to manage as part of the VPC, in addition to any required license(s) from the ISV. However, depending on the use case (requirements and constraints), you may find it a more feasible option.
Note: this option can be used in conjunction with AWS Direct Connect to achieve different routing scenarios, such as a backup path. The considerations of such a connectivity model will be covered in a future blog.
This blog explained and analyzed the site to site VPN connectivity for the hybrid connectivity model, and the possible VPN connectivity options. The following blog will discuss AWS Direct Connect.
Site-to-Site VPN Quota
What is AWS Site-to-Site VPN?
AWS VPN FAQs