Transit VPC vs Transit Gateway for AWS Architectures

When deploying any infrastructure onto AWS, as with on-premise infrastructure, it’s important to get the foundations and connectivity right from the start to avoid a “bolt-on” fix later. It’s important not only to design for what you have in place currently, but also to allow for future growth while understanding the limitations of what you are implementing. In this article, we will look at some of the key reasons to use Transit VPC or Transit Gateway architectures, followed by a brief Transit VPC vs Transit Gateway comparison to help you make the correct choice for your infrastructure.

There are multiple connection technologies available for use when working with AWS. We will need to look through multiple types for our different architectures. To help clarify which is used on the diagrams below, use the following key to understand which technologies/methods are used for each of the links:

Why use a Transit VPC or Transit Gateway?

AWS have some important rules that govern how you design and implement your public cloud or hybrid (on-premise and AWS) infrastructure. These can strongly impact how your VPCs (Virtual Private Clouds) interact and communicate with each other and/or other connected networks such as on-premise. One key factor is that AWS disallow transitive routing, which is the ability to reach another VPC by traversing a VPC in the middle. Let’s look at an example. In the diagram below, our user, located in VPC A, wants to reach their server in VPC C. VPC Peering connections exist between VPC A and B, and between VPC B and C. Logically, we have a full line of connectivity from A to C through B, but AWS by design disallow this flow of traffic to help secure your data and environment.
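As a rough illustration of the rule, the sketch below models the example as a set of peering connections and treats two VPCs as reachable only when they are directly peered. The VPC names and the `can_reach` helper are purely illustrative and are not part of any AWS API.

```python
# Peering connections from the example: A<->B and B<->C (names are illustrative).
PEERINGS = {frozenset({"A", "B"}), frozenset({"B", "C"})}


def can_reach(src: str, dst: str, peerings=PEERINGS) -> bool:
    """Return True only if src and dst are the same VPC or directly peered.

    There is deliberately no graph traversal here: a path through a
    third VPC does not count, which mirrors the non-transitive behaviour.
    """
    return src == dst or frozenset({src, dst}) in peerings


print(can_reach("A", "B"))  # True: direct peering exists
print(can_reach("A", "C"))  # False: traffic would have to transit VPC B
```

Adding a third peering between A and C is the only way to make the second call return True in this model, which is exactly the full mesh problem.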

This would mean that in order to gain a fully traversable network within AWS between VPCs, we would need to create a full mesh of Peer links, where each VPC has a direct peering connection to every other VPC. In our 3 VPC example this may not seem overly complicated; however, as you grow and create more VPCs, this becomes more and more complex and cumbersome to maintain and manage.

  • 3 VPCs need 3 Peers
  • 4 VPCs need 6 Peers
  • 5 VPCs need 10 Peers
  • 6 VPCs need 15 Peers

This also adds further complication and room for error when setting up new VPCs, as you must ensure that all links are appropriately in place. If the environment grows big enough, you could eventually hit the limit on active VPC peering connections.
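The peer counts listed above follow the usual full mesh formula, n(n-1)/2 for n VPCs. A small sketch (the function name is just for illustration) shows how quickly this grows:

```python
def full_mesh_peerings(vpc_count: int) -> int:
    """Number of peering connections needed so every VPC peers with every other."""
    if vpc_count < 0:
        raise ValueError("vpc_count must be non-negative")
    # Each of the n VPCs links to the other n-1, and each link is shared by two VPCs.
    return vpc_count * (vpc_count - 1) // 2


for n in (3, 4, 5, 6, 10, 20):
    print(f"{n} VPCs need {full_mesh_peerings(n)} peers")
```

For 10 VPCs this gives 45 peering connections, and for 20 it gives 190.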

When designing a hybrid AWS environment to connect your on-premise network into AWS, the same restriction applies. Connecting your on-premise network via VPN or Direct Connect (DX) to a single VPC would only provide access to the connected VPC; traffic would not be allowed to flow to other VPCs.

To enable access to other VPCs, we would need to expand our connectivity mesh and have links between on-premise and every VPC, as in the previous example. This adds more links to manage and maintain in our environment.

So how do we overcome this and provide a more dynamic, less complex solution for transitive routing within AWS? The answer is to implement a Transit VPC or to use the new, fully managed Transit Gateway service.

Transit VPC

Implementation

Transit VPCs were the main way of implementing a transitive infrastructure in the cloud. They provide a central point of routing that you manage using a 3rd party technology running on AWS. This overcomes AWS’s transitive routing limitation because it takes control of routing away from AWS and abstracts it onto the next layer above (similar to running your own networking infrastructure).

In this architecture, we would maintain our own routing devices (multiple vendors such as Cisco, Aviatrix and Sophos support running their appliances within AWS). We would then need to configure links and connectivity from these devices to our VPCs, which would make our appliances the routing core of the network. All inter-VPC routing decisions would be made by these appliances/instances rather than the VPC/AWS architecture itself, bypassing the non-transitive restriction.

For this architecture, we wouldn’t need to create a full mesh where every VPC is connected to every other VPC or on-premise. We would only need to create a connection between each VPC and our routing appliance, and a link from on-premise to our transit VPC. For example, we would change a Peer mesh network from the above diagrams into the following architecture:

Connections between VPCs and our routing appliances use VPNs initiated from the routing appliance rather than peering connections. On-premise connections to AWS can use a VPN, a DX (Direct Connect) connection, or a mixture of both for resilience.

Although we are still maintaining VPN connections, the more VPCs we expand into, the more beneficial this architecture becomes. AWS, alongside most vendors, also offer methods that use VPC tags and automation tools to detect new VPCs and create these connections for you automatically.

In the above example, we have also implemented a Peer connection between VPC A and B. This is recommended when low latency and high throughput are required, as it removes the latency that crossing VPN connections can add. Specific traffic can be directed to use this Peer connection while the remaining traffic uses the Transit VPC connections.

Another benefit of a Transit VPC architecture is that, as the routing appliances are often full virtual versions of routers and firewalls, it is possible to achieve almost anything that can be done on their physical counterparts. For example, you can implement NAT (both source and destination) to allow connections to overlapping IP ranges (something that is not possible natively within AWS).
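To make the overlapping range case concrete, here is a minimal sketch using Python’s standard `ipaddress` module. It detects an overlap and computes the 1:1 translated address an appliance might present instead. The CIDR ranges and the `translate` helper are assumptions for illustration only; a real appliance would be configured through the vendor’s own NAT rules.

```python
import ipaddress


def overlaps(cidr_a: str, cidr_b: str) -> bool:
    """True if the two CIDR ranges share any addresses."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def translate(address: str, real_cidr: str, mapped_cidr: str) -> str:
    """Map an address from its real range into an equally sized NAT range,
    keeping the same host offset (a simple 1:1 static NAT)."""
    real = ipaddress.ip_network(real_cidr)
    mapped = ipaddress.ip_network(mapped_cidr)
    if real.prefixlen != mapped.prefixlen:
        raise ValueError("real and mapped ranges must be the same size")
    addr = ipaddress.ip_address(address)
    if addr not in real:
        raise ValueError(f"{address} is not inside {real_cidr}")
    offset = int(addr) - int(real.network_address)
    return str(mapped.network_address + offset)


# Hypothetical ranges: our VPC and a 3rd party network both use 10.0.1.0/24.
print(overlaps("10.0.0.0/16", "10.0.1.0/24"))
print(translate("10.0.1.25", "10.0.1.0/24", "100.64.1.0/24"))  # 100.64.1.25
```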

Transit VPC Resiliency

There are 3 major components to the Transit VPC architecture to consider for resiliency.

  • VPN connections between VPCs and the routing appliance
  • The routing appliance
  • Connectivity between on-premise and AWS (if you are using a hybrid architecture)

For the first point, as we are using AWS’s managed VPN service to connect our 3rd party appliance, AWS provide 2 distinct tunnel endpoints (effectively 2 independent VPNs) per connection, allowing you to configure a resilient path for each VPN. AWS ensure both endpoints are in different fault domains and will not run maintenance on both endpoints at the same time.

However, in our initial diagram, we only have one routing device, which leaves our routing device in the transit VPC as the single point of failure. In order to make this resilient, we can simply add a 2nd routing device (in another AZ to provide AZ resilience) and add duplicate VPN connections from each VPC to the secondary routing instance. The below map shows our connectivity between key components.
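To get a feel for what the second appliance means operationally, the sketch below counts VPN connections and tunnels, assuming one VPN connection from every VPC to every appliance and the two tunnel endpoints per connection described above. The function name and the returned dictionary keys are illustrative only.

```python
TUNNELS_PER_VPN_CONNECTION = 2  # AWS managed VPN provides two tunnel endpoints


def transit_vpc_vpn_footprint(vpc_count: int, appliance_count: int = 2) -> dict:
    """Count VPN connections and tunnels for a Transit VPC.

    Assumes each VPC has one VPN connection to each routing appliance.
    """
    connections = vpc_count * appliance_count
    return {
        "vpn_connections": connections,
        "tunnels": connections * TUNNELS_PER_VPN_CONNECTION,
    }


print(transit_vpc_vpn_footprint(5, appliance_count=1))
print(transit_vpc_vpn_footprint(5, appliance_count=2))
```

Even with the doubling, the number of connections grows linearly with the VPC count rather than quadratically as in a full peering mesh.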

When looking at on-premise connectivity resilience, if you are using a VPN connection on its own, you should ensure both VPN tunnels provided by AWS have been configured and are active (BGP and ECMP will help ensure the correct path is taken and deal with failover). You would also need to ensure you have resilient connectivity through your ISP; otherwise, both tunnels could become inactive if your network cannot reach the outside world.

If you are using a DX (Direct Connect), there are different options for resiliency to consider. These options depend on the importance of throughput and latency over cost.

  1. The most cost-effective solution would be to have a VPN configured as a failover route in case of a DX outage. This would provide basic connectivity with the poorest speed and latency.
  2. At a higher cost, but with a better and more reliable connection, throughput and latency, an additional DX connection (through another provider or at least another datacenter) can be implemented. In this case, it would also be possible to implement an ECMP solution to get the benefits of both DX connections.

Transit VPC Costs

Understanding the costs of implementing a Transit VPC is important. They can be treated as a service charge shared across all of your VPCs, as the Transit VPC is the functional core of your AWS VPC network.

  • All costs below may be doubled if an HA pair of firewall/router appliances is set up
  • VPN Costs –
    • Connection Hour Charges
    • Data/Traffic Charges
    • The above charges will apply for the following VPNs:
      • VPNs between VPCs and Firewall/Routing appliances
      • VPNs between Firewall/Routing appliances and detached VPG/VGW
      • (Optional) VPNs between detached VPG/VGW and on-premise
  • DX Cost for on-premise connectivity (if implemented)
  • Firewall/Router Appliance Costs
    • EC2 Compute and Storage Costs for the firewall/routing appliances
    • Licensing costs – Vendors often charge additional license fees for their virtual appliances
    • Data Transfer
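The cost components above can be combined into a rough monthly estimate. The sketch below is only a back-of-the-envelope model: every rate is a placeholder you would replace with current figures from AWS and your appliance vendor, and it assumes that an HA pair doubles the connection-hour and appliance charges while the data volume stays the same. DX and storage costs are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class TransitVpcRates:
    """Placeholder rates; these are NOT real AWS or vendor prices."""
    vpn_connection_hour: float      # per VPN connection, per hour
    vpn_data_per_gb: float          # per GB transferred over the VPNs
    appliance_instance_hour: float  # EC2 compute per appliance, per hour
    appliance_license_hour: float   # vendor licence per appliance, per hour


def transit_vpc_monthly_cost(rates: TransitVpcRates, vpn_connections: int,
                             data_gb: float, hours: float = 730,
                             ha_pair: bool = False) -> float:
    """Estimate a monthly Transit VPC bill from the components listed above."""
    appliances = 2 if ha_pair else 1
    # Assumption: each appliance terminates its own copy of every VPN connection.
    vpn_hours = vpn_connections * appliances * rates.vpn_connection_hour * hours
    data = data_gb * rates.vpn_data_per_gb
    appliance = appliances * hours * (
        rates.appliance_instance_hour + rates.appliance_license_hour
    )
    return vpn_hours + data + appliance


example = TransitVpcRates(1.0, 0.5, 2.0, 1.0)  # made-up round numbers
print(transit_vpc_monthly_cost(example, vpn_connections=3, data_gb=100, hours=10))
```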

Transit Gateway

Implementation

In order to make transitive routing simpler, AWS have recently released the Transit Gateway service, which is capable of performing the functions of a Transit VPC and is fully maintained and managed by AWS. Transit Gateways provide a simple way for users to build expansive AWS environments while still having granular control over routing decisions, without having to manage 3rd party appliances (firewalls or routers).

VPCs are connected to the Transit Gateway via “Attachments” that are managed by AWS. Each attachment can be associated with its own route table and routing domain (similar to VRFs) on the Transit Gateway, allowing you to directly control the routing per attachment. As Transit Gateway gives you full control over the routes advertised to your VPCs, you can achieve additional routing tasks such as:

  • Making complex traffic flows, for example, configuring inline inspection by a firewall device
  • Creating blackhole routes to secure data
  • Routing to Floating IPs (Virtual IPs) for items such as HA NetApp solutions.
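The routing behaviour described above can be pictured with a toy route table that picks the most specific matching route and drops traffic that hits a blackhole route. This is a conceptual model only, written with the standard `ipaddress` module; the class, attachment names and CIDRs are invented for the example and it does not call any AWS API.

```python
import ipaddress
from typing import Optional

BLACKHOLE = "blackhole"


class RouteTable:
    """Toy model of a per-attachment route table (illustrative only)."""

    def __init__(self) -> None:
        self._routes = []  # list of (network, target) tuples

    def add_route(self, cidr: str, target: str) -> None:
        self._routes.append((ipaddress.ip_network(cidr), target))

    def lookup(self, destination: str) -> Optional[str]:
        """Return the target attachment, or None if the traffic is dropped."""
        addr = ipaddress.ip_address(destination)
        matches = [(net, tgt) for net, tgt in self._routes if addr in net]
        if not matches:
            return None  # no route at all
        # The most specific (longest prefix) route wins.
        _, target = max(matches, key=lambda route: route[0].prefixlen)
        return None if target == BLACKHOLE else target


table = RouteTable()
table.add_route("10.1.0.0/16", "attachment-vpc-a")
table.add_route("10.2.0.0/16", "attachment-vpc-b")
table.add_route("10.2.5.0/24", BLACKHOLE)  # block one sensitive subnet

print(table.lookup("10.2.1.10"))  # attachment-vpc-b
print(table.lookup("10.2.5.7"))   # None: dropped by the blackhole route
```

Giving different attachments different tables like this one is what creates separate routing domains.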

As with Transit VPCs, Transit Gateways remove the need to create a mesh network between all your VPCs and/or on-premise networks; only an attachment between each VPC and the Transit Gateway is required.

As this is a new service from AWS, some features have not yet been fully implemented. For example, one of the key services enterprise companies use to create a hybrid network linking on-premise infrastructure to AWS is Direct Connect (DX). Unfortunately, DX attachments are not yet supported in all AWS regions, which means that companies need to use VPNs or get creative with their DX architecture in order to connect their on-premise networks to a Transit Gateway. This limitation should ease over time, as AWS are gradually rolling out DX attachments to more regions.

To connect a DX connection in a supported region, we need to create a transit VIF on the DX connection, attach it to a Direct Connect Gateway, and then associate that Direct Connect Gateway with our Transit Gateway.

Transit Gateway Resiliency

As Transit Gateway is a service managed by AWS, AWS are responsible for maintaining the scaling and resiliency needed to meet the throughput and architecture it’s connected to. This removes that complexity and management burden from you.

Transit Gateway Costs

Transit Gateway costs are simpler to understand than those of a Transit VPC architecture, as there are fewer parts to the system.

  • Transit Gateway VPC Attachment (per VPC attachment)
    • Connection Charge
    • Data Processed Charge
  • (Optional) if a VPN between on-premise and Transit Gateway is implemented
    • VPN Charges
      • Connection Hour Charge
      • VPN Data/Traffic Charge
    • Transit Gateway VPN Attachment
      • Connection Charge
      • Data Processed Charge
  • (Optional) if a DX between on-premise and Transit Gateway is implemented
    • DX Charges
    • Transit Gateway DX Attachment
      • Connection Charge
      • Data Processed Charge

Transit VPC vs Transit Gateway

Some key points and questions you may need to consider when choosing your transit architecture:

| Function | Transit Gateway | Transit VPC |
| Supports Direct Connect | Yes (specific regions and connection type limitations) | Yes |
| AWS Managed Service | Yes | No |
| Natively supports advanced network features such as NAT | No | Yes |
| Connection Technology | VPC Attachments (AWS Backbone); VPN Attachments (VPN); DX Attachments (DX) | VPN; DX |
| Encryption in Transit | VPC Attachments: No; VPN: Yes | Yes |
| VPC Connection Speed | VPC Attachments: 50 Gbps (burst); VPN Attachments: 1.25 Gbps (ability to exceed this with ECMP) | 1.25 Gbps per VPN (ability to exceed this with ECMP) |
| Application Level Filtering/Logging | No* (inline scanning can be implemented) | Yes |
| DX connection restrictions | Yes (1Gb+ for Hosted and currently region limited) | No |
| Internet Gateway Required | No | Yes (EIP for appliance VPN) |
| Policy Based Access | No | Yes (vendor implemented) |
| Authentication Based Access | No | Yes (vendor implemented) |
| Route table control | Yes | Yes* (difficult to route to unknown or virtual IPs) |

Questions to Consider:

  1. Do you have in-house skills to manage your own routing/firewall devices? If no, a managed Transit Gateway would be preferable.
  2. Do you need to connect a DX (Direct Connect) connection? If yes, check the region compatibility list for DX Attachments. If your region is not on the list below, you may want to investigate using a Transit VPC rather than a Transit Gateway, or look into other methods of connecting to Transit Gateway over your DX (for example, using a public VIF and running a VPN over your DX connection). As of 27/09/2019 the following regions are supported:
    1. US East (N. Virginia)
    2. US East (Ohio)
    3. US West (N. California) – San Francisco
    4. US West (Oregon) – Portland
    5. AWS GovCloud (US-East)
    6. AWS GovCloud (US-West)
    7. Canada (Central) Region – Montreal
    8. EU (Ireland) – Dublin
    9. EU (Frankfurt)
    10. EU (London)
  3. Is your DX connection supported as a Transit VIF? If no, a Transit VPC may be required.
  4. Do you have any need (e.g. company or regulatory requirements) for firewall, application level scanning or authenticated traffic rules between your on-premise and AWS environments (to a level higher than flow logs)? If so, a Transit VPC may be preferable, or you could implement an inline traffic flow (through routing) on Transit Gateway.
  5. If your company were to acquire or connect to a 3rd party network that has an overlapping CIDR range, which action would be most preferable/possible?
    • Change the connecting network’s IP range – use Transit Gateway.
    • Use a double NAT to mask the overlapping ranges into other ranges instead of re-IP’ing the network – look at Transit VPC.
    • Mask individual IPs in AWS to hide their true IP from a 3rd party – use Transit VPC.
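The questions above can be loosely captured as a checklist in code. The sketch below simply restates this article’s suggestions; the `Requirements` fields and the wording of each note are illustrative, and the output should be read as prompts for discussion rather than a definitive recommendation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Requirements:
    has_inhouse_network_skills: bool
    needs_direct_connect: bool
    dx_attachment_available: bool  # region supported AND usable as a transit VIF
    needs_app_level_inspection: bool
    needs_nat_for_overlapping_cidrs: bool


def transit_notes(req: Requirements) -> List[str]:
    """Collect the suggestions from the questions above that apply."""
    notes = []
    if not req.has_inhouse_network_skills:
        notes.append("Transit Gateway: managed by AWS, no appliances to run")
    if req.needs_direct_connect and not req.dx_attachment_available:
        notes.append("Transit VPC, or a VPN over a public VIF into Transit Gateway")
    if req.needs_app_level_inspection:
        notes.append("Transit VPC, or inline inspection via Transit Gateway routing")
    if req.needs_nat_for_overlapping_cidrs:
        notes.append("Transit VPC: NAT on the routing appliance")
    return notes


for note in transit_notes(Requirements(True, True, False, False, True)):
    print(note)
```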

Summary

The above are only a few key points and simple architecture recommendations. It’s important to look at all points together and weigh up the differences between the architectures and what each can bring to you.

There are many implementations for each architecture that can be used to add features you need. Choose the architecture that gives you the strongest foundations for your most important requirements. Remember Public Cloud is there to support you and your needs, and many possibilities exist (for example, if Transit VPC or Transit Gateway separately don’t fit your needs, combine them!).

I will try to write more in-depth articles on each architecture, but hopefully this gives a good outline as to the differences and why you may want to implement them.