Google Cloud Global Load Balancer – Deep Dive

When you provision workloads in the cloud to serve an application, placing a load balancer in front of the application tier is almost always a must. It ensures that user requests are distributed to the workload instances that have the capacity to serve them with better performance.

Load balancing in Google Cloud Platform (GCP) is a fully scalable and redundant managed service offered in different flavors (global external, regional external, and regional internal). This article focuses on the global external load balancer. Figure 1 below illustrates the high-level architecture of the Google global load balancer, which is discussed in more detail throughout this article.

Generally, load balancing in the cloud offers two main functions:

  • Traffic load balancing (distribution), with optional autoscaling capability
  • Autohealing: if a failed or unhealthy instance is detected in the backend, it can be replaced with a functional one.

As its name suggests, this type of LB is external (client/user facing) and global, and it operates at the application layer (HTTP(S)). As highlighted in my previous blog “Networking in AWS vs. Google Cloud Platform – Design Considerations”, with the Google external global load balancer all you need is a single IP address to front-end an application stack that could be distributed globally, without the need to deploy a load balancer per region!

The Google global LB is not a single box, a VM instance, or a cluster. So how does it work, and where does it reside?

The GCP global LB is made up of the following components:

  • Google global network: Google has over 100 points of presence (POPs) and a well-provisioned global network with hundreds of thousands of miles of fiber optic cable. The global LB is located in many of these POPs, which lets it ingest user traffic into Google’s backbone as close as possible to the source of the request and provides an optimized experience, as illustrated below:

The GCP LB can scale quickly and effortlessly. According to GCP, “Cloud Load Balancing is built on the same front-end serving infrastructure that powers Google. It supports 1 Million+ queries per second with consistent high performance and low latency. Traffic enters Cloud Load Balancing through 80+ distinct global load balancing locations, maximizing the distance traveled on Google’s fast private network backbone.”

  • Software-defined load balancing: the LB is SDN-based and can be used by all projects on GCP; it is also used by other services such as Google Search and Gmail. Its SDN construct includes the global forwarding rules at Google’s global front end, which direct traffic to the target proxy service. “The global forwarding rule provides a single global IPv4 or IPv6 address (Anycast IP) that you can use in DNS records for your site, and it can only be used with HTTP(S), SSL Proxy, and TCP Proxy load balancing. You can use more than one forwarding rule with a given proxy, so you can have one for IPv4 traffic and another for IPv6 traffic.”


When the LB handles encrypted traffic (HTTPS), the target proxy requires a signed certificate to terminate the SSL/TLS session, and the proxy then initiates a new session to the backend for the request. If this new connection to the backend is also encrypted, the target VM instances need a certificate installed as well.

Once the proxy receives the request, it evaluates the URL policy map. If the LB is instead configured as an SSL proxy or TCP proxy, there is no URL map check and the traffic is sent directly to the backend.
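To make the order of evaluation concrete, here is a minimal Python sketch of how a request might be dispatched once it reaches a forwarding rule. It is a toy model, not GCP’s implementation: the `TargetProxy` class, the `route` function, the backend names, and the IP addresses (taken from the documentation-reserved 203.0.113.0/24 and 2001:db8::/32 blocks) are all hypothetical, and the URL map lookup is reduced to a simple prefix check.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TargetProxy:
    """Hypothetical model of a target proxy (not a GCP API object)."""
    kind: str                  # "HTTP", "HTTPS", "SSL", or "TCP"
    default_backend: str       # backend service used when nothing else matches
    url_map: Dict[str, str] = field(default_factory=dict)  # path prefix -> backend


def route(proxy: TargetProxy, path: Optional[str] = None) -> str:
    """Return the backend service a request would be handed to."""
    if proxy.kind in ("SSL", "TCP"):
        # SSL and TCP proxies have no URL map: traffic goes straight to the backend.
        return proxy.default_backend
    # HTTP(S) proxies evaluate the URL map (simplified here to a prefix check).
    for prefix, backend in proxy.url_map.items():
        if path is not None and path.startswith(prefix):
            return backend
    return proxy.default_backend


# Two forwarding rules (one IPv4, one IPv6) pointing at the same HTTPS proxy.
https_proxy = TargetProxy("HTTPS", "web-backend", {"/photo/": "photo-backend"})
forwarding_rules = {"203.0.113.10": https_proxy, "2001:db8::10": https_proxy}
```

Because both forwarding rules reference the same proxy object, IPv4 and IPv6 clients end up with identical routing decisions.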

  • URL Policy Map: traffic distribution with the Google Cloud global external load balancer can be deployed in two main scenarios (or a combination of both):
  1. Cross-region load balancing: also known as proximity-based routing. According to GCP, “You can use a global IP address that can intelligently route users based on proximity. For example, if you set up instances in North America, Europe, and Asia, users around the world will be automatically sent to the backends closest to them, assuming those instances have enough capacity. If the closest instances do not have enough capacity, cross-region load balancing automatically forwards users to the next closest region.” With this deployment option, the URL map contains only the default mapping.
  2. Content-based load balancing: with visibility into the application layer through its HTTP(S) LB functions, the GCP global LB can route traffic based on URL content. For instance, requests for a certain part of the application, such as multimedia, can be directed to an instance group with higher capacity, while traffic destined for static content can be sent to a different set of instances.
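The cross-region behavior quoted above (closest backend first, spill over to the next closest when capacity runs out) can be sketched in a few lines of Python. This is an illustrative model only: the `RegionalBackend` fields, the latency and capacity numbers, and the region names used in the example are hypothetical, and real placement decisions also weigh balancing mode and other signals this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegionalBackend:
    region: str
    rtt_ms: float       # hypothetical latency from the user's entry POP
    capacity_rps: int   # hypothetical configured capacity (requests/second)
    load_rps: int       # current load (requests/second)


def pick_region(backends: List[RegionalBackend]) -> Optional[str]:
    """Closest region with spare capacity; None if every region is full."""
    for backend in sorted(backends, key=lambda b: b.rtt_ms):
        if backend.load_rps < backend.capacity_rps:
            return backend.region
    return None


backends = [
    RegionalBackend("us-central1", 20, 100, 100),   # closest, but at capacity
    RegionalBackend("europe-west1", 90, 100, 40),
    RegionalBackend("asia-east1", 160, 100, 10),
]
print(pick_region(backends))  # europe-west1
```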

The URL map illustrated in the figure below shows an example of a website or service that offers photo uploads (static content), in which traffic matching /photo/* is directed to a multi-regional Cloud Storage bucket, while traffic destined for multimedia video content is distributed across different instance groups depending on whether it is HD or non-HD video content.
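The photo/video example can be expressed as a small path-matching function. This Python sketch assumes that the most specific (longest) matching path rule wins and that a trailing `/*` means “this prefix”; check the GCP URL map documentation for the exact matching semantics. The backend names and the `/video/hd/*` path are hypothetical stand-ins for the buckets and instance groups in the figure.

```python
# Hypothetical URL map mirroring the photo/video example; backend names are illustrative.
URL_MAP = {
    "/photo/*": "photo-bucket",          # multi-regional Cloud Storage bucket
    "/video/hd/*": "video-hd-group",     # instance group sized for HD video
    "/video/*": "video-sd-group",        # instance group for non-HD video
}
DEFAULT_BACKEND = "web-group"


def match_backend(path: str, url_map=URL_MAP, default: str = DEFAULT_BACKEND) -> str:
    """Pick the backend whose path rule is the longest match for `path`."""
    best_len, best = -1, default
    for pattern, backend in url_map.items():
        if pattern.endswith("/*"):
            prefix = pattern[:-1]            # "/video/hd/*" -> "/video/hd/"
            hit, length = path.startswith(prefix), len(prefix)
        else:
            hit, length = path == pattern, len(pattern)
        if hit and length > best_len:
            best_len, best = length, backend
    return best


print(match_backend("/video/hd/trailer.mp4"))  # video-hd-group
print(match_backend("/about"))                 # web-group
```

Longest-match selection means the order of entries in the map does not matter, which keeps overlapping rules such as `/video/*` and `/video/hd/*` unambiguous.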

  • Target backend service: a logical grouping of the actual application instances. Together with health checks, the backend service determines which instances are healthy or over-utilized (CPU utilization, requests per second per instance) and when to trigger autoscaling. You configure your load balancing service to route requests to your backend service.
  • Instance group: used to add and manage virtual machines. GCP Compute Engine offers two flavors of instance groups: managed and unmanaged. The managed instance group is always the preferred type: you create instances from a template and can then enable autoscaling to scale the instances out and in based on predefined metrics. The unmanaged instance group does not support this capability, simply because it is made up of dissimilar VM instances. A common use case for it is migrating workloads that run on different machine types, or reusing an existing configuration setup to load balance the traffic.
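The autoscaling behavior of a managed instance group can be approximated with target-utilization arithmetic: size the group so that the current total load, spread across all instances, sits at the target level. The Python sketch below is a simplified model, not GCP’s actual autoscaler algorithm (which also accounts for factors such as cool-down periods); the function name and the example numbers are hypothetical.

```python
import math


def recommended_size(current: int, avg_utilization: float, target_utilization: float,
                     min_size: int, max_size: int) -> int:
    """Group size that would bring average utilization back to the target (simplified)."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Total work in "fully busy instances", re-spread at the target utilization level.
    raw = math.ceil(current * avg_utilization / target_utilization)
    return max(min_size, min(max_size, raw))


print(recommended_size(4, 0.75, 0.5, min_size=2, max_size=10))   # 6 (scale out)
print(recommended_size(10, 0.25, 0.5, min_size=2, max_size=10))  # 5 (scale in)
```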

Although SSL processing can be CPU intensive, especially when the ciphers in use are not CPU efficient, it is always recommended to use secure sessions from the proxy to the backend instances and to avoid sending the traffic over unencrypted TCP, which reduces the level of security between the GCP global load balancer and the backend instances. In addition, although an SSL proxy can handle HTTPS, it is recommended to create an HTTPS target proxy with at least one signed SSL certificate installed on it.

GCP recently introduced QUIC support for HTTPS load balancing. QUIC is a UDP-based encrypted transport protocol optimized for HTTPS, and it is used to deliver traffic for Google products such as Google Web Search, YouTube, and other services that you probably use through Google Chrome. QUIC offers faster and more reliable web connections, because a problem with one request does not slow down other streams in the same connection, even on an unreliable connection. At the time of writing, GCP is the first major public cloud to offer QUIC support for its HTTPS load balancers.

For the source and more details, refer to: Introducing QUIC support for HTTPS load balancing

QUIC: A UDP-Based Multiplexed and Secure Transport (draft-ietf-quic-transport-13)

Moreover, the GCP global HTTP(S) LB allows you to create custom request headers in case the default ones are not sufficient or do not meet your requirements.

User-defined request headers allow you to specify additional headers that the load balancer adds to requests. These headers can include information that the load balancer knows about the client connection, including the latency to the client, the geographic location of the client’s IP address, and parameters of the TLS connection. User-defined request headers are supported for backend services associated with HTTP(S) Load Balancers.

At the time of writing, this capability is in Beta, which means it is not covered by any SLA or deprecation policy and might be subject to backward-incompatible changes.

When it comes to instance health checks, the GCP LB health checks are used to decide whether an instance is “healthy” and functioning. Here, functioning might be verified with an application-layer probe such as an HTTP(S) probe. If you check the logs on an instance behind the GCP LB, you may notice that health check polling happens more frequently than what you configured. This is because the GCP LB creates redundant copies of each health checker to probe your instances; if any health checker fails, a redundant one can take over without delay.
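The Python sketch below models two aspects of health checking: an instance flips state only after a run of consecutive probe results, and redundant checkers multiply the probe rate an instance sees. The `HealthState` class, the threshold values, and the number of redundant checkers are illustrative assumptions; the article does not say how many redundant copies GCP runs.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Consecutive-result health tracking; threshold values are illustrative."""
    healthy_threshold: int = 2     # consecutive successes needed to become healthy
    unhealthy_threshold: int = 2   # consecutive failures needed to become unhealthy
    healthy: bool = False
    streak: int = 0                # >0: consecutive successes, <0: consecutive failures

    def record(self, probe_ok: bool) -> bool:
        if probe_ok:
            self.streak = self.streak + 1 if self.streak > 0 else 1
            if self.streak >= self.healthy_threshold:
                self.healthy = True
        else:
            self.streak = self.streak - 1 if self.streak < 0 else -1
            if -self.streak >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


def probes_per_second(check_interval_s: float, redundant_checkers: int) -> float:
    """Probe rate an instance sees when several checkers each poll at the configured interval."""
    return redundant_checkers / check_interval_s
```

With a 5-second interval and a hypothetical 3 redundant checkers, `probes_per_second(5, 3)` gives 0.6 probes per second, three times what the configured interval alone suggests, which matches the extra log entries described above.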

As discussed in the previous blog “Networking in AWS vs. Google Cloud Platform – Design Considerations”, each VPC in GCP has a single distributed virtual software firewall (FW). To make sure the LB and health checks can communicate with the intended VMs in the respective instance group, this traffic must be explicitly allowed by the FW rules: you must create a firewall rule that allows traffic from the IP address ranges the load balancer uses to connect to backend instances to reach those instances. This rule allows traffic from both the load balancer and the health checker. Also, keep in mind that GCP firewall rules block and allow traffic at the instance level, not at the edges of the network; they cannot prevent traffic from reaching the load balancer itself.
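Conceptually, such an ingress rule is a source-range membership test applied at each instance. The Python sketch below uses the standard `ipaddress` module to express that test. The two ranges are an assumption on our part: they are the ones Google’s documentation lists for HTTP(S) load balancer proxies and health checkers, so verify them against current GCP documentation before relying on them. The function name is hypothetical.

```python
import ipaddress

# Assumption: source ranges Google documents for HTTP(S) load balancer
# proxies and health checkers. Verify against current GCP documentation.
LB_SOURCE_RANGES = [
    ipaddress.ip_network("130.211.0.0/22"),
    ipaddress.ip_network("35.191.0.0/16"),
]


def firewall_allows(source_ip: str, allowed_ranges=LB_SOURCE_RANGES) -> bool:
    """True if an ingress packet from source_ip matches one of the allowed ranges."""
    addr = ipaddress.ip_address(source_ip)
    # Membership is simply False (not an error) when IPv4/IPv6 versions differ.
    return any(addr in network for network in allowed_ranges)


print(firewall_allows("130.211.2.17"))   # True: inside a load balancer / health check range
print(firewall_allows("198.51.100.23"))  # False: a source outside the allowed ranges
```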

One of the key aspects of connectivity design with the GCP LB is that, by default, HTTP(S) load balancing distributes requests evenly among available instances. However, some clients sit behind a NAT device and therefore appear to come from the same IP address, and stateful servers used by ads, gaming applications, and similar services may involve requests that pass through multiple application tiers before the user ends up on the target instance. If the session is disconnected, for example due to poor link quality or a mobile user on the move, the result can be a bad user experience. That is why you should consider “session affinity”, which identifies requests from a user by the client IP address or, preferably, by the value of a cookie, and directs that client’s requests to the same instance consistently, provided the instance is healthy and has the capacity to handle them.
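Session affinity boils down to mapping a stable key (client IP or cookie value) to the same healthy instance every time. The Python sketch below does this with a SHA-256 hash and modulo selection. It is an illustration of the idea, not how GCP implements affinity; the function name, instance names, and keys are hypothetical.

```python
import hashlib
from typing import Iterable, Optional, Set


def pick_instance(affinity_key: str, instances: Iterable[str], healthy: Set[str]) -> Optional[str]:
    """Map an affinity key (client IP or cookie value) to a stable healthy instance."""
    candidates = sorted(i for i in instances if i in healthy)
    if not candidates:
        return None
    digest = hashlib.sha256(affinity_key.encode("utf-8")).digest()
    return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]


group = ["vm-a", "vm-b", "vm-c"]
# Two users behind the same NAT share a client IP, so IP affinity pins both to one VM;
# distinct cookie values let them be spread independently.
print(pick_instance("198.51.100.7", group, set(group)))
print(pick_instance("cookie:3f9a", group, set(group)))
```

Because this sketch uses plain modulo hashing, any change in the set of healthy instances remaps many keys to different VMs, which relates to the autoscaling caveat that follows. Consistent-hashing schemes reduce that remapping but do not eliminate it.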

Beware that when autoscaling adds or removes instances within an instance group, the backend service may reallocate load and the target instance may move. To reduce the impact of this, ensure that the initial minimum number of instances provisioned by the autoscaler is sufficient to handle the anticipated load, so that autoscaling kicks in only when there is an unexpected load increase. However, this is not always possible: it requires a good understanding of the expected load and of the workload needed to handle it, and the load may vary across the day or the week, so the minimum number of instances cannot be pre-provisioned to cover the expected load for the entire week (this is where autoscaling is required).
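Sizing that minimum is simple arithmetic: expected load divided by what one instance can serve at the target utilization, rounded up. The Python sketch below also shows why a fixed minimum struggles with a varying load profile. The function names and all numbers (per-instance capacity, target utilization, hourly request rates) are hypothetical.

```python
import math
from typing import Iterable, Tuple


def min_instances(expected_rps: float, per_instance_rps: float, target_utilization: float) -> int:
    """Smallest group size that serves expected_rps while staying at target utilization."""
    usable_rps = per_instance_rps * target_utilization
    if usable_rps <= 0:
        raise ValueError("per-instance capacity and target utilization must be positive")
    return math.ceil(expected_rps / usable_rps)


def sizing_range(hourly_rps: Iterable[float], per_instance_rps: float,
                 target_utilization: float) -> Tuple[int, int]:
    """(off-peak size, peak size) across a load profile."""
    sizes = [min_instances(r, per_instance_rps, target_utilization) for r in hourly_rps]
    return min(sizes), max(sizes)


print(min_instances(1200, 200, 0.75))              # 8
print(sizing_range([300, 1200, 2700], 200, 0.75))  # (2, 18)
```

The gap between the off-peak and peak sizes is the capacity you would either pay for around the clock or leave to the autoscaler.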

Furthermore, Kubernetes Engine offers integrated support for the Google HTTP(S) LB: when you create an Ingress in a cluster, Kubernetes Engine creates an HTTP(S) load balancer and configures it to route traffic to your application. Path matchers can also be used for more specific request routing to multiple containers with different images or functions. According to GCP, “If you are exposing an HTTP(S) service hosted on Kubernetes Engine, HTTP(S) load balancing is the recommended method for load balancing.”

Setting up HTTP Load Balancing with Ingress

Compute Engine Load Balancing hits 1 million requests per second! (2013)

In practice, the system or cluster admin needs to create a Service resource of type NodePort to make the frontend Pods deployment, such as a web frontend, reachable within the container cluster.
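A NodePort Service of this kind is usually written as YAML, but `kubectl` also accepts JSON manifests, so the Python sketch below builds the same structure as a dict and prints it as JSON. The Service name `web`, the `app: web` label, the ports, and the `make_service.py` filename in the comment are hypothetical.

```python
import json


def nodeport_service(name: str, app_label: str, port: int, target_port: int) -> dict:
    """Build a Kubernetes Service manifest of type NodePort as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",                 # expose the Service on every node's IP
            "selector": {"app": app_label},     # Pods carrying this label receive traffic
            "ports": [{"protocol": "TCP", "port": port, "targetPort": target_port}],
        },
    }


if __name__ == "__main__":
    # e.g. `python make_service.py | kubectl apply -f -`
    print(json.dumps(nodeport_service("web", "web", 80, 8080), indent=2))
```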

Then an Ingress resource is created on the cluster; the Ingress controller responds by creating an HTTP(S) load balancer that routes all external HTTP(S) traffic to the frontend web NodePort Service that was exposed.
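Continuing the same approach, the Python sketch below builds an Ingress that sends everything to the web Service by default and uses a path rule for more specific routing. It uses the current `networking.k8s.io/v1` schema; clusters from the article’s era used `extensions/v1beta1`, whose backend fields were named `serviceName` and `servicePort`. The `/video/*` glob-style path with `pathType: ImplementationSpecific` follows the GKE convention as we understand it (an assumption), and the Service names are hypothetical. On GKE, each referenced Service would also need to be exposed as a NodePort, as described above.

```python
import json
from typing import Dict


def _backend(service: str, port: int) -> dict:
    return {"service": {"name": service, "port": {"number": port}}}


def ingress(name: str, default_service: str, path_rules: Dict[str, str], port: int = 80) -> dict:
    """Build an Ingress manifest (networking.k8s.io/v1) with optional path rules."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "defaultBackend": _backend(default_service, port),
            "rules": [{"http": {"paths": [
                {"path": path, "pathType": "ImplementationSpecific",
                 "backend": _backend(service, port)}
                for path, service in path_rules.items()
            ]}}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(ingress("web-ingress", "web", {"/video/*": "video"}), indent=2))
```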

One of the key design considerations here is that the LB is only node/VM aware (the GCP LB typically points to instance group(s) that contain the backend cluster nodes), whereas in a containerized application architecture the VM:Pod ratio is almost never 1:1. This can introduce an imbalanced load distribution issue.

Consequently, as illustrated in the figure above, if traffic is evenly distributed between the two available nodes (50:50) hosting Pods that are part of the target Service, the Pod on the left node will handle 50% of the traffic while each Pod hosted by the right node will receive about 25%. Kubernetes kube-proxy (iptables at the kernel level) handles the distribution of traffic here, helping to take into account all the Pods that are part of the specific Service across all nodes.
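The 50% / 25% / 25% split follows directly from the node-level view, and the Python sketch below computes it with exact fractions, alongside the uniform split you get when kube-proxy picks among all Service Pods cluster-wide. The two models and their function names are our simplification; `[1, 2]` encodes the figure’s layout of one Pod on the left node and two on the right.

```python
from fractions import Fraction
from typing import List


def shares_node_local(pods_per_node: List[int]) -> List[Fraction]:
    """LB splits evenly across nodes and each node serves only from its own Pods."""
    if any(p <= 0 for p in pods_per_node):
        raise ValueError("every node in this model must host at least one Pod")
    n = len(pods_per_node)
    return [Fraction(1, n * p) for p in pods_per_node for _ in range(p)]


def shares_cluster_wide(pods_per_node: List[int]) -> List[Fraction]:
    """kube-proxy picks uniformly among all Service Pods, regardless of node."""
    total = sum(pods_per_node)
    return [Fraction(1, total)] * total


print([str(s) for s in shares_node_local([1, 2])])    # ['1/2', '1/4', '1/4']
print([str(s) for s in shares_cluster_wide([1, 2])])  # ['1/3', '1/3', '1/3']
```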

As shown in the figure above, because the backend Service (iptables) can randomly pick a Pod that may reside on a different node, there will be an extra network hop for both incoming and return traffic. As a result, this creates what is commonly known as a “traffic trombone”.
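How often the trombone occurs can be estimated from the Pod layout alone. Under the simplifying assumptions that the LB gives each node an equal share and kube-proxy then picks uniformly among all Service Pods, a request landing on a node goes to a remote Pod with probability 1 minus (local Pods / total Pods). The Python sketch below averages that over nodes; the function name and layouts are hypothetical.

```python
from fractions import Fraction
from typing import List


def extra_hop_fraction(pods_per_node: List[int]) -> Fraction:
    """Share of requests forwarded to a Pod on another node.

    Model: the LB sends each node an equal share, and kube-proxy then picks
    uniformly among all Service Pods in the cluster.
    """
    total = sum(pods_per_node)
    if total == 0:
        raise ValueError("the Service has no Pods")
    per_node = [Fraction(total - local, total) for local in pods_per_node]
    return sum(per_node, Fraction(0)) / len(pods_per_node)


print(extra_hop_fraction([1, 2]))  # 1/2
print(extra_hop_fraction([3]))     # 0
```

For the two-node layout in the figure, half of all requests take the extra hop in this model.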

You may also have noticed that both source and destination NAT are performed. Destination NAT sends the traffic to the selected Pod, while source NAT ensures that the return traffic comes back to the node originally selected by the LB. That node can then translate the source back to the LB-facing address before sending the traffic back to the client; otherwise there would be a session mismatch, because the original client request was addressed to the LB IP.
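The address rewrites can be traced with a toy model. The Python sketch below ignores ports and connection tracking and only follows source/destination IPs: the node DNATs to the remote Pod and SNATs to itself, and the reply must come back through that node so both translations can be undone. All addresses are hypothetical; `192.0.2.10` stands in for the LB-facing address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def to_remote_pod(pkt: Packet, node_ip: str, pod_ip: str) -> Packet:
    """DNAT the destination to the chosen Pod and SNAT the source to this node."""
    if pkt.dst != node_ip:
        raise ValueError("packet was not addressed to this node")
    return Packet(src=node_ip, dst=pod_ip)


def back_to_client(reply: Packet, original: Packet, node_ip: str) -> Packet:
    """Undo both translations so the reply matches the session the client opened."""
    if reply.dst != node_ip:
        # Without SNAT the Pod would reply directly, bypassing this node's NAT state.
        raise ValueError("reply did not return via the node that performed NAT")
    return Packet(src=original.dst, dst=original.src)


inbound = Packet(src="192.0.2.10", dst="10.128.0.2")          # from the LB to node A
forwarded = to_remote_pod(inbound, node_ip="10.128.0.2", pod_ip="10.4.1.7")
reply = Packet(src=forwarded.dst, dst=forwarded.src)           # Pod on node B answers
print(back_to_client(reply, inbound, node_ip="10.128.0.2"))    # Packet(src='10.128.0.2', dst='192.0.2.10')
```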

In practice, this imbalance may not always be a big problem if the VM:Pod ratio is well balanced and the added latency is not an issue.

That being said, the good news is that GCP recently announced (at Google Cloud Next ’18) a new capability in GCP load balancing with Kubernetes: ‘container-native load balancing’ using network endpoint groups (NEGs), in which the GCP LB is container aware. This means the LB targets containers directly rather than the node/VM. This native container support offers more efficient load distribution and more accurate health checks (container-level visibility), without the need for multiple NAT operations. From the external clients’ point of view, this provides a better user experience thanks to the optimized data path: there is no intermediate node hop, which reduces the latency added by forwarding packets over multiple hops.

Some GCP customers need multiple Google Kubernetes Engine clusters to host their applications, for better resilience, scalability, isolation, and compliance, as well as for achieving low-latency application access from anywhere around the world.

For these cases, GCP introduced support for multi-cluster Ingress with a new CLI tool called “kubemci”, which helps automatically deploy an ingress using Google Cloud Load Balancing to serve multi-cluster Kubernetes Engine environments (federated or standalone clusters). As a result, customers can leverage the GCP global LB with multiple Kubernetes Engine clusters distributed across different regions, serving users from the closest cluster through a single GCP LB front-end anycast IP address.

Further reading:

How to deploy geographically distributed services on Kubernetes Engine with kubemci


Marwan Al-shawi – CCDE No. 20130066, Google Cloud Certified Architect, AWS Certified Solutions Architect, Cisco Press author (author of the Top Cisco Certifications’ Design Books “CCDE Study Guide” and the upcoming “CCDP Arch 4th Edition”). He is an experienced technical architect. Marwan has been in the networking industry for more than 12 years and has been involved in architecting, designing, and implementing various large-scale networks, some of which are global service provider-grade networks. Marwan holds a Master of Science degree in internetworking from the University of Technology, Sydney. Marwan enjoys helping and assisting others; therefore, he was selected as a Cisco Designated VIP by the Cisco Support Community (CSC) (official Cisco Systems forums) in 2012, and by the Solutions and Architectures subcommunity in 2014. In addition, Marwan was selected as a member of the Cisco Champions program in 2015 and 2016.