Troubleshoot your Network Load Balancer - Elastic Load Balancing

The following information can help you troubleshoot issues with your Network Load Balancer.

A registered target is not in service

If a target is taking longer than expected to enter the InService state, it might be failing health checks. Your target is not in service until it passes one health check. For more information, see Health checks for your target groups.

Verify that your instance is failing health checks and then check for the following:

A security group does not allow traffic

The security groups associated with an instance must allow traffic from the load balancer using the health check port and health check protocol. For more information, see Target security groups.

A network access control list (ACL) does not allow traffic

The network ACL associated with the subnets for your instances and the subnets for your load balancer must allow traffic and health checks from the load balancer. For more information, see Network ACLs.

Requests are not routed to targets

Check for the following:

A security group does not allow traffic

The security groups associated with the instances must allow traffic on the listener port from client IP addresses (if targets are specified by instance ID) or load balancer nodes (if targets are specified by IP address). For more information, see Target security groups.

A network access control list (ACL) does not allow traffic

The network ACLs associated with the subnets for your VPC must allow the load balancer and targets to communicate in both directions on the listener port. For more information, see Network ACLs.

The targets are in an Availability Zone that is not enabled

If you register targets in an Availability Zone but do not enable the Availability Zone, these registered targets do not receive traffic from the load balancer.

The instance is in a peered VPC

If you have instances in a VPC that is peered with the load balancer VPC, you must register them with your load balancer by IP address, not by instance ID.
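
The Availability Zone check above can be sketched as a small comparison between the zones of your registered targets and the zones enabled on the load balancer. This is a minimal illustration, not an AWS API: the Target class, the targets_in_disabled_zones function, and the instance IDs and zone names are all hypothetical, and you would populate them from your own inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    target_id: str          # instance ID or IP address (hypothetical values below)
    availability_zone: str  # zone in which the target was registered


def targets_in_disabled_zones(targets, enabled_zones):
    """Return the targets whose zone is not enabled on the load balancer."""
    enabled = set(enabled_zones)
    return [t for t in targets if t.availability_zone not in enabled]


targets = [
    Target("i-aaa", "us-east-1a"),
    Target("i-bbb", "us-east-1b"),
    Target("i-ccc", "us-east-1c"),
]

# Only two of the three zones are enabled, so i-ccc receives no traffic.
stranded = targets_in_disabled_zones(targets, ["us-east-1a", "us-east-1b"])
print([t.target_id for t in stranded])
```

Any target that this check returns is registered but does not receive traffic until its Availability Zone is enabled.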

Targets receive more health check requests than expected

Health checks for a Network Load Balancer are distributed and use a consensus mechanism to determine target health. Therefore, targets receive more health checks than the number implied by the HealthCheckIntervalSeconds setting.

Targets receive fewer health check requests than expected

Check whether net.ipv4.tcp_tw_recycle is enabled. This setting is known to cause issues with load balancers. The net.ipv4.tcp_tw_reuse setting is considered a safer alternative.
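
One way to check the setting on a Linux target is to read it from the /proc/sys tree, as in the following sketch. The function names read_sysctl_flag and tw_recycle_status are hypothetical helpers, and the proc_root parameter exists only so that the check can be pointed at a different directory. On some kernels the setting is not exposed at all, in which case the helper reports that it is absent.

```python
from pathlib import Path


def read_sysctl_flag(name, proc_root="/proc/sys"):
    """Return the integer value of a sysctl, or None if it cannot be read."""
    path = Path(proc_root) / name.replace(".", "/")
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        # Missing file, unreadable file, or unexpected content.
        return None


def tw_recycle_status(proc_root="/proc/sys"):
    """Summarize the state of net.ipv4.tcp_tw_recycle on this host."""
    value = read_sysctl_flag("net.ipv4.tcp_tw_recycle", proc_root)
    if value is None:
        return "net.ipv4.tcp_tw_recycle is not present on this kernel"
    if value != 0:
        return "net.ipv4.tcp_tw_recycle is enabled; consider disabling it"
    return "net.ipv4.tcp_tw_recycle is disabled"


print(tw_recycle_status())
```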

Unhealthy targets receive requests from the load balancer

This occurs when all registered targets are unhealthy. If there is at least one healthy registered target, your Network Load Balancer routes requests only to its healthy registered targets.

When there are only unhealthy registered targets, the Network Load Balancer routes requests to all the registered targets. This is known as fail-open mode. The Network Load Balancer does this instead of removing all the IP addresses from DNS when all the targets are unhealthy and the respective Availability Zones have no healthy targets to send requests to.

Target fails HTTP or HTTPS health checks due to host header mismatch

The HTTP host header in the health check request contains the IP address of the load balancer node and the listener port, not the IP address of the target and the health check port. If you are mapping incoming requests by host header, you must ensure that health checks match any HTTP host header. Another option is to add a separate HTTP service on a different port and configure the target group to use that port for health checks instead. Alternatively, consider using TCP health checks.
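
The first option above (making health checks match any HTTP host header) can be sketched as a dispatch function that answers the health check path before it looks at the host header. Everything named here is an assumption for illustration: the /healthz path, the SITES mapping, and the handle_request function are hypothetical and would correspond to whatever your own web server or framework provides. The host parsing handles hostnames and IPv4 addresses only.

```python
HEALTH_CHECK_PATH = "/healthz"  # hypothetical path configured on the target group

# Hypothetical mapping of host names to the application that serves them.
SITES = {"app.example.com": "app", "api.example.com": "api"}


def handle_request(host_header, path):
    """Return (status_code, body) for a request."""
    # Answer health checks first, regardless of the host header, because the
    # header carries the load balancer node's IP address and listener port.
    if path == HEALTH_CHECK_PATH:
        return 200, "ok"

    # Ordinary requests are mapped by host name (port stripped, IPv4/hostname only).
    hostname = host_header.split(":", 1)[0].lower()
    site = SITES.get(hostname)
    if site is None:
        return 404, "unknown host"
    return 200, f"served by {site}"


print(handle_request("10.0.1.25:80", "/healthz"))  # health check from a node
print(handle_request("app.example.com", "/"))      # normal client request
print(handle_request("10.0.1.25:80", "/"))         # unmapped host header
```

Without the early return for the health check path, the first request would fall through to the host lookup and fail in the same way as the third.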

Unable to associate a security group with a load balancer

If the Network Load Balancer was created without security groups, it can't support security groups after creation. You can only associate a security group with a load balancer during creation, or with an existing load balancer that was originally created with security groups.

Unable to remove all security groups

If the Network Load Balancer was created with security groups, there must be at least one security group associated with it at all times. You cannot remove all security groups from the load balancer at the same time.

Increase in TCP_ELB_Reset_Count metric

For each TCP request that a client makes through a Network Load Balancer, the state of that connection is tracked. If no data is sent through the connection by either the client or the target for longer than the idle timeout, the connection is closed. If a client or a target sends data after the idle timeout period elapses, it receives a TCP RST packet to indicate that the connection is no longer valid. Additionally, if a target becomes unhealthy, the load balancer sends a TCP RST for packets received on the client connections associated with the target, unless the unhealthy target triggers the load balancer to fail open.

If you see a spike in the TCP_ELB_Reset_Count metric just before or just as the UnhealthyHostCount metric increases, it is likely that the TCP RST packets were sent because the target was starting to fail but hadn't been marked unhealthy. If you see persistent increases in TCP_ELB_Reset_Count without targets being marked unhealthy, you can check the VPC flow logs for clients sending data on expired flows.
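
The idle timeout behavior described above can be modeled with a small flow table. This is a simplified sketch of the idea, not the load balancer's implementation: the FlowTable class is hypothetical, times are plain numbers of seconds, and the idle timeout of 350 seconds used in the example is an assumed value that you should replace with the timeout configured for your listener.

```python
class FlowTable:
    """Toy model of connection tracking with an idle timeout."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self._last_seen = {}  # flow identifier -> time of last data

    def open(self, flow, now):
        """Record a newly established connection."""
        self._last_seen[flow] = now

    def packet(self, flow, now):
        """Return "FORWARDED" for a live flow, or "RST" for an expired or unknown one."""
        last = self._last_seen.get(flow)
        if last is None or now - last > self.idle_timeout:
            # The flow is no longer tracked, so the sender receives a TCP RST.
            self._last_seen.pop(flow, None)
            return "RST"
        self._last_seen[flow] = now
        return "FORWARDED"


table = FlowTable(idle_timeout=350)  # assumed timeout, in seconds
table.open("client-1", now=0)
print(table.packet("client-1", now=100))  # data within the timeout
print(table.packet("client-1", now=801))  # silent for 701 seconds
```

In this model, a client that sends data at least once within each timeout window keeps its flow alive, and a client that stays silent for longer is reset on its next packet.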

Connections time out for requests from a target to its load balancer

Check whether client IP preservation is enabled on your target group. NAT loopback, also known as hairpinning, is not supported when client IP preservation is enabled. If an instance is a client of a load balancer that it's registered with, and it has client IP preservation enabled, the connection succeeds only if the request is routed to a different instance. If the request is routed to the same instance it was sent from, the connection times out because the source and destination IP addresses are the same.

If an instance must send requests to a load balancer that it's registered with, do one of the following:

  • Disable client IP preservation.

  • Ensure that containers that must communicate are on different container instances.
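
The hairpinning rule in this section can be stated as a small predicate. The function name hairpin_outcome and the instance IDs are hypothetical, and the sketch only restates the rule above: with client IP preservation enabled, a request that is routed back to the instance that sent it times out.

```python
def hairpin_outcome(client_instance, routed_instance, client_ip_preservation):
    """Predict the result when an instance calls a load balancer it is registered with."""
    if client_ip_preservation and client_instance == routed_instance:
        # Source and destination IP addresses are the same, so the connection times out.
        return "timeout"
    return "success"


registered = ["i-aaa", "i-bbb"]  # hypothetical registered instances
for routed in registered:
    print(routed, hairpin_outcome("i-aaa", routed, client_ip_preservation=True))
```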

Performance decreases when moving targets to a Network Load Balancer

Both Classic Load Balancers and Application Load Balancers use connection multiplexing, but Network Load Balancers do not. Therefore, your targets can receive more TCP connections behind a Network Load Balancer. Be sure that your targets are prepared to handle the volume of connection requests they might receive.

If your Network Load Balancer is associated with a VPC endpoint service, it supports 55,000 simultaneous connections or about 55,000 connections per minute to each unique target (IP address and port). If you exceed these limits, there is an increased chance of port allocation errors. Port allocation errors can be tracked using the PortAllocationErrorCount metric. To fix port allocation errors, add more targets to the target group. For more information, see CloudWatch metrics for your Network Load Balancer.
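
As a rough sizing aid, you can estimate how many targets are needed to stay under the per-target figure quoted above. The sketch below assumes that connections are spread evenly across targets, which might not hold in practice, and the min_targets function and its headroom parameter are hypothetical.

```python
import math

PER_TARGET_LIMIT = 55_000  # simultaneous connections per unique target, from the text above


def min_targets(peak_connections, headroom=1.0):
    """Estimate the number of targets needed, assuming an even spread of connections.

    headroom is the fraction of the per-target limit you are willing to use,
    for example 0.8 to plan for 80 percent utilization.
    """
    usable = PER_TARGET_LIMIT * headroom
    return max(1, math.ceil(peak_connections / usable))


print(min_targets(200_000))                # plan right up to the limit
print(min_targets(200_000, headroom=0.8))  # leave 20 percent spare
```

For 200,000 simultaneous connections, this estimate gives 4 targets at the limit and 5 targets with 20 percent headroom.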

Intermittent connection failure when client IP preservation is enabled

When client IP preservation is enabled, you might encounter TCP/IP connection limitations related to observed socket reuse on the targets. These connection limitations can occur when a client, or a NAT device in front of the client, uses the same source IP address and source port when connecting to multiple load balancer nodes simultaneously. If the load balancer routes these connections to the same target, the connections appear to the target as if they come from the same source socket, which results in connection errors. If this happens, clients can retry (if the connection fails) or reconnect (if the connection is interrupted). You can reduce this type of connection error by increasing the number of source ephemeral ports or by increasing the number of targets for the load balancer. You can prevent this type of connection error by disabling client IP preservation or by disabling cross-zone load balancing.

Additionally, when client IP preservation is enabled, connectivity might fail if the clients that are connecting to the Network Load Balancer are also connected to targets behind the load balancer. To resolve this, you can disable client IP preservation on the affected target groups. Alternatively, have your clients connect only to the Network Load Balancer, or only to the targets, but not both.

TCP connection delays

When both cross-zone load balancing and client IP preservation are enabled, a client connecting to different IPs on the same load balancer may be routed to the same target. If the client uses the same source port for both of these connections, the target will receive what appears to be a duplicate connection, which can lead to connection errors and TCP delays in establishing new connections. You can prevent this type of connection error by disabling cross-zone load balancing. For more information, see Cross-zone load balancing.
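
The duplicate connection described above arises because, with client IP preservation, the target sees only the client's source IP address and port, not the load balancer node that the client used. The following sketch groups connection records the way a target would see them and reports the groups that arrived through more than one node. The Connection record, the function name, and the addresses are all hypothetical. You would need to assemble such records from your own logs.

```python
from collections import defaultdict
from typing import NamedTuple


class Connection(NamedTuple):
    src_ip: str      # client (or NAT device) source address
    src_port: int    # client source port
    lb_node_ip: str  # load balancer node the client connected to
    target: str      # target selected by the load balancer, as "ip:port"


def find_source_socket_collisions(connections):
    """Return connections that look identical to a target but used different nodes."""
    seen = defaultdict(list)
    for conn in connections:
        # The target cannot distinguish nodes, so the node is not part of the key.
        seen[(conn.src_ip, conn.src_port, conn.target)].append(conn)
    return {
        key: conns
        for key, conns in seen.items()
        if len({c.lb_node_ip for c in conns}) > 1
    }


connections = [
    Connection("203.0.113.7", 40001, "10.0.1.10", "10.0.3.5:443"),
    Connection("203.0.113.7", 40001, "10.0.2.10", "10.0.3.5:443"),  # same socket, other node
    Connection("203.0.113.7", 40002, "10.0.2.10", "10.0.3.5:443"),
]
collisions = find_source_socket_collisions(connections)
print(list(collisions))
```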

Potential failure when the load balancer is being provisioned

One reason that a Network Load Balancer can fail during provisioning is the use of an IP address that is already assigned or allocated elsewhere (for example, assigned as a secondary IP address for an EC2 instance). This IP address prevents the load balancer from being set up, and its state is failed. You can resolve this by de-allocating the associated IP address and retrying the creation process.

DNS name resolution contains fewer IP addresses than enabled Availability Zones

Ideally, your Network Load Balancer provides one IP address per enabled Availability Zone, provided that each Availability Zone has at least one healthy host. When there are no healthy hosts in a particular Availability Zone and cross-zone load balancing is disabled, the IP address of the Network Load Balancer for that Availability Zone is removed from DNS.

For example, suppose your Network Load Balancer has three Availability Zones enabled, all of which have at least one healthy registered target instance.

  • If the registered target instance(s) in Availability Zone A become unhealthy, the corresponding IP address of Availability Zone A for the Network Load Balancer is removed from DNS.

  • If any two of the enabled Availability Zones have no healthy registered target instance(s), the respective two IP addresses of the Network Load Balancer will be removed from DNS.

  • If there are no healthy registered target instances in any of the enabled Availability Zones, fail-open mode is enabled and DNS returns the IP addresses of all three enabled Availability Zones.
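
The three cases above can be summarized in a short function. This is a sketch of the rules as stated in this section, assuming that cross-zone load balancing is disabled. The dns_answer function, the zone labels, and the IP addresses are hypothetical.

```python
def dns_answer(zone_ips, healthy_counts):
    """Return the load balancer IP addresses that DNS would provide.

    zone_ips maps each enabled Availability Zone to its load balancer IP address.
    healthy_counts maps each zone to its number of healthy registered targets.
    Assumes cross-zone load balancing is disabled.
    """
    with_healthy = [
        ip for zone, ip in zone_ips.items() if healthy_counts.get(zone, 0) > 0
    ]
    if with_healthy:
        return with_healthy
    # No zone has a healthy target: fail open and return every address.
    return list(zone_ips.values())


zone_ips = {"A": "192.0.2.10", "B": "192.0.2.20", "C": "192.0.2.30"}

print(dns_answer(zone_ips, {"A": 0, "B": 2, "C": 1}))  # zone A removed
print(dns_answer(zone_ips, {"A": 0, "B": 0, "C": 1}))  # zones A and B removed
print(dns_answer(zone_ips, {"A": 0, "B": 0, "C": 0}))  # fail open: all three
```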

Troubleshoot unhealthy targets using the resource map

If your Network Load Balancer targets are failing health checks, you can use the resource map to find unhealthy targets and take actions based on the failure reason code. For more information, see Network Load Balancer resource map.

The resource map provides two views: Overview and Unhealthy Target Map. Overview is selected by default and displays all of your load balancer's resources. Selecting the Unhealthy Target Map view displays only the unhealthy targets in each target group associated with the Network Load Balancer.

Note

Show resource details must be enabled to view the health check summary and error messages for all applicable resources within the resource map. When not enabled, you must select each resource to view its details.

The Target groups column displays a summary of the healthy and unhealthy targets for each target group. This can help you determine whether all the targets are failing health checks or only specific targets are failing. If all targets in a target group are failing health checks, check the target group's health check settings. Select a target group's name to open its detail page in a new tab.

The Targets column displays the TargetID and the current health check status for each target. When a target is unhealthy, the health check failure reason code is displayed. When a single target is failing a health check, verify the target has sufficient resources. Select a target's ID to open its detail page in a new tab.

Selecting Export gives you the option of exporting the current view of your Network Load Balancer's resource map as a PDF.

Verify that your instance is failing health checks and then, based on the failure reason code, check for the following issues:

  • Unhealthy: Request timed out

    • Verify that the security groups and network access control lists (ACLs) associated with your targets and Network Load Balancer are not blocking connectivity.

    • Verify the target has sufficient capacity available to accept connections from the Network Load Balancer.

    • The Network Load Balancer's health check responses can be viewed in each target's application logs. For more information, see Health check reason codes.

  • Unhealthy: FailedHealthChecks

    • Verify the target is listening for traffic on the health check port.

      When using a TLS listener

      You choose which security policy is used for front-end connections. The security policy used for back-end connections is automatically selected based on the front-end security policy in use.

      • If your TLS listener is using a TLS 1.3 security policy for front-end connections, the ELBSecurityPolicy-TLS13-1-0-2021-06 security policy is used for back-end connections.

      • If your TLS listener is not using a TLS 1.3 security policy for front-end connections, the ELBSecurityPolicy-2016-08 security policy is used for back-end connections.

      For more information, see Security policies.

    • Verify the target is providing a server certificate and key in the correct format specified by the security policy.

    • Verify that the target supports one or more matching ciphers and a protocol provided by the Network Load Balancer, so that TLS handshakes can be established.
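
The back-end policy selection for TLS listeners described above reduces to a two-way choice. The sketch below only restates that rule. The back_end_policy function is hypothetical, and deciding whether a given front-end policy counts as a TLS 1.3 policy is left to you and passed in as a boolean.

```python
def back_end_policy(front_end_uses_tls13):
    """Return the security policy used for back-end connections of a TLS listener."""
    if front_end_uses_tls13:
        return "ELBSecurityPolicy-TLS13-1-0-2021-06"
    return "ELBSecurityPolicy-2016-08"


print(back_end_policy(front_end_uses_tls13=True))
print(back_end_policy(front_end_uses_tls13=False))
```

Knowing which back-end policy applies tells you which protocols and ciphers the target must support for the handshake to succeed.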