How to Deal with “You have exceeded the maximum limit for HyperPlane ENIs for your account”

A guide on solving the error regarding exceeding the maximum limit for HyperPlane ENIs for your account.

Ivica Kolenkaš
AWS in Plain English


Photo by Lars Kienle on Unsplash

TL;DR

Problem

Error: You have exceeded the maximum limit for HyperPlane ENIs for your account.

Quick fix

Remove unused Lambda versions. Take your pick from one of these code snippets to avoid doing it manually.

Proper fix

  • Share the security_group:subnet combination across multiple Lambdas in your VPC where possible
  • Spread Lambdas across different VPCs; the limit of 250 Hyperplane ENIs is per VPC
  • Don’t enable networking on Lambdas that do not need it
  • Don’t attach irrelevant/unneeded security groups to Lambdas

The problem

We were surprised one day while deploying an existing Lambda function when the pipeline errored with:

Error: You have exceeded the maximum limit for HyperPlane ENIs for your account.

This was a definitive sign that we had started hitting the limit for Hyperplane ENIs (HENIs) in the VPC. Our first reaction was to ask for an increase, but AWS support (rightfully) asked us to try to reduce the number of HENIs first. So what are they exactly, and why does support not increase their limit easily?

The “what” and “why” of Hyperplane ENIs

Below is a short rehash of the original AWS post announcing Hyperplane ENIs for Lambdas.

AWS rolled out an improvement to how Lambda functions work in a VPC network between September 2019 and August 2020. It enabled more efficient usage of elastic network interfaces (ENIs) and faster startup times for functions. A picture says a thousand words, so here are two pictures:

Left: before Hyperplane ENIs. Right: after Hyperplane ENIs

Rather than using ENIs for networking, Lambdas started using HENIs which introduced several benefits:

  • Reduced startup times since the HENI is created when your Lambda is configured to use a VPC
  • Improved scaling of the Lambda since concurrent executions use the existing HENI
  • Reduced overall usage of ENIs since HENIs are created for each unique security group:subnet combination. All functions sharing the same security_group:subnet combination will re-use the existing HENI
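The "one HENI per unique combination" rule can be sketched in a few lines of Python. This is only an estimate under stated assumptions: `VpcConfig` and `estimate_henis` are hypothetical names, the IDs are made up, and I am assuming that a function's whole set of security groups, paired with each of its subnets, forms one combination.

```python
from typing import Iterable, NamedTuple


class VpcConfig(NamedTuple):
    """Hypothetical stand-in for one function's VPC settings."""
    security_group_ids: frozenset[str]
    subnet_ids: frozenset[str]


def estimate_henis(configs: Iterable[VpcConfig]) -> int:
    """Count unique (security group set, subnet) pairs across functions."""
    combos = set()
    for cfg in configs:
        for subnet in cfg.subnet_ids:
            combos.add((cfg.security_group_ids, subnet))
    return len(combos)


subnets = frozenset({"subnet-a", "subnet-b", "subnet-c"})

# Two functions sharing one security group reuse the same three combinations.
shared = [VpcConfig(frozenset({"sg-shared"}), subnets)] * 2

# Two functions with their own security groups need three combinations each.
separate = [
    VpcConfig(frozenset({"sg-one"}), subnets),
    VpcConfig(frozenset({"sg-two"}), subnets),
]

print(estimate_henis(shared))    # 3
print(estimate_henis(separate))  # 6
```

The same three subnets cost three HENIs or six depending only on whether the security group is shared, which is why per-function security groups add up so quickly.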

I was unable to find official documentation regarding limits on Hyperplane ENIs, but AWS support confirmed that the soft limit is 250 HENIs per VPC and the hard limit is 350 HENIs per VPC. These limits sound very low when compared to a soft limit of 5000 ENIs per VPC, but this should not worry us since their use cases are different, especially after the rollout of HENIs for Lambdas.

Our AWS account facts

The organisation I work for is fully hosted on AWS. We primarily use ECS and Lambda for computing, along with RDS, S3, and Redshift for data storage. At the time of writing, we have 97 Lambdas in our production account. If configured to use VPC networking, each of them uses a subnet in each of the 3 configured availability zones of a shared VPC. Roughly 70 out of 97 Lambdas were configured to use VPC networking with 3 subnets and at least one security group attached. If you have followed this article carefully, your alarm bells should be ringing by now.

70 x 3 = 210

This number is very close to the soft limit of 250 HENIs. Let me just accept that Fields Medal and I’ll continue…

To make things slightly worse, HENIs are not used by Lambdas exclusively — network load balancers and NAT Gateways also use them and yes, we have those in our VPC as well.

Finding the number of HENIs

To delete them you must first find them. This is easier said than done because there is no easy way to show all used HENIs through the AWS Console at this moment. Searching for “AWS Lambda VPC ENI” in the EC2 Network Interfaces console does show 70 interfaces used up by Lambdas, but that is not even close to the limit of 250.

I was never a master of AWS CLI so the closest thing to a proper command to list HENIs that I could come up with is:

This command shows the same number as my previous search in the AWS Console, and both of them are way off. Both techniques show the number of HENIs used by Lambdas, but there are close to 180 (250 - 70) HENIs that are used by something else.
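For readers who prefer Python to the CLI, the console search can be approximated with boto3. This is a minimal sketch, not the command referred to above: `count_lambda_enis` is a hypothetical helper, and it assumes that Lambda-managed interfaces carry a description starting with "AWS Lambda VPC ENI", as the console search suggests.

```python
def count_lambda_enis(ec2_client) -> int:
    """Count network interfaces whose description marks them as Lambda VPC ENIs."""
    paginator = ec2_client.get_paginator("describe_network_interfaces")
    pages = paginator.paginate(
        # Assumption: Lambda-managed interfaces use this description prefix.
        Filters=[{"Name": "description", "Values": ["AWS Lambda VPC ENI*"]}]
    )
    return sum(len(page["NetworkInterfaces"]) for page in pages)


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   print(count_lambda_enis(boto3.client("ec2")))
```

Passing the client in as an argument keeps the helper easy to try against a stub before pointing it at a real account.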

To find all the HENIs in the account I had to call Python to the rescue!

Lists ENIs and HENIs in your account
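What follows is not the script referred to above, only a rough sketch of one way to see what else holds network interfaces in a VPC: tally them by the `InterfaceType` field that the EC2 API returns. The helper name `interfaces_by_type` is hypothetical, and the API does not label an interface as "Hyperplane", so deciding which types count towards the limit (I would start with `lambda`, `nat_gateway` and `network_load_balancer`, matching the services named earlier) remains an assumption.

```python
from collections import Counter


def interfaces_by_type(ec2_client, vpc_id: str) -> Counter:
    """Tally the network interfaces in one VPC by their InterfaceType field."""
    paginator = ec2_client.get_paginator("describe_network_interfaces")
    pages = paginator.paginate(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])
    counts = Counter()
    for page in pages:
        for eni in page["NetworkInterfaces"]:
            # Fall back to "unknown" if the field is missing from a record.
            counts[eni.get("InterfaceType", "unknown")] += 1
    return counts


# Usage, assuming boto3 is installed and "vpc-0123example" is replaced
# with a real VPC ID:
#   import boto3
#   for kind, n in interfaces_by_type(boto3.client("ec2"), "vpc-0123example").most_common():
#       print(f"{kind}: {n}")
```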

Running the script revealed more information:

The output of the above script

Reducing the number of HENIs

Quick Fix

After my initial investigation, and after repeating to myself several times that a HENI is created for each security_group:subnet combination, I decided to try a quick fix: remove unused Lambda versions. Take your pick from one of these code snippets to avoid doing it manually. This reduced the number of used HENIs by about 20, which was enough to remain safely below the limit and allow me to implement a proper fix.
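If you would rather script the cleanup yourself, a minimal boto3 sketch could look like the one below. It is not one of the snippets mentioned above. The helper names and the "keep the newest two" default are my own assumptions, and it only protects versions that an alias points at directly, so weighted alias routing would need an extra check.

```python
def versions_to_delete(versions: list[str], aliased: set[str], keep: int = 2) -> list[str]:
    """Pick numbered versions that are not aliased and not among the newest `keep`."""
    numbered = sorted((v for v in versions if v != "$LATEST"), key=int)
    newest = set(numbered[-keep:]) if keep > 0 else set()
    return [v for v in numbered if v not in aliased and v not in newest]


def prune_versions(lambda_client, function_name: str, keep: int = 2, dry_run: bool = True) -> list[str]:
    """Delete (or, with dry_run, only list) old versions of one function."""
    version_pages = lambda_client.get_paginator("list_versions_by_function").paginate(
        FunctionName=function_name
    )
    versions = [v["Version"] for page in version_pages for v in page["Versions"]]

    alias_pages = lambda_client.get_paginator("list_aliases").paginate(
        FunctionName=function_name
    )
    aliased = {a["FunctionVersion"] for page in alias_pages for a in page["Aliases"]}

    doomed = versions_to_delete(versions, aliased, keep)
    if not dry_run:
        for version in doomed:
            lambda_client.delete_function(FunctionName=function_name, Qualifier=version)
    return doomed


# Usage, assuming boto3 is installed and "my-function" is a placeholder name:
#   import boto3
#   print(prune_versions(boto3.client("lambda"), "my-function"))  # dry run first
```

Running with `dry_run=True` first and reading the returned list is a cheap safeguard before deleting anything.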

Proper Fix

The problem was originally introduced with one of our Terraform module versions that assumed that every Lambda will:

  • Be configured to use a VPC by default
  • Have a default security group created and attached, even if not needed

These two very bold assumptions somehow passed the code review process, were merged, and were used by our teams for months, chipping away at that relatively low limit of 250 HENIs one terraform apply at a time.

Making the code change and releasing another Terraform module version was relatively easy but talking to teams, helping them understand and implement the change took some weeks. The following flowchart helped us understand what to do with each specific function for every team.

A flowchart that helped us understand if a specific Lambda needs networking enabled

Conclusion

Hyperplane ENIs are a great feature of AWS networking with their many benefits. As with most tools and services, it’s all about how you use and configure them. We made some assumptions without questioning them or understanding their implications for the future and they came back to bite us.

Learning from your own mistakes is the most “expensive” way to learn. What we learned from this situation is to pay better attention to default values and patterns we create in our infrastructure code. With it, we create a standard for our organisation and we should be triple-sure about the decisions and assumptions made while doing so.
