AWS VPC Gateway vs. Interface Endpoints

Setting up a VPC within AWS is critical to the security of a cloud setup. When Lambda functions are connected to a VPC, they lose access to AWS services unless specific steps are taken to allow it.

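First, the appropriate security group must be attached to the Lambda. Most AWS services are accessed via HTTPS, so port 443 must be open for egress (outbound) in that security group. Security groups are stateful, so the response traffic is allowed automatically. An ingress (inbound) rule for port 443 is needed on the security group attached to an interface endpoint, not on the Lambda's own security group.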
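As a rough illustration of the outbound HTTPS rule, here is a minimal sketch using the request shape of boto3's authorize_security_group_egress. The helper names, the default CIDR, and the rule description are assumptions made for this example, not part of the original setup.

```python
def https_egress_rule(cidr="0.0.0.0/0"):
    """Build one IpPermissions entry allowing outbound TCP 443 to `cidr`.

    The default CIDR is an assumption; a tighter range may be preferable.
    """
    return {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [
            {"CidrIp": cidr, "Description": "HTTPS to AWS service endpoints"}
        ],
    }


def allow_https_egress(ec2_client, group_id, cidr="0.0.0.0/0"):
    """Add the rule to a security group.

    `ec2_client` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    return ec2_client.authorize_security_group_egress(
        GroupId=group_id,
        IpPermissions=[https_egress_rule(cidr)],
    )
```

A security group that still has the default allow-all egress rule already covers this, so the call is only needed when egress has been locked down.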

Next, a VPC endpoint needs to be set up so that the Lambda can reach the appropriate AWS service. Most services use VPC interface endpoints. For DynamoDB and S3, there is a second option called a VPC gateway endpoint, although these two services can also be reached through interface endpoints like any other service. Gateway endpoints are the older mechanism. I will describe one of the configuration differences that got me into trouble.

One of the differences between gateway and interface endpoints is how they are wired into the VPC. A gateway endpoint works by adding a route to a route table, so it has to be explicitly associated with every route table used by the subnets your Lambda functions are attached to. An interface endpoint is not attached to route tables at all. It places a network interface in the subnets you choose, and as long as it has a network interface in at least one subnet, Lambda functions in any availability zone of that VPC can reach it.
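The difference is visible in the parameters each endpoint type takes. The sketch below builds keyword arguments for boto3's create_vpc_endpoint; the helper names and all IDs in the usage note are hypothetical, and enabling private DNS is an assumption about the desired setup.

```python
def gateway_endpoint_request(vpc_id, service_name, route_table_ids):
    """Arguments for a gateway endpoint (S3 or DynamoDB).

    Every route table used by a Lambda subnet has to be listed here.
    """
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": service_name,
        "RouteTableIds": list(route_table_ids),
    }


def interface_endpoint_request(vpc_id, service_name, subnet_ids, security_group_ids):
    """Arguments for an interface endpoint.

    It takes subnets and security groups, not route tables.
    """
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": service_name,
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        "PrivateDnsEnabled": True,  # assumption: resolve the normal service hostname
    }
```

Either dictionary can be passed as ec2.create_vpc_endpoint(**request), for example with a service name such as "com.amazonaws.us-east-1.s3" (the region is only an example).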

To illustrate this difference, here is a problem I hit in an application I was developing. I wanted to reuse Terraform code to bring up gateway and interface endpoints at the same time. When assigning the endpoints to subnets and their route tables, I assumed it was safe to attach both kinds of endpoint to the subnets in availability zones A, B, and C and to skip availability zone D, since some AWS services are currently not supported in that zone. As a result, the following error showed up intermittently, about 50% of the time, when calling a Lambda that used S3:

{
  "errorMessage": "An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied",
  "errorType": "ClientError",
  "stackTrace": [
    "  File \"/var/task/plrsdfastca_statements_testing.py\", line 21, in lambda_handler\n    response = client.list_objects_v2(\n",
    "  File \"/var/runtime/botocore/client.py\", line 386, in _api_call\n    return self._make_api_call(operation_name, kwargs)\n",
    "  File \"/var/runtime/botocore/client.py\", line 705, in _make_api_call\n    raise error_class(parsed_response, operation_name)\n"
  ]
}

This error is misleading and says nothing about the real cause: the S3 gateway endpoint was not on the route table for the subnet in availability zone D. The failures were intermittent because the Lambda function was attached to two subnets and presumably only failed when it was invoked from the subnet in availability zone D, whose route table had no S3 gateway endpoint.
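A check along the following lines could have caught the gap before deployment. It is a minimal sketch, not the code used in the application: it assumes the "RouteTables" list returned by boto3's describe_route_tables for a single VPC, and the function name and IDs are hypothetical.

```python
def subnets_missing_endpoint(route_tables, subnet_ids, endpoint_id):
    """Return the subnets whose effective route table has no route to endpoint_id.

    route_tables: the "RouteTables" list from ec2.describe_route_tables(),
    filtered to one VPC. Subnets without an explicit association fall back
    to the VPC's main route table.
    """
    def has_endpoint(table):
        # Gateway endpoint routes carry the endpoint ID (vpce-...) as GatewayId.
        return any(r.get("GatewayId") == endpoint_id for r in table.get("Routes", []))

    explicit = {}
    main_table = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("Main"):
                main_table = table
            elif "SubnetId" in assoc:
                explicit[assoc["SubnetId"]] = table

    missing = []
    for subnet_id in subnet_ids:
        table = explicit.get(subnet_id, main_table)
        if table is None or not has_endpoint(table):
            missing.append(subnet_id)
    return missing
```

Passing the subnet IDs from the Lambda's VPC configuration returns the subnets from which S3 calls would bypass the gateway endpoint.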

Other times the error would be an SSL error, which also doesn't tell you what is really going on. To reproduce the error, I removed the S3 gateway endpoint's entry from the route table of the subnet the Lambda was on. Without a route through the endpoint, the Lambda tries to reach S3 over the internet, and since our IAM policy doesn't allow that, an access denied error occurs. For other services, this may show up as an SSL error, which again is very misleading.
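Because the error text points at permissions, it can help to log a reminder about routing next to it. The sketch below is one hypothetical way to do that: it relies only on the response attribute that botocore's ClientError carries, and the function name and hint wording are my own.

```python
def routing_hint(exc):
    """Return a hint if `exc` looks like an AccessDenied ClientError, else None.

    AccessDenied can still be a genuine IAM problem; the hint only adds
    the routing cause as something else to check.
    """
    response = getattr(exc, "response", None) or {}
    code = response.get("Error", {}).get("Code")
    if code == "AccessDenied":
        return (
            "AccessDenied from inside a VPC: also check that the gateway "
            "endpoint is on the route table of every subnet the Lambda uses."
        )
    return None
```

In a handler this would sit in an except block around the S3 call, for example logging routing_hint(err) before re-raising.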