End-to-end pipeline mocking with LocalStack and Airflow (MWAA)

It is no secret that with workflow orchestration tools like Airflow, testing and iterating on a pipeline is notoriously hard, especially when cloud components come into play.
In this article, we will explore an easy way to set up a complete mock infrastructure using Docker Compose, LocalStack, and the MWAA (Amazon's managed Airflow service) local-runner Docker image.
LocalStack
LocalStack provides an easy-to-use mocking framework for developing AWS cloud applications.
There is a multitude of services supported out of the box:
- ACM, API Gateway, CloudFormation, CloudWatch, CloudWatch Logs, DynamoDB, DynamoDB Streams, EC2, Elasticsearch Service, EventBridge (CloudWatch Events), Firehose, IAM, Kinesis, KMS, Lambda, Redshift, Route53, S3, SecretsManager, SES, SNS, SQS, SSM, StepFunctions, STS.
For our flow, these services suffice to implement our use cases.
MWAA runner
Amazon provides a local Docker image (the aws-mwaa-local-runner) with its version of Airflow that makes emulating the production environment easier.
We will focus on /docker/docker-compose-local.yml and use that file as the entry point for creating our testing infrastructure.
The docker-compose.yaml that follows contains an example setup that creates a complete mock infrastructure.
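A minimal sketch of such a file is shown below, trimmed to the parts that matter here; the image tags, service names, ports, and volume paths are assumptions based on the aws-mwaa-local-runner repository and the LocalStack docs, so adapt them to your checkout. The aws-cli service that seeds the mock resources is shown in the next section.

version: "3.7"
services:
  localstack:
    image: localstack/localstack
    ports:
      - "4566:4566"                      # edge port; all mocked services are reachable here
    environment:
      - SERVICES=s3,sqs,dynamodb         # only start what the pipeline needs

  postgres:
    image: postgres:10-alpine            # metadata database required by the local runner
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow

  local-runner:
    image: amazon/mwaa-local:2.0.2       # image built by the aws-mwaa-local-runner scripts
    depends_on:
      - postgres
      - localstack
    environment:
      - AIRFLOW_CONN_AWS_DEFAULT=aws://a:a@?host=http://localstack:4566&region_name=us-east-1
    volumes:
      - ./dags:/usr/local/airflow/dags   # DAGs plus the requirements.txt mentioned below
    ports:
      - "8080:8080"                      # Airflow web UI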
Any additional libraries that need to be installed in the MWAA runner can be listed in a requirements.txt placed in the dags folder. That is different from the production AWS service, where the requirements file is picked up from a path you provide in S3.
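For illustration, a minimal local layout could then look like this (the DAG file name is just an example):

dags/
  requirements.txt          # extra libraries to install in the local runner
  report_pipeline_dag.py    # the example DAG sketched later in this article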

mesosphere/aws-cli
This is a Docker image that contains the AWS CLI and allows us to issue CLI commands easily. Since docker-compose puts everything on a single network, the LocalStack endpoint is reachable from the mesosphere container, and with the AWS CLI we can create our mock resources.
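Continuing the docker-compose sketch above, the service below seeds LocalStack with the resources our pipeline needs; the bucket, queue, and table names, the seed-file path, and the dummy credentials are all illustrative:

  mocks:
    image: mesosphere/aws-cli
    depends_on:
      - localstack
    environment:
      - AWS_ACCESS_KEY_ID=dummy          # LocalStack accepts any credentials
      - AWS_SECRET_ACCESS_KEY=dummy
      - AWS_DEFAULT_REGION=us-east-1
    volumes:
      - ./mocks:/mocks                   # holds the DynamoDB seed data file
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        aws --endpoint-url=http://localstack:4566 s3 mb s3://input-bucket
        aws --endpoint-url=http://localstack:4566 s3 mb s3://output-bucket
        aws --endpoint-url=http://localstack:4566 sqs create-queue --queue-name reports-queue
        aws --endpoint-url=http://localstack:4566 dynamodb create-table \
          --table-name reports \
          --attribute-definitions AttributeName=report_id,AttributeType=S \
          --key-schema AttributeName=report_id,KeyType=HASH \
          --provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1
        aws --endpoint-url=http://localstack:4566 dynamodb batch-write-item \
          --request-items file:///mocks/items.json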
We are almost there. So what have we achieved here?
The following resources are created:
- A local Airflow environment
- Two S3 buckets (input/output)
- An SQS queue
- A DynamoDB table, with some data loaded from a file
In this example, we are mocking a pipeline with the following structure:
read an SQS message → check DynamoDB for the report's location in S3 → read the report from S3 → persist the transformed report in the output bucket
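To make this concrete, here is a minimal sketch of such a DAG, assuming Airflow 2.x with the Amazon provider and the illustrative queue, table, and bucket names from the bootstrap service above; the transformation is just a placeholder:

from datetime import datetime

from airflow.decorators import dag, task
from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook

QUEUE_NAME = "reports-queue"    # assumed names, matching whatever the
TABLE_NAME = "reports"          # aws-cli bootstrap service created
OUTPUT_BUCKET = "output-bucket"


def client(service):
    # Clients built from aws_default already point at the LocalStack endpoint.
    return AwsBaseHook(aws_conn_id="aws_default", client_type=service).get_conn()


@dag(schedule_interval=None, start_date=datetime(2021, 1, 1), catchup=False)
def report_pipeline():

    @task
    def read_sqs_message() -> str:
        sqs = client("sqs")
        queue_url = sqs.get_queue_url(QueueName=QUEUE_NAME)["QueueUrl"]
        messages = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
        return messages["Messages"][0]["Body"]  # the body is assumed to carry the report id

    @task
    def lookup_report_location(report_id: str) -> dict:
        item = client("dynamodb").get_item(
            TableName=TABLE_NAME, Key={"report_id": {"S": report_id}}
        )["Item"]
        return {"bucket": item["bucket"]["S"], "key": item["key"]["S"]}

    @task
    def transform_and_persist(location: dict) -> None:
        s3 = client("s3")
        report = s3.get_object(Bucket=location["bucket"], Key=location["key"])["Body"].read()
        s3.put_object(Bucket=OUTPUT_BUCKET, Key=location["key"], Body=report.upper())

    transform_and_persist(lookup_report_location(read_sqs_message()))


report_pipeline_dag = report_pipeline()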
How does Airflow work with LocalStack?
Airflow integrates well with boto3, so it is almost plug-and-play with everything AWS.
We can either use boto3 directly and create a session pointing at the LocalStack endpoint, or get the session from an Airflow hook. Airflow hooks expose limited functionality and do not wrap every available method, but we can extract the underlying boto3 connection like this:
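A minimal sketch, assuming Airflow 2.x with the Amazon provider installed and the aws_default connection set up as described below:

from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook

hook = AwsBaseHook(aws_conn_id="aws_default", client_type="s3")

# The underlying boto3 client, already configured with the credentials, region,
# and endpoint (LocalStack) defined on the aws_default connection.
s3_client = hook.get_conn()

# Alternatively, grab the whole boto3 session and build any client from it;
# here the LocalStack endpoint has to be passed explicitly.
session = hook.get_session()
sqs_client = session.client("sqs", endpoint_url="http://localstack:4566")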
Why go through this trouble, though? Since we can set all the credentials, endpoints, and assume-roles at the connection level, it is cleaner to extract boto3 from the AwsBaseHook, because we already have a nicely defined session that is handled and created for us.
For our case to work, we need to add the LocalStack endpoint as the host in the aws_default connection in Airflow, which is handled in the docker-compose file by this line:
AIRFLOW_CONN_AWS_DEFAULT=aws://a:a@?host=http://localstack:4566&region_name=us-east-1
A note here: if you edit the connection from the Airflow UI, it is the Extra field, not the Host field, that is used to set the LocalStack endpoint.
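For illustration, the equivalent Extra value for the connection above would look roughly like this:

{"host": "http://localstack:4566", "region_name": "us-east-1"}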

With these simple steps you have a local representation of MWAA and can emulate most interactions with AWS services. This does not replace proper unit and integration testing, but it can help increase confidence before deploying a production workload. It can also save the cost of a full-fledged developer setup in a real AWS account.