AWS in Plain English

Understanding All AWS Glue Import Statements and Why We Need Them

Adriano Nicolucci
4 min read · Jul 6, 2022

When you create a brand-new AWS Glue job, I don’t know about you, but it seems pretty intimidating that six Python import statements are generated automatically. I wanted to understand why we need them, so I did a bit of research and made this post (and a YouTube video, found at the bottom of this page) to explain these statements and build a deeper understanding of AWS Glue and PySpark.

import sys

The first statement we see is import sys. We need this module to get sys.argv, which is the list of command-line arguments passed to a Python script. Each Glue job comes with a list of default arguments, and sys.argv is a parameter of the “getResolvedOptions” function, which we can use to read custom parameter names in our script. In my example, I passed a new argument called “JOB_NAME”, equal to “test”, and appended it to the existing default arguments.

Without importing the sys module, we would have no sys.argv to hand to getResolvedOptions, so that call would not work. We will come back to this one soon.
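To make the argument handling above concrete, here is a minimal sketch in plain Python. It does not call AWS Glue at all: resolve_options is a hypothetical, simplified stand-in written with argparse to imitate the way getResolvedOptions picks named values out of sys.argv, and the simulated argument list (the script name and the --TempDir value) is made up for illustration.

```python
import argparse


def resolve_options(argv, option_names):
    """Hypothetical, simplified stand-in for awsglue.utils.getResolvedOptions.

    Returns a dict holding only the options we asked for by name.
    """
    parser = argparse.ArgumentParser()
    for name in option_names:
        # Job parameters arrive on the command line as "--NAME value"
        parser.add_argument("--" + name, required=True)
    # Skip argv[0] (the script name) and ignore arguments we did not request
    known, _unrequested = parser.parse_known_args(argv[1:])
    return vars(known)


# Simulated stand-in for sys.argv; these values are assumptions, not real Glue defaults
simulated_argv = [
    "my_script.py",
    "--JOB_NAME", "test",
    "--TempDir", "s3://example-bucket/tmp/",
]

args = resolve_options(simulated_argv, ["JOB_NAME"])
print(args["JOB_NAME"])
```

Inside a real Glue job, the generated script makes the equivalent call as getResolvedOptions(sys.argv, ['JOB_NAME']), which is why sys has to be imported first.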

from awsglue.transforms import *

The next import statement is from awsglue.transforms import *. This imports everything from the transforms module, which contains all the AWS Glue-created transform classes used in PySpark ETL operations. There are 24 classes at the time of writing this article to help with the ETL process. These classes are meant to be applied to a DynamicFrame in an AWS Glue job.
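As a rough illustration of applying one of these transform classes to a DynamicFrame, here is a small sketch using ApplyMapping. The function name rename_and_cast and the column names are hypothetical, and the code assumes it runs inside a Glue job where the awsglue library is available and where you already have a DynamicFrame to pass in.

```python
def rename_and_cast(dynamic_frame):
    """Rename and cast columns of a DynamicFrame (column names are made up)."""
    # Imported here so this file can still be loaded outside of AWS Glue,
    # where the awsglue library is not installed
    from awsglue.transforms import ApplyMapping

    # Each mapping is (source column, source type, target column, target type)
    return ApplyMapping.apply(
        frame=dynamic_frame,
        mappings=[
            ("id", "string", "customer_id", "long"),
            ("name", "string", "customer_name", "string"),
        ],
    )
```

With the star import from the generated script, ApplyMapping would already be in scope, so the explicit import inside the function would not be needed there.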

from awsglue.utils import getResolvedOptions

Next, we see from awsglue.utils import getResolvedOptions. This is the function responsible for reading the Glue job parameters. If you want to pass parameters at the start of your job, you are going to need it. It is also required for the AWS Glue bookmark feature, where we need to include a “JOB_NAME”…

Written by Adriano Nicolucci

I am a Solution Architect Consultant focusing on building Data platforms on AWS.