AWS Step Functions: A Low Code Workflow Service to Build Serverless Applications

Introduction to AWS Step Functions

Abhinav Singh
AWS in Plain English


What is AWS Step Functions?

AWS Step Functions is a “function orchestrator”: a visual workflow service used to orchestrate AWS services, automate business processes, and build serverless applications. Workflows manage failures, retries, parallelisation, service integrations, and observability so developers can focus on higher-value business logic.

It lets you coordinate the components of a distributed system, including microservices. Applications are built from individual components that each perform a discrete function, or task, which allows you to scale and change individual components quickly.

It solves lots of developer problems, such as:

  1. Sequencing functions one after another.
  2. Running different functions in parallel.
  3. Selecting different functions based on some specific data.
  4. Adding custom retry logic to individual functions.
  5. Graceful failure management.
  6. Running functions that take considerable time to execute.
  7. Breaking a large workflow into smaller functions that could be scaled independently.

History

Step Functions is a fairly new technology. It was unveiled at the AWS re:Invent conference in 2016, as a companion to AWS Lambda that addresses some of the challenges Lambda posed. A Lambda function, being stateless, works best when a system has a single entry point, module or component, which is rarely the case in practice.

[Figure: AWS Step Functions was launched at the AWS re:Invent conference in 2016]

Features

  • Error handling: Multiple independent paths can be added for each kind of expected failure.
  • Automatic retries: A specific number of retries, at configurable intervals, can be added to each step.
  • Triggering & tracking: A workflow can be triggered & tracked directly on AWS, or by using Lambda functions, SQS, SNS, APIs etc.
  • Manage execution code: The code executed by each step can be managed individually. This helps when the internal logic of a function keeps changing.
  • Visualisation: It is a visual workflow service, so you get an easy-to-understand representation of running & completed workflows.

[Figure: AWS Step Functions execution]

Benefits

  • Complex workflows: Step Functions is ideal for a complex workflow divided into small individual components. Features such as retries, try/catch/finally and dynamic waits are available at the workflow level itself.
  • Long-lived: The maximum duration of a Standard workflow is 1 year. This makes it ideal for workflows that might take days or even weeks, while still providing an easy-to-understand visual representation of the current state.
  • Logical separation between workflow & business logic: The workflow and the business logic are kept separate, so each can be managed & scaled individually according to demand.
  • Parallelism: Parallel branches can be created to run concurrently. This helps a lot when a workflow has multiple independent branches to execute.

Standard vs Express Workflows

Step Functions workflows come in two types: Standard & Express.

Standard workflows are meant for regular workloads that might run for a considerable period of time and do not need a very high number of state transitions or very high throughput.

Express workflows, as the name suggests, are meant for specialised operations such as high-event-rate workloads, streaming data processing and high-frequency data ingestion. The main characteristics of the two types are:

Standard Workflows

  • Maximum duration: 1 year.
  • Execution rate constraint: 2,000 per second.
  • State transition constraint: 4,000 per second.
  • Pricing is based on the number of state transitions.
  • Shows execution history & visual debugging.
  • Supports all service integrations & patterns.

Express Workflows

  • Maximum duration: 5 minutes.
  • Execution rate constraint: 100,000 per second.
  • State transition constraint: nearly unlimited.
  • Pricing is based on the number & duration of executions.
  • Sends execution history to Amazon CloudWatch.
  • Supports all service integrations & most patterns.

In short, Standard workflows suffice for most day-to-day requirements. Express workflows should be used only for high-volume, high-event-rate executions that do not run for long periods of time.
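The workflow type is chosen when the state machine is created. As a rough sketch, the request body for the CreateStateMachine API (which can, for example, be passed to the AWS CLI with --cli-input-json) selects the type through its type field. The state machine name and the IAM role name below are hypothetical placeholders, and the embedded definition is a trivial one-state workflow used only to keep the example short.

```json
{
  "name": "order-ingest-express",
  "type": "EXPRESS",
  "roleArn": "arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",
  "definition": "{\"Comment\":\"Sketch only; names are hypothetical\",\"StartAt\":\"Done\",\"States\":{\"Done\":{\"Type\":\"Succeed\"}}}"
}
```

Setting type to STANDARD instead would create a Standard workflow. Note that the definition is passed as a JSON string, which is why its quotes are escaped.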

State Machines

The “magic” behind Step Functions is state machines, in which you define inputs, output formats, retry & catch logic and other features using the Amazon States Language, a JSON-based structured language.

    "Validate-All": {
      "Type": "Map",
      "InputPath": "$.detail",
      "ItemsPath": "$.shipped",
      "MaxConcurrency": 0,
      "ResultPath": "$.detail.shipped",
      "Parameters": {
        "parcel.$": "$$.Map.Item.Value",
        "courier.$": "$.delivery-partner"
      },
      "Iterator": {
        "StartAt": "Validate",
        "States": {
          "Validate": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ship-val",
            "End": true
          }
        }
      },
      "End": true
    }
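The fragment above is a single entry from the States object of a larger definition. As a minimal sketch of what a complete definition might look like, the following state machine runs two Task states one after another. The state names and the Lambda function names (validate-order, ship-order) are hypothetical, and the account ID is the same placeholder used above.

```json
{
  "Comment": "Sketch only: two tasks run in sequence; function names are hypothetical",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
      "Next": "ShipOrder"
    },
    "ShipOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ship-order",
      "End": true
    }
  }
}
```

StartAt names the first state to run, each state points to its successor with Next, and the last state is marked with End.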

States

Step Functions supports 6 broad categories of states, which can be combined to build complex workflows. Each state works independently of the others, so individual states can be changed as system requirements change. The states are: Task, Choice, Parallel & Map, Pass, Fail & Succeed and Wait. The syntax and full details of each state are covered in the AWS Step Functions documentation.

  • Task: A Task state represents a single unit of work performed by a state machine. The work can be done by a Lambda function, by an Activity (worker code that you host yourself, for example on premises or on EC2), or by a supported AWS service. The supported AWS services include AWS Batch, Amazon SNS (used as a notification service), Amazon SQS (used as a queuing service) and AWS Glue.
  • Choice: A Choice state adds branching logic (similar to if-else) to your workflow. It is used where the execution flow depends on results or parameters. Comparisons can be combined using the And, Or and Not operators, and can be based on Boolean, String, Numeric or Timestamp values.
  • Wait: A Wait state pauses the execution of the workflow. The delay can be defined either as a duration (30 minutes) or as a specific point in time (Thursday 15 July, 11am). Timestamps must conform to the RFC3339 profile of ISO 8601.
  • Pass: A Pass state does not perform any work. It simply passes its input to its output. It is mainly useful when constructing and debugging state machines.
  • Succeed & Fail: These are the terminal states of a workflow. They end the execution & mark its status as succeeded or failed. Since they are terminal states, they do not have a “Next” field. Note that an error does not always lead to a failed execution: if a Catch block handles the error and the workflow then reaches a successful end, the execution is marked as successful.
  • Parallel: A Parallel state runs several branches of a workflow at the same time. It can be used to logically decouple states that do not depend on each other.
  • Map: A Map state runs the same set of steps for each element of an array, using each item as the input. Use Map when the same steps must be executed for a set of items, and Parallel when different branches of steps must be executed on the same input.
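To make the state types above more concrete, here is a hedged sketch that combines Choice, Wait, Pass, Parallel, Succeed and Fail in one definition. Everything specific in it is an assumption made for illustration: the state names, the input fields ($.total, $.status), the error name and the Lambda function names are all hypothetical.

```json
{
  "Comment": "Sketch only: state names, input fields and Lambda ARNs are hypothetical",
  "StartAt": "CheckOrder",
  "States": {
    "CheckOrder": {
      "Type": "Choice",
      "Choices": [
        {
          "And": [
            { "Variable": "$.total", "NumericGreaterThan": 0 },
            { "Not": { "Variable": "$.status", "StringEquals": "CANCELLED" } }
          ],
          "Next": "WaitBeforeProcessing"
        }
      ],
      "Default": "RejectOrder"
    },
    "WaitBeforeProcessing": {
      "Type": "Wait",
      "Seconds": 1800,
      "Next": "MarkAccepted"
    },
    "MarkAccepted": {
      "Type": "Pass",
      "Result": { "accepted": true },
      "ResultPath": "$.review",
      "Next": "ChargeAndNotify"
    },
    "ChargeAndNotify": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "ChargeCustomer",
          "States": {
            "ChargeCustomer": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-customer",
              "End": true
            }
          }
        },
        {
          "StartAt": "NotifyWarehouse",
          "States": {
            "NotifyWarehouse": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:notify-warehouse",
              "End": true
            }
          }
        }
      ],
      "Next": "OrderAccepted"
    },
    "OrderAccepted": {
      "Type": "Succeed"
    },
    "RejectOrder": {
      "Type": "Fail",
      "Error": "OrderRejected",
      "Cause": "Order total was not positive or the order was cancelled"
    }
  }
}
```

The Wait state here pauses for 1800 seconds (30 minutes). To wait until a specific time instead, Seconds can be replaced with a Timestamp field holding an RFC3339 value such as "2030-01-01T09:00:00Z". Both branches of the Parallel state receive the same input, and the workflow moves on to OrderAccepted only after both have finished.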

Error handling: Catch & Retry

Any state can encounter runtime errors. Errors can happen for various reasons:

  • State machine definition issues, such as no matching rule in a Choice state.
  • Task failures, e.g. an exception in a Lambda function.
  • Transient issues, such as network partition events or on-premises failures.

By default, when a state reports an error, Step Functions fails the entire execution. However, the Amazon States Language provides two mechanisms for error handling: Catch & Retry.

Catch is used to implement try-catch-finally style scenarios in workflow executions. We can catch different errors and route each of them to a different next step. The Amazon States Language defines a set of built-in strings that name well-known errors, all beginning with the States. prefix, for example States.ALL, States.Timeout and States.TaskFailed.

The Retry mechanism is useful when a single execution of a state might fail but is likely to succeed after a few more attempts. Task, Parallel and Map states can have a field named Retry, whose value must be an array of objects known as retriers. An individual retrier represents a certain number of retries, usually at increasing time intervals.
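As an illustration of how the two mechanisms fit together, the sketch below attaches one retrier and one catcher to a Task state. The function names, the retry values and the $.error path are hypothetical choices made for this example, not recommendations.

```json
{
  "Comment": "Sketch only: function names and retry values are hypothetical",
  "StartAt": "ChargeCustomer",
  "States": {
    "ChargeCustomer": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-customer",
      "Retry": [
        {
          "ErrorEquals": ["States.TaskFailed"],
          "IntervalSeconds": 2,
          "MaxAttempts": 3,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "NotifySupport"
        }
      ],
      "End": true
    },
    "NotifySupport": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:notify-support",
      "End": true
    }
  }
}
```

With these values, a failing task is retried up to three times, waiting 2, 4 and then 8 seconds before each retry, because every interval is the previous one multiplied by BackoffRate. If the retries are exhausted, the catcher stores the error details under $.error and routes the execution to NotifySupport instead of failing it outright.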

Error handling mechanisms, their syntax and further details are covered in the AWS Step Functions documentation.

Congratulations on making it to the end! Feel free to talk tech or share any cool projects with me on Twitter, GitHub, Medium, LinkedIn or Instagram.

Thanks for reading!
