An AWS SAM template for automating an AWS Forecast process with an AWS Step Functions state machine, based on a real case study: forecasting new daily positive cases from the COVID-19 Italian datasets.

This AWS SAM template runs in my AWS account and pushes the daily forecast to this repository: https://github.com/heyteacher/COVID-19 (folder `dati_json_forecast`).

Furthermore, the datasets and forecasts are visualized by this charts dashboard https://heyteacher.github.io/COVID-19, an Angular 9 project hosted in this repository: https://github.com/heyteacher/ng-covid-19-ita-charts

This AWS SAM template is general purpose, so it can be adapted to other forecasts based on AWS Forecast by removing or replacing the use-case-specific tasks.
It's difficult to automate an AWS Forecast process because:

- AWS Forecast tasks are long-running processes, and a task cannot start until the previous step has successfully finished
- AWS Forecast doesn't implement push notifications (for example via AWS SNS) to signal the end of a task, so it isn't possible to create an event-driven flow of AWS Forecast tasks. It's only possible to poll an entity's status after creation in order to know whether it was successfully created.
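The polling approach can be sketched as a generic helper (a hypothetical illustration, not code from this repository): a function that repeatedly invokes an async status getter (for example, one wrapping `describePredictor`) until the entity becomes `ACTIVE` or fails.

```javascript
// Hypothetical polling helper: getStatus is any async function that
// returns the current entity status string.
async function waitUntilActive(getStatus, { delayMs = 5000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === 'ACTIVE') return status;            // entity is ready
    if (status.startsWith('CREATE_FAILED')) {
      throw new Error(`creation failed: ${status}`);   // give up on failure
    }
    // wait before polling again
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  throw new Error('timed out waiting for ACTIVE status');
}
```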
Why automate AWS Forecast tasks using AWS Step Functions? Because AWS Step Functions is a serverless state machine that orchestrates the AWS Lambda functions implementing the AWS Forecast API calls and managing the AWS Forecast entities, and it supports Retry, Fallback and other flow controls.

Only the first state machine execution creates the persistent entities (Dataset, Dataset Group and Predictor); each subsequent daily execution updates the forecast by creating a Dataset Import Job, a Forecast and an Export Job.
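As an illustration of those flow controls, a Step Functions task can declare a Retry policy directly in its Amazon States Language definition. The fragment below is a hypothetical sketch (the state name, ARN placeholder and retry values are assumptions, not copied from this template):

```json
"CreateDatasetImportJob": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:<region>:<account>:function:CreateDatasetImportJobFunction",
  "Retry": [{
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 60,
    "MaxAttempts": 5,
    "BackoffRate": 2.0
  }],
  "Next": "CheckPredictorExists"
}
```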
The AWS Step Functions state machine is launched by an AWS CloudWatch Events rule that fires according to the schedule expression defined in the `StateMachineEventRuleScheduleExpression` parameter. The forecast, however, is generated only on the days of the week defined in the `ForecastDaysOfWeekExecution` parameter.
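For reference, a scheduled rule of this kind is typically declared in a SAM template along these lines (a sketch only: the resource names, target wiring and default expression are assumptions, not this template's actual contents):

```yaml
Parameters:
  StateMachineEventRuleScheduleExpression:
    Type: String
    Default: cron(0 17 * * ? *)   # assumed example: every day at 17:00 UTC

Resources:
  StateMachineEventRule:
    Type: AWS::Events::Rule
    Properties:
      ScheduleExpression: !Ref StateMachineEventRuleScheduleExpression
      State: ENABLED
      Targets:
        - Arn: !Ref ForecastStateMachine       # hypothetical state machine resource
          Id: ForecastStateMachineTarget
          RoleArn: !GetAtt EventRuleRole.Arn   # role allowed to call states:StartExecution
```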
Below is the daily flow of the AWS Step Functions steps:

- `Extend Dataset` is a use-case-specific task, so you can drop it. It downloads the daily official dataset, extends it and pushes it to the configured GitHub repository. It retries until a new dataset has been pushed to the official repository.
- `CheckDaysOfWeekForecastExec` is a simple inline lambda which sets `isToExecuteForecast = true` if today's day of week is in the `ForecastDaysOfWeekExecution` parameter.
- `ChoiceForecastExecution` is a choice on `isToExecuteForecast`: if `true`, it generates the forecast, otherwise it goes to the `Done` task and exits.
- `CheckDatasetExist` is the start state; it checks whether the Dataset (and Dataset Group) exist. If they don't exist, this is the first execution, and `CreateDataset` creates the Dataset and Dataset Group.
- `WaitGithubRawRefresh` is another use-case-specific task which can be dropped. It waits a few minutes to make sure the GitHub raw cache has been refreshed after the push.
- `CreateDatasetImportJob` downloads the dataset (the new daily COVID-19 time series) from the configured GitHub repository, transforms the data to match the Forecast dataset structure, uploads it to the S3 input bucket and creates the daily Dataset Import Job.
- `CheckPredictorExists` checks whether the Predictor exists.
  - If the Predictor doesn't exist (meaning this is the first execution), it runs `CreatePredictor`, which creates the Predictor (it can only be created once at least one Dataset Import Job has been loaded), and then `WaitPredictorCreation`, which waits 50 minutes to make sure the Predictor creation has finished.
  - Otherwise it runs `WaitDatasetImport`, which sleeps 5 minutes.
- `CreateForecast` creates the daily Forecast based on the Predictor updated by the daily Dataset Import Job.
- `WaitForecastCreation` sleeps 15 minutes to make sure the Forecast creation has finished.
- `CreateForecastExportJob` exports the daily forecast to the S3 output bucket. The upload wakes up `PushForecastInGithubFunction`, which downloads the forecast and pushes it to the configured GitHub repository (this AWS Lambda is use-case-specific).
- `WaitExportJob` sleeps 3 minutes to make sure the export has finished.
- `DeleteDatasetImportExportJob` deletes the daily Dataset Import Job and the daily Export Job.
- `WaitDeleteDatasetImportExportJob` sleeps 5 minutes to make sure the deletion has finished.
- `DeleteForecast` deletes the daily Forecast.
- `Done` is the end state of the workflow.
Some tasks retry after a failure in order to wait until the previous step has successfully finished.
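The `CheckDaysOfWeekForecastExec` step described above boils down to a day-of-week membership test. A minimal sketch follows; it assumes (this is not confirmed by the source) that the `ForecastDaysOfWeekExecution` parameter is a comma-separated list of day numbers, `0` = Sunday through `6` = Saturday:

```javascript
// Hypothetical sketch of the CheckDaysOfWeekForecastExec inline lambda logic.
// Assumes the parameter is a comma-separated list of day numbers, e.g. "1,3,5".
function isToExecuteForecast(forecastDaysOfWeekExecution, today = new Date()) {
  const allowedDays = forecastDaysOfWeekExecution
    .split(',')
    .map(day => Number(day.trim()));
  // getUTCDay() returns 0 (Sunday) .. 6 (Saturday)
  return allowedDays.includes(today.getUTCDay());
}
```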
The AWS SAM template assigns each AWS Lambda function the minimum permissions needed to complete its task. All the entities (S3 buckets, AWS Lambda functions, IAM roles, the AWS Step Functions state machine, the CloudWatch Events rule) are created/updated/deleted by the AWS SAM template stack, so no manual activity is needed.
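Per-function minimum permissions are usually expressed in a SAM template through the `Policies` property. The snippet below is only a hypothetical sketch of the pattern (the function name, handler and action list are assumptions):

```yaml
CreateForecastFunction:            # hypothetical function name
  Type: AWS::Serverless::Function
  Properties:
    Runtime: nodejs12.x
    Handler: index.handler
    Policies:
      - Statement:
          - Effect: Allow
            Action:
              - forecast:CreateForecast
              - forecast:DescribeForecast
            Resource: '*'
```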
- This project is inspired by https://github.com/aws-samples/amazon-automated-forecast
- BE CAREFUL if you try to create a stack from this SAM template: the first execution costs about 4.00 EUR, and each subsequent daily execution costs about 1.00 EUR.
- I already run a stack in my AWS account which produces forecasts here: https://github.com/heyteacher/COVID-19, so you can support this project by making a donation.
- Only the AWS Forecast entities `Predictor`, the first `Dataset Import Job`, `Dataset` and `Dataset Group` must be deleted manually if you decide to delete the AWS SAM template stack.
- All AWS Lambda functions are implemented in NodeJs 12.X.
- AWS Forecast doesn't implement an epidemiological forecasting scenario such as the COVID-19 Italian new cases series, so the algorithm is chosen via `PerformAutoML=True`. I'm not an expert, so help with algorithm tuning for this use case is appreciated: https://docs.aws.amazon.com/forecast/index.html
- I spent a lot of time improving the AWS SAM template, but I'm sure it could be better, so don't hesitate to submit an Issue or a Pull Request.
- install:
  - nodejs
  - aws-cli
  - aws-sam-cli
  - docker
- generate an `aws_access_key_id` and `aws_secret_access_key` from an AWS user with permissions to create/update/delete CloudFormation stacks
- create the GitHub repository `<GITHUB_REPO>` in your account `<GITHUB_USER>`
- generate a `<GITHUB_TOKEN>` at https://github.com/settings/tokens with scope `repo`
- to test lambda functions locally (for example `ExtendDataFunction`):

  ```
  sam local invoke ExtendDataFunction \
    --parameter-overrides GitHubToken=<GITHUB_TOKEN> GitHubRepo=<GITHUB_REPO> GitHubUser=<GITHUB_USER>
  ```
The useful bash scripts `sam_local_invoke.sh.template` and `sam_local_invoke_push_github.sh` can be customized in order to run lambda functions locally.

The useful bash script `deploy_stack.sh.template` can be customized in order to automate the stack deployment (the `package` and `deploy` steps).
- delete the old stack:

  ```
  aws cloudformation delete-stack --stack-name forecast-automation-covid-19-ita
  ```

- package:

  ```
  aws cloudformation package --template-file template.yaml \
    --output-template-file packaged.yaml \
    --s3-bucket <SAM_TEMPLATE_BUCKET>
  ```

- deploy:

  ```
  aws cloudformation deploy --template-file packaged.yaml \
    --stack-name forecast-automation-covid-19-ita \
    --capabilities CAPABILITY_IAM \
    --parameter-overrides GitHubToken=<GITHUB_TOKEN> GitHubRepo=<GITHUB_REPO> GitHubUser=<GITHUB_USER>
  ```

- show the stack events:

  ```
  aws cloudformation describe-stack-events --stack-name forecast-automation-covid-19-ita
  ```

- tail the lambda logs (for example `ExtendDataFunction`):

  ```
  sam logs -n ExtendDataFunction --stack-name forecast-automation-covid-19-ita --tail
  ```