
Feature Request: CloudWatch Event Integration #129

Open
Auronmatrix opened this issue Apr 15, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@Auronmatrix

Auronmatrix commented Apr 15, 2021

Really appreciate the work going into this lib. One feature that could add a lot of value is integration with CloudWatch Events. For instance, it is common for data scientists to want to rerun their training pipelines on a periodic schedule.

Are there any plans in the pipeline to support something like this?

@wong-a
Contributor

wong-a commented Apr 16, 2021

You can create a scheduled rule today by calling EventBridge put_rule and put_targets with boto3. Is there a different way you'd like to see this done in the Step Functions SDK?

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/events.html#EventBridge.Client.put_rule
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/events.html#EventBridge.Client.put_targets

@Auronmatrix
Author

Auronmatrix commented Apr 16, 2021

Sure, using boto3 is a valid way to do it. If this is out of scope for the library, and that is where the line of abstraction is drawn between the Step Functions SDK and boto3, that is okay.

Including good examples/documentation of how to configure a scheduled execution might be sufficient.

This request was raised while following the example notebook step_functions_mlworkflow_scikit_learn_data_processing_and_model_evaluation.
When creating SageMaker steps, it recommends using ExecutionInput to pass in a unique, dynamic job_name that each step can grab from the Step Functions execution context. We wanted to run this workflow on a schedule.

It isn't apparent how to do that using the put_rule and put_targets features from boto3 if you use the Step Functions state machine as the target. Eventually we used a Lambda target that in turn calls start_execution for the state machine with the generated job_name. Not sure whether JSONPath dynamic values could be used here instead?

Some opinionated guidance from the SDK would be extremely helpful here since it is a pretty common use case.
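For context, the Lambda workaround described above could be sketched roughly as follows. The job-name prefixes and the STATE_MACHINE_ARN placeholder are illustrative, not from the notebook:

```python
import json
import time

def build_execution_input(prefixes, suffix=None):
    """Append a run-unique suffix (UTC timestamp by default) to each
    job-name prefix, so every scheduled run gets unique SageMaker job names."""
    suffix = suffix or time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
    return {key: f"{value}-{suffix}" for key, value in prefixes.items()}

def lambda_handler(event, context):
    """Scheduled-rule target: start the workflow with fresh job names."""
    import boto3  # imported here so build_execution_input stays testable without the AWS SDK

    sfn = boto3.client("stepfunctions")
    execution_input = build_execution_input({
        "PreprocessingJobName": "sklearn-preprocess",
        "TrainingJobName": "sklearn-train",
        "EvaluationProcessingJobName": "sklearn-evaluate",
    })
    return sfn.start_execution(
        stateMachineArn="STATE_MACHINE_ARN",  # substitute workflow.state_machine_arn
        input=json.dumps(execution_input),
    )
```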

@Auronmatrix
Author

Auronmatrix commented Apr 16, 2021

If adding a level of abstraction does make sense, however, something along the lines of:

schedule = workflow.schedule(
    expression="cron(0 12 * * ? *)",
    inputs={
        "PreprocessingJobName": preprocessing_job_name_prefix,
        "TrainingJobName": training_job_name_prefix,
        "EvaluationProcessingJobName": evaluation_job_name_prefix
    },
    dynamic_inputs=['PreprocessingJobName', 'TrainingJobName', 'EvaluationProcessingJobName'],
    dynamic_transform='timestamp'
)

Which would then generate the CloudWatch Event + Lambda to do the transforms, which would then invoke the step function. The transform above being "append timestamp" to all keys in the dynamic_inputs array.

The advantage with having it as part of the lib stack would be that get_cloudformation_template() could also include the scheduling infrastructure to be deployed into different accounts.
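For illustration only, the "timestamp" transform proposed above could behave like this in plain Python. The function name and signature are hypothetical, not part of the SDK:

```python
import time

def apply_dynamic_transform(inputs, dynamic_inputs, transform="timestamp"):
    """Sketch of the proposed 'timestamp' transform: append a run-unique
    suffix to every key listed in dynamic_inputs; other keys pass through."""
    if transform != "timestamp":
        raise ValueError(f"unsupported transform: {transform}")
    suffix = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
    return {
        key: f"{value}-{suffix}" if key in dynamic_inputs else value
        for key, value in inputs.items()
    }
```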

@wong-a
Contributor

wong-a commented Apr 16, 2021

Thanks for providing a code example. Adding an abstraction in SDK makes sense, especially with the CFN template. However, I would like to be cautious about making it general purpose and not adding an intermediary Lambda function if it's not needed.

EventBridge supports Step Functions as a target directly. Targets can apply some basic JSON transformations, similar to Parameters in Step Functions, so you might be able to do this without a Lambda function. What does the final JSON payload you want look like?

Setting up a rule and target in boto3 looks something like this (disclaimer: I may have fudged the syntax and params a bit):

import boto3

eventbridge_client = boto3.client('events')

# Create the Step Functions workflow
# workflow = ...

# Put an event rule
rule_name = 'workflow_cron_job'
put_rule_response = eventbridge_client.put_rule(
    Name=rule_name,
    ScheduleExpression='cron(0 12 * * ? *)',
    State='ENABLED'
)

eventbridge_client.put_targets(
    Rule=rule_name,
    Targets=[
        {
            'Arn': workflow.state_machine_arn,
            'Id': 'myCloudWatchEventsTarget',
            # IAM role ARN with permissions for EventBridge to call Step Functions StartExecution
            'RoleArn': 'IAM_ROLE_ARN',
            # Static input to StartExecution; a target takes either 'Input'
            # or 'InputTransformer', not both
            'Input': '{your input to StartExecution}'
        }
    ]
)
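For the unique-job-name requirement specifically, the target's InputTransformer (used in place of a static Input) can pull fields such as $.id or $.time from the scheduled event itself. A hedged sketch, with illustrative job-name prefixes and placeholder ARNs:

```python
import json

# Hypothetical target configuration: derive run-unique job names from the
# EventBridge event's own id, so no intermediary Lambda is needed.
target = {
    "Arn": "STATE_MACHINE_ARN",  # e.g. workflow.state_machine_arn
    "Id": "myCloudWatchEventsTarget",
    "RoleArn": "IAM_ROLE_ARN",
    "InputTransformer": {
        # Map a placeholder name to a JSONPath on the incoming event
        "InputPathsMap": {"eventId": "$.id"},
        # Template for the StartExecution input; <eventId> is substituted at delivery time
        "InputTemplate": json.dumps({
            "PreprocessingJobName": "sklearn-preprocess-<eventId>",
            "TrainingJobName": "sklearn-train-<eventId>",
            "EvaluationProcessingJobName": "sklearn-evaluate-<eventId>",
        }),
    },
}
```

This target dict would be passed to put_targets in place of the one shown above.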

@Auronmatrix
Author

Thanks for the links. I wasn't aware of the input transformation possibilities. I would definitely drop the transformation Lambda in that case in favor of an abstraction based on input transformers.

@wong-a wong-a added the enhancement New feature or request label Apr 22, 2021