Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: anonymous Lambda closures that can be inlined within Step Function closures #448

Open
tvanhens opened this issue Aug 26, 2022 · 21 comments
Labels
enhancement New feature or request

Comments

@tvanhens
Copy link
Contributor

tvanhens commented Aug 26, 2022

While it is desirable to do everything in a functionless way, it is almost always necessary to call out to a proper lambda to get around limitations at some point. The Function construct is great for when you have to integrate with a lambda because no other integration is supported, however, it is cumbersome when you are only using a lambda as "glue" in a mostly functionless StepFunction. Being able to create anonymous functions inline would limit the context shifts that are necessary to build a StepFunction.

Note: this is probably more broadly applicable than just for StepFunction integrations.

Current Experience

// Function must be defined ahead of time and away from it's usage
const sendToConnection = new Function(
  this,
  "SendToConnection",
  async (event: { url: string; message: any }) => {
    console.log(event);
    const client = new ConnectionClient(event.url);
    await client.sendMessage(event.message);
  }
);

new ExpressStepFunction(
  this,
  "HandlePostLog",
  async (event: { body: { message: string } }) => {
    const connections = await $AWS.DynamoDB.Query({
      Table: table,
      KeyConditionExpression: "#pk = :pk",
      ExpressionAttributeNames: {
        "#pk": "pk",
      },
      ExpressionAttributeValues: {
        ":pk": { S: "connection" },
      },
    });

    if (connections.Items == null) {
      return;
    }

    for (const item of connections.Items) {
      sendToConnection({ // Usage of function
        url: item.connectionUrl.S,
        message: event.body.message,
      });
    }
  }
)

Desired Experience

new ExpressStepFunction(
  this,
  "HandlePostLog",
  async (event: { body: { message: string } }) => {
    const connections = await $AWS.DynamoDB.Query({
      Table: table,
      KeyConditionExpression: "#pk = :pk",
      ExpressionAttributeNames: {
        "#pk": "pk",
      },
      ExpressionAttributeValues: {
        ":pk": { S: "connection" },
      },
    });

    if (connections.Items == null) {
      return;
    }

    for (const item of connections.Items) {
      // Able to use a lambda runtime inline with the rest of
      // the StepFunction domain logic
      await inlineFunction(
        async () => {
          const client = new ConnectionClient(item.connectionUrl.S); // can capture data from the parent closure
          await client.sendMessage(event.body.message);
        }
      );

      // Effectively translates to:

      // await new Function(
      //   this,
      //   `Fn_${genId()}`,
      //   async (event: { param_1: string; param_2: string }) => {
      //     const client = new ConnectionClient(event.param_1); // translated from parent closure
      //     await client.sendMessage(event.param_2); //translated from parent closure
      //   }
      // )({
      //   param_1: item.connectionUrl.S, // injected
      //   param_2: event.body.message, // injected
      // });
    }
  }
)

Implications

  • Anonymous functions would need to generate a resource id. This should be fine since they are not stateful.
  • Would need to take a second argument to override any Function props.
  • Would need to analyze, infer and translate the input event to the inline function. Alternatively, this data could be passed in as an addition arg to the inline function but this could be a confusing experience.
@sam-goodwin
Copy link
Collaborator

We will need to pass any captured closure state into the function, either as static environment variables or as input data.

const table = new Table(scope, id);

new StepFunction(scope, id, async () => {
  const variable: string = ??;
  await inlineFunction(async () => {
    variable; // variable captured in lexical scope must be passed as input since its value isn't known until runtime

    table.tableArn; // value captured in lexical scope, but this one can be passed through environment variables like we do today
  });
}));

It will need to be possible to pass properties such as memory and timeout, and these values must be constrained to values known during synth or deployment time - values known at runtime cannot be passed.

Do we need a new inlineFunction or can we use new Function(..)?

I noticed that inlineFunction doesn't have any scope, id parameters - what will we default them to? The same scope as the surrounding code, i.e. in this case the Step Function? what about ID? That will need to be deterministic and developers may even have opinions about what it should be.

@tvanhens
Copy link
Contributor Author

tvanhens commented Aug 27, 2022

We will need to pass any captured closure state into the function, either as static environment variables or as input data.

const table = new Table(scope, id);

new StepFunction(scope, id, async () => {
  const variable: string = ??;
  await inlineFunction(async () => {
    variable; // variable captured in lexical scope must be passed as input since its value isn't known until runtime

    table.tableArn; // value captured in lexical scope, but this one can be passed through environment variables like we do today
  });
}));

Yup that would be one of the bigger challenges to solve if this convenience was deemed worth it. I tried to call this out in my example translation. There are two values that get captured in lexical scope and need to be injected into the input of the function invocation:

await new Function(
  this,
  `Fn_${genId()}`,
  async (event: { param_1: string; param_2: string }) => {
    const client = new ConnectionClient(event.param_1); // translated from parent closure
    await client.sendMessage(event.param_2); //translated from parent closure
  }
)({
  param_1: item.connectionUrl.S, // injected
  param_2: event.body.message, // injected
});

It will need to be possible to pass properties such as memory and timeout, and these values must be constrained to values known during synth or deployment time - values known at runtime cannot be passed.

This is a good point, and maybe a deal breaker for this convenience. I had thought about these values being passible as an optional second argument, but the implementation would have to be smart about making sure those values were known at synth or deployment time.

My understanding of the compiler is rudimentary at this point, so this may not be feasible, but it seems like this could be enforced at compilation time by eval-ing the result of the second argument and transforming the first argument. Because the first argument is transformed and not evaluated, none of its scoped variables would be available in the evaluation of the second argument. This is assuming the signature looks something like:

type InlineFunction = <I, O extends Promise<unknown>>(
  handler: () => O,
  props?: FunctionProps
) => O;

Do we need a new inlineFunction or can we use new Function(..)?

inlineFunction might not be the best name. However, if you look at the transformed output, I think it makes sense to have a convenience function here. If you are writing out the new Function(..) invocation in full, you might as well pull that into a separate named construct and use it as originally intended.

My mental model for thinking of an inlineFunction is the same way I think of anonymous functions in any other language that supports them. If you're using a function all over the place and want to give it an explicit name to make it easy to reference, you use function <name> (..) {}. If your function is one-off for a .map(..) or .filter(..) case, you tend to use an anonymous function or an arrow function.

I noticed that inlineFunction doesn't have any scope, id parameters - what will we default them to? The same scope as the surrounding code, i.e. in this case the Step Function?

Yup scope would use the same scope as the integration it is inlined within ideally. If this is not possible to reliably determine, then a scope argument could be added.

what about ID? That will need to be deterministic and developers may even have opinions about what it should be.

Agree ID should be deterministic, although I think this can be generated deterministically similarly to how most programing languages handle anonymous functions. However, I don't think developers need to have an opinion on what it should be, it just needs to be generated in a deterministic way to prevent collisions.

This is one of the tradeoffs of using anonymous function, you lose control of naming and your call stack becomes harder to debug. For example in node an anonymous function becomes REPL3:2:29 in the stack trace of an error:

Screen Shot 2022-08-27 at 11 25 21 AM

However, if I use a named function instead, I get a more debuggable stack trace:

Screen Shot 2022-08-27 at 11 30 07 AM

Note that in this example I can locate the error better because it is in the named function bar.

In my ideal world, users have both named (via new Function(..)) and anonymous (via something like inlineFunction) functions available to chose from and they would be free to navigate the tradeoffs as they saw fit.

@thantos
Copy link
Collaborator

thantos commented Aug 27, 2022

Love the idea.

We could do something like Function.exec where exec is an integration implemented by each resource. That would allow the resource to implement things like construct scope, id, and runtime scope copying.

exec = makeIntegration<"Function.exec", (in: IN) => OUT>("Function.exec", {
    asl: (call) => {
        const closure = getClosure(call);
        new Function(context.resource, makeId(), async () => {
             // populate scope
             const result = await closure(); // exec the closure
             // return scope mutations and result to sfn
        });
        return // ASL to call the function, manipulate the input and out. etc
    }
})

@sam-goodwin
Copy link
Collaborator

Love the idea.

We could do something like Function.exec where exec is an integration implemented by each resource. That would allow the resource to implement things like construct scope, id, and runtime scope copying.

exec = makeIntegration<"Function.exec", (in: IN) => OUT>("Function.exec", {

    asl: (call) => {

        const closure = getClosure(call);

        new Function(context.resource, makeId(), async () => {

             // populate scope

             const result = await closure(); // exec the closure

             // return scope mutations and result to sfn

        });

        return // ASL to call the function, manipulate the input and out. etc

    }

})

I don't understand. Can you show an example

@thantos
Copy link
Collaborator

thantos commented Aug 28, 2022

new ExpressStepFunction(
  this,
  "HandlePostLog",
  async (event: { body: { message: string } }) => {
    const connections = await $AWS.DynamoDB.Query({
      Table: table,
      KeyConditionExpression: "#pk = :pk",
      ExpressionAttributeNames: {
        "#pk": "pk",
      },
      ExpressionAttributeValues: {
        ":pk": { S: "connection" },
      },
    });

    if (connections.Items == null) {
      return;
    }

    for (const item of connections.Items) {
      // Able to use a lambda runtime inline with the rest of
      // the StepFunction domain logic
      await Function.exec(
        async () => {
          const client = new ConnectionClient(item.connectionUrl.S); // can capture data from the parent closure
          await client.sendMessage(event.body.message);
        }
      );
    }
  }
)

@thantos
Copy link
Collaborator

thantos commented Aug 28, 2022

the idea of allowing new Function is also an option, but

  1. I don't like the idea of mixing the synth time new into a runtime function.
    1. Personally I think that the semantics would be confusing. In a for loop or if statement, it would look like conditionally or iteratively creating functions, but really it is calling one function that is always created.
  2. To handle things like scope management (all free variables should be present and changes to them should be reflected) will be per Resource (step functions needs to put all of the free variables into the payload and extract any changes). I don't like the idea of handing new XYZ within each resource, but an integration (ex: Function.exec(...)) is an established pattern.

@sam-goodwin
Copy link
Collaborator

How about this crazy idea?

await Function(async () => {
  ..
});

@thantos thantos added the enhancement New feature or request label Aug 28, 2022
@sam-goodwin
Copy link
Collaborator

Maybe this is helpful? We think of constructs (scope, id, props) as nodes in an abstract syntax tree of a cloud native programming language.

new Function(scope, "foo");

new StepFunction(scope, "bar");

These are equivalent to syntax in a hypothetical language:

lambda function foo() {}

step function bar() {}

What would an anonymous function then be? Should it just be a function without an id?

new Function(scope, async () => )

Later on down the road we can explore ways to infer scope, and ID can come from syntax like previously described.

We should probably make sure our approaches are consistent with this mental model, but I'm also ok with exploring convenience features to extend the capabilities of step functions.

@thantos
Copy link
Collaborator

thantos commented Aug 28, 2022

Agreed in general, but that is not what this request was for.

(Please correct me of I am taking this off track @tvanhens)

This request is for a nested closure like ability where the nested closure is actually a Lambda Function instead of an SFN function.

With your mental model, we can do this today

lambda function foo(a: number, b: number) { return a + b; }

step function bar(input) {
   [1,2,3].map(async (i) => foo(i, input.n));
}

But how would we do this today?

step function bar (input) {
   const value = input.arr.map(lambda (i) => input.n + i );
}

the anonymous function gets access to input.n, does not get named, and is not accessible outside of the step function closure.

@thantos
Copy link
Collaborator

thantos commented Aug 28, 2022

While functionless will be much better at reducing a stack to the program logic, step functions will have a similar issue where a bunch of lambdas are defined at the top of the file and then used within the step function. If we can find a way to write the lambda as it is needed, inline, we'll reduce that split brain issue event more (plus closured values).

@sam-goodwin
Copy link
Collaborator

Agreed in general, but that is not what this request was for.

Anonymous functions were requested, their design should probably align with syntax design.

@thantos
Copy link
Collaborator

thantos commented Aug 28, 2022

What would an anonymous function then be? Should it just be a function without an id?

This comment made it seem like we'd just solve the problem of giving the function a name, not the locallity.

From the original request:

// Able to use a lambda runtime inline with the rest of
// the StepFunction domain logic

@sam-goodwin
Copy link
Collaborator

sam-goodwin commented Aug 28, 2022

all free variables should be present and changes to them should be reflected

Your bring up a good point and that sounds challenging. Reflecting changes would be cool if possible.

@thantos
Copy link
Collaborator

thantos commented Aug 28, 2022

I think its possible, just need to wrap the start and the end of the function to setup the scope and then include the values in the request and updates in the response.

new Function(context.resource, ... generate ID...., async (request) => {
   // maybe separate by mutable and const?
   let { ... free variables .... } = request.scope;

   const closure = // inline the closure to get access to the scope...?
   const response = closure(request.payload);

   return {
      scope: {  ... mutable free variables ... },
      response
   }
});

@thantos
Copy link
Collaborator

thantos commented Aug 28, 2022

and then

{
   "Type": "Task",
   "Parameters": {
      "scope": {  ... free variables ... },
      "payload": args[0] // evaluated etc...
   },
   "ResultPath": "output"
}

And states to handle applying the output.payload and output.scope back to the state.

@tvanhens
Copy link
Contributor Author

While functionless will be much better at reducing a stack to the program logic, step functions will have a similar issue where a bunch of lambdas are defined at the top of the file and then used within the step function. If we can find a way to write the lambda as it is needed, inline, we'll reduce that split brain issue event more (plus closured values).

Taking a step back, this was the problem I was hoping to solve. Although, the more I think about it, what I'm asking for isn't exactly an anonymous function - it's more of a contextual closure as @thantos highlighted:

Agreed in general, but that is not what this request was for.

(Please correct me of I am taking this off track @tvanhens)

This request is for a nested closure like ability where the nested closure is actually a Lambda Function instead of an SFN function.

Anonymous functions were requested, their design should probably align with syntax design.

I think I've caused confusion by labeling the feature as an anonymous function as opposed to a lambda-contextualized closure that can can be inlined within a Step Function definition closure.

@tvanhens tvanhens changed the title Proposal: anonymous functions that can be inlined within integrations Proposal: anonymous Lambda closures that can be inlined within Step Function closures Aug 29, 2022
@tvanhens
Copy link
Contributor Author

FWIW I do like the Function.exec(...) syntax @thantos proposed above. Not sure if the fact that it would be a static method on the class would be weird but exec captures the intent better than inlineFunction or new Function.

Another reason I like it is because I think it will also be desirable to inline ExpressStepFunction within StepFunction as well. The use case I have in mind here is limiting expensive state transitions by using ExpressStepFunction for state transition heavy code. Adding a static to the class that "owns" the context would make this behavior consistent.

Can't tell you how many times I tore my hair out trying to build something like this in the last few years.

new StepFunction(this, "MyEDIWorkflow", async () => {
  // Run this in a full step function context for auditing
  
  // Step 1: Poll Files from Partner's FTP Server
  const fileRefs = ExpressStepFunction.exec(async () => {
    // Run this in an express context for cheaper orchestration
    // Find all files that need to be downloaded
    let fileNames = await Function.exec(async () => {
      // Run this in a lambda context for ftp support
      return await ftp.connect(async () => {
        return await ftp.listFiles();
      });
    });
  
    return await $SFN.map(
      fileNames,
      { maxConcurrency: 3 },
      async (fileName) => {
        const [md5, content] = await Function.exec(async () => {
          // Run this in a lambda context for ftp support
          return await ftp.connect(async () => {
            const content = await ftp.download(fileName);
            return [md5(content), content];
          });
        });
  
        await $AWS.DynamoDB.putItem({
          table: DataStore,
          item: {
            pk: "inboundFiles",
            sk: md5,
            content,
          },
        });
  
        return md5;
      }
  );

  //...
  // Step 2: Retrieve files by ref and transform saving the transformed result
  // Step 3: Load transformed content into ERP
});

Being able to weave in and out of the execution context that makes architectural sense for the task without having to jump around a giant file to wrap your head around what's going on would be killer.

@sam-goodwin
Copy link
Collaborator

Agree that is an awesome experience. One thing to consider is how developers maintain those inner resources. Can they add them to cloudwatch dashboards, create alarms, etc. How would we expose them? Do we need to allow developers to optionally provide scope, id as overloads?

@sam-goodwin
Copy link
Collaborator

Was thinking, to help with providing sensible defaults for things like memory and timeout, perhaps we can provide helpers like

Function.execLight
Function.execHeavy
Function.execShort

Not sure if this is helpful.

@tvanhens
Copy link
Contributor Author

Was thinking, to help with providing sensible defaults for things like memory and timeout, perhaps we can provide helpers like

Function.execLight
Function.execHeavy
Function.execShort

Not sure if this is helpful.

Couldn't an optional second argument allow users to set specific parameters for anonymous functions? Figuring out sane defaults for these would be tough.

@sam-goodwin
Copy link
Collaborator

A potentially interesting CLI/UI feature for this would be a consolidated log view. See all the logs for the Step Function and all of its inner processes (Step Functions, Express Step Functions and Lambda Functions).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants