Proposal: instances for cats #437

Open
jchapuis opened this issue Apr 4, 2024 · 7 comments

@jchapuis

jchapuis commented Apr 4, 2024

For 🐱 lovers out there, how about including cats instances in the besom-cats package?

This allows for compact syntax like role >> roleBinding >> service

import besom.{Context, Output}
import cats.Functor

object CatsInstances {
  implicit val outputFunctor: Functor[Output] = new cats.Functor[Output] {
    def map[A, B](fa: Output[A])(f: A => B): Output[B] = fa.map(f)
  }

  implicit def outputApplicative(implicit context: Context): cats.Applicative[Output] = new cats.Applicative[Output] {
    def pure[A](x: A): Output[A]                               = Output(x)
    def ap[A, B](ff: Output[A => B])(fa: Output[A]): Output[B] = fa.flatMap(a => ff.map(f => f(a)))
  }

  implicit def outputMonad(implicit context: Context): cats.Monad[Output] = new cats.Monad[Output] {
    def flatMap[A, B](fa: Output[A])(f: A => Output[B]): Output[B] = fa.flatMap(f)
    def pure[A](x: A): Output[A]                                   = Output(x)

    def tailRecM[A, B](a: A)(f: A => Output[Either[A, B]]): Output[B] =
      f(a).flatMap {
        case Left(a1) => tailRecM(a1)(f)
        case Right(b) => Output(b)
      }
  }
}

(tailRecM might need an actual stack-safe implementation, which probably requires access to internals)

@pawelprazak pawelprazak added the kind/question Questions about existing features label Apr 4, 2024
@lbialy
Collaborator

lbialy commented Apr 4, 2024

I thought about that too but I'm a bit apprehensive about this move because Output isn't really an equivalent of IO, not in the "global monad" sense - Output has Pulumi-oriented semantics and while it prooooobably would pass discipline tests for a Monad, Functor and Applicative, it has some funny semantics related to dry runs (previews). In previews there are actually two different types of Outputs: static ones that are just like IO (if they say they are Output[A], you will get an A if you flatMap on it, provided it's not a failed Output) and computed ones that contain values resolved from providers in actual application of infrastructure. Computed ones behave like Option[A] in dry runs - they short-circuit without a way to inhibit that (we could add a combinator to provide a dummy value in dry runs though) and do not run flatMaps. This is all probably fine with the laws but it is surprising to the end user. For instance, a 🐱 aficionado would expect that it's possible to just flatMap on everything that is a monad and write something like this using TF/MTL style (and please forgive me if this makes little sense for some reason, I'm assuming here that the intent is to write TF over Output with MTs over Outputs as the F implementation):

def doThings[F[_]: Async: LiftOutput]: F[Unit] = // assuming we can lift Output to F using a typeclass
  for
    uuid <- UUIDGen.randomUUID[F]
    // this is resource constructor, it returns a static Output, behaves like IO
    bucket <- s3.Bucket("my-bucket", s3.BucketArgs(name = s"my-bucket-$uuid")).lift[F]
    // properties on resources are computed Outputs so this will short-circuit in dry run and break plan
    bucketName <- bucket.name.lift[F]
    _ <- uploadAFile[F]("my-file", bucketName, "./my-file.html")
  yield ()
    
def uploadAFile[F[_]: Async: LiftOutput](
  name: NonEmptyString, 
  bucketName: String, 
  path: Path
): F[s3.BucketObject] =
  s3.BucketObject(name, 
    s3.BucketObjectArgs(
      bucket = bucketName,
      key= name,
      source=pulumi.FileAsset(path.toString),
      etag=std.filemd5(input=path)
    )
  ).lift[F]

but this would - in dry run - completely skip the part where a file is uploaded to the bucket where without a flatMap this operation would show up in the plan shown as a result of dry run / preview.
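
The short-circuiting described above can be illustrated with a toy model. This is only a sketch under assumptions: `Out`, `Known`, `Unknown` and `Plan` are hypothetical stand-ins invented for this example, not Besom types, and the real preview logic is more involved.

```scala
import scala.collection.mutable.ListBuffer

// Toy model only: `Out`, `Known`, `Unknown` and `Plan` are hypothetical
// stand-ins, not Besom types.
sealed trait Out[+A]:
  def flatMap[B](f: A => Out[B]): Out[B] = this match
    case Known(a) => f(a)    // static value: the continuation runs
    case Unknown  => Unknown // computed value in a preview: the continuation is skipped
  def map[B](f: A => B): Out[B] = flatMap(a => Known(f(a)))

final case class Known[A](value: A) extends Out[A]
case object Unknown extends Out[Nothing]

// Records which resources a preview would show.
final class Plan:
  val registered = ListBuffer.empty[String]
  def resource(name: String, input: Out[String]): Out[String] =
    registered += name
    input.map(in => s"$name($in)")

// In a preview, the bucket's name is not known yet.
val bucketName: Out[String] = Unknown

// Style 1: the resource is created inside a flatMap on the computed value.
val viaFlatMap = Plan()
val skipped: Out[String] =
  bucketName.flatMap(n => viaFlatMap.resource("my-file", Known(n)))

// Style 2: the computed value is passed along as an input, no flatMap.
val viaInput = Plan()
val planned: Out[String] = viaInput.resource("my-file", bucketName)
```

In this model `viaFlatMap.registered` stays empty while `viaInput.registered` contains "my-file", which mirrors the difference between a resource missing from the plan and one that shows up in it.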

There's also another problem - Besom is not written in tagless final style and therefore one can't put F[_] into Stack.exports clause because it only works with Outputs. This makes it very very hard to use higher abstractions over Outputs and I sadly have to admit that this is by design - not to make it inconvenient to use TF/MTL style but to make it more clear that Besom is a higher level, domain specific DSL that allows to embed parts of programs written in any other style as side effecting code that does auxiliary tasks during infrastructure management work. So the intended direction is for things to end up translating to Outputs and that Outputs are final types in which infrastructural programs are expressed (and also with Outputs being used as pipes that transform data between resource properties and other resource inputs, not as global effect monads because it is strongly preferable to make all resource definitions top-level ).

There's also ongoing work to make the distinction between computed Outputs and static Outputs known on type level, and to inform users about the possibility of a broken plan in dry run caused by a resource constructor being called in a flatMap on a computed Output. It is possible that this work will split the two kinds of Outputs into two separate types, and that would force us to do even more magic with cats instances, I'm afraid.

If you have another use case for those instances in mind please do tell! I'm also thinking about converting this issue to a discussion given that this is quite an important thing and we would really like to make it very obvious how Outputs work and discussions are probably easier to locate and pin than issues.

@lbialy
Collaborator

lbialy commented Apr 4, 2024

Just to be clear - this is what I mean by Outputs being the final types and IOs/ZIOs/Futures being subsumed into them:

def doThings: Output[Unit] = {
  // notice `p""` interpolator:
  val bucket = s3.Bucket("my-bucket",
    s3.BucketArgs(name = p"my-bucket-${Output.eval(UUIDGen.randomUUID[IO])}")
  )

  // no flatMap, bucket name property is being passed as Output, direct syntax available via lifting
  uploadAFile("my-file", bucket.name, "./my-file.html").void
}

def uploadAFile(
  name: NonEmptyString, 
  bucketName: Output[String], 
  path: Path
): Output[s3.BucketObject] =
  s3.BucketObject(name, 
    s3.BucketObjectArgs(
      bucket = bucketName,
      key= name,
      source=pulumi.FileAsset(path.toString),
      etag=std.filemd5(input=path)
    )
  )

info about lifting: https://virtuslab.github.io/besom/docs/lifting

@jchapuis
Author

jchapuis commented Apr 8, 2024

Thanks for the elaborate response! I have to admit I haven't tried running the Monad laws on Output, I can try when I find a moment.

My immediate concrete use case was usage of the >> operator. But this could be added directly in the extensions, I suppose? I find that using monad transformers might also become interesting if I need to integrate conditional logic (OptionT) and/or error handling (EitherT) when dealing with some dynamic deployment code.
Tagless final, why not, for reusable bits of logic, but it's not immediately obvious, especially since, as you say, the design doesn't seem prepared for pluggable interpreters of the monadic chain (if I understood correctly).
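
As a rough sketch of the "add >> directly in the extensions" idea: the operator only needs flatMap, so it can be written as a Scala 3 extension method without any cats instance. The `Output` below is a minimal hypothetical stand-in so the example compiles on its own; with Besom the extension would target besom's Output instead, and `step` and `log` are made up for the demo.

```scala
import scala.collection.mutable.ListBuffer

// `Output` here is a minimal hypothetical stand-in, not besom.Output.
final case class Output[A](run: () => A):
  def flatMap[B](f: A => Output[B]): Output[B] = Output(() => f(run()).run())
  def map[B](f: A => B): Output[B] = Output(() => f(run()))

extension [A](fa: Output[A])
  // Sequence two Outputs and keep the second result (same meaning as cats' `>>`).
  def >>[B](fb: => Output[B]): Output[B] = fa.flatMap(_ => fb)

// Demo helpers (hypothetical): each step records its name when it runs.
val log = ListBuffer.empty[String]
def step(name: String): Output[String] = Output { () => log += name; name }

val deployment: Output[String] = step("role") >> step("roleBinding") >> step("service")
val last: String = deployment.run()
```

Given the dry-run semantics discussed earlier in the thread, such an operator would carry the same caveat as flatMap when the left-hand side is a computed Output.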

(btw unrelated, but it seems like both IntelliJ and Metals are struggling with type inference, not sure if it's work in progress or if it's just due to the current state of Scala 3 integration)

@jchapuis
Author

jchapuis commented Apr 10, 2024

To follow up on this, have you considered representing the Context as a Reader, rather than a given instance? Maybe you discarded this option to support the literals?

On the plus side, I'm thinking a Reader would make it natural to build a besom program value, pass it around, and interpret it with different contexts (for instance, for testing the program). It might also help with type inference and IDE performance (not sure, just a guess)

@lbialy
Collaborator

lbialy commented Apr 17, 2024

Hey, I've been thinking about this a lot lately. I've considered using a reader monad in the design stage but I wasn't able to come up with a solution to a pretty important invariant we have to never break. Context carries around a reverse semaphore called internally TaskTracker. This semaphore is created with Int.MaxValue permits and there's also a method called waitForAllTasks that basically takes Int.MaxValue permits from the semaphore effectively blocking until all permits are returned. Now every Output ever created has to be registered with this semaphore and take a lease which is then returned once said Output finishes computation. This is very important because we run gRPC tasks in a fire-and-forget fashion and resolve all resources using Promises (a resource is just a case class containing Outputs returned from Promise#get and there's a fork handling gRPC call that will resolve said promises with errors or failures). I've been pretty anal about this invariant because I didn't want to introduce very hard to diagnose early exit errors and therefore even Output.pure registers with the TaskTracker (I am aware that this is probably unnecessary). Having worked on the internals for so long now I have a better grasp on what and when should be tracked and I'm slowly coming around to the idea that we could loosen tracking a bit and should that be possible, we could change the design to use reader monad and pass context internally in Result. It would require, however, some work around debuggability of TaskTracker with an easy way to log how many permits were taken at any given point in the program and then some careful testing. Possible, just not easy.
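
The "reverse semaphore" idea described above can be sketched with a plain `java.util.concurrent.Semaphore`. This is an illustration under assumptions, not Besom's actual TaskTracker: the class name `ReverseSemaphore` and the `track` method are hypothetical, and real Outputs are not backed by raw threads.

```scala
import java.util.concurrent.Semaphore
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch of a reverse semaphore; not Besom's TaskTracker.
final class ReverseSemaphore:
  private val permits = new Semaphore(Int.MaxValue)

  // Every tracked task takes one lease and returns it when it finishes.
  def track(task: => Unit): Thread =
    permits.acquire()
    val t = new Thread(() => {
      try task
      finally permits.release()
    })
    t.start()
    t

  // Blocks until every lease has been returned, then restores the permits.
  def waitForAllTasks(): Unit =
    permits.acquire(Int.MaxValue)
    permits.release(Int.MaxValue)

val tracker = new ReverseSemaphore
val completed = new AtomicInteger(0)

// Fire-and-forget tasks: nobody joins these threads explicitly.
val workers = (1 to 3).map { _ =>
  tracker.track {
    Thread.sleep(50)
    completed.incrementAndGet()
  }
}

// The barrier alone guarantees that all three tasks have finished.
val done: Int = { tracker.waitForAllTasks(); completed.get() }
```

The invariant from the comment shows up here as well: a task that is started without going through `track` would not be waited for, which is the early-exit failure mode the tracking is meant to prevent.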

@jchapuis
Author

jchapuis commented Apr 19, 2024

Thanks for the detailed answer! From your description, I understand one challenge is to implement a kind of thread barrier for the completion of all the asynchronous tasks.

My suggestion of using a reader was also a question regarding the feasibility of separating the formulation of the besom program from its execution in such a way that the pulumi instructions could be treated as pure values. But I'm not familiar with your internals, so feel free to discard 😄

So with a definition a bit like this:

case class BesomProgram[A](run: Output[Context => A]):
  def map[B](f: A => B): BesomProgram[B] = ???
  def flatMap[B](f: A => BesomProgram[B]): BesomProgram[B] = ???

then your program could be

val program: BesomProgram[Stack] = ...

so that you could do

val stack = program.run(Output.pure(realContext)).interpret(realPulumi)
val assertions = program.run(Output.pure(testContext)).interpret(munitTester)
val diagram = program.run(Output.pure(previewContext)).interpret(fancyVisualizer)

Essentially all accesses to pulumi would somehow have to be tracked into Output values (kind of a writer) but not executed directly. Hopefully, I'm not suggesting re-inventing an effect system here: Output isn't a higher-kinded type constructor so maybe complexity-wise it's reasonable. Scala code expressing the besom program could still side-effect but everything effecting pulumi would be pure. Then orchestration of asynchronicity when dealing with pulumi would probably be easier to express in the interpreter than it is today.
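
To make the reader idea concrete, here is a deliberately simplified sketch. It drops the Output layer entirely and only shows a program as a value that is interpreted against different contexts. `Ctx`, `Program` and `declare` are hypothetical names for this example, not Besom's API, and the sketch says nothing about how gRPC calls or task tracking would fit in.

```scala
// All names here (`Ctx`, `Program`, `declare`) are hypothetical.
final case class Ctx(stackName: String, dryRun: Boolean)

// A program is just a function of the context, so building it has no effects.
final case class Program[A](run: Ctx => A):
  def map[B](f: A => B): Program[B] = Program(ctx => f(run(ctx)))
  def flatMap[B](f: A => Program[B]): Program[B] =
    Program(ctx => f(run(ctx)).run(ctx))

// Describes a resource; what actually happens is decided by the context.
def declare(name: String): Program[String] =
  Program { ctx =>
    if ctx.dryRun then s"would create ${ctx.stackName}/$name"
    else s"created ${ctx.stackName}/$name"
  }

val program: Program[List[String]] =
  for
    bucket <- declare("bucket")
    file   <- declare("file")
  yield List(bucket, file)

// The same program value, interpreted with two different contexts.
val preview = program.run(Ctx("dev", dryRun = true))
val applied = program.run(Ctx("dev", dryRun = false))
```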

@lbialy
Collaborator

lbialy commented Apr 19, 2024

Such a formulation doesn't seem to prevent our two biggest problems as far as I can see:

  1. in dry run computed Outputs behave like None and short-circuit
  2. there are grpc calls that have to be called exactly once

2nd would be actually much easier to deal with (because in usual pure FP for-comp-based monadic api you just flatMap stuff and that's how you control evaluation count) if not for the 1st!

The current state of the SDK is extremely similar to other Pulumi SDKs and that's by design (to allow users of other SDKs to understand Besom by looking at the code). Divergences from the behavior or shape of behavior of other SDKs are very small (there's just a single behavioral difference caused by our need to memoize resource constructor calls; other than that we return resources in Outputs and other SDKs return resources unwrapped, so a single syntactic difference). We could potentially diverge a bit more if it meant a radical improvement in usability.

I was thinking about autoderived Default[A] for all stuff that arrives from the engine via gRPC so that we could substitute missing stuff in dry run. This would alleviate 1) and allow users to just flatMap everything all over the place but it would make things extremely fragile unfortunately:

a) let's assume a field of type String on some resource that gets populated during runtime
b) nothing prohibits the user from mapping/flatMapping on that Output and parsing said String
c) said String would adhere to some structural convention of a cloud provider but we have no way of capturing such information, therefore
d) our dummy string from Default[String] would fail in user's code, depending on user's logic this error would either crash dry run phase or worse: yield a plan different from what is going to be applied!
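
Steps a) to d) can be played through in a small sketch. Everything here is an assumption made for illustration: `Default` is a hypothetical typeclass (nothing like it is confirmed to exist in Besom), and `regionOf` stands in for arbitrary user code that relies on a provider's string convention.

```scala
// Hypothetical typeclass supplying dummy values during a dry run.
trait Default[A]:
  def value: A

given defaultString: Default[String] = new Default[String]:
  def value: String = ""

// User code relying on a provider convention:
// "arn:partition:service:region:account:resource..."
def regionOf(arn: String): Either[String, String] =
  arn.split(":") match
    case Array("arn", _, _, region, _, _*) => Right(region)
    case _                                 => Left(s"not an ARN: '$arn'")

// During apply the real value arrives from the engine and parses fine.
val atApply = regionOf("arn:aws:lambda:eu-west-1:123456789012:function:my-fn")

// During a dry run the dummy value is substituted and the same code fails.
val atPreview = regionOf(summon[Default[String]].value)
```

Here `atApply` is a `Right` and `atPreview` is a `Left`, so any logic branching on the result would take a different path in preview than in the actual deployment.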

This is just one option but it's fairly easy to see that it's going to be troublesome. The other aforementioned option is to detect usage of resource constructors inside of Output#flatMap (and this is the direction we're exploring the most). This would push users towards the style we currently promote (kinda like direct-style, with Outputs used to transform properties that are then fed to Inputs of other resources) and that's also similar to other SDKs.
