-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster processing of Data
objects
#6225
Comments
So switching gears to multi-apply and multi-lambda and multi-apply and also adding arrays as a plc term right? Just to summarize this PR. |
Yes, so far this seems like the most plausible solution to me (just to clarify, it's an issue, not a PR, I haven't written any code for that quite yet), but others may have better ideas and I'm not entirely confident the idea is gonna work out and be worth it. |
@IntersectMBO/plutus-core please review this issue carefully and check whether there are any flaws in my reasoning or whether there's something missing. If I don't get any negative feedback, I'm going to proceed with shipping pattern matching builtins instead of the SOPs approach (see this issue for details on what the two approaches are). |
Preface
This issue is a continuation of #5777 (I'm going to call it "the previous issue") and offers a different perspective. Reading the previous issue is a prerequisite for reading this one.
The previous issue explained how it'd be beneficial to have a way of matching on
Data
objects by converting them to SOPs first: this would make processing ofData
much more efficient. The issue previous also explaines how there doesn't appear to exist any safe and efficient way to perform the conversion. Hence here we're going to investigate alternatives.The goal
We want to be able to express the following (pseudocode):
where
inspectConstr
is some function that takes aData
object and a bunch of branches (in the form of a list or an array or something), picks the right branch depending on the index of the constructor (i.e. the first argument ofConstr
) and applies the multi-argument function of that branch to the arguments of the constructor (i.e. what's stored in the second argument ofConstr
). So the example above evaluates towhere
\[v0, v1, v2]
is a multi-lambda binding three variables and[d0, d1, d2]
is some kind of a spine (a list or an array or etc, can be the same thing as what we store the branches in, can be different). That expression is equivalent toand evaluates the same way.
Note that regardless of how we're going to represent this expression in the AST we want to only allow exactly-saturated applications for multi-lambdas, i.e. neither undersaturation (a multi-lambda binds more variables than the number of arguments it's applied to) nor oversaturation (a multi-lambda binds fewer variables than the number of arguments it's applied to) are allowed. The reason for this restriction is precisely what allows us to avoid all the problems described in the "What exactly breaks if we add the unsafe
constrTermFromConstrData
?" section of the previous issue.Is there a straightforward way to make
inspectConstr
work?The answer is "yes", I believe. The following pseudocode definition would cut it:
To make it work we need to
(1) add
array
to the language and not as a built-in type, because values of built-in types can only be constants but here we need to have an array of functions(2) teach the builtins machinery about
array
(3) add multi-lambdas and multi-applications to the language (or something along these lines)
(4) allow for returning applications from within the builtins machinery
(2) and (4) are simple. (4) has a prototype implementation here and (2) is very similar to teaching builtins about SOPs, which has a prototype implementation here
(1) and 3 are hairy but doable.
The real question however is whether we can avoid doing a part of this work by piggy-backing on some existing machinery like SOPs.
Can we use SOPs to avoid extending the language as much?
Instead of having multi-lambdas and multi-applications we could use SOPs to bind a bunch of variables at once, but the problem with that is that it doesn't give us exact saturation, hence SOPs don't seem to be of any help here. In retrospect, we should've probably made SOPs exactly saturated and it's probably too late to do it now. But maybe we could make multi-lambdas play nicely with SOPs and recover exact saturation for SOPs this way, but that requires adding multi-lambdas to the language in the first place.
But let's assume we somehow have exact saturation for SOPs and the following is expressible in the AST (note that it's
Term.Constr
where it wasData.Constr
before):we still don't know how to convert a
Data.Constr
to aTerm.Constr
as per the "Typing issue" section of the previous issue.Moreover, even if wanted to only use SOPs to pick the right branch, without applying the function in that branch, binding multiple variables etc -- just to pick the branch, we don't know how to give a type even to that built-in function. In a dependently typed system it would actually be easy:
for example
At runtime this builtin would check that the given
n
matches the number of arguments of the givenData.Constr
and either fail if it doesn't or return the SOP of the appopriate shape. This would be an entirely safe builtin to have, it would throw immediately on incorrect input and return a well-typed thing for correct input. The problem of course is that we don't have dependent types in Plutus.We could emulate them using singletons as per the "Singletons" section of the previous issue like this:
This would require us to extend the language with whatever
SopOfListData
is (a constructor ofType
? A built-in type?), as well as teach the builtins machinery about it, at which point it's probably just easier to add arrays to the language and be done with it.I therefore cannot think of any way to leverage SOPs for faster processing of
Data
objects. And it makes sense: SOPs are richly typed in that they reflect the structure of their values at the type level, but this is precisely what we don't want when we're dealing with the "untyped"Data
objects that have unknown structure. Reflecting that unknown structure at the type level just to give up on it immediately after in acase
expression is a wasted effort, matching directly without jumping through typing hoops is more natural and has to be more efficient -- and efficiency is what motivates this whole issue.Summarizing, (2), (3) and (4) all seem to be necessary. (1) may or may not be replaceable with the singletons approach to
constrTermFromConstrArgsData
, maybe it's worth investigating that but if we're going to add spines (lists/arrays/whatever) for (3) anyway, then we can reuse them for (1) without bothering with SOPs at all.The text was updated successfully, but these errors were encountered: