Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider Summarize implementation improvements #2778

Open
MikeStall opened this issue Dec 17, 2024 · 1 comment
Open

Consider Summarize implementation improvements #2778

MikeStall opened this issue Dec 17, 2024 · 1 comment
Assignees

Comments

@MikeStall
Copy link
Contributor

MikeStall commented Dec 17, 2024

No change to language semantics , but consider some implementation improvements to help with maintainability & delegation.

[1] Comment the implementation

[2] In Summarize, line 1446, we do a uniqueness check:
var key = keyRecord.ToExpression();

This is not ideal. We really shouldn't call ToExpression at runtime like this - I'm wondering if there are cases it could be unpredictable.

This overlaps with Distinct. It would be good to compare how distinct does it.

[3] Consider a "Plan" design pattern.
At the source-level, grouping columns are identifiers.
Aggregate columns are 'As' nodes.

At the IR level, we pass all this through a formulaValue[].
grouping columns are encoded as StringValues
aggregate columns are encoded as LambdaFormulaValue around a LazyEvalNode around a RecordNode , where that Record has 1 field, where the field's Name is the 'As' , and the field's vakue is the lambda to run the expression. (phew, this is starting to get complex!)
This is kind of "hidden" contract between IR and interpreter - it's kind of arbitrary. But Delegation needs to be aware of this.
This is a pretty simple contract, so it's not a big deal. But it's super brittle- there's a log of arbitrary choices in here on encoding, and if we ever changed it, we'd break delegation.

Whereas a "Plan" object would make it an explicit contract. Ie, imagine if the IR captures all of the key information in some structured C# class that it passed through.

Join has a similar thing as is more brittle.

A "plan" object, might look like:

// Eg: Summarize(t1, SalesDate, Sum(ThisGroup.Amount) As Total);
// Binder has guaranteed uniqueness of all column names 
class SummarizePlan
{
	// The columns used to group. Ie "SalesDate". 
       // This is unordered.
	IEnumerable<string> GroupingColumnNames {get;set; }
	// The aggregate columns computed on each group.        
	// This takes in a 'ThisGroup' identifier specified via rule scope. 
       // Key is column name (eg, "Total")
       // Value is the IR to run it,  (eg, Sum(ThisGroup.Amount)) 
	Dict<string, IRNode> AggregateColumns {get;set; }  
 
       // Possibly extra FormulaType info if needed. 
}
@MikeStall MikeStall changed the title Consider Summarize improvements Consider Summarize implementation improvements Dec 17, 2024
@anderson-joyle anderson-joyle self-assigned this Dec 17, 2024
@anderson-joyle
Copy link
Contributor

[1] Comment the implementation

Addressed at #2779

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants