Request - grouped by columns available as single values rather than vectors #361

Open
Lincoln-Hannah opened this issue May 26, 2023 · 5 comments

@Lincoln-Hannah

Would it be possible, within a @by block, to make the group-by columns available as single values rather than vectors?

In the example below, I'd like to create a column of myCurve structs, but because the :name column comes through as a vector, it only works for the myCurve_name_vec struct. I could convert it, but that would be less clean.

More generally, if you are grouping by a column, any related calculations would likely use that column as a single value.


using DataFrames, DataFramesMeta, Parameters

@with_kw struct myCurve
    name::Symbol
    curve::Vector{Int64}
end

@with_kw struct myCurve_name_vec
    name::Vector{Symbol}
    curve::Vector{Int64}
end

d = DataFrame(name = [:a, :a, :a, :b, :b, :b], curve = [1, 2, 3, 11, 12, 13])

@by d :name   :x = myCurve_name_vec( AsTable(:)... )   # works: :name is passed as a vector
@by d :name   :x = myCurve( AsTable(:)... )            # doesn't work: :name is a vector, not a Symbol

bkamins transferred this issue from JuliaData/DataFrames.jl on May 26, 2023

@bkamins
Member

bkamins commented May 26, 2023

@Lincoln-Hannah - indeed, I also often need this. I understand that this is a request for DataFramesMeta.jl.

The only issue is mixing grouping and non-grouping columns. Maybe something like @val(:name) inside @by could be better instead (to distinguish taking :name as a column and @val(:name) as a value).

The name @val is tentative.

What you can currently do is use first(:name) to get the value. Maybe that would be enough for you, and it just needs to be documented?
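
As a rough sketch of the first(:name) workaround applied to the example from the original post: the struct here is a plain struct rather than a @with_kw one (an assumption made to keep the snippet small), the parenthesised form of @by is used, and collect(:curve) is added to turn the per-group view into a plain Vector.

```julia
using DataFrames, DataFramesMeta

# Plain-struct stand-in for the myCurve struct from the original post
# (assumption: no @with_kw, to avoid the Parameters.jl dependency).
struct myCurve
    name::Symbol
    curve::Vector{Int64}
end

d = DataFrame(name = [:a, :a, :a, :b, :b, :b], curve = [1, 2, 3, 11, 12, 13])

# Inside @by, :name is the per-group vector of the grouping column,
# so first(:name) recovers the single group value.
res = @by(d, :name, :x = myCurve(first(:name), collect(:curve)))
```

Each element of res.x should then be a myCurve holding a single Symbol and that group's curve values.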

@pdeffebach
Collaborator

@Lincoln-Hannah Can I have more information on your use-case?

I also do this all the time, but first(:name) is enough for me.

@Lincoln-Hannah
Author

See related request: mauro3/Parameters.jl#153

I'd like to move between DataFrames and arrays of structs as effortlessly as possible.

If I create a struct with field names matching the columns of a database query, I'd like to convert the query result into an array of structs in one line. Something like:

           @rtransform  df  :mystruct  =  mystruct(;  AsTable(:)...  ) 

A struct derived from a grouped DataFrame will have single-value fields for the group-by columns and vector fields for the non-group-by columns.
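
For the ungrouped case, one way to get an array of structs without any macro is sketched below. It assumes a hypothetical QueryRow struct with a keyword constructor (via Base.@kwdef rather than @with_kw) and a small DataFrame standing in for the query result; it is not the syntax requested above.

```julia
using DataFrames

# Hypothetical struct whose field names match the query's column names.
Base.@kwdef struct QueryRow
    id::Int
    label::String
end

# Stand-in for a database query result (illustrative data only).
df = DataFrame(id = [1, 2, 3], label = ["x", "y", "z"])

# Each DataFrameRow is converted to a NamedTuple and splatted as keyword arguments.
rows = [QueryRow(; NamedTuple(row)...) for row in eachrow(df)]
```

Because the fields are matched by name, the column order in the DataFrame does not need to match the field order in the struct.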

@pdeffebach
Collaborator

Okay so you would like

           @rtransform  df  :mystruct  =  mystruct(;  AsTable(:)...  ) 

to not return a DataFrame? Rather, you want it to return a Vector?

I still need more information on what you want. What is the output you desire? Give it as a Julia object, not a description.

@Lincoln-Hannah
Author

Sorry Peter. My bad. I was trying to isolate the key line. To get to a vector there would be an additional line.

@chain begin
    @rtransform  df  :mystruct  =  mystruct(;  AsTable(:)...  )
    _.mystruct
end

Actually, more often I'd put the result in a Dictionary. For example:

using Dictionaries 

@with_kw struct myStruct
    a::Int64
    b::Int64
    c::Vector{Int64}
    d::Vector{Int64}
end


dict_of_structs = @chain begin
    DataFrame( a=[1,1,2,2], b=[11,11,12,12],  c=1:4,  d=11:14 )

    @by [:a,:b]   :x  = myStruct(; AsTable(:)... )

    Dictionary(  _.a,    _.x    )
end

The idea is that AsTable(:) produces one named tuple per output row (that is, per group), in which the group-by columns are single values and the other columns are vectors or subarrays (as usual):

[ (a=1,b=11,c=[1,2],d=[11,12]),
(a=2,b=12,c=[3,4],d=[13,14]) ]

Each row becomes a myStruct, and the last line creates a dictionary.

Dictionary 
1         |          myStruct(a=1,b=11,c=[1,2],d=[11,12])  
2         |          myStruct(a=2,b=12,c=[3,4],d=[13,14])    

We can then apply a function to any element

myFunc(   dict_of_structs[1]  )

or broadcast over all

myFunc.(   dict_of_structs )
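
With the current behaviour, something close to this can be sketched using plain DataFrames.jl and first on the group-by columns. The snippet below is only an approximation of the request: it lists the columns explicitly instead of using AsTable(:), uses Base.@kwdef in place of @with_kw, applies map rather than broadcasting, and total_c is a hypothetical stand-in for myFunc.

```julia
using DataFrames, Dictionaries

Base.@kwdef struct myStruct
    a::Int64
    b::Int64
    c::Vector{Int64}
    d::Vector{Int64}
end

df = DataFrame(a = [1, 1, 2, 2], b = [11, 11, 12, 12], c = 1:4, d = 11:14)

# Group-by columns arrive as per-group vectors, so take their first element;
# the remaining columns are collected into plain vectors.
grouped = combine(
    groupby(df, [:a, :b]),
    [:a, :b, :c, :d] =>
        ((a, b, c, d) -> myStruct(a = first(a), b = first(b),
                                  c = collect(c), d = collect(d))) => :x,
)

# Key the structs by the value of :a, as in the example above.
dict_of_structs = Dictionary(grouped.a, grouped.x)

# Hypothetical stand-in for myFunc: sums the c field of one struct.
total_c(s::myStruct) = sum(s.c)

single = total_c(dict_of_structs[1])      # apply to one element
totals = map(total_c, dict_of_structs)    # apply to every element
```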
