How could I join two data frames by specifying columns from each data frame? #79

Open
DareUrDream opened this issue Jan 24, 2019 · 2 comments

@DareUrDream

Hi,

I have two data frames with no common column names. How can I join them? I want to try joining them on each pair of columns in turn, to produce various outputs.

Is it possible to join two DFs by specifying the mapping column(s) from each DF?

Cheers,
DareUrDream

@cardillo
Owner

Try using the joinOn method with the column names for which the values match. Alternatively, you can use the joinOn method with a function that computes the join key.
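
To illustrate the second suggestion, here is a minimal plain-Java sketch of what joining on a computed key means when the two sides name their key columns differently. It does not use joinery: the class `KeyFunctionJoinSketch`, the method `innerJoin`, and the representation of rows as `Map<String, Object>` are all assumptions made for illustration, and the sketch assumes keys on the right side are unique.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of joinery: joins two row lists on computed keys.
final class KeyFunctionJoinSketch {

    /** Inner-joins two row lists; each side supplies its own key function. */
    static List<Map<String, Object>> innerJoin(
            List<Map<String, Object>> left,
            Function<Map<String, Object>, Object> leftKey,
            List<Map<String, Object>> right,
            Function<Map<String, Object>, Object> rightKey) {
        // Index the right side by its computed key (assumes right keys are unique).
        Map<Object, Map<String, Object>> index = new HashMap<>();
        for (Map<String, Object> row : right) {
            index.put(rightKey.apply(row), row);
        }
        // For each left row, look up the matching right row and merge the columns.
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : left) {
            Map<String, Object> match = index.get(leftKey.apply(row));
            if (match != null) {
                Map<String, Object> merged = new LinkedHashMap<>(row);
                merged.putAll(match);
                out.add(merged);
            }
        }
        return out;
    }
}
```

With this shape, the left side could use `row -> row.get("agentid")` and the right side `row -> row.get("resourceid")`, so neither data set needs a column renamed before the join.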

@DareUrDream
Author

Hi @cardillo,

For now I have worked around it by renaming a column in one of the data sets, but I have hit another bottleneck: the join now fails with the stack trace below. I am not sure how to prepare a unique key so that the join works.

resource.txt
agentstatedetail1m_copy.txt

Stack trace

Exception in thread "main" java.lang.IllegalArgumentException: generated key is not unique: [3]
at joinery.impl.Combining.join(Combining.java:45)
at joinery.impl.Combining.joinOn(Combining.java:102)
at joinery.DataFrame.joinOn(DataFrame.java:730)
at joinery.DataFrame.joinOn(DataFrame.java:756)
at com.cisco.evaluate.joinery.JoineryTestMain.startEvaluation(JoineryTestMain.java:37)
at com.cisco.evaluate.joinery.JoineryTestMain.main(JoineryTestMain.java:18)

Code Below

```java
// Assumes: import joinery.DataFrame; import joinery.DataFrame.JoinType;
DataFrame<Object> rsrcDf = DataFrame.readCsv(ClassLoader.getSystemResourceAsStream("resource.csv"))
        .retain("resourceid", "resourceloginid", "resourcename", "resourcegroupid", "extension",
                "resourceskillmapid", "assignedteamid", "resourcefirstname", "resourcelastname");

// Keep only the columns needed from the agent state data.
DataFrame<Object> asdDf = DataFrame.readCsv(ClassLoader.getSystemResourceAsStream("agentstatedetail1m_copy.csv"))
        .retain("agentid", "eventtype");

// Rename agentid so that both frames share the join column name.
asdDf = asdDf.rename("agentid", "resourceid");

// This call throws "generated key is not unique: [3]".
DataFrame<Object> joinedDf = asdDf.joinOn(rsrcDf, JoinType.LEFT, "resourceid");
System.out.println("Final row count: " + joinedDf.length());
```
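
The message "generated key is not unique: [3]" suggests that the join indexes rows by their key and rejects repeats. A likely cause, assuming the agent state file holds several event rows per agent (the attached data is not shown here), is that `resourceid` 3 occurs more than once in `asdDf`. The sketch below reproduces that kind of check in plain Java and shows one possible workaround: a many-to-one lookup join that requires unique keys only on the resource side. The class `UniqueKeySketch` and its methods are hypothetical and are not joinery APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of joinery.
final class UniqueKeySketch {

    /** Indexes rows by one column and rejects a repeated key. */
    static Map<Object, Map<String, Object>> indexUnique(
            List<Map<String, Object>> rows, String keyColumn) {
        Map<Object, Map<String, Object>> index = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            Object key = row.get(keyColumn);
            // put returns the previous row when the key was already present.
            if (index.put(key, row) != null) {
                throw new IllegalArgumentException("generated key is not unique: " + key);
            }
        }
        return index;
    }

    /** Left join that keeps every event row and appends the matching resource columns. */
    static List<Map<String, Object>> leftLookupJoin(
            List<Map<String, Object>> events, String eventKeyColumn,
            List<Map<String, Object>> resources, String resourceKeyColumn) {
        // Only the lookup (resource) side has to have unique keys.
        Map<Object, Map<String, Object>> index = indexUnique(resources, resourceKeyColumn);
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> event : events) {
            Map<String, Object> merged = new LinkedHashMap<>(event);
            Map<String, Object> match = index.get(event.get(eventKeyColumn));
            if (match != null) {
                merged.putAll(match);
            }
            out.add(merged);
        }
        return out;
    }
}
```

If the repeated keys are instead on the resource side, this approach would not help, and the duplicates would need to be removed or aggregated before joining.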

Cheers,
DareUrDream

@cardillo added this to the v1.11 milestone on Oct 17, 2021