Create DataSet using SparkSession #26

santhoshtangudu · 2017-06-26T12:29:17Z

Hi,
We have files in 4mc format on our Hadoop cluster. We are trying to read these files and create a Dataset directly (instead of creating an RDD and then a Dataset) in Spark 2.0. Could you please help us do that?

carlomedas · 2017-06-26T18:11:38Z

There are certainly several ways to achieve that, and to be honest I'm not sure this is the best one. Unfortunately I don't have time to dig deeper right now, but:
how about loading the data as an RDD, which can be pre-filtered and so on, and then creating a bean class so that the SQLContext can quickly map it to a Dataset by convention over configuration, e.g.:
Dataset<Row> devices = sqlContext.createDataFrame(ratingsRDD, DeviceEntry.class);
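
To make "convention over configuration" a bit more concrete, here is a minimal sketch of what a bean class might look like and which property names the standard JavaBeans introspection derives from its getters. The Spark SQL guide describes the bean-based schema as coming from the reflected BeanInfo, so the column names would be expected to follow these properties. The fields of DeviceEntry (deviceId, reading) and the helper columnNames are assumptions made up for illustration; they are not from this thread, and the sketch uses only the JDK, not Spark itself.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BeanSchemaSketch {

    // Hypothetical bean: the two fields are placeholders, not taken from the thread.
    public static class DeviceEntry implements java.io.Serializable {
        private String deviceId;
        private double reading;

        public DeviceEntry() {}

        public String getDeviceId() { return deviceId; }
        public void setDeviceId(String deviceId) { this.deviceId = deviceId; }

        public double getReading() { return reading; }
        public void setReading(double reading) { this.reading = reading; }
    }

    // Hypothetical helper: lists the readable bean properties, sorted by name.
    // Stopping at Object.class leaves out the implicit "class" property.
    static List<String> columnNames(Class<?> beanClass) {
        try {
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null) {
                    names.add(pd.getName());
                }
            }
            Collections.sort(names);
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        // Prints the property names derived from the getters.
        System.out.println(columnNames(DeviceEntry.class));
    }
}
```

With a bean shaped like this, each record of the RDD would be a DeviceEntry instance, and the createDataFrame call above would need no explicit schema. Whether that fits depends on how the 4mc records are parsed into beans, which this sketch does not cover.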
