Spark with Sqoop and Kite - Mismatch in Command? #490

Open
dovy opened this issue Jul 9, 2019 · 1 comment

dovy commented Jul 9, 2019

Trying to dig into this one. When Sqoop is used without Kite (i.e., no Parquet output) there are no issues. The moment the job runs the export to Parquet, it fails with the error below. Kite looks like the offender, but if the problem belongs somewhere else, point me there and I will gladly take it upstream.

System:

  • Debian 9
  • Hadoop 2.9
  • Spark 2.3

Installed Dependencies (JARs):

  • sqoop-1.4.7-hadoop260
  • kite-data-mapreduce-1.1.0
  • kite-hadoop-compatibility-1.1.0.jar
  • kite-data-crunch-1.1.0
  • kite-data-core-1.1.0
  • avro-tools-1.8.2.jar
  • mysql-connector-java-5.1.42
  • parquet-tools-1.8.3

Error:

19/07/09 17:55:28 INFO mapreduce.Job: Job job_1562682312457_0020 failed with state FAILED due to: Job setup failed : java.lang.IllegalArgumentException: Parquet only supports generic and specific data models, type parameter must implement IndexedRecord
	at org.kitesdk.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:96)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:128)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:687)
	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
	at org.kitesdk.data.Datasets.load(Datasets.java:108)
	at org.kitesdk.data.Datasets.load(Datasets.java:165)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.load(DatasetKeyOutputFormat.java:542)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateJobDataset(DatasetKeyOutputFormat.java:569)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.access$300(DatasetKeyOutputFormat.java:67)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.setupJob(DatasetKeyOutputFormat.java:369)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


19/07/09 17:55:28 INFO mapreduce.Job: Counters: 2

Again, it only fails on the final conversion. I am not sure of the full details since the command is inside a parallel process. Any direction would be appreciated.
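
One way to narrow this down is to check which JAR each of the classes involved is actually loaded from. This is only a sketch and not a confirmed diagnosis: it assumes the failure could be related to what is on the classpath (for example, more than one Avro version), which nothing above proves. The class name `WhichJar` and the helper `locate` are hypothetical; the class names being looked up come from the stack trace, from Avro, and from Sqoop.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports where a class was loaded from.
public class WhichJar {

    // Returns the JAR or directory a class came from, or a short note
    // when the class is missing or was loaded by the bootstrap loader.
    public static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap/JDK)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Assumption: these are the classes worth checking for this error.
        String[] names = {
            "org.apache.avro.generic.IndexedRecord",
            "org.kitesdk.data.spi.filesystem.FileSystemDataset",
            "org.apache.sqoop.Sqoop"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Running it with the same classpath as the failing job would show whether `IndexedRecord` resolves to the expected Avro JAR or to a different copy.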

dovy commented Jul 9, 2019

P.S. Cross-posted on the Sqoop side: https://issues.apache.org/jira/browse/SQOOP-3445
