kite-dataset fails on Mac OS X due to case insensitive filesystem while unpacking the JAR #475

Open
ecerulm opened this issue Aug 7, 2017 · 1 comment

ecerulm commented Aug 7, 2017

The kite-tools-1.1.0-binary.jar fails on Mac OS X because the HFS+ filesystem is case-insensitive and the JAR contains both a file META-INF/LICENSE and a directory META-INF/license. By default HFS+ does not allow two filenames that differ only in case: it is case-preserving but case-insensitive.
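A quick way to see this behaviour for yourself is the probe below. It is only a sketch: the function name is_case_insensitive and the caseprobe.XXXXXX scratch directory are made up for illustration and are not part of Kite or Hadoop. It creates a file named CaseProbe, checks whether the spelling caseprobe resolves to the same file, and lists the directory so you can see which spelling is stored.

```shell
# is_case_insensitive [DIR]  (hypothetical helper, not part of Kite/Hadoop)
# Probes the filesystem that holds DIR (default: $TMPDIR, else /tmp).
# Prints the file name as stored on disk and returns:
#   0 = case-insensitive, 1 = case-sensitive, 2 = probe dir could not be created
is_case_insensitive() {
  probe_dir=$(mktemp -d "${1:-${TMPDIR:-/tmp}}/caseprobe.XXXXXX") || return 2
  : > "$probe_dir/CaseProbe"          # create an empty file with mixed case
  if [ -e "$probe_dir/caseprobe" ]; then
    rc=0                              # lower-case lookup hit the same file
  else
    rc=1                              # lower-case name does not exist
  fi
  ls "$probe_dir"                     # shows the stored spelling (CaseProbe on a case-preserving fs)
  rm -rf "$probe_dir"
  return "$rc"
}
```

For example, is_case_insensitive "$TMPDIR" should print CaseProbe and return 0 on a default HFS+ volume, which is where RunJar's hadoop-unjar directory is created in the trace below.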

You can verify that the JAR really contains both license and LICENSE with the command: jar tvf kite-tools-1.1.0-binary.jar | grep -i license
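grep -i license shows the candidates, but a clash can also hide behind a parent directory (here the directory META-INF/license/ collides with the file META-INF/LICENSE). The sketch below is a hypothetical helper, case_collisions, written in plain POSIX awk: it reads entry names such as those printed by jar tf and reports every pair of paths, including implied parent directories, that differ only in case. It uses awk's tolower, so it is only an ASCII approximation of how HFS+ folds case.

```shell
# case_collisions  (hypothetical helper)
# Reads archive entry names on stdin, one per line, and prints
# "first-spelling <-> other-spelling" for each pair of paths that
# differ only in ASCII letter case. Parent directories are checked too.
case_collisions() {
  awk '
    {
      sub(/\/$/, "")                  # "dir/" and "dir" name the same path
      n = split($0, parts, "/")
      path = ""
      for (i = 1; i <= n; i++) {
        # build the path one component at a time: a, a/b, a/b/c ...
        path = (i == 1) ? parts[i] : path "/" parts[i]
        if (path in seen) continue    # this exact spelling was already handled
        seen[path] = 1
        key = tolower(path)           # ASCII-only case folding
        if (key in first)
          print first[key] " <-> " path
        else
          first[key] = path
      }
    }'
}
```

Usage: jar tf kite-tools-1.1.0-binary.jar | case_collisions. Assuming the listing matches the description above, it should report the META-INF/LICENSE and META-INF/license pair (in whichever order the two entries appear in the JAR).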

This filename clash makes the JAR unusable: when Hadoop tries to unpack it, RunJar throws an IOException: Mkdirs failed to create <tmpdir>.../hadoop-unjar/.../META-INF/license:

kite-dataset csv-schema movies.csv --record-name Movie                                                                                                                     
/Users/ecerulm/bin/kite-dataset debug: Using HADOOP_COMMON_HOME=/Users/ecerulm/.local/stow/hadoop-2.8.1/
/Users/ecerulm/bin/kite-dataset debug: Using HADOOP_MAPRED_HOME=/Users/ecerulm/.local/stow/hadoop-2.8.1//../hadoop-mapreduce
/Users/ecerulm/bin/kite-dataset debug: Using HBASE_HOME=/Users/ecerulm/.local/stow/hadoop-2.8.1//../hbase
/Users/ecerulm/bin/kite-dataset debug: Using HIVE_HOME=/Users/ecerulm/.local/stow/hadoop-2.8.1//../hive
/Users/ecerulm/bin/kite-dataset debug: Using HIVE_CONF_DIR=/Users/ecerulm/.local/stow/hadoop-2.8.1//../hive/conf
/Users/ecerulm/bin/kite-dataset debug: Using HADOOP_CLASSPATH=/Users/ecerulm/bin/kite-dataset::
Exception in thread "main" java.io.IOException: Mkdirs failed to create /var/folders/j5/8yjty44917v3_ydfjyy0gz0c0000gn/T/hadoop-unjar7609709732056315890/META-INF/license
	at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:140)
	at org.apache.hadoop.util.RunJar.unJar(RunJar.java:109)
	at org.apache.hadoop.util.RunJar.unJar(RunJar.java:85)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:222)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
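The exception comes from RunJar.ensureDirectory giving up on META-INF/license, which is consistent with the file META-INF/LICENSE having already been extracted: on a case-insensitive filesystem the new directory name resolves to that existing file. The shell sketch below reproduces that ordering without Hadoop. It approximates what RunJar does rather than copying its code, and the function name reproduce_unjar_clash is hypothetical.

```shell
# reproduce_unjar_clash [DIR]  (hypothetical helper, not part of Hadoop)
# Mimics the extraction order suggested by the stack trace: write the file
# META-INF/LICENSE first, then try to create the directory META-INF/license.
# Returns 0 if the directory was created (case-sensitive filesystem),
#         1 if it could not be created (the "Mkdirs failed" situation),
#         2 if the scratch directory could not be created.
reproduce_unjar_clash() {
  work=$(mktemp -d "${1:-${TMPDIR:-/tmp}}/unjar-clash.XXXXXX") || return 2
  mkdir "$work/META-INF"
  : > "$work/META-INF/LICENSE"        # the file that is extracted first
  if mkdir "$work/META-INF/license" 2>/dev/null && [ -d "$work/META-INF/license" ]; then
    rc=0                              # a separate directory now exists
  else
    rc=1                              # the name resolved to the existing file
  fi
  rm -rf "$work"
  return "$rc"
}
```

Running reproduce_unjar_clash "$TMPDIR"; echo $? should print 1 on a default HFS+ volume and 0 on a case-sensitive filesystem.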

Is it possible to change the JAR build process to rename the META-INF/license directory to META-INF/licenses? Googling around, I found that the Maven Shade plugin's [ApacheLicenseResourceTransformer](https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ApacheLicenseResourceTransformer) may solve the problem.

Alternatively, the META-INF/LICENSE file (the Jackson JSON processor license) could be moved or renamed.

Is this possible? Otherwise, as far as I understand, kite-dataset cannot be used on Mac OS X.

ecerulm commented Aug 8, 2017

As a workaround, for anyone hitting the same problem: you can remove the META-INF/LICENSE file from kite-dataset with the following commands:

curl -O  http://central.maven.org/maven2/org/kitesdk/kite-tools/1.1.0/kite-tools-1.1.0-binary.jar
md5 kite-tools-1.1.0-binary.jar # MD5 (kite-tools-1.1.0-binary.jar) = 3327af98b339725070962f7391187fc2
dd if=kite-tools-1.1.0-binary.jar bs=4114 count=1 > script.sh # first 4114 bytes of .jar to script.sh file
dd if=kite-tools-1.1.0-binary.jar bs=4114 skip=1 > jarcontent.zip # rest of jar goes to jarcontent.zip
zip -d jarcontent.zip META-INF/LICENSE
cat script.sh jarcontent.zip >~/bin/kite-dataset

This generates a ~/bin/kite-dataset with no filenames that conflict by case. If ~/bin/kite-dataset did not exist before, make it executable with chmod +x ~/bin/kite-dataset.
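The 4114-byte split point only applies to the exact JAR whose MD5 is shown above: it is the length of the script header that precedes the ZIP data. Before running zip -d, a cheap sanity check is that jarcontent.zip begins with the ZIP local-file-header signature PK\003\004 (hex 50 4b 03 04). The helper name has_zip_signature is hypothetical; it only relies on head, od and tr.

```shell
# has_zip_signature FILE  (hypothetical helper)
# Succeeds if FILE starts with the ZIP local file header signature
# "PK\003\004", i.e. the split landed exactly on the start of the archive.
has_zip_signature() {
  # first 4 bytes -> hex digits without spaces/newlines -> compare
  [ "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" = "504b0304" ]
}
```

For example, has_zip_signature jarcontent.zip && zip -d jarcontent.zip META-INF/LICENSE only deletes the entry when the split point is correct; if the check fails, the script header in your copy of the JAR has a different length.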
