To run Hadoop on Mesos you need to add the hadoop-mesos-0.0.6.jar
library to your Hadoop distribution (any distribution that supports
hadoop-core-1.2.1
should work) and set some new configuration
properties. Read on for details.
To use the metrics feature (which uses the CodaHale Metrics library), you need to
install libsnappy
. The snappy-java
package also includes a bundled version of
libsnappyjava
.
You can build hadoop-mesos-0.0.6.jar
using Maven:
$ mvn package
If successful, the JAR will be at target/hadoop-mesos-0.0.6.jar
.
NOTE: If you want to build against a different version of Mesos than the default you'll need to update
mesos-version
inpom.xml
.
We plan to provide already built JARs at http://repository.apache.org in the near future!
You'll need to download an existing Hadoop distribution. For this guide, we'll use CDH4.2.2. First grab the tar archive and extract it.
$ wget http://archive.cloudera.com/cdh4/cdh/4/mr1-2.0.0-mr1-cdh4.2.2.tar.gz
...
$ tar zxf mr1-2.0.0-mr1-cdh4.2.2.tar.gz
Take note, the extracted directory is
hadoop-2.0.0-mr1-cdh4.2.2
.
Now copy hadoop-mesos-0.0.6.jar
into the lib
folder.
$ cp /path/to/hadoop-mesos-0.0.6.jar hadoop-2.0.0-mr1-cdh4.2.2/lib/
That's it! You now have a Hadoop on Mesos distribution!
You'll want to upload your Hadoop on Mesos distribution somewhere
that Mesos can access in order to launch each TaskTracker
. For
example, if you're already running HDFS:
$ tar czf hadoop-2.0.0-mr1-cdh4.2.2.tar.gz hadoop-2.0.0-mr1-cdh4.2.2
$ hadoop fs -put hadoop-2.0.0-mr1-cdh4.2.2.tar.gz /hadoop-2.0.0-mr1-cdh4.2.2.tar.gz
Consider any permissions issues with your uploaded location (i.e., on HDFS you'll probably want to make the file world readable).
Now you'll need to configure your JobTracker
to launch each
TaskTracker
on Mesos!
Along with the normal configuration properties you might want to set
to launch a JobTracker
, you'll need to set some Mesos specific ones
too.
Here are the mandatory configuration properties for
conf/mapred-site.xml
(initialized to values representative of
running in pseudo distributed
operation:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.MesosScheduler</value>
</property>
<property>
<name>mapred.mesos.taskScheduler</name>
<value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
</property>
<property>
<name>mapred.mesos.master</name>
<value>localhost:5050</value>
</property>
<property>
<name>mapred.mesos.executor.uri</name>
<value>hdfs://localhost:9000/hadoop-2.0.0-mr1-cdh4.2.2.tar.gz</value>
</property>
More details on configuration propertios can be found here.
Now you can start the JobTracker
but you'll need to include the path
to the Mesos native library.
On Linux:
$ MESOS_NATIVE_LIBRARY=/path/to/libmesos.so hadoop jobtracker
And on OS X:
$ MESOS_NATIVE_LIBRARY=/path/to/libmesos.dylib hadoop jobtracker
NOTE: You do not need to worry about distributing your Hadoop configuration! All of the configuration properties read by the
JobTracker
along with any necessaryTaskTracker
specific overrides will get serialized and passed to eachTaskTracker
on startup.
Please email [email protected] with questions!