
Upgrade to Debian 10, Java 11, Hadoop 3.3.1, build/run on ARM #108

Open · wants to merge 7 commits into base: master
15 changes: 8 additions & 7 deletions Makefile
@@ -1,19 +1,20 @@
DOCKER_NETWORK = docker-hadoop_default
ENV_FILE = hadoop.env
current_branch := $(shell git rev-parse --abbrev-ref HEAD)
base_version := --build-arg HADOOP_BASE_VERSION=$(current_branch)
build:
docker build -t bde2020/hadoop-base:$(current_branch) ./base
docker build -t bde2020/hadoop-namenode:$(current_branch) ./namenode
docker build -t bde2020/hadoop-datanode:$(current_branch) ./datanode
docker build -t bde2020/hadoop-resourcemanager:$(current_branch) ./resourcemanager
docker build -t bde2020/hadoop-nodemanager:$(current_branch) ./nodemanager
docker build -t bde2020/hadoop-historyserver:$(current_branch) ./historyserver
docker build -t bde2020/hadoop-submit:$(current_branch) ./submit
docker build -t bde2020/hadoop-namenode:$(current_branch) $(base_version) ./namenode
docker build -t bde2020/hadoop-datanode:$(current_branch) $(base_version) ./datanode
docker build -t bde2020/hadoop-resourcemanager:$(current_branch) $(base_version) ./resourcemanager
docker build -t bde2020/hadoop-nodemanager:$(current_branch) $(base_version) ./nodemanager
docker build -t bde2020/hadoop-historyserver:$(current_branch) $(base_version) ./historyserver
docker build -t bde2020/hadoop-submit:$(current_branch) $(base_version) ./submit

wordcount:
docker build -t hadoop-wordcount ./submit
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -mkdir -p /input/
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -copyFromLocal -f /opt/hadoop-3.2.1/README.txt /input/
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -copyFromLocal -f /opt/hadoop/README.txt /input/
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} hadoop-wordcount
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -cat /output/*
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -rm -r /output
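The build targets above pass the branch name into each child image as a build argument. For reference, the same thing can be done by hand for a single image; a minimal sketch, assuming the base image for this tag has already been built:

```
# build the base image for the branch/tag first
docker build -t bde2020/hadoop-base:2.0.0-hadoop3.3.1-java11 ./base

# then build a child image against that base tag
docker build -t bde2020/hadoop-namenode:2.0.0-hadoop3.3.1-java11 \
  --build-arg HADOOP_BASE_VERSION=2.0.0-hadoop3.3.1-java11 \
  ./namenode
```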
11 changes: 6 additions & 5 deletions README.md
@@ -7,7 +7,7 @@ Version 2.0.0 introduces the wait_for_it script for the cluster startup
# Hadoop Docker

## Supported Hadoop Versions
See repository branches for supported hadoop versions
See repository branches for supported Hadoop versions

## Quick Start

@@ -26,16 +26,17 @@ Or deploy in swarm:
docker stack deploy -c docker-compose-v3.yml hadoop
```

`docker-compose` creates a docker network that can be found by running `docker network list`, e.g. `dockerhadoop_default`.
`docker-compose` creates a docker network that can be found by running `docker network list`, e.g. `docker-hadoop_default`.

Run `docker network inspect` on the network (e.g. `dockerhadoop_default`) to find the IP the hadoop interfaces are published on. Access these interfaces with the following URLs:
Run `docker network inspect` on the network (e.g. `docker-hadoop_default`) to find the IP the Hadoop interfaces are published on. Access these interfaces with the following URLs:

* Namenode: http://<dockerhadoop_IP_address>:9870/dfshealth.html#tab-overview
* History server: http://<dockerhadoop_IP_address>:8188/applicationhistory
* Datanode: http://<dockerhadoop_IP_address>:9864/
* Nodemanager: http://<dockerhadoop_IP_address>:8042/node
* Resource manager: http://<dockerhadoop_IP_address>:8088/

All other Hadoop communication ports are not exposed; they are only accessible from inside the Docker network, using the service name and port, e.g. `http://namenode:9000/`.
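A rough sketch of reaching one of these internal ports, reusing the pattern from the Makefile's wordcount target (image tag and network name assumed from the compose defaults above):

```
docker run --rm --network docker-hadoop_default --env-file hadoop.env \
  bde2020/hadoop-base:2.0.0-hadoop3.3.1-java11 \
  hdfs dfs -ls hdfs://namenode:9000/
```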


## Configure Environment Variables

The configuration parameters can be specified in the hadoop.env file or as environment variables for specific services (e.g. namenode, datanode, etc.):
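For example, the default filesystem URI can be set globally in hadoop.env; an illustrative snippet (the `CORE_CONF_*` prefix follows the naming convention already used in that file):

```
# hadoop.env, loaded by every service via env_file
CORE_CONF_fs_defaultFS=hdfs://namenode:9000
```

The same variable can instead be set under a service's `environment:` key in docker-compose.yml to override it for that service only.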
26 changes: 19 additions & 7 deletions base/Dockerfile
@@ -1,44 +1,56 @@
FROM debian:9
FROM debian:10

MAINTAINER Ivan Ermilov <[email protected]>
MAINTAINER Giannis Mouchakis <[email protected]>

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
openjdk-8-jdk \
openjdk-11-jdk \
net-tools \
curl \
netcat \
gnupg \
libsnappy-dev \
libssl-dev \
&& rm -rf /var/lib/apt/lists/*

ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/

RUN curl -O https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

RUN gpg --import KEYS

ENV HADOOP_VERSION 3.2.1
ENV HADOOP_URL https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
ENV HADOOP_VERSION=3.3.1
# base URL for downloads: the name of the tar file depends
# on the target platform (amd64/x86_64 vs. arm64/aarch64)
ENV HADOOP_BASE_URL=https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION

RUN set -x \
&& ARCH=$(uname -m) \
&& ARCH=$(if test "$ARCH" = "x86_64"; then echo ""; else echo "-$ARCH"; fi) \
&& HADOOP_URL="$HADOOP_BASE_URL/hadoop-$HADOOP_VERSION$ARCH.tar.gz" \
&& curl -fSL "$HADOOP_URL" -o /tmp/hadoop.tar.gz \
&& curl -fSL "$HADOOP_URL.asc" -o /tmp/hadoop.tar.gz.asc \
&& gpg --verify /tmp/hadoop.tar.gz.asc \
&& tar -xvf /tmp/hadoop.tar.gz -C /opt/ \
&& rm /tmp/hadoop.tar.gz*
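# For example, with HADOOP_VERSION=3.3.1 the URL built above resolves to
#   $HADOOP_BASE_URL/hadoop-3.3.1.tar.gz          on x86_64 (amd64)
#   $HADOOP_BASE_URL/hadoop-3.3.1-aarch64.tar.gz  on aarch64 (arm64)
# (illustrative; the actual file name is computed at build time from `uname -m`)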

RUN ln -s /opt/hadoop-$HADOOP_VERSION/etc/hadoop /etc/hadoop
RUN ln -s /opt/hadoop-$HADOOP_VERSION /opt/hadoop

RUN mkdir /opt/hadoop-$HADOOP_VERSION/logs

RUN mkdir /hadoop-data

ENV JAVA_HOME=/usr/lib/jvm/default-java
# create the symlink "/usr/lib/jvm/default-java" in case
# it is not already there (cf. package "default-jre-headless")
RUN if ! test -d $JAVA_HOME; then \
ln -sf $(readlink -f $(dirname $(readlink -f $(which java)))/..) $JAVA_HOME; \
fi
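# (on Debian 10 amd64 this typically resolves to /usr/lib/jvm/java-11-openjdk-amd64)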

ENV HADOOP_HOME=/opt/hadoop-$HADOOP_VERSION
ENV HADOOP_CONF_DIR=/etc/hadoop
ENV MULTIHOMED_NETWORK=1
ENV USER=root
ENV PATH $HADOOP_HOME/bin/:$PATH
ENV PATH=$HADOOP_HOME/bin/:$PATH

ADD entrypoint.sh /entrypoint.sh

3 changes: 2 additions & 1 deletion datanode/Dockerfile
@@ -1,4 +1,5 @@
FROM bde2020/hadoop-base:2.0.0-hadoop3.2.1-java8
ARG HADOOP_BASE_VERSION=2.0.0-hadoop3.3.1-java11
FROM bde2020/hadoop-base:$HADOOP_BASE_VERSION

MAINTAINER Ivan Ermilov <[email protected]>

10 changes: 5 additions & 5 deletions docker-compose-v3.yml
@@ -2,7 +2,7 @@ version: '3'

services:
namenode:
image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
image: bde2020/hadoop-namenode:2.0.0-hadoop3.3.1-java11
networks:
- hbase
volumes:
@@ -24,7 +24,7 @@ services:
traefik.port: 50070

datanode:
image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
image: bde2020/hadoop-datanode:2.0.0-hadoop3.3.1-java11
networks:
- hbase
volumes:
@@ -42,7 +42,7 @@ services:
traefik.port: 50075

resourcemanager:
image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.2.1-java8
image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.3.1-java11
networks:
- hbase
environment:
@@ -64,7 +64,7 @@ services:
disable: true

nodemanager:
image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.2.1-java8
image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.3.1-java11
networks:
- hbase
environment:
@@ -80,7 +80,7 @@ services:
traefik.port: 8042

historyserver:
image: bde2020/hadoop-historyserver:2.0.0-hadoop3.2.1-java8
image: bde2020/hadoop-historyserver:2.0.0-hadoop3.3.1-java11
networks:
- hbase
volumes:
15 changes: 9 additions & 6 deletions docker-compose.yml
@@ -2,12 +2,11 @@ version: "3"

services:
namenode:
image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
image: bde2020/hadoop-namenode:2.0.0-hadoop3.3.1-java11
container_name: namenode
restart: always
ports:
- 9870:9870
- 9000:9000
volumes:
- hadoop_namenode:/hadoop/dfs/name
environment:
@@ -16,7 +15,7 @@ services:
- ./hadoop.env

datanode:
image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
image: bde2020/hadoop-datanode:2.0.0-hadoop3.3.1-java11
container_name: datanode
restart: always
volumes:
@@ -27,16 +26,18 @@ services:
- ./hadoop.env

resourcemanager:
image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.2.1-java8
image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.3.1-java11
container_name: resourcemanager
restart: always
ports:
- 8088:8088
environment:
SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864"
env_file:
- ./hadoop.env

nodemanager1:
image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.2.1-java8
image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.3.1-java11
container_name: nodemanager
restart: always
environment:
@@ -45,9 +46,11 @@ services:
- ./hadoop.env

historyserver:
image: bde2020/hadoop-historyserver:2.0.0-hadoop3.2.1-java8
image: bde2020/hadoop-historyserver:2.0.0-hadoop3.3.1-java11
container_name: historyserver
restart: always
ports:
- 8188:8188
environment:
SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
volumes:
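With the new port mappings in place, a quick way to check the cluster after it comes up; a usage sketch (host ports taken from the `ports:` entries added above, UI paths from the README):

```
docker-compose up -d

# ResourceManager web UI (8088) and history server UI (8188) on the host
curl -sf http://localhost:8088/cluster > /dev/null && echo "resourcemanager is up"
curl -sf http://localhost:8188/applicationhistory > /dev/null && echo "historyserver is up"
```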
6 changes: 3 additions & 3 deletions hadoop.env
@@ -38,6 +38,6 @@ MAPRED_CONF_mapreduce_map_memory_mb=4096
MAPRED_CONF_mapreduce_reduce_memory_mb=8192
MAPRED_CONF_mapreduce_map_java_opts=-Xmx3072m
MAPRED_CONF_mapreduce_reduce_java_opts=-Xmx6144m
MAPRED_CONF_yarn_app_mapreduce_am_env=HADOOP_MAPRED_HOME=/opt/hadoop-3.2.1/
MAPRED_CONF_mapreduce_map_env=HADOOP_MAPRED_HOME=/opt/hadoop-3.2.1/
MAPRED_CONF_mapreduce_reduce_env=HADOOP_MAPRED_HOME=/opt/hadoop-3.2.1/
MAPRED_CONF_yarn_app_mapreduce_am_env=HADOOP_MAPRED_HOME=/opt/hadoop/
MAPRED_CONF_mapreduce_map_env=HADOOP_MAPRED_HOME=/opt/hadoop/
MAPRED_CONF_mapreduce_reduce_env=HADOOP_MAPRED_HOME=/opt/hadoop/
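These variables now point at the version-independent `/opt/hadoop` symlink created in base/Dockerfile, so this file no longer needs to change on every Hadoop upgrade. A quick illustrative check inside a running container (container name taken from docker-compose.yml):

```
docker exec namenode readlink -f /opt/hadoop
# expected output: /opt/hadoop-3.3.1
```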
3 changes: 2 additions & 1 deletion historyserver/Dockerfile
@@ -1,4 +1,5 @@
FROM bde2020/hadoop-base:2.0.0-hadoop3.2.1-java8
ARG HADOOP_BASE_VERSION=2.0.0-hadoop3.3.1-java11
FROM bde2020/hadoop-base:$HADOOP_BASE_VERSION

MAINTAINER Ivan Ermilov <[email protected]>

3 changes: 2 additions & 1 deletion namenode/Dockerfile
@@ -1,4 +1,5 @@
FROM bde2020/hadoop-base:2.0.0-hadoop3.2.1-java8
ARG HADOOP_BASE_VERSION=2.0.0-hadoop3.3.1-java11
FROM bde2020/hadoop-base:$HADOOP_BASE_VERSION

MAINTAINER Ivan Ermilov <[email protected]>

3 changes: 2 additions & 1 deletion nodemanager/Dockerfile
@@ -1,4 +1,5 @@
FROM bde2020/hadoop-base:2.0.0-hadoop3.2.1-java8
ARG HADOOP_BASE_VERSION=2.0.0-hadoop3.3.1-java11
FROM bde2020/hadoop-base:$HADOOP_BASE_VERSION

MAINTAINER Ivan Ermilov <[email protected]>

3 changes: 2 additions & 1 deletion resourcemanager/Dockerfile
@@ -1,4 +1,5 @@
FROM bde2020/hadoop-base:2.0.0-hadoop3.2.1-java8
ARG HADOOP_BASE_VERSION=2.0.0-hadoop3.3.1-java11
FROM bde2020/hadoop-base:$HADOOP_BASE_VERSION

MAINTAINER Ivan Ermilov <[email protected]>

3 changes: 2 additions & 1 deletion submit/Dockerfile
@@ -1,4 +1,5 @@
FROM bde2020/hadoop-base:2.0.0-hadoop3.2.1-java8
ARG HADOOP_BASE_VERSION=2.0.0-hadoop3.3.1-java11
FROM bde2020/hadoop-base:$HADOOP_BASE_VERSION

MAINTAINER Ivan Ermilov <[email protected]>
