Hadoop3.2.1 python3 #153

Open · wants to merge 10 commits into base: master
25 changes: 12 additions & 13 deletions Makefile
@@ -1,20 +1,19 @@
DOCKER_NETWORK = docker-hadoop_default
ENV_FILE = hadoop.env
current_branch := $(shell git rev-parse --abbrev-ref HEAD)
current_branch := latest
build:
docker build -t bde2020/hadoop-base:$(current_branch) ./base
docker build -t bde2020/hadoop-namenode:$(current_branch) ./namenode
docker build -t bde2020/hadoop-datanode:$(current_branch) ./datanode
docker build -t bde2020/hadoop-resourcemanager:$(current_branch) ./resourcemanager
docker build -t bde2020/hadoop-nodemanager:$(current_branch) ./nodemanager
docker build -t bde2020/hadoop-historyserver:$(current_branch) ./historyserver
docker build -t bde2020/hadoop-submit:$(current_branch) ./submit
docker build -t pramodraob/hadoop-base:$(current_branch) ./base
docker build -t pramodraob/hadoop-namenode:$(current_branch) ./namenode
docker build -t pramodraob/hadoop-datanode:$(current_branch) ./datanode
docker build -t pramodraob/hadoop-resourcemanager:$(current_branch) ./resourcemanager
docker build -t pramodraob/hadoop-nodemanager:$(current_branch) ./nodemanager
docker build -t pramodraob/hadoop-historyserver:$(current_branch) ./historyserver
docker build -t pramodraob/hadoop-submit:$(current_branch) ./submit

wordcount:
docker build -t hadoop-wordcount ./submit
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -mkdir -p /input/
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -copyFromLocal -f /opt/hadoop-3.2.1/README.txt /input/
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} pramodraob/hadoop-base:$(current_branch) hdfs dfs -mkdir -p /input/
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} hadoop-wordcount
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -cat /output/*
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -rm -r /output
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -rm -r /input
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} pramodraob/hadoop-base:$(current_branch) hdfs dfs -cat /output/*
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} pramodraob/hadoop-base:$(current_branch) hdfs dfs -rm -r /output
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} pramodraob/hadoop-base:$(current_branch) hdfs dfs -rm -r /input
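
For reference, invoking these targets from the repository root might look like the following sketch; it assumes the images build successfully under the `pramodraob` namespace and, for `wordcount`, that the compose cluster is already running (`current_branch` is hardcoded to `latest` in this change, so all images are tagged `:latest`):

```
# Build every image locally, tagged :latest
make build

# Run the wordcount example end to end against the running cluster
make wordcount
```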
17 changes: 14 additions & 3 deletions README.md
@@ -6,11 +6,10 @@ Version 2.0.0 introduces use of the wait_for_it script for cluster startup

# Hadoop Docker

## Supported Hadoop Versions
See repository branches for supported hadoop versions

## Quick Start

The docker compose setup binds a host directory into the namenode (the `type: bind` volume in `docker-compose.yml`). Replace the `source` path with a directory on your local file system, or remove the bind mount if you do not want to bind any directories.

To deploy an example HDFS cluster, run:
```
docker-compose up
@@ -36,6 +35,18 @@ Run `docker network inspect` on the network (e.g. `dockerhadoop_default`) to find the IP the hadoop interfaces are published on:
* Nodemanager: http://<dockerhadoop_IP_address>:8042/node
* Resource manager: http://<dockerhadoop_IP_address>:8088/

Once deployed, run `docker ps` and check that the containers are healthy. If any appear unhealthy, read their logs with `docker logs <container-id>`.

To exec into the namenode, run `docker exec -it namenode /bin/bash`.
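
A rough sketch of that health-check flow (container names as defined in `docker-compose.yml`; the `hdfs` client is assumed to be on the PATH inside the container, as set up in `base/Dockerfile`):

```
docker ps                          # every container should report a healthy status
docker logs namenode               # inspect any container that looks unhealthy
docker exec -it namenode /bin/bash
hdfs dfs -ls /                     # inside the namenode, check that HDFS responds
```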

## Common issues

- "Library initialization failed - unable to allocate file descriptor table"
You can follow the instructions present [here](https://stackoverflow.com/questions/68776387/docker-library-initialization-failed-unable-to-allocate-file-descriptor-tabl)
- Resource manager keeps restarting because the namenode is in safe mode
  - Exec into the namenode and run `hdfs dfsadmin -safemode leave`
  - Then run `hdfs fsck -delete` (see the sketch below)
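
Put together, clearing safe mode from inside the namenode container looks roughly like this (a sketch of the commands above; the root path `/` passed to `hdfs fsck` is an assumption about the intended target):

```
docker exec -it namenode /bin/bash
hdfs dfsadmin -safemode leave      # take the namenode out of safe mode
hdfs fsck / -delete                # delete corrupt files that keep HDFS unhealthy
```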

## Configure Environment Variables

The configuration parameters can be specified in the hadoop.env file or as environment variables for specific services (e.g. namenode, datanode, etc.):
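
As an illustration, entries in `hadoop.env` follow the `CORE_CONF_`/`HDFS_CONF_`/`YARN_CONF_` naming convention, where the prefix selects the target config file and the remaining underscores map to dots in the property name; the value below is an assumed example, not taken from this diff:

```
# Becomes fs.defaultFS in core-site.xml
CORE_CONF_fs_defaultFS=hdfs://namenode:9000
```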
34 changes: 23 additions & 11 deletions base/Dockerfile
@@ -1,25 +1,37 @@
FROM debian:9
FROM debian:latest

MAINTAINER Ivan Ermilov <[email protected]>
MAINTAINER Giannis Mouchakis <[email protected]>

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
openjdk-8-jdk \
net-tools \
curl \
netcat \
netcat-traditional \
gnupg \
ca-certificates \
libsnappy-dev \
&& rm -rf /var/lib/apt/lists/*

ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
python3 \
vim \
neovim \
less \
wget

RUN curl -O https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
RUN update-ca-certificates && apt-get update

RUN gpg --import KEYS
RUN mkdir -p /etc/apt/keyrings && wget -O - https://packages.adoptium.net/artifactory/api/gpg/key/public | tee /etc/apt/keyrings/adoptium.asc
RUN echo "deb [signed-by=/etc/apt/keyrings/adoptium.asc] https://packages.adoptium.net/artifactory/deb $(awk -F= '/^VERSION_CODENAME/{print$2}' /etc/os-release) main" | tee /etc/apt/sources.list.d/adoptium.list

ENV HADOOP_VERSION 3.2.1
ENV HADOOP_URL https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
RUN apt update && DEBIAN_FRONTEND=noninteractive apt install -y --no-install-recommends \
temurin-8-jdk \
&& rm -rf /var/lib/apt/lists/*

ENV JAVA_HOME=/usr/lib/jvm/temurin-8-jdk-amd64/

RUN curl -fsSL https://downloads.apache.org/hadoop/common/KEYS | gpg --import -

ARG HADOOP_VERSION=3.2.1
ENV HADOOP_VERSION $HADOOP_VERSION
ENV HADOOP_URL https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz

RUN set -x \
&& curl -fSL "$HADOOP_URL" -o /tmp/hadoop.tar.gz \
@@ -40,7 +52,7 @@ ENV MULTIHOMED_NETWORK=1
ENV USER=root
ENV PATH $HADOOP_HOME/bin/:$PATH

ADD entrypoint.sh /entrypoint.sh
COPY entrypoint.sh /entrypoint.sh

RUN chmod a+x /entrypoint.sh

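
Given the `ARG HADOOP_VERSION` introduced above, a hypothetical build of the base image could override the version at build time; note that in this change the download URL itself is still pinned to 3.2.1, so only the metadata changes unless the URL is parameterised as well:

```
# Hypothetical invocation; the tag and build argument are illustrative
docker build --build-arg HADOOP_VERSION=3.2.1 -t pramodraob/hadoop-base:latest ./base
```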
2 changes: 1 addition & 1 deletion datanode/Dockerfile
@@ -1,4 +1,4 @@
FROM bde2020/hadoop-base:2.0.0-hadoop3.2.1-java8
FROM pramodraob/hadoop-base:latest

MAINTAINER Ivan Ermilov <[email protected]>

110 changes: 0 additions & 110 deletions docker-compose-v3.yml

This file was deleted.

41 changes: 28 additions & 13 deletions docker-compose.yml
Original file line number Diff line number Diff line change
@@ -2,60 +2,75 @@ version: "3"

services:
namenode:
image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
build: ./namenode/
container_name: namenode
restart: always
ports:
- 9870:9870
- 9000:9000
volumes:
- hadoop_namenode:/hadoop/dfs/name
- type: bind
source: ~/course_work/dist/assn2/
target: /hadoop/dfs/name/assn2/
environment:
- CLUSTER_NAME=test
env_file:
- ./hadoop.env

datanode:
image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
container_name: datanode
datanode0:
build: ./datanode/
container_name: datanode0
restart: always
volumes:
- hadoop_datanode:/hadoop/dfs/data
- hadoop_datanode0:/hadoop/dfs/data
environment:
SERVICE_PRECONDITION: "namenode:9870"
env_file:
- ./hadoop.env

datanode1:
build: ./datanode/
container_name: datanode1
restart: always
volumes:
- hadoop_datanode1:/hadoop/dfs/data
environment:
SERVICE_PRECONDITION: "namenode:9870"
env_file:
- ./hadoop.env

resourcemanager:
image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.2.1-java8
build: ./resourcemanager/
container_name: resourcemanager
restart: always
environment:
SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864"
SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode0:9864 datanode1:9864"
env_file:
- ./hadoop.env

nodemanager1:
image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.2.1-java8
nodemanager:
build: ./nodemanager/
container_name: nodemanager
restart: always
environment:
SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode0:9864 datanode1:9864 resourcemanager:8088"
env_file:
- ./hadoop.env

historyserver:
image: bde2020/hadoop-historyserver:2.0.0-hadoop3.2.1-java8
build: ./historyserver/
container_name: historyserver
restart: always
environment:
SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode0:9864 datanode1:9864 resourcemanager:8088"
volumes:
- hadoop_historyserver:/hadoop/yarn/timeline
env_file:
- ./hadoop.env

volumes:
hadoop_namenode:
hadoop_datanode:
hadoop_datanode0:
hadoop_datanode1:
hadoop_historyserver:
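
Because the services now use `build:` directives instead of prebuilt images, a typical bring-up (a sketch, not part of this diff) builds the local Dockerfiles first:

```
docker-compose up -d --build       # build all local images and start the cluster
docker-compose ps                  # expect namenode, datanode0, datanode1, resourcemanager, nodemanager, historyserver
```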
2 changes: 1 addition & 1 deletion historyserver/Dockerfile
@@ -1,4 +1,4 @@
FROM bde2020/hadoop-base:2.0.0-hadoop3.2.1-java8
FROM pramodraob/hadoop-base:latest

MAINTAINER Ivan Ermilov <[email protected]>

2 changes: 1 addition & 1 deletion namenode/Dockerfile
@@ -1,4 +1,4 @@
FROM bde2020/hadoop-base:2.0.0-hadoop3.2.1-java8
FROM pramodraob/hadoop-base:latest

MAINTAINER Ivan Ermilov <[email protected]>

7 changes: 0 additions & 7 deletions nginx/Dockerfile

This file was deleted.
