Hadoop Via Docker
1. Prerequisites
- Install Docker: Ensure you have Docker Desktop (or Docker Engine) installed and running on your system (Windows, Mac, or Linux).
- Install Docker Compose: This usually comes bundled with Docker Desktop.
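Before continuing, you can confirm both tools are available; the exact version strings will vary by system:

```shell
# Verify Docker and Compose are installed (version output varies by install)
docker --version
docker-compose --version
```

On newer Docker installations, Compose ships as a plugin invoked as `docker compose` (no hyphen); both forms accept the same subcommands used below.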
2. Use a Pre-built Docker Compose Stack (Recommended)
Instead of manually building a Dockerfile for each Hadoop service (NameNode, DataNode, etc.), you can use a ready-made setup from the community, such as the widely used one from Big Data Europe (BDE) or the official Apache one.
The BDE repository is a popular choice for a quick cluster setup.
Step 2.1: Clone the Repository
Open your terminal and clone the repository that contains the docker-compose.yml file:

```bash
git clone https://github.com/big-data-europe/docker-hadoop.git
cd docker-hadoop
```

Step 2.2: Start the Cluster
Use the docker-compose up command. The -d flag runs the containers in the background (detached mode).

```bash
docker-compose up -d
```

This command will:
- Pull the required images (e.g., bde2020/hadoop-namenode, bde2020/hadoop-datanode).
- Create a Docker network for the containers to communicate.
- Start all the Hadoop services (usually NameNode, DataNode, ResourceManager, and NodeManager).
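Startup can take a minute or two. If a service does not come up, you can follow its logs to watch initialization (the service name namenode below matches the BDE compose file):

```shell
# Follow the NameNode's startup logs; Ctrl+C stops following without stopping the container
docker-compose logs -f namenode
```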
Step 2.3: Verify the Cluster Status
Check that all services are running:

```bash
docker ps
```

You should see multiple containers listed, typically including namenode, datanode, resourcemanager, and nodemanager.

Step 2.4: Access the Web UIs (GUIs)
You can check the health of your cluster by accessing the web interfaces:
| Service | Default Local URL |
|---|---|
| NameNode (HDFS UI) | http://localhost:9870 |
| ResourceManager (YARN UI) | http://localhost:8088 |
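If you prefer the command line, a quick sketch of checking the same endpoints with curl (assuming curl is installed; a healthy UI returns HTTP 200):

```shell
# Print the HTTP status code returned by each web UI
curl -s -o /dev/null -w "NameNode UI: %{http_code}\n" http://localhost:9870
curl -s -o /dev/null -w "ResourceManager UI: %{http_code}\n" http://localhost:8088
```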
Step 2.5: Run Commands (Enter the NameNode)
To interact with HDFS, you typically execute commands inside the namenode container:

```bash
# Enter the NameNode container's bash shell
docker exec -it namenode bash
```

From inside the container, you can run standard Hadoop commands:
```bash
# Create a directory in HDFS (-p creates parent directories as needed)
hdfs dfs -mkdir -p /user/test

# List the contents of the root directory
hdfs dfs -ls /

# Exit the container
exit
```

3. Stopping and Cleaning Up
When you are done with the cluster, stop and remove the containers and network using the same docker-compose.yml file:
```bash
# Stop and remove all containers and networks defined in the file
docker-compose down
```
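The BDE compose file defines named volumes for the HDFS data, so docker-compose down alone leaves that data in place for the next run. If you want a completely clean slate, you can add the -v flag; note that this permanently deletes anything stored in HDFS:

```shell
# Also remove the named volumes (irreversibly deletes HDFS data)
docker-compose down -v
```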