Hadoop Installation
Installing WSL, Ubuntu, JDK, Hadoop, and Running Hadoop on Windows 11
(With separate “hadoop” user, simple SSH setup, and nano-based environment setup)
1. Understanding the Basics Before Starting
Before installing anything, you must understand three things: WSL, Linux, and Hadoop.
1.1 What is WSL?
WSL (Windows Subsystem for Linux) allows you to run Linux inside Windows without a virtual machine.
You get a real Linux terminal where you can run commands.
1.2 What is Ubuntu?
Ubuntu is a popular Linux operating system.
WSL installs Ubuntu automatically.
1.3 Why does Hadoop need Linux?
Hadoop is built to run on Linux-based clusters.
Linux provides SSH, permissions, background processes, and directory structures that Hadoop depends on.
2. Install WSL on Windows 11
Step 1: Open PowerShell as Administrator
Search “PowerShell”, right-click → Run as Administrator.
Step 2: Install WSL
```powershell
wsl --install
```

This installs WSL + Ubuntu.
Restart if required.
Step 3: Open Ubuntu
Search “Ubuntu” in Start → open.
It will ask for a Linux username and password.
This is your Linux environment.
3. Learn Basic Linux Commands (Beginner-Friendly)
```bash
pwd          # show the current directory
ls           # list files
cd dir       # go into a folder
cd ..        # go back one folder
mkdir dir    # create a folder
rm file      # delete a file
sudo cmd     # run a command as administrator
```

These commands are enough to follow this guide.
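If these commands are new to you, here is a short practice run that ties them together in a scratch directory (the names `demo` and `notes.txt` are just examples, and `touch`, which creates an empty file, is one extra command not listed above):

```bash
cd /tmp            # a safe place to practice
mkdir demo         # create a folder
cd demo            # go into it
pwd                # prints the current directory: /tmp/demo
touch notes.txt    # create an empty file
ls                 # lists notes.txt
cd ..              # go back up one level
rm -r demo         # remove the practice folder again
```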
4. Update Ubuntu
```bash
sudo apt update
sudo apt upgrade -y
```

5. Install Required Tools
```bash
sudo apt install -y wget curl vim unzip rsync
```

Explanation:
- wget → download files
- curl → send/receive data
- vim → text editor
- unzip → extract files
- ssh → required by Hadoop (installed separately in step 8)
- rsync → used internally by Hadoop
6. Install JDK (Java Development Kit)
Hadoop requires Java. Install the default OpenJDK package:
```bash
sudo apt install -y default-jdk
```

Check Java:
```bash
java -version
```

Find JAVA_HOME:
```bash
readlink -f $(which java) | sed "s:bin/java::"
```

You will get something like:

```text
/usr/lib/jvm/java-21-openjdk-amd64/
```

The exact version depends on your Ubuntu release. Copy this path (you will paste it into .bashrc later).
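To avoid copy mistakes, you can also capture the path in a variable. This is a sketch, not part of the required steps: it assumes `java` is already on your PATH from the install above, and it uses shell suffix-stripping instead of `sed`:

```bash
# Derive JAVA_HOME from the java binary on the PATH instead of copying by hand.
if command -v java >/dev/null; then
    java_real=$(readlink -f "$(command -v java)")   # resolve the symlink chain
    java_home=${java_real%/bin/java}                # strip the /bin/java suffix
    echo "Use this as JAVA_HOME: $java_home"
else
    echo "java not found on PATH; install default-jdk first"
fi
```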
7. Create a Dedicated Hadoop User
This is cleaner and avoids permission issues.
```bash
sudo adduser hadoop
```

Give it a password (anything you like).
Add the hadoop user to the sudo group:

```bash
sudo usermod -aG sudo hadoop
```

Switch to the hadoop user:
```bash
su - hadoop
```

8. Enable SSH (Simple Setup)
Install the SSH client and server:

```bash
sudo apt install -y openssh-client openssh-server
```

Start SSH:
```bash
sudo service ssh start
```

Create an SSH key (no passphrase):
```bash
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
```

Add the key to the authorized keys:
```bash
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```

Fix permissions:
```bash
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```

Test SSH:
```bash
ssh localhost
```

If it logs in without asking for a password, SSH is set up. (Type `exit` to leave the nested session.)
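The chmod numbers used above are octal permission bits: 700 means only the owner can read, write, and enter the directory; 600 means only the owner can read and write the file. A quick sketch in a scratch directory (`/tmp/perm-demo` is just an example name), where `stat` should report 700 and 600:

```bash
mkdir -p /tmp/perm-demo/.ssh
touch /tmp/perm-demo/.ssh/authorized_keys
chmod 700 /tmp/perm-demo/.ssh                    # owner: rwx, group/other: none
chmod 600 /tmp/perm-demo/.ssh/authorized_keys    # owner: rw,  group/other: none
stat -c "%a %n" /tmp/perm-demo/.ssh /tmp/perm-demo/.ssh/authorized_keys
rm -rf /tmp/perm-demo                            # clean up
```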
9. Download and Install Hadoop
Log in as the hadoop user (if you are not already):

```bash
su - hadoop
```

Download Hadoop 3.4.2 (the latest release at the time of writing):
```bash
cd ~
wget https://downloads.apache.org/hadoop/common/hadoop-3.4.2/hadoop-3.4.2.tar.gz
tar -xzf hadoop-3.4.2.tar.gz
mv hadoop-3.4.2 hadoop
```

10. Add Environment Variables (Using nano)
Open .bashrc:
```bash
nano ~/.bashrc
```

Add these lines at the end of the file:
```bash
export JAVA_HOME=/usr/lib/jvm/java-21-openjdk-amd64   # the path you copied in step 6
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

Save in nano:
- Press Ctrl + O
- Press Enter
- Press Ctrl + X
Reload .bashrc:
```bash
source ~/.bashrc
```

Check:
```bash
hadoop version
```

11. Understand Hadoop Folder Structure
Inside the Hadoop directory:
- bin/ → Hadoop commands
- sbin/ → start/stop scripts
- etc/hadoop/ → configuration files
- logs/ → logs generated later
All configuration happens inside etc/hadoop.

12. Configure Hadoop (Single Node Cluster)
Go into the config directory:

```bash
cd $HADOOP_HOME/etc/hadoop
```

12.1 Edit hadoop-env.sh
```bash
nano hadoop-env.sh
```

Find the JAVA_HOME line (it may be commented out):
```bash
export JAVA_HOME=
```

Replace it with:

```bash
export JAVA_HOME=/usr/lib/jvm/java-21-openjdk-amd64
```

(the same path you used in .bashrc). Save and exit.
12.2 core-site.xml
```bash
nano core-site.xml
```

Paste this:
```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop_tmp</value>
  </property>
</configuration>
```

12.3 hdfs-site.xml
```bash
nano hdfs-site.xml
```

Paste:
```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop_tmp/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop_tmp/hdfs/datanode</value>
  </property>
</configuration>
```

12.4 mapred-site.xml
```bash
nano mapred-site.xml
```

Paste:
```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

12.5 yarn-site.xml
```bash
nano yarn-site.xml
```

Paste:
```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```

13. Create Hadoop Data Directories
```bash
mkdir -p ~/hadoop_tmp/hdfs/namenode
mkdir -p ~/hadoop_tmp/hdfs/datanode
```

14. Format NameNode
```bash
hdfs namenode -format
```

If you see "successfully formatted" in the output, it worked.
15. Start Hadoop
Start HDFS:
```bash
start-dfs.sh
```

Start YARN:
```bash
start-yarn.sh
```

Check running processes:
```bash
jps
```

You should see:
- NameNode
- DataNode
- SecondaryNameNode
- ResourceManager
- NodeManager
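Optionally, a small helper can compare the `jps` output against that list for you. This is a sketch I am adding (`check_daemons` is my own name, not a Hadoop command), and it assumes `jps` is on your PATH:

```bash
# Print OK/MISSING for each daemon a single-node setup should be running.
check_daemons() {
    running=$(jps)    # one "pid Name" line per Java process
    status=0
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        if echo "$running" | grep -qw "$d"; then
            echo "OK      $d"
        else
            echo "MISSING $d"
            status=1
        fi
    done
    return $status
}
```

After pasting this into your shell, run `check_daemons`; a non-zero exit status means at least one daemon is down.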
16. Access Hadoop on Windows Browser
Open Chrome/Edge on Windows:
NameNode:
```text
http://localhost:9870/
```

YARN ResourceManager:
```text
http://localhost:8088/
```

17. Test HDFS
```bash
hdfs dfs -mkdir -p /user/hadoop
echo "hello world" > test.txt
hdfs dfs -put test.txt /user/hadoop/
hdfs dfs -ls /user/hadoop/
hdfs dfs -cat /user/hadoop/test.txt
```

18. Stop Hadoop
```bash
stop-yarn.sh
stop-dfs.sh
```
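If you want a fuller end-to-end check, bring the daemons back up (start-dfs.sh, then start-yarn.sh) and run the wordcount example that ships with Hadoop. This is a sketch: the wrapper function `run_wordcount` is my own name, and the jar filename assumes the 3.4.2 release installed above (adjust the version argument if yours differs). On some setups the job may also need HADOOP_MAPRED_HOME configured for the MapReduce application master in mapred-site.xml.

```bash
# Run the bundled wordcount example against the small test file in HDFS.
run_wordcount() {
    ver="$1"    # Hadoop version string, e.g. 3.4.2
    hdfs dfs -mkdir -p /user/hadoop/wc-in
    hdfs dfs -put -f test.txt /user/hadoop/wc-in/
    hadoop jar "$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-$ver.jar" \
        wordcount /user/hadoop/wc-in /user/hadoop/wc-out
    hdfs dfs -cat /user/hadoop/wc-out/part-r-00000
}
```

Usage: `run_wordcount 3.4.2` (with HDFS and YARN running).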