Hadoop Essential Commands (HDFS)
(Core HDFS + Basic Hadoop Usage)
1. Understanding Command Structure
Every HDFS file system command starts with:
```text
hdfs dfs -<command> <arguments>
```
Example:
```text
hdfs dfs -ls /
```
If you remember that, you already know the pattern.
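Because the pattern is uniform, the built-in help for any command can be looked up the same way. The sketch below wraps `hdfs dfs -help` in a small function; the name `hdfs_help` is hypothetical (it is not part of Hadoop), and it assumes `hdfs` is on your `PATH`.

```shell
# Hypothetical helper: print the built-in help text for one HDFS shell command.
# Usage: hdfs_help ls   (runs: hdfs dfs -help ls)
hdfs_help() {
    if [ "$#" -ne 1 ]; then
        echo "usage: hdfs_help <command>" >&2
        return 2
    fi
    # Strip a leading dash so both "ls" and "-ls" are accepted.
    hdfs dfs -help "${1#-}"
}
```

Calling `hdfs_help mkdir` should print the description and options of `-mkdir`, which is handy when a flag from this guide is unfamiliar.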
2. Directory Commands
These are the first commands any new Hadoop student must know.
Create a directory
```text
hdfs dfs -mkdir /foldername
```
Create directory with parent folders
```text
hdfs dfs -mkdir -p /movies/raw
```
List files and folders
```text
hdfs dfs -ls /
```
List recursively
```text
hdfs dfs -ls -R /
```
3. Uploading and Downloading Files
This is the most used set of commands.
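In practice, an upload is often paired with creating the target directory first. As a rough sketch, the function below does both and overwrites any existing copy; `hdfs_upload` is a hypothetical name, not a Hadoop command, and the paths in the usage line are placeholders.

```shell
# Hypothetical helper: upload a local file into an HDFS directory,
# creating the directory first and overwriting any existing copy.
# Usage: hdfs_upload localfile.txt /movies/raw
hdfs_upload() {
    if [ "$#" -ne 2 ]; then
        echo "usage: hdfs_upload <local_file> <hdfs_dir>" >&2
        return 2
    fi
    if [ ! -f "$1" ]; then
        echo "local file not found: $1" >&2
        return 1
    fi
    # -p makes mkdir succeed even if the directory already exists.
    hdfs dfs -mkdir -p "$2" || return 1
    # -f overwrites the destination if it is already present.
    hdfs dfs -put -f "$1" "$2/"
}
```

Checking the local file before calling `hdfs` keeps the error message short and avoids starting a JVM just to discover a typo in the filename.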
Upload file from local filesystem to HDFS
```text
hdfs dfs -put localfile.txt /movies/
```
Upload and overwrite if already exists
```text
hdfs dfs -put -f localfile.txt /movies/
```
Copy local file to HDFS (alternative)
```text
hdfs dfs -copyFromLocal file.txt /movies/
```
Move local file to HDFS
```text
hdfs dfs -moveFromLocal file.txt /movies/
```
Download a file from HDFS to local
```text
hdfs dfs -get /movies/moviesdata.jsonl .
```
4. Viewing Files Stored in HDFS
Beginners must know how to see data inside HDFS.
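Because `hdfs dfs -cat` writes to standard output, it combines with ordinary shell tools. A minimal sketch, assuming the file is line-oriented text such as JSON Lines; the function name `hdfs_peek` is made up for this example.

```shell
# Hypothetical helper: show the first N lines (default 5) of a text file in HDFS.
# Usage: hdfs_peek /movies/moviesdata.jsonl 3
hdfs_peek() {
    if [ "$#" -lt 1 ]; then
        echo "usage: hdfs_peek <hdfs_file> [lines]" >&2
        return 2
    fi
    # head stops reading after N lines; hdfs may then print a warning
    # about the closed pipe on stderr, which can be ignored.
    hdfs dfs -cat "$1" | head -n "${2:-5}"
}
```

This gives line-based control, which is useful when a fixed-size preview cuts a record in half.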
Print file content
```text
hdfs dfs -cat /movies/moviesdata.jsonl
```
Show the first kilobyte of a file
```text
hdfs dfs -head /movies/moviesdata.jsonl
```
Show the last kilobyte of a file
```text
hdfs dfs -tail /movies/moviesdata.jsonl
```
Pipe into more for page-by-page view
```text
hdfs dfs -cat /movies/moviesdata.jsonl | more
```
5. Copy, Move, Delete Files in HDFS
Basic file management commands.
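Recursive deletes are the riskiest command in this section, so a small guard can help. The sketch below refuses an empty path or `/` and checks that the target is a directory before removing it; `hdfs_rmdir_safe` is a hypothetical helper, and the guards shown are only a starting point.

```shell
# Hypothetical helper: recursively delete an HDFS directory, with basic guards.
# Usage: hdfs_rmdir_safe /movies/output
hdfs_rmdir_safe() {
    case "${1:-}" in
        ""|"/")
            echo "refusing to delete: '${1:-}'" >&2
            return 2
            ;;
    esac
    # -test -d returns 0 only if the path exists and is a directory.
    if hdfs dfs -test -d "$1"; then
        hdfs dfs -rm -r "$1"
    else
        echo "not a directory in HDFS: $1" >&2
        return 1
    fi
}
```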
Copy file inside HDFS
```text
hdfs dfs -cp /movies/file1.txt /backup/file1.txt
```
Move file inside HDFS
```text
hdfs dfs -mv /movies/file1.txt /archive/
```
Delete a file
```text
hdfs dfs -rm /movies/file1.txt
```
Delete directory with all files
```text
hdfs dfs -rm -r /movies
```
6. File and Storage Information Commands
These commands report file sizes, cluster capacity, and block placement.
Check size of a file or directory
```text
hdfs dfs -du -h /movies
```
Check free and used space in HDFS
```text
hdfs dfs -df -h /
```
Check block details of a file
```text
hdfs fsck /movies/moviesdata.jsonl -files -blocks -locations
```
This lists the file's blocks, the length of each block, and the DataNodes that store each block.
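To sanity-check what `fsck` reports, the expected block count can be estimated by dividing the file size by the block size and rounding up. The sketch below assumes a block size of 128 MB, the usual default for `dfs.blocksize` in recent Hadoop releases; your cluster may be configured differently, and `hdfs_block_count` is a hypothetical name.

```shell
# Hypothetical helper: estimate how many HDFS blocks a file occupies.
# Usage: hdfs_block_count <file_size_bytes> [block_size_bytes]
hdfs_block_count() {
    size=$1
    block=${2:-134217728}   # 128 MB = 128 * 1024 * 1024 bytes (assumed default)
    # Ceiling division: a partially filled final block still counts as one block.
    echo $(( (size + block - 1) / block ))
}

hdfs_block_count 300000000   # a 300 MB file: prints 3
```

The last block of a file only uses as much disk space as it holds data, so a 300 MB file does not consume three full blocks of storage.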
7. Starting and Stopping Hadoop Services
Beginners often get confused here, so this section covers only the four scripts you need.
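One common way to confirm that the start scripts in this section worked is to list the running Java processes with `jps`, a tool that ships with the JDK. The sketch below looks for the four daemons named in this section; `check_daemons` is a hypothetical helper, and it assumes a single-node setup where all of them run on the local machine.

```shell
# Hypothetical helper: report which Hadoop daemons appear in the jps listing.
# jps prints one "<pid> <ClassName>" line per running Java process.
check_daemons() {
    running=$(jps)
    missing=0
    for daemon in NameNode DataNode ResourceManager NodeManager; do
        # Compare the whole class-name field so that "NameNode"
        # does not match "SecondaryNameNode".
        if echo "$running" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon: running"
        else
            echo "$daemon: NOT running"
            missing=1
        fi
    done
    # Exit status is 0 only when every daemon was found.
    return "$missing"
}
```

On a multi-node cluster the DataNode and NodeManager processes usually live on worker machines, so a "NOT running" line on the master node is not necessarily a problem there.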
Start HDFS daemons (NameNode + DataNode)
```text
start-dfs.sh
```
Start YARN daemons (ResourceManager + NodeManager)
```text
start-yarn.sh
```
Stop HDFS
```text
stop-dfs.sh
```
Stop YARN
```text
stop-yarn.sh
```
8. Checking Hadoop Components
Hadoop version
```text
hadoop version
```
Report DataNode storage and cluster health
```text
hdfs dfsadmin -report
```
This shows:
- Total capacity
- Used space
- Free space
- Connected DataNodes
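When only one of these figures is needed, the report can be filtered with standard text tools. The sketch below pulls out the number of live DataNodes; it assumes the report contains a line of the form `Live datanodes (3):`, which may vary between Hadoop versions, and `live_datanodes` is a hypothetical name.

```shell
# Hypothetical helper: extract the live DataNode count from report text on stdin.
# Assumes the report contains a line like "Live datanodes (3):".
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}

# Usage: hdfs dfsadmin -report | live_datanodes
```

If the function prints nothing, the expected line was not found, which is itself a hint that the report format differs or that no DataNode has registered.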
9. Running a Hadoop MapReduce Job
Running a MapReduce jar
```text
hadoop jar myjob.jar MainClass /input /output
```
Running a Hadoop Streaming job (Python)
```text
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -input /movies/moviesdata.jsonl \
  -output /movies/output \
  -mapper mapper.py \
  -reducer reducer.py \
  -file mapper.py \
  -file reducer.py
```
10. Helpful Shortcuts
Delete output directory before rerunning job
```text
hdfs dfs -rm -r /movies/output
```
Create a new empty file
```text
hdfs dfs -touchz /movies/empty.txt
```
Set file permissions
```text
hdfs dfs -chmod 755 /movies
```
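The `755` passed to `-chmod` is an octal mode: one digit each for the owner, the group, and everyone else, where each digit is the sum of read (4), write (2), and execute (1). To view the permissions currently in effect, `hdfs dfs -ls` prints them in `rwx` form. The sketch below translates between the two notations; `mode_to_rwx` is a hypothetical helper written for this guide.

```shell
# Hypothetical helper: translate a three-digit octal mode into rwx notation.
# Usage: mode_to_rwx 755
mode_to_rwx() {
    case "$1" in
        [0-7][0-7][0-7]) ;;
        *) echo "expected three octal digits, got: $1" >&2; return 2 ;;
    esac
    result=""
    # Split the mode into its three digits: owner, group, others.
    for digit in $(echo "$1" | sed 's/./& /g'); do
        # Each digit is the sum of read (4), write (2), and execute (1).
        if [ $(( digit & 4 )) -ne 0 ]; then r="r"; else r="-"; fi
        if [ $(( digit & 2 )) -ne 0 ]; then w="w"; else w="-"; fi
        if [ $(( digit & 1 )) -ne 0 ]; then x="x"; else x="-"; fi
        result="$result$r$w$x"
    done
    echo "$result"
}

mode_to_rwx 755   # prints rwxr-xr-x
```

So `755` lets the owner read, write, and list the directory, while the group and other users can only read and list it.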