MapReduce Mini Practice
STEP 1 — Check the file exists in HDFS
hdfs dfs -ls /data/data.csv
If the command lists the file, you can continue.
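In a script, the same check can be made from the exit status instead of reading the listing. This is a minimal sketch, assuming the `hdfs` client is on your PATH; the helper name `hdfs_path_exists` is made up for illustration. It relies on `hdfs dfs -test -e`, which exits with status 0 when the path exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): succeed if a path exists in HDFS.
hdfs_path_exists() {
  # `hdfs dfs -test -e PATH` exits 0 when PATH exists, non-zero otherwise.
  hdfs dfs -test -e "$1"
}

# Only run the check when the hdfs client is actually installed.
if command -v hdfs >/dev/null 2>&1; then
  if hdfs_path_exists /data/data.csv; then
    echo "/data/data.csv found in HDFS"
  else
    echo "/data/data.csv is missing; upload it before running the job" >&2
  fi
fi
```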
STEP 2 — Delete old output if it exists
(Hadoop never overwrites an existing output folder; the job fails if the folder is already there.)
hdfs dfs -rm -r /data/output_wc
If the folder doesn't exist, ignore the warning.
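If you would rather not see the warning at all, the delete can be made conditional. The following is a sketch under the assumption that `hdfs` is on your PATH; `reset_output_dir` is a hypothetical helper name, not a Hadoop command. It uses `hdfs dfs -test -d` to check for the directory first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove an HDFS output directory only if it exists,
# so a first run does not print a "No such file or directory" message.
reset_output_dir() {
  local dir="$1"
  if hdfs dfs -test -d "$dir"; then
    hdfs dfs -rm -r "$dir"
  else
    echo "nothing to delete: $dir"
  fi
}

# Only run when the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  reset_output_dir /data/output_wc
fi
```

A shorter alternative is `hdfs dfs -rm -r -f /data/output_wc`, since `-f` suppresses the diagnostic for a missing path.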
STEP 3 — Confirm where your Hadoop lives
You have Hadoop installed under /home/hadoop/hadoop. That's the value we will use for HADOOP_MAPRED_HOME. Check that the path is correct:
ls -ld /home/hadoop/hadoop
ls -l /home/hadoop/hadoop/share/hadoop/mapreduce
- For Mac
ls -ld /opt/homebrew/Cellar/hadoop/3.4.2
ls -l /opt/homebrew/Cellar/hadoop/3.4.2/libexec/share/hadoop/mapreduce
# If this doesn't work, change the path to /opt/homebrew/opt/hadoop/libexec/share/hadoop/mapreduce/ and run it again.
You should see the MapReduce jars (hadoop-mapreduce-examples-3.4.2.jar etc.).
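Since the jar location differs between installs, it can help to locate the examples jar once and reuse the result. This is only a sketch: `find_examples_jar` and the `EXAMPLES_JAR` variable are names invented here, and the candidate directories are the ones mentioned in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first MapReduce examples jar found in the
# candidate directories given as arguments; return 1 if none is found.
find_examples_jar() {
  local dir jar
  for dir in "$@"; do
    for jar in "$dir"/hadoop-mapreduce-examples-*.jar; do
      # An unmatched glob stays literal, so test that the file really exists.
      if [ -f "$jar" ]; then
        echo "$jar"
        return 0
      fi
    done
  done
  return 1
}

# Candidate directories taken from this guide; adjust to your install.
EXAMPLES_JAR=$(find_examples_jar \
  /home/hadoop/hadoop/share/hadoop/mapreduce \
  /opt/homebrew/Cellar/hadoop/3.4.2/libexec/share/hadoop/mapreduce \
  /opt/homebrew/opt/hadoop/libexec/share/hadoop/mapreduce) \
  || echo "examples jar not found in the usual locations" >&2
echo "EXAMPLES_JAR=${EXAMPLES_JAR}"
```

With the variable set, a later job can be started as `hadoop jar "$EXAMPLES_JAR" wordcount /data/data.csv /data/output_wc` instead of typing the full path.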
STEP 4 - Edit mapred-site.xml to set HADOOP_MAPRED_HOME and framework to YARN
Open (or create if missing) the file:
- For Windows
nano /home/hadoop/hadoop/etc/hadoop/mapred-site.xml
Replace /home/hadoop/hadoop below with your Hadoop root if different. Add these properties inside <configuration>...</configuration>:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop</value>
</property>
- For Mac
You don't have to do this; the file was already configured earlier in Hadoop_Installation(for Mac).md.
Save the file.
Notes:
- mapreduce.framework.name = yarn ensures MR runs on YARN (required).
- yarn.app.mapreduce.am.env tells YARN what environment to use for the ApplicationMaster (AM), so it can find the MR jars.
- mapreduce.map.env / mapreduce.reduce.env tell the map/reduce task containers where to find the MR jars.
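After saving, a quick sanity check can confirm that the four properties actually made it into the file. This is a rough sketch: `check_mapred_site` is a hypothetical helper, and it only does a plain-text search for the `<name>` elements rather than parsing the XML.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the four properties from this step
# are missing from a mapred-site.xml file. Returns 0 only if all are present.
check_mapred_site() {
  local file="$1" name missing=0
  for name in mapreduce.framework.name yarn.app.mapreduce.am.env \
              mapreduce.map.env mapreduce.reduce.env; do
    # Plain-text match on the <name> element; this is not an XML parser.
    if ! grep -q "<name>${name}</name>" "$file"; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Path from the Windows instructions in this step; adjust if your root differs.
SITE=/home/hadoop/hadoop/etc/hadoop/mapred-site.xml
if [ -f "$SITE" ]; then
  check_mapred_site "$SITE" && echo "all four properties are set"
fi
```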
STEP 5 - Export HADOOP_MAPRED_HOME in hadoop-env.sh
Edit hadoop-env.sh so daemons started on this node have the variable:
- For Windows
nano /home/hadoop/hadoop/etc/hadoop/hadoop-env.sh
- For Mac
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Add near the top (these are the Mac paths; on the Windows setup use /home/hadoop/hadoop, the Hadoop root from STEP 3):
export HADOOP_MAPRED_HOME=/opt/homebrew/opt/hadoop/libexec
export HADOOP_HOME=/opt/homebrew/opt/hadoop/libexec
STEP 6 - Start Hadoop/YARN
start-dfs.sh
start-yarn.sh
STEP 7 - Run your WordCount job
- For Windows
hadoop jar /home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.2.jar wordcount /data/data.csv /data/output_wc
- For Mac
hadoop jar /opt/homebrew/Cellar/hadoop/3.4.2/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.2.jar wordcount /data/data.csv /data/output_wc
STEP 8 - View the WordCount output
hdfs dfs -cat /data/output_wc/part-r-00000 | head
Additional - If you want to remove the existing output
hdfs dfs -rm -r /data/output_wc
GREP
The grep MapReduce example searches for a pattern inside your CSV stored in HDFS and counts the matches.
Example: Find "error"
- For Windows
hdfs dfs -rm -r /data/grep_output
hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.2.jar grep /data/data.csv /data/grep_output "error"
- For Mac
hdfs dfs -rm -r /data/grep_output
hadoop jar /opt/homebrew/Cellar/hadoop/3.4.2/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.2.jar grep /data/data.csv /data/grep_output "error"
# If this doesn't work, change the path to /opt/homebrew/opt/hadoop/libexec/share/hadoop/mapreduce/ and run it again.
Output
hdfs dfs -cat /data/grep_output/part-r-00000
Example: Find "Harshit"
- For Windows
hdfs dfs -rm -r /data/grep_output
hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.2.jar grep /data/data.csv /data/grep_output "Harshit"
- For Mac
hdfs dfs -rm -r /data/grep_output
hadoop jar /opt/homebrew/Cellar/hadoop/3.4.2/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.2.jar grep /data/data.csv /data/grep_output "Harshit"
# If this doesn't work, change the path to /opt/homebrew/opt/hadoop/libexec/share/hadoop/mapreduce/ and run it again.
Output
hdfs dfs -cat /data/grep_output/part-r-00000
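To help read that output: the grep example writes one line per distinct matched string, with the match count first and the matched text after a tab, most frequent first. The sketch below is a rough local analogue built from standard Unix tools, not Hadoop itself. The `count_matches` helper and the sample rows are invented for illustration.

```shell
#!/usr/bin/env bash
# Local analogue of the Hadoop grep example (not Hadoop itself):
# count every occurrence of a regex and list distinct matches with counts,
# most frequent first, as "count<TAB>match".
count_matches() {
  local pattern="$1" file="$2"
  # grep -o prints each match on its own line; uniq -c counts duplicates.
  grep -oE "$pattern" "$file" | sort | uniq -c | sort -rn \
    | awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print c "\t" $0 }'
}

# Invented sample rows standing in for data.csv.
sample=$(mktemp)
printf '1,Harshit,error\n2,Asha,ok\n3,Harshit,error\n4,Ravi,error\n' > "$sample"
count_matches 'error' "$sample"     # prints: 3<TAB>error
count_matches 'Harshit' "$sample"   # prints: 2<TAB>Harshit
rm -f "$sample"
```

Running the same pattern locally on a small extract of the CSV is a cheap way to sanity-check the counts before or after the cluster job.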