Syllabus Outline


Big Data Analytics - Course Navigation

VEER MADHO SINGH BHANDARI UTTARAKHAND TECHNICAL UNIVERSITY, DEHRADUN

Course Information

  • Course Name: Big Data Analytics
  • Course Code: CST-043
  • Credits: 3:0:0 (L:T:P) - 3 Credits
  • Department: Computer Science and Engineering

COURSE OBJECTIVES

The objectives of this course are to:
  1. Make students comfortable with the tools and techniques required to handle large datasets.
  2. Introduce the terminology and techniques used in Big Data.
  3. Use several publicly available tools to illustrate the application of these techniques.
  4. Build awareness of research that requires integrating large amounts of data.

COURSE OUTCOMES

On successful completion of this course, the students will be able to:
  1. Identify and distinguish big data analytics applications.
  2. Design efficient algorithms for mining large volumes of data.
  3. Analyze the Hadoop and MapReduce technologies associated with big data analytics.
  4. Understand the fundamentals of various big data analytics techniques.
  5. Present cases involving big data analytics in solving practical problems.

DETAILED COURSE NAVIGATION

UNIT 1 — Introduction to Big Data

Recommended for Better Understanding: Before diving into the detailed notes, read the Fun Version of Unit 1. It is filled with real-world analogies, Mermaid diagrams, emoji-powered explanations, self-test quizzes, and mnemonics. It makes everything click.

UNIT 2 — Mining Data Streams

Recommended for Better Understanding: Kick off this unit with the Fun Version of Unit 2: Niagara Falls stream analogies, Formula 1 IoT examples, DGIM algorithm walkthroughs, Reservoir Sampling code, and a full mindmap! Way easier than jumping straight into theory.

UNIT 3 — Hadoop

Recommended for Better Understanding: This is the biggest unit! Start with the Fun Version of Unit 3: toy elephant origin story, annotated Java WordCount, Python Streaming mappers, YARN scheduler comparison, all 18 topics with analogies, and a Mermaid mindmap at the end!

UNIT 4 — Big Data Frameworks

Recommended for Better Understanding: Read the Fun Version of Unit 4 first! It covers the Pig vs Hive chef/playwright analogy, a full Hive Services architecture diagram, the HBase 4D data model, the ZooKeeper smoke-alarm analogy, the ZNode leader-election race, the hotspot row-key fix, and an IBM Streams SPL example. Makes the theory crystal-clear!

UNIT 5 — Predictive Analytics

Recommended for Better Understanding: Finish strong with the Fun Version of Unit 5: GPS/rearview-mirror analogy for the analytics spectrum, OLS tug-of-war explanation, Anscombe's Quartet demo, full diagnostic dashboard code in Python, Plotly Dash interactive app, Altair brushing-and-linking, an end-to-end MLOps pipeline, and a churn-prediction FastAPI app!

ACADEMIC INFORMATION

Assessment Pattern

  • Internal Assessment: 30 Marks (Assignments, Class Tests, Attendance)
  • End Semester Examination: 70 Marks (Theory Paper)
  • Total: 100 Marks

Prerequisites

  • Basic knowledge of programming (preferably Java/Python)
  • Understanding of database concepts
  • Familiarity with basic statistics and mathematics

Learning Resources

  • Laboratory: Hands-on experience with Hadoop, Spark, and other Big Data tools
  • Online Platforms: Access to cloud computing environments for practical sessions
  • Case Studies: Real-world industry applications and datasets

How to Use This Course Navigation:
  • Fun Read: (START HERE!) Engaging, analogy-rich versions with mermaid diagrams, emoji explanations, real-world examples, annotated code, self-test quizzes & mnemonics. Best for first-time learning and building intuition!
  • Detailed: Complete in-depth explanations with examples and comprehensive coverage — use after the Fun version for depth
  • Summary: Concise overview of key concepts and important points
  • Hinglish Summary: Easy-to-understand explanations in Hindi-English mix for better comprehension

Recommended Study Path:
  1. Read the Fun Version first — builds intuition with stories, analogies & visuals
  2. Read the Detailed version — fills in all the technical depth
  3. Use the Summary for quick concept review
  4. Use the Hinglish Summary for last-minute revision before exams

Fun Version Highlights (what's inside each one):
  • Unit 1: 5V card/shoebox analogies, Analytics Maturity Ladder, Mermaid diagrams, mnemonics
  • Unit 2: Niagara Falls analogy, Reservoir Sampling code, HyperLogLog, DGIM walkthrough
  • Unit 3: Annotated Java WordCount, Python Streaming, YARN scheduler comparison, HDFS cheatsheet
  • Unit 4: Pig Swiss army knife, Hive Services diagram, HBase 4D model, ZooKeeper smoke-alarm
  • Unit 5: OLS tug-of-war, Anscombe's Quartet, diagnostic dashboard, Dash app, FastAPI churn model

For University Examination Preparation:
  • Focus on Detailed versions for complete coverage
  • Use Summary versions for quick revision before exams
  • Practice the quiz questions at the end of each Fun version — they mirror exam question patterns!

PRESCRIBED TEXTBOOKS

  1. Michael Berthold, David J. Hand - "Intelligent Data Analysis", Springer, 2007.
  2. Tom White - "Hadoop: The Definitive Guide", Third Edition, O'Reilly Media, 2012.
  3. Chris Eaton, Dirk DeRoos, Tom Deutsch, George Lapis, Paul Zikopoulos - "Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data", McGraw-Hill, 2012.
  4. Anand Rajaraman and Jeffrey David Ullman - "Mining of Massive Datasets", Cambridge University Press, 2012.
  5. Bill Franks - "Taming the Big Data Tidal Wave" (Note: Complete reference may be updated)