Syllabus Outline
Big Data Analytics - Course Navigation
VEER MADHO SINGH BHANDARI UTTARAKHAND TECHNICAL UNIVERSITY, DEHRADUN
Course Information
Course Name: Big Data Analytics
Course Code: CST-043
Credits: 3:0:0 (L:T:P) - 3 Credits
Department: Computer Science and Engineering
COURSE OBJECTIVES
The objectives of this course are to:
- Make students comfortable with the tools and techniques required to handle large datasets.
- Uncover various terminologies and techniques used in Big Data.
- Use several publicly available tools to illustrate the application of these techniques.
- Know about the research that requires the integration of large amounts of data.
COURSE OUTCOMES
On successful completion of this course, the students will be able to:
- Identify and distinguish big data analytics applications.
- Design efficient algorithms for mining large volumes of data.
- Analyze the Hadoop and MapReduce technologies associated with big data analytics.
- Understand the fundamentals of various big data analytics techniques.
- Present cases involving big data analytics in solving practical problems.
DETAILED COURSE NAVIGATION
UNIT 1 — Introduction to Big Data
Recommended for Better Understanding: Before diving into the detailed notes, read the Fun Version of Unit 1. It is filled with real-world analogies, Mermaid diagrams, emoji-powered explanations, self-test quizzes, and mnemonics, and it makes everything click.
UNIT 2 — Mining Data Streams
Recommended for Better Understanding: Kick off this unit with the Fun Version of Unit 2. It covers Niagara Falls stream analogies, Formula 1 IoT examples, DGIM algorithm walkthroughs, Reservoir Sampling code, and a full mindmap. Way easier than jumping straight into theory.
UNIT 3 — Hadoop
Recommended for Better Understanding: This is the biggest unit! Start with the Fun Version of Unit 3. It includes the toy elephant origin story, an annotated Java WordCount, Python Streaming mappers, a YARN scheduler comparison, all 18 topics with analogies, and a Mermaid mindmap at the end.
UNIT 4 — Big Data Frameworks
Recommended for Better Understanding: Read the Fun Version of Unit 4 first! It includes the Pig vs Hive chef/playwright analogy, a full Hive Services architecture diagram, the HBase 4D data model, the ZooKeeper smoke-alarm analogy, the ZNode leader-election race, the hotspot row-key fix, and an IBM Streams SPL example. Makes the theory crystal-clear!
UNIT 5 — Predictive Analytics
Recommended for Better Understanding: Finish strong with the Fun Version of Unit 5. It includes a GPS/rearview-mirror analogy for the analytics spectrum, an OLS tug-of-war explanation, an Anscombe's Quartet demo, full diagnostic dashboard code in Python, a Plotly Dash interactive app, Altair brushing-and-linking, an end-to-end MLOps pipeline, and a churn prediction FastAPI service.
ACADEMIC INFORMATION
Assessment Pattern
- Internal Assessment: 30 Marks (Assignments, Class Tests, Attendance)
- End Semester Examination: 70 Marks (Theory Paper)
- Total: 100 Marks
Prerequisites
- Basic knowledge of programming (preferably Java/Python)
- Understanding of database concepts
- Familiarity with basic statistics and mathematics
Learning Resources
- Laboratory: Hands-on experience with Hadoop, Spark, and Big Data tools
- Online Platforms: Access to cloud computing environments for practical sessions
- Case Studies: Real-world industry applications and datasets
NAVIGATION GUIDE
How to Use This Course Navigation:
- Fun Read: (START HERE!) Engaging, analogy-rich versions with Mermaid diagrams, emoji explanations, real-world examples, annotated code, self-test quizzes, and mnemonics. Best for first-time learning and building intuition!
- Detailed: Complete in-depth explanations with examples and comprehensive coverage — use after the Fun version for depth
- Summary: Concise overview of key concepts and important points
- Hinglish Summary: Easy-to-understand explanations in a Hindi-English mix for better comprehension
Recommended Study Path:
- Read the Fun Version first — builds intuition with stories, analogies & visuals
- Read the Detailed version — fills in all the technical depth
- Use the Summary for quick concept review
- Use the Hinglish Summary for last-minute revision before exams
Fun Version Highlights (what's inside each one):
| Unit | Fun Version | Key Highlights |
|---|---|---|
| Unit 1 | Open | 5V card/shoebox analogies, Analytics Maturity Ladder, Mermaid diagrams, mnemonics |
| Unit 2 | Open | Niagara Falls analogy, Reservoir Sampling code, HyperLogLog, DGIM walkthrough |
| Unit 3 | Open | Annotated Java WordCount, Python Streaming, YARN scheduler comparison, HDFS cheatsheet |
| Unit 4 | Open | Pig Swiss army knife, Hive Services diagram, HBase 4D model, ZooKeeper smoke-alarm |
| Unit 5 | Open | OLS tug-of-war, Anscombe's Quartet, diagnostic dashboard, Dash app, FastAPI churn model |
For University Examination Preparation:
- Focus on Detailed versions for complete coverage
- Use Summary versions for quick revision before exams
- Practice the quiz questions at the end of each Fun version — they mirror exam question patterns!
PRESCRIBED TEXTBOOKS
- Michael Berthold, David J. Hand - "Intelligent Data Analysis", Springer, 2007.
- Tom White - "Hadoop: The Definitive Guide", Third Edition, O'Reilly Media, 2012.
- Chris Eaton, Dirk DeRoos, Tom Deutsch, George Lapis, Paul Zikopoulos - "Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data", McGraw-Hill Publishing, 2012.
- Anand Rajaraman and Jeffrey David Ullman - "Mining of Massive Datasets", Cambridge University Press, 2012.
- Bill Franks - "Taming the Big Data Tidal Wave" (Note: Complete reference may be updated)