Spring 2014

Big Data


Prof. Terrance Boult (
Co-Instructor: Abhijit Bendale, PhD Student (

Schedule/Time/Location: Tues 4:45-7:30 PM, A206 Osborne (BI LAB)


This is a programming-heavy course, with new tools and techniques introduced in every class. Students should be familiar with C/C++, elementary OOP concepts, and elementary computer architecture and operating systems. Familiarity with at least one other programming language, such as Java, R, Python, or Matlab, will be beneficial.

The goal of this class is to give students extensive hands-on experience with multiple programming paradigms in the domain of High Performance Computing. The course will revolve around 3 major themes:

  1. Large Scale Data Analysis Techniques: Statistics Basics, Machine Learning, Classification, Logistic Regression and other tools for analyzing large-scale data.
  2. Parallel Programming Techniques: Nvidia GPUs, Nvidia Tegra Chipset, CUDA Hardware, CUDA Threading and memory model, CUDA performance monitoring and optimization.
  3. Big Data / Cloud Computing Techniques: Map-Reduce, Introduction to Hadoop, Pig, Hive, Amazon Web Services

Target Audience:
Undergraduate and graduate students who want exposure to cutting-edge techniques in the emerging fields of Data Science and High Performance Computing. The course is also well suited to undergraduate and graduate students from any department who want to speed up their computationally intensive research.



Preliminary Syllabus (subject to change):


  1. Introduction to big data analysis (01/21/2014)
  2. Machine Learning Tools and Libraries: scikit-learn, Weka, LIBSVM, etc. (01/28/2014)
  3. Machine Learning and Statistics Techniques: Singular Value Decomposition, Principal Component Analysis, Probability, Naïve Bayes, Data Plotting (02/04/2014)
  4. Classification and Prediction: Support Vector Machines, Spam Filtering, AdaBoost (02/11/2014)
  5. Forecasting numeric values with Regression: Logistic Regression, Tree-based regression (hands-on approach), CART (02/18/2014)
  6. BigData Research Paper presentations (grad students only; undergrads graded on class participation) (02/25/2014)
  7. Introduction to Parallel Programming: GPU Programming (03/04/2014)
  8. Installing and Using CUDA: Tutorial Session (03/11/2014)
  9. Project Proposal Presentations (03/18/2014)
  10. Understanding CUDA hardware, CUDA threading and memory model (Kernel Based parallel execution) (04/01/2014)
  11. Optimizing GPU programs: Parallel algorithm patterns (reduce/scan, stencil and sparse computation, performance tuning). (04/08/2014)
  12. Introduction to MapReduce, Hadoop, HBase, Pig, Hive (04/15/2014)
  13. Case Studies: Twitter/Facebook post analysis, Analyzing personal photo album (04/22/2014)
  14. Final Project Presentations – I (04/29/2014)
  15. Final Project Presentations – II (05/06/2014)


  1. Netflix Challenge (Algorithm Design) (Due Date: 03/11/2014)
  2. Research paper presentations (and questions about others' presentations) (Presentation date: 02/25/2014)
  3. Parallelize an existing numerical computing/linear algebra/machine-learning algorithm using CUDA on an Nvidia GPU. (Due Date: 04/22/2014)
  4. BigData Final Project (Due Date: 04/29/2014)

Grading (Grad / Ugrad) (subject to change):

  1. Netflix Challenge: 10% / 10%
  2. Research paper presentations: 10% / 5% (participation only)
  3. CUDA: 10% / 15%
  4. BigData Project Proposal: 10% / 10%
  5. BigData Final Project & Presentation: 40% / 40%
  6. Class Participation: 20% / 20%

Text Books:

  1. “Machine Learning in Action,” Peter Harrington, Manning Publications
  2. “Programming Massively Parallel Processors: A Hands-on Approach,” David Kirk and Wen-mei W. Hwu, Morgan Kaufmann, 2nd edition, 2012

Reference Books:

  1. “CUDA by Example: An Introduction to General-Purpose GPU Programming,” Jason Sanders and Edward Kandrot, Addison-Wesley Professional, 2010
  2. “Hadoop: The Definitive Guide,” Tom White, O'Reilly, 2012
