BigData

Spring 2014

Big Data

Faculty:

Prof. Terrance Boult (tboult@vast.uccs.edu)
Co-Instructor: Abhijit Bendale, PhD Student (abendale@vast.uccs.edu)

Schedule/Time/Location: Tues 4:45-7:30 PM, A206 Osborne (BI LAB)

Pre-requisites:

This is a programming-heavy course, with many new tools and techniques introduced in every class. Students should be familiar with C/C++, elementary OOP concepts, elementary computer architecture, and operating systems. Familiarity with at least one other programming language, such as Java, R, Python, or Matlab, will be beneficial.

Description:
The goal of this class is to provide students with extensive hands-on experience in multiple programming paradigms in the domain of High Performance Computing. The course will revolve around three major themes:

  1. Large Scale Data Analysis Techniques: Statistics Basics, Machine Learning, Classification, Logistic Regression and other tools for analyzing large-scale data.
  2. Parallel Programming Techniques: Nvidia GPUs, Nvidia Tegra Chipset, CUDA Hardware, CUDA Threading and memory model, CUDA performance monitoring and optimization.
  3. Big Data / Cloud Computing Techniques: Map-Reduce, Introduction to Hadoop, Pig, Hive, Amazon Web Services.

Target Audience:
Undergraduate and graduate students who want exposure to cutting-edge techniques in the new and emerging field of Data Science and High Performance Computing. This course will also suit undergraduate and graduate students from all departments who want to speed up their computationally intensive research.

 

Class Size: 22

Lectures:

  1. Introduction to big data analysis (01/21/2014) pdf
  2. Machine Learning Tools and Libraries: scikit-learn, Weka, LIBSVM, etc. (01/28/2014) pdf
  3. Machine Learning and Statistics Techniques: Singular Value Decomposition, Principal Component Analysis, Probability, Naive Bayes, Data Plotting (02/04/2014) pdf
  4. Classification and Prediction: Support Vector Machines, Spam Filtering, AdaBoost (02/11/2014) pdf
  5. Forecasting numeric values with Regression: Logistic Regression, Tree-based regression (hands-on approach), CART (02/18/2014)
  6. Big Data research paper presentations (grad students only; undergrads graded on class participation) (02/25/2014)
  7. Introduction to Parallel Programming: GPU Programming (03/04/2014) pdf
  8. Installing and Using CUDA: Tutorial Session (03/11/2014)
  9. Project Proposal Presentations (03/18/2014)
  10. Understanding CUDA hardware, CUDA threading and memory model (Kernel Based parallel execution) (04/01/2014) pdf
  11. CUDA Assignment, Scaling your App, AWS etc (04/08/2014) pdf
  12. Introduction to MapReduce, Hadoop, HBase, Pig, Hive (04/15/2014) pdf
  13. Case Studies: Twitter/Facebook post analysis, Analyzing personal photo album (04/22/2014) pdf
  14. Final Project Presentations – I (04/29/2014)
  15. Final Project Presentations – II (05/06/2014)
*Class presentations were created from numerous sources on the web. Wherever possible, I have tried to acknowledge the source. If you feel I have used your material and forgotten to acknowledge it, please let me know. I will be happy to add a link to your page.

Student Projects

The course included a compulsory project using either CUDA or Big Data tools (e.g., Hadoop, Pig, Hive, HBase). Students did fantastic projects as part of this class. The following are the titles of the projects students pursued during the course.

  1. "Landmark Classification in Large Scale Image Collections", Swati Dhamija, Jahnavi Yeddanupuddy
  2. "GPU Enhanced Exemplar Codes for Facial Attribute Classification", Ethan Rudd. Also accepted for publication at CVPR 2014 BigVision Workshop
  3. "Faces in the Clouds: Efficient and Privacy Enhanced Face Recognition System in the Cloud", Albahdal Abdullah
  4. "Parallel Text Search through CUDA Programming", Suresh Rajagopal
  5. "Restaurant Health Inspection Ratings", Austin Moorehouse, Brett Martin
  6. "Reconsidering the Cocktail-Party Problem: A comparison between ICA and PCA using Birds of Colorado Dataset", Suzana Snyder, Andrew Norton
  7. "Identifying tags from millions of questions", Ramasamy Mohan, Gauri Kulkarni
  8. "Predicting Ratings from Review Contents", Chloe Bradley, Linda Hammons
  9. "Crime Analysis and Business Trends", Adothya Mylavarapu
  10. "Improving the Identification of Students At-Risk of Suicide", Shawn Fagan
  11. "Expanding CryptoHaze GRT to support SHA-2", Donovan Thorpe, Michael Lockette
  12. "Finding Visual Concepts by Web Image Mining", Raviteja Billa, Vishuteja Bandemneni