Schedule/Time/Location: Tues 4:45-7:30 PM, A206 Osborne (BI LAB)
This is a programming-heavy course, with new tools and techniques introduced in every class. Students should be familiar with C/C++, elementary OOP concepts, elementary computer architecture, and operating systems. Familiarity with at least one other programming language, such as Java, R, Python, or Matlab, will be beneficial.
The goal of this class is to provide students with extensive hands-on experience in multiple programming paradigms in the domain of High Performance Computing. The course will revolve around three major themes:
- Large Scale Data Analysis Techniques: Statistics Basics, Machine Learning, Classification, Logistic Regression and other tools for analyzing large-scale data.
- Parallel Programming Techniques: Nvidia GPUs, Nvidia Tegra Chipset, CUDA Hardware, CUDA Threading and memory model, CUDA performance monitoring and optimization.
- Big Data / Cloud Computing Techniques: Map-Reduce, Introduction to Hadoop, Pig, Hive, Amazon Web Services
This course is intended for undergraduate and graduate students who want exposure to cutting-edge techniques in the new and emerging field of Data Science and High Performance Computing. It will also suit undergraduate and graduate students from all departments who want to speed up their computationally intensive research.
Class Size: 22
- Introduction to big data analysis (01/21/2014) pdf
- Machine Learning Tools and Libraries: scikit-learn, Weka, LibSVM, etc. (01/28/2014) pdf
- Machine Learning and Statistics Techniques: Singular Value Decomposition, Principal Component Analysis, Probability, Naive Bayes, Data Plotting (02/04/2014) pdf
- Classification and Prediction: Support Vector Machines, Spam Filtering, Adaboost (02/11/2014) pdf
- Forecasting numeric values with Regression: Logistic Regression, Tree-based regression (hands-on approach), CART (02/18/2014)
- Big Data research paper presentations (graduate students only; undergraduates graded on class participation) (02/25/2014)
- Introduction to Parallel Programming: GPU Programming (03/04/2014) pdf
- Installing and Using CUDA: Tutorial Session (03/11/2014)
- Project Proposal Presentations (03/18/2014)
- Understanding CUDA hardware, CUDA threading and memory model (Kernel Based parallel execution) (04/01/2014) pdf
- CUDA Assignment, Scaling your App, AWS, etc. (04/08/2014) pdf
- Introduction to MapReduce, Hadoop, HBase, Pig, Hive (04/15/2014) pdf
- Case Studies: Twitter/Facebook post analysis, Analyzing personal photo album (04/22/2014) pdf
- Final Project Presentations – I (04/29/2014)
- Final Project Presentations – II (05/06/2014)
The course included a compulsory project using either CUDA or Big Data tools (e.g. Hadoop, Pig, Hive, HBase). Students did fantastic projects as part of this class. The following are the titles of the projects pursued by students during the course.
- "Landmark Classification in Large Scale Image Collections", Swati Dhamija, Jahnavi Yeddanupuddy
- "GPU Enhanced Exemplar Codes for Facial Attribute Classification", Ethan Rudd. Also accepted for publication at CVPR 2014 BigVision Workshop
- "Faces in the Clouds: Efficient and Privacy Enhanced Face Recognition System in the Cloud", Albahdal Abdullah
- "Parallel Text Search through CUDA Programming", Suresh Rajagopal
- "Restaurant Health Inspection Ratings", Austin Moorehouse, Brett Martin
- "Reconsidering the Cocktail-Party Problem: A comparison between ICA and PCA using Birds of Colorado Dataset", Suzana Snyder, Andrew Norton
- "Identifying tags from millions of questions", Ramasamy Mohan, Gauri Kulkarni
- "Predicting Ratings from Review Contents", Chloe Bradley, Linda Hammons
- "Crime Analysis and Business Trends", Adothya Mylavarapu
- "Improving the Identification of Students At-Risk of Suicide", Shawn Fagan
- "Expanding Cryptohaze GRT to support SHA-2", Donovan Thorpe, Michael Lockette
- "Finding Visual Concepts by Web Image Mining", Raviteja Billa, Vishuteja Bandemneni