Two for-credit courses are currently part of the NCDS Pilot Data Science curriculum. Students in both courses receive access to a computing cluster designed to support big data problems. Guest lecturers from NCDS member institutions will also be featured in the classes.
ST 810. Big Data: A Statistical Perspective
Instructor: Lexin Li, associate professor, Department of Statistics, North Carolina State University
The ability to turn data into information, then into action, presents many challenges. It requires multiple skills: a deep understanding of the field generating the data in order to ask the right questions; the mathematical skills to build models from complicated and messy data; the engineering skills to carry out operations on very large data sets; and the creative thinking to find insights and tell stories from the data.
ST 810 aims to help students develop statistical modeling skills by covering a wide range of modern statistical and machine learning techniques. The course focuses on the statistical perspective; however, it will connect with the engineering aspect of big data through detailed discussions of computing algorithms. The course will emphasize interpreting and finding insight in big data using real-world applications.
INLS 690-163. Introduction to Big Data and NoSQL
Instructor: Arcot Rajasekar, professor, School of Information and Library Science, University of North Carolina at Chapel Hill
This class will introduce students to current and emerging practices for dealing with big data and the large-scale database systems used by many social networking services, such as Facebook and Twitter. Social networking services, as well as many science and business domains, generate data at an exponential rate, and handling and analyzing these data requires different tools and processes than those used in common business applications.
These new tools and applications are highly data intensive and must support heavy read/write workloads. In addition, social media services use new databases belonging to an emerging genre called NoSQL. Using and managing these database systems is very different from working with traditional relational databases. This course will look at several of these systems. Topics covered will include: fundamentals of big data, big data analytics, and NoSQL systems; examples of big data analytics such as MapReduce and Hadoop; examples of NoSQL systems such as Google’s Bigtable, Amazon’s Dynamo, Apache Cassandra, Apache HBase, MongoDB, Voldemort, CouchDB, and SimpleDB; and supporting systems such as the Google File System and the Chubby lock service.