This course offers an introduction to large-scale data analytics, focusing on the methods and technologies that enable the extraction of actionable insights from massive datasets. Big Data analytics involves uncovering non-trivial knowledge and patterns within vast collections of data, using advanced computational tools and statistical models. The course covers two primary areas: (1) the programming techniques employed by data scientists for large-scale data processing, and (2) the models used for data analysis.
On the technical side, students will study fundamental systems and techniques for storing and managing large volumes of data. We will examine modern cluster computing systems, focusing on frameworks that implement or generalize the MapReduce model, such as Hadoop and Apache Spark, and their role in distributed data processing. On the modeling side, the course will cover key supervised and unsupervised learning models, providing a solid foundation in data mining techniques.