DSE I2400 Data Engineering: Infrastructure and Applications

This course will train students in the handling of big data sources derived from various environments, including traditional business activities, web-based transactions, and social media. The course will also discuss the range of data formats, application types, and emerging approaches in data integration. As part of this, it will introduce the range of research topics and mentors participating in the Data Science and Engineering Program and offering capstone project opportunities. The course will begin with a discussion of high-end traditional database systems, focusing on query processing, crash recovery, and transaction and concurrency control. This will be followed by a detailed look at object-relational databases, distributed and federated databases, and cloud-based data warehousing. NoSQL databases (e.g., Cassandra and Neo4j) and parallel data analysis tools (e.g., Hadoop, Spark) will be introduced. The main emphasis of the course is hands-on training in state-of-the-art software development environments. Project-based system development work will be an essential component of the course. Prereq.: DSE I1020 (Intro to Data Science) and DSE I1030 (Applied Statistics), or equivalents. 3 hr./wk.; 3 cr.

Prerequisite

DSE I1020 and DSE I1030, or equivalents

Credits

3

Contact Hours

3 hr./wk.