DSE - Data Science Engineering Course Descriptions

DSE I1020 Introduction to Data Science

This course will present a survey of data science and introduce some of the core data science tools. While some programming experience is required, the course will include a rapid introduction to data science programming and the stack of tools needed to process, visualize, and analyze data with a language such as R or Python. Students will be given a high-level survey of data engineering, visual analytics, applied statistics, machine learning, and big data. The course will illustrate these topics through real data sets and case studies.

Credits

3

Prerequisites

CSC 10200/CSC 10300

Contact Hours

3 hr./wk.

DSE I1030 Applied Statistics

This course will examine real data sets from a variety of domains, examine multiple models for these data sets, assess the validity of modeling assumptions, and determine the strength of conclusions that can be drawn. Topics to be covered include: 1) inferential statistics (such as hypothesis testing and estimation in parametric and nonparametric settings, conditional inference, resampling methods, cross-validation, and multiple hypothesis testing); 2) experimental design (analysis of variance); 3) Bayesian statistics (such as prior distributions, posterior and predictive inference, and Bayesian model comparison); 4) regression and prediction (such as elements of linear and nonparametric regression, assessment of variable importance, and an introduction to causal inference). The course will include project-based learning and use a statistical programming language such as R or Python. A strong emphasis will be placed on the critical analysis of modeling assumptions in real-world settings.

Credits

3

Prerequisites

CSC 10200/CSC 10300

Contact Hours

3 hr./wk.

DSE I2100 Applied Machine Learning and Data Mining

Introduction to machine learning, data mining, and statistical pattern recognition. Topics include: 1) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks, deep learning); 2) Unsupervised learning (clustering, non-parametric techniques, dimensionality reduction); 3) Best practices in machine learning (bias/variance theory, model selection and evaluation, resampling). In this class, you will learn about the most effective machine learning techniques and gain practice implementing them and getting them to work for yourself. More importantly, you will learn not only the theoretical underpinnings of learning but also the practical know-how needed to apply these techniques quickly and effectively to new problems.

Credits

3

Prerequisites

DSE I1020 and DSE I1030, or equivalents.

Contact Hours

3 hr./wk.

DSE I2400 Data Engineering: Infrastructure and Applications

This course will train students in the handling of big data sources derived from various environments, including traditional business activities, web-based transactions, and social media. The course will also discuss the range of data formats, application types, and emerging approaches in data integration. As part of this, it will introduce the range of research topics and mentors participating in the Data Science and Engineering Program and offering capstone project opportunities. The course will begin with a discussion of high-end traditional database systems, focusing on query processing, crash recovery, and transaction and concurrency control. This will be followed by a detailed look at object-relational databases, distributed and federated databases, and cloud-based data warehousing. NoSQL databases (e.g., Cassandra and Neo4j) and parallel data analysis tools (e.g., Hadoop, Spark) will be introduced. The main emphasis of the course is hands-on training in state-of-the-art software development environments. Project-based system development work will be an essential component of the course.

Credits

3

Prerequisites

DSE I1020 and DSE I1030, or equivalents.

Contact Hours

3 hr./wk.

DSE I2450 Big Data and Scalable Computation

The course aims to provide a broad understanding of big data and the current technologies for managing and processing it, with a focus on the urban environment. With storage and networking getting significantly cheaper and faster, big data sets can easily reach the hands of data enthusiasts with just a few mouse clicks. These enthusiasts could be policy makers, government employees, or managers who would like to draw insights and (business) value from big data. Thus, it is crucial for big data to be made available to non-expert users in such a way that they can process the data without the help of a supercomputing expert. One such approach is to build big data programming frameworks that handle big data in a paradigm as close as possible to the one used for “small data.” Such a framework should also be as simple as possible, even if it is not as efficient as custom-designed parallel solutions. Users should expect that if their code works within these frameworks for small data, it will also work for big data. General topics of this course include: big data ecosystems, parallel and streaming programming models, MapReduce, Hadoop, Spark, Pig, and NoSQL solutions. Hands-on labs and exercises will be offered throughout to reinforce the material covered in each module.

Credits

3

Prerequisites

DSE I1020 and DSE I1030, or equivalents.

Contact Hours

3 hr./wk.

DSE I2700 Visual Analytics

This course will give an overview of visual analytics as well as the overlapping fields of information and scientific visualization. Students will learn to programmatically process and analyze data with Python libraries widely used in statistics, engineering, science, and finance. We will cover the design of effective visualizations. Students will learn to build data visualizations directly using a variety of Python data visualization libraries such as matplotlib, seaborn, and bokeh, and to build interactive web-based visual analytics using JavaScript and D3. Project groups of students will each propose, design, and build a visualization of a data set. The goals of the course are for students to: (1) Recognize the appropriate applications and value of visualizations; (2) Critically evaluate visualizations and suggest improvements and refinements; (3) Apply a structured design process to create effective visualizations; (4) Use programmatic tools to scrape, clean, and process data; (5) Use principles of human perception and cognition in visual analytics design; (6) Use visual analytics and statistics tools to explore data; and (7) Create web-based interactive visualizations.

Credits

3

Prerequisites

DSE I1020 and DSE I1030, or equivalents. This course also requires that students have programming experience such as CSC 10200/CSC 10300 or equivalent.

Contact Hours

3 hr./wk.

DSE I9800 Capstone Project

A capstone project is an experimental project carried out under the direction of a faculty advisor. All students will register and submit a project report after one semester to receive a grade. Students may work together on the same data sets and challenges but must establish separate subprojects and submit individual reports/theses. These independent study projects should involve an analysis of a data set in an application field using statistical learning/data mining techniques such as non-linear regression, supervised/unsupervised learning, dimension reduction, reinforcement learning, or collaborative filtering, or big data methodology such as MapReduce/Spark.

Credits

3

Prerequisites

DSE I1020, DSE I1030, and DSE I2400.

Contact Hours

3 hr./wk.

DSE I9900 Capstone Thesis

Students, with approval from their mentor and the program, may register for a two-semester independent study (capstone thesis) with specifications similar to those of DSE I9800 Capstone Project but of substantially larger scope. Students will be required to submit a project report to the program at the end of the first semester in addition to the thesis at the end of the course.

Credits

6

Prerequisites

DSE I1020, DSE I1030, DSE I2400, and DSE I9800.

Contact Hours

3 hr./wk.