Master of Science in Data Science

Academic Requirements: Applicants to our M.S. program should have completed two semesters of calculus or two semesters of engineering calculus.

Application is found here

Updated 9/18/2023

1. Overview

The Master of Science program in data science requires 30 s.h. of graduate credit. It aims to train the next generation of data scientists with the analytical and technical skills to explore, formulate and solve complex data-driven problems in science, industry, business and government. The program focuses on the theory, methodology, application and ethics of working with and learning from data. Students will learn to develop and implement new or special-purpose analysis and visualization tools, and will gain a fundamental understanding of how to quantify uncertainty in data-driven decision-making.

The coursework includes six core courses covering the fundamentals of data science, including probability and statistics; data storage, access, and management; and data visualization, exploration, modeling, analysis and uncertainty quantification. Through a required capstone project, students gain hands-on experience in solving real-world problems and develop communication skills and an understanding of data ethics. Students also choose three electives (9 s.h.), based on their interests and career goals, from a wide variety of courses on specialized data science topics offered by Statistics, Biostatistics, Computer Science and Business Analytics.

M.S. students in data science must maintain a g.p.a. of at least 2.75 in all work toward the degree and in additional relevant coursework.

2. Courses

The M.S. with a major in data science requires the following coursework.

All of these:

Course                 Title                                     Offered             s.h.
DATA:4750              Probabilistic Statistical Learning        Spring              3
DATA:5890              M.S. Data Science Practicum               Spring and Fall     2
DATA:3120 / STAT:3120  Probability and Statistics                Fall                4
DATA:3200 / STAT:3200  Applied Linear Regression                 Spring and Fall     3
DATA:4540 / STAT:4540  Statistical Learning                      Fall                3
DATA:4580 / STAT:4580  Data Visualization and Data Technologies  Spring              3
DATA:5400 / STAT:5400  Computing in Statistics                   Fall                3

At least 9 s.h. from these:

Course                 Title                                     Offered             s.h.
ACTS:6200 / DATA:6200  Predictive Analytics                      Spring              3
BAIS:6100              Text Analytics                            TBD                 3
BAIS:6130              Applied Optimization                      TBD                 3
BAIS:6210              Data Leadership and Management            TBD                 3
BIOS:4150              (NEW) Data Science Foundations in R       TBD                 3
BIOS:6721              Machine Learning for Biomedical Data
CS:4310                Design and Implementation of Algorithms   TBD                 3
CS:4400                Database Systems                          TBD                 3
CS:4420                Artificial Intelligence                                       3
CS:4470                Health Data Analytics                     TBD                 3
CS:5110                Introduction to Informatics               TBD                 3
CS:5430                Machine Learning                          TBD                 3
CS:5630                Cloud Computing Technology                TBD                 3
DATA:3210 / STAT:3210  Experimental Design and Analysis          Fall                3
DATA:4880              Data Science Creative Component           Fall                2
MATH:4840              Mathematics of Machine Learning           TBD                 3
STAT:4520              Bayesian Statistics                       Fall                3
STAT:4560              Statistics for Risk Modeling I            Fall                3
STAT:5810              Research Data Management                  Fall                3
STAT:6220              Statistical Consulting                    Spring              3
STAT:6530              Environmental and Spatial Statistics      Every other Fall    3
STAT:6550              Introductory Longitudinal Data Analysis   Fall                3
STAT:6560              Applied Time Series Analysis              Every other Spring  3
STAT:7400              Computer Intensive Statistics             Spring              3

=================

 

3. Sample Schedule for MS Students in Data Science

 

Year 1 Fall Semester

STAT:3120 Probability and Statistics

STAT:3200 Applied Linear Regression

STAT:4540 Statistical Learning

STAT:5400 Computing in Statistics

 

Year 1 Spring Semester

DATA:4750 Probabilistic Statistical Learning

STAT:4580  Data Visualization and Data Technologies

Two electives (or one, if two are taken in Year 2 Fall)

 

Year 2 Fall Semester

DATA:5890 MS Data Science Practicum*

One elective (or two)

* Students may substitute an appropriate internship or summer work experience for DATA:5890, with pre-approval from the course instructor.

 

Probability and Statistics (STAT:3120, 4 s.h.). Basic concepts of probability, statistical models, discrete and continuous random variables and their distributions, expectations, conditional expectations, estimation of parameters, testing statistical hypotheses.

Applied Linear Regression (STAT:3200, 3 s.h.). Regression analysis with focus on applications; model formulation, checking, selection; interpretation and presentation of analysis results; simple and multiple linear regression; logistic regression; ANOVA; hands-on data analysis with computer software.

Statistical Learning (STAT:4540, 3 s.h.). Introduction to supervised and unsupervised statistical learning, with a focus on regression, classification, and clustering; methods are applied to real data using appropriate software. Supervised learning topics include linear and nonlinear (e.g., logistic) regression, linear discriminant analysis, cross-validation, bootstrapping, model selection, regularization methods (e.g., ridge and lasso), generalized additive and spline models, tree-based methods, random forests and boosting, and support-vector machines. Unsupervised learning topics include principal components and clustering. Requirements: an introductory statistics course and a regression course. Recommendations: prior exposure to programming and/or software such as R, SAS, or MATLAB.

Data Visualization and Data Technologies (STAT:4580, 3 s.h.). Introduces common techniques for visualizing univariate and multivariate data, data summaries, and modeling results. Students will learn how to create and interpret these visualizations, and to assess the effectiveness of different visualizations based on an understanding of human perception and statistical thinking. Data technologies for obtaining and preparing data for visualization and further analysis will also be discussed. Students will also learn how to present their results in written reports and to use version control to manage their work.

Computing in Statistics (STAT:5400, 3 s.h.). Python, R; database management; graphical techniques; importing graphics into word-processing documents (e.g., LaTeX); creating reports in LaTeX; SAS; simulation methods (Monte Carlo studies, bootstrap, etc.).

Probabilistic Statistical Learning (DATA:4750, 3 s.h.). Focuses on the machine learning and statistical ideas that are essential for analyzing large, complex modern data. Selected supervised learning topics include linear models, deep neural networks, and nonparametric models. The course also covers essential topics in nonlinear dimension reduction, clustering, and recommender systems.

Master’s Second-Year Core Course: M.S. Data Science Practicum (DATA:5890, 2 s.h.). Each student is supervised by a faculty member in completing a capstone project that solves a real-world problem using knowledge gained from the core courses. Students must submit a written report and give an oral presentation of their project. The written report must include the background and significance of the problem; the analysis method; presentation and interpretation of the results, including tables and visualizations; discussion; and references, plus appendices containing technical details and documentation of the computer code used in the analysis. A capstone committee of three faculty members evaluates the capstone projects and assigns the final grades (S or U), with input from the supervising faculty members.