Data Mining, Bioinformatics, Data Warehousing

Steering Committee Coordinator:
Please contact Dr. Sumeet Dua regarding any questions about the Data Mining track in CAM.

Core Courses:
Math Core (9 SCH): (Common to all Tracks)
MATH 414 & MATH 415 with either MATH 407 or STAT 620/STAT 621

MATH 407 (Partial Differential Equations)
3 Credit Hours. Preq, MATH 245. Solution of linear first order equations. Formation and solution of second order problems of parabolic, elliptic, and hyperbolic type. (G)

MATH 414 (Numerical Analysis)
3 Credit Hours. Preq, MATH 245, knowledge of a programming language. Roots of polynomial and other nonlinear equations. Interpolating polynomials. Numerical differentiation. Numerical integration. Direct methods for solving linear systems. (G)

MATH 415 (Numerical Analysis)
3 Credit Hours. Preq, MATH 245, knowledge of a programming language. Numerical applications of linear algebra. Curve fitting. Function approximation. Numerical solution of systems of equations, differential equations, systems of differential equations, boundary value problems. (G)

STAT 620 (Theory of Probability)
3 Credit Hours. Preq, any 500-level STAT course and MATH 244. Combinatorial analysis, conditional probability, distribution theory, random variables, random vectors, limit theorems, and random walks.

STAT 621 (Theory of Statistics)
3 Credit Hours. Preq, STAT 520 or 620. Point estimation, interval estimation, statistical hypotheses, statistical tests, nonparametric inference, and normal distribution theory.

CS Core (6 SCH): (Common to all Tracks)
CSC 428 & CSC 438

CSC 428 (Object Oriented Programming and Data Structures)
3 Credit Hours. Preq, consent of instructor. Programming paradigms, syntax, semantics, data types, expressions, control statements, and subprograms; object-oriented concepts, abstract data types, recursion, queues, and trees. (G)

CSC 438 (Special Topics in Software Development)
3 Credit Hours. Preq, consent of instructor. Selected topics in the area of software design that are of current importance or special interest. (G)

Supporting Core (CS - 3 SCH, MATH - 6 SCH)
CSC 579, STAT 625 or QA 610, MATH 435

CSC 579 (Data Mining and Knowledge Discovery)
3 Credit Hours. Preq, CSC 325/equivalent or consent of instructor. Topics include: introduction to Data Mining (DM), knowledge discovery in large databases, data preprocessing and normalization, dimensionality reduction, DM primitives, mining frequent itemsets in large DBMS, association rule mining, classification and evaluation measures.

STAT 625 (Multivariate Statistics)
3 Credit Hours. Preq, STAT 506, 507, 508, or equivalent. Tests of hypotheses on means, multivariate analysis of variance, canonical correlation, principal components, factor analysis, and computer applications.

QA 610 (Multivariate Statistics: Business Applications)
3 Credit Hours. Preq, QA 522. Regression extensions, canonical correlation, multivariate ANOVA, discriminant analysis, business applications, principal components using SAS, SPSS, and BMD, factor and cluster analysis.

MATH 435 (Introduction to Graph Theory)
3 Credit Hours. Preq, MATH 307, 311, or 318. Fundamental concepts of undirected and directed graphs, trees, connectivity, planarity, colorability, network flows, Hamiltonian and Eulerian graphs, matching theory and applications. (G)

Suggested Elective Courses (21 SCH total)
(Please see advisor)

CSC 582 or CSC 585, CSC 557, STAT 620, STAT 506, CAM 657

CSC 557 (Special Topics: Computer Science) 3 Credit Hours (9). The topic or topics will be selected by the instructor from the various sub-areas of computer science. May be repeated as topics change.

CSC 580 (Advanced Data Mining and Applications) 3 Credit Hours. Preq, CSC 493/CSC 579/equivalent or consent of instructor. Topics include: applications of data mining, clustering high-dimensional data, data and information integration principles, bioinformatics data mining, image data mining and content-based image retrieval, spatio-temporal data structures and usage.

STAT 506 (Regression Analysis) 3 Credit Hours. Preq, STAT 405 or equivalent. Simple and multiple regression, inferences in regression, model formulation and diagnostics, analysis of covariance, nonlinear models, estimation and inference. Use of computers in data analysis.

STAT 652 (Stochastic Process) 3 Credit Hours. Preq, MATH 244, MATH 308, and STAT 520. Probability generating functions, Markov chains, renewal processes, Poisson processes, branching processes.

Directed Study (6 SCH)
CAM 650 or equivalent courses with other prefixes.

Qualifying Exam
CS: CAM 686

Dissertation (18 SCH)
CAM 651

Total (72 SCH) = MATH 15 SCH + CS 9 SCH + Elective 21 SCH + Directed Study 6 SCH + CAM 610 3 SCH + Dissertation 18 SCH.