Probability and Statistics for Data Analysis
This module will teach the probabilistic and statistical foundations that underpin the MSc Data Analytics. It begins with some of the essential theoretical notions of probability and with the distributions of random variables on which statistical methods are based. It then describes different types of statistical hypothesis tests and addresses how and when to use them. This material is essential for data analytics in applications of statistics in psychology, the life and physical sciences, business and economics.
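To make the idea of a hypothesis test concrete, here is a minimal Python sketch of a two-sided one-sample z-test for a mean. It assumes that the population standard deviation is known. The function name `one_sample_z_test` and the data are hypothetical illustrations, not material from the module.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: population mean == mu0, with known sd sigma."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical measurements; test H0: mu = 10, assuming sigma = 2 is known.
data = [11.2, 10.8, 12.1, 9.7, 11.5, 10.9, 12.4, 11.0, 10.3, 11.8]
z, p = one_sample_z_test(data, mu0=10.0, sigma=2.0)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

If sigma had to be estimated from the sample, a t-test would be the usual choice instead.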
Storing, Manipulating and Visualising Data
The ability to store, manipulate and display data in appropriate ways is of great importance to data scientists. This module will introduce you to many of the most widely used techniques in the field. The emphasis of this module is primarily on the interactive use of various IT tools, rather than on programming as such, although in a number of cases you will learn how to develop short programs (scripts) to automate various tasks.
This module aims to introduce you to the Bayesian paradigm. It will show you some of the problems with frequentist statistical methods, demonstrate that the Bayesian paradigm provides a unified approach to problems of statistical inference and prediction, enable you to make Bayesian inferences in a variety of problems, and illustrate the use of Bayesian methods in real-life examples.
- The Bayesian paradigm: likelihood principle, sufficiency and the exponential family, conjugate priors, examples of prior-to-posterior analysis, mixtures of conjugate priors, non-informative priors, two-sample problems, predictive distributions, constraints on parameters, point and interval estimation, hypothesis tests, nuisance parameters.
- Linear models: use of non-informative priors, normal priors, two- and three-stage hierarchical models, examples of the one-way model, exchangeability between regressions, growth curves, outliers and influential observations.
- Approximate methods: normal approximations to posterior distributions, Laplace’s method for calculating ratios of integrals, Gibbs sampling, finding full conditionals, constrained parameter and missing data problems, graphical models. Advantages and disadvantages of Bayesian methods.
- Examples: appropriate examples will be discussed throughout the course. Possibilities include epidemiological data, randomised clinical trials, radiocarbon dating.
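As a small example of prior-to-posterior analysis with a conjugate prior, the sketch below updates a Beta prior on a success probability after binomial data. It uses the standard result that a Beta(alpha, beta) prior combined with s successes in n trials gives a Beta(alpha + s, beta + n - s) posterior. The helper names and the trial counts are hypothetical.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Return the posterior Beta parameters after binomial data.

    A Beta(alpha, beta) prior on a success probability theta, combined with
    `successes` out of `trials` independent Bernoulli observations, gives a
    Beta(alpha + successes, beta + trials - successes) posterior.
    """
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return alpha + successes, beta + (trials - successes)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Uniform Beta(1, 1) prior; hypothetical data: 7 successes in 20 trials.
a_post, b_post = beta_binomial_update(1, 1, successes=7, trials=20)
print(a_post, b_post)                       # 8 14
print(round(beta_mean(a_post, b_post), 4))  # 0.3636
```

Under this model the posterior mean is also the posterior predictive probability that the next trial is a success.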
Computational Statistics with R
This module introduces modern methods of statistical inference for small samples that rely on computational analysis rather than asymptotic theory. Some of these methods, such as permutation tests and bootstrapping, are now used regularly in modern business, finance and science.
The techniques developed will be applied to a range of problems arising in business, economics, industry and science. Data analysis will be carried out using R, a user-friendly but comprehensive statistics package.
- Probability density functions: the empirical CDF; Q-Q plots; histogram estimation; kernel density estimation.
- Nonparametric tests: permutation tests; randomisation tests; link to standard methods; rank tests.
- Data splitting: the jackknife; bias estimation; cross-validation; model selection.
- Bootstrapping: the parametric bootstrap; the simple bootstrap; the smoothed bootstrap; the balanced bootstrap; bias estimation; bootstrap confidence intervals; the bivariate bootstrap; bootstrapping linear models.
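The simple bootstrap can be sketched in a few lines. The example below is written in Python rather than R, and the helper names, the sample and the choice of 2000 resamples are assumptions for illustration. It estimates the standard error of the mean and a 95% percentile confidence interval.

```python
import random
from statistics import fmean, stdev


def bootstrap(data, statistic, n_boot=2000, seed=0):
    """Simple (nonparametric) bootstrap: recompute `statistic` on resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]


def percentile_interval(replicates, level=0.95):
    """Percentile bootstrap confidence interval from the sorted replicates."""
    reps = sorted(replicates)
    k = round((1 - level) / 2 * len(reps))  # number of replicates cut from each tail
    return reps[k], reps[len(reps) - 1 - k]


# Hypothetical small sample.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
reps = bootstrap(sample, fmean)
print(f"sample mean = {fmean(sample):.3f}")
print(f"bootstrap SE of the mean = {stdev(reps):.3f}")
print("95% percentile interval:", percentile_interval(reps))
```

Passing a different function as `statistic` (for example `statistics.median`) bootstraps that statistic instead, which is where the method is most useful.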
Financial Data Analytics
This module will provide students with a general understanding of current applications of data analytics to finance, in particular to derivatives and investment banking. It will introduce a range of analytical tools such as volatility surface management, yield curve evolution and FX volatility/correlation management. It will also provide you with an overview of some standard tools in the field, such as Python, R, Excel/VBA and the Power BI Excel functionality. Students are not expected to have any familiarity with coding or with any of the topics above, as the module will develop these from scratch. It will give you the understanding of the field needed to prepare for a career in finance, in roles such as trading, structuring, management, risk management and quantitative positions in investment banks and hedge funds.
Graphs and Networks
This module addresses one of the most important “hot topics” in mathematics research, the study of networks, and is essential for understanding the characteristics and universal structural properties of complex networks. Complex networks are usually the outcome of stochastic dynamics, but they are not completely random. You will learn how to disentangle randomness from the structural organisational principles of complex networks, and how several major types of complex network can be described and artificially generated by mathematical models. Networks characterise the underlying structure of a large variety of complex systems, from the Internet to social networks and the brain. This course is designed to teach students the mathematical language needed to describe complex networks, their basic properties and their dynamics. The broad aim is to provide students with the key skills required for fundamental research in complex networks and for the application of network theory to specific network problems arising in academic or industrial environments. Students will acquire experience in solving problems related to complex networks and will learn the language needed to formulate models of network-embedded systems.
- Basic concepts used in studying complex networks (e.g. adjacency matrices, degree distributions and correlations, graph distances)
- Basic tools used to study complex networks (e.g. connected components, k-cores, communities, motifs, centrality measures)
- Models for complex networks: the small-world model, growing network models and the configuration model
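Two of the basic quantities listed above, node degrees (and hence the degree distribution) and k-cores, can be computed from an edge list as sketched below. This is a simple quadratic-time peeling version written for clarity, assuming an undirected simple graph. The function names and the toy graph are hypothetical.

```python
from collections import Counter, defaultdict


def degrees(edges):
    """Degree of every node in an undirected simple graph given as an edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def core_numbers(edges):
    """Core number of each node, found by repeatedly removing a minimum-degree node."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        node = min(remaining, key=deg.__getitem__)
        k = max(k, deg[node])  # the core level never decreases during peeling
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                deg[nbr] -= 1
    return core


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to a.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
print(degrees(edges))                    # {'a': 3, 'b': 2, 'c': 2, 'd': 1}
print(Counter(degrees(edges).values()))  # degree distribution: degree -> count
print(core_numbers(edges))
```

The triangle forms the 2-core, while the pendant node d only belongs to the 1-core.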
Time Series Analysis for Business
Time Series Analysis refers to the use of statistical and machine learning methods for inference on datasets containing variables collected over time, with the ultimate goal of forecasting the values of these variables at some future time.
This module introduces key concepts such as trend and seasonality decomposition, autocorrelation, autoregressive and moving-average models, and exponential smoothing methods. Tutorials focus on the use of the R software environment in the analysis of real-world time series data.
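Autocorrelation measures how strongly a series is correlated with lagged copies of itself. A minimal Python sketch of the usual sample autocorrelation estimator (each lagged sum divided by n) follows. The function name and the toy period-4 series are assumptions for illustration; the module itself uses R.

```python
from statistics import fmean


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag (lagged sums divided by n)."""
    n = len(x)
    mean = fmean(x)
    dev = [xi - mean for xi in x]
    c0 = sum(d * d for d in dev) / n  # sample variance with divisor n
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
        for k in range(1, max_lag + 1)
    ]


# Hypothetical series with a period-4 seasonal pattern.
series = [0, 1, 0, -1] * 6
for lag, r in enumerate(sample_acf(series, 4), start=1):
    print(f"lag {lag}: {r:+.3f}")
```

The strong negative autocorrelation at lag 2 and positive autocorrelation at lag 4 reflect the period-4 seasonality.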
This module introduces you to the fundamentals of modern time series analysis. We aim to be comprehensive, covering both the theory and the applications of time series models that are widely used in practice. We will use R and RStudio as our main software for data analysis, and you will gain hands-on experience in applying the methods learned to real-world case studies.
- Overview of important features in time series data and how they correspond to real-world events
- Introduction to R and RStudio, essential tools for modern statistical computing
- R libraries for time series analysis and their use in model building, selection and diagnosis
- Data pre-processing: methods to remove trend and seasonality, and variance-stabilising transformations
- Fundamentals of time series models: weakly stationary processes such as autoregressive (AR) and moving-average (MA) models, integrated (I) processes, and their various combinations (e.g. ARMA and ARIMA)
- Time series forecasting: theory, methods and case studies using R
- Review of other models used in practice for complex data such as vector MA/AR, state-space and recurrent neural networks (RNN).
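As a sketch of model fitting and forecasting for the simplest autoregressive model, the code below simulates a hypothetical AR(1) series, estimates the coefficient by least squares on the mean-centred series, and produces h-step-ahead point forecasts mu + phi^h (x_n - mu). The simulation settings and function names are assumptions for illustration, not part of the course materials.

```python
import random
from statistics import fmean


def fit_ar1(x):
    """Least-squares fit of x_t - mu = phi * (x_{t-1} - mu) + e_t, with mu set to the sample mean."""
    mu = fmean(x)
    d = [xi - mu for xi in x]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    return mu, num / den


def forecast_ar1(x, mu, phi, horizon):
    """Point forecasts mu + phi**h * (x_n - mu) for h = 1, ..., horizon."""
    return [mu + phi ** h * (x[-1] - mu) for h in range(1, horizon + 1)]


# Simulate a hypothetical AR(1) series with phi = 0.7, mean 5 and N(0, 1) noise.
rng = random.Random(42)
series = [5.0]
for _ in range(1999):
    series.append(5.0 + 0.7 * (series[-1] - 5.0) + rng.gauss(0.0, 1.0))

mu_hat, phi_hat = fit_ar1(series)
print(f"mu_hat = {mu_hat:.3f}, phi_hat = {phi_hat:.3f}")
print("3-step forecasts:", forecast_ar1(series, mu_hat, phi_hat, horizon=3))
```

Because |phi| < 1 for a stationary AR(1), the forecasts decay geometrically towards the estimated mean as the horizon grows.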
Machine Learning with Python
This module will introduce you to some of the most widely used techniques in machine learning (ML). After reviewing the necessary background mathematics, we will investigate various ML methods, such as linear regression, polynomial regression and classification with logistic regression. The module covers a very wide range of practical applications, with an emphasis on hands-on numerical work using Python. By the end of the module, you will be able to formalise an ML task, choose an appropriate method to process it numerically, implement the ML algorithm in Python, and assess the method’s performance.
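Classification with logistic regression can be illustrated with a minimal Python sketch that fits a one-feature model by batch gradient descent on the average log-loss. The learning rate, iteration count, function names and data are hypothetical choices for illustration; libraries such as scikit-learn provide production implementations.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.5, n_iter=1000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        # Residuals p_i - y_i give the gradient of the average negative log-likelihood.
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(x, w, b):
    """Classify as 1 when the fitted probability is at least 0.5."""
    return int(sigmoid(w * x + b) >= 0.5)


# Hypothetical one-feature training data: label 1 for larger x.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
print([predict(x, w, b) for x in xs])
```

With perfectly separable data like this, the maximum-likelihood weight is unbounded, so w keeps growing slowly as iterations are added; regularisation is the usual remedy.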
Optimisation for Business Processes
Optimisation refers to the selection of the best alternative, according to some criterion, from a set of available alternatives.
This module introduces standard models from mathematical optimisation, such as network flows and linear programmes, and their use in solving real-world optimisation problems in staff and project scheduling, commodity trading, production and sales. Tutorials focus on modelling real-world optimisation problems from data, and on using software such as R, Excel and Gurobi to solve optimisation problems and make better decisions.
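Network flow models are among the standard optimisation models mentioned above. As an illustration, the sketch below computes a maximum flow with the Edmonds-Karp algorithm (repeated breadth-first searches for augmenting paths) in plain Python. The graph, capacities and function name are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Maximum flow value via Edmonds-Karp (shortest augmenting paths found by BFS).

    `capacity` maps directed arcs (u, v) to non-negative capacities.
    """
    residual, adj = {}, {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)  # reverse arc for cancelling flow
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    flow = 0
    while True:
        # Breadth-first search for a path with spare residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left: the flow is maximal

        # Walk back from the sink to collect the path's arcs and its bottleneck.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[arc] for arc in path)

        # Push the bottleneck amount along the path and update reverse arcs.
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck


# Hypothetical network: arc capacities, e.g. shipment limits between sites.
caps = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
print(max_flow(caps, "s", "t"))  # 5
```

By the max-flow min-cut theorem, the value 5 equals the capacity of the cut separating s from all other nodes (3 + 2).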
Programming in Python
This module introduces you to the Python programming language. After learning about data types, variables and expressions, you will explore the most important features of the core language including conditional branching, loops, functions, classes and objects. We will also look at several of the key packages (libraries) that are widely used for numerical programming and data analysis.
Advanced Machine Learning
This module builds on the earlier module "Machine Learning with Python", covering a number of advanced techniques in machine learning, such as dimensionality reduction, support vector machines, decision trees, random forests, and clustering. Although the underlying theoretical ideas are clearly explained, this module is very hands-on, and you will implement various applications using Python in the weekly coursework assignments.
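Clustering is one of the advanced techniques listed. A minimal sketch of k-means clustering via Lloyd's algorithm (alternating assignment and centroid-update steps) is shown below. The initial centroids are supplied by hand to keep the example deterministic, and the points and function name are hypothetical.

```python
from math import dist


def kmeans(points, centroids, n_iter=100):
    """Lloyd's algorithm for k-means, starting from user-supplied centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

Lloyd's algorithm only finds a local optimum that depends on the starting centroids, so in practice several random starts or k-means++ initialisation are common.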
SAS for Business Intelligence
This module is key for students wishing to further their understanding of the visualisation techniques used in business decision processes using the powerful SAS Visual Analytics software.
You will apply the power of SAS analytics to large amounts of data, gain insight into visualisation techniques that uncover relevant patterns, and learn to make quicker, better-informed decisions.
Topics in Scientific Computing
This module focuses on the use of computers to solve applied mathematical problems. Its aim is to provide you with proper computational tools for problems you are likely to encounter during your MSc, and to give you a sound understanding of a programming language used in the applied sciences. Topics include the basics of scientific programming, numerical solution of ordinary differential equations, random numbers and Monte Carlo methods, simulation of stochastic processes, and algorithms for complex network analysis and modelling. The emphasis of the module is on numerical aspects of mathematical problems, with a focus on applications rather than theory.
- The use of computers for solving applied mathematical problems
- Proper computational tools to solve problems likely to be encountered during the MSc
- Training in a programming language used in applied sciences
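For the numerical solution of ordinary differential equations, a typical first example is the classical fourth-order Runge-Kutta method. The sketch below applies it to dy/dt = y with y(0) = 1, whose exact solution at t = 1 is e. The step count and function name are assumptions for illustration, and the module may use a different language.

```python
import math


def rk4(f, t0, y0, t_end, n_steps):
    """Classical fourth-order Runge-Kutta for a scalar ODE dy/dt = f(t, y)."""
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


# Test problem dy/dt = y, y(0) = 1, whose exact solution is exp(t).
approx = rk4(lambda t, y: y, 0.0, 1.0, 1.0, 20)
print(approx, math.e, abs(approx - math.e))
```

Halving the step size should reduce the error by roughly a factor of 16, reflecting the method's fourth-order accuracy.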
Data Analytics Project and Dissertation
Each Data Analytics MSc student is required to complete a 60-credit project dissertation. A typical MSc project dissertation consists of about 30 word-processed pages covering a specific research-level topic in data analytics. It usually requires the student to understand, explain and elaborate on results from one or more journal articles and/or to perform computations, simulations or analysis. An MSc project should help prepare a good student for PhD research or independent work in industry, and may even give an excellent student the opportunity to do some research of their own.
MSc dissertation projects offered by the School of Mathematical Sciences cover a large variety of scientific topics, among them time series analysis, exploratory data analysis of a dataset, performance and comparative analysis of state-of-the-art techniques, theoretical models of data, complex systems, dynamical systems, topological data analysis, experimental design with data, and statistical aspects of data analytics techniques.