Our course finder pages contain all the most up-to-date information about the Data Analytics MSc, including details of the programme structure, compulsory and elective modules and study options.

Below is a full list of all modules which are expected to be available to students on this programme across the semesters. Please note that this is for information only and may be subject to change. Click the link above for accurate information about which of these modules are compulsory and elective for each semester of your MSc programme.

This module is offered to allow you to move beyond the basic techniques of Machine Learning, and is a core component of the MSc Data Analytics. Together with the initial module (Machine Learning with Python), this course will provide a comprehensive overview of Machine Learning and its mathematical foundations as well as an introduction to the current state of the art in the field.

The aim of this module is to introduce students to more advanced machine learning techniques. An emphasis will be on current techniques which are relevant for practical applications. In addition to practical programming assignments, the course will also give you an understanding of the mathematical underpinning of the techniques and the limitations of the methods which are crucial to correctly assessing their performance.

Building on the module in semester A, linear methods will be extended to non-linear settings using kernel methods. The module will also go into greater depth on topics introduced in the first semester, such as neural networks and Markov chain Monte Carlo (MCMC) methods. It will cover specific applications and provide students with an overview of current state-of-the-art techniques.

The module aims to introduce you to the Bayesian paradigm. The module will show you some of the problems with frequentist statistical methods, show you that the Bayesian paradigm provides a unified approach to problems of statistical inference and prediction, enable you to make Bayesian inferences in a variety of problems, and illustrate the use of Bayesian methods in real-life examples.

Topics include:

- The Bayesian paradigm: likelihood principle, sufficiency and the exponential family, conjugate priors, examples of prior to posterior analysis, mixtures of conjugate priors, non-informative priors, two sample problems, predictive distributions, constraints on parameters, point and interval estimation, hypothesis tests, nuisance parameters.
- Linear models: use of non-informative priors, normal priors, two and three stage hierarchical models, examples of one way model, exchangeability between regressions, growth curves, outliers and influential observations.
- Approximate methods: normal approximations to posterior distributions, Laplace’s method for calculating ratios of integrals, Gibbs sampling, finding full conditionals, constrained parameter and missing data problems, graphical models. Advantages and disadvantages of Bayesian methods.
- Examples: appropriate examples will be discussed throughout the course. Possibilities include epidemiological data, randomised clinical trials, radiocarbon dating.
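
As a small illustration of the "prior to posterior analysis" topic above, the sketch below updates a conjugate Beta prior with binomial data. It is not taken from the module materials: the function names and the data (7 successes, 3 failures) are hypothetical, and Python is used only for illustration.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical data: 7 successes and 3 failures, uniform Beta(1, 1) prior.
post_alpha, post_beta = beta_binomial_update(1.0, 1.0, successes=7, failures=3)

# For this model the posterior mean is also the predictive probability
# that the next trial is a success.
posterior_mean = beta_mean(post_alpha, post_beta)
print(post_alpha, post_beta, round(posterior_mean, 4))
```

With a uniform prior the posterior is Beta(8, 4), so the posterior mean is 8/12, slightly shrunk towards 0.5 relative to the raw proportion 0.7.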

Complex systems can be defined as systems involving many coupled units whose collective behaviour is more than the sum of the behaviour of each unit. Examples of such systems include coupled dynamical systems, fluids, transport or biological networks, and interacting particle systems. The aim of this module is to introduce students to a number of mathematical tools and models used to study complex systems and to explain the mathematical meaning of key concepts of complexity science, such as self-similarity, emergence, and self-organisation. The exact topics covered will depend on the module organiser's expertise, with a view to covering practical applications using analytical and numerical tools drawn from other applied modules.

Topics include:

- Introduction to the field of complex systems via a number of representative examples and models of these systems (e.g., coupled dynamical systems, time-delayed systems, stochastic processes, networks, time series, fractals, multifractals, particle models).
- Introduction to basic tools and quantities used in the study of complex systems (e.g., bifurcation diagram, symbolic dynamics, dimensions, Lyapunov exponents, complexity measures, entropies).
- Introduction to the concepts of emergence and self-organisation in the context of basic models of complex systems.
- Introduction to basic computational and numerical methods used to study complex systems.
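
One of the quantities listed above, the Lyapunov exponent, can be estimated numerically in a few lines. The sketch below is only an illustration under stated assumptions: it uses the logistic map x -> r x (1 - x) as the example system (the module description does not name a specific map), and the function name and default parameters are hypothetical.

```python
import math


def logistic_lyapunov(r, x0=0.4, n_transient=1000, n_iter=100_000):
    """Estimate the Lyapunov exponent of x -> r*x*(1-x).

    The exponent is approximated by the orbit average of log|f'(x)|,
    where f'(x) = r*(1 - 2x), after discarding an initial transient.
    """
    x = x0
    for _ in range(n_transient):
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_iter):
        total += math.log(abs(r * (1.0 - 2.0 * x)))
        x = r * x * (1.0 - x)
    return total / n_iter


# r = 2.5: the orbit settles on a stable fixed point (negative exponent).
# r = 4.0: the map is chaotic (positive exponent).
for r in (2.5, 4.0):
    print(r, logistic_lyapunov(r))
```

A negative value indicates that nearby orbits converge, a positive value that they separate exponentially. For r = 2.5 the exact value is ln 0.5, and for r = 4 the known value is ln 2, so the estimates can be checked against both.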

This module introduces modern methods of statistical inference for small samples, which use computational methods of analysis rather than asymptotic theory. Some of these methods, such as permutation tests and bootstrapping, are now used regularly in modern business, finance and science.

The techniques developed will be applied to a range of problems arising in business, economics, industry and science. Data analysis will be carried out using the user-friendly, but comprehensive, statistics package R.

Topics include:

- Probability density functions: the empirical cdf; q-q plots; histogram estimation; kernel density estimation.
- Nonparametric tests: permutation tests; randomisation tests; link to standard methods; rank tests.
- Data splitting: the jackknife; bias estimation; cross-validation; model selection.
- Bootstrapping: the parametric bootstrap; the simple bootstrap; the smoothed bootstrap; the balanced bootstrap; bias estimation; bootstrap confidence intervals; the bivariate bootstrap; bootstrapping linear models.
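
Two of the topics above, permutation tests and bootstrap confidence intervals, can be sketched briefly. The module itself uses R; Python is used here only for illustration. The function names, the sample values and the choice of a percentile interval are assumptions made for this example, not part of the module description.

```python
import random
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(a, b):
    """Two-sided exact permutation test for a difference in means.

    Enumerates every way of splitting the pooled data into groups of the
    original sizes, so it is only practical for small samples.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    n_extreme = 0
    n_total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        n_total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_total


def bootstrap_percentile_ci(data, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(data), resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    stats = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return stats[lo_idx], stats[hi_idx]


# Hypothetical small samples.
p_value = exact_permutation_pvalue([1, 2, 3], [4, 5, 6])
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9]
ci_low, ci_high = bootstrap_percentile_ci(sample, mean)
print(p_value, ci_low, ci_high)
```

In the permutation example only 2 of the 20 possible splits are at least as extreme as the observed one, which gives a p-value of 0.1. This is the smallest two-sided p-value achievable with two groups of three.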

Each Data Analytics MSc student is required to complete a 60 credit project dissertation. A typical MSc project dissertation consists of about 30 word-processed pages, covering a specific research-level topic in data analytics, usually requiring the student to understand, explain and elaborate on results from one or more journal articles and/or to perform computations, simulations, or analysis. An MSc project may also involve collaboration with a partner based in industry. An MSc project should help prepare a good student for PhD research or independent work in industry and even allow an excellent student the possibility of doing some research.

Possible areas of the MSc dissertation projects offered by the School of Mathematical Sciences include a large variety of different scientific topics, among them time series analysis, exploratory data analysis on a dataset, performance and comparative analysis of state of the art techniques, theoretical models of data, complex systems, dynamical systems, topological data analysis, experimental design with data, and statistical aspects of data analytics techniques.

This module addresses one of the most important “hot topics” in mathematics research – the study of networks – and is essential for understanding the characteristics and universal structural properties of complex networks. Complex networks are usually the outcome of stochastic dynamics, but they are not completely random. You will learn how to disentangle randomness from the structural organisational principles of complex networks and how several major types of complex network can be described and artificially generated by mathematical models. Networks characterise the underlying structure of a large variety of complex systems, from the Internet to social networks and the brain. This course is designed to teach students the mathematical language needed to describe complex networks, their basic properties and dynamics. The broad aim is to provide students with the key skills required for fundamental research in complex networks and for applying network theory to specific network problems arising in academic or industrial environments. Students will acquire experience in solving problems related to complex networks and will learn the necessary language to formulate models of network-embedded systems.

Topics include:

- Basic concepts used in studying complex networks (e.g. adjacency matrices, degree distributions and correlations, graph distances)
- Basic tools used to study complex networks (e.g. connected components, *k*-cores, communities, motifs, centrality measures)
- Models for complex networks: the small-world model, growing network models and the configuration model
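
Several of the basic concepts listed above (adjacency matrices, degree distributions, connected components) can be made concrete with a short sketch. The six-node graph and the function names below are hypothetical, and the code uses plain Python lists rather than any particular network library.

```python
from collections import Counter, deque


def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph on n nodes."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1
    return A


def degrees(A):
    """Degree of each node: the row sums of the adjacency matrix."""
    return [sum(row) for row in A]


def connected_components(A):
    """Connected components found by breadth-first search."""
    n = len(A)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        component = []
        while queue:
            u = queue.popleft()
            component.append(u)
            for v in range(n):
                if A[u][v] and not seen[v]:
                    seen[v] = True
                    queue.append(v)
        components.append(sorted(component))
    return components


# Hypothetical graph: a triangle, a single edge and an isolated node.
A = adjacency_matrix(6, [(0, 1), (1, 2), (2, 0), (3, 4)])
node_degrees = degrees(A)
degree_distribution = Counter(node_degrees)
components = connected_components(A)
print(node_degrees, dict(degree_distribution), components)
```

The degree distribution here is simply a count of how many nodes have each degree. Normalising the counts by the number of nodes gives the empirical distribution usually denoted P(k).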

This module aims to provide students with Machine Learning skills based on the Python programming language as it is currently used in industry. Some of the presented methods are regression and classification techniques (linear and logistic regression, least squares); clustering; and dimensionality reduction techniques such as PCA, SVD and matrix factorisation. More advanced methods such as generalised linear models, neural networks and Bayesian inference using graphical models are also introduced. The course is self-contained in terms of the necessary mathematical tools (mostly probability) and coding techniques. At the end of the course, students will be able to formalise an ML task, choose an appropriate method to tackle it, assess its performance, and implement these algorithms in Python. Independently of the field, skills in Machine Learning and coding are nowadays almost mandatory in many technical careers (academia, engineering, finance, etc.). This course will provide students with practical skills in Python for Machine Learning. A strong focus will be put on practice through exercises and projects in Python, one of the preferred languages in industry.

Topics include:

- Basic probability, statistical inference and optimisation concepts
- Python coding
- Data cleaning, processing and interpretation
- Understanding of the canonical machine learning algorithms
- Scientific report writing (in LaTeX)

Optimisation refers to the selection of the best alternative, according to some criterion, from a set of available alternatives.

This module introduces standard models from mathematical optimisation, such as network flows and linear programmes, and their use in solving real-world optimisation problems in staff and project scheduling, commodity trading, production, and sales. Tutorials focus on modelling real-world optimisation problems based on data, and on the use of software such as R, Excel, and Gurobi to solve optimisation problems and make better decisions.

In business environments the ability to use key software packages is vital, particularly the universally used Microsoft Office portfolio. This module will teach you how to customise and program two key aspects of Microsoft Office used in Analytics: the database package Access and the spreadsheet software Excel.

You will learn Visual Basic for Applications (VBA), the most prevalent programming language in industry, and some Structured Query Language (SQL) for data manipulation. The course is taught by an actuary with 15 years of industry experience in this area.

This module is key for students wishing to further their understanding of the visualisation techniques used in business decision processes using the powerful SAS Visual Analytics software.

You will apply the power of SAS analytics to massive amounts of data, gain valuable insights into visualisation techniques to uncover relevant patterns, and be empowered to make quicker informed decisions.

This module introduces you to the fundamentals of modern time series analysis. We aim to be comprehensive, looking at both theory and applications for different time series models that are widely used in practice. To this end, we will use R and RStudio as our main software for data analysis and you will gain hands-on experience in applying methods learned to real-world case studies.

Topics include:

- Overview of important features in time series data and how they correspond to real-world events
- Introduction to R and RStudio, an essential software environment for modern statistical computing
- Learn different R libraries for time series analysis, and use them for model building, selection and diagnosis
- Data pre-processing: methods to remove trend and seasonality, and variance-stabilising transformations
- Fundamentals of weakly stationary time series models: moving-average (MA), integrated (I), autoregressive (AR) processes and their various combinations
- Time series forecasting: theory, methods and case studies using R
- Review of other models used in practice for complex data such as vector MA/AR, state-space and recurrent neural networks (RNN).

Time Series Analysis refers to the use of statistical and machine learning methods for inference on datasets containing variables collected over time, with the ultimate goal of forecasting the values of these variables at some future time.

This module introduces key concepts such as trend and seasonality decomposition, autocorrelation, autoregressive and moving average models, and exponential methods. Tutorials focus on the use of the R software environment in the analysis of real-world time series data.
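
Two of the concepts named above, autocorrelation and exponential methods, reduce to short formulas. The tutorials use R; the sketch below uses Python only for illustration. The function names, the example series, and the choice of simple exponential smoothing to stand in for "exponential methods" are all assumptions.

```python
def sample_acf(series, lag):
    """Sample autocorrelation at a given lag (biased estimator, common mean)."""
    n = len(series)
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return num / denom


def simple_exponential_smoothing(series, alpha):
    """Smoothed levels l_t = alpha * x_t + (1 - alpha) * l_{t-1}, with l_0 = x_0."""
    level = series[0]
    smoothed = [level]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


# Hypothetical toy series.
acf_lag1 = sample_acf([1, 2, 3, 4, 5], lag=1)
levels = simple_exponential_smoothing([2, 4, 6], alpha=0.5)
print(acf_lag1, levels)
```

In simple exponential smoothing the last smoothed level serves as the one-step-ahead forecast. A larger `alpha` weights recent observations more heavily.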

This module focuses on the use of computers for solving applied mathematical problems. Its aim is to provide you with proper computational tools to solve problems which you are likely to encounter during your MSc, and to develop a sound understanding of a programming language used in applied sciences. The topics covered will include basics of scientific programming, numerical solution of ordinary differential equations, random numbers and Monte Carlo methods, simulation of stochastic processes, and algorithms for complex network analysis and modelling. The emphasis of the module will be on numerical aspects of mathematical problems, with a focus on applications rather than theory.

Topics include:

- The use of computers for solving applied mathematical problems
- Proper computational tools to solve problems likely to be encountered during the MSc
- Training in a programming language used in applied sciences
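
To give a flavour of two of the topics mentioned in the module description (numerical solution of ordinary differential equations and Monte Carlo methods), here is a minimal sketch. The module description does not name its programming language, so Python is an assumption, as are the function names, the test equation dy/dt = -y and the sample sizes.

```python
import math
import random


def euler(f, y0, t0, t1, n_steps):
    """Explicit Euler method for dy/dt = f(t, y) on [t0, t1]; returns y(t1)."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y += h * f(t, y)
        t += h
    return y


def monte_carlo_pi(n_samples, seed=0):
    """Estimate pi from the fraction of random points in the unit quarter-disc."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples


# dy/dt = -y with y(0) = 1 has the exact solution y(t) = exp(-t).
y_end = euler(lambda t, y: -y, y0=1.0, t0=0.0, t1=1.0, n_steps=1000)
pi_estimate = monte_carlo_pi(100_000)
print(y_end, math.exp(-1.0), pi_estimate)
```

Euler's method has a global error proportional to the step size, while the Monte Carlo error decreases like one over the square root of the number of samples.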

This module introduces you to some of the key technologies that are widely used for developing software applications in the financial markets and banking sectors. In particular, we focus on three programming environments/languages (Excel, VBA and C++) which are often used in conjunction to build complete trading and risk management systems. It is a highly practical module, focusing on current industry practice, and therefore you will be well equipped to apply for a programming role in a financial institution.

Topics include:

- Overview of typical requirements for trading and risk management systems
- Introduction to Microsoft Excel, and its use as a ‘front end’ for applications
- Fundamentals of programming in VBA (Microsoft Visual Basic for Applications)
- Manipulating Excel from VBA, the Excel object model
- Review of C++, generation of dynamically-linked libraries (DLLs) used as ‘back ends’ containing computation analytics
- Complete system development (Excel/VBA/C++) of a derivatives pricing tool
- Review of other technologies used in practice, including Java, COM, Python, .NET, C#, F#