When measuring the performance of a system, the most fundamental question is whether the performance was impaired at any point. This is of great importance to BT, with whom this project is in partnership, as their customer satisfaction depends on the consistent performance of their services. To measure this performance, BT have a number of data streams carrying information about their services over time, from which a drop in performance can be detected. While the problem sounds simple, it is not easy to solve. BT have huge amounts of data to analyse, and a drop in performance is often difficult to see when looking at a data stream. Thus, having people inspect the data is not a practical solution, and algorithms are required instead.
Algorithms of this nature fall under the category of anomaly detection. Given a time series, one assumes there is some baseline distribution of the data, equivalent in the physical world to a system performing normally. Within this time series, the distribution of the data may change temporarily before returning to the baseline distribution; these temporary changes are known as anomalies, equivalent in the physical world to a change in the system, such as a drop in performance. One then wishes to detect whether there are anomalies in the time series and, if they exist, where they start and end, and potentially the distribution of the data within them.
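To make this setup concrete, the following is a minimal sketch in Python. It is an illustration only, not one of the algorithms developed in this project. It assumes a Gaussian baseline with known mean and standard deviation, a single anomaly that shifts the mean, and independent observations with no missing values. The function names, the minimum segment length and the threshold are hypothetical choices made for the example.

```python
import random


def simulate_series(n, start, end, shift, seed=0):
    """Baseline N(0, 1) noise with the mean shifted by `shift` on [start, end)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) + (shift if start <= t < end else 0.0)
            for t in range(n)]


def scan_for_anomaly(series, mu=0.0, sigma=1.0, min_len=5, threshold=20.0):
    """Return (start, end, score) for the best-scoring segment, or None.

    The score of a segment is the squared sum of its deviations from the
    baseline mean, divided by (segment length * sigma^2). The threshold is
    an arbitrary illustrative value, not a calibrated one.
    """
    n = len(series)
    # Prefix sums of deviations from the baseline mean, so that the sum
    # over any segment [s, e) is prefix[e] - prefix[s].
    prefix = [0.0]
    for x in series:
        prefix.append(prefix[-1] + (x - mu))

    best = None
    for s in range(n - min_len + 1):
        for e in range(s + min_len, n + 1):
            total = prefix[e] - prefix[s]
            score = total * total / ((e - s) * sigma ** 2)
            if best is None or score > best[2]:
                best = (s, e, score)

    if best is not None and best[2] >= threshold:
        return best
    return None


# Example: a drop in the mean between time points 120 and 150.
series = simulate_series(n=200, start=120, end=150, shift=-2.0)
print(scan_for_anomaly(series))
```

The search checks every candidate start and end point, so its cost grows quadratically with the length of the series. That is acceptable for a short illustration but not for data on the scale described above.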
There already exist algorithms which tackle this problem. However, these algorithms make assumptions about the data which often do not hold in reality. For example, many algorithms assume that the data is uncorrelated. Another common assumption is that there is no missing data in the data set. When these assumptions do not hold, the performance of the algorithms can be significantly impaired. Thus, the aim of this project is to construct new algorithms, using statistical methods, which are able to handle complications in the data that existing algorithms cannot deal with effectively.