My PhD project is titled: "Information fusion for non-homogeneous panel and time series data." This project is in collaboration with the STOR-i institute at Lancaster University and the Office for National Statistics (ONS). My academic supervisors at Lancaster are Professor Idris Eckley and Dr Alex Gibberd. My industrial supervisor at ONS is Dr Hannah Finselbach.
The ONS is undergoing a transformation to put administrative and alternative data sources at the core of its statistics. Official statistics have traditionally relied on sample surveys and questionnaires; however, in a rapidly evolving economy, response rates to these surveys are falling. Moreover, there is a concern that new data sources, and the continuously expanding volume of information now available, are not being fully used. Today, information is gathered in countless ways, from satellite and sensor data to social network and transactional data. There is an opportunity to move from the survey-centric approach of the 20th century to a 21st-century approach that combines structured survey data with administrative data and unstructured alternative digital data sources.
My PhD project aims to assist the ONS with this transformation by developing novel methods that combine insight from alternative (possibly dynamic) information, recorded at different periodicities and with differing reliability, with traditional surveys, in order to meet the ever-increasing demand for improved and more detailed statistics.
Time series data from traditional methods are typically recorded at a low frequency (e.g. annually). While these data are accurate and well calibrated, the underlying surveys are very expensive to run and take a long time to feed back information. By additionally using administrative datasets and alternative data streams such as web-scraped data, we can potentially increase both the frequency and the accuracy with which official statistics are produced.
Understanding which of the vast collection of high-frequency alternative indicator series are relevant to producing a particular statistic of interest, and how they are relevant, is the predominant area of work I am considering at the start of the PhD. Given that only a few of the indicator series are likely to be relevant, I am exploring the incorporation of sparse modelling techniques, such as LASSO regularisation, into the econometric time series literature. Beyond this, I wish to look into the theoretical properties of data revisions, and into spatial high-frequency data by considering extensions to vector autoregressive (VAR) models.
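To give a flavour of the sparse modelling idea, here is a minimal sketch of LASSO selecting a handful of relevant indicator series out of many. It is an illustration only and not the method developed in the project: the data are simulated, the dimensions, coefficients and penalty value are arbitrary assumptions, and the helper names soft_threshold and lasso_coordinate_descent are hypothetical. The solver is a plain cyclic coordinate descent written with NumPy.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator: the closed-form solution of the 1-D LASSO problem."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n  # mean squared value of each column
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual, adding back its own contribution
            rho = X[:, j] @ residual / n + col_scale[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_scale[j]
            # Keep the residual in sync with the updated coefficient
            residual += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta


# Simulated example (all values are assumptions): 20 candidate indicator
# series observed over 300 periods, of which only three drive the target.
rng = np.random.default_rng(0)
n, p = 300, 20
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each indicator

true_beta = np.zeros(p)
true_beta[[0, 5, 12]] = [2.0, -1.5, 1.0]
y = X @ true_beta + 0.1 * rng.standard_normal(n)
y = y - y.mean()  # centre the target so no intercept is needed

beta_hat = lasso_coordinate_descent(X, y, lam=0.3)
selected = np.flatnonzero(beta_hat)  # indices of indicators kept by the LASSO
print("Selected indicator series:", selected)
```

The penalty lam controls how aggressively coefficients are shrunk to exactly zero: larger values keep fewer indicator series. In practice it would be chosen by cross-validation or an information criterion rather than fixed by hand, and real indicator series are serially correlated, which this independent-noise simulation ignores.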
The MRes component of the STOR-i programme includes taught courses, projects and group activities, providing me with a grounding in statistics and operational research, an overview of thriving research areas, and an opportunity to develop a formal research proposal for my PhD. Below is a list of project reports I have completed this year. Also, check out my blog to read about some of the research areas I have been exposed to.
16/04/2019: Using adaptive random search for simulation optimisation - POSTER
31/03/2019: Modelling the demand of healthcare systems using infinite-server queueing models
24/02/2019: Detecting multiple changes in variance of univariate time series data
03/02/2019: Continuous-Time Markov chains and Discrete-Time Markov decision processes
03/12/2018: Branch and Bound algorithm to solve the Knapsack problem