Elsevier are a company specialising in the provision of online content and information to researchers. They are responsible for a large portfolio of products, including:
This list is by no means exhaustive, and a full list of their platforms can be explored here.
To better serve researchers, Elsevier are committed to the continuous development and improvement of the user experience. Key to this endeavour is the use of data science, which not only informs decisions on platform improvements, but also forms the basis of new features.
For example, Elsevier analyse their Scopus citation database to provide a tool within SciVal which gives users an overview of prominent or thriving research topics, be that globally, nationally or at a specific institution. Another example is their recommender system in Mendeley, Mendeley Suggest, which suggests relevant papers for users to read.
As a joint venture between the STOR-i Centre of Doctoral Training at Lancaster University and Elsevier, this PhD project looks to develop and apply tools from network analysis to make sense of their often high-dimensional but structured datasets. Of particular interest is usage data for their various platforms, which lends itself naturally to a network-based representation. Successful analysis of this data would allow Elsevier to better understand how their platforms are being used, thus contributing to the platform improvement process.
A network, analogous to a mathematical graph, is a tool often used in data science to represent complex interconnected data. In the broadest sense, a network consists of a set of nodes and edges, where the nodes typically represent entities within some system and the edges correspond to relations between these entities. Two examples are given below: the left shows an undirected network and the right a directed one.
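To make the undirected/directed distinction concrete, one common way to store a network is as an adjacency matrix, where entry (i, j) is 1 if there is an edge from node i to node j. The following is a minimal sketch in plain Python; the node labels A, B and C and the two edges are hypothetical. For an undirected network the matrix is symmetric, whereas for a directed network it need not be.

```python
def adjacency_matrix(nodes, edges, directed):
    """Return a list-of-lists adjacency matrix A, where A[i][j] = 1
    if there is an edge from nodes[i] to nodes[j]."""
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        A[i][j] = 1
        if not directed:
            # Undirected edges have no orientation, so record both directions.
            A[j][i] = 1
    return A

# Hypothetical example network: three nodes joined in a chain A - B - C.
nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C")]

undirected = adjacency_matrix(nodes, edges, directed=False)
directed = adjacency_matrix(nodes, edges, directed=True)

print(undirected)
print(directed)
```

The same edge list yields a symmetric matrix in the undirected case, while in the directed case only the entries A to B and B to C are set.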
Networks are used to represent data in a diverse range of settings; examples include:
In a similar vein, we are going to make use of a network representation to analyse Elsevier's online usage data. In this case, nodes will correspond to platforms/products and edges will represent movements between products. If we then wish to compare users based on this data, we obtain a separate network for each user. The result is multiple network data, a situation becoming more prevalent in the wider literature thanks in large part to improvements in data collection technologies.
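As a minimal sketch of this representation, assume the usage logs can be reduced to an ordered sequence of platform visits per user. The user IDs and sequences below are hypothetical, and the platform names are simply those mentioned earlier. Each user's consecutive moves are counted into a directed, weighted network, giving one network per user. Dropping consecutive visits to the same platform is an assumed modelling choice for illustration, not part of the project's stated method.

```python
from collections import Counter

# Hypothetical usage logs: for each (anonymised) user, the ordered sequence
# of platforms visited. Names and sequences are illustrative only.
sessions = {
    "user_1": ["Scopus", "SciVal", "Scopus", "Mendeley"],
    "user_2": ["Mendeley", "Scopus", "Scopus", "SciVal"],
}

def transition_network(path, keep_self_loops=False):
    """Return a directed, weighted network as a Counter mapping
    (from_platform, to_platform) -> number of observed moves."""
    edges = Counter()
    for src, dst in zip(path, path[1:]):
        if src == dst and not keep_self_loops:
            # Assumed choice: staying on one platform is not a movement between products.
            continue
        edges[(src, dst)] += 1
    return edges

# One network per user: together these form a multiple network dataset.
networks = {user: transition_network(path) for user, path in sessions.items()}

# A common node set (the union of all platforms seen), so networks are comparable.
all_nodes = sorted({platform for path in sessions.values() for platform in path})

for user, net in networks.items():
    print(user, dict(net))
```

Sharing a common node set across users is what makes it possible to compare their networks directly, for example by building adjacency matrices over the same ordering of platforms.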
Given this network representation, our research will focus on the development and implementation of network models suited to the problem at hand. Certain features of the data pose new challenges which traditional network models often fail to meet, hence novel approaches must be developed.
As alluded to above, this project throws up certain challenges which will require novel approaches. These include:
In the face of these challenges the project has the following initial aims: