Want to speak directly to us? Book a 1-1 chat with Professor Chris Edwards, Programme Director of MSc Data Science.
Explore data, and dive into predictive analytics, machine learning and AI. The skills of a data scientist are complex and in demand. With high potential career earnings, are you ready to make an investment in yourself?
Why data science at Lancaster University?
Here are the top reasons to study MSc Data Science at Lancaster University
This module teaches students how data science is performed within academia and industry (via invited talks), research methods and how different research strategies are applied across disciplines, and data science techniques for processing and analysing data. Students will engage in group project work, based on project briefs provided by industrial speakers, within multi-skilled teams (e.g. computing students, statistics students, environmental science students) in order to apply their data science skills to researching and solving an industrial data science problem.
Topics covered will include
- The role of the data scientist and the evolving epistemology of data science
- The language of research, how to form research questions, writing literature reviews, and variance of research strategies across disciplines
- Ethics surrounding data collection and re-sharing, and unwanted inferences
- Identifying potential data sources and the data acquisition processes
- Defining and quantifying biases, and data preparation (e.g. cleaning, standardisation, etc.)
- Choosing a potential model for data, understanding model requirements and constraints, specifying model properties a priori, and fitting models
- Inspection of data and results using plots, and hypothesis and significance tests
- Writing up and presenting findings
Learning
Students will learn through a series of group exercises around research studies and projects related to data science topics. Invited talks from industry tackling data science problems will be given to teach the students about the application of data science skills in industry and academia. Students will gain knowledge of:
- Defining a research question and a hypothesis to be tested, and choosing an appropriate research strategy to test that hypothesis
- Analysing datasets provided in heterogeneous forms using a range of statistical techniques
- How to relate potential data sources to a given research question, acquire such data and integrate it together
- Designing and performing appropriate experiments given a research question
- Implementing appropriate models for experiments and ensuring that the model is tested in the correct manner
- Analysing experimental findings and relating these findings back to the original research goal
Recommended texts and other learning resources
- O'Neil, C., and Schutt, R. (2013) Doing Data Science: Straight Talk from the Frontline. O'Reilly
- Trochim, W. (2006) The Research Methods Knowledge Base. Cengage Learning
This module is designed both for students who are completely new to programming and for experienced programmers, bringing both groups to the level needed to handle complex data science problems. Beginners will learn the fundamentals of programming, while experienced students will have the opportunity to sharpen and further develop their programming skills. Students will learn data-processing techniques, including visualisation and statistical data analysis. To give students a broad grounding for handling complex data science tasks, the module also covers problem solving and the development of graphical applications.
In particular, students will gain experience with two very important open-source languages: R and Python. R is the best language for statistical analysis, being widely applied in academia and industry to handle a variety of different problems. Being able to program in R gives data scientists access to the best and most up-to-date libraries for a variety of classical and state-of-the-art statistical methods. Python, on the other hand, is a general-purpose programming language, also widely used for three main reasons: it is easy to learn, being recommended as a "first" programming language; it allows easy and quick development of applications; and it has a great variety of useful and open libraries. For those reasons, Python has also been widely applied for scientific computing and data analysis. Additionally, Python enables the data scientist to easily develop other kinds of useful applications: for example, searching for optimal decisions given a data set, graphical applications for data gathering, or even programming Raspberry Pi devices in order to create sensors or robots for data collection. Therefore, learning these two languages will not only enable students to develop programming skills, but will also give them direct access to two fundamental languages for contemporary data analysis, scientific computing, and general programming.
Additionally, students will gain experience by working through exercise tasks and discussing their work with their peers, thereby fostering interpersonal communication skills. Students who are new to programming will find help from their experienced peers, and experienced programmers will learn how to assist beginners and explain the fundamental concepts to them.
Topics covered will include
- Fundamental programming concepts (statements, variables, functions, loops, etc.)
- Data abstraction (modules, classes, objects, etc.)
- Problem-solving
- Using libraries for developing applications (e.g., SciPy, Pygame)
- Performing statistical analysis and data visualisation
On successful completion of this module, students will be able to
- Solve data science problems in an automatic fashion
- Handle complex data-sets, which cannot be easily analysed "by hand"
- Use existing libraries and/or develop their own libraries
- Learn new programming languages, given the background knowledge of two important ones
Bibliography
- Introductory statistics with R. Dalgaard, Peter. Springer, 2008. ISBN-13: 978-0387954752
- R Cookbook. Paul Teetor. O'Reilly Media; 1 edition. 2011. ISBN-13: 978-0596809157.
- Python Documentation: https://www.python.org/doc/
- SciPy Documentation: https://www.scipy.org/
- Pygame Documentation: https://www.pygame.org/docs/
This module provides comprehensive coverage of the problems related to data representation, storage, manipulation, retrieval and processing, with a focus on extracting information from data. It has been designed to provide fundamental theoretical knowledge, together with practical skills developed in the related laboratory sessions, in this specific aspect of data science, which plays an important role in any system and application. In this way it prepares students for the second module on the topic of data as well as for their projects.
Topics to be covered will include
- Data Primer: Setting the scene: Big Data, Cloud Computing; The time, storage and computing power compromise: off-line versus on-line
- Data Representations
- Storage Paradigms
- Vector-space models
- Hierarchical (agglomerative/divisive)
- k-means
- SQL and Relational Data Structures (short refresher)
- NoSQL: Document stores, graph databases
- Inference and reasoning
- Associative and Fuzzy Rules
- Inference mechanisms
- Data Processing
- Clustering
- Density-based, on-line, evolving
- Classification
- Randomness and determinism, frequentist and belief based approaches, probability density, recursive density estimation, averages and moments, important random signals, response of linear systems to random signals, random signal models
- Discriminative (Linear Discriminant Analysis, Single Perceptron, Multi-layer Perceptron, Learning Vector Classifier, Support Vector Machines), Generative (Naive Bayes)
- Supervised and unsupervised learning, online and offline systems, adaptive and evolving systems, evolving versus evolutionary systems, normalisation and standardisation
- Fuzzy Rule-based Classifiers, Regression or Label based classifiers
- Self-learning Classifiers, evolving Classifiers, dynamic data space partitioning using evolving clustering and data clouds, monitoring the quality of the self-learning system online, evolving multi-model predictive systems
- Semi-supervised Learning (Self-learning, evolving, Bootstrapping, Expectation-Maximisation, ensemble classifiers)
- Information Extraction vs Retrieval
On successful completion of this module students will
- Demonstrate understanding of the concepts and specific methodologies for data representation and processing and their applications to practical problems
- Analyse and synthesise effective methods and algorithms for data representation and processing
- Develop software scripts that implement advanced data representation and processing and demonstrate their impact on the performance
- List, explain and generalise the trade-offs of performance and complexity in designing practical solutions for problems of data representation and processing in terms of storage, time and computing power
This module provides an introduction to statistical learning.
Topics to be covered will include
- Big data
- Missing data
- Biased samples and recency
- Likelihood and cross-validation
On successful completion of this module students will
- Understand cross-validation and the splitting of data into calibration, training and validation samples.
- Be able to move to handling regression problems for large data sets via variable reduction methods such as the Lasso and Elastic Net.
- Understand a variety of classification methods including logistic and multinomial logistic models, regression trees, random forests and bagging and boosting.
- Examine classification methods that will culminate in neural networks presented as generalised linear modelling extensions.
- Understand big data using K-means, PAM and CLARA, followed by mixture models and latent class analysis.
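To give a flavour of the sample-splitting idea behind cross-validation, here is a minimal sketch in Python. It is an illustration only and is not taken from the module materials: the function name `k_fold_indices`, the sample size and the number of folds are all assumptions chosen for the example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Return k (train, validation) index pairs covering 0..n_samples-1."""
    indices = list(range(n_samples))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        # Every fold except the i-th is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


# Hypothetical example: 10 observations split into 5 folds.
splits = k_fold_indices(n_samples=10, k=5)
for train, validation in splits:
    print("train:", sorted(train), "validation:", sorted(validation))
```

Each observation is held out for validation exactly once, so a model can be fitted on the training indices and scored on the held-out indices in turn.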
This module provides an introduction, at graduate level, to two core areas which are essential building blocks to further advanced study of statistical modelling, methodology and theory. The areas that will be covered are statistical inference using maximum likelihood and generalised linear models (GLMs). Building on an undergraduate level understanding of mathematics, statistics (hypothesis testing and linear regression) and probability (univariate discrete and continuous distributions; expectations, variances and covariances; the multivariate normal distribution), this module will motivate the need for a generic method for model fitting and then demonstrate how maximum likelihood provides a solution to this. Following on from this, GLMs, a widely and routinely used family of statistical models, will be introduced as an extension of the linear regression model.
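As a small, hedged illustration of maximum likelihood (not part of the official module materials), consider independent Poisson counts with an unknown rate. The log-likelihood is maximised at the sample mean, and the Python sketch below checks this numerically with a simple grid search. The data values, the grid and the function name `poisson_log_likelihood` are assumptions made for the example.

```python
import math


def poisson_log_likelihood(rate, counts):
    """Log-likelihood of independent Poisson counts for a given rate."""
    # log P(y | rate) = y*log(rate) - rate - log(y!), with log(y!) = lgamma(y + 1)
    return sum(y * math.log(rate) - rate - math.lgamma(y + 1) for y in counts)


counts = [2, 4, 3, 5, 1, 3]  # made-up data for illustration

# Evaluate the log-likelihood on a grid of candidate rates 0.01, 0.02, ..., 10.00.
grid = [i / 100 for i in range(1, 1001)]
mle_grid = max(grid, key=lambda r: poisson_log_likelihood(r, counts))

# For the Poisson model the maximum likelihood estimate is the sample mean.
sample_mean = sum(counts) / len(counts)
print(mle_grid, sample_mean)
```

In practice a numerical optimiser would replace the grid search, which is used here only to keep the sketch short.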
This module will motivate the use of statistical modelling as a tool for making inference on a population given a sample of data. Students will be introduced to the basic terminology of statistical modelling, and the similarities and differences between statistical and machine learning approaches will be discussed to lay the foundations for the development of both over the remaining core modules. They will cover the concepts of sampling uncertainty, statistical inference and model fitting, with sampling uncertainty used to motivate the need for standard errors and confidence intervals. Once core concepts have been established, linear regression and generalised linear models will be introduced as essential statistical modelling tools. An understanding of these models will be obtained through implementation in the statistical software package R.
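To illustrate how sampling uncertainty leads to standard errors and confidence intervals, here is a minimal sketch. The module itself uses R; Python is used here purely for illustration, the sample values are made up, and the interval relies on a normal approximation, which is an assumption of this example.

```python
import math
import statistics

sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9]  # made-up measurements

n = len(sample)
mean = statistics.mean(sample)
# Standard error of the mean: sample standard deviation divided by sqrt(n).
std_error = statistics.stdev(sample) / math.sqrt(n)

# 97.5% quantile of the standard normal, giving a two-sided 95% interval.
# For a sample this small, a t quantile would give a slightly wider interval.
z = statistics.NormalDist().inv_cdf(0.975)
ci_low, ci_high = mean - z * std_error, mean + z * std_error

print(f"mean={mean:.2f}, se={std_error:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

A larger sample would shrink the standard error and therefore narrow the interval, which is the sense in which the interval quantifies sampling uncertainty.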
Term 2, specialist pathways
Choose a pathway that aligns with your interests and career goals. Each pathway offers access to a dedicated selection of modules allowing you to build the necessary skills for your career.
- Computing and Artificial Intelligence
- Business Intelligence
- Environmental Data Science
- Health Data Science
- Societal Data Science
Term 3, Placement
Your 14-week placement forms the basis of your dissertation and is completed over the summer. It is the pinnacle of the year for our students and often leads to job offers. We have arranged placement projects for over 350 students, some at world-leading organisations like Unilever, Siemens and The Bank of England.
Still not convinced?
Hear what our academics, staff and students have to say about the course.
Careers
Data Scientists are in demand and starting salaries are competitive. Our graduates work in roles such as:
- Data Scientist
- Data Analyst
- Machine Learning Engineer
- Business Intelligence Analyst
- Data Engineer
- Quantitative Analyst
- Big Data Engineer
- Statistician
Entry Requirements
2:1 Hons degree (UK or equivalent) in any discipline, provided that the applicant has some experience in programming and has had exposure to quantitative methods such as statistics, or mathematical modelling. Applicants with a 2:2 Hons degree (UK or equivalent) in any discipline along with relevant experience are welcome to apply and will be considered on a case-by-case basis.
Students have successfully completed the course with undergraduate degrees in Computer Science, Mathematics, Statistics, Engineering, Physics, Life Sciences, Economics, Finance, Linguistics, and others.
We may also consider non-standard applicants; please contact us for information.
Fees
Studying at a UK University means you must pay an annual tuition fee. This fee covers the costs associated with teaching, examinations, assessment and graduation.
Our annual tuition fee is set for a 12-month session. This session usually runs from October to September of the following year.
Location | Full Time (per year) | Part Time (per year) |
---|---|---|
Home | £13,600 | £6,800 |
International | £29,150 | £14,575 |
Scholarships
Submit your application to study MSc Data Science and we will automatically consider you against our scholarship criteria, with no extra steps needed.
Below, you will find more information about scholarships available across Lancaster University.
Scheme | Based on | Amount |
---|---|---|
Lancaster Master's Scholarship | Entry grades | Up to £5,000 |
Lancaster Global Scholarship - Master's | Entry grades | £5,000 in your first year of study. |
Lancaster Opportunity Fund | Ethnicity, household income, academic performance, and Lancaster alumni status | Full fees and £9,000 living costs |
Alumni Loyalty Scholarship | Lancaster alumni status | 10% fee discount |
Postgraduate Master's Loan
The government offers a number of loans to UK and some EU national students wishing to study for a Master's degree.
A Postgraduate Master’s Loan can help with course fees and living costs while you study a postgraduate master’s course. You can find more up to date information on the official government website.