Chapter 2 Exploratory Data Analysis

Before constructing a statistical model, it is important to obtain a good understanding of the data you are using with respect to the question you seek to investigate. The aim of exploratory data analysis (EDA) is to use simple summary tools to investigate the structure of the data without making any mathematical assumptions. Questions to ask are:

  • Which is the dependent variable of interest and which are explanatory?

  • Are the variables numerical or categorical, continuous or discrete?

  • Are there unusual features in the distribution of each variable?

  • What is the relationship between the variables?

These questions may sound trivial, but obtaining a good feel for the data before the modelling stage helps to identify which statistical models are appropriate and which features are likely to be important in explaining the variability.
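
As a rough illustration of how the questions above might be approached in practice, the following Python sketch works through them on a small data set. The data, the variable names (height_cm, weight_kg, sex) and the helper functions are all hypothetical, invented here for illustration only. The 1.5 × IQR rule used to flag unusual values is one common rule of thumb, not a definitive test, and the choice of dependent variable is made by the analyst, not by the code.

```python
import math
import statistics

# Hypothetical toy data, invented purely for illustration.
data = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight_kg": [52.0, 58.0, 69.0, 77.0, 140.0],
    "sex": ["F", "F", "M", "M", "M"],
}

# Question 1: the dependent variable is chosen by the analyst from the
# question under investigation; here we assume it is weight.
dependent = "weight_kg"
explanatory = [name for name in data if name != dependent]


def variable_kind(values):
    """Question 2: classify a column as numerical or categorical."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "numerical" if numeric else "categorical"


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def flag_unusual(values, k=1.5):
    """Question 3: flag values more than k * IQR outside the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


def pearson_r(x, y):
    """Question 4: sample correlation between two numerical variables."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print("dependent:", dependent, "| explanatory:", explanatory)
    for name, values in data.items():
        kind = variable_kind(values)
        print(name, "is", kind)
        if kind == "numerical":
            print("  summary:", five_number_summary(values))
            print("  unusual:", flag_unusual(values))
    print("correlation:", pearson_r(data["height_cm"], data["weight_kg"]))
```

In this toy example the final weight is flagged as unusual. Whether such a value is a recording error or a genuine observation is a judgement that the summary alone cannot make, which is why a check of this kind belongs before the modelling stage.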