5.1 Introduction and Motivation

Data from a simulation exercise

One way of understanding real data is to build a probability model to simulate data and check if the broad properties of the simulated and the real data are the same. In this sense the probability model is a generative model.

Example 5.1.1.

The random variables X and Y in the model represent water level and wave height measured from their respective averages. Physics suggests that water level affects wave height, rather than vice versa. A possible starting point is to suppose that the level X ∼ 𝖭(0,1), which has mean 0. Further suppose that Y | X = x ∼ 𝖭(αx, 1), for which 𝖤[Y | X = x] = αx. If α > 0 then the expected height increases linearly with level.
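
The model specifies Y only through its conditional distribution given X, so it may help to spell out what this implies for Y on its own. The following is a short sketch using the tower property and the law of total variance; these are standard results and are not stated in the text above.

```latex
\begin{align*}
\mathsf{E}[Y] &= \mathsf{E}\bigl[\mathsf{E}[Y \mid X]\bigr]
               = \alpha\,\mathsf{E}[X] = 0, \\
\operatorname{Var}(Y) &= \mathsf{E}\bigl[\operatorname{Var}(Y \mid X)\bigr]
               + \operatorname{Var}\bigl(\mathsf{E}[Y \mid X]\bigr)
               = 1 + \alpha^{2}\operatorname{Var}(X) = 1 + \alpha^{2}, \\
\operatorname{Cov}(X, Y) &= \mathsf{E}\bigl[X\,\mathsf{E}[Y \mid X]\bigr]
               - \mathsf{E}[X]\,\mathsf{E}[Y]
               = \alpha\,\mathsf{E}[X^{2}] = \alpha .
\end{align*}
```

With α = 1 this gives a standard deviation of √2 ≈ 1.414 for Y, compared with 1 for X.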

The figure illustrates a random sample of 235 ‘observations’ generated from the above model for ‘level’ and ‘height’, with α=1.

Unnumbered Figure: histogram of the simulated heights y (top) and scatter plot of height y against level x (bottom).

Realisations of jointly distributed random variables may be obtained by simulation from the joint f_{XY}(x,y), or from the marginal f_X(x) and the conditional f_{Y|X}(y|x) (this will be explained in detail later in this chapter). Here we use the latter method.

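n <- 235; alpha <- 1
x <- rnorm(n, mean = 0, sd = 1)          # level: X ~ N(0,1)
y <- rnorm(n, mean = alpha * x, sd = 1)  # height: Y | X = x ~ N(alpha*x, 1)
par(mfrow = c(2, 1))            # splits the plotting window into 2 rows, 1 column
hist(y, breaks = 20)            # breaks is the suggested number of histogram cells
plot(x, y, pch = '.', cex = 4)  # scatter plot of height against level
mean(x); mean(y)                # sample means
sd(x); sd(y)                    # sample standard deviations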

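For the sample shown, the final commands give sample means of 0.0693 for x and 0.1163 for y, and sample standard deviations of 0.9847 for x and 1.4467 for y. Because no random seed is set, a fresh run will give slightly different values. The standard deviation of Y is bigger than that of X.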
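
For comparison, the other route mentioned earlier, simulating directly from the joint distribution, can be sketched as follows. Under the model, (X, Y) is bivariate normal with zero means, Var(X) = 1, Cov(X, Y) = α and Var(Y) = α² + 1, so the theoretical standard deviation of Y is √(α² + 1). The sketch assumes the MASS package is available and uses its `mvrnorm()` function; the seed and the names `x_joint`, `y_joint` and `sd_theory` are illustrative choices, not part of the original notes.

```r
# Sketch: draw (X, Y) directly from the joint bivariate normal implied by
# X ~ N(0,1) and Y | X = x ~ N(alpha*x, 1).
library(MASS)                 # assumed installed; provides mvrnorm()

set.seed(1)                   # arbitrary seed, only for reproducibility
n <- 235; alpha <- 1

mu    <- c(0, 0)                              # E[X], E[Y]
Sigma <- matrix(c(1,     alpha,
                  alpha, alpha^2 + 1),
                nrow = 2, byrow = TRUE)       # Var(X), Cov(X,Y), Var(Y)

xy      <- mvrnorm(n, mu = mu, Sigma = Sigma) # n x 2 matrix of draws
x_joint <- xy[, 1]                            # simulated levels
y_joint <- xy[, 2]                            # simulated heights

sd_theory <- sqrt(alpha^2 + 1)                # theoretical sd of Y
c(sample = sd(y_joint), theory = sd_theory)   # compare sample and theory
```

Both routes target the same joint distribution, so the sample standard deviation of the simulated heights should be close to √2 ≈ 1.414 when α = 1, whichever method is used.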