Data from a simulation exercise

One way of understanding real data is to build a probability model to simulate data and check if the broad properties of the simulated and the real data are the same. In this sense the probability model is a generative model.

Example 5.1.1.

The random variables $X$ and $Y$ in the model represent water level and wave height, each measured relative to its average. Physics suggests that water level affects wave height, rather than vice versa. A possible starting point is to suppose that the level $X\sim\operatorname{\mathsf{N}}(0,1)$, which has mean $0$. Further suppose that $Y\mid X=x\sim\operatorname{\mathsf{N}}(\alpha x,1)$, for which $\operatorname{\mathsf{E}}\left[{Y\mid X=x}\right]=\alpha x$. If $\alpha>0$ then the expected height increases linearly with level.
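
The marginal moments of $Y$ implied by these two assumptions can be worked out from the laws of total expectation and total variance. The short derivation below uses only the model as stated, with $\operatorname{\mathsf{Var}}$ and $\operatorname{\mathsf{Cov}}$ written in the same style as $\operatorname{\mathsf{E}}$:

$$
\begin{aligned}
\operatorname{\mathsf{E}}[Y] &= \operatorname{\mathsf{E}}\big[\operatorname{\mathsf{E}}[Y\mid X]\big] = \alpha\,\operatorname{\mathsf{E}}[X] = 0,\\
\operatorname{\mathsf{Var}}(Y) &= \operatorname{\mathsf{E}}\big[\operatorname{\mathsf{Var}}(Y\mid X)\big] + \operatorname{\mathsf{Var}}\big(\operatorname{\mathsf{E}}[Y\mid X]\big) = 1 + \alpha^{2}\operatorname{\mathsf{Var}}(X) = 1+\alpha^{2},\\
\operatorname{\mathsf{Cov}}(X,Y) &= \operatorname{\mathsf{E}}\big[X\,\operatorname{\mathsf{E}}[Y\mid X]\big] = \alpha\,\operatorname{\mathsf{E}}[X^{2}] = \alpha .
\end{aligned}
$$

So height is more variable than level whenever $\alpha\neq 0$, and the two are positively correlated when $\alpha>0$.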

The figure illustrates a random sample of $235$ ‘observations’ generated from the above model for ‘level’ and ‘height’, with $\alpha=1$.

Unnumbered Figure: histogram of the simulated heights and scatter plot of height against level.

Realisations of jointly distributed random variables may be obtained by simulating directly from the joint density $f_{XY}(x,y)$, or by simulating first from the marginal $f_{X}(x)$ and then from the conditional $f_{Y|X}(y|x)$ (this is explained in detail later in this chapter). Here we use the latter method.

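n = 235; alpha = 1 # sample size and slope of the conditional mean

x = rnorm(n, mean=0, sd=1) # level: X ~ N(0,1)

y = rnorm(n, mean=alpha*x, sd=1) # height: Y | X=x ~ N(alpha*x, 1)

par(mfrow=c(2,1)) # splits the plotting window into 2 rows and 1 column

hist(y, breaks=20) # breaks is the suggested number of histogram cells

plot(x, y, pch='.', cex=4) # scatter plot of height against level

mean(x) ; mean(y) # sample means

sd(x) ; sd(y) # sample standard deviations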

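In the run reported here, the final commands give means of $0.0693$ and $0.1163$ and standard deviations of $0.9847$ and $1.4467$ for $x$ and $y$ respectively; the values change from run to run because no random seed is set. The standard deviation of $Y$ is bigger than that of $X$, which is consistent with the model: when $\alpha=1$, $Y$ has standard deviation $\sqrt{1+\alpha^{2}}=\sqrt{2}\approx 1.414$.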
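
For comparison, the other route mentioned above, simulating directly from the joint distribution, can be sketched as follows. Under the model, $(X,Y)$ is bivariate normal with zero means, $\operatorname{\mathsf{Var}}(X)=1$, $\operatorname{\mathsf{Cov}}(X,Y)=\alpha$ and $\operatorname{\mathsf{Var}}(Y)=\alpha^{2}+1$. The sketch assumes the `MASS` package is installed and uses its `mvrnorm` function. The seed and the names `mu`, `Sigma` and `xy` are illustrative choices, not part of the original exercise.

```r
# Sketch: simulate (X, Y) directly from the joint bivariate normal distribution.
# Assumption: the MASS package is available (it ships with standard R installations).
library(MASS)

set.seed(1)  # arbitrary seed, used only to make this sketch reproducible
n = 235; alpha = 1

mu = c(0, 0)  # E[X] = 0 and E[Y] = alpha * E[X] = 0
Sigma = matrix(c(1,     alpha,
                 alpha, alpha^2 + 1), nrow = 2, byrow = TRUE)  # joint covariance matrix

xy = mvrnorm(n, mu = mu, Sigma = Sigma)  # n x 2 matrix: column 1 is x, column 2 is y
x = xy[, 1]; y = xy[, 2]

mean(x) ; mean(y)  # both should be close to 0
sd(x) ; sd(y)      # should be close to 1 and sqrt(alpha^2 + 1)
cor(x, y)          # should be close to alpha / sqrt(alpha^2 + 1)
```

Both routes target the same joint distribution, so the summary statistics should agree up to sampling variation. The marginal-then-conditional route has the advantage that it needs only univariate draws.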