Many probability calculations are too complicated for exact answers to be found, e.g. values of the standard Normal cdf $\Phi(z)$. Instead, numerical approximations are used. The most accurate and convenient way to obtain these is by computer, and this chapter details the R commands used for this purpose.
Historically, numerical values were supplied to users as tables. These are little used nowadays, but one important exception is in exams, where you will be told the values needed for the questions. These will be in the form of R output (see past papers for examples e.g. B3 in the 2011 exam), so it will be important to understand the basics of the R commands below.
There are four groups of R functions for probability distributions, represented by four letters which start the function names:
p for probability giving the cdf (as it represents the probability $P(X \le x)$.)
q for quantile giving the inverse cdf $F^{-1}(p)$.
d for density giving the pdf (or pmf for discrete random variables.)
r for random giving random numbers from this distribution.
To complete the function name, a shorthand version of the distribution name is added. For the Normal distribution, for example, this is norm, giving functions pnorm, qnorm, dnorm and rnorm.
Each group of functions operates in a similar way for any distribution, and will be explained in turn below.
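As a minimal sketch of the naming scheme, the four Normal functions can be called as follows. The parameter values (mean 0, standard deviation 1) and the seed are arbitrary choices made for illustration; they are not taken from the notes.

```r
# Illustrative values only: a Normal distribution with mean 0 and sd 1.
p <- pnorm(0, mean = 0, sd = 1)    # cdf: P(X <= 0)
q <- qnorm(0.5, mean = 0, sd = 1)  # inverse cdf: the x with P(X <= x) = 0.5
d <- dnorm(0, mean = 0, sd = 1)    # pdf evaluated at 0
set.seed(1)                        # arbitrary seed, so the draw is reproducible
r <- rnorm(1, mean = 0, sd = 1)    # one random number from the distribution
print(c(p = p, q = q, d = d, r = r))
```

Replacing norm by another shorthand name (and supplying that distribution's parameters) gives the corresponding functions for other distributions.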
The following table lists the shorthand names of many common distributions implemented by R. It also gives the parameter names used in R, which will be needed shortly, and the corresponding parameter notation from earlier in the notes.
Type | Distribution | Shorthand | Parameter 1 | Parameter 2
Discrete | Bernoulli | (use Binomial with size 1) | |
Discrete | Binomial | binom | size | prob
Discrete | Geometric | geom | prob |
Discrete | Poisson | pois | lambda |
Continuous | Uniform | unif | min | max
Continuous | Exponential | exp | rate |
Continuous | Gamma | gamma | shape | rate
Continuous | Normal | norm | mean | sd
Continuous | Beta | beta | shape1 | shape2
Continuous | Cauchy | cauchy | |
Continuous | Weibull | weibull | shape | scale
Continuous | Chi-squared | chisq | df |
Continuous | $F$ | f | df1 | df2
Continuous | Log-normal | lnorm | meanlog | sdlog
Continuous | Student's $t$ | t | df |
(with ) |
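The following sketch shows how the shorthand names and parameter names in the table combine with the letters p, q and d. The distributions and numerical values are chosen purely for illustration and are not examples from the notes.

```r
# Illustrative values only; parameter names are those listed in the table.
b1 <- dbinom(1, size = 1, prob = 0.3)   # Bernoulli(0.3) via Binomial with size 1: P(X = 1)
b2 <- pbinom(3, size = 10, prob = 0.5)  # Binomial(10, 0.5): P(X <= 3)
po <- dpois(2, lambda = 4)              # Poisson(4): P(X = 2)
ex <- qexp(0.5, rate = 2)               # Exponential with rate 2: the median
un <- punif(0.3, min = 0, max = 1)      # Uniform(0, 1): P(X <= 0.3)
print(c(b1 = b1, b2 = b2, po = po, ex = ex, un = un))
```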
For a probability function such as pnorm, you must specify a value of $x$ for which $P(X \le x)$ is required, and also the parameters of the distribution, as given in the table.
Suppose . Find and .
To save time you can leave out the names of the parameters.
But this can be dangerous! For example, it is easy to give the variance instead of the standard deviation.
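To make the danger concrete, here is a sketch using a hypothetical Normal distribution with mean 1 and variance 4 (so standard deviation 2). These values are assumptions made for illustration, not the example from the notes.

```r
# Hypothetical example: X ~ Normal with mean 1 and variance 4, so sd = 2.
p_named      <- pnorm(3, mean = 1, sd = 2)  # P(X <= 3), parameters named
p_positional <- pnorm(3, 1, 2)              # the same call with names left out
p_upper      <- 1 - p_named                 # P(X > 3), by the complement rule
p_wrong      <- pnorm(3, 1, 4)              # mistake: the variance 4 passed as sd
print(c(p_named, p_positional, p_upper, p_wrong))
```

The first two calls agree (about 0.841), but the last gives a different answer (about 0.691) with no warning, because R has no way of knowing that 4 was meant as a variance.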
For a quantile function such as qnorm, you must specify a value of $p$ between $0$ and $1$ for which $F^{-1}(p)$ is required, and also the parameters of the distribution.
Suppose and . Find and the median of .
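A sketch of quantile calculations of this kind is given below. The two distributions (a Normal with mean 1 and standard deviation 2, and an Exponential with rate 2) and the probability 0.95 are hypothetical choices, since they are only meant to show the form of the commands.

```r
# Hypothetical example: X ~ Normal(mean 1, sd 2) and Y ~ Exponential(rate 2).
x95   <- qnorm(0.95, mean = 1, sd = 2)  # the x with P(X <= x) = 0.95
y_med <- qexp(0.5, rate = 2)            # median of Y: P(Y <= y_med) = 0.5
# The q and p functions are inverses, so this should recover 0.95.
check <- pnorm(x95, mean = 1, sd = 2)
print(c(x95 = x95, y_med = y_med, check = check))
```

Note that the median is simply the quantile for probability 0.5.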
For a density function such as dnorm, you must specify the value for which the density is required and also the parameters of the distribution. For a discrete distribution, the mass function is found instead.
Example
Suppose and . Find and .
dnorm(11, mean = 9, sd = 1)
dgeom(0, prob = 0.9)
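As a check on what these two calls compute, the sketch below compares them with the formulas written out by hand. One point to be aware of: R's geometric distribution counts the number of failures before the first success, so its possible values start at 0. If your definition starts at 1, you need to shift the argument by one.

```r
d_norm <- dnorm(11, mean = 9, sd = 1)
d_geom <- dgeom(0, prob = 0.9)

# Normal pdf with mean 9 and sd 1, written out and evaluated at 11.
f_norm <- exp(-(11 - 9)^2 / 2) / sqrt(2 * pi)
# R's geometric pmf is prob * (1 - prob)^x for x = 0, 1, 2, ...
f_geom <- 0.9 * (1 - 0.9)^0

print(c(d_norm, f_norm, d_geom, f_geom))
```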
For a random function such as rnorm, you must specify how many random numbers you want, $n$, and also the parameters of the distribution.
Suppose . From distribution generate
a single random number,
the mean of 1000 independent random numbers.
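Both tasks can be sketched as follows, assuming for illustration a Normal distribution with mean 5 and standard deviation 2 (hypothetical values). The call to set.seed is an addition that makes the output reproducible; without it you will get different numbers each time.

```r
set.seed(2024)                               # arbitrary seed for reproducibility
one  <- rnorm(1, mean = 5, sd = 2)           # a single random number
xbar <- mean(rnorm(1000, mean = 5, sd = 2))  # mean of 1000 independent draws
print(one)
print(xbar)
```

The sample mean should be close to the distribution mean, here 5, while a single draw can easily be several units away.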
As noted above, to save time you can leave out the names of the parameters. Instead the parameters are entered without names in the order given in the table.
You can also leave out parameters entirely. Default values are used instead which usually correspond to the standard version of the distribution.
Suppose is a standard Normal random variable i.e. . Find and the lower quartile of .
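For a standard Normal random variable the commands become very short. In the sketch below the probability evaluated, at the point 1.96, is a hypothetical choice for illustration.

```r
# Standard Normal: mean = 0 and sd = 1 are R's defaults, so they can be omitted.
p_default <- pnorm(1.96)                    # P(Z <= 1.96) using the defaults
p_full    <- pnorm(1.96, mean = 0, sd = 1)  # the same call written out in full
q_lower   <- qnorm(0.25)                    # lower quartile of Z
print(c(p_default, p_full, q_lower))
```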
To find the default values of arguments, and other information about these functions, use the R help system. For example, ?dnorm shows a help page including the line
dnorm(x, mean = 0, sd = 1, log = FALSE)
This tells you that the default values are mean zero and standard deviation one, corresponding to the standard Normal distribution $N(0,1)$.
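The defaults can also be inspected from within R rather than by reading the help page. The sketch below uses the base R function formals, which returns a function's arguments together with any default values; it is offered as one possible approach, not the method used in the notes.

```r
defaults <- formals(dnorm)  # arguments of dnorm, with their default values
print(defaults$mean)        # default for mean
print(defaults$sd)          # default for sd
print(formals(dexp)$rate)   # default rate for the Exponential distribution
```

Typing args(dnorm) at the console gives similar information in a more compact form.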
Another way to save time is to use the fact that the distribution functions allow their arguments to be vectors.
Suppose . Find all quartiles.
Generate random numbers from for .
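Both kinds of vector argument can be sketched as follows. The distributions used (a Normal with mean 10 and standard deviation 2 for the quartiles, and Normals with means 0, 10 and 100 and standard deviation 1 for the random numbers) are hypothetical values chosen for illustration.

```r
# All three quartiles in one call: a vector of probabilities.
quartiles <- qnorm(c(0.25, 0.5, 0.75), mean = 10, sd = 2)
print(quartiles)

# One random number for each of several means: a vector of parameters.
set.seed(42)  # arbitrary seed for reproducibility
draws <- rnorm(3, mean = c(0, 10, 100), sd = 1)
print(draws)
```

In the second call the first number is drawn using mean 0, the second using mean 10 and the third using mean 100.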