COPULA
April 2, 2019.
Copulas are a fascinating tool for understanding dependence among multivariate outcomes. The term copula comes from the Latin "copulare", meaning to connect or to join, and it was first used in the work of Sklar in \(1959\). The main use of copulas is to understand the interrelation of two or more random variables.
Let us start with some simple explanatory examples. Let \(A\) and \(B\) be two random variables taking values in the set \( \{ 1,2, \ldots, 6\}\). Assume that we can observe \(A\) and we need to enter a bet on \(B\). The question is how much information can be obtained from the knowledge of \(A\), or equivalently, what is the interrelation of the two random variables \(A\) and \(B\)?
The answer is obvious if the procedure is just like throwing a die twice, with \(A\) and \(B\) the outcomes of the two throws. In this case, the outcome of the first throw does not depend on that of the second throw and vice versa; in other words, \(A\) and \(B\) are independent. Hence, observing \(A\) does not give us any information about \(B\).
A completely different answer is obtained if we know that the outcome of the first throw is smaller than or equal to that of the second throw. In this case, when \(A = 6\) we have full information about \(B\), and when \(A = 5\) we have \(50\%\) knowledge about \(B\): it is either \(5\) or \(6\).
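To make the two scenarios concrete, here is a small simulation sketch (my own illustration, not part of the original example) that contrasts the independent case with the constrained case \(A \le B\) and prints the conditional distribution of \(B\) given \(A = 5\):

```python
import random
from collections import Counter

random.seed(0)

def sample_pairs(constrained, n=100_000):
    """Draw pairs (A, B) of die throws; if constrained, keep only pairs with A <= B."""
    pairs = []
    while len(pairs) < n:
        a, b = random.randint(1, 6), random.randint(1, 6)
        if not constrained or a <= b:
            pairs.append((a, b))
    return pairs

for constrained in (False, True):
    pairs = sample_pairs(constrained)
    # Conditional distribution of B given A = 5.
    counts = Counter(b for a, b in pairs if a == 5)
    total = sum(counts.values())
    dist = {b: round(c / total, 2) for b, c in sorted(counts.items())}
    print("constrained" if constrained else "independent", dist)
```

In the independent case the conditional distribution is roughly uniform over \(1, \ldots, 6\); in the constrained case the probability mass splits roughly \(50/50\) between \(5\) and \(6\), exactly as argued above.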
Therefore, we need some tools to help us capture the possible dependence between multiple random variables. Let us start with two random variables \(X_1, X_2\). Suppose that each random variable is fully described by its cumulative distribution function (cdf), usually called its marginal distribution, \( F_i(x) = \Pr(X_i \leq x)\). In the example above, the outcome of each throw has the same marginal distribution. However, knowing the cdfs does not give us any information about the dependence between the variables. Hence, to obtain full knowledge about \(X_1, X_2\) we need both the marginal distributions and the type of interrelation. One may ask why we should not simply use the joint distribution. Consider one simple example: if the die faces are numbered \(11, 12, 13, 14, 15, 16\) rather than \(1, 2, 3, 4, 5, 6\), the dependence structure does not change, but the joint distribution function does, since the marginal distributions change. Therefore, we aim at finding a decomposition tool that separates the joint distribution function into a copula function, which states the dependence structure, and the marginals. This post will endeavour to briefly explain what a copula is, why it is used and how to derive it.
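The relabelling point can be checked numerically. The sketch below (an illustration of mine, with an arbitrary dependent construction for the second throw) shows that the joint cdf evaluated at a fixed point changes when the faces are relabelled \(11, \ldots, 16\), while a rank-based measure of dependence, which depends only on the copula, does not:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
a = rng.integers(1, 7, size=10_000)
b = np.maximum(a, rng.integers(1, 7, size=10_000))  # second throw forced to be >= the first

a_new, b_new = a + 10, b + 10                        # same dice, faces relabelled 11..16

# The joint distribution changes: F(5, 5) differs between the two labellings ...
print(np.mean((a <= 5) & (b <= 5)), np.mean((a_new <= 5) & (b_new <= 5)))
# ... but the rank-based (copula-level) dependence is identical.
print(spearmanr(a, b)[0], spearmanr(a_new, b_new)[0])
```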
Copulas derive from a well-known result, Sklar's theorem, which shows that any joint distribution can be written as a function of its marginal distributions through a copula function. In other words, there exists a copula function \(C\) such that for all \( ( x_1, x_2, \cdots, x_n) \in \mathbb{R}^n: \)
$$ C(F_1(x_1),...,F_n(x_n)) = F(x_1,...,x_n).$$
Moreover, if the marginal distributions are continuous, then \(C\) is uniquely determined. Indeed, the copula concept helps us understand the full relationship among multivariate random variables by linking the univariate marginals to their joint multivariate distribution.
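As a quick worked instance (my own illustration), consider two independent random variables with continuous marginals: their joint cdf factorizes, so the copula guaranteed by Sklar's theorem is simply the product, or independence, copula:
$$ F(x_1, x_2) = F_1(x_1)\,F_2(x_2) \quad \Longrightarrow \quad C(u_1, u_2) = u_1 u_2.$$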
Let us define \(u_i = F_i(x_i)\), so that \(F^{-1}_i(u_i) = x_i\), where \(F^{-1}_i\) is the inverse (quantile) function. What we want is an expression for the copula function that lets us work out the dependence structure of the outcomes easily: $$C(u_1,...,u_n) = F(F^{-1}_1(u_1),...,F^{-1}_n(u_n)).$$ This equation suggests that if both the joint distribution function and the inverse functions of the marginal distributions are available, then we can derive an "implicit" expression for the copula function. For example, if the joint distribution is a multivariate normal distribution, then the copula is a Gaussian copula. However, it is not always straightforward to identify the copula function. Indeed, in many applications the joint distribution is not given but has to be assumed based on some stylized facts. For example, in financial problems the joint behaviour of different asset returns is rarely known exactly, so we often assume that the joint distribution is multivariate Gaussian or log-normal for computational simplicity, even if these assumptions may not be accurate. To understand the full multivariate outcome, the modelling problem consists of two steps (a small sketch follows the list):
- Identifying the marginal distributions;
- Defining the appropriate copula function describing the dependence structure.
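Here is a minimal sketch of these two steps in Python, assuming (purely for illustration) an exponential and a Student-t marginal tied together by a Gaussian copula with correlation \(0.7\):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
rho = 0.7                                   # assumed dependence parameter of the Gaussian copula
cov = np.array([[1.0, rho], [rho, 1.0]])

# Gaussian copula step: sample a bivariate normal and map each coordinate to (0, 1).
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=10_000)
u = stats.norm.cdf(z)                       # U_i = Phi(Z_i) are uniform, with Gaussian dependence

# Marginal step: impose the (assumed) marginal distributions via their inverse cdfs.
x1 = stats.expon(scale=2.0).ppf(u[:, 0])    # exponential marginal
x2 = stats.t(df=4).ppf(u[:, 1])             # Student-t marginal

# The margins are exponential and t, but their ranks inherit the Gaussian dependence.
print(stats.spearmanr(x1, x2)[0])
```

The same recipe works with any marginals and any copula from which we can sample; only the two building blocks change.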
And what happens if the marginal distributions are unavailable? I'll now explain how to construct a useful non-parametric copula directly from the empirical distribution function. The empirical cdf is given by \(F(t) = \frac{1}{m}\sum_{i=1}^m\textbf{1}\{x_i \le t\}\), where \(\textbf{1}\) is the indicator function, taking the value 1 if \(x_i \le t\) and 0 otherwise. The first thing we need to do is to determine \(U_i = F_i(X_i)\) from the observed values of each variable \(X_i\). The copula can then be derived from the joint empirical distribution as follows (a small numerical sketch is given just before the references): $$C(u_1,...,u_n)=F(F^{-1}_1(u_1),...,F^{-1}_n(u_n))=\frac{1}{m} \sum_{i=1}^m \mathbb{I}(U_{i,1}\le u_1,...,U_{i,n}\le u_n).$$

Copulas have many applications in quantitative finance, engineering, data analysis, turbulent combustion, climate and weather research, medicine, and so on. For example, in civil engineering one may be interested in understanding how the interaction of individual driver behaviours shapes the characteristics of a traffic flow; in reliability engineering, copulas have been successfully applied to characterize the dependence between components of a complex system whose joint behaviour may cause machine failure. I hope this post has made copulas a bit clearer. For more details, please see the references below.
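Finally, here is a minimal sketch of the empirical copula construction described above, using per-margin ranks as the pseudo-observations \(U_{i,j}\) (the sample itself is simulated only for illustration):

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
m = 1_000
x1 = rng.normal(size=m)
x2 = 0.5 * x1 + rng.normal(size=m)          # illustrative positively dependent sample

# Pseudo-observations U_{i,j}: empirical cdf of each margin evaluated at its own data.
U = np.column_stack([rankdata(x1) / m, rankdata(x2) / m])

def empirical_copula(u1, u2, U=U):
    """C_hat(u1, u2) = (1/m) * sum of indicators {U_{i,1} <= u1, U_{i,2} <= u2}."""
    return np.mean((U[:, 0] <= u1) & (U[:, 1] <= u2))

print(empirical_copula(0.5, 0.5))           # > 0.25 (the independence-copula value) for this sample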
Reference:
1. Understanding Relationships Using Copulas, Edward W. Frees and Emiliano A. Valdez, 1997.
2. Copulas for Finance: A Reading Guide and Some Applications, Eric Bouyé et al., 2000.
3. Copula lecture