Random outcomes are typically not studied in isolation. Instead, it is often natural to consider several random variables at the same time to describe a problem of interest. The central question is then how to account for the interactions between these individual (random) components. In this context, coupling functions called copulas have been successfully used to describe and study dependence between random variables.
Consider a random vector $\mathbf{X} = (X_1, \dots, X_d)$ with joint cumulative distribution function $H$. While $H$ contains all distributional information of $\mathbf{X}$, this information is all tangled together in a one-shot representation. Copulas can be used to disentangle the information contained in the marginal distribution functions $F_1, \dots, F_d$, $F_j(x) = P(X_j \le x)$, from the interactions between $X_1, \dots, X_d$. Specifically, Sklar's representation theorem tells us that $H$ can be expressed as $H(x_1, \dots, x_d) = C(F_1(x_1), \dots, F_d(x_d))$, where $C$ is a copula associated to $\mathbf{X}$. Conversely, any combination of a copula $C$ and margins $F_1, \dots, F_d$ in the above way yields a valid $d$-dimensional distribution function. Copulas themselves simply are multivariate distribution functions with standard uniform margins, representing possible dependence structures. The margins are then used to stretch the uniform margins of the copula to the appropriate domain.
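As a small illustration of this construction principle, the following base-R sketch (the correlation, sample size and margins are my own illustrative choices, not taken from any of the cited papers) samples from a bivariate Gaussian copula and then stretches the uniform margins into an exponential and a log-normal margin via quantile transformations.

```r
## Sketch: build a bivariate distribution from a copula plus margins (Sklar).
## Illustrative only; rho, n and the chosen margins are assumptions.
set.seed(1)
n   <- 10000
rho <- 0.7

## Step 1: sample from a bivariate Gaussian copula with correlation rho.
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # correlated standard normals
u  <- cbind(pnorm(z1), pnorm(z2))             # probability transform -> uniform margins

## Step 2: "stretch" the uniform margins to the desired distributions
## via their quantile functions: X_j = F_j^{-1}(U_j).
x1 <- qexp(u[, 1], rate = 2)      # exponential margin
x2 <- qlnorm(u[, 2])              # log-normal margin

## The dependence structure is unchanged by the margin transforms:
cor(u[, 1], u[, 2], method = "spearman")
cor(x1, x2, method = "spearman")  # approximately the same rank correlation
```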
Copulas greatly influence my approach to multivariate statistics and touch most of my research projects. Some past projects where copulas were the main objects under study (as opposed to useful tools) are the following: Instead of directly defining copulas, the research project Index-mixed copulas explores how new copulas can be built from initial ones. Copula-based models for non-parametric estimation of time series are explored in Flexible and Dynamic Modeling of Dependencies via Copulas, while the article Smoothed bootstrapping of copula functionals discusses re-sampling procedures for functionals directly related to the underlying dependence structure.
A common situation for banks, insurance companies or financial institutions is the need to quantitatively assess the risk inherent in their positions and portfolios. This is necessary due to regulatory requirements, but also for internal reporting. Famously starting with the value-at-risk, risk measures are constructed for this purpose and continue to be an active research area. Given that different risks are rarely independent, it is also valuable to study risk measures when considering multiple risks at the same time.
In the case of multiple risks, there are two main ways to approach the problem. In some situations, it is possible or even favourable to aggregate risks. For example, when considering a portfolio, the total value is the sum of the individual share prices $X_1, \dots, X_d$ weighted by their percentage contributions $w_1, \dots, w_d$, $w_j \ge 0$, $\sum_{j=1}^{d} w_j = 1$. Consequently, the total portfolio value is $S = \sum_{j=1}^{d} w_j X_j$. A univariate risk measure $\rho$ can now be used to quantify the risk inherent in $S$ via $\rho(S)$. Popular univariate risk measures are, for example, the value- and tail-value-at-risk, expectiles or extremiles.
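The aggregation step and two of the risk measures mentioned above can be sketched in a few lines of base R; the loss model, weights and confidence level below are purely illustrative assumptions.

```r
## Sketch: aggregate dependent losses and apply univariate risk measures.
## The data-generating model, weights and alpha are illustrative assumptions.
set.seed(42)
n <- 100000

## Two dependent loss components (a common shock plus individual parts).
common <- rexp(n)
X <- cbind(common + rexp(n), common + rexp(n))
w <- c(0.6, 0.4)                      # portfolio weights, summing to 1

S <- as.vector(X %*% w)               # aggregated position S = sum_j w_j X_j

alpha <- 0.99
VaR  <- quantile(S, probs = alpha, names = FALSE)  # empirical value-at-risk
TVaR <- mean(S[S > VaR])                           # empirical tail-value-at-risk
c(VaR = VaR, TVaR = TVaR)
```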
Properties of univariate risk measures and their application to optimal portfolio selection are among my ongoing research subjects. Distributional properties of $S$ under arbitrary dependence structures are studied in On the distribution of sums of random variables with copula-induced dependence, while optimal portfolio choice, that is the optimal choice of the weights $w_1, \dots, w_d$, is further discussed in Optimal Expected-Shortfall Portfolio Selection With Copula-Induced Dependence. A unifying framework to study univariate risk measures is proposed in Generalized extremiles and risk measures of distorted random variables.
When aggregation is not an option, risk measures are defined that take random vectors as arguments. A multivariate risk measure is therefore a map from a set of $d$-dimensional random vectors to a set $A$, where different choices for $A$ are possible. For the generalized expectiles and tail-value-at-risk discussed in Multivariate geometric expectiles and Multivariate geometric tail- and range-value-at-risk we have $A = \mathbb{R}^d$, but level sets of the joint distribution function and their boundaries are other popular choices.
At first glance, statistics seems to provide tools to analyze typical outcomes such as averages or medians. There are, however, a number of situations where central tendencies are not of interest: the average water level is of little help when designing countermeasures against flooding; typical day-to-day losses provide limited information for insurance and financial institutions when putting reserves into place that should protect against worst-case scenarios; and average lifetimes are not of interest when studying maximum lifetimes. Luckily, statisticians have developed tools to analyze the behaviour of the maximum of random outcomes, starting at least in the 1920s. The approach is indeed similar to the central limit theorem used for averages.
Starting with the familiar case of the central limit theorem for iid data $X_1, X_2, \dots$ with empirical average $\bar{X}_n$, we can under a finite second moment condition find sequences of constants $a_n > 0$ and $b_n$ to arrive at the limiting distribution
$$\lim_{n \to \infty} P\left(\frac{\bar{X}_n - b_n}{a_n} \le x\right) = \Phi(x),$$
where $\Phi$ is the standard normal distribution function. The result is exceptional due to the universal form of the limiting distribution. The stabilization of $\bar{X}_n$ with $a_n$ and $b_n$ is necessary, since $\bar{X}_n$ by itself converges almost surely to the constant $E[X_1]$, which is of no help in a more detailed probabilistic analysis of the behavior of $\bar{X}_n$.
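A quick simulation illustrates why the stabilization matters; the Exponential(1) model (so mean and standard deviation equal 1, i.e. $b_n = 1$ and $a_n = 1/\sqrt{n}$), the sample sizes and the number of replications are arbitrary choices for this sketch.

```r
## Sketch: the raw average degenerates, the stabilized average does not.
## Exponential(1) data, so mu = 1 and sigma = 1; all choices are illustrative.
set.seed(7)
mu <- 1; sigma <- 1
for (n in c(100, 1000, 10000)) {
  xbar <- replicate(1000, mean(rexp(n, rate = 1)))   # simulated averages
  cat("n =", n,
      " sd(raw average) =", round(sd(xbar), 4),
      " sd(stabilized) =", round(sd(sqrt(n) * (xbar - mu) / sigma), 4), "\n")
}
## sd(raw average) shrinks towards 0, while the stabilized version stays
## close to 1, reflecting the standard normal limit.
```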
While the central limit theorem allows us to draw probabilistic conclusions concerning the empirical average, a similar situation presents itself when considering the sample maximum $M_n = \max(X_1, \dots, X_n)$. Without any stabilization, $M_n$ almost surely converges to the upper endpoint of the common distribution function $F$. Considering random variables $X_1, \dots, X_n$ with joint distribution $H$ and margins $F_1, \dots, F_n$, the relatively straightforward yet key insight for thinking about extremes probabilistically is
$$P(M_n \le x) = P(X_1 \le x, \dots, X_n \le x) = C(F_1(x), \dots, F_n(x)),$$
where $C$ is a copula associated to $(X_1, \dots, X_n)$. In the iid case this yields the classical starting point
$$P(M_n \le x) = F(x)^n.$$
For iid data, the Fisher-Tippett-Gnedenko theorem now guarantees that if stabilizing constants $a_n > 0$ and $b_n$ are found such that
$$\lim_{n \to \infty} P\left(\frac{M_n - b_n}{a_n} \le x\right) = G(x)$$
for some non-degenerate distribution function $G$, then $G$ belongs to the class of generalized extreme value (GEV) distributions. This allows for a similarly refined analysis as in the case of the central limit theorem, only the standard normal distribution is now replaced by a GEV distribution. Different from the situation in the CLT, the GEV class of distributions crucially depends on a parameter that does not vanish in the limit and hence needs to be estimated. An overview of this problem can for example be found in Estimation of the Extreme Value Index, while Hunting for Black Swans in the European Banking Sector Using Extreme Value Analysis gives an impression of how extreme value theory can be used.
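The following base-R sketch illustrates the Fisher-Tippett-Gnedenko limit in the iid case for Exponential(1) data, where the stabilizing constants $a_n = 1$ and $b_n = \log n$ are known and the limit is the Gumbel distribution (GEV with extreme value index 0); sample size and number of replications are again arbitrary choices.

```r
## Sketch: stabilized maxima of iid Exponential(1) data converge to a Gumbel
## limit (a GEV distribution); here a_n = 1 and b_n = log(n) are known.
set.seed(11)
n    <- 1000
reps <- 5000
M     <- replicate(reps, max(rexp(n)))   # sample maxima
Mstab <- (M - log(n)) / 1                # stabilization with a_n = 1, b_n = log n

## Compare the empirical distribution of the stabilized maxima with the
## Gumbel distribution function G(x) = exp(-exp(-x)) at a few points.
x <- c(-1, 0, 1, 2)
rbind(empirical = sapply(x, function(t) mean(Mstab <= t)),
      gumbel    = exp(-exp(-x)))
```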
Part of my research interest is to study these types of problems for dependent data. As it turns out, even for dependent data Fisher-Tippett-Gnedenko-type theorems can be established, and stabilizing constants from the iid case (if they are known) can be used to stabilize the maximum under dependence, as discussed in Limiting behavior of maxima under dependence.
Generally, the software that I make available can be found on my GitHub page. Some of the (hopefully) more interesting projects are discussed below:
The R package multIntTestFunc provides implementations of test functions for multivariate numerical integration that can be used to test multivariate integration routines. The package covers six different integration domains (unit hypercube, unit ball, unit sphere, standard simplex, non-negative real numbers and R^n). For each domain, several functions with different properties (smooth, non-differentiable, ...) are available, and all functions are implemented for arbitrary dimensions n >= 1. For each function the exact value of the integral is known and implemented, which allows testing the accuracy of multivariate integration routines. The package is available on CRAN and GitHub.
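The idea behind such test functions can be sketched without reproducing the package's actual interface (the function and routine below are generic examples of mine, not the multIntTestFunc API): pick an integrand on the unit hypercube whose integral is known in closed form and compare an integration routine, here plain Monte Carlo, against the exact value.

```r
## Sketch of the test-function idea (not the multIntTestFunc API):
## f(x) = prod_j x_j on the unit hypercube [0,1]^n has exact integral 2^(-n).
set.seed(3)
n <- 5                                   # dimension
f <- function(x) prod(x)                 # test integrand
exact <- 2^(-n)                          # known exact value of the integral

## Crude Monte Carlo integration over [0,1]^n as the routine under test.
N   <- 100000
U   <- matrix(runif(N * n), ncol = n)
est <- mean(apply(U, 1, f))

c(exact = exact, estimate = est, abs_error = abs(est - exact))
```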
I have compiled some of my usual LaTeX definitions into the khermiscLaTeX package. The package has two main features: First, it provides an accessible interface for commonly used symbols, and second (more importantly) it uses the xparse package to provide overloaded versions of commonly used functions.
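To give an idea of what such an overloaded command can look like, here is a generic xparse example of my own; it is not the actual khermiscLaTeX interface, only an illustration of the mechanism (an optional argument changes how the same command typesets).

```latex
% Generic xparse example of an "overloaded" command; not taken from
% khermiscLaTeX, only an illustration: \E{X} gives E[X], \E[Q]{X} gives E_Q[X].
\documentclass{article}
\usepackage{amsmath, amssymb}
\usepackage{xparse}
\NewDocumentCommand{\E}{o m}{%
  \IfNoValueTF{#1}%
    {\mathbb{E}\!\left[#2\right]}%       % no optional argument
    {\mathbb{E}_{#1}\!\left[#2\right]}%  % optional subscript, e.g. a measure
}
\begin{document}
$\E{X}$ and $\E[\mathbb{Q}]{X}$
\end{document}
```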