Exercise 1.2 (Box-Muller transform)
We first prove the converse statement: starting with a pair (R,\theta) with the announced distribution, we obtain two independent standard Gaussian random variables.
For any bounded measurable function f:{\mathbb R}^2\to {\mathbb R} we have
\begin{align*}
& \mathbb{E}(f(X,Y)) = \mathbb{E}(f(R\cos(\theta),R\sin(\theta))) \\
& = \iint_{[0,2\pi]\times\mathbb{R}^+}{ f(\sqrt{\rho}\cos(\phi), \sqrt{\rho}\sin(\phi))\dfrac12 e^{-\frac{\rho}{2}}\dfrac{1}{2\pi}{\rm d}\rho {\rm d}\phi } \\
& (\text{applying the change of variables } x=\sqrt{\rho}\cos(\phi),\ y=\sqrt{\rho}\sin(\phi), \text{ whose Jacobian gives } {\rm d}\rho\,{\rm d}\phi = 2\,{\rm d}x\,{\rm d}y) \\
& = \iint_{\mathbb{R}^2}{f(x,y)e^{-\frac{(x^2+y^2)}2}\dfrac{1}{2\pi} {\rm d}x{\rm d}y},
\end{align*}
which is the expectation w.r.t. the 2-dimensional Gaussian distribution with independent components.
Conversely, reading the above chain of equalities backwards shows that, starting from independent standard Gaussian components X and Y, the variables R and \theta are independent and distributed as claimed.
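As a quick illustration, here is a minimal Python/NumPy sketch of the Box-Muller sampler (the function name and sample sizes are ours): we draw \rho with density \frac12 e^{-\rho/2} via \rho = -2\log(U_1), \phi uniform on [0,2\pi], and return (\sqrt{\rho}\cos\phi, \sqrt{\rho}\sin\phi).

```python
import numpy as np

def box_muller(n_samples, rng=None):
    """Generate two arrays of n_samples i.i.d. N(0,1) variables via the Box-Muller transform."""
    rng = np.random.default_rng() if rng is None else rng
    u1 = rng.uniform(size=n_samples)      # used to build rho ~ Exp(1/2), i.e. R = sqrt(rho)
    u2 = rng.uniform(size=n_samples)      # used to build theta ~ U([0, 2*pi])
    rho = -2.0 * np.log(u1)               # rho has density (1/2) * exp(-rho/2) on R^+
    theta = 2.0 * np.pi * u2
    x = np.sqrt(rho) * np.cos(theta)
    y = np.sqrt(rho) * np.sin(theta)
    return x, y

x, y = box_muller(100_000)
print(x.mean(), x.std(), np.corrcoef(x, y)[0, 1])   # approximately 0, 1 and 0
```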
Exercise 1.4 (acceptance-rejection method)
The algorithm of the exercise repeatedly draws X \overset{d}{\rm =} {\cal E}xp(1) and U \overset{d}{\rm =} \mathcal{U}([-1,1]) (independent of X) until the acceptance condition
|U|e^{(X-1)^2/2} \leq 1
holds. Note that Z = X\,{\rm sgn}(U) then follows the Laplace distribution with density
\begin{align*} g(z) = \dfrac{e^{-|z|}}{2}. \end{align*}
Remark that |U| is uniform on [0,1] and independent of {\rm sgn}(U). The algorithm thus performs an acceptance-rejection scheme with the Laplace law as auxiliary distribution (see Proposition 1.3.2). Indeed, we simulate the pair (U,X) until |U|e^{(X-1)^2/2} \leq 1, which is equivalent to simulating (|U|, Z) = (|U|, {\rm sgn}(U)X) until |U|e^{(|Z|-1)^2/2} \leq 1. Conditionally on this acceptance event, Z = {\rm sgn}(U)X has the target distribution.
It remains to find the target distribution of this scheme. Denote its density by f(z), which has to satisfy
\dfrac{g(z)}{f(z)} = \dfrac{1}{c}e^{(|z|-1)^2/2},
for some constant c.
By a simple calculation we get
f(z) = c\dfrac{g(z)}{e^{(|z|-1)^2/2}} = \dfrac{c}{2}e^{-|z| - (|z|-1)^2/2} = \dfrac{c}{2}e^{-1/2}e^{-z^2/2} = c^{\prime}e^{-z^2/2},
since -|z| - (|z|-1)^2/2 = -z^2/2 - 1/2. This is, up to the normalizing constant, the density of the standard normal distribution: the target of the scheme is therefore \mathcal{N}(0,1).
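For illustration, a minimal Python/NumPy sketch of this acceptance-rejection scheme (the function name and sample size are ours):

```python
import numpy as np

def normal_via_laplace(n_samples, rng=None):
    """Sample N(0,1) by acceptance-rejection with the Laplace auxiliary law:
    draw X ~ Exp(1) and U ~ U([-1,1]) until |U| * exp((X-1)^2 / 2) <= 1,
    then return Z = sgn(U) * X."""
    rng = np.random.default_rng() if rng is None else rng
    out = []
    while len(out) < n_samples:
        x = rng.exponential(1.0)
        u = rng.uniform(-1.0, 1.0)
        if abs(u) * np.exp((x - 1.0) ** 2 / 2.0) <= 1.0:   # acceptance condition
            out.append(np.sign(u) * x)                      # Z = sgn(U) * X
    return np.array(out)

z = normal_via_laplace(50_000)
print(z.mean(), z.std())   # approximately 0 and 1
```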
Exercise 1.6 (ratio-of-uniforms method, Gamma distribution)
First note that \Gamma(\alpha,\theta) \overset{\rm d}= \theta\,\Gamma(\alpha,1), so we only consider the case \theta=1. The distribution \Gamma(\alpha,1) has density proportional to f(z) = z^{\alpha-1}e^{-z}\mathbb{1}_{z\geq 0}.
Note that the simulation algorithm below does not require knowledge of the normalizing constant of the density; this is precisely the interest of such a technique.
We apply Proposition 1.3.5 with r=d=1.
Following Lemma 1.3.6, we observe that (for \alpha\geq 1) z\mapsto f(z) and z\mapsto z^2 f(z) are bounded and each admits a unique maximum. Solving (\log(f(z)))^{\prime} = 0 we get \sup_z{f(z)} = f(\alpha-1); similarly \sup_z{z^2 f(z)} = (\alpha+1)^2f(\alpha+1).
Now using Lemma 1.3.6 we obtain
A_{f,1} \subseteq \tilde{A}_{f,1} = \left[0,\sqrt{f(\alpha-1)}\right]\times\left[0,(\alpha+1)\sqrt{f(\alpha+1)}\right],
where
A_{f,1} = \left\{ (u,v)\in \mathbb{R}^2: 0 < u \leq \sqrt{f\left(\dfrac{v}{u}\right)} \right\}.
Thus we may simulate uniform pairs (U,V) on the rectangle \tilde{A}_{f,1} until U \leq \sqrt{f\left(\dfrac{V}{U}\right)}; then V/U has the desired \Gamma(\alpha,1) distribution (and \theta\, V/U the \Gamma(\alpha,\theta) one).
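A possible Python/NumPy implementation of this ratio-of-uniforms sampler could look as follows (a sketch assuming \alpha \geq 1 so that the rectangle above is finite; the function name and test parameters are ours):

```python
import numpy as np

def gamma_ratio_of_uniforms(alpha, theta=1.0, size=1, rng=None):
    """Sample Gamma(alpha, theta), alpha >= 1, by the ratio-of-uniforms method
    applied to the un-normalized density f(z) = z^(alpha-1) * exp(-z)."""
    assert alpha >= 1.0, "the bounding rectangle used here assumes alpha >= 1"
    rng = np.random.default_rng() if rng is None else rng
    f = lambda z: z ** (alpha - 1.0) * np.exp(-z)
    u_max = np.sqrt(f(alpha - 1.0))              # sup_z sqrt(f(z)), attained at z = alpha - 1
    v_max = (alpha + 1.0) * np.sqrt(f(alpha + 1.0))  # sup_z z * sqrt(f(z)), attained at z = alpha + 1
    out = []
    while len(out) < size:
        u = rng.uniform(0.0, u_max)
        v = rng.uniform(0.0, v_max)
        if u > 0.0 and u <= np.sqrt(f(v / u)):   # (u, v) falls in A_{f,1}
            out.append(theta * v / u)            # scale by theta to get Gamma(alpha, theta)
    return np.array(out)

samples = gamma_ratio_of_uniforms(2.5, size=50_000)
print(samples.mean(), samples.var())   # approximately alpha = 2.5 and alpha = 2.5 (theta = 1)
```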
Exercise 1.9 (Archimedean copula)
We have to show that for any (u_1,\ldots,u_d)\in [0,1]^d we have
\mathbb{P}(U_1\leq u_1, \ldots, U_d\leq u_d) = C(u_1,\ldots,u_d) := \phi^{-1}(\phi(u_1)+\cdots+\phi(u_d)).
Recall that \phi^{-1}(u) = \mathbb{E}(e^{-uY}). Using the Dominated Convergence Theorem, we deduce that \phi^{-1} is continuous; moreover, since \mathbb{P}(Y>0)>0, \phi^{-1} is strictly decreasing. Note also that \phi^{-1}(0)=1 and \phi^{-1}(u) \underset{u\to +\infty}{\to} 0, using \mathbb{P}(Y>0)=1. Hence the function \phi(\cdot) is well-defined on (0,1] (with the convention \phi(0)=+\infty), continuous and strictly decreasing.
Using the independence of the X_i's (and their independence from Y), together with the monotonicity of \phi(\cdot) and \phi^{-1}(\cdot), we write
\begin{align*} \mathbb{P}(U_1\leq u_1, \ldots, U_d\leq u_d) &= \mathbb{P}\left( -\dfrac{1}{Y}\log(X_1)\geq \phi(u_1), \ldots,-\dfrac{1}{Y}\log(X_d)\geq \phi(u_d) \right) \\ & = \mathbb{P}\left(X_1\leq e^{-Y\phi(u_1)}, \ldots, X_d\leq e^{-Y\phi(u_d)} \right) \\ & = \mathbb{E}\left( \mathbb{P}(X_1\leq e^{-Y\phi(u_1)}, \ldots, X_d\leq e^{-Y\phi(u_d)} \mid Y ) \right) \\ & = \mathbb{E}( e^{-Y\phi(u_1)}\cdots e^{-Y\phi(u_d)}) \\ &= \phi^{-1}(\phi(u_1)+ \cdots + \phi(u_d)) = C(u_1, \ldots, u_d), \end{align*}
which proves the result.
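The proof also gives a sampling recipe: draw Y, then independent uniforms X_1,\ldots,X_d, and set U_i = \phi^{-1}(-\log(X_i)/Y). Below is a Python/NumPy sketch for one illustrative (not imposed by the exercise) choice, Y \overset{d}{=} \Gamma(1/a,1), whose Laplace transform \phi^{-1}(u) = (1+u)^{-1/a} yields the Clayton copula.

```python
import numpy as np

def clayton_copula_sample(n_samples, d, a, rng=None):
    """Sample a d-dimensional Clayton copula (parameter a > 0) via the construction
    of the exercise: U_i = phi^{-1}(-log(X_i)/Y), where phi^{-1}(u) = E(exp(-u*Y))
    = (1 + u)^(-1/a) is the Laplace transform of Y ~ Gamma(1/a, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    y = rng.gamma(shape=1.0 / a, scale=1.0, size=(n_samples, 1))   # the mixing variable Y > 0
    x = rng.uniform(size=(n_samples, d))                           # X_1,...,X_d i.i.d. U(0,1), independent of Y
    phi_inv = lambda t: (1.0 + t) ** (-1.0 / a)                    # Laplace transform of Y
    return phi_inv(-np.log(x) / y)                                 # U_i = phi^{-1}(-log(X_i)/Y)

u = clayton_copula_sample(100_000, d=2, a=2.0)
print(u.min(), u.max())                      # all components lie in (0, 1)
print(np.corrcoef(u[:, 0], u[:, 1])[0, 1])   # positive dependence induced by the common Y
```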
Exercise 2.2 (substitution method)
To make the problem non-trivial, we assume u\neq 0. For both estimators \phi_{1,M}(u) and \phi_{2,M}(u), the Law of Large Numbers implies a.s. convergence to \phi(u). Thus, to compare the two estimators, we calculate and compare their asymptotic variances, which are later used to construct confidence intervals.
- For \phi_{1,M}(u) we have
\begin{align*}\sqrt{M}(\phi_{1,M}(u) - \phi(u)) = \dfrac{1}{\sqrt{M}}\sum_{m=1}^M{(e^{uX_m} - \mathbb{E}(e^{uX}))}.\end{align*}
From the Central Limit Theorem \sqrt{M}(\phi_{1,M}(u) - \phi(u)) converges in distribution to a centered Gaussian variable, with variance {\rm Var}(e^{uX}). The latter is equal to
\begin{align*}\mathbb{E}(e^{2uX}) - \mathbb{E}(e^{uX})^2 = \phi(2u) - \phi(u)^2 = e^{2u^2\sigma^2} - e^{u^2\sigma^2} = e^{u^2\sigma^2}(e^{u^2\sigma^2} - 1).\end{align*}
- Now consider \phi_{2,M}(u). Since \sigma_M^2=\frac{1}{M}\sum_{m=1}^M X_m^2, the Central Limit Theorem ensures that \sqrt{M}(\sigma_M^2 - \sigma^2) converges in distribution to a centered Gaussian variable with variance {\rm Var}(X^2)=2\sigma^4.
Now we apply the substitution method (Theorem 2.2.2) to derive that, for any regular function f,
\begin{align*}\sqrt{M}(f(\sigma_M^2) - f(\sigma^2)) {\underset{M\to \infty}\Longrightarrow}{\cal N}\Big(0, (f^{\prime}(\sigma^2))^2 2\sigma^4 \Big).\end{align*}
So, using this for the case f(x) = e^{\frac{u^2}{2}x} as in \phi_{2,M}(u) we obtain
\begin{align*}\sqrt{M}(\phi_{2,M}(u) - \phi(u)) {\underset{M\to \infty}\Longrightarrow} \mathcal{N}\left(0, \left(\frac{u^2}{2}\right)^2 e^{u^2\sigma^2} 2\sigma^4\right).\end{align*}
Observe that (for any u\neq 0)
\begin{align*} e^{u^2\sigma^2}(e^{u^2\sigma^2} - 1)> e^{u^2\sigma^2}(u^2\sigma^2+\frac{ (u^2\sigma^2)^2 }{ 2})>\left(\frac{u^2}{2}\right)^2 e^{u^2\sigma^2} 2\sigma^4;\end{align*}
therefore the second estimator \phi_{2,M}(u) has a strictly smaller asymptotic variance and yields an asymptotically better (narrower) confidence interval than \phi_{1,M}(u).
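The variance comparison can be checked numerically; here is a short Python/NumPy sketch (the parameters \sigma, u, M and the number of repetitions are arbitrary choices of ours):

```python
import numpy as np

# Numerical check of the two asymptotic variances, with X ~ N(0, sigma^2).
rng = np.random.default_rng(0)
sigma, u, M, n_rep = 1.0, 1.0, 5_000, 1_000
x = rng.normal(0.0, sigma, size=(n_rep, M))            # n_rep independent samples of size M

phi1 = np.exp(u * x).mean(axis=1)                      # naive estimator phi_{1,M}(u)
phi2 = np.exp(u**2 * (x**2).mean(axis=1) / 2.0)        # substitution estimator phi_{2,M}(u)

print("true value phi(u)     :", np.exp(u**2 * sigma**2 / 2.0))
print("M*Var(phi1): empirical", M * phi1.var(), " theory", np.exp(2*u**2*sigma**2) - np.exp(u**2*sigma**2))
print("M*Var(phi2): empirical", M * phi2.var(), " theory", (u**2/2.0)**2 * np.exp(u**2*sigma**2) * 2.0 * sigma**4)
```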
Exercise 2.7 (concentration inequality, maximum of Gaussian variables)
- Let Y = (Y_1, \ldots, Y_d) be a vector of i.i.d. standard Gaussian random variables.
First the function f(y) = \sup_{1\leq i\leq d}{y_i} is 1-Lipschitz. Indeed, for any y = (y_1, \ldots, y_d) and y'= (y'_1, \ldots, y'_d) we have
\begin{align*} y_i &\leq y_{i}' + |y - y'| \quad \text{for each } i,\\ \sup_{1\leq i\leq d}y_i& \leq \sup_{1\leq i\leq d}y_i' + |y- y'|, \end{align*}
and, exchanging the roles of y and y',
\begin{align*} \left| \sup_{1\leq i\leq d}{y_i} - \sup_{1\leq i\leq d}{y'_i}\right| &\leq |y-y'|. \end{align*}
Then, using the concentration inequality (see Corollaries 2.4.13 and 2.4.16) we obtain
\begin{align*}& \mathbb{P}\left(\left|\sup_{1\leq i\leq d}{Y_i}-\mathbb{E}\left(\sup_{1\leq i\leq d}{Y_{i}}\right)\right| > \varepsilon \right) \leq 2\exp\left(-\dfrac{\varepsilon^2}{2}\right), \quad \forall\varepsilon \geq0.\end{align*}
More generally, Y can be represented as Y = L\tilde{Y}, where \tilde{Y} is a standard Gaussian vector (zero mean, identity covariance matrix) and L is a matrix such that LL^\top equals the covariance matrix of Y. Notice that \mathbb{E}(Y_i^2)=\sum_j L_{i,j}^2\leq \sigma^2. Moreover, f(y) = \sup_{1\leq i\leq d}{(Ly)_i} is \sigma-Lipschitz: this can be justified as in the i.i.d. case, using
\begin{align*}(Ly)_i \leq (Ly')_i + \sum_{j=1}^d{|L_{ij}|}|y_j - y'_j| &\leq(Ly')_i + \sqrt{\sum_{j=1}^d {|L_{ij}|^2}}\, |y - y'| \quad \text{(by the Cauchy-Schwarz inequality)},\\\left| \sup_{1\leq i\leq d}{(Ly)_i} - \sup_{1\leq i\leq d}{(Ly')_i}\right| &\leq \sigma|y-y'|.\end{align*}
Applying the concentration inequality to the \sigma-Lipschitz function f(y) = \sup_{1\leq i\leq d}{(Ly)_i} gives the announced result.
- Let \varepsilon>0. From the result of part i) we obtain
\begin{align*} \mathbb{P}\left(\dfrac{\sup_{1\leq i\leq d}{Y_i}}{\mathbb{E}(\sup_{1\leq i\leq d}{Y_i})} > 1+\varepsilon\right)& =\mathbb{P}\left(\sup_{1\leq i\leq d}{Y_i} - \mathbb{E}(\sup_{1\leq i\leq d}{Y_i}) > \varepsilon \mathbb{E}(\sup_{1\leq i\leq d}{Y_i}) \right) \\ & \leq \exp\left(- \dfrac{\varepsilon^2(\mathbb{E}(\sup_{1\leq i\leq d}{Y_i}))^2}{2\sup_{1\leq i\leq d}{\mathbb{E}(Y_i^2)}}\right)\to 0\end{align*}
using the assumptions as d\to+\infty. Similarly \mathbb{P}\left(\dfrac{\sup_{1\leq i\leq d}{Y_i}}{\mathbb{E}(\sup_{1\leq i\leq d}{Y_i})} < 1-\varepsilon\right) \to 0. This shows that \dfrac{\sup_{1\leq i\leq d}{Y_i}}{\mathbb{E}(\sup_{1\leq i\leq d}{Y_i})} converges to 1 in probability.
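A small Python/NumPy experiment (in the i.i.d. case, with our own choice of dimensions and replications) illustrating this concentration of the ratio around 1:

```python
import numpy as np

# Empirical illustration in the i.i.d. case: the ratio sup_i Y_i / E(sup_i Y_i)
# concentrates around 1 as the dimension d grows.
rng = np.random.default_rng(0)
n_rep = 1_000
for d in (10, 100, 10_000):
    max_y = rng.normal(size=(n_rep, d)).max(axis=1)    # n_rep realisations of sup_i Y_i
    ratio = max_y / max_y.mean()                       # max_y.mean() estimates E(sup_i Y_i)
    print(d, ratio.std())                              # dispersion of the ratio shrinks as d grows
```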
Exercise 3.3 (stratification, optimal allocation)
- We have (formula 3.2.2)
\begin{align*} {\rm Var}(I_{M_1, \ldots, M_k}^{strat.}) = \sum_{j=1}^k{p_j^2\dfrac{\sigma_j^2}{M_j}}, \quad \text{with } \sum_{j=1}^k{M_j} = M. \end{align*}
The minimisation may be done by the Lagrange multiplier method: we minimize the function
\begin{align*} L(M_1, \ldots, M_k, \lambda) = \sum_{j=1}^k{p_j^2\dfrac{\sigma_j^2}{M_j}} + \lambda \left( \sum_{j=1}^k{M_j} - M\right). \end{align*}
From the optimality condition \partial_{M_j}L(M_1, \ldots, M_k, \lambda) = -\frac{p^2_j\sigma^2_j}{M^2_j}+\lambda= 0 we get M_j^{\ast} = p_j\sigma_j/\sqrt{\lambda}. The constraint \sum_{j=1}^k{M_j^{\ast}} = M then implies
\begin{align*} M_j^{\ast} = M\dfrac{p_j\sigma_j}{\sum_{i=1}^k{p_i\sigma_i}}. \end{align*}
Another way to obtain the optimal M_j's is to re-interpret the variance {\rm Var}(I_{M_1, \ldots, M_k}^{strat.}) as an expectation under the probability distribution \mathbb{P}(J=j)=\frac{M_j}{M}:
\begin{align*} {\rm Var}(I_{M_1, \ldots, M_k}^{strat.}) = M\, \mathbb{E}\left[\dfrac{p_J^2\sigma_J^2}{M^2_J} \right]. \end{align*}
The Jensen inequality gives
\begin{align*} {\rm Var}(I_{M_1, \ldots, M_k}^{strat.}) \geq M\left(\mathbb{E}\left[\dfrac{p_J\sigma_J}{M_J} \right]\right)^2 = M\left(\sum_{j=1}^k \frac{M_j}{M}\, p_j\dfrac{\sigma_j}{M_j} \right)^2 = \frac{1}{M}\left(\sum_{j=1}^k p_j\sigma_j\right)^2.\end{align*}
The lower bound on the right-hand side is achieved when the Jensen inequality is an equality, that is when \dfrac{p_j\sigma_j}{M_j} is constant over j. We then retrieve the formula for M_j^{\ast}.
- We write
\begin{align*} & \sqrt{M}\left(I_{M^\ast_1, \ldots, M^\ast_k}^{strat.} - \mathbb{E}(X) \right) = \sqrt{M}\left(\sum_{j=1}^k{p_j\dfrac{1}{M_j^{\ast}}\sum_{m=1}^{M_j^{\ast}}{X_{j,m}}} - \sum_{j=1}^k{p_j\mathbb{E}(X|Z\in \mathcal{S}_j)} \right) \\ & = \sum_{j=1}^k x_j\sqrt{M_j^{\ast}}\left(\dfrac{1}{M_j^{\ast}}\sum_{m=1}^{M_j^{\ast}}{(X_{j,m} - \mathbb{E}(X|Z\in \mathcal{S}_j))} \right), \end{align*}
where x_j:=p_j \frac{( \sum_{i=1}^k p_i\sigma_i )^\frac{1}{2}}{(p_j\sigma_j)^\frac{1}{2}}. Now from the standard one-dimensional CLT, we deduce that for each j
\begin{align*} {\cal E}_j(M_j^{\ast}):=\sqrt{M_j^{\ast}}\left(\dfrac{1}{M_j^{\ast}} \sum_{m=1}^{M_j^{\ast}} {(X_{j,m} - \mathbb{E}(X|Z\in \mathcal{S}_j) )}\right) {\underset{M\to \infty}\Longrightarrow} \mathcal{N}(0, \sigma^2_j). \end{align*}
Since the variables (X_{j,m}:m\geq 1) are independent across j, it is easy to derive a CLT for the vector ({\cal E}_1(M_1^{\ast}),\dots,{\cal E}_k(M_k^{\ast})). More formally, one may use the Lévy theorem (Theorem A.1.3) to get, for any u\in \mathbb{R},
\begin{align*}\mathbb{E}(e^{iu \sqrt{M}(I_{M^\ast_1, \ldots, M^\ast_k}^{strat.} - \mathbb{E}(X) )})&= \mathbb{E}(e^{iu \sum_{j=1}^k x_j {\cal E}_j(M_j^{\ast})})\\ &=\prod_{j=1}^k\mathbb{E}(e^{iu x_j {\cal E}_j(M_j^{\ast})})\quad \text{(by independence between strata)}\\ &{\underset{M\to \infty}\longrightarrow}\prod_{j=1}^k e^{-\frac{1}{2}u^2 x^2_j \sigma^2_j}=e^{-\frac{1}{2}u^2 (\sum_{j=1}^k p_j\sigma_j)^2}. \end{align*}
This shows that \sqrt{M}\left(I_{M^\ast_1, \ldots, M^\ast_k}^{strat.} - \mathbb{E}(X) \right){\underset{M\to \infty}\Longrightarrow} \mathcal{N}\left(0, \Big(\sum_{j=1}^k p_j\sigma_j\Big)^2\right).
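As an illustration of the optimal allocation M_j^{\ast}\propto p_j\sigma_j, here is a Python/NumPy sketch on a toy problem of our own choosing (Z uniform on [0,1], X = e^{Z}, equiprobable strata), with the \sigma_j estimated by a pilot run:

```python
import numpy as np

# A minimal sketch of stratified sampling with (estimated) optimal allocation.
# Toy setting (our choice): Z ~ U(0,1), X = exp(Z), strata S_j = [(j-1)/k, j/k),
# so p_j = 1/k and Z | Z in S_j is uniform on S_j.
rng = np.random.default_rng(0)
k, M, M_pilot = 10, 100_000, 1_000
edges = np.linspace(0.0, 1.0, k + 1)
p = np.full(k, 1.0 / k)

# Pilot run: estimate the conditional standard deviations sigma_j.
sigma = np.array([np.exp(rng.uniform(edges[j], edges[j + 1], M_pilot)).std() for j in range(k)])

# Optimal allocation M_j* proportional to p_j * sigma_j.
M_j = np.maximum(1, np.round(M * p * sigma / np.sum(p * sigma)).astype(int))

# Stratified estimator: sum_j p_j * (mean of X over M_j draws from stratum j).
estimate = sum(
    p[j] * np.exp(rng.uniform(edges[j], edges[j + 1], M_j[j])).mean()
    for j in range(k)
)
print(estimate, np.e - 1.0)   # E[exp(Z)] = e - 1
```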
Exercise 3.6 (importance sampling, Gaussian vectors)
Let Y \overset{d}{=} \mathcal{N}(0, {\rm Id}) be a d-dimensional standard Gaussian vector and denote its density by p(x)=(2\pi)^{-d/2}\exp(-\frac 1 2 |x|^2). Consider a new measure under which Y has the distribution of SY + \theta under the initial one, where \theta is a d-dimensional vector and S is an invertible d\times d matrix. Then (following Proposition 3.4.8) the likelihood is given by
\begin{align*}
L& = \dfrac{1}{|\det(S)|}\dfrac{p(S^{-1}(Y - \theta))}{p(Y)} \\
&=
\dfrac{1}{|\det(S)|}\exp
\left(
- \frac12(Y-\theta)^{\top}(SS^{\top})^{-1}(Y-\theta) + \frac12 Y^{\top}Y
\right) \\
& = \dfrac{1}{|\det(S)|}\exp
\left(
\frac12 Y^{\top}\Big({\rm Id}-(SS^{\top})^{-1}\Big)Y + Y^{\top}(SS^{\top})^{-1}\theta - \frac12 \theta^{\top}(SS^{\top})^{-1}\theta
\right).
\end{align*}
When Y, S and \theta are scalar quantities, we retrieve the second formula of Corollary 3.4.9.
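As a sanity check of the likelihood formula, the following Python/NumPy sketch (our own toy dimensions, matrix S, vector \theta and test function) verifies by Monte Carlo the change-of-measure identity \mathbb{E}(f(Y)L) = \mathbb{E}(f(SY+\theta)), which follows from L being the ratio of the new density to the initial one.

```python
import numpy as np

# Numerical check (toy example): with L(Y) = (1/|det S|) * p(S^{-1}(Y - theta)) / p(Y),
# the change of measure gives E[f(Y) L(Y)] = E[f(S Y + theta)] for bounded f.
rng = np.random.default_rng(0)
d, M = 2, 1_000_000
S = np.array([[1.2, 0.3], [0.0, 0.8]])
theta = np.array([0.5, -0.2])
Sigma_inv = np.linalg.inv(S @ S.T)

def likelihood(y):
    # L = (1/|det S|) exp( 1/2 y^T (Id - (S S^T)^{-1}) y + y^T (S S^T)^{-1} theta
    #                      - 1/2 theta^T (S S^T)^{-1} theta )
    quad = 0.5 * np.einsum('ij,jk,ik->i', y, np.eye(d) - Sigma_inv, y)
    return np.exp(quad + y @ Sigma_inv @ theta - 0.5 * theta @ Sigma_inv @ theta) / abs(np.linalg.det(S))

f = lambda y: np.cos(y[:, 0] + 2.0 * y[:, 1])     # an arbitrary bounded test function
Y = rng.normal(size=(M, d))
print(np.mean(f(Y) * likelihood(Y)))              # E[f(Y) L(Y)]
print(np.mean(f(Y @ S.T + theta)))                # E[f(S Y + theta)], should be close
```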