Monte-Carlo Methods and Stochastic Processes: From Linear to Non-Linear: Solutions to exercises

Exercise 1.2 (Box-Muller transform)

We first prove the converse statement, that is, starting with a couple $(R,\theta)$ having the distribution as announced, we obtain two independent Gaussian random variables. For any bounded measurable function $f:{\mathbb R}^2\to {\mathbb R}$ we have \begin{align*} & \mathbb{E}(f(X,Y)) = \mathbb{E}(f(R\cos(\theta),R\sin(\theta))) \\ & = \iint_{[0,2\pi]\times\mathbb{R}^+}{ f(\sqrt{\rho}\cos(\phi), \sqrt{\rho}\sin(\phi))\dfrac12 e^{-\frac{\rho}{2}}\dfrac{1}{2\pi}{\rm d}\rho {\rm d}\phi } \\ & (\text{applying the change of variables } x=\sqrt{\rho}\cos(\phi), y=\sqrt{\rho}\sin(\phi)) \\ & = \iint_{\mathbb{R}^2}{f(x,y)e^{-\frac{(x^2+y^2)}2}\dfrac{1}{2\pi} {\rm d}x{\rm d}y}, \end{align*} which is the expectation w.r.t. the 2-dimensional Gaussian distribution with independent components. Conversely, the above computations show that starting from independent standard Gaussian components, we retrieve that $R$ and $\theta$ are independent and distributed as claimed.
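The construction can be checked numerically. The following is a minimal sketch, not the book's code: it draws $R^2$ with density $\frac12 e^{-\rho/2}$ by inversion and $\theta$ uniformly on $[0,2\pi]$, then returns $(R\cos\theta, R\sin\theta)$. The function names `box_muller` and `empirical_moments` are ours, introduced only for illustration.

```python
import math
import random


def box_muller(rng):
    """Return (X, Y) = (R cos(theta), R sin(theta)), R^2 ~ Exp(1/2), theta ~ U[0, 2*pi]."""
    # Inverse-CDF sampling: R^2 has density (1/2) exp(-rho/2) on [0, +inf).
    rho = -2.0 * math.log(1.0 - rng.random())
    phi = 2.0 * math.pi * rng.random()
    r = math.sqrt(rho)
    return r * math.cos(phi), r * math.sin(phi)


def empirical_moments(n, seed=0):
    """Empirical means, variances and covariance of n simulated pairs."""
    rng = random.Random(seed)
    pairs = [box_muller(rng) for _ in range(n)]
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    vx = sum((x - mx) ** 2 for x, _ in pairs) / n
    vy = sum((y - my) ** 2 for _, y in pairs) / n
    cxy = sum((x - mx) * (y - my) for x, y in pairs) / n
    return mx, my, vx, vy, cxy
```

For a large sample, the means and the covariance should be close to 0 and both variances close to 1, in agreement with the standard Gaussian distribution with independent components.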

Exercise 1.4 (acceptance-rejection method)

Let us rewrite the condition $(x-1)^2 \leq -2\log(|u|)$ on line 6 of the algorithm as
$$
|u|e^{(x-1)^2/2} \leq 1.
$$
Note that if $X \overset{d}{\rm =} {\cal E}xp(1)$ and $U \overset{d}{\rm =} \mathcal{U}([-1,1])$ then $Z = X\ {\rm sgn}(U)$ follows the Laplace distribution with the density
\begin{align*}
g(z) = \dfrac{e^{-|z|}}{2}.
\end{align*}
Note that $|U|$ is uniform on $[0,1]$ and independent of ${\rm sgn}(U)$, hence of $Z$. The algorithm therefore performs an acceptance-rejection scheme with the Laplace law as the auxiliary distribution (see Proposition 1.3.2). Indeed, we simulate the pair $(U,X)$ until $|U|e^{(X-1)^2/2} \leq 1$. Since $|Z| = X$, this is equivalent to simulating $(|U|, Z) = (|U|, {\rm sgn}(U)X)$ until $|U|e^{(|Z|-1)^2/2} \leq 1$. Conditionally on this event, $Z = {\rm sgn}(U)X$ has the target distribution.

It remains to find the target distribution of this scheme. Denote its density by $f(z)$, which has to satisfy
$$
\dfrac{g(z)}{f(z)} = \dfrac{1}{c}e^{(|z|-1)^2/2},
$$
for some constant $c$.
By a simple calculation we get
$$
f(z) = c\dfrac{g(z)}{e^{(|z|-1)^2/2}} = \dfrac{c}{2}e^{-|z| - (|z|-1)^2/2} = c^{\prime}e^{-z^2/2},
$$
which is the density of the standard normal variable.
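As an illustration, here is a minimal Python sketch of the scheme analysed above. It is reconstructed from the acceptance condition $(x-1)^2 \leq -2\log(|u|)$ and is not a copy of the algorithm in the book; the function name `normal_by_rejection` is ours.

```python
import math
import random


def normal_by_rejection(rng):
    """Draw one sample using Exp(1) and U([-1, 1]) until the pair is accepted."""
    while True:
        x = -math.log(1.0 - rng.random())   # X ~ Exp(1) by inversion
        u = rng.uniform(-1.0, 1.0)          # U ~ U([-1, 1])
        if u == 0.0:
            continue                        # avoid log(0); an event of probability zero
        if (x - 1.0) ** 2 <= -2.0 * math.log(abs(u)):
            return math.copysign(x, u)      # Z = sgn(U) X


def sample(n, seed=0):
    """Return n accepted samples."""
    rng = random.Random(seed)
    return [normal_by_rejection(rng) for _ in range(n)]
```

Matching $f$ with the standard normal density gives $c = \sqrt{2e/\pi} \approx 1.32$, so roughly 76% of the proposed pairs are accepted.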

Exercise 1.6 (ratio-of-uniforms method, Gamma distribution)

First note that $\Gamma(\alpha,\theta) \overset{\rm d}= \theta\Gamma(\alpha,1)$ so we will consider the case $\theta=1$. The distribution $\Gamma(\alpha,1)$ has the density proportional to
$$
f(z) = z^{\alpha-1}e^{-z}\mathbb{1}_{z\geq 0}.
$$
Note that the simulation algorithm below does not require knowledge of the normalizing constant of the density; this is the main interest of the technique.
We apply Proposition 1.3.5 with $r=d=1$.
Following Lemma 1.3.6, we observe that, for $\alpha\geq 1$, $z\mapsto f(z)$ and $z\mapsto z^2 f(z)$ are bounded and each has a unique maximum. Solving $(\log(f(z)))^{\prime} = 0$ we get $\sup_z{f(z)} = f(\alpha-1)$. Similarly, $\sup_z{z^2 f(z)} = (\alpha+1)^2f(\alpha+1)$.
Now using Lemma 1.3.6 we obtain
$$
A_{f,1} \subseteq \tilde{A}_{f,1} = \left[0,\sqrt{f(\alpha-1)}\right]\times\left[0,(\alpha+1)\sqrt{f(\alpha+1)}\right],
$$
where
$$
A_{f,1} = \left\{ (u,v)\in \mathbb{R}^2: 0 < u \leq \sqrt{f\left(\dfrac{v}{u}\right)} \right\}.
$$
Thus we may simulate a uniform variable $(U,V)$ in the rectangle $\tilde{A}_{f,1}$ until $U \leq \sqrt{f\left(\dfrac{V}{U}\right)} $, then $V/U$ gives us the desired $\Gamma(\alpha,1)$ distribution.
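A minimal Python sketch of this sampler follows, assuming $\alpha \geq 1$. It uses the bounding rectangle implied by the definition of $A_{f,1}$: on this set $u \leq \sqrt{\sup_z f(z)} = \sqrt{f(\alpha-1)}$ and $v = (v/u)\,u \leq \sqrt{\sup_z z^2 f(z)} = (\alpha+1)\sqrt{f(\alpha+1)}$. The function name `gamma_ratio_of_uniforms` is ours.

```python
import math
import random


def gamma_ratio_of_uniforms(alpha, rng):
    """Draw one Gamma(alpha, 1) sample by the ratio-of-uniforms method (alpha >= 1 assumed)."""

    def f(z):
        # Unnormalized density z^(alpha-1) exp(-z) on [0, +inf).
        return z ** (alpha - 1.0) * math.exp(-z) if z >= 0.0 else 0.0

    u_max = math.sqrt(f(alpha - 1.0))                  # sqrt of sup f
    v_max = (alpha + 1.0) * math.sqrt(f(alpha + 1.0))  # sqrt of sup z^2 f(z)
    while True:
        u = u_max * rng.random()
        v = v_max * rng.random()
        # Accept when (u, v) falls in A_{f,1}: 0 < u <= sqrt(f(v/u)).
        if u > 0.0 and u * u <= f(v / u):
            return v / u
```

Only the unnormalized density `f` is evaluated; the constant $\Gamma(\alpha)$ never appears. A sample from $\Gamma(\alpha,\theta)$ is then obtained by multiplying the output by $\theta$.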

Exercise 1.9 (Archimedean copula)

We have to show that for any $(u_1,\ldots,u_d)\in [0,1]^d$ we have
$$
\mathbb{P}(U_1\leq u_1, \ldots, U_d\leq u_d) = C(u_1,\ldots,u_d) := \phi^{-1}(\phi(u_1)+\cdots+\phi(u_d)).
$$
Recall that $\phi^{-1}(u) = \mathbb{E}(e^{-uY})$. Using the Dominated Convergence Theorem we deduce that $\phi^{-1}(u)$ is continuous and further, as $\mathbb{P}(Y>0)>0$, that $\phi^{-1}(u)$ is strictly decreasing. Note also that $\phi^{-1}(0)=1$ and $\phi^{-1}(u) \underset{u\to +\infty}{\to} 0$, using $\mathbb{P}(Y>0)=1$. Hence the function $\phi(\cdot)$ is well-defined on $[0,1]$, also continuous and strictly decreasing.

Using the independence of the $X_i$'s along with the properties of the functions $\phi(\cdot)$ and $\phi^{-1}(\cdot)$, we write
\begin{align*}
\mathbb{P}(U_1\leq u_1, \ldots, U_d\leq u_d) &= \mathbb{P}\left( -\dfrac{1}{Y}\log(X_1)\geq \phi(u_1), \ldots,-\dfrac{1}{Y}\log(X_d)\geq \phi(u_d) \right) \\
& = \mathbb{P}\left(X_1\leq e^{-Y\phi(u_1)}, \ldots, X_d\leq e^{-Y\phi(u_d)} \right) \\
& = \mathbb{E}\left( \mathbb{P}(X_1\leq e^{-Y\phi(u_1)}, \ldots, X_d\leq e^{-Y\phi(u_d)} \mid Y ) \right) \\
& = \mathbb{E}( e^{-Y\phi(u_1)}\cdots e^{-Y\phi(u_d)}) \\
&= \phi^{-1}(\phi(u_1)+ \cdots + \phi(u_d)) = C(u_1, \ldots, u_d),
\end{align*}
which proves the result.
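To illustrate the construction $U_i = \phi^{-1}(-\log(X_i)/Y)$, here is a minimal sketch for one particular choice that is ours and not part of the exercise: $Y \sim \Gamma(1/\vartheta, 1)$ with $\vartheta > 0$, whose Laplace transform is $\phi^{-1}(u) = (1+u)^{-1/\vartheta}$. This gives $\phi(t) = t^{-\vartheta} - 1$ and the Clayton copula. The names `clayton_sample` and `clayton_cdf` are hypothetical.

```python
import math
import random


def clayton_sample(theta, d, rng):
    """Draw (U_1, ..., U_d) with U_i = phi^{-1}(-log(X_i) / Y), Y ~ Gamma(1/theta, 1)."""
    y = rng.gammavariate(1.0 / theta, 1.0)      # shape 1/theta, scale 1
    us = []
    for _ in range(d):
        x = 1.0 - rng.random()                  # X_i uniform on (0, 1]
        # phi^{-1}(t) = (1 + t)^(-1/theta), evaluated at t = -log(x) / y >= 0.
        us.append((1.0 - math.log(x) / y) ** (-1.0 / theta))
    return us


def clayton_cdf(us, theta):
    """C(u_1, ..., u_d) = phi^{-1}(phi(u_1) + ... + phi(u_d)) with phi(t) = t^(-theta) - 1."""
    total = sum(u ** (-theta) - 1.0 for u in us)
    return (1.0 + total) ** (-1.0 / theta)
```

Comparing the empirical frequency of $\{U_1\leq u_1, U_2\leq u_2\}$ with `clayton_cdf([u1, u2], theta)` gives a quick sanity check of the identity proved above.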

Exercise 2.2 (substitution method)

To make the problem non-trivial, we assume $u\neq 0$. For both estimators $\phi_{1,M}(u)$ and $\phi_{2,M}(u)$, the Law of Large Numbers implies almost sure convergence to $\phi(u)$. Thus, to compare the two estimators, we compute and compare the corresponding asymptotic variances, which are later used to construct confidence intervals.

For $\phi_{1,M}(u)$ we have \begin{align*}\sqrt{M}(\phi_{1,M}(u) - \phi(u)) = \dfrac{1}{\sqrt{M}}\sum_{m=1}^M{(e^{uX_m} - \mathbb{E}(e^{uX}))}.\end{align*}
From the Central Limit Theorem $\sqrt{M}(\phi_{1,M}(u) - \phi(u))$ converges in distribution to a centered Gaussian variable, with variance ${\rm Var}(e^{uX})$. The latter is equal to
\begin{align*}\mathbb{E}(e^{2uX}) - \mathbb{E}(e^{uX})^2 = \phi(2u) - \phi(u)^2 = e^{2u^2\sigma^2} - e^{u^2\sigma^2} = e^{u^2\sigma^2}(e^{u^2\sigma^2} - 1).\end{align*}
Now consider $\phi_{2,M}(u)$. Since $\sigma_M^2=\frac{ 1 }{M }\sum_{m=1}^M X_m^2$, the Central Limit Theorem ensures that $\sqrt{M}(\sigma_M^2 - \sigma^2)$ converges in distribution to a centered Gaussian variable, with variance ${\rm Var}(X^2)=2\sigma^4$.
Now we apply the substitution method (Theorem 2.2.2) to derive that, for any regular function $f$,
\begin{align*}\sqrt{M}(f(\sigma_M^2) - f(\sigma^2)) {\underset{M\to \infty}\Longrightarrow}{\cal N}\Big(0, (f^{\prime}(\sigma^2))^2 2\sigma^4 \Big).\end{align*}
So, using this for the case $f(x) = e^{\frac{u^2}{2}x}$ as in $\phi_{2,M}(u)$ we obtain
\begin{align*}\sqrt{M}(\phi_{2,M}(u) - \phi(u)) {\underset{M\to \infty}\Longrightarrow} \mathcal{N}\left(0, \left(\frac{u^2}{2}\right)^2 e^{u^2\sigma^2} 2\sigma^4\right).\end{align*}
Observe that (for any $u\neq 0$)
\begin{align*}
e^{u^2\sigma^2}(e^{u^2\sigma^2} - 1)> e^{u^2\sigma^2}(u^2\sigma^2+\frac{ (u^2\sigma^2)^2 }{ 2})>\left(\frac{u^2}{2}\right)^2 e^{u^2\sigma^2} 2\sigma^4;\end{align*}
therefore the second estimator $\phi_{2,M}(u)$ yields an (asymptotically) narrower confidence interval than $\phi_{1,M}(u)$.
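The comparison can be illustrated numerically. The sketch below evaluates the two asymptotic variances obtained above and computes both estimators on a simulated sample of $X \sim \mathcal{N}(0,\sigma^2)$. The function names are ours and the parameter values in any usage are arbitrary.

```python
import math
import random


def asymptotic_variances(u, sigma):
    """Return the asymptotic variances of phi_{1,M}(u) and phi_{2,M}(u)."""
    x = (u * sigma) ** 2
    var1 = math.exp(x) * (math.exp(x) - 1.0)                    # Var(exp(uX))
    var2 = (u * u / 2.0) ** 2 * math.exp(x) * 2.0 * sigma ** 4  # substitution method
    return var1, var2


def estimators(u, sigma, m, rng):
    """Compute both estimators of phi(u) = exp(u^2 sigma^2 / 2) from m draws."""
    xs = [rng.gauss(0.0, sigma) for _ in range(m)]
    phi1 = sum(math.exp(u * x) for x in xs) / m    # empirical mean of exp(uX)
    sigma2_m = sum(x * x for x in xs) / m          # empirical second moment
    phi2 = math.exp(0.5 * u * u * sigma2_m)        # plug-in estimator
    return phi1, phi2
```

For instance, with $u=\sigma=1$ the two variances are $e(e-1)\approx 4.67$ and $e/2\approx 1.36$, so the confidence interval based on $\phi_{2,M}(u)$ is asymptotically almost twice as narrow.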

Exercise 2.7 (concentration inequality, maximum of Gaussian variables)

i) Let $Y = (Y_1, \ldots, Y_d)$ be a vector of i.i.d. standard Gaussian random variables.
First the function $f(y) = \sup_{1\leq i\leq d}{y_i}$ is 1-Lipschitz. Indeed, for any $y = (y_1, \ldots, y_d)$ and $y'= (y'_1, \ldots, y'_d)$ we have
\begin{align*}
y_i &\leq y_{i}' + |y - y'|,\\
\sup_{1\leq i\leq d}y_i& \leq \sup_{1\leq i\leq d}y_i' + |y- y'|,\\
\left| \sup_{1\leq i\leq d}{y_i} - \sup_{1\leq i\leq d}{y'_i}\right| &\leq |y-y'|.
\end{align*}
Then, using the concentration inequality (see Corollaries 2.4.13 and 2.4.16) we obtain
\begin{align*}& \mathbb{P}\left(\left|\sup_{1\leq i\leq d}{Y_i}-\mathbb{E}\left(\sup_{1\leq i\leq d}{Y_{i}}\right)\right| > \varepsilon \right) \leq 2\exp\left(-\dfrac{\varepsilon^2}{2}\right), \quad \forall\varepsilon \geq0.\end{align*}
More generally, $Y$ can be represented as $Y = L\tilde{Y}$, where $\tilde{Y}$ is a standard Gaussian vector (zero mean, identity covariance matrix) and $L$ is a matrix such that $LL^\top$ equals the covariance matrix of $Y$. Notice that $\mathbb{E}(Y_i^2)=\sum_j L_{ij}^2\leq \sigma^2$, where $\sigma^2 := \sup_{1\leq i\leq d}\mathbb{E}(Y_i^2)$. Moreover, $f(y) = \sup_{1\leq i\leq d}{(Ly)_i}$ is $\sigma$-Lipschitz: this can be justified as in the i.i.d. case, using
\begin{align*}(Ly)_i \leq (Ly')_i + \sum_{j=1}^d{|L_{ij}|}|y_j - y'_j| &\leq(Ly')_i + \sqrt{\sum_{j=1}^d {|L_{ij}|^2}} |y - y'|,\\\left| \sup_{1\leq i\leq d}{(Ly)_i} - \sup_{1\leq i\leq d}{(Ly')_i}\right| &\leq \sigma|y-y'|.\end{align*}
Applying the concentration inequality to the $\sigma$-Lipschitz function $f(y) = \sup_{1\leq i\leq d}{(Ly)_i}$ gives the announced result.
ii) Let $\varepsilon>0$. From the result of i) we obtain
\begin{align*} \mathbb{P}\left(\dfrac{\sup_{1\leq i\leq d}{Y_i}}{\mathbb{E}(\sup_{1\leq i\leq d}{Y_i})} > 1+\varepsilon\right)& =\mathbb{P}\left(\sup_{1\leq i\leq d}{Y_i} - \mathbb{E}(\sup_{1\leq i\leq d}{Y_i}) > \varepsilon \mathbb{E}(\sup_{1\leq i\leq d}{Y_i}) \right) \\ & \leq \exp\left(- \dfrac{\varepsilon^2(\mathbb{E}(\sup_{1\leq i\leq d}{Y_i}))^2}{2\sup_{1\leq i\leq d}{\mathbb{E}(Y_i^2)}}\right)\to 0\end{align*}
using the assumptions as $d\to+\infty$. Similarly $ \mathbb{P}\left(\dfrac{\sup_{1\leq i\leq d}{Y_i}}{\mathbb{E}(\sup_{1\leq i\leq d}{Y_i})} < 1-\varepsilon\right) \to 0$. This justifies that $\dfrac{\sup_{1\leq i\leq d}{Y_i}}{\mathbb{E}(\sup_{1\leq i\leq d}{Y_i})}$ converges to 1 in probability.
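The inequality in the i.i.d. case can be illustrated by simulation. In the sketch below, the unknown expectation $\mathbb{E}(\sup_i Y_i)$ is replaced by an empirical mean, which is an approximation of ours; the function names are hypothetical.

```python
import math
import random


def max_deviation_frequency(d, eps, n, seed=0):
    """Empirical mean of max(Y_1..Y_d) and frequency of deviations larger than eps."""
    rng = random.Random(seed)
    maxima = [max(rng.gauss(0.0, 1.0) for _ in range(d)) for _ in range(n)]
    mean_max = sum(maxima) / n      # empirical proxy for E(sup_i Y_i)
    freq = sum(1 for m in maxima if abs(m - mean_max) > eps) / n
    return mean_max, freq


def concentration_bound(eps, sigma=1.0):
    """Two-sided bound 2 exp(-eps^2 / (2 sigma^2)) for a sigma-Lipschitz function."""
    return 2.0 * math.exp(-eps ** 2 / (2.0 * sigma ** 2))
```

The observed frequency is typically far below the bound: the concentration inequality is dimension-free but not tight for the maximum.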

Exercise 3.3 (stratification, optimal allocation)

We have (formula 3.2.2)
$$
{\rm Var}(I_{M_1, \ldots, M_k}^{strat.}) = \sum_{j=1}^k{p_j^2\dfrac{\sigma_j^2}{M_j}},
$$
with $\sum_{j=1}^k{M_j} = M$. The minimisation may be done by the Lagrange multiplier method: we minimise the function
$$
L(M_1, \ldots, M_k, \lambda) = \sum_{j=1}^k{p_j^2\dfrac{\sigma_j^2}{M_j}} + \lambda \left( \sum_{j=1}^k{M_j} - M\right).
$$
From the optimality condition $\partial_{M_j}L(M_1, \ldots, M_k, \lambda) = -\frac{p^2_j\sigma^2_j}{M^2_j}+\lambda= 0$ we get $M_j^{\ast} = p_j\sigma_j/\sqrt {\lambda}$. The constraint $\sum_{j=1}^k{M_j^{\ast}} = M$ gives $\sqrt{\lambda} = \frac{1}{M}\sum_{i=1}^k p_i\sigma_i$, hence
$$
M_j^{\ast} = M\dfrac{p_j\sigma_j}{\sum_{i=1}^k{p_i\sigma_i}}.
$$

Another way to work out the optimal $M_j$'s is to re-interpret the variance ${\rm Var}(I_{M_1, \ldots, M_k}^{strat.}) $ as an expectation with respect to the probability distribution $\mathbb{P}(J=j)=\frac{M_j}M$:
$$
{\rm Var}(I_{M_1, \ldots, M_k}^{strat.}) =M \mathbb{E}\left[\dfrac{p_J^2\sigma_J^2}{M^2_J} \right].
$$
The Jensen inequality gives
$$
{\rm Var}(I_{M_1, \ldots, M_k}^{strat.}) \geq M\left(\mathbb{E}\left[\dfrac{p_J\sigma_J}{M_J} \right]\right)^2=M\left(\sum_{j=1}^k \frac{M_j}M p_j\dfrac{\sigma_j}{M_j} \right)^2=\frac{1}{M}\left(\sum_{j=1}^k p_j\sigma_j\right)^2.
$$
The lower bound on the right-hand side is achieved when the Jensen inequality is an equality, that is, when $\dfrac{p_j\sigma_j}{M_j}$ is constant over $j$. We then retrieve the formula for $M_j^{\ast}$.
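As a small numerical illustration, the sketch below computes the allocation $M_j^\ast \propto p_j\sigma_j$ and the resulting variance, and allows a comparison with any other allocation. The allocations are treated as real numbers (rounding to integers is ignored), and the function names are ours.

```python
def optimal_allocation(p, sigma, m):
    """Return M_j* = M p_j sigma_j / sum_i p_i sigma_i (real-valued, no rounding)."""
    weights = [pj * sj for pj, sj in zip(p, sigma)]
    total = sum(weights)
    return [m * w / total for w in weights]


def stratified_variance(p, sigma, alloc):
    """Return sum_j p_j^2 sigma_j^2 / M_j for a given allocation."""
    return sum((pj * sj) ** 2 / mj for pj, sj, mj in zip(p, sigma, alloc))
```

For example, with $p = (0.5, 0.3, 0.2)$, $\sigma = (1, 2, 4)$ and $M = 1000$, the optimal variance is $(\sum_j p_j\sigma_j)^2/M = 1.9^2/1000 = 3.61\times 10^{-3}$, whereas the proportional allocation $M_j = M p_j$ gives $\sum_j p_j\sigma_j^2/M = 4.9\times 10^{-3}$.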
We write
\begin{align*} & \sqrt{M}\left(I_{M^\ast_1, \ldots, M^\ast_k}^{strat.} - \mathbb{E}(X) \right) = \sqrt{M}\left(\sum_{j=1}^k{p_j\dfrac{1}{M_j^{\ast}}\sum_{m=1}^{M_j^{\ast}}{X_{j,m}}} - \sum_{j=1}^k{p_j\mathbb{E}(X|Z\in \mathcal{S}_j)} \right) \\ & = \sum_{j=1}^k x_j\sqrt{M_j^{\ast}}\left(\dfrac{1}{M_j^{\ast}}\sum_{m=1}^{M_j^{\ast}}{(X_{j,m} - \mathbb{E}(X|Z\in \mathcal{S}_j))} \right) \end{align*}
where $x_j:=p_j\sqrt{M/M_j^{\ast}}=p_j \frac{( \sum_{i=1}^k p_i\sigma_i )^\frac{ 1 }{2 }}{(p_j\sigma_j)^\frac{ 1 }{2 }}$. Now from the standard CLT in dimension 1, we deduce that for each $j$
$$
{\cal E}_j(M_j^{\ast}):=\sqrt{M_j^{\ast}}\left(\dfrac{1}{M_j^{\ast}} \sum_{m=1}^{M_j^{\ast}} {(X_{j,m} - \mathbb{E}(X|Z\in \mathcal{S}_j) )}\right) {\underset{M\to \infty}\Longrightarrow} \mathcal{N}(0, \sigma^2_j).
$$
Since the variables $(X_{j,m}:m\geq 1)$ are independent for different $j$, it is easy to derive a CLT for the vector $({\cal E}_1(M_1^{\ast}),\dots,{\cal E}_k(M_k^{\ast}))$. More formally, one may use the Lévy theorem (Theorem A.1.3) to get (for any $u\in \mathbb{R}$)
\begin{align*}\mathbb{E}(e^{iu \sqrt{M}(I_{M^\ast_1, \ldots, M^\ast_k}^{strat.} - \mathbb{E}(X) )})&= \mathbb{E}(e^{iu \sum_{j=1}^k x_j {\cal E}_j(M_j^{\ast})})\\ &=\prod_{j=1}^k\mathbb{E}(e^{iu x_j {\cal E}_j(M_j^{\ast})})\quad \text{(by independence between strata)}\\ &{\underset{M\to \infty}\longrightarrow}\prod_{j=1}^k e^{-\frac{ 1 }{2 }u^2 x^2_j \sigma^2_j}=e^{-\frac{ 1 }{2 }u^2 (\sum_{j=1}^k p_j\sigma_j)^2}, \end{align*}
where the last equality uses $x_j^2\sigma_j^2 = p_j\sigma_j\sum_{i=1}^k p_i\sigma_i$. This shows that
$$
\sqrt{M}\left(I_{M^\ast_1, \ldots, M^\ast_k}^{strat.} - \mathbb{E}(X) \right){\underset{M\to \infty}\Longrightarrow} \mathcal{N}\left(0, \Big(\sum_{j=1}^k p_j\sigma_j\Big)^2\right).
$$

Exercise 3.6 (importance sampling, Gaussian vectors)

Let $Y \overset{d}{=} \mathcal{N}(0, {\rm Id})$ be a $d$-dimensional standard Gaussian vector, and denote its density by $p(x)=(2\pi)^{-d/2}\exp(-\frac 1 2 |x|^2)$. Consider a new measure under which $Y$ has the distribution that $SY + \theta$ has under the initial one, where $\theta$ is a $d$-dimensional vector and $S$ is an invertible matrix. Then (following Proposition 3.4.8) the likelihood is given by
\begin{align*} L& = \dfrac{1}{|{\rm det.}(S)|}\dfrac{p(S^{-1}(Y - \theta))}{p(Y)} \\ &= \dfrac{1}{|{\rm det.}(S)|}\exp \left( - \frac12(Y-\theta)^{\top}(SS^{\top})^{-1}(Y-\theta) + \frac12 Y^{\top}Y \right) \\ & = \dfrac{1}{|{\rm det.}(S)|}\exp \left( \frac12 Y^{\top}\Big({\rm Id}-(SS^{\top})^{-1}\Big)Y + Y^{\top}(SS^{\top})^{-1}\theta - \frac12 \theta^{\top}(SS^{\top})^{-1}\theta \right). \end{align*}
When $Y, S, \theta$ are scalar quantities, we retrieve the second formula of Corollary 3.4.9.
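In the scalar case the formula can be checked by simulation: by definition of the change of measure, $\mathbb{E}(L\,h(Y)) = \mathbb{E}(h(SY+\theta))$ for a test function $h$, where both expectations are taken under the initial measure. The sketch below is ours (function names and parameter values are arbitrary); it takes $S=s$ with $|s|<1$ so that the weights are bounded.

```python
import math
import random


def likelihood(y, s, theta):
    """Scalar version of L: density of N(theta, s^2) divided by density of N(0, 1), at y."""
    a = 1.0 / (s * s)   # scalar analogue of (S S^T)^{-1}
    exponent = 0.5 * y * y * (1.0 - a) + y * a * theta - 0.5 * theta * theta * a
    return math.exp(exponent) / abs(s)


def weighted_mean(h, s, theta, n, seed=0):
    """Monte Carlo estimate of E(L h(Y)) with Y ~ N(0, 1) under the initial measure."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        total += likelihood(y, s, theta) * h(y)
    return total / n
```

With $h\equiv 1$ the estimate should be close to 1, and with $h(y)=y$ it should be close to $\theta$, the mean of $sY+\theta$.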

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
