ExRandom 3.0
Here we give a brief description of Algorithm N for sampling exactly from the unit normal distribution. For details see Karney (2016). This also includes two improvements given by Du et al. (2020).
Here is the improved Algorithm N.
There are two changes compared to Karney (2016).
The new step N4 is simple to implement:
The Bernoulli trial \(E(x)\) returns true with probability \(\exp(-x)\) for \(x \in (0,1]\). It is implemented by Algorithm E:
Here \(U\) is a uniform deviate \(U \in (0,1)\).
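As an illustration of how such a trial can be realized, here is a floating-point sketch based on von Neumann's classical construction; the function name trial_E is ours and this is not exrandom's exact-arithmetic implementation, which works digit by digit on u-rands.

```cpp
// Floating-point sketch of a von Neumann style realization of E(x):
// return true with probability exp(-x) for x in (0,1].  Illustration only.
#include <random>

template <typename Generator>
bool trial_E(double x, Generator& g) {
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  double prev = x;   // the decreasing chain starts at x
  int n = 0;         // number of deviates that extended the chain
  for (;;) {
    double u = unif(g);        // U_{n+1}
    if (u >= prev) break;      // chain x > U_1 > U_2 > ... breaks here
    prev = u;
    ++n;
  }
  // P(chain holds for exactly n deviates) = x^n/n! - x^(n+1)/(n+1)!, so
  // accepting when n is even gives acceptance probability exp(-x).
  return n % 2 == 0;
}
```

With such a trial in hand, step N4 accepts with probability \(\exp(-kx)\) by requiring \(k\) successive successes of \(E(x)\); this is the structure assumed in the accounting of u-rands below.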
Step N5 is implemented as a specialization of step N4 of Karney (2016) for the case \(k = 0\). This is the Bernoulli trial \(B(x)\), which returns true with probability \(\exp(-\frac12x^2)\) for \(x \in (0,1)\); it is realized by Algorithm B, given below. Here step B2(b) generates a sequence of uniform deviates \(U_1, U_2, \ldots\) such that \(x > U_1 > U_2 > \ldots \); similarly step B2(c) generates a sequence of uniform deviates \(V_1, V_2, \ldots\) such that \(V_i < x\); at each step we also include a coin toss, step B2(a).
The probability that step B2 is entered with a particular value of \(n\) is
\[ \frac1{2^n} \times \frac{x^n}{n!} \times x^n, \]
where the separate factors are due to the \( n \) previous invocations of steps B2(a), B2(b), and B2(c). This realization of step N5 is slightly different from that proposed by Du et al. (2020).
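Given this entry probability, accepting when the number of completed rounds of step B2 is even yields the required acceptance probability \(\exp(-\frac12 x^2)\). The following floating-point sketch illustrates the structure; the function name trial_B is ours, the even/odd acceptance rule is our assumption consistent with the probabilities above, and exrandom's actual implementation works with exact u-rands.

```cpp
// Floating-point sketch of the Bernoulli trial B(x): return true with
// probability exp(-x^2/2) for x in (0,1).  Each round of the loop mirrors
// step B2: a coin toss (a), extending the decreasing chain (b), and a
// comparison with x (c).  Illustration only.
#include <random>

template <typename Generator>
bool trial_B(double x, Generator& g) {
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  std::bernoulli_distribution coin(0.5);
  double prev = x;   // U_0 = x
  int n = 0;         // number of completed rounds of step B2
  for (;;) {
    if (!coin(g)) break;           // B2(a): coin toss
    double u = unif(g);            // B2(b): require U_{n+1} < U_n
    if (u >= prev) break;
    prev = u;
    double v = unif(g);            // B2(c): require V_{n+1} < x
    if (v >= x) break;
    ++n;
  }
  // P(completing n rounds) = (x^2/2)^n / n!, so accepting when n is even
  // gives acceptance probability exp(-x^2/2).
  return n % 2 == 0;
}
```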
These improvements are easily migrated into the implementation of sampling from the discrete normal distribution, discrete_normal_dist.
To analyze the algorithm further, let us regard the algorithm as successively generating a sequence of distributions, \(f_0\) through \(f_6\).
Explicit formulas for these distributions are
\[ \begin{align} f_0(k) &= \begin{cases} 1-p, & \text{if $k=0$,} \\ p, & \text{if $k=1$,} \end{cases}\\ f_1(k) &= \frac{p^k}{C_1}, \quad\text{for $ k \ge 0$,}\\ f_2(k) &= \frac{p^{k^2}}{C_2}, \quad\text{for $ k \ge 0$,}\\ f_3(y) &= f_2(k) = \frac{\exp(-\frac12 k^2)}{C_3}, \quad\text{for $ y \ge 0$,}\\ f_4(y) &= \frac{\exp(-\frac12 k^2) \exp(-kx)}{C_4}, \quad\text{for $ y \ge 0$,}\\ f_5(y) &= \frac{\exp(-\frac12 y^2)}{C_5}, \quad\text{for $ y \ge 0$,}\\ f_6(y) &= \frac{f_5(\left|y\right|)}2, \quad\text{for all $ y$.}\\ \end{align} \]
Here \( p = 1/\sqrt e \) and \( x = y - k \), with \( k = \lfloor y \rfloor \). We also define the constants,
\[ \begin{align} C_0 &= 1/(1-p)^2 = 6.459192,\\ C_1 &= 1/(1-p) = 2.541494,\\ C_2 &= \sum_{k=0}^\infty p^{k^2} = 1.753314,\\ C_3 &= C_2 ,\\ C_4 &= 1 + \sum_{k=1}^\infty p^{k^2} \frac{1-p^{2k}}k = 1.445512,\\ C_5 &= \sqrt{\pi/2} = 1.253314,\\ C_6 &= C_5 . \end{align} \]
Here \(C_i\) for \( i \in [1, 5] \) are the normalizing constants for the corresponding distributions \( f_i \). \(C_0\) and \(C_6\) are added so that the ratios \( C_i / C_j \) for \( i < j\) give the average number of samples from \( f_i \) needed to produce one sample from \( f_j \).
We see that \( C_2 \approx C_5 + \frac12 \) quite accurately. This reflects the fact that the rectangle rule yields a good result for the integral of the normal distribution even when the step size is 1. We also remark that the infinite sums for \(C_2\) and \(C_4\) and those given below converge very rapidly.
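As a quick check on the quoted values (and on how rapidly the sums converge), the constants can be evaluated by direct summation; this snippet is only a verification aid and is not part of exrandom.

```cpp
// Evaluate C_2 and C_4 by direct summation and compare C_2 with C_5 + 1/2.
#include <cmath>
#include <cstdio>

int main() {
  const double pi = 3.14159265358979324;
  double C2 = 1, C4 = 1;                        // k = 0 terms
  for (int k = 1; k <= 10; ++k) {               // ten terms is ample
    double pk2 = std::exp(-0.5 * k * k);        // p^(k^2) with p = 1/sqrt(e)
    C2 += pk2;
    C4 += pk2 * (1 - std::exp(-1.0 * k)) / k;   // p^(2k) = exp(-k)
  }
  // expect C2 = 1.753314, C5 + 1/2 = 1.753314, C4 = 1.445512
  std::printf("C2 = %.6f, C5 + 1/2 = %.6f\n", C2, std::sqrt(pi / 2) + 0.5);
  std::printf("C4 = %.6f\n", C4);
  return 0;
}
```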
Later on, we will need the entropies of these distributions. The entropy, here measured in bits, is defined as the mean value of \(-\log_2(f_i)\). We have
\[ \begin{align} H_0 &= -p\log_2 p - (1-p)\log_2(1-p) = 0.967002,\\ H_1 &= H_0/(1-p) = 2.457630,\\ H_2 &= 1.325722,\\ H_3 &= H_2 = 1.325722,\\ H_4 &= 1.059279,\\ H_5 &= \log_2(\sqrt{\pi e/2}) = 1.047096,\\ H = H_6 &= \log_2(\sqrt{2\pi e}) = H_5 + 1 = 2.047096 .\\ \end{align} \]
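For reference, the value of \(H_2\) (and hence \(H_3\)) follows directly from the definition: since \(-\log_2 f_2(k) = \frac12 k^2 \log_2 e + \log_2 C_2\),
\[ H_2 = \log_2 C_2 + \frac{\log_2 e}2 \, \frac{\sum_{k=0}^\infty k^2 p^{k^2}}{C_2} = 1.325722, \]
with the sum evaluated numerically; \(H_4\) is obtained in the same way, with an additional average over \(x\).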
Du et al. (2020) give a comprehensive analysis of the algorithm accounting for the consumption of u-rands. Now, exrandom uses a u-rand to perform a coin toss, e.g., to select the sign of the normal deviate in step N6. Because this use of a u-rand will only use a single random digit when the base \(b\) is even (a "bit" when \( b = 2 \)) and \(b/(b-1)\) digits, on average, when \(b\) is odd, we will account for this use of u-rands separately, as "coin tosses".
u-rands are used in steps N1, N3, N4, N5 (steps B2(b) and B2(c) in Algorithm B). Coin tosses are used in steps N5 (step B2(a) in Algorithm B) and N6.
Here is the accounting of coin tosses and u-rands. Step N1, which samples \(k\), consumes
\[ U_1 = u_1 C_1/C_6 = 8.496998 \]
u-rands per normal deviate, where \(u_1\) is the average number of u-rands needed per sample from \(f_1\). Step N3, which samples \(x\), uses a single u-rand each time it is executed (\(u_3 = 1\)), so it consumes
\[ U_3 = u_3 C_3/C_6 = 1.398942 \]
u-rands per normal deviate. Turning to step N4,
\[ F_n = \sum_{k=n}^{\infty} f_2(k) \]
is the complementary cumulative function of \(f_2\), i.e., the probability that at least \(n\) invocations of \(E(x)\) are required; the expected number of u-rands consumed by the \((n+1)\)-th invocation, which is reached only if the previous \(n\) invocations succeed, averaged over \(x\), is \(e_{n-1}\), the average value of \(e^x \times e^{-nx}\), where
\[ e_n = \int_0^1 e^{-nx}\, dx = \frac{1-e^{-n}}n, \]
with the special case, \(e_0 = 1\). The total number of u-rands for each attempt at step N4 is then\[ u_4 = \sum_{n=0}^\infty e_{n-1} F_{n+1}. \]
This quantity is the average number of u-rands needed for each attempt at step N4, i.e., per result from step N3. So the average number of u-rands consumed by step N4 per normal deviate is\[ U_4 = u_4 C_3/C_6 = 1.155795. \]
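The sum for \(u_4\) converges quickly and is easy to evaluate; for instance, the following verification snippet (not part of exrandom) reproduces the value of \(U_4\).

```cpp
// Evaluate u_4 = sum_{n>=0} e_{n-1} F_{n+1} and U_4 = u_4 C_3/C_6.
#include <cmath>
#include <cstdio>

int main() {
  const double pi = 3.14159265358979324;
  const int N = 12;                             // truncation; terms decay fast
  double f2[N], C2 = 0;
  for (int k = 0; k < N; ++k) C2 += f2[k] = std::exp(-0.5 * k * k);
  for (int k = 0; k < N; ++k) f2[k] /= C2;      // f_2(k) = p^(k^2)/C_2
  double u4 = 0;
  for (int n = 0; n + 1 < N; ++n) {
    double Fn1 = 0;                             // F_{n+1} = sum_{k>=n+1} f_2(k)
    for (int k = n + 1; k < N; ++k) Fn1 += f2[k];
    // e_{n-1} = (1 - exp(-(n-1)))/(n-1), with e_{-1} = e - 1 and e_0 = 1
    double en1 = n == 1 ? 1 : (1 - std::exp(-(n - 1.0))) / (n - 1.0);
    u4 += en1 * Fn1;
  }
  double C3 = C2, C6 = std::sqrt(pi / 2);       // C_3 = C_2, C_6 = C_5
  // expect U4 = 1.155795
  std::printf("u4 = %.6f, U4 = %.6f\n", u4, u4 * C3 / C6);
  return 0;
}
```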
For Algorithm B, the average number of times each of steps B2(a), B2(b), and B2(c) is executed, obtained by summing over \(n\) the probability that the step is reached, is
\[ \begin{align} \text{B2(a):}\quad & \sum_{n=0}^\infty \frac1{2^n} \frac{x^n}{n!} x^n = \exp\biggl(\frac{x^2}2\biggr), \\ \text{B2(b):}\quad & \sum_{n=0}^\infty \frac1{2^{n+1}} \frac{x^n}{n!} x^n = \frac12\exp\biggl(\frac{x^2}2\biggr), \\ \text{B2(c):}\quad & \sum_{n=0}^\infty \frac1{2^{n+1}} \frac{x^{n+1}}{(n+1)!} x^n = \frac1x\biggl(\exp\biggl(\frac{x^2}2\biggr)-1\biggr). \end{align} \]
The average number of coin tosses is given by the expression for B2(a), while the number of u-rands is given by summing the expressions for B2(b) and B2(c). These quantities need to be weighted by the distribution of \(x\) in the output from step N4, which is\[ g_4(x) = \sum_{k=0}^\infty f_4(k + x) = \frac1{C_4} \sum_{k=0}^\infty \exp(-\frac12 k^2) \exp(-kx). \]
The mean number of coin tosses and u-rands per invocation of step N5 is thus\[ \begin{align} b_5 &= \int_0^1 \exp\biggl(\frac{x^2}2\biggr) g_4(x)\,dx, \\ u_5 &= \int_0^1 \biggl[\biggl(\frac12 + \frac1x\biggr)\exp\biggl(\frac{x^2}2\biggr) - \frac1x\biggr] g_4(x)\,dx. \end{align} \]
The first integral can be performed in closed form (yielding an infinite sum over error functions); the second integral was performed numerically (setting the term in square brackets to \(\frac12\) in the limit \(x\rightarrow0\)). The mean number of coin tosses and u-rands used by step N5 per normal deviate is\[ \begin{align} B_5 &= b_5 C_4/C_6 = 1.358922,\\ U_5 &= u_5 C_4/C_6 = 0.987790. \end{align} \]
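These integrals are straightforward to reproduce numerically; the following sketch (a verification aid only, using the midpoint rule rather than the closed form) recovers \(B_5\) and \(U_5\).

```cpp
// Evaluate b_5 and u_5 by quadrature against g_4(x) and convert to B_5, U_5.
#include <cmath>
#include <cstdio>

int main() {
  const double pi = 3.14159265358979324;
  double C4 = 1;
  for (int k = 1; k <= 12; ++k)
    C4 += std::exp(-0.5 * k * k) * (1 - std::exp(-1.0 * k)) / k;
  auto g4 = [C4](double x) {          // g_4(x) = sum_k exp(-k^2/2 - k x)/C_4
    double s = 0;
    for (int k = 0; k <= 12; ++k) s += std::exp(-0.5 * k * k - k * x);
    return s / C4;
  };
  const int n = 100000;               // midpoint rule on (0,1)
  double b5 = 0, u5 = 0;
  for (int i = 0; i < n; ++i) {
    double x = (i + 0.5) / n, e = std::exp(0.5 * x * x);
    b5 += e * g4(x) / n;
    u5 += ((0.5 + 1 / x) * e - 1 / x) * g4(x) / n;
  }
  double C6 = std::sqrt(pi / 2);      // C_6 = C_5
  // expect B5 = 1.358922 and U5 = 0.987790
  std::printf("B5 = %.6f, U5 = %.6f\n", b5 * C4 / C6, u5 * C4 / C6);
  return 0;
}
```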
Finally, step N6 uses a single coin toss to select the sign \(s\) of the deviate, so
\[ B_6 = b_6 = 1. \]
Combining these results, the average number of coin tosses required per random deviate is
\[ B_c = B_5 + B_6 = 2.358922, \]
and the average number of u-rands required
\[ U = U_1 + U_3 + U_4 + U_5 = 12.039525. \]
The subscript in \(B_c\) indicates this is the number of coin tosses.
It is easy to confirm these results by instrumenting the unit_normal_dist class to count the use of coin tosses and u-rands. Alternatively, unit_normal_dist can be run using random digits with a large base; the largest supported base is \(2^{32}\) (obtained by setting the template parameter b = 0
for rand_digit). With such a large base, unit_normal_dist will rarely (about 1 time in \(10^9\)) have to sample more than one digit of a u-rand; and because, in exrandom, coin tossing is implemented with u-rands, we expect the average number of random digits required to produce a normal deviate to be \(B_c + U\). However this overlooks the fact that unit_normal_dist can return a result with no digits sampled. This happens when \(k = 0\) (so step N4 does nothing) and the first coin toss in step B2(a) causes Algorithm B to exit immediately, with the u-rand \(\pm 0\ldots\) being returned as the normal deviate. The fraction of normal deviates with an "unsampled" fraction is \( 1/\sqrt{2\pi} = 0.398942 \) (a rather substantial fraction). So the number of random digits required to produce a normal deviate (as a u-rand) in the limit of a large base is about
\[ B_c + U - 1/\sqrt{2\pi} = 13.999505. \]
Here is a table comparing the current results (last column) with those of Karney (2016) and Du et al. (2020). The analysis of Karney (2016) was given in Du et al. (2020). The only change between Du et al. (2020) and the current method is in step N5.
Quantity | Karney, 2016 | Du et al, 2020 | current |
---|---|---|---|
\(B_5\) | 0.953438 | 0 | 1.358922 |
\(B_6\) | 1 | ||
\(B_c\) | 1.953438 | 1 | 2.358922 |
\(U_1\) | 8.496998 | ||
\(U_2\) | 2.635128 | 0 | |
\(U_3\) | 1.398942 | ||
\(U_4\) | 3.0695 | 1.155795 | |
\(U_5\) | | 1.667251 | 0.987790 |
\(U\) | 15.6005 | 12.718986 | 12.039525 |
The data for Karney (2016) are given by:
The data for Du et al. (2020) are found as follows:
The analysis in the previous section focused on measuring the consumption of u-rands. This is a reasonable tactic if randomness is obtained in big chunks, e.g., as a random unsigned int
, which is equivalent to setting the base for random digits in exrandom to \(2^{32}\). However, the algorithm makes much more parsimonious use of random data when the base is 2 and the randomness is delivered as a stream of random bits. Unfortunately, the analysis of the algorithm at this level of detail is much more difficult. So here we just provide empirical data on the use of random bits. In the table below we compare the current version of the algorithm with Karney (2016) and Du et al. (2020). The quantities tabulated here are:
Quantity | Karney, 2016 | Du et al, 2020 | current |
---|---|---|---|
\(B_k\) | 19.2143 | 14.6660 | 14.6660 |
\(B_x\) | 9.7858 | 8.7825 | 8.3523 |
\(B_s\) | 1 | ||
\(B\) | 30.0002 | 24.4486 | 24.0183 |
\(L\) | 1.5558 | 1.7770 | 1.4423 |
\(C\) | 28.4443 | 22.6715 | 22.5760 |
\(F\) | 76.9926 | ||
\(T\) | | | 20.5289 |
We see that the cost \(C\) has improved by about 5.87 bits compared to Karney (2016), with most of the improvement, 4.55 bits, coming from the increased efficiency of sampling \(k\). The improvement of the current method over Du et al. (2020) is modest, 0.10 bit. The values for \(B_k\) are simply related to \(U_1\) and \(U_2\) in the previous section by
\[ B_k = B_0 (U_1 + U_2) / u_0, \]
where the division by \(u_0 = \sqrt e\) gives the number of invocations of the Bernoulli trial \(f_0(k)\) and \(B_0 = 2.84574\) is the average number of bits per trial (determined empirically).
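For example, inserting the values from the table above gives
\[ B_k = 2.84574 \times \frac{8.496998 + 2.635128}{\sqrt e} = 19.2143 \quad\text{and}\quad B_k = 2.84574 \times \frac{8.496998}{\sqrt e} = 14.6660 \]
for Karney (2016), and for Du et al. (2020) and the current method, respectively.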
We can determine various contributions to the toll:
We use the terminology "perfect sampling" to refer to sampling with no toll (i.e., the number of bits used equals the entropy gain). Obviously the step of assigning the sign \(s\) does not contribute to the toll: it takes one bit, but the full normal distribution has 1 bit of additional entropy compared to the one-sided normal distribution.
The contributions to the toll in sampling \(k\): Each normal deviate requires \(C_0/C_6\) samples of \(f_0(k)\). The toll for each invocation of \(f_0(k)\) is \(B_0 - H_0 = 1.87874\). The extrinsic toll per normal deviate is then
\[ T_{ke} = B_k - H_0 C_0/C_6 = 9.6824. \]
The intrinsic toll is the difference between the entropy provided by \(f_0(k)\) and the entropy of the resulting distribution \(f_2(k)\). This toll is then
\[ T_{ki} = H_0 C_0/C_6 - H_2 C_2/C_6 = 3.1290. \]
Summing these terms, we have
\[ T_k = T_{ke} + T_{ki} = B_k - H_2 C_2/C_6 = 12.8114 \]
as the toll from sampling \(k\).
The contributions to the toll in sampling \(x\): In addition to the entropy provided by the sampled \(k\), we need to perform Bernoulli trials with probability \(p(k,x) = \exp(-kx-\frac12 x^2)\). Assuming that these were perfectly sampled, the resulting entropy input is obtained by averaging the entropy of a single trial,
\[ h_b(k,x) = -p(k,x) \log_2 p(k,x) - (1 -p(k,x)) \log_2(1-p(k,x)), \]
over \(k\) and \(x\), to obtain
\[ H_b = \int_0^1 dx \sum_{k=0}^\infty f_2(k) h_b(k,x) = 0.629964. \]
The intrinsic toll is then
\[ T_{xi} = (H_2 + H_b) C_3/C_5 - H_5 = 1.6888. \]
The extrinsic toll due to the imperfect Bernoulli trials is then
\[ T_{xe} = B_x - L - H_b C_3/C_5 = 6.0287 \]
and the total toll from sampling \(x\) is
\[ T_x = T_{xe} + T_{xi} = B_x - L + H_2 C_3/C_5 - H_5 = 7.7175. \]
For completeness, let us add the toll due to selecting the sign of the normal deviate,
\[ T_s = B_s + H_5 - H_6 = 0; \]
and finally the total toll is
\[ \begin{align} T &= B_k + B_x + B_s - L - H_6\\ &= T_{ke} + T_{ki} + T_{xe} + T_{xi} + T_s \\ &= 9.6824 + 3.1290 + 6.0287 + 1.6888 + 0 \\ &= T_k + T_x \\ &= 12.8114 + 7.7175 \\ &= 20.5289. \end{align} \]
We can attempt to continue this analysis to divide \(T_x\) between steps N4 and N5. However this is difficult because step N4 partially samples the fraction \(x\) and this affects the performance of step N5. Here is the raw empirical data which might be useful:
\[ \begin{align} B_{x4} &= 3.8627, \\ B_{x5} &= 4.4896, \\ L_4 &= 0.7566, \\ L_4' &= 0.6618, \\ L_5 &= 0.7805. \end{align} \]
Here \(B_{x4}\) and \(B_{x5}\) are the average numbers of bits consumed in steps N4 and N5 per normal deviate (\(B_x = B_{x4} + B_{x5}\)); \(L_4\) is the number of bits in the fraction of \(x\) after step N4, per normal deviate, so that \(L_4 C_5/C_4 = 0.6560\) is the average number of bits in the fraction of \(x\) each time N4 concludes successfully; \(L_4'\) is the average number of bits in the fraction of \(x\) at the start of step N5, counting only those values which are accepted by this step; and \(L_5\) is the number of bits added to the fraction of \(x\) by step N5 (so \(L = L_4' + L_5\)).
A deeper analysis of the algorithm is daunting. The analysis of the much simpler algorithm for sampling from the exponential distribution given by Flajolet and Saheb runs to 26 pages. However we note that the biggest contribution to the toll is the inefficiency of Bernoulli sampling with probability \(p\), i.e., sampling \(f_0(k)\). Flajolet and Saheb mention a suggestion by Mike Paterson for how the toll might be reduced. In the current implementation, when two u-rands are compared, \(x < y\), two bits are drawn at a time, one each for \(x\) and \(y\), until a mismatch is found. Paterson suggested instead that a single bit be used to decide whether the corresponding digits of \(x\) and \(y\) match, with an additional bit drawn only if there is a mismatch, to decide which u-rand is smaller. (Note that this idea only applies for base \(b = 2\).) Implementing this for \(f_0(k)\) is rather simple: it results in a reduction of \(B_0\), the average number of bits needed to sample \(f_0(k)\), by \(\delta B_0 = 0.25085\) (from \(2.84574\) to \(2.59489\)). \(f_0(k)\) is invoked an average of \(C_0/C_6 = 5.153690\) times per normal deviate, so the corresponding reduction in the toll is \(\delta T_{ke} = \delta B_0\, C_0/C_6 = 1.2928\), bringing the toll down to \(T = 19.2361\). Similar changes could be made in the implementations of steps N4 and N5.
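As an illustration of Paterson's suggestion, here is a sketch for base \(b = 2\); the function name less_than is ours, and the sketch ignores the bookkeeping needed to retain the digits of the u-rands that are generated along the way.

```cpp
// Compare two implicit uniform deviates x and y bit by bit using Paterson's
// trick: one random bit decides whether the current digits of x and y agree;
// only on a disagreement is a second bit drawn to decide which deviate is
// smaller.  For an isolated comparison the expected cost drops from 4 bits
// (two digits per position, two positions on average) to 3 bits.
#include <random>

template <typename Generator>
bool less_than(Generator& g) {       // returns true with probability 1/2
  std::bernoulli_distribution bit(0.5);
  for (;;) {
    bool digits_differ = bit(g);     // P(x_i != y_i) = 1/2 at each position
    if (digits_differ)
      return bit(g);                 // the mismatching digits are equally
                                     // likely to be (0,1) or (1,0)
  }
}
```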