## Drawing a natural number at random, foundations of probability, determinism, Laplace (8)

In the analysis so far, I have treated Benford’s law and Zipf’s law as comparable. The main reason for this lies in the mixed motivations that I use to arrive at the proposed solution from post 4, which I repeat below (just for the readability of this post):

Solution to ‘drawing a natural number at random’:

* We can only assign relative chances, and the role of the natural number $0$ remains mysterious. (This has been explained)

* For $1\leq n,m \in \mathbb{N}$ let’s denote the relative chance of drawing $n$ vs. drawing $m$ by: $\frac{P(n)}{P(m)}$.

* For $1\leq n,m \in \mathbb{N}$, we find that $\frac{P(n)}{P(m)} = \frac{\log{\frac{n+1}{n}}}{\log{\frac{m+1}{m}}}$ (Also explained)

* An alternative ‘discrete’ or ‘Zipfian’ case $P_{\rm discrete}$ can perhaps be formulated, yielding: for $1\leq n,m \in \mathbb{N}$, we find that $\frac{P_{\rm discrete}(n)}{P_{\rm discrete}(m)} = \frac{m}{n}$.
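To make the two relative chances above concrete, here is a minimal Python sketch. The function names `benford_relative_chance` and `zipf_relative_chance` are my own hypothetical labels, not notation from the earlier posts. The base of the logarithm does not matter, since it cancels in the ratio.

```python
import math


def benford_relative_chance(n: int, m: int) -> float:
    """Relative chance P(n)/P(m) = log((n+1)/n) / log((m+1)/m)."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be natural numbers >= 1")
    # log1p(1/n) evaluates log(1 + 1/n) = log((n+1)/n) accurately for large n.
    return math.log1p(1.0 / n) / math.log1p(1.0 / m)


def zipf_relative_chance(n: int, m: int) -> float:
    """Relative chance P_discrete(n)/P_discrete(m) = m/n."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be natural numbers >= 1")
    return m / n


if __name__ == "__main__":
    # Compare both relative chances for a few illustrative pairs (n, m).
    for n, m in [(1, 2), (1, 9), (10, 20), (1000, 2000)]:
        b = benford_relative_chance(n, m)
        z = zipf_relative_chance(n, m)
        print(f"n={n:5d} m={m:5d}  Benfordian={b:.5f}  Zipfian={z:.5f}")
```

For the pair $(1,2)$ the Benfordian value is about $1.71$ against a Zipfian value of $2$; for larger pairs such as $(1000,2000)$ the two values nearly coincide.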

These mixed motivations revolve around the ‘density’ function $p(x)=\frac{1}{x}$ and its discrete version $p(n)=\frac{1}{n}$. Since these two are obviously related, we have treated Benford’s law and Zipf’s law as more or less similar. We next discuss the last *-item above, which focuses on the difference between Benford’s law and Zipf’s law. This is interesting in its own right, regardless of the highly speculative nature of our question ‘how to draw a natural number at random?’.

Similarity and discrepancy between Benford’s law and Zipf’s law
A good reference for this is Terry Tao’s blogpost on Benford’s law. We quote:

Analogous universality phenomena also show up in empirical distributions – the distributions of a statistic ${X}$ from a large population of “real-world” objects. Examples include Benford’s law, Zipf’s law, and the Pareto distribution (of which the Pareto principle or 80-20 law is a special case). These laws govern the asymptotic distribution of many statistics ${X}$ which

(i) take values as positive numbers;
(ii) range over many different orders of magnitude;
(iii) arise from a complicated combination of largely independent factors (with different samples of ${X}$ arising from different independent factors); and
(iv) have not been artificially rounded, truncated, or otherwise constrained in size.

Examples here include the population of countries or cities, the frequency of occurrence of words in a language, the mass of astronomical objects, or the net worth of individuals or corporations. The laws are then as follows:

Benford’s law: For ${k=1,\ldots,9}$, the proportion of ${X}$ whose first digit is ${k}$ is approximately ${\log_{10} \frac{k+1}{k}}$. Thus, for instance, ${X}$ should have a first digit of ${1}$ about ${30\%}$ of the time, but a first digit of ${9}$ only about ${5\%}$ of the time.

Zipf’s law: The ${n^{th}}$ largest value of ${X}$ should obey an approximate power law, i.e. it should be approximately ${C n^{-\alpha}}$ for the first few ${n=1,2,3,\ldots}$ and some parameters ${C, \alpha > 0}$. In many cases, ${\alpha}$ is close to ${1}$.

Pareto distribution: The proportion of ${X}$ with at least ${m}$ digits (before the decimal point), where ${m}$ is above the median number of digits, should obey an approximate exponential law, i.e. be approximately of the form ${c 10^{-m/\alpha}}$ for some ${c, \alpha > 0}$. Again, in many cases ${\alpha}$ is close to ${1}$.

Benford’s law and Pareto distribution are stated here for base ${10}$, which is what we are most familiar with, but the laws hold for any base (after replacing all the occurrences of ${10}$ in the above laws with the new base, of course). The laws tend to break down if the hypotheses (i)-(iv) are dropped.

To me, the interesting part above lies in the hypotheses (i)-(iv). In line with the previous posts, I would term these to be conditions which ensure ‘natural entropy’. Some relevant interplay between Benford’s law and Zipf’s law becomes clear when we look at a statistic $X$ which denotes the base-10 first-digit distribution of a (large) number of data satisfying (i)-(iv). According to Benford’s law, the digit 1 occurs $\frac{\log 2}{\log\frac{3}{2}} \approx 1.71$ times as often as the digit 2. According to Zipf’s law (with $\alpha=1$) however, the digit 1 should occur twice as often as the digit 2. The discrepancy is resolved by noting that, when trying to apply Zipf’s law, hypothesis (iv) is violated since the digits are a truncated entity.
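The first-digit comparison can be tabulated with a short Python sketch. Note the assumption: the ‘Zipfian’ column simply normalises the weights $k^{-\alpha}$ over the digits $1,\ldots,9$, which is exactly the naive application that hypothesis (iv) warns against. The function names are hypothetical and only serve the illustration.

```python
import math

DIGITS = range(1, 10)  # possible base-10 first digits


def benford_first_digit() -> list[float]:
    """Benford proportions log10((k+1)/k) for k = 1..9."""
    return [math.log10((k + 1) / k) for k in DIGITS]


def zipf_first_digit(alpha: float = 1.0) -> list[float]:
    """Naive Zipfian proportions: weights k^(-alpha), normalised over 1..9.

    Assumption for illustration only: Zipf's law is applied directly to the
    (truncated) digits, which violates hypothesis (iv).
    """
    weights = [k ** -alpha for k in DIGITS]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    benford = benford_first_digit()
    zipf = zipf_first_digit()
    print("digit  Benford  Zipf(alpha=1)")
    for k, b, z in zip(DIGITS, benford, zipf):
        print(f"{k:5d}  {b:7.4f}  {z:7.4f}")
    # Ratio of digit 1 to digit 2 under each law.
    print("ratio 1:2 Benford:", benford[0] / benford[1])
    print("ratio 1:2 Zipf   :", zipf[0] / zipf[1])
```

The Benford proportions sum to $1$ because the logarithms telescope to $\log_{10} 10$, whereas the Zipfian weights need an explicit normalisation.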

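Notice that the discrepancy between Zipf’s law (with $\alpha=1$) and Benford’s law diminishes with growing $n\in\mathbb{N}$, since $\log{\frac{n+1}{n}}$ (taken as natural logarithm) is then ever better approximated by $\frac{1}{n}$. The base of the logarithm is immaterial here, because it cancels in the ratio $\frac{P(n)}{P(m)}$.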
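As a short worked step (standard calculus, with $\log$ the natural logarithm), the convergence follows from the series expansion of $\log(1+x)$ at $x=\frac{1}{n}$:

$$\log{\frac{n+1}{n}} = \log\left(1+\frac{1}{n}\right) = \frac{1}{n} - \frac{1}{2n^2} + \frac{1}{3n^3} - \cdots$$

so that $n\log{\frac{n+1}{n}} \to 1$ as $n\to\infty$. Comparing the Benfordian relative chance with the Zipfian one then gives

$$\frac{P(n)/P(m)}{P_{\rm discrete}(n)/P_{\rm discrete}(m)} = \frac{n\log{\frac{n+1}{n}}}{m\log{\frac{m+1}{m}}} \longrightarrow 1 \quad \text{as } n,m\to\infty.$$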

On a more philosophical note, all depends on how we perceive ‘drawing a natural number at random’. If we include (necessarily truncated) measurements, then the ‘Benfordian’ relative chance should be favoured, I think. Otherwise, the ‘Zipfian’ or ‘discrete’ relative chance can be considered. Notice that the ‘observer’ who measures or counts plays an important role. Again, an anthropic angle to this issue seems undeniable.

The analysis so far should cover, I hope, the ideas behind the tentative and highly speculative solution for drawing a natural number at random given above. Of course the way to read these posts is to start at post 1…and not the way this blog presents itself, with the most recent post on top. Still, I haven’t quite finished yet. I hope to cover some more rather tentative material on determinism and Church’s Thesis for physics (which is where the idea for these posts came from).

(to be continued) 