


Why should we study probability? Probability lies at the heart of nature, and the inner workings of a myriad of phenomena and processes are unlocked to the human mind only by understanding the fundamental principles of probability. In particular, the entire fields of thermodynamics, statistical mechanics and quantum theory are described by concepts and ideas from probability theory.

Probability and Uncertainty

Probability is that part of mathematics which gives a precise meaning to the idea of uncertainty, of not fully knowing the outcome of some event. We often hear that there is a good chance of a shower; people bet with different odds, say 1 to 5, that a football match will be won by some team, and so on. In each case, we are making a guess as to what the outcome of some event will be. Probability theory is the science that quantifies our ignorance, and spells out the likelihood of our guess being realized in a particular case. Probability theory cannot address every category of uncertainty. In particular, it can say nothing about uncertainties in events that cannot, even in principle, be repeated. For example, uncertainty as to whether miracles are possible is not amenable to quantitative study. Rather, probability theory can only analyze those events which can be repeated many times, and in such cases it can make predictions such as the expected average value of an outcome, the possible deviations from that average, and so on. Probability theory is ideally suited, for example, to address questions related to games of chance - be they games based on drawing from a pack of cards or on rolling a die - since in such games the same procedure is followed time and again. Predictions can be made on, say, how many times, on average, one will obtain an ace in a hundred draws. It should come as no surprise that the theory of probability, developed in its modern form by Pierre-Simon de Laplace, originated in the analysis of games of chance.

Probability in Physics

Ideas from probability theory will turn out to be fundamental in understanding concepts such as entropy and quantum theory. In physics, probability has a two-fold origin. The most universal and important definition of entropy is based on our understanding of the microscopic nature of matter. Given that any piece of macroscopic matter is made up of around $10^{23}$ atoms, a detailed knowledge of what each atom is doing is no longer possible, and instead one needs to resort to a statistical and probabilistic description. Note that we need to resort to probability in understanding entropy because of our ignorance, or equivalently, our incomplete knowledge, of what is happening at the microscopic level. Probability in physics originating from ignorance is called classical probability. In quantum theory, the situation is quite different. Even when we describe a single quantum particle, we must of necessity resort to a probabilistic description, since it turns out that nature is inherently probabilistic. We call such intrinsic uncertainty quantum probability. To prepare ourselves to engage with the ideas of entropy and quantum theory, we briefly review the underlying ideas of probability. The review is rudimentary, and is focused on the concepts which will be directly employed in later discussions.

Discrete Random Variables

Consider the simplest possible example, namely the tossing of a coin. The outcome of a throw is uncertain, and can come up H (head) or T (tail). From the point of view of physics, this (classical) uncertainty is due to our ignorance, since in principle we could predict the outcome of the toss if we knew exactly how the coin was tossed. Since the outcome of a throw is uncertain, we say that the outcome is random. A random variable is defined to be a quantity, such as the position of a particle, which has many possible outcomes, with the likelihood of each outcome specified. If an event is certain to occur, its probability is taken to be 1, and if an event is impossible, its probability is taken to be 0. The likelihood $P$ of any event occurring must therefore lie between 0 (impossibility) and 1 (certainty), that is, $0 \leq P \leq 1$. A random variable whose outcomes are discrete, such as the outcome of tossing a coin, is a discrete random variable.

Bernoulli Random Variable

The random outcome of tossing a coin is an example of a Bernoulli random variable, which is defined to be a random variable with only two possible outcomes, here H or T. Let $P(H)\equiv p$ be the likelihood that H will appear, and let $P(T)\equiv q$ be the likelihood that T will appear. Since the result of every throw must be either H or T, we have $P(H)+P(T)=1$. In summary, for the general case of a biased coin, we have the following.
$\displaystyle P(H) = p \mbox{ ; } P(T) = q$ (4.1)
$\displaystyle P(H)+P(T) = p+q = 1$ (4.2)

We all know that if we have a fair coin, it is equally likely that H or T will result from each throw, so that $P(H)=P(T)$. Combined with $P(H)+P(T)=1$, this gives, for a fair coin, $\displaystyle P(H)=P(T)=\frac{1}{2}$.
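As a concrete illustration, a Bernoulli coin can be simulated numerically. The following Python sketch (the function name `bernoulli_trials` is our own, not from the text) tosses a coin with $P(H)=p$ many times and returns the observed fraction of heads, which should hover near $p$:

```python
import random

def bernoulli_trials(p, n, seed=0):
    """Toss a coin with P(H) = p a total of n times; return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    heads = sum(1 for _ in range(n) if rng.random() < p)
    return heads / n

# The observed frequency of heads approaches P(H) as n grows
freq = bernoulli_trials(0.5, 10_000)
```

For a fair coin ($p=\frac{1}{2}$) and $10{,}000$ tosses, the returned frequency lies close to $\frac{1}{2}$.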

Binomial Random Variable

Suppose we throw a coin $N$ times - or, equivalently, throw $N$ identical coins once - and ask: what is the likelihood of obtaining $k$ heads, regardless of the sequence in which they appear? We denote this probability by $P(k,N)$. The probability of a particular sequence of $k$ heads and $N-k$ tails - for example, the first $k$ outcomes being heads and the rest tails - is given by multiplying the likelihoods of the individual throws in the sequence, and yields
$\displaystyle p^k q^{N-k}$ (4.3)

Recall that we are interested in obtaining $k$ heads in $N$ throws regardless of the sequence in which they appear. For example, all the heads could equally well occur in the last $k$ throws, or in any other arrangement of $k$ heads among the $N$ throws. Hence, we need to find out how many different ways $k$ heads can occur in $N$ throws, namely $\Gamma(k,N)$, and then multiply it by the probability of obtaining a particular sequence, which is given by (4.3). Example: Suppose we throw the coin $N=3$ times, and we want to know in how many outcomes a single head occurs. We have the following $2^3=8$ possible outcomes: HHH, HHT, HTH, THH, HTT, THT, TTH, TTT. Clearly $\Gamma(1,3)=3$. In general we can construct Pascal's triangle, as given in Figure 4.1, where the total number of boxes in a given row denotes the total number of throws $N$. As one goes along a row of Pascal's triangle, the entries count the number of ways of obtaining $k$ heads, which is given by $\Gamma(k,N)$.

Figure 4.1: Pascal's triangle

The result for the general case is given by the well known binomial coefficient
$\displaystyle \Gamma(k,N) = \frac{N!}{(N-k)!\,k!}$ (4.4)
where $n! \equiv n(n-1)(n-2)\cdots 2 \times 1$ (4.5)

Hence the probability of obtaining $k$ heads in $N$ throws is given by
$\displaystyle P(k,N) = \Gamma(k,N)\,p^k q^{N-k}$ (4.6)
$\displaystyle \phantom{P(k,N)} = \frac{N!}{(N-k)!\,k!}\,p^k q^{N-k}$ (4.7)
$\displaystyle k = 0,1,2,\ldots,N$ (4.8)

A random variable having $k=0,1,\ldots,N$ as its possible outcomes, with the probabilities of the outcomes given by eq. (4.6) above, is called a binomial random variable. We have, using the binomial theorem, that
$\displaystyle \sum_{k=0}^N P(k,N) = (p+q)^N$ (4.9)
$\displaystyle \phantom{\sum_{k=0}^N P(k,N)} = 1$ (4.10)

The result above is simply a statement that when we throw a coin $N$ times, we are certain that the outcome will be either no heads, or 1 head, or 2 heads, and so on all the way up to all heads, that is, $N$ heads. For our example of tossing a coin three times, the probability of a single head is $\displaystyle P(1,3)=\Gamma(1,3)\times (\frac{1}{2})^3=\frac{3}{8}$.

Fair Coin: $p=q=\frac{1}{2}$

Consider the important case of a fair coin, that is, $p=q=\frac{1}{2}$; in this case, we have
$\displaystyle P(k,N) = \left(\frac{1}{2}\right)^N \times \frac{N!}{(N-k)!\,k!}$ (4.11)
$\displaystyle \phantom{P(k,N)} \propto \Gamma(k,N)$ (4.12)

The crucial point to note is that the proportionality constant is independent of $k$. Formula (4.11) shows that for a random variable all of whose outcomes are equally likely, the probability that a certain configuration will occur - in this case, the number of heads being equal to $k$ - is proportional to the number of ways this configuration can occur, namely $\Gamma(k,N)$. This result will be very important later in our understanding of the concept of entropy.
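The binomial probabilities above are easy to evaluate numerically. The following Python sketch (the name `binomial_pmf` is our own, for illustration) implements eq. (4.7), and can be used to check that the probabilities sum to one as in eq. (4.10) and to reproduce the worked example $P(1,3)=\frac{3}{8}$:

```python
from math import comb

def binomial_pmf(k, N, p):
    """Eq. (4.7): P(k, N) = N!/((N-k)! k!) * p^k * q^(N-k), with q = 1 - p."""
    return comb(N, k) * p**k * (1.0 - p)**(N - k)

# Summing over all k gives 1, as in eq. (4.10)
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))

# The worked example: one head in three throws of a fair coin
p_one_head = binomial_pmf(1, 3, 0.5)   # 3/8
```

Note that `math.comb` computes the binomial coefficient $\Gamma(k,N)$ of eq. (4.4) directly.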

Random Walk

Throwing a coin $N$ times has a physical interpretation. Consider a particle that can move in only one dimension. Let us also assume the particle can move only a fixed distance at a time, either to the right or to the left. To decide which way the particle will move, we toss a coin: if the toss comes up H, the particle moves to the right; if the toss comes up T, the particle moves to the left. In other words, the outcome is $+1$ with probability $P(H)$, and $-1$ with probability $P(T)$. In effect, the particle moves on a lattice of equally spaced points. The process that the particle is undergoing is called a random walk - also called Brownian motion. A random walk is precisely the way the molecules from an open bottle of perfume spread their smell into the entire room, a phenomenon called diffusion. Throwing the coin $N$ times corresponds to taking $N$ steps. Suppose the particle starts the random walk at the origin, takes $k$ steps to the right, and hence $N-k$ steps to the left. If $N_H$ is the number of heads and $N_T$ the number of tails, then the displacement from the origin is $N_H-N_T=2k-N$. There are many paths which lead to the final position $l=2k-N$; their number is the number of different ways that $k$ heads can come up in $N$ throws.
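The correspondence between coin tossing and the random walk can be sketched in a few lines of Python (`random_walk` is our own illustrative helper):

```python
import random

def random_walk(N, p=0.5, seed=0):
    """Take N unit steps: +1 (heads, probability p) or -1 (tails).
    Returns the final displacement N_H - N_T from the origin."""
    rng = random.Random(seed)
    return sum(1 if rng.random() < p else -1 for _ in range(N))

final_position = random_walk(1000)
```

After an even number of steps the final position is always even, and it is bounded by $\pm N$, exactly as $l=2k-N$ requires.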

Figure 4.2: Different paths leading to a final position

The probability that the particle, after doing a random walk of $N=2M$ steps, ends up at the point $l=2m$ is given by setting $k=M+m$ in (4.6), that is,
$\displaystyle P_{RW}(m,M) = \frac{(2M)!}{(M+m)!(M-m)!}\,(pq)^M \left(\frac{p}{q}\right)^m \mbox{ ; } -M \leq m \leq M$

Fair Coin. For simplicity, take $p=q=\frac{1}{2}$. Then $\displaystyle P_{RW}(m,M) = \left(\frac{1}{2}\right)^{2M}\frac{(2M)!}{(M+m)!(M-m)!}$, which is proportional to the number of distinct sequences (paths) that go from the origin to the point $l$. We thus
obtain $P(l,N)$.

For a particle undergoing a random walk, its position $m$ at every point of its $N$-step walk is a random variable. An important tool for studying the behaviour of random variables is to compute the average values of quantities of interest. For a function $f(m)$ of the random variable $m$, let us denote its average value by $<f(m)>$. We then have

$\displaystyle <f(m)>=\sum_{m=-M}^M f(m)P_{RW}(m,M)$ (4.13)

The above expression means that the average value of $f(m)$ is obtained by summing the value of $f(m)$ over every outcome $m$, weighted by the likelihood of that outcome occurring. We also have the following natural definition for $P(H)$.
$\displaystyle P(H) = \frac{<N_H>}{N}$ (4.14)

The two most important properties of any random variable are its average and its standard deviation. Let us return to the random walk with $p=q=\frac{1}{2}$. The average position of the particle after $2M(=N)$ steps is given by
$\displaystyle <m> = \sum_{m=-M}^{M}m P_{RW}(m,M)$ (4.15)
$\displaystyle \phantom{<m>} = 0$ (4.16)

The reason we get zero is that we have assumed equal probabilities to step to the right or to the left. Hence, on average, the steps on either side of the origin cancel, and the average position is the starting point. However, we intuitively know that even though the average position of a particle undergoing a random walk is zero, the particle deviates more and more from the origin as it takes more and more steps: since every step is random, it is highly unlikely that the rightward and leftward steps will exactly cancel. The importance of the paths that are far from the origin is measured by the average value of the square of the position of the particle. The reason this measure is useful is that every time the particle deviates from the origin, be it to the right or to the left, the square of the deviation is always positive. We hence have the variance, the square of the standard deviation $\sigma$, given by
$\displaystyle \sigma^2 \equiv <m^2>-<m>^2$ (4.17)
$\displaystyle \phantom{\sigma^2} = \sum_{m=-M}^{M}m^2 P_{RW}(m,M)$ (4.18)
$\displaystyle \phantom{\sigma^2} = \frac{M}{2}$ (4.19)
$\displaystyle \phantom{\sigma^2} = \frac{N}{4}$ (4.20)
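The moments of the walk can also be checked exactly by summing over the binomial distribution rather than by simulation. The Python sketch below (the name `walk_moments` is our own) computes the mean and variance of the number of heads $k$ in $N$ fair throws; the displacement $2k-N$ then has variance $4\times\mathrm{Var}(k)=N$:

```python
from math import comb

def walk_moments(N):
    """Exact mean and variance of the number of heads k in N fair throws,
    computed by summing over the binomial distribution of eq. (4.11)."""
    pmf = [comb(N, k) * 0.5**N for k in range(N + 1)]
    mean = sum(k * P for k, P in zip(range(N + 1), pmf))
    var = sum((k - mean)**2 * P for k, P in zip(range(N + 1), pmf))
    return mean, var

mean_k, var_k = walk_moments(100)
# mean_k = N/2 and var_k = N/4; the displacement 2k - N has variance 4*var_k = N
```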

We have the important result from (4.20) that, since for $k=N_H$ we have $\displaystyle m\equiv N_H-\frac{N}{2}$, the following holds.
$\displaystyle <(N_H-\frac{N}{2})^2>=\frac{N}{4}$ (4.21)
$\displaystyle \Rightarrow<(\frac{N_H}{N}-\frac{1}{2})^2>=\frac{1}{4N}$ (4.22)

The equation above has an important interpretation. In any particular experiment, all we can obtain is $N_H$, the number of heads in $N$ trials. So how do we compute $P(H)$? We would like to set $<N_H>=N_H$, but there are errors inherent in this estimate, since in any particular set of throws we can get a value of $N_H$ which need not be equal to $<N_H>$. In other words, what is the error we make if we set $\displaystyle P(H)=\frac{N_H}{N}$? Eq. (4.22) tells us that for a fair coin, with $\displaystyle P(H)=\frac{1}{2}$, if we compute $\displaystyle \frac{N_H}{N}$, we have
$\displaystyle P(H) = \frac{1}{2}$ (4.23)
$\displaystyle \phantom{P(H)} = \frac{N_H}{N}\pm \frac{1}{2\sqrt{N}}$ (4.24)

In other words, the estimate $\displaystyle \frac{N_H}{N}$ that we obtain for $P(H)$ from our experiment is equal to the actual value to within errors of order $\displaystyle\frac{1}{\sqrt{N}}$. The point to note is that the errors inherent in any estimate are quantified above, and go down as $\frac{1}{\sqrt{N}}$, where $N$ is the sample size. In general, for any random variable with standard deviation $\sigma$, the estimate for the probability $P(H)$, where $N_H$ is the number of times that the outcome $H$ has occurred, is given by
$\displaystyle P(H)=\frac{N_H}{N}\pm \frac{\sigma}{\sqrt{N}}$ (4.25)

In general, let $m_{\mathrm{Est}}$ be an estimate, derived from a sample of size $N$, of some quantity $\mu$ whose standard deviation is $\sigma$. The generalization of eq. (4.25) states the following.
$\displaystyle m_{\mathrm{Est}} = \mu \pm \frac{\sigma}{\sqrt{N}}$ with 68\% likelihood (4.26)
$\displaystyle \phantom{m_{\mathrm{Est}}} = \mu \pm \frac{2\sigma}{\sqrt{N}}$ with 95\% likelihood (4.27)
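The shrinking of the estimation error with sample size can be seen in a small numerical experiment. The Python sketch below (the name `mean_abs_error` is our own) repeats the $N$-toss experiment many times and averages the error $|N_H/N - \frac{1}{2}|$; multiplying the sample size by 16 shrinks the error by roughly a factor of 4, in line with the $\frac{1}{\sqrt{N}}$ behaviour:

```python
import random

def mean_abs_error(N, trials=200, seed=0):
    """Average |N_H/N - 1/2| over many repeated experiments of N fair tosses each."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        heads = sum(1 for _ in range(N) if rng.random() < 0.5)
        total += abs(heads / N - 0.5)
    return total / trials

# The error shrinks roughly as 1/sqrt(N): 16x more tosses, ~4x smaller error
err_small, err_large = mean_abs_error(100), mean_abs_error(1600)
```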

The relation of $m_{\mathrm{Est}}$ to what it is estimating, namely $\mu$, is shown graphically in Figure 4.3.

Figure 4.3: Estimate lies in a Range around $\mu $

Continuous Random Variables

In addition to random variables taking discrete values, as has been the case with the Bernoulli and binomial random variables, there are also random variables that can take continuous values. A simple example is the height of an individual. If we go to a street and measure the heights of pedestrians, we will find that the heights can take any value from, say, 1 m to 2 m. In other words, the heights vary continuously from person to person. Not knowing any better, we would assume that the heights of the pedestrians are samples of a continuous random variable. Let $z$ be a continuous variable that takes values in some continuous interval with a minimum value $L$ and a maximum value $U$, that is, $z\in [L,U]$. The probability distribution function $P(z)$ is defined by the following. For a small interval $dz$, we have
$\displaystyle P(z)dz = $ probability that $z$ falls in the interval $[z,z+dz]$
$\displaystyle P(z) \geq 0$ (4.28)
$\displaystyle \int_L^U P(z)dz = 1$ : total probability is one (4.29)

Uniform Random Variable

The simplest continuous random variable, namely the uniform random variable $u$, called $U(0,1)$, takes all values in the continuous interval $[0,1]$ with equal likelihood. Hence the probability density cannot depend on $u$, which leads to
$\displaystyle P(u)=1, \quad u\in [0,1]$ (4.30)

We hence have
$\displaystyle <u>=\int_0^1 u\, du = \frac{1}{2}$ (4.31)
$\displaystyle <u^2>=\int_0^1 u^2 du = \frac{1}{3}$ (4.32)
$\displaystyle \sigma^2=<u^2>-<u>^2 = \frac{1}{12}$ (4.33)

The humble uniform random variable $U(0,1)$ turns out, surprisingly, to be one of the most important random variables. The reason is that one can prove that every random variable, be it discrete or continuous, can be obtained by a suitable mapping of the uniform random variable. Hence, in all numerical simulations, the computer has a built-in algorithm for $U(0,1)$, and one is then faced with the task of generating the random variables of interest starting from $U(0,1)$.
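As an illustration of mapping $U(0,1)$ into another random variable, the sketch below uses the standard inverse-transform method (the function name `exponential_samples` is our own) to turn uniform draws into exponentially distributed samples:

```python
import math
import random

def exponential_samples(rate, n, seed=0):
    """Map U(0,1) draws into exponential random variables via the inverse of
    the CDF F(x) = 1 - exp(-rate*x): if u ~ U(0,1), then -ln(1-u)/rate ~ Exp(rate)."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]

samples = exponential_samples(2.0, 20_000)
sample_mean = sum(samples) / len(samples)   # should be close to 1/rate = 0.5
```

The design choice here, inverting the cumulative distribution function, is precisely the mapping from $U(0,1)$ referred to in the text.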

Normal or Gaussian Random Variable

A random variable which has wide applications in diverse fields such as physics, statistics, finance and engineering is the normal or Gaussian random variable. This continuous random variable $x$ can take any value on the real line, that is, $x \in (-\infty,+\infty)$. For the case of a Gaussian random variable, its probability distribution function, displayed in Figure 4.4, is given by the following.
$\displaystyle P(x) = \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{1}{2\sigma^2}(x-\mu)^2}$ (4.34)
$\displaystyle <x> = \int_{-\infty}^{+\infty}x P(x)dx$ (4.35)
$\displaystyle \phantom{<x>} = \mu$ : mean (4.36)
$\displaystyle <x^2>-<x>^2 = \int_{-\infty}^{+\infty}x^2P(x)dx - \mu^2$ (4.37)
$\displaystyle \phantom{<x^2>-<x>^2} = \sigma^2$ : variance (4.38)
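As a numerical sanity check on eq. (4.34), the sketch below (the helper names are ours) integrates the Gaussian density by the trapezoidal rule and verifies that the total probability is one and that the first moment is $\mu$:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Eq. (4.34): the normal density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu)**2 / (2.0 * sigma**2)) / math.sqrt(2.0 * math.pi * sigma**2)

def trapezoid(f, a, b, n=100_000):
    """Trapezoidal-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

# Total probability integrates to one; the tails beyond ~12 sigma are negligible
total = trapezoid(lambda x: gaussian_pdf(x, 1.0, 2.0), -25.0, 27.0)
mean = trapezoid(lambda x: x * gaussian_pdf(x, 1.0, 2.0), -25.0, 27.0)
```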

Figure 4.4: Probability Distribution of the Gaussian Random Variable

As mentioned earlier, a model for diffusion is a particle doing a random walk in a (continuous) medium. It can be shown that its probability distribution is then given by the normal distribution. Suppose the particle starts its random walk at time $t=0$ from the point $x_0$; then at time $t>0$ its position can be anywhere in space, that is, $x \in (-\infty,+\infty)$. The probability for it to be at different values of $x$ is given by
$\displaystyle P(x;t)=\frac{1}{\sqrt{\pi Dt}}e^{-\frac{1}{Dt}(x-x_0)^2}$ (4.39)

with $D$ being the diffusion constant of the medium in which the particle is doing a random walk.
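In the convention of eq. (4.39), the density is a Gaussian with $<(x-x_0)^2> = Dt/2$, so the mean-square displacement grows linearly with time, the hallmark of diffusion. A small Python sketch (the helper names are ours) checks this by numerical integration:

```python
import math

def diffusion_pdf(x, t, D=1.0, x0=0.0):
    """Eq. (4.39): P(x; t) = exp(-(x - x0)^2 / (D t)) / sqrt(pi D t)."""
    return math.exp(-(x - x0)**2 / (D * t)) / math.sqrt(math.pi * D * t)

def mean_square_displacement(t, D=1.0, n=200_000, half_width=30.0):
    """<(x - x0)^2> by trapezoidal integration; in this convention it equals D t / 2."""
    a, b = -half_width, half_width
    h = (b - a) / n
    f = lambda x: x * x * diffusion_pdf(x, t, D)
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))
```

Quadrupling the time quadruples the mean-square displacement, consistent with linear growth in $t$.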
Marakani Srikant 2000-09-11