Discrete distributions

Probabilities and statistics for biology (CMB STAT1 - STAT2)

Jacques van Helden

2020-02-20

Discrete distributions of probabilities

The expression discrete distribution denotes the probability distribution of a variable that takes only discrete values (as opposed to continuous distributions).

Geometric distribution

Application: waiting time until the first appearance of an event in a Bernoulli schema.

Example: the number of times a die must be rolled before obtaining the first 6 (the number of failures before the first success).

Mass function of the geometric distribution

The Probability Mass Function (PMF) indicates the probability of observing a particular result.

For the geometric distribution, it indicates the probability of observing exactly \(x\) failures before the first success, in a series of independent trials with success probability \(p\).

\[\operatorname{P}(X = x) = (1-p)^x \cdot p\]

Justification: each of the \(x\) failures occurs with probability \(1-p\), the final success occurs with probability \(p\), and the trials are independent, so the probabilities multiply.

Note: for discrete distributions, the PMF plays the role of the density used for continuous distributions.

Geometric PMF

**Mass function of the geometric distribution**. Left: ordinate on a logarithmic scale.
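
A minimal R sketch (parameter values are arbitrary) checking that the built-in dgeom() function implements this mass function, with \(x\) counted as the number of failures before the first success:

p <- 0.25                      # success probability (arbitrary choice)
x <- 0:20                      # number of failures before the first success
pmf <- dgeom(x, prob = p)      # P(X = x)
all.equal(pmf, (1 - p)^x * p)  # TRUE: dgeom() matches the formula above
plot(x, pmf, type = "h", lwd = 2, las = 1,
     xlab = "Failures before first success (x)", ylab = "P(X = x)",
     main = "Geometric PMF (p = 0.25)")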

Distribution tails and cumulative distribution function

The tails of a distribution are the areas under the density curve up to a given value (left tail) or starting from a given value (right tail).

**Tails and cumulative distribution function of the geometric distribution**.
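
A short sketch (arbitrary values) computing both tails of the geometric distribution with pgeom(); note that lower.tail = FALSE returns the strict right tail \(P(X > x)\):

p <- 0.25; x <- 5
pgeom(x, prob = p)                          # left tail (CDF): P(X <= x)
pgeom(x - 1, prob = p, lower.tail = FALSE)  # right tail: P(X > x-1) = P(X >= x)
pgeom(x, prob = p) + pgeom(x, prob = p, lower.tail = FALSE)  # tails sum to 1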

Binomial distribution

The binomial distribution indicates the probability of observing a given number of successes (\(x\)) in a series of \(n\) independent trials with constant success probability \(p\) (Bernoulli schema).

Binomial PMF

\[\operatorname{P}(X=x) = \binom{n}{x} \cdot p^x \cdot (1-p)^{n-x} = C_n^x p^x (1-p)^{n-x} = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}\]

Binomial right tail

The probability of observing at least \(x\) successes is obtained by summing the PMF from \(x\) to \(n\):

\[\operatorname{P}(X \ge x) = \sum_{i=x}^{n}{P(X=i)} = \sum_{i=x}^{n}{C_n^i p^i (1-p)^{n-i}}\]
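
As a hedged illustration (values arbitrary), the right tail can be computed either by direct summation of the PMF or via pbinom():

n <- 10; p <- 0.25; x <- 4
dbinom(x, size = n, prob = p)                          # P(X = x)
sum(dbinom(x:n, size = n, prob = p))                   # P(X >= x), direct summation
pbinom(x - 1, size = n, prob = p, lower.tail = FALSE)  # P(X > x-1) = P(X >= x)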

Properties

Expectation: \(\operatorname{E}(X) = n \cdot p\); variance: \(\operatorname{V}(X) = n \cdot p \cdot (1-p)\).

\(i\)-shaped binomial distribution

The binomial distribution can take various shapes depending on the values of its parameters (success probability \(p\), and number of trials \(n\)).

When the expectation (\(p \cdot n\)) is very small, the binomial distribution is monotonically decreasing and is described as \(i\)-shaped.

i-shaped binomial distribution.

Asymmetric bell-shaped binomial distribution

When the success probability is relatively high but still lower than \(0.5\), the distribution takes the shape of an asymmetric bell.

Asymmetric bell-shaped binomial distribution.

Symmetric bell-shaped binomial

When the success probability \(p\) is exactly \(0.5\), the binomial distribution takes the shape of a symmetrical bell.

Symmetric bell-shaped binomial distribution (p = 0.5).

\(j\)-shaped binomial distribution

When the success probability is close to 1, the distribution is monotonically increasing and is described as a \(j\)-shaped distribution.

j-shaped binomial distribution.
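
The four shapes can be reproduced with a few lines of R (parameter values chosen for illustration only):

n <- 20
par(mfrow = c(2, 2))  # 2 x 2 panel layout
for (p in c(0.01, 0.25, 0.5, 0.95)) {  # i-shaped, asymmetric bell, symmetric bell, j-shaped
  barplot(dbinom(0:n, size = n, prob = p),
          names.arg = 0:n, las = 1,
          main = paste0("n = ", n, ", p = ", p))
}
par(mfrow = c(1, 1))  # restore the default layout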

Examples of applications of the binomial

  1. Dice: number of 6s observed during a series of 10 dice rolls.
  2. Sequence alignment: number of identities between two sequences aligned without gaps and with an arbitrary offset.
  3. Motif analysis: number of occurrences of a given motif in a genome.

Note: the binomial assumes a Bernoulli schema. For examples 2 and 3, this amounts to considering that nucleotides are concatenated independently, which is quite unrealistic.
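
For instance, example 1 can be computed directly in R (a sketch; the probability of rolling a 6 is 1/6):

n <- 10; p <- 1/6  # 10 rolls, probability 1/6 of a 6 at each roll
dbinom(2, size = n, prob = p)                      # P(exactly two 6s)
pbinom(1, size = n, prob = p, lower.tail = FALSE)  # P(at least two 6s)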

Poisson law

The Poisson law describes the probability of the number of realisations of an event during a fixed time interval, assuming that the average number of events is constant, and that the events are independent (previous realisations do not affect the probabilities of future realisations).

Poisson Probability Mass Function

\[P(X = x) = \frac{\lambda^x}{x!}e^{-\lambda}\]

Properties of the Poisson distribution

The expectation and the variance of the Poisson distribution are both equal to its parameter: \(\operatorname{E}(X) = \operatorname{V}(X) = \lambda\).
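
A minimal R check (\(\lambda\) arbitrary) that dpois() matches the formula above, and that the sample mean and variance both approach \(\lambda\):

lambda <- 3
x <- 0:15
all.equal(dpois(x, lambda), lambda^x / factorial(x) * exp(-lambda))  # TRUE
draws <- rpois(n = 100000, lambda = lambda)  # random sample
mean(draws)  # close to lambda
var(draws)   # also close to lambda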

Application: mutagenesis

Historical experiment by Luria-Delbruck (1943)

In 1943, Salvador Luria and Max Delbruck demonstrated that when cultured bacteria are exposed to a selective agent (a bacteriophage, in their experiment), the mutations that confer resistance are not induced by the treatment itself, but preexist it. Their demonstration relies on the fact that if mutations were induced by the treatment, the number of resistant cells per culture would follow a Poisson law; the fluctuations they observed were far larger than expected under a Poisson, showing that mutations arise spontaneously before the treatment (Luria & Delbruck, 1943, Genetics 28:491–511).

Convergence of the binomial towards the Poisson

When the number of trials \(n\) is large and the success probability \(p\) is small, the binomial law converges towards a Poisson law of parameter \(\lambda = n \cdot p\).
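
A hedged numerical illustration of this convergence (\(\lambda\) and the values of \(n\) are arbitrary):

lambda <- 4  # constant expectation np
x <- 0:15
for (n in c(20, 100, 10000)) {
  p <- lambda / n  # success probability decreases as n grows
  cat("n =", n, "; max |binomial - Poisson| =",
      max(abs(dbinom(x, size = n, prob = p) - dpois(x, lambda))), "\n")
}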

Negative binomial: number of successes before the \(r^{th}\) failure

The negative binomial distribution (also called Pascal distribution) indicates the probability of the number of successes (\(k\)) before the \(r^{th}\) failure, in a Bernoulli schema with success probability \(p\).

\[\mathcal{NB}(k|r, p) = \binom{k+r-1}{k}p^k(1-p)^r\]

This formula is a simple adaptation of the binomial, with the difference that we know that the last trial must be a failure. The binomial coefficient thus reduces to choosing the positions of the \(k\) successes among the \(n-1 = k+r-1\) trials preceding the \(r^{th}\) failure.

Negative binomial: alternative formulations

The same reasoning can be adapted to related quantities: the probability of observing \(r\) failures before the \(k^{th}\) success, or the probability that the \(r^{th}\) failure occurs at the \(n^{th}\) trial.

\[\mathcal{NB}(r|k, p) = \binom{k+r-1}{r}p^k(1-p)^r\]

\[\mathcal{NB}(n|r, p) = \binom{n-1}{r-1}p^{n-r}(1-p)^r\]
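
Note that R parametrises the negative binomial differently: dnbinom(x, size, prob) gives the probability of x failures before the size-th success, with success probability prob. To match the document's convention (\(k\) successes before the \(r^{th}\) failure, success probability \(p\)), the roles of success and failure must be swapped. A minimal check, with arbitrary values:

k <- 0:10; r <- 5; p <- 0.3  # document convention: p = success probability
manual <- choose(k + r - 1, k) * p^k * (1 - p)^r       # formula above
all.equal(manual, dnbinom(k, size = r, prob = 1 - p))  # TRUE: success/failure swapped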

Negative binomial density

Negative binomial.

Properties of the negative binomial

The variance of the negative binomial is higher than its mean. It is therefore sometimes used to model count distributions that are over-dispersed compared with a Poisson.

With the convention above (\(k\) successes before the \(r^{th}\) failure, success probability \(p\)):

\[\operatorname{E}(K) = \frac{r \cdot p}{1-p} \qquad \operatorname{V}(K) = \frac{r \cdot p}{(1-p)^2}\]

Since \(\operatorname{V}(K) / \operatorname{E}(K) = 1/(1-p) > 1\), the variance always exceeds the mean.

Exercise – Negative binomial

Each student chooses a value for the maximal number of failures (\(r\)).

  1. Read carefully the help of the negative binomial functions: help(NegBinomial)
  2. Random sampling: draw \(rep=100000\) random numbers from a negative binomial distribution (rnbinom()) to compute the distribution of the number of successes (\(k\)) before the \(r^{th}\) failure.
  3. Compute the expected mean and variance of the negative binomial.
  4. Compute the mean and variance of your sampling distribution.
  5. Draw a histogram of the number of successes before the \(r^{th}\) failure.
  6. Fill in the form on the collective result table.

Solution to the exercise – negative binomial

r <- 6        # Number of failures (R's 'size' parameter)
p <- 0.75     # Failure probability: rnbinom() counts the number of "failures"
              # before the size-th "success", so the document's failures play
              # the role of R's successes and 'prob' is the failure probability
rep <- 100000 # Number of random draws
k <- rnbinom(n = rep, size = r, prob = p)  # Successes before the r-th failure
max.k <- max(k)           # Largest observed count (for the histogram breaks)
exp.mean <- r*(1 - p)/p   # Expected mean
rand.mean <- mean(k)      # Sample mean
exp.var <- r*(1 - p)/p^2  # Expected variance
rand.var <- var(k)        # Sample variance
hist(k, breaks = -0.5:(max.k + 0.5), col = "grey", xlab = "Number of successes (k)",
     las = 1, ylab = "", main = "Random sampling from negative binomial")
abline(v = rand.mean, col = "darkgreen", lwd = 2)
abline(v = exp.mean, col = "green", lty = "dashed")
arrows(rand.mean, rep/20, rand.mean + sqrt(rand.var), rep/20, 
       angle = 20, length = 0.1, col = "purple", lwd = 2)
text(x = rand.mean, y = rep/15, col = "purple",
     labels = paste("sd =", signif(digits = 2, sqrt(rand.var))), pos = 4)
legend("topright", legend = c(
  paste("r =", r), 
  paste("exp.mean =", signif(digits = 4, exp.mean)), 
  paste("mean =", signif(digits = 4, rand.mean)), 
  paste("exp.var =", signif(digits = 4, exp.var)),
  paste("var =", signif(digits = 4, rand.var))
  ))

kable(data.frame(r = r, 
                 exp.mean = exp.mean, 
                 mean = rand.mean,
                 exp.var = exp.var,
                 var = rand.var), digits = 4)
| r | exp.mean | mean   | exp.var | var    |
|---|----------|--------|---------|--------|
| 6 | 2        | 1.9963 | 2.6667  | 2.6705 |

Negative binomial for over-dispersed counts
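
As a sketch (parameters arbitrary), one can contrast a Poisson sample with a negative binomial sample of the same mean to visualise the over-dispersion; the mu parametrisation of rnbinom() is used here:

mu <- 10; size <- 2  # same mean; NB variance = mu + mu^2/size = 60
pois.draws <- rpois(100000, lambda = mu)
nb.draws <- rnbinom(100000, size = size, mu = mu)   # 'mu' parametrisation
c(mean = mean(pois.draws), var = var(pois.draws))   # variance close to the mean
c(mean = mean(nb.draws), var = var(nb.draws))       # variance much larger than the mean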

Exercises