# Bayesian Inference and Graphical Models: Introduction

**Exercise**

Consider the following two scenarios.

- You pull a coin out of your change purse and flip it five times. It comes up heads all five times.
- You meet a magician who flips a coin five times and shows you that it came up heads all five times.

In which situation would you be more inclined to be skeptical of the null hypothesis that the coin being flipped is a fair coin?

*Solution.* We'd be more inclined to be skeptical in the magician scenario, since it isn't unusual for a magician to have a trick coin or card deck. Given a random coin from our change purse, it is extraordinarily unlikely that the coin is actually significantly biased towards heads. Although it's also unlikely for a fair coin to turn up heads five times in a row, that isn't going to be enough evidence to be persuasive.

This example illustrates one substantial shortcoming of the statistical framework—called **frequentism**—used in our statistics course. Frequentism treats parameters as *fixed constants* rather than random variables, and as a result it does not allow for the incorporation of information we might have about the parameters beyond the data observed in the random experiment (such as the real-world knowledge that a magician is not so unlikely to have a double-headed coin).

**Bayesian statistics** is an alternative framework in which we do treat model parameters as random variables. We specify a **prior** distribution for a model's parameters, and this distribution is meant to represent what we believe about the parameters before we observe the results of the random experiment. Then the results of the experiment serve to update our beliefs, yielding a **posterior** distribution.

The theorem in probability which specifies how probability distributions update in light of new evidence is called **Bayes' theorem**.

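For example, if your *prior* assessment of the probability that the magician's coin is double-headed is 5% (and 95% that it is fair), then your *posterior* estimate of that probability after observing five heads would shoot up to

$$\mathbb{P}(\text{double-headed} \mid \text{five heads}) = \frac{1 \cdot 0.05}{1 \cdot 0.05 + \left(\tfrac{1}{2}\right)^{5} \cdot 0.95} \approx 62.7\%.$$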

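Meanwhile, if the prior probability $q$ of double-headedness for the coin in your change purse is far smaller, then the same calculation gives a posterior of $\frac{q}{q + (1-q)/32}$, which is roughly $32q$ and therefore still very small.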

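The probability of the observed result given a particular value of the parameter, such as $\mathbb{P}(\text{five heads} \mid \text{double-headed}) = 1$, is called the **likelihood** of the observed result. So we can summarize Bayes' theorem with the mnemonic **posterior is proportional to likelihood times prior**.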
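The mnemonic can be checked numerically for the two-hypothesis coin example. The following is a minimal sketch: the function name `posterior` and the dictionary layout are illustrative choices, and the change-purse prior of one in a million is a made-up value used only to show the effect of a very small prior.

```python
def posterior(prior, likelihood):
    """Normalize likelihood * prior over a finite set of hypotheses.

    Both arguments are dicts keyed by hypothesis name.
    """
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}

# Likelihood of five heads in a row under each hypothesis.
likelihood = {"fair": 0.5 ** 5, "double-headed": 1.0}

# Magician's coin: 5% prior on double-headed, as in the text.
magician = posterior({"fair": 0.95, "double-headed": 0.05}, likelihood)

# Change-purse coin: this prior is a hypothetical illustrative value.
purse_prior = 1e-6
purse = posterior(
    {"fair": 1 - purse_prior, "double-headed": purse_prior}, likelihood
)

print(f"magician: {magician['double-headed']:.3f}")
print(f"purse:    {purse['double-headed']:.2e}")
```

The same five heads move the magician's coin from 5% to well over half, while the change-purse coin stays very unlikely to be double-headed, because the likelihoods are identical and only the priors differ.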

Bayes' rule takes an especially simple form when our distributions are supported on two values (for example, "fair" and "double-headed"), but we can apply the same idea to other probability mass functions as well as probability density functions.

**Example**

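Suppose that the heads probability of a coin is $\theta$. Consider a uniform prior distribution for $\theta$ on $[0,1]$, and suppose that $n$ flips of the coin are observed. Express the posterior density in terms of the number of heads and tails in the observed sequence of flips.

*Solution.* We calculate the posterior density as likelihood times prior. Let's call $X$ the random sequence of flips, and suppose $x$ is a possible value of $X$ with $h$ heads and $t$ tails, so that $h + t = n$. We get

$$f(\theta \mid X = x) \propto \mathbb{P}(X = x \mid \theta)\, f(\theta) = \theta^{h}(1-\theta)^{t} \cdot 1 = \theta^{h}(1-\theta)^{t}, \qquad 0 \le \theta \le 1.$$

In this formula we are employing a common abuse of notation by using the same letter ($f$) for both the prior density and the posterior density; the two are distinguished by what is conditioned on.

The continuous distribution on $[0,1]$ whose density is proportional to $\theta^{\alpha-1}(1-\theta)^{\beta-1}$ is called the **Beta** distribution with parameters $\alpha$ and $\beta$. The posterior above is therefore a Beta distribution with parameters $h + 1$ and $t + 1$.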
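Under a uniform prior, likelihood times prior is proportional to $\theta^{h}(1-\theta)^{t}$ for $h$ heads and $t$ tails, which is the shape of a Beta density with parameters $h+1$ and $t+1$. The sketch below checks this numerically by normalizing likelihood times prior on a grid and comparing against the Beta density. The function names, the grid size, and the flip sequence are all assumptions made for illustration.

```python
import math

def beta_pdf(theta, a, b):
    """Density of the Beta(a, b) distribution at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(
        log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    )

def grid_posterior(flips, num_points=1000):
    """Approximate the posterior density under a uniform prior on a midpoint grid."""
    h = flips.count("H")
    t = flips.count("T")
    width = 1.0 / num_points
    grid = [(i + 0.5) * width for i in range(num_points)]
    # Likelihood times prior; the uniform prior density is 1 on [0, 1].
    weights = [theta ** h * (1 - theta) ** t * 1.0 for theta in grid]
    total = sum(weights) * width  # Riemann-sum estimate of the normalizing constant
    return grid, [w / total for w in weights]

flips = "HHTHHHTH"  # hypothetical data: 6 heads, 2 tails
h, t = flips.count("H"), flips.count("T")
grid, density = grid_posterior(flips)

# Largest pointwise difference between the grid posterior and Beta(h+1, t+1).
max_gap = max(
    abs(d - beta_pdf(theta, h + 1, t + 1)) for theta, d in zip(grid, density)
)
print(f"largest gap vs Beta({h + 1}, {t + 1}) density: {max_gap:.2e}")
```

The grid approach needs no closed form, so it also works as a sanity check for priors where the posterior is not a named distribution.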

**Exercise**

Show that the coin flip posterior for a Beta prior is also a Beta distribution. How does the evidence alter the parameters of the Beta distribution?

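*Solution.* Write $\theta$ for the heads probability, and let $h$ and $t$ be the numbers of heads and tails observed. If the prior density is proportional to $\theta^{\alpha-1}(1-\theta)^{\beta-1}$, then multiplying by the likelihood $\theta^{h}(1-\theta)^{t}$ gives a posterior density proportional to $\theta^{\alpha+h-1}(1-\theta)^{\beta+t-1}$. This is the density of a Beta distribution with parameters $\alpha + h$ and $\beta + t$: each observed head adds 1 to the first parameter, and each observed tail adds 1 to the second.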

When the posterior distribution has the same parametric form as the prior distribution, this property is called *conjugacy*. For the example above, we say that the Beta distribution is a conjugate family for the binomial likelihood.
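Conjugacy makes the update a matter of bookkeeping: for a Beta prior and coin-flip data, the posterior parameters are the prior parameters plus the head and tail counts. Here is a minimal sketch, where the helper `update_beta`, the Beta(2, 2) prior, and the flip sequence are hypothetical choices for illustration. It also shows that updating one flip at a time gives the same result as updating on the whole sequence at once.

```python
def update_beta(alpha, beta, flips):
    """Return posterior Beta parameters after observing a string of 'H'/'T' flips."""
    heads = sum(1 for f in flips if f == "H")
    tails = sum(1 for f in flips if f == "T")
    return alpha + heads, beta + tails

prior = (2.0, 2.0)   # hypothetical Beta(2, 2) prior
flips = "HHTHH"      # hypothetical data: 4 heads, 1 tail

# Update on all the data at once.
batch = update_beta(*prior, flips)

# Update one flip at a time, feeding each posterior back in as the next prior.
sequential = prior
for flip in flips:
    sequential = update_beta(*sequential, flip)

# The mean of a Beta(a, b) distribution is a / (a + b).
posterior_mean = batch[0] / (batch[0] + batch[1])
print(batch, sequential, round(posterior_mean, 4))
```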