# Understanding Hypothesis Testing Through Coin Tossing


To illustrate hypothesis testing, let’s engage in a simple coin toss game: if it lands heads, I win; if tails, you win.

As we begin tossing, the first outcome is heads—victory for me. The next toss? Heads again—another win. Yet again, heads—I'm on a roll! And once more, heads—this is getting interesting.

At this point, many participants might express their suspicion, claiming, “You’re cheating!” But why is that? I encourage you to ponder the reasoning behind this skepticism and articulate it clearly.

Before we move forward, take a moment to consider why disbelief arises at this juncture. Arriving at this understanding will engrain the fundamental concept of hypothesis testing in your memory.

Typically, in my (virtual) classroom, someone eventually articulates the answer I seek:

> “If the coin were unbiased, it’s unlikely to land on heads four times consecutively.”

This insight encapsulates the essence of hypothesis testing. If the coin is fair, each flip yields a 50% chance of heads and a 50% chance of tails.

The likelihood of flipping four heads in succession is calculated as follows:

P(four heads) = 0.5 × 0.5 × 0.5 × 0.5 = 0.5^4 = 0.0625

Consequently, the probability of obtaining four heads in a row (assuming the coin is fair) is exactly 6.25%. While this is improbable, it’s not impossible, illustrating the logic underpinning hypothesis testing.
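This arithmetic is easy to verify, both exactly and by simulation; here is a minimal Python sketch (the variable names are illustrative):

```python
import random

# Exact probability of four heads in a row with a fair coin:
p_four_heads = 0.5 ** 4
print(p_four_heads)  # 0.0625, i.e. 6.25%

# Sanity check by simulation: repeat the four-toss experiment many times
# and count how often all four tosses come up heads.
random.seed(42)
trials = 100_000
hits = sum(
    all(random.random() < 0.5 for _ in range(4))  # four heads in a row?
    for _ in range(trials)
)
print(hits / trials)  # close to 0.0625
```

The simulated frequency will fluctuate around 6.25%, exactly as the calculation predicts.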

In hypothesis testing, a person often aims to substantiate a new belief, denoted as H1. For instance, the claim might be: “Your professor is cheating; the coin is biased, and the probability p of obtaining heads exceeds 50%,” or mathematically:

H1: p > 0.5

The initial strategy is to assume the contrary is true, termed the null hypothesis (H0). Here, it posits that “the coin is fair or even slightly favors tails.” Thus, the probability p for heads is at or below 50%, i.e., H0: p ≤ 0.5.

The challenge lies in accurately determining this probability p, the actual chance of flipping heads. Regardless of how many times we toss the coin, we can never ascertain this value with certainty. Therefore, the true probability p for heads will remain elusive, and our flips will only yield approximations.

However, we can initially assume that p equals 50% (i.e., H0 holds true) and then calculate the probability of our observed outcome (given the assumption that H0 is accurate). In this scenario, we compute the probability of flipping four heads in succession under the fair coin assumption (p = 0.5).

The resulting probability is notably low, and the lower this probability, the less credence we give to the hypothesis H0.

It’s important to note that even if our calculated probability is low, H0 might still hold true, and the coin could indeed be fair. The difficulty lies in maintaining that belief.

Thus, if our computed probability drops below a certain threshold, we dismiss H0.

The critical question is: how low is too low? In various scenarios, a threshold of 5% is commonly adopted. While I can't pinpoint the origin of this 5%, it's fascinating to observe that in experiments like coin tossing or dice rolling, students tend to express disbelief and accuse me of cheating as we approach this mark.

In our coin scenario, once there are three heads in a row, skepticism arises. With four, the accusations intensify. By five heads, the claims become quite vociferous.

This reaction occurs without any formal calculations, suggesting a natural human intuition. Interestingly, this pattern holds true across different experiments—once we near the 5% threshold, doubts amplify.

This 5% figure is referred to as the significance level, denoted as α. It is also called the error level, since it is the probability of wrongly rejecting H0; the complementary value 1 − α (here 95%) is known as the confidence level. The α value should be established prior to the experiment, with 5% being a frequent choice.

## How Hypothesis Testing Functions: A Brief Overview

To conduct hypothesis testing, we begin by defining our hypotheses and establishing an error level (like 5%, 1%, etc.). Alongside this level, we should determine the specific experiment we wish to carry out, such as tossing a coin four times.

Next, we establish a decision rule based on this error level before conducting the experiment.

Finally, we assess our hypotheses, deciding whether to reject H0 or not. It’s essential to understand that we either reject H0 or we do not reject it; failure to reject does not equate to proof of its truth.

In our coin example, not rejecting H0 simply indicates insufficient evidence to accuse me of cheating. The coin could still be rigged.

This principle mirrors legal contexts: just because someone isn't proven guilty doesn't mean they are innocent.

In criminal law, we state that a person is innocent until proven guilty; in hypothesis testing, we assert that H0 is valid unless disproven.

This reflects a strong bias towards H0. If you lack a bias towards either hypothesis, hypothesis testing may not be suitable for you.

## General Steps in Hypothesis Testing

Let’s summarize the hypothesis testing process:

**Step 1:** Identify the new belief you wish to validate (H1) and define its opposite (H0).

**Step 2:** Set your significance level α and decide on the experiment specifics, usually determining how many times you will repeat a base experiment (e.g., tossing a coin).

*Note: This wasn’t thoroughly addressed in the introductory example, as I didn’t specify how many times the coin would be tossed beforehand.*

**Step 3:** Based on the hypotheses, significance level, and the experiment, establish the decision rule.

**Step 4:** Execute your experiment.

**Step 5:** Draw conclusions according to the decision rule established in step 3.
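The five steps can be condensed into a short Python sketch for the right-sided coin test; the function names are illustrative, not from any particular library:

```python
from math import comb

def binomial_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def right_sided_coin_test(n: int, alpha: float, heads_observed: int) -> bool:
    """Steps 3-5 for H0: p <= 0.5 versus H1: p > 0.5.

    Returns True if H0 is rejected."""
    # Step 3: smallest head count whose tail probability under a fair coin
    # is at most alpha; that count starts the rejection region.
    threshold = next(k for k in range(n + 1)
                     if binomial_tail(n, 0.5, k) <= alpha)
    # Step 5: apply the decision rule to the observed count.
    return heads_observed >= threshold
```

For example, `right_sided_coin_test(10, 0.05, 7)` returns `False`: seven heads in ten tosses are not enough to reject H0 at the 5% level.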

## Coin Toss Example – Step by Step

Let’s revisit our coin toss example and follow through the steps.

### Step 1: Define H1 and H0

We want to demonstrate that the coin is biased.

Two formulations exist: we might solely be concerned with the coin favoring heads. If we believe the individual choosing heads is dishonest, we would expect a higher probability for heads, not considering that tails might also have a probability greater than 50%. In this case, we would set up the hypotheses as follows:

H0: p ≤ 0.5 versus H1: p > 0.5

This is termed one-sided hypothesis testing, specifically a right-sided test, as we reject H0 “on the right side.”

Conversely, if we act as neutral arbiters, we could hypothesize that either heads or tails might have a significantly higher probability than 50%:

H0: p = 0.5 versus H1: p ≠ 0.5

This leads us to a two-sided hypothesis, focusing solely on whether the coin is fair.

Given our inclination that the coin likely favors heads, let’s proceed with the one-sided hypotheses:

H0: p ≤ 0.5 versus H1: p > 0.5

### Step 2: Significance Level and Experiment

In step 2, we need to establish the significance level and the experiment details. Our introductory example lacked precision since we tossed the coin without predefining the number of trials.

Let’s decide to toss the coin ten times with α set at 0.05.

### Step 3: Decision Rule

Now, we create the decision rule. Remember, the core principle of hypothesis testing is to assume H0 is true. Thus, we will assume p = 0.5, indicating a fair coin.

The question then becomes: if H0 holds, what’s the probability of observing specific outcomes?

The distribution of the number X of heads from ten tosses follows a binomial distribution characterized by parameters n=10 and p=0.5.
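Concretely, the standard binomial formula gives the probability of exactly $k$ heads in ten fair tosses:

$$P(X = k) = \binom{10}{k}\left(\frac{1}{2}\right)^{k}\left(\frac{1}{2}\right)^{10-k} = \binom{10}{k}\left(\frac{1}{2}\right)^{10}$$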

(For a deeper dive into the binomial distribution, refer to this story here.)

We wish to determine the probabilities of getting k or fewer heads when tossing the coin ten times. Computing the binomial probabilities for n = 10 and p = 0.5 leads us to the following table:

| k | P(X = k) | P(X ≤ k) |
|---|----------|----------|
| 0 | 0.0010 | 0.0010 |
| 1 | 0.0098 | 0.0107 |
| 2 | 0.0439 | 0.0547 |
| 3 | 0.1172 | 0.1719 |
| 4 | 0.2051 | 0.3770 |
| 5 | 0.2461 | 0.6230 |
| 6 | 0.2051 | 0.8281 |
| 7 | 0.1172 | 0.9453 |
| 8 | 0.0439 | 0.9893 |
| 9 | 0.0098 | 0.9990 |
| 10 | 0.0010 | 1.0000 |
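These binomial probabilities can be computed with a few lines of Python, using only the standard library (the layout of the printout is my own):

```python
from math import comb

n = 10  # number of tosses; fair coin, so p = 0.5

# P(X = k) and P(X <= k) for X ~ Binomial(10, 0.5)
for k in range(n + 1):
    pmf = comb(n, k) * 0.5**n                              # P(X = k)
    cdf = sum(comb(n, i) for i in range(k + 1)) * 0.5**n   # P(X <= k)
    print(f"k = {k:2d}   P(X = k) = {pmf:.4f}   P(X <= k) = {cdf:.4f}")
```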

From this table, we derive our decision rule. We become suspicious if the number of heads is so high that the probability of observing this number or a more extreme result (under the assumption of a fair coin) is α or lower. From the table, P(X ≥ 9) = 1 − P(X ≤ 8) ≈ 1.07%, which is below 5%, whereas P(X ≥ 8) ≈ 5.47% still exceeds 5%. Thus, we should only become wary if we observe 9 heads or more.

Consequently, the decision rule stipulates:

> If 9 or more heads are observed, reject H0. At a significance level of 5% (a confidence level of 95%), we may conclude that the coin is rigged.

> If 8 or fewer heads are observed, we cannot reject H0, and we cannot statistically affirm that the coin is rigged.

### Step 4: Conduct the Experiment

Now, we proceed to toss the coin ten times:

*Heads, heads, tails, heads, tails, heads, tails, heads, heads, heads.*

In total, we observe 7 heads and 3 tails.

### Step 5: Conclusion from the Experiment

Reviewing the results from step 4 (7 out of 10 heads), we refer back to our decision rule: with this few heads, we cannot reject H0. Therefore, we cannot dismiss the notion that the coin is fair, nor can we prove it is rigged. Hence, we continue to treat the coin as fair.

Does this imply that the coin is truly fair? Not necessarily.

It merely indicates we lack sufficient evidence to assert otherwise. If the coin were fair, the observed outcome or something more extreme (7 or more heads) would still occur with probability P(X ≥ 7) ≈ 17.19%, far above the 5% threshold.
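The relevant tail probabilities can be checked in a few lines of Python (the helper name `tail` is my own):

```python
from math import comb

def tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(f"P(X >= 7) = {tail(10, 0.5, 7):.4f}")  # 0.1719
print(f"P(X >= 8) = {tail(10, 0.5, 8):.4f}")  # 0.0547
```

Under a fair coin, even 7 or more heads in ten tosses is far too common an outcome to justify an accusation of cheating.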

In hypothesis testing, it's crucial to remember that we never (statistically) prove H0; we can only (statistically) disprove it.

This is due to the inherent bias towards H0 from the outset. H0 is presumed valid until demonstrated otherwise (i.e., until rejected). Thus, if you harbor no bias initially, hypothesis testing may not align with your approach.

This also clarifies why, in step 3, we assume p = 0.5. Why is it acceptable to assume exactly p = 0.5 when H0 allows any p ≤ 0.5? Experimenting with different values in the binomial distribution, say p = 0.4, will reveal that high head counts become even less probable, so H0 would be rejected even less often. The boundary value p = 0.5 is the worst case within H0: if the decision rule keeps the error probability below α there, it does so for every p ≤ 0.5, thereby preserving the bias towards H0.
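This boundary-case argument can be checked numerically; a quick sketch (the helper name `tail` is my own):

```python
from math import comb

def tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of a suspiciously high head count under different true values of p:
for p in (0.5, 0.4, 0.3):
    print(f"p = {p}: P(X >= 8) = {tail(10, p, 8):.4f}, "
          f"P(X >= 9) = {tail(10, p, 9):.4f}")
```

The smaller the true p, the smaller these tail probabilities become, so p = 0.5 really is the hardest case the decision rule has to guard against.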

Many students, both in high school and university, often find hypothesis testing daunting. However, the core concept is relatively straightforward.

The most challenging aspects typically arise in step 3 and step 5. In step 3, identifying the underlying distribution (in this case, the binomial distribution) may prove tricky or ambiguous. In step 5, deriving the correct conclusions can also be complex.

Ultimately, hypothesis testing consistently follows the same five steps.

*If you found this article engaging, consider signing up for a medium membership through this link; I’ll receive a small commission.*