2021-05-05

About

This lecture is based on the Minitab Blog series.

The presentation is prepared with R Markdown.

Navigate the presentation with arrow keys. Press o to switch to the overview.

The source code is available on GitHub.

Why Do We Need Hypothesis Tests?

A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data.

  • What does statistical significance mean?
  • What is a p-value?
  • How to interpret confidence intervals?
  • How to test hypotheses?

Population vs Sample

How well does a sample statistic estimate an underlying population parameter?

Population

  • Let’s measure the mean fluorescence intensity (FI) of cells subject to a treatment.
  • The FI in single cells varies across the population and follows a distribution with \(\mu = 5\) (mean).

Sample

  • We’re interested in estimating the mean FI in the cell population.
  • In experiments we cannot access the true population mean unless we measure everything.
  • Sampling the reality is the next best thing we can do!
  • A sample consists of measurements of FI in individual cells.
  • A sample mean is the mean calculated from a sample.

Sample vs population statistic

  • Let’s measure FI in 6 different fields of view (FOV), with 10 cells in each FOV.
  • We calculate the mean FI for every FOV and obtain 6 sample means.
  • The mean FI from each FOV usually differs from the population mean (=5)!
  • Sample mean
    5.129893
    5.290907
    4.468820
    5.831213
    5.130247
    5.737346

  • For any random sample, the sample mean almost certainly doesn’t equal the true mean of the population due to sampling error.
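A minimal simulation sketch of this experiment, in Python. The population mean (=5) comes from the slides; the normal shape of the population and its SD of 1 are assumptions made only for illustration, and `sample_means` is a hypothetical helper name.

```python
import random
import statistics

# Population mean from the slides; the SD of 1 is an assumption for illustration.
POP_MEAN = 5.0
POP_SD = 1.0

def sample_means(n_fov=6, cells_per_fov=10, seed=1):
    """Return one mean FI per simulated field of view (FOV)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_fov):
        # Each FOV is a sample of single-cell FI values drawn from the population.
        cells = [rng.gauss(POP_MEAN, POP_SD) for _ in range(cells_per_fov)]
        means.append(statistics.mean(cells))
    return means

means = sample_means()
for i, m in enumerate(means, start=1):
    # The sampling error is the gap between a sample mean and the population mean.
    print(f"FOV {i}: mean FI = {m:.3f}, sampling error = {m - POP_MEAN:+.3f}")
```

Changing `seed` gives a different set of six sample means, none of which is expected to equal 5 exactly.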

Sampling distribution

  • Let’s sample our biological process multiple times (e.g. by recording multiple FOVs) but at different magnifications such that we have a different number of cells per FOV.
  • From each sample (=FOV) we calculate a sample mean FI.

Sampling distribution II

  • A sampling distribution is the distribution of a statistic, such as the mean, that is obtained by repeatedly drawing a large number of samples from a specific population.
  • This distribution allows you to determine the probability of obtaining a sample statistic within a given range of values.
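The idea can be sketched by brute force: draw many samples, compute each sample mean, and look at how the means spread out. This is only an illustration under assumed values (normal population, mean 5, SD 1); `sampling_distribution` is a hypothetical helper, and the sample sizes 5, 10 and 50 are arbitrary stand-ins for different magnifications.

```python
import random
import statistics

def sampling_distribution(n_cells, n_samples=5000, mu=5.0, sigma=1.0, seed=0):
    """Simulate many samples of n_cells cells and return their sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n_cells)]
        means.append(statistics.mean(sample))
    return means

spread = {}
for n_cells in (5, 10, 50):
    means = sampling_distribution(n_cells)
    spread[n_cells] = statistics.stdev(means)
    # Theory for a normal population: SD of the sample mean = sigma / sqrt(n).
    print(f"n = {n_cells:2d}: SD of sample means = {spread[n_cells]:.3f} "
          f"(theory: {1.0 / n_cells ** 0.5:.3f})")
```

The more cells per sample, the narrower the sampling distribution, so sample means from larger samples tend to land closer to the population mean.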

Sampling distribution III

  • Fortunately, we can create a plot of the distribution of sample means without collecting many different random samples!
  • Instead, we’ll create a probability distribution plot using the t-distribution.

Student’s t-distribution

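  • A normal distribution describes the values in a full population. A t-distribution describes the standardized mean of a sample drawn from that population when the population standard deviation is unknown and has to be estimated from the sample.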
  • The t-distribution differs for each sample size. The larger the sample, the more it resembles a normal distribution.
  • \[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]
  • Here, \(\bar{x}\) is the sample mean, \(s\) is the sample standard deviation, \(n\) is the sample size, and \(\mu\) is the hypothesized population mean.
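As a sketch, the formula translates directly into a few lines of Python. The function name `t_statistic` is hypothetical; the five FI values are the sample used later in this lecture, and \(\mu = 5\) is the population mean assumed throughout.

```python
import math
import statistics

def t_statistic(sample, mu):
    """One-sample t-statistic: (sample mean - mu) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD, with n - 1 in the denominator
    return (x_bar - mu) / (s / math.sqrt(n))

# Five single-cell FI measurements (the sample from the confidence interval slides).
fi_values = [4.273791, 5.064480, 4.419903, 3.749903, 5.279963]
t = t_statistic(fi_values, mu=5.0)
print(f"t = {t:.3f}")
```

A negative t means the sample mean lies below the hypothesized mean; its magnitude is the distance between the two, expressed in units of the standard error.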

Statistical significance

Significant difference

  • Let’s move to another well. This time we measured the FI of 5 cells, which yielded the mean 6.5.
  • Is the new sample mean significantly different from the population mean?
  • There is no magic place on the distribution to determine significant difference.
  • We have a continuous decrease in the probability of obtaining sample means that are farther from the population mean.
  • Where to draw the line?

Unusuality

How unusual is our new sample mean? Let’s set various thresholds for unusuality.

The shaded region indicates the probability of finding the sample mean FI.

Threshold for unusuality

  • If our threshold for unusuality is 10 or 5%, then the new sample mean (=6.5) IS unusual.
  • If our threshold for unusuality is 1%, then the new sample mean (=6.5) IS NOT unusual.

Hypothesis testing

Formulate hypotheses

This new measurement seems quite different from the population mean (=5).

  • Is it still the same treatment?
  • What are the chances of obtaining the sample mean (=6.5) given the population mean (=5)?
  • Let’s formulate hypotheses!
  • Null hypothesis: the population mean equals the hypothesized mean (=5)
  • Alternative hypothesis: the population mean differs from the hypothesized mean (=5)
  • By setting a threshold for unusuality we can reject (or not) the null hypothesis.

Test hypotheses

The thresholds for shaded regions determine how far away our sample statistic must be from the null hypothesis value before we can say it is unusual enough to reject the null hypothesis.

Significance level

  • These thresholds are the significance levels, \(\alpha\).
  • It is the probability of rejecting the null hypothesis when it is true.
  • It is the risk of concluding that a difference exists when there is no actual difference.
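This interpretation of \(\alpha\) can be checked with a small simulation: when the null hypothesis is true, a test at \(\alpha = 0.05\) should reject it in roughly 5% of experiments. The sketch below assumes a normal population with mean 5 and SD 1 (the SD is an assumption) and samples of 5 cells; `T_CRIT` is the tabulated two-sided 5% critical value of the t-distribution with 4 degrees of freedom, and `false_positive_rate` is a hypothetical helper.

```python
import random
import statistics

T_CRIT = 2.776445  # tabulated critical t for alpha = 0.05 (two-sided), df = 4

def false_positive_rate(n_experiments=20000, n_cells=5, mu=5.0, sigma=1.0, seed=0):
    """Fraction of experiments that reject a TRUE null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # The null hypothesis is true here: the population mean really is mu.
        sample = [rng.gauss(mu, sigma) for _ in range(n_cells)]
        se = statistics.stdev(sample) / n_cells ** 0.5
        t = (statistics.mean(sample) - mu) / se
        if abs(t) > T_CRIT:
            rejections += 1
    return rejections / n_experiments

rate = false_positive_rate()
print(f"Rejected a true null hypothesis in {rate:.1%} of experiments")
```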

P-value

  • We’ve tested the null hypothesis by looking at the location of our sample mean with respect to chosen significance levels.
  • How can we determine statistical significance at a chosen level without looking at the graph?
  • Let’s shade the probability of obtaining a sample mean that is at least as extreme as ours (=6.5), in both tails of the distribution (beyond \(5 \pm 1.5\)).

P-value II

  • This shaded area is the probability of obtaining a sample mean that is at least as extreme as our sample mean (=6.5), in either tail of the distribution, if the population mean is 5.
  • That’s our P value!
  • When a P value is less than or equal to the chosen significance level, we reject the null hypothesis.
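A rough sketch of how the shaded area could be computed without a plot: integrate the t-distribution density between \(-|t|\) and \(+|t|\) and subtract the result from 1. The function names are hypothetical, and the integration uses Simpson's rule rather than a statistics library. The slides do not state the SD of the 5-cell sample with mean 6.5, so the value 1.1 below is an assumption chosen only for illustration.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=10000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -|t| and +|t|
    return 1 - central_area

n, sample_mean, mu0 = 5, 6.5, 5.0
assumed_sd = 1.1  # NOT given in the slides; hypothetical value
t = (sample_mean - mu0) / (assumed_sd / math.sqrt(n))
p = two_sided_p_value(t, df=n - 1)
print(f"t = {t:.2f}, two-sided P value = {p:.3f}")
```

With this assumed SD the P value falls between 0.01 and 0.05, so the null hypothesis would be rejected at the 10% and 5% levels but not at the 1% level.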

Confidence intervals

Estimate population mean

  • Let’s try to estimate the population mean.
  • We sample once by measuring 5 cells.
  • Sample
    4.273791
    5.064480
    4.419903
    3.749903
    5.279963
  • Sample mean = 4.557608
  • Sample SD = 0.618591

Point estimate

  • The sample mean is the most likely value for the population mean given the information we have.
  • It is a point estimate, or a best guess, of an unknown population mean.
  • However, it would not be unusual to obtain different sample means if we drew other random samples from the same population.

Margin of error

  • To put a number on the uncertainty of the point estimate, let’s shade the inner 95% of the distribution.
  • 95% is an arbitrary but commonly accepted confidence level.
  • A specific CI represents the margin of error, or the amount of uncertainty, around the point estimate, which is our single sample mean.
  • A specific CI is a range of values that is likely to contain an unknown population parameter.
  • The confidence level is NOT the probability of the population mean being in THAT specific range!
  • The population mean is UNKNOWN but FIXED; it either is or isn’t in a CI.
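For the 5-cell sample above, a 95% CI could be computed as the sample mean plus or minus the margin of error \(t^{*} \cdot s / \sqrt{n}\). This is a sketch, not necessarily how the plots in this lecture were produced; `T_CRIT` is the tabulated 97.5th percentile of the t-distribution with \(n - 1 = 4\) degrees of freedom.

```python
import math
import statistics

T_CRIT = 2.776445  # tabulated t value for a 95% CI with 4 degrees of freedom

sample = [4.273791, 5.064480, 4.419903, 3.749903, 5.279963]
n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)

margin_of_error = T_CRIT * sd / math.sqrt(n)
lower, upper = mean - margin_of_error, mean + margin_of_error
print(f"Sample mean = {mean:.4f}, sample SD = {sd:.4f}")
print(f"95% CI = ({lower:.4f}, {upper:.4f})")
```

The resulting interval runs from about 3.79 to about 5.33, so it is wide for such a small sample.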

CI in hypothesis testing

  • If you draw a random sample many times (from the same population!), a certain percentage of confidence intervals will contain the population mean.
  • This percentage is the confidence level!
  • In our case, 95% of the confidence intervals constructed this way WILL contain the population mean.
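The repeated-sampling meaning of the confidence level can be illustrated with a simulation: build a 95% CI from each of many random samples and count how often the interval contains the population mean. The sketch assumes a normal population with mean 5 and SD 1 (the SD is an assumption) and 5 cells per sample; `coverage` is a hypothetical helper.

```python
import random
import statistics

T_CRIT = 2.776445  # tabulated t value for a 95% CI with 4 degrees of freedom

def coverage(n_repeats=10000, n_cells=5, mu=5.0, sigma=1.0, seed=0):
    """Fraction of simulated 95% CIs that contain the population mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n_cells)]
        mean = statistics.mean(sample)
        margin = T_CRIT * statistics.stdev(sample) / n_cells ** 0.5
        # Each interval either contains mu or it does not.
        if mean - margin <= mu <= mean + margin:
            hits += 1
    return hits / n_repeats

covered = coverage()
print(f"Fraction of 95% CIs containing the population mean: {covered:.3f}")
```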

Example of CI

From Khan Academy:

A baseball coach was curious about the true mean speed of fastball pitches in his league. The coach recorded the speed in kilometers per hour of each fastball in a random sample of 100 pitches and constructed a 95% confidence interval for the mean speed. The resulting interval was (110, 120).

Correct statement:

We’re 95% confident that the interval (110,120) captured the true mean pitch speed.

Incorrect statements:
  • There is a 95% probability that the true mean is within the (110, 120) interval.
  • About 95% of pitches in the sample were between 110 and 120.
  • If the coach took another sample of 100 pitches, there’s a 95% chance the sample mean would be between 110 and 120.

P values vs CI

  • You can use either P values or confidence intervals to determine whether your results are statistically significant.
  • If a hypothesis test produces both, these results will agree.
  • The confidence level is equivalent to \(1 - \alpha\).
  • If the P value is less than the significance level \(\alpha\), the hypothesis test is statistically significant.
  • If the confidence interval does not contain the null hypothesis value, the results are statistically significant.
  • If the P value is less than \(\alpha\), the confidence interval will not contain the null hypothesis value.
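A small check of this agreement, sketched for the 5-cell sample from the confidence interval slides with \(\alpha = 0.05\). Comparing \(|t|\) with the tabulated critical value `T_CRIT` is equivalent to comparing the P value with \(\alpha\). The null values 3.5, 4.0, 5.0 and 5.5 are hypothetical choices for illustration.

```python
import math
import statistics

T_CRIT = 2.776445  # tabulated two-sided 5% critical t value, 4 degrees of freedom

sample = [4.273791, 5.064480, 4.419903, 3.749903, 5.279963]
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))
ci = (mean - T_CRIT * se, mean + T_CRIT * se)  # 95% confidence interval

def decisions(mu0):
    """Return (reject by t-test, reject by CI) for the null value mu0."""
    reject_by_test = abs((mean - mu0) / se) > T_CRIT  # same as P value < 0.05
    reject_by_ci = not (ci[0] <= mu0 <= ci[1])        # null value outside the CI
    return reject_by_test, reject_by_ci

results = {mu0: decisions(mu0) for mu0 in (3.5, 4.0, 5.0, 5.5)}
for mu0, (by_test, by_ci) in results.items():
    print(f"mu0 = {mu0}: reject by test = {by_test}, reject by CI = {by_ci}")
```

Both criteria give the same decision for every null value, and the hypothesized mean of 5 is not rejected for this sample.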

P values vs CI II

The significance level defines the distance the sample mean must be from the null hypothesis to be considered statistically significant.

The confidence level represents the percentage of intervals that would include the population parameter if you took samples from the same population again and again.

Types of errors

Pitfalls

  • The P-value is not a “score”.
  • The P-value is NOT the probability that the null hypothesis is true.
  • It is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming that the null hypothesis is true.
  • The confidence level is NOT the probability that a specific CI contains the population parameter.
  • A CI is a range of values that is likely to contain an unknown population parameter.
  • If you draw a random sample many times, a certain percentage of CIs will contain the population parameter.
  • This percentage is the confidence level.
