courses:zsl:labs2024_01_intro_ttest [CourseWare Wiki]

Lab 01: Introduction to Matlab, hypothesis testing

Topics:

Introduction to Matlab
Hypothesis testing
Student's t-test
Multiple comparison, family-wise error rate

1 Introduction to Matlab

Matlab is available at CTU Download

As a warm-up exercise, we will try a few simple image transformations using Matlab.

Download the image by clicking on the image

Use imread function to load image into Matlab, examine how is the image represented in Matlab.

What are the dimensions of the image matrix?

What is the brightness value that represents the image background?

What is the average brightness value of the brain and the skull in the image? (Try using histogram of the image.)

Using simple thresholding of the brightness values

1) produce an image that is a binary mask of the skull in the image.

2) produce an image that is a binary mask of the brain in the image. Is it possible to select the brain only by thresholding the brightness?

2 Hypothesis testing

null hypothesis
alternative hypothesis
test statistic
critical value
significance level
p-value
confidence interval

2.1 One sample Student's t test:

One sample Student's t test can be used to test whether the sample mean is different from 0 (or a specific value).

The test statistic is $$ t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}}$$ where t is the the t statistic, $\bar{x}$ is the sample mean, $\mu_0$ is the assumed sample mean value (typically 0), $s$ is the standard deviation and $n$ is the number of samples.

The t statistic follows the t-distribution with $n - 1$ degrees of freedom.

The following code shows the computation of the t test on a random sample, for practical purposes it is simpler to just use ttest function.

% generate sample of random values taken from normal distribution with specific parameters
n = 47;
mu = -7;
sigma = 5;
 
x = sigma * randn([n 1]) + mu;
 
% histogram (empirical distribution of the sampled values)
hist(x)
 
% compute the test statistic
t_obs = mean(x)/(std(x)/sqrt(n));
 
% estimate the p-value
2*(min([cdf('T',t_obs,n-1), 1-cdf('T',t_obs,n-1)]))
 
% compare to the Matlab function for one sample Student's t test
[~, p] = ttest(x)
 
% confidence interval (normal approximation)
[mean(x) + norminv(0.025)*(std(x)/sqrt(n)) mean(x) + norminv(0.975)*(std(x)/sqrt(n))]

2.2 Two sample Student's t test with unequal variances (Welch's t test)

Two sample Welch's t test can be used to compare whether two samples come from the same distribution or whether their sample means significantly differ.

The test statistics is $$ t = \frac{\bar{x}_1 - \bar{x}_1}{ \sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} } } $$ where $\bar{x}_1$ and $\bar{x}_2$ are sample means, $s_1$ and $s_2$ are standard deviations and $n_1$ and $n_2$ are sample sizes of samples $x_1$ and $x_2$.

The t statistic follows the t-distribution, but in the case of the Welch's t test, the degrees of freedom are given by a rather complicated formula $$ df = \frac{(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2})^2}{ \frac{s_1^4}{n_1^2(n_1 - 1)} + \frac{s_2^4}{n_2^2(n_2 - 1)} }$$

The following code shows the computation of the t test on two random samples, for practical purposes it is simpler to just use ttest2 function.

n1 = 47;
n2 = 43;
mu1 = 7;
mu2 = 11;
sigma1 = 5;
sigma2 = 4;
 
% generate two random samples taken from normal distributions with different parameters
x1 = sigma1 * randn([n1 1]) + mu1;
x2 = sigma2 * randn([n2 1]) + mu2;
 
% compute the test statistics
t_obs = (mean(x1) - mean(x2))/sqrt((std(x1)/sqrt(n1))^2 + (std(x2)/sqrt(n2))^2);
 
% compute degrees of freedom
df = floor((var(x1)/n1 + var(x2)/n2)^2/(((var(x1)^2)/(n1^2*(n1-1))) + ((var(x2)^2)/(n2^2*(n2-1)))));
 
% estimate p-value
2*(min([cdf('T',t_obs,df), 1-cdf('T',t_obs,df)]))
 
% compare the value to the Matlab ttest2 function for two sample Welch's t test
[~, p] = ttest2(x1,x2)

3 Multiple comparison

The common problem in the statistical analysis of medical images, such fMRI experiments, is the dimensionality of the data. The complex processing of such data can have very unexpected consequences ( such as dead salmon reacting to pictures).

One the problems is the multiple comparison. When we perform multiple statistical tests and use the usual significance level 0.05 (1 in 20 is expected to be falsely indicated as statistically significant), the chance of the occurrence of type I errors increases.

The family-wise error rate measures the probability that at least one of a group of test is incorrectly denoted as significant.

When the statistical tests are independent, the probability can be easily worked out. Let say we perform $m = 20$ independent test and our statistical significance level is $\alpha = 0.05$, then the probability that one test is correctly denoted as not significant $$P(T_1 = 0) = 1 - \alpha$$, because the tests are independent then we can combine the probabilities for all test as a simple product $$ P(\forall T = 0) = (1 - \alpha)^m$$ And then the probability of at least one type I error is complementary to the probability of all correct test $$ P(\exists T = 1) = 1 - (1 - \alpha)^m = 1 - (1 - 0.05)^{20} = 0.642$$. So the chance of an error occurring is rather high.

There are many procedures for adjusting the significance level when performing multiple tests. The most simplest and the most common is the Bonferroni correction that works by dividing the significance level by the number of test $$\alpha_{adj} = \frac{\alpha}{m}$$. A little more precise but still quite simple procedure is the Šidák correction $$\alpha_{adj} = 1 - (1 - \alpha)^{\frac{1}{m}}$$. In practice we often want to leverage the dependencies between the test (such that tests on neighbouring voxels in the fMRI data are more likely to produce similar results) and more complex procedures are used.

The code bellow shows how the Bonferroni and Šidák correction correctly maintain the family-wise error rate at the desired significance level.

n = 20;
alpha = 0.05; % the usual significance level
 
alpha_adj1 = alpha/n; % adjusted significance level - bonferroni correction
alpha_adj2 = 1 - (1 - alpha)^(1/n); % adjusted significance level - Šidák correction
 
1 - (1-alpha)^n % probability of at least one error
 
1 - (1-alpha_adj1)^n % probability of at least one error when using Bonferroni correction
 
1 - (1-alpha_adj2)^n % probability of at least one error when using Šidák correction

XKCD Significance

Homework tasks

Homework 1 [1 pts]

Try to improve the binary mask for brain by smoothing the image by Gaussian filter. The Gaussian image filter can be obtained by fspecial function, the filtering is performed by the conv2 function. For example:

gaussfilt = fspecial('gaussian',15,5);
im_filt = conv2(gaussfilt,im);

Put your improved binary mask into the report and comment on the results.

Homework 2 [2 pts]

Implement a simpler variant of the two sample Student's t test as a function

function [h, p] = my_ttest2(x1, x2, alpha)

where x1 and x2 are the tested random samples, alpha is the significance level, h is the test decision (0 the sample means are equal, 1 the sample means differ at significance level alpha) and p is the p-value. The simpler variant uses the test statistic $$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$, the pooled standard deviation is defined as $$s_p = \sqrt{ \frac{ (n_1 - 1) s_1^2 + (n_2 - 1) s_2^2 } {n_1 + n_2 - 2} }$$ and the t statistic follows t-distribution with $n_1 + n_2 - 2$ degrees of freedom. This variant is appropriate when the two samples don't have the same sample size and their standard deviations are similar. Submit the implementation as an m-file alongside the report from the lab.

Homework 3 [2 pts]

Use data multidata.mat to examine the problem of multiple comparison with Student's t test.

The data contain three columns, first column indicates experimental condition (integers 1 to 5), the second and third columns contain certain measurement. The data could originate from an experiment where a test subject was viewing images with certain contents (1 - neutral, 2 - adventure, …) while the size of his/her pupil was measured by eye tracking device and his/her skin conductance by polygraph (the variables). In an evaluation of the experiment, we want to know whether the variables are different for certain experimental condition.

Use Student's t test to compare each condition against each other separately for both variables (var1| cond1 vs var1 | cond2, var1| cond1 vs var1 | cond3, var1| cond1 vs var1 | cond4, …, var1| cond2 vs var1 | cond3, var1| cond2 vs var1 | cond4, …, similar to the boxplots bellow). Compute the number of comparison performed in the evaluation of the experiment and choose such an adjusted significance threshold using Bonferroni correction so that the family-wise error rate 0.05 is maintained. How does the correction for multiple comparison affect the results (what is the difference between the case when no correction is used and the when a appropriate correction is used).

Summarise the test results in a table of p-values, one table for each variable and indicate which test show statistically significant results when not using a correction for multiple comparison and when using Bonferroni correction. Comment on the results.

The following code for one of the variables may help understand the procedure

% prepare matrix for storing the results
P = nan(5);
 
% going through all the meaningful comparisons
for i = 1:max(d(:,1))
    for j = (i+1):max(d(:,1))
 
        [~, P(i,j)] = ttest2(d(d(:,1) == i,2),d(d(:,1) == j,2));
 
    end
end
 
% the p-values
P
 
% significant results at 0.05
P < 0.05
 
% significant results at adjusted significance level
% P < ... TODO
 
% repeat for the second variable in d(:,3)