Conventional blood pressure treatment leads to stable values represented by the sample of 10 different patients shown below. Decide whether the treatment is adequate if the goal is to achieve a mean pressure of no more than 95.
cgroup <- c(90,95,67,120,89,92,100,82,79,85)
max_target_mean <- 95
# mean blood pressure is smaller than the desired value
mean(cgroup)
## [1] 89.9
The mean blood pressure calculated from the sample is smaller than the desired value. However, it is a random variable and we need to know how often it will be smaller than 95 when sampling repeatedly. We can proceed in two ways: employ hypothesis testing or calculate confidence intervals.
Before we do so, there is a couple of basic observations and assumptions: 1) we deal with a small sample (the most common small/large sample size threshold is 30), 2) we may assume that blood pressure is normally distributed (a commonly known medical fact), 3) the measurements are independent (see the design of the study mentioned above), 4) blood pressure variance in our population is unknown (and we will have to estimate it from the sample).
Consequently, we will approximate the distribution of sample means with t-distribution. We will plot it, run the hypothesis test and compute the 95% population mean confidence interval.
s_mean <- mean(cgroup)
s_sd <- sd(cgroup)
# t-distribution probability density function, n-1 degrees of freedom, the mean is set to match the null hypothesis, the standard deviation is estimated from the sample
# normal distribution added for illustration
t.values <- seq(max_target_mean-15,max_target_mean+15,0.2)
dft9 <- data.frame(t.values,dn=dnorm(t.values,mean=max_target_mean,sd=s_sd/sqrt(length(cgroup))),dt9=dlst(t.values,df=length(cgroup)-1,mu=max_target_mean,sigma=s_sd/sqrt(length(cgroup))))
# use the observed sample mean and standard deviation and plot
ggplot(dft9, aes(x = t.values)) +
geom_line(aes(y = dt9, color="t_dist")) +
geom_line(aes(y = dn, color="normal"), linetype="dotted") +
geom_vline(xintercept = mean(cgroup), colour="blue", linetype="dashed") +
geom_vline(xintercept = 95, colour="red", linetype="dashed") +
scale_color_manual(name = "distribution", values = c(t_dist = "blue", normal= "black")) +
labs(title="Probability density function",x="sample mean", y="P(sample mean)") +
geom_polygon(data = rbind(c(mean(cgroup),0),c(dft9[1,1],0),subset(dft9[c(1,3)],t.values<mean(cgroup))), aes(x=t.values, y=dt9), fill="#FFD2D5")
# The interpretation of density plot:
# the probability that the population mean could still be 95 is relatively high, it corresponds to the area under the left-hand tail
# let us estimate the area with one-sided t-test
t.test(x=cgroup,alternative="less",mu=95)
##
## One Sample t-test
##
## data: cgroup
## t = -1.1504, df = 9, p-value = 0.1398
## alternative hypothesis: true mean is less than 95
## 95 percent confidence interval:
## -Inf 98.0268
## sample estimates:
## mean of x
## 89.9
# The interpretation of t-test:
# H0: the population mean is 95, Ha: the population mean is smaller than 95
# the p-value of 0.14 suggests that there is 14% probability that the sample as ours may appear by chance if H0 holds, we cannot reject the null hypothesis in favor of the alternative one
# the 95% confidence interval is from negative infinity to 98, the true value of population mean will be captured in 95% of trials like this, the critical value of 95 is there ... not really unlikely that the population mean is 95 or higher
# Let us reach the same outcomes slower:
m_sd <- s_sd/sqrt(length(cgroup))
t.value <- (s_mean-95)/m_sd
pt(t.value,df=9,lower.tail=T) # use distribution function to obtain p-value
## [1] 0.1398183
# the distribution function of t-distribution with 9 degrees of freedom
dftd9 <- data.frame(t.vals=seq(-5,5,0.1),dtd9=pt(seq(-5,5,0.1),df=length(cgroup)-1))
# use the observed sample mean and standard deviation and plot
ggplot(dftd9, aes(x = t.vals, y = dtd9)) +
geom_line(col="blue") +
geom_vline(aes(xintercept = t.value, color="observed"), linetype="dashed") +
geom_vline(aes(xintercept = qt(0.05,df=9,lower.tail=T), color="needed_to_reject"), linetype="dashed") +
geom_hline(aes(yintercept = pt(t.value,df=9,lower.tail=T)), color="blue", linetype="dashed") +
geom_hline(aes(yintercept = 0.05), color="red", linetype="dashed") +
scale_color_manual(name = "t-value", values = c(observed = "blue", needed_to_reject= "red")) +
labs(title="Cumulative distribution function",x="t value", y="P(T<=t)")
# the upper bound of the confidence interval
s_mean+qt(0.95,df=9)*m_sd
## [1] 98.0268
Literally, we employed one-sample t-test whose formula is: \[t=\frac{\bar{x}-\mu}{s/\sqrt{n}}=\frac{89.9-95}{14.02/\sqrt{10}}=-1.15\] where \(\bar{x}\) is the sample mean, \(\mu\) is the (desired) population mean, \(s\) is the sample standard deviation and \(n\) is the sample size. In terms of hypothesis testing, the value of t-statistic is compared with the quantiles of t-distribution with \(n-1\) degrees of freedom. The p-value is 14%. If we choose a common level of significance \(\alpha=0.05\) we can see that \(p>\alpha\) and we cannot reject the null hypothesis. Statistically, we failed to confirm the initial goal.
The one-tailed 95% confidence interval will be: \[[-\infty,89.9+t_{1-\alpha,n-1}\times s/\sqrt{n}]=[-\infty,89.9+t_{0.95,9}\times 14.02/\sqrt{10}]=[-\infty,98.03]\] The observation that the 95% confidence interval contains the blood pressure of 95, our above-defined threshold, leads us to the same conclusion as with the hypothesis testing. The sample does not provide sufficient information to demonstrate the desired effect of the drug. In practice, a larger sample would probably be taken now. However, … let a new blood pressure drug appear in the meantime. Physicians found a new treatment group of 10 people that received the new drug. The drug is expected to lower blood pressure more efficiently. After treatment for a couple of months (exactly the same procedure as in the previous conventional group), we have to decide whether the new drug works better than the conventional treatment.
Obviously, we will use two sample t-test now. We will additionally assume that the blood pressure variance is equal in both the groups (a reasonable assumption), this helps us to decide which type of t-test to use.
tgroup <- c(71,79,69,98,91,85,89,75,78,80)
mean(tgroup)
## [1] 81.5
sd(tgroup)
## [1] 9.192388
df <- data.frame(group=c(rep("conventional",10),rep("new",10)),bp=c(cgroup,tgroup))
# Draw overlaying histogram
ggplot(df, aes(x = bp, fill = group)) +
geom_histogram(position = "identity", alpha = 0.2, bins = 5)
# The interpretation of histogram:
# the new treatment looks promising, but we need to decide it formally
# probability density function of t-distribution with n-1 degrees of freedom, the given mean and standard deviation
sn_mean <- mean(tgroup)
sn_sd <- sd(tgroup)
t.values <- seq(min(s_mean,sn_mean)-15,max(s_mean,sn_mean)+15,0.2)
dft9n <- data.frame(t.values,dt9=dlst(t.values,df=length(cgroup)-1,mu=s_mean,sigma=s_sd/sqrt(length(cgroup))),dt9n=dlst(t.values,df=length(tgroup)-1,mu=sn_mean,sigma=sn_sd/sqrt(length(tgroup))))
# use the observed sample mean and standard deviation and plot t-distributions
ggplot(dft9n, aes(x = t.values)) +
geom_line(aes(y = dt9), color="red") +
geom_line(aes(y = dt9n), color="blue") +
labs(title="Probability density function",x="sample mean", y="P(sample mean)")
# The interpretation of plot:
# the new treatment has a large chance to overcome the conventional one, however, as the mean pdfs overlap the relationship between the treatments can still be opposite
# The interpretation of t-test:
# H0: the population means are equal, Ha: the new drug population mean is smaller than the conventional drug population mean
# the p-value of 0.065 suggests that there is around 7% probability that the samples as ours may appear by chance if H0 holds, we cannot reject the null hypothesis in favor of the alternative one at the level of significance 0.05
t.test(x=cgroup,y=tgroup,var.equal=T,alternative="greater")
##
## Two Sample t-test
##
## data: cgroup and tgroup
## t = 1.5845, df = 18, p-value = 0.06525
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -0.7928998 Inf
## sample estimates:
## mean of x mean of y
## 89.9 81.5
# The interpretation of t-test:
# H0: the population mean is 95, Ha: the population mean is smaller than 95
# the p-value of 0.0006 suggests that there is only very small probability that the sample as ours may appear by chance if H0 holds, we reject the null hypothesis in favor of the alternative one
t.test(x=tgroup,alternative="less",mu=95)
##
## One Sample t-test
##
## data: tgroup
## t = -4.6441, df = 9, p-value = 0.0006061
## alternative hypothesis: true mean is less than 95
## 95 percent confidence interval:
## -Inf 86.82865
## sample estimates:
## mean of x
## 81.5
Technically, we used a two-sample t-test assuming equal variance in both populations. Its formula is: \[df=n_1+n_2-2=18\;\;\; s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{df}=\frac{9\times 14.02^2+9\times 9.19^2}{18}=140.5\] \[t=\frac{\bar{x_1}-\bar{x_2}}{\sqrt{s_p^2(\frac{1}{n_1}+\frac{1}{n_2})}}=\frac{89.9-81.5}{\sqrt{\frac{140.5\times 2}{10}}}=1.58\] where the index 1 refers to the conventional sample, the index 2 refers to the new drug sample and \(s_p\) is the pooled sample standard deviation. The p-value is around 7%, we cannot reject the null hypothesis that both treatments give the same mean blood pressure. However, the original requirement on the mean population blood pressure was (statistically) met with the new drug! The test formula was the same as the one sample t-test formula used earlier.
Further questions and tasks:
Now let us assume that both the blood pressure populations are known. In particular, they are normally distributed, the mean blood pressure in the conventional group as well as the new drug group is 90, the equal standard deviations of 12 can be observed in both the populations. In other words, \(\mu_1=\mu_2=90\), \(\sigma_1=\sigma_2=12\). Let us generate a large number of small samples from both the populations and see how well the t-test works.
mu_c <- 90
mu_n <- 90
sigma_c <- 12
sigma_n <- 12
## sample size and the number of repeats
sample_size <- 10
reps <- 10000
## generate the samples and run the tests
# is mean in cgroup <95?
tcs <- c()
pcs <- c()
# is mean in tgroup <95?
tns <- c()
pns <- c()
# compare cgroup and tgroup
tcns <- c()
pcns <- c()
for (rep in seq(reps)){
## generate two samples with the random normal generator
cgroup <- rnorm(sample_size,mean=mu_c,sd=sigma_c)
tgroup <- rnorm(sample_size,mean=mu_n,sd=sigma_n)
## compare groups with 95
## t-test statistic
tcs[rep] <- (mean(cgroup)-95)/(sd(cgroup)/sqrt(sample_size))
tns[rep] <- (mean(tgroup)-95)/(sd(tgroup)/sqrt(sample_size))
pcs[rep] <- t.test(x=cgroup,alternative="less",mu=95)$p.value
pns[rep] <- t.test(x=tgroup,alternative="less",mu=95)$p.value
## do the same with built in t-test (remember only p-values), run two-tailed test this time
pcns[rep] <- t.test(cgroup,tgroup,alternative = "two.sided",var.equal = TRUE,conf.level = 0.95)$p.value
## compare cgroup and tgroup
## t-test statistic
tcns[rep] <- (mean(cgroup)-mean(tgroup))/(sqrt((sd(cgroup)^2+sd(tgroup)^2)/2)*sqrt(2/sample_size))
## do the same with built in t-test (remember only p-values), run two-tailed test this time
pcns[rep] <- t.test(cgroup,tgroup,alternative = "two.sided",var.equal = TRUE,conf.level = 0.95)$p.value
}
We did two types of t-tests. The first one verifies whether the treatments achieve the mean blood pressure of no more than 95. We run the same test independently for both the treatments. Since we know that \(\mu_1=\mu_2=90\) which is clearly less than 95, we also know that the null hypotheses \(\mu_1=95\) and \(\mu_2=95\) should definitely be rejected against their alternatives \(\mu_1<95\) and \(\mu_2<95\). Whenever we failed to reject the null hypotheses we made Type II error. The power of t-test is the probability that we correctly reject the null hypotheses.
## how often will we correctly reject the null hypothesis?
## the power of test
## can be reached in two ways, to compare the p-values with alpha or the t-statistics with the corresponding t-distribution quantile
## the conventional group
thres_05 <- qt(0.05,df=sample_size-1)
mean(tcs<thres_05)
## [1] 0.3349
mean(pcs<0.05)
## [1] 0.3349
## the new drug group
mean(tns<thres_05)
## [1] 0.3345
mean(pns<0.05)
## [1] 0.3345
Under the given experimental setting, the power of tests for both the treatments is only a bit more than 30%. As a rule of thumb, statistical experiments should be designed to have a power of around 80%. We will return to this issue later.
The second type of t-tests verifies whether the treatments match in their mean blood pressures. Since we know that \(\mu_1=\mu_2=90\), we also know that the null hypotheses \(\mu_1=\mu_2\) should not be rejected against its alternative \(\mu_1\neq\mu_2\). Whenever we rejected the null hypothesis we made Type I error.
## how often will we incorrectly reject the null hypothesis about the identity of population means?
## Tyoe I error, false positive decisions
thres_05 <- qt(0.975,df=2*sample_size-2)
mean(abs(tcns)>thres_05)
## [1] 0.0508
mean(pcns<0.05)
## [1] 0.0508
The Type I error should match the selected significance level \(\alpha=0.05\). Quite as expected, we see that the null hypothesis was rejected in around 5% of the tests.
Both the observations can be reinforced with the probability density plots. In the first plot, the power of test corresponds to the area under left-hand tail. In the second plot, the Type I error corresponds to both the tails.
## plot the t-statistic distribution for the first two tests
dft <- data.frame(ts=c(tcs,tns),group=c(rep("c",reps),rep("t",reps)))
ggplot(NULL) +
geom_density(data=dft,aes(x=ts,color=group)) +
geom_line(data=data.frame(t.vals=seq(-5,5,0.1)+mean(dft$ts),dtd9=dt(seq(-5,5,0.1),df=sample_size-1)),aes(x=t.vals,y=dtd9),linetype="dotted") +
geom_vline(aes(xintercept = qt(0.05,df=sample_size-1)),linetype="dashed")
# The interpretation of density plot:
# we know the populations, thus we know that the null hypothesis H0: mean=95 does not hold
# the null hypothesis is correctly rejected if the value of t-statistic is smaller than -1.83
# this holds only in about one third of samples
## plot the t-statistic distribution for the comparison between treatments
ggplot(NULL) +
geom_density(data=data.frame(tcns=tcns),aes(x=tcns),color="blue") +
geom_vline(aes(xintercept = qt(0.025,df=2*sample_size-2)),linetype="dashed",color="blue") +
geom_vline(aes(xintercept = qt(0.975,df=2*sample_size-2)),linetype="dashed",color="blue") +
geom_line(data=data.frame(t.vals=seq(-5,5,0.1),dtd9=dt(seq(-5,5,0.1),df=2*sample_size-2)),aes(x=t.vals,y=dtd9),linetype="dotted")
# The interpretation of density plot:
# we know the populations, thus we know that the null hypothesis H0: means in both the groups are equal holds, Ha: means in both the groups differ does not hold
# the null hypothesis is incorrectly rejected if the value of t-statistic is smaller than -2.1 or larger than 2.1
# this condition is met in approximately 5% of cases which perfectly matches with the selected significance level alpha 0.05
# the dotted t-distribution with 18 degrees of freedom shows that the emprically generated t-statistics follow the assumed distribution
Further questions and tasks: