The example demonstrates that MANOVA has higher power than a sequence of separate ANOVA tests. It works with artificial data motivated by a mechanical engineering setting.
There is a categorical independent variable “alloy type” and two continuous dependent variables, “flexibility” and “strength”. Flexibility is normally distributed and independent of alloy type. Strength is correlated with flexibility, and alloy C shows increased strength. Consequently, the null hypothesis that mean flexibility and strength do not change with alloy type should be rejected in this task.
MANOVA detects the influence of alloy type on strength earlier than ANOVA, i.e., it needs fewer observations to detect a weaker dependency.
library(gplots)
library(plyr)
There are 3 alloy types with 30 samples per type. The magnitude of alloy C's influence on strength is the main parameter here (the default value is 0.2).
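The data are generated randomly, so individual numbers below will vary from run to run; to reproduce a particular run one may fix the seed before generating the data (the value 42 is an arbitrary choice):
set.seed(42) # arbitrary seed, only needed for reproducibility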
alloy<-factor(c(rep("A",30),rep("B",30),rep("C",30)))
flexibility<-rnorm(90,10,1)
strength<-0.5*flexibility+0.2*rnorm(90,1,1)+0.2*as.numeric(alloy=="C")
alloys<-data.frame(alloy,flexibility,strength)
The within-group correlations should be larger than the pooled correlation across groups, because the pooled strength values are additionally influenced by alloy type.
cor.test(flexibility,strength)
Pearson's product-moment correlation
data: flexibility and strength
t = 19.674, df = 88, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8554594 0.9349605
sample estimates:
cor
0.9026405
cov(alloys[-1])
flexibility strength
flexibility 0.8344222 0.4232252
strength 0.4232252 0.2634679
cor.test(flexibility[alloy=="A"],strength[alloy=="A"])
Pearson's product-moment correlation
data: flexibility[alloy == "A"] and strength[alloy == "A"]
t = 16.796, df = 28, p-value = 3.739e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.9042260 0.9779961
sample estimates:
cor
0.9537855
cor.test(flexibility[alloy=="B"],strength[alloy=="B"])
Pearson's product-moment correlation
data: flexibility[alloy == "B"] and strength[alloy == "B"]
t = 11.13, df = 28, p-value = 8.609e-12
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8046714 0.9532407
sample estimates:
cor
0.9031259
cor.test(flexibility[alloy=="C"],strength[alloy=="C"])
Pearson's product-moment correlation
data: flexibility[alloy == "C"] and strength[alloy == "C"]
t = 8.6359, df = 28, p-value = 2.207e-09
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7107248 0.9278960
sample estimates:
cor
0.8526667
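A compact, equivalent way to put the pooled and the three within-group correlations side by side (the same quantities as the cor.test calls above):
c(pooled=cor(flexibility,strength),sapply(split(alloys[-1],alloys$alloy),function(g){cor(g$flexibility,g$strength)}))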
plotmeans(flexibility~alloy,alloys,ylim=c(9.2,10.8),ylab="flexibility, strength",yaxt="n",col="blue")
plotmeans((strength+5)~alloy,alloys,add=TRUE,barcol="red",col="red")
legend("topleft",legend=c("flexibility","strength"),col=c("blue","red"),pch=21)
plot(flexibility,strength,col=alloy,pch=19)
abline(lm(strength~flexibility,subset=alloy=="A",data=alloys),col="black",lty=3)
abline(lm(strength~flexibility,subset=alloy=="B",data=alloys),col="red",lty=3)
abline(lm(strength~flexibility,subset=alloy=="C",data=alloys),col="green",lty=3)
legend("topleft",legend=levels(alloys$alloy),col=c("black","red","green"),pch=19)
MANOVA tends to reject the null hypothesis while the individual ANOVAs do not (the actual result depends on the random data generation, and the parameter setting influences it as well; the simulation sketch after the MANOVA output below illustrates this).
summary(aov(flexibility~alloy,alloys))
Df Sum Sq Mean Sq F value Pr(>F)
alloy 2 0.56 0.2818 0.333 0.718
Residuals 87 73.70 0.8471
summary(aov(strength~alloy,alloys))
Df Sum Sq Mean Sq F value Pr(>F)
alloy 2 1.555 0.7773 3.089 0.0506 .
Residuals 87 21.894 0.2517
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
summary(manova(cbind(flexibility,strength) ~ alloy, data = alloys),test="Wilks")
Df Wilks approx F num Df den Df Pr(>F)
alloy 2 0.8179 4.5465 4 172 0.001626 **
Residuals 87
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
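Since a single draw is noisy, the power claim can also be checked by repeating the data generation. The sketch below is not part of the original exercise; the function name power_sim, the number of repetitions and the 5% significance level are arbitrary choices. It regenerates the data many times and records how often the ANOVA on strength and the MANOVA reject; the MANOVA rejection rate should come out higher.
power_sim<-function(n_sim=500,effect=0.2,alpha=0.05){
  rejections<-replicate(n_sim,{
    alloy<-factor(c(rep("A",30),rep("B",30),rep("C",30)))
    flexibility<-rnorm(90,10,1)
    strength<-0.5*flexibility+0.2*rnorm(90,1,1)+effect*as.numeric(alloy=="C")
    p_anova<-summary(aov(strength~alloy))[[1]][["Pr(>F)"]][1] # p-value of the univariate test on strength
    p_manova<-summary(manova(cbind(flexibility,strength)~alloy),test="Wilks")$stats[1,"Pr(>F)"] # Wilks p-value
    c(anova=p_anova<alpha,manova=p_manova<alpha)
  })
  rowMeans(rejections) # empirical rejection rates of the two tests
}
power_sim()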
This piece of code applies the formulas from the lecture. It shows that they match the definitions of the total and error cross-product matrices T and E (and, after normalization, the covariance matrix above). It also demonstrates how Wilks' lambda gets calculated and statistically evaluated.
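For reference, the quantities computed below are, in standard notation (N=90 observations, p=2 dependent variables, k=3 groups; the lecture's notation may differ slightly):
\[ T=\sum_{i=1}^{N}(y_i-\bar y)(y_i-\bar y)^\top,\qquad E=\sum_{g=1}^{k}\sum_{i\in g}(y_{gi}-\bar y_g)(y_{gi}-\bar y_g)^\top,\qquad \Lambda=\frac{\det E}{\det T},\qquad -\Bigl(N-1-\frac{p+k}{2}\Bigr)\ln\Lambda\;\approx\;\chi^2_{p(k-1)}. \]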
cross_prod_mat<-function(df,vec){cp<-alply(df,1,function(x){t(x-vec)%*%as.matrix(x-vec)});return(Reduce('+',cp))} # sums (x-vec)'(x-vec) over the rows x of df
cross_prod_mat<-function(df,vec){cp<-alply(df,1,function(x){as.numeric(x-vec)%o%as.numeric(x-vec)});return(Reduce('+',cp))} # the same via outer product %o%
y_grand<-colMeans(alloys[-1])
T<-cross_prod_mat(alloys[-1],y_grand)
T/(nrow(alloys)-1) # after normalization identical with the covariance matrix above
[,1] [,2]
[1,] 0.8344222 0.4232252
[2,] 0.4232252 0.2634679
alloys.g<-split(alloys[-1],alloys$alloy)
y_group<-do.call(cbind,apply(alloys[-1],2,function(x){aggregate(x~alloy,FUN=mean)}))[c(2,4)]
E=mapply(cross_prod_mat,df=alloys.g,vec=as.list(as.data.frame(t(y_group))),SIMPLIFY=F)
E<-Reduce('+',E)
lambda<-det(E)/det(T)
lambda # matches the manova Wilks stat
[1] 0.8178997
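As an optional cross-check (not part of the original code), the between-group matrix H can be recovered as T-E, which gives the equivalent textbook form of Wilks' lambda:
H<-T-E # between-group (hypothesis) cross-product matrix
det(E)/det(E+H) # equals det(E)/det(T), i.e. the same Wilks lambda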
bartlett<--(nrow(alloys)-1-(ncol(alloys[-1])+length(levels(alloy)))/2)*log(lambda) # Bartlett's approximation
1-pchisq(bartlett,ncol(alloys[-1])*(length(levels(alloy))-1)) # p-value from the chi-squared approximation, matches the manova p-value again
[1] 0.001624712