===== Lab 03 : Statistical significance tests with MATLAB =====

Topics:
  * t-test

==== HW03 Assignment ====

For the homework HW03 tasks, follow the emphasized links to get to the detailed description.

  * **[5.0 pts]** - [[#(A) t-test]], Solve the tasks specified in the detailed description.
  * **[2.0 pts]** - [[#(B) bonus]], Solve the tasks specified in the detailed description.

Upload your report in PDF format, together with your code zipped in a folder, to [[https://cw.felk.cvut.cz/brute/student/course/962|BRUTE]] under the assignment ''L03-Matlab: Statistics''. Please name the PDF file as ''hw03_.pdf''. Include the MATLAB code snippets in the report or upload them separately.

==== Statistical tests Introduction ====

===== t-test: =====

Let the vectors $x$ and $y$ hold the exam grades of two students. Test the null hypothesis that the pairwise difference between the data vectors $x$ and $y$ has a mean equal to zero at a significance level of $\alpha=0.01$.

**Hints**

Use the paired-sample t-test as shown in the following snippet, where $x$ and $y$ are the two samples and $\alpha=0.01$:

''[h,p] = ttest(x,y,'Alpha',0.01)''

The value h = 1 (equivalently, p < 0.01) indicates that the t-test **rejects the null hypothesis** at the 1% significance level.\\
Use the ''ttest2()'' function for unpaired (independent) samples from normal distributions with equal or unequal variances; for the unequal case, call ''ttest2(x,y,'Vartype','unequal')''.\\
Follow the link for the various conditions of the t-test: [[https://www.investopedia.com/terms/t/t-test.asp]] \\
To generate $n$ samples from a normal distribution with a given mean ($m$) and standard deviation ($s$), use\\
''normrnd(m, s, [1,n])''

/*
===== F-test: =====

Test the null hypothesis that the data in x and y come from distributions with the same variance.

[h,p,ci,stats] = vartest2(x,y)

The returned result h = 1 indicates that vartest2 rejects the null hypothesis at the default 5% significance level.
*/

===== (A) t-test =====

Generate a signal with $n=30$ samples, mean 0.0 and standard deviation 1.0, and noise with $m$ samples, drawn from normal distributions with the following parameters:\\
1. mean = 0.5, std. dev. = 1.0, $m=30$ \\
2. mean = 0.5, std. dev. = 2.0, $m=30$ \\
3. mean = 0.5, std. dev. = 2.0, $m=20$ \\
For each case, use the appropriate t-test and determine whether the signal can be distinguished from the noise at a significance level of 0.05.\\
hint: Null hypothesis $H_0$: there is no difference between the means of the signal and the noise.

===== (B) bonus =====

/*
A dataset contains 480 ceramic strength measurements for 2 batches of material as follows:
number of observations: 240 each, means 688.99 and 611.1559, std. deviations 65 and 61 respectively.
Consider the default significance level $\alpha=0.05$.
We are testing the null hypothesis that the variances of the 2 batches are equal: \\
$$H_0: \sigma_1^2=\sigma_2^2$$ \\
tip: Find the test statistic $F=\frac{s_1^2}{s_2^2}$ ($s_1>s_2$). Use the Matlab functions finv(1-$\alpha$/2,n-1,n-1) and finv($\alpha$/2,n-1,n-1) and decide, based on these values, whether to reject the null hypothesis.
*/

We want to determine whether, on average, class B students score 15 marks more than class A students in an exam. We have no information about the variances of the two classes. To perform a t-test, we randomly collect the data of 10 students from each class. We choose the significance level $\alpha=0.05$ as the criterion for hypothesis testing.\\
Marks of 10 randomly selected students:\\
Class A: 587, 602, 627, 610, 619, 622, 605, 608, 596, 592\\
Class B: 626, 643, 647, 634, 630, 649, 629, 623, 617, 607\\
hint: The null hypothesis: $\mu_{B}-\mu_{A}\leq15$\\
Find $t$ using the following formula:
$$ t=\frac{(\bar{x}_B-\bar{x}_A)-(\mu_B-\mu_A)}{\sqrt{\frac{s_B^2}{n_B} + \frac{s_A^2}{n_A}}}$$\\
where $\bar{x}$ and $s$ are the sample means and standard deviations.
$\mu$ are the population means and $n$ are the numbers of students picked randomly from each class.\\
To compute the one-tailed p-value, use the MATLAB function\\
''p = 1 - tcdf(t,df)'' with df = degrees of freedom = (total number of randomly chosen students - 2).\\
Also check your results using the t-test functions given above.
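
The formula and the p-value step above can be sketched in MATLAB as follows. This is only a minimal illustration, not the reference solution: the vectors ''a'' and ''b'' and the hypothesized difference ''d0'' are made-up values chosen for the example (they are not the class marks from the task), and the cross-check with ''ttest2()'' assumes the Statistics and Machine Learning Toolbox is available.

```matlab
% Hypothetical illustration data (NOT the class marks from the task)
a  = [10 12 11 13  9 12];    % sample from group A
b  = [15 14 16 13 17 15];    % sample from group B
d0 = 2;                      % hypothesized difference mu_B - mu_A under H0

nA = numel(a);
nB = numel(b);

% t statistic following the formula above; var() uses the (n-1) normalization
t  = ((mean(b) - mean(a)) - d0) / sqrt(var(b)/nB + var(a)/nA);

df = nA + nB - 2;            % degrees of freedom as defined in the task
p  = 1 - tcdf(t, df);        % one-tailed (right-tail) p-value

% Cross-check: subtracting d0 from b turns the test into a zero-difference,
% right-tailed two-sample t-test
[h, p2] = ttest2(b - d0, a, 'Tail', 'right', 'Alpha', 0.05);

fprintf('t = %.4f, p = %.4f, ttest2 p = %.4f, h = %d\n', t, p, p2, h);
```

The two p-values agree in this sketch because both groups have the same size. With unequal group sizes, the default pooled-variance ''ttest2()'' uses a different standard error than the formula above, so the values would generally differ.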