For the final assignment, you will form groups, find a suitable dataset (or several), and perform a statistical analysis to answer a complex question using available data. Ideally, start by posing the question and then look for appropriate datasets (online or offline).
The goal is for you to acquire an understanding of the whole statistical process from the very beginning to the very end. You should understand that, as statisticians, you will be the ones to formalize real-world questions, that you will never achieve clear and perfect results, and that there will always be things out of your control. Nevertheless, you have to do your best to apply formal methods to real and important problems and to convince both expert and lay audiences of your conclusions.
Conceptually, you should go through the following steps to complete the assignment:
Throughout the process, you will have several checkpoints so that you are not left on your own and receive feedback along the way. The first checkpoint is finding your question (step 1), where we will try to calibrate the difficulty of the assignment with you. The second checkpoint is formulating a plan (step 2), which is a substantial part of your work and which you will turn in as a standalone deliverable. Following this, you should have a team consultation with a tutor. The third checkpoint is your report (step 7), which will be read and reviewed by another team of your peers. You will be asked to review another team's report in turn. Finally, with the feedback on your report, you will prepare a final presentation (step 8).
Apart from going through the whole statistical process yourself, we want you to try applying techniques from SAN, so think a bit about what you have learned. Beyond the obvious methods such as linear models and classifiers, you can try power analysis to judge how much data you need, outlier analysis to see whether some data might be wrong, robust methods to deal with noisy information, etc.
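As an illustration of the power analysis mentioned above, here is a minimal Python sketch. It is not part of the course materials and rests on assumptions: a two-sided comparison of two group means, equal group sizes, and the normal approximation to the test statistic. The function name `sample_size_per_group` and the default values for `alpha` and `power` are chosen for this example only.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size (Cohen's d). Assumption: equal
    group sizes and a test statistic that is roughly normal.
    """
    if effect_size == 0:
        raise ValueError("effect_size must be non-zero")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes, for illustration only
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: about {sample_size_per_group(d)} observations per group")
```

An exact calculation based on the t distribution gives slightly larger numbers, so treat the result as a lower bound when judging whether a candidate dataset is big enough for your question.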
Some examples of suitable types of questions for this assignment are:
Some ideas for where to look for topic inspiration:
If you have no idea or personal interest to start from, the following places offer many interesting datasets:
Cybersecurity datasets (projects supervised by Tomas Pevny, often time-series oriented):
We encourage students to be creative. Alternative ways to pass the assignment are possible as long as the ideas of the assignment are preserved. For example, participation of the whole team in a statistics-oriented challenge of the HackHealth hackathon would fulfil the requirements. Note: the brief challenge descriptions make it hard to tell in advance how statistics-oriented a challenge will really be, so talk to us about your vision; you would still have to show that your work was in the spirit of this assignment.
Students should form teams of 4 people (or 3 when the number of students in the class is not divisible by 4). The team organizes the work among its members and reports contributions, including each member's percentage share of the work, as part of the individual work items. The expected workload is about 20 hours per person, so up to 80 hours per team, which is enough for a really nice piece of work. Make it count!
The zeroth submission we want from you (see Section Steps) is a text file with a few sentences stating your general question, so that we can quickly give you feedback before you start working on your plan. The rest of the outcomes should be submitted as PDF documents, as follows:
After the presentation, you will report in BRUTE the amount of work per member in the whole project, quantified in percentages. This will act as a basis for the assignment of points.
Generally, you can allocate your time as you wish. We have written down an expected time allocation that you can take as inspiration. The planning part is so large because we expect you might want to inspect your dataset a bit, so some data manipulation may already take place at this stage.
The best one to three projects will be selected.
See the best of 2023/2024.
Here is an example topic and how to approach the creation of your plan.