
Weeks 1-7

	Min Points	Max Points
Total	10	25

To pass the semestral project you need to get at least 10 points.
Deadline 30.04.2026 23:59

Download the project template: mlm-semestral-project-weeks-1-7.zip
Please submit a zip file to Brute containing the filled-in Jupyter Notebook with your code/solution. Make sure that:

the notebook is runnable
the results can be replicated
the final cell outputs (that you want to present to us) are saved/visible
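
One common way to make results replicable is to fix the random seeds in the first cell of the notebook. The following is a minimal sketch, not part of the project template: the helper name set_seed and the seed value 42 are illustrative, and NumPy and PyTorch are seeded only if they happen to be installed.

```python
import os
import random


def set_seed(seed: int = 42) -> None:
    """Seed the random number generators that are available (hypothetical helper)."""
    random.seed(seed)
    # Recorded for subprocesses started later; it does not change hashing
    # in the interpreter that is already running.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


# Re-seeding reproduces the same sequence of random numbers.
set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)
```

Models and data splits that accept their own seed argument (for example random_state in scikit-learn estimators) usually need it set explicitly as well.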

Issues regarding the project can be discussed on the forum thread.

Environment

Install uv ⇒ https://docs.astral.sh/uv/getting-started/installation/
Run uv venv in the folder with the notebook
Run uv sync in the folder with the notebook
Activate the env with source .venv/bin/activate
Run uv pip install pyg_lib torch_scatter torch_sparse torch_cluster -f https://data.pyg.org/whl/torch-2.10.0+cpu.html

Part 1: Tabular Models (Week 1, 2, 6 & 7) - Total Max 15 points

Task 1.1. - Max 8 points

Compare DecisionTree, RandomForest, and XGBoost models on:
1. Only the target entity table.
2. The target entity table extended by automatically generated features through propositionalization methods.
Choose appropriate evaluation metric(s).
Briefly discuss your setup and the results of the experiments.
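
To illustrate what propositionalization does in setting 2, the sketch below aggregates a related table into extra columns of the target entity table. It is a hand-written toy example in plain Python, not the method from the labs: the tables, column names, and the function propositionalize are all hypothetical, and automated tools generate many more aggregates across several relations.

```python
from collections import defaultdict
from statistics import mean

# Toy data (hypothetical): one target row per customer, many orders per customer.
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
    {"customer_id": 3, "age": 27},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]


def propositionalize(target_rows, child_rows, key, value):
    """Append count/sum/mean aggregates of a child column to each target row."""
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[key]].append(row[value])
    extended = []
    for row in target_rows:
        values = groups.get(row[key], [])
        extended.append({
            **row,
            f"{value}_count": len(values),
            f"{value}_sum": sum(values),
            # Entities without related rows get 0.0 here; a missing-value
            # marker is another reasonable choice.
            f"{value}_mean": mean(values) if values else 0.0,
        })
    return extended


extended_customers = propositionalize(customers, orders, "customer_id", "amount")
for row in extended_customers:
    print(row)
```

The same three models can then be trained once on the original columns and once on the extended rows, with identical splits and metrics, so that any difference is attributable to the generated features.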

Task 1.2. - Hyperparameter optimization - Max 2 points

Perform hyperparameter optimization and provide a brief analysis.
To get maximum points optimize at least two parameters on one model, e.g. max_depth and learning_rate on XGBoost.
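
An exhaustive grid search over two hyperparameters can be sketched as below. The grid values are arbitrary, and validation_score is a placeholder objective (an assumption made so the example runs on its own); in the notebook it would train the model and return a validation or cross-validation score. Libraries such as scikit-learn offer the same idea ready-made, for example GridSearchCV.

```python
from itertools import product

# Hypothetical search space for two XGBoost hyperparameters.
param_grid = {
    "max_depth": [2, 4, 6],
    "learning_rate": [0.01, 0.1, 0.3],
}


def validation_score(max_depth, learning_rate):
    """Placeholder objective (an assumption): replace with training the model
    and returning its score on a validation split."""
    return -((max_depth - 4) ** 2) - 10 * abs(learning_rate - 0.1)


results = []
names = list(param_grid)
# Enumerate every combination of the listed values.
for values in product(*(param_grid[name] for name in names)):
    params = dict(zip(names, values))
    results.append((validation_score(**params), params))

best_score, best_params = max(results, key=lambda item: item[0])
print(best_params, best_score)
```

Keeping the full results list, not only the best entry, makes it easy to plot or tabulate how the score changes with each parameter for the analysis.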

Task 1.3. - Explanations - Max 3 points

Use explainability methods to uncover how the model makes its predictions.
To get maximum points use at least two methods, interpret/discuss the meaning of their outputs, and compare them.
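
As one example of a model-agnostic explanation method, permutation importance measures how much a metric drops when a single feature column is shuffled. The sketch below is a plain-Python illustration under stated assumptions: model is a stand-in for a trained classifier, the data are synthetic, and accuracy is used as the metric. scikit-learn provides a maintained implementation in sklearn.inspection.permutation_importance.

```python
import random


def model(row):
    """Stand-in for a trained classifier (an assumption): uses only feature 0."""
    return 1 if row[0] > 0.5 else 0


def accuracy(predict, X, y):
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: the label depends on feature 0 only; feature 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]

importances = permutation_importance(model, X, y)
print(importances)
```

Because the stand-in model ignores feature 1, shuffling that column cannot change its predictions, which is the kind of sanity check worth discussing when comparing this method with a second one.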

Task 1.4. - TabPFN - Max 2 points

Extend the comparison from Task 1.1 with TabPFN and discuss the results (including any issues you encounter).

Part 2: Graph Neural Networks (Week 3, 4 & 7) - Total Max 10 points

Task 2.1. - Max 5 points

Implement the RDL pipeline (most of the tools were provided in the lab):
1. Load the database data
2. Process the data into tensor representation
3. Prepare an RDL model
4. Prepare the task target data and loaders/samplers
5. Train the model
6. Evaluate the model's performance over a range of values of at least one hyperparameter
Provide a brief description of the pipeline, setup, and results.

If you struggle with the training loop, check the examples from RelBench.

Task 2.2. - Max 3 points

Compare the performance of a homogeneous GNN on the same data represented by:
1. A homogeneous graph without features (structure only).
2. A homogeneous graph with one-hot node type features.
Discuss the results in comparison to Task 2.1 (RDL).

The HeteroData object in PyG has a to_homogeneous() method that converts the heterogeneous graph to a homogeneous graph.
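
The two node feature variants can be sketched in plain Python as follows. Two assumptions are made here: the structure-only graph uses a constant one-dimensional feature for every node (one common choice, not prescribed by the task), and the node types are available as an integer vector like the node_type attribute that to_homogeneous() adds to the converted graph by default. The function names and the example vector are hypothetical. On tensors, torch.nn.functional.one_hot computes the same encoding.

```python
def constant_features(num_nodes):
    """Structure-only input: every node gets the same one-dimensional feature."""
    return [[1.0] for _ in range(num_nodes)]


def one_hot_features(node_type, num_types):
    """One row per node, with a 1.0 in the column of that node's type."""
    return [[1.0 if t == k else 0.0 for k in range(num_types)] for t in node_type]


# Hypothetical node-type vector for a graph with five nodes and three types.
node_type = [0, 0, 1, 2, 1]

x_structure = constant_features(len(node_type))
x_one_hot = one_hot_features(node_type, num_types=3)
print(x_structure)
print(x_one_hot)
```

With constant features the GNN can only distinguish nodes by their neighbourhood structure, while the one-hot variant additionally tells it which table each node came from.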

Task 2.3. - GNN Explainer - Max 2 points

Use a graph-specific explainer (GNNExplainer, PGExplainer, etc.) on either the RDL heterogeneous GNN (Task 2.1) or the homogeneous GNN (Task 2.2).
Interpret/discuss the meaning of the explainer's output.
