Weeks 1-7

        Min Points   Max Points
Total   10           25


To pass the semester project, you need at least 10 points.
Deadline: 30.04.2026 23:59
Download the project template: mlm-semestral-project-weeks-1-7.zip
Please submit a zip file to Brute containing the filled-in Jupyter Notebook with your code/solution. Make sure that:
  1. the notebook is runnable
  2. the results can be replicated
  3. the final cell outputs (that you want to present to us) are saved/visible
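Replicability usually starts with fixing random seeds. A minimal sketch, assuming the notebook uses Python's random module and possibly NumPy and PyTorch; the helper name set_seed is hypothetical, and NumPy/PyTorch are only seeded if they are installed:

```python
import importlib.util
import random


def set_seed(seed: int) -> None:
    """Seed Python's RNG and, if they are installed, NumPy and PyTorch."""
    random.seed(seed)
    if importlib.util.find_spec("numpy") is not None:
        import numpy as np

        np.random.seed(seed)
    if importlib.util.find_spec("torch") is not None:
        import torch

        torch.manual_seed(seed)


# Re-seeding reproduces the same random sequence.
set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Estimators such as RandomForestClassifier and XGBClassifier take their own random_state argument, which is worth fixing as well.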
Issues regarding the project can be discussed in the forum thread.

Environment

  1. Run uv venv in the folder with the notebook
  2. Run uv sync in the folder with the notebook
  3. Activate the env with source .venv/bin/activate
  4. Run uv pip install pyg_lib torch_scatter torch_sparse torch_cluster -f https://data.pyg.org/whl/torch-2.10.0+cpu.html
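After these steps, a quick check from a notebook cell can confirm that the kernel really uses the new environment and that the packages are importable. A minimal sketch; the package list below is an assumption (import names, not pip names) and should be adjusted to the template's actual dependencies:

```python
import importlib.util
import sys

# Assumed import names of the packages used in the project; adjust as needed.
REQUIRED = [
    "torch", "torch_geometric", "pyg_lib", "torch_scatter",
    "torch_sparse", "torch_cluster", "sklearn", "xgboost", "tabpfn",
]


def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python:", sys.executable)  # should point into .venv/ once activated
missing = missing_packages(REQUIRED)
print("Missing:", missing if missing else "none")
```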

Part 1: Tabular Models (Weeks 1, 2, 6 & 7) - Total max 15 points


Task 1.1. - Max 8 points

  • Compare DecisionTree, RandomForest, and XGBoost models on:
    1. The target entity table only.
    2. The target entity table extended with features generated automatically by propositionalization methods.
  • Choose appropriate evaluation metric(s).
  • Briefly discuss your setup and the results of the experiments.
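Propositionalization flattens related tables into additional columns of the target table. A minimal hand-written sketch of the simplest aggregation-based variant (count, sum, mean, min and max of one numeric child column, grouped by the foreign key); the table and column names are hypothetical, and the lab's propositionalization tools may generate many more feature types:

```python
from collections import defaultdict
from statistics import mean


def aggregate_child(target_ids, child_rows, fk, column):
    """Build count/sum/mean/min/max features of child_rows[column] per target id.

    target_ids: primary keys of the target entity table
    child_rows: list of dicts (rows of a related table)
    fk:         child column that references the target table
    column:     numeric child column to aggregate
    """
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[fk]].append(row[column])

    features = {}
    for tid in target_ids:
        values = groups.get(tid, [])
        features[tid] = {
            f"count_{column}": len(values),
            f"sum_{column}": sum(values),
            # Entities without related rows get None; impute before training.
            f"mean_{column}": mean(values) if values else None,
            f"min_{column}": min(values) if values else None,
            f"max_{column}": max(values) if values else None,
        }
    return features


# Hypothetical example: customers (target table) and their orders (child table).
customers = [1, 2, 3]
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]
feats = aggregate_child(customers, orders, fk="customer_id", column="amount")
print(feats[1])
# {'count_amount': 2, 'sum_amount': 40.0, 'mean_amount': 20.0, 'min_amount': 10.0, 'max_amount': 30.0}
```

If the task has a prediction timestamp, the child rows should first be filtered to those recorded before it; otherwise the aggregates leak future information into the features.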


Task 1.2. - Hyperparameter optimization - Max 2 points

  • Perform hyperparameter optimization and provide a brief analysis.
  • To get maximum points, optimize at least two hyperparameters of one model, e.g. max_depth and learning_rate of XGBoost.
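The simplest form of hyperparameter optimization is an exhaustive grid search. A minimal sketch; the evaluate callable is a hypothetical stand-in for "train the model with these parameters and return a validation score", and the toy scoring function only exists to make the example runnable:

```python
from itertools import product


def grid_search(evaluate, grid):
    """Exhaustive search: call evaluate(**params) for every combination in grid.

    evaluate: callable returning a validation score (higher is better)
    grid:     dict mapping parameter name -> list of candidate values
    Returns (best_params, best_score, all_results).
    """
    names = list(grid)
    results = []
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((params, evaluate(**params)))
    best_params, best_score = max(results, key=lambda r: r[1])
    return best_params, best_score, results


def toy_score(max_depth, learning_rate):
    """Toy stand-in for a validation metric, peaking at depth 4 and rate 0.1."""
    return -(max_depth - 4) ** 2 - 10 * (learning_rate - 0.1) ** 2


best, score, results = grid_search(
    toy_score, {"max_depth": [2, 4, 6, 8], "learning_rate": [0.01, 0.1, 0.3]}
)
print(best)  # {'max_depth': 4, 'learning_rate': 0.1}
```

In the project, evaluate would fit the model on the training split and score it on a validation split, keeping the test split for the final comparison. The full results list is useful for the analysis, e.g. to show how sensitive the score is to each parameter. scikit-learn's GridSearchCV and RandomizedSearchCV are library alternatives.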


Task 1.3. - Explanations - Max 3 points

  • Use explainability methods to uncover how the model makes its predictions.
  • To get maximum points, use at least two methods, interpret/discuss the meaning of their outputs, and compare them.
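Permutation feature importance is one model-agnostic method: shuffle one feature column and measure how much the score drops. A from-scratch sketch for illustration (in the notebook, sklearn.inspection.permutation_importance does the same for fitted estimators); the toy model and data are hypothetical:

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean drop in metric when each feature column is shuffled.

    predict: callable mapping a list of rows to a list of predictions
    X:       list of rows (lists of feature values); y: true labels
    metric:  callable(y_true, y_pred) -> score, higher is better
    """
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace column j with its shuffled version, keep other columns.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_model(rows):
    """Toy 'model' that only looks at feature 0."""
    return [int(r[0] > 0.5) for r in rows]


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = toy_model(X)
imp = permutation_importance(toy_model, X, y, accuracy)
print(imp)  # feature 1 gets 0.0 because the model ignores it
```

Comparing such scores with a second method (e.g. the impurity-based feature_importances_ of tree models, or SHAP values) is a natural basis for the required discussion, since the methods can disagree, for example on correlated features.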


Task 1.4. - TabPFN - Max 2 points

  • Extend the comparison from Task 1.1 with TabPFN and discuss the results (including any issues you encounter).
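TabPFN targets small to medium-sized tables, and its documentation states limits on the number of training samples and features that depend on the version; it may also be slow on a CPU-only setup such as the one above. One possible workaround is stratified subsampling of the training set before calling TabPFNClassifier.fit. A minimal sketch; the helper name and any cap you pass (e.g. max_rows=10_000) are assumptions to check against the installed version:

```python
import random
from collections import defaultdict


def stratified_subsample(X, y, max_rows, seed=0):
    """Keep at most max_rows rows while roughly preserving class proportions."""
    if len(y) <= max_rows:
        return X, y
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    keep = []
    for indices in by_class.values():
        # Each class keeps its proportional share, but at least one row.
        share = max(1, round(len(indices) * max_rows / len(y)))
        keep.extend(rng.sample(indices, share))
    if len(keep) > max_rows:  # rounding can overshoot by a few rows
        keep = rng.sample(keep, max_rows)
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


# Imbalanced toy data: 900 rows of class 0, 100 rows of class 1.
X = [[i] for i in range(1000)]
y = [0] * 900 + [1] * 100
Xs, ys = stratified_subsample(X, y, max_rows=100)
print(len(ys), sum(ys))  # 100 10
```

Subsampling changes what the model sees, so the comparison with Task 1.1 should mention it if it was needed.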

Part 2: Graph Neural Networks (Weeks 3, 4 & 7) - Total max 10 points


Task 2.1. - Max 5 points

  • Implement the Relational Deep Learning (RDL) pipeline; most of the tools were provided in the lab:
    1. Load the database data
    2. Process the data into tensor representation
    3. Prepare an RDL model
    4. Prepare the task target data and loaders/samplers
    5. Train the model
    6. Evaluate the model's performance over a range of values of at least one hyperparameter
  • Provide a brief description of the pipeline, setup, and results.
If you struggle with the training loop, check the examples from RelBench.
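In the RDL view, each table becomes a node type and each primary-key/foreign-key link becomes an edge type, so step 2 needs a mapping from primary keys to contiguous node indices plus a 2 x E edge index per foreign key. RelBench and the lab tools provide helpers for this; a minimal stdlib sketch of the underlying idea, with hypothetical table and column names:

```python
def build_fkey_edges(parent_ids, child_rows, fk):
    """Map parent primary keys to 0..n-1 and build a 2 x E edge index.

    Child rows are indexed by their position in child_rows. Edges point from
    the child row (e.g. an order) to the parent row it references (e.g. a
    customer); rows with a missing or dangling foreign key are skipped.
    """
    parent_index = {pk: i for i, pk in enumerate(parent_ids)}
    src, dst = [], []
    for child_i, row in enumerate(child_rows):
        target = parent_index.get(row[fk])
        if target is not None:
            src.append(child_i)
            dst.append(target)
    return [src, dst], parent_index


customers = [101, 205, 307]
orders = [
    {"order_id": "a", "customer_id": 205},
    {"order_id": "b", "customer_id": 101},
    {"order_id": "c", "customer_id": None},  # missing foreign key
    {"order_id": "d", "customer_id": 205},
]
edge_index, parent_index = build_fkey_edges(customers, orders, "customer_id")
print(edge_index)  # [[0, 1, 3], [1, 0, 1]]
```

In PyG, such a list would be converted with torch.tensor and stored under an edge type of the HeteroData object; reverse edges are typically added too, so that messages can flow in both directions.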


Task 2.2. - Max 3 points

  • Compare the performance of a homogeneous GNN on the same data represented as:
    1. A homogeneous graph without features (structure only).
    2. A homogeneous graph with one-hot node type features.
  • Discuss the results in comparison to Task 2.1 (RDL).
The HeteroData object in PyG has a to_homogeneous() method that converts a heterogeneous graph into a homogeneous one.
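Conceptually, the conversion stacks all node types into one index space (offsetting each type's indices) and records each node's type in a node_type vector, which to_homogeneous() adds to the resulting Data object by default. Both feature variants can then be built from node_type alone. A stdlib sketch with hypothetical node counts; using a constant 1.0 feature for the structure-only variant is a common choice assumed here, not a requirement of the task:

```python
def homogeneous_features(num_nodes_per_type, one_hot=True):
    """Return (node_type, features, offsets) for a stacked homogeneous graph.

    num_nodes_per_type: dict node type -> node count, in a fixed order
    one_hot=False gives the structure-only variant: a constant 1.0 feature.
    """
    types = list(num_nodes_per_type)
    offsets, node_type, start = {}, [], 0
    for t_idx, t in enumerate(types):
        offsets[t] = start  # global index of the first node of type t
        node_type.extend([t_idx] * num_nodes_per_type[t])
        start += num_nodes_per_type[t]
    if one_hot:
        features = [[1.0 if k == t_idx else 0.0 for k in range(len(types))]
                    for t_idx in node_type]
    else:
        features = [[1.0] for _ in node_type]
    return node_type, features, offsets


node_type, x, offsets = homogeneous_features({"customer": 2, "order": 3})
print(node_type)         # [0, 0, 1, 1, 1]
print(x[2])              # [0.0, 1.0]
print(offsets["order"])  # 2, so order i becomes global node 2 + i
```

With PyG tensors, torch.nn.functional.one_hot applied to the node_type vector should give the same one-hot matrix.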


Task 2.3. - GNN Explainer - Max 2 points

  • Use a graph-specific explainer (GNNExplainer, PGExplainer, etc.) on either the heterogeneous RDL GNN (Task 2.1) or the homogeneous GNN (Task 2.2).
  • Interpret/discuss the meaning of the explainer's output.
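Explainers such as GNNExplainer return soft masks, e.g. one importance value per edge. One way to make an edge mask interpretable is to list the top-k edges and to average importance per edge type. A stdlib sketch assuming the mask and each edge's type have already been extracted into plain lists (in PyG, for example, from the explanation's edge_mask and, for a homogeneous graph, data.edge_type); the edge-type names are hypothetical:

```python
from collections import defaultdict


def summarize_edge_mask(edge_mask, edge_types, k=3):
    """Return the top-k edges and the mean mask value per edge type."""
    ranked = sorted(range(len(edge_mask)), key=lambda i: edge_mask[i], reverse=True)
    top_k = [(i, edge_types[i], edge_mask[i]) for i in ranked[:k]]
    totals, counts = defaultdict(float), defaultdict(int)
    for value, etype in zip(edge_mask, edge_types):
        totals[etype] += value
        counts[etype] += 1
    per_type = {t: totals[t] / counts[t] for t in totals}
    return top_k, per_type


mask = [0.9, 0.1, 0.7, 0.2, 0.05]
types = ["order->customer", "review->product", "order->customer",
         "review->product", "order->product"]
top, per_type = summarize_edge_mask(mask, types, k=2)
print(top)  # [(0, 'order->customer', 0.9), (2, 'order->customer', 0.7)]
print(per_type)
```

The per-type averages indicate which relations the model relies on, and the top-k edges can be traced back to concrete database rows. GNNExplainer optimizes its mask from a random initialization, so results can vary between runs; fixing the seed or averaging several runs makes the interpretation more reliable.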
courses/becm36mlm/projects/weeks1-7.txt · Last modified: 2026/03/27 14:03 by pelesjak