===== Weeks 1-7 =====
^ ^ Min Points ^ Max Points ^
| **Total** | 0 | 25 |
\\
Deadline **30.04.2026 23:59**
Download the project template. {{ courses:becm36mlm:projects:mlm-semestral-project-weeks-1-7.zip |}}\\
Please submit a zip file to [[https://cw.felk.cvut.cz/brute/student/|Brute]] containing a filled-in Jupyter Notebook with your code/solution. Make sure that:
- the notebook is runnable
- the results can be replicated
- the final cell outputs (that you want to present to us) are saved/visible
Issues regarding the project can be discussed on the [[https://cw.felk.cvut.cz/forum/thread-6734.html|forum thread]].
==== Environment ====
- Install ''uv'' => https://docs.astral.sh/uv/getting-started/installation/
- Run ''uv venv'' in the folder with the notebook
- Run ''uv sync'' in the folder with the notebook
- Activate the env with ''source .venv/bin/activate''
- Run ''uv pip install pyg_lib torch_scatter torch_sparse torch_cluster -f https://data.pyg.org/whl/torch-2.10.0+cpu.html''
==== Part 1: Tabular Models (Weeks 1, 2, 6 & 7) - Total Max 15 points ====
\\
=== Task 1.1. - Max 8 points ===
* Compare DecisionTree, RandomForest, and XGBoost models on:
- **Only** the target entity table.
- The target entity table extended by **automatically** generated features through propositionalization methods.
* Choose appropriate **evaluation metric(s)**
  * Briefly **discuss** your **setup** and the **results** of the experiments.
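A minimal sketch of how such a comparison can be wired up, using a synthetic dataset as a stand-in for the target entity table (swap in your own ''X'', ''y''; ''xgboost.XGBClassifier'' plugs into the same scikit-learn interface):

```python
# Sketch: comparing tree-based models with cross-validated AUC.
# The synthetic data below is a placeholder for the target entity table
# (and later for its propositionalized extension).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
    # XGBoost exposes the same fit/predict interface:
    # "XGBoost": xgboost.XGBClassifier(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```

Cross-validated scores (rather than a single train/test split) make the comparison between the plain table and the propositionalized one less sensitive to an unlucky split.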
\\
=== Task 1.2. - Hyperparameter optimization - Max 2 points ===
* Perform **hyperparameter optimization** and provide brief analysis.
  * To get maximum points, optimize at least two parameters on one model, e.g. ''max_depth'' and ''learning_rate'' on ''XGBoost''.
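One possible setup, sketched with scikit-learn's ''GridSearchCV''; ''GradientBoostingClassifier'' stands in here for XGBoost, whose ''XGBClassifier'' accepts the same ''max_depth'' / ''learning_rate'' parameters through its scikit-learn wrapper:

```python
# Sketch: grid search over two hyperparameters with cross-validated AUC.
# The synthetic data is a placeholder for the project's feature table.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=15, random_state=0)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 5], "learning_rate": [0.05, 0.1, 0.3]},
    scoring="roc_auc",
    cv=3,
)
grid.fit(X, y)
print("best params:", grid.best_params_)
print("best CV AUC:", round(grid.best_score_, 3))
```

''grid.cv_results_'' holds the score for every parameter combination, which is a convenient basis for the brief analysis.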
\\
=== Task 1.3. - Explanations - Max 3 points ===
* Use **explainability** methods to uncover how the model makes its predictions.
  * To get maximum points, use at least two methods, interpret/discuss the meaning of their outputs, and compare them.
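One dependency-light pairing of two methods, sketched with scikit-learn (SHAP values would be a natural third option if the ''shap'' package is available): impurity-based feature importances built into tree ensembles, versus permutation importance computed on held-out data.

```python
# Sketch: two explanation methods whose rankings can be compared.
# Synthetic data stands in for the project's feature table.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Method 1: impurity-based importances (cheap, but computed on training
# data and biased toward high-cardinality features).
print("impurity importances:   ", model.feature_importances_.round(3))

# Method 2: permutation importance on held-out data (AUC drop when a
# feature column is shuffled).
perm = permutation_importance(model, X_te, y_te, scoring="roc_auc",
                              n_repeats=10, random_state=0)
print("permutation importances:", perm.importances_mean.round(3))
```

Disagreements between the two rankings (e.g. a feature that looks important by impurity but not by permutation) are exactly the kind of observation worth discussing.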
\\
=== Task 1.4. - TabPFN - Max 2 points ===
  * Extend the comparison from //Task 1.1// with ''TabPFN'' and discuss the results (including any issues you encounter).
==== Part 2: Graph Neural Networks (Weeks 3, 4 & 7) - Total Max 10 points ====
\\
=== Task 2.1. - Max 5 points ===
* Implement the **RDL** pipeline (most of the tools were provided in the lab):
- Load the database data
- Process the data into tensor representation
- Prepare an RDL model
- Prepare the task target data and loaders/samplers
- Train the model
- Evaluate the model's performance over a range of values of at least one hyperparameter
  * Provide a brief description of the pipeline, setup, and results.
If you struggle with the training loop, check the examples from ''RelBench''.
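The training and evaluation steps follow the standard PyTorch pattern. A minimal self-contained skeleton is sketched below with a toy linear model on random data; the RDL model and the neighbor-sampling loaders from the lab plug into the same structure (those pieces are assumed, not shown):

```python
# Sketch: generic PyTorch training loop. Replace the toy model/data with
# the RDL model and iterate over loader batches instead of the full tensors.
import torch

torch.manual_seed(0)
X = torch.randn(256, 8)
y = (X[:, 0] > 0).long()                  # toy binary target

model = torch.nn.Linear(8, 2)             # placeholder for the RDL model
opt = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(100):
    model.train()
    opt.zero_grad()
    loss = loss_fn(model(X), y)           # with loaders: loop over batches
    loss.backward()
    opt.step()

model.eval()
with torch.no_grad():
    acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"final loss {loss.item():.3f}, train acc {acc:.3f}")
```

Wrapping this loop in an outer loop over one hyperparameter (e.g. hidden size or learning rate) covers the last pipeline step above.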
\\
=== Task 2.2. - Max 3 points ===
  * Compare the performance of a **homogeneous** GNN on the same data represented by:
- A homogeneous graph **without** features (structure only).
- A homogeneous graph with **one-hot node type** features.
  * Discuss the results in comparison to //Task 2.1// (RDL).
The ''HeteroData'' object in ''PyG'' has a ''to_homogeneous()'' method that converts the heterogeneous graph into a homogeneous one.
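After the conversion, ''to_homogeneous()'' records each node's original type in a ''node_type'' vector, which is exactly what the one-hot variant needs. The idea is sketched below with a plain tensor so the snippet stays dependency-light (the 5-node, 3-type example is made up):

```python
# Sketch: building one-hot node-type features for the homogeneous graph.
# In PyG, `data = hetero_data.to_homogeneous()` provides `data.node_type`.
import torch

node_type = torch.tensor([0, 0, 1, 2, 1])   # 5 nodes from 3 original tables
x = torch.nn.functional.one_hot(node_type, num_classes=3).float()
print(x)

# For the "structure only" variant, use a constant feature instead, e.g.:
# x = torch.ones(node_type.size(0), 1)
```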
\\
=== Task 2.3. - GNN Explainer - Max 2 points ===
  * Use a **graph-specific explainer** (''GNNExplainer'', ''PGExplainer'', etc.) on either the RDL heterogeneous GNN (//Task 2.1//) or the homogeneous GNN (//Task 2.2//).
* Interpret/discuss the meaning of the explainer's output.