===== Weeks 1-7 =====

^ ^ Min Points ^ Max Points ^
| **Total** | 0 | 25 |

Deadline: **30.04.2026 23:59**

Download the project template: {{ courses:becm36mlm:projects:mlm-semestral-project-weeks-1-7.zip |}}

Please submit a zip file to [[https://cw.felk.cvut.cz/brute/student/|Brute]] containing the filled-in Jupyter Notebook with your code/solution. Make sure that:
  - the notebook is runnable,
  - the results can be replicated,
  - the final cell outputs (that you want to present to us) are saved/visible.

Issues regarding the project can be discussed in the [[https://cw.felk.cvut.cz/forum/thread-6734.html|forum thread]].

==== Environment ====

  - Install ''uv'': https://docs.astral.sh/uv/getting-started/installation/
  - Run ''uv venv'' in the folder with the notebook.
  - Run ''uv sync'' in the folder with the notebook.
  - Activate the environment with ''source .venv/bin/activate''.
  - Run ''uv pip install pyg_lib torch_scatter torch_sparse torch_cluster -f https://data.pyg.org/whl/torch-2.10.0+cpu.html''

==== Part 1: Tabular Models (Weeks 1, 2, 6 & 7) - Total Max 15 points ====

=== Task 1.1. - Max 8 points ===

  * Compare DecisionTree, RandomForest, and XGBoost models on:
    - **only** the target entity table,
    - the target entity table extended by **automatically** generated features through propositionalization methods.
  * Choose appropriate **evaluation metric(s)**.
  * Briefly **discuss** your **setup** and the **results** of the **experiments**.

=== Task 1.2. - Hyperparameter optimization - Max 2 points ===

  * Perform **hyperparameter optimization** and provide a brief analysis.
  * To get the maximum points, optimize at least two parameters of one model, e.g. ''max_depth'' and ''learning_rate'' of ''XGBoost''.

=== Task 1.3. - Explanations - Max 3 points ===

  * Use **explainability** methods to uncover how the model makes its predictions.
  * To get the maximum points, use at least two methods, interpret/discuss the meaning of their outputs, and compare them.

=== Task 1.4. - TabPFN - Max 2 points ===

  * Extend the comparison from //Task 1.1// with ''TabPFN'' and discuss the results (including any issues you encounter).

==== Part 2: Graph Neural Networks (Weeks 3, 4 & 7) - Total Max 10 points ====

=== Task 2.1. - Max 5 points ===

  * Implement the **RDL** pipeline (most of the tools were provided in the lab):
    - Load the database data.
    - Process the data into a tensor representation.
    - Prepare an RDL model.
    - Prepare the task target data and loaders/samplers.
    - Train the model.
    - Evaluate the model's performance over a range of values of at least one hyperparameter.
  * Provide a brief description of the pipeline, setup, and results.

If you struggle with the training loop, check the examples from ''RelBench''.

=== Task 2.2. - Max 3 points ===

  * Compare the performance of a **homogeneous** GNN on the same data represented by:
    - a homogeneous graph **without** features (structure only),
    - a homogeneous graph with **one-hot node type** features.
  * Discuss the results in comparison to //Task 2.1// (RDL).

The ''HeteroData'' object in ''PyG'' has a ''to_homogeneous()'' method that converts a heterogeneous graph into a homogeneous one.

=== Task 2.3. - GNN Explainer - Max 2 points ===

  * Use a **graph-specific explainer** (''GNNExplainer'', ''PGExplainer'', etc.) on either the RDL heterogeneous GNN (//Task 2.1//) or the homogeneous GNN (//Task 2.2//).
  * Interpret/discuss the meaning of the explainer's output.
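For Task 1.1, the model comparison can be sketched as below. This is a minimal illustration, not a required solution: the synthetic data stands in for the target entity table (and its propositionalized extension), and the cross-validation setup and metric are assumptions you should adapt. ''XGBClassifier'' from ''xgboost'' follows the same scikit-learn estimator API, so it can be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Placeholder data standing in for the target entity table;
# replace with the real table and, separately, its version
# extended by propositionalization features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    # xgboost.XGBClassifier() can be added here with the same interface.
}

scores = {}
for name, model in models.items():
    # ROC AUC is one reasonable choice for a binary classification target;
    # pick the metric that fits your actual task.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: {scores[name]:.3f}")
```

Running the same loop once on the plain table and once on the feature-extended table gives the two columns of the comparison the task asks for.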
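For Task 1.2, a grid search over two hyperparameters might look like the sketch below. To keep the snippet dependency-free, scikit-learn's ''GradientBoostingClassifier'' stands in for ''XGBoost'' (it exposes the same two parameters named in the task); the data, grid values, and metric are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder data; substitute your prepared feature table.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Two hyperparameters, as the task suggests (max_depth and learning_rate).
param_grid = {"max_depth": [2, 3, 4], "learning_rate": [0.05, 0.1, 0.3]}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),  # stand-in for XGBClassifier
    param_grid,
    cv=3,
    scoring="roc_auc",
)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV score:", round(search.best_score_, 3))
```

The ''cv_results_'' attribute of the fitted search holds the full grid of scores, which is useful for the brief analysis (e.g. a heatmap of score versus the two parameters).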
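For Task 1.3, permutation importance is one explainability method that works with any fitted model; the sketch below uses it on a RandomForest with synthetic placeholder data (dataset and model settings are assumptions). SHAP values would be a natural second method to compare against, as the task asks for at least two.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Placeholder data; replace with your actual feature table.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature on held-out data and measure the score drop:
# a large drop means the model relies on that feature.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
for i in ranking[:3]:
    print(f"feature {i}: mean importance {result.importances_mean[i]:.3f}")
```

When interpreting the output, note that permutation importance measures global reliance on a feature under the chosen score, while per-prediction methods (e.g. SHAP) explain individual predictions, so the two outputs are not directly interchangeable.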