===== Weeks 1-7 =====

^ ^ Min Points ^ Max Points ^
| **Total** | 0 | 25 |

Deadline: **30.04.2026 23:59**

Download the project template: {{ courses:becm36mlm:projects:mlm-semestral-project-weeks-1-7.zip |}}

Please submit a zip file to [[https://cw.felk.cvut.cz/brute/student/|Brute]] containing the filled-in Jupyter Notebook with your code/solution. Make sure that:
  - the notebook is runnable,
  - the results can be replicated,
  - the final cell outputs (that you want to present to us) are saved/visible.

Issues regarding the project can be discussed in the [[https://cw.felk.cvut.cz/forum/thread-6734.html|forum thread]].

==== Environment ====

  - Install ''uv'': https://docs.astral.sh/uv/getting-started/installation/
  - Run ''uv venv'' in the folder with the notebook.
  - Run ''uv sync'' in the folder with the notebook.
  - Activate the environment with ''source .venv/bin/activate''.
  - Run ''uv pip install pyg_lib torch_scatter torch_sparse torch_cluster -f https://data.pyg.org/whl/torch-2.10.0+cpu.html''

==== Part 1: Tabular Models (Weeks 1, 2, 6 & 7) - Total Max 15 points ====

=== Task 1.1. - Max 8 points ===

  * Compare DecisionTree, RandomForest, and XGBoost models on:
    - **only** the target entity table,
    - the target entity table extended by **automatically** generated features through propositionalization methods.
  * Choose appropriate **evaluation metric(s)**.
  * Briefly **discuss** your **setup** and the **results** of the **experiments**.

=== Task 1.2. - Hyperparameter optimization - Max 2 points ===

  * Perform **hyperparameter optimization** and provide a brief analysis.
  * To get the maximum points, optimize at least two parameters of one model, e.g. ''max_depth'' and ''learning_rate'' of ''XGBoost''.

=== Task 1.3. - Explanations - Max 3 points ===

  * Use **explainability** methods to uncover how the model makes its predictions.
  * To get the maximum points, use at least two methods, interpret/discuss the meaning of their outputs, and compare them.

=== Task 1.4. - TabPFN - Max 2 points ===

  * Extend the comparison from //Task 1.1// with ''TabPFN'' and discuss the results (including any issues you encounter).

==== Part 2: Graph Neural Networks (Weeks 3, 4 & 7) - Total Max 10 points ====

=== Task 2.1. - Max 5 points ===

  * Implement the **RDL** pipeline (most of the tools were provided in the lab):
    - Load the database data.
    - Process the data into a tensor representation.
    - Prepare an RDL model.
    - Prepare the task target data and loaders/samplers.
    - Train the model.
    - Evaluate the model's performance over a range of values of at least one hyperparameter.
  * Provide a brief description of the pipeline, setup, and results.

If you struggle with the training loop, check the examples from ''RelBench''.

=== Task 2.2. - Max 3 points ===

  * Compare the performance of a **homogeneous** GNN on the same data represented by:
    - a homogeneous graph **without** features (structure only),
    - a homogeneous graph with **one-hot node type** features.
  * Discuss the results in comparison to //Task 2.1// (RDL).

The ''HeteroData'' object in ''PyG'' has a ''to_homogeneous()'' method that converts a heterogeneous graph into a homogeneous one.

=== Task 2.3. - GNN Explainer - Max 2 points ===

  * Use a **graph-specific explainer** (''GNNExplainer'', ''PGExplainer'', etc.) on either the RDL heterogeneous GNN (//Task 2.1//) or the homogeneous GNN (//Task 2.2//).
  * Interpret/discuss the meaning of the explainer's output.
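For Task 1.1, the model comparison can be sketched as below. This is a minimal illustration, not a required solution: the synthetic data stands in for the target entity table (and its propositionalized extension), and the cross-validation setup and metric are assumptions you should adapt. ''XGBClassifier'' from ''xgboost'' follows the same scikit-learn estimator API, so it can be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Placeholder data standing in for the target entity table;
# replace with the real table and, separately, its version
# extended by propositionalization features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    # xgboost.XGBClassifier() can be added here with the same interface.
}

scores = {}
for name, model in models.items():
    # ROC AUC is one reasonable choice for a binary classification target;
    # pick the metric that fits your actual task.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: {scores[name]:.3f}")
```

Running the same loop once on the plain table and once on the feature-extended table gives the two columns of the comparison the task asks for.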
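For Task 1.2, a grid search over two hyperparameters might look like the sketch below. To keep the snippet dependency-free, scikit-learn's ''GradientBoostingClassifier'' stands in for ''XGBoost'' (it exposes the same two parameters named in the task); the data, grid values, and metric are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder data; substitute your prepared feature table.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Two hyperparameters, as the task suggests (max_depth and learning_rate).
param_grid = {"max_depth": [2, 3, 4], "learning_rate": [0.05, 0.1, 0.3]}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),  # stand-in for XGBClassifier
    param_grid,
    cv=3,
    scoring="roc_auc",
)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV score:", round(search.best_score_, 3))
```

The ''cv_results_'' attribute of the fitted search holds the full grid of scores, which is useful for the brief analysis (e.g. a heatmap of score versus the two parameters).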
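For Task 1.3, permutation importance is one explainability method that works with any fitted model; the sketch below uses it on a RandomForest with synthetic placeholder data (dataset and model settings are assumptions). SHAP values would be a natural second method to compare against, as the task asks for at least two.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Placeholder data; replace with your actual feature table.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature on held-out data and measure the score drop:
# a large drop means the model relies on that feature.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
for i in ranking[:3]:
    print(f"feature {i}: mean importance {result.importances_mean[i]:.3f}")
```

When interpreting the output, note that permutation importance measures global reliance on a feature under the chosen score, while per-prediction methods (e.g. SHAP) explain individual predictions, so the two outputs are not directly interchangeable.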