====== Spam filter - step 3 ======

Create the additional functions needed to evaluate the filter quality.

===== Function ''quality_score()'' =====

Task:
  * Create function ''quality_score(tp, tn, fp, fn)'' in module ''quality.py''.
  * The function computes the quality score defined during the lab (the definition is also available [[courses:be5b33prg:homeworks:spam:evaluation#filter_quality_assessment|here]]).

^ ''quality_score(tp, tn, fp, fn)'' ^ Compute the quality score based on the confusion matrix. ^
^ Inputs | 4 nonnegative integers for TP, TN, FP, FN. |
^ Outputs | A number between 0 and 1 expressing the prediction quality. |

>{{page>courses:a4b99rph:internal:cviceni:spam:tyden08#quality_score&editbtn}}

===== Function ''compute_quality_for_corpus()'' =====

Task:
  * In module ''quality.py'', create function ''compute_quality_for_corpus(corpus_dir)'', which evaluates the filter quality based on the information contained in the files ''!truth.txt'' and ''!prediction.txt'' in the given corpus.
  * The true and predicted classifications can be read as dictionaries using function ''read_classification_from_file()''.
  * The confusion matrix for the given corpus can be computed from the dictionaries using the ''compute_confusion_matrix()'' function from step 2.
  * The quality score can be computed from the confusion matrix using function ''quality_score()''.

Why do we need it?
  * To compute the quality of individual filters and to rank the filters against each other.

^ ''compute_quality_for_corpus(corpus_dir)'' ^ Compute the quality of predictions for the given corpus. ^
^ Inputs | A corpus directory evaluated by a filter (i.e. a directory containing ''!truth.txt'' and ''!prediction.txt'' files). |
^ Outputs | Quality of the filter as a number between 0 and 1. |

>{{page>courses:a4b99rph:internal:cviceni:spam:tyden08#compute_quality_for_corpus&editbtn}}
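
The two functions above could fit together roughly as in the following sketch. It is not a reference solution and rests on several assumptions. The score formula used here, (TP + TN) / (TP + TN + 10·FP + FN), is only assumed to match the definition from the lab, so check it against the linked page. The constant ''FP_WEIGHT'', the labels ''SPAM'' and ''OK'', the one-pair-per-line file format, and the tuple returned by ''compute_confusion_matrix()'' are also assumptions. The helpers ''read_classification_from_file()'' and ''compute_confusion_matrix()'' are simplified stand-ins, included only to keep the example self-contained. In your own ''quality.py'' you would import your versions from the previous steps instead.

```python
import os

# Assumption: a false positive (a legitimate e-mail marked as spam) is
# penalized ten times more than a false negative. Verify against the lab.
FP_WEIGHT = 10


def quality_score(tp, tn, fp, fn):
    """Return a quality score in [0, 1] computed from confusion matrix counts."""
    denominator = tp + tn + FP_WEIGHT * fp + fn
    if denominator == 0:
        # Assumption: an empty corpus gets score 0 instead of raising an error.
        return 0.0
    return (tp + tn) / denominator


def read_classification_from_file(path):
    """Stand-in for the earlier helper: parse lines of the form 'filename LABEL'."""
    classification = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                classification[parts[0]] = parts[1]
    return classification


def compute_confusion_matrix(truth, prediction, pos_tag="SPAM", neg_tag="OK"):
    """Stand-in for the step 2 code: return the counts as (tp, tn, fp, fn)."""
    tp = tn = fp = fn = 0
    for name, true_label in truth.items():
        predicted = prediction.get(name)  # files without a prediction are skipped
        if true_label == pos_tag and predicted == pos_tag:
            tp += 1
        elif true_label == neg_tag and predicted == neg_tag:
            tn += 1
        elif true_label == neg_tag and predicted == pos_tag:
            fp += 1
        elif true_label == pos_tag and predicted == neg_tag:
            fn += 1
    return tp, tn, fp, fn


def compute_quality_for_corpus(corpus_dir):
    """Evaluate the filter using !truth.txt and !prediction.txt in corpus_dir."""
    truth = read_classification_from_file(os.path.join(corpus_dir, "!truth.txt"))
    prediction = read_classification_from_file(
        os.path.join(corpus_dir, "!prediction.txt")
    )
    tp, tn, fp, fn = compute_confusion_matrix(truth, prediction)
    return quality_score(tp, tn, fp, fn)
```

Under these assumptions, a corpus with one correctly detected spam, one correctly passed legitimate e-mail, and one legitimate e-mail wrongly marked as spam gives TP = 1, TN = 1, FP = 1, FN = 0, so the score is 2 / 12. A single false positive lowers the score considerably, which is the purpose of the weighting.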