Spam filter - step 3

Create additional functions needed to evaluate the filter quality.

Function ''quality_score()''


  • Create function quality_score(tp, tn, fp, fn) in module
  • The function computes the quality score defined during the lab (find it also here).
quality_score(tp, tn, fp, fn): Compute the quality score based on the confusion matrix.
Inputs: 4 nonnegative integers for TP, TN, FP, FN.
Outputs: A number between 0 and 1 measuring the prediction quality.
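
A minimal sketch of what quality_score() could look like. The formula itself is not given on this page, so the sketch assumes a weighted accuracy in which a false positive (a legitimate e-mail marked as spam) costs more than a false negative. The weight FP_WEIGHT = 10 and the return value 0.0 for an empty confusion matrix are assumptions made for illustration; use the definition from the lab if it differs.

```python
# Assumed penalty weight for a false positive (legitimate e-mail flagged
# as spam). The value 10 is an assumption; use the one given in the lab.
FP_WEIGHT = 10


def quality_score(tp, tn, fp, fn):
    """Return a quality score in [0, 1] computed from confusion-matrix counts."""
    correct = tp + tn
    denominator = correct + FP_WEIGHT * fp + fn
    if denominator == 0:
        # No e-mails were evaluated. Returning 0.0 here is an assumed
        # convention that avoids a division by zero.
        return 0.0
    return correct / denominator


if __name__ == "__main__":
    # 3 + 5 correct decisions, 1 false positive, 2 false negatives:
    # 8 / (8 + 10 * 1 + 2) = 0.4
    print(quality_score(tp=3, tn=5, fp=1, fn=2))
```

Under these assumptions a perfect filter scores 1, and every false positive lowers the score as much as ten false negatives would.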

Function ''compute_quality_for_corpus()''


  • In module, create function compute_quality_for_corpus(corpus_dir) which evaluates the filter quality based on the information contained in files !truth.txt and !prediction.txt in the given corpus.
  • The true and predicted classification can be read in the form of dictionaries using function read_classification_from_file().
  • The confusion matrix for the given corpus can be computed from the dictionaries using the compute_confusion_matrix() function from step 2.
  • The quality score can be computed from the confusion matrix using function quality_score().

Why do we need it?

  • To compute the quality of filters and to rank them.

compute_quality_for_corpus(corpus_dir): Compute the quality of predictions for given corpus.
Inputs: A corpus directory evaluated by a filter (i.e. a directory containing !truth.txt and !prediction.txt files).
Outputs: Quality of the filter as a number between 0 and 1.
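
The three steps for compute_quality_for_corpus() (read both classification files, build the confusion matrix, compute the score) could be wired together roughly as follows. To keep the sketch runnable on its own, it includes simplified stand-ins for read_classification_from_file(), compute_confusion_matrix() and quality_score(). Their signatures, the ConfMat tuple, the "SPAM"/"OK" tags, the one "filename TAG" pair per line file layout and the false-positive weight of 10 are all assumptions, not the definitions from the earlier steps; in the homework you would import your own versions instead.

```python
import os
from collections import namedtuple

# Hypothetical container for the four confusion-matrix counts.
ConfMat = namedtuple("ConfMat", "tp tn fp fn")


def read_classification_from_file(path):
    """Stand-in: read lines of the assumed form 'filename TAG' into a dict."""
    classification = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                classification[parts[0]] = parts[1]
    return classification


def compute_confusion_matrix(truth, prediction, pos_tag="SPAM", neg_tag="OK"):
    """Stand-in: count TP, TN, FP, FN over the e-mails listed in truth."""
    tp = tn = fp = fn = 0
    for name, true_tag in truth.items():
        # E-mails without a prediction match no branch and are not counted.
        predicted = prediction.get(name)
        if true_tag == pos_tag and predicted == pos_tag:
            tp += 1
        elif true_tag == neg_tag and predicted == neg_tag:
            tn += 1
        elif true_tag == neg_tag and predicted == pos_tag:
            fp += 1
        elif true_tag == pos_tag and predicted == neg_tag:
            fn += 1
    return ConfMat(tp, tn, fp, fn)


def quality_score(tp, tn, fp, fn):
    """Stand-in: weighted accuracy; the FP weight of 10 is an assumption."""
    denominator = tp + tn + 10 * fp + fn
    return (tp + tn) / denominator if denominator else 0.0


def compute_quality_for_corpus(corpus_dir):
    """Evaluate a filter from !truth.txt and !prediction.txt in corpus_dir."""
    truth = read_classification_from_file(
        os.path.join(corpus_dir, "!truth.txt"))
    prediction = read_classification_from_file(
        os.path.join(corpus_dir, "!prediction.txt"))
    cm = compute_confusion_matrix(truth, prediction)
    return quality_score(cm.tp, cm.tn, cm.fp, cm.fn)
```

Keeping compute_quality_for_corpus() as a thin wrapper means each step can be tested separately, and a change to the score definition stays confined to quality_score().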
courses/be5b33prg/homeworks/spam/step3.txt · Last modified: 2018/08/13 09:48 (external edit)