Create a set of classes and functions needed to evaluate the filter quality.
Task:
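In the module confmat.py, create a class BinaryConfusionMatrix.
The constructor takes pos_tag and neg_tag, the tags of the positive and negative class (for example SPAM and OK).
as_dict() returns the matrix as a dictionary with the keys tp, tn, fp, fn.
update(truth, prediction) updates the counters from a single pair of truth and prediction values; a value that is neither pos_tag nor neg_tag raises ValueError.
compute_from_dicts(truth_dict, pred_dict) computes the matrix from two dictionaries.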
Why do we need it?
>>> cm1 = BinaryConfusionMatrix(pos_tag='True', neg_tag='False')
>>> cm1.as_dict()
{'tp': 0, 'tn': 0, 'fp': 0, 'fn': 0}
>>> cm1.update('True', 'True')
>>> cm1.as_dict()
{'tp': 1, 'tn': 0, 'fp': 0, 'fn': 0}

>>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4': 'OK'}
>>> pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4': 'SPAM'}
>>> cm2 = BinaryConfusionMatrix(pos_tag='SPAM', neg_tag='OK')
>>> cm2.compute_from_dicts(truth_dict, pred_dict)
>>> cm2.as_dict()
{'tp': 1, 'tn': 1, 'fp': 1, 'fn': 1}
The class shall have at least 3 public methods: as_dict(), update() and compute_from_dicts().
update()
compute_from_dicts()
pred
Note: You may assume that both dictionaries have the same set of keys. Still, think about the situation where the keys differ: what should the method do then?
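The interface described above can be sketched as follows. This is only one possible reading of the task, not a reference solution. Two points are assumptions rather than requirements stated on this page: ValueError is raised for any tag that is neither pos_tag nor neg_tag, and compute_from_dicts() silently skips keys that appear in only one of the dictionaries.

```python
class BinaryConfusionMatrix:
    """Counts tp, tn, fp and fn for a two-class problem."""

    def __init__(self, pos_tag, neg_tag):
        self.pos_tag = pos_tag
        self.neg_tag = neg_tag
        self.tp = 0
        self.tn = 0
        self.fp = 0
        self.fn = 0

    def as_dict(self):
        """Return the four counters as a dictionary."""
        return {'tp': self.tp, 'tn': self.tn, 'fp': self.fp, 'fn': self.fn}

    def update(self, truth, prediction):
        """Increase exactly one counter for a single truth/prediction pair."""
        valid_tags = (self.pos_tag, self.neg_tag)
        # Assumption: any unknown tag is treated as an error.
        if truth not in valid_tags or prediction not in valid_tags:
            raise ValueError('truth and prediction must be pos_tag or neg_tag')
        if prediction == self.pos_tag:
            if truth == self.pos_tag:
                self.tp += 1
            else:
                self.fp += 1
        else:
            if truth == self.neg_tag:
                self.tn += 1
            else:
                self.fn += 1

    def compute_from_dicts(self, truth_dict, pred_dict):
        """Update the counters for every key present in both dictionaries."""
        # Assumption: keys missing from either dictionary are ignored.
        for key in truth_dict.keys() & pred_dict.keys():
            self.update(truth_dict[key], pred_dict[key])
```

Skipping unmatched keys keeps the method usable on partial predictions; raising an exception instead would be an equally defensible choice if silent data loss is a concern.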
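Create the function quality_score(tp, tn, fp, fn) in the module quality.py.
In the same module, create the function compute_quality_for_corpus(corpus_dir), which evaluates the filter quality from the files !truth.txt and !prediction.txt in the given corpus directory. It can build on read_classification_from_file() and quality_score().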
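A minimal sketch of how these pieces might fit together is shown below. Several details are assumptions, because this page does not specify them: the classification files are taken to contain one "name tag" pair per line, read_classification_from_file() is a hypothetical stand-in for the reader mentioned above, and quality_score() uses plain accuracy as a placeholder for whatever formula the assignment prescribes. The counting is done inline to keep the example self-contained; in the assignment, BinaryConfusionMatrix from confmat.py would presumably do that job.

```python
import os


def read_classification_from_file(path):
    """Hypothetical reader: each non-empty line is assumed to be '<name> <tag>'."""
    classification = {}
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                classification[parts[0]] = parts[1]
    return classification


def quality_score(tp, tn, fp, fn):
    """Assumed placeholder formula: plain accuracy."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def compute_quality_for_corpus(corpus_dir):
    """Compare !truth.txt with !prediction.txt and return the quality score."""
    truth = read_classification_from_file(os.path.join(corpus_dir, '!truth.txt'))
    pred = read_classification_from_file(os.path.join(corpus_dir, '!prediction.txt'))
    counts = {'tp': 0, 'tn': 0, 'fp': 0, 'fn': 0}
    # Both files are assumed to list the same set of names.
    for name, true_tag in truth.items():
        if pred[name] == 'SPAM':
            counts['tp' if true_tag == 'SPAM' else 'fp'] += 1
        else:
            counts['tn' if true_tag == 'OK' else 'fn'] += 1
    return quality_score(**counts)
```

Keeping quality_score() as a separate function means the scoring formula can be changed without touching the file handling.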