Peter Norvig on Errors - Dictionary of Arguments
Norvig I 710
Errors/Norvig/Russell: minimiz[ing] the error rate … is not the full story. E.g., it is worse to classify non-spam as spam (and thus potentially miss an important message) than to classify spam as non-spam (and thus suffer a few seconds of annoyance). So a classifier with a 1% error rate, where almost all the errors were classifying spam as non-spam, would be better than a classifier with only a 0.5% error rate, if most of those errors were classifying non-spam as spam.
Artificial Intelligence: In machine learning it is traditional to express utilities by means of a loss function. The loss function L(x, y, ŷ) is defined as the amount of utility lost by predicting h(x) = ŷ when the correct answer is f(x) = y:

L(x, y, ŷ) = Utility(result of using y given an input x)
− Utility(result of using ŷ given an input x)
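The definition above can be sketched in code for the spam example. This is a minimal illustration, not from the source: the utility numbers (−10 for losing a real message, −1 for a piece of spam in the inbox) are assumed purely to make the asymmetry concrete.

```python
# Illustrative sketch of L(x, y, y_hat) as a utility difference.
# The specific utility values are assumptions, not from the text.

def utility(prediction, truth):
    """Utility of acting on `prediction` when the true class is `truth`."""
    if prediction == truth:
        return 0.0            # correct classification costs nothing
    if truth == "ham" and prediction == "spam":
        return -10.0          # missed important message: expensive
    return -1.0               # spam in the inbox: mildly annoying

def loss(y, y_hat):
    """L(x, y, y_hat) = Utility(using y) - Utility(using y_hat)."""
    return utility(y, y) - utility(y_hat, y)

print(loss("ham", "spam"))   # 10.0: non-spam classified as spam
print(loss("spam", "ham"))   # 1.0: spam classified as non-spam
print(loss("spam", "spam"))  # 0.0: correct prediction, no loss
```

With these (assumed) utilities, misclassifying ham is ten times as costly as misclassifying spam, which is exactly the asymmetry the running example relies on.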
>Learning theory/Norvig, >Learning/AI Research.
Norvig I 757
Sometimes not all errors are equal. A loss function tells us how bad each error is; the goal is then to minimize loss over a validation set.
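Minimizing loss over a validation set can be sketched as follows. This example is illustrative and not from the source: the validation-set size and the composition of each classifier's errors are assumptions chosen to match the 1% vs. 0.5% error rates discussed above, with the same assumed per-error costs (10 for non-spam classified as spam, 1 for spam classified as non-spam).

```python
# Illustrative sketch: comparing two classifiers by average loss on a
# validation set rather than by raw error rate. Counts and costs are
# assumed for illustration, not taken from the text.

def avg_loss(n, fp, fn, fp_cost=10.0, fn_cost=1.0):
    """Mean loss over n examples, with fp errors of type ham->spam
    (cost fp_cost each) and fn errors of type spam->ham (cost fn_cost)."""
    return (fp * fp_cost + fn * fn_cost) / n

n = 10_000
# Classifier A: 1% error rate, almost all errors are cheap (spam -> ham).
loss_a = avg_loss(n, fp=5, fn=95)
# Classifier B: 0.5% error rate, mostly expensive errors (ham -> spam).
loss_b = avg_loss(n, fp=40, fn=10)

print(loss_a, loss_b)   # 0.0145 0.041
```

Although classifier B makes fewer errors overall, classifier A has the lower average loss, which is the point of the spam example: the loss function, not the error rate, is what should be minimized.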
Stuart J. Russell & Peter Norvig
Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ 2010