A collection of questions and answers on performance measures for models

**Which is more important to you: model accuracy or model performance?**

Let's answer this with respect to classification problems. Model performance is more important. Accuracy alone is not a reliable measure when the dataset is imbalanced (one class far outnumbers the other). Accuracy also assigns equal weight to every label, which is a disadvantage on imbalanced datasets.

Classification model performance can be evaluated with metrics such as log-loss, accuracy, AUC (area under the curve), and precision and recall (generally used by search engines).

**Can you cite some examples where a false positive is more important than a false negative?**

Consider a model where 1 (positive) means that a mail is spam and 0 (negative) means that the mail is not spam. If false positives are high, important mails will go to the Spam folder, and it may become difficult to retrieve them from the huge chunk of mails there. A false negative would mean that a spam mail lands in the primary mailbox.

It is not difficult to mark a mail in the primary mailbox as spam. But, as mentioned earlier, it is very difficult to retrieve a mail from the Spam folder. Hence, in cases like this, a false positive is more important than a false negative.

**Can you cite some examples where a false negative is more important than a false positive?**

In cancer diagnosis, let 1 (positive) denote positive for cancer and 0 (negative) denote negative for cancer. A false negative would mean that a patient who has cancer has been diagnosed as negative. This situation is very dangerous: because the ML model reported a negative result, the patient will not be subjected to follow-up investigation.

On the other hand, a false positive is not as dangerous as a false negative. If the ML model shows positive for a patient who does not have cancer, the patient will be subjected to further follow-up investigation, which gives a chance to correct the mistake.

**Can you cite some examples where both false positive and false negatives are equally important?**

Consider posting articles on a blog. If an article is read by more than the average number of readers of my blog, it is positive. Otherwise, it is negative.

A false positive would mean that the article appears to have been read by more than the average number of readers, when in truth fewer than average readers have read it. Here, the false positive gives me a wrong motivation, but the same motivation ensures that I keep writing. Writing helps me stay in practice.

A false negative would mean that the article appears to have done no better than my other articles, when in truth it garnered more readers than the average readership of my blog. Here, the false negative gives me a sense of introspection on the quality of my writing and ultimately helps me improve myself.

**What is the most frequent metric to assess model accuracy for classification problems?**

The answer to this question is very domain specific. As an overall idea, we can say that a confusion matrix is better than simple accuracy because it reports more quantities. A ROC curve could prove to be more helpful because the area under it integrates over the whole range of TPR/FPR tradeoffs (one per threshold). Log-loss is another metric, and it is the only one of these that considers the probabilistic score directly.

**Why is Area Under ROC Curve (AUROC) better than raw accuracy as an out-of-sample evaluation metric?**

A ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) for a binary classifier system as its discrimination threshold is varied. Unlike raw accuracy, AUROC has several useful interpretations.

The area equals the probability that a randomly chosen positive example ranks above (is deemed to have a higher probability of being positive than) a randomly chosen negative example.
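The ranking interpretation above can be checked directly on a small example. The sketch below is only illustrative: the function name `auroc_by_ranking` and the toy labels and scores are made up, and ties are counted as half a win, which is a common convention rather than something stated in the text.

```python
from itertools import product


def auroc_by_ranking(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties are counted as half a win (an assumed convention).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): one positive (score 0.4) is ranked below one negative (0.7)
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(auroc_by_ranking(labels, scores))  # 8 of the 9 pairs are ordered correctly
```

This pairwise version is quadratic in the number of points, so it is meant for understanding rather than for large datasets.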

**What is Accuracy?**

Accuracy can be defined as:

`(Number of correctly classified points)/(Total number of points)`

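Accuracy has two main limitations:

1) Imbalanced data: a dumb model (for example, one that always predicts the majority class) could get a very high accuracy. So never use accuracy as the measure on an imbalanced dataset.

2) Accuracy cannot use the probabilistic score; it only looks at the final predicted label.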
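To make the imbalanced-data point concrete, here is a minimal sketch. The 95/5 class split and the "always negative" model are made-up illustrations, not data from any real problem.

```python
def accuracy(y_true, y_pred):
    """(Number of correctly classified points) / (Total number of points)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical imbalanced dataset: 95 negatives and 5 positives
y_true = [0] * 95 + [1] * 5

# A "dumb" model that ignores its input and always predicts the majority class
always_negative = [0] * len(y_true)

print(accuracy(y_true, always_negative))  # 0.95, yet no positive is ever found
```

The model scores 95% accuracy while detecting none of the positive points, which is why accuracy alone is misleading here.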

**Explain about Confusion matrix, TPR, FPR, FNR, TNR?**

A confusion matrix is a square matrix whose cells count predicted versus actual class labels. The dimension of the square equals the number of class labels. A confusion matrix does not consider probabilistic scores.

A good model will have high TNR and TPR. The elements on the principal diagonal of the matrix will be high for a good model.

Important parameters related to Confusion Matrix

TPR: True Positive Rate

FPR: False Positive Rate

FNR: False Negative Rate

TNR: True Negative Rate

TP: Number of true positive points

FP: Number of false positive points

TN: Number of true negative points

FN: Number of false negative points

P: Total actual positive points

N: Total actual negative points

TPR = TP/P; TNR = TN/N; FPR = FP/N; FNR = FN/P

Therefore, with TPR, TNR, FPR and FNR we get better insight into the data than with accuracy alone. It is up to the domain to decide which among TPR, TNR, FPR and FNR is more important.
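The four rates can be computed from the counts in a few lines. This is a small sketch for the binary case; the helper names `confusion_counts` and `rates` and the toy predictions are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def rates(y_true, y_pred):
    """TPR = TP/P, TNR = TN/N, FPR = FP/N, FNR = FN/P."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    p = tp + fn  # total actual positives
    n = tn + fp  # total actual negatives
    return {"TPR": tp / p, "TNR": tn / n, "FPR": fp / n, "FNR": fn / p}


# Toy example (hypothetical): 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 5, 1)
print(rates(y_true, y_pred))
```

Note that TPR + FNR = 1 and TNR + FPR = 1, so each pair carries the same information seen from two sides.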

**What do you understand about Precision & recall, F1-score?**

Precision and recall are often used in information retrieval problems. They are related to the positive class/label of a dataset.

Precision is `TP/(TP+FP)`. It means: of all the points predicted to be positive, what percentage of them are actually positive?

Recall is nothing but the True Positive Rate (TPR). It means: out of all the actual positive points, how many are correctly predicted to be positive?

We want precision to be high, which means that few points are wrongly predicted to be positive. We also want recall to be high, which means that, out of all the actual positive points, most were rightly detected to be positive.

Precision (Pr) and Recall (R) are combined in the F1-Score.

$$F_1 = 2 \cdot \frac{Pr \cdot R}{Pr + R}$$
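As a worked example of these formulas, the sketch below computes precision, recall and F1 from raw counts. The counts are made up, and returning 0.0 when a denominator is zero is an assumed convention, not something the definitions above prescribe.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    # Returning 0.0 for an empty denominator is an assumption of this sketch
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 30 true positives, 10 false positives, 20 false negatives
precision, recall, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
print(precision, recall, f1)  # precision = 30/40, recall = 30/50, F1 = 2/3
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high.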

**What is the ROC Curve and what is AUC (a.k.a. AUROC)?**

Receiver Operating Characteristic (ROC) curve and Area Under the ROC Curve (AUC) are binary classification metrics. The ROC curve is a plot of TPR against FPR. An AUC score integrates over the whole range of thresholds, while the F1 score takes one specific precision and recall pair, which could be viewed as a sample or average. The area under the ROC curve can lie between 0 and 1. 1 signifies a very good model. 0 means a terrible one.

1. If we have imbalanced data, AUC can be high even for a dumb model.

2. AUC does not care about the actual score assigned to a data point, only about the ordering of the scores.

3. AUC of a random model is 0.5.
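The construction of the curve and point 2 above can be illustrated with a short sketch. It sweeps the threshold over every distinct score, collects (FPR, TPR) points and integrates with the trapezoid rule. The function names and the toy data are assumptions for illustration, and labels are assumed to be coded as 0/1 with at least one point of each class.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points obtained by lowering the threshold through each distinct score."""
    p = sum(labels)       # total actual positives (labels are 0/1)
    n = len(labels) - p   # total actual negatives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((fp / n, tp / p))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical)
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = auc_trapezoid(roc_points(labels, scores))

# Squaring keeps the ordering of these scores, so the AUC does not change
squared = [s ** 2 for s in scores]
auc_squared = auc_trapezoid(roc_points(labels, squared))
print(auc, auc_squared)
```

Any transformation that preserves the ordering of the scores leaves the ROC curve, and therefore the AUC, unchanged.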

**What is Log-loss and how does it help to improve performance?**

Given a test set of n points, log-loss is defined as:

$$-\frac{1}{n}\sum_{i=1}^{n}\{y_i\log(P_i)+(1-y_i)\log(1-P_i)\}$$

where `y_i` is the label of the i-th point and `P_i` is its probabilistic score (the predicted probability of the positive class).

Log-loss is small when `P_i` is large for a positive class/label. Log-loss is also small when `P_i` is small for a negative class/label. The log-loss value can lie between 0 and infinity. 0 is the best case. Log-loss takes the actual probabilistic values into consideration.

Log-loss is the average of the negative log of the probability of the correct class label. Log-loss can be extended to multi-class labels.
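The binary log-loss formula translates almost line by line into code. In this sketch the clipping constant `eps` is an assumption added to avoid `log(0)`; it is not part of the definition, and the example probabilities are made up.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Average negative log-probability assigned to the correct class (binary case)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip the probability so that log(0) never occurs (assumed safeguard)
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # hypothetical scores close to the true labels
unsure = [0.6, 0.4, 0.55, 0.45]    # same predicted labels at 0.5, weaker scores

print(log_loss(y_true, confident))
print(log_loss(y_true, unsure))
```

Both sets of scores give the same accuracy at a 0.5 threshold, but the confident and correct scores get the lower log-loss, which is what using the probabilistic score directly means in practice.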

**Explain about R-Squared/ Coefficient of determination**

Coefficient of determination is a performance measure for models where predicted values can be any real number (regression). Let the actual value be `y_i` and the predicted value be `y'_i`; then we can calculate the error as `e_i = y_i - y'_i`.

Now, we define a term, the Total Sum of Squares, `SS_total`, as

$$SS_{total} = \sum_{i=1}^{n}(y_i - \bar{y})^2$$

where

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n}y_i$$

is the average value of the actual `y_i` in the test data.

In the simplest regression model, given a query point, we can return its output as the mean of all the other outputs. For example, to predict the height of a person among 10 persons, we can calculate the mean height of the other 9 persons and assign it as the height of the person under consideration.

The total sum of squares is the sum of squared errors of this simple mean model. Now we define the Sum of Squares of Residuals, `SS_res`, as

$$SS_{res} = \sum_{i=1}^{n}(y_i - y'_i)^2$$

where `y'_i` is the predicted value.

`SS_total` is for a simple mean model, whereas `SS_res` is for the model that is under evaluation. Now, we can define `R^2` as:

$$R^2 = 1-\frac{SS_{res}}{SS_{total}}$$

Case 1: `SS_res = 0`. This will happen when every predicted value is exactly the same as the actual value, that is, every error `e_i = 0`. In this case `R^2 = 1`, which means that our model is phenomenal.

Case 2: `SS_res < SS_total`. In this case, `R^2` will be between 0 and 1.

Case 3: `SS_res = SS_total`. Then `R^2` is 0, which means our model is the same as the simple mean model.

Case 4: `SS_res > SS_total`. Then `R^2` becomes negative, which means our model is worse than a simple mean model.
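The cases above can be reproduced numerically. The sketch below is a direct transcription of the formula; the function name `r_squared` and the toy values are made up, and it assumes the actual values are not all identical (otherwise the total sum of squares is zero and the ratio is undefined).

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_total, assuming SS_total is not zero."""
    mean_y = sum(y_true) / len(y_true)
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - yp) ** 2 for y, yp in zip(y_true, y_pred))
    return 1 - ss_res / ss_total


y_true = [1.0, 2.0, 3.0, 4.0]  # hypothetical actual values, mean is 2.5

print(r_squared(y_true, [1.0, 2.0, 3.0, 4.0]))  # perfect predictions: 1.0
print(r_squared(y_true, [1.5, 2.0, 3.0, 3.5]))  # decent predictions: between 0 and 1
print(r_squared(y_true, [2.5, 2.5, 2.5, 2.5]))  # simple mean model: 0.0
print(r_squared(y_true, [4.0, 3.0, 2.0, 1.0]))  # worse than the mean model: negative
```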

**Explain about Median Absolute Deviation (MAD)? Importance of MAD?**

Errors `e_i` and `SS` can suffer from outlier points, i.e. if one error is very large, our entire `R^2` can go for a toss. `R^2` is not very robust to outliers.

Now, the error `e_i` is a random variable. Instead of the mean, we can choose the median of `e_i` as the central value of the errors: `median(e_i)`.

Median Absolute Deviation: `MAD(e_i) = median(|e_i - median(e_i)|)`

The median is a robust alternative to the mean, and MAD is a robust alternative to the standard deviation.
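A small sketch shows this robustness. It uses Python's standard `statistics` module; the function name `median_absolute_deviation` and the error values, including the single outlier of 100, are made up for illustration.

```python
from statistics import median, pstdev


def median_absolute_deviation(errors):
    """MAD = median(|e_i - median(e)|)."""
    center = median(errors)
    return median([abs(e - center) for e in errors])


clean = [-2.0, -1.0, 0.0, 1.0, 2.0]           # hypothetical errors
with_outlier = [-2.0, -1.0, 0.0, 1.0, 100.0]  # same errors with one outlier

# The standard deviation is inflated by the single outlier
print(pstdev(clean), pstdev(with_outlier))

# The MAD is the same for both lists
print(median_absolute_deviation(clean), median_absolute_deviation(with_outlier))
```

One extreme error moves the standard deviation a long way but leaves the median and the MAD where they were.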