Tuesday, 23 December 2014

Prediction Measure Metrics in MLlib


import org.apache.spark.mllib.evaluation.MulticlassMetrics

val metrics = new MulticlassMetrics(predictionsAndLabels)

1. metrics.confusionMatrix


N target category values generate an N-by-N matrix, where each row corresponds to an actual correct value and each column to a predicted value, in order. The entry at row i and column j counts the number of times an example with true category i was predicted as category j. So, the correct predictions are the counts along the diagonal, and incorrect predictions are everything else. In the example below, counts are indeed high along the diagonal, which is a good sign.

Example:
14019.0  6630.0   15.0    0.0    0.0  1.0   391.0
5413.0   22399.0  438.0   16.0   0.0  3.0   50.0
0.0      457.0    2999.0  73.0   0.0  12.0  0.0
0.0      1.0      163.0   117.0  0.0  0.0   0.0
0.0      872.0    40.0    0.0    0.0  0.0   0.0
0.0      500.0    1138.0  36.0   0.0  48.0  0.0
1091.0   41.0     0.0     0.0    0.0  0.0   891.0
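
For reference, here is a minimal sketch of how such a matrix can be produced end to end. The model and testData names are assumptions: an already-trained classifier and a test RDD of LabeledPoints, neither of which is shown in this post.

import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Pair each prediction with the true label; both are encoded as Doubles.
// model and testData are hypothetical names, assumed to exist already.
val predictionsAndLabels: RDD[(Double, Double)] =
  testData.map(point => (model.predict(point.features), point.label))

val metrics = new MulticlassMetrics(predictionsAndLabels)

// Rows are the true categories, columns are the predicted categories.
println(metrics.confusionMatrix)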

2. metrics.precision/recall


Precision is actually a metric most commonly used for binary classification problems, where there are two category values, not several. In a binary classification problem, where there is some kind of positive and negative class, precision is the proportion of all examples that the classifier marked positive that are actually positive.
It is often accompanied by the metric recall: the proportion of all examples that are actually positive that the classifier marked positive.
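
MulticlassMetrics exposes these values both aggregated over all classes and per label. A small sketch, reusing the metrics object from above:

// Precision and recall aggregated over all classes.
println(s"precision = ${metrics.precision}")
println(s"recall    = ${metrics.recall}")

// Precision and recall for each individual category value.
metrics.labels.foreach { label =>
  println(s"label $label: precision = ${metrics.precision(label)}, recall = ${metrics.recall(label)}")
}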

3. The AUC metric is also used in the evaluation of classifiers.


AUC refers to the Receiver Operating Characteristic (ROC) curve. The curve is created by plotting the true positive rate against the false positive rate at various threshold settings.
AUC stands for Area Under the Curve and equals the area under this ROC curve. It may be viewed as the probability that a randomly chosen “good” example is scored higher than a randomly chosen “bad” one.
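
In MLlib, AUC comes from BinaryClassificationMetrics rather than MulticlassMetrics, since it is a binary-classification measure. A minimal sketch, again assuming hypothetical model and testData values, where the model outputs a score for the positive class and labels are 0.0/1.0:

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.rdd.RDD

// Pair the classifier's score for the positive class with the 0.0/1.0 true label.
val scoresAndLabels: RDD[(Double, Double)] =
  testData.map(point => (model.predict(point.features), point.label))

val binaryMetrics = new BinaryClassificationMetrics(scoresAndLabels)

// Area under the ROC curve: 1.0 is perfect, 0.5 is no better than random guessing.
println(binaryMetrics.areaUnderROC())

// The ROC curve itself, as (false positive rate, true positive rate) points.
val rocPoints = binaryMetrics.roc()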
