Confusion matrix and its application in cybercrime

In machine learning, evaluating how well an algorithm performs is essential to understanding your model.

Here, my aim is to explain the confusion matrix in a way that is accessible both to those who actively use it in cybercrime detection and to those who are just curious. This article is structured as follows:

1. What is a Confusion Matrix?

2. Understanding the Confusion Matrix with a simple example

3. What is its use in cybercrime activities?

What exactly is a Confusion Matrix?

A confusion matrix is a table that summarizes the number of correct and incorrect predictions made by a classifier. It is used to measure the performance of a classification model.

We compare the values predicted for the test data with the true values known to us. This tells us how many cases are classified correctly and how many are classified incorrectly. The table below shows the structure of a confusion matrix.

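Confusion Matrix

                           Actual Positive (1)     Actual Negative (0)
Predicted Positive (1)     True Positive (TP)      False Positive (FP)
Predicted Negative (0)     False Negative (FN)     True Negative (TN)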

For binary classification, where the negative class is 0 and the positive class is 1, the confusion matrix is a 2x2 table in which the columns hold the actual values of the data and the rows hold the values predicted by the model. It therefore has four cells, one for each combination of predicted and actual value.

The four cells of the confusion matrix represent the following.

True Positive: The model predicted positive and the label was actually positive.

True Negative: The model predicted negative and the label was actually negative.

False Positive: The model predicted positive and the label was actually negative. Also known as a Type 1 error.

False Negative: The model predicted negative and the label was actually positive. Also known as a Type 2 error.
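The four definitions above can be turned into a small counting routine. This is only a minimal sketch: the function name tally and the sample labels are made up for illustration, and it assumes the positive class is labelled 1.

```python
def tally(actual, predicted, positive=1):
    """Count the four confusion-matrix cells for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1          # predicted negative, actually negative
        elif p == positive and a != positive:
            fp += 1          # Type 1 error
        else:
            fn += 1          # Type 2 error
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels: 1 = positive class, 0 = negative class
actual    = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]

cells = tally(actual, predicted)
print(cells)
```

Every prediction lands in exactly one of the four cells, so the counts always add up to the number of test samples.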

Understanding the Confusion Matrix with a simple example

Here we have taken a total of 20 cats and dogs, and our model predicts whether each animal is a cat or not.

True Positive (TP) = 6: The model predicted that an animal is a cat and it actually is.

False Positive (Type 1 Error) (FP) = 2: The model predicted that an animal is a cat but it actually is not (it’s a dog).

False Negative (Type 2 Error) (FN) = 1: The model predicted that an animal is not a cat but it actually is.

True Negative (TN) = 11: The model predicted that an animal is not a cat and it actually is not (it’s a dog).

Now let us check the accuracy of the model:

Accuracy simply measures how often the classifier makes the correct prediction. It’s the ratio between the number of correct predictions and the total number of predictions.

Accuracy = ((TP + TN) / (TP + TN + FP + FN)) x 100

= (correct predictions / total predictions) x 100

= ((6 + 11) / (6 + 11 + 2 + 1)) x 100 = (17 / 20) x 100 = 85%
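The same arithmetic can be checked in a few lines of Python. This is a rough sketch: the two label lists are an assumed arrangement of the 20 animals that reproduces the counts given above, not the actual data behind the example.

```python
# Assumed arrangement of the 20 animals: 7 real cats followed by 13 real dogs
actual    = ["cat"] * 7 + ["dog"] * 13
# 6 cats found, 1 cat missed, 2 dogs mistaken for cats, 11 dogs correctly rejected
predicted = ["cat"] * 6 + ["dog"] * 1 + ["cat"] * 2 + ["dog"] * 11

pairs = list(zip(actual, predicted))
tp = pairs.count(("cat", "cat"))   # actual cat, predicted cat
fn = pairs.count(("cat", "dog"))   # actual cat, predicted not a cat
fp = pairs.count(("dog", "cat"))   # actual dog, predicted cat
tn = pairs.count(("dog", "dog"))   # actual dog, predicted not a cat

accuracy = 100 * (tp + tn) / len(pairs)
print(tp, fp, fn, tn)
print(f"Accuracy = {accuracy:.0f}%")
```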

What is its use in cybercrime activities?

An Intrusion Detection System (IDS) checks for malicious activity on a system. It uses an ML model to monitor the packets arriving over the internet and predicts whether each one is normal or an anomaly.

Let us say the model in our IDS examined a total of 165 packets and produced the following confusion matrix.

                           Actual: no attack     Actual: attack
Predicted: no attack       TP = 100              FP = 10
Predicted: attack          FN = 5                TN = 50

· Positive: Model predicted no attack.

· Negative: Model predicted attack.

· True Negative: Of the 55 times the model predicted an attack, 50 predictions were correct, meaning an attack actually took place 50 times. Because of the prediction, the Security Operations Centre (SOC) receives a notification and can prevent the attack.

· False Negative: Of the 55 times the model predicted an attack, 5 times no attack happened. These can be considered false alarms and are Type II errors.

· True Positive: The model predicted 110 times that no attack would take place, and 100 of those times no attack happened. These are correct predictions.

· False Positive: 10 times an attack actually took place when the model had predicted that no attack would happen. These are Type I errors.

Type 1 error: This type of error can be very dangerous. The system predicted no attack but an attack actually took place, so no notification reached the security team and nothing could be done to prevent it. The False Positive cases above fall into this category, so one of the aims of the model is to minimize this value.

Type 2 error: This type of error is not very dangerous, because the system is actually safe even though the model predicted an attack. The team is notified and checks for malicious activity. This does not cause any harm. Such cases can be termed false alarms.

We can use confusion matrix to calculate various metrics:

Accuracy: The values of the confusion matrix are used to calculate the accuracy of the model. It is the ratio of correct predictions to all predictions (the total number of values).

Accuracy = ((TP + TN) / (TP + TN + FP + FN)) x 100

Precision is true positives divided by predicted positives, i.e. TP / (TP + FP)

Recall is true positives divided by all actual positives, i.e. TP / (TP + FN)

Specificity is true negatives divided by all actual negatives, i.e. TN / (TN + FP)

Misclassification is all incorrect predictions divided by all predictions, i.e. (FP + FN) / (TP + TN + FP + FN), or 1 - Accuracy when accuracy is expressed as a fraction
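Applied to the 165-packet example (TP = 100, TN = 50, FP = 10, FN = 5), these formulas can be sketched as below. The helper name metrics is made up for illustration, the values are returned as fractions rather than percentages, and the sketch assumes no denominator is zero.

```python
def metrics(tp, tn, fp, fn):
    """Compute the metrics listed above from the four confusion-matrix cells."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "misclassification": (fp + fn) / total,
    }


# Counts taken from the 165-packet IDS example
ids = metrics(tp=100, tn=50, fp=10, fn=5)
for name, value in ids.items():
    print(f"{name:18s} {value:.3f}")
```

Accuracy and misclassification always sum to 1, which is a quick sanity check on the counts.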

To get the confusion matrix of our model in Python, we use the confusion_matrix() function from the sklearn library.


>>> from sklearn.metrics import confusion_matrix

>>> confusion_matrix(y_test, y_pred)

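The output is a NumPy array. Note that scikit-learn places the actual classes on the rows and the predicted classes on the columns, so for 0/1 labels the array reads [[TN, FP], [FN, TP]], which is the transpose of the layout described earlier.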

Here y_test holds the true values and y_pred holds the values that the model predicted for us.
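A complete, runnable version of this call might look like the following. The label lists are invented for illustration (1 = positive class, 0 = negative class), and the sketch assumes scikit-learn is installed.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical test labels and model predictions
y_test = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

cm = confusion_matrix(y_test, y_pred)
print(cm)

# For 0/1 labels, ravel() flattens the 2x2 array in the order TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)
```

Unpacking with ravel() avoids having to remember which row and column hold which cell.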

Despite its representational power in classification, the confusion matrix on its own is not a very useful tool for comparing IDSs. To solve this problem, different performance metrics are defined in terms of the confusion matrix variables. These metrics produce numeric values that are easy to compare.

Thus, in order to evaluate the effectiveness of an IDS, we need to measure its ability to correctly classify events as normal or intrusive along with other performance objectives, such as the economy in resource usage, resilience to stress and ability to resist attacks directed at the IDS.

Measuring these abilities of an IDS is important to both industry and the research community. It helps us tune the IDS better as well as compare different IDSs. As discussed above, there are many metrics that measure different aspects of an IDS, but no single metric seems sufficient to measure the capability of IDSs objectively. According to a survey conducted by (Tavallaee, 2011), the metrics most widely used by the intrusion detection research community are True Positive Rate (TPR) and False Positive Rate (FPR), along with the ROC curve. In this literature an attack is usually treated as the positive class, which is the reverse of the labelling used in the 165-packet example above.

Originating in signal detection theory (Tavallaee, 2011), ROC curves are used on the one hand to visualize the relation between the detection rate and the false positive rate of a classifier while tuning it, and on the other hand to compare the accuracy of several classifiers. Although this measure is very effective, it has some limitations. The first limitation is that it is dependent on the ratio of attacks to normal traffic. Comparing classifiers based on ROC works fine for the same dataset. However, comparing IDSs evaluated on different datasets is not valid unless the datasets have the same ratio of attack to normal instances. The second problem with ROC curves is that they might be misleading and simply incomplete for understanding the strengths and weaknesses of the candidate system.
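To make the idea of tuning concrete, here is a minimal sketch of how the points of a ROC curve are obtained: the detection threshold is varied and the (FPR, TPR) pair is recorded each time. The scores, labels and thresholds are invented, the function name roc_points is hypothetical, and an attack is labelled 1 (the positive class), unlike in the 165-packet example.

```python
def roc_points(scores, labels, thresholds):
    """Return one (FPR, TPR) pair per threshold; label 1 = attack, 0 = normal."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # A packet is flagged as an attack when its score reaches the threshold
        flagged = [s >= t for s in scores]
        tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
        fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical anomaly scores for eight packets and their true labels
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   1,   0,   0]

points = roc_points(scores, labels, thresholds=[0.75, 0.5, 0.25])
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Lowering the threshold catches more attacks but also raises more false alarms, which is the trade-off the curve visualizes.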

Sometimes it is difficult to determine which IDS is better in terms of FPR and TPR alone. For example, IDS1 may detect 10% more attacks while IDS2 produces 10% fewer false alarms. Which one is better? To solve this problem, (Gu et al., 2006) suggested a single unified objective metric called intrusion detection capability (CID), based on the base rate, the positive predictive value (PPV, also called the Bayesian detection rate) and the negative predictive value (NPV). Such a metric is used to select the best IDS configuration for an operational environment and to evaluate different IDSs.
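The role of the base rate can be illustrated with Bayes' rule. The sketch below computes only PPV and NPV, not the CID metric itself. The function name predictive_values, the base rate of 1% and the TPR/FPR values for the two systems are all assumptions chosen for illustration, with an attack taken as the positive class.

```python
def predictive_values(base_rate, tpr, fpr):
    """PPV and NPV from Bayes' rule; base_rate is the fraction of traffic that is an attack."""
    # PPV: probability that an alarm corresponds to a real attack
    ppv = base_rate * tpr / (base_rate * tpr + (1 - base_rate) * fpr)
    # NPV: probability that traffic without an alarm is really normal
    npv = ((1 - base_rate) * (1 - fpr)
           / ((1 - base_rate) * (1 - fpr) + base_rate * (1 - tpr)))
    return ppv, npv


# Hypothetical systems: IDS1 detects more attacks, IDS2 raises fewer false alarms
ppv1, npv1 = predictive_values(base_rate=0.01, tpr=0.9, fpr=0.2)
ppv2, npv2 = predictive_values(base_rate=0.01, tpr=0.8, fpr=0.1)

print(f"IDS1: PPV = {ppv1:.4f}, NPV = {npv1:.4f}")
print(f"IDS2: PPV = {ppv2:.4f}, NPV = {npv2:.4f}")
```

With these assumed numbers, IDS2 has the higher PPV even though it detects fewer attacks, because at a low base rate the false alarms dominate the alarms raised.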

Thank you!!!