Anomaly Detection in Machine Learning: Classification Algorithms vs Anomaly Detection

Introduction

Anomaly simply means something unusual or abnormal. We often encounter anomalies in our daily life. It can be suspicious activities of an end-user on a network or malfunctioning of equipment. Sometimes it is vital to detect such anomalies to prevent a disaster. For example, detecting a bad user can prevent online fraud or detecting malfunctioning equipment can prevent system failure.

Classification Algorithms vs Anomaly Detection:

Machine learning provides us many techniques to classify things into classes, for example, we have algorithms like logistic regression and support vector machine for classification problems. But these algorithms fail to classify anomalous and non-anomalous problems. In a typical classification problem, we have almost an equal or a comparable number of positive and negative examples. Suppose we have a classification problem in which we have to decide whether a vehicle is a car or not. Then in our dataset the share of cars maybe 40-60%, similarly the share of non-car examples maybe 40-60%. So, we generally have a balanced amount of positive and negative examples, and we train our model on a good amount positive as well as negative examples. On the other hand, in anomaly detection problem we have a significantly lesser amount of positive (anomalous) examples than the negative (non-anomalous) examples. The positive examples may be less than 5% or even 1% (obviously that is why they are anomalous). In such case, a classification algorithm cannot be trained well on positive examples. Here comes the anomaly detection algorithm to rescue us.

Anomaly Detection Algorithm:

Anomaly detection algorithm works on probability distribution technique. Here we use Gaussian distribution to model our data. It is a bell-shaped function given by

Ɲ (µ, σ2)

µ = mean

σ2= Variance

σ = Standard Deviation

If the probability of  χ is Gaussian with mean  µ and variance  σ2  then

  χ  ~ Ɲ ( µ, σ2 )

~’ is called tilde which is read as:  “χ  is distributed as“.

µ or mean describes the center of the curve.

σ or standard deviation and describes width of the curve.

The full function is as follows

p(x,  µ, σ2) = (1/ (σ*sqrt(2π)))* e– (1/2)*(sqr(x-µ)/σ2)

Image Source: Wikipedia

Suppose we have data set X as follows

X = [ [ x11   x12  x13  ….. x1n ]

[ x21   x22  x23  ….. x2n ]

:

:

:

[ xm1  xm2 xm3 ….. xmn ] ]

here xij  means  jth feature  ith  example.

To find mean µ :

We do column-wise summation and divide it by a number of examples. We get a row matrix of dimension (1x n).

µ = (1/m) * ( Σmi=1  Xi )  ; ( Here subscript i represents column no.)

 

=  [ [  µ1   µ2   µ3 ……….  µn ] ]

To find variance  σ2 :

σ2 = (1/m) * ( Σmi=1  (xi – µ)2 ) ; (  Here subscript i represents column no. )

In Octave/MATLAB syntax the above expression can be written as

σ2 = sum ((X-  µ).^2) * (1/m)

= [  σ21   σ22   σ23 ….   σ2n  ]

Here we get  σ2  as a row matrix of dimension (1x n).

Algorithm:

Training algorithm

  1. Choose features xi that you think to be indicative of anomalies.
  2. Split non-anomalous examples as 60% for training set, 20% for cross validation and 20% for test set.
  3. Split anomalous examples as 50% for cross validation set and 50% test set.
  4. Calculate µ and σ2 on training set.
    4. Take an arbitrary value as threshold ɛ.
  5. Check a every example x from cross validation set if it is anomalous or not calculate P(x) as follows

P(x)= Πnj=1 p(xj ;  µj ;  σj2 )

= Πnj=1{(1/ (σ*sqrt(2π)))* exp(- (1/2)*((xjj)2j2))}

= p(x1 ;  µ1 ;  σ12 ) * p(x2 ;  µ2 ;  σ22 ) *p(x3 ;  µ3 ;  σ32 )*…

……*p(xn ;  µn ;  σn2 )

  1. If P(x) is less than threshold ɛ then it is an anomaly otherwise not.
  2. Calculate F1 score for current ɛ.
  3. Repeat steps 4,5 and 6 for different values of ɛ and select ɛ which has highest F1 score

After training, we have µ, σ2 and  ɛ .

Algorithm to check whether a new example is anomaly or not

  1. Calculate P(x) for the new examples x as follows:

P(x)= Πnj=1 p(xj ;  µj ;  σj2 )

= Πnj=1{(1/ (σ*sqrt(2π)))* exp(- (1/2)*((xjj)2j2))}

= p(x1 ;  µ1 ;  σ12 ) * p(x2 ;  µ2 ;  σ22 ) *p(x3 ;  µ3 ;  σ32 )*…

……*p(xn ;  µn ;  σn2 )

  1. If P(x) is less than threshold ɛ then it is an anomaly otherwise not.

F1 Score:

F1 score is an error metrics for skewed data. Skewed data is the data where either of positive and negative example is significantly large than the other (>90%).

F1 score can be given as follows:

F1 = 2*{(precision * recall)/(precision + recall) }

Where,

precision = True Positive/(True positive + False positive )

recall       = True Positive/(True positive + False negative)

True Positive => When actual output is +ve and your algorithm predicts +ve

False Positive => When actual output is -ve and your algorithm predicts +ve

True Negative => When actual output is -ve and your algorithm predicts -ve

False Negative => When actual output is +ve and your algorithm predicts -ve

A good algorithm has high precision and recall. F1 tells us how well our algorithm works.

Higher the F1 score the better.

Image source: Wikipedia

When to use Anomaly Detection:

  • When we have very small number of true examples.
  • When we have different types of anomalies and it is difficult to learn from +ve examples what anomalies look like.

When to use Supervised Learning:

  • When we have large number of both +ve and -ve examples
  • When we have enough +ve examples to sense what new +ve example will look like

Choosing what features to use:

Selection of features affects how well your anomaly detection algorithm works. Select those features which are indicative of anomalies. The features you select must be gaussian. To check whether your features are gaussian or not, plot them. The plotted graph should be bell-shaped.

If your features are not gaussian then transform them into gaussian using any of the functions

  • log(x)
  • log(x+1)
  • log(x+c)
  • sqrt(x)
  • x1/3
 
Share:

Related Posts

Navigating Big Data Storage Challenges

Navigating Big Data Storage Challenges

The last decade or so has seen a big leap in technological advancements. One of the technologies to come up at this time and see a rapid…

Share:

A Deep Dive into 5G Service-Based Architecture (SBA)

5G technology roll out signifies an immense revenue opportunity for telecom industry.

Share:
Technical Documentation

Technical Documentation Review and Tips

Technical reviews are vital for effective and quality documentation. To make this happen, have documentation and its reviews listed as one of the deliverables – just like development or testing. This will place priority on the process, and ensure everyone involved understands the importance of proper and thorough reviews.

Share:
Generative AI and the changing face of Software Development Lifecycle

Generative AI and the changing face of Software Development Lifecycle

Generative AI is revolutionizing many IT segments, and one of such segments is software product development lifecycle (SDLC). This blog summarizes how Generative AI is transforming SDLC with its applications, benefits, and examples.

Share:
Revolutionizing the Impact of Gen AI in Data Centers and Network Infrastructure

Revolutionizing the Impact of Gen AI in Data Centers and Network Infrastructure

The advent of Generation AI (Gen AI) is changing the model of data centers and network infrastructure, indicating a new era of efficiency, scalability, and intelligence. Gen AI, representing the integration of AI into systems, is revamping how data is processed, analyzed, and managed within these critical environments. Network infrastructure benefits from self-optimizing capabilities, guaranteeing optimal performance and security. Read the blog to explore the impact of Gen AI in the datacenter infrastructure management.

Share:
Telecom Technology Trends 2024: Navigating the Evolution

Telecom Technology Trends 2024: Navigating the Evolution

The telecom industry is witnessing a rapid network transformation, enabled by a wide range of pioneering technology trends. The industry is gearing up for constant innovation in the coming year 2024. The digital disruption with technologies like 5G, IoT, and AI set the pace for modern market trends and business development in 2024. Read the blog to explore the major technology trends driving competitive and innovative business strategies in the telecom industry.

Share: