Hidden Markov Model For Insider Threat Detection

One of the most difficult cybersecurity threats to detect is the insider threat, especially when it relates to the detection of fraud. Normally we detect changes of behaviour and identify the key signs of someone committing an insider attack. For this, we might gather data on email traffic, remote access traffic, work patterns, and so on. From this data, we can then make observations and inferences that can be used to define particular states, and these states can be matched against particular indicator patterns of behaviour.

ML and insider threat detection

Machine Learning (ML) often involves two main phases: training and testing. A common set of steps is to define the features and classes within the training data set, select a subset of attributes for classification, and apply a learning model to the training data. The learning model is then applied to the rest of the data and the success rate determined. The basic process for applying machine learning to cyber security is as follows (a rough sketch of these stages is given after the list):

  • Information sources. This involves defining the sources of information that would be required to capture the right information.
  • Data capturing tools. This involves creating the software agents required to capture the necessary data.
  • Data pre-processing. This involves processing the data into a format which is ready for the analysis part.
  • Feature extraction. This involves defining the key features that would be required for the analysis engine.
  • Analysis engine. This involves the creation of an analysis engine which takes the features and creates scoring to evaluate risks.
  • Decision engine. This takes the scoring systems from the analysis stage and makes a reasoned decision on the level of risk involved.
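As a rough illustration of how these stages fit together (the function names, record fields and the simple threshold score below are hypothetical placeholders, not a reference implementation):

# Hypothetical sketch of the pipeline stages described above
def collect_logs():
    # Information sources / data capture: gather raw activity records
    return [{"user": "alice", "emails_sent": 40, "attachment_mb": 150, "weekend_logins": 3}]

def preprocess(record):
    # Data pre-processing: normalise raw values into numeric form
    return {key: float(value) for key, value in record.items() if key != "user"}

def extract_features(record):
    # Feature extraction: select the attributes used by the analysis engine
    return [record["emails_sent"], record["attachment_mb"], record["weekend_logins"]]

def analyse(features):
    # Analysis engine: convert the features into a simple risk score
    return sum(value / 100.0 for value in features)

def decide(score, threshold=1.0):
    # Decision engine: map the score onto a risk level
    return "high risk" if score > threshold else "low risk"

for raw in collect_logs():
    print(decide(analyse(extract_features(preprocess(raw)))))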

Machine learning provides an opportunity to learn standard patterns, and then apply these within standard signature detection methods, such as for detecting viruses, botnet activities, and standard hacking tools. This works well where we have automated and scripted attacks, and where machines can learn well-defined patterns, such as linking given IP addresses, TCP ports, and domain lookups to a given threat. Normally this would then be detected using standard detection signatures, such as within firewalls and intrusion detection systems. For signatures, we typically look within network and machine logs for standard signs of activity.

The major challenge for machine learning is in human-initiated attacks, such as insider threats. This includes data theft, fraud and the malicious misconfiguration of equipment. Recent studies have shown that 70% of fraud is perpetrated by insiders but that 90% of the investment in security focuses on external threats.

In threats such as fraud and data theft, there are unlikely to be standard signs of the threat within network logs, and the focus is likely to be on behavioural analysis. In this case, anomaly detection is often used: the machine tries to learn normal behaviour and then determines when we move away from it. For example, a user may send 10 emails per day, but on one day they send 1,000, and the machine would then flag this as an anomaly. A single flag could represent a risk, or a given number of flags within a defined time period could. For example, a financial organisation might raise an amber flag when a given user sends more than 10 emails with file attachments greater than 10 MB over the course of a week, and a red flag for more than 20 such emails.
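As an illustration of this kind of rule, the sketch below counts large-attachment emails per user over the most recent week and raises the amber/red flags mentioned above; the log format and field values are hypothetical:

from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical email log: (user, timestamp, attachment size in MB)
emails = [
    ("alice", datetime(2023, 5, 1), 15.0),
    ("alice", datetime(2023, 5, 2), 22.0),
    ("bob", datetime(2023, 5, 2), 1.0),
]

def flag_users(log, window_days=7, size_mb=10, amber=10, red=20):
    # Count emails with attachments larger than size_mb per user within the window
    counts = defaultdict(int)
    cutoff = max(ts for _, ts, _ in log) - timedelta(days=window_days)
    for user, ts, size in log:
        if ts >= cutoff and size > size_mb:
            counts[user] += 1
    # Map the counts onto flags using the amber/red thresholds
    return {user: "red" if n > red else "amber" if n > amber else "none"
            for user, n in counts.items()}

print(flag_users(emails))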

In most systems, these threats will often be manually defined, but for machine learning to be applied, an organisation would have to compile a history of malicious and non-malicious users over time and then train the system to spot patterns. While this often works well with simulated data, or in data sets which have well-defined patterns, it often has poor success rates in real life.

Unfortunately, humans are often unpredictable and will often change their behaviour. Thus machine learning for insider threat detection often works well with simulated data, but struggles in a real-life system, where too many false positives are triggered. Along with this, if a human knows that they are being monitored, they are likely to moderate their behaviour and not follow a standard pattern for an insider hack. The logs of activity from which machines could learn insider-threat behaviours are more likely to lie in the HR system and in system domain logs. Greitzer [1] outlines a range of indicators which can be used to detect the presence of insider threats (Table 1). These include disgruntlement, anger management issues and disregard for performance, which could show the signs of someone who might perform a data hack or cause damage to the IT infrastructure. Figure 1 shows the complexity of detecting the insider threat [2].

Table 1: Psycho-social model [1]
Figure 1: Conceptual model of insider threat detection [2]

Researchers have found that most acts are committed within normal working hours and are planned in advance. A recent analysis found that in 3 in 10 of the cases the person involved was defined as “difficult”, and in 17% as being “disgruntled”. For motivation, financial gain was detected in more than 80% of the cases, with revenge in around 23% of the cases. An interesting fact is that 27% of the perpetrators were having money problems at the time. In order to detect these motivations, machine learning would possibly have to intrude on both the internal and external activities of an employee and learn their normal patterns.

For example, a company could monitor email communications to detect a tone that reflects a disgruntled employee, such as detecting emotive terms such as “hate” or “dislike”, or detecting changes of behaviour after review meetings with managers; it might even detect when a person has moved into a less expensive home. This type of activity would have serious implications for the privacy of employees, and would only be justified in high-risk environments.
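A crude illustration of the first of these ideas is simple keyword matching over message text; the word list and scoring below are hypothetical, and a real system would need proper sentiment analysis rather than this:

# Hypothetical list of emotive terms that might indicate disgruntlement
EMOTIVE_TERMS = {"hate", "dislike", "unfair", "fed up"}

def disgruntlement_score(message):
    # Count how many of the emotive terms appear in the message text
    text = message.lower()
    return sum(1 for term in EMOTIVE_TERMS if term in text)

print(disgruntlement_score("I hate this project and I am fed up with these reviews"))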

Overall, machine learning is useful for training on well-defined signatures and then generating signatures that trigger alerts for humans to make sense of. The investigation of human-sourced crime is possibly well beyond the capabilities of computer systems, especially as it could often be seen as spying on employees.

There are various methods we can use to detect insider threats, including the use of decoys and honeypots to catch the insider in the act.

Hidden Markov Model

When we understand the stages that an insider can go through, a Hidden Markov Model (HMM) can be used where we have observations of behaviours, and where these can be matched to the likely sequence of states that has led to this point. For this we have states (X) and observations (O). The transition between states is defined by a Markov model: we move between states according to a state transition probability matrix (A), and each state has an observation (emission) probability matrix (B).
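In the standard notation, this defines a model $\lambda = (A, B, \pi)$, where:

$a_{ij} = P(X_{t+1}=j \mid X_t=i), \qquad b_j(o) = P(O_t=o \mid X_t=j), \qquad \pi_i = P(X_1=i)$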

The starting point will have a given probability of a given state. We then define a transition probability between states, and an emission probability that links states to observations. So let’s take an example of detecting insider threats. In this case we will take daily observations for users, and determine if we think they are hacking our system. Overall, we have determined that there is a 10% chance of someone starting as a hacker. Once they are a hacker, we think they have a 50% chance of staying one, and a 50% chance of no longer being a hacker. Also, if they are not a hacker, we think there is only a 10% chance that they will become a hacker. This gives:
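As matrices, with the first row/column for “Hacker” and the second for “No hacker” (matching the values set in the code below), this is:

$\pi = \begin{pmatrix} 0.1 & 0.9 \end{pmatrix}, \qquad A = \begin{pmatrix} 0.5 & 0.5 \\ 0.1 & 0.9 \end{pmatrix}$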

The internal states of “Hacker” and “Not a hacker” will be hidden from us. Next we can add the observations. If someone is a hacker, there is an 80% chance of them “sending many large attachments in emails”, a 15% chance of them “working weekends”, and a 5% chance of “no alerts”. We can then add the emission probabilities:
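As a matrix, with rows for “Hacker” and “No hacker” and columns for “Large email”, “Weekend working” and “No alerts” (the “No hacker” row is taken from the code below, as it is not stated above), this is:

$B = \begin{pmatrix} 0.8 & 0.15 & 0.05 \\ 0.2 & 0.05 & 0.75 \end{pmatrix}$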

We can then make observations and aim to understand the most likely sequence of states that has led to these outputs. First, we will create our HMM with some Python code [here]:

from hmmlearn import hmm
import numpy as np
import math

# Hidden states and the observable events
states = ('Hacker', 'No hacker')

observations = ('Large email', 'Weekend working', 'No alerts')

# The probabilities are listed here as dictionaries for readability;
# hmmlearn itself is configured with the numpy arrays below.
start_probability = {'Hacker': 0.1, 'No hacker': 0.9}

transition_probability = {
    'Hacker': {'Hacker': 0.5, 'No hacker': 0.5},
    'No hacker': {'Hacker': 0.1, 'No hacker': 0.9},
}

emission_probability = {
    'Hacker': {'Large email': 0.8, 'Weekend working': 0.15, 'No alerts': 0.05},
    'No hacker': {'Large email': 0.2, 'Weekend working': 0.05, 'No alerts': 0.75},
}

# Two hidden states (Hacker, No hacker); in newer versions of hmmlearn
# this discrete-observation model is provided by hmm.CategoricalHMM
model = hmm.MultinomialHMM(n_components=2)
model.startprob_ = np.array([0.1, 0.9])
model.transmat_ = np.array([[0.5, 0.5],
                            [0.1, 0.9]])
model.emissionprob_ = np.array([[0.8, 0.15, 0.05],
                                [0.2, 0.05, 0.75]])

Now we can determine the probability of given observations:

print('Prob of Large email=', math.exp(model.score(np.array([[0]]))))
print('Prob of Weekend working=', math.exp(model.score(np.array([[1]]))))
print('Prob of No alerts=', math.exp(model.score(np.array([[2]]))))

A simple run gives:

Prob of Large email= 0.21500000000000002
Prob of Weekend working= 0.06
Prob of No alerts= 0.8150000000000001

In this case, there is a 21.5% chance of us observing a large email being sent, a 6% chance of an alert for working weekends, and an 81.5% chance of there being no alerts.
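For a single observation, this score is simply the start distribution weighted by the emission probabilities. For example, for “Weekend working”:

$P(\text{Weekend working}) = \pi_H\, b_H(\text{Weekend}) + \pi_N\, b_N(\text{Weekend}) = 0.1 \times 0.15 + 0.9 \times 0.05 = 0.06$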

Now we can go ahead and set up our observations, and then determine the most likely state sequence. Let’s say we have observations of “Large email”, “Large email” and “Weekend working” (note that the result is given as a log probability, and thus we need math.exp() to bring it back to a probability score) [Code]:

# Find the most likely hidden state sequence for the observations
logprob, seq = model.decode(np.array([[0, 0, 1]]).transpose())
print(math.exp(logprob))
print(seq)

Our output is [Code]:

0.0024000000000000015
[0 0 0]

In this case, the most likely state sequence is “Hacker”, “Hacker” and “Hacker”, and this has a probability of 2.4%.
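As a check, this value can be reproduced by hand by multiplying the probabilities along the “Hacker, Hacker, Hacker” path, using the matrices set in the code:

$\pi_H\, b_H(\text{Large email})\, a_{HH}\, b_H(\text{Large email})\, a_{HH}\, b_H(\text{Weekend working}) = 0.1 \times 0.8 \times 0.5 \times 0.8 \times 0.5 \times 0.15 = 0.0024$

Next, we can try “No alerts”, “Weekend working” and “Weekend working” [Code]: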

logprob, seq = model.decode(np.array([[2,1,1]]).transpose())
print(math.exp(logprob))
print(seq)

The result is [Code]:

0.0016402500000000002
[1 1 1]

In this case, the most likely state sequence is “No hacker”, “No hacker” and “No hacker”. But if our next observation is the sending of a large email, we now get [Code]:

logprob, seq = model.decode(np.array([[2,1,1,0]]).transpose())
print(math.exp(logprob))
print(seq)

and this now identifies that a hacker is most likely after the first state [Code]:

0.0003037500000000001
[1 0 0 0]

Conclusions

Human analysts are increasingly swamped by alerts on networks. We are thus using machine learning to provide, at least, a core understanding of the current state of our infrastructures.

References

[1] F. L. Greitzer, A. C. Dalton, L. J. Kangas, C. F. Noonan, and R. E. Hohimer, “Identifying At-risk Employees: Modeling Psychosocial Precursors of Potential Insider Threats.”

[2] P. Legg, N. Moffat, J. R. C. Nurse, J. Happa, I. Agrafiotis, M. Goldsmith, and S. Creese, “Towards a Conceptual Model and Reasoning Structure for Insider Threat Detection.”