The Good and Bad of LLMs in Digital Forensics

Many areas of our work are being probed with LLMs (Large Language Models), and cybersecurity is not immune to this. At present, researchers are investigating their use in many areas, such as supporting SOC analysts, mapping threats to the MITRE framework, and analysing malware.

In fact, Iturbe et al. [2] outline the AI4CYBER framework, which provides a roadmap for the integration of AI into cybersecurity applications. This includes AI4VULN (AI-enhanced vulnerability identification); AI4FIX (AI-driven self-testing and automatic error correction); AI4SIM (simulation of advanced and AI-powered attacks); AI4CTI (AI-enhanced cyber threat intelligence of adversarial AI); AI4FIDS (federated learning-enhanced detection); AI4TRIAGE (root cause analysis and alert triage); AI4SOAR (automatic orchestration and adaptation of combined responses); AI4ADAPT (autonomy and optimization of response adaptation); AI4DECEIVE (smart deception and honeynets); and AI4COLLAB (information sharing with privacy and confidentiality):

AI4CYBER framework [2]

Ferrag et al. [3] have defined SecurityLLM for cybersecurity threat detection. It uses two key elements: SecurityBERT (a cyber threat detection mechanism) and FalconLLM (an incident response and recovery system). It combines a simple classification model with LLMs, and can identify 14 different types of attack with an overall accuracy of 98%. These include: DDoS UDP; DDoS ICMP; SQL injection; Password; Vulnerability_scanner; DDoS TCP; DDoS HTTP; Uploading; Backdoor; Port_Scanning; XSS; Ransomware; MITM; and Fingerprinting.

SecurityLLM [3]
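At a high level, SecurityLLM is thus a two-stage pipeline: a lightweight classifier flags an attack type, and an LLM then supports response and recovery. As a rough illustration only (this is not the authors' code; the base model, the label identifiers, and the toy input are my own assumptions), a BERT-style classifier over the 14 attack labels might be sketched in Python with the Hugging Face transformers library:

# Illustrative sketch of a SecurityBERT-style classifier head over the
# 14 attack classes. Base model, labels and input format are assumptions,
# not the authors' published code.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

LABELS = ["DDoS_UDP", "DDoS_ICMP", "SQL_injection", "Password",
          "Vulnerability_scanner", "DDoS_TCP", "DDoS_HTTP", "Uploading",
          "Backdoor", "Port_Scanning", "XSS", "Ransomware", "MITM",
          "Fingerprinting"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))  # head is untrained here

def classify(event_text: str) -> str:
    """Map a textual representation of a network event to an attack label."""
    inputs = tokenizer(event_text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(classify("src=10.0.0.5 dst=10.0.0.9 proto=TCP flags=SYN rate=9000pps"))

Note that such a classification head would need fine-tuning on labelled traffic before its predictions mean anything; the 98% figure comes from the authors' trained model, not from a skeleton like this.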

Now, an interesting new paper discusses how LLMs could be used in digital forensics [1]:

Scanlon et al. [1] investigated the use of a pre-trained LLM (ChatGPT) for artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education. They found it suitable for low-risk applications, but many other applications still require expert knowledge. Its key strengths include creativity, reassurance, and avoiding blank-page syndrome, especially in areas where ChatGPT cannot easily get things wrong, such as forensic scenario creation and reassurance about evidence, though care must be taken to avoid ChatGPT hallucinations. Another useful application is in code generation and explanation, such as generating commands for tool integration, which can be used as a starting point in an investigation (see the sketch below).
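As a concrete example of that starting point, here is a minimal sketch that asks a model for a packet-analysis command. It assumes the openai Python package and an OPENAI_API_KEY environment variable; the model name, prompt, and capture file name are illustrative choices, not from the paper:

# Minimal sketch: asking an LLM for a draft forensic command. Model name
# and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Give a single tshark command to extract DNS queries "
                   "from capture.pcap, with a one-line explanation.",
    }],
)
# Treat the output as a draft: verify every flag against the tool's
# documentation before running anything on real evidence.
print(response.choices[0].message.content)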

For weaknesses, Scanlon et al. found that it was important to have a good-quality and up-to-date training model; otherwise, ChatGPT could be biased and outdated in its analysis. Generally, it might not be able to find the newest artefacts if it has been trained on relatively old data. Along with this, ChatGPT's accuracy reduces as a task becomes more specific, and any analysis of non-textual data, such as network packets, is less accurate. The length of some evidence logs also caused problems, and they often had to be prefiltered before analysis. A final problem identified is that the output of ChatGPT is often not deterministic, which is bad for reproducibility.
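Both of those last weaknesses can be partially worked around in practice. A hedged sketch, again assuming the openai Python client: prefilter the log down to likely-relevant lines before sending it, and pin temperature and seed to reduce (not eliminate) run-to-run variation. The keyword list, line limit, log file name, and model name are all illustrative assumptions:

# Sketch: prefilter a long log to fit the context window, and pin
# temperature/seed for best-effort repeatability. All parameters here
# are illustrative assumptions.
from openai import OpenAI

KEYWORDS = ("error", "denied", "failed", "login")
MAX_LINES = 200

def prefilter(path: str) -> str:
    """Keep only lines matching investigation keywords, capped in length."""
    with open(path) as f:
        hits = [line for line in f if any(k in line.lower() for k in KEYWORDS)]
    return "".join(hits[:MAX_LINES])

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0,   # reduces, but does not eliminate, non-determinism
    seed=42,         # best-effort reproducibility only
    messages=[{"role": "user",
               "content": "Summarise suspicious activity in this log:\n"
                          + prefilter("auth.log")}],
)
print(response.choices[0].message.content)

Even with a fixed seed, providers only offer best-effort determinism, so for evidential work the prompt, parameters, and raw output should all be recorded alongside the analysis.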

References

[1] Scanlon, M., Breitinger, F., Hargreaves, C., Hilgert, J. N., & Sheppard, J. (2023). ChatGPT for digital forensic investigation: The good, the bad, and the unknown. Forensic Science International: Digital Investigation, 46, 301609.

[2] Iturbe, E., Rios, E., Rego, A., & Toledo, N. (2023, August). Artificial Intelligence for next generation cybersecurity: The AI4CYBER framework. In Proceedings of the 18th International Conference on Availability, Reliability and Security (pp. 1–8).

[3] Ferrag, M. A., Ndhlovu, M., Tihanyi, N., Cordeiro, L. C., Debbah, M., & Lestable, T. (2023). Revolutionizing Cyber Threat Detection with Large Language Models. arXiv preprint arXiv:2306.14263.