Exploring large language models’ security threats with automated tools
Abstract
Purpose: To explore and analyze existing approaches to vulnerability detection in large language models (LLMs), to develop an architecture for an automated vulnerability-testing system, and to create a set of prompts for practical testing of LLM security.
Findings: The research demonstrated that automated systems such as the Garak utility can effectively detect and mitigate attacks on large language models (a brief usage sketch follows this abstract). Applying such systems significantly enhances the security of language models.
Theoretical implications: The paper presents a novel approach to ensuring the security of language models through automated vulnerability testing, contributing to existing theoretical frameworks in cybersecurity and language modeling.
Practical implications: Researchers and developers can utilize the findings of this study to create more secure language models and improve algorithms designed to prevent manipulation and abuse.
Originality/Value: The paper proposes new technological solutions, including an automated testing system based on Garak, that improve the security, resilience, and efficiency of language models. This is significant for the further development of the artificial intelligence and cybersecurity fields.
Research limitations/Future research: The results may vary depending on the specific architectures of language models or types of attacks. Future research may focus on improving algorithms to detect new types of attacks and enhancing system performance under dynamic threat conditions.
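To make the automated workflow concrete, the sketch below shows one way such a scan could be scripted. It is a minimal illustration rather than the authors' exact pipeline: it assumes garak has been installed from PyPI, targets TinyLlama-1.1B-Chat-v1.0 (one of the models listed in the references), and names two probe families, promptinject and dan, that ship with garak, although exact probe identifiers can differ between releases.

```python
# Minimal sketch: running automated garak vulnerability scans from Python.
# Assumptions: garak is installed (pip install garak) and the probe names
# below exist in the installed version (check: python -m garak --list_probes).
import subprocess

MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # a target model from the references

# Probe families: prompt injection (after Perez and Ribeiro) and DAN-style jailbreaks.
for probe in ["promptinject", "dan"]:
    subprocess.run(
        [
            "python", "-m", "garak",
            "--model_type", "huggingface",       # load the target via Hugging Face
            "--model_name", MODEL,
            "--probes", probe,
            "--report_prefix", f"scan_{probe}",  # separate report per probe family
        ],
        check=True,  # raise if a scan exits with an error
    )
```

Each run writes a JSONL report that downstream tooling can aggregate into per-probe pass/fail statistics, which is the kind of automated evaluation the paper describes.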
References
Neelakandan, R. Evaluating LLMs: Beyond Traditional Software Testing (2024).
Islam, N. T., Bahrami Karkevandi, M., Rad, P. Code Security Vulnerability Repair using Reinforcement Learning with Large Language Models (2024). https://doi.org/10.48550/arXiv.2401.07031.
Madamidola, O., Ngobigha, F., Ezzizi, A. Detecting New Obfuscated Malware Variants: A Lightweight and Interpretable Machine Learning Approach (2024). https://doi.org/10.48550/arXiv.2407.07918.
Tehranipoor, M. et al., Large Language Models for SoC Security (2024). https://doi.org/10.1007/978-3-031-58687-3_6.
Mykhaylova, O. et al., Person-of-Interest Detection on Mobile Forensics Data—AI-Driven Roadmap, in: Cybersecurity Providing in Information and Telecommunication Systems, vol. 3654 (2024) 239–251.
Amin, U., Anjum, N., Sayed, Md. E-commerce Security: Leveraging Large Language Models for Fraud Detection and Data Protection (2024). https://doi.org/10.13140/RG.2.2.17604.23689.
Homès, B. Fundamentals of Software Testing, John Wiley & Sons (2024).
Fedynyshyn, T., Opirskyy, I., Mykhaylova, O., A Method to Detect Suspicious Individuals Through Mobile Device Data, in: 5th IEEE International Conference on Advanced Information and Communication Technologies (2023) 82–86.
Pargaonkar, S. Advancements in Security Testing: A Comprehensive Review of Methodologies and Emerging Trends in Software Quality Engineering, Int. J. Sci. Res. 12(9) (2023) 61–66.
Kulyk, M. et al., Using of Fuzzy Cognitive Modeling in Information Security Systems Constructing, in: IEEE 8th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS) (2015) 408–411. https://doi.org/10.1109/IDAACS.2015.7340768.
Khoma, V. et al., Development of Supervised Speaker Diarization System based on the PyAnnote Audio Processing Library, Sensors, 23(4) (2023). https://doi.org/10.3390/s23042082.
An, H. Research on the Development and Risks of Large Language Models, Theor. Natural Sci. 25 (2023) 268–272. https://doi.org/10.54254/2753-8818/25/20240991.
Wang, H. Development of Natural Language Processing Technology, ZTE Communications Technology, 28(2) (2022) 59–64.
Nieminen, M. The Transformer Model and Its Impact on the Field of Natural Language Processing (2023).
Che, W. et al., Natural Language Processing in the Era of Large Models: Challenges, Opportunities and Development, Science in China: Information Science (09) (2023) 1645–1687. https://doi.org/10.3389/frai.2023.1350306.
Singh, S. BERT Algorithm Used in Google Search, Math. Statistician Eng. Appl. 70 (2021) 1641–1650. https://doi.org/10.17762/msea.v70i2.2454.
Iosifov, I. et al., Transferability Evaluation of Speech Emotion Recognition Between Different Languages, Advances in Computer Science for Engineering and Education 134 (2022) 413–426. https://doi.org/10.1007/978-3-031-04812-8_35.
Iosifov, I., Iosifova, O., Sokolov, V. Sentence Segmentation from Unformatted Text using Language Modeling and Sequence Labeling Approaches, in: IEEE 7th International Scientific and Practical Conference Problems of Infocommunications. Science and Technology (2020) 335–337. https://doi.org/10.1109/PICST51311.2020.9468084.
Iosifov, I. et al., Natural Language Technology to Ensure the Safety of Speech Information, in: Cybersecurity Providing in Information and Telecommunication Systems, vol. 3187, no. 1 (2022) 216–226.
Iosifova, O. et al., Techniques Comparison for Natural Language Processing, in: 2nd International Workshop on Modern Machine Learning Technologies and Data Science, vol. 2631, no. 1 (2020) 57–67.
Chen, H. et al., Decoupled Model Schedule for Deep Learning Training (2023). https://doi.org/10.48550/arXiv.2302.08005.
Inan, H. et al., Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations (2023). https://doi.org/10.48550/arXiv.2312.06674.
Xu, H. et al., Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation, arXiv (2024). https://doi.org/10.48550/arXiv.2401.08417.
Törnberg, P. How to Use LLMs for Text Analysis, arXiv (2023). https://doi.org/10.48550/arXiv.2307.13106.
Fasha, M. et al., Mitigating the OWASP Top 10 for Large Language Models Applications using Intelligent Agents, in: 2nd International Conference on Cyber Resilience (2024) 1–9. https://doi.org/10.1109/ICCR61006.2024.10532874.
OWASP, OWASP Top 10 for Large Language Model Applications, OWASP Foundation. URL: https://owasp.org/www-project-top-10-for-large-language-model-applications/
Derczynski, L. Garak Reference Documentation, Garak (2023). URL: https://reference.garak.ai/en/latest/
Derczynski, L. et al., garak: A Framework for Security Probing Large Language Models, arXiv (2024). https://doi.org/10.48550/arXiv.2406.11036.
Pezoa, F. et al., Foundations of JSON Schema, in: Proceedings of the 25th International Conference on World Wide Web (2016) 263–273. https://doi.org/10.1145/2872427.2883029.
Perez, F., Ribeiro, I. Ignore Previous Prompt: Attack Techniques for Language Models, NeurIPS ML Safety Workshop (2022). https://doi.org/10.48550/arXiv.2211.09527.
OpenAI, ChatGPT. URL: https://openai.com/chatgpt/
Hugging Face, TinyLlama-1.1B-Chat-v1.0. Hugging Face. URL: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
Hugging Face, Google/flan-t5-xl. Hugging Face. URL: https://huggingface.co/google/flan-t5-xl
Luo, H. Phi-2: The Surprising Power of Small Language Models, Microsoft Research (2023). URL: https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/