Exploring large language models’ security threats with automated tools
Abstract
Purpose: To explore and analyze existing approaches to vulnerability detection in large language models (LLMs), to develop an architecture for an automated vulnerability-testing system, and to create a set of prompts for practical testing of LLM security.
Findings: The research demonstrated that automated systems such as the Garak utility can effectively detect and mitigate attacks on large language models (a brief usage sketch follows this abstract). Applying such systems significantly enhances the security of language models.
Theoretical implications: The paper presents a novel approach to ensuring the security of language models through automated vulnerability testing, contributing to existing theoretical frameworks in cybersecurity and language modeling.
Practical implications: Researchers and developers can utilize the findings of this study to create more secure language models and improve algorithms designed to prevent manipulation and abuse.
Originality/Value: The paper proposes new technological solutions, including an automated testing system based on Garak, that improve the security, resilience, and efficiency of language models. This is significant for the further development of the artificial intelligence and cybersecurity fields.
Research limitations/Future research: The results may vary depending on the specific architectures of language models or types of attacks. Future research may focus on improving algorithms to detect new types of attacks and enhancing system performance under dynamic threat conditions.
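To make the automated workflow concrete, the sketch below shows one way such a scan could be scripted. It is a minimal illustration rather than the authors' exact pipeline: it assumes garak has been installed from PyPI, targets TinyLlama-1.1B-Chat-v1.0 (one of the models listed in the references), and names two probe families, promptinject and dan, that ship with garak, although exact probe identifiers can differ between releases.

```python
# Minimal sketch: running automated garak vulnerability scans from Python.
# Assumptions: garak is installed (pip install garak) and the probe names
# below exist in the installed version (check: python -m garak --list_probes).
import subprocess

MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # a target model from the references

# Probe families: prompt injection (after Perez and Ribeiro) and DAN-style jailbreaks.
for probe in ["promptinject", "dan"]:
    subprocess.run(
        [
            "python", "-m", "garak",
            "--model_type", "huggingface",       # load the target via Hugging Face
            "--model_name", MODEL,
            "--probes", probe,
            "--report_prefix", f"scan_{probe}",  # separate report per probe family
        ],
        check=True,  # raise if a scan exits with an error
    )
```

Each run writes a JSONL report that downstream tooling can aggregate into per-probe pass/fail statistics, which is the kind of automated evaluation the paper describes.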
References
Neelakandan, R. Evaluating LLMs: Beyond Traditional Software Testing (2024).
Islam, N. T., Bahrami Karkevandi, M., Rad, P. Code Security Vulnerability Repair using Reinforcement Learning with Large Language Models (2024). https://doi.org/10.48550/arXiv.2401.07031.
Madamidola, O., Ngobigha, F., Ezzizi, A. Detecting New Obfuscated Malware Variants: A Lightweight and Interpretable Machine Learning Approach (2024). https://doi.org/10.48550/arXiv.2407.07918.
Tehranipoor, M. et al., Large Language Models for SoC Security (2024). https://doi.org/10.1007/978-3-031-58687-3_6.
Mykhaylova, O. et al., Person-of-Interest Detection on Mobile Forensics Data—AI-Driven Roadmap, in: Cybersecurity Providing in Information and Telecommunication Systems, vol. 3654 (2024) 239–251.
Amin, U., Anjum, N., Sayed, Md. E-commerce Security: Leveraging Large Language Models for Fraud Detection and Data Protection (2024). https://doi.org/10.13140/RG.2.2.17604.23689.
Homès, B. Fundamentals of Software Testing, John Wiley & Sons (2024).
Fedynyshyn, T., Opirskyy, I., Mykhaylova, O., A Method to Detect Suspicious Individuals Through Mobile Device Data, in: 5th IEEE International Conference on Advanced Information and Communication Technologies (2023) 82–86.
Pargaonkar, S. Advancements in Security Testing: A Comprehensive Review of Methodologies and Emerging Trends in Software Quality Engineering, Int. J. Sci. Res. 12(9) (2023) 61–66.
Kulyk, M. et al., Using of Fuzzy Cognitive Modeling in Information Security Systems Constructing, in: IEEE 8th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS) (2015) 408–411. https://doi.org/10.1109/IDAACS.2015.7340768.
Khoma, V. et al., Development of Supervised Speaker Diarization System based on the PyAnnote Audio Processing Library, Sensors, 23(4) (2023). https://doi.org/10.3390/s23042082.
An, H. Research on the Development and Risks of Large Language Models, Theor. Natural Sci. 25 (2023) 268–272. https://doi.org/10.54254/2753-8818/25/20240991.
Wang, H. Development of Natural Language Processing Technology, ZTE Communications Technology, 28(2) (2022) 59–64.
Nieminen, M. The Transformer Model and Its Impact on the Field of Natural Language Processing (2023).
Che, W. et al., Natural Language Processing in the Era of Large Models: Challenges, Opportunities and Development, Science in China: Information Science (09) (2023) 1645–1687. https://doi.org/10.3389/frai.2023.1350306.
Singh, S. BERT Algorithm Used in Google Search, Math. Statistician Eng. Appl. 70 (2021) 1641–1650. https://doi.org/10.17762/msea.v70i2.2454.
Iosifov, I. et al., Transferability Evaluation of Speech Emotion Recognition Between Different Languages, Advances in Computer Science for Engineering and Education 134 (2022) 413–426. https://doi.org/10.1007/978-3-031-04812-8_35.
Iosifov, I., Iosifova, O., Sokolov, V. Sentence Segmentation from Unformatted Text using Language Modeling and Sequence Labeling Approaches, in: IEEE 7th International Scientific and Practical Conference Problems of Infocommunications. Science and Technology (2020) 335–337. https://doi.org/10.1109/PICST51311.2020.9468084.
Iosifov, I. et al., Natural Language Technology to Ensure the Safety of Speech Information, in: Cybersecurity Providing in Information and Telecommunication Systems, vol. 3187, no. 1 (2022) 216–226.
Iosifova, O. et al., Techniques Comparison for Natural Language Processing, in: 2nd International Workshop on Modern Machine Learning Technologies and Data Science, vol. 2631, no. 1 (2020) 57–67.
Chen, H. et al., Decoupled Model Schedule for Deep Learning Training (2023). https://doi.org/10.48550/arXiv.2302.08005.
Inan, H. et al., Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations (2023). https://doi.org/10.48550/arXiv.2312.06674.
Xu, H. et al., Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation, arXiv (2024). https://doi.org/10.48550/arXiv.2401.08417.
Törnberg, P. How to Use LLMs for Text Analysis, arXiv (2023). https://doi.org/10.48550/arXiv.2307.13106.
Fasha, M. et al., Mitigating the OWASP Top 10 for Large Language Models Applications using Intelligent Agents, in: 2nd International Conference on Cyber Resilience (2024) 1–9. https://doi.org/10.1109/ICCR61006.2024.10532874.
OWASP, OWASP Top 10 for Large Language Model Applications, OWASP Foundation. URL: https://owasp.org/www-project-top-10-for-large-language-model-applications/
Derczynski, L. Garak Reference Documentation, Garak (2023). URL: https://reference.garak.ai/en/latest/
Derczynski, L. et al., garak: A Framework for Security Probing Large Language Models, arXiv (2024). https://doi.org/10.48550/arXiv.2406.11036.
Pezoa, F. et al., Foundations of JSON Schema, in: Proceedings of the 25th International Conference on World Wide Web (2016) 263–273. https://doi.org/10.1145/2872427.2883029.
Perez, F., Ribeiro, I. Ignore Previous Prompt: Attack Techniques for Language Models, NeurIPS ML Safety Workshop (2022). https://doi.org/10.48550/arXiv.2211.09527.
OpenAI, ChatGPT. URL: https://openai.com/chatgpt/
Hugging Face, TinyLlama-1.1B-Chat-v1.0. Hugging Face. URL: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
Hugging Face, Google/flan-t5-xl. Hugging Face. URL: https://huggingface.co/google/flan-t5-xl
Luo, H. Phi-2: The Surprising Power of Small Language Models, Microsoft Research (2023). URL: https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/