Analysis of vulnerabilities in large language models
Abstract
Purpose: analyzing the vulnerabilities of large-scale language models (LSLMs) based on the OWASP Top 10 classification for LSLM applications, assessing potential threats, and developing recommendations for improving the security of these models.
Method: quantitative methods, including the use of the Garak tool to identify and analyze vulnerabilities in large language models.
Findings: Several critical vulnerabilities have been identified, including query injection, insecure output processing, training data poisoning, disclosure of confidential information, and others. Even the most advanced models, including GPT-4, do not provide full protection against these threats.
Theoretical implications: The study deepens the understanding of security risks associated with the use of large language models and suggests new approaches to their analysis. This can serve as a basis for further theoretical developments in the field of cybersecurity with respect to LLMs.
Practical implications: The study offers recommendations for developers, researchers, and organizations that use LLMs. Implementing safe development practices and continuous model auditing can significantly increase the security of LLMs.
Value: an innovative approach was used to test the security of LLMs by simulating various attacks, including query injection, data poisoning, hallucinations, and information leaks, which allows identifying potential weaknesses in the functioning of the models.
Future research: In further research, we plan to expand the range of models and methods aimed at optimizing security. This will contribute to a more comprehensive understanding of the problem and improve security strategies.
Paper type: Conceptual research.
Downloads
References
Sreerakuvandana, S., Pappachan, P., Arya, V. (2024). Understanding large language models. In Advances in computational intelligence and robotics book series (pp. 1–24). https://doi.org/10.4018/979-8-3693-3860-5.ch001
Chen, Z., Xu, L., Zheng, H., Chen, L., Tolba, A. et al. (2024). Evolution and prospects of foundation models: from large language models to large multimodal models. Computers, Materials & Continua, 80(2), 1753-1808. https://doi.org/10.32604/cmc.2024.052618.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... Amodei, D. (2020). Language Models are Few-Shot Learners. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2005.14165.
Brown, H., Lee, K., Mireshghallah, F., Shokri, R., Tramèr, F. (2022). What Does it Mean for a Language Model to Preserve Privacy? 2022 ACM Conference on Fairness, Accountability, and Transparency. https://doi.org/10.1145/3531146.3534642
Gholami, N. Y. (2024b). Large Language Models (LLMs) for Cybersecurity: A Systematic review. World Journal of Advanced Engineering Technology and Sciences, 13(1), 057–069. https://doi.org/10.30574/wjaets.2024.13.1.0395
Ye, J., Chen, X., Xu, N., Zu, C., Shao, Z., Liu, S., …Cui, Y. (2023). A comprehensive capability analysis of GPT-3 and GPT-3.5 series models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2303.10420
OpenAI. (2023). GPT-4 Technical Report. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2303.08774
Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., De Las Casas, D., …Bressand, F. (2023). Mistral 7B. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2310.06825
Derczynski, L., Galinkin, E., Majumdar, S. (2024). Garak: A framework for large language model red teaming. // Garak Ai – Available from : https://garak.ai
OWASP. (2023). OWASP top 10 for LLM applications, version 1.1. // OWASP Available from : https://owasp.org/www-project-top-10-for-large-language-model-applications/assets/PDF/OWASP-Top-10-for-LLMs-2023-v1_1.pdf
Capitella, D. (n.d.). Forging thoughts and observations in REACT-based LLM agents via prompt injection. // Wi Labs – Available from : https://labs.withsecure.com/publications/llm-agent-prompt-injection
Petriv, P., Opirskyi, I. (2023). Analysis of the problems of using existing web vulnerability standards. Cybersecurity: education, science, technology, 2(22), 96–112. https://doi.org/10.28925/2663-4023.2023.22.96112
OWASP Application Security Verification Standard (ASVS) [Електронний ресурс] // OWASP - Available from : https://owasp.org/www-project-application-security-verification-standard/
Goldstein, J. A., Sastry, G., Musser, M., DiResta, R., Gentzel, M., Sedova, K. (2023). Generative language models and automated influence operations: emerging threats and potential mitigations. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2301.04246
Aghakhani, H., Dai, W., Manoel, A., Fernandes, X., Kharkar, A., Kruegel, C., …Vigna, G. (2023). TrojanPuzzle: Covertly Poisoning Code-Suggestion Models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2301.02344
Yao, Y., Duan, J., Xu, K., Cai, Y., Sun, Z., Zhang, Y. (2023). A survey on Large Language Model (LLM) Security and Privacy: The Good, the bad, and the ugly. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2312.02003
Team, O. W. A. S. P. (n.d.). OWASP Top 10 API Security Risks – 2023 - OWASP API Security Top 10. [Електронний ресурс] // OWASP – Available from : https://owasp.org/API-Security/editions/2023/en/0x11-t10/
Zhang, M., Press, O., Merrill, W., Liu, A., Smith, N. (2023). How language model hallucinations can snowball. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2305.13534
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, …Schmidt, D. (2023). A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2302.11382
Shayegani, E., Mamun, A., Fu, Y., Zaree, P., Dong, Y., Abu-Ghazaleh, N. (2023). Survey of vulnerabilities in large language models revealed by adversarial attacks. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2310.10844
Abstract views: 154 PDF Downloads: 103
Copyright (c) 2024 Viktor Kolchenko, Dmytro Sabodashko, Mariia Shved, Yuriy Khoma, Nazar Maksymiv

This work is licensed under a Creative Commons Attribution 4.0 International License.
The authors agree with the following conditions:
1. Authors retain copyright and grant the journal right of first publication (Download agreement) with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
2. Authors have the right to complete individual additional agreements for the non-exclusive spreading of the journal’s published version of the work (for example, to post work in the electronic repository of the institution or to publish it as part of a monograph), with the reference to the first publication of the work in this journal.
3. Journal’s politics allows and encourages the placement on the Internet (for example, in the repositories of institutions, personal websites, SSRN, ResearchGate, MPRA, SSOAR, etc.) manuscript of the work by the authors, before and during the process of viewing it by this journal, because it can lead to a productive research discussion and positively affect the efficiency and dynamics of citing the published work (see The Effect of Open Access).