Using Multimodal Large Language Models in Digital Forensics to Detect Military Personnel in Images from Mobile Devices

Taras Fedynyshyn; Serhii Vysotskyi; Mariia Khomik; Oleksandr Hymza; Anastasia Vasylytsia; Bohdan Harasymchuk

doi:10.33445/sds.2025.15.2.15

Taras Fedynyshyn Lviv Polytechnic National University
Serhii Vysotskyi Lviv Polytechnic National University https://orcid.org/0009-0000-5685-7503
Mariia Khomik Lviv Polytechnic National University https://orcid.org/0009-0004-6031-5618
Oleksandr Hymza Lviv Polytechnic National University https://orcid.org/0009-0009-7928-7545
Anastasia Vasylytsia Lviv Polytechnic National University https://orcid.org/0009-0000-9133-8338
Bohdan Harasymchuk Lviv Polytechnic National University https://orcid.org/0009-0008-6075-4820

Keywords: Multimodal Large Language Models (LLMs), Artificial Intelligence in Forensics, Mobile Forensics, Automated Image Recognition

Abstract

Purpose: The purpose of this research was to evaluate the effectiveness of multimodal Large Language Models (LLMs) in detecting military personnel in mobile device images, particularly in challenging scenarios involving mannequins dressed in military uniforms. The study aimed to assess whether such AI models can support forensic analysts by automating parts of the visual identification process in large-scale digital investigations.

Method: The study used a quantitative, experimental design in which two multimodal AI models, Google Gemini 1.5 Pro and the open-source LLaVA, were applied to image analysis. Model performance was assessed with the statistical metrics precision, recall, and accuracy. The sample consisted of 436 images in three categories: military personnel (198), military mannequins (99), and civilians (137), all extracted from an iOS backup to simulate real-world forensic conditions.
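To illustrate the metrics named above, the following is a minimal Python sketch of how precision, recall, and accuracy can be computed for a binary "military presence" label. The function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are assumptions made for illustration; they are not the authors' evaluation code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    precision: float
    recall: float
    accuracy: float


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> BinaryMetrics:
    """Compute precision, recall, and accuracy for boolean labels.

    In this illustration, True stands for "military presence detected".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives

    # Guard against division by zero when a class is absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return BinaryMetrics(precision, recall, accuracy)


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    truth = [True, True, True, False, False]
    preds = [True, True, False, False, False]
    print(binary_metrics(truth, preds))
```

A precision of 1.0 in this setting means no false positives were produced, while recall below 1.0 means some true positives were missed.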

Findings: Both models achieved high precision (1.0) and strong recall (0.99 for Gemini, 0.98 for LLaVA) in detecting military presence. However, they struggled to distinguish real individuals from mannequins: Gemini misclassified 88 of the 99 mannequin images, and LLaVA misclassified 86. Gemini significantly outperformed LLaVA in identifying contextual attributes such as country (0.7875 vs. 0.1218) and unit name (0.2544 vs. 0.0051).
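The mannequin counts reported above can be expressed as error rates with a short calculation. The sketch below uses only the figures stated in the abstract (99 mannequin images, 88 and 86 misclassifications); the helper name `misclassification_rate` is an assumption introduced for illustration.

```python
# Figures taken from the abstract; the helper below is illustrative only.
MANNEQUIN_IMAGES = 99
MISCLASSIFIED = {"Gemini 1.5 Pro": 88, "LLaVA": 86}


def misclassification_rate(errors: int, total: int) -> float:
    """Return the fraction of images that were misclassified."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= errors <= total:
        raise ValueError("errors must be between 0 and total")
    return errors / total


rates = {
    model: misclassification_rate(errors, MANNEQUIN_IMAGES)
    for model, errors in MISCLASSIFIED.items()
}

for model, rate in rates.items():
    print(f"{model}: {rate:.1%} of mannequin images misclassified")
```

Under these figures, roughly 89% (Gemini) and 87% (LLaVA) of the mannequin images were misclassified, so the two models differ by only two images on this subset.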

Theoretical implications: This research expands the application of multimodal LLMs in digital forensics by demonstrating their potential and limitations in semantically complex image recognition tasks. While the study did not challenge existing theories, it revealed the need for enhanced model architectures or training paradigms capable of better contextual interpretation in forensic scenarios.

Practical implications: The study highlights the practical potential of multimodal LLMs as auxiliary tools for forensic analysts, capable of rapidly identifying relevant images in large datasets. This can reduce manual workload, streamline investigative workflows, and improve initial triage in digital forensic investigations.

Value: This research is among the first to specifically explore the use of multimodal LLMs for detecting military personnel in images under real-world forensic conditions. The inclusion of mannequins as visual distractors adds unique value, uncovering significant model limitations and informing the development of more reliable AI tools for forensic applications.

Limitations: The findings are limited by the models' inability to differentiate between mannequins and real individuals, especially in cases of high visual realism. Future research should focus on improving robustness against such misclassifications, enhancing contextual reasoning, and extending evaluations to other object categories or threat identification use cases in digital forensics.

Paper type: Empirical study.
