Study of the scalability of biometric authentication systems based on voice embeddings

Khrystyna Ruda

doi:10.33445/sds.2025.15.1.15

Khrystyna Ruda, Lviv National Polytechnic University, https://orcid.org/0000-0001-8644-411X

Keywords: biometric technologies, TitaNet, voice authentication, cybersecurity, scalability

Abstract

Purpose: This study investigates the scalability of a biometric authentication system built on the TitaNet embedding model, with cosine distance as the comparison metric. In particular, it compares authentication accuracy across varying numbers of users to identify how database size relates to the system’s effectiveness.

Method: Quantitative and experimental methods were used, including the application of machine learning techniques for generating embeddings (using the TitaNet model) and statistical analysis to assess the scalability of the authentication system.
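The comparison step named in the method can be illustrated with a minimal sketch. It assumes embeddings are already available as plain lists of floats; the function names, the toy 3-dimensional vectors, and the threshold value are hypothetical and are not taken from the study.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def verify(enrolled, probe, threshold):
    """Accept the claimed identity when the distance does not exceed the threshold."""
    return cosine_distance(enrolled, probe) <= threshold


# Toy vectors for illustration only; real speaker embeddings
# have many more dimensions.
enrolled = [1.0, 0.0, 0.0]
same_speaker = [0.9, 0.1, 0.0]
other_speaker = [0.0, 1.0, 0.0]

print(verify(enrolled, same_speaker, threshold=0.3))
print(verify(enrolled, other_speaker, threshold=0.3))
```

In this sketch a smaller distance means a closer voice match, so the threshold acts as the maximum distance at which a login attempt is accepted.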

Findings: The study analyzed the impact of the number of users on the effectiveness of the biometric authentication system based on voice embeddings. The results demonstrated that the system maintains high stability and consistent performance with a small user base. However, as the number of users increases, a gradual decline in accuracy is observed, indicating scalability limitations. This trend highlights the need for additional measures to maintain system effectiveness under expanded usage conditions.
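One way such a decline can arise is sketched below under a closed-set identification protocol, which is an assumption and not necessarily the protocol used in the study. The user names, the 2-dimensional vectors, and the helper functions are all hypothetical; the point is only that adding a user whose embedding lies close to an existing one can turn a previously correct match into an error.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def identify(probe, gallery):
    """Return the enrolled user whose embedding is closest to the probe."""
    return min(gallery, key=lambda user: cosine_distance(gallery[user], probe))


def accuracy_for_first_n(gallery, probes, n_users):
    """Identification accuracy when only the first n_users are enrolled."""
    subset = {user: gallery[user] for user in list(gallery)[:n_users]}
    trials = [(user, emb) for user, emb in probes if user in subset]
    correct = sum(identify(emb, subset) == user for user, emb in trials)
    return correct / len(trials)


# Hypothetical enrolled embeddings; "carol" lies close to "alice".
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [0.8, 0.6]}
# Hypothetical test utterances as (true user, embedding) pairs.
probes = [("alice", [0.8, 0.5]), ("bob", [0.1, 0.9]), ("carol", [0.7, 0.7])]

for n in (2, 3):
    print(n, accuracy_for_first_n(gallery, probes, n))
```

With two enrolled users every probe is matched correctly; once the third user is enrolled, the first user's probe falls closer to the newcomer and accuracy drops.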

Theoretical implications: The study deepens the understanding of the impact of the user database size on the effectiveness of biometric authentication systems and demonstrates the potential of the TitaNet embedding model in this context. These findings may serve as a foundation for further developments in the field of cybersecurity related to voice authentication methods.

Practical implications: The study provides recommendations for developers, researchers, and organizations implementing biometric authentication systems based on voice embeddings. Applying optimal threshold settings, testing alternative models, and increasing the amount of input data during user registration can significantly enhance the effectiveness and robustness of such systems in scalable applications.
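Two of these recommendations can be sketched in a few lines, assuming that lists of genuine and impostor cosine distances have already been collected. The helper names and the sample distances are hypothetical, and balancing the false acceptance and false rejection rates is only one common criterion for an operating threshold, not necessarily the one used in the study.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) for a rule that accepts distances <= threshold."""
    far = sum(d <= threshold for d in impostor) / len(impostor)
    frr = sum(d > threshold for d in genuine) / len(genuine)
    return far, frr


def pick_threshold(genuine, impostor):
    """Return the observed distance at which FAR and FRR are closest."""
    def gap(threshold):
        far, frr = far_frr(genuine, impostor, threshold)
        return abs(far - frr)

    candidates = sorted(set(genuine) | set(impostor))
    return min(candidates, key=gap)


def mean_embedding(samples):
    """Average several enrollment embeddings into a single reference vector."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]


# Hypothetical distances from same-speaker and different-speaker trials.
genuine = [0.10, 0.15, 0.20, 0.45]
impostor = [0.40, 0.60, 0.70, 0.80]

threshold = pick_threshold(genuine, impostor)
print(threshold, far_frr(genuine, impostor, threshold))
```

Averaging several enrollment recordings with `mean_embedding` is one simple way to use more registration data per user; whether it improves accuracy for a given model would need to be checked empirically.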

Value: An innovative approach was applied to assess the scalability of the biometric authentication system by comparing accuracy across different numbers of users using the TitaNet embedding model and cosine distance. This approach enables the identification of potential performance limitations and the formulation of recommendations to improve the system’s efficiency in large-scale applications.

Future research: Future studies are planned to expand the range of embedding models and explore alternative distance metrics aimed at enhancing the accuracy and scalability of the biometric authentication system. This will contribute to a more comprehensive understanding of how the choice of model and metric affects authentication effectiveness and improve methods for ensuring system security.

Paper type: Empirical study.
