Study of the scalability of biometric authentication systems based on voice embeddings
Abstract
Purpose: This study investigates the scalability of a biometric authentication system that compares TitaNet voice embeddings using cosine distance. Specifically, it compares authentication accuracy across different numbers of users to determine how database size affects the system’s performance.
Method: Quantitative and experimental methods were used: voice embeddings were generated with the TitaNet model, and statistical analysis was applied to assess how the authentication system scales.
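The matching step described above can be illustrated with a minimal sketch. It assumes that each user is represented by one fixed-length embedding vector and that a probe is accepted when its cosine distance to the enrolled embedding falls at or below a threshold. The function names, the toy 4-dimensional vectors, and the threshold of 0.5 are hypothetical and are not taken from the study; real TitaNet embeddings have far more dimensions and would come from the model rather than being typed in by hand.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def verify(enrolled: Sequence[float], probe: Sequence[float],
           threshold: float = 0.5) -> bool:
    """Accept the probe if it is close enough to the enrolled embedding.

    The default threshold is an illustrative assumption, not a tuned value.
    """
    return cosine_distance(enrolled, probe) <= threshold


# Toy vectors standing in for real speaker embeddings (hypothetical data).
enrolled = [0.9, 0.1, 0.0, 0.4]
same_speaker = [0.8, 0.2, 0.1, 0.5]
other_speaker = [-0.3, 0.9, 0.6, -0.1]

print(verify(enrolled, same_speaker))   # small distance: accepted
print(verify(enrolled, other_speaker))  # large distance: rejected
```

Cosine distance ignores vector length and depends only on direction, which is one reason it is a common choice for comparing embeddings.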
Findings: The study analyzed how the number of users affects the effectiveness of a biometric authentication system based on voice embeddings. With a small user base, the system remains stable and performs consistently. As the number of users grows, accuracy declines gradually, which indicates scalability limitations and the need for additional measures to maintain effectiveness under expanded usage.
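One possible mechanism behind such a decline can be shown with a short worked calculation. It rests on assumptions the abstract does not state: that a probe is compared against every enrolled template, that each impostor comparison has the same small chance of a false match, and that the comparisons are independent. The per-comparison rate of 0.001 is a hypothetical value chosen for illustration, not a result from the study.

```python
def prob_any_false_match(per_comparison_far: float, n_users: int) -> float:
    """Chance that at least one of the other n_users - 1 templates falsely matches.

    Assumes independent comparisons with an identical false accept rate.
    """
    if not 0.0 <= per_comparison_far <= 1.0:
        raise ValueError("per_comparison_far must be a probability")
    if n_users < 1:
        raise ValueError("n_users must be at least 1")
    return 1.0 - (1.0 - per_comparison_far) ** (n_users - 1)


# Hypothetical per-comparison false accept rate of 0.1 %.
for n in (10, 100, 1000):
    print(n, prob_any_false_match(0.001, n))
```

Under these assumptions the chance of at least one false match grows steadily with the size of the database even though the per-comparison rate stays fixed.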
Theoretical implications: The study deepens the understanding of the impact of the user database size on the effectiveness of biometric authentication systems and demonstrates the potential of the TitaNet embedding model in this context. These findings may serve as a foundation for further developments in the field of cybersecurity related to voice authentication methods.
Practical implications: The study provides recommendations for developers, researchers, and organizations implementing biometric authentication systems based on voice embeddings. Applying optimal threshold settings, testing alternative models, and increasing the amount of input data during user registration can significantly enhance the effectiveness and robustness of such systems in scalable applications.
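As an illustration of what threshold tuning can look like, the sketch below picks the threshold at which the false accept rate (FAR) and the false reject rate (FRR) are closest, a common criterion related to the equal error rate. Whether the study used this criterion is not stated in the abstract, so this is an assumption. The distance lists are made-up values and the function names are hypothetical.

```python
from typing import List, Tuple


def error_rates(genuine: List[float], impostor: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when a distance <= threshold is accepted."""
    far = sum(d <= threshold for d in impostor) / len(impostor)
    frr = sum(d > threshold for d in genuine) / len(genuine)
    return far, frr


def pick_threshold(genuine: List[float], impostor: List[float]) -> float:
    """Return the observed distance at which |FAR - FRR| is smallest."""
    def gap(threshold: float) -> float:
        far, frr = error_rates(genuine, impostor, threshold)
        return abs(far - frr)

    candidates = sorted(set(genuine) | set(impostor))
    return min(candidates, key=gap)


# Made-up cosine distances: same-speaker pairs and different-speaker pairs.
genuine = [0.10, 0.15, 0.22, 0.30, 0.41]
impostor = [0.35, 0.52, 0.60, 0.71, 0.83]

best = pick_threshold(genuine, impostor)
print(best, error_rates(genuine, impostor, best))
```

In practice the threshold would be chosen on held-out data and rechecked as the user base grows, since the balance between the two error types can shift with database size.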
Value: An innovative approach was applied to assess the scalability of the biometric authentication system by comparing accuracy across different numbers of users using the TitaNet embedding model and cosine distance. This approach enables the identification of potential performance limitations and the formulation of recommendations to improve the system’s efficiency in large-scale applications.
Future research: Future studies will cover a wider range of embedding models and explore alternative distance metrics, with the aim of improving the accuracy and scalability of the biometric authentication system. This should give a more complete picture of how the choice of model and metric affects authentication effectiveness and help improve methods for ensuring system security.
Paper type: Empirical study.
Copyright (c) 2025 Khrystyna Ruda

This work is licensed under a Creative Commons Attribution 4.0 International License.
The authors agree to the following conditions:
1. Authors retain copyright and grant the journal the right of first publication (Download agreement), with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
2. Authors may enter into separate, additional agreements for the non-exclusive distribution of the journal’s published version of the work (for example, posting it in an institutional repository or publishing it as part of a monograph), with a reference to its first publication in this journal.
3. The journal’s policy permits and encourages authors to post the manuscript of the work online (for example, in institutional repositories, on personal websites, SSRN, ResearchGate, MPRA, SSOAR, etc.) before and during the journal's review process, because this can lead to productive research discussion and can increase citation of the published work (see The Effect of Open Access).