Journal of Science and Innovative Development ISSN 2181-4317

SCIENTIFIC AND THEORETICAL ANALYSIS OF MODELS FOR DETERMINING SEMANTIC SIMILARITY OF UZBEK LANGUAGE TEXTS

Allaberganova Nasiba Muradovna
March 13, 2026
DOI: https://dx.doi.org/10.36522/2181-4317/2026-1/34-44

Abstract

In Natural Language Processing (NLP), measuring semantic textual similarity (STS) underlies many practical tasks, such as information retrieval, question answering, automatic text summarization, and document comparison. For low-resource, agglutinative languages such as Uzbek, the task is particularly challenging because of the language's rich morphological structure and the lack of annotated datasets. As a result, traditional statistical and vector-based models do not identify semantic similarity with sufficient accuracy. This paper proposes a hybrid approach for determining the semantic similarity of Uzbek texts. The approach integrates a Siamese neural network architecture with Transformer-based language models, specifically BERT and Sentence-BERT. In the proposed model, both texts of a pair are encoded by a Siamese network with shared weights, and their semantic proximity is measured in the resulting vector space. Experimental results show that the hybrid Siamese-Transformer model achieves higher accuracy and stability than traditional neural networks and classical embedding-based approaches, with improvements observed in particular in the Spearman and Pearson correlation coefficients. Moreover, the proposed approach is computationally efficient, which broadens its practical applicability to low-resource languages.
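The pipeline described in the abstract (one shared encoder applied to both texts, similarity measured in vector space, evaluation by Pearson and Spearman correlation) can be illustrated with a minimal sketch. It is not the paper's model: the `encode` function below is a hypothetical stand-in that hashes character trigrams, used here only so the example runs without a trained BERT or Sentence-BERT encoder. Cosine similarity is assumed as the proximity measure, and the sentence pairs and gold scores are invented for illustration.

```python
import math
import zlib
from collections import Counter

DIM = 256  # size of the toy embedding space (an assumption, not from the paper)


def encode(text, dim=DIM):
    """Toy shared encoder: hashed character-trigram counts, L2-normalised.

    Stand-in for a BERT / Sentence-BERT encoder; NOT the paper's model.
    """
    padded = f"  {text.lower()}  "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    vec = [0.0] * dim
    for gram, n in counts.items():
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += n
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def siamese_similarity(a, b):
    """Encode both texts with the SAME encoder (shared weights), then
    take the cosine similarity of the two unit-length vectors."""
    u, v = encode(a), encode(b)
    return sum(x * y for x, y in zip(u, v))


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Invented example pairs with hypothetical gold scores on a 0-5 scale.
    pairs = [
        ("Bugun havo issiq", "Bugun havo juda issiq", 4.5),
        ("Men kitob o'qiyapman", "Men kitob o'qidim", 3.5),
        ("Bugun havo issiq", "U maktabga bordi", 0.5),
    ]
    predicted = [siamese_similarity(a, b) for a, b, _ in pairs]
    gold = [score for _, _, score in pairs]
    print("Pearson :", pearson(predicted, gold))
    print("Spearman:", spearman(predicted, gold))
```

In a real setting, `encode` would be replaced by a fine-tuned Transformer encoder and the pairs by an annotated Uzbek STS dataset; the similarity and correlation steps would stay the same.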

Cite this article
Allaberganova Nasiba Muradovna (2026). SCIENTIFIC AND THEORETICAL ANALYSIS OF MODELS FOR DETERMINING SEMANTIC SIMILARITY OF UZBEK LANGUAGE TEXTS. Journal of Science and Innovative Development. https://dx.doi.org/10.36522/2181-4317/2026-1/34-44