Design and Development of Unsupervised Stemmer for Sindhi Language - Bharti Nathani

Abstract

A stemmer is a fundamental NLP tool that normalizes inflected words by removing their suffixes. This paper presents a stemmer designed and developed for the Sindhi language using an unsupervised approach. Suffixes are extracted with Linguistica 5 [22], a tool for unsupervised learning of morphology, from a raw corpus of 10,000 Sindhi sentences. The unsupervised stemmer is evaluated using the direct approach on 1,000 words extracted from a Sindhi dictionary, and its results are compared with an existing rule-based stemmer [32] and a lemmatizer [33].
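The idea of suffix-stripping stemming and direct evaluation can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the longest-match rule, the minimum stem length, and the accuracy measure are assumptions made for illustration, and the suffix list and gold pairs are hypothetical English stand-ins rather than Linguistica output or Sindhi data.

```python
def stem(word, suffixes, min_stem_len=2):
    """Strip the longest matching suffix, keeping at least min_stem_len characters.

    Assumption: longest-match stripping; the paper may use a different rule.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix and word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no suffix applies, so the word is returned unchanged


def accuracy(pairs, suffixes):
    """Fraction of (word, gold_stem) pairs for which the stemmer output equals the gold stem."""
    correct = sum(1 for word, gold in pairs if stem(word, suffixes) == gold)
    return correct / len(pairs)


# Hypothetical suffix list; in the paper, suffixes are learned from a raw corpus.
SUFFIXES = {"s", "ing", "ed"}

# Hypothetical gold standard of (inflected word, expected stem) pairs.
GOLD = [
    ("walking", "walk"),
    ("walked", "walk"),
    ("books", "book"),
    ("sing", "sing"),  # left intact: stripping "ing" would leave a stem that is too short
    ("bus", "bus"),    # over-stemmed to "bu", counted as an error
]

if __name__ == "__main__":
    for word, gold in GOLD:
        print(word, "->", stem(word, SUFFIXES), "(gold:", gold + ")")
    print("accuracy:", accuracy(GOLD, SUFFIXES))
```

In this toy setup four of the five words are stemmed correctly, and the over-stemmed "bus" shows the kind of error that comparison against gold stems is meant to expose.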
