Stemmer

This module includes classes and functions for word stemming.

The Stemmer and the Lemmatizer differ in how they find a word's root. The Stemmer has no understanding of the word's meaning; it only strips a few simple suffixes, so it can return incorrect results for some words. The Lemmatizer works from a reference list of words and their roots, which gives more accurate results. The cost of that accuracy is speed: the Lemmatizer is slower than the Stemmer.
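
The contrast can be illustrated with a small, self-contained sketch. It does not use this library's code or data: the suffix list, the lookup table, and the names toy_stem and toy_lemmatize are all hypothetical, and English words are used only to keep the example short.

```python
# Toy data: both the suffix list and the lookup table are invented for illustration.
SUFFIXES = ("ing", "es", "s")
LEMMA_TABLE = {"running": "run", "boxes": "box", "news": "news"}


def toy_stem(word):
    """Strip the first matching suffix, with no knowledge of the word itself."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in a reference table; return it unchanged if absent."""
    return LEMMA_TABLE.get(word, word)


for w in ("running", "boxes", "news"):
    print(w, toy_stem(w), toy_lemmatize(w))
```

In this sketch the suffix stripper turns "running" into "runn" and "news" into "new", while the table lookup returns "run" and "news". The lookup is only as good as its table, and a real reference list is far larger, which is where the extra cost comes from.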

Stemmer

Bases: StemmerI

This class includes methods for finding the stem of words.

__init__()

Initializes the Stemmer with a sorted list of suffixes.
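
Why the order of the suffix list matters can be sketched as follows. This is an assumption-laden illustration, not the class's actual initializer: the text above does not say how the list is sorted, so longest-first ordering is assumed here, and strip_suffix and the suffixes themselves are hypothetical.

```python
def strip_suffix(word, suffixes):
    """Remove the first suffix in `suffixes` that the word ends with."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


# Hypothetical suffixes in arbitrary order.
raw_suffixes = ["s", "es", "ness"]

# Assumed ordering: longest suffix first, so "es" is tried before "s".
sorted_suffixes = sorted(raw_suffixes, key=len, reverse=True)

unsorted_result = strip_suffix("boxes", raw_suffixes)
sorted_result = strip_suffix("boxes", sorted_suffixes)
print(unsorted_result, sorted_result)
```

With the unsorted list the shorter suffix "s" matches first and leaves "boxe"; with the longest-first list "es" matches first and leaves "box".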

stem(word)

Finds the stem of the word.

Example

>>> stemmer = Stemmer()
>>> stemmer.stem('کتابی')
'کتاب'
>>> stemmer.stem('کتاب‌ها')
'کتاب'
>>> stemmer.stem('کتاب‌هایی')
'کتاب'
>>> stemmer.stem('کتابهایشان')
'کتاب'
>>> stemmer.stem('اندیشه‌اش')
'اندیشه'
>>> stemmer.stem('خانۀ')
'خانه'

Parameters:

Name | Type | Description | Default
word | str | The input word to be stemmed. | required

Returns:

Type | Description
str | The stemmed version of the word.