FaSpell Reader

This module includes classes and functions for reading the FAspell corpus.

The FAspell corpus contains 5,063 Persian spelling errors. It also includes 801 words misrecognized by OCR systems.

FaSpellReader

This class provides methods for reading the FAspell corpus.

Parameters:

Name           Type  Description                                      Default
corpus_folder  str   Path to the folder containing the corpus files.  required

__init__(corpus_folder)

Initializes the FAspell reader.

Parameters:

Name           Type  Description                                      Default
corpus_folder  str   Path to the folder containing the corpus files.  required

main_entries()

Yields misspelled words, their correct forms, and error categories.

Each entry is returned as a tuple: (misspelled_form, correct_form, error_category).

Examples:

>>> faspell = FaSpellReader(corpus_folder='faspell')
>>> next(faspell.main_entries())
("آاهي", "آگاهی", 1)

Yields:

Type                  Description
tuple[str, str, int]  The next entry in the main corpus.
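
Because main_entries() yields plain 3-tuples, its output can be fed straight into standard-library tools. The sketch below tallies entries per error category. The helper name count_by_category and the sample data are hypothetical: only the first sample tuple comes from the example above, and the other two are made-up stand-ins so the snippet runs without the corpus files.

```python
from collections import Counter
from typing import Iterable, Tuple

# Shape of one item yielded by main_entries():
# (misspelled_form, correct_form, error_category)
Entry = Tuple[str, str, int]


def count_by_category(entries: Iterable[Entry]) -> Counter:
    """Count how many entries fall into each error category."""
    return Counter(category for _misspelled, _correct, category in entries)


# Hypothetical stand-in data shaped like main_entries() output.
# Only the first tuple is taken from the documented example;
# the rest are invented for illustration and are not corpus content.
sample_entries = [
    ("آاهي", "آگاهی", 1),
    ("wrod", "word", 2),
    ("teh", "the", 2),
]

category_counts = count_by_category(sample_entries)
```

With the corpus available, the same helper should accept the generator directly, for example count_by_category(FaSpellReader(corpus_folder='faspell').main_entries()).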

ocr_entries()

Yields OCR-misidentified words and their correct equivalents.

Each entry is returned as a tuple: (misidentified_form, correct_form).

Examples:

>>> faspell = FaSpellReader(corpus_folder='faspell')
>>> next(faspell.ocr_entries())
("آمدیم", "آ!دبم")

Yields:

Type             Description
tuple[str, str]  The next OCR entry in the corpus.
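
One possible way to inspect an OCR pair is to list the character positions at which its two strings disagree. The sketch below is a minimal illustration, not part of the library: differing_positions is a hypothetical helper, it compares code points rather than rendered glyphs, and it only handles pairs of equal length. The result does not depend on the order of the two elements in the tuple.

```python
from typing import List


def differing_positions(first: str, second: str) -> List[int]:
    """Return the indices at which two equal-length strings differ.

    Comparison is done code point by code point, so combining marks
    count as separate positions.
    """
    if len(first) != len(second):
        raise ValueError("strings must have the same length")
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]


# ASCII stand-in pair (hypothetical, not corpus data): two substitutions.
sample_positions = differing_positions("abcde", "a!cxe")
```

When looping over ocr_entries(), pairs whose lengths differ would need to be skipped or handled separately, since this helper raises ValueError for them.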