FaSpell Reader
This module includes classes and functions for reading the FAspell corpus.
The FAspell corpus contains 5,063 Persian spelling errors. This corpus also includes 801 misidentifications by OCR systems.
FaSpellReader
This class provides methods for reading the FAspell corpus.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `corpus_folder` | `str` | Path to the folder containing the corpus files. | required |
__init__(corpus_folder)
Initializes the FAspell reader.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `corpus_folder` | `str` | Path to the folder containing the corpus files. | required |
main_entries()
Yields misspelled words, their correct forms, and error categories.
Each entry is returned as a tuple: (misspelled_form, correct_form, error_category).
Examples:
>>> faspell = FaSpellReader(corpus_folder='faspell')
>>> next(faspell.main_entries())
("آاهي", "آگاهی", 1)
Yields:
| Type | Description |
|---|---|
| `tuple[str, str, int]` | The next entry in the main corpus. |
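Because `main_entries()` yields plain `(misspelled_form, correct_form, error_category)` tuples, the results can be aggregated with standard-library tools. The following is a minimal sketch that tallies entries per error category. The helper name `count_by_category` and the placeholder rows in `sample_entries` are assumptions made for illustration, not part of the library; only the first sample tuple comes from the example above.

```python
from collections import Counter
from typing import Iterable, Tuple

# Shape of one main-corpus entry, as documented above.
Entry = Tuple[str, str, int]


def count_by_category(entries: Iterable[Entry]) -> Counter:
    """Count how many entries fall into each error category.

    Hypothetical helper: accepts any iterable of
    (misspelled_form, correct_form, error_category) tuples.
    """
    return Counter(category for _misspelled, _correct, category in entries)


# Stand-in data so the sketch runs without the corpus files.
# Only the first tuple is taken from the documentation; the others
# are made-up placeholders, not real corpus entries.
sample_entries = [
    ("آاهي", "آگاهی", 1),
    ("placeholder_wrong_a", "placeholder_right_a", 1),
    ("placeholder_wrong_b", "placeholder_right_b", 2),
]

category_counts = count_by_category(sample_entries)
print(category_counts)

# With the real corpus available, the same helper could be fed directly:
#   faspell = FaSpellReader(corpus_folder='faspell')
#   category_counts = count_by_category(faspell.main_entries())
```

Passing the generator straight into the helper keeps memory use low, since entries are consumed one at a time rather than collected into a list first.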
ocr_entries()
Yields OCR-misidentified words and their correct equivalents.
Each entry is returned as a tuple: (misidentified_form, correct_form).
Examples:
>>> faspell = FaSpellReader(corpus_folder='faspell')
>>> next(faspell.ocr_entries())
("آمدیم", "آ!دبم")
Yields:
| Type | Description |
|---|---|
| `tuple[str, str]` | The next OCR entry in the corpus. |
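One possible use of the OCR pairs is to collect every misidentified form seen for a given correct word. The sketch below assumes the tuple order stated in the description, `(misidentified_form, correct_form)`; if the corpus files use the opposite order, swap the two loop variables. The helper name `group_by_correct_form` and the Latin-script sample pairs are hypothetical and serve only to make the example runnable without the corpus.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Shape of one OCR entry, following the documented order.
OcrPair = Tuple[str, str]


def group_by_correct_form(pairs: Iterable[OcrPair]) -> Dict[str, List[str]]:
    """Map each correct form to the list of misidentified forms seen for it.

    Hypothetical helper: assumes each pair is
    (misidentified_form, correct_form).
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for misidentified, correct in pairs:
        grouped[correct].append(misidentified)
    return dict(grouped)


# Made-up placeholder pairs, not real corpus entries.
sample_pairs = [
    ("c1ock", "clock"),
    ("dock", "clock"),
    ("he11o", "hello"),
]

grouped = group_by_correct_form(sample_pairs)
print(grouped)

# With the real corpus available, the generator could be passed directly:
#   faspell = FaSpellReader(corpus_folder='faspell')
#   grouped = group_by_correct_form(faspell.ocr_entries())
```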