Mizan Reader
This module includes classes and functions for reading the Mizan corpus.
The Mizan corpus contains more than 1 million English sentences (mostly from classical literature) and their Persian translations. It was prepared by the Secretariat of the Supreme Council of Information and Communication Technology.
MizanReader
A reader for the Mizan corpus.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `corpus_folder` | `str` | Path to the folder containing the Mizan corpus files. | required |
__init__(corpus_folder)
Initializes the Mizan reader.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `corpus_folder` | `str` | Path to the folder containing the Mizan corpus files. | required |
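To make the lazy, sentence-by-sentence behaviour of the methods below concrete, here is a minimal sketch of a reader over two line-aligned text files. It is not the actual `MizanReader` implementation: the class name `ParallelReaderSketch` and the file names `en.txt` and `fa.txt` are assumptions made purely for illustration.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Tuple


class ParallelReaderSketch:
    """Hypothetical stand-in for a reader over two line-aligned files."""

    def __init__(self, corpus_folder: str) -> None:
        # File names are assumed for this sketch, not taken from the corpus.
        self._en_path = Path(corpus_folder) / "en.txt"
        self._fa_path = Path(corpus_folder) / "fa.txt"

    def _lines(self, path: Path) -> Iterator[str]:
        # Read lazily so a large file is never loaded into memory at once.
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                yield line.strip()

    def english_sentences(self) -> Iterator[str]:
        yield from self._lines(self._en_path)

    def persian_sentences(self) -> Iterator[str]:
        yield from self._lines(self._fa_path)

    def english_persian_sentences(self) -> Iterator[Tuple[str, str]]:
        # Pair the n-th English line with the n-th Persian line.
        yield from zip(self.english_sentences(), self.persian_sentences())


# Build a tiny two-sentence corpus in a temporary folder and read it back.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "en.txt").write_text("Hello\nThank you\n", encoding="utf-8")
    Path(folder, "fa.txt").write_text("سلام\nمتشکرم\n", encoding="utf-8")
    reader = ParallelReaderSketch(folder)
    pairs = list(reader.english_persian_sentences())
```

Because every method is a generator, nothing is read from disk until the caller starts iterating, which is why the examples in this page call `next(...)` on the result.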
english_persian_sentences()
Yields pairs of English and Persian sentences side by side.
Examples:
>>> mizan = MizanReader("mizan")
>>> next(mizan.english_persian_sentences())
('The story which follows was first written out in Paris during the Peace Conference', 'داستانی که از نظر شما می\u200cگذرد، ابتدا ضمن کنفرانس صلح پاریس از روی یادداشت\u200cهائی که به طور روزانه در حال خدمت در صف برداشته شده بودند')
Yields:
| Type | Description |
|---|---|
| `tuple[str, str]` | A tuple of (English sentence, Persian sentence). |
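Since the method yields pairs lazily, a common need is to look at only the first few pairs without iterating over the whole corpus. The sketch below does this with `itertools.islice`. The helper `first_pairs` and the generator `stand_in_pairs` are hypothetical names introduced here so that the example runs without the corpus files; with the real corpus you would pass `mizan.english_persian_sentences()` instead.

```python
from itertools import islice
from typing import Iterable, List, Tuple


def first_pairs(pairs: Iterable[Tuple[str, str]], n: int) -> List[Tuple[str, str]]:
    """Return at most the first n (English, Persian) pairs."""
    # islice stops after n items, so the rest of the iterator is never consumed.
    return list(islice(pairs, n))


def stand_in_pairs():
    # Stand-in data for illustration; not taken from the Mizan corpus.
    yield ("Hello", "سلام")
    yield ("Thank you", "متشکرم")
    yield ("Goodbye", "خداحافظ")


sample = first_pairs(stand_in_pairs(), 2)
```

Asking for more pairs than are available is safe: `islice` simply stops when the underlying iterator is exhausted.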
english_sentences()
Yields English sentences one by one.
Examples:
>>> mizan = MizanReader("mizan")
>>> next(mizan.english_sentences())
'The story which follows was first written out in Paris during the Peace Conference'
Yields:
| Type | Description |
|---|---|
| `str` | The next English sentence. |
persian_sentences()
Yields Persian sentences one by one.
Examples:
>>> mizan = MizanReader("mizan")
>>> next(mizan.persian_sentences())
'داستانی که از نظر شما میگذرد، ابتدا ضمن کنفرانس صلح پاریس از روی یادداشتهائی که به طور روزانه در حال خدمت در صف برداشته شده بودند'
Yields:
| Type | Description |
|---|---|
| `str` | The next Persian sentence. |
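Persian sentences can contain the zero-width non-joiner (ZWNJ, U+200C), which is invisible when printed but appears as an escape sequence in a `repr`, as in the tuple shown for `english_persian_sentences()`. The following sketch, which uses only built-in string operations, shows how to detect, display, and (if a downstream tool requires it) strip that character. The variable names are illustrative.

```python
ZWNJ = "\u200c"  # zero-width non-joiner, U+200C

# A word from the example sentence, built with an explicit ZWNJ.
word = "می" + ZWNJ + "گذرد"

has_zwnj = ZWNJ in word                 # the character is part of the string
escaped = repr(word)                    # repr() shows it as a \u200c escape
without_zwnj = word.replace(ZWNJ, "")   # removes the invisible character

print(word)     # the ZWNJ is rendered invisibly
print(escaped)  # the ZWNJ is visible as an escape sequence
```

Stripping the ZWNJ changes the spelling of the word, so it is usually better to keep it unless a later processing step cannot handle it.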