Mizan Reader

This module includes classes and functions for reading the Mizan corpus.

The Mizan corpus contains more than 1 million English sentences, drawn mostly from classical literature, paired with their Persian translations. It was prepared by the Secretariat of the Supreme Council of Information and Communication Technology.

MizanReader

A reader for the Mizan corpus.

Parameters:

Name | Type | Description | Default
corpus_folder | str | Path to the folder containing the Mizan corpus files. | required

__init__(corpus_folder)

Initializes the Mizan reader.

Parameters:

Name | Type | Description | Default
corpus_folder | str | Path to the folder containing the Mizan corpus files. | required

english_persian_sentences()

Yields pairs of English and Persian sentences side by side.

Examples:

>>> mizan = MizanReader("mizan")
>>> next(mizan.english_persian_sentences())
('The story which follows was first written out in Paris during the Peace Conference', 'داستانی که از نظر شما می\u200cگذرد، ابتدا ضمن کنفرانس صلح پاریس از روی یادداشت\u200cهائی که به طور روزانه در حال خدمت در صف برداشته شده بودند')

Yields:

Type | Description
tuple[str, str] | A tuple of (English sentence, Persian sentence).
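To illustrate what "side by side" means, here is a minimal sketch of pairing two line-aligned text files. It is not the reader's actual implementation: the helper name `iter_sentence_pairs`, the one-sentence-per-line layout, and the idea that the corpus is stored as two parallel files are all assumptions made for illustration.

```python
from typing import Iterator, Tuple


def iter_sentence_pairs(en_path: str, fa_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (English, Persian) pairs from two line-aligned text files.

    Hypothetical helper: assumes line i of one file is the translation
    of line i of the other, with one sentence per line.
    """
    with open(en_path, encoding="utf-8") as en_file, \
         open(fa_path, encoding="utf-8") as fa_file:
        # zip stops at the shorter file, so unmatched trailing lines are dropped.
        for en_line, fa_line in zip(en_file, fa_file):
            yield en_line.strip(), fa_line.strip()
```

Because the function is a generator, it reads both files lazily and never holds the whole corpus in memory, which matters for a corpus of this size.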

english_sentences()

Yields English sentences one by one.

Examples:

>>> mizan = MizanReader("mizan")
>>> next(mizan.english_sentences())
'The story which follows was first written out in Paris during the Peace Conference'

Yields:

Type | Description
str | The next English sentence.

persian_sentences()

Yields Persian sentences one by one.

Examples:

>>> mizan = MizanReader("mizan")
>>> next(mizan.persian_sentences())
'داستانی که از نظر شما می‌گذرد، ابتدا ضمن کنفرانس صلح پاریس از روی یادداشت‌هائی که به طور روزانه در حال خدمت در صف برداشته شده بودند'

Yields:

Type | Description
str | The next Persian sentence.
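Since each method yields sentences lazily, it is usually convenient to take only the first few items rather than materialize more than a million strings. The sketch below shows one way to do that with `itertools.islice`. The `head` helper and the `fake_sentences` stand-in generator are hypothetical and exist only so the example runs without the corpus files.

```python
from itertools import islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def head(items: Iterable[T], n: int) -> List[T]:
    """Return the first n items of an iterable without consuming the rest."""
    return list(islice(items, n))


def fake_sentences():
    """Stand-in for a reader method; yields placeholder sentences lazily."""
    for i in range(1_000_000):
        yield f"sentence {i}"


print(head(fake_sentences(), 3))
# → ['sentence 0', 'sentence 1', 'sentence 2']
```

With the real reader, the same helper could be called as `head(mizan.english_sentences(), 3)`, assuming `mizan` is a `MizanReader` instance as in the examples above.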