Corpus Readers

Natural language processing depends on data. Collections of text, known as corpora, are the raw material for extracting patterns and training machine learning models. Reading a corpus and converting its raw files into a format suitable for NLP tasks usually takes extra coding and preprocessing effort.

To save you that effort, we provide classes and functions that make it easy to read popular Persian corpora. They exist only to make developers' work easier and are not considered a core part of the Hazm library.
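To show what a corpus reader typically does, here is a minimal sketch of the general pattern: it wraps a raw file and yields documents and sentences lazily, so a large corpus never has to be loaded into memory all at once. `SimpleCorpusReader`, its file layout (documents separated by blank lines), and its naive sentence splitting are hypothetical illustrations. They are not Hazm's API. For a real corpus, use the matching reader listed below.

```python
import re
from pathlib import Path
from typing import Iterator


class SimpleCorpusReader:
    """Hypothetical reader (not part of Hazm): treats blank-line-separated
    blocks in a UTF-8 text file as documents."""

    # Naive sentence boundary: whitespace following '.', '!', '?' or the
    # Persian question mark '؟'. Real corpora need a proper tokenizer.
    _SENT_END = re.compile(r"(?<=[.!?؟])\s+")

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def docs(self) -> Iterator[str]:
        """Yield one document at a time, joining its lines with spaces."""
        buffer: list[str] = []
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    buffer.append(line)
                elif buffer:  # a blank line closes the current document
                    yield " ".join(buffer)
                    buffer = []
        if buffer:  # last document may not end with a blank line
            yield " ".join(buffer)

    def sents(self) -> Iterator[str]:
        """Yield the sentences of every document, in order."""
        for doc in self.docs():
            for sent in self._SENT_END.split(doc):
                if sent:
                    yield sent
```

Because `docs()` and `sents()` are generators, a caller can stream through a corpus with `for sent in reader.sents(): ...` and stop early without reading the rest of the file.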

hamshahri_reader

mirastext_reader

quran_reader

bijankhan_reader

dadegan_reader

universal_dadegan_reader

degarbayan_reader

persica_reader

persian_plain_text_reader

peykare_reader

sentipers_reader

tnews_reader

treebank_reader

verbvalency_reader

wikipedia_reader

mizan_reader

ner_reader

naab_reader

arman_reader

faspell_reader