NER Reader

This module includes classes and functions for reading the Named Entity Recognition (NER) corpus.

The Named Entity Recognition corpus contains about 25 million tagged tokens from Persian Wikipedia, organized into roughly one million sentences.

`NerReader`

This class provides methods for reading the Named Entity Recognition (NER) corpus.

Parameters:

Name	Type	Description	Default
`corpus_folder`	`str`	Path to the folder containing the corpus files.	required

`__init__(corpus_folder)`

Initializes the NER reader.

Parameters:

Name	Type	Description	Default
`corpus_folder`	`str`	Path to the folder containing the corpus files.	required

`sents()`

Yields the sentences of the corpus one at a time, each as a list of `(token, tag)` tuples.

Examples:

>>> ner = NerReader("ner")
>>> next(ner.sents())
[('ویکی‌پدیای', 'O'), ('انگلیسی', 'O'), ('در', 'B-DAT'), ('تاریخ', 'I-DAT'), ('۱۵', 'I-DAT'), ('ژانویه', 'I-DAT'), ('۲۰۰۱', 'I-DAT'), ('(', 'O'), ('میلادی', 'B-DAT'), (')', 'O'), ('۲۶', 'B-DAT'), ('دی', 'I-DAT'), ('۱۳۷۹', 'I-DAT'), (')', 'O'), ('به', 'O'), ('صورت', 'O'), ('مکملی', 'O'), ('برای', 'O'), ('دانشنامه', 'O'), ('تخصصی', 'O'), ('نوپدیا', 'O'), ('نوشته', 'O'), ('شد', 'O'), ('.', 'O')]

Yields:

Type	Description
`list[tuple[str, str]]`	The next sentence in the form of a list of `(token, tag)` tuples.
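
The tags in the example above (`B-DAT`, `I-DAT`, `O`) look like a BIO scheme, where `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. Assuming that reading is correct, the following minimal sketch shows one way to group a sentence yielded by `sents()` into entity spans. The helper `extract_entities` is hypothetical and not part of this module; the sample is the opening of the sentence shown above.

```python
def extract_entities(sentence):
    """Group BIO-tagged tokens into (entity_text, label) pairs.

    Hypothetical helper, not part of NerReader. Assumes tags follow
    the BIO scheme: 'B-X' opens an entity, 'I-X' continues it, 'O' is outside.
    """
    entities = []
    current_tokens = []
    current_label = None

    def flush():
        # Store the entity collected so far, if any.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))

    for token, tag in sentence:
        if tag.startswith("B-"):
            flush()
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # 'O', or an 'I-' tag that does not continue the open entity:
            # close the open entity and skip this token.
            flush()
            current_tokens = []
            current_label = None

    flush()
    return entities


# First tokens of the sentence shown in the sents() example.
sample = [
    ("ویکی‌پدیای", "O"), ("انگلیسی", "O"), ("در", "B-DAT"), ("تاریخ", "I-DAT"),
    ("۱۵", "I-DAT"), ("ژانویه", "I-DAT"), ("۲۰۰۱", "I-DAT"), ("(", "O"),
]
entities = extract_entities(sample)
```

With a real corpus, the same helper could be applied to each sentence in turn, for example `for sentence in ner.sents(): extract_entities(sentence)`. Joining tokens with a space is a simplification made for this sketch.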
