NER Reader
This module includes classes and functions for reading the Named Entity Recognition (NER) corpus.
The Named Entity Recognition corpus contains 25 million tagged tokens from Persian Wikipedia, organized into about one million sentences.
NerReader
This class provides methods for reading the Named Entity Recognition (NER) corpus.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| corpus_folder | str | Path to the folder containing the corpus files. | required |
__init__(corpus_folder)
Initializes the NER reader.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| corpus_folder | str | Path to the folder containing the corpus files. | required |
sents()
Yields sentences one by one as a list of (token, tag) tuples.
Examples:
>>> ner = NerReader("ner")
>>> next(ner.sents())
[('ویکیپدیای', 'O'), ('انگلیسی', 'O'), ('در', 'B-DAT'), ('تاریخ', 'I-DAT'), ('۱۵', 'I-DAT'), ('ژانویه', 'I-DAT'), ('۲۰۰۱', 'I-DAT'), ('(', 'O'), ('میلادی', 'B-DAT'), (')', 'O'), ('۲۶', 'B-DAT'), ('دی', 'I-DAT'), ('۱۳۷۹', 'I-DAT'), (')', 'O'), ('به', 'O'), ('صورت', 'O'), ('مکملی', 'O'), ('برای', 'O'), ('دانشنامه', 'O'), ('تخصصی', 'O'), ('نوپدیا', 'O'), ('نوشته', 'O'), ('شد', 'O'), ('.', 'O')]
Yields:
| Type | Description |
|---|---|
| list[tuple[str, str]] | The next sentence in the form of a list of (token, tag) tuples. |
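Because each sentence is a list of (token, tag) tuples, consecutive tagged tokens can be merged into whole entities. The sketch below is not part of this module: `extract_entities` is a hypothetical helper, and it assumes the tags follow the usual BIO convention suggested by the example output above (`B-X` opens an entity of type `X`, `I-X` continues it, `O` marks a token outside any entity).

```python
def extract_entities(sentence):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    `sentence` is a list of (token, tag) tuples, the shape yielded by sents().
    Hypothetical helper; assumes tags look like "B-DAT", "I-DAT", or "O".
    """
    entities = []
    current_tokens, current_type = [], None

    def flush():
        # Store the entity collected so far, if there is one.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))

    for token, tag in sentence:
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # A B- tag, or an I- tag that does not continue the open entity,
            # starts a new entity.
            flush()
            current_tokens, current_type = [token], label
        elif prefix == "I":
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" (or any unrecognized tag) closes the open entity.
            flush()
            current_tokens, current_type = [], None

    flush()
    return entities


# A short hand-written sentence in the same shape as the sents() output.
sample = [
    ("در", "O"),
    ("۱۵", "B-DAT"),
    ("ژانویه", "I-DAT"),
    ("۲۰۰۱", "I-DAT"),
    ("نوشته", "O"),
    ("شد", "O"),
]
print(extract_entities(sample))
```

With the corpus available, the same helper could be applied to real data, for example `extract_entities(next(NerReader("ner").sents()))`. Since `sents()` is a generator, sentences are read lazily, so processing only the first few does not require loading the whole corpus.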