Sentence Tokenizer
This module contains classes and functions for sentence tokenization.
SentenceTokenizer
Bases: TokenizerI
This class provides methods for splitting text into sentences.
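The base class TokenizerI is most likely NLTK's tokenizer interface. Under that interface a subclass implements `tokenize(text)`, and the base class supplies batch helpers such as `tokenize_sents`. The sketch below mimics that contract with a hypothetical `TokenizerBase` abstract class so it runs without NLTK installed. `TokenizerBase` and `NaiveLineTokenizer` are illustrative names, not part of this module or of NLTK.

```python
from abc import ABC, abstractmethod


class TokenizerBase(ABC):
    """Hypothetical stand-in for an NLTK-style TokenizerI interface."""

    @abstractmethod
    def tokenize(self, text: str) -> list[str]:
        """Split a single string into tokens (here, sentences)."""

    def tokenize_sents(self, texts: list[str]) -> list[list[str]]:
        # Batch helper: apply tokenize() to each input string in turn.
        return [self.tokenize(text) for text in texts]


class NaiveLineTokenizer(TokenizerBase):
    """Toy subclass that treats each non-empty line as one sentence."""

    def tokenize(self, text: str) -> list[str]:
        # Strip each line and drop the ones that end up empty.
        return [line.strip() for line in text.splitlines() if line.strip()]
```

Only `tokenize` has to be written by the subclass; `tokenize_sents(["a\nb", "c"])` then returns `[["a", "b"], ["c"]]` for free, and instantiating `TokenizerBase` directly raises `TypeError` because `tokenize` is abstract.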
__init__()
Constructor.
tokenize(text)
Tokenizes the text into sentences.
Examples:
>>> tokenizer = SentenceTokenizer()
>>> tokenizer.tokenize('جدا کردن ساده است. تقریبا البته!')
['جدا کردن ساده است.', 'تقریبا البته!']
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The text to be tokenized. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | A list of sentences. |
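The docstring does not say how sentence boundaries are detected. One common rule-based approach, shown here as a minimal sketch and not necessarily what `tokenize` does internally, is to split on whitespace that follows sentence-final punctuation, including the Arabic-script question mark `؟`. The pattern and the function name `split_sentences` are assumptions for illustration.

```python
import re

# Assumed boundary rule: whitespace preceded by ., !, ? or the
# Arabic-script question mark. The lookbehind keeps the punctuation
# attached to the sentence it ends.
_BOUNDARY = re.compile(r"(?<=[.!?؟])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences at punctuation-plus-whitespace boundaries."""
    # Trim outer whitespace first so no empty piece appears at either end,
    # then drop any empty fragments that remain.
    return [s.strip() for s in _BOUNDARY.split(text.strip()) if s.strip()]
```

On the docstring's example, `split_sentences('جدا کردن ساده است. تقریبا البته!')` returns `['جدا کردن ساده است.', 'تقریبا البته!']`, the same output shown for `tokenize`. A rule this simple also splits after abbreviations such as "Dr.", which a production tokenizer may handle differently.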
sent_tokenize(text)
Tokenizes text into sentences.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The text to tokenize. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | A list of sentences. |