Sentence Tokenizer

This module contains classes and functions for sentence tokenization.

SentenceTokenizer

Bases: TokenizerI

This class provides methods for splitting text into sentences.

__init__()

Constructor.

tokenize(text)

Tokenizes the text into sentences.

Examples:

>>> tokenizer = SentenceTokenizer()
>>> tokenizer.tokenize('جدا کردن ساده است. تقریبا البته!')
['جدا کردن ساده است.', 'تقریبا البته!']

Parameters:

Name    Type    Description                  Default
text    str     The text to be tokenized.    required

Returns:

Type         Description
list[str]    A list of sentences.
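
To illustrate the kind of splitting shown in the example above, here is a minimal sketch of a punctuation-based sentence splitter. It is not the library's implementation: the helper name split_sentences, the pattern _SENTENCE_END, and the chosen set of terminal marks are all assumptions made for illustration.

```python
import re

# Assumption: a sentence ends at a run of terminal punctuation followed by
# whitespace. This set of marks is illustrative, not the library's definition.
_SENTENCE_END = re.compile(r"([.!?؟]+)\s+")


def split_sentences(text: str) -> list[str]:
    """Hypothetical helper: split text after terminal punctuation."""
    # Replace the whitespace after each terminal mark with a newline,
    # so the punctuation stays attached to its sentence.
    marked = _SENTENCE_END.sub(r"\1\n", text)
    # Newlines already present in the input also act as boundaries here.
    return [part.strip() for part in marked.split("\n") if part.strip()]


sentences = split_sentences("جدا کردن ساده است. تقریبا البته!")
print(sentences)
```

A rule this simple will also split after abbreviations and decimal points written with a trailing space, so it should be read as a starting point only.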

sent_tokenize(text)

Tokenizes text into sentences.

Parameters:

Name    Type    Description              Default
text    str     The text to tokenize.    required

Returns:

Type         Description
list[str]    A list of sentences.