Informal Normalizer

This module contains classes and functions for normalizing informal text.

`InformalNormalizer` ¶

Bases: Normalizer

This class provides methods for normalizing informal Persian text.

Examples:

>>> normalizer = InformalNormalizer()
>>> normalizer.normalize('بابا یه شغل مناسب واسه بچه هام پیدا کردن')
[[['بابا'], ['یک'], ['شغل'], ['مناسب'], ['برای'], ['بچه'], ['هایم'], ['پیدا'], ['کردن', 'کردند']]]

`__init__(verb_file=informal_verbs, word_file=informal_words, seperation_flag=False, **kargs)` ¶

Constructor.

Parameters:

Name	Type	Description	Default
`verb_file`	`str`	Path to the file containing informal verbs.	`informal_verbs`
`word_file`	`str`	Path to the file containing informal words.	`informal_words`
`seperation_flag`	`bool`	If True, inserts spaces where words in the text have been run together.	`False`
`**kargs`	`str`	Optional keyword arguments.	`{}`

`informal_conjugations(verb)` ¶

Generates informal conjugations of a verb.

Examples:

>>> normalizer = InformalNormalizer()
>>> normalizer.informal_conjugations('رفت')
['رفتم', 'رفتی', 'رفته', 'رفتیم', 'رفتین', 'رفتن', ...]

Parameters:

Name	Type	Description	Default
`verb`	`str`	The verb to be conjugated.	required

Returns:

Type	Description
`list[str]`	A list of informal conjugations.
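The forms in the example above all look like the input stem followed by an informal personal ending. The sketch below illustrates only that pattern; it is not the library's implementation, and the names `INFORMAL_ENDINGS` and `toy_informal_conjugations` are hypothetical. The real method returns more forms than these six (note the `...` in the documented output).

```python
from __future__ import annotations

# Hypothetical illustration only: this is not the library's implementation.
# The endings are read off the six forms shown in the documented example.
INFORMAL_ENDINGS = ["م", "ی", "ه", "یم", "ین", "ن"]


def toy_informal_conjugations(past_stem: str) -> list[str]:
    """Append each informal personal ending to a past stem."""
    return [past_stem + ending for ending in INFORMAL_ENDINGS]


forms = toy_informal_conjugations("رفت")
print(forms)
```

Under these assumptions, the printed list should match the first six entries of the documented output for `'رفت'`.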

`normalize(text)` ¶

Converts informal text to standard Persian text.

Examples:

>>> normalizer = InformalNormalizer()
>>> normalizer.normalize('بچه هام پیدا کردن که به جایی برنمیخوره !')
[[['بچه'], ['هایم'], ['پیدا'], ['کردن', 'کردند'], ['که'], ['به'], ['جایی'], ['برنمی‌خورد', 'برنمی‌خوره'], ['!']]]

Parameters:

Name	Type	Description	Default
`text`	`str`	The informal text to be normalized.	required

Returns:

Type	Description
`list[list[list[str]]]`	Nested lists: one list per sentence, each holding one list per token, each holding the candidate normalized forms of that token.
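Judging by the example above, the outer list appears to hold sentences, each sentence holds tokens, and each token holds one or more candidate forms. Under that assumption, a caller that wants plain text has to choose one candidate per token. The sketch below uses the documented output as literal data and simply takes the first candidate; `first_candidates` is a hypothetical helper, not part of the library, and taking the first candidate is only one possible policy.

```python
from __future__ import annotations

# Output copied from the documented example above.
normalized = [[
    ["بچه"], ["هایم"], ["پیدا"], ["کردن", "کردند"], ["که"],
    ["به"], ["جایی"], ["برنمی‌خورد", "برنمی‌خوره"], ["!"],
]]


def first_candidates(result: list[list[list[str]]]) -> list[str]:
    """Join the first candidate of every token into one string per sentence."""
    return [
        " ".join(candidates[0] for candidates in sentence)
        for sentence in result
    ]


sentences = first_candidates(normalized)
print(sentences[0])
```

A caller that needs a different choice (for example, ranking candidates with a language model) can replace `candidates[0]` with its own selection function.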

`normalized_word(word)` ¶

Returns the normalized forms of the word.

Examples:

>>> normalizer = InformalNormalizer()
>>> normalizer.normalized_word('می‌رم')
['می‌روم', 'می‌رم']

Parameters:

Name	Type	Description	Default
`word`	`str`	The word to be normalized.	required

Returns:

Type	Description
`list[str]`	A list of normalized forms of the word.
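In the example above, the returned list contains a formal form followed by the unchanged input. Whether the input is always included is not stated here, so the sketch below treats it as an assumption. `pick_formal` is a hypothetical helper that prefers the first candidate differing from the input and falls back to the input itself.

```python
from __future__ import annotations


def pick_formal(word: str, candidates: list[str]) -> str:
    """Return the first candidate that differs from the input word.

    Falls back to the input when every candidate equals it or the
    candidate list is empty.
    """
    for candidate in candidates:
        if candidate != word:
            return candidate
    return word


# Candidates copied from the documented example above.
candidates = ["می‌روم", "می‌رم"]
word = candidates[1]  # the informal input form
print(pick_formal(word, candidates))
```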

`split_token_words(token)` ¶

Inserts spaces where necessary in the token.

Examples:

>>> normalizer = InformalNormalizer(seperation_flag=True)
>>> normalizer.split_token_words('تورادوست‌دارم')
'تو را دوست دارم'

Parameters:

Name	Type	Description	Default
`token`	`str`	The token to be processed.	required

Returns:

Type	Description
`str`	The token with correct spacing.
