Informal Normalizer
This module contains classes and functions for normalizing informal text.
InformalNormalizer
Bases: Normalizer
This class contains functions for normalizing informal text.
Examples:
>>> normalizer = InformalNormalizer()
>>> normalizer.normalize('بابا یه شغل مناسب واسه بچه هام پیدا کردن')
[[['بابا'], ['یک'], ['شغل'], ['مناسب'], ['برای'], ['بچه'], ['هایم'], ['پیدا'], ['کردن', 'کردند']]]
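Judging by the examples, the result nests one list per sentence, one list per token, and one list of candidate forms per token. A sentence with ambiguous tokens therefore has several full readings. The sketch below is not part of hazm. It copies the output above as a plain literal and expands every combination with `itertools.product`. The helper `all_readings` is hypothetical.

```python
from itertools import product

# Output of normalizer.normalize(...) copied from the example above:
# sentences -> tokens -> candidate forms.
normalized = [[['بابا'], ['یک'], ['شغل'], ['مناسب'], ['برای'], ['بچه'], ['هایم'], ['پیدا'], ['کردن', 'کردند']]]


def all_readings(sentence: list[list[str]]) -> list[str]:
    """Hypothetical helper: join every combination of candidate forms into one string."""
    # product(*sentence) picks exactly one candidate per token, in every possible way.
    return [" ".join(choice) for choice in product(*sentence)]


for sentence in normalized:
    for reading in all_readings(sentence):
        print(reading)
```

Only the last token has two candidates here, so the sentence yields two readings.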
__init__(verb_file=informal_verbs, word_file=informal_words, seperation_flag=False, **kargs)
Constructor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| verb_file | str | Path to the file containing informal verbs. | informal_verbs |
| word_file | str | Path to the file containing informal words. | informal_words |
| seperation_flag | bool | If True, inserts missing spaces where necessary in the text. | False |
| **kargs | str | Optional keyword arguments. | {} |
informal_conjugations(verb)
Generates informal conjugations of a verb.
Examples:
>>> normalizer = InformalNormalizer()
>>> normalizer.informal_conjugations('رفت')
['رفتم', 'رفتی', 'رفته', 'رفتیم', 'رفتین', 'رفتن', ...]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| verb | str | The verb to be conjugated. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | A list of informal conjugations. |
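The six forms shown are the stem رفت followed by six colloquial endings. As a rough illustration only, and not hazm's actual algorithm (the `...` in the example indicates it produces further forms), here is a hypothetical `suffix_conjugations` helper that appends those endings to a stem.

```python
# Endings read off the example output above (رفتم, رفتی, رفته, رفتیم, رفتین, رفتن).
# This list is an assumption for illustration, not hazm's internal data.
INFORMAL_ENDINGS = ["م", "ی", "ه", "یم", "ین", "ن"]


def suffix_conjugations(stem: str) -> list[str]:
    """Hypothetical sketch: append each informal ending to a verb stem."""
    return [stem + ending for ending in INFORMAL_ENDINGS]


print(suffix_conjugations("رفت"))
```

A real conjugator also has to handle prefixes such as negation and the continuous marker, which this sketch ignores.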
normalize(text)
Converts informal text to standard Persian text.
Examples:
>>> normalizer = InformalNormalizer()
>>> normalizer.normalize('بچه هام پیدا کردن که به جایی برنمیخوره !')
[[['بچه'], ['هایم'], ['پیدا'], ['کردن', 'کردند'], ['که'], ['به'], ['جایی'], ['برنمیخورد', 'برنمیخوره'], ['!']]]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The informal text to be normalized. | required |
Returns:
| Type | Description |
|---|---|
list[list[list[str]]]
|
A list of lists of lists of strings, representing the normalized text structure. |
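Because each token keeps several candidates, callers usually need a policy for choosing one. A naive sketch, assuming you simply take the first candidate of every token. Note from the example that the first candidate is not always the standard form (کردن is listed before کردند). The helper `first_candidates` is hypothetical, and its input is the example output copied as a literal.

```python
# Output of normalizer.normalize(...) copied from the example above.
normalized = [[['بچه'], ['هایم'], ['پیدا'], ['کردن', 'کردند'], ['که'], ['به'], ['جایی'], ['برنمیخورد', 'برنمیخوره'], ['!']]]


def first_candidates(sentences: list[list[list[str]]]) -> list[str]:
    """Hypothetical helper: build one string per sentence from each token's first candidate."""
    return [" ".join(candidates[0] for candidates in sentence) for sentence in sentences]


print(first_candidates(normalized))
```

A smarter policy (for example, scoring candidates with a language model) would replace `candidates[0]` with a selection function.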
normalized_word(word)
Returns the normalized forms of the word.
Examples:
>>> normalizer = InformalNormalizer()
>>> normalizer.normalized_word('میرم')
['میروم', 'میرم']
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| word | str | The word to be normalized. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | A list of normalized forms of the word. |
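The example suggests the method returns every plausible standard form and keeps the original spelling as a candidate. Below is a minimal sketch of that return shape using a hypothetical hard-coded table `INFORMAL_FORMS`. hazm presumably derives its candidates from the resources passed as verb_file and word_file, which this sketch does not model.

```python
# Hypothetical lookup table: informal word -> candidate standard forms.
# Illustrative only; it is not hazm's data or lookup logic.
INFORMAL_FORMS = {
    "میرم": ["میروم", "میرم"],
}


def lookup_candidates(word: str) -> list[str]:
    """Return the known candidate forms, or the word itself when it is not in the table."""
    return INFORMAL_FORMS.get(word, [word])


print(lookup_candidates("میرم"))
print(lookup_candidates("کتاب"))
```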
split_token_words(token)
Inserts spaces where necessary in the token.
Examples:
>>> normalizer = InformalNormalizer(seperation_flag=True)
>>> normalizer.split_token_words('تورادوستدارم')
'تو را دوست دارم'
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| token | str | The token to be processed. | required |
Returns:
| Type | Description |
|---|---|
| str | The token with correct spacing. |
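One common way to restore missing spaces is dictionary-based segmentation with dynamic programming. The sketch below shows that generic technique and does not necessarily reflect what hazm does. `segment` and its `vocabulary` argument are hypothetical. It picks the split with the fewest words and returns the token unchanged when no split exists.

```python
from __future__ import annotations


def segment(token: str, vocabulary: set[str]) -> str:
    """Split token into the fewest vocabulary words; return it unchanged if impossible."""
    n = len(token)
    max_len = max((len(w) for w in vocabulary), default=0)
    # best[i] holds the shortest word list covering token[:i], or None if no cover exists.
    best: list[list[str] | None] = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        # Only look back as far as the longest vocabulary word.
        for j in range(max(0, i - max_len), i):
            prefix = best[j]
            piece = token[j:i]
            if prefix is not None and piece in vocabulary:
                candidate = prefix + [piece]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return " ".join(best[n]) if best[n] is not None else token


print(segment("تورادوستدارم", {"تو", "را", "دوست", "دارم"}))
```

Minimizing the word count is one simple tie-breaking rule. A production segmenter would typically weight pieces by word frequency instead.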