Utils
get_data_path(filename)
Returns the path to the named data file, resolved in a zip-safe manner.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str | The name of the data file. | required |
Returns:
| Type | Description |
|---|---|
| Path | The path to the specified data file. |
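"Zip-safe" usually means not assuming a package's data files exist as loose files on disk. The sketch below shows one common way to do that with the standard library's `importlib.resources`; it is an assumption about the technique, not hazm's actual implementation. The function name `get_data_path_sketch` is hypothetical, and the stdlib `json` package stands in for hazm's data package so the snippet runs anywhere.

```python
from importlib.resources import as_file, files


def get_data_path_sketch(filename, package):
    """Locate `filename` inside `package` without assuming it lives on disk.

    files() returns a Traversable: a pathlib.Path for a normal install, or a
    zip-backed object when the package is imported from an archive.
    """
    return files(package) / filename


# "json" is only a stand-in for the package that actually holds the data.
resource = get_data_path_sketch("__init__.py", "json")

# as_file() yields a real filesystem path, extracting the resource to a
# temporary file first if it only exists inside a zip archive.
with as_file(resource) as real_path:
    size = real_path.stat().st_size
```

Note that the documented return type is `Path`; a strictly zip-safe helper may instead hand back a `Traversable` and let callers use `as_file()` when they need a real path.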
maketrans(a, b)
Builds a translation table for `str.translate` that maps each character in string a to the character at the same position in string b.
Examples:
>>> from hazm.utils import maketrans
>>> table = maketrans('012', '۰۱۲')
>>> '012'.translate(table)
'۰۱۲'
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | str | A string of characters to be replaced. | required |
| b | str | A string of characters to replace with. | required |
Returns:
| Type | Description |
|---|---|
| dict[int, Any] | A dictionary mapping character ordinals to their replacements. |
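A table with this shape can be built by pairing the two strings position by position. The sketch below is a hypothetical stand-in (`maketrans_sketch`), not hazm's code; it returns the same kind of `{ordinal: replacement}` dictionary and shows a typical use, converting Latin digits to Persian digits.

```python
def maketrans_sketch(a, b):
    """Map each character of `a` to the character at the same index in `b`."""
    # zip() pairs characters position by position and silently stops at the
    # shorter string; the built-in str.maketrans(a, b) raises ValueError
    # when the lengths differ.
    return {ord(src): dst for src, dst in zip(a, b)}


latin_digits = "0123456789"
persian_digits = "۰۱۲۳۴۵۶۷۸۹"

table = maketrans_sketch(latin_digits, persian_digits)
converted = "Room 42, floor 7".translate(table)
```

For single-character replacements the built-in `str.maketrans` gives the same translation; a dictionary comprehension like this one is mainly useful when you want lenient length handling or non-string values.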
past_roots()
Returns all past verb roots as one string, joined by the pipe character (`|`) so it can serve as a regex alternation.
Examples:
>>> from hazm.utils import past_roots
>>> past_roots()[:20]
'آباد|آزمود|آسود|آشفت'
Returns:
| Type | Description |
|---|---|
| str | A string containing all past roots, suitable for use in regex. |
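One way to use the pipe-joined string is to embed it in a larger pattern. The sketch below recognizes simple-past forms (past root plus an optional personal ending); the ending list is a simplified assumption, and `PAST_ROOTS` is a hypothetical stand-in holding only the four roots shown in the example above rather than calling `past_roots()`.

```python
import re

# Hypothetical stand-in for past_roots(): only the roots from the example.
PAST_ROOTS = "آباد|آزمود|آسود|آشفت"

# Simple past = past root + optional personal ending (simplified assumption).
# The roots are wrapped in a non-capturing group; without it, "|" would split
# the whole pattern and the ending would attach only to the last root.
SIMPLE_PAST = re.compile(rf"(?:{PAST_ROOTS})(?:یم|ید|ند|م|ی)?")


def is_simple_past(word):
    """Return True if `word` is a listed past root plus an allowed ending."""
    return SIMPLE_PAST.fullmatch(word) is not None
```

If a root list could ever contain regex metacharacters, each root should be passed through `re.escape` before joining; Persian roots normally do not need this.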
present_roots()
Returns all present verb roots as one string, joined by the pipe character (`|`) so it can serve as a regex alternation.
Examples:
>>> from hazm.utils import present_roots
>>> present_roots()[:20]
'آباد|آزمای|آسای|آشوب'
Returns:
| Type | Description |
|---|---|
| str | A string containing all present roots, suitable for use in regex. |
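Present roots typically appear after the continuous prefix "می", often separated by a zero-width non-joiner (ZWNJ). The sketch below matches such forms; the prefix and ending rules are simplified assumptions, and `PRESENT_ROOTS` is a hypothetical stand-in containing only the roots from the example above.

```python
import re

# Hypothetical stand-in for present_roots(): only the roots from the example.
PRESENT_ROOTS = "آباد|آزمای|آسای|آشوب"
ZWNJ = "\u200c"  # zero-width non-joiner, common after the "می" prefix

# Present continuous = "می" + optional ZWNJ + present root + personal ending.
PRESENT_CONTINUOUS = re.compile(
    rf"می{ZWNJ}?(?:{PRESENT_ROOTS})(?:یم|ید|ند|م|ی|د)"
)


def is_present_continuous(word):
    """Return True if `word` looks like "می" + listed present root + ending."""
    return PRESENT_CONTINUOUS.fullmatch(word) is not None
```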
regex_replace(patterns, text)
Applies each (pattern, replacement) pair to the text in turn, replacing every match of the pattern with its replacement.
Examples:
>>> from hazm.utils import regex_replace
>>> patterns = [(r'apples', 'oranges'), (r'red', 'blue')]
>>> regex_replace(patterns, 'red apples')
'blue oranges'
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| patterns | list[tuple[str, str]] | A list of tuples, each containing (pattern, replacement). | required |
| text | str | The input text to be processed. | required |
Returns:
| Type | Description |
|---|---|
| str | The modified text after all replacements. |
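A minimal sketch of sequential replacement, assuming each pair is applied with `re.sub` in list order; `regex_replace_sketch` is a hypothetical name, and hazm's own version may differ in details such as precompiling patterns.

```python
import re


def regex_replace_sketch(patterns, text):
    """Apply each (pattern, replacement) pair to `text`, in list order."""
    for pattern, replacement in patterns:
        # Each substitution runs on the output of the previous one.
        text = re.sub(pattern, replacement, text)
    return text


example = regex_replace_sketch(
    [(r"apples", "oranges"), (r"red", "blue")], "red apples"
)
```

Because later patterns see earlier results, order matters: `[("a", "b"), ("b", "c")]` turns `"a"` into `"c"`, while the reversed list turns it into `"b"`. Replacements may also use backreferences such as `\1`.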
stopwords_list(stopwords_file=default_stopwords)
Reads the stopwords file and returns its entries as a sorted list of unique stopwords.
Examples:
>>> from hazm.utils import stopwords_list
>>> stopwords_list()[:4]
['آخرین', 'آقای', 'آمد', 'آمده']
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| stopwords_file | str \| Path | Path to the stopwords file. | default_stopwords |
Returns:
| Type | Description |
|---|---|
| list[str] | A sorted list of unique stopwords. |
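A minimal sketch of reading a one-word-per-line stopwords file, deduplicating, and sorting. The name `stopwords_list_sketch` is hypothetical, and skipping blank lines is an assumption rather than documented behavior; a temporary file stands in for `default_stopwords`.

```python
import tempfile
from pathlib import Path


def stopwords_list_sketch(stopwords_file):
    """Read one stopword per line and return them deduplicated and sorted."""
    with open(stopwords_file, encoding="utf-8") as handle:
        # A set removes duplicates; skipping empty lines is an assumption.
        words = {line.strip() for line in handle if line.strip()}
    return sorted(words)


# Demo with a throwaway file standing in for default_stopwords.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "stopwords.dat"
    sample.write_text("آمد\nآخرین\nآمد\n\nآقای\n", encoding="utf-8")
    demo = stopwords_list_sketch(sample)
```

`sorted()` orders strings by Unicode code point, so Persian-specific letters such as پ, چ, ژ, and گ sort after the Arabic-block letters rather than in Persian alphabetical position.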
verbs_list()
Returns a list of verbs from the default verbs file. Each entry holds two verb roots separated by `#`.
Examples:
>>> from hazm.utils import verbs_list
>>> verbs_list()[:2]
['آباد#آباد', 'آزمای#آزمود']
Returns:
| Type | Description |
|---|---|
| list[str] | A list of verbs. |
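Since each entry packs two roots into one string, callers usually split it. This sketch uses `str.partition` on `#`; the helper name `split_verb_entry` is hypothetical, and because this page does not state which root comes first, the sketch keeps neutral names. The sample entries are the two from the example above.

```python
def split_verb_entry(entry):
    """Split a 'root#root' entry into its two roots."""
    first, sep, second = entry.partition("#")
    if not sep:
        # partition() returns an empty separator when "#" is missing.
        raise ValueError(f"expected '#' in verb entry: {entry!r}")
    return first, second


# Sample data: the two entries shown in the verbs_list() example.
entries = ["آباد#آباد", "آزمای#آزمود"]
pairs = [split_verb_entry(entry) for entry in entries]
```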
words_list(words_file=default_words)
Reads the specified words file and returns one (word, count, categories) tuple per entry.
Examples:
>>> from hazm.utils import words_list
>>> words_list()[1]
('آب', 549005877, ('N', 'AJ'))
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| words_file | str \| Path | Path to the words file. | default_words |
|
Returns:
| Type | Description |
|---|---|
| list[tuple[str, int, tuple[str, ...]]] | A list of tuples, each containing (word, count, categories). |
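To show how a line could become a `(word, count, categories)` tuple, the sketch below assumes a tab-separated `word<TAB>count<TAB>cat1,cat2` layout. That file format is an assumption, not documented here, and `parse_word_line` is a hypothetical helper. The first sample line reproduces the tuple from the example above; the second is a toy line, not real corpus data.

```python
def parse_word_line(line):
    """Parse one 'word<TAB>count<TAB>cat1,cat2' line (assumed format)."""
    word, count, categories = line.rstrip("\n").split("\t")
    return word, int(count), tuple(categories.split(","))


sample_lines = [
    "آب\t549005877\tN,AJ\n",  # reproduces the words_list()[1] example
    "toy\t1\tV\n",             # toy line, not real corpus data
]
words = [parse_word_line(line) for line in sample_lines]

# Filter entries by category, e.g. keep only those tagged as nouns ("N").
nouns = [word for word, _count, cats in words if "N" in cats]
```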