Sequence Tagger
This module contains classes and functions for tagging tokens.
SequenceTagger
Base class for sequence tagging using CRFSuite.
Examples:
>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.tag(['من', 'به', 'مدرسه', 'رفتم', '.'])
[('من', 'PRON'), ('به', 'ADP'), ('مدرسه', 'NOUN'), ('رفتم', 'VERB'), ('.', 'PUNCT')]
__init__(model=None, data_maker=data_maker)
Constructor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str \| Path \| None` | Path to the model file. | `None` |
| `data_maker` | `Callable` | Function to generate features from tokens. | `data_maker` |
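The `data_maker` argument is described only as a function that generates features from tokens. As a rough illustration, assuming it receives a list of tokenized sentences and returns one feature dictionary per token, a custom function might look like the sketch below. The names `token_features` and `simple_data_maker` and the feature set itself are hypothetical and are not the library's defaults.

```python
def token_features(sentence, index):
    """Build a feature dict for the token at `index` (hypothetical feature set)."""
    word = sentence[index]
    return {
        "word": word,
        "is_first": index == 0,
        "is_last": index == len(sentence) - 1,
        "prev_word": sentence[index - 1] if index > 0 else "",
        "next_word": sentence[index + 1] if index < len(sentence) - 1 else "",
        "suffix_2": word[-2:],
        # True when the token contains no letters or digits at all.
        "is_punct": all(not ch.isalnum() for ch in word),
    }


def simple_data_maker(sentences):
    """Assumed contract: list of token lists in, list of feature-dict lists out."""
    return [
        [token_features(sentence, i) for i in range(len(sentence))]
        for sentence in sentences
    ]


features = simple_data_maker([["من", "رفتم", "."]])
```

If the assumed contract holds, such a function could be supplied as `SequenceTagger(data_maker=simple_data_maker)`.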
evaluate(tagged_sent)
Evaluates the model against a list of tagged sentences.
Examples:
>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.evaluate([[('من', 'PRON'), ('رفتم', 'VERB')]])
1.0
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tagged_sent` | `list[TaggedSentence]` | A list of tagged sentences for evaluation. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The accuracy of the model. |
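The return value is described only as the accuracy of the model. A minimal sketch of one common definition, token-level accuracy (correctly tagged tokens divided by all tokens), is shown below. Whether `evaluate` uses exactly this definition is an assumption, and `token_accuracy` is a hypothetical helper, not part of the library.

```python
def token_accuracy(gold_sents, predicted_sents):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for gold, predicted in zip(gold_sents, predicted_sents):
        for (_, gold_tag), (_, predicted_tag) in zip(gold, predicted):
            correct += gold_tag == predicted_tag
            total += 1
    # Avoid dividing by zero when there is nothing to score.
    return correct / total if total else 0.0


gold = [[("من", "PRON"), ("رفتم", "VERB")]]
predicted = [[("من", "PRON"), ("رفتم", "NOUN")]]
score = token_accuracy(gold, predicted)
```

Here one of the two tokens is tagged correctly, so `score` is 0.5.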
load_model(model_path)
Loads a tagger model from a file.
Examples:
>>> tagger = SequenceTagger()
>>> tagger.load_model('tagger.model')
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_path` | `str \| Path` | Path to the model file. | required |
save_model(filename)
Saves the model to a file.
Examples:
>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.save_model('new_tagger.model')
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | The name of the file to save the model. | required |
tag(tokens)
Tags a single sentence.
Examples:
>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.tag(['من', 'به', 'مدرسه', 'ایران', 'رفته_بودم', '.'])
[('من', 'PRON'), ('به', 'ADP'), ('مدرسه', 'NOUN,EZ'), ('ایران', 'NOUN'), ('رفته_بودم', 'VERB'), ('.', 'PUNCT')]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tokens` | `Sentence` | A list of tokens representing a sentence. | required |
Returns:
| Type | Description |
|---|---|
| `TaggedSentence` | A tagged sentence. |
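Judging from the examples, a `Sentence` is a list of token strings and a `TaggedSentence` is a list of `(token, tag)` pairs. The aliases and the two helpers below are an illustrative sketch of those shapes under that assumption. They are not the library's own definitions, and `pair_tokens_with_tags` and `split_tagged` are hypothetical names.

```python
# Assumed shapes, inferred from the documented examples.
Sentence = list[str]
TaggedSentence = list[tuple[str, str]]


def pair_tokens_with_tags(tokens: Sentence, tags: list[str]) -> TaggedSentence:
    """Zip tokens with their tags, refusing mismatched lengths."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return list(zip(tokens, tags))


def split_tagged(tagged: TaggedSentence) -> tuple[Sentence, list[str]]:
    """Separate a tagged sentence back into its tokens and its tags."""
    tokens = [token for token, _ in tagged]
    tags = [tag for _, tag in tagged]
    return tokens, tags


tagged = pair_tokens_with_tags(["من", "رفتم", "."], ["PRON", "VERB", "PUNCT"])
tokens, tags = split_tagged(tagged)
```

Splitting a tagged sentence this way is handy when gold data has to be stripped of its tags before being passed to `tag`.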
tag_sents(sentences)
Tags multiple sentences.
Examples:
>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.tag_sents([['من', 'به', 'مدرسه', 'ایران', 'رفته_بودم', '.']])
[[('من', 'PRON'), ('به', 'ADP'), ('مدرسه', 'NOUN,EZ'), ('ایران', 'NOUN'), ('رفته_بودم', 'VERB'), ('.', 'PUNCT')]]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sentences` | `list[Sentence]` | A list of sentences to tag. | required |
Returns:
| Type | Description |
|---|---|
| `list[TaggedSentence]` | A list of tagged sentences. |
train(tagged_list, c1=0.4, c2=0.04, max_iteration=400, verbose=True, file_name='crf.model', report_duration=True)
Trains the model on a list of tagged sentences and saves it to `file_name`.
Examples:
>>> tagger = SequenceTagger()
>>> tagged_list = [[('من', 'PRON'), ('به', 'ADP'), ('مدرسه', 'NOUN'), ('رفتم', 'VERB'), ('.', 'PUNCT')]]
>>> tagger.train(tagged_list, c1=0.5, c2=0.5, max_iteration=100, file_name='tagger.model')
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tagged_list` | `list[TaggedSentence]` | A list of tagged sentences for training. | required |
| `c1` | `float` | Coefficient for L1 regularization. | `0.4` |
| `c2` | `float` | Coefficient for L2 regularization. | `0.04` |
| `max_iteration` | `int` | Maximum number of iterations for training. | `400` |
| `verbose` | `bool` | Whether to print verbose output. | `True` |
| `file_name` | `str` | The name of the file to save the trained model. | `'crf.model'` |
| `report_duration` | `bool` | Whether to report the training duration. | `True` |
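The `c1` and `c2` parameters weight the L1 and L2 penalties added to the training objective. As a rough sketch, assuming the usual elastic-net form (`c1` times the sum of absolute weights plus `c2` times the sum of squared weights; the exact scaling used inside CRFSuite is not stated here), the penalty for a weight vector can be computed as follows. `regularization_penalty` is a hypothetical helper for illustration only.

```python
def regularization_penalty(weights, c1=0.4, c2=0.04):
    """Elastic-net style penalty: c1 * sum(|w|) + c2 * sum(w^2) (assumed form)."""
    l1_term = sum(abs(w) for w in weights)
    l2_term = sum(w * w for w in weights)
    return c1 * l1_term + c2 * l2_term


# Compare the documented defaults with the values used in the training example.
default_penalty = regularization_penalty([1.0, -2.0])
example_penalty = regularization_penalty([1.0, -2.0], c1=0.5, c2=0.5)
```

In general, a larger `c1` pushes more weights to exactly zero and yields a sparser model, while a larger `c2` shrinks all weights smoothly.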