Sequence Tagger

This module contains classes and functions for tagging tokens.

SequenceTagger

Base class for sequence tagging using CRFSuite.

Examples:

>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.tag(['من', 'به', 'مدرسه', 'رفتم', '.'])
[('من', 'PRON'), ('به', 'ADP'), ('مدرسه', 'NOUN'), ('رفتم', 'VERB'), ('.', 'PUNCT')]

__init__(model=None, data_maker=data_maker)

Constructor.

Parameters:

Name        Type               Description                                 Default
model       str | Path | None  Path to the model file.                     None
data_maker  Callable           Function to generate features from tokens.  data_maker
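As a rough illustration of what a feature function passed as data_maker could look like, here is a minimal sketch. It assumes that data_maker receives a list of tokenized sentences and returns one feature dictionary per token; that contract, the name simple_data_maker, and the feature names are assumptions for illustration, not the library's default implementation.

```python
from __future__ import annotations


def simple_data_maker(sentences: list[list[str]]) -> list[list[dict[str, object]]]:
    """Hypothetical feature function: one small feature dict per token."""
    featured = []
    for tokens in sentences:
        sentence_features = []
        last = len(tokens) - 1
        for index, word in enumerate(tokens):
            sentence_features.append({
                "word": word,
                "is_first": index == 0,
                "is_last": index == last,
                # Neighbouring words give the model some context.
                "prev_word": tokens[index - 1] if index > 0 else "",
                "next_word": tokens[index + 1] if index < last else "",
                # True when the token has no letters or digits at all.
                "is_punct": not any(ch.isalnum() for ch in word),
            })
        featured.append(sentence_features)
    return featured


features = simple_data_maker([["من", "رفتم", "."]])
```

If the assumed contract holds, such a function could be passed as SequenceTagger(data_maker=simple_data_maker); a model must be trained and used with the same feature function.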

evaluate(tagged_sent)

Evaluates the model on gold-tagged sentences and returns its accuracy.

Examples:

>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.evaluate([[('من', 'PRON'), ('رفتم', 'VERB')]])
1.0

Parameters:

Name         Type                  Description                                 Default
tagged_sent  list[TaggedSentence]  A list of tagged sentences for evaluation.  required

Returns:

Type   Description
float  The accuracy of the model.
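To make the returned number concrete, the following sketch computes a token-level accuracy: the fraction of tokens whose predicted tag equals the gold tag. That evaluate uses exactly this definition is an assumption; token_accuracy is a hypothetical helper, not part of the module.

```python
def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag matches the gold tag.

    Both arguments are lists of tagged sentences, i.e. lists of
    (token, tag) pairs, aligned sentence by sentence.
    """
    total = 0
    correct = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            total += 1
            correct += gold_tag == pred_tag
    # Avoid dividing by zero when there is nothing to compare.
    return correct / total if total else 0.0


gold = [[("من", "PRON"), ("رفتم", "VERB")]]
predicted = [[("من", "PRON"), ("رفتم", "NOUN")]]
score = token_accuracy(gold, predicted)  # one of two tags matches
```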

load_model(model_path)

Loads the tagger model.

Examples:

>>> tagger = SequenceTagger()
>>> tagger.load_model('tagger.model')

Parameters:

Name        Type        Description              Default
model_path  str | Path  Path to the model file.  required

save_model(filename)

Saves the model to a file.

Examples:

>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.save_model('new_tagger.model')

Parameters:

Name      Type  Description                              Default
filename  str   The name of the file to save the model.  required

tag(tokens)

Tags a single sentence.

Examples:

>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.tag(['من', 'به', 'مدرسه', 'ایران', 'رفته_بودم', '.'])
[('من', 'PRON'), ('به', 'ADP'), ('مدرسه', 'NOUN,EZ'), ('ایران', 'NOUN'), ('رفته_بودم', 'VERB'), ('.', 'PUNCT')]

Parameters:

Name    Type      Description                                Default
tokens  Sentence  A list of tokens representing a sentence.  required

Returns:

Type            Description
TaggedSentence  A tagged sentence.

tag_sents(sentences)

Tags multiple sentences.

Examples:

>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.tag_sents([['من', 'به', 'مدرسه', 'ایران', 'رفته_بودم', '.']])
[[('من', 'PRON'), ('به', 'ADP'), ('مدرسه', 'NOUN,EZ'), ('ایران', 'NOUN'), ('رفته_بودم', 'VERB'), ('.', 'PUNCT')]]

Parameters:

Name       Type            Description                  Default
sentences  list[Sentence]  A list of sentences to tag.  required

Returns:

Type                  Description
list[TaggedSentence]  A list of tagged sentences.
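Because tag_sents returns plain nested lists of (token, tag) pairs, its output can be post-processed with ordinary Python. The sketch below counts how often each tag occurs; tag_counts is a hypothetical helper, and the input is the example output shown for tag_sents rather than a live model call.

```python
from collections import Counter


def tag_counts(tagged_sentences):
    """Count tag occurrences across a list of tagged sentences."""
    return Counter(tag for sentence in tagged_sentences for _, tag in sentence)


# Example output of tag_sents, copied from the documentation.
tagged = [[('من', 'PRON'), ('به', 'ADP'), ('مدرسه', 'NOUN,EZ'),
           ('ایران', 'NOUN'), ('رفته_بودم', 'VERB'), ('.', 'PUNCT')]]
counts = tag_counts(tagged)
```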

train(tagged_list, c1=0.4, c2=0.04, max_iteration=400, verbose=True, file_name='crf.model', report_duration=True)

Trains the model.

Examples:

>>> tagger = SequenceTagger()
>>> tagged_list = [[('من', 'PRON'), ('به', 'ADP'), ('مدرسه', 'NOUN'), ('رفتم', 'VERB'), ('.', 'PUNCT')]]
>>> tagger.train(tagged_list, c1=0.5, c2=0.5, max_iteration=100, file_name='tagger.model')

Parameters:

Name             Type                  Description                                      Default
tagged_list      list[TaggedSentence]  A list of tagged sentences for training.         required
c1               float                 Coefficient for L1 regularization.               0.4
c2               float                 Coefficient for L2 regularization.               0.04
max_iteration    int                   Maximum number of iterations for training.       400
verbose          bool                  Whether to print verbose output.                 True
file_name        str                   The name of the file to save the trained model.  'crf.model'
report_duration  bool                  Whether to report the training duration.         True
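One common way to combine train and evaluate is to hold out part of the tagged data for evaluation. The sketch below relies only on the train(tagged_list, ...) and evaluate(tagged_sent) signatures documented here; train_with_holdout, holdout_ratio, and seed are hypothetical names introduced for illustration.

```python
import random


def train_with_holdout(tagger, tagged_list, holdout_ratio=0.2, seed=0, **train_kwargs):
    """Shuffle the data, train on one part, and evaluate on the rest.

    `tagger` is any object exposing train() and evaluate() as documented.
    Returns the evaluation score, or None if nothing was held out.
    """
    sentences = list(tagged_list)
    random.Random(seed).shuffle(sentences)  # reproducible shuffle
    # Keep at least one sentence for training.
    cut = max(1, int(len(sentences) * (1 - holdout_ratio)))
    train_part, test_part = sentences[:cut], sentences[cut:]
    tagger.train(train_part, **train_kwargs)
    return tagger.evaluate(test_part) if test_part else None


# Possible usage with the class documented above (not executed here):
# tagger = SequenceTagger()
# score = train_with_holdout(tagger, tagged_list, c1=0.4, c2=0.04,
#                            max_iteration=400, file_name='crf.model')
```

Keeping the evaluation sentences out of training gives a less optimistic accuracy than evaluating on the training data itself.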