Sequence Tagger

This module contains classes and functions for tagging tokens.

`SequenceTagger` ¶

Base class for sequence tagging using CRFSuite.

Examples:

>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.tag(['من', 'به', 'مدرسه', 'رفتم', '.'])
[('من', 'PRON'), ('به', 'ADP'), ('مدرسه', 'NOUN'), ('رفتم', 'VERB'), ('.', 'PUNCT')]

`__init__(model=None, data_maker=data_maker)` ¶

Constructor.

Parameters:

Name	Type	Description	Default
`model`	`str | Path | None`	Path to the model file.	`None`
`data_maker`	`Callable`	Function to generate features from tokens.	`data_maker`
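The page does not spell out the signature that `data_maker` must have. As a minimal sketch, assuming it receives a list of tokenized sentences and returns one feature dictionary per token (a shape that CRFSuite bindings commonly accept), a custom feature function might look like the following. The name `simple_data_maker` and every feature key are hypothetical and are not part of the library.

```python
def simple_data_maker(sentences):
    """Hypothetical feature function: returns one feature dict per token.

    Assumption: input is a list of sentences, each a list of token strings;
    output mirrors that nesting with a dict in place of each token.
    """
    features = []
    for tokens in sentences:
        sentence_features = []
        for i, word in enumerate(tokens):
            sentence_features.append({
                "word": word,
                "is_first": i == 0,
                "is_last": i == len(tokens) - 1,
                # Neighboring words give the model local context.
                "prev_word": tokens[i - 1] if i > 0 else "",
                "next_word": tokens[i + 1] if i < len(tokens) - 1 else "",
                # Short suffixes often correlate with part of speech.
                "suffix2": word[-2:],
                "is_digit": word.isdigit(),
            })
        features.append(sentence_features)
    return features


example = simple_data_maker([["من", "رفتم", "."]])
```

If the assumed signature matches, such a function would be passed as `SequenceTagger(data_maker=simple_data_maker)`.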

`evaluate(tagged_sent)` ¶

Evaluates the model on a list of tagged sentences and returns its accuracy.

Examples:

>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.evaluate([[('من', 'PRON'), ('رفتم', 'VERB')]])
1.0

Parameters:

Name	Type	Description	Default
`tagged_sent`	`list[TaggedSentence]`	A list of tagged sentences for evaluation.	required

Returns:

Type	Description
`float`	The accuracy of the model.
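The page does not say how the accuracy is computed. A common definition, assumed here, is token-level accuracy: the fraction of tokens whose predicted tag equals the gold tag. The helper `token_accuracy` below is a hypothetical illustration of that definition, not the library's implementation.

```python
def token_accuracy(gold_sents, predicted_sents):
    """Fraction of tokens whose predicted tag matches the gold tag.

    Both arguments are lists of tagged sentences, i.e. lists of
    (token, tag) tuples, aligned sentence by sentence.
    """
    gold_tags = [tag for sent in gold_sents for _, tag in sent]
    predicted_tags = [tag for sent in predicted_sents for _, tag in sent]
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted data must have the same number of tokens")
    if not gold_tags:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


gold = [[("من", "PRON"), ("رفتم", "VERB")]]
predicted = [[("من", "PRON"), ("رفتم", "NOUN")]]
score = token_accuracy(gold, predicted)  # one of two tags is correct
```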

`load_model(model_path)` ¶

Loads the tagger model.

Examples:

>>> tagger = SequenceTagger()
>>> tagger.load_model('tagger.model')

Parameters:

Name	Type	Description	Default
`model_path`	`str | Path`	Path to the model file.	required

`save_model(filename)` ¶

Saves the model to a file.

Examples:

>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.save_model('new_tagger.model')

Parameters:

Name	Type	Description	Default
`filename`	`str`	The name of the file to save the model.	required

`tag(tokens)` ¶

Tags a single sentence.

Examples:

>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.tag(['من', 'به', 'مدرسه', 'ایران', 'رفته_بودم', '.'])
[('من', 'PRON'), ('به', 'ADP'), ('مدرسه', 'NOUN,EZ'), ('ایران', 'NOUN'), ('رفته_بودم', 'VERB'), ('.', 'PUNCT')]

Parameters:

Name	Type	Description	Default
`tokens`	`Sentence`	A list of tokens representing a sentence.	required

Returns:

Type	Description
`TaggedSentence`	A tagged sentence.

`tag_sents(sentences)` ¶

Tags multiple sentences.

Examples:

>>> tagger = SequenceTagger(model='tagger.model')
>>> tagger.tag_sents([['من', 'به', 'مدرسه', 'ایران', 'رفته_بودم', '.']])
[[('من', 'PRON'), ('به', 'ADP'), ('مدرسه', 'NOUN,EZ'), ('ایران', 'NOUN'), ('رفته_بودم', 'VERB'), ('.', 'PUNCT')]]

Parameters:

Name	Type	Description	Default
`sentences`	`list[Sentence]`	A list of sentences to tag.	required

Returns:

Type	Description
`list[TaggedSentence]`	A list of tagged sentences.
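To make the input and output shapes concrete, here is a toy stand-in that mirrors the `tag` and `tag_sents` interface. It assumes `Sentence` is a list of token strings and `TaggedSentence` is a list of `(token, tag)` tuples, as the examples above suggest. `ToyTagger` and its dictionary lookup are hypothetical; the real class uses a trained CRF model and may process batches differently.

```python
Sentence = list[str]                     # assumed alias: a list of tokens
TaggedSentence = list[tuple[str, str]]   # assumed alias: (token, tag) pairs


class ToyTagger:
    """Hypothetical lookup tagger with the same method shapes."""

    def __init__(self, lexicon: dict[str, str], default_tag: str = "NOUN"):
        self.lexicon = lexicon
        self.default_tag = default_tag

    def tag(self, tokens: Sentence) -> TaggedSentence:
        # Unknown tokens fall back to the default tag.
        return [(t, self.lexicon.get(t, self.default_tag)) for t in tokens]

    def tag_sents(self, sentences: list[Sentence]) -> list[TaggedSentence]:
        # One tagged sentence per input sentence, in the same order.
        return [self.tag(tokens) for tokens in sentences]


toy = ToyTagger({"من": "PRON", "به": "ADP", "رفتم": "VERB", ".": "PUNCT"})
sentence = ["من", "به", "مدرسه", "رفتم", "."]
tagged_batch = toy.tag_sents([sentence])
```

In this sketch, `tag_sents([sentence])[0]` equals `tag(sentence)`, which matches the relationship between the two documented examples.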

`train(tagged_list, c1=0.4, c2=0.04, max_iteration=400, verbose=True, file_name='crf.model', report_duration=True)` ¶

Trains the model on a list of tagged sentences and saves it to `file_name`.

Examples:

>>> tagger = SequenceTagger()
>>> tagged_list = [[('من', 'PRON'), ('به', 'ADP'), ('مدرسه', 'NOUN'), ('رفتم', 'VERB'), ('.', 'PUNCT')]]
>>> tagger.train(tagged_list, c1=0.5, c2=0.5, max_iteration=100, file_name='tagger.model')

Parameters:

Name	Type	Description	Default
`tagged_list`	`list[TaggedSentence]`	A list of tagged sentences for training.	required
`c1`	`float`	Coefficient for L1 regularization.	`0.4`
`c2`	`float`	Coefficient for L2 regularization.	`0.04`
`max_iteration`	`int`	Maximum number of iterations for training.	`400`
`verbose`	`bool`	Whether to print verbose output.	`True`
`file_name`	`str`	The name of the file to save the trained model.	`'crf.model'`
`report_duration`	`bool`	Whether to report the training duration.	`True`
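To give a feel for what `c1` and `c2` control, the sketch below computes an elastic-net style penalty on a weight vector. It assumes the two coefficients are handed to the CRFSuite trainer and that the penalty has the form `c1 * ||w||_1 + c2 * ||w||_2^2`; the exact scaling inside the library is an assumption, and `elastic_net_penalty` is a hypothetical helper.

```python
def elastic_net_penalty(weights, c1=0.4, c2=0.04):
    """Combined L1 + L2 penalty added to the training loss (assumed form)."""
    l1_norm = sum(abs(w) for w in weights)      # encourages sparse weights
    l2_norm_sq = sum(w * w for w in weights)    # discourages large weights
    return c1 * l1_norm + c2 * l2_norm_sq


weights = [1.0, -2.0]
penalty_default = elastic_net_penalty(weights)                # documented defaults
penalty_custom = elastic_net_penalty(weights, c1=0.5, c2=0.5)
```

Under this assumption, a larger `c1` pushes more feature weights to exactly zero, while a larger `c2` shrinks all weights smoothly.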
