NAME
Algorithm::Viterbi - Decoding HMMs
DESCRIPTION
This module is a fairly straightforward implementation of Viterbi's algorithm
for decoding hidden Markov models. The code is based on a Common Lisp
implementation I wrote as coursework, itself based on pseudo-code from
Jurafsky & Martin, Speech and Language Processing (2nd ed.).
SYNOPSIS
    use Algorithm::Viterbi;
    my Algorithm::Viterbi $hmm .= new(:alphabet<H C>);
    $hmm.train("training-data.tt");                 # Train from file
    $hmm.train([ [a => 'H', b => 'C', a => 'H'],
                 [b => 'C', c => 'H', a => 'C'] ]); # Train from hardcoded data
    $hmm.decode(<a b c>);
FIELDS
(item) %.p-transition
The transition probabilities. A hash of hashes, indexed by tag name.
(item) %.p-emission
The emission probabilities for a given tag. A hash of hashes, indexed first by
tag, then by observation.
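
As an illustration of the two shapes, here is a minimal sketch with made-up numbers for a two-tag model. The tags H and C, the observations a, b and c, and every probability are hypothetical. The sketch also assumes that %.p-transition is indexed by the previous tag first and the next tag second, which the description above does not state.

```raku
# Hypothetical values for illustration only; not produced by the module.
# Assumption: the outer key is the previous tag, the inner key the next tag.
my %p-transition =
    H => { H => 0.7, C => 0.3 },
    C => { H => 0.4, C => 0.6 };

# Emissions are indexed first by tag, then by observation.
my %p-emission =
    H => { a => 0.2, b => 0.4, c => 0.4 },
    C => { a => 0.5, b => 0.4, c => 0.1 };

say %p-transition<H><C>;   # 0.3
say %p-emission<C><a>;     # 0.5
```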
METHODS
(item) method new(:@alphabet!, :%p-transition, :%p-emission)
The alphabet parameter is required (an HMM without an alphabet doesn't make
much sense). The transition and emission probabilities are also required for
decode to work correctly, but they can be supplied in any of three ways: on
construction, with the train method, or by assigning to the corresponding
fields directly.
(item) method decode(Str @input)
The decode method finds the most probable tag sequence for the input
observations, using the probabilities in the %.p-transition and %.p-emission
fields.
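
To make the idea concrete, the following is a standalone sketch of Viterbi decoding. It is not the module's code: the sub name viterbi-sketch, the separate %start hash of initial tag probabilities, and all the numbers are assumptions made for this example, and the input is assumed to be non-empty.

```raku
# Sketch only: a standalone Viterbi decoder, not the module's implementation.
# %start (initial tag probabilities) is an assumption of this sketch.
sub viterbi-sketch(@tags, %start, %trans, %emit, @obs) {
    # %prob{tag}: probability of the best path ending in that tag.
    # %path{tag}: the tags along that best path.
    my (%prob, %path);
    for @tags -> $t {
        %prob{$t} = %start{$t} * (%emit{$t}{@obs[0]} // 0);
        %path{$t} = [$t];
    }

    for @obs[1 .. *] -> $o {
        my (%new-prob, %new-path);
        for @tags -> $t {
            # Pick the predecessor tag that maximises the path probability.
            my $best = @tags.max({ %prob{$_} * (%trans{$_}{$t} // 0) });
            %new-prob{$t} = %prob{$best} * (%trans{$best}{$t} // 0)
                            * (%emit{$t}{$o} // 0);
            %new-path{$t} = [ |%path{$best}, $t ];
        }
        %prob = %new-prob;
        %path = %new-path;
    }

    # The result is the path that ends in the most probable final tag.
    my $last = @tags.max({ %prob{$_} });
    return %path{$last};
}

# Hypothetical model parameters.
my %start = H => 0.5, C => 0.5;
my %trans = H => { H => 0.7, C => 0.3 },
            C => { H => 0.4, C => 0.6 };
my %emit  = H => { a => 0.2, b => 0.4, c => 0.4 },
            C => { a => 0.5, b => 0.4, c => 0.1 };

say viterbi-sketch(<H C>, %start, %trans, %emit, <a a c>).join(' ');  # C C H
```

The sketch keeps only the current trellis column and carries the best path along with each tag, which avoids a separate backpointer pass at the cost of copying paths.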
(item) method train(Str $file)
Computes unsmoothed bigram probabilities from an input file. The input format
is described by this grammar:
    grammar G {
        token TOP    { <chunk>+ }
        token chunk  { <record>+ \n }
        token record { \w+ \t \w+ \n }
    }
Each record is one line holding an observation, a tab, and then the associated
tag. Each chunk is one training sequence and ends with an empty line.
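
As a quick check of the format, the sketch below parses a small in-memory string with the grammar shown above. The sample data is hypothetical, and whether the module uses exactly this grammar internally is an assumption.

```raku
grammar G {
    token TOP    { <chunk>+ }
    token chunk  { <record>+ \n }
    token record { \w+ \t \w+ \n }
}

# Two hypothetical training sequences. Each line is observation, tab, tag,
# and each sequence is terminated by an empty line.
my $data = "a\tH\nb\tC\n\nc\tH\na\tC\n\n";

my $match = G.parse($data);
say $match<chunk>.elems;              # 2 (sequences)
say $match<chunk>[0]<record>.elems;   # 2 (records in the first sequence)
```

Note that the grammar requires the final sequence to be followed by an empty line as well; without it, the parse fails.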
(item) method train(Array of Pair @data)
Computes unsmoothed bigram probabilities from an Array of Arrays of Pairs.
Each pair is a single observation-tag pair, and each element of the top-level
array is one training sequence.
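
The following sketch shows one way unsmoothed (maximum-likelihood) estimates can be computed from data of this shape: count tag bigrams and tag-observation pairs, then divide each count by its row total. It is an illustration rather than the module's code. The variable names and the data are hypothetical, and the sketch ignores sequence boundaries (no start or end states), which the module may handle differently.

```raku
# Hypothetical training data: each inner array is one sequence of
# observation => tag pairs.
my @data = [ a => 'H', b => 'H', c => 'C' ],
           [ b => 'C', c => 'C', a => 'H', a => 'H' ];

my (%bigram, %emit-count, %trans, %emit);

for @data -> @seq {
    # Count each observation under its tag.
    %emit-count{.value}{.key}++ for @seq;
    # Count adjacent tag pairs (previous tag, then next tag).
    for @seq.rotor(2 => -1) -> ($prev, $next) {
        %bigram{$prev.value}{$next.value}++;
    }
}

# Unsmoothed estimate: each count divided by the total of its row.
for %bigram.kv -> $prev, %row {
    %trans{$prev}{$_} = %row{$_} / %row.values.sum for %row.keys;
}
for %emit-count.kv -> $tag, %row {
    %emit{$tag}{$_} = %row{$_} / %row.values.sum for %row.keys;
}

say %trans<C><H>;   # 0.5
say %emit<H><a>;    # 0.75
```

Because the estimates are unsmoothed, any transition or emission that never occurs in the training data has no entry at all, which amounts to a probability of zero.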
AUTHOR
Arne Skjærholt - arnsholt@gmail.com