Here we consider three models for random text generation that can be used to represent real biological sequences:
Multivariate Bernoulli/Markov(0)
Markov(k)
HMM
You can also learn about all three models on the page of Lloyd Allison,
Faculty of Information Technology, Clayton, Monash University, Australia.
The multivariate Bernoulli, or Markov(0), model is a
generative model that assumes no dependencies between the letters of the generated text. It is the special case of the
Markov(k) model with dependence order k equal to zero.
Formally, given an alphabet Α = {α_i} and probabilities p_{α_i} with Σ_i p_{α_i} = 1,
the probability
of getting letter α at any position j is equal to p_α and depends neither on the position number
nor on the letters at previous or subsequent positions.
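The definition above can be illustrated with a minimal Python sketch. The function names and the DNA letter probabilities below are illustrative assumptions, not values from this page.

```python
import random


def generate_bernoulli(probs, n, rng=None):
    """Draw n letters independently from the distribution `probs`.

    `probs` maps each letter of the alphabet to its probability.
    """
    rng = rng or random.Random()
    letters = list(probs)
    weights = [probs[a] for a in letters]
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("letter probabilities must sum to 1")
    # Every position uses the same weights: no dependence on position
    # or on neighbouring letters.
    return "".join(rng.choices(letters, weights=weights, k=n))


def bernoulli_probability(text, probs):
    """Probability of `text` under the model: the product of letter probabilities."""
    p = 1.0
    for letter in text:
        p *= probs[letter]
    return p


# Hypothetical letter probabilities for a DNA alphabet.
dna_probs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
sample = generate_bernoulli(dna_probs, 20, random.Random(0))
```

Because the letters are independent, the probability of a whole text is simply the product of the probabilities of its letters, which is what bernoulli_probability computes.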
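In a Markov(k) model the letter X_j at position j depends only on the k preceding letters X_{j-1}, ..., X_{j-k}.
If X_{j-l} does not exist for some l, i.e. j-l < 0, then
X_{j-l} is dropped from the conditional
probability. For example, for k=2 and a text of length n=4, with positions numbered from 0:
P(X_0 X_1 X_2 X_3) = P(X_0) P(X_1 | X_0) P(X_2 | X_1, X_0) P(X_3 | X_2, X_1).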
Thus, to specify a Markov(k) model one needs
to set all conditional probabilities
P(X_j = α | X_{j-1} = β_1, ..., X_{j-k} = β_k)
for all
α, β_1, ..., β_k ∈ Α.
One also needs to set the parameters of the starting distribution.
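As a hedged illustration, the Python sketch below generates text from a Markov(k) model. It assumes one possible encoding of the parameters: a single dictionary (here called cond, a hypothetical name) that maps every context of up to k preceding letters, oldest first, to a distribution over the next letter. Contexts shorter than k play the role of the starting distribution. All probability values are made up.

```python
import random


def generate_markov(cond, k, n, rng=None):
    """Generate n letters; each depends on at most k preceding letters."""
    rng = rng or random.Random()
    text = []
    for j in range(n):
        # Preceding letters that do not exist (index < 0) are simply omitted.
        context = "".join(text[max(0, j - k):j])
        dist = cond[context]
        letters = list(dist)
        weights = [dist[a] for a in letters]
        text.append(rng.choices(letters, weights=weights, k=1)[0])
    return "".join(text)


def markov_probability(text, cond, k):
    """Probability of `text` as a product of conditional probabilities."""
    p = 1.0
    for j, letter in enumerate(text):
        context = text[max(0, j - k):j]
        p *= cond[context][letter]
    return p


# Hypothetical Markov(2) model over the alphabet {A, B}.
cond = {
    "": {"A": 0.5, "B": 0.5},    # first letter
    "A": {"A": 0.7, "B": 0.3},   # second letter, given the first
    "B": {"A": 0.4, "B": 0.6},
    "AA": {"A": 0.1, "B": 0.9},  # full contexts of length k = 2
    "AB": {"A": 0.5, "B": 0.5},
    "BA": {"A": 0.8, "B": 0.2},
    "BB": {"A": 0.6, "B": 0.4},
}
sample = generate_markov(cond, 2, 12, random.Random(0))
```

For a text of four letters, markov_probability multiplies one factor for the first letter, one factor conditioned on a single letter, and two factors conditioned on full two-letter contexts.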
The most widely used is the Markov model of order 1, also known as a time-homogeneous Markov chain.
Read about Markov chains in Wikipedia.
In our case, when the alphabet Α is finite, the transition probability
distribution can be
represented by a matrix P, called the
transition matrix, with the (i, j)'th element
of P equal to
p_ij = P(X_{t+1} = α_j | X_t = α_i).
P is a stochastic
matrix: its entries are non-negative and each row sums to 1. Further, the k-step transition probabilities are given by
the k'th power of the transition matrix, P^k.
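The k-step transition probabilities can be sketched in plain Python as repeated matrix multiplication. The two-letter transition matrix below is a made-up example, and the helper names are assumptions for illustration only.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mat_pow(p, k):
    """Return the k'th power of the square matrix p (k >= 0)."""
    size = len(p)
    # Start from the identity matrix, which is the 0-step transition matrix.
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(k):
        result = mat_mul(result, p)
    return result


# Hypothetical transition matrix over a two-letter alphabet; rows sum to 1.
P = [
    [0.9, 0.1],
    [0.2, 0.8],
]
P2 = mat_pow(P, 2)  # P2[i][j] = probability of letter j two steps after letter i
```

Each power of a stochastic matrix is again stochastic, so the rows of P2 still sum to 1.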
The stationary distribution π is a (row) vector which satisfies the equation
π = πP.
In other words, the stationary distribution π is a normalized left eigenvector of the
transition matrix associated with the eigenvalue 1.
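One simple way to approximate π numerically is power iteration: start from any distribution and multiply by P from the right until the vector stops changing. This is a sketch, not the only method; it assumes the chain is irreducible and aperiodic so that the iteration converges. The matrix is a made-up example and the function name is hypothetical.

```python
def stationary_distribution(p, tol=1e-12, max_iter=10_000):
    """Approximate the row vector pi with pi = pi P by power iteration."""
    size = len(p)
    pi = [1.0 / size] * size  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: nxt = pi * P (row vector times matrix).
        nxt = [sum(pi[i] * p[i][j] for i in range(size)) for j in range(size)]
        if max(abs(x - y) for x, y in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


# Hypothetical transition matrix over a two-letter alphabet.
P = [
    [0.9, 0.1],
    [0.2, 0.8],
]
pi = stationary_distribution(P)
```

For this example the exact solution of π = πP with entries summing to 1 is π = (2/3, 1/3), which the iteration approaches.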
Text can also be considered as generated by a Hidden Markov Model (HMM).
Rather than repeat the description here, we advise you to read about HMMs
in Wikipedia.