There is a wide range of ways to represent TFBS motifs. Some of them are listed here with
references to their descriptions; the most widely used ones are described below.
At first I wanted to write about all of these models myself, but a quick Web search showed
that this would be a waste of time, since everything has already been written up in
Wikipedia.
So I just give references:
About motifs in general.
About binding sites and
transcription machinery.
About Position Frequency Matrices (PFMs).
About Position Weight Matrices (PWMs, PSSMs, PSWMs).
Some brief descriptions are nevertheless given below.
The simplest representation is to enumerate all sites potentially bound by a factor.
Information about binding sites can be obtained from SELEX experiments.
This is an example of a list of experimentally confirmed binding sites for the
transcription factor (TF) bicoid:
Bicoid motif:
GCCCCTAATCCCTT
CCATCTAATCCCTT
TTGGCTAATCCCAG
GCCACTAATCCCGA
CAACGTAATCCCCA
AATTATAATCCCTT
...
Use the reference to see all sites!
The list can be used as it is, or converted to a PFM or PWM in order to generate a wider
list of words (see List to PWM).
A position frequency matrix (PFM) records the position-dependent frequency of
each residue or nucleotide. PFMs can be determined experimentally from SELEX
experiments or discovered computationally by tools such as MEME, which fits
motif models by expectation maximization.
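As a rough illustration of how a list of aligned sites turns into a PFM, here is a minimal Python sketch. It counts nucleotides per position using only the six bicoid sites shown earlier (the full list is longer, so the real counts will differ). The names `SITES` and `build_pfm` are made up for this example.

```python
# The six bicoid sites shown in the text; the complete list is longer.
SITES = [
    "GCCCCTAATCCCTT",
    "CCATCTAATCCCTT",
    "TTGGCTAATCCCAG",
    "GCCACTAATCCCGA",
    "CAACGTAATCCCCA",
    "AATTATAATCCCTT",
]

ALPHABET = "ACGT"


def build_pfm(sites):
    """Return {nucleotide: [count at position 0, 1, ...]} for aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = {nt: [0] * length for nt in ALPHABET}
    for site in sites:
        for pos, nt in enumerate(site.upper()):
            pfm[nt][pos] += 1
    return pfm


pfm = build_pfm(SITES)
for nt in ALPHABET:
    print(nt, pfm[nt])
```

Each column of the result sums to the number of sites, and the fully conserved core TAATCCC shows up as columns where a single nucleotide holds all six counts.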
An example of a PFM from the
TRANSFAC database for the transcription factor
AP-1:
This example was taken from Wikipedia, where you can read more about it.
The text below was copied from there.
A position weight matrix (PWM), also called a position-specific weight matrix
(PSWM) or position-specific scoring matrix (PSSM), is a commonly used
representation of motifs (patterns) in biological sequences.
A PWM is a matrix of score values that gives a weighted match to any given
substring of fixed length. It has one row for each symbol of the alphabet and one
column for each position in the pattern. The PWM score is defined as

$$\sum_{j=1}^{N} m_{i(j),j},$$

where N is the length of the pattern, j is the position in the substring,
i(j) is the symbol at position j in the substring, and $m_{i,j}$ is the score in
row i, column j of the matrix. In other words, a PWM score is the sum
of position-specific scores for each symbol in the substring.
This is an example of a PWM for bicoid (shown transposed relative to the definition:
one row per position, one column per nucleotide):
| position | a | c | g | t |
| 1 | -0.544 | 0.423 | 0.356 | -0.388 |
| 2 | -0.398 | 0.422 | -0.329 | 0.128 |
| 3 | -0.398 | -2.054 | -2.054 | 0.992 |
| 4 | 1.135 | -2.054 | -1.400 | -2.054 |
| 5 | 1.164 | -2.054 | -2.054 | -2.054 |
| 6 | -2.054 | -1.018 | -0.728 | 1.025 |
| 7 | -2.054 | 1.408 | -2.054 | -2.054 |
| 8 | -1.520 | 1.185 | -1.008 | -0.702 |
| 9 | -0.713 | 0.422 | 0.356 | -0.260 |
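To make the scoring rule concrete, the following Python sketch sums the position-specific scores of a 9-letter word using the bicoid values from the table. The names `BICOID_PWM`, `COLUMN` and `pwm_score` are illustrative only and not part of any tool mentioned here.

```python
# Bicoid PWM copied from the table: one row per position,
# columns in the order a, c, g, t.
BICOID_PWM = [
    (-0.544, 0.423, 0.356, -0.388),
    (-0.398, 0.422, -0.329, 0.128),
    (-0.398, -2.054, -2.054, 0.992),
    (1.135, -2.054, -1.400, -2.054),
    (1.164, -2.054, -2.054, -2.054),
    (-2.054, -1.018, -0.728, 1.025),
    (-2.054, 1.408, -2.054, -2.054),
    (-1.520, 1.185, -1.008, -0.702),
    (-0.713, 0.422, 0.356, -0.260),
]

# Column index of each nucleotide in a PWM row.
COLUMN = {"A": 0, "C": 1, "G": 2, "T": 3}


def pwm_score(pwm, word):
    """Sum the position-specific scores of `word` (must match the PWM length)."""
    if len(word) != len(pwm):
        raise ValueError("word length must equal the number of PWM positions")
    return sum(row[COLUMN[nt]] for row, nt in zip(pwm, word.upper()))


best = pwm_score(BICOID_PWM, "CCTAATCCC")
print(round(best, 3))
```

For the word CCTAATCCC the sum is 0.423 + 0.422 + 0.992 + 1.135 + 1.164 + 1.025 + 1.408 + 1.185 + 0.422 = 8.176, which is the highest score this matrix can give because every position contributes its maximum.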
A position weight matrix (PWM) contains log-odds weights for computing a match
score. A cutoff is needed to decide whether an input sequence matches the motif
or not. PWMs are calculated from PFMs.
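One common way to derive log-odds weights from counts is sketched below in Python. The text does not say how the bicoid PWM was computed, so the pseudocount, the uniform background and the natural logarithm used here are all assumptions, and `pfm_to_pwm` is a hypothetical helper name.

```python
import math


def pfm_to_pwm(pfm, background=None, pseudocount=1.0):
    """Convert {nt: [counts]} into {nt: [log-odds weights]}.

    Assumptions (not taken from the text): a pseudocount is added to every
    cell, the background defaults to uniform, and natural logs are used.
    """
    alphabet = sorted(pfm)
    length = len(pfm[alphabet[0]])
    if background is None:
        background = {nt: 1.0 / len(alphabet) for nt in alphabet}
    pwm = {nt: [] for nt in alphabet}
    for pos in range(length):
        # Column total including the pseudocounts.
        total = sum(pfm[nt][pos] for nt in alphabet) + pseudocount * len(alphabet)
        for nt in alphabet:
            freq = (pfm[nt][pos] + pseudocount) / total
            pwm[nt].append(math.log(freq / background[nt]))
    return pwm


# Toy two-position PFM, invented for illustration.
toy_pfm = {"A": [6, 0], "C": [0, 3], "G": [0, 3], "T": [0, 0]}
toy_pwm = pfm_to_pwm(toy_pfm)
print({nt: [round(w, 3) for w in ws] for nt, ws in toy_pwm.items()})
```

The pseudocount keeps the logarithm finite for nucleotides that were never observed at a position; different tools choose it differently, so the resulting weights are not directly comparable between implementations.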
Given a PWM and a threshold value, one can obtain the set of words (substrings) that
score above the threshold.
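A brute-force way to obtain such a set is to score every possible word, as in the Python sketch below. To keep the enumeration small it uses only rows 3 to 7 of the bicoid table (the TAATC core); this truncation, the threshold of 4.0 and the names `CORE_PWM` and `words_above` are choices made for this example only.

```python
from itertools import product

ALPHABET = "ACGT"

# Rows 3-7 of the bicoid PWM table (columns a, c, g, t).
CORE_PWM = [
    (-0.398, -2.054, -2.054, 0.992),
    (1.135, -2.054, -1.400, -2.054),
    (1.164, -2.054, -2.054, -2.054),
    (-2.054, -1.018, -0.728, 1.025),
    (-2.054, 1.408, -2.054, -2.054),
]


def words_above(pwm, threshold):
    """Return (word, score) pairs scoring strictly above `threshold`, best first."""
    hits = []
    # Enumerate all 4**len(pwm) index tuples; feasible only for short motifs.
    for letters in product(range(len(ALPHABET)), repeat=len(pwm)):
        score = sum(row[i] for row, i in zip(pwm, letters))
        if score > threshold:
            hits.append(("".join(ALPHABET[i] for i in letters), score))
    return sorted(hits, key=lambda hit: -hit[1])


hits = words_above(CORE_PWM, 4.0)
for word, score in hits:
    print(word, round(score, 3))
```

Exhaustive enumeration grows as 4 to the power of the motif length, so for longer matrices a branch-and-bound search that abandons prefixes unable to reach the threshold would be the more practical choice.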
A motif can also be represented by an IUPAC consensus. An example of a consensus
from the TRANSFAC database for the transcription factor AP-1 is shown above.
The nomenclature of the International Union of Pure and Applied Chemistry (IUPAC)
is as follows:
A = adenine
C = cytosine
G = guanine
T = thymine
U = uracil
R = G A (purine)
Y = T C (pyrimidine)
K = G T (keto)
M = A C (amino)
S = G C (strong bonds)
W = A T (weak bonds)
B = G T C (all but A)
D = G A T (all but C)
H = A C T (all but G)
V = G C A (all but T)
N = A G C T (any)
Thus the IUPAC consensus of a motif can be likened to a list of words.
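The correspondence between a consensus and a word list can be sketched in Python by expanding each IUPAC code into the nucleotides it stands for. The table below follows the nomenclature listed above; the function name `expand_consensus` and the example consensus `YTAATC` are invented for illustration and are not taken from TRANSFAC.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_consensus(consensus):
    """Return every concrete word matched by an IUPAC consensus string."""
    choices = [IUPAC[code] for code in consensus.upper()]
    return ["".join(letters) for letters in product(*choices)]


words = expand_consensus("YTAATC")  # hypothetical consensus
print(words)
```

The number of words is the product of the number of alternatives at each position, so a consensus with many N positions expands into a very long list.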
A motif can also be described as a consensus word with a given number of allowed mismatches.
For example, for bicoid it could be:
CCTAATCCC and 3 mismatches.
However, this way of representing motifs is rather infrequent.
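Matching against a consensus word with mismatches amounts to a Hamming-distance test on every window of the sequence. A minimal Python sketch, using the bicoid word CCTAATCCC with 3 mismatches and two of the sites listed earlier as input (the names `count_mismatches` and `find_matches` are illustrative):

```python
def count_mismatches(a, b):
    """Hamming distance between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def find_matches(sequence, consensus, max_mismatches):
    """Return (start, window, mismatches) for every window within the limit."""
    k = len(consensus)
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        distance = count_mismatches(window, consensus)
        if distance <= max_mismatches:
            hits.append((start, window, distance))
    return hits


exact = find_matches("GCCCCTAATCCCTT", "CCTAATCCC", 3)
fuzzy = find_matches("AATTATAATCCCTT", "CCTAATCCC", 3)
print(exact)
print(fuzzy)
```

Unlike a PWM, this representation treats all positions and all substitutions as equally costly, which may be one reason it is used less often.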