For each organism, the server lists all coding sequences with an R-value
less than a given cutoff. The default cutoff is R=2. This output is
organised like the GFF output and, in addition, has active links to the
BLAST report produced by the model and to Swiss-Prot and GenBank (see
entries 9, 13 and 14).
The columns are organised as follows:
1. Sequence identifier (genome version)
2. Source identifier with one of two values: a) EasyGene source - model version.
b) BLASTP - if the sequence was part of the training set but EasyGene did
not predict it.
3. Feature identifier with one of three values: a) CDS indicates a coding
sequence (i.e. a gene) with the highest scoring start codon. b) ALT indicates
a coding sequence with a lower scoring start codon that was part of the
model training set. c) ORF indicates an open reading frame with a lower
scoring start codon.
4. If a CDS, ALT or ORF is on the direct strand this column is the position
of the first base of a start codon. If a CDS, ALT or ORF is on the reverse
strand then this column is the last base of a stop codon. The positions are
measured in direct strand coordinates in both cases.
5. If a CDS, ALT or ORF is on the direct strand this column is the position
of the last base of a stop codon. If a CDS, ALT or ORF is on the reverse strand
then this column is the first base of a start codon. The positions are measured
in direct strand coordinates in both cases.
6. R-value associated with a CDS, ALT or ORF. The R-value is the expected
number of genes one would find per megabase of random DNA sequence with a
score greater than that of the CDS, ALT or ORF. Hence, the SMALLER the R-value,
the MORE likely it is to be a real gene. The default R-value cutoff is R=2,
but the user may choose other values. A dot indicates that the CDS
belongs to the part of the training set for which the position of the start
codon could be determined from protein matches in a BLASTP search against
Swiss-Prot (i.e. it is a gene), but that the model did not predict it.
7. Strand of CDS, ALT or ORF.
8. Frame of CDS, ALT or ORF. Since 4) and 5) always indicate in-frame positions,
this column is always 0 and may be ignored. It is included in order to comply
with the GFF format.
9. Maximal open reading frame (mORF) in which the CDS, ALT or ORF is
contained. First number indicates reading frame of the mORF. If the mORF
is on the direct strand, the second number is the position of the first base
of the start codon and the third number is the position of the last base
of the stop codon. If the mORF is on the reverse strand then the second number
is the last base of the stop codon and the third number is the first base
of the start codon. Each mORF entry is linked to the BLAST report which was
retrieved by a BLASTP search of the mORF against Swiss-Prot. Note that for
ORFs on the complementary strand, the coordinates in the BLAST report are
given in complementary strand coordinates and therefore differ from the
numbers in the link. A dot indicates that the mORF corresponding to this
particular CDS, ALT or ORF was less than 120 bases long and therefore was
not part of the BLASTP search.
10. The start codon predicted.
11. The log-odds score of a CDS, ALT or ORF. A dot indicates that the CDS
belongs to the part of the training set for which the position of the start
codon could be determined from protein matches in a BLASTP search against
Swiss-Prot (i.e. it is a gene), but that the model did not predict it.
12. EasyGene identifier with one of three values: a) Confirmed_start indicates
that the sequence is part of the training set for which the position of
the start codon could be determined from protein matches in a BLASTP search
against Swiss-Prot. b) Confirmed indicates that the sequence is part
of the training set for which the position of the start codon could NOT be
determined from protein matches in a BLASTP search against Swiss-Prot. c)
Predicted indicates that the sequence is not part of any training set but
is predicted by EasyGene.
13. The highest scoring protein match of the mORF in a BLASTP search against
Swiss-Prot. Each protein is linked to the corresponding Swiss-Prot entry.
A dot indicates that no match was found.
14. Gene or locus name retrieved from annotation. The GenBank annotation is
shown as the primary name, with the RefSeq annotation in parentheses if
it differs from GenBank. An asterisk '*' indicates that the predicted
position of the start codon differs from the annotation, and a currency
sign '¤' shows that the corresponding annotated CDS is tagged as a
pseudogene. A dot indicates that the sequence is not annotated.
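
The column layout above can be read programmatically. The following Python
sketch is only an illustration and is not part of EasyGene: it assumes that
the output has been saved as plain text with the 14 columns separated by
tabs (as in GFF) and that the strand column uses '+' for the direct strand
and '-' for the reverse strand. The names Prediction, parse_line and
passes_cutoff are hypothetical. The sketch shows how the dots in columns 6
and 11 and the strand-dependent meaning of columns 4 and 5 might be handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    """One output line; field order follows columns 1-14 described above."""
    seq_id: str                # 1. sequence identifier
    source: str                # 2. EasyGene model version or BLASTP
    feature: str               # 3. CDS, ALT or ORF
    left: int                  # 4. lower coordinate (direct strand numbering)
    right: int                 # 5. higher coordinate (direct strand numbering)
    r_value: Optional[float]   # 6. None when the column holds a dot
    strand: str                # 7. assumed to be '+' or '-'
    frame: str                 # 8. always 0, kept for GFF compliance
    morf: str                  # 9. kept as raw text; a dot if mORF < 120 bases
    start_codon: str           # 10. predicted start codon
    log_odds: Optional[float]  # 11. None when the column holds a dot
    status: str                # 12. Confirmed_start, Confirmed or Predicted
    swissprot: str             # 13. best Swiss-Prot match, or a dot
    gene: str                  # 14. gene or locus name, or a dot

    @property
    def start_codon_pos(self) -> int:
        """First base of the start codon, in direct strand coordinates."""
        return self.left if self.strand == "+" else self.right

    @property
    def stop_codon_pos(self) -> int:
        """Last base of the stop codon, in direct strand coordinates."""
        return self.right if self.strand == "+" else self.left


def _number_or_none(field: str) -> Optional[float]:
    """Convert a numeric column, treating a dot as 'no value'."""
    return None if field == "." else float(field)


def parse_line(line: str, sep: str = "\t") -> Prediction:
    """Split one line into its 14 columns (the tab separator is an assumption)."""
    f = line.rstrip("\n").split(sep)
    if len(f) != 14:
        raise ValueError(f"expected 14 columns, got {len(f)}")
    return Prediction(
        f[0], f[1], f[2], int(f[3]), int(f[4]), _number_or_none(f[5]),
        f[6], f[7], f[8], f[9], _number_or_none(f[10]), f[11], f[12], f[13],
    )


def passes_cutoff(p: Prediction, cutoff: float = 2.0,
                  keep_unscored: bool = False) -> bool:
    """True if the R-value is below the cutoff (default R=2).

    Entries with a dot in column 6 carry no R-value; whether to keep them
    is left to the caller through keep_unscored.
    """
    if p.r_value is None:
        return keep_unscored
    return p.r_value < cutoff
```

With these definitions, a reverse-strand entry whose columns 4 and 5 are 100
and 400 reports 400 as the start codon position and 100 as the stop codon
position, in agreement with entries 4 and 5 above.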