HTML output


DESCRIPTION

For each organism, the server lists all coding sequences with an R-value below a given cutoff; the default cutoff is R=2. The output is organised like the GFF output, but it also contains active links to the BLAST report for each maximal open reading frame, to Swiss-Prot and to GenBank (see columns 9, 13 and 14).

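The cutoff rule is a simple filter. The sketch below re-applies a stricter cutoff to rows taken from the output. It assumes the R-values (column 6) have already been read into Python floats. The function name `below_cutoff` and the (label, R-value) record shape are hypothetical. Rows whose R-value is a dot are represented as `None` and kept, which is an assumption made here because those rows mark training-set genes rather than scored predictions.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical record shape for this sketch: (label, R-value), where the
# R-value is None when column 6 holds a dot.
Prediction = Tuple[str, Optional[float]]


def below_cutoff(predictions: Iterable[Prediction], cutoff: float = 2.0) -> List[Prediction]:
    """Keep predictions whose R-value is strictly below `cutoff`.

    A smaller R-value means a more likely gene, so a lower cutoff keeps
    fewer, more confident predictions. Entries without an R-value are
    kept unchanged (an assumption of this sketch).
    """
    return [
        (label, r_value)
        for label, r_value in predictions
        if r_value is None or r_value < cutoff
    ]


rows = [("APE2229", 3.7e-48), ("unannotated", 0.00024), ("training-only", None)]
print(below_cutoff(rows, cutoff=1e-10))  # [('APE2229', 3.7e-48), ('training-only', None)]
```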
The columns are organised as follows:

1. Sequence identifier (genome version)

2. Source identifier, with one of two values: a) EasyGene source: the model version. b) BLASTP: the sequence was part of the training set, but EasyGene did not predict it.

3. Feature identifier, with one of three values: a) CDS indicates a coding sequence (i.e. a gene) with the highest-scoring start codon. b) ALT indicates a coding sequence with a lower-scoring start codon that was part of the model training set. c) ORF indicates an open reading frame with a lower-scoring start codon.

4. If a CDS, ALT or ORF is on the direct strand, this column is the position of the first base of the start codon. If it is on the reverse strand, this column is the position of the last base of the stop codon. In both cases the position is measured in direct-strand coordinates.

5. If a CDS, ALT or ORF is on the direct strand, this column is the position of the last base of the stop codon. If it is on the reverse strand, this column is the position of the first base of the start codon. In both cases the position is measured in direct-strand coordinates.

6. R-value associated with a CDS, ALT or ORF. The R-value is the expected number of genes one would find per megabase of random DNA sequence with a standard score greater than that of the CDS, ALT or ORF. Hence, the SMALLER the R-value, the MORE likely the CDS, ALT or ORF is a real gene. The default R-value cutoff is R=2, but the user may choose other values. A dot indicates that the CDS is part of the training set for which the position of the start codon could be determined from protein matches in a BLASTP search against Swiss-Prot (i.e. a gene), but that the model did not predict it.

7. Strand of CDS, ALT or ORF.

8. Frame of the CDS, ALT or ORF. Since columns 4 and 5 always give in-frame positions, this column is always 0 and may be ignored. It is included only to comply with the GFF format.

9. Maximal open reading frame (mORF) in which the CDS, ALT or ORF is contained. The first number indicates the reading frame of the mORF. If the mORF is on the direct strand, the second number is the position of the first base of the start codon and the third number is the position of the last base of the stop codon. If the mORF is on the reverse strand, the second number is the position of the last base of the stop codon and the third number is the position of the first base of the start codon. Each mORF entry is linked to the BLAST report retrieved by a BLASTP search of the mORF against Swiss-Prot. Note that the BLAST report gives the coordinates of complementary-strand ORFs in complementary-strand coordinates, so they differ from the numbers in the link. A dot indicates that the mORF corresponding to this CDS, ALT or ORF was shorter than 120 bases and therefore was not included in the BLASTP search.

10. The predicted start codon.

11. The log-odds score of a CDS, ALT or ORF. A dot indicates that the CDS is part of the training set for which the position of the start codon could be determined from protein matches in a BLASTP search against Swiss-Prot (i.e. a gene) but that the model did not predict it.

12. EasyGene identifier, with one of three values: a) Confirmed_start indicates that the sequence is part of the training set for which the position of the start codon could be determined from protein matches in a BLASTP search against Swiss-Prot. b) Confirmed indicates that the sequence is part of the training set for which the position of the start codon could NOT be determined from protein matches in a BLASTP search against Swiss-Prot. c) Predicted indicates that the sequence is not part of any training set but is predicted by EasyGene.

13. The highest scoring protein match of the mORF in a BLASTP search against Swiss-Prot. Each protein is linked to the corresponding Swiss-Prot entry. A dot indicates that no match was found.

14. Gene or locus name retrieved from the annotation. The GenBank annotation is shown first, and the RefSeq annotation is given in parentheses if it differs from GenBank. An asterisk '*' indicates that the predicted position of the start codon differs from the annotation, and a currency sign '¤' shows that the corresponding annotated CDS is tagged as a pseudogene. A dot indicates that the sequence is not annotated.

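Because every row has the same 14 whitespace-separated columns, a row can be split into named fields in a few lines. This is a minimal sketch: the key names in `COLUMNS` and the function `parse_row` are hypothetical, and it assumes that no field contains whitespace. That holds for the example rows but may not hold for every annotation.

```python
from typing import Any, Dict

# Hypothetical short names for the 14 columns, in the order listed above.
COLUMNS = [
    "sequence_id", "source", "feature", "pos_low", "pos_high", "r_value",
    "strand", "frame", "morf", "start_codon", "log_odds", "descriptor",
    "swissprot_match", "annotation",
]


def parse_row(line: str) -> Dict[str, Any]:
    """Split one output row into named fields.

    A lone dot ("no value") becomes None; positions and frame become int,
    and the R-value and log-odds score become float when present.
    """
    fields = line.split()  # assumes no field contains whitespace
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    row: Dict[str, Any] = {
        name: (None if value == "." else value) for name, value in zip(COLUMNS, fields)
    }
    for key in ("pos_low", "pos_high", "frame"):
        row[key] = int(row[key])
    for key in ("r_value", "log_odds"):
        if row[key] is not None:
            row[key] = float(row[key])
    return row


row = parse_row("NC_000854.1 AP01 ALT 1413180 1415090 3.7e-48 - 0 "
                "ORF1_1413180_1415093 GTG 294 Confirmed DPO2_SULSO APE2229*")
print(row["feature"], row["r_value"], row["annotation"])  # ALT 3.7e-48 APE2229*
```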

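Columns 4 and 5 always hold the lower and the higher direct-strand coordinate, so which of them marks the start codon depends on the strand. The hypothetical helpers below sketch that rule. Since both ends are inclusive, the length of a CDS, ALT or ORF, stop codon included, should be a multiple of three.

```python
from typing import Tuple


def codon_positions(col4: int, col5: int, strand: str) -> Tuple[int, int]:
    """Return (first base of start codon, last base of stop codon).

    Both columns use direct-strand coordinates, so on the reverse strand
    the start codon sits at the higher coordinate (column 5).
    """
    if strand == "+":
        return col4, col5
    if strand == "-":
        return col5, col4
    raise ValueError(f"unknown strand: {strand!r}")


def gene_length(col4: int, col5: int) -> int:
    """Length in bases with both ends inclusive, stop codon included."""
    return col5 - col4 + 1


start, stop = codon_positions(1415059, 1415319, "-")
print(start, stop, gene_length(1415059, 1415319) // 3)  # 1415319 1415059 87
```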

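Column 9 packs the frame and the two mORF coordinates into one identifier. The sketch below assumes the `ORF<frame>_<second>_<third>` layout seen in the example output; the regular expression and the names `MaximalORF`, `parse_morf` and `contains` are hypothetical. Because the second number is always the lower direct-strand coordinate on either strand, a simple range check confirms that a CDS, ALT or ORF lies inside its mORF.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the example output (e.g. "ORF1_1413180_1415093");
# treat the prefix and underscores as an assumption.
MORF_PATTERN = re.compile(r"^ORF(\d)_(\d+)_(\d+)$")


class MaximalORF(NamedTuple):
    frame: int
    low: int   # second number: lower direct-strand coordinate
    high: int  # third number: higher direct-strand coordinate


def parse_morf(text: str) -> Optional[MaximalORF]:
    """Parse column 9; a dot (mORF shorter than 120 bases) gives None."""
    if text == ".":
        return None
    match = MORF_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognised mORF identifier: {text!r}")
    frame, low, high = (int(group) for group in match.groups())
    return MaximalORF(frame, low, high)


def contains(morf: MaximalORF, col4: int, col5: int) -> bool:
    """Check that a CDS, ALT or ORF (columns 4 and 5) lies inside its mORF."""
    return morf.low <= col4 and col5 <= morf.high


morf = parse_morf("ORF1_1413180_1415093")
print(morf, contains(morf, 1413180, 1415090))
```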

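The markers in column 14 can be decoded mechanically. This sketch assumes that the GenBank name comes first, that a differing RefSeq name follows directly in parentheses, and that any '*' or '¤' markers are appended at the end. Only the '*' case appears in the example output, so the rest of the layout is an assumption, as are all names in the code.

```python
import re
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    genbank: str
    refseq: Optional[str]  # only present when it differs from GenBank
    start_differs: bool    # '*': predicted start differs from the annotation
    pseudogene: bool       # '¤': annotated CDS is tagged as a pseudogene


# Assumed layout: NAME, optionally "(REFSEQNAME)", then trailing markers.
ANNOTATION_PATTERN = re.compile(r"^([^()*¤]+)(?:\(([^()]+)\))?([*¤]*)$")


def parse_annotation(text: str) -> Optional[Annotation]:
    """Parse column 14; a dot (sequence not annotated) gives None."""
    if text == ".":
        return None
    match = ANNOTATION_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognised annotation: {text!r}")
    genbank, refseq, markers = match.groups()
    return Annotation(genbank, refseq, "*" in markers, "¤" in markers)


print(parse_annotation("APE2229*"))
```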
EXAMPLE OUTPUT

############# EasyGene predictions ##############

Genome       Source      Feature     Start/   Stop/    R         Strand  Frame  ORF                   Start  Log-odds  EasyGene    Swiss-Prot  Annotation
version      identifier  identifier  stop     start    value                    identifier            codon            descriptor  match
NC_000854.1  AP01        ALT         1413180  1415090  3.7e-48   -       0      ORF1_1413180_1415093  GTG    294       Confirmed   DPO2_SULSO  APE2229*
NC_000854.1  AP01        CDS         1415059  1415319  0.00024   -       0      ORF0_1415059_1415319  ATG    28.4      Predicted   .           .
NC_000854.1  AP01        CDS         1415283  1415930  8.11e-13  -       0      ORF1_1415283_1415933  TTG    80.5      Predicted   .           APE2235*
NC_000854.1  AP01        CDS         1416063  1416776  3.24e-11  -       0      ORF1_1416063_1416797  TTG    68.6      Predicted   .           APE2236*
NC_000854.1  AP01        CDS         1416930  1417493  1.83e-06  -       0      ORF1_1416930_1417520  TTG    37.8      Predicted   .           APE2238*
NC_000854.1  AP01        ORF         1416930  1417463  2.15e-06  -       0      ORF1_1416930_1417520  TTG    38.6      Predicted   .           APE2238*
...
NC_000854.1  AP01        CDS         1644099  1645145  1.75e-16  +       0      ORF2_1644099_1645145  GTG    96.5      Predicted   YQJP_BACSU  APE2585
NC_000854.1  AP01        ORF         1644111  1645145  2.23e-16  +       0      ORF2_1644099_1645145  ATG    96.2      Predicted   YQJP_BACSU  APE2585*
NC_000854.1  AP01        CDS         1654144  1655007  8.92e-11  +       0      ORF0_1654144_1655007  TTG    58.3      Confirmed   YQ01_AERPE  APE2601*
NC_000854.1  AP01        ALT         1654195  1655007  2.26e-10  +       0      ORF0_1654144_1655007  TTG    57.7      Confirmed   YQ01_AERPE  APE2601*
NC_000854.1  AP01        ALT         1654204  1655007  2.51e-10  +       0      ORF0_1654144_1655007  ATG    57.8      Confirmed   YQ01_AERPE  APE2601*
NC_000854.1  AP01        CDS         1658355  1659056  1.37e-05  +       0      ORF2_1658304_1659056  ATG    19.3      Confirmed   PSRB_WOLSU  APE2605*

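Several rows can share one mORF (column 9) when alternative start codons are listed. The sketch below groups a few of the example rows by mORF and picks the row with the smallest R-value, which column 6 describes as the most likely gene. The function name `best_per_morf` is hypothetical. In these rows the selected entry is the CDS in both groups, although the log-odds score (column 11) does not always rank the rows the same way: in the first group the ORF row has the higher log-odds score.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A few rows copied from the example output above.
EXAMPLE = """\
NC_000854.1 AP01 CDS 1416930 1417493 1.83e-06 - 0 ORF1_1416930_1417520 TTG 37.8 Predicted . APE2238*
NC_000854.1 AP01 ORF 1416930 1417463 2.15e-06 - 0 ORF1_1416930_1417520 TTG 38.6 Predicted . APE2238*
NC_000854.1 AP01 CDS 1654144 1655007 8.92e-11 + 0 ORF0_1654144_1655007 TTG 58.3 Confirmed YQ01_AERPE APE2601*
NC_000854.1 AP01 ALT 1654195 1655007 2.26e-10 + 0 ORF0_1654144_1655007 TTG 57.7 Confirmed YQ01_AERPE APE2601*
NC_000854.1 AP01 ALT 1654204 1655007 2.51e-10 + 0 ORF0_1654144_1655007 ATG 57.8 Confirmed YQ01_AERPE APE2601*
"""


def best_per_morf(text: str) -> Dict[str, Tuple[str, float]]:
    """Map each mORF (column 9) to the (feature, R-value) with the smallest R."""
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 14 or fields[5] == ".":
            continue  # skip malformed lines and rows without an R-value
        feature, r_value, morf = fields[2], float(fields[5]), fields[8]
        groups[morf].append((feature, r_value))
    return {morf: min(rows, key=lambda row: row[1]) for morf, rows in groups.items()}


for morf, (feature, r_value) in best_per_morf(EXAMPLE).items():
    print(morf, feature, r_value)
```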

GETTING HELP

Scientific problems: Pernille Nielsen
Technical problems: