SCGPred: a score-based method for gene structure prediction by combining multiple sources of evidence

Xiao Li, Qingan Ren, Yang Weng, Yunmin Zhu, Yizheng Zhang

Contact me: lix@scu.edu.cn

INTRODUCTION:

SCGPred is a computational program for predicting protein-coding gene structures by combining multiple sources of evidence. The details of the combination method are described in our paper (under review).

Version 1.0 of SCGPred obtains a consensus prediction by combining the following sources of evidence:

  • Genscan prediction
  • Fgenesh prediction
  • GENEID prediction
  • AUGUSTUS prediction
  • FirstEF prediction
  • alignments with EST and protein databases

    SCGPred writes its predictions in GTF2 (Gene Transfer Format, http://mblab.wustl.edu/GTF22.html).
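For readers who want to post-process the GTF2 output, the following is a minimal Python sketch of a line parser. It relies only on the general GTF2 layout (nine tab-separated fields, with attributes such as gene_id and transcript_id in the last field). The sample line, including the source name "SCGPred" and the identifiers "g1" and "g1.t1", is hypothetical and not taken from actual SCGPred output.

```python
from typing import Dict, NamedTuple, Optional


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int                 # 1-based, inclusive
    end: int                   # 1-based, inclusive
    score: Optional[float]     # None when the field is "."
    strand: str
    frame: Optional[int]       # None when the field is "."
    attributes: Dict[str, str]


def parse_gtf_line(line: str) -> Optional[GtfRecord]:
    """Parse one GTF2 line; return None for blank lines and comment lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 9:
        raise ValueError("expected 9 tab-separated fields, got %d" % len(fields))
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields[:9]

    # Attributes look like: gene_id "g1"; transcript_id "g1.t1";
    # Simplification: values containing ';' are not handled.
    attributes: Dict[str, str] = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')

    return GtfRecord(
        seqname, source, feature, int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if frame == "." else int(frame),
        attributes,
    )


# Hypothetical example line (not real SCGPred output).
example = 'chr22\tSCGPred\tCDS\t1000\t1200\t0.85\t+\t0\tgene_id "g1"; transcript_id "g1.t1";'
record = parse_gtf_line(example)
print(record.feature, record.start, record.end, record.attributes["gene_id"])
```

Grouping the parsed records by their transcript_id attribute would then recover each predicted gene structure.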


    EVALUATION:

    SCGPred was evaluated on human chromosome 22 and ENCODE regions:

    Table 1. Accuracy results on human chromosome 22, using the hg18 version of the UCSC annotation as the standard of truth. SGP2 predictions were downloaded from the UCSC Genome Browser. SCGPred predictions combining different types of validation evidence were all computed with the same penalty factors.

    Methods     base sens  base spec  exon sens  exon spec  exon avg   gene sens  gene spec
    Genscan     0.89       0.48       0.71       0.40       0.56       0.09       0.05
    AUGUSTUS    0.83       0.60       0.67       0.53       0.60       0.19       0.09
    Fgenesh     0.86       0.61       0.73       0.53       0.63       0.15       0.09
    GENEID      0.83       0.63       0.68       0.55       0.62       0.16       0.09
    SCGPred
      ALL       0.83       0.73       0.74       0.70       0.72       0.21       0.18
      Protein   0.76       0.75       0.69       0.72       0.71       0.14       0.15
      EST       0.82       0.74       0.73       0.70       0.72       0.21       0.18
    SGP2        0.85       0.69       0.74       0.60       0.67       0.22       0.14

    Table 2. Accuracy results on ENCODE regions, using the hg17 version of the UCSC annotation as the standard of truth.

    Methods     base sens  base spec  exon sens  exon spec  exon avg   gene sens  gene spec
    Genscan     0.87       0.43       0.67       0.38       0.52       0.10       0.04
    AUGUSTUS    0.78       0.59       0.58       0.53       0.56       0.15       0.07
    Fgenesh     0.87       0.44       0.71       0.43       0.57       0.15       0.05
    GENEID      0.82       0.48       0.62       0.48       0.55       0.13       0.05
    SCGPred
      ALL       0.80       0.64       0.69       0.68       0.69       0.20       0.14
      Protein   0.76       0.65       0.65       0.69       0.67       0.18       0.14
      EST       0.78       0.67       0.68       0.71       0.70       0.20       0.16
    SGP2        0.85       0.77       0.70       0.61       0.66       0.14       0.10
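To make the table columns concrete, here is a small Python sketch of how base-level and exon-level accuracy are commonly computed in gene prediction evaluation. It assumes the conventional definitions (sensitivity = TP / (TP + FN), specificity = TP / (TP + FP)) and assumes that the exon average column is the mean of exon sensitivity and exon specificity, which is consistent with the tabulated values. It is an illustration only, not the code of the Eval package used for the tables, and the function names and coordinates are hypothetical.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive, as in GTF


def covered_bases(intervals: Iterable[Interval]) -> Set[int]:
    """Return the set of positions covered by the given intervals."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def base_level_accuracy(predicted: Iterable[Interval],
                        annotated: Iterable[Interval]) -> Tuple[float, float]:
    """Base-level (sensitivity, specificity) over coding positions."""
    pred = covered_bases(predicted)
    truth = covered_bases(annotated)
    true_positives = len(pred & truth)
    sens = true_positives / len(truth) if truth else 0.0
    spec = true_positives / len(pred) if pred else 0.0
    return sens, spec


def exon_level_accuracy(predicted: Iterable[Interval],
                        annotated: Iterable[Interval]) -> Tuple[float, float, float]:
    """Exon-level (sensitivity, specificity, average).

    An exon counts as correct only if both boundaries match exactly.
    """
    pred = set(predicted)
    truth = set(annotated)
    correct = len(pred & truth)
    sens = correct / len(truth) if truth else 0.0
    spec = correct / len(pred) if pred else 0.0
    return sens, spec, (sens + spec) / 2


# Hypothetical coordinates for illustration only.
annotated_exons = [(100, 199), (300, 399)]
predicted_exons = [(100, 199), (320, 399), (500, 599)]

print(base_level_accuracy(predicted_exons, annotated_exons))
print(exon_level_accuracy(predicted_exons, annotated_exons))
```

Storing every covered position in a set keeps the sketch short but is memory-hungry at chromosome scale; an interval-based sweep would be a better fit for real data.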

    See the figures showing the score transformation and the relationship between predictive accuracy and the penalty factor, with and without validation evidence.

     

    DOWNLOAD:

    SCGPred is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License.

    Download SCGPred (version 1.0). Read the README.

    Download SCGPred predictions: Human chromosome 22 and ENCODE regions.


    LINKS:

    Genscan: http://genes.mit.edu/GENSCAN.html
    Fgenesh: http://sun1.softberry.com/berry.phtml?topic=fgenesh&group=programs&subgroup=gfind
    AUGUSTUS: http://augustus.gobics.de/
    GENEID: http://www1.imim.es/geneid.html
    FirstEF: http://rulai.cshl.edu/tools/FirstEF/
    NCBI-BLAST: ftp://ftp.ncbi.nih.gov/blast/

    UCSC Genome Browser: http://genome.ucsc.edu/
    The Eval package: http://mblab.wustl.edu/software/eval/