ORIGINAL ARTICLE
Year : 2017  |  Volume : 1  |  Issue : 1  |  Page : 29-36

Evaluation of in silico protein secondary structure prediction methods by employing statistical techniques


Department of Bioinformatics, Bharathiar University, Coimbatore, Tamil Nadu, India

Date of Web Publication: 24-Jul-2017

Correspondence Address:
Kandavelmani Angamuthu
Department of Bioinformatics, Bharathiar University, Coimbatore - 641 046, Tamil Nadu
India

Source of Support: None, Conflict of Interest: None


DOI: 10.4103/bbrj.bbrj_28_17

  Abstract 

Background: With the advent of many advanced techniques, the sequences of a large number of proteins have become available. However, the relative paucity of experimentally determined three-dimensional structures for these proteins has paved the way for the development of computational structure prediction methods. Protein secondary structure prediction is an essential step in modeling the tertiary structure. Among the various secondary structure prediction methods available, three methods with distinct working principles, namely GOR, HNN, and SOPMA, were evaluated for their efficiency in predicting secondary structures.
Methods: A set of 90 proteins with known secondary structures from three major classes, namely mainly alpha, mainly beta, and mainly alpha beta, was used as reference. Secondary structure data of these proteins obtained through experimental methods were compared with the predictions made by GOR, HNN, and SOPMA using several statistical analyses, namely the paired sample test, correlation coefficient, standard deviation, standard error mean, and scatter plots.
Results: GOR and HNN were found to predict helical structures more accurately than sheets. SOPMA was observed to predict sheets more accurately than helices.
Conclusion: Based on the observed results, it can be concluded that no single tool consistently predicts all secondary structures accurately. A combined use of these secondary structure prediction tools could further enhance the efficacy of in silico protein secondary structure prediction.

Keywords: Correlation coefficient, paired sample test, scatter plots, standard deviation, standard error mean


How to cite this article:
Angamuthu K, Piramanayagam S. Evaluation of in silico protein secondary structure prediction methods by employing statistical techniques. Biomed Biotechnol Res J 2017;1:29-36

How to cite this URL:
Angamuthu K, Piramanayagam S. Evaluation of in silico protein secondary structure prediction methods by employing statistical techniques. Biomed Biotechnol Res J [serial online] 2017 [cited 2019 Jan 21];1:29-36. Available from: http://www.bmbtrj.org/text.asp?2017/1/1/29/211411




  Introduction


Of all the molecules found in living organisms, proteins are the most important, as they are the biological workhorses that carry out vital functions in every cell. With the advent of various sequencing techniques, amino acid sequences have been determined for a large number of proteins. However, three-dimensional structural information obtained through X-ray crystallography, nuclear magnetic resonance, and other experimental methods is available for only around 10% of these protein sequences. Hence, computational prediction of protein structures has become important with the rapid growth of protein sequence databases. The use of computational methods to predict protein structure from primary structure information alone started over 40 years ago.[1],[2],[3] The prediction of protein secondary structure is an important step in modeling the tertiary structure of a protein, which in turn is essential for the functional annotation of the protein. Since computational methods were first used for protein structure prediction, various secondary structure prediction tools have been developed and made available online.[1],[4] Most of these secondary structure prediction algorithms are based on machine learning techniques.[5],[6],[7] After 1990, evolutionary information found in multiple sequence alignments was incorporated to improve the accuracy of secondary structure prediction.[8] Improvements were also made by incorporating various strategies in training the model.[9],[10] Although various methods with different working principles are available, their accuracy in predicting the structure is not clearly known.
The present study evaluates the accuracy of three of the most widely used tools, GOR,[11] HNN,[12] and Self-Optimized Prediction Method with Alignment (SOPMA),[13] in predicting the secondary structure of ninety proteins for which experimentally determined primary and secondary structural information is available. Each of these tools works on a different principle: GOR combines information theory, Bayesian statistics, and evolutionary information; HNN is based on trained hierarchical neural networks; and SOPMA takes into account information from an alignment of sequences belonging to the same family. The study therefore also checked whether these tools agree with one another in predicting secondary structures.


  Methods


To test the accuracy of the GOR,[11] HNN,[12] and SOPMA[13] predictive methods, ninety proteins of known secondary structure were used. For the analysis, thirty proteins were taken from each of the three major classes: mainly alpha, mainly beta, and mainly alpha-beta [Table 1], [Table 2], [Table 3]. The secondary structure data of these proteins obtained through experimental methods were acquired from the Protein Data Bank (www.rcsb.org). Protein sequences of all three classes were analyzed by GOR,[11] HNN,[12] and SOPMA.[13] The percentage of residues predicted as alpha helix and as beta sheet was calculated for each protein from the results [Table 1], [Table 2], [Table 3]. The correlation between the predicted values of alpha helix and beta sheet for each of the three classes of protein and the corresponding experimental values was assessed individually by paired sample t-test using the statistical software SPSS® V12.0 (SPSS Inc. Released 2007. SPSS for Windows, Version 16.0. Chicago, SPSS Inc.). The correlation coefficient, standard deviation, and standard error mean were also calculated between the two sets of data. Scatter plots for determining the degree of correlation between the predicted data and experimental data were generated using the StatSoft® STATISTICA software.
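The percentage calculation described above can be sketched as follows. This is a minimal illustration and not the authors' procedure: it assumes that each prediction (or experimental assignment) is available as a per-residue string using the single-letter states `H` (alpha helix), `E` (extended strand, counted here as beta sheet), and `C` (coil). The function name `percent_composition` and the example string are hypothetical.

```python
from collections import Counter


def percent_composition(states: str) -> dict:
    """Return the percentage of residues assigned to helix and to sheet.

    `states` is an assumed per-residue string such as "CCHHHHEEEC".
    """
    states = states.upper()
    if not states:
        raise ValueError("empty secondary-structure string")
    counts = Counter(states)
    total = len(states)
    return {
        "helix": 100.0 * counts["H"] / total,  # alpha-helix residues
        "sheet": 100.0 * counts["E"] / total,  # beta-strand residues
    }


# Hypothetical 20-residue assignment: 8 helix, 4 strand, 8 coil residues.
example = "CCHHHHHHHHCCEEEECCCC"
result = percent_composition(example)
print(result)
```

Applying the same function to the experimental assignment and to each tool's output would give the paired percentages that the statistical tests compare.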
Table 1: Comparison of experimental and predicted percent alpha-helix and beta-sheet structure in “mainly alpha” class of proteins
Table 2: Comparison of experimental and predicted percent alpha-helix and beta-sheet structure in “mainly beta” class of proteins
Table 3: Comparison of experimental and predicted percent alpha-helix and beta-sheet structure in “alpha beta” class of proteins



  Results and Discussion


The secondary structures predicted by GOR,[11] HNN,[12] and SOPMA[13] for each of the three classes of protein were compared with the experimentally derived data. The correlation in the prediction of either alpha helix or beta sheet by paired sample t-test was found to be 0.21%, which indicates that, for all three classes of protein tested, the predicted secondary structure and the experimental results are likely to be independent.
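The paired comparison can be illustrated with a short sketch. It is not the SPSS procedure used in the study: it assumes two equal-length lists of percentages (experimental and predicted) for one structure type and one tool, and it computes the mean difference, its standard deviation and standard error, the paired-sample t statistic, and the Pearson correlation coefficient. The function name `paired_summary` and the four data points are hypothetical.

```python
import math


def paired_summary(experimental, predicted):
    """Summarise paired experimental/predicted percentages."""
    n = len(experimental)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")

    # Paired differences and their sample statistics (n - 1 denominator).
    diffs = [p - e for e, p in zip(experimental, predicted)]
    mean_diff = sum(diffs) / n
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    sem_diff = sd_diff / math.sqrt(n)
    # Paired-sample t statistic with n - 1 degrees of freedom.
    t_stat = mean_diff / sem_diff if sem_diff > 0 else float("nan")

    # Pearson correlation coefficient between the two samples.
    mean_e = sum(experimental) / n
    mean_p = sum(predicted) / n
    sxy = sum((e - mean_e) * (p - mean_p) for e, p in zip(experimental, predicted))
    sxx = sum((e - mean_e) ** 2 for e in experimental)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    r = sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else float("nan")

    return {"n": n, "mean_diff": mean_diff, "sd_diff": sd_diff,
            "sem_diff": sem_diff, "t": t_stat, "df": n - 1, "r": r}


# Hypothetical percent-helix values for four proteins (not data from the study).
experimental = [60.0, 55.0, 70.0, 65.0]
predicted = [50.0, 52.0, 66.0, 68.0]
summary = paired_summary(experimental, predicted)
print(summary)
```

The sketch stops at the t statistic and degrees of freedom; converting these to a significance value requires the t distribution, which a statistics package would supply.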

The correlation coefficients calculated for all classes of protein for GOR,[11] HNN,[12] and SOPMA[13] [Table 4] indicate a lack of correspondence with the experimental data. Predictions made by all three methods are poor in the mainly alpha class of protein, whereas considerable prediction is observed in the mainly beta and mainly alpha-beta classes. With one exception, none of the methods shows greater than 0.5% correlation: SOPMA[13] shows 0.513% correlation in predicting alpha helices of the mainly alpha-beta class of protein, but it deviated widely in predicting beta sheets of that class and both alpha helices and beta sheets of the other classes. Further evidence of the inefficient prediction by the three methods can be seen in the significance values [Table 5].
Table 4: Correlation coefficient with experimental data
Table 5: Significance of paired sample t-test (95% confidence level)


Moreover, a lack of correlation was observed between experimental and predicted values for helices and sheets in all three classes [Figure 1], [Figure 2], [Figure 3]. If the correlation between the experimental data and the predicted data were exact, the points would fall along the line of fit. [Figure 1], [Figure 2], and [Figure 3] clearly show a poor correlation between the experimental data and the prediction methods for all classes of proteins.
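The source does not state how the line of fit in the scatter plots was obtained. One common choice, assumed here purely for illustration, is an ordinary least-squares line of predicted values against experimental values. The function name `least_squares_line` and the data are hypothetical.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical experimental (x) and predicted (y) percentages.
x = [10.0, 20.0, 30.0, 40.0]
y = [12.0, 22.0, 32.0, 42.0]
slope, intercept = least_squares_line(x, y)
print(slope, intercept)
```

In this constructed example every point lies exactly on the fitted line; the wider the scatter of real points around the line, the weaker the correlation.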
Figure 1: Scatter plots of experimental versus predicted secondary structures in mainly alpha class of proteins. Abbreviations: G: GOR, H: HNN, S: SOPMA, A: Alpha helix, B: Beta sheet, E: Experimental value
Figure 2: Scatter plots of experimental versus predicted secondary structures in mainly beta class of proteins. Abbreviations: G: GOR, H: HNN, S: SOPMA, A: Alpha helix, B: Beta sheet, E: Experimental value
Figure 3: Scatter plots of experimental versus predicted secondary structures in mainly alpha beta class of proteins. Abbreviations: G: GOR, H: HNN, S: SOPMA, A: Alpha helix, B: Beta sheet, E: Experimental value



  Conclusion


In summary, these tests imply that GOR,[11] HNN,[12] and SOPMA[13] do not consistently achieve a high degree of accuracy in their predictions. However, of the ninety proteins analyzed, the secondary structures of three proteins were predicted accurately by GOR,[11] HNN,[12] and SOPMA[13] (CALBINDIN D9K, 1BOC; ENDOGLUCANASE Z, 1AIW; and TRIOSE PHOSPHATE ISOMERASE CHAIN A, 8TIM, respectively). These accurate predictions show that the methods can be efficient in individual cases. Since accuracy is not observed consistently across all the analyzed proteins, further improvement of these methods, or a combination of them, may improve the average prediction efficiency. Proteins are also characterized on the basis of their interpretable features and physicochemical properties using statistical techniques.[14] It has also been reported that combining protein secondary structure prediction methods with additional protein structure features provides more accurate results.[15] GOR[11] and HNN[12] predict helices more accurately than sheets, whereas SOPMA[13] predicts sheets more accurately. This implies that some regions are more likely than others to be predicted accurately by a given tool. Therefore, it can be concluded that the combined use of secondary structure prediction methods could enhance the accuracy of in silico analysis. In future, the protein secondary structure prediction methods themselves could be refined to further enhance prediction accuracy.
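The article does not specify how the methods might be combined. As one hedged illustration, a simple per-residue majority vote over the three predictions could look like the sketch below. It assumes equal-length prediction strings with the states `H`, `E`, and `C`; the function name `consensus_prediction`, the tie-breaking rule (fall back to coil), and the example strings are all assumptions.

```python
from collections import Counter


def consensus_prediction(predictions, tie_state="C"):
    """Combine per-residue predictions by majority vote.

    A state is accepted only if more than half of the methods agree;
    otherwise `tie_state` is used (an arbitrary choice for this sketch).
    """
    if not predictions:
        raise ValueError("no predictions supplied")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    for column in zip(*predictions):
        state, count = Counter(column).most_common(1)[0]
        consensus.append(state if 2 * count > len(predictions) else tie_state)
    return "".join(consensus)


# Hypothetical outputs of three tools for a six-residue fragment.
gor = "HHHEEC"
hnn = "HHCEEC"
sopma = "HCCEEH"
combined = consensus_prediction([gor, hnn, sopma])
print(combined)
```

Because the study found that the tools differ in how well they handle helices and sheets, a weighted vote per structure type may be more appropriate than this unweighted one; that choice is left open here.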

Financial support and sponsorship

This work is supported by the Department of Biotechnology – Bioinformatics Facility, Government of India.

Conflicts of interest

There are no conflicts of interest.



 
  References

1. Heringa J. Computational methods for protein secondary structure prediction using multiple sequence alignments. Curr Protein Pept Sci 2000;1:273-301.
2. Rost B. Review: Protein secondary structure prediction continues to rise. J Struct Biol 2001;134:204-18.
3. Simossis VA, Heringa J. Integrating protein secondary structure prediction and multiple sequence alignment. Curr Protein Pept Sci 2004;5:249-66.
4. Sen TZ, Jernigan RL, Garnier J, Kloczkowski A. GOR V server for protein secondary structure prediction. Bioinformatics 2005;21:2787-8.
5. Guermeur Y, Geourjon C, Gallinari P, Deléage G. Improved performance in protein secondary structure prediction by inhomogeneous score combination. Bioinformatics 1999;15:413-21.
6. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292:195-202.
7. Jones DT, Swindells MB. Getting the most from PSI-BLAST. Trends Biochem Sci 2002;27:161-4.
8. Lin K, Simossis VA, Taylor WR, Heringa J. A simple and fast secondary structure prediction method using hidden neural networks. Bioinformatics 2005;21:152-9.
9. Kneller DG, Cohen FE, Langridge R. Improvements in protein secondary structure prediction by an enhanced neural network. J Mol Biol 1990;214:171-82.
10. Jiang F. Prediction of protein secondary structure with a reliability score estimated by local sequence clustering. Protein Eng 2003;16:651-7.
11. Garnier J, Gibrat JF, Robson B. GOR method for predicting protein secondary structure from amino acid sequence. Methods Enzymol 1996;266:540-53.
12. Guermeur Y. Combinaison de Classifieurs Statistiques, Application a la Prediction de Structure Secondaire des Proteines (Ph.D Thesis);1997.
13. Geourjon C, Deléage G. SOPMA: Significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments. Comput Appl Biosci 1995;11:681-4.
14. Pratiwi R, Malik AA, Schaduangrat N, Prachayasittikul V, Wikberg JE, Nantasenamat C, et al. CryoProtect: A Web Server for Classifying Antifreeze Proteins from Nonantifreeze Proteins. Journal of Chemistry, 2017. Article ID 9861752, 15 pages. https://doi.org/10.1155/2017/9861752.
15. Yan R, Song J, Cai W, Zhang Z. A short review on protein secondary structure prediction methods. In: Elloumi M, Iliopoulos CS, Wang JT, Zomaya AY, editors. Pattern Recognition in Computational Molecular Biology: Techniques and Approaches. New York, USA: Wiley; 2015. p. 99.







 
