Year: 2018 | Volume: 2 | Issue: 1 | Page: 20-25
Advances in protein tertiary structure prediction
Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
Date of Web Publication: 5-Mar-2018
Dr. Tayebeh Farhadi
Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Shiraz University of Medical Sciences, Shiraz
Source of Support: None, Conflict of Interest: None
Proteins are composed of linear chains of amino acids that fold into a unique three-dimensional structure in their native environment. This native structure enables a protein to perform its biochemical activity. Protein structure is organized into several levels. The primary structure is specified by the amino acid sequence. Within a sequence, patterns of local bonding can be identified as secondary structure. The final level, the tertiary structure, is composed of these elements and forms when the protein folds into its native state. Physicochemical principles and the identification of the lowest free-energy states are regarded as the best basis for finding the native structure of a protein, while bioinformatics-based methods have had considerable success in predicting target proteins with unknown structures. Protein structure prediction methods are mainly classified into three types: ab initio folding, comparative (homology) modeling, and threading. Which method is applied to a given protein depends on the availability of related experimental structures deposited in the Protein Data Bank (PDB). Once an initial model is generated, refinement simulations are conducted to reassemble the global topology and the local structures of the protein chains. Because significant features of a model may lie in regions that are structurally distinct from the template, refinement of the initial model is important. A reliable evaluation strategy includes a stereochemical check and an assessment of how far the model deviates from the basic principles observed in known experimental structures.
Keywords: Model evaluation, model refinement, protein modeling, protein tertiary structure
How to cite this article:
Farhadi T. Advances in protein tertiary structure prediction. Biomed Biotechnol Res J 2018;2:20-5
Introduction
Proteins are composed of linear chains of amino acids that fold into a unique three-dimensional (3D) structure in their native environment. This native structure enables a protein to perform its biochemical activity.
Protein structure is organized into several levels. The primary structure is specified by the amino acid sequence. Within a sequence, patterns of local bonding can be identified as secondary structure. The two most prevalent types of secondary structure are α-helices and β-sheets, and these elements are connected by regions called loop regions. The final level, the tertiary (or 3D) structure, is composed of these elements and forms when the protein folds into its native state. As an example, [Figure 1] shows the 3D structure of the CRISPR-associated 9 (Cas9) protein (PDB ID: 5FQ5), visualized using PyMOL. In the figure, the α-helices, β-sheets, and loop regions are displayed in red, yellow, and green, respectively.
Figure 1: The three-dimensional structure of CRISPR-associated 9 protein (Protein Data Bank ID: 5FQ5). The protein structure was visualized using the PyMOL molecular visualization tool. The α-helices, β-sheets, and loop regions are shown in red, yellow, and green, respectively
Protein Structure Prediction
For many years, the challenge of predicting the tertiary structure of proteins from their amino acid sequences has attracted researchers from different fields of study. In recent years, ample evidence has accumulated for the importance of three-dimensional structural information, and consequently the potential impact of advances in protein structure prediction is huge. As an example, the small number of experimentally determined structures available for a protein family is usually not enough to establish the structure-function relationships among its members. However, models of the other family members that are built from the experimentally determined structures make it possible to deduce such structure-function relationships. Models can also be used as a basis for analyzing the function of individual proteins, much as is done with experimentally resolved structures. However, in spite of the enormous potential impact of protein structure prediction, the degree of confidence with which generated models can be used in various scientific applications remains unclear.
Physicochemical principles and the identification of the lowest free-energy states are regarded as the best basis for finding the native structure of a protein. For predicting target proteins with unknown structures, bioinformatics-based methods have had considerable success. Such approaches gather information from the solved structures of related proteins deposited in the Protein Data Bank (PDB). The critical steps of bioinformatics-based methods are target (query)-template sequence alignment, fold recognition, fragment-based structural assembly, and multiple template-based structural refinement. [Table 1] summarizes the publicly available software and web servers for automated protein structure prediction.
Table 1: A list of publicly available tools for protein structure modeling
Review of Protein Structure Prediction Approaches
Protein structure prediction methods are mainly classified into three types: ab initio folding, comparative (homology) modeling, and threading. Which method is applied to a given protein depends on the availability of related experimental structures deposited in the PDB.
Ab initio (also named de novo) modeling was originally defined as the class of methods based on the first-principle laws of physics and chemistry, which state that the native state of a protein lies at the global free-energy minimum. Hence, an ab initio procedure tries to fold a given protein from the query sequence using different force fields and broad conformational search algorithms. However, such purely physicochemical techniques have shown limited success. The most successful methods in this class still use evolutionary and knowledge-based information to gather short structural fragments and spatial restraints that aid the structural assembly process. This class is now named “free modeling” in the CASP experiments because many of the techniques do not rely purely on first principles.
In comparative modeling (CM), the structure of a protein is predicted by comparing the sequence of the query (also named target) protein with that of an evolutionarily related protein of known structure (also named template) in the PDB. A prerequisite for CM is therefore the existence of a homologous protein in the PDB. CM methods produce models by copying the aligned structures of the templates or by satisfying contact/distance restraints derived from the templates. As a result, CM models routinely have a strong bias toward the template and are closer to the template structure than to the native structure of the target protein. This is regarded as a fundamental limitation of the approach. Consequently, one of the significant questions for CM (and for other template-based approaches) is how to refine the generated models so that they are closer to the native structure than the templates used.
Threading (also named fold recognition) is a bioinformatics strategy that searches the PDB library for protein templates that have a fold or structural motif similar to that of the query protein. It is comparable to CM in that both strategies attempt to generate a structural model using experimentally solved structures as templates. It has been demonstrated that many proteins with low sequence identity can have similar folds. Therefore, threading aims to detect target-template alignments regardless of the evolutionary relationship.
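The notion of sequence identity between a target and a template can be made concrete with a small calculation. The following is a minimal sketch, assuming the two sequences are already aligned with `-` as the gap character and that identity is computed over columns where neither sequence has a gap; other definitions of the denominator are in use, and the sequences shown are hypothetical toy strings rather than real proteins.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0   # columns without a gap in either sequence
    identical = 0  # compared columns holding the same residue
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical toy alignment (assumption for illustration only).
target = "MKT-AYIAKQR"
template = "MKSLAYLA-QR"
print(round(percent_identity(target, template), 1))  # 7 of 9 compared columns match -> 77.8
```

A low value from such a calculation does not rule out a shared fold, which is the situation threading methods are designed to handle.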
When the sequence identity is low, finding an accurate target-template alignment becomes critical. The design of an accurate alignment scoring function is therefore central to the effectiveness of these methods. Frequently employed alignment scores include sequence-structure profile match, secondary structure match, sequence profile-profile alignment, and residue-residue contacts, with the best-scoring alignments commonly found by hidden Markov modeling or dynamic programming. In recent years, composite scoring functions that combine multiple structural properties, such as torsion angles and solvent accessibility, have brought additional gains in template identification.
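To illustrate how a best-scoring alignment can be found by dynamic programming, here is a minimal sketch of global alignment in the style of Needleman and Wunsch. The scoring values (`match=1`, `mismatch=-1`, linear `gap=-2`) are assumptions chosen for the example; threading programs use much richer profile- and structure-based scores than this simple residue match.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align the two residues
                score[i - 1][j] + gap,             # gap in seq_b
                score[i][j - 1] + gap,             # gap in seq_a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

When several alignments share the optimal score, this traceback returns only one of them, preferring a residue-residue pairing over a gap.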
Two trends can be seen in the field of protein structure prediction. The first is that the borders between the conventional types of modeling approaches have become blurred. Many ab initio techniques apply spatial restraints or structural fragments identified by threading, and both comparative modeling and threading depend on multiple sequence alignments. The second trend follows from the fact that no single technique outperforms the others for all protein targets, which has led to the introduction of meta-server approaches. A common meta-server approach is to generate a number of models with multiple programs developed by different laboratories and then to select the final models from the best-ranking ones. Although different approaches can be tried for model and template selection, the most effective strategy appears to be consensus selection, which picks the models that are most often built by the various methods and that are generally the closest to the native structure.
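A simple form of consensus selection can be sketched as picking the model that is, on average, most similar to all the other models. The sketch below is one possible illustration and not the procedure of any particular meta-server: it assumes each model is a list of Cα coordinates of equal length, and it uses the distance RMSD (dRMSD, which compares internal distances and so needs no superposition) as the similarity measure. The model names and coordinates are hypothetical.

```python
import math
from itertools import combinations


def drmsd(model_a, model_b):
    """Root mean square difference between corresponding internal point-point distances."""
    if len(model_a) != len(model_b):
        raise ValueError("models must have the same number of points")
    pairs = list(combinations(range(len(model_a)), 2))
    total = 0.0
    for i, j in pairs:
        diff = math.dist(model_a[i], model_a[j]) - math.dist(model_b[i], model_b[j])
        total += diff * diff
    return math.sqrt(total / len(pairs))


def consensus_pick(models):
    """Return the name of the model with the lowest mean dRMSD to all other models."""
    if len(models) < 2:
        raise ValueError("need at least two models")
    best_name, best_mean = None, math.inf
    for name, coords in models.items():
        others = [drmsd(coords, c) for other, c in models.items() if other != name]
        mean_dist = sum(others) / len(others)
        if mean_dist < best_mean:
            best_name, best_mean = name, mean_dist
    return best_name


def straight_chain(spacing, length=6):
    """Toy model: points on a straight line with the given spacing (illustration only)."""
    return [(spacing * k, 0.0, 0.0) for k in range(length)]


# Hypothetical models from three imaginary servers.
models = {
    "model_a": straight_chain(3.6),
    "model_b": straight_chain(3.8),
    "model_c": straight_chain(5.0),
}
print(consensus_pick(models))  # model_b lies between the other two
```

Real pipelines typically cluster thousands of decoys and use superposition-based measures, but the selection principle is the same: the model that agrees most with the others is preferred.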
Another efficient meta-server approach for ranking, selecting, and reconstructing protein models is based on information from multiple templates. This approach exploits spatial restraints and structural fragments extracted from numerous templates to guide physics-based structural assembly simulations. It can therefore generate models of higher quality than models based on information from individual templates. According to the community-wide benchmark results of recent CASP experiments, this approach is the most effective and successful method.
Application of Modeling
Structure-based strategies are widely used in rational drug development to discover potent, selective, low-molecular-weight molecules. Homology models have proved useful in structure-based virtual screening, as demonstrated by retrospective investigations on a large variety of targets. In 2017, we investigated the interactions between the CTX-M-15 protein of Klebsiella pneumoniae and 2000 drug-like compounds, carrying out virtual screening to detect novel drug-like compounds as potential competitive inhibitors. [Figure 2], retrieved from our previously published article, displays the molecular complex of a drug-like compound (ID: ZINC21811621) with CTX-M-15 visualized through PyMOL.
Figure 2: Molecular complex of a drug-like compound (ID: ZINC21811621) with CTX-M-15 visualized via PyMOL
Moreover, different studies have reported the importance of predicting the possible effects of amino acid sequence variations at the spatial locations of functionally important residues (such as active/binding sites and sites of disease-associated mutations). Such predictions can be made using structural modeling.
Model Refinement Strategies
Once an initial model is generated, refinement simulations are conducted to reassemble the global topology and the local structures of the protein chains. Because significant features of a model may lie in regions that are structurally distinct from the template, refinement of the initial model is important. These regions include side chains that differ between the template and the target, and loops that lie between secondary structure elements and may adopt quite distinct conformations in the target and the template. Side chain and loop modeling procedures rest on the assumption that the secondary structure elements of the target protein are similar to those of the template structure.
For the computation of side chain conformations, the most frequently used approaches exploit the observed relationship between backbone and side chain conformations and routinely use a “rotamer library” derived from a database of known structures. The approaches vary in how rotamers are sampled, and an energy function is used to assess the individual conformations. Currently, it is possible to predict the conformations of buried side chains with close to experimental precision.
Loop modeling methods generally generate a starting model of the loop in an “open” conformation, in which one end of the loop is not linked to its subsequent residue. The programs then close the loop using different algorithms. The procedure is repeated several times with various starting conformations, and the resulting conformations are evaluated using several energy functions. In general, a combination of thorough sampling and a conformational energy calculation is reported to generate very accurate results.
Model Evaluation
The structural assembly simulations produce a number of structural conformations (also named structural decoys). From all these alternative conformations, the high-quality tertiary model with the correct fold that is closest to the native structure must be selected. A reliable strategy includes a stereochemical check and an assessment of how far the model deviates from the basic principles observed in known experimental structures.
A number of programs have been developed to determine whether a model satisfies standard steric and geometric criteria. In addition, the tools involved in building a model, covering template selection, alignment, model building, and refinement, have their own internal measures of quality. Ultimately, however, the most significant criterion for the quality of a model is its conformational energy. Therefore, scoring functions that reflect this energy should be applied to choose the best model from among the tens, hundreds, or even thousands of predicted candidate models. The Ramachandran plot produced by PROCHECK (http://swissmodel.expasy.org/workspace) is a useful tool for checking the residue-by-residue stereochemical quality of a refined protein model. In one of our previously published articles, the molecular chaperone GroEL from Salmonella typhi was modeled and evaluated through the Ramachandran plot in PROCHECK. Here, [Figure 3], retrieved from that article, shows the stereochemical quality of the resulting model as a Ramachandran plot. According to the figure, most residues of the modeled GroEL are within allowed regions (98.8%). The plot indicates that 91.9% of residues are located in the most favored regions, 6.8% in additional allowed regions, 0.2% in generously allowed regions, and 1.1% in disallowed regions. The most favored, additional allowed, and generously allowed regions are shown in red, yellow, and pale yellow, respectively. The disallowed regions are shown in white.
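A Ramachandran plot is built from the backbone dihedral angles φ and ψ of each residue, where φ is defined by the atoms C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The following minimal sketch shows how a signed dihedral angle can be computed from four atomic positions. The coordinates in the example are made-up test points, not atoms from the GroEL model, and the classification of angles into favored or allowed regions as done by PROCHECK is not reproduced here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b0 = _sub(p0, p1)  # from p1 back to p0
    b1 = _sub(p2, p1)  # central bond, p1 to p2
    b2 = _sub(p3, p2)  # from p2 on to p3
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 must be distinct points")
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    k0 = _dot(b0, axis)
    k2 = _dot(b2, axis)
    v = (b0[0] - k0 * axis[0], b0[1] - k0 * axis[1], b0[2] - k0 * axis[2])
    w = (b2[0] - k2 * axis[0], b2[1] - k2 * axis[1], b2[2] - k2 * axis[2])
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up test points: the central bond runs along the z axis.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Applying `dihedral` to the appropriate backbone atoms of every residue yields the (φ, ψ) pairs that are plotted against each other in a Ramachandran plot.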
To deal with a large number of stored conformations, a hierarchical approach to model evaluation is usually employed. All of the original models are first ranked with simplified, easy-to-evaluate scoring functions. A subset can then be selected for more computationally detailed evaluation. A routinely used scoring function is Verify3D, which assesses segments of the model according to how well the environments of the residues in each segment correlate with the observed propensities of those residues for being in such environments.
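The hierarchical idea can be sketched in a few lines: rank everything with a cheap score, keep only the best few, and re-rank those with a more expensive score. In the sketch below the two scoring functions are stand-ins that simply look up precomputed toy numbers (lower meaning better); the decoy names, scores, and the `keep` parameter are all hypothetical.

```python
from typing import Callable, List, Sequence


def hierarchical_select(
    decoys: Sequence[str],
    cheap_score: Callable[[str], float],
    detailed_score: Callable[[str], float],
    keep: int,
) -> List[str]:
    """Shortlist the `keep` best decoys by the cheap score, then re-rank them by the detailed score.

    Both scores are assumed to be energy-like, so lower values are better.
    """
    shortlist = sorted(decoys, key=cheap_score)[:keep]
    return sorted(shortlist, key=detailed_score)


# Toy precomputed scores (assumptions for illustration only).
cheap = {"d1": -3.0, "d2": -5.0, "d3": -1.0, "d4": -4.5, "d5": -2.0}
detailed = {"d1": -40.0, "d2": -55.0, "d3": -90.0, "d4": -60.0, "d5": -10.0}

ranked = hierarchical_select(list(cheap), cheap.__getitem__, detailed.__getitem__, keep=3)
print(ranked)  # ['d4', 'd2', 'd1']
```

Note that decoy `d3` has the best detailed score in this toy data but never reaches the second stage, which illustrates the trade-off of the approach: the cheap filter saves computation but can discard good models if it correlates poorly with the detailed score.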
Statistics-based scoring functions, such as ProsaII, measure the stability of a polypeptide conformation from the frequency with which the interactions (atom-atom or residue-residue) found in that conformation are observed in the database of known structures. Such functions are simple to evaluate because they depend only on the distances between pairs of atoms. In our previous article, the structure of the modeled GroEL was evaluated using ProSA-web to examine the energy distribution in the protein structure as a function of sequence position and to determine whether the structure is native-like or faulty. Here, [Figure 4] (retrieved from that article) shows that the modeled GroEL is within the range of scores typically found for native proteins of similar size. According to the figure, the ProSA-web z-score of the structure is -11.0.
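A z-score of this kind expresses how far the energy of a structure lies from the mean of a reference energy distribution, in units of that distribution's standard deviation. The following minimal sketch shows only this arithmetic; the energies are made-up numbers, the use of the population standard deviation is an assumption, and the knowledge-based potential and reference conformations that ProSA-web actually uses are not reproduced.

```python
from statistics import mean, pstdev


def z_score(model_energy: float, reference_energies: list) -> float:
    """Deviation of model_energy from the reference mean, in reference standard deviations."""
    mu = mean(reference_energies)
    sigma = pstdev(reference_energies)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("reference energies must not all be identical")
    return (model_energy - mu) / sigma


# Made-up energies in arbitrary units (illustration only).
reference = [-10.0, -12.0, -8.0, -11.0, -9.0]
print(round(z_score(-20.0, reference), 2))  # -7.07
```

A strongly negative value means the model's energy is much lower than that of typical reference conformations, which is the sense in which a z-score is compared with the range observed for native proteins of similar size.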
Figure 4: Evaluation of the quality of GroEL tertiary structure via ProSA-web
There are several alternatives to statistics-based scoring functions. Detailed all-atom estimates of conformational stability can be obtained using molecular mechanics force fields of the type applied in molecular dynamics simulations.
These approaches have recorded impressive successes in their ability to fold protein fragments from unfolded conformations, in their application to the “decoy” problem, and in their capability to choose the experimentally determined X-ray structure from among a large number of variant conformations of the same polypeptide chain.
The major challenges in predicting a native conformation from a set of decoys are sampling and evaluating enough conformations. This is not a new challenge, and it will not be simple to resolve. Researchers believe that molecular dynamics approaches can eventually be employed to achieve this goal: such methods can fold protein fragments from disordered states and give an approximate model that is relatively close to the native structure, which would then be refined to a conformation near the native one. However, this goal has not yet been achieved. Another solution requires a combination of improved alignment methods, the identification of structural templates for each problematic region of a structure, and improved scoring functions and sampling procedures.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Garza-Fabre M, Kandathil SM, Handl J, Knowles J, Lovell SC. Generating, maintaining, and exploiting diversity in a memetic algorithm for protein structure prediction. Evol Comput 2016;24:577-607.
Olson JS, Lubner JM, Meyer DJ, Grant JE. An in silico analysis of primary and secondary structure specificity determinants for human peptidylarginine deiminase types 2 and 4. Comput Biol Chem 2017;70:107-15.
Tsaousis GN, Hamodrakas SJ, Bagos PG. Predicting beta barrel transmembrane proteins using HMMs. In: Westhead D, Vijayabaskar M, editors. Hidden Markov Models: Methods in Molecular Biology. Vol. 1552. New York: Humana Press; 2017.
Carmali S, Murata H, Amemiya E, Matyjaszewski K, Russell AJ. Tertiary structure-based prediction of how ATRP initiators react with proteins. ACS Biomater Sci Eng 2017;3:2086-97.
Sündermann F, Fernandez MP, Morgan RO. An evolutionary roadmap to the microtubule-associated protein MAP tau. BMC Genomics 2016;17:264.
Joseph AP, de Brevern AG. From local structure to a global framework: Recognition of protein folds. J R Soc Interface 2014;11:20131147.
Ovchinnikov S, Park H, Kim DE, Liu Y, Wang RY, Baker D, et al. Structure prediction using sparse simulated NOE restraints with Rosetta in CASP11. Proteins 2016;84 Suppl 1:181-8.
Roy A, Zhang Y. Protein Structure Prediction. eLS; 2012. DOI: 10.1002/9780470015902.a0003031.pub2.
Webb B, Sali A. Protein structure modeling with MODELLER. In: Kaufmann M, Klinger C, Savelsbergh A, editors. Functional Genomics: Methods in Molecular Biology. Vol. 1654. New York: Humana Press; 2017.
Rigden DJ, Cymerman IA, Bujnicki JM. Prediction of protein function from theoretical models. In: Rigden DJ, editor. From Protein Structure to Function with Bioinformatics. Dordrecht: Springer; 2017.
Zhang W, Yang J, He B, Walker SE, Zhang H, Govindarajoo B, et al. Integration of QUARK and I-TASSER for Ab initio protein structure prediction in CASP11. Proteins 2016;84 Suppl 1:76-86.
Anfinsen CB. Principles that govern the folding of protein chains. Science 1973;181:223-30.
Xu D, Zhang Y. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins 2012;80:1715-35.
Simons KT, Kooperberg C, Huang E, Baker D. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 1997;268:209-25.
Moult J, Fidelis K, Kryshtafovych A, Rost B, Tramontano A. Critical assessment of methods of protein structure prediction-Round VIII. Proteins 2009;77 Suppl 9:1-4.
Jauch R, Yeo HC, Kolatkar PR, Clarke ND. Assessment of CASP7 structure predictions for template free targets. Proteins 2007;69 Suppl 8:57-67.
Read RJ, Chavali G. Assessment of CASP7 predictions in the high accuracy template-based modeling category. Proteins 2007;69 Suppl 8:27-37.
Rakesh R, Krishnan R, Sattlegger E, Srinivasan N. Recognition of a structural domain (RWDBD) in Gcn1 proteins that interacts with the RWD domain containing proteins. Biol Direct 2017;12:12.
Söding J. Protein homology detection by HMM-HMM comparison. Bioinformatics 2005;21:951-60.
Kamisetty H, Ovchinnikov S, Baker D. Assessing the utility of coevolution-based residue-residue contact predictions in a sequence-and structure-rich era. Proc Natl Acad Sci U S A 2013;110:15674-9.
Eddy SR. Profile hidden Markov models. Bioinformatics 1998;14:755-63.
Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970;48:443-53.
Yang Y, Faraggi E, Zhao H, Zhou Y. Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics 2011;27:2076-82.
Xu D, Zhang J, Roy A, Zhang Y. Automated protein structure modeling in CASP9 by I-TASSER pipeline combined with QUARK-based Ab initio folding and FG-MD-based structure refinement. Proteins 2011;79 Suppl 10:147-60.
Yang J, Zhang Y. Protein structure and function prediction using I-TASSER. Curr Protoc Bioinform 2015;52:5.8.1-5.8.15. [Doi: 10.1002/0471250953.bi0508s52].
Wu S, Zhang Y. LOMETS: A local meta-threading-server for protein structure prediction. Nucleic Acids Res 2007;35:3375-82.
Zhang J, Wang Q, Barz B, He Z, Kosztin I, Shang Y, et al. MUFOLD: A new solution for protein 3D structure prediction. Proteins 2010;78:1137-52.
Das R, Qian B, Raman S, Vernon R, Thompson J, Bradley P, et al. Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home. Proteins 2007;69 Suppl 8:118-28.
Farhadi T, Fakharian A, Ovchinnikov RS. Virtual screening for potential inhibitors of CTX-M-15 protein of Klebsiella pneumoniae. Interdiscip Sci 2017; [In press]. [Doi: 10.1007/s12539-017-0222-y].
Schwede T. Protein modeling: What happened to the “protein structure gap”? Structure 2013;21:1531-40.
Jacobs TM, Williams B, Williams T, Xu X, Eletsky A, Federizon JF, et al. Design of structurally distinct proteins using strategies inspired by evolution. Science 2016;352:687-90.
Petrey D, Honig B. Protein structure prediction: Inroads to biology. Mol Cell 2005;20:811-9.
Sieradzan AK, Krupa P, Scheraga HA, Liwo A, Czaplewski C. Physics-based potentials for the coupling between backbone-and side-chain-local conformational states in the United Residue (UNRES) force field for protein simulations. J Chem Theory Comput 2015;11:817-31.
Ollikainen N, de Jong RM, Kortemme T. Coupling protein side-chain and backbone flexibility improves the re-design of protein-ligand specificity. PLoS Comput Biol 2015;11:e1004335.
Kolodny R, Guibas L, Levitt M, Koehl P. Inverse kinematics in biology: The protein loop closure problem. Int J Robot Res 2005;24:151-63.
Abel R, Wang L, Harder ED, Berne BJ, Friesner RA. Advancing drug discovery through enhanced free energy calculations. Acc Chem Res 2017;50:1625-32.
Jacobson M, Sali A. Comparative protein structure modeling and its applications to drug discovery. In: Overington J, editor. Annual Reports in Medicinal Chemistry. London: Academic Press; 2004. p. 259-76.
Wlodawer A. Stereochemistry and validation of macromolecular structures. In: Wlodawer A, Dauter Z, Jaskolski M, editors. Protein Crystallography: Methods in Molecular Biology. Vol. 1607. New York: Humana Press; 2017.
Farhadi T, Ovchinnikov RS, Ranjbar MM. In silico designing of some agonists of toll-like receptor 5 as a novel vaccine adjuvant candidates. Netw Model Anal Health Inform Bioinforma 2016;5:31. [Doi: 10.1007/s13721-016-0138-1].
Farhadi T, Hashemian SM. Constructing novel chimeric DNA vaccine against Salmonella enterica based on SopB and GroEL proteins: An in silico approach. J Pharm Investig 2017; [In press]. [Doi: 10.1007/s40005-017-0360-6].
Gupta P, Mehrotra S, Sharma A, Chugh M, Pandey R, Kaushik A, et al. Exploring heme and hemoglobin binding regions of Plasmodium heme detoxification protein for new antimalarial discovery. J Med Chem 2017;60:8298-308.
Eisenberg D, Lüthy R, Bowie JU. Verify3D: Assessment of protein models with three-dimensional profiles. Methods Enzymol 1997;277:396-404.
Goyal P, Qian HJ, Irle S, Lu X, Roston D, Mori T, et al. Molecular simulation of water and hydration effects in different environments: Challenges and developments for DFTB based models. J Phys Chem B 2014;118:11007-27.
Ul-Haq Z, Gul S, Usmani S, Wadood A, Khan W. Binding site identification and role of permanent water molecule of PIM-3 kinase: A molecular dynamics study. J Mol Graph Model 2015;62:276-82.
Sippl MJ. Recognition of errors in three-dimensional structures of proteins. Proteins 1993;17:355-62.
Wiederstein M, Sippl MJ. ProSA-web: Interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res 2007;35:W407-10.
Ołdziej S, Czaplewski C, Liwo A, Chinchio M, Nanias M, Vila JA, et al. Physics-based protein-structure prediction using a hierarchical protocol based on the UNRES force field: Assessment in two blind tests. Proc Natl Acad Sci U S A 2005;102:7547-52.
Zhang C, Liu S, Zhou H, Zhou Y. An accurate, residue-level, pair potential of mean force for folding and binding based on the distance-scaled, ideal-gas reference state. Protein Sci 2004;13:400-11.
Piana S, Klepeis JL, Shaw DE. Assessing the accuracy of physical models used in protein-folding simulations: Quantitative evidence from long molecular dynamics simulations. Curr Opin Struct Biol 2014;24:98-105.
Zhu J, Alexov E, Honig B. Comparative study of generalized born models: Born Radii and peptide folding. J Phys Chem B 2005;109:3008-22.
Angamuthu K, Piramanayagam S. Evaluation of in silico protein secondary structure prediction methods by employing statistical techniques. Biomed Biotechnol Res J 2017;1:29-36.
Fogolari F, Tosatto SC. Application of MM/PBSA colony free energy to loop decoy discrimination: Toward correlation between energy and root mean square deviation. Protein Sci 2005;14:889-901.
O'Meara MJ, Leaver-Fay A, Tyka MD, Stein A, Houlihan K, DiMaio F, et al. Combined covalent-electrostatic model of hydrogen bonding improves structure prediction with Rosetta. J Chem Theory Comput 2015;11:609-22.