Protein residue-residue contact map prediction
Yu Dongjun, Li Yang
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Keywords: protein contact map; protein structure prediction; co-evolution; machine learning; critical assessment of protein structure prediction
Proteins are large biomolecules consisting of one or more chains of amino acid residues. They are among the most important components of living cells and are involved in almost every biological process. The functions of proteins are largely determined by their structures, and accurate prediction of protein contact maps plays an important role in protein three-dimensional (3D) structure prediction. Contact map prediction has therefore become one of the most active topics in bioinformatics. This paper first introduces the background and significance of protein contact map prediction. We then summarize representative contact map prediction methods, including correlation-based methods, direct coupling analysis (DCA) methods and their post-processing strategies, as well as supervised machine learning-based methods. The performance of state-of-the-art methods in the Critical Assessment of Protein Structure Prediction (CASP) competition is analyzed and compared. Applications of predicted contact maps in protein 3D structure modeling are also introduced. Finally, promising directions for addressing the key issues in contact map prediction are discussed.
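The abstract lists correlation-based methods and their post-processing strategies among the surveyed approaches. As an illustration only (not the authors' implementation), the Python sketch below scores every pair of columns in a multiple sequence alignment (MSA) by mutual information (MI) and then applies the average product correction (APC) described by Dunn et al. [35], which subtracts a background term attributed to phylogeny and entropy. The toy alignment, the function names `mutual_information` and `mi_apc`, and the use of raw frequencies without pseudocounts or sequence weighting are assumptions made for brevity.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """MI between alignment columns i and j from raw frequencies (no pseudocounts)."""
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)
    fj = Counter(seq[j] for seq in msa)
    fij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


def mi_apc(msa):
    """Return a symmetric L x L matrix of APC-corrected MI scores (diagonal left at 0)."""
    length = len(msa[0])
    pairs = [(i, j) for i in range(length) for j in range(i + 1, length)]
    mi = [[0.0] * length for _ in range(length)]
    for i, j in pairs:
        mi[i][j] = mi[j][i] = mutual_information(msa, i, j)
    # Mean MI of each column with all other columns, and mean MI over all pairs.
    col_mean = [sum(mi[i][k] for k in range(length) if k != i) / (length - 1)
                for i in range(length)]
    overall = sum(mi[i][j] for i, j in pairs) / len(pairs)
    scores = [[0.0] * length for _ in range(length)]
    for i, j in pairs:
        # APC(i, j) = mean_MI(i) * mean_MI(j) / mean_MI(all pairs)
        apc = col_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
        scores[i][j] = scores[j][i] = mi[i][j] - apc
    return scores


# Hypothetical toy MSA: columns 0 and 3 co-vary perfectly (A<->E, R<->K).
msa = ["AKLE", "AMLE", "AKVE", "RMVK", "RKLK", "RMVK"]
scores = mi_apc(msa)
best = max(((i, j) for i in range(4) for j in range(i + 1, 4)),
           key=lambda p: scores[p[0]][p[1]])
print(best)  # (0, 3)
```

Practical pipelines typically add sequence reweighting and pseudocounts, and DCA methods replace MI with direct couplings; in either case, the residue pairs with the highest corrected scores are reported as predicted contacts.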


[1]Chen Runsheng. Important tasks in current bioinformatics[J]. Progress in Biotechnology,1999,19(4):11-14.
[2]Kendrew J C,Bodo G,Dintzis H M,et al. A three-dimensional model of the myoglobin molecule obtained by x-ray analysis[J]. Nature,1958,181(4610):662-666.
[3]Wüthrich K. The way to NMR structures of proteins[J]. Nature Structural Biology,2001,8(11):923-925.
[4]Taylor K A,Glaeser R M. Electron diffraction of frozen,hydrated protein crystals[J]. Science,1974,186(4168):1036-1037.
[5]Baker D,Sali A. Protein structure prediction and structural genomics[J]. Science,2001,294(5540):93-96.
[6]Anfinsen C B. Principles that govern the folding of protein chains[J]. Science,1973,181(4096):223-230.
[7]Koehl P,Levitt M. A brighter future for protein structure prediction[M]. Nature Publishing Group,1999.
[8]Venselaar H,Joosten R P,Vroling B,et al. Homology modelling and spectroscopy,a never-ending love story[J]. European Biophysics Journal,2010,39(4):551-563.
[9]Jones D T,Taylor W R,Thornton J M. A new approach to protein fold recognition[J]. Nature,1992,358(6381):86-89.
[10]Liwo A,Lee Jooyoung,Ripoll D R,et al. Protein structure prediction by global optimization of a potential energy function[J]. Proceedings of the National Academy of Sciences,1999,96(10):5482-5485.
[11]Rohl C A,Strauss Charlie E M,Misura K M S,et al. Protein structure prediction using Rosetta[J]. Methods in Enzymology,2004,383:66-93.
[12]Xu Dong,Zhang Jian,Roy A,et al. Automated protein structure modeling in CASP9 by I-TASSER pipeline combined with QUARK-based ab initio folding and FG-MD-based structure refinement[J]. Proteins:Structure,Function,and Bioinformatics,2011,79(S10):147-160.
[13]Xu Dong,Zhang Yang. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field[J]. Proteins:Structure,Function,and Bioinformatics,2012,80(7):1715-1735.
[14]Adhikari B,Bhattacharya D,Cao Renzhi,et al. CONFOLD:residue-residue contact-guided ab initio protein folding[J]. Proteins:Structure,Function,and Bioinformatics,2015,83(8):1436-1449.
[15]Ovchinnikov S,Park H,Varghese N,et al. Protein structure determination using metagenome sequence data[J]. Science,2017,355(6322):294-298.
[16]Cocco S,Feinauer C,Figliuzzi M,et al. Inverse statistical physics of protein sequences:a key issues review[J]. Rep Prog Phys,2018,81(3):032601.
[17]Göbel U,Sander C,Schneider R,et al. Correlated mutations and residue contacts in proteins[J]. Proteins:Structure,Function,and Bioinformatics,1994,18(4):309-317.
[18]Martin L C,Gloor G B,Dunn S D,et al. Using information theory to search for co-evolving residues in proteins[J]. Bioinformatics,2005,21(22):4116-4124.
[19]Kass I,Horovitz A. Mapping pathways of allosteric communication in GroEL by analysis of correlated mutations[J]. Proteins:Structure,Function,and Bioinformatics,2002,48(4):611-617.
[20]Wu Fa-Yueh. The Potts model[J]. Reviews of Modern Physics,1982,54(1):235-265.
[21]Morcos F,Pagnani A,Lunt B,et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families[J]. Proceedings of the National Academy of Sciences,2011,108(49):E1293-E1301.
[22]Jones D T,Buchan D W A,Cozzetto D,et al. PSICOV:precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments[J]. Bioinformatics,2011,28(2):184-190.
[23]Ma Jianzhu,Wang Sheng,Wang Zhiyong,et al. Protein contact prediction by integrating joint evolutionary coupling analysis and supervised learning[J]. Bioinformatics,2015,31(21):3506-3513.
[24]Meier L,Van De Geer S,Bühlmann P. The group lasso for logistic regression[J]. Journal of the Royal Statistical Society:Series B(Statistical Methodology),2008,70(1):53-71.
[25]Breiman L. Random forests[J]. Machine Learning,2001,45(1):5-32.
[26]Ekeberg M,Lövkvist C,Lan Yueheng,et al. Improved contact prediction in proteins:using pseudolikelihoods to infer Potts models[J]. Physical Review E,2013,87(1):012707.
[27]Liu Dong C,Nocedal J. On the limited memory BFGS method for large scale optimization[J]. Mathematical Programming,1989,45(1-3):503-528.
[28]Hestenes M R,Stiefel E. Methods of conjugate gradients for solving linear systems[M]. Washington,DC:NBS,1952.
[29]Arnold B C,Strauss D. Pseudolikelihood estimation[J]. Sankhya,Ser B,1988,53:233-243.
[30]Kamisetty H,Ovchinnikov S,Baker D. Assessing the utility of coevolution-based residue-residue contact predictions in a sequence-and structure-rich era[J]. Proceedings of the National Academy of Sciences,2013,201314045.
[31]Seemayer S,Gruber M,Söding J. CCMpred-fast and precise prediction of protein residue-residue contacts from correlated mutations[J]. Bioinformatics,2014,30(21):3128-3130.
[32]Zhang Haicang,Zhang Qi,Ju Fusong,et al. Predicting protein inter-residue contacts using composite likelihood maximization and deep learning[J]. arXiv preprint arXiv:1809.00083,2018.
[33]Fletcher R. Practical methods of optimization[M]. New York,USA:John Wiley & Sons,2013.
[34]Schmidt M,Hamacher K. Three-body interactions improve contact prediction within direct-coupling analysis[J]. Physical Review E,2017,96(5):052405.
[35]Dunn S D,Wahl L M,Gloor G B. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction[J]. Bioinformatics,2007,24(3):333-340.
[36]Vorberg S,Seemayer S,Soeding J. Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction[J]. BioRxiv,2018,344333.
[37]Zhang Haicang,Gao Yujuan,Deng Minghua,et al. Improving residue-residue contact prediction via low-rank and sparse decomposition of residue correlation matrix[J]. Biochemical and Biophysical Research Communications,2016,472(1):217-222.
[38]Lin Zhouchen,Chen Minming,Ma Yi. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices[J]. arXiv preprint arXiv:1009.5055,2010.
[39]Yu Dongjun,Zhu Yiheng,Hu Jun. An overview of biocomputing methods of targeting protein-ligand binding residues[J]. Journal of Data Acquisition and Processing,2018,33(2):195-206.
[40]Wei Zhisen,Yang Jingyu,Yu Dongjun. Protein-protein interaction sites prediction based on weighted PSSM histogram and random forests ensemble[J]. Journal of Nanjing University of Science and Technology,2015,39(4):379-385.
[41]Gao Faqi,Yu Dongjun,Shen Hongbin. Prediction of amphipathic helices in transmembrane proteins by using ensembled classifier[J]. Journal of Nanjing University of Science and Technology,2016,40(4):431-437.
[42]Skwark M J,Raimondi D,Michel M,et al. Improved contact predictions using the recognition of protein like contact patterns[J]. PLoS Computational Biology,2014,10(11):e1003889.
[43]Jin Kangrong,Yu Dongjun. Improved contact map prediction using weighted Naïve Bayes classifier and extremely randomized trees[J]. Journal of Nanjing University of Aeronautics & Astronautics,2018,50(5):619-628.
[44]Cheng Jianlin,Baldi P. Improved residue contact prediction using support vector machines and a large feature set[J]. BMC Bioinformatics,2007,8(1):1-9.
[45]He B,Mortuza S M,Wang Y,et al. NeBcon:Protein contact map prediction using neural network training coupled with naïve Bayes classifiers[J]. Bioinformatics,2017,33(15):2296.
[46]Buchan D W A,Jones D T. Improved protein contact predictions with the MetaPSICOV2 server in CASP12[J]. Proteins:Structure,Function,and Bioinformatics,2018,86:78-83.
[47]Liu Yang,Palmedo P,Ye Qing,et al. Enhancing evolutionary couplings with deep convolutional neural networks[J]. Cell Systems,2018,6(1):65.
[48]Adhikari B,Hou J,Cheng J. DNCON2:improved protein contact prediction using two-level deep convolutional neural networks[J]. Bioinformatics,2017,34(9):1466-1472.
[49]Wang S,Sun S,Li Z,et al. Accurate de novo prediction of protein contact map by ultra-deep learning model[J]. PLoS Computational Biology,2017,13(1):e1005324.
[50]Jones D T,Singh T,Kosciolek T,et al. MetaPSICOV:combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins[J]. Bioinformatics,2014,31(7):999-1006.
[51]McGuffin L J,Bryson K,Jones D T. The PSIPRED protein structure prediction server[J]. Bioinformatics,2000,16(4):404-405.
[52]Remmert M,Biegert A,Hauser A,et al. HHblits:lightning-fast iterative protein sequence searching by HMM-HMM alignment[J]. Nature Methods,2012,9(2):173-175.
[53]Johnson L S,Eddy S R,Portugaly E. Hidden Markov model speed heuristic and iterative HMM search procedure[J]. BMC Bioinformatics,2010,11(1):1-8.
[54]Suzek B E,Wang Yuqi,Huang Hongzhan,et al. UniRef clusters:a comprehensive and scalable alternative for improving sequence similarity searches[J]. Bioinformatics,2014,31(6):926-932.
[55]Wang Z,Xu J. Predicting protein contact map using evolutionary and physical constraints by integer programming[J]. Bioinformatics,2013,29(13):266-273.
[56]Ukil A. Support vector machine[J]. Computer Science,2002,1(4):1-28.
[57]Cheng Jianlin,Baldi P. Improved residue contact prediction using support vector machines and a large feature set[J]. BMC Bioinformatics,2007,8(1):113.
[58]Schaarschmidt J,Monastyrskyy B,Kryshtafovych A,et al. Assessment of contact predictions in CASP12:Co-evolution and deep learning coming of age[J]. Proteins:Structure,Function,and Bioinformatics,2018,86:51-66.
[59]Sheridan R,Fieldhouse R J,Hayat S,et al. EVfold.org:evolutionary couplings and protein 3D structure prediction[J]. BioRxiv,2015:021022.
[60]Schneider M,Brock O. Combining physicochemical and evolutionary information for protein contact prediction[J]. PLoS One,2014,9(10):e108438.
[61]Wu Sitao,Zhang Yang. A comprehensive assessment of sequence-based and template-based methods for protein contact prediction[J]. Bioinformatics,2008,24(7):924-931.
[62]Sabzekar M,Naghibzadeh M,Sazvar M,et al. BetaCon:Protein β-sheet prediction using consensus of predicted superior conformations[C]//Proceedings of the International Conference on Bioinformatics and Biomedical Science. New York:ACM,2017.
[63]Eickholt J,Cheng Jianlin. A study and benchmark of DNcon:a method for protein residue-residue contact prediction using deep networks[J]. BMC Bioinformatics,2013,14(14):1-10.
[64]Zhang Y. I-TASSER server for protein 3D structure prediction[J]. BMC Bioinformatics,2008,9(1):40-52.
[65]Roy A,Kucukural A,Zhang Y. I-TASSER:a unified platform for automated protein structure and function prediction[J]. Nature Protocols,2010,5(4):725-738.
[66]Yang J,Yan R,Roy A,et al. The I-TASSER Suite:protein structure and function prediction[J]. Nature Methods,2015,12(1):7-8.
[67]Lena P D,Fariselli P,Margara L,et al. Fast overlapping of protein contact maps by alignment of eigenvectors[J]. Bioinformatics,2010,26(18):2250-2258.
[68]Buchan D W A,Jones D T. EigenTHREADER:analogous protein fold recognition by efficient contact map threading[J]. Bioinformatics,2017,33(17):2684-2690.
[69]Lobley A,Sadowski M I,Jones D T. pGenTHREADER and pDomTHREADER:new methods for improved protein fold recognition and superfamily discrimination[J]. Bioinformatics,2009,25(14):1761-1767.
[70]Söding J. Protein homology detection by HMM-HMM comparison[M]. Oxford,Britain:Oxford University Press,2005.
[71]Zhang Chengxin,Mortuza S M,He Baoji,et al. Template-based and free modeling of I-TASSER and QUARK pipelines using predicted contact maps in CASP12[J]. Proteins:Structure,Function,and Bioinformatics,2018,86:136-151.


Last Update: 2019-02-28