[1] Yu Dongjun, Li Yang. Protein residue-residue contact map prediction[J]. Journal of Nanjing University of Science and Technology, 2019, 43(01): 1. [doi:10.14177/j.cnki.32-1397n.2019.43.01.001]





Protein residue-residue contact map prediction
Yu Dongjun, Li Yang
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
protein contact map; protein structure prediction; co-evolution; machine learning; critical assessment of protein structure prediction
Proteins are long chains of amino acid residues. They are essential components of living organisms and take part in nearly every life process. The function of a protein is largely determined by its structure, and accurate prediction of residue-residue contacts plays an important role in predicting protein three-dimensional structure; contact map prediction has therefore become a hot topic in bioinformatics. This paper first introduces the background and significance of protein residue contact map prediction. It then summarizes the mainstream methods, including local correlation-based methods, direct coupling analysis methods and their post-processing strategies, and supervised machine learning-based methods, and describes representative methods in each class. The performance of existing models is compared and analyzed using results from the Critical Assessment of protein Structure Prediction (CASP) competition. On this basis, applications of predicted contact maps to protein structure and function modeling are discussed. Finally, promising research directions are proposed for several open problems in contact map prediction.




Received: 2018-11-08; Revised: 2018-12-20
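About the authors: Yu Dongjun (1975-), male, Ph.D., professor, doctoral supervisor; main research interests: pattern recognition and intelligent information processing, bioinformatics; E-mail: njyudj@njust.edu.cn. Corresponding author: Li Yang (1992-), male, Ph.D. candidate; main research interest: bioinformatics; E-mail: liyangnjust@njust.edu.cn.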
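Citation format: Yu Dongjun, Li Yang. Protein residue-residue contact map prediction[J]. Journal of Nanjing University of Science and Technology, 2019, 43(1): 1-12.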
Last Update: 2019-02-28