
Protein-metal-ion interaction sites prediction based on class imbalance learning




Protein-metal-ion interaction sites prediction based on class imbalance learning
Qiao Liang, Xie Dongqing
School of Mathematics and Information Science, Guangzhou University, Guangzhou 510006, China
Keywords: class imbalance learning; protein-metal-ion interaction sites prediction; support vector machine
A new class imbalance learning algorithm that combines under-sampling and over-sampling is proposed to alleviate the imbalanced data distribution and improve the prediction of protein-metal-ion interaction sites (PMIIS). The majority and minority classes are sampled at the same time: the information in the minority samples is supplemented, and the redundant information in the majority samples is reduced. A new sequence-based prediction method is designed on top of this algorithm and the support vector machine (SVM). A relatively complete standard dataset covering protein-Zn2+, protein-Ca2+ and protein-Fe3+ interaction sites is constructed to evaluate PMIIS prediction performance objectively. Experimental results on this dataset show that the proposed method achieves an average Matthews correlation coefficient (MCC) of 0.646 over protein-Zn2+, protein-Ca2+ and protein-Fe3+ interaction site prediction, which is better than that of TargetS and IonCom.
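The abstract does not give the details of the sampling scheme, so the following Python sketch only illustrates the general idea of sampling both classes at once, together with the MCC used for evaluation. It is not the authors' algorithm: the function names `hybrid_resample` and `mcc`, the `target_size` parameter, the random under-sampling step and the interpolation-based (SMOTE-like) over-sampling step are all assumptions made for illustration.

```python
import math
import random


def hybrid_resample(majority, minority, target_size, seed=0):
    """Balance two classes to target_size samples each (illustrative only).

    Majority class: random under-sampling without replacement.
    Minority class: real samples kept, then padded with synthetic points
    interpolated between random pairs of real minority samples.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    # Under-sample: discard majority samples at random.
    reduced = rng.sample(majority, min(target_size, len(majority)))
    # Over-sample: add interpolated minority samples until target_size.
    augmented = list(minority[:target_size])
    while len(augmented) < target_size:
        a, b = rng.sample(minority, 2)
        t = rng.random()
        augmented.append([x + t * (y - x) for x, y in zip(a, b)])
    return reduced, augmented


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data (assumed): 200 non-binding vs 10 binding residues, 3 features.
    non_binding = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
    binding = [[rng.gauss(2, 1) for _ in range(3)] for _ in range(10)]
    neg, pos = hybrid_resample(non_binding, binding, target_size=50)
    print(len(neg), len(pos))
    print(round(mcc(tp=40, tn=45, fp=5, fn=10), 3))
```

In a pipeline like the one the abstract describes, the balanced sets would then be passed to an SVM trainer, and MCC would be computed on held-out residues for each ion type.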




Last Update: 2018-12-30