
Protein-protein interaction sites prediction based on weighted PSSM histogram and random forests ensemble


Wei Zhisen, Yang Jingyu, Yu Dongjun
School of Computer Science and Engineering, Nanjing University of Science and Technology (NUST), Nanjing 210094, China
Keywords: protein-protein interactions; position specific scoring matrix; weighted position specific scoring matrix histogram; random forests; classifier ensemble
To improve the accuracy of protein-protein interaction site prediction, this paper develops a novel feature representation derived from a protein's position specific scoring matrix (PSSM): the weighted PSSM histogram. To cope with the extreme class imbalance in the training data, a random forests ensemble classifier is trained by combining under-sampling with classifier ensemble. Compared with traditional features, the proposed features have a lower dimension while retaining better discriminative power. The classifier ensemble mitigates the drawbacks of under-sampling and improves prediction performance. Experimental results on benchmark datasets show that the proposed method is effective and outperforms state-of-the-art methods.
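The abstract names the balancing strategy (under-sampling plus classifier ensemble) but not its details. The sketch below illustrates that general idea under stated assumptions: the majority (non-interaction) samples are shuffled and split into chunks roughly the size of the minority (interaction) class, one base model is trained on each balanced subset, and the models' votes are averaged. The base learner is a hypothetical nearest-centroid stand-in so the example runs with the Python standard library alone; in the method the abstract describes, each subset would train a random forest instead. All function names and the toy 2-D features are illustrative assumptions, not taken from the paper.

```python
import random


def partition_majority(majority, n_minority, rng):
    """Shuffle the majority-class samples and split them into
    len(majority) // n_minority chunks of roughly n_minority samples each.
    Every majority sample lands in exactly one chunk."""
    shuffled = list(majority)
    rng.shuffle(shuffled)
    k = max(1, len(shuffled) // n_minority)
    return [shuffled[i::k] for i in range(k)]


def fit_centroid(samples):
    """Hypothetical stand-in base learner: one mean feature vector per label.
    The described method would train a random forest on each subset instead."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in by_label.items()}


def predict_centroid(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda label: sum(
        (c - f) ** 2 for c, f in zip(model[label], features)))


def train_ensemble(data, seed=0):
    """Train one base model per balanced subset: all positives plus one chunk of negatives."""
    positives = [s for s in data if s[1] == 1]
    negatives = [s for s in data if s[1] == 0]
    if not positives or not negatives:
        raise ValueError("need samples from both classes")
    rng = random.Random(seed)
    chunks = partition_majority(negatives, len(positives), rng)
    return [fit_centroid(positives + chunk) for chunk in chunks]


def positive_vote_fraction(models, features):
    """Fraction of base models that label the sample as an interaction site (1)."""
    return sum(predict_centroid(m, features) == 1 for m in models) / len(models)


if __name__ == "__main__":
    # Toy imbalanced data (hypothetical 2-D features): 3 positives, 30 negatives.
    data = [((1.0 + 0.1 * i, 1.0), 1) for i in range(3)]
    data += [((0.01 * i, 0.0), 0) for i in range(30)]
    models = train_ensemble(data)
    print(len(models))                                  # 10
    print(positive_vote_fraction(models, (1.1, 0.9)))   # 1.0
    print(positive_vote_fraction(models, (0.05, 0.0)))  # 0.0
```

Because every majority-class sample is used in exactly one subset, the ensemble as a whole still sees all of the training data, which is how combining models can offset the samples any single under-sampled model discards.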

Last Update: 2015-08-31