Protein-protein interaction sites prediction based on weighted PSSM histogram and random forests ensemble

Journal of Nanjing University of Science and Technology (Natural Science Edition) [ISSN:1005-9830/CN:32-1397/N]

Issue:
2015, No. 04
Page:
379-
Research Field:
Publishing date:

Info

Title:
Protein-protein interaction sites prediction based on weighted PSSM histogram and random forests ensemble
Author(s):
Wei Zhisen, Yang Jingyu, Yu Dongjun
School of Computer Science and Engineering,NUST,Nanjing 210094,China
Keywords:
protein-protein interactions; position specific scoring matrix; weighted position specific scoring matrix histogram; random forests; classifier ensemble
PACS:
TP391.4
DOI:
-
Abstract:
To improve the accuracy of protein-protein interaction site prediction, this paper develops a novel feature representation, the weighted PSSM histogram, based on the position specific scoring matrix (PSSM) of a protein. To handle the extreme class imbalance in the training data, under-sampling is combined with classifier ensembling to train an ensemble of random forests. Compared with traditional features, the proposed features have a lower dimension while retaining better discriminative power. The classifier ensemble mitigates the harm caused by under-sampling and improves performance. Experimental results show that the proposed method is effective and outperforms state-of-the-art methods on benchmark datasets.
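
The combination of under-sampling and classifier ensembling mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the function names, the balanced 1:1 sampling ratio, score averaging as the fusion rule, and the base learner. In particular, `CentroidScorer` is a simple stand-in and not a random forest; in practice a random forest implementation exposing comparable `fit` and `score` methods would be passed as `make_model`.

```python
import random


class CentroidScorer:
    """Stand-in base learner (NOT a random forest, assumed for illustration).

    Scores a sample by how much closer it lies to the centroid of the
    positive class than to the centroid of the negative class.
    """

    def fit(self, X, y):
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        self.pos_c = [sum(col) / len(pos) for col in zip(*pos)]
        self.neg_c = [sum(col) / len(neg) for col in zip(*neg)]
        return self

    def score(self, x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, self.pos_c))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, self.neg_c))
        total = d_pos + d_neg
        # Value in [0, 1]; 0.5 when the sample is equidistant from both centroids.
        return d_neg / total if total > 0 else 0.5


def train_undersampled_ensemble(X, y, n_members, make_model=CentroidScorer, seed=0):
    """Train n_members models, each on all positives plus a random negative subset.

    Assumes label 1 (interaction site) is the minority class and label 0
    the majority class.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    k = min(len(pos_idx), len(neg_idx))
    members = []
    for _ in range(n_members):
        # Each member sees a different random slice of the majority class,
        # so the ensemble as a whole discards less majority-class data
        # than a single under-sampled model would.
        idx = pos_idx + rng.sample(neg_idx, k)
        model = make_model().fit([X[i] for i in idx], [y[i] for i in idx])
        members.append(model)
    return members


def ensemble_score(members, x):
    """Fuse the members by averaging their scores (an assumed fusion rule)."""
    return sum(m.score(x) for m in members) / len(members)
```

With this layout, a sample would be called an interaction site when `ensemble_score` exceeds a chosen threshold; the threshold itself is a tuning choice that the abstract does not specify.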

Memo

Memo:
-
Last Update: 2015-08-31