[1] Wei Zhisen, Yang Jingyu, Yu Dongjun. Protein-protein interaction sites prediction based on weighted PSSM histogram and random forests ensemble[J]. Journal of Nanjing University of Science and Technology, 2015, 39(04): 379.

Protein-protein interaction sites prediction based on weighted PSSM histogram and random forests ensemble

Journal of Nanjing University of Science and Technology (Natural Science Edition) [ISSN: 1005-9830 / CN: 32-1397/N]

Volume:
39
Issue:
2015, No. 04
Page:
379
Section:
Publication date:
2015-08-31

Article Info

Title:
Protein-protein interaction sites prediction based on weighted PSSM histogram and random forests ensemble
Author(s):
Wei Zhisen (魏志森), Yang Jingyu (杨静宇), Yu Dongjun (於东军)
School of Computer Science and Engineering, Nanjing University of Science and Technology (NUST), Nanjing 210094, Jiangsu, China
Keywords:
protein-protein interactions; position specific scoring matrix; weighted position specific scoring matrix histogram; random forests; classifier ensemble
CLC number:
TP391.4
Abstract:
To improve the accuracy of protein-protein interaction site prediction, this paper proposes a new feature representation, the weighted PSSM histogram, derived from a protein's position specific scoring matrix (PSSM). To handle the extreme class imbalance in the training data, under-sampling is combined with classifier ensembling to train an ensemble of random forests. Compared with traditional features, the proposed feature has lower dimensionality and better discriminative power. The classifier ensemble mitigates the information loss caused by under-sampling and improves classification accuracy. Experimental results show that the method is effective and outperforms other recent protein-protein interaction site predictors on benchmark datasets.
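
The combination of under-sampling and classifier ensembling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, and the ensemble size are all hypothetical, and the base learner is a simple nearest-centroid scorer standing in for a random forest so that the example needs only the standard library. Each ensemble member is trained on all positive samples plus an equally sized random draw of negatives, and the member scores are averaged, so that negatives discarded by one member can still be seen by another.

```python
import random
from statistics import mean

def balanced_subsets(X, y, n_members, seed=0):
    """Yield n_members training sets: all positives plus an equally
    sized random sample of negatives (random under-sampling)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    for _ in range(n_members):
        sampled = rng.sample(neg, min(len(pos), len(neg)))
        idx = pos + sampled
        yield [X[i] for i in idx], [y[i] for i in idx]

def train_centroid_scorer(X, y):
    """Stand-in base learner (an assumption, NOT a random forest):
    scores a sample by its relative closeness to the positive centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [mean(col) for col in zip(*rows)]
    c_pos, c_neg = centroid(1), centroid(0)

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        total = d_pos + d_neg
        return d_neg / total if total > 0 else 0.5
    return score

def train_ensemble(X, y, n_members=5, train_fn=train_centroid_scorer, seed=0):
    """Train one member per balanced subset; train_fn is pluggable."""
    return [train_fn(Xs, ys) for Xs, ys in balanced_subsets(X, y, n_members, seed)]

def predict_score(ensemble, x):
    """Average the member scores; higher means more likely positive."""
    return mean(member(x) for member in ensemble)

# Toy imbalanced data (hypothetical): 200 negatives, 20 positives.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
y = [0] * 200 + [1] * 20
ensemble = train_ensemble(X, y, n_members=5)
```

To move closer to the setting in the abstract, `train_fn` could be swapped for a function that fits a random forest on each balanced subset and returns its positive-class probability; how the paper combines member outputs is not stated on this page, so the plain averaging here is only one reasonable choice.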



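Memo

Received: 2015-05-13; revised: 2015-06-02
Funding: National Natural Science Foundation of China (61373062); Natural Science Foundation of Jiangsu Province (BK20141403); Jiangsu Province "Six Talent Peaks" Project (2013-XXRJ-022)
About the authors: Wei Zhisen (1984-), male, PhD candidate; research interests: bioinformatics, pattern recognition; E-mail: zhiswei@163.com. Corresponding author: Yu Dongjun (1975-), male, PhD, professor, doctoral supervisor; research interests: bioinformatics, pattern recognition; E-mail: njyudj@njust.edu.cn.
Citation format: Wei Zhisen, Yang Jingyu, Yu Dongjun. Protein-protein interaction sites prediction based on weighted PSSM histogram and random forests ensemble[J]. Journal of Nanjing University of Science and Technology, 2015, 39(4): 379-385.
Submission website: http://zrxuebao.njust.edu.cn
Last Update: 2015-08-31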