[1] Yu Yuecheng, Liu Caisheng, Sheng Jiagen. Distributed constraints consistency Gaussian mixture model[J]. Journal of Nanjing University of Science and Technology (Natural Science Edition), 2013, 37(06): 799-806.

Distributed constraints consistency Gaussian mixture model

Journal of Nanjing University of Science and Technology (Natural Science Edition) [ISSN: 1005-9830 / CN: 32-1397/N]

Volume:
37
Issue:
2013, No. 06
Pages:
799-806
Publication date:
2013-12-31

Article Info

Title:
Distributed constraints consistency Gaussian mixture model
Author(s):
Yu Yuecheng (1), Liu Caisheng (2), Sheng Jiagen (2)
1. College of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China;
2. College of Nanxu, Jiangsu University of Science and Technology, Zhenjiang 212003, China
Keywords:
constraints consistency; Gaussian mixture model; distributed clustering; regularization operator
CLC number:
TP391.4
Abstract:
To improve the clustering quality of non-spherical, horizontally partitioned distributed data sets, a distributed constraints consistency Gaussian mixture model (DCCGMM) is proposed. DCCGMM uses a Gaussian mixture model (GMM) to describe the data sets and introduces constraint information into the GMM through a constraints consistency regularization operator. As a result, the estimated parameters of DCCGMM reflect both the underlying probability distribution of the sample data and the users' prior knowledge, and every parameter can be estimated with a closed-form analytical expression. By designing the parameters that are passed between user sites, DCCGMM can be applied to distributed clustering. Experimental results show that, compared with distributed clustering methods built on K-means, the proposed algorithm adapts better to non-spherical data, and that its clustering accuracy exceeds that of the distributed expectation maximization (EM) algorithm without constraint information, with the global average clustering accuracy improving by 9% to 20%.
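The abstract refers to a distributed EM baseline in which sites exchange parameters instead of raw samples. The following is a minimal sketch of that general idea only, not the DCCGMM algorithm itself: the constraints consistency regularization operator is omitted because its form is not given here. It assumes that each site sends per-component sufficient statistics to a coordinator, which is one common way to run EM over horizontally partitioned data. All function names (`log_gauss`, `local_stats`, `global_m_step`, `distributed_em`) and the small ridge term `reg` are illustrative assumptions.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)      # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2 * np.pi) + log_det + np.sum(z * z, axis=0))

def local_stats(X, weights, means, covs):
    """E-step on one site. Returns aggregate statistics only, never raw rows."""
    K = len(weights)
    log_p = np.stack(
        [np.log(weights[k]) + log_gauss(X, means[k], covs[k]) for k in range(K)],
        axis=1,
    )
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)    # responsibilities, shape (n, K)
    n_k = resp.sum(axis=0)                     # soft counts, shape (K,)
    s_k = resp.T @ X                           # weighted sums, shape (K, d)
    ss_k = np.einsum("nk,ni,nj->kij", resp, X, X)  # weighted outer products
    return n_k, s_k, ss_k

def global_m_step(stats, reg=1e-6):
    """Coordinator M-step: add up the site statistics, then update in closed form."""
    n_k = sum(s[0] for s in stats)
    s_k = sum(s[1] for s in stats)
    ss_k = sum(s[2] for s in stats)
    weights = n_k / n_k.sum()
    means = s_k / n_k[:, None]
    d = means.shape[1]
    covs = (ss_k / n_k[:, None, None]
            - np.einsum("ki,kj->kij", means, means)
            + reg * np.eye(d))                 # assumed ridge term for stability
    return weights, means, covs

def distributed_em(sites, weights, means, covs, n_iter=50):
    """Alternate local E-steps and a global M-step for a fixed number of rounds."""
    for _ in range(n_iter):
        stats = [local_stats(X, weights, means, covs) for X in sites]
        weights, means, covs = global_m_step(stats)
    return weights, means, covs
```

Because the statistics are additive across sites, running this sketch on any horizontal split of a data set should give the same parameters as running it on the pooled data, up to floating-point rounding. A constraint-aware variant would change the update equations, which this sketch does not attempt.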

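Memo

Received: 2013-01-23; Revised: 2013-02-24
Funding: National Natural Science Foundation of China (61170201); Natural Science Research Project of Jiangsu Higher Education Institutions (13KJB520004)
About the author: Yu Yuecheng (born 1971), male, Ph.D., associate professor; main research interests: machine learning, data mining. E-mail: zhjyuyuecheng@163.com.
Citation format: Yu Yuecheng, Liu Caisheng, Sheng Jiagen. Distributed constraints consistency Gaussian mixture model[J]. Journal of Nanjing University of Science and Technology, 2013, 37(6): 799-806.
Submission website: http://njlgdxxb.paperonce.org
Last Update: 2013-12-31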