[1] Qian Xiaodong, Cao Yang. Large data parallel clustering algorithm based on discovery of maximal class in the community[J]. Journal of Nanjing University of Science and Technology (Natural Science Edition), 2016, 40(01): 117.

Large data parallel clustering algorithm based on discovery of maximal class in the community

Journal of Nanjing University of Science and Technology (Natural Science Edition) [ISSN: 1005-9830 / CN: 32-1397/N]

Volume: 40
Issue: 2016, No. 01
Pages: 117
Publication date: 2016-02-29

Article Information

Title: Large data parallel clustering algorithm based on discovery of maximal class in the community
Author(s): Qian Xiaodong, Cao Yang
Affiliation: School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China
Keywords: big data; clustering; complex network; local key nodes; core category; maximal clique; fitness function; parallel algorithm
Classification number (CLC): TP391
Abstract:
To find network structure in big data accurately and quickly, this paper proposes a big-data clustering algorithm based on the maximal classes of communities. Two sources of time cost are addressed: the uncertainty of the initial node and the computation of the fitness function. Local key nodes are introduced to handle the first, and the fitness formula is improved to handle the second. For the formation of initial communities, the paper introduces the concept of the maximal clique and, by analyzing its properties, concludes that the core category of a community is composed of maximal cliques. It then proposes a method for obtaining local core categories through maximal clique discovery, together with a parallel strategy for the maximal clique discovery algorithm. Finally, a parallel strategy for the whole algorithm is proposed and tested on real data sets. The experimental results show that the proposed algorithm is feasible and effective, and that it is suitable for discovering the network structure of large-scale data.
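The abstract names two building blocks, maximal clique discovery and a community fitness function, without giving their details. The sketch below is only an illustration under stated assumptions: it uses the standard Bron-Kerbosch enumeration with pivoting and the widely used fitness form k_in / (k_in + k_out)^alpha, not the paper's improved formula or its parallel strategy. The names `maximal_cliques`, `fitness`, `alpha`, and the `example` graph are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def maximal_cliques(graph: Graph) -> List[Set[int]]:
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates;
        # neighbours of the pivot can be skipped at this level.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def fitness(graph: Graph, community: Set[int], alpha: float = 1.0) -> float:
    """Standard community fitness k_in / (k_in + k_out) ** alpha.

    k_in counts each internal edge twice (once per endpoint); k_out counts
    edges leaving the community. This is an assumed baseline form, not the
    improved formula proposed in the paper.
    """
    k_in = 0
    k_out = 0
    for v in community:
        for u in graph[v]:
            if u in community:
                k_in += 1
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


# Hypothetical toy graph: two triangles joined by the edge 2-3.
example: Graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

# Treat maximal cliques with at least three nodes as candidate community cores.
cores = [c for c in maximal_cliques(example) if len(c) >= 3]
for core in sorted(cores, key=min):
    print(sorted(core), round(fitness(example, core), 3))
```

Because each top-level branch of the enumeration is independent of the others, one plausible way to parallelize it is to hand different starting vertices to different workers; whether the paper's parallel strategy works this way is not stated in the abstract.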



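Memo

Received: 2015-07-08; revised: 2015-11-13
Funding: National Natural Science Foundation of China (71461017)
About the author: Qian Xiaodong (born 1973), male, professor; main research area: data mining; E-mail: qianxd@mail.lzjtu.cn.
Citation format: Qian Xiaodong, Cao Yang. Large data parallel clustering algorithm based on discovery of maximal class in the community[J]. Journal of Nanjing University of Science and Technology, 2016, 40(1): 117-123.
Submission website: http://zrxuebao.njust.edu.cn
DOI: 10.14177/j.cnki.32-1397n.2016.40.01.019
Last Update: 2016-02-29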