[1] Ye Haiqin, Meng Caixia, Wang Yifeng, et al. Frequent pattern mining algorithm based on MapReduce[J]. Journal of Nanjing University of Science and Technology (Natural Science Edition), 2018, 42(01): 62. [doi:10.14177/j.cnki.32-1397n.2018.42.01.009]

Frequent pattern mining algorithm based on MapReduce

Journal of Nanjing University of Science and Technology (Natural Science Edition) [ISSN: 1005-9830 / CN: 32-1397/N]

Volume:
42
Issue:
2018, No. 01
Pages:
62
Publication date:
2018-02-28

Article Info

Title:
Frequent pattern mining algorithm based on MapReduce
Article number:
1005-9830(2018)01-0062-06
Author(s):
Ye Haiqin1, Meng Caixia2, Wang Yifeng3, Zhang Ailing4
1. School of Computer Science and Technology, Zhoukou Normal University, Zhoukou 466001, China; 2. Public Security Technology Department, Railway Police College, Zhengzhou 450053, China; 3. 73658 Troops, Chuzhou 239421, China; 4. Automation Station, 71352 Troops, Anyang 455000, China
Keywords:
frequent pattern; mining algorithm; Algorithm_Add algorithm; MapReduce model; Hadoop cluster; MRAlgorithm_Add algorithm
CLC number:
TP311
DOI:
10.14177/j.cnki.32-1397n.2018.42.01.009
Abstract:
Algorithm_Add suffers from high memory usage and slow running speed when mining frequent patterns from massive data. To address these problems, this paper builds on an in-depth study of Algorithm_Add and proposes MRAlgorithm_Add, a parallel mining algorithm based on the MapReduce computing model. The algorithm uses the MapReduce model to process newly added patterns: local frequent patterns are computed on each node, and the global frequent patterns are obtained by merging the per-node results. The paper presents the design of MRAlgorithm_Add and analyzes its running performance. Experimental results show that MRAlgorithm_Add, running on a Hadoop cluster, achieves good speedup and good scalability.
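
The local-then-global idea in the abstract can be illustrated with a small sketch. The abstract does not describe how Algorithm_Add or MRAlgorithm_Add actually process newly added patterns, so the code below is not the authors' method: it substitutes a generic partition-based two-pass scheme (in the style of the SON algorithm) and simulates the map and reduce steps in plain Python rather than on Hadoop. The function names, the `min_support_ratio` parameter, and the `max_size` limit are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def local_frequent(partition, min_support_ratio, max_size=2):
    """Map step 1 (assumed): itemsets that are frequent inside one partition."""
    threshold = ceil(min_support_ratio * len(partition))
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))
        # Enumerate every itemset of up to max_size items in this transaction.
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset for itemset, n in counts.items() if n >= threshold}


def count_candidates(partition, candidates):
    """Map step 2 (assumed): count each global candidate inside one partition."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for itemset in candidates:
            if items.issuperset(itemset):
                counts[itemset] += 1
    return counts


def global_frequent(partitions, min_support_ratio, max_size=2):
    """Merge per-partition results into globally frequent itemsets."""
    # Reduce step 1: the union of all local results forms the candidate set.
    candidates = set()
    for partition in partitions:
        candidates |= local_frequent(partition, min_support_ratio, max_size)

    # Reduce step 2: sum per-partition counts and apply the global threshold.
    totals = Counter()
    for partition in partitions:
        totals.update(count_candidates(partition, candidates))
    total_transactions = sum(len(p) for p in partitions)
    threshold = ceil(min_support_ratio * total_transactions)
    return {itemset: n for itemset, n in totals.items() if n >= threshold}
```

Each inner list passed to `global_frequent` stands in for the data held by one node. An itemset that is frequent over the whole data set must be frequent in at least one partition, which is why taking the union of the local results as candidates loses nothing; the second counting pass then discards itemsets that were only locally frequent.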


Similar articles:

[1] Ye Haiqin, Liao Li, Wang Yifeng, et al. New frequent patterns mining algorithm[J]. Journal of Nanjing University of Science and Technology (Natural Science Edition), 2016, 40(01): 29.

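Memo:
Received: 2017-06-08. Revised: 2017-10-14.
Funding: National Natural Science Foundation of China (U1504613); Henan Province Science and Technology Research Project (172102210607); Henan Intellectual Property Office Soft Science Research Project (20170106020); Key Scientific Research Project of Henan Higher Education Institutions (18B520034); Railway Police College Teaching Reform Project (JY2017002); Railway Police College Central Basic Research Funds Project (2017TJJBKY003); Henan Federation of Social Sciences Project (SKL-2017-429); Henan University Science and Technology Innovation Team Project (17IRTSTHN009).
About the author: Ye Haiqin (born 1980), female, lecturer; main research interests: personalized recommendation and network information technology. E-mail: onlyyhq@126.com.
Citation format: Ye Haiqin, Meng Caixia, Wang Yifeng, et al. Frequent pattern mining algorithm based on MapReduce[J]. Journal of Nanjing University of Science and Technology, 2018, 42(1): 62-67.
Submission website: http://zrxuebao.njust.edu.cn
Last Update: 2018-02-28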