
Performance optimization method of Lucene in big data




Ma Yang, Cai Bing
Jiangsu Branch of National Computer Network Emergency Response Technical Team/Coordination Center of China, Nanjing 210003, China
big data; Lucene; memory computing; batch processing; inverted index; post-list cache; random access memory index; disk index; multiple block inverted structure
To improve data query efficiency in big data, an optimized inverted index method, RAM FS directory (RFDirectory), is proposed based on memory computing and batch processing techniques. A post-list management technique that combines random access memory (RAM) and disk is implemented on top of Lucene. New data are first written into a cache and then written to a disk index periodically, which improves the write performance of the inverted index. Query results are returned efficiently to consumers by integrating the multiple block inverted structures of the disk and RAM. Experimental results show that the index construction time of RFDirectory is 50% of that of FSDirectory or RAMDirectory, and that the time taken to return the index result for one keyword is reduced by 15% in big data.
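
The write path and query path described in the abstract can be illustrated with a minimal sketch. This is not the authors' RFDirectory implementation and it does not use the Lucene API. The class name `BufferedInvertedIndex`, the `flush_threshold` parameter, and the use of in-memory dictionaries as stand-ins for on-disk index blocks are all assumptions made for illustration. The abstract says the cache is written to disk periodically; the sketch assumes a document-count trigger instead of a timer.

```python
from collections import defaultdict


class BufferedInvertedIndex:
    """Toy two-tier inverted index: a RAM buffer plus flushed blocks.

    Hypothetical illustration only; the "disk" blocks are plain dicts
    kept in memory so the example stays self-contained.
    """

    def __init__(self, flush_threshold=3):
        # Number of buffered documents that triggers a batch flush (assumed policy).
        self.flush_threshold = flush_threshold
        self.ram_postings = defaultdict(list)  # term -> doc ids not yet flushed
        self.ram_doc_count = 0
        self.disk_blocks = []                  # each block: term -> sorted doc ids

    def add_document(self, doc_id, text):
        # New data go to the RAM buffer first, so writes avoid per-document disk work.
        for term in set(text.lower().split()):
            self.ram_postings[term].append(doc_id)
        self.ram_doc_count += 1
        if self.ram_doc_count >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Batch step: move the whole buffer into one new immutable block.
        if not self.ram_postings:
            return
        block = {term: sorted(ids) for term, ids in self.ram_postings.items()}
        self.disk_blocks.append(block)
        self.ram_postings = defaultdict(list)
        self.ram_doc_count = 0

    def search(self, term):
        # Query step: merge postings from every flushed block and the RAM buffer.
        term = term.lower()
        hits = []
        for block in self.disk_blocks:
            hits.extend(block.get(term, []))
        hits.extend(self.ram_postings.get(term, []))
        return sorted(set(hits))


index = BufferedInvertedIndex(flush_threshold=2)
index.add_document(1, "lucene inverted index")
index.add_document(2, "big data index")  # second document triggers a flush
index.add_document(3, "lucene cache")    # stays in the RAM buffer
print(index.search("lucene"))            # [1, 3]: one hit from a block, one from RAM
```

Keeping flushed blocks immutable means a query only has to merge sorted lists, and recently written documents remain searchable before the next flush. Whether RFDirectory follows the same design is not stated in the abstract.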




Last Update: 2015-06-30