
Illegal website identification method based on template detection




Zhang Hanlong 1, Shen Beijun 1, Wang Yongjian 2
1. School of Software, Shanghai Jiao Tong University, Shanghai 200240, China; 2. The Third Research Institute of Ministry of Public Security, Shanghai 200031, China
template detection; illegal website identification; similarity degree; clustering; graph mining; gambling websites
A new method is proposed to identify illegal websites efficiently. Essential information extracted from HTTP POST data is hashed, and the similarity between websites is measured by the degree to which their hash values match. Illegal website templates are extracted by clustering from a large uncategorized corpus, and unknown websites are classified against these templates. Identification efficiency is further improved by filtering out legal websites using graph mining. The method was tested at large scale on gambling websites in a real environment. The results show that the method identifies gambling websites with a precision of 1; that, compared with URL, HTML, and semantic features, HTTP POST features achieve the best F-measure; and that graph mining filters out legal websites effectively, improving operational efficiency by 20%.
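
The hash-and-match idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which POST information is extracted, which hash is used, or how similarity is scored. The sketch assumes that the extracted information is the set of POST field names, that each name is hashed with SHA-256, and that similarity is the Jaccard ratio of matching hashes. The names field_hashes, similarity, classify, and the 0.8 threshold are all hypothetical.

```python
import hashlib


def field_hashes(post_fields):
    """Hash each POST field name (assumed feature; normalized to lowercase)."""
    return {
        hashlib.sha256(name.strip().lower().encode("utf-8")).hexdigest()
        for name in post_fields
    }


def similarity(a, b):
    """Jaccard ratio of matching hashes (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(site_hashes, templates, threshold=0.8):
    """Return the name of the best-matching template, or None if none
    reaches the (hypothetical) similarity threshold."""
    best_name, best_score = None, 0.0
    for name, template_hashes in templates.items():
        score = similarity(site_hashes, template_hashes)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Hypothetical template, standing in for one produced by clustering.
templates = {
    "gambling_template_a": field_hashes(
        ["username", "passwd", "bet_amount", "game_id"]
    )
}
```

With these assumptions, a site whose POST fields are "Username", "passwd", "bet_amount", and "game_id" matches gambling_template_a, while a site that posts only "q" and "lang" matches no template. Comparing short hash sets instead of full page content is what keeps each comparison cheap; the clustering and graph-mining steps are outside the scope of this sketch.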




Last Update: 2015-06-30