
Illegal website identification method based on template detection

Journal of Nanjing University of Science and Technology (Natural Science Edition) [ISSN: 1005-9830 / CN: 32-1397/N]

Issue:
2015, No. 3
Page:
266-
Research Field:
Publishing date:

Info

Title:
Illegal website identification method based on template detection
Author(s):
Zhang Hanlong1, Shen Beijun1, Wang Yongjian2
1. School of Software, Shanghai Jiao Tong University, Shanghai 200240, China; 2. The Third Research Institute of Ministry of Public Security, Shanghai 200031, China
Keywords:
template detection; illegal website identification; similarity degree; clustering; graph mining; gambling websites
PACS:
TP311
DOI:
-
Abstract:
A new method is proposed to identify illegal websites efficiently. Essential information extracted from HTTP POST requests is hashed; the similarity between websites is measured by the degree to which their hash values match; unknown websites are classified against illegal-website templates that are extracted from a large uncategorized corpus by clustering. Identification efficiency is improved by filtering out legal websites using graph mining. The method was tested at scale on gambling websites in a real environment. The results show that the method identifies gambling websites with a precision of 1; compared with URL, HTML and semantic features, HTTP POST features achieve the best F-measure; legal websites can be filtered out effectively using graph mining, and operational efficiency can be improved by 20%.
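
The hashing, similarity and template-matching steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say which POST information is extracted, which hash function is used, or how similarity is computed, so everything below is an assumption made for illustration: the fingerprint is taken from the request path plus the sorted parameter names, SHA-1 is the hash, Jaccard overlap of fingerprint sets is the similarity, and the names `post_fingerprint`, `site_signature`, `similarity`, `classify` and the `threshold` value are hypothetical.

```python
import hashlib
from urllib.parse import parse_qsl


def post_fingerprint(path, body):
    """Hash one POST request (assumed features: path + sorted parameter names)."""
    # Parameter values are ignored on purpose: sites built from one template
    # tend to share field names while the submitted values differ.
    names = sorted({key for key, _ in parse_qsl(body, keep_blank_values=True)})
    feature = path + "?" + "&".join(names)
    return hashlib.sha1(feature.encode("utf-8")).hexdigest()


def site_signature(requests):
    """Build the set of fingerprints for a site from (path, body) pairs."""
    return {post_fingerprint(path, body) for path, body in requests}


def similarity(sig_a, sig_b):
    """Jaccard overlap of two fingerprint sets (an assumed similarity measure)."""
    if not sig_a and not sig_b:
        return 0.0
    return len(sig_a & sig_b) / len(sig_a | sig_b)


def classify(signature, templates, threshold=0.5):
    """Return the name of the best-matching template, or None if below threshold."""
    best = max(templates, key=lambda name: similarity(signature, templates[name]),
               default=None)
    if best is not None and similarity(signature, templates[best]) >= threshold:
        return best
    return None


# Hypothetical traffic: two sites sharing a template, and one unrelated site.
known = site_signature([("/login.php", "user=a&pwd=b"),
                        ("/bet.php", "game=1&amount=10")])
unknown = site_signature([("/login.php", "user=x&pwd=y"),
                          ("/bet.php", "game=7&amount=500")])
other = site_signature([("/search", "q=hello")])

print(classify(unknown, {"gambling-template-A": known}))
print(classify(other, {"gambling-template-A": known}))
```

In this sketch the templates are supplied by hand; in the method described above they would come from clustering site signatures over an uncategorized corpus, a step the sketch does not attempt to reproduce.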


Memo

Memo:
-
Last Update: 2015-06-30