Remote detection of web page tampering based on deep learning
Yin Jie(1); Jiang Yuxiang(1,2); Niu Bowei(2); Yan Zichen(1,2); Guo Yanwen(3,4)
1. Department of Network Security Corps, Jiangsu Police Institute, Nanjing 210031, China; 2. Jiangsu Public Security Bureau, Nanjing 210024, China; 3. Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China; 4. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
Keywords: web page tampering; hidden hyperlink detection; neural network; deep learning; network representation learning
This paper proposes a method for detecting web page tampering attacks based on corpus construction and deep learning, achieving high precision and recall. First, a large number of potentially tampered web pages are collected, and a web page tampering database is built manually using corpus construction methods. Second, an automatic detection algorithm based on a deep neural network is proposed, which integrates text features, structure features, and network features. The method predicts whether a web page has been tampered with, as well as the type of attack. Extensive experiments demonstrate its effectiveness: it reaches an accuracy of 95.6%, a recall of 96.7%, and an F value of 96.1%, significantly outperforming the baseline method.
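The reported figures can be sanity-checked against the standard F-measure formula. The sketch below is not taken from the paper: it assumes the F value is the balanced F1 score and that the 95.6% figure plays the role of precision. The function name `f1_score` is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract. Treating 95.6% as precision is an
# assumption made for this check, not something the abstract states.
reported_precision = 0.956
reported_recall = 0.967

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")
```

Under these assumptions the result rounds to 96.1%, which agrees with the F value given in the abstract.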




Last Update: 2020-02-29