Epidemic cyber incidents are caused by malicious websites that use exploit kits. An exploit kit makes it easy for attackers to carry out drive-by download (DBD) attacks. It has been reported, however, that malicious websites using an exploit kit have similar website structure (WS) trees. Malicious website identification techniques that leverage WS-trees have therefore been studied, with the WS-trees estimated from HTTP traffic data. Nevertheless, the defensive component of the exploit kit prevents the WS-tree from being captured completely. This paper therefore presents a new WS-tree construction procedure that exploits the fact that a DBD attack takes place within a certain duration. It also proposes a new malicious website identification technique that clusters the WS-trees of exploit kits. Experimental results on the D3M dataset show that the proposed technique identifies exploit kits with reasonable accuracy even when HTTP traffic from the malicious sites is partially lost.
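As a rough illustration of the idea of estimating a WS-tree from HTTP traffic within a bounded time window, the following Python sketch builds a parent-to-children map from request records. It is not the authors' procedure: the `Request` record, the `build_ws_tree` function, the 30-second `window` default, and the rule of attaching a request with a lost referer to the most recently added node are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Request:
    """One observed HTTP request (hypothetical record layout)."""
    time: float                    # seconds since the start of the capture
    url: str
    referer: Optional[str] = None  # None when the referer was not captured


def build_ws_tree(
    requests: List[Request], window: float = 30.0
) -> Tuple[Optional[str], Dict[str, List[str]]]:
    """Return (root_url, {parent_url: [child_url, ...]}).

    The earliest request is taken as the root. A later request is attached
    to its referer when that referer is already in the tree. If the referer
    is missing or unknown, the request is attached to the most recently
    added node (an assumed heuristic), but only while it falls within
    `window` seconds of the root request.
    """
    ordered = sorted(requests, key=lambda r: r.time)
    if not ordered:
        return None, {}

    root = ordered[0]
    tree: Dict[str, List[str]] = {root.url: []}
    last_added = root.url

    for req in ordered[1:]:
        if req.time - root.time > window:
            break  # outside the assumed attack duration
        if req.url in tree:
            continue  # already placed; keep the first position
        parent = req.referer if req.referer in tree else last_added
        tree[parent].append(req.url)
        tree[req.url] = []
        last_added = req.url

    return root.url, tree


# Example traffic: the third request has lost its referer,
# and the fourth arrives long after the window has closed.
requests = [
    Request(0.0, "http://landing.example/"),
    Request(0.5, "http://redirect.example/r", referer="http://landing.example/"),
    Request(1.2, "http://kit.example/exploit"),
    Request(120.0, "http://unrelated.example/", referer="http://landing.example/"),
]
root, tree = build_ws_tree(requests, window=30.0)
```

In this sketch the request with the lost referer is still placed in the tree because it arrives shortly after the redirect, while the late request is left out. Trees built this way could then be compared or clustered by whatever structural similarity measure is chosen.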
Tatsuya NAGAI
Kobe University
Masaki KAMIZONO
PwC Cyber Services
Yoshiaki SHIRAISHI
Kobe University
Kelin XIA
Nanyang Technological University
Masami MOHRI
Gifu University
Yasuhiro TAKANO
Kobe University
Masakatu MORII
Kobe University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Tatsuya NAGAI, Masaki KAMIZONO, Yoshiaki SHIRAISHI, Kelin XIA, Masami MOHRI, Yasuhiro TAKANO, Masakatu MORII, "A Malicious Web Site Identification Technique Using Web Structure Clustering," in IEICE TRANSACTIONS on Information, vol. E102-D, no. 9, pp. 1665-1672, September 2019, doi: 10.1587/transinf.2018OFP0010.
Abstract: Epidemic cyber incidents are caused by malicious websites using exploit kits. The exploit kit enables attackers to perform the drive-by download (DBD) attack. However, it is reported that malicious websites using an exploit kit have similarity in their website structure (WS)-trees. Hence, malicious website identification techniques leveraging WS-trees have been studied, where the WS-trees can be estimated from HTTP traffic data. Nevertheless, the defensive component of the exploit kit prevents us from capturing the WS-tree perfectly. This paper shows, hence, a new WS-tree construction procedure by using the fact that a DBD attack happens in a certain duration. This paper proposes, moreover, a new malicious website identification technique by clustering the WS-tree of the exploit kits. Experimental results using the D3M dataset verify that the proposed technique identifies exploit kits with reasonable accuracy even when HTTP traffic from the malicious sites is partially lost.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018OFP0010/_p
@ARTICLE{e102-d_9_1665,
author={Tatsuya NAGAI and Masaki KAMIZONO and Yoshiaki SHIRAISHI and Kelin XIA and Masami MOHRI and Yasuhiro TAKANO and Masakatu MORII},
journal={IEICE TRANSACTIONS on Information},
title={A Malicious Web Site Identification Technique Using Web Structure Clustering},
year={2019},
volume={E102-D},
number={9},
pages={1665--1672},
abstract={Epidemic cyber incidents are caused by malicious websites using exploit kits. The exploit kit enables attackers to perform the drive-by download (DBD) attack. However, it is reported that malicious websites using an exploit kit have similarity in their website structure (WS)-trees. Hence, malicious website identification techniques leveraging WS-trees have been studied, where the WS-trees can be estimated from HTTP traffic data. Nevertheless, the defensive component of the exploit kit prevents us from capturing the WS-tree perfectly. This paper shows, hence, a new WS-tree construction procedure by using the fact that a DBD attack happens in a certain duration. This paper proposes, moreover, a new malicious website identification technique by clustering the WS-tree of the exploit kits. Experimental results using the D3M dataset verify that the proposed technique identifies exploit kits with reasonable accuracy even when HTTP traffic from the malicious sites is partially lost.},
keywords={},
doi={10.1587/transinf.2018OFP0010},
ISSN={1745-1361},
month={September},
}
TY  - JOUR
TI  - A Malicious Web Site Identification Technique Using Web Structure Clustering
T2  - IEICE TRANSACTIONS on Information
SP  - 1665
EP  - 1672
AU  - Tatsuya NAGAI
AU  - Masaki KAMIZONO
AU  - Yoshiaki SHIRAISHI
AU  - Kelin XIA
AU  - Masami MOHRI
AU  - Yasuhiro TAKANO
AU  - Masakatu MORII
PY  - 2019
DO  - 10.1587/transinf.2018OFP0010
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E102-D
IS  - 9
JA  - IEICE TRANSACTIONS on Information
Y1  - September 2019
AB  - Epidemic cyber incidents are caused by malicious websites using exploit kits. The exploit kit enables attackers to perform the drive-by download (DBD) attack. However, it is reported that malicious websites using an exploit kit have similarity in their website structure (WS)-trees. Hence, malicious website identification techniques leveraging WS-trees have been studied, where the WS-trees can be estimated from HTTP traffic data. Nevertheless, the defensive component of the exploit kit prevents us from capturing the WS-tree perfectly. This paper shows, hence, a new WS-tree construction procedure by using the fact that a DBD attack happens in a certain duration. This paper proposes, moreover, a new malicious website identification technique by clustering the WS-tree of the exploit kits. Experimental results using the D3M dataset verify that the proposed technique identifies exploit kits with reasonable accuracy even when HTTP traffic from the malicious sites is partially lost.
ER  - 