The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations (e.g., some numerals are rendered as "XNUMX").
Text classification is a fundamental task in natural language processing, with extensive applications in various domains such as spam detection and sentiment analysis. Syntactic information can be effectively utilized to improve the performance of neural network models in understanding the semantics of text. Chinese text exhibits a high degree of syntactic complexity, with individual words often possessing multiple parts of speech. In this paper, we propose BRsyn-caps, a capsule network-based Chinese text classification model that leverages both BERT and dependency syntax. Our proposed approach integrates semantic information through the BERT pre-trained model to obtain word representations, extracts contextual information through a Long Short-Term Memory (LSTM) network, encodes syntactic dependency trees through a graph attention network, and utilizes a capsule network to effectively integrate these features for text classification. Additionally, we propose a character-level syntactic dependency tree adjacency matrix construction algorithm, which introduces syntactic information into character-level representations. Experiments on five datasets demonstrate that BRsyn-caps can effectively integrate semantic, sequential, and syntactic information in text, proving the effectiveness of our proposed method for Chinese text classification.
Jie LUO
Wuhan Institute of Technology
Chengwan HE
Wuhan Institute of Technology
Hongwei LUO
Wuhan Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Jie LUO, Chengwan HE, Hongwei LUO, "BRsyn-Caps: Chinese Text Classification Using Capsule Network Based on Bert and Dependency Syntax" in IEICE TRANSACTIONS on Information and Systems,
vol. E107-D, no. 2, pp. 212-219, February 2024, doi: 10.1587/transinf.2023EDP7119.
Abstract: Text classification is a fundamental task in natural language processing, with extensive applications in various domains such as spam detection and sentiment analysis. Syntactic information can be effectively utilized to improve the performance of neural network models in understanding the semantics of text. Chinese text exhibits a high degree of syntactic complexity, with individual words often possessing multiple parts of speech. In this paper, we propose BRsyn-caps, a capsule network-based Chinese text classification model that leverages both BERT and dependency syntax. Our proposed approach integrates semantic information through the BERT pre-trained model to obtain word representations, extracts contextual information through a Long Short-Term Memory (LSTM) network, encodes syntactic dependency trees through a graph attention network, and utilizes a capsule network to effectively integrate these features for text classification. Additionally, we propose a character-level syntactic dependency tree adjacency matrix construction algorithm, which introduces syntactic information into character-level representations. Experiments on five datasets demonstrate that BRsyn-caps can effectively integrate semantic, sequential, and syntactic information in text, proving the effectiveness of our proposed method for Chinese text classification.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023EDP7119/_p
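The abstract mentions a character-level syntactic dependency tree adjacency matrix construction algorithm; the paper's exact algorithm is not reproduced here. As an illustrative sketch only, the snippet below shows one common way such a matrix can be built for Chinese text: a word-level dependency parse is expanded so that characters within the same word are mutually connected and every character of a dependent word links to every character of its head word. The function and variable names (`char_adjacency`, `words`, `heads`) are hypothetical, not taken from the paper.

```python
def char_adjacency(words, heads):
    """Hypothetical sketch of a character-level dependency adjacency matrix.

    words: list of word strings from a segmented sentence.
    heads: syntactic head index for each word (-1 for the root).
    Returns an n-by-n 0/1 matrix over the n characters of the sentence.
    """
    n = sum(len(w) for w in words)
    # starting character offset of each word
    offsets, pos = [], 0
    for w in words:
        offsets.append(pos)
        pos += len(w)
    spans = [range(offsets[i], offsets[i] + len(w)) for i, w in enumerate(words)]

    adj = [[0] * n for _ in range(n)]
    for i, h in enumerate(heads):
        # intra-word: characters of the same word are mutually connected
        for a in spans[i]:
            for b in spans[i]:
                adj[a][b] = 1
        # inter-word: link characters of the dependent word and its head word
        if h >= 0:
            for a in spans[i]:
                for b in spans[h]:
                    adj[a][b] = adj[b][a] = 1
    return adj

# Example parse: "我 喜欢 自然语言", with 喜欢 as root and the other words its dependents
A = char_adjacency(["我", "喜欢", "自然语言"], [1, -1, 1])
```

Such a matrix can then serve as the graph structure consumed by a graph attention network over character-level BERT representations, which is the role the abstract assigns to the dependency information.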
@ARTICLE{e107-d_2_212,
author={Jie LUO and Chengwan HE and Hongwei LUO},
journal={IEICE TRANSACTIONS on Information and Systems},
title={BRsyn-Caps: Chinese Text Classification Using Capsule Network Based on Bert and Dependency Syntax},
year={2024},
volume={E107-D},
number={2},
pages={212-219},
abstract={Text classification is a fundamental task in natural language processing, with extensive applications in various domains such as spam detection and sentiment analysis. Syntactic information can be effectively utilized to improve the performance of neural network models in understanding the semantics of text. Chinese text exhibits a high degree of syntactic complexity, with individual words often possessing multiple parts of speech. In this paper, we propose BRsyn-caps, a capsule network-based Chinese text classification model that leverages both BERT and dependency syntax. Our proposed approach integrates semantic information through the BERT pre-trained model to obtain word representations, extracts contextual information through a Long Short-Term Memory (LSTM) network, encodes syntactic dependency trees through a graph attention network, and utilizes a capsule network to effectively integrate these features for text classification. Additionally, we propose a character-level syntactic dependency tree adjacency matrix construction algorithm, which introduces syntactic information into character-level representations. Experiments on five datasets demonstrate that BRsyn-caps can effectively integrate semantic, sequential, and syntactic information in text, proving the effectiveness of our proposed method for Chinese text classification.},
keywords={},
doi={10.1587/transinf.2023EDP7119},
ISSN={1745-1361},
month={February},}
TY - JOUR
TI - BRsyn-Caps: Chinese Text Classification Using Capsule Network Based on Bert and Dependency Syntax
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 212
EP - 219
AU - Jie LUO
AU - Chengwan HE
AU - Hongwei LUO
PY - 2024
DO - 10.1587/transinf.2023EDP7119
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E107-D
IS - 2
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - February 2024
AB - Text classification is a fundamental task in natural language processing, with extensive applications in various domains such as spam detection and sentiment analysis. Syntactic information can be effectively utilized to improve the performance of neural network models in understanding the semantics of text. Chinese text exhibits a high degree of syntactic complexity, with individual words often possessing multiple parts of speech. In this paper, we propose BRsyn-caps, a capsule network-based Chinese text classification model that leverages both BERT and dependency syntax. Our proposed approach integrates semantic information through the BERT pre-trained model to obtain word representations, extracts contextual information through a Long Short-Term Memory (LSTM) network, encodes syntactic dependency trees through a graph attention network, and utilizes a capsule network to effectively integrate these features for text classification. Additionally, we propose a character-level syntactic dependency tree adjacency matrix construction algorithm, which introduces syntactic information into character-level representations. Experiments on five datasets demonstrate that BRsyn-caps can effectively integrate semantic, sequential, and syntactic information in text, proving the effectiveness of our proposed method for Chinese text classification.
ER -