In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. We describe our strategies that yield good balance for learning the characteristics of known and unknown words and propose an error-driven policy that delivers such balance by acquiring examples of unknown words from particular errors in a training corpus. We describe an efficient framework for training our model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature.
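The abstract names the Margin Infused Relaxed Algorithm (MIRA) as the basis for training. As a rough illustration only, the following Python sketch shows a single-constraint (1-best) MIRA-style update on sparse feature vectors. It is not the authors' implementation: the function names, the feature keys, the clipping constant `C`, and the use of one constraint per update are all assumptions made for this example.

```python
from collections import defaultdict


def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def mira_update(weights, gold_feats, pred_feats, loss, C=1.0):
    """Apply one 1-best MIRA-style update in place and return the step size.

    weights    : dict mapping feature name -> weight (modified in place)
    gold_feats : features of the reference analysis (hypothetical names)
    pred_feats : features of the model's current best analysis
    loss       : non-negative cost of predicting pred instead of gold
    C          : upper bound on the step size (an assumed hyperparameter)
    """
    # Difference vector f(gold) - f(pred).
    diff = defaultdict(float)
    for k, v in gold_feats.items():
        diff[k] += v
    for k, v in pred_feats.items():
        diff[k] -= v

    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return 0.0  # identical feature vectors: nothing to update

    # Smallest step that makes the margin at least as large as the loss,
    # clipped to the range [0, C].
    margin = dot(weights, diff)
    tau = min(C, max(0.0, (loss - margin) / norm_sq))

    for k, v in diff.items():
        weights[k] = weights.get(k, 0.0) + tau * v
    return tau


# Toy usage with made-up feature names.
w = {}
gold = {"word=X|tag=NN": 1.0}
pred = {"char=X|tag=B-VV": 1.0}
step = mira_update(w, gold, pred, loss=1.0)
```

After the update in the toy usage, the score of the reference analysis exceeds that of the prediction by the loss, so repeating the same update leaves the weights unchanged.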
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Canasai KRUENGKRAI, Kiyotaka UCHIMOTO, Jun'ichi KAZAMA, Yiou WANG, Kentaro TORISAWA, Hitoshi ISAHARA, "Joint Chinese Word Segmentation and POS Tagging Using an Error-Driven Word-Character Hybrid Model" in IEICE TRANSACTIONS on Information,
vol. E92-D, no. 12, pp. 2298-2305, December 2009, doi: 10.1587/transinf.E92.D.2298.
Abstract: In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. We describe our strategies that yield good balance for learning the characteristics of known and unknown words and propose an error-driven policy that delivers such balance by acquiring examples of unknown words from particular errors in a training corpus. We describe an efficient framework for training our model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E92.D.2298/_p
@ARTICLE{e92-d_12_2298,
author={Canasai KRUENGKRAI and Kiyotaka UCHIMOTO and Jun'ichi KAZAMA and Yiou WANG and Kentaro TORISAWA and Hitoshi ISAHARA},
journal={IEICE TRANSACTIONS on Information},
title={Joint Chinese Word Segmentation and POS Tagging Using an Error-Driven Word-Character Hybrid Model},
year={2009},
volume={E92-D},
number={12},
pages={2298--2305},
abstract={In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. We describe our strategies that yield good balance for learning the characteristics of known and unknown words and propose an error-driven policy that delivers such balance by acquiring examples of unknown words from particular errors in a training corpus. We describe an efficient framework for training our model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature.},
keywords={},
doi={10.1587/transinf.E92.D.2298},
ISSN={1745-1361},
month={December},}
TY  - JOUR
TI  - Joint Chinese Word Segmentation and POS Tagging Using an Error-Driven Word-Character Hybrid Model
T2  - IEICE TRANSACTIONS on Information
SP  - 2298
EP  - 2305
AU  - Canasai KRUENGKRAI
AU  - Kiyotaka UCHIMOTO
AU  - Jun'ichi KAZAMA
AU  - Yiou WANG
AU  - Kentaro TORISAWA
AU  - Hitoshi ISAHARA
PY  - 2009
DO  - 10.1587/transinf.E92.D.2298
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E92-D
IS  - 12
JA  - IEICE TRANSACTIONS on Information
Y1  - December 2009
AB  - In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. We describe our strategies that yield good balance for learning the characteristics of known and unknown words and propose an error-driven policy that delivers such balance by acquiring examples of unknown words from particular errors in a training corpus. We describe an efficient framework for training our model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature.
ER  - 