Toponyms and other named entities are a major issue in unknown-word processing. Our aim is to recover unknown toponyms, not only to avoid noise but also to provide information about the candidate areas to which they may belong. Most previous toponym resolution methods targeted disambiguation among area candidates, an ambiguity that arises when the same toponym exists in more than one place. These approaches were mostly based on gazetteers and contexts. For documents that may contain toponyms from anywhere in the world, such as newspaper articles, toponym resolution is not just ambiguity resolution but the selection of area candidates from all areas on Earth. We therefore propose an automatic toponym resolution method that identifies a toponym's area candidates based only on surface statistics, instead of dictionary lookup. The method combines two modules, area candidate reduction and area candidate examination that uses block-unit data, to obtain high accuracy without reducing recall. In our experiments it achieved, on average, 85.54% precision, 91.92% recall, and an F-measure of 0.89. The method is a flexible and robust approach to toponym resolution targeting an unrestricted number of areas.
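The three evaluation figures reported in the abstract can be cross-checked against each other. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_measure` and its `beta` parameter are illustrative choices, not taken from the paper. Because the paper reports averages, the F-measure of the averaged precision and recall need not match the averaged F-measure exactly, so this is only a plausibility check.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both inputs are zero, so the score is defined as zero
    return (1 + beta ** 2) * precision * recall / denominator


# Average figures quoted in the abstract, expressed as fractions.
precision = 0.8554
recall = 0.9192

# Assumption: the reported F-measure is the balanced F1 score.
f1 = f_measure(precision, recall)
print(f"F1 = {f1:.2f}")
```

Under that assumption the computed value rounds to 0.89, which agrees with the figure in the abstract.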
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Tomohisa SANO, Shiho Hoshi NOBESAWA, Hiroyuki OKAMOTO, Hiroya SUSUKI, Masaki MATSUBARA, Hiroaki SAITO, "Robust Toponym Resolution Based on Surface Statistics" in IEICE TRANSACTIONS on Information,
vol. E92-D, no. 12, pp. 2313-2320, December 2009, doi: 10.1587/transinf.E92.D.2313.
Abstract: Toponyms and other named entities are main issues in unknown word processing problem. Our purpose is to salvage unknown toponyms, not only for avoiding noises but also providing them information of area candidates to where they may belong. Most of previous toponym resolution methods were targeting disambiguation among area candidates, which is caused by the multiple existence of a toponym. These approaches were mostly based on gazetteers and contexts. When it comes to the documents which may contain toponyms worldwide, like newspaper articles, toponym resolution is not just an ambiguity resolution, but an area candidate selection from all the areas on Earth. Thus we propose an automatic toponym resolution method which enables to identify its area candidates based only on their surface statistics, in place of dictionary-lookup approaches. Our method combines two modules, area candidate reduction and area candidate examination which uses block-unit data, to obtain high accuracy without reducing recall rate. Our empirical result showed 85.54% precision rate, 91.92% recall rate and .89 F-measure value on average. This method is a flexible and robust approach for toponym resolution targeting unrestricted number of areas.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E92.D.2313/_p
@ARTICLE{e92-d_12_2313,
author={Tomohisa SANO and Shiho Hoshi NOBESAWA and Hiroyuki OKAMOTO and Hiroya SUSUKI and Masaki MATSUBARA and Hiroaki SAITO},
journal={IEICE TRANSACTIONS on Information},
title={Robust Toponym Resolution Based on Surface Statistics},
year={2009},
volume={E92-D},
number={12},
pages={2313-2320},
abstract={Toponyms and other named entities are main issues in unknown word processing problem. Our purpose is to salvage unknown toponyms, not only for avoiding noises but also providing them information of area candidates to where they may belong. Most of previous toponym resolution methods were targeting disambiguation among area candidates, which is caused by the multiple existence of a toponym. These approaches were mostly based on gazetteers and contexts. When it comes to the documents which may contain toponyms worldwide, like newspaper articles, toponym resolution is not just an ambiguity resolution, but an area candidate selection from all the areas on Earth. Thus we propose an automatic toponym resolution method which enables to identify its area candidates based only on their surface statistics, in place of dictionary-lookup approaches. Our method combines two modules, area candidate reduction and area candidate examination which uses block-unit data, to obtain high accuracy without reducing recall rate. Our empirical result showed 85.54% precision rate, 91.92% recall rate and .89 F-measure value on average. This method is a flexible and robust approach for toponym resolution targeting unrestricted number of areas.},
keywords={},
doi={10.1587/transinf.E92.D.2313},
ISSN={1745-1361},
month={December}
}
TY  - JOUR
TI  - Robust Toponym Resolution Based on Surface Statistics
T2  - IEICE TRANSACTIONS on Information
SP  - 2313
EP  - 2320
AU  - Tomohisa SANO
AU  - Shiho Hoshi NOBESAWA
AU  - Hiroyuki OKAMOTO
AU  - Hiroya SUSUKI
AU  - Masaki MATSUBARA
AU  - Hiroaki SAITO
PY  - 2009
DO  - 10.1587/transinf.E92.D.2313
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E92-D
IS  - 12
JA  - IEICE TRANSACTIONS on Information
Y1  - December 2009
AB  - Toponyms and other named entities are main issues in unknown word processing problem. Our purpose is to salvage unknown toponyms, not only for avoiding noises but also providing them information of area candidates to where they may belong. Most of previous toponym resolution methods were targeting disambiguation among area candidates, which is caused by the multiple existence of a toponym. These approaches were mostly based on gazetteers and contexts. When it comes to the documents which may contain toponyms worldwide, like newspaper articles, toponym resolution is not just an ambiguity resolution, but an area candidate selection from all the areas on Earth. Thus we propose an automatic toponym resolution method which enables to identify its area candidates based only on their surface statistics, in place of dictionary-lookup approaches. Our method combines two modules, area candidate reduction and area candidate examination which uses block-unit data, to obtain high accuracy without reducing recall rate. Our empirical result showed 85.54% precision rate, 91.92% recall rate and .89 F-measure value on average. This method is a flexible and robust approach for toponym resolution targeting unrestricted number of areas.
ER  - 