The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
Similarity search for data streams has attracted much attention for information recommendation. In this context, recent leading works treat the latest W items in a data stream as an evolving set and reduce similarity search for data streams to set similarity search. Whereas those works consider standard sets composed of items, this paper studies similarity search for text streams and treats evolving sets whose elements are texts. Specifically, we formulate a new continuous range search problem named the CTS problem (Continuous similarity search for Text Sets). The task in the CTS problem is to find all text streams in the database whose similarity to the query becomes larger than a threshold ε. It abstracts a scenario in which a user-based recommendation system searches for similar users on social networking services. The CTS problem is important because it allows both the query and the database to change dynamically. We develop a fast pruning-based algorithm for the CTS problem. Moreover, we discuss how to speed it up with an inverted index.
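To make the problem statement concrete, the following is a minimal brute-force sketch of the CTS setting in Python. It is not the paper's pruning-based algorithm and uses no inverted index; it only illustrates the inputs and the expected output. The class name `NaiveCTS`, the whitespace tokenizer, and especially `placeholder_similarity` (Jaccard similarity over the words pooled from each window) are assumptions made for illustration, since the abstract does not specify the text-set similarity measure.

```python
from collections import deque
from typing import Callable, Deque, Dict, FrozenSet, Iterable, List

Text = FrozenSet[str]  # one text, represented as a set of words (assumption)


def tokenize(text: str) -> Text:
    """Lower-case whitespace tokenization (illustrative assumption)."""
    return frozenset(text.lower().split())


def placeholder_similarity(a: Iterable[Text], b: Iterable[Text]) -> float:
    """Jaccard similarity over the words pooled from each window.

    Assumption: this stands in for the paper's text-set similarity,
    which the abstract does not define.
    """
    words_a = set().union(*a)
    words_b = set().union(*b)
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


class NaiveCTS:
    """Brute-force baseline: recompute every similarity after each update."""

    def __init__(self, window: int, epsilon: float,
                 similarity: Callable[..., float] = placeholder_similarity):
        self.window = window          # W: number of latest texts kept per stream
        self.epsilon = epsilon        # similarity threshold
        self.similarity = similarity
        self.query: Deque[Text] = deque(maxlen=window)
        self.streams: Dict[str, Deque[Text]] = {}

    def push_query(self, text: str) -> List[str]:
        """A new text arrives in the query stream."""
        self.query.append(tokenize(text))
        return self.matches()

    def push_stream(self, stream_id: str, text: str) -> List[str]:
        """A new text arrives in one database stream."""
        win = self.streams.setdefault(stream_id, deque(maxlen=self.window))
        win.append(tokenize(text))
        return self.matches()

    def matches(self) -> List[str]:
        """All streams whose similarity to the query exceeds epsilon."""
        return sorted(sid for sid, win in self.streams.items()
                      if self.similarity(self.query, win) > self.epsilon)


if __name__ == "__main__":
    cts = NaiveCTS(window=2, epsilon=0.3)
    cts.push_stream("alice", "stream similarity search")
    cts.push_stream("bob", "cooking pasta recipes")
    print(cts.push_query("similarity search for text"))  # ['alice']
    print(cts.push_query("pasta recipes"))               # []
    print(cts.push_query("cooking pasta tonight"))       # ['bob']
```

Because both the query window and every database window slide, the result set can change after any arrival. This baseline pays for that by rescanning all streams on every update, which is the cost that a pruning-based method or an inverted index would aim to reduce.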
Yuma TSUCHIDA
University of Electro-Communications
Kohei KUBO
University of Electro-Communications
Hisashi KOGA
University of Electro-Communications
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yuma TSUCHIDA, Kohei KUBO, Hisashi KOGA, "Continuous Similarity Search for Dynamic Text Streams" in IEICE TRANSACTIONS on Information,
vol. E106-D, no. 12, pp. 2026-2035, December 2023, doi: 10.1587/transinf.2022EDP7229.
Abstract: Similarity search for data streams has attracted much attention for information recommendation. In this context, recent leading works regard the latest W items in a data stream as an evolving set and reduce similarity search for data streams to set similarity search. Whereas they consider standard sets composed of items, this paper uniquely studies similarity search for text streams and treats evolving sets whose elements are texts. Specifically, we formulate a new continuous range search problem named the CTS problem (Continuous similarity search for Text Sets). The task of the CTS problem is to find all the text streams from the database whose similarity to the query becomes larger than a threshold ε. It abstracts a scenario in which a user-based recommendation system searches similar users from social networking services. The CTS is important because it allows both the query and the database to change dynamically. We develop a fast pruning-based algorithm for the CTS. Moreover, we discuss how to speed up it with the inverted index.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDP7229/_p
@ARTICLE{e106-d_12_2026,
author={Yuma TSUCHIDA and Kohei KUBO and Hisashi KOGA},
journal={IEICE TRANSACTIONS on Information},
title={Continuous Similarity Search for Dynamic Text Streams},
year={2023},
volume={E106-D},
number={12},
pages={2026-2035},
abstract={Similarity search for data streams has attracted much attention for information recommendation. In this context, recent leading works regard the latest W items in a data stream as an evolving set and reduce similarity search for data streams to set similarity search. Whereas they consider standard sets composed of items, this paper uniquely studies similarity search for text streams and treats evolving sets whose elements are texts. Specifically, we formulate a new continuous range search problem named the CTS problem (Continuous similarity search for Text Sets). The task of the CTS problem is to find all the text streams from the database whose similarity to the query becomes larger than a threshold ε. It abstracts a scenario in which a user-based recommendation system searches similar users from social networking services. The CTS is important because it allows both the query and the database to change dynamically. We develop a fast pruning-based algorithm for the CTS. Moreover, we discuss how to speed up it with the inverted index.},
keywords={},
doi={10.1587/transinf.2022EDP7229},
ISSN={1745-1361},
month={December},}
TY - JOUR
TI - Continuous Similarity Search for Dynamic Text Streams
T2 - IEICE TRANSACTIONS on Information
SP - 2026
EP - 2035
AU - Yuma TSUCHIDA
AU - Kohei KUBO
AU - Hisashi KOGA
PY - 2023
DO - 10.1587/transinf.2022EDP7229
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2023
AB - Similarity search for data streams has attracted much attention for information recommendation. In this context, recent leading works regard the latest W items in a data stream as an evolving set and reduce similarity search for data streams to set similarity search. Whereas they consider standard sets composed of items, this paper uniquely studies similarity search for text streams and treats evolving sets whose elements are texts. Specifically, we formulate a new continuous range search problem named the CTS problem (Continuous similarity search for Text Sets). The task of the CTS problem is to find all the text streams from the database whose similarity to the query becomes larger than a threshold ε. It abstracts a scenario in which a user-based recommendation system searches similar users from social networking services. The CTS is important because it allows both the query and the database to change dynamically. We develop a fast pruning-based algorithm for the CTS. Moreover, we discuss how to speed up it with the inverted index.
ER -