The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
In this paper, we propose a method for querying heterogeneous molecular biology databases. Because molecular biology data are distributed across multiple databases that represent different biological domains, it is highly desirable to integrate the data together with the correlations between the domains. However, because the total number of such databases is very large and their contents are frequently updated, it is difficult to maintain an integration of the entire contents of the databases. We therefore propose a method for dynamic integration driven by user demand, which is expressed in an OQL-based query language. Restricting the search space according to user demand considerably reduces the cost of integration. Multiple databases also exhibit considerable heterogeneity, such as semantic mismatches between database schemas; for example, many databases use their own independent terminology. For this reason, integrating data in response to a user demand usually has to be carried out transitively: first search each database for data that satisfy the demand, then repeatedly retrieve other data that match the previously found data in every database. To address this, we introduce two types of agents: a database agent, which resides at each database, and a user agent, which resides with a user.
The integration task is performed by the agents: user agents generate demands for retrieving data based on the results of previous searches by database agents, and database agents search their databases for data that satisfy the demands received from user agents. We have developed a prototype system on a network of workstations. The system integrates four databases: GenBank (a DNA nucleotide database), SWISS-PROT and PIR (protein amino-acid sequence databases), and PDB (a protein three-dimensional structure database). Although GenBank and PDB each exceed one billion bytes in size, the system achieved good performance in searching such very large heterogeneous databases.
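The transitive retrieval described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The class names (`Record`, `DatabaseAgent`, `UserAgent`), the matching rule (a demand matches a record's terms, key, or cross-references), and all record identifiers are hypothetical and chosen only for illustration. The OQL-based query language and the networked deployment are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One database entry (hypothetical model, not the paper's schema)."""
    db: str
    key: str
    terms: frozenset
    xrefs: frozenset = frozenset()  # identifiers of related entries elsewhere


class DatabaseAgent:
    """Resides at one database and answers demands against that database only."""

    def __init__(self, name, records):
        self.name = name
        self.records = list(records)

    def search(self, demand):
        # Assumed matching rule: the demand is one of the record's terms,
        # its own key, or one of its cross-references.
        return [r for r in self.records
                if demand in r.terms or demand == r.key or demand in r.xrefs]


class UserAgent:
    """Resides with the user and drives the transitive integration."""

    def __init__(self, agents):
        self.agents = agents

    def integrate(self, initial_demand):
        found = {}                    # (db, key) -> Record
        seen = {initial_demand}       # demands already issued
        pending = [initial_demand]
        while pending:
            demand = pending.pop()
            for agent in self.agents:
                for rec in agent.search(demand):
                    found[(rec.db, rec.key)] = rec
                    # Each hit generates new demands from its key and xrefs.
                    for nxt in {rec.key, *rec.xrefs}:
                        if nxt not in seen:
                            seen.add(nxt)
                            pending.append(nxt)
        return found


# Toy data: every identifier below is invented for this example.
agents = [
    DatabaseAgent("GenBank", [
        Record("GenBank", "GB-0001", frozenset({"lysozyme"}), frozenset({"SP-0001"})),
        Record("GenBank", "GB-0002", frozenset({"insulin"})),
    ]),
    DatabaseAgent("SWISS-PROT", [
        Record("SWISS-PROT", "SP-0001", frozenset({"LYSC"}), frozenset({"PDB-0001"})),
    ]),
    DatabaseAgent("PIR", [
        Record("PIR", "PIR-0001", frozenset(), frozenset({"SP-0001"})),
    ]),
    DatabaseAgent("PDB", [
        Record("PDB", "PDB-0001", frozenset()),
    ]),
]

for db, key in sorted(UserAgent(agents).integrate("lysozyme")):
    print(db, key)
```

In this toy run only the GenBank entry matches the term "lysozyme" directly. The SWISS-PROT, PIR, and PDB entries use different terminology and are reached only through follow-up demands generated from earlier hits, while the unrelated GenBank entry is never retrieved. That is the sense in which restricting the search to the user's demand avoids integrating the full contents of every database.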
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Hideo MATSUDA, Takashi IMAI, Michio NAKANISHI, Akihiro HASHIMOTO, "Querying Molecular Biology Databases by Integration Using Multiagents" in IEICE TRANSACTIONS on Information, vol. E82-D, no. 1, pp. 199-207, January 1999.
Abstract: In this paper, we propose a method for querying heterogeneous molecular biology databases. Since molecular biology data are distributed across multiple databases that represent different biological domains, it is highly desirable to integrate the data together with the correlations between the domains. However, since the total number of such databases is very large and the data they contain are frequently updated, it is difficult to maintain an integration of the entire contents of the databases. Thus, we propose a method for dynamic integration based on user demand, which is expressed in an OQL-based query language. By restricting the search space according to user demand, the cost of integration can be reduced considerably. Multiple databases also exhibit much heterogeneity, such as semantic mismatches between the database schemas. For example, many databases employ their own independent terminology. For this reason, the task of integrating data based on a user demand usually has to be carried out transitively: first search each database for data that satisfy the demand, then repeatedly retrieve other data that match the previously found data across every database. To cope with this issue, we introduce two types of agents: a database agent and a user agent, which reside at each database and at a user, respectively. The integration task is performed by the agents: user agents generate demands for retrieving data based on the previous search results from database agents, and database agents search their databases for data that satisfy the demands received from the user agents. We have developed a prototype system on a network of workstations. The system integrates four databases: GenBank (a DNA nucleotide database), SWISS-PROT and PIR (protein amino-acid sequence databases), and PDB (a protein three-dimensional structure database). Although the sizes of GenBank and PDB are each over one billion bytes, the system achieved good performance in searching such very large heterogeneous databases.
URL: https://global.ieice.org/en_transactions/information/10.1587/e82-d_1_199/_p
@ARTICLE{e82-d_1_199,
author={Hideo MATSUDA and Takashi IMAI and Michio NAKANISHI and Akihiro HASHIMOTO},
journal={IEICE TRANSACTIONS on Information},
title={Querying Molecular Biology Databases by Integration Using Multiagents},
year={1999},
volume={E82-D},
number={1},
pages={199-207},
abstract={In this paper, we propose a method for querying heterogeneous molecular biology databases. Since molecular biology data are distributed into multiple databases that represent different biological domains, it is highly desirable to integrate data together with the correlations between the domains. However, since the total amount of such databases is very large and the data contained are frequently updated, it is difficult to maintain the integration of the entire contents of the databases. Thus, we propose a method for dynamic integration based on user demand, which is expressed with an OQL-based query language. By restricting search space according to user demand, the cost of integration can be reduced considerably. Multiple databases also exhibit much heterogeneity, such as semantic mismatching between the database schemas. For example, many databases employ their own independent terminology. For this reason, it is usually required that the task for integrating data based on a user demand should be carried out transitively; first search each database for data that satisfy the demand, then repeatedly retrieve other data that match the previously found data across every database. To cope with this issue, we introduce two types of agents; a database agent and a user agent, which reside at each database and at a user, respectively. The integration task is performed by the agents; user agents generate demands for retrieving data based on the previous search results by database agents, and database agents search their databases for data that satisfy the demands received from the user agents. We have developed a prototype system on a network of workstations. The system integrates four databases; GenBank (a DNA nucleotide database), SWISS-PROT, PIR (protein amino-acid sequence databases), and PDB (a protein three-dimensional structure database). 
Although the sizes of GenBank and PDB are each over one billion bytes, the system achieved good performance in searching such very large heterogeneous databases.},
keywords={},
doi={},
ISSN={},
month={January},}
TY - JOUR
TI - Querying Molecular Biology Databases by Integration Using Multiagents
T2 - IEICE TRANSACTIONS on Information
SP - 199
EP - 207
AU - Hideo MATSUDA
AU - Takashi IMAI
AU - Michio NAKANISHI
AU - Akihiro HASHIMOTO
PY - 1999
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E82-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 1999
AB - In this paper, we propose a method for querying heterogeneous molecular biology databases. Since molecular biology data are distributed into multiple databases that represent different biological domains, it is highly desirable to integrate data together with the correlations between the domains. However, since the total amount of such databases is very large and the data contained are frequently updated, it is difficult to maintain the integration of the entire contents of the databases. Thus, we propose a method for dynamic integration based on user demand, which is expressed with an OQL-based query language. By restricting search space according to user demand, the cost of integration can be reduced considerably. Multiple databases also exhibit much heterogeneity, such as semantic mismatching between the database schemas. For example, many databases employ their own independent terminology. For this reason, it is usually required that the task for integrating data based on a user demand should be carried out transitively; first search each database for data that satisfy the demand, then repeatedly retrieve other data that match the previously found data across every database. To cope with this issue, we introduce two types of agents; a database agent and a user agent, which reside at each database and at a user, respectively. The integration task is performed by the agents; user agents generate demands for retrieving data based on the previous search results by database agents, and database agents search their databases for data that satisfy the demands received from the user agents. We have developed a prototype system on a network of workstations. The system integrates four databases; GenBank (a DNA nucleotide database), SWISS-PROT, PIR (protein amino-acid sequence databases), and PDB (a protein three-dimensional structure database). Although the sizes of GenBank and PDB are each over one billion bytes, the system achieved good performance in searching such very large heterogeneous databases.
ER -