The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
Code search is the task of retrieving the most relevant code given a natural language query. Several recent studies have proposed deep-learning-based methods that use a multi-encoder model to parse code into multiple fields in order to represent it. These methods enhance model performance by distinguishing between similar codes and by utilizing a relation matrix to bridge the code and the query. However, these models require more computational resources and parameters than single-encoder models. Furthermore, a relation matrix that relies solely on max-pooling disregards word alignment information. To alleviate these problems, we propose a combined alignment model for code search. We concatenate the multiple code fields into one sequence to represent the code and use a single encoding model to encode code features. Moreover, we transform the relation matrix using trainable vectors to avoid information loss. Then, we combine intra-modal and cross-modal attention to assign salient words while matching the corresponding code and query. Finally, we apply the attention weights to the code/query embeddings and compute the cosine similarity. To evaluate the performance of our model, we compare it with six previous models on two popular datasets. The results show that our model achieves Top@1 performance of 0.614 and 0.687, outperforming the best comparison models by 12.2% and 9.3%, respectively.
Juntong HONG
Kyoto Institute of Technology
Eunjong CHOI
Kyoto Institute of Technology
Osamu MIZUNO
Kyoto Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Juntong HONG, Eunjong CHOI, Osamu MIZUNO, "A Combined Alignment Model for Code Search" in IEICE TRANSACTIONS on Information,
vol. E107-D, no. 3, pp. 257-267, March 2024, doi: 10.1587/transinf.2023MPP0002.
Abstract: Code search is the task of retrieving the most relevant code given a natural language query. Several recent studies have proposed deep-learning-based methods that use a multi-encoder model to parse code into multiple fields in order to represent it. These methods enhance model performance by distinguishing between similar codes and by utilizing a relation matrix to bridge the code and the query. However, these models require more computational resources and parameters than single-encoder models. Furthermore, a relation matrix that relies solely on max-pooling disregards word alignment information. To alleviate these problems, we propose a combined alignment model for code search. We concatenate the multiple code fields into one sequence to represent the code and use a single encoding model to encode code features. Moreover, we transform the relation matrix using trainable vectors to avoid information loss. Then, we combine intra-modal and cross-modal attention to assign salient words while matching the corresponding code and query. Finally, we apply the attention weights to the code/query embeddings and compute the cosine similarity. To evaluate the performance of our model, we compare it with six previous models on two popular datasets. The results show that our model achieves Top@1 performance of 0.614 and 0.687, outperforming the best comparison models by 12.2% and 9.3%, respectively.
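The final scoring step described in the abstract — applying attention weights to the code/query token embeddings and comparing the pooled vectors by cosine similarity — can be sketched as follows. This is a minimal illustration with toy NumPy tensors: the attention scores here are random placeholders, whereas in the paper they come from the combined intra-modal and cross-modal attention, which is not reproduced here.

```python
import numpy as np

def attention_pool(token_emb: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Softmax the attention scores over tokens, then take the
    weighted sum of token embeddings -> one sequence-level vector."""
    w = np.exp(scores - scores.max())
    w = w / w.sum()                    # softmax over the token axis
    return w @ token_emb               # (seq,) @ (seq, dim) -> (dim,)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two pooled embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings: 4 code tokens and 3 query tokens, dimension 8.
rng = np.random.default_rng(0)
code_tok = rng.standard_normal((4, 8))
query_tok = rng.standard_normal((3, 8))

# Placeholder attention scores (stand-ins for the model's combined
# intra-/cross-modal attention output).
code_vec = attention_pool(code_tok, rng.standard_normal(4))
query_vec = attention_pool(query_tok, rng.standard_normal(3))

# Relevance score used to rank code candidates for the query.
score = cosine_similarity(code_vec, query_vec)
```

At retrieval time a score like this would be computed for every candidate code snippet and the candidates ranked by it, which is how a Top@1 metric such as the one reported in the abstract is evaluated.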
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023MPP0002/_p
@ARTICLE{e107-d_3_257,
author={Juntong HONG and Eunjong CHOI and Osamu MIZUNO},
journal={IEICE TRANSACTIONS on Information},
title={A Combined Alignment Model for Code Search},
year={2024},
volume={E107-D},
number={3},
pages={257-267},
abstract={Code search is the task of retrieving the most relevant code given a natural language query. Several recent studies have proposed deep-learning-based methods that use a multi-encoder model to parse code into multiple fields in order to represent it. These methods enhance model performance by distinguishing between similar codes and by utilizing a relation matrix to bridge the code and the query. However, these models require more computational resources and parameters than single-encoder models. Furthermore, a relation matrix that relies solely on max-pooling disregards word alignment information. To alleviate these problems, we propose a combined alignment model for code search. We concatenate the multiple code fields into one sequence to represent the code and use a single encoding model to encode code features. Moreover, we transform the relation matrix using trainable vectors to avoid information loss. Then, we combine intra-modal and cross-modal attention to assign salient words while matching the corresponding code and query. Finally, we apply the attention weights to the code/query embeddings and compute the cosine similarity. To evaluate the performance of our model, we compare it with six previous models on two popular datasets. The results show that our model achieves Top@1 performance of 0.614 and 0.687, outperforming the best comparison models by 12.2% and 9.3%, respectively.},
keywords={},
doi={10.1587/transinf.2023MPP0002},
ISSN={1745-1361},
month={March},}
TY - JOUR
TI - A Combined Alignment Model for Code Search
T2 - IEICE TRANSACTIONS on Information
SP - 257
EP - 267
AU - Juntong HONG
AU - Eunjong CHOI
AU - Osamu MIZUNO
PY - 2024
DO - 10.1587/transinf.2023MPP0002
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E107-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2024
AB - Code search is the task of retrieving the most relevant code given a natural language query. Several recent studies have proposed deep-learning-based methods that use a multi-encoder model to parse code into multiple fields in order to represent it. These methods enhance model performance by distinguishing between similar codes and by utilizing a relation matrix to bridge the code and the query. However, these models require more computational resources and parameters than single-encoder models. Furthermore, a relation matrix that relies solely on max-pooling disregards word alignment information. To alleviate these problems, we propose a combined alignment model for code search. We concatenate the multiple code fields into one sequence to represent the code and use a single encoding model to encode code features. Moreover, we transform the relation matrix using trainable vectors to avoid information loss. Then, we combine intra-modal and cross-modal attention to assign salient words while matching the corresponding code and query. Finally, we apply the attention weights to the code/query embeddings and compute the cosine similarity. To evaluate the performance of our model, we compare it with six previous models on two popular datasets. The results show that our model achieves Top@1 performance of 0.614 and 0.687, outperforming the best comparison models by 12.2% and 9.3%, respectively.
ER -