The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
The use of reports in practice has grown considerably in recent decades as data has been digitized. However, traditional statistical methods no longer cope with the uncontrolled growth and complexity of raw data, so it is crucial to clean and analyze financial data with modern machine learning methods. In this study, the quarterly reports (i.e., 10Q filings) of publicly traded companies in the United States were analyzed with data mining methods. The study used 8905 quarterly reports of companies from 2019 to 2022. The proposed approach consists of two phases that combine three different machine learning methods. The first two methods were used to generate a dataset from the 10Q filings by extracting new features, and the last method was used for the classification problem. The Doc2Vec method in the Gensim framework was used to generate vectors from the textual tags in the 10Q filings. The generated vectors were clustered with the K-means algorithm to group the tags according to their semantics. In this way, 94000 tags representing different financial items were reduced to 20000 clusters of tags, making the analysis more efficient and manageable. The dataset was created from the values corresponding to the tags in the clusters. In addition, the PriceRank metric was added to the dataset as a class label indicating the price strength of each company for the next financial quarter. The aim is thus to determine the effect of a company's quarterly reports on its market price in the following period.
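As a rough illustration of the tag-clustering step, the sketch below runs a plain K-means (Lloyd's algorithm) over a handful of toy tag vectors and then builds one feature per cluster for a single filing. It is not the authors' implementation: the 2-D vectors stand in for Doc2Vec embeddings, the tag names and values are made up for the example, and summing the values of the tags in a cluster is an assumption, since the abstract does not say how values are combined.

```python
import random


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster index per input vector."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_features(filing, tag_to_cluster, k):
    """Sum a filing's tag values per cluster (summation is an assumption)."""
    features = [0.0] * k
    for tag, value in filing.items():
        if tag in tag_to_cluster:
            features[tag_to_cluster[tag]] += value
    return features


# Hypothetical 2-D stand-ins for Doc2Vec tag embeddings.
tag_vectors = {
    "Revenues": [0.90, 0.10],
    "SalesRevenueNet": [0.85, 0.15],
    "Liabilities": [0.10, 0.90],
    "LiabilitiesCurrent": [0.15, 0.95],
}
tags = list(tag_vectors)
labels = kmeans([tag_vectors[t] for t in tags], k=2)
tag_to_cluster = dict(zip(tags, labels))

# One hypothetical filing: tag -> reported value.
filing = {"Revenues": 120.0, "SalesRevenueNet": 30.0, "LiabilitiesCurrent": 80.0}
features = cluster_features(filing, tag_to_cluster, k=2)
print(tag_to_cluster, features)
```

With real data, the toy vectors would be replaced by the embeddings learned from the tag text, and `k` would be far larger; the point here is only that semantically similar tags collapse into one column of the dataset.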
Finally, a Convolutional Neural Network model was used for the classification problem. To evaluate the results, every stage of the proposed hybrid method was compared with other machine learning techniques. This novel approach could help investors examine companies collectively and infer new, significant insights. The proposed method was compared with different approaches to dataset creation through feature extraction and to classification, and then tested with different metrics. It performed comparatively better than the other machine learning methods at predicting future price strength from past reports, with an accuracy of 84% on the created 10Q filings dataset.
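To make the evaluation step concrete, the following sketch computes accuracy and macro-averaged F1 for a multi-class prediction. The abstract reports accuracy and mentions "different metrics" without naming them, so macro F1 is an assumed example of a second metric, and the class names and label lists are invented toy data rather than results from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical price-strength classes for six company-quarters.
y_true = ["high", "high", "low", "low", "mid", "mid"]
y_pred = ["high", "low", "low", "low", "mid", "high"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

Reporting a class-averaged score next to accuracy is useful when the label classes are imbalanced, because accuracy alone can look good for a model that favours the largest class.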
Mustafa Sami KACAR
Konya Technical Univ.
Semih YUMUSAK
KTO Karatay Univ.
Halife KODAZ
Konya Technical Univ.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Mustafa Sami KACAR, Semih YUMUSAK, Halife KODAZ, "Price Rank Prediction of a Company by Utilizing Data Mining Methods on Financial Disclosures" in IEICE TRANSACTIONS on Information,
vol. E106-D, no. 9, pp. 1461-1471, September 2023, doi: 10.1587/transinf.2022OFP0002.
Abstract: The use of reports in action has grown significantly in recent decades as data has become digitized. However, traditional statistical methods no longer work due to the uncontrollable expansion and complexity of raw data. Therefore, it is crucial to clean and analyze financial data using modern machine learning methods. In this study, the quarterly reports (i.e. 10Q filings) of publicly traded companies in the United States were analyzed by utilizing data mining methods. The study used 8905 quarterly reports of companies from 2019 to 2022. The proposed approach consists of two phases with a combination of three different machine learning methods. The first two methods were used to generate a dataset from the 10Q filings with extracting new features, and the last method was used for the classification problem. Doc2Vec method in Gensim framework was used to generate vectors from textual tags in 10Q filings. The generated vectors were clustered using the K-means algorithm to combine the tags according to their semantics. By this way, 94000 tags representing different financial items were reduced to 20000 clusters consisting of these tags, making the analysis more efficient and manageable. The dataset was created with the values corresponding to the tags in the clusters. In addition, PriceRank metric was added to the dataset as a class label indicating the price strength of the companies for the next financial quarter. Thus, it is aimed to determine the effect of a company's quarterly reports on the market price of the company for the next period. Finally, a Convolutional Neural Network model was utilized for the classification problem. To evaluate the results, all stages of the proposed hybrid method were compared with other machine learning techniques. This novel approach could assist investors in examining companies collectively and inferring new, significant insights. 
The proposed method was compared with different approaches for creating datasets by extracting new features and classification tasks, then eventually tested with different metrics. The proposed approach performed comparatively better than the other machine learning methods to predict future price strength based on past reports with an accuracy of 84% on the created 10Q filings dataset.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022OFP0002/_p
@ARTICLE{e106-d_9_1461,
author={Mustafa Sami KACAR and Semih YUMUSAK and Halife KODAZ},
journal={IEICE TRANSACTIONS on Information},
title={Price Rank Prediction of a Company by Utilizing Data Mining Methods on Financial Disclosures},
year={2023},
volume={E106-D},
number={9},
pages={1461-1471},
abstract={The use of reports in action has grown significantly in recent decades as data has become digitized. However, traditional statistical methods no longer work due to the uncontrollable expansion and complexity of raw data. Therefore, it is crucial to clean and analyze financial data using modern machine learning methods. In this study, the quarterly reports (i.e. 10Q filings) of publicly traded companies in the United States were analyzed by utilizing data mining methods. The study used 8905 quarterly reports of companies from 2019 to 2022. The proposed approach consists of two phases with a combination of three different machine learning methods. The first two methods were used to generate a dataset from the 10Q filings with extracting new features, and the last method was used for the classification problem. Doc2Vec method in Gensim framework was used to generate vectors from textual tags in 10Q filings. The generated vectors were clustered using the K-means algorithm to combine the tags according to their semantics. By this way, 94000 tags representing different financial items were reduced to 20000 clusters consisting of these tags, making the analysis more efficient and manageable. The dataset was created with the values corresponding to the tags in the clusters. In addition, PriceRank metric was added to the dataset as a class label indicating the price strength of the companies for the next financial quarter. Thus, it is aimed to determine the effect of a company's quarterly reports on the market price of the company for the next period. Finally, a Convolutional Neural Network model was utilized for the classification problem. To evaluate the results, all stages of the proposed hybrid method were compared with other machine learning techniques. This novel approach could assist investors in examining companies collectively and inferring new, significant insights. 
The proposed method was compared with different approaches for creating datasets by extracting new features and classification tasks, then eventually tested with different metrics. The proposed approach performed comparatively better than the other machine learning methods to predict future price strength based on past reports with an accuracy of 84% on the created 10Q filings dataset.},
keywords={},
doi={10.1587/transinf.2022OFP0002},
ISSN={1745-1361},
month={September},}
TY - JOUR
TI - Price Rank Prediction of a Company by Utilizing Data Mining Methods on Financial Disclosures
T2 - IEICE TRANSACTIONS on Information
SP - 1461
EP - 1471
AU - Mustafa Sami KACAR
AU - Semih YUMUSAK
AU - Halife KODAZ
PY - 2023
DO - 10.1587/transinf.2022OFP0002
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 9
JA - IEICE TRANSACTIONS on Information
Y1 - September 2023
AB - The use of reports in action has grown significantly in recent decades as data has become digitized. However, traditional statistical methods no longer work due to the uncontrollable expansion and complexity of raw data. Therefore, it is crucial to clean and analyze financial data using modern machine learning methods. In this study, the quarterly reports (i.e. 10Q filings) of publicly traded companies in the United States were analyzed by utilizing data mining methods. The study used 8905 quarterly reports of companies from 2019 to 2022. The proposed approach consists of two phases with a combination of three different machine learning methods. The first two methods were used to generate a dataset from the 10Q filings with extracting new features, and the last method was used for the classification problem. Doc2Vec method in Gensim framework was used to generate vectors from textual tags in 10Q filings. The generated vectors were clustered using the K-means algorithm to combine the tags according to their semantics. By this way, 94000 tags representing different financial items were reduced to 20000 clusters consisting of these tags, making the analysis more efficient and manageable. The dataset was created with the values corresponding to the tags in the clusters. In addition, PriceRank metric was added to the dataset as a class label indicating the price strength of the companies for the next financial quarter. Thus, it is aimed to determine the effect of a company's quarterly reports on the market price of the company for the next period. Finally, a Convolutional Neural Network model was utilized for the classification problem. To evaluate the results, all stages of the proposed hybrid method were compared with other machine learning techniques. This novel approach could assist investors in examining companies collectively and inferring new, significant insights. 
The proposed method was compared with different approaches for creating datasets by extracting new features and classification tasks, then eventually tested with different metrics. The proposed approach performed comparatively better than the other machine learning methods to predict future price strength based on past reports with an accuracy of 84% on the created 10Q filings dataset.
ER -