Intelligent Data Processing Methods for the Atypical Values Correction of Stock Quotes

T. V. Zolotova; D. A. Volkova

doi:10.21686/2500-3925-2022-2-4-13


Abstract

Purpose of the study. The study compares methods for correcting atypical values (outliers) in stock market statistical data and develops recommendations for their use.
Materials and methods. The article reviews Russian and foreign literature on the research problem and proposes machine learning methods for detecting and correcting outliers in time series. Outlier detection is based on the Z-score method, the isolation forest method, and the support vector method; outlier correction is based on winsorization and multiple imputation. The models were built in Jupyter Notebook using the Python programming language. The methods are applied to stock quotes from the Moscow Exchange.
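The Z-score detection step named above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function names, the threshold of 3, and the price series are assumptions made for the example.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (x - mean) / std for each observation.

    The population standard deviation is used here; that choice is an
    assumption of this sketch, not something the abstract specifies.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series has no spread, so no point can stand out.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def z_score_outliers(values, threshold=3.0):
    """Return the indices of observations whose |z| exceeds the threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


# Hypothetical closing prices with one obviously atypical value at the end.
prices = [100.0, 101.0, 99.5, 100.5, 100.2, 99.8,
          100.1, 100.4, 99.9, 100.3, 100.0, 140.0]

outliers = z_score_outliers(prices)
print(outliers)
```

Because the mean and standard deviation are computed from the same series that contains the suspicious value, the threshold interacts with the sample size and with the number of outliers, so it should be treated as a tunable parameter rather than a fixed rule.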
Results. The machine learning algorithms are demonstrated on real statistical data: the closing share prices of three Russian companies, “Sberbank”, “Aeroflot”, and “Gazprom”, for the period from 1 December 2019 to 30 November 2020, obtained from the website of the Investment Company “FINAM”. The methods for detecting and correcting outliers are compared by standard deviation. The Z-score statistical method accurately determines the distance from a suspicious observation to the center of the distribution, which is an advantage. Its disadvantage is that outliers themselves affect the mean and standard deviation, which can mask outliers or cause them to be detected incorrectly. The isolation forest method recognizes outliers of various types and has no parameters that require selection, but it detects outliers more slowly than the other methods. The support vector method is very fast and reduces to solving a quadratic programming problem, which always has a unique solution. The winsorization method for correcting outliers reduces their effect on the mean and variance, which is an advantage, but it may introduce bias because of the choice of thresholds used to separate observations in the sample. The multiple imputation method creates not one but many imputations for each missing value, which avoids systematic error at the expense of high computational cost. For the data used in this work, the best result was obtained by applying the multiple imputation algorithm to the outliers detected by the support vector method.
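The winsorization step described above can be illustrated with a short sketch. It is an assumption-laden example rather than the authors' code: the percentile thresholds (10 and 90), the linear-interpolation percentile helper, and the price series are all hypothetical choices.

```python
from statistics import mean, pstdev


def percentile(sorted_values, q):
    """Percentile of a pre-sorted list by linear interpolation, q in [0, 100]."""
    if not sorted_values:
        raise ValueError("percentile of an empty sequence is undefined")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def winsorize(values, lower_q=10.0, upper_q=90.0):
    """Clip values below/above the chosen percentiles to those percentiles.

    Observations are replaced, not dropped, so the series keeps its length.
    """
    ordered = sorted(values)
    low = percentile(ordered, lower_q)
    high = percentile(ordered, upper_q)
    return [min(max(v, low), high) for v in values]


# Hypothetical closing prices with one atypical value at the end.
prices = [100.0, 101.0, 99.5, 100.5, 100.2, 99.8,
          100.1, 100.4, 99.9, 100.3, 100.0, 140.0]

winsorized = winsorize(prices)

# Compare the mean and standard deviation before and after correction.
print(mean(prices), pstdev(prices))
print(mean(winsorized), pstdev(winsorized))
```

The choice of `lower_q` and `upper_q` decides which observations are altered, which is where the threshold-related bias mentioned above can enter: wider limits leave more of the outlier's influence in place, while narrower limits also modify ordinary observations.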
Conclusion. Data analysis theory offers no universal method for detecting or eliminating outliers. In general, identifying outliers is subjective, and the decision is made individually for each dataset, taking into account its characteristics or existing experience in the area. The practical implementation of the detection and correction methods used in this work can serve as a tool for calculating more accurate indicators in any field, for example, to improve stock price forecasting. Further work may consider optimizing the parameters of the outlier detection and correction methods in order to study their effect on model results.

Keywords

outlier detection, outlier correction, Python programming language, standard deviation

About the Authors

T. V. Zolotova

Financial University under the Government of the Russian Federation
Russian Federation

Tatiana V. Zolotova, Dr. Sci. (Physics and Mathematics), Professor

Moscow

D. A. Volkova

Peoples’ Friendship University of Russia

Daria A. Volkova

Moscow

For citations:

Zolotova T.V., Volkova D.A. Intelligent Data Processing Methods for the Atypical Values Correction of Stock Quotes. Statistics and Economics. 2022;19(2):4-13. (In Russ.) https://doi.org/10.21686/2500-3925-2022-2-4-13

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2500-3925 (Print)
