Statistics and Economics

Software Implementation of the Epps-Pulley Criterion in Matlab Modeling Environment

https://doi.org/10.21686/2500-3925-2024-6-57-67

Abstract

Purpose. Modeling systems and programming platforms offer broad opportunities for using statistical tools in research. Because the normal distribution is one of the most common distribution laws, tests of sample normality are in high demand among statistical assessment tools, and the Epps-Pulley test is regarded as one of the most powerful tests for detecting departure from normality. Several implementations of this test exist in R and Python. However, the test is not implemented in Matlab, one of the most popular modeling environments. The purpose of this study is therefore to develop a software implementation of the Epps-Pulley test in the Matlab environment and to verify the correctness of its calculations.

Materials and methods. We implemented the calculation of the Epps-Pulley statistic by two methods: a classical one based on loops and a matrix-vector one based on linear algebra operations. The classical method computes the intermediate values needed for the test statistic in two independent loops, the second of which is a double (nested) loop. The matrix-vector method requires fewer lines of code because it performs the calculations as linear algebra operations on matrices and vectors. Critical values of the statistic for sample sizes from 8 to 1000 were obtained by two-dimensional linear interpolation of tabulated values. For samples of more than 1000 elements, we used an approximation by a beta distribution of the third kind.
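The two calculation schemes described above can be illustrated with a minimal sketch. The published implementation is in Matlab; this sketch uses Python with NumPy instead and assumes the statistic takes the form given in ISO 5479, T = 1 + n/sqrt(3) + (2/n) * sum over pairs j < k of exp(-(x_j - x_k)^2 / (2 m2)) - sqrt(2) * sum over j of exp(-(x_j - mean)^2 / (4 m2)), where m2 is the biased second central moment. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math

import numpy as np


def epps_pulley_loops(x):
    """Epps-Pulley statistic computed with explicit loops (assumed ISO 5479 form)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # biased second central moment
    # First, independent loop: deviations of each element from the sample mean.
    a = 0.0
    for v in x:
        a += math.exp(-((v - mean) ** 2) / (4.0 * m2))
    # Second, nested loop: every unordered pair (j, k) with j < k is visited once.
    b = 0.0
    for k in range(1, n):
        for j in range(k):
            b += math.exp(-((x[j] - x[k]) ** 2) / (2.0 * m2))
    return 1.0 + n / math.sqrt(3.0) + 2.0 / n * b - math.sqrt(2.0) * a


def epps_pulley_matrix(x):
    """Same statistic computed with array operations instead of loops."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m2 = x.var()  # default ddof=0 gives the biased second central moment
    a = np.exp(-((x - x.mean()) ** 2) / (4.0 * m2)).sum()
    d = x[:, None] - x[None, :]          # full n-by-n matrix of pairwise differences
    e = np.exp(-(d ** 2) / (2.0 * m2))   # element-wise exp over all n*n entries
    b = np.triu(e, k=1).sum()            # only the strict upper triangle is needed
    return 1.0 + n / np.sqrt(3.0) + 2.0 / n * b - np.sqrt(2.0) * a


# Hypothetical sample, used only to show that both routes agree.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
print(epps_pulley_loops(sample), epps_pulley_matrix(sample))
```

In this sketch the array version evaluates the exponential for all n*n pairwise differences before discarding the diagonal and lower triangle, whereas the nested loop touches only the n(n-1)/2 pairs it needs.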

Results. An assessment of computational efficiency showed that the loop-based approach is about three times faster than the matrix-vector approach, presumably because the latter processes the redundant elements of triangular matrices when performing element-wise operations. The correctness of the implementation was tested on several examples, which confirmed that the calculated values of the test statistic and the critical values agree with known data. We evaluated the test statistically using empirical Type I error rates, which corresponded to the specified significance levels. We also compared the empirical power of the Epps-Pulley test with that of the Anderson-Darling and Shapiro-Wilk tests. The evaluation results are tabulated. The software implementation of the Epps-Pulley test is published on MATLAB Central and is freely available.
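The idea of checking a test through its empirical Type I error rate can be sketched with a small Monte Carlo experiment. This is an illustration under stated assumptions, not the authors' procedure: it uses Python rather than Matlab, assumes the ISO 5479 form of the statistic, and replaces the interpolated tabulated critical values with a critical value estimated from a separate set of simulated normal samples. All function names and parameter defaults are hypothetical.

```python
import math
import random


def epps_pulley(x):
    """Epps-Pulley statistic (assumed ISO 5479 form), standard library only."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # biased second central moment
    a = sum(math.exp(-((v - mean) ** 2) / (4.0 * m2)) for v in x)
    b = sum(
        math.exp(-((x[j] - x[k]) ** 2) / (2.0 * m2))
        for k in range(1, n)
        for j in range(k)
    )
    return 1.0 + n / math.sqrt(3.0) + 2.0 / n * b - math.sqrt(2.0) * a


def simulate_null(n, reps, rng):
    """Statistics for `reps` samples of size n drawn from a standard normal law."""
    return [epps_pulley([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]


def empirical_type1_error(n=20, alpha=0.05, reps=2000, seed=1):
    """Return (simulated critical value, empirical rejection rate under normality)."""
    rng = random.Random(seed)
    # Assumption: the critical value is a simulated stand-in, not a tabulated one.
    calibration = sorted(simulate_null(n, reps, rng))
    index = min(reps - 1, int((1.0 - alpha) * reps))
    critical = calibration[index]
    # Fresh normal samples: the share of rejections estimates the Type I error.
    fresh = simulate_null(n, reps, rng)
    rate = sum(t > critical for t in fresh) / reps
    return critical, rate


critical, rate = empirical_type1_error()
print(f"critical value ~ {critical:.3f}, empirical Type I error ~ {rate:.3f}")
```

If the test is calibrated correctly, the printed rejection rate should lie close to the nominal significance level, up to Monte Carlo noise.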

About the Authors

A. A. Tipikin
Military Training and Research Center of the Navy “Naval Academy named after Admiral of the Fleet of the Soviet Union N.G. Kuznetsov”
Russian Federation

Alexey A. Tipikin, Head of Department

St. Petersburg



A. A. Prusakov
Military Training and Research Center of the Navy “Naval Academy named after Admiral of the Fleet of the Soviet Union N.G. Kuznetsov”
Russian Federation

Alexander A. Prusakov, Senior Researcher

St. Petersburg



N. A. Timoshenko
Military Training and Research Center of the Navy “Naval Academy named after Admiral of the Fleet of the Soviet Union N.G. Kuznetsov”
Russian Federation

Nikolai A. Timoshenko, Junior Researcher

St. Petersburg



For citations:


Tipikin A.A., Prusakov A.A., Timoshenko N.A. Software Implementation of the Epps-Pulley Criterion in Matlab Modeling Environment. Statistics and Economics. 2024;21(6):57-67. (In Russ.) https://doi.org/10.21686/2500-3925-2024-6-57-67


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2500-3925 (Print)