
Statistics and Economics

Vol 19, No 2 (2022)
https://doi.org/10.21686/2500-3925-2022-2

ECONOMIC STATISTICS

4-13
Abstract

Purpose of the study. The purpose of the study is to carry out a comparative analysis of various methods for correcting atypical values of statistical data on the stock market and to develop recommendations for their use.
Materials and methods. The article analyzes Russian and foreign literature on the research problem and proposes machine learning methods for detecting and correcting outliers in time series. Outlier detection relies on the Z-score method, the isolation forest method, and the support vector method; outlier correction relies on winsorization and multiple imputation. The models were built in Jupyter Notebook using the Python programming language. The methods are applied to stock quotes from the Moscow Exchange.
Results. The machine learning algorithms are demonstrated on real statistical data: the closing prices of shares of three Russian companies, “Sberbank”, “Aeroflot”, and “Gazprom”, from 01.12.2019 to 30.11.2020, obtained from the website of the investment company “FINAM”. A comparative analysis of methods for detecting and correcting outliers by standard deviation has been carried out. The advantage of the Z-score statistical method is that it accurately measures the distance from a suspicious observation to the center of the distribution. Its disadvantage is that outliers themselves influence the mean and standard deviation, which can mask outliers or cause them to be detected incorrectly. The isolation forest method recognizes outliers of various types and has no parameters that require selection, but it detects outliers more slowly than the other methods. The support vector method is very fast and reduces to a quadratic programming problem, which always has a unique solution. Winsorization reduces the effect of outliers on the mean and variance, which is an advantage, but it may introduce bias through the choice of thresholds that separate observations in the sample. Multiple imputation creates not one but many imputations for each missing value, which avoids systematic error at the cost of high computational expense. For the data used in this work, the best result was obtained by applying the multiple imputation algorithm to the outliers detected by the support vector method.
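The masking effect attributed to the Z-score method above can be illustrated with a minimal sketch. It is not the authors' code: the price list, the threshold of 3, and the function name `zscore_outliers` are assumptions chosen for illustration, and only the Python standard library is used.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return indices of observations whose absolute Z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the sample
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical closing prices (not data from the article) with one spike at the end.
prices = [100.0, 101.0, 99.0, 100.0, 102.0, 98.0, 100.0, 101.0, 99.0, 100.0, 150.0]

one_spike = zscore_outliers(prices)             # the single spike is flagged
two_spikes = zscore_outliers(prices + [150.0])  # a second spike inflates the mean and
                                                # standard deviation, so neither is flagged
```

With one spike the last observation is flagged; adding a second identical spike raises both the mean and the standard deviation enough that no observation exceeds the threshold, which is the masking described in the abstract.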
Conclusion. There is no universal method for detecting and/or eliminating outliers in data analysis theory. In general, the determination of outliers is subjective, and the decision is made individually for each specific dataset, considering its characteristics or existing experience in the area. The practical implementation of the outlier detection and elimination methods used in this work can serve as a tool for calculating more accurate indicators in any area, for example, to improve stock price forecasting. Further work may consider optimizing the parameters used in the outlier detection and correction methods to study their effect on the results of the models.
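As an illustration of the winsorization correction discussed in this abstract, the following sketch clamps the extreme tails of a sample to the nearest retained values. The 10% tail fraction, the sample prices, and the function name `winsorize` are assumptions, not parameters reported by the authors; the choice of tail fraction is exactly the threshold selection that the abstract warns may introduce bias.

```python
from statistics import mean

def winsorize(values, fraction=0.1):
    """Clamp the lowest and highest `fraction` of observations to the nearest kept values."""
    ordered = sorted(values)
    k = int(fraction * len(ordered))      # number of observations clamped in each tail
    lower = ordered[k]
    upper = ordered[len(ordered) - 1 - k]
    return [min(max(v, lower), upper) for v in values]

# Hypothetical closing prices (not data from the article) with one spike at the end.
prices = [100.0, 101.0, 99.0, 100.0, 102.0, 98.0, 100.0, 101.0, 99.0, 100.0, 150.0]
corrected = winsorize(prices, fraction=0.1)

mean_before = mean(prices)
mean_after = mean(corrected)  # closer to the bulk of the data than mean_before
```

The order of observations is preserved, so the corrected series can still be used as a time series.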

NATIONAL ACCOUNTS AND MACROECONOMIC STATISTICS

14-22
Abstract

Purpose of the study. The study is devoted to econometric analysis and modeling of the dynamics of Azerbaijan’s balance of payments and to the construction of a mathematical and statistical trend that can provide a forward-looking assessment of its development. In accordance with this goal, the tasks were to choose the best set of explanatory factors for the model using the characteristics and criteria of correlation and regression analysis and econometric tests, to estimate the nature and closeness of the relationships among the explanatory factors and between the dependent and independent factors, and to test the stationarity of the series.
Materials and methods. The study used official statistical data of the State Statistics Committee and the Central Bank of Azerbaijan, as well as scientific works and studies by Azerbaijani and foreign scientists and specialists in the fields of economics and mathematical and economic modeling. Statistical methods of information processing are used for the empirical analysis of non-stationary time series; appropriate criteria and modern econometric procedures, taking into account the impact of exogenous factors, are used to check the adequacy of the multivariate model and to test it. Calculations were performed with application packages such as Excel and Eviews 8.
Results. A multivariate regression model has been created that makes it possible to conduct an economic and statistical analysis of the dynamics of the current account of the balance of payments. The form and direction of the functional relationship between the dependent and independent variables were determined, the variability of the variables was estimated, and the results of multivariate regression analysis using econometric methods were analyzed. The quantitative characteristics of the mechanisms by which the explanatory factors influence the balance of payments were measured and interpreted. Correlation dependencies were examined for causality in the model: the Granger test was performed, and factors that reliably explain the outcome with high probability were identified based on the Fisher criterion. Stationarity was tested with the Dickey-Fuller test. Using first and second differences, the stationarity of the autoregressive model was determined based on the Student’s criterion by varying the lag value. The initially constructed model, covering 1995-2017 with five factors (foreign investment, exports, imports, the manat exchange rate, and general investments), showed insufficient adequacy, namely non-stationarity of the current account series of the balance of payments. The exchange rate of the national currency, included in the model as an explanatory factor, caused large fluctuations in the values of the dependent series and an increase in the variance of the residuals, which produced non-stationarity and can be explained by the denomination of the national currency in 2006. In the next step, the period 2006-2017 was examined. In addition, independent factors such as the state budget deficit and foreign exchange reserves were added to the model. As a result, a multifactorial econometric model was created.
Conclusion. The constructed autoregressive model is adequate, demonstrates stationarity for the time series of the dependent variable, and can be considered suitable for forecasting the current account of the balance of payments. The results of the study, substantiated by the analysis of the dynamics of the balance of payments, make it possible to identify real trends in Azerbaijan’s current account, determine its interdependence with other macroeconomic variables, and thereby develop specific recommendations for the long-term development of the balance of payments.
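The stationarity check by differencing described in this abstract can be sketched as follows. This is a simplified illustration, not the EViews procedure used by the authors: it computes the t-statistic of the basic Dickey-Fuller regression with a constant and no lagged differences, and it uses a simulated random walk rather than balance-of-payments data. The function name `dickey_fuller_t` and the series are assumptions. The statistic is compared with Dickey-Fuller critical values (about -2.86 at the 5% level asymptotically for the constant-only case), not with the usual Student's t table.

```python
import math
import random

def dickey_fuller_t(series):
    """t-statistic of gamma in the regression  dy_t = alpha + gamma * y_(t-1) + e_t."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series, series[1:])]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_d = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs))
    gamma = sxy / sxx
    alpha = mean_d - gamma * mean_x
    ssr = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lagged, diffs))
    se_gamma = math.sqrt(ssr / (n - 2) / sxx)  # OLS standard error of the slope
    return gamma / se_gamma

# Simulated (hypothetical) non-stationary series: a random walk.
random.seed(0)
walk, level = [], 0.0
for _ in range(200):
    level += random.gauss(0.0, 1.0)
    walk.append(level)

first_diff = [b - a for a, b in zip(walk, walk[1:])]

t_level = dickey_fuller_t(walk)        # levels: unit root typically not rejected
t_diff = dickey_fuller_t(first_diff)   # first differences: strongly negative statistic
```

A strongly negative statistic for the differenced series, together with a weak one for the levels, is the pattern that motivates modeling the series in differences.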

DEMOGRAPHIC STATISTICS

23-35
Abstract

The COVID-19 pandemic, which began in Russia in March 2020, had a huge impact on socio-economic processes. In numerous studies analyzing mortality caused by coronavirus infection, it is concluded that the number of deaths is underestimated. The high morbidity and mortality caused by coronavirus infection has far-reaching consequences for the economy of the regions and the country as a whole: deterioration in health, a decrease in the working-age population, a change in the structure of consumption of goods and services, etc. In this regard, it is relevant to analyze the processes associated with mortality from coronavirus infection.
The purpose of the study is to identify the main trends in the nosological and age-sex structure of mortality in the Volgograd region in the years preceding the COVID-19 pandemic, to assess the contribution of mortality from coronavirus infection to total mortality in 2020. Estimation of excess mortality was carried out taking into account the dynamics of age-specific mortality rates.
Materials and methods. The main sources of information for the study of mortality were the Russian database on fertility and mortality and Rosstat data. Data from the operational headquarters were also used in analyzing mortality from COVID-19. Mortality dynamics were analyzed using such indicators as average life expectancy at birth, the crude death rate, and age-specific mortality rates in absolute and relative (per 1000 people) terms. Statistical data were processed using the Microsoft Excel application package and the matplotlib, pandas, and numpy libraries (Python programming language) and the pyramid library (R programming language).
Results. In 2020, the number of deaths in the Volgograd region exceeded the 2019 figure by 6647. If the earlier trends in the intensity of mortality had persisted in the year of the pandemic, the total number of deaths in the Volgograd region would have been 32044. Relative to this expected level, excess mortality amounts to 7368 people.
Conclusion. The study revealed that, according to Rosstat data, coronavirus infection as a cause of death explains only 33.2% of the significant increase in the number of deaths in the Volgograd region during the pandemic. This discrepancy may be the result of incorrect accounting of deaths from coronavirus infection. Another factor in the increase in mortality during a pandemic may be a decrease in the quality of medical care: medical institutions were reoriented toward treating patients with coronavirus infection, and the burden on emergency medical care increased.
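The excess mortality estimate reported above can be read as observed deaths minus the deaths expected from age-specific rates. The sketch below is an assumption about the general form of that calculation, not the authors' procedure: the age groups, rates, and populations in the usage example are hypothetical, and the observed 2020 total is only implied by the abstract's figures under the assumption that excess equals observed minus expected.

```python
def expected_deaths(rates_per_1000, population):
    """Expected deaths as the sum over age groups of rate (per 1000) times population."""
    return sum(rates_per_1000[group] * population[group] / 1000.0
               for group in rates_per_1000)

def excess_deaths(observed, expected):
    """Excess mortality as observed deaths minus expected deaths."""
    return observed - expected

# Figures taken from the abstract.
expected_2020 = 32044
excess_2020 = 7368
increase_over_2019 = 6647

# Implied totals, ASSUMING excess = observed - expected (not stated explicitly in the abstract).
observed_2020 = expected_2020 + excess_2020          # 39412
observed_2019 = observed_2020 - increase_over_2019   # 32765

# Hypothetical age groups and rates, only to show the shape of the calculation.
demo_expected = expected_deaths({"0-59": 2.0, "60+": 40.0}, {"0-59": 1000, "60+": 500})
```

Because expected deaths are built from age-specific rates, the estimate accounts for changes in the age structure rather than comparing raw annual totals.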

SOCIAL STATISTICS

36-42
Abstract

Purpose of the study. The purpose is to identify global trends in inequality in the distribution of population income. In accordance with this goal, the following tasks are set: 1) to examine current international research that addresses the problem of inequality in the distribution of population income; 2) to assess the differentiation of population income at the global and regional levels; 3) to analyze, on the basis of the Gini and Theil indexes, the dynamics of income inequality within and between countries of the world.
Materials and methods. In preparing the article, the author used data from international reports, analytical statistical materials, and scientific works of Russian and foreign scientists. The following scientific methods were used: analysis (to assess changes in indicators of income inequality of the population), synthesis (to determine the relationship between inter-country and intra-country income inequalities of the population), and the graphical method (to build graphs that reflect the dynamics of changes in the distribution of national income and assets among the population, the Gini coefficient, and the Theil index). These methods made it possible to identify the scale and trends in the differentiation of population income in the world.
Results. The problem of uneven distribution of the population income was investigated. It has been established that inequality in the population income differs significantly between regions of the world, and the level of inequality of the population in terms of income within countries is much higher than the level of inequality between countries. An assessment of the current state is given and trends in the differentiation of the population income in the world based on the Gini index and the Theil index are revealed.
Conclusion. It has been established that the problem of income differentiation of the population is in the focus of attention of both the scientific community and international organizations, namely the United Nations, the Organization for Economic Cooperation and Development, the World Bank, and Oxfam. The level of differentiation of the population by income differs significantly between regions of the world. The scale of global income inequality has now reached the level that was observed during the heyday of Western imperialism. With the help of the Gini and Theil indexes, it was revealed that intra-country inequality is significantly greater than inter-country income inequality.
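The comparison of intra-country and inter-country inequality relies on a property of the Theil index: it splits exactly into a within-group and a between-group component. The following sketch shows the standard formulas for the Gini coefficient and the Theil T index with this decomposition. The income figures and country labels are hypothetical, and the function names are assumptions; the article's own data and software are not reproduced here.

```python
import math

def gini(incomes):
    """Gini coefficient from the rank-weighted sum of sorted incomes."""
    xs = sorted(incomes)
    n = len(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n

def theil(incomes):
    """Theil T index; all incomes must be strictly positive."""
    mu = sum(incomes) / len(incomes)
    return sum((x / mu) * math.log(x / mu) for x in incomes) / len(incomes)

def theil_decomposition(groups):
    """Split Theil T into (within-group, between-group) parts for a dict of income lists."""
    everyone = [x for incomes in groups.values() for x in incomes]
    total = sum(everyone)
    mu = total / len(everyone)
    within = between = 0.0
    for incomes in groups.values():
        share = sum(incomes) / total            # the group's share of total income
        mu_g = sum(incomes) / len(incomes)      # the group's mean income
        within += share * theil(incomes)
        between += share * math.log(mu_g / mu)
    return within, between

# Hypothetical incomes for two countries (illustrative values only).
countries = {"A": [10.0, 20.0, 30.0], "B": [40.0, 80.0]}
within, between = theil_decomposition(countries)
overall = theil([x for incomes in countries.values() for x in incomes])
```

The two components sum to the overall index, so their relative sizes indicate whether inequality arises mainly inside countries or between them.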

43-51
Abstract

The purpose of the study. One of the urgent problems in the development of the regions of Russia is increasing the efficiency of regional systems of general secondary education. At the same time, it is necessary to take measures to reduce the gender gap in the personnel of modern schools. To understand the gender gap that has developed in the regions, it is advisable to conduct a comparative analysis of differences between regions based on specific indicators, taking into account the existing differentiation of the proportion of male teaching staff by age. The purpose of our study was to evaluate the indicators characterizing the share of male teachers of different age groups in the total number of teaching staff of schools in the regions of Russia. These indicators were the shares of men in five age groups in the total number of school teachers in the corresponding age categories in 2020.
Materials and methods. The study used a methodological approach proposed by the author, based on specific indicators describing the proportion of male teachers of different ages in the total number of teaching staff of general education schools. The study included five stages and used official statistical information for all regions of Russia, reflecting the results of a federal statistical survey conducted by the Ministry of Education of the Russian Federation. The distribution of the indicators across the regions of Russia was modeled mathematically, with density functions of the normal distribution used as models. Methods of statistical analysis, in particular ANOVA, were also used.
Results. The study proved that there is currently a significant feminization of secondary education in our country. The conducted modeling of empirical data demonstrated that the values of indicators characterizing the share of male teachers in the total number of teaching staff of schools depend on the age of teachers. The maximum value of the indicator was noted in the age group up to thirty years. With increasing age, the values of indicators decrease and reach a minimum in the age group from forty to forty-nine years. At an older age, the proportion of male teachers increases. The article discusses the regional features of the formation of the teaching staff of schools. Regions with minimum and maximum values of each of the five indicators were identified.
Conclusion. The proposed methodological approach and the results obtained have scientific novelty, since the territorial features of the gender structure of school teachers in the regions of Russia have not been assessed before. The approach to assessing the gender structure of the teaching staff of secondary schools presented in the article can be used in further research.
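Modeling the regional distribution of an indicator with a normal density function, as described in this abstract, might look like the sketch below. The regional shares are hypothetical values invented for illustration, not data from the federal survey, and fitting by sample mean and standard deviation is an assumption about the estimation method.

```python
from statistics import NormalDist

# Hypothetical shares (%) of male teachers in one age group across eight regions.
shares = [12.0, 15.5, 9.8, 14.1, 11.3, 13.7, 10.9, 16.2]

# Fit a normal model using the sample mean and sample standard deviation.
model = NormalDist.from_samples(shares)

# Interval of typical values: mean plus or minus one standard deviation.
low = model.mean - model.stdev
high = model.mean + model.stdev
coverage = model.cdf(high) - model.cdf(low)   # share of regions expected inside the interval

# Density of the fitted model at the mean (the peak of the curve).
peak_density = model.pdf(model.mean)
```

Regions whose indicator falls below `low` or above `high` can then be singled out as having atypically small or large shares, which is one way to identify the regions with minimum and maximum values.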

STATISTICS AND MATHEMATICAL METHODS IN ECONOMICS

52-60
Abstract

Purpose of the study. The purpose of the study is to develop a model for predicting university performance indicators based on a cognitive approach, which is based on the construction of a cognitive map that reflects the influence of a set of latent factors on the basic indicators and provides a solution to the problem of scenario forecasting. The degree of achievement of the required values of the basic indicators that determine the ranking of the university depends on the magnitude of the increment of the identified latent factors. The developed model makes it possible to choose the most preferable variant of scenario forecasting under the existing restrictions on the resources allocated for the increment of latent factors.
Materials and methods. To achieve this goal, cognitive modeling methods based on gray fuzzy cognitive maps (FCM) were used in combination with methods of interval mathematics and causal algebra. Describing the relationships between the concepts of the cognitive map with interval estimates rather than point estimates reduced the uncertainty of expert estimates of the strength of those relationships, which increased the reliability of the modeling results. The model is built on an ensemble of gray FCMs, which in turn increased the accuracy and reliability of the predictive model. The proposed approach to predicting the activities of the university made it possible to develop an adequate cognitive model.
Results. The developed cognitive model of the university’s activities made it possible to analyze the dynamics of changes in factors and their influence on basic indicators, as well as the dynamics of the development of the system of indicators. The calculation made it possible to choose the most cost-effective scenario for incrementing the values of latent factors to obtain the required value of the university ranking in the framework of the QS international institutional ranking of universities. A comparative analysis of the results of scenario forecasting based on conventional FCM, gray FCM, and an ensemble of gray FCM was carried out, which showed the advantage of the proposed approach.
Conclusion. During the study, a fuzzy cognitive model based on an ensemble of gray FCMs was developed for scenario forecasting of measures to achieve the required values of university performance targets in the QS international institutional ranking. Under the given constraints, the model yields the most acceptable scenario for planning the increment of basic indicators to target values by identifying the latent factors that influence them and calculating the required values of impulse effects on those latent factors.
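The idea of replacing point weights with interval weights in a fuzzy cognitive map can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: the update rule (previous state plus weighted influences, passed through a sigmoid) is one common FCM variant, and the two concepts, their initial states, and the weight intervals are hypothetical.

```python
import math

def interval_mul(a, b):
    """Product of two intervals given as (low, high) pairs."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def grey_fcm_step(states, weights):
    """One update; weights[i][j] is the interval influence of concept i on concept j."""
    n = len(states)
    updated = []
    for j in range(n):
        low, high = states[j]                 # the concept keeps memory of its own state
        for i in range(n):
            if i != j:
                lo_ij, hi_ij = interval_mul(states[i], weights[i][j])
                low += lo_ij
                high += hi_ij
        updated.append((sigmoid(low), sigmoid(high)))  # sigmoid is monotone: low <= high
    return updated

def run_grey_fcm(states, weights, steps=10):
    for _ in range(steps):
        states = grey_fcm_step(states, weights)
    return states

# Hypothetical map: concept 0 is a latent factor, concept 1 is a basic indicator.
weights = [[(0.0, 0.0), (0.4, 0.7)],    # experts place the influence 0 -> 1 in [0.4, 0.7]
           [(0.0, 0.0), (0.0, 0.0)]]    # no feedback from the indicator to the factor
initial = [(0.6, 0.6), (0.3, 0.3)]
final = run_grey_fcm(initial, weights, steps=10)
```

The width of each resulting interval shows how much the uncertainty in the expert weights carries over into the forecast for that concept; an ensemble, as in the abstract, would combine the outputs of several such maps.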



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2500-3925 (Print)