Impact of Factor Rotation Methods on Simulation Composite Indicators

This research verifies a hypothesis about a mathematical measure that can be used as an additional tool for analysing and assessing tendencies in Lithuania's economy. The measure is constructed as a composite indicator (CI) comprising indicators from social and economic fields. Forty-five different simulated CIs have been obtained using several factor rotation methods. The analysis shows that standard mathematical methods may be used for composing a CI that gives meaningful results when analysing the state of Lithuania's economy. During stable or growing periods the simulated CIs show tendencies similar to those of GDP. Our research also shows that during a crisis period different rotation methods may influence the final CI results.


Introduction
To evaluate the economic development of a country, gross domestic product (GDP) is very often chosen as the main economic indicator. Analysis of the tendencies of this indicator outlines not only the performance of the economy but also the wealth or poverty of inhabitants, for example through GDP per capita. However, there is growing evidence that GDP does not always reflect the real development of a country, its quality of life, the development of technologies and other aspects (see [16]).
As an example, the last economic and financial crisis, which affected not only Europe but extended worldwide, may be analysed. In Lithuania, if we consider only GDP changes, statistical data show that the crisis began only in the last quarter of 2008. However, even before that time some individual indicators showed a decline of the economy, e.g. a decreased volume of construction, turnover of industrial production and other indicators. During the crisis period some important indicators showed a deeper decline than GDP. Thus these indicators pointed to another side of the economy's dynamics and had a significant effect on the economic situation of many inhabitants. For example, the collapse of the construction sector produced a large number of unemployed people with many social problems. Hence a new mathematical problem can be formulated: to construct a synthetic indicator that makes it possible to evaluate the tendencies of a country's economy more accurately.
Composite indicators (CIs) which compare country performance are useful tools for setting policy priorities and monitoring performance. The use of CIs around the world is growing year after year [4]. Usually the CI methodology is used for ranking different countries (worldwide or regional) from the best to the worst. A CI may provide a comparison of countries that can be used to illustrate a complex set of important fields: environment, economy or technological development. Examples of such indicators are the Technology Achievement Index and the Growth Competitiveness Index. A CI is formed when individual indicators are compiled into a single index on the basis of an underlying model. A CI should ideally measure multidimensional concepts which cannot be captured by a single indicator, e.g. competitiveness, sustainability or the knowledge-based society. The quality of a CI depends not only on the methodology but also on the quality of the framework and the data used. In fact CIs are not only economic models; they are rather mathematical or computational models.
In general, the construction of a CI consists of a few main steps: creation of a theoretical framework; data selection and pre-adjustment analysis of variables; multivariate analysis; normalization of variables; weighting of individual indicators and aggregation; sensitivity analysis; identification of links to other variables; and analysis and visualization of results. These steps are structured in [12].
Similarly, a CI may be constructed to analyse and/or forecast fluctuations of some important sector or of the whole economy of a particular country. For instance, the Canadian Composite Leading Indicator comprises ten components which lead cyclical activity in the economy and together represent all major categories of GDP [19]. L. Crosilla has extracted aggregated composite indicators for the whole Italian economy [6].
However, the methodology of constructing a CI for specific fields of one country differs from the general methodology described above. If we analyse a collection of time series, the data vary in time $t = 1, \dots, T$, and steps such as multivariate or sensitivity analysis are difficult to implement [14]. The analysis of the literature has shown that most aspects of the various methodologies differ because they (i) deal with different problems and (ii) depend on the specifics of the fields analysed.
We have adopted some methods from the general CI methodology for the construction of a CI of Lithuania's economy. The main steps of the proposed methodology are presented in Section 2. In this work we define the CI as an additional tool for the analysis of a country's economy. We concentrate on the impact of weights on the CI. Further, as the empirical results show, the CI may be used in addition to GDP for economic analysis. Let us describe the CI as a given mathematical function-model:
$$\mathrm{CI} = f(X, W), \tag{1.1}$$
where $X$ is a set of indicators belonging to various fields, e.g. business statistics or technologies. These fields are grouped into subfields, e.g. business statistics may be grouped into industry, construction and other subfields. The matrix $W$ describes the weights of the different indicators. Our general objective is to develop a suitable model for CI evaluation. From this objective the following mathematical problems arise:
(i) To construct a new composite indicator as an additional measure reflecting the dynamics of Lithuania's economy.
(ii) To evaluate the impact of the factor rotation method on the CI weights.
The structure of this paper is as follows. In Section 2 a methodology of CI construction is presented. A practical case of CI evaluation is described in Section 3: first, common indicators that are widely used to describe Lithuania's economy are selected; then the weights of every indicator are evaluated and the indicators are aggregated, and in this way the CI is obtained. Section 4 gives concluding remarks.

Review of Methodology
In this section the CI modelling steps are described. Subsection 2.1 describes the theoretical framework, Subsection 2.2 describes data selection and primary analysis of variables, and Subsection 2.3 presents the weighting procedure, which covers factor analysis, rotation methods and aggregation. As mentioned in the introduction, the general CI methodology has been adopted. The statistical data are analysed as time series and the constructed CI depends on the time variable; therefore the following steps of composing a CI are implemented:
1. Creation of a theoretical framework, which provides the information, links and meaning of the model.
2. Primary data analysis: data selection based on analytical importance; pre-adjustment analysis of variables (analysis of outliers, estimation of missing values, seasonal adjustment); normalization of variables.
3. Selection of an appropriate procedure for weighting individual indicators and for aggregation.
4. Sensitivity analysis of the impact of weights; in general, analysis and visualization of results.
5. Short-term forecasting of the CI (a few time periods ahead).
This research concentrates on the construction of the CI; forecasting models and forecasting accuracy are not examined. The sensitivity analysis of the impact of weights on the CI is presented in Section 3 as a practical case. In this paper we restrict ourselves to this practical application; theoretical experiments have not been performed.

Theoretical framework
Before creating a theoretical framework, we investigated other approaches concerning different synthetic indicators. As mentioned in the introduction, GDP is often called an inadequate indicator of the economic performance of a country, since it does not always reflect the real state of the current economic situation. Many alternative indicators have been created which show some important characteristics.
Concerning methodological guidelines and empirical results on the construction of synthetic indicators of Lithuania's economy (more often applied to specific sectors), we note some research works which use data up to the last economic crisis. Some examples follow. Everhart and Duval-Hernandez presented a method for forecasting growth cycles in economic activity, measured as total industrial production [7]. On the basis of data from 1993-1999, they constructed series which they aggregated into a composite leading indicator to predict the path of the economy in Lithuania. S. Žičkienė constructed a sustainable development composite index to examine Lithuania's progress along a path of sustainable development in 1999-2001 [18]. Snieška and Bruneckienė analysed Lithuania's regional competitiveness by means of a composite index during 2001-2007 [15].
There are international indexes (noted in the introduction) which compare the performance of countries in some area, but usually these indexes have yearly frequency and use data with a time lag. With such indexes it is practically impossible to compare development over the time variable $t$ or at a higher frequency (for example, quarterly periodicity). In summary, there is little literature on the creation of indexes which reflect the state of the whole of Lithuania's economy and its tendencies during fixed time periods. After the literature review, the theoretical framework has been developed in the following two parts.
We have set the objective of examining fluctuations of GDP and of using a mathematical technique to compose a synthetic index (CI) which has the following necessary features:
(i) The constructed CI should reflect the state of the whole economy.
(ii) The CI depends on the time variable $t$, and it is possible to calculate changes relative to the previous period or to the corresponding period of the previous year.
(iii) The constructed CI is compared to GDP fluctuations during chosen time periods.
From a statistical point of view there are not many officially published economic indicators which solve this task. Therefore the tendencies of the synthetic indicator are compared to GDP tendencies. The essential questions are:
• Are the discrepancies between CI and GDP tendencies significant?
• How do the CI and GDP describe the state of Lithuania's economy during crisis and stable/growth periods?
• What is the impact of weights on the construction of the CI?
As criteria for the selection of individual indicators we chose the following features of the CI:
(i) The constructed index is composed of different indicators $X$ which are often used for describing a country's economy.
(ii) The indicators observably correlate with GDP.
(iii) The indicators should have quarterly or monthly periodicity, with the possibility of aggregating monthly data to quarterly periodicity. This criterion provides a correct comparison with GDP, whose data have quarterly periodicity.

Primary data analysis
In this subsection the data analysis is described, from the primary collection of indicators to the normalization procedure. At the beginning of the CI construction a primary collection of indicators $X_n$, $n = 1, \dots, N$, has been selected. All indicators have either monthly or quarterly periodicity. Indicators with monthly periodicity have been transformed to quarterly periodicity. Since only price indexes are monthly data, the standard averaging procedure
$$X_{Q_t} = \frac{1}{3}\sum_{h=1}^{3} X_{M_{th}},$$
where $Q_t$ denotes the quarter and $M_{th}$ denotes the $h$-th month of that quarter, has been applied to these data. It was assumed that the average value of three months reflects the tendency of the quarterly data. Thus in the remaining part of the text we consider a collection $X$ which has quarterly periodicity.
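The averaging of monthly data to quarterly periodicity can be illustrated with a minimal sketch. It assumes that the series starts in the first month of a quarter and contains only complete quarters; the function name and the sample values are hypothetical and are not taken from the data set of the paper.

```python
def monthly_to_quarterly(monthly):
    """Average consecutive triples of monthly values into quarterly values.

    Assumption of this sketch: the series starts in the first month of a
    quarter and its length is a multiple of three.
    """
    if len(monthly) % 3 != 0:
        raise ValueError("expected complete quarters (length divisible by 3)")
    return [sum(monthly[k:k + 3]) / 3.0 for k in range(0, len(monthly), 3)]


# Hypothetical monthly price index values covering two quarters.
index_monthly = [100.0, 101.0, 102.0, 103.0, 103.0, 106.0]
index_quarterly = monthly_to_quarterly(index_monthly)
```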
Next, the correlation between each indicator and the main economic indicator, GDP, has been tested. Indicators with very small correlation, e.g. $\rho_{X_i,\mathrm{GDP}} < 0.2$, have been eliminated from further analysis. For the generation of missing values two methods have been used:
(i) Search for similar data: another indicator is selected which strongly correlates with the analysed indicator and has similar tendencies during the analysed period. We search for a mathematical model which predicts the missing values using the new variable as a regressor. The model which gives the least root mean squared error is chosen.
(ii) Additional analysis of the structure of the time series: we evaluate what share the corresponding quarterly data of the indicator have in the yearly value. Missing values, in our case quarterly data, are evaluated under the assumption that the data have a structure similar to the quarterly data structure of the previous year.
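The correlation screening step might be sketched as follows. The sketch assumes that the Pearson correlation is meant and that the threshold is applied to its absolute value; both are assumptions, as are the function names and the toy series.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_indicators(indicators, gdp, threshold=0.2):
    """Keep indicators whose |correlation| with GDP reaches the threshold.

    Using the absolute value is an assumption of this sketch.
    """
    return {name: series for name, series in indicators.items()
            if abs(pearson(series, gdp)) >= threshold}


# Hypothetical toy data: "a" moves with GDP, "b" is uncorrelated with it.
gdp = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {"a": [2.0, 4.0, 6.0, 8.0, 10.0], "b": [1.0, -1.0, 0.0, -1.0, 1.0]}
kept = screen_indicators(candidates, gdp)
```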
Seasonal adjustment of all time series has been performed using the TRAMO/SEATS method [9].
Outliers, such as additive outliers, level shifts and transitory changes, have been analysed. The number of outliers in each time series should not exceed 5%. However, if the outliers during some period have an economic interpretation, the time series is kept in the collection. Time series with too many unexplained outliers may influence the modelling and the final results; such time series are eliminated from the analysis. All time series, except for the price indexes expressed in rates, have been transformed by taking the logarithm and the first difference. For the price indexes only the first difference has been taken. For the normalization step the standardization procedure has been applied.
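The transformation and normalization steps can be sketched as below. The sketch assumes that standardization means subtracting the mean and dividing by the sample standard deviation (denominator T − 1); the paper does not state which variance estimator was used, and all names are hypothetical.

```python
import math


def log_difference(series):
    """First difference of the logarithm (for series other than price indexes)."""
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]


def first_difference(series):
    """Plain first difference (used here for price indexes expressed in rates)."""
    return [b - a for a, b in zip(series, series[1:])]


def standardize(series):
    """Zero mean, unit sample standard deviation (an assumption of this sketch)."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / (n - 1))
    return [(v - mean) / sd for v in series]


# Hypothetical price index: differences are 1, 2, 3.
z_price = standardize(first_difference([1.0, 2.0, 4.0, 7.0]))
```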

Selection weighting method and aggregation
In this subsection we describe the procedure for the evaluation of the weights $W$: a factor analysis model is constructed; factors are extracted using principal component analysis; the weights are evaluated from the rotated factor loading matrix using the Nicoletti method [13]; and the individual indicators are aggregated using these weights. The choice of indicator weights is one of the steps in compiling a CI and it directly influences the aggregation step. Different methods produce different views of the CI. The mathematical problem is to set weights that reflect both the theoretical framework and the properties of the statistical data.

Model construction
We assume that all indicators $X_i(t)$, $i = 1, \dots, m$, $m \le n$, $t = 1, \dots, T$, in (1.1) have weights $w_i$, which may be static or vary in time. Let us investigate the simpler case and suppose that all indicators have static weights $w_i$.
For the evaluation of $W$ the factor analysis (FA) model has been chosen [1]. Here FA is used only for the identification of $W$; the factors and their meaning are not analysed in this paper. Consider the linear FA model
$$X = LF + E, \tag{2.1}$$
where $X \sim N(0, 1)$, $L$ is the loading matrix, $F$ is the matrix of latent factors and $E$ denotes the idiosyncratic errors. We formulate the following assumptions of the FA model: (i) the latent factors $F_j$ are uncorrelated and have variance $DF_j = 1$; (ii) the errors $E_i$ are uncorrelated and $DE_i = \tau_i$; (iii) the factors $F_j$ and the errors $E_i$ are uncorrelated, where $j = 1, \dots, p$.
In the FA model the variance of every indicator $X_i$ is separated into two parts: the variance determined by the common factors (the communality $h_i^2$) and the part which cannot be explained by the factors ($\tau_i$). Since we operate on time series $X_i(t)$, the following explanations are introduced.
(a) If the idiosyncratic errors $E_i(t)$ are cross-sectionally independent and i.i.d. over time, then (2.1) is a classical FA model [2].
(b) In classical FA the number of units (in our case $m$) is fixed and the number of observations (here $T$) tends to infinity [3].
We accept both assumptions, although for economic data they are not fully satisfied, and suppose that $X$ satisfies the classical FA requirements. On that ground we assume that the statistical tests used in FA are suitable for $X$. In the paper some tests verifying the suitability of the data for FA have been used. Bartlett's test of sphericity verifies the hypothesis $H_0$: $R$ is a unit matrix (all indicators are uncorrelated), where $R$ is the correlation matrix. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy verifies whether the data are suitable for the FA model:
$$\mathrm{KMO} = \frac{\sum_{i \ne j} r_{ij}^2}{\sum_{i \ne j} r_{ij}^2 + \sum_{i \ne j} \tilde r_{ij}^2},$$
where $r_{ij}$ is the correlation coefficient and $\tilde r_{ij}$ is the partial correlation coefficient. The KMO test has been applied several times, every time eliminating from the collection the series with the least value of
$$\mathrm{MSA}_i = \frac{\sum_{j \ne i} r_{ij}^2}{\sum_{j \ne i} r_{ij}^2 + \sum_{j \ne i} \tilde r_{ij}^2}.$$
This procedure has been repeated until the KMO value is sufficiently large (KMO > 0.8 means that the data collection fits the FA model well).
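The KMO-based elimination procedure could be sketched as follows. The sketch uses the standard definition of KMO with squared correlations and squared partial correlations, obtains the partial correlations from the inverse of the correlation matrix, and assumes NumPy is available. The function names and the stopping rule for very small collections are hypothetical.

```python
import numpy as np


def kmo_and_msa(R):
    """Overall KMO and per-variable MSA for a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)             # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)     # mask of off-diagonal entries
    r2 = np.where(off, R ** 2, 0.0)
    a2 = np.where(off, partial ** 2, 0.0)
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + a2.sum(axis=0))
    kmo = r2.sum() / (r2.sum() + a2.sum())
    return float(kmo), msa


def prune_until_kmo(R, names, target=0.8):
    """Drop the variable with the smallest MSA until KMO exceeds the target."""
    R = np.asarray(R, dtype=float)
    names = list(names)
    kmo, msa = kmo_and_msa(R)
    while kmo <= target and len(names) > 2:   # keep at least two variables
        worst = int(np.argmin(msa))
        keep = [k for k in range(len(names)) if k != worst]
        R = R[np.ix_(keep, keep)]
        names = [names[k] for k in keep]
        kmo, msa = kmo_and_msa(R)
    return names, kmo


# Hypothetical equicorrelated example (all pairwise correlations 0.5).
R_demo = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
kmo_demo, msa_demo = kmo_and_msa(R_demo)
```

In practice the loop would be run on the correlation matrix of the standardized indicators, and the resulting collection would then be checked with Bartlett's test.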

Factor extraction and rotation
Principal component analysis (PCA) has been chosen for the extraction of factors. In this case the factors $F_j$ are considered as normalized principal components. We search for linear combinations (principal components) $Y_1 = \sum_{j=1}^{m} \alpha_{1j} X_j, \dots, Y_m = \sum_{j=1}^{m} \alpha_{mj} X_j$ with the constraints
$$\sum_{j=1}^{m} \alpha_{ij}^2 = 1, \qquad \operatorname{cov}(Y_i, Y_k) = 0, \quad i \ne k, \qquad DY_1 \ge \dots \ge DY_m.$$
Solving the PCA problem gives estimates $a_{ij}$ of the coefficients $\alpha_{ij}$, $i, j = 1, \dots, m$, and estimates of the $m$ principal components
$$\hat Y_i = \sum_{j=1}^{m} a_{ij} X_j, \quad i = 1, \dots, m. \tag{2.2}$$
Assumption 1. Only the first $p$ components $Y_1, \dots, Y_p$ are kept, which explain no less than 95 per cent of the total variance. Accordingly, the number of factors is $p$. The primary indicators are expressed through the principal components as $X_i = \sum_{j=1}^{m} \alpha_{ji} Y_j$, $i = 1, \dots, m$. Given the correlation matrix $R_{XX}$, the necessary estimates are evaluated. In this work estimates of the common factors are not used, therefore we do not dwell on the evaluation of factors. Estimates of the loadings $l_{ij}$ and communalities $h_i^2$ are given by
$$\hat l_{ij} = a_{ji}\sqrt{s^2(\hat Y_j)}, \qquad \hat h_i^2 = \sum_{j=1}^{p} \hat l_{ij}^2, \tag{2.3}$$
where $s^2(\hat Y_j)$ is the estimate of the variance $DY_j$ of the $j$-th component. A set of indicators $X$ is fixed for which the factors explain every $X_i$ well. We accept the requirement that all $\hat h_i^2$ should be greater than 0.6 or that the mean level of the communalities should be at least 0.7 [11]. If the mean value is less than 0.7, the FA model (2.1) should be applied several times, every time eliminating the indicator $X_i$ whose communality estimate is the smallest in the set. The procedure is repeated until the mean value is 0.7 or greater.
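A minimal sketch of the extraction step is given below. It assumes that PCA is performed on the correlation matrix of the standardized indicators, so that loadings are eigenvectors scaled by the square roots of the eigenvalues and communalities are row sums of squared loadings. NumPy is assumed to be available; the function name and the `var_share` parameter are hypothetical.

```python
import numpy as np


def pca_loadings(R, var_share=0.95):
    """Loadings and communalities of the first p principal components.

    p is the smallest number of components whose cumulative share of the
    total variance reaches var_share.
    """
    R = np.asarray(R, dtype=float)
    eigval, eigvec = np.linalg.eigh(R)        # eigh returns ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    cumulative = np.cumsum(eigval) / eigval.sum()
    p = min(int(np.searchsorted(cumulative, var_share)) + 1, len(eigval))
    loadings = eigvec[:, :p] * np.sqrt(eigval[:p])
    communalities = (loadings ** 2).sum(axis=1)
    return loadings, communalities, p


# Hypothetical two-indicator example with correlation 0.8.
R_demo = np.array([[1.0, 0.8], [0.8, 1.0]])
L_one, h2_one, p_one = pca_loadings(R_demo, var_share=0.85)
L_two, h2_two, p_two = pca_loadings(R_demo, var_share=0.95)
```

The signs of the loading columns are arbitrary, because an eigenvector is determined only up to sign; the communalities are not affected by this.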
After the application of the KMO test and the analysis of communalities, the set of indicators has been reduced, $R^n \to R^m$. The resulting loading matrix $L$ (2.3) is not unique, that is, there exists a set of matrices which fulfil the assumptions of the FA model. The theoretical problem arises of providing a simple structure for the loading matrix $L$ [17]. This structure means that each variable loads onto one and only one factor. In practical cases, however, researchers try to find a structure close to a simple structure. One of the approaches to this objective is the factor rotation method (RM) [10]. The problem of the rotation procedure is the identification of a complexity function, subject to different constraints for orthogonal and oblique rotation. Let $J(r)$ be the class of all linear functions of order $r$ and let $G \in J(r)$. Then $F \xrightarrow{G \in J(r)} F^*$, where $F^*$ is the factor matrix and $L^*$ is the loading matrix after the RM, which can be orthogonal or oblique, has been applied. After an orthogonal rotation the factors remain orthogonal (uncorrelated). Oblique rotation methods are more general; they have been included because researchers in social-economic fields very often allow some correlation between factors [5]. After an oblique rotation the second axis can take any position in the factor space, hence the factors may be correlated.
As highlighted at the beginning of the paper, the RM may have an impact on the resulting weights of the CI. Some aspects of the methods used in the paper are described below. Consider factors $j = 1, \dots, p$ and variables $i = 1, \dots, m$, and let us describe the factor complexity (FC) of each variable $i$ by
$$\mathrm{FC}_i = \frac{1}{p}\sum_{j=1}^{p}\left(l_{ij}^2 - \overline{l_i^2}\right)^2,$$
where $l_{ij}^2$ is the squared loading of the $j$-th factor on the $i$-th variable, $\overline{l_i^2}$ is the mean of the squared loadings in the $i$-th row and $p$ is the number of columns of the factor matrix. After this transformation the Quartimax (QU) measure is
$$\mathrm{QU} = \sum_{i=1}^{m}\sum_{j=1}^{p} l_{ij}^4.$$
The Quartimax method tries to maximize the large loadings of each variable; the procedure runs through the rows.
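The popular Varimax (VA) method searches for a simple structure of the matrix $L$ in such a way that the total variance of the squared loadings within the factors is as large as possible. The maximized simplicity measure is defined as
$$V = \sum_{j=1}^{p}\left[\frac{1}{m}\sum_{i=1}^{m} l_{ij}^4 - \left(\frac{1}{m}\sum_{i=1}^{m} l_{ij}^2\right)^2\right],$$
where $l_{ij}^2$ is the squared loading of the $i$-th variable on the $j$-th factor. Direct Oblimin (DO) is a derived method which uses the elements of the primary factor matrix; it minimizes
$$\mathrm{DO} = \sum_{j<k}\left[\sum_{i=1}^{m} l_{ij}^2 l_{ik}^2 - \frac{\delta}{m}\sum_{i=1}^{m} l_{ij}^2 \sum_{i=1}^{m} l_{ik}^2\right],$$
where $\delta$ controls the level of obliqueness of the rotation. The Promax (PR) method first applies the Varimax rotation; then the resulting axes are rotated to oblique positions [8]. The matrix $P = (p_{ij})_{n \times r}$ is calculated as
$$p_{ij} = \frac{|l_{ij}|^{\kappa + 1}}{l_{ij}},$$
where $L^* = \{l_{ij}\}$ is the orthogonally rotated matrix and $\kappa$ is the power of the PR rotation.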
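An orthogonal rotation of the Varimax or Quartimax type can be sketched with a commonly used SVD-based iteration for the orthomax family. This is an illustrative implementation, not necessarily the algorithm of the statistical software used for the results of the paper; NumPy is assumed to be available, and the function names, the `gamma` parameter (1 for Varimax, 0 for Quartimax) and the toy loading matrix are hypothetical.

```python
import numpy as np


def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    sq = np.asarray(L, dtype=float) ** 2
    return float(((sq ** 2).mean(axis=0) - sq.mean(axis=0) ** 2).sum())


def orthomax(L, gamma=1.0, max_iter=200, tol=1e-10):
    """Orthogonal rotation: gamma=1 gives Varimax, gamma=0 gives Quartimax."""
    L = np.asarray(L, dtype=float)
    m, p = L.shape
    T = np.eye(p)                      # accumulated orthogonal rotation
    previous = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # Gradient-like matrix of the orthomax criterion at the current rotation.
        G = L.T @ (Lr ** 3 - (gamma / m) * Lr @ np.diag((Lr ** 2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                     # nearest orthogonal matrix to G
        current = s.sum()
        if current < previous * (1.0 + tol):
            break
        previous = current
    return L @ T, T


# Hypothetical example: a simple structure hidden by a 30-degree rotation.
theta = np.pi / 6
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
L_simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]])
L_mixed = L_simple @ mix
L_rotated, T_demo = orthomax(L_mixed, gamma=1.0)
```

An orthogonal rotation leaves the communalities (row sums of squared loadings) unchanged, which is why the rotation affects only how the explained variance is distributed among the factors.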

Indicators' weights evaluation and aggregation
Now we have the rotated loading matrix $L^*$, which has been obtained using some RM. Usually after rotation we do not obtain a loading matrix with a simple structure, therefore a theoretical method is necessary for constructing weights from the rotated matrix $L^*$. After an analysis of the literature on the evaluation of weights, in our opinion the most appropriate method is the one proposed by Nicoletti, which groups the indicators with the highest loadings into intermediate composites ($IC_k$, $k = 1, \dots, p$) [13]. In general this method gives the highest weights to the indicators which correlate strongly with the corresponding factor.
The algorithm for evaluating the CI weights operates on $l_{ij}^2$, the squared loadings after the application of the RM, and produces the vector $W$ used in (1.1). For the aggregation of the indicators into the CI a linear technique has been used:
$$\mathrm{CI}(t) = \sum_{i=1}^{m} w_i X_i(t). \tag{2.4}$$
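Since the steps of the weighting algorithm are not reproduced in this text, the sketch below follows one common reading of Nicoletti-type weighting: squared loadings are scaled by the variance explained by their factor, each indicator is assigned to the factor on which it loads highest, the intermediate composites are weighted by their share of explained variance, and the weights are rescaled to sum to one. This reading, the function names and the toy numbers are assumptions and may differ from the algorithm actually used by the authors. The second function illustrates the linear aggregation of the indicators with given weights.

```python
def nicoletti_style_weights(rotated_loadings):
    """Indicator weights from a rotated loading matrix (list of rows)."""
    sq = [[v * v for v in row] for row in rotated_loadings]
    p = len(sq[0])
    explained = [sum(row[j] for row in sq) for j in range(p)]  # per factor
    total = sum(explained)
    raw = []
    for row in sq:
        j_star = max(range(p), key=lambda j: row[j])   # dominant factor
        within = row[j_star] / explained[j_star]       # weight inside composite
        factor_share = explained[j_star] / total       # weight of the composite
        raw.append(within * factor_share)
    scale = sum(raw)
    return [w / scale for w in raw]                    # weights sum to one


def aggregate(X, weights):
    """Linear aggregation: X[t][i] is indicator i at time t."""
    return [sum(w * x for w, x in zip(weights, row)) for row in X]


# Hypothetical rotated loadings for three indicators and two factors.
L_star = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7]]
w_demo = nicoletti_style_weights(L_star)
ci_demo = aggregate([[1.0, 1.0, 1.0], [0.0, 0.0, 2.0]], w_demo)
```

Under this reading the weight of an indicator is proportional to its largest squared loading, so the rotation influences the weights only through the way it redistributes squared loadings among the factors.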

Methodology Application
In this section we set out the main results of the modelling and analysis of the CI. All results have been obtained using the methodological guidelines described in Section 2. Statistical data from Statistics Lithuania [20] have been used for the analysis. We wanted to select time series that are as long as possible. Most of the time series are available from 1998 and the price indexes from 1995; in consideration of this fact the collection was built using the time period 1998-2009.

Primary data analysis
Different indicators ($n = 60$) from social-economic fields which notably correlate with GDP have been selected: population and social statistics, business statistics, foreign trade, transport and communication (11 indicators, i.e. the price indexes, have monthly periodicity; the rest have quarterly periodicity). The periodicity has been unified to obtain a data panel of quarterly periodicity only.
First, the correlation between $X$ and GDP has been tested; indicators with very small correlation (less than 0.2) have been eliminated from the collection. Missing values of two variables have been imputed. The variable Employees, thous. has missing values in the I and III quarters of 1998-2001 (because of a feature of the methodology in 1998-2001). A similar variable, Employees of National accounts, has been found. The correlation between these indicators is $\rho = 0.99$. The missing values have been obtained using a model with the National accounts variable as a regressor.
The second variable, Goods carried by road transport, thous. tonnes, has quarterly data from 1999 and yearly data from 1995 [21]. An analysis of the yearly data for 1998 and 1999-2009 did not reveal a significant level shift. Therefore the missing values have been evaluated using the time series structure: the yearly data for 1998 have been disaggregated to quarterly periodicity using the average shares of the corresponding quarters of 1999-2001.
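The share-based disaggregation of a yearly value can be sketched as follows. The sketch assumes that the quarterly shares are first computed within each reference year and then averaged across the reference years; the function names and the numbers are hypothetical and are not the transport data of the paper.

```python
def average_quarter_shares(quarterly_by_year):
    """Average share of each quarter in the yearly total over reference years."""
    shares = [[q / sum(year) for q in year] for year in quarterly_by_year]
    n_years = len(shares)
    return [sum(s[k] for s in shares) / n_years for k in range(4)]


def disaggregate_year(annual_total, shares):
    """Split a yearly total into four quarterly values using the shares."""
    return [annual_total * s for s in shares]


# Hypothetical reference years with known quarterly data.
reference = [[10.0, 20.0, 30.0, 40.0], [20.0, 20.0, 30.0, 30.0]]
shares_demo = average_quarter_shares(reference)
quarters_demo = disaggregate_year(200.0, shares_demo)
```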
All series have been seasonally adjusted and an analysis of outliers has been performed. Then the time series have been transformed by taking the logarithm and the first difference (for price indexes only the difference) and standardized.

Model selection
Using statistical tests, the quality of the data and their suitability for FA have been verified. The KMO test has been applied to the data collection $X$ several times, every time eliminating the variable with the smallest value of $\mathrm{MSA}_i$. The procedure has been applied until the KMO value increased from 0.588 to 0.811. The hypothesis $H_0$ of Bartlett's test of sphericity has been rejected. The communalities of the remaining variables, which have been kept for further analysis, have been evaluated as $h_i^2 > 0.5$, with a mean value of 0.78. After model selection the number of indicators is $m = 28$; the indicators come from the following fields/subfields: population and social statistics (2 indicators), industry (4 ind.), construction (4 ind.), domestic trade (6 ind.), foreign trade (4 ind.), services (3 ind.), price indexes (5 ind.).

Factor extraction and rotation
At the end of the process six factors have been extracted, which explain 77.95 per cent of the total variance. The loading matrix $L$ has been evaluated and, as usual, a simple structure was not obtained. Different rotation methods have been applied in order to bring $L$ closer to a simple structure; in this way 45 models have been obtained. Let us number the models $z = 1, \dots, 45$ according to the RM used: $z = 1$ uses the VA method and $z = 2$ uses the QU method. For the oblique methods the values of the coefficients $\kappa$ and $\delta$ have been chosen using varying steps $\beta_\kappa$ and $\beta_\delta$. Models $z = 3, \dots, 22$ are obtained using DO with coefficients $\delta_1 = 0$, $\delta_2 = -0.5$ and, in general, $\delta_z = \delta_{z-1} + \beta_\delta$, where the step varies, $\beta_\delta = -0.5; -1; -10$. Accordingly, models $z = 23, \dots, 45$ are obtained using PR with $\kappa_1 = 1$ and, in general, $\kappa_z = \kappa_{z-1} + \beta_\kappa$, where the step varies, $\beta_\kappa = 0.5; 1; 10$.

Analysis of indicators' weights
After applying the weighting algorithm to each rotated loading matrix $L^*_z$, the indicator weights $w_{i,z}$ have been obtained. As an example, two figures present the weights of the indicators $X_i$ obtained using different RMs. The weights of one of the indicators make up from 1 to 4 per cent of the total sum ($\sum w_i = 1$). Accordingly, the weights of the indicator CONS (Construction work carried out at current prices) vary from 4 to 8 per cent of the total sum.
The analysis of all indicator weights $w_{i,z}$ showed that the difference between the minimum and maximum values of a weight across the models varies from 0.01 to 0.04, with a few exceptions reaching 0.06-0.07 (PR(40)-PR(100)). On the other hand, the analysis of models DO(-1)-DO(10) and PR(3)-PR(11) showed that these methods do not have a significant influence on the indicator weights (they differ marginally). In summary, it is difficult to single out a method which would give a statistically optimal weighting result. We may say that some rotation models have an impact on the weights in the sense that the weights differ from the average value.

Results analysis
After the evaluation of the weights $w_{i,z}$ we return to the extracted data collection of indicators $X_i$, $i = 1, \dots, 28$. We take the seasonally adjusted and standardized data. As highlighted in (2.2), we assume that the time series $X_i(t)$ has its weight $w_i$, which is stable in time. Now we apply the linear aggregation (2.4) to $X_i(t)$ using $w_{i,z}$ and obtain 45 different $\mathrm{CI}_z(t)$; we also apply the aggregation using equal weights for all indicators and obtain $\mathrm{CI}_{46}(t)$. We abbreviate this method as eq. For the analysis all CIs and GDP have been transformed by adding 100 in order to make the analysis clearer and to avoid negative values.
Most of the CIs and GDP show similar tendencies, except during the complicated periods 1998-2001 and 2008(IV)-2009, when the tendencies change and a discrepancy arises between the CIs and GDP. To demonstrate this fact we have chosen a few CIs as an example (Fig. 3). In general the simulated CIs show that the decline of the economy in 1998-2001 was not as deep, and the decline in 2008(IV)-2009 was deeper, than the GDP statistical data indicate. The differences between the CIs and the GDP index show that during crisis periods the differences (or gaps) are larger than in stable or growing periods.
Thus during the mentioned periods unobserved elements/factors had some influence which the simulated CIs could not capture. On the other hand, there is a presumption that the GDP data did not reflect the exact state of the economy and that other indicators, such as the CI, should be involved in the analysis of the situation. For a more specific analysis the differences (per cent) between the simulated CIs and GDP have been evaluated (Fig. 4). During 2006-2008(III) the differences varied from −0.01 (CI(QU)) to 0.15 (CI(eq)) per cent. During the crisis period the values increased several times. The figure shows that the range of the differences is larger in the period of economic decline than in the period of growth. CI(eq) in particular stands out from the rest. This implies that the weights of the individual indicators $X$, and therefore the different rotation methods, have an impact on the construction of the CI during unstable periods of the economy. Still, if we analyse the changes of GDP and of the different CIs compared to the corresponding period of the previous year, it is difficult to distinguish a significant difference between GDP and the CIs (Fig. 5). Only 2008(IV) and 2009 show a gap. In general, if we work with changes, the CI is not sensitive to the weights of the individual indicators, with the exception of a really deep decline such as the latest financial crisis.
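The two comparisons described above can be sketched as follows. The paper does not state the exact formulas, so the sketch assumes that the per cent difference is taken relative to GDP after both series have been shifted by 100, and that the change compared to the corresponding period of the previous year uses a lag of four quarters. All names and numbers are hypothetical.

```python
def shift_by_100(series):
    """Shift a standardized series by 100 to avoid negative values."""
    return [v + 100.0 for v in series]


def percent_gap(ci, gdp):
    """Difference between CI and GDP in per cent of GDP (an assumed definition)."""
    return [100.0 * (c - g) / g for c, g in zip(ci, gdp)]


def year_over_year(series, lag=4):
    """Per cent change against the same quarter of the previous year."""
    return [100.0 * (series[t] / series[t - lag] - 1.0)
            for t in range(lag, len(series))]


# Hypothetical quarterly values after the shift by 100.
ci_level = shift_by_100([1.0, 0.5, 0.0, -0.5, 2.0])
gdp_level = shift_by_100([0.0, 0.5, 0.0, -0.5, 1.0])
gap_demo = percent_gap(ci_level, gdp_level)
yoy_demo = year_over_year([100.0, 100.0, 100.0, 100.0, 110.0])
```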

Conclusions
In this paper a new methodology for the construction of composite indicators has been presented. The analysis has shown that the proposed CI can be used as an additional tool for the analysis of Lithuania's economy. Investigating the CI and GDP together may give a clearer and more general view of economic development. The CIs simulated using different rotation methods reflect the state of the economy better during stable and growing periods. Additional analysis is required to determine which factors (in general) have an impact on the economy during crisis periods.
Concerning rotation methods, we suggest using different methods and verifying the distribution of the weights of the individual indicators. We also suggest that the following rotation methods mentioned in the paper be used with care: QU, (DO(-50)-DO(-90)), (PR(40)-PR(100)). The selection among some models (DO(-1)-DO(10), PR(3)-PR(11)) has been found not to make a significant difference to the weights of the individual indicators.
The analysis of the changes of GDP and the CIs compared to the corresponding period of the previous year has indicated that the CIs are not sensitive to the impact of the weights, except during a deep decline of the economy.