REGIONAL HOUSE PRICE INDEX CONSTRUCTION – THE CASE OF SWEDEN

The academic literature on the construction of regional house price indexes usually uses geographic areas whose boundaries are administratively drawn. However, such administrative regions might not be optimal for the construction of regional price indexes. When producing house price indexes, we often encounter the problem of an insufficient number of observations. One way to remedy this problem is to estimate a quarterly index instead of a monthly index. Another way to mitigate the thin markets problem is to construct indexes for geographically aggregated regions. However, the literature that discusses methods of dealing with thin markets, and especially geographical aggregation, is scarce. The goal of this paper is to construct a house price index for a major part of Sweden, and to construct price index series for a number of regions. The number of regions, and how their boundaries should be drawn in order to construct reliable regional price indexes, is however an open question. We apply traditional hedonic methodology to estimate house price indexes both for predefined regions, whose boundaries are based on a division of labor markets in Sweden, and for regions created by statistical cluster analysis. The results suggest that regions should be clustered together using regional price levels and/or price development as clustering variables. If only geographical proximity is used as the clustering variable, our computations show a high risk of ending up with some clusters that have large standard errors, which in turn might result in inaccurate indexes.


INTRODUCTION
When house price indexes are constructed by estimating hedonic price equations, having access to a large number of transactions is typically essential. Besides transaction prices, a rich set of attributes is needed in order to construct reliable house price indexes. In many cases, however, the number of available transactions on the housing market is smaller than desirable, which in turn makes it difficult for index constructors to, for instance, carry out statistical inference and economic analysis. The problem of how to construct price indexes when we face thin markets is therefore important to investigate. The literature on house price index construction with small sample sizes is not large: Schwann (1998), McMillen (2003), Francke and Vos (2004), and Francke (2010) are examples of recent articles dealing with the problem of index construction in a thin markets environment.
One way to reduce the problem of thin markets is to aggregate smaller housing markets into larger housing markets, but it is not obvious how this aggregation should be carried out. In many cases there has been an arbitrary pooling of data across geography. Aggregating geographically adjacent areas may not be the best way to construct a large housing market, since different areas within a housing market can exhibit price evolutions that differ considerably. For instance, some housing markets have very distinct sub-markets. Furthermore, estimating a single price index for a whole region might not be a good solution, simply because such a method is based on the implicit assumption that the aggregated price index has statistical properties similar to those of all the individual indexes in the sub-markets.
An interesting approach to create aggregated indexes is to combine housing markets that exhibit similar house price developments. Although this may seem like a good solution, there still is a problem of how to define what constitutes similar house price developments, and how to compare different regional housing markets.
In order to reduce the problem of thin markets, we present in this paper different methods to aggregate housing markets into larger clusters using cluster analysis. Different clustering methods yield different regions, and hence different sets of price indexes. We therefore apply the Root Mean Squared Error (RMSE) as the out-of-sample measure in order to evaluate the different price indexes.
The paper is organized as follows: a brief literature review is presented in the next section. Section 3 describes the methodology used. The empirical analysis is presented in Section 4. Section 5 concludes the paper.

LITERATURE REVIEW
As mentioned above, the literature that discusses house price index construction and the problem of thin markets is quite small. Schwann (1998) was one of the first to present a method for tackling the problem. He defined thin markets as those that have fewer than 30 sales per period (in his case per quarter). In order to estimate local house price indexes, temporal aggregation may be considered as a tool to increase the reliability and accuracy of the index. Temporal aggregation is, however, not a practical or recommended solution, as Englund et al. (1999) have shown. Schwann (1998) instead proposed a method where earlier observations are added to current transactions, thereby increasing the number of observations. He used a data set from Vancouver covering 1979 to 1992 with more than 60 000 observations. In order to evaluate his index method, he designed an experiment where he used smaller and smaller samples. In that way he could compare his method with the "true" price index based on all observations. As evaluation measures he used the root mean squared error (RMSE), the average standard error, the number of periods outside the confidence interval, and the number of turning points correctly identified. His conclusion is that the method performs much better than a traditional hedonic price index.
Englund et al. (1999) analyzed whether temporal aggregation can be used in order to calculate local house price indexes. They concluded that time intervals should be as short as possible; that is, temporal disaggregation is considered to be most important.
McMillen (2003) used locally weighted regressions in order to estimate reliable and accurate price indexes in submarkets that have very few transactions. The method is based on a Fourier expansion and allowed him to estimate smooth house price indexes for 851 census tracts in the metropolitan area of Chicago. The main idea behind the method is to set up a regression model where observations from outside a census tract are also used. Observations far away are down-weighted and more weight is placed on transactions within the census tract. He used a data set consisting of almost 28 000 observations, repeat sales in his case, over a six-year period at the beginning of the 1990s.
Francke and Vos (2004) used a so-called hierarchical trend model (HTM) that is an extension of Schwann (1998). The HTM allows the parameters to vary in space, time, and house type, which makes it possible to estimate a price index for each segment of the market. One of the disadvantages of the HTM is that assumptions need to be made about the general trend and the trend levels for sub-markets. They used about 30 000 observations over the period 1985 to 1999 in Amsterdam and 21 000 in Breda, Holland. They evaluated the models by studying the standard deviation of each model. The results indicated that for small local housing markets, the HTM seemed to be more accurate. A more recent article is Francke (2010), who applied the repeat sales method to estimate indexes for thin markets.
Costello et al. (2009) and Goh et al. (2012) compared different methods where the objective was to find the most accurate and robust price index in highly localized markets at frequent time intervals. They used the Mean Square Error (MSE) as the out-of-sample measure in order to evaluate their different price indexes. 75% of the observations were used to estimate the different models and the remaining 25% were saved to evaluate the performance. They used a data set of more than 500 000 observations from 1988 to 2005 in the city of Perth in Australia. The area is divided into 299 suburbs. They concluded that aggregation (both temporal and geographical) can be problematic and should not be done arbitrarily. If it is done, the so-called hedonic imputation method shows better performance than the other methods used (longitudinal hedonic, repeat-sales method, hybrid and median approach).
What conclusions can be drawn from the literature? Different methods have been proposed in order to reduce the problem of thin markets. However, the proposed methods typically rely on different ways of making temporal and geographical aggregations. Such methods seem to come at a cost: they reduce the accuracy and reliability of house price indexes. In other words, there is an important trade-off between estimating indexes on aggregated regions in order to mitigate thin market problems, and estimating price indexes on disaggregated levels in order to avoid problems of accuracy and reliability.
Below we construct regional price indexes based on geographical aggregation. However, we do not do this in an arbitrary way; instead we test different methods based on cluster analysis. In this analysis, we consider not only aggregating nearby areas, but also other ways to segment the regions, for instance by using price development and mean prices as clustering variables.

METHOD
The estimation of regional price index series is done for two types of geographical divisions of regions. First, we estimate price indexes for already existing regions. The Swedish Agency for Economic and Regional Growth (Tillväxtverket) has created these regions, whose boundaries reflect functional labor market regions. Some regions are further divided into sub-regions. We use these sub-regions when possible in this paper. Henceforth the regions are referred to as "FA-(sub)regions" or simply "regions". We could have used single municipalities as the smallest geographical unit in this first step, but the number of observations would not be sufficient to estimate hedonic price equations in most of the 290 municipalities in Sweden.
Secondly, we estimate price indexes for clusters of regions that we create with cluster analysis. The price index estimation and the subsequent evaluation procedure therefore involve a large number of steps. In order to help the reader follow our procedures, we first give a short overall summary of the main steps of the estimation procedure. Thereafter we explain the estimation procedure in more detail. We also present some key numbers.
Our analysis is based on the following main steps:
-Initial step (step 0): Collection of data on single-family house transactions and on FA-(sub)regions.
-Step 1: Estimation of annual hedonic price index series for each region, and preparation of a dataset with descriptive statistics for the regions, based on a sample of 90% of the data.
-Step 2: Application of cluster analysis in order to create different sets of homogenous groups of housing sub-markets (the clustered regions).
-Step 3: Estimation of hedonic price index series for the different sets of clustered regions, and comparison and performance evaluation of the different price index series.
Below we explain in more detail the different sub-procedures involved in the steps above.
Step 0: Collection of data on single-family house transactions and labor market regions
The data on single-family house transactions comes from a unique database provided by Valueguard Index Sweden AB. The database contains about 70% of all house sales in Sweden from 2005 to 2010. The database has been constructed by merging data from real estate agents and the official property register. In total, 209 126 observations are included in this dataset. For each transaction, the following variables are observed: transaction price, contract date, a number of quality and size variables (living area, number of rooms, lot area, semi-detached, detached, quality index, building year), and a number of location variables (X- and Y-coordinates, sea front, sea view, value areas for taxation purposes, urban, municipality).
As mentioned above, the functional labor market regions for which we estimate price index series are based on the Swedish Agency for Economic and Regional Growth's FA-subregions. Sweden is divided into 72 FA-regions, and some FA-regions are further divided into a number of FA-subregions. The total number of FA-regions and FA-subregions amounts to 93. The FA-regions have been constructed for analytic purposes. The idea is that each region shares the same labor market. Many regions consist of a city and its surrounding areas.
However, some regions do not have a sufficient number of transactions to make the estimation of (yearly) price index series based on regression analysis possible. We define the criterion for determining whether an FA-(sub)region contains enough transactions as follows: the number of house sales per FA-(sub)region and year must be at least 83 (at least 500 observations over six years). This cut-off criterion is slightly higher than that in e.g. Geltner (1997). Given this criterion, 66 of the 93 regions are considered to have enough transactions. This means that 27 FA-(sub)regions, representing 2.2% of the transactions, are not included in the following analysis.
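The inclusion rule can be sketched as a simple filter. The region names and sales counts below are made up for illustration; only the threshold of 500 sales over the six years (about 83 per year) comes from the text.

```python
# Minimal sketch of the inclusion rule for FA-(sub)regions.
# Region names and sales counts are hypothetical; only the threshold
# (at least 500 sales over 2005-2010, about 83 per year) is from the text.
MIN_SALES_TOTAL = 500
N_YEARS = 6


def has_enough_sales(total_sales: int) -> bool:
    """Return True if the region meets the minimum-sales criterion."""
    return total_sales >= MIN_SALES_TOTAL


sales_by_region = {"region_a": 12400, "region_b": 510, "region_c": 312}
included = sorted(r for r, n in sales_by_region.items() if has_enough_sales(n))
min_per_year = MIN_SALES_TOTAL / N_YEARS  # roughly 83.3 sales per year
```

In this toy input, `region_c` falls below the threshold and would be left out of the regional index estimation.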
Step 1: Estimation of annual hedonic price index for each labor market region
In this step we estimate yearly hedonic house price indexes for each of the 66 FA-(sub)regions that are considered to have a sufficient number of observations.
A hedonic equation is a regression of prices on time and on the attributes that determine these prices. The regression coefficients are interpreted as estimates of the implicit (hedonic) prices of these attributes and, hence, of the willingness-to-pay for the attribute in question (see Rosen, 1974). The method has a long tradition. Recent articles are, for example, Song and Wilhelmsson (2010) and Ceccato and Wilhelmsson (2011). Following the literature, the hedonic price equation regresses the transaction price on the explanatory variables and a set of time dummies, where: Y denotes the dependent variable, the transaction price (normally in log form); β1 is a vector of parameters (regression coefficients) associated with the exogenous explanatory variables, X. The stochastic term e is assumed to have a constant variance and to be normally distributed. Usually we implicitly assume that all relevant attributes are included in the matrix X: in other words, no omitted variable bias problem exists. We can decompose X into, for example, structural apartment and property attributes, as well as neighborhood attributes. The variable TD with subscript t is a dummy variable for each period and equals one for period t and zero otherwise. The number of observations is denoted by N, and T denotes the number of time periods.
The two major approaches to measuring hedonic price indexes are the time dummy approach and the so-called hedonic imputation approach (see Diewert et al., 2009). Song and Wilhelmsson (2010) is an example of the former and Gouriéroux and Laferrère (2009) is an example of the latter. Here we utilize a time dummy approach. The main difference between the methods is that hedonic imputation allows all estimated parameters to change over time, while the time dummy method assumes that the parameters are constant over time. One way to overcome the problem of unstable parameters over time is to use moving window regression as in Song and Wilhelmsson (2010). In their article a hedonic time dummy approach is compared to a moving window time dummy approach with a window span of one year. Their conclusion is that there is no difference in the estimated parameters concerning the time dummies. However, the general conclusion seems to be that the hedonic imputation method is preferable if the parameters are unstable over time (see e.g. Berndt and Rappaport, 2001, and Pakes, 2003, besides the article referred to above). Diewert et al. (2009) conclude with the following statement: "favor HI [hedonic imputation] methods unless degrees of freedom are very limited". In our case the degrees of freedom are very limited. Our overall objective is to estimate hedonic price indexes on markets that are very thin. Consequently, the hedonic imputation method is not an approach that can be utilized. The main advantages of the time dummy approach are that degrees of freedom are preserved and that the method minimizes the influence of outliers (see Diewert et al., 2009). Hence, the hedonic time dummy approach is used in this study.
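For reference, a time dummy specification consistent with the definitions above can be written as follows. This is a sketch under notational assumptions that are not taken from the original: the intercept $\beta_0$, the time dummy coefficients $\lambda_t$, the observation subscript $i$, and the choice of the first period as the base period.

```latex
\ln Y_i = \beta_0 + X_i \beta_1 + \sum_{t=2}^{T} \lambda_t \, TD_{t,i} + e_i,
\qquad i = 1, \dots, N

I_1 = 100, \qquad I_t = 100 \exp(\hat{\lambda}_t), \quad t = 2, \dots, T
```

Under these assumptions, the second line shows a common way of reading the index off the estimated time dummy coefficients: each coefficient measures the log price difference relative to the base period, holding the attributes in X constant.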
Spatial dependency is a problem that is more or less always present in this type of hedonic model. In order to minimize the problem of spatial dependency, we include a number of variables such as submarket dummies, coordinates and distance to the city. Coordinates have earlier been used to reduce spatial dependency in e.g. Wilhelmsson (2009) and Galster et al. (2004).
However, it is an empirical question whether spatial dependency creates biases in the coefficients concerning the price index. Song and Wilhelmsson (2010), using the same data, found that it did not. We have also estimated a spatial autoregressive (SAR) model and a spatial error model (SEM) following Anselin (1988) and Wilhelmsson (2002). We use inverse distance as the spatial weight matrix.
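An inverse-distance spatial weight matrix can be sketched as below. Row-standardization (each row summing to one) is an assumption added here because it is common practice; the text does not state it. The coordinates are hypothetical, and coincident points (distance zero) are not handled in this sketch.

```python
import math


def inverse_distance_weights(coords):
    """Row-standardized inverse-distance weights with a zero diagonal.

    Row-standardization is an assumption of this sketch, not taken from the text.
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / math.dist(coords[i], coords[j])
        row_sum = sum(w[i])
        w[i] = [v / row_sum for v in w[i]]
    return w


# Hypothetical house locations in metres (not real data).
coords = [(0.0, 0.0), (1000.0, 0.0), (0.0, 2000.0)]
W = inverse_distance_weights(coords)
```

For the first house, the neighbor at 1 000 m receives twice the weight of the neighbor at 2 000 m.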
Since the out-of-sample forecast evaluation below requires some proportion of the observations to be saved, we set aside ten percent, randomly chosen, of the historical data for out-of-sample testing. In other words, the hedonic price index estimations are based on a random sample of 90% of the transactions (that is, 90% of 209 126 transactions from 2005 to 2010).
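The random 90/10 split can be illustrated with a short sketch. The function name, the fixed seed and the rounding of the holdout size are assumptions made for reproducibility of the example; only the shares and the total of 209 126 transactions come from the text.

```python
import random


def split_sample(n_obs, holdout_share=0.10, seed=0):
    """Randomly split observation indices into estimation and holdout sets."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    idx = list(range(n_obs))
    rng.shuffle(idx)
    n_holdout = round(n_obs * holdout_share)
    return sorted(idx[n_holdout:]), sorted(idx[:n_holdout])


estimation_idx, holdout_idx = split_sample(209_126)
```

With these settings roughly 188 000 transactions remain for estimation and roughly 21 000 are reserved for the out-of-sample test.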
The dependent variable is the transaction price based on contract dates. The explanatory variables consist of size variables (living area, secondary area, number of rooms, lot size), type of house variables (semi-detached, detached), and standard and location variables (quality index, building year, municipality, sea front, sea view, urban, and X- and Y-coordinates).
A dummy variable is created for each municipality. The quality index is defined by the tax authorities in order to appraise properties for taxation purposes. It is a composite of 25 questions concerning different quality aspects such as construction materials and amenities. Each question gives a number of points (2.5 points on average, but some questions can give as much as 11 points). One additional unit of quality can refer to very different things: for example, the existence of a car port or that the house has a new roof. Information on whether the single-family house is semi-detached or detached is included as an attribute. Based on the building year variable, we construct a number of dummy variables that reflect different building periods since the beginning of the twentieth century (see Song and Wilhelmsson, 2010, for further information).
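The time dummy approach used in this step can be illustrated with a toy regression. This is a minimal sketch, not the model estimated in the paper: it uses a single attribute (living area), three years, noise-free synthetic log prices, and a small hand-written least-squares solver named `ols` so that the example needs no external libraries. All numbers are hypothetical.

```python
import math


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


# Synthetic data: log price = 13.0 + 0.005 * area + year effect (2005 is the base).
true_effect = {2005: 0.0, 2006: 0.10, 2007: 0.20}
rows, log_price = [], []
for year, effect in true_effect.items():
    for area in (80.0, 100.0, 120.0, 150.0):
        # Columns: intercept, living area, dummy for 2006, dummy for 2007.
        rows.append([1.0, area, float(year == 2006), float(year == 2007)])
        log_price.append(13.0 + 0.005 * area + effect)

b = ols(rows, log_price)
index = {2005: 100.0,
         2006: 100.0 * math.exp(b[2]),
         2007: 100.0 * math.exp(b[3])}
```

Because the synthetic data contain no noise, the estimated time dummy coefficients reproduce the year effects, and the index is obtained by exponentiating them relative to the base year.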
Step 2: Identifying homogenous groups of housing sub-markets with cluster analysis
Cluster analysis has been used in earlier research as a tool for constructing housing sub-markets (see for example Wilhelmsson, 2004). Here we use it as a tool to aggregate smaller housing markets into larger homogenous housing markets, the clusters. The smaller housing markets in a specific cluster are supposed to share many characteristics, such as proximity to each other, price development, and/or price level.
Variables used in the cluster analysis. We have chosen to use a number of different variables, and combinations of these, as clustering variables.
Each clustering variable, and each combination of these, corresponds to a specific clustering method. The first cluster analysis method (C1) uses the average annual price development over time (2005-2010) in the 66 FA-(sub)regions. The average price changes are determined by the hedonic price index estimations conducted in step 1 above. The second method (C2) uses the mean price level over the period. Method three (C3) uses the price development pattern over time by using a structural break every second year. The fourth cluster analysis method (C4) uses the distance between FA-(sub)regions. The distances between the housing markets have been calculated from the average coordinates of the transactions in each housing market. Method number five (C5) uses both geographical proximity and price development. Finally, method number six (C6) combines all of the above clustering variables. As a reference, we also perform a seventh cluster analysis (C7) based on a variable containing random numbers. All cluster analyses weight the labor market regions by size, where size is measured by the number of transactions (observations).
Transformation, normalization and weighting of the variables. The size of the clustering variables naturally varies substantially between different housing submarkets. Furthermore, different types of cluster variables are used simultaneously in the cluster analyses below. In order to make the variables comparable, the following steps are taken.

Mean price
The variable mean price is first transformed to logarithmic price. This means that the difference between two regions with mean prices of 200 000 SEK and 400 000 SEK has the same importance as the difference between 2 MSEK and 4 MSEK. The logarithmic price is also standardized by its standard deviation, in order to make it comparable with other variables.
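The effect of the logarithmic transformation can be checked with a short sketch. The four price levels are the ones mentioned in the text, used here as a made-up sample; using the sample standard deviation (rather than the population version) is an assumption.

```python
import math
import statistics


def standardized_log(values):
    """Take logs and divide by the standard deviation of the logs."""
    logs = [math.log(v) for v in values]
    sd = statistics.stdev(logs)  # sample standard deviation (an assumption)
    return [x / sd for x in logs]


mean_prices = [200_000, 400_000, 2_000_000, 4_000_000]  # SEK, illustrative
z = standardized_log(mean_prices)
gap_low = z[1] - z[0]   # 200 000 versus 400 000 SEK
gap_high = z[3] - z[2]  # 2 MSEK versus 4 MSEK
```

Both gaps are equal, since each corresponds to a doubling of the mean price.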

Price development
The price development captures the average annual price changes. The price development is transformed to logarithmic scale and normalized by dividing it by its own standard deviation.

Price development pattern
Even if two regions exhibit the same total price development from 2005 to 2010, there might be large differences in the annual price developments. That is why we construct variables to measure the price development pattern during the period. The pattern is defined as four variables measuring the price development over two years, that is (price index year Z) / (price index year Z-2) for Z = 2007, 2008, 2009 and 2010. Each variable is transformed to logarithmic scale.
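The four pattern variables can be computed as in the following sketch. The annual index values are hypothetical and serve only to show the calculation.

```python
import math


def pattern_variables(index_by_year):
    """Log two-year price changes, log(I_Z / I_{Z-2}), for Z = 2007..2010."""
    return {z: math.log(index_by_year[z] / index_by_year[z - 2])
            for z in range(2007, 2011)}


# Hypothetical annual index values (2005 = 100), not estimates from the paper.
index_by_year = {2005: 100.0, 2006: 110.0, 2007: 121.0,
                 2008: 118.0, 2009: 120.0, 2010: 130.0}
pattern = pattern_variables(index_by_year)
```

Two regions with the same index value in 2010 can still differ in these four variables if their prices moved at different times.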
These variables are not individually standardized, in order to avoid a situation in which we underestimate the importance of periods with large price developments and overestimate the importance of stable periods, which would be counter-intuitive.

Coordinates
The coordinates are based on the Swedish RT90 system, which measures distance in meters. X-coordinates represent the north-south direction and Y-coordinates represent the east-west direction. Sweden is an oblong country, and the standard deviation of the X-coordinates is almost twice as big as the standard deviation of the Y-coordinates. If they were normalized with their own standard deviations, one meter in the north-south direction would have less bearing than one meter in the east-west direction. Because of this, we standardize the coordinates by dividing them by the same number when they are used together with other variables.
The coordinates are not normally distributed; instead they exhibit very fat tails. If we simply standardized them with their standard deviation, the coordinates would be more important than other variables in a cluster analysis.
In order to find a good weighting of the coordinates, we have tried different weights and analyzed the resulting outcome (maps). We have found that dividing the coordinates by 200 000 seems to give the coordinates a well-balanced importance in the cluster analysis. This could of course be tested further.
Note that there are two variables that measure the geographical position (the X- and Y-coordinates), which is important when coordinates are included in clustering models with other variables.
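The common scaling of both coordinates can be sketched as follows. The coordinate values are hypothetical RT90-style numbers in meters; only the divisor 200 000 comes from the text.

```python
import math

COORD_SCALE = 200_000  # divisor reported in the text (coordinates in meters)


def scale_coordinates(x_m, y_m):
    """Divide both coordinates by the same constant."""
    return x_m / COORD_SCALE, y_m / COORD_SCALE


# Hypothetical mean coordinates (meters) for three regions.
base = scale_coordinates(6_580_000, 1_628_000)
north = scale_coordinates(6_680_000, 1_628_000)  # 100 km further north
east = scale_coordinates(6_580_000, 1_728_000)   # 100 km further east

d_north = math.dist(base, north)
d_east = math.dist(base, east)
```

Because both coordinates share the same divisor, 100 km counts equally in either direction (0.5 units in the scaled space), which would not hold if each coordinate were divided by its own standard deviation.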

Relative weights between different types of variables
We also need to consider the relative weights between the variables because we use different number of variables for measuring similarity.
-Geographical location is measured by two variables, X and Y.
-Mean price is measured by one variable and is therefore multiplied by 2.
-Price development is measured by one variable and is therefore multiplied by 2.
-Price development pattern is measured with four variables. However, these are not standardized. They have rather low standard deviations and after tests, we find that multiplying all variables by 2 is most appropriate. However, further tests should be undertaken.
This weighting is only used when all clustering variables are used simultaneously.
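The weighting above can be read as the construction of one feature vector per region for the combined method. The sketch below is one possible reading: it assumes that the factor 2 is applied to each of the four pattern variables, and the function name and all input values are hypothetical.

```python
def combined_features(x_scaled, y_scaled, mean_price_z, price_dev_z, pattern_logs):
    """Feature vector when all clustering variables are used together.

    Assumes (one reading of the text) that mean price, price development
    and each of the four pattern variables are multiplied by 2.
    """
    weight = 2.0
    return ([x_scaled, y_scaled]
            + [weight * mean_price_z, weight * price_dev_z]
            + [weight * p for p in pattern_logs])


# Hypothetical, already transformed inputs for a single region.
features = combined_features(32.9, 8.14, 1.5, -0.5, [0.1, 0.0, -0.05, 0.2])
```

The resulting vector has eight elements: two for location, one each for mean price and price development, and four for the price development pattern.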

Weighting of regions in the cluster analysis.
The objective is to create clusters which have large enough number of observations needed for constructing reliable indices based on hedonic regression analysis. However, it may require several clusters to reproduce the different market characteristics among the many housing markets. But, segmenting the market in too many small clusters creates problems with thin markets. Thus, there is a trade-off between identifying large enough clusters and to identify as many clusters as needed to reflect the heterogeneity among the housing submarkets.
With the clustering procedure we are using, it is not possible to influence how small or how large the clusters are going to be, which can result in very small or very large clusters. For instance, an "outlier-region" with a low number of sold houses might become its own cluster. On the other hand, several large cities can be assigned to one very large cluster. In this case the statistical clustering should ideally allow a region with many observations to become more "viscous", so that such regions are not too easily assigned to "too large" clusters; that is to say, we would not like the three metropolitan areas (Stockholm, Göteborg and Malmö) to become one cluster. If we could assign weights to regions with many observations, we would solve this problem. Unfortunately, we cannot perform the weighting directly using a standard clustering procedure. Therefore, we have implemented a method in the cluster analysis where we create "copies" of the regions, where the number of copies is based on the number of sales in the region. Thus, each region is represented by n copies, where n is defined as the number of sales. As a result, the cluster analysis will first create "clusters" containing only the duplicate observations, because they have the same values on all cluster variables. This is equivalent to giving them a weight, and it is no longer likely that the largest cities will be put together in the same cluster. Smaller regions, on the other hand, will more easily be joined with other regions.
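The equivalence between copying regions and weighting them can be verified on a small example: the centroid of the replicated points equals the sales-weighted centroid of the original points. The two regions and their sales counts are hypothetical.

```python
def centroid(points):
    """Unweighted mean of a list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def weighted_centroid(points, weights):
    """Weighted mean of a list of points."""
    total = sum(weights)
    return tuple(sum(w * p[d] for p, w in zip(points, weights)) / total
                 for d in range(len(points[0])))


regions = [(0.0, 0.0), (1.0, 1.0)]  # hypothetical clustering variables
n_sales = [3, 1]                    # hypothetical number of sales per region
copies = [p for p, n in zip(regions, n_sales) for _ in range(n)]

c_copies = centroid(copies)
c_weighted = weighted_centroid(regions, n_sales)
```

The larger region pulls the cluster center towards itself, which is what makes large regions harder to merge with distant ones.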
Cluster procedures and similarity/dissimilarity measures. There are many methods of clustering the data (see e.g. Mooi and Sarstedt, 2011). We have evaluated a number of different methods, but of course this could be further investigated.
The cluster procedures we apply are k-means and k-medians, which assign each point to the cluster whose center is nearest. The initial cluster centers are randomly chosen. The k-means procedure is a relatively simple clustering procedure that is suitable for large data sets, which we also have. We use both Euclidean distance and Canberra distance in order to find similarities between price indexes or geographical proximity. However, we found very small differences between the cluster procedures and the similarity measures. As a result, and in order to simplify the presentation, we only use k-means and Euclidean distance.
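A minimal k-means sketch with randomly chosen initial centers is given below, together with a Canberra distance function that can be passed in place of the Euclidean distance. This is a plain textbook version written for illustration, not the software used in the paper; the data points, the seed and the handling of empty clusters are assumptions.

```python
import math
import random


def canberra(a, b):
    """Canberra distance; terms with a zero denominator are skipped."""
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(a, b) if abs(x) + abs(y) > 0)


def kmeans(points, k, dist=math.dist, n_iter=50, seed=0):
    """Assign each point to its nearest center and update centers as means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Two well-separated hypothetical groups of regions.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(points, 2)
```

Calling `kmeans(points, 2, dist=canberra)` would run the same procedure with the Canberra distance used in the assignment step.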
We iterate over the number of clusters, starting by estimating two clusters and then three, four and so on. The iteration stops when the number of observations in the smallest cluster is on average below 60 per month and/or below 30 in an individual month (in line with Geltner, 1997; Schwann, 1998). Furthermore, in a second step we remove, if possible, the three largest regions from their respective clusters. If, for instance, Stockholm is assigned to the same cluster as other large regions, Stockholm will be removed. This will only happen if the cluster initially contains at least 15 000 observations.
Unless it is necessary, we want to avoid clustering the largest regions with other regions, since they have enough observations to constitute their own clusters. This kind of "post clustering" could probably be investigated further, for instance by performing new cluster analyses on the resulting clusters from the first cluster analysis, in order to divide the largest clusters into smaller ones.
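The stopping rule for the number of clusters can be sketched as a loop over k. Two points are assumptions of this sketch: the "and/or" condition is implemented as a logical "or", and the last k for which no cluster is too thin is the one kept. The function `even_split` is a made-up stand-in for an actual clustering run.

```python
MIN_AVG_PER_MONTH = 60  # thresholds from the text
MIN_ANY_MONTH = 30


def too_thin(monthly_counts):
    """True if a cluster violates either monthly threshold."""
    avg = sum(monthly_counts) / len(monthly_counts)
    return avg < MIN_AVG_PER_MONTH or min(monthly_counts) < MIN_ANY_MONTH


def largest_acceptable_k(monthly_counts_for_k, k_max):
    """Increase k from 2 and return the last k before any cluster is too thin."""
    best = None
    for k in range(2, k_max + 1):
        if any(too_thin(c) for c in monthly_counts_for_k(k)):
            break
        best = k
    return best


def even_split(k):
    """Hypothetical stand-in: 240 sales per month split evenly over k clusters."""
    return [[240 / k] * 12 for _ in range(k)]


chosen_k = largest_acceptable_k(even_split, 10)
```

In the toy example, four clusters still have 60 sales per month each, while five clusters would drop to 48, so the iteration stops there.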
Step 3: Comparison and performance evaluation of the different regional price index series
Since each clustering method (see C1 to C7 above) generates a different set of clusters, the corresponding price indexes will also be different. Thus, it is important to compare and evaluate the performance of the different clustering methods for price index construction purposes. The problem, however, is to define a natural benchmark against which the different clustering methods can be compared. In this paper, we use an out-of-sample prediction measure utilized by Costello et al. (2009) and Goh et al. (2012). They estimate the Root Mean Squared Error (RMSE) on a sub-sample of 25%. We, however, choose to set aside 10% of the observations for the out-of-sample test. The remaining 90% of the observations are used to estimate the hedonic price indexes. In order to compensate for only using a subsample of size 10%, we run a simulation with 100 out-of-sample replications (sampling with replacement). We then use the mean figures of the 100 replications.
We have found that the results from the clustering vary a lot between the replications. The same method can result in a different number of clusters, and the resulting regression models will sometimes be relatively good and sometimes relatively bad. By using a random sample of 90% of the data, we obtain somewhat different values on the variables used in the cluster analyses. These small variations might explain why the results from the cluster analysis differ.
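The evaluation procedure can be sketched as a repeated holdout experiment: in each replication a fresh 10% holdout is drawn, a model is fitted on the remaining 90%, and the RMSE on the holdout is recorded. The callback `fit`, the stand-in model `fit_mean` and the data are hypothetical; in the paper the fitted model would be the hedonic regression for each cluster.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mean_holdout_rmse(y, fit, n_rep=100, holdout_share=0.10, seed=0):
    """Average the holdout RMSE over repeated random 90/10 splits."""
    rng = random.Random(seed)
    n_holdout = max(1, round(len(y) * holdout_share))
    scores = []
    for _ in range(n_rep):
        holdout = rng.sample(range(len(y)), n_holdout)  # fresh holdout each time
        holdout_set = set(holdout)
        train = [i for i in range(len(y)) if i not in holdout_set]
        model = fit(train)  # fit on the 90% sample
        scores.append(rmse([y[i] for i in holdout], [model(i) for i in holdout]))
    return sum(scores) / n_rep


y = [12.0] * 50  # hypothetical log prices, constant on purpose


def fit_mean(train_idx):
    """Stand-in model: predict the training mean for every observation."""
    m = sum(y[i] for i in train_idx) / len(train_idx)
    return lambda i: m


avg_rmse = mean_holdout_rmse(y, fit_mean)
```

Averaging over the replications smooths out the variation between individual holdout samples that is described above.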

Data source
We use sales data from real estate agencies. The dataset contains variables that originate from the sales process, for instance from creating advertisements and publishing them on the Internet. All sales have a contract date. This is a big advantage compared to using data from when the transaction is recorded in official registers, which often happens several months after the buyers and sellers have agreed upon the sales price.
In order to get more information about the sold houses, this dataset has been combined with data from the Swedish Real Estate register. From that register we get variables like sea view, lot size and a quality index used for taxation purposes. Some descriptive statistics can be found in the part where the hedonic regression is described. The dataset has been provided to us by Valueguard Index Sweden AB.

Regions
The number of transactions in different regions varies a lot. There is a close connection between the population in the region and the number of transactions. In Figure 1, the number of transactions per month is plotted against the population rank.
The relationship between population and number of transactions per month is positive.
The four largest labor markets all have more than 100 observations per month on average. However, only 11 labor markets have more than 60 observations per month on average. In Figure 2, the number of observations per 100 000 inhabitants is shown.
On average, over the period, there are around 2 500 observations per 100 000 inhabitants. However, the variation around the average is big, especially for the smaller labor markets. Some of the smallest labor markets do not have many transactions, but relative to the population the number of sales is large. That is to say, it is not obvious that all small labor markets have few observations, since the number of observations per inhabitant can be large. Some regions have more owner-occupied houses than others, and in regions with low prices, many sales are not reported because they are not sold by real estate agents. More information about the regions can be found in the Appendix.

Descriptive statistics for the dataset
Table 1 presents descriptive statistics for the observed attributes used in the hedonic price equation. As described earlier, we include a number of attributes in order to control for variation across time and space. Descriptive statistics for the municipality dummies and the time dummies are not presented in the table. 32% of the houses were built between 1960 and 1975. Houses at the seafront are rather uncommon, but around 5% of all houses in the dataset are either sea view or seafront houses. Naturally, the descriptive statistics for many of the variables vary from region to region. For instance, the mean and standard deviation of prices vary a lot between regions. Furthermore, lot sizes are usually smaller in urban areas than in rural areas.
The dummy variable "Urban" was provided for this research by Valueguard Index Sweden AB. A house (transaction) is classified as Urban (value 1) if it is located in the conurbation of one of the 100 largest cities in Sweden.
A hedonic price index equation will be estimated for each of the 66 labor markets, and such estimation requires a minimum number of observations to leave enough degrees of freedom. The average is 3 500 observations per FA-(sub)region or labor market, which over the 72 months of 2005 to 2010 corresponds to only about 50 observations per month. However, there is large variation between the labor markets: the smallest have fewer than 10 observations per month, which makes it impossible to estimate reliable price index series on a monthly basis. By comparison, Stockholm has more than 500 observations per month on average.

Description of the regions
Based on this sample, we create a table with descriptive statistics on the regions that will be used in the cluster analysis:
-Number of observations;
-Mean price;
-Mean coordinate;
-Average price development;
-Price development pattern.
In Table 2, the regions are described based on all the data in the dataset. Figure 9 in the Appendix shows all the regions.

Removal of measurement errors
We have removed extreme values from the dataset, or rather data that are probably incorrect. Much of the data was originally entered manually by real estate agents, and the dataset contains some errors. For instance, when a sale has no neighbors within 10 000 meters, its coordinates may be wrong.
We have used the following criteria to remove extreme values and incorrect data:
-Sales whose coordinates lie more than 10 000 meters from the closest neighbor.
-Houses built before 1850.
-Houses with a lot area bigger than 7000.
-Houses with more than 10 rooms.
-Observations where the total building area (including garage etc.) is more than 200 square meters larger than the living area.
-Houses with extreme values of price or price per square meter in their respective municipality.
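The removal criteria above can be expressed as a simple filter. The following is a minimal sketch in Python, not the procedure actually used: the field names (nearest_neighbor_m, year_built, lot_area, rooms, total_area, living_area) are hypothetical, and the municipality-level screening of extreme prices is left out because the text does not state its thresholds.

```python
# Hypothetical field names; the thresholds are the ones listed in the text.
def is_plausible(sale):
    """Return True if a sale passes the removal criteria listed above."""
    if sale["nearest_neighbor_m"] > 10_000:   # isolated coordinates, probably wrong
        return False
    if sale["year_built"] < 1850:
        return False
    if sale["lot_area"] > 7000:
        return False
    if sale["rooms"] > 10:
        return False
    if sale["total_area"] - sale["living_area"] > 200:
        return False
    return True


def clean(sales):
    """Keep only the sales that pass every criterion."""
    return [s for s in sales if is_plausible(s)]


# Three made-up sales: one plausible, one with isolated coordinates,
# one built before 1850.
example = [
    {"nearest_neighbor_m": 40, "year_built": 1968, "lot_area": 900,
     "rooms": 5, "total_area": 160, "living_area": 120},
    {"nearest_neighbor_m": 12_500, "year_built": 1968, "lot_area": 900,
     "rooms": 5, "total_area": 160, "living_area": 120},
    {"nearest_neighbor_m": 40, "year_built": 1790, "lot_area": 900,
     "rooms": 5, "total_area": 160, "living_area": 120},
]
kept = clean(example)
```

Only the first of the three made-up sales survives the filter.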

Hedonic price indexes for each labor market region (step 1)
In the first step, hedonic price equations are estimated for the 66 labor market regions (FA-(sub)regions). The data are aggregated over time, as we only estimate a yearly price index. Table 3 presents results for three very different labor markets. The first is Stockholm, the capital of Sweden, the second is a medium-sized city (Västerås) and the third is a small labor market with very few observations (Mora). Based on the number of transactions, a monthly index can be estimated for Stockholm (on average 500 observations per month), a quarterly index for Västerås (on average 60 observations per month) and a yearly index for Mora (on average 13 observations per month).
The results from these regressions vary a little between the replications because only 90% of the data is used in each. The index values for all the regions, based on all observations, can be found in the Appendix. The overall goodness-of-fit is good. In the labor market of Stockholm, the estimated model explains more than 90% of the variation in price. R-squared in Västerås is of roughly the same magnitude, but it is lower in the smallest labor market, Mora. A possible reason for the high goodness-of-fit figures is that we use a form of weighted least squares (WLS) in order to down-weight outliers.
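To illustrate how a yearly index follows from such a regression, the sketch below converts time-dummy coefficients into index levels. It assumes the standard time-dummy approach with the logarithm of price as dependent variable and the first year as the omitted base period; the function name and the coefficients are illustrative only and are not estimates from this study.

```python
import math


def index_from_time_dummies(coefs, base=100.0):
    """Convert time-dummy coefficients from a log-price regression to index levels.

    coefs maps each period to its estimated dummy coefficient; the omitted
    base period has coefficient 0 by construction.
    """
    return {period: base * math.exp(delta) for period, delta in coefs.items()}


# Illustrative coefficients only, not estimates from the paper.
coefs = {2005: 0.0, 2006: 0.10, 2007: 0.18}
index = index_from_time_dummies(coefs)
```

With these made-up coefficients the base year is 100 and a coefficient of 0.10 corresponds to an index level of about 110.5.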
In the first step, we have estimated 66 different hedonic price equations with OLS, one for each labor market. The Moran's I statistic is on average 26.498, with a standard deviation of 13.233. That is, the result indicates that spatial dependency is present; we reject the hypothesis of no spatial correlation. However, the question is whether the hedonic house price indexes in each labor market are affected. In order to test for bias in the coefficients of the time dummies, we have estimated an SEM and an SAR model for each labor market. We have limited the sample to a maximum of 900 observations in each labor market.
Figure 3 displays the average absolute difference between the aggregated house price index and the two spatial regression indexes. Although spatial dependency is present in our hedonic house price models, it does not seem to spill over to the price indexes. In fact, the differences between the house price indexes estimated by OLS, SEM and SAR do not seem to be significant. The difference is on average around 0.026 between OLS and SEM and 0.022 between OLS and SAR, which is low compared to the coefficients of the time dummies.
We have tested whether the differences are statistically significant with a Hausman test. The null hypothesis is that the difference in estimates is equal to zero. The average absolute t-value is 0.83, with a standard deviation of 0.70. That is, on average we cannot reject the null hypothesis of equality for the coefficients of the time effects and, accordingly, our OLS estimates can be used in the cluster analysis.
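For readers unfamiliar with the statistic, the sketch below computes the raw Moran's I coefficient for a set of regression residuals and a spatial weight matrix. The weight matrix and the residuals are made up for illustration, since the text does not describe how the weights were constructed. The value reported above is presumably a standardized test statistic rather than the raw coefficient, and this sketch does not compute the standardization.

```python
def morans_i(values, weights):
    """Raw Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)          # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Illustrative residuals for four houses along a street;
# adjacent houses are treated as neighbors (weight 1).
residuals = [1.0, 2.0, 3.0, 4.0]
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
moran = morans_i(residuals, w)
```

A positive value, as in this example, means that neighboring residuals tend to have the same sign, which is the pattern referred to as spatial dependency.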

Cluster analysis (step 2)
The results of the cluster analysis are presented in Table 4 and Figure 4. Table 4 presents, for each clustering method, the average number of clusters and its standard deviation, as well as the average number of observations per cluster. The figures depict the identified clusters on maps.
As can be noted in Table 4, the average number of clusters varies from 9 to 12 between the methods, and the average number of observations per cluster varies accordingly.
An interesting result is that we obtain very large differences between the replications of the clustering method, even though the data are almost the same. The number of clusters varies a lot, as can be seen from the standard deviation of the number of clusters in Table 4. The diagrams in Figure 4 show examples of maps created with the different methods. Each cluster is represented by a color. One can clearly see that method C4, which uses coordinates only, creates clusters based on geographical proximity. It is interesting that other clustering methods also create some clusters of nearby regions, for instance the one based on the price development pattern (C3).

Comparison and evaluation of regional prices indexes (step 3)
In step three, a hedonic price equation is estimated for each cluster (all the estimates are available upon request). Table 5 reports the estimated root mean squared errors (RMSE). RMSE is here defined in the usual way, as the square root of the mean squared difference between the observed and the predicted price. For the calculation of RMSE we use the 10% out-of-sample data in each of the 100 replications. The price enters the regression on a logarithmic scale, and the RMSE is also measured on this scale.
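As a minimal sketch of the evaluation measure, the function below computes RMSE from observed and predicted log prices for the held-out observations. The function name and the numbers are illustrative assumptions, not values from this study.

```python
import math


def rmse(observed_log_prices, predicted_log_prices):
    """Root mean squared error between observed and predicted log prices."""
    errors = [y - y_hat
              for y, y_hat in zip(observed_log_prices, predicted_log_prices)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Two made-up out-of-sample observations on the log scale.
observed = [14.0, 14.5]
predicted = [14.3, 14.1]
value = rmse(observed, predicted)
```

Because the measure is on the log scale, a value of about 0.35, as in this example, corresponds roughly to a typical prediction error of one third of the price.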

Results
Table 5 displays the results of the different clustering methods, including C7 (random). Surprisingly, there seem to be no differences at all between the different clustering methods.
We know that the problem with thin markets is bigger in small regions. Can we observe differences between different types of regions? We have therefore divided the regions into three groups: (1) the regions with the fewest sales, in total 10% of the transactions in the dataset or the 25 smallest regions, (2) medium-sized regions and (3) the three largest regions, with one third of all the transactions in the dataset. In Table 6 we compare the RMSE across clustering methods and types of region.
Even if there are some differences, they are very small and not statistically different from each other within each type of region. However, the results suggest that the average prediction error is largest in small regions and smallest in large regions. Moreover, we have also noted rather substantial differences between the replications. The number of clusters varies a lot, even if the data used are almost the same. The average result seems to be almost exactly the same for all the clustering methods, but maybe some methods produce better clusters in some of the replications? Figure 5 shows the differences between the replications. For each clustering method, the resulting RMSE values have been sorted from smallest to largest. The figure shows these sorted results for each method.
The differences are not very big, but the results are clear. Clustering on a random variable seems to be the worst method, even if it is rather stable. Coordinates seem to work well, but sometimes give bad results. Price and price development also seem to be good variables to use. A clear conclusion is that geography is important, but using geographical location only might lead to a bad result. Figure 6 shows similar results for the smallest regions (based on number of transactions). We notice the same pattern: using coordinates only gives some really bad results. We also note that the differences between the replications are bigger. The average RMSE varies from 0.32 to 0.44.
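The sorting described above can be sketched as follows. The example also records the mean and the worst replication per method, which shows how two methods with similar averages can differ in their worst case. The RMSE values are made up for illustration and are not results from this study; only the method labels C4 and C7 are taken from the text.

```python
def summarize_replications(rmse_by_method):
    """Sort the RMSE of each method's replications and record mean and worst case."""
    summary = {}
    for method, values in rmse_by_method.items():
        ordered = sorted(values)
        summary[method] = {
            "sorted": ordered,                      # smallest to largest
            "mean": sum(ordered) / len(ordered),
            "worst": ordered[-1],
        }
    return summary


# Made-up RMSE values for three replications per method.
rmse_by_method = {
    "C4": [0.30, 0.29, 0.45],   # coordinates only
    "C7": [0.34, 0.33, 0.35],   # random variable
}
summary = summarize_replications(rmse_by_method)
```

In this made-up example the two means are almost equal, while the worst replication of C4 is clearly worse than that of C7.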

An alternative out-of-sample
We also conduct an alternative out-of-sample test by removing 30% of the observations from the last year only (2010).
The RMSE is on average somewhat bigger than in the original test (see Table 7). The differences between the clustering methods are also somewhat larger. For the smallest regions, the clustering method that uses coordinates (C4) seems to be the worst method, even worse than using a random variable. In order to understand the differences better, we illustrate the RMSE from each replication in Figure 7. There are extreme differences in some of the replications. We sometimes get a very high RMSE with many of the clustering methods. We can clearly see that clustering on coordinates only results in the highest RMSE in all these extreme replications. In Figure 8, we show the same results after sorting by RMSE individually for each clustering method.
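A hold-out of this kind can be sketched as below. The function name, the field name year and the fixed random seed are assumptions made for the illustration; the text does not describe how the 30% were drawn.

```python
import random


def holdout_last_year(sales, last_year=2010, share=0.30, seed=0):
    """Split sales into a training set and a test set drawn from the last year only."""
    rng = random.Random(seed)                       # fixed seed for reproducibility
    last = [i for i, s in enumerate(sales) if s["year"] == last_year]
    n_test = round(share * len(last))
    test_idx = set(rng.sample(last, n_test))
    train = [s for i, s in enumerate(sales) if i not in test_idx]
    test = [s for i, s in enumerate(sales) if i in test_idx]
    return train, test


# Sixty made-up sales, ten per year from 2005 to 2010.
sales = [{"year": 2005 + i % 6, "id": i} for i in range(60)]
train, test = holdout_last_year(sales)
```

All earlier years stay in the training set in full, so the test measures how well the model predicts prices at the end of the index series.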

CONCLUSIONS
Our goal is to create clusters with enough observations to construct reliable indexes based on hedonic regression analysis. However, several clusters may be required to reproduce the different characteristics of the many housing markets, while segmenting the market into too many small clusters creates problems with thin markets. Thus, there is a trade-off between forming large enough clusters and forming as many clusters as are needed to reflect the heterogeneity among the housing submarkets. First of all, we have found that the methods we have used for cluster analysis produce very different results in response to very small variations in the dataset. It would of course have been more satisfying to find a stable method that always produces good results.
The Swedish housing market is homogeneous in many respects. Price developments in different regions are closely correlated. Furthermore, most people value the same attributes in their homes. Because of this, different regions might work very well in the same regression model.
We have found that geographical proximity is a good variable for clustering regions, but it should not be used alone. The results from this study suggest that regions should be clustered together based on regional price levels and/or price development as clustering variables. If only geographical proximity is used as clustering variable, our computations show that there is a high risk that we end up with some clusters having large standard errors in the regression models, which in turn might result in inaccurate indexes.
The differences are biggest in small regions. If only geographical proximity is used, large regions often become the center of their clusters, while small regions are often clustered with the closest large region. This means that the parameter estimates in the regression model will not adapt to the small region. If many smaller regions are clustered together, for instance because price level is used in the clustering, the regression model is more likely to adjust to the small regions.
The results are not clear-cut. No single method produces stable results with large improvements in standard errors. Even though we could not find such a method, we have drawn some conclusions, and we find the approach interesting. It might be used to test other models for clustering regions. In this paper, we have only used information from within the dataset of sold houses. There are many other sources of information that can be used to create further variables describing the regions, such as population density, income level and unemployment rates.
In future research it could also be interesting to analyze the best and the worst clusters in order to explain the differences. Moreover, we could test other regions as the smallest unit for clustering. In this paper we use functional labor market regions, where each region consists of a number of municipalities. The same study could be done with smaller areas, perhaps concentrating on one part of Sweden. Finally, the same analysis could be applied to longer time series based on another dataset; perhaps price development will be more important for clustering over a longer period. We have worked with a period of six years.

Difference between different clustering commands. Results from 100 replications for each clustering, in total 400 replications
Based on an early analysis, in which we used the sales prices instead of the logarithmic prices to calculate RMSE, the results differ a little.
Table 9. Descriptive statistics concerning all labor markets (see Figure 9)
Figure 9. Map of FA-regions and region parts
Note: The mean price is calculated from all observations 2005 to 2010
Source: Swedish Agency for Economic and Regional
55  Storuman  ------
56  Lycksele  ------
57  Dorotea  ------
58  Vilhelmina  ------
59  Åsele  ------
60  Sorsele  ----
"-" means that the number of observations is not enough for calculating a hedonic index. These regions are not included in the clustering analysis.