COMPARING TRADE-OFF ADJUSTMENTS IN CREDIT RISK ANALYSIS OF MORTGAGE LOANS USING AHP, DELPHI AND MACBETH

Due to the severe restrictions on access to credit resulting from the current economic climate, credit risk analysis of mortgage loans has been considered paramount for banking institutions and is currently accompanied by higher credit underwriting standards. In this paper, we present an empirical comparison of three decision support tools (i.e. Analytic Hierarchy Process (AHP), Delphi, and Measuring Attractiveness by a Categorical Based Evaluation Technique (MACBETH)) in the specific context of trade-off readjustments in credit risk analysis of mortgage loans. We conducted a panel study with credit analysts and focused on five lines of comparison: ease of use; time-consumption; ease of applicability; accuracy; and overall evaluation. Results indicate that Delphi surpasses AHP and MACBETH in terms of ease of use, time-consumption and ease of applicability. As for accuracy, the differences obtained between AHP and MACBETH are not significant, and both methods perform better than Delphi. Most of the decision makers considered AHP the “overall best” approach.


INTRODUCTION
The relationship between financial and real estate markets and mortgage lending has been particularly highlighted after the most recent United States' subprime crisis (cf. Kowalski, Shachmurove 2011; Puri et al. 2011; Yeager 2011; Beltratti, Stulz 2012). In particular, researchers have emphasized how important mortgage lending decisions are to reverse the dramatic changes and declining values of real assets in many markets. As a consequence of poor lending decisions in the past, banks have been requiring higher credit underwriting standards, which impose heavier restrictions on access to credit, namely in terms of mortgage lending (Barth, Hollans 2012).
As expressed by Lopez and Saidenberg (2000: 152), banks have improved their credit-scoring risk systems over the years with the aim of "better quantifying the financial risks they face and assigning the necessary economic capital" (for further details see e.g. Pau, Tambo 1990; Altman, Saunders 1997; Jacobson, Roszbach 2003; Yurdakul, İç 2004; Chambers et al. 2009; Xu, Zhang 2009; Yu et al. 2009; Twala 2010; Wang et al. 2011; Blackburn, Vermilyea 2012; Leow, Mues 2012). Nonetheless, many argue that the progress achieved does not mean that the current approaches to measuring credit risk are without limitations (cf. Lopez, Saidenberg 2000; Doumpos, Zopounidis 2001, 2011; Doumpos et al. 2002; Twala 2010). Specifically, in line with Ferreira et al. (2011a), there is a lack of transparency in the way trade-offs among evaluation criteria are defined. Starting from this observation, and considering that elicitation methodologies have over the years successfully handled trade-offs among criteria (cf. Saaty 1980; Belton, Stewart 2002; Bana e Costa et al. 2005; Saaty 2008; Ferreira 2013), there is considerable scope to explore the applicability of these tools also in terms of credit risk evaluation of mortgage loans, a topic that, following Mari and Renò (2005: 92), has been "quite neglected in the financial literature".
Methodological comparisons among multiple criteria approaches have been widely reported in the literature (see, among others, Olson et al. 1995; Boucher et al. 1997; Zanakis et al. 1998; Scholl et al. 2005; Perini et al. 2009; Zhou, Ang 2009). However, the use of these approaches in the evaluation of credit risk has been very scarce and, as far as we are aware, no study compares their effectiveness in determining the relative importance of the different criteria used in risk analysis of mortgage loans. By comparing three decision support methods to define the relative importance of a pre-established set of mortgage loan risk evaluation criteria in use by one of the largest banks operating in Portugal, this study aims to contribute to this under-researched area. The methods compared are the Analytic Hierarchy Process (AHP), the Delphi method, and the Measuring Attractiveness by a Categorical Based Evaluation Technique (MACBETH). A panel study with credit analysts was conducted and the performance of each method assessed against five major lines of comparison: ease of use; time-consumption; ease of applicability; accuracy; and overall evaluation.
The remainder of the paper is organized as follows. The next section underlines the economic relevance of mortgage lending. Section 3 briefly describes the institutional and managerial context of the study, while Section 4 briefly introduces the three methods used. Section 5 explains their application in terms of trade-off readjustments and presents the results of their comparison. Section 6 concludes the paper.

MORTGAGE LENDING ECONOMIC RELEVANCE
One of the main reasons supporting the economic relevance of mortgage lending is the fact that whilst home buying is usually seen as the major investment for most households, these frequently do not have the capital available to purchase a house outright. As such, bank mortgage loans are usually seen as the easiest solution and, from this perspective, it seems obvious that mortgage lending has stimulated the economy over the years. Lima et al. (1995) and Mari and Renò (2005) present several interlinked reasons supporting the relevance of mortgage lending in economic terms. According to the authors, mortgage lending not only allows the expansion of financial services but also promotes the development of the housing construction industry and associated business activities. As a result of the financial (and thus economic) expansion, job creation and household consumption are stimulated, increasing money circulation and contributing to the country's Gross Domestic Product (GDP). As emphasized by Mari and Renò (2005: 83), "the market for mortgage loans is of primary importance in any developed country". It is worth noting, however, that due to their inherent decreasing liquidity and limited access to debt markets, banks have imposed heavier underwriting requirements on access to mortgage credit (e.g. reductions of the loan-to-value (LTV) and rate of effort), and this has resulted in lower credit risk analysis outcomes, higher spreads and additional costs (often unaffordable to the borrower).

THE CURRENT CREDIT-SCORING SYSTEM AND ITS LIMITATIONS
The risk evaluation of mortgage loans in Portugal tends to be standardized and based on a ten-point scale. If a credit application scores between "1" and "5", the lending decision is usually favorable; scores above "5" support credit refusal. The final score of each credit-risk evaluation is a composite result, also known as overall score. This overall score is the result of aggregated partial scores, which, in turn, are associated to pre-established weighted criteria.
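For illustrative purposes, the aggregation rule just described can be sketched as a weighted sum of partial scores compared against the approval threshold. This is a minimal sketch under stated assumptions: the criterion weights and partial scores below are hypothetical (they are not the bank's values), and the function names `overall_score` and `lending_decision` are introduced here only for the example.

```python
APPROVAL_THRESHOLD = 5.0  # per the text: scores up to "5" are usually favorable


def overall_score(partial_scores, weights):
    """Aggregate partial scores (1-10 scale) into a weighted overall score."""
    if set(partial_scores) != set(weights):
        raise ValueError("partial scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    # Dividing by the total keeps the result on the 1-10 scale even if the
    # weights are expressed as percentages rather than fractions.
    return sum(partial_scores[c] * weights[c] for c in weights) / total_weight


def lending_decision(score):
    """Map an overall score to the decision rule described in the text."""
    return "favorable" if score <= APPROVAL_THRESHOLD else "refusal"


# Hypothetical weights and partial scores (assumptions, not Table 1 values).
weights = {"LTV": 0.40, "rate of effort": 0.35, "responsibilities in BP": 0.25}
partial = {"LTV": 3, "rate of effort": 6, "responsibilities in BP": 2}

score = overall_score(partial, weights)
decision = lending_decision(score)
print(round(score, 2), decision)
```

The sketch makes visible why the weights matter: a partial score above the threshold in one criterion (here, rate of effort) can be offset by lower scores elsewhere, depending on the trade-offs in force.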
Although the types of mortgage loans available in Portuguese banks usually depend on market conditions, the criteria on which mortgage loan risk assessment relies are usually the same. Table 1 identifies the criteria commonly adopted by the five most representative banks operating in Portugal and reveals the weights used by one of them (whose name has been kept confidential upon request). As shown in Table 1, LTV, rate of effort and responsibilities in BP (Banco de Portugal, the Portuguese Central Bank) are the three most weighted criteria in the current system for mortgage loan risk evaluation. Although easily understandable, the current mortgage credit-scoring system is not without its drawbacks. Specifically, the following limitations have been identified: a) the system is unable to consider behavioral criteria; b) considering that the trade-offs among criteria are previously and administratively established, there is an apparent lack of transparency and rationality, which prevents a proper understanding of the final decisions; and c) considering the existence of geographic idiosyncrasies, and that the trade-offs are the same for all bank branches, there is a possibility of inappropriate evaluations of mortgage loan applications.
In the literature, there are several decision support methods which have proved very valuable in addressing these problems in other contexts, and which might therefore also be very valuable to mortgage loan risk assessments. In particular, it is our opinion that the AHP, Delphi and MACBETH methods can provide a better basis for discussion and justification of mortgage lending decisions, leading to a more transparent evaluation mechanism. In order to explore how each of these methods can help banking institutions address the limitations previously identified, we carried out a comparative analysis involving bank directors and credit experts from the five most representative banks operating in Portugal.
It is important to emphasize, however, that the issue of which method to select or which methods to compare in order to identify the most appropriate ones is a challenging one. Considering that each decision support method has strengths and weaknesses, the choice of methods is usually determined by the decision analyst and is dependent on the decision context. In the particular case of this research, four major factors impacted on our decision on which methods to compare. Firstly, the AHP, Delphi and MACBETH methods have been recognized in the literature for being simple and facilitating trade-off calculations. Therefore, it was felt that any of these methods had potential to assist mortgage loan risk assessments. Secondly, the authors of the manuscript had previous experience in the use of these three methods to improve understanding and to assist decision making across several organizational contexts. Familiarity with the methods was considered an important factor to ensure a proper implementation and comparison.
Thirdly, the experts in risk analysis of mortgage loans who participated in the study had very stringent time constraints, preventing the inclusion of other methods in the comparison as further comparisons would have increased considerably the duration of the experiment. Finally, to the best of our knowledge, no previous research had compared the performance of the AHP, Delphi and MACBETH in determining the relative importance of the credit rating criteria used in risk scoring systems.

BRIEF METHODOLOGICAL BACKGROUND
In this section, we present a brief overview of the three decision support methods under analysis. The methods will be presented according to the order they were applied during our comparative experiment: Delphi, AHP and MACBETH.

The Delphi technique
The Delphi technique was developed in the 1950s by Norman Dalkey, Olaf Helmer and respective collaborators at the RAND Corporation (cf. Dalkey, Helmer 1963), and its procedure begins with an individual survey, which should be completed by individuals considered experts on the topic under discussion.
As outlined by Ferreira and Monteiro Barata (2011: 246), the operational structure of the method is based on "a well-established sequence of successive individual questions supplemented with information and advice, which permits correcting the first stages of the process. […] it is a tool, which, under certain parameters, enables consensus. […] and is based on the rational principle that 'n' human minds are better than one when confronting the lack of precise knowledge about a certain subject" (for further discussion, see also Linstone, Turoff 1975). The basic principles of the method are: anonymity, controlled feedback and statistical treatment of the responses.

Basics of the AHP approach
The AHP was developed by Thomas Saaty in the mid-1970s, with the purpose of overcoming the cognitive limitations of decision makers (cf. Saaty 1980). The implementation of the AHP begins with the identification of the criteria to be used in the evaluation of alternatives (also known in the literature as actions). These criteria are then organized in a hierarchical structure, which includes three key elements: criteria, subcriteria and alternatives. Once this hierarchy has been defined, the relative importance of each of its elements has to be determined through a process of pairwise comparisons. The answers provided by the decision makers to the pairwise comparisons at each level of the hierarchy are then synthesized into square matrices. In each matrix, the number in row i and column j provides the relative importance (or priority) of a certain criterion $C_i$ over another criterion $C_j$, as can be observed in the matrix form (1) presented below:

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \tag{1}$$

The conversion of pairwise comparisons into numerical values is based on a one-dimensional scale from 1 to 9, also known as 'Saaty's fundamental scale'. The next step of the process consists in calculating the weights w for each criterion in the different hierarchical levels and in relation to the alternatives considered. This is done by applying a mathematical technique known as the eigenvector method to the matrices of comparison. The mathematical expression (2) allows the eigenvector matrix to be estimated. Laininen and Hämäläinen (2003) argue that the results obtained from the application of this mathematical expression must be standardized. Specifically, as shown in expression (3), where T is the normalized eigenvector, the standardization procedure consists in obtaining the proportion of each element relative to the sum:

$$T = \left( \frac{w_1}{\sum_{i=1}^{n} w_i}, \; \frac{w_2}{\sum_{i=1}^{n} w_i}, \; \ldots, \; \frac{w_n}{\sum_{i=1}^{n} w_i} \right) \tag{3}$$

This procedure allows for the eigenvector ordering of priorities, and should be repeated until the normalized result of the last operation gets very close to the result of the preceding operation (e.g. irrelevant differences after the third decimal place). The eigenvector provides a hierarchy (also known as priority order) for the criteria involved and, to assess whether the data are logically related, the solution should be tested in terms of consistency by calculating the eigenvalue. Saaty (2008) proposes the following sequence of procedural steps to test the consistency of the solution:

- Calculate the eigenvalue ($\lambda_{max}$) in accordance with expression (4), where w is obtained by summing the columns of the matrix of comparisons:

$$\lambda_{max} = T \cdot w \tag{4}$$

- Compute the consistency index (CI) using equation (5), where n represents the order of the matrix:

$$CI = \frac{\lambda_{max} - n}{n - 1} \tag{5}$$

- Estimate the consistency ratio (CR) through equation (6), where RI is a random consistency index and depends on the order of the respective matrix:

$$CR = \frac{CI}{RI} \tag{6}$$

The literature considers CR acceptable if its value is below 0.10. Conversely, CR values above 0.10 require a revision of the matrix of comparisons (cf. Saaty 1980; Yurdakul, İç 2004; Xu, Zhang 2009; Perez-Gladish, M'Zali 2010). Following the procedure used to obtain the relative importance of the criteria, the levels of preference of the alternatives are determined by comparing pairs of alternatives in each criterion. Considering the levels of relative preference, and based on the additive model presented by equation (7), an overall assessment for each alternative considered can be made explicit:

$$V(a) = \sum_{j=1}^{n} p_j v_j(a) \quad \text{with} \quad \sum_{j=1}^{n} p_j = 1 \quad \text{and} \quad p_j > 0 \; (j = 1, \ldots, n) \tag{7}$$

Specifically, V(a) corresponds to the overall value of alternative a; $p_j$ is the weight of criterion j and $v_j(a)$ is the level of preference of the respective alternative in criterion j.
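The weight derivation and consistency test described in this subsection can be illustrated with a short sketch. It rests on assumptions that should be read as such: the priorities are approximated by power iteration (repeated multiplication and normalization until successive results agree), which is one common way of obtaining the principal eigenvector and not necessarily the expression used in this study; the eigenvalue is estimated from the column sums and the normalized priorities, as the text describes; the random index values are those commonly tabulated by Saaty; and the 3 × 3 comparison matrix is hypothetical.

```python
# Random consistency index by matrix order (values commonly attributed to Saaty).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_priorities(matrix, tol=1e-9, max_iter=1000):
    """Approximate the normalized principal eigenvector by power iteration."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(max_iter):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        updated = [value / total for value in product]  # proportion of each element
        if max(abs(u - w) for u, w in zip(updated, weights)) < tol:
            return updated
        weights = updated
    return weights


def consistency(matrix, weights):
    """Return (lambda_max, CI, CR) for a pairwise comparison matrix."""
    n = len(matrix)
    if n <= 2:
        return float(n), 0.0, 0.0  # matrices of order 1 or 2 are always consistent
    column_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    lambda_max = sum(c * w for c, w in zip(column_sums, weights))
    ci = (lambda_max - n) / (n - 1)
    cr = ci / RANDOM_INDEX[n]
    return lambda_max, ci, cr


# Hypothetical judgments on Saaty's 1-9 scale for three criteria.
comparisons = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]

weights = ahp_priorities(comparisons)
lambda_max, ci, cr = consistency(comparisons, weights)
print([round(w, 3) for w in weights], round(cr, 3), cr < 0.10)
```

With a perfectly consistent matrix the estimated eigenvalue equals the order of the matrix, so both CI and CR vanish; any CR above 0.10 would signal that the judgments should be revisited.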

The MACBETH approach
The MACBETH approach was developed during the 1990s by Carlos Bana e Costa and Jean-Claude Vansnick (cf. Bana e Costa, Vansnick 1994). As in the AHP, the MACBETH process also involves the drafting of judgments between pairs of actions. The novelty lies, however, in the use of a semantic scale composed of categories of difference of attractiveness. Technically, let X = {a, b, ..., n} denote a finite set of n alternatives of choice. The technical procedure consists in associating each element of X to a value x (based on a value function $v(\cdot): X \to \mathbb{R}$), such that the differences $v(a) - v(b)$ (with a P b) are as compatible as possible with the preferences directly expressed by the decision makers. In this sense, for all pairs of actions (a, b) allocated to the same category C of semantic differences of attractiveness, the differences $v(a) - v(b)$ will belong to the same interval. In addition, asymmetric partitions of the ray of positive real numbers are associated with partition classes of ordered pairs (a, b) (with a P b), such that two contiguous ranges correspond to two consecutive categories of differences of attractiveness. This is done based on a value function v and function thresholds $s_k$ as presented in formulation (8), where $P^{(k)}$ stands for a value preference that is stronger the greater the k:

$$a \, P^{(k)} \, b: \; s_k < v(a) - v(b) < s_{k+1} \tag{8}$$

The thresholds $s_k$ are positive real constants and one of the objectives of the MACBETH approach consists in allocating the difference of attractiveness between each pair of actions (a, b) ∈ X to one of the following categories of semantic differences of attractiveness: $C_0$ = Null (or indifference (i.e. a I b)); $C_1$ = Very weak; $C_2$ = Weak; $C_3$ = Moderate; $C_4$ = Strong; $C_5$ = Very strong; and $C_6$ = Extreme (Bana e Costa et al. 2005). Additionally, formulations (9) and (10) should be analyzed for consistency purposes (Junior 2008; Ferreira et al. 2012):

$$\forall a, b \in X: \; v(a) > v(b) \Leftrightarrow a \, P \, b \tag{9}$$

$$\forall k, k^* \in \{1, 2, 3, 4, 5, 6\}, \; \forall a, b, c, d \in X \text{ with } (a, b) \in C_k \text{ and } (c, d) \in C_{k^*}: \; k \geq k^* + 1 \Rightarrow v(a) - v(b) > v(c) - v(d) \tag{10}$$

Formulation (9) represents the logical assumption that if action a is strictly more attractive than action b (i.e. a P b), then the value of a must be greater than the value of b (i.e. v(a) > v(b)), allowing the association of numbers to both actions. Naturally, if action a is considered as attractive as action b (i.e. a I b), meaning that no difference between them is detected, then v(a) = v(b) and the pair (a, b) ∈ $C_0$. In accordance with Bana e Costa et al. (2008: 28), formulation (10) stands for the guiding principle "that all of the differences allocated to one semantic preference difference category are strictly larger than those allocated to a lower category". With consistent value preferences, linear programming is applied using formulation (11) (Junior 2008), which minimizes v(n) and generates an initial scale for discussion:

$$\begin{aligned} & \min \; v(n) \\ & \text{subject to:} \\ & \quad \forall a, b \in X: \; a \, P \, b \Rightarrow v(a) \geq v(b) + 1 \\ & \quad \forall a, b \in X: \; a \, I \, b \Rightarrow v(a) = v(b) \\ & \quad \forall (a, b), (c, d) \in X, \text{ if the difference of attractiveness between } a \text{ and } b \text{ is bigger than between } c \text{ and } d, \text{ then:} \\ & \qquad v(a) - v(b) \geq v(c) - v(d) + 1 + \delta(a, b, c, d) \\ & \quad v(a^-) = 0 \end{aligned} \tag{11}$$

where: n is an element of X such that ∀ a, b, c, … ∈ X: n (P ∪ I) a, b, c, …; $a^-$ is an element of X such that ∀ a, b, c, … ∈ X: a, b, c, … (P ∪ I) $a^-$; and δ(a, b, c, d) is the minimal number of categories of difference of attractiveness between the difference of attractiveness between a and b and the difference of attractiveness between c and d.

Technically, it should be pointed out that n is the most attractive alternative of X (or at least as attractive as the others) (i.e. n (P ∪ I) a, b, c, …), and the minimization of its value guarantees the minimal length of the initial scale. Also, $a^-$ is the least attractive alternative of X (or at most as attractive as the others) (i.e. a, b, c, … (P ∪ I) $a^-$), and the respective value should be considered the "zero" of the scale (Bana e Costa […]). Through the additive model presented by equation (7), an overall value for each alternative can be estimated.
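The two consistency conditions discussed above (strict preference implies a strictly larger value, and differences allocated to a higher category must be strictly larger than those allocated to a lower one) can be checked mechanically for any candidate scale. The following is only a sketch under assumptions: it does not solve the linear program used by M-MACBETH, the function name `is_compatible_scale` is hypothetical, and the three alternatives and their judgments are invented for the example.

```python
from itertools import product

CATEGORIES = {0: "null", 1: "very weak", 2: "weak", 3: "moderate",
              4: "strong", 5: "very strong", 6: "extreme"}


def is_compatible_scale(values, judgments):
    """Check a candidate value scale against categorical judgments.

    values: maps each alternative to its value v(.).
    judgments: maps an ordered pair (a, b), with a at least as attractive
    as b, to the category index 0..6 of their difference of attractiveness.
    """
    differences = []
    for (a, b), category in judgments.items():
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        diff = values[a] - values[b]
        if category == 0 and diff != 0:
            return False  # indifference requires v(a) = v(b)
        if category >= 1 and diff <= 0:
            return False  # strict preference requires v(a) > v(b)
        differences.append((category, diff))
    # A difference in a higher category must be strictly larger than any
    # difference allocated to a lower category.
    for (k1, d1), (k2, d2) in product(differences, repeat=2):
        if k1 > k2 and not d1 > d2:
            return False
    return True


# Hypothetical judgments among three alternatives A, B and C.
judgments = {("A", "B"): 2, ("B", "C"): 3, ("A", "C"): 5}

compatible = is_compatible_scale({"A": 100, "B": 60, "C": 0}, judgments)
incompatible = is_compatible_scale({"A": 100, "B": 30, "C": 0}, judgments)
print(compatible, incompatible)
```

In the second scale the "weak" difference between A and B (70) exceeds the "moderate" difference between B and C (30), which violates the ordering of categories, so the scale is rejected.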

EXPERIMENTAL RESULTS
Our experiment was conducted after the intervention of the European Central Bank, European Union and International Monetary Fund (i.e. the so-called "Troika") in Portugal, which required credit risk analysis of mortgage loans to be more cautious and forced Portuguese banks to impose higher credit underwriting standards.
Considering that AHP, Delphi and MACBETH have been characterized as very effective in transparently handling trade-offs among evaluation criteria (cf. Belton, Stewart 2002; Saaty 2008; Ferreira, Monteiro Barata 2011; Bana e Costa et al. 2012; Ferreira et al. 2012, 2013), our experiment consisted in applying and examining the potential value of each of these three methods in terms of trade-off readjustment in a given structure of pre-established risk evaluation criteria (see Appendix 1). In this sense, different aspects will be addressed in the following subsections, namely: actors involved; problem definition; technical procedures undertaken; analyses carried out; comparison and discussion of results and, finally, recommendations. For illustrative purposes, these aspects will be discussed using the credit scoring system of one of the five largest banks operating in Portugal as reference.

Actors involved
Decision makers are expected to play an active role when dealing with group decision support systems (cf. Ferreira et al. 2011b). In order to compare the performance of each of the three decision support systems under analysis, a panel of five experts in risk analysis of mortgage loans was formed and a group meeting was organized to discuss the problem and collect the views of the experts on the different systems. The group meeting was conducted by a team of two facilitators (i.e. researchers, scientists), assisted by a communication technician, who was responsible for registering the outcomes of the session.

Problem definition
As already outlined, our experiment aimed to apply the AHP, Delphi and MACBETH methods to a structure of administratively pre-established risk evaluation criteria and compare their performance. This allowed us to test which one of these three methods offers the greatest potential to provide decision makers with the most effective and fairer mortgage risk evaluation system.
In practical terms, our research problem consisted in eliciting value judgments from a panel of mortgage credit risk analysts and, based on the outcomes, readjusting trade-offs among the criteria considered (see again Table 1). It is important to clarify that it was not an objective of the intervention to introduce changes in the current mortgage loan risk evaluation system in terms of criteria selection and/or operational structure (see Appendix 1). This structure is administratively established by the bank and we were not asked (or allowed) to change it.

Evaluation criteria and impact levels
The criteria used by the banking institution under analysis to assess the risk of mortgage loans and the way these criteria are structured into a credit scoring system are illustrated in Figure 1. Complementing the analysis of Figure 1, it is important to highlight the existence of two decision levels. The first decision level is based on a discrete scale from "1" to "10" and projects the partial evaluation of the credit applications in each criterion involved. This means that there are ten administratively defined levels of partial evaluation, and credit scores between "1" and "5" support partial credit approvals. On the other hand, the credit should be refused if the credit scores are between "6" and "10" (cf. Appendix 1). The second decision level considers the weights of the criteria and is based on a continuous scale from "1" to "10", allowing the projection of an overall score for each mortgage loan application. It is precisely at this second decision level that our experiment is focused. In particular, we aim to explore and compare the applicability of the AHP, Delphi and MACBETH methods in terms of trade-off readjustments in the context of credit scoring. Figure 1 was shown to the experts during the group session. This allowed us to: (1) explain the research problem; (2) encourage discussion among panelists; and (3) establish the basis for a proper elicitation of value preferences.

Judgments, comparisons and trade-off procedures
In this subsection, we individually present the way each method was applied. Because the panel members did not know each other before the group meeting, we decided to apply the Delphi technique first. This allowed us to preserve the anonymity of the participants and avoid group influences.

Application of the Delphi technique
At the beginning of the session, explanations about the Delphi technique were given to the panel members. In particular, the individual survey (see Appendix 2) was meticulously explained to each of the experts and they were then invited to express their perceptions in terms of trade-offs among criteria. Once collected, the individual responses of the decision makers were inserted into an Excel file and a second survey (see Appendix 3), containing the statistics of the first round, was given to the panelists. The panel members were then invited to compare the results obtained, rate their answers again and, if necessary, re-rank the answers given in the first round. After collecting and analyzing the individual responses of the second round, it was possible to note that the changes from the first round to the second were not significant. Table 2 presents the statistical outputs obtained after the second round.
The questionnaire took approximately 30 minutes to fill in and was followed by 10 minutes of discussion in order to collect the views of the participants regarding the method. According to the panel members, the Delphi technique is easy to understand and reveals great potential in practical terms (further discussion and comparison of results are presented in subsection 5.7).
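The controlled feedback between Delphi rounds can be illustrated with a short sketch. It assumes that the statistics returned to the panelists are the median and the interquartile range of the weights proposed for each criterion; the text does not specify which statistics were used, so this choice, the function names and the expert answers below are all hypothetical.

```python
from statistics import median, quantiles


def summarize_round(responses):
    """Summarize one Delphi round.

    responses maps each criterion to the list of weights (in %) proposed
    by the experts; the result is the feedback for the next round.
    """
    feedback = {}
    for criterion, values in responses.items():
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        feedback[criterion] = {
            "median": median(values),
            "q1": q1,
            "q3": q3,
            "iqr": q3 - q1,
        }
    return feedback


def largest_median_shift(first_round, second_round):
    """Largest absolute change in the median between two rounds."""
    first = summarize_round(first_round)
    second = summarize_round(second_round)
    return max(abs(second[c]["median"] - first[c]["median"]) for c in first)


# Hypothetical answers from five experts (not the data of this study).
round_1 = {"LTV": [20, 25, 25, 30, 40], "rate of effort": [15, 20, 20, 25, 30]}
round_2 = {"LTV": [22, 25, 25, 28, 35], "rate of effort": [18, 20, 20, 22, 25]}

feedback_1 = summarize_round(round_1)
shift = largest_median_shift(round_1, round_2)
print(feedback_1["LTV"], shift)
```

A small shift in medians together with a narrowing interquartile range is one simple way of judging that a further round would add little, which is in line with the stability observed between the two rounds of the session.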

Application of an AHP-based process
Following the example of the Delphi session, the testing of the AHP-based process was also preceded by an explanation about the AHP method and 'Saaty's fundamental scale'. The decision makers were then asked to pairwise compare and order the criteria by decreasing overall importance. This procedural step lasted approximately one hour and was the starting point for the projection of priorities. Figure 2 presents the questionnaire of pairwise comparisons, where the priorities were projected. This process lasted another hour and was conducted using the Super Decisions software (http://www.creativedecisions.net/). The outputs of our AHP-based application are shown in Table 3.
Once the trade-offs had been obtained through the AHP-based process, they were provided to the panel members for discussion (details and comparison of results are presented in subsection 5.7). From a technical perspective, it should be pointed out that the results obtained present a good consistency ratio of 0.026412 (i.e. below 0.10) (cf. Saaty 1980; Yurdakul, İç 2004; Xu, Zhang 2009; Perez-Gladish, M'Zali 2010).

Application of the MACBETH approach
The testing of the MACBETH approach was also preceded by an explanation of this technique. In particular, special emphasis was given to the categories of semantic differences of attractiveness introduced by Bana e Costa and Vansnick (1994). The process started with the ranking of the evaluation criteria (also known as Fundamental Points of View (FPVs) in the MACBETH literature) by decreasing overall attractiveness. Once this step was concluded, the panel members were invited to pairwise compare the criteria and project their value judgments using the categories previously presented. The process was conducted using the M-MACBETH software (http://www.m-macbeth.com/), and the matrix of comparisons took approximately sixty minutes to complete (Fig. 3). Table 4 summarizes the results obtained, which were provided to the decision makers for further discussion.
Once the "new" trade-offs were obtained using the three different elicitation techniques, we tested the results by assessing the risk of two real mortgage loan applications, as discussed in the next subsection.

Evaluating mortgage loan credit risk
To test the trade-offs obtained from the application of the three methods in analysis, information on mortgage credit applications (identified as "Betas" from now on) was requested from the bank. Table 5 presents the information that was kindly provided under conditions of strict anonymity and refers to two mortgage loan applications. Whilst we would have liked to have data for a larger sample of applications, the information provided was an administrative decision over which we had no control.
Based on the information presented in Table 5, we were then able to determine the partial scores obtained by the two credit applications regarding each evaluation criterion and, by combining these scores with the respective weights, calculate the overall risk score for each application. Criteria weights and rankings, as well as the partial and overall scores obtained during the experiment for the two Betas, are presented in Table 6.
Although Beta 1 and Beta 2 are two real mortgage loan applications, they are not representative of the profiles of all applications made to the bank under analysis. However, the results displayed in Table 6 are very useful for illustrative purposes.
Notes to Table 6: a) Partial evaluation on the first decision level (see Appendix 1); ↑ Positive weight variation when compared to the current system; • Null weight variation when compared to the current system; ↓ Negative weight variation when compared to the current system.
As shown in Table 6, the overall scores obtained by using each of the methods analyzed (including the existing evaluation system) suggest the approval of the credit applications by the bank. Although the overall scores are all below 5, recommending approval of the applications, the results obtained with the Delphi, AHP and MACBETH methods are more conservative than those obtained with the current system, signaling a higher risk of the applications. The results in Table 6 also indicate that the use of these elicitation techniques introduced several changes to the pre-existing weights. Whilst the weight of some of the evaluation criteria increased (e.g. deposit portfolio; average balance), others decreased significantly (e.g. profession; cross-selling). The next step consisted in comparing the methods and discussing the outcomes of the experiment and the accuracy of the results.

The pre-questionnaire
Following the example of Perini et al. (2009), we decided to apply a pre-questionnaire before any comparison between methodologies. Among other things, this procedure was important to know whether the decision makers understood both the objectives of the group meeting and the operative mechanisms of the AHP, Delphi and MACBETH methods. The questions were easy to understand and the answers were given on a five-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). Table 7 presents the results of the pre-questionnaire.
In broad terms, and following Table 7, the panel members reported that they understood the Delphi, AHP and MACBETH methods (median of 5), the differences among methods (median of 4) and the objectives of the group meeting (median of 5). Most important, perhaps, is the fact that these statistical results confirm that the credit analysts understood what they were supposed to do in terms of tasks required (median of 5). The next subsection presents a comparative analysis among methods.

Comparative analysis
The literature on decision support is fertile in presenting strengths and weaknesses of the three methods in analysis. Regarding the Delphi technique, it has been characterized as a tool that allows consensus based on controlled feedback of the answers and reflection on earlier judgments. The technique has been applied in the treatment of several different themes and, because it is anonymous, it allows participants to provide answers without the influence of the organization hierarchy. on the negative side, because the method does not require the physical presence of the agents involved in the process, the questions may be misunderstood by the respondents, jeopardizing the results. Additionally, because the technique is based on rounds and the survey is always the same, respondents tend to drop out of the process (for further details on the strengths and weaknesses of the Delphi technique, see e.g. Dalkey, Helmer 1963;Dalkey 1969;Hsu, Sandford 2007).
As for the AHP method, one of its most significant strengths results from requiring decision makers to compare pairs of alternatives, measuring the degree of inconsistency present in pairwise judgments and ensuring that only justifiable orders are used. Despite its intrinsic simplicity and ability to assess quantitative and qualitative factors, different types of criticism have been discussed in the literature. For example, the method has been criticized not only for the possibility of exhibiting rank reversal (Belton, gear 1983;Boucher et al. 1997), but also for the use of the eigenvalue procedure to derive priorities (Bana e Costa, Vansnick 2008) (for further discussion on the strengths and limitations of the AHP method, see e.g. Olson et al. 1995;Yurdakul, İç 2004;Alonso, Lamata 2006;Saaty 2008;Perez-gladish, M'Zali 2010).
As in the AHP case, the MACBETH approach is also based on pairwise comparisons, which, according to Dyer and Forman (1992), are easy to make, justify and agree on. As an interactive decision support technique, MACBETH has been characterized as following a constructivist approach, being easy to understand and resting on a solid mathematical foundation. Still, as with any other pairwise comparison-based approach, MACBETH requires thorough prior preparation of the information, namely in terms of mutual preferential independence tests, which depend on the number of criteria involved and the number of pairwise comparisons to be made. In particular, "the technique requires an enormous willingness on the part of decision makers, and a high dedication on the part of the facilitator […] the matrices completion can become a demanding task for the actors involved in the process and, as such, difficulties in gathering data may arise" (Ferreira 2013: 443) (for further details on the strengths and limitations of the MACBETH technique, see e.g. Bana e Costa, Vansnick 1994; Belton, Stewart 2002; Ferreira et al. 2011a).
Whilst different types of strengths and limitations can be attributed to the three methods under analysis, it should be noted that, in accordance with Weber and Borcherding (1993), there is no such thing as a superior elicitation methodology, and method choice should always depend on the decision context. This remark is further supported by Ananda and Herath (2009) and Zhou and Ang (2009), who argue that it is very difficult to prove that one method or technique is superior to another in supporting the decision making process. From this premise, and because the AHP, Delphi and MACBETH methods have been recognized in the literature for being simple and facilitating trade-off calculations, a comparative survey was applied in the last part of the meeting to obtain the perspective of the decision makers regarding the methods and the accuracy of the results. It should be recalled that we focused on five lines of comparison, namely: ease of use; time-consumption; ease of applicability; accuracy; and overall evaluation. As in the pre-questionnaire, a five-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree) was used. Table 8 presents the results obtained.
As shown in Table 8, the results indicate that, in the particular context under analysis, Delphi surpasses AHP and MACBETH in terms of ease of use, time-consumption and ease of applicability. In terms of the accuracy of the results, the differences obtained between AHP and MACBETH are not significant (cf. Table 6), and both methods perform better than Delphi. The majority of the decision makers considered AHP the "overall best" approach.
In the last thirty minutes of the meeting, the panel members were asked to analyze and discuss the outcomes of the session. Even considering that there is no superior elicitation method and that choice should depend on the decision context, as previously discussed, there was a generalized consensus that "the adjustments made represent more caution in terms of credit approval, corroborating the Basel guidelines" (in the decision makers' own words). Furthermore, it was also agreed that the use of the Delphi, AHP or MACBETH methods has the potential to make the scoring procedures more defensible and more adaptable to the idiosyncratic characteristics of specific bank branches, as they easily allow for the inclusion of the value systems of the decision makers of those branches into the scoring process.

DISCUSSION AND CONCLUSIONS
Mortgage loans are among the most highly sought financial products worldwide. However, considering the severe restrictions on access to credit resulting from the current economic crisis, mortgage loan risk evaluation has been considered paramount for banking institutions and is now accompanied by higher credit underwriting standards.
As pointed out before, banks have improved their credit-scoring risk systems over the years. Still, it is also acknowledged that the progress achieved does not mean that the current approaches are without limitations. In particular, following Ferreira et al. (2011a), there is a lack of transparency in the way trade-offs among evaluation criteria are defined. Starting from this observation, we applied three decision support methods (i.e. AHP, Delphi and MACBETH) to a hierarchical structure of pre-established risk evaluation criteria, which is currently in use by one of the largest banks operating in Portugal, and compared their performance from the point of view of five credit analysts. Five major lines of comparison were considered, namely: ease of use; time-consumption; ease of applicability; accuracy; and overall evaluation. Whilst we found no major differences in the performance of the three methods, our results indicate that Delphi surpasses AHP and MACBETH in terms of ease of use, time-consumption and ease of applicability. In terms of the accuracy of the results, the differences obtained between AHP and MACBETH are not significant (cf. Table 6), and both methods perform better than Delphi. Most of the decision makers considered AHP the "overall best" approach. Caution needs to be taken, however, in interpreting our findings, as this was an exploratory study, the number of participants in the experiment was small, and the application of the three decision support methods was sequential. The sequential use of the methods means that when the credit analysts used the MACBETH approach they were already familiar with the pairwise comparison procedure. Interestingly, however, this does not seem to have had a significant impact on the perceptions of the credit analysts, as they still considered AHP easier to understand and use than MACBETH.
Furthermore, one needs to bear in mind that the performance of elicitation techniques is always dependent on the actors involved and on the context under analysis (cf. Weber, Borcherding 1993; Ananda, Herath 2009; Zhou, Ang 2009). Whilst we did not find significant differences between the performances of the three methods, which prevents us from proposing one over the other two, our results show that any of the methods considered has the potential to add value to the existing credit scoring systems. In particular, the results suggest that these methods have the potential to provide decision makers with a more discerning framework and assist them in making better informed decisions. There was a generalized consensus amongst the participants that "the adjustments made represent more caution in terms of credit approval, corroborating the Basel guidelines" (in the panelists' own words). Following this, and considering the limitations of our experiment, we recommend for future work: a) different panel studies with a higher number of participants and within other banking institutions; b) the development of similar experiments but involving also the comparison of other multiple criteria decision methods (for a review, see Zavadskas, Turskis 2011); c) surveys of comparisons among different methods; and d) sensitivity and robustness analyses in order to explore which method provides more robust and reliable risk assessments. Improvements and updates will strengthen the results and comparisons presented in this study.

APPENDIX 2. Delphi survey -round 1
Obs.: The present survey is composed of a single table and respective items. To ensure the anonymity of institutions and individuals involved, all statements provided, and their statistical treatment, will be fully confidential.
In terms of mortgage loan risk evaluation, what is the degree of importance (i.e. weight) that you give to each one of the following criteria? [From 1 to 100 (1 = Very minor importance, 50 = Moderate importance and 100 = Extreme importance), mark your preference for each criterion. Please, note that the sum of the weights must be 100%].

Criteria    Degree of importance (%)

Thank You!