
Data Mining for Economics

Delio Panaro

December 2014

Contents

1 Introduction

2 A Fraud Detection Algorithm for Online Banking
  2.1 Introduction
  2.2 Sample features and data preprocessing
  2.3 Classifier architecture
  2.4 Simple classifiers
    2.4.1 SVM simple classifiers
    2.4.2 Behavioural simple classifiers
  2.5 Two layers assignment
  2.6 Classification flow
  2.7 Empirical assessment of meta-classifier performance
    2.7.1 Pure SVM simple classifiers
    2.7.2 SVMs & Behavioural simple classifiers
  2.8 Cross Validation
  2.9 A sensitivity analysis
  2.10 Conclusions

3 A Statistical Analysis of Reliability of Audit Opinions as Bankruptcy Predictors
  3.1 Introduction
  3.2 Literature Review
    3.2.1 GCO
    3.2.2 Bankruptcy Prediction Models and GCO
  3.3 Research Design
  3.4 Sample Selection and Data Collection
  3.5 Statistical Analysis and Findings
    3.5.1 Data Preprocessing
  3.6 Logit Regression Analysis
  3.7 SVM Analysis
  3.8 AdaBoost Meta-Classifier
  3.9 Conclusions

4 MD&A and Business Performance: a Data Mining Analysis on US Financial Companies
  4.1 Literature Analysis
  4.2 Research Design
  4.3 Sample Selection
  4.4 Data Collection and Pre-processing
  4.5 Statistical Analysis
  4.6 Empirical Findings
  4.7 Conclusions

Appendices
  A A brief introduction to Adaboost
  B A brief introduction to SVM
  C List of all ratios and their measurement, taken from …


Chapter 1

Introduction

This thesis collects three works based on three papers that I produced during my PhD.

The common thread of the whole work is the application of data mining techniques to economic topics.

The first chapter, namely A Fraud Detection Algorithm for Online Banking, is the result of a research project titled "Real time methods for online banking fraud and money laundering detection" conducted within the Department of Mathematics of the University of Genoa.

The project's goal was to analyze a real-world dataset of online banking transactions in order to develop an ad hoc algorithm able to detect fraudulent transactions. To reach this objective, a two-layer statistical classifier has been implemented using two classification algorithms: Support Vector Machines and AdaBoost.

The main hurdles are:

• data skewness;
• the low number of fraudulent operations on which to build a fraud profile;
• the asymmetry of the cost matrix, which required achieving a high true positive rate rather than a high overall performance ratio;
• the need to work in real time.

The meta-algorithm presented in this work proved able to reach the set goals and, even though it was built specifically for our problem, it achieved good performance on different classification tasks as well.


The second chapter, titled A Statistical Analysis of Reliability of Audit Opinions as Bankruptcy Predictors, has been produced within a research group of the Department of Economics of the University of Pisa (Caserio et al., 2014).

The aim of the work is to measure whether and how audit opinions (in particular, going concern opinions) issued by auditing companies can represent reliable information on which to base forecasts of firms' financial conditions. The research is based on a sample of US listed firms. The analysis has been carried out using classical statistical tools, such as Logit regression models, and more recent classification tools, such as support vector machines and the AdaBoost classifier.

Results show that the ability of auditors to forecast bankruptcy is quite poor, even compared to the performance achieved by statistical classifiers.

The third chapter is titled Management Discussion & Analysis in the US financial companies: a data mining analysis and, as the previous one, has been produced within a research group of the Department of Economics of the University of Pisa.

The aim of the work is to analyze how management reacts to a firm's financial distress in the MD&A and whether this document can be useful to forecast future financial conditions.

To do so, we appeal to text mining tools, which have been used to retrieve the information needed to build the variables used within classical regression models.

Results show that, apart from some exceptions, the information contained in the MD&A is coherent with firms' financial conditions and can therefore represent an additional source of information on which to base financial forecasts.

All the algorithms and data mining tasks mentioned above have been implemented in Python, whereas the regression analyses have been conducted with Gretl.


Chapter 2

A Fraud Detection Algorithm for Online Banking

2.1 Introduction

The availability of great computing power at a low price and of huge data sets brought, from the early 1990s, a rapid growth of statistical classification methods and of data mining techniques. As pointed out by Fayyad et al. (1996), one of the common tasks in data mining is anomaly detection.

This chapter proposes a two-layer algorithm for supervised binary classification. The algorithm is specifically designed for anomaly detection and, in particular, fraud detection.

The growth of online money movements has brought a parallel growth of fraud techniques. In addition to the loss of money, frauds cause reputational risk, especially at a business level. The literature provides three reviews of fraud detection techniques, namely Bolton and Hand (2002), Kou et al. (2004) and Phua et al. (2010). All of them highlight that, particularly in financial frameworks, there are great limitations on knowledge exchange, motivated in part by the need to avoid giving criminals useful information to evade detection.

The proposed algorithm is a combination of machine learning algorithms that achieves classification accuracy with a very short elaboration time. For a similar approach in another context see, for example, Chan et al. (1999). The real-time feature sets the algorithm in between fraud prevention and fraud detection. The algorithm has been designed to be able to manage a wide range of problems. It is particularly effective on unbalanced data sets, with asymmetrical cost functions, and in which the detection of true positives proves to be hard.

The algorithm has been motivated by a project related to fraud detection in online banking and has been tested on a data set of online bank transfers provided by the industrial partner of the project. The main characteristics of the data set are its size and the imbalance between licit and fraudulent operations. Due to commercial confidentiality, some details of the algorithm and of the data set are omitted, without compromising the integrity of the presentation. The algorithm is implemented in Python 2.7.3 on an Intel Core i7-720QM (Calpella, quad core, 1.6 GHz) equipped with 8 GB of DDR3 RAM, and is available from the author.

Section 2.2 describes the main features of the test sample and the data preprocessing. Section 2.3 illustrates the operating principles of the proposed meta-classifier. Section 2.7 proposes an empirical assessment of the meta-classifier performance. Sections 2.8 and 2.9 show results from a cross validation of the meta-algorithm and a sensitivity analysis. Finally, Section 2.10 summarizes the results of this work and proposes possible improvements.

2.2 Sample features and data preprocessing

The data set at our disposal is composed of 14,967,432 bank server logs stored in a MySQL database. It records all operations involving money transfers from three major Italian banks and covers the period from January 1st, 2011 to May 31st, 2013. Each log has eighteen entries containing information such as IP address, username, date and time of occurrence, and so on. The complete list of the recorded variables, together with a list of derived variables, is in Table 2.1. Each operation is labeled as licit (−1) or fraudulent (+1). The labelling of a transaction as fraud takes place following a statement of the bank client and this, of course, entails the risk of underestimating the number of frauds. The share of operations labelled as frauds is about 1 in 25,000, giving rise to a strong imbalance. More precisely, there are 14,966,796 licit operations against 636 fraudulent operations. With the aim of reducing data heterogeneity, the data set is split, according to the variable CF1, into Domestic users and Business users, providing two subsamples whose main features are summarized in Table 2.2. Non-real variables have been discretized for computational reasons. Further data clustering has been tried but did not provide meaningful enhancements.

A session is defined as all the operations made between a consecutive login and logout by the same agent. These types of logs have been dealt with, and mostly deleted, in the preprocessing phase. This choice allows the exclusion of non-dispositive operations (such as balance checking, statement of account checking, ...) and a notable reduction of the number of operations to be processed, giving 14,966,796.
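As an illustration, the preprocessing step can be sketched as follows. This is a minimal sketch, assuming the logs are exported to CSV; only the column names come from Table 2.1, while the operation codes and contract-type codes are hypothetical.

```python
# A minimal sketch of the preprocessing described above. Column names follow
# Table 2.1; the specific codes are hypothetical placeholders.
import pandas as pd

DISPOSITIVE_CODES = {1, 2, 3}          # hypothetical codes for money-moving operations

logs = pd.read_csv("bank_logs.csv")    # hypothetical CSV export of the MySQL table

# Keep only dispositive operations (drop balance checks, statement checks, ...).
ops = logs[logs["OPERAZIONE"].isin(DISPOSITIVE_CODES)].copy()

# Discretize a non-real variable, e.g. minutes from midnight into 15-minute slots.
ops["TIMESLOT"] = ops["TIMEPART"] // 15

# Split by contract type (CF1) into the two subsamples of Table 2.2.
domestic = ops[ops["CF1"] == 0]        # hypothetical code for Domestic users
business = ops[ops["CF1"] == 1]        # hypothetical code for Business users
```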

2.3 Classifier architecture

The proposed classifier is based on two layers: the first layer is composed of simple classifiers, whereas the second layer is based on an AdaBoost meta-classifier (see Appendix A).

The two-layer architecture is motivated by the need to avoid the over-fitting problems encountered with the classical AdaBoost algorithm. Furthermore, a multi-layer architecture allows a reduction of the computational effort and improved performance. For similar issues on a different multi-layer architecture see Viola and Jones (2004). Our classification problem is characterized by an asymmetrical cost function, given that a false negative has a higher cost than a false positive. For this reason the algorithm is designed to achieve a high true positive rate rather than a high global performance rate. For supporting arguments for this choice when dealing with highly skewed data see Fan et al. (2004).

Regardless of how it has been constructed (some methods will be shown in Section 2.4), a simple classifier is allocated to the first or second layer of the meta-classifier according to its accuracy, defined as the number of correctly classified instances (here, bank server logs) over the number of processed instances. Poorly performing classifiers are discarded¹, the best performing classifiers are allocated to the first layer, and the remaining ones to the second layer. The same simple learner is not used in both phases of the meta-classifier, to avoid unbalancing the AdaBoost in favour of a weak learner (Rätsch et al., 1998). Figure 2.1 provides a schematic representation of the overall architecture of the algorithm.

In the remainder of this chapter we describe possible ways to build simple classifiers, a method to allocate them to the first and second layer of the meta-classifier and how they are used in the two layers.

¹ Discarding classifiers is not strictly needed, given that AdaBoost does not consider …

MySQL entry       Meaning                                Type
CHIAVE            Operation identification key           Alphanumeric string
DATA              Date and time of operation recording   Set of three integers
DATAFULL          Date and time of operation             Set of three integers
TIMEPART          Minutes from midnight                  Integer
IP                Agent IP address                       Set of four integers
USERAGENT         User OS and browser                    Alphanumeric string
USERNAME          Client identification key              Alphanumeric string
CF0               Bank                                   Integer
CF1               Contract type                          Integer
CF2               Authentication method                  Integer
OPERAZIONE        Operation type                         Integer
VALUTA            Currency                               Alphabet string
IMPORTO           Money amount                           Floating point
CONTOCORRENTE     Addressee current account              Alphanumeric string
DESTINATARIO      Addressee IBAN code                    Alphanumeric string
ESITO             Operation outcome                      Integer
SESSIONID         Session identification key             Alphanumeric string
CF3               Addressee ID                           Alphanumeric string

Derived variable  Meaning                                Type
WEEKDAY           Week day                               Integer
TIMESLOT          Day time slot, divided in portions     Integer
                  of 15 minutes
WHERE             City and country of user, extracted    Alphabet string
                  from IP address
WHO               Country of addressee, extracted        Alphabet string
                  from IBAN code

Table 2.1: MySQL Database Entries

Cluster                           Number of entries   Share
Business users licit operations   4,353,632           29.08737%
Business users fraud operations   440                 0.00294%
Domestic users licit operations   10,613,164          70.90838%
Domestic users fraud operations   196                 0.00131%

Table 2.2: Clusters features

Figure 2.1: Classifier working scheme

2.4 Simple classifiers

As for the classical AdaBoost algorithm, simple classifiers can be built using different approaches (Schapire, 1999). The choice of the best approach(es) is dictated by the peculiarities of the problem at hand.

The meta-algorithm's classification performance has been tested using two kinds of simple classifiers: Support Vector Machines (SVMs) and behavioural classifiers.

Denote by $F$ and $L$ the total number of fraudulent and licit operations, respectively, and by $N = F + L$ the total number of operations. For a suitable positive integer $n$ ($n = 22$ in our application), let the real-valued vector $x \in \mathbb{R}^n$ denote a single operation, $\mathcal{X} \subset \mathbb{R}^n$ the set of all operations, and $y(x) \in \{-1, +1\}$ the fraud/licit label associated with $x$. The same share $\delta \in (0, 1)$ of $F$ and $L$ is used to train the simple classifiers. The parameters $\delta$ and $\eta$ introduced below control the effect of the choice of training set on the meta-classifier performance.

2.4.1 SVM simple classifiers²

For the proposed meta-classifier, an SVM could be considered for each subset of the features in the data set, giving $2^{22} - 1$ SVMs in our application; nevertheless, the large sample size implies very high computational times, and hence we considered only all one-dimensional subsets and some two-dimensional subsets believed to be meaningful, as suggested by expert advice. Each SVM returns a simple hypothesis $h(x) = -1$ for $x \in \mathcal{X}$ if it classifies the operation as licit, and $h(x) = +1$ otherwise.

In the application at hand, a class of SVMs is built using Scikit-learn (Pedregosa et al., 2011) and applied to the two subsamples. The SVM kernel is chosen through a naive optimization method, as advocated in Min and Lee (2005). Given that complex kernel functions do not provide performance improvements³, the kernel function used is the linear one.

In order to reduce data imbalance in the training set, licit operations are undersampled by randomly picking a fraction $\eta \in (0, 1)$ of them (He and Garcia, 2009). The training set for each SVM considered is thus composed of $(\delta \eta L + \delta F)$ operations. Data imbalance is a common problem in SVM classification and methods to overcome it usually appeal to undersampling, oversampling or cost-sensitive learning (see, among others, Tang et al. (2009) and Schölkopf et al. (2001)).
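A minimal sketch of how such simple SVM classifiers can be built with Scikit-learn, assuming a feature matrix X and labels y in {−1, +1} are available; the sampling shares delta and eta play the roles described above, and the helper function name is hypothetical.

```python
# A minimal sketch: one linear SVM per feature subset, trained on the same
# share delta of fraud and licit operations, with the licit part undersampled
# by the fraction eta to reduce imbalance.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def train_simple_svm(X, y, features, delta=0.8, eta=0.25):
    fraud = np.flatnonzero(y == 1)
    licit = np.flatnonzero(y == -1)
    # Take the share delta of fraud and licit operations for training ...
    fraud_tr = rng.choice(fraud, size=int(delta * len(fraud)), replace=False)
    licit_tr = rng.choice(licit, size=int(delta * len(licit)), replace=False)
    # ... then keep a fraction eta of the licit part (random undersampling).
    licit_tr = rng.choice(licit_tr, size=int(eta * len(licit_tr)), replace=False)
    idx = np.concatenate([fraud_tr, licit_tr])
    clf = SVC(kernel="linear")         # linear kernel, as in the application
    clf.fit(X[np.ix_(idx, features)], y[idx])
    return clf

# Example: one SVM per one-dimensional feature subset.
# svms = [train_simple_svm(X, y, [j]) for j in range(X.shape[1])]
```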

2.4.2 Behavioural simple classifiers

Behavioural simple classifiers are built by tracking the past habits of users and are used to check whether new operations from the same agent are consistent with his/her past. Given that only licit operations are useful to build a user's past habits, behavioural simple classifiers are trained on $\delta L$ chronologically ordered operations. This entails that the percentage and number of past operations can differ across users. A different choice could be made and, for example, a different threshold could be chosen for each agent. The behavioural pattern of each user is defined by which browsers and OS he/she used, the time window in which he/she usually operated, and the geographic coordinates provided by his/her IP address. The list used in our application is not provided here and, in general, should be suggested by prior knowledge of the application field from which the data set comes. A behavioural classifier returns the hypothesis $h(x) = -1$ if operation $x \in \mathcal{X}$ is consistent with the past behaviour of the agent, and $h(x) = +1$ otherwise. For operations whose user does not have a historical profile, behavioural classifiers always return the hypothesis $h(x) = -1$ (in the application at hand, a user has a historical profile if he/she did at least three operations in the training set).

² For a brief introduction to SVM, see Appendix B.
³ Compared kernel functions are: linear, polynomial of degree two and three, and radial basis function.
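A minimal sketch of a behavioural simple classifier of this kind; the attribute keys are hypothetical placeholders for the profiled quantities named above (browser/OS, time window, geographic location).

```python
# A minimal sketch of a behavioural simple classifier. Each operation is
# assumed to be a dict with the hypothetical keys "user", "browser_os",
# "timeslot" and "geo".
from collections import defaultdict

class BehaviouralClassifier:
    def __init__(self, min_history=3):
        self.min_history = min_history
        self.profiles = defaultdict(lambda: defaultdict(set))
        self.counts = defaultdict(int)

    def fit(self, licit_operations):
        # Build user profiles from chronologically ordered licit operations only.
        for op in licit_operations:
            user = op["user"]
            self.counts[user] += 1
            for attr in ("browser_os", "timeslot", "geo"):
                self.profiles[user][attr].add(op[attr])

    def predict(self, op):
        user = op["user"]
        # Users without a historical profile are always labelled licit (-1).
        if self.counts[user] < self.min_history:
            return -1
        profile = self.profiles[user]
        consistent = all(op[attr] in profile[attr]
                         for attr in ("browser_os", "timeslot", "geo"))
        return -1 if consistent else +1
```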

2.5 Two layers assignment

Simple classifiers are selected for the first or second layer according to their accuracy, defined as

$$\rho = \frac{\sum_{x \in \mathcal{X} :\, y(x) = h(x)} |h(x)|}{N} = \frac{\text{number of correctly classified operations}}{\text{total number of submitted operations}},$$

where $|a|$ is the absolute value of a real number $a$.

As shown in Figure 2.1, simple classifiers with a predictive capability $\rho$ greater than a threshold $\theta$ set by the analyst are used in the first layer, simple classifiers with a predictive capability $0.5 < \rho \le \theta$ are used in the second layer, whereas those with a predictive capability $\rho \le 0.5$ are discarded as poorly performing. Let $S$ be the number of simple classifiers, $S(\theta) \le S$ the number of classifiers with $\rho > \theta$, and $M(\theta) \le S$ the number of classifiers with $0.5 < \rho \le \theta$.

For each operation $x \in \mathcal{X}$, the first layer returns a final hypothesis $H(x)$ or sends the operation to the second layer, according to the rule presented in Section 2.6. The second layer is composed of an AdaBoost algorithm built using as weak learners the $M(\theta)$ simple classifiers with $0.5 < \rho \le \theta$. AdaBoost combines the weak classifiers by assigning to each of them a weight resulting from a sequential updating process.
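A minimal sketch of the allocation rule, assuming each simple classifier exposes a scikit-learn-style predict method.

```python
# A minimal sketch of the two-layer assignment: classifiers are ranked by
# their accuracy rho and split around the threshold theta set by the analyst.
import numpy as np

def assign_layers(classifiers, X, y, theta=0.95):
    first_layer, second_layer = [], []
    for clf in classifiers:
        rho = np.mean(clf.predict(X) == y)   # share of correctly classified operations
        if rho > theta:
            first_layer.append(clf)          # best performers: first layer
        elif rho > 0.5:
            second_layer.append(clf)         # intermediate: AdaBoost weak learners
        # rho <= 0.5: discarded as poorly performing
    return first_layer, second_layer
```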

2.6 Classification flow

Let $\mu$ be an integer in $\{1, \ldots, S(\theta)\}$ set by the analyst.

A single operation $x \in \mathcal{X}$ enters the system through the first layer. If $x$ is classified as fraud by a number of simple classifiers in the first layer greater than or equal to $\mu$, then $x$ does not go through the second layer and is labelled as fraud by the meta-classifier, which returns the final hypothesis $H(x) = 1$. Otherwise, if $x$ is classified as fraud by fewer than $\mu$ simple classifiers in the first layer, then $x$ is processed by the second layer and the final hypothesis $H(x)$ is determined by the AdaBoost classifier alone.

The meta-classifier, in both the actual composition of each layer and its performance, depends upon four parameters: $\delta \in (0, 1)$, the share of operations used to train the classifiers; $\eta \in (0, 1)$, the fraction of licit operations retained to balance the training set; $\theta \in (0.5, 1)$, used to decide the allocation of simple classifiers to the two layers; and $\mu \in \{1, \ldots, S(\theta)\}$, which determines the majority rule used in the first layer. A sensitivity analysis is presented in Section 2.9.
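A minimal sketch of this flow, assuming a list of fitted first-layer classifiers and a fitted second-layer AdaBoost, each exposing a predict method that returns labels in {−1, +1}.

```python
# A minimal sketch of the classification flow: an operation is flagged as
# fraud when at least mu first-layer classifiers say so; otherwise the
# second-layer AdaBoost decides.
def classify(x, first_layer, adaboost, mu=1):
    alarms = sum(1 for clf in first_layer if clf.predict([x])[0] == 1)
    if alarms >= mu:
        return 1                      # final hypothesis H(x) = +1: fraud
    return adaboost.predict([x])[0]   # otherwise the second layer decides
```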

2.7 Empirical assessment of meta-classifier performance

To strike a good trade-off between computational time and the reduction of the haphazardness due to random undersampling, results have been validated through a Monte Carlo process of 100 iterations.

The test set is composed of $(1 - \delta)F$ fraudulent operations and an equal amount of randomly picked licit operations not used in the training phase. The classifier accuracy is defined as

$$GP = \frac{\sum_{x \in \mathcal{X} :\, H(x) = y(x)} |H(x)|}{N},$$

that is, the proportion of test data correctly classified by the meta-classifier. The true and false positive rates are

$$TP = \frac{\sum_{x \in \mathcal{X} :\, H(x) = y(x) = 1} |H(x)|}{F} \qquad \text{and} \qquad FP = \frac{\sum_{x \in \mathcal{X} :\, H(x) = 1,\, y(x) = -1} |H(x)|}{L},$$

respectively.
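A minimal sketch of the three measures, assuming H and y are NumPy arrays of predicted and true labels in {−1, +1}.

```python
# A minimal sketch of the performance measures GP, TP and FP defined above.
import numpy as np

def gp_tp_fp(H, y):
    F = np.sum(y == 1)                         # number of fraudulent operations
    L = np.sum(y == -1)                        # number of licit operations
    GP = np.mean(H == y)                       # overall accuracy
    TP = np.sum((H == 1) & (y == 1)) / F       # true positive rate
    FP = np.sum((H == 1) & (y == -1)) / L      # false positive rate
    return GP, TP, FP
```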

The meta-algorithm has been tested using several values of the parameters $\theta$, $\mu$, $\eta$ and $\delta$, and with two different sets of simple classifiers: pure SVMs, and SVMs & behavioural. The following presents results for the parameter sets which maximize accuracy. The distance between TP and FP for both the Domestic and Business users clusters and a synthesis of results for the whole parameter space are given in the next sections.

The explored parameter space is given by: two different values of $\mu \in \{1, S(\theta)/2\}$, which determines how many simple classifiers in the first layer are needed to mark an operation as fraud; four different values of $\delta \in \{0.5, 0.6, 0.7, 0.8\}$, which determines the training set size; ten values of $\theta \in [0.90, 0.99]$, discretized in steps of 0.01, which rule the first/second layer assignment; and, finally, five values of $\eta$, needed to balance data skewness, varied so as to keep the ratio between licit and fraud operations between 1 and 5.

2.7.1 Pure SVM simple classifiers

Results concerning the pure SVM version of the meta-classifier, for both the Business and Domestic clusters, are summarized in Table 2.3. The table shows the parameter values which maximize GP. The values shown for GP, TP and FP represent the mean and, in parentheses, the standard deviation recorded in 100 Monte Carlo iterations, in each of which the composition of the set of licit operations has been varied by randomly picking $(1 - \delta)F$ licit operations. The best performance is achieved for $\delta = 0.8$, $\eta = 1{:}4$, $\mu = 1$ and $\theta = 0.98$ (although for $\eta = 1{:}5$ performance is similar).

Domestic Users Cluster

GP                TP           FP                TP − FP   δ     η     µ   θ
0.9834 (0.0141)   1.0 (0.0)    0.0331 (0.0282)   0.9668    0.8   1:4   1   0.98

Business Users Cluster

GP                TP             FP                TP − FP   δ     η     µ   θ
0.9587 (0.0100)   0.9534 (0.0)   0.0359 (0.0201)   0.9175    0.8   1:4   1   0.90

Table 2.3: Pure SVM Classifier Performances

As shown in the upper part of Table 2.3, in the Domestic users cluster more than 98% of instances have been correctly classified, with a TP rate equal to 100%. For the Business users cluster, performance slightly decreases, reaching an accuracy of about 95% with a false alarm rate lower than 4%. The perfect accuracy in detecting frauds in the Domestic users cluster suggests that fraudulent operations have certain peculiarities which allow the meta-classifier to distinguish them from licit operations. The key features seem to be the IP address and the authentication method, which provide excellent performance as simple classifiers. Table 2.4 reports the ρ values for nine SVMs and four behavioural simple classifiers for the Business and Domestic clusters, for η = 1 and δ = 0.5. The excellent predictive capability of SVM 3 explains the values of TP equal to one.

Classifier      Domestic cluster ρ   Business cluster ρ
SVM 1           0.982                0.963
SVM 2           0.996                0.995
SVM 3           1                    1
SVM 4           0.843                0.496
SVM 5           0.164                0.510
SVM 6           0.899                0.606
SVM 7           0.973                0.951
SVM 8           0.966                0.979
SVM 9           0.993                0.803
Behavioural 1   0.843                0.478
Behavioural 2   0.843                0.526
Behavioural 3   0.843                0.162
Behavioural 4   0.843                0.162

Table 2.4: Weak Learners Performances

Figure 2.2 shows a ROC analysis of the results achieved, over the whole parameter space investigated, for both clusters: the Domestic cluster in the left plot and the Business cluster in the right plot. Each point in the two plots represents the mean value recorded in 100 Monte Carlo iterations for a combination of parameters. The ROC convex hull (Provost et al., 1997) for the best TP and FP ratios achieved in the Monte Carlo simulations is shown as well.

As can be noticed from the lower left corner of the right-hand plot of Figure 2.2, some parameter sets provide a poor TP rate. In particular, the worst TP results are achieved for parameter sets which include θ = 0.99 and an empty first layer. This strongly supports our claim that a two-layer architecture improves performance more than a plain combination of simple classifiers, such as AdaBoost, does. As mentioned before, the meta-classifier was meant to achieve a high TP rate rather than a high global performance. This has been achieved for most values of the parameters, at the cost of a higher FP rate for some of them.

2.7.2 SVMs & Behavioural simple classifiers

Results concerning the SVMs & Behavioural version of the meta-classifier and the corresponding ROC analysis, for both clusters, are shown in Table 2.5 and Figure 2.3, respectively.

Figure 2.2: ROC analysis of meta-classifier performances for Domestic (left) and Business (right) clusters for the pure SVMs version

Domestic Users Cluster

GP                TP           FP                TP − FP   δ     η     µ   θ
0.9834 (0.0155)   1.0 (0.0)    0.0331 (0.0310)   0.9668    0.8   1:4   1   0.98

Business Users Cluster

GP                TP               FP                TP − FP   δ     η     µ   θ
0.9599 (0.0072)   0.9418 (≃ 0.0)   0.0218 (0.0144)   0.9199    0.8   1:4   1   0.92

Table 2.5: SVM & Behavioural Classifier Performances

Comparison of the global performances of the two versions of the meta-classifier shows that the inclusion of behavioural simple classifiers does not affect the meta-classifier performance on the Domestic users cluster, whereas the overall performance on the Business users cluster slightly improves. See also Table 2.4 for the generally unsatisfactory performance of behavioural simple classifiers: due to their poor accuracy they do not enter the first layer, and their contribution to the second layer is negligible. The parameter sets for which the best accuracy is achieved are the same for both versions of the meta-classifier, apart from θ, which for the Business cluster changes from 0.90 to 0.92. The ROC analyses, too, are quite similar to those in Section 2.7.1, confirming the poor performance of behavioural classifiers.

Although behavioural simple classifiers perform poorly on our data set, for other data sets behavioural simple classifiers based on whole sessions and on the past history of single agents are likely to perform well and enter the second layer of the meta-classifier.

Figure 2.3: ROC analysis of Meta-classifier performances for Domestic (left) and Business (right) clusters for SVMs and Behavioural version of meta-classifier

2.8 Cross Validation

To assess the reliability of the meta-classifier, further tests have been conducted. The first one is on a data set created from the original online banking data set by randomly shuffling the elements of the data matrix. More precisely, the data set has been organized in two matrices $D^L$ and $D^F$ with dimensions $L \times 22$ and $F \times 22$, respectively. Shuffling row indexes, $\bar{D}^L$ and $\bar{D}^F$ are created, where

$$\bar{D}^L_{i,j} = D^L_{\sigma(i),j}, \qquad \bar{D}^F_{i,j} = D^F_{\sigma(i),j},$$

with $\sigma(i)$ permutations of $\{1, \ldots, L\}$ and $\{1, \ldots, F\}$, respectively. This gives two synthetic data sets, one for Domestic users and one for Business users.
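A minimal sketch of this shuffling; for the synthetic data set to differ from a mere row reordering, we assume an independent permutation is drawn for each column, which preserves each column's marginal distribution while breaking the dependence between features.

```python
# A minimal sketch of the shuffling used to build the synthetic data sets,
# under the assumption of an independent permutation per column.
import numpy as np

def shuffle_columns(D, seed=0):
    rng = np.random.default_rng(seed)
    D_bar = D.copy()
    for j in range(D.shape[1]):
        D_bar[:, j] = rng.permutation(D_bar[:, j])
    return D_bar

# D_L_bar = shuffle_columns(D_L)   # licit matrix,  L x 22
# D_F_bar = shuffle_columns(D_F)   # fraud matrix,  F x 22
```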

SVM simple classifiers have been created for the two synthetic data sets in order to test the pure SVM version of the meta-classifier. As for the real data, the meta-classifier's performance is evaluated through a Monte Carlo process with 100 runs. A synthesis of the results is shown in Table 2.6 and Figure 2.4.

Domestic Users Cluster

GP                TP           FP                TP − FP   δ     η     µ   θ
0.9784 (0.0182)   1.0 (0.0)    0.0431 (0.0365)   0.9568    0.8   1:3   1   0.95

Business Users Cluster

GP                TP               FP                TP − FP   δ     η     µ   θ
0.9383 (0.0156)   0.9767 (≃ 0.0)   0.1001 (0.0312)   0.8766    0.8   1:4   1   0.92

Table 2.6: Pure SVM Classifier Performances on shuffled data

Figure 2.4: ROC analysis of Meta-classifier performances for Domestic (left) and Business (right) clusters for pure SVMs version of meta-classifier on synthetic data

Results in Table 2.6 and Figure 2.4 show a slight worsening of GP rates for both clusters, mainly due to an increase in false positive rates. Unsurprisingly, this suggests that the data set embeds some peculiarities of licit operations useful to distinguish them from fraudulent operations.

Due to the lack of freely available data sets for fraud detection, we resorted to testing the architecture of the meta-algorithm on the NSL-KDD (Tavallaee et al., 2009), a data set derived from the KDD'99 (Stolfo et al., 2000). The NSL-KDD is a standard data set for testing intrusion detection algorithms. It provides synthetic network connection records organized in a train set matrix of 125,973 rows and two distinct test set matrices: the easier one, called KDDTest+, composed of 22,544 rows, and the harder one, called KDDTest−21, composed of 11,850 rows. Both train and test sets have 42 columns for each row and a label recording whether or not the connection represents a network attack.

To compare performance results with those of the six machine learning algorithms presented in Tavallaee et al. (2009), the meta-classifier has been trained on 20% of the train set. Results on both test sets are presented in Table 2.7. For details on the algorithms tested in Tavallaee et al. (2009), namely J48, Naive Bayes, NB Tree, random forest, multi-layer perceptron and simple SVM, we refer to that paper.

As the upper part of Table 2.7 shows, the meta-algorithm performs slightly better on the KDDTest+ set than the other simple machine learning algorithms. On KDDTest−21, our meta-algorithm outperforms the simple machine learning algorithms. It is worthwhile to highlight that the meta-algorithm, on both data sets, strongly outperforms the simple SVM, showing that the proposed architecture plays a key role in the performance achieved.

Test set     Meta Alg.   J48     Naive Bayes   NB Tree   Random Forest   Multi-Layer Percep.   SVM
KDDTest+     82.35       76.56   82.02         80.67     81.59           77.41                 69.52
KDDTest−21   69.27       63.97   66.16         63.26     58.51           57.34                 42.29

Table 2.7: Comparison of meta-classifier and simple machine learning algorithms performances on NSL-KDD test sets

2.9 A sensitivity analysis

To assess the effect of parameter choice on the meta-classifier performance, a sensitivity analysis has been carried out. The effect of varying the parameters on meta-classifier performance for the Business cluster synthetic data is reported in Figures 2.5 and 2.6. Meta-classifier performance responses are quite similar for all data and all clusters; hence Figures 2.5 and 2.6 show performance for the Business cluster synthetic data, where the effect of the variation of the parameters is particularly evident.

As expected, an increase of δ, which corresponds to an increase of the information provided in the training phase, improves performance. Increasing η, that is, increasing the ratio between licit and fraudulent operations, decreases the meta-classifier performance, confirming that balancing the training set improves results. The effects of µ are less clear, even if the darkest points, that is, the ones for which µ = 1, are closer to the top left corner of the left side of Figure 2.6. Finally, increasing θ, that is, increasing the threshold used to assign simple classifiers to the first layer, causes a decay in meta-classifier performance, underlining the importance of the first layer in performance achievement.

Figure 2.5: ROC analysis of Meta-classifier performances response to δ (left) and η (right) change for Business clusters for pure SVMs version of meta-classifier on synthetic data

2.10 Conclusions

In this work we proposed a statistical automatic classifier designed to quickly detect anomalies in skewed, very large data sets, and applied it to a problem of real-time fraud detection in an online banking framework. The theoretical novelty consists in the architecture of the algorithm, which combines existing statistical methods, such as Support Vector Machines and behavioural classifiers, with an AdaBoost meta-classifier. The classifier has been tested on a sample composed of about 15 million real-world transactions in the form of server logs, on a shuffling of the same data set, and on the NSL-KDD data set. Due to the peculiarities of the fraud detection problem, our first goal was to achieve a good fraud detection rate while keeping the false positive rate as low as possible. An accurate choice of parameters allows the achievement of this target, avoiding the over-fitting problems mentioned in Section 2.3.

Figure 2.6: ROC analysis of meta-classifier performance response to µ (left) and θ (right) changes for the Business cluster, for the pure SVMs version of the meta-classifier on synthetic data

The meta-classifier could be applied to the analysis of data sets from a number of fields. The algorithm is designed to work automatically with little intervention from the analyst. Nevertheless, its correct working requires both a good knowledge of the problem at hand and an accurate data preprocessing. Manual tuning of the meta-classifier by the analyst is required at three levels:

1. the choice of behavioural classifiers to include in the list of simple classifiers;
2. the choice of the number of SVMs to consider as simple classifiers, out of all possible SVMs corresponding to all combinations of the data set features;
3. the choice of parameter values, which for our application are high values of δ and η and low values of µ and θ, as shown in Section 2.9.

We presented the meta-classifier for an outlier detection problem and considered binary labelled data, but the same architecture can easily be adapted to multi-class classification problems. This requires a different choice of simple classifiers, the use of a multi-class AdaBoost, and the choice of an appropriate hypothesis function H.


The time needed by the algorithm to preprocess and classify an instance/operation, on our machine, is about 0.002 seconds. This means that the algorithm, which we tested offline, is able to work in a real-time framework (for example, in real-time online fraud detection), both alone and as part of a pipeline of several fraud detection mechanisms.


Chapter 3

A Statistical Analysis of Reliability of Audit Opinions as Bankruptcy Predictors

3.1 Introduction

Over time, bankruptcy prediction has been a target that many research efforts have tried to accomplish. Starting in the late 1960s and early 1970s, several models were proposed, basing the analysis on traditional financial ratios (Beaver (1966); Beaver (1968); Altman (1968)).

As information technology evolved and investors felt the need for more trustworthy predictions, bankruptcy prediction models progressed through the use of ever more advanced techniques, based on data mining (Divsalar et al. (2012)), intelligent modeling techniques (Demyanyk and Hasan (2010)), and artificial neural networks (Alfaro et al. (2008)).

On a parallel track, many studies rely on the relevance of the going concern opinion (GCO), through which seasoned auditors report the early warning signs of bankruptcy (see, among others, Casterella et al. (2000)).

The global financial crisis that started in 2007, along with recent financial scandals, has brought increasing attention to the audit opinions issued on distressed clients. The GCO represents one of the most relevant judgments the auditors express, as it is even able to affect the equity markets (Blay et al. (2011)) and the investors' behavior (Menon and Williams (2010)). Given the importance of the GCO, several papers in the literature try to explain the relationship between the GCO and bankruptcy prediction.

Lennox (1999) argued that the role of auditors is foremost to warn investors when a company is likely to go into bankruptcy; hence, auditors are obliged to issue a "going concern qualification". Some scholars took the opposite view and concluded that the GCO, at best, offers only marginal information to stakeholders (Mutchler et al. (1997)). Still other authors gauged the reliability of auditors in issuing a GCO and found that a high number of opinions tend to be wrong about the likelihood of bankruptcy: actual outcomes often turn out to be quite different from what the auditors were predicting (Malgwi and Emenyonu (2004)).

Uneven results of this nature are justified by auditors, who say that they are only responsible for reporting the past and the present. They do not consider themselves "clairvoyants"; therefore, they should not be held responsible for predicting the future of a company (Casterella et al. (2000)). Investors pay increasing attention to the GCO because they consider it a preliminary bankruptcy warning signal. Investors thus need a transparent and credible audit opinion in order to make decisions; they would not rely on audited financial reports if they considered that opinion to be of low credibility (Robertson and Houston (2010)).

Still other authors worried that a GCO issued on a firm would serve as a "self-fulfilling prophecy", accelerating its failure by reducing public confidence in the firm's ability to continue as a going concern (Geiger et al. (2012); Pryor and Terza (1998); Citron and Taffler (1992); Merton (1968)). An audit opinion coherent with the real business situation of the audited firm could reduce the information asymmetry between capital demand and supply and thus improve investors' awareness of the risks they run in investing in the audited companies (Holt and DeZoort (2009)).

The choice to focus the analysis on the financial institutions is due to the fact that the existing literature on the relationship between auditing and bankruptcy prediction pays considerably greater attention to the industrial sector (Geiger et al. (2012); Wertheim and Fowler (2012)). We believe that the financial sector is even more relevant, since it involves a wider range of stakeholders. Relatively few researchers have written papers aligned with our focus on the financial sector (Ravi Kumar and Ravi (2007); Malgwi and Emenyonu (2004)).

We focus on the US because it is still considered the premier market and financial center, and also because in the US the Sarbanes-Oxley Act (SOX) has a larger impact on the audit opinion than comparable laws in other countries. The US is also the place where frauds, scandals, and collapses have the biggest resonance; thus, it deserves greater attention from regulators.

3.2 Literature Review

3.2.1 GCO

Auditors have the responsibility to issue an audit opinion in order to assure that the financial reporting gives a true and fair view in accordance with the financial reporting framework used for the preparation and presentation of the financial statements.

If the auditor has substantial doubts about the firm's ability to continue as a going concern, he/she has to issue a modified GCO. Such a modification of the opinion is called an "emphasis of a matter", and it informs users of uncertainties or disagreements over accounting principles. If this emphasis of a matter regarding the going concern is not sufficient to express the severity of the financial situation of the firm, the auditor must issue a qualified opinion, indicating the reasons for this choice.

The term "going concern" is based on the "continuity assumption" that an entity will continue in operation for the foreseeable future and will be able to realize its assets and discharge its liabilities. The modification of the opinion should be useful for the stakeholders, to be informed of the financial conditions of the firm, and for the management, to take corrective actions, especially to prevent the failure of the firm.

The guidelines on going concern involve both accounting and auditing standards regulating the preparation and evaluation of the financial statements of listed companies.

The accounting standards provide a description of the going concern principle in the International Accounting Standards (IAS): a firm has to prepare its financial statements under going concern conditions and, if the management has significant doubts about the ability of the entity to continue as a going concern, the uncertainties must be disclosed.

The issuance of the auditing standards related to the going concern started in 1974, when the American Institute of Certified Public Accountants (AICPA) issued the Statement on Auditing Standards (SAS) 2, and continued with SAS 34, SAS 59, SAS 126, and the International Standard on Auditing (ISA) 570.


Whereas there is no relevant difference between SAS 126 and SAS 59 (SAS 126 is just a clarity redraft of the previous standard), there are instead differences between SAS 59 and SAS 34. Some authors, indeed, argue that SAS 59 was issued in order to reduce the investors' surprise related to bankruptcy (Asare (1990); Holder-Webb and Wilkins (2000)).

Moreover, while SAS 34 allows auditors to express their concerns about the continuity of the company by issuing a qualified opinion, SAS 59 allows them to issue an unqualified "modified" opinion. SAS 59 provides the following four categories of conditions/events that may raise substantial doubt about going concern:

• negative trends in financial ratios;
• indicators of possible financial difficulties;
• internal matters;
• external matters.

The guidance contained in SAS 59 leaves much to auditors’ discretion, thus a huge part of the auditors’ judgments is based on their perceptions and the external events impacting their profession.

According to these considerations, auditors can commit two types of errors in modifying an audit opinion for substantial doubt about going concern: Type 1 is a false positive, which occurs when the auditor issues a GCO and the firm continues in business; Type 2 arises when the firm is going to fail and the auditor does not issue a GCO. As causes of the Type 1 error, Kida (1980) found the "self-fulfilling prophecy" effect and a deteriorated relationship with the client. As for Type 2, the risk of lawsuits by creditors and the loss of reputation could be factors explaining the error.

Prior literature streams attempted to find out the elements affecting the decision to issue a GCO, such as the financial conditions of the audited firms, litigation, turnaround initiatives, and the size of the audit firm (Bruynseels et al. (2013); Reynolds and Francis (2000); Blay et al. (2011); Musvoto and Gouws (2011); Chen et al. (2012)). Bruynseels et al. (2013) investigated the link between management's turnaround initiatives and auditors' opinions, finding that turnaround actions are associated with a higher likelihood of receiving a GCO. Reynolds and Francis (2000) questioned whether client size affects the propensity of auditors to issue a GCO. They considered economic dependence and reputation protection as variables of their study, finding that the issuance of a GCO by Big 5 audit firms is not affected by client size. Blay et al. (2011) provided evidence that the GCO is considered an external communication of risk, as this type of audit opinion allows stakeholders to have incremental information related to distressed firms.

Musvoto and Gouws (2011) argued that the GCO assumption is anti-measurement in nature, as it is difficult to measure the attributes of accounting phenomena under GCO assumptions. Chen et al. (2013) evaluated the link among insider trading, litigation, and the GCO, finding that the probability of receiving a GCO is negatively associated with the level of insider selling.

In the American context, the issuance of the SOX of 2002 can be considered an answer to recent accounting frauds, but it did not change going concern issuing regulations (Bellovary et al. (2006)). Regarding this matter, some scholars tried to characterize the state of the going concern decision in the post-SOX era (Nogler and Jang (2012)).

3.2.2 Bankruptcy Prediction Models and GCO

Even if the fundamentals of bankruptcy prediction models can be found in historical contributions based on financial ratios (Beaver (1966); Beaver (1968); Altman (1968); Altman and Hotchkiss (2006)), attempts were carried out to improve the effectiveness of bankruptcy prediction, taking advantage of other, increasingly sophisticated models (Demyanyk and Hasan (2010)).

Among them are logistic regression techniques (Logit) (Ohlson (1980)), early warning systems (Davis and Karim (2008)), and artificial neural networks to forecast the main financial ratios (Celik and Karatepe (2007)) or predict the outcome of Chapter 11 bankruptcy (Luther (1998)). In the succeeding years, scholars tried to identify warning bankruptcy signs among disclosure issues, through data mining (Divsalar et al. (2012)), text mining (Shirata et al. (2011)), multivariate analysis (Mutchler (1985)), multivariate adaptive regressions (De Andrés et al. (2011)), and more advanced fuzzy clustering analysis (Lenard et al. (2000)).

Most research dealt with bankruptcy prediction under the implicit assumption of finding the factors which could span the information in the financial ratios (Pinches et al. (1973); Zavgren (1985)), determining the most critical and sensitive financial ratios which could represent an early warning against bankruptcy risk (Altman (1968); Beaver et al. (2005)).

According to a wide range of scholars, auditors could have a key role in assessing the bankruptcy risk and thus in preventing a financial collapse.


Hodges et al. (2005) analyzed whether the most common cross-sectional bankruptcy predictors (Altman Z-score, cash flows, and financial ratios), along with the audit opinion, observed in the three years before the bankruptcy, provided trustworthy predictive information about the collapse. The results show that the audit opinion does not represent an effective warning sign for impending bankruptcy and, at the same time, neither do the other predictors provide very reliable information about the bankruptcy risk. In the same literature stream, Malgwi and Emenyonu (2004) focused on United Kingdom (UK) financial institutions, considering a time span going from 1977-1978 to 1999-2000. They wondered whether there exists an association between the bankruptcy of banks and the preceding audit opinions, and thus used audit opinions as a proxy to evaluate the auditors' effectiveness in predicting failure. They found that, contrary to their expectations, a high number of unqualified opinions had been issued before the bankruptcy. A similar analysis was carried out by Casterella et al. (2000), who observed that auditors do not consider themselves "clairvoyants", thus they should not be required to predict the future of a company. Furthermore, in some cases, issuing a qualified opinion might also affect the events and might lead companies to go bankrupt (Hopwood et al. (1989)).

However, there is a rich literature supporting the role of the GCO in predicting the failure of a company. Hopwood et al. (1994) found that audit opinions do not have a lower ability to predict bankruptcy than financial ratio-based models, contrary to what was expected.

Mutchler et al. (1997) examined whether auditors issuing a GCO on soon-to-be-bankrupt companies are influenced by contrary information (e.g., default on debt) and by mitigating factors that offset such contrary information. Results suggest the existence of a significant correlation between GCO decisions and the probability of bankruptcy. By using three variables to indicate the debt status of the companies observed, namely payment, covenant defaults, and cured defaults, their study represents the next step in the research of Chen and Church (1992), who analyzed the correlation between the GCO and a single debt status variable only. In turn, the analysis of Mutchler et al. (1997) was taken up and extended by Foster et al. (1998), who analyzed the usefulness of debt default and the GCO in bankruptcy risk assessment. They found that loan defaults and loan covenant violations explain the bankruptcy at the time of the last annual report issued before the violation happened.

According to Lennox (1999), one of the roles of the auditors is precisely to warn investors when a company is likely to go bankrupt. He underlined that if there is a possibility that a company will cease to trade in the foreseeable future, then the auditors must issue a GCO.


Moreover, Bryan et al. (2005) considered that if the role of the GCO is to anticipate the signal of a possible bankruptcy, then the stakeholders should have the possibility to defend themselves against the risk of losses by carrying out timely actions.

3.3 Research Design

In this research, we measure the reliability of audit firms in predicting bankruptcy for US listed financial institutions. The focus of our research is the set of financial institutions which filed for Chapter 11 from 2002 to 2011. We believe that the sector of financial companies is the most significant, given its impact on a wide range of stakeholders and even on the economic system as a whole. We noted, however, that in spite of such significance, quite little attention has been paid to it by scholars so far; they instead devote much more space and interest to industrial companies.

Our main assumptions concern the role that traditional accounting ratios may have in affecting the probability of the issuance of a GCO. Specifically, a worsening of financial ratios should increase the probability for auditors to issue a GCO and for companies to file for Chapter 11, according to a literature stream and professional associations (Hopwood et al. (1994); Lennox (1999)).

We formulate our research questions on the basis of the controversial findings about the reliability of the auditors in predicting bankruptcy and in accordance with the literature review. Specifically, in addition to the stream of literature that supports the relevance of financial ratios in supporting the auditors' decisions about the issuance of a GCO, we further consider that the impact of financial ratios on auditors' decisions is supported by SAS 59 itself, which lists, among others, negative trends in financial ratios as a condition that may cast doubt on the company's ability to continue as a going concern.

According to such a diversified set of sources considering financial ratios as affecting the issuance of a GCO, we formulate our first research question as follows:

• RQ1: Which ratios are mostly correlated with the issuance of a GCO?

Starting from the first research question, we will have to go in depth into the usefulness of financial ratios in predicting the risk of bankruptcy. Even where we find a correlation between some specific ratios and the issuance of a GCO, we cannot thereby assert that those ratios are also useful predictors of bankruptcy risk, because the issuance of a GCO might not be correlated with the bankruptcy risk itself.

Therefore, we need to make a preliminary investigation of the capacity of financial ratios to predict bankruptcy risk, also considering that a diversified part of the literature supports the use of financial ratios in predicting bankruptcy or as warning signals (Altman (1968); Beaver et al. (2005)). We thus aim to check whether financial ratios are really useful to predict the risk of bankruptcy. On this basis, we formulate our second research question as follows:

• RQ2: Are financial ratios useful in predicting the risk of bankruptcy?

As emerges from the above literature analysis, a large part of scholars believe that the audit opinion aims to provide the stakeholders with the most reliable information about the bankruptcy risk of the company (Casterella et al. (2000); Lennox (1999)). On this basis, we formulate our third research question as follows:

• RQ3: Is the audit opinion helpful in predicting the risk of bankruptcy?

3.4 Sample Selection and Data Collection

As a first step, we collected from the Edgar Securities and Exchange Commission (SEC) database all the 996 US listed companies that filed for Chapter 11 between 2002 and 2011. Companies file for Chapter 11 when they, or their creditors, ask for protection under the bankruptcy laws of the US in order to restructure their financial conditions. We consider Chapter 11 filing as a proxy of bankruptcy.

From the websites of all the 996 companies, we detected the sector and excluded those outside the financial sector. This skimming produced a list of 60 financial US listed firms (see Table 3.1). For each company, using Thomson Reuters Datastream, we extracted the audit opinions and the classical accounting performance ratios used for financial statement analysis, excluding firms with missing values. We ended up with a list of 42 companies. We then divided the financial ratios into three categories, depending on the financial statement document to which they refer: statement of cash flow, balance sheet, and income statement (see Table 3.2).


Year                   2002   2003   2004   2005   2006   2007   2008   2009    2010   2011   Total
All firms              252    161    75     67     77     44     72     132     64     52     996
Financial firms (FF)   6      8      3      4      4      4      6      17      6      2      60
Share of FF            2.4%   5%     4%     6%     5.2%   9.1%   8.4%   12.9%   9.4%   3.8%   6.02%

Table 3.1: Distribution of US Listed Firms That Filed for Chapter 11 in the Sample Period (2002-2011)
Notes: "All firms" counts all US listed firms that filed for Chapter 11 in the sample time period, per year; "Financial firms (FF)" counts all US listed financial firms that filed for Chapter 11 in the sample time period, per year; "Share of FF" is the percentage of US listed financial firms over all US listed firms that filed for Chapter 11 in the sample period.

Statement of cash flow ratios:
• Cash flow/sales (CFS)
• Increase/decrease in cash and short-term investments (IDCSTI)
• Decrease in investments (DI)
• Increase in investments (II)
• Reduction in long-term debts (RLTD)

Balance sheet ratios:
• Convertible debts (CD)/total assets
• Short-term debts (STD) and current portion of long-term debts (LTD)/total assets
• Total debts (TD)/total assets
• Total shareholder equity (TSE)/total assets
• Discontinued operations (DO)/total assets
• Long-term borrowings (LTB)/total assets
• LTD/total assets
• Net debts (ND)/total assets
• Other liabilities (OL)/total assets

Income statement ratios:
• Net income available to common (NIAC)
• Earnings before interest and tax (EBIT)
• Net sales (NS) or revenues
• Operating income (OI)
• Equity in earnings (EE)
• Return on assets (ROA)
• Return on investment (ROI)
• Return on sales (ROS)
• Return on equity (ROE)

Table 3.2: Set of Financial Ratios Used in the Analysis

Total number of US listed firms   Number of financial firms   Number of financial firms   Net number of financial firms
filing for Chapter 11             filing for Chapter 11       with no data available      filing for Chapter 11
996                               60                          18                          42

Table 3.3: The sample


The Edgar SEC database provided us with the type of qualified audit opinions issued during the sample time period (e.g., going concern or other). Table 3.3 classifies the collected data and shows their composition. From the extraction carried out on the Edgar SEC database, we found that 10 firms out of 42 received a qualified audit opinion, for a total number of 21 qualified audit opinions; all of them were GCOs. For each of the 10 firms, we collected the year(s) in which they received a GCO, the year of filing for Chapter 11, and the time lag calculated as the difference between the year of filing for Chapter 11 and the year of the last audit report (see Table 4). The remaining 32 firms received an unqualified opinion.

In order to validate the results of our analysis, we built a matching sample composed of 42 randomly picked US listed financial companies that did not file for Chapter 11, namely, healthy firms.

For each of them, we carried out the same procedure followed for the Chapter 11 filing firms described above. In the matching sample, we found two firms that received a qualified opinion in the time period considered (2002-2011), for a total of three modified audit opinions, and all of them were GCOs. The remaining firms had unqualified opinions.

3.5 Statistical Analysis and Findings

The statistical analysis is composed of the following four steps:

• data preprocessing;
• Logit regression analysis;
• support vector machine (SVM) analysis;
• AdaBoost meta-classifier.

3.5.1 Data Preprocessing

As stated above, the sample is composed of 42 bankrupted firms and 42 healthy firms. For each of them, we considered 23 financial ratios. Let $N = 84$ be the total number of firms and $M = 23$ the total number of financial ratios considered. We denote by $x_{i,j}$, for $i = 1, \ldots, N$ and $j = 1, \ldots, M$, the $j$-th financial ratio of the $i$-th firm, and by $T_i$, for $i = 1, \ldots, N$, the last year of available data for the $i$-th firm. For each firm, we considered a time lag of four years, that is, for each $x_{i,j}$ we built $x^{t}_{i,j}$ for $t = T_i - 3, \ldots, T_i$. The last available year $T_i$ for healthy firms is assigned through a one-to-one matching: the $T_i$ of each bankrupted firm is assigned to a randomly picked healthy firm (Nicolaou (2004)). The resulting data are combined in a matrix $A$ composed of $N$ rows and $M \times 4$ columns.
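A minimal sketch of how the matrix A can be assembled, assuming hypothetical containers ratios[i][t][j] for the ratios and T[i] for the last available year of each firm.

```python
# A minimal sketch of building the N x (M*4) matrix A from the panel of
# financial ratios; the container names are hypothetical.
import numpy as np

def build_A(ratios, T, N, M):
    A = np.empty((N, M * 4))
    for i in range(N):
        for k, t in enumerate(range(T[i] - 3, T[i] + 1)):   # four-year window
            for j in range(M):
                A[i, k * M + j] = ratios[i][t][j]
    return A
```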

3.6 Logit Regression Analysis

To answer RQ1, "Which ratios are mostly correlated with the issuance of a GCO?", we used Logit regressions (Greene (2003)). Logit regressions allow us to highlight which financial ratios are most correlated with going concern issuance.

To avoid problems related to the strong multicollinearity of our data, we used a naive multiple Logit regression approach (Fraser and Hite (1990)): we regressed each column of the matrix $A$ against a label vector $y^{GC} \in \mathbb{R}^N$. Each element of $y^{GC}$ is denoted by $y_{i,t} \in \{0, 1\}$, where 0 indicates that the $i$-th firm did not receive a GCO at time $t$, and 1 indicates that the $i$-th firm received a GCO at time $t$.
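The regressions in the thesis were run in Gretl; the following is a minimal Python sketch of the same naive column-by-column Logit approach using statsmodels, assuming A and the 0/1 label vector y_gc are available as NumPy arrays.

```python
# A minimal sketch of the naive multiple Logit approach: one univariate
# Logit regression per column of A, collecting slope, standard error and
# z-test p-value.
import statsmodels.api as sm

def columnwise_logit(A, y_gc):
    results = []
    for j in range(A.shape[1]):
        X = sm.add_constant(A[:, [j]])        # intercept + single regressor
        fit = sm.Logit(y_gc, X).fit(disp=0)
        beta, se, pval = fit.params[1], fit.bse[1], fit.pvalues[1]
        results.append((j, beta, se, pval))   # z-test p-value for significance
    return results
```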

Table 3.4 shows which financial ratios are correlated with the label vector. Statistical significance is evaluated with a Z-test (Sprinthall and Fisk (1990)).

Financial ratio                      Significant for          β       Std. error
                                     going concern (p-value) (5)
CFS T - 3                            ***                   -1.030     0.333
CFS T                                ***                   -1.460     0.275
NIAC T                               ***                   -0.826     0.250
OI T - 2                             ***                   -1.040     0.349
OI T - 1                             **                    -0.662     0.277
ROE T - 3                            ***                   -0.855     0.322
ROA (net income/total assets) T      **                    -0.577     0.231
ROI (EBIT/total assets) T            **                    -0.575     0.231
ROS T                                ***                   -1.744     0.302
STD/total assets T                   **                     0.607     0.259
TD/total assets T                    **                     0.576     0.231
TSE/total assets T                   **                    -0.576     0.231
ND/total assets T                    **                     0.575     0.231
OL/total assets T                    **                     0.575     0.231

Table 3.4: Financial ratios significant for the issuance of a GCO

(5) *, **, and *** indicate significance at the levels of 0.05, 0.025, and 0.01, respectively. The signs of the regression coefficients β are consistent with expectations, which supports the soundness of our analysis.


3.7 SVM Analysis

To investigate RQ1 in more depth, we performed an analysis using SVMs.

Using Scikit-Learn, we built an SVM for each feature of the sample assessed as statistically significant by the Logit regression, namely for the 14 financial ratios shown in Table 3.4.

The SVMs are trained using the label vector $y^{GC}$ defined above and a second label vector $y^{BR} \in \mathbb{R}^N$. As for $y^{GC}$, each element of $y^{BR}$ is denoted as $y^{BR}_{i,t} \in \{0, 1\}$, where 0 indicates that the $i$-th firm did not go bankrupt at time $T$, and 1 indicates that the $i$-th firm went bankrupt at time $T$.

We denote with $SVM^{GC}$ the set of SVMs generated using $y^{GC}$ as the label vector and with $SVM^{BR}$ the set of SVMs generated using $y^{BR}$ as the label vector. Both $SVM^{GC}$ and $SVM^{BR}$ are trained on the whole sample (42 healthy and 42 non-healthy firms).

Since the generated SVMs are unidimensional, the support vectors are reduced to scalars.

The mean of the support vectors generated by each SVM in $SVM^{GC}$, denoted with $SVM^{GC}_j$ for $j = 1, \dots, 14$, can be interpreted as the threshold value used by the auditors to issue a GCO; whereas the mean of the support vectors generated by each SVM in $SVM^{BR}$, denoted with $SVM^{BR}_j$ for $j = 1, \dots, 14$, can be interpreted as the threshold value under (above) which a firm will (will not) go bankrupt.

The distances between $SVM^{GC}_j$ and $SVM^{BR}_j$ for $j = 1, \dots, 14$ are normalized between 0 and 1, where a value close to 0 means a low distance whereas a value close to 1 indicates a great distance. Table 3.5 shows that such a distance is always very close to 0.
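The following sketch illustrates the idea on a single ratio; the function names are hypothetical, and since the normalization is not detailed here, a min-max rescaling by the observed range of the ratio is assumed:

```python
import numpy as np
from sklearn.svm import SVC

def svm_threshold(x, y):
    """Mean of the support vectors of a one-dimensional linear SVM."""
    svm = SVC(kernel="linear").fit(x.reshape(-1, 1), y)
    return svm.support_vectors_.mean()

def normalized_distance(x, y_gc, y_br):
    """Distance between the GCO and bankruptcy thresholds, rescaled to [0, 1]."""
    d = abs(svm_threshold(x, y_gc) - svm_threshold(x, y_br))
    return d / (x.max() - x.min())
```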

3.8 AdaBoost Meta-Classifier

To answer our RQ2, “Are financial ratios useful in predicting the risk of bankruptcy?”, we employ an AdaBoost meta-algorithm.

In this work, we present an application of AdaBoost to the auditing domain, aimed at classifying companies that can be expected to receive a GCO or an unqualified opinion. We trained $M \times 4$ SVMs, one for each feature of the dataset, on 50% of the sample, using $y^{BR}$ as the label vector.


Variable                            Distance between $SVM^{GC}$ and $SVM^{BR}$
CFS T - 3                           0.0006
CFS T                               0.0003
NIAC T                              $< 10^{-4}$
OI T - 2                            $< 10^{-4}$
OI T - 1                            $< 10^{-4}$
ROE T - 3                           0.0006
ROA (net income/total assets) T     $< 10^{-4}$
ROI (EBIT/total assets) T           $< 10^{-4}$
ROS T                               0.0004
STD/total assets T                  0.0002
TD/total assets T                   0.0001
TSE/total assets T                  $< 10^{-4}$
ND/total assets T                   0.0001
OL/total assets T                   $< 10^{-4}$

Table 3.5: Distances Between $SVM^{GC}$ and $SVM^{BR}$

The SVMs thus built are used as weak learners in AdaBoost. AdaBoost is trained on the same subsample used to train the SVMs.

The predictive capability of AdaBoost is tested on the 50% of the sample not used to train the algorithm. Results show that AdaBoost is able to correctly classify 75% of the submitted examples. Table 3.6 shows the results in greater detail and compares the bankruptcy predictive capability of AdaBoost with that of the auditors.
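A minimal sketch of this train/test protocol with scikit-learn, assuming a single linear SVC as the weak learner (the actual analysis trains one SVM per feature; here, for brevity, each boosting round refits an SVC on the full feature matrix, and in older scikit-learn releases the parameter is named `base_estimator` instead of `estimator`):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# hypothetical inputs: A is the N x (M*4) ratio matrix, y_br the bankruptcy labels
X_train, X_test, y_train, y_test = train_test_split(A, y_br, test_size=0.5, random_state=0)

# SAMME is used because SVC does not expose class probabilities by default;
# SVC accepts the per-sample weights that AdaBoost updates at every round
ada = AdaBoostClassifier(estimator=SVC(kernel="linear"),
                         n_estimators=50, algorithm="SAMME")
ada.fit(X_train, y_train)
print("global performance:", ada.score(X_test, y_test))
```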

Defining global performance as the ratio of correctly classified instances over the total number of instances submitted, AdaBoost outperforms the auditors (75% versus 53.8%). More in detail, the auditors perform better on healthy firms, wrongly classifying about 1% of instances versus the 12.5% error rate of the AdaBoost classifier (Type 1 error).

Regarding firms that filed for Chapter 11, AdaBoost strongly outperforms the auditors: its error rate (Type 2 error) is much lower (38.1% versus 90.5%).
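These figures can be computed directly from the 0/1 labels and predictions; a short helper with hypothetical names, where 1 means “filed for Chapter 11” (or “received a GCO”):

```python
import numpy as np

def performance_summary(y_true, y_pred):
    """Global performance plus Type 1 and Type 2 error rates as defined above."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    global_perf = (y_true == y_pred).mean()
    type1 = (y_pred[y_true == 0] == 1).mean()  # healthy firms wrongly flagged
    type2 = (y_pred[y_true == 1] == 0).mean()  # Chapter 11 firms missed
    return global_perf, type1, type2
```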

It is important to underline that the results could be influenced by sample size and composition. In order to answer our RQ3, “Is the audit opinion helpful in predicting the risk of bankruptcy?”, in addition to the above mentioned global analysis (on both samples), we now focus our attention only on the Chapter 11 filing firms that received a GCO. By means of such an analysis, we can observe that only 10 out of 42 firms (24%) received at least one GCO (see Table 3.6).

Interestingly, eight firms out of 10 received a GCO just one or a few years before filing for Chapter 11; the remaining two firms received a GCO quite far from the filing. Only for six firms did the auditors perceive a pervasive and systematic risk of bankruptcy, issuing GCOs for more than one year.

Section 1
Auditors' global performance            0.538
AdaBoost global performance             0.75

Section 2                               Chapter 11      Non-Chapter 11
GCOs according to auditors              0.095           0.012 (6)
Unqualified according to auditors       0.905 (7)       0.988

Section 3                               Chapter 11      Non-Chapter 11
GCOs according to AdaBoost              0.619           0.125
Unqualified according to AdaBoost       0.381           0.875

Table 3.6: Comparison of Auditors and AdaBoost Performances

(6) Type 1 error. (7) Type 2 error.

Moreover, for 76% (32 firms out of 42) of the Chapter 11 filing firms, the audit firms committed a Type 2 error, as they did not issue a GCO for them. Other considerations arise from Table 3.5: the distance between $SVM^{GC}_j$ and $SVM^{BR}_j$ is always very close to 0 for each significant ratio. This means that, for those ratios, the threshold values used by auditors are close to the threshold values useful to predict the risk of bankruptcy.

3.9 Conclusions

The present study reveals some interesting findings regarding the reliability of audit opinions as bankruptcy predictors. The percentage of Chapter 11 filing firms that received at least one GCO from audit firms is quite low (24%), and this evidence suggests that the reliability of auditors in predicting bankruptcy is quite low. In order to deepen this evidence and answer our research questions, we carried out some further investigations using statistical methods.


Regarding RQ1, we identified, through a Logit regression model, the ratios deemed relevant by auditors when deciding whether or not to issue a GCO. The financial ratios most correlated with the issuance of a GCO are: CFS, STD/total assets, TD/total assets, ND/total assets, OL/total assets, TSE/total assets, NIAC, OI, ROE, ROA, ROI, and ROS.

About RQ2, the AdaBoost meta-classifier shows that financial ratios could be useful in warning of the risk of bankruptcy: analyzing the financial ratios, AdaBoost is able to properly classify 75% of the firms. These results allow us to better answer our RQ3, even if the reliability of audit opinions in predicting the risk of bankruptcy seems to be quite low, due to a high rate of Type 2 errors (76% considering only Chapter 11 firms and 46.2% considering both Chapter 11 and non-Chapter 11 firms).

The results from the SVM analysis show that auditors take into consideration the right threshold values for each ratio. This evidence highlights that auditors are reluctant to issue a GCO, maybe to avoid self-fulfilling prophecy problems or to comply with the management's plans for the future. The partial failure of the auditors in predicting the risk of bankruptcy could be due to the further information the auditors rely on, aside from financial and economic ratios, in supporting their audit opinions. The scant predictive ability of auditors might also be due to critical relationships with distressed clients, as suggested by some recent literature streams, or to the kind of responsibility that auditors feel they hold.

As emerges from the literature, scholars assert that auditors weigh both contrary information and mitigating factors regarding their client companies (Mutchler et al. (1997)). Therefore, auditors may take into account different or further information regarding their clients. Moreover, SAS 59 requires auditors to consider management plans and to evaluate whether there is some likelihood that the adverse effects will be mitigated in the future for a reasonable period of time.

We are aware that a systematic analysis of annual report ratios and external factors, such as financial crises and regulations, is necessary to better validate our results. We consider it necessary, as well, that professional associations and academics clarify whether or not external auditors have the responsibility to forecast the success or failure of management's business plans and to properly predict the risk of bankruptcy, especially for listed firms. Stakeholders rely on auditors' opinions in their economic decision-making process and thus, when auditors fail to highlight a warning signal, strong concerns about the effectiveness of the audit opinion arise.


Some limitations of our study could arise from the features of the sample, even though it represents the universe of US listed financial firms that filed for Chapter 11 between 2002 and 2011.


Chapter 4

MD&A and Business Performance: a Data Mining Analysis on US Financial Companies

4.1 Literature Analysis

The Management’s Discussion and Analysis (MD&A) is considered the most relevant document issued by managers, as it assesses the liquidity conditions of a company, along with its capital resources and operations.

The MD&A first appeared in 1968 as an element of the Guides for Preparation and Filing of Registration Statements (Securities Act Release, 1968), but only from 1974 did it become a mandatory accompanying document of the annual report (Securities Act Release, 1974).

The mandatory nature of the MD&A was imposed to make public and, above all, understandable the critical information about the predictable trends that may affect the future operations of the business (Li, 2010). Due to its role, the MD&A is the most read and most relevant component of the financial information disclosed by companies (Tavcar, 1998), and in the United States it is the document financial analysts rely on most frequently when they prepare their reports (Knutson (1993); Rogers and Grant (1997)). In the US context, and according to the SEC, the MD&A has three main objectives:


• to provide a narrative explanation of a company's financial statements that enables investors to see the company through the eyes of management;

• to enhance the overall financial disclosure and provide the context within which financial information should be analysed;

• to provide information about the quality of, and potential variability of, a company's earnings and cash flow (SEC, 2003).

We focus on two main literature streams: 1) the quality and fidelity of the MD&A; 2) the reliability of MD&A content in forecasting the future performance of a firm.

According to the first literature stream, the MD&A might not be as informative as intended. A frequent criticism that investors have advanced over the years against the MD&A is that it often focuses on numbers already disclosed in the financial statements, instead of on the future performance of the company, and that, more in general, its quality and usefulness have declined in recent years (Sutton et al. (2012); Brown and Tucker (2011)). Some scholars found that most of the companies issuing an MD&A paid far more attention to historical details than to forecasts (Pava and Epstein, 1993).

Even if regulators have paid attention to improving the quality of the MD&A, also in response to some financial scandals (SEC 2003a, SEC 2003b), management's discretion remains a fact (Sutton et al. (2012); Meiers (2006); Verrecchia (1983)). We believe that this level of discretion might be even higher when a company is entering financial distress. On this topic, authors analysed the association between distress and disclosure, finding that the quality of MD&A disclosures is quite low because of the unethical behaviour of managers, who insufficiently disclose items that would be of interest to investors. Other studies lead to similar conclusions, showing that companies entering distress during good economic times may deliberately use weak and inadequate disclosure as a means of obscuring their poor prospects for recovery (Graham et al. (2005); Desai et al. (2006); Kothari et al. (2009)). From an opposite view, other studies found that, on average, managers of distressed firms increase disclosure quality in the first years of distress (Holder-Webb and Cohen, 2007).

Authors who dealt with the association between financial distress and MD&A disclosure took advantage of quality indexes consisting of scores provided by the personnel of the SEC (Feldman et al., 2008) or by the members of the Toronto Society of Financial Analysts (TSFA) (Clarkson et al., 1999). The critical role of discretion and interpretation, as results of the SEC
