
Within the financial environment, quantitative data are widely used as a fundamental basis for prediction algorithms.

However, as broadly discussed in the first chapter (Subsec. 1.2.2, Sec. 1.3), in the specific Private Equity market the information asymmetry makes the quantitative approach infeasible. Data about the amount invested in each PE deal and traditional financial key performance indicators are rarely available for privately held companies.

For this reason, only qualitative covariates related to the companies are used to build the analyses in this work. Even if this choice adds some complexity to the whole work (a qualitative-to-quantitative transformation is needed), it produces a double benefit:

1. The number of data points is as large as possible, since investor names are available for nearly the entire raw dataset (Tab. 2.1).

2. From an end-user point of view, the data needed to predict the future company status and the time-to-IPO will be easily available. If quantitative covariates, such as the amount invested in each company round or the company financial statements, were required, many users would probably not be able to gather this information.

Chapter 3

Outcome probabilities estimation

This chapter focuses on the forecast of the future status of a private company. These statuses have been grouped into four possibilities: private, public (often written as IPO), acquisition and bankrupt. The goal has been achieved with two different machine learning techniques: the Random Forest (RF) and the Neural Networks (NN). As for the Neural Networks, two different typologies have been explored: MultiLayer Perceptrons (MLP) and Long Short-Term Memory (LSTM). Both algorithms provide a vector of probabilities, one for each possible class, so to end up with a classification, the maximum probability decision rule has been applied. This rule simply states that the label assigned to an observation is the one with the highest probability. All classifications have been evaluated with different performance metrics and compared with each other.
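To make the decision rule concrete, the following is a minimal Python sketch (class names and probability values are purely illustrative) of how a vector of class probabilities is turned into a single predicted status via the maximum probability rule:

import numpy as np

# Possible company statuses, in the same order as the probability columns.
CLASSES = np.array(["private", "public", "acquisition", "bankrupt"])

# Assumed output of a fitted model: one row of class probabilities per company.
proba = np.array([
    [0.55, 0.20, 0.15, 0.10],   # mostly "private"
    [0.10, 0.50, 0.30, 0.10],   # mostly "public" (IPO)
])

# Maximum probability decision rule: pick the class with the highest probability.
predicted_labels = CLASSES[np.argmax(proba, axis=1)]
print(predicted_labels)          # ['private' 'public']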

The chapter is structured as follows: a theory part which describes both algorithms, an experimental section with the adopted tuning procedure and, to conclude, a comparison of the results obtained. Please keep in mind that within this chapter the terms "outcome", "status", "exit" and "label" are used interchangeably.

3.1 Random Forest theory

The Random Forest is an ensemble tree-based method which can be used for both regression and classification problems. The basic element composing this method is called a tree, since it partitions the predictors' space into small regions following a tree analogy which will be described later. The ensemble is then created by "growing" a certain number of trees, and the predictions of all trees are merged in order to provide a single consensus prediction. This method relies on the decorrelation of the forest's trees in order to end up with multiple, different and independent splits of the features' space. Consequently, predictions may differ from tree to tree.

Single decision tree

The theory behind the partitioning of the predictors' space is that at each step the algorithm evaluates different splits for each predictor and then chooses the split which minimizes an error metric (e.g. the Gini index, the RSS). More precisely, for all predictors X1, X2, . . . , Xp and all possible values of the cutpoint s, the error measure for the two new regions of the space is computed. In the end, the pair (j, s) which minimizes the error is selected.

The procedure is then repeated on the regions obtained from the previous splits. The process continues until a stopping criterion is reached (e.g. no more than n observations in each terminal leaf, a given level of purity of the terminal nodes, etc.).

Once the tree has been grown, new observations are classified according to the majority class of the terminal node (leaf) they belong to.

This procedure is greedy, which means that the algorithm selects the best split by looking for a local minimum, without evaluating whether another choice could lead to a global minimum in the next steps.

Algorithm 1 Building of a generic decision tree
1. Define the decision criterion (e.g. RSS to minimize)
2. Define the stopping criterion
3. Define the region set R, the predictors set P and the cut points set S
while stopping criterion not reached do
    t = 0
    for (j ∈ P, s ∈ S, Ri ∈ R) do
        score_js ← decision criterion(j, s, Ri)
        if t = 0 then
            min_score ← score_js
            j_min ← j
            s_min ← s
            R_min ← Ri
            t = 1
        else
            if score_js < min_score then
                min_score ← score_js
                j_min ← j
                s_min ← s
                R_min ← Ri
            end if
        end if
    end for
    Apply the split (j_min, s_min) on region R_min
    Update the set of regions R
end while
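As an illustration of the greedy split search sketched in Algorithm 1, the following is a minimal Python example assuming the Gini impurity as decision criterion; the function names and the toy data are illustrative assumptions and do not correspond to the actual implementation used in this work.

import numpy as np

def gini(labels):
    # Gini impurity of a set of class labels.
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # Exhaustive greedy search for the (predictor, cutpoint) pair that
    # minimizes the weighted Gini impurity of the two resulting regions.
    n, p = X.shape
    min_score, j_min, s_min = np.inf, None, None
    for j in range(p):                       # loop over predictors
        for s in np.unique(X[:, j]):         # candidate cutpoints
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if left.size == 0 or right.size == 0:
                continue
            # Weighted impurity of the two new regions
            score = (left.size * gini(left) + right.size * gini(right)) / n
            if score < min_score:
                min_score, j_min, s_min = score, j, s
    return j_min, s_min, min_score

# Toy usage: two predictors, binary labels
X = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 0.5]])
y = np.array(["A", "A", "B", "B"])
print(best_split(X, y))   # (0, 2.0, 0.0): split the first predictor at 2.0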

Ensemble of trees

The Random Forest (RF) used in this analysis is a classification algorithm, which means that it is made of a set of classification trees. All trees have been built using the same observations, but due to the inherent randomness of the construction process (i.e. the random choice of the features considered at each split), they result in different partitions of the space. This means that two different trees may classify the same object differently. Once a number n of trees has been grown, the RF predicts a new object's class based on the response of the majority of the trees.

Example: there are 10 trees and the new point i is fed to the forest. If 7 trees classify i as "A" and 3 trees classify it as "B", then the object is labeled as class "A".
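In code, this majority vote can be expressed in a few lines (the tree votes below are made up for illustration):

from collections import Counter

# Votes of 10 hypothetical trees for the new point i (illustrative values only).
tree_votes = ["A", "A", "B", "A", "A", "B", "A", "B", "A", "A"]

# Majority vote: the class chosen by most trees wins (7 "A" vs 3 "B" here).
predicted_class = Counter(tree_votes).most_common(1)[0][0]
print(predicted_class)   # "A"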

3.1.1 Hyperparameters

The most significant hyperparameters of the RF are two. The first is the number of trees which compose the forest. The second is the number of predictors mtry that are randomly picked at each split when the trees are built.

The number of trees has been tuned by monitoring the Out-of-bag (OOB) error, while the number of predictors to use at each split has been set equal to the square root of the total number of predictors.
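As a sketch of how such a tuning could be set up, the snippet below uses scikit-learn's RandomForestClassifier with max_features="sqrt" and its built-in OOB estimate; the synthetic data and the grid of tree numbers are illustrative assumptions, not the actual configuration used in this work.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for the encoded qualitative covariates and the
# four company statuses (illustration only, not the dataset of this work).
X_train, y_train = make_classification(
    n_samples=2000, n_features=20, n_informative=8, n_classes=4, random_state=0
)

oob_errors = {}
for n_trees in (100, 200, 500):
    rf = RandomForestClassifier(
        n_estimators=n_trees,     # number of trees in the forest
        max_features="sqrt",      # mtry = sqrt(total number of predictors)
        oob_score=True,           # estimate accuracy on out-of-bag samples
        random_state=0,
    )
    rf.fit(X_train, y_train)
    oob_errors[n_trees] = 1.0 - rf.oob_score_   # OOB error = 1 - OOB accuracy

best_n_trees = min(oob_errors, key=oob_errors.get)
print(oob_errors, best_n_trees)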

OOB error

The OOB error is the error rate of the so-called out-of-bag classifier on the training set. The procedure to calculate it consists in creating a set of bootstrap datasets which do not contain a particular record. This set is built starting from the training dataset, each time removing one particular observation and retaining the rest of the data. This is called the out-of-bag set of examples and, supposing that there are n observations in the training dataset, there are n such subsets (one for each data record). The OOB error is a very interesting technique because it removes the need for a validation set. In fact, empirical evidence shows that the out-of-bag estimate is as accurate as using a test set of the same size as the training set [14]. Note that this is the leave-one-out OOB configuration: different fitting-validation proportions can be used instead.
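The toy example below sketches the OOB error computation described above, growing single decision trees on bootstrap samples of a synthetic dataset and letting each observation be classified only by the trees whose bootstrap sample does not contain it; all names, data and the number of trees are illustrative assumptions.

import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=10, n_classes=3,
                           n_informative=5, random_state=0)
n, B = len(y), 50

trees, boot_indices = [], []
for _ in range(B):
    idx = rng.integers(0, n, size=n)          # bootstrap sample (with replacement)
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    boot_indices.append(set(idx.tolist()))    # records seen by this tree

errors, counted = 0, 0
for i in range(n):
    # Only trees whose bootstrap sample does NOT contain observation i vote.
    votes = [int(t.predict(X[i:i + 1])[0])
             for t, seen in zip(trees, boot_indices) if i not in seen]
    if not votes:
        continue                              # i was in every bootstrap sample
    counted += 1
    if Counter(votes).most_common(1)[0][0] != y[i]:
        errors += 1

oob_error = errors / counted                  # misclassification rate on OOB votes
print(f"OOB error: {oob_error:.3f}")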