6. General Conclusions
In this Ph.D. thesis, many aspects of the QSPR/QSAR approach based on the direct processing of molecular structures by RNN were explored and discussed by applying the method to a variety of predictive tasks. The results highlight several important characteristics of the proposed technique.
First, it is a general method. Provided the data set samples them adequately, it can treat any type of chemical structure and any target property. This includes predictive tasks that traditional descriptor-based approaches do not always solve, e.g. the study of some types of macromolecules. The reported experiments dealt with diverse compounds, ranging from small molecules to polymers, and with very different target properties, such as transition temperatures and toxicities. Because the molecular encoding is learned automatically, essentially the same methodology could be used in every run, with only minor adjustments.
Second, the approach is flexible and can be adapted to the specific problem at hand. In particular, the representation of molecules can be tuned to find the right balance between structural detail and the sampling available in each data set. Different classes of compounds can even be treated differently within the same data set. This was the case in the experiments that combined homopolymers with various degrees of tacticity and copolymers for which no tacticity was specified.
Third, the proposed method is accurate: for nearly all the series presented, the mean absolute residual is comparable to the experimental error of the measurements. When the input data are of high quality, the method can achieve outstanding performance, as was observed on the data set consisting of copolymers only.
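The accuracy criterion above compares the mean absolute residual, MAR = (1/N) Σ |y_i − ŷ_i|, with the experimental error of the measurements. The following is a minimal sketch of that comparison. The function names, the ratio-based comparison and the numbers are hypothetical and chosen for illustration only; they are not data or code from this thesis.

```python
from typing import Sequence


def mean_absolute_residual(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |observed - predicted| over all compounds in a series."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def residual_to_error_ratio(observed: Sequence[float], predicted: Sequence[float],
                            experimental_error: float) -> float:
    """Ratio of the MAR to an assumed experimental error; values near or below 1
    mean the model is roughly as accurate as the measurements themselves."""
    return mean_absolute_residual(observed, predicted) / experimental_error


# Hypothetical toy series (e.g. transition temperatures in K); not thesis data.
obs = [350.0, 372.0, 401.0, 415.0]
pred = [354.0, 368.0, 398.0, 420.0]
print(mean_absolute_residual(obs, pred))                            # 4.0
print(residual_to_error_ratio(obs, pred, experimental_error=5.0))  # 0.8
```

Using a ratio rather than a strict threshold keeps "comparable" as a judgment for the reader. The assumed experimental error would have to come from the source of each data set.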
Fourth, RNN-QSPR has great potential in numerous applications. It can be applied to many properties of technological, environmental and health importance, such as those investigated in this thesis. Its main strength compared with other methods is that it can model any property or activity without requiring background knowledge of the problem. The fields in which this technique is expected to provide the most interesting results are material design and the evaluation of complex biological properties such as toxicity and carcinogenicity.
The results of the study in which MLR was carried out in parallel with RNN suggest that combining the two methods could lead to interesting developments. Indeed, the structure-based approach should be seen as complementing descriptor-based ones rather than replacing them. The former has the advantage of generality and flexibility. The latter allow an easier physical interpretation of QSPRs and use a simpler input space that generally needs less training. They also perform better on some specific problems, as was the case for the data set of phenol toxicities. A starting point for a combined approach is the observation that, in our experiments, MLR and RNN produced very similar results despite using radically different methodologies.
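One simple way to quantify how closely two models agree on the same compounds is to compute the mean absolute difference and the Pearson correlation between their predictions. The sketch below is a minimal example of that idea. The choice of these two statistics, the function names and the toy values are illustrative assumptions, not the procedure or data used in this thesis.

```python
import math
from typing import Sequence


def mean_abs_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Average absolute disagreement between two prediction lists."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two prediction lists."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation undefined for constant predictions")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical predictions for the same four compounds; not thesis data.
rnn_pred = [1.2, 2.0, 2.9, 4.1]
mlr_pred = [1.0, 2.1, 3.0, 4.0]
print(mean_abs_difference(rnn_pred, mlr_pred))
print(pearson_r(rnn_pred, mlr_pred))
```

A small mean absolute difference together with a correlation close to 1 would indicate that the two models capture largely the same structure–property relationship. A combined approach could then build on that agreement.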
Finally, the RNN approach leaves room for further improvement. The representation can be extended to molecular structures not yet considered in our study, such as chiral compounds or copolymers with different monomer distributions. A more automated procedure could also be developed, for instance by exploiting standard representation systems, to make the method accessible to people who are not specialists in the QSAR/QSPR field.