
Machine Learning and New Data Sources for Credit Scoring


Christophe HURLIN * Université d'Orléans, LEO (FRE CNRS 2014). Contact: christophe.hurlin@univ-orleans.fr.
Christophe PÉRIGNON ** HEC Paris, Finance Department, GREGHEC (UMR CNRS 2959). Contact: perignon@hec.fr.

We thank Sébastien Saurin and Elisa Korn for their assistance and Jean-Paul Pollin for his comments and encouragement. We also thank the participants in the round table "Pourquoi et comment les nouvelles technologies vont-elles bouleverser le secteur financier ?" ("Why and how will new technologies disrupt the financial sector?") at the 2019 edition of the Rendez-vous de l'Histoire (Blois). This work received financial support from the Chaire ACPR Régulation et risques systémiques, the Labex Ecodec (ANR-11-LABX-0047), and the ANR programmes MultiRisk (ANR-16-CE26-0015-01) and F-STAR (ANR-17-CE26-0007-01).

In this article, we discuss the contribution of machine learning techniques and new data sources ("new data") to credit-risk modelling. Credit scoring was historically one of the first fields of application of machine learning techniques. Today, these techniques make it possible to exploit new sources of data made available by the digitalisation of customer relationships and by social networks. The combination of new methodologies and new data has structurally changed the credit industry and favoured the emergence of new players. First, we analyse the incremental contribution of machine learning techniques per se. We show that they lead to significant productivity gains but that the improvement in forecasting performance remains modest. Second, we quantify the contribution of "data diversity", whether or not these new data are exploited through machine learning. It appears that some of these data contain weak signals that significantly improve the assessment of borrowers' creditworthiness. At the microeconomic level, these new approaches promote financial inclusion and access to credit for the most vulnerable borrowers. However, machine learning applied to these data can also lead to severe biases and discrimination.