Predicting the Borrower’s Genuineness in Loan Repayment through Big Data Analytics | IEEE Conference Publication | IEEE Xplore

Abstract:

Banks play a pivotal role in facilitating economic activities, allocating financial resources, and managing risks, and one of their fundamental functions is the provision of loans. This research addresses "Predicting Borrower’s Integrity in Loan Repayment," with the aim of mitigating lending risk and supporting prudent financial decision-making. For the predictive analysis, we used a comprehensive loan dataset provided by Lending Club Bank, consisting of 2.2 million records with 151 features each. Performing machine learning on a dataset of this size (1.3 gigabytes) is challenging with conventional data-handling techniques, so we used Apache Spark as our primary big data framework, running it on Google Cloud’s Dataproc platform to make full use of Spark’s capabilities. Feature selection reduced the original 151 features to 28 significant ones, and the selected features were then transformed into a form the models could use. Logistic Regression and Random Forest Classification models were employed to predict each loan’s status as either ‘fully paid’ or ‘charged off,’ achieving accuracies of 95.9 percent and 86 percent, respectively. This research contributes to the evolution of loan assessment practices and the refinement of risk management strategies within the banking sector.
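The abstract frames the modelling step as binary classification of loan status. The authors trained their models with Apache Spark on Dataproc (in Spark MLlib, `pyspark.ml.classification.LogisticRegression` plays this role); the sketch below is not their pipeline but a minimal, single-machine, pure-Python illustration of logistic regression trained by full-batch gradient descent. It assumes the status strings are exactly "Fully Paid" and "Charged Off" and treats charged-off loans as the positive class; the helper names and the two-feature toy data are hypothetical and are not taken from the Lending Club dataset.

```python
import math
from typing import List, Sequence, Tuple

# Assumed label encoding: only the two statuses modelled in the study are kept,
# with "Charged Off" treated as the positive class (a hypothetical choice).
STATUS_TO_LABEL = {"Fully Paid": 0, "Charged Off": 1}


def encode_labels(statuses: Sequence[str]) -> List[int]:
    """Map status strings to 0/1; any other status raises KeyError."""
    return [STATUS_TO_LABEL[s] for s in statuses]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.1, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Minimize mean log-loss with full-batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, X: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Return 1 (charged off) when the predicted probability reaches the threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic, already-standardized toy features (two hypothetical columns);
# these are NOT values from the Lending Club dataset.
X_train = [[-1.0, -0.8], [-0.7, -1.2], [-1.2, -0.5], [0.9, 1.1], [1.3, 0.7], [0.8, 1.4]]
y_train = encode_labels(["Fully Paid"] * 3 + ["Charged Off"] * 3)

w, b = train_logistic_regression(X_train, y_train)
acc = accuracy(y_train, predict(w, b, X_train))
print(f"training accuracy: {acc:.2f}")
```

Because the toy data are linearly separable, this reports training accuracy on six synthetic rows and is not comparable with the accuracies reported in the paper; a realistic evaluation would score the model on a held-out split after the feature selection and transformation steps described above.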
Date of Conference: 13-15 December 2023
Date Added to IEEE Xplore: 19 February 2024
Conference Location: Raipur, India

I. Introduction

Lending Club Bank, founded in 2007, is a prominent online financial institution in the USA and the nation’s leading peer-to-peer lending platform, connecting borrowers with investors. Recognized as a worldwide leader in person-to-person lending, it pioneered obtaining regulatory approval from the Securities and Exchange Commission (SEC) for its investment products.

This paper applies Exploratory Data Analysis (EDA) and Machine Learning to a real-world business problem: analyzing risk in banking and financial services [1] and showing how data-driven methods can reduce the potential for monetary losses when extending credit to clients. To make accurate predictions, we used three different classifiers, of which the logistic regression classifier [2] performed best. A similar study [3] applied logistic regression to a dataset with 1,500 rows and 18 features; in practice, however, the data needed to draw accurate inferences is much larger and cannot be handled by traditional data-handling techniques. As data volumes grew and the importance of advanced computing resources became evident, adopting big data tools became crucial [4]. A framework that can both handle big data and execute machine learning or deep learning tasks on it can therefore be highly advantageous for developers.

