Focus
Finance, Machine Learning, Credit Risk Analysis
Motivation
Accessibility, Financial Inclusion, Data-Driven Lending
About the project
This research examines how machine learning models can improve credit assessment in peer-to-peer (P2P) lending platforms, particularly for small and medium-sized businesses (SMBs) that struggle to secure loans from traditional banks. Using data from Lending Club, the study compares the predictive performance of three prominent models—Logistic Regression, Random Forest, and XGBoost—to identify which borrower characteristics most reliably determine loan approval and repayment success. The goal is to balance predictive power with interpretability, enabling both lenders and borrowers to make better-informed decisions in an increasingly algorithmic financial landscape.
Methodologically, the paper employs a structured machine learning pipeline using Lending Club’s 2007–2020 dataset, with rigorous preprocessing, feature engineering, and hyperparameter tuning. Logistic Regression serves as a transparent baseline for comparison against more complex ensemble models. Random Forest and XGBoost are introduced to capture non-linear relationships and variable interactions often missed by simpler statistical techniques. The models are evaluated using multiple metrics, including accuracy, ROC-AUC, precision, and recall, to ensure a fair comparison of performance and reliability.
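The four evaluation metrics named above can be sketched in a few lines of plain Python. This is only an illustration of how the metrics are defined, not the paper's evaluation code: the labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and the convention that 1 means "repaid" and 0 means "defaulted" is also assumed.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumed): 1 = loan repaid, 0 = loan defaulted.
y_true: List[int] = [1, 1, 1, 0, 0, 1, 0, 1]
scores: List[float] = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
y_pred: List[int] = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
prec = precision(y_true, y_pred)
rec = recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} roc_auc={auc:.3f}")
```

Reporting all four together matters because accuracy alone can look strong when one outcome dominates the data, while precision, recall, and ROC-AUC show how well the model separates the two classes.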
The findings show that all three models achieve very high predictive accuracy (over 99%), but they differ significantly in interpretability. Logistic Regression provides clear, actionable insights linking FICO scores, credit utilization, and debt-to-income ratios to repayment likelihood. Random Forest and XGBoost offer marginally higher accuracy at the cost of transparency, and both identify credit history, particularly recovery and repayment variables, as the most influential predictors. Overall, the study concludes that ensemble models improve predictive precision, but Logistic Regression’s interpretability makes it especially valuable for practical decision-making. The paper contributes to the financial inclusion discourse by demonstrating how data-driven lending can improve access to fair credit while helping borrowers understand and improve their creditworthiness.
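One way to see why Logistic Regression is considered interpretable: each coefficient is a change in the log-odds of the outcome, so exponentiating it gives an odds ratio that can be read directly. The sketch below assumes the positive class is repayment. The feature names and coefficient values are hypothetical, chosen only for illustration; the summary above does not report the paper's fitted coefficients.

```python
import math

# HYPOTHETICAL coefficients (change in log-odds of repayment per one unit of
# each feature). These are illustrative values, not results from the study.
coefficients = {
    "fico_score": 0.010,               # per FICO point
    "credit_utilization_pct": -0.015,  # per percentage point
    "debt_to_income_pct": -0.020,      # per percentage point
}


def odds_ratio(beta: float, delta: float = 1.0) -> float:
    """Multiplicative change in the odds when a feature moves by `delta` units."""
    return math.exp(beta * delta)


def shifted_probability(baseline_prob: float, ratio: float) -> float:
    """Apply an odds ratio to a baseline probability and return the new probability."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * ratio
    return new_odds / (1.0 + new_odds)


# Example: effect of a 20-point FICO increase under the hypothetical coefficient.
fico_ratio = odds_ratio(coefficients["fico_score"], delta=20)
new_prob = shifted_probability(0.80, fico_ratio)
print(f"odds ratio for +20 FICO points: {fico_ratio:.3f}")
print(f"repayment probability moves from 0.800 to {new_prob:.3f}")
```

A tree ensemble has no single per-feature number of this kind, which is the trade-off between transparency and accuracy that the findings describe.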