Credit Score Modeling and Classification

Which features are most useful in classifying credit score?

Statistics 811, "Applied Statistical Modeling for Data Scientists," at Michigan State is part of the Master's in Data Science degree program and is taught by Dr. Paul Speaker.

The course covered many statistical modeling and data science project methods, including data visualization, regression, variance analysis, linear models, variable selection, categorical data analysis, experiment design, classification, and time series modeling.

Final course deliverables included the development of a classification model using a large dataset. Our team, which included my partners Steven Strachan and Chris Grandy, chose credit score category as the target and developed a classifier for it from a variety of features. The dataset we used can be found here; it is a synthetic dataset of 100,000 records with financial, occupational, and personal data. All modeling was completed in Python, with libraries including:

  • keras
  • scikit-learn
  • xgboost
  • scipy
  • pandas
  • numpy
  • matplotlib
  • seaborn

To read about the full process of classifier development, please see the report below, which details every stage of the process: exploratory data analysis, feature engineering, model generation, and hyperparameter tuning.