After many mini-projects in front-end web development, this portfolio autobiography is the most substantial project for which I have used web development languages and GitHub. Keep up to date on my self-taught adventure by checking out the repository.
Credit Score Modeling and Classification
Which features are most useful in classifying credit score?
Statistics 811, "Applied Statistical Modeling for Data Scientists," at Michigan State is part of the Master's in Data Science degree program and is taught by Dr. Paul Speaker.
The course included coverage of many statistical modeling and data science project methods, including data visualization, regression, variance analysis, linear models, variable selection, categorical data analysis, experiment design, classification, and time series modeling.
Final course deliverables included a classification model developed on a large dataset. Our team, which included my partners Steven Strachan and Chris Grandy, chose to build a classifier that predicts credit score category from a variety of features. The dataset we used can be found here; it is a synthetic dataset of 100,000 records with financial, occupational, and personal data. All modeling was completed in Python, with libraries including:
- keras
- scikit-learn
- xgboost
- scipy
- pandas
- numpy
- matplotlib
- seaborn
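As a rough illustration of the question above (which features are most useful for classification), here is a minimal sketch of one common way to rank features with scikit-learn. It is not the code from our project: the data comes from `make_classification` as a stand-in for the credit dataset, the `feature_i` names are hypothetical, and a random forest's impurity-based importance is only one of several possible ranking methods.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data (an assumption, not the project dataset):
# 3 classes, 8 features, of which only the first 3 carry signal.
X, y = make_classification(
    n_samples=2000,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    shuffle=False,  # keeps the informative features in columns 0-2
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Hold out a test set so the accuracy is measured on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# Impurity-based importances sum to 1; sort from most to least important.
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"test accuracy: {test_accuracy:.3f}")
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favor high-cardinality features, so on real data it is worth cross-checking a ranking like this against another method, such as permutation importance.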
To read about the full process of classifier development, please refer to the report below, which details every stage: exploratory data analysis, feature engineering, model generation, and hyperparameter tuning.