Problem Statement

National bankruptcy rates are of interest to multiple parties, such as insurance companies and governments. Given monthly unemployment rates, population, and housing price index data, as well as monthly national bankruptcy rates from January 1987 through December 2010, the goal is to accurately predict the bankruptcy rates in Canada from January 2011 through December 2012.

Data Preparation

The bankruptcy rates show both trend and seasonality, which ACF and PACF plots confirmed. The variance of the rates was also non-constant over time, so the rates were log-transformed before modeling: the transformation preserves the trend and seasonality in the data while stabilizing the variance.
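As a minimal sketch of the transformation step, assuming the rates are strictly positive and held in a plain list, the log transform and its inverse could look like the following. The function names and sample values are hypothetical and are not taken from the project code.

```python
import math


def log_transform(rates):
    """Return the natural log of each rate (rates must be strictly positive)."""
    return [math.log(r) for r in rates]


def inverse_log_transform(log_rates):
    """Map values on the log scale back to the original rate scale."""
    return [math.exp(v) for v in log_rates]


# Hypothetical values for illustration only, not actual bankruptcy rates.
sample_rates = [0.012, 0.015, 0.021, 0.034]
log_rates = log_transform(sample_rates)
recovered = inverse_log_transform(log_rates)
```

Forecasts produced on the log scale need the inverse step before they can be compared with observed rates.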

For validation, the training data were split into a training set and a development set. The development set covers January 2009 through December 2010, matching the two-year length of the test set.
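The split can be sketched as a chronological cut at January 2009. The helper names and the placeholder values below are assumptions for illustration; only the date ranges come from the description above.

```python
def month_index(start_year, start_month, end_year, end_month):
    """Build an ordered list of (year, month) tuples, both ends inclusive."""
    months = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def split_train_dev(months, values, dev_start):
    """Split chronologically: everything before dev_start is training data."""
    pairs = list(zip(months, values))
    train = [(m, v) for m, v in pairs if m < dev_start]
    dev = [(m, v) for m, v in pairs if m >= dev_start]
    return train, dev


months = month_index(1987, 1, 2010, 12)
values = [0.0] * len(months)  # placeholder values, not real data
train, dev = split_train_dev(months, values, dev_start=(2009, 1))
```

A chronological cut, rather than a random one, keeps the development set strictly after the training period, which mirrors how the test forecast is made.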

Implementation

Four modeling approaches were explored: SARIMA, SARIMAX, vector autoregression (VAR), and Holt-Winters. The best model from each approach was selected, and residual diagnostics were run to check its validity. The best of these was the SARIMAX model with unemployment rate as a covariate, which gave an RMSE of 0.0029 when trained on the entire dataset.
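For reference, RMSE is the square root of the mean squared difference between observed and predicted values. The sketch below is model-agnostic and assumes the comparison is made on the original rate scale after undoing the log transform; the source does not state which scale was used, and the function name and numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical numbers for illustration only.
observed_rates = [0.030, 0.032, 0.035]
log_forecasts = [math.log(0.031), math.log(0.032), math.log(0.033)]

# Assumption: forecasts are back-transformed before scoring.
forecast_rates = [math.exp(v) for v in log_forecasts]
score = rmse(observed_rates, forecast_rates)
```

Scoring every candidate model with the same function on the same development months keeps the comparison between approaches like for like.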

Business Report

Repository