The task was to build a model for an insurance company for predicting the propensity to pay renewal premium and build an incenive plan for its agents to maximize the net revenue (i.e., renewals - incentives given to collect the renewals) collected from the policies post their issuance.
Information about past transactions from the policy holders along with their demographics was given. The solution was evaluated on the base probability prediction of receiving a premium on a policy without considering any incentive and the monthly incentive plan for agents at policy level which maximizes the net revenue. Source: Analytics Vidhya
Data preprocessing: Label encoding followed by one-hot encoding was performed for the categorical variables of sourcing channel and residence area type. Mean imputation was done for application underwriting score since it was a continuous variable and median imputation was done for the count of late payments.
Train-development split: 25% of the training data was used to create the development (validation) set.
Algorithms: Gradient boosting, XGBoost and logistic regression with TensorFlow were implemented for the data challenge. Hyperparameter tuning was performed for XGBoost and Gradient Boosting algorithms. The following parameters were tuned successively for the XGBoost algorithm:
The following parameters were tuned successively for the Gradient Boosting algorithm:
The best model selected based on performance on the ROC-AUC score on the development set was a gradient boosting algorithm with tuned hyperparameters. A function was written for estimating the incentives plan on policy level for the agents using the renewal premium predictions and given relationships.
The submission ranked 40 out of 884 submissions on the private leaderboard.