Corporación Favorita, a large Ecuador-based grocery retailer, operates hundreds of supermarkets with over 200,000 different products on their shelves. The company currently relies on subjective forecasting methods with very little data to back them up and little automation to execute plans. The goal is to build a model that forecasts product sales more accurately. Source: Kaggle
The training set contained ~125,000,000 records, each giving the store number, item number and unit sales on a particular date. Data was also provided for store attributes such as city, state, type and cluster, and for item attributes such as family, class and whether the item is perishable. A further set of about 80,000 records gave the number of transactions at a store on a particular date. Additional metadata about holidays and daily oil prices was also given.
Feature engineering consisted of creating additional columns such as day of week, month and year, and one-hot encoding was used to represent categorical variables. A random forest regression model was built to forecast the unit sales of an item at a store on a particular future date. Rigorous forward and backward k-fold time series cross-validation was done to validate the final model.
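
The pipeline described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the column names (`date`, `store_nbr`, `family`, `unit_sales`), the helper functions, the toy data and the hyperparameters are all assumptions, and only the forward (train on the past, test on the future) variant of the cross-validation is shown, using scikit-learn's `TimeSeriesSplit`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit


def add_date_features(df):
    """Return a copy of df with day-of-week, month and year columns."""
    out = df.copy()
    out["day_of_week"] = out["date"].dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = out["date"].dt.month
    out["year"] = out["date"].dt.year
    return out


def build_features(df, categorical_cols):
    """Add date features, one-hot encode categoricals, drop date and target."""
    out = add_date_features(df)
    out = pd.get_dummies(out, columns=categorical_cols)
    return out.drop(columns=["date", "unit_sales"])


def forward_cv_scores(df, categorical_cols, n_splits=3):
    """Fit a random forest on each forward split and return per-fold RMSE."""
    # Rows must be in chronological order: TimeSeriesSplit splits by position.
    df = df.sort_values("date").reset_index(drop=True)
    X = build_features(df, categorical_cols)
    y = df["unit_sales"]
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        # Hypothetical hyperparameters, chosen only to keep the sketch fast.
        model = RandomForestRegressor(n_estimators=20, random_state=0)
        model.fit(X.iloc[train_idx], y.iloc[train_idx])
        pred = model.predict(X.iloc[test_idx])
        rmse = float(np.sqrt(np.mean((pred - y.iloc[test_idx]) ** 2)))
        scores.append(rmse)
    return scores


# Synthetic stand-in for the real training data (two stores, 60 days).
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-01", periods=60, freq="D")
toy = pd.DataFrame({
    "date": np.repeat(dates, 2),
    "store_nbr": np.tile([1, 2], 60),
    "family": np.tile(["GROCERY", "DAIRY"], 60),
    "unit_sales": rng.poisson(5, size=120).astype(float),
})

scores = forward_cv_scores(toy, ["store_nbr", "family"])
print(scores)  # one RMSE value per fold
```

Each fold trains only on rows that come earlier in time than the rows it is tested on, which avoids leaking future sales into the model. One-hot encoding a column with very many levels (such as an item identifier) would produce a very wide matrix, so in this sketch it is applied only to low-cardinality columns.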