Determining Restaurant Sales Performance Drivers Through Feature Selection
Student Capstone Project
Imagine that you are an entrepreneur who wants to open a restaurant in Vancouver but has no idea which neighbourhoods or specific locations are viable from a revenue perspective. Sitewise Analytics, a Software-as-a-Service company that specializes in developing site forecast models, sales impact assessments, and actionable market strategy plans for leading restaurant, retail, real estate, and healthcare chains, partnered with Master of Data Science (MDS) Vancouver students to understand which specific factors drive success (and failure) for certain restaurant brands.
The data set provided contained 2000 features from a burger chain and a burrito chain. Recognizing that most business owners don't have the time to go through each one to determine the most significant sales drivers, the group’s goal was to construct a machine learning pipeline that whittles down the total number and highlights only the top 20 positive and top 20 negative features within a dashboard.
Four feature selection methods were implemented at the beginning of the pipeline: Model-based Selection (MBS), Recursive Feature Elimination (RFE), Elastic Net (ELN), and Principal Component Analysis (PCA). Each method had its own strength. MBS was simple to implement and allowed the students to explore the initial feature set. RFE removed unimportant features iteratively and allowed the students to eliminate about 90 percent of the original features. ELN performed better when dealing with highly correlated data. PCA provided valuable insights into which features play the most important roles from an unsupervised learning perspective.
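To make the RFE step concrete, here is a minimal sketch using scikit-learn. It is not the students' pipeline: the synthetic data, the Ridge estimator, the step size, and the 200-to-20 reduction are all assumptions chosen only to show a 90 percent cut on a small example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import Ridge

# Synthetic stand-in for the restaurant data (assumed sizes, not the real data set).
X, y = make_regression(
    n_samples=300, n_features=200, n_informative=15, noise=5.0, random_state=0
)

# Recursive Feature Elimination: refit the estimator repeatedly and drop the
# 10 weakest features per round until 20 of the 200 (10 percent) remain.
selector = RFE(estimator=Ridge(alpha=1.0), n_features_to_select=20, step=10)
selector.fit(X, y)

# Column indices of the surviving features.
kept = np.flatnonzero(selector.support_)
print(f"kept {kept.size} of {X.shape[1]} features")
```

The same pattern should scale to a 2000-column table by changing n_features_to_select and step, although the estimator best suited to the real data is not stated in the source.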
After extracting 200 of the most important features from the 2000 provided, the group trained four different machine learning models (Ridge, Random Forest, XGBoost, and Support Vector) to determine the top 20 positive and top 20 negative features.
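One way to read "positive" and "negative" features off a fitted model is to rank signed coefficients. The sketch below does this for Ridge only, on synthetic data with hypothetical feature names; how the group obtained signed rankings from the tree-based and support vector models is not described in the source, so that part is left out.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical feature names and synthetic data (not the Sitewise data set).
names = [f"feature_{i}" for i in range(30)]
X = rng.normal(size=(400, 30))
# Synthetic "sales": feature_0 raises them, feature_1 lowers them.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=400)

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
model = Ridge(alpha=1.0).fit(X_scaled, y)


def top_signed(coefs, names, k):
    """Return the k most positive and k most negative features by coefficient."""
    order = np.argsort(coefs)  # ascending: most negative first
    negative = [names[i] for i in order[:k] if coefs[i] < 0]
    positive = [names[i] for i in order[::-1][:k] if coefs[i] > 0]
    return positive, negative


positive, negative = top_signed(model.coef_, names, k=5)
print("strongest positive driver:", positive[0])
print("strongest negative driver:", negative[0])
```

With the real data, k would be 20 and the inputs would be the 200 features kept by the selection step.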
When evaluating the models, the MDS Vancouver students looked for consistency: if a feature appeared among the top features of all four models, it was considered highly consistent.
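The consistency check can be sketched as a simple count of how many models list each feature. The model keys and feature names below are invented for illustration and are not the project's actual results.

```python
from collections import Counter

# Hypothetical top-feature lists per model (illustrative names only).
top_features_by_model = {
    "ridge": ["feat_a", "feat_b", "feat_c"],
    "random_forest": ["feat_a", "feat_c", "feat_d"],
    "xgboost": ["feat_a", "feat_b", "feat_d"],
    "support_vector": ["feat_a", "feat_c", "feat_e"],
}


def count_appearances(lists_by_model):
    """Count in how many models' lists each feature appears."""
    counts = Counter()
    for features in lists_by_model.values():
        counts.update(set(features))  # set() so a model counts a feature once
    return counts


counts = count_appearances(top_features_by_model)
n_models = len(top_features_by_model)

# A feature is "highly consistent" when every model lists it.
highly_consistent = sorted(f for f, n in counts.items() if n == n_models)
print(highly_consistent)
```

Running the check separately on the positive and the negative lists keeps a feature that one model ranks positive and another ranks negative from being counted as consistent.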
By the end of this process, the group determined that having schools nearby, having IT professional/technical workers in the area, and being close to an intersection all had positive impacts on sales. The negative sales drivers they discovered included being in an area with high crime and murder rates and being too close to competitors and sister branches. The results were compiled into a visualization dashboard.
In addition to the dashboard, the group gave Sitewise a Dockerfile that contained all the necessary scripts with full documentation. Sitewise has since taken the pipeline developed by the MDS Vancouver students and put it into the testing stage with their other data sets.