Multiple Regression for Case Study “New Pam and Susans Stores”
Introduction
The main goal of this project is to identify a new store location based on the available data. The selected site should have the highest projected sales. Multiple regression, a statistical method, was chosen to evaluate the available census data and business history and to propose which site should be selected.
For this analysis, several data types were used. First, several competitive types (comtypes) were mathematically derived and correlated against sales to assess their significance. These variables are binary, and their annotation is provided in Table A of the case study. Only comtypes 1, 2 and 7 were deemed significant enough to be used in the actual regression models.
Second, a file was provided with demographic and economic data, population types, store sizes and sales figures from multiple stores currently owned by the Pam and Susan company. All of this data was processed using correlation analysis and multiple regression models, as well as residual analysis and the resulting regression (y) equations.
Step by step, only statistically significant variables were retained, and unimportant variables were removed from further analysis. All conclusions were based on the results of careful calculations and on each variable's potential contribution to the sales projections.
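The screening step described above can be sketched roughly as follows. This is only an illustration: the store values are made up and are not the case-study data, the variable names are hypothetical, and the cutoff of 0.5 is an assumption, since the report does not state the threshold it used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def screen_predictors(data, target, threshold):
    """Keep predictors whose |correlation| with the target meets the threshold.

    Returns (name, correlation) pairs, strongest correlation first.
    """
    scores = {
        name: pearson(values, data[target])
        for name, values in data.items()
        if name != target
    }
    kept = {name: r for name, r in scores.items() if abs(r) >= threshold}
    return sorted(kept.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up illustrative values, NOT the case-study data.
stores = {
    "sales": [10.0, 12.0, 15.0, 19.0, 22.0],
    "income": [30.0, 34.0, 41.0, 50.0, 55.0],
    "pct_dryers": [60.0, 55.0, 48.0, 40.0, 35.0],
    "store_size": [5.0, 3.0, 6.0, 4.0, 5.0],
}

# The 0.5 cutoff is an assumption chosen for this sketch.
selected = screen_predictors(stores, "sales", threshold=0.5)
```

Both strongly positive and strongly negative correlations pass the screen, because the filter uses the absolute value of the coefficient.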
Results and Discussion
To examine the demographic and historical data presented in this case study and to provide a solid recommendation for the new site location, several statistical methods were used. First, all comtypes were plotted against sales data to determine the statistical significance of each comtype. Second, the correlation coefficient relative to sales was calculated for every variable. Then, I identified the variables that would be most useful for the regression models because of their strong correlation (positive or negative) with sales. Next, seven regression models were tested for their usefulness in projecting sales. Model 7 was found to be the best because its p-values are closest to zero and its t-stats exceed 2 in absolute value, which indicates statistical significance. A brief summary of Model 7 is presented in Appendix 3. Finally, once both y equations were derived and the x values from the regression model were substituted, it was possible to identify the site projected to achieve the highest sales.
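As a rough sketch of the last two steps (checking t-stats against 2, then substituting a site's x values into the fitted y equation), the code below fits an ordinary least squares model with NumPy. The data are randomly generated and are not the case-study data, the helper names `fit_ols` and `predict` are hypothetical, and the two candidate sites are invented for illustration. It is not the report's Model 7.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept.

    Returns the coefficient vector (intercept first) and the matching t-stats.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    residuals = y - design @ beta
    dof = len(y) - design.shape[1]  # observations minus estimated parameters
    sigma2 = residuals @ residuals / dof  # residual variance estimate
    std_err = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, beta / std_err


def predict(beta, x_new):
    """Substitute x values into the fitted equation y = b0 + b1*x1 + ..."""
    return beta[0] + np.asarray(x_new, dtype=float) @ beta[1:]


# Synthetic data, NOT the case-study data: sales depend on the first two
# predictors only; the third predictor is pure noise.
rng = np.random.default_rng(0)
n_stores = 40
X = rng.normal(size=(n_stores, 3))
y = 5.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=n_stores)

beta, t_stats = fit_ols(X, y)

# Rule of thumb from the text: keep predictors whose |t| exceeds 2.
significant = np.abs(t_stats[1:]) > 2

# Two hypothetical candidate sites, described by their x values.
site_a = predict(beta, [1.0, -1.0, 0.0])
site_b = predict(beta, [0.0, 1.0, 0.0])
```

The site with the larger predicted y is the one the fitted equation favours. A p-value for each coefficient would additionally need the t-distribution, which this sketch leaves out.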
Question 1
Based on the analysis conducted for this report, the new site should be located in a densely populated area with high-income residents and relatively little direct competition. It should also be an area with many Spanish-speaking people and education levels of more than 8 years, because the analysis indicates that a large share of residents with 0-8 years of education, and of households that own dryers and freezers, will negatively affect the new store's sales. This is evident in Appendix 3, where the coefficients and the corresponding t-stats are negative for the 0-8 years of schooling, %freezers and %dryers variables. Moreover, comtype 1, comtype 2 and the Spanish-speaking population show high positive values.
Question 2
The classification of the sites by competitive type is certainly a valuable method for identifying potential sites. The plot of sales history against comtypes in Appendix 1 shows very clearly that comtype 1 and comtype 2 are associated with the highest sales and comtype 7 with the lowest. This is important information that becomes even more significant during the evaluation of the regression models.
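A comparison of sales across competitive types, and the binary coding that lets comtypes enter a regression, might look like the sketch below. The comtype labels and sales figures are made up for illustration and are not taken from Appendix 1, and the helper names are hypothetical.

```python
from collections import defaultdict


def one_hot(comtypes, levels):
    """Encode each store's comtype as binary indicator columns, one per level."""
    return [[1 if c == level else 0 for level in levels] for c in comtypes]


def mean_sales_by_type(comtypes, sales):
    """Average sales for each competitive type."""
    totals = defaultdict(lambda: [0.0, 0])  # comtype -> [sum of sales, count]
    for c, s in zip(comtypes, sales):
        totals[c][0] += s
        totals[c][1] += 1
    return {c: total / count for c, (total, count) in totals.items()}


# Made-up illustrative values, NOT the case-study data.
comtypes = [1, 1, 2, 7, 7, 2]
sales = [20.0, 22.0, 18.0, 8.0, 6.0, 16.0]

indicators = one_hot(comtypes, levels=[1, 2, 7])
averages = mean_sales_by_type(comtypes, sales)
```

Each row of `indicators` holds exactly one 1, which marks the store's comtype. These columns can then be used as predictors alongside the demographic variables.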