Heat System (Decision Analysis)
QUESTION 1:
CLEAN THE DATA
The dataset contained 10 missing values and two outliers in V8 (mean kilowatt-hours per day for the month). These discrepancies can be handled in several ways. Instead of using casewise deletion for the missing data, I followed the procedure below:
First, I plotted V4 and V5 against time, which made the pattern in the data clear. To impute the missing values, I assumed that in 1991 and 1992 gas was consumed in September but the company did not bill for it, whereas from 1993 onward the company sent bills in August and October that each covered 60 calendar days. I therefore imputed the missing September 1991 value as the average of August 1991 and October 1991, and did the same for September 1992. From 1993 onward, I split the August and October values (V4, V5 and V6) into halves and imputed them into the missing July and September values, respectively, as shown below (JPEG format used for convenience).
(Imputation of missing data in 91 and 92)
(I could not embed the Excel files in this document; each blank space marks where an Excel table belongs.)
(Imputation of missing data after 93 and onward)
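The imputation itself was done in Excel. As a minimal sketch, assuming the monthly readings are held in a hypothetical dict keyed by `(year, month)` with `None` marking a gap, the two rules above can be restated in Python. The helper names and example values are illustrative only, and the 1993+ rule encodes one reading of "divided into halves": the billed month keeps the other half.

```python
# Sketch of the two imputation rules; the (year, month) -> value dict layout is a
# hypothetical stand-in for the spreadsheet, and None marks a missing month.

def impute_1991_1992(series, year):
    """1991-1992 rule: fill a missing September with the mean of August and October."""
    if series[(year, 9)] is None:
        series[(year, 9)] = (series[(year, 8)] + series[(year, 10)]) / 2
    return series


def impute_1993_onward(series, year):
    """1993+ rule: split each two-month bill in half.

    Assumption: August's half goes to July and October's half goes to
    September, with the billed month keeping the other half.
    """
    for billed, missing in ((8, 7), (10, 9)):
        if series[(year, missing)] is None:
            half = series[(year, billed)] / 2
            series[(year, missing)] = half
            series[(year, billed)] = half
    return series


# Hypothetical values, for illustration only.
early = impute_1991_1992({(1991, 8): 2.0, (1991, 9): None, (1991, 10): 4.0}, 1991)
later = impute_1993_onward(
    {(1993, 7): None, (1993, 8): 10.0, (1993, 9): None, (1993, 10): 6.0}, 1993
)
print(early[(1991, 9)])  # 3.0
print(later)
```

The same functions would be applied to each of V4, V5 and V6 in turn.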
To check for outliers, I used Grubbs' test; the details of the test are in the Excel sheet wac027, placed in the designated folder on Indus. The test showed no outlier in V4. In V8, the values for September 1993 and March 1996 looked suspicious on the V8 time-series graph, and Grubbs' test later confirmed them as outliers, as demonstrated below:
Because the dataset was large, I did not remove the outliers, assuming they would have only a minor effect on the model.
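Grubbs' statistic is G = max|x - mean| / s, where s is the sample standard deviation; a point is flagged when G exceeds a critical value that depends on n and α. The sketch below computes G in Python and, as one common variant, repeats the test after removing each flagged point. It is not the Excel sheet's implementation: `critical` is a hypothetical caller-supplied function that should return the tabulated critical value for the current sample size, and the readings are made up.

```python
import statistics


def grubbs_statistic(values):
    """Return (index, G) for the point farthest from the mean: G = max|x - mean| / s."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return idx, abs(values[idx] - mean) / s


def grubbs_outliers(values, critical):
    """Repeatedly remove the most extreme point while G exceeds critical(n).

    `critical` is a hypothetical caller-supplied function returning the
    tabulated Grubbs critical value for sample size n at the chosen alpha.
    """
    data = list(values)
    flagged = []
    while len(data) > 2:
        idx, g = grubbs_statistic(data)
        if g <= critical(len(data)):
            break
        flagged.append(data.pop(idx))
    return flagged


# Hypothetical readings with one obvious spike.
readings = [10, 11, 9, 10, 12, 10, 11, 30]
print(grubbs_statistic(readings))
```

Because the critical value shrinks as n changes, passing a function of n rather than a single number keeps each iteration honest.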
QUESTION 2:
MODEL FOR NATURAL GAS USAGE
Before building the model, I ran a bivariate correlation with V4 (mean natural gas usage per day) listed first, followed by all the other variables. The Pearson coefficients showed that V4 had a significant relationship with V3, V5 and V11, but because of multicollinearity I did not use V5 or V11 in the model of natural gas usage. The correlation between V4 and V13 (the dummy variable for the new room) was not significant.
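The multicollinearity screen can be sketched as computing Pearson's r for every pair of candidate predictors and flagging strongly correlated pairs. This is an illustration, not the SPSS procedure: the 0.8 cutoff is an assumed rule of thumb, and the column values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.8):
    """List predictor pairs whose |r| meets an assumed rule-of-thumb threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical predictor columns (not the case data).
predictors = {
    "V3": [1, 2, 3, 4],
    "V5": [2, 4, 6, 8],
    "V13": [0, 1, 1, 0],
}
print(collinear_pairs(predictors))  # [('V3', 'V5', 1.0)]
```

A flagged pair means the two predictors carry largely the same information, so keeping only one of them (here, V3) avoids unstable coefficients.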
For comparison, I fitted the regression model both without and with the dummy variable for the new room (V13). To reduce the problems caused by non-aligned billing months and by missing data (the company skips one or two months in summer, which I imputed using the procedure above), I used mean natural gas usage per day for the month, in therms (V4), as the dependent variable, and V3 (plus V13, for the comparison) as the independent variables. The regression results are as follows:
(Gas consumption model without including V13)
From the SPSS output above, the R-squared value in the model summary shows that “87.2% of the variability in the sampled mean natural gas usage per day for the month, in therms, is explained by the regression on mean monthly temperature in Boston, in degrees Fahrenheit.”
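R-squared is 1 - SS_res / SS_tot: the share of the total variation in the dependent variable that the fitted line accounts for. A minimal least-squares sketch in plain Python shows where the intercept, slope and R-squared come from; it omits the standard errors and p-values that SPSS also reports, and the data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares line y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variation
    ss_tot = sum((b - my) ** 2 for b in y)                          # total variation
    return b0, b1, 1 - ss_res / ss_tot


# Hypothetical x/y pairs, for illustration only.
b0, b1, r2 = fit_simple_ols([1, 2, 3], [1, 2, 2])
print(b0, b1, r2)
```

With V3 as x and V4 as y, the same formulas give the intercept, slope and R-squared shown in the SPSS model summary and coefficients table.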
In the ANOVA table, the p-value is 0.000 (SPSS displays any p-value below 0.0005 this way), which is less than α = 0.05, so the relationship between the dependent and independent variables is significant. Similarly, table 3 shows that the p-value for V3 is also less than 0.05, so V3 is significantly related to V4. From this table, the model equation can be written as:
GAS PER DAY = 15.286 – 0.216 TEMPERATURE
V4 = 15.286 – 0.216 V3
“WITH A 5% CHANCE OF BEING WRONG, WE HAVE SUFFICIENT EVIDENCE TO SAY THAT FOR EVERY ONE-DEGREE FAHRENHEIT INCREASE IN MEAN MONTHLY TEMPERATURE IN BOSTON, THE MEAN NATURAL GAS USAGE PER DAY FOR THE MONTH DECREASES BY 0.216 THERMS.”
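Because the temperature coefficient is negative, warmer months predict lower gas usage. Plugging temperatures into the fitted equation GAS PER DAY = 15.286 – 0.216 TEMPERATURE makes the direction of the effect concrete; the function name is illustrative, and the coefficients are the ones reported above.

```python
# Coefficients from the fitted equation GAS PER DAY = 15.286 - 0.216 TEMPERATURE.
INTERCEPT = 15.286    # therms/day at 0 deg F (an extrapolation)
SLOPE_TEMP = -0.216   # change in therms/day per 1 deg F


def predicted_gas_per_day(temp_f):
    """Predicted mean natural gas usage (therms/day) at a mean monthly temperature."""
    return INTERCEPT + SLOPE_TEMP * temp_f


cold, mild = predicted_gas_per_day(30), predicted_gas_per_day(31)
print(round(cold, 3))         # 8.806
print(round(mild - cold, 3))  # -0.216: one degree warmer, 0.216 therms/day less
```

The line reaches zero near 70.8°F (15.286 / 0.216), so predictions for temperatures outside the sampled range should not be trusted.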
Because the model above does not capture the relationship between gas consumption and the addition of the new room, I ran the regression again with V13 included as an independent dummy variable to measure the impact of the new room, as asked in Question 2. The output and its interpretation are given below:
From the SPSS output, the R-squared value in the model summary shows that “88.4% of the variability in the sampled mean natural gas usage per day for the month, in therms, is explained by the regression on mean monthly temperature in Boston, in degrees Fahrenheit, and the addition of the new room.”
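Adding the 0/1 dummy V13 turns the model into a two-predictor regression. As a sketch using only the Python standard library, the coefficients can be found by solving the normal equations (X'X)b = X'y with Gaussian elimination; this is adequate for a handful of predictors but is not claimed to be how SPSS computes them. The data below are hypothetical and were built to lie exactly on a known plane so the result is easy to check.

```python
def fit_ols(rows, y):
    """Least squares via the normal equations; an intercept column is prepended.

    rows: list of predictor tuples, e.g. (temperature, new_room).
    Returns (coefficients [b0, b1, ...], r_squared).
    """
    X = [[1.0, *r] for r in rows]
    k, n = len(X[0]), len(X)
    A = [[sum(X[m][i] * X[m][j] for m in range(n)) for j in range(k)] for i in range(k)]
    c = [sum(X[m][i] * y[m] for m in range(n)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    preds = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    my = sum(y) / n
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return b, 1 - ss_res / ss_tot


# Hypothetical (temperature, new_room) rows lying exactly on y = 15 - 0.2*t + 1.0*d.
rows = [(30, 0), (50, 0), (70, 0), (30, 1), (50, 1), (70, 1)]
usage = [9, 5, 1, 10, 6, 2]
coefs, r2 = fit_ols(rows, usage)
print([round(v, 3) for v in coefs])  # [15.0, -0.2, 1.0]
```

Real billing data will not fit perfectly, so R-squared will fall below 1, as in the 88.4% reported above.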
In the ANOVA table, the p-value is still 0.000, which is less than α = 0.05, so the relationship between the dependent and independent variables is significant. Similarly, table 3 shows that the p-values for V3 and V13 are both less than 0.05, so both are significantly related to V4. From this table, the model equation can be written as:
GAS PER DAY = 14.984 – 0.215 TEMPERATURE + 0.962 NEW ROOM
V4 = 14.984 – 0.215 V3 + 0.962 V13
“WITH A 5% CHANCE OF BEING WRONG, WE HAVE SUFFICIENT EVIDENCE TO SAY THAT FOR EVERY ONE-DEGREE FAHRENHEIT INCREASE IN MEAN MONTHLY TEMPERATURE IN BOSTON, THE MEAN NATURAL GAS USAGE PER DAY FOR THE MONTH DECREASES BY 0.215 THERMS, WHILE KEEPING ALL OTHER VARIABLES CONSTANT.”
“WITH A 5% CHANCE OF BEING WRONG, WE HAVE SUFFICIENT EVIDENCE TO SAY THAT THE MEAN NATURAL GAS USAGE PER DAY FOR THE MONTH IS 0.962 THERMS HIGHER WITH THE NEW ROOM THAN WITHOUT IT, WHILE KEEPING ALL OTHER VARIABLES CONSTANT.”
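A dummy variable shifts the whole regression line up or down without changing its slope. Evaluating GAS PER DAY = 14.984 – 0.215 TEMPERATURE + 0.962 NEW ROOM at a few temperatures shows that the predicted gap between "new room" and "no new room" is the same 0.962 therms/day everywhere; the function name is illustrative.

```python
# Coefficients from GAS PER DAY = 14.984 - 0.215 TEMPERATURE + 0.962 NEW ROOM.
B0, B_TEMP, B_ROOM = 14.984, -0.215, 0.962


def predicted_gas(temp_f, new_room):
    """Predicted therms/day; new_room is the 0/1 dummy (V13)."""
    return B0 + B_TEMP * temp_f + B_ROOM * new_room


for t in (20, 40, 60):
    gap = predicted_gas(t, 1) - predicted_gas(t, 0)
    print(t, round(gap, 3))  # the gap is 0.962 at every temperature
```

If the new room's effect actually grew in colder months, this model could not show it; that would require an interaction term between V3 and V13.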
The histogram of the regression standardized residuals shows two major departures, but overall the residuals are approximately normally distributed. The partial regression plot shows that the dependent variable V4 is inversely related to V3.
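A standardized residual is a residual divided by the standard error of the estimate, sqrt(SSE / (n - k)), where k counts the estimated coefficients including the intercept. This definition is assumed here to match the values behind the SPSS histogram. The sketch flags points beyond |z| > 2, a conventional cutoff that is an assumption here; the observed and fitted values are hypothetical.

```python
import math


def standardized_residuals(observed, fitted, n_coefficients=2):
    """Residuals divided by the standard error of the estimate, sqrt(SSE / (n - k))."""
    res = [o - f for o, f in zip(observed, fitted)]
    see = math.sqrt(sum(r * r for r in res) / (len(res) - n_coefficients))
    return [r / see for r in res]


def large_departures(z, cutoff=2.0):
    """Indices of standardized residuals beyond an assumed |z| cutoff."""
    return [i for i, v in enumerate(z) if abs(v) > cutoff]


# Hypothetical observed and fitted values with one large miss at the end.
observed = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.0, 5.0]
fitted = [0.0] * 10
print(large_departures(standardized_residuals(observed, fitted)))  # [9]
```

Points flagged this way correspond to the bars far out in the tails of the residual histogram, such as the two departures noted above.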
QUESTION 3:
MODEL FOR ELECTRICITY USAGE
I used V8 as the dependent variable and V3 and V13 as the independent variables. The regression output is shown below:
From the above SPSS outputs, in the model summary, the value