Friday, October 3, 2014

Business Analytics Case Study - Multiple Regression using Microsoft Excel- India Hatchback Car Data

The datasheet can be downloaded from this link: Download Data

Case Study: The data in the sheet is only for classroom teaching purposes. The data is of 25 hatchback cars in Indian Market. The Data variables include Name of the car, Length, Width, Height, Kerb Weight, Displacement, Max Power BHP, Max Power RPM, Max Torque NM, Max Torque RPM, Seating Capacity, Number of Gears, Mileage in kilometer per Litre, Fuel Type & Transmission type of each car. The data belongs to only petrol cars and manual transmission.

Using this Data, Conduct a Multiple Regression analysis in Microsoft Excel studying the impact of the variables on Mileage. We will not take into account two variables Fuel Type & Transmission type. Rest of all the Variables will be included in Study. 

For running Regression in Microsoft Excel, One needs to have Data Analysis ToolPak.

The following are the steps for downloading Data Analysis ToolPak. 
1. Click on the green File tab. The File tab in Excel 2010 replaces the Office Button (or File Menu) in previous versions of Excel.
2. Click on Options
3. Under Add-ins, select Analysis ToolPak and click on the Go button.
4. Check Analysis ToolPak and click on OK.
5. On the Data tab, you can now click on Data Analysis.

The following dialog box below appears.
For example, select Regression and click OK to create a Regression in Excel.


Multiple Regression Case Solution

The following are steps for running Multiple Regression in Excel.
Continuing from Step above, a Dialog Box Appears for Regression 


Input Y Range is the dependent variable in our case Mileage which is in Column M of the Data Sheet, Select $M$1:$M$26.

Input X Range is the independent variables in our case Length,Width,Height, Kerb Weight, Displacement, Max Power BHP, Max Power RPM, Max Torque NM, Max Torque RPM, Seating Capacity, Number of Gears. These variable are in Columns B to L, Select $B$1:$L$26.

Click on Labels as labels are in first row and confidence Level 95%. Click New Worksheet Ply and title it as reg Model 1. A new sheet with this title will appear in Excel Work Book and will store the Regression out put.

Regression Out Put Reg Model 1


The Multiple Regression Equation is 

Mileage = -41.5613 + 0.0023 X Length + 0.02142 X Width + 0.01625 X Height - 0.01869 X Kerb Weight - 0.01199 X Displacement - 0.08312 X Max Power BHP + 0.00192 X Max Power RPM + 0.08052 X Max Torque NM + 0.00066 X Max Turque RPM + 1.50058 X Number of Gears + 0 X Seating Capacity

Analysis of the Multiple Regression Output

R Square means that 91.79% of the variation of Dependent Variable is explained by the independent variables. 

Multiple R of 95.80% is the correlation between Actual Mileage Values and Predicted Mileage Values. 

Adjusted R Square is adjusted for number of X Variables. 

Next Look at Significance F Value in Anova Table which is associated with P-Value.
Since it is less than 0.05 we reject the H0 at the significance level 0.05.

In the Coefficients table look at P-Value and only those variables that has P-Value less than 0.05 will be considered significant and if P-Value is more than 0.05, those variables will be considered insignificant.

In table above, Width, Height, Kerb Weight , Displacement & Max Power RPM have values less than 0.005 so they are considered significant rest of all considered insignificant. 

Regression Output Reg Model 2 

We have rerun the regression by removing the insignificant variables Length, Max Power BHP, Max Torque NM, Max torque RPM, Seating Capacity & Number of Gears.


The Multiple Regression Equation is
Mileage = -12.23644 +  0.01806 X Width + 0.01093 X Height - 0.01439 X Kerb Weight - 0.00825 X Displacement  + 0.00136 X Max Power RPM 

Analysis of Multiple Regression Output 

R Square means that 88.91% of the variation of Dependent Variable is explained by the independent variables. This is slightly less than previous Reg Model 1.

Multiple R of 94.29% is the correlation between Actual Mileage Values and Predicted Mileage Values. 

Adjusted R Square is adjusted for number of X Variables. 

Next Look at Significance F Value in Anova Table which is associated with P-Value.
Since it is less than 0.05 we reject the H0 at the significance level 0.05.

In table above, Width, Height, Kerb Weight , Displacement & Max Power RPM have values less than 0.005 so they are considered significant. 

1 comment:

  1. Helpful information you have shared you have shared
    I would recommend you to do from business analytics training from Imarticus

    ReplyDelete