VLC Home
  Course Home
   
--  MS EXCEL CASE STUDY
-

Problem

-

Procedure

- Data Preparation
- Calculation of crop productivity, convert the data of "Total land holding", "Soil depth " amenable to analysis
- Summarize the data with suitable statistical measures and frequency distribution
- Regression Analysis
- View data using database features
- Represent the data through appropriate graphs and tables
   
 
Self Test
Ask a Question
Download PDF File


Regression Analysis

This analysis tool performs linear regression analysis by using the "least squares" method to fit a line through a set of observations. You can analyze how a single dependent variable is affected by the values of one or more independent variables - for example, how Crop Yield is affected by such factors as Size of the Land Holding, FYM , N, P, K, Soil Depth, Pesticide Cost, Labor Cost and Machinery Cost. You can apportion shares in the each of these factors, based on a set of data. These results are useful to assess and understand the influence of various factors on crop yield.

  1. Copy the Entire Data in DATA Worksheet to New Worksheet as did earlier and rename the new worksheet as Regression

  2. Select the Soil Code Column i.e Column M
  3. Click on CUT Icon
  4. Position the Mouse Pointer in Cell P1
  5. Click on PASTE Icon to paste these values.
  6. Delete the Empty Column M
  7. Select DATA ANALYSIS option from the TOOL Menu.
  8. Select REGRESSION Option from the displayed DATA ANALYSIS Dialogue Window

  9. Input Y Range : Enter the reference for the range of dependent data. The range must consist of a single column of data Select the INPUT Y RANGE as F!:F51 i.e Crop Yield Column

  10. Input X Range : Enter the reference for the range of independent data. Microsoft Excel orders independent variables from this range in ascending order from left to right. The maximum number of independent variables is 16. Select the INPUT X RANGE as G1:O51 i.e all other independent Varialbles

  11. Select the Labels Option, if the first row or column of your input range or ranges contains labels. Clear if your input has no labels; Microsoft Excel generates appropriate data labels for the output table.

  12. Output Range : Enter the reference for the upper-left cell of the output table. Allow at least seven columns for the summary output table, which includes an anova table, coefficients, standard error of y estimate, r2 values, number of observations, and standard error of coefficients. Select A55 as the OUTPUT Option

  13. Select RESIDUALS to include residuals in the residuals output table.
  14. Select STANDARDIZED RESIDUALS to include Standardized Residuals in the residuals output table.

  15. Click OK to get the regression Output

Observe the Output of the regression and note down the values for R-Square, Adjusted R Square, F Ratio and Number of Cases In this Case R-Square is : 0.3842 ( i.e about 38%) Adjusted R square : 0.2456 ( i.e about 24%) F-Ratio is : 2.7727 and Number of Cases are : 50 Now also observe the Residuals Table in the Output. You may find some out layers are existing in the data . ( incase if any standardized residuals values exceeds the values of -2 and +2) . Those cases are to be treated as out layers of the data. Removing out layers of the data

I. Copy the entire data in Regression Worksheet to New Worksheet as did earlier and rename this sheet as Regress1
II. Activate Regress1 Worksheet Arranging the standardized Residuals in Ascending Order
III. Block the Residual Output Table
IV. Select SORT Option from the DATA Menu
V. Select Standardized Residuals as the option for SORT BY
VI. Select ACSENDING Order
VII. Select Heading ROWs
VIII. Click OK to display the required data in Sort Order

Now Note down the Outlayer Observation Numbers from the Residuals Tables and delete those observation from the data. Then again run the regression and observe the values for R-Square, Adjusted R Square, F Ratio and Number of Cases. If any increase in the R-Square and Adjusted R- Square then this will be a relatively better regression when compared to the first set of data.

And again remove the out layers from the second set of data and then repeat the same process till you get best regression