| Independent | Dependent |
|---|---|
| Price | Demand |
| Rainfall | Yield |
| Credit sales | Bad debts |
| Volume of production | Manufacturing expenses |

The values of the independent variable are assumed to be fixed. Hence it is not a random variable. On the other hand, the dependent variable, whose values are determined on the basis of the independent variable, is a random variable.
Regression Lines:

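Linear Regression Model:
$$\mu_{y|x} = \alpha + \beta x$$
where α and β are the parameters of the equation; or, for an individual value of y,
$$y = \alpha + \beta x + \varepsilon$$
This equation is called a 'linear regression model of y on x', and ε is a random variable (the random error) with mean equal to zero and variance σ².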
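To make the roles of the symbols concrete, the sketch below generates observations from the model for a few fixed values of x. The parameter values ALPHA, BETA and SIGMA are hypothetical, and drawing ε from a normal distribution is an added assumption: the model above only requires ε to have mean zero and variance σ².

```python
import random

# Hypothetical parameter values, chosen only for illustration.
ALPHA = 2.0  # alpha: Y-intercept of the population regression line
BETA = 0.5   # beta: population regression coefficient (slope)
SIGMA = 1.0  # sigma: standard deviation of the random error epsilon


def mean_response(x):
    """Population regression line: expected value of y for a given x."""
    return ALPHA + BETA * x


def simulate(xs, seed=0):
    """Draw one y per fixed x from y = alpha + beta*x + epsilon.

    Assumption: epsilon ~ Normal(0, SIGMA**2); the text only fixes its
    mean (zero) and variance (sigma squared).
    """
    rng = random.Random(seed)  # seeded so the draws are reproducible
    return [mean_response(x) + rng.gauss(0.0, SIGMA) for x in xs]


xs = [1, 2, 3, 4, 5]        # fixed (non-random) values of the independent variable
ys = simulate(xs, seed=42)  # random values of the dependent variable
for x, y in zip(xs, ys):
    print(f"x={x}  mean response={mean_response(x):.2f}  observed y={y:.2f}")
```

Because x is held fixed, changing the seed changes only the y values, which mirrors the statement that the dependent variable, not the independent one, is random.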

In the above diagram, the line represents the line of regression of Y on X. The parameter α, which is the expected value of Y when X = 0, is called the Y-intercept. The parameter β is the slope of the population regression line and is known as the 'population regression coefficient'; it gives the change in the expected value of Y for each unit increase in X. When the line slopes downward to the right, β is negative and then represents the amount of decrease in Y for each unit increase in X.
In practice, the population regression line is unknown. Since the regression line is defined by the Y-intercept α and the slope β, estimating the population regression line amounts to obtaining estimates of α and β from sample data. Thus the 'population regression line' ($\mu_{y|x} = \alpha + \beta x$) is estimated by the 'sample regression line' or 'sample regression equation':
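$$\hat{y} = a + bx \qquad \text{(i)}$$
where a and b are the sample estimates of α and β, and ŷ is the estimated value of y for a given x.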
The problem of estimating the regression parameters α and β can be regarded as fitting the best line to the scatter diagram. One method for this purpose is the 'method of least squares', which chooses a and b so that the error sum of squares is as small as possible:
$$\text{ESS} = \sum (y_i - \hat{y}_i)^2$$
where
ESS: error sum of squares
y_i: observed values
ŷ_i: estimated values, i.e., ŷ_i = a + bx_i
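The error sum of squares can be evaluated for any candidate pair (a, b). In this minimal sketch, the helper name ess and the data set are made up for illustration: a line passing through every point has ESS = 0, while a worse line has a larger ESS, and the method of least squares picks the pair with the smallest value.

```python
def ess(xs, ys, a, b):
    """Hypothetical helper: error sum of squares for the line y_hat = a + b*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = a + b * x          # estimated value for this x
        total += (y - y_hat) ** 2  # squared deviation of observed from estimated
    return total


# Made-up data lying exactly on the line y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

print(ess(xs, ys, 1, 2))  # 0.0: the line passes through every point
print(ess(xs, ys, 0, 2))  # 4.0: each point lies 1 unit above this line
```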
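It is further elaborated as:
$$\text{ESS} = \sum (y_i - a - bx_i)^2$$
Setting the partial derivatives of ESS with respect to a and b equal to zero gives the two 'normal equations':
$$\sum y = na + b\sum x \qquad \text{(ii)(a)}$$
$$\sum xy = a\sum x + b\sum x^2 \qquad \text{(ii)(b)}$$
Solving (ii)(a) and (ii)(b) simultaneously gives the least squares estimates:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad a = \bar{y} - b\bar{x} \qquad \text{(iii)}$$
To fit the sample regression line, compute b and a from (iii) (or solve (ii)(a) and (ii)(b) directly) and substitute them into ŷ = a + bx.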
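The closed-form solution of the normal equations can be written as a short function. This is a minimal sketch: the names least_squares and satisfies_normal_equations are hypothetical, and the data are made up. It computes b and a from the column totals and then checks that both normal equations, Σy = na + bΣx and Σxy = aΣx + bΣx², hold up to floating-point rounding.

```python
import math


def least_squares(xs, ys):
    """Hypothetical helper: return (a, b) for the least squares line y_hat = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b = (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # a = ȳ - b*x̄
    a = sum_y / n - b * sum_x / n
    return a, b


def satisfies_normal_equations(xs, ys, a, b):
    """Hypothetical helper: check Σy = n*a + b*Σx and Σxy = a*Σx + b*Σx²."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    first = math.isclose(sum_y, n * a + b * sum_x)
    second = math.isclose(sum_xy, a * sum_x + b * sum_x2)
    return first and second


xs = [1, 2, 3, 4]  # made-up data
ys = [2, 3, 5, 6]
a, b = least_squares(xs, ys)
print(a, b)
print(satisfies_normal_equations(xs, ys, a, b))
```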
Example:
A sample of paired observations is given below:

| X | 2 | 4 | 6 | 7 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|
| Y | 1 | 2 | 4 | 7 | 10 | 12 | 14 |

Required:
(a) Fit a line of regression to the data in the above table.
(b) Construct a scatter diagram and graph the fitted line on the scatter diagram, and
(c) Calculate the error sum of squares.
Solution:
(a):
Regression Line of Y on X

| x | y | xy | x² | ŷ | y − ŷ | (y − ŷ)² |
|---|---|---|---|---|---|---|
| 2 | 1 | 2 | 4 | −0.438 | 1.438 | 2.068 |
| 4 | 2 | 8 | 16 | 2.594 | −0.594 | 0.353 |
| 6 | 4 | 24 | 36 | 5.626 | −1.626 | 2.644 |
| 7 | 7 | 49 | 49 | 7.142 | −0.142 | 0.020 |
| 9 | 10 | 90 | 81 | 10.174 | −0.174 | 0.030 |
| 10 | 12 | 120 | 100 | 11.69 | 0.31 | 0.096 |
| 11 | 14 | 154 | 121 | 13.206 | 0.794 | 0.630 |
| Σx = 49 | Σy = 50 | Σxy = 447 | Σx² = 407 | Σŷ = 49.994 ≈ 50 | Σ(y − ŷ) = 0.006 ≈ 0 | Σ(y − ŷ)² = 5.841 |

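$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{7(447) - (49)(50)}{7(407) - (49)^2} = \frac{679}{448} = 1.516 \qquad \text{(i)}$$
$$a = \bar{y} - b\bar{x} = \frac{50}{7} - 1.516\left(\frac{49}{7}\right) = 7.143 - 10.612 = -3.469 \approx -3.47 \qquad \text{(ii)}$$
$$\hat{y} = -3.47 + 1.516x \qquad \text{(iii)}$$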
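The arithmetic of the worked solution can be reproduced directly from the data. This sketch recomputes the column totals and the coefficients; because it keeps b unrounded (679/448 = 1.515625), it gives a = −3.4665, which rounds to −3.47.

```python
xs = [2, 4, 6, 7, 9, 10, 11]
ys = [1, 2, 4, 7, 10, 12, 14]
n = len(xs)

# Column totals of the worked table
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Least squares coefficients (b kept unrounded)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(sum_x, sum_y, sum_xy, sum_x2)  # 49 50 447 407
print(round(b, 3), round(a, 2))      # 1.516 -3.47
```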
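Substituting each value of x into ŷ = −3.47 + 1.516x:
$$\begin{aligned}
x = 2:&\quad \hat{y} = -3.47 + 1.516(2) = -0.438 \\
x = 4:&\quad \hat{y} = -3.47 + 1.516(4) = 2.594 \\
x = 6:&\quad \hat{y} = -3.47 + 1.516(6) = 5.626 \\
x = 7:&\quad \hat{y} = -3.47 + 1.516(7) = 7.142 \\
x = 9:&\quad \hat{y} = -3.47 + 1.516(9) = 10.174 \\
x = 10:&\quad \hat{y} = -3.47 + 1.516(10) = 11.69 \\
x = 11:&\quad \hat{y} = -3.47 + 1.516(11) = 13.206
\end{aligned}$$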
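The ŷ and y − ŷ columns of the table were obtained with the rounded coefficients a = −3.47 and b = 1.516. The sketch below repeats that substitution so each entry can be checked; rounding to three decimals follows the table.

```python
xs = [2, 4, 6, 7, 9, 10, 11]
ys = [1, 2, 4, 7, 10, 12, 14]
a, b = -3.47, 1.516  # coefficients rounded as in the worked solution

fitted = [round(a + b * x, 3) for x in xs]                   # y_hat column
residuals = [round(y - yh, 3) for y, yh in zip(ys, fitted)]  # y - y_hat column

for x, yh, e in zip(xs, fitted, residuals):
    print(x, yh, e)
```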
(b):
[Figure: scatter diagram of the paired observations with the fitted line ŷ = −3.47 + 1.516x]
(c) Error Sum of Squares (ESS):
$$\text{ESS} = \sum (y - \hat{y})^2 = 5.841$$
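The ESS of 5.841 was computed with rounded coefficients. As a cross-check, this sketch recomputes it with unrounded least squares estimates, using the equivalent deviation form b = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)². It also confirms that the residuals of a least squares line sum to zero, which is why the y − ŷ column totals approximately 0.

```python
import math

xs = [2, 4, 6, 7, 9, 10, 11]
ys = [1, 2, 4, 7, 10, 12, 14]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b = sxy / sxx          # same value as the column-total formula
a = y_bar - b * x_bar

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
ess = sum(e ** 2 for e in residuals)

print(round(ess, 4))               # 5.8415
print(abs(sum(residuals)) < 1e-9)  # True
```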
Coefficient of Determination:
$$\text{TSS} = \sum (y - \bar{y})^2$$
It measures the variation in y about the sample mean ȳ and is called the 'Total Sum of Squares (TSS)'.
$$\text{ESS} = \sum (y - \hat{y})^2$$
It measures the variation in y about the estimated regression line and is called the 'Error Sum of Squares (ESS)'.
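Since the least squares line fits the data at least as well as the horizontal line ŷ = ȳ,
$$\text{ESS} \le \text{TSS}$$
The difference
$$\text{RSS} = \text{TSS} - \text{ESS} = \sum (\hat{y} - \bar{y})^2$$
is called the 'Regression Sum of Squares (RSS)'; it measures the part of the variation in y that is explained by the regression line. Therefore, TSS is partitioned into two components, RSS and ESS:
$$\text{TSS} = \text{RSS} + \text{ESS}$$
The coefficient of determination, r², is the proportion of the total variation in y that is explained by the regression:
$$r^2 = \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\text{ESS}}{\text{TSS}}$$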
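For the example data, the partition of the total sum of squares can be checked numerically. This sketch uses unrounded least squares estimates and computes TSS, RSS = Σ(ŷ − ȳ)² and ESS separately, then confirms TSS = RSS + ESS up to floating-point rounding.

```python
import math

xs = [2, 4, 6, 7, 9, 10, 11]
ys = [1, 2, 4, 7, 10, 12, 14]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sxy / sxx
a = y_bar - b * x_bar
y_hat = [a + b * x for x in xs]

tss = sum((y - y_bar) ** 2 for y in ys)               # variation about the mean
rss = sum((yh - y_bar) ** 2 for yh in y_hat)          # variation explained by the line
ess = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # variation about the line

print(round(tss, 3), round(rss, 3), round(ess, 3))
print(math.isclose(tss, rss + ess))
```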
Note that the minimum value of r² is zero (when RSS = 0 and ESS = TSS) and the maximum value of r² is +1 (when RSS = TSS and ESS = 0); therefore, r² lies between 0 and 1:
$$0 \le r^2 \le 1$$
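$$r^2 = b \times d$$
where b is the regression coefficient of y on x and d is the regression coefficient of x on y (the slope of the least squares line of x on y).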
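The identity r² = b × d can be checked numerically. In this sketch, d is taken to be the slope of the least squares regression of x on y, i.e. Σ(x − x̄)(y − ȳ)/Σ(y − ȳ)²; the small data set and the helper names slopes and r_squared are made up for illustration. The product b × d is compared with r² computed as 1 − ESS/TSS from the fitted line of y on x.

```python
import math


def slopes(xs, ys):
    """Hypothetical helper: return (b, d), the slopes of y on x and of x on y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sxx, sxy / syy


def r_squared(xs, ys):
    """Hypothetical helper: r^2 = 1 - ESS/TSS for the least squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b, _ = slopes(xs, ys)
    a = y_bar - b * x_bar
    ess = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ess / tss


xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]
b, d = slopes(xs, ys)
print(b, d, b * d, r_squared(xs, ys))
print(math.isclose(b * d, r_squared(xs, ys)))
```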
Example:
Take the previous example and calculate the coefficient of determination.
Solution:
Coefficient of Determination

| x | y | xy | x² | y² |
|---|---|---|---|---|
| 2 | 1 | 2 | 4 | 1 |
| 4 | 2 | 8 | 16 | 4 |
| 6 | 4 | 24 | 36 | 16 |
| 7 | 7 | 49 | 49 | 49 |
| 9 | 10 | 90 | 81 | 100 |
| 10 | 12 | 120 | 100 | 144 |
| 11 | 14 | 154 | 121 | 196 |
| Σx = 49 | Σy = 50 | Σxy = 447 | Σx² = 407 | Σy² = 510 |
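With the column totals in this table, r² can be computed without first finding the fitted values. This sketch uses the computing form r² = (nΣxy − ΣxΣy)² / {[nΣx² − (Σx)²][nΣy² − (Σy)²]}, which is algebraically the product b × d of the two regression slopes; the totals are copied from the table and n = 7.

```python
# Column totals copied from the table (n = 7 pairs)
n = 7
sum_x, sum_y = 49, 50
sum_xy, sum_x2, sum_y2 = 447, 407, 510

numerator = n * sum_xy - sum_x * sum_y  # 679
sxx_term = n * sum_x2 - sum_x ** 2      # 448
syy_term = n * sum_y2 - sum_y ** 2      # 1070

b = numerator / sxx_term  # slope of y on x
d = numerator / syy_term  # slope of x on y
r2 = b * d

print(round(b, 4), round(d, 4), round(r2, 4))
```

For these data the result is r² ≈ 0.9618, i.e. roughly 96% of the variation in y is accounted for by the fitted line.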