Independent                Dependent
Price                      Demand
Rainfall                   Yield
Credit sales               Bad debts
Volume of production       Manufacturing expenses
The values of the independent variable are assumed to be fixed. Hence it is not a random variable. On the other hand, the dependent variable, whose values are determined on the basis of the independent variable, is a random variable.
Regression Lines:
Linear Regression Model:
μ_{y/x} = α + βx
where α and β are the parameters of the equation,
or
y = α + βx + ε
This equation is called the ‘linear regression model of y on x’, and ε is a random variable with mean equal to zero and variance σ^{2}.
In the above diagram, the line represents the line of regression of Y on X. The parameter α, which is the expected value of Y when X = 0, is called the Y-intercept. The parameter β is the slope of the population regression line and is known as the ‘population regression coefficient’. When the line slopes downward to the right, the value of β is negative; it then represents the amount of decrease in Y for each unit increase in X.
In practice, the population regression line is unknown. Since the line is defined by the Y-intercept α and the slope β, estimating the population regression line means obtaining estimates of α and β from sample data. Thus the ‘population regression line’ (μ_{y/x} = α + βx) is estimated by the ‘sample regression line’ or ‘sample regression equation’:
ŷ = a + bx (i)
where a and b are the sample estimates of α and β.
The problem of estimating the regression parameters α and β can be considered as fitting the best model on the scatter diagram. One method for this purpose is the ‘method of least squares’.
ESS = Σ(y_{i} – ŷ_{i})^{2}
Where ESS : Error sum of squares
y_{i} : observed values
ŷ_{i} : estimated values, i.e., (ŷ_{i} = a + bx_{i})
It is further elaborated as:
ESS = Σ(y_{i} – a – bx_{i})^{2}
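Minimizing ESS with respect to a and b gives the least-squares estimates:
b = (nΣxy – ΣxΣy) / (nΣx^{2} – (Σx)^{2}) (ii)(a)
b = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)^{2} (ii)(b)
a = ȳ – bx̄ (iii)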
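For the regression line of X on Y, the corresponding equations are:
x̂ = c + dy (i)
d = (nΣxy – ΣxΣy) / (nΣy^{2} – (Σy)^{2}) (ii)(a)
d = Σ(x – x̄)(y – ȳ) / Σ(y – ȳ)^{2} (ii)(b)
c = x̄ – dȳ (iii)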
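The least-squares estimates for the line of Y on X have the standard closed form b = (nΣxy – ΣxΣy) / (nΣx^{2} – (Σx)^{2}) and a = ȳ – bx̄. A minimal Python sketch of that computation follows; the function name `fit_line` and the four sample points are illustrative assumptions, not part of the original text.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x.

    fit_line is a hypothetical helper name used only for this sketch.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope: b = (n*Sxy_raw - Sx*Sy) / (n*Sx2 - Sx^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = y-bar - b * x-bar
    a = sum_y / n - b * sum_x / n
    return a, b


# Made-up points that lie exactly on y = 1 + 2x, so the fit should recover a = 1, b = 2.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Working from the raw sums (Σx, Σy, Σxy, Σx^{2}) mirrors the tabular layout used in hand calculations, where each sum is a column total.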
Example:
A sample of paired observations is given below:
X    2    4    6    7    9     10    11
Y    1    2    4    7    10    12    14
Required:
(a) Fit a line of regression to the data in the above table.
(b) Construct a scatter diagram and graph the fitted line on the scatter diagram.
(c) Calculate the error sum of squares.
Solution:
(a):
Regression Line of Y on X
         x     y     xy     x^{2}   ŷ              y – ŷ        (y – ŷ)^{2}
         2     1     2      4       –0.438         1.438        2.068
         4     2     8      16      2.594          –0.594       0.353
         6     4     24     36      5.626          –1.626       2.644
         7     7     49     49      7.142          –0.142       0.020
         9     10    90     81      10.174         –0.174       0.030
         10    12    120    100     11.69          0.31         0.096
         11    14    154    121     13.206         0.794        0.630
Total    49    50    447    407     49.994 ≈ 50    0.006 ≈ 0    5.841
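ŷ = a + bx (i)
b = (nΣxy – ΣxΣy) / (nΣx^{2} – (Σx)^{2}) = (7 × 447 – 49 × 50) / (7 × 407 – 49^{2}) = 679 / 448 = 1.516 (ii)
a = ȳ – bx̄ = 7.143 – 1.516 × 7 = –3.47 (iii)
Hence the fitted line is ŷ = –3.47 + 1.516x.
For x = 2, ŷ = –3.47 + 1.516(2) = –0.438
x = 4, ŷ = –3.47 + 1.516(4) = 2.594
x = 6, ŷ = –3.47 + 1.516(6) = 5.626
x = 7, ŷ = –3.47 + 1.516(7) = 7.142
x = 9, ŷ = –3.47 + 1.516(9) = 10.174
x = 10, ŷ = –3.47 + 1.516(10) = 11.69
x = 11, ŷ = –3.47 + 1.516(11) = 13.206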
(b):
(c) Error Sum of Squares (ESS):
ESS = Σ(y – ŷ)^{2} = 5.841
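The worked example can be checked numerically. The sketch below is one way to do it in Python, assuming the standard least-squares formulas b = (nΣxy – ΣxΣy) / (nΣx^{2} – (Σx)^{2}) and a = ȳ – bx̄; the variable names are illustrative. It fits the line to the seven observations, computes the fitted values and residuals, and sums the squared residuals.

```python
# Paired observations from the example
xs = [2, 4, 6, 7, 9, 10, 11]
ys = [1, 2, 4, 7, 10, 12, 14]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Least-squares slope and intercept, kept at full precision (no intermediate rounding)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Fitted values, residuals, and error sum of squares
fitted = [a + b * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]
ess = sum(e * e for e in residuals)

print("b =", b)
print("a =", a)
print("sum of residuals =", sum(residuals))
print("ESS =", ess)
```

Carried at full precision, b is 97/64 = 1.515625 and ESS is roughly 5.8415. The small differences from the tabulated figures arise because the hand calculation rounds a and b before computing the fitted values.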
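Coefficient of Determination:
TSS = Σ(y – ȳ)^{2}
It measures the variation in y about the sample mean ȳ. The term Σ(y – ȳ)^{2} is called the ‘Total Sum of Squares (TSS)’.
ESS = Σ(y – ŷ)^{2}
It measures the variation in y about the estimated regression line. The term Σ(y – ŷ)^{2} is called the ‘Error Sum of Squares (ESS)’:
ESS ≤ TSS
The difference between the two is the variation in y explained by the regression, the ‘Regression Sum of Squares (RSS)’:
RSS = TSS – ESS
Therefore, the TSS is partitioned into two components, i.e., ESS and RSS:
TSS = RSS + ESS
The coefficient of determination, r^{2}, is the proportion of the total variation in y that is explained by the regression:
r^{2} = RSS / TSS = 1 – ESS / TSS
Note that the minimum value of r^{2} is zero (when RSS = 0 and ESS = TSS), and the maximum value of r^{2} is +1 (when RSS = TSS and ESS = 0); therefore, r^{2} lies between 0 and 1:
0 ≤ r^{2} ≤ 1
In terms of the regression coefficients,
r^{2} = b × d
where b is the regression coefficient of y on x and d is the regression coefficient of x on y.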
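The identity r^{2} = b × d can be verified in a few lines. The sketch below assumes that b is the regression coefficient of y on x, that d is the regression coefficient of x on y, and that RSS denotes the regression (explained) sum of squares Σ(ŷ – ȳ)^{2}. The shorthand S_{xx}, S_{yy} and S_{xy} is introduced here for convenience and is not part of the original notation.

```latex
\begin{align*}
S_{xx} &= \sum (x - \bar{x})^2, \qquad
S_{yy} = \sum (y - \bar{y})^2 = \mathrm{TSS}, \qquad
S_{xy} = \sum (x - \bar{x})(y - \bar{y}) \\
b &= \frac{S_{xy}}{S_{xx}}, \qquad d = \frac{S_{xy}}{S_{yy}} \\
\mathrm{RSS} &= \sum (\hat{y} - \bar{y})^2 = b^2 S_{xx}
  \quad \text{since } \hat{y} - \bar{y} = b\,(x - \bar{x}) \\
r^2 &= \frac{\mathrm{RSS}}{\mathrm{TSS}}
  = \frac{b^2 S_{xx}}{S_{yy}}
  = \frac{S_{xy}^2}{S_{xx}\, S_{yy}}
  = b \times d
\end{align*}
```

Because b and d share the numerator S_{xy}, they always have the same sign, so their product is never negative, consistent with 0 ≤ r^{2} ≤ 1.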
Example:
Take the previous example and calculate the coefficient of determination.
Solution:
Coefficient of Determination
         x     y     xy     x^{2}   y^{2}
         2     1     2      4       1
         4     2     8      16      4
         6     4     24     36      16
         7     7     49     49      49
         9     10    90     81      100
         10    12    120    100     144
         11    14    154    121     196
Total    49    50    447    407     510
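The column totals in the table are enough to compute the coefficient of determination. The following Python sketch does so in two ways, as RSS / TSS and as the product b × d, under the assumption that d is the regression coefficient of x on y and RSS is the regression sum of squares; the names `s_xx`, `s_yy` and `s_xy` are illustrative shorthand for the corrected sums of squares and products.

```python
# Paired observations from the example
xs = [2, 4, 6, 7, 9, 10, 11]
ys = [1, 2, 4, 7, 10, 12, 14]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Corrected sums of squares and products
s_xx = sum_x2 - sum_x ** 2 / n      # sum of (x - x_bar)^2
s_yy = sum_y2 - sum_y ** 2 / n      # sum of (y - y_bar)^2, i.e. TSS
s_xy = sum_xy - sum_x * sum_y / n   # sum of (x - x_bar)(y - y_bar)

b = s_xy / s_xx   # regression coefficient of y on x
d = s_xy / s_yy   # regression coefficient of x on y (assumed meaning of d)

tss = s_yy
rss = b * s_xy    # regression sum of squares, equal to b^2 * s_xx
ess = tss - rss   # error sum of squares

r2 = rss / tss
print("TSS =", tss, "RSS =", rss, "ESS =", ess)
print("r^2 =", r2)
print("b * d =", b * d)
```

Both routes give the same value, roughly 0.962, meaning that about 96% of the variation in y is accounted for by the fitted line.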