Introduction to Statistics 

History of Statistics:

  1. The word ‘statistics’ is derived from the Latin word ‘status’ or the Italian word ‘statista’.
  2. In 1749, Gottfried Achenwall, during one of his lectures at a German university, used the word ‘statistik’ to mean the political science of several countries.
  3. Baron J.F. von Bielfeld defined statistics as the science that teaches what is the political arrangement of all the modern states of the known world.
  4. In 1838-39, an issue of the Journal of the Royal Statistical Society defined statistics as the ascertaining and bringing together of those facts which are calculated to illustrate the conditions and prospects of society.
  5. The subject matter of statistics has its origin in ancient times.
  6. Moses and David numbered their people and made fairly accurate counts of population.
  7. A census of population was held in Egypt as early as 3050 B.C. in connection with the construction of the pyramids.
  8. A land survey was conducted by Rameses II at about 1400 B.C. to redistribute the land among the inhabitants of Egypt.
  9. Similar censuses of population and wealth were conducted by Chinese, Roman, Greek and Arab rulers.
  10. Famous statisticians and mathematicians in history include John Graunt, Caspar Neumann, James G. Cardano (1501-1576), Jacob Bernoulli (1654-1705), Thomas Bayes, Pascal, Fermat, De Moivre, Laplace (1749-1827), Gauss (1777-1855), Karl Pearson, William S. Gosset (1876-1937), R.A. Fisher (1890-1962), J. Neyman (1894-1981) and E.S. Pearson (1895-1980).

Famous Statisticians and Their Contributions:

Statisticians                                         Contributions
John Graunt (1661)                                    Vital Statistics
James G. Cardano (1501-1576)                          Theory of Probability
Jacob Bernoulli (1654-1705)                           Theory of Probability
Thomas Bayes (1763)                                   Theory of Probability
De Moivre (1733)                                      Normal Curve Equation
Adolphe Quetelet (1796-1874)                          Applied Statistical Tools in Education and Sociology
Francis Galton                                        Applied Statistical Tools in Heredity, Eugenics and Psychology
Karl Pearson                                          Chi-Square Distribution
William S. Gosset (1876-1937)                         Probable Error of Mean
R.A. Fisher (1890-1962)                               Developed Small Sample Theory
J. Neyman (1894-1981) and E.S. Pearson (1895-1980)    Theory of Hypothesis Testing
A. Wald (1902-1950)                                   Statistical Decision Theory

Descriptive and Inferential Statistics:

  1. Descriptive statistics deals with the collection of data, their presentation in various forms, such as tables, graphs and diagrams, and the calculation of averages and other measures that describe the data.
  2. Inferential or inductive statistics deals with techniques for analysing data, making estimates and drawing conclusions from the limited information obtained from a sample, and testing the reliability of those estimates.
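
The distinction can be illustrated with a minimal Python sketch. The sample values are hypothetical, and the factor 1.96 (the normal value for roughly 95% coverage) is an assumption made for simplicity; for so small a sample a t value would normally be preferred.

```python
import math
import statistics

# Hypothetical sample of student heights in inches (illustrative values only).
sample = [61.0, 62.5, 63.0, 64.5, 65.0, 66.0, 67.5, 68.0]

# Descriptive statistics: summarise the data actually collected.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

# Inferential statistics: use the sample to estimate a population value
# and to indicate how reliable that estimate is.
standard_error = sample_sd / math.sqrt(len(sample))
lower = sample_mean - 1.96 * standard_error  # 1.96 is an assumed normal multiplier
upper = sample_mean + 1.96 * standard_error

print(f"sample mean = {sample_mean:.2f}, sample sd = {sample_sd:.2f}")
print(f"approximate 95% interval for the population mean: {lower:.2f} to {upper:.2f}")
```

The first two figures only describe the eight observations; the interval is a statement about the wider population from which they were drawn.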

Characteristics of Statistics:

  1. Statistics are aggregates of facts,
  2. Statistics are affected to a great extent by the multiplicity of causes,
  3. Statistics are numerically expressed,
  4. Statistics are enumerated or estimated according to reasonable standards of accuracy,
  5. Statistics are collected in a systematic manner,
  6. Statistics are collected with a definite object in view, and
  7. Statistics are capable of being placed in relation to each other.

Functions or Uses of Statistics:

  1. Statistics simplifies complexities,
  2. Statistics presents facts in a definite form,
  3. Statistics simplifies comparison of data,
  4. Statistics studies relationship among different facts,
  5. Statistics studies changes in the level of a given phenomenon,
  6. Statistics aids forecasting,
  7. Statistics guides the formation of policies, and
  8. Statistics tests the laws of other sciences.

Limitations of Statistics:

  1. Statistical results are true only on the average or in the long run,
  2. Statistics does not deal with facts which cannot be numerically measured,
  3. Statistical results may sometimes be fallacious and misleading owing to poor collection of data,
  4. Only experts can handle the statistical data efficiently, and
  5. Statistics provides only the tools for analysis.  It cannot however change the nature of the causes affecting statistical data.

Collection of Data:

There are two sources of data:

(a)   Primary Sources: The data published or used by an organisation which originally collects them are called ‘primary data’.  The data in the Population Census reports are primary because they are collected, compiled and published by the Population Census Commission.

(b)   Secondary Sources: The data published or used by an organisation other than the one which originally collected them are known as ‘secondary data’.  For example, the data in Economic Survey of Pakistan.

Methods of Collection of Primary Data:

(a)   Direct Personal Observation, i.e., through individual interviews.

(b)   Indirect Oral Investigation, i.e., on the evidence of persons or parties supposed to know the facts directly or indirectly.

(c)    Registration is the most popular method of collecting data.

(d)   Estimates Through Local Correspondents is not a formal collection of data.  This method is generally used in crop or land estimates.

(e)   Investigation Through Enumerators to get the forms of inquiry filled in from the informants.

(f)     Mailed Questionnaire Method.

Methods of Collecting Secondary Data:

(a)   Official Sources, e.g., publications of the Federal Bureau of Statistics; Ministries of Finance, Trade and Industry, Telecommunication, Education, etc.

(b)   Semi-Official Sources, e.g., publications of the State Bank of Pakistan, SECP, District Councils, Municipal Committees, etc.

(c)   Private Sources, e.g., publications of trade associations, Chambers of Commerce and Industry, etc.

(d)   Technical and Trade Journals.

(e)   Research Organisations, e.g., universities, Institute of Education and Research, Institute of Development Economics, etc.

Variable and Constant:

  1. A measurable quantity which can vary from one individual to another is called a ‘variable’.  Examples are heights and weights of students in a class, prices of commodities, number of children in a family, etc.  It is usually denoted by the last letters of the alphabet, i.e., x, y, and z.
  2. A ‘constant’ is a quantity which can assume only one value.  Examples are π = 3.14159, e = 2.71828, etc.  It is usually denoted by the first letters of the alphabet, i.e., a, b, c, d, …

Continuous or Discrete Variables:

  1. A variable which can assume any value within a given range is called a continuous variable.  Examples are the heights and weights of students, temperature, speed, etc.  The height of a student can be 62”, 62.5” or 62.45”.
  2. A variable which can assume only some specific values within a given range is called a discontinuous or discrete variable.  Examples are the number of houses in a town, number of children in a family, number of students in a class, etc.  A discrete variable takes on values which are integers or whole numbers like 0, 1, 2, 3, 4, 5, … but cannot be 2.5, 3.3, 3.91, 14.235, etc.  There cannot be 4.5 houses in a town or 10.15 students in a class.

Quantitative and Qualitative Data:

  1. Quantitative data are described by quantitative variables, such as height, weight, temperature, speed, etc.
  2. Qualitative data are described by qualitative variables, such as marital status, religion, colour, race, etc.

Errors of Measurement:

The difference between the measured value and true value is called the error of measurement.  These errors are of two types:

(a)   Compensating Errors: errors which tend to balance or cancel out in the long run are called ‘compensating errors’, ‘chance errors’ or ‘random errors’.

(b)   Biased Errors: errors which tend to occur in the same direction and are cumulative in effect are called ‘biased errors’ or ‘cumulative errors’.  Such errors arise from faulty instruments or personal bias.
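
The difference between the two types of error can be seen in a small Python sketch. The true value and both sets of errors are hypothetical figures chosen for illustration: the first set is scattered on both sides of zero, while the second mimics an instrument that always reads 0.2 too high.

```python
# Hypothetical true length and measurement errors (illustrative values only).
true_value = 100.0

# Compensating (random) errors: some positive, some negative.
compensating_errors = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1]

# Biased (cumulative) errors: all in the same direction.
biased_errors = [0.2, 0.2, 0.2, 0.2, 0.2, 0.2]

def total_error(errors):
    """Sum of (measured value - true value) over all measurements."""
    measured = [true_value + e for e in errors]
    return sum(m - true_value for m in measured)

print("total compensating error:", round(total_error(compensating_errors), 6))
print("total biased error:", round(total_error(biased_errors), 6))
```

The compensating errors sum to approximately zero, whereas the biased errors accumulate to about 1.2 over six measurements.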

Classification of Data:

The process of arranging data into classes or categories according to some common characteristics present in the data is called ‘classification’.

Data can be classified by many characteristics, but there are four main bases of classification.  These are:

(a)   Qualitative, e.g., sex, religion, marital status, race, etc.

(b)   Quantitative, e.g., height, weight, income, etc.

(c)   Geographical, e.g., continents, states, cities, etc.

(d)   Chronological, i.e., arrangement of data by their time of occurrence, e.g., date of birth, date of joining, etc.

Types of Data Classifications:

Data can be classified by one, two or more characteristics at a time:

(a)   Quantitative:

      (i)    One-way: when data are classified by one characteristic.

      (ii)   Two-way: when data are classified by two characteristics.

      (iii)  Three-way: when data are classified by three characteristics.

      (iv)   Many-way: when data are classified by many characteristics.

(b)   Qualitative:

      (i)    Two-fold or dichotomy: when data are divided into two sub-classes, one possessing the characteristic and the other not possessing it.  For example, the literate and illiterate population of a country.

      (ii)   Three-fold or trichotomy: when data are classified into three sub-classes.

      (iii)  Manifold: when data are classified into many sub-divisions.

Frequency Distribution:

(a)   Frequency Distribution of Discrete Data: There are no class boundaries because discrete data do not take fractional values.  If the class interval size is one, we usually take single values.

No. of children in a family    Number of families
0                               7
1                               3
2                              25
3                              16
4                               9
5                               4
6                               1
Total                          65
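
A table of this kind can be built from raw observations by counting how often each value occurs. The following is a minimal Python sketch; the list `children_per_family` is hypothetical raw data invented for illustration, not the data behind the table above.

```python
from collections import Counter

# Hypothetical raw observations: number of children in each of 12 families.
children_per_family = [2, 0, 3, 2, 1, 2, 4, 3, 2, 0, 3, 2]

# Count how many families report each number of children.
frequency = Counter(children_per_family)

print("Children  Families")
for value in sorted(frequency):
    print(f"{value:>8}  {frequency[value]:>8}")
print(f"{'Total':>8}  {sum(frequency.values()):>8}")
```

Because the variable is discrete, each distinct value forms its own class and no class boundaries are needed.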

(b)   Frequency Distribution of Continuous Data: Class boundaries are formed for continuous data because the continuous data are in fractions:

Heights of students in a class (inches)    Number of students
55.5-58.0                                   1
58.0-60.5                                   6
60.5-63.0                                  17
63.0-65.5                                  18
65.5-68.0                                  18
68.0-70.5                                   4
70.5-73.0                                   1
Total (Σf)                                 65
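
Grouping continuous data into classes can be sketched in Python as follows. The class boundaries are those of the table above; the list `heights` is hypothetical raw data, and the rule that a value equal to a boundary is placed in the class beginning at that boundary is an assumption, since the text does not state a convention.

```python
from bisect import bisect_right

# Class boundaries from the table above (class width 2.5 inches).
boundaries = [55.5, 58.0, 60.5, 63.0, 65.5, 68.0, 70.5, 73.0]

# Hypothetical raw heights in inches (illustrative values only).
heights = [57.2, 59.9, 61.4, 62.45, 62.5, 64.0, 64.8, 66.1, 67.3, 69.0]

def class_frequencies(values, bounds):
    """Count the values falling in each class [lower, upper).

    Assumption: a value equal to a boundary belongs to the class that
    starts at that boundary. Values outside the range are rejected.
    """
    counts = [0] * (len(bounds) - 1)
    for v in values:
        if not bounds[0] <= v < bounds[-1]:
            raise ValueError(f"{v} lies outside the class boundaries")
        counts[bisect_right(bounds, v) - 1] += 1
    return counts

counts = class_frequencies(heights, boundaries)
for lower, upper, f in zip(boundaries, boundaries[1:], counts):
    print(f"{lower}-{upper}: {f}")
print("Total:", sum(counts))
```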

(c)    Cumulative Frequency Distribution: is the table showing cumulative frequencies:

Heights (inches)    Less-than cumulative frequency    Heights (inches)    Greater-than cumulative frequency
                    (No. of students)                                     (No. of students)
Less than 55.5       0                                55.5 and more       65
Less than 58.0       1                                58.0 and more       64
Less than 60.5       7                                60.5 and more       58
Less than 63.0      24                                63.0 and more       41
Less than 65.5      42                                65.5 and more       23
Less than 68.0      60                                68.0 and more        5
Less than 70.5      64                                70.5 and more        1
Less than 73.0      65                                73.0 and more        0
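
Both cumulative columns follow from the class frequencies by running totals. A minimal Python sketch, using the frequencies of the height distribution in (b); the variable names are illustrative:

```python
from itertools import accumulate

# Class boundaries and frequencies of the height distribution in (b).
boundaries = [55.5, 58.0, 60.5, 63.0, 65.5, 68.0, 70.5, 73.0]
frequencies = [1, 6, 17, 18, 18, 4, 1]
total = sum(frequencies)

# "Less than" cumulative frequency at each boundary: a running total
# starting from zero below the lowest boundary.
less_than = [0] + list(accumulate(frequencies))

# "And more" cumulative frequency: the observations not below the boundary.
and_more = [total - c for c in less_than]

for b, lt, am in zip(boundaries, less_than, and_more):
    print(f"Less than {b}: {lt:>2}    {b} and more: {am:>2}")
```

At every boundary the two cumulative frequencies add up to the total frequency of 65.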

(d)   Relative Frequency Distribution: expresses each class frequency as a percentage of the total frequency:

Heights (inches)    Frequency (No. of students)    Relative frequency (%)
55.5-58.0            1                             1 / 65 × 100 = 1.54
58.0-60.5            6                             6 / 65 × 100 = 9.23
60.5-63.0           17                             26.15
63.0-65.5           18                             27.69
65.5-68.0           18                             27.69
68.0-70.5            4                             6.15
70.5-73.0            1                             1.54
Total               65                             100
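
The calculation is the same for every class: relative frequency = f / Σf × 100. A minimal Python sketch, rounding to two decimal places as in the table (the rounding is a presentation choice, so the rounded percentages need not add to exactly 100):

```python
# Class frequencies of the height distribution.
frequencies = [1, 6, 17, 18, 18, 4, 1]
total = sum(frequencies)

# Relative frequency of each class as a percentage of the total frequency.
relative = [round(f / total * 100, 2) for f in frequencies]

for f, r in zip(frequencies, relative):
    print(f"{f:>2} / {total} x 100 = {r}")
```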

(e)   Relative Cumulative Frequency Distribution:

Heights (inches)    Less-than cumulative    Relative frequency      Heights (inches)    Greater-than cumulative    Relative frequency
                    frequency               (%)                                         frequency                  (%)
Less than 55.5       0                      0                       55.5 and more       65                         100
Less than 58.0       1                      1 / 65 × 100 = 1.54     58.0 and more       64                         98.46
Less than 60.5       7                      7 / 65 × 100 = 10.77    60.5 and more       58                         89.23
Less than 63.0      24                      36.92                   63.0 and more       41                         63.08
Less than 65.5      42                      64.62                   65.5 and more       23                         35.38
Less than 68.0      60                      92.31                   68.0 and more        5                         7.69
Less than 70.5      64                      98.46                   70.5 and more        1                         1.54
Less than 73.0      65                      100                     73.0 and more        0                         0
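
Each relative cumulative frequency is the corresponding cumulative frequency divided by the total frequency and multiplied by 100. A short Python sketch using the cumulative frequencies of the height distribution; the helper name `as_percent` is illustrative:

```python
# Cumulative frequencies of the height distribution at each class boundary.
less_than = [0, 1, 7, 24, 42, 60, 64, 65]
and_more = [65, 64, 58, 41, 23, 5, 1, 0]
total = 65

def as_percent(cumulative, total_frequency):
    """Express each cumulative frequency as a percentage of the total."""
    return [round(c / total_frequency * 100, 2) for c in cumulative]

less_than_pct = as_percent(less_than, total)
and_more_pct = as_percent(and_more, total)

for lt, am in zip(less_than_pct, and_more_pct):
    print(f"less than: {lt:>6}    and more: {am:>6}")
```

At any boundary the two percentages are complementary: before rounding they add to exactly 100.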

(f)     Bivariate Frequency Distribution: involves constructing frequency distribution of two variables:

Weights                        Heights (inches)
(pounds)     57-59    60-62    63-65    66-68    69-71    72-74    Total
100-104        3        7        -        -        -        -       10
105-109        -        5       10        2        1        -       18
110-114        -        -        4        6        4        -       14
115-119        -        -        1        1        4        2        8
Total          3       12       15        9        9        2       50
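
A two-way table of this kind can be built by assigning each observation to a weight class and a height class and counting the pairs. The sketch below is a minimal Python illustration: the list `students` is hypothetical, measurements are assumed to be recorded in whole pounds and whole inches so that the class limits are inclusive, and `find_class` is a helper introduced only for this example.

```python
from collections import Counter

# Hypothetical (weight in pounds, height in inches) pairs, illustrative only.
students = [(102, 58), (103, 61), (107, 64), (108, 61),
            (112, 66), (113, 70), (117, 73), (106, 63)]

# Class limits as in the table above.
weight_classes = [(100, 104), (105, 109), (110, 114), (115, 119)]
height_classes = [(57, 59), (60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]

def find_class(value, classes):
    """Return the (lower, upper) class containing value; both limits inclusive."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return (lower, upper)
    raise ValueError(f"{value} falls outside every class")

# Count the students in each (weight class, height class) cell.
cells = Counter(
    (find_class(w, weight_classes), find_class(h, height_classes))
    for w, h in students
)

# Print the table with row and column totals.
print("Weights".ljust(9)
      + "".join(f"{lo}-{hi}".rjust(7) for lo, hi in height_classes)
      + "Total".rjust(7))
for wc in weight_classes:
    row = [cells[(wc, hc)] for hc in height_classes]
    print(f"{wc[0]}-{wc[1]}".ljust(9)
          + "".join(str(n).rjust(7) for n in row)
          + str(sum(row)).rjust(7))
col_totals = [sum(cells[(wc, hc)] for wc in weight_classes) for hc in height_classes]
print("Total".ljust(9)
      + "".join(str(n).rjust(7) for n in col_totals)
      + str(sum(col_totals)).rjust(7))
```

The row totals give the frequency distribution of weights alone and the column totals that of heights alone; both sets of totals must add to the same grand total.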
