The World of Statistics – 3

Ungrouped and Grouped Data

By Dr. Magdi Abadir, PhD


1- Ungrouped Data

As the HR manager of your company, you have been invited to a gathering of your department’s working team; there are 25 persons in all, including you. Being a loner by nature, you take a side chair and spend the evening scrolling through the personal data of the employees. Out of curiosity, you note down their ages and you get the following figures:

32 45 48 23 24
56 36 39 42 47
59 26 35 35 23
49 62 48 55 34
38 25 58 30 40


This is a typical case of ungrouped data: the values were written down in no particular order, each one recorded individually. It is possible, however, to present these age figures in ascending or descending order. Arranged in ascending order, the previous table looks like this:

23 23 24 25 26
30 32 34 35 35
36 38 39 40 42
45 47 48 48 49
55 56 58 59 62
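
For readers who keep such figures in a script rather than on paper, the ordering step can be sketched in a few lines of Python. This is only an illustration; the variable names (ages, ascending, descending) are chosen here and are not part of the original example.

```python
# Ages as noted down, in no particular order (taken from the first table).
ages = [
    32, 45, 48, 23, 24,
    56, 36, 39, 42, 47,
    59, 26, 35, 35, 23,
    49, 62, 48, 55, 34,
    38, 25, 58, 30, 40,
]

# sorted() returns a new list and leaves the original order untouched.
ascending = sorted(ages)
descending = sorted(ages, reverse=True)

# Print the ascending list in rows of five, like the table above.
for i in range(0, len(ascending), 5):
    print(*ascending[i:i + 5])
```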



2- Grouped Data

Becoming more and more interested in your game, you decide to group the ages of your employees into distinct categories by tens. The first category will include the “youngsters” aged from 20 to less than 30, the second those aged from 30 to less than 40, and so on. Then you count the number of employees in each category. In statistical terminology this number is called the "frequency" and is designated by the symbol f. To emphasize that a group includes, for instance, employees aged from 20 to less than 30, it is written as [20; 30): the square bracket means that 20 belongs to the group, the round bracket that 30 does not. You have now set up the following table:

Category [20; 30) [30; 40) [40; 50) [50; 60) [60; 70)
Frequency f 5 8 7 4 1
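
The counting can be reproduced with a short Python sketch. The helper name class_frequencies and its parameters are hypothetical, introduced only for this illustration; it assumes equal-width categories that include their lower boundary and exclude their upper one.

```python
ages = [
    32, 45, 48, 23, 24,
    56, 36, 39, 42, 47,
    59, 26, 35, 35, 23,
    49, 62, 48, 55, 34,
    38, 25, 58, 30, 40,
]

def class_frequencies(values, start, width, n_classes):
    """Count the values falling in each interval [start + k*width; start + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = (v - start) // width      # index of the category containing v
        if 0 <= k < n_classes:        # values outside all categories are ignored
            counts[k] += 1
    return counts

frequencies = class_frequencies(ages, start=20, width=10, n_classes=5)

for k, f in enumerate(frequencies):
    lower = 20 + 10 * k
    print(f"[{lower}; {lower + 10})  f = {f}")
```

The frequencies add up to 25, the number of persons, which is a quick check that nobody was left out or counted twice.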

This is an example of grouped data, where your numerical age figures are classified into distinct categories. The first category has a lower boundary of 20 and an upper boundary of 30 that is not included in the category. These categories are usually called class intervals. Each one has a midpoint (also called the class mark), which is simply the arithmetic mean of its lower and upper boundaries. For the first class interval it will be:
(20+30)/2 = 25.
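
The same calculation can be repeated for every class interval of the age table. The following minimal Python sketch uses names (classes, midpoints) chosen here for illustration only.

```python
# Each class interval is given as (lower boundary, upper boundary).
classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]

# Midpoint = arithmetic mean of the two boundaries.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

print(midpoints)  # [25.0, 35.0, 45.0, 55.0, 65.0]
```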

One problem often encountered in grouping data is the choice of the interval width. In the present case you have chosen, for convenience, to group your data in ten-year intervals. In other instances, however, it may be better to choose class intervals of different widths. Consider, for example, the populations (in millions) of 12 countries designated A, B, C, etc.:

Country A B C D E F G H I J K L
Population 1.2 3.5 9 25 60 75 88 95 260 700 900 1200


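Equal-width intervals would suit these data poorly: with classes of 100 million each, eight of the twelve countries would fall into the first class while most of the other classes would be empty. The following grouping is more realistic, although the classes are of different widths.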

Class [1; 10) [10; 100) [100; 1000) [1000; 2000)
Frequency 3 5 3 1
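
When the classes have different widths, a value can no longer be assigned to its class by a simple division; instead it is located between the class boundaries. The sketch below does this with Python's standard bisect module. The function name count_in_classes and the list name edges are assumptions made for this illustration.

```python
from bisect import bisect_right

# Populations in millions for countries A to L (from the table above).
populations = [1.2, 3.5, 9, 25, 60, 75, 88, 95, 260, 700, 900, 1200]

# Boundaries of the classes [1; 10), [10; 100), [100; 1000), [1000; 2000).
edges = [1, 10, 100, 1000, 2000]

def count_in_classes(values, edges):
    """Count values per class; each class includes its lower boundary only."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Number of boundaries <= v, minus one, gives the class index.
        k = bisect_right(edges, v) - 1
        if 0 <= k < len(counts):      # skip values outside every class
            counts[k] += 1
    return counts

frequencies = count_in_classes(populations, edges)

for k, f in enumerate(frequencies):
    print(f"[{edges[k]}; {edges[k + 1]})  f = {f}")
```

As with the ages, the frequencies should add up to the number of observations, here 12 countries.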


Reference
Hoel P.G. (1976). “Elementary Statistics” 4th Ed., Reading, Mass.: Wiley Int. Edition, Chapter 2


Dr. Magdi Fouad Abadir, PhD: Dr. M. F. Abadir is currently a professor with the Chemical Engineering Department at the Faculty of Engineering, University of Cairo, Egypt. His major interests are in the fields of high temperature science and technology. During his career, he has supervised more than 110 MSc and PhD theses and published more than a hundred papers, mostly in international peer-reviewed journals. He currently teaches courses in High Temperature Technology and Industrial Statistics. He is also a consultant for several industrial businesses.